Performance improvement - extract single match

Philip Rowlands Thu, 14 Mar 2024 03:08:42 -0700

When 
tar --extract --file test.tar filename
has found and written the contents of "filename", it could stop, but does not.


I assume this is because archive members with matching names could be present 
more than once, therefore the whole archive must be scanned.

However, in the common use case of populating an archive from a filesystem 
~atomically, duplicate member names are not expected.

I spotted this with a compressed archive containing metadata, logfiles, and 
core dumps. Although the metadata members are small and early in the archive, 
extracting one of them still reads the whole archive, which "costs" disk I/O 
and CPU time for decompression.

Are there use cases where members are appended, with the intention of "latest 
version wins" on extraction?
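
For reference, a minimal sketch of how such an archive could come about, 
assuming an uncompressed archive and GNU tar's --append. The names dup.tar, 
src, out, and notes.txt are made up for illustration:

```shell
# Sketch: build an archive holding two members with the same name,
# then extract by name. All file names here are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir src out

echo "version 1" > src/notes.txt
tar --create --file dup.tar -C src notes.txt

# Append a second member under the same name.
echo "version 2" > src/notes.txt
tar --append --file dup.tar -C src notes.txt

# The listing shows notes.txt twice.
tar --list --file dup.tar

# Both members are extracted in archive order, so the appended copy
# overwrites the first one on disk.
tar --extract --file dup.tar -C out notes.txt
cat out/notes.txt
```

With such an archive, stopping at the first match would leave the older copy 
on disk, which is presumably why the scan continues.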

Example follows; filenames have been lightly anonymized:

$ tar --version
tar (GNU tar) 1.30

$ time tar xvaf crashdump.tar.bz2 metadata/files.log
metadata/files.log

real    0m9.570s

We could:
- add an option to stop after one exact match
- stop after one exact match by default, and add an option to continue to 
extract any duplicate members
- do nothing
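
For comparison with the first option, the GNU tar manual describes an 
--occurrence[=N] option that processes only the Nth occurrence of each named 
member (N defaults to 1). A minimal sketch on a small uncompressed archive 
with a duplicated member follows. The file names are made up, and I have not 
measured whether this shortens the run on the compressed archive above:

```shell
# Sketch: extract only the first occurrence of a duplicated member.
# Requires GNU tar; dup.tar, src, first, and notes.txt are hypothetical names.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir src first

echo "version 1" > src/notes.txt
tar --create --file dup.tar -C src notes.txt
echo "version 2" > src/notes.txt
tar --append --file dup.tar -C src notes.txt

# --occurrence=1 asks tar to process only the first matching member.
tar --extract --occurrence=1 --file dup.tar -C first notes.txt
cat first/notes.txt
```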


Cheers,
Phil
