On Aug 13, 2014, at 10:39 PM, Dilip Chhetri <[email protected]> wrote:
> I have lots of large .tar.gz files and I need to extract just a single
> (small) file from each. I purposely put it at the front of the .tar file so
> that extraction is fast, but if that file is gzipped, then 'tar' wants to
> read the whole .tgz file before exiting. ...
> Q: is there any special option to make it fast? If not, this would be a
> really good enhancement (I saw a lot of people asking for it on the web).
> If someone can post a patch to fix this behaviour, that would be really
> nice. I spent some time reading the source code for tar, but things aren't
> looking obvious to me.

GNU tar relies on a separate program to decompress input files and needs to
allow that program to finish in order to avoid problems in various boundary
cases. In effect, this means that GNU tar does in fact always decompress the
entire file. This is unlikely to change, as a lot of people depend on the
current behavior.

You could try 'bsdtar' (it should be available as a package on your favorite
OS). It uses a different approach to decompression that may work better for
your particular application.

If you need to do things very differently, you could consider building a
specialized extraction program using libarchive. Libarchive is the archiving
and dearchiving engine used by bsdtar. It is also used by other programs
(such as package managers) that need to work with tar, cpio, zip, or other
archive formats.

Finally, if being able to extract single files quickly is really important,
consider switching to Zip format. Zip does not offer quite as good overall
compression as tar.gz or tar.bz2, but it supports much, much faster
extraction of single files.

Cheers,
Tim
