On Aug 13, 2014, at 10:39 PM, Dilip Chhetri <[email protected]> wrote:
> I have lots of large .tar.gz files and I need to extract just a single
> (small) file from each. I purposely put it at the front of the .tar file so
> that extraction is fast, but if that file is gzipped, then 'tar' wants to
> read the whole .tgz file before exiting. ...
> Q: is there any special option to make it fast? If not, this would be a
> really good enhancement (I saw a lot of people asking for it on the web).
> If someone can post a patch to fix this behaviour, that would be really
> nice. I spent some time reading the source code for tar, but things aren't
> looking obvious to me.

GNU tar relies on a separate program to decompress input files and needs to
allow that program to finish in order to avoid problems in various boundary
cases. In effect, this means that GNU tar does in fact always decompress the
entire file. This is unlikely to change, as a lot of people depend on the
current behavior.

You could try 'bsdtar' (it should be available as a package on your favorite
OS). It uses a different approach to decompression that may work better for
your particular application.

If you need to do things very differently, you could consider building a
specialized extraction program using libarchive. Libarchive is the archiving
and dearchiving engine used by bsdtar. It is also used by other programs
(such as package managers) that need to work with tar, cpio, zip, or other
archive formats.

Finally, if being able to extract single files quickly is really important,
consider switching to Zip format. Zip does not offer quite as good overall
compression as tar.gz or tar.bz2, but it supports much, much faster
extraction of single files.

Cheers,
Tim
