I have lots of large .tar.gz files, and I need to extract just a single
(small) file from each. I purposely put that file at the front of the
.tar archive so that extraction is fast, but once the archive is
compressed, 'tar' reads the whole .tgz file before exiting.
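
For reference, GNU tar writes members in the order the paths are given,
so listing the small file first is what puts it at the front of the
archive. A minimal sketch, with made-up file names:

    # small-config.txt becomes the first member; note it gets stored a
    # second time if it also lives under big-data-dir/
    tar czf archive.tar.gz small-config.txt big-data-dir/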


To illustrate this phenomenon, consider the following:

1) Regular extract:
desktop1:/tmp$ time dd if=linux-3.4.2.tar.bz2 bs=1k|bunzip2|tar x
linux-3.4.2/Documentation/ABI/README
78284+1 records in
78284+1 records out
80162970 bytes (80 MB) copied, 8.96967 s, 8.9 MB/s

real    0m8.983s
user    0m9.057s
sys    0m0.549s

* Performance is the same for "tar jxf linux-3.4.2.tar.bz2
linux-3.4.2/Documentation/ABI/README".
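
(The behaviour is the same for gzipped archives, e.g. "tar zxf
linux-3.4.2.tar.gz linux-3.4.2/Documentation/ABI/README" also reads to
the end of the archive before exiting.)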

2) A crude but fast extract:
dchhetri@desktop1:/tmp$ time dd if=linux-3.4.2.tar.bz2 bs=1k count=1000 \
    | bunzip2 | tar x linux-3.4.2/Documentation/ABI/README
1000+0 records in
1000+0 records out
1024000 bytes (1.0 MB) copied, 0.0980247 s, 10.4 MB/s

bunzip2: Compressed file ends unexpectedly;
    perhaps it is corrupted?  *Possible* reason follows.
bunzip2: Inappropriate ioctl for device
    Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

/tmp/tar: Unexpected EOF in archive
/tmp/tar: Error is not recoverable: exiting now

real    0m0.105s
user    0m0.104s
sys    0m0.009s
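
Method (2) can be generalised so the prefix size is not hard-coded. A
rough sketch (the function name and starting size are my own; it keeps
doubling the prefix until the extracted copy of the member stops
changing, so a copy truncated mid-member simply triggers another round):

    extract_front() {
        archive=$1; member=$2
        size=$(wc -c < "$archive")
        count=1000; prev=
        while :; do
            dd if="$archive" bs=1k count="$count" 2>/dev/null \
                | bunzip2 2>/dev/null | tar xf - "$member" 2>/dev/null
            # checksum whatever has been extracted so far (empty if nothing)
            if [ -f "$member" ]; then cur=$(cksum < "$member"); else cur=; fi
            if [ -n "$cur" ] && [ "$cur" = "$prev" ]; then
                return 0        # two rounds agree: the member is complete
            fi
            if [ $((count * 1024)) -ge "$size" ]; then
                # the last round read the entire archive; nothing more to find
                [ -n "$cur" ] && return 0
                echo "$member: not found in $archive" >&2; return 1
            fi
            prev=$cur
            count=$((count * 2))
        done
    }

Used as: extract_front linux-3.4.2.tar.bz2 linux-3.4.2/Documentation/ABI/README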


As the timings show, method (2) extracts the single file in 0.1 seconds
instead of 8.9. It looks to me like 'tar' keeps reading the whole
archive from stdin even after it has extracted the requested file,
presumably because an archive may contain the same path more than once
(a later copy overrides an earlier one), so tar cannot assume the first
match is the last.
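
The duplicate case is easy to demonstrate (file names made up):

    tar cf demo.tar notes.txt
    tar rf demo.tar notes.txt    # 'r' appends a second copy of notes.txt
    tar tf demo.tar              # lists notes.txt twice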

Q: Is there a special option to make this fast? If not, it would be a
really good enhancement (I have seen a lot of people asking for it on
the web). If someone could post a patch to fix this behaviour, that
would be really nice. I spent some time reading the tar source code,
but nothing obvious jumped out at me.
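
One option I came across but have not verified from a pipe: GNU tar's
--occurrence[=N] processes only the Nth occurrence of each named member,
and is said to stop reading the archive once all named members have been
handled. If that holds when reading stdin, tar should exit as soon as
the file is out and kill the decompressor with SIGPIPE:

    # untested: relies on --occurrence stopping tar early
    bunzip2 < linux-3.4.2.tar.bz2 \
        | tar xf - --occurrence=1 linux-3.4.2/Documentation/ABI/README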

Thanks,
    Dilip
