On Sat, 30 Jan 2010 20:34:05 -0600, Peng Yu wrote:

> On Sat, Jan 30, 2010 at 10:11 AM, jellybean stonerfish
> <stonerf...@geocities.com> wrote:
>> On Fri, 29 Jan 2010 09:20:22 -0600, Peng Yu wrote:
>>
>>> It seems that diff can not do comparison on the decompressed files in
>>> .gz and .bz2 files. I could first decompress the .gz and .bz2 file and
>>> then do the comparison. But it would be convenient to be able to
>>> directly compare without explicitly decompressing any files. Could
>>> somebody add this feature to diff?
>>
>> You could make a little script pretty easy.
>>
>> $ cat /home/js/bin/gzbzdiff
>> #!/bin/bash
>>
>> mkfifo gzi
>> mkfifo bzi
>>
>> gunzip -c $1 > gzi &
>> bunzip2 -c $2 > bzi &
>> diff -s bzi gzi
>> rm bzi gzi
>>
>> $ gzbzdiff nsd.gz nsd.bz2
>> Files bzi and gzi are identical
>
> Of course, I could. But I think this may not be an efficient way if the
> .gz file is too large, say of the order of GB.

I tried a stupider idea first: expand the bzip2 file to stdout, pipe that
through gzip, and pipe the gzipped stream to diff to compare it with the
existing gzip file. But diff always says the files are different.

$ bunzip2 -c nsd.bz2 | gzip - | diff nsd.gz -

Looking further, I found that if I gzip the same file twice, the resulting
.gz files are different. This was not what I expected.

$ cat nsd | gzip - > nsd1.gz
$ cat nsd | gzip - > nsd2.gz
$ diff nsd1.gz nsd2.gz
Binary files nsd1.gz and nsd2.gz differ

Yet if I gunzip the two .gz files, the expanded files are identical.

$ gunzip nsd1.gz
$ gunzip nsd2.gz
$ diff -s nsd1 nsd2
Files nsd1 and nsd2 are identical

Maybe the gzip algorithm uses some randomness in its compression?
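
A possible explanation that needs no randomness: the gzip file format
(RFC 1952) has header fields for a modification time and an original file
name. When gzip reads from a pipe there is no file timestamp to copy, and as
far as I can tell it records the current time instead, so two runs a few
seconds apart would differ in the header even though the compressed data is
the same. The sketch below is only an illustration under assumptions: it
assumes bash (for process substitution) and gzip, bzip2, cmp and diff on the
PATH, and the scratch directory and tiny sample file are made up for the demo.
It checks that gzip -n, which leaves the name and timestamp out of the header,
gives byte-identical output on two runs, and it compares the decompressed
contents of a .gz and a .bz2 directly.

```shell
#!/bin/bash
# Sketch only: assumes bash plus gzip, bzip2, cmp and diff on PATH.

workdir=$(mktemp -d)     # scratch directory, hypothetical, for the demo only
cd "$workdir" || exit 1

# Small stand-in for the real nsd file.
printf 'line %s\n' 1 2 3 4 5 > nsd

# -n: do not store the original name and timestamp in the gzip header,
# so two runs over the same input should produce identical bytes.
gzip -n -c < nsd > nsd1.gz
gzip -n -c < nsd > nsd2.gz
if cmp -s nsd1.gz nsd2.gz; then same_gz=yes; else same_gz=no; fi

# Compare the decompressed contents directly. Process substitution streams
# both sides through pipes, so no named fifos are created and nothing
# uncompressed is written to disk.
bzip2 -c < nsd > nsd.bz2
if diff -q <(gunzip -c nsd1.gz) <(bunzip2 -c nsd.bz2) > /dev/null; then
    same_content=yes
else
    same_content=no
fi

echo "gzip -n output identical: $same_gz"
echo "decompressed contents identical: $same_content"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

Since both decompressors run concurrently and diff reads from the pipes, this
should behave much like the fifo script for large files, minus the leftover
fifos if it is interrupted. I have not timed it on multi-GB inputs.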