On Mon, Apr 20, 2009 at 6:39 PM, John Levon <john.levon at sun.com> wrote:
> On Mon, Apr 20, 2009 at 09:25:51AM -0700, Rich Burridge wrote:
>
>> bzip2 is a free and open source block sorting lossless data
>> compression algorithm with comparatively high compression efficiency.
>>
>> pbzip2 is a parallel implementation of the bzip2 algorithm using
>> pthreads written in C++ by Jeff Gilchrist that retains file
>> compatibility with the common bzip2(1) application included in
>> Solaris and many other operating systems
>
> Is there a reason we're not delivering this version as the real bzip2,
> then just providing a symlink? What is the advantage of the non-parallel
> implementation exactly to mean it needs a new name?

I use pbzip2 and bzip2 a lot. They should be kept separate. For one
thing, I would expect to get regular bzip2 if I called it by that name,
and likewise for pbzip2.

There are key behavioural differences that I've seen:

 - pbzip2 by default will use all the available CPUs. You really don't
   want to make it that easy to saturate a machine; it can be very
   unpleasant for the other users of the machine.

 - To decompress a file that was compressed by regular bzip2, pbzip2
   requires memory equal to the size of the file. That may be extremely
   large, in which case the decompression may not work at all.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
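
On the first point above, pbzip2 does accept a -p# option to set the
number of processors it uses, so a wrapper can cap it on a shared
machine. What follows is only a minimal sketch: the helper name
pbzip2_command and the "use half the CPUs" default are assumptions made
for illustration, not anything proposed in this thread. It only builds
the argument list and does not run pbzip2 itself.

```python
import os


def pbzip2_command(path, max_cpus=None, decompress=False):
    """Build a pbzip2 argument list that caps the processor count.

    Hypothetical helper: the name and the default policy are assumptions.
    """
    available = os.cpu_count() or 1
    if max_cpus is None:
        # Assumed policy: leave half of the machine for other users.
        max_cpus = max(1, available // 2)
    # Never ask for fewer than 1 or more than the CPUs actually present.
    cpus = max(1, min(max_cpus, available))
    cmd = ["pbzip2", f"-p{cpus}"]
    if decompress:
        cmd.append("-d")
    cmd.append(path)
    return cmd
```

The resulting list could be handed to subprocess.run(cmd, check=True).
This does nothing about the memory needed to decompress a file made by
regular bzip2; it only addresses the CPU saturation concern.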