On 9/14/2010 2:04 AM, Gary V. Vaughan wrote:
> I'm curious to know what the history of lzma and xz is that makes this
> desirable though.

Here's some documentation I put together for the cygwin xz package:

xz
========================================================================
This package provides a data compression library and utilities
supporting the .xz and .lzma file formats, which use the LZMA
compression algorithm. LZMA provides high compression ratios and very
fast decompression, with minimal memory requirements for decompression.

XZ Utils is the latest generation of this software, supplanting the
older LZMA Utils. The cygwin xz package replaces and obsoletes the
cygwin lzma package.

LZMA Utils (and its own antecedent, the LZMA SDK) provided the 'lzma'
tool, which supported the 'LZMA-Alone' file format usually indicated by
the extension '.lzma'. Internally, this file format used what is now
called the LZMA1 compression algorithm.

XZ Utils provides the xz tool, which supports the new .xz file format
usually indicated by the extension '.xz'. Internally, it uses a
variation of the original LZMA compression algorithm, called LZMA2.
However, the new xz tool also seamlessly supports the older .lzma files
and LZMA1 compression.

History:
========================================================================

1. LZMA SDK

First there was the LZMA SDK. Upstream, it shipped no libraries; only a
few executables such as 'lzma'. The source code was provided for public
use (under a variety of licenses), but it was expected that developers
would incorporate the source code directly into their own projects.
This is not The Unix Way.

The LZMA SDK was tightly coupled with the 7zip compression program, and
both were developed on and solely for the Windows platform. 7zip -- but
not the LZMA SDK -- was ported to Unix under the auspices of the p7zip
("Portable 7zip") project. (As an aside, p7zip was then ported to
cygwin...to come full circle).

However, it should be clear that the file format used by 7zip (and
p7zip) was completely different from the one supported by the LZMA
SDK's 'lzma' tool.
The latter used what was called the 'LZMA-Alone' format, which
consisted of 13 bytes of header information followed by a raw
lzma-compressed byte-stream. 7zip, on the other hand, used a much more
complicated file format capable of hosting multiple files, spanned
archives, and other features. The only similarity is that the core data
compression algorithm used by both is LZMA.

2. LZMA Utils

Eventually, a unix port of the LZMA SDK appeared, in the form of the
LZMA Utils distribution, which reorganized the original source code,
and provided the decompression code in library form (liblzmadec). It
also provided a version of the 'lzma' program, but with a completely
different command-line interface.

The LZMA Utils version consciously mimicked the command-line options of
the familiar gzip and bzip2 tools, while the original LZMA SDK version
was...different. Very different. This is because the LZMA SDK's tool
was originally intended just as a test and development utility, to help
refine the algorithm. So, it has a number of 'compression guru' options
that no sane user cares to use, and very few of the 'normal user'
options that they would.

  LZMA Utils: (Lasse Collin)
    lzma -d foo.tar.lzma
        uncompress to (implied) foo.tar, and remove original
        compressed file.
    lzma foo.tar
        compress to (implied) foo.tar.lzma, and remove original
        uncompressed file.
    Supports familiar "tuning" options like -0 .. -9
    Sends output data to stdout using -c
    Could be invoked under alternate names (symlinks) for different
    behavior:
        unlzma == lzma -d  (uncompress)
        lzcat  == lzma -dc (uncompress to stdout)

  LZMA SDK: (Igor Pavlov)
    lzma d foo.tar.lzma foo.tar
    lzma e foo.tar foo.tar.lzma
    mode d/e is the required first non-option argument
    both input and output files must be specified
    stdout? what's that?
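
As an aside, the 13-byte LZMA-Alone header mentioned above is simple
enough to unpack by hand. Here is a minimal Python sketch, assuming the
usual layout of one packed properties byte, a 4-byte little-endian
dictionary size, and an 8-byte little-endian uncompressed size. The
function name parse_lzma_alone_header and the sample header are made up
for illustration; neither comes from LZMA Utils or the LZMA SDK.

```python
import struct

# Assumed layout of the 13-byte LZMA-Alone header:
#   1 byte  packed properties: (pb * 5 + lp) * 9 + lc
#   4 bytes dictionary size, little-endian
#   8 bytes uncompressed size, little-endian (all 0xFF means "unknown")
UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF


def parse_lzma_alone_header(header: bytes) -> dict:
    """Unpack the first 13 bytes of an LZMA-Alone (.lzma) file."""
    if len(header) < 13:
        raise ValueError("an LZMA-Alone header is 13 bytes long")
    props, dict_size, size = struct.unpack("<BIQ", header[:13])
    return {
        "lc": props % 9,           # literal context bits
        "lp": (props // 9) % 5,    # literal position bits
        "pb": props // 45,         # position bits
        "dict_size": dict_size,
        "uncompressed_size": None if size == UNKNOWN_SIZE else size,
    }


# Hypothetical header: properties byte 0x5D, 8 MiB dictionary, 1100 bytes.
sample = bytes([0x5D]) + struct.pack("<IQ", 1 << 23, 1100)
print(parse_lzma_alone_header(sample))
```

Note that nothing in those 13 bytes is a fixed signature, so a tool can
only guess that a file is in LZMA-Alone format by checking whether the
fields look plausible.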

Finally, LZMA Utils also shipped a number of helpful scripts similar to
the familiar ones from gzip and bzip2:
    lzdiff/lzcmp, lzgrep/lzegrep/lzfgrep, lzless/lzmore

So, the LZMA SDK version was hardly suitable for replacing or
augmenting the existing bzip2 and gzip compression programs on unix
systems, especially as the most common use was in conjunction with tar,
and tar expects compression programs to satisfy a common command-line
argument format and to be able to manipulate data on standard streams.
Most linux distributions have standardized on LZMA Utils.

The lzma tools from both LZMA SDK and LZMA Utils support the LZMA-Alone
(.lzma) file format, as does the liblzmadec library from LZMA Utils.

However, the .lzma file format (i.e. LZMA-Alone) is not sufficient for
modern needs, as (1) it has no 'signature bytes', so compressed files
are difficult to automatically detect and verify, (2) it has no
provision for internal integrity checks, and (3) it cannot support
multi-file archives.

3. XZ Utils

Approaching final non-beta release is the newest member of this family,
the XZ Utils. Addressing the shortcomings of the LZMA-Alone file
format, the xz file format and the (slightly modified) LZMA2
compression algorithm were jointly developed by Lasse Collin (LZMA
Utils) and Igor Pavlov (LZMA SDK).

The xz tool has all of the benefits of the LZMA Utils' version of the
lzma tool, and ships with all of the same helpful scripts. In addition,
it can be invoked as either 'xz' (or xzcat, unxz) or 'lzma' (or lzcat,
unlzma) so you don't even need to retrain your fingers. You probably
should, though, because .lzma files are already being replaced by .xz
files on many software distribution sites, including GNU ones.

Finally, the XZ Utils also provides the liblzma decompression AND
compression library, which supports both LZMA-Alone (that is, the old
.lzma) format, and the new .xz format.
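
The difference between the two formats is easy to see from Python,
whose standard lzma module is built on liblzma. What follows is only a
minimal sketch, assuming a Python build that includes that module. The
helper name looks_like_xz and the sample payload are hypothetical. The
six bytes FD 37 7A 58 5A 00 are the magic bytes that open every .xz
stream.

```python
import lzma

# Six-byte signature at the start of every .xz stream.
XZ_MAGIC = b"\xfd7zXZ\x00"


def looks_like_xz(data: bytes) -> bool:
    """Return True if data starts with the .xz signature."""
    return data.startswith(XZ_MAGIC)


payload = b"The quick brown fox jumps over the lazy dog. " * 50

# Same payload in both containers: new .xz (with a CRC64 integrity
# check) and old LZMA-Alone (.lzma).
as_xz = lzma.compress(payload, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)
as_alone = lzma.compress(payload, format=lzma.FORMAT_ALONE)

print("xz has signature:   ", looks_like_xz(as_xz))
print(".lzma has signature:", looks_like_xz(as_alone))

# With the default FORMAT_AUTO, decompress() accepts either container.
assert lzma.decompress(as_xz) == payload
assert lzma.decompress(as_alone) == payload

# Flip one byte in the middle of the .xz data to simulate corruption.
corrupted = bytearray(as_xz)
corrupted[len(corrupted) // 2] ^= 0xFF
try:
    lzma.decompress(bytes(corrupted))
    corruption_detected = False
except lzma.LZMAError:
    corruption_detected = True
print("corruption detected:", corruption_detected)
```

The corruption test is illustrative only: depending on which byte is
damaged, the error may come from the LZMA decoder itself rather than
from the stored check value.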

The new .xz file format has an easily identifiable initial signature
for automated format detection and verification. It supports integrity
checks of several types including cryptographic hashes. Finally, the
format also supports multiple compressed streams within the same file
(that is, multi-file archives). However, the xz tool does NOT, at
present, support multi-file archives; only archives with a single
compressed stream.

As an aside, eventually the 7zip (and p7zip) utilities will support a
"new" .7z format -- which will be simply a compatible variant of the
.xz file format, but with custom filters (codecs) specified in the
(highly extensible) header defined by the .xz standard. This was the
primary reason for the new .xz format to support multi-file archives:
the xz tool has no present need for them, and doesn't even support them
(although the liblzma library does).

Single File Compression
========================================================================
Although the xz file format supports multiple streams, the xz tool
itself is concerned only with single files that have been compressed as
a single complete stream using LZMA compression. This is similar to the
behavior of the older lzma tool and its LZMA-Alone file format, or the
archetypal gzip and bzip2 compression programs.

Just as with bzip2 and gzip (and the old lzma tool), to create
multi-file archives you should use tar and THEN compress with xz.exe.

For an integrated compressed archive file format that uses LZMA
compression, see p7zip and the 7zip programs, and their associated .7z
file format.

--
Chuck