Jim Meyering <[EMAIL PROTECTED]> writes:

> I'm probably going to change the documentation so that
> people will be less likely to depend on being able to run
> a separate program.  To be precise, I'd like to document
> that the only valid values of GNUSORT_COMPRESSOR are the
> empty string, "gzip" and "bzip2"[*].
This sounds extreme, particularly since gzip and bzip2 are not the best algorithms for 'sort' compression, where you want a fast compressor. Better choices right now would include lzop <http://www.lzop.org/> and maybe QuickLZ <http://www.quicklz.com/>.

The fast-compressor field is moving fairly rapidly. (I've heard some rumors from some of my commercial friends.) QuickLZ, a new algorithm, is at the top of the maximumcompression list right now for fast compressors; see <http://www.maximumcompression.com/data/summary_mf3.php>. I would not be surprised to see a new champ next year.

> Then we will have the liberty to remove the exec calls and use library
> code instead, thus making the code a little more efficient -- but mainly,
> more robust.

It's not clear to me that it'll be more efficient for the soon-to-be common case of multicore chips, since 'sort' and the compressor can run in parallel. We'll have to measure.

I agree about the robustness, but that should be up to the user. Perhaps we could put in something that says, "If the compressor is named 'gzip', we may optimize that," and similarly for 'lzop' and/or a few other compressor names. Or, more generally, we could adopt the convention that if the compressor name starts with "-", we will strip the "-" and then try to optimize the result if we can. Something like that, anyway.

> [*] If gzip and bzip2 are good enough for tar, why should sort make any
> compromise (exec'ing some other program) in order to be more flexible?

For 'sort' the tradeoff is different than for 'tar'. We don't particularly care whether the format is stable, since it's throwaway. And we want fast compression, whereas people generating tarballs are often willing to accept much slower compression for a slightly higher compression ratio. (Plus, new versions of 'tar' allow arbitrary compressors anyway.)

I do have a suggestion: we shouldn't use an environment variable to select a compressor. It should just be an option.
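To make the last two ideas concrete, here is a rough sketch in C of how sort could take the compressor from a command-line option instead of an environment variable, and how it could apply the leading-"-" convention: a plain name is always exec'd exactly as given, while a "-"-prefixed name may be replaced by in-process library code when sort happens to have a built-in version. Everything here is hypothetical: the option name --compressor, the functions parse_compressor_option and choose_compressor, and the assumption that only gzip has a built-in version are illustrations, not existing coreutils code.

```c
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of what sort should do with a compressor name. */
enum compressor_kind
{
  COMPRESS_NONE,          /* no compression of temporary files */
  COMPRESS_BUILTIN_GZIP,  /* assumed built-in, in-process gzip (e.g. via zlib; not shown) */
  COMPRESS_EXEC           /* fork and exec the named program, as today */
};

struct compressor_choice
{
  enum compressor_kind kind;
  const char *program;    /* program to exec; only meaningful for COMPRESS_EXEC */
};

/* Apply the proposed convention: a leading "-" gives sort permission to
   substitute library code; without it, run exactly the program named.  */
static struct compressor_choice
choose_compressor (const char *name)
{
  struct compressor_choice c = { COMPRESS_NONE, NULL };

  if (name == NULL || *name == '\0')
    return c;                          /* empty: no compression */

  if (name[0] == '-')
    {
      const char *prog = name + 1;     /* strip the leading "-" */
      if (*prog == '\0')
        return c;                      /* a bare "-": treat as no compression */
      if (strcmp (prog, "gzip") == 0)
        {
          c.kind = COMPRESS_BUILTIN_GZIP;  /* we "can optimize" this one */
          return c;
        }
      c.kind = COMPRESS_EXEC;          /* no built-in version: fall back to exec */
      c.program = prog;
      return c;
    }

  c.kind = COMPRESS_EXEC;              /* no "-": always exec what the user named */
  c.program = name;
  return c;
}

/* Read the compressor from a hypothetical --compressor=PROG option
   rather than from an environment variable.  Returns NULL if absent.  */
static const char *
parse_compressor_option (int argc, char **argv)
{
  static struct option const long_options[] =
  {
    { "compressor", required_argument, NULL, 'C' },
    { NULL, 0, NULL, 0 }
  };
  const char *compressor = NULL;
  int c;

  optind = 1;                          /* start scanning at argv[1] */
  while ((c = getopt_long (argc, argv, "", long_options, NULL)) != -1)
    if (c == 'C')
      compressor = optarg;             /* last occurrence wins */
  return compressor;
}
```

With this scheme, "sort --compressor=-gzip" would let sort compress in-process, while "sort --compressor=gzip" or "sort --compressor=-lzop" would still exec the named program. The user, not sort, decides how much flexibility to trade for robustness.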
Environment variables are funny beasts and it's better to avoid them if we can. I'll construct a patch along those lines if you like.

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils