Paul Eggert <[EMAIL PROTECTED]> wrote:
> Jim Meyering <[EMAIL PROTECTED]> writes:
>> So, with just one trial each, I see a 19% speed-up.
>
> Yaayyy!  That's good news.  Thanks for timing it.  I read your email
> just after talking with Dan (in person) about how we'd time it.  I
> just bought 1 TB worth of disk for my home computer and hadn't hooked
> it up yet, so was going to volunteer that, but you beat me to it.

I've done some more timings, this time with two more input sizes.
Here's the summary, comparing straight sort with sort --comp=gzip:

  2.7GB:   6.6% speed-up
  10.0GB: 17.8% speed-up

For the smaller input, I also did as James Youngman suggested and
used "cat" as the no-op compressor/decompressor.  That made sort run
34% longer than with gzip (about 25% longer than with no compression).

====================
Here's the smaller input:

  $ seq 9999999 > k
  $ cat k k k k k k k k k > j
  $ cat j j j j > sort-in
  $ wc -c sort-in
  2839999968 sort-in

With --compress=gzip:

  $ /usr/bin/time ./sort -T. --compress=gzip < sort-in > out
  814.07user 29.97system 14:50.16elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (4major+2821589minor)pagefaults 0swaps

With no --compress= option:

  $ /usr/bin/time ./sort -T. < sort-in > out
  398.98user 17.08system 15:53.49elapsed 43%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (2major+229797minor)pagefaults 0swaps

With --compress=$PWD/cat-wrap:

  [where the cat-wrap script accepts and ignores the -d option:

    printf '#!/bin/sh\ntest $# != 0 && test x$1 = x-d && shift; exec cat "$@"' \
      > cat-wrap
    chmod a+x cat-wrap

   BTW, this example already demonstrates how it'd be nice to be able
   to specify a decompressor separately, for when the decompressor
   isn't "compressor -d".
  ]

  $ /usr/bin/time ./sort -T. --compress=$PWD/cat-wrap < sort-in > out
  439.67user 54.02system 19:50.86elapsed 41%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (1major+2817586minor)pagefaults 0swaps

=================================
Using a 10GB data set (exactly 10737418240 bytes), formed by
concatenating four copies of the above and then truncating to
the desired length, ...

  $ /usr/bin/time ./sort -T. --compress=gzip < sort-in > out; Rm out
  3330.45user 139.57system 1:00:10elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (5major+10679797minor)pagefaults 0swaps

  $ /usr/bin/time ./sort -T. < sort-in > out; Rm out
  1643.09user 86.83system 1:13:13elapsed 39%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (2major+233951minor)pagefaults 0swaps

The result: an 18% speed-up.
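
For anyone who wants to recheck the percentages, here is a minimal sketch that derives them from the "elapsed" fields reported by /usr/bin/time above. It was not part of the original measurements, and the helper names to_seconds and speedup are made up for this illustration. It assumes the elapsed field has the form [h:]mm:ss[.ff].

```shell
#!/bin/sh
# Hypothetical helpers (not from the original post) for rechecking the
# speed-up figures from GNU time's "elapsed" field.

# Convert an elapsed field of the form [h:]mm:ss[.ff] to seconds.
to_seconds() {
  echo "$1" | awk -F: '{
    s = 0
    for (i = 1; i <= NF; i++) s = s * 60 + $i
    printf "%.2f\n", s
  }'
}

# Percentage by which run $2 is faster than baseline run $1.
# A negative result means run $2 took longer than the baseline.
speedup() {
  awk -v base="$(to_seconds "$1")" -v new="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

speedup 15:53.49 14:50.16   # 2.7GB, no compression vs gzip: prints 6.6
speedup 1:13:13 1:00:10     # 10GB, no compression vs gzip: prints 17.8
speedup 14:50.16 19:50.86   # 2.7GB, gzip vs cat-wrap: prints -33.8
```

The last line takes the gzip run as the baseline, which is the comparison under which the cat-wrap run comes out about 34% longer.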