Using sort (GNU coreutils) 8.21
Linux xxxx 2.6.18-274.12.1.el5 #1 SMP Tue Nov 29 13:37:46 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5.7 (Final)
16 cores / Pentium III Xeon (Cascades) / .18um / X5550 @ 2.67GHz / L1 data cache:
32K, 8-way, 64 byte lines / data TLB: 4K pages, 4-way, 64 entries / L2 unified
cache: 256K, 8-way
Pre-sorting every file in the coreutils-8.21 source distribution and then
running "sort --files0-from" on the sorted files unexpectedly outperforms
"sort --merge --files0-from" by more than 100% (i.e. "sort --merge" is more
than 2x SLOWER than non-merge). The gap narrows but still exists when
parallelism is disabled (--parallel=1). FWIW, ~20% of the lines being sorted
are identical (isolated newlines), and I observed the same relative performance
with the other datasets I tried (I used the coreutils source tree because it is
readily available).
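For anyone who wants to check the "~20% isolated newlines" figure on their own data, here is a minimal sketch. The helper name blank_line_percent is made up for illustration and is not part of coreutils; it just counts empty lines against all lines with awk.

```shell
#!/bin/sh
# Sketch: report what percentage of lines across a set of files are empty.
# blank_line_percent is a hypothetical helper name, not a coreutils tool.
blank_line_percent() {
    # Concatenate the named files and let awk count empty lines vs. all lines.
    cat -- "$@" | awk '
        length($0) == 0 { blank++ }
        END { if (NR > 0) printf "%d\n", (100 * blank) / NR; else print 0 }
    '
}

# Tiny demonstration on a throwaway file: 1 empty line out of 5.
demo=$(mktemp)
printf 'a\n\nb\nc\nd\n' > "$demo"
blank_line_percent "$demo"
rm -f "$demo"
```

On a whole tree, something like `find . -type f -exec cat {} + | awk ...` with the same awk program should give the overall figure.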
SO: is this normal?
Here's what I did:
$ export LC_ALL=C
$ export LANG=C
$ cd /tmp/coreutils-8.21
$ mkdir /tmp/coreutils-8.21-sorted
$ find . -type d | sort | while read d ; do mkdir /tmp/coreutils-8.21-sorted/$d ; done
$ find . -type f | while read f ; do sort $f > /tmp/coreutils-8.21-sorted/$f ; done
$ cd /tmp/coreutils-8.21-sorted
$ find . -type f -print0 >lst
$ free
             total       used       free     shared    buffers     cached
Mem:      24674780    5897556   18777224          0     780368    4337448
-/+ buffers/cache:     779740   23895040
Swap:     26738680          0   26738680
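Before comparing timings it may be worth confirming that both invocations really produce the same output on the pre-sorted files. A minimal sketch, assuming GNU sort (for --files0-from) and cksum are available; compare_sort_modes is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: check that merging pre-sorted files gives the same output as
# re-sorting them from scratch. compare_sort_modes is a hypothetical helper.
LC_ALL=C
export LC_ALL

compare_sort_modes() {
    # $1 is a NUL-separated list of already-sorted files, as produced by
    # "find . -type f -print0".
    list=$1
    merged_sum=$(sort -m --files0-from="$list" | cksum)
    sorted_sum=$(sort --files0-from="$list" | cksum)
    if [ "$merged_sum" = "$sorted_sum" ]; then
        echo same
    else
        echo different
    fi
}

# Tiny demonstration with two pre-sorted throwaway files.
work=$(mktemp -d)
printf 'a\nc\ne\n' > "$work/f1"
printf 'b\nd\nf\n' > "$work/f2"
printf '%s\0%s\0' "$work/f1" "$work/f2" > "$work/lst"
compare_sort_modes "$work/lst"
rm -rf "$work"
```

Comparing checksums rather than the full output keeps memory use small on large trees.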
Now,
$ time sort -m --files0-from lst >/dev/null
real 0m2.410s
user 0m1.466s
sys 0m0.943s
$ time sort --files0-from lst >/dev/null
real 0m0.813s
user 0m1.891s
sys 0m0.274s
Disabling parallelism,
$ time sort --parallel=1 -m --files0-from lst >/dev/null
real 0m2.545s
user 0m1.561s
sys 0m0.984s
$ time sort --parallel=1 --files0-from lst >/dev/null
real 0m1.748s
user 0m1.600s
sys 0m0.148s
And using -s:
$ time sort --parallel=1 -m -s --files0-from lst >/dev/null
real 0m2.502s
user 0m1.577s
sys 0m0.925s
$ time sort --parallel=1 -s --files0-from lst >/dev/null
real 0m1.826s
user 0m1.620s
sys 0m0.206s
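The figures above are single runs. To rule out cache warm-up or other noise, each command could be repeated and the fastest run kept. A rough sketch, assuming GNU date (for the %N nanoseconds format); best_of is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: run a command several times and print the fastest wall-clock time
# in milliseconds. best_of is a hypothetical helper; it relies on GNU date's
# %N (nanoseconds) format.
best_of() {
    runs=$1
    shift
    best=
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        "$@" > /dev/null
        end=$(date +%s%N)
        # Convert the nanosecond difference to whole milliseconds.
        elapsed=$(( ($end - $start) / 1000000 ))
        if [ -z "$best" ] || [ "$elapsed" -lt "$best" ]; then
            best=$elapsed
        fi
        i=$((i + 1))
    done
    echo "$best"
}

# Demonstration on a trivial command.
best_of 3 true
```

For the comparison in this post that would be, for example, `best_of 5 sort -m --files0-from lst` against `best_of 5 sort --files0-from lst`. This only measures wall-clock time, so it complements rather than replaces the user/sys breakdown from `time`.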
Does anyone have any idea what explains these non-intuitive results? Why does
"sort -m" appear so slow even with --parallel=1?
Should this go to [email protected]?
Thank you!
Vlad.