On 07/28/2011 01:16 PM, Jim Meyering wrote: > For some reason, sort is now very slow here (Fedora 15, x86_64, with > little free RAM), even for tiny inputs, but only some of the time, > and not when I set OMP_NUM_THREADS=1. > (I noticed because ./bootstrap was taking *forever*, traced it, > which showed gnulib-tool taking all the time, then traced gnulib-tool, > which showed slow sort -- it runs thousands of small ones.) > > time sh -c "printf 'a\nb\n'|LC_ALL=C sort | uniq -d" > sh -c "printf 'a\nb\n'|LC_ALL=C sort | uniq -d" 0.00s user 3.18s system 98% > cpu 3.218 total > $ time sh -c "printf 'a\nb\n'|LC_ALL=C sort | uniq -d" > sh -c "printf 'a\nb\n'|LC_ALL=C sort | uniq -d" 0.01s user 39.90s system 99% > cpu 40.077 total > $ time sh -c "printf 'a\nb\n'|LC_ALL=C sort | uniq -d" > sh -c "printf 'a\nb\n'|LC_ALL=C sort | uniq -d" 0.00s user 5.65s system 99% > cpu 5.679 total > $ time sh -c "printf 'a\nb\n'|LC_ALL=C sort | uniq -d" > sh -c "printf 'a\nb\n'|LC_ALL=C sort | uniq -d" 0.00s user 34.16s system 13% > cpu 4:12.40 total > > Then I can run the very same command 40 times in a row, with it > terminating in less than .3s total. > > Then once more and it takes 5 seconds.
Testing here (2 core, 4 threads) does show a large jump in time, but only the first time I specified a large number of threads. $ time sh -c "printf 'a\nb\n'|LC_ALL=C sort | uniq -d" real 0m0.031s user 0m0.015s sys 0m0.012s $ time sh -c "printf 'a\nb\n'|LC_ALL=C sort --parallel=100 | uniq -d" real 0m2.095s user 0m0.042s sys 0m2.042s $ time sh -c "printf 'a\nb\n'|LC_ALL=C sort --parallel=100 | uniq -d" real 0m0.207s user 0m0.050s sys 0m0.157s My first thought was memory usage which seems more likely as more memory is used for the multithreaded case. Also when doing any perf testing with sort I now disable MALLOC_PERTURB_ considering what large buffers it allocates, otherwise that would disable any lazy allocation that could be done. I was half considering unsetting MALLOC_PERTURB_ for the large buffer allocation internally in sort. A wild guess on your 5s delay is, maybe that run caused something to be swaped out. Maybe your swap is on a mechanical disk which needed to be spun up? It's unlikely related, but I've had particular issues with kswapd0 spinning on F15, and I'm running a patched kernel to fix the issue. Could you repeat tests with a small -S or try the bootstrap with MALLOC_PERTURB_ unset? cheers, Pádraig.