Hi Professor Eggert,

On Fri, Dec 3, 2010 at 1:10 PM, Paul Eggert <egg...@cs.ucla.edu> wrote:
> On 12/03/10 12:18, Chen Guo wrote:
> Either option (either switch to mutexes everywhere, or have the top-level
> merge go to memory) should work. Perhaps we should try both and benchmark
> them.
The test machine is a 4-core i7. The numbers below are averaged over 20 runs, given in seconds, in the form elapsed / user+system.

spinlock:
  1 thread:  3.354 / 3.349
  2 threads: 1.960 / 3.812
  4 threads: 1.366 / 5.085

mutex:
  1 thread:  3.354 / 3.350
  2 threads: 2.062 / 3.628
  4 threads: 1.497 / 4.172

spinlock, output after sort:
  1 thread:  3.519 / 3.517
  2 threads: 2.098 / 3.996
  4 threads: 1.488 / 5.347

It seems that if we have to choose between mutexes and outputting post-sort, mutexes are the way to go. Mutexes are faster in the single-threaded case, and while in the multithreaded cases the elapsed times are negligibly different, the user time with spinlocks is much greater. With spinlocks alone, the greater system time was justified (though some might disagree) by the lower elapsed time. With spinlocks outputting post-sort, there is no longer any justification for the higher user time.

Before saying anything else, I should note that with mutexes, on 4 threads, about 20% of the time there's a segfault on a seemingly innocuous line in queue_insert ():

    node->queued = true;

GDB shows that the pointers all look normal, and I could not trigger the crash over 10 runs under valgrind (valgrind appears to serialize the threads, which would hide the race). If we do decide to go back to mutexes, I'll look into this issue more.