Hi Professor Eggert,

On Fri, Dec 3, 2010 at 1:10 PM, Paul Eggert <egg...@cs.ucla.edu> wrote:
> On 12/03/10 12:18, Chen Guo wrote:
> Either option (either switch to mutexes everywhere, or have the top-level
> merge go to memory) should work. Perhaps we should try both and benchmark
> them.
The test machine is a 4-core i7. The numbers below are averaged over 20 runs, given in seconds, in the form elapsed / user+system.

spinlock:
  1 thread:  3.354 / 3.349
  2 threads: 1.960 / 3.812
  4 threads: 1.366 / 5.085

mutex:
  1 thread:  3.354 / 3.350
  2 threads: 2.062 / 3.628
  4 threads: 1.497 / 4.172

spinlock, output after sort:
  1 thread:  3.519 / 3.517
  2 threads: 2.098 / 3.996
  4 threads: 1.488 / 5.347

It seems that if we have to choose between mutexes and outputting post-sort, mutexes are the way to go. Mutexes are faster in the single-threaded case, and while in the multithreaded cases the elapsed times are negligibly different, the user time with spinlocks is much greater. With spinlocks alone, the greater system time was justified (though some might disagree) by the lower elapsed time. With spinlocks outputting post-sort, there is no longer any justification for the higher user time.

Before saying anything else, I should note that with mutexes, on 4 threads, about 20% of the time there's a segfault on a seemingly innocuous line in queue_insert ():

    node->queued = true;

GDB shows that the pointers all look normal, and I could not trigger the crash over 10 runs under valgrind (valgrind appears to serialize the threads, which would hide the race). If we do decide to go back to mutexes, I'll look into this issue more.