Re: [PATCH] sort: Add --threads option, which parallelizes internal sort.

Ralf Wildenhues Fri, 03 Apr 2009 23:38:16 -0700

Hello Paul, Glen, Jim,

[ I already wrote this in part to Glen off-list, sorry for the
  duplication ]


* Paul Eggert wrote on Fri, Apr 03, 2009 at 09:57:54PM CEST:
> Of course we cannot reasonably expect this one performance improvement
> to make 'sort' run 16x faster on a 16-CPU machine.

No.  Let's discuss the 1 vs. 2 threads issue for now: the part running
concurrently takes O(N log N) work.  The parts not running concurrently
are: startup code, file reading, list splitting, list merging, and
output.  All of the latter are O(N) or less, with N being the input size.
The concurrent parts do not require any communication/synchronization
between the 2 threads.

This means that a parallelization which doesn't hit the memory bus
bandwidth limit in the concurrent stage, has to have (slowly) improving
1:2 thread behavior when N grows.  But the experiment does not show
that.  This indicates a possible issue.  To exclude bandwidth issues,
one can try to run this code on a system which has as big as possible
memory busses, and e.g., run the threads on separate processors (as
opposed to only separate cores or HT threads).

I'm not trying to veto this patch, it is certainly an improvement over
not having it, and improving things in steps is good; but I'm sure it
could be better even without parallelizing the merge part of the
algorithm.

I already noted off-list to Glen that you too can get an account on the
GCC compile farm, which has some decent multi-way systems.  But for the
above issue a bigger system isn't necessary.

Cheers,
Ralf


_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils

Re: [PATCH] sort: Add --threads option, which parallelizes internal sort.

Reply via email to