Re: [coreutils] draft [PATCH] sort: explicit --parallel=N now overrides environment

2010-12-11 Thread Pádraig Brady
On 11/12/10 09:05, Paul Eggert wrote:
> It seems to me that this code in sort.c:
> 
>   unsigned long int np2 = num_processors (NPROC_CURRENT_OVERRIDABLE);
>   if (!nthreads || nthreads > np2)
>     nthreads = np2;
> 
> is now obsolete.  It was written assuming spin locks, but now that
> we use mutexes, shouldn't we respect an explicit --parallel=N
> flag?  Something like the following, say?   This would let the user
> override the environment in the command line, which is normally what
> people would expect.

Yes I think you're right.
Related to this is the default number chosen,
which might be best to restrict to 8 or so
as there are diminishing returns after that.
Of course we'd need to benchmark again with
all the recent changes to find an appropriate default.
The gcc compile farm has a Niagara 32 core (gcc12) and
a Magny-Cours 24 core (gcc10) available.
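
For illustration only, here is a minimal sketch of the policy being discussed:
an explicit --parallel=N is taken as given, and only the default is capped.
The helper name choose_nthreads and the constant DEFAULT_MAX_THREADS are
hypothetical, not names from sort.c, and the cap of 8 is just the figure
floated above, pending benchmarks.

```c
#include <stddef.h>

/* Hypothetical cap on the default thread count.  The value 8 is only
   the figure suggested in this thread, not a benchmarked choice.  */
enum { DEFAULT_MAX_THREADS = 8 };

/* Hypothetical helper, not the actual sort.c code.
   REQUESTED is the value given with --parallel=N, or 0 if the option
   was not used.  AVAILABLE is the processor count, e.g. as reported by
   num_processors (NPROC_CURRENT_OVERRIDABLE).  */
static size_t
choose_nthreads (size_t requested, unsigned long int available)
{
  /* An explicit request wins, even if it exceeds AVAILABLE.  */
  if (requested)
    return requested;

  /* Otherwise default to the processor count, capped.  */
  size_t n = (available < DEFAULT_MAX_THREADS
              ? available : DEFAULT_MAX_THREADS);

  /* Never return 0 threads.  */
  return n ? n : 1;
}
```

With this sketch, a 32-core box defaults to 8 threads, while --parallel=16
on a 4-core box still yields 16.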

Also I notice Chen has inadvertently been omitted
from THANKS, and that his UCLA email address should
probably be added to .mailmap

cheers,
Pádraig.



Re: [coreutils] draft [PATCH] sort: explicit --parallel=N now overrides environment

2010-12-12 Thread Jim Meyering
Pádraig Brady wrote:
> On 11/12/10 09:05, Paul Eggert wrote:
>> It seems to me that this code in sort.c:
>>
>>   unsigned long int np2 = num_processors (NPROC_CURRENT_OVERRIDABLE);
>>   if (!nthreads || nthreads > np2)
>>     nthreads = np2;
>>
>> is now obsolete.  It was written assuming spin locks, but now that
>> we use mutexes, shouldn't we respect an explicit --parallel=N
>> flag?  Something like the following, say?   This would let the user
>> override the environment in the command line, which is normally what
>> people would expect.
>
> Yes I think you're right.
> Related to this is the default number chosen,
> which might be best to restrict to 8 or so
> as there are diminishing returns after that.

8 sounds like a reasonable threshold.
I ran tests on a 6-core i7-970 (which shows up as 12 cores due to hyper-threading):

gensort -a 1000 in-10M
env time --format=%e sort in-10M > /dev/null
N-parallel elapsed_seconds (avg of 10 trials)
12   6.91
 8   7.13

gensort -a 2000 in-20M
env time --format=%e sort in-20M > /dev/null
N-parallel elapsed_seconds (avg of 10 trials)
12  13.94
 8  14.34

It is indeed a case of diminishing returns, at least in these cases.
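
To put a number on that, the relative gain can be computed from the timings
above.  This is only a sketch of the arithmetic; the helper name
relative_gain is made up for the example.

```c
/* Fractional reduction in elapsed time when going from SLOW seconds
   to FAST seconds.  Hypothetical helper for illustration.  */
static double
relative_gain (double slow, double fast)
{
  return (slow - fast) / slow;
}
```

relative_gain (7.13, 6.91) is about 0.031 and relative_gain (14.34, 13.94)
is about 0.028, so going from 8 to 12 threads (50% more) buys roughly 3%
in elapsed time on these inputs.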

> Of course we'd need to benchmark again with
> all the recent changes to find an appropriate default.
> The gcc compile farm has a Niagara 32 core (gcc12) and
> a Magny-Cours 24 core (gcc10) available.
>
> Also I notice Chen has inadvertently been omitted from THANKS,

I have not been adding commit "author" names to THANKS
partly because eventually I want to generate that file from the
combination of the git commit log and a VC'd list of other
name/email pairs.

> and that his UCLA email address should
> probably be added to .mailmap

Good eye.