Richard Guenther wrote:
> On Nov 27, 2007 2:23 PM, Howard Chu <[EMAIL PROTECTED]> wrote:
>> A bit of a minor mystery. Not a problem, just a curiosity. If someone knew off
>> the top of their head a reason for it, that'd be cool, but otherwise no sweat.
> 
> I'd try -Os; you might be running into ICache limitations.
Try -Os with and without setting -mpreferred-stack-boundary=4 (or
whatever value you currently have).  Watch memory usage, cache
evictions, etc. while running.
> 
> Richard.
> 
>> -------- Original Message --------
>> Subject: Re: commit: ldap/servers/slapd connection.c daemon.c proto-slap.h
>> syncrepl.c
>> Date: Tue, 27 Nov 2007 05:17:04 -0800
>> From: Howard Chu <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> References: <[EMAIL PROTECTED]>
>> <[EMAIL PROTECTED]>    <[EMAIL PROTECTED]>
>> <[EMAIL PROTECTED]>
>>
>> Howard Chu wrote:
>>> Howard Chu wrote:
>>>> Howard Chu wrote:
>>>>> For reference, the peak throughput with back-null on the previous code was
>>>>> only 7,800 auths/sec (with 8 client threads). With this patch it's 11,140
>>>>> auths/sec.
>> Those numbers are for Windows Server 2003 x86_64 on a Celestica A8440 with 4
>> Opteron 875s, using OpenLDAP compiled with gcc 4.3.0. The following numbers
>> are for Linux 2.6.23.1 x86_64, on the same machine, compiled first with gcc
>> 4.1.2 and then later with gcc 4.2.2. There's no disk I/O in these tests.
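For scale, the Windows back-null improvement quoted above (7,800 to 11,140 auths/sec) works out to roughly a 43% gain. A minimal C sketch of that relative-change arithmetic (the helper name `percent_change` is hypothetical, not part of OpenLDAP):

```c
#include <assert.h>

/* Hypothetical helper: percentage change from a baseline throughput to a
 * new throughput. A positive result means the new value is higher. */
double percent_change(double baseline, double updated)
{
    assert(baseline > 0.0); /* a zero or negative baseline is meaningless here */
    return (updated - baseline) / baseline * 100.0;
}
```

Calling `percent_change(7800.0, 11140.0)` yields about 42.82.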
>>
>>>>> In both cases the throughput declines as more client threads are
>>>>> used. (Compare to 35,553 auths/sec for the same machine running Linux, and no
>>>>> drop in throughput all the way up to hundreds or thousands of connections.)
>>> Re-running on Linux with a non-optimized build, it peaked at 40,101 auths/sec.
>>> (I guess HEAD has sped up a bit more in the past week or so...)
>> OK, this is odd. The code compiled without optimization peaks at 40K auths/sec
>> at around 124-132 client threads. The code compiled with -O2 peaks at 37K
>> auths/sec at around 128 client threads.
>>
>> The -O2 build is faster from about 4 to 24 client threads. From 28 on up, the
>> non-optimized code is faster at every load level. I was originally using gcc
>> 4.1.2, but I'm seeing the same result now with gcc 4.2.2. Also, slapd is
>> configured with only 8 worker threads in all of these tests. It's strange that
>> whatever optimizations the compiler generated speed things up under lighter
>> load but work against it under heavier load.
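The comparison above amounts to finding the load level at which one build's throughput curve overtakes the other's. Assuming throughput samples for both builds are available at the same ascending client-thread counts, a minimal C sketch of that search (the function `first_crossover` and its inputs are hypothetical, not OpenLDAP code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given throughput samples for two builds measured at
 * the same ascending client-thread counts, return the first thread count at
 * which build_b is faster than build_a after build_a had been ahead at some
 * earlier sample. Returns -1 if that never happens. */
int first_crossover(const int *threads, const double *build_a,
                    const double *build_b, size_t n)
{
    int a_was_ahead = 0;

    assert(threads != NULL && build_a != NULL && build_b != NULL);
    for (size_t i = 0; i < n; i++) {
        if (build_a[i] > build_b[i]) {
            a_was_ahead = 1;   /* e.g. the -O2 build leading at light load */
        } else if (a_was_ahead && build_b[i] > build_a[i]) {
            return threads[i]; /* first load level where build_b takes over */
        }
    }
    return -1;
}
```

With the -O2 samples passed as `build_a` and the non-optimized samples as `build_b`, the runs described here would put the crossover at 28 client threads.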
>> --
>>    -- Howard Chu
>>    Chief Architect, Symas Corp.  http://www.symas.com
>>    Director, Highland Sun        http://highlandsun.com/hyc/
>>    Chief Architect, OpenLDAP     http://www.openldap.org/project/
>>
