Re: sched_yield() makes OpenLDAP slow

2005-08-23 Thread Lincoln Dale

Denis Vlasenko wrote:


This is what I would expect if run on an otherwise idle machine.
sched_yield just puts you at the back of the line for runnable
processes, it doesn't magically cause you to go to sleep somehow.
 


When a kernel build is occurring??? Plus `top` itself It damn
well sleep while giving up the CPU. If it doesn't it's broken.
   

unless you have all of the kernel source in the buffer cache, a 
concurrent kernel build will spend a fair bit of time in io_wait state ..
as such its perfectly plausible that sched_yield keeps popping back to 
the top of 'runnable' processes . . .



cheers,

lincoln.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-23 Thread Denis Vlasenko
On Tuesday 23 August 2005 14:17, linux-os \(Dick Johnson\) wrote:
> 
> On Mon, 22 Aug 2005, Robert Hancock wrote:
> 
> > linux-os (Dick Johnson) wrote:
> >> I reported thet sched_yield() wasn't working (at least as expected)
> >> back in March of 2004.
> >>
> >>for(;;)
> >>  sched_yield();
> >>
> >> ... takes 100% CPU time as reported by `top`. It should take
> >> practically 0. Somebody said that this was because `top` was
> >> broken, others said that it was because I didn't know how to
> >> code. Nevertheless, the problem was not fixed, even after
> >> schedular changes were made for the current version.
> >
> > This is what I would expect if run on an otherwise idle machine.
> > sched_yield just puts you at the back of the line for runnable
> > processes, it doesn't magically cause you to go to sleep somehow.
> >
> 
> When a kernel build is occurring??? Plus `top` itself It damn
> well sleep while giving up the CPU. If it doesn't it's broken.

top doesn't run all the time:

# strace -o top.strace -tt top

14:52:19.407958 write(1, "  758 root  16   0   104   2"..., 79) = 79
14:52:19.408318 write(1, "  759 root  16   0   100   1"..., 79) = 79
14:52:19.408659 write(1, "  760 root  16   0   100   1"..., 79) = 79
14:52:19.409001 write(1, "  761 root  18   0  2604  39"..., 74) = 74
14:52:19.409342 write(1, "  763 daemon17   0   108   1"..., 78) = 78
14:52:19.409672 write(1, "  773 root  16   0   104   2"..., 79) = 79
14:52:19.410010 write(1, "  774 root  16   0   104   2"..., 79) = 79
14:52:19.410362 write(1, "  775 root  16   0   100   1"..., 79) = 79
14:52:19.410692 write(1, "  776 root  16   0   104   2"..., 79) = 79
14:52:19.411136 write(1, "  777 daemon17   0   108   1"..., 86) = 86
14:52:19.411505 select(1, [0], NULL, NULL, {5, 0}) = 0 (Timeout)
hrrr. ps...
14:52:24.411744 time([1124797944])  = 1124797944
14:52:24.411883 lseek(4, 0, SEEK_SET)   = 0
14:52:24.411957 read(4, "24822.01 18801.28\n", 1023) = 18
14:52:24.412082 access("/var/run/utmpx", F_OK) = -1 ENOENT (No such file or 
directory)
14:52:24.412224 open("/var/run/utmp", O_RDWR) = 8
14:52:24.412328 fcntl64(8, F_GETFD) = 0
14:52:24.412399 fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
14:52:24.412467 _llseek(8, 0, [0], SEEK_SET) = 0
14:52:24.412556 alarm(0)= 0
14:52:24.412643 rt_sigaction(SIGALRM, {0x4015a57c, [], SA_RESTORER, 
0x40094ae8}, {SIG_DFL}, 8) = 0
14:52:24.412747 alarm(1)= 0

However, kernel compile shouldn't.

I suggest stracing with -tt "for(;;) yield();" test proggy with and without
kernel compile in parallel, and comparing the output...

Hmm... actually, knowing that you will argue to death instead...

# cat t.c
#include 

int main() {
for(;;) sched_yield();
return 0;
}
# gcc t.c
# strace -tt ./a.out
...
15:03:41.211324 sched_yield()   = 0
15:03:41.211673 sched_yield()   = 0
15:03:41.212034 sched_yield()   = 0
15:03:41.212400 sched_yield()   = 0
15:03:41.212749 sched_yield()   = 0
15:03:41.213126 sched_yield()   = 0
15:03:41.213486 sched_yield()   = 0
15:03:41.213835 sched_yield()   = 0
15:03:41.214220 sched_yield()   = 0
15:03:41.214577 sched_yield()   = 0
15:03:41.214939 sched_yield()   = 0
I start "while true; do true; done" on another console...
15:03:43.314645 sched_yield()   = 0
15:03:43.847644 sched_yield()   = 0
15:03:43.954635 sched_yield()   = 0
15:03:44.063798 sched_yield()   = 0
15:03:44.171596 sched_yield()   = 0
15:03:44.282624 sched_yield()   = 0
15:03:44.391632 sched_yield()   = 0
15:03:44.498609 sched_yield()   = 0
15:03:44.605584 sched_yield()   = 0
15:03:44.712538 sched_yield()   = 0
15:03:44.819557 sched_yield()   = 0
15:03:44.928594 sched_yield()   = 0
15:03:45.040603 sched_yield()   = 0
15:03:45.148545 sched_yield()   = 0
15:03:45.259311 sched_yield()   = 0
15:03:45.368563 sched_yield()   = 0
15:03:45.476482 sched_yield()   = 0
15:03:45.583568 sched_yield()   = 0
15:03:45.690491 sched_yield()   = 0
15:03:45.797512 sched_yield()   = 0
15:03:45.906534 sched_yield()   = 0
15:03:46.013545 sched_yield()   = 0
15:03:46.120505 sched_yield()   = 0
Ctrl-C

# uname -a
Linux firebird 2.6.12-r4 #1 SMP Sun Jul 17 13:51:47 EEST 2005 i686 unknown 
unknown GNU/Linux
--
vda

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-23 Thread linux-os \(Dick Johnson\)

On Mon, 22 Aug 2005, Robert Hancock wrote:

> linux-os (Dick Johnson) wrote:
>> I reported thet sched_yield() wasn't working (at least as expected)
>> back in March of 2004.
>>
>>  for(;;)
>>  sched_yield();
>>
>> ... takes 100% CPU time as reported by `top`. It should take
>> practically 0. Somebody said that this was because `top` was
>> broken, others said that it was because I didn't know how to
>> code. Nevertheless, the problem was not fixed, even after
>> schedular changes were made for the current version.
>
> This is what I would expect if run on an otherwise idle machine.
> sched_yield just puts you at the back of the line for runnable
> processes, it doesn't magically cause you to go to sleep somehow.
>

When a kernel build is occurring??? Plus `top` itself It damn
well sleep while giving up the CPU. If it doesn't it's broken.

> --

> Robert Hancock  Saskatoon, SK, Canada
> To email, remove "nospam" from [EMAIL PROTECTED]
> Home Page: http://www.roberthancock.com/
>


Cheers,
Dick Johnson
Penguin : Linux version 2.6.12.5 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
.
I apologize for the following. I tried to kill it with the above dot :


The information transmitted in this message is confidential and may be 
privileged.  Any review, retransmission, dissemination, or other use of this 
information by persons or entities other than the intended recipient is 
prohibited.  If you are not the intended recipient, please notify Analogic 
Corporation immediately - by replying to this message or by sending an email to 
[EMAIL PROTECTED] - and destroy all copies of this information, including any 
attachments, without reading or disclosing them.

Thank you.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-23 Thread linux-os \(Dick Johnson\)

On Mon, 22 Aug 2005, Robert Hancock wrote:

 linux-os (Dick Johnson) wrote:
 I reported thet sched_yield() wasn't working (at least as expected)
 back in March of 2004.

  for(;;)
  sched_yield();

 ... takes 100% CPU time as reported by `top`. It should take
 practically 0. Somebody said that this was because `top` was
 broken, others said that it was because I didn't know how to
 code. Nevertheless, the problem was not fixed, even after
 schedular changes were made for the current version.

 This is what I would expect if run on an otherwise idle machine.
 sched_yield just puts you at the back of the line for runnable
 processes, it doesn't magically cause you to go to sleep somehow.


When a kernel build is occurring??? Plus `top` itself It damn
well sleep while giving up the CPU. If it doesn't it's broken.

 --

 Robert Hancock  Saskatoon, SK, Canada
 To email, remove nospam from [EMAIL PROTECTED]
 Home Page: http://www.roberthancock.com/



Cheers,
Dick Johnson
Penguin : Linux version 2.6.12.5 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
.
I apologize for the following. I tried to kill it with the above dot :


The information transmitted in this message is confidential and may be 
privileged.  Any review, retransmission, dissemination, or other use of this 
information by persons or entities other than the intended recipient is 
prohibited.  If you are not the intended recipient, please notify Analogic 
Corporation immediately - by replying to this message or by sending an email to 
[EMAIL PROTECTED] - and destroy all copies of this information, including any 
attachments, without reading or disclosing them.

Thank you.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-23 Thread Denis Vlasenko
On Tuesday 23 August 2005 14:17, linux-os \(Dick Johnson\) wrote:
 
 On Mon, 22 Aug 2005, Robert Hancock wrote:
 
  linux-os (Dick Johnson) wrote:
  I reported thet sched_yield() wasn't working (at least as expected)
  back in March of 2004.
 
 for(;;)
   sched_yield();
 
  ... takes 100% CPU time as reported by `top`. It should take
  practically 0. Somebody said that this was because `top` was
  broken, others said that it was because I didn't know how to
  code. Nevertheless, the problem was not fixed, even after
  schedular changes were made for the current version.
 
  This is what I would expect if run on an otherwise idle machine.
  sched_yield just puts you at the back of the line for runnable
  processes, it doesn't magically cause you to go to sleep somehow.
 
 
 When a kernel build is occurring??? Plus `top` itself It damn
 well sleep while giving up the CPU. If it doesn't it's broken.

top doesn't run all the time:

# strace -o top.strace -tt top

14:52:19.407958 write(1,   758 root  16   0   104   2..., 79) = 79
14:52:19.408318 write(1,   759 root  16   0   100   1..., 79) = 79
14:52:19.408659 write(1,   760 root  16   0   100   1..., 79) = 79
14:52:19.409001 write(1,   761 root  18   0  2604  39..., 74) = 74
14:52:19.409342 write(1,   763 daemon17   0   108   1..., 78) = 78
14:52:19.409672 write(1,   773 root  16   0   104   2..., 79) = 79
14:52:19.410010 write(1,   774 root  16   0   104   2..., 79) = 79
14:52:19.410362 write(1,   775 root  16   0   100   1..., 79) = 79
14:52:19.410692 write(1,   776 root  16   0   104   2..., 79) = 79
14:52:19.411136 write(1,   777 daemon17   0   108   1..., 86) = 86
14:52:19.411505 select(1, [0], NULL, NULL, {5, 0}) = 0 (Timeout)
hrrr. ps...
14:52:24.411744 time([1124797944])  = 1124797944
14:52:24.411883 lseek(4, 0, SEEK_SET)   = 0
14:52:24.411957 read(4, 24822.01 18801.28\n, 1023) = 18
14:52:24.412082 access(/var/run/utmpx, F_OK) = -1 ENOENT (No such file or 
directory)
14:52:24.412224 open(/var/run/utmp, O_RDWR) = 8
14:52:24.412328 fcntl64(8, F_GETFD) = 0
14:52:24.412399 fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
14:52:24.412467 _llseek(8, 0, [0], SEEK_SET) = 0
14:52:24.412556 alarm(0)= 0
14:52:24.412643 rt_sigaction(SIGALRM, {0x4015a57c, [], SA_RESTORER, 
0x40094ae8}, {SIG_DFL}, 8) = 0
14:52:24.412747 alarm(1)= 0

However, kernel compile shouldn't.

I suggest stracing with -tt for(;;) yield(); test proggy with and without
kernel compile in parallel, and comparing the output...

Hmm... actually, knowing that you will argue to death instead...

# cat t.c
#include sched.h

int main() {
for(;;) sched_yield();
return 0;
}
# gcc t.c
# strace -tt ./a.out
...
15:03:41.211324 sched_yield()   = 0
15:03:41.211673 sched_yield()   = 0
15:03:41.212034 sched_yield()   = 0
15:03:41.212400 sched_yield()   = 0
15:03:41.212749 sched_yield()   = 0
15:03:41.213126 sched_yield()   = 0
15:03:41.213486 sched_yield()   = 0
15:03:41.213835 sched_yield()   = 0
15:03:41.214220 sched_yield()   = 0
15:03:41.214577 sched_yield()   = 0
15:03:41.214939 sched_yield()   = 0
I start while true; do true; done on another console...
15:03:43.314645 sched_yield()   = 0
15:03:43.847644 sched_yield()   = 0
15:03:43.954635 sched_yield()   = 0
15:03:44.063798 sched_yield()   = 0
15:03:44.171596 sched_yield()   = 0
15:03:44.282624 sched_yield()   = 0
15:03:44.391632 sched_yield()   = 0
15:03:44.498609 sched_yield()   = 0
15:03:44.605584 sched_yield()   = 0
15:03:44.712538 sched_yield()   = 0
15:03:44.819557 sched_yield()   = 0
15:03:44.928594 sched_yield()   = 0
15:03:45.040603 sched_yield()   = 0
15:03:45.148545 sched_yield()   = 0
15:03:45.259311 sched_yield()   = 0
15:03:45.368563 sched_yield()   = 0
15:03:45.476482 sched_yield()   = 0
15:03:45.583568 sched_yield()   = 0
15:03:45.690491 sched_yield()   = 0
15:03:45.797512 sched_yield()   = 0
15:03:45.906534 sched_yield()   = 0
15:03:46.013545 sched_yield()   = 0
15:03:46.120505 sched_yield()   = 0
Ctrl-C

# uname -a
Linux firebird 2.6.12-r4 #1 SMP Sun Jul 17 13:51:47 EEST 2005 i686 unknown 
unknown GNU/Linux
--
vda

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-23 Thread Lincoln Dale

Denis Vlasenko wrote:


This is what I would expect if run on an otherwise idle machine.
sched_yield just puts you at the back of the line for runnable
processes, it doesn't magically cause you to go to sleep somehow.
 


When a kernel build is occurring??? Plus `top` itself It damn
well sleep while giving up the CPU. If it doesn't it's broken.
   

unless you have all of the kernel source in the buffer cache, a 
concurrent kernel build will spend a fair bit of time in io_wait state ..
as such its perfectly plausible that sched_yield keeps popping back to 
the top of 'runnable' processes . . .



cheers,

lincoln.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Howard Chu

Florian Weimer wrote:

 * Howard Chu:
> That's not the complete story. BerkeleyDB provides a
> db_env_set_func_yield() hook to tell it what yield function it
> should use when its internal locking routines need such a function.
> If you don't set a specific hook, it just uses sleep(). The
> OpenLDAP backend will invoke this hook during some (not necessarily
> all) init sequences, to tell it to use the thread yield function
> that we selected in autoconf.



 And this helps to increase performance substantially?


When the caller is a threaded program, yes, there is a substantial 
(measurable and noticable) difference. Given that sleep() blocks the 
entire process, the difference is obvious.



> Note that (on systems that support inter-process mutexes) a
> BerkeleyDB database environment may be used by multiple processes
> concurrently.

 Yes, I know this, and I haven't experienced that much trouble with
 deadlocks.  Maybe the way you structure and access the database
 environment can be optimized for deadlock avoidance?


Maybe we already did this deadlock analysis and optimization, years ago 
when we first started developing this backend? Do you think everyone 
else in the world is a total fool?



> As such, the yield function that is provided must work both for
> threads within a single process (PTHREAD_SCOPE_PROCESS) as well as
> between processes (PTHREAD_SCOPE_SYSTEM).



 If I understand you correctly, what you really need is a syscall
 along the lines "don't run me again until all threads T that share
 property X have run, where the Ts aren't necessarily in the same
 process".  The kernel is psychic, it can't really know which
 processes to schedule to satisfy such a requirement.  I don't even
 think "has joined the Berkeley DB environment" is the desired
 property, but something like "is part of this cycle in the wait-for
 graph" or something similar.


You seem to believe we're looking for special treatment for the 
processes we're concerned with, and that's not true. If the system is 
busy with other processes, so be it, the system is busy. If you want 
better performance, you build a dedicated server and don't let anything 
else make the system busy. This is the way mission-critical services are 
delivered, regardless of the service. If you're not running on a 
dedicated system, then your deployment must not be mission critical, and 
so you shouldn't be surprised if a large gcc run slows down some other 
activities in the meantime. If you have a large nice'd job running 
before your normal priority jobs get their timeslice, then you should 
certainly wonder wtf the scheduler is doing, and why your system even 
claims to support nice() when clearly it doesn't mean anything on that 
system.



 I would have to check the Berkeley DB internals in order to tell what
 is feasible to implement.  This code shouldn't be on the fast path,
 so some kernel-based synchronization is probably sufficient.


pthread_cond_wait() probably would be just fine here, but BerkeleyDB 
doesn't work that way.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Howard Chu

Florian Weimer wrote:

* Andi Kleen:

  
Has anybody contacted the Sleepycat people with a description of the 
problem yet?



Berkeley DB does not call sched_yield, but OpenLDAP does in some
wrapper code around the Berkeley DB backend.
That's not the complete story. BerkeleyDB provides a 
db_env_set_func_yield() hook to tell it what yield function it should 
use when its internal locking routines need such a function. If you 
don't set a specific hook, it just uses sleep(). The OpenLDAP backend 
will invoke this hook during some (not necessarily all) init sequences, 
to tell it to use the thread yield function that we selected in autoconf.


Note that (on systems that support inter-process mutexes) a BerkeleyDB 
database environment may be used by multiple processes concurrently. As 
such, the yield function that is provided must work both for threads 
within a single process (PTHREAD_SCOPE_PROCESS) as well as between 
processes (PTHREAD_SCOPE_SYSTEM). The previous comment about slapd only 
needing to yield within a single process is inaccurate; since we allow 
slapcat to run concurrently with slapd (to allow hot backups) we need 
BerkeleyDB's locking/yield functions to work in System scope.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Howard Chu

Nikita Danilov wrote:

Howard Chu writes:

 > That's beside the point. Folks are making an assertion that
 > sched_yield() is meaningless; this example demonstrates that there are
 > cases where sched_yield() is essential.

It is not essential, it is non-portable.

Code you described is based on non-portable "expectations" about thread
scheduling. Linux implementation of pthreads fails to satisfy
them. Perfectly reasonable. Code is then "fixed" by adding sched_yield()
calls and introducing more non-portable assumptions. Again, there is no
guarantee this would work on any compliant implementation.

While "intuitive" semantics of sched_yield() is to yield CPU and to give
other runnable threads their chance to run, this is _not_ what standard
prescribes (for non-RT threads).
  
Very well; it is not prescribed in the standard and it is non-portable. 
Our code is broken and we will fix it.


But even Dave Butenhof, Mr. Pthreads himself, has said it is reasonable 
to expect sched_yield to yield the CPU. That's what pthread_yield did in 
Pthreads Draft 4 (DCE threads) and it is common knowledge that 
sched_yield is a direct replacement for pthread_yield; i.e., 
pthread_yield() was deleted from the spec because sched_yield fulfilled 
its purpose. Now you're saying "well, technically, sched_yield doesn't 
have to do anything at all" and the letter of the spec supports your 
position, but anybody who's been programming with pthreads since the DCE 
days "knows" that is not the original intention. I wonder that nobody 
has decided to raise this issue with the IEEE/POSIX group and get them 
to issue a correction/clarification in all this time, since the absence 
of specification here really impairs the usefulness of the spec.


Likewise the fact that sched_yield() can now cause the current process 
to be queued behind other processes seems suspect, unless we know for 
sure that the threads are running with PTHREAD_SCOPE_SYSTEM. (I haven't 
checked to see if PTHREAD_SCOPE_PROCESS is still supported in NPTL.)


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread linux-os \(Dick Johnson\)

On Sat, 20 Aug 2005, Robert Hancock wrote:

> Howard Chu wrote:
>> I'll note that we removed a number of the yield calls (that were in
>> OpenLDAP 2.2) for the 2.3 release, because I found that they were
>> redundant and causing unnecessary delays. My own test system is running
>> on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64 distro),
>> and OpenLDAP 2.3 runs perfectly well here, now that those redundant
>> calls have been removed. But I also found that I needed to add a new
>> yield(), to work around yet another unexpected issue on this system - we
>> have a number of threads waiting on a condition variable, and the thread
>> holding the mutex signals the var, unlocks the mutex, and then
>> immediately relocks it. The expectation here is that upon unlocking the
>> mutex, the calling thread would block while some waiting thread (that
>> just got signaled) would get to run. In fact what happened is that the
>> calling thread unlocked and relocked the mutex without allowing any of
>> the waiting threads to run. In this case the only solution was to insert
>> a yield() after the mutex_unlock(). So again, for those of you claiming
>> "oh, all you need to do is use a condition variable or any of the other
>> POSIX synchronization primitives" - yes, that's a nice theory, but
>> reality says otherwise.
>
> I encountered a similar issue with some software that I wrote, and used
> a similar workaround, however this was basically because there wasn't
> enough time available at the time to redesign things to work properly.
> The problem here is essentially caused by the fact that the mutex is
> being locked for an excessively large proportion of the time and not
> letting other threads in. In the case I am thinking of, posting the
> messages to the thread that was hogging the mutex via a signaling queue
> would have been a better solution than using yield and having correct
> operation depend on undefined parts of thread scheduling behavior..
>
> --

> Robert Hancock  Saskatoon, SK, Canada
> To email, remove "nospam" from [EMAIL PROTECTED]
> Home Page: http://www.roberthancock.com/
>

I reported thet sched_yield() wasn't working (at least as expected)
back in March of 2004.

for(;;)
 sched_yield();

... takes 100% CPU time as reported by `top`. It should take
practically 0. Somebody said that this was because `top` was
broken, others said that it was because I didn't know how to
code. Nevertheless, the problem was not fixed, even after
schedular changes were made for the current version.

  One can execute:

usleep(0);
... instead of:
sched_yield();

... and Linux then performs exactly like other Unixes when
code is waiting on mutexes.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.12.5 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
.
I apologize for the following. I tried to kill it with the above dot :


The information transmitted in this message is confidential and may be 
privileged.  Any review, retransmission, dissemination, or other use of this 
information by persons or entities other than the intended recipient is 
prohibited.  If you are not the intended recipient, please notify Analogic 
Corporation immediately - by replying to this message or by sending an email to 
[EMAIL PROTECTED] - and destroy all copies of this information, including any 
attachments, without reading or disclosing them.

Thank you.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Florian Weimer
* Howard Chu:

>>> Has anybody contacted the Sleepycat people with a description of
>>> the problem yet?

>> Berkeley DB does not call sched_yield, but OpenLDAP does in some
>> wrapper code around the Berkeley DB backend.

> That's not the complete story. BerkeleyDB provides a 
> db_env_set_func_yield() hook to tell it what yield function it should 
> use when its internal locking routines need such a function. If you 
> don't set a specific hook, it just uses sleep(). The OpenLDAP backend 
> will invoke this hook during some (not necessarily all) init sequences, 
> to tell it to use the thread yield function that we selected in autoconf.

And this helps to increase performance substantially?

> Note that (on systems that support inter-process mutexes) a BerkeleyDB 
> database environment may be used by multiple processes concurrently.

Yes, I know this, and I haven't experienced that much trouble with
deadlocks.  Maybe the way you structure and access the database
environment can be optimized for deadlock avoidance?

> As such, the yield function that is provided must work both for
> threads within a single process (PTHREAD_SCOPE_PROCESS) as well as
> between processes (PTHREAD_SCOPE_SYSTEM).

If I understand you correctly, what you really need is a syscall along
the lines "don't run me again until all threads T that share property
X have run, where the Ts aren't necessarily in the same process".  The
kernel is psychic, it can't really know which processes to schedule to
satisfy such a requirement.  I don't even think "has joined the
Berkeley DB environment" is the desired property, but something like
"is part of this cycle in the wait-for graph" or something similar.

I would have to check the Berkeley DB internals in order to tell what
is feasible to implement.  This code shouldn't be on the fast path, so
some kernel-based synchronization is probably sufficient.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Andi Kleen
> processes (PTHREAD_SCOPE_SYSTEM). The previous comment about slapd only 
> needing to yield within a single process is inaccurate; since we allow 
> slapcat to run concurrently with slapd (to allow hot backups) we need 
> BerkeleyDB's locking/yield functions to work in System scope.

That's broken by design - it means you can be arbitarily starved 
by other processes running in parallel. You are basically assuming
your application is the only thing running on the system
which is wrong. Also there are enough synchronization primitives
that can synchronize multiple processes without making
such broken assumptions.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Robert Hancock

linux-os (Dick Johnson) wrote:

I reported thet sched_yield() wasn't working (at least as expected)
back in March of 2004.

for(;;)
 sched_yield();

... takes 100% CPU time as reported by `top`. It should take
practically 0. Somebody said that this was because `top` was
broken, others said that it was because I didn't know how to
code. Nevertheless, the problem was not fixed, even after
schedular changes were made for the current version.


This is what I would expect if run on an otherwise idle machine. 
sched_yield just puts you at the back of the line for runnable 
processes, it doesn't magically cause you to go to sleep somehow.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Howard Chu

Andi Kleen wrote:

> processes (PTHREAD_SCOPE_SYSTEM). The previous comment about slapd
> only needing to yield within a single process is inaccurate; since
> we allow slapcat to run concurrently with slapd (to allow hot
> backups) we need BerkeleyDB's locking/yield functions to work in
> System scope.



 That's broken by design - it means you can be arbitarily starved by
 other processes running in parallel. You are basically assuming your
 application is the only thing running on the system which is wrong.
 Also there are enough synchronization primitives that can synchronize
 multiple processes without making such broken assumptions.


Again, I think you overstate the problem. "Arbitrarily starved by other 
processes" implies that the process scheduler will do a poor job and 
will allow the slapd process to be starved. We do not assume we're the 
only app on the system, we just assume that eventually we will get the 
CPU back. If that's not a valid assumption, then there is something 
wrong with the underlying system environment.


Something you ought to keep in mind - correctness and compliance are 
well and good, but worthless if the end result isn't useful. Windows NT 
has a POSIX-compliant subsystem but it is utterly useless. That's what 
you wind up with when all you do is conform to the letter of the spec.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Howard Chu

Andi Kleen wrote:

 processes (PTHREAD_SCOPE_SYSTEM). The previous comment about slapd
 only needing to yield within a single process is inaccurate; since
 we allow slapcat to run concurrently with slapd (to allow hot
 backups) we need BerkeleyDB's locking/yield functions to work in
 System scope.



 That's broken by design - it means you can be arbitarily starved by
 other processes running in parallel. You are basically assuming your
 application is the only thing running on the system which is wrong.
 Also there are enough synchronization primitives that can synchronize
 multiple processes without making such broken assumptions.


Again, I think you overstate the problem. Arbitrarily starved by other 
processes implies that the process scheduler will do a poor job and 
will allow the slapd process to be starved. We do not assume we're the 
only app on the system, we just assume that eventually we will get the 
CPU back. If that's not a valid assumption, then there is something 
wrong with the underlying system environment.


Something you ought to keep in mind - correctness and compliance are 
well and good, but worthless if the end result isn't useful. Windows NT 
has a POSIX-compliant subsystem but it is utterly useless. That's what 
you wind up with when all you do is conform to the letter of the spec.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Robert Hancock

linux-os (Dick Johnson) wrote:

I reported thet sched_yield() wasn't working (at least as expected)
back in March of 2004.

for(;;)
 sched_yield();

... takes 100% CPU time as reported by `top`. It should take
practically 0. Somebody said that this was because `top` was
broken, others said that it was because I didn't know how to
code. Nevertheless, the problem was not fixed, even after
schedular changes were made for the current version.


This is what I would expect if run on an otherwise idle machine. 
sched_yield just puts you at the back of the line for runnable 
processes, it doesn't magically cause you to go to sleep somehow.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Florian Weimer
* Howard Chu:

 Has anybody contacted the Sleepycat people with a description of
 the problem yet?

 Berkeley DB does not call sched_yield, but OpenLDAP does in some
 wrapper code around the Berkeley DB backend.

 That's not the complete story. BerkeleyDB provides a 
 db_env_set_func_yield() hook to tell it what yield function it should 
 use when its internal locking routines need such a function. If you 
 don't set a specific hook, it just uses sleep(). The OpenLDAP backend 
 will invoke this hook during some (not necessarily all) init sequences, 
 to tell it to use the thread yield function that we selected in autoconf.

And this helps to increase performance substantially?

 Note that (on systems that support inter-process mutexes) a BerkeleyDB 
 database environment may be used by multiple processes concurrently.

Yes, I know this, and I haven't experienced that much trouble with
deadlocks.  Maybe the way you structure and access the database
environment can be optimized for deadlock avoidance?

 As such, the yield function that is provided must work both for
 threads within a single process (PTHREAD_SCOPE_PROCESS) as well as
 between processes (PTHREAD_SCOPE_SYSTEM).

If I understand you correctly, what you really need is a syscall along
the lines don't run me again until all threads T that share property
X have run, where the Ts aren't necessarily in the same process.  The
kernel is psychic, it can't really know which processes to schedule to
satisfy such a requirement.  I don't even think has joined the
Berkeley DB environment is the desired property, but something like
is part of this cycle in the wait-for graph or something similar.

I would have to check the Berkeley DB internals in order to tell what
is feasible to implement.  This code shouldn't be on the fast path, so
some kernel-based synchronization is probably sufficient.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Andi Kleen
 processes (PTHREAD_SCOPE_SYSTEM). The previous comment about slapd only 
 needing to yield within a single process is inaccurate; since we allow 
 slapcat to run concurrently with slapd (to allow hot backups) we need 
 BerkeleyDB's locking/yield functions to work in System scope.

That's broken by design - it means you can be arbitarily starved 
by other processes running in parallel. You are basically assuming
your application is the only thing running on the system
which is wrong. Also there are enough synchronization primitives
that can synchronize multiple processes without making
such broken assumptions.

-Andi

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread linux-os \(Dick Johnson\)

On Sat, 20 Aug 2005, Robert Hancock wrote:

 Howard Chu wrote:
 I'll note that we removed a number of the yield calls (that were in
 OpenLDAP 2.2) for the 2.3 release, because I found that they were
 redundant and causing unnecessary delays. My own test system is running
 on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64 distro),
 and OpenLDAP 2.3 runs perfectly well here, now that those redundant
 calls have been removed. But I also found that I needed to add a new
 yield(), to work around yet another unexpected issue on this system - we
 have a number of threads waiting on a condition variable, and the thread
 holding the mutex signals the var, unlocks the mutex, and then
 immediately relocks it. The expectation here is that upon unlocking the
 mutex, the calling thread would block while some waiting thread (that
 just got signaled) would get to run. In fact what happened is that the
 calling thread unlocked and relocked the mutex without allowing any of
 the waiting threads to run. In this case the only solution was to insert
 a yield() after the mutex_unlock(). So again, for those of you claiming
 oh, all you need to do is use a condition variable or any of the other
 POSIX synchronization primitives - yes, that's a nice theory, but
 reality says otherwise.

 I encountered a similar issue with some software that I wrote, and used
 a similar workaround, however this was basically because there wasn't
 enough time available at the time to redesign things to work properly.
 The problem here is essentially caused by the fact that the mutex is
 being locked for an excessively large proportion of the time and not
 letting other threads in. In the case I am thinking of, posting the
 messages to the thread that was hogging the mutex via a signaling queue
 would have been a better solution than using yield and having correct
 operation depend on undefined parts of thread scheduling behavior..

 --

 Robert Hancock  Saskatoon, SK, Canada
 To email, remove nospam from [EMAIL PROTECTED]
 Home Page: http://www.roberthancock.com/


I reported thet sched_yield() wasn't working (at least as expected)
back in March of 2004.

for(;;)
 sched_yield();

... takes 100% CPU time as reported by `top`. It should take
practically 0. Somebody said that this was because `top` was
broken, others said that it was because I didn't know how to
code. Nevertheless, the problem was not fixed, even after
schedular changes were made for the current version.

  One can execute:

usleep(0);
... instead of:
sched_yield();

... and Linux then performs exactly like other Unixes when
code is waiting on mutexes.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.12.5 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
.
I apologize for the following. I tried to kill it with the above dot :


The information transmitted in this message is confidential and may be 
privileged.  Any review, retransmission, dissemination, or other use of this 
information by persons or entities other than the intended recipient is 
prohibited.  If you are not the intended recipient, please notify Analogic 
Corporation immediately - by replying to this message or by sending an email to 
[EMAIL PROTECTED] - and destroy all copies of this information, including any 
attachments, without reading or disclosing them.

Thank you.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Howard Chu

Nikita Danilov wrote:

Howard Chu writes:

  That's beside the point. Folks are making an assertion that
  sched_yield() is meaningless; this example demonstrates that there are
  cases where sched_yield() is essential.

It is not essential, it is non-portable.

Code you described is based on non-portable expectations about thread
scheduling. Linux implementation of pthreads fails to satisfy
them. Perfectly reasonable. Code is then fixed by adding sched_yield()
calls and introducing more non-portable assumptions. Again, there is no
guarantee this would work on any compliant implementation.

While intuitive semantics of sched_yield() is to yield CPU and to give
other runnable threads their chance to run, this is _not_ what standard
prescribes (for non-RT threads).
  
Very well; it is not prescribed in the standard and it is non-portable. 
Our code is broken and we will fix it.


But even Dave Butenhof, Mr. Pthreads himself, has said it is reasonable 
to expect sched_yield to yield the CPU. That's what pthread_yield did in 
Pthreads Draft 4 (DCE threads) and it is common knowledge that 
sched_yield is a direct replacement for pthread_yield; i.e., 
pthread_yield() was deleted from the spec because sched_yield fulfilled 
its purpose. Now you're saying well, technically, sched_yield doesn't 
have to do anything at all and the letter of the spec supports your 
position, but anybody who's been programming with pthreads since the DCE 
days knows that is not the original intention. I wonder that nobody 
has decided to raise this issue with the IEEE/POSIX group and get them 
to issue a correction/clarification in all this time, since the absence 
of specification here really impairs the usefulness of the spec.


Likewise the fact that sched_yield() can now cause the current process 
to be queued behind other processes seems suspect, unless we know for 
sure that the threads are running with PTHREAD_SCOPE_SYSTEM. (I haven't 
checked to see if PTHREAD_SCOPE_PROCESS is still supported in NPTL.)


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Howard Chu

Florian Weimer wrote:

* Andi Kleen:

  
Has anybody contacted the Sleepycat people with a description of the 
problem yet?



Berkeley DB does not call sched_yield, but OpenLDAP does in some
wrapper code around the Berkeley DB backend.
That's not the complete story. BerkeleyDB provides a 
db_env_set_func_yield() hook to tell it what yield function it should 
use when its internal locking routines need such a function. If you 
don't set a specific hook, it just uses sleep(). The OpenLDAP backend 
will invoke this hook during some (not necessarily all) init sequences, 
to tell it to use the thread yield function that we selected in autoconf.


Note that (on systems that support inter-process mutexes) a BerkeleyDB 
database environment may be used by multiple processes concurrently. As 
such, the yield function that is provided must work both for threads 
within a single process (PTHREAD_SCOPE_PROCESS) as well as between 
processes (PTHREAD_SCOPE_SYSTEM). The previous comment about slapd only 
needing to yield within a single process is inaccurate; since we allow 
slapcat to run concurrently with slapd (to allow hot backups) we need 
BerkeleyDB's locking/yield functions to work in System scope.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-22 Thread Howard Chu

Florian Weimer wrote:

 * Howard Chu:
 That's not the complete story. BerkeleyDB provides a
 db_env_set_func_yield() hook to tell it what yield function it
 should use when its internal locking routines need such a function.
 If you don't set a specific hook, it just uses sleep(). The
 OpenLDAP backend will invoke this hook during some (not necessarily
 all) init sequences, to tell it to use the thread yield function
 that we selected in autoconf.



 And this helps to increase performance substantially?


When the caller is a threaded program, yes, there is a substantial 
(measurable and noticable) difference. Given that sleep() blocks the 
entire process, the difference is obvious.



 Note that (on systems that support inter-process mutexes) a
 BerkeleyDB database environment may be used by multiple processes
 concurrently.

 Yes, I know this, and I haven't experienced that much trouble with
 deadlocks.  Maybe the way you structure and access the database
 environment can be optimized for deadlock avoidance?


Maybe we already did this deadlock analysis and optimization, years ago 
when we first started developing this backend? Do you think everyone 
else in the world is a total fool?



 As such, the yield function that is provided must work both for
 threads within a single process (PTHREAD_SCOPE_PROCESS) as well as
 between processes (PTHREAD_SCOPE_SYSTEM).



 If I understand you correctly, what you really need is a syscall
 along the lines don't run me again until all threads T that share
 property X have run, where the Ts aren't necessarily in the same
 process.  The kernel is psychic, it can't really know which
 processes to schedule to satisfy such a requirement.  I don't even
 think has joined the Berkeley DB environment is the desired
 property, but something like is part of this cycle in the wait-for
 graph or something similar.


You seem to believe we're looking for special treatment for the 
processes we're concerned with, and that's not true. If the system is 
busy with other processes, so be it, the system is busy. If you want 
better performance, you build a dedicated server and don't let anything 
else make the system busy. This is the way mission-critical services are 
delivered, regardless of the service. If you're not running on a 
dedicated system, then your deployment must not be mission critical, and 
so you shouldn't be surprised if a large gcc run slows down some other 
activities in the meantime. If you have a large nice'd job running 
before your normal priority jobs get their timeslice, then you should 
certainly wonder wtf the scheduler is doing, and why your system even 
claims to support nice() when clearly it doesn't mean anything on that 
system.



 I would have to check the Berkeley DB internals in order to tell what
 is feasible to implement.  This code shouldn't be on the fast path,
 so some kernel-based synchronization is probably sufficient.


pthread_cond_wait() probably would be just fine here, but BerkeleyDB 
doesn't work that way.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-21 Thread Florian Weimer
* Andi Kleen:

> Has anybody contacted the Sleepycat people with a description of the 
> problem yet?

Berkeley DB does not call sched_yield, but OpenLDAP does in some
wrapper code around the Berkeley DB backend.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-21 Thread Nikita Danilov
Howard Chu writes:
 > Lee Revell wrote:
 > >  On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
 > > > But I also found that I needed to add a new yield(), to work around
 > > > yet another unexpected issue on this system - we have a number of
 > > > threads waiting on a condition variable, and the thread holding the
 > > > mutex signals the var, unlocks the mutex, and then immediately
 > > > relocks it. The expectation here is that upon unlocking the mutex,
 > > > the calling thread would block while some waiting thread (that just
 > > > got signaled) would get to run. In fact what happened is that the
 > > > calling thread unlocked and relocked the mutex without allowing any
 > > > of the waiting threads to run. In this case the only solution was
 > > > to insert a yield() after the mutex_unlock().
 > >
 > >  That's exactly the behavior I would expect.  Why would you expect
 > >  unlocking a mutex to cause a reschedule, if the calling thread still
 > >  has timeslice left?
 >
 > That's beside the point. Folks are making an assertion that
 > sched_yield() is meaningless; this example demonstrates that there are
 > cases where sched_yield() is essential.

It is not essential, it is non-portable.

Code you described is based on non-portable "expectations" about thread
scheduling. Linux implementation of pthreads fails to satisfy
them. Perfectly reasonable. Code is then "fixed" by adding sched_yield()
calls and introducing more non-portable assumptions. Again, there is no
guarantee this would work on any compliant implementation.

While "intuitive" semantics of sched_yield() is to yield CPU and to give
other runnable threads their chance to run, this is _not_ what standard
prescribes (for non-RT threads).

 >
 > --
 >   -- Howard Chu

Nikita.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-21 Thread Nikita Danilov
Howard Chu writes:
  Lee Revell wrote:
On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
But I also found that I needed to add a new yield(), to work around
yet another unexpected issue on this system - we have a number of
threads waiting on a condition variable, and the thread holding the
mutex signals the var, unlocks the mutex, and then immediately
relocks it. The expectation here is that upon unlocking the mutex,
the calling thread would block while some waiting thread (that just
got signaled) would get to run. In fact what happened is that the
calling thread unlocked and relocked the mutex without allowing any
of the waiting threads to run. In this case the only solution was
to insert a yield() after the mutex_unlock().
  
That's exactly the behavior I would expect.  Why would you expect
unlocking a mutex to cause a reschedule, if the calling thread still
has timeslice left?
 
  That's beside the point. Folks are making an assertion that
  sched_yield() is meaningless; this example demonstrates that there are
  cases where sched_yield() is essential.

It is not essential, it is non-portable.

Code you described is based on non-portable expectations about thread
scheduling. Linux implementation of pthreads fails to satisfy
them. Perfectly reasonable. Code is then fixed by adding sched_yield()
calls and introducing more non-portable assumptions. Again, there is no
guarantee this would work on any compliant implementation.

While intuitive semantics of sched_yield() is to yield CPU and to give
other runnable threads their chance to run, this is _not_ what standard
prescribes (for non-RT threads).

 
  --
-- Howard Chu

Nikita.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-21 Thread Florian Weimer
* Andi Kleen:

 Has anybody contacted the Sleepycat people with a description of the 
 problem yet?

Berkeley DB does not call sched_yield, but OpenLDAP does in some
wrapper code around the Berkeley DB backend.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Robert Hancock

Howard Chu wrote:
I'll note that we removed a number of the yield calls (that were in 
OpenLDAP 2.2) for the 2.3 release, because I found that they were 
redundant and causing unnecessary delays. My own test system is running 
on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64 distro), 
and OpenLDAP 2.3 runs perfectly well here, now that those redundant 
calls have been removed. But I also found that I needed to add a new 
yield(), to work around yet another unexpected issue on this system - we 
have a number of threads waiting on a condition variable, and the thread 
holding the mutex signals the var, unlocks the mutex, and then 
immediately relocks it. The expectation here is that upon unlocking the 
mutex, the calling thread would block while some waiting thread (that 
just got signaled) would get to run. In fact what happened is that the 
calling thread unlocked and relocked the mutex without allowing any of 
the waiting threads to run. In this case the only solution was to insert 
a yield() after the mutex_unlock(). So again, for those of you claiming 
"oh, all you need to do is use a condition variable or any of the other 
POSIX synchronization primitives" - yes, that's a nice theory, but 
reality says otherwise.


I encountered a similar issue with some software that I wrote, and used 
a similar workaround, however this was basically because there wasn't 
enough time available at the time to redesign things to work properly. 
The problem here is essentially caused by the fact that the mutex is 
being locked for an excessively large proportion of the time and not 
letting other threads in. In the case I am thinking of, posting the 
messages to the thread that was hogging the mutex via a signaling queue 
would have been a better solution than using yield and having correct 
operation depend on undefined parts of thread scheduling behavior..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Nick Piggin

Howard Chu wrote:

Lee Revell wrote:


 On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
> But I also found that I needed to add a new yield(), to work around
> yet another unexpected issue on this system - we have a number of
> threads waiting on a condition variable, and the thread holding the
> mutex signals the var, unlocks the mutex, and then immediately
> relocks it. The expectation here is that upon unlocking the mutex,
> the calling thread would block while some waiting thread (that just
> got signaled) would get to run. In fact what happened is that the
> calling thread unlocked and relocked the mutex without allowing any
> of the waiting threads to run. In this case the only solution was
> to insert a yield() after the mutex_unlock().

 That's exactly the behavior I would expect.  Why would you expect
 unlocking a mutex to cause a reschedule, if the calling thread still
 has timeslice left?



That's beside the point. Folks are making an assertion that 
sched_yield() is meaningless; this example demonstrates that there are 
cases where sched_yield() is essential.




The point is, with SCHED_OTHER scheduling, sched_yield() need not
do anything. It may not let any other tasks run.

The fact that it does on Linux is because we do attempt to do
something expected... but the simple matter is that you can't realy
on it to do what you expect.

I'm not sure exactly how you would solve the above problem, but I'm
sure it can be achieved using mutexes (for example, you could have
a queue where every thread waits on its own private mutex) but I
don't do much userspace C programming sorry.

Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Nikita Danilov
Howard Chu writes:
 > Nikita Danilov wrote:
 > > That returns us to the core of the problem: sched_yield() is used to
 > > implement a synchronization primitive and non-portable assumptions are
 > > made about its behavior: SUS defines that after sched_yield() thread
 > > ceases to run on the CPU "until it again becomes the head of its thread
 > > list", and "thread list" discipline is only defined for real-time
 > > scheduling policies. E.g., 
 > >
 > > int sched_yield(void)
 > > {
 > >return 0;
 > > }
 > >
 > > and
 > >
 > > int sched_yield(void)
 > > {
 > >sleep(100);
 > >return 0;
 > > }
 > >
 > > are both valid sched_yield() implementation for non-rt (SCHED_OTHER)
 > > threads.
 > I think you're mistaken:
 > http://groups.google.com/group/comp.programming.threads/browse_frm/thread/0d4eaf3703131e86/da051ebe58976b00#da051ebe58976b00
 > 
 > sched_yield() is required to be supported even if priority scheduling is 
 > not supported, and it is required to cause the calling thread (not 
 > process) to yield the processor.

Of course sched_yield() is required to be supported, the question is for
how long CPU is yielded. Here is the quote from the SUS (actually the
complete definition of sched_yield()):

The sched_yield() function shall force the running thread to
relinquish the processor until it again becomes the head of its
thread list.

As far as I can see, SUS doesn't specify how "thread list" is maintained
for non-RT scheduling policy, and implementation that immediately places
SCHED_OTHER thread that called sched_yield() back at the head of its
thread list is perfectly valid. Also valid is an implementation that
waits for 100 seconds and then places sched_yield() caller to the head
of the list, etc. Basically, while semantics of sched_yield() are well
defined for RT scheduling policy, for SCHED_OTHER policy standard leaves
it implementation defined.

 > 
 > -- 
 >   -- Howard Chu

Nikita.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Lee Revell
On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
> Nick Piggin wrote:
> >  Robert Hancock wrote:
> > > I fail to see how sched_yield is going to be very helpful in this
> > > situation. Since that call can sleep from a range of time ranging
> > > from zero to a long time, it's going to give unpredictable results.
> 
> >  Well, not sleep technically, but yield the CPU for some undefined
> >  amount of time.
> 
> Since the slapd server was not written to run in realtime, nor is it 
> commonly run on realtime operating systems, I don't believe predictable 
> timing here is a criteria we care about. One could say the same of 
> sigsuspend() by the way - it can pause a process for a range of time 
> ranging from zero to a long time. Should we tell application writers not 
> to use this function either, regardless of whether the developer thinks 
> they have a good reason to use it?

Of course not.  We should tell them that if they use sigsuspend() they
cannot assume that the process will not wake up immediately.

Lee


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Howard Chu

Lee Revell wrote:

 On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
> But I also found that I needed to add a new yield(), to work around
> yet another unexpected issue on this system - we have a number of
> threads waiting on a condition variable, and the thread holding the
> mutex signals the var, unlocks the mutex, and then immediately
> relocks it. The expectation here is that upon unlocking the mutex,
> the calling thread would block while some waiting thread (that just
> got signaled) would get to run. In fact what happened is that the
> calling thread unlocked and relocked the mutex without allowing any
> of the waiting threads to run. In this case the only solution was
> to insert a yield() after the mutex_unlock().

 That's exactly the behavior I would expect.  Why would you expect
 unlocking a mutex to cause a reschedule, if the calling thread still
 has timeslice left?


That's beside the point. Folks are making an assertion that 
sched_yield() is meaningless; this example demonstrates that there are 
cases where sched_yield() is essential.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Lee Revell
On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
> But I also found that I needed to add a new 
> yield(), to work around yet another unexpected issue on this system -
> we have a number of threads waiting on a condition variable, and the
> thread holding the mutex signals the var, unlocks the mutex, and then 
> immediately relocks it. The expectation here is that upon unlocking
> the mutex, the calling thread would block while some waiting thread
> (that just got signaled) would get to run. In fact what happened is
> that the calling thread unlocked and relocked the mutex without
> allowing any of the waiting threads to run. In this case the only
> solution was to insert a yield() after the mutex_unlock(). 

That's exactly the behavior I would expect.  Why would you expect
unlocking a mutex to cause a reschedule, if the calling thread still has
timeslice left?

Lee

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Howard Chu

Nikita Danilov wrote:

That returns us to the core of the problem: sched_yield() is used to
implement a synchronization primitive and non-portable assumptions are
made about its behavior: SUS defines that after sched_yield() thread
ceases to run on the CPU "until it again becomes the head of its thread
list", and "thread list" discipline is only defined for real-time
scheduling policies. E.g., 


int sched_yield(void)
{
   return 0;
}

and

int sched_yield(void)
{
   sleep(100);
   return 0;
}

are both valid sched_yield() implementation for non-rt (SCHED_OTHER)
threads.

I think you're mistaken:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/0d4eaf3703131e86/da051ebe58976b00#da051ebe58976b00

sched_yield() is required to be supported even if priority scheduling is 
not supported, and it is required to cause the calling thread (not 
process) to yield the processor.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Howard Chu

Nick Piggin wrote:

 Robert Hancock wrote:
> I fail to see how sched_yield is going to be very helpful in this
> situation. Since that call can sleep from a range of time ranging
> from zero to a long time, it's going to give unpredictable results.



 Well, not sleep technically, but yield the CPU for some undefined
 amount of time.


Since the slapd server was not written to run in realtime, nor is it 
commonly run on realtime operating systems, I don't believe predictable 
timing here is a criteria we care about. One could say the same of 
sigsuspend() by the way - it can pause a process for a range of time 
ranging from zero to a long time. Should we tell application writers not 
to use this function either, regardless of whether the developer thinks 
they have a good reason to use it?



> It seems to me that this sort of thing is why we have POSIX pthread
> synchronization primitives.. sched_yield is basically there for a
> process to indicate that "what I'm doing doesn't matter much, let
> other stuff run". Any other use of it generally constitutes some
> kind of hack.


In terms of transaction recovery, we do an exponential backoff on the 
retries, because our benchmarks showed that under heavy lock contention, 
immediate retries only made things worse. In fact, having arbitrarily 
long backoff delays here was shown to improve transaction throughput. 
(We use select() with an increasing timeval in combination with the 
yield() call. One way or another we get a longer delay as desired.)


sched_yield is there for a *thread* to indicate "what I'm doing doesn't 
matter much, let other stuff run."


I suppose it may be a hack. But then so is TCP congestion control. In 
both cases, empirical evidence indicates the hack is worthwhile. If you 
haven't done the analysis then you're in no position to deny the value 
of the approach.



 In SCHED_OTHER mode, you're right, sched_yield is basically
 meaningless.



 In a realtime system, there is a very well defined and probably
 useful behaviour.



 Eg. If 2 SCHED_FIFO processes are running at the same priority, One
 can call sched_yield to deterministically give the CPU to the other
 guy.


Well yes, the point of a realtime system is to provide deterministic 
response times to unpredictable input.


I'll note that we removed a number of the yield calls (that were in 
OpenLDAP 2.2) for the 2.3 release, because I found that they were 
redundant and causing unnecessary delays. My own test system is running 
on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64 distro), 
and OpenLDAP 2.3 runs perfectly well here, now that those redundant 
calls have been removed. But I also found that I needed to add a new 
yield(), to work around yet another unexpected issue on this system - we 
have a number of threads waiting on a condition variable, and the thread 
holding the mutex signals the var, unlocks the mutex, and then 
immediately relocks it. The expectation here is that upon unlocking the 
mutex, the calling thread would block while some waiting thread (that 
just got signaled) would get to run. In fact what happened is that the 
calling thread unlocked and relocked the mutex without allowing any of 
the waiting threads to run. In this case the only solution was to insert 
a yield() after the mutex_unlock(). So again, for those of you claiming 
"oh, all you need to do is use a condition variable or any of the other 
POSIX synchronization primitives" - yes, that's a nice theory, but 
reality says otherwise.


To say that sched_yield is basically meaningless is far overstating your 
point.

--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Andi Kleen
Howard Chu <[EMAIL PROTECTED]> writes:

> In this specific example, we use whatever
> BerkeleyDB provides and we're certainly not about to write our own
> transactional embedded database engine just for this.

BerkeleyDB is free software after all that comes with source code. 
Surely it can be fixed without rewriting it from scratch.

Has anybody contacted the Sleepycat people with a description of the 
problem yet?

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Nikita Danilov
Howard Chu writes:
 > Nikita Danilov wrote:

[...]

 > 
 > >  What prevents transaction monitor from using, say, condition
 > >  variables to "yield cpu"? That would have an additional advantage of
 > >  blocking thread precisely until specific event occurs, instead of
 > >  blocking for some vague indeterminate load and platform dependent
 > >  amount of time.
 > 
 > Condition variables offer no control over which thread is waken up. 

When only one thread waits on a condition variable, which is exactly a
scenario involved, --sorry if I weren't clear enough-- condition signal
provides precise control over which thread is woken up.

 > We're wandering into the design of the SleepyCat BerkeleyDB library 
 > here, and we don't exert any control over that either. BerkeleyDB 
 > doesn't appear to use pthread condition variables; it seems to construct 
 > its own synchronization mechanisms on top of mutexes (and yield calls). 

That returns us to the core of the problem: sched_yield() is used to
implement a synchronization primitive and non-portable assumptions are
made about its behavior: SUS defines that after sched_yield() thread
ceases to run on the CPU "until it again becomes the head of its thread
list", and "thread list" discipline is only defined for real-time
scheduling policies. E.g., 

int sched_yield(void)
{
   return 0;
}

and

int sched_yield(void)
{
   sleep(100);
   return 0;
}

are both valid sched_yield() implementation for non-rt (SCHED_OTHER)
threads.

Nikita.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Nikita Danilov
Howard Chu writes:
  Nikita Danilov wrote:

[...]

  
What prevents transaction monitor from using, say, condition
variables to yield cpu? That would have an additional advantage of
blocking thread precisely until specific event occurs, instead of
blocking for some vague indeterminate load and platform dependent
amount of time.
  
  Condition variables offer no control over which thread is waken up. 

When only one thread waits on a condition variable, which is exactly a
scenario involved, --sorry if I weren't clear enough-- condition signal
provides precise control over which thread is woken up.

  We're wandering into the design of the SleepyCat BerkeleyDB library 
  here, and we don't exert any control over that either. BerkeleyDB 
  doesn't appear to use pthread condition variables; it seems to construct 
  its own synchronization mechanisms on top of mutexes (and yield calls). 

That returns us to the core of the problem: sched_yield() is used to
implement a synchronization primitive and non-portable assumptions are
made about its behavior: SUS defines that after sched_yield() thread
ceases to run on the CPU until it again becomes the head of its thread
list, and thread list discipline is only defined for real-time
scheduling policies. E.g., 

int sched_yield(void)
{
   return 0;
}

and

int sched_yield(void)
{
   sleep(100);
   return 0;
}

are both valid sched_yield() implementation for non-rt (SCHED_OTHER)
threads.

Nikita.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Andi Kleen
Howard Chu [EMAIL PROTECTED] writes:

 In this specific example, we use whatever
 BerkeleyDB provides and we're certainly not about to write our own
 transactional embedded database engine just for this.

BerkeleyDB is free software after all that comes with source code. 
Surely it can be fixed without rewriting it from scratch.

Has anybody contacted the Sleepycat people with a description of the 
problem yet?

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Howard Chu

Nick Piggin wrote:

 Robert Hancock wrote:
 I fail to see how sched_yield is going to be very helpful in this
 situation. Since that call can sleep from a range of time ranging
 from zero to a long time, it's going to give unpredictable results.



 Well, not sleep technically, but yield the CPU for some undefined
 amount of time.


Since the slapd server was not written to run in realtime, nor is it 
commonly run on realtime operating systems, I don't believe predictable 
timing here is a criteria we care about. One could say the same of 
sigsuspend() by the way - it can pause a process for a range of time 
ranging from zero to a long time. Should we tell application writers not 
to use this function either, regardless of whether the developer thinks 
they have a good reason to use it?



 It seems to me that this sort of thing is why we have POSIX pthread
 synchronization primitives.. sched_yield is basically there for a
 process to indicate that what I'm doing doesn't matter much, let
 other stuff run. Any other use of it generally constitutes some
 kind of hack.


In terms of transaction recovery, we do an exponential backoff on the 
retries, because our benchmarks showed that under heavy lock contention, 
immediate retries only made things worse. In fact, having arbitrarily 
long backoff delays here was shown to improve transaction throughput. 
(We use select() with an increasing timeval in combination with the 
yield() call. One way or another we get a longer delay as desired.)


sched_yield is there for a *thread* to indicate what I'm doing doesn't 
matter much, let other stuff run.


I suppose it may be a hack. But then so is TCP congestion control. In 
both cases, empirical evidence indicates the hack is worthwhile. If you 
haven't done the analysis then you're in no position to deny the value 
of the approach.



 In SCHED_OTHER mode, you're right, sched_yield is basically
 meaningless.



 In a realtime system, there is a very well defined and probably
 useful behaviour.



 Eg. If 2 SCHED_FIFO processes are running at the same priority, One
 can call sched_yield to deterministically give the CPU to the other
 guy.


Well yes, the point of a realtime system is to provide deterministic 
response times to unpredictable input.


I'll note that we removed a number of the yield calls (that were in 
OpenLDAP 2.2) for the 2.3 release, because I found that they were 
redundant and causing unnecessary delays. My own test system is running 
on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64 distro), 
and OpenLDAP 2.3 runs perfectly well here, now that those redundant 
calls have been removed. But I also found that I needed to add a new 
yield(), to work around yet another unexpected issue on this system - we 
have a number of threads waiting on a condition variable, and the thread 
holding the mutex signals the var, unlocks the mutex, and then 
immediately relocks it. The expectation here is that upon unlocking the 
mutex, the calling thread would block while some waiting thread (that 
just got signaled) would get to run. In fact what happened is that the 
calling thread unlocked and relocked the mutex without allowing any of 
the waiting threads to run. In this case the only solution was to insert 
a yield() after the mutex_unlock(). So again, for those of you claiming 
oh, all you need to do is use a condition variable or any of the other 
POSIX synchronization primitives - yes, that's a nice theory, but 
reality says otherwise.


To say that sched_yield is basically meaningless is far overstating your 
point.

--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Howard Chu

Nikita Danilov wrote:

That returns us to the core of the problem: sched_yield() is used to
implement a synchronization primitive and non-portable assumptions are
made about its behavior: SUS defines that after sched_yield() thread
ceases to run on the CPU until it again becomes the head of its thread
list, and thread list discipline is only defined for real-time
scheduling policies. E.g., 


int sched_yield(void)
{
   return 0;
}

and

int sched_yield(void)
{
   sleep(100);
   return 0;
}

are both valid sched_yield() implementation for non-rt (SCHED_OTHER)
threads.

I think you're mistaken:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/0d4eaf3703131e86/da051ebe58976b00#da051ebe58976b00

sched_yield() is required to be supported even if priority scheduling is 
not supported, and it is required to cause the calling thread (not 
process) to yield the processor.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Lee Revell
On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
 But I also found that I needed to add a new 
 yield(), to work around yet another unexpected issue on this system -
 we have a number of threads waiting on a condition variable, and the
 thread holding the mutex signals the var, unlocks the mutex, and then 
 immediately relocks it. The expectation here is that upon unlocking
 the mutex, the calling thread would block while some waiting thread
 (that just got signaled) would get to run. In fact what happened is
 that the calling thread unlocked and relocked the mutex without
 allowing any of the waiting threads to run. In this case the only
 solution was to insert a yield() after the mutex_unlock(). 

That's exactly the behavior I would expect.  Why would you expect
unlocking a mutex to cause a reschedule, if the calling thread still has
timeslice left?

Lee

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Howard Chu

Lee Revell wrote:

 On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
 But I also found that I needed to add a new yield(), to work around
 yet another unexpected issue on this system - we have a number of
 threads waiting on a condition variable, and the thread holding the
 mutex signals the var, unlocks the mutex, and then immediately
 relocks it. The expectation here is that upon unlocking the mutex,
 the calling thread would block while some waiting thread (that just
 got signaled) would get to run. In fact what happened is that the
 calling thread unlocked and relocked the mutex without allowing any
 of the waiting threads to run. In this case the only solution was
 to insert a yield() after the mutex_unlock().

 That's exactly the behavior I would expect.  Why would you expect
 unlocking a mutex to cause a reschedule, if the calling thread still
 has timeslice left?


That's beside the point. Folks are making an assertion that 
sched_yield() is meaningless; this example demonstrates that there are 
cases where sched_yield() is essential.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Lee Revell
On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
 Nick Piggin wrote:
   Robert Hancock wrote:
   I fail to see how sched_yield is going to be very helpful in this
   situation. Since that call can sleep from a range of time ranging
   from zero to a long time, it's going to give unpredictable results.
 
   Well, not sleep technically, but yield the CPU for some undefined
   amount of time.
 
 Since the slapd server was not written to run in realtime, nor is it 
 commonly run on realtime operating systems, I don't believe predictable 
 timing here is a criteria we care about. One could say the same of 
 sigsuspend() by the way - it can pause a process for a range of time 
 ranging from zero to a long time. Should we tell application writers not 
 to use this function either, regardless of whether the developer thinks 
 they have a good reason to use it?

Of course not.  We should tell them that if they use sigsuspend() they
cannot assume that the process will not wake up immediately.

Lee


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Nikita Danilov
Howard Chu writes:
  Nikita Danilov wrote:
   That returns us to the core of the problem: sched_yield() is used to
   implement a synchronization primitive and non-portable assumptions are
   made about its behavior: SUS defines that after sched_yield() thread
   ceases to run on the CPU until it again becomes the head of its thread
   list, and thread list discipline is only defined for real-time
   scheduling policies. E.g., 
  
   int sched_yield(void)
   {
  return 0;
   }
  
   and
  
   int sched_yield(void)
   {
  sleep(100);
  return 0;
   }
  
   are both valid sched_yield() implementation for non-rt (SCHED_OTHER)
   threads.
  I think you're mistaken:
  http://groups.google.com/group/comp.programming.threads/browse_frm/thread/0d4eaf3703131e86/da051ebe58976b00#da051ebe58976b00
  
  sched_yield() is required to be supported even if priority scheduling is 
  not supported, and it is required to cause the calling thread (not 
  process) to yield the processor.

Of course sched_yield() is required to be supported, the question is for
how long CPU is yielded. Here is the quote from the SUS (actually the
complete definition of sched_yield()):

The sched_yield() function shall force the running thread to
relinquish the processor until it again becomes the head of its
thread list.

As far as I can see, SUS doesn't specify how thread list is maintained
for non-RT scheduling policy, and implementation that immediately places
SCHED_OTHER thread that called sched_yield() back at the head of its
thread list is perfectly valid. Also valid is an implementation that
waits for 100 seconds and then places sched_yield() caller to the head
of the list, etc. Basically, while semantics of sched_yield() are well
defined for RT scheduling policy, for SCHED_OTHER policy standard leaves
it implementation defined.

  
  -- 
-- Howard Chu

Nikita.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Nick Piggin

Howard Chu wrote:

Lee Revell wrote:


 On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
 But I also found that I needed to add a new yield(), to work around
 yet another unexpected issue on this system - we have a number of
 threads waiting on a condition variable, and the thread holding the
 mutex signals the var, unlocks the mutex, and then immediately
 relocks it. The expectation here is that upon unlocking the mutex,
 the calling thread would block while some waiting thread (that just
 got signaled) would get to run. In fact what happened is that the
 calling thread unlocked and relocked the mutex without allowing any
 of the waiting threads to run. In this case the only solution was
 to insert a yield() after the mutex_unlock().

 That's exactly the behavior I would expect.  Why would you expect
 unlocking a mutex to cause a reschedule, if the calling thread still
 has timeslice left?



That's beside the point. Folks are making an assertion that 
sched_yield() is meaningless; this example demonstrates that there are 
cases where sched_yield() is essential.




The point is, with SCHED_OTHER scheduling, sched_yield() need not
do anything. It may not let any other tasks run.

The fact that it does on Linux is because we do attempt to do
something expected... but the simple matter is that you can't realy
on it to do what you expect.

I'm not sure exactly how you would solve the above problem, but I'm
sure it can be achieved using mutexes (for example, you could have
a queue where every thread waits on its own private mutex) but I
don't do much userspace C programming sorry.

Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Robert Hancock

Howard Chu wrote:
I'll note that we removed a number of the yield calls (that were in 
OpenLDAP 2.2) for the 2.3 release, because I found that they were 
redundant and causing unnecessary delays. My own test system is running 
on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64 distro), 
and OpenLDAP 2.3 runs perfectly well here, now that those redundant 
calls have been removed. But I also found that I needed to add a new 
yield(), to work around yet another unexpected issue on this system - we 
have a number of threads waiting on a condition variable, and the thread 
holding the mutex signals the var, unlocks the mutex, and then 
immediately relocks it. The expectation here is that upon unlocking the 
mutex, the calling thread would block while some waiting thread (that 
just got signaled) would get to run. In fact what happened is that the 
calling thread unlocked and relocked the mutex without allowing any of 
the waiting threads to run. In this case the only solution was to insert 
a yield() after the mutex_unlock(). So again, for those of you claiming 
oh, all you need to do is use a condition variable or any of the other 
POSIX synchronization primitives - yes, that's a nice theory, but 
reality says otherwise.


I encountered a similar issue with some software that I wrote, and used 
a similar workaround, however this was basically because there wasn't 
enough time available at the time to redesign things to work properly. 
The problem here is essentially caused by the fact that the mutex is 
being locked for an excessively large proportion of the time and not 
letting other threads in. In the case I am thinking of, posting the 
messages to the thread that was hogging the mutex via a signaling queue 
would have been a better solution than using yield and having correct 
operation depend on undefined parts of thread scheduling behavior..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Nick Piggin

Robert Hancock wrote:



I fail to see how sched_yield is going to be very helpful in this 
situation. Since that call can sleep from a range of time ranging from 
zero to a long time, it's going to give unpredictable results.




Well, not sleep technically, but yield the CPU for some undefined
amount of time.

It seems to me that this sort of thing is why we have POSIX pthread 
synchronization primitives.. sched_yield is basically there for a 
process to indicate that "what I'm doing doesn't matter much, let other 
stuff run". Any other use of it generally constitutes some kind of hack.




In SCHED_OTHER mode, you're right, sched_yield is basically meaningless.

In a realtime system, there is a very well defined and probably useful
behaviour.

Eg. If 2 SCHED_FIFO processes are running at the same priority, One can
call sched_yield to deterministically give the CPU to the other guy.

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Robert Hancock

Howard Chu wrote:
You assume that spinlocks are the only reason a developer may want to 
yield the processor. This assumption is unfounded. Case in point - the 
primary backend in OpenLDAP uses a transactional database with 
page-level locking of its data structures to provide high levels of 
concurrency. It is the nature of such a system to encounter deadlocks 
over the normal course of operations. When a deadlock is detected, some 
thread must be chosen (by one of a variety of algorithms) to abort its 
transaction, in order to allow other operations to proceed to 
completion. In this situation, the chosen thread must get control of the 
CPU long enough to clean itself up, and then it must yield the CPU in 
order to allow any other competing threads to complete their 
transaction. The thread with the aborted transaction relinquishes all of 
its locks and then waits to get another shot at the CPU to try 
everything over again. Again, this is all fundamental to the nature of 
transactional programming. If the 2.6 kernel makes this programming 
model unreasonably slow, then quite simply this kernel is not viable as 
a database platform.


I fail to see how sched_yield is going to be very helpful in this 
situation. Since that call can sleep from a range of time ranging from 
zero to a long time, it's going to give unpredictable results.


It seems to me that this sort of thing is why we have POSIX pthread 
synchronization primitives.. sched_yield is basically there for a 
process to indicate that "what I'm doing doesn't matter much, let other 
stuff run". Any other use of it generally constitutes some kind of hack.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Howard Chu

Nikita Danilov wrote:

 Howard Chu <[EMAIL PROTECTED]> writes:
> concurrency. It is the nature of such a system to encounter
> deadlocks over the normal course of operations. When a deadlock is
> detected, some thread must be chosen (by one of a variety of
> algorithms) to abort its transaction, in order to allow other
> operations to proceed to completion. In this situation, the chosen
> thread must get control of the CPU long enough to clean itself up,



 What prevents transaction monitor from using, say, condition
 variables to "yield cpu"? That would have an additional advantage of
 blocking thread precisely until specific event occurs, instead of
 blocking for some vague indeterminate load and platform dependent
 amount of time.


Condition variables offer no control over which thread is waken up. 
We're wandering into the design of the SleepyCat BerkeleyDB library 
here, and we don't exert any control over that either. BerkeleyDB 
doesn't appear to use pthread condition variables; it seems to construct 
its own synchronization mechanisms on top of mutexes (and yield calls). 
In this specific example, we use whatever BerkeleyDB provides and we're 
certainly not about to write our own transactional embedded database 
engine just for this.

--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Howard Chu

Chris Wedgwood wrote:

 On Thu, Aug 18, 2005 at 11:03:45PM -0700, Howard Chu wrote:
> If the 2.6 kernel makes this programming model unreasonably slow,
> then quite simply this kernel is not viable as a database platform.



 Pretty much everyone else manages to make it work.


And this contributes to the discussion how? Pretty much every other 
Unix-ish operating system manages to make scheduling with nice'd 
processes work. If you really want to get into what "everyone else 
manages to make work" you're in for a rough ride.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Nikita Danilov
Howard Chu <[EMAIL PROTECTED]> writes:

[...]

> concurrency. It is the nature of such a system to encounter deadlocks
> over the normal course of operations. When a deadlock is detected, some
> thread must be chosen (by one of a variety of algorithms) to abort its
> transaction, in order to allow other operations to proceed to
> completion. In this situation, the chosen thread must get control of the
> CPU long enough to clean itself up,

What prevents transaction monitor from using, say, condition variables
to "yield cpu"? That would have an additional advantage of blocking
thread precisely until specific event occurs, instead of blocking for
some vague indeterminate load and platform dependent amount of time.

> and then it must yield the CPU in
> order to allow any other competing threads to complete their
> transaction.

Again, this sounds like thing doable with standard POSIX synchronization
primitives.

>
> -- 
>   -- Howard Chu

Nikita.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Chris Wedgwood
On Thu, Aug 18, 2005 at 11:03:45PM -0700, Howard Chu wrote:

> If the 2.6 kernel makes this programming model unreasonably slow,
> then quite simply this kernel is not viable as a database platform.

Pretty much everyone else manages to make it work.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Nick Piggin

Hi Howard,

Thanks for joining the discussion. One request, if I may,
can you retain the CC list on posts please?

Howard Chu wrote:
>

 AFAIKS, sched_yield should only really be used by realtime
 applications that know exactly what they're doing.



pthread_yield() was deleted from the POSIX threads drafts years ago. 
sched_yield() is the officially supported API, and OpenLDAP is using it 
for the documented purpose. Anyone who says "applications shouldn't be 
using sched_yield()" doesn't know what they're talking about.




Linux's SCHED_OTHER policy offers static priorities in the range [0..0].
I think anything else would be a bug, because from my reading of the
standards, a process with a higher static priority shall always preempt
a process with a lower priority.

And SCHED_OTHER simply doesn't work that way.

So sched_yield() from a SCHED_OTHER task is free to basically do anything
at all. Is that the kind of behaviour you had in mind?


 It's really more a feature than a bug that it breaks so easily
 because they should be really using futexes instead, which have much
 better behaviour than any sched_yield ever could (they will directly
 wake up another process waiting for the lock and avoid the thundering
 herd for contended locks)



You assume that spinlocks are the only reason a developer may want to 
yield the processor. This assumption is unfounded. Case in point - the 
primary backend in OpenLDAP uses a transactional database with 
page-level locking of its data structures to provide high levels of 
concurrency. It is the nature of such a system to encounter deadlocks 
over the normal course of operations. When a deadlock is detected, some 
thread must be chosen (by one of a variety of algorithms) to abort its 
transaction, in order to allow other operations to proceed to 
completion. In this situation, the chosen thread must get control of the 
CPU long enough to clean itself up, and then it must yield the CPU in 
order to allow any other competing threads to complete their 
transaction. The thread with the aborted transaction relinquishes all of 
its locks and then waits to get another shot at the CPU to try 
everything over again. Again, this is all fundamental to the nature of 


You didn't explain why you can't use a mutex to do this. From
your brief description, it seems like a mutex might just do
the job nicely.

transactional programming. If the 2.6 kernel makes this programming 
model unreasonably slow, then quite simply this kernel is not viable as 
a database platform.




Actually it should still be fast. It may yield excessive CPU to
other tasks (including those that are reniced). You didn't rely
on sched_yield providing some semantics about not doing such a
thing, did you?

Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Nick Piggin

Hi Howard,

Thanks for joining the discussion. One request, if I may,
can you retain the CC list on posts please?

Howard Chu wrote:


 AFAIKS, sched_yield should only really be used by realtime
 applications that know exactly what they're doing.



pthread_yield() was deleted from the POSIX threads drafts years ago. 
sched_yield() is the officially supported API, and OpenLDAP is using it 
for the documented purpose. Anyone who says applications shouldn't be 
using sched_yield() doesn't know what they're talking about.




Linux's SCHED_OTHER policy offers static priorities in the range [0..0].
I think anything else would be a bug, because from my reading of the
standards, a process with a higher static priority shall always preempt
a process with a lower priority.

And SCHED_OTHER simply doesn't work that way.

So sched_yield() from a SCHED_OTHER task is free to basically do anything
at all. Is that the kind of behaviour you had in mind?


 It's really more a feature than a bug that it breaks so easily
 because they should be really using futexes instead, which have much
 better behaviour than any sched_yield ever could (they will directly
 wake up another process waiting for the lock and avoid the thundering
 herd for contended locks)



You assume that spinlocks are the only reason a developer may want to 
yield the processor. This assumption is unfounded. Case in point - the 
primary backend in OpenLDAP uses a transactional database with 
page-level locking of its data structures to provide high levels of 
concurrency. It is the nature of such a system to encounter deadlocks 
over the normal course of operations. When a deadlock is detected, some 
thread must be chosen (by one of a variety of algorithms) to abort its 
transaction, in order to allow other operations to proceed to 
completion. In this situation, the chosen thread must get control of the 
CPU long enough to clean itself up, and then it must yield the CPU in 
order to allow any other competing threads to complete their 
transaction. The thread with the aborted transaction relinquishes all of 
its locks and then waits to get another shot at the CPU to try 
everything over again. Again, this is all fundamental to the nature of 


You didn't explain why you can't use a mutex to do this. From
your brief description, it seems like a mutex might just do
the job nicely.

transactional programming. If the 2.6 kernel makes this programming 
model unreasonably slow, then quite simply this kernel is not viable as 
a database platform.




Actually it should still be fast. It may yield excessive CPU to
other tasks (including those that are reniced). You didn't rely
on sched_yield providing some semantics about not doing such a
thing, did you?

Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Chris Wedgwood
On Thu, Aug 18, 2005 at 11:03:45PM -0700, Howard Chu wrote:

 If the 2.6 kernel makes this programming model unreasonably slow,
 then quite simply this kernel is not viable as a database platform.

Pretty much everyone else manages to make it work.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Nikita Danilov
Howard Chu [EMAIL PROTECTED] writes:

[...]

 concurrency. It is the nature of such a system to encounter deadlocks
 over the normal course of operations. When a deadlock is detected, some
 thread must be chosen (by one of a variety of algorithms) to abort its
 transaction, in order to allow other operations to proceed to
 completion. In this situation, the chosen thread must get control of the
 CPU long enough to clean itself up,

What prevents transaction monitor from using, say, condition variables
to yield cpu? That would have an additional advantage of blocking
thread precisely until specific event occurs, instead of blocking for
some vague indeterminate load and platform dependent amount of time.

 and then it must yield the CPU in
 order to allow any other competing threads to complete their
 transaction.

Again, this sounds like thing doable with standard POSIX synchronization
primitives.


 -- 
   -- Howard Chu

Nikita.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Howard Chu

Chris Wedgwood wrote:

 On Thu, Aug 18, 2005 at 11:03:45PM -0700, Howard Chu wrote:
 If the 2.6 kernel makes this programming model unreasonably slow,
 then quite simply this kernel is not viable as a database platform.



 Pretty much everyone else manages to make it work.


And this contributes to the discussion how? Pretty much every other 
Unix-ish operating system manages to make scheduling with nice'd 
processes work. If you really want to get into what everyone else 
manages to make work you're in for a rough ride.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Howard Chu

Nikita Danilov wrote:

 Howard Chu [EMAIL PROTECTED] writes:
 concurrency. It is the nature of such a system to encounter
 deadlocks over the normal course of operations. When a deadlock is
 detected, some thread must be chosen (by one of a variety of
 algorithms) to abort its transaction, in order to allow other
 operations to proceed to completion. In this situation, the chosen
 thread must get control of the CPU long enough to clean itself up,



 What prevents transaction monitor from using, say, condition
 variables to yield cpu? That would have an additional advantage of
 blocking thread precisely until specific event occurs, instead of
 blocking for some vague indeterminate load and platform dependent
 amount of time.


Condition variables offer no control over which thread is waken up. 
We're wandering into the design of the SleepyCat BerkeleyDB library 
here, and we don't exert any control over that either. BerkeleyDB 
doesn't appear to use pthread condition variables; it seems to construct 
its own synchronization mechanisms on top of mutexes (and yield calls). 
In this specific example, we use whatever BerkeleyDB provides and we're 
certainly not about to write our own transactional embedded database 
engine just for this.

--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sunhttp://highlandsun.com/hyc
 OpenLDAP Core Teamhttp://www.openldap.org/project/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Robert Hancock

Howard Chu wrote:
You assume that spinlocks are the only reason a developer may want to 
yield the processor. This assumption is unfounded. Case in point - the 
primary backend in OpenLDAP uses a transactional database with 
page-level locking of its data structures to provide high levels of 
concurrency. It is the nature of such a system to encounter deadlocks 
over the normal course of operations. When a deadlock is detected, some 
thread must be chosen (by one of a variety of algorithms) to abort its 
transaction, in order to allow other operations to proceed to 
completion. In this situation, the chosen thread must get control of the 
CPU long enough to clean itself up, and then it must yield the CPU in 
order to allow any other competing threads to complete their 
transaction. The thread with the aborted transaction relinquishes all of 
its locks and then waits to get another shot at the CPU to try 
everything over again. Again, this is all fundamental to the nature of 
transactional programming. If the 2.6 kernel makes this programming 
model unreasonably slow, then quite simply this kernel is not viable as 
a database platform.


I fail to see how sched_yield is going to be very helpful in this 
situation. Since that call can sleep from a range of time ranging from 
zero to a long time, it's going to give unpredictable results.


It seems to me that this sort of thing is why we have POSIX pthread 
synchronization primitives.. sched_yield is basically there for a 
process to indicate that what I'm doing doesn't matter much, let other 
stuff run. Any other use of it generally constitutes some kind of hack.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Nick Piggin

Robert Hancock wrote:



I fail to see how sched_yield is going to be very helpful in this 
situation. Since that call can sleep from a range of time ranging from 
zero to a long time, it's going to give unpredictable results.




Well, not sleep technically, but yield the CPU for some undefined
amount of time.

It seems to me that this sort of thing is why we have POSIX pthread 
synchronization primitives.. sched_yield is basically there for a 
process to indicate that what I'm doing doesn't matter much, let other 
stuff run. Any other use of it generally constitutes some kind of hack.




In SCHED_OTHER mode, you're right, sched_yield is basically meaningless.

In a realtime system, there is a very well defined and probably useful
behaviour.

Eg. If 2 SCHED_FIFO processes are running at the same priority, One can
call sched_yield to deterministically give the CPU to the other guy.

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-18 Thread Bernardo Innocenti
Andi Kleen wrote:
> Bernardo Innocenti <[EMAIL PROTECTED]> writes:
> 
> It's really more a feature than a bug that it breaks so easily
> because they should be really using futexes instead, which
> have much better behaviour than any sched_yield ever could
> (they will directly wake up another process waiting for the
> lock and avoid the thundering herd for contended locks) 

Actually, I believe they should be using pthread synchronization
primitives instead of relying on Linux-specific functionality.
Glibc already uses futexes internally, so it's almost as efficient.

I've already suggested this to the OpenLDAP people, but with
my limited knowledge of slapd threading requirements, there
may well be a very good reason for busy-waiting with
sched_yield().  Waiting for their answer.

-- 
  // Bernardo Innocenti - Develer S.r.l., R dept.
\X/  http://www.develer.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-18 Thread Andi Kleen
Bernardo Innocenti <[EMAIL PROTECTED]> writes:

It's really more a feature than a bug that it breaks so easily
because they should be really using futexes instead, which
have much better behaviour than any sched_yield ever could
(they will directly wake up another process waiting for the
lock and avoid the thundering herd for contended locks) 

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-18 Thread Bernardo Innocenti
Nick Piggin wrote:

> We class the SCHED_OTHER policy as having a single priority, which
> I believe is allowed (and even makes good sense, because dynamic
> and even nice priorities aren't really well defined).
> 
> That also makes our sched_yield() behaviour correct.
> 
> AFAIKS, sched_yield should only really be used by realtime
> applications that know exactly what they're doing.

I'm pretty sure this has already been discussed in the
past, but I fail to see why this new behavior of
sched_yield() would be more correct.

In the OpenLDAP bug discussion, one of the developers
considers this a Linux quirk needing a workaround, not
a real bug in OpenLDAP.

As I understand it, the old behavior was to push the
yielding process to the end of the queue for processes
with the same niceness.  This is somewhat closer to
the (vague) definition in the POSIX man pages:

 The sched_yield() function shall force the running
 thread to relinquish the processor until it again
 becomes the head of its thread list. It takes no arguments.

Pushing the process far behind in the queue, even after
niced CPU crunchers, appears a bit extreme.  It seems
most programs expect sched_yield() to only reschedule
the calling thread wrt its sibling threads, to be used
to implement do-it-yourself spinlocks and the like.

-- 
  // Bernardo Innocenti - Develer S.r.l., R dept.
\X/  http://www.develer.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-18 Thread Maciej Soltysiak
Hello Con,

Thursday, August 18, 2005, 2:47:25 AM, you wrote:
> sched_yield behaviour changed in 2.5 series more than 3 years ago and
> applications that use this as a locking primitive should be updated.
I remember open office had a problem with excessive use of sched_yield()
during 2.5. I guess they changed it but I have not checked.
Does anyone know ?

Back then oo was having serious latency problems on 2.5

Regards,
Maciej


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-18 Thread Maciej Soltysiak
Hello Con,

Thursday, August 18, 2005, 2:47:25 AM, you wrote:
 sched_yield behaviour changed in 2.5 series more than 3 years ago and
 applications that use this as a locking primitive should be updated.
I remember open office had a problem with excessive use of sched_yield()
during 2.5. I guess they changed it but I have not checked.
Does anyone know ?

Back then oo was having serious latency problems on 2.5

Regards,
Maciej


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-18 Thread Andi Kleen
Bernardo Innocenti [EMAIL PROTECTED] writes:

It's really more a feature than a bug that it breaks so easily
because they should be really using futexes instead, which
have much better behaviour than any sched_yield ever could
(they will directly wake up another process waiting for the
lock and avoid the thundering herd for contended locks) 

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-18 Thread Bernardo Innocenti
Nick Piggin wrote:

 We class the SCHED_OTHER policy as having a single priority, which
 I believe is allowed (and even makes good sense, because dynamic
 and even nice priorities aren't really well defined).
 
 That also makes our sched_yield() behaviour correct.
 
 AFAIKS, sched_yield should only really be used by realtime
 applications that know exactly what they're doing.

I'm pretty sure this has already been discussed in the
past, but I fail to see why this new behavior of
sched_yield() would be more correct.

In the OpenLDAP bug discussion, one of the developers
considers this a Linux quirk needing a workaround, not
a real bug in OpenLDAP.

As I understand it, the old behavior was to push the
yielding process to the end of the queue for processes
with the same niceness.  This is somewhat closer to
the (vague) definition in the POSIX man pages:

 The sched_yield() function shall force the running
 thread to relinquish the processor until it again
 becomes the head of its thread list. It takes no arguments.

Pushing the process far behind in the queue, even after
niced CPU crunchers, appears a bit extreme.  It seems
most programs expect sched_yield() to only reschedule
the calling thread wrt its sibling threads, to be used
to implement do-it-yourself spinlocks and the like.

-- 
  // Bernardo Innocenti - Develer S.r.l., RD dept.
\X/  http://www.develer.com/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-18 Thread Bernardo Innocenti
Andi Kleen wrote:
 Bernardo Innocenti [EMAIL PROTECTED] writes:
 
 It's really more a feature than a bug that it breaks so easily
 because they should be really using futexes instead, which
 have much better behaviour than any sched_yield ever could
 (they will directly wake up another process waiting for the
 lock and avoid the thundering herd for contended locks) 

Actually, I believe they should be using pthread synchronization
primitives instead of relying on Linux-specific functionality.
Glibc already uses futexes internally, so it's almost as efficient.

I've already suggested this to the OpenLDAP people, but with
my limited knowledge of slapd threading requirements, there
may well be a very good reason for busy-waiting with
sched_yield().  Waiting for their answer.

-- 
  // Bernardo Innocenti - Develer S.r.l., RD dept.
\X/  http://www.develer.com/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-17 Thread Nick Piggin

Joseph Fannin wrote:


On Thu, Aug 18, 2005 at 02:50:16AM +0200, Bernardo Innocenti wrote:



The relative timestamp reveals that slapd is spending 50ms
after yielding.  Meanwhile, GCC is probably being scheduled
for a whole quantum.

Reading the man-page of sched_yield() it seems this isn't
the correct behavior:

  Note: If the current process is the only process in the
  highest priority list at that time, this process will
  continue to run after a call to sched_yield.



  The behavior of sched_yield changed for 2.6.  I suppose the man
page didn't get updated.




We class the SCHED_OTHER policy as having a single priority, which
I believe is allowed (and even makes good sense, because dynamic
and even nice priorities aren't really well defined).

That also makes our sched_yield() behaviour correct.

AFAIKS, sched_yield should only really be used by realtime
applications that know exactly what they're doing.


Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-17 Thread Bernardo Innocenti
Joseph Fannin wrote:

>The behavior of sched_yield changed for 2.6.  I suppose the man
> page didn't get updated.

Now I remember reading about that on LWN or maybe KernelTraffic.
Thanks!

>>I also think OpenLDAP is wrong.  First, it should be calling
>>pthread_yield() because slapd is a multithreading process
>>and it just wants to run the other threads.  See:
> 
> Is it possible that this problem has been noticed and fixed
> already?

The OpenLDAP 2.3.5 source still looks like this.
I've filed a report in OpenLDAP's issue tracker:

 http://www.openldap.org/its/index.cgi/Incoming?id=3950;page=2

-- 
  // Bernardo Innocenti - Develer S.r.l., R dept.
\X/  http://www.develer.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-17 Thread Joseph Fannin
On Thu, Aug 18, 2005 at 02:50:16AM +0200, Bernardo Innocenti wrote:

> The relative timestamp reveals that slapd is spending 50ms
> after yielding.  Meanwhile, GCC is probably being scheduled
> for a whole quantum.
>
> Reading the man-page of sched_yield() it seems this isn't
> the correct behavior:
>
>Note: If the current process is the only process in the
>highest priority list at that time, this process will
>continue to run after a call to sched_yield.

   The behavior of sched_yield changed for 2.6.  I suppose the man
page didn't get updated.

>From linux/Documentation/post-halloween.txt:

| - The behavior of sched_yield() changed a lot.  A task that uses
|   this system call should now expect to sleep for possibly a very
|   long time.  Tasks that do not really desire to give up the
|   processor for a while should probably not make heavy use of this
|   function.  Unfortunately, some GUI programs (like Open Office)
|   do make excessive use of this call and under load their
|   performance is poor.  It seems this new 2.6 behavior is optimal
|   but some user-space applications may need fixing.

This is pretty much all I know about it; I just thought I'd point
it out.

> I also think OpenLDAP is wrong.  First, it should be calling
> pthread_yield() because slapd is a multithreading process
> and it just wants to run the other threads.  See:

Is it possible that this problem has been noticed and fixed
already?

--
Joseph Fannin
[EMAIL PROTECTED]

 /* So there I am, in the middle of my `netfilter-is-wonderful'
talk in Sydney, and someone asks `What happens if you try
to enlarge a 64k packet here?'. I think I said something
eloquent like `fuck'. - RR */
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-17 Thread Con Kolivas
On Thu, 18 Aug 2005 10:50 am, Bernardo Innocenti wrote:
> Hello,
>
> I've been investigating a performance problem on a
> server using OpenLDAP 2.2.26 for nss resolution and
> running kernel 2.6.12.
>
> When a CPU bound process such as GCC is running in the
> background (even at nice 10), many trivial commands such
> as "su" or "groups" become extremely slow and take a few
> seconds to complete.
>
> strace revealed that data exchange over the slapd socket
> was where most of the time was spent.  Looking at the
> slapd side, I see several calls to sched_yield() like this:
>
>
> [pid  8780]  0.33 stat64("gidNumber.dbb", 0xb7b3ebcc) = -1 EACCES
> (Permission denied) [pid  8780]  0.59 pread(20,
> "\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\2\0\344\17\2\3"..., 4096, 4096) =
> 4096 [pid  8780]  0.83 pread(20,
> "\0\0\0\0\1\0\0\0\4\0\0\0\3\0\0\0\0\0\0\0\222\0<\7\1\5\370"..., 4096,
> 16384) = 4096 [pid  8780]  0.78 time(NULL)= 1124322520
> [pid  8780]  0.66 pread(11,
> "\0\0\0\0\1\0\0\0\250\0\0\0\231\0\0\0\235\0\0\0\16\"..., 4096, 688128)
> = 4096 [pid  8780]  0.000241 write(19,
> "0e\2\1\3d`\4$cn=bernie,ou=group,dc=d"..., 103) = 103 [pid  8780] 
> 0.000137 sched_yield( 
> [pid  8781]  0.050020 <... sched_yield resumed> ) = 0
> [pid  8780]  0.25 <... sched_yield resumed> ) = 0
> [pid  8781]  0.60 futex(0x925ab20, FUTEX_WAIT, 33, NULL  ...> [pid  8780]  0.26 write(19, "0\f\2\1\3e\7\n\1\0\4\0\4\0", 14)
> = 14 [pid  8774]  0.000774 <... select resumed> ) = 1 (in [19])
>
>
> The relative timestamp reveals that slapd is spending 50ms
> after yielding.  Meanwhile, GCC is probably being scheduled
> for a whole quantum.
>
> Reading the man-page of sched_yield() it seems this isn't
> the correct behavior:
>
>Note: If the current process is the only process in the
>highest priority list at that time, this process will
>continue to run after a call to sched_yield.
>
> I also think OpenLDAP is wrong.  First, it should be calling
> pthread_yield() because slapd is a multithreading process
> and it just wants to run the other threads.  See:

sched_yield behaviour changed in 2.5 series more than 3 years ago and 
applications that use this as a locking primitive should be updated.

Cheers,
Con
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


sched_yield() makes OpenLDAP slow

2005-08-17 Thread Bernardo Innocenti
Hello,

I've been investigating a performance problem on a
server using OpenLDAP 2.2.26 for nss resolution and
running kernel 2.6.12.

When a CPU bound process such as GCC is running in the
background (even at nice 10), many trivial commands such
as "su" or "groups" become extremely slow and take a few
seconds to complete.

strace revealed that data exchange over the slapd socket
was where most of the time was spent.  Looking at the
slapd side, I see several calls to sched_yield() like this:


[pid  8780]  0.33 stat64("gidNumber.dbb", 0xb7b3ebcc) = -1 EACCES 
(Permission denied)
[pid  8780]  0.59 pread(20, 
"\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\2\0\344\17\2\3"..., 4096, 4096) = 4096
[pid  8780]  0.83 pread(20, 
"\0\0\0\0\1\0\0\0\4\0\0\0\3\0\0\0\0\0\0\0\222\0<\7\1\5\370"..., 4096, 16384) = 
4096
[pid  8780]  0.78 time(NULL)= 1124322520
[pid  8780]  0.66 pread(11, 
"\0\0\0\0\1\0\0\0\250\0\0\0\231\0\0\0\235\0\0\0\16\"..., 4096, 688128) = 
4096
[pid  8780]  0.000241 write(19, "0e\2\1\3d`\4$cn=bernie,ou=group,dc=d"..., 
103) = 103
[pid  8780]  0.000137 sched_yield( 
[pid  8781]  0.050020 <... sched_yield resumed> ) = 0
[pid  8780]  0.25 <... sched_yield resumed> ) = 0
[pid  8781]  0.60 futex(0x925ab20, FUTEX_WAIT, 33, NULL 
[pid  8780]  0.26 write(19, "0\f\2\1\3e\7\n\1\0\4\0\4\0", 14) = 14
[pid  8774]  0.000774 <... select resumed> ) = 1 (in [19])


The relative timestamp reveals that slapd is spending 50ms
after yielding.  Meanwhile, GCC is probably being scheduled
for a whole quantum.

Reading the man-page of sched_yield() it seems this isn't
the correct behavior:

   Note: If the current process is the only process in the
   highest priority list at that time, this process will
   continue to run after a call to sched_yield.

I also think OpenLDAP is wrong.  First, it should be calling
pthread_yield() because slapd is a multithreading process
and it just wants to run the other threads.  See:

int
ldap_pvt_thread_yield( void )
{
#if HAVE_THR_YIELD
return thr_yield();

#elif HAVE_PTHREADS == 10
return sched_yield();

#elif defined(_POSIX_THREAD_IS_GNU_PTH)
sched_yield();
return 0;

#elif HAVE_PTHREADS == 6
pthread_yield(NULL);
return 0;
#else
pthread_yield();
return 0;
#endif
}


-- 
  // Bernardo Innocenti - Develer S.r.l., R dept.
\X/  http://www.develer.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


sched_yield() makes OpenLDAP slow

2005-08-17 Thread Bernardo Innocenti
Hello,

I've been investigating a performance problem on a
server using OpenLDAP 2.2.26 for nss resolution and
running kernel 2.6.12.

When a CPU bound process such as GCC is running in the
background (even at nice 10), many trivial commands such
as su or groups become extremely slow and take a few
seconds to complete.

strace revealed that data exchange over the slapd socket
was where most of the time was spent.  Looking at the
slapd side, I see several calls to sched_yield() like this:


[pid  8780]  0.33 stat64(gidNumber.dbb, 0xb7b3ebcc) = -1 EACCES 
(Permission denied)
[pid  8780]  0.59 pread(20, 
\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\2\0\344\17\2\3..., 4096, 4096) = 4096
[pid  8780]  0.83 pread(20, 
\0\0\0\0\1\0\0\0\4\0\0\0\3\0\0\0\0\0\0\0\222\0\7\1\5\370..., 4096, 16384) = 
4096
[pid  8780]  0.78 time(NULL)= 1124322520
[pid  8780]  0.66 pread(11, 
\0\0\0\0\1\0\0\0\250\0\0\0\231\0\0\0\235\0\0\0\16\..., 4096, 688128) = 
4096
[pid  8780]  0.000241 write(19, 0e\2\1\3d`\4$cn=bernie,ou=group,dc=d..., 
103) = 103
[pid  8780]  0.000137 sched_yield( unfinished ...
[pid  8781]  0.050020 ... sched_yield resumed ) = 0
[pid  8780]  0.25 ... sched_yield resumed ) = 0
[pid  8781]  0.60 futex(0x925ab20, FUTEX_WAIT, 33, NULL unfinished ...
[pid  8780]  0.26 write(19, 0\f\2\1\3e\7\n\1\0\4\0\4\0, 14) = 14
[pid  8774]  0.000774 ... select resumed ) = 1 (in [19])


The relative timestamp reveals that slapd is spending 50ms
after yielding.  Meanwhile, GCC is probably being scheduled
for a whole quantum.

Reading the man-page of sched_yield() it seems this isn't
the correct behavior:

   Note: If the current process is the only process in the
   highest priority list at that time, this process will
   continue to run after a call to sched_yield.

I also think OpenLDAP is wrong.  First, it should be calling
pthread_yield() because slapd is a multithreading process
and it just wants to run the other threads.  See:

int
ldap_pvt_thread_yield( void )
{
#if HAVE_THR_YIELD
return thr_yield();

#elif HAVE_PTHREADS == 10
return sched_yield();

#elif defined(_POSIX_THREAD_IS_GNU_PTH)
sched_yield();
return 0;

#elif HAVE_PTHREADS == 6
pthread_yield(NULL);
return 0;
#else
pthread_yield();
return 0;
#endif
}


-- 
  // Bernardo Innocenti - Develer S.r.l., RD dept.
\X/  http://www.develer.com/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-17 Thread Con Kolivas
On Thu, 18 Aug 2005 10:50 am, Bernardo Innocenti wrote:
 Hello,

 I've been investigating a performance problem on a
 server using OpenLDAP 2.2.26 for nss resolution and
 running kernel 2.6.12.

 When a CPU bound process such as GCC is running in the
 background (even at nice 10), many trivial commands such
 as su or groups become extremely slow and take a few
 seconds to complete.

 strace revealed that data exchange over the slapd socket
 was where most of the time was spent.  Looking at the
 slapd side, I see several calls to sched_yield() like this:


 [pid  8780]  0.33 stat64(gidNumber.dbb, 0xb7b3ebcc) = -1 EACCES
 (Permission denied) [pid  8780]  0.59 pread(20,
 \0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\2\0\344\17\2\3..., 4096, 4096) =
 4096 [pid  8780]  0.83 pread(20,
 \0\0\0\0\1\0\0\0\4\0\0\0\3\0\0\0\0\0\0\0\222\0\7\1\5\370..., 4096,
 16384) = 4096 [pid  8780]  0.78 time(NULL)= 1124322520
 [pid  8780]  0.66 pread(11,
 \0\0\0\0\1\0\0\0\250\0\0\0\231\0\0\0\235\0\0\0\16\..., 4096, 688128)
 = 4096 [pid  8780]  0.000241 write(19,
 0e\2\1\3d`\4$cn=bernie,ou=group,dc=d..., 103) = 103 [pid  8780] 
 0.000137 sched_yield( unfinished ...
 [pid  8781]  0.050020 ... sched_yield resumed ) = 0
 [pid  8780]  0.25 ... sched_yield resumed ) = 0
 [pid  8781]  0.60 futex(0x925ab20, FUTEX_WAIT, 33, NULL unfinished
 ... [pid  8780]  0.26 write(19, 0\f\2\1\3e\7\n\1\0\4\0\4\0, 14)
 = 14 [pid  8774]  0.000774 ... select resumed ) = 1 (in [19])


 The relative timestamp reveals that slapd is spending 50ms
 after yielding.  Meanwhile, GCC is probably being scheduled
 for a whole quantum.

 Reading the man-page of sched_yield() it seems this isn't
 the correct behavior:

Note: If the current process is the only process in the
highest priority list at that time, this process will
continue to run after a call to sched_yield.

 I also think OpenLDAP is wrong.  First, it should be calling
 pthread_yield() because slapd is a multithreading process
 and it just wants to run the other threads.  See:

sched_yield behaviour changed in 2.5 series more than 3 years ago and 
applications that use this as a locking primitive should be updated.

Cheers,
Con
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-17 Thread Joseph Fannin
On Thu, Aug 18, 2005 at 02:50:16AM +0200, Bernardo Innocenti wrote:

 The relative timestamp reveals that slapd is spending 50ms
 after yielding.  Meanwhile, GCC is probably being scheduled
 for a whole quantum.

 Reading the man-page of sched_yield() it seems this isn't
 the correct behavior:

Note: If the current process is the only process in the
highest priority list at that time, this process will
continue to run after a call to sched_yield.

   The behavior of sched_yield changed for 2.6.  I suppose the man
page didn't get updated.

From linux/Documentation/post-halloween.txt:

| - The behavior of sched_yield() changed a lot.  A task that uses
|   this system call should now expect to sleep for possibly a very
|   long time.  Tasks that do not really desire to give up the
|   processor for a while should probably not make heavy use of this
|   function.  Unfortunately, some GUI programs (like Open Office)
|   do make excessive use of this call and under load their
|   performance is poor.  It seems this new 2.6 behavior is optimal
|   but some user-space applications may need fixing.

This is pretty much all I know about it; I just thought I'd point
it out.

 I also think OpenLDAP is wrong.  First, it should be calling
 pthread_yield() because slapd is a multithreading process
 and it just wants to run the other threads.  See:

Is it possible that this problem has been noticed and fixed
already?

--
Joseph Fannin
[EMAIL PROTECTED]

 /* So there I am, in the middle of my `netfilter-is-wonderful'
talk in Sydney, and someone asks `What happens if you try
to enlarge a 64k packet here?'. I think I said something
eloquent like `fuck'. - RR */
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-17 Thread Bernardo Innocenti
Joseph Fannin wrote:

The behavior of sched_yield changed for 2.6.  I suppose the man
 page didn't get updated.

Now I remember reading about that on LWN or maybe KernelTraffic.
Thanks!

I also think OpenLDAP is wrong.  First, it should be calling
pthread_yield() because slapd is a multithreading process
and it just wants to run the other threads.  See:
 
 Is it possible that this problem has been noticed and fixed
 already?

The OpenLDAP 2.3.5 source still looks like this.
I've filed a report in OpenLDAP's issue tracker:

 http://www.openldap.org/its/index.cgi/Incoming?id=3950;page=2

-- 
  // Bernardo Innocenti - Develer S.r.l., RD dept.
\X/  http://www.develer.com/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched_yield() makes OpenLDAP slow

2005-08-17 Thread Nick Piggin

Joseph Fannin wrote:


On Thu, Aug 18, 2005 at 02:50:16AM +0200, Bernardo Innocenti wrote:



The relative timestamp reveals that slapd is spending 50ms
after yielding.  Meanwhile, GCC is probably being scheduled
for a whole quantum.

Reading the man-page of sched_yield() it seems this isn't
the correct behavior:

  Note: If the current process is the only process in the
  highest priority list at that time, this process will
  continue to run after a call to sched_yield.



  The behavior of sched_yield changed for 2.6.  I suppose the man
page didn't get updated.




We class the SCHED_OTHER policy as having a single priority, which
I believe is allowed (and even makes good sense, because dynamic
and even nice priorities aren't really well defined).

That also makes our sched_yield() behaviour correct.

AFAIKS, sched_yield should only really be used by realtime
applications that know exactly what they're doing.


Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/