Re: sched_yield() makes OpenLDAP slow
Denis Vlasenko wrote:
>> This is what I would expect if run on an otherwise idle machine.
>> sched_yield just puts you at the back of the line for runnable
>> processes, it doesn't magically cause you to go to sleep somehow.
>
> When a kernel build is occurring??? Plus `top` itself. It damn well
> does sleep while giving up the CPU. If it doesn't, it's broken.

Unless you have all of the kernel source in the buffer cache, a
concurrent kernel build will spend a fair bit of time in io_wait state.
As such, it's perfectly plausible that sched_yield keeps popping back to
the top of the 'runnable' processes.

cheers, lincoln.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: sched_yield() makes OpenLDAP slow
On Tuesday 23 August 2005 14:17, linux-os (Dick Johnson) wrote:
> On Mon, 22 Aug 2005, Robert Hancock wrote:
>> linux-os (Dick Johnson) wrote:
>>> I reported that sched_yield() wasn't working (at least as expected)
>>> back in March of 2004.
>>>
>>> for(;;)
>>>         sched_yield();
>>>
>>> ... takes 100% CPU time as reported by `top`. It should take
>>> practically 0. Somebody said that this was because `top` was
>>> broken, others said that it was because I didn't know how to
>>> code. Nevertheless, the problem was not fixed, even after
>>> scheduler changes were made for the current version.
>>
>> This is what I would expect if run on an otherwise idle machine.
>> sched_yield just puts you at the back of the line for runnable
>> processes, it doesn't magically cause you to go to sleep somehow.
>
> When a kernel build is occurring??? Plus `top` itself. It damn well
> does sleep while giving up the CPU. If it doesn't, it's broken.

top doesn't run all the time:

# strace -o top.strace -tt top
14:52:19.407958 write(1, " 758 root 16 0 104 2"..., 79) = 79
14:52:19.408318 write(1, " 759 root 16 0 100 1"..., 79) = 79
14:52:19.408659 write(1, " 760 root 16 0 100 1"..., 79) = 79
14:52:19.409001 write(1, " 761 root 18 0 2604 39"..., 74) = 74
14:52:19.409342 write(1, " 763 daemon17 0 108 1"..., 78) = 78
14:52:19.409672 write(1, " 773 root 16 0 104 2"..., 79) = 79
14:52:19.410010 write(1, " 774 root 16 0 104 2"..., 79) = 79
14:52:19.410362 write(1, " 775 root 16 0 100 1"..., 79) = 79
14:52:19.410692 write(1, " 776 root 16 0 104 2"..., 79) = 79
14:52:19.411136 write(1, " 777 daemon17 0 108 1"..., 86) = 86
14:52:19.411505 select(1, [0], NULL, NULL, {5, 0}) = 0 (Timeout)

hrrr. ps...
14:52:24.411744 time([1124797944]) = 1124797944
14:52:24.411883 lseek(4, 0, SEEK_SET) = 0
14:52:24.411957 read(4, "24822.01 18801.28\n", 1023) = 18
14:52:24.412082 access("/var/run/utmpx", F_OK) = -1 ENOENT (No such file or directory)
14:52:24.412224 open("/var/run/utmp", O_RDWR) = 8
14:52:24.412328 fcntl64(8, F_GETFD) = 0
14:52:24.412399 fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
14:52:24.412467 _llseek(8, 0, [0], SEEK_SET) = 0
14:52:24.412556 alarm(0) = 0
14:52:24.412643 rt_sigaction(SIGALRM, {0x4015a57c, [], SA_RESTORER, 0x40094ae8}, {SIG_DFL}, 8) = 0
14:52:24.412747 alarm(1) = 0

However, kernel compile shouldn't. I suggest stracing a
"for(;;) yield();" test proggy with -tt, with and without a kernel
compile in parallel, and comparing the output...

Hmm... actually, knowing that you will argue to death instead...

# cat t.c
#include <sched.h>
int main() {
        for(;;)
                sched_yield();
        return 0;
}
# gcc t.c
# strace -tt ./a.out
...
15:03:41.211324 sched_yield() = 0
15:03:41.211673 sched_yield() = 0
15:03:41.212034 sched_yield() = 0
15:03:41.212400 sched_yield() = 0
15:03:41.212749 sched_yield() = 0
15:03:41.213126 sched_yield() = 0
15:03:41.213486 sched_yield() = 0
15:03:41.213835 sched_yield() = 0
15:03:41.214220 sched_yield() = 0
15:03:41.214577 sched_yield() = 0
15:03:41.214939 sched_yield() = 0

I start "while true; do true; done" on another console...
15:03:43.314645 sched_yield() = 0
15:03:43.847644 sched_yield() = 0
15:03:43.954635 sched_yield() = 0
15:03:44.063798 sched_yield() = 0
15:03:44.171596 sched_yield() = 0
15:03:44.282624 sched_yield() = 0
15:03:44.391632 sched_yield() = 0
15:03:44.498609 sched_yield() = 0
15:03:44.605584 sched_yield() = 0
15:03:44.712538 sched_yield() = 0
15:03:44.819557 sched_yield() = 0
15:03:44.928594 sched_yield() = 0
15:03:45.040603 sched_yield() = 0
15:03:45.148545 sched_yield() = 0
15:03:45.259311 sched_yield() = 0
15:03:45.368563 sched_yield() = 0
15:03:45.476482 sched_yield() = 0
15:03:45.583568 sched_yield() = 0
15:03:45.690491 sched_yield() = 0
15:03:45.797512 sched_yield() = 0
15:03:45.906534 sched_yield() = 0
15:03:46.013545 sched_yield() = 0
15:03:46.120505 sched_yield() = 0

Ctrl-C

# uname -a
Linux firebird 2.6.12-r4 #1 SMP Sun Jul 17 13:51:47 EEST 2005 i686 unknown unknown GNU/Linux
--
vda
Re: sched_yield() makes OpenLDAP slow
On Mon, 22 Aug 2005, Robert Hancock wrote:
> linux-os (Dick Johnson) wrote:
>> I reported that sched_yield() wasn't working (at least as expected)
>> back in March of 2004.
>>
>> for(;;)
>>         sched_yield();
>>
>> ... takes 100% CPU time as reported by `top`. It should take
>> practically 0. Somebody said that this was because `top` was
>> broken, others said that it was because I didn't know how to
>> code. Nevertheless, the problem was not fixed, even after
>> scheduler changes were made for the current version.
>
> This is what I would expect if run on an otherwise idle machine.
> sched_yield just puts you at the back of the line for runnable
> processes, it doesn't magically cause you to go to sleep somehow.

When a kernel build is occurring??? Plus `top` itself. It damn well
does sleep while giving up the CPU. If it doesn't, it's broken.

> --
> Robert Hancock Saskatoon, SK, Canada
> To email, remove "nospam" from [EMAIL PROTECTED]
> Home Page: http://www.roberthancock.com/

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12.5 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
.
I apologize for the following. I tried to kill it with the above dot:

The information transmitted in this message is confidential and may be
privileged. Any review, retransmission, dissemination, or other use of
this information by persons or entities other than the intended
recipient is prohibited. If you are not the intended recipient, please
notify Analogic Corporation immediately - by replying to this message
or by sending an email to [EMAIL PROTECTED] - and destroy all copies of
this information, including any attachments, without reading or
disclosing them. Thank you.
Re: sched_yield() makes OpenLDAP slow
Florian Weimer wrote:
> * Howard Chu:
>> That's not the complete story. BerkeleyDB provides a
>> db_env_set_func_yield() hook to tell it what yield function it
>> should use when its internal locking routines need such a function.
>> If you don't set a specific hook, it just uses sleep(). The
>> OpenLDAP backend will invoke this hook during some (not necessarily
>> all) init sequences, to tell it to use the thread yield function
>> that we selected in autoconf.
>
> And this helps to increase performance substantially?

When the caller is a threaded program, yes, there is a substantial
(measurable and noticeable) difference. Given that sleep() blocks the
entire process, the difference is obvious.

>> Note that (on systems that support inter-process mutexes) a
>> BerkeleyDB database environment may be used by multiple processes
>> concurrently.
>
> Yes, I know this, and I haven't experienced that much trouble with
> deadlocks. Maybe the way you structure and access the database
> environment can be optimized for deadlock avoidance?

Maybe we already did this deadlock analysis and optimization, years ago
when we first started developing this backend? Do you think everyone
else in the world is a total fool?

>> As such, the yield function that is provided must work both for
>> threads within a single process (PTHREAD_SCOPE_PROCESS) as well as
>> between processes (PTHREAD_SCOPE_SYSTEM).
>
> If I understand you correctly, what you really need is a syscall
> along the lines "don't run me again until all threads T that share
> property X have run", where the Ts aren't necessarily in the same
> process. The kernel isn't psychic; it can't really know which
> processes to schedule to satisfy such a requirement. I don't even
> think "has joined the Berkeley DB environment" is the desired
> property, but something like "is part of this cycle in the wait-for
> graph" or something similar.

You seem to believe we're looking for special treatment for the
processes we're concerned with, and that's not true.
If the system is busy with other processes, so be it, the system is
busy. If you want better performance, you build a dedicated server and
don't let anything else make the system busy. This is the way
mission-critical services are delivered, regardless of the service. If
you're not running on a dedicated system, then your deployment must not
be mission critical, and so you shouldn't be surprised if a large gcc
run slows down some other activities in the meantime. If you have a
large nice'd job running before your normal-priority jobs get their
timeslice, then you should certainly wonder wtf the scheduler is doing,
and why your system even claims to support nice() when clearly it
doesn't mean anything on that system.

> I would have to check the Berkeley DB internals in order to tell what
> is feasible to implement. This code shouldn't be on the fast path, so
> some kernel-based synchronization is probably sufficient.

pthread_cond_wait() probably would be just fine here, but BerkeleyDB
doesn't work that way.

--
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
Florian Weimer wrote:
> * Andi Kleen:
>> Has anybody contacted the Sleepycat people with a description of the
>> problem yet?
>
> Berkeley DB does not call sched_yield, but OpenLDAP does in some
> wrapper code around the Berkeley DB backend.

That's not the complete story. BerkeleyDB provides a
db_env_set_func_yield() hook to tell it what yield function it should
use when its internal locking routines need such a function. If you
don't set a specific hook, it just uses sleep(). The OpenLDAP backend
will invoke this hook during some (not necessarily all) init sequences,
to tell it to use the thread yield function that we selected in
autoconf.

Note that (on systems that support inter-process mutexes) a BerkeleyDB
database environment may be used by multiple processes concurrently. As
such, the yield function that is provided must work both for threads
within a single process (PTHREAD_SCOPE_PROCESS) as well as between
processes (PTHREAD_SCOPE_SYSTEM). The previous comment about slapd only
needing to yield within a single process is inaccurate; since we allow
slapcat to run concurrently with slapd (to allow hot backups) we need
BerkeleyDB's locking/yield functions to work in System scope.

--
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
Nikita Danilov wrote:
> Howard Chu writes:
>> That's beside the point. Folks are making an assertion that
>> sched_yield() is meaningless; this example demonstrates that there
>> are cases where sched_yield() is essential.
>
> It is not essential, it is non-portable. Code you described is based
> on non-portable "expectations" about thread scheduling. Linux
> implementation of pthreads fails to satisfy them. Perfectly
> reasonable. Code is then "fixed" by adding sched_yield() calls and
> introducing more non-portable assumptions. Again, there is no
> guarantee this would work on any compliant implementation. While the
> "intuitive" semantics of sched_yield() is to yield the CPU and to
> give other runnable threads their chance to run, this is _not_ what
> the standard prescribes (for non-RT threads).

Very well; it is not prescribed in the standard and it is non-portable.
Our code is broken and we will fix it. But even Dave Butenhof, Mr.
Pthreads himself, has said it is reasonable to expect sched_yield to
yield the CPU. That's what pthread_yield did in Pthreads Draft 4 (DCE
threads), and it is common knowledge that sched_yield is a direct
replacement for pthread_yield; i.e., pthread_yield() was deleted from
the spec because sched_yield fulfilled its purpose. Now you're saying
"well, technically, sched_yield doesn't have to do anything at all,"
and the letter of the spec supports your position, but anybody who's
been programming with pthreads since the DCE days "knows" that is not
the original intention.

I wonder that nobody has decided to raise this issue with the
IEEE/POSIX group and get them to issue a correction/clarification in
all this time, since the absence of specification here really impairs
the usefulness of the spec. Likewise, the fact that sched_yield() can
now cause the current process to be queued behind other processes seems
suspect, unless we know for sure that the threads are running with
PTHREAD_SCOPE_SYSTEM. (I haven't checked to see if
PTHREAD_SCOPE_PROCESS is still supported in NPTL.)
--
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
On Sat, 20 Aug 2005, Robert Hancock wrote:
> Howard Chu wrote:
>> I'll note that we removed a number of the yield calls (that were in
>> OpenLDAP 2.2) for the 2.3 release, because I found that they were
>> redundant and causing unnecessary delays. My own test system is
>> running on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64
>> distro), and OpenLDAP 2.3 runs perfectly well here, now that those
>> redundant calls have been removed. But I also found that I needed to
>> add a new yield(), to work around yet another unexpected issue on
>> this system - we have a number of threads waiting on a condition
>> variable, and the thread holding the mutex signals the var, unlocks
>> the mutex, and then immediately relocks it. The expectation here is
>> that upon unlocking the mutex, the calling thread would block while
>> some waiting thread (that just got signaled) would get to run. In
>> fact what happened is that the calling thread unlocked and relocked
>> the mutex without allowing any of the waiting threads to run. In
>> this case the only solution was to insert a yield() after the
>> mutex_unlock(). So again, for those of you claiming "oh, all you
>> need to do is use a condition variable or any of the other POSIX
>> synchronization primitives" - yes, that's a nice theory, but reality
>> says otherwise.
>
> I encountered a similar issue with some software that I wrote, and
> used a similar workaround; however, this was basically because there
> wasn't enough time available at the time to redesign things to work
> properly. The problem here is essentially caused by the fact that the
> mutex is being locked for an excessively large proportion of the time
> and not letting other threads in. In the case I am thinking of,
> posting the messages to the thread that was hogging the mutex via a
> signaling queue would have been a better solution than using yield
> and having correct operation depend on undefined parts of thread
> scheduling behavior.
> --
> Robert Hancock Saskatoon, SK, Canada
> To email, remove "nospam" from [EMAIL PROTECTED]
> Home Page: http://www.roberthancock.com/

I reported that sched_yield() wasn't working (at least as expected)
back in March of 2004.

for(;;)
        sched_yield();

... takes 100% CPU time as reported by `top`. It should take
practically 0. Somebody said that this was because `top` was broken,
others said that it was because I didn't know how to code.
Nevertheless, the problem was not fixed, even after scheduler changes
were made for the current version.

One can execute:

        usleep(0);

... instead of:

        sched_yield();

... and Linux then performs exactly like other Unixes when code is
waiting on mutexes.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12.5 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
Re: sched_yield() makes OpenLDAP slow
* Howard Chu:

>>> Has anybody contacted the Sleepycat people with a description of
>>> the problem yet?
>>
>> Berkeley DB does not call sched_yield, but OpenLDAP does in some
>> wrapper code around the Berkeley DB backend.
>
> That's not the complete story. BerkeleyDB provides a
> db_env_set_func_yield() hook to tell it what yield function it should
> use when its internal locking routines need such a function. If you
> don't set a specific hook, it just uses sleep(). The OpenLDAP backend
> will invoke this hook during some (not necessarily all) init
> sequences, to tell it to use the thread yield function that we
> selected in autoconf.

And this helps to increase performance substantially?

> Note that (on systems that support inter-process mutexes) a
> BerkeleyDB database environment may be used by multiple processes
> concurrently.

Yes, I know this, and I haven't experienced that much trouble with
deadlocks. Maybe the way you structure and access the database
environment can be optimized for deadlock avoidance?

> As such, the yield function that is provided must work both for
> threads within a single process (PTHREAD_SCOPE_PROCESS) as well as
> between processes (PTHREAD_SCOPE_SYSTEM).

If I understand you correctly, what you really need is a syscall along
the lines "don't run me again until all threads T that share property X
have run", where the Ts aren't necessarily in the same process. The
kernel isn't psychic; it can't really know which processes to schedule
to satisfy such a requirement. I don't even think "has joined the
Berkeley DB environment" is the desired property, but something like
"is part of this cycle in the wait-for graph" or something similar.

I would have to check the Berkeley DB internals in order to tell what
is feasible to implement. This code shouldn't be on the fast path, so
some kernel-based synchronization is probably sufficient.
Re: sched_yield() makes OpenLDAP slow
> processes (PTHREAD_SCOPE_SYSTEM). The previous comment about slapd
> only needing to yield within a single process is inaccurate; since we
> allow slapcat to run concurrently with slapd (to allow hot backups)
> we need BerkeleyDB's locking/yield functions to work in System scope.

That's broken by design - it means you can be arbitrarily starved by
other processes running in parallel. You are basically assuming your
application is the only thing running on the system, which is wrong.
Also, there are enough synchronization primitives that can synchronize
multiple processes without making such broken assumptions.

-Andi
Re: sched_yield() makes OpenLDAP slow
linux-os (Dick Johnson) wrote:
> I reported that sched_yield() wasn't working (at least as expected)
> back in March of 2004.
>
> for(;;)
>         sched_yield();
>
> ... takes 100% CPU time as reported by `top`. It should take
> practically 0. Somebody said that this was because `top` was broken,
> others said that it was because I didn't know how to code.
> Nevertheless, the problem was not fixed, even after scheduler changes
> were made for the current version.

This is what I would expect if run on an otherwise idle machine.
sched_yield just puts you at the back of the line for runnable
processes, it doesn't magically cause you to go to sleep somehow.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: sched_yield() makes OpenLDAP slow
Andi Kleen wrote:
>> processes (PTHREAD_SCOPE_SYSTEM). The previous comment about slapd
>> only needing to yield within a single process is inaccurate; since
>> we allow slapcat to run concurrently with slapd (to allow hot
>> backups) we need BerkeleyDB's locking/yield functions to work in
>> System scope.
>
> That's broken by design - it means you can be arbitrarily starved by
> other processes running in parallel. You are basically assuming your
> application is the only thing running on the system, which is wrong.
> Also, there are enough synchronization primitives that can
> synchronize multiple processes without making such broken
> assumptions.

Again, I think you overstate the problem. "Arbitrarily starved by other
processes" implies that the process scheduler will do a poor job and
will allow the slapd process to be starved. We do not assume we're the
only app on the system, we just assume that eventually we will get the
CPU back. If that's not a valid assumption, then there is something
wrong with the underlying system environment.

Something you ought to keep in mind - correctness and compliance are
well and good, but worthless if the end result isn't useful. Windows NT
has a POSIX-compliant subsystem but it is utterly useless. That's what
you wind up with when all you do is conform to the letter of the spec.

--
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
linux-os (Dick Johnson) wrote:
> I reported that sched_yield() wasn't working (at least as expected)
> back in March of 2004.
>
>     for(;;)
>         sched_yield();
>
> ... takes 100% CPU time as reported by `top`. It should take
> practically 0. Somebody said that this was because `top` was
> broken, others said that it was because I didn't know how to
> code. Nevertheless, the problem was not fixed, even after
> scheduler changes were made for the current version.

This is what I would expect if run on an otherwise idle machine. sched_yield just puts you at the back of the line for runnable processes; it doesn't magically cause you to go to sleep somehow.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: sched_yield() makes OpenLDAP slow
* Howard Chu:
>>> Has anybody contacted the Sleepycat people with a description of the
>>> problem yet?
>> Berkeley DB does not call sched_yield, but OpenLDAP does in some
>> wrapper code around the Berkeley DB backend.
> That's not the complete story. BerkeleyDB provides a
> db_env_set_func_yield() hook to tell it what yield function it should
> use when its internal locking routines need such a function. If you
> don't set a specific hook, it just uses sleep(). The OpenLDAP backend
> will invoke this hook during some (not necessarily all) init
> sequences, to tell it to use the thread yield function that we
> selected in autoconf.

And this helps to increase performance substantially?

> Note that (on systems that support inter-process mutexes) a BerkeleyDB
> database environment may be used by multiple processes concurrently.

Yes, I know this, and I haven't experienced that much trouble with deadlocks. Maybe the way you structure and access the database environment can be optimized for deadlock avoidance?

> As such, the yield function that is provided must work both for
> threads within a single process (PTHREAD_SCOPE_PROCESS) as well as
> between processes (PTHREAD_SCOPE_SYSTEM).

If I understand you correctly, what you really need is a syscall along the lines of "don't run me again until all threads T that share property X have run", where the Ts aren't necessarily in the same process. The kernel isn't psychic; it can't really know which processes to schedule to satisfy such a requirement. I don't even think "has joined the Berkeley DB environment" is the desired property, but something like "is part of this cycle in the wait-for graph" or something similar. I would have to check the Berkeley DB internals in order to tell what is feasible to implement. This code shouldn't be on the fast path, so some kernel-based synchronization is probably sufficient. 
Re: sched_yield() makes OpenLDAP slow
> processes (PTHREAD_SCOPE_SYSTEM). The previous comment about slapd
> only needing to yield within a single process is inaccurate; since
> we allow slapcat to run concurrently with slapd (to allow hot
> backups) we need BerkeleyDB's locking/yield functions to work in
> System scope.

That's broken by design - it means you can be arbitrarily starved by other processes running in parallel. You are basically assuming your application is the only thing running on the system, which is wrong. Also there are enough synchronization primitives that can synchronize multiple processes without making such broken assumptions.

-Andi
Re: sched_yield() makes OpenLDAP slow
On Sat, 20 Aug 2005, Robert Hancock wrote:

> Howard Chu wrote:
>> I'll note that we removed a number of the yield calls (that were in
>> OpenLDAP 2.2) for the 2.3 release, because I found that they were
>> redundant and causing unnecessary delays. My own test system is
>> running on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64
>> distro), and OpenLDAP 2.3 runs perfectly well here, now that those
>> redundant calls have been removed. But I also found that I needed to
>> add a new yield(), to work around yet another unexpected issue on
>> this system - we have a number of threads waiting on a condition
>> variable, and the thread holding the mutex signals the var, unlocks
>> the mutex, and then immediately relocks it. The expectation here is
>> that upon unlocking the mutex, the calling thread would block while
>> some waiting thread (that just got signaled) would get to run. In
>> fact what happened is that the calling thread unlocked and relocked
>> the mutex without allowing any of the waiting threads to run. In this
>> case the only solution was to insert a yield() after the
>> mutex_unlock(). So again, for those of you claiming "oh, all you need
>> to do is use a condition variable or any of the other POSIX
>> synchronization primitives" - yes, that's a nice theory, but reality
>> says otherwise.
>
> I encountered a similar issue with some software that I wrote, and
> used a similar workaround; however, this was basically because there
> wasn't enough time available at the time to redesign things to work
> properly. The problem here is essentially caused by the fact that the
> mutex is being locked for an excessively large proportion of the time
> and not letting other threads in. In the case I am thinking of,
> posting the messages to the thread that was hogging the mutex via a
> signaling queue would have been a better solution than using yield and
> having correct operation depend on undefined parts of thread
> scheduling behavior.
> --
> Robert Hancock      Saskatoon, SK, Canada
> To email, remove "nospam" from [EMAIL PROTECTED]
> Home Page: http://www.roberthancock.com/

I reported that sched_yield() wasn't working (at least as expected) back in March of 2004.

    for(;;)
        sched_yield();

... takes 100% CPU time as reported by `top`. It should take practically 0. Somebody said that this was because `top` was broken, others said that it was because I didn't know how to code. Nevertheless, the problem was not fixed, even after scheduler changes were made for the current version.

One can execute:

    usleep(0);

... instead of:

    sched_yield();

... and Linux then performs exactly like other Unixes when code is waiting on mutexes.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12.5 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction. .
I apologize for the following. I tried to kill it with the above dot :

The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [EMAIL PROTECTED] - and destroy all copies of this information, including any attachments, without reading or disclosing them. Thank you.
Re: sched_yield() makes OpenLDAP slow
Nikita Danilov wrote:
> Howard Chu writes:
>> That's beside the point. Folks are making an assertion that
>> sched_yield() is meaningless; this example demonstrates that there
>> are cases where sched_yield() is essential.
> It is not essential, it is non-portable. Code you described is based
> on non-portable "expectations" about thread scheduling. Linux
> implementation of pthreads fails to satisfy them. Perfectly
> reasonable. Code is then "fixed" by adding sched_yield() calls and
> introducing more non-portable assumptions. Again, there is no
> guarantee this would work on any compliant implementation. While
> "intuitive" semantics of sched_yield() is to yield CPU and to give
> other runnable threads their chance to run, this is _not_ what
> standard prescribes (for non-RT threads).

Very well; it is not prescribed in the standard and it is non-portable. Our code is broken and we will fix it. But even Dave Butenhof, Mr. Pthreads himself, has said it is reasonable to expect sched_yield to yield the CPU. That's what pthread_yield did in Pthreads Draft 4 (DCE threads) and it is common knowledge that sched_yield is a direct replacement for pthread_yield; i.e., pthread_yield() was deleted from the spec because sched_yield fulfilled its purpose. Now you're saying "well, technically, sched_yield doesn't have to do anything at all" and the letter of the spec supports your position, but anybody who's been programming with pthreads since the DCE days knows that is not the original intention. I wonder that nobody has decided to raise this issue with the IEEE/POSIX group and get them to issue a correction/clarification in all this time, since the absence of specification here really impairs the usefulness of the spec. Likewise the fact that sched_yield() can now cause the current process to be queued behind other processes seems suspect, unless we know for sure that the threads are running with PTHREAD_SCOPE_SYSTEM. (I haven't checked to see if PTHREAD_SCOPE_PROCESS is still supported in NPTL.) 
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
Florian Weimer wrote:
> * Andi Kleen:
>> Has anybody contacted the Sleepycat people with a description of the
>> problem yet?
> Berkeley DB does not call sched_yield, but OpenLDAP does in some
> wrapper code around the Berkeley DB backend.

That's not the complete story. BerkeleyDB provides a db_env_set_func_yield() hook to tell it what yield function it should use when its internal locking routines need such a function. If you don't set a specific hook, it just uses sleep(). The OpenLDAP backend will invoke this hook during some (not necessarily all) init sequences, to tell it to use the thread yield function that we selected in autoconf.

Note that (on systems that support inter-process mutexes) a BerkeleyDB database environment may be used by multiple processes concurrently. As such, the yield function that is provided must work both for threads within a single process (PTHREAD_SCOPE_PROCESS) as well as between processes (PTHREAD_SCOPE_SYSTEM). The previous comment about slapd only needing to yield within a single process is inaccurate; since we allow slapcat to run concurrently with slapd (to allow hot backups) we need BerkeleyDB's locking/yield functions to work in System scope.

--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
Florian Weimer wrote:
> * Howard Chu:
>> That's not the complete story. BerkeleyDB provides a
>> db_env_set_func_yield() hook to tell it what yield function it should
>> use when its internal locking routines need such a function. If you
>> don't set a specific hook, it just uses sleep(). The OpenLDAP backend
>> will invoke this hook during some (not necessarily all) init
>> sequences, to tell it to use the thread yield function that we
>> selected in autoconf.
> And this helps to increase performance substantially?

When the caller is a threaded program, yes, there is a substantial (measurable and noticeable) difference. Given that sleep() blocks the entire process, the difference is obvious.

>> Note that (on systems that support inter-process mutexes) a
>> BerkeleyDB database environment may be used by multiple processes
>> concurrently.
> Yes, I know this, and I haven't experienced that much trouble with
> deadlocks. Maybe the way you structure and access the database
> environment can be optimized for deadlock avoidance?

Maybe we already did this deadlock analysis and optimization, years ago when we first started developing this backend? Do you think everyone else in the world is a total fool?

>> As such, the yield function that is provided must work both for
>> threads within a single process (PTHREAD_SCOPE_PROCESS) as well as
>> between processes (PTHREAD_SCOPE_SYSTEM).
> If I understand you correctly, what you really need is a syscall
> along the lines of "don't run me again until all threads T that share
> property X have run", where the Ts aren't necessarily in the same
> process. The kernel isn't psychic; it can't really know which
> processes to schedule to satisfy such a requirement. I don't even
> think "has joined the Berkeley DB environment" is the desired
> property, but something like "is part of this cycle in the wait-for
> graph" or something similar.

You seem to believe we're looking for special treatment for the processes we're concerned with, and that's not true. 
If the system is busy with other processes, so be it, the system is busy. If you want better performance, you build a dedicated server and don't let anything else make the system busy. This is the way mission-critical services are delivered, regardless of the service. If you're not running on a dedicated system, then your deployment must not be mission critical, and so you shouldn't be surprised if a large gcc run slows down some other activities in the meantime. If you have a large nice'd job running before your normal priority jobs get their timeslice, then you should certainly wonder wtf the scheduler is doing, and why your system even claims to support nice() when clearly it doesn't mean anything on that system.

> I would have to check the Berkeley DB internals in order to tell what
> is feasible to implement. This code shouldn't be on the fast path, so
> some kernel-based synchronization is probably sufficient.

pthread_cond_wait() probably would be just fine here, but BerkeleyDB doesn't work that way.

--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
* Andi Kleen: > Has anybody contacted the Sleepycat people with a description of the > problem yet? Berkeley DB does not call sched_yield, but OpenLDAP does in some wrapper code around the Berkeley DB backend. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sched_yield() makes OpenLDAP slow
Howard Chu writes: > Lee Revell wrote: > > On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote: > > > But I also found that I needed to add a new yield(), to work around > > > yet another unexpected issue on this system - we have a number of > > > threads waiting on a condition variable, and the thread holding the > > > mutex signals the var, unlocks the mutex, and then immediately > > > relocks it. The expectation here is that upon unlocking the mutex, > > > the calling thread would block while some waiting thread (that just > > > got signaled) would get to run. In fact what happened is that the > > > calling thread unlocked and relocked the mutex without allowing any > > > of the waiting threads to run. In this case the only solution was > > > to insert a yield() after the mutex_unlock(). > > > > That's exactly the behavior I would expect. Why would you expect > > unlocking a mutex to cause a reschedule, if the calling thread still > > has timeslice left? > > That's beside the point. Folks are making an assertion that > sched_yield() is meaningless; this example demonstrates that there are > cases where sched_yield() is essential. It is not essential, it is non-portable. Code you described is based on non-portable "expectations" about thread scheduling. Linux implementation of pthreads fails to satisfy them. Perfectly reasonable. Code is then "fixed" by adding sched_yield() calls and introducing more non-portable assumptions. Again, there is no guarantee this would work on any compliant implementation. While "intuitive" semantics of sched_yield() is to yield CPU and to give other runnable threads their chance to run, this is _not_ what standard prescribes (for non-RT threads). > > -- > -- Howard Chu Nikita. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sched_yield() makes OpenLDAP slow
Howard Chu wrote:
> I'll note that we removed a number of the yield calls (that were in
> OpenLDAP 2.2) for the 2.3 release, because I found that they were
> redundant and causing unnecessary delays. My own test system is
> running on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64
> distro), and OpenLDAP 2.3 runs perfectly well here, now that those
> redundant calls have been removed. But I also found that I needed to
> add a new yield(), to work around yet another unexpected issue on this
> system - we have a number of threads waiting on a condition variable,
> and the thread holding the mutex signals the var, unlocks the mutex,
> and then immediately relocks it. The expectation here is that upon
> unlocking the mutex, the calling thread would block while some waiting
> thread (that just got signaled) would get to run. In fact what
> happened is that the calling thread unlocked and relocked the mutex
> without allowing any of the waiting threads to run. In this case the
> only solution was to insert a yield() after the mutex_unlock(). So
> again, for those of you claiming "oh, all you need to do is use a
> condition variable or any of the other POSIX synchronization
> primitives" - yes, that's a nice theory, but reality says otherwise.

I encountered a similar issue with some software that I wrote, and used a similar workaround; however, this was basically because there wasn't enough time available at the time to redesign things to work properly. The problem here is essentially caused by the fact that the mutex is being locked for an excessively large proportion of the time and not letting other threads in. In the case I am thinking of, posting the messages to the thread that was hogging the mutex via a signaling queue would have been a better solution than using yield and having correct operation depend on undefined parts of thread scheduling behavior.
--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: sched_yield() makes OpenLDAP slow
Howard Chu wrote:
> Lee Revell wrote:
>> On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
>>> But I also found that I needed to add a new yield(), to work around
>>> yet another unexpected issue on this system - we have a number of
>>> threads waiting on a condition variable, and the thread holding the
>>> mutex signals the var, unlocks the mutex, and then immediately
>>> relocks it. The expectation here is that upon unlocking the mutex,
>>> the calling thread would block while some waiting thread (that just
>>> got signaled) would get to run. In fact what happened is that the
>>> calling thread unlocked and relocked the mutex without allowing any
>>> of the waiting threads to run. In this case the only solution was
>>> to insert a yield() after the mutex_unlock().
>> That's exactly the behavior I would expect. Why would you expect
>> unlocking a mutex to cause a reschedule, if the calling thread still
>> has timeslice left?
> That's beside the point. Folks are making an assertion that
> sched_yield() is meaningless; this example demonstrates that there are
> cases where sched_yield() is essential.

The point is, with SCHED_OTHER scheduling, sched_yield() need not do anything. It may not let any other tasks run. The fact that it does on Linux is because we do attempt to do something expected... but the simple matter is that you can't rely on it to do what you expect.

I'm not sure exactly how you would solve the above problem, but I'm sure it can be achieved using mutexes (for example, you could have a queue where every thread waits on its own private mutex) but I don't do much userspace C programming, sorry.

Send instant messages to your online friends http://au.messenger.yahoo.com
Re: sched_yield() makes OpenLDAP slow
Howard Chu writes: > Nikita Danilov wrote: > > That returns us to the core of the problem: sched_yield() is used to > > implement a synchronization primitive and non-portable assumptions are > > made about its behavior: SUS defines that after sched_yield() thread > > ceases to run on the CPU "until it again becomes the head of its thread > > list", and "thread list" discipline is only defined for real-time > > scheduling policies. E.g., > > > > int sched_yield(void) > > { > >return 0; > > } > > > > and > > > > int sched_yield(void) > > { > >sleep(100); > >return 0; > > } > > > > are both valid sched_yield() implementation for non-rt (SCHED_OTHER) > > threads. > I think you're mistaken: > http://groups.google.com/group/comp.programming.threads/browse_frm/thread/0d4eaf3703131e86/da051ebe58976b00#da051ebe58976b00 > > sched_yield() is required to be supported even if priority scheduling is > not supported, and it is required to cause the calling thread (not > process) to yield the processor. Of course sched_yield() is required to be supported, the question is for how long CPU is yielded. Here is the quote from the SUS (actually the complete definition of sched_yield()): The sched_yield() function shall force the running thread to relinquish the processor until it again becomes the head of its thread list. As far as I can see, SUS doesn't specify how "thread list" is maintained for non-RT scheduling policy, and implementation that immediately places SCHED_OTHER thread that called sched_yield() back at the head of its thread list is perfectly valid. Also valid is an implementation that waits for 100 seconds and then places sched_yield() caller to the head of the list, etc. Basically, while semantics of sched_yield() are well defined for RT scheduling policy, for SCHED_OTHER policy standard leaves it implementation defined. > > -- > -- Howard Chu Nikita. 
Re: sched_yield() makes OpenLDAP slow
On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote: > Nick Piggin wrote: > > Robert Hancock wrote: > > > I fail to see how sched_yield is going to be very helpful in this > > > situation. Since that call can sleep from a range of time ranging > > > from zero to a long time, it's going to give unpredictable results. > > > Well, not sleep technically, but yield the CPU for some undefined > > amount of time. > > Since the slapd server was not written to run in realtime, nor is it > commonly run on realtime operating systems, I don't believe predictable > timing here is a criteria we care about. One could say the same of > sigsuspend() by the way - it can pause a process for a range of time > ranging from zero to a long time. Should we tell application writers not > to use this function either, regardless of whether the developer thinks > they have a good reason to use it? Of course not. We should tell them that if they use sigsuspend() they cannot assume that the process will not wake up immediately. Lee - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sched_yield() makes OpenLDAP slow
Lee Revell wrote:
> On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
>> But I also found that I needed to add a new yield(), to work around
>> yet another unexpected issue on this system - we have a number of
>> threads waiting on a condition variable, and the thread holding the
>> mutex signals the var, unlocks the mutex, and then immediately
>> relocks it. The expectation here is that upon unlocking the mutex,
>> the calling thread would block while some waiting thread (that just
>> got signaled) would get to run. In fact what happened is that the
>> calling thread unlocked and relocked the mutex without allowing any
>> of the waiting threads to run. In this case the only solution was
>> to insert a yield() after the mutex_unlock().
> That's exactly the behavior I would expect. Why would you expect
> unlocking a mutex to cause a reschedule, if the calling thread still
> has timeslice left?

That's beside the point. Folks are making an assertion that sched_yield() is meaningless; this example demonstrates that there are cases where sched_yield() is essential.

--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote: > But I also found that I needed to add a new > yield(), to work around yet another unexpected issue on this system - > we have a number of threads waiting on a condition variable, and the > thread holding the mutex signals the var, unlocks the mutex, and then > immediately relocks it. The expectation here is that upon unlocking > the mutex, the calling thread would block while some waiting thread > (that just got signaled) would get to run. In fact what happened is > that the calling thread unlocked and relocked the mutex without > allowing any of the waiting threads to run. In this case the only > solution was to insert a yield() after the mutex_unlock(). That's exactly the behavior I would expect. Why would you expect unlocking a mutex to cause a reschedule, if the calling thread still has timeslice left? Lee - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sched_yield() makes OpenLDAP slow
Nikita Danilov wrote:
> That returns us to the core of the problem: sched_yield() is used to
> implement a synchronization primitive and non-portable assumptions are
> made about its behavior: SUS defines that after sched_yield() thread
> ceases to run on the CPU "until it again becomes the head of its
> thread list", and "thread list" discipline is only defined for
> real-time scheduling policies. E.g.,
>
>     int sched_yield(void)
>     {
>         return 0;
>     }
>
> and
>
>     int sched_yield(void)
>     {
>         sleep(100);
>         return 0;
>     }
>
> are both valid sched_yield() implementations for non-rt (SCHED_OTHER)
> threads.

I think you're mistaken:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/0d4eaf3703131e86/da051ebe58976b00#da051ebe58976b00

sched_yield() is required to be supported even if priority scheduling is not supported, and it is required to cause the calling thread (not process) to yield the processor.

--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
Nick Piggin wrote:
> Robert Hancock wrote:
>> I fail to see how sched_yield is going to be very helpful in this
>> situation. Since that call can sleep from a range of time ranging
>> from zero to a long time, it's going to give unpredictable results.
>
> Well, not sleep technically, but yield the CPU for some undefined
> amount of time.

Since the slapd server was not written to run in realtime, nor is it
commonly run on realtime operating systems, I don't believe predictable
timing here is a criterion we care about. One could say the same of
sigsuspend(), by the way - it can pause a process for a range of time
ranging from zero to a long time. Should we tell application writers not
to use this function either, regardless of whether the developer thinks
they have a good reason to use it?

>> It seems to me that this sort of thing is why we have POSIX pthread
>> synchronization primitives. sched_yield is basically there for a
>> process to indicate that "what I'm doing doesn't matter much, let
>> other stuff run". Any other use of it generally constitutes some
>> kind of hack.

In terms of transaction recovery, we do an exponential backoff on the
retries, because our benchmarks showed that under heavy lock contention,
immediate retries only made things worse. In fact, having arbitrarily
long backoff delays here was shown to improve transaction throughput.
(We use select() with an increasing timeval in combination with the
yield() call. One way or another we get a longer delay, as desired.)

sched_yield is there for a *thread* to indicate "what I'm doing doesn't
matter much, let other stuff run." I suppose it may be a hack. But then
so is TCP congestion control. In both cases, empirical evidence
indicates the hack is worthwhile. If you haven't done the analysis then
you're in no position to deny the value of the approach.

> In SCHED_OTHER mode, you're right, sched_yield is basically
> meaningless. In a realtime system, there is a very well defined and
> probably useful behaviour. Eg. if 2 SCHED_FIFO processes are running
> at the same priority, one can call sched_yield to deterministically
> give the CPU to the other guy.

Well yes, the point of a realtime system is to provide deterministic
response times to unpredictable input.

I'll note that we removed a number of the yield calls (that were in
OpenLDAP 2.2) for the 2.3 release, because I found that they were
redundant and causing unnecessary delays. My own test system runs a
Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64 distro), and
OpenLDAP 2.3 runs perfectly well here, now that those redundant calls
have been removed.

But I also found that I needed to add a new yield() to work around yet
another unexpected issue on this system - we have a number of threads
waiting on a condition variable, and the thread holding the mutex
signals the var, unlocks the mutex, and then immediately relocks it.
The expectation here is that upon unlocking the mutex, the calling
thread would block while some waiting thread (that just got signaled)
would get to run. In fact what happened is that the calling thread
unlocked and relocked the mutex without allowing any of the waiting
threads to run. In this case the only solution was to insert a yield()
after the mutex_unlock().

So again, for those of you claiming "oh, all you need to do is use a
condition variable or any of the other POSIX synchronization
primitives" - yes, that's a nice theory, but reality says otherwise. To
say that sched_yield is basically meaningless is far overstating your
point.

--
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
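The backoff scheme Howard describes above (a yield plus select() with an increasing timeval) might look roughly like the sketch below. The function names and the doubling-with-cap constants are illustrative assumptions, not OpenLDAP's actual code:

```c
#include <sched.h>
#include <sys/select.h>

/* Hypothetical sketch: timeout in microseconds for the Nth retry,
 * doubling each attempt and capped at roughly one second. */
static long backoff_usec(int attempt)
{
    int shift = attempt < 10 ? attempt : 10;  /* cap the growth */
    return 1000L << shift;                    /* 1ms, 2ms, 4ms, ... */
}

/* Yield, then sleep for an attempt-dependent interval before retrying
 * the aborted transaction. */
static void backoff_retry(int attempt)
{
    struct timeval tv;
    long usec = backoff_usec(attempt);

    sched_yield();                     /* let competing threads run first */
    tv.tv_sec  = usec / 1000000L;
    tv.tv_usec = usec % 1000000L;
    select(0, NULL, NULL, NULL, &tv);  /* portable sub-second sleep */
}
```

select() with empty fd sets is used here because, at the time, it was the portable way to sleep for less than a second.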
Re: sched_yield() makes OpenLDAP slow
Howard Chu <[EMAIL PROTECTED]> writes:
> In this specific example, we use whatever BerkeleyDB provides and
> we're certainly not about to write our own transactional embedded
> database engine just for this.

BerkeleyDB is free software, after all, and comes with source code.
Surely it can be fixed without rewriting it from scratch. Has anybody
contacted the Sleepycat people with a description of the problem yet?

-Andi
Re: sched_yield() makes OpenLDAP slow
Howard Chu writes:
> Nikita Danilov wrote:

[...]

>> What prevents transaction monitor from using, say, condition
>> variables to "yield cpu"? That would have an additional advantage of
>> blocking thread precisely until specific event occurs, instead of
>> blocking for some vague indeterminate load and platform dependent
>> amount of time.
>
> Condition variables offer no control over which thread is waken up.

When only one thread waits on a condition variable, which is exactly
the scenario involved --sorry if I wasn't clear enough-- condition
signal provides precise control over which thread is woken up.

> We're wandering into the design of the SleepyCat BerkeleyDB library
> here, and we don't exert any control over that either. BerkeleyDB
> doesn't appear to use pthread condition variables; it seems to
> construct its own synchronization mechanisms on top of mutexes (and
> yield calls).

That returns us to the core of the problem: sched_yield() is used to
implement a synchronization primitive, and non-portable assumptions are
made about its behavior. SUS defines that after sched_yield() the
thread ceases to run on the CPU "until it again becomes the head of its
thread list", and "thread list" discipline is only defined for
real-time scheduling policies. E.g.,

    int sched_yield(void)
    {
            return 0;
    }

and

    int sched_yield(void)
    {
            sleep(100);
            return 0;
    }

are both valid sched_yield() implementations for non-rt (SCHED_OTHER)
threads.

Nikita.
Re: sched_yield() makes OpenLDAP slow
Nikita Danilov wrote:
> That returns us to the core of the problem: sched_yield() is used to
> implement a synchronization primitive and non-portable assumptions
> are made about its behavior: SUS defines that after sched_yield()
> thread ceases to run on the CPU "until it again becomes the head of
> its thread list", and "thread list" discipline is only defined for
> real-time scheduling policies.

I think you're mistaken:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/0d4eaf3703131e86/da051ebe58976b00#da051ebe58976b00

sched_yield() is required to be supported even if priority scheduling
is not supported, and it is required to cause the calling thread (not
process) to yield the processor.

--
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
> But I also found that I needed to add a new yield(), to work around
> yet another unexpected issue on this system - we have a number of
> threads waiting on a condition variable, and the thread holding the
> mutex signals the var, unlocks the mutex, and then immediately
> relocks it. The expectation here is that upon unlocking the mutex,
> the calling thread would block while some waiting thread (that just
> got signaled) would get to run. In fact what happened is that the
> calling thread unlocked and relocked the mutex without allowing any
> of the waiting threads to run. In this case the only solution was to
> insert a yield() after the mutex_unlock().

That's exactly the behavior I would expect. Why would you expect
unlocking a mutex to cause a reschedule, if the calling thread still
has timeslice left?

Lee
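The sequence Howard and Lee are discussing can be reduced to a small self-contained program. This is an illustrative sketch (run_demo and the variable names are assumptions, not OpenLDAP's code); whether the signaled waiter runs between the unlock and the relock is entirely up to the scheduler, which is the point of contention:

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static int ready, consumed;

/* The waiting thread: blocks on the condvar until signaled. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    while (!ready)                  /* guard against spurious wakeups */
        pthread_cond_wait(&cv, &m);
    consumed = 1;
    pthread_mutex_unlock(&m);
    return NULL;
}

/* The signaling thread's sequence: signal, unlock, (yield), relock.
 * Returns 1 once the waiter has actually run. */
int run_demo(void)
{
    pthread_t t;

    pthread_create(&t, NULL, waiter, NULL);

    pthread_mutex_lock(&m);
    ready = 1;
    pthread_cond_signal(&cv);   /* waiter is now runnable... */
    pthread_mutex_unlock(&m);   /* ...but need not run yet */
    sched_yield();              /* the workaround: offer up the CPU */
    pthread_mutex_lock(&m);     /* without the yield, this often wins */
    pthread_mutex_unlock(&m);

    pthread_join(t, NULL);      /* guarantees the waiter finishes */
    return consumed;
}
```

Note that the predicate loop around pthread_cond_wait() makes the program correct either way; the yield only affects *when* the waiter gets to run, not whether it eventually does.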
Re: sched_yield() makes OpenLDAP slow
Lee Revell wrote:
> On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
>> But I also found that I needed to add a new yield(), to work around
>> yet another unexpected issue on this system - we have a number of
>> threads waiting on a condition variable, and the thread holding the
>> mutex signals the var, unlocks the mutex, and then immediately
>> relocks it. The expectation here is that upon unlocking the mutex,
>> the calling thread would block while some waiting thread (that just
>> got signaled) would get to run. In fact what happened is that the
>> calling thread unlocked and relocked the mutex without allowing any
>> of the waiting threads to run. In this case the only solution was
>> to insert a yield() after the mutex_unlock().
>
> That's exactly the behavior I would expect. Why would you expect
> unlocking a mutex to cause a reschedule, if the calling thread still
> has timeslice left?

That's beside the point. Folks are making an assertion that
sched_yield() is meaningless; this example demonstrates that there are
cases where sched_yield() is essential.

--
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
> Nick Piggin wrote:
>> Robert Hancock wrote:
>>> I fail to see how sched_yield is going to be very helpful in this
>>> situation. Since that call can sleep from a range of time ranging
>>> from zero to a long time, it's going to give unpredictable results.
>>
>> Well, not sleep technically, but yield the CPU for some undefined
>> amount of time.
>
> Since the slapd server was not written to run in realtime, nor is it
> commonly run on realtime operating systems, I don't believe
> predictable timing here is a criterion we care about. One could say
> the same of sigsuspend() by the way - it can pause a process for a
> range of time ranging from zero to a long time. Should we tell
> application writers not to use this function either, regardless of
> whether the developer thinks they have a good reason to use it?

Of course not. We should tell them that if they use sigsuspend() they
cannot assume that the process will not wake up immediately.

Lee
Re: sched_yield() makes OpenLDAP slow
Howard Chu writes:
> Nikita Danilov wrote:
>> That returns us to the core of the problem: sched_yield() is used to
>> implement a synchronization primitive and non-portable assumptions
>> are made about its behavior: SUS defines that after sched_yield()
>> thread ceases to run on the CPU "until it again becomes the head of
>> its thread list", and "thread list" discipline is only defined for
>> real-time scheduling policies. E.g., int sched_yield(void) { return
>> 0; } and int sched_yield(void) { sleep(100); return 0; } are both
>> valid sched_yield() implementations for non-rt (SCHED_OTHER)
>> threads.
>
> I think you're mistaken:
>
> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/0d4eaf3703131e86/da051ebe58976b00#da051ebe58976b00
>
> sched_yield() is required to be supported even if priority scheduling
> is not supported, and it is required to cause the calling thread (not
> process) to yield the processor.

Of course sched_yield() is required to be supported; the question is
for how long the CPU is yielded. Here is the quote from the SUS
(actually the complete definition of sched_yield()):

    The sched_yield() function shall force the running thread to
    relinquish the processor until it again becomes the head of its
    thread list.

As far as I can see, SUS doesn't specify how the thread list is
maintained for a non-RT scheduling policy, and an implementation that
immediately places a SCHED_OTHER thread that called sched_yield() back
at the head of its thread list is perfectly valid. Also valid is an
implementation that waits for 100 seconds and then places the
sched_yield() caller at the head of the list, etc. Basically, while the
semantics of sched_yield() are well defined for the RT scheduling
policies, for the SCHED_OTHER policy the standard leaves it
implementation defined.

Nikita.
Re: sched_yield() makes OpenLDAP slow
Howard Chu wrote:
> Lee Revell wrote:
>> On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
>>> But I also found that I needed to add a new yield(), to work around
>>> yet another unexpected issue on this system - we have a number of
>>> threads waiting on a condition variable, and the thread holding the
>>> mutex signals the var, unlocks the mutex, and then immediately
>>> relocks it. The expectation here is that upon unlocking the mutex,
>>> the calling thread would block while some waiting thread (that just
>>> got signaled) would get to run. In fact what happened is that the
>>> calling thread unlocked and relocked the mutex without allowing any
>>> of the waiting threads to run. In this case the only solution was
>>> to insert a yield() after the mutex_unlock().
>>
>> That's exactly the behavior I would expect. Why would you expect
>> unlocking a mutex to cause a reschedule, if the calling thread still
>> has timeslice left?
>
> That's beside the point. Folks are making an assertion that
> sched_yield() is meaningless; this example demonstrates that there
> are cases where sched_yield() is essential.

The point is, with SCHED_OTHER scheduling, sched_yield() need not do
anything. It may not let any other tasks run. The fact that it does on
Linux is because we do attempt to do something expected... but the
simple matter is that you can't rely on it to do what you expect.

I'm not sure exactly how you would solve the above problem, but I'm
sure it can be achieved using mutexes (for example, you could have a
queue where every thread waits on its own private mutex), but I don't
do much userspace C programming, sorry.
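Nick's closing suggestion (a queue where every thread waits on its own private synchronization object) can be sketched with a per-thread flag and condition variable: the releasing thread wakes exactly the waiter it chooses. The struct and function names below are hypothetical, not BerkeleyDB's or OpenLDAP's actual mechanism:

```c
#include <pthread.h>

/* One synchronization object per waiting thread. */
struct waiter {
    pthread_mutex_t m;
    pthread_cond_t  cv;
    int             go;
};

void waiter_init(struct waiter *w)
{
    pthread_mutex_init(&w->m, NULL);
    pthread_cond_init(&w->cv, NULL);
    w->go = 0;
}

/* Called by the waiting thread: blocks until this waiter is chosen. */
void waiter_block(struct waiter *w)
{
    pthread_mutex_lock(&w->m);
    while (!w->go)                    /* guard against spurious wakeups */
        pthread_cond_wait(&w->cv, &w->m);
    w->go = 0;                        /* re-arm for the next round */
    pthread_mutex_unlock(&w->m);
}

/* Called by the releasing thread: wakes this waiter and no other. */
void waiter_wake(struct waiter *w)
{
    pthread_mutex_lock(&w->m);
    w->go = 1;
    pthread_cond_signal(&w->cv);
    pthread_mutex_unlock(&w->m);
}
```

A lock manager would keep these waiter structures on a queue and call waiter_wake() on whichever entry its policy selects, getting the deterministic handoff that a shared condition variable cannot provide.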
Re: sched_yield() makes OpenLDAP slow
Howard Chu wrote:
> I'll note that we removed a number of the yield calls (that were in
> OpenLDAP 2.2) for the 2.3 release, because I found that they were
> redundant and causing unnecessary delays. My own test system is
> running on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64
> distro), and OpenLDAP 2.3 runs perfectly well here, now that those
> redundant calls have been removed.
>
> But I also found that I needed to add a new yield(), to work around
> yet another unexpected issue on this system - we have a number of
> threads waiting on a condition variable, and the thread holding the
> mutex signals the var, unlocks the mutex, and then immediately
> relocks it. The expectation here is that upon unlocking the mutex,
> the calling thread would block while some waiting thread (that just
> got signaled) would get to run. In fact what happened is that the
> calling thread unlocked and relocked the mutex without allowing any
> of the waiting threads to run. In this case the only solution was to
> insert a yield() after the mutex_unlock().
>
> So again, for those of you claiming "oh, all you need to do is use a
> condition variable or any of the other POSIX synchronization
> primitives" - yes, that's a nice theory, but reality says otherwise.

I encountered a similar issue with some software that I wrote, and used
a similar workaround; however, this was basically because there wasn't
enough time available to redesign things to work properly.

The problem here is essentially caused by the fact that the mutex is
being locked for an excessively large proportion of the time and not
letting other threads in. In the case I am thinking of, posting the
messages to the thread that was hogging the mutex via a signaling queue
would have been a better solution than using yield and having correct
operation depend on undefined parts of thread scheduling behavior.

--
Robert Hancock
Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: sched_yield() makes OpenLDAP slow
Robert Hancock wrote:
> I fail to see how sched_yield is going to be very helpful in this
> situation. Since that call can sleep from a range of time ranging
> from zero to a long time, it's going to give unpredictable results.

Well, not sleep technically, but yield the CPU for some undefined
amount of time.

> It seems to me that this sort of thing is why we have POSIX pthread
> synchronization primitives.. sched_yield is basically there for a
> process to indicate that "what I'm doing doesn't matter much, let
> other stuff run". Any other use of it generally constitutes some kind
> of hack.

In SCHED_OTHER mode, you're right, sched_yield is basically
meaningless. In a realtime system, there is a very well defined and
probably useful behaviour. E.g., if 2 SCHED_FIFO processes are running
at the same priority, one can call sched_yield to deterministically
give the CPU to the other guy.

--
SUSE Labs, Novell Inc.
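The realtime case Nick describes only applies once the caller is actually under an RT policy. A minimal sketch of requesting SCHED_FIFO on Linux follows; make_fifo is a hypothetical helper, and actually switching policy requires appropriate privileges (root, or CAP_SYS_NICE on later kernels):

```c
#include <sched.h>

/* Hypothetical helper: put the calling process under SCHED_FIFO at the
 * given priority. With two SCHED_FIFO processes at equal priority,
 * sched_yield() deterministically hands the CPU to the other one. */
int make_fifo(int prio)
{
    struct sched_param sp;

    /* Reject priorities outside the policy's advertised range. */
    if (prio < sched_get_priority_min(SCHED_FIFO) ||
        prio > sched_get_priority_max(SCHED_FIFO))
        return -1;
    sp.sched_priority = prio;
    return sched_setscheduler(0, SCHED_FIFO, &sp);  /* 0 == this process */
}
```

On Linux the SCHED_FIFO priority range is 1..99, so a SCHED_OTHER-style priority of 0 is rejected up front.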
Re: sched_yield() makes OpenLDAP slow
Howard Chu wrote:
> You assume that spinlocks are the only reason a developer may want to
> yield the processor. This assumption is unfounded. Case in point -
> the primary backend in OpenLDAP uses a transactional database with
> page-level locking of its data structures to provide high levels of
> concurrency. It is the nature of such a system to encounter deadlocks
> over the normal course of operations. When a deadlock is detected,
> some thread must be chosen (by one of a variety of algorithms) to
> abort its transaction, in order to allow other operations to proceed
> to completion. In this situation, the chosen thread must get control
> of the CPU long enough to clean itself up, and then it must yield the
> CPU in order to allow any other competing threads to complete their
> transaction. The thread with the aborted transaction relinquishes all
> of its locks and then waits to get another shot at the CPU to try
> everything over again. Again, this is all fundamental to the nature
> of transactional programming. If the 2.6 kernel makes this
> programming model unreasonably slow, then quite simply this kernel is
> not viable as a database platform.

I fail to see how sched_yield is going to be very helpful in this
situation. Since that call can sleep from a range of time ranging from
zero to a long time, it's going to give unpredictable results.

It seems to me that this sort of thing is why we have POSIX pthread
synchronization primitives. sched_yield is basically there for a
process to indicate that "what I'm doing doesn't matter much, let other
stuff run". Any other use of it generally constitutes some kind of
hack.

--
Robert Hancock
Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: sched_yield() makes OpenLDAP slow
Nikita Danilov wrote:
> Howard Chu <[EMAIL PROTECTED]> writes:
>> concurrency. It is the nature of such a system to encounter
>> deadlocks over the normal course of operations. When a deadlock is
>> detected, some thread must be chosen (by one of a variety of
>> algorithms) to abort its transaction, in order to allow other
>> operations to proceed to completion. In this situation, the chosen
>> thread must get control of the CPU long enough to clean itself up,
>
> What prevents transaction monitor from using, say, condition
> variables to "yield cpu"? That would have an additional advantage of
> blocking thread precisely until specific event occurs, instead of
> blocking for some vague indeterminate load and platform dependent
> amount of time.

Condition variables offer no control over which thread is waken up.
We're wandering into the design of the SleepyCat BerkeleyDB library
here, and we don't exert any control over that either. BerkeleyDB
doesn't appear to use pthread condition variables; it seems to
construct its own synchronization mechanisms on top of mutexes (and
yield calls). In this specific example, we use whatever BerkeleyDB
provides, and we're certainly not about to write our own transactional
embedded database engine just for this.

--
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
Chris Wedgwood wrote:
> On Thu, Aug 18, 2005 at 11:03:45PM -0700, Howard Chu wrote:
>> If the 2.6 kernel makes this programming model unreasonably slow,
>> then quite simply this kernel is not viable as a database platform.
>
> Pretty much everyone else manages to make it work.

And this contributes to the discussion how? Pretty much every other
Unix-ish operating system manages to make scheduling with nice'd
processes work. If you really want to get into what "everyone else
manages to make work" you're in for a rough ride.

--
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
Howard Chu <[EMAIL PROTECTED]> writes:

[...]

> concurrency. It is the nature of such a system to encounter deadlocks
> over the normal course of operations. When a deadlock is detected,
> some thread must be chosen (by one of a variety of algorithms) to
> abort its transaction, in order to allow other operations to proceed
> to completion. In this situation, the chosen thread must get control
> of the CPU long enough to clean itself up,

What prevents transaction monitor from using, say, condition variables
to "yield cpu"? That would have an additional advantage of blocking
thread precisely until specific event occurs, instead of blocking for
some vague indeterminate load and platform dependent amount of time.

> and then it must yield the CPU in order to allow any other competing
> threads to complete their transaction.

Again, this sounds like a thing doable with standard POSIX
synchronization primitives.

Nikita.
Re: sched_yield() makes OpenLDAP slow
On Thu, Aug 18, 2005 at 11:03:45PM -0700, Howard Chu wrote:
> If the 2.6 kernel makes this programming model unreasonably slow,
> then quite simply this kernel is not viable as a database platform.

Pretty much everyone else manages to make it work.
Re: sched_yield() makes OpenLDAP slow
Hi Howard,

Thanks for joining the discussion. One request, if I may: can you
retain the CC list on posts please?

Howard Chu wrote:
>> AFAIKS, sched_yield should only really be used by realtime
>> applications that know exactly what they're doing.
>
> pthread_yield() was deleted from the POSIX threads drafts years ago.
> sched_yield() is the officially supported API, and OpenLDAP is using
> it for the documented purpose. Anyone who says "applications
> shouldn't be using sched_yield()" doesn't know what they're talking
> about.

Linux's SCHED_OTHER policy offers static priorities in the range
[0..0]. I think anything else would be a bug, because from my reading
of the standards, a process with a higher static priority shall always
preempt a process with a lower priority, and SCHED_OTHER simply doesn't
work that way. So sched_yield() from a SCHED_OTHER task is free to
basically do anything at all. Is that the kind of behaviour you had in
mind?

>> It's really more a feature than a bug that it breaks so easily
>> because they should be really using futexes instead, which have much
>> better behaviour than any sched_yield ever could (they will directly
>> wake up another process waiting for the lock and avoid the
>> thundering herd for contended locks)
>
> You assume that spinlocks are the only reason a developer may want to
> yield the processor. This assumption is unfounded. Case in point -
> the primary backend in OpenLDAP uses a transactional database with
> page-level locking of its data structures to provide high levels of
> concurrency. It is the nature of such a system to encounter deadlocks
> over the normal course of operations. When a deadlock is detected,
> some thread must be chosen (by one of a variety of algorithms) to
> abort its transaction, in order to allow other operations to proceed
> to completion. In this situation, the chosen thread must get control
> of the CPU long enough to clean itself up, and then it must yield the
> CPU in order to allow any other competing threads to complete their
> transaction. The thread with the aborted transaction relinquishes all
> of its locks and then waits to get another shot at the CPU to try
> everything over again. Again, this is all fundamental to the nature
> of

You didn't explain why you can't use a mutex to do this. From your
brief description, it seems like a mutex might just do the job nicely.

> transactional programming. If the 2.6 kernel makes this programming
> model unreasonably slow, then quite simply this kernel is not viable
> as a database platform.

Actually it should still be fast. It may yield excessive CPU to other
tasks (including those that are reniced). You didn't rely on
sched_yield providing some semantics about not doing such a thing, did
you?
Re: sched_yield() makes OpenLDAP slow
On Thu, Aug 18, 2005 at 11:03:45PM -0700, Howard Chu wrote:
> If the 2.6 kernel makes this programming model unreasonably slow, then
> quite simply this kernel is not viable as a database platform.

Pretty much everyone else manages to make it work.
Re: sched_yield() makes OpenLDAP slow
Howard Chu <[EMAIL PROTECTED]> writes:

[...]

> concurrency. It is the nature of such a system to encounter deadlocks
> over the normal course of operations. When a deadlock is detected, some
> thread must be chosen (by one of a variety of algorithms) to abort its
> transaction, in order to allow other operations to proceed to
> completion. In this situation, the chosen thread must get control of
> the CPU long enough to clean itself up,

What prevents the transaction monitor from using, say, condition
variables to yield the CPU? That would have the additional advantage of
blocking the thread precisely until a specific event occurs, instead of
blocking for some vague, indeterminate, load- and platform-dependent
amount of time.

> and then it must yield the CPU in order to allow any other competing
> threads to complete their transaction.

Again, this sounds like a thing doable with standard POSIX
synchronization primitives.

Nikita.
Re: sched_yield() makes OpenLDAP slow
Chris Wedgwood wrote:
> On Thu, Aug 18, 2005 at 11:03:45PM -0700, Howard Chu wrote:
>> If the 2.6 kernel makes this programming model unreasonably slow,
>> then quite simply this kernel is not viable as a database platform.
>
> Pretty much everyone else manages to make it work.

And this contributes to the discussion how? Pretty much every other
Unix-ish operating system manages to make scheduling with nice'd
processes work. If you really want to get into what everyone else
manages to make work, you're in for a rough ride.

--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
Nikita Danilov wrote:
> What prevents the transaction monitor from using, say, condition
> variables to yield the CPU? That would have the additional advantage of
> blocking the thread precisely until a specific event occurs, instead of
> blocking for some vague, indeterminate, load- and platform-dependent
> amount of time.

Condition variables offer no control over which thread is woken up.
We're wandering into the design of the SleepyCat BerkeleyDB library
here, and we don't exert any control over that either. BerkeleyDB
doesn't appear to use pthread condition variables; it seems to construct
its own synchronization mechanisms on top of mutexes (and yield calls).
In this specific example, we use whatever BerkeleyDB provides, and we're
certainly not about to write our own transactional embedded database
engine just for this.

--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/
Re: sched_yield() makes OpenLDAP slow
Howard Chu wrote:
> You assume that spinlocks are the only reason a developer may want to
> yield the processor. This assumption is unfounded. Case in point - the
> primary backend in OpenLDAP uses a transactional database with
> page-level locking of its data structures to provide high levels of
> concurrency. It is the nature of such a system to encounter deadlocks
> over the normal course of operations. When a deadlock is detected, some
> thread must be chosen (by one of a variety of algorithms) to abort its
> transaction, in order to allow other operations to proceed to
> completion. In this situation, the chosen thread must get control of
> the CPU long enough to clean itself up, and then it must yield the CPU
> in order to allow any other competing threads to complete their
> transaction. The thread with the aborted transaction relinquishes all
> of its locks and then waits to get another shot at the CPU to try
> everything over again. Again, this is all fundamental to the nature of
> transactional programming. If the 2.6 kernel makes this programming
> model unreasonably slow, then quite simply this kernel is not viable as
> a database platform.

I fail to see how sched_yield is going to be very helpful in this
situation. Since that call can sleep for anywhere from zero to a long
time, it's going to give unpredictable results. It seems to me that this
sort of thing is why we have POSIX pthread synchronization primitives.
sched_yield is basically there for a process to indicate "what I'm doing
doesn't matter much, let other stuff run". Any other use of it generally
constitutes some kind of hack.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: sched_yield() makes OpenLDAP slow
Robert Hancock wrote:
> I fail to see how sched_yield is going to be very helpful in this
> situation. Since that call can sleep for anywhere from zero to a long
> time, it's going to give unpredictable results.

Well, not sleep technically, but yield the CPU for some undefined amount
of time.

> It seems to me that this sort of thing is why we have POSIX pthread
> synchronization primitives. sched_yield is basically there for a
> process to indicate "what I'm doing doesn't matter much, let other
> stuff run". Any other use of it generally constitutes some kind of
> hack.

In SCHED_OTHER mode, you're right: sched_yield is basically meaningless.
In a realtime system, it has a very well defined and probably useful
behaviour. E.g. if two SCHED_FIFO processes are running at the same
priority, one can call sched_yield to deterministically give the CPU to
the other.

--
SUSE Labs, Novell Inc.
Re: sched_yield() makes OpenLDAP slow
Andi Kleen wrote:
> It's really more a feature than a bug that it breaks so easily
> because they should be really using futexes instead, which
> have much better behaviour than any sched_yield ever could
> (they will directly wake up another process waiting for the
> lock and avoid the thundering herd for contended locks)

Actually, I believe they should be using pthread synchronization
primitives instead of relying on Linux-specific functionality. Glibc
already uses futexes internally, so it's almost as efficient.

I've already suggested this to the OpenLDAP people, but with my limited
knowledge of slapd's threading requirements, there may well be a very
good reason for busy-waiting with sched_yield(). Waiting for their
answer.

--
 // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/
Re: sched_yield() makes OpenLDAP slow
Bernardo Innocenti <[EMAIL PROTECTED]> writes:

It's really more a feature than a bug that it breaks so easily
because they should be really using futexes instead, which
have much better behaviour than any sched_yield ever could
(they will directly wake up another process waiting for the
lock and avoid the thundering herd for contended locks)

-Andi
Re: sched_yield() makes OpenLDAP slow
Nick Piggin wrote:
> We class the SCHED_OTHER policy as having a single priority, which
> I believe is allowed (and even makes good sense, because dynamic
> and even nice priorities aren't really well defined).
>
> That also makes our sched_yield() behaviour correct.
>
> AFAIKS, sched_yield should only really be used by realtime
> applications that know exactly what they're doing.

I'm pretty sure this has already been discussed in the past, but I fail
to see why this new behavior of sched_yield() would be more correct. In
the OpenLDAP bug discussion, one of the developers considers this a
Linux quirk needing a workaround, not a real bug in OpenLDAP.

As I understand it, the old behavior was to push the yielding process to
the end of the queue for processes with the same niceness. This is
somewhat closer to the (vague) definition in the POSIX man pages:

   The sched_yield() function shall force the running thread to
   relinquish the processor until it again becomes the head of its
   thread list. It takes no arguments.

Pushing the process far behind in the queue, even after niced CPU
crunchers, appears a bit extreme. It seems most programs expect
sched_yield() to only reschedule the calling thread wrt its sibling
threads, to be used to implement do-it-yourself spinlocks and the like.

--
 // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/
Re: sched_yield() makes OpenLDAP slow
Hello Con,

Thursday, August 18, 2005, 2:47:25 AM, you wrote:
> sched_yield behaviour changed in 2.5 series more than 3 years ago and
> applications that use this as a locking primitive should be updated.

I remember OpenOffice had a problem with excessive use of sched_yield()
during 2.5. I guess they changed it, but I have not checked. Does anyone
know? Back then OO was having serious latency problems on 2.5.

Regards,
Maciej
Re: sched_yield() makes OpenLDAP slow
Joseph Fannin wrote:
> On Thu, Aug 18, 2005 at 02:50:16AM +0200, Bernardo Innocenti wrote:
>> The relative timestamp reveals that slapd is spending 50ms after
>> yielding. Meanwhile, GCC is probably being scheduled for a whole
>> quantum.
>>
>> Reading the man-page of sched_yield() it seems this isn't the
>> correct behavior:
>>
>>    Note: If the current process is the only process in the highest
>>    priority list at that time, this process will continue to run
>>    after a call to sched_yield.
>
> The behavior of sched_yield changed for 2.6. I suppose the man page
> didn't get updated.

We class the SCHED_OTHER policy as having a single priority, which I
believe is allowed (and even makes good sense, because dynamic and even
nice priorities aren't really well defined).

That also makes our sched_yield() behaviour correct.

AFAIKS, sched_yield should only really be used by realtime applications
that know exactly what they're doing.
Re: sched_yield() makes OpenLDAP slow
Joseph Fannin wrote:
> The behavior of sched_yield changed for 2.6. I suppose the man page
> didn't get updated.

Now I remember reading about that on LWN or maybe KernelTraffic. Thanks!

>> I also think OpenLDAP is wrong. First, it should be calling
>> pthread_yield() because slapd is a multithreading process and it
>> just wants to run the other threads. See:
>
> Is it possible that this problem has been noticed and fixed already?

The OpenLDAP 2.3.5 source still looks like this. I've filed a report in
OpenLDAP's issue tracker:

http://www.openldap.org/its/index.cgi/Incoming?id=3950;page=2

--
 // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/
Re: sched_yield() makes OpenLDAP slow
On Thu, Aug 18, 2005 at 02:50:16AM +0200, Bernardo Innocenti wrote:
> The relative timestamp reveals that slapd is spending 50ms after
> yielding. Meanwhile, GCC is probably being scheduled for a whole
> quantum.
>
> Reading the man-page of sched_yield() it seems this isn't the correct
> behavior:
>
>    Note: If the current process is the only process in the highest
>    priority list at that time, this process will continue to run
>    after a call to sched_yield.

The behavior of sched_yield changed for 2.6. I suppose the man page
didn't get updated.

From linux/Documentation/post-halloween.txt:

| - The behavior of sched_yield() changed a lot. A task that uses
|   this system call should now expect to sleep for possibly a very
|   long time. Tasks that do not really desire to give up the
|   processor for a while should probably not make heavy use of this
|   function. Unfortunately, some GUI programs (like Open Office)
|   do make excessive use of this call and under load their
|   performance is poor. It seems this new 2.6 behavior is optimal
|   but some user-space applications may need fixing.

This is pretty much all I know about it; I just thought I'd point it out.

> I also think OpenLDAP is wrong. First, it should be calling
> pthread_yield() because slapd is a multithreading process and it just
> wants to run the other threads. See:

Is it possible that this problem has been noticed and fixed already?

--
Joseph Fannin
[EMAIL PROTECTED]

/* So there I am, in the middle of my `netfilter-is-wonderful' talk in
Sydney, and someone asks `What happens if you try to enlarge a 64k
packet here?'. I think I said something eloquent like `fuck'. - RR */
Re: sched_yield() makes OpenLDAP slow
On Thu, 18 Aug 2005 10:50 am, Bernardo Innocenti wrote:
> Hello,
>
> I've been investigating a performance problem on a server using
> OpenLDAP 2.2.26 for nss resolution and running kernel 2.6.12.
>
> When a CPU bound process such as GCC is running in the background
> (even at nice 10), many trivial commands such as "su" or "groups"
> become extremely slow and take a few seconds to complete.
>
> strace revealed that data exchange over the slapd socket was where
> most of the time was spent. Looking at the slapd side, I see several
> calls to sched_yield() like this:
>
> [pid 8780] 0.33 stat64("gidNumber.dbb", 0xb7b3ebcc) = -1 EACCES (Permission denied)
> [pid 8780] 0.59 pread(20, "\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\2\0\344\17\2\3"..., 4096, 4096) = 4096
> [pid 8780] 0.83 pread(20, "\0\0\0\0\1\0\0\0\4\0\0\0\3\0\0\0\0\0\0\0\222\0<\7\1\5\370"..., 4096, 16384) = 4096
> [pid 8780] 0.78 time(NULL) = 1124322520
> [pid 8780] 0.66 pread(11, "\0\0\0\0\1\0\0\0\250\0\0\0\231\0\0\0\235\0\0\0\16\"..., 4096, 688128) = 4096
> [pid 8780] 0.000241 write(19, "0e\2\1\3d`\4$cn=bernie,ou=group,dc=d"..., 103) = 103
> [pid 8780] 0.000137 sched_yield( <unfinished ...>
> [pid 8781] 0.050020 <... sched_yield resumed> ) = 0
> [pid 8780] 0.25 <... sched_yield resumed> ) = 0
> [pid 8781] 0.60 futex(0x925ab20, FUTEX_WAIT, 33, NULL <unfinished ...>
> [pid 8780] 0.26 write(19, "0\f\2\1\3e\7\n\1\0\4\0\4\0", 14) = 14
> [pid 8774] 0.000774 <... select resumed> ) = 1 (in [19])
>
> The relative timestamp reveals that slapd is spending 50ms after
> yielding. Meanwhile, GCC is probably being scheduled for a whole
> quantum.
>
> Reading the man-page of sched_yield() it seems this isn't the correct
> behavior:
>
>    Note: If the current process is the only process in the highest
>    priority list at that time, this process will continue to run
>    after a call to sched_yield.
>
> I also think OpenLDAP is wrong. First, it should be calling
> pthread_yield() because slapd is a multithreading process and it just
> wants to run the other threads. See:

sched_yield behaviour changed in 2.5 series more than 3 years ago and
applications that use this as a locking primitive should be updated.

Cheers,
Con
sched_yield() makes OpenLDAP slow
Hello,

I've been investigating a performance problem on a server using OpenLDAP
2.2.26 for nss resolution and running kernel 2.6.12.

When a CPU bound process such as GCC is running in the background (even
at nice 10), many trivial commands such as "su" or "groups" become
extremely slow and take a few seconds to complete.

strace revealed that data exchange over the slapd socket was where most
of the time was spent. Looking at the slapd side, I see several calls to
sched_yield() like this:

[pid 8780] 0.33 stat64("gidNumber.dbb", 0xb7b3ebcc) = -1 EACCES (Permission denied)
[pid 8780] 0.59 pread(20, "\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\2\0\344\17\2\3"..., 4096, 4096) = 4096
[pid 8780] 0.83 pread(20, "\0\0\0\0\1\0\0\0\4\0\0\0\3\0\0\0\0\0\0\0\222\0<\7\1\5\370"..., 4096, 16384) = 4096
[pid 8780] 0.78 time(NULL) = 1124322520
[pid 8780] 0.66 pread(11, "\0\0\0\0\1\0\0\0\250\0\0\0\231\0\0\0\235\0\0\0\16\"..., 4096, 688128) = 4096
[pid 8780] 0.000241 write(19, "0e\2\1\3d`\4$cn=bernie,ou=group,dc=d"..., 103) = 103
[pid 8780] 0.000137 sched_yield( <unfinished ...>
[pid 8781] 0.050020 <... sched_yield resumed> ) = 0
[pid 8780] 0.25 <... sched_yield resumed> ) = 0
[pid 8781] 0.60 futex(0x925ab20, FUTEX_WAIT, 33, NULL <unfinished ...>
[pid 8780] 0.26 write(19, "0\f\2\1\3e\7\n\1\0\4\0\4\0", 14) = 14
[pid 8774] 0.000774 <... select resumed> ) = 1 (in [19])

The relative timestamp reveals that slapd is spending 50ms after
yielding. Meanwhile, GCC is probably being scheduled for a whole
quantum.

Reading the man-page of sched_yield() it seems this isn't the correct
behavior:

   Note: If the current process is the only process in the highest
   priority list at that time, this process will continue to run
   after a call to sched_yield.

I also think OpenLDAP is wrong. First, it should be calling
pthread_yield() because slapd is a multithreading process and it just
wants to run the other threads. See:

int
ldap_pvt_thread_yield( void )
{
#if HAVE_THR_YIELD
	return thr_yield();

#elif HAVE_PTHREADS == 10
	return sched_yield();

#elif defined(_POSIX_THREAD_IS_GNU_PTH)
	sched_yield();
	return 0;

#elif HAVE_PTHREADS == 6
	pthread_yield(NULL);
	return 0;
#else
	pthread_yield();
	return 0;
#endif
}

--
 // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/
Re: sched_yield() makes OpenLDAP slow
On Thu, 18 Aug 2005 10:50 am, Bernardo Innocenti wrote: Hello, I've been investigating a performance problem on a server using OpenLDAP 2.2.26 for nss resolution and running kernel 2.6.12. When a CPU bound process such as GCC is running in the background (even at nice 10), many trivial commands such as su or groups become extremely slow and take a few seconds to complete. strace revealed that data exchange over the slapd socket was where most of the time was spent. Looking at the slapd side, I see several calls to sched_yield() like this: [pid 8780] 0.33 stat64(gidNumber.dbb, 0xb7b3ebcc) = -1 EACCES (Permission denied) [pid 8780] 0.59 pread(20, \0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\2\0\344\17\2\3..., 4096, 4096) = 4096 [pid 8780] 0.83 pread(20, \0\0\0\0\1\0\0\0\4\0\0\0\3\0\0\0\0\0\0\0\222\0\7\1\5\370..., 4096, 16384) = 4096 [pid 8780] 0.78 time(NULL)= 1124322520 [pid 8780] 0.66 pread(11, \0\0\0\0\1\0\0\0\250\0\0\0\231\0\0\0\235\0\0\0\16\..., 4096, 688128) = 4096 [pid 8780] 0.000241 write(19, 0e\2\1\3d`\4$cn=bernie,ou=group,dc=d..., 103) = 103 [pid 8780] 0.000137 sched_yield( unfinished ... [pid 8781] 0.050020 ... sched_yield resumed ) = 0 [pid 8780] 0.25 ... sched_yield resumed ) = 0 [pid 8781] 0.60 futex(0x925ab20, FUTEX_WAIT, 33, NULL unfinished ... [pid 8780] 0.26 write(19, 0\f\2\1\3e\7\n\1\0\4\0\4\0, 14) = 14 [pid 8774] 0.000774 ... select resumed ) = 1 (in [19]) The relative timestamp reveals that slapd is spending 50ms after yielding. Meanwhile, GCC is probably being scheduled for a whole quantum. Reading the man-page of sched_yield() it seems this isn't the correct behavior: Note: If the current process is the only process in the highest priority list at that time, this process will continue to run after a call to sched_yield. I also think OpenLDAP is wrong. First, it should be calling pthread_yield() because slapd is a multithreading process and it just wants to run the other threads. 
sched_yield() behaviour changed in the 2.5 series more than 3 years
ago; applications that use it as a locking primitive should be updated.

Cheers,
Con
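What "updated" typically means here is replacing a yield-and-poll loop with real blocking synchronization. A minimal sketch (not from the OpenLDAP source; the names `ready`, `lock`, and `cond` are illustrative): instead of spinning on a flag with sched_yield(), which on 2.6 can park the caller for a long time, the waiting thread blocks on a condition variable until the other thread signals progress.

```c
/* Hypothetical sketch: block on a condition variable instead of
 * polling a flag in a sched_yield() loop.  All names are
 * illustrative, not taken from OpenLDAP. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool ready = false;

void wait_for_ready(void)
{
    pthread_mutex_lock(&lock);
    while (!ready)                      /* no sched_yield() spinning */
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

void set_ready(void)
{
    pthread_mutex_lock(&lock);
    ready = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}
```

With this shape the waiter consumes no CPU and is woken exactly when there is work to do, so its latency no longer depends on how the scheduler happens to treat a yielding task.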
Re: sched_yield() makes OpenLDAP slow
On Thu, Aug 18, 2005 at 02:50:16AM +0200, Bernardo Innocenti wrote:

> The relative timestamp reveals that slapd is spending 50ms after
> yielding. Meanwhile, GCC is probably being scheduled for a whole
> quantum.
>
> Reading the man-page of sched_yield() it seems this isn't the
> correct behavior:
>
>   Note: If the current process is the only process in the highest
>   priority list at that time, this process will continue to run
>   after a call to sched_yield.

The behavior of sched_yield() changed for 2.6. I suppose the man page
didn't get updated. From linux/Documentation/post-halloween.txt:

| - The behavior of sched_yield() changed a lot. A task that uses
|   this system call should now expect to sleep for possibly a very
|   long time. Tasks that do not really desire to give up the
|   processor for a while should probably not make heavy use of this
|   function. Unfortunately, some GUI programs (like Open Office)
|   do make excessive use of this call and under load their
|   performance is poor. It seems this new 2.6 behavior is optimal
|   but some user-space applications may need fixing.

This is pretty much all I know about it; I just thought I'd point it out.

> I also think OpenLDAP is wrong. First, it should be calling
> pthread_yield() because slapd is a multithreading process and it
> just wants to run the other threads.

Is it possible that this problem has been noticed and fixed already?

--
Joseph Fannin
[EMAIL PROTECTED]

/* So there I am, in the middle of my `netfilter-is-wonderful' talk in
   Sydney, and someone asks `What happens if you try to enlarge a 64k
   packet here?'. I think I said something eloquent like `fuck'. - RR */
Re: sched_yield() makes OpenLDAP slow
Joseph Fannin wrote:

> The behavior of sched_yield changed for 2.6. I suppose the man page
> didn't get updated.

Now I remember reading about that on LWN or maybe KernelTraffic. Thanks!

>> I also think OpenLDAP is wrong. First, it should be calling
>> pthread_yield() because slapd is a multithreading process and it
>> just wants to run the other threads.
>
> Is it possible that this problem has been noticed and fixed already?

The OpenLDAP 2.3.5 source still looks like this. I've filed a report in
OpenLDAP's issue tracker:

http://www.openldap.org/its/index.cgi/Incoming?id=3950;page=2

--
 // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/
Re: sched_yield() makes OpenLDAP slow
Joseph Fannin wrote:

> On Thu, Aug 18, 2005 at 02:50:16AM +0200, Bernardo Innocenti wrote:
>> The relative timestamp reveals that slapd is spending 50ms after
>> yielding. Meanwhile, GCC is probably being scheduled for a whole
>> quantum.
>>
>> Reading the man-page of sched_yield() it seems this isn't the
>> correct behavior:
>>
>>   Note: If the current process is the only process in the highest
>>   priority list at that time, this process will continue to run
>>   after a call to sched_yield.
>
> The behavior of sched_yield changed for 2.6. I suppose the man page
> didn't get updated.

We class the SCHED_OTHER policy as having a single priority, which I
believe is allowed (and even makes good sense, because dynamic and even
nice priorities aren't really well defined). That also makes our
sched_yield() behaviour correct.

AFAIKS, sched_yield() should only really be used by realtime
applications that know exactly what they're doing.