Re: nagios and pthreads
Jeremie Le Hen wrote:
> Hi Christophe,
>
> a quick glance at the archives would have helped you.

I also experienced the problem and read the thread. I don't believe anybody found and shared a way to solve it. The conclusion of the thread was that the problem lies more in the application than in FreeBSD: the application does things that are not well defined in a POSIX threading environment. The right fix is probably a non-trivial change to Nagios.

>> I know that we have already discussed this problem, but is there any solution for it?
>>
>> ---
>> Here is the section on the Nagios website about FreeBSD and threads:
>>
>> On FreeBSD there's a native user-level implementation of threads called 'pthread' and there's also an optional ports collection 'linuxthreads' that uses kernel hooks. Some folks from Yahoo! have reported that using the pthread library causes Nagios to pause under heavy I/O load, causing some service check results to be lost. Switching to linuxthreads seems to help this problem, but not fix it. The lock happens in liblthread's __pthread_acquire(): it can't ever acquire the spinlock. It happens when the main thread forks to execute an active check. On the second fork to create the grandchild, the grandchild is created by fork, but never returns from liblthread's fork wrapper, because it's stuck in __pthread_acquire(). Maybe some FreeBSD users can help out with this problem.
>> ---
>>
>> I have just upgraded to 5.4-STABLE, but I ran into the problem again. Sometimes a forked Nagios child process consumes 100% of the CPU. I have heard that there was perhaps a problem with libc_r, reported by Luigi Rizzo on this list on 06/22/2005, but there has been no news since that date...
>>
>> My workaround is a cron job that runs every hour, checks whether there is a bad Nagios process, and kills it... I know it's very ugly...
>>
>> Do you have any solution, or what could I do to get more trace output when it happens? Sorry, but I am not familiar with ktrace-like tools...
>>
>> Could someone help me to help the Nagios community on FreeBSD ;-) ?
>>
>> Thanks in advance.
>
> This thread should contain some answers:
> http://lists.freebsd.org/pipermail/freebsd-hackers/2005-June/012435.html
>
> Regards,

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: nagios and pthreads
Hi Christophe,

a quick glance at the archives would have helped you. This thread should contain some answers:
http://lists.freebsd.org/pipermail/freebsd-hackers/2005-June/012435.html

Regards,
--
Jeremie Le Hen
jeremie at le-hen dot org
ttz at chchile dot org
Re: nagios and pthreads
Hi,

Yes, but if I understand correctly, there is a bug in libc_r on FreeBSD?
Re: nagios and pthreads
Christophe,

> Yes, but if I understand correctly, there is a bug in libc_r on FreeBSD?

libc_r may indeed have some kind of bug; I don't know. Anyhow, you are using RELENG_5, so you should be using native threads with either libpthread (libkse, M:N) or libthr (1:1).

I don't know what Nagios does just after fork(2); it would be worth checking. It appears that calling fork(2) without then calling exec(2) or _exit(2) in a pthreaded program is not valid behaviour according to SUSv3 [1]. I am not trying to avoid admitting that there may be a problem in the FreeBSD threading library, and I don't know how other OSes handle this, but the Nagios folks should really avoid doing what is explicitly discouraged in SUSv3.

Unfortunately, this doesn't resolve your problem for now.

[1] http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html (look at the RATIONALE section)

--
Jeremie Le Hen
jeremie at le-hen dot org
ttz at chchile dot org
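The fork-then-exec discipline described in this message can be sketched as follows. This is only an illustration in Python, not Nagios code (Nagios is written in C), and the helper name `run_check` is hypothetical. The idea is that the child does nothing after fork except replace its process image or leave through `_exit`, so it never needs a lock that another thread may have held at fork time.

```python
import os
import sys


def run_check(argv):
    """Fork, exec argv in the child, and return the child's exit status.

    Hypothetical helper: it only illustrates the "exec or _exit right
    after fork" rule discussed above.
    """
    pid = os.fork()
    if pid == 0:
        # Child: avoid any work that might need a lock owned by a thread
        # that no longer exists here. Replace the process image at once.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        # exec failed: leave with _exit so that no cleanup handlers
        # inherited from the parent run in the child.
        os._exit(127)
    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Terminated by a signal: report it as a negative number.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    code = run_check([sys.executable, "-c", "raise SystemExit(3)"])
    print("check exited with", code)
```

Whether Nagios can be restructured along these lines is exactly the open question in this thread; the sketch shows only the pattern, not a patch.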
Re: nagios and pthreads
Thanks Jeremie,

I will ask the Nagios developers how to use libpthread, and whether there is a configure option for it at compile time...

Bye.
Re: nagios and pthreads
Hi Michal,

> I also experienced the problem and read the thread. I don't believe anybody found and shared a way to solve it. The conclusion of the thread was that the problem lies more in the application than in FreeBSD: the application does things that are not well defined in a POSIX threading environment. The right fix is probably a non-trivial change to Nagios.

That is exactly my feeling. I think Nagios got pthread support only recently and therefore has to carry its historical architectural choices. This problem, combined with the fact that most open-source developers test their products under Linux only, leads to misbehaviour when Nagios is run on other Unices, like FreeBSD.

Some brave people with the appropriate skills and motivation should try to patch Nagios and then try to convince the Nagios developers to integrate the change.

Regards,
--
Jeremie Le Hen
jeremie at le-hen dot org
ttz at chchile dot org
nagios and pthreads
Hi all,

I know that we have already discussed this problem, but is there any solution for it?

---
Here is the section on the Nagios website about FreeBSD and threads:

On FreeBSD there's a native user-level implementation of threads called 'pthread' and there's also an optional ports collection 'linuxthreads' that uses kernel hooks. Some folks from Yahoo! have reported that using the pthread library causes Nagios to pause under heavy I/O load, causing some service check results to be lost. Switching to linuxthreads seems to help this problem, but not fix it. The lock happens in liblthread's __pthread_acquire(): it can't ever acquire the spinlock. It happens when the main thread forks to execute an active check. On the second fork to create the grandchild, the grandchild is created by fork, but never returns from liblthread's fork wrapper, because it's stuck in __pthread_acquire(). Maybe some FreeBSD users can help out with this problem.
---

I have just upgraded to 5.4-STABLE, but I ran into the problem again. Sometimes a forked Nagios child process consumes 100% of the CPU. I have heard that there was perhaps a problem with libc_r, reported by Luigi Rizzo on this list on 06/22/2005, but there has been no news since that date...

My workaround is a cron job that runs every hour, checks whether there is a bad Nagios process, and kills it... I know it's very ugly...

Do you have any solution, or what could I do to get more trace output when it happens? Sorry, but I am not familiar with ktrace-like tools...

Could someone help me to help the Nagios community on FreeBSD ;-) ?

Thanks in advance.
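For readers who want to reproduce the cron workaround mentioned above, here is a minimal sketch of such a watchdog in Python. It is not the poster's actual script. The function names, the 90% threshold, and the `ps -axo pid,pcpu,comm` invocation are assumptions made for illustration. It also cannot tell the main Nagios daemon apart from a stuck forked child, so it is as blunt as the original workaround.

```python
import os
import signal
import subprocess


def find_runaway(ps_output, name="nagios", cpu_threshold=90.0):
    """Return PIDs of processes called `name` at or above cpu_threshold.

    Expects text with three columns (pid, %cpu, command) and one header
    line, as assumed for `ps -axo pid,pcpu,comm`.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, pcpu, comm = fields
        try:
            if comm.strip() == name and float(pcpu) >= cpu_threshold:
                pids.append(int(pid))
        except ValueError:
            # Unparseable row: ignore it rather than guess.
            continue
    return pids


def kill_runaway(name="nagios", cpu_threshold=90.0):
    """Query ps and send SIGKILL to every matching process."""
    out = subprocess.run(
        ["ps", "-axo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    victims = find_runaway(out, name, cpu_threshold)
    for pid in victims:
        os.kill(pid, signal.SIGKILL)
    return victims
```

Keeping the parsing in `find_runaway` separate from the killing makes it possible to check the selection logic against saved `ps` output before letting cron run `kill_runaway` unattended.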