Re: nagios and pthreads

2005-07-14 Thread Michal Mertl
Jeremie Le Hen wrote:
 Hi Christophe,
 
 a quick glance at the archives would have helped you.

I also experienced the problem and read the thread. I don't believe
anybody found and shared a way to solve it. The conclusion of the thread
was that the problem lies more in the application than in FreeBSD: the
application does things that are not well defined in a POSIX threading
environment.

The right fix is probably a non-trivial change to Nagios.



___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: nagios and pthreads

2005-07-13 Thread Jeremie Le Hen
Hi Christophe,

A quick glance at the archives would have helped you.

This thread should contain some answers:
http://lists.freebsd.org/pipermail/freebsd-hackers/2005-June/012435.html

Regards,
-- 
Jeremie Le Hen
 jeremie at le-hen dot org  ttz at chchile dot org 


Re: nagios and pthreads

2005-07-13 Thread Christophe Yayon
Hi,

Yes, but if I understand correctly, there is a bug in libc_r on FreeBSD?




Re: nagios and pthreads

2005-07-13 Thread Jeremie Le Hen
Christophe,

 Yes, but if I understand correctly, there is a bug in libc_r on FreeBSD?

libc_r may indeed have some kind of bug; I don't know.

Anyhow, you are using RELENG_5, so you should be using native threads
with either libpthread (libkse, M:N) or libthr (1:1).

I don't know what Nagios does just after fork(2); it would be worth
checking.  It appears that fork(2)ing without exec(2)ing or _exit(2)ing
in a pthreaded program is not valid behaviour according to SUSv3 [1].
I am not trying to avoid admitting that there may be a problem in the
FreeBSD threading library, and I don't know how other OSes handle this,
but the Nagios folks should really avoid doing what SUSv3 explicitly
discourages.

Unfortunately, this doesn't resolve your problem for now.

[1] http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html
(look at the RATIONALE section)
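
To illustrate the pattern that the RATIONALE recommends, here is a minimal
sketch in C. It is not taken from Nagios; the helper name run_check and its
argv convention are assumptions made up for this example. The point is only
that the child calls nothing but async-signal-safe functions (here execv(2)
and _exit(2)) between fork(2) and the exec, so it can never block on a lock
that another thread held at the moment of the fork.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Nagios code): run an external check command
 * from a threaded program. argv[0] must be the full path to the program.
 * Returns the command's exit status, or -1 on error.
 */
int
run_check(char *const argv[])
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /*
         * Child: only the forking thread exists here, and any lock held
         * by another thread at fork time stays locked. Call nothing but
         * async-signal-safe functions: execv(2), then _exit(2).
         */
        execv(argv[0], argv);
        _exit(127);     /* reached only if execv failed */
    }

    /* Parent: reap the child, retrying if interrupted by a signal. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector such as
{ "/bin/sh", "-c", "exit 3", NULL }. Whether Nagios can be restructured
this way is a separate question; its double fork may do more work in the
child than this sketch allows.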
-- 
Jeremie Le Hen
 jeremie at le-hen dot org  ttz at chchile dot org 


Re: nagios and pthreads

2005-07-13 Thread Christophe Yayon
Thanks Jeremie,

I will ask the Nagios developers how to use libpthread, and whether there
is a configure option for it when compiling...

Bye.




Re: nagios and pthreads

2005-07-13 Thread Jeremie Le Hen
Hi Michal,

 I also experienced the problem and read the thread. I don't believe
 anybody found and shared a way to solve it. The conclusion of the thread
 was that the problem lies more in the application than in FreeBSD: the
 application does things that are not well defined in a POSIX threading
 environment.
 
 The right fix is probably a non-trivial change to Nagios.

That is exactly my feeling.  I think Nagios got pthread support only
recently and therefore has to carry its historical architectural choices.
This problem, combined with the fact that most open-source developers
test their products only under Linux, leads to misbehaviour when it is
run on other Unices, like FreeBSD.  Some brave people with the appropriate
skills and motivation should try to patch Nagios and then try to convince
the Nagios developers to integrate the change.

Regards,
-- 
Jeremie Le Hen
 jeremie at le-hen dot org  ttz at chchile dot org 


nagios and pthreads

2005-07-12 Thread Christophe Yayon
Hi all,


I know that we have already discussed this problem, but is there any
solution for it?

---
What's section on nagios website
FreeBSD and threads. On FreeBSD there's a native user-level
implementation of threads called 'pthread' and there's also an optional
ports collection 'linuxthreads' that uses kernel hooks. Some folks from
Yahoo! have reported that using the pthread library causes Nagios to pause
under heavy I/O load, causing some service check results to be lost.
Switching to linuxthreads seems to help this problem, but not fix it. The
lock happens in liblthread's __pthread_acquire() - it can't ever acquire
the spinlock. It happens when the main thread forks to execute an active
check. On the second fork to create the grandchild, the grandchild is
created by fork, but never returns from liblthread's fork wrapper, because
it's stuck in __pthread_acquire(). Maybe some FreeBSD users can help out
with this problem.
---


I have just upgraded to 5.4-STABLE but I encountered the problem again.
Sometimes there is a forked Nagios child process which consumes 100% of
CPU.
I have heard that there was perhaps a problem with libc_r reported by
Luigi Rizzo on this list on 06/22/2005, but there has been no news since
that date...

My workaround is a cron job which runs every hour, checks whether there
is a bad nagios process and kills it... I know it's very ugly...

Do you have any solution, or what could I do to get more trace when it
happens? Sorry, but I am not familiar with ktrace-like tools... If someone
could help me to help the Nagios community on FreeBSD ;-) ?

Thanks in advance.



