Re: Exec-Program output: freeradius not reading response?
On Tue, Oct 26, 2004 at 02:54:45PM -0700, Nate M wrote:
> > I've done some troubleshooting of my own, and unsure if this is helpful or
> > not, but the process appears to be hanging indefinitely until cleaned up
> > within this section of threads.c (beginning line 1141). The line in
> > particular it hangs on is the "rcode = ..." line. I am not enough of a C
> > guru to know where to go from here though.
> >
> > re_wait:
> >     rcode = sem_wait(&forkers[found].child_done);
> >     if ((rcode != 0) && (errno == EINTR)) {
> >         goto re_wait;
> >     }
> > }
> >
> > Your time and help in troubleshooting this has been greatly appreciated!
> > =)
>
> Additionally.. I just compiled 2.4.27 kernel on this machine and the problem
> stops. 2.6.5, 2.6.8.1 and 2.6.9 all vomit. 2.6 bug perhaps?

Hmm. It might be an NPTL issue... Try setting the following environment
variable for FreeRADIUS and see if that fixes it:

    LD_ASSUME_KERNEL=2.4.1

(This _should_ make it run with LinuxThreads, rather than NPTL.)

(See http://people.redhat.com/drepper/assumekernel.html for details of
what LD_ASSUME_KERNEL does.)

-- 
Paul "TBBle" Hampson, on an alternate email client.

- List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
Re: Exec-Program output: freeradius not reading response?
"Nate M" <[EMAIL PROTECTED]> wrote:
> Additionally.. I just compiled 2.4.27 kernel on this machine and the problem
> stops. 2.6.5, 2.6.8.1 and 2.6.9 all vomit. 2.6 bug perhaps?

Looks like it. If the FreeRADIUS code works on other platforms, and
other versions of Linux, then I'm inclined to say that the FreeRADIUS
code is correct, and 2.6 isn't.

As to how to fix it, I'm not sure I can suggest anything other than
bugging the Linux people.

Alan DeKok.
RE: Exec-Program output: freeradius not reading response?
> I've done some troubleshooting of my own, and unsure if this is helpful or
> not, but the process appears to be hanging indefinitely until cleaned up
> within this section of threads.c (beginning line 1141). The line in
> particular it hangs on is the "rcode = ..." line. I am not enough of a C
> guru to know where to go from here though.
>
> re_wait:
>     rcode = sem_wait(&forkers[found].child_done);
>     if ((rcode != 0) && (errno == EINTR)) {
>         goto re_wait;
>     }
> }
>
> Your time and help in troubleshooting this has been greatly appreciated!
> =)

Additionally.. I just compiled 2.4.27 kernel on this machine and the problem
stops. 2.6.5, 2.6.8.1 and 2.6.9 all vomit. 2.6 bug perhaps?

-Nate
RE: Exec-Program output: freeradius not reading response?
> > "Nate M" <[EMAIL PROTECTED]> wrote:
> > > Problem exists, when posting multiple requests to radiusd it occasionally
> > > will not receive or somehow omit the exit status of Exec-Program-Wait.
> >
> > I haven't been able to reproduce it here, so I'm not sure how to fix
> > it.
> >
> > The only thing I can think of is that some platforms don't have
> > pthread_sigmask. See src/main/threads.c for how it's used.
> >
> > Alan DeKok.
>
> Thanks for the reply Alan, I did confirm my test systems have
> pthread_sigmask:
>
> checking for pthread.h... yes
> checking for pthread_create in -lpthread... yes
> checking for pthread_sigmask... yes
>
> While troubleshooting I also confirmed the same issue with rlm_exec doing a
> similar task to what I'm accomplishing in exec-program-wait.
>
> I've reproduced this on various systems (although, all are newer RH or
> Fedora installs) and all perform the same. I however was not able to
> duplicate it on an older Redhat 7.2 machine.
>
> Is there additional data I can provide to further diag this issue? I'm not
> opposed to opening up access to this test box if that would be helpful.

I've done some troubleshooting of my own, and unsure if this is helpful or
not, but the process appears to be hanging indefinitely until cleaned up
within this section of threads.c (beginning line 1141). The line in
particular it hangs on is the "rcode = ..." line. I am not enough of a C
guru to know where to go from here though.

re_wait:
    rcode = sem_wait(&forkers[found].child_done);
    if ((rcode != 0) && (errno == EINTR)) {
        goto re_wait;
    }
}

Your time and help in troubleshooting this has been greatly appreciated! =)

- Nate
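For anyone reading the snippet above without the rest of threads.c: the goto loop simply retries sem_wait() whenever a signal interrupts it (EINTR). Below is a minimal, standalone sketch of the same retry idea written as a do/while loop. It is not the FreeRADIUS code; the function name sem_wait_retry is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of FreeRADIUS): wait on a POSIX
 * semaphore, retrying when the wait is interrupted by a signal.
 * Returns 0 on success, or -1 with errno set for any other error.
 */
int sem_wait_retry(sem_t *sem)
{
    int rcode;

    assert(sem != NULL);

    do {
        rcode = sem_wait(sem);
    } while ((rcode != 0) && (errno == EINTR));

    return rcode;
}
```

Note that this loop, like the original, blocks for as long as nobody posts the semaphore. If the notification that should trigger the post never arrives, the wait never returns, which matches the hang described above.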
Re: Exec-Program output: freeradius not reading response?
"Nate M" <[EMAIL PROTECTED]> wrote:
> While troubleshooting I also confirmed the same issue with rlm_exec doing a
> similar task to what I'm accomplishing in exec-program-wait.

rlm_exec calls the same functions to do the exec, so it should have
all the same "features" as Exec-Program-Wait.

> I've reproduced this on various systems (although, all are newer RH or
> Fedora installs) and all perform the same. I however was not able to
> duplicate it on an older Redhat 7.2 machine.

That sounds to me like it's a problem with newer glibc, or kernel. I
don't see the problem on the Solaris or NetBSD machines I have access to.

> Is there additional data I can provide to further diag this issue?

The problem is that the SIGCHLD's are going somewhere, but not where
they're supposed to go. So the code in FreeRADIUS doesn't work,
because the signals aren't behaving as expected.

> I'm not opposed to opening up access to this test box if that would
> be helpful.

I don't have time for that, sorry. All I can suggest is a
re-examination of the way the server deals with threads & SIGCHLD's.

Alan DeKok.
RE: Exec-Program output: freeradius not reading response?
> "Nate M" <[EMAIL PROTECTED]> wrote:
> > Problem exists, when posting multiple requests to radiusd it occasionally
> > will not receive or somehow omit the exit status of Exec-Program-Wait.
>
> I haven't been able to reproduce it here, so I'm not sure how to fix
> it.
>
> The only thing I can think of is that some platforms don't have
> pthread_sigmask. See src/main/threads.c for how it's used.
>
> Alan DeKok.

Thanks for the reply Alan, I did confirm my test systems have
pthread_sigmask:

checking for pthread.h... yes
checking for pthread_create in -lpthread... yes
checking for pthread_sigmask... yes

While troubleshooting I also confirmed the same issue with rlm_exec doing a
similar task to what I'm accomplishing in exec-program-wait.

I've reproduced this on various systems (although, all are newer RH or
Fedora installs) and all perform the same. I however was not able to
duplicate it on an older Redhat 7.2 machine.

Is there additional data I can provide to further diag this issue? I'm not
opposed to opening up access to this test box if that would be helpful.

- Nate
Re: Exec-Program output: freeradius not reading response?
"Nate M" <[EMAIL PROTECTED]> wrote:
> Problem exists, when posting multiple requests to radiusd it occasionally
> will not receive or somehow omit the exit status of Exec-Program-Wait.

I haven't been able to reproduce it here, so I'm not sure how to fix
it.

The only thing I can think of is that some platforms don't have
pthread_sigmask. See src/main/threads.c for how it's used.

Alan DeKok.
Exec-Program output: freeradius not reading response?
Problem exists, when posting multiple requests to radiusd it occasionally
will not receive or somehow omit the exit status of Exec-Program-Wait.

--- log snippet ---
radius_xlat: '/etc/raddb/scripts/test.pl'
Exec-Program: /etc/raddb/scripts/test.pl
Waking up in 3 seconds...
Exec-Program output:
--- Walking the entire request list ---

And later the process blorts..

--- log snippet ---
WARNING: Unresponsive child (id 1098905952) for request 6
Server rejecting request 6.

A good request looks like:

Exec-Program output: 0 (for success)
or
Exec-Program output: 1 (for reject)

I can duplicate this over and over on various machines and platforms.
Problem cannot be duplicated in -s mode.

I have tons of extra logs available (and previously posted in list) if that
will help diagnose this issue. Anyone's help is greatly appreciated.

- Nathan Miller