Re: Exec-Program output: freeradius not reading response?

2004-10-27 Thread Paul Hampson
On Tue, Oct 26, 2004 at 02:54:45PM -0700, Nate M wrote:
> > 
> > I've done some troubleshooting of my own.  I'm unsure whether this is
> > helpful, but the process appears to hang indefinitely, until it is cleaned
> > up, within this section of threads.c (beginning at line 1141).  The line it
> > hangs on is the "rcode = ..." line.  I am not enough of a C guru to know
> > where to go from here, though.
> > 
> > re_wait:
> >     rcode = sem_wait(&forkers[found].child_done);
> >     if ((rcode != 0) && (errno == EINTR)) {
> >         goto re_wait;
> >     }
> > }

> > Your time and help in troubleshooting this has been greatly appreciated!
> > =)

> Additionally, I just compiled a 2.4.27 kernel on this machine and the problem
> stops.  2.6.5, 2.6.8.1 and 2.6.9 all vomit.  2.6 bug perhaps?

Hmm. It might be an NPTL issue... Try setting the following environment
variable for FreeRADIUS and see if that fixes it:
LD_ASSUME_KERNEL=2.4.1
(This _should_ make it run with LinuxThreads, rather than NPTL.)

(See http://people.redhat.com/drepper/assumekernel.html for details of
what LD_ASSUME_KERNEL does.)
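
To see whether the variable had any effect, it can help to check which thread
library the process actually ended up with. The following is a minimal sketch,
assuming a glibc system: _CS_GNU_LIBPTHREAD_VERSION is a glibc extension to
confstr(), and the helper names thread_library_name() and is_nptl() are made
up for this illustration. Nothing here is taken from FreeRADIUS.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the thread library identification string that
 * glibc reports into buf.  Returns 0 on success, or -1 if the string is
 * unavailable (non-glibc system) or does not fit in len bytes.
 */
int thread_library_name(char *buf, size_t len)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    size_t needed = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);

    if (needed == 0 || needed > len)
        return -1;
    return 0;
#else
    (void)buf;
    (void)len;
    return -1;
#endif
}

/* Hypothetical helper: returns 1 if the string starts with "NPTL", else 0. */
int is_nptl(const char *name)
{
    return strncmp(name, "NPTL", 4) == 0;
}
```

Printing the string from a small test program started with and without
LD_ASSUME_KERNEL=2.4.1 in its environment should show whether the setting
changed the library in use. On NPTL systems the string is expected to begin
with "NPTL".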

-- 
Paul "TBBle" Hampson, on an alternate email client.

- 
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html


Re: Exec-Program output: freeradius not reading response?

2004-10-27 Thread Alan DeKok
"Nate M" <[EMAIL PROTECTED]> wrote:
> Additionally, I just compiled a 2.4.27 kernel on this machine and the problem
> stops.  2.6.5, 2.6.8.1 and 2.6.9 all vomit.  2.6 bug perhaps?

  Looks like it.  If the FreeRADIUS code works on other platforms, and
other versions of Linux, then I'm inclined to say that the FreeRADIUS
code is correct, and 2.6 isn't.

  As to how to fix it, I'm not sure I can suggest anything other than
bugging the Linux people.

  Alan DeKok.




RE: Exec-Program output: freeradius not reading response?

2004-10-26 Thread Nate M
> 
> I've done some troubleshooting of my own.  I'm unsure whether this is
> helpful, but the process appears to hang indefinitely, until it is cleaned
> up, within this section of threads.c (beginning at line 1141).  The line it
> hangs on is the "rcode = ..." line.  I am not enough of a C guru to know
> where to go from here, though.
> 
> re_wait:
>     rcode = sem_wait(&forkers[found].child_done);
>     if ((rcode != 0) && (errno == EINTR)) {
>         goto re_wait;
>     }
> }
> 
> Your time and help in troubleshooting this has been greatly appreciated!
> =)
> 

Additionally, I just compiled a 2.4.27 kernel on this machine and the problem
stops.  2.6.5, 2.6.8.1 and 2.6.9 all vomit.  2.6 bug perhaps?

-Nate




RE: Exec-Program output: freeradius not reading response?

2004-10-26 Thread Nate M
> > "Nate M" <[EMAIL PROTECTED]> wrote:
> > > When I send multiple requests to radiusd, it occasionally fails to receive,
> > > or somehow omits, the exit status of Exec-Program-Wait.
> >
> >   I haven't been able to reproduce it here, so I'm not sure how to fix
> > it.
> >
> >   The only thing I can think of is that some platforms don't have
> > pthread_sigmask.  See src/main/threads.c for how it's used.
> >
> >   Alan DeKok.
> 
> Thanks for the reply Alan, I did confirm my test systems have
> pthread_sigmask:
> 
> checking for pthread.h... yes
> checking for pthread_create in -lpthread... yes
> checking for pthread_sigmask... yes
> 
> While troubleshooting I also confirmed the same issue with rlm_exec doing a
> similar task to what I'm accomplishing in Exec-Program-Wait.
> 
> I've reproduced this on various systems (although all are newer RH or
> Fedora installs) and all behave the same.  However, I was not able to
> duplicate it on an older Red Hat 7.2 machine.
> 
> Is there additional data I can provide to further diagnose this issue?  I'm
> not opposed to opening up access to this test box if that would be helpful.
> 

I've done some troubleshooting of my own.  I'm unsure whether this is
helpful, but the process appears to hang indefinitely, until it is cleaned
up, within this section of threads.c (beginning at line 1141).  The line it
hangs on is the "rcode = ..." line.  I am not enough of a C guru to know
where to go from here, though.

re_wait:
    rcode = sem_wait(&forkers[found].child_done);
    if ((rcode != 0) && (errno == EINTR)) {
        goto re_wait;
    }
}
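
The loop above restarts sem_wait() whenever a signal interrupts it, so if
child_done is never posted the thread blocks forever. As a rough sketch (not
the FreeRADIUS code; both function names are hypothetical), the same retry can
be written as a do/while, and a sem_timedwait() variant gives the caller a
chance to log the stall and recover instead of hanging:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Wait on sem, restarting the call each time a signal interrupts it. */
int wait_restarting(sem_t *sem)
{
    int rcode;

    do {
        rcode = sem_wait(sem);
    } while (rcode != 0 && errno == EINTR);

    return rcode;
}

/*
 * Same retry, but give up after timeout_sec seconds.
 * Returns 0 when the semaphore was acquired; returns -1 with errno set to
 * ETIMEDOUT when the deadline passed without a post.
 */
int wait_with_deadline(sem_t *sem, int timeout_sec)
{
    struct timespec deadline;
    int rcode;

    /* sem_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return -1;
    deadline.tv_sec += timeout_sec;

    do {
        rcode = sem_timedwait(sem, &deadline);
    } while (rcode != 0 && errno == EINTR);

    return rcode;
}
```

A timeout does not fix a lost wakeup, but it turns an indefinite hang into an
error the caller can report.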

Your time and help in troubleshooting this has been greatly appreciated! =)

- Nate




Re: Exec-Program output: freeradius not reading response?

2004-10-26 Thread Alan DeKok
"Nate M" <[EMAIL PROTECTED]> wrote:
> While troubleshooting I also confirmed the same issue with rlm_exec doing a
> similar task to what I'm accomplishing in Exec-Program-Wait.

  rlm_exec calls the same functions to do the exec, so it should have
all the same "features" as Exec-Program-Wait.

> I've reproduced this on various systems (although all are newer RH or
> Fedora installs) and all behave the same.  However, I was not able to
> duplicate it on an older Red Hat 7.2 machine.

  That sounds to me like it's a problem with newer glibc, or kernel.
I don't see the problem on the Solaris or NetBSD machines I have
access to.

> Is there additional data I can provide to further diagnose this issue?

  The problem is that the SIGCHLD's are going somewhere, but not where
they're supposed to go.  So the code in FreeRADIUS doesn't work,
because the signals aren't behaving as expected.

> I'm not opposed to opening up access to this test box if that would
> be helpful.

  I don't have time for that, sorry.

  All I can suggest is a re-examination of the way the server deals
with threads & SIGCHLD's.
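
One direction such a re-examination could take, offered only as a sketch and
not as a description of how the server works: let the thread that forks the
program also wait for that specific child with waitpid(), so the result does
not depend on which thread a SIGCHLD happens to be delivered to. The helper
name run_and_wait() is hypothetical, and the sketch assumes SIGCHLD is not set
to SIG_IGN in the process (otherwise waitpid() cannot collect the status).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run the program named by argv[0] and return its exit
 * status (0-255).  Returns -1 if the fork failed, the wait failed, or the
 * child did not exit normally (for example, it was killed by a signal).
 */
int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace the process image; only returns on failure. */
        execv(argv[0], argv);
        _exit(127);
    }

    /* Parent: wait for this child only, restarting if interrupted. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an argv such as {"/bin/sh", "-c", "exit 1", NULL}, the function returns
the script's exit status directly to the thread that started it.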

  Alan DeKok.



RE: Exec-Program output: freeradius not reading response?

2004-10-26 Thread Nate M
> "Nate M" <[EMAIL PROTECTED]> wrote:
> > When I send multiple requests to radiusd, it occasionally fails to receive,
> > or somehow omits, the exit status of Exec-Program-Wait.
> 
>   I haven't been able to reproduce it here, so I'm not sure how to fix
> it.
> 
>   The only thing I can think of is that some platforms don't have
> pthread_sigmask.  See src/main/threads.c for how it's used.
> 
>   Alan DeKok.

Thanks for the reply Alan, I did confirm my test systems have
pthread_sigmask:

checking for pthread.h... yes
checking for pthread_create in -lpthread... yes
checking for pthread_sigmask... yes

While troubleshooting I also confirmed the same issue with rlm_exec doing a
similar task to what I'm accomplishing in Exec-Program-Wait.

I've reproduced this on various systems (although all are newer RH or
Fedora installs) and all behave the same.  However, I was not able to
duplicate it on an older Red Hat 7.2 machine.

Is there additional data I can provide to further diagnose this issue?  I'm
not opposed to opening up access to this test box if that would be helpful.

- Nate




Re: Exec-Program output: freeradius not reading response?

2004-10-26 Thread Alan DeKok
"Nate M" <[EMAIL PROTECTED]> wrote:
> When I send multiple requests to radiusd, it occasionally fails to receive,
> or somehow omits, the exit status of Exec-Program-Wait.

  I haven't been able to reproduce it here, so I'm not sure how to fix
it.

  The only thing I can think of is that some platforms don't have
pthread_sigmask.  See src/main/threads.c for how it's used.
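
For readers unfamiliar with the call: pthread_sigmask() changes the signal
mask of the calling thread only, which is how a threaded program can steer a
signal such as SIGCHLD away from particular threads. A minimal sketch follows.
The helper names are invented for illustration, and this is not a copy of what
src/main/threads.c does.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/*
 * Block SIGCHLD in the calling thread.  Threads created afterwards by this
 * thread inherit the mask.  Returns 0 on success or an error number.
 */
int block_sigchld(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/*
 * Report whether SIGCHLD is blocked in the calling thread:
 * 1 if blocked, 0 if not, -1 if the mask could not be read.
 */
int sigchld_is_blocked(void)
{
    sigset_t current;

    /* A NULL new set leaves the mask unchanged and only reads it back. */
    if (pthread_sigmask(SIG_SETMASK, NULL, &current) != 0)
        return -1;
    return sigismember(&current, SIGCHLD);
}
```

Calling sigchld_is_blocked() from each thread of a test program is one way to
check where SIGCHLD can and cannot be delivered.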

  Alan DeKok.




Exec-Program output: freeradius not reading response?

2004-10-25 Thread Nate M
When I send multiple requests to radiusd, it occasionally fails to receive,
or somehow omits, the exit status of Exec-Program-Wait.

--- log snippet ---
radius_xlat:  '/etc/raddb/scripts/test.pl'
Exec-Program: /etc/raddb/scripts/test.pl
Waking up in 3 seconds...
Exec-Program output: 
--- Walking the entire request list ---

And later the server gives up on the request:

--- log snippet ---
WARNING: Unresponsive child (id 1098905952) for request 6
Server rejecting request 6.

A good request looks like:

Exec-Program output: 0 (for success)
or
Exec-Program output: 1 (for reject)

I can duplicate this over and over on various machines and platforms.
The problem cannot be duplicated in -s mode.

I have tons of extra logs available (previously posted to the list) if that
will help diagnose this issue.

Anyone's help is greatly appreciated.


- Nathan Miller


