Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread Wolfgang Walter
Am Mittwoch, 12. September 2007 15:37 schrieb J. Bruce Fields:
> On Wed, Sep 12, 2007 at 02:07:10PM +0200, Wolfgang Walter wrote:
> > as already described old temporary sockets (client is gone) of lockd
> > aren't closed after some time. So, with enough clients and some time
> > gone, there are 80 open dangling sockets and you start getting messages
> > of the form:
> >
> > lockd: too many open TCP sockets, consider increasing the number of nfsd
> > threads.
>
> Thanks for working on this problem!
>
> > If I understand the code then the intention was that the server closes
> > temporary sockets after about 6 to 12 minutes:
> >
> > a timer is started which calls svc_age_temp_sockets every 6 minutes.
> >
> > svc_age_temp_sockets:
> > if a socket is marked OLD it gets closed.
> > sockets which are not marked as OLD are marked OLD
> >
> > every time the sockets receives something OLD is cleared.
> >
> > But svc_age_temp_sockets never closes any socket though because it only
> > closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
> >
> > Here is a patch against 2.6.22.6 which changes the test to
> > svsk->sk_inuse <= 0 which was probably meant. The patched kernel runs
> > fine here. Unused sockets get closed (after 6 to 12 minutes)
>
> So the fact that this changes the behavior means that sk_inuse is taking
> on negative values.  This can't be right--how can something like
> svc_sock_put() (which does an atomic_dec_and_test) work in that case?

You probably misread the code.

if (atomic_read(>sk_inuse) || test_bit(SK_BUSY, >sk_flags))
continue;

This means: any socket where svsk->sk_inuse != 0 or SK_BUSY is set is ignored
by svc_age_temp_sockets: no attempt is made to close the svc.

This seems to be wrong: if svsk->sk_inuse is zero only if svc_delete_socket
has been called for it and will be deleted anyway (probably it is already
closed then).

But the intention of svc_age_temp_sockets is to close open temporary
sockets where no traffic has been received for more than 6 minutes. These
sockets have svsk->sk_inuse >= 1.

My patch does exactly this:

instead of

"skip sockets which are not already deleted or which are busy"

to

"skip sockets which are already deleted or which are busy"

>
> I wish I had time today to figure out what's going on in this case.  But
> from a quick through svsock.c for sk_inuse, it looks odd; I'm suspicious
> of anything without the stereotyped behavior--initializing to one,
> atomic_inc()ing whenever someone takes a reference, and
> atomic_dec_and_test()ing whenever someone drops it
>

Then svc_tcp_accept would be wrong, too (it closes sockets the same way just
without testing for sk_inuse and SK_BUSY).

I think this works because as long as a socket is in sv_tempsocks or
sv_permsocks svsk->sk_inuse can never reach zero. As svc_age_temp_sockets locks
the list nobody can bring svsk->sk_inuse to zero as long as
svc_age_temp_sockets holds the lock. As svc_age_temp_sockets calls
atomic_inc(>sk_inuse) when holding the lock there is no
problem. (the same is true for svc_tcp_accept).

This is the reason why I doubt that this check for svsk->sk_inuse in
svc_age_temp_sockets is usefull at all. It should be always false.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread J. Bruce Fields
On Wed, Sep 12, 2007 at 09:37:29AM -0400, bfields wrote:
> So the fact that this changes the behavior means that sk_inuse is taking
> on negative values.

Uh, no, I misread the tests, sorry.  I'm not awake.--b.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread J. Bruce Fields
On Wed, Sep 12, 2007 at 02:07:10PM +0200, Wolfgang Walter wrote:
> as already described old temporary sockets (client is gone) of lockd aren't
> closed after some time. So, with enough clients and some time gone, there
> are 80 open dangling sockets and you start getting messages of the form:
> 
> lockd: too many open TCP sockets, consider increasing the number of nfsd 
> threads.

Thanks for working on this problem!

> If I understand the code then the intention was that the server closes
> temporary sockets after about 6 to 12 minutes:
> 
>   a timer is started which calls svc_age_temp_sockets every 6 minutes.
> 
>   svc_age_temp_sockets:
>   if a socket is marked OLD it gets closed.
>   sockets which are not marked as OLD are marked OLD
> 
>   every time the sockets receives something OLD is cleared.
> 
> But svc_age_temp_sockets never closes any socket though because it only
> closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
> 
> Here is a patch against 2.6.22.6 which changes the test to
> svsk->sk_inuse <= 0 which was probably meant. The patched kernel runs fine
> here. Unused sockets get closed (after 6 to 12 minutes)

So the fact that this changes the behavior means that sk_inuse is taking
on negative values.  This can't be right--how can something like
svc_sock_put() (which does an atomic_dec_and_test) work in that case?

I wish I had time today to figure out what's going on in this case.  But
from a quick through svsock.c for sk_inuse, it looks odd; I'm suspicious
of anything without the stereotyped behavior--initializing to one,
atomic_inc()ing whenever someone takes a reference, and
atomic_dec_and_test()ing whenever someone drops it

--b.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread Wolfgang Walter
Am Mittwoch, 12. September 2007 15:37 schrieb J. Bruce Fields:
 On Wed, Sep 12, 2007 at 02:07:10PM +0200, Wolfgang Walter wrote:
  as already described old temporary sockets (client is gone) of lockd
  aren't closed after some time. So, with enough clients and some time
  gone, there are 80 open dangling sockets and you start getting messages
  of the form:
 
  lockd: too many open TCP sockets, consider increasing the number of nfsd
  threads.

 Thanks for working on this problem!

  If I understand the code then the intention was that the server closes
  temporary sockets after about 6 to 12 minutes:
 
  a timer is started which calls svc_age_temp_sockets every 6 minutes.
 
  svc_age_temp_sockets:
  if a socket is marked OLD it gets closed.
  sockets which are not marked as OLD are marked OLD
 
  every time the sockets receives something OLD is cleared.
 
  But svc_age_temp_sockets never closes any socket though because it only
  closes sockets with svsk-sk_inuse == 0. This seems to be a bug.
 
  Here is a patch against 2.6.22.6 which changes the test to
  svsk-sk_inuse = 0 which was probably meant. The patched kernel runs
  fine here. Unused sockets get closed (after 6 to 12 minutes)

 So the fact that this changes the behavior means that sk_inuse is taking
 on negative values.  This can't be right--how can something like
 svc_sock_put() (which does an atomic_dec_and_test) work in that case?

You probably misread the code.

if (atomic_read(svsk-sk_inuse) || test_bit(SK_BUSY, svsk-sk_flags))
continue;

This means: any socket where svsk-sk_inuse != 0 or SK_BUSY is set is ignored
by svc_age_temp_sockets: no attempt is made to close the svc.

This seems to be wrong: if svsk-sk_inuse is zero only if svc_delete_socket
has been called for it and will be deleted anyway (probably it is already
closed then).

But the intention of svc_age_temp_sockets is to close open temporary
sockets where no traffic has been received for more than 6 minutes. These
sockets have svsk-sk_inuse = 1.

My patch does exactly this:

instead of

skip sockets which are not already deleted or which are busy

to

skip sockets which are already deleted or which are busy


 I wish I had time today to figure out what's going on in this case.  But
 from a quick through svsock.c for sk_inuse, it looks odd; I'm suspicious
 of anything without the stereotyped behavior--initializing to one,
 atomic_inc()ing whenever someone takes a reference, and
 atomic_dec_and_test()ing whenever someone drops it


Then svc_tcp_accept would be wrong, too (it closes sockets the same way just
without testing for sk_inuse and SK_BUSY).

I think this works because as long as a socket is in sv_tempsocks or
sv_permsocks svsk-sk_inuse can never reach zero. As svc_age_temp_sockets locks
the list nobody can bring svsk-sk_inuse to zero as long as
svc_age_temp_sockets holds the lock. As svc_age_temp_sockets calls
atomic_inc(svsk-sk_inuse) when holding the lock there is no
problem. (the same is true for svc_tcp_accept).

This is the reason why I doubt that this check for svsk-sk_inuse in
svc_age_temp_sockets is usefull at all. It should be always false.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread J. Bruce Fields
On Wed, Sep 12, 2007 at 02:07:10PM +0200, Wolfgang Walter wrote:
 as already described old temporary sockets (client is gone) of lockd aren't
 closed after some time. So, with enough clients and some time gone, there
 are 80 open dangling sockets and you start getting messages of the form:
 
 lockd: too many open TCP sockets, consider increasing the number of nfsd 
 threads.

Thanks for working on this problem!

 If I understand the code then the intention was that the server closes
 temporary sockets after about 6 to 12 minutes:
 
   a timer is started which calls svc_age_temp_sockets every 6 minutes.
 
   svc_age_temp_sockets:
   if a socket is marked OLD it gets closed.
   sockets which are not marked as OLD are marked OLD
 
   every time the sockets receives something OLD is cleared.
 
 But svc_age_temp_sockets never closes any socket though because it only
 closes sockets with svsk-sk_inuse == 0. This seems to be a bug.
 
 Here is a patch against 2.6.22.6 which changes the test to
 svsk-sk_inuse = 0 which was probably meant. The patched kernel runs fine
 here. Unused sockets get closed (after 6 to 12 minutes)

So the fact that this changes the behavior means that sk_inuse is taking
on negative values.  This can't be right--how can something like
svc_sock_put() (which does an atomic_dec_and_test) work in that case?

I wish I had time today to figure out what's going on in this case.  But
from a quick through svsock.c for sk_inuse, it looks odd; I'm suspicious
of anything without the stereotyped behavior--initializing to one,
atomic_inc()ing whenever someone takes a reference, and
atomic_dec_and_test()ing whenever someone drops it

--b.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread J. Bruce Fields
On Wed, Sep 12, 2007 at 09:37:29AM -0400, bfields wrote:
 So the fact that this changes the behavior means that sk_inuse is taking
 on negative values.

Uh, no, I misread the tests, sorry.  I'm not awake.--b.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/