Re: [PATCH 6/6] NLM: Add reference counting to lockd

2008-01-10 Thread Jeff Layton
On Thu, 10 Jan 2008 14:29:22 +1100
Neil Brown <[EMAIL PROTECTED]> wrote:

> On Tuesday January 8, [EMAIL PROTECTED] wrote:
> > ...and only have lockd exit when the last reference is dropped.
> > 
> > The problem is this:
> > 
> > When a lock that a client is blocking on comes free, lockd does
> > this in nlmsvc_grant_blocked():
> > 
> > nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG,
> > &nlmsvc_grant_ops);
> > 
> > the callback from this call is nlmsvc_grant_callback(). That
> > function does this at the end to wake up lockd:
> > 
> > svc_wake_up(block->b_daemon);
> 
> Uhmmm... Maybe there is an easier way.
> 
> block->b_daemon will always be nlmsvc_serv, so can we simply make this
> 
>   svc_wake_up(nlmsvc_serv);
> with a little locking to make sure nlmsvc_serv is valid?
> 

That's very close to my original patch to fix this problem. I just
replaced svc_wake_up with a call to a new function that wakes up any
lockd that happens to be up. I'm not sure that my original patch was
careful enough with the locking though...

> Actually svc_wake_up is only called from lockd and goes through
> various hoops to find the right rqstp, which we could have known in
> advance.
> So store the rqstp in some global wrapped in a spinlock so we can
> access it safely and just:
> 
>    spin_lock(whatever)
>    if (nlmsvc_rqstp)
>       wake_up(&nlmsvc_rqstp->rq_wait)
>    spin_unlock(whatever)
> 
> 
> That seems a somewhat simpler way of avoiding the particular problem.
> 

Yes. Much.
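
For concreteness, a minimal sketch of the kind of wake-up helper being
discussed here, assuming lockd publishes its rqstp in a global
(nlmsvc_rqstp, as in Neil's pseudo-code above) and clears it under the
same lock before exiting; the spinlock and helper names are made up for
illustration:

/*
 * Sketch only -- not one of the posted patches.  A late GRANTED
 * callback sees either a valid rqstp or NULL, never freed memory.
 */
static DEFINE_SPINLOCK(nlmsvc_rqstp_lock);
static struct svc_rqst *nlmsvc_rqstp;

static void nlmsvc_wake_lockd(void)
{
        spin_lock(&nlmsvc_rqstp_lock);
        if (nlmsvc_rqstp)
                wake_up(&nlmsvc_rqstp->rq_wait);
        spin_unlock(&nlmsvc_rqstp_lock);
}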

> 
> Hmmm I guess that nlmsvc_grant_callback could then be run after
> the 'lockd' module had been unloaded.
> Maybe nlm_shutdown_hosts could call rpc_killall_tasks(host->h_rpcclnt)
> on each host.  That should ensure the callback won't happen afterwards.
> 
> Maybe?
> 

I think so. If we let lockd go down before all the RPCs are done, then
accessing lockd data from those callbacks could be a problem. If not
now, then future changes could make it one.

IIRC, the reason nlm_destroy_host doesn't get done on each nlm_host in
this situation is that the h_count is still too high. Doing
rpc_killall_tasks there might fix that, but the logic in all of this is
pretty convoluted. I'll see if I can cook up a new patchset that does
this instead.
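
A minimal sketch of the rpc_killall_tasks() idea, assuming it would be
applied to each nlm_host from the nlm_shutdown_hosts() path; the helper
name is made up:

/*
 * Sketch only: kill any RPC tasks still in flight on this host's
 * client so no NLMPROC_GRANTED_MSG callback can run after lockd has
 * torn the host down.
 */
static void nlmsvc_kill_host_rpc(struct nlm_host *host)
{
        if (host->h_rpcclnt)
                rpc_killall_tasks(host->h_rpcclnt);
}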

-- 
Jeff Layton <[EMAIL PROTECTED]>


Re: [PATCH 6/6] NLM: Add reference counting to lockd

2008-01-09 Thread Neil Brown
On Tuesday January 8, [EMAIL PROTECTED] wrote:
> ...and only have lockd exit when the last reference is dropped.
> 
> The problem is this:
> 
> When a lock that a client is blocking on comes free, lockd does this in
> nlmsvc_grant_blocked():
> 
> nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, &nlmsvc_grant_ops);
> 
> the callback from this call is nlmsvc_grant_callback(). That function
> does this at the end to wake up lockd:
> 
> svc_wake_up(block->b_daemon);

Uhmmm... Maybe there is an easier way.

block->b_daemon will always be nlmsvc_serv, so can we simply make this

svc_wake_up(nlmsvc_serv);
with a little locking to make sure nlmsvc_serv is valid?

Actually svc_wake_up is only called from lockd and goes through
various hoops to find the right rqstp, which we could have known in
advance.
So store the rqstp in some global wrapped in a spinlock so we can
access it safely and just:

   spin_lock(whatever)
   if (nlmsvc_rqstp)
      wake_up(&nlmsvc_rqstp->rq_wait)
   spin_unlock(whatever)


That seems a somewhat simpler way of avoiding the particular problem.


Hmmm I guess that nlmsvc_grant_callback could then be run after
the 'lockd' module had been unloaded.
Maybe nlm_shutdown_hosts could call rpc_killall_tasks(host->h_rpcclnt)
on each host.  That should ensure the callback won't happen afterwards.

Maybe?

NeilBrown



Re: [PATCH 6/6] NLM: Add reference counting to lockd

2008-01-09 Thread Jeff Layton
On Wed, 9 Jan 2008 18:48:14 +
Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Wed, Jan 09, 2008 at 01:36:21PM -0500, Jeff Layton wrote:
> > I don't see a good alternative though. We need to be able to drop
> > and check the refcount in nlmsvc_unlink_block. That function is
> > called from lockd, and we can't have lockd call kthread_stop on
> > itself.
> > 
> > If you see a better way to do this, I'm certainly open to
> > suggestions.
> > 
> > I'll note that my first stab at fixing this problem was to change
> > the svc_wake_up() call in the rpc callback to a routine to wake up
> > any lockd on the box that happened to be up. That sidesteps this
> > entire problem of having to make sure lockd stays up. If we decided
> > that was the right approach we could dump the last patch in this
> > series altogether.
> > 
> > That said there could be other use after free bugs lurking in the
> > lockd code so maybe keeping lockd up until nlm_blocked is empty is
> > the right thing to do.
> 
> What about just not exiting from lockd as long as nlm_blocked is not
> empty?  lockd_down still simply calls kthread_stop, but lockd only
> honours it when nlm_blocked is empty?

lockd can basically block forever in this situation if the client
goes away for good. With the current kthread implementation,
kthread_stops are serialized and I don't think we want to monopolize
the kthread_stop queue.

If kthread_stops could occur in parallel, that would be a different
situation :-)

-- 
Jeff Layton <[EMAIL PROTECTED]>


Re: [PATCH 6/6] NLM: Add reference counting to lockd

2008-01-09 Thread Christoph Hellwig
On Wed, Jan 09, 2008 at 01:36:21PM -0500, Jeff Layton wrote:
> I don't see a good alternative though. We need to be able to drop
> and check the refcount in nlmsvc_unlink_block. That function is called
> from lockd, and we can't have lockd call kthread_stop on itself.
> 
> If you see a better way to do this, I'm certainly open to suggestions.
> 
> I'll note that my first stab at fixing this problem was to change the
> svc_wake_up() call in the rpc callback to a routine to wake up any
> lockd on the box that happened to be up. That sidesteps this entire
> problem of having to make sure lockd stays up. If we decided that was
> the right approach we could dump the last patch in this series
> altogether.
> 
> That said there could be other use after free bugs lurking in the lockd
> code so maybe keeping lockd up until nlm_blocked is empty is the right
> thing to do.

What about just not exiting from lockd as long as nlm_blocked is not
empty?  lockd_down still simply calls kthread_stop, but lockd only
honours it when nlm_blocked is empty?
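
A sketch of how that loop condition might look, adapted from the hunks
posted elsewhere in this thread; the exact placement of the nlm_blocked
check is an assumption:

        /*
         * Sketch: lockd_down() still calls kthread_stop(), but lockd
         * only honours it once nlm_blocked has drained.
         */
        while (!kthread_should_stop() || !list_empty(&nlm_blocked)) {
                long timeout = MAX_SCHEDULE_TIMEOUT;
                char buf[RPC_MAX_ADDRBUFLEN];

                if (try_to_freeze())
                        continue;

                /* ... existing signal handling and timeout logic ... */

                err = svc_recv(rqstp, timeout);
                if (err == -EAGAIN || err == -EINTR)
                        continue;

                /* ... rest of the existing request loop ... */
        }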


Re: [PATCH 6/6] NLM: Add reference counting to lockd

2008-01-09 Thread Jeff Layton
On Wed, 9 Jan 2008 17:47:07 +
Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Tue, Jan 08, 2008 at 02:33:18PM -0500, Jeff Layton wrote:
> > ...and only have lockd exit when the last reference is dropped.
> > 
> > The problem is this:
> > 
> > When a lock that a client is blocking on comes free, lockd does
> > this in nlmsvc_grant_blocked():
> > 
> > nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG,
> > &nlmsvc_grant_ops);
> > 
> > the callback from this call is nlmsvc_grant_callback(). That
> > function does this at the end to wake up lockd:
> > 
> > svc_wake_up(block->b_daemon);
> > 
> > However there is no guarantee that lockd will be up when this
> > happens. If someone shuts down or restarts lockd before the async
> > call completes, then the b_daemon pointer will point to freed
> > memory and the kernel may oops.
> > 
> > I first noticed this on older kernels and had mistakenly thought
> > that newer kernels weren't susceptible, but that's not correct.
> > There's a bit of a race to make sure that the nlm_host is bound
> > when the async call is done, but I can now reproduce this at will
> > on current kernels.
> > 
> > This patch is based on Trond's suggestion to add a new reference
> > counter to lockd, and only allows lockd to go down when it reaches
> > 0. With this change we can't use kthread_stop here.
> > nlmsvc_unlink_block is called by lockd and a kthread can't call
> > kthread_stop on itself. So the patch changes lockd to check the
> > refcount itself and to return if it goes to 0. We do the checking
> > and exit while holding the nlmsvc_mutex to make sure that a new
> > lockd is not started until the old one is down.
> 
> I don't like this signals/kthread mixture at all.  Why can't we simply
> call kthread_stop when the refcount hits zero and keep all the nice
> kthread helpers?
> 

As I stated in an earlier email, I'm not fond of this either :-)

I don't see a good alternative though. We need to be able to drop and
check the refcount in nlmsvc_unlink_block. That function is called
from lockd, and we can't have lockd call kthread_stop on itself.
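
To spell out why the obvious variant doesn't work: kthread_stop() waits
for the target thread to exit, so a hypothetical release helper like the
one below (not from the patches) would deadlock whenever the final
reference is dropped from lockd's own context:

static void nlmsvc_put_ref(void)
{
        if (atomic_dec_and_test(&nlmsvc_ref))
                /*
                 * Deadlocks when called from lockd itself:
                 * kthread_stop() waits for the thread to exit,
                 * and we *are* that thread.
                 */
                kthread_stop(nlmsvc_task);
}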

If you see a better way to do this, I'm certainly open to suggestions.

I'll note that my first stab at fixing this problem was to change the
svc_wake_up() call in the rpc callback to a routine to wake up any
lockd on the box that happened to be up. That sidesteps this entire
problem of having to make sure lockd stays up. If we decided that was
the right approach we could dump the last patch in this series
altogether.

That said there could be other use after free bugs lurking in the lockd
code so maybe keeping lockd up until nlm_blocked is empty is the right
thing to do.

-- 
Jeff Layton <[EMAIL PROTECTED]>


Re: [PATCH 6/6] NLM: Add reference counting to lockd

2008-01-09 Thread Christoph Hellwig
On Tue, Jan 08, 2008 at 02:33:18PM -0500, Jeff Layton wrote:
> ...and only have lockd exit when the last reference is dropped.
> 
> The problem is this:
> 
> When a lock that a client is blocking on comes free, lockd does this in
> nlmsvc_grant_blocked():
> 
> nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, &nlmsvc_grant_ops);
> 
> the callback from this call is nlmsvc_grant_callback(). That function
> does this at the end to wake up lockd:
> 
> svc_wake_up(block->b_daemon);
> 
> However there is no guarantee that lockd will be up when this happens.
> If someone shuts down or restarts lockd before the async call completes,
> then the b_daemon pointer will point to freed memory and the kernel may
> oops.
> 
> I first noticed this on older kernels and had mistakenly thought that
> newer kernels weren't susceptible, but that's not correct. There's a bit
> of a race to make sure that the nlm_host is bound when the async call is
> done, but I can now reproduce this at will on current kernels.
> 
> This patch is based on Trond's suggestion to add a new reference counter
> to lockd, and only allows lockd to go down when it reaches 0. With this
> change we can't use kthread_stop here. nlmsvc_unlink_block is called by
> lockd and a kthread can't call kthread_stop on itself. So the patch
> changes lockd to check the refcount itself and to return if it goes to
> 0. We do the checking and exit while holding the nlmsvc_mutex to make
> sure that a new lockd is not started until the old one is down.

I don't like this signals/kthread mixture at all.  Why can't we simply
call kthread_stop when the refcount hits zero and keep all the nice
kthread helpers?



[PATCH 6/6] NLM: Add reference counting to lockd

2008-01-08 Thread Jeff Layton
...and only have lockd exit when the last reference is dropped.

The problem is this:

When a lock that a client is blocking on comes free, lockd does this in
nlmsvc_grant_blocked():

nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, &nlmsvc_grant_ops);

the callback from this call is nlmsvc_grant_callback(). That function
does this at the end to wake up lockd:

svc_wake_up(block->b_daemon);

However there is no guarantee that lockd will be up when this happens.
If someone shuts down or restarts lockd before the async call completes,
then the b_daemon pointer will point to freed memory and the kernel may
oops.

I first noticed this on older kernels and had mistakenly thought that
newer kernels weren't susceptible, but that's not correct. There's a bit
of a race to make sure that the nlm_host is bound when the async call is
done, but I can now reproduce this at will on current kernels.

This patch is based on Trond's suggestion to add a new reference counter
to lockd, and only allows lockd to go down when it reaches 0. With this
change we can't use kthread_stop here. nlmsvc_unlink_block is called by
lockd and a kthread can't call kthread_stop on itself. So the patch
changes lockd to check the refcount itself and to return if it goes to
0. We do the checking and exit while holding the nlmsvc_mutex to make
sure that a new lockd is not started until the old one is down.

Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>
---
 fs/lockd/svc.c  |   50 +--
 fs/lockd/svclock.c  |8 +++
 include/linux/lockd/lockd.h |1 +
 3 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 0777a4e..b1918e9 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -51,6 +51,7 @@ static DEFINE_MUTEX(nlmsvc_mutex);
 static unsigned int            nlmsvc_users;
 static struct task_struct  *nlmsvc_task;
 static struct svc_serv *nlmsvc_serv;
+atomic_t   nlmsvc_ref = ATOMIC_INIT(0);
 int                            nlmsvc_grace_period;
 unsigned long  nlmsvc_timeout;
 
@@ -133,7 +134,10 @@ lockd(void *vrqstp)
 
set_freezable();
 
-   /* Process request with signals blocked, but allow SIGKILL.  */
+   /*
+* Process request with signals blocked, but allow SIGKILL which
+* signifies that lockd should drop all of its locks.
+*/
allow_signal(SIGKILL);
 
dprintk("NFS locking service started (ver " LOCKD_VERSION ").\n");
@@ -146,15 +150,19 @@ lockd(void *vrqstp)
 
/*
 * The main request loop. We don't terminate until the last
-* NFS mount or NFS daemon has gone away, and we've been sent a
-* signal, or else another process has taken over our job.
+* NFS mount or NFS daemon has gone away, and the nlm_blocked
+* list is empty. The nlmsvc_mutex ensures that we prevent a
+* new lockd from being started before the old one is down.
 */
-   while (!kthread_should_stop()) {
+   mutex_lock(&nlmsvc_mutex);
+   while (atomic_read(&nlmsvc_ref) != 0) {
long timeout = MAX_SCHEDULE_TIMEOUT;
char buf[RPC_MAX_ADDRBUFLEN];
 
+   mutex_unlock(&nlmsvc_mutex);
+
if (try_to_freeze())
-   continue;
+   goto again;
 
if (signalled()) {
flush_signals(current);
@@ -181,11 +189,12 @@ lockd(void *vrqstp)
 */
err = svc_recv(rqstp, timeout);
if (err == -EAGAIN || err == -EINTR)
-   continue;
+   goto again;
if (err < 0) {
printk(KERN_WARNING
   "lockd: terminating on error %d\n",
   -err);
+   mutex_lock(&nlmsvc_mutex);
break;
}
 
@@ -193,8 +202,15 @@ lockd(void *vrqstp)
svc_print_addr(rqstp, buf, sizeof(buf)));
 
svc_process(rqstp);
+again:
+   mutex_lock(&nlmsvc_mutex);
}
 
+   /*
+* at this point lockd is committed to going down. We hold the
+* nlmsvc_mutex until just before exit to prevent a new one
+* from starting before it's down.
+*/
flush_signals(current);
 
if (nlmsvc_ops)
@@ -202,6 +218,7 @@ lockd(void *vrqstp)
nlm_shutdown_hosts();
nlmsvc_task = NULL;
nlmsvc_serv = NULL;
+   mutex_unlock(&nlmsvc_mutex);
 
/* Exit the RPC thread */
svc_exit_thread(rqstp);
@@ -263,6 +280,11 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
int error = 0;
 
mutex_lock(&nlmsvc_mutex);
+
+   /* first lockd_up caller takes a nlmsvc_ref */
+   if (!nlmsvc_users)
+   atomic_inc(&nlmsvc_ref);

Re: [PATCH 6/6] NLM: Add reference counting to lockd

2008-01-08 Thread Jeff Layton
On Tue, 08 Jan 2008 10:52:19 -0500
Wendy Cheng <[EMAIL PROTECTED]> wrote:

> Jeff Layton wrote:
> >
> >> The previous patch removes a kill_proc(... SIGKILL),  this one
> >> adds it back.
> >> That makes me wonder if the intermediate state is 'correct'.
> >>
> >> But I also wonder what "correct" means.
> >> Do we want all locks to be dropped when the last nfsd thread dies?
> >> The answer is presumably either "yes" or "no".
> >> If "yes", then we don't have that because if there are any NFS
> >> mounts active, lockd will not be killed.
> >> If "no", then we don't want this kill_proc here.
> >>
> >> The comment in lockd() which currently reads:
> >>
> >>    /*
> >>     * The main request loop. We don't terminate until the last
> >>     * NFS mount or NFS daemon has gone away, and we've been sent a
> >>     * signal, or else another process has taken over our job.
> >>     */
> >>
> >> suggests that someone once thought that lockd could hang around
> >> after all nfsd threads and nfs mounts had gone, but I don't think
> >> it does.
> >>
> >> We really should think this through and get it right, because if
> >> lockd ever drops it's locks, then we really need to make sure
> >> sm_notify gets run.  So it needs to be a well defined event.
> >>
> >> Thoughts?
> >>
> >> 
> >
> > This is the part I've been struggling with the most -- defining what
> > proper behavior should be when lockd is restarted. As you point out,
> > restarting lockd without doing a sm_notify could be bad news for
> > data integrity.
> >
> > Then again, we'd like someone to be able to shut down the NFS
> > "service" and be able to unmount underlying filesystems without
> > jumping through special hoops
> >
> > Overall, I think I'd vote "yes". We need to drop locks when the last
> > nfsd goes down. If userspace brings down nfsd, then it's userspace's
> > responsibility to make sure that a sm_notify is sent when nfsd and
> > lockd are restarted.
> >   
> 
> I would vote for "no", at least for nfs v3. Shutting down lockd would 
> require clients to reclaim the locks. With current status (protocol, 
> design, and even the implementation itself, etc), it is simply too 
> disruptive. I understand current logic (i.e. shutting down nfsd but 
> leaving lockd alone) is awkward but debugging multiple platforms 
> (remember clients may not be on linux boxes) is very non-trivial.
> 

The current lockd implementation already drops all locks if nfsd goes
down (providing there are no local NFS mounts). The last lockd_down call
will bring down lockd and it will drop all of its locks in the process.
My vote for "yes" is a vote to keep things the way they are. I don't
think I'd consider it disruptive.

Changing lockd to not drop locks will mean that userspace will need to
take extra steps if someone wants to bring down NFS and unmount an
underlying filesystem. Those extra steps could be a SIGKILL to lockd or
a call into the new interfaces your recent patchset adds. Either way,
that would mean a change in behavior that will have to be accounted for
in userspace.

-- 
Jeff Layton <[EMAIL PROTECTED]>


Re: [PATCH 6/6] NLM: Add reference counting to lockd

2008-01-08 Thread Peter Staubach

Jeff Layton wrote:
> On Tue, 8 Jan 2008 17:46:33 +1100
> Neil Brown <[EMAIL PROTECTED]> wrote:
> 
> The comments about patch 5/6 seem sane. I'll plan to incorporate them
> in the respin...
> 
> > On Saturday January 5, [EMAIL PROTECTED] wrote:
> > > @@ -357,7 +375,18 @@ lockd_down(void)
> > >           goto out;
> > >   }
> > >   warned = 0;
> > > - kthread_stop(nlmsvc_task);
> > > + if (atomic_sub_return(1, &nlmsvc_ref) != 0)
> > > +         printk(KERN_WARNING "lockd_down: lockd is waiting for "
> > > +                 "outstanding requests to complete before exiting.\n");
> > 
> > Why not "atomic_dec_and_test" ??
> > 
> 
> Temporary amnesia? :-) I'll change that, atomic_dec_and_test will be
> clearer.
> 
> > > +
> > > + /*
> > > +  * Sending a signal is necessary here. If we get to this point and
> > > +  * nlm_blocked isn't empty then lockd may be held hostage by clients
> > > +  * that are still blocking. Sending the signal makes sure that lockd
> > > +  * invalidates all of its locks so that it's just waiting on RPC
> > > +  * callbacks to complete
> > > +  */
> > > + kill_proc(nlmsvc_task->pid, SIGKILL, 1);
> > 
> > The previous patch removes a kill_proc(... SIGKILL),  this one adds it
> > back.
> > That makes me wonder if the intermediate state is 'correct'.
> > 
> > But I also wonder what "correct" means.
> > Do we want all locks to be dropped when the last nfsd thread dies?
> > The answer is presumably either "yes" or "no".
> > If "yes", then we don't have that because if there are any NFS mounts
> > active, lockd will not be killed.
> > If "no", then we don't want this kill_proc here.
> > 
> > The comment in lockd() which currently reads:
> > 
> >     /*
> >      * The main request loop. We don't terminate until the last
> >      * NFS mount or NFS daemon has gone away, and we've been sent a
> >      * signal, or else another process has taken over our job.
> >      */
> > 
> > suggests that someone once thought that lockd could hang around after
> > all nfsd threads and nfs mounts had gone, but I don't think it does.
> > 
> > We really should think this through and get it right, because if lockd
> > ever drops its locks, then we really need to make sure sm_notify gets
> > run.  So it needs to be a well defined event.
> > 
> > Thoughts?
> 
> This is the part I've been struggling with the most -- defining what
> proper behavior should be when lockd is restarted. As you point out,
> restarting lockd without doing a sm_notify could be bad news for data
> integrity.
> 
> Then again, we'd like someone to be able to shut down the NFS "service"
> and be able to unmount underlying filesystems without jumping through
> special hoops
> 
> Overall, I think I'd vote "yes". We need to drop locks when the last
> nfsd goes down. If userspace brings down nfsd, then it's userspace's
> responsibility to make sure that a sm_notify is sent when nfsd and lockd
> are restarted.

I would vote for the simplest possible model that makes sense.
We need a simple model for admins as well as a simple model
which is easy to implement in as bug-free a way as possible.  The
trick is not making it too simple because that can cost
performance, but not making it too complicated to implement
reasonably and for admins to be able to figure out.

So, I would vote for "yes" as well.  That will yield an
architecture where we can shut down systems cleanly and will
be easy to understand when locks for clients exist and when
they do not.

   Thanx...

  ps

> As a side note, I'm not thrilled with this design that mixes signals
> and kthreads, but didn't see another way to do this. I'm open to
> suggestions if anyone has them...
> 
> > Also, it is sad that the inc/dec of nlmsvc_ref is called in somewhat
> > non-obvious ways.
> > e.g.
> > 
> > > + if (!nlmsvc_users && error)
> > > +         atomic_dec(&nlmsvc_ref);
> > 
> > and
> > 
> > > + if (list_empty(&nlm_blocked))
> > > +         atomic_inc(&nlmsvc_ref);
> > > +
> > >   if (list_empty(&block->b_list)) {
> > >           kref_get(&block->b_count);
> > >   } else {
> > 
> > where if we moved the atomic_inc a little bit later next to the
> > "list_add_tail" (which seems to make more sense) it would actually be
> > wrong... But I think that code is correct as it is - just non-obvious.
> 
> The nlmsvc_ref logic is pretty convoluted, unfortunately. I'll plan to
> add some comments to clarify what I'm doing there.
> 
> Thanks for the review, Neil. I'll see if I can get a new patchset done
> in the next few days.
> 
> Cheers,




Re: [PATCH 6/6] NLM: Add reference counting to lockd

2008-01-08 Thread Wendy Cheng

Jeff Layton wrote:
>
>> The previous patch removes a kill_proc(... SIGKILL),  this one adds it
>> back.
>> That makes me wonder if the intermediate state is 'correct'.
>>
>> But I also wonder what "correct" means.
>> Do we want all locks to be dropped when the last nfsd thread dies?
>> The answer is presumably either "yes" or "no".
>> If "yes", then we don't have that because if there are any NFS mounts
>> active, lockd will not be killed.
>> If "no", then we don't want this kill_proc here.
>>
>> The comment in lockd() which currently reads:
>>
>>      /*
>>       * The main request loop. We don't terminate until the last
>>       * NFS mount or NFS daemon has gone away, and we've been sent a
>>       * signal, or else another process has taken over our job.
>>       */
>>
>> suggests that someone once thought that lockd could hang around after
>> all nfsd threads and nfs mounts had gone, but I don't think it does.
>>
>> We really should think this through and get it right, because if lockd
>> ever drops its locks, then we really need to make sure sm_notify gets
>> run.  So it needs to be a well defined event.
>>
>> Thoughts?
>
> This is the part I've been struggling with the most -- defining what
> proper behavior should be when lockd is restarted. As you point out,
> restarting lockd without doing a sm_notify could be bad news for data
> integrity.
>
> Then again, we'd like someone to be able to shut down the NFS "service"
> and be able to unmount underlying filesystems without jumping through
> special hoops
>
> Overall, I think I'd vote "yes". We need to drop locks when the last
> nfsd goes down. If userspace brings down nfsd, then it's userspace's
> responsibility to make sure that a sm_notify is sent when nfsd and lockd
> are restarted.

I would vote for "no", at least for nfs v3. Shutting down lockd would
require clients to reclaim the locks. With current status (protocol,
design, and even the implementation itself, etc), it is simply too
disruptive. I understand current logic (i.e. shutting down nfsd but
leaving lockd alone) is awkward but debugging multiple platforms
(remember clients may not be on linux boxes) is very non-trivial.

-- Wendy




Re: [PATCH 6/6] NLM: Add reference counting to lockd

2008-01-08 Thread Jeff Layton
On Tue, 8 Jan 2008 17:46:33 +1100
Neil Brown <[EMAIL PROTECTED]> wrote:

The comments about patch 5/6 seem sane. I'll plan to incorporate them
in the respin...

> On Saturday January 5, [EMAIL PROTECTED] wrote:
> > @@ -357,7 +375,18 @@ lockd_down(void)
> >             goto out;
> >     }
> >     warned = 0;
> > -   kthread_stop(nlmsvc_task);
> > +   if (atomic_sub_return(1, &nlmsvc_ref) != 0)
> > +           printk(KERN_WARNING "lockd_down: lockd is waiting for "
> > +                   "outstanding requests to complete before exiting.\n");
> 
> Why not "atomic_dec_and_test" ??
> 

Temporary amnesia? :-) I'll change that, atomic_dec_and_test will be
clearer.
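
For reference, the substitution being agreed to would look roughly like
this in lockd_down() (a sketch of the obvious rewrite, not the actual
respin):

        warned = 0;
        /*
         * atomic_dec_and_test() returns true only when the count
         * reaches zero, so warn whenever references remain.
         */
        if (!atomic_dec_and_test(&nlmsvc_ref))
                printk(KERN_WARNING "lockd_down: lockd is waiting for "
                        "outstanding requests to complete before exiting.\n");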

> > +
> > +   /*
> > +    * Sending a signal is necessary here. If we get to this point and
> > +    * nlm_blocked isn't empty then lockd may be held hostage by clients
> > +    * that are still blocking. Sending the signal makes sure that lockd
> > +    * invalidates all of its locks so that it's just waiting on RPC
> > +    * callbacks to complete
> > +    */
> > +   kill_proc(nlmsvc_task->pid, SIGKILL, 1);
> 
> The previous patch removes a kill_proc(... SIGKILL),  this one adds it
> back.
> That makes me wonder if the intermediate state is 'correct'.
> 
> But I also wonder what "correct" means.
> Do we want all locks to be dropped when the last nfsd thread dies?
> The answer is presumably either "yes" or "no".
> If "yes", then we don't have that because if there are any NFS mounts
> active, lockd will not be killed.
> If "no", then we don't want this kill_proc here.
> 
> The comment in lockd() which currently reads:
> 
>       /*
>        * The main request loop. We don't terminate until the last
>        * NFS mount or NFS daemon has gone away, and we've been sent a
>        * signal, or else another process has taken over our job.
>        */
> 
> suggests that someone once thought that lockd could hang around after
> all nfsd threads and nfs mounts had gone, but I don't think it does.
> 
> We really should think this through and get it right, because if lockd
> ever drops its locks, then we really need to make sure sm_notify gets
> run.  So it needs to be a well defined event.
> 
> Thoughts?
> 

This is the part I've been struggling with the most -- defining what
proper behavior should be when lockd is restarted. As you point out,
restarting lockd without doing a sm_notify could be bad news for data
integrity.

Then again, we'd like someone to be able to shut down the NFS "service"
and be able to unmount underlying filesystems without jumping through
special hoops

Overall, I think I'd vote "yes". We need to drop locks when the last
nfsd goes down. If userspace brings down nfsd, then it's userspace's
responsibility to make sure that a sm_notify is sent when nfsd and lockd
are restarted.

As a side note, I'm not thrilled with this design that mixes signals
and kthreads, but didn't see another way to do this. I'm open to
suggestions if anyone has them...

> Also, it is sad that the inc/dec of nlmsvc_ref is called in somewhat
> non-obvious ways.
> e.g.
> 
> > +   if (!nlmsvc_users && error)
> > +   atomic_dec(&nlmsvc_ref);
> 
> and
> 
> > +   if (list_empty(&nlm_blocked))
> > +   atomic_inc(&nlmsvc_ref);
> > +
> > if (list_empty(&block->b_list)) {
> > kref_get(&block->b_count);
> > } else {
> 
> where if we moved the atomic_inc a little bit later next to the
> "list_add_tail" (which seems to make more sense) it would actually be
> wrong... But I think that code is correct as it is - just non-obvious.
> 

The nlmsvc_ref logic is pretty convoluted, unfortunately. I'll plan to
add some comments to clarify what I'm doing there.

Thanks for the review, Neil. I'll see if I can get a new patchset done
in the next few days.

Cheers,
-- 
Jeff Layton <[EMAIL PROTECTED]>


Re: [PATCH 6/6] NLM: Add reference counting to lockd

2008-01-07 Thread Neil Brown
On Saturday January 5, [EMAIL PROTECTED] wrote:
> @@ -357,7 +375,18 @@ lockd_down(void)
>   goto out;
>   }
>   warned = 0;
> - kthread_stop(nlmsvc_task);
> + if (atomic_sub_return(1, &nlmsvc_ref) != 0)
> + printk(KERN_WARNING "lockd_down: lockd is waiting for "
> + "outstanding requests to complete before exiting.\n");

Why not "atomic_dec_and_test" ??

> +
> + /*
> +  * Sending a signal is necessary here. If we get to this point and
> +  * nlm_blocked isn't empty then lockd may be held hostage by clients
> +  * that are still blocking. Sending the signal makes sure that lockd
> +  * invalidates all of its locks so that it's just waiting on RPC
> +  * callbacks to complete
> +  */
> + kill_proc(nlmsvc_task->pid, SIGKILL, 1);

The previous patch removes a kill_proc(... SIGKILL),  this one adds it
back.
That makes me wonder if the intermediate state is 'correct'.

But I also wonder what "correct" means.
Do we want all locks to be dropped when the last nfsd thread dies?
The answer is presumably either "yes" or "no".
If "yes", then we don't have that because if there are any NFS mounts
active, lockd will not be killed.
If "no", then we don't want this kill_proc here.

The comment in lockd() which currently reads:

/*
 * The main request loop. We don't terminate until the last
 * NFS mount or NFS daemon has gone away, and we've been sent a
 * signal, or else another process has taken over our job.
 */

suggests that someone once thought that lockd could hang around after
all nfsd threads and nfs mounts had gone, but I don't think it does.

We really should think this through and get it right, because if lockd
ever drops its locks, then we really need to make sure sm_notify gets
run.  So it needs to be a well defined event.

Thoughts?

Also, it is sad that the inc/dec of nlmsvc_ref is called in somewhat
non-obvious ways.
e.g.

> + if (!nlmsvc_users && error)
> + atomic_dec(&nlmsvc_ref);

and

> + if (list_empty(&nlm_blocked))
> + atomic_inc(&nlmsvc_ref);
> +
>   if (list_empty(&block->b_list)) {
>   kref_get(&block->b_count);
>   } else {

where if we moved the atomic_inc a little bit later next to the
"list_add_tail" (which seems to make more sense) it would actually be
wrong... But I think that code is correct as it is - just non-obvious.
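
To make the subtlety concrete, a simplified sketch of nlmsvc_insert_block
with the hunk above in place; everything outside the quoted lines is
paraphrased, so treat it as illustrative only:

static void nlmsvc_insert_block(struct nlm_block *block, unsigned long when)
{
        /*
         * Take the lockd reference while nlm_blocked is still observed
         * to be empty.  If this check sat next to the list_add_tail()
         * below, a block that is merely being requeued (and happens to
         * be the only entry) would first be list_del_init()'d, making
         * the list look empty again and taking a second, bogus
         * reference.
         */
        if (list_empty(&nlm_blocked))
                atomic_inc(&nlmsvc_ref);

        if (list_empty(&block->b_list)) {
                kref_get(&block->b_count);      /* newly queued block */
        } else {
                list_del_init(&block->b_list);  /* requeue: ref already held */
        }

        /* ... timeout/ordering logic elided ... */
        list_add_tail(&block->b_list, &nlm_blocked);
        block->b_when = when;
}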

NeilBrown



[PATCH 6/6] NLM: Add reference counting to lockd

2008-01-05 Thread Jeff Layton
...and only have lockd exit when the last reference is dropped.

The problem is this:

When a lock that a client is blocking on comes free, lockd does this in
nlmsvc_grant_blocked():

nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, &nlmsvc_grant_ops);

the callback from this call is nlmsvc_grant_callback(). That function
does this at the end to wake up lockd:

svc_wake_up(block->b_daemon);

However there is no guarantee that lockd will be up when this happens.
If someone shuts down or restarts lockd before the async call completes,
then the b_daemon pointer will point to freed memory and the kernel may
oops.

I first noticed this on older kernels and had mistakenly thought that
newer kernels weren't susceptible, but that's not correct. There's a bit
of a race to make sure that the nlm_host is bound when the async call is
done, but I can now reproduce this at will on current kernels.

This patch is based on Trond's suggestion to add a new reference counter
to lockd, and only allows lockd to go down when it reaches 0. With this
change we can't use kthread_stop here. nlmsvc_unlink_block is called by
lockd and a kthread can't call kthread_stop on itself. So the patch
changes lockd to check the refcount itself and to return if it goes to
0. We do the checking and exit while holding the nlmsvc_mutex to make
sure that a new lockd is not started until the old one is down.

Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>
---
 fs/lockd/svc.c  |   51 +-
 fs/lockd/svclock.c  |5 
 include/linux/lockd/lockd.h |1 +
 3 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index d7209ea..0f56edf 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -51,6 +51,7 @@ static DEFINE_MUTEX(nlmsvc_mutex);
 static unsigned int            nlmsvc_users;
 static struct task_struct  *nlmsvc_task;
 static struct svc_serv *nlmsvc_serv;
+atomic_t   nlmsvc_ref = ATOMIC_INIT(0);
 int                            nlmsvc_grace_period;
 unsigned long  nlmsvc_timeout;
 
@@ -134,7 +135,10 @@ lockd(void *vrqstp)
 
set_freezable();
 
-   /* Process request with signals blocked, but allow SIGKILL.  */
+   /*
+* Process request with signals blocked, but allow SIGKILL which
+* signifies that lockd should drop all of its locks.
+*/
allow_signal(SIGKILL);
 
dprintk("NFS locking service started (ver " LOCKD_VERSION ").\n");
@@ -147,15 +151,19 @@ lockd(void *vrqstp)
 
/*
 * The main request loop. We don't terminate until the last
-* NFS mount or NFS daemon has gone away, and we've been sent a
-* signal, or else another process has taken over our job.
+* NFS mount or NFS daemon has gone away, and the nlm_blocked
+* list is empty. The nlmsvc_mutex ensures that we prevent a
+* new lockd from being started before the old one is down.
 */
-   while (!kthread_should_stop()) {
+   mutex_lock(&nlmsvc_mutex);
+   while (atomic_read(&nlmsvc_ref) != 0) {
long timeout = MAX_SCHEDULE_TIMEOUT;
char buf[RPC_MAX_ADDRBUFLEN];
 
+   mutex_unlock(&nlmsvc_mutex);
+
if (try_to_freeze())
-   continue;
+   goto again;
 
if (signalled()) {
flush_signals(current);
@@ -182,11 +190,12 @@ lockd(void *vrqstp)
 */
err = svc_recv(rqstp, timeout);
if (err == -EAGAIN || err == -EINTR)
-   continue;
+   goto again;
if (err < 0) {
printk(KERN_WARNING
   "lockd: terminating on error %d\n",
   -err);
+   mutex_lock(&nlmsvc_mutex);
break;
}
 
@@ -194,19 +203,22 @@ lockd(void *vrqstp)
svc_print_addr(rqstp, buf, sizeof(buf)));
 
svc_process(rqstp);
+again:
+   mutex_lock(&nlmsvc_mutex);
}
 
-   flush_signals(current);
-
/*
-* Check whether there's a new lockd process before
-* shutting down the hosts and clearing the slot.
+* at this point lockd is committed to going down. We hold the
+* nlmsvc_mutex until just before exit to prevent a new one
+* from starting before it's down.
 */
+   flush_signals(current);
if (nlmsvc_ops)
nlmsvc_invalidate_all();
nlm_shutdown_hosts();
nlmsvc_task = NULL;
nlmsvc_serv = NULL;
+   mutex_unlock(&nlmsvc_mutex);
 
/* Exit the RPC thread */
svc_exit_thread(rqstp);
@@ -269,6 +281,10 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
int error = 0;

[PATCH 6/6] NLM: Add reference counting to lockd

2007-12-13 Thread Jeff Layton
...and only have lockd exit when the last reference is dropped. This
means that we can't use kthread_stop here. nlmsvc_unlink_block is called
by lockd and a kthread can't call kthread_stop on itself. So, change
lockd to check the refcount itself and to return if it goes to 0. We do
the checking and exit while holding the nlmsvc_mutex to make sure that a
new lockd is not started until the old one is down.

Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>
---
 fs/lockd/svc.c  |   51 --
 fs/lockd/svclock.c  |5 
 include/linux/lockd/lockd.h |1 +
 3 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 1303ce8..05d2317 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -51,6 +51,7 @@ static DEFINE_MUTEX(nlmsvc_mutex);
 static unsigned int            nlmsvc_users;
 static struct task_struct *nlmsvc_task;
 static struct svc_serv *   nlmsvc_serv;
+atomic_t   nlmsvc_ref = ATOMIC_INIT(0);
 int                            nlmsvc_grace_period;
 unsigned long  nlmsvc_timeout;
 
@@ -134,7 +135,10 @@ lockd(struct svc_rqst *rqstp)
 
set_freezable();
 
-   /* Process request with signals blocked, but allow SIGKILL.  */
+   /*
+* Process request with signals blocked, but allow SIGKILL which
+* signifies that lockd should drop all of its locks.
+*/
allow_signal(SIGKILL);
 
dprintk("NFS locking service started (ver " LOCKD_VERSION ").\n");
@@ -147,15 +151,19 @@ lockd(struct svc_rqst *rqstp)
 
/*
 * The main request loop. We don't terminate until the last
-* NFS mount or NFS daemon has gone away, and we've been sent a
-* signal, or else another process has taken over our job.
+* NFS mount or NFS daemon has gone away, and the nlm_blocked
+* list is empty. The nlmsvc_mutex ensures that we prevent a
+* new lockd from being started before the old one is down.
 */
-   while (!kthread_should_stop()) {
+   mutex_lock(&nlmsvc_mutex);
+   while (atomic_read(&nlmsvc_ref) != 0) {
long timeout = MAX_SCHEDULE_TIMEOUT;
char buf[RPC_MAX_ADDRBUFLEN];
 
+   mutex_unlock(&nlmsvc_mutex);
+
if (try_to_freeze())
-   continue;
+   goto again;
 
if (signalled()) {
flush_signals(current);
@@ -182,11 +190,12 @@ lockd(struct svc_rqst *rqstp)
 */
err = svc_recv(rqstp, timeout);
if (err == -EAGAIN || err == -EINTR)
-   continue;
+   goto again;
if (err < 0) {
printk(KERN_WARNING
   "lockd: terminating on error %d\n",
   -err);
+   mutex_lock(&nlmsvc_mutex);
break;
}
 
@@ -194,19 +203,22 @@ lockd(struct svc_rqst *rqstp)
svc_print_addr(rqstp, buf, sizeof(buf)));
 
svc_process(rqstp);
+again:
+   mutex_lock(&nlmsvc_mutex);
}
 
-   flush_signals(current);
-
/*
-* Check whether there's a new lockd process before
-* shutting down the hosts and clearing the slot.
-*/
+* at this point lockd is committed to going down. We hold the
+* nlmsvc_mutex until just before exit to prevent a new one
+* from starting before it's down.
+*/
+   flush_signals(current);
if (nlmsvc_ops)
nlmsvc_invalidate_all();
nlm_shutdown_hosts();
nlmsvc_task = NULL;
nlmsvc_serv = NULL;
+   mutex_unlock(&nlmsvc_mutex);
 
/* Exit the RPC thread */
svc_exit_thread(rqstp);
@@ -267,6 +279,10 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
int error = 0;
 
mutex_lock(&nlmsvc_mutex);
+
+   if (!nlmsvc_users)
+   atomic_inc(&nlmsvc_ref);
+
/*
 * Check whether we're already up and running.
 */
@@ -313,6 +329,8 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 destroy_and_out:
svc_destroy(serv);
 out:
+   if (!nlmsvc_users && error)
+   atomic_dec(&nlmsvc_ref);
if (!error)
nlmsvc_users++;
mutex_unlock(&nlmsvc_mutex);
@@ -341,7 +359,16 @@ lockd_down(void)
goto out;
}
warned = 0;
-   kthread_stop(nlmsvc_task);
+   atomic_dec(&nlmsvc_ref);
+
+   /*
+* Sending a signal is necessary here. If we get to this point and
+* nlm_blocked isn't empty then lockd may be held hostage by clients
+* that are still blocking. Sending the signal makes sure that lockd
+* invalidates