Also, the prepare to wait call does so in non exclusive mode so switching the 
parameter on the wakeup would have no effect as non exclusive waiters are 
always woken anyway.

So, there is some other reason why we decide to yield but never get woken.

Mark.
________________________________
From: Mark Syms <mark.s...@citrix.com>
Sent: Thursday, 11 October 2018 09:14
To: Tim Smith <tim.sm...@citrix.com>,cluster-devel@redhat.com
CC: Andreas Gruenbacher <agrue...@redhat.com>,Ross Lagerwall 
<ross.lagerw...@citrix.com>
Subject: RE: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find 
insert glock


And, sadly, the change to wait all also doesn't prevent the schedule_timeout 
from occurring. Something more subtle going on obviously.

        Mark.

-----Original Message-----
From: Tim Smith <tim.sm...@citrix.com>
Sent: 10 October 2018 09:23
To: cluster-devel@redhat.com
Cc: Andreas Gruenbacher <agrue...@redhat.com>; Ross Lagerwall 
<ross.lagerw...@citrix.com>; Mark Syms <mark.s...@citrix.com>
Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find 
insert glock

On Tuesday, 9 October 2018 16:08:10 BST Tim Smith wrote:
> On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote:
> > On Tue, 9 Oct 2018 at 14:46, Tim Smith <tim.sm...@citrix.com> wrote:
> > > On Tuesday, 9 October 2018 13:34:47 BST you wrote:
> > > > There must be another reason for the missed wake-up. I'll have
> > > > to study the code some more.
> > >
> > > I think it needs to go into gfs2_glock_dealloc(), in such a way as
> > > to avoid that problem. Currently working out a patch to do that.
> >
> > That doesn't sound right: find_insert_glock is waiting for the glock
> > to be removed from the rhashtable. In gfs2_glock_free, we remove the
> > glock from the rhashtable and then we do the wake-up. Delaying the
> > wakeup further isn't going to help (but it might hide the problem).
>
> The only way we can get the problem we're seeing is if we get an
> effective order of
>
> T0: wake_up_glock()
> T1: prepare_to_wait()
> T1: schedule()
>
> so clearly there's a way for that to happen. Any other order and
> either
> schedule() doesn't sleep or it gets woken.
>
> The only way I can see at the moment to ensure that wake_up_glock()
> *cannot* get called until after prepare_to_wait() is to delay it until
> the read_side critical sections are done, and the first place that's
> got that property is the start of gfs2_glock_dealloc(), unless we want
> to add synchronise_rcu() to gfs2_glock_free() and I'm guessing there's
> a reason it's using
> call_rcu() instead.
>
> I'll keep thinking about it.

OK, we have a result that this definitely *isn't* the right answer (still 
getting the wakeup happening). This is lends more weight to the idea that there 
are multiple waiters, so we'll try your patch.

--
Tim Smith <tim.sm...@citrix.com>


Reply via email to