On Fri, May 04, 2018 at 11:38:37PM -0500, Eric W. Biederman wrote:
> "Paul E. McKenney" <paul...@linux.vnet.ibm.com> writes:
> 
> > On Fri, May 04, 2018 at 03:08:40PM -0500, Eric W. Biederman wrote:
> >> "Paul E. McKenney" <paul...@linux.vnet.ibm.com> writes:
> >> 
> >> > On Fri, May 04, 2018 at 02:03:04PM -0500, Eric W. Biederman wrote:
> >> >> "Paul E. McKenney" <paul...@linux.vnet.ibm.com> writes:
> >> >> 
> >> >> > On Fri, May 04, 2018 at 12:17:20PM -0500, Eric W. Biederman wrote:
> >> >> >> Sebastian Andrzej Siewior <bige...@linutronix.de> writes:
> >> >> >> 
> >> >> >> > On 2018-05-04 11:59:08 [-0500], Eric W. Biederman wrote:
> >> >> >> >> Sebastian Andrzej Siewior <bige...@linutronix.de> writes:
> >> >> >> >> > From: Anna-Maria Gleixner <anna-ma...@linutronix.de>
> >> >> >> > …
> >> >> >> >> > This long-term fix has been made in commit 4abf91047cf 
> >> >> >> >> > ("rtmutex: Make >
> >> >> >> >> > wait_lock irq safe") for different reason.
> >> >> >> >> 
> >> >> >> >> Which tree has this change been made in?  I am not finding the 
> >> >> >> >> commit
> >> >> >> >> you mention above in Linus's tree.
> >> >> >> >
> >> >> >> > I'm sorry, it should have been commit b4abf91047cf ("rtmutex: Make
> >> >> >> > wait_lock irq safe").
> >> >> >> 
> >> >> >> Can you fix that in your patch description and can you also up the
> >> >> >> description of rcu_read_unlock?
> >> >> >> 
> >> >> >> If we don't need to jump through hoops it looks very reasonable to
> >> >> >> remove this unnecessary logic.  But we should fix the description
> >> >> >> in rcu_read_unlock that still says we need these hoops.
> >> >> >
> >> >> > The hoops are still required for rcu_read_lock(), otherwise you
> >> >> > get deadlocks between the scheduler and RCU in PREEMPT=y kernels.
> >> >> > What happens with this patch (if I understand it correctly) is that 
> >> >> > the
> >> >> > signal code now uses a different way of jumping through the hoops.
> >> >> > But the hoops are still jumped through.
> >> >> 
> >> >> The patch changes:
> >> >> 
> >> >> local_irq_disable();
> >> >> rcu_read_lock();
> >> >> spin_lock();
> >> >> rcu_read_unlock();
> >> >> 
> >> >> to:
> >> >> 
> >> >> rcu_read_lock();
> >> >> spin_lock_irq();
> >> >> rcu_read_unlock();
> >> >> 
> >> >> Now that I have a chance to relfect on it the fact that the patern
> >> >> that is being restored does not work is scary.  As the failure has
> >> >> nothing to do with lock ordering and people won't realize what is going
> >> >> on.  Especially since the common rcu modes won't care.
> >> >> 
> >> >> So is it true that taking spin_lock_irq before calling rcu_read_unlock
> >> >> is a problem because of rt_mutex_unlock()?  Or has b4abf91047cf 
> >> >> ("rtmutex: Make
> >> >> wait_lock irq safe") actually fixed that and we can correct the
> >> >> documentation of rcu_read_unlock() ?  And fix __lock_task_sighand?
> >> >
> >> > The problem is that the thing taking the lock might be the scheduler,
> >> > or one of the locks taken while the scheduler's pi and rq locks are
> >> > held.  This occurs only with RCU-preempt.
> >> >
> >> > Here is what can happen:
> >> >
> >> > o        A task does rcu_read_lock().
> >> >
> >> > o        That task is preempted.
> >> >
> >> > o        That task stays preempted for a long time, and is therefore
> >> >  priority boosted.  This boosting involves a high-priority RCU
> >> >  kthread creating an rt_mutex, pretending that the preempted task
> >> >  already holds it, and then acquiring it.
> >> >
> >> > o        The task awakens, acquires the scheduler's rq lock, and
> >> >  then does rcu_read_unlock().
> >> >
> >> > o        Because the task has been priority boosted, __rcu_read_unlock()
> >> >  invokes the rcu_read_unlock_special() slowpath, which does
> >> >  (as you say) rt_mutex_unlock() to deboost.  The deboosting
> >> >  can cause the scheduler to acquire the rq and pi locks, which
> >> >  results in deadlock.
> >> >
> >> > In contrast, holding these scheduler locks across the entirety of the
> >> > RCU-preempt read-side critical section is harmless because then the
> >> > critical section cannot be preempted, which means that priority boosting
> >> > cannot happen, which means that there will be no need to deboost at
> >> > rcu_read_unlock() time.
> >> >
> >> > This restriction has not changed, and as far as I can see is inherent
> >> > in the fact that RCU uses the scheduler and the scheduler uses RCU.
> >> > There is going to be an odd corner case in there somewhere!
> >> 
> >> However if I read things correctly b4abf91047cf ("rtmutex: Make
> >> wait_lock irq safe") did change this.
> >> 
> >> In particular it changed things so that it is only the scheduler locks
> >> that matter, not any old lock that disabled interrupts.  This was done
> >> by disabling disabling interrupts when taking the wait_lock.
> >> 
> >> The rcu_read_unlock documentation states:
> >> 
> >>  * In most situations, rcu_read_unlock() is immune from deadlock.
> >>  * However, in kernels built with CONFIG_RCU_BOOST, rcu_read_unlock()
> >>  * is responsible for deboosting, which it does via rt_mutex_unlock().
> >>  * Unfortunately, this function acquires the scheduler's runqueue and
> >>  * priority-inheritance spinlocks.  This means that deadlock could result
> >>  * if the caller of rcu_read_unlock() already holds one of these locks or
> >>  * any lock that is ever acquired while holding them; or any lock which
> >>  * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
> >>  * does not disable irqs while taking ->wait_lock.
> >> 
> >> So we can now remove the clause:
> >>  * ; or any lock which
> >>  * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
> >>  * does not disable irqs while taking ->wait_lock.
> >> 
> >> Without the any lock that disabled interrupts restriction it is now safe
> >> to not worry about the issues with the scheduler locks and the rt_mutex
> >> Which does make it safe to not worry about these crazy complexities in
> >> lock_task_sighand.
> >> 
> >> Paul do you agree or is the patch unsafe?
> >
> > Ah, I thought you were trying to get rid of all but the first line of
> > that paragraph, not just the final clause.  Apologies for my confusion!
> >
> > It looks plausible, but the patch should be stress-tested on a preemptible
> > kernel with priority boosting enabled.  Has that been done?
> >
> > (Me, I would run rcutorture scenario TREE03 for an extended time period
> > on b4abf91047cf with your patch applied.

And with lockdep enabled, which TREE03 does not do by default.

> >                                           But what testing have you
> > done already?)
> 
> Not my patch.  I was just the reviewer asking for some obvious cleanups.
> So people won't get confused.  Sebastian?  Can you tell Paul what
> testing you have done?

Ah, apologies for my confusion!

                                                        Thanx, Paul

Reply via email to