On Wed, Mar 11, 2015 at 04:18:49PM -0400, Sasha Levin wrote:
> On 03/11/2015 04:17 PM, Paul E. McKenney wrote:
> > On Wed, Mar 11, 2015 at 03:57:32PM -0400, Sasha Levin wrote:
> >> Hi all,
> >>
> >> I've started seeing the following hang pretty often during my fuzzing. The
> >> system proceeds to lock up after that.
> >>
> >> [ 3209.655703] INFO: rcu_preempt detected stalls on CPUs/tasks:
> >> [ 3209.655703]     Tasks blocked on level-1 rcu_node (CPUs 16-31):
> >> [ 3209.655703]     (detected by 0, t=30502 jiffies, g=48799, c=48798, 
> >> q=1730)
> >> [ 3209.655703] All QSes seen, last rcu_preempt kthread activity 1 
> >> (4295246069-4295246068), jiffies_till_next_fqs=1, root ->qsmask 0x2
> >> [ 3209.655703] trinity-c24     R  running task    26944  9338   9110 
> >> 0x10080000
> >> [ 3209.655703]  0000000000002396 00000000e68fa48e ffff880050607dc8 
> >> ffffffffa427679b
> >> [ 3209.655703]  ffff880050607d98 ffffffffb1b36000 0000000000000001 
> >> 00000001000440f4
> >> [ 3209.655703]  ffffffffb1b351c8 dffffc0000000000 ffff880050622000 
> >> ffffffffb1721000
> >> [ 3209.655703] Call Trace:
> >> [ 3209.655703] <IRQ> sched_show_task (kernel/sched/core.c:4542)
> >> [ 3209.655703] rcu_check_callbacks (kernel/rcu/tree.c:1225 
> >> kernel/rcu/tree.c:1331 kernel/rcu/tree.c:3389 kernel/rcu/tree.c:3453 
> >> kernel/rcu/tree.c:2683)
> >> [ 3209.655703] ? acct_account_cputime (kernel/tsacct.c:168)
> >> [ 3209.655703] update_process_times (./arch/x86/include/asm/preempt.h:22 
> >> kernel/time/timer.c:1386)
> >> [ 3209.655703] tick_periodic (kernel/time/tick-common.c:92)
> >> [ 3209.655703] ? tick_handle_periodic (kernel/time/tick-common.c:105)
> >> [ 3209.655703] tick_handle_periodic (kernel/time/tick-common.c:105)
> >> [ 3209.655703] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:891)
> >> [ 3209.655703] smp_trace_apic_timer_interrupt 
> >> (arch/x86/kernel/apic/apic.c:934 include/linux/jump_label.h:114 
> >> ./arch/x86/include/asm/trace/irq_vectors.h:45 
> >> arch/x86/kernel/apic/apic.c:935)
> >> [ 3209.655703] trace_apic_timer_interrupt (arch/x86/kernel/entry_64.S:920)
> >> [ 3209.655703] <EOI> ? add_wait_queue (include/linux/wait.h:116 
> >> kernel/sched/wait.c:29)
> >> [ 3209.655703] ? _raw_spin_unlock_irqrestore 
> >> (./arch/x86/include/asm/paravirt.h:809 
> >> include/linux/spinlock_api_smp.h:162 kernel/locking/spinlock.c:191)
> >> [ 3209.655703] add_wait_queue (kernel/sched/wait.c:31)
> >> [ 3209.655703] do_wait (kernel/exit.c:1473)
> >> [ 3209.655703] ? trace_rcu_dyntick (include/trace/events/rcu.h:363 
> >> (discriminator 19))
> >> [ 3209.655703] ? wait_consider_task (kernel/exit.c:1465)
> >> [ 3209.655703] ? find_get_pid (kernel/pid.c:490)
> >> [ 3209.655703] SyS_wait4 (kernel/exit.c:1618 kernel/exit.c:1586)
> >> [ 3209.655703] ? perf_syscall_exit (kernel/trace/trace_syscalls.c:549)
> >> [ 3209.655703] ? SyS_waitid (kernel/exit.c:1586)
> >> [ 3209.655703] ? kill_orphaned_pgrp (kernel/exit.c:1444)
> >> [ 3209.655703] ? syscall_trace_enter_phase2 (arch/x86/kernel/ptrace.c:1592)
> >> [ 3209.655703] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
> >> [ 3209.655703] tracesys_phase2 (arch/x86/kernel/entry_64.S:347)
> > 
> > OK, that is not good.
> > 
> > What version are you running and what is your .config?
> 
> latest -next. .config attached.

Aha, I forgot to update rcu/next.  I have now updated it, so it should
make it there today or tomorrow.  In the meantime, does the following
commit help?

Also, how quickly does your test setup reproduce this?

                                                        Thanx, Paul

------------------------------------------------------------------------

rcu: Yet another fix for preemption and CPU hotplug

As noted earlier, the following sequence of events can occur when
running PREEMPT_RCU and HOTPLUG_CPU on a system with a multi-level
rcu_node combining tree:

1.      A group of tasks block on CPUs corresponding to a given leaf
        rcu_node structure while within RCU read-side critical sections.
2.      All CPUs corresponding to that rcu_node structure go offline.
3.      The next grace period starts, but because there are still tasks
        blocked, the upper-level bits corresponding to this leaf rcu_node
        structure remain set.
4.      All the tasks exit their RCU read-side critical sections and
        remove themselves from the leaf rcu_node structure's list,
        leaving it empty.
5.      But because there now is code to check for this condition at
        force-quiescent-state time, the upper bits are cleared and the
        grace period completes.
    
However, there is another complication that can occur following step 4 above:
    
4a.     The grace period starts, and the leaf rcu_node structure's
        gp_tasks pointer is set to NULL because there are no tasks
        blocked on this structure.
4b.     One of the CPUs corresponding to the leaf rcu_node structure
        comes back online.
4c.     An endless stream of tasks is preempted within RCU read-side
        critical sections on this CPU, such that the ->blkd_tasks
        list is always non-empty.
    
The grace period will never end.

This commit therefore makes the force-quiescent-state processing check only
for absence of tasks blocking the current grace period rather than absence
of tasks altogether.  This will cause a quiescent state to be reported if
the current leaf rcu_node structure is not blocking the current grace period
and its parent thinks that it is, regardless of how RCU managed to get
itself into this state.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: <[email protected]> # 4.0.x

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cc2e9bebf585..f9a56523e8dd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2212,8 +2212,8 @@ static void rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
        unsigned long mask;
        struct rcu_node *rnp_p;
 
-       WARN_ON_ONCE(rsp == &rcu_bh_state || rsp == &rcu_sched_state);
-       if (rnp->qsmask != 0 || rcu_preempt_blocked_readers_cgp(rnp)) {
+       if (rcu_state_p == &rcu_sched_state || rsp != rcu_state_p ||
+           rnp->qsmask != 0 || rcu_preempt_blocked_readers_cgp(rnp)) {
                raw_spin_unlock_irqrestore(&rnp->lock, flags);
                return;  /* Still need more quiescent states! */
        }
@@ -2221,9 +2221,8 @@ static void rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
        rnp_p = rnp->parent;
        if (rnp_p == NULL) {
                /*
-                * Either there is only one rcu_node in the tree,
-                * or tasks were kicked up to root rcu_node due to
-                * CPUs going offline.
+                * Only one rcu_node structure in the tree, so don't
+                * try to report up to its nonexistent parent!
                 */
                rcu_report_qs_rsp(rsp, flags);
                return;
@@ -2715,8 +2714,29 @@ static void force_qs_rnp(struct rcu_state *rsp,
                        return;
                }
                if (rnp->qsmask == 0) {
-                       rcu_initiate_boost(rnp, flags); /* releases rnp->lock */
-                       continue;
+                       if (rcu_state_p == &rcu_sched_state ||
+                           rsp != rcu_state_p ||
+                           rcu_preempt_blocked_readers_cgp(rnp)) {
+                               /*
+                                * No point in scanning bits because they
+                                * are all zero.  But we might need to
+                                * priority-boost blocked readers.
+                                */
+                               rcu_initiate_boost(rnp, flags);
+                               /* rcu_initiate_boost() releases rnp->lock */
+                               continue;
+                       }
+                       if (rnp->parent &&
+                           (rnp->parent->qsmask & rnp->grpmask)) {
+                               /*
+                                * Race between grace-period
+                                * initialization and task exiting RCU
+                                * read-side critical section: Report.
+                                */
+                               rcu_report_unblock_qs_rnp(rsp, rnp, flags);
+                               /* rcu_report_unblock_qs_rnp() rlses ->lock */
+                               continue;
+                       }
                }
                cpu = rnp->grplo;
                bit = 1;
@@ -2731,15 +2751,6 @@ static void force_qs_rnp(struct rcu_state *rsp,
                if (mask != 0) {
                        /* Idle/offline CPUs, report. */
                        rcu_report_qs_rnp(mask, rsp, rnp, flags);
-               } else if (rnp->parent &&
-                        list_empty(&rnp->blkd_tasks) &&
-                        !rnp->qsmask &&
-                        (rnp->parent->qsmask & rnp->grpmask)) {
-                       /*
-                        * Race between grace-period initialization and task
-                        * existing RCU read-side critical section, report.
-                        */
-                       rcu_report_unblock_qs_rnp(rsp, rnp, flags);
                } else {
                        /* Nothing to do here, so just drop the lock. */
                        raw_spin_unlock_irqrestore(&rnp->lock, flags);
