On Mon, Jun 15, 2020 at 09:23:30AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 15, 2020 at 02:56:54PM +0200, Peter Zijlstra wrote:
> > Hi,
> > 
> > So Paul reported rcutorture hitting a NULL dereference, and patch #1
> > fixes it.
> > 
> > Now, patch #1 is obviously correct, but I can't explain how exactly it
> > leads to the observed NULL pointer dereference. The NULL pointer deref
> > happens in find_matching_se()'s last while() loop when is_same_group()
> > fails even though both parents are NULL.
> 
> My bisection of yet another bug sometimes hits the scheduler NULL pointer
> dereference on older commits.  I will try out patch #2.

Thanks! I've got 16*TREE03 running since this morning, so far so nothing :/
(FWIW that's 16/9 times overcommit, idle time fluctuates around 10%).

> Whether this is reassuring or depressing, I have no idea.  :-/

Worrisome at least, I don't trust stuff I can't explain.

> > The only explanation I have for that is that we just did an activate_task()
> > while: 'task_cpu(p) != cpu_of(rq)', because then 'p->se.cfs_rq' doesn't
> > match. However, I can't see how the lack of #1 would lead to that.
> > Nevertheless, patch #2 adds assertions to warn us of this case.
> > 
> > Patch #3 is a trivial rename that ought to eradicate some confusion.
> > 
> > The last 3 patches are what I ended up with for further cleaning up the
> > whole smp_call_function/irq_work/ttwu thing.
> 
> Would it be possible to allow a target CPU # on those instances of
> __call_single_data?  This is extremely helpful for debugging lost
> smp_call_function*() calls.

Target or source? Either would be possible, perhaps even both. We have
a spare u32 in __call_single_node.

Something like the below on top of 1-4. If we want to keep this, we
should probably stick it under some CONFIG_DEBUG thing or other.

--- a/include/linux/smp_types.h
+++ b/include/linux/smp_types.h
@@ -61,6 +61,7 @@ struct __call_single_node {
                unsigned int    u_flags;
                atomic_t        a_flags;
        };
+       u16 src, dst;
 };
 
 #endif /* __LINUX_SMP_TYPES_H */
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -135,8 +135,14 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(cal
 
 void __smp_call_single_queue(int cpu, struct llist_node *node)
 {
+       struct __call_single_node *n =
+               container_of(node, struct __call_single_node, llist);
+
        WARN_ON_ONCE(cpu == smp_processor_id());
 
+       n->src = smp_processor_id();
+       n->dst = cpu;
+
        /*
         * The list addition should be visible before sending the IPI
         * handler locks the list to pull the entry off it because of
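
For anyone wanting to poke at the idea outside the kernel, here is a minimal
user-space sketch of what the hunk above does: recover the enclosing node from
the llist pointer with container_of() and stamp the source and target CPU into
it. The struct layout, the stamp_node() helper and the plain-int CPU arguments
are stand-ins I made up for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel types; NOT the real definitions. */
struct llist_node {
	struct llist_node *next;
};

struct call_single_node {
	struct llist_node llist;
	unsigned int flags;
	uint16_t src, dst;	/* debug only: sending CPU and target CPU */
};

/* Simplified container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Hypothetical helper mirroring the two assignments added to
 * __smp_call_single_queue(): record who queued the node and for whom.
 */
static void stamp_node(struct llist_node *node, int this_cpu, int target_cpu)
{
	struct call_single_node *n =
		container_of(node, struct call_single_node, llist);

	/* Two u16 fields only work while CPU numbers fit in 16 bits. */
	assert(this_cpu >= 0 && this_cpu <= UINT16_MAX);
	assert(target_cpu >= 0 && target_cpu <= UINT16_MAX);

	n->src = (uint16_t)this_cpu;
	n->dst = (uint16_t)target_cpu;
}
```

When chasing a lost call, the receiving side can then compare the recorded dst
against the CPU it is actually running on, and src tells you who sent it.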
