On 11/13/2014 06:07 PM, Paul E. McKenney wrote:
> On Mon, Oct 27, 2014 at 04:44:25PM -0700, Paul E. McKenney wrote:
>> > On Mon, Oct 27, 2014 at 02:13:29PM -0700, Paul E. McKenney wrote:
>>> > > On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
>>>> > > > On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
>>>>> > > > > On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
>>>>>>> > > > >> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
>>>>>>>>> > > > >>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin 
>>>>>>>>> > > > >>> > > wrote:
>>>>>>>>>>>>> > > > >>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > -0400, Sasha Levin wrote:
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave 
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> Jones wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> oday in "rcu stall 
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> while fuzzing" news:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt 
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> detected stalls on 
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> CPUs/tasks:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>>       Tasks blocked 
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> on level-0 rcu_node 
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> (CPUs 0-3): P766 P646
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>>       Tasks blocked 
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> on level-0 rcu_node 
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> (CPUs 0-3): P766 P646
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>>       (detected by 0, 
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> t=6502 jiffies, 
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> g=75434, c=75433, q=0)
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >>
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> I've complained about RCU 
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> stalls couple days ago (in a 
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> different context)
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> on -next. I guess whatever 
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> causing them made it into 
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> Linus's tree?
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >>
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > And on that one, I must confess that 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > I don't see where the RCU read-side
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > critical section might be.
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > an rcu_read_unlock() somewhere.
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > Can you reproduce this with 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > CONFIG_PROVE_RCU=y?
>>>>>>>>>>>>> > > > >>>>> > >> > 
>>>>>>>>>>>>> > > > >>>>> > >> > Paul, if that was directed to me - Yes, I see 
>>>>>>>>>>>>> > > > >>>>> > >> > stalls with CONFIG_PROVE_RCU
>>>>>>>>>>>>> > > > >>>>> > >> > set and nothing else is showing up 
>>>>>>>>>>>>> > > > >>>>> > >> > before/after that.
>>>>>>>>> > > > >>> > > Indeed it was directed to you.  ;-)
>>>>>>>>> > > > >>> > > 
>>>>>>>>> > > > >>> > > Does the following crude diagnostic patch turn up 
>>>>>>>>> > > > >>> > > anything?
>>>>>>> > > > >> > 
>>>>>>> > > > >> > Nope, seeing stalls but not seeing that pr_err() you added.
>>>>> > > > > OK, color me confused.  Could you please send me the full dmesg 
>>>>> > > > > or a
>>>>> > > > > pointer to it?
>>>> > > > 
>>>> > > > Attached.
>>> > > 
>>> > > Thank you!  I would complain about the FAULT_INJECTION messages, but
>>> > > they don't appear to be happening all that frequently.
>>> > > 
>>> > > The stack dumps do look different here.  I suspect that this is a real
>>> > > issue in the VM code.
>> > 
>> > And to that end...  The filemap_map_pages() function does have loop over
>> > a list of pages.  I wonder if the rcu_read_lock() should be moved into
>> > the radix_tree_for_each_slot() loop.  CCing linux-mm for their thoughts,
>> > though it looks to me like the current radix_tree_for_each_slot() wants
>> > to be under RCU protection.  But I am not seeing anything that requires
>> > all iterations of the loop to be under the same RCU read-side critical
>> > section.  Maybe something like the following patch?
> Just following up, did the patch below help?

I'm not seeing any more stalls with filemap in them, but I don see different
traces.


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to