Re: mlockall triggered rcu_preempt stall.
On Tue, Jul 30, 2013 at 10:57:18AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 19, 2013 at 08:32:12PM -0400, Dave Jones wrote:
> > On Fri, Jul 19, 2013 at 03:15:39PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jul 19, 2013 at 10:53:23AM -0400, Dave Jones wrote:
> > > > My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> > > > came in from mlockall. I'm only seeing this on one box, but that has more
> > > > ram (8gb) than my other machines, which might explain it.
> > >
> > > Are you building CONFIG_PREEMPT=n? I don't see any preemption points in
> > > do_mlockall(), so a range containing enough vmas might well stall the
> > > CPU in that case.
> >
> > That was with full preempt.
> >
> > > Does the patch below help? If so, we probably need others, but let's
> > > first see if this one helps. ;-)
> >
> > I'll try it on Monday.
>
> Any news? If I don't hear otherwise, I will assume that the patch did
> not help, and will therefore drop it.

I wasn't able to do any tests yesterday, because I kept hitting other
oopses. I've got patches for the more obvious ones now, so I'll start
testing rc3 this afternoon.

	Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: mlockall triggered rcu_preempt stall.
On Fri, Jul 19, 2013 at 08:32:12PM -0400, Dave Jones wrote:
> On Fri, Jul 19, 2013 at 03:15:39PM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 19, 2013 at 10:53:23AM -0400, Dave Jones wrote:
> > > My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> > > came in from mlockall. I'm only seeing this on one box, but that has more
> > > ram (8gb) than my other machines, which might explain it.
> >
> > Are you building CONFIG_PREEMPT=n? I don't see any preemption points in
> > do_mlockall(), so a range containing enough vmas might well stall the
> > CPU in that case.
>
> That was with full preempt.
>
> > Does the patch below help? If so, we probably need others, but let's
> > first see if this one helps. ;-)
>
> I'll try it on Monday.

Any news? If I don't hear otherwise, I will assume that the patch did
not help, and will therefore drop it.

							Thanx, Paul
Re: mlockall triggered rcu_preempt stall.
On Fri, Jul 19, 2013 at 08:32:12PM -0400, Dave Jones wrote:
> On Fri, Jul 19, 2013 at 03:15:39PM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 19, 2013 at 10:53:23AM -0400, Dave Jones wrote:
> > > My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> > > came in from mlockall. I'm only seeing this on one box, but that has more
> > > ram (8gb) than my other machines, which might explain it.
> >
> > Are you building CONFIG_PREEMPT=n? I don't see any preemption points in
> > do_mlockall(), so a range containing enough vmas might well stall the
> > CPU in that case.
>
> That was with full preempt.
>
> > Does the patch below help? If so, we probably need others, but let's
> > first see if this one helps. ;-)
>
> I'll try it on Monday.

Given full preempt, I wouldn't think that my patch would have any
effect, but look forward to hearing what happens.

Hmmm... Were you running mlockall() concurrently from a bunch of
different processes sharing lots of memory via mmap() or some such?

							Thanx, Paul
Re: mlockall triggered rcu_preempt stall.
On Fri, Jul 19, 2013 at 03:15:39PM -0700, Paul E. McKenney wrote:
> On Fri, Jul 19, 2013 at 10:53:23AM -0400, Dave Jones wrote:
> > My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> > came in from mlockall. I'm only seeing this on one box, but that has more
> > ram (8gb) than my other machines, which might explain it.
>
> Are you building CONFIG_PREEMPT=n? I don't see any preemption points in
> do_mlockall(), so a range containing enough vmas might well stall the
> CPU in that case.

That was with full preempt.

> Does the patch below help? If so, we probably need others, but let's
> first see if this one helps. ;-)

I'll try it on Monday.

	Dave
Re: mlockall triggered rcu_preempt stall.
On Fri, Jul 19, 2013 at 10:53:23AM -0400, Dave Jones wrote:
> My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> came in from mlockall. I'm only seeing this on one box, but that has more
> ram (8gb) than my other machines, which might explain it.

Are you building CONFIG_PREEMPT=n? I don't see any preemption points in
do_mlockall(), so a range containing enough vmas might well stall the
CPU in that case.

Does the patch below help? If so, we probably need others, but let's
first see if this one helps. ;-)

CCing the MM guys and those who have most recently touched do_mlockall()
for their insight as well.

							Thanx, Paul

> 	Dave

mm: Place preemption point in do_mlockall() loop

There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel:

> My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> came in from mlockall. I'm only seeing this on one box, but that has more
> ram (8gb) than my other machines, which might explain it.
>
> 	Dave
>
> INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0)
> sending NMI to all CPUs:
> NMI backtrace for cpu 3
> CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #32
> task: 88023e743fc0 ti: 88022f6f2000 task.ti: 88022f6f2000
> RIP: 0010:[] [] trace_hardirqs_off_caller+0x21/0xb0
> RSP: 0018:880244e03c30 EFLAGS: 0046
> RAX: 88023e743fc0 RBX: 0001 RCX: 003c
> RDX: 000f RSI: 0004 RDI: 81033cab
> RBP: 880244e03c38 R08: 880243288a80 R09: 0001
> R10: R11: 0001 R12: 880243288a80
> R13: 8802437eda40 R14: 0008 R15: d010
> FS: 7f50ae33b740() GS:880244e0() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 0097f000 CR3: 000240fa CR4: 001407e0
> DR0: DR1: DR2:
> DR3: DR6: fffe0ff0 DR7: 0600
> Stack:
>  810bf86d 880244e03c98 81033cab 0096
>  d008 00030002 0004 0003
>  2710 81c50d00 81c50d00 880244fcde00
> Call Trace:
>  <IRQ>
>  [] ? trace_hardirqs_off+0xd/0x10
>  [] __x2apic_send_IPI_mask+0x1ab/0x1c0
>  [] x2apic_send_IPI_all+0x1c/0x20
>  [] arch_trigger_all_cpu_backtrace+0x65/0xa0
>  [] rcu_check_callbacks+0x331/0x8e0
>  [] ? hrtimer_run_queues+0x20/0x180
>  [] ? sched_clock_cpu+0xb5/0x100
>  [] update_process_times+0x47/0x80
>  [] tick_sched_handle.isra.16+0x25/0x60
>  [] tick_sched_timer+0x41/0x60
>  [] __run_hrtimer+0x81/0x4e0
>  [] ? tick_sched_do_timer+0x60/0x60
>  [] hrtimer_interrupt+0xff/0x240
>  [] local_apic_timer_interrupt+0x34/0x60
>  [] smp_apic_timer_interrupt+0x3f/0x60
>  [] apic_timer_interrupt+0x6f/0x80
>  [] ? retint_restore_args+0xe/0xe
>  [] ? __do_softirq+0xb1/0x440
>  [] irq_exit+0xcd/0xe0
>  [] smp_apic_timer_interrupt+0x45/0x60
>  [] apic_timer_interrupt+0x6f/0x80
>  <EOI>
>  [] ? retint_restore_args+0xe/0xe
>  [] ? wait_for_completion_killable+0x170/0x170
>  [] ? preempt_schedule_irq+0x53/0x90
>  [] retint_kernel+0x26/0x30
>  [] ? queue_work_on+0x43/0x90
>  [] schedule_on_each_cpu+0xc9/0x1a0
>  [] ? lru_add_drain+0x50/0x50
>  [] lru_add_drain_all+0x15/0x20
>  [] SyS_mlockall+0xa5/0x1a0
>  [] tracesys+0xdd/0xe2

This commit addresses this problem by inserting the required preemption
point.

Reported-by: Dave Jones
Signed-off-by: Paul E. McKenney
Cc: KOSAKI Motohiro
Cc: Michel Lespinasse
Cc: Andrew Morton
Cc: Linus Torvalds

diff --git a/mm/mlock.c b/mm/mlock.c
index 79b7cf7..92022eb 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -506,6 +506,7 @@ static int do_mlockall(int flags)
 		/* Ignore errors */
 		mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
+		cond_resched();
 	}
 out:
 	return 0;
mlockall triggered rcu_preempt stall.
My fuzz tester keeps hitting this. Every instance shows the non-irq stack
came in from mlockall. I'm only seeing this on one box, but that has more
ram (8gb) than my other machines, which might explain it.

	Dave

INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0)
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #32
task: 88023e743fc0 ti: 88022f6f2000 task.ti: 88022f6f2000
RIP: 0010:[] [] trace_hardirqs_off_caller+0x21/0xb0
RSP: 0018:880244e03c30 EFLAGS: 0046
RAX: 88023e743fc0 RBX: 0001 RCX: 003c
RDX: 000f RSI: 0004 RDI: 81033cab
RBP: 880244e03c38 R08: 880243288a80 R09: 0001
R10: R11: 0001 R12: 880243288a80
R13: 8802437eda40 R14: 0008 R15: d010
FS: 7f50ae33b740() GS:880244e0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0097f000 CR3: 000240fa CR4: 001407e0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0600
Stack:
 810bf86d 880244e03c98 81033cab 0096
 d008 00030002 0004 0003
 2710 81c50d00 81c50d00 880244fcde00
Call Trace:
 <IRQ>
 [] ? trace_hardirqs_off+0xd/0x10
 [] __x2apic_send_IPI_mask+0x1ab/0x1c0
 [] x2apic_send_IPI_all+0x1c/0x20
 [] arch_trigger_all_cpu_backtrace+0x65/0xa0
 [] rcu_check_callbacks+0x331/0x8e0
 [] ? hrtimer_run_queues+0x20/0x180
 [] ? sched_clock_cpu+0xb5/0x100
 [] update_process_times+0x47/0x80
 [] tick_sched_handle.isra.16+0x25/0x60
 [] tick_sched_timer+0x41/0x60
 [] __run_hrtimer+0x81/0x4e0
 [] ? tick_sched_do_timer+0x60/0x60
 [] hrtimer_interrupt+0xff/0x240
 [] local_apic_timer_interrupt+0x34/0x60
 [] smp_apic_timer_interrupt+0x3f/0x60
 [] apic_timer_interrupt+0x6f/0x80
 [] ? retint_restore_args+0xe/0xe
 [] ? __do_softirq+0xb1/0x440
 [] irq_exit+0xcd/0xe0
 [] smp_apic_timer_interrupt+0x45/0x60
 [] apic_timer_interrupt+0x6f/0x80
 <EOI>
 [] ? retint_restore_args+0xe/0xe
 [] ? wait_for_completion_killable+0x170/0x170
 [] ? preempt_schedule_irq+0x53/0x90
 [] retint_kernel+0x26/0x30
 [] ? queue_work_on+0x43/0x90
 [] schedule_on_each_cpu+0xc9/0x1a0
 [] ? lru_add_drain+0x50/0x50
 [] lru_add_drain_all+0x15/0x20
 [] SyS_mlockall+0xa5/0x1a0
 [] tracesys+0xdd/0xe2
Code: 5d c3 0f 1f 84 00 00 00 00 00 44 8b 1d 29 73 bd 00 65 48 8b 04 25 00 ba 00 00 45 85 db 74 69 44 8b 90 a4 06 00 00 45 85 d2 75 5d <44> 8b 0d a0 47 00 01 45 85 c9 74 33 44 8b 80 70 06 00 00 45 85