Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-23 Thread Paul E. McKenney
On Mon, Apr 23, 2018 at 03:04:23PM +0200, Frederic Weisbecker wrote:
> On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote:
> > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> > > > Hello!
> > > > 
> > > > I am hitting the following on today's mainline under rcutorture, but
> > > > only on scenarios built with CONFIG_NO_HZ_FULL=y:
> > > > 
> > > > 
> > > > 
> > > > WARNING: CPU: 0 PID: 7 at 
> > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > > > sched_tick_remote+0x113/0x120
> > > > Modules linked in:
> > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > Workqueue: events_unbound sched_tick_remote
> > > > RIP: 0010:sched_tick_remote+0x113/0x120
> > > > RSP: 0018:94d540103e20 EFLAGS: 00010002
> > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> > > > RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000
> > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> > > > FS:  () GS:8f95dfc0() 
> > > > knlGS:
> > > > CS:  0010 DS:  ES:  CR0: 80050033
> > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> > > > Call Trace:
> > > >  process_one_work+0x1d9/0x6a0
> > > >  worker_thread+0x42/0x420
> > > >  kthread+0xf3/0x130
> > > >  ? rescuer_thread+0x340/0x340
> > > >  ? kthread_delayed_work_timer_fn+0x80/0x80
> > > >  ret_from_fork+0x3a/0x50
> > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff 
> > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b 
> > > > eb 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> > > > ---[ end trace fbdcbe529a8ae799 ]--
> > > > 
> > > > 
> > > > 
> > > > The WARN_ON_ONCE() triggering is this guy:
> > > > 
> > > > delta = rq_clock_task(rq) - curr->se.exec_start;
> > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> > > 
> > > Weird. Can you try to print up those values and see how much they drift?
> > > 
> > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
> > >printk_once("clock_task: %lld exec_start: %lld\n", 
> > > rq_clock_task(rq), curr->se.exec_start);
> > 
> > Here you go!
> > 
> > Thanx, Paul
> > 
> > 
> > 
> > WARNING: CPU: 0 PID: 7 at 
> > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > sched_tick_remote+0xdb/0x100
> > Modules linked in:
> > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > Workqueue: events_unbound sched_tick_remote
> > RIP: 0010:sched_tick_remote+0xdb/0x100
> > RSP: 0018:a2c440103e60 EFLAGS: 00010006
> > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322
> > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53
> > RBP: 96f0dfd24328 R08:  R09: 0001
> > R10:  R11:  R12: 96f0de9d2640
> > R13:  R14: 096f0de81700 R15: 96f0de96f540
> > FS:  () GS:96f0dfc0() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0
> > Call Trace:
> >  process_one_work+0x139/0x3e0
> >  worker_thread+0x42/0x420
> >  kthread+0xf3/0x130
> >  ? create_worker+0x190/0x190
> >  ? kthread_destroy_worker+0x40/0x40
> >  ret_from_fork+0x35/0x40
> > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 
> > 85 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 
> > 7d 57 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d
> > ---[ end trace f0c6a1afa55d130d ]---
> > clock_task: 6304138787 exec_start: 221487873
> 
> I'm trying to reproduce it on a large CPUs system. It seems to be easier
> to trigger it that way.

As long as it isn't just me.  ;-)

On the TASKS03 rcutorture scenario, it triggers pretty quickly on
a two-CPU virtual machine, if that helps.

Thanx, Paul



Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-23 Thread Paul E. McKenney
On Mon, Apr 23, 2018 at 03:04:23PM +0200, Frederic Weisbecker wrote:
> On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote:
> > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> > > > Hello!
> > > > 
> > > > I am hitting the following on today's mainline under rcutorture, but
> > > > only on scenarios built with CONFIG_NO_HZ_FULL=y:
> > > > 
> > > > 
> > > > 
> > > > WARNING: CPU: 0 PID: 7 at 
> > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > > > sched_tick_remote+0x113/0x120
> > > > Modules linked in:
> > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > Workqueue: events_unbound sched_tick_remote
> > > > RIP: 0010:sched_tick_remote+0x113/0x120
> > > > RSP: 0018:94d540103e20 EFLAGS: 00010002
> > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> > > > RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000
> > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> > > > FS:  () GS:8f95dfc0() 
> > > > knlGS:
> > > > CS:  0010 DS:  ES:  CR0: 80050033
> > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> > > > Call Trace:
> > > >  process_one_work+0x1d9/0x6a0
> > > >  worker_thread+0x42/0x420
> > > >  kthread+0xf3/0x130
> > > >  ? rescuer_thread+0x340/0x340
> > > >  ? kthread_delayed_work_timer_fn+0x80/0x80
> > > >  ret_from_fork+0x3a/0x50
> > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff 
> > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b 
> > > > eb 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> > > > ---[ end trace fbdcbe529a8ae799 ]--
> > > > 
> > > > 
> > > > 
> > > > The WARN_ON_ONCE() triggering is this guy:
> > > > 
> > > > delta = rq_clock_task(rq) - curr->se.exec_start;
> > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> > > 
> > > Weird. Can you try to print up those values and see how much they drift?
> > > 
> > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
> > >printk_once("clock_task: %lld exec_start: %lld\n", 
> > > rq_clock_task(rq), curr->se.exec_start);
> > 
> > Here you go!
> > 
> > Thanx, Paul
> > 
> > 
> > 
> > WARNING: CPU: 0 PID: 7 at 
> > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > sched_tick_remote+0xdb/0x100
> > Modules linked in:
> > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > Workqueue: events_unbound sched_tick_remote
> > RIP: 0010:sched_tick_remote+0xdb/0x100
> > RSP: 0018:a2c440103e60 EFLAGS: 00010006
> > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322
> > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53
> > RBP: 96f0dfd24328 R08:  R09: 0001
> > R10:  R11:  R12: 96f0de9d2640
> > R13:  R14: 096f0de81700 R15: 96f0de96f540
> > FS:  () GS:96f0dfc0() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0
> > Call Trace:
> >  process_one_work+0x139/0x3e0
> >  worker_thread+0x42/0x420
> >  kthread+0xf3/0x130
> >  ? create_worker+0x190/0x190
> >  ? kthread_destroy_worker+0x40/0x40
> >  ret_from_fork+0x35/0x40
> > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 
> > 85 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 
> > 7d 57 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d
> > ---[ end trace f0c6a1afa55d130d ]---
> > clock_task: 6304138787 exec_start: 221487873
> 
> I'm trying to reproduce it on a large CPUs system. It seems to be easier
> to trigger it that way.

As long as it isn't just me.  ;-)

On the TASKS03 rcutorture scenario, it triggers pretty quickly on
a two-CPU virtual machine, if that helps.

Thanx, Paul



Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-23 Thread Frederic Weisbecker
On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote:
> On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote:
> > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> > > Hello!
> > > 
> > > I am hitting the following on today's mainline under rcutorture, but
> > > only on scenarios built with CONFIG_NO_HZ_FULL=y:
> > > 
> > > 
> > > 
> > > WARNING: CPU: 0 PID: 7 at 
> > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > > sched_tick_remote+0x113/0x120
> > > Modules linked in:
> > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > Workqueue: events_unbound sched_tick_remote
> > > RIP: 0010:sched_tick_remote+0x113/0x120
> > > RSP: 0018:94d540103e20 EFLAGS: 00010002
> > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> > > RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> > > RBP: 94d540103e48 R08: f6499019 R09: f6499000
> > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> > > FS:  () GS:8f95dfc0() 
> > > knlGS:
> > > CS:  0010 DS:  ES:  CR0: 80050033
> > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> > > Call Trace:
> > >  process_one_work+0x1d9/0x6a0
> > >  worker_thread+0x42/0x420
> > >  kthread+0xf3/0x130
> > >  ? rescuer_thread+0x340/0x340
> > >  ? kthread_delayed_work_timer_fn+0x80/0x80
> > >  ret_from_fork+0x3a/0x50
> > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff 
> > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 
> > > 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> > > ---[ end trace fbdcbe529a8ae799 ]--
> > > 
> > > 
> > > 
> > > The WARN_ON_ONCE() triggering is this guy:
> > > 
> > >   delta = rq_clock_task(rq) - curr->se.exec_start;
> > >   WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> > 
> > Weird. Can you try to print up those values and see how much they drift?
> > 
> > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
> >printk_once("clock_task: %lld exec_start: %lld\n", 
> > rq_clock_task(rq), curr->se.exec_start);
> 
> Here you go!
> 
>   Thanx, Paul
> 
> 
> 
> WARNING: CPU: 0 PID: 7 at 
> /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> sched_tick_remote+0xdb/0x100
> Modules linked in:
> CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> Workqueue: events_unbound sched_tick_remote
> RIP: 0010:sched_tick_remote+0xdb/0x100
> RSP: 0018:a2c440103e60 EFLAGS: 00010006
> RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322
> RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53
> RBP: 96f0dfd24328 R08:  R09: 0001
> R10:  R11:  R12: 96f0de9d2640
> R13:  R14: 096f0de81700 R15: 96f0de96f540
> FS:  () GS:96f0dfc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0
> Call Trace:
>  process_one_work+0x139/0x3e0
>  worker_thread+0x42/0x420
>  kthread+0xf3/0x130
>  ? create_worker+0x190/0x190
>  ? kthread_destroy_worker+0x40/0x40
>  ret_from_fork+0x35/0x40
> Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 
> 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 
> 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d
> ---[ end trace f0c6a1afa55d130d ]---
> clock_task: 6304138787 exec_start: 221487873
> 

I'm trying to reproduce it on a large CPUs system. It seems to be easier
to trigger it that way.

Thanks.


Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-23 Thread Frederic Weisbecker
On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote:
> On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote:
> > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> > > Hello!
> > > 
> > > I am hitting the following on today's mainline under rcutorture, but
> > > only on scenarios built with CONFIG_NO_HZ_FULL=y:
> > > 
> > > 
> > > 
> > > WARNING: CPU: 0 PID: 7 at 
> > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > > sched_tick_remote+0x113/0x120
> > > Modules linked in:
> > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > Workqueue: events_unbound sched_tick_remote
> > > RIP: 0010:sched_tick_remote+0x113/0x120
> > > RSP: 0018:94d540103e20 EFLAGS: 00010002
> > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> > > RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> > > RBP: 94d540103e48 R08: f6499019 R09: f6499000
> > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> > > FS:  () GS:8f95dfc0() 
> > > knlGS:
> > > CS:  0010 DS:  ES:  CR0: 80050033
> > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> > > Call Trace:
> > >  process_one_work+0x1d9/0x6a0
> > >  worker_thread+0x42/0x420
> > >  kthread+0xf3/0x130
> > >  ? rescuer_thread+0x340/0x340
> > >  ? kthread_delayed_work_timer_fn+0x80/0x80
> > >  ret_from_fork+0x3a/0x50
> > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff 
> > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 
> > > 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> > > ---[ end trace fbdcbe529a8ae799 ]--
> > > 
> > > 
> > > 
> > > The WARN_ON_ONCE() triggering is this guy:
> > > 
> > >   delta = rq_clock_task(rq) - curr->se.exec_start;
> > >   WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> > 
> > Weird. Can you try to print up those values and see how much they drift?
> > 
> > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
> >printk_once("clock_task: %lld exec_start: %lld\n", 
> > rq_clock_task(rq), curr->se.exec_start);
> 
> Here you go!
> 
>   Thanx, Paul
> 
> 
> 
> WARNING: CPU: 0 PID: 7 at 
> /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> sched_tick_remote+0xdb/0x100
> Modules linked in:
> CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> Workqueue: events_unbound sched_tick_remote
> RIP: 0010:sched_tick_remote+0xdb/0x100
> RSP: 0018:a2c440103e60 EFLAGS: 00010006
> RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322
> RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53
> RBP: 96f0dfd24328 R08:  R09: 0001
> R10:  R11:  R12: 96f0de9d2640
> R13:  R14: 096f0de81700 R15: 96f0de96f540
> FS:  () GS:96f0dfc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0
> Call Trace:
>  process_one_work+0x139/0x3e0
>  worker_thread+0x42/0x420
>  kthread+0xf3/0x130
>  ? create_worker+0x190/0x190
>  ? kthread_destroy_worker+0x40/0x40
>  ret_from_fork+0x35/0x40
> Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 
> 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 
> 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d
> ---[ end trace f0c6a1afa55d130d ]---
> clock_task: 6304138787 exec_start: 221487873
> 

I'm trying to reproduce it on a large CPUs system. It seems to be easier
to trigger it that way.

Thanks.


Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-13 Thread Jacek Tomaka
> On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote:
> > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > I am hitting the following on today's mainline under rcutorture, but
> > > > only on scenarios built with CONFIG_NO_HZ_FULL=y:
> > > >
> > > > 
> > > >
> > > > WARNING: CPU: 0 PID: 7 at 
> > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > > > sched_tick_remote+0x113/0x120
> > > > Modules linked in:
> > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > Workqueue: events_unbound sched_tick_remote
> > > > RIP: 0010:sched_tick_remote+0x113/0x120
> > > > RSP: 0018:94d540103e20 EFLAGS: 00010002
> > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> > > > RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000
> > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> > > > FS:  () GS:8f95dfc0() 
> > > > knlGS:
> > > > CS:  0010 DS:  ES:  CR0: 80050033
> > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> > > > Call Trace:
> > > >  process_one_work+0x1d9/0x6a0
> > > >  worker_thread+0x42/0x420
> > > >  kthread+0xf3/0x130
> > > >  ? rescuer_thread+0x340/0x340
> > > >  ? kthread_delayed_work_timer_fn+0x80/0x80
> > > >  ret_from_fork+0x3a/0x50
> > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff 
> > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b 
> > > > eb 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> > > > ---[ end trace fbdcbe529a8ae799 ]--
> > > >
> > > > 
> > > >
> > > > The WARN_ON_ONCE() triggering is this guy:
> > > >
> > > > delta = rq_clock_task(rq) - curr->se.exec_start;
> > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> > >
> > > Weird. Can you try to print up those values and see how much they drift?
> > >
> > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
> > >printk_once("clock_task: %lld exec_start: %lld\n", 
> > > rq_clock_task(rq), curr-> >se.exec_start);
> >
> > Here you go!
> >
> > Thanx, Paul
> >
> > 
> >
> > WARNING: CPU: 0 PID: 7 at 
> > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > sched_tick_remote+0xdb/0x100
> > Modules linked in:
> > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > Workqueue: events_unbound sched_tick_remote
> > RIP: 0010:sched_tick_remote+0xdb/0x100
> > RSP: 0018:a2c440103e60 EFLAGS: 00010006
> > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322
> > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53
> > RBP: 96f0dfd24328 R08:  R09: 0001
> > R10:  R11:  R12: 96f0de9d2640
> > R13:  R14: 096f0de81700 R15: 96f0de96f540
> > FS:  () GS:96f0dfc0() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0
> > Call Trace:
> >  process_one_work+0x139/0x3e0
> >  worker_thread+0x42/0x420
> >  kthread+0xf3/0x130
> >  ? create_worker+0x190/0x190
> >  ? kthread_destroy_worker+0x40/0x40
> >  ret_from_fork+0x35/0x40
> > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 
> > 85 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 
> > 7d 57 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d
> > ---[ end trace f0c6a1afa55d130d ]---
> > clock_task: 6304138787 exec_start: 221487873
> >
>
> That's a 6 second delay, it's huge!
>
> Could it be because you use Qemu and the virtualized CPUs got interrupted for
a long while?

Hello,
I am seeing the following message as well and I am running it on real
hardware (Intel Xeon Phi 7250):
My kernel parameters include:
nohz_full=1-271 noretpoline isolcpus=nohz

RCU threads are pinned to cpu zero
for i in `pgrep rcu[^c]` ; do
taskset -pc 0 $i
done

One thing i noticed though that my date is not set properly:
-bash-4.2# date
Thu Jan  1 10:56:21 UTC 1970

Probably because of :
[0.00] NO_HZ: Full dynticks CPUs: 1-271.
[0.00]  Offload RCU callbacks from CPUs: 1-271.
[0.00] WARNING: Persistent clock returned invalid value!
[0.00]  

Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-13 Thread Jacek Tomaka
> On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote:
> > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > I am hitting the following on today's mainline under rcutorture, but
> > > > only on scenarios built with CONFIG_NO_HZ_FULL=y:
> > > >
> > > > 
> > > >
> > > > WARNING: CPU: 0 PID: 7 at 
> > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > > > sched_tick_remote+0x113/0x120
> > > > Modules linked in:
> > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > Workqueue: events_unbound sched_tick_remote
> > > > RIP: 0010:sched_tick_remote+0x113/0x120
> > > > RSP: 0018:94d540103e20 EFLAGS: 00010002
> > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> > > > RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000
> > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> > > > FS:  () GS:8f95dfc0() 
> > > > knlGS:
> > > > CS:  0010 DS:  ES:  CR0: 80050033
> > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> > > > Call Trace:
> > > >  process_one_work+0x1d9/0x6a0
> > > >  worker_thread+0x42/0x420
> > > >  kthread+0xf3/0x130
> > > >  ? rescuer_thread+0x340/0x340
> > > >  ? kthread_delayed_work_timer_fn+0x80/0x80
> > > >  ret_from_fork+0x3a/0x50
> > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff 
> > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b 
> > > > eb 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> > > > ---[ end trace fbdcbe529a8ae799 ]--
> > > >
> > > > 
> > > >
> > > > The WARN_ON_ONCE() triggering is this guy:
> > > >
> > > > delta = rq_clock_task(rq) - curr->se.exec_start;
> > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> > >
> > > Weird. Can you try to print up those values and see how much they drift?
> > >
> > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
> > >printk_once("clock_task: %lld exec_start: %lld\n", 
> > > rq_clock_task(rq), curr-> >se.exec_start);
> >
> > Here you go!
> >
> > Thanx, Paul
> >
> > 
> >
> > WARNING: CPU: 0 PID: 7 at 
> > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > sched_tick_remote+0xdb/0x100
> > Modules linked in:
> > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > Workqueue: events_unbound sched_tick_remote
> > RIP: 0010:sched_tick_remote+0xdb/0x100
> > RSP: 0018:a2c440103e60 EFLAGS: 00010006
> > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322
> > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53
> > RBP: 96f0dfd24328 R08:  R09: 0001
> > R10:  R11:  R12: 96f0de9d2640
> > R13:  R14: 096f0de81700 R15: 96f0de96f540
> > FS:  () GS:96f0dfc0() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0
> > Call Trace:
> >  process_one_work+0x139/0x3e0
> >  worker_thread+0x42/0x420
> >  kthread+0xf3/0x130
> >  ? create_worker+0x190/0x190
> >  ? kthread_destroy_worker+0x40/0x40
> >  ret_from_fork+0x35/0x40
> > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 
> > 85 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 
> > 7d 57 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d
> > ---[ end trace f0c6a1afa55d130d ]---
> > clock_task: 6304138787 exec_start: 221487873
> >
>
> That's a 6 second delay, it's huge!
>
> Could it be because you use Qemu and the virtualized CPUs got interrupted for
a long while?

Hello,
I am seeing the following message as well and I am running it on real
hardware (Intel Xeon Phi 7250):
My kernel parameters include:
nohz_full=1-271 noretpoline isolcpus=nohz

RCU threads are pinned to cpu zero
for i in `pgrep rcu[^c]` ; do
taskset -pc 0 $i
done

One thing i noticed though that my date is not set properly:
-bash-4.2# date
Thu Jan  1 10:56:21 UTC 1970

Probably because of :
[0.00] NO_HZ: Full dynticks CPUs: 1-271.
[0.00]  Offload RCU callbacks from CPUs: 1-271.
[0.00] WARNING: Persistent clock returned invalid value!
[0.00]  

Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-07 Thread Frederic Weisbecker
On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote:
> On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote:
> > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> > > Hello!
> > > 
> > > I am hitting the following on today's mainline under rcutorture, but
> > > only on scenarios built with CONFIG_NO_HZ_FULL=y:
> > > 
> > > 
> > > 
> > > WARNING: CPU: 0 PID: 7 at 
> > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > > sched_tick_remote+0x113/0x120
> > > Modules linked in:
> > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > Workqueue: events_unbound sched_tick_remote
> > > RIP: 0010:sched_tick_remote+0x113/0x120
> > > RSP: 0018:94d540103e20 EFLAGS: 00010002
> > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> > > RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> > > RBP: 94d540103e48 R08: f6499019 R09: f6499000
> > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> > > FS:  () GS:8f95dfc0() 
> > > knlGS:
> > > CS:  0010 DS:  ES:  CR0: 80050033
> > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> > > Call Trace:
> > >  process_one_work+0x1d9/0x6a0
> > >  worker_thread+0x42/0x420
> > >  kthread+0xf3/0x130
> > >  ? rescuer_thread+0x340/0x340
> > >  ? kthread_delayed_work_timer_fn+0x80/0x80
> > >  ret_from_fork+0x3a/0x50
> > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff 
> > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 
> > > 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> > > ---[ end trace fbdcbe529a8ae799 ]--
> > > 
> > > 
> > > 
> > > The WARN_ON_ONCE() triggering is this guy:
> > > 
> > >   delta = rq_clock_task(rq) - curr->se.exec_start;
> > >   WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> > 
> > Weird. Can you try to print up those values and see how much they drift?
> > 
> > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
> >printk_once("clock_task: %lld exec_start: %lld\n", 
> > rq_clock_task(rq), curr->se.exec_start);
> 
> Here you go!
> 
>   Thanx, Paul
> 
> 
> 
> WARNING: CPU: 0 PID: 7 at 
> /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> sched_tick_remote+0xdb/0x100
> Modules linked in:
> CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> Workqueue: events_unbound sched_tick_remote
> RIP: 0010:sched_tick_remote+0xdb/0x100
> RSP: 0018:a2c440103e60 EFLAGS: 00010006
> RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322
> RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53
> RBP: 96f0dfd24328 R08:  R09: 0001
> R10:  R11:  R12: 96f0de9d2640
> R13:  R14: 096f0de81700 R15: 96f0de96f540
> FS:  () GS:96f0dfc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0
> Call Trace:
>  process_one_work+0x139/0x3e0
>  worker_thread+0x42/0x420
>  kthread+0xf3/0x130
>  ? create_worker+0x190/0x190
>  ? kthread_destroy_worker+0x40/0x40
>  ret_from_fork+0x35/0x40
> Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 
> 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 
> 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d
> ---[ end trace f0c6a1afa55d130d ]---
> clock_task: 6304138787 exec_start: 221487873
> 

That's a 6 second delay, it's huge!

Could it be because you use Qemu and the virtualized CPUs got interrupted for
a long while?


Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-07 Thread Frederic Weisbecker
On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote:
> On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote:
> > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> > > Hello!
> > > 
> > > I am hitting the following on today's mainline under rcutorture, but
> > > only on scenarios built with CONFIG_NO_HZ_FULL=y:
> > > 
> > > 
> > > 
> > > WARNING: CPU: 0 PID: 7 at 
> > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > > sched_tick_remote+0x113/0x120
> > > Modules linked in:
> > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > Workqueue: events_unbound sched_tick_remote
> > > RIP: 0010:sched_tick_remote+0x113/0x120
> > > RSP: 0018:94d540103e20 EFLAGS: 00010002
> > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> > > RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> > > RBP: 94d540103e48 R08: f6499019 R09: f6499000
> > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> > > FS:  () GS:8f95dfc0() 
> > > knlGS:
> > > CS:  0010 DS:  ES:  CR0: 80050033
> > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> > > Call Trace:
> > >  process_one_work+0x1d9/0x6a0
> > >  worker_thread+0x42/0x420
> > >  kthread+0xf3/0x130
> > >  ? rescuer_thread+0x340/0x340
> > >  ? kthread_delayed_work_timer_fn+0x80/0x80
> > >  ret_from_fork+0x3a/0x50
> > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff 
> > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 
> > > 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> > > ---[ end trace fbdcbe529a8ae799 ]--
> > > 
> > > 
> > > 
> > > The WARN_ON_ONCE() triggering is this guy:
> > > 
> > >   delta = rq_clock_task(rq) - curr->se.exec_start;
> > >   WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> > 
> > Weird. Can you try to print up those values and see how much they drift?
> > 
> > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
> >printk_once("clock_task: %lld exec_start: %lld\n", 
> > rq_clock_task(rq), curr->se.exec_start);
> 
> Here you go!
> 
>   Thanx, Paul
> 
> 
> 
> WARNING: CPU: 0 PID: 7 at 
> /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> sched_tick_remote+0xdb/0x100
> Modules linked in:
> CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> Workqueue: events_unbound sched_tick_remote
> RIP: 0010:sched_tick_remote+0xdb/0x100
> RSP: 0018:a2c440103e60 EFLAGS: 00010006
> RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322
> RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53
> RBP: 96f0dfd24328 R08:  R09: 0001
> R10:  R11:  R12: 96f0de9d2640
> R13:  R14: 096f0de81700 R15: 96f0de96f540
> FS:  () GS:96f0dfc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0
> Call Trace:
>  process_one_work+0x139/0x3e0
>  worker_thread+0x42/0x420
>  kthread+0xf3/0x130
>  ? create_worker+0x190/0x190
>  ? kthread_destroy_worker+0x40/0x40
>  ret_from_fork+0x35/0x40
> Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 
> 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 
> 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d
> ---[ end trace f0c6a1afa55d130d ]---
> clock_task: 6304138787 exec_start: 221487873
> 

That's a 6 second delay, it's huge!

Could it be because you use Qemu and the virtualized CPUs got interrupted for
a long while?


Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-03 Thread Paul E. McKenney
On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote:
> On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > I am hitting the following on today's mainline under rcutorture, but
> > only on scenarios built with CONFIG_NO_HZ_FULL=y:
> > 
> > 
> > 
> > WARNING: CPU: 0 PID: 7 at 
> > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > sched_tick_remote+0x113/0x120
> > Modules linked in:
> > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > Workqueue: events_unbound sched_tick_remote
> > RIP: 0010:sched_tick_remote+0x113/0x120
> > RSP: 0018:94d540103e20 EFLAGS: 00010002
> > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> > RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> > RBP: 94d540103e48 R08: f6499019 R09: f6499000
> > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> > FS:  () GS:8f95dfc0() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> > Call Trace:
> >  process_one_work+0x1d9/0x6a0
> >  worker_thread+0x42/0x420
> >  kthread+0xf3/0x130
> >  ? rescuer_thread+0x340/0x340
> >  ? kthread_delayed_work_timer_fn+0x80/0x80
> >  ret_from_fork+0x3a/0x50
> > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be 
> > ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 
> > 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> > ---[ end trace fbdcbe529a8ae799 ]--
> > 
> > 
> > 
> > The WARN_ON_ONCE() triggering is this guy:
> > 
> > delta = rq_clock_task(rq) - curr->se.exec_start;
> > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> 
> Weird. Can you try to print up those values and see how much they drift?
> 
> if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
>printk_once("clock_task: %lld exec_start: %lld\n", 
> rq_clock_task(rq), curr->se.exec_start);

Here you go!

Thanx, Paul



WARNING: CPU: 0 PID: 7 at 
/home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
sched_tick_remote+0xdb/0x100
Modules linked in:
CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
Workqueue: events_unbound sched_tick_remote
RIP: 0010:sched_tick_remote+0xdb/0x100
RSP: 0018:a2c440103e60 EFLAGS: 00010006
RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322
RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53
RBP: 96f0dfd24328 R08:  R09: 0001
R10:  R11:  R12: 96f0de9d2640
R13:  R14: 096f0de81700 R15: 96f0de96f540
FS:  () GS:96f0dfc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0
Call Trace:
 process_one_work+0x139/0x3e0
 worker_thread+0x42/0x420
 kthread+0xf3/0x130
 ? create_worker+0x190/0x190
 ? kthread_destroy_worker+0x40/0x40
 ret_from_fork+0x35/0x40
Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 
65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 2b 
01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d
---[ end trace f0c6a1afa55d130d ]---
clock_task: 6304138787 exec_start: 221487873



Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-03 Thread Paul E. McKenney
On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote:
> On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > I am hitting the following on today's mainline under rcutorture, but
> > only on scenarios built with CONFIG_NO_HZ_FULL=y:
> > 
> > 
> > 
> > WARNING: CPU: 0 PID: 7 at 
> > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> > sched_tick_remote+0x113/0x120
> > Modules linked in:
> > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > Workqueue: events_unbound sched_tick_remote
> > RIP: 0010:sched_tick_remote+0x113/0x120
> > RSP: 0018:94d540103e20 EFLAGS: 00010002
> > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> > RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> > RBP: 94d540103e48 R08: f6499019 R09: f6499000
> > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> > FS:  () GS:8f95dfc0() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> > Call Trace:
> >  process_one_work+0x1d9/0x6a0
> >  worker_thread+0x42/0x420
> >  kthread+0xf3/0x130
> >  ? rescuer_thread+0x340/0x340
> >  ? kthread_delayed_work_timer_fn+0x80/0x80
> >  ret_from_fork+0x3a/0x50
> > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be 
> > ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 
> > 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> > ---[ end trace fbdcbe529a8ae799 ]--
> > 
> > 
> > 
> > The WARN_ON_ONCE() triggering is this guy:
> > 
> > delta = rq_clock_task(rq) - curr->se.exec_start;
> > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> 
> Weird. Can you try to print up those values and see how much they drift?
> 
> if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
>printk_once("clock_task: %lld exec_start: %lld\n", 
> rq_clock_task(rq), curr->se.exec_start);

Here you go!

Thanx, Paul



WARNING: CPU: 0 PID: 7 at 
/home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
sched_tick_remote+0xdb/0x100
Modules linked in:
CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
Workqueue: events_unbound sched_tick_remote
RIP: 0010:sched_tick_remote+0xdb/0x100
RSP: 0018:a2c440103e60 EFLAGS: 00010006
RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322
RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53
RBP: 96f0dfd24328 R08:  R09: 0001
R10:  R11:  R12: 96f0de9d2640
R13:  R14: 096f0de81700 R15: 96f0de96f540
FS:  () GS:96f0dfc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0
Call Trace:
 process_one_work+0x139/0x3e0
 worker_thread+0x42/0x420
 kthread+0xf3/0x130
 ? create_worker+0x190/0x190
 ? kthread_destroy_worker+0x40/0x40
 ret_from_fork+0x35/0x40
Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 
65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 2b 
01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d
---[ end trace f0c6a1afa55d130d ]---
clock_task: 6304138787 exec_start: 221487873



Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-03 Thread Frederic Weisbecker
On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> Hello!
> 
> I am hitting the following on today's mainline under rcutorture, but
> only on scenarios built with CONFIG_NO_HZ_FULL=y:
> 
> 
> 
> WARNING: CPU: 0 PID: 7 at 
> /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> sched_tick_remote+0x113/0x120
> Modules linked in:
> CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> Workqueue: events_unbound sched_tick_remote
> RIP: 0010:sched_tick_remote+0x113/0x120
> RSP: 0018:94d540103e20 EFLAGS: 00010002
> RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> RBP: 94d540103e48 R08: f6499019 R09: f6499000
> R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> FS:  () GS:8f95dfc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> Call Trace:
>  process_one_work+0x1d9/0x6a0
>  worker_thread+0x42/0x420
>  kthread+0xf3/0x130
>  ? rescuer_thread+0x340/0x340
>  ? kthread_delayed_work_timer_fn+0x80/0x80
>  ret_from_fork+0x3a/0x50
> Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be ff 
> ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 66 0f 
> 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> ---[ end trace fbdcbe529a8ae799 ]--
> 
> 
> 
> The WARN_ON_ONCE() triggering is this guy:
> 
>   delta = rq_clock_task(rq) - curr->se.exec_start;
>   WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);

Weird. Can you try to print up those values and see how much they drift?

if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
   printk_once("clock_task: %lld exec_start: %lld\n", 
rq_clock_task(rq), curr->se.exec_start);

> 
> But given that ->se.exec_start is zeroed from time to time, for example,
> in migrate_task_rq_fair(), I am a bit suspicious of this check.
> 
> What am I missing here?
> 
>   Thanx, Paul
> 


Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-03 Thread Frederic Weisbecker
On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:
> Hello!
> 
> I am hitting the following on today's mainline under rcutorture, but
> only on scenarios built with CONFIG_NO_HZ_FULL=y:
> 
> 
> 
> WARNING: CPU: 0 PID: 7 at 
> /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
> sched_tick_remote+0x113/0x120
> Modules linked in:
> CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> Workqueue: events_unbound sched_tick_remote
> RIP: 0010:sched_tick_remote+0x113/0x120
> RSP: 0018:94d540103e20 EFLAGS: 00010002
> RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
> RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
> RBP: 94d540103e48 R08: f6499019 R09: f6499000
> R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
> R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
> FS:  () GS:8f95dfc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
> Call Trace:
>  process_one_work+0x1d9/0x6a0
>  worker_thread+0x42/0x420
>  kthread+0xf3/0x130
>  ? rescuer_thread+0x340/0x340
>  ? kthread_delayed_work_timer_fn+0x80/0x80
>  ret_from_fork+0x3a/0x50
> Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be ff 
> ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 66 0f 
> 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54
> ---[ end trace fbdcbe529a8ae799 ]--
> 
> 
> 
> The WARN_ON_ONCE() triggering is this guy:
> 
>   delta = rq_clock_task(rq) - curr->se.exec_start;
>   WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);

Weird. Can you try to print up those values and see how much they drift?

if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3))
   printk_once("clock_task: %lld exec_start: %lld\n", 
rq_clock_task(rq), curr->se.exec_start);

> 
> But given that ->se.exec_start is zeroed from time to time, for example,
> in migrate_task_rq_fair(), I am a bit suspicious of this check.
> 
> What am I missing here?
> 
>   Thanx, Paul
> 


Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-03 Thread Peter Zijlstra
On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:

> The WARN_ON_ONCE() triggering is this guy:
> 
>   delta = rq_clock_task(rq) - curr->se.exec_start;
>   WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> 
> But given that ->se.exec_start is zeroed from time to time, for example,
> in migrate_task_rq_fair(), I am a bit suspicious of this check.
> 
> What am I missing here?

We clear it on migration, but set it when we schedule a task back in.
The above checks that the 'current' task of that CPU had a tick at least
3 seconds ago (to ensure tasks don't go too long without ticks).

The 'current' task is obviously scheduled in and thus must have !0
exec_start time.


Re: NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-03 Thread Peter Zijlstra
On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote:

> The WARN_ON_ONCE() triggering is this guy:
> 
>   delta = rq_clock_task(rq) - curr->se.exec_start;
>   WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> 
> But given that ->se.exec_start is zeroed from time to time, for example,
> in migrate_task_rq_fair(), I am a bit suspicious of this check.
> 
> What am I missing here?

We clear it on migration, but set it when we schedule a task back in.
The above checks that the 'current' task of that CPU had a tick at least
3 seconds ago (to ensure tasks don't go too long without ticks).

The 'current' task is obviously scheduled in and thus must have !0
exec_start time.


NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-02 Thread Paul E. McKenney
Hello!

I am hitting the following on today's mainline under rcutorture, but
only on scenarios built with CONFIG_NO_HZ_FULL=y:



WARNING: CPU: 0 PID: 7 at 
/home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
sched_tick_remote+0x113/0x120
Modules linked in:
CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
Workqueue: events_unbound sched_tick_remote
RIP: 0010:sched_tick_remote+0x113/0x120
RSP: 0018:94d540103e20 EFLAGS: 00010002
RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
RBP: 94d540103e48 R08: f6499019 R09: f6499000
R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
FS:  () GS:8f95dfc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
Call Trace:
 process_one_work+0x1d9/0x6a0
 worker_thread+0x42/0x420
 kthread+0xf3/0x130
 ? rescuer_thread+0x340/0x340
 ? kthread_delayed_work_timer_fn+0x80/0x80
 ret_from_fork+0x3a/0x50
Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be ff 
ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 66 0f 1f 
84 00 00 00 00 00 41 57 41 56 41 55 41 54
---[ end trace fbdcbe529a8ae799 ]--



The WARN_ON_ONCE() triggering is this guy:

delta = rq_clock_task(rq) - curr->se.exec_start;
WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);

But given that ->se.exec_start is zeroed from time to time, for example,
in migrate_task_rq_fair(), I am a bit suspicious of this check.

What am I missing here?

Thanx, Paul



NO_HZ_FULL and tick running within a reasonable amount of time

2018-04-02 Thread Paul E. McKenney
Hello!

I am hitting the following on today's mainline under rcutorture, but
only on scenarios built with CONFIG_NO_HZ_FULL=y:



WARNING: CPU: 0 PID: 7 at 
/home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 
sched_tick_remote+0x113/0x120
Modules linked in:
CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
Workqueue: events_unbound sched_tick_remote
RIP: 0010:sched_tick_remote+0x113/0x120
RSP: 0018:94d540103e20 EFLAGS: 00010002
RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f
RDX: b2d05e00 RSI:  RDI: 8f95dfd21858
RBP: 94d540103e48 R08: f6499019 R09: f6499000
R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518
R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858
FS:  () GS:8f95dfc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0a015b40 CR3: 1de14000 CR4: 06f0
Call Trace:
 process_one_work+0x1d9/0x6a0
 worker_thread+0x42/0x420
 kthread+0xf3/0x130
 ? rescuer_thread+0x340/0x340
 ? kthread_delayed_work_timer_fn+0x80/0x80
 ret_from_fork+0x3a/0x50
Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be ff 
ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 66 0f 1f 
84 00 00 00 00 00 41 57 41 56 41 55 41 54
---[ end trace fbdcbe529a8ae799 ]--



The WARN_ON_ONCE() triggering is this guy:

delta = rq_clock_task(rq) - curr->se.exec_start;
WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);

But given that ->se.exec_start is zeroed from time to time, for example,
in migrate_task_rq_fair(), I am a bit suspicious of this check.

What am I missing here?

Thanx, Paul