Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Mon, Apr 23, 2018 at 03:04:23PM +0200, Frederic Weisbecker wrote: > On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote: > > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote: > > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > > > > Hello! > > > > > > > > I am hitting the following on today's mainline under rcutorture, but > > > > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > > > > > > > > > > > > > WARNING: CPU: 0 PID: 7 at > > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > > > sched_tick_remote+0x113/0x120 > > > > Modules linked in: > > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > > > Workqueue: events_unbound sched_tick_remote > > > > RIP: 0010:sched_tick_remote+0x113/0x120 > > > > RSP: 0018:94d540103e20 EFLAGS: 00010002 > > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > > > > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > > > > FS: () GS:8f95dfc0() > > > > knlGS: > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > > > > Call Trace: > > > > process_one_work+0x1d9/0x6a0 > > > > worker_thread+0x42/0x420 > > > > kthread+0xf3/0x130 > > > > ? rescuer_thread+0x340/0x340 > > > > ? kthread_delayed_work_timer_fn+0x80/0x80 > > > > ret_from_fork+0x3a/0x50 > > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff > > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b > > > > eb 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > > > > ---[ end trace fbdcbe529a8ae799 ]-- > > > > > > > > > > > > > > > > The WARN_ON_ONCE() triggering is this guy: > > > > > > > > delta = rq_clock_task(rq) - curr->se.exec_start; > > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > > > > > Weird. Can you try to print up those values and see how much they drift? > > > > > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) > > >printk_once("clock_task: %lld exec_start: %lld\n", > > > rq_clock_task(rq), curr->se.exec_start); > > > > Here you go! > > > > Thanx, Paul > > > > > > > > WARNING: CPU: 0 PID: 7 at > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > sched_tick_remote+0xdb/0x100 > > Modules linked in: > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > Workqueue: events_unbound sched_tick_remote > > RIP: 0010:sched_tick_remote+0xdb/0x100 > > RSP: 0018:a2c440103e60 EFLAGS: 00010006 > > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322 > > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53 > > RBP: 96f0dfd24328 R08: R09: 0001 > > R10: R11: R12: 96f0de9d2640 > > R13: R14: 096f0de81700 R15: 96f0de96f540 > > FS: () GS:96f0dfc0() knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0 > > Call Trace: > > process_one_work+0x139/0x3e0 > > worker_thread+0x42/0x420 > > kthread+0xf3/0x130 > > ? create_worker+0x190/0x190 > > ? kthread_destroy_worker+0x40/0x40 > > ret_from_fork+0x35/0x40 > > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f > > 85 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d > > 7d 57 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d > > ---[ end trace f0c6a1afa55d130d ]--- > > clock_task: 6304138787 exec_start: 221487873 > > I'm trying to reproduce it on a large CPUs system. It seems to be easier > to trigger it that way. As long as it isn't just me. ;-) On the TASKS03 rcutorture scenario, it triggers pretty quickly on a two-CPU virtual machine, if that helps. Thanx, Paul
Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Mon, Apr 23, 2018 at 03:04:23PM +0200, Frederic Weisbecker wrote: > On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote: > > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote: > > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > > > > Hello! > > > > > > > > I am hitting the following on today's mainline under rcutorture, but > > > > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > > > > > > > > > > > > > WARNING: CPU: 0 PID: 7 at > > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > > > sched_tick_remote+0x113/0x120 > > > > Modules linked in: > > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > > > Workqueue: events_unbound sched_tick_remote > > > > RIP: 0010:sched_tick_remote+0x113/0x120 > > > > RSP: 0018:94d540103e20 EFLAGS: 00010002 > > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > > > > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > > > > FS: () GS:8f95dfc0() > > > > knlGS: > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > > > > Call Trace: > > > > process_one_work+0x1d9/0x6a0 > > > > worker_thread+0x42/0x420 > > > > kthread+0xf3/0x130 > > > > ? rescuer_thread+0x340/0x340 > > > > ? kthread_delayed_work_timer_fn+0x80/0x80 > > > > ret_from_fork+0x3a/0x50 > > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff > > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b > > > > eb 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > > > > ---[ end trace fbdcbe529a8ae799 ]-- > > > > > > > > > > > > > > > > The WARN_ON_ONCE() triggering is this guy: > > > > > > > > delta = rq_clock_task(rq) - curr->se.exec_start; > > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > > > > > Weird. Can you try to print up those values and see how much they drift? > > > > > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) > > >printk_once("clock_task: %lld exec_start: %lld\n", > > > rq_clock_task(rq), curr->se.exec_start); > > > > Here you go! > > > > Thanx, Paul > > > > > > > > WARNING: CPU: 0 PID: 7 at > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > sched_tick_remote+0xdb/0x100 > > Modules linked in: > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > Workqueue: events_unbound sched_tick_remote > > RIP: 0010:sched_tick_remote+0xdb/0x100 > > RSP: 0018:a2c440103e60 EFLAGS: 00010006 > > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322 > > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53 > > RBP: 96f0dfd24328 R08: R09: 0001 > > R10: R11: R12: 96f0de9d2640 > > R13: R14: 096f0de81700 R15: 96f0de96f540 > > FS: () GS:96f0dfc0() knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0 > > Call Trace: > > process_one_work+0x139/0x3e0 > > worker_thread+0x42/0x420 > > kthread+0xf3/0x130 > > ? create_worker+0x190/0x190 > > ? kthread_destroy_worker+0x40/0x40 > > ret_from_fork+0x35/0x40 > > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f > > 85 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d > > 7d 57 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d > > ---[ end trace f0c6a1afa55d130d ]--- > > clock_task: 6304138787 exec_start: 221487873 > > I'm trying to reproduce it on a large CPUs system. It seems to be easier > to trigger it that way. As long as it isn't just me. ;-) On the TASKS03 rcutorture scenario, it triggers pretty quickly on a two-CPU virtual machine, if that helps. Thanx, Paul
Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote: > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote: > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > > > Hello! > > > > > > I am hitting the following on today's mainline under rcutorture, but > > > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > > > > > > > > > WARNING: CPU: 0 PID: 7 at > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > > sched_tick_remote+0x113/0x120 > > > Modules linked in: > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > > Workqueue: events_unbound sched_tick_remote > > > RIP: 0010:sched_tick_remote+0x113/0x120 > > > RSP: 0018:94d540103e20 EFLAGS: 00010002 > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > > > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > > > FS: () GS:8f95dfc0() > > > knlGS: > > > CS: 0010 DS: ES: CR0: 80050033 > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > > > Call Trace: > > > process_one_work+0x1d9/0x6a0 > > > worker_thread+0x42/0x420 > > > kthread+0xf3/0x130 > > > ? rescuer_thread+0x340/0x340 > > > ? kthread_delayed_work_timer_fn+0x80/0x80 > > > ret_from_fork+0x3a/0x50 > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb > > > 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > > > ---[ end trace fbdcbe529a8ae799 ]-- > > > > > > > > > > > > The WARN_ON_ONCE() triggering is this guy: > > > > > > delta = rq_clock_task(rq) - curr->se.exec_start; > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > > > Weird. Can you try to print up those values and see how much they drift? > > > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) > >printk_once("clock_task: %lld exec_start: %lld\n", > > rq_clock_task(rq), curr->se.exec_start); > > Here you go! > > Thanx, Paul > > > > WARNING: CPU: 0 PID: 7 at > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > sched_tick_remote+0xdb/0x100 > Modules linked in: > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > Workqueue: events_unbound sched_tick_remote > RIP: 0010:sched_tick_remote+0xdb/0x100 > RSP: 0018:a2c440103e60 EFLAGS: 00010006 > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322 > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53 > RBP: 96f0dfd24328 R08: R09: 0001 > R10: R11: R12: 96f0de9d2640 > R13: R14: 096f0de81700 R15: 96f0de96f540 > FS: () GS:96f0dfc0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0 > Call Trace: > process_one_work+0x139/0x3e0 > worker_thread+0x42/0x420 > kthread+0xf3/0x130 > ? create_worker+0x190/0x190 > ? kthread_destroy_worker+0x40/0x40 > ret_from_fork+0x35/0x40 > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 > 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 > 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d > ---[ end trace f0c6a1afa55d130d ]--- > clock_task: 6304138787 exec_start: 221487873 > I'm trying to reproduce it on a large CPUs system. It seems to be easier to trigger it that way. Thanks.
Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote: > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote: > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > > > Hello! > > > > > > I am hitting the following on today's mainline under rcutorture, but > > > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > > > > > > > > > WARNING: CPU: 0 PID: 7 at > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > > sched_tick_remote+0x113/0x120 > > > Modules linked in: > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > > Workqueue: events_unbound sched_tick_remote > > > RIP: 0010:sched_tick_remote+0x113/0x120 > > > RSP: 0018:94d540103e20 EFLAGS: 00010002 > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > > > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > > > FS: () GS:8f95dfc0() > > > knlGS: > > > CS: 0010 DS: ES: CR0: 80050033 > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > > > Call Trace: > > > process_one_work+0x1d9/0x6a0 > > > worker_thread+0x42/0x420 > > > kthread+0xf3/0x130 > > > ? rescuer_thread+0x340/0x340 > > > ? kthread_delayed_work_timer_fn+0x80/0x80 > > > ret_from_fork+0x3a/0x50 > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb > > > 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > > > ---[ end trace fbdcbe529a8ae799 ]-- > > > > > > > > > > > > The WARN_ON_ONCE() triggering is this guy: > > > > > > delta = rq_clock_task(rq) - curr->se.exec_start; > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > > > Weird. Can you try to print up those values and see how much they drift? > > > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) > >printk_once("clock_task: %lld exec_start: %lld\n", > > rq_clock_task(rq), curr->se.exec_start); > > Here you go! > > Thanx, Paul > > > > WARNING: CPU: 0 PID: 7 at > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > sched_tick_remote+0xdb/0x100 > Modules linked in: > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > Workqueue: events_unbound sched_tick_remote > RIP: 0010:sched_tick_remote+0xdb/0x100 > RSP: 0018:a2c440103e60 EFLAGS: 00010006 > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322 > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53 > RBP: 96f0dfd24328 R08: R09: 0001 > R10: R11: R12: 96f0de9d2640 > R13: R14: 096f0de81700 R15: 96f0de96f540 > FS: () GS:96f0dfc0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0 > Call Trace: > process_one_work+0x139/0x3e0 > worker_thread+0x42/0x420 > kthread+0xf3/0x130 > ? create_worker+0x190/0x190 > ? kthread_destroy_worker+0x40/0x40 > ret_from_fork+0x35/0x40 > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 > 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 > 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d > ---[ end trace f0c6a1afa55d130d ]--- > clock_task: 6304138787 exec_start: 221487873 > I'm trying to reproduce it on a large CPUs system. It seems to be easier to trigger it that way. Thanks.
Re: NO_HZ_FULL and tick running within a reasonable amount of time
> On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote: > > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote: > > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > > > > Hello! > > > > > > > > I am hitting the following on today's mainline under rcutorture, but > > > > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > > > > > > > > > > > > > WARNING: CPU: 0 PID: 7 at > > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > > > sched_tick_remote+0x113/0x120 > > > > Modules linked in: > > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > > > Workqueue: events_unbound sched_tick_remote > > > > RIP: 0010:sched_tick_remote+0x113/0x120 > > > > RSP: 0018:94d540103e20 EFLAGS: 00010002 > > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > > > > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > > > > FS: () GS:8f95dfc0() > > > > knlGS: > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > > > > Call Trace: > > > > process_one_work+0x1d9/0x6a0 > > > > worker_thread+0x42/0x420 > > > > kthread+0xf3/0x130 > > > > ? rescuer_thread+0x340/0x340 > > > > ? kthread_delayed_work_timer_fn+0x80/0x80 > > > > ret_from_fork+0x3a/0x50 > > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff > > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b > > > > eb 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > > > > ---[ end trace fbdcbe529a8ae799 ]-- > > > > > > > > > > > > > > > > The WARN_ON_ONCE() triggering is this guy: > > > > > > > > delta = rq_clock_task(rq) - curr->se.exec_start; > > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > > > > > Weird. Can you try to print up those values and see how much they drift? > > > > > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) > > >printk_once("clock_task: %lld exec_start: %lld\n", > > > rq_clock_task(rq), curr-> >se.exec_start); > > > > Here you go! > > > > Thanx, Paul > > > > > > > > WARNING: CPU: 0 PID: 7 at > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > sched_tick_remote+0xdb/0x100 > > Modules linked in: > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > Workqueue: events_unbound sched_tick_remote > > RIP: 0010:sched_tick_remote+0xdb/0x100 > > RSP: 0018:a2c440103e60 EFLAGS: 00010006 > > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322 > > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53 > > RBP: 96f0dfd24328 R08: R09: 0001 > > R10: R11: R12: 96f0de9d2640 > > R13: R14: 096f0de81700 R15: 96f0de96f540 > > FS: () GS:96f0dfc0() knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0 > > Call Trace: > > process_one_work+0x139/0x3e0 > > worker_thread+0x42/0x420 > > kthread+0xf3/0x130 > > ? create_worker+0x190/0x190 > > ? kthread_destroy_worker+0x40/0x40 > > ret_from_fork+0x35/0x40 > > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f > > 85 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d > > 7d 57 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d > > ---[ end trace f0c6a1afa55d130d ]--- > > clock_task: 6304138787 exec_start: 221487873 > > > > That's a 6 second delay, it's huge! > > Could it be because you use Qemu and the virtualized CPUs got interrupted for a long while? Hello, I am seeing the following message as well and I am running it on real hardware (Intel Xeon Phi 7250): My kernel parameters include: nohz_full=1-271 noretpoline isolcpus=nohz RCU threads are pinned to cpu zero for i in `pgrep rcu[^c]` ; do taskset -pc 0 $i done One thing i noticed though that my date is not set properly: -bash-4.2# date Thu Jan 1 10:56:21 UTC 1970 Probably because of : [0.00] NO_HZ: Full dynticks CPUs: 1-271. [0.00] Offload RCU callbacks from CPUs: 1-271. [0.00] WARNING: Persistent clock returned invalid value! [0.00]
Re: NO_HZ_FULL and tick running within a reasonable amount of time
> On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote: > > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote: > > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > > > > Hello! > > > > > > > > I am hitting the following on today's mainline under rcutorture, but > > > > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > > > > > > > > > > > > > WARNING: CPU: 0 PID: 7 at > > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > > > sched_tick_remote+0x113/0x120 > > > > Modules linked in: > > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > > > Workqueue: events_unbound sched_tick_remote > > > > RIP: 0010:sched_tick_remote+0x113/0x120 > > > > RSP: 0018:94d540103e20 EFLAGS: 00010002 > > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > > > > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > > > > FS: () GS:8f95dfc0() > > > > knlGS: > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > > > > Call Trace: > > > > process_one_work+0x1d9/0x6a0 > > > > worker_thread+0x42/0x420 > > > > kthread+0xf3/0x130 > > > > ? rescuer_thread+0x340/0x340 > > > > ? kthread_delayed_work_timer_fn+0x80/0x80 > > > > ret_from_fork+0x3a/0x50 > > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff > > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b > > > > eb 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > > > > ---[ end trace fbdcbe529a8ae799 ]-- > > > > > > > > > > > > > > > > The WARN_ON_ONCE() triggering is this guy: > > > > > > > > delta = rq_clock_task(rq) - curr->se.exec_start; > > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > > > > > Weird. Can you try to print up those values and see how much they drift? > > > > > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) > > >printk_once("clock_task: %lld exec_start: %lld\n", > > > rq_clock_task(rq), curr-> >se.exec_start); > > > > Here you go! > > > > Thanx, Paul > > > > > > > > WARNING: CPU: 0 PID: 7 at > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > sched_tick_remote+0xdb/0x100 > > Modules linked in: > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > Workqueue: events_unbound sched_tick_remote > > RIP: 0010:sched_tick_remote+0xdb/0x100 > > RSP: 0018:a2c440103e60 EFLAGS: 00010006 > > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322 > > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53 > > RBP: 96f0dfd24328 R08: R09: 0001 > > R10: R11: R12: 96f0de9d2640 > > R13: R14: 096f0de81700 R15: 96f0de96f540 > > FS: () GS:96f0dfc0() knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0 > > Call Trace: > > process_one_work+0x139/0x3e0 > > worker_thread+0x42/0x420 > > kthread+0xf3/0x130 > > ? create_worker+0x190/0x190 > > ? kthread_destroy_worker+0x40/0x40 > > ret_from_fork+0x35/0x40 > > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f > > 85 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d > > 7d 57 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d > > ---[ end trace f0c6a1afa55d130d ]--- > > clock_task: 6304138787 exec_start: 221487873 > > > > That's a 6 second delay, it's huge! > > Could it be because you use Qemu and the virtualized CPUs got interrupted for a long while? Hello, I am seeing the following message as well and I am running it on real hardware (Intel Xeon Phi 7250): My kernel parameters include: nohz_full=1-271 noretpoline isolcpus=nohz RCU threads are pinned to cpu zero for i in `pgrep rcu[^c]` ; do taskset -pc 0 $i done One thing i noticed though that my date is not set properly: -bash-4.2# date Thu Jan 1 10:56:21 UTC 1970 Probably because of : [0.00] NO_HZ: Full dynticks CPUs: 1-271. [0.00] Offload RCU callbacks from CPUs: 1-271. [0.00] WARNING: Persistent clock returned invalid value! [0.00]
Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote: > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote: > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > > > Hello! > > > > > > I am hitting the following on today's mainline under rcutorture, but > > > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > > > > > > > > > WARNING: CPU: 0 PID: 7 at > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > > sched_tick_remote+0x113/0x120 > > > Modules linked in: > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > > Workqueue: events_unbound sched_tick_remote > > > RIP: 0010:sched_tick_remote+0x113/0x120 > > > RSP: 0018:94d540103e20 EFLAGS: 00010002 > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > > > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > > > FS: () GS:8f95dfc0() > > > knlGS: > > > CS: 0010 DS: ES: CR0: 80050033 > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > > > Call Trace: > > > process_one_work+0x1d9/0x6a0 > > > worker_thread+0x42/0x420 > > > kthread+0xf3/0x130 > > > ? rescuer_thread+0x340/0x340 > > > ? kthread_delayed_work_timer_fn+0x80/0x80 > > > ret_from_fork+0x3a/0x50 > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb > > > 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > > > ---[ end trace fbdcbe529a8ae799 ]-- > > > > > > > > > > > > The WARN_ON_ONCE() triggering is this guy: > > > > > > delta = rq_clock_task(rq) - curr->se.exec_start; > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > > > Weird. Can you try to print up those values and see how much they drift? > > > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) > >printk_once("clock_task: %lld exec_start: %lld\n", > > rq_clock_task(rq), curr->se.exec_start); > > Here you go! > > Thanx, Paul > > > > WARNING: CPU: 0 PID: 7 at > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > sched_tick_remote+0xdb/0x100 > Modules linked in: > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > Workqueue: events_unbound sched_tick_remote > RIP: 0010:sched_tick_remote+0xdb/0x100 > RSP: 0018:a2c440103e60 EFLAGS: 00010006 > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322 > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53 > RBP: 96f0dfd24328 R08: R09: 0001 > R10: R11: R12: 96f0de9d2640 > R13: R14: 096f0de81700 R15: 96f0de96f540 > FS: () GS:96f0dfc0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0 > Call Trace: > process_one_work+0x139/0x3e0 > worker_thread+0x42/0x420 > kthread+0xf3/0x130 > ? create_worker+0x190/0x190 > ? kthread_destroy_worker+0x40/0x40 > ret_from_fork+0x35/0x40 > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 > 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 > 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d > ---[ end trace f0c6a1afa55d130d ]--- > clock_task: 6304138787 exec_start: 221487873 > That's a 6 second delay, it's huge! Could it be because you use Qemu and the virtualized CPUs got interrupted for a long while?
Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Tue, Apr 03, 2018 at 10:08:58AM -0700, Paul E. McKenney wrote: > On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote: > > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > > > Hello! > > > > > > I am hitting the following on today's mainline under rcutorture, but > > > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > > > > > > > > > WARNING: CPU: 0 PID: 7 at > > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > > sched_tick_remote+0x113/0x120 > > > Modules linked in: > > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > > Workqueue: events_unbound sched_tick_remote > > > RIP: 0010:sched_tick_remote+0x113/0x120 > > > RSP: 0018:94d540103e20 EFLAGS: 00010002 > > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > > > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > > > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > > > FS: () GS:8f95dfc0() > > > knlGS: > > > CS: 0010 DS: ES: CR0: 80050033 > > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > > > Call Trace: > > > process_one_work+0x1d9/0x6a0 > > > worker_thread+0x42/0x420 > > > kthread+0xf3/0x130 > > > ? rescuer_thread+0x340/0x340 > > > ? kthread_delayed_work_timer_fn+0x80/0x80 > > > ret_from_fork+0x3a/0x50 > > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff > > > be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb > > > 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > > > ---[ end trace fbdcbe529a8ae799 ]-- > > > > > > > > > > > > The WARN_ON_ONCE() triggering is this guy: > > > > > > delta = rq_clock_task(rq) - curr->se.exec_start; > > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > > > Weird. Can you try to print up those values and see how much they drift? > > > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) > >printk_once("clock_task: %lld exec_start: %lld\n", > > rq_clock_task(rq), curr->se.exec_start); > > Here you go! > > Thanx, Paul > > > > WARNING: CPU: 0 PID: 7 at > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > sched_tick_remote+0xdb/0x100 > Modules linked in: > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > Workqueue: events_unbound sched_tick_remote > RIP: 0010:sched_tick_remote+0xdb/0x100 > RSP: 0018:a2c440103e60 EFLAGS: 00010006 > RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322 > RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53 > RBP: 96f0dfd24328 R08: R09: 0001 > R10: R11: R12: 96f0de9d2640 > R13: R14: 096f0de81700 R15: 96f0de96f540 > FS: () GS:96f0dfc0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0 > Call Trace: > process_one_work+0x139/0x3e0 > worker_thread+0x42/0x420 > kthread+0xf3/0x130 > ? create_worker+0x190/0x190 > ? kthread_destroy_worker+0x40/0x40 > ret_from_fork+0x35/0x40 > Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 > 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 > 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d > ---[ end trace f0c6a1afa55d130d ]--- > clock_task: 6304138787 exec_start: 221487873 > That's a 6 second delay, it's huge! Could it be because you use Qemu and the virtualized CPUs got interrupted for a long while?
Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote: > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > > Hello! > > > > I am hitting the following on today's mainline under rcutorture, but > > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > > > > > WARNING: CPU: 0 PID: 7 at > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > sched_tick_remote+0x113/0x120 > > Modules linked in: > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > Workqueue: events_unbound sched_tick_remote > > RIP: 0010:sched_tick_remote+0x113/0x120 > > RSP: 0018:94d540103e20 EFLAGS: 00010002 > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > > FS: () GS:8f95dfc0() knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > > Call Trace: > > process_one_work+0x1d9/0x6a0 > > worker_thread+0x42/0x420 > > kthread+0xf3/0x130 > > ? rescuer_thread+0x340/0x340 > > ? kthread_delayed_work_timer_fn+0x80/0x80 > > ret_from_fork+0x3a/0x50 > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be > > ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 > > 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > > ---[ end trace fbdcbe529a8ae799 ]-- > > > > > > > > The WARN_ON_ONCE() triggering is this guy: > > > > delta = rq_clock_task(rq) - curr->se.exec_start; > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > Weird. Can you try to print up those values and see how much they drift? > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) >printk_once("clock_task: %lld exec_start: %lld\n", > rq_clock_task(rq), curr->se.exec_start); Here you go! Thanx, Paul WARNING: CPU: 0 PID: 7 at /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 sched_tick_remote+0xdb/0x100 Modules linked in: CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Workqueue: events_unbound sched_tick_remote RIP: 0010:sched_tick_remote+0xdb/0x100 RSP: 0018:a2c440103e60 EFLAGS: 00010006 RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322 RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53 RBP: 96f0dfd24328 R08: R09: 0001 R10: R11: R12: 96f0de9d2640 R13: R14: 096f0de81700 R15: 96f0de96f540 FS: () GS:96f0dfc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0 Call Trace: process_one_work+0x139/0x3e0 worker_thread+0x42/0x420 kthread+0xf3/0x130 ? create_worker+0x190/0x190 ? kthread_destroy_worker+0x40/0x40 ret_from_fork+0x35/0x40 Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d ---[ end trace f0c6a1afa55d130d ]--- clock_task: 6304138787 exec_start: 221487873
Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Tue, Apr 03, 2018 at 01:41:31PM +0200, Frederic Weisbecker wrote: > On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > > Hello! > > > > I am hitting the following on today's mainline under rcutorture, but > > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > > > > > WARNING: CPU: 0 PID: 7 at > > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > > sched_tick_remote+0x113/0x120 > > Modules linked in: > > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > > Workqueue: events_unbound sched_tick_remote > > RIP: 0010:sched_tick_remote+0x113/0x120 > > RSP: 0018:94d540103e20 EFLAGS: 00010002 > > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > > FS: () GS:8f95dfc0() knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > > Call Trace: > > process_one_work+0x1d9/0x6a0 > > worker_thread+0x42/0x420 > > kthread+0xf3/0x130 > > ? rescuer_thread+0x340/0x340 > > ? kthread_delayed_work_timer_fn+0x80/0x80 > > ret_from_fork+0x3a/0x50 > > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be > > ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 > > 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > > ---[ end trace fbdcbe529a8ae799 ]-- > > > > > > > > The WARN_ON_ONCE() triggering is this guy: > > > > delta = rq_clock_task(rq) - curr->se.exec_start; > > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > Weird. Can you try to print up those values and see how much they drift? > > if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) >printk_once("clock_task: %lld exec_start: %lld\n", > rq_clock_task(rq), curr->se.exec_start); Here you go! Thanx, Paul WARNING: CPU: 0 PID: 7 at /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 sched_tick_remote+0xdb/0x100 Modules linked in: CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Workqueue: events_unbound sched_tick_remote RIP: 0010:sched_tick_remote+0xdb/0x100 RSP: 0018:a2c440103e60 EFLAGS: 00010006 RAX: b2d05e00 RBX: 96f0dfd20980 RCX: 00016a8de322 RDX: 0d33a301 RSI: 000177c18623 RDI: fffeb0bf0e53 RBP: 96f0dfd24328 R08: R09: 0001 R10: R11: R12: 96f0de9d2640 R13: R14: 096f0de81700 R15: 96f0de96f540 FS: () GS:96f0dfc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: f7ec6ca3 CR3: 1e03e000 CR4: 06f0 Call Trace: process_one_work+0x139/0x3e0 worker_thread+0x42/0x420 kthread+0xf3/0x130 ? create_worker+0x190/0x190 ? kthread_destroy_worker+0x40/0x40 ret_from_fork+0x35/0x40 Code: 89 8b 43 04 85 c0 0f 85 75 ff ff ff 48 8b 83 e0 0a 00 00 48 85 c0 0f 85 65 ff ff ff e9 69 ff ff ff 48 89 df e8 87 fe ff ff eb 8d <0f> 0b 80 3d 7d 57 2b 01 00 75 a8 48 c7 c7 e8 e1 fd a4 c6 05 6d ---[ end trace f0c6a1afa55d130d ]--- clock_task: 6304138787 exec_start: 221487873
Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > Hello! > > I am hitting the following on today's mainline under rcutorture, but > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > WARNING: CPU: 0 PID: 7 at > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > sched_tick_remote+0x113/0x120 > Modules linked in: > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > Workqueue: events_unbound sched_tick_remote > RIP: 0010:sched_tick_remote+0x113/0x120 > RSP: 0018:94d540103e20 EFLAGS: 00010002 > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > FS: () GS:8f95dfc0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > Call Trace: > process_one_work+0x1d9/0x6a0 > worker_thread+0x42/0x420 > kthread+0xf3/0x130 > ? rescuer_thread+0x340/0x340 > ? kthread_delayed_work_timer_fn+0x80/0x80 > ret_from_fork+0x3a/0x50 > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be ff > ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 66 0f > 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > ---[ end trace fbdcbe529a8ae799 ]-- > > > > The WARN_ON_ONCE() triggering is this guy: > > delta = rq_clock_task(rq) - curr->se.exec_start; > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); Weird. Can you try to print up those values and see how much they drift? if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) printk_once("clock_task: %lld exec_start: %lld\n", rq_clock_task(rq), curr->se.exec_start); > > But given that ->se.exec_start is zeroed from time to time, for example, > in migrate_task_rq_fair(), I am a bit suspicious of this check. > > What am I missing here? > > Thanx, Paul >
Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > Hello! > > I am hitting the following on today's mainline under rcutorture, but > only on scenarios built with CONFIG_NO_HZ_FULL=y: > > > > WARNING: CPU: 0 PID: 7 at > /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 > sched_tick_remote+0x113/0x120 > Modules linked in: > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > Ubuntu-1.8.2-1ubuntu1 04/01/2014 > Workqueue: events_unbound sched_tick_remote > RIP: 0010:sched_tick_remote+0x113/0x120 > RSP: 0018:94d540103e20 EFLAGS: 00010002 > RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f > RDX: b2d05e00 RSI: RDI: 8f95dfd21858 > RBP: 94d540103e48 R08: f6499019 R09: f6499000 > R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 > R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 > FS: () GS:8f95dfc0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 > Call Trace: > process_one_work+0x1d9/0x6a0 > worker_thread+0x42/0x420 > kthread+0xf3/0x130 > ? rescuer_thread+0x340/0x340 > ? kthread_delayed_work_timer_fn+0x80/0x80 > ret_from_fork+0x3a/0x50 > Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be ff > ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 66 0f > 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 > ---[ end trace fbdcbe529a8ae799 ]-- > > > > The WARN_ON_ONCE() triggering is this guy: > > delta = rq_clock_task(rq) - curr->se.exec_start; > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); Weird. Can you try to print up those values and see how much they drift? if (WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)) printk_once("clock_task: %lld exec_start: %lld\n", rq_clock_task(rq), curr->se.exec_start); > > But given that ->se.exec_start is zeroed from time to time, for example, > in migrate_task_rq_fair(), I am a bit suspicious of this check. > > What am I missing here? > > Thanx, Paul >
Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > The WARN_ON_ONCE() triggering is this guy: > > delta = rq_clock_task(rq) - curr->se.exec_start; > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > But given that ->se.exec_start is zeroed from time to time, for example, > in migrate_task_rq_fair(), I am a bit suspicious of this check. > > What am I missing here? We clear it on migration, but set it when we schedule a task back in. The above checks that the 'current' task of that CPU had a tick at least 3 seconds ago (to ensure tasks don't go too long without ticks). The 'current' task is obviously scheduled in and thus must have !0 exec_start time.
Re: NO_HZ_FULL and tick running within a reasonable amount of time
On Mon, Apr 02, 2018 at 03:04:38PM -0700, Paul E. McKenney wrote: > The WARN_ON_ONCE() triggering is this guy: > > delta = rq_clock_task(rq) - curr->se.exec_start; > WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); > > But given that ->se.exec_start is zeroed from time to time, for example, > in migrate_task_rq_fair(), I am a bit suspicious of this check. > > What am I missing here? We clear it on migration, but set it when we schedule a task back in. The above checks that the 'current' task of that CPU had a tick at least 3 seconds ago (to ensure tasks don't go too long without ticks). The 'current' task is obviously scheduled in and thus must have !0 exec_start time.
NO_HZ_FULL and tick running within a reasonable amount of time
Hello! I am hitting the following on today's mainline under rcutorture, but only on scenarios built with CONFIG_NO_HZ_FULL=y: WARNING: CPU: 0 PID: 7 at /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 sched_tick_remote+0x113/0x120 Modules linked in: CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Workqueue: events_unbound sched_tick_remote RIP: 0010:sched_tick_remote+0x113/0x120 RSP: 0018:94d540103e20 EFLAGS: 00010002 RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f RDX: b2d05e00 RSI: RDI: 8f95dfd21858 RBP: 94d540103e48 R08: f6499019 R09: f6499000 R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 FS: () GS:8f95dfc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 Call Trace: process_one_work+0x1d9/0x6a0 worker_thread+0x42/0x420 kthread+0xf3/0x130 ? rescuer_thread+0x340/0x340 ? kthread_delayed_work_timer_fn+0x80/0x80 ret_from_fork+0x3a/0x50 Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 ---[ end trace fbdcbe529a8ae799 ]-- The WARN_ON_ONCE() triggering is this guy: delta = rq_clock_task(rq) - curr->se.exec_start; WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); But given that ->se.exec_start is zeroed from time to time, for example, in migrate_task_rq_fair(), I am a bit suspicious of this check. What am I missing here? Thanx, Paul
NO_HZ_FULL and tick running within a reasonable amount of time
Hello! I am hitting the following on today's mainline under rcutorture, but only on scenarios built with CONFIG_NO_HZ_FULL=y: WARNING: CPU: 0 PID: 7 at /home/paulmck/public_git/linux-rcu/kernel/sched/core.c:3124 sched_tick_remote+0x113/0x120 Modules linked in: CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.16.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Workqueue: events_unbound sched_tick_remote RIP: 0010:sched_tick_remote+0x113/0x120 RSP: 0018:94d540103e20 EFLAGS: 00010002 RAX: 00012e9bb357 RBX: 8f95dfd21840 RCX: 001f RDX: b2d05e00 RSI: RDI: 8f95dfd21858 RBP: 94d540103e48 R08: f6499019 R09: f6499000 R10: b163d33b R11: a5c8c212 R12: 8f95dfd25518 R13: 8f95de9e4200 R14: 3402 R15: 8f95dfd21858 FS: () GS:8f95dfc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0a015b40 CR3: 1de14000 CR4: 06f0 Call Trace: process_one_work+0x1d9/0x6a0 worker_thread+0x42/0x420 kthread+0xf3/0x130 ? rescuer_thread+0x340/0x340 ? kthread_delayed_work_timer_fn+0x80/0x80 ret_from_fork+0x3a/0x50 Code: ff 48 8b 83 80 0b 00 00 48 85 c0 0f 85 41 ff ff ff e9 45 ff ff ff be ff ff ff ff 4c 89 ff e8 55 44 02 00 85 c0 75 87 0f 0b eb 83 <0f> 0b eb 97 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 ---[ end trace fbdcbe529a8ae799 ]-- The WARN_ON_ONCE() triggering is this guy: delta = rq_clock_task(rq) - curr->se.exec_start; WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); But given that ->se.exec_start is zeroed from time to time, for example, in migrate_task_rq_fair(), I am a bit suspicious of this check. What am I missing here? Thanx, Paul