[15774.966082] divide error: 0000 [#1] SMP [15774.966137] Modules linked in: i915 intel_gtt [15774.966208] CPU: 1 PID: 15319 Comm: gemscript Not tainted 4.7.0-rc1+ #330 [15774.966252] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015 [15774.966317] task: ffff880276a55e80 ti: ffff880272c10000 task.ti: ffff880272c10000 [15774.966377] RIP: 0010:[<ffffffff810b12e3>] [<ffffffff810b12e3>] post_init_entity_util_avg+0x43/0x80 [15774.966463] RSP: 0018:ffff880272c13e30 EFLAGS: 00010057 [15774.966504] RAX: 0000000022f00000 RBX: ffff880276a51f80 RCX: 00000000000000e8 [15774.966550] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880276a52000 [15774.966593] RBP: ffff880272c13e30 R08: 0000000000000000 R09: 0000000000000008 [15774.966635] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880276a52544 [15774.966678] R13: ffff880276a52000 R14: ffff880276a521d8 R15: 0000000000003bda [15774.966721] FS: 00007fe4df95a740(0000) GS:ffff88027fd00000(0000) knlGS:0000000000000000 [15774.966781] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [15774.966822] CR2: 00007fe4de7f0378 CR3: 0000000275e15000 CR4: 00000000001006e0 [15774.966860] Stack: [15774.966894] ffff880272c13e68 ffffffff810a69c5 0000000000000206 0000000000003bd7 [15774.966987] 0000000000000000 0000000000000000 ffff880276a51f80 ffff880272c13ee8 [15774.967073] ffffffff8107a364 0000000000003bd7 ffffffffffffffff 0000000000000000 [15774.967162] Call Trace: [15774.967208] [<ffffffff810a69c5>] wake_up_new_task+0x95/0x130 [15774.967256] [<ffffffff8107a364>] _do_fork+0x184/0x400 [15774.967302] [<ffffffff8107a6b7>] SyS_clone+0x37/0x50 [15774.967353] [<ffffffff8100197d>] do_syscall_64+0x5d/0xe0 [15774.967402] [<ffffffff815719bc>] entry_SYSCALL64_slow_path+0x25/0x25 [15774.967444] Code: 89 d1 48 c1 e9 3f 48 01 d1 48 d1 f9 48 85 c9 7e 38 48 85 c0 74 35 48 0f af 07 31 d2 48 89 87 e0 00 00 00 48 8b 76 70 48 83 c6 01 <48> f7 f6 48 39 c8 77 18 48 89 c1 48 89 87 e0 00 00 00 69 c9 7e [15774.968086] RIP [<ffffffff810b12e3>] post_init_entity_util_avg+0x43/0x80 [15774.968142] RSP <ffff880272c13e30>
I've presumed commit 2b8c41daba327 ("sched/fair: Initiate a new task's util avg to a bounded value") to be at fault, hence the CCs. Though it may just be a victim. gdb says 0x43/0x80 is 725 if (cfs_rq->avg.util_avg != 0) { 726 sa->util_avg = cfs_rq->avg.util_avg * se->load.weight; -> 727 sa->util_avg /= (cfs_rq->avg.load_avg + 1); 728 729 if (sa->util_avg > cap) 730 sa->util_avg = cap; 731 } else { I've run the same fork-heavy workload that seemed to hit the initial fault under kasan. kasan has not reported any errors, nor has the bug reoccurred after a day (earlier I had a couple of panics within a few hours). Is it possible for a race window where cfg_rq->avg.load_avg is indeed -1? Any evidence of other memcorruption in the above? -Chris -- Chris Wilson, Intel Open Source Technology Centre