Re: scheduler crash on Power
On 04/08/14 04:20, Michael Ellerman wrote:
> On Fri, 2014-08-01 at 14:24 -0700, Sukadev Bhattiprolu wrote:
>> Dietmar Eggemann [dietmar.eggem...@arm.com] wrote:
>> | ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
>> | [  181.915991] WARNING: at ../kernel/sched/core.c:5881
>> |
>> | This warning indicates the problem. One of the struct sched_domains does
>> | not have its groups member set.
>> |
>> | And it's happening during a rebuild of the sched domain hierarchy, not
>> | during the initial build.
>> |
>> | You could run your system with the following patch-let (on top of
>> | https://lkml.org/lkml/2014/7/17/288) w/ and w/o the perf related
>> | patches (w/ CONFIG_SCHED_DEBUG enabled).
>> |
>> | @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
>> |  {
>> |  	struct sched_group *sg = sd->groups;
>> |
>> | +#ifdef CONFIG_SCHED_DEBUG
>> | +	printk("sd name: %s span: %pc\n", sd->name, sd->span);
>> | +#endif
>> |  	WARN_ON(!sg);
>> |
>> |  	do {
>> |
>> | This will show if the rebuild of the sched domain hierarchy happens on
>> | both systems and hopefully indicate for which sched_domain the
>> | sd->groups is not set.
>>
>> Thanks for the patch. It appears that the NUMA sched domain does not
>> have sd->groups set - snippet of the error (with your patch and Peter's
>> patch):
>>
>> [  181.914494] build_sched_groups: got group c6da with cpus:
>> [  181.914498] build_sched_groups: got group c000dd83 with cpus:
>> [  181.915234] sd name: SMT span: 8-15
>> [  181.915239] sd name: DIE span: 0-7
>> [  181.915242] sd name: NUMA span: 0-15
>> [  181.915250] ------------[ cut here ]------------
>> [  181.915253] WARNING: at ../kernel/sched/core.c:5891
>>
>> Patched code:
>>
>> 5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
>> 5885 {
>> 5886         struct sched_group *sg = sd->groups;
>> 5887
>> 5888 #ifdef CONFIG_SCHED_DEBUG
>> 5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
>> 5890 #endif
>> 5891         WARN_ON(!sg);
>>
>> Complete log below.
>> I was able to bisect it down to this patch in the 24x7 patchset
>>
>> 	https://lkml.org/lkml/2014/5/27/804
>>
>> I replaced the kfree(page) calls in the patch with
>> kmem_cache_free(hv_page_cache, page).
>>
>> The problem seems to disappear if the call to create_events_from_catalog()
>> in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.
>
> Is that patch just clobbering memory it doesn't own and corrupting the
> scheduler data structures?

Quite likely. When the system comes up initially, it has the SMT and DIE
sched domain levels:

...
[0.033832] build_sched_groups: got group c000e7d5 with cpus:
[0.033835] build_sched_groups: got group c000e7d8 with cpus:
[0.033844] sd name: SMT span: 8-15
[0.033847] sd name: DIE span: 0-15  -- !!!
[0.033850] sd name: SMT span: 8-15
[0.033853] sd name: DIE span: 0-15
...

and the cpu mask of DIE spans all CPUs, '0-15'. Then, during the rebuild
of the sched domain hierarchy, this looks very different:

...
[  181.914494] build_sched_groups: got group c6da with cpus:
[  181.914498] build_sched_groups: got group c000dd83 with cpus:
[  181.915234] sd name: SMT span: 8-15
[  181.915239] sd name: DIE span: 0-7  -- !!!
[  181.915242] sd name: NUMA span: 0-15
...

The cpu mask of the DIE level is all of a sudden '0-7', which is clearly
wrong. So I suspect that the sched_domain_mask_f mask function for the
DIE level, 'cpu_cpu_mask()', returns a wrong value during this rebuild.

This could be checked with this little patch-let:

@@ -6467,6 +6467,12 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
 	if (!sd)
 		return child;

+	printk("%s: cpu: %d level: %s cpu_map: %pc tl->mask: %pc\n",
+	       __func__,
+	       cpu, tl->name,
+	       cpu_map,
+	       tl->mask(cpu));
+
 	cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
 	if (child) {
 		sd->level = child->level + 1;

It should give you something like:
...
build_sched_domain: cpu: 0 level: GMC cpu_map: 0-4 tl->mask: 0
build_sched_domain: cpu: 0 level: MC  cpu_map: 0-4 tl->mask: 0-1
build_sched_domain: cpu: 0 level: DIE cpu_map: 0-4 tl->mask: 0-4
build_sched_domain: cpu: 1 level: GMC cpu_map: 0-4 tl->mask: 1
build_sched_domain: cpu: 1 level: MC  cpu_map: 0-4 tl->mask: 0-1
build_sched_domain: cpu: 1 level: DIE cpu_map: 0-4 tl->mask: 0-4
...

cheers

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: scheduler crash on Power
On Fri, 2014-08-01 at 14:24 -0700, Sukadev Bhattiprolu wrote:
> Dietmar Eggemann [dietmar.eggem...@arm.com] wrote:
> | ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
> | [  181.915991] WARNING: at ../kernel/sched/core.c:5881
> |
> | This warning indicates the problem. One of the struct sched_domains does
> | not have its groups member set.
> |
> | And it's happening during a rebuild of the sched domain hierarchy, not
> | during the initial build.
> |
> | You could run your system with the following patch-let (on top of
> | https://lkml.org/lkml/2014/7/17/288) w/ and w/o the perf related
> | patches (w/ CONFIG_SCHED_DEBUG enabled).
> |
> | @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> |  {
> |  	struct sched_group *sg = sd->groups;
> |
> | +#ifdef CONFIG_SCHED_DEBUG
> | +	printk("sd name: %s span: %pc\n", sd->name, sd->span);
> | +#endif
> |  	WARN_ON(!sg);
> |
> |  	do {
> |
> | This will show if the rebuild of the sched domain hierarchy happens on
> | both systems and hopefully indicate for which sched_domain the
> | sd->groups is not set.
>
> Thanks for the patch. It appears that the NUMA sched domain does not
> have sd->groups set - snippet of the error (with your patch and Peter's
> patch):
>
> [  181.914494] build_sched_groups: got group c6da with cpus:
> [  181.914498] build_sched_groups: got group c000dd83 with cpus:
> [  181.915234] sd name: SMT span: 8-15
> [  181.915239] sd name: DIE span: 0-7
> [  181.915242] sd name: NUMA span: 0-15
> [  181.915250] ------------[ cut here ]------------
> [  181.915253] WARNING: at ../kernel/sched/core.c:5891
>
> Patched code:
>
> 5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> 5885 {
> 5886         struct sched_group *sg = sd->groups;
> 5887
> 5888 #ifdef CONFIG_SCHED_DEBUG
> 5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
> 5890 #endif
> 5891         WARN_ON(!sg);
>
> Complete log below.
>
> I was able to bisect it down to this patch in the 24x7 patchset
>
> 	https://lkml.org/lkml/2014/5/27/804
>
> I replaced the kfree(page) calls in the patch with
> kmem_cache_free(hv_page_cache, page).
> The problem seems to disappear if the call to create_events_from_catalog()
> in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.

Is that patch just clobbering memory it doesn't own and corrupting the
scheduler data structures?

cheers
Re: scheduler crash on Power
Dietmar Eggemann [dietmar.eggem...@arm.com] wrote:
| ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
| [  181.915991] WARNING: at ../kernel/sched/core.c:5881
|
| This warning indicates the problem. One of the struct sched_domains does
| not have its groups member set.
|
| And it's happening during a rebuild of the sched domain hierarchy, not
| during the initial build.
|
| You could run your system with the following patch-let (on top of
| https://lkml.org/lkml/2014/7/17/288) w/ and w/o the perf related
| patches (w/ CONFIG_SCHED_DEBUG enabled).
|
| @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
|  {
|  	struct sched_group *sg = sd->groups;
|
| +#ifdef CONFIG_SCHED_DEBUG
| +	printk("sd name: %s span: %pc\n", sd->name, sd->span);
| +#endif
|  	WARN_ON(!sg);
|
|  	do {
|
| This will show if the rebuild of the sched domain hierarchy happens on
| both systems and hopefully indicate for which sched_domain the
| sd->groups is not set.

Thanks for the patch. It appears that the NUMA sched domain does not
have sd->groups set - snippet of the error (with your patch and Peter's
patch):

[  181.914494] build_sched_groups: got group c6da with cpus:
[  181.914498] build_sched_groups: got group c000dd83 with cpus:
[  181.915234] sd name: SMT span: 8-15
[  181.915239] sd name: DIE span: 0-7
[  181.915242] sd name: NUMA span: 0-15
[  181.915250] ------------[ cut here ]------------
[  181.915253] WARNING: at ../kernel/sched/core.c:5891

Patched code:

5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
5885 {
5886         struct sched_group *sg = sd->groups;
5887
5888 #ifdef CONFIG_SCHED_DEBUG
5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
5890 #endif
5891         WARN_ON(!sg);

Complete log below.

I was able to bisect it down to this patch in the 24x7 patchset

	https://lkml.org/lkml/2014/5/27/804

I replaced the kfree(page) calls in the patch with
kmem_cache_free(hv_page_cache, page).
The problem seems to disappear if the call to create_events_from_catalog()
in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.

While the patched kernel often crashes before the first ssh into it, the
unpatched kernel has been stable through multiple kernel builds. Are there
any scheduler-specific tests I can run on the unpatched kernel?

Sukadev

Complete log:

OF stdout device is: /vdevice/vty@3000
Preparing to boot Linux version 3.16.0-rc7-24x7-dbg+ (r...@ltcbrazos2-lp07.austin.ibm.com) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #38 SMP Fri Aug 1 12:14:18 EDT 2014
Detected machine type: 0101
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/vmlinux-3.16.0-rc7-24x7-dbg+ root=UUID=e72c49fa-e137-43ff-ab41-44f3124572eb ro vconsole.keymap=us rd.lvm.lv=rhel_ltcbrazos2-lp07/root rd.lvm.lv=rhel_ltcbrazos2-lp07/swap crashkernel=auto vconsole.font=latarcyrheb-sun16
memory layout at init:
  memory_limit :  (16 MB aligned)
  alloc_bottom : 0a5f
  alloc_top    : 1000
  alloc_top_hi : 1000
  rmo_top      : 1000
  ram_top      : 1000
instantiating rtas at 0x0ee2... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0a60 - 0x0a6016d4
Device tree struct 0x0a61 - 0x0a64
Calling quiesce...
returning from prom_init
[0.00] crashkernel: memory value expected
[0.00] Using pSeries machine description
[0.00] Page sizes from device-tree:
[0.00]   base_shift=12: shift=12, sllp=0x, avpnm=0x, tlbiel=1, penc=0
[0.00]   base_shift=12: shift=16, sllp=0x, avpnm=0x, tlbiel=1, penc=7
[0.00]   base_shift=12: shift=24, sllp=0x, avpnm=0x, tlbiel=1, penc=56
[0.00]   base_shift=16: shift=16, sllp=0x0110, avpnm=0x, tlbiel=1, penc=1
[0.00]   base_shift=16: shift=24, sllp=0x0110, avpnm=0x, tlbiel=1, penc=8
[0.00]   base_shift=24: shift=24, sllp=0x0100, avpnm=0x0001, tlbiel=0, penc=0
[0.00]   base_shift=34: shift=34, sllp=0x0120, avpnm=0x07ff, tlbiel=0, penc=3
[0.00] Using 1TB segments
[0.00] kvm_cma: CMA: reserved 256 MiB
[0.00] Found initrd at 0xc9a0:0xca5ebcf8
[0.00] bootconsole [udbg0] enabled
[0.00] Partition configured for 32 cpus.
[0.00] CPU maps initialized for 8 threads per core
 - smp_release_cpus() spinning_secondaries = 15
 - smp_release_cpus()
[0.00] Starting Linux PPC64 #38 SMP Fri Aug 1 12:14:18 EDT 2014
[0.00] - [
Re: scheduler crash on Power
On Wed, 2014-07-30 at 00:22 -0700, Sukadev Bhattiprolu wrote:
> I am getting this crash on a Powerpc system using a 3.16.0-rc7 kernel
> plus some patches related to perf (24x7 counters) that Cody Schafer
> posted here: https://lkml.org/lkml/2014/5/27/768
>
> I don't get the crash on an unpatched kernel though.

You mean you don't get the crash on 3.16-rc7? I find it hard to believe
those 24x7 patches are causing this.

> I am also attaching the debug messages that Peterz added here:
> https://lkml.org/lkml/2014/7/17/288

I don't see any FAIL messages in your log, so it looks like you're not
hitting the case that patch was looking for?

> Appreciate any debug suggestions.

Reproduce on an unpatched kernel.

cheers
scheduler crash on Power
I am getting this crash on a Powerpc system using a 3.16.0-rc7 kernel
plus some patches related to perf (24x7 counters) that Cody Schafer
posted here: https://lkml.org/lkml/2014/5/27/768

I don't get the crash on an unpatched kernel though. I have been staring
at the perf event patches but can't find anything impacting the
scheduler. Besides, the patches had worked on a 3.16.0-rc2 kernel on a
different Power system.

The crash occurs on an idle system, a minute or two after booting to
runlevel 3.

kernel/sched/core.c:
---
5877 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
5878 {
5879         struct sched_group *sg = sd->groups;
5880
5881         WARN_ON(!sg);
5882
5883         do {
5884                 sg->group_weight = cpumask_weight(sched_group_cpus(sg));
---

I tried applying the patch discussed in
https://lkml.org/lkml/2014/7/16/386 but it doesn't seem to help.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bc1638b..50702a8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5842,6 +5842,8 @@ build_sched_groups(struct sched_domain *sd, int cpu)
 			continue;

 		group = get_group(i, sdd, &sg);
+		cpumask_clear(sched_group_cpus(sg));
+		sg->sgc->capacity = 0;
 		cpumask_setall(sched_group_mask(sg));

 		for_each_cpu(j, span) {

I am also attaching the debug messages that Peterz added here:
https://lkml.org/lkml/2014/7/17/288

Appreciate any debug suggestions.
Sukadev

Red Hat Enterprise Linux Server 7.0 (Maipo)
Kernel 3.16.0-rc7-24x7+ on an ppc64

ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
[  181.915991] WARNING: at ../kernel/sched/core.c:5881
[  181.915994] Modules linked in: sg cfg80211 rfkill nx_crypto ibmveth pseries_rng xfs libcrc32c sd_mod crc_t10dif crct10dif_common ibmvscsi scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[  181.916024] CPU: 4 PID: 1087 Comm: kworker/4:2 Not tainted 3.16.0-rc7-24x7+ #15
[  181.916034] Workqueue: events .topology_work_fn
[  181.916038] task: c000dbd4 ti: c000da40 task.ti: c000da40
[  181.916043] NIP: c00d7528 LR: c00d7578 CTR:
[  181.916047] REGS: c000da403580 TRAP: 0700 Not tainted (3.16.0-rc7-24x7+)
[  181.916051] MSR: 800100029032 <SF,EE,ME,IR,DR,RI> CR: 28484c24 XER:
[  181.916063] CFAR: c00d74f4 SOFTE: 1
GPR00: c00d7578 c000da403800 c0eaa7f0 0800
GPR04: 0800 0800 c09cf878
GPR08: c09cf880 0001 0010
GPR12: cebe1200 0800 c000cc2f
GPR16: c0ef0a68 0078 c000e500 0078
GPR20: 0001 c000cc2f 0001
GPR24: c0db4402 000f c000dea39300
GPR28: c0ef0ae0 c000e544 c0ef4f7c
[  181.916146] NIP [c00d7528] .build_sched_domains+0xc28/0xd90
[  181.916151] LR [c00d7578] .build_sched_domains+0xc78/0xd90
[  181.916155] Call Trace:
[  181.916159] [c000da403800] [c00d7578] .build_sched_domains+0xc78/0xd90 (unreliable)
[  181.916166] [c000da403950] [c00d7950] .partition_sched_domains+0x260/0x3f0
[  181.916175] [c000da403a30] [c0141704] .rebuild_sched_domains_locked+0x54/0x70
[  181.916182] [c000da403ab0] [c0143a98] .rebuild_sched_domains+0x28/0x50
[  181.916188] [c000da403b30] [c004f250] .topology_work_fn+0x10/0x30
[  181.916194] [c000da403ba0] [c00b7100] .process_one_work+0x1a0/0x4c0
[  181.916199] [c000da403c40] [c00b7970] .worker_thread+0x180/0x630
[  181.916205] [c000da403d30] [c00bfc88] .kthread+0x108/0x130
[  181.916214] [c000da403e30] [c000a3e4] .ret_from_kernel_thread+0x58/0x74
[  181.916220]
Instruction dump:
[  181.916223] 7f47492a e93c e90a0010 7d0a4378 7d4a482a 814a 2f8a 419e0008
[  181.916235] 7f48492a ebdd0010 7fc90074 7929d182 0b09 4814 6000 6000
[  181.916245] ---[ end trace 6e9d20016598c36c ]---
[  181.916253] Unable to handle kernel paging request for data at address 0x0018
[  181.916257] Faulting instruction address: 0xc039d1c0
[  181.916263] Oops: Kernel access of bad area, sig: 11 [#1]
[  181.916267] SMP NR_CPUS=2048 NUMA pSeries
[  181.916271] Modules linked in: sg cfg80211 rfkill nx_crypto ibmveth pseries_rng xfs libcrc32c sd_mod crc_t10dif crct10dif_common ibmvscsi scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[  181.916293] CPU: 4 PID: 1087 Comm: kworker/4:2 Tainted: G        W  3.16.0-rc7-24x7+ #15
[  181.916299] Workqueue: