On Fri, 2014-08-01 at 14:24 -0700, Sukadev Bhattiprolu wrote:
> Dietmar Eggemann [dietmar.eggem...@arm.com] wrote:
> | > ltcbrazos2-lp07 login: [ 181.915974] ------------[ cut here ]------------
> | > [ 181.915991] WARNING: at ../kernel/sched/core.c:5881
> |
> | This warning indicates the problem. One of the struct sched_domains does
> | not have its groups member set.
> |
> | And it's happening during a rebuild of the sched domain hierarchy, not
> | during the initial build.
> |
> | You could run your system with the following patch-let (on top of
> | https://lkml.org/lkml/2014/7/17/288) w/ and w/o the perf related
> | patches (w/ CONFIG_SCHED_DEBUG enabled).
> |
> | @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> |  {
> |          struct sched_group *sg = sd->groups;
> |
> | +#ifdef CONFIG_SCHED_DEBUG
> | +        printk("sd name: %s span: %pc\n", sd->name, sd->span);
> | +#endif
> |          WARN_ON(!sg);
> |
> |          do {
> |
> | This will show if the rebuild of the sched domain hierarchy happens on
> | both systems and hopefully indicate for which sched_domain the
> | sd->groups is not set.
>
> Thanks for the patch. It appears that the NUMA sched domain does not
> have the sd->groups set. Snippet of the error (with your patch and
> Peter's patch):
>
> [ 181.914494] build_sched_groups: got group c000000006da0000 with cpus:
> [ 181.914498] build_sched_groups: got group c0000000dd830000 with cpus:
> [ 181.915234] sd name: SMT span: 8-15
> [ 181.915239] sd name: DIE span: 0-7
> [ 181.915242] sd name: NUMA span: 0-15
> [ 181.915250] ------------[ cut here ]------------
> [ 181.915253] WARNING: at ../kernel/sched/core.c:5891
>
> Patched code:
>
> 5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> 5885 {
> 5886         struct sched_group *sg = sd->groups;
> 5887
> 5888 #ifdef CONFIG_SCHED_DEBUG
> 5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
> 5890 #endif
> 5891         WARN_ON(!sg);
>
> Complete log below.
>
> I was able to bisect it down to this patch in the 24x7 patchset:
>
> https://lkml.org/lkml/2014/5/27/804
>
> I replaced the kfree(page) calls in the patch with
> kmem_cache_free(hv_page_cache, page).
>
> The problem seems to disappear if the call to create_events_from_catalog()
> in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.
Is that patch just clobbering memory it doesn't own and corrupting the
scheduler data structures?

cheers
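For readers following the thread, here is a minimal userspace sketch of the check Dietmar's patch-let performs: print each domain's name and span, then flag any domain whose groups pointer is NULL (the condition WARN_ON(!sg) fires on). The struct layouts, the pre-formatted span strings, and the check_domains() helper are hypothetical simplifications; the real kernel walks the hierarchy through sd->parent and formats spans from a cpumask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: only the fields the patch-let touches are modelled. */
struct sched_group {
	int capacity;                 /* placeholder payload */
};

struct sched_domain {
	const char *name;             /* e.g. "SMT", "DIE", "NUMA" */
	const char *span;             /* CPU list, already formatted as text */
	struct sched_group *groups;   /* NULL is what WARN_ON(!sg) catches */
};

/*
 * Mirrors the debug output: print name and span for every domain, then
 * report any domain with no groups. Returns the number of such domains.
 */
static int check_domains(const struct sched_domain *sd, size_t n)
{
	int missing = 0;

	for (size_t i = 0; i < n; i++) {
		printf("sd name: %s span: %s\n", sd[i].name, sd[i].span);
		if (!sd[i].groups) {
			printf("WARNING: sd %s has no groups\n", sd[i].name);
			missing++;
		}
	}
	return missing;
}
```

Fed the three domains from the log above with only NUMA lacking groups, check_domains() would print the same three "sd name:" lines followed by a warning for NUMA, matching where the real trace stops.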