Re: scheduler crash on Power

2014-08-04 Thread Dietmar Eggemann
On 04/08/14 04:20, Michael Ellerman wrote:
 On Fri, 2014-08-01 at 14:24 -0700, Sukadev Bhattiprolu wrote:
 Dietmar Eggemann [dietmar.eggem...@arm.com] wrote:
 |  ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
 |  [  181.915991] WARNING: at ../kernel/sched/core.c:5881
 | 
 | This warning indicates the problem. One of the struct sched_domains does
 | not have its groups member set.
 | 
 | And it's happening during a rebuild of the sched domain hierarchy, not
 | during the initial build.
 | 
 | You could run your system with the following patch-let (on top of
 | https://lkml.org/lkml/2014/7/17/288)  w/ and w/o the perf related
 | patches (w/ CONFIG_SCHED_DEBUG enabled).
 | 
 | @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 |  {
 |         struct sched_group *sg = sd->groups;
 | 
 | +#ifdef CONFIG_SCHED_DEBUG
 | +       printk("sd name: %s span: %pc\n", sd->name, sd->span);
 | +#endif
 |         WARN_ON(!sg);
 | 
 |         do {
 | 
 | This will show if the rebuild of the sched domain hierarchy happens on
 | both systems and hopefully indicate for which sched_domain the
 | sd->groups is not set.

 Thanks for the patch. It appears that the NUMA sched domain does not
 have the sd->groups set - snippet of the error (with your patch and
 Peter's patch)

 [  181.914494] build_sched_groups: got group c6da with cpus: 
 [  181.914498] build_sched_groups: got group c000dd83 with cpus: 
 [  181.915234] sd name: SMT span: 8-15
 [  181.915239] sd name: DIE span: 0-7
 [  181.915242] sd name: NUMA span: 0-15
 [  181.915250] ------------[ cut here ]------------
 [  181.915253] WARNING: at ../kernel/sched/core.c:5891

 Patched code:

 5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 5885 {
 5886         struct sched_group *sg = sd->groups;
 5887 
 5888 #ifdef CONFIG_SCHED_DEBUG
 5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
 5890 #endif
 5891         WARN_ON(!sg);

 Complete log below.

 I was able to bisect it down to this patch in the 24x7 patchset:

  https://lkml.org/lkml/2014/5/27/804

 I replaced the kfree(page) calls in the patch with
 kmem_cache_free(hv_page_cache, page).

 The problem seems to disappear if the call to create_events_from_catalog()
 in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.
 
 Is that patch just clobbering memory it doesn't own and corrupting the
 scheduler data structures?

Quite likely. When the system comes up initially, it has the SMT and DIE
sched domain levels:

...
[0.033832] build_sched_groups: got group c000e7d5 with cpus:
[0.033835] build_sched_groups: got group c000e7d8 with cpus:
[0.033844] sd name: SMT span: 8-15
[0.033847] sd name: DIE span: 0-15  <-- !!!
[0.033850] sd name: SMT span: 8-15
[0.033853] sd name: DIE span: 0-15
...

and the cpu mask of DIE spans all CPUs '0-15'.

Then during the rebuild of the sched domain hierarchy, this looks very
different:

...
[  181.914494] build_sched_groups: got group c6da with cpus:
[  181.914498] build_sched_groups: got group c000dd83 with cpus:
[  181.915234] sd name: SMT span: 8-15
[  181.915239] sd name: DIE span: 0-7   <-- !!!
[  181.915242] sd name: NUMA span: 0-15
...

The cpu mask of the DIE level is all of a sudden '0-7', which is clearly
wrong.

So I suspect that the sched_domain_mask_f mask function for the DIE
level, cpu_cpu_mask(), returns a wrong value during this rebuild.
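
For reference, in 3.16 cpu_cpu_mask() is simply the NUMA node mask of
the CPU (from include/linux/topology.h):

static inline const struct cpumask *cpu_cpu_mask(int cpu)
{
        return cpumask_of_node(cpu_to_node(cpu));
}

So if the cpu-to-node mapping changes (or has been corrupted) before
the rebuild, the DIE span shrinks with it. Note that your stack trace
comes out of .topology_work_fn, which is what updates the NUMA
topology on pseries.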

Could be checked with this little patch-let:

@@ -6467,6 +6467,12 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
	if (!sd)
		return child;

+	printk("%s: cpu: %d level: %s cpu_map: %pc tl->mask: %pc\n",
+		__func__,
+		cpu, tl->name,
+		cpu_map,
+		tl->mask(cpu));
+
	cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
	if (child) {
		sd->level = child->level + 1;


Should give you output similar to this:

...
build_sched_domain: cpu: 0 level: GMC cpu_map: 0-4 tl->mask: 0
build_sched_domain: cpu: 0 level: MC cpu_map: 0-4 tl->mask: 0-1
build_sched_domain: cpu: 0 level: DIE cpu_map: 0-4 tl->mask: 0-4
build_sched_domain: cpu: 1 level: GMC cpu_map: 0-4 tl->mask: 1
build_sched_domain: cpu: 1 level: MC cpu_map: 0-4 tl->mask: 0-1
build_sched_domain: cpu: 1 level: DIE cpu_map: 0-4 tl->mask: 0-4
...

 
 cheers
 
 

Re: scheduler crash on Power

2014-08-03 Thread Michael Ellerman
On Fri, 2014-08-01 at 14:24 -0700, Sukadev Bhattiprolu wrote:
 Dietmar Eggemann [dietmar.eggem...@arm.com] wrote:
 |  ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
 |  [  181.915991] WARNING: at ../kernel/sched/core.c:5881
 | 
 | This warning indicates the problem. One of the struct sched_domains does
 | not have its groups member set.
 | 
 | And it's happening during a rebuild of the sched domain hierarchy, not
 | during the initial build.
 | 
 | You could run your system with the following patch-let (on top of
 | https://lkml.org/lkml/2014/7/17/288)  w/ and w/o the perf related
 | patches (w/ CONFIG_SCHED_DEBUG enabled).
 | 
 | @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 |  {
 |         struct sched_group *sg = sd->groups;
 | 
 | +#ifdef CONFIG_SCHED_DEBUG
 | +       printk("sd name: %s span: %pc\n", sd->name, sd->span);
 | +#endif
 |         WARN_ON(!sg);
 | 
 |         do {
 | 
 | This will show if the rebuild of the sched domain hierarchy happens on
 | both systems and hopefully indicate for which sched_domain the
 | sd->groups is not set.
 
 Thanks for the patch. It appears that the NUMA sched domain does not
 have the sd->groups set - snippet of the error (with your patch and
 Peter's patch)
 
 [  181.914494] build_sched_groups: got group c6da with cpus: 
 [  181.914498] build_sched_groups: got group c000dd83 with cpus: 
 [  181.915234] sd name: SMT span: 8-15
 [  181.915239] sd name: DIE span: 0-7
 [  181.915242] sd name: NUMA span: 0-15
 [  181.915250] ------------[ cut here ]------------
 [  181.915253] WARNING: at ../kernel/sched/core.c:5891
 
 Patched code:
 
  5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
  5885 {
  5886         struct sched_group *sg = sd->groups;
  5887 
  5888 #ifdef CONFIG_SCHED_DEBUG
  5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
  5890 #endif
  5891         WARN_ON(!sg);
 
 Complete log below.
 
  I was able to bisect it down to this patch in the 24x7 patchset:
 
   https://lkml.org/lkml/2014/5/27/804
 
 I replaced the kfree(page) calls in the patch with
 kmem_cache_free(hv_page_cache, page).
 
 The problem seems to disappear if the call to create_events_from_catalog()
 in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.

Is that patch just clobbering memory it doesn't own and corrupting the
scheduler data structures?
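
A minimal sketch of that failure mode (hypothetical cache and sizes,
not the actual 24x7 code): filling a whole 4K page into a smaller slab
object silently tramples the neighbouring objects, and their owners --
the scheduler included -- only crash much later:

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>

#define CATALOG_PAGE_LEN 4096

static int read_catalog_page(struct kmem_cache *cache)
{
        /* BUG (illustrative): the object is smaller than the write below */
        void *buf = kmem_cache_alloc(cache, GFP_KERNEL); /* e.g. 256 bytes */

        if (!buf)
                return -ENOMEM;

        /* writes 4096 bytes into the smaller object; the overflow lands
         * in adjacent slab objects and corrupts whoever owns them */
        memset(buf, 0, CATALOG_PAGE_LEN);

        kmem_cache_free(cache, buf);
        return 0;
}

Corruption like that typically shows up far away from the offending
code, which would fit the symptoms here.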

cheers



Re: scheduler crash on Power

2014-08-01 Thread Sukadev Bhattiprolu
Dietmar Eggemann [dietmar.eggem...@arm.com] wrote:
|  ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
|  [  181.915991] WARNING: at ../kernel/sched/core.c:5881
| 
| This warning indicates the problem. One of the struct sched_domains does
| not have its groups member set.
| 
| And it's happening during a rebuild of the sched domain hierarchy, not
| during the initial build.
| 
| You could run your system with the following patch-let (on top of
| https://lkml.org/lkml/2014/7/17/288)  w/ and w/o the perf related
| patches (w/ CONFIG_SCHED_DEBUG enabled).
| 
| @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
|  {
|         struct sched_group *sg = sd->groups;
| 
| +#ifdef CONFIG_SCHED_DEBUG
| +       printk("sd name: %s span: %pc\n", sd->name, sd->span);
| +#endif
|         WARN_ON(!sg);
| 
|         do {
| 
| This will show if the rebuild of the sched domain hierarchy happens on
| both systems and hopefully indicate for which sched_domain the
| sd->groups is not set.

Thanks for the patch. It appears that the NUMA sched domain does not
have the sd->groups set - snippet of the error (with your patch and
Peter's patch)

[  181.914494] build_sched_groups: got group c6da with cpus: 
[  181.914498] build_sched_groups: got group c000dd83 with cpus: 
[  181.915234] sd name: SMT span: 8-15
[  181.915239] sd name: DIE span: 0-7
[  181.915242] sd name: NUMA span: 0-15
[  181.915250] ------------[ cut here ]------------
[  181.915253] WARNING: at ../kernel/sched/core.c:5891

Patched code:

5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
5885 {
5886         struct sched_group *sg = sd->groups;
5887 
5888 #ifdef CONFIG_SCHED_DEBUG
5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
5890 #endif
5891         WARN_ON(!sg);

Complete log below.

I was able to bisect it down to this patch in the 24x7 patchset:

https://lkml.org/lkml/2014/5/27/804

I replaced the kfree(page) calls in the patch with
kmem_cache_free(hv_page_cache, page).

The problem seems to disappear if the call to create_events_from_catalog()
in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.
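
That kfree() -> kmem_cache_free() replacement is worth keeping
regardless: memory from a dedicated kmem_cache should go back through
the same cache. A rough sketch of the pairing (hypothetical caller,
not the actual 24x7 code):

#include <linux/slab.h>

static struct kmem_cache *hv_page_cache;

static void catalog_page_example(void)
{
        void *page = kmem_cache_alloc(hv_page_cache, GFP_KERNEL);

        if (!page)
                return;

        /* ... hand the page to the hypervisor call ... */

        /* free back into the cache it came from; mixing in kfree()
         * is at best fragile across the different slab allocators */
        kmem_cache_free(hv_page_cache, page);
}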

While the patched kernel often crashes before the first ssh into it,
the unpatched kernel has been stable through multiple kernel builds.

Are there any scheduler-specific tests I can run on the unpatched kernel?

Sukadev

Complete log:

OF stdout device is: /vdevice/vty@3000
Preparing to boot Linux version 3.16.0-rc7-24x7-dbg+ 
(r...@ltcbrazos2-lp07.austin.ibm.com) (gcc version 4.8.2 20140120 (Red Hat 
4.8.2-16) (GCC) ) #38 SMP Fri Aug 1 12:14:18 EDT 2014
Detected machine type: 0101
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/vmlinux-3.16.0-rc7-24x7-dbg+ 
root=UUID=e72c49fa-e137-43ff-ab41-44f3124572eb ro vconsole.keymap=us 
rd.lvm.lv=rhel_ltcbrazos2-lp07/root rd.lvm.lv=rhel_ltcbrazos2-lp07/swap 
crashkernel=auto vconsole.font=latarcyrheb-sun16
memory layout at init:
  memory_limit :  (16 MB aligned)
  alloc_bottom : 0a5f
  alloc_top: 1000
  alloc_top_hi : 1000
  rmo_top  : 1000
  ram_top  : 1000
instantiating rtas at 0x0ee2... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0a60 - 0x0a6016d4
Device tree struct  0x0a61 - 0x0a64
Calling quiesce...
returning from prom_init
[0.00] crashkernel: memory value expected
[0.00] Using pSeries machine description
[0.00] Page sizes from device-tree:
[0.00] base_shift=12: shift=12, sllp=0x, avpnm=0x, 
tlbiel=1, penc=0
[0.00] base_shift=12: shift=16, sllp=0x, avpnm=0x, 
tlbiel=1, penc=7
[0.00] base_shift=12: shift=24, sllp=0x, avpnm=0x, 
tlbiel=1, penc=56
[0.00] base_shift=16: shift=16, sllp=0x0110, avpnm=0x, 
tlbiel=1, penc=1
[0.00] base_shift=16: shift=24, sllp=0x0110, avpnm=0x, 
tlbiel=1, penc=8
[0.00] base_shift=24: shift=24, sllp=0x0100, avpnm=0x0001, 
tlbiel=0, penc=0
[0.00] base_shift=34: shift=34, sllp=0x0120, avpnm=0x07ff, 
tlbiel=0, penc=3
[0.00] Using 1TB segments
[0.00] kvm_cma: CMA: reserved 256 MiB
[0.00] Found initrd at 0xc9a0:0xca5ebcf8
[0.00] bootconsole [udbg0] enabled
[0.00] Partition configured for 32 cpus.
[0.00] CPU maps initialized for 8 threads per core
 -> smp_release_cpus()
spinning_secondaries = 15
 <- smp_release_cpus()
[0.00] Starting Linux PPC64 #38 SMP Fri Aug 1 12:14:18 EDT 2014

Re: scheduler crash on Power

2014-07-31 Thread Michael Ellerman
On Wed, 2014-07-30 at 00:22 -0700, Sukadev Bhattiprolu wrote:
 I am getting this crash on a Powerpc system using 3.16.0-rc7 kernel plus
 some patches related to perf (24x7 counters) that Cody Schafer posted here:
 
   https://lkml.org/lkml/2014/5/27/768
 
 I don't get the crash on an unpatched kernel though.

You mean you don't get the crash on 3.16-rc7?

I find it hard to believe those 24x7 patches are causing this.
 
 I am also attaching the debug messages that Peterz added
 here: https://lkml.org/lkml/2014/7/17/288

I don't see any FAIL messages in your log, so it looks like you're not hitting
the case that patch was looking for?

 Appreciate any debug suggestions.

Reproduce on an unpatched kernel.

cheers



scheduler crash on Power

2014-07-30 Thread Sukadev Bhattiprolu

I am getting this crash on a PowerPC system using a 3.16.0-rc7 kernel plus
some patches related to perf (24x7 counters) that Cody Schafer posted here:

https://lkml.org/lkml/2014/5/27/768

I don't get the crash on an unpatched kernel though.

I have been staring at the perf event patches, but can't find anything
impacting the scheduler. Besides, the patches had worked on a 3.16.0-rc2
kernel on a different Power system.

The crash occurs on an idle system, a minute or two after booting to
runlevel 3.

kernel/sched/core.c:

---
5877 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
5878 {
5879         struct sched_group *sg = sd->groups;
5880 
5881         WARN_ON(!sg);
5882 
5883         do {
5884                 sg->group_weight = cpumask_weight(sched_group_cpus(sg));

---
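
For what it's worth, the two failures line up: if sd->groups was never
set, sg is NULL when the WARN_ON at 5881 fires, and the loop at 5883
dereferences it anyway on the first iteration (illustrative, exact
offsets unchecked):

        /* with sg == NULL this reads through to_cpumask(sg->cpumask),
         * i.e. a few bytes above address 0 -- the kind of access that
         * could produce the "data at address 0x0018" oops further down */
        sg->group_weight = cpumask_weight(sched_group_cpus(sg));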


I tried applying the patch discussed in https://lkml.org/lkml/2014/7/16/386
but it doesn't seem to help.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bc1638b..50702a8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5842,6 +5842,8 @@ build_sched_groups(struct sched_domain *sd, int cpu)
 			continue;
 
 		group = get_group(i, sdd, &sg);
+		cpumask_clear(sched_group_cpus(sg));
+		sg->sgc->capacity = 0;
 		cpumask_setall(sched_group_mask(sg));
 
 		for_each_cpu(j, span) {


I am also attaching the debug messages that Peterz added
here: https://lkml.org/lkml/2014/7/17/288

Appreciate any debug suggestions.

Sukadev



Red Hat Enterprise Linux Server 7.0 (Maipo)
Kernel 3.16.0-rc7-24x7+ on an ppc64

ltcbrazos2-lp07 login: 

Red Hat Enterprise Linux Server 7.0 (Maipo)
Kernel 3.16.0-rc7-24x7+ on an ppc64

ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
[  181.915991] WARNING: at ../kernel/sched/core.c:5881
[  181.915994] Modules linked in: sg cfg80211 rfkill nx_crypto ibmveth 
pseries_rng xfs libcrc32c sd_mod crc_t10dif crct10dif_common ibmvscsi 
scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[  181.916024] CPU: 4 PID: 1087 Comm: kworker/4:2 Not tainted 3.16.0-rc7-24x7+ 
#15
[  181.916034] Workqueue: events .topology_work_fn
[  181.916038] task: c000dbd4 ti: c000da40 task.ti: 
c000da40
[  181.916043] NIP: c00d7528 LR: c00d7578 CTR: 
[  181.916047] REGS: c000da403580 TRAP: 0700   Not tainted  
(3.16.0-rc7-24x7+)
[  181.916051] MSR: 800100029032 <SF,EE,ME,IR,DR,RI>  CR: 28484c24  XER: 

[  181.916063] CFAR: c00d74f4 SOFTE: 1 
GPR00: c00d7578 c000da403800 c0eaa7f0 0800 
GPR04: 0800 0800  c09cf878 
GPR08: c09cf880 0001 0010  
GPR12:  cebe1200 0800 c000cc2f 
GPR16: c0ef0a68 0078 c000e500 0078 
GPR20:  0001 c000cc2f 0001 
GPR24: c0db4402 000f  c000dea39300 
GPR28: c0ef0ae0 c000e544  c0ef4f7c 
[  181.916146] NIP [c00d7528] .build_sched_domains+0xc28/0xd90
[  181.916151] LR [c00d7578] .build_sched_domains+0xc78/0xd90
[  181.916155] Call Trace:
[  181.916159] [c000da403800] [c00d7578] 
.build_sched_domains+0xc78/0xd90 (unreliable)
[  181.916166] [c000da403950] [c00d7950] 
.partition_sched_domains+0x260/0x3f0
[  181.916175] [c000da403a30] [c0141704] 
.rebuild_sched_domains_locked+0x54/0x70
[  181.916182] [c000da403ab0] [c0143a98] 
.rebuild_sched_domains+0x28/0x50
[  181.916188] [c000da403b30] [c004f250] .topology_work_fn+0x10/0x30
[  181.916194] [c000da403ba0] [c00b7100] 
.process_one_work+0x1a0/0x4c0
[  181.916199] [c000da403c40] [c00b7970] .worker_thread+0x180/0x630
[  181.916205] [c000da403d30] [c00bfc88] .kthread+0x108/0x130
[  181.916214] [c000da403e30] [c000a3e4] 
.ret_from_kernel_thread+0x58/0x74
[  181.916220] Instruction dump:
[  181.916223] 7f47492a e93c e90a0010 7d0a4378 7d4a482a 814a 2f8a 
419e0008 
[  181.916235] 7f48492a ebdd0010 7fc90074 7929d182 0b09 4814 6000 
6000 
[  181.916245] ---[ end trace 6e9d20016598c36c ]---
[  181.916253] Unable to handle kernel paging request for data at address 
0x0018
[  181.916257] Faulting instruction address: 0xc039d1c0
[  181.916263] Oops: Kernel access of bad area, sig: 11 [#1]
[  181.916267] SMP NR_CPUS=2048 NUMA pSeries
[  181.916271] Modules linked in: sg cfg80211 rfkill nx_crypto ibmveth 
pseries_rng xfs libcrc32c sd_mod crc_t10dif crct10dif_common ibmvscsi 
scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[  181.916293] CPU: 4 PID: 1087 Comm: kworker/4:2 Tainted: GW 
3.16.0-rc7-24x7+ #15
[  181.916299] Workqueue: