Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
On Mon, Jun 05, 2017 at 02:38:31PM -0700, Andrew Morton wrote:
> On Mon, 5 Jun 2017 14:35:11 -0400 Johannes Weiner wrote:
>
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5107,6 +5107,7 @@ static void build_zonelists(pg_data_t *pgdat)
> >   */
> >  static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
> >  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
> > +static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
> >  static void setup_zone_pageset(struct zone *zone);
>
> There's a few kb there. It just sits evermore unused after boot?

It's not the greatest, but it's nothing new. All the node stats we have
now used to be in the zone, i.e. the then bigger boot_pageset, before we
moved them to the node level. It just re-adds static boot time space for
them now.

Of course, if somebody has an idea on how to elegantly reuse that memory
after boot, that'd be cool. But we've lived with that footprint for the
longest time, so I don't think it's a showstopper.
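The reuse idea discussed here - reclaiming the boot-time stats buffer once the dynamic allocator takes over - can be sketched in plain user-space C. This is only an illustration of the pattern, not the kernel's approach (the kernel keeps the buffer static and accepts the few-kb footprint); all names here (`stats_early_init`, `stats_handover`, etc.) are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct boot_stats { long counters[8]; };

static struct boot_stats *boot_buf;   /* temporary boot-time buffer */
static struct boot_stats *live;       /* where accounting currently goes */

/* early boot: account against the temporary buffer */
static void stats_early_init(void)
{
    boot_buf = calloc(1, sizeof(*boot_buf));
    live = boot_buf;
}

/* handover: migrate boot-time counts into dynamic storage, free the buffer */
static void stats_handover(void)
{
    struct boot_stats *dyn = calloc(1, sizeof(*dyn));

    memcpy(dyn, boot_buf, sizeof(*dyn));  /* preserve early accounting */
    live = dyn;
    free(boot_buf);                       /* boot memory reclaimed */
    boot_buf = NULL;
}

static void stats_add(int item, long delta) { live->counters[item] += delta; }
static long stats_read(int item) { return live->counters[item]; }
```

In the kernel the equivalent buffer is a static per-CPU variable, so it cannot simply be freed; pointers established at boot would have to be re-aimed at the dynamic copy, which is the "elegant reuse" that nobody has implemented.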
Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
On Tue, Jun 06, 2017 at 09:15:48PM +1000, Michael Ellerman wrote:
> But today's linux-next is OK. So I must have missed a fix when testing
> this in isolation.
>
> commit d94b69d9a3f8139e6d5f5d03c197d8004de3905a
> Author:     Johannes Weiner
> AuthorDate: Tue Jun 6 09:19:50 2017 +1000
> Commit:     Stephen Rothwell
> CommitDate: Tue Jun 6 09:19:50 2017 +1000
>
>     mm: vmstat: move slab statistics from zone to node counters fix
>
>     Unable to handle kernel paging request at virtual address 2e116007
>     pgd = c0004000
>     [2e116007] *pgd=
>     Internal error: Oops: 5 [#1] SMP ARM
>
> ...
>
> Booted to userspace:
>
> $ uname -a
> Linux buildroot 4.12.0-rc4-gcc-5.4.1-00130-gd94b69d9a3f8 #354 SMP Tue Jun 6
> 20:44:42 AEST 2017 ppc64le GNU/Linux

Thanks for verifying!
Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
Michael Ellerman writes:
> Johannes Weiner writes:
>> From 89ed86b5b538d8debd3c29567d7e1d31257fa577 Mon Sep 17 00:00:00 2001
>> From: Johannes Weiner
>> Date: Mon, 5 Jun 2017 14:12:15 -0400
>> Subject: [PATCH] mm: vmstat: move slab statistics from zone to node counters
>>  fix
>>
>> Unable to handle kernel paging request at virtual address 2e116007
>> pgd = c0004000
>> [2e116007] *pgd=
>> Internal error: Oops: 5 [#1] SMP ARM
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #200
>> Hardware name: Generic DRA74X (Flattened Device Tree)
>> task: c0d0adc0 task.stack: c0d0
>> PC is at __mod_node_page_state+0x2c/0xc8
>> LR is at __per_cpu_offset+0x0/0x8
>> pc : []    lr : []    psr: 60d3
>> sp : c0d01eec ip :  fp : c15782f4
>> r10:  r9 : c1591280 r8 : 4000
>> r7 : 0001 r6 : 0006 r5 : 2e116000 r4 : 0007
>> r3 : 0007 r2 : 0001 r1 : 0006 r0 : c0dc27c0
>> Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment none
>> Control: 10c5387d Table: 8000406a DAC: 0051
>> Process swapper (pid: 0, stack limit = 0xc0d00218)
>> Stack: (0xc0d01eec to 0xc0d02000)
>> 1ee0: 60d3 c0dc27c0 c0271efc 0001 c0d58864
>> 1f00: ef47 8000 4000 c029fbb0 0100 c1572b5c 2000
>> 1f20: 0001 0001 8000 c029f584 c0d58864 8000 8000
>> 1f40: 01008000 c0c23790 c15782f4 a0d3 c0d58864 c02a0364 c0819388
>> 1f60: c0d58864 00c0 0100 c1572a58 c0aa57a4 0080 2000 c0dca000
>> 1f80: efffe980 c0c53a48 c0c23790 c1572a58 c0c59e48 c0c59de8 c1572b5c
>> 1fa0: c0dca000 c0c257a4 c0dca000 c0d07940 c0dca000 c0c00a9c
>> 1fc0: c0c00680 c0c53a48 c0dca214 c0d07958
>> 1fe0: c0c53a44 c0d0caa4 8000406a 412fc0f2 8000807c
>> [] (__mod_node_page_state) from [] (mod_node_page_state+0x2c/0x4c)
>> [] (mod_node_page_state) from [] (cache_alloc_refill+0x5b8/0x828)
>> [] (cache_alloc_refill) from [] (kmem_cache_alloc+0x24c/0x2d0)
>> [] (kmem_cache_alloc) from [] (create_kmalloc_cache+0x20/0x8c)
>> [] (create_kmalloc_cache) from [] (kmem_cache_init+0xac/0x11c)
>> [] (kmem_cache_init) from [] (start_kernel+0x1b8/0x3c0)
>> [] (start_kernel) from [<8000807c>] (0x8000807c)
>> Code: e79e5103 e28c3001 e0833001 e1a04003 (e19440d5)
>> ---[ end trace ]---
>
> Just to be clear that's not my call trace.
>
>> The zone counters work earlier than the node counters because the
>> zones have special boot pagesets, whereas the nodes do not.
>>
>> Add boot nodestats against which we account until the dynamic per-cpu
>> allocator is available.
>
> This isn't working for me. I applied it on top of next-20170605, I still
> get an oops:

But today's linux-next is OK. So I must have missed a fix when testing
this in isolation.

commit d94b69d9a3f8139e6d5f5d03c197d8004de3905a
Author:     Johannes Weiner
AuthorDate: Tue Jun 6 09:19:50 2017 +1000
Commit:     Stephen Rothwell
CommitDate: Tue Jun 6 09:19:50 2017 +1000

    mm: vmstat: move slab statistics from zone to node counters fix

    Unable to handle kernel paging request at virtual address 2e116007
    pgd = c0004000
    [2e116007] *pgd=
    Internal error: Oops: 5 [#1] SMP ARM

...

Booted to userspace:

$ uname -a
Linux buildroot 4.12.0-rc4-gcc-5.4.1-00130-gd94b69d9a3f8 #354 SMP Tue Jun 6
20:44:42 AEST 2017 ppc64le GNU/Linux

cheers
Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
Johannes Weiner writes:
> From 89ed86b5b538d8debd3c29567d7e1d31257fa577 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner
> Date: Mon, 5 Jun 2017 14:12:15 -0400
> Subject: [PATCH] mm: vmstat: move slab statistics from zone to node counters
>  fix
>
> Unable to handle kernel paging request at virtual address 2e116007
> pgd = c0004000
> [2e116007] *pgd=
> Internal error: Oops: 5 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #200
> Hardware name: Generic DRA74X (Flattened Device Tree)
> task: c0d0adc0 task.stack: c0d0
> PC is at __mod_node_page_state+0x2c/0xc8
> LR is at __per_cpu_offset+0x0/0x8
> pc : []    lr : []    psr: 60d3
> sp : c0d01eec ip :  fp : c15782f4
> r10:  r9 : c1591280 r8 : 4000
> r7 : 0001 r6 : 0006 r5 : 2e116000 r4 : 0007
> r3 : 0007 r2 : 0001 r1 : 0006 r0 : c0dc27c0
> Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment none
> Control: 10c5387d Table: 8000406a DAC: 0051
> Process swapper (pid: 0, stack limit = 0xc0d00218)
> Stack: (0xc0d01eec to 0xc0d02000)
> 1ee0: 60d3 c0dc27c0 c0271efc 0001 c0d58864
> 1f00: ef47 8000 4000 c029fbb0 0100 c1572b5c 2000
> 1f20: 0001 0001 8000 c029f584 c0d58864 8000 8000
> 1f40: 01008000 c0c23790 c15782f4 a0d3 c0d58864 c02a0364 c0819388
> 1f60: c0d58864 00c0 0100 c1572a58 c0aa57a4 0080 2000 c0dca000
> 1f80: efffe980 c0c53a48 c0c23790 c1572a58 c0c59e48 c0c59de8 c1572b5c
> 1fa0: c0dca000 c0c257a4 c0dca000 c0d07940 c0dca000 c0c00a9c
> 1fc0: c0c00680 c0c53a48 c0dca214 c0d07958
> 1fe0: c0c53a44 c0d0caa4 8000406a 412fc0f2 8000807c
> [] (__mod_node_page_state) from [] (mod_node_page_state+0x2c/0x4c)
> [] (mod_node_page_state) from [] (cache_alloc_refill+0x5b8/0x828)
> [] (cache_alloc_refill) from [] (kmem_cache_alloc+0x24c/0x2d0)
> [] (kmem_cache_alloc) from [] (create_kmalloc_cache+0x20/0x8c)
> [] (create_kmalloc_cache) from [] (kmem_cache_init+0xac/0x11c)
> [] (kmem_cache_init) from [] (start_kernel+0x1b8/0x3c0)
> [] (start_kernel) from [<8000807c>] (0x8000807c)
> Code: e79e5103 e28c3001 e0833001 e1a04003 (e19440d5)
> ---[ end trace ]---

Just to be clear that's not my call trace.

> The zone counters work earlier than the node counters because the
> zones have special boot pagesets, whereas the nodes do not.
>
> Add boot nodestats against which we account until the dynamic per-cpu
> allocator is available.

This isn't working for me. I applied it on top of next-20170605, I still
get an oops:

$ qemu-system-ppc64 -M pseries -m 1G -kernel build/vmlinux -vga none -nographic
SLOF **
QEMU Starting
...
Linux version 4.12.0-rc3-gcc-5.4.1-next-20170605-dirty (mich...@ka3.ozlabs.ibm.com)
(gcc version 5.4.1 20170214 (Custom 2af61cd06c9fd8f5) ) #352 SMP Tue Jun 6 14:09:57 AEST 2017
...
PID hash table entries: 4096 (order: -1, 32768 bytes)
Memory: 1014592K/1048576K available (9920K kernel code, 1536K rwdata, 2608K rodata,
832K init, 1420K bss, 33984K reserved, 0K cma-reserved)
Unable to handle kernel paging request for data at address 0x0338
Faulting instruction address: 0xc02cf338
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-gcc-5.4.1-next-20170605-dirty #352
task: c0d11080 task.stack: c0e24000
NIP: c02cf338 LR: c02cf0dc CTR:
REGS: c0e279a0 TRAP: 0380 Not tainted (4.12.0-rc3-gcc-5.4.1-next-20170605-dirty)
MSR: 82001033 CR: 22482242 XER:
CFAR: c02cf6a0 SOFTE: 0
GPR00: c02cf0dc c0e27c20 c0e28300 c0003ffc6300
GPR04: c0e556f8 3f12
GPR08: c0ed3058 0330 ff80
GPR12: 28402824 cfd4 0060 00f540a8
GPR16: 00f540d8 fffd 3dc54ee0 0014
GPR20: c0b90e60 c0b90e90 2000
GPR24: 0401 0001 c0003e00
GPR28: 80010400 f00f8000 0006 c0cb4270
NIP [c02cf338] new_slab+0x338/0x770
LR [c02cf0dc] new_slab+0xdc/0x770
Call Trace:
[c0e27c20] [c02cf0dc] new_slab+0xdc/0x770 (unreliable)
[c0e27cf0] [c02d6bb4] __kmem_cache_create+0x1a4/0x6a0
[c0e27e00] [c0c73098] create_boot_cache+0x98/0xdc
[c0e27e80] [c0c77608] kmem_cache_init+0x5c/0x160
[c0e27f00]
Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
On Mon, 5 Jun 2017 14:35:11 -0400 Johannes Weiner wrote:

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5107,6 +5107,7 @@ static void build_zonelists(pg_data_t *pgdat)
>   */
>  static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
>  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
> +static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
>  static void setup_zone_pageset(struct zone *zone);

There's a few kb there. It just sits evermore unused after boot?
Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
On Thu, Jun 01, 2017 at 08:07:28PM +1000, Michael Ellerman wrote:
> Yury Norov writes:
>
> > On Wed, May 31, 2017 at 01:39:00PM +0200, Heiko Carstens wrote:
> >> On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote:
> >> > On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
> >> > > To re-implement slab cache vs. page cache balancing, we'll need the
> >> > > slab counters at the lruvec level, which, ever since lru reclaim was
> >> > > moved from the zone to the node, is the intersection of the node, not
> >> > > the zone, and the memcg.
> >> > >
> >> > > We could retain the per-zone counters for when the page allocator
> >> > > dumps its memory information on failures, and have counters on both
> >> > > levels - which on all but NUMA node 0 is usually redundant. But let's
> >> > > keep it simple for now and just move them. If anybody complains we can
> >> > > restore the per-zone counters.
> >> > >
> >> > > Signed-off-by: Johannes Weiner
> >> >
> >> > This patch causes an early boot crash on s390 (linux-next as of today).
> >> > CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
> >> > further into this yet, maybe you have an idea?
> >
> > The same on arm64.
>
> And powerpc. It looks like we need the following on top.

I can't reproduce the crash, but it's verifiable with WARN_ONs in the
vmstat functions that the nodestat array isn't properly initialized when
slab bootstraps:

---
From 89ed86b5b538d8debd3c29567d7e1d31257fa577 Mon Sep 17 00:00:00 2001
From: Johannes Weiner
Date: Mon, 5 Jun 2017 14:12:15 -0400
Subject: [PATCH] mm: vmstat: move slab statistics from zone to node counters
 fix

Unable to handle kernel paging request at virtual address 2e116007
pgd = c0004000
[2e116007] *pgd=
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #200
Hardware name: Generic DRA74X (Flattened Device Tree)
task: c0d0adc0 task.stack: c0d0
PC is at __mod_node_page_state+0x2c/0xc8
LR is at __per_cpu_offset+0x0/0x8
pc : []    lr : []    psr: 60d3
sp : c0d01eec ip :  fp : c15782f4
r10:  r9 : c1591280 r8 : 4000
r7 : 0001 r6 : 0006 r5 : 2e116000 r4 : 0007
r3 : 0007 r2 : 0001 r1 : 0006 r0 : c0dc27c0
Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 8000406a DAC: 0051
Process swapper (pid: 0, stack limit = 0xc0d00218)
Stack: (0xc0d01eec to 0xc0d02000)
1ee0: 60d3 c0dc27c0 c0271efc 0001 c0d58864
1f00: ef47 8000 4000 c029fbb0 0100 c1572b5c 2000
1f20: 0001 0001 8000 c029f584 c0d58864 8000 8000
1f40: 01008000 c0c23790 c15782f4 a0d3 c0d58864 c02a0364 c0819388
1f60: c0d58864 00c0 0100 c1572a58 c0aa57a4 0080 2000 c0dca000
1f80: efffe980 c0c53a48 c0c23790 c1572a58 c0c59e48 c0c59de8 c1572b5c
1fa0: c0dca000 c0c257a4 c0dca000 c0d07940 c0dca000 c0c00a9c
1fc0: c0c00680 c0c53a48 c0dca214 c0d07958
1fe0: c0c53a44 c0d0caa4 8000406a 412fc0f2 8000807c
[] (__mod_node_page_state) from [] (mod_node_page_state+0x2c/0x4c)
[] (mod_node_page_state) from [] (cache_alloc_refill+0x5b8/0x828)
[] (cache_alloc_refill) from [] (kmem_cache_alloc+0x24c/0x2d0)
[] (kmem_cache_alloc) from [] (create_kmalloc_cache+0x20/0x8c)
[] (create_kmalloc_cache) from [] (kmem_cache_init+0xac/0x11c)
[] (kmem_cache_init) from [] (start_kernel+0x1b8/0x3c0)
[] (start_kernel) from [<8000807c>] (0x8000807c)
Code: e79e5103 e28c3001 e0833001 e1a04003 (e19440d5)
---[ end trace ]---

The zone counters work earlier than the node counters because the
zones have special boot pagesets, whereas the nodes do not.

Add boot nodestats against which we account until the dynamic per-cpu
allocator is available.

Signed-off-by: Johannes Weiner
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5f89cfaddc4b..7f341f84b587 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5107,6 +5107,7 @@ static void build_zonelists(pg_data_t *pgdat)
  */
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
+static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 static void setup_zone_pageset(struct zone *zone);

 /*
@@ -6010,6 +6011,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 	spin_lock_init(&pgdat->lru_lock);
 	lruvec_init(node_lruvec(pgdat));

+	pgdat->per_cpu_nodestats = &boot_nodestats;
+
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, realsize, freesize, memmap_pages;
--
2.13.0
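The mechanics of the fix can be sketched in plain user-space C: aim the node's stats pointer at a static boot array so early accounting never dereferences uninitialized per-CPU storage, then switch to dynamically allocated storage once the allocator is up. The structure and function names below loosely mirror the kernel but are illustrative only, and the folding of boot-time deltas into the dynamic copy is a simplification of what vmstat actually does:

```c
#include <assert.h>
#include <stdlib.h>

enum { NR_NODE_ITEMS = 4 };

struct per_cpu_nodestat { long diff[NR_NODE_ITEMS]; };

struct pglist_data { struct per_cpu_nodestat *per_cpu_nodestats; };

/* static boot-time placeholder, analogous to boot_nodestats in the patch */
static struct per_cpu_nodestat boot_nodestats;

/* safe in both phases: the pointer is never NULL or garbage */
static void mod_node_state(struct pglist_data *pgdat, int item, long delta)
{
    pgdat->per_cpu_nodestats->diff[item] += delta;
}

/* early init: this assignment is the essence of the one-line fix */
static void node_early_init(struct pglist_data *pgdat)
{
    pgdat->per_cpu_nodestats = &boot_nodestats;
}

/* later: real storage becomes available; carry over boot-time deltas */
static int node_late_init(struct pglist_data *pgdat)
{
    struct per_cpu_nodestat *dyn = calloc(1, sizeof(*dyn));

    if (!dyn)
        return -1;
    *dyn = *pgdat->per_cpu_nodestats;
    pgdat->per_cpu_nodestats = dyn;
    return 0;
}
```

Without the early-init step, any slab allocation during `kmem_cache_init()` walks through an uninitialized stats pointer, which is exactly the `__mod_node_page_state` / `new_slab` oops reported on ARM, s390, arm64, and powerpc above.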
Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
Yury Norov writes:
> On Wed, May 31, 2017 at 01:39:00PM +0200, Heiko Carstens wrote:
>> On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote:
>> > On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
>> > > To re-implement slab cache vs. page cache balancing, we'll need the
>> > > slab counters at the lruvec level, which, ever since lru reclaim was
>> > > moved from the zone to the node, is the intersection of the node, not
>> > > the zone, and the memcg.
>> > >
>> > > We could retain the per-zone counters for when the page allocator
>> > > dumps its memory information on failures, and have counters on both
>> > > levels - which on all but NUMA node 0 is usually redundant. But let's
>> > > keep it simple for now and just move them. If anybody complains we can
>> > > restore the per-zone counters.
>> > >
>> > > Signed-off-by: Johannes Weiner
>> >
>> > This patch causes an early boot crash on s390 (linux-next as of today).
>> > CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
>> > further into this yet, maybe you have an idea?
>
> The same on arm64.

And powerpc.

cheers
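The zone-to-node move described in the quoted changelog can be illustrated with a minimal sketch: instead of a counter in each zone that has to be summed (and is redundant on all but NUMA node 0), a single counter lives at the node, matching the lruvec (node x memcg) granularity that reclaim now operates on. Field and function names here are illustrative, not the kernel's actual identifiers:

```c
#include <assert.h>

#define MAX_NR_ZONES 3

struct zone { long nr_slab; };          /* per-zone counter, before the move */

struct node {
    struct zone zones[MAX_NR_ZONES];
    long nr_slab;                       /* single node counter, after the move */
};

/* before: reading the node total means summing every zone */
static long node_slab_via_zones(const struct node *n)
{
    long sum = 0;

    for (int i = 0; i < MAX_NR_ZONES; i++)
        sum += n->zones[i].nr_slab;
    return sum;
}

/* after: charge and read at node granularity directly */
static void node_account_slab(struct node *n, long delta)
{
    n->nr_slab += delta;
}
```

The trade-off the changelog acknowledges is that the page allocator's failure dumps lose per-zone resolution for these counters; the per-zone copies could be restored later if that matters in practice.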
Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
On Wed, May 31, 2017 at 01:39:00PM +0200, Heiko Carstens wrote:
> On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote:
> > On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
> > > To re-implement slab cache vs. page cache balancing, we'll need the
> > > slab counters at the lruvec level, which, ever since lru reclaim was
> > > moved from the zone to the node, is the intersection of the node, not
> > > the zone, and the memcg.
> > >
> > > We could retain the per-zone counters for when the page allocator
> > > dumps its memory information on failures, and have counters on both
> > > levels - which on all but NUMA node 0 is usually redundant. But let's
> > > keep it simple for now and just move them. If anybody complains we can
> > > restore the per-zone counters.
> > >
> > > Signed-off-by: Johannes Weiner
> >
> > This patch causes an early boot crash on s390 (linux-next as of today).
> > CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
> > further into this yet, maybe you have an idea?

The same on arm64.

Yury
Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote: > On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote: > > To re-implement slab cache vs. page cache balancing, we'll need the > > slab counters at the lruvec level, which, ever since lru reclaim was > > moved from the zone to the node, is the intersection of the node, not > > the zone, and the memcg. > > > > We could retain the per-zone counters for when the page allocator > > dumps its memory information on failures, and have counters on both > > levels - which on all but NUMA node 0 is usually redundant. But let's > > keep it simple for now and just move them. If anybody complains we can > > restore the per-zone counters. > > > > Signed-off-by: Johannes Weiner > > This patch causes an early boot crash on s390 (linux-next as of today). > CONFIG_NUMA on/off doesn't make any difference. I haven't looked any > further into this yet, maybe you have an idea? > > Kernel BUG at 002b0362 [verbose debug info unavailable] > addressing exception: 0005 ilc:3 [#1] SMP > Modules linked in: > CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #16 > Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) > task: 00d75d00 task.stack: 00d6 > Krnl PSW : 040420018000 002b0362 (mod_node_page_state+0x62/0x158) > R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 RI:0 EA:3 > Krnl GPRS: 0001 3d81f000 0006 > 0001 00f29b52 0041 > 0007 0040 3fe81000 03d100ffa000 > 00ee1cd0 00979040 00300abc 00d63c90 > Krnl Code: 002b0350: e3100394 lg %r1,912 > 002b0356: e320f0a80004 lg %r2,168(%r15) > #002b035c: e3112090 llgc %r1,0(%r1,%r2) > >002b0362: b9060011 lgbr %r1,%r1 > 002b0366: e3200394 lg %r2,912 > 002b036c: e3c28090 llgc %r12,0(%r2,%r8) > 002b0372: b90600ac lgbr %r10,%r12 > 002b0376: b904002a lgr %r2,%r10 > Call Trace: > ([<>] (null)) > [<00300abc>] new_slab+0x35c/0x628 > [<0030740c>] __kmem_cache_create+0x33c/0x638 > [<00e99c0e>] create_boot_cache+0xae/0xe0 > [<00e9e12c>] kmem_cache_init+0x5c/0x138 > [<00e7999c>] 
start_kernel+0x24c/0x440 > [<00100020>] _stext+0x20/0x80 > Last Breaking-Event-Address: > [<00300ab6>] new_slab+0x356/0x628 FWIW, it looks like your patch only triggers a bug that was introduced with a different change that somehow messes around with the pages used to setup the kernel page tables. I'll look into this.
Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote: > To re-implement slab cache vs. page cache balancing, we'll need the > slab counters at the lruvec level, which, ever since lru reclaim was > moved from the zone to the node, is the intersection of the node, not > the zone, and the memcg. > > We could retain the per-zone counters for when the page allocator > dumps its memory information on failures, and have counters on both > levels - which on all but NUMA node 0 is usually redundant. But let's > keep it simple for now and just move them. If anybody complains we can > restore the per-zone counters. > > Signed-off-by: Johannes Weiner This patch causes an early boot crash on s390 (linux-next as of today). CONFIG_NUMA on/off doesn't make any difference. I haven't looked any further into this yet, maybe you have an idea? Kernel BUG at 002b0362 [verbose debug info unavailable] addressing exception: 0005 ilc:3 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #16 Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) task: 00d75d00 task.stack: 00d6 Krnl PSW : 040420018000 002b0362 (mod_node_page_state+0x62/0x158) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 0001 3d81f000 0006 0001 00f29b52 0041 0007 0040 3fe81000 03d100ffa000 00ee1cd0 00979040 00300abc 00d63c90 Krnl Code: 002b0350: e3100394 lg %r1,912 002b0356: e320f0a80004 lg %r2,168(%r15) #002b035c: e3112090 llgc %r1,0(%r1,%r2) >002b0362: b9060011 lgbr %r1,%r1 002b0366: e3200394 lg %r2,912 002b036c: e3c28090 llgc %r12,0(%r2,%r8) 002b0372: b90600ac lgbr %r10,%r12 002b0376: b904002a lgr %r2,%r10 Call Trace: ([<>] (null)) [<00300abc>] new_slab+0x35c/0x628 [<0030740c>] __kmem_cache_create+0x33c/0x638 [<00e99c0e>] create_boot_cache+0xae/0xe0 [<00e9e12c>] kmem_cache_init+0x5c/0x138 [<00e7999c>] start_kernel+0x24c/0x440 [<00100020>] _stext+0x20/0x80 Last Breaking-Event-Address: [<00300ab6>] new_slab+0x356/0x628 Kernel panic - not syncing: Fatal 
exception: panic_on_oops > diff --git a/drivers/base/node.c b/drivers/base/node.c > index 5548f9686016..e57e06e6df4c 100644 > --- a/drivers/base/node.c > +++ b/drivers/base/node.c > @@ -129,11 +129,11 @@ static ssize_t node_read_meminfo(struct device *dev, > nid, K(node_page_state(pgdat, NR_UNSTABLE_NFS)), > nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)), > nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)), > -nid, K(sum_zone_node_page_state(nid, > NR_SLAB_RECLAIMABLE) + > - sum_zone_node_page_state(nid, > NR_SLAB_UNRECLAIMABLE)), > -nid, K(sum_zone_node_page_state(nid, > NR_SLAB_RECLAIMABLE)), > +nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE) + > + node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)), > +nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE)), > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > -nid, K(sum_zone_node_page_state(nid, > NR_SLAB_UNRECLAIMABLE)), > +nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)), > nid, K(node_page_state(pgdat, NR_ANON_THPS) * > HPAGE_PMD_NR), > nid, K(node_page_state(pgdat, NR_SHMEM_THPS) * > @@ -141,7 +141,7 @@ static ssize_t node_read_meminfo(struct device *dev, > nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED) * > HPAGE_PMD_NR)); > #else > -nid, K(sum_zone_node_page_state(nid, > NR_SLAB_UNRECLAIMABLE))); > +nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE))); > #endif > n += hugetlb_report_node_meminfo(nid, buf + n); > return n; > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index ebaccd4e7d8c..eacadee83964 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -125,8 +125,6 @@ enum zone_stat_item { > NR_ZONE_UNEVICTABLE, > NR_ZONE_WRITE_PENDING, /* Count of dirty, writeback and unstable pages > */ > NR_MLOCK, /* mlock()ed pages found and moved off LRU */ > - NR_SLAB_RECLAIMABLE, > - NR_SLAB_UNRECLAIMABLE, > NR_PAGETABLE, /* used for pagetables */ > NR_KERNEL_STACK_KB, /* measured in KiB */ > /* Second 128 byte cacheline */ > @@ -152,6 +150,8 @@ enum