Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)
On Fri 15-06-18 01:07:22, Naoya Horiguchi wrote: > On Thu, Jun 14, 2018 at 09:00:50AM +0200, Michal Hocko wrote: > > On Thu 14-06-18 05:16:18, Naoya Horiguchi wrote: > > > On Wed, Jun 13, 2018 at 11:07:00AM +0200, Michal Hocko wrote: > > > > On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote: > > > > [...] > > > > > From: Naoya Horiguchi > > > > > Date: Wed, 13 Jun 2018 12:43:27 +0900 > > > > > Subject: [PATCH] mm: zero remaining unavailable struct pages > > > > > > > > > > There is a kernel panic that is triggered when reading > > > > > /proc/kpageflags > > > > > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]': > > > > > > > > > > BUG: unable to handle kernel paging request at fffe > > > > > PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0 > > > > > Oops: [#1] SMP PTI > > > > > CPU: 2 PID: 1728 Comm: page-types Not tainted > > > > > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160 > > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > > > 1.11.0-2.fc28 04/01/2014 > > > > > RIP: 0010:stable_page_flags+0x27/0x3c0 > > > > > Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 > > > > > 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 > > > > > <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7 > > > > > RSP: 0018:bbd44111fde0 EFLAGS: 00010202 > > > > > RAX: fffe RBX: 7fffeff9 RCX: > > > > > RDX: 0001 RSI: 0202 RDI: ed1182fff5c0 > > > > > RBP: R08: 0001 R09: 0001 > > > > > R10: bbd44111fed8 R11: R12: ed1182fff5c0 > > > > > R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10 > > > > > FS: 7efc4335a500() GS:93a5bfc0() > > > > > knlGS: > > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > > CR2: fffe CR3: b2a58000 CR4: 001406e0 > > > > > Call Trace: > > > > >kpageflags_read+0xc7/0x120 > > > > >proc_reg_read+0x3c/0x60 > > > > >__vfs_read+0x36/0x170 > > > > >vfs_read+0x89/0x130 > > > > >ksys_pread64+0x71/0x90 > > > > >do_syscall_64+0x5b/0x160 > > > > 
>entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > RIP: 0033:0x7efc42e75e23 > > > > > Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 > > > > > 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 > > > > > <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24 > > > > > > > > > > According to kernel bisection, this problem became visible due to > > > > > commit > > > > > f7f99100d8d9 which changes how struct pages are initialized. > > > > > > > > > > Memblock layout affects the pfn ranges covered by node/zone. Consider > > > > > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and > > > > > the default (no memmap= given) memblock layout is like below: > > > > > > > > > > MEMBLOCK configuration: > > > > >memory size = 0x0001fff75c00 reserved size = 0x0300c000 > > > > >memory.cnt = 0x4 > > > > >memory[0x0] [0x1000-0x0009efff], > > > > > 0x0009e000 bytes on node 0 flags: 0x0 > > > > >memory[0x1] [0x0010-0xbffd6fff], > > > > > 0xbfed7000 bytes on node 0 flags: 0x0 > > > > >memory[0x2] [0x0001-0x00013fff], > > > > > 0x4000 bytes on node 0 flags: 0x0 > > > > >memory[0x3] [0x00014000-0x00023fff], > > > > > 0x0001 bytes on node 1 flags: 0x0 > > > > >... > > > > > > > > > > If you give memmap=1G!4G (so it just covers memory[0x2]), > > > > > the range [0x1-0x13fff] is gone: > > > > > > > > > > MEMBLOCK configuration: > > > > >memory size = 0x0001bff75c00 reserved size = 0x0300c000 > > > > >memory.cnt = 0x3 > > > > >memory[0x0] [0x1000-0x0009efff], > > > > > 0x0009e000 bytes on node 0 flags: 0x0 > > > > >memory[0x1] [0x0010-0xbffd6fff], > > > > > 0xbfed7000 bytes on node 0 flags: 0x0 > > > > >memory[0x2] [0x00014000-0x00023fff], > > > > > 0x0001 bytes on node 1 flags: 0x0 > > > > >... > > > > > > > > > > This causes shrinking node 0's pfn range because it is calculated by > > > > > the address range of memblock.memory. So some of struct pages in the > > > > > gap range are left uninitialized. 
> > > > >
> > > > > We have a function zero_resv_unavail() which does zeroing the struct
> > > > > pages outside memblock.memory, but currently it covers only the reserved
> > > > > unavailable range (i.e. memblock.memory && !memblock.reserved).
> > > > > This patch extends it to cover all unavailable range, which fixes
> > > > > the reported issue.
> > > >
> > > > Thanks for pin pointing this down Naoya! I am wondering why we cannot
> > > > simply mark the excluded ranges to be reserved instead.
Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)
On Thu, Jun 14, 2018 at 09:00:50AM +0200, Michal Hocko wrote: > On Thu 14-06-18 05:16:18, Naoya Horiguchi wrote: > > On Wed, Jun 13, 2018 at 11:07:00AM +0200, Michal Hocko wrote: > > > On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote: > > > [...] > > > > From: Naoya Horiguchi > > > > Date: Wed, 13 Jun 2018 12:43:27 +0900 > > > > Subject: [PATCH] mm: zero remaining unavailable struct pages > > > > > > > > There is a kernel panic that is triggered when reading /proc/kpageflags > > > > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]': > > > > > > > > BUG: unable to handle kernel paging request at fffe > > > > PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0 > > > > Oops: [#1] SMP PTI > > > > CPU: 2 PID: 1728 Comm: page-types Not tainted > > > > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160 > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > > 1.11.0-2.fc28 04/01/2014 > > > > RIP: 0010:stable_page_flags+0x27/0x3c0 > > > > Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 > > > > 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b > > > > 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7 > > > > RSP: 0018:bbd44111fde0 EFLAGS: 00010202 > > > > RAX: fffe RBX: 7fffeff9 RCX: > > > > RDX: 0001 RSI: 0202 RDI: ed1182fff5c0 > > > > RBP: R08: 0001 R09: 0001 > > > > R10: bbd44111fed8 R11: R12: ed1182fff5c0 > > > > R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10 > > > > FS: 7efc4335a500() GS:93a5bfc0() > > > > knlGS: > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > CR2: fffe CR3: b2a58000 CR4: 001406e0 > > > > Call Trace: > > > >kpageflags_read+0xc7/0x120 > > > >proc_reg_read+0x3c/0x60 > > > >__vfs_read+0x36/0x170 > > > >vfs_read+0x89/0x130 > > > >ksys_pread64+0x71/0x90 > > > >do_syscall_64+0x5b/0x160 > > > >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > RIP: 0033:0x7efc42e75e23 > > > > Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 > > > > 00 90 83 3d 29 0a 2d 
00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d > > > > 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24 > > > > > > > > According to kernel bisection, this problem became visible due to commit > > > > f7f99100d8d9 which changes how struct pages are initialized. > > > > > > > > Memblock layout affects the pfn ranges covered by node/zone. Consider > > > > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and > > > > the default (no memmap= given) memblock layout is like below: > > > > > > > > MEMBLOCK configuration: > > > >memory size = 0x0001fff75c00 reserved size = 0x0300c000 > > > >memory.cnt = 0x4 > > > >memory[0x0] [0x1000-0x0009efff], > > > > 0x0009e000 bytes on node 0 flags: 0x0 > > > >memory[0x1] [0x0010-0xbffd6fff], > > > > 0xbfed7000 bytes on node 0 flags: 0x0 > > > >memory[0x2] [0x0001-0x00013fff], > > > > 0x4000 bytes on node 0 flags: 0x0 > > > >memory[0x3] [0x00014000-0x00023fff], > > > > 0x0001 bytes on node 1 flags: 0x0 > > > >... > > > > > > > > If you give memmap=1G!4G (so it just covers memory[0x2]), > > > > the range [0x1-0x13fff] is gone: > > > > > > > > MEMBLOCK configuration: > > > >memory size = 0x0001bff75c00 reserved size = 0x0300c000 > > > >memory.cnt = 0x3 > > > >memory[0x0] [0x1000-0x0009efff], > > > > 0x0009e000 bytes on node 0 flags: 0x0 > > > >memory[0x1] [0x0010-0xbffd6fff], > > > > 0xbfed7000 bytes on node 0 flags: 0x0 > > > >memory[0x2] [0x00014000-0x00023fff], > > > > 0x0001 bytes on node 1 flags: 0x0 > > > >... > > > > > > > > This causes shrinking node 0's pfn range because it is calculated by > > > > the address range of memblock.memory. So some of struct pages in the > > > > gap range are left uninitialized. > > > > > > > > We have a function zero_resv_unavail() which does zeroing the struct > > > > pages outside memblock.memory, but currently it covers only the reserved > > > > unavailable range (i.e. memblock.memory && !memblock.reserved). 
> > > > This patch extends it to cover all unavailable range, which fixes
> > > > the reported issue.
> > >
> > > Thanks for pin pointing this down Naoya! I am wondering why we cannot
> > > simply mark the excluded ranges to be reserved instead.
> >
> > I tried your idea with the change below, and it also fixes the kernel panic.
> >
> > ---
> > diff --git a/arch/x86/kernel/e820.c
Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)
On Thu 14-06-18 05:16:18, Naoya Horiguchi wrote: > On Wed, Jun 13, 2018 at 11:07:00AM +0200, Michal Hocko wrote: > > On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote: > > [...] > > > From: Naoya Horiguchi > > > Date: Wed, 13 Jun 2018 12:43:27 +0900 > > > Subject: [PATCH] mm: zero remaining unavailable struct pages > > > > > > There is a kernel panic that is triggered when reading /proc/kpageflags > > > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]': > > > > > > BUG: unable to handle kernel paging request at fffe > > > PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0 > > > Oops: [#1] SMP PTI > > > CPU: 2 PID: 1728 Comm: page-types Not tainted > > > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160 > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > 1.11.0-2.fc28 04/01/2014 > > > RIP: 0010:stable_page_flags+0x27/0x3c0 > > > Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 > > > fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 > > > c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7 > > > RSP: 0018:bbd44111fde0 EFLAGS: 00010202 > > > RAX: fffe RBX: 7fffeff9 RCX: > > > RDX: 0001 RSI: 0202 RDI: ed1182fff5c0 > > > RBP: R08: 0001 R09: 0001 > > > R10: bbd44111fed8 R11: R12: ed1182fff5c0 > > > R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10 > > > FS: 7efc4335a500() GS:93a5bfc0() > > > knlGS: > > > CS: 0010 DS: ES: CR0: 80050033 > > > CR2: fffe CR3: b2a58000 CR4: 001406e0 > > > Call Trace: > > >kpageflags_read+0xc7/0x120 > > >proc_reg_read+0x3c/0x60 > > >__vfs_read+0x36/0x170 > > >vfs_read+0x89/0x130 > > >ksys_pread64+0x71/0x90 > > >do_syscall_64+0x5b/0x160 > > >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > RIP: 0033:0x7efc42e75e23 > > > Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 > > > 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 > > > ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24 > > > > > > According to kernel 
bisection, this problem became visible due to commit > > > f7f99100d8d9 which changes how struct pages are initialized. > > > > > > Memblock layout affects the pfn ranges covered by node/zone. Consider > > > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and > > > the default (no memmap= given) memblock layout is like below: > > > > > > MEMBLOCK configuration: > > >memory size = 0x0001fff75c00 reserved size = 0x0300c000 > > >memory.cnt = 0x4 > > >memory[0x0] [0x1000-0x0009efff], > > > 0x0009e000 bytes on node 0 flags: 0x0 > > >memory[0x1] [0x0010-0xbffd6fff], > > > 0xbfed7000 bytes on node 0 flags: 0x0 > > >memory[0x2] [0x0001-0x00013fff], > > > 0x4000 bytes on node 0 flags: 0x0 > > >memory[0x3] [0x00014000-0x00023fff], > > > 0x0001 bytes on node 1 flags: 0x0 > > >... > > > > > > If you give memmap=1G!4G (so it just covers memory[0x2]), > > > the range [0x1-0x13fff] is gone: > > > > > > MEMBLOCK configuration: > > >memory size = 0x0001bff75c00 reserved size = 0x0300c000 > > >memory.cnt = 0x3 > > >memory[0x0] [0x1000-0x0009efff], > > > 0x0009e000 bytes on node 0 flags: 0x0 > > >memory[0x1] [0x0010-0xbffd6fff], > > > 0xbfed7000 bytes on node 0 flags: 0x0 > > >memory[0x2] [0x00014000-0x00023fff], > > > 0x0001 bytes on node 1 flags: 0x0 > > >... > > > > > > This causes shrinking node 0's pfn range because it is calculated by > > > the address range of memblock.memory. So some of struct pages in the > > > gap range are left uninitialized. > > > > > > We have a function zero_resv_unavail() which does zeroing the struct > > > pages outside memblock.memory, but currently it covers only the reserved > > > unavailable range (i.e. memblock.memory && !memblock.reserved). > > > This patch extends it to cover all unavailable range, which fixes > > > the reported issue. > > > > Thanks for pin pointing this down Naoya! I am wondering why we cannot > > simply mark the excluded ranges to be reserved instead. 
>
> I tried your idea with the change below, and it also fixes the kernel panic.
>
> ---
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index d1f25c831447..2cef120535d4 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1248,6 +1248,7 @@ void __init e820__memblock_setup(void)
>  {
>  	int i;
>  	u64 end;
> +	u64 addr = 0;
>
>  	/*
>
Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)
On Thu, Jun 14, 2018 at 05:16:18AM +, Naoya Horiguchi wrote: > On Wed, Jun 13, 2018 at 11:07:00AM +0200, Michal Hocko wrote: > > On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote: > > [...] > > > From: Naoya Horiguchi > > > Date: Wed, 13 Jun 2018 12:43:27 +0900 > > > Subject: [PATCH] mm: zero remaining unavailable struct pages > > > > > > There is a kernel panic that is triggered when reading /proc/kpageflags > > > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]': > > > > > > BUG: unable to handle kernel paging request at fffe > > > PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0 > > > Oops: [#1] SMP PTI > > > CPU: 2 PID: 1728 Comm: page-types Not tainted > > > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160 > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > 1.11.0-2.fc28 04/01/2014 > > > RIP: 0010:stable_page_flags+0x27/0x3c0 > > > Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 > > > fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 > > > c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7 > > > RSP: 0018:bbd44111fde0 EFLAGS: 00010202 > > > RAX: fffe RBX: 7fffeff9 RCX: > > > RDX: 0001 RSI: 0202 RDI: ed1182fff5c0 > > > RBP: R08: 0001 R09: 0001 > > > R10: bbd44111fed8 R11: R12: ed1182fff5c0 > > > R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10 > > > FS: 7efc4335a500() GS:93a5bfc0() > > > knlGS: > > > CS: 0010 DS: ES: CR0: 80050033 > > > CR2: fffe CR3: b2a58000 CR4: 001406e0 > > > Call Trace: > > >kpageflags_read+0xc7/0x120 > > >proc_reg_read+0x3c/0x60 > > >__vfs_read+0x36/0x170 > > >vfs_read+0x89/0x130 > > >ksys_pread64+0x71/0x90 > > >do_syscall_64+0x5b/0x160 > > >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > RIP: 0033:0x7efc42e75e23 > > > Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 > > > 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 > > > ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24 > > > > > > 
According to kernel bisection, this problem became visible due to commit > > > f7f99100d8d9 which changes how struct pages are initialized. > > > > > > Memblock layout affects the pfn ranges covered by node/zone. Consider > > > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and > > > the default (no memmap= given) memblock layout is like below: > > > > > > MEMBLOCK configuration: > > >memory size = 0x0001fff75c00 reserved size = 0x0300c000 > > >memory.cnt = 0x4 > > >memory[0x0] [0x1000-0x0009efff], > > > 0x0009e000 bytes on node 0 flags: 0x0 > > >memory[0x1] [0x0010-0xbffd6fff], > > > 0xbfed7000 bytes on node 0 flags: 0x0 > > >memory[0x2] [0x0001-0x00013fff], > > > 0x4000 bytes on node 0 flags: 0x0 > > >memory[0x3] [0x00014000-0x00023fff], > > > 0x0001 bytes on node 1 flags: 0x0 > > >... > > > > > > If you give memmap=1G!4G (so it just covers memory[0x2]), > > > the range [0x1-0x13fff] is gone: > > > > > > MEMBLOCK configuration: > > >memory size = 0x0001bff75c00 reserved size = 0x0300c000 > > >memory.cnt = 0x3 > > >memory[0x0] [0x1000-0x0009efff], > > > 0x0009e000 bytes on node 0 flags: 0x0 > > >memory[0x1] [0x0010-0xbffd6fff], > > > 0xbfed7000 bytes on node 0 flags: 0x0 > > >memory[0x2] [0x00014000-0x00023fff], > > > 0x0001 bytes on node 1 flags: 0x0 > > >... > > > > > > This causes shrinking node 0's pfn range because it is calculated by > > > the address range of memblock.memory. So some of struct pages in the > > > gap range are left uninitialized. > > > > > > We have a function zero_resv_unavail() which does zeroing the struct > > > pages outside memblock.memory, but currently it covers only the reserved > > > unavailable range (i.e. memblock.memory && !memblock.reserved). > > > This patch extends it to cover all unavailable range, which fixes > > > the reported issue. > > > > Thanks for pin pointing this down Naoya! I am wondering why we cannot > > simply mark the excluded ranges to be reserved instead. 
>
> I tried your idea with the change below, and it also fixes the kernel panic.
>
> ---
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index d1f25c831447..2cef120535d4 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1248,6 +1248,7 @@ void __init e820__memblock_setup(void)
>  {
>  	int i;
>  	u64 end;
> +	u64 addr = 0;
>
Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)
On Wed, Jun 13, 2018 at 11:07:00AM +0200, Michal Hocko wrote: > On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote: > [...] > > From: Naoya Horiguchi > > Date: Wed, 13 Jun 2018 12:43:27 +0900 > > Subject: [PATCH] mm: zero remaining unavailable struct pages > > > > There is a kernel panic that is triggered when reading /proc/kpageflags > > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]': > > > > BUG: unable to handle kernel paging request at fffe > > PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0 > > Oops: [#1] SMP PTI > > CPU: 2 PID: 1728 Comm: page-types Not tainted > > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 > > 04/01/2014 > > RIP: 0010:stable_page_flags+0x27/0x3c0 > > Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 > > fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 > > c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7 > > RSP: 0018:bbd44111fde0 EFLAGS: 00010202 > > RAX: fffe RBX: 7fffeff9 RCX: > > RDX: 0001 RSI: 0202 RDI: ed1182fff5c0 > > RBP: R08: 0001 R09: 0001 > > R10: bbd44111fed8 R11: R12: ed1182fff5c0 > > R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10 > > FS: 7efc4335a500() GS:93a5bfc0() > > knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: fffe CR3: b2a58000 CR4: 001406e0 > > Call Trace: > >kpageflags_read+0xc7/0x120 > >proc_reg_read+0x3c/0x60 > >__vfs_read+0x36/0x170 > >vfs_read+0x89/0x130 > >ksys_pread64+0x71/0x90 > >do_syscall_64+0x5b/0x160 > >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > RIP: 0033:0x7efc42e75e23 > > Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 > > 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 > > ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24 > > > > According to kernel bisection, this problem became visible due to commit > > f7f99100d8d9 which changes how struct pages are initialized. 
> > > > Memblock layout affects the pfn ranges covered by node/zone. Consider > > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and > > the default (no memmap= given) memblock layout is like below: > > > > MEMBLOCK configuration: > >memory size = 0x0001fff75c00 reserved size = 0x0300c000 > >memory.cnt = 0x4 > >memory[0x0] [0x1000-0x0009efff], > > 0x0009e000 bytes on node 0 flags: 0x0 > >memory[0x1] [0x0010-0xbffd6fff], > > 0xbfed7000 bytes on node 0 flags: 0x0 > >memory[0x2] [0x0001-0x00013fff], > > 0x4000 bytes on node 0 flags: 0x0 > >memory[0x3] [0x00014000-0x00023fff], > > 0x0001 bytes on node 1 flags: 0x0 > >... > > > > If you give memmap=1G!4G (so it just covers memory[0x2]), > > the range [0x1-0x13fff] is gone: > > > > MEMBLOCK configuration: > >memory size = 0x0001bff75c00 reserved size = 0x0300c000 > >memory.cnt = 0x3 > >memory[0x0] [0x1000-0x0009efff], > > 0x0009e000 bytes on node 0 flags: 0x0 > >memory[0x1] [0x0010-0xbffd6fff], > > 0xbfed7000 bytes on node 0 flags: 0x0 > >memory[0x2] [0x00014000-0x00023fff], > > 0x0001 bytes on node 1 flags: 0x0 > >... > > > > This causes shrinking node 0's pfn range because it is calculated by > > the address range of memblock.memory. So some of struct pages in the > > gap range are left uninitialized. > > > > We have a function zero_resv_unavail() which does zeroing the struct > > pages outside memblock.memory, but currently it covers only the reserved > > unavailable range (i.e. memblock.memory && !memblock.reserved). > > This patch extends it to cover all unavailable range, which fixes > > the reported issue. > > Thanks for pin pointing this down Naoya! I am wondering why we cannot > simply mark the excluded ranges to be reserved instead. I tried your idea with the change below, and it also fixes the kernel panic. 
--- diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index d1f25c831447..2cef120535d4 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -1248,6 +1248,7 @@ void __init e820__memblock_setup(void) { int i; u64 end; + u64 addr = 0; /* * The bootstrap memblock region count maximum is 128 entries @@ -1264,13 +1265,16 @@ void __init e820__memblock_setup(void) struct e820_entry *entry = _table->entries[i]; end = entry->addr + entry->size; +
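The quoted e820__memblock_setup() diff breaks off mid-hunk, but the intent is visible: remember where the previous e820 entry ended and hand any hole before the next entry to memblock_reserve(). A minimal userspace sketch of that loop follows; the struct and the values are illustrative stand-ins, not the real e820 table or API.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for struct e820_entry; the real one also
 * carries a type field.  Entries are assumed sorted, non-overlapping. */
struct entry { uint64_t addr, size; };

/* Walk the entries, tracking the previous end in 'addr'; any hole
 * [addr, entry->addr) is a range the kernel loop would pass to
 * memblock_reserve().  Here we just sum the hole bytes. */
uint64_t reserve_gaps(const struct entry *e, int n)
{
    uint64_t addr = 0, reserved = 0;
    for (int i = 0; i < n; i++) {
        if (e[i].addr > addr)
            reserved += e[i].addr - addr;  /* memblock_reserve(addr, gap) */
        addr = e[i].addr + e[i].size;
    }
    return reserved;
}
```

Once the holes are in memblock.reserved, the existing zero_resv_unavail() logic picks them up, which is why this variant also cures the panic.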
Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)
On Wed, Jun 13, 2018 at 10:40:32AM +0200, Oscar Salvador wrote: > On Wed, Jun 13, 2018 at 05:41:08AM +, Naoya Horiguchi wrote: > > Hi everyone, > > > > I wrote a patch for this issue. > > There was a discussion about prechecking approach, but I finally found > > out it's hard to make change on memblock after numa_init, so I take > > another apporach (see patch description). > > > > I'm glad if you check that it works for you. > > > > Thanks, > > Naoya Horiguchi > > --- > > From: Naoya Horiguchi > > Date: Wed, 13 Jun 2018 12:43:27 +0900 > > Subject: [PATCH] mm: zero remaining unavailable struct pages > > > > There is a kernel panic that is triggered when reading /proc/kpageflags > > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]': > > > > BUG: unable to handle kernel paging request at fffe > > PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0 > > Oops: [#1] SMP PTI > > CPU: 2 PID: 1728 Comm: page-types Not tainted > > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 > > 04/01/2014 > > RIP: 0010:stable_page_flags+0x27/0x3c0 > > Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 > > fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 > > c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7 > > RSP: 0018:bbd44111fde0 EFLAGS: 00010202 > > RAX: fffe RBX: 7fffeff9 RCX: > > RDX: 0001 RSI: 0202 RDI: ed1182fff5c0 > > RBP: R08: 0001 R09: 0001 > > R10: bbd44111fed8 R11: R12: ed1182fff5c0 > > R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10 > > FS: 7efc4335a500() GS:93a5bfc0() > > knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: fffe CR3: b2a58000 CR4: 001406e0 > > Call Trace: > >kpageflags_read+0xc7/0x120 > >proc_reg_read+0x3c/0x60 > >__vfs_read+0x36/0x170 > >vfs_read+0x89/0x130 > >ksys_pread64+0x71/0x90 > >do_syscall_64+0x5b/0x160 > >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > RIP: 0033:0x7efc42e75e23 > > Code: 09 00 ba 9f 01 
00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 > > 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 > > ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24 > > > > According to kernel bisection, this problem became visible due to commit > > f7f99100d8d9 which changes how struct pages are initialized. > > > > Memblock layout affects the pfn ranges covered by node/zone. Consider > > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and > > the default (no memmap= given) memblock layout is like below: > > > > MEMBLOCK configuration: > >memory size = 0x0001fff75c00 reserved size = 0x0300c000 > >memory.cnt = 0x4 > >memory[0x0] [0x1000-0x0009efff], > > 0x0009e000 bytes on node 0 flags: 0x0 > >memory[0x1] [0x0010-0xbffd6fff], > > 0xbfed7000 bytes on node 0 flags: 0x0 > >memory[0x2] [0x0001-0x00013fff], > > 0x4000 bytes on node 0 flags: 0x0 > >memory[0x3] [0x00014000-0x00023fff], > > 0x0001 bytes on node 1 flags: 0x0 > >... > > > > If you give memmap=1G!4G (so it just covers memory[0x2]), > > the range [0x1-0x13fff] is gone: > > > > MEMBLOCK configuration: > >memory size = 0x0001bff75c00 reserved size = 0x0300c000 > >memory.cnt = 0x3 > >memory[0x0] [0x1000-0x0009efff], > > 0x0009e000 bytes on node 0 flags: 0x0 > >memory[0x1] [0x0010-0xbffd6fff], > > 0xbfed7000 bytes on node 0 flags: 0x0 > >memory[0x2] [0x00014000-0x00023fff], > > 0x0001 bytes on node 1 flags: 0x0 > >... > > > > This causes shrinking node 0's pfn range because it is calculated by > > the address range of memblock.memory. So some of struct pages in the > > gap range are left uninitialized. > > > > We have a function zero_resv_unavail() which does zeroing the struct > > pages outside memblock.memory, but currently it covers only the reserved > > unavailable range (i.e. memblock.memory && !memblock.reserved). > > This patch extends it to cover all unavailable range, which fixes > > the reported issue. 
> > > > Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") > > Signed-off-by: Naoya Horiguchi > > --- > > include/linux/memblock.h | 16 > > mm/page_alloc.c | 33 - > > 2 files changed, 24 insertions(+), 25 deletions(-) > > > > diff --git a/include/linux/memblock.h b/include/linux/memblock.h > >
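The node-range shrinkage described in the changelog can be reproduced with a toy calculation: a node's spanned pfn range is derived from the extremes of its memblock regions (compare get_pfn_range_for_nid() in mm/page_alloc.c), so removing the [4G, 5G) region lowers node 0's end pfn and leaves a hole of uninitialized struct pages. The pfn values below are read off the (partially truncated) MEMBLOCK dumps above, so treat them as approximate.

```c
#include <assert.h>

/* Toy memblock region: pfn range [start_pfn, end_pfn) on node nid. */
struct region { unsigned long start_pfn, end_pfn; int nid; };

/* Node's spanned end pfn = highest end among its regions. */
unsigned long node_span_end(const struct region *r, int n, int nid)
{
    unsigned long end = 0;
    for (int i = 0; i < n; i++)
        if (r[i].nid == nid && r[i].end_pfn > end)
            end = r[i].end_pfn;
    return end;
}
```

With the default layout node 0 spans up to pfn 0x140000 (5 GB); after memmap=1G!4G deletes the [4G, 5G) region, the span ends at pfn 0xbffd7, and the struct pages between the two ends are never initialized by the deferred-init path.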
Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)
On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote: [...] > From: Naoya Horiguchi > Date: Wed, 13 Jun 2018 12:43:27 +0900 > Subject: [PATCH] mm: zero remaining unavailable struct pages > > There is a kernel panic that is triggered when reading /proc/kpageflags > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]': > > BUG: unable to handle kernel paging request at fffe > PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0 > Oops: [#1] SMP PTI > CPU: 2 PID: 1728 Comm: page-types Not tainted > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 > 04/01/2014 > RIP: 0010:stable_page_flags+0x27/0x3c0 > Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc > 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 > 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7 > RSP: 0018:bbd44111fde0 EFLAGS: 00010202 > RAX: fffe RBX: 7fffeff9 RCX: > RDX: 0001 RSI: 0202 RDI: ed1182fff5c0 > RBP: R08: 0001 R09: 0001 > R10: bbd44111fed8 R11: R12: ed1182fff5c0 > R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10 > FS: 7efc4335a500() GS:93a5bfc0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: fffe CR3: b2a58000 CR4: 001406e0 > Call Trace: >kpageflags_read+0xc7/0x120 >proc_reg_read+0x3c/0x60 >__vfs_read+0x36/0x170 >vfs_read+0x89/0x130 >ksys_pread64+0x71/0x90 >do_syscall_64+0x5b/0x160 >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x7efc42e75e23 > Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 > 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff > 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24 > > According to kernel bisection, this problem became visible due to commit > f7f99100d8d9 which changes how struct pages are initialized. > > Memblock layout affects the pfn ranges covered by node/zone. 
Consider > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and > the default (no memmap= given) memblock layout is like below: > > MEMBLOCK configuration: >memory size = 0x0001fff75c00 reserved size = 0x0300c000 >memory.cnt = 0x4 >memory[0x0] [0x1000-0x0009efff], > 0x0009e000 bytes on node 0 flags: 0x0 >memory[0x1] [0x0010-0xbffd6fff], > 0xbfed7000 bytes on node 0 flags: 0x0 >memory[0x2] [0x0001-0x00013fff], > 0x4000 bytes on node 0 flags: 0x0 >memory[0x3] [0x00014000-0x00023fff], > 0x0001 bytes on node 1 flags: 0x0 >... > > If you give memmap=1G!4G (so it just covers memory[0x2]), > the range [0x1-0x13fff] is gone: > > MEMBLOCK configuration: >memory size = 0x0001bff75c00 reserved size = 0x0300c000 >memory.cnt = 0x3 >memory[0x0] [0x1000-0x0009efff], > 0x0009e000 bytes on node 0 flags: 0x0 >memory[0x1] [0x0010-0xbffd6fff], > 0xbfed7000 bytes on node 0 flags: 0x0 >memory[0x2] [0x00014000-0x00023fff], > 0x0001 bytes on node 1 flags: 0x0 >... > > This causes shrinking node 0's pfn range because it is calculated by > the address range of memblock.memory. So some of struct pages in the > gap range are left uninitialized. > > We have a function zero_resv_unavail() which does zeroing the struct > pages outside memblock.memory, but currently it covers only the reserved > unavailable range (i.e. memblock.memory && !memblock.reserved). > This patch extends it to cover all unavailable range, which fixes > the reported issue. Thanks for pin pointing this down Naoya! I am wondering why we cannot simply mark the excluded ranges to be reserved instead. -- Michal Hocko SUSE Labs
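The behaviour the patch gives the extended zero_resv_unavail() is: zero the struct page of every pfn that no memblock.memory region covers. That can be simulated in userspace; struct page_sim and the ranges below are illustrative stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct page_sim { uint64_t flags; };            /* toy struct page */
struct pfn_range { unsigned long start, end; }; /* [start, end) */

/* Zero every page whose pfn falls outside all 'memory' ranges, i.e.
 * the unavailable holes the patch extends zero_resv_unavail() to
 * cover.  Returns how many pages were zeroed. */
int zero_unavailable(struct page_sim *map, unsigned long npages,
                     const struct pfn_range *mem, int n)
{
    int zeroed = 0;
    for (unsigned long pfn = 0; pfn < npages; pfn++) {
        int covered = 0;
        for (int i = 0; i < n; i++)
            if (pfn >= mem[i].start && pfn < mem[i].end)
                covered = 1;
        if (!covered) {
            /* the kernel side uses mm_zero_struct_page() */
            memset(&map[pfn], 0, sizeof *map);
            zeroed++;
        }
    }
    return zeroed;
}
```

A zeroed struct page is still not usable memory, but PagePoisoned-style checks and /proc/kpageflags readers see consistent state instead of garbage, which is what stops the oops.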
Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)
On Wed, Jun 13, 2018 at 05:41:08AM +, Naoya Horiguchi wrote: > Hi everyone, > > I wrote a patch for this issue. > There was a discussion about prechecking approach, but I finally found > out it's hard to make change on memblock after numa_init, so I take > another apporach (see patch description). > > I'm glad if you check that it works for you. > > Thanks, > Naoya Horiguchi > --- > From: Naoya Horiguchi > Date: Wed, 13 Jun 2018 12:43:27 +0900 > Subject: [PATCH] mm: zero remaining unavailable struct pages > > There is a kernel panic that is triggered when reading /proc/kpageflags > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]': > > BUG: unable to handle kernel paging request at fffe > PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0 > Oops: [#1] SMP PTI > CPU: 2 PID: 1728 Comm: page-types Not tainted > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 > 04/01/2014 > RIP: 0010:stable_page_flags+0x27/0x3c0 > Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc > 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 > 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7 > RSP: 0018:bbd44111fde0 EFLAGS: 00010202 > RAX: fffe RBX: 7fffeff9 RCX: > RDX: 0001 RSI: 0202 RDI: ed1182fff5c0 > RBP: R08: 0001 R09: 0001 > R10: bbd44111fed8 R11: R12: ed1182fff5c0 > R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10 > FS: 7efc4335a500() GS:93a5bfc0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: fffe CR3: b2a58000 CR4: 001406e0 > Call Trace: >kpageflags_read+0xc7/0x120 >proc_reg_read+0x3c/0x60 >__vfs_read+0x36/0x170 >vfs_read+0x89/0x130 >ksys_pread64+0x71/0x90 >do_syscall_64+0x5b/0x160 >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x7efc42e75e23 > Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 > 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff > 73 34 c3 48 83 ec 08 e8 db d3 01 
00 48 89 04 24 > > According to kernel bisection, this problem became visible due to commit > f7f99100d8d9 which changes how struct pages are initialized. > > Memblock layout affects the pfn ranges covered by node/zone. Consider > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and > the default (no memmap= given) memblock layout is like below: > > MEMBLOCK configuration: >memory size = 0x0001fff75c00 reserved size = 0x0300c000 >memory.cnt = 0x4 >memory[0x0] [0x1000-0x0009efff], > 0x0009e000 bytes on node 0 flags: 0x0 >memory[0x1] [0x0010-0xbffd6fff], > 0xbfed7000 bytes on node 0 flags: 0x0 >memory[0x2] [0x0001-0x00013fff], > 0x4000 bytes on node 0 flags: 0x0 >memory[0x3] [0x00014000-0x00023fff], > 0x0001 bytes on node 1 flags: 0x0 >... > > If you give memmap=1G!4G (so it just covers memory[0x2]), > the range [0x1-0x13fff] is gone: > > MEMBLOCK configuration: >memory size = 0x0001bff75c00 reserved size = 0x0300c000 >memory.cnt = 0x3 >memory[0x0] [0x1000-0x0009efff], > 0x0009e000 bytes on node 0 flags: 0x0 >memory[0x1] [0x0010-0xbffd6fff], > 0xbfed7000 bytes on node 0 flags: 0x0 >memory[0x2] [0x00014000-0x00023fff], > 0x0001 bytes on node 1 flags: 0x0 >... > > This causes shrinking node 0's pfn range because it is calculated by > the address range of memblock.memory. So some of struct pages in the > gap range are left uninitialized. > > We have a function zero_resv_unavail() which does zeroing the struct > pages outside memblock.memory, but currently it covers only the reserved > unavailable range (i.e. memblock.memory && !memblock.reserved). > This patch extends it to cover all unavailable range, which fixes > the reported issue. 
> > Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") > Signed-off-by: Naoya Horiguchi > --- > include/linux/memblock.h | 16 > mm/page_alloc.c | 33 - > 2 files changed, 24 insertions(+), 25 deletions(-) > > diff --git a/include/linux/memblock.h b/include/linux/memblock.h > index ca59883c8364..f191e51c5d2a 100644 > --- a/include/linux/memblock.h > +++ b/include/linux/memblock.h > @@ -236,22 +236,6 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned > long *out_start_pfn, > for_each_mem_range_rev(i, , , \ > nid,
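The memblock.h hunk is cut off before the macro it removes, but kernel range iterators of this kind share one shape: a __next-style helper that advances a cursor, wrapped in a for loop (compare for_each_mem_pfn_range() in include/linux/memblock.h). A sketch under that assumption; all names here are made up for illustration.

```c
#include <assert.h>

struct rng { unsigned long start, end; };

/* Advance *idx and publish the next range; past the end, the out
 * values are left untouched and the loop condition terminates. */
static void next_rng(const struct rng *tbl, int n, int *idx,
                     unsigned long *s, unsigned long *e)
{
    ++*idx;
    if (*idx < n) {
        *s = tbl[*idx].start;
        *e = tbl[*idx].end;
    }
}

/* for_each-style wrapper in the memblock.h idiom. */
#define for_each_rng(tbl, n, idx, s, e)                         \
    for ((idx) = -1, next_rng((tbl), (n), &(idx), &(s), &(e));  \
         (idx) < (n);                                           \
         next_rng((tbl), (n), &(idx), &(s), &(e)))
```

The patch's diffstat (16 lines removed from memblock.h) suggests one such iterator and its helper went away once the zeroing loop in page_alloc.c stopped needing it.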
[PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)
Hi everyone,

I wrote a patch for this issue. There was a discussion about a prechecking
approach, but I finally found out that it's hard to make changes to memblock
after numa_init, so I took another approach (see the patch description).

I'd be glad if you could check that it works for you.

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi
Date: Wed, 13 Jun 2018 12:43:27 +0900
Subject: [PATCH] mm: zero remaining unavailable struct pages

There is a kernel panic that is triggered when reading /proc/kpageflags
on a kernel booted with the kernel parameter 'memmap=nn[KMG]!ss[KMG]':

BUG: unable to handle kernel paging request at fffe
PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
Oops: [#1] SMP PTI
CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
RIP: 0010:stable_page_flags+0x27/0x3c0
Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
RSP: 0018:bbd44111fde0 EFLAGS: 00010202
RAX: fffe RBX: 7fffeff9 RCX:
RDX: 0001 RSI: 0202 RDI: ed1182fff5c0
RBP: R08: 0001 R09: 0001
R10: bbd44111fed8 R11: R12: ed1182fff5c0
R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10
FS: 7efc4335a500() GS:93a5bfc0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: fffe CR3: b2a58000 CR4: 001406e0
Call Trace:
 kpageflags_read+0xc7/0x120
 proc_reg_read+0x3c/0x60
 __vfs_read+0x36/0x170
 vfs_read+0x89/0x130
 ksys_pread64+0x71/0x90
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7efc42e75e23
Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24

According to a kernel bisection, this problem became visible due to commit
f7f99100d8d9, which changed how struct pages are initialized.
Memblock layout affects the pfn ranges covered by node/zone. Consider a VM
with 2 NUMA nodes, each node with 4GB of memory, where the default
(no memmap= given) memblock layout is like below:

MEMBLOCK configuration:
 memory size = 0x0001fff75c00 reserved size = 0x0300c000
 memory.cnt = 0x4
 memory[0x0] [0x1000-0x0009efff], 0x0009e000 bytes on node 0 flags: 0x0
 memory[0x1] [0x0010-0xbffd6fff], 0xbfed7000 bytes on node 0 flags: 0x0
 memory[0x2] [0x0001-0x00013fff], 0x4000 bytes on node 0 flags: 0x0
 memory[0x3] [0x00014000-0x00023fff], 0x0001 bytes on node 1 flags: 0x0
 ...

If you give memmap=1G!4G (so that it just covers memory[0x2]), the range
[0x1-0x13fff] is gone:

MEMBLOCK configuration:
 memory size = 0x0001bff75c00 reserved size = 0x0300c000
 memory.cnt = 0x3
 memory[0x0] [0x1000-0x0009efff], 0x0009e000 bytes on node 0 flags: 0x0
 memory[0x1] [0x0010-0xbffd6fff], 0xbfed7000 bytes on node 0 flags: 0x0
 memory[0x2] [0x00014000-0x00023fff], 0x0001 bytes on node 1 flags: 0x0
 ...

This shrinks node 0's pfn range, because that range is calculated from the
address range of memblock.memory, so some of the struct pages in the gap
range are left uninitialized.

We have a function, zero_resv_unavail(), which zeroes the struct pages
outside memblock.memory, but currently it covers only the reserved
unavailable range (i.e. memblock.reserved && !memblock.memory). This patch
extends it to cover the whole unavailable range, which fixes the reported
issue.
Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Naoya Horiguchi
---
 include/linux/memblock.h | 16 ----------------
 mm/page_alloc.c          | 33 ++++++++++++++++++++++++++++---------
 2 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index ca59883c8364..f191e51c5d2a 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -236,22 +236,6 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 	for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,	\
 			       nid, flags, p_start, p_end, p_nid)

-/**
- * for_each_resv_unavail_range - iterate through reserved and unavailable memory
- * @i: u64 used as loop variable
- * @flags: pick from blocks based on memory attributes
- * @p_start: ptr to phys_addr_t for start address of the range, can be