Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)

2018-06-15 Thread Michal Hocko
On Fri 15-06-18 01:07:22, Naoya Horiguchi wrote:
> On Thu, Jun 14, 2018 at 09:00:50AM +0200, Michal Hocko wrote:
> > On Thu 14-06-18 05:16:18, Naoya Horiguchi wrote:
> > > On Wed, Jun 13, 2018 at 11:07:00AM +0200, Michal Hocko wrote:
> > > > On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote:
> > > > [...]
> > > > > From: Naoya Horiguchi 
> > > > > Date: Wed, 13 Jun 2018 12:43:27 +0900
> > > > > Subject: [PATCH] mm: zero remaining unavailable struct pages
> > > > >
> > > > > There is a kernel panic that is triggered when reading 
> > > > > /proc/kpageflags
> > > > > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
> > > > >
> > > > >   BUG: unable to handle kernel paging request at fffe
> > > > >   PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
> > > > >   Oops:  [#1] SMP PTI
> > > > >   CPU: 2 PID: 1728 Comm: page-types Not tainted 
> > > > > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
> > > > >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > > > 1.11.0-2.fc28 04/01/2014
> > > > >   RIP: 0010:stable_page_flags+0x27/0x3c0
> > > > >   Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 
> > > > > 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 
> > > > > <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
> > > > >   RSP: 0018:bbd44111fde0 EFLAGS: 00010202
> > > > >   RAX: fffe RBX: 7fffeff9 RCX: 
> > > > >   RDX: 0001 RSI: 0202 RDI: ed1182fff5c0
> > > > >   RBP:  R08: 0001 R09: 0001
> > > > >   R10: bbd44111fed8 R11:  R12: ed1182fff5c0
> > > > >   R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10
> > > > >   FS:  7efc4335a500() GS:93a5bfc0() 
> > > > > knlGS:
> > > > >   CS:  0010 DS:  ES:  CR0: 80050033
> > > > >   CR2: fffe CR3: b2a58000 CR4: 001406e0
> > > > >   Call Trace:
> > > > >kpageflags_read+0xc7/0x120
> > > > >proc_reg_read+0x3c/0x60
> > > > >__vfs_read+0x36/0x170
> > > > >vfs_read+0x89/0x130
> > > > >ksys_pread64+0x71/0x90
> > > > >do_syscall_64+0x5b/0x160
> > > > >entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > >   RIP: 0033:0x7efc42e75e23
> > > > >   Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 
> > > > > 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 
> > > > > <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
> > > > >
> > > > > According to kernel bisection, this problem became visible due to
> > > > > commit f7f99100d8d9 ("mm: stop zeroing memory during allocation in
> > > > > vmemmap"), which changed how struct pages are initialized.
> > > > >
> > > > > Memblock layout affects the pfn ranges covered by node/zone. Consider
> > > > > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
> > > > > the default (no memmap= given) memblock layout is like below:
> > > > >
> > > > >   MEMBLOCK configuration:
> > > > >memory size = 0x0001fff75c00 reserved size = 0x0300c000
> > > > >memory.cnt  = 0x4
> > > > >memory[0x0] [0x1000-0x0009efff], 
> > > > > 0x0009e000 bytes on node 0 flags: 0x0
> > > > >memory[0x1] [0x0010-0xbffd6fff], 
> > > > > 0xbfed7000 bytes on node 0 flags: 0x0
> > > > >memory[0x2] [0x0001-0x00013fff], 
> > > > > 0x4000 bytes on node 0 flags: 0x0
> > > > >memory[0x3] [0x00014000-0x00023fff], 
> > > > > 0x0001 bytes on node 1 flags: 0x0
> > > > >...
> > > > >
> > > > > If you give memmap=1G!4G (so it just covers memory[0x2]),
> > > > > the range [0x1-0x13fff] is gone:
> > > > >
> > > > >   MEMBLOCK configuration:
> > > > >memory size = 0x0001bff75c00 reserved size = 0x0300c000
> > > > >memory.cnt  = 0x3
> > > > >memory[0x0] [0x1000-0x0009efff], 
> > > > > 0x0009e000 bytes on node 0 flags: 0x0
> > > > >memory[0x1] [0x0010-0xbffd6fff], 
> > > > > 0xbfed7000 bytes on node 0 flags: 0x0
> > > > >memory[0x2] [0x00014000-0x00023fff], 
> > > > > 0x0001 bytes on node 1 flags: 0x0
> > > > >...
> > > > >
> > > > > This shrinks node 0's pfn range, because that range is calculated
> > > > > from the address range of memblock.memory, so some of the struct
> > > > > pages in the gap are left uninitialized.
> > > > >
> > > > > We have a function zero_resv_unavail() which zeroes the struct
> > > > > pages outside memblock.memory, but currently it covers only the
> > > > > reserved unavailable range (i.e. memblock.memory && !memblock.reserved).
> > > > > This patch extends it to cover the whole unavailable range, which
> > > > > fixes the reported issue.
> > > >
> > > > Thanks for pinpointing this down, Naoya! I am wondering why we cannot
> > > > simply mark the excluded ranges to be reserved instead.

Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)

2018-06-14 Thread Naoya Horiguchi
On Thu, Jun 14, 2018 at 09:00:50AM +0200, Michal Hocko wrote:
> On Thu 14-06-18 05:16:18, Naoya Horiguchi wrote:
> > On Wed, Jun 13, 2018 at 11:07:00AM +0200, Michal Hocko wrote:
> > > On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote:
> > > [...]
> > > > From: Naoya Horiguchi 
> > > > Date: Wed, 13 Jun 2018 12:43:27 +0900
> > > > Subject: [PATCH] mm: zero remaining unavailable struct pages
> > > >
> > > > There is a kernel panic that is triggered when reading /proc/kpageflags
> > > > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
> > > >
> > > >   BUG: unable to handle kernel paging request at fffe
> > > >   PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
> > > >   Oops:  [#1] SMP PTI
> > > >   CPU: 2 PID: 1728 Comm: page-types Not tainted 
> > > > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
> > > >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > > 1.11.0-2.fc28 04/01/2014
> > > >   RIP: 0010:stable_page_flags+0x27/0x3c0
> > > >   Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 
> > > > 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 
> > > > 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
> > > >   RSP: 0018:bbd44111fde0 EFLAGS: 00010202
> > > >   RAX: fffe RBX: 7fffeff9 RCX: 
> > > >   RDX: 0001 RSI: 0202 RDI: ed1182fff5c0
> > > >   RBP:  R08: 0001 R09: 0001
> > > >   R10: bbd44111fed8 R11:  R12: ed1182fff5c0
> > > >   R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10
> > > >   FS:  7efc4335a500() GS:93a5bfc0() 
> > > > knlGS:
> > > >   CS:  0010 DS:  ES:  CR0: 80050033
> > > >   CR2: fffe CR3: b2a58000 CR4: 001406e0
> > > >   Call Trace:
> > > >kpageflags_read+0xc7/0x120
> > > >proc_reg_read+0x3c/0x60
> > > >__vfs_read+0x36/0x170
> > > >vfs_read+0x89/0x130
> > > >ksys_pread64+0x71/0x90
> > > >do_syscall_64+0x5b/0x160
> > > >entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > >   RIP: 0033:0x7efc42e75e23
> > > >   Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 
> > > > 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 
> > > > 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
> > > >
> > > > According to kernel bisection, this problem became visible due to
> > > > commit f7f99100d8d9 ("mm: stop zeroing memory during allocation in
> > > > vmemmap"), which changed how struct pages are initialized.
> > > >
> > > > Memblock layout affects the pfn ranges covered by node/zone. Consider
> > > > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
> > > > the default (no memmap= given) memblock layout is like below:
> > > >
> > > >   MEMBLOCK configuration:
> > > >memory size = 0x0001fff75c00 reserved size = 0x0300c000
> > > >memory.cnt  = 0x4
> > > >memory[0x0] [0x1000-0x0009efff], 
> > > > 0x0009e000 bytes on node 0 flags: 0x0
> > > >memory[0x1] [0x0010-0xbffd6fff], 
> > > > 0xbfed7000 bytes on node 0 flags: 0x0
> > > >memory[0x2] [0x0001-0x00013fff], 
> > > > 0x4000 bytes on node 0 flags: 0x0
> > > >memory[0x3] [0x00014000-0x00023fff], 
> > > > 0x0001 bytes on node 1 flags: 0x0
> > > >...
> > > >
> > > > If you give memmap=1G!4G (so it just covers memory[0x2]),
> > > > the range [0x1-0x13fff] is gone:
> > > >
> > > >   MEMBLOCK configuration:
> > > >memory size = 0x0001bff75c00 reserved size = 0x0300c000
> > > >memory.cnt  = 0x3
> > > >memory[0x0] [0x1000-0x0009efff], 
> > > > 0x0009e000 bytes on node 0 flags: 0x0
> > > >memory[0x1] [0x0010-0xbffd6fff], 
> > > > 0xbfed7000 bytes on node 0 flags: 0x0
> > > >memory[0x2] [0x00014000-0x00023fff], 
> > > > 0x0001 bytes on node 1 flags: 0x0
> > > >...
> > > >
> > > > This shrinks node 0's pfn range, because that range is calculated
> > > > from the address range of memblock.memory, so some of the struct
> > > > pages in the gap are left uninitialized.
> > > >
> > > > We have a function zero_resv_unavail() which zeroes the struct
> > > > pages outside memblock.memory, but currently it covers only the
> > > > reserved unavailable range (i.e. memblock.memory && !memblock.reserved).
> > > > This patch extends it to cover the whole unavailable range, which
> > > > fixes the reported issue.
> > >
> > > Thanks for pinpointing this down, Naoya! I am wondering why we cannot
> > > simply mark the excluded ranges to be reserved instead.
> > 
> > I tried your idea with the change below, and it also fixes the kernel panic.
> > 
> > ---
> > diff --git a/arch/x86/kernel/e820.c 

Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)

2018-06-14 Thread Michal Hocko
On Thu 14-06-18 05:16:18, Naoya Horiguchi wrote:
> On Wed, Jun 13, 2018 at 11:07:00AM +0200, Michal Hocko wrote:
> > On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote:
> > [...]
> > > From: Naoya Horiguchi 
> > > Date: Wed, 13 Jun 2018 12:43:27 +0900
> > > Subject: [PATCH] mm: zero remaining unavailable struct pages
> > >
> > > There is a kernel panic that is triggered when reading /proc/kpageflags
> > > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
> > >
> > >   BUG: unable to handle kernel paging request at fffe
> > >   PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
> > >   Oops:  [#1] SMP PTI
> > >   CPU: 2 PID: 1728 Comm: page-types Not tainted 
> > > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
> > >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > 1.11.0-2.fc28 04/01/2014
> > >   RIP: 0010:stable_page_flags+0x27/0x3c0
> > >   Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 
> > > fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 
> > > c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
> > >   RSP: 0018:bbd44111fde0 EFLAGS: 00010202
> > >   RAX: fffe RBX: 7fffeff9 RCX: 
> > >   RDX: 0001 RSI: 0202 RDI: ed1182fff5c0
> > >   RBP:  R08: 0001 R09: 0001
> > >   R10: bbd44111fed8 R11:  R12: ed1182fff5c0
> > >   R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10
> > >   FS:  7efc4335a500() GS:93a5bfc0() 
> > > knlGS:
> > >   CS:  0010 DS:  ES:  CR0: 80050033
> > >   CR2: fffe CR3: b2a58000 CR4: 001406e0
> > >   Call Trace:
> > >kpageflags_read+0xc7/0x120
> > >proc_reg_read+0x3c/0x60
> > >__vfs_read+0x36/0x170
> > >vfs_read+0x89/0x130
> > >ksys_pread64+0x71/0x90
> > >do_syscall_64+0x5b/0x160
> > >entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >   RIP: 0033:0x7efc42e75e23
> > >   Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 
> > > 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 
> > > ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
> > >
> > > According to kernel bisection, this problem became visible due to
> > > commit f7f99100d8d9 ("mm: stop zeroing memory during allocation in
> > > vmemmap"), which changed how struct pages are initialized.
> > >
> > > Memblock layout affects the pfn ranges covered by node/zone. Consider
> > > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
> > > the default (no memmap= given) memblock layout is like below:
> > >
> > >   MEMBLOCK configuration:
> > >memory size = 0x0001fff75c00 reserved size = 0x0300c000
> > >memory.cnt  = 0x4
> > >memory[0x0] [0x1000-0x0009efff], 
> > > 0x0009e000 bytes on node 0 flags: 0x0
> > >memory[0x1] [0x0010-0xbffd6fff], 
> > > 0xbfed7000 bytes on node 0 flags: 0x0
> > >memory[0x2] [0x0001-0x00013fff], 
> > > 0x4000 bytes on node 0 flags: 0x0
> > >memory[0x3] [0x00014000-0x00023fff], 
> > > 0x0001 bytes on node 1 flags: 0x0
> > >...
> > >
> > > If you give memmap=1G!4G (so it just covers memory[0x2]),
> > > the range [0x1-0x13fff] is gone:
> > >
> > >   MEMBLOCK configuration:
> > >memory size = 0x0001bff75c00 reserved size = 0x0300c000
> > >memory.cnt  = 0x3
> > >memory[0x0] [0x1000-0x0009efff], 
> > > 0x0009e000 bytes on node 0 flags: 0x0
> > >memory[0x1] [0x0010-0xbffd6fff], 
> > > 0xbfed7000 bytes on node 0 flags: 0x0
> > >memory[0x2] [0x00014000-0x00023fff], 
> > > 0x0001 bytes on node 1 flags: 0x0
> > >...
> > >
> > > This shrinks node 0's pfn range, because that range is calculated
> > > from the address range of memblock.memory, so some of the struct
> > > pages in the gap are left uninitialized.
> > >
> > > We have a function zero_resv_unavail() which zeroes the struct
> > > pages outside memblock.memory, but currently it covers only the
> > > reserved unavailable range (i.e. memblock.memory && !memblock.reserved).
> > > This patch extends it to cover the whole unavailable range, which
> > > fixes the reported issue.
> >
> > Thanks for pinpointing this down, Naoya! I am wondering why we cannot
> > simply mark the excluded ranges to be reserved instead.
> 
> I tried your idea with the change below, and it also fixes the kernel panic.
> 
> ---
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index d1f25c831447..2cef120535d4 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1248,6 +1248,7 @@ void __init e820__memblock_setup(void)
>  {
>   int i;
>   u64 end;
> + u64 addr = 0;
>  
>   /*
>  

Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)

2018-06-13 Thread Oscar Salvador
On Thu, Jun 14, 2018 at 05:16:18AM +, Naoya Horiguchi wrote:
> On Wed, Jun 13, 2018 at 11:07:00AM +0200, Michal Hocko wrote:
> > On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote:
> > [...]
> > > From: Naoya Horiguchi 
> > > Date: Wed, 13 Jun 2018 12:43:27 +0900
> > > Subject: [PATCH] mm: zero remaining unavailable struct pages
> > >
> > > There is a kernel panic that is triggered when reading /proc/kpageflags
> > > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
> > >
> > >   BUG: unable to handle kernel paging request at fffe
> > >   PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
> > >   Oops:  [#1] SMP PTI
> > >   CPU: 2 PID: 1728 Comm: page-types Not tainted 
> > > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
> > >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > 1.11.0-2.fc28 04/01/2014
> > >   RIP: 0010:stable_page_flags+0x27/0x3c0
> > >   Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 
> > > fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 
> > > c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
> > >   RSP: 0018:bbd44111fde0 EFLAGS: 00010202
> > >   RAX: fffe RBX: 7fffeff9 RCX: 
> > >   RDX: 0001 RSI: 0202 RDI: ed1182fff5c0
> > >   RBP:  R08: 0001 R09: 0001
> > >   R10: bbd44111fed8 R11:  R12: ed1182fff5c0
> > >   R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10
> > >   FS:  7efc4335a500() GS:93a5bfc0() 
> > > knlGS:
> > >   CS:  0010 DS:  ES:  CR0: 80050033
> > >   CR2: fffe CR3: b2a58000 CR4: 001406e0
> > >   Call Trace:
> > >kpageflags_read+0xc7/0x120
> > >proc_reg_read+0x3c/0x60
> > >__vfs_read+0x36/0x170
> > >vfs_read+0x89/0x130
> > >ksys_pread64+0x71/0x90
> > >do_syscall_64+0x5b/0x160
> > >entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >   RIP: 0033:0x7efc42e75e23
> > >   Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 
> > > 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 
> > > ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
> > >
> > > According to kernel bisection, this problem became visible due to
> > > commit f7f99100d8d9 ("mm: stop zeroing memory during allocation in
> > > vmemmap"), which changed how struct pages are initialized.
> > >
> > > Memblock layout affects the pfn ranges covered by node/zone. Consider
> > > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
> > > the default (no memmap= given) memblock layout is like below:
> > >
> > >   MEMBLOCK configuration:
> > >memory size = 0x0001fff75c00 reserved size = 0x0300c000
> > >memory.cnt  = 0x4
> > >memory[0x0] [0x1000-0x0009efff], 
> > > 0x0009e000 bytes on node 0 flags: 0x0
> > >memory[0x1] [0x0010-0xbffd6fff], 
> > > 0xbfed7000 bytes on node 0 flags: 0x0
> > >memory[0x2] [0x0001-0x00013fff], 
> > > 0x4000 bytes on node 0 flags: 0x0
> > >memory[0x3] [0x00014000-0x00023fff], 
> > > 0x0001 bytes on node 1 flags: 0x0
> > >...
> > >
> > > If you give memmap=1G!4G (so it just covers memory[0x2]),
> > > the range [0x1-0x13fff] is gone:
> > >
> > >   MEMBLOCK configuration:
> > >memory size = 0x0001bff75c00 reserved size = 0x0300c000
> > >memory.cnt  = 0x3
> > >memory[0x0] [0x1000-0x0009efff], 
> > > 0x0009e000 bytes on node 0 flags: 0x0
> > >memory[0x1] [0x0010-0xbffd6fff], 
> > > 0xbfed7000 bytes on node 0 flags: 0x0
> > >memory[0x2] [0x00014000-0x00023fff], 
> > > 0x0001 bytes on node 1 flags: 0x0
> > >...
> > >
> > > This shrinks node 0's pfn range, because that range is calculated
> > > from the address range of memblock.memory, so some of the struct
> > > pages in the gap are left uninitialized.
> > >
> > > We have a function zero_resv_unavail() which zeroes the struct
> > > pages outside memblock.memory, but currently it covers only the
> > > reserved unavailable range (i.e. memblock.memory && !memblock.reserved).
> > > This patch extends it to cover the whole unavailable range, which
> > > fixes the reported issue.
> >
> > Thanks for pinpointing this down, Naoya! I am wondering why we cannot
> > simply mark the excluded ranges to be reserved instead.
> 
> I tried your idea with the change below, and it also fixes the kernel panic.
> 
> ---
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index d1f25c831447..2cef120535d4 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1248,6 +1248,7 @@ void __init e820__memblock_setup(void)
>  {
>   int i;
>   u64 end;
> + u64 addr = 0;
> 

> > > ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
> > >
> > > According to kernel bisection, this problem became visible due to commit
> > > f7f99100d8d9 which changes how struct pages are initialized.
> > >
> > > Memblock layout affects the pfn ranges covered by node/zone. Consider
> > > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
> > > the default (no memmap= given) memblock layout is like below:
> > >
> > >   MEMBLOCK configuration:
> > >memory size = 0x0001fff75c00 reserved size = 0x0300c000
> > >memory.cnt  = 0x4
> > >memory[0x0] [0x1000-0x0009efff], 
> > > 0x0009e000 bytes on node 0 flags: 0x0
> > >memory[0x1] [0x0010-0xbffd6fff], 
> > > 0xbfed7000 bytes on node 0 flags: 0x0
> > >memory[0x2] [0x0001-0x00013fff], 
> > > 0x4000 bytes on node 0 flags: 0x0
> > >memory[0x3] [0x00014000-0x00023fff], 
> > > 0x0001 bytes on node 1 flags: 0x0
> > >...
> > >
> > > If you give memmap=1G!4G (so it just covers memory[0x2]),
> > > the range [0x1-0x13fff] is gone:
> > >
> > >   MEMBLOCK configuration:
> > >memory size = 0x0001bff75c00 reserved size = 0x0300c000
> > >memory.cnt  = 0x3
> > >memory[0x0] [0x1000-0x0009efff], 
> > > 0x0009e000 bytes on node 0 flags: 0x0
> > >memory[0x1] [0x0010-0xbffd6fff], 
> > > 0xbfed7000 bytes on node 0 flags: 0x0
> > >memory[0x2] [0x00014000-0x00023fff], 
> > > 0x0001 bytes on node 1 flags: 0x0
> > >...
> > >
> > > This causes node 0's pfn range to shrink, because it is calculated from
> > > the address range of memblock.memory, so some of the struct pages in the
> > > gap range are left uninitialized.
> > >
> > > We have a function zero_resv_unavail() which zeroes the struct
> > > pages outside memblock.memory, but currently it covers only the reserved
> > > unavailable ranges (i.e. memblock.reserved && !memblock.memory).
> > > This patch extends it to cover all unavailable ranges, which fixes
> > > the reported issue.
> >
> > Thanks for pinpointing this, Naoya! I am wondering why we cannot
> > simply mark the excluded ranges as reserved instead.
> 
> I tried your idea with the change below, and it also fixes the kernel panic.
> 
> ---
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index d1f25c831447..2cef120535d4 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1248,6 +1248,7 @@ void __init e820__memblock_setup(void)
>  {
>   int i;
>   u64 end;
> + u64 addr = 0;
> 

Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)

2018-06-13 Thread Naoya Horiguchi
On Wed, Jun 13, 2018 at 11:07:00AM +0200, Michal Hocko wrote:
> On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote:
> [...]
> > From: Naoya Horiguchi 
> > Date: Wed, 13 Jun 2018 12:43:27 +0900
> > Subject: [PATCH] mm: zero remaining unavailable struct pages
> >
> > There is a kernel panic that is triggered when reading /proc/kpageflags
> > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
> >
> >   BUG: unable to handle kernel paging request at fffe
> >   PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
> >   Oops:  [#1] SMP PTI
> >   CPU: 2 PID: 1728 Comm: page-types Not tainted 
> > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
> >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 
> > 04/01/2014
> >   RIP: 0010:stable_page_flags+0x27/0x3c0
> >   Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 
> > fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 
> > c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
> >   RSP: 0018:bbd44111fde0 EFLAGS: 00010202
> >   RAX: fffe RBX: 7fffeff9 RCX: 
> >   RDX: 0001 RSI: 0202 RDI: ed1182fff5c0
> >   RBP:  R08: 0001 R09: 0001
> >   R10: bbd44111fed8 R11:  R12: ed1182fff5c0
> >   R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10
> >   FS:  7efc4335a500() GS:93a5bfc0() 
> > knlGS:
> >   CS:  0010 DS:  ES:  CR0: 80050033
> >   CR2: fffe CR3: b2a58000 CR4: 001406e0
> >   Call Trace:
> >kpageflags_read+0xc7/0x120
> >proc_reg_read+0x3c/0x60
> >__vfs_read+0x36/0x170
> >vfs_read+0x89/0x130
> >ksys_pread64+0x71/0x90
> >do_syscall_64+0x5b/0x160
> >entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >   RIP: 0033:0x7efc42e75e23
> >   Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 
> > 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 
> > ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
> >
> > According to kernel bisection, this problem became visible due to commit
> > f7f99100d8d9 which changes how struct pages are initialized.
> >
> > Memblock layout affects the pfn ranges covered by node/zone. Consider
> > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
> > the default (no memmap= given) memblock layout is like below:
> >
> >   MEMBLOCK configuration:
> >memory size = 0x0001fff75c00 reserved size = 0x0300c000
> >memory.cnt  = 0x4
> >memory[0x0] [0x1000-0x0009efff], 
> > 0x0009e000 bytes on node 0 flags: 0x0
> >memory[0x1] [0x0010-0xbffd6fff], 
> > 0xbfed7000 bytes on node 0 flags: 0x0
> >memory[0x2] [0x0001-0x00013fff], 
> > 0x4000 bytes on node 0 flags: 0x0
> >memory[0x3] [0x00014000-0x00023fff], 
> > 0x0001 bytes on node 1 flags: 0x0
> >...
> >
> > If you give memmap=1G!4G (so it just covers memory[0x2]),
> > the range [0x1-0x13fff] is gone:
> >
> >   MEMBLOCK configuration:
> >memory size = 0x0001bff75c00 reserved size = 0x0300c000
> >memory.cnt  = 0x3
> >memory[0x0] [0x1000-0x0009efff], 
> > 0x0009e000 bytes on node 0 flags: 0x0
> >memory[0x1] [0x0010-0xbffd6fff], 
> > 0xbfed7000 bytes on node 0 flags: 0x0
> >memory[0x2] [0x00014000-0x00023fff], 
> > 0x0001 bytes on node 1 flags: 0x0
> >...
> >
> > This causes node 0's pfn range to shrink, because it is calculated from
> > the address range of memblock.memory, so some of the struct pages in the
> > gap range are left uninitialized.
> >
> > We have a function zero_resv_unavail() which zeroes the struct
> > pages outside memblock.memory, but currently it covers only the reserved
> > unavailable ranges (i.e. memblock.reserved && !memblock.memory).
> > This patch extends it to cover all unavailable ranges, which fixes
> > the reported issue.
>
> Thanks for pinpointing this, Naoya! I am wondering why we cannot
> simply mark the excluded ranges as reserved instead.

I tried your idea with the change below, and it also fixes the kernel panic.

---
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index d1f25c831447..2cef120535d4 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1248,6 +1248,7 @@ void __init e820__memblock_setup(void)
 {
int i;
u64 end;
+   u64 addr = 0;
 
/*
 * The bootstrap memblock region count maximum is 128 entries
@@ -1264,13 +1265,16 @@ void __init e820__memblock_setup(void)
	struct e820_entry *entry = &e820_table->entries[i];
 
end = entry->addr + entry->size;
+  

Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)

2018-06-13 Thread Naoya Horiguchi
On Wed, Jun 13, 2018 at 10:40:32AM +0200, Oscar Salvador wrote:
> On Wed, Jun 13, 2018 at 05:41:08AM +, Naoya Horiguchi wrote:
> > Hi everyone, 
> > 
> > I wrote a patch for this issue.
> > There was a discussion about a prechecking approach, but I finally found
> > out that it's hard to make changes to memblock after numa_init, so I took
> > another approach (see the patch description).
> > 
> > I'd be glad if you could check that it works for you.
> > 
> > Thanks,
> > Naoya Horiguchi
> > ---
> > From: Naoya Horiguchi 
> > Date: Wed, 13 Jun 2018 12:43:27 +0900
> > Subject: [PATCH] mm: zero remaining unavailable struct pages
> > 
> > There is a kernel panic that is triggered when reading /proc/kpageflags
> > on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
> > 
> >   BUG: unable to handle kernel paging request at fffe
> >   PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
> >   Oops:  [#1] SMP PTI
> >   CPU: 2 PID: 1728 Comm: page-types Not tainted 
> > 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
> >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 
> > 04/01/2014
> >   RIP: 0010:stable_page_flags+0x27/0x3c0
> >   Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 
> > fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 
> > c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
> >   RSP: 0018:bbd44111fde0 EFLAGS: 00010202
> >   RAX: fffe RBX: 7fffeff9 RCX: 
> >   RDX: 0001 RSI: 0202 RDI: ed1182fff5c0
> >   RBP:  R08: 0001 R09: 0001
> >   R10: bbd44111fed8 R11:  R12: ed1182fff5c0
> >   R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10
> >   FS:  7efc4335a500() GS:93a5bfc0() 
> > knlGS:
> >   CS:  0010 DS:  ES:  CR0: 80050033
> >   CR2: fffe CR3: b2a58000 CR4: 001406e0
> >   Call Trace:
> >kpageflags_read+0xc7/0x120
> >proc_reg_read+0x3c/0x60
> >__vfs_read+0x36/0x170
> >vfs_read+0x89/0x130
> >ksys_pread64+0x71/0x90
> >do_syscall_64+0x5b/0x160
> >entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >   RIP: 0033:0x7efc42e75e23
> >   Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 
> > 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 
> > ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
> > 
> > According to kernel bisection, this problem became visible due to commit
> > f7f99100d8d9 which changes how struct pages are initialized.
> > 
> > Memblock layout affects the pfn ranges covered by node/zone. Consider
> > that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
> > the default (no memmap= given) memblock layout is like below:
> > 
> >   MEMBLOCK configuration:
> >memory size = 0x0001fff75c00 reserved size = 0x0300c000
> >memory.cnt  = 0x4
> >memory[0x0] [0x1000-0x0009efff], 
> > 0x0009e000 bytes on node 0 flags: 0x0
> >memory[0x1] [0x0010-0xbffd6fff], 
> > 0xbfed7000 bytes on node 0 flags: 0x0
> >memory[0x2] [0x0001-0x00013fff], 
> > 0x4000 bytes on node 0 flags: 0x0
> >memory[0x3] [0x00014000-0x00023fff], 
> > 0x0001 bytes on node 1 flags: 0x0
> >...
> > 
> > If you give memmap=1G!4G (so it just covers memory[0x2]),
> > the range [0x1-0x13fff] is gone:
> > 
> >   MEMBLOCK configuration:
> >memory size = 0x0001bff75c00 reserved size = 0x0300c000
> >memory.cnt  = 0x3
> >memory[0x0] [0x1000-0x0009efff], 
> > 0x0009e000 bytes on node 0 flags: 0x0
> >memory[0x1] [0x0010-0xbffd6fff], 
> > 0xbfed7000 bytes on node 0 flags: 0x0
> >memory[0x2] [0x00014000-0x00023fff], 
> > 0x0001 bytes on node 1 flags: 0x0
> >...
> > 
> > This causes node 0's pfn range to shrink, because it is calculated from
> > the address range of memblock.memory, so some of the struct pages in the
> > gap range are left uninitialized.
> > 
> > We have a function zero_resv_unavail() which zeroes the struct
> > pages outside memblock.memory, but currently it covers only the reserved
> > unavailable ranges (i.e. memblock.reserved && !memblock.memory).
> > This patch extends it to cover all unavailable ranges, which fixes
> > the reported issue.
> > 
> > Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
> > Signed-off-by: Naoya Horiguchi 
> > ---
> >  include/linux/memblock.h | 16 
> >  mm/page_alloc.c  | 33 -
> >  2 files changed, 24 insertions(+), 25 deletions(-)
> > 
> > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > 

Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)

2018-06-13 Thread Michal Hocko
On Wed 13-06-18 05:41:08, Naoya Horiguchi wrote:
[...]
> From: Naoya Horiguchi 
> Date: Wed, 13 Jun 2018 12:43:27 +0900
> Subject: [PATCH] mm: zero remaining unavailable struct pages
> 
> There is a kernel panic that is triggered when reading /proc/kpageflags
> on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
> 
>   BUG: unable to handle kernel paging request at fffe
>   PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
>   Oops:  [#1] SMP PTI
>   CPU: 2 PID: 1728 Comm: page-types Not tainted 
> 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 
> 04/01/2014
>   RIP: 0010:stable_page_flags+0x27/0x3c0
>   Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 
> 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 
> 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
>   RSP: 0018:bbd44111fde0 EFLAGS: 00010202
>   RAX: fffe RBX: 7fffeff9 RCX: 
>   RDX: 0001 RSI: 0202 RDI: ed1182fff5c0
>   RBP:  R08: 0001 R09: 0001
>   R10: bbd44111fed8 R11:  R12: ed1182fff5c0
>   R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10
>   FS:  7efc4335a500() GS:93a5bfc0() knlGS:
>   CS:  0010 DS:  ES:  CR0: 80050033
>   CR2: fffe CR3: b2a58000 CR4: 001406e0
>   Call Trace:
>kpageflags_read+0xc7/0x120
>proc_reg_read+0x3c/0x60
>__vfs_read+0x36/0x170
>vfs_read+0x89/0x130
>ksys_pread64+0x71/0x90
>do_syscall_64+0x5b/0x160
>entry_SYSCALL_64_after_hwframe+0x44/0xa9
>   RIP: 0033:0x7efc42e75e23
>   Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 
> 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 
> 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
> 
> According to kernel bisection, this problem became visible due to commit
> f7f99100d8d9 which changes how struct pages are initialized.
> 
> Memblock layout affects the pfn ranges covered by node/zone. Consider
> that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
> the default (no memmap= given) memblock layout is like below:
> 
>   MEMBLOCK configuration:
>memory size = 0x0001fff75c00 reserved size = 0x0300c000
>memory.cnt  = 0x4
>memory[0x0] [0x1000-0x0009efff], 
> 0x0009e000 bytes on node 0 flags: 0x0
>memory[0x1] [0x0010-0xbffd6fff], 
> 0xbfed7000 bytes on node 0 flags: 0x0
>memory[0x2] [0x0001-0x00013fff], 
> 0x4000 bytes on node 0 flags: 0x0
>memory[0x3] [0x00014000-0x00023fff], 
> 0x0001 bytes on node 1 flags: 0x0
>...
> 
> If you give memmap=1G!4G (so it just covers memory[0x2]),
> the range [0x1-0x13fff] is gone:
> 
>   MEMBLOCK configuration:
>memory size = 0x0001bff75c00 reserved size = 0x0300c000
>memory.cnt  = 0x3
>memory[0x0] [0x1000-0x0009efff], 
> 0x0009e000 bytes on node 0 flags: 0x0
>memory[0x1] [0x0010-0xbffd6fff], 
> 0xbfed7000 bytes on node 0 flags: 0x0
>memory[0x2] [0x00014000-0x00023fff], 
> 0x0001 bytes on node 1 flags: 0x0
>...
> 
> This causes node 0's pfn range to shrink, because it is calculated from
> the address range of memblock.memory, so some of the struct pages in the
> gap range are left uninitialized.
> 
> We have a function zero_resv_unavail() which zeroes the struct
> pages outside memblock.memory, but currently it covers only the reserved
> unavailable ranges (i.e. memblock.reserved && !memblock.memory).
> This patch extends it to cover all unavailable ranges, which fixes
> the reported issue.

Thanks for pinpointing this, Naoya! I am wondering why we cannot
simply mark the excluded ranges as reserved instead.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)

2018-06-13 Thread Oscar Salvador
On Wed, Jun 13, 2018 at 05:41:08AM +, Naoya Horiguchi wrote:
> Hi everyone, 
> 
> I wrote a patch for this issue.
> There was a discussion about a prechecking approach, but I finally found
> out that it's hard to make changes to memblock after numa_init, so I took
> another approach (see the patch description).
> 
> I'd be glad if you could check that it works for you.
> 
> Thanks,
> Naoya Horiguchi
> ---
> From: Naoya Horiguchi 
> Date: Wed, 13 Jun 2018 12:43:27 +0900
> Subject: [PATCH] mm: zero remaining unavailable struct pages
> 
> There is a kernel panic that is triggered when reading /proc/kpageflags
> on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
> 
>   BUG: unable to handle kernel paging request at fffe
>   PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
>   Oops:  [#1] SMP PTI
>   CPU: 2 PID: 1728 Comm: page-types Not tainted 
> 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 
> 04/01/2014
>   RIP: 0010:stable_page_flags+0x27/0x3c0
>   Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 
> 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 
> 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
>   RSP: 0018:bbd44111fde0 EFLAGS: 00010202
>   RAX: fffe RBX: 7fffeff9 RCX: 
>   RDX: 0001 RSI: 0202 RDI: ed1182fff5c0
>   RBP:  R08: 0001 R09: 0001
>   R10: bbd44111fed8 R11:  R12: ed1182fff5c0
>   R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10
>   FS:  7efc4335a500() GS:93a5bfc0() knlGS:
>   CS:  0010 DS:  ES:  CR0: 80050033
>   CR2: fffe CR3: b2a58000 CR4: 001406e0
>   Call Trace:
>kpageflags_read+0xc7/0x120
>proc_reg_read+0x3c/0x60
>__vfs_read+0x36/0x170
>vfs_read+0x89/0x130
>ksys_pread64+0x71/0x90
>do_syscall_64+0x5b/0x160
>entry_SYSCALL_64_after_hwframe+0x44/0xa9
>   RIP: 0033:0x7efc42e75e23
>   Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 
> 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 
> 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
> 
> According to kernel bisection, this problem became visible due to commit
> f7f99100d8d9 which changes how struct pages are initialized.
> 
> Memblock layout affects the pfn ranges covered by node/zone. Consider
> that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
> the default (no memmap= given) memblock layout is like below:
> 
>   MEMBLOCK configuration:
>memory size = 0x0001fff75c00 reserved size = 0x0300c000
>memory.cnt  = 0x4
>memory[0x0] [0x1000-0x0009efff], 
> 0x0009e000 bytes on node 0 flags: 0x0
>memory[0x1] [0x0010-0xbffd6fff], 
> 0xbfed7000 bytes on node 0 flags: 0x0
>memory[0x2] [0x0001-0x00013fff], 
> 0x4000 bytes on node 0 flags: 0x0
>memory[0x3] [0x00014000-0x00023fff], 
> 0x0001 bytes on node 1 flags: 0x0
>...
> 
> If you give memmap=1G!4G (so it just covers memory[0x2]),
> the range [0x1-0x13fff] is gone:
> 
>   MEMBLOCK configuration:
>memory size = 0x0001bff75c00 reserved size = 0x0300c000
>memory.cnt  = 0x3
>memory[0x0] [0x1000-0x0009efff], 
> 0x0009e000 bytes on node 0 flags: 0x0
>memory[0x1] [0x0010-0xbffd6fff], 
> 0xbfed7000 bytes on node 0 flags: 0x0
>memory[0x2] [0x00014000-0x00023fff], 
> 0x0001 bytes on node 1 flags: 0x0
>...
> 
> This shrinks node 0's pfn range, because that range is calculated from
> the address range of memblock.memory. So some of the struct pages in the
> gap range are left uninitialized.
> 
> We have a function zero_resv_unavail() which zeroes the struct
> pages outside memblock.memory, but currently it covers only the reserved
> unavailable range (i.e. memblock.reserved && !memblock.memory).
> This patch extends it to cover all unavailable ranges, which fixes
> the reported issue.
> 
> Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
> Signed-off-by: Naoya Horiguchi 
> ---
>  include/linux/memblock.h | 16 
>  mm/page_alloc.c  | 33 -
>  2 files changed, 24 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index ca59883c8364..f191e51c5d2a 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -236,22 +236,6 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
>   for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved, \
>  nid, flags, p_start, p_end, p_nid)

[PATCH v1] mm: zero remaining unavailable struct pages (Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM)

2018-06-12 Thread Naoya Horiguchi
Hi everyone, 

I wrote a patch for this issue.
There was a discussion about a prechecking approach, but I finally found
out that it's hard to make changes to memblock after numa_init, so I took
another approach (see the patch description).

I'd be glad if you could check that it works for you.

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi 
Date: Wed, 13 Jun 2018 12:43:27 +0900
Subject: [PATCH] mm: zero remaining unavailable struct pages

There is a kernel panic that is triggered when reading /proc/kpageflags
on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':

  BUG: unable to handle kernel paging request at fffe
  PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
  Oops:  [#1] SMP PTI
  CPU: 2 PID: 1728 Comm: page-types Not tainted 
4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 
04/01/2014
  RIP: 0010:stable_page_flags+0x27/0x3c0
  Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 
48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 
10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
  RSP: 0018:bbd44111fde0 EFLAGS: 00010202
  RAX: fffe RBX: 7fffeff9 RCX: 
  RDX: 0001 RSI: 0202 RDI: ed1182fff5c0
  RBP:  R08: 0001 R09: 0001
  R10: bbd44111fed8 R11:  R12: ed1182fff5c0
  R13: 000bffd7 R14: 02fff5c0 R15: bbd44111ff10
  FS:  7efc4335a500() GS:93a5bfc0() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: fffe CR3: b2a58000 CR4: 001406e0
  Call Trace:
   kpageflags_read+0xc7/0x120
   proc_reg_read+0x3c/0x60
   __vfs_read+0x36/0x170
   vfs_read+0x89/0x130
   ksys_pread64+0x71/0x90
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7efc42e75e23
  Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 
3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 
c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24

According to kernel bisection, this problem became visible due to commit
f7f99100d8d9 which changes how struct pages are initialized.

Memblock layout affects the pfn ranges covered by node/zone. Consider
that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
the default (no memmap= given) memblock layout is like below:

  MEMBLOCK configuration:
   memory size = 0x0001fff75c00 reserved size = 0x0300c000
   memory.cnt  = 0x4
   memory[0x0] [0x1000-0x0009efff], 0x0009e000 
bytes on node 0 flags: 0x0
   memory[0x1] [0x0010-0xbffd6fff], 0xbfed7000 
bytes on node 0 flags: 0x0
   memory[0x2] [0x0001-0x00013fff], 0x4000 
bytes on node 0 flags: 0x0
   memory[0x3] [0x00014000-0x00023fff], 0x0001 
bytes on node 1 flags: 0x0
   ...

If you give memmap=1G!4G (so it just covers memory[0x2]),
the range [0x1-0x13fff] is gone:

  MEMBLOCK configuration:
   memory size = 0x0001bff75c00 reserved size = 0x0300c000
   memory.cnt  = 0x3
   memory[0x0] [0x1000-0x0009efff], 0x0009e000 
bytes on node 0 flags: 0x0
   memory[0x1] [0x0010-0xbffd6fff], 0xbfed7000 
bytes on node 0 flags: 0x0
   memory[0x2] [0x00014000-0x00023fff], 0x0001 
bytes on node 1 flags: 0x0
   ...

This shrinks node 0's pfn range, because that range is calculated from
the address range of memblock.memory. So some of the struct pages in the
gap range are left uninitialized.

We have a function zero_resv_unavail() which zeroes the struct
pages outside memblock.memory, but currently it covers only the reserved
unavailable range (i.e. memblock.reserved && !memblock.memory).
This patch extends it to cover all unavailable ranges, which fixes
the reported issue.

Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Naoya Horiguchi 
---
 include/linux/memblock.h | 16 
 mm/page_alloc.c  | 33 -
 2 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index ca59883c8364..f191e51c5d2a 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -236,22 +236,6 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
	for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved, \
			       nid, flags, p_start, p_end, p_nid)
 
-/**
- * for_each_resv_unavail_range - iterate through reserved and unavailable memory
- * @i: u64 used as loop variable
- * @flags: pick from blocks based on memory attributes
- * @p_start: ptr to phys_addr_t for start address of the range, can be
