Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

2021-01-11 Thread Mike Rapoport
On Mon, Jan 11, 2021 at 10:06:43AM -0500, Qian Cai wrote:
> On Sun, 2021-01-10 at 17:39 +0200, Mike Rapoport wrote:
> > On Wed, Jan 06, 2021 at 04:04:21PM -0500, Qian Cai wrote:
> > > On Wed, 2021-01-06 at 10:05 +0200, Mike Rapoport wrote:
> > > > I think we trigger PF_POISONED_CHECK() in PageSlab(), then
> > > > fffe is "accessed" from VM_BUG_ON_PAGE().
> > > > 
> > > > It seems to me that we are not initializing struct pages for holes at the
> > > > node boundaries because zones are already clamped to exclude those holes.
> > > > 
> > > > Can you please try to see if the patch below will produce any useful info:
> > > 
> > > [0.00] init_unavailable_range: spfn: 8c, epfn: 9b, zone: DMA, node: 0
> > > [0.00] init_unavailable_range: spfn: 1f7be, epfn: 1f9fe, zone: DMA32, node: 0
> > > [0.00] init_unavailable_range: spfn: 28784, epfn: 288e4, zone: DMA32, node: 0
> > > [0.00] init_unavailable_range: spfn: 298b9, epfn: 298bd, zone: DMA32, node: 0
> > > [0.00] init_unavailable_range: spfn: 29923, epfn: 29931, zone: DMA32, node: 0
> > > [0.00] init_unavailable_range: spfn: 29933, epfn: 29941, zone: DMA32, node: 0
> > > [0.00] init_unavailable_range: spfn: 29945, epfn: 29946, zone: DMA32, node: 0
> > > [0.00] init_unavailable_range: spfn: 29ff9, epfn: 2a823, zone: DMA32, node: 0
> > > [0.00] init_unavailable_range: spfn: 33a23, epfn: 33a53, zone: DMA32, node: 0
> > > [0.00] init_unavailable_range: spfn: 78000, epfn: 10, zone: DMA32, node: 0
> > > ...
> > > [  572.222563][ T2302] kpagecount_read: pfn 47f380 is poisoned
> > ...
> > > [  590.570032][ T2302] kpagecount_read: pfn 47 is poisoned
> > > [  604.268653][ T2302] kpagecount_read: pfn 87ff80 is poisoned
> > ...
> > > [  604.611698][ T2302] kpagecount_read: pfn 87ffbc is poisoned
> > > [  617.484205][ T2302] kpagecount_read: pfn c7ff80 is poisoned
> > ...
> > > [  618.212344][ T2302] kpagecount_read: pfn c7 is poisoned
> > > [  633.134228][ T2302] kpagecount_read: pfn 107ff80 is poisoned
> > ...
> > > [  633.874087][ T2302] kpagecount_read: pfn 107 is poisoned
> > > [  647.686412][ T2302] kpagecount_read: pfn 147ff80 is poisoned
> > ...
> > > [  648.425548][ T2302] kpagecount_read: pfn 147 is poisoned
> > > [  663.692630][ T2302] kpagecount_read: pfn 187ff80 is poisoned
> > ...
> > > [  664.432671][ T2302] kpagecount_read: pfn 187 is poisoned
> > > [  675.462757][ T2302] kpagecount_read: pfn 1c7ff80 is poisoned
> > ...
> > > [  676.202548][ T2302] kpagecount_read: pfn 1c7 is poisoned
> > > [  687.121605][ T2302] kpagecount_read: pfn 207ff80 is poisoned
> > ...
> > > [  687.860981][ T2302] kpagecount_read: pfn 207 is poisoned
> > 
> > The e820 map has a hole near the end of each node and these holes are not
> > initialized with init_unavailable_range() after it was interleaved with
> > memmap initialization because such holes are not accounted by
> > zone->spanned_pages.
> > 
> > Yet, I still cannot really understand how this never triggered
> > 
> > VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
> > 
> > before v5.7 as all the struct pages for these holes would have zone=0 and
> > node=0 ... 
> > 
> > @Qian, can you please boot your system with memblock=debug and share the
> > logs?
> > 
> 
> http://people.redhat.com/qcai/memblock.txt

Thanks!

So, we have these large allocations for the memory maps:

memblock_alloc_exact_nid_raw: 266338304 bytes align=0x20 nid=0 from=0x0100 max_addr=0x sparse_init_nid+0x13b/0x519
memblock_reserve: [0x00046f40-0x00047f1f] memblock_alloc_range_nid+0x108/0x1b6
memblock_alloc_exact_nid_raw: 268435456 bytes align=0x20 nid=1 from=0x0100 max_addr=0x sparse_init_nid+0x13b/0x519
memblock_reserve: [0x00086fe0-0x00087fdf] memblock_alloc_range_nid+0x108/0x1b6
memblock_alloc_exact_nid_raw: 268435456 bytes align=0x20 nid=2 from=0x0100 max_addr=0x sparse_init_nid+0x13b/0x519
memblock_reserve: [0x000c6fe0-0x000c7fdf] memblock_alloc_range_nid+0x108/0x1b6
memblock_alloc_exact_nid_raw: 268435456 bytes align=0x20 nid=3 from=0x0100 max_addr=0x sparse_init_nid+0x13b/0x519
memblock_reserve: [0x00106fe0-0x00107fdf] memblock_alloc_range_nid+0x108/0x1b6
memblock_alloc_exact_nid_raw: 268435456 bytes align=0x20 nid=4 from=0x0100 max_addr=0x sparse_init_nid+0x13b/0x519
memblock_reserve: [0x00146fe0-0x00147fdf] memblock_alloc_range_nid+0x108/0x1b6
memblock_alloc_exact_nid_raw: 268435456 bytes align=0x20 nid=5 from=0x0100 max_addr=0x sparse_init_nid+0x13b/0x519
memblock_reserve: [0x00186fe0-0x00187fdf]

Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

2021-01-11 Thread Qian Cai
On Sun, 2021-01-10 at 17:39 +0200, Mike Rapoport wrote:
> On Wed, Jan 06, 2021 at 04:04:21PM -0500, Qian Cai wrote:
> > On Wed, 2021-01-06 at 10:05 +0200, Mike Rapoport wrote:
> > > I think we trigger PF_POISONED_CHECK() in PageSlab(), then
> > > fffe is "accessed" from VM_BUG_ON_PAGE().
> > > 
> > > It seems to me that we are not initializing struct pages for holes at the
> > > node boundaries because zones are already clamped to exclude those holes.
> > > 
> > > Can you please try to see if the patch below will produce any useful info:
> > 
> > [0.00] init_unavailable_range: spfn: 8c, epfn: 9b, zone: DMA, node: 0
> > [0.00] init_unavailable_range: spfn: 1f7be, epfn: 1f9fe, zone: DMA32, node: 0
> > [0.00] init_unavailable_range: spfn: 28784, epfn: 288e4, zone: DMA32, node: 0
> > [0.00] init_unavailable_range: spfn: 298b9, epfn: 298bd, zone: DMA32, node: 0
> > [0.00] init_unavailable_range: spfn: 29923, epfn: 29931, zone: DMA32, node: 0
> > [0.00] init_unavailable_range: spfn: 29933, epfn: 29941, zone: DMA32, node: 0
> > [0.00] init_unavailable_range: spfn: 29945, epfn: 29946, zone: DMA32, node: 0
> > [0.00] init_unavailable_range: spfn: 29ff9, epfn: 2a823, zone: DMA32, node: 0
> > [0.00] init_unavailable_range: spfn: 33a23, epfn: 33a53, zone: DMA32, node: 0
> > [0.00] init_unavailable_range: spfn: 78000, epfn: 10, zone: DMA32, node: 0
> > ...
> > [  572.222563][ T2302] kpagecount_read: pfn 47f380 is poisoned
> ...
> > [  590.570032][ T2302] kpagecount_read: pfn 47 is poisoned
> > [  604.268653][ T2302] kpagecount_read: pfn 87ff80 is poisoned
> ...
> > [  604.611698][ T2302] kpagecount_read: pfn 87ffbc is poisoned
> > [  617.484205][ T2302] kpagecount_read: pfn c7ff80 is poisoned
> ...
> > [  618.212344][ T2302] kpagecount_read: pfn c7 is poisoned
> > [  633.134228][ T2302] kpagecount_read: pfn 107ff80 is poisoned
> ...
> > [  633.874087][ T2302] kpagecount_read: pfn 107 is poisoned
> > [  647.686412][ T2302] kpagecount_read: pfn 147ff80 is poisoned
> ...
> > [  648.425548][ T2302] kpagecount_read: pfn 147 is poisoned
> > [  663.692630][ T2302] kpagecount_read: pfn 187ff80 is poisoned
> ...
> > [  664.432671][ T2302] kpagecount_read: pfn 187 is poisoned
> > [  675.462757][ T2302] kpagecount_read: pfn 1c7ff80 is poisoned
> ...
> > [  676.202548][ T2302] kpagecount_read: pfn 1c7 is poisoned
> > [  687.121605][ T2302] kpagecount_read: pfn 207ff80 is poisoned
> ...
> > [  687.860981][ T2302] kpagecount_read: pfn 207 is poisoned
> 
> The e820 map has a hole near the end of each node and these holes are not
> initialized with init_unavailable_range() after it was interleaved with
> memmap initialization because such holes are not accounted by
> zone->spanned_pages.
> 
> Yet, I still cannot really understand how this never triggered
> 
>   VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
> 
> before v5.7 as all the struct pages for these holes would have zone=0 and
> node=0 ... 
> 
> @Qian, can you please boot your system with memblock=debug and share the
> logs?
> 

http://people.redhat.com/qcai/memblock.txt



Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

2021-01-10 Thread Mike Rapoport
On Wed, Jan 06, 2021 at 04:04:21PM -0500, Qian Cai wrote:
> On Wed, 2021-01-06 at 10:05 +0200, Mike Rapoport wrote:
> > I think we trigger PF_POISONED_CHECK() in PageSlab(), then fffe
> > is "accessed" from VM_BUG_ON_PAGE().
> > 
> > It seems to me that we are not initializing struct pages for holes at the
> > node boundaries because zones are already clamped to exclude those holes.
> > 
> > Can you please try to see if the patch below will produce any useful info:
> 
> [0.00] init_unavailable_range: spfn: 8c, epfn: 9b, zone: DMA, node: 0
> [0.00] init_unavailable_range: spfn: 1f7be, epfn: 1f9fe, zone: DMA32, node: 0
> [0.00] init_unavailable_range: spfn: 28784, epfn: 288e4, zone: DMA32, node: 0
> [0.00] init_unavailable_range: spfn: 298b9, epfn: 298bd, zone: DMA32, node: 0
> [0.00] init_unavailable_range: spfn: 29923, epfn: 29931, zone: DMA32, node: 0
> [0.00] init_unavailable_range: spfn: 29933, epfn: 29941, zone: DMA32, node: 0
> [0.00] init_unavailable_range: spfn: 29945, epfn: 29946, zone: DMA32, node: 0
> [0.00] init_unavailable_range: spfn: 29ff9, epfn: 2a823, zone: DMA32, node: 0
> [0.00] init_unavailable_range: spfn: 33a23, epfn: 33a53, zone: DMA32, node: 0
> [0.00] init_unavailable_range: spfn: 78000, epfn: 10, zone: DMA32, node: 0
> ...
> [  572.222563][ T2302] kpagecount_read: pfn 47f380 is poisoned
...
> [  590.570032][ T2302] kpagecount_read: pfn 47 is poisoned
> [  604.268653][ T2302] kpagecount_read: pfn 87ff80 is poisoned
...
> [  604.611698][ T2302] kpagecount_read: pfn 87ffbc is poisoned
> [  617.484205][ T2302] kpagecount_read: pfn c7ff80 is poisoned
...
> [  618.212344][ T2302] kpagecount_read: pfn c7 is poisoned
> [  633.134228][ T2302] kpagecount_read: pfn 107ff80 is poisoned
...
> [  633.874087][ T2302] kpagecount_read: pfn 107 is poisoned
> [  647.686412][ T2302] kpagecount_read: pfn 147ff80 is poisoned
...
> [  648.425548][ T2302] kpagecount_read: pfn 147 is poisoned
> [  663.692630][ T2302] kpagecount_read: pfn 187ff80 is poisoned
...
> [  664.432671][ T2302] kpagecount_read: pfn 187 is poisoned
> [  675.462757][ T2302] kpagecount_read: pfn 1c7ff80 is poisoned
...
> [  676.202548][ T2302] kpagecount_read: pfn 1c7 is poisoned
> [  687.121605][ T2302] kpagecount_read: pfn 207ff80 is poisoned
...
> [  687.860981][ T2302] kpagecount_read: pfn 207 is poisoned

The e820 map has a hole near the end of each node and these holes are not
initialized with init_unavailable_range() after it was interleaved with
memmap initialization because such holes are not accounted by
zone->spanned_pages.

Yet, I still cannot really understand how this never triggered

VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

before v5.7 as all the struct pages for these holes would have zone=0 and
node=0 ... 

@Qian, can you please boot your system with memblock=debug and share the
logs?

-- 
Sincerely yours,
Mike.


Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

2021-01-06 Thread Qian Cai
On Wed, 2021-01-06 at 10:05 +0200, Mike Rapoport wrote:
> I think we trigger PF_POISONED_CHECK() in PageSlab(), then fffe
> is "accessed" from VM_BUG_ON_PAGE().
> 
> It seems to me that we are not initializing struct pages for holes at the node
> boundaries because zones are already clamped to exclude those holes.
> 
> Can you please try to see if the patch below will produce any useful info:

[0.00] init_unavailable_range: spfn: 8c, epfn: 9b, zone: DMA, node: 0
[0.00] init_unavailable_range: spfn: 1f7be, epfn: 1f9fe, zone: DMA32, node: 0
[0.00] init_unavailable_range: spfn: 28784, epfn: 288e4, zone: DMA32, node: 0
[0.00] init_unavailable_range: spfn: 298b9, epfn: 298bd, zone: DMA32, node: 0
[0.00] init_unavailable_range: spfn: 29923, epfn: 29931, zone: DMA32, node: 0
[0.00] init_unavailable_range: spfn: 29933, epfn: 29941, zone: DMA32, node: 0
[0.00] init_unavailable_range: spfn: 29945, epfn: 29946, zone: DMA32, node: 0
[0.00] init_unavailable_range: spfn: 29ff9, epfn: 2a823, zone: DMA32, node: 0
[0.00] init_unavailable_range: spfn: 33a23, epfn: 33a53, zone: DMA32, node: 0
[0.00] init_unavailable_range: spfn: 78000, epfn: 10, zone: DMA32, node: 0
...
[  572.222563][ T2302] kpagecount_read: pfn 47f380 is poisoned
[  572.228208][ T2302] kpagecount_read: pfn 47f381 is poisoned
[  572.233823][ T2302] kpagecount_read: pfn 47f382 is poisoned
[  572.239465][ T2302] kpagecount_read: pfn 47f383 is poisoned
[  572.245495][ T2302] kpagecount_read: pfn 47f384 is poisoned
[  572.251110][ T2302] kpagecount_read: pfn 47f385 is poisoned
[  572.256739][ T2302] kpagecount_read: pfn 47f386 is poisoned
[  572.262353][ T2302] kpagecount_read: pfn 47f387 is poisoned
[  572.268445][ T2302] kpagecount_read: pfn 47f388 is poisoned
[  572.274057][ T2302] kpagecount_read: pfn 47f389 is poisoned
[  572.279687][ T2302] kpagecount_read: pfn 47f38a is poisoned
[  572.285320][ T2302] kpagecount_read: pfn 47f38b is poisoned
[  572.290934][ T2302] kpagecount_read: pfn 47f38c is poisoned
[  572.296939][ T2302] kpagecount_read: pfn 47f38d is poisoned
[  572.302551][ T2302] kpagecount_read: pfn 47f38e is poisoned
[  572.308180][ T2302] kpagecount_read: pfn 47f38f is poisoned
[  572.313791][ T2302] kpagecount_read: pfn 47f390 is poisoned
[  572.319859][ T2302] kpagecount_read: pfn 47f391 is poisoned
[  572.325536][ T2302] kpagecount_read: pfn 47f392 is poisoned
[  572.331150][ T2302] kpagecount_read: pfn 47f393 is poisoned
[  572.336945][ T2302] kpagecount_read: pfn 47f394 is poisoned
[  572.342981][ T2302] kpagecount_read: pfn 47f395 is poisoned
[  572.348615][ T2302] kpagecount_read: pfn 47f396 is poisoned
[  572.354226][ T2302] kpagecount_read: pfn 47f397 is poisoned
[  572.359865][ T2302] kpagecount_read: pfn 47f398 is poisoned
[  572.365495][ T2302] kpagecount_read: pfn 47f399 is poisoned
[  572.371568][ T2302] kpagecount_read: pfn 47f39a is poisoned
[  572.377199][ T2302] kpagecount_read: pfn 47f39b is poisoned
[  572.382813][ T2302] kpagecount_read: pfn 47f39c is poisoned
[  572.388443][ T2302] kpagecount_read: pfn 47f39d is poisoned
[  572.394507][ T2302] kpagecount_read: pfn 47f39e is poisoned
[  572.400137][ T2302] kpagecount_read: pfn 47f39f is poisoned
[  572.405766][ T2302] kpagecount_read: pfn 47f3a0 is poisoned
[  572.411379][ T2302] kpagecount_read: pfn 47f3a1 is poisoned
[  572.417475][ T2302] kpagecount_read: pfn 47f3a2 is poisoned
[  572.423088][ T2302] kpagecount_read: pfn 47f3a3 is poisoned
[  572.428717][ T2302] kpagecount_read: pfn 47f3a4 is poisoned
[  572.434329][ T2302] kpagecount_read: pfn 47f3a5 is poisoned
[  572.439963][ T2302] kpagecount_read: pfn 47f3a6 is poisoned
[  572.446079][ T2302] kpagecount_read: pfn 47f3a7 is poisoned
[  572.451692][ T2302] kpagecount_read: pfn 47f3a8 is poisoned
[  572.457367][ T2302] kpagecount_read: pfn 47f3a9 is poisoned
[  572.462981][ T2302] kpagecount_read: pfn 47f3aa is poisoned
[  572.469079][ T2302] kpagecount_read: pfn 47f3ab is poisoned
[  572.474694][ T2302] kpagecount_read: pfn 47f3ac is poisoned
[  572.480332][ T2302] kpagecount_read: pfn 47f3ad is poisoned
[  572.485962][ T2302] kpagecount_read: pfn 47f3ae is poisoned
[  572.491577][ T2302] kpagecount_read: pfn 47f3af is poisoned
[  572.497677][ T2302] kpagecount_read: pfn 47f3b0 is poisoned
[  572.503292][ T2302] kpagecount_read: pfn 47f3b1 is poisoned
[  572.508921][ T2302] kpagecount_read: pfn 47f3b2 is poisoned
[  572.514535][ T2302] kpagecount_read: pfn 47f3b3 is poisoned
[  572.520643][ T2302] kpagecount_read: pfn 47f3b4 is poisoned
[  572.526273][ T2302] kpagecount_read: pfn 47f3b5 is poisoned
[  572.531886][ T2302] kpagecount_read: pfn 47f3b6 is poisoned
[  572.537524][ T2302] kpagecount_read: pfn 47f3b7 is poisoned
[  572.543676][ T2302] kpagecount_read: pfn 47f3b8 is poisoned
[  572.549305][ T2302] kpagecount_read: pfn 47f3b9 is poisoned
[  572.554919][ T2302] kpagecount_read: pfn 47f3ba is poisoned
...

Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

2021-01-06 Thread Mike Rapoport
On Tue, Jan 05, 2021 at 01:45:37PM -0500, Qian Cai wrote:
> On Tue, 2021-01-05 at 10:24 +0200, Mike Rapoport wrote:
> > Hi,
> > 
> > On Mon, Jan 04, 2021 at 02:03:00PM -0500, Qian Cai wrote:
> > > On Wed, 2020-12-09 at 23:43 +0200, Mike Rapoport wrote:
> > > > From: Mike Rapoport 
> > > > 
> > > > Interleave initialization of pages that correspond to holes with the
> > > > initialization of memory map, so that zone and node information will be
> > > > properly set on such pages.
> > > > 
> > > > Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions
> > > > rather
> > > > that check each PFN")
> > > > Reported-by: Andrea Arcangeli 
> > > > Signed-off-by: Mike Rapoport 
> > > 
> > > Reverting this commit on the top of today's linux-next fixed a crash while
> > > reading /proc/kpagecount on a NUMA server.
> > 
> > Can you please post the entire dmesg?
> 
> http://people.redhat.com/qcai/dmesg.txt
> 
> > Is it possible to get the pfn that triggered the crash?
> 
Do you have any idea how to convert that fffe to pfn as it is always
that address? I don't understand what that address is though. I tried to catch
it from struct page pointer and page_address() without luck.

I think we trigger PF_POISONED_CHECK() in PageSlab(), then fffe
is "accessed" from VM_BUG_ON_PAGE().

It seems to me that we are not initializing struct pages for holes at the node
boundaries because zones are already clamped to exclude those holes.

Can you please try to see if the patch below will produce any useful info:
 
diff --git a/fs/proc/page.c b/fs/proc/page.c
index 4dcbcd506cb6..708f8211dcc0 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -66,10 +66,14 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
 	 */
 	ppage = pfn_to_online_page(pfn);
 
-	if (!ppage || PageSlab(ppage) || page_has_type(ppage))
+	if (ppage && PagePoisoned(ppage)) {
+		pr_info("%s: pfn %lx is poisoned\n", __func__, pfn);
 		pcount = 0;
-	else
+	} else if (!ppage || PageSlab(ppage) || page_has_type(ppage)) {
+		pcount = 0;
+	} else {
 		pcount = page_mapcount(ppage);
+	}
 
 	if (put_user(pcount, out)) {
 		ret = -EFAULT;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 124b8c654ec6..1b3a37ace1b1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6271,6 +6271,8 @@ static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
 	unsigned long pfn;
 	u64 pgcnt = 0;
 
+	pr_info("%s: spfn: %lx, epfn: %lx, zone: %s, node: %d\n", __func__, spfn, epfn, zone_names[zone], node);
+
 	for (pfn = spfn; pfn < epfn; pfn++) {
 		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
 			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
 
> >  
> > > [ 8858.006726][T99897] BUG: unable to handle page fault for address:
> > > fffe
> > > [ 8858.014814][T99897] #PF: supervisor read access in kernel mode
> > > [ 8858.020686][T99897] #PF: error_code(0x) - not-present page
> > > [ 8858.026557][T99897] PGD 1371417067 P4D 1371417067 PUD 1371419067 PMD 0 
> > > [ 8858.033224][T99897] Oops:  [#1] SMP KASAN NOPTI
> > > [ 8858.038710][T99897] CPU: 28 PID: 99897 Comm: proc01 Tainted:
> > > G   O  5.11.0-rc1-next-20210104 #1
> > > [ 8858.048515][T99897] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385 Gen10, BIOS A40 03/09/2018
> > > [ 8858.057794][T99897] RIP: 0010:kpagecount_read+0x1be/0x5e0
> > > PageSlab at include/linux/page-flags.h:342
> > > (inlined by) kpagecount_read at fs/proc/page.c:69
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

2021-01-05 Thread Qian Cai
On Tue, 2021-01-05 at 10:24 +0200, Mike Rapoport wrote:
> Hi,
> 
> On Mon, Jan 04, 2021 at 02:03:00PM -0500, Qian Cai wrote:
> > On Wed, 2020-12-09 at 23:43 +0200, Mike Rapoport wrote:
> > > From: Mike Rapoport 
> > > 
> > > Interleave initialization of pages that correspond to holes with the
> > > initialization of memory map, so that zone and node information will be
> > > properly set on such pages.
> > > 
> > > Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions
> > > rather
> > > that check each PFN")
> > > Reported-by: Andrea Arcangeli 
> > > Signed-off-by: Mike Rapoport 
> > 
> > Reverting this commit on the top of today's linux-next fixed a crash while
> > reading /proc/kpagecount on a NUMA server.
> 
> Can you please post the entire dmesg?

http://people.redhat.com/qcai/dmesg.txt

> Is it possible to get the pfn that triggered the crash?

Do you have any idea how to convert that fffe to pfn as it is always
that address? I don't understand what that address is though. I tried to catch
it from struct page pointer and page_address() without luck.

>  
> > [ 8858.006726][T99897] BUG: unable to handle page fault for address:
> > fffe
> > [ 8858.014814][T99897] #PF: supervisor read access in kernel mode
> > [ 8858.020686][T99897] #PF: error_code(0x) - not-present page
> > [ 8858.026557][T99897] PGD 1371417067 P4D 1371417067 PUD 1371419067 PMD 0 
> > [ 8858.033224][T99897] Oops:  [#1] SMP KASAN NOPTI
> > [ 8858.038710][T99897] CPU: 28 PID: 99897 Comm: proc01 Tainted:
> > G   O  5.11.0-rc1-next-20210104 #1
> > [ 8858.048515][T99897] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > DL385 Gen10, BIOS A40 03/09/2018
> > [ 8858.057794][T99897] RIP: 0010:kpagecount_read+0x1be/0x5e0
> > PageSlab at include/linux/page-flags.h:342
> > (inlined by) kpagecount_read at fs/proc/page.c:69



Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

2021-01-05 Thread Mike Rapoport
Hi,

On Mon, Jan 04, 2021 at 02:03:00PM -0500, Qian Cai wrote:
> On Wed, 2020-12-09 at 23:43 +0200, Mike Rapoport wrote:
> > From: Mike Rapoport 
> > 
> > Interleave initialization of pages that correspond to holes with the
> > initialization of memory map, so that zone and node information will be
> > properly set on such pages.
> > 
> > Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather
> > that check each PFN")
> > Reported-by: Andrea Arcangeli 
> > Signed-off-by: Mike Rapoport 
> 
> Reverting this commit on the top of today's linux-next fixed a crash while
> reading /proc/kpagecount on a NUMA server.

Can you please post the entire dmesg?
Is it possible to get the pfn that triggered the crash?
 
> [ 8858.006726][T99897] BUG: unable to handle page fault for address: 
> fffe
> [ 8858.014814][T99897] #PF: supervisor read access in kernel mode
> [ 8858.020686][T99897] #PF: error_code(0x) - not-present page
> [ 8858.026557][T99897] PGD 1371417067 P4D 1371417067 PUD 1371419067 PMD 0 
> [ 8858.033224][T99897] Oops:  [#1] SMP KASAN NOPTI
> [ 8858.038710][T99897] CPU: 28 PID: 99897 Comm: proc01 Tainted: G   O 
>  5.11.0-rc1-next-20210104 #1
> [ 8858.048515][T99897] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 
> Gen10, BIOS A40 03/09/2018
> [ 8858.057794][T99897] RIP: 0010:kpagecount_read+0x1be/0x5e0
> PageSlab at include/linux/page-flags.h:342
> (inlined by) kpagecount_read at fs/proc/page.c:69

-- 
Sincerely yours,
Mike.


Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

2021-01-04 Thread Qian Cai
On Wed, 2020-12-09 at 23:43 +0200, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> There could be struct pages that are not backed by actual physical memory.
> This can happen when the actual memory bank is not a multiple of
> SECTION_SIZE or when an architecture does not register memory holes
> reserved by the firmware as memblock.memory.
> 
> Such pages are currently initialized using the init_unavailable_mem() function
> that iterates through PFNs in holes in memblock.memory and, if there is a
> struct page corresponding to a PFN, the fields of this page are set to
> default values and it is marked as Reserved.
> 
> init_unavailable_mem() does not take into account the zone and node the page
> belongs to and sets both zone and node links in struct page to zero.
> 
> On a system that has firmware reserved holes in a zone above ZONE_DMA, for
> instance in a configuration below:
> 
>   # grep -A1 E820 /proc/iomem
>   7a17b000-7a216fff : Unknown E820 type
>   7a217000-7bff : System RAM
> 
> unset zone link in struct page will trigger
> 
>   VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
> 
> because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link in
> struct page) in the same pageblock.
> 
> Interleave initialization of pages that correspond to holes with the
> initialization of memory map, so that zone and node information will be
> properly set on such pages.
> 
> Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather
> that check each PFN")
> Reported-by: Andrea Arcangeli 
> Signed-off-by: Mike Rapoport 

Reverting this commit on the top of today's linux-next fixed a crash while
reading /proc/kpagecount on a NUMA server.

[ 8858.006726][T99897] BUG: unable to handle page fault for address: 
fffe
[ 8858.014814][T99897] #PF: supervisor read access in kernel mode
[ 8858.020686][T99897] #PF: error_code(0x) - not-present page
[ 8858.026557][T99897] PGD 1371417067 P4D 1371417067 PUD 1371419067 PMD 0 
[ 8858.033224][T99897] Oops:  [#1] SMP KASAN NOPTI
[ 8858.038710][T99897] CPU: 28 PID: 99897 Comm: proc01 Tainted: G   O   
   5.11.0-rc1-next-20210104 #1
[ 8858.048515][T99897] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 
Gen10, BIOS A40 03/09/2018
[ 8858.057794][T99897] RIP: 0010:kpagecount_read+0x1be/0x5e0
PageSlab at include/linux/page-flags.h:342
(inlined by) kpagecount_read at fs/proc/page.c:69
[ 8858.063717][T99897] Code: 3c 30 00 0f 85 29 03 00 00 48 8b 53 08 48 8d 42 ff 
83 e2 01 48 0f 44 c3 48 89 c2 48 c1 ea 03 42 80 3c 32 00 0f 85 e7 02 00 00 <48> 
83 38 ff 0f 84 f3 01 00 00 48 89 c8 48 c1 e8 03 42 80 3c 30 00
[ 8858.083303][T99897] RSP: 0018:c9002159fdd0 EFLAGS: 00010246
[ 8858.089637][T99897] RAX: fffe RBX: ea0011fce000 RCX: 
ea0011fce008
[ 8858.097518][T99897] RDX: 1fff RSI: 0064d7c0 RDI: 
951f91c8
[ 8858.105396][T99897] RBP: 0064d7c0 R08: ed129063f402 R09: 
ed129063f402
[ 8858.113760][T99897] R10: 8894831fa00b R11: ed129063f401 R12: 
0047f380
[ 8858.121639][T99897] R13: 0400 R14: dc00 R15: 
0064d7c0
[ 8858.129517][T99897] FS:  7fd18849d040() GS:88a02fc0() 
knlGS:
[ 8858.138886][T99897] CS:  0010 DS:  ES:  CR0: 80050033
[ 8858.145369][T99897] CR2: fffe CR3: 001c8b5d CR4: 
003506e0
[ 8858.153247][T99897] Call Trace:
[ 8858.156415][T99897]  proc_reg_read+0x1a6/0x240
[ 8858.161345][T99897]  vfs_read+0x175/0x440
[ 8858.165383][T99897]  ksys_read+0xf1/0x1c0
[ 8858.169420][T99897]  ? vfs_write+0x870/0x870
[ 8858.173719][T99897]  ? task_work_run+0xeb/0x170
[ 8858.178284][T99897]  ? syscall_enter_from_user_mode+0x1c/0x40
[ 8858.184073][T99897]  do_syscall_64+0x33/0x40
[ 8858.188863][T99897]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8858.194652][T99897] RIP: 0033:0x7fd187da1d5d
[ 8858.198952][T99897] Code: 31 11 2b 00 31 c9 64 83 3e 0b 75 ca eb b8 e8 ca fb 
ff ff 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 39 ca 77 2b 31 c0 0f 05 <48> 
3d 00 f0 ff ff 77 0b c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 15
[ 8858.218978][T99897] RSP: 002b:7ffe733de1f8 EFLAGS: 0246 ORIG_RAX: 

[ 8858.227297][T99897] RAX: ffda RBX: 7ffe733df370 RCX: 
7fd187da1d5d
[ 8858.235824][T99897] RDX: 0400 RSI: 0064d7c0 RDI: 
0004
[ 8858.243739][T99897] RBP: 0400 R08: 018fbe73 R09: 
7fd187e13d40
[ 8858.251617][T99897] R10:  R11: 0246 R12: 
023f9c00
[ 8858.259496][T99897] R13: 0004 R14: 0044663c R15: 

[ 8858.267856][T99897] Modules linked in: vfat fat fuse vfio_pci vfio_virqfd 
vfio_iommu_type1 vfio loop iavf kvm_amd ses kvm enclosure irqbypass 
acpi_cpufreq ip_tables x_tables sd_mod smartpqi bnxt_en scsi_transport_sas tg3 
i40e firmware_class libphy dm_mirror dm_region_hash dm_log 

Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

2020-12-10 Thread Greg KH
On Wed, Dec 09, 2020 at 11:43:04PM +0200, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> There could be struct pages that are not backed by actual physical memory.
> This can happen when the actual memory bank is not a multiple of
> SECTION_SIZE or when an architecture does not register memory holes
> reserved by the firmware as memblock.memory.
> 
> Such pages are currently initialized using the init_unavailable_mem() function
> that iterates through PFNs in holes in memblock.memory and, if there is a
> struct page corresponding to a PFN, the fields of this page are set to
> default values and it is marked as Reserved.
> 
> init_unavailable_mem() does not take into account the zone and node the page
> belongs to and sets both zone and node links in struct page to zero.
> 
> On a system that has firmware reserved holes in a zone above ZONE_DMA, for
> instance in a configuration below:
> 
>   # grep -A1 E820 /proc/iomem
>   7a17b000-7a216fff : Unknown E820 type
>   7a217000-7bff : System RAM
> 
> unset zone link in struct page will trigger
> 
>   VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
> 
> because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link in
> struct page) in the same pageblock.
> 
> Interleave initialization of pages that correspond to holes with the
> initialization of memory map, so that zone and node information will be
> properly set on such pages.
> 
> Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather
> that check each PFN")
> Reported-by: Andrea Arcangeli 
> Signed-off-by: Mike Rapoport 
> ---
>  mm/page_alloc.c | 152 +---
>  1 file changed, 65 insertions(+), 87 deletions(-)




This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.




Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

2020-12-09 Thread Andrea Arcangeli
Hello,

On Wed, Dec 09, 2020 at 11:43:04PM +0200, Mike Rapoport wrote:
> +void __init __weak memmap_init(unsigned long size, int nid,
> +unsigned long zone,
> +unsigned long range_start_pfn)
> +{
> + unsigned long start_pfn, end_pfn, hole_start_pfn = 0;
>   unsigned long range_end_pfn = range_start_pfn + size;
> + u64 pgcnt = 0;
>   int i;
>  
>   for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
>   start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
>   end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
> + hole_start_pfn = clamp(hole_start_pfn, range_start_pfn,
> +range_end_pfn);
>  
>   if (end_pfn > start_pfn) {
>   size = end_pfn - start_pfn;
>   memmap_init_zone(size, nid, zone, start_pfn,
>MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>   }
> +
> + if (hole_start_pfn < start_pfn)
> + pgcnt += init_unavailable_range(hole_start_pfn,
> + start_pfn, zone, nid);
> + hole_start_pfn = end_pfn;
>   }

After applying the new 1/2, the above loop seems to be functionally a
noop compared to what was in -mm yesterday, so the above looks great
as far as I'm concerned.

Unlike the simple fix this will not loop over holes that aren't part
of memblock.memory nor memblock.reserved and it drops the static
variable which would have required ordering and serialization.

By being functionally equivalent, it looks like it also suffers from the
same dependency on pfn 0 (and not just pfn 0) being reserved that you
pointed out earlier.

I suppose to drop that further dependency we need a further round down
in this logic to the start of the pageblock_order or max-order like
mentioned yesterday?

If the first pfn of a pageblock (or maybe better a max-order block) is
valid, but not in memblock.reserved nor memblock.memory, and any other
page in such a pageblock is freed to the buddy allocator, we should
make sure the whole pageblock gets initialized (or at least the pages
with a pfn lower than the one that was added to the buddy). So
applying a round down in the above loop might just do the trick.

Since the removal of that extra dependency was mostly orthogonal with
the above, I guess it's actually cleaner to do it incrementally.

I'd suggest also documenting why we're doing it in the code (not just the
commit header) of the incremental patch, by mentioning the specific VM
invariants we're enforcing that the VM code has always depended upon, that
required the round down, etc...

In the meantime I'll try to update all systems again with this
implementation to test it.

Thanks!
Andrea