Re: [PATCH v3 0/3] Account reserved memory when allocating system hash

2016-08-31 Thread Michal Hocko
On Mon 29-08-16 18:36:47, Srikar Dronamraju wrote:
> A fadump kernel reserves large chunks of memory even before the pages are
> initialised. As a result, memory that corresponds to several nodes might
> fall in memblock reserved regions.
> 
> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT initialise only a
> certain amount of memory per node. That amount takes into account the
> dentry and inode cache sizes. However, such a kernel, when booting a
> secondary kernel, will not be able to allocate the memory required for
> the dentry and inode caches. This results in crashes like the one below
> on large systems such as 32 TB systems.
> 
> Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
> vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
> swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
> Call Trace:
> [c108fb10] [c07fac88] dump_stack+0xb0/0xf0 (unreliable)
> [c108fb50] [c0235264] warn_alloc_failed+0x114/0x160
> [c108fbf0] [c0281484] __vmalloc_node_range+0x304/0x340
> [c108fca0] [c028152c] __vmalloc+0x6c/0x90
> [c108fd40] [c0aecfb0] alloc_large_system_hash+0x1b8/0x2c0
> [c108fe00] [c0af7240] inode_init+0x94/0xe4
> [c108fe80] [c0af6fec] vfs_caches_init+0x8c/0x13c
> [c108ff00] [c0ac4014] start_kernel+0x50c/0x578
> [c108ff90] [c0008c6c] start_here_common+0x20/0xa8
> 
> This patchset solves this problem by accounting the size of reserved memory
> when calculating the size of large system hashes.

So I think that this is just fallout from how hackish and tricky fadump
is. Reserving a large portion, or even the majority, of memory from the
kernel just sounds like a minefield. This patchset is dealing with one
particular problem. Fair enough, it seems like the easiest way to go and
something that would be safe for a stable backport as well, so
Acked-by: Michal Hocko to the whole series

but I cannot say I would be happy about the whole fadump thing...

> While this patchset applies on v4.8-rc3, it cannot be tested on v4.8-rc3
> because of the problem reported in
> http://lkml.kernel.org/r/20160829093844.ga2...@linux.vnet.ibm.com
> However, it has been tested on v4.7, v4.6 and v4.4.

Another supporting argument for the above: 15 out of 16 nodes without
any memory... Sigh

> v2: 
> http://lkml.kernel.org/r/1470330729-6273-1-git-send-email-sri...@linux.vnet.ibm.com
>  
> 
> Cc: linux...@kvack.org
> Cc: Mel Gorman 
> Cc: Vlastimil Babka 
> Cc: Michal Hocko 
> Cc: Andrew Morton 
> Cc: Michael Ellerman 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Mahesh Salgaonkar 
> Cc: Hari Bathini 
> Cc: Dave Hansen 
> Cc: Balbir Singh 
> Cc: Srikar Dronamraju 
> 
> Srikar Dronamraju (3):
>   mm: Introduce arch_reserved_kernel_pages()
>   mm/memblock: Expose total reserved memory
>   powerpc: Implement arch_reserved_kernel_pages
> 
>  arch/powerpc/include/asm/mmzone.h |  3 +++
>  arch/powerpc/kernel/fadump.c      |  5 +++++
>  include/linux/memblock.h          |  1 +
>  include/linux/mm.h                |  3 +++
>  mm/memblock.c                     |  5 +++++
>  mm/page_alloc.c                   | 12 ++++++++++++
>  6 files changed, 29 insertions(+)
> 
> -- 
> 1.8.5.6

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 0/3] Account reserved memory when allocating system hash

2016-08-29 Thread Andrew Morton
On Mon, 29 Aug 2016 18:36:47 +0530 Srikar Dronamraju wrote:

> A fadump kernel reserves large chunks of memory even before the pages are
> initialised. As a result, memory that corresponds to several nodes might
> fall in memblock reserved regions.
> 
> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT initialise only a
> certain amount of memory per node. That amount takes into account the
> dentry and inode cache sizes. However, such a kernel, when booting a
> secondary kernel, will not be able to allocate the memory required for
> the dentry and inode caches. This results in crashes like the one below
> on large systems such as 32 TB systems.
> 
> Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
> vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
> swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
> Call Trace:
> [c108fb10] [c07fac88] dump_stack+0xb0/0xf0 (unreliable)
> [c108fb50] [c0235264] warn_alloc_failed+0x114/0x160
> [c108fbf0] [c0281484] __vmalloc_node_range+0x304/0x340
> [c108fca0] [c028152c] __vmalloc+0x6c/0x90
> [c108fd40] [c0aecfb0] alloc_large_system_hash+0x1b8/0x2c0
> [c108fe00] [c0af7240] inode_init+0x94/0xe4
> [c108fe80] [c0af6fec] vfs_caches_init+0x8c/0x13c
> [c108ff00] [c0ac4014] start_kernel+0x50c/0x578
> [c108ff90] [c0008c6c] start_here_common+0x20/0xa8
> 
> This patchset solves this problem by accounting the size of reserved memory
> when calculating the size of large system hashes.

What's the priority on this, btw?  Not needed in earlier kernels?


Re: [PATCH v3 0/3] Account reserved memory when allocating system hash

2016-08-29 Thread Andrew Morton
On Mon, 29 Aug 2016 18:36:47 +0530 Srikar Dronamraju wrote:

> A fadump kernel reserves large chunks of memory even before the pages are
> initialised. As a result, memory that corresponds to several nodes might
> fall in memblock reserved regions.
> 
> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT initialise only a
> certain amount of memory per node. That amount takes into account the
> dentry and inode cache sizes. However, such a kernel, when booting a
> secondary kernel, will not be able to allocate the memory required for
> the dentry and inode caches. This results in crashes like the one below
> on large systems such as 32 TB systems.
> 
> Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
> vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
> swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
> Call Trace:
> [c108fb10] [c07fac88] dump_stack+0xb0/0xf0 (unreliable)
> [c108fb50] [c0235264] warn_alloc_failed+0x114/0x160
> [c108fbf0] [c0281484] __vmalloc_node_range+0x304/0x340
> [c108fca0] [c028152c] __vmalloc+0x6c/0x90
> [c108fd40] [c0aecfb0] alloc_large_system_hash+0x1b8/0x2c0
> [c108fe00] [c0af7240] inode_init+0x94/0xe4
> [c108fe80] [c0af6fec] vfs_caches_init+0x8c/0x13c
> [c108ff00] [c0ac4014] start_kernel+0x50c/0x578
> [c108ff90] [c0008c6c] start_here_common+0x20/0xa8
> 
> This patchset solves this problem by accounting the size of reserved memory
> when calculating the size of large system hashes.
> 
> While this patchset applies on v4.8-rc3, it cannot be tested on v4.8-rc3
> because of the problem reported in
> http://lkml.kernel.org/r/20160829093844.ga2...@linux.vnet.ibm.com
> However, it has been tested on v4.7, v4.6 and v4.4.

That looks like a pretty serious regression.

I'll grab the patchset anyway.  It will come good when we fix that kswapd
thing.


[PATCH v3 0/3] Account reserved memory when allocating system hash

2016-08-29 Thread Srikar Dronamraju
A fadump kernel reserves large chunks of memory even before the pages are
initialised. As a result, memory that corresponds to several nodes might
fall in memblock reserved regions.

Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT initialise only a
certain amount of memory per node. That amount takes into account the
dentry and inode cache sizes. However, such a kernel, when booting a
secondary kernel, will not be able to allocate the memory required for
the dentry and inode caches. This results in crashes like the one below
on large systems such as 32 TB systems.

Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
Call Trace:
[c108fb10] [c07fac88] dump_stack+0xb0/0xf0 (unreliable)
[c108fb50] [c0235264] warn_alloc_failed+0x114/0x160
[c108fbf0] [c0281484] __vmalloc_node_range+0x304/0x340
[c108fca0] [c028152c] __vmalloc+0x6c/0x90
[c108fd40] [c0aecfb0] alloc_large_system_hash+0x1b8/0x2c0
[c108fe00] [c0af7240] inode_init+0x94/0xe4
[c108fe80] [c0af6fec] vfs_caches_init+0x8c/0x13c
[c108ff00] [c0ac4014] start_kernel+0x50c/0x578
[c108ff90] [c0008c6c] start_here_common+0x20/0xa8
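
As a rough cross-check of the numbers in the log above (not part of the
patches themselves), the figures can be related with a few lines of
arithmetic. The helper names below are hypothetical, and the 64 KiB page
size and 8-byte bucket size are not stated in the log but merely follow
from dividing the printed values.

```c
#include <stdint.h>

/* Bytes per hash bucket implied by a "entries ... bytes" log line. */
static uint64_t bucket_bytes(uint64_t table_bytes, uint64_t entries)
{
    return table_bytes / entries;
}

/* Page size implied by the allocation order printed for the table. */
static uint64_t page_bytes_from_order(uint64_t table_bytes, unsigned order)
{
    return table_bytes >> order;
}

/* Number of pages needed to cover a byte count, rounding up. */
static uint64_t pages_spanned(uint64_t bytes, uint64_t page_bytes)
{
    return (bytes + page_bytes - 1) / page_bytes;
}
```

With the logged values, bucket_bytes(4294967296, 536870912) is 8 and
page_bytes_from_order(4294967296, 16) is 65536. The failed vmalloc request
of 17179934720 bytes is 16 GiB plus one such 64 KiB page, and the
4097114112 bytes that were obtained correspond to 62517 of the 262145
pages requested, i.e. under a quarter of the request.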

This patchset solves this problem by accounting the size of reserved memory
when calculating the size of large system hashes.
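
The idea can be illustrated with a simplified user-space model. This is
only a sketch under stated assumptions, not the code in mm/page_alloc.c:
hash_entries() and roundup_pow2() are hypothetical names, the
"one bucket per 2^scale bytes" rule is an assumed sizing policy, and the
reserved page count stands in for whatever arch_reserved_kernel_pages()
would report.

```c
#include <stdint.h>

/* Round v up to the next power of two, capped at 2^63 to avoid overflow. */
static uint64_t roundup_pow2(uint64_t v)
{
    uint64_t p = 1;

    while (p < v && p < (UINT64_C(1) << 63))
        p <<= 1;
    return p;
}

/*
 * Hypothetical sizing model: one bucket per 2^scale bytes of usable
 * memory, where usable memory excludes the reserved pages.
 * page_shift is log2 of the page size in bytes.
 */
static uint64_t hash_entries(uint64_t total_pages, uint64_t reserved_pages,
                             unsigned page_shift, unsigned scale)
{
    /* Account for reserved memory before sizing the table. */
    uint64_t usable = total_pages > reserved_pages ?
                      total_pages - reserved_pages : 0;
    uint64_t entries;

    if (scale > page_shift)
        entries = usable >> (scale - page_shift);
    else
        entries = usable << (page_shift - scale);

    /* Hash tables are sized to a power of two; never return zero. */
    return roundup_pow2(entries ? entries : 1);
}
```

With made-up numbers, hash_entries(1048576, 0, 16, 13) gives 8388608
entries, while reserving 983040 of those pages shrinks the table to
524288 entries, so the request stays proportional to the memory that is
actually available.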

While this patchset applies on v4.8-rc3, it cannot be tested on v4.8-rc3
because of the problem reported in
http://lkml.kernel.org/r/20160829093844.ga2...@linux.vnet.ibm.com
However, it has been tested on v4.7, v4.6 and v4.4.

v2: 
http://lkml.kernel.org/r/1470330729-6273-1-git-send-email-sri...@linux.vnet.ibm.com
 

Cc: linux...@kvack.org
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: Andrew Morton 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Mahesh Salgaonkar 
Cc: Hari Bathini 
Cc: Dave Hansen 
Cc: Balbir Singh 
Cc: Srikar Dronamraju 

Srikar Dronamraju (3):
  mm: Introduce arch_reserved_kernel_pages()
  mm/memblock: Expose total reserved memory
  powerpc: Implement arch_reserved_kernel_pages

 arch/powerpc/include/asm/mmzone.h |  3 +++
 arch/powerpc/kernel/fadump.c      |  5 +++++
 include/linux/memblock.h          |  1 +
 include/linux/mm.h                |  3 +++
 mm/memblock.c                     |  5 +++++
 mm/page_alloc.c                   | 12 ++++++++++++
 6 files changed, 29 insertions(+)

-- 
1.8.5.6