On Thu, 7 May 2015 08:25:18 +0100 Mel Gorman <[email protected]> wrote:

> Waiman Long reported that 24TB machines hit OOM during basic setup when
> struct page initialisation was deferred. One approach is to initialise memory
> on demand but it interferes with page allocator paths. This patch creates
> dedicated threads to initialise memory before basic setup. It then blocks
> on a rw_semaphore until completion as a wait_queue and counter is overkill.
> This may be slower to boot but it's simpler overall and also gets rid of a
> section mangling which existed so kswapd could do the initialisation.

Seems a reasonable compromise.  It makes a bit of a mess of the patch
sequencing.
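
For anyone following along, the barrier described in the quoted changelog
can be modelled in userspace roughly as below. This is only a sketch under
stated assumptions, not the kernel code: `toy_rwsem`, `toy_down_read()`,
`toy_up_read()`, `toy_wait_for_readers()`, `init_node()` and
`run_deferred_init()` are all hypothetical names, and the toy semaphore is a
mutex plus condition variable standing in for a real rw_semaphore. It also
assumes the waiter blocks by acquiring the semaphore for write once every
worker has dropped its read hold.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Toy userspace stand-in for a kernel rw_semaphore (hypothetical type). */
struct toy_rwsem {
	pthread_mutex_t lock;
	pthread_cond_t no_readers;
	int readers;
};

static struct toy_rwsem init_sem = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

/* Take one read hold; the parent does this on behalf of each worker. */
static void toy_down_read(struct toy_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	s->readers++;
	pthread_mutex_unlock(&s->lock);
}

/* Drop one read hold; the last one wakes the waiter. */
static void toy_up_read(struct toy_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->readers > 0);
	if (--s->readers == 0)
		pthread_cond_broadcast(&s->no_readers);
	pthread_mutex_unlock(&s->lock);
}

/* Models down_write() followed by up_write(): returns once no read holds remain. */
static void toy_wait_for_readers(struct toy_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->readers > 0)
		pthread_cond_wait(&s->no_readers, &s->lock);
	pthread_mutex_unlock(&s->lock);
}

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static int nodes_done;	/* protected by done_lock */

/* Worker: pretend to initialise one node, then release its read hold. */
static void *init_node(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&done_lock);
	nodes_done++;
	pthread_mutex_unlock(&done_lock);
	toy_up_read(&init_sem);
	return NULL;
}

/* Start one worker per node, block until all finish, return how many did. */
int run_deferred_init(int nr_nodes)
{
	pthread_t *threads = malloc(nr_nodes * sizeof(*threads));
	int started = 0, done, i;

	if (!threads)
		return -1;
	pthread_mutex_lock(&done_lock);
	nodes_done = 0;
	pthread_mutex_unlock(&done_lock);

	for (i = 0; i < nr_nodes; i++) {
		toy_down_read(&init_sem);
		if (pthread_create(&threads[started], NULL, init_node, NULL)) {
			toy_up_read(&init_sem);	/* no worker will release it */
			break;
		}
		started++;
	}

	toy_wait_for_readers(&init_sem);	/* the barrier */

	pthread_mutex_lock(&done_lock);
	done = nodes_done;
	pthread_mutex_unlock(&done_lock);

	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	free(threads);
	return done;
}
```

Calling `run_deferred_init(4)` from a `main()` built with `-pthread` starts
four workers and returns only after each has released its hold. The count is
read before the joins on purpose, so the result depends on the barrier alone.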

Have some tweaklets:



From: Andrew Morton <[email protected]>
Subject: mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix

include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast

Cc: Daniel J Blueman <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Nathan Zimmer <[email protected]>
Cc: Scott Norton <[email protected]>
Cc: Waiman Long <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

 mm/page_alloc.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff -puN mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix mm/page_alloc.c
--- a/mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix
+++ a/mm/page_alloc.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/swap.h>
 #include <linux/interrupt.h>
+#include <linux/rwsem.h>
 #include <linux/pagemap.h>
 #include <linux/jiffies.h>
 #include <linux/bootmem.h>
@@ -1075,12 +1076,12 @@ static void __init deferred_free_range(s
                __free_pages_boot_core(page, pfn, 0);
 }
 
-static struct rw_semaphore __initdata pgdat_init_rwsem;
+static __initdata DECLARE_RWSEM(pgdat_init_rwsem);
 
 /* Initialise remaining memory on a node */
 static int __init deferred_init_memmap(void *data)
 {
-       pg_data_t *pgdat = (pg_data_t *)data;
+       pg_data_t *pgdat = data;
        int nid = pgdat->node_id;
        struct mminit_pfnnid_cache nid_init_state = { };
        unsigned long start = jiffies;
@@ -1096,7 +1097,7 @@ static int __init deferred_init_memmap(v
                return 0;
        }
 
-       /* Bound memory initialisation to a local node if possible */
+       /* Bind memory initialisation thread to a local node if possible */
        if (!cpumask_empty(cpumask))
                set_cpus_allowed_ptr(current, cpumask);
 
@@ -1200,7 +1201,6 @@ void __init page_alloc_init_late(void)
 {
        int nid;
 
-       init_rwsem(&pgdat_init_rwsem);
        for_each_node_state(nid, N_MEMORY) {
                down_read(&pgdat_init_rwsem);
                kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
_
