On Thu, 7 May 2015 08:25:18 +0100 Mel Gorman <mgor...@suse.de> wrote:

> Waiman Long reported that 24TB machines hit OOM during basic setup when
> struct page initialisation was deferred. One approach is to initialise memory
> on demand but it interferes with page allocator paths. This patch creates
> dedicated threads to initialise memory before basic setup. It then blocks
> on a rw_semaphore until completion as a wait_queue and counter is overkill.
> This may be slower to boot but it's simpler overall and also gets rid of a
> section mangling which existed so kswapd could do the initialisation.

Seems a reasonable compromise.  It makes a bit of a mess of the patch
sequencing.

Have some tweaklets:



From: Andrew Morton <a...@linux-foundation.org>
Subject: mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix

include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast

Cc: Daniel J Blueman <dan...@numascale.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mel Gorman <mgor...@suse.de>
Cc: Nathan Zimmer <nzim...@sgi.com>
Cc: Scott Norton <scott.nor...@hp.com>
Cc: Waiman Long <waiman.l...@hp.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 mm/page_alloc.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff -puN mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix mm/page_alloc.c
--- a/mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix
+++ a/mm/page_alloc.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/swap.h>
 #include <linux/interrupt.h>
+#include <linux/rwsem.h>
 #include <linux/pagemap.h>
 #include <linux/jiffies.h>
 #include <linux/bootmem.h>
@@ -1075,12 +1076,12 @@ static void __init deferred_free_range(s
                __free_pages_boot_core(page, pfn, 0);
 }
 
-static struct rw_semaphore __initdata pgdat_init_rwsem;
+static __initdata DECLARE_RWSEM(pgdat_init_rwsem);
 
 /* Initialise remaining memory on a node */
 static int __init deferred_init_memmap(void *data)
 {
-       pg_data_t *pgdat = (pg_data_t *)data;
+       pg_data_t *pgdat = data;
        int nid = pgdat->node_id;
        struct mminit_pfnnid_cache nid_init_state = { };
        unsigned long start = jiffies;
@@ -1096,7 +1097,7 @@ static int __init deferred_init_memmap(v
                return 0;
        }
 
-       /* Bound memory initialisation to a local node if possible */
+       /* Bind memory initialisation thread to a local node if possible */
        if (!cpumask_empty(cpumask))
                set_cpus_allowed_ptr(current, cpumask);
 
@@ -1200,7 +1201,6 @@ void __init page_alloc_init_late(void)
 {
        int nid;
 
-       init_rwsem(&pgdat_init_rwsem);
        for_each_node_state(nid, N_MEMORY) {
                down_read(&pgdat_init_rwsem);
                kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
_
