Re: [RFC] [PATCH] more support for memory-less-node.
Andi Kleen <[EMAIL PROTECTED]> wrote:

> Now if it's better to set up an empty node or use a nearby node
> for a memory less cpu can be further discussed. I still think
> I lean towards the latter.

Worst case: only slot 0 is used. Plug your memoryless CPU card into the
last slot before you plug the CPU+mem card into the last-1 slot.

--
W.I.N.D.O.W.S.: Wireless Intelligent Neohuman Designed for Observation and
Worldwide Sabotage -- http://www.brunching.com/toys/toy-cyborger.html
Eat this, spammer: [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] more support for memory-less-node.
On Tue, 13 Feb 2007 10:50:53 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Tue, 13 Feb 2007, Martin J. Bligh wrote:
>
> > What's wrong with just setting the existing counters like
> > node_spanned_pages / node_present_pages to zero?
>
> Will this fix the breakage that Kame-san saw?

Currently, a memory-less node's present_pages and spanned_pages are zero,
and the present_pages of each of its zones is zero, too.

We added the populated_zone(zone) macro, which checks whether a zone has
pages or not (see build_zonelists in page_alloc.c).

-Kame
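[Editor's sketch] The check Kame describes can be modelled in user space
roughly as follows. This is a minimal sketch, not the kernel source: the
two-field struct zone and the helper count_populated_zones() are
simplifications assumed for illustration, and populated_zone() is written
here as a plain function rather than the macro mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct zone; only the counters
 * discussed in this thread are modelled. */
struct zone {
	unsigned long spanned_pages;	/* size of the zone's range, holes included */
	unsigned long present_pages;	/* pages actually backed by memory */
};

/* Same idea as populated_zone(): a zone counts as populated only if
 * it has at least one present page. */
static int populated_zone(const struct zone *zone)
{
	return zone->present_pages != 0;
}

/* Hypothetical helper: count the populated zones of one node.
 * For a memory-less node the result is 0. */
static int count_populated_zones(const struct zone *zones, size_t nr_zones)
{
	int populated = 0;
	size_t i;

	assert(zones != NULL || nr_zones == 0);
	for (i = 0; i < nr_zones; i++)
		if (populated_zone(&zones[i]))
			populated++;
	return populated;
}
```

With every counter at zero, as on a memory-less node, the count is 0, so
code that skips unpopulated zones never touches such a node's zones.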
Re: [RFC] [PATCH] more support for memory-less-node.
On Tue, 13 Feb 2007 09:25:00 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Tue, 13 Feb 2007, KAMEZAWA Hiroyuki wrote:
>
> > NODE_DATA(nid) is always a valid pointer if a node is online.
> > NODE_DATA(nid)->present_pages can be 0 even if a node is online;
> > I call this a memory-less node.
>
> Yes but the pgdat will have no valid zone in it. That is new.

We have the populated_zone() macro for checking that.
(I noticed that node hotplug can create a memory-less zone until memory is
onlined by the user.)

-Kame
Re: [RFC] [PATCH] more support for memory-less-node.
Andi Kleen wrote: [Tue Feb 13 2007, 01:18:45PM EST]

> > I wasn't suggesting having NULL pointers for pgdats, if that's what you
> > mean.
>
> That is what started the original thread at least. Can happen on some
> ia64 platforms.

I don't believe there is a NULL pgdat. The code for memory-less nodes in
ia64 discontig.c allocates a memory-less node's pgdat from the best
memory node candidate. If there is a NULL pgdat, then it's a bug.
Instead, a memory-less node just has no present pages.

I thought the bug was because the process wanted to bind on just one
memoryless node and MPOL_BIND didn't handle that correctly and return an
error to the process.

bob

> > Just nodes with no memory in them, the pgdat would still be there.
> > pgdat = struct node, except everything's badly named.
>
> Ok those can happen even on x86-64, mostly because it's possible
> to fill up a node early during boot up with bootmem and then
> it's effectively empty.
>
> [there is even still an open bug when this happens on node 0]
>
> Handling out of memory here of course has to be always done.
>
> Just NULL pointers in core data structures are evil. But I'm glad we
> agree here.
>
> Now if it's better to set up an empty node or use a nearby node
> for a memory less cpu can be further discussed. I still think
> I lean towards the latter.
>
> -Andi
Re: [RFC] [PATCH] more support for memory-less-node.
On Tue, 13 Feb 2007, Martin J. Bligh wrote:

> What's wrong with just setting the existing counters like
> node_spanned_pages / node_present_pages to zero?

Will this fix the breakage that Kame-san saw?
Re: [RFC] [PATCH] more support for memory-less-node.
Andi Kleen wrote:

> > I wasn't suggesting having NULL pointers for pgdats, if that's what you
> > mean.
>
> That is what started the original thread at least. Can happen on some
> ia64 platforms.

OK, that does seem kind of ugly.

> > Just nodes with no memory in them, the pgdat would still be there.
> > pgdat = struct node, except everything's badly named.
>
> Ok those can happen even on x86-64, mostly because it's possible
> to fill up a node early during boot up with bootmem and then
> it's effectively empty.
>
> [there is even still an open bug when this happens on node 0]
>
> Handling out of memory here of course has to be always done.

Yup, if we just set the "size" of the node to zero, it seems like a
natural degenerate case that should be handled anyway.

> Just NULL pointers in core data structures are evil. But I'm glad we
> agree here.
>
> Now if it's better to set up an empty node or use a nearby node
> for a memory less cpu can be further discussed. I still think
> I lean towards the latter.

Just seems kind of ugly and unnecessary, particularly if that memory-less
cpu (or IO node) is equidistant from one or more memory-possessing nodes.
As long as their zonelist is set up correctly, it should all work fine
without that, right?

build_zonelists_node already checks populated_zone() so it looks like
it's all set up for that already ...

M.
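[Editor's sketch] Martin's point, that a correctly built zonelist lets a
memory-less node work without being remapped, can be illustrated with a
small user-space model. This is a sketch under stated assumptions, not the
kernel's build_zonelists(): NR_NODES, the distance table layout, and
build_fallback_order() are hypothetical names, and whole nodes stand in
for zones.

```c
#include <assert.h>

#define NR_NODES 4	/* hypothetical node count for this model */

/* Build the allocation fallback order for node `nid`: candidate nodes
 * are picked in order of increasing distance from `nid`, and nodes
 * without present pages are skipped, much as unpopulated zones are
 * skipped when a zonelist is built.
 * Returns the number of entries written to `order`. */
static int build_fallback_order(int nid,
				const unsigned long present_pages[NR_NODES],
				int distance[NR_NODES][NR_NODES],
				int order[NR_NODES])
{
	int used[NR_NODES] = { 0 };
	int n = 0;

	assert(nid >= 0 && nid < NR_NODES);
	for (;;) {
		int best = -1;
		int i;

		for (i = 0; i < NR_NODES; i++) {
			if (used[i] || present_pages[i] == 0)
				continue;	/* already taken, or memory-less */
			if (best < 0 || distance[nid][i] < distance[nid][best])
				best = i;
		}
		if (best < 0)
			break;		/* no candidates left */
		used[best] = 1;
		order[n++] = best;
	}
	return n;
}
```

In this model a memory-less node never appears in any fallback order,
including its own, so allocations from its CPUs go to the nearest node
that has memory, and equidistant nodes are all eligible.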
Re: [RFC] [PATCH] more support for memory-less-node.
> I wasn't suggesting having NULL pointers for pgdats, if that's what you
> mean.

That is what started the original thread at least. Can happen on some
ia64 platforms.

> Just nodes with no memory in them, the pgdat would still be there.
> pgdat = struct node, except everything's badly named.

Ok those can happen even on x86-64, mostly because it's possible
to fill up a node early during boot up with bootmem and then
it's effectively empty.

[there is even still an open bug when this happens on node 0]

Handling out of memory here of course has to be always done.

Just NULL pointers in core data structures are evil. But I'm glad we
agree here.

Now if it's better to set up an empty node or use a nearby node
for a memory less cpu can be further discussed. I still think
I lean towards the latter.

-Andi
Re: [RFC] [PATCH] more support for memory-less-node.
Christoph Lameter wrote:

> On Tue, 13 Feb 2007, Andi Kleen wrote:
>
> > Adding NULL tests all over mm for this would seem like a clear case
> > of this to me.
>
> Maybe there is an alternative. We are free to number the nodes, right?
> How about requiring the low node numbers to have memory and the high
> ones not? For example, have a boundary like nr_mem_nodes? Everything
> above nr_mem_nodes has no memory and cannot be specified in a nodemask.
>
> Those nodes would not be visible to user space via memory policies and
> page migration. So the core mempolicy logic could be left untouched.
>
> The nodes above nr_mem_nodes would exist purely within the kernel. They
> would have proximity information (which can be used to determine
> neighboring memory; more flexible than the current attachment to one
> fixed memory node) but those node numbers could not be specified as
> node masks in any memory operations.
>
> This would then allow memory-less nodes with I/O or cpus. The user
> would not be aware of these.

What's wrong with just setting the existing counters like
node_spanned_pages / node_present_pages to zero?

M.
Re: [RFC] [PATCH] more support for memory-less-node.
Andi Kleen wrote:

> > Your description of the node is correct, it's an arbitrary container of
> > one or more resources. Not only is this definition flexible, it's also
> > very useful, for memory hotplug, odd types of NUMA boxes, etc.
>
> I must disagree here. Special cases are always dangerous especially
> if they are hard to regression test. I made this discovery the hard
> way on x86-64 ...
>
> It's best to eliminate them in the first place, otherwise they will
> later come back and bite you when you don't expect it.
>
> Adding NULL tests all over mm for this would seem like a clear case
> of this to me.

I wasn't suggesting having NULL pointers for pgdats, if that's what you
mean.

Just nodes with no memory in them, the pgdat would still be there.
pgdat = struct node, except everything's badly named.
Re: [RFC] [PATCH] more support for memory-less-node.
On Tue, 13 Feb 2007, Andi Kleen wrote:

> Adding NULL tests all over mm for this would seem like a clear case
> of this to me.

Maybe there is an alternative. We are free to number the nodes, right?
How about requiring the low node numbers to have memory and the high
ones not? For example, have a boundary like nr_mem_nodes? Everything
above nr_mem_nodes has no memory and cannot be specified in a nodemask.

Those nodes would not be visible to user space via memory policies and
page migration. So the core mempolicy logic could be left untouched.

The nodes above nr_mem_nodes would exist purely within the kernel. They
would have proximity information (which can be used to determine
neighboring memory; more flexible than the current attachment to one
fixed memory node) but those node numbers could not be specified as
node masks in any memory operations.

This would then allow memory-less nodes with I/O or cpus. The user
would not be aware of these.
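[Editor's sketch] One way to picture the proposed boundary: node numbers
below nr_mem_nodes have memory, and any user-supplied nodemask naming a
node at or above the boundary is rejected. The sketch below is a
user-space model only; nr_mem_nodes is the name floated in the message,
while the single-word bitmask, MAX_NODES, and user_nodemask_valid() are
assumptions made for illustration. Treating the boundary value itself as
the first memory-less node is also an assumption.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical: a nodemask modelled as one unsigned long. */
#define MAX_NODES ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Return 1 if `mask` only names nodes below `nr_mem_nodes`, i.e. only
 * nodes that have memory under the proposed numbering; return 0 if it
 * names any memory-less node. */
static int user_nodemask_valid(unsigned long mask, int nr_mem_nodes)
{
	unsigned long mem_nodes;

	assert(nr_mem_nodes >= 0 && nr_mem_nodes <= MAX_NODES);
	if (nr_mem_nodes == MAX_NODES)
		mem_nodes = ~0UL;	/* avoid shifting by the full word width */
	else
		mem_nodes = (1UL << nr_mem_nodes) - 1;
	return (mask & ~mem_nodes) == 0;
}
```

With nr_mem_nodes = 2, a policy over nodes {0,1} passes, while any mask
containing node 2 or higher fails, which is how memory-less nodes would
stay invisible to memory policies.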
Re: [RFC] [PATCH] more support for memory-less-node.
> Your description of the node is correct, it's an arbitrary container of
> one or more resources. Not only is this definition flexible, it's also
> very useful, for memory hotplug, odd types of NUMA boxes, etc.

I must disagree here. Special cases are always dangerous especially
if they are hard to regression test. I made this discovery the hard
way on x86-64 ...

It's best to eliminate them in the first place, otherwise they will
later come back and bite you when you don't expect it.

Adding NULL tests all over mm for this would seem like a clear case
of this to me.

-Andi
Re: [RFC] [PATCH] more support for memory-less-node.
On Tue, 13 Feb 2007, KAMEZAWA Hiroyuki wrote:

> NODE_DATA(nid) is always a valid pointer if a node is online.
> NODE_DATA(nid)->present_pages can be 0 even if a node is online;
> I call this a memory-less node.

Yes but the pgdat will have no valid zone in it. That is new.
Re: [RFC] [PATCH] more support for memory-less-node.
On Tue, 13 Feb 2007, Andi Kleen wrote:

> The trouble with this is that you'll need to harden large parts
> of code against these. Especially a NULL pgdat is something quite
> dangerous. You could make it a dummy empty pgdat, but just assigning it
> nearby seems easier.

Plus there is the issue of having a pgdat but without any valid zone in
it. This is what triggered Kame-san's recent bug.
Re: [RFC] [PATCH] more support for memory-less-node.
KAMEZAWA Hiroyuki wrote:

> In my last posting, the mempolicy-fix-for-memory-less-node patch, there
> was a discussion of 'what do you consider the definition of "node" to
> be?' I found there is no consensus, but I want to go ahead. Before
> posting the patch again, I'd like to discuss more.
>
> -Kame
>
> In my understanding, a "node" is a block of cpu, memory, devices,
> and there could be a cpu-only node, memory-only node, device-only
> node... There will be discussion.
>
> IMHO, to represent the hardware configuration as it is, this definition
> is very natural and flexible. (And because my work is memory-hotplug,
> this definition fits me.)
>
> "Don't support such a crazy configuration" is one opinion.
>
> I hear x86_64 doesn't support it and defines a node as a block of
> memory. It remaps cpus on memory-less nodes to other nodes.
> I know ia64 allows memory-less nodes. (I don't know about ppc.)
> It works well on my box (and HP's box).

It doesn't make much sense for an architecture independent structure
to be "defined" in different ways by specific architectures. "not
supported" or "currently broken" might be a better description.

Your description of the node is correct, it's an arbitrary container of
one or more resources. Not only is this definition flexible, it's also
very useful, for memory hotplug, odd types of NUMA boxes, etc.

M.
Re: [RFC] [PATCH] more support for memory-less-node.
On Tue, 13 Feb 2007 09:29:49 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote:

> > In my understanding, a "node" is a block of cpu, memory, devices,
> > and there could be a cpu-only node, memory-only node, device-only node...
>
> The trouble with this is that you'll need to harden large parts
> of code against these. Especially a NULL pgdat is something quite
> dangerous. You could make it a dummy empty pgdat, but just assigning it
> nearby seems easier.

Ah... it seems I didn't explain enough.

Currently, a memory-less node has its own pgdat, for its own zonelist.
Every *online* node has its own NODE_DATA(nid).

NODE_DATA(nid) is always a valid pointer if a node is online.
NODE_DATA(nid)->present_pages can be 0 even if a node is online;
I call this a memory-less node.

Thanks,
-Kame
Re: [RFC] [PATCH] more support for memory-less-node.
> In my understanding, a "node" is a block of cpu, memory, devices,
> and there could be a cpu-only node, memory-only node, device-only node...

The trouble with this is that you'll need to harden large parts
of code against these. Especially a NULL pgdat is something quite
dangerous. You could make it a dummy empty pgdat, but just assigning it
nearby seems easier.

-Andi
[RFC] [PATCH] more support for memory-less-node.
In my last posting, the mempolicy-fix-for-memory-less-node patch, there
was a discussion of 'what do you consider the definition of "node" to
be?' I found there is no consensus, but I want to go ahead. Before
posting the patch again, I'd like to discuss more.

-Kame

In my understanding, a "node" is a block of cpu, memory, devices,
and there could be a cpu-only node, memory-only node, device-only
node... There will be discussion.

IMHO, to represent the hardware configuration as it is, this definition
is very natural and flexible. (And because my work is memory-hotplug,
this definition fits me.)

"Don't support such a crazy configuration" is one opinion.

I hear x86_64 doesn't support it and defines a node as a block of
memory. It remaps cpus on memory-less nodes to other nodes.
I know ia64 allows memory-less nodes. (I don't know about ppc.)
It works well on my box (and HP's box).

If there is a memory-less node, code that walks all nodes which have
memory should check NODE_DATA(nid)->present_pages. But the following is
a somewhat heavy operation:

	for_each_online_node(nid) {
		if (!node_present_pages(nid))
			continue;
		/* work on a node that has memory */
	}

This patch adds a new node mask, "node_memory_online_map", for nodes which
have memory. for_each_node_mask(nid, node_memory_online_map) walks all
memory-ready nodes. This mask is updated at node-hotplug operations.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

Index: linux-2.6.20/include/linux/nodemask.h
===
--- linux-2.6.20.orig/include/linux/nodemask.h	2007-02-07 17:25:54.0 +0900
+++ linux-2.6.20/include/linux/nodemask.h	2007-02-13 15:31:33.0 +0900
@@ -344,6 +344,8 @@
 
 extern nodemask_t node_online_map;
 extern nodemask_t node_possible_map;
+/* online nodes which have memory */
+extern nodemask_t node_memory_online_map;
 
 #if MAX_NUMNODES > 1
 #define num_online_nodes()	nodes_weight(node_online_map)
Index: linux-2.6.20/mm/page_alloc.c
===
--- linux-2.6.20.orig/mm/page_alloc.c	2007-02-07 17:25:54.0 +0900
+++ linux-2.6.20/mm/page_alloc.c	2007-02-13 15:54:04.0 +0900
@@ -54,6 +54,9 @@
 EXPORT_SYMBOL(node_online_map);
 nodemask_t node_possible_map __read_mostly = NODE_MASK_ALL;
 EXPORT_SYMBOL(node_possible_map);
+nodemask_t node_memory_online_map __read_mostly = { { [0] = 1UL } };
+EXPORT_SYMBOL(node_memory_online_map);
+
 unsigned long totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 long nr_swap_pages;
@@ -1805,6 +1808,16 @@
 	}
 }
 
+static void __meminit fixup_memory_online_nodes(void)
+{
+	int nid;
+	nodes_clear(node_memory_online_map);
+	for_each_online_node(nid) {
+		if (node_present_pages(nid))
+			node_set(nid, node_memory_online_map);
+	}
+}
+
 #else	/* CONFIG_NUMA */
 
 static void __meminit build_zonelists(pg_data_t *pgdat)
@@ -1851,6 +1864,10 @@
 		pgdat->node_zonelists[i].zlcache_ptr = NULL;
 }
 
+static void fixup_memory_online_nodes(void)
+{
+	return;
+}
 #endif	/* CONFIG_NUMA */
 
 /* return values int just for stop_machine_run() */
@@ -1862,6 +1879,7 @@
 		build_zonelists(NODE_DATA(nid));
 		build_zonelist_cache(NODE_DATA(nid));
 	}
+	fixup_memory_online_nodes();
 	return 0;
 }
 
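[Editor's sketch] The effect of the patch can be tried outside the kernel
with a small model. This is only a sketch under stated assumptions: the
node masks are single unsigned long bitmasks instead of nodemask_t,
node_present_pages is an array instead of the kernel accessor,
MAX_NUMNODES is fixed at 8, and set_node_online() is a hypothetical
helper standing in for node hotplug.

```c
#include <assert.h>

#define MAX_NUMNODES 8	/* assumed small value for this model */

static unsigned long node_online_map;		/* bit n set: node n is online */
static unsigned long node_memory_online_map;	/* bit n set: online and has memory */
static unsigned long node_present_pages[MAX_NUMNODES];

/* Hypothetical helper: bring a node online with the given page count. */
static void set_node_online(int nid, unsigned long present_pages)
{
	assert(nid >= 0 && nid < MAX_NUMNODES);
	node_online_map |= 1UL << nid;
	node_present_pages[nid] = present_pages;
}

/* Model of fixup_memory_online_nodes(): rebuild the mask of online
 * nodes that have at least one present page. */
static void fixup_memory_online_nodes(void)
{
	int nid;

	node_memory_online_map = 0;
	for (nid = 0; nid < MAX_NUMNODES; nid++) {
		if (!(node_online_map & (1UL << nid)))
			continue;	/* offline node */
		if (node_present_pages[nid])
			node_memory_online_map |= 1UL << nid;
	}
}
```

After the fixup, callers can walk node_memory_online_map directly instead
of testing the present-page count of every online node on each pass.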