Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-15 Thread Bodo Eggert
Andi Kleen <[EMAIL PROTECTED]> wrote:

> Now if it's better to set up an empty node or use a nearby node
> for a memory less cpu can be further discussed. I still think
> I lean towards the latter.

Worst case: Only slot 0 is used. Plug your memoryless CPU card into the last
slot before you plug the CPU+mem card into the last-1 slot.
-- 
W.I.N.D.O.W.S.:
 Wireless Intelligent Neohuman Designed for Observation and Worldwide Sabotage
-- http://www.brunching.com/toys/toy-cyborger.html
Friß, Spammer: [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread KAMEZAWA Hiroyuki
On Tue, 13 Feb 2007 10:50:53 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Tue, 13 Feb 2007, Martin J. Bligh wrote:
> 
> > What's wrong with just setting the existing counters like
> > node_spanned_pages / node_present_pages to zero?
> 
> Will this fix the breakage that Kame-san saw?
> 

Now, a memory-less-node's present_pages and spanned_pages are zero,
and its zones' present_pages is zero, too.

We added the populated_zone(zone) macro. It checks whether a zone has pages or not.
(see build_zonelists_node() in page_alloc.c)
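
As a rough illustration of the check described above, here is a minimal user-space sketch. The structures are simplified stand-ins, not the kernel's real definitions, and count_populated_zones() is a hypothetical helper added only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; only the fields used here. */
#define MAX_NR_ZONES 4

struct zone {
    unsigned long present_pages;        /* pages actually present in the zone */
};

struct pglist_data {
    struct zone node_zones[MAX_NR_ZONES];
    unsigned long node_present_pages;
    unsigned long node_spanned_pages;
};

/* Same idea as populated_zone(): non-zero when the zone has pages. */
static int populated_zone(const struct zone *zone)
{
    assert(zone != NULL);
    return zone->present_pages != 0;
}

/* Hypothetical helper: how many zones of this node hold any memory. */
static int count_populated_zones(const struct pglist_data *pgdat)
{
    int i, n = 0;

    assert(pgdat != NULL);
    for (i = 0; i < MAX_NR_ZONES; i++)
        if (populated_zone(&pgdat->node_zones[i]))
            n++;
    return n;
}
```

With this model, a memory-less node is simply a pgdat for which count_populated_zones() returns 0; the pgdat pointer itself is never NULL.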

-Kame



Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread KAMEZAWA Hiroyuki
On Tue, 13 Feb 2007 09:25:00 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Tue, 13 Feb 2007, KAMEZAWA Hiroyuki wrote:
> 
> > NODE_DATA(nid) is always a valid pointer if a node is online.
> > NODE_DATA(nid)->node_present_pages can be 0 even if a node is online;
> > I call this a memory-less-node.
> 
> Yes but the pgdat will have no valid zone in it. That is new.
> 
We have the populated_zone() macro for checking it.

(I noticed node-hotplug can create memory-less-zone until memory is
 onlined by the user.)


-Kame



Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Bob Picco
Andi Kleen wrote:   [Tue Feb 13 2007, 01:18:45PM EST]
> 
> > I wasn't suggesting having NULL pointers for pgdats, if that's what you
> > mean. 
> 
> That is what started the original thread at least. Can happen on some
> ia64 platforms.
I don't believe there is a NULL pgdat. The code for memory less nodes in
ia64 discontig.c allocates the memory less node's pgdat from the best
memory node candidate. If there is a NULL pgdat, then it's a bug. Instead,
for memory less nodes you don't have any present pages.

I thought the bug was because the process wanted to bind on just one
memoryless node and MPOL_BIND didn't handle that correctly and return
an error to the process.

bob
> 
> > Just nodes with no memory in them, the pgdat would still be there. 
> > pgdat = struct node, except everything's badly named.
> 
> Ok those can happen even on x86-64, mostly because it's possible
> to fill up a node early during boot up with bootmem and then
> it's effectively empty.
> 
> [there is even still an open bug when this happens on node 0]
>  
> Handling out of memory here of course has to be always done.
> 
> Just NULL pointers in core data structures are evil. But I'm glad we 
> agree here.
> 
> Now if it's better to set up an empty node or use a nearby node
> for a memory less cpu can be further discussed. I still think
> I lean towards the latter.
> 
> -Andi


Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Christoph Lameter
On Tue, 13 Feb 2007, Martin J. Bligh wrote:

> What's wrong with just setting the existing counters like
> node_spanned_pages / node_present_pages to zero?

Will this fix the breakage that Kame-san saw?



Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Martin J. Bligh

Andi Kleen wrote:
> > I wasn't suggesting having NULL pointers for pgdats, if that's what you
> > mean.
>
> That is what started the original thread at least. Can happen on some
> ia64 platforms.

OK, that does seem kind of ugly.

> > Just nodes with no memory in them, the pgdat would still be there.
> > pgdat = struct node, except everything's badly named.
>
> Ok those can happen even on x86-64, mostly because it's possible
> to fill up a node early during boot up with bootmem and then
> it's effectively empty.
>
> [there is even still an open bug when this happens on node 0]
>
> Handling out of memory here of course has to be always done.

Yup, if we just set the "size" of the node to zero, it seems
like a natural degenerate case that should be handled anyway.

> Just NULL pointers in core data structures are evil. But I'm glad we
> agree here.
>
> Now if it's better to set up an empty node or use a nearby node
> for a memory less cpu can be further discussed. I still think
> I lean towards the latter.

Just seems kind of ugly and unnecessary, particularly if that
memory-less cpu (or IO node) is equidistant from one or more
memory-possessing nodes. As long as their zonelist is set up
correctly, it should all work fine without that, right?

build_zonelists_node already checks populated_zone() so it looks
like it's all set up for that already ...
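
To make the zonelist argument concrete, here is a small user-space sketch, not the kernel code. It assumes simplified struct zone / struct node types, and it orders fallback nodes by numeric wrap-around as a stand-in for real distance ordering; build_zonelist() and add_node_zones() are names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES      4
#define ZONES_PER_NODE 2

/* Simplified stand-ins; the real kernel structures carry far more state. */
struct zone {
    int nid;                        /* owning node, kept for inspection */
    unsigned long present_pages;
};

struct node {
    struct zone zones[ZONES_PER_NODE];
};

static int populated_zone(const struct zone *z)
{
    return z->present_pages != 0;
}

/*
 * Append the populated zones of one node, highest zone first.
 * Empty zones are skipped, so a memory-less node adds nothing.
 */
static size_t add_node_zones(const struct node *n,
                             const struct zone **list, size_t len)
{
    int i;

    for (i = ZONES_PER_NODE - 1; i >= 0; i--)
        if (populated_zone(&n->zones[i]))
            list[len++] = &n->zones[i];
    return len;
}

/*
 * Build the fallback list for node 'local': the local node first, then the
 * remaining nodes in wrap-around order (an assumption made for brevity).
 * 'list' must have room for MAX_NODES * ZONES_PER_NODE entries.
 */
static size_t build_zonelist(const struct node nodes[MAX_NODES], int local,
                             const struct zone **list)
{
    size_t len = 0;
    int k;

    assert(local >= 0 && local < MAX_NODES);
    for (k = 0; k < MAX_NODES; k++)
        len = add_node_zones(&nodes[(local + k) % MAX_NODES], list, len);
    return len;
}
```

In this model a memory-less node still gets a usable zonelist: it just starts with the first populated zone of some other node, with no special casing needed.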

M.



Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Andi Kleen

> I wasn't suggesting having NULL pointers for pgdats, if that's what you
> mean. 

That is what started the original thread at least. Can happen on some
ia64 platforms.

> Just nodes with no memory in them, the pgdat would still be there. 
> pgdat = struct node, except everything's badly named.

Ok those can happen even on x86-64, mostly because it's possible
to fill up a node early during boot up with bootmem and then
it's effectively empty.

[there is even still an open bug when this happens on node 0]
 
Handling out of memory here of course has to be always done.

Just NULL pointers in core data structures are evil. But I'm glad we 
agree here.

Now if it's better to set up an empty node or use a nearby node
for a memory less cpu can be further discussed. I still think
I lean towards the latter.

-Andi


Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Martin J. Bligh

Christoph Lameter wrote:
> On Tue, 13 Feb 2007, Andi Kleen wrote:
>
> > Adding NULL tests all over mm for this would seem like a clear case
> > of this to me.
>
> Maybe there is an alternative. We are free to number the nodes right?
> How about requiring that the low node numbers have memory and the high
> ones do not?
>
> F.e. have a boundary like
>
> nr_mem_nodes ?
>
> Everything above nr_mem_nodes has no memory and cannot be specified in a
> nodemask. Those nodes would not be visible to user space via memory
> policies and page migration. So the core mempolicy logic could be left
> untouched.
>
> The nodes above nr_mem_nodes would exist purely within the kernel. They
> would have proximity information (which can be used to determine
> neighboring memory, and is more flexible than the current attachment
> to one fixed memory node), but those node numbers could not be specified as
> node masks in any memory operations. This would then allow memory less nodes
> with I/O or cpus. The user would not be aware of these.

What's wrong with just setting the existing counters like
node_spanned_pages / node_present_pages to zero?

M.


Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Martin J. Bligh

Andi Kleen wrote:
> > Your description of the node is correct, it's an arbitrary container of
> > one or more resources. Not only is this definition flexible, it's also
> > very useful, for memory hotplug, odd types of NUMA boxes, etc.
>
> I must disagree here. Special cases are always dangerous especially
> if they are hard to regression test. I made this discovery the hard
> way on x86-64 ... It's best to eliminate them in the first place,
> otherwise they will later come back and bite you when you don't expect it.
>
> Adding NULL tests all over mm for this would seem like a clear case
> of this to me.

I wasn't suggesting having NULL pointers for pgdats, if that's what you
mean. Just nodes with no memory in them, the pgdat would still be there.
pgdat = struct node, except everything's badly named.




Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Christoph Lameter
On Tue, 13 Feb 2007, Andi Kleen wrote:

> Adding NULL tests all over mm for this would seem like a clear case
> of this to me.

Maybe there is an alternative. We are free to number the nodes right?
How about requiring that the low node numbers have memory and the high
ones do not?

F.e. have a boundary like

nr_mem_nodes ?

Everything above nr_mem_nodes has no memory and cannot be specified in a 
nodemask. Those nodes would not be visible to user space via memory 
policies and page migration. So the core mempolicy logic could be left 
untouched.

The nodes above nr_mem_nodes would exist purely within the kernel. They
would have proximity information (which can be used to determine
neighboring memory, and is more flexible than the current attachment
to one fixed memory node), but those node numbers could not be specified as
node masks in any memory operations. This would then allow memory less nodes
with I/O or cpus. The user would not be aware of these.
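
A minimal sketch of how such a boundary might be checked, purely as an illustration of the proposal. Everything here is hypothetical: nr_mem_nodes is the proposed (not existing) boundary, the nodemask is modelled as a single 64-bit word, and rejecting an empty mask is an assumption of the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 64

/* Hypothetical boundary: nodes 0 .. nr_mem_nodes-1 have memory. */
static int nr_mem_nodes = 0;

/* With the proposed numbering, "has memory" is a single comparison. */
static int node_has_memory(int nid)
{
    return nid >= 0 && nid < nr_mem_nodes;
}

/*
 * A user-supplied nodemask is acceptable only if it is non-empty and
 * names no node at or above the boundary.
 */
static int user_nodemask_valid(uint64_t mask)
{
    uint64_t allowed;

    assert(nr_mem_nodes >= 0 && nr_mem_nodes <= MAX_NODES);
    allowed = (nr_mem_nodes == MAX_NODES)
                  ? ~UINT64_C(0)
                  : (UINT64_C(1) << nr_mem_nodes) - 1;
    return mask != 0 && (mask & ~allowed) == 0;
}
```

The attraction of this scheme is that the validity test happens once at the user-space boundary, so the core mempolicy code never sees a memory-less node number.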



Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Andi Kleen

> Your description of the node is correct, it's an arbitrary container of
> one or more resources. Not only is this definition flexible, it's also
> very useful, for memory hotplug, odd types of NUMA boxes, etc.

I must disagree here. Special cases are always dangerous especially
if they are hard to regression test. I made this discovery the hard
way on x86-64 ... It's best to eliminate them in the first place,
otherwise they will later come back and bite you when you don't expect it.

Adding NULL tests all over mm for this would seem like a clear case
of this to me.

-Andi


Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Christoph Lameter
On Tue, 13 Feb 2007, KAMEZAWA Hiroyuki wrote:

> NODE_DATA(nid) is always a valid pointer if a node is online.
> NODE_DATA(nid)->node_present_pages can be 0 even if a node is online;
> I call this a memory-less-node.

Yes but the pgdat will have no valid zone in it. That is new.




Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Christoph Lameter
On Tue, 13 Feb 2007, Andi Kleen wrote:

> The trouble with this is that you'll need to harden large parts
> of code against these.  Especially a NULL pgdat is something quite
> dangerous. You could make it a dummy empty pgdat, but just assigning it
> nearby seems easier.

Plus there is the issue of having a pgdat but without any valid zone in
it. This is what triggered Kame-san's recent bug.



Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Martin J. Bligh

KAMEZAWA Hiroyuki wrote:
> In my last posting, the mempolicy-fix-for-memory-less-node patch, there was a
> discussion of 'what do you consider the definition of "node" to be...?'
>
> I found there is no consensus. But I want to go ahead.
> Before posting the patch again, I'd like to discuss more.
>
> -Kame
>
> In my understanding, a "node" is a block of cpu, memory, devices.
> and there could be cpu-only-node, memory-only-node, device-only-node...
>
> There will be discussion. IMHO, to represent hardware configuration
> as it is, this definition is very natural and flexible.
> (And because my work is memory-hotplug, this definition fits me.)
>
> "Don't support such crazy configurations" is one opinion.
> I hear x86_64 doesn't support it and defines a node as a block of memory;
> it remaps cpus on memory-less-nodes to other nodes.
> I know ia64 allows memory-less-node. (I don't know about ppc.)
> It works well on my box (and HP's box).

It doesn't make much sense for an architecture independent structure to
be "defined" in different ways by specific architectures. "not
supported" or "currently broken" might be a better description.

Your description of the node is correct, it's an arbitrary container of
one or more resources. Not only is this definition flexible, it's also
very useful, for memory hotplug, odd types of NUMA boxes, etc.

M.


Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread KAMEZAWA Hiroyuki
On Tue, 13 Feb 2007 09:29:49 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

> 
> > In my understanding, a "node" is a block of cpu, memory, devices.
> > and there could be cpu-only-node, memory-only-node, device-only-node...
> 
> The trouble with this is that you'll need to harden large parts
> of code against these.  Especially a NULL pgdat is something quite
> dangerous. You could make it a dummy empty pgdat, but just assigning it
> nearby seems easier.

Ah...It seems I didn't explain enough.

Now, a memory-less-node has its own pgdat, for its own zonelist.
Every *online* node has its own NODE_DATA(nid).

NODE_DATA(nid) is always a valid pointer if a node is online.
NODE_DATA(nid)->node_present_pages can be 0 even if a node is online;
I call this a memory-less-node.

Thanks,
-Kame




Re: [RFC] [PATCH] more support for memory-less-node.

2007-02-13 Thread Andi Kleen

> In my understanding, a "node" is a block of cpu, memory, devices.
> and there could be cpu-only-node, memory-only-node, device-only-node...

The trouble with this is that you'll need to harden large parts
of code against these.  Especially a NULL pgdat is something quite
dangerous. You could make it a dummy empty pgdat, but just assigning it
nearby seems easier.

-Andi


[RFC] [PATCH] more support for memory-less-node.

2007-02-12 Thread KAMEZAWA Hiroyuki
In my last posting, the mempolicy-fix-for-memory-less-node patch, there was a
discussion of 'what do you consider the definition of "node" to be...?'
I found there is no consensus. But I want to go ahead.
Before posting the patch again, I'd like to discuss more.

-Kame

In my understanding, a "node" is a block of cpu, memory, devices.
and there could be cpu-only-node, memory-only-node, device-only-node...

There will be discussion. IMHO, to represent hardware configuration
as it is, this definition is very natural and flexible.
(And because my work is memory-hotplug, this definition fits me.)

"Don't support such crazy configurations" is one opinion.
I hear x86_64 doesn't support it and defines a node as a block of memory;
it remaps cpus on memory-less-nodes to other nodes.
I know ia64 allows memory-less-node. (I don't know about ppc.)
It works well on my box (and HP's box).

If there is a memory-less-node, code which walks all nodes that have memory
should check NODE_DATA(nid)->node_present_pages.

But the following is a somewhat heavy operation.
x
	for_each_online_node(nid)
		if (!node_present_pages(nid))
			continue;
x

This patch adds a new node mask "node_memory_online_map" for nodes
which have memory.

for_each_node_mask(nid, node_memory_online_map) walks all memory-ready-nodes.
This mask is updated at node-hotplug ops.
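
For readers who want to try the idea outside the kernel, here is a small user-space sketch under stated assumptions: node masks are modelled as a single 64-bit word rather than nodemask_t, and node_set_online(), present_pages[] and count_memory_nodes() are hypothetical stand-ins for the hotplug path and pgdat fields. Only fixup_memory_online_nodes() follows the logic of the patch below.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 64

static uint64_t node_online_mask;           /* models node_online_map */
static uint64_t node_memory_online_mask;    /* models node_memory_online_map */
static unsigned long present_pages[MAX_NODES];

/* Hypothetical stand-in for bringing a node online with 'pages' of memory. */
static void node_set_online(int nid, unsigned long pages)
{
    assert(nid >= 0 && nid < MAX_NODES);
    node_online_mask |= UINT64_C(1) << nid;
    present_pages[nid] = pages;
}

/* Rebuild the memory mask: clear it, then set each online node with pages. */
static void fixup_memory_online_nodes(void)
{
    int nid;

    node_memory_online_mask = 0;
    for (nid = 0; nid < MAX_NODES; nid++) {
        if (!(node_online_mask & (UINT64_C(1) << nid)))
            continue;
        if (present_pages[nid])
            node_memory_online_mask |= UINT64_C(1) << nid;
    }
}

/* Walk only the nodes with memory; no per-node page check is needed here. */
static int count_memory_nodes(void)
{
    int nid, count = 0;

    for (nid = 0; nid < MAX_NODES; nid++)
        if (node_memory_online_mask & (UINT64_C(1) << nid))
            count++;
    return count;
}
```

The per-node page check is paid once, when the mask is rebuilt, instead of in every loop that only cares about nodes with memory.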

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

Index: linux-2.6.20/include/linux/nodemask.h
===
--- linux-2.6.20.orig/include/linux/nodemask.h  2007-02-07 17:25:54.0 +0900
+++ linux-2.6.20/include/linux/nodemask.h   2007-02-13 15:31:33.0 +0900
@@ -344,6 +344,8 @@
 
 extern nodemask_t node_online_map;
 extern nodemask_t node_possible_map;
+/* online nodes which have memory */
+extern nodemask_t node_memory_online_map;
 
 #if MAX_NUMNODES > 1
 #define num_online_nodes() nodes_weight(node_online_map)
Index: linux-2.6.20/mm/page_alloc.c
===
--- linux-2.6.20.orig/mm/page_alloc.c   2007-02-07 17:25:54.0 +0900
+++ linux-2.6.20/mm/page_alloc.c   2007-02-13 15:54:04.0 +0900
@@ -54,6 +54,9 @@
 EXPORT_SYMBOL(node_online_map);
 nodemask_t node_possible_map __read_mostly = NODE_MASK_ALL;
 EXPORT_SYMBOL(node_possible_map);
+nodemask_t node_memory_online_map __read_mostly = { { [0] = 1UL } };
+EXPORT_SYMBOL(node_memory_online_map);
+
 unsigned long totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 long nr_swap_pages;
@@ -1805,6 +1808,16 @@
 	}
 }
 
+static void __meminit fixup_memory_online_nodes(void)
+{
+	int nid;
+	nodes_clear(node_memory_online_map);
+	for_each_online_node(nid) {
+		if (node_present_pages(nid))
+			node_set(nid, node_memory_online_map);
+	}
+}
+
 #else  /* CONFIG_NUMA */
 
 static void __meminit build_zonelists(pg_data_t *pgdat)
@@ -1851,6 +1864,10 @@
 		pgdat->node_zonelists[i].zlcache_ptr = NULL;
 }
 
+static void fixup_memory_online_nodes(void)
+{
+	return;
+}
 #endif /* CONFIG_NUMA */
 
 /* return values int just for stop_machine_run() */
@@ -1862,6 +1879,7 @@
 		build_zonelists(NODE_DATA(nid));
 		build_zonelist_cache(NODE_DATA(nid));
 	}
+	fixup_memory_online_nodes();
 	return 0;
 }
 

