Hi Hugh,

first of all, please let me apologize for the delay in my response and thank
you for your extensive review!

On Sun, 30 Sep 2012, Hugh Dickins wrote:
> Andrea's point about ksm_migrate_page() is an important one, and I've
> answered that against his mail, but here's some other easier points.
> 
> On Mon, 24 Sep 2012, Petr Holasek wrote:
> 
> > Introduces new sysfs boolean knob /sys/kernel/mm/ksm/merge_across_nodes
> > which control merging pages across different numa nodes.
> > When it is set to zero only pages from the same node are merged,
> > otherwise pages from all nodes can be merged together (default behavior).
> > 

...

> > 
> > v4: - merge_nodes was renamed to merge_across_nodes
> >     - share_all debug knob was dropped
> 
> Yes, you were right to drop the share_all knob for now.  I do like the
> idea, but it was quite inappropriate to mix it in with this NUMAnode
> patch.  And although I like the idea, I think it wants a bit more: I
> already have a hacky "allksm" boot option patch to mm/mmap.c for my
> own testing, which adds VM_MERGEABLE everywhere.  If I gave that up
> (I'd like to!) in favour of yours, I think I would still be missing
> all the mms that are created before there's a chance to switch the
> share_all mode on.  Or maybe I misread your v3.  Anyway, that's a
> different topic, happily taken off today's agenda.
> 

Agreed, it hid original purpose of this patch and made it more difficult for
eventual merging. So let's move it lower on the ksm todo list for this time :)

> > diff --git a/mm/ksm.c b/mm/ksm.c
> > index 47c8853..7c82032 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -36,6 +36,7 @@
> >  #include <linux/hash.h>
> >  #include <linux/freezer.h>
> >  #include <linux/oom.h>
> > +#include <linux/numa.h>
> >  
> >  #include <asm/tlbflush.h>
> >  #include "internal.h"
> > @@ -140,7 +141,10 @@ struct rmap_item {
> >     unsigned long address;          /* + low bits used for flags below */
> >     unsigned int oldchecksum;       /* when unstable */
> >     union {
> > -           struct rb_node node;    /* when node of unstable tree */
> > +           struct {
> > +                   struct rb_node node;    /* when node of unstable tree */
> > +                   struct rb_root *root;
> > +           };
> 
> This worries me a little, enlarging struct rmap_item: there may
> be very many of them in the system, best to minimize their size.
> 
> (This struct rb_root *root is used for one thing only, isn't it?  For the
> unstable rb_erase() in remove_rmap_item_from_tree().  It annoys me that
> we need to record root just for that, but I don't see an alternative.)

Yes, I've played a quite lot with this issue, but wasn't able to find an
alternative solution, too.

> 
> The 64-bit case can be easily improved by locating unsigned int nid
> after oldchecksum instead.  The 32-bit case (which supports smaller
> NODES_SHIFT: 6 was the largest I found) could be "improved" by keeping
> nid in the lower bits of address along with (moved) UNSTABLE_FLAG and
> STABLE_FLAG and reduced SEQNR_MASK - we really need only 1 bit for that,
> but 2 bits would be nice for keeping the BUG_ON(age > 1).
> 
> But you may think I'm going too far there, and prefer just to put
> #ifdef CONFIG_NUMA around the unsigned int nid, so at least it does
> not enlarge the more basic 32-bit configuration.
> 

I like your idea of unsigned int nid, will implement it in next version.

...

> >  
> > -   for (node = rb_first(&root_stable_tree); node; node = rb_next(node)) {
> > -           struct stable_node *stable_node;
> > +   for (i = 0; i < MAX_NUMNODES; i++)
> 
> It's irritating to have to do this outer nid loop, but I think you're
> right, that the memory hotremove notification does not quite tell us
> which node to look at.  Or can we derive that from start_pfn?  Would
> it be safe to assume that end_pfn-1 must be in the same node?
> 

I had assumed that we can't rely on end_pfn-1 in the same node, but now
mm/memory_hotremove.c looks to me that we can rely on it because memory 
hotremove
callback is triggered for each zone where we are removing memory.
So I think yes, we can optimize it in the way you mentioned above. If I am wrong
correct me, please :)

...

> 
> Looks nice - thank you.
> 

Thanks for your help!

Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to