On Fri, Feb 08, 2013 at 11:33:40AM -0800, Hugh Dickins wrote:
> > > <SNIP>
> > >
> > > 2. __ksm_enter() has a nice little optimization, to insert the new mm
> > > just behind ksmd's cursor, so there's a full pass for it to stabilize
> > > (or be removed) before ksmd addresses it. Nice when ksmd is running,
> > > but not so nice when we're trying to unmerge all mms: we were missing
> > > those mms forked and inserted behind the unmerge cursor. Easily fixed
> > > by inserting at the end when KSM_RUN_UNMERGE.
> > >
> > > 3. It is possible for a KSM page to be faulted back from swapcache into
> > > an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
> > > Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
> > > to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
> > > and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
>
> What I found is that a 4th cause emerges once KSM migration
> is properly working: that interval during page migration when the old
> page has been fully unmapped but the new not yet mapped in its place.
>

For anyone else watching -- normal page migration expects to be protected
during that particular window by migration ptes. Any reference to a PTE
mapping a page being migrated faults on a swap-like PTE and waits in
migration_entry_wait(). A rough sketch of that check follows below.

> The KSM COW breaking cannot see a page there then, so it ends up with
> a (newly migrated) KSM page left behind. Almost certainly has to be
> fixed in follow_page(), but I've not yet settled on its final form -
> the fix I have works well, but a different approach might be better.
>

follow_page() is one option. My guess is that you're thinking of adding
a FOLL_ flag that will cause follow_page() to check is_migration_entry()
and call migration_entry_wait() if the flag is present. Otherwise you
would need to check for migration ptes in a number of places under page
lock and then hold the lock for long periods of time to prevent migration
starting. I did not look at that second option in depth because it
quickly looked like it would be a mess, with long page lock hold times,
and might not even be workable.
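For reference, what the regular fault path does when it lands on one of
those entries is roughly the following -- a paraphrase of the
do_swap_page() handling, trimmed for illustration rather than quoted
exactly:

	/* in do_swap_page(): orig_pte is the faulting pte, pmd/address from the fault */
	entry = pte_to_swp_entry(orig_pte);
	if (unlikely(non_swap_entry(entry))) {
		if (is_migration_entry(entry)) {
			/*
			 * Sleep until migrate_pages() has removed the
			 * migration entry and mapped the new page, then
			 * let the fault be retried.
			 */
			migration_entry_wait(mm, pmd, address);
		} else {
			/* hwpoison and bad ptes handled here */
		}
		goto out;
	}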
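To be explicit about what such a FOLL_ flag could look like, a sketch --
the flag name, value and helper below are invented for illustration
only, not lifted from any posted patch:

/* Invented name/value; a real flag would need a slot among the FOLL_ flags */
#define FOLL_WAIT_MIGRATION	0x1000

/*
 * Hypothetical helper for follow_page(): called when the pte is not
 * present and the caller passed FOLL_WAIT_MIGRATION.  Returns true if it
 * dropped the pte lock and slept in migration_entry_wait(), in which
 * case the caller should retry the lookup instead of saying "no page".
 */
static bool follow_page_wait_migration(struct mm_struct *mm, pmd_t *pmd,
				       unsigned long address, pte_t *ptep,
				       spinlock_t *ptl)
{
	swp_entry_t entry = pte_to_swp_entry(*ptep);

	if (!is_migration_entry(entry))
		return false;

	pte_unmap_unlock(ptep, ptl);
	migration_entry_wait(mm, pmd, address);
	return true;
}

Presumably break_ksm() would then pass the new flag so that its
follow_page() call stalls on the migration entry and retries, instead of
concluding that there is no KSM page there.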
> > > +static int remove_all_stable_nodes(void)
> > > +{
> > > +	struct stable_node *stable_node;
> > > +	int nid;
> > > +	int err = 0;
> > > +
> > > +	for (nid = 0; nid < nr_node_ids; nid++) {
> > > +		while (root_stable_tree[nid].rb_node) {
> > > +			stable_node = rb_entry(root_stable_tree[nid].rb_node,
> > > +						struct stable_node, node);
> > > +			if (remove_stable_node(stable_node)) {
> > > +				err = -EBUSY;
> > > +				break;	/* proceed to next nid */
> > > +			}
> > If remove_stable_node() returns an error then it's quite possible that it'll
> > go boom when that page is encountered later but it's not guaranteed. It'd
> > be best effort to continue removing as many of the stable nodes anyway.
>
> We're in trouble either way of course.
> If it returns an error, then indeed something we don't yet understand
> has occurred, and we shall want to debug it. But unless it's due to
> corruption somewhere, we shouldn't be in much trouble, shouldn't go boom:
> remove_all_stable_nodes() error is ignored at the end of unmerging, it
> will be tried again when changing merge_across_nodes, and an error
> then will just prevent changing merge_across_nodes at that time. So
> the mysteriously unremovable stable nodes remain in the same kind of tree.
>

Ok.

-- 
Mel Gorman
SUSE Labs