On Wed, 19 Dec 2012, Petr Holasek wrote:
> On Tue, 18 Dec 2012, Hugh Dickins wrote:
> > On Tue, 18 Dec 2012, Sasha Levin wrote:
> > 
> > > Hi all,
> > > 
> > > While fuzzing with trinity inside a KVM tools guest, running latest 
> > > linux-next kernel, I've
> > > stumbled on the following:
> > > 
> > > [  127.959264] BUG: unable to handle kernel NULL pointer dereference at 
> > > 0000000000000110
> > > [  127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> ...
> 
> > > 88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 
> > > 83 fc 01 77
> > > [  127.978032] RIP  [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> > > [  127.978032]  RSP <ffff8800137abb78>
> > > [  127.978032] CR2: 0000000000000110
> > > [  127.978032] ---[ end trace 3dc1b0c5db8c1230 ]---
> > > 
> > > The relevant piece of code is:
> > > 
> > >   static struct page *get_mergeable_page(struct rmap_item *rmap_item)
> > >   {
> > >           struct mm_struct *mm = rmap_item->mm;
> > >           unsigned long addr = rmap_item->address;
> > >           struct vm_area_struct *vma;
> > >           struct page *page;
> > >   
> > >           down_read(&mm->mmap_sem);
> > > 
> > > Where 'mm' is NULL. I'm not really sure how it happens though.
> > 
> > Thanks, yes, I got that, and it's not peculiar to fuzzing at all:
> > I'm testing the fix at the moment, but just hit something else too
> > (ksmd oops on NULL p->mm in task_numa_fault i.e. task_numa_placement).
> > 
> > For the moment, you're safer not to run KSM: configure it out or don't
> > set it to run.  Fixes to follow later, I'll try to remember to Cc you.

Sadly I couldn't send out fixes yesterday, because my testing then
met another more subtle problem, that I've still not yet resolved.

> > 
> 
> Hello all,
> 
> I've also tried fuzzing with trinity inside of kvm guest when tested KSM
> patch, but applied on top of 3.7-rc8, but didn't trigger that oops. So
> going to do the same testing on linux-next.

I haven't tried linux-next myself, it being in a transitional state
until 3.8-rc1.  I've been testing on Linus's git from last weekend
(head at a4f1de176614f634c367e5994a7bcc428c940df0 to be precise),
plus your patch, plus my fixes.

> 
> Hugh, does it seem like bug in unstable_tree_search_insert() you mentioned
> in yesterday email of something else?

Yes, Sasha's is exactly the one I got earlier: hitting in
get_mergeable_page() due to bug in unstable_tree_search_insert().

>From your next mail, I see you've started looking into it; so I'd
better show what I have at the moment, although I do hate posting
incomplete known-buggy patches: this on top of git + your patch.

The kernel/sched/fair.c part to go to Mel and 3.8-rc1 when I've a
moment to separate it out.

The CONFIG_NUMA section in stable_tree_append() I added late
last night, but it's not enough to fix the issue I'm still seeing:
merge_across_nodes 0 and 1 cases are stable, but switching between
them can bring pages_shared down to 0 but leave something behind
in the stable_tree.  I suspect it's a wrong-nid issue, but I've
not yet tracked it down with the BUG_ONs I've been adding (not
included below).

Removing the KSM_RUN_MERGE test: unimportant, but so far as I could
see it should be unnecessary, and if it is necessary, then I think
we would need to check for another state too.

Hastily back to debugging,
Hugh

--- 3.7+git+petr/kernel/sched/fair.c    2012-12-16 16:35:08.724441527 -0800
+++ linux/kernel/sched/fair.c   2012-12-18 21:37:24.727964195 -0800
@@ -793,8 +793,11 @@ unsigned int sysctl_numa_balancing_scan_
 
 static void task_numa_placement(struct task_struct *p)
 {
-       int seq = ACCESS_ONCE(p->mm->numa_scan_seq);
+       int seq;
 
+       if (!p->mm)     /* for example, ksmd faulting in a user's mm */
+               return;
+       seq = ACCESS_ONCE(p->mm->numa_scan_seq);
        if (p->numa_scan_seq == seq)
                return;
        p->numa_scan_seq = seq;
--- 3.7+git+petr/mm/ksm.c       2012-12-18 12:15:04.972032321 -0800
+++ linux/mm/ksm.c      2012-12-19 09:21:12.004004777 -0800
@@ -1151,7 +1151,6 @@ struct rmap_item *unstable_tree_search_i
 
        nid = get_kpfn_nid(page_to_pfn(page));
        root = &root_unstable_tree[nid];
-
        new = &root->rb_node;
 
        while (*new) {
@@ -1174,22 +1173,16 @@ struct rmap_item *unstable_tree_search_i
                }
 
                /*
-                * When there isn't same page location, don't do anything.
-                * If tree_page was migrated previously, it will be flushed
-                * out and put into right unstable tree next time. If the
-                * page was migrated in the meantime, it will be ignored
-                * this round. When both pages were migrated to the same
-                * node, ignore them too.
+                * If tree_page has been migrated to another NUMA node, it
+                * will be flushed out and put into the right unstable tree
+                * next time: only merge with it if merge_across_nodes.
                 * Just notice, we don't have similar problem for PageKsm
-                * because their migration is disabled now. (62b61f611e) */
-
-#ifdef CONFIG_NUMA
-               if (page_to_nid(page) != page_to_nid(tree_page) ||
-                       tree_rmap_item->nid != page_to_nid(tree_page)) {
+                * because their migration is disabled now. (62b61f611e)
+                */
+               if (!ksm_merge_across_nodes && page_to_nid(tree_page) != nid) {
                        put_page(tree_page);
                        return NULL;
                }
-#endif
 
                ret = memcmp_pages(page, tree_page);
 
@@ -1209,7 +1202,7 @@ struct rmap_item *unstable_tree_search_i
        rmap_item->address |= UNSTABLE_FLAG;
        rmap_item->address |= (ksm_scan.seqnr & SEQNR_MASK);
 #ifdef CONFIG_NUMA
-       rmap_item->nid = page_to_nid(page);
+       rmap_item->nid = nid;
 #endif
        rb_link_node(&rmap_item->node, parent, new);
        rb_insert_color(&rmap_item->node, root);
@@ -1226,6 +1219,13 @@ struct rmap_item *unstable_tree_search_i
 static void stable_tree_append(struct rmap_item *rmap_item,
                               struct stable_node *stable_node)
 {
+#ifdef CONFIG_NUMA
+       /*
+        * Usually rmap_item->nid is already set correctly,
+        * but it may be wrong after switching merge_across_nodes.
+        */
+       rmap_item->nid = get_kpfn_nid(stable_node->kpfn);
+#endif
        rmap_item->head = stable_node;
        rmap_item->address |= STABLE_FLAG;
        hlist_add_head(&rmap_item->hlist, &stable_node->hlist);
@@ -1852,7 +1852,7 @@ static struct stable_node *ksm_check_sta
        struct rb_node *node;
        int nid;
 
-       for (nid = 0; nid < MAX_NUMNODES; nid++)
+       for (nid = 0; nid < nr_node_ids; nid++)
                for (node = rb_first(&root_stable_tree[nid]); node;
                                node = rb_next(node)) {
                        struct stable_node *stable_node;
@@ -2030,22 +2030,15 @@ static ssize_t merge_across_nodes_store(
                return -EINVAL;
 
        mutex_lock(&ksm_thread_mutex);
-       if (ksm_run & KSM_RUN_MERGE) {
-               err = -EBUSY;
-       } else {
-               if (ksm_merge_across_nodes != knob) {
-                       if (ksm_pages_shared > 0)
-                               err = -EBUSY;
-                       else
-                               ksm_merge_across_nodes = knob;
-               }
+       if (ksm_merge_across_nodes != knob) {
+               if (ksm_pages_shared)
+                       err = -EBUSY;
+               else
+                       ksm_merge_across_nodes = knob;
        }
-
-       if (err)
-               count = err;
        mutex_unlock(&ksm_thread_mutex);
 
-       return count;
+       return err ? err : count;
 }
 KSM_ATTR(merge_across_nodes);
 #endif
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to