At Mon, 7 Feb 2005 21:24:59 +0000 (GMT),
Hugh Dickins wrote:
> 
> On Thu, 3 Feb 2005, IWAMOTO Toshihiro wrote:
> > The current implementation of memory hotremoval relies on pages
> > being unmappable from process address spaces.  After successful
> > unmapping, subsequent accesses to the pages are blocked and don't
> > interfere with the hotremoval operation.
> > 
> > However, this code
> > 
> >         if (PageSwapCache(page) &&
> >             page_count(page) != page_mapcount(page) + 2) {
> >                 ret = SWAP_FAIL;
> >                 goto out_unmap;
> >         }
> 
> Yes, that is odd code.  It would be nice to have a solution without it.
> 
> > in try_to_unmap_one() prevents unmapping pages that are referenced via
> > get_user_pages(), and such references can be held for a long time when
> > they come from something like direct IO.
> > I've made a test program that issues multiple direct IO read requests
> > against a single read buffer, and pages that belong to the buffer
> > cannot be hotremoved because they aren't unmapped.
> 
> I haven't looked at the rest of your hotremoval, so it's not obvious
> to me how a change here would help you - obviously you wouldn't want
> to be migrating pages while direct IO to them was in progress.
> 
> I presume your patch works for you by letting the page count fall
> to a point where migration moves it automatically as soon as the
> got_user_pages are put, where without your patch the count is held
> too high, and you keep doing scans which tend to miss the window
> in which those pages are put?

Yes.  And my test program has no such time window, because new IO
requests are issued without waiting for older ones to complete.
Issuing IO requests in such a manner is nonsense, but a misbehaving
process shouldn't be able to prevent memory hotremoval.

If the hotremoval code can unmap a page from a process address space,
the process will block when it next faults on that page.  So a process
cannot continuously issue direct IO requests to keep the page count
high.  (get_user_pages() triggers a (simulated) page fault.)


> >   - The can_share_swap_page() call in do_swap_page() always returns
> >     false.  It is inefficient but should be harmless.  Incrementing
> >     page_mapcount before calling that function should fix the problem,
> >     but it may cause bad side effects.
> 
> Odd that your patch moves it if it now doesn't even work!
> But I think some more movement should be able to solve that.

Moving swap_free() is necessary to keep can_share_swap_page() from
returning true when it shouldn't.  And write-access cases are handled
later by the do_wp_page() call anyway.

> >   - Another obvious solution to this issue is to find the "offending"
> >     process from a un-unmappable page and suspend it until the page is
> >     unmapped.  I'm afraid the implementation would be much more complicated.
> 
> Agreed, let's not get into that.

Nice to hear that.

> > --- mm/memory.c.orig        2005-01-17 14:47:11.000000000 +0900
> > +++ mm/memory.c     2005-01-17 14:55:51.000000000 +0900
> > @@ -1786,10 +1786,6 @@ static int do_swap_page(struct mm_struct
> >     }
> >  
> >     /* The page isn't present yet, go ahead with the fault. */
> > -           
> > -   swap_free(entry);
> > -   if (vm_swap_full())
> > -           remove_exclusive_swap_page(page);
> >  
> >     mm->rss++;
> >     acct_update_integrals();
> > @@ -1800,6 +1796,10 @@ static int do_swap_page(struct mm_struct
> >             pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> >             write_access = 0;
> >     }
> > +           
> > +   swap_free(entry);
> > +   if (vm_swap_full())
> > +           remove_exclusive_swap_page(page);
> >     unlock_page(page);
> >  
> >     flush_icache_page(vma, page);
> > --- mm/rmap.c.orig  2005-01-17 14:40:08.000000000 +0900
> > +++ mm/rmap.c       2005-01-21 12:34:06.000000000 +0900
> > @@ -569,8 +569,11 @@ static int try_to_unmap_one(struct page 
> >      */
> >     if (PageSwapCache(page) &&
> >         page_count(page) != page_mapcount(page) + 2) {
> > -           ret = SWAP_FAIL;
> > -           goto out_unmap;
> > +           if (page_mapcount(page) > 1) {  /* happens when COW is in progress */
> > +                   ret = SWAP_FAIL;
> > +                   goto out_unmap;
> > +           }
> > +   printk("unmapping GUPed page\n");
> >     }
> >  
> >     /* Nuke the page table entry. */
> 
> I'm disappointed to see you making this more complicated, it's
> even harder to understand than before.  I believe that if we're
> going to make good use of page_mapcount in can_share_swap_page,
> it should be possible to delete this block from try_to_unmap_one
> altogether.  Or did you try that, and find it goes wrong?

I just wanted to be conservative to get a working patch.
I think this block can be deleted.

> > --- mm/swapfile.c.orig      2005-01-17 14:47:11.000000000 +0900
> > +++ mm/swapfile.c   2005-01-31 16:59:11.000000000 +0900

> This can_share_swap_page looks messier than I'd want.
> 
> I was hoping to append my own patch to this response, but I haven't
> got it working right yet (swap seems too full).  I hope tomorrow,
> but thought I'd better not delay these comments any longer.

I saw your patch in the other mail; it looks better and should fix my
problem.  I'll try it and report back.

--
IWAMOTO Toshihiro
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
