> diff --git a/mm/rmap.c b/mm/rmap.c
> index 068522d..b99c742 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1389,9 +1389,19 @@ static int try_to_unmap_cluster(unsigned long
> cursor, unsigned int *mapcount,
>               BUG_ON(!page || PageAnon(page));
> 
>               if (locked_vma) {
> -                     mlock_vma_page(page);   /* no-op if already
> mlocked */
> -                     if (page == check_page)
> +                     if (page == check_page) {
> +                             /* we know we have check_page locked */
> +                             mlock_vma_page(page);
>                               ret = SWAP_MLOCK;
> +                     } else if (trylock_page(page)) {
> +                             /*
> +                              * If we can lock the page, perform mlock.
> +                              * Otherwise leave the page alone, it will be
> +                              * eventually encountered again later.
> +                              */
> +                             mlock_vma_page(page);
> +                             unlock_page(page);
> +                     }
>                       continue;       /* don't unmap */
>               }

I audited all related mm code. However I couldn't find any race that it can 
close.

First off,  current munlock code is crazy tricky.

munlock
        down_write(mmap_sem)
        do_mlock()
                mlock_fixup
                        munlock_vma_pages_range
                                __munlock_pagevec
                                        spin_lock_irq(zone->lru_lock)
                                        TestClearPageMlocked(page)
                                        del_page_from_lru_list
                                        spin_unlock_irq(zone->lru_lock)
                                        lock_page
                                        __munlock_isolated_page
                                        unlock_page
                                
        up_write(mmap_sem)

Look, TestClearPageMlocked(page) is not protected by lock_page. But this is 
still
safe because Page_mocked mean one or more vma marked VM_LOCKED then we
only need care about turning down final VM_LOCKED. I.e. mmap_sem protect them.

And,

                                        spin_lock_irq(zone->lru_lock)
                                        del_page_from_lru_list
                                        spin_unlock_irq(zone->lru_lock)

This idiom ensures I or someone else isolate the page from lru and isolated 
pages
will be put back by putback_lru_page in anycase. So, the pages will move the 
right
lru eventually.

And then, taking page-lock doesn't help to close vs munlock race.

On the other hands, page migration has the following call stack. 

some-caller [isolate_lru_page]
        unmap_and_move
                __unmap_and_move
                trylock_page
                try_to_unmap
                move_to_new_page
                        migrate_page
                                migrate_page_copy
                unlock_page

The critical part (i.e. migrate_page_copy) is protected by both page isolation 
and page lock.
Page fault path take lock page and doesn't use page isolation. This is correct.
try_to_unmap_cluster doesn't take page lock, but it ensure the page is 
isolated. This is correct too.

Plus, do_wp_page() has the following comment. But it is wrong. This lock is 
necessary to protect against
page migration, but not lru manipulation.

                if ((ret & VM_FAULT_WRITE) && (vma->vm_flags & VM_LOCKED)) {
                        lock_page(old_page);    /* LRU manipulation */
                        munlock_vma_page(old_page);
                        unlock_page(old_page);
                }


But, you know, this is crazy ugly lock design. I'm ok to change the rule to 
that PG_mlocked must be protected
page lock. If so, I propose to add PageMlocked() like this

                        } else if (!PageMlocked() && trylock_page(page)) {
                                /*
                                 * If we can lock the page, perform mlock.
                                 * Otherwise leave the page alone, it will be
                                 * eventually encountered again later.
                                 */
                                mlock_vma_page(page);
                                unlock_page(page);

This is safe. Even if race is happen and vmscan failed to mark PG_mlocked, 
that's no problem. Next vmscan may
do the right job and will fix the mistake.

Anyway,  I'm also sure this is not recent issue. It lived 5 years. It is no 
reason to rush.

Reply via email to