The isolate path of non-lru movable pages:

isolate_migratepages_block
        isolate_movable_page
                trylock_page
                // if PageIsolated, goto out_no_isolated
                a_ops->isolate_page
                __SetPageIsolated
                unlock_page
        list_add(&page->lru, &cc->migratepages)

The migration path:

unmap_and_move
        __unmap_and_move
                lock_page
                move_to_new_page
                        a_ops->migratepage
                        __ClearPageIsolated
                unlock_page
        /* here, the page could be isolated again by another thread, and added 
into another cc->migratepages,
        since PG_Isolated has been cleared, and not protected by page_lock */
        list_del(&page->lru)

Suppose thread A isolates three pages in the order p1, p2, p3, A's 
cc->migratepages will be like
        head_A - p3 - p2 - p1
After p2 is migrated (but before list_del), it is isolated by another thread B. 
Then list_del will delete p2
from the cc->migratepages of B (instead of A). When A continues to migrate and 
delete p1, it will find:
        p1->prev == p2
        p2->next == LIST_POISON1. 

So we will end up with a bug like
"list_del corruption. prev->next should be ffffffbf0a1eb8e0, but was 
dead000000000100"
(see __list_del_entry_valid).


________________________________________
From: Michal Hocko [mho...@kernel.org]
Sent: Tuesday, September 03, 2019 21:17
To: sunqiuyang
Cc: linux-kernel@vger.kernel.org; linux...@kvack.org
Subject: Re: [PATCH 1/1] mm/migrate: fix list corruption in migration of 
non-LRU movable pages

On Tue 03-09-19 16:27:46, sunqiuyang wrote:
> From: Qiuyang Sun <sunqiuy...@huawei.com>
>
> Currently, after a page is migrated, it
> 1) has its PG_isolated flag cleared in move_to_new_page(), and
> 2) is deleted from its LRU list (cc->migratepages) in unmap_and_move().
> However, between steps 1) and 2), the page could be isolated by another
> thread in isolate_movable_page(), and added to another LRU list, leading
> to list_del corruption later.

Care to explain the race? Both paths use page_lock AFAICS
>
> This patch fixes the bug by moving list_del into the critical section
> protected by lock_page(), so that a page will not be isolated again before
> it has been deleted from its LRU list.
>
> Signed-off-by: Qiuyang Sun <sunqiuy...@huawei.com>
> ---
>  mm/migrate.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index a42858d..c58a606 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1124,6 +1124,8 @@ static int __unmap_and_move(struct page *page, struct 
> page *newpage,
>       /* Drop an anon_vma reference if we took one */
>       if (anon_vma)
>               put_anon_vma(anon_vma);
> +     if (rc != -EAGAIN)
> +             list_del(&page->lru);
>       unlock_page(page);
>  out:
>       /*
> @@ -1190,6 +1192,7 @@ static ICE_noinline int unmap_and_move(new_page_t 
> get_new_page,
>                       put_new_page(newpage, private);
>               else
>                       put_page(newpage);
> +             list_del(&page->lru);
>               goto out;
>       }
>
> @@ -1200,14 +1203,6 @@ static ICE_noinline int unmap_and_move(new_page_t 
> get_new_page,
>  out:
>       if (rc != -EAGAIN) {
>               /*
> -              * A page that has been migrated has all references
> -              * removed and will be freed. A page that has not been
> -              * migrated will have kepts its references and be
> -              * restored.
> -              */
> -             list_del(&page->lru);
> -
> -             /*
>                * Compaction can migrate also non-LRU pages which are
>                * not accounted to NR_ISOLATED_*. They can be recognized
>                * as __PageMovable
> --
> 1.8.3.1

--
Michal Hocko
SUSE Labs

Reply via email to