Miaohe Lin <linmia...@huawei.com> writes: > On 2021/4/10 1:17, Tim Chen wrote: >> >> >> On 4/9/21 1:42 AM, Miaohe Lin wrote: >>> On 2021/4/9 5:34, Tim Chen wrote: >>>> >>>> >>>> On 4/8/21 6:08 AM, Miaohe Lin wrote: >>>>> When I was investigating the swap code, I found the below possible race >>>>> window: >>>>> >>>>> CPU 1 CPU 2 >>>>> ----- ----- >>>>> do_swap_page >>>>> synchronous swap_readpage >>>>> alloc_page_vma >>>>> swapoff >>>>> release swap_file, bdev, or ... >>>> >>> >>> Many thanks for quick review and reply! >>> >>>> Perhaps I'm missing something. The release of swap_file, bdev etc >>>> happens after we have cleared the SWP_VALID bit in si->flags in >>>> destroy_swap_extents >>>> if I read the swapoff code correctly. >>> Agree. Let's look this more close: >>> CPU1 CPU2 >>> ----- ----- >>> swap_readpage >>> if (data_race(sis->flags & SWP_FS_OPS)) { >>> swapoff >>> p->swap_file >>> = NULL; >>> struct file *swap_file = sis->swap_file; >>> struct address_space *mapping = swap_file->f_mapping;[oops!] >>> ... >>> p->flags = 0; >>> ... >>> >>> Does this make sense for you? >> >> p->swapfile = NULL happens after the >> p->flags &= ~SWP_VALID, synchronize_rcu(), destroy_swap_extents() sequence >> in swapoff(). >> >> So I don't think the sequence you illustrated on CPU2 is in the right order. >> That said, without get_swap_device/put_swap_device in swap_readpage, you >> could >> potentially blow pass synchronize_rcu() on CPU2 and causes a problem. so I >> think >> the problematic race looks something like the following: >> >> >> CPU1 CPU2 >> ----- ----- >> swap_readpage >> if (data_race(sis->flags & SWP_FS_OPS)) { >> swapoff >> p->flags = &= >> ~SWP_VALID; >> .. >> >> synchronize_rcu(); >> .. >> p->swap_file >> = NULL; >> struct file *swap_file = sis->swap_file; >> struct address_space *mapping = swap_file->f_mapping;[oops!] >> ... >> ... >> > > Agree. This is also what I meant to illustrate. And you provide a better one. > Many thanks!
For the pages that are swapped in through swap cache. That isn't an issue. Because the page is locked, the swap entry will be marked with SWAP_HAS_CACHE, so swapoff() cannot proceed until the page has been unlocked. So the race is for the fast path as follows, if (data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1) I found it in your original patch description. But please make it more explicit to reduce the potential confusing. Best Regards, Huang, Ying