On Fri, 2021-12-10 at 11:08 +0800, Fengnan Chang wrote:
> Hi Chao:
> 
> As I mentioned before,
> https://lore.kernel.org/linux-f2fs-devel/kl1pr0601mb400309c5d62bfddde6aad8aebb...@kl1pr0601mb4003.apcprd06.prod.outlook.com/T/#mbe9a8f27626ac7ca71035e25f5502e756ab877ac
> 
> there is a potential deadlock problem if we just remove the
> compressed-file condition in __should_serialize_io().
Hi,

Regardless of whether the condition in __should_serialize_io() is
removed, the deadlock below can occur while Thread 1 is doing
checkpoint, as you mentioned.

We found the deadlock can occur not only between two write_data_pages
threads, but also between write_data_pages and write_begin.

I will send my patch in a new mail thread.
> The modification is like this:
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 9f754aaef558..ffbee94924f3 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> 
> @@ -3218,8 +3224,6 @@ static inline bool __should_serialize_io(struct inode *inode,
>         if (IS_NOQUOTA(inode))
>                 return false;
> 
> -       if (f2fs_need_compress_data(inode))
> -               return true;
>         if (wbc->sync_mode != WB_SYNC_ALL)
>                 return true;
> 
> This modification is a prerequisite: without it, there is no problem
> for now.
> 
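For reference, here is roughly what __should_serialize_io() looks like
with that hunk applied. This is reconstructed from the diff plus my
reading of fs/f2fs/data.c, so the surrounding checks may differ between
kernel versions:

static inline bool __should_serialize_io(struct inode *inode,
					struct writeback_control *wbc)
{
	/* to avoid deadlock in the data flush path */
	if (F2FS_I(inode)->cp_task)
		return false;

	if (!S_ISREG(inode->i_mode))
		return false;
	if (IS_NOQUOTA(inode))
		return false;

	/* the f2fs_need_compress_data() check is gone after the change */
	if (wbc->sync_mode != WB_SYNC_ALL)
		return true;
	if (get_dirty_pages(inode) >= SM_I(F2FS_I_SB(inode))->min_seq_blocks)
		return true;
	return false;
}
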
> And I did some debugging; here is what I found:
> Thread 1 is doing checkpoint; Thread 2 and Thread 3 proceed as below:
> 
> Thread 2                                       Thread 3
> f2fs_write_cache_pages page 0
>   ->lock_page(page) page 0-3                   f2fs_write_cache_pages page 0
>   ->f2fs_write_compressed_pages                  ->lock_page(page) //lock page 0, sleep
>     ->f2fs_trylock_op failed
>     ->f2fs_write_raw_pages
>       ->f2fs_write_single_data_page
>         ->f2fs_trylock_op failed
>       ->unlock(page)  page 0
>       ->cond_resched();                          ->lock_page(page) //lock page 0 success
>                                                  ->lock_page(page) //try page 1, never success
>       ->lock_page(page); page 0 //never success
> 
> When Thread 1 does checkpoint and takes down_write on cp_rwsem, Thread
> 2 and Thread 3 write the same cluster. Thread 2 starts writing cache
> pages first and locks pages 0-3. Because f2fs_trylock_op fails,
> f2fs_write_compressed_pages and f2fs_write_single_data_page fail, so
> Thread 2 unlocks page 0 but keeps pages 1-3 locked and schedules out.
> If Thread 3 starts writing cache pages at this time, it can take the
> lock on page 0, but when it tries to lock page 1 it will never succeed.
> Then a deadlock occurs between Thread 2 and Thread 3.
> 
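Stripped of the f2fs details, this is a plain circular wait. Here is a
minimal userspace model of it (my own illustration; the names below are
not f2fs APIs), with four mutexes standing in for the cluster's page
locks. Whether it actually deadlocks depends on hitting the
interleaving above:

/* page_lock[0..3] model the page locks of one compressed cluster */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static pthread_mutex_t page_lock[4] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

static void *thread2(void *arg)
{
	int i;

	(void)arg;
	/* like f2fs_write_cache_pages: lock the whole cluster, pages 0-3 */
	for (i = 0; i < 4; i++)
		pthread_mutex_lock(&page_lock[i]);

	/* f2fs_trylock_op failed: unlock page 0 only, keep pages 1-3 */
	pthread_mutex_unlock(&page_lock[0]);
	sched_yield();			/* cond_resched() */

	/* retry page 0: blocks forever once thread 3 has taken it */
	pthread_mutex_lock(&page_lock[0]);
	return NULL;
}

static void *thread3(void *arg)
{
	(void)arg;
	/* lock page 0 (succeeds after thread 2 dropped it)... */
	pthread_mutex_lock(&page_lock[0]);
	/* ...then page 1, still held by thread 2: circular wait */
	pthread_mutex_lock(&page_lock[1]);
	return NULL;
}

int main(void)
{
	pthread_t t2, t3;

	pthread_create(&t2, NULL, thread2, NULL);
	pthread_create(&t3, NULL, thread3, NULL);
	/* under the bad interleaving, neither join ever returns */
	pthread_join(t2, NULL);
	pthread_join(t3, NULL);
	puts("lucky scheduling, no deadlock this run");
	return 0;
}
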
> So there is a potential limit: we can't write the same cluster from
> multiple threads for a compressed file.
> 
> If we need to fix this problem, my thought is that we shouldn't unlock
> the page in f2fs_write_single_data_page for a raw page of a compressed
> cluster.
> 
> Do we need to fix this for now?
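
If it helps the discussion, that direction could look something like
the untested sketch below, keying off the compr_blocks argument of
f2fs_write_single_data_page() on its redirty path. This is only a shape
for discussion, not a real patch; the actual function's error paths
differ in detail:

redirty_out:
	redirty_page_for_writepage(wbc, page);
	/*
	 * compr_blocks != 0 means we came from the compressed write
	 * path: leave the page locked so the caller
	 * (f2fs_write_raw_pages) still owns the whole page 0-3 lock set
	 * and can unlock the cluster as a whole after retrying.
	 */
	if (!compr_blocks)
		unlock_page(page);
	return err;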

