On Fri, 2021-12-10 at 11:08 +0800, Fengnan Chang wrote:
> Hi Chao:
>
> As I mentioned before,
> https://lore.kernel.org/linux-f2fs-devel/kl1pr0601mb400309c5d62bfddde6aad8aebb...@kl1pr0601mb4003.apcprd06.prod.outlook.com/T/#mbe9a8f27626ac7ca71035e25f5502e756ab877ac
>
> there is a potential deadlock problem if we just remove the
> compressed-file condition in __should_serialize_io().
Hi,
Regardless of whether the condition in __should_serialize_io() is
removed, the deadlock below can occur while Thread 1 is doing
checkpoint, as you mentioned.
We found that the deadlock can occur not only between two
write_data_pages threads, but also between write_data_pages and
write_begin.
I will send my patch in a new mail thread.
> The modification is like this:
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 9f754aaef558..ffbee94924f3 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -3218,8 +3224,6 @@ static inline bool __should_serialize_io(struct inode *inode,
>  	if (IS_NOQUOTA(inode))
>  		return false;
>  
> -	if (f2fs_need_compress_data(inode))
> -		return true;
>  	if (wbc->sync_mode != WB_SYNC_ALL)
>  		return true;
>
> This modification is a prerequisite; without it, there is no
> problem for now.
>
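For anyone following along: when __should_serialize_io() returns true,
__f2fs_write_data_pages() takes sbi->writepages around the whole
writeback pass, so two threads cannot be inside
f2fs_write_cache_pages() for the same file at the same time. A
condensed sketch from my reading of fs/f2fs/data.c (details vary
across kernel versions):

static int __f2fs_write_data_pages(struct address_space *mapping,
				   struct writeback_control *wbc,
				   enum iostat_type io_type)
{
	struct inode *inode = mapping->host;
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
	bool locked = false;
	int ret;

	/* serialize writers that could contend on the same cluster */
	if (__should_serialize_io(inode, wbc)) {
		mutex_lock(&sbi->writepages);
		locked = true;
	}

	ret = f2fs_write_cache_pages(mapping, wbc, io_type);

	if (locked)
		mutex_unlock(&sbi->writepages);

	return ret;
}

That is why removing the f2fs_need_compress_data() check is the
prerequisite for the race below: without the serialization it forces,
two writers can interleave on one cluster.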
> I did some debugging, and here is what I found:
> While Thread 1 is doing checkpoint, Thread 2 and Thread 3 run as below:
>
> Thread 2                                  Thread 3
> f2fs_write_cache_pages
> ->lock_page(page) page 0-3                f2fs_write_cache_pages page 0
> ->f2fs_write_compressed_pages             ->lock_page(page) //lock page 0, sleep
>   ->f2fs_trylock_op failed
> ->f2fs_write_raw_pages
>   ->f2fs_write_single_data_page
>     ->f2fs_trylock_op failed
>     ->unlock(page) page 0
> ->cond_resched();                         ->lock_page(page) //lock page 0 success
>                                           ->lock_page(page) //try page 1, never success
> ->lock_page(page) page 0 //never success
>
> When Thread 1 does checkpoint and takes down_write of cp_rwsem,
> Thread 2 and Thread 3 are writing the same cluster. Thread 2 starts
> writing cache pages first and locks pages 0-3; because
> f2fs_trylock_op() fails, f2fs_write_compressed_pages() and
> f2fs_write_single_data_page() fail, so Thread 2 unlocks page 0 but
> keeps pages 1-3 locked and schedules out. If Thread 3 starts writing
> cache pages at this time, it can take the lock of page 0, but when
> it tries to lock page 1, it will never succeed. Then a deadlock
> occurs between Thread 2 and Thread 3.
>
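To make the window concrete, the retry path in f2fs_write_raw_pages()
looks roughly like this (paraphrased from the version I am reading;
the f2fs_write_single_data_page() argument list differs across
kernels):

/* inside f2fs_write_raw_pages(): pages 0-3 of the cluster are locked */
for (i = 0; i < cc->cluster_size; i++) {
retry_write:
	ret = f2fs_write_single_data_page(cc->rpages[i], &submitted,
					  NULL, NULL, wbc, io_type,
					  compr_blocks, false);
	if (ret == -EAGAIN) {
		/*
		 * f2fs_trylock_op() failed because checkpoint holds
		 * cp_rwsem; cc->rpages[i] was already unlocked inside
		 * f2fs_write_single_data_page().
		 */
		ret = 0;
		cond_resched();
		/*
		 * Window: another writer can lock page 0 here while we
		 * still hold pages 1-3, then block on page 1.
		 */
		lock_page(cc->rpages[i]);
		clear_page_dirty_for_io(cc->rpages[i]);
		goto retry_write;
	}
}

So the inversion is: Thread 2 holds pages 1-3 and waits for page 0,
while Thread 3 holds page 0 and waits for page 1.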
> So there is a potential limitation: we cannot write the same cluster
> of a compressed file from multiple threads.
>
> If we need to fix this problem, my thought is that we shouldn't
> unlock the page in f2fs_write_single_data_page() for a compressed
> raw page.
>
> Do we need to fix this for now?
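If I understand your idea correctly, a hypothetical (untested) shape
for it would be to skip the unlock on the redirty path when we are
writing the raw pages of a compressed cluster, so the retry loop in
f2fs_write_raw_pages() never drops the page lock. "keep_locked" below
is a made-up flag for illustration, not anything in mainline:

/* hypothetical change at the tail of f2fs_write_single_data_page() */
redirty_out:
	redirty_page_for_writepage(wbc, page);
	/*
	 * keep_locked (made up for illustration) would be set by
	 * f2fs_write_raw_pages() for compressed raw pages; the caller
	 * then keeps the page lock across the -EAGAIN retry, so the
	 * lock-order inversion between two writers cannot form.
	 */
	if (!keep_locked)
		unlock_page(page);
	return err;

The matching lock_page() in the caller's retry loop would have to be
skipped as well.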
_______________________________________________
Linux-f2fs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel