Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On Tuesday, 26 August 2014, 15:20:21, Martin Steigerwald wrote:
On Tuesday, 26 August 2014, 09:02:23, Chris Mason wrote:
On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
On 08/15/2014 11:36 AM, Liu Bo wrote:

This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the weekend. It's in my for-linus branch now, along with everything else I plan on sending for rc3. Please double check my merge; I had to undo your rebase onto Miao's patches.

I would like to test this on 3.17-rc2. What do I need to do to make it apply cleanly? That function in disk-io.c looks quite different from what the patch assumes at the beginning, so I am not sure how to merge this.

You can just pull my for-linus branch; Liu Bo's fix is on top.

Thanks, I put the commit on top of your for-linus branch on top of 3.17-rc2. Let's see how it goes.

So far this appears to work well. I have experienced no hangs.

Tested-by: Martin Steigerwald mar...@lichtvoll.de

But I bet Linus took it already anyway :). Looking forward to rc3. Thanks for digging into and fixing this long-standing, annoying bug.

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On Sun, Aug 31, 2014 at 01:48:36PM +0200, Martin Steigerwald wrote:
On Tuesday, 26 August 2014, 15:20:21, Martin Steigerwald wrote:
On Tuesday, 26 August 2014, 09:02:23, Chris Mason wrote:
On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
On 08/15/2014 11:36 AM, Liu Bo wrote:

This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the weekend. It's in my for-linus branch now, along with everything else I plan on sending for rc3. Please double check my merge; I had to undo your rebase onto Miao's patches.

I would like to test this on 3.17-rc2. What do I need to do to make it apply cleanly? That function in disk-io.c looks quite different from what the patch assumes at the beginning, so I am not sure how to merge this.

You can just pull my for-linus branch; Liu Bo's fix is on top.

Thanks, I put the commit on top of your for-linus branch on top of 3.17-rc2. Let's see how it goes.

So far this appears to work well. I have experienced no hangs.

Tested-by: Martin Steigerwald mar...@lichtvoll.de

But I bet Linus took it already anyway :). Looking forward to rc3. Thanks for digging into and fixing this long-standing, annoying bug.

Thanks for your feedback!

thanks,
-liubo
Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
On 08/15/2014 11:36 AM, Liu Bo wrote:

This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the weekend. It's in my for-linus branch now, along with everything else I plan on sending for rc3. Please double check my merge; I had to undo your rebase onto Miao's patches.

I would like to test this on 3.17-rc2. What do I need to do to make it apply cleanly? That function in disk-io.c looks quite different from what the patch assumes at the beginning, so I am not sure how to merge this.

martin@merkaba:~/Computer/Merkaba/Kernel/linux> patch -p1 < "../[PATCH v3] Btrfs_fix task hang under heavy compressed write.mbox"
patching file fs/btrfs/async-thread.c
patching file fs/btrfs/async-thread.h
patching file fs/btrfs/delayed-inode.c
patching file fs/btrfs/disk-io.c
Hunk #2 FAILED at 713.
Hunk #3 succeeded at 827 (offset -29 lines).
1 out of 3 hunks FAILED -- saving rejects to file fs/btrfs/disk-io.c.rej
patching file fs/btrfs/extent-tree.c
patching file fs/btrfs/inode.c
Hunk #1 succeeded at 1096 (offset -15 lines).
Hunk #2 succeeded at 1883 (offset -43 lines).
Hunk #3 succeeded at 2825 (offset -47 lines).
Hunk #4 succeeded at 2835 (offset -47 lines).
Hunk #5 succeeded at 7166 (offset -345 lines).
Hunk #6 succeeded at 8504 (offset -371 lines).
patching file fs/btrfs/ordered-data.c
patching file fs/btrfs/qgroup.c
Hunk #1 succeeded at 2720 (offset -2 lines).
patching file fs/btrfs/raid56.c
patching file fs/btrfs/reada.c
patching file fs/btrfs/scrub.c
Hunk #1 succeeded at 428 (offset 1 line).
Hunk #2 succeeded at 999 (offset 2 lines).
Hunk #3 succeeded at 1616 (offset -8 lines).
Hunk #4 succeeded at 3204 (offset -29 lines).
patching file fs/btrfs/volumes.c
Hunk #1 succeeded at 5800 (offset -87 lines).

Otherwise, I'd wait till rc3, as my current 3.16.2 with v1 of this patch seems to behave stably with trees occupying all space:

merkaba:~> btrfs fi sh /home
Label: 'home'  uuid: […]
        Total devices 2 FS bytes used 127.69GiB
        devid 1 size 160.00GiB used 160.00GiB path /dev/dm-0
        devid 2 size 160.00GiB used 160.00GiB path /dev/mapper/sata-home

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On Tue, Aug 26, 2014 at 12:20:28PM +0200, Martin Steigerwald wrote:
On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
On 08/15/2014 11:36 AM, Liu Bo wrote:

This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the weekend. It's in my for-linus branch now, along with everything else I plan on sending for rc3. Please double check my merge; I had to undo your rebase onto Miao's patches.

I would like to test this on 3.17-rc2. What do I need to do to make it apply cleanly? That function in disk-io.c looks quite different from what the patch assumes at the beginning, so I am not sure how to merge this.

martin@merkaba:~/Computer/Merkaba/Kernel/linux> patch -p1 < "../[PATCH v3] Btrfs_fix task hang under heavy compressed write.mbox"
patching file fs/btrfs/async-thread.c
patching file fs/btrfs/async-thread.h
patching file fs/btrfs/delayed-inode.c
patching file fs/btrfs/disk-io.c
Hunk #2 FAILED at 713.
Hunk #3 succeeded at 827 (offset -29 lines).
1 out of 3 hunks FAILED -- saving rejects to file fs/btrfs/disk-io.c.rej
patching file fs/btrfs/extent-tree.c
patching file fs/btrfs/inode.c
Hunk #1 succeeded at 1096 (offset -15 lines).
Hunk #2 succeeded at 1883 (offset -43 lines).
Hunk #3 succeeded at 2825 (offset -47 lines).
Hunk #4 succeeded at 2835 (offset -47 lines).
Hunk #5 succeeded at 7166 (offset -345 lines).
Hunk #6 succeeded at 8504 (offset -371 lines).
patching file fs/btrfs/ordered-data.c
patching file fs/btrfs/qgroup.c
Hunk #1 succeeded at 2720 (offset -2 lines).
patching file fs/btrfs/raid56.c
patching file fs/btrfs/reada.c
patching file fs/btrfs/scrub.c
Hunk #1 succeeded at 428 (offset 1 line).
Hunk #2 succeeded at 999 (offset 2 lines).
Hunk #3 succeeded at 1616 (offset -8 lines).
Hunk #4 succeeded at 3204 (offset -29 lines).
patching file fs/btrfs/volumes.c
Hunk #1 succeeded at 5800 (offset -87 lines).

Otherwise, I'd wait till rc3, as my current 3.16.2 with v1 of this patch seems to behave stably with trees occupying all space:

merkaba:~> btrfs fi sh /home
Label: 'home'  uuid: […]
        Total devices 2 FS bytes used 127.69GiB
        devid 1 size 160.00GiB used 160.00GiB path /dev/dm-0
        devid 2 size 160.00GiB used 160.00GiB path /dev/mapper/sata-home

Hmm, the v3 patch is based on Chris's integration branch, so it's expected not to apply cleanly. But if it's urgent for you, I can provide a patch based on 3.17-rc2. (rc3 is coming soon though ;-) )

thanks,
-liubo

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On Tuesday, 26 August 2014, 18:38:03, Liu Bo wrote:
On Tue, Aug 26, 2014 at 12:20:28PM +0200, Martin Steigerwald wrote:
On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
On 08/15/2014 11:36 AM, Liu Bo wrote:

This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the weekend. It's in my for-linus branch now, along with everything else I plan on sending for rc3. Please double check my merge; I had to undo your rebase onto Miao's patches.

I would like to test this on 3.17-rc2. What do I need to do to make it apply cleanly? That function in disk-io.c looks quite different from what the patch assumes at the beginning, so I am not sure how to merge this.

martin@merkaba:~/Computer/Merkaba/Kernel/linux> patch -p1 < "../[PATCH v3] Btrfs_fix task hang under heavy compressed write.mbox"
patching file fs/btrfs/async-thread.c
patching file fs/btrfs/async-thread.h
patching file fs/btrfs/delayed-inode.c
patching file fs/btrfs/disk-io.c
Hunk #2 FAILED at 713.
Hunk #3 succeeded at 827 (offset -29 lines).
1 out of 3 hunks FAILED -- saving rejects to file fs/btrfs/disk-io.c.rej
patching file fs/btrfs/extent-tree.c
patching file fs/btrfs/inode.c
Hunk #1 succeeded at 1096 (offset -15 lines).
Hunk #2 succeeded at 1883 (offset -43 lines).
Hunk #3 succeeded at 2825 (offset -47 lines).
Hunk #4 succeeded at 2835 (offset -47 lines).
Hunk #5 succeeded at 7166 (offset -345 lines).
Hunk #6 succeeded at 8504 (offset -371 lines).
patching file fs/btrfs/ordered-data.c
patching file fs/btrfs/qgroup.c
Hunk #1 succeeded at 2720 (offset -2 lines).
patching file fs/btrfs/raid56.c
patching file fs/btrfs/reada.c
patching file fs/btrfs/scrub.c
Hunk #1 succeeded at 428 (offset 1 line).
Hunk #2 succeeded at 999 (offset 2 lines).
Hunk #3 succeeded at 1616 (offset -8 lines).
Hunk #4 succeeded at 3204 (offset -29 lines).
patching file fs/btrfs/volumes.c
Hunk #1 succeeded at 5800 (offset -87 lines).

Otherwise, I'd wait till rc3, as my current 3.16.2 with v1 of this patch seems to behave stably with trees occupying all space:

merkaba:~> btrfs fi sh /home
Label: 'home'  uuid: […]
        Total devices 2 FS bytes used 127.69GiB
        devid 1 size 160.00GiB used 160.00GiB path /dev/dm-0
        devid 2 size 160.00GiB used 160.00GiB path /dev/mapper/sata-home

Hmm, the v3 patch is based on Chris's integration branch, so it's expected not to apply cleanly. But if it's urgent for you, I can provide a patch based on 3.17-rc2. (rc3 is coming soon though ;-) )

I see. Well, I can stick with 3.16.1 for the time being. But if you want some additional testing soonish, feel free to provide a patch.

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
On 08/15/2014 11:36 AM, Liu Bo wrote:

This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the weekend. It's in my for-linus branch now, along with everything else I plan on sending for rc3. Please double check my merge; I had to undo your rebase onto Miao's patches.

I would like to test this on 3.17-rc2. What do I need to do to make it apply cleanly? That function in disk-io.c looks quite different from what the patch assumes at the beginning, so I am not sure how to merge this.

You can just pull my for-linus branch; Liu Bo's fix is on top.

-chris
Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On Tuesday, 26 August 2014, 09:02:23, Chris Mason wrote:
On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
On 08/15/2014 11:36 AM, Liu Bo wrote:

This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the weekend. It's in my for-linus branch now, along with everything else I plan on sending for rc3. Please double check my merge; I had to undo your rebase onto Miao's patches.

I would like to test this on 3.17-rc2. What do I need to do to make it apply cleanly? That function in disk-io.c looks quite different from what the patch assumes at the beginning, so I am not sure how to merge this.

You can just pull my for-linus branch; Liu Bo's fix is on top.

Thanks, I put the commit on top of your for-linus branch on top of 3.17-rc2. Let's see how it goes.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On 08/15/2014 11:36 AM, Liu Bo wrote:

This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the weekend. It's in my for-linus branch now, along with everything else I plan on sending for rc3. Please double check my merge; I had to undo your rebase onto Miao's patches.

-chris
Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On Mon, Aug 25, 2014 at 10:58:13AM -0400, Chris Mason wrote:
On 08/15/2014 11:36 AM, Liu Bo wrote:

This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the weekend. It's in my for-linus branch now, along with everything else I plan on sending for rc3. Please double check my merge; I had to undo your rebase onto Miao's patches.

Just checked, looks good ;)

thanks,
-liubo
Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On Sat, Aug 16, 2014 at 03:28:11PM +0800, Miao Xie wrote:
On Fri, 15 Aug 2014 23:36:53 +0800, Liu Bo wrote:

This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

Btrfs has now migrated to the kernel workqueue, but the migration introduced this hang problem.

Btrfs has a kind of work that is queued in an ordered way, which means that its ordered_func() must be processed in FIFO order, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work->func() <-- (we name it work X)
    for ordered_work in wq->ordered_list
        ordered_work->ordered_func()
        ordered_work->ordered_free()

The hang is a rare case. First, when we find free space, we get an uncached block group, and then we go to read its free space cache inode for free space information, so it files a readahead request:

btrfs_readpages()
    for page that is not in page cache
        __do_readpage()
            submit_extent_page()
                btrfs_submit_bio_hook()
                    btrfs_bio_wq_end_io()
                    submit_bio()

end_workqueue_bio() <-- (ret by the 1st endio)
    queue a work (named work Y) for the 2nd, also the real, endio()

So the hang occurs when work Y's work_struct and work X's work_struct happen to share the same address.

A bit more explanation:

A, B, C -- struct btrfs_work
arg     -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
        worker->current_work = arg;  <-- arg is A->normal_work
        worker->current_func(arg)
            normal_work_helper(arg)
                A = container_of(arg, struct btrfs_work, normal_work);

                A->func()
                A->ordered_func()
                A->ordered_free()  <-- A gets freed

                B->ordered_func()
                    submit_compressed_extents()
                        find_free_extent()
                            load_free_space_inode()
                                ...  <-- (the above readahead stack)
                                end_workqueue_bio()
                                    btrfs_queue_work(work C)
                B->ordered_free()

As work A has a high priority in wq->ordered_list and there are more ordered works queued after it, such as B->ordered_func(), its memory could have been freed before normal_work_helper() returns, which means that the kernel workqueue code in worker_thread() still has worker->current_work pointing at work A->normal_work, i.e. arg's address.

Meanwhile, work C is allocated after work A is freed, so work C->normal_work and work A->normal_work are likely to share the same address (I confirmed this with ftrace output, so I'm not just guessing; it's rare though).

When another kthread picks up work C->normal_work to process, and finds that our kthread is processing it (see find_worker_executing_work()), it will treat work C as a collision and skip it, which ends up with nobody processing work C. So the situation is that our kthread is waiting forever on work C.

Besides, there are other cases that can lead to deadlock, but the real problem is that all btrfs workqueues share one work->func -- normal_work_helper. So this patch makes each workqueue have its own helper function, which is only a wrapper of normal_work_helper.

With this patch, I no longer hit the above hang.

Signed-off-by: Liu Bo bo.li@oracle.com
---
v3:
  - Adopting Chris's advice, this has each workqueue provide its own helper function.
  - This is rebased on the latest of Chris's for-linus branch.
v2:
  - This changes a bit to not defer all ->ordered_free() calls, but only defer the one of the work that triggers this run_ordered_work(). Actually we don't need to defer the other ->ordered_free() calls because their work cannot be this kthread worker's @current_work. We benefit from this since it pins less memory during the period.

 fs/btrfs/async-thread.c  | 44 +++--
 fs/btrfs/async-thread.h  | 28 -
 fs/btrfs/delayed-inode.c |  4 +--
 fs/btrfs/disk-io.c       | 63 ++--
 fs/btrfs/extent-tree.c   |  7 +++---
 fs/btrfs/inode.c         | 35 ++-
 fs/btrfs/ordered-data.c  |  1 +
 fs/btrfs/qgroup.c        |  1 +
 fs/btrfs/raid56.c        |  9 ---
 fs/btrfs/reada.c         |  3 ++-
 fs/btrfs/scrub.c         | 14 ++-
 fs/btrfs/volumes.c       |  3 ++-
 12 files changed, 146 insertions(+), 66 deletions(-)

diff --git a/fs/btrfs/async-thread.c
Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On Fri, 15 Aug 2014 23:36:53 +0800, Liu Bo wrote:

This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

Btrfs has now migrated to the kernel workqueue, but the migration introduced this hang problem.

Btrfs has a kind of work that is queued in an ordered way, which means that its ordered_func() must be processed in FIFO order, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work->func() <-- (we name it work X)
    for ordered_work in wq->ordered_list
        ordered_work->ordered_func()
        ordered_work->ordered_free()

The hang is a rare case. First, when we find free space, we get an uncached block group, and then we go to read its free space cache inode for free space information, so it files a readahead request:

btrfs_readpages()
    for page that is not in page cache
        __do_readpage()
            submit_extent_page()
                btrfs_submit_bio_hook()
                    btrfs_bio_wq_end_io()
                    submit_bio()

end_workqueue_bio() <-- (ret by the 1st endio)
    queue a work (named work Y) for the 2nd, also the real, endio()

So the hang occurs when work Y's work_struct and work X's work_struct happen to share the same address.

A bit more explanation:

A, B, C -- struct btrfs_work
arg     -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
        worker->current_work = arg;  <-- arg is A->normal_work
        worker->current_func(arg)
            normal_work_helper(arg)
                A = container_of(arg, struct btrfs_work, normal_work);

                A->func()
                A->ordered_func()
                A->ordered_free()  <-- A gets freed

                B->ordered_func()
                    submit_compressed_extents()
                        find_free_extent()
                            load_free_space_inode()
                                ...  <-- (the above readahead stack)
                                end_workqueue_bio()
                                    btrfs_queue_work(work C)
                B->ordered_free()

As work A has a high priority in wq->ordered_list and there are more ordered works queued after it, such as B->ordered_func(), its memory could have been freed before normal_work_helper() returns, which means that the kernel workqueue code in worker_thread() still has worker->current_work pointing at work A->normal_work, i.e. arg's address.

Meanwhile, work C is allocated after work A is freed, so work C->normal_work and work A->normal_work are likely to share the same address (I confirmed this with ftrace output, so I'm not just guessing; it's rare though).

When another kthread picks up work C->normal_work to process, and finds that our kthread is processing it (see find_worker_executing_work()), it will treat work C as a collision and skip it, which ends up with nobody processing work C. So the situation is that our kthread is waiting forever on work C.

Besides, there are other cases that can lead to deadlock, but the real problem is that all btrfs workqueues share one work->func -- normal_work_helper. So this patch makes each workqueue have its own helper function, which is only a wrapper of normal_work_helper.

With this patch, I no longer hit the above hang.

Signed-off-by: Liu Bo bo.li@oracle.com
---
v3:
  - Adopting Chris's advice, this has each workqueue provide its own helper function.
  - This is rebased on the latest of Chris's for-linus branch.
v2:
  - This changes a bit to not defer all ->ordered_free() calls, but only defer the one of the work that triggers this run_ordered_work(). Actually we don't need to defer the other ->ordered_free() calls because their work cannot be this kthread worker's @current_work. We benefit from this since it pins less memory during the period.

 fs/btrfs/async-thread.c  | 44 +++--
 fs/btrfs/async-thread.h  | 28 -
 fs/btrfs/delayed-inode.c |  4 +--
 fs/btrfs/disk-io.c       | 63 ++--
 fs/btrfs/extent-tree.c   |  7 +++---
 fs/btrfs/inode.c         | 35 ++-
 fs/btrfs/ordered-data.c  |  1 +
 fs/btrfs/qgroup.c        |  1 +
 fs/btrfs/raid56.c        |  9 ---
 fs/btrfs/reada.c         |  3 ++-
 fs/btrfs/scrub.c         | 14 ++-
 fs/btrfs/volumes.c       |  3 ++-
 12 files changed, 146 insertions(+), 66 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 5a201d8..fbd76de 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -22,7 +22,6 @@
 #include <linux/list.h>
 #include
[PATCH v3] Btrfs: fix task hang under heavy compressed write
This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.

Btrfs has now migrated to the kernel workqueue, but the migration introduced this hang problem.

Btrfs has a kind of work that is queued in an ordered way, which means that its ordered_func() must be processed in FIFO order, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work->func() <-- (we name it work X)
    for ordered_work in wq->ordered_list
        ordered_work->ordered_func()
        ordered_work->ordered_free()

The hang is a rare case. First, when we find free space, we get an uncached block group, and then we go to read its free space cache inode for free space information, so it files a readahead request:

btrfs_readpages()
    for page that is not in page cache
        __do_readpage()
            submit_extent_page()
                btrfs_submit_bio_hook()
                    btrfs_bio_wq_end_io()
                    submit_bio()

end_workqueue_bio() <-- (ret by the 1st endio)
    queue a work (named work Y) for the 2nd, also the real, endio()

So the hang occurs when work Y's work_struct and work X's work_struct happen to share the same address.

A bit more explanation:

A, B, C -- struct btrfs_work
arg     -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
        worker->current_work = arg;  <-- arg is A->normal_work
        worker->current_func(arg)
            normal_work_helper(arg)
                A = container_of(arg, struct btrfs_work, normal_work);

                A->func()
                A->ordered_func()
                A->ordered_free()  <-- A gets freed

                B->ordered_func()
                    submit_compressed_extents()
                        find_free_extent()
                            load_free_space_inode()
                                ...  <-- (the above readahead stack)
                                end_workqueue_bio()
                                    btrfs_queue_work(work C)
                B->ordered_free()

As work A has a high priority in wq->ordered_list and there are more ordered works queued after it, such as B->ordered_func(), its memory could have been freed before normal_work_helper() returns, which means that the kernel workqueue code in worker_thread() still has worker->current_work pointing at work A->normal_work, i.e. arg's address.

Meanwhile, work C is allocated after work A is freed, so work C->normal_work and work A->normal_work are likely to share the same address (I confirmed this with ftrace output, so I'm not just guessing; it's rare though).

When another kthread picks up work C->normal_work to process, and finds that our kthread is processing it (see find_worker_executing_work()), it will treat work C as a collision and skip it, which ends up with nobody processing work C. So the situation is that our kthread is waiting forever on work C.

Besides, there are other cases that can lead to deadlock, but the real problem is that all btrfs workqueues share one work->func -- normal_work_helper. So this patch makes each workqueue have its own helper function, which is only a wrapper of normal_work_helper.

With this patch, I no longer hit the above hang.

Signed-off-by: Liu Bo bo.li@oracle.com
---
v3:
  - Adopting Chris's advice, this has each workqueue provide its own helper function.
  - This is rebased on the latest of Chris's for-linus branch.
v2:
  - This changes a bit to not defer all ->ordered_free() calls, but only defer the one of the work that triggers this run_ordered_work(). Actually we don't need to defer the other ->ordered_free() calls because their work cannot be this kthread worker's @current_work. We benefit from this since it pins less memory during the period.

 fs/btrfs/async-thread.c  | 44 +++--
 fs/btrfs/async-thread.h  | 28 -
 fs/btrfs/delayed-inode.c |  4 +--
 fs/btrfs/disk-io.c       | 63 ++--
 fs/btrfs/extent-tree.c   |  7 +++---
 fs/btrfs/inode.c         | 35 ++-
 fs/btrfs/ordered-data.c  |  1 +
 fs/btrfs/qgroup.c        |  1 +
 fs/btrfs/raid56.c        |  9 ---
 fs/btrfs/reada.c         |  3 ++-
 fs/btrfs/scrub.c         | 14 ++-
 fs/btrfs/volumes.c       |  3 ++-
 12 files changed, 146 insertions(+), 66 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 5a201d8..fbd76de 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -22,7 +22,6 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/freezer.h>
-#include <linux/workqueue.h>
 #include "async-thread.h"
 #include "ctree.h"
@@ -55,8 +54,39 @@ struct btrfs_workqueue {
Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
On 08/15/2014 11:36 AM, Liu Bo wrote:

[ snip ]

Besides, there are other cases that can lead to deadlock, but the real problem is that all btrfs workqueues share one work->func -- normal_work_helper. So this patch makes each workqueue have its own helper function, which is only a wrapper of normal_work_helper.

With this patch, I no longer hit the above hang.

Signed-off-by: Liu Bo bo.li@oracle.com
---
v3:
  - Adopting Chris's advice, this has each workqueue provide its own helper function.
  - This is rebased on the latest of Chris's for-linus branch.

This is what I had in mind, thanks Liu! I'll test and queue it up for rc2.

-chris