Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-31 Thread Martin Steigerwald
On Tuesday, 26 August 2014, 15:20:21, Martin Steigerwald wrote:
 On Tuesday, 26 August 2014, 09:02:23, Chris Mason wrote:
  On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
   On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
   On 08/15/2014 11:36 AM, Liu Bo wrote:
   This has been reported and discussed for a long time, and this hang
   occurs
   in both 3.15 and 3.16.
   
   [ great description ]
   
   I ran this through tests last week, and an overnight test over the
   weekend.  It's in my for-linus branch now, along with everything else I
   plan on sending for rc3.
   
   Please double check my merge, I had to undo your rebase onto Miao's
   patches.
   
   I would like to test this on 3.17-rc2, what do I need to do to make it
   apply cleanly? That function in disk-io.c looks quite different from
   what
   the patch assumes at the beginning, so I am not sure how to merge this.
  
  You can just pull my for-linus branch, Liu Bo's fix is on top.
 
 Thanks, I put the commit on top of your for-linus branch on top of 3.17-rc2.
 Let's see how it goes.

So far this appears to work well.

I have experienced no hangs.

Tested-By: Martin Steigerwald mar...@lichtvoll.de

But I bet Linus has taken it already anyway :). Looking forward to rc3.

Thanks for digging into and fixing this long-standing, annoying bug.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-31 Thread Liu Bo
On Sun, Aug 31, 2014 at 01:48:36PM +0200, Martin Steigerwald wrote:
 On Tuesday, 26 August 2014, 15:20:21, Martin Steigerwald wrote:
  On Tuesday, 26 August 2014, 09:02:23, Chris Mason wrote:
   On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
On 08/15/2014 11:36 AM, Liu Bo wrote:
This has been reported and discussed for a long time, and this hang
occurs
in both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the
weekend.  It's in my for-linus branch now, along with everything else I
plan on sending for rc3.

Please double check my merge, I had to undo your rebase onto Miao's
patches.

I would like to test this on 3.17-rc2, what do I need to do to make it
apply cleanly? That function in disk-io.c looks quite different from
what
the patch assumes at the beginning, so I am not sure how to merge this.
   
   You can just pull my for-linus branch, Liu Bo's fix is on top.
  
  Thanks, I put the commit on top of your for-linus branch on top of 3.17-rc2.
  Let's see how it goes.
 
 So far this appears to work well.
 
 I have experienced no hangs.
 
 Tested-By: Martin Steigerwald mar...@lichtvoll.de
 
 But I bet Linus has taken it already anyway :). Looking forward to rc3.
 
 Thanks for digging into and fixing this long-standing, annoying bug.

Thanks for your feedback!

thanks,
-liubo


Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-26 Thread Martin Steigerwald
On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
 On 08/15/2014 11:36 AM, Liu Bo wrote:
  This has been reported and discussed for a long time, and this hang occurs
  in both 3.15 and 3.16.
 
 [ great description ]
 
 I ran this through tests last week, and an overnight test over the
 weekend.  It's in my for-linus branch now, along with everything else I
 plan on sending for rc3.
 
 Please double check my merge, I had to undo your rebase onto Miao's patches.

I would like to test this on 3.17-rc2, what do I need to do to make it apply 
cleanly? That function in disk-io.c looks quite different from what the patch 
assumes at the beginning, so I am not sure how to merge this.

martin@merkaba:~/Computer/Merkaba/Kernel/linux patch -p1 < ../[PATCH v3]
Btrfs_fix task hang under heavy compressed write.mbox
patching file fs/btrfs/async-thread.c
patching file fs/btrfs/async-thread.h
patching file fs/btrfs/delayed-inode.c
patching file fs/btrfs/disk-io.c
Hunk #2 FAILED at 713.
Hunk #3 succeeded at 827 (offset -29 lines).
1 out of 3 hunks FAILED -- saving rejects to file fs/btrfs/disk-io.c.rej
patching file fs/btrfs/extent-tree.c
patching file fs/btrfs/inode.c
Hunk #1 succeeded at 1096 (offset -15 lines).
Hunk #2 succeeded at 1883 (offset -43 lines).
Hunk #3 succeeded at 2825 (offset -47 lines).
Hunk #4 succeeded at 2835 (offset -47 lines).
Hunk #5 succeeded at 7166 (offset -345 lines).
Hunk #6 succeeded at 8504 (offset -371 lines).
patching file fs/btrfs/ordered-data.c
patching file fs/btrfs/qgroup.c
Hunk #1 succeeded at 2720 (offset -2 lines).
patching file fs/btrfs/raid56.c
patching file fs/btrfs/reada.c
patching file fs/btrfs/scrub.c
Hunk #1 succeeded at 428 (offset 1 line).
Hunk #2 succeeded at 999 (offset 2 lines).
Hunk #3 succeeded at 1616 (offset -8 lines).
Hunk #4 succeeded at 3204 (offset -29 lines).
patching file fs/btrfs/volumes.c
Hunk #1 succeeded at 5800 (offset -87 lines).

Otherwise, I'd wait till rc3, as my current 3.16.2 with v1 of this patch seems
to behave stably with trees occupying all space:

merkaba:~ btrfs fi sh /home
Label: 'home'  uuid: […]
Total devices 2 FS bytes used 127.69GiB
devid    1 size 160.00GiB used 160.00GiB path /dev/dm-0
devid    2 size 160.00GiB used 160.00GiB path /dev/mapper/sata-home

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-26 Thread Liu Bo
On Tue, Aug 26, 2014 at 12:20:28PM +0200, Martin Steigerwald wrote:
 On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
  On 08/15/2014 11:36 AM, Liu Bo wrote:
   This has been reported and discussed for a long time, and this hang occurs
   in both 3.15 and 3.16.
  
  [ great description ]
  
  I ran this through tests last week, and an overnight test over the
  weekend.  It's in my for-linus branch now, along with everything else I
  plan on sending for rc3.
  
  Please double check my merge, I had to undo your rebase onto Miao's patches.
 
 I would like to test this on 3.17-rc2, what do I need to do to make it apply 
 cleanly? That function in disk-io.c looks quite different from what the patch 
 assumes at the beginning, so I am not sure how to merge this.
 
 martin@merkaba:~/Computer/Merkaba/Kernel/linux patch -p1 < ../[PATCH v3]
 Btrfs_fix task hang under heavy compressed write.mbox
 patching file fs/btrfs/async-thread.c
 patching file fs/btrfs/async-thread.h
 patching file fs/btrfs/delayed-inode.c
 patching file fs/btrfs/disk-io.c
 Hunk #2 FAILED at 713.
 Hunk #3 succeeded at 827 (offset -29 lines).
 1 out of 3 hunks FAILED -- saving rejects to file fs/btrfs/disk-io.c.rej
 patching file fs/btrfs/extent-tree.c
 patching file fs/btrfs/inode.c
 Hunk #1 succeeded at 1096 (offset -15 lines).
 Hunk #2 succeeded at 1883 (offset -43 lines).
 Hunk #3 succeeded at 2825 (offset -47 lines).
 Hunk #4 succeeded at 2835 (offset -47 lines).
 Hunk #5 succeeded at 7166 (offset -345 lines).
 Hunk #6 succeeded at 8504 (offset -371 lines).
 patching file fs/btrfs/ordered-data.c
 patching file fs/btrfs/qgroup.c
 Hunk #1 succeeded at 2720 (offset -2 lines).
 patching file fs/btrfs/raid56.c
 patching file fs/btrfs/reada.c
 patching file fs/btrfs/scrub.c
 Hunk #1 succeeded at 428 (offset 1 line).
 Hunk #2 succeeded at 999 (offset 2 lines).
 Hunk #3 succeeded at 1616 (offset -8 lines).
 Hunk #4 succeeded at 3204 (offset -29 lines).
 patching file fs/btrfs/volumes.c
 Hunk #1 succeeded at 5800 (offset -87 lines).
 
 Otherwise, I'd wait till rc3, as my current 3.16.2 with v1 of this patch seems
 to behave stably with trees occupying all space:
 
 merkaba:~ btrfs fi sh /home
 Label: 'home'  uuid: […]
 Total devices 2 FS bytes used 127.69GiB
 devid    1 size 160.00GiB used 160.00GiB path /dev/dm-0
 devid    2 size 160.00GiB used 160.00GiB path /dev/mapper/sata-home

Hmm, the v3 patch is based on Chris's integration branch, so it's expected not
to apply cleanly.  But if it's urgent, I can provide a patch based on
3.17-rc2.

(rc3 is coming soon though ;-) )

thanks,
-liubo

 
 Ciao,
 -- 
 Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
 GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-26 Thread Martin Steigerwald
On Tuesday, 26 August 2014, 18:38:03, Liu Bo wrote:
 On Tue, Aug 26, 2014 at 12:20:28PM +0200, Martin Steigerwald wrote:
  On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
   On 08/15/2014 11:36 AM, Liu Bo wrote:
This has been reported and discussed for a long time, and this hang
occurs
in both 3.15 and 3.16.
   
   [ great description ]
   
   I ran this through tests last week, and an overnight test over the
   weekend.  It's in my for-linus branch now, along with everything else I
   plan on sending for rc3.
   
   Please double check my merge, I had to undo your rebase onto Miao's
   patches. 
  I would like to test this on 3.17-rc2, what do I need to do to make it
  apply cleanly? That function in disk-io.c looks quite different from what
  the patch assumes at the beginning, so I am not sure how to merge this.
  
  martin@merkaba:~/Computer/Merkaba/Kernel/linux patch -p1 < ../[PATCH v3]
  Btrfs_fix task hang under heavy compressed write.mbox
  patching file fs/btrfs/async-thread.c
  patching file fs/btrfs/async-thread.h
  patching file fs/btrfs/delayed-inode.c
  patching file fs/btrfs/disk-io.c
  Hunk #2 FAILED at 713.
  Hunk #3 succeeded at 827 (offset -29 lines).
  1 out of 3 hunks FAILED -- saving rejects to file fs/btrfs/disk-io.c.rej
  patching file fs/btrfs/extent-tree.c
  patching file fs/btrfs/inode.c
  Hunk #1 succeeded at 1096 (offset -15 lines).
  Hunk #2 succeeded at 1883 (offset -43 lines).
  Hunk #3 succeeded at 2825 (offset -47 lines).
  Hunk #4 succeeded at 2835 (offset -47 lines).
  Hunk #5 succeeded at 7166 (offset -345 lines).
  Hunk #6 succeeded at 8504 (offset -371 lines).
  patching file fs/btrfs/ordered-data.c
  patching file fs/btrfs/qgroup.c
  Hunk #1 succeeded at 2720 (offset -2 lines).
  patching file fs/btrfs/raid56.c
  patching file fs/btrfs/reada.c
  patching file fs/btrfs/scrub.c
  Hunk #1 succeeded at 428 (offset 1 line).
  Hunk #2 succeeded at 999 (offset 2 lines).
  Hunk #3 succeeded at 1616 (offset -8 lines).
  Hunk #4 succeeded at 3204 (offset -29 lines).
  patching file fs/btrfs/volumes.c
  Hunk #1 succeeded at 5800 (offset -87 lines).
  
  Otherwise, I'd wait till rc3, as my current 3.16.2 with v1 of this patch
  seems to behave stably with trees occupying all space:
  
  merkaba:~ btrfs fi sh /home
  Label: 'home'  uuid: […]
  
  Total devices 2 FS bytes used 127.69GiB
  devid    1 size 160.00GiB used 160.00GiB path /dev/dm-0
  devid    2 size 160.00GiB used 160.00GiB path /dev/mapper/sata-home
 
 Hmm, the v3 patch is based on Chris's integration branch, so it's expected
 not to apply cleanly.  But if it's urgent, I can provide a patch based
 on 3.17-rc2.
 
 (rc3 is coming soon though ;-) )

I see.

Well, I can stick with 3.16.1 for the time being. But if you want some
additional testing soonish, feel free to provide a patch.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-26 Thread Chris Mason


On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
 On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
 On 08/15/2014 11:36 AM, Liu Bo wrote:
 This has been reported and discussed for a long time, and this hang occurs
 in both 3.15 and 3.16.

 [ great description ]

 I ran this through tests last week, and an overnight test over the
 weekend.  It's in my for-linus branch now, along with everything else I
 plan on sending for rc3.

 Please double check my merge, I had to undo your rebase onto Miao's patches.
 
 I would like to test this on 3.17-rc2, what do I need to do to make it apply 
 cleanly? That function in disk-io.c looks quite different from what the patch 
 assumes at the beginning, so I am not sure how to merge this.

You can just pull my for-linus branch, Liu Bo's fix is on top.

-chris


Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-26 Thread Martin Steigerwald
On Tuesday, 26 August 2014, 09:02:23, Chris Mason wrote:
 On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
  On Monday, 25 August 2014, 10:58:13, Chris Mason wrote:
  On 08/15/2014 11:36 AM, Liu Bo wrote:
  This has been reported and discussed for a long time, and this hang
  occurs
  in both 3.15 and 3.16.
  
  [ great description ]
  
  I ran this through tests last week, and an overnight test over the
  weekend.  It's in my for-linus branch now, along with everything else I
  plan on sending for rc3.
  
  Please double check my merge, I had to undo your rebase onto Miao's
  patches. 
  I would like to test this on 3.17-rc2, what do I need to do to make it
  apply cleanly? That function in disk-io.c looks quite different from what
  the patch assumes at the beginning, so I am not sure how to merge this.
 
 You can just pull my for-linus branch, Liu Bo's fix is on top.

Thanks, I put the commit on top of your for-linus branch on top of 3.17-rc2. 
Let's see how it goes.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-25 Thread Chris Mason
On 08/15/2014 11:36 AM, Liu Bo wrote:
 This has been reported and discussed for a long time, and this hang occurs in
 both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the
weekend.  It's in my for-linus branch now, along with everything else I
plan on sending for rc3.

Please double check my merge, I had to undo your rebase onto Miao's patches.

-chris


Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-25 Thread Liu Bo
On Mon, Aug 25, 2014 at 10:58:13AM -0400, Chris Mason wrote:
 On 08/15/2014 11:36 AM, Liu Bo wrote:
  This has been reported and discussed for a long time, and this hang occurs 
  in
  both 3.15 and 3.16.
 
 [ great description ]
 
 I ran this through tests last week, and an overnight test over the
 weekend.  It's in my for-linus branch now, along with everything else I
 plan on sending for rc3.
 
 Please double check my merge, I had to undo your rebase onto Miao's patches.

Just checked, looks good ;)

thanks,
-liubo


Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-18 Thread Liu Bo
On Sat, Aug 16, 2014 at 03:28:11PM +0800, Miao Xie wrote:
 On Fri, 15 Aug 2014 23:36:53 +0800, Liu Bo wrote:
  This has been reported and discussed for a long time, and this hang occurs 
  in
  both 3.15 and 3.16.
  
   Btrfs has now migrated to the kernel workqueue, but this migration
   introduced the hang problem.
  
   Btrfs has a kind of work that is queued in an ordered way, which means that
   its ordered_func() must be processed in FIFO order, so it usually looks
   like --
   
   normal_work_helper(arg)
       work = container_of(arg, struct btrfs_work, normal_work);
   
       work->func()    (we name it work X)
       for ordered_work in wq->ordered_list
               ordered_work->ordered_func()
               ordered_work->ordered_free()
  
   The hang is a rare case: first, when we look for free space, we get an
   uncached block group, then we go to read its free space cache inode for
   free space information, so it will
   
   file a readahead request
       btrfs_readpages()
           for each page that is not in the page cache
               __do_readpage()
                   submit_extent_page()
                       btrfs_submit_bio_hook()
                           btrfs_bio_wq_end_io()
                           submit_bio()
                           end_workqueue_bio()  <-- (ret by the 1st endio)
                               queue a work (named work Y) for the 2nd (the real) endio()
  
   So the hang occurs when work Y's work_struct and work X's work_struct
   happen to share the same address.
  
  A bit more explanation,
  
  A,B,C -- struct btrfs_work
  arg   -- struct work_struct
  
   kthread:
   worker_thread()
       pick up a work_struct from @worklist
       process_one_work(arg)
           worker->current_work = arg;  <-- arg is A->normal_work
           worker->current_func(arg)
           normal_work_helper(arg)
               A = container_of(arg, struct btrfs_work, normal_work);
   
               A->func()
               A->ordered_func()
               A->ordered_free()  <-- A gets freed
   
               B->ordered_func()
                   submit_compressed_extents()
                       find_free_extent()
                           load_free_space_inode()
                               ...   <-- (the above readahead stack)
                               end_workqueue_bio()
                                   btrfs_queue_work(work C)
               B->ordered_free()
  
   Since work A has a high priority in wq->ordered_list and there are more
   ordered works queued after it, such as B->ordered_func(), its memory could
   have been freed before normal_work_helper() returns, which means that the
   kernel workqueue code in worker_thread() still has worker->current_work
   pointing to work A->normal_work, i.e. arg's address.
  
   Meanwhile, work C is allocated after work A is freed, so work
   C->normal_work and work A->normal_work are likely to share the same address
   (I confirmed this with ftrace output, so I'm not just guessing; it's rare
   though).
  
   When another kthread picks up work C->normal_work to process, and finds our
   kthread is processing it (see find_worker_executing_work()), it'll treat
   work C as a collision and skip it, which ends up with nobody processing
   work C.
   
   So the situation is that our kthread is waiting forever on work C.
  
   Besides, there are other cases that can lead to deadlock, but the real
   problem is that all btrfs workqueues share one work->func, namely
   normal_work_helper.  So this patch makes each workqueue have its own helper
   function, which is only a wrapper of normal_work_helper.
   
   With this patch, I no longer hit the above hang.
  
  Signed-off-by: Liu Bo bo.li@oracle.com
  ---
  v3:
- Adopting Chris's advice, this has each workqueue provide its own helper
  function.
   - This is rebased on Chris's latest for-linus branch.
   
   v2:
     - This changes a bit to not defer all ->ordered_free(), but only defer the
       work that triggers this run_ordered_work().  Actually we don't need to
       defer the other ->ordered_free() calls because their work cannot be this
       kthread worker's @current_work.  We can benefit from it since this pins
       less memory during the period.
  
   fs/btrfs/async-thread.c  | 44 +++--
   fs/btrfs/async-thread.h  | 28 -
   fs/btrfs/delayed-inode.c |  4 +--
   fs/btrfs/disk-io.c   | 63 ++--
   fs/btrfs/extent-tree.c   |  7 +++---
   fs/btrfs/inode.c | 35 ++-
   fs/btrfs/ordered-data.c  |  1 +
   fs/btrfs/qgroup.c|  1 +
   fs/btrfs/raid56.c|  9 ---
   fs/btrfs/reada.c |  3 ++-
   fs/btrfs/scrub.c | 14 ++-
   fs/btrfs/volumes.c   |  3 ++-
   12 files changed, 146 insertions(+), 66 deletions(-)
  
  diff --git a/fs/btrfs/async-thread.c 

Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-16 Thread Miao Xie
On Fri, 15 Aug 2014 23:36:53 +0800, Liu Bo wrote:
 This has been reported and discussed for a long time, and this hang occurs in
 both 3.15 and 3.16.
 
 Btrfs has now migrated to the kernel workqueue, but this migration introduced
 the hang problem.
 
 Btrfs has a kind of work that is queued in an ordered way, which means that
 its ordered_func() must be processed in FIFO order, so it usually looks
 like --
 
 normal_work_helper(arg)
     work = container_of(arg, struct btrfs_work, normal_work);
 
     work->func()    (we name it work X)
     for ordered_work in wq->ordered_list
             ordered_work->ordered_func()
             ordered_work->ordered_free()
 
 The hang is a rare case: first, when we look for free space, we get an
 uncached block group, then we go to read its free space cache inode for free
 space information, so it will
 
 file a readahead request
     btrfs_readpages()
         for each page that is not in the page cache
             __do_readpage()
                 submit_extent_page()
                     btrfs_submit_bio_hook()
                         btrfs_bio_wq_end_io()
                         submit_bio()
                         end_workqueue_bio()  <-- (ret by the 1st endio)
                             queue a work (named work Y) for the 2nd (the real) endio()
 
 So the hang occurs when work Y's work_struct and work X's work_struct happen
 to share the same address.
 
 A bit more explanation,
 
 A,B,C -- struct btrfs_work
 arg   -- struct work_struct
 
 kthread:
 worker_thread()
     pick up a work_struct from @worklist
     process_one_work(arg)
         worker->current_work = arg;  <-- arg is A->normal_work
         worker->current_func(arg)
         normal_work_helper(arg)
             A = container_of(arg, struct btrfs_work, normal_work);
 
             A->func()
             A->ordered_func()
             A->ordered_free()  <-- A gets freed
 
             B->ordered_func()
                 submit_compressed_extents()
                     find_free_extent()
                         load_free_space_inode()
                             ...   <-- (the above readahead stack)
                             end_workqueue_bio()
                                 btrfs_queue_work(work C)
             B->ordered_free()
 
 Since work A has a high priority in wq->ordered_list and there are more
 ordered works queued after it, such as B->ordered_func(), its memory could
 have been freed before normal_work_helper() returns, which means that the
 kernel workqueue code in worker_thread() still has worker->current_work
 pointing to work A->normal_work, i.e. arg's address.
 
 Meanwhile, work C is allocated after work A is freed, so work C->normal_work
 and work A->normal_work are likely to share the same address (I confirmed this
 with ftrace output, so I'm not just guessing; it's rare though).
 
 When another kthread picks up work C->normal_work to process, and finds our
 kthread is processing it (see find_worker_executing_work()), it'll treat
 work C as a collision and skip it, which ends up with nobody processing work C.
 
 So the situation is that our kthread is waiting forever on work C.
 
 Besides, there are other cases that can lead to deadlock, but the real problem
 is that all btrfs workqueues share one work->func, namely normal_work_helper.
 So this patch makes each workqueue have its own helper function, which is only
 a wrapper of normal_work_helper.
 
 With this patch, I no longer hit the above hang.
 
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
 v3:
   - Adopting Chris's advice, this has each workqueue provide its own helper
 function.
   - This is rebased on Chris's latest for-linus branch.
 
 v2:
   - This changes a bit to not defer all ->ordered_free(), but only defer the
     work that triggers this run_ordered_work().  Actually we don't need to
     defer the other ->ordered_free() calls because their work cannot be this
     kthread worker's @current_work.  We can benefit from it since this pins
     less memory during the period.
 
  fs/btrfs/async-thread.c  | 44 +++--
  fs/btrfs/async-thread.h  | 28 -
  fs/btrfs/delayed-inode.c |  4 +--
 fs/btrfs/disk-io.c   | 63 ++--
  fs/btrfs/extent-tree.c   |  7 +++---
  fs/btrfs/inode.c | 35 ++-
  fs/btrfs/ordered-data.c  |  1 +
  fs/btrfs/qgroup.c|  1 +
  fs/btrfs/raid56.c|  9 ---
  fs/btrfs/reada.c |  3 ++-
  fs/btrfs/scrub.c | 14 ++-
  fs/btrfs/volumes.c   |  3 ++-
  12 files changed, 146 insertions(+), 66 deletions(-)
 
 diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
 index 5a201d8..fbd76de 100644
 --- a/fs/btrfs/async-thread.c
 +++ b/fs/btrfs/async-thread.c
 @@ -22,7 +22,6 @@
  #include <linux/list.h>
  #include 

[PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-15 Thread Liu Bo
This has been reported and discussed for a long time, and this hang occurs in
both 3.15 and 3.16.

Btrfs has now migrated to the kernel workqueue, but this migration introduced the hang problem.

Btrfs has a kind of work that is queued in an ordered way, which means that its
ordered_func() must be processed in FIFO order, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work->func()    (we name it work X)
    for ordered_work in wq->ordered_list
            ordered_work->ordered_func()
            ordered_work->ordered_free()

The hang is a rare case: first, when we look for free space, we get an uncached
block group, then we go to read its free space cache inode for free space
information, so it will

file a readahead request
    btrfs_readpages()
        for each page that is not in the page cache
            __do_readpage()
                submit_extent_page()
                    btrfs_submit_bio_hook()
                        btrfs_bio_wq_end_io()
                        submit_bio()
                        end_workqueue_bio()  <-- (ret by the 1st endio)
                            queue a work (named work Y) for the 2nd (the real) endio()

So the hang occurs when work Y's work_struct and work X's work_struct happen
to share the same address.

A bit more explanation,

A,B,C -- struct btrfs_work
arg   -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
        worker->current_work = arg;  <-- arg is A->normal_work
        worker->current_func(arg)
        normal_work_helper(arg)
            A = container_of(arg, struct btrfs_work, normal_work);

            A->func()
            A->ordered_func()
            A->ordered_free()  <-- A gets freed

            B->ordered_func()
                submit_compressed_extents()
                    find_free_extent()
                        load_free_space_inode()
                            ...   <-- (the above readahead stack)
                            end_workqueue_bio()
                                btrfs_queue_work(work C)
            B->ordered_free()

Since work A has a high priority in wq->ordered_list and there are more ordered
works queued after it, such as B->ordered_func(), its memory could have been
freed before normal_work_helper() returns, which means that the kernel workqueue
code in worker_thread() still has worker->current_work pointing to work
A->normal_work, i.e. arg's address.

Meanwhile, work C is allocated after work A is freed, so work C->normal_work
and work A->normal_work are likely to share the same address (I confirmed this
with ftrace output, so I'm not just guessing; it's rare though).

When another kthread picks up work C->normal_work to process, and finds our
kthread is processing it (see find_worker_executing_work()), it'll treat
work C as a collision and skip it, which ends up with nobody processing work C.

So the situation is that our kthread is waiting forever on work C.
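
To make the collision concrete, here is a small stand-alone model of the check
described above (a simplification for illustration only; the structs and helper
names below are stand-ins, not kernel or btrfs code). The idea is that a worker
is treated as already executing a work item only when both the work_struct
address and the work function match, so a recycled address plus one shared
helper is enough for a false positive:

#include <stdbool.h>
#include <stdio.h>

typedef void (*work_func_t)(void *);

struct work_struct {
	work_func_t func;                    /* the helper this work was queued with */
};

struct worker {
	struct work_struct *current_work;    /* what this worker is running */
	work_func_t current_func;            /* the callback it entered with */
};

/* Models the "same address AND same callback" collision test. */
static bool looks_like_collision(const struct worker *w,
				 const struct work_struct *work)
{
	return w->current_work == work && w->current_func == work->func;
}

/* Hypothetical helpers, named only for this illustration. */
static void shared_helper(void *arg)    { (void)arg; }
static void per_queue_helper(void *arg) { (void)arg; }

int main(void)
{
	/* Our kthread is still marked as executing work A, which lived at &slot. */
	struct work_struct slot = { .func = shared_helper };
	struct worker w = { .current_work = &slot, .current_func = shared_helper };

	/* Work A is freed; work C is allocated at the very same address. */
	slot.func = shared_helper;      /* one helper for every queue: false match */
	printf("shared helper:    collision=%d -> work C is skipped\n",
	       looks_like_collision(&w, &slot));

	slot.func = per_queue_helper;   /* each queue has its own helper: no match */
	printf("per-queue helper: collision=%d -> work C gets processed\n",
	       looks_like_collision(&w, &slot));
	return 0;
}

With one shared helper the model reports a collision, so the new work would be
skipped even though nothing is really running it; with per-queue helpers the
function pointers differ and the comparison fails, which is what the fix
described below relies on.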

Besides, there are other cases that can lead to deadlock, but the real problem
is that all btrfs workqueues share one work->func, namely normal_work_helper.
So this patch makes each workqueue have its own helper function, which is only
a wrapper of normal_work_helper.

With this patch, I no longer hit the above hang.
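
The shape of that change can be sketched as follows (a simplified, stand-alone
illustration with made-up helper names, not the actual patch): every wrapper
still ends up in the same normal_work_helper(), but each workqueue now hands
the kernel workqueue code a distinct function pointer.

#include <stddef.h>
#include <stdio.h>

/* Minimal stand-ins so the sketch is self-contained. */
struct work_struct { int pending; };
struct btrfs_work  { struct work_struct normal_work; };

static void normal_work_helper(struct btrfs_work *work)
{
	/* would run work->func(), then drain the ordered list, etc. */
	printf("normal_work_helper(%p)\n", (void *)work);
}

/*
 * Generate a thin wrapper whose only job is to be a unique work->func
 * for one workqueue.  The body is identical for every queue.
 */
#define DEFINE_WORK_HELPER(name)					\
	static void name(struct work_struct *arg)			\
	{								\
		struct btrfs_work *work = (struct btrfs_work *)		\
			((char *)arg -					\
			 offsetof(struct btrfs_work, normal_work));	\
		normal_work_helper(work);				\
	}

/* Hypothetical per-queue helpers, named here only for illustration. */
DEFINE_WORK_HELPER(endio_work_helper)
DEFINE_WORK_HELPER(delayed_meta_work_helper)

int main(void)
{
	struct btrfs_work w = { { 0 } };

	endio_work_helper(&w.normal_work);         /* same behaviour ...        */
	delayed_meta_work_helper(&w.normal_work);  /* ... distinct entry points */
	return 0;
}

Because each queue's works now carry a different work->func, a recycled
work_struct address alone can no longer make the collision check match, so the
skipped-work situation described earlier cannot happen across queues.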

Signed-off-by: Liu Bo bo.li@oracle.com
---
v3:
  - Adopting Chris's advice, this has each workqueue provide its own helper
function.
  - This is rebased on Chris's latest for-linus branch.

v2:
  - This changes a bit to not defer all ->ordered_free(), but only defer the
    work that triggers this run_ordered_work().  Actually we don't need to
    defer the other ->ordered_free() calls because their work cannot be this
    kthread worker's @current_work.  We can benefit from it since this pins
    less memory during the period.

 fs/btrfs/async-thread.c  | 44 +++--
 fs/btrfs/async-thread.h  | 28 -
 fs/btrfs/delayed-inode.c |  4 +--
 fs/btrfs/disk-io.c   | 63 ++--
 fs/btrfs/extent-tree.c   |  7 +++---
 fs/btrfs/inode.c | 35 ++-
 fs/btrfs/ordered-data.c  |  1 +
 fs/btrfs/qgroup.c|  1 +
 fs/btrfs/raid56.c|  9 ---
 fs/btrfs/reada.c |  3 ++-
 fs/btrfs/scrub.c | 14 ++-
 fs/btrfs/volumes.c   |  3 ++-
 12 files changed, 146 insertions(+), 66 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 5a201d8..fbd76de 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -22,7 +22,6 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/freezer.h>
-#include <linux/workqueue.h>
 #include "async-thread.h"
 #include "ctree.h"
 
@@ -55,8 +54,39 @@ struct btrfs_workqueue {
 

Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-15 Thread Chris Mason
On 08/15/2014 11:36 AM, Liu Bo wrote:

[ snip ]
 
 Besides, there are other cases that can lead to deadlock, but the real problem
 is that all btrfs workqueues share one work->func, namely normal_work_helper.
 So this patch makes each workqueue have its own helper function, which is only
 a wrapper of normal_work_helper.
 
 With this patch, I no longer hit the above hang.
 
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
 v3:
   - Adopting Chris's advice, this has each workqueue provide its own helper
 function.
  - This is rebased on Chris's latest for-linus branch.

This is what I had in mind, thanks Liu! I'll test and queue it up for rc2.

-chris