Re: [PATCH v3 10/11] Btrfs: fix possible deadlock caused by pending I/O in plug list

2014-12-02 Thread Miao Xie
Hi, Chris

On Fri, 28 Nov 2014 16:32:03 -0500, Chris Mason wrote:
 On Wed, Nov 26, 2014 at 10:00 PM, Miao Xie mi...@cn.fujitsu.com wrote:
 On Thu, 27 Nov 2014 09:39:56 +0800, Miao Xie wrote:
  On Wed, 26 Nov 2014 10:02:23 -0500, Chris Mason wrote:
  On Wed, Nov 26, 2014 at 8:04 AM, Miao Xie mi...@cn.fujitsu.com wrote:
  The increase/decrease of bio counter is on the I/O path, so we should
  use io_schedule() instead of schedule(), or the deadlock might be
  triggered by the pending I/O in the plug list. io_schedule() can help
  us because it will flush all the pending I/O before the task is going
  to sleep.

  Can you please describe this deadlock in more detail?  schedule() also triggers
  a flush of the plug list, and if that's no longer sufficient we can run into other
  problems (especially with preemption on).

  Sorry, my mistake. I forgot to check the current implementation of
  schedule(), which flushes the plug list unconditionally. Please ignore this
  patch.

 I have updated my raid56-scrub-replace branch, please re-pull the branch.

   https://github.com/miaoxie/linux-btrfs.git raid56-scrub-replace
 
 Sorry, I wasn't clear.  I do like the patch because it uses a slightly better 
 trigger mechanism for the flush.  I was just worried about a larger deadlock.
 
 I ran the raid56 work with stress.sh overnight, then scrubbed the resulting 
 filesystem and ran balance when the scrub completed.  All of these passed 
 without errors (excellent!).
 
 Then I zero'd 4GB of one drive and ran scrub again.  This was the result.  
 Please make sure CONFIG_DEBUG_PAGEALLOC is enabled and you should be able to 
 reproduce.

I sent out the 4th version of the patchset; please try it.

I have pushed the new patchset to my git tree; you can re-pull it:
  https://github.com/miaoxie/linux-btrfs.git raid56-scrub-replace

Thanks
Miao

 
 [192392.495260] BUG: unable to handle kernel paging request at 
 880303062f80
 [192392.495279] IP: [a05fe77a] lock_stripe_add+0xba/0x390 [btrfs]
 [192392.495281] PGD 2bdb067 PUD 107e7fd067 PMD 107e7e4067 PTE 800303062060
 [192392.495283] Oops:  [#1] SMP DEBUG_PAGEALLOC
 [192392.495307] Modules linked in: ipmi_devintf loop fuse k10temp coretemp 
 hwmon btrfs raid6_pq zlib_deflate lzo_compress xor xfs exportfs libcrc32c 
 tcp_diag inet_diag nfsv4 ip6table_filter ip6_tables xt_NFLOG nfnetlink_log 
 nfnetlink xt_comment xt_statistic iptable_filter ip_tables x_tables mptctl 
 netconsole autofs4 nfsv3 nfs lockd grace rpcsec_gss_krb5 auth_rpcgss 
 oid_registry sunrpc ipv6 ext3 jbd dm_mod rtc_cmos ipmi_si ipmi_msghandler 
 iTCO_wdt iTCO_vendor_support pcspkr i2c_i801 lpc_ich mfd_core shpchp ehci_pci 
 ehci_hcd mlx4_en ptp pps_core mlx4_core sg ses enclosure button megaraid_sas
 [192392.495310] CPU: 0 PID: 11992 Comm: kworker/u65:2 Not tainted 
 3.18.0-rc6-mason+ #7
 [192392.495310] Hardware name: ZTSYSTEMS Echo Ridge T4  /A9DRPF-10D, BIOS 
 1.07 05/10/2012
 [192392.495323] Workqueue: btrfs-btrfs-scrub btrfs_scrub_helper [btrfs]
 [192392.495324] task: 88013dae9110 ti: 8802296a task.ti: 
 8802296a
 [192392.495335] RIP: 0010:[a05fe77a]  [a05fe77a] 
 lock_stripe_add+0xba/0x390 [btrfs]
 [192392.495335] RSP: 0018:8802296a3ac8  EFLAGS: 00010006
 [192392.495336] RAX: 880577e85018 RBX: 880497f0b2f8 RCX: 
 8801190fb000
 [192392.495337] RDX: 013d RSI: 880303062f80 RDI: 
 040c275a
 [192392.495338] RBP: 8802296a3b48 R08: 880497f0 R09: 
 0001
 [192392.495339] R10:  R11:  R12: 
 0282
 [192392.495339] R13: b250 R14: 880577e85000 R15: 
 880497f0b2a0
 [192392.495340] FS:  () GS:88085fc0() 
 knlGS:
 [192392.495341] CS:  0010 DS:  ES:  CR0: 80050033
 [192392.495342] CR2: 880303062f80 CR3: 05289000 CR4: 
 000407f0
 [192392.495342] Stack:
 [192392.495344]  880755e28000 880497f0 013d 
 8801190fb000
 [192392.495346]   88013dae9110 81090d40 
 8802296a3b00
 [192392.495347]  8802296a3b00 0010 8802296a3b68 
 8801190fb000
 [192392.495348] Call Trace:
 [192392.495353]  [81090d40] ? bit_waitqueue+0xa0/0xa0
 [192392.495363]  [a05fea66] 
 raid56_parity_submit_scrub_rbio+0x16/0x30 [btrfs]
 [192392.495372]  [a05e2f0e] 
 scrub_parity_check_and_repair+0x15e/0x1e0 [btrfs]
 [192392.495380]  [a05e301d] scrub_block_put+0x8d/0x90 [btrfs]
 [192392.495388]  [a05e6ed7] ? scrub_bio_end_io_worker+0xd7/0x870 
 [btrfs]
 [192392.495396]  [a05e6ee9] scrub_bio_end_io_worker+0xe9/0x870 
 [btrfs]
 [192392.495405]  [a05b8c44] normal_work_helper+0x84/0x330 [btrfs]
 [192392.495414]  [a05b8f42] btrfs_scrub_helper+0x12/0x20 [btrfs]
 [192392.495417]  [8106c50f] process_one_work+0x1bf/0x520
 [192392.495419]  [8106c48d] ? 

Re: [PATCH v3 10/11] Btrfs: fix possible deadlock caused by pending I/O in plug list

2014-11-28 Thread Chris Mason

On Wed, Nov 26, 2014 at 10:00 PM, Miao Xie mi...@cn.fujitsu.com wrote:

On Thu, 27 Nov 2014 09:39:56 +0800, Miao Xie wrote:

 On Wed, 26 Nov 2014 10:02:23 -0500, Chris Mason wrote:
 On Wed, Nov 26, 2014 at 8:04 AM, Miao Xie mi...@cn.fujitsu.com wrote:
 The increase/decrease of bio counter is on the I/O path, so we should
 use io_schedule() instead of schedule(), or the deadlock might be
 triggered by the pending I/O in the plug list. io_schedule() can help
 us because it will flush all the pending I/O before the task is going
 to sleep.

 Can you please describe this deadlock in more detail?  schedule() also triggers
 a flush of the plug list, and if that's no longer sufficient we can run into other
 problems (especially with preemption on).

 Sorry, my mistake. I forgot to check the current implementation of
 schedule(), which flushes the plug list unconditionally. Please ignore this
 patch.

I have updated my raid56-scrub-replace branch, please re-pull the branch.

  https://github.com/miaoxie/linux-btrfs.git raid56-scrub-replace


Sorry, I wasn't clear.  I do like the patch because it uses a slightly 
better trigger mechanism for the flush.  I was just worried about a 
larger deadlock.


I ran the raid56 work with stress.sh overnight, then scrubbed the 
resulting filesystem and ran balance when the scrub completed.  All of 
these passed without errors (excellent!).


Then I zero'd 4GB of one drive and ran scrub again.  This was the 
result.  Please make sure CONFIG_DEBUG_PAGEALLOC is enabled and you 
should be able to reproduce.


[192392.495260] BUG: unable to handle kernel paging request at 
880303062f80
[192392.495279] IP: [a05fe77a] lock_stripe_add+0xba/0x390 
[btrfs]
[192392.495281] PGD 2bdb067 PUD 107e7fd067 PMD 107e7e4067 PTE 
800303062060

[192392.495283] Oops:  [#1] SMP DEBUG_PAGEALLOC
[192392.495307] Modules linked in: ipmi_devintf loop fuse k10temp 
coretemp hwmon btrfs raid6_pq zlib_deflate lzo_compress xor xfs 
exportfs libcrc32c tcp_diag inet_diag nfsv4 ip6table_filter ip6_tables 
xt_NFLOG nfnetlink_log nfnetlink xt_comment xt_statistic iptable_filter 
ip_tables x_tables mptctl netconsole autofs4 nfsv3 nfs lockd grace 
rpcsec_gss_krb5 auth_rpcgss oid_registry sunrpc ipv6 ext3 jbd dm_mod 
rtc_cmos ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support pcspkr 
i2c_i801 lpc_ich mfd_core shpchp ehci_pci ehci_hcd mlx4_en ptp pps_core 
mlx4_core sg ses enclosure button megaraid_sas
[192392.495310] CPU: 0 PID: 11992 Comm: kworker/u65:2 Not tainted 
3.18.0-rc6-mason+ #7
[192392.495310] Hardware name: ZTSYSTEMS Echo Ridge T4  /A9DRPF-10D, 
BIOS 1.07 05/10/2012

[192392.495323] Workqueue: btrfs-btrfs-scrub btrfs_scrub_helper [btrfs]
[192392.495324] task: 88013dae9110 ti: 8802296a task.ti: 
8802296a
[192392.495335] RIP: 0010:[a05fe77a]  [a05fe77a] 
lock_stripe_add+0xba/0x390 [btrfs]

[192392.495335] RSP: 0018:8802296a3ac8  EFLAGS: 00010006
[192392.495336] RAX: 880577e85018 RBX: 880497f0b2f8 RCX: 
8801190fb000
[192392.495337] RDX: 013d RSI: 880303062f80 RDI: 
040c275a
[192392.495338] RBP: 8802296a3b48 R08: 880497f0 R09: 
0001
[192392.495339] R10:  R11:  R12: 
0282
[192392.495339] R13: b250 R14: 880577e85000 R15: 
880497f0b2a0
[192392.495340] FS:  () GS:88085fc0() 
knlGS:

[192392.495341] CS:  0010 DS:  ES:  CR0: 80050033
[192392.495342] CR2: 880303062f80 CR3: 05289000 CR4: 
000407f0

[192392.495342] Stack:
[192392.495344]  880755e28000 880497f0 013d 
8801190fb000
[192392.495346]   88013dae9110 81090d40 
8802296a3b00
[192392.495347]  8802296a3b00 0010 8802296a3b68 
8801190fb000

[192392.495348] Call Trace:
[192392.495353]  [81090d40] ? bit_waitqueue+0xa0/0xa0
[192392.495363]  [a05fea66] 
raid56_parity_submit_scrub_rbio+0x16/0x30 [btrfs]
[192392.495372]  [a05e2f0e] 
scrub_parity_check_and_repair+0x15e/0x1e0 [btrfs]

[192392.495380]  [a05e301d] scrub_block_put+0x8d/0x90 [btrfs]
[192392.495388]  [a05e6ed7] ? 
scrub_bio_end_io_worker+0xd7/0x870 [btrfs]
[192392.495396]  [a05e6ee9] 
scrub_bio_end_io_worker+0xe9/0x870 [btrfs]
[192392.495405]  [a05b8c44] normal_work_helper+0x84/0x330 
[btrfs]
[192392.495414]  [a05b8f42] btrfs_scrub_helper+0x12/0x20 
[btrfs]

[192392.495417]  [8106c50f] process_one_work+0x1bf/0x520
[192392.495419]  [8106c48d] ? process_one_work+0x13d/0x520
[192392.495421]  [8106c98e] worker_thread+0x11e/0x4b0
[192392.495424]  [81653ac9] ? __schedule+0x389/0x880
[192392.495426]  [8106c870] ? process_one_work+0x520/0x520
[192392.495428]  [81071e2e] kthread+0xde/0x100
[192392.495430]  [81071d50] ? 

[PATCH v3 10/11] Btrfs: fix possible deadlock caused by pending I/O in plug list

2014-11-26 Thread Miao Xie
The increase/decrease of bio counter is on the I/O path, so we should
use io_schedule() instead of schedule(), or the deadlock might be
triggered by the pending I/O in the plug list. io_schedule() can help
us because it will flush all the pending I/O before the task is going
to sleep.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v2 -> v3:
- New patch to fix possible deadlock caused by the pending bios in the
  plug list when the io submitters were going to sleep.

Changelog v1 -> v2:
- None.
---
 fs/btrfs/dev-replace.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index fa27b4e..894796a 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -928,16 +928,23 @@ void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
 	wake_up(&fs_info->replace_wait);
 }
 
+#define btrfs_wait_event_io(wq, condition) \
+do {   \
+   if (condition)  \
+   break;  \
+   (void)___wait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, 0,  \
+   io_schedule()); \
+} while (0)
+
 void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
 {
-   DEFINE_WAIT(wait);
 again:
 	percpu_counter_inc(&fs_info->bio_counter);
 	if (test_bit(BTRFS_FS_STATE_DEV_REPLACING, &fs_info->fs_state)) {
 		btrfs_bio_counter_dec(fs_info);
-		wait_event(fs_info->replace_wait,
-			   !test_bit(BTRFS_FS_STATE_DEV_REPLACING,
-				     &fs_info->fs_state));
+		btrfs_wait_event_io(fs_info->replace_wait,
+				    !test_bit(BTRFS_FS_STATE_DEV_REPLACING,
+					      &fs_info->fs_state));
goto again;
}
 
-- 
1.9.3
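
For reference, the reason io_schedule() is attractive on an I/O path is that it
flushes the calling task's block plug before it actually sleeps, so bios the task
has plugged but not yet submitted cannot be the very I/O it ends up waiting for.
A simplified sketch of the 3.18-era helper (approximating kernel/sched/core.c, not
a verbatim copy; io_schedule_sketch is an illustrative name):

#include <linux/blkdev.h>	/* blk_flush_plug() */
#include <linux/sched.h>

/*
 * Simplified, non-verbatim sketch of io_schedule(): the key step for this
 * thread is blk_flush_plug(), which submits everything queued on
 * current->plug before the task goes to sleep.
 */
void io_schedule_sketch(void)
{
	/* iowait accounting omitted */
	blk_flush_plug(current);	/* issue any plugged bios now */
	current->in_iowait = 1;
	schedule();			/* safe to sleep: no stranded plugged I/O */
	current->in_iowait = 0;
}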



Re: [PATCH v3 10/11] Btrfs: fix possible deadlock caused by pending I/O in plug list

2014-11-26 Thread Chris Mason

On Wed, Nov 26, 2014 at 8:04 AM, Miao Xie mi...@cn.fujitsu.com wrote:

The increase/decrease of bio counter is on the I/O path, so we should
use io_schedule() instead of schedule(), or the deadlock might be
triggered by the pending I/O in the plug list. io_schedule() can help
us because it will flush all the pending I/O before the task is going
to sleep.


Can you please describe this deadlock in more detail?  schedule() also 
triggers a flush of the plug list, and if that's no longer sufficient 
we can run into other problems (especially with preemption on).


-chris
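
The behaviour Chris is describing can be seen in the scheduler itself: before a
task that is about to sleep is switched out, schedule() submits any I/O still
sitting in that task's plug list. A rough sketch of that path (approximating the
3.18-era sched_submit_work() in kernel/sched/core.c, not verbatim; the _sketch
names are illustrative):

#include <linux/blkdev.h>	/* blk_needs_flush_plug(), blk_schedule_flush_plug() */
#include <linux/sched.h>

/* Non-verbatim sketch of the plug flush schedule() performs on its way to sleep. */
static void sched_submit_work_sketch(struct task_struct *tsk)
{
	if (!tsk->state)			/* task is still runnable, nothing to do */
		return;
	if (blk_needs_flush_plug(tsk))		/* bios queued on tsk->plug? */
		blk_schedule_flush_plug(tsk);	/* submit them before sleeping */
}

void schedule_sketch(void)
{
	sched_submit_work_sketch(current);
	/* __schedule() then picks the next task and context-switches */
}

This is why a plain wait_event()/schedule() does not leave plugged bios stranded,
which is the point Miao concedes in the follow-up.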



Re: [PATCH v3 10/11] Btrfs: fix possible deadlock caused by pending I/O in plug list

2014-11-26 Thread Miao Xie
On Thu, 27 Nov 2014 09:39:56 +0800, Miao Xie wrote:
 On Wed, 26 Nov 2014 10:02:23 -0500, Chris Mason wrote:
 On Wed, Nov 26, 2014 at 8:04 AM, Miao Xie mi...@cn.fujitsu.com wrote:
 The increase/decrease of bio counter is on the I/O path, so we should
 use io_schedule() instead of schedule(), or the deadlock might be
 triggered by the pending I/O in the plug list. io_schedule() can help
 us because it will flush all the pending I/O before the task is going
 to sleep.

 Can you please describe this deadlock in more detail?  schedule() also triggers
 a flush of the plug list, and if that's no longer sufficient we can run into other
 problems (especially with preemption on).
 
 Sorry, my mistake. I forgot to check the current implementation of
 schedule(), which flushes the plug list unconditionally. Please ignore this
 patch.

I have updated my raid56-scrub-replace branch, please re-pull the branch.

  https://github.com/miaoxie/linux-btrfs.git raid56-scrub-replace

Thanks
Miao

 
 Thanks
 Miao
 

 -chris


 
 
