Re: [PATCH v3 10/11] Btrfs: fix possible deadlock caused by pending I/O in plug list
hi, Chris

On Fri, 28 Nov 2014 16:32:03 -0500, Chris Mason wrote:
On Wed, Nov 26, 2014 at 10:00 PM, Miao Xie mi...@cn.fujitsu.com wrote:
On Thu, 27 Nov 2014 09:39:56 +0800, Miao Xie wrote:
On Wed, 26 Nov 2014 10:02:23 -0500, Chris Mason wrote:
On Wed, Nov 26, 2014 at 8:04 AM, Miao Xie mi...@cn.fujitsu.com wrote:

The increase/decrease of the bio counter is on the I/O path, so we should use io_schedule() instead of schedule(); otherwise a deadlock might be triggered by the pending I/O in the plug list. io_schedule() can help us because it flushes all the pending I/O before the task goes to sleep.

Can you please describe this deadlock in more detail? schedule() also triggers a flush of the plug list, and if that's no longer sufficient we can run into other problems (especially with preemption on).

Sorry for my mistake. I forgot to check the current implementation of schedule(), which flushes the plug list unconditionally. Please ignore this patch.

I have updated my raid56-scrub-replace branch, please re-pull the branch.
https://github.com/miaoxie/linux-btrfs.git raid56-scrub-replace

Sorry, I wasn't clear. I do like the patch because it uses a slightly better trigger mechanism for the flush. I was just worried about a larger deadlock.

I ran the raid56 work with stress.sh overnight, then scrubbed the resulting filesystem and ran balance when the scrub completed. All of these passed without errors (excellent!). Then I zeroed 4GB of one drive and ran scrub again. This was the result. Please make sure CONFIG_DEBUG_PAGEALLOC is enabled and you should be able to reproduce.

I sent out the 4th version of the patchset, please try it. I have pushed the new patchset to my git tree, you can re-pull it.
https://github.com/miaoxie/linux-btrfs.git raid56-scrub-replace

Thanks
Miao

[192392.495260] BUG: unable to handle kernel paging request at 880303062f80
[192392.495279] IP: [a05fe77a] lock_stripe_add+0xba/0x390 [btrfs]
[192392.495281] PGD 2bdb067 PUD 107e7fd067 PMD 107e7e4067 PTE 800303062060
[192392.495283] Oops: [#1] SMP DEBUG_PAGEALLOC
[192392.495307] Modules linked in: ipmi_devintf loop fuse k10temp coretemp hwmon btrfs raid6_pq zlib_deflate lzo_compress xor xfs exportfs libcrc32c tcp_diag inet_diag nfsv4 ip6table_filter ip6_tables xt_NFLOG nfnetlink_log nfnetlink xt_comment xt_statistic iptable_filter ip_tables x_tables mptctl netconsole autofs4 nfsv3 nfs lockd grace rpcsec_gss_krb5 auth_rpcgss oid_registry sunrpc ipv6 ext3 jbd dm_mod rtc_cmos ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support pcspkr i2c_i801 lpc_ich mfd_core shpchp ehci_pci ehci_hcd mlx4_en ptp pps_core mlx4_core sg ses enclosure button megaraid_sas
[192392.495310] CPU: 0 PID: 11992 Comm: kworker/u65:2 Not tainted 3.18.0-rc6-mason+ #7
[192392.495310] Hardware name: ZTSYSTEMS Echo Ridge T4 /A9DRPF-10D, BIOS 1.07 05/10/2012
[192392.495323] Workqueue: btrfs-btrfs-scrub btrfs_scrub_helper [btrfs]
[192392.495324] task: 88013dae9110 ti: 8802296a task.ti: 8802296a
[192392.495335] RIP: 0010:[a05fe77a] [a05fe77a] lock_stripe_add+0xba/0x390 [btrfs]
[192392.495335] RSP: 0018:8802296a3ac8 EFLAGS: 00010006
[192392.495336] RAX: 880577e85018 RBX: 880497f0b2f8 RCX: 8801190fb000
[192392.495337] RDX: 013d RSI: 880303062f80 RDI: 040c275a
[192392.495338] RBP: 8802296a3b48 R08: 880497f0 R09: 0001
[192392.495339] R10: R11: R12: 0282
[192392.495339] R13: b250 R14: 880577e85000 R15: 880497f0b2a0
[192392.495340] FS: () GS:88085fc0() knlGS:
[192392.495341] CS: 0010 DS: ES: CR0: 80050033
[192392.495342] CR2: 880303062f80 CR3: 05289000 CR4: 000407f0
[192392.495342] Stack:
[192392.495344] 880755e28000 880497f0 013d 8801190fb000
[192392.495346] 88013dae9110 81090d40 8802296a3b00
[192392.495347] 8802296a3b00 0010 8802296a3b68 8801190fb000
[192392.495348] Call Trace:
[192392.495353] [81090d40] ? bit_waitqueue+0xa0/0xa0
[192392.495363] [a05fea66] raid56_parity_submit_scrub_rbio+0x16/0x30 [btrfs]
[192392.495372] [a05e2f0e] scrub_parity_check_and_repair+0x15e/0x1e0 [btrfs]
[192392.495380] [a05e301d] scrub_block_put+0x8d/0x90 [btrfs]
[192392.495388] [a05e6ed7] ? scrub_bio_end_io_worker+0xd7/0x870 [btrfs]
[192392.495396] [a05e6ee9] scrub_bio_end_io_worker+0xe9/0x870 [btrfs]
[192392.495405] [a05b8c44] normal_work_helper+0x84/0x330 [btrfs]
[192392.495414] [a05b8f42] btrfs_scrub_helper+0x12/0x20 [btrfs]
[192392.495417] [8106c50f] process_one_work+0x1bf/0x520
[192392.495419] [8106c48d] ?
Re: [PATCH v3 10/11] Btrfs: fix possible deadlock caused by pending I/O in plug list
On Wed, Nov 26, 2014 at 10:00 PM, Miao Xie mi...@cn.fujitsu.com wrote:
On Thu, 27 Nov 2014 09:39:56 +0800, Miao Xie wrote:
On Wed, 26 Nov 2014 10:02:23 -0500, Chris Mason wrote:
On Wed, Nov 26, 2014 at 8:04 AM, Miao Xie mi...@cn.fujitsu.com wrote:

The increase/decrease of the bio counter is on the I/O path, so we should use io_schedule() instead of schedule(); otherwise a deadlock might be triggered by the pending I/O in the plug list. io_schedule() can help us because it flushes all the pending I/O before the task goes to sleep.

Can you please describe this deadlock in more detail? schedule() also triggers a flush of the plug list, and if that's no longer sufficient we can run into other problems (especially with preemption on).

Sorry for my mistake. I forgot to check the current implementation of schedule(), which flushes the plug list unconditionally. Please ignore this patch.

I have updated my raid56-scrub-replace branch, please re-pull the branch.
https://github.com/miaoxie/linux-btrfs.git raid56-scrub-replace

Sorry, I wasn't clear. I do like the patch because it uses a slightly better trigger mechanism for the flush. I was just worried about a larger deadlock.

I ran the raid56 work with stress.sh overnight, then scrubbed the resulting filesystem and ran balance when the scrub completed. All of these passed without errors (excellent!). Then I zeroed 4GB of one drive and ran scrub again. This was the result. Please make sure CONFIG_DEBUG_PAGEALLOC is enabled and you should be able to reproduce.

[192392.495260] BUG: unable to handle kernel paging request at 880303062f80
[192392.495279] IP: [a05fe77a] lock_stripe_add+0xba/0x390 [btrfs]
[192392.495281] PGD 2bdb067 PUD 107e7fd067 PMD 107e7e4067 PTE 800303062060
[192392.495283] Oops: [#1] SMP DEBUG_PAGEALLOC
[192392.495307] Modules linked in: ipmi_devintf loop fuse k10temp coretemp hwmon btrfs raid6_pq zlib_deflate lzo_compress xor xfs exportfs libcrc32c tcp_diag inet_diag nfsv4 ip6table_filter ip6_tables xt_NFLOG nfnetlink_log nfnetlink xt_comment xt_statistic iptable_filter ip_tables x_tables mptctl netconsole autofs4 nfsv3 nfs lockd grace rpcsec_gss_krb5 auth_rpcgss oid_registry sunrpc ipv6 ext3 jbd dm_mod rtc_cmos ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support pcspkr i2c_i801 lpc_ich mfd_core shpchp ehci_pci ehci_hcd mlx4_en ptp pps_core mlx4_core sg ses enclosure button megaraid_sas
[192392.495310] CPU: 0 PID: 11992 Comm: kworker/u65:2 Not tainted 3.18.0-rc6-mason+ #7
[192392.495310] Hardware name: ZTSYSTEMS Echo Ridge T4 /A9DRPF-10D, BIOS 1.07 05/10/2012
[192392.495323] Workqueue: btrfs-btrfs-scrub btrfs_scrub_helper [btrfs]
[192392.495324] task: 88013dae9110 ti: 8802296a task.ti: 8802296a
[192392.495335] RIP: 0010:[a05fe77a] [a05fe77a] lock_stripe_add+0xba/0x390 [btrfs]
[192392.495335] RSP: 0018:8802296a3ac8 EFLAGS: 00010006
[192392.495336] RAX: 880577e85018 RBX: 880497f0b2f8 RCX: 8801190fb000
[192392.495337] RDX: 013d RSI: 880303062f80 RDI: 040c275a
[192392.495338] RBP: 8802296a3b48 R08: 880497f0 R09: 0001
[192392.495339] R10: R11: R12: 0282
[192392.495339] R13: b250 R14: 880577e85000 R15: 880497f0b2a0
[192392.495340] FS: () GS:88085fc0() knlGS:
[192392.495341] CS: 0010 DS: ES: CR0: 80050033
[192392.495342] CR2: 880303062f80 CR3: 05289000 CR4: 000407f0
[192392.495342] Stack:
[192392.495344] 880755e28000 880497f0 013d 8801190fb000
[192392.495346] 88013dae9110 81090d40 8802296a3b00
[192392.495347] 8802296a3b00 0010 8802296a3b68 8801190fb000
[192392.495348] Call Trace:
[192392.495353] [81090d40] ? bit_waitqueue+0xa0/0xa0
[192392.495363] [a05fea66] raid56_parity_submit_scrub_rbio+0x16/0x30 [btrfs]
[192392.495372] [a05e2f0e] scrub_parity_check_and_repair+0x15e/0x1e0 [btrfs]
[192392.495380] [a05e301d] scrub_block_put+0x8d/0x90 [btrfs]
[192392.495388] [a05e6ed7] ? scrub_bio_end_io_worker+0xd7/0x870 [btrfs]
[192392.495396] [a05e6ee9] scrub_bio_end_io_worker+0xe9/0x870 [btrfs]
[192392.495405] [a05b8c44] normal_work_helper+0x84/0x330 [btrfs]
[192392.495414] [a05b8f42] btrfs_scrub_helper+0x12/0x20 [btrfs]
[192392.495417] [8106c50f] process_one_work+0x1bf/0x520
[192392.495419] [8106c48d] ? process_one_work+0x13d/0x520
[192392.495421] [8106c98e] worker_thread+0x11e/0x4b0
[192392.495424] [81653ac9] ? __schedule+0x389/0x880
[192392.495426] [8106c870] ? process_one_work+0x520/0x520
[192392.495428] [81071e2e] kthread+0xde/0x100
[192392.495430] [81071d50] ?
[PATCH v3 10/11] Btrfs: fix possible deadlock caused by pending I/O in plug list
The increase/decrease of the bio counter is on the I/O path, so we should use io_schedule() instead of schedule(); otherwise a deadlock might be triggered by the pending I/O in the plug list. io_schedule() can help us because it flushes all the pending I/O before the task goes to sleep.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v2 -> v3:
- New patch to fix a possible deadlock caused by the pending bios in the plug list when the I/O submitters were going to sleep.

Changelog v1 -> v2:
- None.
---
 fs/btrfs/dev-replace.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index fa27b4e..894796a 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -928,16 +928,23 @@ void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
 	wake_up(&fs_info->replace_wait);
 }
 
+#define btrfs_wait_event_io(wq, condition)				\
+do {									\
+	if (condition)							\
+		break;							\
+	(void)___wait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, 0,	\
+			    io_schedule());				\
+} while (0)
+
 void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
 {
-	DEFINE_WAIT(wait);
 again:
 	percpu_counter_inc(&fs_info->bio_counter);
 	if (test_bit(BTRFS_FS_STATE_DEV_REPLACING, &fs_info->fs_state)) {
 		btrfs_bio_counter_dec(fs_info);
-		wait_event(fs_info->replace_wait,
-			   !test_bit(BTRFS_FS_STATE_DEV_REPLACING,
-				     &fs_info->fs_state));
+		btrfs_wait_event_io(fs_info->replace_wait,
+			!test_bit(BTRFS_FS_STATE_DEV_REPLACING,
+				  &fs_info->fs_state));
 		goto again;
 	}
-- 
1.9.3
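[Editorial note: for readers who don't want to expand ___wait_event() in their head, the loop below is roughly what btrfs_wait_event_io() boils down to: the usual wait_event() pattern, but sleeping via io_schedule(). It is an illustrative, hand-rolled sketch only; the function name example_wait_for_replace is invented here, and the patch itself relies on the generic ___wait_event() helper rather than this open-coded version. Assumes the usual kernel wait API from <linux/wait.h> and io_schedule() from <linux/sched.h>.]

/*
 * Open-coded sketch of what
 *   btrfs_wait_event_io(fs_info->replace_wait, !test_bit(...))
 * effectively does: wait in TASK_UNINTERRUPTIBLE, but sleep through
 * io_schedule() so the sleep is accounted as iowait and any plugged
 * I/O is pushed out before blocking.  Illustrative only, not part of
 * the patch.
 */
static void example_wait_for_replace(struct btrfs_fs_info *fs_info)
{
	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait(&fs_info->replace_wait, &wait,
				TASK_UNINTERRUPTIBLE);
		if (!test_bit(BTRFS_FS_STATE_DEV_REPLACING,
			      &fs_info->fs_state))
			break;
		io_schedule();
	}
	finish_wait(&fs_info->replace_wait, &wait);
}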
Re: [PATCH v3 10/11] Btrfs: fix possible deadlock caused by pending I/O in plug list
On Wed, Nov 26, 2014 at 8:04 AM, Miao Xie mi...@cn.fujitsu.com wrote:

The increase/decrease of the bio counter is on the I/O path, so we should use io_schedule() instead of schedule(); otherwise a deadlock might be triggered by the pending I/O in the plug list. io_schedule() can help us because it flushes all the pending I/O before the task goes to sleep.

Can you please describe this deadlock in more detail? schedule() also triggers a flush of the plug list, and if that's no longer sufficient we can run into other problems (especially with preemption on).

-chris
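[Editorial note: the plug flush Chris refers to lives in the scheduler entry path. The sketch below is a simplified paraphrase of kernel/sched/core.c around the 3.18 timeframe, not a verbatim copy; exact details may differ between versions. The point is that a task about to sleep with plugged I/O gets its plug list submitted before __schedule() runs, so plain schedule() already avoids bios being stranded in the plug list.]

/* Paraphrased from kernel/sched/core.c (circa 3.18); illustrative only. */
static inline void sched_submit_work(struct task_struct *tsk)
{
	if (!tsk->state || tsk_is_pi_blocked(tsk))
		return;
	/*
	 * If we are going to sleep and we have plugged IO queued,
	 * make sure to submit it to avoid deadlocks.
	 */
	if (blk_needs_flush_plug(tsk))
		blk_schedule_flush_plug(tsk);
}

asmlinkage __visible void __sched schedule(void)
{
	struct task_struct *tsk = current;

	sched_submit_work(tsk);	/* flush plugged bios before blocking */
	__schedule();
}

[The difference with io_schedule() is presumably the "slightly better trigger mechanism" Chris mentions later in the thread: io_schedule() flushes the plug synchronously in the caller's context (blk_flush_plug()) and marks the sleep as iowait, whereas the schedule() path defers the flush to kblockd via blk_schedule_flush_plug().]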
Re: [PATCH v3 10/11] Btrfs: fix possible deadlock caused by pending I/O in plug list
On Thu, 27 Nov 2014 09:39:56 +0800, Miao Xie wrote:
On Wed, 26 Nov 2014 10:02:23 -0500, Chris Mason wrote:
On Wed, Nov 26, 2014 at 8:04 AM, Miao Xie mi...@cn.fujitsu.com wrote:

The increase/decrease of the bio counter is on the I/O path, so we should use io_schedule() instead of schedule(); otherwise a deadlock might be triggered by the pending I/O in the plug list. io_schedule() can help us because it flushes all the pending I/O before the task goes to sleep.

Can you please describe this deadlock in more detail? schedule() also triggers a flush of the plug list, and if that's no longer sufficient we can run into other problems (especially with preemption on).

Sorry for my mistake. I forgot to check the current implementation of schedule(), which flushes the plug list unconditionally. Please ignore this patch.

I have updated my raid56-scrub-replace branch, please re-pull the branch.
https://github.com/miaoxie/linux-btrfs.git raid56-scrub-replace

Thanks
Miao

Thanks Miao

-chris