Re: kernel 4.8-rc5 kernel BUG at block/blk-core.c:2032!

2016-09-09 Thread Shaohua Li
On Fri, Sep 09, 2016 at 08:03:42PM +0200, Stefan Priebe - Profihost AG wrote:
> On 08.09.2016 at 19:33, Shaohua Li wrote:
> > On Thu, Sep 08, 2016 at 10:16:59AM -0600, Jens Axboe wrote:
> >> On 09/08/2016 02:23 AM, Stefan Priebe - Profihost AG wrote:
> >>> Hi,
> >>>
> >>> while trying Kernel 4.8-rc5 my raid5 breaks every few minutes.
> >>>
> >>> Trace:
> >>> [ cut here ]
> >>> kernel BUG at block/blk-core.c:2032!
> >>> invalid opcode:  [#1] SMP
> >>> Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport
> >>> iptable_filter ip_tables x_tables 8021q garp bonding sb_edac edac_core
> >>> x86_pkg_temp_thermal coretemp kvm_intel kvm i2c_i801 irqbypass i2c_smbus
> >>> ipmi_si crc32_pclmul i2c_core ghash_clmulni_intel shpchp ipmi_msghandler
> >>> button loop fuse btrfs dm_mod raid10 raid0 multipath linear raid456
> >>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
> >>> raid1 md_mod sg sd_mod ixgbe i40e mdio usbhid ehci_pci ehci_hcd ahci
> >>> usbcore ptp libahci usb_common megaraid_sas pps_core
> >>> CPU: 8 PID: 1105 Comm: md0_raid5 Not tainted 4.8.0-rc5-3-g3abda5c #2
> >>> Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
> >>> task: 97de5e1e task.stack: 97de597a
> >>> RIP: 0010:[] []
> >>> generic_make_request+0x1c0/0x1d0
> >>> RSP: 0018:97de597a3aa0 EFLAGS: 00010286
> >>> RAX: 97de5e1e RBX: 97dd227e5030 RCX: 
> >>> RDX: c001 RSI: 0001 RDI: 97de5e7d9db8
> >>> RBP: 97de597a3ad8 R08: 0008 R09: 
> >>> R10:  R11: 0001 R12: 
> >>> R13: 97de5aa20c00 R14: 02f0 R15: 97e65dce0e00
> >>> FS: () GS:97e67f20() 
> >>> knlGS:
> >>> CS: 0010 DS:  ES:  CR0: 80050033
> >>> CR2: 7f0e4e1ec000 CR3: 78c06000 CR4: 001406e0
> >>> Stack:
> >>> 97de597a3b50 1000  97dd227e4c80
> >>> 97de5aa20c00 02f0 97e65dce0e00 97de597a3ba0
> >>> c02595db c025e04b 0001597a3b01 00020006
> >>> Call Trace:
> >>> [] ops_run_io+0x3bb/0x990 [raid456]
> >>> [] ? raid_run_ops+0xefb/0x1520 [raid456]
> >>> [] handle_stripe+0x9a6/0x2280 [raid456]
> >>> [] ? default_wake_function+0x12/0x20
> >>> [] ? autoremove_wake_function+0x12/0x40
> >>> [] handle_active_stripes.isra.54+0x193/0x4b0 [raid456]
> >>> [] ? __release_stripe+0x15/0x20 [raid456]
> >>> [] raid5d+0x4a9/0x740 [raid456]
> >>> [] ? init_timer_key+0xa0/0xa0
> >>> [] md_thread+0x12b/0x130 [md_mod]
> >>> [] ? wait_woken+0x90/0x90
> >>> [] ? find_pers+0x70/0x70 [md_mod]
> >>> [] kthread+0xdb/0x100
> >>> [] ret_from_fork+0x1f/0x40
> >>> [] ? kthread_park+0x60/0x60
> >>> Code: bd 70 08 00 00 f0 49 83 ad 70 08 00 00 01 74 05 e9 5a ff ff ff 41
> >>> ff 95 80 08 00 00 e9 4e ff ff ff 48 c7 40 08 00 00 00 00 eb 8c <0f> 0b
> >>> 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
> >>> RIP [] generic_make_request+0x1c0/0x1d0
> >>> RSP 
> >>> ---[ end trace 457dbe5e9cdd3473 ]---
> >>
> >> CC'ing Shaohua - this is:
> >>
> >> BUG_ON(bio->bi_next);
> >>
> >> which doesn't look healthy.
> > 
> > Hi Stefan,
> > does the patch below help? It looks like a race condition was introduced recently.
> 
> Yes this one fixes it.

Thanks, will push to Linus soon.


Re: kernel 4.8-rc5 kernel BUG at block/blk-core.c:2032!

2016-09-09 Thread Stefan Priebe - Profihost AG
On 08.09.2016 at 19:33, Shaohua Li wrote:
> On Thu, Sep 08, 2016 at 10:16:59AM -0600, Jens Axboe wrote:
>> On 09/08/2016 02:23 AM, Stefan Priebe - Profihost AG wrote:
>>> Hi,
>>>
>>> while trying Kernel 4.8-rc5 my raid5 breaks every few minutes.
>>>
>>> Trace:
>>> [ cut here ]
>>> kernel BUG at block/blk-core.c:2032!
>>> invalid opcode:  [#1] SMP
>>> Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport
>>> iptable_filter ip_tables x_tables 8021q garp bonding sb_edac edac_core
>>> x86_pkg_temp_thermal coretemp kvm_intel kvm i2c_i801 irqbypass i2c_smbus
>>> ipmi_si crc32_pclmul i2c_core ghash_clmulni_intel shpchp ipmi_msghandler
>>> button loop fuse btrfs dm_mod raid10 raid0 multipath linear raid456
>>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
>>> raid1 md_mod sg sd_mod ixgbe i40e mdio usbhid ehci_pci ehci_hcd ahci
>>> usbcore ptp libahci usb_common megaraid_sas pps_core
>>> CPU: 8 PID: 1105 Comm: md0_raid5 Not tainted 4.8.0-rc5-3-g3abda5c #2
>>> Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
>>> task: 97de5e1e task.stack: 97de597a
>>> RIP: 0010:[] []
>>> generic_make_request+0x1c0/0x1d0
>>> RSP: 0018:97de597a3aa0 EFLAGS: 00010286
>>> RAX: 97de5e1e RBX: 97dd227e5030 RCX: 
>>> RDX: c001 RSI: 0001 RDI: 97de5e7d9db8
>>> RBP: 97de597a3ad8 R08: 0008 R09: 
>>> R10:  R11: 0001 R12: 
>>> R13: 97de5aa20c00 R14: 02f0 R15: 97e65dce0e00
>>> FS: () GS:97e67f20() knlGS:
>>> CS: 0010 DS:  ES:  CR0: 80050033
>>> CR2: 7f0e4e1ec000 CR3: 78c06000 CR4: 001406e0
>>> Stack:
>>> 97de597a3b50 1000  97dd227e4c80
>>> 97de5aa20c00 02f0 97e65dce0e00 97de597a3ba0
>>> c02595db c025e04b 0001597a3b01 00020006
>>> Call Trace:
>>> [] ops_run_io+0x3bb/0x990 [raid456]
>>> [] ? raid_run_ops+0xefb/0x1520 [raid456]
>>> [] handle_stripe+0x9a6/0x2280 [raid456]
>>> [] ? default_wake_function+0x12/0x20
>>> [] ? autoremove_wake_function+0x12/0x40
>>> [] handle_active_stripes.isra.54+0x193/0x4b0 [raid456]
>>> [] ? __release_stripe+0x15/0x20 [raid456]
>>> [] raid5d+0x4a9/0x740 [raid456]
>>> [] ? init_timer_key+0xa0/0xa0
>>> [] md_thread+0x12b/0x130 [md_mod]
>>> [] ? wait_woken+0x90/0x90
>>> [] ? find_pers+0x70/0x70 [md_mod]
>>> [] kthread+0xdb/0x100
>>> [] ret_from_fork+0x1f/0x40
>>> [] ? kthread_park+0x60/0x60
>>> Code: bd 70 08 00 00 f0 49 83 ad 70 08 00 00 01 74 05 e9 5a ff ff ff 41
>>> ff 95 80 08 00 00 e9 4e ff ff ff 48 c7 40 08 00 00 00 00 eb 8c <0f> 0b
>>> 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
>>> RIP [] generic_make_request+0x1c0/0x1d0
>>> RSP 
>>> ---[ end trace 457dbe5e9cdd3473 ]---
>>
>> CC'ing Shaohua - this is:
>>
>> BUG_ON(bio->bi_next);
>>
>> which doesn't look healthy.
> 
> Hi Stefan,
> does the patch below help? It looks like a race condition was introduced recently.

Yes this one fixes it.

Thanks.
Stefan

> 
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index b95c54c..ee7fc37 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -2423,10 +2423,10 @@ static void raid5_end_read_request(struct bio * bi)
>  		}
>  	}
>  	rdev_dec_pending(rdev, conf->mddev);
> +	bio_reset(bi);
>  	clear_bit(R5_LOCKED, &sh->dev[i].flags);
>  	set_bit(STRIPE_HANDLE, &sh->state);
>  	raid5_release_stripe(sh);
> -	bio_reset(bi);
>  }
>  
>  static void raid5_end_write_request(struct bio *bi)
> @@ -2498,6 +2498,7 @@ static void raid5_end_write_request(struct bio *bi)
>  	if (sh->batch_head && bi->bi_error && !replacement)
>  		set_bit(STRIPE_BATCH_ERR, &sh->batch_head->state);
>  
> +	bio_reset(bi);
>  	if (!test_and_clear_bit(R5_DOUBLE_LOCKED, &sh->dev[i].flags))
>  		clear_bit(R5_LOCKED, &sh->dev[i].flags);
>  	set_bit(STRIPE_HANDLE, &sh->state);
> @@ -2505,7 +2506,6 @@ static void raid5_end_write_request(struct bio *bi)
>  
>  	if (sh->batch_head && sh != sh->batch_head)
>  		raid5_release_stripe(sh->batch_head);
> -	bio_reset(bi);
>  }
>  
>  static void raid5_build_block(struct stripe_head *sh, int i, int previous)
> 


Re: kernel 4.8-rc5 kernel BUG at block/blk-core.c:2032!

2016-09-08 Thread Shaohua Li
On Thu, Sep 08, 2016 at 10:16:59AM -0600, Jens Axboe wrote:
> On 09/08/2016 02:23 AM, Stefan Priebe - Profihost AG wrote:
> > Hi,
> > 
> > while trying Kernel 4.8-rc5 my raid5 breaks every few minutes.
> > 
> > Trace:
> > [ cut here ]
> > kernel BUG at block/blk-core.c:2032!
> > invalid opcode:  [#1] SMP
> > Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport
> > iptable_filter ip_tables x_tables 8021q garp bonding sb_edac edac_core
> > x86_pkg_temp_thermal coretemp kvm_intel kvm i2c_i801 irqbypass i2c_smbus
> > ipmi_si crc32_pclmul i2c_core ghash_clmulni_intel shpchp ipmi_msghandler
> > button loop fuse btrfs dm_mod raid10 raid0 multipath linear raid456
> > async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
> > raid1 md_mod sg sd_mod ixgbe i40e mdio usbhid ehci_pci ehci_hcd ahci
> > usbcore ptp libahci usb_common megaraid_sas pps_core
> > CPU: 8 PID: 1105 Comm: md0_raid5 Not tainted 4.8.0-rc5-3-g3abda5c #2
> > Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
> > task: 97de5e1e task.stack: 97de597a
> > RIP: 0010:[] []
> > generic_make_request+0x1c0/0x1d0
> > RSP: 0018:97de597a3aa0 EFLAGS: 00010286
> > RAX: 97de5e1e RBX: 97dd227e5030 RCX: 
> > RDX: c001 RSI: 0001 RDI: 97de5e7d9db8
> > RBP: 97de597a3ad8 R08: 0008 R09: 
> > R10:  R11: 0001 R12: 
> > R13: 97de5aa20c00 R14: 02f0 R15: 97e65dce0e00
> > FS: () GS:97e67f20() knlGS:
> > CS: 0010 DS:  ES:  CR0: 80050033
> > CR2: 7f0e4e1ec000 CR3: 78c06000 CR4: 001406e0
> > Stack:
> > 97de597a3b50 1000  97dd227e4c80
> > 97de5aa20c00 02f0 97e65dce0e00 97de597a3ba0
> > c02595db c025e04b 0001597a3b01 00020006
> > Call Trace:
> > [] ops_run_io+0x3bb/0x990 [raid456]
> > [] ? raid_run_ops+0xefb/0x1520 [raid456]
> > [] handle_stripe+0x9a6/0x2280 [raid456]
> > [] ? default_wake_function+0x12/0x20
> > [] ? autoremove_wake_function+0x12/0x40
> > [] handle_active_stripes.isra.54+0x193/0x4b0 [raid456]
> > [] ? __release_stripe+0x15/0x20 [raid456]
> > [] raid5d+0x4a9/0x740 [raid456]
> > [] ? init_timer_key+0xa0/0xa0
> > [] md_thread+0x12b/0x130 [md_mod]
> > [] ? wait_woken+0x90/0x90
> > [] ? find_pers+0x70/0x70 [md_mod]
> > [] kthread+0xdb/0x100
> > [] ret_from_fork+0x1f/0x40
> > [] ? kthread_park+0x60/0x60
> > Code: bd 70 08 00 00 f0 49 83 ad 70 08 00 00 01 74 05 e9 5a ff ff ff 41
> > ff 95 80 08 00 00 e9 4e ff ff ff 48 c7 40 08 00 00 00 00 eb 8c <0f> 0b
> > 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
> > RIP [] generic_make_request+0x1c0/0x1d0
> > RSP 
> > ---[ end trace 457dbe5e9cdd3473 ]---
> 
> CC'ing Shaohua - this is:
> 
> BUG_ON(bio->bi_next);
> 
> which doesn't look healthy.

Hi Stefan,
does the patch below help? It looks like a race condition was introduced recently.


diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b95c54c..ee7fc37 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2423,10 +2423,10 @@ static void raid5_end_read_request(struct bio * bi)
 		}
 	}
 	rdev_dec_pending(rdev, conf->mddev);
+	bio_reset(bi);
 	clear_bit(R5_LOCKED, &sh->dev[i].flags);
 	set_bit(STRIPE_HANDLE, &sh->state);
 	raid5_release_stripe(sh);
-	bio_reset(bi);
 }
 
 static void raid5_end_write_request(struct bio *bi)
@@ -2498,6 +2498,7 @@ static void raid5_end_write_request(struct bio *bi)
 	if (sh->batch_head && bi->bi_error && !replacement)
 		set_bit(STRIPE_BATCH_ERR, &sh->batch_head->state);
 
+	bio_reset(bi);
 	if (!test_and_clear_bit(R5_DOUBLE_LOCKED, &sh->dev[i].flags))
 		clear_bit(R5_LOCKED, &sh->dev[i].flags);
 	set_bit(STRIPE_HANDLE, &sh->state);
@@ -2505,7 +2506,6 @@ static void raid5_end_write_request(struct bio *bi)
 
 	if (sh->batch_head && sh != sh->batch_head)
 		raid5_release_stripe(sh->batch_head);
-	bio_reset(bi);
 }
 
 static void raid5_build_block(struct stripe_head *sh, int i, int previous)
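
The patch moves bio_reset() ahead of the point where R5_LOCKED is cleared and
the stripe is released, presumably so that the reset can no longer overlap with
another context reusing the same bio. The userspace sketch below illustrates
only that ordering idea. Everything in it (mock_bio, mock_stripe,
other_thread_step, the complete_* helpers) is a hypothetical stand-in rather
than kernel code, and the concurrent interleaving is simulated by an explicit
function call instead of real threads.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the kernel's struct bio or struct stripe_head. */
struct mock_bio {
	int sector;      /* payload set up by whoever submits the request */
	bool submitted;  /* true while the request counts as in flight */
};

struct mock_stripe {
	bool locked;          /* plays the role of R5_LOCKED in this sketch */
	struct mock_bio req;  /* embedded request that gets reused */
};

/* Wipe the request back to its initial state. */
static void mock_bio_reset(struct mock_bio *b)
{
	memset(b, 0, sizeof(*b));
}

/* The "other context": once the stripe is unlocked, it reuses the request. */
static void other_thread_step(struct mock_stripe *sh)
{
	if (sh->locked)
		return;
	sh->locked = true;
	sh->req.sector = 42;
	sh->req.submitted = true;
}

/* Old ordering: unlock first, reset last.
 * Returns whether the newly submitted request survived. */
static bool complete_reset_last(struct mock_stripe *sh)
{
	sh->locked = false;
	other_thread_step(sh);     /* worst-case interleaving, made explicit */
	mock_bio_reset(&sh->req);  /* wipes the request that was just reused */
	return sh->req.submitted;
}

/* New ordering: reset while still locked, then unlock. */
static bool complete_reset_first(struct mock_stripe *sh)
{
	mock_bio_reset(&sh->req);
	sh->locked = false;
	other_thread_step(sh);     /* same interleaving, now harmless */
	return sh->req.submitted;
}
```

With a stripe that starts out locked, complete_reset_last() reports that the
reused request was wiped, while complete_reset_first() leaves it intact.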


Re: kernel 4.8-rc5 kernel BUG at block/blk-core.c:2032!

2016-09-08 Thread Jens Axboe

On 09/08/2016 02:23 AM, Stefan Priebe - Profihost AG wrote:

Hi,

while trying Kernel 4.8-rc5 my raid5 breaks every few minutes.

Trace:
[ cut here ]
kernel BUG at block/blk-core.c:2032!
invalid opcode:  [#1] SMP
Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport
iptable_filter ip_tables x_tables 8021q garp bonding sb_edac edac_core
x86_pkg_temp_thermal coretemp kvm_intel kvm i2c_i801 irqbypass i2c_smbus
ipmi_si crc32_pclmul i2c_core ghash_clmulni_intel shpchp ipmi_msghandler
button loop fuse btrfs dm_mod raid10 raid0 multipath linear raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
raid1 md_mod sg sd_mod ixgbe i40e mdio usbhid ehci_pci ehci_hcd ahci
usbcore ptp libahci usb_common megaraid_sas pps_core
CPU: 8 PID: 1105 Comm: md0_raid5 Not tainted 4.8.0-rc5-3-g3abda5c #2
Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
task: 97de5e1e task.stack: 97de597a
RIP: 0010:[] []
generic_make_request+0x1c0/0x1d0
RSP: 0018:97de597a3aa0 EFLAGS: 00010286
RAX: 97de5e1e RBX: 97dd227e5030 RCX: 
RDX: c001 RSI: 0001 RDI: 97de5e7d9db8
RBP: 97de597a3ad8 R08: 0008 R09: 
R10:  R11: 0001 R12: 
R13: 97de5aa20c00 R14: 02f0 R15: 97e65dce0e00
FS: () GS:97e67f20() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 7f0e4e1ec000 CR3: 78c06000 CR4: 001406e0
Stack:
97de597a3b50 1000  97dd227e4c80
97de5aa20c00 02f0 97e65dce0e00 97de597a3ba0
c02595db c025e04b 0001597a3b01 00020006
Call Trace:
[] ops_run_io+0x3bb/0x990 [raid456]
[] ? raid_run_ops+0xefb/0x1520 [raid456]
[] handle_stripe+0x9a6/0x2280 [raid456]
[] ? default_wake_function+0x12/0x20
[] ? autoremove_wake_function+0x12/0x40
[] handle_active_stripes.isra.54+0x193/0x4b0 [raid456]
[] ? __release_stripe+0x15/0x20 [raid456]
[] raid5d+0x4a9/0x740 [raid456]
[] ? init_timer_key+0xa0/0xa0
[] md_thread+0x12b/0x130 [md_mod]
[] ? wait_woken+0x90/0x90
[] ? find_pers+0x70/0x70 [md_mod]
[] kthread+0xdb/0x100
[] ret_from_fork+0x1f/0x40
[] ? kthread_park+0x60/0x60
Code: bd 70 08 00 00 f0 49 83 ad 70 08 00 00 01 74 05 e9 5a ff ff ff 41
ff 95 80 08 00 00 e9 4e ff ff ff 48 c7 40 08 00 00 00 00 eb 8c <0f> 0b
66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
RIP [] generic_make_request+0x1c0/0x1d0
RSP 
---[ end trace 457dbe5e9cdd3473 ]---


CC'ing Shaohua - this is:

BUG_ON(bio->bi_next);

which doesn't look healthy.

--
Jens Axboe
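
As background on the check quoted above: BUG_ON(bio->bi_next) fires when a bio
that is being submitted still has its next pointer set, which suggests it is
still linked into some list. The sketch below is a minimal userspace model of
that invariant, assuming a simple singly linked tail-append list. The names
mock_bio, mock_bio_list, mock_list_add and mock_can_submit are hypothetical and
only loosely modeled on the kernel's bio list handling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request with an intrusive "next" link. */
struct mock_bio {
	struct mock_bio *next;  /* plays the role of bi_next */
};

struct mock_bio_list {
	struct mock_bio *head;
	struct mock_bio *tail;
};

/* Append a request to the tail of the list. */
static void mock_list_add(struct mock_bio_list *l, struct mock_bio *b)
{
	b->next = NULL;
	if (l->tail)
		l->tail->next = b;
	else
		l->head = b;
	l->tail = b;
}

/* A request is safe to submit only if it is not chained to another one;
 * this is the condition whose violation the BUG_ON reports. */
static bool mock_can_submit(const struct mock_bio *b)
{
	return b->next == NULL;
}
```

After two requests are appended, the first one has a non-NULL next pointer, so
submitting it again without unlinking it first would trip the check.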


