Re: kernel BUG at block/bio.c:1785 while trying to issue a discard to LVM on RAID1 md

2016-10-06 Thread Sitsofe Wheeler
On 5 October 2016 at 22:39, Shaohua Li  wrote:
> On Wed, Oct 05, 2016 at 10:31:11PM +0100, Sitsofe Wheeler wrote:
>> On 3 October 2016 at 17:47, Sitsofe Wheeler  wrote:
>> >
>> > While trying to do a discard (via blkdiscard --length 1048576
>> > /dev/) to an LVM device atop a two disk md RAID1 the
>> > following oops was generated:
>> >
>> > [  103.306243] md: resync of RAID array md127
>> > [  103.306246] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>> > [  103.306248] md: using maximum available idle IO bandwidth (but not
>> > more than 20 KB/sec) for resync.
>> > [  103.306251] md: using 128k window, over a total of 244194432k.
>> > [  103.308158] [ cut here ]
>> > [  103.308205] kernel BUG at block/bio.c:1785!
>>
>> This still seems to be here but slightly modified with a 4.8.0 kernel:
>
> Does this fix the issue? It looks like there is an IO error.
>
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 21dc00e..349eb11 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2196,7 +2196,6 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
> wbio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, 
> mddev);
> }
>
> -   bio_set_op_attrs(wbio, REQ_OP_WRITE, 0);
> wbio->bi_iter.bi_sector = r1_bio->sector;
> wbio->bi_iter.bi_size = r1_bio->sectors << 9;
>

Yes, the patch above fixes the issue and makes blkdiscard just report
that the BLKDISCARD ioctl failed. Since having this patch applied
means the issue seen in
http://www.gossamer-threads.com/lists/linux/kernel/2538757?do=post_view_threaded#2538757
(BUG at arch/x86/kernel/pci-nommu.c:66 / BUG at
./include/linux/scatterlist.h:90) can't be reached, does that mean
whatever was seen there is also spurious?
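
For reference, "just report that the ioctl failed" means blkdiscard now
sees the BLKDISCARD ioctl return -1 with an errno rather than the box
oopsing. A minimal sketch of that ioctl, roughly what blkdiscard does
under the hood - the device path, offset and length below are
placeholders, not the exact values used here:

/*
 * Minimal sketch: issue a discard via the BLKDISCARD ioctl and just
 * report the failure.  Placeholder device path and range.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKDISCARD */

int main(void)
{
        uint64_t range[2] = { 0, 1048576 }; /* offset, length in bytes */
        int fd = open("/dev/placeholder-lv", O_WRONLY); /* placeholder */

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (ioctl(fd, BLKDISCARD, &range) < 0)
                fprintf(stderr, "BLKDISCARD failed: %s\n",
                        strerror(errno));
        close(fd);
        return 0;
}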

Additionally, as this issue seems to have been a problem going back
to at least the 3.18 kernels, would a fix similar to this be eligible
for stable kernels?

-- 
Sitsofe | http://sucs.org/~sits/


Re: kernel BUG at block/bio.c:1785 while trying to issue a discard to LVM on RAID1 md

2016-10-05 Thread Shaohua Li
On Wed, Oct 05, 2016 at 10:31:11PM +0100, Sitsofe Wheeler wrote:
> On 3 October 2016 at 17:47, Sitsofe Wheeler  wrote:
> >
> > While trying to do a discard (via blkdiscard --length 1048576
> > /dev/) to an LVM device atop a two disk md RAID1 the
> > following oops was generated:
> >
> > [  103.306243] md: resync of RAID array md127
> > [  103.306246] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> > [  103.306248] md: using maximum available idle IO bandwidth (but not
> > more than 20 KB/sec) for resync.
> > [  103.306251] md: using 128k window, over a total of 244194432k.
> > [  103.308158] [ cut here ]
> > [  103.308205] kernel BUG at block/bio.c:1785!
> 
> This still seems to be here but slightly modified with a 4.8.0 kernel:

Does this fix the issue? It looks like there is an IO error.


diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 21dc00e..349eb11 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2196,7 +2196,6 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
wbio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, 
mddev);
}
 
-   bio_set_op_attrs(wbio, REQ_OP_WRITE, 0);
wbio->bi_iter.bi_sector = r1_bio->sector;
wbio->bi_iter.bi_size = r1_bio->sectors << 9;
 

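To spell out what the hunk changes, a simplified sketch of the
narrow_write_error() lines it touches (not the exact drivers/md/raid1.c
code): with the bio_set_op_attrs() call dropped, the cloned bio keeps
the master bio's operation, so a retried discard stays a discard
instead of being turned into a plain write.

        wbio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, mddev);

        /*
         * The deleted line forced the op at this point:
         *     bio_set_op_attrs(wbio, REQ_OP_WRITE, 0);
         * For a failed discard that presumably produced a "write" bio
         * with bi_size set but no data pages, which looks like what
         * bio_split() then BUGs on.  Without it, the clone keeps
         * REQ_OP_DISCARD from the master bio.
         */
        wbio->bi_iter.bi_sector = r1_bio->sector;
        wbio->bi_iter.bi_size = r1_bio->sectors << 9;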

Re: kernel BUG at block/bio.c:1785 while trying to issue a discard to LVM on RAID1 md

2016-10-05 Thread Sitsofe Wheeler
On 3 October 2016 at 17:47, Sitsofe Wheeler  wrote:
>
> While trying to do a discard (via blkdiscard --length 1048576
> /dev/) to an LVM device atop a two disk md RAID1 the
> following oops was generated:
>
> [  103.306243] md: resync of RAID array md127
> [  103.306246] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [  103.306248] md: using maximum available idle IO bandwidth (but not
> more than 20 KB/sec) for resync.
> [  103.306251] md: using 128k window, over a total of 244194432k.
> [  103.308158] [ cut here ]
> [  103.308205] kernel BUG at block/bio.c:1785!

This still seems to be here but slightly modified with a 4.8.0 kernel:

[  156.501373] [ cut here ]
[  156.502374] kernel BUG at block/bio.c:1785!
[  156.503139] invalid opcode:  [#1] SMP
[  156.503816] Modules linked in: vmw_vsock_vmci_transport vsock
sb_edac edac_core intel_powerclamp coretemp raid1 crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel ppdev intel_rapl_perf vmxnet3
vmw_balloon pcspkr joydev acpi_cpufreq tpm_tis i2c_piix4 tpm_tis_core
vmw_vmci parport_pc fjes tpm shpchp parport dm_multipath vmwgfx
drm_kms_helper ttm drm crc32c_intel serio_raw vmw_pvscsi ata_generic
pata_acpi
[  156.510208] CPU: 0 PID: 407 Comm: md127_raid1 Not tainted
4.8.0-1.vanilla.knurd.1.fc24.x86_64 #1
[  156.511559] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[  156.513209] task: 9944f9ed9d80 task.stack: 9944f903c000
[  156.514159] RIP: 0010:[]  []
bio_split+0x8a/0x90
[  156.515471] RSP: 0018:9944f903fb38  EFLAGS: 00010246
[  156.516323] RAX: fffdd1f3 RBX:  RCX: 9944ff05cf00
[  156.517456] RDX: 0240 RSI:  RDI: 9944f830b700
[  156.518573] RBP: 9944f903fb50 R08: 0001 R09: 9944f830b700
[  156.519680] R10: 9944f903fbb0 R11:  R12: 
[  156.520807] R13:  R14:  R15: 9944ff1388e0
[  156.521916] FS:  () GS:9944fec0()
knlGS:
[  156.523187] CS:  0010 DS:  ES:  CR0: 80050033
[  156.524080] CR2: 561c09a99108 CR3: 3ab2e000 CR4: 001406f0
[  156.525228] Stack:
[  156.525571]   0200 
9944f903fbf0
[  156.526812]  bc3b7f0e 9944f8928000 9944fec19580
9944ff05cf00
[  156.528072]  0005803fff747000 9944f903fc00 0001f8928080
9944f830b700
[  156.529308] Call Trace:
[  156.529706]  [] blk_queue_split+0x18e/0x640
[  156.530609]  [] blk_queue_bio+0x53/0x3c0
[  156.531487]  [] generic_make_request+0xf2/0x1d0
[  156.532436]  [] submit_bio+0x7d/0x150
[  156.533272]  [] ? bio_clone_bioset+0x14e/0x360
[  156.534211]  [] submit_bio_wait+0x5d/0x90
[  156.535108]  [] raid1d+0x405/0x1010 [raid1]
[  156.536040]  [] ? set_next_entity+0x49/0xd0
[  156.536938]  [] md_thread+0x139/0x150
[  156.537766]  [] ? prepare_to_wait_event+0xf0/0xf0
[  156.538758]  [] ? find_pers+0x70/0x70
[  156.539578]  [] kthread+0xd8/0xf0
[  156.540355]  [] ret_from_fork+0x1f/0x40
[  156.541208]  [] ? kthread_worker_fn+0x180/0x180
[  156.542186] Code: 44 89 e2 4c 89 ef e8 d6 51 03 00 41 8b 75 28 48
89 df e8 3a d3 ff ff 5b 4c 89 e8 41 5c 41 5d 5d c3 e8 1b fc ff ff 49
89 c5 eb b6 <0f> 0b 0f 0b 66 90 0f 1f 44 00 00 48 8b 07 55 48 89 e5 48
85 c0
[  156.546655] RIP  [] bio_split+0x8a/0x90
[  156.547535]  RSP 
[  156.548127] ---[ end trace bdaed0bbc089e451 ]---
[  156.549888] BUG: unable to handle kernel paging request at 4ebddcbe
[  156.551080] IP: [] __wake_up_common+0x2b/0x80
[  156.552036] PGD 0
[  156.552430] Oops:  [#2] SMP
[  156.552958] Modules linked in: vmw_vsock_vmci_transport vsock
sb_edac edac_core intel_powerclamp coretemp raid1 crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel ppdev intel_rapl_perf vmxnet3
vmw_balloon pcspkr joydev acpi_cpufreq tpm_tis i2c_piix4 tpm_tis_core
vmw_vmci parport_pc fjes tpm shpchp parport dm_multipath vmwgfx
drm_kms_helper ttm drm crc32c_intel serio_raw vmw_pvscsi ata_generic
pata_acpi
[  156.559295] CPU: 0 PID: 407 Comm: md127_raid1 Tainted: G  D
4.8.0-1.vanilla.knurd.1.fc24.x86_64 #1
[  156.560844] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[  156.562491] task: 9944f9ed9d80 task.stack: 9944f903c000
[  156.563418] RIP: 0010:[]  []
__wake_up_common+0x2b/0x80
[  156.564740] RSP: 0018:9944f903fe38  EFLAGS: 00010096
[  156.565588] RAX: 0282 RBX: 9944f903ff10 RCX: 
[  156.566703] RDX: 4ebddcbe RSI: 0003 RDI: 9944f903ff10
[  156.567819] RBP: 9944f903fe70 R08:  R09: 
[  156.568924] R10:  R11: fa83b2da R12: 9944f903ff18
[  156.570038] R13: 0282 R14: 0001 R15: 0003
[  156.571158] FS:  


kernel BUG at block/bio.c:1785 while trying to issue a discard to LVM on RAID1 md

2016-10-03 Thread Sitsofe Wheeler
Hi,

While trying to do a discard (via blkdiscard --length 1048576
/dev/) to an LVM device atop a two disk md RAID1 the
following oops was generated:

[  103.306243] md: resync of RAID array md127
[  103.306246] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  103.306248] md: using maximum available idle IO bandwidth (but not
more than 20 KB/sec) for resync.
[  103.306251] md: using 128k window, over a total of 244194432k.
[  103.308158] [ cut here ]
[  103.308205] kernel BUG at block/bio.c:1785!
[  103.308243] invalid opcode:  [#1] SMP
[  103.308279] Modules linked in: vmw_vsock_vmci_transport vsock
sb_edac raid1 edac_core intel_powerclamp coretemp crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel vmw_balloon ppdev intel_rapl_perf
joydev vmxnet3 parport_pc vmw_vmci parport shpchp acpi_cpufreq fjes
tpm_tis tpm i2c_piix4 dm_multipath vmwgfx drm_kms_helper ttm drm
crc32c_intel serio_raw vmw_pvscsi ata_generic pata_acpi
[  103.308641] CPU: 0 PID: 391 Comm: md127_raid1 Not tainted
4.7.5-200.fc24.x86_64 #1
[  103.308699] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[  103.308784] task: 88003beb ti: 8816c000 task.ti:
8816c000
[  103.308841] RIP: 0010:[]  []
bio_split+0x82/0x90
[  103.308921] RSP: 0018:8816fb38  EFLAGS: 00010246
[  103.308972] RAX: 00057fff RBX:  RCX: 88003f017a80
[  103.309038] RDX: 0240 RSI:  RDI: 88003bc01500
[  103.309110] RBP: 8816fb50 R08: 0080 R09: 88003bc01500
[  103.310652] R10: 8816fbb0 R11:  R12: 
[  103.312043] R13:  R14: 0002 R15: 88003f168900
[  103.313419] FS:  () GS:88003ec0()
knlGS:
[  103.314815] CS:  0010 DS:  ES:  CR0: 80050033
[  103.315731] CR2: 7fdd4daeb400 CR3: 3b2c5000 CR4: 001406f0
[  103.316328] Stack:
[  103.316879]   01fe 
8816fbf0
[  103.317473]  a23b1afd 0246 02011200
88003f017a80
[  103.318050]  00058000 00803e8013c0 8816fc00
88003bc01500
[  103.318626] Call Trace:
[  103.319196]  [] blk_queue_split+0x2cd/0x620
[  103.319780]  [] blk_queue_bio+0x53/0x3d0
[  103.320378]  [] generic_make_request+0xf2/0x1d0
[  103.320960]  [] submit_bio+0x76/0x160
[  103.321535]  [] submit_bio_wait+0x63/0x90
[  103.322112]  [] raid1d+0x3ea/0xfb0 [raid1]
[  103.322688]  [] ? schedule_timeout+0x1ac/0x270
[  103.323268]  [] md_thread+0x139/0x150
[  103.323848]  [] ? prepare_to_wait_event+0xf0/0xf0
[  103.324417]  [] ? find_pers+0x70/0x70
[  103.324988]  [] kthread+0xd8/0xf0
[  103.325562]  [] ret_from_fork+0x1f/0x40
[  103.326108]  [] ? kthread_worker_fn+0x180/0x180
[  103.326654] Code: 44 89 e2 4c 89 ef e8 1e 47 03 00 41 8b 75 28 48
89 df e8 92 d6 ff ff 5b 4c 89 e8 41 5c 41 5d 5d c3 e8 63 fc ff ff 49
89 c5 eb b6 <0f> 0b 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
48 8b
[  103.328410] RIP  [] bio_split+0x82/0x90
[  103.328943]  RSP 
[  103.329474] ---[ end trace f093e2f8fabdb9b3 ]---

The kernel is 4.7.5-200.fc24.x86_64 from Fedora 24. While md is stuck
in the PENDING state, the oops seems to be reproducible. If md is in
the middle of resyncing, the system locks up entirely without printing
anything instead...

The "disks" are raw disk mappings of SSDs on ESXi being passed to a
VM. Here's some initial /sys/block/ output before any discards
are issued:
 # grep . /sys/block/sdc/queue/*
/sys/block/sdc/queue/add_random:0
/sys/block/sdc/queue/discard_granularity:512
/sys/block/sdc/queue/discard_max_bytes:4294966784
/sys/block/sdc/queue/discard_max_hw_bytes:4294966784
/sys/block/sdc/queue/discard_zeroes_data:0
/sys/block/sdc/queue/hw_sector_size:512
/sys/block/sdc/queue/io_poll:0
grep: /sys/block/sdc/queue/iosched: Is a directory
/sys/block/sdc/queue/iostats:1
/sys/block/sdc/queue/logical_block_size:512
/sys/block/sdc/queue/max_hw_sectors_kb:32767
/sys/block/sdc/queue/max_integrity_segments:0
/sys/block/sdc/queue/max_sectors_kb:1280
/sys/block/sdc/queue/max_segments:128
/sys/block/sdc/queue/max_segment_size:65536
/sys/block/sdc/queue/minimum_io_size:512
/sys/block/sdc/queue/nomerges:0
/sys/block/sdc/queue/nr_requests:128
/sys/block/sdc/queue/optimal_io_size:0
/sys/block/sdc/queue/physical_block_size:512
/sys/block/sdc/queue/read_ahead_kb:128
/sys/block/sdc/queue/rotational:0
/sys/block/sdc/queue/rq_affinity:1
/sys/block/sdc/queue/scheduler:[noop] deadline cfq
/sys/block/sdc/queue/write_cache:write through
/sys/block/sdc/queue/write_same_max_bytes:0

-- 
Sitsofe | http://sucs.org/~sits/

