Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2016-01-05 Thread Eryu Guan
On Tue, Jan 05, 2016 at 06:58:25PM -0500, Martin K. Petersen wrote:
> > "Eryu" == Eryu Guan  writes:
> 
> Eryu> Any updates on this? It's still reproducible with 4.4-rc8 kernel,
> Eryu> and still blocks some of my tests :)
> 
> http://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git/log/?h=4.4/scsi-fixes
> 
> It just hasn't made it to Linus yet...

Great to hear that, thanks! (I'm not subscribed to linux-scsi@, so I
didn't see the patch being sent out.)

Eryu
> 
> -- 
> Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2016-01-05 Thread Eryu Guan
On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
> Hi,
> 
> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
> easily on ppc64 host by:
> 
> modprobe scsi_debug sector_size=512 physblk_exp=3 dev_size_mb=256
> 
> And I bisected to this commit
> 
>   commit ca369d51b3e1649be4a72addd6d6a168cfb3f537
>   Author: Martin K. Petersen 
>   Date:   Fri Nov 13 16:46:48 2015 -0500
> 
>   block/sd: Fix device-imposed transfer length limits
> 
> I confirmed by reverting this commit on top of 4.4-rc4 kernel and test
> passed.
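
For reference, the module parameters in the reproducer determine the disk
geometry that sd reports in dmesg (524288 512-byte logical blocks,
4096-byte physical blocks). Below is a minimal sketch of that arithmetic in
C. The helper names are hypothetical, and it assumes physblk_exp is the
power-of-two exponent relating the physical block size to the logical one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; only the arithmetic mirrors the thread. */

/* Physical block size: logical sector size shifted left by physblk_exp. */
static uint32_t physical_block_bytes(uint32_t sector_size, unsigned physblk_exp)
{
	assert(physblk_exp < 16);	/* keep the shift well inside 32 bits */
	return sector_size << physblk_exp;
}

/* Number of logical blocks for a device of dev_size_mb MiB. */
static uint64_t logical_block_count(uint32_t dev_size_mb, uint32_t sector_size)
{
	assert(sector_size != 0);
	return ((uint64_t)dev_size_mb << 20) / sector_size;
}
```

With sector_size=512, physblk_exp=3 and dev_size_mb=256, these helpers give
4096 bytes and 524288 blocks, matching the dmesg lines quoted in the thread.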

Hi,

Any updates on this? It's still reproducible with 4.4-rc8 kernel, and
still blocks some of my tests :)

Thanks,
Eryu


Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2016-01-05 Thread Martin K. Petersen
> "Eryu" == Eryu Guan  writes:

Eryu> Any updates on this? It's still reproducible with 4.4-rc8 kernel,
Eryu> and still blocks some of my tests :)

http://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git/log/?h=4.4/scsi-fixes

It just hasn't made it to Linus yet...

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Eryu Guan
On Tue, Dec 15, 2015 at 09:27:14PM +0800, Ming Lei wrote:
> On Tue, Dec 15, 2015 at 9:06 PM, Eryu Guan  wrote:
> > On Tue, Dec 15, 2015 at 08:06:47PM +0800, Ming Lei wrote:
> >> On Tue, Dec 15, 2015 at 7:20 PM, Eryu Guan  wrote:
> >> > On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
> >> >> Hi,
> >> >>
> >> >> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
> >> >> easily on ppc64 host by:
> >> >
> >> > This is still reproducible with 4.4-rc5 kernel.
> >>
> >> Could you capture the debug log after applying the attached patch and
> >> reproducing the issue?
> >
> > Thanks for looking into this! dmesg shows:
> >
> > [  686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0
> 
> Then I guess queue_max_sectors(q) is bad. Could you apply the
> attached patch (along with the previous patch) and post the log?

[  301.279018] blk_bio_segment_split: nseg 0, max_secs 64, max segs 2048
[  301.279023]   bv.len 65536, bv.offset 0
[  301.279026] bio_split: sectors 0, bio_sectors 128, bi_rw 0

If the full call trace is needed, please let me know.

Thanks,
Eryu


Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Eryu Guan
On Tue, Dec 15, 2015 at 11:38:41PM +0800, Ming Lei wrote:
> On Tue, 15 Dec 2015 21:06:31 +0800
> Eryu Guan  wrote:
> 
> > On Tue, Dec 15, 2015 at 08:06:47PM +0800, Ming Lei wrote:
> > > On Tue, Dec 15, 2015 at 7:20 PM, Eryu Guan  wrote:
> > > > On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
> > > >> Hi,
> > > >>
> > > >> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
> > > >> easily on ppc64 host by:
> > > >
> > > > This is still reproducible with 4.4-rc5 kernel.
> > > 
> > > Could you capture the debug log after applying the attached patch and
> > > reproducing the issue?
> > 
> > Thanks for looking into this! dmesg shows:
> > 
> > [  686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0
> 
> I guess the following patch should fix the issue. Commit ca369d51b3
> uses OPTIMAL TRANSFER LENGTH to set limits->max_sectors, which
> may be smaller than one page.
> 
> I don't understand the idea behind this change, Martin, could
> you explain it a bit?
> 
> ---
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 3d22fc3..d66d362 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2889,10 +2889,11 @@ static int sd_revalidate_disk(struct gendisk *disk)
>  	 */
>  	if (sdkp->opt_xfer_blocks && sdkp->opt_xfer_blocks <= dev_max &&
>  	    sdkp->opt_xfer_blocks <= SD_DEF_XFER_BLOCKS)
> -		rw_max = q->limits.io_opt =
> +		q->limits.io_opt =
>  			logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
> -	else
> -		rw_max = BLK_DEF_MAX_SECTORS;
> +
> +	rw_max = min_t(unsigned, BLK_DEF_MAX_SECTORS,
> +		       q->limits.max_dev_sectors);
>  
>  	/* Combine with controller limits */
>  	q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));

I tested this patch: no BUG_ON this time, and the debug messages are not
triggered either.

Thanks,
Eryu


Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Martin K. Petersen
> "Eryu" == Eryu Guan  writes:

Eryu,

Does the patch below fix the issue?

-- 
Martin K. Petersen  Oracle Linux Engineering

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3d22fc3e3c1a..d1eb7aa78b8d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2667,8 +2667,9 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
 
 	if (buffer[3] == 0x3c) {
 		unsigned int lba_count, desc_count;
+		u64 max_ws = get_unaligned_be64(&buffer[36]);
 
-		sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
+		sdkp->max_ws_blocks = (u32)max_ws;
 
 		if (!sdkp->lbpme)
 			goto out;
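
For readers unfamiliar with the helper used in the patch above: it reads a
64-bit big-endian field starting at byte 36 of the buffer, and the patch
stores the result in a u64 local before truncating it to 32 bits. The
following is a user-space sketch of those two steps only. The names
read_be64 and low32 are hypothetical stand-ins, not kernel functions, and
the sketch says nothing about why the original one-line form misbehaved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for a big-endian 64-bit read from an unaligned
 * byte pointer; the name read_be64 is hypothetical. */
static uint64_t read_be64(const unsigned char *p)
{
	uint64_t v = 0;

	assert(p != NULL);
	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];	/* most significant byte first */
	return v;
}

/* Keep only the low 32 bits, as the (u32) cast in the patch does. */
static uint32_t low32(uint64_t v)
{
	return (uint32_t)v;
}
```

Reading byte by byte avoids any alignment requirement on the pointer, which
is the point of an "unaligned" accessor.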


Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Eryu Guan
On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
> Hi,
> 
> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
> easily on ppc64 host by:

This is still reproducible with 4.4-rc5 kernel.

Thanks,
Eryu

> 
> modprobe scsi_debug sector_size=512 physblk_exp=3 dev_size_mb=256
> 
> And I bisected to this commit
> 
>   commit ca369d51b3e1649be4a72addd6d6a168cfb3f537
>   Author: Martin K. Petersen 
>   Date:   Fri Nov 13 16:46:48 2015 -0500
> 
>   block/sd: Fix device-imposed transfer length limits
> 
> I confirmed by reverting this commit on top of 4.4-rc4 kernel and test
> passed.
> 
> Thanks,
> Eryu
> 
> P.S. dmesg log
> [  817.477557] scsi_debug:sdebug_driver_probe: host protection 
> [  817.477571] scsi host1: scsi_debug, version 1.85 [20141022], 
> dev_size_mb=256, opts=0x0 
> [  817.478202] scsi 1:0:0:0: Direct-Access Linuxscsi_debug   0184 
> PQ: 0 ANSI: 6 
> [  817.478733] sd 1:0:0:0: Attached scsi generic sg1 type 0 
> [  817.496144] sd 1:0:0:0: [sdb] 524288 512-byte logical blocks: (268 MB/256 
> MiB) 
> [  817.496155] sd 1:0:0:0: [sdb] 4096-byte physical blocks 
> [  817.506142] sd 1:0:0:0: [sdb] Write Protect is off 
> [  817.526134] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, 
> supports DPO and FUA 
> [  817.646163] [ cut here ] 
> [  817.646168] kernel BUG at block/bio.c:1787! 
> [  817.646172] Oops: Exception in kernel mode, sig: 5 [#1] 
> [  817.646174] SMP NR_CPUS=2048 NUMA pSeries 
> [  817.646178] Modules linked in: scsi_debug(E) nfsv3(E) rpcsec_gss_krb5(E) 
> nfsv4(E) dns_resolver(E) nfs(E) fscache(E) dm_mod(E) loop(E) sg(E) 
> pseries_rng(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) sunrpc(E) grace(E) 
> ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) ibmvscsi(E) ibmveth(E) 
> scsi_transport_srp(E) 
> [  817.646205] CPU: 6 PID: 166 Comm: kworker/u321:1 Tainted: GE   
> 4.4.0-rc4 #1 
> [  817.646211] Workqueue: events_unbound .async_run_entry_fn 
> [  817.646215] task: ca0c ti: ca18 task.ti: 
> ca18 
> [  817.646218] NIP: c03b1d54 LR: c03c4780 CTR: 
> c03be420 
> [  817.646222] REGS: ca1826c0 TRAP: 0700   Tainted: GE
> (4.4.0-rc4) 
> [  817.646225] MSR: 800100029032   CR: 24732728  XER: 
>  
> [  817.646233] CFAR: c03c477c SOFTE: 1  
> GPR00: c03c4780 ca182940 c1325e00 c0016cebcf00  
> GPR04:  0240 c0013c5f4d80 0040  
> GPR08: f0436ac0 0001    
> GPR12: 24732722 ce743900  f0436ac0  
> GPR16: c000f9e3eee0 c0010dab 0001   
> GPR20:  0080  c0016cebcf00  
> GPR24: c000ff9b5a20 ca182bb8 c0016cebcf88   
> GPR28:  c0016cebcf00  0001  
> [  817.646273] NIP [c03b1d54] .bio_split+0x34/0x110 
> [  817.646277] LR [c03c4780] .blk_queue_split+0x3b0/0x560 
> [  817.646280] Call Trace: 
> [  817.646282] [ca182940] [ca1829d0] 0xca1829d0 
> (unreliable) 
> [  817.646287] [ca1829d0] [c03c4780] 
> .blk_queue_split+0x3b0/0x560 
> [  817.646291] [ca182ae0] [c03be460] 
> .blk_queue_bio+0x40/0x430 
> [  817.646295] [ca182b80] [c03bc0f0] 
> .generic_make_request+0x150/0x210 
> [  817.646299] [ca182c30] [c03bc26c] .submit_bio+0xbc/0x1c0 
> [  817.646304] [ca182cf0] [c02cb64c] 
> .submit_bh_wbc+0x19c/0x200 
> [  817.646308] [ca182d90] [c02cbb10] 
> .block_read_full_page+0x310/0x410 
> [  817.646312] [ca183290] [c02cf11c] 
> .blkdev_readpage+0x1c/0x30 
> [  817.646316] [ca183300] [c01e51a0] 
> .do_read_cache_page+0xc0/0x290 
> [  817.646321] [ca1833c0] [c03d59f8] 
> .read_dev_sector+0x38/0xb0 
> [  817.646325] [ca183440] [c03d977c] .read_lba+0xcc/0x1f0 
> [  817.646329] [ca1834f0] [c03da3b8] 
> .efi_partition+0x118/0x780 
> [  817.646333] [ca183670] [c03d6fcc] 
> .check_partition+0x14c/0x2e0 
> [  817.646337] [ca183700] [c03d6260] 
> .rescan_partitions+0xd0/0x380 
> [  817.646341] [ca1837e0] [c02d0b88] 
> .__blkdev_get+0x3d8/0x530 
> [  817.646345] [ca1838a0] [c02d0f10] .blkdev_get+0x230/0x4a0 
> [  817.646348] [ca1839a0] [c03d3288] .add_disk+0x468/0x4f0 
> [  817.646353] [ca183a60] [d2026450] 
> .sd_probe_async+0xf0/0x230 [sd_mod] 
> [  817.646357] [ca183af0] [c00d23a8] 
> .async_run_entry_fn+0x98/0x200 
> [  817.646362] [ca183ba0] [c00c6d74] 
> .process_one_work+0x1a4/0x490 
> [  817.646366] 

Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Eryu Guan
On Tue, Dec 15, 2015 at 08:06:47PM +0800, Ming Lei wrote:
> On Tue, Dec 15, 2015 at 7:20 PM, Eryu Guan  wrote:
> > On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
> >> Hi,
> >>
> >> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
> >> easily on ppc64 host by:
> >
> > This is still reproducible with 4.4-rc5 kernel.
> 
> Could you capture the debug log after applying the attached patch and
> reproducing the issue?

Thanks for looking into this! dmesg shows:

[  686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0

Thanks,
Eryu

P.S. full call trace

[  686.065692] scsi_debug:sdebug_driver_probe: host protection
[  686.065710] scsi host1: scsi_debug, version 1.85 [20141022], 
dev_size_mb=256, opts=0x0
[  686.065981] scsi 1:0:0:0: Direct-Access Linuxscsi_debug   0184 
PQ: 0 ANSI: 6
[  686.066873] sd 1:0:0:0: Attached scsi generic sg1 type 0
[  686.077683] sd 1:0:0:0: [sdb] 524288 512-byte logical blocks: (268 MB/256 
MiB)
[  686.077694] sd 1:0:0:0: [sdb] 4096-byte physical blocks
[  686.087670] sd 1:0:0:0: [sdb] Write Protect is off
[  686.107671] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, 
supports DPO and FUA
[  686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0
[  686.217695] [ cut here ]
[  686.217698] kernel BUG at block/bio.c:1793!
[  686.217702] Oops: Exception in kernel mode, sig: 5 [#1]
[  686.217704] SMP NR_CPUS=2048 NUMA pSeries
[  686.217707] Modules linked in: scsi_debug sg pseries_rng nfsd auth_rpcgss 
nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth 
scsi_transport_srp
[  686.217727] CPU: 8 PID: 9515 Comm: kworker/u32:0 Not tainted 4.4.0-rc5+ #33
[  686.217733] Workqueue: events_unbound async_run_entry_fn
[  686.217737] task: c005edb23cc0 ti: c005f016c000 task.ti: 
c005f016c000
[  686.217740] NIP: c03c45c4 LR: c03c46b8 CTR: 013abb8c
[  686.217743] REGS: c005f016ea20 TRAP: 0700   Not tainted  (4.4.0-rc5+)
[  686.217746] MSR: 800100029033   CR: 22bb2322  XER: 
000f
[  686.217756] CFAR: c03c46cc SOFTE: 1 
GPR00: c03c46b8 c005f016eca0 c1068300 002e 
GPR04: c005ffd09c50 c005ffd1b4a0 0001  
GPR08: 0001 c0bab284 0005ff16 0130 
GPR12: 3f30 ce7e4c00  f15d0e40 
GPR16: c005f3c3b7a0 c0057439 0001  
GPR20:  0080  c005f5093200 
GPR24: c005edb0efa0 c005f016ee60 c005f5093288  
GPR28: 0240 c005f5093200  c005efd67600 
[  686.217797] NIP [c03c45c4] bio_split+0x54/0x160
[  686.217800] LR [c03c46b8] bio_split+0x148/0x160
[  686.217803] Call Trace:
[  686.217805] [c005f016eca0] [c03c46b8] bio_split+0x148/0x160 
(unreliable)
[  686.217810] [c005f016ed30] [c03d75e0] blk_queue_split+0x3c0/0x570
[  686.217814] [c005f016ee30] [c03d10a8] blk_queue_bio+0x48/0x440
[  686.217818] [c005f016ee90] [c03cec9c] 
generic_make_request+0x15c/0x220
[  686.217822] [c005f016eef0] [c03cee24] submit_bio+0xc4/0x1d0
[  686.217826] [c005f016efa0] [c02db204] submit_bh_wbc+0x1a4/0x200
[  686.217830] [c005f016eff0] [c02db6f0] 
block_read_full_page+0x320/0x420
[  686.217835] [c005f016f4a0] [c02dedb4] blkdev_readpage+0x24/0x40
[  686.217839] [c005f016f4c0] [c01f06fc] 
do_read_cache_page+0xbc/0x290
[  686.217844] [c005f016f530] [c03e8e00] read_dev_sector+0x40/0xc0
[  686.217848] [c005f016f560] [c03ec6bc] read_lba+0xdc/0x200
[  686.217851] [c005f016f5c0] [c03ece4c] find_valid_gpt+0xec/0x740
[  686.217855] [c005f016f6a0] [c03ed894] efi_partition+0x3f4/0x450
[  686.217859] [c005f016f820] [c03ea428] check_partition+0x158/0x2f0
[  686.217863] [c005f016f8a0] [c03e9694] 
rescan_partitions+0xd4/0x390
[  686.217867] [c005f016f970] [c02e0938] __blkdev_get+0x3a8/0x4d0
[  686.217871] [c005f016f9e0] [c02e0c90] blkdev_get+0x230/0x4a0
[  686.217875] [c005f016fa90] [c03e65b8] add_disk+0x478/0x500
[  686.217880] [c005f016fb40] [d3fa66a8] sd_probe_async+0xf8/0x240 
[sd_mod]
[  686.217884] [c005f016fbc0] [c00d7db8] 
async_run_entry_fn+0x98/0x1f0
[  686.217888] [c005f016fc50] [c00cc1a0] 
process_one_work+0x190/0x470
[  686.217892] [c005f016fce0] [c00cc5fc] worker_thread+0x17c/0x5a0
[  686.217896] [c005f016fd80] [c00d3da8] kthread+0x108/0x130
[  686.217901] [c005f016fe30] [c0009538] 
ret_from_kernel_thread+0x5c/0xa4
[  686.217904] Instruction dump:
[  686.217906] 7cdf3378 7c9e2378 7c7d1b78 f8010010 7cbc2b78 f821ff71 80c30028 
40dd00e8
[  686.217912] 

Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Ming Lei
On Tue, Dec 15, 2015 at 9:06 PM, Eryu Guan  wrote:
> On Tue, Dec 15, 2015 at 08:06:47PM +0800, Ming Lei wrote:
>> On Tue, Dec 15, 2015 at 7:20 PM, Eryu Guan  wrote:
>> > On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
>> >> Hi,
>> >>
>> >> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
>> >> easily on ppc64 host by:
>> >
>> > This is still reproducible with 4.4-rc5 kernel.
>>
>> Could you capture the debug log after applying the attached patch and
>> reproducing the issue?
>
> Thanks for looking into this! dmesg shows:
>
> [  686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0

Then I guess queue_max_sectors(q) is bad. Could you apply the
attached patch (along with the previous patch) and post the log?



Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Ming Lei
On Wed, Dec 16, 2015 at 2:29 AM, Martin K. Petersen
 wrote:
>> "Eryu" == Eryu Guan  writes:
>
> Eryu,
>
> Does the patch below fix the issue?

No, it can't.

As the debug log shows, it is because you use 'OPTIMAL
TRANSFER LENGTH' to set queue's max_sectors.

Thanks,

>
> --
> Martin K. Petersen  Oracle Linux Engineering
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 3d22fc3e3c1a..d1eb7aa78b8d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2667,8 +2667,9 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
>
>  	if (buffer[3] == 0x3c) {
>  		unsigned int lba_count, desc_count;
> +		u64 max_ws = get_unaligned_be64(&buffer[36]);
>
> -		sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
> +		sdkp->max_ws_blocks = (u32)max_ws;
>
>  		if (!sdkp->lbpme)
>  			goto out;



-- 
Ming Lei


Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Ming Lei
On Wed, Dec 16, 2015 at 12:56 AM, Eryu Guan  wrote:
> On Tue, Dec 15, 2015 at 09:27:14PM +0800, Ming Lei wrote:
>> On Tue, Dec 15, 2015 at 9:06 PM, Eryu Guan  wrote:
>> > On Tue, Dec 15, 2015 at 08:06:47PM +0800, Ming Lei wrote:
>> >> On Tue, Dec 15, 2015 at 7:20 PM, Eryu Guan  wrote:
>> >> > On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
>> >> >> Hi,
>> >> >>
>> >> >> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
>> >> >> easily on ppc64 host by:
>> >> >
>> >> > This is still reproducible with 4.4-rc5 kernel.
>> >>
>> >> Could you capture the debug log after applying the attached patch and
>> >> reproducing the issue?
>> >
>> > Thanks for looking into this! dmesg shows:
>> >
>> > [  686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0
>>
>> Then I guess queue_max_sectors(q) is bad. Could you apply the
>> attached patch (along with the previous patch) and post the log?
>
> [  301.279018] blk_bio_segment_split: nseg 0, max_secs 64, max segs 2048
> [  301.279023]   bv.len 65536, bv.offset 0
> [  301.279026] bio_split: sectors 0, bio_sectors 128, bi_rw 0

Now the issue is quite obvious: the page size is 64K on your platform,
but max_sectors is set to 64 (32K) by commit ca369d51b3e164, so a single
page no longer fits within the limit. I think it is wrong to set max
sectors from OPTIMAL TRANSFER LENGTH.

Also it is ugly to set limits->max_sectors from drivers directly, and drivers
should have called block helpers to do that.
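
The debug output above (max_secs 64, bv.len 65536, then "sectors 0") can be
reproduced with a simplified model of the split decision. This is a sketch
under stated assumptions, not the kernel's code: the function names are
hypothetical, segments are never cut in the middle, and only the sector
limit is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/*
 * Simplified model (an assumption, not the kernel code): walk the segments
 * and return how many sectors fit before the first segment that would push
 * the total past max_sectors. Segments are never cut in the middle.
 */
static uint32_t sectors_before_split(const uint32_t *seg_bytes, unsigned nsegs,
				     uint32_t max_sectors)
{
	uint32_t sectors = 0;

	assert(seg_bytes != NULL || nsegs == 0);
	for (unsigned i = 0; i < nsegs; i++) {
		uint32_t seg_sectors = seg_bytes[i] >> SECTOR_SHIFT;

		if (sectors + seg_sectors > max_sectors)
			break;	/* split point: everything before segment i */
		sectors += seg_sectors;
	}
	return sectors;
}

/* A limit is usable in this model only if one full page fits under it. */
static int limit_holds_one_page(uint32_t max_sectors, uint32_t page_bytes)
{
	return ((uint64_t)max_sectors << SECTOR_SHIFT) >= page_bytes;
}
```

In this model a single 65536-byte segment against a 64-sector limit yields
a split point of 0 sectors, which corresponds to the "bio_split: sectors 0,
bio_sectors 128" line in the log.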

> If full call trace is needed please let me know.

Thanks for testing; the above log is absolutely enough. :-)

Thanks,
Ming Lei


Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Martin K. Petersen
> "Ming" == Ming Lei  writes:

Ming,

Ming> No, it can't.

Well, it fixes a problem on one of my test systems where max_ws_blocks,
by virtue of being 64 bits, clobbers opt_xfer_blocks causing rw_len and
thus max_sectors to be set incorrectly.

We haven't run into that issue on real hardware. Probably because
scsi_debug is the only driver reporting $LUDICROUS_NUMBER as the max hw
transfer.

Ming> As the debug log shows, it is because you use 'OPTIMAL TRANSFER
Ming> LENGTH' to set queue's max_sectors.

But that is intentional.

I agree that the value chosen by scsi_debug in this case is very low and
we should fix that.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Martin K. Petersen
> "Ming" == Ming Lei  writes:

Ming> I think it is wrong to set max sectors from OPTIMAL TRANSFER
Ming> LENGTH.

OTL is the preferred size for REQ_TYPE_FS requests as reported by the
device. The intent is to honor that. Your patch clamps the rw_size to
BLK_DEF_MAX_SECTORS which is not correct.

Ming> Also it is ugly to set limits->max_sectors from drivers directly,
Ming> and drivers should have called block helpers to do that.

We're trying to avoid unnecessary accessor functions for the queue
limits. But I will add a sanity check for the page size. And fix up
scsi_debug.
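
One way the page-size sanity check mentioned above could look is sketched
below. This is an illustration only and not the patch that was merged: the
function name and parameters are hypothetical, and the upper bound that
sd.c also applies (SD_DEF_XFER_BLOCKS) is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the rw_max selection discussed in the thread;
 * the function name and parameters are illustrative only.
 *
 * opt_xfer_blocks: OPTIMAL TRANSFER LENGTH in logical blocks (0 = not reported)
 * dev_max_blocks:  device maximum transfer length in logical blocks
 * def_max_sectors: fallback limit in 512-byte sectors
 */
static uint32_t pick_rw_max_sectors(uint32_t opt_xfer_blocks,
				    uint32_t dev_max_blocks,
				    uint32_t logical_block_bytes,
				    uint32_t page_bytes,
				    uint32_t def_max_sectors)
{
	uint64_t opt_bytes;

	assert(logical_block_bytes >= 512);
	opt_bytes = (uint64_t)opt_xfer_blocks * logical_block_bytes;

	/* Honor the device hint only if it is set, within the device
	 * maximum, and large enough to cover one full page. */
	if (opt_xfer_blocks != 0 &&
	    opt_xfer_blocks <= dev_max_blocks &&
	    opt_bytes >= page_bytes)
		return (uint32_t)(opt_bytes >> 9);

	return def_max_sectors;
}
```

With a 64-block hint, 512-byte blocks and 64K pages, the hint is rejected
and the fallback is used; with 4K pages the same hint is honored.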

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Ming Lei
On Tue, 15 Dec 2015 21:06:31 +0800
Eryu Guan  wrote:

> On Tue, Dec 15, 2015 at 08:06:47PM +0800, Ming Lei wrote:
> > On Tue, Dec 15, 2015 at 7:20 PM, Eryu Guan  wrote:
> > > On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
> > >> Hi,
> > >>
> > >> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
> > >> easily on ppc64 host by:
> > >
> > > This is still reproducible with 4.4-rc5 kernel.
> > 
> > Could you capture the debug log after applying the attached patch and
> > reproducing the issue?
> 
> Thanks for looking into this! dmesg shows:
> 
> [  686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0

I guess the following patch should fix the issue. Commit ca369d51b3
uses OPTIMAL TRANSFER LENGTH to set limits->max_sectors, which
may be smaller than one page.

I don't understand the idea behind this change, Martin, could
you explain it a bit?

---
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3d22fc3..d66d362 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2889,10 +2889,11 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	 */
 	if (sdkp->opt_xfer_blocks && sdkp->opt_xfer_blocks <= dev_max &&
 	    sdkp->opt_xfer_blocks <= SD_DEF_XFER_BLOCKS)
-		rw_max = q->limits.io_opt =
+		q->limits.io_opt =
 			logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
-	else
-		rw_max = BLK_DEF_MAX_SECTORS;
+
+	rw_max = min_t(unsigned, BLK_DEF_MAX_SECTORS,
+		       q->limits.max_dev_sectors);
 
 	/* Combine with controller limits */
 	q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));
-- 
1.9.1






Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Martin K. Petersen
> "Eryu" == Eryu Guan  writes:

Eryu,

Eryu> This is still reproducible with 4.4-rc5 kernel.

Sorry about the delay. I've been busy with a lab move and most of my
machines have been disconnected since last week. Almost done getting my
equipment back online.

However, I think I have found the smoking gun. More in a bit...

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host

2015-12-15 Thread Eryu Guan
On Tue, Dec 15, 2015 at 01:29:59PM -0500, Martin K. Petersen wrote:
> > "Eryu" == Eryu Guan  writes:
> 
> Eryu,
> 
> Does the patch below fix the issue?

Unfortunately no, still BUG_ON.

Thanks,
Eryu

> 
> -- 
> Martin K. Petersen  Oracle Linux Engineering
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 3d22fc3e3c1a..d1eb7aa78b8d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2667,8 +2667,9 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
>  
>  	if (buffer[3] == 0x3c) {
>  		unsigned int lba_count, desc_count;
> +		u64 max_ws = get_unaligned_be64(&buffer[36]);
>  
> -		sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
> +		sdkp->max_ws_blocks = (u32)max_ws;
>  
>  		if (!sdkp->lbpme)
>  			goto out;