Re: 4.15.14 crash with iscsi target and dvd

2018-04-14 Thread Wakko Warner
Ming Lei wrote:
> On Thu, Apr 12, 2018 at 09:43:02PM -0400, Wakko Warner wrote:
> > Ming Lei wrote:
> > > On Tue, Apr 10, 2018 at 08:45:25PM -0400, Wakko Warner wrote:
> > > > Sorry for the delay.  I reverted my change, added this one.  I didn't
> > > > reboot, I just unloaded and loaded this one.
> > > > Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) 
> > > > on the
> > > > target.
> > > > 
> > > > Doesn't crash, however on the initiator I see this:
> > > > [9273849.70] ISO 9660 Extensions: RRIP_1991A
> > > > [9273863.359718] scsi_io_completion: 13 callbacks suppressed
> > > > [9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: 
> > > > hostbyte=0x00 driverbyte=0x08
> > > > [9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
> > > > [9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
> > > > [9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 
> > > > f6 96 00 00 80 00
> > > > [9273863.360116] blk_update_request: 13 callbacks suppressed
> > > > [9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
> > > > [9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: 
> > > > hostbyte=0x00 driverbyte=0x08
> > > > [9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
> > > > [9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
> > > > [9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 
> > > > f7 16 00 00 80 00
> > > > [9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912
> > > > 
> > > > To cause this, I mounted the dvd as seen in the first line and ran this
> > > > command: find /cdrom2 -type f | xargs -tn1 cat > /dev/null
> > > > I did some various tests.  Each test was done after umount and mount to
> > > > clear the cache.
> > > > cat  > /dev/null causes the message.
> > > > dd if= of=/dev/null bs=2048 doesn't
> > > > using bs=4096 doesn't
> > > > using bs=64k doesn't
> > > > using bs=128k does
> > > > cat uses a blocksize of 128k.
> > > > 
> > > > The following was done without being mounted.
> > > > ddrescue -f -f /dev/sr1 /dev/null 
> > > > doesn't cause the message
> > > > dd if=/dev/sr1 of=/dev/null bs=128k
> > > > doesn't cause the message
> > > > using bs=256k causes the message once:
> > > > [9275916.857409] sr 27:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: 
> > > > hostbyte=0x00 driverbyte=0x08
> > > > [9275916.857482] sr 27:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current] 
> > > > [9275916.857520] sr 27:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0 
> > > > [9275916.857556] sr 27:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 00 
> > > > 00 00 00 00 80 00
> > > > [9275916.857614] blk_update_request: I/O error, dev sr1, sector 0
> > > > 
> > > > If I access the disc from the target natively either by mounting and
> > > > accessing files or working with the device directly (ie dd) no errors 
> > > > are
> > > > logged on the target.
> > > 
> > > OK, thanks for your test.
> > > 
> > > Could you test the following patch and see if there is still the failure
> > > message?
> > > 
> > > diff --git a/drivers/target/target_core_pscsi.c 
> > > b/drivers/target/target_core_pscsi.c
> > > index 0d99b242e82e..6137287b52fb 100644
> > > --- a/drivers/target/target_core_pscsi.c
> > > +++ b/drivers/target/target_core_pscsi.c
> > > @@ -913,9 +913,11 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist 
> > > *sgl, u32 sgl_nents,
> > >  
> > >   rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
> > >   bio, page, bytes, off);
> > > + if (rc != bytes)
> > > + goto fail;
> > >   pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
> > >   bio_segments(bio), nr_vecs);
> > > - if (rc != bytes) {
> > > + if (/*rc != bytes*/0) {
> > >   pr_debug("PSCSI: Reached bio->bi_vcnt max:"
> > >   " %d i: %d bio: %p, allocating another"
> > >   " bio\n", bio->bi_vcnt, i, bio);
> > 
> > Target doesn't crash but the errors on the initiator are still there.
> 
> OK, then this error log isn't related with my commit, because the patch
> I sent to you in last email is to revert my commit simply.
> 
> But the following patch is one correct fix for your crash.
> 
> https://marc.info/?l=linux-kernel&m=152331690727052&w=2

Ok, that'll be the one I used.  Do you know when it'll go upstream?

-- 
 Microsoft has beaten Volkswagen's world record.  Volkswagen only created 22
 million bugs.


Re: 4.15.14 crash with iscsi target and dvd

2018-04-12 Thread Wakko Warner
Ming Lei wrote:
> On Tue, Apr 10, 2018 at 08:45:25PM -0400, Wakko Warner wrote:
> > Sorry for the delay.  I reverted my change, added this one.  I didn't
> > reboot, I just unloaded and loaded this one.
> > Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) on the
> > target.
> > 
> > Doesn't crash, however on the initiator I see this:
> > [9273849.70] ISO 9660 Extensions: RRIP_1991A
> > [9273863.359718] scsi_io_completion: 13 callbacks suppressed
> > [9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: 
> > hostbyte=0x00 driverbyte=0x08
> > [9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
> > [9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
> > [9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f6 
> > 96 00 00 80 00
> > [9273863.360116] blk_update_request: 13 callbacks suppressed
> > [9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
> > [9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: 
> > hostbyte=0x00 driverbyte=0x08
> > [9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
> > [9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
> > [9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f7 
> > 16 00 00 80 00
> > [9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912
> > 
> > To cause this, I mounted the dvd as seen in the first line and ran this
> > command: find /cdrom2 -type f | xargs -tn1 cat > /dev/null
> > I did some various tests.  Each test was done after umount and mount to
> > clear the cache.
> > cat  > /dev/null causes the message.
> > dd if= of=/dev/null bs=2048 doesn't
> > using bs=4096 doesn't
> > using bs=64k doesn't
> > using bs=128k does
> > cat uses a blocksize of 128k.
> > 
> > The following was done without being mounted.
> > ddrescue -f -f /dev/sr1 /dev/null 
> > doesn't cause the message
> > dd if=/dev/sr1 of=/dev/null bs=128k
> > doesn't cause the message
> > using bs=256k causes the message once:
> > [9275916.857409] sr 27:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: 
> > hostbyte=0x00 driverbyte=0x08
> > [9275916.857482] sr 27:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current] 
> > [9275916.857520] sr 27:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0 
> > [9275916.857556] sr 27:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 00 00 
> > 00 00 00 80 00
> > [9275916.857614] blk_update_request: I/O error, dev sr1, sector 0
> > 
> > If I access the disc from the target natively either by mounting and
> > accessing files or working with the device directly (ie dd) no errors are
> > logged on the target.
> 
> OK, thanks for your test.
> 
> Could you test the following patch and see if there is still the failure
> message?
> 
> diff --git a/drivers/target/target_core_pscsi.c 
> b/drivers/target/target_core_pscsi.c
> index 0d99b242e82e..6137287b52fb 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -913,9 +913,11 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist 
> *sgl, u32 sgl_nents,
>  
>   rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
>   bio, page, bytes, off);
> + if (rc != bytes)
> + goto fail;
>   pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
>   bio_segments(bio), nr_vecs);
> - if (rc != bytes) {
> + if (/*rc != bytes*/0) {
>   pr_debug("PSCSI: Reached bio->bi_vcnt max:"
>   " %d i: %d bio: %p, allocating another"
>   " bio\n", bio->bi_vcnt, i, bio);

Target doesn't crash but the errors on the initiator are still there.

Seems that if I do large transfers, I see this in the initiator's logs.
With the previous patch, I burned 3 dvds at the same time, compared the
files to the originals and I have a script that catalogs the files.  The
files consist of debian packages and source files.  The 3 operations did not
show any errors in the kernel log on either end.

I did this test:
initiator: dd if=/dev/sr1 bs=512k count=1024 | md5sum
target:    dd if=/dev/sr0 bs=512k count=1024 | md5sum

Result: the same.  It's OK even with the i/o errors shown on the initiator.

The above patch was added on top of the one you gave me before, but I don't
believe that that would be an issue.

...  Now if someone could help me with a kvm virtualization problem I'm
having with 4.16 that wasn't there with 4.15...



Re: 4.15.14 crash with iscsi target and dvd

2018-04-11 Thread Wakko Warner
Wakko Warner wrote:
> Ming Lei wrote:
> > Sure, thanks for your sharing.
> > 
> > Wakko, could you test the following patch and see if there is any
> > difference?
> > 
> > --
> > diff --git a/drivers/target/target_core_pscsi.c 
> > b/drivers/target/target_core_pscsi.c
> > index 0d99b242e82e..6147178f1f37 100644
> > --- a/drivers/target/target_core_pscsi.c
> > +++ b/drivers/target/target_core_pscsi.c
> > @@ -888,7 +888,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist 
> > *sgl, u32 sgl_nents,
> > if (len > 0 && data_len > 0) {
> > bytes = min_t(unsigned int, len, PAGE_SIZE - off);
> > bytes = min(bytes, data_len);
> > -
> > + new_bio:
> > if (!bio) {
> > nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
> > nr_pages -= nr_vecs;
> > @@ -931,6 +931,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist 
> > *sgl, u32 sgl_nents,
> >  * be allocated with pscsi_get_bio() above.
> >  */
> > bio = NULL;
> > +   goto new_bio;
> > }
> >  
> > data_len -= bytes;
> 
> Sorry for the delay.  I reverted my change, added this one.  I didn't
> reboot, I just unloaded and loaded this one.
> Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) on the
> target.
> 
> Doesn't crash, however on the initiator I see this:
> [9273849.70] ISO 9660 Extensions: RRIP_1991A
> [9273863.359718] scsi_io_completion: 13 callbacks suppressed
> [9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: 
> hostbyte=0x00 driverbyte=0x08
> [9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
> [9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
> [9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f6 96 
> 00 00 80 00
> [9273863.360116] blk_update_request: 13 callbacks suppressed
> [9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
> [9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: 
> hostbyte=0x00 driverbyte=0x08
> [9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
> [9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
> [9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f7 16 
> 00 00 80 00
> [9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912

Just FYI: The jobs I run that use the disc over iscsi didn't cause any
kernel messages on either system (except for the informational one when the
disc was mounted).

I have a dumb question though.  Could the label be placed just after the
'if' statement instead of before it?  bio is set to null and the 'if'
statement checks if it's null, which it always would be after the goto.
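
A minimal userspace sketch of the retry pattern in the patch, assuming a
fixed-capacity "batch" that stands in for a bio.  Nothing here is kernel
code: struct batch, batch_add(), pack_all() and BATCH_CAP are made-up names
for illustration.  batch_add() returns 0 when the batch is full, which is
how the commit message describes bio_add_pc_page().

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_CAP 4              /* hypothetical capacity, plays the role of nr_vecs */

struct batch {                   /* stand-in for a struct bio */
    int items[BATCH_CAP];
    int count;
};

/* Returns 1 if the item was added, 0 if the batch is already full. */
int batch_add(struct batch *b, int item)
{
    assert(b != NULL);
    if (b->count == BATCH_CAP)
        return 0;
    b->items[b->count++] = item;
    return 1;
}

/*
 * Pack n items into batches taken from pool.  Returns the number of items
 * packed and stores the number of batches used in *nbatches, or returns -1
 * if the pool runs out.
 */
int pack_all(const int *items, int n, struct batch *pool, int pool_len,
             int *nbatches)
{
    struct batch *cur = NULL;
    int used = 0;
    int packed = 0;

    for (int i = 0; i < n; i++) {
new_batch:
        if (!cur) {
            if (used == pool_len)
                return -1;       /* no batch left to allocate */
            cur = &pool[used++];
            cur->count = 0;
        }
        if (!batch_add(cur, items[i])) {
            /*
             * Current batch is full: close it and retry the SAME item
             * with a fresh batch instead of moving on to the next item.
             */
            cur = NULL;
            goto new_batch;
        }
        packed++;
    }
    *nbatches = used;
    return packed;
}
```

In this model the label sits before the NULL check, so the goto re-runs the
allocation.  A label placed inside the body of the 'if' would behave the
same here, because cur is always NULL at the goto.  That is an observation
about the sketch only, not about the kernel function.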



Re: 4.15.14 crash with iscsi target and dvd

2018-04-10 Thread Wakko Warner
Ming Lei wrote:
> Sure, thanks for your sharing.
> 
> Wakko, could you test the following patch and see if there is any
> difference?
> 
> --
> diff --git a/drivers/target/target_core_pscsi.c 
> b/drivers/target/target_core_pscsi.c
> index 0d99b242e82e..6147178f1f37 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -888,7 +888,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, 
> u32 sgl_nents,
>   if (len > 0 && data_len > 0) {
>   bytes = min_t(unsigned int, len, PAGE_SIZE - off);
>   bytes = min(bytes, data_len);
> -
> + new_bio:
>   if (!bio) {
>   nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
>   nr_pages -= nr_vecs;
> @@ -931,6 +931,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, 
> u32 sgl_nents,
>* be allocated with pscsi_get_bio() above.
>*/
>   bio = NULL;
> + goto new_bio;
>   }
>  
>   data_len -= bytes;

Sorry for the delay.  I reverted my change, added this one.  I didn't
reboot, I just unloaded and loaded this one.
Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) on the
target.

Doesn't crash, however on the initiator I see this:
[9273849.70] ISO 9660 Extensions: RRIP_1991A
[9273863.359718] scsi_io_completion: 13 callbacks suppressed
[9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
[9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
[9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f6 96 00 00 80 00
[9273863.360116] blk_update_request: 13 callbacks suppressed
[9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
[9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
[9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
[9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f7 16 00 00 80 00
[9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912

To cause this, I mounted the dvd as seen in the first line and ran this
command: find /cdrom2 -type f | xargs -tn1 cat > /dev/null
I did some various tests.  Each test was done after umount and mount to
clear the cache.
cat  > /dev/null causes the message.
dd if= of=/dev/null bs=2048 doesn't
using bs=4096 doesn't
using bs=64k doesn't
using bs=128k does
cat uses a blocksize of 128k.

The following was done without being mounted.
ddrescue -f -f /dev/sr1 /dev/null 
doesn't cause the message
dd if=/dev/sr1 of=/dev/null bs=128k
doesn't cause the message
using bs=256k causes the message once:
[9275916.857409] sr 27:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[9275916.857482] sr 27:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current]
[9275916.857520] sr 27:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0
[9275916.857556] sr 27:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 80 00
[9275916.857614] blk_update_request: I/O error, dev sr1, sector 0

If I access the disc from the target natively either by mounting and
accessing files or working with the device directly (ie dd) no errors are
logged on the target.
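
For reference, the failing CDBs in the log can be decoded by hand.  The
sketch below is only an illustration, not code from the kernel: it parses a
READ(10) CDB and converts the LBA to the 512-byte sector number that
blk_update_request prints.  It assumes a 2048-byte logical block size for
the DVD, which is an assumption, although it agrees with the logged sector
numbers.  parse_read10() and lba_to_sector() are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a SCSI READ(10) command descriptor block. */
struct read10 {
    uint32_t lba;       /* logical block address, in device blocks */
    uint16_t nblocks;   /* transfer length, in device blocks */
};

/* Parse a 10-byte CDB; returns 0 on success, -1 if it is not READ(10). */
int parse_read10(const uint8_t cdb[10], struct read10 *out)
{
    if (cdb[0] != 0x28)
        return -1;
    /* Bytes 2..5 hold the big-endian LBA, bytes 7..8 the block count. */
    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
    out->nblocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
    return 0;
}

/* Convert a device LBA to the 512-byte sector number used in block layer logs. */
uint64_t lba_to_sector(uint32_t lba, uint32_t block_size)
{
    assert(block_size >= 512 && block_size % 512 == 0);
    return (uint64_t)lba * (block_size / 512);
}
```

For the first CDB (28 00 00 22 f6 96 00 00 80 00) this gives LBA 2291350 and
a transfer length of 128 blocks, which is 256 KiB at 2048 bytes per block.
2291350 * 4 = 9165400, the sector shown in the I/O error line.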



Re: 4.15.14 crash with iscsi target and dvd

2018-04-09 Thread Wakko Warner
Ming Lei wrote:
> On Mon, Apr 09, 2018 at 09:30:11PM +, Bart Van Assche wrote:
> > Hello Ming,
> > 
> > Can you have a look at this? The start of this e-mail thread is available at
> > https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg72574.html.
> 
> Sure, thanks for your sharing.
> 
> Wakko, could you test the following patch and see if there is any
> difference?

Sure.  One question: is this against 4.15, or does it matter?  Last I looked,
4.16 hasn't changed from 4.15 for that file.

> diff --git a/drivers/target/target_core_pscsi.c 
> b/drivers/target/target_core_pscsi.c
> index 0d99b242e82e..6147178f1f37 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -888,7 +888,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, 
> u32 sgl_nents,
>   if (len > 0 && data_len > 0) {
>   bytes = min_t(unsigned int, len, PAGE_SIZE - off);
>   bytes = min(bytes, data_len);
> -
> + new_bio:
>   if (!bio) {
>   nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
>   nr_pages -= nr_vecs;
> @@ -931,6 +931,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, 
> u32 sgl_nents,
>* be allocated with pscsi_get_bio() above.
>*/
>   bio = NULL;
> + goto new_bio;
>   }
>  
>   data_len -= bytes;
> 
> -- 
> Ming


Re: 4.15.14 crash with iscsi target and dvd

2018-04-08 Thread Wakko Warner
Wakko Warner wrote:
> Bart Van Assche wrote:
> > Have you tried to modify the kernel Makefile as indicated in the following
> > e-mail? This should make the kernel build:
> > 
> > https://lists.ubuntu.com/archives/kernel-team/2016-May/077178.html
> 
> Thanks.  That helped.
> 
> I finished with git bisect.  Here's the output:
> 84c8590646d5b35804bac60eb58b145839b5893e is the first bad commit
> commit 84c8590646d5b35804bac60eb58b145839b5893e
> Author: Ming Lei 
> Date:   Fri Nov 11 20:05:32 2016 +0800
> 
> target: avoid accessing .bi_vcnt directly
> 
> When the bio is full, bio_add_pc_page() will return zero,
> so use this information tell when the bio is full.
> 
> Also replace access to .bi_vcnt for pr_debug() with bio_segments().
> 
> Reviewed-by: Christoph Hellwig 
> Signed-off-by: Ming Lei 
> Reviewed-by: Sagi Grimberg 
> Signed-off-by: Jens Axboe 
> 
> :04 04 a3ebbb71c52ee4eb8c3be4d033b81179211bf704 
> de39a328dbd1b18519946b3ad46d9302886e0dd0 M  drivers
> 
> I did a diff between HEAD^ and HEAD and manually patched the file from
> 4.15.14.  It's not an exact revert.  I'm running it now and it's working.
> I'll do a better test later on.  Here's the patch:
> 
> --- a/drivers/target/target_core_pscsi.c  2018-02-04 14:31:31.077316617 
> -0500
> +++ b/drivers/target/target_core_pscsi.c  2018-04-08 11:43:49.588641374 
> -0400
> @@ -915,7 +915,9 @@
>   bio, page, bytes, off);
>   pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
>   bio_segments(bio), nr_vecs);
> - if (rc != bytes) {
> + if (rc != bytes)
> + goto fail;
> + if (bio->bi_vcnt > nr_vecs) {
>   pr_debug("PSCSI: Reached bio->bi_vcnt max:"
>   " %d i: %d bio: %p, allocating another"
>   " bio\n", bio->bi_vcnt, i, bio);
> 
> I really appreciate your time and assistance with this.

One thing I noticed after doing this is errors in the kernel log on the
initiator:
[9072625.181744] sr 26:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[9072625.181802] sr 26:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current]
[9072625.181835] sr 26:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0
[9072625.181866] sr 26:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 0a 81 22 00 00 80 00
[9072625.181919] blk_update_request: I/O error, dev sr1, sector 2753672

When doing the exact same thing on the target, no mention.  My patch may not
be right, but it doesn't cause an oops.
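
The sense data in these messages can be read with the standard SPC tables.
The lookup below is a small sketch that covers the generic sense keys and
only the one ASC/ASCQ pair seen in this log.  sense_key_name(), asc_name()
and describe_sense() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sense key names as listed in the SCSI Primary Commands (SPC) standard. */
static const char *const sense_keys[] = {
    "NO SENSE", "RECOVERED ERROR", "NOT READY", "MEDIUM ERROR",
    "HARDWARE ERROR", "ILLEGAL REQUEST", "UNIT ATTENTION", "DATA PROTECT",
    "BLANK CHECK", "VENDOR SPECIFIC", "COPY ABORTED", "ABORTED COMMAND",
};

const char *sense_key_name(unsigned key)
{
    if (key < sizeof(sense_keys) / sizeof(sense_keys[0]))
        return sense_keys[key];
    return "UNKNOWN";
}

/* Only the pair from the log is listed; anything else is reported as unlisted. */
const char *asc_name(unsigned asc, unsigned ascq)
{
    if (asc == 0x08 && ascq == 0x00)
        return "LOGICAL UNIT COMMUNICATION FAILURE";
    return "(not in this table)";
}

/* Format "KEY: ASC text" into buf; returns what snprintf() returns. */
int describe_sense(char *buf, size_t len, unsigned key, unsigned asc,
                   unsigned ascq)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%s: %s", sense_key_name(key),
                    asc_name(asc, ascq));
}
```

For the values logged on the initiator (Sense Key 0x2, ASC 0x8, ASCQ 0x0)
this yields NOT READY with LOGICAL UNIT COMMUNICATION FAILURE.  The
driverbyte of 0x08 is DRIVER_SENSE, meaning sense data is attached.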

I'm going to try 4.16.1 and see what happens.



Re: 4.15.14 crash with iscsi target and dvd

2018-04-08 Thread Wakko Warner
Bart Van Assche wrote:
> On Sat, 2018-04-07 at 12:53 -0400, Wakko Warner wrote:
> > Bart Van Assche wrote:
> > > On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > > > I know now why scsi_print_command isn't doing anything.  cmd->cmnd is 
> > > > null.
> > > > I added a dev_printk in scsi_print_command where the 2 if statements 
> > > > return.
> > > > Logs:
> > > > [  29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> > > 
> > > That's something that should never happen. As one can see in
> > > scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> > > that pointer. Since I have not yet been able to reproduce myself what you
> > > reported, would it be possible for you to bisect this issue? You will need
> > > to follow something like the following procedure (see also
> > > https://git-scm.com/docs/git-bisect):
> > 
> > After doing 3 successful compiles with good/bad, I got this error and was
> > not able to compile any more kernels:
> >   CC  scripts/mod/devicetable-offsets.s
> > scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
> >  /* empty file to figure out endianness / word size */
> >  
> > scripts/mod/devicetable-offsets.c:1:0: error: code model kernel does not 
> > support PIC mode
> >  #include 
> >  
> > scripts/Makefile.build:153: recipe for target 
> > 'scripts/mod/devicetable-offsets.s' failed
> > 
> > I don't think it found the bad commit.
> 
> Have you tried to modify the kernel Makefile as indicated in the following
> e-mail? This should make the kernel build:
> 
> https://lists.ubuntu.com/archives/kernel-team/2016-May/077178.html

Thanks.  That helped.

I finished with git bisect.  Here's the output:
84c8590646d5b35804bac60eb58b145839b5893e is the first bad commit
commit 84c8590646d5b35804bac60eb58b145839b5893e
Author: Ming Lei 
Date:   Fri Nov 11 20:05:32 2016 +0800

target: avoid accessing .bi_vcnt directly

When the bio is full, bio_add_pc_page() will return zero,
so use this information tell when the bio is full.

Also replace access to .bi_vcnt for pr_debug() with bio_segments().

Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lei 
Reviewed-by: Sagi Grimberg 
Signed-off-by: Jens Axboe 

:040000 040000 a3ebbb71c52ee4eb8c3be4d033b81179211bf704 de39a328dbd1b18519946b3ad46d9302886e0dd0 M  drivers

I did a diff between HEAD^ and HEAD and manually patched the file from
4.15.14.  It's not an exact revert.  I'm running it now and it's working.
I'll do a better test later on.  Here's the patch:

--- a/drivers/target/target_core_pscsi.c  2018-02-04 14:31:31.077316617 -0500
+++ b/drivers/target/target_core_pscsi.c  2018-04-08 11:43:49.588641374 -0400
@@ -915,7 +915,9 @@
 					bio, page, bytes, off);
 			pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
 				  bio_segments(bio), nr_vecs);
-			if (rc != bytes) {
+			if (rc != bytes)
+				goto fail;
+			if (bio->bi_vcnt > nr_vecs) {
 				pr_debug("PSCSI: Reached bio->bi_vcnt max:"
 					" %d i: %d bio: %p, allocating another"
 					" bio\n", bio->bi_vcnt, i, bio);

I really appreciate your time and assistance with this.



Re: 4.15.14 crash with iscsi target and dvd

2018-04-07 Thread Wakko Warner
Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > > I know now why scsi_print_command isn't doing anything.  cmd->cmnd is 
> > > null.
> > > I added a dev_printk in scsi_print_command where the 2 if statements 
> > > return.
> > > Logs:
> > > [  29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> > 
> > That's something that should never happen. As one can see in
> > scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> > that pointer. Since I have not yet been able to reproduce myself what you
> > reported, would it be possible for you to bisect this issue? You will need
> > to follow something like the following procedure (see also
> > https://git-scm.com/docs/git-bisect):
> 
> After doing 3 successful compiles with good/bad, I got this error and was
> not able to compile any more kernels:
>   CC  scripts/mod/devicetable-offsets.s
> scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
>  /* empty file to figure out endianness / word size */
>  
> scripts/mod/devicetable-offsets.c:1:0: error: code model kernel does not 
> support PIC mode
>  #include 
>  
> scripts/Makefile.build:153: recipe for target 
> 'scripts/mod/devicetable-offsets.s' failed
> 
> I don't think it found the bad commit.

I forgot to mention my gcc version.
gcc (Debian 6.2.1-7) 6.2.1 20161215



Re: 4.15.14 crash with iscsi target and dvd

2018-04-07 Thread Wakko Warner
Bart Van Assche wrote:
> On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > I know now why scsi_print_command isn't doing anything.  cmd->cmnd is null.
> > I added a dev_printk in scsi_print_command where the 2 if statements return.
> > Logs:
> > [  29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> 
> That's something that should never happen. As one can see in
> scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> that pointer. Since I have not yet been able to reproduce myself what you
> reported, would it be possible for you to bisect this issue? You will need
> to follow something like the following procedure (see also
> https://git-scm.com/docs/git-bisect):

After doing 3 successful compiles with good/bad, I got this error and was
not able to compile any more kernels:
  CC  scripts/mod/devicetable-offsets.s
scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
 /* empty file to figure out endianness / word size */
 
scripts/mod/devicetable-offsets.c:1:0: error: code model kernel does not 
support PIC mode
 #include 
 
scripts/Makefile.build:153: recipe for target 
'scripts/mod/devicetable-offsets.s' failed

I don't think it found the bad commit.



Re: 4.15.14 crash with iscsi target and dvd

2018-04-06 Thread Wakko Warner
Bart Van Assche wrote:
> On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > I know now why scsi_print_command isn't doing anything.  cmd->cmnd is null.
> > I added a dev_printk in scsi_print_command where the 2 if statements return.
> > Logs:
> > [  29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> 
> That's something that should never happen. As one can see in
> scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> that pointer. Since I have not yet been able to reproduce myself what you
> reported, would it be possible for you to bisect this issue? You will need
> to follow something like the following procedure (see also
> https://git-scm.com/docs/git-bisect):
> 
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> git bisect start
> git bisect bad v4.10
> git bisect good v4.9
> 
> and then build the kernel, install it, boot the kernel and test it.
> Depending on the result, run either git bisect bad or git bisect good and
> keep going until git bisect comes to a conclusion. This can take an hour or
> more.

I have 1 question.  Should make clean be done between tests?  My box
compiles the whole kernel in 2 minutes.



Re: 4.15.14 crash with iscsi target and dvd

2018-04-06 Thread Wakko Warner
Bart Van Assche wrote:
> On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > I know now why scsi_print_command isn't doing anything.  cmd->cmnd is null.
> > I added a dev_printk in scsi_print_command where the 2 if statements return.
> > Logs:
> > [  29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> 
> That's something that should never happen. As one can see in
> scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> that pointer. Since I have not yet been able to reproduce myself what you
> reported, would it be possible for you to bisect this issue? You will need
> to follow something like the following procedure (see also
> https://git-scm.com/docs/git-bisect):

I don't know how relevant it is, but this machine boots from nfs and exports
its dvd drives over iscsi with the target modules.  My scsi_target.lio is at
the end.  I removed the iqn name.  The options are default except for a few.
Non default options I tabbed over.
eth0 is the nfs/localnet nic and eth1 is the nic that iscsi goes over.
eth0 is onboard pci 8086:1502 (subsystem 1028:05d3)
eth1 is pci 8086:107d (subsystem 8086:1084)
Both use the e1000e driver

The initiator is running 4.4.107.
On the initiator, /dev/sr1 is the target's /dev/sr0.  Therefore, from the
initiator:
cat /dev/sr1 > /dev/null seems to work.
mount /dev/sr1 /cdrom works
find /cdrom -type f | xargs cat > /dev/null immediately crashes the target.
Burning to /dev/sr1 seems to work.

I have another nic that uses igb instead, I'll see if that makes a
difference.

> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> git bisect start
> git bisect bad v4.10
> git bisect good v4.9
> 
> and then build the kernel, install it, boot the kernel and test it.
> Depending on the result, run either git bisect bad or git bisect good and
> keep going until git bisect comes to a conclusion. This can take an hour or
> more.

I'll try this.
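
The procedure quoted above is a binary search over the commit range.  As a
rough illustration of why it finishes in a handful of build and test
cycles, here is a userspace model.  It is not git's implementation, which
works on a commit graph rather than a linear list.  bisect_first_bad() and
example_is_bad() are made-up names, and the "first bad commit at index 700"
is invented test data.

```c
#include <assert.h>

typedef int (*is_bad_fn)(int commit);

/* Hypothetical test result: every commit from index 700 on is bad. */
int example_is_bad(int commit)
{
    return commit >= 700;
}

/*
 * good is a commit index known to work, bad is one known to fail, and
 * good < bad.  Returns the first bad index and stores the number of
 * commits that had to be tested in *steps.
 */
int bisect_first_bad(int good, int bad, is_bad_fn is_bad, int *steps)
{
    assert(good < bad);
    *steps = 0;
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        (*steps)++;              /* one build, boot and test cycle */
        if (is_bad(mid))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}
```

Calling bisect_first_bad(0, 1000, example_is_bad, &steps) in this model
returns 700 after at most 10 tests, since each test halves the remaining
range.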

Here's my scsi_target.lio:
storage pscsi {
disk dvd0 {
path /dev/sr0 
attribute {
emulate_3pc yes 
emulate_caw yes 
emulate_dpo no 
emulate_fua_read no 
emulate_model_alias no 
emulate_rest_reord no 
emulate_tas yes 
emulate_tpu no 
emulate_tpws no 
emulate_ua_intlck_ctrl no 
emulate_write_cache no 
enforce_pr_isids yes 
fabric_max_sectors 8192 
is_nonrot yes 
max_unmap_block_desc_count 0 
max_unmap_lba_count 0 
max_write_same_len 65535 
queue_depth 128 
unmap_granularity 0 
unmap_granularity_alignment 0 
}
}
disk dvd1 {
path /dev/sr1 
attribute {
emulate_3pc yes 
emulate_caw yes 
emulate_dpo no 
emulate_fua_read no 
emulate_model_alias no 
emulate_rest_reord no 
emulate_tas yes 
emulate_tpu no 
emulate_tpws no 
emulate_ua_intlck_ctrl no 
emulate_write_cache no 
enforce_pr_isids yes 
fabric_max_sectors 8192 
is_nonrot yes 
max_unmap_block_desc_count 0 
max_unmap_lba_count 0 
max_write_same_len 65535 
queue_depth 128 
unmap_granularity 0 
unmap_granularity_alignment 0 
}
}
disk dvd2 {
path /dev/sr2 
attribute {
emulate_3pc yes 
emulate_caw yes 
emulate_dpo no 
emulate_fua_read no 
emulate_model_alias no 
emulate_rest_reord no 
emulate_tas yes 
emulate_tpu no 
emulate_tpws no 
emulate_ua_intlck_ctrl no 
emulate_write_cache no 
enforce_pr_isids yes 
fabric_max_sectors 8192 
is_nonrot yes 
max_unmap_block_desc_count 0 
max_unmap_lba_count 0 
max_write_same_len 65535 
queue_depth 128 
unmap_granularity 0 
unmap_granularity_alignment 0 
}
}
}
fabric iscsi {
discovery_auth {
enable no 
mutual_password "" 
mutual_userid "" 
password "" 
userid "" 
}
target iqn.:dvd tpgt 1 {
enable yes 
attribute {
authentication no 
cache_dynamic_acls yes 
default_cmdsn_depth 64 
default_erl 0 
demo_mode_discovery yes 
demo_mode_write_protect no 
fabric_prot_type 0 
generate_node_acls yes 
login_timeout 15 
netif_timeout 2 
prod_mode_write_protect no 
t10_pi 0 
tpg_enabled_sendtargets 1 
}
auth {
password "" 
password_mutual "" 
userid "" 
userid_mutual "" 
}
parameter {
AuthMethod "CHAP,None" 
DataDigest "CRC32C,None" 
DataPDUInOrder yes 
DataSequenceInOrder yes 
DefaultTime2Retain 20 
DefaultTime2Wait 2 
ErrorRecoveryLevel no 
FirstBurstLength 65536 
HeaderDigest "CRC32C,None" 
IFMarkInt Reject 
IFMarker no 
ImmediateData yes 
InitialR2T yes 
MaxBurstLength 262144 
MaxConnections 1 
MaxOutstandingR2T 1 
MaxRecvDataSegmentLength 8192 
MaxXmitDataSegmentLength 262144 
OFMarkInt Reject 
OFMarker no 
TargetAlias "LIO Target" 
}
lun 0 backend pscsi:dvd0 
lun 1 backend pscsi:dvd1 
lun 2 backend pscsi:dvd2 
portal 0.0.0.0:3260 
}
}




Re: 4.15.14 crash with iscsi target and dvd

2018-04-05 Thread Wakko Warner
Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > Wakko Warner wrote:
> > > > > I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> > > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I 
> > > > > mount
> > > > > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > > > > crashes.  I'm using the builtin iscsi target with pscsi.  I can burn 
> > > > > from
> > > > > the initiator with out problems.  I'll test other kernels between 4.9 
> > > > > and
> > > > > 4.14.
> > > > 
> > > > So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest 
> > > > patch
> > > > (except for 4.15 which was 1 behind)
> > > > Each of these kernels crash within seconds or immediate of doing find 
> > > > -type
> > > > f | xargs cat > /dev/null from the initiator.
> > > 
> > > I tried 4.10.0.  It doesn't completely lockup the system, but the device
> > > that was used hangs.  So from the initiator, it's /dev/sr1 and from the
> > > target it's /dev/sr0.  Attempting to read /dev/sr0 after the oops causes 
> > > the
> > > process to hang in D state.
> > 
> > Hello Wakko,
> > 
> > Thank you for having narrowed down this further. I think that you 
> > encountered
> > a regression either in the block layer core or in the SCSI core. 
> > Unfortunately
> > the number of changes between kernel versions v4.9 and v4.10 in these two
> > subsystems is huge. I see two possible ways forward:
> > - Either that you perform a bisect to identify the patch that introduced 
> > this
> >   regression. However, I'm not sure whether you are familiar with the bisect
> >   process.
> > - Or that you identify the command that triggers this crash such that others
> >   can reproduce this issue without needing access to your setup.
> > 
> > How about reproducing this crash with the below patch applied on top of
> > kernel v4.15.x? The additional output sent by this patch to the system log
> > should allow us to reproduce this issue by submitting the same SCSI command
> > with sg_raw.
> 
> Ok, so I tried this, but scsi_print_command doesn't print anything.  I added
> a check for !rq and the same thing that blk_rq_nr_phys_segments does in an
> if statement above this thinking it might have crashed during WARN_ON_ONCE.
> It still didn't print anything.  My printk shows this:
> [  36.263193] sr 3:0:0:0: cmd->request->nr_phys_segments is 0
> 
> I also had scsi_print_command in the same if block which again didn't print
> anything.  Is there some debug option I need to turn on to make it print?  I
> tried looking through the code for this and following some of the function
> calls but didn't see any config options.

I know now why scsi_print_command isn't doing anything.  cmd->cmnd is null.
I added a dev_printk in scsi_print_command where the 2 if statements return.
Logs:
[  29.866415] sr 3:0:0:0: cmd->cmnd is NULL

> > Subject: [PATCH] Report commands with no physical segments in the system log
> > 
> > ---
> >  drivers/scsi/scsi_lib.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 6b6a6705f6e5..74a39db57d49 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1093,8 +1093,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
> > bool is_mq = (rq->mq_ctx != NULL);
> > int error = BLKPREP_KILL;
> >  
> > -   if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
> > +   if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq))) {
> > +   scsi_print_command(cmd);
> > goto err_exit;
> > +   }
> >  
> > error = scsi_init_sgtable(rq, &cmd->sdb);
> > if (error)


Re: 4.15.14 crash with iscsi target and dvd

2018-04-05 Thread Wakko Warner
Bart Van Assche wrote:
> On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I 
> > > > mount
> > > > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > > > crashes.  I'm using the builtin iscsi target with pscsi.  I can burn 
> > > > from
> > > > the initiator with out problems.  I'll test other kernels between 4.9 
> > > > and
> > > > 4.14.
> > > 
> > > So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest 
> > > patch
> > > (except for 4.15 which was 1 behind)
> > > Each of these kernels crash within seconds or immediate of doing find 
> > > -type
> > > f | xargs cat > /dev/null from the initiator.
> > 
> > I tried 4.10.0.  It doesn't completely lockup the system, but the device
> > that was used hangs.  So from the initiator, it's /dev/sr1 and from the
> > target it's /dev/sr0.  Attempting to read /dev/sr0 after the oops causes the
> > process to hang in D state.
> 
> Hello Wakko,
> 
> Thank you for having narrowed down this further. I think that you encountered
> a regression either in the block layer core or in the SCSI core. Unfortunately
> the number of changes between kernel versions v4.9 and v4.10 in these two
> subsystems is huge. I see two possible ways forward:
> - Either that you perform a bisect to identify the patch that introduced this
>   regression. However, I'm not sure whether you are familiar with the bisect
>   process.
> - Or that you identify the command that triggers this crash such that others
>   can reproduce this issue without needing access to your setup.
> 
> How about reproducing this crash with the below patch applied on top of
> kernel v4.15.x? The additional output sent by this patch to the system log
> should allow us to reproduce this issue by submitting the same SCSI command
> with sg_raw.

Ok, so I tried this, but scsi_print_command doesn't print anything.  I added
a check for !rq and the same thing that blk_rq_nr_phys_segments does in an
if statement above this thinking it might have crashed during WARN_ON_ONCE.
It still didn't print anything.  My printk shows this:
[  36.263193] sr 3:0:0:0: cmd->request->nr_phys_segments is 0

I also had scsi_print_command in the same if block which again didn't print
anything.  Is there some debug option I need to turn on to make it print?  I
tried looking through the code for this and following some of the function
calls but didn't see any config options.

> Subject: [PATCH] Report commands with no physical segments in the system log
> 
> ---
>  drivers/scsi/scsi_lib.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6b6a6705f6e5..74a39db57d49 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1093,8 +1093,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
>   bool is_mq = (rq->mq_ctx != NULL);
>   int error = BLKPREP_KILL;
>  
> - if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
> + if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq))) {
> + scsi_print_command(cmd);
>   goto err_exit;
> + }
>  
>   error = scsi_init_sgtable(rq, &cmd->sdb);
>   if (error)


Re: 4.15.14 crash with iscsi target and dvd

2018-04-04 Thread Wakko Warner
Bart Van Assche wrote:
> On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I 
> > > > mount
> > > > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > > > crashes.  I'm using the builtin iscsi target with pscsi.  I can burn 
> > > > from
> > > > the initiator with out problems.  I'll test other kernels between 4.9 
> > > > and
> > > > 4.14.
> > > 
> > > So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest 
> > > patch
> > > (except for 4.15 which was 1 behind)
> > > Each of these kernels crash within seconds or immediate of doing find 
> > > -type
> > > f | xargs cat > /dev/null from the initiator.
> > 
> > I tried 4.10.0.  It doesn't completely lockup the system, but the device
> > that was used hangs.  So from the initiator, it's /dev/sr1 and from the
> > target it's /dev/sr0.  Attempting to read /dev/sr0 after the oops causes the
> > process to hang in D state.
> 
> Hello Wakko,
> 
> Thank you for having narrowed down this further. I think that you encountered
> a regression either in the block layer core or in the SCSI core. Unfortunately
> the number of changes between kernel versions v4.9 and v4.10 in these two
> subsystems is huge. I see two possible ways forward:
> - Either that you perform a bisect to identify the patch that introduced this
>   regression. However, I'm not sure whether you are familiar with the bisect
>   process.
> - Or that you identify the command that triggers this crash such that others
>   can reproduce this issue without needing access to your setup.
> 
> How about reproducing this crash with the below patch applied on top of
> kernel v4.15.x? The additional output sent by this patch to the system log
> should allow us to reproduce this issue by submitting the same SCSI command
> with sg_raw.

Sorry for not getting back in touch.  My internet was down.  I haven't tried
the patch yet.  I'll try to get to that tomorrow.  The system with the issue
is busy and I can't reboot it right now.
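
The bisect suggested in the quoted text is essentially a binary search between a known-good and a known-bad kernel. A minimal sketch of that idea follows, assuming the failure stays present once it appears. The release list, the `reported_bad` set and the `first_bad` helper are illustrative names only; the set mirrors the results reported in this thread and stands in for "boot the kernel and run the find | xargs cat test". A real bisect would walk individual commits with git bisect rather than release names.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad entry.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    every entry after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return versions[hi]


# Hypothetical stand-in for an actual boot-and-test cycle on each kernel.
releases = ["4.9", "4.10", "4.11", "4.12", "4.14", "4.15"]
reported_bad = {"4.10", "4.11", "4.12", "4.14", "4.15"}

print(first_bad(releases, lambda v: v in reported_bad))
```

With six candidates the search needs only two test boots, which is why bisecting scales to the large number of commits between v4.9 and v4.10.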


Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Wakko Warner
Wakko Warner wrote:
> Wakko Warner wrote:
> > Bart Van Assche wrote:
> > > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > > Richard Weinberger wrote:
> > > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner  
> > > > > wrote:
> > > > > > I reported this before but noone responded.
> > > > > 
> > > > > Because you're sending only to LKML.
> > > > > CC'ing storage folks.
> > > > 
> > > > Thank you.  I wasn't sure who I needed to send it to.
> > > 
> > > Can you share the output of lsscsi? I would like to know whether or not 
> > > you
> > > are using a (S)ATA CDROM.
> > 
> > From the target:
> > [4:0:0:0]cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr0 
> > [5:0:0:0]cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr1 
> > [6:0:0:0]cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr2 
> > 
> > From the initiator:
> > [19:0:0:0]   cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr1
> > [19:0:0:1]   cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr2
> > [19:0:0:2]   cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr3
> > 
> > 
> > I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> > From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I mount
> > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > crashes.  I'm using the builtin iscsi target with pscsi.  I can burn from
> > the initiator with out problems.  I'll test other kernels between 4.9 and
> > 4.14.
> 
> So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest patch
> (except for 4.15 which was 1 behind)
> Each of these kernels crash within seconds or immediate of doing find -type
> f | xargs cat > /dev/null from the initiator.

I tried 4.10.0.  It doesn't completely lock up the system, but the device
that was used hangs.  So from the initiator, it's /dev/sr1 and from the
target it's /dev/sr0.  Attempting to read /dev/sr0 after the oops causes the
process to hang in D state.

Here's the oops.  There was also another line that was not seen in the newer
kernels.
[ 323.105044] ------------[ cut here ]------------
[ 323.105057] WARNING: CPU: 0 PID: 0 at /usr/src/linux/dist/4.10/drivers/scsi/scsi_lib.c:1043 scsi_init_io+0x143/0x1f0 [scsi_mod]
[ 323.105058] Modules linked in: iscsi_target_mod af_packet tcm_loop vhost_scsi 
vhost target_core_file target_core_iblock target_core_pscsi target_core_mod 
nfsd exportfs dummy bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core ipv6 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole configfs sr_mod 
cdrom sd_mod sg adt7475 hwmon_vid coretemp x86_pkg_temp_thermal kvm_intel kvm 
irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc 
snd_hda_codec_realtek snd_hda_codec_generic nouveau video led_class 
drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt 
fb_sys_fops cfbcopyarea ttm drm snd_hda_intel agpgart snd_hda_codec 
snd_hda_core snd_pcm_oss igb snd_mixer_oss aesni_intel snd_pcm aes_x86_64 hwmon 
snd_timer crypto_simd i2c_algo_bit mptsas snd glue_helper
[ 323.105089]  mpt3sas i2c_core mptscsih soundcore ahci mptbase raid_class 
libahci scsi_transport_sas libata scsi_mod button wmi hed unix
[ 323.105097] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0 #1
[ 323.105098] Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A16 
02/05/2018
[ 323.105100] Call Trace:
[ 323.105101]  <IRQ>
[ 323.105105]  ? dump_stack+0x46/0x5a
[ 323.105107]  ? __warn+0xb4/0xd0
[ 323.105110]  ? scsi_init_io+0x143/0x1f0 [scsi_mod]
[ 323.105113]  ? scsi_setup_cmnd+0x4c/0x140 [scsi_mod]
[ 323.105115]  ? scsi_prep_fn+0xe3/0x170 [scsi_mod]
[ 323.105118]  ? swiotlb_unmap_sg_attrs+0x44/0x60
[ 323.105119]  ? blk_peek_request+0x130/0x200
[ 323.105122]  ? scsi_request_fn+0x2b/0x510 [scsi_mod]
[ 323.105124]  ? __blk_run_queue+0x2a/0x40
[ 323.105126]  ? blk_run_queue+0x1c/0x30
[ 323.105129]  ? scsi_run_queue+0x229/0x2b0 [scsi_mod]
[ 323.105131]  ? scsi_io_completion+0x3d6/0x5c0 [scsi_mod]
[ 323.105133]  ? blk_done_softirq+0x67/0x80
[ 323.105135]  ? __do_softirq+0xdb/0x200
[ 323.105137]  ? irq_exit+0xa3/0xb0
[ 323.105139]  ? do_IRQ+0x45/0xc0
[ 323.105141]  ? common_interrupt+0x7c/0x7c
[ 323.105142]  </IRQ>
[ 323.105145]  ? cpuidle_enter_state+0x144/0x1f0
[ 323.105146]  ? cpuidle_enter_state+0x139/0x1f0
[ 323.105148]  ? do_idle+0xd3/0x190
[ 323.105150]  ? cpu_startup_entry+0x14/0x20
[ 323.105152]  ? start_kernel+0x391/0x399
[ 323.105154]  ? start_cpu+0x14/0x14
[ 323.105155] ---[ end trace f38cc734e4921bdc ]---
[ 323.105157] blk_peek_request: bad return=-22
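
A note on the last log line: the -22 looks like a negated errno value, and errno 22 on Linux is EINVAL. That would be consistent with the `return -EINVAL` visible in the scsi_init_io() diff elsewhere in this thread, though that link is an inference rather than something the log states. A small sketch for decoding such values (the helper name `describe_kernel_errno` is made up for illustration):

```python
import errno
import os


def describe_kernel_errno(ret: int) -> str:
    """Map a negative kernel-style return value to its errno name and message."""
    code = abs(ret)
    name = errno.errorcode.get(code, "UNKNOWN")  # symbolic name, e.g. EINVAL
    return f"{ret} -> {name} ({os.strerror(code)})"


print(describe_kernel_errno(-22))
```

The message text comes from the local C library, so it may differ slightly between systems; the symbolic name is the part to rely on.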


Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Wakko Warner
Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > Richard Weinberger wrote:
> > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner  
> > > > wrote:
> > > > > I reported this before but noone responded.
> > > > 
> > > > Because you're sending only to LKML.
> > > > CC'ing storage folks.
> > > 
> > > Thank you.  I wasn't sure who I needed to send it to.
> > 
> > Can you share the output of lsscsi? I would like to know whether or not you
> > are using a (S)ATA CDROM.
> 
> From the target:
> [4:0:0:0]cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr0 
> [5:0:0:0]cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr1 
> [6:0:0:0]cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr2 
> 
> From the initiator:
> [19:0:0:0]   cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr1
> [19:0:0:1]   cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr2
> [19:0:0:2]   cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr3
> 
> 
> I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I mount
> /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> crashes.  I'm using the builtin iscsi target with pscsi.  I can burn from
> the initiator with out problems.  I'll test other kernels between 4.9 and
> 4.14.

So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest patch
(except for 4.15 which was 1 behind)
Each of these kernels crash within seconds or immediate of doing find -type
f | xargs cat > /dev/null from the initiator.

I did a diff between 4.9.91 and 4.10.17 on scsi_lib.c.  Here's the
difference around the line reported (in this case 1043).  I've added the
4.10.17 oops at the end:

@@ -1029,10 +1038,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
struct scsi_device *sdev = cmd->device;
struct request *rq = cmd->request;
bool is_mq = (rq->mq_ctx != NULL);
-   int error;
+   int error = BLKPREP_KILL;
 
-   if (WARN_ON_ONCE(!rq->nr_phys_segments))
-   return -EINVAL;
+   if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
+   goto err_exit;
 
error = scsi_init_sgtable(rq, &cmd->sdb);
if (error)

Oops:
[ 158.157590] ------------[ cut here ]------------
[ 158.157601] WARNING: CPU: 0 PID: 0 at /usr/src/linux/dist/4.10.17-nobklcd/drivers/scsi/scsi_lib.c:1043 scsi_init_io+0x1d7/0x1e0 [scsi_mod]
[ 158.157603] Modules linked in: iscsi_target_mod tcm_loop af_packet vhost_scsi 
vhost target_core_file target_core_iblock target_core_pscsi target_core_mod 
nfsd exportfs dummy bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core ipv6 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole configfs sr_mod 
cdrom sd_mod sg adt7475 hwmon_vid coretemp x86_pkg_temp_thermal kvm_intel kvm 
irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc 
snd_hda_codec_realtek snd_hda_codec_generic nouveau video led_class 
drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt 
fb_sys_fops cfbcopyarea ttm drm agpgart snd_hda_intel snd_hda_codec 
snd_hda_core mptsas snd_pcm_oss snd_mixer_oss mptscsih mpt3sas snd_pcm mptbase 
snd_timer raid_class aesni_intel snd scsi_transport_sas
[ 158.157634]  igb soundcore aes_x86_64 crypto_simd ahci glue_helper libahci 
hwmon libata i2c_algo_bit i2c_core scsi_mod wmi hed button unix
[ 158.157642] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.17 #1
[ 158.157644] Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A16 
02/05/2018
[ 158.157645] Call Trace:
[ 158.157647]  <IRQ>
[ 158.157651]  ? dump_stack+0x46/0x5a
[ 158.157653]  ? __warn+0xb4/0xd0
[ 158.157656]  ? scsi_init_io+0x1d7/0x1e0 [scsi_mod]
[ 158.157658]  ? scsi_setup_cmnd+0x4c/0x140 [scsi_mod]
[ 158.157661]  ? scsi_prep_fn+0xe3/0x170 [scsi_mod]
[ 158.157663]  ? swiotlb_unmap_sg_attrs+0x44/0x60
[ 158.157665]  ? blk_peek_request+0x130/0x200
[ 158.157668]  ? scsi_request_fn+0x2b/0x510 [scsi_mod]
[ 158.157670]  ? __blk_run_queue+0x2a/0x40
[ 158.157672]  ? blk_run_queue+0x1c/0x30
[ 158.157675]  ? scsi_run_queue+0x229/0x2b0 [scsi_mod]
[ 158.157677]  ? scsi_io_completion+0x3d6/0x5c0 [scsi_mod]
[ 158.157680]  ? blk_done_softirq+0x67/0x80
[ 158.157682]  ? __do_softirq+0xdb/0x200
[ 158.157683]  ? irq_exit+0xa3/0xb0
[ 158.157686]  ? do_IRQ+0x45/0xc0
[ 158.157689]  ? common_interrupt+0x7c/0x7c
[ 158.157690]  </IRQ>
[ 158.157693]  ? cpuidle_enter_state+0x144/0x1f0
[ 158.157694]  ? cpuidle_enter_state+0x139/0x1f0
[ 158.157696]  ? do_idle+0xd3/0x190
[ 158.157698]  ? cpu_startup_entry+0x14/0x20
[ 158.157700]  ? start_kernel+0x391/0x399
[ 158.157701]  ? start_cpu+0x14/0x14
[ 158.157703] ---[ end trace 8d60c2e92fac2697 ]---
[ 158.157711] ---

Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Wakko Warner
Bart Van Assche wrote:
> On Sun, 2018-04-01 at 07:37 -0400, Wakko Warner wrote:
> > Bart Van Assche wrote:
> > > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > > Richard Weinberger wrote:
> > > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner  
> > > > > wrote:
> > > > > > I reported this before but noone responded.
> > > > > 
> > > > > Because you're sending only to LKML.
> > > > > CC'ing storage folks.
> > > > 
> > > > Thank you.  I wasn't sure who I needed to send it to.
> > > 
> > > Can you share the output of lsscsi? I would like to know whether or not 
> > > you
> > > are using a (S)ATA CDROM.
> > 
> > From the target:
> > [4:0:0:0]cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr0 
> > [5:0:0:0]cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr1 
> > [6:0:0:0]cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr2 
> > 
> > From the initiator:
> > [19:0:0:0]   cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr1
> > [19:0:0:1]   cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr2
> > [19:0:0:2]   cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr3
> > 
> > I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> > From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I mount
> > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > crashes.  I'm using the builtin iscsi target with pscsi.  I can burn from
> > the initiator with out problems.  I'll test other kernels between 4.9 and
> > 4.14.
> 
> (+Lee and Chris)
> 
> Hello Wakko,
> 
> Although I'm not sure that what I ran into is exactly the same as what you
> ran into, there is definitely something wrong with what I encountered. What
> I ran into with Linus' latest master branch indicates two issues - one in
> the iSCSI initiator and one in the block layer:
> 
> scsi 3:0:0:1: Direct-Access LIO-ORG  FILEIO   4.0  PQ: 0 ANSI: 5
> sd 2:0:0:1: [sdd] Attached SCSI disk
> sd 3:0:0:1: Warning! Received an indication that the LUN assignments on this
> target have changed. The Linux SCSI layer does not automatical
> sd 3:0:0:1: Attached scsi generic sg8 type 0
> sd 3:0:0:1: [sdf] 128 512-byte logical blocks: (65.5 kB/64.0 KiB)
> sd 3:0:0:1: [sdf] Write Protect is off
> sd 3:0:0:1: [sdf] Mode Sense: 43 00 00 08
> sd 3:0:0:1: [sdf] Write cache: disabled, read cache: enabled, doesn't
> support DPO or FUA
> iSCSI/iqn.1993-08.org.debian:01:3b68b1b3d2eb: Unsupported SCSI Opcode 0xa3,
> sending CHECK_CONDITION.
> sd 3:0:0:2: [sde] Attached SCSI disk
> sd 3:0:0:1: [sdf] Attached SCSI disk
> 
> =
> WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> 4.16.0-rc7-dbg+ #3 Not tainted
> -
> kworker/6:1H/155 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
>  (&(&session->frwd_lock)->rlock){+.-.}, at: [<7eb678ec>]
> iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]

[trimmed]

I'm not sure.  Mine happens as 2 oopses.  Both have <IRQ> </IRQ> lines.
The files mine happen in are drivers/scsi/scsi_lib.c followed by
block/blk-core.c

The first one, the stack trace began with <IRQ> then scsi_setup_cmnd.  I
tested 4.10.x, 4.11.x, 4.12.x, 4.14.x and 4.15.x where x is the latest patch
(except for 4.15).  ALL crash.  4.9.91 doesn't.  4.10 added dump_stack
__warn scsi_init_io after <IRQ> and before scsi_setup_cmnd.  Within seconds
of issuing the command to read files, it crashes.  On 4.15, if I just do a
sequential read from the raw device, it doesn't crash.

What do you enable in the kernel to get those locking messages?

> stack backtrace:
> CPU: 6 PID: 155 Comm: kworker/6:1H Not tainted 4.16.0-rc7-dbg+ #3
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Workqueue: kblockd blk_timeout_work
> Call Trace:
>  dump_stack+0x85/0xc5
>  check_usage+0x6e7/0x700
>  ? check_usage_forwards+0x220/0x220
>  ? find_next_and_bit+0x51/0xe0
>  ? cpumask_next_and+0x20/0x30
>  ? find_busiest_group+0xc94/0x1010
>  ? class_equal+0x11/0x20
>  ? __bfs+0x62/0x2e0
>  ? class_equal+0x11/0x20
>  ? __bfs+0xfb/0x2e0
>  ? __lock_acquire+0x17aa/0x1af0
>  __lock_acquire+0x17aa/0x1af0
>  ? mark_lock+0xc7/0x770
>  ? debug_check_no_locks_freed+0x1b0/0x1b0
>  ? __lock_acquire+0x583/0x1af0
>  ? mark_lock+0xc7/0x770
>  ? lock_pin_lock+0x160/0x160
>  ? debug_check_no_locks_freed+0x1b0/0x1b0
>  ? lock_acquire+0xc9/0x260
>  lock_acquire+0xc9/0x260
>  ? iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
>  _raw_spin_lock+0x2f/0x40
>

Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Wakko Warner
Bart Van Assche wrote:
> On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > Richard Weinberger wrote:
> > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner  wrote:
> > > > I reported this before but noone responded.
> > > 
> > > Because you're sending only to LKML.
> > > CC'ing storage folks.
> > 
> > Thank you.  I wasn't sure who I needed to send it to.
> 
> Can you share the output of lsscsi? I would like to know whether or not you
> are using a (S)ATA CDROM.

From the target:
[4:0:0:0]cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr0 
[5:0:0:0]cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr1 
[6:0:0:0]cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr2 

From the initiator:
[19:0:0:0]   cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr1
[19:0:0:1]   cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr2
[19:0:0:2]   cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr3


I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I mount
/dev/sr1 and then do find -type f | xargs cat > /dev/null the target
crashes.  I'm using the builtin iscsi target with pscsi.  I can burn from
the initiator without problems.  I'll test other kernels between 4.9 and
4.14.


Re: 4.15.14 crash with iscsi target and dvd

2018-03-31 Thread Wakko Warner
Richard Weinberger wrote:
> On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner  wrote:
> > I reported this before but noone responded.
> 
> Because you're sending only to LKML.
> CC'ing storage folks.

Thank you.  I wasn't sure who I needed to send it to.


Re: [PATCH 0/4] Fix performance burning or extracting audio etc. from multiple optical drives.

2015-11-04 Thread Wakko Warner
Tim Small wrote:
> Fix performance burning or extracting audio etc. from multiple optical
> drives.

I know this is a bit late and is still not in 4.3.  I applied 2 of the
patches.  I did not apply the ide-cd, paride or gdrom patches since I don't
have any of those.

> Patches are against 3.18.0-rc6+

I had to make some modifications to the patch for sr.c since it has changed
since.  cdrom.[ch] worked verbatim.

I tested on a system with 3 drives.  Ejecting all drives didn't happen at
the same time, but I think it's because they are different brands and one
didn't have a disc in.  I did notice the LEDs coming on about the same time
though.  eject -t on all drives happened at the same time.

The patch I used previously on 3.3.0 removed all mutex_lock and mutex_unlock
lines from sr.c, whereas this patchset didn't.  I plan on trying to burn 3
DVDs to see if it works.

Thanks for your work on the patches.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Very slow throughput when using cdparanoia on two SATA CDROM drives with /dev/sr but not /dev/sg

2014-11-07 Thread Wakko Warner
Tim Small wrote:
> I've got a big box of audio CDs to read, so I hooked up a load of SATA
> CDROM drives to a machine (Intel motherboard AHCI, and SATA SiI3124
> controllers - in this example one was attached to each host controller),
> so that I could read them in parallel.
> 
> I'm using kernel 3.16.3, with cdparanoia 3.10.2 - both from Debian
> Jessie.  https://www.xiph.org/paranoia/manual.html
> 
> When two (or more) drives are read simultaneously, the performance falls
> to pieces (throughput from each drive drops by 95%) if more than one
> /dev/sr* device is being read by cdparanoia.
> 
> If I tell cdparanoia to use the corresponding /dev/sg devices, then no
> significant throughput drop is experienced when reading multiple drives
> simultaneously.

I had a similar problem burning multiple DVDs at the same time.  I asked
about this on the list more than 2 years ago and was pointed to a patch that
fixed it for me.  It involves sr_mod.  You can unload it, patch the source
and recompile.  Once sr_mod.ko was built, I loaded it with insmod and it
worked for me.

See https://lkml.org/lkml/2012/2/28/230 for the patch.

The machine I use to do this is running 3.3.0 with the patch and is quite
stable.  I was using 3.0.0 at the time.  I haven't tested on any newer kernels.

Do a search for the thread "Burning multiple DVDs at one time".

> As an example, using these two drives:
> 
> [1:0:0:0]cd/dvd  TSSTcorp DVD+-RW TS-H653G DW10  /dev/sr0   /dev/sg4
> [14:0:0:0]   cd/dvd  PLDS DVD+-RW DH-16A6S YD11  /dev/sr9   /dev/sg16
> 
> 
> ... in the following results I used "time cdparanoia -v -d /dev/XXX 1
> /tmp/1.wav" - where XXX was substituted for either sr9 or sg16
> 
> 
> On an otherwise idle machine, I did these two sequentially:
> 
> sr9: 38 seconds
> 
> sg16: 38 seconds
> 
> Simultaneous with: cdparanoia -d /dev/sr0 11 /tmp/11.wav (and auto
> restarted that command when it completed) I then ran these two sequentially:
> 
> sr9: 680 seconds
> 
> sg16: 38 seconds
> 
> 
> 
> Simultaneous with: cdparanoia -d /dev/sg4 11 /tmp/11.wav as above:
> 
> sr9: 40 seconds
> 
> sg16: 40 seconds
> 
> 
> This is a diff of the two sets of cdparanoia -v output (using the sr
> devices vs the sg devices):
> 
> --- /tmp/sr 2014-11-06 12:41:43.094867889 +
> +++ /tmp/sg 2014-11-06 12:42:00.463123769 +
> @@ -2,9 +2,9 @@
>  
>  Using cdda library version: 10.2
>  Using paranoia library version: 10.2
> -Checking /dev/sr9 for cdrom...
> -Testing /dev/sr9 for SCSI/MMC interface
> -SG_IO device: /dev/sr9
> +Checking /dev/sg16 for cdrom...
> +Testing /dev/sg16 for SCSI/MMC interface
> +SG_IO device: /dev/sg16
>  
>  CDROM model sensed sensed: PLDS DVD+-RW DH-16A6S YD11
>  
> @@ -13,9 +13,9 @@
>  
>  Checking for MMC style command set...
>  Drive is MMC style
> -DMA scatter/gather table entries: 1
> +DMA scatter/gather table entries: 167
>  table entry size: 131072 bytes
> -maximum theoretical transfer: 55 sectors
> +maximum theoretical transfer: 9185 sectors
>  Setting default read size to 27 sectors (63504 bytes).
>  
>  Verifying CDDA command set...
> @@ -23,3 +23,5 @@
> 
> 
> I'm happy to try other kernel versions to gather more data.  Which
> kernel trees/branches should I try?
> 
> I'm also assuming this is more likely to be a SCSI layer bug than a SATA
> one, so let me know if that's probably wrong.  Also, is reporting here
> best or bugzilla?
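
As a quick check on the "drops by 95%" figure in the quoted report, the timings it gives for sr9 (38 seconds alone, 680 seconds with a second /dev/sr reader running) can be turned into a throughput drop. This is only arithmetic on the quoted numbers, assuming the same track was read each time; the function name is made up for illustration.

```python
def throughput_drop_percent(baseline_seconds: float, slow_seconds: float) -> float:
    """Percentage drop in throughput when the same job takes longer.

    Throughput is proportional to 1/time, so the remaining fraction is
    baseline/slow and the drop is one minus that.
    """
    return (1.0 - baseline_seconds / slow_seconds) * 100.0


# sr9 alone vs. sr9 while another /dev/sr device is being read
print(round(throughput_drop_percent(38, 680), 1))

# sr9 alone vs. sr9 while the other drive is read through /dev/sg
print(round(throughput_drop_percent(38, 40), 1))
```

The first case comes out at roughly 94%, in line with the figure in the report, while the /dev/sg case is about 5%.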



Re: Help with mpt2sas kernel message

2013-12-12 Thread Wakko Warner
Peter Chang wrote:
> 2013/12/12 Wakko Warner :
> > Kernel: vanilla 3.7.2.
> >
> > I have 2 mpt2sas controllers.  I'm running a md check on the 2 arrays that I
> > have (one per card).  I'm seeing this in my kernel log (last 4 lines):
> > 2013-12-10 19:52:10 kame kernel:[1558186.193904] mpt2sas0: 
> > log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
> 
> 'all ncq IOs fail when one ncq io encounters an error'

Can you point me to a list of codes?  Is this message something that I
should be concerned about?  I haven't seen this message before, however, I
changed all the disks in the array and replaced the controller (The mptsas
wouldn't support 3tb disks).  The chassis is external and the card has 2
external minisas connectors.  About the only thing that didn't change was
the chassis and the computer.

> > I'd also like to know which card mpt2sas0 is.  Is it the one that has the
> > lower numbered pci bus?
> 
> it's the first controller enumerated on pci, so probably the lowest number.

That would be the 2 minisas controller then.  The disks on this controller
are 8x 3 TB in raid6 with a 128k chunk.

Thanks for your response.



Help with mpt2sas kernel message

2013-12-12 Thread Wakko Warner
Kernel: vanilla 3.7.2.

I have 2 mpt2sas controllers.  I'm running a md check on the 2 arrays that I
have (one per card).  I'm seeing this in my kernel log (last 4 lines):
2013-12-10 19:52:10 kame kernel:[1558186.193904] mpt2sas0: 
log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
2013-12-10 19:52:35 kame kernel:[1558211.606908] mpt2sas0: 
log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
2013-12-10 19:53:02 kame kernel:[1558237.954502] mpt2sas0: 
log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
2013-12-10 19:53:50 kame kernel:[1558286.227318] mpt2sas0: 
log_info(0x3108): originator(PL), code(0x08), sub_code(0x)

Can someone tell me if there is a problem here?
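
For reference, a sketch of how the fields in those log lines relate to the raw log_info word. It assumes the layout the driver appears to use for SAS log info: bus type in the top 4 bits, originator in the next 4, an 8-bit code, then a 16-bit sub-code. The value shown in this archive looks truncated, so 0x31080000 below is an assumed full value and not one taken from the log. The names `decode_sas_loginfo` and `ORIGINATORS` are illustrative only.

```python
# Originator names as printed by the driver (assumed mapping: 0=IOP, 1=PL, 2=IR).
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_sas_loginfo(value: int) -> dict:
    """Split a 32-bit log_info word into its assumed bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,    # bits 31..28
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),  # bits 27..24
        "code": (value >> 16) & 0xFF,       # bits 23..16
        "sub_code": value & 0xFFFF,         # bits 15..0
    }


info = decode_sas_loginfo(0x31080000)  # assumed full value, see note above
print(info)
```

Under these assumptions the decode gives originator PL and code 0x08, matching the fields the kernel printed.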

I'd also like to know which card mpt2sas0 is.  Is it the one that has the
lower numbered pci bus?  I was unable to figure that out from
/sys/bus/pci/drivers/mpt2sas.

The contents are:
lrwxrwxrwx 1 root root0 Dec 10 19:49 :03:00.0 -> 
../../../../devices/pci:00/:00:05.0/:03:00.0/
lrwxrwxrwx 1 root root0 Dec 10 19:49 :04:00.0 -> 
../../../../devices/pci:00/:00:07.0/:04:00.0/

lspci for those:
03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
04:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)

I am able to figure out which drives are on which of these controllers from
/sys/block

So far, I'm not seeing any data errors.  Although drives on 03:00.0 appear
to be running slower than the ones on 04:00.0 according to /proc/mdstat.  No
drives are actually in use other than the check that is running.

I posted to linux-kernel@ but I didn't receive a reply and thought this list
would be better suited.

-- 
 Microsoft has beaten Volkswagen's world record.  Volkswagen only created 22
 million bugs.


Re: [PATCH] [SCSI] sr: Fix multi-drive performance by using per-device mutexes

2013-01-04 Thread Wakko Warner
Stefan Richter wrote:
> On Jan 04 Otto Meta wrote:
> > Otto Meta wrote:
> > > The single mutex for the sr module, introduced as a BKL replacement,
> > > globally serialises all sr ioctls, which hurts multi-drive performance.
> > > 
> > > This patch replaces sr_mutex with per-device mutexes in struct scsi_cd,
> > > allowing concurrent ioctls on different sr devices.
> > 
> > Unfortunately it wasn't as easy as that. The patch seems to introduce
> > a race condition that corrupts a drive's state under certain circumstances.
> > 
> > When two drives (e.g. sr0 and sr1) are attached to the same IDE cable, one
> > drive has its door locked, which will usually be the case after any operation
> > on the drive with inserted media (and whenever it feels like it, even with
> > dev.cdrom.lock=0), and the other drive is unlocked, then executing
> > 
> >   $ eject sr0 & eject sr1
> > 
> > will eject the unlocked drive and the locked drive will return
> > 
> >   eject: unable to eject, last error: Inappropriate ioctl for device
> > 
> > 
> > Other drivers down the road probably don't expect concurrent ioctls, so this
> > patch cannot be applied safely at this time. Sorry about the noise.
> > 
> > For the record: Tested with kernels 3.2.35 and 3.8.0-rc1, using IDE CD/DVD
> > drives connected via the drivers ata_piix and pata_pdc202xx_old.
> 
> As you may have seen in the mailing list archive, when Wakko and I tested
> with sr_mutex removed without any replacement, we were not able to trigger
> any race condition.  However, we certainly did not attempt this very
> particular test (two drives on the same PATA cable, one locked and one
> unlocked, and "eject" called on both of them at the same time).  I wonder
> if this is a PATA idiosyncrasy.
> 
> I will see whether I can do some tests tomorrow.  I can easily test master
> and slave PATA drives on a single cable behind a PATA-to-1394 bridge; but
> testing two drives on a single cable behind a PATA-to-PCI controller would
> be a bit more involved because the case of my PATA-equipped Linux PC is
> rather cramped.

I myself have not tried this specific test.  I do not have any systems with
2 PATA DVDroms on the same cable.  On the only system I have with PATA, the
drives are on separate cables with nothing else on either cable.  My other
system is all SATA.  I have no other systems with more than 1 CD/DVD drive.


Possible bug with 2.6.20 and SG_IO

2007-06-08 Thread Wakko Warner
Please keep me in the CC.

I was attempting to send an INQUIRY to a scsi device using /dev/sg* and
SG_IO.  My first attempt hard locked the machine.

My code was:
 memset(&sg_io, 0, sizeof(sg_io));
 sg_io.interface_id='S';
 sg_io.dxfer_direction=SG_DXFER_FROM_DEV;
 sg_io.cmd_len=sizeof(cmd);
 sg_io.cmdp=(void *)&cmd;
 sg_io.timeout=1000; /* 1 second */
 sg_io.dxferp=inq;
 if (sense)
 {
  sg_io.mx_sb_len=sizeof(*sense);
  sg_io.sbp=(void *)sense;
 }
 
 if (ioctl(fd, SG_IO, &sg_io) == -1)
  return -1;

I'm fairly new to SG_IO so I didn't notice what I did wrong until it was too
late.  I did not set sg_io.dxfer_len=sizeof(*inq) !  This was with sg8, a
scsi processor (saf-te device) on an aic7xxx (aha-39160) controller.

I decided to try it on my laptop: sg1 (a usb flash drive) just gave me some
bizarre data, while sg0 (an IDE cdrom using ide-scsi) hard locked the machine.
The kernel version on this machine is 2.6.18.

I'm wondering if I hit a bug since the sg_io.dxfer_len was 0 when it locked
up.  Maybe the SG_IO code should detect when there is an xfer to/from the
device and sg_io.dxfer_len is 0, and return -1?
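
For comparison, here is a minimal sketch of the same request with
dxfer_len filled in and a user-space guard that refuses to issue a data-in
transfer with a zero-length buffer.  It is only an illustration: the helper
names build_inquiry_hdr and send_inquiry are made up here, the CDB is
assumed to be a plain 6-byte INQUIRY (opcode 0x12, allocation length in
byte 4), and the guard lives in the caller, not in the kernel's SG_IO path
as suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define INQUIRY_OPCODE 0x12 /* standard SCSI INQUIRY opcode */

/*
 * Fill in an sg_io_hdr_t for a 6-byte INQUIRY that reads buf_len bytes.
 * Returns 0 on success, or -1 if the data buffer is missing, empty, or
 * too large for the one-byte allocation length used here.
 */
int build_inquiry_hdr(sg_io_hdr_t *hdr, unsigned char cdb[6],
                      unsigned char *buf, size_t buf_len,
                      unsigned char *sense, size_t sense_len)
{
    assert(hdr != NULL && cdb != NULL);
    if (buf == NULL || buf_len == 0 || buf_len > 255)
        return -1;

    memset(cdb, 0, 6);
    cdb[0] = INQUIRY_OPCODE;
    cdb[4] = (unsigned char)buf_len; /* allocation length */

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;
    hdr->cmd_len = 6;
    hdr->cmdp = cdb;
    hdr->dxferp = buf;
    hdr->dxfer_len = (unsigned int)buf_len; /* the field that was left at 0 */
    hdr->timeout = 1000; /* milliseconds, i.e. 1 second */
    if (sense != NULL && sense_len > 0) {
        hdr->mx_sb_len = sense_len > 255 ? 255 : (unsigned char)sense_len;
        hdr->sbp = sense;
    }
    return 0;
}

/* Build the header and issue it; returns 0 on success, -1 on any error. */
int send_inquiry(int fd, unsigned char *buf, size_t buf_len,
                 unsigned char *sense, size_t sense_len)
{
    sg_io_hdr_t hdr;
    unsigned char cdb[6];

    if (build_inquiry_hdr(&hdr, cdb, buf, buf_len, sense, sense_len) != 0)
        return -1;
    return ioctl(fd, SG_IO, &hdr) == -1 ? -1 : 0;
}
```

With a 96-byte buffer (a size picked for this example), a call would look
like send_inquiry(fd, inq, sizeof(inq), sense, sizeof(sense)).  Keeping the
header setup in its own function means the length check can be exercised
without touching a real device.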

P.S. I know ide-scsi is deprecated and I have problems with it.  I intend
to migrate that machine to libata on the next kernel upgrade.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals
 Got Gas???


scsi0: device overrun (status a) on 0:0:0

2007-04-28 Thread Wakko Warner
NOTE: I am not on the list, please always keep me in CC.

Can anyone tell me what this means:
Apr 28 22:50:42 vegeta kernel: [179467.422680] scsi0: device overrun (status a) on 0:0:0
Apr 28 22:50:42 vegeta kernel: [179467.479433] scsi0: device overrun (status a) on 0:0:0
Apr 28 22:50:42 vegeta kernel: [179467.499404] scsi0: device overrun (status a) on 0:0:0
Apr 28 22:50:42 vegeta kernel: [179467.565196] scsi0: device overrun (status a) on 0:0:0
Apr 28 22:50:43 vegeta kernel: [179467.647090] scsi0: device overrun (status a) on 0:0:0

Kernel is 2.6.20.

I see this frequently in my logs/dmesg and have no clue what it really
means.  I have 2 disks that constantly give this, both are SEAGATE
ST318404LW on an onboard AIC79xx controller (MB: SuperMicro X5DA8).

I see no noticeable problems or speed issues.  Kernel 2.6.17 did not show me
this error.

A grep of the source shows this line is in drivers/scsi/aic7xxx/aic79xx_osm.c
and nowhere else.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals
 Got Gas???


Re: AIC7xxx on 2.6.18

2007-02-04 Thread Wakko Warner
Keep me in CC, I'm not on the list.

James Bottomley wrote:
> On Fri, 2007-02-02 at 19:42 -0500, Wakko Warner wrote:
> > [   40.154122] ACPI: PCI Interrupt :05:01.1[B] -> GSI 17 (level, low) -> IRQ 22
> > [   40.158190] scsi4: PCI error Interrupt at seqaddr = 0x1bb
> > [   40.158261] scsi4: Signaled a Target Abort
> 
> Well, this is the source of the problem.  It means the driver detected
> an error in the PCI system.  I'm afraid I don't know what a PCI target
> error is, but I think it means something is wrong with the PCI bus in
> your system.  There's also a screaming interrupt, because the first 500
> interrupts will be ignored before it looks at the bus error register.

What I don't understand is that it works fine if I load the module first
and then exec init.  If the ID is 05:01.1, this is the 2nd channel provided
by the aha-2940u/uw card.  There's nothing attached to that channel.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals
 Got Gas???


Re: AIC7xxx on 2.6.18

2007-02-03 Thread Wakko Warner
Please keep me in CC, I'm not on the list.

Mark Rustad wrote:
> On Feb 2, 2007, at 6:42 PM, Wakko Warner wrote:
> >The PC is a Supermicro x5da8 with an onboard dual channel AHA-39320 u320
> >controller.  I have a dual channel AHA-39160 u160 and a dual channel
> >AHA-2940U/UW (ch0/internal is wide/narrow, ch1/external is narrow).
> 
> I have used an x6-class Supermicro motherboard with the Adaptec u320  
> controller and I had problems hot-swapping drives with 2.6.18. It  

This wasn't about hot swapping in my case.  On the hot swapping thing, I've
never successfully done this with a 2.6 kernel.

> seemed that the bus reset that the backplane processor generated  
> caused trouble for the driver, killing the SCSI bus. 2.6.16 and  
> 2.6.17 locked up the kernel in the case of hot-swapping drives.

I have a scsi box that supports hot plug.  I had a drive failing and
decided to just replace it.  That took down the entire bus and the raid
array with it; fortunately, nothing was lost.  This was with the dual u160
card.

> I switched to 2.6.19.2 and things are better. I did find that a card  
> dump is produced when hot-inserting a drive, so it is way noisier  
> than I think it should be, but it continues to operate and life goes  
> on, which is much better behavior than 2.6.16, 2.6.17 or 2.6.18.

I've never bothered with the 2.6.x.x kernels.  Yet =)

> >I thought it was because I had option roms turned off, but when I turned
> >them on, it still has problems.  What's odd is the fact that if I
> >boot with init=/bin/sh, modprobe aic7xxx, it works fine and I can
> >exec init and it works fine.
> 
> I don't know what is up with that, but based on what I have seen I  
> would recommend using 2.6.19.x instead of 2.6.18 for systems using  
> the aic79xx driver.

As stated in the email, it was the aic7xxx driver.  I've never had problems
with the u320 driver that I recall since I've had this machine.  It appears
that when I load aic7xxx, it finds both channels of the u160 card and all
its devices, then when it hits the 2940u/uw card, it loads the first
channel, all devices and crashes before it hits the 2nd channel.  It looked
like it had problems with the plextor cdrw that's on ID2.

But the odd thing is, if I boot to /bin/sh, insmod aic7xxx, and exec init,
everything's fine.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals
 Got Gas???


Re: AIC7xxx on 2.6.18

2007-01-30 Thread Wakko Warner
NOTE: I am not on the linux-scsi list, keep me in CC.

Andrew Morton wrote:
> On Tue, 30 Jan 2007 07:18:20 -0500
> Wakko Warner <[EMAIL PROTECTED]> wrote:
> > Andrew Morton wrote:
> > > Yes, getting the oops traces will help, thanks.  And confirmation on a
> > > more recent kernel would be good.
> > 
> > I tested with a 2.6.20-rc6 kernel and the MAC 39160 card.  There was no oops
> > and I was able to access the 2 disks.  This was on a different PC though. 
> > I'll try it again on the original PC.
> 
> Thanks.

The PC was a completely different PC when I tried it that time.  This time,
I tried it on a similar PC (same motherboard model, but not the exact same
machine).  I had no problems with 2.6.18.  I looked a little closer and I
noticed that the original machine was actually overclocked.  I did the same
to the machine that works and it is now not working.  So the problem with
the mac card seems to be the overclocking.  I had completely forgotten about
it since it was a test machine anyway.

So this just leaves the problem I've experienced on the machine with the PC
u160 and the u/uw dual card.

> > Should I try 2.6.19 as well?
> 
> There's not a lot of point in doing so.  If/when we come up with a
> 2.6.20-rc6 fix we'll know whether it is applicable to 2.6.19.x.

I'll try 2.6.19 on the machine with the 2 scsi cards with the option roms
disabled.  I'd rather not run a -rc kernel on this machine.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals
 Got Gas???


Re: AIC7xxx on 2.6.18

2007-01-30 Thread Wakko Warner
NOTE: I am not on the linux-scsi list, keep me in CC.

Andrew Morton wrote:
> On Sun, 28 Jan 2007 14:46:20 -0500
> Wakko Warner <[EMAIL PROTECTED]> wrote:
> 
> > I have 2 machines that oops with these cards.
> > 
> > 1) The bios has the option to enable/disable option roms on individual PCI
> > slots.  I have an AHA-39160 and an AHA-2940U/UW (dual channel).  If I
> > disable option roms, the driver oopses when accessing the 2nd card.
> > 
> > I can get the oops if really needed as I don't like rebooting this machine.
> > 
> > 2) I have an AHA-39160 with Apple/Mac firmware.  When attempting to use it
> > on a PC, the driver oopses presumably because the card wasn't initialized or
> > something.  I realize this is probably not a supported configuration, but I
> > don't believe that it should be oopsing.
> > 
> > I can get the oops for this one if it'll help.
> 
> Yes, getting the oops traces will help, thanks.  And confirmation on a more
> recent kernel would be good.

I tested with a 2.6.20-rc6 kernel and the MAC 39160 card.  There was no oops
and I was able to access the 2 disks.  This was on a different PC though. 
I'll try it again on the original PC.

Should I try 2.6.19 as well?

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals
 Got Gas???