Re: [PATCH 03/15] mpt3sas: Add sanity checks for scsi tracker before accessing it.

2018-04-01 Thread Sreekanth Reddy
On Fri, Mar 30, 2018 at 5:29 PM, Christoph Hellwig  wrote:
> On Fri, Mar 30, 2018 at 03:07:12PM +0530, Chaitra P B wrote:
>> Check scsi tracker for NULL before accessing it.
>> And in some places there are possibilities for getting valid st
>> but still other fields are not set.
>
> Please explain how that could ever happen.  You should never see
> a scsi_cmnd without the device pointer.

Chris,

Here is one example, During Host reset operation time driver will
flush out all the outstanding IO's to the SML in below function,

static void
_scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
{
struct scsi_cmnd *scmd;
struct scsiio_tracker *st;
u16 smid;
int count = 0;

[SR] Here driver is looping starting from smid one to HBA queue depth.
for (smid = 1; smid <= ioc->scsiio_depth; smid++) {

[SR] Some times it is possible that scsi_cmnd might have created at
SML but it might not be issued to the driver as host reset operation
is going on, So here we will get non-NULL scmd.
scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
if (!scmd)
continue;
count++;
_scsih_set_satl_pending(scmd, false);
[SR] Here we are trying to get the scsi tracker 'st' for the above
scmd (which is not received by the driver) and so fields of this 'st'
are uninitialized.
st = scsi_cmd_priv(scmd);
[SR] And here we are trying to clear the scsi tracker 'st' which is
not yet all initialized by the driver, in other terms we are trying to
flush out the scmd command which is not yet all received by the
driver.
mpt3sas_base_clear_st(ioc, st);
scsi_dma_unmap(scmd);
if (ioc->pci_error_recovery || ioc->remove_host)
scmd->result = DID_NO_CONNECT << 16;
else
scmd->result = DID_RESET << 16;
scmd->scsi_done(scmd);
}
dtmprintk(ioc, pr_info(MPT3SAS_FMT "completing %d cmds\n",
ioc->name, count));
}

Thanks,
Sreekanth


Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk

2018-04-01 Thread Jason Yan



On 2018/4/2 13:29, Sergey Senozhatsky wrote:

On (04/02/18 13:14), wen.yan...@zte.com.cn wrote:


> It's true that this print for the same device is useless. But it's
> useful for different devices. Is it possible to limit the print only
> for the same device?

In our scene, it's  just for the same device (q->queuedata), Thanks.


Yes, what Jason meant was that rate limit struct is shared by different call
sites - including scsi_request_fn() from different devices.

If device1->scsi_request_fn()->sdev_printk_ratelimited() causes printk rate
limit, then messages from device2->scsi_request_fn()->sdev_printk_ratelimited()
may be lost entirely, unless you have enough of them.

-ss



Yes, that's exactly what I mean.

Thanks,

Jason


.





Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk

2018-04-01 Thread Jason Yan



On 2018/4/2 13:14, wen.yan...@zte.com.cn wrote:

Hello,

 > It's true that this print for the same device is useless. But it's

 > useful for different devices. Is it possible to limit the print only

 > for the same device?


In our scene, it's  just for the same device (q->queuedata), Thanks.



I mean the print limit you add will affect all devices. One device's 
print *may* cause another device's print limited even if it only printed 
one message.




Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk

2018-04-01 Thread Sergey Senozhatsky
On (04/02/18 13:14), wen.yan...@zte.com.cn wrote:
> 
>> It's true that this print for the same device is useless. But it's
>> useful for different devices. Is it possible to limit the print only
>> for the same device?
> 
>In our scene, it's  just for the same device (q->queuedata), Thanks.

Yes, what Jason meant was that rate limit struct is shared by different call
sites - including scsi_request_fn() from different devices.

If device1->scsi_request_fn()->sdev_printk_ratelimited() causes printk rate
limit, then messages from device2->scsi_request_fn()->sdev_printk_ratelimited()
may be lost entirely, unless you have enough of them.

-ss


Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk

2018-04-01 Thread Jason Yan

Hi, Yang,

On 2018/4/2 9:58, Wen Yang wrote:

There would be so many same lines printed by frequent printk if one
disk went wrong, like,
[  546.185242] sd 0:1:0:0: rejecting I/O to offline device
[  546.185258] sd 0:1:0:0: rejecting I/O to offline device
[  546.185280] sd 0:1:0:0: rejecting I/O to offline device
[  546.185307] sd 0:1:0:0: rejecting I/O to offline device
[  546.185334] sd 0:1:0:0: rejecting I/O to offline device
[  546.185364] sd 0:1:0:0: rejecting I/O to offline device
[  546.185390] sd 0:1:0:0: rejecting I/O to offline device
[  546.185410] sd 0:1:0:0: rejecting I/O to offline device
For slow serial console, the frequent printk may be blocked for a
long time, and if any spin_lock has been acquired before the printk
like in scsi_request_fn, watchdog could be triggered.

Related disscussion can be found here,
https://bugzilla.kernel.org/show_bug.cgi?id=199003
And Petr brought the idea to throttle the frequent printk, it's
useless to print the same lines frequently after all.



It's true that this print for the same device is useless. But it's
useful for different devices. Is it possible to limit the print only
for the same device?

Thanks,

Jason








Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk

2018-04-01 Thread Sergey Senozhatsky
Hello,

On (04/02/18 09:58), Wen Yang wrote:
> There would be so many same lines printed by frequent printk if one 
> disk went wrong, like,
> [  546.185242] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185258] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185280] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185307] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185334] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185364] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185390] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185410] sd 0:1:0:0: rejecting I/O to offline device
> For slow serial console, the frequent printk may be blocked for a 
> long time, and if any spin_lock has been acquired before the printk 
> like in scsi_request_fn, watchdog could be triggered.

Did you test the patch? Rate limiting does not completely remove
printk calls. printk is still there, in a loop under spin_lock.
A big enough I/O request queue can cause the same lockup problems.

-ss


[PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk

2018-04-01 Thread Wen Yang
There would be so many same lines printed by frequent printk if one 
disk went wrong, like,
[  546.185242] sd 0:1:0:0: rejecting I/O to offline device
[  546.185258] sd 0:1:0:0: rejecting I/O to offline device
[  546.185280] sd 0:1:0:0: rejecting I/O to offline device
[  546.185307] sd 0:1:0:0: rejecting I/O to offline device
[  546.185334] sd 0:1:0:0: rejecting I/O to offline device
[  546.185364] sd 0:1:0:0: rejecting I/O to offline device
[  546.185390] sd 0:1:0:0: rejecting I/O to offline device
[  546.185410] sd 0:1:0:0: rejecting I/O to offline device
For slow serial console, the frequent printk may be blocked for a 
long time, and if any spin_lock has been acquired before the printk 
like in scsi_request_fn, watchdog could be triggered.

Related disscussion can be found here,
https://bugzilla.kernel.org/show_bug.cgi?id=199003
And Petr brought the idea to throttle the frequent printk, it's 
useless to print the same lines frequently after all.

v2->v1: fix some typos
 
Suggested-by: Petr Mladek 
Suggested-by: Sergey Senozhatsky 
Signed-off-by: Wen Yang 
Signed-off-by: Jiang Biao 
Signed-off-by: Tan Hu 
CC: BartVanAssche 
CC: Petr Mladek 
CC: Sergey Senozhatsky 
CC: Martin K. Petersen 
CC: "James E.J. Bottomley" 
CC: Tejun Heo 
---
 drivers/scsi/scsi_lib.c|  6 +++---
 include/scsi/scsi_device.h | 10 ++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c84f931..f77e801 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1301,7 +1301,7 @@ static int scsi_setup_cmnd(struct scsi_device *sdev, 
struct request *req)
 * commands.  The device must be brought online
 * before trying any recovery commands.
 */
-   sdev_printk(KERN_ERR, sdev,
+   sdev_printk_ratelimited(KERN_ERR, sdev,
"rejecting I/O to offline device\n");
ret = BLKPREP_KILL;
break;
@@ -1310,7 +1310,7 @@ static int scsi_setup_cmnd(struct scsi_device *sdev, 
struct request *req)
 * If the device is fully deleted, we refuse to
 * process any commands as well.
 */
-   sdev_printk(KERN_ERR, sdev,
+   sdev_printk_ratelimited(KERN_ERR, sdev,
"rejecting I/O to dead device\n");
ret = BLKPREP_KILL;
break;
@@ -1802,7 +1802,7 @@ static void scsi_request_fn(struct request_queue *q)
break;
 
if (unlikely(!scsi_device_online(sdev))) {
-   sdev_printk(KERN_ERR, sdev,
+   sdev_printk_ratelimited(KERN_ERR, sdev,
"rejecting I/O to offline device\n");
scsi_kill_request(req, q);
continue;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 7ae177c..378d3f2 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -249,6 +249,16 @@ struct scsi_device {
 #define sdev_printk(l, sdev, fmt, a...)\
sdev_prefix_printk(l, sdev, NULL, fmt, ##a)
 
+#define sdev_printk_ratelimited(l, sdev, fmt, a...)\
+({ \
+   static DEFINE_RATELIMIT_STATE(_rs,  \
+ DEFAULT_RATELIMIT_INTERVAL,   \
+ DEFAULT_RATELIMIT_BURST); \
+   \
+   if (__ratelimit(&_rs))  \
+   sdev_prefix_printk(l, sdev, NULL, fmt, ##a);\
+})
+
 __printf(3, 4) void
 scmd_printk(const char *, const struct scsi_cmnd *, const char *, ...);
 
-- 
1.8.3.1



Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Wakko Warner
Wakko Warner wrote:
> Wakko Warner wrote:
> > Bart Van Assche wrote:
> > > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > > Richard Weinberger wrote:
> > > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner  
> > > > > wrote:
> > > > > > I reported this before but noone responded.
> > > > > 
> > > > > Because you're sending only to LKML.
> > > > > CC'ing storage folks.
> > > > 
> > > > Thank you.  I wasn't sure who I needed to send it to.
> > > 
> > > Can you share the output of lsscsi? I would like to know whether or not 
> > > you
> > > are using a (S)ATA CDROM.
> > 
> > >From the target:
> > [4:0:0:0]cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr0 
> > [5:0:0:0]cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr1 
> > [6:0:0:0]cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr2 
> > 
> > >From the initiator:
> > [19:0:0:0]   cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr1
> > [19:0:0:1]   cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr2
> > [19:0:0:2]   cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr3
> > 
> > 
> > I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> > >From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I mount
> > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > crashes.  I'm using the builtin iscsi target with pscsi.  I can burn from
> > the initiator with out problems.  I'll test other kernels between 4.9 and
> > 4.14.
> 
> So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest patch
> (except for 4.15 which was 1 behind)
> Each of these kernels crash within seconds or immediate of doing find -type
> f | xargs cat > /dev/null from the initiator.

I tried 4.10.0.  It doesn't completely lockup the system, but the device
that was used hangs.  So from the initiator, it's /dev/sr1 and from the
target it's /dev/sr0.  Attempting to read /dev/sr0 after the oops causes the
process to hang in D state.

Here's the oops.  There was also another line that was not seen in the newer
kernels.
[ 323.105044] [ cut here ]
[ 323.105057] WARNING: CPU: 0 PID: 0 at 
/usr/src/linux/dist/4.10/drivers/scsi/scsi_lib.c:1043 scsi_init_io+0x143/0x1f0 
[scsi_mod]
[ 323.105058] Modules linked in: iscsi_target_mod af_packet tcm_loop vhost_scsi 
vhost target_core_file target_core_iblock target_core_pscsi target_core_mod 
nfsd exportfs dummy bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core ipv6 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole configfs sr_mod 
cdrom sd_mod sg adt7475 hwmon_vid coretemp x86_pkg_temp_thermal kvm_intel kvm 
irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc 
snd_hda_codec_realtek snd_hda_codec_generic nouveau video led_class 
drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt 
fb_sys_fops cfbcopyarea ttm drm snd_hda_intel agpgart snd_hda_codec 
snd_hda_core snd_pcm_oss igb snd_mixer_oss aesni_intel snd_pcm aes_x86_64 hwmon 
snd_timer crypto_simd i2c_algo_bit mptsas snd glue_helper
[ 323.105089]  mpt3sas i2c_core mptscsih soundcore ahci mptbase raid_class 
libahci scsi_transport_sas libata scsi_mod button wmi hed unix
[ 323.105097] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0 #1
[ 323.105098] Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A16 
02/05/2018
[ 323.105100] Call Trace:
[ 323.105101]  
[ 323.105105]  ? dump_stack+0x46/0x5a
[ 323.105107]  ? __warn+0xb4/0xd0
[ 323.105110]  ? scsi_init_io+0x143/0x1f0 [scsi_mod]
[ 323.105113]  ? scsi_setup_cmnd+0x4c/0x140 [scsi_mod]
[ 323.105115]  ? scsi_prep_fn+0xe3/0x170 [scsi_mod]
[ 323.105118]  ? swiotlb_unmap_sg_attrs+0x44/0x60
[ 323.105119]  ? blk_peek_request+0x130/0x200
[ 323.105122]  ? scsi_request_fn+0x2b/0x510 [scsi_mod]
[ 323.105124]  ? __blk_run_queue+0x2a/0x40
[ 323.105126]  ? blk_run_queue+0x1c/0x30
[ 323.105129]  ? scsi_run_queue+0x229/0x2b0 [scsi_mod]
[ 323.105131]  ? scsi_io_completion+0x3d6/0x5c0 [scsi_mod]
[ 323.105133]  ? blk_done_softirq+0x67/0x80
[ 323.105135]  ? __do_softirq+0xdb/0x200
[ 323.105137]  ? irq_exit+0xa3/0xb0
[ 323.105139]  ? do_IRQ+0x45/0xc0
[ 323.105141]  ? common_interrupt+0x7c/0x7c
[ 323.105142]  
[ 323.105145]  ? cpuidle_enter_state+0x144/0x1f0
[ 323.105146]  ? cpuidle_enter_state+0x139/0x1f0
[ 323.105148]  ? do_idle+0xd3/0x190
[ 323.105150]  ? cpu_startup_entry+0x14/0x20
[ 323.105152]  ? start_kernel+0x391/0x399
[ 323.105154]  ? start_cpu+0x14/0x14
[ 323.105155] ---[ end trace f38cc734e4921bdc ]---
[ 323.105157] blk_peek_request: bad return=-22


Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Bart Van Assche
On Sun, 2018-04-01 at 12:24 -0400, Wakko Warner wrote:
> What do you enable in the kernel to get those locking messages?

Hello Wakko,

I have attached the script to this e-mail that I use to enable a bunch of
kernel debugging options. Please note that enabling these options,
especially lockdep and kasan, will slow down the kernel.

Bart.





enable-kernel-debugging-options
Description: enable-kernel-debugging-options


Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Wakko Warner
Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > Richard Weinberger wrote:
> > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner  
> > > > wrote:
> > > > > I reported this before but noone responded.
> > > > 
> > > > Because you're sending only to LKML.
> > > > CC'ing storage folks.
> > > 
> > > Thank you.  I wasn't sure who I needed to send it to.
> > 
> > Can you share the output of lsscsi? I would like to know whether or not you
> > are using a (S)ATA CDROM.
> 
> >From the target:
> [4:0:0:0]cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr0 
> [5:0:0:0]cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr1 
> [6:0:0:0]cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr2 
> 
> >From the initiator:
> [19:0:0:0]   cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr1
> [19:0:0:1]   cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr2
> [19:0:0:2]   cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr3
> 
> 
> I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> >From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I mount
> /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> crashes.  I'm using the builtin iscsi target with pscsi.  I can burn from
> the initiator with out problems.  I'll test other kernels between 4.9 and
> 4.14.

So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest patch
(except for 4.15 which was 1 behind)
Each of these kernels crash within seconds or immediate of doing find -type
f | xargs cat > /dev/null from the initiator.

I did a diff between 4.9.91 and 4.10.17 on scsi_lib.c.  Here's the
difference around the line reported (in this case 1043).  I've added the
4.10.17 oops at the end:

@@ -1029,10 +1038,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
struct scsi_device *sdev = cmd->device;
struct request *rq = cmd->request;
bool is_mq = (rq->mq_ctx != NULL);
-   int error;
+   int error = BLKPREP_KILL;
 
-   if (WARN_ON_ONCE(!rq->nr_phys_segments))
-   return -EINVAL;
+   if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
+   goto err_exit;
 
error = scsi_init_sgtable(rq, &cmd->sdb);
if (error)

Oops:
[ 158.157590] [ cut here ]
[ 158.157601] WARNING: CPU: 0 PID: 0 at 
/usr/src/linux/dist/4.10.17-nobklcd/drivers/scsi/scsi_lib.c:1043 
scsi_init_io+0x1d7/0x1e0 [scsi_mod]
[ 158.157603] Modules linked in: iscsi_target_mod tcm_loop af_packet vhost_scsi 
vhost target_core_file target_core_iblock target_core_pscsi target_core_mod 
nfsd exportfs dummy bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core ipv6 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole configfs sr_mod 
cdrom sd_mod sg adt7475 hwmon_vid coretemp x86_pkg_temp_thermal kvm_intel kvm 
irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc 
snd_hda_codec_realtek snd_hda_codec_generic nouveau video led_class 
drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt 
fb_sys_fops cfbcopyarea ttm drm agpgart snd_hda_intel snd_hda_codec 
snd_hda_core mptsas snd_pcm_oss snd_mixer_oss mptscsih mpt3sas snd_pcm mptbase 
snd_timer raid_class aesni_intel snd scsi_transport_sas
[ 158.157634]  igb soundcore aes_x86_64 crypto_simd ahci glue_helper libahci 
hwmon libata i2c_algo_bit i2c_core scsi_mod wmi hed button unix
[ 158.157642] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.17 #1
[ 158.157644] Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A16 
02/05/2018
[ 158.157645] Call Trace:
[ 158.157647]  
[ 158.157651]  ? dump_stack+0x46/0x5a
[ 158.157653]  ? __warn+0xb4/0xd0
[ 158.157656]  ? scsi_init_io+0x1d7/0x1e0 [scsi_mod]
[ 158.157658]  ? scsi_setup_cmnd+0x4c/0x140 [scsi_mod]
[ 158.157661]  ? scsi_prep_fn+0xe3/0x170 [scsi_mod]
[ 158.157663]  ? swiotlb_unmap_sg_attrs+0x44/0x60
[ 158.157665]  ? blk_peek_request+0x130/0x200
[ 158.157668]  ? scsi_request_fn+0x2b/0x510 [scsi_mod]
[ 158.157670]  ? __blk_run_queue+0x2a/0x40
[ 158.157672]  ? blk_run_queue+0x1c/0x30
[ 158.157675]  ? scsi_run_queue+0x229/0x2b0 [scsi_mod]
[ 158.157677]  ? scsi_io_completion+0x3d6/0x5c0 [scsi_mod]
[ 158.157680]  ? blk_done_softirq+0x67/0x80
[ 158.157682]  ? __do_softirq+0xdb/0x200
[ 158.157683]  ? irq_exit+0xa3/0xb0
[ 158.157686]  ? do_IRQ+0x45/0xc0
[ 158.157689]  ? common_interrupt+0x7c/0x7c
[ 158.157690]  
[ 158.157693]  ? cpuidle_enter_state+0x144/0x1f0
[ 158.157694]  ? cpuidle_enter_state+0x139/0x1f0
[ 158.157696]  ? do_idle+0xd3/0x190
[ 158.157698]  ? cpu_startup_entry+0x14/0x20
[ 158.157700]  ? start_kernel+0x391/0x399
[ 158.157701]  ? start_cpu+0x14/0x14
[ 158.157703] ---[ end trace 8d60c2e92fac2697 ]---
[ 158.157711] [ cut here ]
[ 158.157732] kernel BUG at 
/usr/src/linux/dist/4.10.17-nobklcd/block/blk-core.c:2916!
[ 158.157755] invalid opcode:  [#1] PREEMPT SMP
[ 158.157770] Modules linked in: iscsi_target_mod tcm_loop af_packet vhost_scsi 
vhost target_core_file

Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Wakko Warner
Bart Van Assche wrote:
> On Sun, 2018-04-01 at 07:37 -0400, Wakko Warner wrote:
> > Bart Van Assche wrote:
> > > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > > Richard Weinberger wrote:
> > > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner  
> > > > > wrote:
> > > > > > I reported this before but noone responded.
> > > > > 
> > > > > Because you're sending only to LKML.
> > > > > CC'ing storage folks.
> > > > 
> > > > Thank you.  I wasn't sure who I needed to send it to.
> > > 
> > > Can you share the output of lsscsi? I would like to know whether or not 
> > > you
> > > are using a (S)ATA CDROM.
> > 
> > From the target:
> > [4:0:0:0]cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr0 
> > [5:0:0:0]cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr1 
> > [6:0:0:0]cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr2 
> > 
> > From the initiator:
> > [19:0:0:0]   cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr1
> > [19:0:0:1]   cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr2
> > [19:0:0:2]   cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr3
> > 
> > I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> > From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I mount
> > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > crashes.  I'm using the builtin iscsi target with pscsi.  I can burn from
> > the initiator with out problems.  I'll test other kernels between 4.9 and
> > 4.14.
> 
> (+Lee and Chris)
> 
> Hello Wakko,
> 
> Although I'm not sure that what I ran into is exactly the same as what you
> ran into, there is definitely something wrong with what I encountered. What
> I ran into with Linus' latest master branch indicates two issues - one in
> the iSCSI initiator and one in the block layer:
> 
> scsi 3:0:0:1: Direct-Access LIO-ORG  FILEIO   4.0  PQ: 0 ANSI: 5
> sd 2:0:0:1: [sdd] Attached SCSI disk
> sd 3:0:0:1: Warning! Received an indication that the LUN assignments on this
> target have changed. The Linux SCSI layer does not automatical
> sd 3:0:0:1: Attached scsi generic sg8 type 0
> sd 3:0:0:1: [sdf] 128 512-byte logical blocks: (65.5 kB/64.0 KiB)
> sd 3:0:0:1: [sdf] Write Protect is off
> sd 3:0:0:1: [sdf] Mode Sense: 43 00 00 08
> sd 3:0:0:1: [sdf] Write cache: disabled, read cache: enabled, doesn't
> support DPO or FUA
> iSCSI/iqn.1993-08.org.debian:01:3b68b1b3d2eb: Unsupported SCSI Opcode 0xa3,
> sending CHECK_CONDITION.
> sd 3:0:0:2: [sde] Attached SCSI disk
> sd 3:0:0:1: [sdf] Attached SCSI disk
> 
> =
> WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> 4.16.0-rc7-dbg+ #3 Not tainted
> -
> kworker/6:1H/155 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
>  (&(&session->frwd_lock)->rlock){+.-.}, at: [<7eb678ec>]
> iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]

[trimmed]

I'm not sure.  Mine happens as 2 oopses.  Both have   lines.
The files mine happen in are drivers/scsi/scsi_lib.c followed by
block/blk-core.c

The first one, the stack trace began with  then scsi_setup_cmnd.  I
tested 4.10.x, 4.11.x 4.12.x 4.14.x 4.15.x where x is the latest patch
(except for 4.15).  ALL crash.  4.9.91 doesn't.  4.10 added dump_stack
__warn scsi_init_io after  and before scsi_setup_cmnd.  Within seconds
of issueing the command to read files, it crashes.  On 4.15, if I just do a
sequential read from the raw device, it doesn't crash.

What do you enable in the kernel to get those locking messages?

> stack backtrace:
> CPU: 6 PID: 155 Comm: kworker/6:1H Not tainted 4.16.0-rc7-dbg+ #3
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Workqueue: kblockd blk_timeout_work
> Call Trace:
>  dump_stack+0x85/0xc5
>  check_usage+0x6e7/0x700
>  ? check_usage_forwards+0x220/0x220
>  ? find_next_and_bit+0x51/0xe0
>  ? cpumask_next_and+0x20/0x30
>  ? find_busiest_group+0xc94/0x1010
>  ? class_equal+0x11/0x20
>  ? __bfs+0x62/0x2e0
>  ? class_equal+0x11/0x20
>  ? __bfs+0xfb/0x2e0
>  ? __lock_acquire+0x17aa/0x1af0
>  __lock_acquire+0x17aa/0x1af0
>  ? mark_lock+0xc7/0x770
>  ? debug_check_no_locks_freed+0x1b0/0x1b0
>  ? __lock_acquire+0x583/0x1af0
>  ? mark_lock+0xc7/0x770
>  ? lock_pin_lock+0x160/0x160
>  ? debug_check_no_locks_freed+0x1b0/0x1b0
>  ? lock_acquire+0xc9/0x260
>  lock_acquire+0xc9/0x260
>  ? iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
>  _raw_spin_lock+0x2f/0x40
>  ? iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
>  iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
>  scsi_times_out+0xcc/0x3f0 [scsi_mod]
>  blk_rq_timed_out+0x2f/0x70
>  blk_timeout_work+0x1b0/0x220
>  process_one_work+0x446/0xa50
>  ? pwq_dec_nr_in_flight+0x100/0x100
>  ? worker_thread+0x177/0x6d0
>  worker_thread+0x7b/0x6d0
>  ? process_one_work+0xa50/0xa50
>  kthread+0x1b7/0x1e0
>  ? kthread_create_worker_on_cpu+0xc0/0xc0
>  ret_from_fork+0x24/0x30
> 
> [ ... ]
> 
> [ cut here ]
> kernel BUG at block/blk-core.c:3267!
> inv

Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Bart Van Assche
On Sun, 2018-04-01 at 07:37 -0400, Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > Richard Weinberger wrote:
> > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner  
> > > > wrote:
> > > > > I reported this before but noone responded.
> > > > 
> > > > Because you're sending only to LKML.
> > > > CC'ing storage folks.
> > > 
> > > Thank you.  I wasn't sure who I needed to send it to.
> > 
> > Can you share the output of lsscsi? I would like to know whether or not you
> > are using a (S)ATA CDROM.
> 
> From the target:
> [4:0:0:0]cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr0 
> [5:0:0:0]cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr1 
> [6:0:0:0]cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr2 
> 
> From the initiator:
> [19:0:0:0]   cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr1
> [19:0:0:1]   cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr2
> [19:0:0:2]   cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr3
> 
> I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I mount
> /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> crashes.  I'm using the builtin iscsi target with pscsi.  I can burn from
> the initiator with out problems.  I'll test other kernels between 4.9 and
> 4.14.

(+Lee and Chris)

Hello Wakko,

Although I'm not sure that what I ran into is exactly the same as what you
ran into, there is definitely something wrong with what I encountered. What
I ran into with Linus' latest master branch indicates two issues - one in
the iSCSI initiator and one in the block layer:

scsi 3:0:0:1: Direct-Access LIO-ORG  FILEIO   4.0  PQ: 0 ANSI: 5
sd 2:0:0:1: [sdd] Attached SCSI disk
sd 3:0:0:1: Warning! Received an indication that the LUN assignments on this
target have changed. The Linux SCSI layer does not automatical
sd 3:0:0:1: Attached scsi generic sg8 type 0
sd 3:0:0:1: [sdf] 128 512-byte logical blocks: (65.5 kB/64.0 KiB)
sd 3:0:0:1: [sdf] Write Protect is off
sd 3:0:0:1: [sdf] Mode Sense: 43 00 00 08
sd 3:0:0:1: [sdf] Write cache: disabled, read cache: enabled, doesn't
support DPO or FUA
iSCSI/iqn.1993-08.org.debian:01:3b68b1b3d2eb: Unsupported SCSI Opcode 0xa3,
sending CHECK_CONDITION.
sd 3:0:0:2: [sde] Attached SCSI disk
sd 3:0:0:1: [sdf] Attached SCSI disk

=
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
4.16.0-rc7-dbg+ #3 Not tainted
-
kworker/6:1H/155 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 (&(&session->frwd_lock)->rlock){+.-.}, at: [<7eb678ec>]
iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]

and this task is already holding:
 (&(&q->__queue_lock)->rlock){-.-.}, at: [<3c5841ec>]
blk_timeout_work+0x45/0x220
which would create a new lock dependency:
 (&(&q->__queue_lock)->rlock){-.-.} -> (&(&session->frwd_lock)->rlock){+.-.}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (&(&q->__queue_lock)->rlock){-.-.}

... which became HARDIRQ-irq-safe at:
  _raw_spin_lock_irqsave+0x46/0x60
  ata_qc_schedule_eh+0xb9/0x110 [libata]
  ata_sff_hsm_move+0x114/0xb10 [libata]
  __ata_sff_port_intr+0xec/0x1a0 [libata]
  ata_bmdma_port_intr+0xef/0x160 [libata]
  ata_bmdma_interrupt+0x160/0x2e0 [libata]
  __handle_irq_event_percpu+0x72/0x460
  handle_irq_event_percpu+0x66/0xd0
  handle_irq_event+0x54/0x90
  handle_edge_irq+0xe9/0x2d0
  handle_irq+0x17b/0x210
  do_IRQ+0x6c/0x140
  ret_from_intr+0x0/0x1e
  native_safe_halt+0x2/0x10
  default_idle+0x1f/0x200
  do_idle+0x1bc/0x1f0
  cpu_startup_entry+0x19/0x20
  start_kernel+0x47f/0x4a1
  secondary_startup_64+0xa5/0xb0

to a HARDIRQ-irq-unsafe lock:
 (&(&session->frwd_lock)->rlock){+.-.}

... which became HARDIRQ-irq-unsafe at:
...
  _raw_spin_lock_bh+0x34/0x40
  iscsi_conn_setup+0x239/0x320 [libiscsi]
  iscsi_tcp_conn_setup+0x14/0x80 [libiscsi_tcp]
  iscsi_sw_tcp_conn_create+0x1f/0x24a [iscsi_tcp]
  iscsi_if_rx+0x13c9/0x20d0 [scsi_transport_iscsi]
  netlink_unicast+0x299/0x330
  netlink_sendmsg+0x435/0x580
  ___sys_sendmsg+0x447/0x4d0
  __sys_sendmsg+0xad/0x120
  do_syscall_64+0xf3/0x2c0
  entry_SYSCALL_64_after_hwframe+0x42/0xb7

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

   CPU0CPU1
   
  lock(&(&session->frwd_lock)->rlock);
   local_irq_disable();
   lock(&(&q->__queue_lock)->rlock);
   lock(&(&session->frwd_lock)->rlock);
  
lock(&(&q->__queue_lock)->rlock);

 *** DEADLOCK ***

3 locks held by kworker/6:1H/155:
 #0:  ((wq_completion)"kblockd"){+.+.}, at: [<116fed84>]
process_one_work+0x387/0xa50
 #1:  ((work_completion)(&q->timeout_work)){+.+.}, at: [<116fed84>]
process_one_work+0x387/0xa50
 #2:  (&(&q->__queue_lock)->rlock){-.-.}

Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Wakko Warner
Bart Van Assche wrote:
> On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > Richard Weinberger wrote:
> > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner  wrote:
> > > > I reported this before but noone responded.
> > > 
> > > Because you're sending only to LKML.
> > > CC'ing storage folks.
> > 
> > Thank you.  I wasn't sure who I needed to send it to.
> 
> Can you share the output of lsscsi? I would like to know whether or not you
> are using a (S)ATA CDROM.

>From the target:
[4:0:0:0]cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr0 
[5:0:0:0]cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr1 
[6:0:0:0]cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr2 

>From the initiator:
[19:0:0:0]   cd/dvd  ATAPIiHAS224   B  GL05  /dev/sr1
[19:0:0:1]   cd/dvd  ATAPIiHAS422   8  4L11  /dev/sr2
[19:0:0:2]   cd/dvd  PBDS DVD+-RW DH-16W1S 2D14  /dev/sr3


I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
>From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I mount
/dev/sr1 and then do find -type f | xargs cat > /dev/null the target
crashes.  I'm using the builtin iscsi target with pscsi.  I can burn from
the initiator with out problems.  I'll test other kernels between 4.9 and
4.14.


Re: Recent kernels fail to boot on POWER8 with multipath SCSI

2018-04-01 Thread Michael Ellerman
Mike Snitzer  writes:
> On Fri, Mar 30 2018 at  5:04P -0400,
> Michael Ellerman  wrote:
...
>> Any prospect of getting that patch to Linus before the 4.16 release? Yes
>> I realise that's in ~36 hours :)
>
> Please, see upstream commit e457edf0b21c873be827b7c2f6b8e1545485c415

Sweet thanks!

cheers


[Bug 198081] scsi sg

2018-04-01 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=198081

Cristian Crinteanu (crinteanu.crist...@gmail.com) changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |PATCH_ALREADY_AVAILABLE

-- 
You are receiving this mail because:
You are the assignee for the bug.