Bart Van Assche wrote:
> On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I
> > > > mount /dev/sr1 and then do find -type f | xargs cat > /dev/null the
> > > > target crashes. I'm using the builtin iscsi target with pscsi. I can
> > > > burn from the initiator with out problems. I'll test other kernels
> > > > between 4.9 and 4.14.
> > >
> > > So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest
> > > patch (except for 4.15 which was 1 behind)
> > > Each of these kernels crash within seconds or immediate of doing find
> > > -type f | xargs cat > /dev/null from the initiator.
> >
> > I tried 4.10.0. It doesn't completely lockup the system, but the device
> > that was used hangs. So from the initiator, it's /dev/sr1 and from the
> > target it's /dev/sr0. Attempting to read /dev/sr0 after the oops causes the
> > process to hang in D state.
>
> Hello Wakko,
>
> Thank you for having narrowed down this further. I think that you encountered
> a regression either in the block layer core or in the SCSI core. Unfortunately
> the number of changes between kernel versions v4.9 and v4.10 in these two
> subsystems is huge. I see two possible ways forward:
> - Either that you perform a bisect to identify the patch that introduced this
>   regression. However, I'm not sure whether you are familiar with the bisect
>   process.
> - Or that you identify the command that triggers this crash such that others
>   can reproduce this issue without needing access to your setup.
>
> How about reproducing this crash with the below patch applied on top of
> kernel v4.15.x? The additional output sent by this patch to the system log
> should allow us to reproduce this issue by submitting the same SCSI command
> with sg_raw.

Ok, so I tried this, but scsi_print_command doesn't print anything. Thinking
it might have crashed during WARN_ON_ONCE, I added an if statement above it
that checks for !rq and does the same test that blk_rq_nr_phys_segments
does. It still didn't print anything. My printk shows this:

[   36.263193] sr 3:0:0:0: cmd->request->nr_phys_segments is 0

I also had scsi_print_command in the same if block, which again didn't print
anything. Is there some debug option I need to turn on to make it print? I
looked through the code and followed some of the function calls, but didn't
see any config options.

> Subject: [PATCH] Report commands with no physical segments in the system log
>
> ---
>  drivers/scsi/scsi_lib.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6b6a6705f6e5..74a39db57d49 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1093,8 +1093,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
>  	bool is_mq = (rq->mq_ctx != NULL);
>  	int error = BLKPREP_KILL;
>
> -	if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
> +	if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq))) {
> +		scsi_print_command(cmd);
>  		goto err_exit;
> +	}
>
>  	error = scsi_init_sgtable(rq, &cmd->sdb);
>  	if (error)

-- 
 Microsoft has beaten Volkswagen's world record.
 Volkswagen only created 22 million bugs.