Brendan Miller wrote:
> I have read Chiaki and Eric's discussion of Chiaki's problem and Eric's
> [possible] solution, and it is timely that I find myself in a related
> boat (and I provide you an almost-as-long description).
Early last December, I reported a strange multi-lun SCSI CD drive
(Nakamichi MBR-7) access problem in 2.2.13 kernel. (It was a lengthy
posting. I can repost this entirely if necessary to give this post
better background.)
Eric Youngdale responded with the analysis of potential cause and also
the possible problem in the then current development kernel 2.3.xx.
Anyway, the gist of the problem reported was
In early 2.0.36
dd if=/dev/scdX of=/dev/null &
dd if=/dev/scdY of=/dev/null &
would crash the kernel solid. Here, /dev/scd{X,Y} correspond to the
each lun device in the multi-CD drive.
Early 2.2.xx solved this problem.
But now, if we ran two ls processes that would list the CD contents
simultaneously, 2.2.13 somehow would block one of the process and the
process gets stuck in a strange blocked state. (Luckily no kernel
crash, though.)
I was tied up for the last several weeks and could not test the
problem further. Today, I saw Brendan Miller's problem report and
decided to follow up my previous post finally.
In the meantime, I upgraded the kernel to 2.2.14 early this month. I
retried the test again. Here is what I did with 2.2.14:
I ran two dd commands against the two different LUNs simultaneously.
Namely
(sleep 3 : dd if=/dev/scdX of=/dev/null ) &
dd if=/dev/scdY of=/dev/null &
As you can see below, there is something wrong with the way the
particular mult-LUN CD drive is handled in 2.2.14. (In the early
2.1.1xx, 2.2.yy, these commands would run OK albeit the mechanical
noise because of the switching of CD media in the only read
mechanism. The noise was expected after all.)
Below,
/dev/scd0 ... a separate single drive connected
to BusLogic BT-930 card driven by
the BusLogic driver.
/dev/scd1 ... LUN 0 of MBR-7 multi-LUN drive.
MBR-7 is conncted to AMD SCSI chip card
driven by tmscsim driver.
/dev/scd2 ... LUN 1 //
/dev/scd3 ... LUN 2 //
/dev/scd4 ... LUN 3 //
/dev/scd5 ... LUN 4 //
/dev/scd6 ... LUN 5 //
/dev/scd7 ... LUN 6 //
Here is the test log.
This is a slightly edited, session record.
(I put comment lines with "***.." marker string.)
# sh -vx do_dd.sh
(sleep 3; dd if=/dev/scd2 of=/dev/null ) &
dd if=/dev/scd1 of=/dev/null &
+ sleep 3
+ dd if=/dev/scd1 of=/dev/null
standard:/home/ishikawa/tools# + dd if=/dev/scd2 of=/dev/null
dd: /dev/scd2: No medium found
*** THE ABOVE IS CLEARLY BOGUS SINCE WE DO
*** HAVE A CD in lun1, i.e., /dev/scd2!
standard:/home/ishikawa/tools# ps -aef | grep dd
warning: `-' deprecated; use `ps aef', not `ps -aef'
359 p2 S 0:00 \_ grep dd MAILPATH=/usr/spool/mail/ishikawa:/etc/m
356 p2 D 0:00 dd if /dev/scd1 of /dev/null MAILPATH=/usr/spool/mail/ishika
*** dd against /dev/scd1, lun 0, ketp running.
standard:/home/ishikawa/tools# kill 356
*** Let me try scd2 ALONE (ALONE is the keyword here!).
standard:/home/ishikawa/tools# dd if=/dev/scd2 of=/dev/null
7524+0 records in
7524+0 records out
*** It works! I quit the command with ^C. ***
**** NOW LET US RETRY the SIMULTANEOUS access. ****
standard:/home/ishikawa/tools# sh -vx do_dd.sh
:
+ :
: sleep 20
+ : sleep 20
(sleep 3; dd if=/dev/scd2 of=/dev/null ) &
dd if=/dev/scd1 of=/dev/null &
+ sleep 3
+ dd if=/dev/scd1 of=/dev/null
standard:/home/ishikawa/tools# + dd if=/dev/scd2 of=/dev/null
dd: /dev/scd2: No medium found
dd: /dev/scd1: Input/output error
0+0 records in
0+0 records out
**** Whoa! DIFFERENT ERRORS! ****
**** The commands quit automatically.
standard:/home/ishikawa/tools#
**** LET ME CHECK IF WE CAN ACCESS CD in the
**** LUN 1 drive, scd2, AGAIN ALONE.
standard:/var/log# dd if=/dev/scd2 of=/dev/null
9124+0 records in
9124+0 records out
*** YES, WE STILL CAN.
*** ! I quit the command with ^C. ***
*** Now let me try with different drive, /dev/scd3
*** and /dev/scd1
standard:/home/ishikawa/tools# sh -vx do_dd2.sh
:
+ :
: sleep 20
+ : sleep 20
(sleep 3; dd if=/dev/scd3 of=/dev/null ) &
dd if=/dev/scd1 of=/dev/null &
+ sleep 3
+ dd if=/dev/scd1 of=/dev/null
standard:/home/ishikawa/tools# + dd if=/dev/scd3 of=/dev/null
dd: /dev/scd3: Operation not supported by device
*** WHAT !?
*** Let me try again.
standard:/home/ishikawa/tools# dd if=/dev/scd3 of=/dev/null
dd: /dev/scd3: Operation not supported by device
*** Let me try with a different device.
standard:/home/ishikawa/tools# dd if=/dev/scd4 of=/dev/null
dd: /dev/scd4: Operation not supported by device
standard:/home/ishikawa/tools# dd if=/dev/scd5 of=/dev/null
dd: /dev/scd5: Operation not supported by device
standard:/home/ishikawa/tools# dd if=/dev/scd6 of=/dev/null
dd: /dev/scd6: Operation not supported by device
***** Hmm. STRANGE!. Now let us try with scd1 again.
standard:/home/ishikawa/tools# dd if=/dev/scd1 of=/dev/null
**** AGHA! I must have screwed up the SCSI subsystem badly.
**** it somehow got stuck. After a tense few seconds,
**** I killled the command with ^C.
**** There was no input/output records lines.
*** Let me try again.
standard:/home/ishikawa/tools# dd if=/dev/scd1 of=/dev/null
*** Again, it got stuck, but I could kill it with ^C
*** eventually... (I mean the next prompt took a while longer
*** to appear after I hit ^C. Hmmm....)
standard:/home/ishikawa/tools#
========================================
More info.
#
# The drive is listed as below in scsi.c
#
scsi.c:{"NRC","MBR-7","*", BLIST_FORCELUN | BLIST_SINGLELUN},
scsi.c:{"NRC","MBR-7.4","*", BLIST_FORCELUN | BLIST_SINGLELUN},
#
# It sounds that Brendan Miller's drive
# sounds like one of those PIONEER DRM-6xxx drive.
# I have no idea how we could verify this. If there is a photo of DRM-6xxx available
# on a web page, maybe Brendan can check it out.
# It uses a removable magazine: Nakamich MBR-7 doesn't.
# Anyway, if so, it should be "device_list"ed like
# the DRM-6xxx drives are.
#
{"PIONEER","CD-ROM DRM-600","*", BLIST_FORCELUN | BLIST_SINGLELUN},
{"PIONEER","CD-ROM DRM-602X","*", BLIST_FORCELUN | BLIST_SINGLELUN},
{"PIONEER","CD-ROM DRM-604X","*", BLIST_FORCELUN | BLIST_SINGLELUN},
Oh, before I forget, here is the /var/log/syslog
record.
I modified scsi_debug.c slightly and so
the following might contain a slightly verbose output than
the standard file would produce.
Jan 20 01:48:21 standard /usr/sbin/cron[227]: (CRON) STARTUP (fork ok)
Jan 20 01:48:21 standard insmod: insmod: vmmon: no module by that name found
Jan 20 01:48:21 standard squid[196]: storeLateRelease: released 0 objects
Jan 20 01:48:21 standard insmod: insmod: vmnet: no module by that name found
Jan 20 01:48:33 standard modprobe: can't locate module vmnet1
Jan 20 01:48:33 standard last message repeated 3 times
Jan 20 01:48:33 standard vmware[init]: Host-only networking: vmnet1 vmnet-dhcpd.
*** I am not quite so sure what the following line is.
Jan 20 01:49:54 standard kernel: VFS: Disk change detected on device sr(11,1)
Jan 20 01:52:28 standard kernel: Non-zero result in scsi_done 10000 4:1
Jan 20 01:52:28 standard kernel: In scsi_done(host = 1, result = 010000)
Jan 20 01:52:28 standard kernel: Couldn't connect.
Jan 20 01:52:28 standard kernel: sr2: CDROM (ioctl) error, command: Test Unit Ready 20
00 00 00 00
Jan 20 01:52:28 standard kernel: sr00:00: old sense key None
Jan 20 01:52:28 standard kernel: Non-extended sense class 0 code 0x0 Non-zero result
in scsi_done 10000 4:1
Jan 20 01:52:28 standard kernel: In scsi_done(host = 1, result = 010000)
Jan 20 01:52:28 standard kernel: Couldn't connect.
Jan 20 01:52:28 standard kernel: sr2: CDROM (ioctl) error, command: Test Unit Ready 20
00 00 00 00
Jan 20 01:52:28 standard kernel: sr00:00: old sense key None
Jan 20 01:52:28 standard kernel: Non-extended sense class 0 code 0x0 <6>cdrom: open
failed.
Jan 20 01:52:28 standard kernel: Non-zero result in scsi_done 10000 4:1
Jan 20 01:52:28 standard kernel: In scsi_done(host = 1, result = 010000)
Jan 20 01:52:28 standard kernel: Couldn't connect.
Jan 20 01:52:28 standard kernel: SCSI error: host 1 id 4 lun 1 return code = 27010000
Jan 20 01:52:28 standard kernel: ^ISense class 0, sense error 0, extended sense 0
Jan 20 01:52:28 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:52:28 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:52:28 standard kernel: In MAYREDO, allowing 5 retries, have 0
Jan 20 01:52:29 standard kernel: Non-zero result in scsi_done 10000 4:1
Jan 20 01:52:29 standard kernel: In scsi_done(host = 1, result = 010000)
Jan 20 01:52:29 standard kernel: Couldn't connect.
Jan 20 01:52:29 standard kernel: SCSI error: host 1 id 4 lun 1 return code = 27010000
Jan 20 01:52:29 standard kernel: ^ISense class 0, sense error 0, extended sense 0
Jan 20 01:52:29 standard kernel: VFS: Disk change detected on device sr(11,2)
*** I believe the above was the initial do_dd.sh where /dev/scd2 was
*** not accssible.
Jan 20 01:52:58 standard kernel: VFS: Disk change detected on device sr(11,2)
*** The above must be where the /dev/scd2 was accessible alone.
*** The following must be the second try of do_dd.sh where
*** no records were copied due to I/O error.
Jan 20 01:54:01 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:01 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:01 standard kernel: In MAYREDO, allowing 3 retries, have 0
Jan 20 01:54:01 standard kernel: scsi1 channel 0 : resetting for second half of
retries.
Jan 20 01:54:01 standard kernel: SCSI bus is being reset for host 1 channel 0.
Jan 20 01:54:01 standard kernel: DC390: RESET ... Non-zero result in scsi_done 80000
4:0
Jan 20 01:54:01 standard kernel: In scsi_done(host = 1, result = 080000)
Jan 20 01:54:01 standard kernel: done
Jan 20 01:54:01 standard kernel: scsi reset function returned 2
Jan 20 01:54:01 standard kernel: SCSI_RESET_SUCCESS:
Jan 20 01:54:02 standard kernel: DC390: RETRY pid 30364 (28), target 04-00
Jan 20 01:54:02 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:02 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:02 standard kernel: In MAYREDO, allowing 3 retries, have 1
Jan 20 01:54:02 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:02 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:02 standard kernel: In MAYREDO, allowing 3 retries, have 2
Jan 20 01:54:02 standard kernel: sr2: CDROM (ioctl) error, command: Test Unit Ready 20
00 00 00 00
Jan 20 01:54:02 standard kernel: sr0b:02: old sense key None
Jan 20 01:54:04 standard kernel: Non-extended sense class 0 code 0x0 Non-zero result
in scsi_done 70008 4:1
Jan 20 01:54:04 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:04 standard kernel: In MAYREDO, allowing 3 retries, have 0
Jan 20 01:54:04 standard kernel: scsi1 channel 0 : resetting for second half of
retries.
Jan 20 01:54:04 standard kernel: SCSI bus is being reset for host 1 channel 0.
Jan 20 01:54:04 standard kernel: DC390: RESET ... Non-zero result in scsi_done 80000
4:0
Jan 20 01:54:04 standard kernel: In scsi_done(host = 1, result = 080000)
Jan 20 01:54:04 standard kernel: done
Jan 20 01:54:04 standard kernel: scsi reset function returned 2
Jan 20 01:54:04 standard kernel: SCSI_RESET_SUCCESS:
Jan 20 01:54:04 standard kernel: DC390: RETRY pid 30364 (28), target 04-00
Jan 20 01:54:04 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:04 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:04 standard kernel: In MAYREDO, allowing 3 retries, have 1
Jan 20 01:54:04 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:04 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:04 standard kernel: In MAYREDO, allowing 3 retries, have 2
Jan 20 01:54:04 standard kernel: sr2: CDROM (ioctl) error, command: Start/Stop Unit 20
00 00 03 00
Jan 20 01:54:04 standard kernel: sr0b:02: old sense key None
Jan 20 01:54:04 standard kernel: Non-extended sense class 0 code 0x0 <6>cdrom: open
failed.
Jan 20 01:54:04 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:04 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:04 standard kernel: In MAYREDO, allowing 5 retries, have 0
Jan 20 01:54:05 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:05 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:05 standard kernel: In MAYREDO, allowing 5 retries, have 1
Jan 20 01:54:05 standard kernel: scsi1 channel 0 : resetting for second half of
retries.
Jan 20 01:54:05 standard kernel: SCSI bus is being reset for host 1 channel 0.
Jan 20 01:54:05 standard kernel: DC390: RESET ... Non-zero result in scsi_done 80000
4:0
Jan 20 01:54:05 standard kernel: In scsi_done(host = 1, result = 080000)
Jan 20 01:54:05 standard kernel: done
Jan 20 01:54:05 standard kernel: scsi reset function returned 2
Jan 20 01:54:05 standard kernel: SCSI_RESET_SUCCESS:
Jan 20 01:54:06 standard kernel: DC390: RETRY pid 30364 (28), target 04-00
Jan 20 01:54:06 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:06 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:06 standard kernel: In MAYREDO, allowing 5 retries, have 2
Jan 20 01:54:06 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:06 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:06 standard kernel: In MAYREDO, allowing 5 retries, have 3
Jan 20 01:54:06 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:06 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:06 standard kernel: In MAYREDO, allowing 5 retries, have 4
Jan 20 01:54:06 standard kernel: SCSI error: host 1 id 4 lun 1 return code = 27070008
Jan 20 01:54:06 standard kernel: ^ISense class 0, sense error 0, extended sense 0
Jan 20 01:54:06 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:06 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:06 standard kernel: In MAYREDO, allowing 5 retries, have 0
Jan 20 01:54:07 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:07 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:07 standard kernel: In MAYREDO, allowing 5 retries, have 1
Jan 20 01:54:07 standard kernel: scsi1 channel 0 : resetting for second half of
retries.
Jan 20 01:54:07 standard kernel: SCSI bus is being reset for host 1 channel 0.
Jan 20 01:54:07 standard kernel: DC390: RESET ... Non-zero result in scsi_done 80000
4:0
Jan 20 01:54:07 standard kernel: In scsi_done(host = 1, result = 080000)
Jan 20 01:54:07 standard kernel: done
Jan 20 01:54:07 standard kernel: scsi reset function returned 2
Jan 20 01:54:07 standard kernel: SCSI_RESET_SUCCESS:
Jan 20 01:54:07 standard kernel: DC390: RETRY pid 30364 (28), target 04-00
Jan 20 01:54:08 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:08 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:08 standard kernel: In MAYREDO, allowing 5 retries, have 2
Jan 20 01:54:08 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:08 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:08 standard kernel: In MAYREDO, allowing 5 retries, have 3
Jan 20 01:54:08 standard kernel: Non-zero result in scsi_done 70008 4:1
Jan 20 01:54:08 standard kernel: In scsi_done(host = 1, result = 070008)
Jan 20 01:54:08 standard kernel: In MAYREDO, allowing 5 retries, have 4
Jan 20 01:54:08 standard kernel: SCSI error: host 1 id 4 lun 1 return code = 27070008
Jan 20 01:54:08 standard kernel: ^ISense class 0, sense error 0, extended sense 0
Jan 20 01:54:08 standard kernel: VFS: Disk change detected on device sr(11,2)
Jan 20 01:54:12 standard kernel: Non-zero result in scsi_done 8000000 4:0
Jan 20 01:54:12 standard kernel: In scsi_done(host = 1, result = 8000000)
Jan 20 01:54:12 standard kernel: CD-ROM I/O error: dev 0b:01, sector 0
Jan 20 01:57:33 standard kernel: Non-zero result in scsi_done 8000002 4:1
Jan 20 01:57:33 standard kernel: In scsi_done(host = 1, result = 8000002)
Jan 20 01:57:33 standard kernel: In MAYREDO, allowing 3 retries, have 0
Jan 20 01:57:33 standard kernel: scsi1 channel 0 : resetting for second half of
retries.
Jan 20 01:57:33 standard kernel: SCSI bus is being reset for host 1 channel 0.
Jan 20 01:57:33 standard kernel: DC390: RESET ... done
Jan 20 01:57:33 standard kernel: scsi reset function returned 2
Jan 20 01:57:33 standard kernel: SCSI_RESET_SUCCESS:
Jan 20 01:57:34 standard kernel: Non-zero result in scsi_done 8000002 4:1
Jan 20 01:57:34 standard kernel: In scsi_done(host = 1, result = 8000002)
Jan 20 01:57:34 standard kernel: In MAYREDO, allowing 3 retries, have 1
Jan 20 01:57:34 standard kernel: VFS: Disk change detected on device sr(11,2)
Jan 20 02:00:00 standard /USR/SBIN/CRON[384]: (root) CMD (test -f /proc/modules &&
/sbin/rmmod -a)
Jan 20 02:00:35 standard kernel: Non-zero result in scsi_done 8000002 4:0
Jan 20 02:00:35 standard kernel: In scsi_done(host = 1, result = 8000002)
Jan 20 02:00:35 standard kernel: VFS: Disk change detected on device sr(11,1)
========================================
LUN handling seems to be buggy still in 2.2.14.
Hope this helps.
Happy Hacking,
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]