Hello,

I would like to report a problem with simultaneously accessing
two CDs on a multi-LUN CD-ROM changer device.

This is VERY LENGTHY, but I am afraid it needs to be.

First the background.

Version of Linux reported:
Linux standard 2.2.13 #2 SMP Sun Oct 24 04:06:54 JST 1999 i586 unknown

The device in question is a Nakamichi MBR-7 multi-CD (7 CDs) changer.
It is blacklisted in /usr/src/linux/drivers/scsi/scsi.c
like this:
 {"NRC","MBR-7","*", BLIST_FORCELUN | BLIST_SINGLELUN},
 {"NRC","MBR-7.4","*", BLIST_FORCELUN | BLIST_SINGLELUN},

This CD changer has 7 CD slots but only one reader mechanism, so
only one CD can be accessed at a time.  A different CD can be accessed
only after the changer mechanically swaps it into the reading
mechanism in place of the current one.  I think this is why the device
is blacklisted as a SINGLELUN device (BLIST_SINGLELUN).
Each CD is given a unique LUN (0-6).  The device as a whole uses
only one target ID.

I am using a BusLogic adaptor driven by the BusLogic driver
and an AMD SCSI chip adaptor driven by the tmscsim driver.
The Nakamichi MBR-7 multi-CD changer is connected to
the tmscsim-driven AMD SCSI chip adaptor.

  ls -l /proc/scsi
  total 0
  dr-xr-xr-x   2 root     root            0 Dec  8 04:52 BusLogic/
  -rw-r--r--   1 root     root            0 Dec  8 04:52 scsi
  dr-xr-xr-x   2 root     root            0 Dec  8 04:52 tmscsim/


The listing of "cat /proc/scsi/scsi"

**** comment : scsi0 is the BusLogic driver and not
**** related to the problem here.

Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM      Model: DSAS-3540        Rev: S47K
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: CyberDrv Model:  CD-ROM TW240S   Rev: 1.20
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 05 Lun: 00
  Vendor: SAMSUNG  Model: WN321010S        Rev: 1224
  Type:   Direct-Access                    ANSI SCSI revision: 02

**** comment: scsi1 is the AMD-chip SCSI controller (using the tmscsim
**** driver) to which the Nakamichi drive (recognized as MBR-7) is
**** connected.

Host: scsi1 Channel: 00 Id: 04 Lun: 00
  Vendor: NRC      Model: MBR-7            Rev: 110
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 04 Lun: 01
  Vendor: NRC      Model: MBR-7            Rev: 110
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 04 Lun: 02
  Vendor: NRC      Model: MBR-7            Rev: 110
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 04 Lun: 03
  Vendor: NRC      Model: MBR-7            Rev: 110
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 04 Lun: 04
  Vendor: NRC      Model: MBR-7            Rev: 110
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 04 Lun: 05
  Vendor: NRC      Model: MBR-7            Rev: 110
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 04 Lun: 06
  Vendor: NRC      Model: MBR-7            Rev: 110
  Type:   CD-ROM                           ANSI SCSI revision: 02

Output of "cat /proc/scsi/tmscsim/1"

Tekram DC390/AM53C974 PCI SCSI Host Adapter, Driver Version 2.0d24 1999/11/14
SCSI Host Nr 1, AM53C974 Adapter Nr 0
IOPortBase 0xe000, IRQ 10
MaxID 7, MaxLUN 8, AdapterID 7, SelTimeout 250 ms, DelayReset 1 s
TagMaxNum 16, Status 0x00, ACBFlag 0x00, GlitchEater 24 ns
Statistics: Cmnds 23, Cmnds not sent directly 0, Out of SRB conds 0
            Lost arbitrations 0, Sel. connected 0, Connected: No
Nr of attached devices: 7, Nr of DCBs: 7
Map of attached LUNs: 00 00 00 00 7f 00 00 00
Idx ID LUN Prty Sync DsCn SndS TagQ NegoPeriod SyncSpeed SyncOffs MaxCmd
00  04  00  Yes  No   Yes  Yes  No   (200 ns)                       01
01  04  01  Yes  No   Yes  Yes  No   (200 ns)                       01
02  04  02  Yes  No   Yes  Yes  No   (200 ns)                       01
03  04  03  Yes  No   Yes  Yes  No   (200 ns)                       01
04  04  04  Yes  No   Yes  Yes  No   (200 ns)                       01
05  04  05  Yes  No   Yes  Yes  No   (200 ns)                       01
06  04  06  Yes  No   Yes  Yes  No   (200 ns)                       01
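
For reference, the "Map of attached LUNs" line above can be decoded
with a small shell function.  This is only a sketch: it assumes (my
reading of the output, not something confirmed from the driver source)
that byte N of the map is a bitmask for target ID N, with bit L set
when LUN L is attached.  The function name decode_lun_map is made up
for this illustration.

```shell
#!/bin/sh
# Decode a tmscsim "Map of attached LUNs" line.
# ASSUMPTION: byte N is a bitmask for target ID N; bit L set = LUN L attached.
decode_lun_map() {
    id=0
    for byte in $1; do              # unquoted on purpose: split on spaces
        val=$((0x$byte))            # hex byte -> integer
        if [ "$val" -ne 0 ]; then
            luns=""
            lun=0
            while [ "$lun" -lt 8 ]; do
                if [ $(( (val >> lun) & 1 )) -eq 1 ]; then
                    luns="$luns $lun"
                fi
                lun=$((lun + 1))
            done
            echo "ID $id: LUNs$luns"
        fi
        id=$((id + 1))
    done
}

decode_lun_map "00 00 00 00 7f 00 00 00"
# prints: ID 4: LUNs 0 1 2 3 4 5 6
```

Under that assumption, the 7f in the fifth position means LUNs 0-6 on
ID 4, which agrees with the seven entries in the device table above.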



Problem symptom and a short history:

Until early summer of last year (sorry, I forgot the exact revision;
probably 2.0.36?), there was a problem with support for multiple-LUN
devices.

In my setup, /dev/scd0 is the single CD drive on the first SCSI adaptor.

The first and the second CD (LUN 0 and LUN 1) on the MBR-7 drive
are then /dev/scd1 and /dev/scd2.

In the pre-2.0.3[67]? kernel, running the following sequence of
commands locked the kernel solid. Not even the magic Alt+SysRq+key
combination produced a meaningful result. (Well, I think I tested
Alt+SysRq+key in the 2.1.1xx series after I noticed the problem and
upgraded to the emerging newer kernel version for testing. So maybe I
should have said pre-2.2.x kernel.)

        dd if=/dev/scd1 of=/dev/null &
        dd if=/dev/scd2 of=/dev/null

Running only one of the commands was OK, though.
I tried to resolve the problem and concluded that there was
a bug somewhere in the SCSI layer concerning multi-LUN support,
but got nowhere.
(Well, now I recall that there was a misplaced if() statement
in the check for BLIST_SINGLELUN devices, and fixing this was
a small improvement in the 2.0.3x series.
I think that without the fix, access was flakier.
Even after the fix, the hang above occurred reliably. 2.1.1xx was also
problematic, as I noted above.)

However, it was fixed in the later kernel (2.2.x).
I remember seeing the commands above work.
Please note that when the commands worked, switching between the two
processes above caused a lot of mechanical noise since the MBR-7 drive
needed to physically move the CDs into and out of the reading tray.
(This audio feedback is helpful when I diagnose CD access problems.)
I tend not to invoke such simultaneous accesses, although
there are times when I do need to mount multiple source archive CDs,
etc. simultaneously, and the multi-CD changer drive is a very handy
device in such cases.

Lately, after almost a year, I tried to mount two CDs on the MBR-7
drive.
These days, the standalone x24 drive at /dev/scd0 serves as my
primary CD drive, and when I need a second medium, I use a LUN in the
MBR-7 (double speed, working but slow).

Then I noticed a very severe problem.
The following is the story.

I mounted two CDs in the following manner.

        mount -t iso9660  /dev/scd1 /mnt1     (LUN 0)
        mount -t iso9660  /dev/scd2 /mnt2     (LUN 1)

So far, so good. Both were mounted after a slight delay caused by
the initial loading of the CD (this is normal and I heard the mechanical
noise).
I checked by running "ls -l /mnt1" and "ls -l /mnt2" immediately
after the mount of each LUN. The listing worked.

Now the fun began.
I ran the following two commands in two different xterm windows and
encountered the severe problem.

Firstly, in the first xterm window, I ran

        ls -lR /mnt1            (in the first window)

The listing began appearing. The CD at LUN 0, mounted at /mnt1,
is the Free Solaris 7 installation media, and the listing takes a long
time to finish.
So while the listing in the first window continued,
I ran the following command in the second xterm window.

        ls -lR /mnt2            (in the second window.)

Now, the second command lists the CD media at LUN 1 of the MBR-7 changer
device, so I expected a lot of mechanical noise due to the shuffling
of CDs. But it should work.
Such simultaneous access to different CDs
DID work last year after I switched to the
newer kernel (2.2.[6-9]; sorry again, I am hazy about the revisions
here). No problem there.

However, today, I noticed that the operation caused severe problems.

 - First, as soon as I typed in the second ls command,
   I noticed that the display in the first window stopped
   updating completely. The directory output got stuck somewhere and
   was never updated after the second command was issued.

 - The directory listing in the second window was produced continuously
   for a second or two.

 - After one or two seconds, the changer device began switching the
   CDs: I could hear the mechanical noise and I thought this was normal.
   [ When these commands worked, after the noise, i.e. the shuffling of
   CDs, the directory listings of the different CDs
   would get updated and continue one by one in turn
   until both commands finished.
   Lots of noise, but it worked before. ]

   However, this time, even after the mechanical noise and a period of
   pause, the first window never got updated.
   During this time, the output in the second window stopped, so I gather
   the process of the second command was suspended.
   The first process ought to have been running and updating its
   output, but it wasn't.

 - After watching the CRT display and hearing the mechanical noises
   a couple of dozen times, I figured that something was amiss and
   began investigating.

 - The first thing I tried was to give the input focus to the xterm
   window where the stuck first ls command was running by
   clicking the window pane. It didn't work! Or rather, to my horror,
   the mouse was not usable! It didn't respond.

 - After a few such trials, I found out an important thing.
   The mouse responded only during the brief time when the
   display in the second xterm window, where the second directory
   listing was running, got updated.

   As soon as I heard the mechanical noise, presumably made so that the
   first directory command could begin accessing the CD at LUN 0 again,
   the mouse stopped responding. After a pause came another mechanical
   noise, presumably for the access to the CD at LUN 1 to resume; the
   second window again resumed updating its output, and the mouse
   worked during this brief period when the second ls process ran!

 During the investigation, by mistake, I tried to move a larger netscape
 window to the front, covering the two xterm windows. However, the
 re-display, i.e. the shuffling of the windows, didn't proceed smoothly.
 It was hard to tell exactly, but I think the re-display caused by
 exposing the netscape window proceeded only when the
 mouse cursor responded to my moving the mouse. (Meaning the brief period
 when the second ls process ran.)

After seeing the slow repainting of the root window on my
CRT, I must have hit an incorrect mouse button or something,
and a beep was produced. Normally a single beep would have been
produced, but this time the sound kept going.
(This suggests some sort of lost input?)
I had to power off the Linux machine immediately.

I hope the readers got the gist of the problem.

My tentative conclusion is this.

There is a bug in the handling of multiple-LUN devices, and
the MBR-7 multi-CD changer triggered it somehow.

In my example, it seems that as soon as control passed from the
working second ls process, which listed the CD at LUN 1, and the
CD at LUN 0 was accessed again for the resuming first ls process, the
interrupts somehow got turned off (or something similar happened).
(Well, maybe not all interrupts.  The timer interrupt seems to be
recognized after all. Hmm, this puzzles me.)
It may well be only a SCSI-level problem, but
unless some of the interrupts are masked, I can't explain the
slow re-painting of windows and the non-responding mouse cursor.

That the mouse movement is not reflected at all during the seemingly
hung period when the first ls process ought to be running (and
accessing the CD at LUN 0) suggests to me that the mouse movement data
is not picked up AT ALL.  Indeed, the CUMULATIVE mouse movement was NOT
reflected once the second ls output resumed and the X window system
seemed to respond.  Only the mouse movement during the
brief period when the second ls output got updated was recognized.
So the mouse data must have been silently dropped during the
seemingly hung period.

My mouse is a PS/2 mouse at IRQ 12.

IRQs on my machine:
<ishikawa@standard:267>$ cat /proc/interrupts
           CPU0
  0:     295889          XT-PIC  timer
  1:      24206          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  5:       1437          XT-PIC  Intel EtherExpress Pro 10/100 Ethernet
  8:          2          XT-PIC  rtc
  9:      96500          XT-PIC  BusLogic BT-930
 10:        122          XT-PIC  tmscsim
 12:       8859          XT-PIC  PS/2 Mouse
 13:          0          XT-PIC  fpu
 14:         79          XT-PIC  ide0
NMI:          0
ERR:          0
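
One way to check the masked-interrupt guess would be to compare two
snapshots of /proc/interrupts taken a few seconds apart and see which
counters stop advancing.  The following is only a sketch under some
assumptions: the name irq_delta and the snapshot file names are made up
for illustration, and the parsing assumes the single-CPU layout shown
above (count in the second column).

```shell
#!/bin/sh
# irq_delta BEFORE AFTER
# Print how much each numbered IRQ counter grew between two saved copies
# of /proc/interrupts.  ASSUMPTION: one CPU column, as in the listing above.
irq_delta() {
    awk '
        # Lines of interest look like " 12:   8859   XT-PIC  PS/2 Mouse".
        $1 ~ /^[0-9]+:$/ {
            irq = $1
            sub(/:$/, "", irq)
            if (FNR == NR) {            # still reading the first file
                before[irq] = $2
                next
            }
            printf "IRQ %s: +%d\n", irq, $2 - before[irq]
        }
    ' "$1" "$2"
}

# Possible usage (hypothetical file names):
#   cat /proc/interrupts > /tmp/irq.before
#   sleep 5
#   cat /proc/interrupts > /tmp/irq.after
#   irq_delta /tmp/irq.before /tmp/irq.after
```

If IRQ 12 (mouse) shows +0 during the stuck period while IRQ 0 (timer)
keeps growing, that would be consistent with the guess above, though it
would not prove where the interrupts are being lost.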


Although I can try to avoid causing this problem, the problem can be
fatal if triggered.
I may run such commands unintentionally.

As a point of comparison,
the commands worked without a problem (albeit noisily)
in the earlier kernel revisions. (Still earlier kernel revisions
had a different lockup problem, though, as I mentioned.)

Any tips for collecting more useful data for debugging and fixing
this problem?

(These days, fsck has become very solid and
I have lost few files to problematic power-off shutdowns lately.
So I can try a few dangerous testing sessions within reason.)

Finally I would like to thank Kurt Garloff for maintaining
the DC390 tmscsim driver. I noticed the latest pre-release version
announcement and tested it for a few days. I recalled last year's
old SCSI CD changer problem, tried it on my PC today, and
found that the problem had resurfaced in a somewhat different form.

Happy Hacking,


Chiaki Ishikawa

PS: Yesterday, I tried the

        dd if=/dev/scd1 of=/dev/null &
        dd if=/dev/scd2 of=/dev/null

and found that one of the commands finished successfully and
the other got stuck and didn't proceed.
But I could kill it using control-C.
(At least no crash like in the very early kernel.)
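
For repeating this test without having to watch for a stuck dd by hand,
something like the following could be used.  It is only a sketch: the
function name read_pair is made up, it assumes a timeout(1) command as
found in GNU coreutils is available, and a reader that is stuck
uninterruptibly inside the kernel may not respond to the timeout at all.

```shell
#!/bin/sh
# read_pair DEV_A DEV_B [SECONDS]
# Read both devices concurrently, each bounded by a time limit, and
# report the exit status of each reader.
# ASSUMPTION: timeout(1) is available; it exits with 124 when the limit
# is reached.
read_pair() {
    limit=${3:-60}
    timeout "$limit" dd if="$1" of=/dev/null bs=2048 2>/dev/null &
    pid_a=$!
    timeout "$limit" dd if="$2" of=/dev/null bs=2048 2>/dev/null &
    pid_b=$!
    wait "$pid_a"; st_a=$?
    wait "$pid_b"; st_b=$?
    echo "$1: exit $st_a"
    echo "$2: exit $st_b"
}

# Possible usage on the setup described here:
#   read_pair /dev/scd1 /dev/scd2 600
```

An exit status of 0 means that reader got through its whole CD; 124
means it was still running when the time limit expired.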

Thus, I thought I would investigate a little more in detail today
and encountered the problem above.
Come to think of it, if an interrupt is effectively lost for a
process trying to access a different LUN, then maybe I can
explain yesterday's problem, too.









