Let me start by explaining the theory behind single-LUN. The idea
is that for devices such as this, accessing different LUNs at the same
time would require the media to be flipped in and out. This would be a
time-consuming operation, and not likely to be the optimal situation. The
point of the SINGLE_LUN flag is that in *theory* only a single LUN for the
device should be active at one time.

One of the fundamental problems with it from the start is that
access is blocked to other LUNs for the device only as long as one LUN is
active. For things like simultaneous directory listings, each disc will be
accessed intermittently (to read the directory), and thus it is still
possible that the mechanism will thrash the two discs in and out. The
only way to prevent this would be to have a timer of some sort which would
only allow access to a different LUN some small interval of time after the
list command completes.
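
A toy model of the timer idea, in Python rather than kernel code. The function name, the millisecond time base, and the request pattern are all made up for illustration, and the interval is assumed to start when the most recent command on the active LUN completes. It counts how often the changer has to swap discs when two listings interleave:

```python
def count_disc_swaps(requests, holdoff_ms=0):
    """Count media swaps for a changer with one reader mechanism.

    requests:   iterable of (arrival_ms, lun) tuples.
    holdoff_ms: hypothetical hold-off; a different LUN may only be
                selected once the active LUN has been idle this long.
    Commands are treated as instantaneous to keep the model small.
    """
    pending = sorted(requests)
    now = 0
    active = None      # LUN whose disc is currently in the reader
    last_done = 0      # time the most recent command completed
    swaps = 0
    while pending:
        arrived = [r for r in pending if r[0] <= now]
        if not arrived:
            now = pending[0][0]          # idle until the next request
            continue
        same = [r for r in arrived if r[1] == active]
        expiry = last_done + holdoff_ms
        if same:
            req = same[0]                # stay on the loaded disc
        elif active is None or now >= expiry:
            req = arrived[0]             # hold-off over: switch LUN
            if active is not None:
                swaps += 1
            active = req[1]
        else:
            # Defer the other LUN until the hold-off expires or the
            # active LUN receives another request, whichever is first.
            later_same = [r[0] for r in pending
                          if r[1] == active and r[0] > now]
            now = min([expiry] + later_same)
            continue
        pending.remove(req)
        last_done = now
    return swaps


# Two listings, each issuing a request every 200 ms, offset by 100 ms.
listing_a = [(t, 0) for t in range(0, 1000, 200)]
listing_b = [(t, 1) for t in range(100, 1000, 200)]
requests = listing_a + listing_b
print(count_disc_swaps(requests, holdoff_ms=0))
print(count_disc_swaps(requests, holdoff_ms=500))
```

With this made-up pattern, no hold-off gives 9 swaps for 10 requests, while a 500 ms hold-off gives a single swap. The cost is that the second listing is starved until the first goes quiet, so a real implementation would probably need some fairness limit as well.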

With the new queueing code, things won't be a lot better, I am
afraid. As things stand, each LUN would get its own queue, but now that
I think about it, this is probably wrong from an architectural point of
view, and I should instead lump all of the LUNs into a single queue for
the case that the SINGLE_LUN flag is on. Even this wouldn't be quite
enough - the whole way SINGLE_LUN was implemented probably needs to be
redone.
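
The difference between the two queueing layouts can be sketched in a few lines of Python. This is a toy model, not the kernel's data structures; `queue_key`, `build_queues`, and the tuple layout are hypothetical. With the flag set, every LUN of a target maps to the same queue:

```python
from collections import defaultdict, deque


def queue_key(host, channel, target, lun, single_lun):
    """Return the key of the queue a command should go on (hypothetical)."""
    if single_lun:
        # SINGLE_LUN device: all LUNs of the target share one queue.
        return (host, channel, target)
    # Ordinary device: each LUN gets its own queue.
    return (host, channel, target, lun)


def build_queues(commands, single_lun):
    """Group (host, channel, target, lun, opcode) tuples into queues."""
    queues = defaultdict(deque)
    for host, channel, target, lun, opcode in commands:
        key = queue_key(host, channel, target, lun, single_lun)
        queues[key].append((lun, opcode))
    return queues


# Two reads aimed at LUN 0 and LUN 1 of the changer (scsi1, channel 0, id 4).
commands = [(1, 0, 4, 0, "READ"), (1, 0, 4, 1, "READ")]
per_lun = build_queues(commands, single_lun=False)  # two separate queues
shared = build_queues(commands, single_lun=True)    # one queue for the target
```

As noted above, a shared queue alone does not stop the thrashing; it only provides one place where an ordering or hold-off policy could be applied.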

As for debugging this, I can suggest a couple of things. First,
examine /var/log/messages and see whether any kernel messages related to
this device have appeared. Second, it would probably make better sense
to test these things without X and just from two virtual consoles.
Finally, I would suggest bumping up the logging levels when you run this
test - this will make it possible to see more about what the heck the
thing is doing. Look at the kernel configuration help for
CONFIG_SCSI_LOGGING for more information. I have some older information
that goes into more detail at http://www.andante.org/scsi_error.html
(the logging went in at the same time that the new error handling code
appeared).

-Eric

"The world was a library, and its books were the stones, leaves,
brooks, grass, and the birds of the earth. We learned to do what only
a student of nature ever learns, and that was to feel beauty."
Chief Luther Standing Bear - Teton Sioux

On Wed, 8 Dec 1999, ishikawa wrote:
>
> Version of Linux reported:
> Linux standard 2.2.13 #2 SMP Sun Oct 24 04:06:54 JST 1999 i586 unknown
>
> The device in question is a Nakamichi MBR-7 multi-CD (7 CDs) changer.
> This is blacklisted in /usr/src/linux/drivers/scsi/scsi.c
> like this:
> {"NRC","MBR-7","*", BLIST_FORCELUN | BLIST_SINGLELUN},
> {"NRC","MBR-7.4","*", BLIST_FORCELUN | BLIST_SINGLELUN},
>
> This CD changer has 7 CD slots. There is only one reader mechanism,
> so only one CD can be accessed at a time. A different CD can be accessed
> only after the current CD is mechanically replaced with the new one
> in the reading mechanism. I think this is why the device
> is blacklisted as a SINGLELUN device (BLIST_SINGLELUN).
> Each CD is given a unique LUN (0-6). The device as a whole uses
> only one target ID.
>
> I am using a BusLogic adaptor driven by the BusLogic driver
> and an AMD SCSI chip adaptor driven by the tmscsim driver.
> The Nakamichi MBR-7 multi-CD changer device is connected to
> the tmscsim-driven AMD SCSI chip adaptor.
>
> ls -l /proc/scsi
> total 0
> dr-xr-xr-x 2 root root 0 Dec 8 04:52 BusLogic/
> -rw-r--r-- 1 root root 0 Dec 8 04:52 scsi
> dr-xr-xr-x 2 root root 0 Dec 8 04:52 tmscsim/
>
>
> The listing of "cat /proc/scsi/scsi"
>
> **** comment : scsi0 is the BusLogic driver and not
> **** related to the problem here.
>
> Host: scsi0 Channel: 00 Id: 01 Lun: 00
> Vendor: IBM Model: DSAS-3540 Rev: S47K
> Type: Direct-Access ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 03 Lun: 00
> Vendor: CyberDrv Model: CD-ROM TW240S Rev: 1.20
> Type: CD-ROM ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 05 Lun: 00
> Vendor: SAMSUNG Model: WN321010S Rev: 1224
> Type: Direct-Access ANSI SCSI revision: 02
>
> **** comment: scsi1 is the AMD-chip SCSI controller (using the tmscsim
> **** driver) to which the Nakamichi drive (recognized as MBR-7) is
> **** connected.
>
> Host: scsi1 Channel: 00 Id: 04 Lun: 00
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Host: scsi1 Channel: 00 Id: 04 Lun: 01
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Host: scsi1 Channel: 00 Id: 04 Lun: 02
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Host: scsi1 Channel: 00 Id: 04 Lun: 03
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Host: scsi1 Channel: 00 Id: 04 Lun: 04
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Host: scsi1 Channel: 00 Id: 04 Lun: 05
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Host: scsi1 Channel: 00 Id: 04 Lun: 06
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
>
> Output of "cat /proc/scsi/tmscsim/1"
>
> Tekram DC390/AM53C974 PCI SCSI Host Adapter, Driver Version 2.0d24 1999/11/14
> SCSI Host Nr 1, AM53C974 Adapter Nr 0
> IOPortBase 0xe000, IRQ 10
> MaxID 7, MaxLUN 8, AdapterID 7, SelTimeout 250 ms, DelayReset 1 s
> TagMaxNum 16, Status 0x00, ACBFlag 0x00, GlitchEater 24 ns
> Statistics: Cmnds 23, Cmnds not sent directly 0, Out of SRB conds 0
> Lost arbitrations 0, Sel. connected 0, Connected: No
> Nr of attached devices: 7, Nr of DCBs: 7
> Map of attached LUNs: 00 00 00 00 7f 00 00 00
> Idx ID LUN Prty Sync DsCn SndS TagQ NegoPeriod SyncSpeed SyncOffs MaxCmd
> 00  04 00  Yes  No   Yes  Yes  No   (200 ns)                      01
> 01  04 01  Yes  No   Yes  Yes  No   (200 ns)                      01
> 02  04 02  Yes  No   Yes  Yes  No   (200 ns)                      01
> 03  04 03  Yes  No   Yes  Yes  No   (200 ns)                      01
> 04  04 04  Yes  No   Yes  Yes  No   (200 ns)                      01
> 05  04 05  Yes  No   Yes  Yes  No   (200 ns)                      01
> 06  04 06  Yes  No   Yes  Yes  No   (200 ns)                      01
>
>
>
> Problem symptom and a short history:
>
> Until early summer of last year (sorry, I forgot the exact revision;
> probably 2.0.36?), there was a problem with supporting multiple-LUN devices.
>
> In my setup, /dev/scd0 is the single CD drive on the first SCSI adaptor.
>
> The first and the second CD (LUN 0 and LUN 1) on the MBR-7 drive
> are then /dev/scd1 and /dev/scd2.
>
> In the pre-2.0.3[67]? kernel, trying to run the following sequence of
> commands locked the kernel solid. Not even the magic Alt+SysRq+key
> combination produced a meaningful result. (Well, I think I tested
> Alt+SysRq+key in the 2.1.1xx series after I noticed the problem and upgraded
> to the emerging newer kernel version for testing. So maybe I should have
> said pre-2.2.x kernel.)
>
> dd if=/dev/scd1 of=/dev/null &
> dd if=/dev/scd2 of=/dev/null
>
> Running only one of the commands was OK, though.
> I tried to resolve the problem, but concluded only that there was
> a bug somewhere in the SCSI layer concerning multi-LUN support, and
> got nowhere.
> (Well, now I recall that there was a misplaced if() statement
> concerning the check for BLIST_SINGLELUN devices, and fixing this was
> a small improvement in the 2.0.3x series.
> I think that without the fix, access was flakier.
> Even after the fix, the hang above occurred reliably. 2.1.1xx was also
> problematic, as I noted above.)
>
> However, it was fixed in the later kernel (2.2.x).
> I remember seeing the commands above work.
> Please note that when the commands worked, switching between the two
> processes above caused a lot of mechanical noise since the MBR-7 drive
> needed to switch the CDs physically into/from the reading tray.
> (This audio feedback is helpful when I diagnose CD access problems.)
> I tend not to invoke such simultaneous access commands, although
> there are times when I do need to mount multiple source archive CDs,
> etc. simultaneously, and the multi-CD changer drive is a very handy device
> in such cases.
>
> Lately, after almost a year, I tried to mount two CDs on the MBR-7
> drive. These days, the standalone x24 drive at /dev/scd0 works as the
> primary CD drive, and when I need a second disc, I use a LUN in the MBR-7
> (double speed, working but slow).
>
> Then I noticed a very severe problem.
> The following is the story.
>
> I mounted two CDs in the following manner.
>
> mount -t iso9660 /dev/scd1 /mnt1 (LUN 0)
> mount -t iso9660 /dev/scd2 /mnt2 (LUN 1)
>
> So far, so good. Both were mounted after a slight delay caused by
> the initial loading of CD (this is normal and I heard the mechanical
> noise.)
> I checked by running "ls -l /mnt1" and "ls -l /mnt2" immediately
> after the mount of each LUN. The listing worked.
>
> Now the fun began.
> I ran the following two commands in two different xterm windows and
> encountered the severe problem.
>
> Firstly, in the first xterm window, I ran
>
> ls -lR /mnt1 (in the first window)
>
> The listing began appearing. The CD media at LUN 0 mounted at /mnt1
> is Free Solaris 7 installation media and the listing takes a long time
> to finish.
> So while the listing in the first window continued,
> I ran the following command in the second xterm window.
>
> ls -lR /mnt2 (in the second window.)
>
> Now, the second command lists the CD media at LUN 1 of the MBR-7 changer
> device, and so I expected a lot of mechanical noise due to the shuffling
> of CDs. But it should work.
> Such simultaneous access to different CDs
> DID work last year after I switched to the
> newer kernel (2.2.[6-9]; sorry again, I am hazy about the revisions
> here). No problem there.
>
> However, today, I noticed that the operation caused severe problems.
>
> - First, as soon as I typed in the second ls command,
> I noticed that the display in the first window stopped
> updating completely. The directory output got stuck somewhere and
> was never updated after the second command was issued.
>
> - The directory listing in the second window was produced continuously
> for a second or two.
>
> - After one or two seconds, the changer device began switching the
> CDs: I could hear the mechanical noise and I thought this was normal.
> [ When these commands worked, after the noise, i.e. the shuffling of
> CDs, the directory listings of the different CDs
> would get updated and continue one by one in turn
> until both commands finished.
> Lots of noise, but it worked before. ]
>
> However, this time, even after the mechanical noise and a
> pause, the first window never got updated.
> During this time, the output in the second window stopped, so I gather
> the process of the second command was suspended.
> The first process ought to have been running and updating the
> output, but it didn't.
>
> - After looking at the CRT display and hearing the mechanical noises
> about a couple of dozen times, I figured that something was amiss and
> began investigating.
>
> - The first thing I tried was to give the input focus to the xterm window
> where the stuck first ls command was running by
> clicking the window pane. It didn't work! Or rather, to my horror,
> the mouse was not usable! It didn't respond.
>
> - After a few such trials, I found out an important thing.
> The mouse responded only during the brief time when the
> display in the second xterm window, where the second directory
> listing was running, got updated.
>
> As soon as I heard the mechanical noise, presumably because the
> first directory command began accessing the CD at LUN 0 again, the
> mouse stopped responding. After a pause, another mechanical noise,
> presumably for the access to the CD at LUN 1 to resume, the second window
> resumed updating its output, and then the mouse worked during this brief
> period when the second ls process ran!
>
> During the investigation, by mistake, I tried to move a larger netscape
> window to the front,
> blocking the two xterm windows. However, the re-display or
> the shuffling of the windows didn't proceed smoothly.
> It was hard to tell exactly, but I think the re-display caused by
> exposing the netscape window proceeded only when the
> mouse cursor responded to my moving the mouse. (Meaning the brief period
> when the second ls process ran.)
>
> After seeing the slow repainting of the root window on my
> CRT, I must have hit an incorrect mouse button or something,
> and a beep was produced. Normally a single beep would have been produced,
> but this time the sound kept going.
> (This suggests some sort of lost input?)
> I had to power off the Linux machine immediately.
>
> I hope the readers got the gist of the problem.
>
> My tentative conclusion is this.
>
> There is a bug in the handling of multiple-LUN devices, and
> the MBR-7 multi-CD changer device triggered it somehow.
>
> In my example, it seems that as soon as control passed from the
> working second ls process, which listed the CD at LUN 1, and the
> CD at LUN 0 was accessed again for the resuming first ls process, the
> interrupts somehow got turned off (or something similar happened).
> (Well, maybe not all interrupts. The timer interrupt seems to be
> recognized after all. Hmm, this puzzles me.)
> It may well be only a SCSI-level problem, but
> unless some of the interrupts are masked, I can't explain the
> slow re-painting of windows and the non-responding mouse cursor.
>
> That the mouse movement is not reflected at all during the apparent
> hang, when the first ls process ought to be running (and
> accessing the CD at LUN 0), suggests to me that the mouse movement data
> is not picked up AT ALL. Indeed, the CUMULATIVE mouse movement is NOT
> reflected once the second ls output resumes and the X window system
> seems to respond. Only the mouse movement during the
> brief period when the second ls output got updated was recognized.
> So the mouse data must have been silently dropped during the
> apparent hang.
>
> My mouse is a PS/2 mouse at IRQ 12.
>
> IRQs on my machine:
> <ishikawa@standard:267>$ cat /proc/interrupts
>            CPU0
>   0:     295889          XT-PIC  timer
>   1:      24206          XT-PIC  keyboard
>   2:          0          XT-PIC  cascade
>   5:       1437          XT-PIC  Intel EtherExpress Pro 10/100 Ethernet
>   8:          2          XT-PIC  rtc
>   9:      96500          XT-PIC  BusLogic BT-930
>  10:        122          XT-PIC  tmscsim
>  12:       8859          XT-PIC  PS/2 Mouse
>  13:          0          XT-PIC  fpu
>  14:         79          XT-PIC  ide0
> NMI:          0
> ERR:          0
>
>
> Although I can try to avoid causing this problem, it can be
> fatal if triggered,
> and I may run such commands unintentionally.
>
> As a comparison point,
> the commands worked without a problem (albeit noisily)
> in the earlier kernel revisions. (Still earlier kernel revisions
> had a different lockup problem, though, as I mentioned.)
>
> Any tips for collecting more useful data for debugging and fixing
> this problem?
>
> (These days, fsck has become very solid and
> I have lost few files to abrupt power-off shutdowns lately.
> So I can try a few dangerous testing sessions within reason.)
>
> Finally, I would like to thank Kurt Garloff for maintaining
> the DC390 tmscsim driver. I noticed the latest pre-release version
> announcement and tested it for a few days. I recalled the
> old SCSI CD changer problem of last year, tried it on my PC today,
> and found that the problem had resurfaced in a somewhat different form.
>
> Happy Hacking,
>
>
> Chiaki Ishikawa
>
> PS: Yesterday, I tried the
>
> dd if=/dev/scd1 of=/dev/null &
> dd if=/dev/scd2 of=/dev/null
>
> and found that one of the commands finished successfully and
> the other got stuck and didn't proceed.
> But I could kill it using Control-C.
> (At least no crash like in the very early kernel.)
>
> Thus, I thought I would investigate in a little more detail today
> and encountered the problem above.
> Come to think of it, if the interrupt is effectively lost for a
> process trying to access a different LUN, then maybe I can
> explain yesterday's problem, too.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [EMAIL PROTECTED]
>