If James Bottomley, Mike Anderson, or anyone else from linux-scsi can
comment on the material below, I would appreciate it.
On Mon, 10 Jan 2005, Srihari Vijayaraghavan wrote:
> I am unable to trigger the bug, despite my sincere efforts.
If nothing else, I have to admire your persistence in plugging and
unplugging the drive 120 times over the course of a half-hour!
> Please refer to the attachment for the complete kernel log.
>
> I think I understand the problem (conceptually I guess):
> 1. Connect the USB DVD-RW drive
> 2. USB Storage says: "usb-storage: device found at nnn" and "usb-storage:
> waiting for device to settle before scanning"
> 3. And then it waits for 5+ seconds
> 4. Then it initialises the peripheral and then says: "Vendor: PIONEER
> Model:
> DVD-RW DVR-107D Rev: 1.13 ..." and "usb-storage: device scan complete"
That's right. But there's more going on behind the scenes that you aren't
aware of. Hotplug programs, haldaemon, automounters, and who knows what
else.
> I think if I unplug the device precisely the same moment when it initialises
> (step 4), and by time that routine completes kernel is really unhappy (D
> state process, Badness in scsi_device_set_state at
> drivers/scsi/scsi_lib.c:1717, and/or oops) to discover that the drive is not
> there anymore :-).
It's not that simple. Your log contained two different sorts of errors.
The first occurred right near the start:
Jan 10 20:24:46 desktop kernel: usb 1-3: USB disconnect, address 3
Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5147]: segfault at
0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff388 error 4
Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5161]: segfault at
0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff6d8 error 4
This looks like a problem in a user program, not in the kernel. I have no
idea what 50-fstab-sync.h is. Part of udev perhaps? Greg, do you know?
The second problem occurred during a period in which you replugged the
drive three times! Here's the relevant part of the log:
Jan 10 20:48:37 desktop kernel: usb 1-3: new high speed USB device using
ehci_hcd and address 111
Jan 10 20:48:37 desktop kernel: scsi111 : SCSI emulation for USB Mass Storage
devices
Jan 10 20:48:37 desktop kernel: usb-storage: device found at 111
Jan 10 20:48:37 desktop kernel: usb-storage: waiting for device to settle
before scanning
Jan 10 20:48:42 desktop kernel: Vendor: PIONEER Model: DVD-RW DVR-107D
Rev: 1.13
Jan 10 20:48:42 desktop kernel: Type: CD-ROM
ANSI SCSI revision: 00
Jan 10 20:48:42 desktop kernel: sr1: scsi3-mmc drive: 40x/40x writer cd/rw
xa/form2 cdda tray
Jan 10 20:48:42 desktop kernel: Attached scsi CD-ROM sr1 at scsi111, channel 0,
id 0, lun 0
Jan 10 20:48:42 desktop kernel: usb-storage: device scan complete
Jan 10 20:48:42 desktop kernel: usb 1-3: USB disconnect, address 111
That looks okay. But you don't see the hotplug etc. programs that are
still running, keeping the device below scsi111 open.
Jan 10 20:48:56 desktop kernel: usb 1-3: new high speed USB device using
ehci_hcd and address 112
Jan 10 20:48:56 desktop kernel: scsi112 : SCSI emulation for USB Mass Storage
devices
Jan 10 20:48:56 desktop kernel: usb-storage: device found at 112
Jan 10 20:48:56 desktop kernel: usb-storage: waiting for device to settle
before scanning
Jan 10 20:49:01 desktop kernel: usb-storage: device scan complete
Jan 10 20:49:01 desktop kernel: usb 1-3: USB disconnect, address 112
That's normal also. Host scsi112 was created and apparently destroyed.
Jan 10 20:49:08 desktop kernel: usb 1-3: new high speed USB device using
ehci_hcd and address 113
Jan 10 20:49:08 desktop kernel: scsi113 : SCSI emulation for USB Mass Storage
devices
Jan 10 20:49:08 desktop kernel: usb-storage: device found at 113
Jan 10 20:49:08 desktop kernel: usb-storage: waiting for device to settle
before scanning
Jan 10 20:49:12 desktop kernel: In bus_reset, srb: 0000010037d71640
Jan 10 20:49:12 desktop kernel: device: 0000010038e3d000
Jan 10 20:49:12 desktop kernel: host: 000001002b500c00
Jan 10 20:49:12 desktop kernel: hostdata: 000001002b549c00
This is the first indication of something funny. I can't think how a bus
reset would get initiated by the kernel -- that only happens after a
timeout. Maybe it came from a user program?
Jan 10 20:49:12 desktop kernel: ehci_hcd 0000:00:10.4: port 3 reset error -110
Jan 10 20:49:12 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jan 10 20:49:13 desktop kernel: usb 1-3: reset high speed USB device using
ehci_hcd and address 113
Jan 10 20:49:13 desktop kernel: scsi: Device offlined - not ready after error
recovery: host 111 channel 0 id 0 lun 0
Jan 10 20:49:13 desktop kernel: 111:0:0:0: Illegal state transition
deleted->offline
Jan 10 20:49:13 desktop kernel: Badness in scsi_device_set_state at
drivers/scsi/scsi_lib.c:1717
Jan 10 20:49:13 desktop kernel:
Jan 10 20:49:13 desktop kernel: Call
Trace:<ffffffffa0006808>{:scsi_mod:scsi_device_set_state+264}
Jan 10 20:49:13 desktop kernel:
<ffffffffa000481c>{:scsi_mod:scsi_error_handler+2732}
Jan 10 20:49:13 desktop kernel: <ffffffff8010ebf3>{child_rip+8}
<ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0}
Jan 10 20:49:13 desktop kernel: <ffffffff8010ebeb>{child_rip+0}
And here's the weird part. Apparently the bus reset was not for the
current device attached to scsi113, but for the old incarnation attached
to scsi111 -- which is long since gone! That accounts for the illegal
state transition. But why did it happen like this in the first place?
Could it be caused by a user program doing an SG_SCSI_RESET ioctl while
still holding the device open after scsi_remove_host has finished?
Jan 10 20:49:13 desktop kernel: usb 1-3: USB disconnect, address 113
Jan 10 20:49:18 desktop kernel: In bus_reset, srb: 0000010037d71240
Jan 10 20:49:18 desktop kernel: device: 0000010038f09000
Jan 10 20:49:18 desktop kernel: host: 000001002b500400
Jan 10 20:49:18 desktop kernel: hostdata: 000001002b549c00
Jan 10 20:49:18 desktop kernel: scsi: Device offlined - not ready after error
recovery: host 113 channel 0 id 0 lun 0
Jan 10 20:49:18 desktop kernel: usb-storage: device scan complete
Now it's not surprising that the reset, which was intended for scsi111,
has interfered with the proper operation of scsi113.
Alan Stern
-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
[email protected]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel