Re: [linux-usb-devel] oops on usb storage device disconnect with 2.6.14-rc1
James Bottomley [EMAIL PROTECTED] wrote: On Thu, 2005-09-15 at 15:57 -0400, James Bottomley wrote: I haven't had time to review the eh changes, but I was going to reply to the other one (basically there's a better way to try to close the device add/host remove race using the host state model). Let me complete the SCSI process and I'll take them through the scsi-rc- fixes tree. Well, I think the symptoms are racing scsi_remove_host() calls and the solution is to enforce the state model on removal (as in if the host is already in the remove state, don't try to remove it again). Could you try the patch here: http://marc.theaimsgroup.com/?l=linux-scsim=112613077011571 And see if it will fix the problem? A side effect of not applying Alan's previous patch that added SHOST_RECOVERY to the SHOST_CANCEL: state is that we will not move to the SHOST_CANCEL and subsequently not to SHOST_DEL state if the eh is active during the start of scsi_remove_host. I sent mail on the 7th indicating to include that state change hunk of the diff, but I guess that overlapped with your newer state changes. http://marc.theaimsgroup.com/?l=linux-scsim=112238726326927w=2 In looking at the state model introduced by your patch I believe there may still be a state model race issue if the recovery completes just after the if (!scsi_host_set_state(shost, SHOST_CANCEL)) call in scsi_remove_host (maybe I am just looking to quickly at the state updates). I still do not understand as I asked in a previous comment why we are not shutting down the eh_thread in scsi_remove_host and also why simpler state model updates could not solve the problem. I believe I also indicated that we could enhance scsi_error to shutdown faster during this state which should only be a performance improvement. -andmike -- Michael Anderson [EMAIL PROTECTED] --- SF.Net email is sponsored by: Tame your development challenges with Apache's Geronimo App Server. Download it for free - -and be entered to win a 42 plasma tv or your very own Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php ___ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] oops on usb storage device disconnect with 2.6.14-rc1
James Bottomley [EMAIL PROTECTED] wrote: On Thu, 2005-09-15 at 15:19 -0700, Mike Anderson wrote: A side effect of not applying Alan's previous patch that added SHOST_RECOVERY to the SHOST_CANCEL: state is that we will not move to the SHOST_CANCEL and subsequently not to SHOST_DEL state if the eh is active during the start of scsi_remove_host. I sent mail on the 7th indicating to include that state change hunk of the diff, but I guess that overlapped with your newer state changes. http://marc.theaimsgroup.com/?l=linux-scsim=112238726326927w=2 Yes, but that's not really legitimate since it introduces a bifurcation in the state machine ... when the eh terminates it will come back to running even if it went in from cancel. Clarification here as I do not see the split in the state machine or the transition back to running from cancel. If the above patch is applied we can transition to cancel from recovery if eh already has started. When eh is complete scsi_restart_operations transition to running will fail as we are in the cancel state. That said I like the idea below of waiting / terminating the eh thread prior to transitioning to cancel. There is some introduction of asymmetry here in scsi_remove_host as the eh thread is created in scsi_host_alloc, but possibly later patches could move the eh creation to scsi_add_host (unless I forgot the reason it needed to be earlier). In looking at the state model introduced by your patch I believe there may still be a state model race issue if the recovery completes just after the if (!scsi_host_set_state(shost, SHOST_CANCEL)) call in scsi_remove_host (maybe I am just looking to quickly at the state updates). No, that's true; there's a tiny race that can be mediated by doing locking around the state changes ... that was one of the feedback comments from Alan. ok. I still do not understand as I asked in a previous comment why we are not shutting down the eh_thread in scsi_remove_host and also why simpler state model updates could not solve the problem. Well, it goes back to whether we wait for the thread or not. To shut the thread down, we also need to wait for it to complete. As far as the state model goes, we either need to wait for the eh thread before transitioning to cancel or introduce the extra states that reflect the parallel eh transitions. I like waiting for the eh thread to complete / shutdown prior to transitioning to cancel. I believe I also indicated that we could enhance scsi_error to shutdown faster during this state which should only be a performance improvement. Yes, we could ... patches? ok, I will try, but IBM non-coding activities has made me very unproductive in the patch department lately :-). -andmike -- Michael Anderson [EMAIL PROTECTED] --- SF.Net email is sponsored by: Tame your development challenges with Apache's Geronimo App Server. Download it for free - -and be entered to win a 42 plasma tv or your very own Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php ___ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] 2.6.11-rc3-bk5, oops in scsi_try_bus_reset
Alan Stern [EMAIL PROTECTED] wrote: And Mike Anderson's response was http://marc.theaimsgroup.com/?l=linux-scsim=110538854224319w=2 His explanation was Currently scsi_host_cancel being called from scsi_remove_host appears to not do anything as scsi_forget_host removes the devices from the list it iterates over. I don't know whether this problem has been fixed yet. No, this has not been fixed yet, but needs to be. In discussing this with Mike C today he had a patch that would drain the devices queues prior to scsi_forget_host returning which may help in this case. Along with Mike C's change we could remove the scsi_host_cancel code (since it does not do anything and when it did it was racey), and shutdown the error handler thread in scsi_remove_host (reduce possible callers into the LLDD post scsi_remove_host). Mike C or I will have something out soon on this. -andmike -- Michael Anderson [EMAIL PROTECTED] --- SF email is sponsored by - The IT Product Guide Read honest candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://ads.osdn.com/?ad_id=6595alloc_id=14396op=click ___ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
Alan Stern [EMAIL PROTECTED] wrote: Jan 10 20:49:08 desktop kernel: scsi113 : SCSI emulation for USB Mass Storage devices Jan 10 20:49:08 desktop kernel: usb-storage: device found at 113 Jan 10 20:49:08 desktop kernel: usb-storage: waiting for device to settle before scanning Jan 10 20:49:12 desktop kernel: In bus_reset, srb: 010037d71640 Jan 10 20:49:12 desktop kernel: device: 010038e3d000 Jan 10 20:49:12 desktop kernel: host: 01002b500c00 Jan 10 20:49:12 desktop kernel: hostdata: 01002b549c00 This is the first indication of something funny. I can't think how a bus reset would get initiated by the kernel -- that only happens after a timeout. Maybe it came from a user program? Well it can get initiated in the scsi error handler. This can be verified by setting the logging_level (sysctl -w dev.scsi.logging_level=0x4). You probably do not need to do this as I assume what is happening is that with all the plug and unplug that is being initiated that a IO is in flight when the scsi_remove_host functions is called. It looks like we still have to decide on the removal cleanup. Currently scsi_host_cancel being called from scsi_remove_host appears to not do anything as scsi_forget_host removes the devices from the list it iterates over. Also as previously mentioned by you there are races with scsi_mid canceling IOs that might be completing. Both of these issue where mentioned in an old thread (link provided below), but not completely resolved. http://marc.theaimsgroup.com/?t=10963092065r=1w=2 What is needed is for the LLDD to ensure all IO sent through the LLDD's queuecommand functions gets a scsi_done call (even a fail call). The scsi mid applies choke points to stop IO being set to the LLDDs queuecommand, but there can be races during a host removal (you previously mentioned this). This would also imply that scsi mid should not be making the call to scsi_host_cancel (which appears to not do anything anyway). -andmike -- Michael Anderson [EMAIL PROTECTED] --- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It's fun and FREE -- well, almosthttp://www.thinkgeek.com/sfshirt ___ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] BUG when removing USB flash drive
Alan Stern [EMAIL PROTECTED] wrote: On Thu, 13 May 2004, Mike Anderson wrote: The LLDD queuecommand should not be called after the return of scsi_remove_host (unless we have a bug). The LLDD should not free its resources until the release function is called on the struct device passed in during the scsi_add_host call. Ah. We have a problem here, because usb-storage doesn't own that struct device. It is owned by the USB core, and usb-storage has no way to know when its release function has been called. Even worse, the release function may never be called at all. The device itself may still exist and be attached to the system even though the usb-storage driver has been unbound from it. There must be many other LLDDs with the same problem. Any driver that manages a PCI host controller, for example. The struct device it receives from the PCI layer isn't owned by the LLDD, yet that's what it must pass to scsi_add_host(). How should we handle this? I was incorrect in my previous statement. Well we may hold a reference count on the struct device passed into the scsi_add_host until all references on the host instance are gone and there maybe a positive ref count on the hostt module until all devices are closed a instance of LLDD should be able to do cleanup post return of scsi_remove_host. This needs to be true or SCSI mid-layer would not honor the requirement for the pci remove call. I believe in the past you or someone else has mentioned that post the return of scsi_remove_host a Scsi_Host instance still has a hostt and a transportt which may be dangerous if all code paths are not correctly handling a host in the SHOST_DEL state. I guess we still have this current issue on what is happening in the usb storage that it believes SCSI mid is still using the device. Is something not cleaning up, or is something in SCSI mid not honoring the SHOST_DEL state. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.Net email is sponsored by: SourceForge.net Broadband Sign-up now for SourceForge Broadband and get the fastest 6.0/768 connection for only $19.95/mo for the first 3 months! http://ads.osdn.com/?ad_id=2562alloc_id=6184op=click ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] BUG when removing USB flash drive
Alan Stern [EMAIL PROTECTED] wrote: This BUG happened because the SCSI layer was still using the drive after usb-storage had called scsi_remove_host(). In this case the scsi_remove_host is being called in a unexpected disconnect case. Any IOs in flight will be canceled in the mid-layer (i.e., they will not be chased down with calls to eh_abort_handler). This is an aspect of SCSI midlayer - low-level driver interaction that I'm not very clear about. How long after the host is removed can the midlayer continue to use it? How does the driver know when the midlayer is finished using the host so the driver can exit (and unload from memory)? The LLDD queuecommand should not be called after the return of scsi_remove_host (unless we have a bug). The LLDD should not free its resources until the release function is called on the struct device passed in during the scsi_add_host call. This release function could be called post the scsi_host_dev_release function being called as it does a put on this struct device. This chain of release calls is undefined as a hotplug goes out on the remove calls, but if user space never umounts / closes the device things will not fully cleanup. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.Net email is sponsored by: SourceForge.net Broadband Sign-up now for SourceForge Broadband and get the fastest 6.0/768 connection for only $19.95/mo for the first 3 months! http://ads.osdn.com/?ad_id=2562alloc_id=6184op=click ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
[linux-usb-devel] Re: bug 2400
PS, I am traveling today so future comments will be delayed a bit. Alan Stern [EMAIL PROTECTED] wrote: All right, let's look at sd.c. I'll show you that _it_ doesn't obey the object lifetime rules. In sd_open we see this code (lightly edited): static int sd_open(struct inode *inode, struct file *filp) { struct gendisk *disk = inode-i_bdev-bd_disk; struct scsi_disk *sdkp = scsi_disk(disk); struct scsi_device *sdev; int retval; retval = scsi_disk_get(sdkp); if (retval) return retval; sdev = sdkp-device; As it turns out, the block layer guarantees that when sd_open runs the bd_disk pointer will be valid. It does this by following the pattern I mentioned in an earlier message -- drivers/base/map.c uses a subsystem-wide semaphore, domain_sem, to properly synchronize lookups and deletes. Next, the scsi_disk inline function returns: container_of(disk-private_data, struct scsi_disk, driver); How do you know that the scsi_disk pointed to by disk-private_data still exists? So far as I can see, the gendisk doesn't take any references to it. Correct me if I'm wrong, but there doesn't seem to be anything preventing a disconnect event from arriving after the open() call has got a valid reference to the gendisk, and succeeding in deallocating the scsi_disk before this code executes. There's only one reference between the scsi_disk and the gendisk, and it goes the wrong way: the scsi_disk owns a reference to the gendisk. But let's suppose that works okay, so sdkp is a valid pointer. Then the code calls scsi_disk_get(), which in turn calls scsi_device_get() for sdkp-device. How do you know that this doesn't point to deallocated storage? The only reference to the scsi_device is taken (in a rather convoluted way) by the gendisk, and it is dropped during del_gendisk() -- not when the gendisk is released. Hence it is entirely possible for a disconnect event to have freed the scsi_device when this code executes. There's two potential oopses for you. I don't have a full grasp of the web of interlocking references (and interlocking code) in the SCSI, gendisk, and block layers, but it seems likely that at least one of these might actually happen. The object lifetime rules require that in your disconnect() routine, you must tell all your users that your structure is going away, but you must not free the structure until your users have notified you that they won't try to use it any more. When the scsi_disk is on its way out, sd.c tells the gendisk but doesn't wait for a notification in return. When the scsi_device is on its way out the SCSI core tells sd.c, but sd.c doesn't send back its notification at the right time. As I previously stated there is no notification back from the block layer that users are complete with a structure. Currently the only method to prevent an oops looks like sd_remove would need to use lock_kernel so that scsi could ensure that a user would be through open which means the sd_remove would not take references to zero or the user has not made it far enough through do open that they have received a gendisk structure so that del_gendisk will ensure they do not call sd_open. I would like another method, but this looks like it would need to be another shared sync mechanism between scsi layer (an other blk interfaces) and block layer or a lookup method similar to kobj_lookup in scsi open routines so that a object can be unmapped atomically. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470alloc_id=3638op=click ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
[linux-usb-devel] Re: bug 2400
James Bottomley [EMAIL PROTECTED] wrote: Well, I know why this happens, but I'm not entirely clear how to fix it. The problem comes because the cdrom open and close take and release references to the SCSI generic device (as they're supposed to). However, Upper level Drivers like sr are implemented as generic device drivers in the driverfs model. When a USB unplug comes along, it calls scsi_remove_device, which eventually calls device_del(). The problem is that device_del triggers the -remove methods of all the attached drivers and the sr_remove method calls cdrom_unregister which throws away the cdrom device state, even though the actual device has active references. Yes, we reordered some of this in sd. As your comment down below indicates reordering will reduce the window but not eliminate it. Some time later, the device is closed but there's now bogus state because the sr_remove method has kfreed the struct scsi_cd which contains the struct cdrom_device_info. Now, the questions are, whose issue is this and how do we fix it? I can see that a driver needs early notification of unplugs so it can deny all access to a gone device. On the other hand, for a user land open where we still have to hold resources in the driver, we'd like the driver to have a notify when the device reference count drops to zero so we can clean up. This problem, by the way, exists in a lesser form for sd: the sd remove method will free the device for reattachment even though it might have active references. I have looked at the sd issue off and on due to the previous open race reported by Alan Stern. While the window can be narrowed inside SCSI you need help for the calling subsystem to know when it is OK to cleanup and your routine will not be called anymore. A similar problem also showed up in the tear down the host directory entry in /proc/scsi but was only fixed up so far due to its depreciated status. http://marc.theaimsgroup.com/?t=10554517591r=1w=2 I believe as indicated above that all cross subsystem registrations need a release / put callback. This would allow the release chain to be called from block - ULD - scsi core - LLDD. Recently I have not been spending the proper time looking at this, but last look it appeared that we needed to add a release / put method call to the gendisk disk_release routine. The release function or object to do the put on would need to be set prior to the call to add_disk. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470alloc_id=3638op=click ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] Re: Unregistering interfaces
Alan Stern [EMAIL PROTECTED] wrote: You can search the lkml archives for the bugs that my kobject patch fixed, as there was some discussion there (otherwise I would have never written the patch...) I searched and found one thread that looks relevant: http://marc.theaimsgroup.com/?t=10635469474r=1w=2 In the final message Mike said that the patch did fix the problem but that a better answer would be to fix the SCSI sd driver. If that's been taken care of then there's no more need for the kobject patch, so far as SCSI is concerned. My test yesterday showed everything working okay without the patch. Mike, can you tell us the current state of the SCSI code? I did change the sd scsi code to correct the order of sysfs calls on tear down. I tried to do it all in sd / scsi core code, but there is still a small window for a open / tear down race in sd. I believe the only way to correct this is to add a release callback from the block layer. There is also another thread that I believe the usb-devel list is copied on that is a similar tear down issue. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470alloc_id=3638op=click ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
[linux-usb-devel] Re: bug 2400
James Bottomley [EMAIL PROTECTED] wrote: Recently I have not been spending the proper time looking at this, but last look it appeared that we needed to add a release / put method call to the gendisk disk_release routine. The release function or object to do the put on would need to be set prior to the call to add_disk. Actually, I can't believe this problem to be local entirely to SCSI. So, a simpler mechanism (and more globally useful) might be to have a two phase driver release in sysfs: I would agree this should be more general than SCSI, but if kobjects are being passed during the registration / setup in other subsystem interfaces than this could already be handled. - the current -remove would stay where it is (as a notify on device_del). On receving this the driver begins clean up enough to drop any internal references to the device it is holding. - then introduce a -release which is called as part of dropping the last device reference where the driver cleans up any resources the driver was keeping to service the removed but not released device. Then, we'd obviously not call unregister_cdrom and kfree the scsi_cd structure until -release time. Maybe some clarification here as I am unsure if we both think there needs to be a notification (a put call) from outside SCSI. We have release functions available on most objects in SCSI now. The issue is that when we register (add_disk, dev_set_drvdata, etc.) or pass a handle to another subsystem we need a reference count agreement to know when the other subsystem is done with the the object. Something like the put_device(parent) used in scsi_host_dev_release. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470alloc_id=3638op=click ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
[linux-usb-devel] Re: bug 2400
James Bottomley [EMAIL PROTECTED] wrote: On Fri, 2004-04-02 at 11:45, Mike Anderson wrote: Maybe some clarification here as I am unsure if we both think there needs to be a notification (a put call) from outside SCSI. We have release functions available on most objects in SCSI now. The issue is that when we register (add_disk, dev_set_drvdata, etc.) or pass a handle to another subsystem we need a reference count agreement to know when the other subsystem is done with the the object. Something like the put_device(parent) used in scsi_host_dev_release. Actually, no, that's not the issue here, if I understand you. The reference counting model on the sdev-sdev_gendev seems to be working correctly because sr.c takes a reference to the sdev_gendev on open and drops it on close. The problem is that ULDs are implemented as struct device_drivers and as such, their -remove method gets called *not* on last put of sdev_gendev but on device_del (when there are still active references). The remove can do as much or as little as the implementor wishes, but I believe there is still a under lying issue here (see comment below). sr.c frees the cdinfo structure on -remove, but still has its own reference to sdev_gendev (because the device is still open). On final close, the generic cdrom code tries to use cdinfo to close the device and references a kfree'd structure. Really what sr.c wants to be doing is freeing the cdinfo structure on last put, not on device_del. Where does the last put come from? How do you close the open race or know when the final put_disk has been called? SCSI cannot do this alone as we have created and registered an object in another subsystem (alloc_disk and add_disk) and we have no indication when that objects ref count has reached zero. Or As I previous stated in the thread below I have the gendisk / block layer locking mis-understood and there is something that SCSI can do. For reference you can look at this thread sent by Alan about a sd race. http://marc.theaimsgroup.com/?t=10759118583r=1w=2 While freeing in sr could be rearranged more like what sd does there is still the issue of a cross subsystem put to know that the ULDs open function will not be called again. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470alloc_id=3638op=click ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
[linux-usb-devel] Re: bug 2400
James Bottomley [EMAIL PROTECTED] wrote: = drivers/scsi/sr.c 1.103 vs edited = --- 1.103/drivers/scsi/sr.c Fri Apr 2 11:30:44 2004 +++ edited/drivers/scsi/sr.c Fri Apr 2 17:29:06 2004 @@ -424,8 +424,19 @@ static int sr_block_release(struct inode *inode, struct file *file) { + int ret; struct scsi_cd *cd = scsi_cd(inode-i_bdev-bd_disk); - return cdrom_release(cd-cdi, file); + struct scsi_device *sdev = cd-device; + ret = cdrom_release(cd-cdi, file); + if(ret) + return ret; It looks like cdrom_release always returns 0? Is there some other patch outstanding that changes this. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470alloc_id=3638op=click ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
[linux-usb-devel] Re: Problem with an USB Stick and kernel 2.6.1
Alan Stern [EMAIL PROTECTED] wrote: Seeing no sense key, usb-storage returns 0 indicating the command succeeded. But scsi_finish_command() sees that valid sense data is present and sets the driver_byte to DRIVER_SENSE. SCSI error : 1 0 0 0 return code = 0x800 Current sda: sense key No Sense This causes scsi_io_completion() to call scsi_end_request() with uptodate = 0 and results in this error: end_request: I/O error, dev sda, sector 32 FAT: unable to read boot sector How should we handle this? Should we zero out sense_buffer[0] when auto-sense shows sense key = NO_SENSE? I would think the upper level driver should handle this. We decode RECOVERED_ERROR in sd. sd could just handle NO_SENSE the same with / without the print_sense. I noticed that st and osst under some circumstances look at the NO_SENSE case (the tape people will need to comment as this looks like a EOM workaround) so it would seem incorrect for the LLDD as a general policy to say it received no sense when it did. -andmike -- Michael Anderson [EMAIL PROTECTED] --- The SF.Net email is sponsored by EclipseCon 2004 Premiere Conference on Open Tools Development and Integration See the breadth of Eclipse activity. February 3-5 in Anaheim, CA. http://www.eclipsecon.org/osdn ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
[linux-usb-devel] Re: Problem with an USB Stick and kernel 2.6.1
This matches what I thought should be added to sd, but I did not hear an answer from James / others if this is the way we should handle it. Alan Stern [EMAIL PROTECTED] wrote: On Fri, 16 Jan 2004, Mike Anderson wrote: Alan Stern [EMAIL PROTECTED] wrote: Seeing no sense key, usb-storage returns 0 indicating the command succeeded. But scsi_finish_command() sees that valid sense data is present and sets the driver_byte to DRIVER_SENSE. SCSI error : 1 0 0 0 return code = 0x800 Current sda: sense key No Sense This causes scsi_io_completion() to call scsi_end_request() with uptodate = 0 and results in this error: end_request: I/O error, dev sda, sector 32 FAT: unable to read boot sector How should we handle this? Should we zero out sense_buffer[0] when auto-sense shows sense key = NO_SENSE? I would think the upper level driver should handle this. We decode RECOVERED_ERROR in sd. sd could just handle NO_SENSE the same with / without the print_sense. I noticed that st and osst under some circumstances look at the NO_SENSE case (the tape people will need to comment as this looks like a EOM workaround) so it would seem incorrect for the LLDD as a general policy to say it received no sense when it did. Florian and Alexander, does this patch fix your problem? Mike, how does it look to you? Alan Stern = sd.c 1.59 vs edited = --- 1.59/drivers/scsi/sd.cFri Oct 24 14:53:37 2003 +++ edited/drivers/scsi/sd.c Fri Jan 16 15:49:48 2004 @@ -730,6 +730,14 @@ * hard error. */ print_sense(sd, SCpnt); + /* FALLS THROUGH */ + + case NO_SENSE: + /* + * The low-level driver got the sense data but + * everything was all right. Don't treat this + * an an error. + */ SCpnt-result = 0; SCpnt-sense_buffer[0] = 0x0; good_sectors = this_count; -andmike -- Michael Anderson [EMAIL PROTECTED] --- The SF.Net email is sponsored by EclipseCon 2004 Premiere Conference on Open Tools Development and Integration See the breadth of Eclipse activity. February 3-5 in Anaheim, CA. http://www.eclipsecon.org/osdn ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] Re: [PATCH] oops in sd_shutdown
Greg KH [EMAIL PROTECTED] wrote: On Tue, Aug 12, 2003 at 07:53:53AM +0100, Christoph Hellwig wrote: On Tue, Aug 12, 2003 at 12:28:44AM +0200, Andries Brouwer wrote: I see an Oops in the SCSI code, caused by the fact that sdkp is NULL in sd_shutdown. How can that be?, you will ask - dev-driver_data was set in sd_probe. But in my case sd_probe never finished. An insmod usb-storage hangs forever, or at least for more than six hours, giving ample opportunity to observe this race between sd_probe and sd_shutdown. (Of course sd_probe hangs in sd_revalidate disk.) Well, this same problem could show upb in any other driver. Could you instead send a patch to Pat that the driver model never calls the shutdown method for a driver that hasn't finished -probe? I think it already will not do that due to taking the bus-subsys.rwsem before calling either probe() or remove(). Is the shutdown being called directly? The shutdown call is protected by a different rwsem. Depending on the call graph setting dev-driver on return of probe may provide a solution. I have not looked at all probe routines to understand if this would cause any bad side effects. Andries, Can you send the oops output? -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.Net email sponsored by: Free pre-built ASP.NET sites including Data Reports, E-commerce, Portals, and Forums are available now. Download today and enter to win an XBOX or Visual Studio .NET. http://aspnet.click-url.com/go/psa0013ave/direct;at.aspnet_072303_01/01 ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
[linux-usb-devel] Re: [PATCH / RFC] scsi_set_host_offline
Matthew Dharm [EMAIL PROTECTED] wrote: Mike -- As far as I can tell, this patch was never accepted. Any idea when this will be resolved? usb-storage is still waiting to implement proper device unplug... I reposted the patch with refreshed offsets for the newer kernel version. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.net email is sponsored by: The Definitive IT and Networking Event. Be There! NetWorld+Interop Las Vegas 2003 -- Register today! http://ads.sourceforge.net/cgi-bin/redirect.pl?keyn0001en ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] Re: [PATCH] USB changes for 2.5.58
Matthew, Sorry for the delay in replying (non coding activities are consuming to many hours). Matthew Dharm [EMAIL PROTECTED] wrote: Okay, I see Linus has now accepted this into his tree. It should propagate to the USB development trees soon. One question: What else is needed? We set the device offline, error/complete all pending commands, and the need to (somehow) make certain we're in a good state for calling scsi_remove_host(). How do we make that final guarantee? There was talk that scsi_set_device_offline() would take care of that for us by waking up the error handler... there seems to be code to do that Yes, The scsi_set_device_offline will wake up the error handler to abort outstanding commands. There was talk of using the release() function from the SCSI template to actually release resources So, what's the plan? There still are a few things on the to do list, but should not effect the LLDD interface (at least this is the goal). - scsi_request_fn needs a fix for device offline that will handle all request types. - scsi_remove_host needs to call template release at the correct time (ref counting ??). - need fix for offline hotplug event. - Should do_hotplug be exported or should device states be added / fixed ?? Cleanups - Change scsi_remove_host for int to void function. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.net email is sponsored by: Scholarships for Techies! Can't afford IT training? All 2003 ictp students receive scholarships. Get hands-on training in Microsoft, Cisco, Sun, Linux/UNIX, and more. www.ictp.com/training/sourceforge.asp ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] Re: [PATCH] USB changes for 2.5.58
Matthew Dharm [EMAIL PROTECTED] wrote: Right... but I removed the release() function because that was marked (in the documentation) as only for the old-style drivers. So I'll need to re-introduce it -- but it looks like all it has to do is free some memory. Does that sound about right? Yes it was previously removed, but IIRC this is the direction discussed on this thread. For a idle device the release functionality could be done in the context of the scsi_remove_host call, but for a busy device we need to have this call to clean up later. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.net email is sponsored by: Scholarships for Techies! Can't afford IT training? All 2003 ictp students receive scholarships. Get hands-on training in Microsoft, Cisco, Sun, Linux/UNIX, and more. www.ictp.com/training/sourceforge.asp ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] Re: [PATCH] USB changes for 2.5.58
Matthew Dharm [[EMAIL PROTECTED]] wrote: Any updates on this? I saw some patches, but they don't seem to be in my tree (the usb tree, which is synced from Linus' tree). People are starting to reports OOPSes to me because of this being missing Matt The scsi_set_device_offline interface is part of the last patch (scsi error) I sent to linux-scsi. I updated my patch post some comments from the list, but I am working on issue with the patch before I resend. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This sf.net email is sponsored by:ThinkGeek Welcome to geek heaven. http://thinkgeek.com/sf ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] Re: [PATCH] USB changes for 2.5.58
Sorry Matthew I got side tracked on some issues for the last week. The scsi_set_device_offline(); function has not been added to any of James linux-scsi trees. You could add a ifndef in your code until we get the interface in the tree. Would scsi_set_device_offline() do more than sdev-online = 0? It depends on the state of the device at the calling of the function. I currently have scsi_set_device_offline() trying to do the following: 1.) set device offline and mark host in_recovery. 2.) mark all outstanding commands to be canceled and wake up error handler. 3.) flush the request queue. 4.) Once device is really offline send hotplug event. (2) needs some changes in the error handler which are needed in other cases. This the large part of the change, but it is not directly part of your request. (3) need a cleaner way to flush request specials off the queue . It would also be nice if there was a method to stop the incoming side of the request queue. (4) need export of do_hotplug interface or a method to generate a call to it for an offline event. I am working on these changes pretty much in order. Doug suggested a scsi_schedule_host_removal(), but I thought we could just change scsi_remove_host() to handle this task unless there is a side effect that all callers would not want???. -andmike Matthew Dharm [[EMAIL PROTECTED]] wrote: Willem Riede [EMAIL PROTECTED] suggested to me that I simply set sdev-online = 0 for scsi_set_device_offline() Any reason that isn't good enough? On Sun, Feb 02, 2003 at 10:13:17AM -0800, Matthew Dharm wrote: So, was any of this ever implemented? As far as I can tell, the required changes were: (o) addition of scsi_schedule_host_removal() (possibly optional) (o) implementation of scsi_set_device_offline() (possibly optional) (o) change the behavior of the 'hotplug initialization model' to call my release function Matt then call the set_scsi_device_offline(), then send back any outstanding commands via scsi_done(), then possibly call the scsi_schedule_host_removal() if Mike adds that function. Then return. Don't to anything else. Take all the remaining code you would normally run at this point and put it into a function in your source called usb_release() and in your Scsi_Host_Template struct that you pass to the scsi layer, init the release pointer with the address of your usb_release() routine. That way, the scsi layer can do what it's best at, taking care of the clean up details we've been talking about, and when it's all done, it will call your usb_release() routine with a single argument of the host struct you are wanting released. At that point, you can do all the freeing you would have done in that khubd loop (at least I think that's the context you are doing the freeing from now) and know for a fact that not only are you freeing everything, but so is the scsi mid layer. I think this will solve all the issues you've had, because this *won't* leak, it won't block your other actions, and it lets the scsi subsystem clean up properly. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.NET email is sponsored by: SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See! http://www.vasoftware.com ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] Re: [PATCH] USB changes for 2.5.58
Doug, I started writing the interface you put forth in your email. I am currently debugging it in UML so I can generate the error conditions in a control manner. I still have some stuff to look at in the error handler with it running in this mode as it previously expected no one else to be possibly doing operations on the host. This could be the case if other LLDD's use this interface and have another device that happens to timeout an IO post a device being set offline. A clarification question below. Doug Ledford [[EMAIL PROTECTED]] wrote: So, what you should be doing in order to be both a nice scsi host that plays well with the generic mechanism we have in place is when you get this removal event, you should be free'ing all the state you needed about the usb bus and such and taking this usb device off line or whatever you do. Then let the scsi mid layer clean up at it's leisure. You don't need to worry about it because the only thing you will have left is to wait for the scsi subsys to call you when it's time to delete things. You don't even have to keep device references around because we pass those in to your deletion routines anyway. I want to be able to inform the SCSI mid-layer, which will then inform higher layers, that the device is gone. scsi_set_device_offline() as we've been discussing. I assumed that the hotplug event would only come from this function if no commands where outstanding. If there where commands outstanding the event would not be generated until the error handler gained ownership of all the commands. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.NET email is sponsored by: SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See! http://www.vasoftware.com ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] Re: [PATCH] USB changes for 2.5.58
Alan Stern [[EMAIL PROTECTED]] wrote: On Fri, 24 Jan 2003, Luben Tuikov wrote: A LLDD should and must *not* call scsi_unregister_host(). This brakes all hierarchy. What I probably meant is the detect()/release() pair; release() itself normally calls scsi_unregister(host); the point is that it got nudged from ``above'', i.e. SCSI Core. How can a LLDD be certain that it can safely call scsi_unregister_host() whenever it wishes? As Doug pointed out this leads to problems. Apparently it can't. I don't mean to say that this was the right thing to do; I just meant that this is what Matt's currently-proposed patch does. Personally, I'm not very familiar with the details of the SCSI subsystem, and I don't know what preconditions are required for calling the various API's. Furhtermore, are we talking about scsi_unregister_host() or scsi_unregister(host)? The former does drivers and the latter does hosts. This would mean that my original statement was nevertheless correct, how can a LLDD decide to unload itself safely? I did indeed type it wrong. The code first calls scsi_remove_host(host) and then it calls scsi_unregister(host). I probably should have looked at Matt's patch closer, sorry. If a LLDD is going to be using scsi_add_host and scsi_remove_host the driver should not use scsi_register_host / scsi_unregister_host. If the driver is updated to the sysfs driver model then: 1.) The drivers probe routine should call into the scsi mid with. scsi_register(...); scsi_add_host(...); 2.) The drivers remove routine should call into the scsi mid with. scsi_remove_host(...); scsi_unregister(...); (scsi_remove_host is part of this current discussion). The event that calls probe / remove is the device_register / device_unregister of the the adapter device. The LLDD's device_initcall / module_exit routines will call driver_register and driver_unregister to cause device to driver binding. Which will cause probe / remove to be called. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.NET email is sponsored by: SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See! http://www.vasoftware.com ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] Re: [PATCH] USB changes for 2.5.58
Luben Tuikov [[EMAIL PROTECTED]] wrote: Mike Anderson wrote: Doug, I started writing the interface you put forth in your email. Do you mind clarifying? Either it was a private email, or one posted here, in which case there was an interpretation. It was posted here at the bottom of this email http://marc.theaimsgroup.com/?l=linux-scsim=104335366403485w=2 It is a starting point. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.NET email is sponsored by: SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See! http://www.vasoftware.com ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
Re: [linux-usb-devel] Re: [PATCH] USB changes for 2.5.58
Oliver and Alan I am trying to catch up on this thread so I did not reply directly to your concerns, but I think they are covered below. Matthew Dharm [[EMAIL PROTECTED]] wrote: On Fri, Jan 17, 2003 at 11:55:36AM +0100, Oliver Neukum wrote: That is simply wrong. Reporting somebody having pulled a plug must not fail. What are you supposed to do with an error here? There must be a way for a LLD to report that reliably. If the answer is, take that lock, call that function, error all pending requests, release that lock and call that function, it's OK. But it must work in all cases. I absolutely agree. The device is gone. I can't do anything about it. If the SCSI layer decides it can't let go, what am I supposed to do about it? In a separate discussion with Mike, he mentioned that you can't scsi_remove_device() unless there are no pending commands. How the hell is an LLD supposed to assure that!?!? I believe that the scsi_remove_host function the way it is currently is not the correct function. The SCSI needs to separate the device gone from freeing. There maybe some unbounded cleanup as the request_queue that is exported to the block layer is part of the scsi_device which is a child of the virtual usb SCSI adapter. The only way to reduce the unbounded time is possibly we reorganizing some sysfs tree object layouts. The minute I error a command and call scsi_done(), I can get a new one. Unless I lock out requests with scsi_block_requests(), but that comes with major warnings about needing to get unblocked. Well in the case of the device really being gone does the LLD need to be worried about being unblocked. I get the feeling from this thread that this is probably the wrong interface. The way this should work is that the LLD calls scsi_remove_device(), and that cuts off the flow of commands. The LLD can promise to error-out any pending commands in the device command queue. That is, unless scsi_block_requests() and scsi_unblock_requests() are more useful than the documentation suggests... block(), error all commands, unregister()... that would make some sense. We could call scsi_block_request() as soon as we know the unit is gone, and unregister() as soon as the queue is empty. We should really ensure that we have good separation between stopping device IO, device gone, and release resources. - SCSI seems to have the flags to stop the IO, but instead of scsi_block_requests we may want to export the setting of device online. This can be done from sysfs now, but not from the driver ( the driver does have a handle to the device, but it would be better to have an interface in case we need to do something addition operations). - Possibly add a scsi_remove_device that would always succeed and a version of scsi_remove_host that calls scsi_remove_device for all devices. Though with the recent change to SCSI remove host to allow non sysfs device registration I do not believe we could ensure devices would be cleaned up. - SCSI would need to support ref counting so that resources are not removed to soon. -andmike -- Michael Anderson [EMAIL PROTECTED] --- This SF.NET email is sponsored by: Thawte.com - A 128-bit supercerts will allow you to extend the highest allowed 128 bit encryption to all your clients even if they use browsers that are limited to 40 bit encryption. Get a guide here:http://ads.sourceforge.net/cgi-bin/redirect.pl?thaw0030en ___ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel