On Thu, 15 Sep 2005, James Bottomley wrote:
> Well, I think the symptoms are racing scsi_remove_host() calls and the
> solution is to enforce the state model on removal (as in if the host is
> already in the remove state, don't try to remove it again).
You're forgetting something: Devices can be removed at any time, no matter
what state the host is in. It's even possible for thread A to be removing
a device while thread B is removing the host. The A thread will interfere
with the B thread, because the "list_for_each_entry_safe" loops in
scsi_forget_host and __scsi_remove_target _aren't_ safe against other
threads removing devices. Also remember that the list pointers get
changed when a device is _released_, which can occur quite some time after
it is _removed_.
In short, those iterations must be carried out as in my patch.
> > I still do not understand as I asked in a previous comment why we are not
> > shutting down the eh_thread in scsi_remove_host and also why simpler state
> > model updates could not solve the problem.
>
> Well, it goes back to whether we wait for the thread or not. To shut
> the thread down, we also need to wait for it to complete.
>
> As far as the state model goes, we either need to wait for the eh thread
> before transitioning to cancel or introduce the extra states that
> reflect the parallel eh transitions.
Agreed, one or the other of these is needed. I think the parallel-states
approach offers a little more flexibility.
> > I believe I also indicated that we could enhance scsi_error to shutdown
> > faster during this state which should only be a performance improvement.
>
> Yes, we could ... patches?
During the DEL_RECOVERY state you could maybe make the error handler
faster by removing the settle delays. I'm not sure what else you can do
-- all attempts at submitting commands will quickly fail.
During CANCEL_RECOVERY you may not want to speed up the error handler.
In this state it needs to function normally, since there might be errors
during the cache-flush sequence or similar things (for non-forced
removal).
Finally, having said all that, it turns out there's still a bug in the
2.6.14-rc1 code. I feel pretty stupid about it -- you'll understand why
when you read the patch below.
Alan Stern
Signed-off-by: Alan Stern <[EMAIL PROTECTED]>
--- a/drivers/scsi/scsi_sysfs.c Mon Sep 12 23:12:09 2005
+++ b/drivers/scsi/scsi_sysfs.c Thu Sep 15 21:32:27 2005
@@ -707,9 +707,11 @@
**/
void scsi_remove_device(struct scsi_device *sdev)
{
- down(&sdev->host->scan_mutex);
+ struct Scsi_Host *shost = sdev->host;
+
+ down(&shost->scan_mutex);
__scsi_remove_device(sdev);
- up(&sdev->host->scan_mutex);
+ up(&shost->scan_mutex);
}
EXPORT_SYMBOL(scsi_remove_device);
-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
[email protected]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel