Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-26 Thread James Bottomley
On Sat, 2005-03-26 at 09:27 +0200, Kai Makisara wrote:
> I fully agree that doing done() correctly _is_ a problem, especially when 
> the SCSI subsystem evolves and the high-level driver writers do not follow 
> the development closely enough.
> 
> One solution to these problems would be to let the drivers still use 
> scsi_do_req() and their own done() function, but create two 
> (three) helpers:
> - one to be called at the beginning of done(); it would do what needs to 
>   be done here but lets the driver to do some special things of its own if
>   necessary
> - one to be called to wait for the request to finish
> (- one to do scsi_ro_req() and the things necessary before these)

Yes.  The drivers that use it just need visiting with a big hammer.
However, our character ULDs (st and sg) use it because they try to
simulate fire and forget in the async write path (That's the only time
you actually don't wait on completion for scsi_do_req).

This comes about because the mid-layer is block oriented, so you can't
use any of the read/write machinery.  We could fix this by having a
generic character tap to a block queue for use in cases like SCSI where
the underlying driver uses block queues even if the actual device isn't
a block device.

Essentially, the character tap would simply submit a stream of reads and
writes through the block queue.  Then we could modify st and sg to use
an identical framework to the other ULDs ... you get a setup API and a
returning command API which are called for every I/O and the block layer
gets to handle the async/not-async pieces.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-25 Thread Kai Makisara
On Fri, 25 Mar 2005, James Bottomley wrote:

> On Fri, 2005-03-25 at 14:38 +0900, Tejun Heo wrote:
> >  We have users of scsi_do_req() other than scsi_wait_req() and they
> > use different done() functions to do different things.  I've checked
> > other done functions and none uses contents inside the passed
> > scsi_cmnd, so using a dummy command should be okay with them.  Am I
> > missing something here?
> 
> Well ... the other users are supposed to be going away.  They're
> actually all coded wrongly in some way or other ... perhaps I should
> speed up the process.
> 
I have seen you mention this several times now and I am getting more and 
more worried. The reason is that scsi_wait_req() is a synchronous 
interface and it does not allow a driver to do this:

- send a request
- do other useful things/let the user do useful work
- wait for completion before starting another request

I fully agree that doing done() correctly _is_ a problem, especially when 
the SCSI subsystem evolves and the high-level driver writers do not follow 
the development closely enough.

One solution to these problems would be to let the drivers still use 
scsi_do_req() and their own done() function, but create two 
(three) helpers:
- one to be called at the beginning of done(); it would do what needs to 
  be done here but lets the driver to do some special things of its own if
  necessary
- one to be called to wait for the request to finish
(- one to do scsi_ro_req() and the things necessary before these)

Having these helpers would isolate the user of the SCSI subsystem from the 
internals. scsi_wait_req() should call these functions and no additional 
maintenance would be needed for this additional asynchronous interface.

The current drivers may not do any work in done() that could not be done 
later but there is one patch pending where this happens: the st 
performance statistics patch needs to get the time stamp when the SCSI 
command is processed.

-- 
Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-25 Thread James Bottomley
On Sat, 2005-03-26 at 06:43 +0900, Tejun Heo wrote:
>  1. Allocate scsi_request and request (two are linked)

This can't be done because the scsi_cmnd's are allocated specially (slab
with reserve pool).

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-25 Thread Tejun Heo

 Hello, James.

James Bottomley wrote:
> On Fri, 2005-03-25 at 14:38 +0900, Tejun Heo wrote:
> 
>> We have users of scsi_do_req() other than scsi_wait_req() and they
>>use different done() functions to do different things.  I've checked
>>other done functions and none uses contents inside the passed
>>scsi_cmnd, so using a dummy command should be okay with them.  Am I
>>missing something here?
> 
> 
> Well ... the other users are supposed to be going away.  They're
> actually all coded wrongly in some way or other ... perhaps I should
> speed up the process.

 Sounds great.  :-)

>> Oh, and I would really appreciate if you can fill me in / give a
>>pointer about the scsi_request/scsi_cmnd distinction.
> 
> The block layer speaks in terms of requests and the scsi layers in terms
> of commands.  The scsi_request_fn() actually associates a request with a
> command.  However, since SCSI uses the block layer for queueing, all the
> internal scsi command submit paths have to use requests.  This is what a
> scsi_request is.  The reason for the special casing is that we can't use
> the normal REQ_CMD or REQ_BLOCK_PC paths because they need ULD
> initialisation and back end processing.

 What I meant was we could just use scsi_cmnd instead of scsi_request
for commands.  Currently, we do the following for special commands.

 1. Allocate scsi_request and request (two are linked)
 2. Initialize scsi_request as needed
 3. queue the request
 4. the request is dispatched
 5. scsi_cmnd is initialized from scsi_request
 6. scsi_cmnd is executed
 7. result code and sense copied back to scsi_request
 8. request is completed

 Instead, we can

 1. Allocate scsi_cmnd and request (two are linked)
 2. Initialize scsi_cmnd as needed
 3. queue the request
 4. the request is dispatched
 5. scsi_cmnd is executed
 6. request is completed

 As the latter seemed more straight-forward to me, I was wondering if
there were reasons that I wasn't aware of.

 Thanks.

-- 
tejun

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-25 Thread James Bottomley
On Fri, 2005-03-25 at 14:38 +0900, Tejun Heo wrote:
>  We have users of scsi_do_req() other than scsi_wait_req() and they
> use different done() functions to do different things.  I've checked
> other done functions and none uses contents inside the passed
> scsi_cmnd, so using a dummy command should be okay with them.  Am I
> missing something here?

Well ... the other users are supposed to be going away.  They're
actually all coded wrongly in some way or other ... perhaps I should
speed up the process.

>  Oh, and I would really appreciate if you can fill me in / give a
> pointer about the scsi_request/scsi_cmnd distinction.

The block layer speaks in terms of requests and the scsi layers in terms
of commands.  The scsi_request_fn() actually associates a request with a
command.  However, since SCSI uses the block layer for queueing, all the
internal scsi command submit paths have to use requests.  This is what a
scsi_request is.  The reason for the special casing is that we can't use
the normal REQ_CMD or REQ_BLOCK_PC paths because they need ULD
initialisation and back end processing.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-24 Thread Tejun Heo
 Hi,

On Thu, Mar 24, 2005 at 11:02:45PM -0600, James Bottomley wrote:
> On Fri, 2005-03-25 at 12:15 +0900, Tejun Heo wrote:
> >  I think I found the cause.  Special requests submitted using
> > scsi_do_req() never initializes ->end_io().  Normally, SCSI midlayer
> > terminates special requests inside the SCSI midlayer without passing
> > through the blkdev layer.  However, if a device is going away or taken
> > offline, blkdev layer gets to terminate special requests and, as
> > ->end_io() is never set-up, nothing happens and the completion gets
> > lost.
> 
> The analysis is exactly correct, well done!  I think your patch is a bit
> overly complex, though.  We can achieve the same effect simply by
> executing the completion without changing the rq_status like the patch
> below.
> 
> Jens,  To go back to the original problem, except when I hit the usb-
> storage error handling oops, I can plug and unplug to my hearts content
> and everything works.

 We have users of scsi_do_req() other than scsi_wait_req() and they
use different done() functions to do different things.  I've checked
other done functions and none uses contents inside the passed
scsi_cmnd, so using a dummy command should be okay with them.  Am I
missing something here?

 Oh, and I would really appreciate if you can fill me in / give a
pointer about the scsi_request/scsi_cmnd distinction.

 Thanks a lot.

-- 
tejun

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-24 Thread James Bottomley
On Fri, 2005-03-25 at 12:15 +0900, Tejun Heo wrote:
>  I think I found the cause.  Special requests submitted using
> scsi_do_req() never initializes ->end_io().  Normally, SCSI midlayer
> terminates special requests inside the SCSI midlayer without passing
> through the blkdev layer.  However, if a device is going away or taken
> offline, blkdev layer gets to terminate special requests and, as
> ->end_io() is never set-up, nothing happens and the completion gets
> lost.

The analysis is exactly correct, well done!  I think your patch is a bit
overly complex, though.  We can achieve the same effect simply by
executing the completion without changing the rq_status like the patch
below.

Jens,  To go back to the original problem, except when I hit the usb-
storage error handling oops, I can plug and unplug to my hearts content
and everything works.

James

= drivers/scsi/scsi_lib.c 1.152 vs edited =
--- 1.152/drivers/scsi/scsi_lib.c   2005-03-18 05:33:09 -06:00
+++ edited/drivers/scsi/scsi_lib.c  2005-03-24 22:59:18 -06:00
@@ -252,6 +252,16 @@
complete(req->waiting);
 }
 
+/* This is the end routine we get to if a command was never attached
+ * to the request.  Simply complete the request without changing
+ * rq_status; this will cause a DRIVER_ERROR. */
+static void scsi_wait_req_end_io(struct request *req)
+{
+   BUG_ON(!req->waiting);
+
+   complete(req->waiting);
+}
+
 void scsi_wait_req(struct scsi_request *sreq, const void *cmnd, void *buffer,
   unsigned bufflen, int timeout, int retries)
 {
@@ -259,6 +269,7 @@

sreq->sr_request->waiting = &wait;
sreq->sr_request->rq_status = RQ_SCSI_BUSY;
+   sreq->sr_request->end_io = scsi_wait_req_end_io;
scsi_do_req(sreq, cmnd, buffer, bufflen, scsi_wait_done,
timeout, retries);
wait_for_completion(&wait);


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-24 Thread Tejun Heo
 Hello, James and Jens.

On Thu, Mar 24, 2005 at 06:45:58PM -0600, James Bottomley wrote:
> On Wed, 2005-03-23 at 16:25 +0100, Jens Axboe wrote:
> > Let me guess, it is hanging in wait_for_completion()?
> 
> Yes, I have the trace now.  Why is curious.  This is the trace of the
> failure:
> 
> Mar 24 18:40:34 localhost kernel: usb 4-2: USB disconnect, address 3
> Mar 24 18:40:34 localhost kernel: sd 0:0:0:0: CMD c25c98b0 done, completing
> Mar 24 18:40:34 localhost kernel:  0:0:0:0: cmd c25c98b0 returning
> Mar 24 18:40:34 localhost kernel:  0:0:0:0: cmd c25c98b0 going out <6>Read 
> Capacity (10) 25 00 00 00 00 00 00 00 00 00
> Mar 24 18:40:34 localhost kernel: scsi0 (0:0): rejecting I/O to dead device 
> (req c25c98b0)
> Mar 24 18:40:34 localhost kernel: usb 4-2: new full speed USB device using 
> uhci_hcd and address 4
> Mar 24 18:40:34 localhost kernel: scsi1 : SCSI emulation for USB Mass Storage 
> devices
> Mar 24 18:40:34 localhost kernel:  1:0:0:0: cmd c1a1b4b0 going out <6>Inquiry 
> 12 00 00 00 24 00
> 
> The problem occurs when the mid-layer rejects the I/O to the dead
> device.  Here it returns BLKPREP_KILL to the prep function, but after
> that we never get a completion back.

 I think I found the cause.  Special requests submitted using
scsi_do_req() never initializes ->end_io().  Normally, SCSI midlayer
terminates special requests inside the SCSI midlayer without passing
through the blkdev layer.  However, if a device is going away or taken
offline, blkdev layer gets to terminate special requests and, as
->end_io() is never set-up, nothing happens and the completion gets
lost.

 The following patch implements scsi_do_req_endio() and sets up
->end_io() and ->end_io_data before sending out special commands.
It's a quick fix & hacky.  I think the proper fix might be one of

 * Don't return BLKPREP_KILL in the prep_fn and always terminate
   special commands inside request_fn without using end_that_*
   functions.

 * I don't really know why the scsi_request/scsi_cmnd distincion
   is made (resource usage?), but, if it's a legacy thing, replace
   scsi_request with scsi_cmnd; then we can BLKPREP_KILL without using
   dummy scsi_cmnd.


 Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>


# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/03/25 11:57:25+09:00 [EMAIL PROTECTED] 
#   midlayer special command termination fix
# 
# drivers/scsi/scsi_lib.c
#   2005/03/25 11:57:17+09:00 [EMAIL PROTECTED] +30 -0
#   midlayer special command termination fix
# 
diff -Nru a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c   2005-03-25 11:59:28 +09:00
+++ b/drivers/scsi/scsi_lib.c   2005-03-25 11:59:28 +09:00
@@ -274,6 +274,33 @@
 }
 
 /*
+ * Special requests usually gets terminated inside scsi midlayer
+ * proper; however, they can be terminated by the blkdev layer when
+ * scsi_prep_fn() returns BLKPREP_KILL or scsi_request_fn() detects
+ * offline condition.  The following callback is invoked when the
+ * blkdev layer terminates a special request.  Emulate DID_NO_CONNECT.
+ */
+static void scsi_do_req_endio(struct request *rq)
+{
+   struct scsi_request *sreq = rq->end_io_data;
+   struct request_queue *q = sreq->sr_device->request_queue;
+   struct scsi_cmnd cmd;
+
+   memset(&cmd, 0, sizeof(cmd));
+   cmd.device = sreq->sr_device;
+   scsi_init_cmd_from_req(&cmd, sreq);
+   /* Our command is dummy, nullify back link. */
+   sreq->sr_command = NULL;
+
+   sreq->sr_result = cmd.result = DID_NO_CONNECT << 16;
+
+   /* The sreq->done() callback expects queue_lock to be unlocked. */
+   spin_unlock(q->queue_lock);
+   cmd.done(&cmd);
+   spin_lock(q->queue_lock);
+}
+
+/*
  * Function:scsi_do_req
  *
  * Purpose: Queue a SCSI request
@@ -326,6 +353,9 @@
 
if (sreq->sr_cmd_len == 0)
sreq->sr_cmd_len = COMMAND_SIZE(sreq->sr_cmnd[0]);
+
+   sreq->sr_request->end_io = scsi_do_req_endio;
+   sreq->sr_request->end_io_data = sreq;
 
/*
 * head injection *required* here otherwise quiesce won't work
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-24 Thread James Bottomley
On Wed, 2005-03-23 at 16:25 +0100, Jens Axboe wrote:
> Let me guess, it is hanging in wait_for_completion()?

Yes, I have the trace now.  Why is curious.  This is the trace of the
failure:

Mar 24 18:40:34 localhost kernel: usb 4-2: USB disconnect, address 3
Mar 24 18:40:34 localhost kernel: sd 0:0:0:0: CMD c25c98b0 done, completing
Mar 24 18:40:34 localhost kernel:  0:0:0:0: cmd c25c98b0 returning
Mar 24 18:40:34 localhost kernel:  0:0:0:0: cmd c25c98b0 going out <6>Read 
Capacity (10) 25 00 00 00 00 00 00 00 00 00
Mar 24 18:40:34 localhost kernel: scsi0 (0:0): rejecting I/O to dead device 
(req c25c98b0)
Mar 24 18:40:34 localhost kernel: usb 4-2: new full speed USB device using 
uhci_hcd and address 4
Mar 24 18:40:34 localhost kernel: scsi1 : SCSI emulation for USB Mass Storage 
devices
Mar 24 18:40:34 localhost kernel:  1:0:0:0: cmd c1a1b4b0 going out <6>Inquiry 
12 00 00 00 24 00

The problem occurs when the mid-layer rejects the I/O to the dead
device.  Here it returns BLKPREP_KILL to the prep function, but after
that we never get a completion back.

I'll dig around in ll_rw_blk.c to see if I can trace the problem, but
you know this code better than I do ...

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-23 Thread Jens Axboe
On Wed, Mar 23 2005, James Bottomley wrote:
> On Wed, 2005-03-23 at 08:19 +0100, Jens Axboe wrote:
> > It is not the oops I am getting. When I get a few minutes today, I'll
> > reproduce with vanilla and post it here.
> 
> Well, I have news too.  Unfortunately, the python script I posted is
> hanging in D wait.  When I tested all of this out (with a similar
> script) in the 2.6.10 timeframe, it wasn't doing this, so we have some
> other problem introduced into the stack since then, sigh.

Let me guess, it is hanging in wait_for_completion()?

> Also it means my test isn't effective, so I need to track down the
> open/close hang before I can make progress.

Makes sense, that is why you are not seeing the crash :)

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-23 Thread James Bottomley
On Wed, 2005-03-23 at 08:19 +0100, Jens Axboe wrote:
> It is not the oops I am getting. When I get a few minutes today, I'll
> reproduce with vanilla and post it here.

Well, I have news too.  Unfortunately, the python script I posted is
hanging in D wait.  When I tested all of this out (with a similar
script) in the 2.6.10 timeframe, it wasn't doing this, so we have some
other problem introduced into the stack since then, sigh.

Also it means my test isn't effective, so I need to track down the
open/close hang before I can make progress.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-23 Thread James Bottomley
On Wed, 2005-03-23 at 13:50 +0900, Tejun Heo wrote:
>   Well, but it's because scsi midlayer calls back into usb-storage eh 
> after the detaching process is complete.

Yes, but that's legitimate.  It's always been explicitly stated that we
can't ensure absolute synchronisation in the stack:  storage devices
must expect to have to reject I/O for devices they think have been
removed.

> > However, the current host code does need fixing, but the fix is to move
> > it over to a proper state model rather than the current bit twiddling we
> > do.
> 
>   I agree & am working on it.  This patch was mainly to verify Jens' oops.

Thanks!  You can look at the device state model as a guide ...
originally that was a bit mask too.

James

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-22 Thread Jens Axboe
On Wed, Mar 23 2005, Tejun Heo wrote:
>  Hi,
> 
> James Bottomley wrote:
> >On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
> >
> >>When hot-unplugging using scsi_remove_host() function (as usb
> >>does), scsi_forget_host() used to be called before
> >>scsi_host_cancel().  So, the device gets removed first without
> >>request cleanup and scsi_host_cancel() never gets to call
> >>scsi_device_cancel() on the removed devices.  This results in
> >>premature completion of hot-unplugging process with active
> >>requests left in queue, eventually leading to hang/offlined
> >>device or oops when the active command times out.
> >>
> >>This patch makes scsi_remove_host() call scsi_host_cancel()
> >>first such that the host is first transited into cancel state
> >>and all requests of all devices are killed, and then, the
> >>devices are removed.  This patch fixes the oops in eh after
> >>hot-unplugging bug.
> >
> >
> >This is actually simply reversing this patch:
> >
> >http://marc.theaimsgroup.com/?l=linux-scsi&m=109268755500248
> >
> >And all it does is give us the previous consequences back.
> >
> >The oops isn't in the eh it's in the usb-storage eh routine.
> 
>  Well, but it's because scsi midlayer calls back into usb-storage eh 
> after the detaching process is complete.
> 
> >However, the current host code does need fixing, but the fix is to move
> >it over to a proper state model rather than the current bit twiddling we
> >do.
> 
>  I agree & am working on it.  This patch was mainly to verify Jens' oops.

It is not the oops I am getting. When I get a few minutes today, I'll
reproduce with vanilla and post it here.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-22 Thread Tejun Heo
 Hi,
James Bottomley wrote:
On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
When hot-unplugging using scsi_remove_host() function (as usb
does), scsi_forget_host() used to be called before
scsi_host_cancel().  So, the device gets removed first without
request cleanup and scsi_host_cancel() never gets to call
scsi_device_cancel() on the removed devices.  This results in
premature completion of hot-unplugging process with active
requests left in queue, eventually leading to hang/offlined
device or oops when the active command times out.
This patch makes scsi_remove_host() call scsi_host_cancel()
first such that the host is first transited into cancel state
and all requests of all devices are killed, and then, the
devices are removed.  This patch fixes the oops in eh after
hot-unplugging bug.

This is actually simply reversing this patch:
http://marc.theaimsgroup.com/?l=linux-scsi&m=109268755500248
And all it does is give us the previous consequences back.
The oops isn't in the eh it's in the usb-storage eh routine.
 Well, but it's because scsi midlayer calls back into usb-storage eh 
after the detaching process is complete.

However, the current host code does need fixing, but the fix is to move
it over to a proper state model rather than the current bit twiddling we
do.
 I agree & am working on it.  This patch was mainly to verify Jens' oops.
--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-22 Thread James Bottomley
On Wed, 2005-03-23 at 11:14 +0900, Tejun Heo wrote:
>   When hot-unplugging using scsi_remove_host() function (as usb
>   does), scsi_forget_host() used to be called before
>   scsi_host_cancel().  So, the device gets removed first without
>   request cleanup and scsi_host_cancel() never gets to call
>   scsi_device_cancel() on the removed devices.  This results in
>   premature completion of hot-unplugging process with active
>   requests left in queue, eventually leading to hang/offlined
>   device or oops when the active command times out.
> 
>   This patch makes scsi_remove_host() call scsi_host_cancel()
>   first such that the host is first transited into cancel state
>   and all requests of all devices are killed, and then, the
>   devices are removed.  This patch fixes the oops in eh after
>   hot-unplugging bug.

This is actually simply reversing this patch:

http://marc.theaimsgroup.com/?l=linux-scsi&m=109268755500248

And all it does is give us the previous consequences back.

The oops isn't in the eh it's in the usb-storage eh routine.

However, the current host code does need fixing, but the fix is to move
it over to a proper state model rather than the current bit twiddling we
do.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-22 Thread Tejun Heo
08_scsi_hot_unplug_fix.patch

When hot-unplugging using scsi_remove_host() function (as usb
does), scsi_forget_host() used to be called before
scsi_host_cancel().  So, the device gets removed first without
request cleanup and scsi_host_cancel() never gets to call
scsi_device_cancel() on the removed devices.  This results in
premature completion of hot-unplugging process with active
requests left in queue, eventually leading to hang/offlined
device or oops when the active command times out.

This patch makes scsi_remove_host() call scsi_host_cancel()
first such that the host is first transited into cancel state
and all requests of all devices are killed, and then, the
devices are removed.  This patch fixes the oops in eh after
hot-unplugging bug.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>

 hosts.c |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Index: scsi-export/drivers/scsi/hosts.c
===
--- scsi-export.orig/drivers/scsi/hosts.c   2005-03-23 09:39:36.0 
+0900
+++ scsi-export/drivers/scsi/hosts.c2005-03-23 09:40:11.0 +0900
@@ -74,8 +74,8 @@ void scsi_host_cancel(struct Scsi_Host *
  **/
 void scsi_remove_host(struct Scsi_Host *shost)
 {
-   scsi_forget_host(shost);
scsi_host_cancel(shost, 0);
+   scsi_forget_host(shost);
scsi_proc_host_rm(shost);
 
set_bit(SHOST_DEL, &shost->shost_state);

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html