Re: [PATCH scsi-misc-2.6 02/07] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout

2005-04-18 Thread Tejun Heo
James Bottomley wrote:
> On Tue, 2005-04-19 at 07:31 +0900, Tejun Heo wrote:
> 
>> The original code also uses timer pending status as a signal that
>>command completed normally in scsi_eh_done() function, and the same
>>race also exists in the original code, no matter what we do, unless we
>>make timer expiration and removal of the command atomic, there will be
>>a window in which command completes normally but considered to have
>>timed out as long as we use timer pending status as tie breaker.
> 
> 
> True enough; it's a race between the driver calling scsi_done() and the
> timer expiring.  However, that's an acceptable race, since the timer
> values are usually in the order of a few seconds and the command usually
> completes in milliseconds.  the done function is called in interrupt
> context after command completion, so it's as close as possible to the
> actual command completion
> 
> 
>> The patch moves the test out of scsi_eh_done() into
>>scsi_send_eh_cmnd() and this does widen the window by delaying removal
>>of timer until after the original thread gets scheduled, but usually
>>not by much and that's how timers are done in many cases (through out
>>the kernel, timer removals are done with intervening scheduling and no
>>one considers those incorrect).  So...
> 
> 
> The time between a thread being marked ready to run and actually running
> has been measured in seconds on a heavily loaded system.  That makes the
> race window potentially as wide as the timer, which is unacceptable.

 Hmm, okay, agreed.  I'll rework it.  I think I can do it by just adding
a private struct scsi_eh_timer_arg to pass along the timer and the cmd
pointer.  I'll repost the patchset soon.

 Thanks a lot.

-- 
tejun

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH scsi-misc-2.6 02/07] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout

2005-04-18 Thread James Bottomley
On Tue, 2005-04-19 at 07:31 +0900, Tejun Heo wrote:
>  The original code also uses timer pending status as a signal that
> command completed normally in scsi_eh_done() function, and the same
> race also exists in the original code, no matter what we do, unless we
> make timer expiration and removal of the command atomic, there will be
> a window in which command completes normally but considered to have
> timed out as long as we use timer pending status as tie breaker.

True enough; it's a race between the driver calling scsi_done() and the
timer expiring.  However, that's an acceptable race, since the timer
values are usually in the order of a few seconds and the command usually
completes in milliseconds.  the done function is called in interrupt
context after command completion, so it's as close as possible to the
actual command completion

>  The patch moves the test out of scsi_eh_done() into
> scsi_send_eh_cmnd() and this does widen the window by delaying removal
> of timer until after the original thread gets scheduled, but usually
> not by much and that's how timers are done in many cases (through out
> the kernel, timer removals are done with intervening scheduling and no
> one considers those incorrect).  So...

The time between a thread being marked ready to run and actually running
has been measured in seconds on a heavily loaded system.  That makes the
race window potentially as wide as the timer, which is unacceptable.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH scsi-misc-2.6 02/07] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout

2005-04-18 Thread Tejun Heo
 Hello, James.

On Mon, Apr 18, 2005 at 10:33:21AM -0500, James Bottomley wrote:
> On Mon, 2005-04-11 at 03:45 +0900, Tejun Heo wrote:
> > scmd->eh_timeout is used to resolve the race between command
> > completion and timeout.  However, during error handling,
> > scsi_send_eh_cmnd uses scmd->eh_timeout.  This creates a race
> > condition between eh and normal completion for a request which
> > has timed out and in the process of error handling.  If the
> > request completes while scmd->eh_timeout is being used by eh,
> > eh timeout is lost and the command will be handled by both eh
> > and completion path.  This patch fixes the race by making
> > scsi_send_eh_cmnd() use its own timer.
> > 
> > Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
> 
> The logic is wrong in there.
> 
> The problem is you cannot rely on the timer being pending as a signal
> that the command completed normally.  The kernel doesn't define the
> elapsed time between the eh_action semaphore going up and the process
> waiting for it being scheduled.  If the timer fires within that
> undefined interval, you'll think the command timed out when it, in fact,
> completed normally.

 The original code also uses timer pending status as a signal that
command completed normally in scsi_eh_done() function, and the same
race also exists in the original code, no matter what we do, unless we
make timer expiration and removal of the command atomic, there will be
a window in which command completes normally but considered to have
timed out as long as we use timer pending status as tie breaker.

 The patch moves the test out of scsi_eh_done() into
scsi_send_eh_cmnd() and this does widen the window by delaying removal
of timer until after the original thread gets scheduled, but usually
not by much and that's how timers are done in many cases (through out
the kernel, timer removals are done with intervening scheduling and no
one considers those incorrect).  So...

 * If you're worried about the race itself, it was there before,
   it shouldn't cause any problem, and we really can't help it.
 * If you're worried about the widening of the window, practically,
   it wouldn't cause problem, and it's how timers are done in many
   other places.

 But if you still don't like it, I can rework it (maybe I'll need to
add a field to Scsi_Host or scsi_cmnd).  So, please let me know.

 Thanks a lot.

-- 
tejun

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH scsi-misc-2.6 02/07] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout

2005-04-18 Thread James Bottomley
On Mon, 2005-04-11 at 03:45 +0900, Tejun Heo wrote:
>   scmd->eh_timeout is used to resolve the race between command
>   completion and timeout.  However, during error handling,
>   scsi_send_eh_cmnd uses scmd->eh_timeout.  This creates a race
>   condition between eh and normal completion for a request which
>   has timed out and in the process of error handling.  If the
>   request completes while scmd->eh_timeout is being used by eh,
>   eh timeout is lost and the command will be handled by both eh
>   and completion path.  This patch fixes the race by making
>   scsi_send_eh_cmnd() use its own timer.
> 
> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>

The logic is wrong in there.

The problem is you cannot rely on the timer being pending as a signal
that the command completed normally.  The kernel doesn't define the
elapsed time between the eh_action semaphore going up and the process
waiting for it being scheduled.  If the timer fires within that
undefined interval, you'll think the command timed out when it, in fact,
completed normally.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH scsi-misc-2.6 02/07] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout

2005-04-10 Thread Tejun Heo
02_scsi_timer_eh_timer_fix.patch

scmd->eh_timeout is used to resolve the race between command
completion and timeout.  However, during error handling,
scsi_send_eh_cmnd uses scmd->eh_timeout.  This creates a race
condition between eh and normal completion for a request which
has timed out and in the process of error handling.  If the
request completes while scmd->eh_timeout is being used by eh,
eh timeout is lost and the command will be handled by both eh
and completion path.  This patch fixes the race by making
scsi_send_eh_cmnd() use its own timer.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>

 scsi_error.c |   64 ++-
 scsi_priv.h  |1 
 2 files changed, 20 insertions(+), 45 deletions(-)

Index: scsi-reqfn-export/drivers/scsi/scsi_error.c
===
--- scsi-reqfn-export.orig/drivers/scsi/scsi_error.c2005-04-11 
03:42:11.0 +0900
+++ scsi-reqfn-export/drivers/scsi/scsi_error.c 2005-04-11 03:42:11.0 
+0900
@@ -420,46 +420,12 @@ static int scsi_eh_completed_normally(st
 }
 
 /**
- * scsi_eh_times_out - timeout function for error handling.
- * @scmd:  Cmd that is timing out.
- *
- * Notes:
- *During error handling, the kernel thread will be sleeping waiting
- *for some action to complete on the device.  our only job is to
- *record that it timed out, and to wake up the thread.
- **/
-static void scsi_eh_times_out(struct scsi_cmnd *scmd)
-{
-   scsi_eh_eflags_set(scmd, SCSI_EH_REC_TIMEOUT);
-   SCSI_LOG_ERROR_RECOVERY(3, printk("%s: scmd:%p\n", __FUNCTION__,
- scmd));
-
-   if (scmd->device->host->eh_action)
-   up(scmd->device->host->eh_action);
-}
-
-/**
  * scsi_eh_done - Completion function for error handling.
  * @scmd:  Cmd that is done.
  **/
 static void scsi_eh_done(struct scsi_cmnd *scmd)
 {
-   /*
-* if the timeout handler is already running, then just set the
-* flag which says we finished late, and return.  we have no
-* way of stopping the timeout handler from running, so we must
-* always defer to it.
-*/
-   if (del_timer(&scmd->eh_timeout)) {
-   scmd->request->rq_status = RQ_SCSI_DONE;
-   scmd->owner = SCSI_OWNER_ERROR_HANDLER;
-
-   SCSI_LOG_ERROR_RECOVERY(3, printk("%s scmd: %p result: %x\n",
-  __FUNCTION__, scmd, scmd->result));
-
-   if (scmd->device->host->eh_action)
-   up(scmd->device->host->eh_action);
-   }
+   up(scmd->device->host->eh_action);
 }
 
 /**
@@ -478,6 +444,7 @@ static int scsi_send_eh_cmnd(struct scsi
 {
struct scsi_device *sdev = scmd->device;
struct Scsi_Host *shost = sdev->host;
+   struct timer_list timer;
DECLARE_MUTEX_LOCKED(sem);
unsigned long flags;
int rtn = SUCCESS;
@@ -492,7 +459,11 @@ static int scsi_send_eh_cmnd(struct scsi
scmd->cmnd[1] = (scmd->cmnd[1] & 0x1f) |
(sdev->lun << 5 & 0xe0);
 
-   scsi_add_timer(scmd, timeout, scsi_eh_times_out);
+   init_timer(&timer);
+   timer.expires = jiffies + timeout;
+   timer.function = (void (*)(unsigned long))scsi_eh_done;
+   timer.data = (unsigned long)scmd;
+   add_timer(&timer);
 
/*
 * set up the semaphore so we wait for the command to complete.
@@ -508,14 +479,19 @@ static int scsi_send_eh_cmnd(struct scsi
down(&sem);
scsi_log_completion(scmd, SUCCESS);
 
-   shost->eh_action = NULL;
-
-   /*
-* see if timeout.  if so, tell the host to forget about it.
-* in other words, we don't want a callback any more.
-*/
-   if (scsi_eh_eflags_chk(scmd, SCSI_EH_REC_TIMEOUT)) {
-   scsi_eh_eflags_clr(scmd,  SCSI_EH_REC_TIMEOUT);
+   if (del_timer(&timer)) {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   printk("scsi_eh_done scmd: %p result: %x\n",
+  scmd, scmd->result));
+   scmd->request->rq_status = RQ_SCSI_DONE;
+   scmd->owner = SCSI_OWNER_ERROR_HANDLER;
+   } else {
+   /*
+* Timed out.  Tell the host to forget about it.  In
+* other words, we don't want a callback any more.
+*/
+   SCSI_LOG_ERROR_RECOVERY(3,
+   printk("scsi_eh_times_out scmd: %p\n", scmd));
scmd->owner = SCSI_OWNER_LOWLEVEL;
 
/*
Index: scsi-reqfn-export/drivers/scsi/scsi_priv.h
===
--- scsi-reqfn-export.orig/drivers/scsi/scsi_priv.h 2005-04-11 
03:42:10.0 +0900
+++ scsi-reqfn-export/drivers/scsi/scsi_priv.h  2005-04-11 03:42:11.