Re: [PATCH] scsi device recovery

2007-12-13 Thread James Bottomley

On Wed, 2007-12-12 at 18:54 +0100, Bernd Schubert wrote:
 [Hmm, resending since mail after more than 30min still not on the ML, maybe 
 the attachment was too large? I have uploaded the log to 
 http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/scsi/kern.log.1]
 
 On Wednesday 12 December 2007 16:59:36 James Bottomley wrote:
  On Wed, 2007-12-12 at 15:36 +0100, Bernd Schubert wrote:
   On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
    On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
     below is a patch introducing device recovery, trying to prevent i/o
     errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
   
    Why doesn't the regular scsi_eh do what you need?
  
   First of all, it is presently simply not called when the two errors above
   do happen. This could be changed, of course.
 
  Erm, I think you'll find the error handler does activate on
  DID_SOFT_ERROR.  It causes a retry via the eh.  DID_NO_CONNECT is an
 
 Dec  7 23:48:45 beo-96 kernel: [94605.297924] sd 2:0:5:0: [sdd] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 Dec  7 23:48:45 beo-96 kernel: [94605.297932] end_request: I/O error, dev sdd, sector 7706802052
 Dec  7 23:48:45 beo-96 kernel: [94605.297937] raid5:md5: read error not correctable (sector 871932472 on sdd3).

This is some type of ioc internal error.  What we do on DID_SOFT_ERROR
is retry for the usual number of times up to the timeout limit.
Unfortunately, the retries are fixed at SD_MAX_RETRIES in sd.c.  Without
diagnosing what's going wrong in the fusion, it's impossible to say if
this is reasonable, but your fusion is signalling ioc errors (firmware
errors).
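
As a rough illustration of the policy described above (a user-space sketch, not the kernel code): a DID_SOFT_ERROR completion is retried until a fixed retry budget is used up, while DID_NO_CONNECT fails immediately. The names classify(), enum disposition and MAX_RETRIES are made up for this example, the value 5 is only an assumed stand-in for SD_MAX_RETRIES, and the timeout limit is left out.

```c
/* Host byte values; these match the kernel's SCSI definitions. */
#define DID_OK         0x00
#define DID_NO_CONNECT 0x01
#define DID_SOFT_ERROR 0x0b

/* Assumed value for illustration; the real limit is SD_MAX_RETRIES in sd.c. */
#define MAX_RETRIES 5

enum disposition { DISP_DONE, DISP_RETRY, DISP_FAIL };

/* Hypothetical helper: decide what to do with a completed command. */
static enum disposition classify(unsigned int result, int retries_so_far)
{
	/* The host byte lives in bits 16..23 of the result word. */
	unsigned int host = (result >> 16) & 0xff;

	switch (host) {
	case DID_OK:
		return DISP_DONE;
	case DID_SOFT_ERROR:
		/* Retry until the fixed budget is exhausted, then give up. */
		return retries_so_far < MAX_RETRIES ? DISP_RETRY : DISP_FAIL;
	case DID_NO_CONNECT:
		/* Target went away: fail at once so hotplug is not delayed. */
	default:
		return DISP_FAIL;
	}
}
```

With these assumptions, classify(0x000b0000, 0) asks for a retry, and the same result with retries_so_far equal to MAX_RETRIES is failed, which is the point where the I/O error in the log above would surface.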

 Full log attached.
 
  immediate error with no eh intervention because it means that the target
  went away.  Handling this as a retryable error isn't an option because
  it will interfere with hotplug.
 
 Then we need a sysfs flag one can set to manually enable eh for these devices
 on DID_NO_CONNECT. 

No, because that will seriously damage a lot of other systems.

The DID_NO_CONNECT looks to be a genuine reselection issue caused by a
device out of spec on the bus.  The SPI standard says a device should
respond in 250ms, which is what most HBAs take as the default selection
timeout.  I'd say for the device you have, you need to increase this.
Unfortunately doing this for the fusion is some type of mode page
setting, I think, but I don't have the doc in front of me.  I'd be
amenable to putting the selection timeout as a parameter in the spi
transport class, since others might find it valuable occasionally to
control.

 
   Secondly, I think scsi_eh is in most cases doing too much. We are
   fighting with flaky Infortrend boxes here, and scsi_eh sometimes manages
   to crash their scsi channels. In most cases it is sufficient to stall any
   io to the device and then to resume.
 
  But that's basically the default behaviour of the error handler (stall
  then resume).
 
   For most scsi devices one probably doesn't need a suspend time or it can
   be very small, this still needs to become configurable via sysfs.
 
  You mean a wait time beyond what the error handler currently does
  (basically it waits for the quiesce, begins error handling and then
  sends a test unit ready when it finishes before restarting).
 
 The deh just waits on the first error and then only does a DV. For 
 these Infortrend devices, that's mostly sufficient.

   Thirdly, scsi_eh doesn't give up: in most cases, when the scsi channel of
   an Infortrend box crashed, it tried forever to recover.
   Improving this is still on my todo list.
 
  Could you send traces for this.  I thought the error handler had been
  fixed over the last few years always to terminate.  If there's a case
  where it doesn't, this needs fixing.
 
 I'm attaching the syslog, this is 2.6.22 + additional printks, dump_stack()'s
 and msleep()'s.
 At 03:59:36 the system finally went into wait_for_completion(), similar
 to the "everything in wait_for_completion, what is my system doing?" thread.

This looks like a genuine bug.  I missed the thread, since my email
system went off line while I was on holiday for two weeks.  The symptoms
look to be lost commands, but I can't see why from the traces.  There's
a known bug where we can hang in domain validation because of a resource
starvation issue, but I know of none where everything hangs just after
error recovery completes.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] scsi device recovery

2007-12-12 Thread Bernd Schubert
Hi,

below is a patch introducing device recovery, trying to prevent i/o errors 
when a DID_NO_CONNECT or SOFT_ERROR does happen.

The patch still needs quite some work:

1.) I still haven't figured out the best place to run 

sdev->deh.ehandler = kthread_run(scsi_device_error_handler, ...)

2.) As I see it, it's not a good idea to call spi_schedule_dv_device() from 
scsi_error.c, since spi_schedule_dv_device() is in scsi_transport_spi.c, 
which seems to be separated from the core scsi-layer.
So what is another way to initiate a DV from scsi_error.c?

3.) Maybe related to 2), for now I'm calling spi_schedule_dv_device(), but 
this is not always doing what I want.

[  406.785104] sd 5:0:2:0: deh: scheduling domain validation
[  408.422530]  target5:0:2: Beginning Domain Validation
[  408.466620]  target5:0:2: Domain Validation skipping write tests
[  408.472771]  target5:0:2: Ending Domain Validation

Hmm, this is somehow related to sdev->inquiry_len, but isn't it the task of 
spi_schedule_dv_device() and its subfunctions to handle that properly?

Any comments, hints and help are appreciated.


Signed-off-by: Bernd Schubert [EMAIL PROTECTED]

Index: linux-2.6.22/drivers/scsi/scsi_error.c
===
--- linux-2.6.22.orig/drivers/scsi/scsi_error.c 2007-12-12 12:26:20.0 +0100
+++ linux-2.6.22/drivers/scsi/scsi_error.c  2007-12-12 13:08:40.0 +0100
@@ -33,6 +33,7 @@
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_ioctl.h>
+#include <scsi/scsi_transport_spi.h>
 
 #include "scsi_priv.h"
 #include "scsi_logging.h"
@@ -1589,6 +1590,153 @@ int scsi_error_handler(void *data)
 	return 0;
 }
 
+/**
+ * scsi_unjam_sdev - try to recover a failed scsi-device
+ * @sdev: scsi device we are recovering
+ */
+static int scsi_unjam_sdev(struct scsi_device *sdev)
+{
+	int rtn;
+
+	sdev_printk(KERN_CRIT, sdev, "resetting device\n");
+	rtn = scsi_reset_provider(sdev, SCSI_TRY_RESET_DEVICE);
+	scsi_report_device_reset(sdev->host, sdev->channel, sdev->id);
+	if (rtn == SUCCESS)
+		sdev_printk(KERN_INFO, sdev, "device reset succeeded, "
+			    "set device to running state\n");
+	return SUCCESS;
+}
+
+/**
+ * scsi_schedule_deh - schedule EH for SCSI device
+ * @sdev:  SCSI device to invoke error handling on.
+ *
+ **/
+void scsi_schedule_deh(struct scsi_device *sdev)
+{
+#if 0
+	if (sdev->deh.error) {
+		/* blocking the device does not work! another recovery was
+		 * scheduled, though no i/o should go to the device now! */
+		sdev_printk(KERN_CRIT, sdev,
+			    "device already in recovery, but another recovery "
+			    "was scheduled\n");
+		dump_stack();
+	}
+#endif
+	if (sdev->deh.error)
+		return; /* recovery already running */
+
+	if (sdev->deh.last_recovery
+	    && jiffies < sdev->deh.last_recovery + 300 * HZ)
+		sdev->deh.count++;
+	else
+		sdev->deh.count = 0;
+
+	if (sdev->deh.count >= 10) {
+		sdev_printk(KERN_WARNING, sdev,
+			    "too many errors within time limit, setting "
+			    "device offline\n");
+		scsi_device_set_state(sdev, SDEV_OFFLINE);
+		return;
+	} else if (sdev->deh.count >= 5) {
+		sdev_printk(KERN_INFO, sdev, "Initiating host recovery\n");
+		scsi_schedule_eh(sdev->host); /* host recovery */
+		return;
+	} else
+		sdev->deh.count++;
+
+	sdev_printk(KERN_INFO, sdev, "n-error: %d\n", sdev->deh.count);
+
+	if (!scsi_internal_device_block(sdev)) {
+		sdev->deh.error = 1;
+		if (sdev->deh.ehandler)
+			wake_up_process(sdev->deh.ehandler);
+		else
+			sdev_printk(KERN_WARNING, sdev,
+				    "deh handler missing\n");
+	} else {
+		sdev_printk(KERN_WARNING, sdev,
+			    "Couldn't block device, calling host recovery\n");
+		scsi_schedule_eh(sdev->host);
+	}
+}
+EXPORT_SYMBOL_GPL(scsi_schedule_deh);
+
+/**
+ * scsi_device_error_handler - SCSI error handler thread
+ * @data:  Device for which we are running.
+ *
+ * Notes:
+ *	This is the main device error handling loop.  This is run as a kernel thread
+ *	for every SCSI device and handles all device error handling activity.
+ **/
+int scsi_device_error_handler(void *data)
+{
+	struct scsi_device *sdev = data;
+	int sleeptime = 30;
+
+	current->flags |= PF_NOFREEZE;
+
+	/*
+	 * We use TASK_INTERRUPTIBLE so that the thread is not
+	 * counted against the load average as a running process.
+	 * We never actually get interrupted because kthread_run
+	 * disables signal delivery for the created thread.
+  

Re: [PATCH] scsi device recovery

2007-12-12 Thread Matthew Wilcox
On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
 below is a patch introducing device recovery, trying to prevent i/o errors 
 when a DID_NO_CONNECT or SOFT_ERROR does happen.

Why doesn't the regular scsi_eh do what you need?

-- 
Intel are signing my paycheques ... these opinions are still mine
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.


Re: [PATCH] scsi device recovery

2007-12-12 Thread Bernd Schubert
On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
 On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
  below is a patch introducing device recovery, trying to prevent i/o
  errors when a DID_NO_CONNECT or SOFT_ERROR does happen.

 Why doesn't the regular scsi_eh do what you need?

First of all, it is presently simply not called when the two errors above do 
happen. This could be changed, of course.

Secondly, I think scsi_eh is in most cases doing too much. We are fighting 
with flaky Infortrend boxes here, and scsi_eh sometimes manages to crash 
their scsi channels. In most cases it is sufficient to stall any io to the 
device and then to resume.
For most scsi devices one probably doesn't need a suspend time or it can be 
very small, this still needs to become configurable via sysfs.

Thirdly, scsi_eh doesn't give up: in most cases, when the scsi channel of an 
Infortrend box crashed, it tried forever to recover.
Improving this is still on my todo list.
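
To make the intended "give up eventually" behaviour concrete, here is a small user-space sketch of the escalation that scsi_schedule_deh() in the posted patch aims for. The 300-second window and the thresholds 5 and 10 are taken from the patch; everything else is an assumption for illustration: the names deh_state, deh_action and deh_next_action() are hypothetical, time is in plain seconds instead of jiffies, each error is counted exactly once, and last_recovery is updated here although the patch presumably does that in the handler thread.

```c
#include <stdbool.h>

#define WINDOW_SECS   300	/* from the patch: 300 * HZ */
#define HOST_EH_LIMIT   5	/* from the patch: escalate to host recovery */
#define OFFLINE_LIMIT  10	/* from the patch: set the device offline */

enum deh_action { DEH_RECOVER_DEVICE, DEH_RECOVER_HOST, DEH_SET_OFFLINE };

/* Hypothetical per-device bookkeeping. */
struct deh_state {
	unsigned long last_recovery;	/* seconds; 0 means "never" */
	unsigned int count;		/* errors seen inside the window */
};

/* Decide how hard to react to a new error reported at time 'now'. */
static enum deh_action deh_next_action(struct deh_state *st, unsigned long now)
{
	/* Signed difference keeps the comparison safe across wraparound. */
	bool recent = st->last_recovery != 0 &&
		      (long)(now - st->last_recovery) < WINDOW_SECS;

	/* Errors outside the window start a fresh count. */
	st->count = recent ? st->count + 1 : 1;
	st->last_recovery = now;

	if (st->count >= OFFLINE_LIMIT)
		return DEH_SET_OFFLINE;
	if (st->count >= HOST_EH_LIMIT)
		return DEH_RECOVER_HOST;
	return DEH_RECOVER_DEVICE;
}
```

Under these assumptions the first four errors in a burst get device-level recovery, errors five to nine escalate to host recovery, and the tenth takes the device offline; a quiet period of at least WINDOW_SECS resets the count.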


Thanks,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH


Re: [PATCH] scsi device recovery

2007-12-12 Thread Bernd Schubert
[Hmm, resending since mail after more than 30min still not on the ML, maybe 
the attachment was too large? I have uploaded the log to 
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/scsi/kern.log.1]

On Wednesday 12 December 2007 16:59:36 James Bottomley wrote:
 On Wed, 2007-12-12 at 15:36 +0100, Bernd Schubert wrote:
  On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
   On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
    below is a patch introducing device recovery, trying to prevent i/o
    errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
  
   Why doesn't the regular scsi_eh do what you need?
 
  First of all, it is presently simply not called when the two errors above
  do happen. This could be changed, of course.

 Erm, I think you'll find the error handler does activate on
 DID_SOFT_ERROR.  It causes a retry via the eh.  DID_NO_CONNECT is an

Dec  7 23:48:45 beo-96 kernel: [94605.297924] sd 2:0:5:0: [sdd] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
Dec  7 23:48:45 beo-96 kernel: [94605.297932] end_request: I/O error, dev sdd, sector 7706802052
Dec  7 23:48:45 beo-96 kernel: [94605.297937] raid5:md5: read error not correctable (sector 871932472 on sdd3).
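
For readers decoding the "Result:" line, a minimal sketch of how the 32-bit SCSI result word splits into the bytes printed there. The bit positions follow the kernel's host_byte() and driver_byte() macros; the struct and function names are made up for this example, and the sample value 0x000b0000 is inferred from the decoded log line (DID_SOFT_ERROR is 0x0b, DRIVER_OK is 0), not copied from the log.

```c
#include <stdint.h>

/* Hypothetical container for the four bytes of a SCSI result word. */
struct scsi_result_bytes {
	uint8_t status;	/* bits 0..7:   SCSI status from the target */
	uint8_t msg;	/* bits 8..15:  message byte */
	uint8_t host;	/* bits 16..23: host (DID_*) code set by the driver */
	uint8_t driver;	/* bits 24..31: driver byte */
};

/* Split a result word into its four bytes. */
static struct scsi_result_bytes split_result(uint32_t result)
{
	struct scsi_result_bytes b = {
		.status = result & 0xff,
		.msg    = (result >> 8) & 0xff,
		.host   = (result >> 16) & 0xff,
		.driver = (result >> 24) & 0xff,
	};
	return b;
}
```

Calling split_result(0x000b0000) yields host 0x0b with the other three bytes zero, which corresponds to hostbyte=DID_SOFT_ERROR and driverbyte=DRIVER_OK.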

Full log attached.

 immediate error with no eh intervention because it means that the target
 went away.  Handling this as a retryable error isn't an option because
 it will interfere with hotplug.

Then we need a sysfs flag one can set to manually enable eh for these devices
on DID_NO_CONNECT. 


  Secondly, I think scsi_eh is in most cases doing too much. We are
  fighting with flaky Infortrend boxes here, and scsi_eh sometimes manages
  to crash their scsi channels. In most cases it is sufficient to stall any
  io to the device and then to resume.

 But that's basically the default behaviour of the error handler (stall
 then resume).

  For most scsi devices one probably doesn't need a suspend time or it can
  be very small, this still needs to become configurable via sysfs.

 You mean a wait time beyond what the error handler currently does
 (basically it waits for the quiesce, begins error handling and then
 sends a test unit ready when it finishes before restarting).

The deh just waits on the first error and then only does a DV. For 
these Infortrend devices, that's mostly sufficient.


  Thirdly, scsi_eh doesn't give up: in most cases, when the scsi channel of
  an Infortrend box crashed, it tried forever to recover.
  Improving this is still on my todo list.

 Could you send traces for this.  I thought the error handler had been
 fixed over the last few years always to terminate.  If there's a case
 where it doesn't, this needs fixing.

I'm attaching the syslog, this is 2.6.22 + additional printks, dump_stack()'s
and msleep()'s.
At 03:59:36 the system finally went into wait_for_completion(), similar
to the "everything in wait_for_completion, what is my system doing?" thread.


Thanks,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH