Re: [PATCH] scsi device recovery
On Wed, 2007-12-12 at 18:54 +0100, Bernd Schubert wrote:
> [Hmm, resending since mail after more than 30min still not on the ML,
> maybe the attachment was too large? I have uploaded the log to
> http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/scsi/kern.log.1]
>
> On Wednesday 12 December 2007 16:59:36 James Bottomley wrote:
> > On Wed, 2007-12-12 at 15:36 +0100, Bernd Schubert wrote:
> > > On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> > > > On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > > > > below is a patch introducing device recovery, trying to prevent
> > > > > i/o errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
> > > >
> > > > Why doesn't the regular scsi_eh do what you need?
> > >
> > > First of all, it is presently simply not called when the two errors
> > > above do happen. This could be changed, of course.
> >
> > Erm, I think you'll find the error handler does activate on
> > DID_SOFT_ERROR. It causes a retry via the eh. DID_NO_CONNECT is an
>
> Dec 7 23:48:45 beo-96 kernel: [94605.297924] sd 2:0:5:0: [sdd] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
> Dec 7 23:48:45 beo-96 kernel: [94605.297932] end_request: I/O error, dev sdd, sector 7706802052
> Dec 7 23:48:45 beo-96 kernel: [94605.297937] raid5:md5: read error not correctable (sector 871932472 on sdd3).

This is some type of ioc internal error. What we do on DID_SOFT_ERROR
is retry for the usual number of times up to the timeout limit.
Unfortunately, the retries are fixed at SD_MAX_RETRIES in sd.c.
Without diagnosing what's going wrong in the fusion, it's impossible to
say if this is reasonable, but your fusion is signalling ioc errors
(firmware errors).

> Full log attached.
>
> > immediate error with no eh intervention because it means that the
> > target went away. Handling this as a retryable error isn't an option
> > because it will interfere with hotplug.
>
> Then we need a sysfs flag one can set to manually enable eh for these
> devices on DID_NO_CONNECT.

No, because that will seriously damage a lot of other systems.

The DID_NO_CONNECT looks to be a genuine reselection issue caused by a
device out of spec on the bus. The SPI standard says a device should
respond in 250ms, which is what most HBAs take as the default selection
timeout. I'd say for the device you have, you need to increase this.
Unfortunately doing this for the fusion is some type of mode page
setting, I think, but I don't have the doc in front of me. I'd be
amenable to putting the selection timeout as a parameter in the spi
transport class, since others might find it valuable occasionally to
control.

> > > Secondly, I think scsi_eh is in most cases doing too much. We are
> > > fighting with flaky Infortrend boxes here, and scsi_eh sometimes
> > > manages to crash their scsi channels. In most cases it is
> > > sufficient to stall any io to the device and then to resume.
> >
> > But that's basically the default behaviour of the error handler
> > (stall then resume).
> >
> > > For most scsi devices one probably doesn't need a suspend time or
> > > it can be very small, this still needs to become configurable via
> > > sysfs.
> >
> > You mean a wait time beyond what the error handler currently does
> > (basically it waits for the quiesce, begins error handling and then
> > sends a test unit ready when it finishes before restarting).
>
> deh just waits on the first error and then only does a DV. For these
> Infortrend devices, that's mostly sufficient.
>
> > > Thirdly, scsi_eh doesn't give up, in most cases, when the scsi
> > > channel of an Infortrend box crashed, it tried forever to recover.
> > > To improve this is still on my todo list.
> >
> > Could you send traces for this. I thought the error handler had been
> > fixed over the last few years always to terminate. If there's a case
> > where it doesn't, this needs fixing.
>
> I'm attaching the syslog, this is 2.6.22 + additional printks,
> dump_stack()'s and msleep()'s. At 03:59:36 the system finally went
> into wait_for_completion(), similar to the "everything in
> wait_for_completion, what is my system doing?" thread.

This looks like a genuine bug.

I missed the thread, since my email system went off line while I was on
holiday for two weeks. The symptoms look to be lost commands, but I
can't see why from the traces. There's a known bug where we can hang in
domain validation because of a resource starvation issue, but I know of
none where everything hangs just after error recovery completes.

James

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
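The different treatment of the two host bytes discussed in this message can be modelled in a few lines of userspace C. This is only a sketch and not kernel code: the `disposition` enum, the `decide()` function and its `retries`/`allowed` parameters are invented for illustration, and only the host byte names and values are taken from the kernel headers. It assumes, as the message describes, that DID_SOFT_ERROR is retried until the retry budget is used up and that DID_NO_CONNECT fails immediately.

```c
/* Host byte values as defined in the kernel's SCSI headers. */
enum host_byte {
	DID_OK         = 0x00,
	DID_NO_CONNECT = 0x01,
	DID_SOFT_ERROR = 0x0b,
};

/* Hypothetical outcome names, used by this sketch only. */
enum disposition {
	DISP_COMPLETE,	/* command finished without a host error */
	DISP_RETRY,	/* requeue the command and try again */
	DISP_FAIL,	/* report an I/O error to the upper layer */
};

/*
 * Decide what happens to a command, given its host byte, the number of
 * retries already used and the retry budget (an assumed parameter).
 */
static enum disposition decide(enum host_byte hb, int retries, int allowed)
{
	switch (hb) {
	case DID_OK:
		return DISP_COMPLETE;
	case DID_NO_CONNECT:
		/* Target is treated as gone: fail at once, never retry. */
		return DISP_FAIL;
	case DID_SOFT_ERROR:
		/* Retry until the budget is exhausted, then give up. */
		return retries < allowed ? DISP_RETRY : DISP_FAIL;
	}
	return DISP_FAIL;
}
```

With a hypothetical budget of 5, `decide(DID_SOFT_ERROR, 4, 5)` asks for one more retry, while `decide(DID_NO_CONNECT, 0, 5)` fails straight away. That difference is why a device that is slow to answer selection surfaces as an I/O error rather than as a retried command.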
[PATCH] scsi device recovery
Hi,

below is a patch introducing device recovery, trying to prevent i/o
errors when a DID_NO_CONNECT or SOFT_ERROR does happen.

The patch still needs quite some work:

1.) I still didn't figure out what is the best place to run
    sdev->deh.ehandler = kthread_run(scsi_device_error_handler, ...)

2.) As I see it, it's not a good idea to run spi_schedule_dv_device() in
    scsi_error.c, since spi_schedule_dv_device() is in
    scsi_transport_spi.c, which seems to be separated from the core
    scsi-layer. So what is another way to initiate a DV in scsi_error.c?

3.) Maybe related to 2), for now I'm calling spi_schedule_dv_device(),
    but this is not always doing what I want.

[ 406.785104] sd 5:0:2:0: deh: scheduling domain validation
[ 408.422530] target5:0:2: Beginning Domain Validation
[ 408.466620] target5:0:2: Domain Validation skipping write tests
[ 408.472771] target5:0:2: Ending Domain Validation

Hmm, somehow related to sdev->inquiry_len, but isn't it the task of
spi_schedule_dv_device() and subfunctions to do that properly?

Any comments, hints and help are appreciated.

Signed-off-by: Bernd Schubert [EMAIL PROTECTED]

Index: linux-2.6.22/drivers/scsi/scsi_error.c
===
--- linux-2.6.22.orig/drivers/scsi/scsi_error.c	2007-12-12 12:26:20.0 +0100
+++ linux-2.6.22/drivers/scsi/scsi_error.c	2007-12-12 13:08:40.0 +0100
@@ -33,6 +33,7 @@
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_ioctl.h>
+#include <scsi/scsi_transport_spi.h>

 #include "scsi_priv.h"
 #include "scsi_logging.h"
@@ -1589,6 +1590,153 @@ int scsi_error_handler(void *data)
 	return 0;
 }

+/**
+ * scsi_unjam_sdev - try to recover a failed scsi-device
+ * @sdev:	scsi device we are recovering
+ */
+static int scsi_unjam_sdev(struct scsi_device *sdev)
+{
+	int rtn;
+
+	sdev_printk(KERN_CRIT, sdev, "resetting device\n");
+	rtn = scsi_reset_provider(sdev, SCSI_TRY_RESET_DEVICE);
+	scsi_report_device_reset(sdev->host, sdev->channel, sdev->id);
+	if (rtn == SUCCESS)
+		sdev_printk(KERN_INFO, sdev, "device reset succeeded, "
+			    "set device to running state\n");
+	return SUCCESS;
+}
+
+/**
+ * scsi_schedule_deh - schedule EH for SCSI device
+ * @sdev:	SCSI device to invoke error handling on.
+ *
+ **/
+void scsi_schedule_deh(struct scsi_device *sdev)
+{
+#if 0
+	if (sdev->deh.error) {
+		/* blocking the device does not work! another recovery was
+		 * scheduled, though no i/o should go to the device now!
+		 */
+		sdev_printk(KERN_CRIT, sdev,
+			    "device already in recovery, but another recovery "
+			    "was scheduled\n");
+		dump_stack();
+	}
+#endif
+	if (sdev->deh.error)
+		return; /* recovery already running */
+
+	if (sdev->deh.last_recovery
+	    && jiffies < sdev->deh.last_recovery + 300 * HZ)
+		sdev->deh.count++;
+	else
+		sdev->deh.count = 0;
+
+	if (sdev->deh.count >= 10) {
+		sdev_printk(KERN_WARNING, sdev,
+			    "too many errors within time limit, setting "
+			    "device offline\n");
+		scsi_device_set_state(sdev, SDEV_OFFLINE);
+		return;
+	} else if (sdev->deh.count >= 5) {
+		sdev_printk(KERN_INFO, sdev, "Initiating host recovery\n");
+		scsi_schedule_eh(sdev->host); /* host recovery */
+		return;
+	} else
+		sdev->deh.count++;
+
+	sdev_printk(KERN_INFO, sdev, "n-error: %d\n", sdev->deh.count);
+
+	if (!scsi_internal_device_block(sdev)) {
+		sdev->deh.error = 1;
+		if (sdev->deh.ehandler)
+			wake_up_process(sdev->deh.ehandler);
+		else
+			sdev_printk(KERN_WARNING, sdev,
+				    "deh handler missing\n");
+	} else {
+		sdev_printk(KERN_WARNING, sdev,
+			    "Couldn't block device, calling host recovery\n");
+		scsi_schedule_eh(sdev->host);
+	}
+}
+EXPORT_SYMBOL_GPL(scsi_schedule_deh);
+
+/**
+ * scsi_device_error_handler - SCSI error handler thread
+ * @data:	Device for which we are running.
+ *
+ * Notes:
+ *    This is the main device error handling loop.  This is run as a kernel thread
+ *    for every SCSI device and handles all device error handling activity.
+ **/
+int scsi_device_error_handler(void *data)
+{
+	struct scsi_device *sdev = data;
+	int sleeptime = 30;
+
+	current->flags |= PF_NOFREEZE;
+
+	/*
+	 * We use TASK_INTERRUPTIBLE so that the thread is not
+	 * counted against the load average as a running process.
+	 * We never actually get interrupted because kthread_run
+	 * disables signal delivery for the created thread.
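
The escalation policy in scsi_schedule_deh() can be tried out in isolation with a small userspace model. This is a simplified sketch, not the posted code: `struct deh_state`, `deh_on_error()` and the `deh_action` names are hypothetical, time is a plain number of seconds instead of jiffies, and each error is counted exactly once, whereas the patch increments the counter in two places. Only the 300 second window and the thresholds of 5 and 10 are taken from the patch.

```c
/* Window and thresholds taken from the patch; all names are hypothetical. */
#define DEH_WINDOW_SECS		300	/* errors closer together than this add up */
#define DEH_HOST_LIMIT		5	/* from this count on, use host recovery */
#define DEH_OFFLINE_LIMIT	10	/* from this count on, set the device offline */

enum deh_action { DEH_DEVICE_RECOVERY, DEH_HOST_RECOVERY, DEH_OFFLINE };

struct deh_state {
	long last_recovery;	/* time of the previous error, 0 if none yet */
	int count;		/* errors seen inside the current window */
};

/* Record one error at time 'now' (seconds) and pick the recovery level. */
static enum deh_action deh_on_error(struct deh_state *st, long now)
{
	if (st->last_recovery && now < st->last_recovery + DEH_WINDOW_SECS)
		st->count++;		/* still inside the window */
	else
		st->count = 1;		/* first error, or the window expired */
	st->last_recovery = now;

	if (st->count >= DEH_OFFLINE_LIMIT)
		return DEH_OFFLINE;
	if (st->count >= DEH_HOST_LIMIT)
		return DEH_HOST_RECOVERY;
	return DEH_DEVICE_RECOVERY;
}
```

In this model the first four errors inside a window get the light device recovery, the fifth to ninth fall back to host recovery, and the tenth takes the device offline. A gap of 300 seconds or more between two errors starts the count again.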
Re: [PATCH] scsi device recovery
On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> below is a patch introducing device recovery, trying to prevent i/o
> errors when a DID_NO_CONNECT or SOFT_ERROR does happen.

Why doesn't the regular scsi_eh do what you need?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
Re: [PATCH] scsi device recovery
On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > below is a patch introducing device recovery, trying to prevent i/o
> > errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
>
> Why doesn't the regular scsi_eh do what you need?

First of all, it is presently simply not called when the two errors
above do happen. This could be changed, of course.

Secondly, I think scsi_eh is in most cases doing too much. We are
fighting with flaky Infortrend boxes here, and scsi_eh sometimes manages
to crash their scsi channels. In most cases it is sufficient to stall
any io to the device and then to resume. For most scsi devices one
probably doesn't need a suspend time or it can be very small, this still
needs to become configurable via sysfs.

Thirdly, scsi_eh doesn't give up, in most cases, when the scsi channel
of an Infortrend box crashed, it tried forever to recover. To improve
this is still on my todo list.

Thanks,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
Re: [PATCH] scsi device recovery
[Hmm, resending since mail after more than 30min still not on the ML,
maybe the attachment was too large? I have uploaded the log to
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/scsi/kern.log.1]

On Wednesday 12 December 2007 16:59:36 James Bottomley wrote:
> On Wed, 2007-12-12 at 15:36 +0100, Bernd Schubert wrote:
> > On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> > > On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > > > below is a patch introducing device recovery, trying to prevent
> > > > i/o errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
> > >
> > > Why doesn't the regular scsi_eh do what you need?
> >
> > First of all, it is presently simply not called when the two errors
> > above do happen. This could be changed, of course.
>
> Erm, I think you'll find the error handler does activate on
> DID_SOFT_ERROR. It causes a retry via the eh. DID_NO_CONNECT is an

Dec 7 23:48:45 beo-96 kernel: [94605.297924] sd 2:0:5:0: [sdd] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
Dec 7 23:48:45 beo-96 kernel: [94605.297932] end_request: I/O error, dev sdd, sector 7706802052
Dec 7 23:48:45 beo-96 kernel: [94605.297937] raid5:md5: read error not correctable (sector 871932472 on sdd3).

Full log attached.

> immediate error with no eh intervention because it means that the
> target went away. Handling this as a retryable error isn't an option
> because it will interfere with hotplug.

Then we need a sysfs flag one can set to manually enable eh for these
devices on DID_NO_CONNECT.

> > Secondly, I think scsi_eh is in most cases doing too much. We are
> > fighting with flaky Infortrend boxes here, and scsi_eh sometimes
> > manages to crash their scsi channels. In most cases it is sufficient
> > to stall any io to the device and then to resume.
>
> But that's basically the default behaviour of the error handler (stall
> then resume).
>
> > For most scsi devices one probably doesn't need a suspend time or it
> > can be very small, this still needs to become configurable via
> > sysfs.
>
> You mean a wait time beyond what the error handler currently does
> (basically it waits for the quiesce, begins error handling and then
> sends a test unit ready when it finishes before restarting).

deh just waits on the first error and then only does a DV. For these
Infortrend devices, that's mostly sufficient.

> > Thirdly, scsi_eh doesn't give up, in most cases, when the scsi
> > channel of an Infortrend box crashed, it tried forever to recover.
> > To improve this is still on my todo list.
>
> Could you send traces for this. I thought the error handler had been
> fixed over the last few years always to terminate. If there's a case
> where it doesn't, this needs fixing.

I'm attaching the syslog, this is 2.6.22 + additional printks,
dump_stack()'s and msleep()'s. At 03:59:36 the system finally went into
wait_for_completion(), similar to the "everything in
wait_for_completion, what is my system doing?" thread.

Thanks,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH