Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
Our testing of this patch has not shown any issues however I think we would be more comfortable with this change if there was a callback back into the LLDs that want to be notified of the rport state change so we can sync up our internal fibre channel port structures. AFAICT the changes required to add the callback to your original patch are fairly small...something along the lines of: diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c index f3bba1a..6d3bc06 100644 --- a/drivers/scsi/scsi_transport_fc.c +++ b/drivers/scsi/scsi_transport_fc.c @@ -959,6 +959,7 @@ store_fc_rport_port_state(struct device *dev, unsigned long flags; struct fc_rport *rport = transport_class_to_rport(dev); struct Scsi_Host *shost = rport_to_shost(rport); + struct fc_internal *i = to_fc_internal(shost->transportt); spin_lock_irqsave(shost->host_lock, flags); if (!strncmp(buf, "Blocked", 7)) { @@ -968,6 +969,8 @@ store_fc_rport_port_state(struct device *dev, } spin_unlock_irqrestore(shost->host_lock, flags); fc_remote_port_delete(rport); + if (i->f->rport_state_change) + i->f->rport_state_change(rport); } else if (!strncmp(buf, "Online", 6)) { if (rport->port_state != FC_PORTSTATE_BLOCKED) { spin_unlock_irqrestore(shost->host_lock, flags); @@ -999,6 +1002,8 @@ store_fc_rport_port_state(struct device *dev, scsi_queue_work(shost, &rport->scan_work); spin_unlock_irqrestore(shost->host_lock, flags); } + if (i->f->rport_state_change) + i->f->rport_state_change(rport); } else return -EINVAL; diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h index b797e8f..71c3ec5 100644 --- a/include/scsi/scsi_transport_fc.h +++ b/include/scsi/scsi_transport_fc.h @@ -698,6 +698,12 @@ struct fc_function_template { int (*bsg_request)(struct fc_bsg_job *); int (*bsg_timeout)(struct fc_bsg_job *); + /* +* Allow low-level drivers to be notified of state change if they want +* to be notified. +*/ + void(*rport_state_change)(struct fc_rport *); + /* allocation lengths for host-specific data */ u32 dd_fcrport_size; u32 dd_fcvport_size; -- 1.7.6.1 On Fri, 5 Apr 2013, James Smart wrote: I understand your points. But you will need to contact the different LLD maintainers to ensure receiving a devloss_tmo_callbk() on an rport they had not called fc_remote_port_delete() for. I know there's work on my side to validate it's ok. First glance was ok, but.. -- james s On 4/4/2013 2:26 AM, Hannes Reinecke wrote: On 04/01/2013 11:06 PM, James Smart wrote: I think lpfc survives your rport state change as : part of the lld behavior on the callback, to clean up reference counts, is to abort all i/o that is outstanding to the rport. So the ref checking not only protects lpfc from prematurely freeing a structure (my real concern), but also just happens to abort all i/o. We got lucky. I still believe the I_T_nexus reset is the right way to solve this. Yes, but this would be an even more intrusive patch. And we would need to implement yet another callback into the LLDDs which need to be implemented there, too. But for this to make any sense we would need to revamp the scsi error handler, as the current problem is that error recovery takes too long. Adding yet another callback will make the escalation chain even longer. So yeah, in the long run I_T nexus reset is the correct way of doing things, but in the short term I would opt to make port_state writeable to simulate an I_T nexus reset. Cheers, Hannes This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
I understand your points. But you will need to contact the different LLD maintainers to ensure receiving a devloss_tmo_callbk() on an rport they had not called fc_remote_port_delete() for. I know there's work on my side to validate it's ok. First glance was ok, but.. -- james s On 4/4/2013 2:26 AM, Hannes Reinecke wrote: On 04/01/2013 11:06 PM, James Smart wrote: I think lpfc survives your rport state change as : part of the lld behavior on the callback, to clean up reference counts, is to abort all i/o that is outstanding to the rport. So the ref checking not only protects lpfc from prematurely freeing a structure (my real concern), but also just happens to abort all i/o. We got lucky. I still believe the I_T_nexus reset is the right way to solve this. Yes, but this would be an even more intrusive patch. And we would need to implement yet another callback into the LLDDs which need to be implemented there, too. But for this to make any sense we would need to revamp the scsi error handler, as the current problem is that error recovery takes too long. Adding yet another callback will make the escalation chain even longer. So yeah, in the long run I_T nexus reset is the correct way of doing things, but in the short term I would opt to make port_state writeable to simulate an I_T nexus reset. Cheers, Hannes -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 04/01/2013 11:06 PM, James Smart wrote: > > On 3/18/2013 3:09 AM, Hannes Reinecke wrote: >> On 03/15/2013 08:13 PM, Bart Van Assche wrote: >>> On 03/15/13 19:51, Mike Christie wrote: On 03/15/2013 08:41 AM, Bart Van Assche wrote: > How about using the value of scsi_cmnd.jiffies_at_alloc to finish > only > those SCSI commands in the host reset handler that exceeded a > certain > processing time ? We basically do this now. When a scsi command times out the scsi layer blocks the host from processing new commands and waits for all outstanding commands to either finish normally or timeout. When all commands have finished or timedout, then we start the scsi eh code. So, by the time we have go to the scsi eh callbacks we are in a state where all the commands being processed by the eh have exceeded a certain processing time. If you mean you want to drop the block and wait part, then I think it could speed things up to do the abort callbacks while other IO is running (as long as the driver can support it). However if the abort fails and you need to escalate to operations like resets which interfere with multiple commands, then the driver/scsi-ml does not have much choice in what it does cleanup wise. There would be no point in checking the jiffies_at_alloc. The commands that are going to be affected by the tmf or host reset operation must be returned to the scsi-ml for retries or failure upwards. >>> >>> Hello Mike, >>> >>> It seems like there is a misunderstanding. With my comment I was not >>> referring to the SCSI ML but to the SCSI LLD. LLD drivers like >>> ib_srp keep track of outstanding SCSI requests. With the SRP >>> protocol it is possible to tell the InfiniBand HCA not to deliver >>> completions for outstanding requests by closing the connection used >>> for SRP communication. Hence my suggestion to finish SCSI commands >>> that were queued longer than a certain time ago from inside the LLD >>> host reset handler. I'm not sure though whether all types of FC >>> HBA's allow something equivalent. >>> >> Well, this is not quite identical to what I've been trying to >> achieve with this patch. >> This patch is for an individual rport which has gone out to lunch. >> Sure we could down the link from the HBA, but that would terminate >> I/O to _all_ connected rports, not just the malfunctioning one. >> So that wouldn't help us here. >> >> The closest equivalent to that would be a port logout; however, as >> discussed in the I_T nexus reset thread we would need another >> callout to the LLDs here as this definitely needs LLD support >> and none of the current LLDs have it implemented. >> >> Cheers, >> >> Hannes > > I think lpfc survives your rport state change as : part of the lld > behavior on the callback, to clean up reference counts, is to abort > all i/o that is outstanding to the rport. So the ref checking not > only protects lpfc from prematurely freeing a structure (my real > concern), but also just happens to abort all i/o. We got lucky. > > I still believe the I_T_nexus reset is the right way to solve this. > Yes, but this would be an even more intrusive patch. And we would need to implement yet another callback into the LLDDs which need to be implemented there, too. But for this to make any sense we would need to revamp the scsi error handler, as the current problem is that error recovery takes too long. Adding yet another callback will make the escalation chain even longer. So yeah, in the long run I_T nexus reset is the correct way of doing things, but in the short term I would opt to make port_state writeable to simulate an I_T nexus reset. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 3/18/2013 3:09 AM, Hannes Reinecke wrote: On 03/15/2013 08:13 PM, Bart Van Assche wrote: On 03/15/13 19:51, Mike Christie wrote: On 03/15/2013 08:41 AM, Bart Van Assche wrote: How about using the value of scsi_cmnd.jiffies_at_alloc to finish only those SCSI commands in the host reset handler that exceeded a certain processing time ? We basically do this now. When a scsi command times out the scsi layer blocks the host from processing new commands and waits for all outstanding commands to either finish normally or timeout. When all commands have finished or timedout, then we start the scsi eh code. So, by the time we have go to the scsi eh callbacks we are in a state where all the commands being processed by the eh have exceeded a certain processing time. If you mean you want to drop the block and wait part, then I think it could speed things up to do the abort callbacks while other IO is running (as long as the driver can support it). However if the abort fails and you need to escalate to operations like resets which interfere with multiple commands, then the driver/scsi-ml does not have much choice in what it does cleanup wise. There would be no point in checking the jiffies_at_alloc. The commands that are going to be affected by the tmf or host reset operation must be returned to the scsi-ml for retries or failure upwards. Hello Mike, It seems like there is a misunderstanding. With my comment I was not referring to the SCSI ML but to the SCSI LLD. LLD drivers like ib_srp keep track of outstanding SCSI requests. With the SRP protocol it is possible to tell the InfiniBand HCA not to deliver completions for outstanding requests by closing the connection used for SRP communication. Hence my suggestion to finish SCSI commands that were queued longer than a certain time ago from inside the LLD host reset handler. I'm not sure though whether all types of FC HBA's allow something equivalent. Well, this is not quite identical to what I've been trying to achieve with this patch. This patch is for an individual rport which has gone out to lunch. Sure we could down the link from the HBA, but that would terminate I/O to _all_ connected rports, not just the malfunctioning one. So that wouldn't help us here. The closest equivalent to that would be a port logout; however, as discussed in the I_T nexus reset thread we would need another callout to the LLDs here as this definitely needs LLD support and none of the current LLDs have it implemented. Cheers, Hannes I think lpfc survives your rport state change as : part of the lld behavior on the callback, to clean up reference counts, is to abort all i/o that is outstanding to the rport. So the ref checking not only protects lpfc from prematurely freeing a structure (my real concern), but also just happens to abort all i/o. We got lucky. I still believe the I_T_nexus reset is the right way to solve this. -- james s -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
Hannes, I'm finally going through this thread. I'm a bit uncomfortable about making the rport state writeable. The rport state is tightly tied to the driver's discovery engine, and I'm concerned about some of the callbacks firing, such as the rport_delete(), and the driver taking an action that isn't legit as the driver wasn't the one that called the fc_remote_port_delete() call. I expect, at least in the lpfc driver, we would require a change to work with this. Although, we had to add an ugly internal reference counter mechanism so we didn't actually early-free structures, that may have saved your changing of this. -- james s On 3/15/2013 7:55 AM, Hannes Reinecke wrote: On 03/14/2013 07:09 PM, Steffen Maier wrote: Just for my understanding: So this is a bit like writing 0 into the failed attribute of a zfcp_port in sysfs (zfcp_sysfs_port_failed_store()) which forces a port reopen recovery including a sequence of fc_remote_port_delete and fc_remote_port_add? Sorta. This patch simulates the result of an RSCN where this remote port is missing. So the fast_io_fail / dev_loss_tmo mechanism is triggered (until further notice). It'll be reset with the next RSCN or by manually resetting the port state. If so, it sounds good to have this functionality for any FC LLD in common code. That's why I posted this code :-) Rationale for this patch is a weird test case with brocade switches; there you can actually disable a _target_ port. So the port isn't reachable anymore but no RSCN is send. And the LLDD is forced into error recovery which'll take _ages_ as each and every command send during error recovery will time out. (And I can think about deprecating parts of zfcp code, next we could consider the same for zfcp's forced LUN and adapter recovery or maybe this already exists as sdev's writable state attribute and the adapter recovery can be performed manually by walking all fc_rports writing their port_state.) I don't think the sdev 'state' attribute is useable here; it doesn't trigger a rediscovery. You'd want to use 'rescan' here. But yeah, that would be one of the goals. Still waiting for some response to the patch, though. Testing has been successful on our side, plus I've posted a patchset for multipath-tools utilizing this patch. So it would be good if it could be included. James? Cheers, Hannes -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 3/15/2013 8:28 AM, Bryn M. Reeves wrote: > On 03/15/2013 12:46 PM, Bart Van Assche wrote: >> The SCSI EH keeps trying until all outstanding request have been >> finished. Does lpfc_host_reset_handler() invoke scsi_done() for > > I don't think so (ends up calling lpfc_sli_cancel_iocbs() via > lpfc_hba_down_post() after shutting down the mailbox) but I've not seen the > EH escalate all the way to host reset in most of my testing - ... > The problem is that getting to this stage can take a very long time - much > longer than most cluster's node eviction timer for e.g. which is the source > of much of the complaint about this behaviour. >> outstanding requests ? If not, how about modifying >> lpfc_host_reset_handler() such that it finishes all outstanding requests >> if the remote port is not reachable ? It does call the done() function on the outstanding command IOCBs after the lpfc_reset_flush_io_context() call aborts them. The "problem" is that they are returned with ScsiResult(DID_REQUEUE, 0) which basically queues them back to the port as long as the port is still "up". Which results in the commands hanging out until their timeouts expire (if the device isn't responding). If the device does resume after the reset, in the case of a tape device it is possible corrupt the tape because the 2900's get trapped by the TUR in the eh routines depending on which commands were hung. Take write for example, the reset can result in a tape rewind, and when the write gets fired back at the device the tape is at BOT and effectively erases all data already on the tape. Whops! Also, as I stated elsewhere, in my testing its impossible to escalate beyond the flush_io_context() in the lpfc_device_reset_handler driver because it always returns true if the card firmware is responding. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 03/15/2013 08:13 PM, Bart Van Assche wrote: On 03/15/13 19:51, Mike Christie wrote: On 03/15/2013 08:41 AM, Bart Van Assche wrote: How about using the value of scsi_cmnd.jiffies_at_alloc to finish only those SCSI commands in the host reset handler that exceeded a certain processing time ? We basically do this now. When a scsi command times out the scsi layer blocks the host from processing new commands and waits for all outstanding commands to either finish normally or timeout. When all commands have finished or timedout, then we start the scsi eh code. So, by the time we have go to the scsi eh callbacks we are in a state where all the commands being processed by the eh have exceeded a certain processing time. If you mean you want to drop the block and wait part, then I think it could speed things up to do the abort callbacks while other IO is running (as long as the driver can support it). However if the abort fails and you need to escalate to operations like resets which interfere with multiple commands, then the driver/scsi-ml does not have much choice in what it does cleanup wise. There would be no point in checking the jiffies_at_alloc. The commands that are going to be affected by the tmf or host reset operation must be returned to the scsi-ml for retries or failure upwards. Hello Mike, It seems like there is a misunderstanding. With my comment I was not referring to the SCSI ML but to the SCSI LLD. LLD drivers like ib_srp keep track of outstanding SCSI requests. With the SRP protocol it is possible to tell the InfiniBand HCA not to deliver completions for outstanding requests by closing the connection used for SRP communication. Hence my suggestion to finish SCSI commands that were queued longer than a certain time ago from inside the LLD host reset handler. I'm not sure though whether all types of FC HBA's allow something equivalent. Well, this is not quite identical to what I've been trying to achieve with this patch. This patch is for an individual rport which has gone out to lunch. Sure we could down the link from the HBA, but that would terminate I/O to _all_ connected rports, not just the malfunctioning one. So that wouldn't help us here. The closest equivalent to that would be a port logout; however, as discussed in the I_T nexus reset thread we would need another callout to the LLDs here as this definitely needs LLD support and none of the current LLDs have it implemented. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 03/15/2013 02:13 PM, Bart Van Assche wrote: > On 03/15/13 19:51, Mike Christie wrote: >> On 03/15/2013 08:41 AM, Bart Van Assche wrote: >>> How about using the value of scsi_cmnd.jiffies_at_alloc to finish only >>> those SCSI commands in the host reset handler that exceeded a certain >>> processing time ? >> >> We basically do this now. When a scsi command times out the scsi layer >> blocks the host from processing new commands and waits for all >> outstanding commands to either finish normally or timeout. When all >> commands have finished or timedout, then we start the scsi eh code. So, >> by the time we have go to the scsi eh callbacks we are in a state where >> all the commands being processed by the eh have exceeded a certain >> processing time. >> >> If you mean you want to drop the block and wait part, then I think it >> could speed things up to do the abort callbacks while other IO is >> running (as long as the driver can support it). However if the abort >> fails and you need to escalate to operations like resets which interfere >> with multiple commands, then the driver/scsi-ml does not have much >> choice in what it does cleanup wise. There would be no point in checking >> the jiffies_at_alloc. The commands that are going to be affected by the >> tmf or host reset operation must be returned to the scsi-ml for retries >> or failure upwards. > > Hello Mike, > > It seems like there is a misunderstanding. With my comment I was not > referring to the SCSI ML but to the SCSI LLD. LLD drivers like ib_srp > keep track of outstanding SCSI requests. With the SRP protocol it is > possible to tell the InfiniBand HCA not to deliver completions for > outstanding requests by closing the connection used for SRP > communication. Hence my suggestion to finish SCSI commands that were > queued longer than a certain time ago from inside the LLD host reset Are you saying you would have a class/driver timer running that determines when to run this? What you describe sounds like it would help. I am just wondering when it would be run. If I understand you correctly, is iscsi doing what you describe when we escalate to session dropping and relogin? > handler. I'm not sure though whether all types of FC HBA's allow > something equivalent. > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 03/15/13 19:51, Mike Christie wrote: On 03/15/2013 08:41 AM, Bart Van Assche wrote: How about using the value of scsi_cmnd.jiffies_at_alloc to finish only those SCSI commands in the host reset handler that exceeded a certain processing time ? We basically do this now. When a scsi command times out the scsi layer blocks the host from processing new commands and waits for all outstanding commands to either finish normally or timeout. When all commands have finished or timedout, then we start the scsi eh code. So, by the time we have go to the scsi eh callbacks we are in a state where all the commands being processed by the eh have exceeded a certain processing time. If you mean you want to drop the block and wait part, then I think it could speed things up to do the abort callbacks while other IO is running (as long as the driver can support it). However if the abort fails and you need to escalate to operations like resets which interfere with multiple commands, then the driver/scsi-ml does not have much choice in what it does cleanup wise. There would be no point in checking the jiffies_at_alloc. The commands that are going to be affected by the tmf or host reset operation must be returned to the scsi-ml for retries or failure upwards. Hello Mike, It seems like there is a misunderstanding. With my comment I was not referring to the SCSI ML but to the SCSI LLD. LLD drivers like ib_srp keep track of outstanding SCSI requests. With the SRP protocol it is possible to tell the InfiniBand HCA not to deliver completions for outstanding requests by closing the connection used for SRP communication. Hence my suggestion to finish SCSI commands that were queued longer than a certain time ago from inside the LLD host reset handler. I'm not sure though whether all types of FC HBA's allow something equivalent. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 03/15/2013 08:41 AM, Bart Van Assche wrote: > On 03/15/13 14:28, Bryn M. Reeves wrote: >> On 03/15/2013 12:46 PM, Bart Van Assche wrote: >>> The SCSI EH keeps trying until all outstanding request have been >>> finished. Does lpfc_host_reset_handler() invoke scsi_done() for It does not really matter at that point for the scsi command timeout case. When blk_complete_request is called by scsi_done, it will see the command has timed out and so it will not be processed by that normal completion path. The scsi eh basically owns the completion processing of the command once it has timed out. For the cleanup when the port is not reachable you had suggested is what Hannes is proposing in the I_T nexus reset patch. The problem is making sure that if the driver/class returns SUCCESS or FAST_IO_FAIL then they do not touch the scsi cmnd struct again. >> >> I don't think so (ends up calling lpfc_sli_cancel_iocbs() via >> lpfc_hba_down_post() after shutting down the mailbox) but I've not seen >> the EH escalate all the way to host reset in most of my testing - >> usually some time after reaching the bus reset remaining IOs timeout and >> the error bubbles up to device-mapper (all the cases I'm looking at are >> devices managed by a dm-multipath target). >> >> The problem is that getting to this stage can take a very long time - >> much longer than most cluster's node eviction timer for e.g. which is >> the source of much of the complaint about this behaviour. >> >>> outstanding requests ? If not, how about modifying >>> lpfc_host_reset_handler() such that it finishes all outstanding requests >>> if the remote port is not reachable ? >> >> I'm not sure how safe that is in this situation - James mentioned in the >> I_T nexus reset thread concerns about frames that could be delayed etc. >> in the fabric if the host unilaterally abandons IOs (not sure of the >> details for lpfc at this level). > > How about using the value of scsi_cmnd.jiffies_at_alloc to finish only > those SCSI commands in the host reset handler that exceeded a certain > processing time ? > We basically do this now. When a scsi command times out the scsi layer blocks the host from processing new commands and waits for all outstanding commands to either finish normally or timeout. When all commands have finished or timedout, then we start the scsi eh code. So, by the time we have go to the scsi eh callbacks we are in a state where all the commands being processed by the eh have exceeded a certain processing time. If you mean you want to drop the block and wait part, then I think it could speed things up to do the abort callbacks while other IO is running (as long as the driver can support it). However if the abort fails and you need to escalate to operations like resets which interfere with multiple commands, then the driver/scsi-ml does not have much choice in what it does cleanup wise. There would be no point in checking the jiffies_at_alloc. The commands that are going to be affected by the tmf or host reset operation must be returned to the scsi-ml for retries or failure upwards. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 03/15/13 14:28, Bryn M. Reeves wrote: On 03/15/2013 12:46 PM, Bart Van Assche wrote: The SCSI EH keeps trying until all outstanding request have been finished. Does lpfc_host_reset_handler() invoke scsi_done() for I don't think so (ends up calling lpfc_sli_cancel_iocbs() via lpfc_hba_down_post() after shutting down the mailbox) but I've not seen the EH escalate all the way to host reset in most of my testing - usually some time after reaching the bus reset remaining IOs timeout and the error bubbles up to device-mapper (all the cases I'm looking at are devices managed by a dm-multipath target). The problem is that getting to this stage can take a very long time - much longer than most cluster's node eviction timer for e.g. which is the source of much of the complaint about this behaviour. outstanding requests ? If not, how about modifying lpfc_host_reset_handler() such that it finishes all outstanding requests if the remote port is not reachable ? I'm not sure how safe that is in this situation - James mentioned in the I_T nexus reset thread concerns about frames that could be delayed etc. in the fabric if the host unilaterally abandons IOs (not sure of the details for lpfc at this level). How about using the value of scsi_cmnd.jiffies_at_alloc to finish only those SCSI commands in the host reset handler that exceeded a certain processing time ? Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 03/15/2013 12:46 PM, Bart Van Assche wrote: The SCSI EH keeps trying until all outstanding request have been finished. Does lpfc_host_reset_handler() invoke scsi_done() for I don't think so (ends up calling lpfc_sli_cancel_iocbs() via lpfc_hba_down_post() after shutting down the mailbox) but I've not seen the EH escalate all the way to host reset in most of my testing - usually some time after reaching the bus reset remaining IOs timeout and the error bubbles up to device-mapper (all the cases I'm looking at are devices managed by a dm-multipath target). The problem is that getting to this stage can take a very long time - much longer than most cluster's node eviction timer for e.g. which is the source of much of the complaint about this behaviour. outstanding requests ? If not, how about modifying lpfc_host_reset_handler() such that it finishes all outstanding requests if the remote port is not reachable ? I'm not sure how safe that is in this situation - James mentioned in the I_T nexus reset thread concerns about frames that could be delayed etc. in the fabric if the host unilaterally abandons IOs (not sure of the details for lpfc at this level). Regards, Bryn. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 03/15/13 13:37, Bryn M. Reeves wrote: On 03/15/2013 12:24 PM, Bart Van Assche wrote: On 03/15/13 12:55, Hannes Reinecke wrote: And the LLDD is forced into error recovery which'll take _ages_ as each and every command send during error recovery will time out. Hello Hannes, I'm analyzing a related but not identical issue with SRP. It would help if you could tell with which LLDD you ran into this issue and with which values of fast_io_fail_tmo and dev_loss_tmo. Most of the cases I've seen have involved lpfc (although I don't think it's in any way exclusive to that LLDD). Even with very low fast_io_fail_timeout/dev_loss_timeout (<5/10) the eh is busy for 10m or longer before IO fails and multipath is able to react to the problem. The SCSI EH keeps trying until all outstanding request have been finished. Does lpfc_host_reset_handler() invoke scsi_done() for outstanding requests ? If not, how about modifying lpfc_host_reset_handler() such that it finishes all outstanding requests if the remote port is not reachable ? Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 03/15/2013 12:24 PM, Bart Van Assche wrote: On 03/15/13 12:55, Hannes Reinecke wrote: And the LLDD is forced into error recovery which'll take _ages_ as each and every command send during error recovery will time out. Hello Hannes, I'm analyzing a related but not identical issue with SRP. It would help if you could tell with which LLDD you ran into this issue and with which values of fast_io_fail_tmo and dev_loss_tmo. Most of the cases I've seen have involved lpfc (although I don't think it's in any way exclusive to that LLDD). Even with very low fast_io_fail_timeout/dev_loss_timeout (<5/10) the eh is busy for 10m or longer before IO fails and multipath is able to react to the problem. Regards, Bryn. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 03/15/13 12:55, Hannes Reinecke wrote: And the LLDD is forced into error recovery which'll take _ages_ as each and every command send during error recovery will time out. Hello Hannes, I'm analyzing a related but not identical issue with SRP. It would help if you could tell with which LLDD you ran into this issue and with which values of fast_io_fail_tmo and dev_loss_tmo. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 03/15/2013 11:55 AM, Hannes Reinecke wrote: Rationale for this patch is a weird test case with brocade switches; there you can actually disable a _target_ port. So the port isn't reachable anymore but no RSCN is send. I think it's more than a pure test-case; using the rscnsupr feature on the Brocades is just a handy way to trigger it. I've seen numerous cases in the last few years of fabric failures that had this characteristic - either because of hardware failures in the switches or due to bugs that caused FC traffic to be blackholed without an RSCN or other indication (beyond commands timing out). This (and the I_T nexus reset you proposed in December which is very effective at short-ciruiting the eh escalation in the same situation) both make the kernel more robust in the face of this kind of problem. Regards, Bryn. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
On 03/14/2013 07:09 PM, Steffen Maier wrote: Just for my understanding: So this is a bit like writing 0 into the failed attribute of a zfcp_port in sysfs (zfcp_sysfs_port_failed_store()) which forces a port reopen recovery including a sequence of fc_remote_port_delete and fc_remote_port_add? Sorta. This patch simulates the result of an RSCN where this remote port is missing. So the fast_io_fail / dev_loss_tmo mechanism is triggered (until further notice). It'll be reset with the next RSCN or by manually resetting the port state. If so, it sounds good to have this functionality for any FC LLD in common code. That's why I posted this code :-) Rationale for this patch is a weird test case with brocade switches; there you can actually disable a _target_ port. So the port isn't reachable anymore but no RSCN is send. And the LLDD is forced into error recovery which'll take _ages_ as each and every command send during error recovery will time out. (And I can think about deprecating parts of zfcp code, next we could consider the same for zfcp's forced LUN and adapter recovery or maybe this already exists as sdev's writable state attribute and the adapter recovery can be performed manually by walking all fc_rports writing their port_state.) I don't think the sdev 'state' attribute is useable here; it doesn't trigger a rediscovery. You'd want to use 'rescan' here. But yeah, that would be one of the goals. Still waiting for some response to the patch, though. Testing has been successful on our side, plus I've posted a patchset for multipath-tools utilizing this patch. So it would be good if it could be included. James? Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
Just for my understanding: So this is a bit like writing 0 into the failed attribute of a zfcp_port in sysfs (zfcp_sysfs_port_failed_store()) which forces a port reopen recovery including a sequence of fc_remote_port_delete and fc_remote_port_add? If so, it sounds good to have this functionality for any FC LLD in common code. (And I can think about deprecating parts of zfcp code, next we could consider the same for zfcp's forced LUN and adapter recovery or maybe this already exists as sdev's writable state attribute and the adapter recovery can be performed manually by walking all fc_rports writing their port_state.) On 01/15/2013 04:02 PM, Hannes Reinecke wrote: Multipath might detect a path down even though the LLDD hasn't registered a link down event. This patch makes the 'port_state' FC attribute writeable to enforce a link down scenario in these cases, allowing for a fast failover to the remaining ports. Cc: Chad Dupuis Cc: Andrew Vasquez Cc: James Smart Cc: James Bottomley Cc: Mike Christie Signed-off-by: Hannes Reinecke diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c index 5e1c6dd..4d28ee5 100644 --- a/drivers/scsi/scsi_transport_fc.c +++ b/drivers/scsi/scsi_transport_fc.c @@ -42,6 +42,8 @@ #include "scsi_priv.h" #include "scsi_transport_fc_internal.h" +static void fc_flush_devloss(struct Scsi_Host *shost); +static void fc_flush_work(struct Scsi_Host *shost); static int fc_queue_work(struct Scsi_Host *, struct work_struct *); static void fc_vport_sched_delete(struct work_struct *work); static int fc_vport_setup(struct Scsi_Host *shost, int channel, @@ -949,7 +951,76 @@ show_fc_rport_roles (struct device *dev, struct device_attribute *attr, static FC_DEVICE_ATTR(rport, roles, S_IRUGO, show_fc_rport_roles, NULL); -fc_private_rport_rd_enum_attr(port_state, FC_PORTSTATE_MAX_NAMELEN); +static ssize_t +store_fc_rport_port_state(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + unsigned long flags; + struct fc_rport *rport = transport_class_to_rport(dev); + struct Scsi_Host *shost = rport_to_shost(rport); + + spin_lock_irqsave(shost->host_lock, flags); + if (!strncmp(buf, "Blocked", 7)) { + if (rport->port_state != FC_PORTSTATE_ONLINE) { + spin_unlock_irqrestore(shost->host_lock, flags); + return -EBUSY; + } + spin_unlock_irqrestore(shost->host_lock, flags); + fc_remote_port_delete(rport); + } else if (!strncmp(buf, "Online", 6)) { + if (rport->port_state != FC_PORTSTATE_BLOCKED) { + spin_unlock_irqrestore(shost->host_lock, flags); + return -EBUSY; + } + /* Reset back to online for rescan */ + rport->port_state = FC_PORTSTATE_ONLINE; + spin_unlock_irqrestore(shost->host_lock, flags); + + if (!cancel_delayed_work(&rport->fail_io_work)) + fc_flush_devloss(shost); + if (!cancel_delayed_work(&rport->dev_loss_work)) + fc_flush_devloss(shost); + + spin_lock_irqsave(shost->host_lock, flags); + rport->flags &= ~(FC_RPORT_FAST_FAIL_TIMEDOUT | + FC_RPORT_DEVLOSS_PENDING | + FC_RPORT_DEVLOSS_CALLBK_DONE); + spin_unlock_irqrestore(shost->host_lock, flags); + + /* ensure any stgt delete functions are done */ + fc_flush_work(shost); + + if (rport->roles & FC_PORT_ROLE_FCP_TARGET) { + scsi_target_unblock(&rport->dev, SDEV_RUNNING); + /* initiate a scan of the target */ + spin_lock_irqsave(shost->host_lock, flags); + rport->flags |= FC_RPORT_SCAN_PENDING; + scsi_queue_work(shost, &rport->scan_work); + spin_unlock_irqrestore(shost->host_lock, flags); + } + } else + return -EINVAL; + + return count; +} + +static ssize_t +show_fc_rport_port_state (struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct fc_rport *rport = transport_class_to_rport(dev); + const char *name; + + name = get_fc_port_state_name(rport->port_state); + if (!name) + return -EINVAL; + return snprintf(buf, FC_PORTSTATE_MAX_NAMELEN, "%s\n", name); +} + +static FC_DEVICE_ATTR(rport, port_state, S_IRUGO | S_IWUSR, + show_fc_rport_port_state, store_fc_rport_port_state); + fc_private_rport_rd_attr(scsi_target_id, "%d\n", 20); /* @@ -2289,7 +2360,7 @@ fc_attach_transport(struct fc_function_template *ft) SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_name);