Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-04-12 Thread Chad Dupuis

Our testing of this patch has not shown any issues however I think we
would be more comfortable with this change if there was a callback back
into the LLDs that want to be notified of the rport state change so we
can sync up our internal fibre channel port structures.  AFAICT the
changes required to add the callback to your original patch are fairly
small...something along the lines of:

diff --git a/drivers/scsi/scsi_transport_fc.c
b/drivers/scsi/scsi_transport_fc.c
index f3bba1a..6d3bc06 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -959,6 +959,7 @@ store_fc_rport_port_state(struct device *dev,
   unsigned long flags;
   struct fc_rport *rport = transport_class_to_rport(dev);
   struct Scsi_Host *shost = rport_to_shost(rport);
+   struct fc_internal *i = to_fc_internal(shost-transportt);

   spin_lock_irqsave(shost-host_lock, flags);
   if (!strncmp(buf, Blocked, 7)) {
@@ -968,6 +969,8 @@ store_fc_rport_port_state(struct device *dev,
   }
   spin_unlock_irqrestore(shost-host_lock, flags);
   fc_remote_port_delete(rport);
+   if (i-f-rport_state_change)
+   i-f-rport_state_change(rport);
   } else if (!strncmp(buf, Online, 6)) {
   if (rport-port_state != FC_PORTSTATE_BLOCKED) {
   spin_unlock_irqrestore(shost-host_lock, flags);
@@ -999,6 +1002,8 @@ store_fc_rport_port_state(struct device *dev,
   scsi_queue_work(shost, rport-scan_work);
   spin_unlock_irqrestore(shost-host_lock, flags);
   }
+   if (i-f-rport_state_change)
+   i-f-rport_state_change(rport);
   } else
   return -EINVAL;

diff --git a/include/scsi/scsi_transport_fc.h
b/include/scsi/scsi_transport_fc.h
index b797e8f..71c3ec5 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -698,6 +698,12 @@ struct fc_function_template {
   int (*bsg_request)(struct fc_bsg_job *);
   int (*bsg_timeout)(struct fc_bsg_job *);

+   /*
+* Allow low-level drivers to be notified of state change if they
want
+* to be notified.
+*/
+   void(*rport_state_change)(struct fc_rport *);
+
   /* allocation lengths for host-specific data */
   u32 dd_fcrport_size;
   u32 dd_fcvport_size;
--
1.7.6.1

On Fri, 5 Apr 2013, James Smart wrote:


I understand your points. But you will need to contact the different LLD
maintainers to ensure receiving a devloss_tmo_callbk() on an rport they
had not called fc_remote_port_delete() for.  I know there's work on my
side to validate it's ok.  First glance was ok, but..

-- james s


On 4/4/2013 2:26 AM, Hannes Reinecke wrote:

On 04/01/2013 11:06 PM, James Smart wrote:

I think lpfc survives your rport state change as : part of the lld
behavior on the callback, to clean up reference counts, is to abort
all i/o that is outstanding to the rport.   So the ref checking not
only protects lpfc from prematurely freeing a structure (my real
concern), but also just happens to abort all i/o.  We got lucky.

I still believe the I_T_nexus reset is the right way to solve this.


Yes, but this would be an even more intrusive patch.
And we would need to implement yet another callback into the LLDDs
which need to be implemented there, too.

But for this to make any sense we would need to revamp the scsi
error handler, as the current problem is that error recovery takes
too long. Adding yet another callback will make the escalation chain
even longer.

So yeah, in the long run I_T nexus reset is the correct way of doing
things, but in the short term I would opt to make port_state
writeable to simulate an I_T nexus reset.

Cheers,

Hannes








This message and any attached documents contain information from QLogic 
Corporation or its wholly-owned subsidiaries that may be confidential. If you 
are not the intended recipient, you may not read, copy, distribute, or use this 
information. If you have received this transmission in error, please notify the 
sender immediately by reply e-mail and then delete this message.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-04-05 Thread James Smart
I understand your points. But you will need to contact the different LLD 
maintainers to ensure receiving a devloss_tmo_callbk() on an rport they 
had not called fc_remote_port_delete() for.  I know there's work on my 
side to validate it's ok.  First glance was ok, but..


-- james s


On 4/4/2013 2:26 AM, Hannes Reinecke wrote:

On 04/01/2013 11:06 PM, James Smart wrote:

I think lpfc survives your rport state change as : part of the lld
behavior on the callback, to clean up reference counts, is to abort
all i/o that is outstanding to the rport.   So the ref checking not
only protects lpfc from prematurely freeing a structure (my real
concern), but also just happens to abort all i/o.  We got lucky.

I still believe the I_T_nexus reset is the right way to solve this.


Yes, but this would be an even more intrusive patch.
And we would need to implement yet another callback into the LLDDs
which need to be implemented there, too.

But for this to make any sense we would need to revamp the scsi
error handler, as the current problem is that error recovery takes
too long. Adding yet another callback will make the escalation chain
even longer.

So yeah, in the long run I_T nexus reset is the correct way of doing
things, but in the short term I would opt to make port_state
writeable to simulate an I_T nexus reset.

Cheers,

Hannes


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-04-04 Thread Hannes Reinecke
On 04/01/2013 11:06 PM, James Smart wrote:
 
 On 3/18/2013 3:09 AM, Hannes Reinecke wrote:
 On 03/15/2013 08:13 PM, Bart Van Assche wrote:
 On 03/15/13 19:51, Mike Christie wrote:
 On 03/15/2013 08:41 AM, Bart Van Assche wrote:
 How about using the value of scsi_cmnd.jiffies_at_alloc to finish
 only
 those SCSI commands in the host reset handler that exceeded a
 certain
 processing time ?

 We basically do this now. When a scsi command times out the scsi
 layer
 blocks the host from processing new commands and waits for all
 outstanding commands to either finish normally or timeout. When all
 commands have finished or timedout, then we start the scsi eh
 code. So,
 by the time we have go to the scsi eh callbacks we are in a state
 where
 all the commands being processed by the eh have exceeded a certain
 processing time.

 If you mean you want to drop the block and wait part, then I
 think it
 could speed things up to do the abort callbacks while other IO is
 running (as long as the driver can support it). However if the
 abort
 fails and you need to escalate to operations like resets which
 interfere
 with multiple commands, then the driver/scsi-ml does not have much
 choice in what it does cleanup wise. There would be no point in
 checking
 the jiffies_at_alloc. The commands that are going to be affected
 by the
 tmf or host reset operation must be returned to the scsi-ml for
 retries
 or failure upwards.

 Hello Mike,

 It seems like there is a misunderstanding. With my comment I was not
 referring to the SCSI ML but to the SCSI LLD. LLD drivers like
 ib_srp keep track of outstanding SCSI requests. With the SRP
 protocol it is possible to tell the InfiniBand HCA not to deliver
 completions for outstanding requests by closing the connection used
 for SRP communication. Hence my suggestion to finish SCSI commands
 that were queued longer than a certain time ago from inside the LLD
 host reset handler. I'm not sure though whether all types of FC
 HBA's allow something equivalent.

 Well, this is not quite identical to what I've been trying to
 achieve with this patch.
 This patch is for an individual rport which has gone out to lunch.
 Sure we could down the link from the HBA, but that would terminate
 I/O to _all_ connected rports, not just the malfunctioning one.
 So that wouldn't help us here.

 The closest equivalent to that would be a port logout; however, as
 discussed in the I_T nexus reset thread we would need another
 callout to the LLDs here as this definitely needs LLD support
 and none of the current LLDs have it implemented.

 Cheers,

 Hannes
 
 I think lpfc survives your rport state change as : part of the lld
 behavior on the callback, to clean up reference counts, is to abort
 all i/o that is outstanding to the rport.   So the ref checking not
 only protects lpfc from prematurely freeing a structure (my real
 concern), but also just happens to abort all i/o.  We got lucky.
 
 I still believe the I_T_nexus reset is the right way to solve this.
 
Yes, but this would be an even more intrusive patch.
And we would need to implement yet another callback into the LLDDs
which need to be implemented there, too.

But for this to make any sense we would need to revamp the scsi
error handler, as the current problem is that error recovery takes
too long. Adding yet another callback will make the escalation chain
even longer.

So yeah, in the long run I_T nexus reset is the correct way of doing
things, but in the short term I would opt to make port_state
writeable to simulate an I_T nexus reset.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-04-01 Thread James Smart

Hannes,

I'm finally going through this thread.  I'm a bit uncomfortable about 
making the rport state writeable.  The rport state is tightly tied to 
the driver's discovery engine, and I'm concerned about some of the 
callbacks firing, such as the rport_delete(), and the driver taking an 
action that isn't legit as the driver wasn't the one that called the 
fc_remote_port_delete() call.


I expect, at least in the lpfc driver, we would require a change to work 
with this. Although, we had to add an ugly internal reference counter 
mechanism so we didn't actually early-free structures, that may have 
saved your changing of this.


-- james  s


On 3/15/2013 7:55 AM, Hannes Reinecke wrote:

On 03/14/2013 07:09 PM, Steffen Maier wrote:

Just for my understanding:
So this is a bit like writing 0 into the failed attribute of a
zfcp_port in sysfs (zfcp_sysfs_port_failed_store()) which forces a
port reopen recovery including a sequence of fc_remote_port_delete
and fc_remote_port_add?

Sorta.
This patch simulates the result of an RSCN where this remote port is 
missing. So the fast_io_fail / dev_loss_tmo mechanism is triggered 
(until further notice).
It'll be reset with the next RSCN or by manually resetting the port 
state.



If so, it sounds good to have this functionality for any FC LLD in
common code.


That's why I posted this code :-)
Rationale for this patch is a weird test case with brocade switches;
there you can actually disable a _target_ port. So the port isn't
reachable anymore but no RSCN is send.
And the LLDD is forced into error recovery which'll take _ages_ as 
each and every command send during error recovery will time out.



(And I can think about deprecating parts of zfcp code, next we could
consider the same for zfcp's forced LUN and adapter recovery or
maybe this already exists as sdev's writable state attribute and the
adapter recovery can be performed manually by walking all fc_rports
writing their port_state.)

I don't think the sdev 'state' attribute is useable here; it doesn't 
trigger a rediscovery. You'd want to use 'rescan' here.

But yeah, that would be one of the goals.

Still waiting for some response to the patch, though.
Testing has been successful on our side, plus I've posted
a patchset for multipath-tools utilizing this patch.

So it would be good if it could be included.

James?

Cheers,

Hannes


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-18 Thread Hannes Reinecke

On 03/15/2013 08:13 PM, Bart Van Assche wrote:

On 03/15/13 19:51, Mike Christie wrote:

On 03/15/2013 08:41 AM, Bart Van Assche wrote:

How about using the value of scsi_cmnd.jiffies_at_alloc to finish
only
those SCSI commands in the host reset handler that exceeded a
certain
processing time ?


We basically do this now. When a scsi command times out the scsi
layer
blocks the host from processing new commands and waits for all
outstanding commands to either finish normally or timeout. When all
commands have finished or timedout, then we start the scsi eh
code. So,
by the time we have go to the scsi eh callbacks we are in a state
where
all the commands being processed by the eh have exceeded a certain
processing time.

If you mean you want to drop the block and wait part, then I think it
could speed things up to do the abort callbacks while other IO is
running (as long as the driver can support it). However if the abort
fails and you need to escalate to operations like resets which
interfere
with multiple commands, then the driver/scsi-ml does not have much
choice in what it does cleanup wise. There would be no point in
checking
the jiffies_at_alloc. The commands that are going to be affected
by the
tmf or host reset operation must be returned to the scsi-ml for
retries
or failure upwards.


Hello Mike,

It seems like there is a misunderstanding. With my comment I was not
referring to the SCSI ML but to the SCSI LLD. LLD drivers like
ib_srp keep track of outstanding SCSI requests. With the SRP
protocol it is possible to tell the InfiniBand HCA not to deliver
completions for outstanding requests by closing the connection used
for SRP communication. Hence my suggestion to finish SCSI commands
that were queued longer than a certain time ago from inside the LLD
host reset handler. I'm not sure though whether all types of FC
HBA's allow something equivalent.


Well, this is not quite identical to what I've been trying to
achieve with this patch.
This patch is for an individual rport which has gone out to lunch.
Sure we could down the link from the HBA, but that would terminate
I/O to _all_ connected rports, not just the malfunctioning one.
So that wouldn't help us here.

The closest equivalent to that would be a port logout; however, as 
discussed in the I_T nexus reset thread we would need another 
callout to the LLDs here as this definitely needs LLD support

and none of the current LLDs have it implemented.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-18 Thread Jeremy Linton
On 3/15/2013 8:28 AM, Bryn M. Reeves wrote:
 On 03/15/2013 12:46 PM, Bart Van Assche wrote:
 The SCSI EH keeps trying until all outstanding request have been 
 finished. Does lpfc_host_reset_handler() invoke scsi_done() for
 
 I don't think so (ends up calling lpfc_sli_cancel_iocbs() via 
 lpfc_hba_down_post() after shutting down the mailbox) but I've not seen the
 EH escalate all the way to host reset in most of my testing -
...
 The problem is that getting to this stage can take a very long time - much
 longer than most cluster's node eviction timer for e.g. which is the source
 of much of the complaint about this behaviour.

 outstanding requests ? If not, how about modifying 
 lpfc_host_reset_handler() such that it finishes all outstanding requests 
 if the remote port is not reachable ?

It does call the done() function on the outstanding command IOCBs after 
the
lpfc_reset_flush_io_context() call aborts them. The problem is that they are
returned with ScsiResult(DID_REQUEUE, 0) which basically queues them back to the
port as long as the port is still up. Which results in the commands hanging
out until their timeouts expire (if the device isn't responding).

If the device does resume after the reset, in the case of a tape device 
it is
possible corrupt the tape because the 2900's get trapped by the TUR in the eh
routines depending on which commands were hung. Take write for example, the
reset can result in a tape rewind, and when the write gets fired back at the
device the tape is at BOT and effectively erases all data already on the tape.
Whops!

Also, as I stated elsewhere, in my testing its impossible to escalate 
beyond
the flush_io_context() in the lpfc_device_reset_handler driver because it always
returns true if the card firmware is responding.



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Hannes Reinecke

On 03/14/2013 07:09 PM, Steffen Maier wrote:

Just for my understanding:
So this is a bit like writing 0 into the failed attribute of a
zfcp_port in sysfs (zfcp_sysfs_port_failed_store()) which forces a
port reopen recovery including a sequence of fc_remote_port_delete
and fc_remote_port_add?

Sorta.
This patch simulates the result of an RSCN where this remote port is 
missing. So the fast_io_fail / dev_loss_tmo mechanism is triggered 
(until further notice).
It'll be reset with the next RSCN or by manually resetting the port 
state.



If so, it sounds good to have this functionality for any FC LLD in
common code.


That's why I posted this code :-)
Rationale for this patch is a weird test case with brocade switches;
there you can actually disable a _target_ port. So the port isn't
reachable anymore but no RSCN is send.
And the LLDD is forced into error recovery which'll take _ages_ as 
each and every command send during error recovery will time out.



(And I can think about deprecating parts of zfcp code, next we could
consider the same for zfcp's forced LUN and adapter recovery or
maybe this already exists as sdev's writable state attribute and the
adapter recovery can be performed manually by walking all fc_rports
writing their port_state.)

I don't think the sdev 'state' attribute is useable here; it doesn't 
trigger a rediscovery. You'd want to use 'rescan' here.

But yeah, that would be one of the goals.

Still waiting for some response to the patch, though.
Testing has been successful on our side, plus I've posted
a patchset for multipath-tools utilizing this patch.

So it would be good if it could be included.

James?

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bryn M. Reeves

On 03/15/2013 11:55 AM, Hannes Reinecke wrote:

Rationale for this patch is a weird test case with brocade switches;
there you can actually disable a _target_ port. So the port isn't
reachable anymore but no RSCN is send.


I think it's more than a pure test-case; using the rscnsupr feature on 
the Brocades is just a handy way to trigger it.


I've seen numerous cases in the last few years of fabric failures that 
had this characteristic - either because of hardware failures in the 
switches or due to bugs that caused FC traffic to be blackholed without 
an RSCN or other indication (beyond commands timing out).


This (and the I_T nexus reset you proposed in December which is very 
effective at short-ciruiting the eh escalation in the same situation) 
both make the kernel more robust in the face of this kind of problem.


Regards,
Bryn.


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche

On 03/15/13 12:55, Hannes Reinecke wrote:

And the LLDD is forced into error recovery which'll take _ages_ as each
and every command send during error recovery will time out.


Hello Hannes,

I'm analyzing a related but not identical issue with SRP. It would help 
if you could tell with which LLDD you ran into this issue and with which 
values of fast_io_fail_tmo and dev_loss_tmo.


Bart.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bryn M. Reeves

On 03/15/2013 12:24 PM, Bart Van Assche wrote:

On 03/15/13 12:55, Hannes Reinecke wrote:

And the LLDD is forced into error recovery which'll take _ages_ as each
and every command send during error recovery will time out.


Hello Hannes,

I'm analyzing a related but not identical issue with SRP. It would help
if you could tell with which LLDD you ran into this issue and with which
values of fast_io_fail_tmo and dev_loss_tmo.


Most of the cases I've seen have involved lpfc (although I don't think 
it's in any way exclusive to that LLDD). Even with very low 
fast_io_fail_timeout/dev_loss_timeout (5/10) the eh is busy for 10m or 
longer before IO fails and multipath is able to react to the problem.


Regards,
Bryn.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche

On 03/15/13 13:37, Bryn M. Reeves wrote:

On 03/15/2013 12:24 PM, Bart Van Assche wrote:

On 03/15/13 12:55, Hannes Reinecke wrote:

And the LLDD is forced into error recovery which'll take _ages_ as each
and every command send during error recovery will time out.


Hello Hannes,

I'm analyzing a related but not identical issue with SRP. It would help
if you could tell with which LLDD you ran into this issue and with which
values of fast_io_fail_tmo and dev_loss_tmo.


Most of the cases I've seen have involved lpfc (although I don't think
it's in any way exclusive to that LLDD). Even with very low
fast_io_fail_timeout/dev_loss_timeout (5/10) the eh is busy for 10m or
longer before IO fails and multipath is able to react to the problem.


The SCSI EH keeps trying until all outstanding request have been 
finished. Does lpfc_host_reset_handler() invoke scsi_done() for 
outstanding requests ? If not, how about modifying 
lpfc_host_reset_handler() such that it finishes all outstanding requests 
if the remote port is not reachable ?


Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bryn M. Reeves

On 03/15/2013 12:46 PM, Bart Van Assche wrote:

The SCSI EH keeps trying until all outstanding request have been
finished. Does lpfc_host_reset_handler() invoke scsi_done() for


I don't think so (ends up calling lpfc_sli_cancel_iocbs() via 
lpfc_hba_down_post() after shutting down the mailbox) but I've not seen 
the EH escalate all the way to host reset in most of my testing - 
usually some time after reaching the bus reset remaining IOs timeout and 
the error bubbles up to device-mapper (all the cases I'm looking at are 
devices managed by a dm-multipath target).


The problem is that getting to this stage can take a very long time - 
much longer than most cluster's node eviction timer for e.g. which is 
the source of much of the complaint about this behaviour.



outstanding requests ? If not, how about modifying
lpfc_host_reset_handler() such that it finishes all outstanding requests
if the remote port is not reachable ?


I'm not sure how safe that is in this situation - James mentioned in the 
I_T nexus reset thread concerns about frames that could be delayed etc. 
in the fabric if the host unilaterally abandons IOs (not sure of the 
details for lpfc at this level).


Regards,
Bryn.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche

On 03/15/13 14:28, Bryn M. Reeves wrote:

On 03/15/2013 12:46 PM, Bart Van Assche wrote:

The SCSI EH keeps trying until all outstanding request have been
finished. Does lpfc_host_reset_handler() invoke scsi_done() for


I don't think so (ends up calling lpfc_sli_cancel_iocbs() via
lpfc_hba_down_post() after shutting down the mailbox) but I've not seen
the EH escalate all the way to host reset in most of my testing -
usually some time after reaching the bus reset remaining IOs timeout and
the error bubbles up to device-mapper (all the cases I'm looking at are
devices managed by a dm-multipath target).

The problem is that getting to this stage can take a very long time -
much longer than most cluster's node eviction timer for e.g. which is
the source of much of the complaint about this behaviour.


outstanding requests ? If not, how about modifying
lpfc_host_reset_handler() such that it finishes all outstanding requests
if the remote port is not reachable ?


I'm not sure how safe that is in this situation - James mentioned in the
I_T nexus reset thread concerns about frames that could be delayed etc.
in the fabric if the host unilaterally abandons IOs (not sure of the
details for lpfc at this level).


How about using the value of scsi_cmnd.jiffies_at_alloc to finish only 
those SCSI commands in the host reset handler that exceeded a certain 
processing time ?


Bart.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Mike Christie
On 03/15/2013 08:41 AM, Bart Van Assche wrote:
 On 03/15/13 14:28, Bryn M. Reeves wrote:
 On 03/15/2013 12:46 PM, Bart Van Assche wrote:
 The SCSI EH keeps trying until all outstanding request have been
 finished. Does lpfc_host_reset_handler() invoke scsi_done() for


It does not really matter at that point for the scsi command timeout
case. When blk_complete_request is called by scsi_done, it will see the
command has timed out and so it will not be processed by that normal
completion path. The scsi eh basically owns the completion processing of
the command once it has timed out.

For the cleanup when the port is not reachable you had suggested is what
Hannes is proposing in the I_T nexus reset patch. The problem is making
sure that if the driver/class returns SUCCESS or FAST_IO_FAIL then they
do not touch the scsi cmnd struct again.


 I don't think so (ends up calling lpfc_sli_cancel_iocbs() via
 lpfc_hba_down_post() after shutting down the mailbox) but I've not seen
 the EH escalate all the way to host reset in most of my testing -
 usually some time after reaching the bus reset remaining IOs timeout and
 the error bubbles up to device-mapper (all the cases I'm looking at are
 devices managed by a dm-multipath target).

 The problem is that getting to this stage can take a very long time -
 much longer than most cluster's node eviction timer for e.g. which is
 the source of much of the complaint about this behaviour.

 outstanding requests ? If not, how about modifying
 lpfc_host_reset_handler() such that it finishes all outstanding requests
 if the remote port is not reachable ?

 I'm not sure how safe that is in this situation - James mentioned in the
 I_T nexus reset thread concerns about frames that could be delayed etc.
 in the fabric if the host unilaterally abandons IOs (not sure of the
 details for lpfc at this level).
 
 How about using the value of scsi_cmnd.jiffies_at_alloc to finish only
 those SCSI commands in the host reset handler that exceeded a certain
 processing time ?
 

We basically do this now. When a scsi command times out the scsi layer
blocks the host from processing new commands and waits for all
outstanding commands to either finish normally or timeout. When all
commands have finished or timedout, then we start the scsi eh code. So,
by the time we have go to the scsi eh callbacks we are in a state where
all the commands being processed by the eh have exceeded a certain
processing time.

If you mean you want to drop the block and wait part, then I think it
could speed things up to do the abort callbacks while other IO is
running (as long as the driver can support it). However if the abort
fails and you need to escalate to operations like resets which interfere
with multiple commands, then the driver/scsi-ml does not have much
choice in what it does cleanup wise. There would be no point in checking
the jiffies_at_alloc. The commands that are going to be affected by the
tmf or host reset operation must be returned to the scsi-ml for retries
or failure upwards.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche

On 03/15/13 19:51, Mike Christie wrote:

On 03/15/2013 08:41 AM, Bart Van Assche wrote:

How about using the value of scsi_cmnd.jiffies_at_alloc to finish only
those SCSI commands in the host reset handler that exceeded a certain
processing time ?


We basically do this now. When a scsi command times out the scsi layer
blocks the host from processing new commands and waits for all
outstanding commands to either finish normally or timeout. When all
commands have finished or timedout, then we start the scsi eh code. So,
by the time we have go to the scsi eh callbacks we are in a state where
all the commands being processed by the eh have exceeded a certain
processing time.

If you mean you want to drop the block and wait part, then I think it
could speed things up to do the abort callbacks while other IO is
running (as long as the driver can support it). However if the abort
fails and you need to escalate to operations like resets which interfere
with multiple commands, then the driver/scsi-ml does not have much
choice in what it does cleanup wise. There would be no point in checking
the jiffies_at_alloc. The commands that are going to be affected by the
tmf or host reset operation must be returned to the scsi-ml for retries
or failure upwards.


Hello Mike,

It seems like there is a misunderstanding. With my comment I was not 
referring to the SCSI ML but to the SCSI LLD. LLD drivers like ib_srp 
keep track of outstanding SCSI requests. With the SRP protocol it is 
possible to tell the InfiniBand HCA not to deliver completions for 
outstanding requests by closing the connection used for SRP 
communication. Hence my suggestion to finish SCSI commands that were 
queued longer than a certain time ago from inside the LLD host reset 
handler. I'm not sure though whether all types of FC HBA's allow 
something equivalent.


Bart.


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Mike Christie
On 03/15/2013 02:13 PM, Bart Van Assche wrote:
 On 03/15/13 19:51, Mike Christie wrote:
 On 03/15/2013 08:41 AM, Bart Van Assche wrote:
 How about using the value of scsi_cmnd.jiffies_at_alloc to finish only
 those SCSI commands in the host reset handler that exceeded a certain
 processing time ?

 We basically do this now. When a scsi command times out the scsi layer
 blocks the host from processing new commands and waits for all
 outstanding commands to either finish normally or timeout. When all
 commands have finished or timedout, then we start the scsi eh code. So,
 by the time we have go to the scsi eh callbacks we are in a state where
 all the commands being processed by the eh have exceeded a certain
 processing time.

 If you mean you want to drop the block and wait part, then I think it
 could speed things up to do the abort callbacks while other IO is
 running (as long as the driver can support it). However if the abort
 fails and you need to escalate to operations like resets which interfere
 with multiple commands, then the driver/scsi-ml does not have much
 choice in what it does cleanup wise. There would be no point in checking
 the jiffies_at_alloc. The commands that are going to be affected by the
 tmf or host reset operation must be returned to the scsi-ml for retries
 or failure upwards.
 
 Hello Mike,
 
 It seems like there is a misunderstanding. With my comment I was not
 referring to the SCSI ML but to the SCSI LLD. LLD drivers like ib_srp
 keep track of outstanding SCSI requests. With the SRP protocol it is
 possible to tell the InfiniBand HCA not to deliver completions for
 outstanding requests by closing the connection used for SRP
 communication. Hence my suggestion to finish SCSI commands that were
 queued longer than a certain time ago from inside the LLD host reset

Are you saying you would have a class/driver timer running that
determines when to run this? What you describe sounds like it would
help. I am just wondering when it would be run.

If I understand you correctly, is iscsi doing what you describe when we
escalate to session dropping and relogin?


 handler. I'm not sure though whether all types of FC HBA's allow
 something equivalent.
 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-14 Thread Steffen Maier

Just for my understanding:
So this is a bit like writing 0 into the failed attribute of a zfcp_port 
in sysfs (zfcp_sysfs_port_failed_store()) which forces a port reopen 
recovery including a sequence of fc_remote_port_delete and 
fc_remote_port_add?
If so, it sounds good to have this functionality for any FC LLD in 
common code.
(And I can think about deprecating parts of zfcp code, next we could 
consider the same for zfcp's forced LUN and adapter recovery or maybe 
this already exists as sdev's writable state attribute and the adapter 
recovery can be performed manually by walking all fc_rports writing 
their port_state.)


On 01/15/2013 04:02 PM, Hannes Reinecke wrote:

Multipath might detect a path down even though the LLDD
hasn't registered a link down event.
This patch makes the 'port_state' FC attribute writeable
to enforce a link down scenario in these cases, allowing
for a fast failover to the remaining ports.

Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Andrew Vasquez andrew.vasq...@qlogic.com
Cc: James Smart james.sm...@emulex.com
Cc: James Bottomley jbottom...@parallels.com
Cc: Mike Christie micha...@cs.wisc.edu
Signed-off-by: Hannes Reinecke h...@suse.de

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 5e1c6dd..4d28ee5 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -42,6 +42,8 @@
  #include scsi_priv.h
  #include scsi_transport_fc_internal.h

+static void fc_flush_devloss(struct Scsi_Host *shost);
+static void fc_flush_work(struct Scsi_Host *shost);
  static int fc_queue_work(struct Scsi_Host *, struct work_struct *);
  static void fc_vport_sched_delete(struct work_struct *work);
  static int fc_vport_setup(struct Scsi_Host *shost, int channel,
@@ -949,7 +951,76 @@ show_fc_rport_roles (struct device *dev, struct 
device_attribute *attr,
  static FC_DEVICE_ATTR(rport, roles, S_IRUGO,
show_fc_rport_roles, NULL);

-fc_private_rport_rd_enum_attr(port_state, FC_PORTSTATE_MAX_NAMELEN);
+static ssize_t
+store_fc_rport_port_state(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   unsigned long flags;
+   struct fc_rport *rport = transport_class_to_rport(dev);
+   struct Scsi_Host *shost = rport_to_shost(rport);
+
+   spin_lock_irqsave(shost-host_lock, flags);
+   if (!strncmp(buf, Blocked, 7)) {
+   if (rport-port_state != FC_PORTSTATE_ONLINE) {
+   spin_unlock_irqrestore(shost-host_lock, flags);
+   return -EBUSY;
+   }
+   spin_unlock_irqrestore(shost-host_lock, flags);
+   fc_remote_port_delete(rport);
+   } else if (!strncmp(buf, Online, 6)) {
+   if (rport-port_state != FC_PORTSTATE_BLOCKED) {
+   spin_unlock_irqrestore(shost-host_lock, flags);
+   return -EBUSY;
+   }
+   /* Reset back to online for rescan */
+   rport-port_state = FC_PORTSTATE_ONLINE;
+   spin_unlock_irqrestore(shost-host_lock, flags);
+
+   if (!cancel_delayed_work(rport-fail_io_work))
+   fc_flush_devloss(shost);
+   if (!cancel_delayed_work(rport-dev_loss_work))
+   fc_flush_devloss(shost);
+
+   spin_lock_irqsave(shost-host_lock, flags);
+   rport-flags = ~(FC_RPORT_FAST_FAIL_TIMEDOUT |
+ FC_RPORT_DEVLOSS_PENDING |
+ FC_RPORT_DEVLOSS_CALLBK_DONE);
+   spin_unlock_irqrestore(shost-host_lock, flags);
+
+   /* ensure any stgt delete functions are done */
+   fc_flush_work(shost);
+
+   if (rport-roles  FC_PORT_ROLE_FCP_TARGET) {
+   scsi_target_unblock(rport-dev, SDEV_RUNNING);
+   /* initiate a scan of the target */
+   spin_lock_irqsave(shost-host_lock, flags);
+   rport-flags |= FC_RPORT_SCAN_PENDING;
+   scsi_queue_work(shost, rport-scan_work);
+   spin_unlock_irqrestore(shost-host_lock, flags);
+   }
+   } else
+   return -EINVAL;
+
+   return count;
+}
+
+static ssize_t
+show_fc_rport_port_state (struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct fc_rport *rport = transport_class_to_rport(dev);
+   const char *name;
+
+   name = get_fc_port_state_name(rport-port_state);
+   if (!name)
+   return -EINVAL;
+   return snprintf(buf, FC_PORTSTATE_MAX_NAMELEN, %s\n, name);
+}
+
+static FC_DEVICE_ATTR(rport, port_state, S_IRUGO | S_IWUSR,
+   show_fc_rport_port_state, store_fc_rport_port_state);
+
  fc_private_rport_rd_attr(scsi_target_id, %d\n, 20);

  /*
@@ -2289,7 +2360,7 @@ 

[PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-01-15 Thread Hannes Reinecke
Multipath might detect a path down even though the LLDD
hasn't registered a link down event.
This patch makes the 'port_state' FC attribute writeable
to enforce a link down scenario in these cases, allowing
for a fast failover to the remaining ports.

Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Andrew Vasquez andrew.vasq...@qlogic.com
Cc: James Smart james.sm...@emulex.com
Cc: James Bottomley jbottom...@parallels.com
Cc: Mike Christie micha...@cs.wisc.edu
Signed-off-by: Hannes Reinecke h...@suse.de

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 5e1c6dd..4d28ee5 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -42,6 +42,8 @@
 #include scsi_priv.h
 #include scsi_transport_fc_internal.h
 
+static void fc_flush_devloss(struct Scsi_Host *shost);
+static void fc_flush_work(struct Scsi_Host *shost);
 static int fc_queue_work(struct Scsi_Host *, struct work_struct *);
 static void fc_vport_sched_delete(struct work_struct *work);
 static int fc_vport_setup(struct Scsi_Host *shost, int channel,
@@ -949,7 +951,76 @@ show_fc_rport_roles (struct device *dev, struct 
device_attribute *attr,
 static FC_DEVICE_ATTR(rport, roles, S_IRUGO,
show_fc_rport_roles, NULL);
 
-fc_private_rport_rd_enum_attr(port_state, FC_PORTSTATE_MAX_NAMELEN);
+static ssize_t
+store_fc_rport_port_state(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   unsigned long flags;
+   struct fc_rport *rport = transport_class_to_rport(dev);
+   struct Scsi_Host *shost = rport_to_shost(rport);
+
+   spin_lock_irqsave(shost-host_lock, flags);
+   if (!strncmp(buf, Blocked, 7)) {
+   if (rport-port_state != FC_PORTSTATE_ONLINE) {
+   spin_unlock_irqrestore(shost-host_lock, flags);
+   return -EBUSY;
+   }
+   spin_unlock_irqrestore(shost-host_lock, flags);
+   fc_remote_port_delete(rport);
+   } else if (!strncmp(buf, Online, 6)) {
+   if (rport-port_state != FC_PORTSTATE_BLOCKED) {
+   spin_unlock_irqrestore(shost-host_lock, flags);
+   return -EBUSY;
+   }
+   /* Reset back to online for rescan */
+   rport-port_state = FC_PORTSTATE_ONLINE;
+   spin_unlock_irqrestore(shost-host_lock, flags);
+
+   if (!cancel_delayed_work(rport-fail_io_work))
+   fc_flush_devloss(shost);
+   if (!cancel_delayed_work(rport-dev_loss_work))
+   fc_flush_devloss(shost);
+
+   spin_lock_irqsave(shost-host_lock, flags);
+   rport-flags = ~(FC_RPORT_FAST_FAIL_TIMEDOUT |
+ FC_RPORT_DEVLOSS_PENDING |
+ FC_RPORT_DEVLOSS_CALLBK_DONE);
+   spin_unlock_irqrestore(shost-host_lock, flags);
+
+   /* ensure any stgt delete functions are done */
+   fc_flush_work(shost);
+
+   if (rport-roles  FC_PORT_ROLE_FCP_TARGET) {
+   scsi_target_unblock(rport-dev, SDEV_RUNNING);
+   /* initiate a scan of the target */
+   spin_lock_irqsave(shost-host_lock, flags);
+   rport-flags |= FC_RPORT_SCAN_PENDING;
+   scsi_queue_work(shost, rport-scan_work);
+   spin_unlock_irqrestore(shost-host_lock, flags);
+   }
+   } else
+   return -EINVAL;
+
+   return count;
+}
+
+static ssize_t
+show_fc_rport_port_state (struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct fc_rport *rport = transport_class_to_rport(dev);
+   const char *name;
+
+   name = get_fc_port_state_name(rport-port_state);
+   if (!name)
+   return -EINVAL;
+   return snprintf(buf, FC_PORTSTATE_MAX_NAMELEN, %s\n, name);
+}
+
+static FC_DEVICE_ATTR(rport, port_state, S_IRUGO | S_IWUSR,
+   show_fc_rport_port_state, store_fc_rport_port_state);
+
 fc_private_rport_rd_attr(scsi_target_id, %d\n, 20);
 
 /*
@@ -2289,7 +2360,7 @@ fc_attach_transport(struct fc_function_template *ft)
SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_name);
SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_id);
SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(roles);
-   SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_state);
+   SETUP_PRIVATE_RPORT_ATTRIBUTE_RW(port_state);
SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(scsi_target_id);
SETUP_PRIVATE_RPORT_ATTRIBUTE_RW(fast_io_fail_tmo);
 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html