Re: Question about expected behavior of terminate_rport_io() in fc_function_template

2015-09-25 Thread Benjamin Block
Hi Hannes,

Thanks for the short explanation.

On 23:05 Wed 23 Sep , Hannes Reinecke wrote:
> On 09/23/2015 07:06 PM, Benjamin Block wrote:
> > Hello,
> > 
> > Just a short question: if a low-level driver implements the function
> > `terminate_rport_io()` in `struct fc_function_template`, and it gets
> > called after I/O failed, is the low-level driver expected to handle this
> > request synchronously, or can it just schedule an action that is
> > processed asynchronously to the call?
> > 
> Actually, it doesn't matter, as 'terminate_rport_io()' should cause the
> driver to abort outstanding commands. The main idea behind this is that
> the driver clears up any additional state it might have tacked onto the
> command and, obviously, calls '->done()'.
> 
> Main goal is to have outstanding I/O returned to the upper layers, so
> that things like multipath can redirect outstanding I/O to other paths
> and facilitate quick failover.
>

Yeah, that is what I thought as well after reading the initial patch
that introduced that function to the template and the stack. That makes
much more sense than an implicit rule.

> 
> > The trouble is, we are seeing SCSI commands being used by the upper
> > layers while we still expect them to be ours, after we got a call to
> > that function and did not act on it immediately. By then they no longer
> > contain valid content, although they still should.
> > 
> True; after terminate_rport_io() I/O should have been aborted.
> However, the SCSI layer really shouldn't reuse commands before ->done()
> has been invoked or the command itself has been aborted.
> 
> > I've looked into other implementations and it seems both versions
> > exist: some LLDs explicitly wait for the completion of the requests they
> > schedule, while others just schedule work items and return. That may
> > already be the answer, but I wanted to make sure I am not missing
> > something here. There does not seem to be any real documentation on
> > it, or I missed it.
> > 
> As indicated, the driver is expected to call ->done() on outstanding
> commands when terminate_rport_io() is called.
> This smells more like an issue with the driver itself; if I were to
> guess I would think that some aborts are not handled correctly ...
> 
> But it's hard to know without details. Do you have some message log or
> something?
> 

It may well be that this is a problem in the driver. I am still working
on it. I have logs, but they are very messy because the test load
involves LVM volumes with multiple LUNs and multipathing; I am trying to
reduce it so that it is easier to debug.

Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz
   Management: Dirk Wittkopp / Registered office: Böblingen
   Registration court: Amtsgericht Stuttgart, HRB 243294

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question about expected behavior of terminate_rport_io() in fc_function_template

2015-09-23 Thread Hannes Reinecke
On 09/23/2015 07:06 PM, Benjamin Block wrote:
> Hello,
> 
> Just a short question: if a low-level driver implements the function
> `terminate_rport_io()` in `struct fc_function_template`, and it gets
> called after I/O failed, is the low-level driver expected to handle this
> request synchronously, or can it just schedule an action that is
> processed asynchronously to the call?
> 
Actually, it doesn't matter, as 'terminate_rport_io()' should cause the
driver to abort outstanding commands. The main idea behind this is that
the driver clears up any additional state it might have tacked onto the
command and, obviously, calls '->done()'.

Main goal is to have outstanding I/O returned to the upper layers, so
that things like multipath can redirect outstanding I/O to other paths
and facilitate quick failover.
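To make the description above concrete, here is a minimal user-space
model in C of what a terminate_rport_io() implementation is expected to
achieve. It walks the outstanding commands of a remote port, frees the
LLD's private per-command state, sets an error result and hands each
command back through its done callback. All names (sim_cmd, sim_rport,
sim_terminate_rport_io, the SIM_RESULT_* codes) are hypothetical. In the
kernel the callback is `void (*terminate_rport_io)(struct fc_rport *)`,
and in kernels of this era a command is completed through
cmd->scsi_done(cmd). Locking against the normal completion path is
deliberately left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space model only; none of these names exist in the kernel. */
#define SIM_RESULT_PENDING          (-1)
#define SIM_RESULT_TRANSPORT_FAILED 1  /* stands in for a transport-failure host byte */

struct sim_cmd;
typedef void (*sim_done_fn)(struct sim_cmd *cmd);

struct sim_cmd {
	int result;
	void *lld_priv;        /* extra state the LLD tacked onto the command */
	sim_done_fn done;      /* models the midlayer's completion callback */
	struct sim_cmd *next;  /* link in the rport's list of outstanding I/O */
};

struct sim_rport {
	struct sim_cmd *outstanding;  /* commands currently owned by the LLD */
};

/* Models the midlayer side: just counts how many commands came back. */
static int sim_done_count;

static void sim_count_done(struct sim_cmd *cmd)
{
	(void)cmd;
	sim_done_count++;
}

/* Issue path: the LLD takes ownership and attaches private state. */
static int sim_queue_cmd(struct sim_rport *rport, struct sim_cmd *cmd,
			 sim_done_fn done)
{
	cmd->lld_priv = malloc(64);
	if (!cmd->lld_priv)
		return -1;
	cmd->result = SIM_RESULT_PENDING;
	cmd->done = done;
	cmd->next = rport->outstanding;
	rport->outstanding = cmd;
	return 0;
}

/* Model of terminate_rport_io(): hand every outstanding command back. */
static void sim_terminate_rport_io(struct sim_rport *rport)
{
	struct sim_cmd *cmd = rport->outstanding;

	rport->outstanding = NULL;  /* the LLD drops its list first */
	while (cmd) {
		struct sim_cmd *next = cmd->next;  /* read before giving cmd away */

		free(cmd->lld_priv);               /* clear LLD-private state */
		cmd->lld_priv = NULL;
		cmd->next = NULL;
		cmd->result = SIM_RESULT_TRANSPORT_FAILED;
		assert(cmd->done != NULL);
		cmd->done(cmd);                    /* ownership goes back upstream */
		cmd = next;                        /* cmd is not touched after done() */
	}
}
```

The loop reads cmd->next before calling done(). Once a command has been
handed back, the upper layers are free to reuse it, which is the kind of
reuse described in the question.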

> The trouble is, we are seeing SCSI commands being used by the upper
> layers while we still expect them to be ours, after we got a call to
> that function and did not act on it immediately. By then they no longer
> contain valid content, although they still should.
> 
True; after terminate_rport_io() I/O should have been aborted.
However, the SCSI layer really shouldn't reuse commands before ->done()
has been invoked or the command itself has been aborted.

> I've looked into other implementations and it seems both versions
> exist: some LLDs explicitly wait for the completion of the requests they
> schedule, while others just schedule work items and return. That may
> already be the answer, but I wanted to make sure I am not missing
> something here. There does not seem to be any real documentation on
> it, or I missed it.
> 
As indicated, the driver is expected to call ->done() on outstanding
commands when terminate_rport_io() is called.
This smells more like an issue with the driver itself; if I were to
guess I would think that some aborts are not handled correctly ...
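One way aborts can go wrong is a race between the normal completion
path and an abort or terminate_rport_io() path. In that race a command
is completed twice, or it is touched after it was already handed back.
Below is a minimal sketch of an exactly-once rule using C11 atomics in
user space. Every path that wants to return a command must first win an
atomic exchange on an ownership flag. The names (race_cmd, race_issue,
race_try_complete) are hypothetical. In kernel code the same idea would
more likely be expressed with a lock or with test_and_clear_bit(); that
mapping is an assumption and does not describe any particular driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not a kernel API. A command is owned by the LLD
 * while lld_owned is true. Every path that wants to return it (normal
 * completion, an abort, terminate_rport_io()) must first win the atomic
 * exchange; a path that loses must not touch the command again. */
struct race_cmd {
	atomic_bool lld_owned;
	int result;
	int done_calls;  /* how often the upper layer saw a completion */
};

static void race_issue(struct race_cmd *cmd)
{
	cmd->result = -1;
	cmd->done_calls = 0;
	atomic_store(&cmd->lld_owned, true);  /* LLD takes ownership */
}

/* Returns true if this caller completed the command. */
static bool race_try_complete(struct race_cmd *cmd, int result)
{
	/* Clear ownership atomically; only the caller that saw 'true' wins. */
	if (!atomic_exchange(&cmd->lld_owned, false))
		return false;  /* already returned by another path */
	cmd->result = result;
	cmd->done_calls++;  /* stands in for calling ->done() */
	return true;
}
```

Whichever path loses the exchange simply returns without touching the
command. Following that rule keeps a late abort from overwriting a
command that the upper layers may already be reusing.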

But it's hard to know without details. Do you have some message log or
something?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
Managing Directors: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (Amtsgericht Nürnberg)


Question about expected behavior of terminate_rport_io() in fc_function_template

2015-09-23 Thread Benjamin Block
Hello,

Just a short question: if a low-level driver implements the function
`terminate_rport_io()` in `struct fc_function_template`, and it gets
called after I/O failed, is the low-level driver expected to handle this
request synchronously, or can it just schedule an action that is
processed asynchronously to the call?

The trouble is, we are seeing SCSI commands being used by the upper
layers while we still expect them to be ours, after we got a call to
that function and did not act on it immediately. By then they no longer
contain valid content, although they still should.

I've looked into other implementations and it seems both versions
exist: some LLDs explicitly wait for the completion of the requests they
schedule, while others just schedule work items and return. That may
already be the answer, but I wanted to make sure I am not missing
something here. There does not seem to be any real documentation on
it, or I missed it.
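The two patterns described above can be compared with a deterministic,
single-threaded C model. Queued work only runs when the queue is
flushed. In the first variant the callback schedules the abort work and
returns, so the LLD still owns the commands when it returns. In the
second variant the callback schedules the work and waits for it, so
every command has been handed back by the time it returns. work_queue,
wq_queue(), wq_flush() and the terminate_* functions are hypothetical.
In a kernel driver, the corresponding primitives would presumably be
queue_work() together with flush_work() or wait_for_completion().

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded, deterministic model of a work queue: queued items
 * only run when wq_flush() is called. Hypothetical names throughout. */
#define WQ_MAX 8

typedef void (*work_fn)(void *arg);

struct work_queue {
	work_fn fn[WQ_MAX];
	void *arg[WQ_MAX];
	int count;
};

static bool wq_queue(struct work_queue *wq, work_fn fn, void *arg)
{
	if (wq->count == WQ_MAX)
		return false;  /* queue full, nothing scheduled */
	wq->fn[wq->count] = fn;
	wq->arg[wq->count] = arg;
	wq->count++;
	return true;
}

/* Run every pending item; models waiting for scheduled work to finish. */
static void wq_flush(struct work_queue *wq)
{
	for (int i = 0; i < wq->count; i++)
		wq->fn[i](wq->arg[i]);
	wq->count = 0;
}

struct async_rport {
	int outstanding;  /* commands still owned by the LLD */
};

/* The deferred work: stands in for aborting everything and calling ->done(). */
static void abort_all_work(void *arg)
{
	struct async_rport *rport = arg;

	rport->outstanding = 0;
}

/* Variant 1: schedule the work and return immediately. */
static bool terminate_schedule_only(struct work_queue *wq,
				    struct async_rport *rport)
{
	return wq_queue(wq, abort_all_work, rport);
}

/* Variant 2: schedule the work and wait for it before returning. */
static bool terminate_schedule_and_wait(struct work_queue *wq,
					struct async_rport *rport)
{
	if (!wq_queue(wq, abort_all_work, rport))
		return false;
	wq_flush(wq);
	return true;
}
```

According to the reply in this thread, either variant should be
acceptable as long as every outstanding command eventually reaches
->done(). The waiting variant only makes it easier to reason about the
point at which the LLD gives up ownership.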


Beste Grüße / Best regards,
  - Benjamin Block