Re: Question about expected behavior of terminate_rport_io() in fc_function_template
Hi Hannes,

thanks for the short explanation.

On 23:05 Wed 23 Sep, Hannes Reinecke wrote:
> On 09/23/2015 07:06 PM, Benjamin Block wrote:
> > Hello,
> >
> > just a short question. If a low-level driver implements the function
> > `terminate_rport_io()` in `struct fc_function_template`, and it gets
> > called after IO failed, is the low-level driver expected to handle this
> > request synchronously or can it just schedule an action that is worked on
> > asynchronously to the call to the function?
> >
> Actually, it doesn't matter, as 'terminate_rport_io()' should cause the
> driver to abort outstanding commands. The main idea behind this is that
> the driver clears up any additional state it might have tacked onto the
> command. And calling '->done()', obviously.
>
> Main goal is to have outstanding I/O returned to the upper layers, so
> that things like multipath can redirect outstanding I/O to other paths
> and facilitate quick failover.
>

Yeah, that is what I thought as well, after I read the initial patch that
introduced that function to the template and stack. Makes much more sense
than an implicit rule.

> > Trouble is, we are seeing problems with SCSI-Commands being used by the
> > upper layers when we expect them to still be ours, after we got a call to
> > that function and didn't react upon it immediately. They do not contain
> > valid content anymore when they should.
> >
> True; after terminate_rport_io() I/O should have been aborted.
> However, the SCSI layer really shouldn't reuse commands before ->done()
> has been invoked or the command itself has been aborted.
>
> > I've looked into other implementations and it seems there are both
> > versions, some LLDs explicitly wait upon completions of requests they
> > schedule and others just schedule work-items and return. That may
> > already be the answer, but I wanted to make sure I am not missing
> > something here. The documentation on it is not really existing, or I
> > missed it.
> >
> As indicated, the driver is expected to call ->done() on outstanding
> commands when terminate_rport_io() is called.
> This smells more like an issue with the driver itself; if I were to
> guess I would think that some aborts are not handled correctly ...
>
> But it's hard to know without details. Do you have some message log or
> something?
>

It may well be that this is a problem in the driver. I am still working on
it. I have logs, but those are very messy because the test load involves LVM
volumes with multiple LUNs and multipathing, and I am trying to reduce it in
order to be better able to debug it.

Best regards,
- Benjamin Block
--
Linux on z Systems Development / IBM Systems & Technology Group
IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp / Registered office: Böblingen
Court of registration: Amtsgericht Stuttgart, HRB 243294
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Question about expected behavior of terminate_rport_io() in fc_function_template
On 09/23/2015 07:06 PM, Benjamin Block wrote:
> Hello,
>
> just a short question. If a low-level driver implements the function
> `terminate_rport_io()` in `struct fc_function_template`, and it gets
> called after IO failed, is the low-level driver expected to handle this
> request synchronously or can it just schedule an action that is worked on
> asynchronously to the call to the function?
>
Actually, it doesn't matter, as 'terminate_rport_io()' should cause the
driver to abort outstanding commands. The main idea behind this is that
the driver clears up any additional state it might have tacked onto the
command. And calling '->done()', obviously.

Main goal is to have outstanding I/O returned to the upper layers, so
that things like multipath can redirect outstanding I/O to other paths
and facilitate quick failover.

> Trouble is, we are seeing problems with SCSI-Commands being used by the
> upper layers when we expect them to still be ours, after we got a call to
> that function and didn't react upon it immediately. They do not contain
> valid content anymore when they should.
>
True; after terminate_rport_io() I/O should have been aborted.
However, the SCSI layer really shouldn't reuse commands before ->done()
has been invoked or the command itself has been aborted.

> I've looked into other implementations and it seems there are both
> versions, some LLDs explicitly wait upon completions of requests they
> schedule and others just schedule work-items and return. That may
> already be the answer, but I wanted to make sure I am not missing
> something here. The documentation on it is not really existing, or I
> missed it.
>
As indicated, the driver is expected to call ->done() on outstanding
commands when terminate_rport_io() is called.
This smells more like an issue with the driver itself; if I were to
guess I would think that some aborts are not handled correctly ...

But it's hard to know without details. Do you have some message log or
something?
Cheers,

Hannes
--
Dr. Hannes Reinecke                zSeries & Storage
h...@suse.de                       +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
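[Archive note] The contract described in the reply above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not kernel code and not any real driver's implementation: `struct toy_cmd`, `struct toy_rport`, `TOY_ABORTED` and all `toy_*` functions are hypothetical names invented for this sketch. It models the two LLD styles mentioned in the thread (abort inline, or only schedule work and return) and the one rule both have to honour: a command stays driver-owned until its done callback has run, and done runs exactly once per command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ABORTED 1        /* hypothetical status, not a kernel constant */
#define TOY_MAX_CMDS 8

/* Toy stand-in for a SCSI command; NOT the kernel's struct scsi_cmnd. */
struct toy_cmd {
    int result;                          /* completion status */
    bool driver_owned;                   /* true until done() hands it back */
    void *lld_private;                   /* extra state the driver tacked on */
    void (*done)(struct toy_cmd *cmd);   /* completion callback */
};

/* Toy stand-in for a remote port with its outstanding commands. */
struct toy_rport {
    struct toy_cmd *outstanding[TOY_MAX_CMDS];
    size_t n_outstanding;
    bool abort_work_pending;             /* models a scheduled work item */
};

static int completions;                  /* counts done() calls for the demo */

/* Completion callback: ownership returns to the upper layers here. */
static void toy_done(struct toy_cmd *cmd)
{
    cmd->driver_owned = false;
    completions++;
}

/* Clear driver state and complete every command the driver still owns. */
static void toy_abort_outstanding(struct toy_rport *rport)
{
    assert(rport->n_outstanding <= TOY_MAX_CMDS);
    for (size_t i = 0; i < rport->n_outstanding; i++) {
        struct toy_cmd *cmd = rport->outstanding[i];

        if (!cmd->driver_owned)
            continue;                    /* already completed: never touch it */
        cmd->lld_private = NULL;         /* drop driver-private state first */
        cmd->result = TOY_ABORTED;
        cmd->done(cmd);                  /* exactly once per command */
    }
    rport->n_outstanding = 0;
}

/* Style 1: abort inline, before the call returns. */
static void toy_terminate_rport_io_sync(struct toy_rport *rport)
{
    toy_abort_outstanding(rport);
}

/* Style 2: only schedule the abort; commands stay driver-owned for now. */
static void toy_terminate_rport_io_deferred(struct toy_rport *rport)
{
    rport->abort_work_pending = true;
}

/* Models the worker running later; running it twice must be harmless. */
static void toy_run_pending_work(struct toy_rport *rport)
{
    if (!rport->abort_work_pending)
        return;
    rport->abort_work_pending = false;
    toy_abort_outstanding(rport);
}
```

In this model both styles end in the same state. The difference is only when done is called, which matches the "it doesn't matter" answer above. In the deferred style the commands must be treated as driver-owned until the work item has run.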
Question about expected behavior of terminate_rport_io() in fc_function_template
Hello,

just a short question. If a low-level driver implements the function
`terminate_rport_io()` in `struct fc_function_template`, and it gets
called after IO failed, is the low-level driver expected to handle this
request synchronously or can it just schedule an action that is worked on
asynchronously to the call to the function?

Trouble is, we are seeing problems with SCSI-Commands being used by the
upper layers when we expect them to still be ours, after we got a call to
that function and didn't react upon it immediately. They do not contain
valid content anymore when they should.

I've looked into other implementations and it seems there are both
versions, some LLDs explicitly wait upon completions of requests they
schedule and others just schedule work-items and return. That may
already be the answer, but I wanted to make sure I am not missing
something here. The documentation on it is not really existing, or I
missed it.

Best regards,
- Benjamin Block
--
Linux on z Systems Development / IBM Systems & Technology Group