RE: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

2014-07-12 Thread KY Srinivasan


> -Original Message-
> From: Richard Weinberger [mailto:richard.weinber...@gmail.com]
> Sent: Saturday, July 12, 2014 9:17 AM
> To: KY Srinivasan
> Cc: Christoph Hellwig; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; oher...@suse.com;
> jbottom...@parallels.com; jasow...@redhat.com; a...@canonical.com;
> linux-scsi@vger.kernel.org
> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
> 
> On Thu, Jul 10, 2014 at 12:33 PM, Richard Weinberger
>  wrote:
> > On Wed, Jul 9, 2014 at 8:51 PM, KY Srinivasan  wrote:
> >>
> >>
> >>> -Original Message-
> >>> From: Christoph Hellwig [mailto:h...@infradead.org]
> >>> Sent: Wednesday, July 9, 2014 1:44 AM
> >>> To: KY Srinivasan
> >>> Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org;
> >>> oher...@suse.com; jbottom...@parallels.com; jasow...@redhat.com;
> >>> a...@canonical.com; linux-scsi@vger.kernel.org
> >>> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort
> >>> handler
> >>>
> >>> On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
> >>> > Implement a simple abort handler. The host does not support
> >>> > "Abort"; just ensure that all inflight I/Os have been accounted for.
> >>>
> >>> The abort handler should abort a single command, not wait for all of
> them.
> >>> What issue do you see that this tries to address?
> >>
> >> On Azure, we sometimes have unbounded I/O latencies and some
> >> distributions (such as SLES12) based on recent kernels are invoking the
> "Abort Handler". Unfortunately, our scsi emulation on the host does not
> support aborting a command.
> >> The issue I have seen is that the upper level scsi code attempts error
> recovery when the command times out and finally frees up the command.
> >> The host subsequently responds to the command that has timed out and
> >> since the memory has been freed up, we end up touching freed memory
> >> in this driver. Since the host is also doing error recovery, by just 
> >> delaying
> the error handler in the guest until we can account for all the in-flight
> commands, we can get around the problem.
> >
> > I see strange issues in Azure and maybe they are related to this.
> > Some Linux machines crash in a way that no disk IO is possible (thus,
> > no SSH for me) but they still respond to ping. It happens rather
> > seldom (every few weeks).
> >
> > Do you see similar symptoms?
> 
> ping?

Sorry for the delayed response. Yes we have seen resets and potentially the 
file system mounted
Read-only because of the I/O timeouts. We have increased the standard scsi 
timeouts. Implementing the
Timedout handler as we have done now should solve this problem.

K. Y
> 
> --
> Thanks,
> //richard


Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

2014-07-12 Thread Richard Weinberger
On Thu, Jul 10, 2014 at 12:33 PM, Richard Weinberger
 wrote:
> On Wed, Jul 9, 2014 at 8:51 PM, KY Srinivasan  wrote:
>>
>>
>>> -Original Message-
>>> From: Christoph Hellwig [mailto:h...@infradead.org]
>>> Sent: Wednesday, July 9, 2014 1:44 AM
>>> To: KY Srinivasan
>>> Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org;
>>> oher...@suse.com; jbottom...@parallels.com; jasow...@redhat.com;
>>> a...@canonical.com; linux-scsi@vger.kernel.org
>>> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
>>>
>>> On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
>>> > Implement a simple abort handler. The host does not support "Abort";
>>> > just ensure that all inflight I/Os have been accounted for.
>>>
>>> The abort handler should abort a single command, not wait for all of them.
>>> What issue do you see that this tries to address?
>>
>> On Azure, we sometimes have unbounded I/O latencies and some distributions 
>> (such as SLES12) based on recent kernels are invoking
>> the "Abort Handler". Unfortunately, our scsi emulation on the host does not 
>> support aborting a command.
>> The issue I have seen is that the upper level scsi code attempts error 
>> recovery when the command times out and finally frees up the command.
>> The host subsequently responds to the command that has timed out and since 
>> the memory has been freed up, we end up touching freed memory
>> in this driver. Since the host is also doing error recovery, by just 
>> delaying the error handler in the guest until we can account for all the 
>> in-flight commands,
>> we can get around the problem.
>
> I see strange issues in Azure and maybe they are related to this.
> Some Linux machines crash in a way that no disk IO is possible (thus,
> no SSH for me) but they still respond to
> ping. It happens rather seldom (every few weeks).
>
> Do you see similar symptoms?

ping?

-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

2014-07-11 Thread KY Srinivasan


> -Original Message-
> From: Hannes Reinecke [mailto:h...@suse.de]
> Sent: Friday, July 11, 2014 2:53 AM
> To: KY Srinivasan; Christoph Hellwig
> Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org;
> oher...@suse.com; jbottom...@parallels.com; jasow...@redhat.com;
> a...@canonical.com; linux-scsi@vger.kernel.org
> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
> 
> On 07/11/2014 12:26 AM, KY Srinivasan wrote:
> >
> >
> >> -Original Message-
> >> From: Christoph Hellwig [mailto:h...@infradead.org]
> >> Sent: Thursday, July 10, 2014 3:13 AM
> >> To: KY Srinivasan
> >> Cc: Christoph Hellwig; linux-ker...@vger.kernel.org;
> >> de...@linuxdriverproject.org; oher...@suse.com;
> >> jbottom...@parallels.com; jasow...@redhat.com; a...@canonical.com;
> >> linux-scsi@vger.kernel.org
> >> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort
> >> handler
> >>
> >>
> >> Note that you could increase the timeout and/or implement an
> >> eh_timed_out handler that just returns BLK_EH_RESET_TIMER, but if the
> >> completion takes too long the expectation is that a command will
> >> eventually finish instead of beeing delayed by an unmound amount.
> >
> > I like this idea; I will implement a eh_timed_out_handler.
> >
> Something like this should be sufficient:
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index
> e71a0d7..630ae81 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1468,6 +1468,12 @@ static int storvsc_get_chs(struct scsi_device
> *sdev, stru ct block_device * bdev,
>  return 0;
>   }
> 
> +static enum blk_eh_timer_return
> +storvsc_timed_out_handler(struct scsi_cmnd *scmd) {
> +   return BLK_EH_RESET_TIMER;
> +}
> +
>   static int storvsc_host_reset_handler(struct scsi_cmnd *scmnd)
>   {
>  struct hv_host_device *host_dev = shost_priv(scmnd->device->host);
> @@ -1687,6 +1693,7 @@ static struct scsi_host_template scsi_driver = {
>  .name = "storvsc_host_t",
>  .bios_param =   storvsc_get_chs,
>  .queuecommand = storvsc_queuecommand,
> +   .eh_timed_out = storvsc_timed_out_handler,
>  .eh_host_reset_handler =storvsc_host_reset_handler,
>  .slave_alloc =  storvsc_device_alloc,
>  .slave_destroy =storvsc_device_destroy,
> 
> Cheers,

Thanks Hannes. Based on Christoph's feedback I have implemented exactly this 
patch. It is under test on Azure.
I will be re-spinning and posting the patches shortly.

K. Y
> 
> Hannes
> --
> Dr. Hannes Reinecke zSeries & Storage
> h...@suse.de+49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

2014-07-11 Thread Christoph Hellwig
On Fri, Jul 11, 2014 at 11:52:55AM +0200, Hannes Reinecke wrote:
> Something like this should be sufficient:

Right.  That plus a detailed comment explaining why it's there..
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

2014-07-11 Thread Hannes Reinecke

On 07/11/2014 12:26 AM, KY Srinivasan wrote:




-Original Message-
From: Christoph Hellwig [mailto:h...@infradead.org]
Sent: Thursday, July 10, 2014 3:13 AM
To: KY Srinivasan
Cc: Christoph Hellwig; linux-ker...@vger.kernel.org;
de...@linuxdriverproject.org; oher...@suse.com;
jbottom...@parallels.com; jasow...@redhat.com; a...@canonical.com;
linux-scsi@vger.kernel.org
Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler


Note that you could increase the timeout and/or implement an
eh_timed_out handler that just returns BLK_EH_RESET_TIMER, but if the
completion takes too long the expectation is that a command will eventually
finish instead of beeing delayed by an unmound amount.


I like this idea; I will implement a eh_timed_out_handler.


Something like this should be sufficient:

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index e71a0d7..630ae81 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1468,6 +1468,12 @@ static int storvsc_get_chs(struct scsi_device 
*sdev, stru

ct block_device * bdev,
return 0;
 }

+static enum blk_eh_timer_return
+storvsc_timed_out_handler(struct scsi_cmnd *scmd)
+{
+   return BLK_EH_RESET_TIMER;
+}
+
 static int storvsc_host_reset_handler(struct scsi_cmnd *scmnd)
 {
struct hv_host_device *host_dev = 
shost_priv(scmnd->device->host);

@@ -1687,6 +1693,7 @@ static struct scsi_host_template scsi_driver = {
.name = "storvsc_host_t",
.bios_param =   storvsc_get_chs,
.queuecommand = storvsc_queuecommand,
+   .eh_timed_out = storvsc_timed_out_handler,
.eh_host_reset_handler =storvsc_host_reset_handler,
.slave_alloc =  storvsc_device_alloc,
.slave_destroy =storvsc_device_destroy,

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

2014-07-10 Thread KY Srinivasan


> -Original Message-
> From: Christoph Hellwig [mailto:h...@infradead.org]
> Sent: Thursday, July 10, 2014 3:13 AM
> To: KY Srinivasan
> Cc: Christoph Hellwig; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; oher...@suse.com;
> jbottom...@parallels.com; jasow...@redhat.com; a...@canonical.com;
> linux-scsi@vger.kernel.org
> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
> 
> 
> Note that you could increase the timeout and/or implement an
> eh_timed_out handler that just returns BLK_EH_RESET_TIMER, but if the
> completion takes too long the expectation is that a command will eventually
> finish instead of beeing delayed by an unmound amount.

I like this idea; I will implement a eh_timed_out_handler.

K. Y

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

2014-07-10 Thread Richard Weinberger
On Wed, Jul 9, 2014 at 8:51 PM, KY Srinivasan  wrote:
>
>
>> -Original Message-
>> From: Christoph Hellwig [mailto:h...@infradead.org]
>> Sent: Wednesday, July 9, 2014 1:44 AM
>> To: KY Srinivasan
>> Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org;
>> oher...@suse.com; jbottom...@parallels.com; jasow...@redhat.com;
>> a...@canonical.com; linux-scsi@vger.kernel.org
>> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
>>
>> On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
>> > Implement a simple abort handler. The host does not support "Abort";
>> > just ensure that all inflight I/Os have been accounted for.
>>
>> The abort handler should abort a single command, not wait for all of them.
>> What issue do you see that this tries to address?
>
> On Azure, we sometimes have unbounded I/O latencies and some distributions 
> (such as SLES12) based on recent kernels are invoking
> the "Abort Handler". Unfortunately, our scsi emulation on the host does not 
> support aborting a command.
> The issue I have seen is that the upper level scsi code attempts error 
> recovery when the command times out and finally frees up the command.
> The host subsequently responds to the command that has timed out and since 
> the memory has been freed up, we end up touching freed memory
> in this driver. Since the host is also doing error recovery, by just delaying 
> the error handler in the guest until we can account for all the in-flight 
> commands,
> we can get around the problem.

I see strange issues in Azure and maybe they are related to this.
Some Linux machines crash in a way that no disk IO is possible (thus,
no SSH for me) but they still respond to
ping. It happens rather seldom (every few weeks).

Do you see similar symptoms?

-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

2014-07-10 Thread Christoph Hellwig
On Wed, Jul 09, 2014 at 06:51:38PM +, KY Srinivasan wrote:
> On Azure, we sometimes have unbounded I/O latencies and some distributions
> (such as SLES12) based on recent kernels are invoking the "Abort Handler".

Any kernel will invoke the abort handler if present, and then escalate
to the various resets.

> Unfortunately, our scsi emulation on the host does not support aborting 
> a command. The issue I have seen is that the upper level scsi code attempts
> error recovery when the command times out and finally frees up the command.
> The host subsequently responds to the command that has timed out and since
> the memory has been freed up, we end up touching freed memory in this
> driver. Since the host is also doing error recovery, by just delaying the
> error handler in the guest until we can account for all the in-flight 
> commands, we can get around the problem.

The storvsc driver does implement an bus reset error handler, and
after that completes successfully the midlayer frees the commands,
and the driver has to deal with this and not call scsi_done after
the reset finished (normally you'd expect the hardware to not complete
requests after an reset).

Note that you could increase the timeout and/or implement an
eh_timed_out handler that just returns BLK_EH_RESET_TIMER, but if the
completion takes too long the expectation is that a command will
eventually finish instead of beeing delayed by an unmound amount.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

2014-07-09 Thread KY Srinivasan


> -Original Message-
> From: Christoph Hellwig [mailto:h...@infradead.org]
> Sent: Wednesday, July 9, 2014 1:44 AM
> To: KY Srinivasan
> Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org;
> oher...@suse.com; jbottom...@parallels.com; jasow...@redhat.com;
> a...@canonical.com; linux-scsi@vger.kernel.org
> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
> 
> On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
> > Implement a simple abort handler. The host does not support "Abort";
> > just ensure that all inflight I/Os have been accounted for.
> 
> The abort handler should abort a single command, not wait for all of them.
> What issue do you see that this tries to address?

On Azure, we sometimes have unbounded I/O latencies and some distributions 
(such as SLES12) based on recent kernels are invoking
the "Abort Handler". Unfortunately, our scsi emulation on the host does not 
support aborting a command.
The issue I have seen is that the upper level scsi code attempts error recovery 
when the command times out and finally frees up the command.
The host subsequently responds to the command that has timed out and since the 
memory has been freed up, we end up touching freed memory
in this driver. Since the host is also doing error recovery, by just delaying 
the error handler in the guest until we can account for all the in-flight 
commands,
we can get around the problem.

Regards,

K. Y

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

2014-07-09 Thread Christoph Hellwig
On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
> Implement a simple abort handler. The host does not support "Abort"; just
> ensure that all inflight I/Os have been accounted for.

The abort handler should abort a single command, not wait for all of
them.  What issue do you see that this tries to address?

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html