On Wed, Jul 9, 2014 at 8:51 PM, KY Srinivasan <k...@microsoft.com> wrote:
>
>
>> -----Original Message-----
>> From: Christoph Hellwig [mailto:h...@infradead.org]
>> Sent: Wednesday, July 9, 2014 1:44 AM
>> To: KY Srinivasan
>> Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org;
>> oher...@suse.com; jbottom...@parallels.com; jasow...@redhat.com;
>> a...@canonical.com; linux-s...@vger.kernel.org
>> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
>>
>> On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
>> > Implement a simple abort handler. The host does not support "Abort";
>> > just ensure that all inflight I/Os have been accounted for.
>>
>> The abort handler should abort a single command, not wait for all of them.
>> What issue do you see that this tries to address?
>
> On Azure, we sometimes have unbounded I/O latencies and some distributions
> (such as SLES12) based on recent kernels are invoking
> the "Abort Handler". Unfortunately, our scsi emulation on the host does not
> support aborting a command.
> The issue I have seen is that the upper level scsi code attempts error
> recovery when the command times out and finally frees up the command.
> The host subsequently responds to the command that has timed out and since
> the memory has been freed up, we end up touching freed memory
> in this driver. Since the host is also doing error recovery, by just delaying
> the error handler in the guest until we can account for all the in-flight
> commands, we can get around the problem.

I see strange issues in Azure, and maybe they are related to this.
Some Linux machines crash in such a way that no disk I/O is possible
(so no SSH for me), but they still respond to ping.
It happens rarely (every few weeks).

Do you see similar symptoms?

-- 
Thanks,
//richard
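
For reference, the race described in the quoted mail, and the workaround of holding the error handler back until every in-flight command is accounted for, can be modelled in a few lines of userspace Python. This is only a toy sketch and not the storvsc code: `ToyDevice`, `host_complete`, `abort_handler` and `run` are made-up names, and the 50 ms timer merely stands in for a host that answers late.

```python
import threading


class ToyDevice:
    """Hypothetical stand-in for the driver's per-device state."""

    def __init__(self):
        self._cond = threading.Condition()
        self.in_flight = set()    # tags sent to the host, no response yet
        self.freed = set()        # tags whose memory the upper layer released
        self.use_after_free = 0   # late responses that hit a freed command

    def submit(self, tag):
        with self._cond:
            self.in_flight.add(tag)

    def host_complete(self, tag):
        """The host finally responds to a command."""
        with self._cond:
            if tag in self.freed:
                # In the real driver this would be a touch of freed memory.
                self.use_after_free += 1
            self.in_flight.discard(tag)
            self._cond.notify_all()

    def free(self, tag):
        """The upper layer gives up on the command and releases it."""
        with self._cond:
            self.freed.add(tag)

    def abort_handler(self, timeout=None):
        """The host cannot abort, so just wait until nothing is in flight.

        Returns True once drained, False if the optional timeout expired.
        """
        with self._cond:
            return self._cond.wait_for(lambda: not self.in_flight, timeout)


def run(wait_in_abort):
    """Return the number of use-after-free hits for one timed-out command."""
    dev = ToyDevice()
    dev.submit(1)
    # The host answers late, on its own schedule (assumed 50 ms here).
    host = threading.Timer(0.05, dev.host_complete, args=(1,))
    host.start()
    # The command has timed out in the guest; error handling starts.
    if wait_in_abort:
        dev.abort_handler()
    dev.free(1)
    host.join()
    return dev.use_after_free
```

Calling `run(False)` frees the command before the late response arrives, so the response lands on freed state; `run(True)` delays the free until the response has been seen. Note that the handler in this sketch waits for all in-flight commands rather than a single one, which is exactly the behaviour questioned in the quoted review.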