On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
> A VM in the cloud environment may use a virtual disk as the backend storage,
> and there are usually filesystems on the virtual block device. When backend
> storage is temporarily down, any I/O issued to the virtual block device will
> cause an error. For example, an error occurred in ext4 filesystem would make
> the filesystem readonly. However a cloud backend storage can be soon
> recovered.
> For example, an IP-SAN may be down due to network failure and will be online
> soon after network is recovered. The error in the filesystem may not be
> recovered unless a device reattach or system restart. So an I/O rehandle is
> in need to implement a self-healing mechanism.
>
> This patch series propose a feature called I/O hang. It can rehandle AIOs
> with EIO error without sending error back to guest. From guest's perspective
> of view it is just like an IO is hanging and not returned. Guest can get
> back running smoothly when I/O is recovered with this feature enabled.

Hi,
This feature seems like an extension of the existing -drive
rerror=/werror= parameters:

  werror=action,rerror=action
      Specify which action to take on write and read errors. Valid
      actions are: "ignore" (ignore the error and try to continue),
      "stop" (pause QEMU), "report" (report the error to the guest),
      "enospc" (pause QEMU only if the host disk is full; report the
      error to the guest otherwise). The default setting is
      werror=enospc and rerror=report.

That mechanism already has a list of requests to retry and live
migration integration. Using the werror=/rerror= mechanism would avoid
code duplication between these features. You could add a
werror/rerror=retry error action for this feature.

Does that sound good?

Stefan
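
For illustration, here is a minimal sketch of how a "retry" action could sit
next to the documented werror=/rerror= actions. It is a standalone toy model,
not QEMU code: the enum names and get_error_action() are hypothetical, and
retrying only on EIO is an assumption taken from the cover letter's
description of the feature.

```c
#include <errno.h>

/* Policy chosen by the user via werror=/rerror= (hypothetical names). */
typedef enum {
    ON_ERROR_IGNORE,   /* "ignore": ignore the error and continue */
    ON_ERROR_STOP,     /* "stop": pause the VM */
    ON_ERROR_REPORT,   /* "report": report the error to the guest */
    ON_ERROR_ENOSPC,   /* "enospc": pause only if the host disk is full */
    ON_ERROR_RETRY,    /* "retry": suggested new action, not an existing option */
} OnError;

/* What the block layer does with one failed request (hypothetical names). */
typedef enum {
    ACTION_IGNORE,
    ACTION_STOP,
    ACTION_REPORT,
    ACTION_RETRY,      /* queue the request and reissue it later */
} ErrorAction;

/* Map a policy and a positive errno value to a concrete action. */
ErrorAction get_error_action(OnError policy, int error)
{
    switch (policy) {
    case ON_ERROR_IGNORE:
        return ACTION_IGNORE;
    case ON_ERROR_STOP:
        return ACTION_STOP;
    case ON_ERROR_ENOSPC:
        /* Pause on a full host disk, otherwise report to the guest. */
        return error == ENOSPC ? ACTION_STOP : ACTION_REPORT;
    case ON_ERROR_RETRY:
        /* Assumption: only EIO is retried; other errors go to the guest. */
        return error == EIO ? ACTION_RETRY : ACTION_REPORT;
    case ON_ERROR_REPORT:
    default:
        return ACTION_REPORT;
    }
}
```

With the policy decision in one place like this, a retry action can reuse
whatever request list and migration handling the other actions already have
instead of keeping a separate one.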