On 02/06/2015 18:43, ronnie sahlberg wrote:
> If we change this to iSCSI, we can actually avoid this by using task
> management functions:
>
> guest -> QEMU   write A to sector 1
> QEMU -> iSCSI   write A to sector 1
> ... timeout...
> QEMU -> iSCSI   task management: abort task for Write A  (**A)
> QEMU -> guest   write A to sector 1 timed out
> guest -> QEMU   write B to sector 1  (**B)
>
> I think that IF a task times out and then IF you then immediately
> generate and send a task management abort task to the
> target, and you do this before you tell the guest the i/o failed, then
> all should be good.

You still have to wait for the answer to the TMF, so this doesn't help
much. :-(

Paolo

> That should guarantee the ordering of **A always being sent to the
> target before **B
> so the race should not happen.
>
> At this point the two outstanding writes are for the same
> sector and with different payloads, so it's undefined which one
> wins.
>
> QEMU -> NFS     write B to sector 1
> NFS -> QEMU     write B to sector 1 completed
> QEMU -> guest   write B to sector 1 completed
> NFS -> QEMU     write A to sector 1 completed
>                 (QEMU doesn't report this to the guest)
>
> The guest thinks it has written B, but it's possible that the
> storage has written A.
>
> So you would go for infinite reconnecting? We can SIGKILL then anyway.
>
> As said before my idea would be default of 5000ms for all sync calls and
> no timeout for all async calls coming from the block layer.
>
> A user settable timeout can be optionally specified via cmdline options
> to define a timeout for both sync and async calls.
>
> Sounds sane to me.
>
> As for infinite reconnect. I guess that since these disks are not
> exposed as "removable" to the
> guest, there is not really much recovery that the guest kernel can do if
> the disk goes away and never returns
> so there might not be much point in not having infinite reconnect attempts.
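
For reference, the timeout policy proposed in the quoted text could be sketched roughly as below. This is only an illustration in Python, not QEMU or libiscsi code: the function name effective_timeout_ms and the constant name are made up for the example, and only the 5000ms figure and the sync/async split come from the thread.

```python
from typing import Optional

# Default proposed in the thread for synchronous calls (milliseconds).
DEFAULT_SYNC_TIMEOUT_MS = 5000


def effective_timeout_ms(is_async: bool,
                         user_timeout_ms: Optional[int] = None) -> Optional[int]:
    """Return the timeout to apply in ms, or None for "no timeout".

    Hypothetical helper: a user-supplied timeout (e.g. from a cmdline
    option) applies to both sync and async calls; otherwise sync calls
    get the 5000ms default and async calls from the block layer get none.
    """
    if user_timeout_ms is not None:
        return user_timeout_ms
    return None if is_async else DEFAULT_SYNC_TIMEOUT_MS
```

With no user setting, effective_timeout_ms(False) yields 5000 and effective_timeout_ms(True) yields None, so async requests wait across reconnects rather than being failed back to the guest.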