On 26/05/2015 12:06, Kevin Wolf wrote:
> Am 26.05.2015 um 11:44 hat Paolo Bonzini geschrieben:
>>
>> On 26/05/2015 11:37, Kevin Wolf wrote:
>>>> If we run into a timeout we theoretically have the following options:
>>>> - reconnect
>>>> - retry
>>>> - error
>>>>
>>>> I would reconnect as Ronnie proposed.
>>>
>>> Just trying to reconnect indefinitely might not be the best option.
>>> Consider the situation where you're inside a bdrv_drain_all(), which
>>> blocks qemu completely. Trying to reconnect once or twice is probably
>>> fine, but if that doesn't work, eventually you want to return an error
>>> so that qemu is unstuck.
>>
>> Whenever the topic of timeouts is brought up, I'm worried that
>> introducing timeouts (and doing anything except reconnecting) is the
>> same as NFS's soft option, which can actually cause data corruption.
>> So, why would it be safe?
>
> How would it cause data corruption for qemu, i.e. which of the block
> layer assumptions would be broken?
Reordering of operations. Say you have:

    guest -> QEMU    write A to sector 1
    QEMU  -> NFS     write A to sector 1
    QEMU  -> guest   write A to sector 1 timed out
    guest -> QEMU    write B to sector 1

At this point the two outstanding writes are for the same sector and
have different payloads, so it's undefined which one wins:

    QEMU -> NFS      write B to sector 1
    NFS  -> QEMU     write B to sector 1 completed
    QEMU -> guest    write B to sector 1 completed
    NFS  -> QEMU     write A to sector 1 completed
                     (QEMU doesn't report this to the guest)

The guest thinks it has written B, but it's possible that the storage
has written A.

Paolo
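To make the hazard concrete, here is a minimal sketch (not from the thread, and deliberately abstract: the storage is modeled as a plain dict, not a real NFS backend) showing that when two outstanding writes to the same sector complete in the opposite order from what the guest observed, the final on-disk state disagrees with what the guest believes:

```python
def apply_completions(storage, completions):
    """Apply writes to `storage` in the order the backend completes them.

    Each completion is a (sector, payload) pair; a later completion to
    the same sector overwrites an earlier one, exactly as the stale
    write A overwrites B in the trace above.
    """
    for sector, payload in completions:
        storage[sector] = payload
    return storage


# Guest's view: write A timed out, write B was acknowledged,
# so the guest believes sector 1 contains B.
guest_believes = "B"

# Backend's view: B completed first, then the stale A finally landed.
storage = apply_completions({}, [(1, "B"), (1, "A")])

print(storage[1])       # "A" -- the storage disagrees with the guest
print(guest_believes)   # "B"
```

The point is that once a timed-out request is reported as failed while it is still in flight, nothing orders it against later writes to the same sector, which is exactly why treating a timeout as an error (rather than reconnecting and keeping the request outstanding) can corrupt data.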