On 26.05.2015 at 11:44, Paolo Bonzini wrote:
>
>
> On 26/05/2015 11:37, Kevin Wolf wrote:
> > > If we run into a timeout we theoretically have the following options:
> > > - reconnect
> > > - retry
> > > - error
> > >
> > > I would reconnect as Ronnie proposed.
> >
> > Just trying to reconnect indefinitely might not be the best option.
> > Consider the situation where you're inside a bdrv_drain_all(), which
> > blocks qemu completely. Trying to reconnect once or twice is probably
> > fine, but if that doesn't work, eventually you want to return an error
> > so that qemu is unstuck.
>
> Whenever the topic of timeout is brought about, I'm worried that
> introducing timeouts (and doing anything except reconnecting) is the
> same as NFS's soft option, which can actually cause data corruption.
> So, why would it be safe?

How would it cause data corruption for qemu, i.e. which of the block
layer assumptions would be broken?

> Considering that, unlike a process stuck on NFS, QEMU can always be
> SIGKILLed, reconnection seems like a pretty good default.

Having to kill a whole VM just because one disk is on an NFS server that
has gone down might somehow be good enough, but I wouldn't call it
"pretty good".

> Perhaps we can have a limited number of retries (like NFS's retrans)
> followed by either reconnect or error?

Perhaps. And unless there is a real corruption scenario, a limited
number of reconnects before we error out.

Kevin