Re: [Qemu-block] RFC block/iscsi command timeout

Kevin Wolf Tue, 26 May 2015 03:06:51 -0700

Am 26.05.2015 um 11:44 hat Paolo Bonzini geschrieben:
> 
> 
> On 26/05/2015 11:37, Kevin Wolf wrote:
> > > If we run into a timeout we theoretically have the following options:
> > >  - reconnect
> > >  - retry
> > >  - error
> > > 
> > > I would reconnect as Ronnie proposed.
> > 
> > Just trying to reconnect indefinitely might not be the best option.
> > Consider the situation where you're inside a bdrv_drain_all(), which
> > blocks qemu completely. Trying to reconnect once or twice is probably
> > fine, but if that doesn't work, eventually you want to return an error
> > so that qemu is unstuck.
> 
> Whenever the topic of timeouts is brought up, I'm worried that
> introducing timeouts (and doing anything except reconnecting) is the
> same as NFS's soft option, which can actually cause data corruption.
> So, why would it be safe?


How would it cause data corruption for qemu, i.e. which of the block
layer assumptions would be broken?

> Considering that, unlike a process stuck on NFS, QEMU can always be
> SIGKILLed, reconnection seems like a pretty good default.

Having to kill a whole VM just because one disk is on an NFS server that
has gone down might somehow be good enough, but I wouldn't call it
"pretty good".

> Perhaps we can have a limited number of retries (like NFS's retrans)
> followed by either reconnect or error?

Perhaps. And, unless there is a real corruption scenario, a limited
number of reconnects before we error out.
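A minimal sketch of the bounded policy discussed above: a few retries per connection (like NFS's retrans), then a limited number of reconnects, then an error. It is written in C because that is QEMU's language. All names (`timeout_policy`, `policy_next`, `simulate_dead_target`, the `ACT_*` values) are hypothetical and are not QEMU or libiscsi APIs. Actually resending the command and re-establishing the iSCSI session are left to the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result of one command attempt (not QEMU/libiscsi names). */
typedef enum { CMD_OK, CMD_TIMEOUT } cmd_result;

/* Hypothetical next step chosen by the timeout policy. */
typedef enum { ACT_DONE, ACT_RETRY, ACT_RECONNECT, ACT_ERROR } action;

typedef struct {
    int max_retries;    /* retries per connection, like NFS "retrans" */
    int max_reconnects; /* reconnects allowed before erroring out */
    int retries;        /* retries used on the current connection */
    int reconnects;     /* reconnects used so far */
} timeout_policy;

static void policy_init(timeout_policy *p, int max_retries, int max_reconnects)
{
    assert(max_retries >= 0 && max_reconnects >= 0);
    p->max_retries = max_retries;
    p->max_reconnects = max_reconnects;
    p->retries = 0;
    p->reconnects = 0;
}

/* Decide what to do after an attempt completes or times out. */
static action policy_next(timeout_policy *p, cmd_result r)
{
    if (r == CMD_OK) {
        return ACT_DONE;
    }
    if (p->retries < p->max_retries) {
        p->retries++;
        return ACT_RETRY;       /* resend on the same connection */
    }
    if (p->reconnects < p->max_reconnects) {
        p->reconnects++;
        p->retries = 0;         /* new connection gets a fresh retry budget */
        return ACT_RECONNECT;   /* reconnect, then resend the command */
    }
    /* Complete the request with an error so that a caller blocked on it,
       e.g. inside bdrv_drain_all(), can make progress again. */
    return ACT_ERROR;
}

static const char *action_name(action a)
{
    switch (a) {
    case ACT_DONE:      return "done";
    case ACT_RETRY:     return "retry";
    case ACT_RECONNECT: return "reconnect";
    default:            return "error";
    }
}

/* Simulate a target that never answers: every attempt times out.
   Prints each decision and returns how many decisions were made. */
static int simulate_dead_target(timeout_policy *p)
{
    int steps = 0;
    action a;
    do {
        a = policy_next(p, CMD_TIMEOUT);
        printf("%s\n", action_name(a));
        steps++;
    } while (a != ACT_ERROR);
    return steps;
}
```

With `policy_init(&p, 2, 1)` and a target that never responds, the decisions are retry, retry, reconnect, retry, retry, error: the request fails after a bounded number of attempts instead of keeping qemu stuck indefinitely. Setting `max_reconnects` very high approximates the "always reconnect" default, so both positions in this thread map onto the same two knobs.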

Kevin