On Mon 14 Dec 2015 09:23:11 AM CET, Stefan Hajnoczi <stefa...@gmail.com> wrote:

>>   2) When a block-stream operation calls bdrv_reopen(), it drains all
>>      images. This can trigger the completion of a different
>>      block-stream operation, that will remove the intermediate images
>>      from the chain. When the original bdrv_reopen() continues, it
>>      still contains pointers to the images that have just been
>>      removed, crashing QEMU.
>> 
>>      The problem is described here:
>> 
>>      https://lists.gnu.org/archive/html/qemu-block/2015-06/msg00701.html
>> 
>>      I'm still not sure about how to deal with this. I've been taking
>>      a look at the bdrv_drained_begin/end() API, but as I understand
>>      it it prevents requests from a different AioContext.  Since all
>>      BDS in the same chain share the same context it does not really
>>      help here.
>
> You are right, bdrv_drained_begin/end() does not solve this problem.
>
> bdrv_reopen_multiple() claims to be atomic but can call aio_poll().
>
> I think blockjobs need the ability to quiesce or attach/detach from the
> AioContext (like BlockDriverStates).  That way reopen can be made truly
> atomic.
>
> The solution isn't clear in my mind.  Adding a quiesce operation to all
> blockjobs is probably not that hard to do.  I wonder whether that is a
> nice long-term solution though...

Yeah, I understand. Some things come to my mind:

  - There are BDSs in the reopen queue even if none of their options are
    going to be modified. Is this necessary? Is it possible to remove
    them from the queue if we know that they are not going to be
    affected? And would that be enough? :)

  - We can disallow this kind of parallel operations altogether and
    allow only one block-stream operation per BDS chain at a time. I
    think this would simplify lots of things, but of course at the
    expense of flexibility.

    This would require us to either

      a) block the whole chain. This can be reverted in the future if we
         ever figure out how to allow parallel operations.

      b) disallow block jobs on arbitrary nodes. This would be similar
         to what 'block-commit' does: install the job on the active
         layer even if the user wants to commit the data from some other
         node (using the 'top' parameter).

Berto

Reply via email to