On 18/09/2014 05:26, Alexey Kardashevskiy wrote:
> On 09/18/2014 01:07 AM, Stefan Hajnoczi wrote:
>> On Wed, Sep 17, 2014 at 2:44 PM, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
>>> On 09/17/2014 07:25 PM, Paolo Bonzini wrote:
>>> btw any better idea of a hack to try? Testers are pushing me - they want to
>>> upgrade the broken setup and I am blocking them :) Thanks!
>>
>> Paolo's qemu_co_mutex_lock(&s->lock) idea in qcow2_invalidate_cache()
>> is good.  Have you tried that patch?
> 
> 
> Yes, did not help.
> 
>>
>> I haven't checked whether that works properly across bdrv_close() in
>> the qcow2 code (is the lock freed?), but in principle that's how you
>> protect against concurrent I/O.
> 
> I thought we have to avoid qemu_coroutine_yield() in this particular case.
> I fail to see how the locks may help if we still do yield. But the whole
> thing is already way beyond my understanding :) For example - how many
> BlockDriverState things are layered here? NBD -> QCOW2 -> RAW?

No, this is an NBD server, not another layered BlockDriverState.  So we
have three users of the same QCOW2 image: migration, the NBD server and
the virtio disk (the last is not active while the bug happens, and thus
not depicted):


              NBD server   ->    QCOW2     <-     migration
                                   |
                                   v
                                 File

The problem is that the NBD server accesses the QCOW2 image while
migration calls qcow2_invalidate_cache().
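
Just to make the serialization idea concrete, below is a minimal,
self-contained sketch.  It uses plain pthreads as a stand-in for QEMU
coroutines so it builds outside the tree (gcc -pthread); the names
(nbd_request, invalidate_cache, image_valid) are illustrative only and
do not correspond to the real qcow2/NBD code.  The point is the one
Stefan made: if the invalidate path takes the same lock that the
request path takes (s->lock in qcow2), the close/reopen cycle cannot
interleave with an in-flight request.

/* Sketch only: pthread mutex standing in for the qcow2 CoMutex. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for s->lock */
static int image_valid = 1;                              /* stands in for the qcow2 state */

/* Models an NBD request handler: it only touches the image state while
 * holding the lock, so it can never observe a half-reopened image. */
static void *nbd_request(void *arg)
{
    for (int i = 0; i < 5; i++) {
        pthread_mutex_lock(&lock);
        printf("request %ld sees image_valid=%d\n", (long)arg, image_valid);
        pthread_mutex_unlock(&lock);
        usleep(1000);
    }
    return NULL;
}

/* Models the invalidate-cache path: the close/reopen cycle happens
 * atomically with respect to the request handlers because it holds
 * the same lock. */
static void *invalidate_cache(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    image_valid = 0;   /* "close": state is temporarily invalid */
    usleep(5000);
    image_valid = 1;   /* "reopen": state is valid again */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t r1, r2, inv;
    pthread_create(&r1, NULL, nbd_request, (void *)1L);
    pthread_create(&r2, NULL, nbd_request, (void *)2L);
    pthread_create(&inv, NULL, invalidate_cache, NULL);
    pthread_join(r1, NULL);
    pthread_join(r2, NULL);
    pthread_join(inv, NULL);
    return 0;
}

In the real code the lock would be the CoMutex in BDRVQcowState and the
workers would be coroutines, which is where Stefan's question comes in:
that state is torn down by bdrv_close(), so the lock has to survive (or
be re-taken around) the close/reopen sequence for the scheme to work.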

Paolo
