On 18/09/2014 05:26, Alexey Kardashevskiy wrote:
> On 09/18/2014 01:07 AM, Stefan Hajnoczi wrote:
>> On Wed, Sep 17, 2014 at 2:44 PM, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
>>> On 09/17/2014 07:25 PM, Paolo Bonzini wrote:
>>> btw any better idea of a hack to try? Testers are pushing me - they want
>>> to upgrade the broken setup and I am blocking them :) Thanks!
>>
>> Paolo's qemu_co_mutex_lock(&s->lock) idea in qcow2_invalidate_cache()
>> is good. Have you tried that patch?
>
> Yes, did not help.
>
>> I haven't checked the qcow2 code whether that works properly across
>> bdrv_close() (is the lock freed?) but in principle that's how you
>> protect against concurrent I/O.
>
> I thought we have to avoid qemu_coroutine_yield() in this particular case.
> I fail to see how the locks may help if we still do yield. But the whole
> thing is already way beyond my understanding :) For example - how many
> BlockDriverState things are layered here? NBD -> QCOW2 -> RAW?
No, this is an NBD server. So we have three users of the same QCOW2 image:
migration, NBD server and virtio disk (not active while the bug happens, and
thus not depicted):

    NBD server -> QCOW2 <- migration
                    |
                    v
                   File

The problem is that the NBD server accesses the QCOW2 image while migration
does qcow2_invalidate_cache.

Paolo
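
A minimal sketch of the qemu_co_mutex_lock(&s->lock) idea discussed above,
assuming the 2014-era qcow2 code in block/qcow2.c; the reopen path and error
handling are elided, and this is not necessarily the exact patch that was
tried:

/* Hypothetical sketch only.  The point is to take the same CoMutex that the
 * qcow2 read/write coroutines take, so that an NBD request cannot run in the
 * middle of the close/reopen cycle. */
static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
{
    BDRVQcowState *s = bs->opaque;

    /* Serialize against in-flight requests coming from the NBD server. */
    qemu_co_mutex_lock(&s->lock);

    qcow2_close(bs);
    memset(s, 0, sizeof(*s));

    /* ... re-open the image with qcow2_open(), as the real function does;
     * error handling elided ... */

    /* Open question raised above: qcow2_close() and the memset destroy and
     * re-initialize s->lock, so this unlocks a fresh mutex rather than the
     * one taken at the top -- "is the lock freed?" is exactly about this. */
    qemu_co_mutex_unlock(&s->lock);
}

Whether this helps hinges on that last point: an NBD coroutine queued on the
old mutex may never be woken once the state has been zeroed and reopened.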