On Fri, Mar 17, 2017 at 12:27 PM, Paolo Bonzini <pbonz...@redhat.com> wrote: > And this is a fix, but I have no idea why/how it works and what else it > may break. > > Patches 1 and 2 are pretty obvious and would be the first step towards > eliminating aio_disable/enable_external altogether. > > However I got patch 3 more or less by trial and error, and when I > thought I had the reasoning right I noticed this: > > bdrv_drained_end(state->old_bs); > > in external_snapshot_clean which makes no sense given the > > bdrv_drained_begin(bs_new); > > that I added to bdrv_append. So take this with a ton of salt. > > The basic idea is that calling child->role->drained_begin and > child->role->drained_end is not necessary and in fact actively wrong > when both the old and the new child should be in a drained section. > But maybe instead it should be asserted that they are, except for the > special case of adding or removing a child. i.e. after > > int drain = !!(old_bs && old_bs->quiesce_counter) - !!(new_bs && > new_bs->quiesce_counter); > > add > > assert(!(drain && old_bs && new_bs)); > > Throwing this out because it's Friday evening... Maybe Fam can pick > it up on Monday.
OK, thanks. It would be good to figure this out for 2.9, since the workaround of disabling iothreads will affect performance. Let me know if there's anything I can do to help. Meanwhile I'm also looking into an intermittent crash running block-commit on an external snapshot. This is with an earlier QEMU snapshot (from around 20 Jan): /x/qemu/deb/debuild/block.c:2433: bdrv_append: Assertion `!bdrv_requests_pending(bs_top)' failed. That assertion no longer exists in the current master, but I'm trying to reproduce it reliably and see whether the bug itself has disappeared. --Ed