Hi Kevin, Am 26.03.24 um 13:44 schrieb Kevin Wolf: > Am 22.03.2024 um 10:50 hat Fiona Ebner geschrieben: >> The old_bs variable in bdrv_next() is currently determined by looking >> at the old block backend. However, if the block graph changes before >> the next bdrv_next() call, it might be that the associated BDS is not >> the same that was referenced previously. In that case, the wrong BDS >> is unreferenced, leading to an assertion failure later: >> >>> bdrv_unref: Assertion `bs->refcnt > 0' failed. > > Your change makes sense, but in theory it shouldn't make a difference. > The real bug is in the caller, you can't allow graph modifications while > iterating the list of nodes. Even if it doesn't crash (like after your > patch), you don't have any guarantee that you will have seen every node > that exists that the end - and maybe not even that you don't get the > same node twice. > >> In particular, this can happen in the context of bdrv_flush_all(), >> when polling for bdrv_co_flush() in the generated co-wrapper leads to >> a graph change (for example with a stream block job [0]). > > The whole locking around this case is a bit tricky and would deserve > some cleanup. > > The basic rule for bdrv_next() callers is that they need to hold the > graph reader lock as long as they are iterating the graph, otherwise > it's not safe. This implies that the ref/unref pairs in it should never > make a difference either - which is important, because at least > releasing the last reference is forbidden while holding the graph lock. > I intended to remove the ref/unref for bdrv_next(), but I didn't because > I realised that the callers need to be audited first that they really > obey the rules. You found one that would be problematic. > > The thing that bdrv_flush_all() gets wrong is that it promises to follow > the graph lock rules with GRAPH_RDLOCK_GUARD_MAINLOOP(), but then calls > something that polls. The compiler can't catch this because bdrv_flush() > is a co_wrapper_mixed_bdrv_rdlock. The behaviour for these functions is: > > - If called outside of coroutine context, they are GRAPH_UNLOCKED > - If called in coroutine context, they are GRAPH_RDLOCK > > We should probably try harder to get rid of the mixed functions, because > a synchronous co_wrapper_bdrv_rdlock could actually be marked > GRAPH_UNLOCKED in the code and then the compiler could catch this case. > > The fix for bdrv_flush_all() is probably to make it bdrv_co_flush_all() > with a coroutine wrapper so that the graph lock is held for the whole > function. Then calling bdrv_co_flush() while iterating the list is safe > and doesn't allow concurrent graph modifications.
I attempted this now, but ran into two issues: The first is that I had to add support for a function without arguments to the block-coroutine-wrapper.py script. I can send this as an RFC in any case if desired. The second is that iotest 255 ran into an assertion failure upon QMP 'quit': > ../block/graph-lock.c:113: bdrv_graph_wrlock: Assertion > `!qemu_in_coroutine()' failed. Looking at the backtrace: > #5 0x0000762a90cc3eb2 in __GI___assert_fail > (assertion=0x5afb07991e7d "!qemu_in_coroutine()", file=0x5afb07991e00 > "../block/graph-lock.c", line=113, function=0x5afb07991f20 > <__PRETTY_FUNCTION__.4> "bdrv_graph_wrlock") > at ./assert/assert.c:101 > #6 0x00005afb07585311 in bdrv_graph_wrlock () at ../block/graph-lock.c:113 > #7 0x00005afb07573a36 in blk_remove_bs (blk=0x5afb0af99420) at > ../block/block-backend.c:901 > #8 0x00005afb075729a7 in blk_delete (blk=0x5afb0af99420) at > ../block/block-backend.c:487 > #9 0x00005afb07572d88 in blk_unref (blk=0x5afb0af99420) at > ../block/block-backend.c:547 > #10 0x00005afb07572fe8 in bdrv_next (it=0x762a852fef00) at > ../block/block-backend.c:618 > #11 0x00005afb0758cd65 in bdrv_co_flush_all () at ../block/io.c:2347 > #12 0x00005afb0753ba37 in bdrv_co_flush_all_entry (opaque=0x7ffff12c6050) at > block/block-gen.c:1391 > #13 0x00005afb0773bf41 in coroutine_trampoline (i0=168365184, i1=23291) So I guess calling bdrv_next() is not safe from a coroutine, because the function doing the iteration could end up being the last thing to have a reference for the BB. Best Regards, Fiona