On 17.10.2023 at 07:19, Michael Tokarev wrote:
> 05.09.2023 17:50, Kevin Wolf wrote:
> > virtio_load() as a whole should run in coroutine context because it
> > reads from the migration stream and we don't want this to block.
> >
> > However, it calls virtio_set_features_nocheck() and devices don't
> > expect their .set_features callback to run in a coroutine and therefore
> > call functions that may not be called in coroutine context. To fix this,
> > drop out of coroutine context for calling virtio_set_features_nocheck().
> ...
> > Cc: qemu-sta...@nongnu.org
> > Buglink: https://issues.redhat.com/browse/RHEL-832
> > Signed-off-by: Kevin Wolf <kw...@redhat.com>
>
> It looks like this change caused an interesting regression,
> https://gitlab.com/qemu-project/qemu/-/issues/1933
> at least in -stable. Can you take a look please?

Huh?! This is an interesting one indeed.

I can't see any direct connection between the patch and this regression.
Random memory corruption is the only explanation I have. But I'm not
sure how this patch could cause it; the patch is quite simple.

The next step is probably to find a simple reproducer on the QEMU level.
Then we could run it under valgrind, or get stack traces for the call to
virtio_set_features_nocheck_maybe_co(). The stack trace for the crash
and maybe the content of 's' would be interesting, too. We can ask the
reporter for those; the core dump should be enough for that.

Another potentially interesting question is whether, after yielding, the
coroutine is indeed reentered from the aio_co_wake() call in the patch,
or whether something else wakes it up. If it were the latter, that could
explain memory corruption.

> BTW, Kevin, do you have account @gitlab?

Yes, @kmwolf.

Kevin
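
P.S. For anyone following along, the pattern the commit message describes
(schedule a callback to run outside coroutine context, yield, and let the
callback re-enter the coroutine when it is done) can be modelled in a few
lines of standalone C. This is only a toy sketch built on POSIX ucontext,
not the QEMU code: toy_enter(), toy_yield(), pending_bh and run_demo() are
hypothetical names invented here, and the one-slot pending_bh merely
stands in for a bottom half. It counts wakeups so that a re-entry from
anywhere other than the expected waker would show up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <ucontext.h>

/* Toy model only: none of these names are QEMU APIs. */
static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];

static bool in_coroutine;        /* toy stand-in for "are we in a coroutine?" */
static bool callback_ran_in_co;  /* context the callback observed */
static bool callback_done;       /* set once the callback has finished */
static bool woke_after_callback; /* was the callback done when we woke up? */
static int wakeups;              /* number of re-entries after the yield */

typedef void BHFunc(void);
static BHFunc *pending_bh;       /* one-slot stand-in for a bottom half */

/* Switch from the main loop into the coroutine. */
static void toy_enter(void)
{
    in_coroutine = true;
    swapcontext(&main_ctx, &co_ctx);
    in_coroutine = false;        /* back in the main loop */
}

/* Switch from the coroutine back to whoever entered it. */
static void toy_yield(void)
{
    swapcontext(&co_ctx, &main_ctx);
}

/* Stand-in for a device callback that must not run in a coroutine. */
static void set_features_cb(void)
{
    callback_ran_in_co = in_coroutine;
    callback_done = true;
}

/* Runs from the main loop: call the callback, then wake the coroutine. */
static void set_features_bh(void)
{
    set_features_cb();
    toy_enter();
}

/* Coroutine body: hand the call to the main loop and wait for the wakeup. */
static void load_co(void)
{
    pending_bh = set_features_bh;
    toy_yield();
    wakeups++;
    woke_after_callback = callback_done;
}

/* Returns 1 if the callback ran outside the coroutine and woke it once. */
int run_demo(void)
{
    in_coroutine = callback_ran_in_co = callback_done = false;
    woke_after_callback = false;
    wakeups = 0;
    pending_bh = NULL;

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &main_ctx;  /* return to the main loop when done */
    makecontext(&co_ctx, load_co, 0);

    toy_enter();                 /* runs load_co() up to its yield */

    while (pending_bh) {         /* toy main loop */
        BHFunc *bh = pending_bh;
        pending_bh = NULL;
        bh();
    }

    return !callback_ran_in_co && woke_after_callback && wakeups == 1;
}
```

Calling run_demo() from a small main() should return 1 in this model. If
something other than set_features_bh() re-entered the coroutine first,
woke_after_callback would be false, which is the kind of early wakeup
asked about above.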