On Fri, Sep 12, 2025 at 04:53:19PM +0200, Kevin Wolf wrote: > Am 12.09.2025 um 16:23 hat Daniel P. Berrangé geschrieben: > > On Thu, Sep 11, 2025 at 02:13:47PM +0200, Kevin Wolf wrote: > > > Hm, so block_crypto_read_func() isn't prepared to be called in coroutine > > > context, but block_crypto_co_amend_luks() still calls it from a > > > coroutine. The indirection of going through QCrypto won't make it easier > > > to fix this properly. > > > > Historically block_crypto_read_func() didn't care/know whether it > > was in a coroutine or not. Bisect tells me the regression was caused > > by > > > > commit 1f051dcbdf2e4b6f518db731c84e304b2b9d15ce > > Author: Kevin Wolf <[email protected]> > > Date: Fri Oct 27 17:53:33 2023 +0200 > > > > block: Protect bs->file with graph_lock > > > > which added > > > > GLOBAL_STATE_CODE(); > > GRAPH_RDLOCK_GUARD_MAINLOOP(); > > > > > It seems to me that while block_crypto_read/write_func are effectively > > > no_coroutine_fn, qcrypto_block_amend_options() should really take > > > function pointers that can be called from coroutines. It is called from > > > both coroutine and non-coroutine code paths, so should the function > > > pointers be coroutine_mixed_fn or do we want to change the callers? > > > > > > Either way, we should add the appropriate coroutine markers to the > > > QCrypto interfaces to show the intention at least. > > > > I'm unclear why QCrypto needs to know about coroutines at all ? > > It just wants a function pointer that will send or recv a blob > > of data. In the case of the block layer these functions end up > > doing I/O via the block APIs, but QCrypto doesn't care about > > this impl detail. > > Does a case where it's not in the context of the block layer even exist?
Only the unit tests. > The only callers of qcrypto_block_amend_options() are in block/crypto.c > and block/qcow2.c. Apart from a test case, qcrypto_block_open() is the > same. Yep > And even ignoring the block layer, doing synchronous I/O outside of > coroutines is arguably a bug anyway because that's a blocking operation > that stops the mainloop from making progress. More generally, simply opening a LUKS volume can also impose a significant delay because key validation is intentionally slow in wallclock time. So we should get a minimum of 1 second delay, but if given an image created on a significantly faster machine (or a malicious image), the larger 'iterations' count could make us take way longer to open the image. I guess that's a potential problem too ? Amending the keys has the same performance penalty too as that involves same intentionally slow crypto > But if we don't want to fix it at the QCrypto level, that makes the > function pointer implicitly coroutine_mixed_fn and the function needs to > be rewritten to check qemu_in_coroutine() and probably take the graph > lock internally before calling bdrv_co_pread() in the coroutine case, > unless we can prove that all callers already hold it. Unfortunately, > function pointers still defeat TSA, so this proof will have to be > manual. So IIUC the 'open' operation is not in a coroutine, while the 'amend' operation is in a coroutine ? IIUC the coroutine_mixed_fn is expanding to a no-op. What is the actual functional fix needed to stop the crash ? With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
