Re: Some iotests are failing with -luks

Daniel P . Berrangé Sat, 20 Sep 2025 12:55:30 -0700

On Fri, Sep 12, 2025 at 04:53:19PM +0200, Kevin Wolf wrote:
> Am 12.09.2025 um 16:23 hat Daniel P. Berrangé geschrieben:
> > On Thu, Sep 11, 2025 at 02:13:47PM +0200, Kevin Wolf wrote:
> > > Hm, so block_crypto_read_func() isn't prepared to be called in coroutine
> > > context, but block_crypto_co_amend_luks() still calls it from a
> > > coroutine. The indirection of going through QCrypto won't make it easier
> > > to fix this properly.
> > 
> > Historically block_crypto_read_func() didn't care/know whether it
> > was in a coroutine or not. Bisect tells me the regression was caused
> > by
> > 
> >   commit 1f051dcbdf2e4b6f518db731c84e304b2b9d15ce
> >   Author: Kevin Wolf <[email protected]>
> >   Date:   Fri Oct 27 17:53:33 2023 +0200
> > 
> >     block: Protect bs->file with graph_lock
> > 
> > which added
> >  
> >     GLOBAL_STATE_CODE();
> >     GRAPH_RDLOCK_GUARD_MAINLOOP();
> > 
> > > It seems to me that while block_crypto_read/write_func are effectively
> > > no_coroutine_fn, qcrypto_block_amend_options() should really take
> > > function pointers that can be called from coroutines. It is called from
> > > both coroutine and non-coroutine code paths, so should the function
> > > pointers be coroutine_mixed_fn or do we want to change the callers?
> > >
> > > Either way, we should add the appropriate coroutine markers to the
> > > QCrypto interfaces to show the intention at least.
> > 
> > I'm unclear why QCrypto needs to know about coroutines at all ?
> > It just wants a function pointer that will send or recv a blob
> > of data. In the case of the block layer these functions end up
> > doing I/O via the block APIs, but QCrypto doesn't care about
> > this impl detail.
> 
> Does a case where it's not in the context of the block layer even exist?


Only the unit tests.

> The only callers of qcrypto_block_amend_options() are in block/crypto.c
> and block/qcow2.c. Apart from a test case, qcrypto_block_open() is the
> same.

Yep

> And even ignoring the block layer, doing synchronous I/O outside of
> coroutines is arguably a bug anyway because that's a blocking operation
> that stops the mainloop from making progress.

More generally, simply opening a LUKS volume can also impose a significant
delay because key validation is intentionally slow in wallclock time. So we
should get a minimum of 1 second delay, but if given an image created on a
significantly faster machine (or a malicious image), the larger 'iterations'
count could make us take way longer to open the image.

I guess that's a potential problem too ?

Amending the keys has the same performance penalty too as that involves
same intentionally slow crypto

> But if we don't want to fix it at the QCrypto level, that makes the
> function pointer implicitly coroutine_mixed_fn and the function needs to
> be rewritten to check qemu_in_coroutine() and probably take the graph
> lock internally before calling bdrv_co_pread() in the coroutine case,
> unless we can prove that all callers already hold it. Unfortunately,
> function pointers still defeat TSA, so this proof will have to be
> manual.

So IIUC the 'open' operation is not in a coroutine, while the
'amend' operation is in a coroutine ?

IIUC the coroutine_mixed_fn is expanding to a no-op. What is the
actual functional fix needed to stop the crash ?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: Some iotests are failing with -luks

Reply via email to