Am 17.01.2022 um 09:46 hat Hanna Reitz geschrieben:
> On 16.01.22 19:09, Nir Soffer wrote:
> > On Sun, Jan 16, 2022 at 1:17 PM Nir Soffer <nsof...@redhat.com> wrote:
> > > Some of our tests started to fail since qemu-img 6.2.0 was released.
> > >
> > > Here is an example failure:
> > >
> > > $ truncate -s1g /data/scratch/empty-1g.raw
> > >
> > > $ qemu-nbd -r -t -e8 -k /tmp/nbd.sock -A -f raw /data/scratch/empty-1g.raw &
> >
> > git bisect points to this commit:
> >
> > $ git bisect good
> > 0bc329fbb009f8601cec23bf2bc48ead0c5a5fa2 is the first bad commit
> > commit 0bc329fbb009f8601cec23bf2bc48ead0c5a5fa2
> > Author: Hanna Reitz <hre...@redhat.com>
> > Date:   Thu Aug 12 10:41:44 2021 +0200
> >
> >     block: block-status cache for data regions
> >
> > The commit message says:
> >
> >     (Note that only caching data regions but not zero regions means that
> >     returning false information from the cache is not catastrophic: Treating
> >     zeroes as data is fine. While we try to invalidate the cache on zero
> >     writes and discards, such incongruences may still occur when there are
> >     other processes writing to the image.)
> >
> > I don't think it is fine to report zero as data. This can cause severe
> > performance issues when users copy unneeded zero extents over the
> > network, instead of doing no work with zero bandwidth.
>
> You’re right, it was meant as a contrast to whether this might lead to
> catastrophic data failure.
>
> But it was also meant as a risk estimation. There wasn’t supposed to be a
> situation where the cache would report zeroes as data (other than with
> external writers), and so that’s why I thought it’d be fine. But, well,
> things are supposed to be one way, and then you (me) write buggy code...
>
> Thanks for the reproducer steps, I can reproduce the bug with your script
> (not with nbdinfo fwiw) and the problem seems to be this:
>
> diff --git a/block/io.c b/block/io.c
> index bb0a254def..83e94faeaf 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2498,7 +2498,8 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>               * the cache requires an RCU update, so double check here to avoid
>               * such an update if possible.
>               */
> -            if (ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
> +            if (want_zero &&
> +                ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
>                   QLIST_EMPTY(&bs->children))
>               {
>                   /*
>
> We should update the cache only when we have accurate zero information,
> so only if want_zero was true.
Either that, or store the want_zero value as well and compare it before
using the cache. Depends on whether we need the optimisation for
want_zero=false cases, too.

I think this mostly means bdrv_is_allocated(_above)(), which seems to be
relevant for the commit and stream jobs and 'qemu-img rebase' at least,
and some less commonly used stuff. There are some other cases of
is_allocated (like in mirror) where it doesn't make a difference because
we always process the maximum number of bytes with the first call.

> > Do we have a way to disable this cache, at least for nbd?
>
> No, unfortunately not. We could implement it, but I think the management
> layer would need to specify it for all protocol nodes where it wants the
> cache to be disabled.
>
> Auto-detecting would be difficult for qemu, because it would basically
> mean detecting whether there are any exports in the block graph somewhere
> above the node in question (not even immediately above it, because for
> example in your example there’s a raw node between the export and the
> protocol node whose block-status information we’re caching).
>
> I assume you’re now very skeptical of the cache, so even if we implement
> a fix for this problem, you’ll probably still want that option to disable
> it, right?

I don't think we can realistically provide an option for every individual
optimisation where there could be a bug. We just need to fix the problem
and make sure to return correct data when there is no external writer.

Kevin