On Mon, 12 Aug 2024, John Snow wrote:

> On Tue, Apr 16, 2024 at 7:54 AM Stefan Fritsch <s...@sfritsch.de> wrote:
> 
> > adding John Snow to CC because he investigated this in 2020.
> >
> > On Fri, 12 Apr 2024, Eric Blake wrote:
> >
> > > On Fri, Apr 12, 2024 at 10:06:17AM +0200, Stefan Fritsch wrote:
> > > > Commit 99868af3d0 changed the hardcoded constant BDRV_SECTOR_SIZE to a
> > > > dynamic field 'align' but introduced a bug. qemu_iovec_discard_back()
> > > > is now passed the wanted iov length instead of the actually required
> > > > amount that should be removed from the end of the iov.
> > > >
> > > > The bug can likely only be hit in uncommon configurations, e.g. with
> > > > icount enabled or when reading from disk directly to device memory.
> > > >
> > > > Fixes: 99868af3d0a75cf6 ("dma-helpers: explicitly pass alignment into
> > DMA helpers")
> > > > Signed-off-by: Stefan Fritsch <s...@sfritsch.de>
> > > > ---
> > > >  system/dma-helpers.c | 3 +--
> > > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > >
> > > Wow, that bug has been latent for a while (2016, v2.8.0).  As such, I
> > > don't think it's worth holding up 9.0 for, but it is definitely worth
> > > cc'ing qemu-stable (done now).
> >
> > we had some more internal discussions and did some more googling, and it
> > turns out that this is actually part of a known issue:
> >
> > https://lists.nongnu.org/archive/html/qemu-block/2020-07/msg01866.html
> > https://gitlab.com/qemu-project/qemu/-/issues/259
> >
> > In the above mail to qemu-block, John Snow listed 4 problems in the code
> > but my patch only fixes the first one. Considering that the code may also
> > write data to the wrong place (problem 2), I wonder if an assert in the
> > same place would be better until the underlying issues have been fixed?
> >
> 
> Whew, it's been such a long time since I've looked at this, I barely
> remember. I think my email now knows more about the problem than *I*
> currently do. I'm glad I wrote it ...
> 
> I do not currently know what happens if you fix the first issue but not the
> other four; I don't know if it gets passably "less broken" or if it causes
> other bigger issues... and I don't have the bandwidth to test that out
> currently, my apologies.
> 
> If there's a renewed interest in fixing this, I can try to start ramping
> back up on it, but I have a few other projects to finish first, but
> possibly I could discuss this at KVM Forum to help delineate the work into
> manageable chunks.

We came across this while chasing another bug, which turned out to be in 
other unrelated code. I also don't know if fixing this first of several 
issues improves the situation for people running an affected 
configuration. We only use qemu configurations where the broken code path 
is not hit. So, I am not sure this can be called renewed interest in 
fixing the issue. But if this is committed, I think there should be at 
least a comment pointing to the other issues.

Cheers,
Stefan

Reply via email to