On Thu, May 30, 2024 at 2:12 PM Alex Bennée <alex.ben...@linaro.org> wrote:
> Cord Amfmgm <dmamf...@gmail.com> writes:
>
> > On Thu, May 30, 2024 at 3:33 AM Alex Bennée <alex.ben...@linaro.org> wrote:
> >
> > Cord Amfmgm <dmamf...@gmail.com> writes:
> >
> > > On Tue, May 28, 2024 at 11:32 AM Peter Maydell <peter.mayd...@linaro.org> wrote:
> > >
> > > On Tue, 28 May 2024 at 16:37, Cord Amfmgm <dmamf...@gmail.com> wrote:
> > > >
> > > > On Tue, May 28, 2024 at 9:03 AM Peter Maydell <peter.mayd...@linaro.org> wrote:
> > > >>
> > > >> On Mon, 20 May 2024 at 23:24, Cord Amfmgm <dmamf...@gmail.com> wrote:
> > > >> > On Mon, May 20, 2024 at 12:05 PM Peter Maydell <peter.mayd...@linaro.org> wrote:
> > <snip>
> > > >> > And here's an example buffer of length 0 -- you probably already know what I'm going to do here:
> > > >> >
> > > >> > char buf[0];
> > > >> > char * CurrentBufferPointer = &buf[0];
> > > >> > char * BufferEnd = &buf[-1]; // "address of the last byte in the buffer"
> > > >> > // The OHCI Host Controller then advances CurrentBufferPointer like this: CurrentBufferPointer += 0
> > > >> > // After the transfer:
> > > >> > // CurrentBufferPointer = &buf[0];
> > > >> > // BufferEnd = &buf[-1];
> > > >>
> > > >> Right, but why do you think this is valid, rather than
> > > >> being a guest software bug? My reading of the spec is that it's
> > > >> pretty clear about how to say "zero length buffer", and this
> > > >> isn't it.
> > > >>
> > > >> Is there some real-world guest OS that programs the OHCI
> > > >> controller this way that we're trying to accommodate?
> > > >
> > > >
> > > > qemu versions 4.2 and before allowed this behavior.
> > >
> > > So? That might just mean we had a bug and we fixed it.
> > > 4.2 is a very old version of QEMU and nobody seems to have
> > > complained in the four years since we released 5.0 about this,
> > > which suggests that generally guest OS drivers don't try
> > > to send zero-length buffers in this way.
> > >
> > > > I don't think it's valid to ask for a *popular* guest OS as a proof-of-concept because I'm not an expert on those.
> > >
> > > I didn't ask for "popular"; I asked for "real-world".
> > > What is the actual guest code you're running that falls over
> > > because of the behaviour change?
> > >
> > > More generally, why do you want this behaviour to be
> > > changed? Reasonable reasons might include:
> > > * we're out of spec based on reading the documentation
> > > * you're trying to run some old Windows VM/QNX/etc image,
> > >   and it doesn't work any more
> > > * all the real hardware we tested behaves this way
> > >
> > > But don't necessarily include:
> > > * something somebody wrote and only tested on QEMU happens to
> > >   assume the old behaviour rather than following the hw spec
> > >
> > > QEMU occasionally works around guest OS bugs, but only
> > > when we really have to. It's usually better to fix the
> > > bug in the guest.
> > >
> > > It's not, and I've already demonstrated that real hardware is consistent with the fix in this patch.
> > >
> > > Please check your tone.
> >
> > I don't think that is a particularly helpful comment for someone who is
> > taking the time to review your patches. Reading through the thread I
> > didn't see anything that said this is how real HW behaves but I may well
> > have missed it. However you have a number of review comments to address
> > so I suggest you spin a v2 of the series to address them and outline the
> > reason to accept an out of spec transaction.
> >
> > I did a rework of the patch -- see my email from May 20, quoted below -- and I was under the impression it addressed all the
> > review comments. Did I miss something? I apologize if I did.
>
> Ahh I see - I'd only seen this thread continue so wasn't aware a new
> version had been posted. For future patches consider using -vN when
> sending them so we can clearly see a new revision is available.
>
> >
> >> index acd6016980..71b54914d3 100644
> >> --- a/hw/usb/hcd-ohci.c
> >> +++ b/hw/usb/hcd-ohci.c
> >> @@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
> >>          if ((td.cbp & 0xfffff000) != (td.be & 0xfffff000)) {
> >>              len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
> >>          } else {
> >> -            if (td.cbp > td.be) {
> >> -                trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
> >> +            if (td.cbp - 1 > td.be) {  /* rely on td.cbp != 0 */
> >
> >> Reading through the thread I didn't see anything that said this is how real HW behaves but I may well have missed it.
> >
> > This is what I wrote regarding real HW:
> >
> > Results are:
> >
> > qemu 4.2    | qemu HEAD  | actual HW
> > ------------+------------+------------
> > works fine  | ohci_die() | works fine
> >
> > Would additional verification of the actual HW be useful?
> >
> > Peter posted the following which is more specific than "qemu 4.2" -- I agree this is most likely the qemu commit where this
> > thread is focused:
> >
> >> Almost certainly this was commit 1328fe0c32d54 ("hw: usb: hcd-ohci:
> >> check len and frame_number variables"), which added these bounds
> >> checks. Prior to that we did no bounds checking at all, which
> >> meant that we permitted cbp=be+1 to mean a zero length, but also
> >> that we permitted the guest to overrun host-side buffers by
> >> specifying completely bogus cbp and be values. The timeframe is
> >> more or less right (2020), at least.
> >>
> >> -- PMM
> >
> > Where does the conversation go from here? I'm under the impression I have provided objective answers to all the questions
> > and resolved all review comments on the code. I receive the feedback
> > that I missed something - please restate the question?
>
> I can see patch 1/2 has been queued and 2/2 is still outstanding. I'm
> having trouble finding the referenced entry in the OHCI spec. The only
> one I can see is Release 1.1, January 6th, 2000 and that doesn't have a
> section 4.3.1.2.
>
> I think discussion should continue on that thread.
>
Yes, agreed.
>
> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
>
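For anyone following the arithmetic in the quoted hunk, here is a minimal standalone sketch of the length calculation being debated. It is not the code in hw/usb/hcd-ohci.c: the helper name td_buffer_len() and its return convention are hypothetical, and the same-page formula (be - cbp + 1) is an assumption taken from the thread's description of BufferEnd as the address of the last byte in the buffer. Only the cross-page expression and the "cbp - 1 > be" check are copied from the hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of hcd-ohci.c: compute the transfer length
 * implied by a TD's CurrentBufferPointer (cbp) and BufferEnd (be).
 * Returns false when cbp lies more than one byte past be.
 * The caller is assumed to have handled cbp == 0 already, as the
 * "rely on td.cbp != 0" comment in the quoted hunk requires.
 */
static bool td_buffer_len(uint32_t cbp, uint32_t be, uint32_t *len)
{
    if ((cbp & 0xfffff000u) != (be & 0xfffff000u)) {
        /* cbp and be are in different 4 KiB pages (expression from the hunk). */
        *len = (be & 0xfffu) + 0x1001u - (cbp & 0xfffu);
        return true;
    }
    if (cbp - 1 > be) {
        /* Same page, and cbp is at least two bytes past be: reject. */
        return false;
    }
    /*
     * be addresses the last byte, so the length is be - cbp + 1.
     * With unsigned wraparound, cbp == be + 1 yields 0.
     */
    *len = be - cbp + 1;
    return true;
}
```

With this sketch, cbp = 0x1004 and be = 0x1003 (the cbp = be + 1 case) gives a length of 0, while cbp = 0x1005 and be = 0x1003 is rejected. The pre-patch check "cbp > be" would have rejected both.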