Cord Amfmgm <dmamf...@gmail.com> writes:

> On Thu, May 30, 2024 at 3:33 AM Alex Bennée <alex.ben...@linaro.org> wrote:
>
>  Cord Amfmgm <dmamf...@gmail.com> writes:
>
>  > On Tue, May 28, 2024 at 11:32 AM Peter Maydell <peter.mayd...@linaro.org> 
> wrote:
>  >
>  >  On Tue, 28 May 2024 at 16:37, Cord Amfmgm <dmamf...@gmail.com> wrote:
>  >  >
>  >  > On Tue, May 28, 2024 at 9:03 AM Peter Maydell 
> <peter.mayd...@linaro.org> wrote:
>  >  >>
>  >  >> On Mon, 20 May 2024 at 23:24, Cord Amfmgm <dmamf...@gmail.com> wrote:
>  >  >> > On Mon, May 20, 2024 at 12:05 PM Peter Maydell 
> <peter.mayd...@linaro.org> wrote:
>  <snip>
>  >  >> > And here's an example buffer of length 0 -- you probably already 
> know what I'm going to do here:
>  >  >> >
>  >  >> > char buf[0];
>  >  >> > char * CurrentBufferPointer = &buf[0];
>  >  >> > char * BufferEnd = &buf[-1]; // "address of the last byte in the 
> buffer"
>  >  >> > // The OHCI Host Controller than advances CurrentBufferPointer like 
> this: CurrentBufferPointer += 0
>  >  >> > // After the transfer:
>  >  >> > // CurrentBufferPointer = &buf[0];
>  >  >> > // BufferEnd = &buf[-1];
>  >  >>
>  >  >> Right, but why do you think this is valid, rather than
>  >  >> being a guest software bug? My reading of the spec is that it's
>  >  >> pretty clear about how to say "zero length buffer", and this
>  >  >> isn't it.
>  >  >>
>  >  >> Is there some real-world guest OS that programs the OHCI
>  >  >> controller this way that we're trying to accommodate?
>  >  >
>  >  >
>  >  > qemu versions 4.2 and before allowed this behavior.
>  >
>  >  So? That might just mean we had a bug and we fixed it.
>  >  4.2 is a very old version of QEMU and nobody seems to have
>  >  complained in the four years since we released 5.0 about this,
>  >  which suggests that generally guest OS drivers don't try
>  >  to send zero-length buffers in this way.
>  >
>  >  > I don't think it's valid to ask for a *popular* guest OS as a 
> proof-of-concept because I'm not an expert on those.
>  >
>  >  I didn't ask for "popular"; I asked for "real-world".
>  >  What is the actual guest code you're running that falls over
>  >  because of the behaviour change?
>  >
>  >  More generally, why do you want this behaviour to be
>  >  changed? Reasonable reasons might include:
>  >   * we're out of spec based on reading the documentation
>  >   * you're trying to run some old Windows VM/QNX/etc image,
>  >     and it doesn't work any more
>  >   * all the real hardware we tested behaves this way
>  >
>  >  But don't necessarily include:
>  >   * something somebody wrote and only tested on QEMU happens to
>  >     assume the old behaviour rather than following the hw spec
>  >
>  >  QEMU occasionally works around guest OS bugs, but only as
>  >  when we really have to. It's usually better to fix the
>  >  bug in the guest.
>  >
>  > It's not, and I've already demonstrated that real hardware is consistent 
> with the fix in this patch.
>  >
>  > Please check your tone.
>
>  I don't think that is a particularly helpful comment for someone who is
>  taking the time to review your patches. Reading through the thread I
>  didn't see anything that said this is how real HW behaves but I may well
>  have missed it. However you have a number of review comments to address
>  so I suggest you spin a v2 of the series to address them and outline the
>  reason to accept an out of spec transaction.
>
> I did a rework of the patch -- see my email from May 20, quoted below -- and 
> I was under the impression it addressed all the
> review comments. Did I miss something? I apologize if I did.

Ahh I see - I'd only seen this thread continue so wasn't aware a new
version had been posted. For future patches consider using -vN when
sending them so we can clearly see a new revision is available.

>
>> index acd6016980..71b54914d3 100644
>> --- a/hw/usb/hcd-ohci.c
>> +++ b/hw/usb/hcd-ohci.c
>> @@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct 
>> ohci_ed *ed)
>>          if ((td.cbp & 0xfffff000) != (td.be & 0xfffff000)) {
>>              len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
>>          } else {
>> -            if (td.cbp > td.be) {
>> -                trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
>> +            if (td.cbp - 1 > td.be) {  /* rely on td.cbp != 0 */
>
>> Reading through the thread I didn't see anything that said this is how real 
>> HW behaves but I may well have missed it.
>
> This is what I wrote regarding real HW:
>
> Results are:
>
>  qemu 4.2   | qemu HEAD  | actual HW
> ------------+------------+------------
>  works fine | ohci_die() | works fine
>
> Would additional verification of the actual HW be useful?
>
> Peter posted the following which is more specific than "qemu 4.2" -- I agree 
> this is most likely the qemu commit where this
> thread is focused:
>
>> Almost certainly this was commit 1328fe0c32d54 ("hw: usb: hcd-ohci:
>> check len and frame_number variables"), which added these bounds
>> checks. Prior to that we did no bounds checking at all, which
>> meant that we permitted cbp=be+1 to mean a zero length, but also
>> that we permitted the guest to overrun host-side buffers by
>> specifying completely bogus cbp and be values. The timeframe is
>> more or less right (2020), at least.
>> 
>> -- PMM
>
> Where does the conversation go from here? I'm under the impression I have 
> provided objective answers to all the questions
> and resolved all review comments on the code. I receive the feedback
> that I missed something - please restate the question?

I can see patch 1/2 has been queued and 2/2 is still outstanding. I'm
having trouble finding the referenced entry in the OHCI spec. The only
one I can see is Release 1.1, January 6th, 2000 and that doesn't have a
section 4.3.1.2.

I think discussion should continue on that thread.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Reply via email to