On Tue, 21 May 2024 at 00:26, David Hubbard <dmamf...@gmail.com> wrote:
>
> From: Cord Amfmgm <dmamf...@gmail.com>
>
> This changes the way the ohci emulation handles a Transfer Descriptor with
> "Current Buffer Pointer" set to "Buffer End" + 1.
>
> The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be
> to signal the buffer has zero length. Currently qemu only accepts zero-length
> Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware
> accepts both cases.
>
> The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
> and earlier matched the spec. (I haven't taken the time to bisect exactly
> where the logic was changed.)
>
> With a tiny OS[1] that boots and executes a test, the issue can be seen:
>
> * OS that sends USB requests to a USB mass storage device
>   but sends td.cbp = td.be + 1
> * qemu 4.2
> * qemu HEAD (4e66a0854)
> * Actual OHCI controller (hardware)
>
> Command line:
> qemu-system-x86_64 -m 20 \
>  -device pci-ohci,id=ohci \
>  -drive if=none,format=raw,id=d,file=testmbr.raw \
>  -device usb-storage,bus=ohci.0,drive=d \
>  --trace "usb_*" --trace "ohci_*" -D qemu.log
>
> Results are:
>
>  qemu 4.2   | qemu HEAD  | actual HW
> ------------+------------+------------
>  works fine | ohci_die() | works fine
>
> Tip: if the flags "-serial pty -serial stdio" are added to the command line
> the test will output USB requests like this:
>
> Testing qemu HEAD:
>
> > Free mem 2M ohci port2 conn FS
> > setup { 80 6 0 1 0 0 8 0 }
> > ED info=80000 { mps=8 en=0 d=0 } tail=c20920
> >   td0 c20880 nxt=c20960 f2000000 setup cbp=c20900 be=c20907
> >   td1 c20960 nxt=c20980 f3140000    in cbp=c20908 be=c2090f
> >   td2 c20980 nxt=c20920 f3080000   out cbp=c20910 be=c2090f ohci20 host err
> > usb stopped
>
> And in qemu.log:
>
> usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > next_offset=0x00c2090f
>
> Testing qemu 4.2:
>
> > Free mem 2M ohci port2 conn FS
> > setup { 80 6 0 1 0 0 8 0 }
> > ED info=80000 { mps=8 en=0 d=0 } tail=620920
> >   td0 620880 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0 be=620907
> >   td1 620960 nxt=620980 f3140000    in cbp=620908 be=62090f       cbp=0 be=62090f
> >   td2 620980 nxt=620920 f3080000   out cbp=620910 be=62090f       cbp=0 be=62090f
> >    rx { 12 1 0 2 0 0 0 8 }
> > setup { 0 5 1 0 0 0 0 0 } tx {}
> > ED info=80000 { mps=8 en=0 d=0 } tail=620880
> >   td0 620920 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0 be=620907
> >   td1 620960 nxt=620880 f3100000    in cbp=620908 be=620907       cbp=0 be=620907
> > setup { 80 6 0 1 0 0 12 0 }
> > ED info=80001 { mps=8 en=0 d=1 } tail=620960
> >   td0 620880 nxt=6209c0 f2000000 setup cbp=620920 be=620927       cbp=0 be=620927
> >   td1 6209c0 nxt=6209e0 f3140000    in cbp=620928 be=620939       cbp=0 be=620939
> >   td2 6209e0 nxt=620960 f3080000   out cbp=62093a be=620939       cbp=0 be=620939
> >    rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
> > setup { 80 6 0 2 0 0 0 1 }
> > ED info=80001 { mps=8 en=0 d=1 } tail=620880
> >   td0 620960 nxt=6209a0 f2000000 setup cbp=620a20 be=620a27       cbp=0 be=620a27
> >   td1 6209a0 nxt=6209c0 f3140004    in cbp=620a28 be=620b27       cbp=620a48 be=620b27
> >   td2 6209c0 nxt=620880 f3080000   out cbp=620b28 be=620b27       cbp=0 be=620b27
> >    rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 0 }
> > setup { 0 9 1 0 0 0 0 0 } tx {}
> > ED info=80001 { mps=8 en=0 d=1 } tail=620900
> >   td0 620880 nxt=620940 f2000000 setup cbp=620a00 be=620a07       cbp=0 be=620a07
> >   td1 620940 nxt=620900 f3100000    in cbp=620a08 be=620a07       cbp=0 be=620a07
>
> [1] The OS disk image has been emailed to phi...@linaro.org, m...@tls.msk.ru,
> and kra...@redhat.com:
>
> * testCbpOffBy1.img.xz
> * sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed
>
> Signed-off-by: Cord Amfmgm <dmamf...@gmail.com>
> ---
>  hw/usb/hcd-ohci.c   | 4 ++--
>  hw/usb/trace-events | 1 +
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> index acd6016980..71b54914d3 100644
> --- a/hw/usb/hcd-ohci.c
> +++ b/hw/usb/hcd-ohci.c
> @@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
>          if ((td.cbp & 0xfffff000) != (td.be & 0xfffff000)) {
>              len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
>          } else {
> -            if (td.cbp > td.be) {
> -                trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
> +            if (td.cbp - 1 > td.be) {  /* rely on td.cbp != 0 */
> +                trace_usb_ohci_td_bad_buf(td.cbp, td.be);
>                  ohci_die(ohci);
>                  return 1;
>              }

This patch seems to me to do what the commit message sets out to
do, and it looks unlikely to have any unintended side effects
because it turns a "USB controller flags an error" case into
a "treat as zero length packet" case, and I have trouble
imagining that any guest could be relying on looking for the
controller error. On that basis I'm inclined to accept it.
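
To spell out the arithmetic as I understand it, here is a minimal
standalone sketch. td_buf_len() and td_buf_len_selfcheck() are
hypothetical helpers for illustration, not functions in hcd-ohci.c,
and the same-page length expression (be - cbp + 1) is my assumption
about the code that follows the hunk quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not QEMU code): derive the buffer length of a
 * general TD from CurrentBufferPointer (cbp) and BufferEnd (be).
 * Returns false for a TD the patched check would still reject.
 */
static bool td_buf_len(uint32_t cbp, uint32_t be, uint32_t *len)
{
    if (cbp == 0) {
        *len = 0;               /* the zero-length form QEMU already accepts */
        return true;
    }
    if ((cbp & 0xfffff000) != (be & 0xfffff000)) {
        /* Buffer spans two pages; expression taken from the quoted hunk. */
        *len = (be & 0xfff) + 0x1001 - (cbp & 0xfff);
        return true;
    }
    if (cbp - 1 > be) {         /* patched check; safe because cbp != 0 */
        return false;
    }
    /* Assumed follow-on computation; wraps to 0 when cbp == be + 1. */
    *len = be - cbp + 1;
    return true;
}

/* Exercise the helper with TD values taken from the trace above. */
static void td_buf_len_selfcheck(void)
{
    uint32_t len = 0xdead;

    /* td2 of the failing run: cbp == be + 1, now a zero-length buffer */
    assert(td_buf_len(0xc20910, 0xc2090f, &len) && len == 0);
    /* td1 of the same run: an ordinary 8-byte buffer */
    assert(td_buf_len(0xc20908, 0xc2090f, &len) && len == 8);
    /* cbp == be + 2 is still treated as an error */
    assert(!td_buf_len(0xc20911, 0xc2090f, &len));
}
```

The only behavioural difference from the old "td.cbp > td.be" test
is the first case in the self-check; anything further past td.be
still ends in the error path.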

What I would still like to see is the rationale for the
patch, that is, what prompted us to make this change.
In my experience it's important to record this
(including in the commit message). There are of course
many cases in QEMU's git history where we failed to do that,
but in general I think it's a good standard to meet. (I
am also erring on the side of caution in reviewing this
particular patch, because I don't know the relevant standards
or this bit of the code very well.)

Why do we care about the motivation for a patch?

(1) In the present: it helps to raise confidence that the
proposed new behaviour is right. This is good because QEMU's
test suite is far from comprehensive, so in some sense any
change to the codebase is a risk.

For instance, if a change is being made because the QNX demo
floppy doesn't run, then the fact that the change fixes that
failure-to-run indicates that our interpretation of the
meaning of the standard, or of what should happen in the
grey areas that the documentation doesn't clearly describe,
matches what the QNX driver author (an unrelated third party)
thought and probably also what a lot of in-the-field hardware
does (since QNX was no doubt tested on a lot of different PCs
back in the day).

On the other hand, if a change is proposed because it fixes
booting with older Linux kernels prior to commit XYZ, and
kernel commit XYZ turns out to be "make this device driver
program the hardware according to the specification rather
than relying on an accident of timing", then we might want
to look at where we want to be in the tradeoff of "run older
kernels" versus "put workaround for a guest software issue
into QEMU". (Workarounds for guest software bugs are something
I'm very reluctant to put into QEMU, because my experience
is that once they're in the codebase we can essentially never
remove them, because we don't know what guest code might
be relying on them. But sometimes they're a necessary evil.)

(2) In the future: if in a year's time or more, somebody
reports that a particular commit has regressed some specific
guest workload they have, knowing why we made the change in
the first place is really useful in investigating the
regression.

If we need to change code that was initially added to solve
a problem when running FreeBSD, we know we need to re-test
with FreeBSD.

If the change went in to fix a buffer overrun, we know we
need to be careful and cross-check that we don't reintroduce
the overrun in the course of fixing a regression.

If a change is one that we made on the grounds of "reading
the spec and the code, this looked like it was clearly wrong,
but we don't have a definite repro case of it breaking a guest"
then that might put "revert the change, we were mistaken" on
the table as a response to a future regression report.
And so on.

thanks
-- PMM
