On Tue, 21 May 2024 at 00:26, David Hubbard <dmamf...@gmail.com> wrote: > > From: Cord Amfmgm <dmamf...@gmail.com> > > This changes the way the ohci emulation handles a Transfer Descriptor with > "Current Buffer Pointer" set to "Buffer End" + 1. > > The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be > to signal the buffer has zero length. Currently qemu only accepts zero-length > Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware > accepts both cases. > > The qemu ohci emulation has a regression in ohci_service_td. Version 4.2 > and earlier matched the spec. (I haven't taken the time to bisect exactly > where the logic was changed.) > > With a tiny OS[1] that boots and executes a test, the issue can be seen: > > * OS that sends USB requests to a USB mass storage device > but sends td.cbp = td.be + 1 > * qemu 4.2 > * qemu HEAD (4e66a0854) > * Actual OHCI controller (hardware) > > Command line: > qemu-system-x86_64 -m 20 \ > -device pci-ohci,id=ohci \ > -drive if=none,format=raw,id=d,file=testmbr.raw \ > -device usb-storage,bus=ohci.0,drive=d \ > --trace "usb_*" --trace "ohci_*" -D qemu.log > > Results are: > > qemu 4.2 | qemu HEAD | actual HW > ------------+------------+------------ > works fine | ohci_die() | works fine > > Tip: if the flags "-serial pty -serial stdio" are added to the command line > the test will output USB requests like this: > > Testing qemu HEAD: > > > Free mem 2M ohci port2 conn FS > > setup { 80 6 0 1 0 0 8 0 } > > ED info=80000 { mps=8 en=0 d=0 } tail=c20920 > > td0 c20880 nxt=c20960 f2000000 setup cbp=c20900 be=c20907 > > td1 c20960 nxt=c20980 f3140000 in cbp=c20908 be=c2090f > > td2 c20980 nxt=c20920 f3080000 out cbp=c20910 be=c2090f ohci20 host err > > usb stopped > > And in qemu.log: > > usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > > next_offset=0x00c2090f > > Testing qemu 4.2: > > > Free mem 2M ohci port2 conn FS > > setup { 80 6 0 1 0 0 8 0 } > > ED info=80000 { mps=8 en=0 d=0 } tail=620920 > > td0 620880 nxt=620960 f2000000 setup cbp=620900 be=620907 cbp=0 > > be=620907 > > td1 620960 nxt=620980 f3140000 in cbp=620908 be=62090f cbp=0 > > be=62090f > > td2 620980 nxt=620920 f3080000 out cbp=620910 be=62090f cbp=0 > > be=62090f > > rx { 12 1 0 2 0 0 0 8 } > > setup { 0 5 1 0 0 0 0 0 } tx {} > > ED info=80000 { mps=8 en=0 d=0 } tail=620880 > > td0 620920 nxt=620960 f2000000 setup cbp=620900 be=620907 cbp=0 > > be=620907 > > td1 620960 nxt=620880 f3100000 in cbp=620908 be=620907 cbp=0 > > be=620907 > > setup { 80 6 0 1 0 0 12 0 } > > ED info=80001 { mps=8 en=0 d=1 } tail=620960 > > td0 620880 nxt=6209c0 f2000000 setup cbp=620920 be=620927 cbp=0 > > be=620927 > > td1 6209c0 nxt=6209e0 f3140000 in cbp=620928 be=620939 cbp=0 > > be=620939 > > td2 6209e0 nxt=620960 f3080000 out cbp=62093a be=620939 cbp=0 > > be=620939 > > rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 } > > setup { 80 6 0 2 0 0 0 1 } > > ED info=80001 { mps=8 en=0 d=1 } tail=620880 > > td0 620960 nxt=6209a0 f2000000 setup cbp=620a20 be=620a27 cbp=0 > > be=620a27 > > td1 6209a0 nxt=6209c0 f3140004 in cbp=620a28 be=620b27 > > cbp=620a48 be=620b27 > > td2 6209c0 nxt=620880 f3080000 out cbp=620b28 be=620b27 cbp=0 > > be=620b27 > > rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 > > 0 } > > setup { 0 9 1 0 0 0 0 0 } tx {} > > ED info=80001 { mps=8 en=0 d=1 } tail=620900 > > td0 620880 nxt=620940 f2000000 setup cbp=620a00 be=620a07 cbp=0 > > be=620a07 > > td1 620940 nxt=620900 f3100000 in cbp=620a08 be=620a07 cbp=0 > > be=620a07 > > [1] The OS disk image has been emailed to phi...@linaro.org, m...@tls.msk.ru, > and kra...@redhat.com: > > * testCbpOffBy1.img.xz > * sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed > > Signed-off-by: Cord Amfmgm <dmamf...@gmail.com> > --- > hw/usb/hcd-ohci.c | 4 ++-- > hw/usb/trace-events | 1 + > 2 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c > index acd6016980..71b54914d3 100644 > --- a/hw/usb/hcd-ohci.c > +++ b/hw/usb/hcd-ohci.c > @@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct > ohci_ed *ed) > if ((td.cbp & 0xfffff000) != (td.be & 0xfffff000)) { > len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff); > } else { > - if (td.cbp > td.be) { > - trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be); > + if (td.cbp - 1 > td.be) { /* rely on td.cbp != 0 */ > + trace_usb_ohci_td_bad_buf(td.cbp, td.be); > ohci_die(ohci); > return 1; > }
This patch seems to me to do what the commit message sets out to do, and it looks unlikely to have any unintended side effects because it turns a "USB controller flags an error" case into a "treat as zero length packet" case, and I have trouble imagining that any guest could be relying on looking for the controller error. On that basis I'm inclined to accept it. What I would like to see is what we could classify under "rationale", which is to say "what prompted us to make this change?". In my experience it's important to record this (including in the commit message). There are of course many cases in QEMU's git history where we failed to do that, but in general I think it's a good standard to meet. (I am also erring on the side of caution in reviewing this particular patch, because I don't know the relevant standards or this bit of the code very well.) Why do we care about the motivation for a patch? (1) In the present: it helps to raise confidence that the proposed new behaviour is right. This is good because QEMU's test suite is far from comprehensive, so in some sense any change to the codebase is a risk. For instance, if a change is being made because the QNX demo floppy doesn't run, then the fact that the change fixes that failure-to-run indicates that our interpretation of the meaning of the standard, or of what should happen in the grey areas that the documentation doesn't clearly describe, matches what the QNX driver author (an unrelated third party) thought and probably also what a lot of in-the-field hardware does (since QNX was no doubt tested on a lot of different PCs back in the day). On the other hand, if a change is proposed because it fixes booting with older Linux kernels prior to commit XYZ, and kernel commit XYZ turns out to be "make this device driver program the hardware according to the specification rather than relying on an accident of timing", then we might want to look at where we want to be in the tradeoff of "run older kernels" versus "put workaround for a guest software issue into QEMU". (Workarounds for guest software bugs are something I'm very reluctant to put into QEMU, because my experience is that once they're in the codebase we can essentially never remove them, because we don't know what guest code might be relying on them. But sometimes they're a necessary evil.) (2) In the future: if in a year's time or more, somebody reports that a particular commit has regressed some specific guest workload they have, knowing why we made the change in the first place is really useful in investigating the regression. If we need to change code that was initially added to solve a problem when running FreeBSD, we know we need to re-test with FreeBSD. If the change went in to fix a buffer overrun, we know we need to be careful and cross-check that we don't reintroduce the overrun in the course of fixing a regression. If a change is one that we made on the grounds of "reading the spec and the code, this looked like it was clearly wrong, but we don't have a definite repro case of it breaking a guest" then that might put "revert the change, we were mistaken" on the table as a response to a future regression report. And so on. thanks -- PMM