Re: OpenBSD 7.2 on Oracle Cloud
On Thu, 4 May 2023, Aaron Mason wrote:
> On Mon, May 1, 2023 at 4:56 AM Stefan Fritsch wrote:
> >
> > Hi,
> >
> > what qemu version are you using? I cannot reproduce this with qemu 7.2.
> > Can you try with a newer qemu?
> >
> > Cheers,
> > Stefan
> >
>
> What is the host OS where you're running QEMU? Just looking to
> eliminate any variables at play here.

I run it on Linux.

Yesterday I tried to debug why a SCSI command could time out at all, but without success so far. I also have no idea why it should behave differently on Windows and Linux. It is probably the initial TEST_UNIT_READY command that goes wrong.
Re: OpenBSD 7.2 on Oracle Cloud
On Mon, May 1, 2023 at 4:56 AM Stefan Fritsch wrote:
>
> Hi,
>
> what qemu version are you using? I cannot reproduce this with qemu 7.2.
> Can you try with a newer qemu?
>
> Cheers,
> Stefan
>

What is the host OS where you're running QEMU? Just looking to eliminate any variables at play here.

--
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse
Re: OpenBSD 7.2 on Oracle Cloud
On Tue, May 2, 2023 at 11:30 AM Aaron Mason wrote:
>
> On Tue, May 2, 2023 at 9:29 AM Aaron Mason wrote:
> > [REDACTED]
> > >
> > > The actual problem is here. One request times out, but the driver does
> > > not tell qemu that it should abort the request. The queue entry then
> > > gets reused and the two responses from qemu overwrite each other. You
> > > could try if increasing the timeout here to e.g. 1 helps:
> > >
> > > if (ISSET(xs->flags, SCSI_POLL)) {
> > >     DPRINTF("vioscsi_scsi_cmd: polling...\n");
> > >     int timeout = 1000;
> > >     do {
> > >         virtio_poll_intr(vsc);
> > >         if (vr->vr_xs != xs)
> > >             break;
> > >         delay(1000);
> > >     } while (--timeout > 0);
> > >     if (vr->vr_xs == xs) {
> > >         // TODO(matthew): Abort the request.
> > >         xs->error = XS_TIMEOUT;
> > >         xs->resid = xs->datalen;
> > >         DPRINTF("vioscsi_scsi_cmd: polling timeout\n");
> > >         scsi_done(xs);
> > >     }
> > >
> > > Unfortunately, in order to properly abort the request, quite a bit of
> > > infrastructure related to the control queue is still missing in the
> > > driver.
> >
> > I'll give it a go and report back, thanks!
> >
>
> No dice, it only takes longer to generate a page fault.
>
> --
> Aaron Mason - Programmer, open source addict
> I've taken my software vows - for beta or for worse

Increasing it to 4 has the same effect, only longer.

--
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse
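For reference, a rough sketch of what the quoted polling loop budgets for. It assumes delay() takes microseconds, as delay(9) documents, so `timeout = 1000` with `delay(1000)` allows roughly one second. The function names here are made up for illustration and are not part of the driver. The loop shape mirrors the quoted do/while, which would explain why a bigger budget only postpones the timeout path when the response arrives later than the budget allows.

```c
#include <assert.h>

/* Total time the quoted loop can spend waiting, in microseconds. */
static long poll_budget_us(int budget, int delay_us)
{
    assert(budget > 0 && delay_us >= 0);
    return (long)budget * delay_us;
}

/*
 * Same shape as the quoted do/while loop. The simulated device completes
 * the command at poll number completes_at_poll (1-based).
 * Returns 1 if the completion is seen within the budget, 0 on timeout.
 */
static int poll_sees_completion(int budget, int completes_at_poll)
{
    int polls = 0;
    int timeout = budget;

    assert(budget > 0);
    do {
        polls++;                        /* stands in for virtio_poll_intr() */
        if (polls >= completes_at_poll)
            return 1;                   /* vr->vr_xs != xs: request finished */
        /* delay(1000) would sit here in the driver */
    } while (--timeout > 0);
    return 0;                           /* falls through to the timeout path */
}
```

With a budget of 1000 the body runs exactly 1000 times, so a completion at poll 1000 is still seen and one at poll 1001 is not.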
Re: OpenBSD 7.2 on Oracle Cloud
On Tue, May 2, 2023 at 9:29 AM Aaron Mason wrote:
> [REDACTED]
> >
> > The actual problem is here. One request times out, but the driver does
> > not tell qemu that it should abort the request. The queue entry then
> > gets reused and the two responses from qemu overwrite each other. You
> > could try if increasing the timeout here to e.g. 1 helps:
> >
> > if (ISSET(xs->flags, SCSI_POLL)) {
> >     DPRINTF("vioscsi_scsi_cmd: polling...\n");
> >     int timeout = 1000;
> >     do {
> >         virtio_poll_intr(vsc);
> >         if (vr->vr_xs != xs)
> >             break;
> >         delay(1000);
> >     } while (--timeout > 0);
> >     if (vr->vr_xs == xs) {
> >         // TODO(matthew): Abort the request.
> >         xs->error = XS_TIMEOUT;
> >         xs->resid = xs->datalen;
> >         DPRINTF("vioscsi_scsi_cmd: polling timeout\n");
> >         scsi_done(xs);
> >     }
> >
> > Unfortunately, in order to properly abort the request, quite a bit of
> > infrastructure related to the control queue is still missing in the driver.
>
> I'll give it a go and report back, thanks!
>

No dice, it only takes longer to generate a page fault.

--
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse
Re: OpenBSD 7.2 on Oracle Cloud
On Mon, May 1, 2023 at 11:32 PM Stefan Fritsch wrote: > > Dropping misc@ from cc > > Am 01.05.23 um 02:08 schrieb Aaron Mason: > > I can reproduce it with this in QEMU 8.0 in Winders (thanks Antun who > > sent something like this to the bugs@ list): > > > > qemu-system-x86_64 -accel whpx,kernel-irqchip=off -machine q35 \ > > -cpu EPYC-Rome,-monitor -m 8g -smp 6,sockets=1,cores=6 \ > > -nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22 -vga virtio \ > > -drive if=virtio,file=miniroot73.img -device virtio-scsi-pci,id=scsi > > It is probably depending on timing and the timing on my machine is > different. > > > The temporary workaround patch results in a booting system. > > I fear that just returning from vioscsi_req_done may cause data > corruption sometimes. Yeah I was concerned that such a blanket drop might result in, to quote a certain G-Man, "unforeseen consequences" (mostly out of lack of experience), hence why I stopped well short of declaring the problem fixed. > > >>> I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel > >>> with those drivers enabled, and got this: > >>> > >>> vioscsi0 at virtio1: qsize 128 > >>> scsibus0 at vioscsi0: 255 targets > >>> vioscsi_req_get: 0xfd803f80d338 > >>> vioscsi_scsi_cmd: enter > >>> vioscsi_scsi_cmd: polling... > >>> vioscsi_scsi_cmd: polling timeout > > The actual problem is here. One request times out, but the driver does > not tell qemu that it should abort the request. The queue entry then > gets reused and the two responses from qemu overwrite each other. You > could try if increasing the timeout here to e.g. 1 helps: > > if (ISSET(xs->flags, SCSI_POLL)) { > DPRINTF("vioscsi_scsi_cmd: polling...\n"); > int timeout = 1000; > do { > virtio_poll_intr(vsc); > if (vr->vr_xs != xs) > break; > delay(1000); > } while (--timeout > 0); > if (vr->vr_xs == xs) { > // TODO(matthew): Abort the request. 
> xs->error = XS_TIMEOUT; > xs->resid = xs->datalen; > DPRINTF("vioscsi_scsi_cmd: polling timeout\n"); > scsi_done(xs); > } > > Unfortunately, it order to properly abort the request, quite a bit of > infrastructure related to the control queue is still missing in the driver. I'll give it a go and report back, thanks! > > >>> vioscsi_scsi_cmd: done (timeout=0) > >>> vioscsi_scsi_cmd: enter > >>> vioscsi_scsi_cmd: polling... > >>> vioscsi_vq_done: enter > >>> vioscsi_vq_done: slot=127 > >>> vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58 > >>> vioscsi_req_done: done 0, 2, 0 > >>> vioscsi_vq_done: slot=127 > >>> vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0 > >>> uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e > >>> fatal page fault in supervisor mode > >>> trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e > >>> rsp 81606670 > >>> gsbase 0x813dfff0 kgsbase 0x0 > >>> panic: trap type 6, code=0, pc=810e6190 > >>> > >>> That "xs: 0x0" bit feels like a clue. It should be trivial to pick up > >>> and handle, but what would be the correct way to handle that? > >>> > >>> If I have it return if "xs" is found to be NULL, it continues - the > >>> debugging suggests it goes through each possible target before > >>> finishing up. I don't know if that's correct, but it seems to continue > >>> booting after that even if my example didn't detect the drive with the > >>> kernel I built (I used the RAMDISK kernel and it was pretty stripped > >>> down). > >>> > >>> I'm about to attempt a -STABLE build (I've got 7.3 installed and thus > >>> can't yet build a snapshot, but I will do that if this test succeeds) > >>> - here's the patch that hopefully fixes the problem. 
(and hopefully > >>> gmail doesn't clobber the tabs) > >>> > >>> Index: sys/dev/pv/vioscsi.c > >>> === > >>> RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v > >>> retrieving revision 1.30 > >>> diff -u -p -u -p -r1.30 vioscsi.c > >>> --- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30 > >>> +++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 - > >>> @@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s > >>> struct scsi_xfer *xs = vr->vr_xs; > >>> DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs); > >>> > >>> + if (xs == NULL) return; > >>> int isread = !!(xs->flags & SCSI_DATA_IN); > >>> bus_dmamap_sync(vsc->sc_dmat, vr->vr_control, > >>> offsetof(struct vioscsi_req, vr_req), > >>> > >>> > > > > > > -- Aaron Mason - Programmer, open source addict I've taken my software vows - for beta or for worse
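To make the concern about the blanket drop concrete, here is a small sketch of a completion handler that counts stale completions instead of silently returning. Everything in it (the `sketch_` types, the counter) is hypothetical and is not the driver's code. It is no fix either: the device may still have written into buffers that the reused slot points at, which is the corruption risk mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real scsi_xfer / vioscsi_req. */
struct sketch_xfer { int flags; int done; };
struct sketch_req { struct sketch_xfer *xs; };
struct sketch_stats { unsigned long stale_completions; };

/*
 * Returns 1 if a transfer was completed, 0 if the completion was stale
 * (no transfer attached to the slot any more).
 */
static int sketch_req_done(struct sketch_stats *st, struct sketch_req *vr)
{
    struct sketch_xfer *xs;

    assert(st != NULL && vr != NULL);
    xs = vr->xs;
    if (xs == NULL) {
        /* Same early return as the workaround, but made visible. */
        st->stale_completions++;
        return 0;
    }
    xs->done = 1;       /* stands in for the real completion work */
    vr->xs = NULL;      /* slot is free again */
    return 1;
}
```

A non-zero counter after boot would at least show how often the driver hit the case the workaround papers over.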
Re: OpenBSD 7.2 on Oracle Cloud
Dropping misc@ from cc

On 01.05.23 at 02:08, Aaron Mason wrote:
> I can reproduce it with this in QEMU 8.0 in Winders (thanks Antun who
> sent something like this to the bugs@ list):
>
> qemu-system-x86_64 -accel whpx,kernel-irqchip=off -machine q35 \
>     -cpu EPYC-Rome,-monitor -m 8g -smp 6,sockets=1,cores=6 \
>     -nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22 -vga virtio \
>     -drive if=virtio,file=miniroot73.img -device virtio-scsi-pci,id=scsi

It probably depends on timing, and the timing on my machine is different.

> The temporary workaround patch results in a booting system.

I fear that just returning from vioscsi_req_done may cause data corruption sometimes.

>>> I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel
>>> with those drivers enabled, and got this:
>>>
>>> vioscsi0 at virtio1: qsize 128
>>> scsibus0 at vioscsi0: 255 targets
>>> vioscsi_req_get: 0xfd803f80d338
>>> vioscsi_scsi_cmd: enter
>>> vioscsi_scsi_cmd: polling...
>>> vioscsi_scsi_cmd: polling timeout

The actual problem is here. One request times out, but the driver does not tell qemu that it should abort the request. The queue entry then gets reused and the two responses from qemu overwrite each other. You could try if increasing the timeout here to e.g. 1 helps:

if (ISSET(xs->flags, SCSI_POLL)) {
    DPRINTF("vioscsi_scsi_cmd: polling...\n");
    int timeout = 1000;
    do {
        virtio_poll_intr(vsc);
        if (vr->vr_xs != xs)
            break;
        delay(1000);
    } while (--timeout > 0);
    if (vr->vr_xs == xs) {
        // TODO(matthew): Abort the request.
        xs->error = XS_TIMEOUT;
        xs->resid = xs->datalen;
        DPRINTF("vioscsi_scsi_cmd: polling timeout\n");
        scsi_done(xs);
    }

Unfortunately, in order to properly abort the request, quite a bit of infrastructure related to the control queue is still missing in the driver.

>>> vioscsi_scsi_cmd: done (timeout=0)
>>> vioscsi_scsi_cmd: enter
>>> vioscsi_scsi_cmd: polling...
>>> vioscsi_vq_done: enter
>>> vioscsi_vq_done: slot=127
>>> vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58
>>> vioscsi_req_done: done 0, 2, 0
>>> vioscsi_vq_done: slot=127
>>> vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0
>>> uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e
>>> fatal page fault in supervisor mode
>>> trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e
>>> rsp 81606670
>>> gsbase 0x813dfff0 kgsbase 0x0
>>> panic: trap type 6, code=0, pc=810e6190
>>>
>>> That "xs: 0x0" bit feels like a clue. It should be trivial to pick up
>>> and handle, but what would be the correct way to handle that?
>>>
>>> If I have it return if "xs" is found to be NULL, it continues - the
>>> debugging suggests it goes through each possible target before
>>> finishing up. I don't know if that's correct, but it seems to continue
>>> booting after that even if my example didn't detect the drive with the
>>> kernel I built (I used the RAMDISK kernel and it was pretty stripped
>>> down).
>>>
>>> I'm about to attempt a -STABLE build (I've got 7.3 installed and thus
>>> can't yet build a snapshot, but I will do that if this test succeeds)
>>> - here's the patch that hopefully fixes the problem. (and hopefully
>>> gmail doesn't clobber the tabs)
>>>
>>> Index: sys/dev/pv/vioscsi.c
>>> ===
>>> RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v
>>> retrieving revision 1.30
>>> diff -u -p -u -p -r1.30 vioscsi.c
>>> --- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30
>>> +++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 -
>>> @@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s
>>>      struct scsi_xfer *xs = vr->vr_xs;
>>>      DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);
>>>
>>> +    if (xs == NULL) return;
>>>      int isread = !!(xs->flags & SCSI_DATA_IN);
>>>      bus_dmamap_sync(vsc->sc_dmat, vr->vr_control,
>>>          offsetof(struct vioscsi_req, vr_req),
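The sequence described above (timeout without abort, slot reuse, two responses for one slot) can be replayed in a toy model. This is only a sketch: the `sim_` types and functions are invented for illustration, and it assumes the slot's transfer pointer is cleared whenever a request finishes or times out, which matches the "xs: 0x0" in the log but is not taken from the driver source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real definitions. */
struct sim_xfer { int error; };
struct sim_req { struct sim_xfer *xs; };    /* one queue slot */

enum sim_result { SIM_COMPLETED, SIM_STALE };

/* Guest gives up on the command. The slot is released, but the device is
 * never told to abort, so its response is still outstanding. */
static void sim_timeout(struct sim_req *vr)
{
    vr->xs = NULL;
}

/* Guest attaches the next command to the slot. */
static void sim_submit(struct sim_req *vr, struct sim_xfer *xs)
{
    assert(xs != NULL);
    vr->xs = xs;
}

/* A device response arrives for this slot. */
static enum sim_result sim_complete(struct sim_req *vr)
{
    if (vr->xs == NULL)
        return SIM_STALE;       /* the real handler dereferences xs here */
    vr->xs = NULL;
    return SIM_COMPLETED;
}

/* Replays the order of events seen in the debug log. */
static enum sim_result sim_replay(void)
{
    struct sim_xfer first = {0}, second = {0};
    struct sim_req slot = { NULL };

    sim_submit(&slot, &first);
    sim_timeout(&slot);             /* "polling timeout" */
    sim_submit(&slot, &second);     /* same slot is reused */
    (void)sim_complete(&slot);      /* late response completes `second` */
    return sim_complete(&slot);     /* second response finds xs == NULL */
}
```

In this model the first late response is wrongly credited to the second command, and the second response hits an empty slot, which is the point where the kernel faults.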
Re: OpenBSD 7.2 on Oracle Cloud
On Mon, May 1, 2023 at 10:08 AM Aaron Mason wrote:
>
> I can reproduce it with this in QEMU 8.0 in Winders (thanks Antun who
> sent something like this to the bugs@ list):
>
> qemu-system-x86_64 -accel whpx,kernel-irqchip=off -machine q35 \
>     -cpu EPYC-Rome,-monitor -m 8g -smp 6,sockets=1,cores=6 \
>     -nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22 -vga virtio \
>     -drive if=virtio,file=miniroot73.img -device virtio-scsi-pci,id=scsi
>
> The temporary workaround patch results in a booting system.
>

The same occurs in 7.2 under Winders.

> On Mon, May 1, 2023 at 4:56 AM Stefan Fritsch wrote:
> >
> > Hi,
> >
> > what qemu version are you using? I cannot reproduce this with qemu 7.2.
> > Can you try with a newer qemu?
> >
> > Cheers,
> > Stefan
> >
> > On 25.04.23 at 14:53, Aaron Mason wrote:
> > [REDACTED]

--
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse
Re: OpenBSD 7.2 on Oracle Cloud
I can reproduce it with this in QEMU 8.0 in Winders (thanks Antun who sent something like this to the bugs@ list): qemu-system-x86_64 -accel whpx,kernel-irqchip=off -machine q35 \ -cpu EPYC-Rome,-monitor -m 8g -smp 6,sockets=1,cores=6 \ -nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22 -vga virtio \ -drive if=virtio,file=miniroot73.img -device virtio-scsi-pci,id=scsi The temporary workaround patch results in a booting system. On Mon, May 1, 2023 at 4:56 AM Stefan Fritsch wrote: > > Hi, > > what qemu version are you using? I cannot reproduce this with qemu 7.2. > Can you try with a newer qemu? > > Cheers, > Stefan > > Am 25.04.23 um 14:53 schrieb Aaron Mason: > Yeah I'm getting the same thing. Trying a build in QEMU and > transferring in to see if that helps. Will report back. > > >>> > >>> Ok, good news, it still crashes at the same spot, but this time I've > >>> got more data. Copying in tech@ - if I've forgotten anything let me > >>> know and I'll fire up a fresh instance. > >>> > >>> [REDACTED] > >>> vioscsi_req_done(e,80024a00,fd803f81c338,e,80024a00,800 > >>> d3228) at vioscsi_req_done+0x26 > >>> [REDACTED] > >> > >> Ok, so based on the trace I got, I was able to trace the stop itself > >> back to line 299 of vioscsi.c (thank. you. random relink. And > >> anonymous CVS): > >> > >> 293 vioscsi_req_done(struct vioscsi_softc *sc, struct virtio_softc > >> *vsc, > >> 294 struct vioscsi_req *vr) > >> 295 { > >> 296 struct scsi_xfer *xs = vr->vr_xs; > >> 297 DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, > >> xs); > >> 298 > >> -->299 int isread = !!(xs->flags & SCSI_DATA_IN); > >> 300 bus_dmamap_sync(vsc->sc_dmat, vr->vr_control, > >> 301 offsetof(struct vioscsi_req, vr_req), > >> 302 sizeof(struct virtio_scsi_req_hdr), > >> 303 BUS_DMASYNC_POSTWRITE); > >> > >> Maybe if I follow the rabbit hole enough, I might find out what's > >> going wrong between the driver and OCI. 
I've got a day off tomorrow > >> (yay for war I guess), I'll give it a bash and see where we end up. > >> > >> -- > >> Aaron Mason - Programmer, open source addict > >> I've taken my software vows - for beta or for worse > > > > I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel > > with those drivers enabled, and got this: > > > > vioscsi0 at virtio1: qsize 128 > > scsibus0 at vioscsi0: 255 targets > > vioscsi_req_get: 0xfd803f80d338 > > vioscsi_scsi_cmd: enter > > vioscsi_scsi_cmd: polling... > > vioscsi_scsi_cmd: polling timeout > > vioscsi_scsi_cmd: done (timeout=0) > > vioscsi_scsi_cmd: enter > > vioscsi_scsi_cmd: polling... > > vioscsi_vq_done: enter > > vioscsi_vq_done: slot=127 > > vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58 > > vioscsi_req_done: done 0, 2, 0 > > vioscsi_vq_done: slot=127 > > vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0 > > uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e > > fatal page fault in supervisor mode > > trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e > > rsp 81606670 > > gsbase 0x813dfff0 kgsbase 0x0 > > panic: trap type 6, code=0, pc=810e6190 > > > > That "xs: 0x0" bit feels like a clue. It should be trivial to pick up > > and handle, but what would be the correct way to handle that? > > > > If I have it return if "xs" is found to be NULL, it continues - the > > debugging suggests it goes through each possible target before > > finishing up. I don't know if that's correct, but it seems to continue > > booting after that even if my example didn't detect the drive with the > > kernel I built (I used the RAMDISK kernel and it was pretty stripped > > down). > > > > I'm about to attempt a -STABLE build (I've got 7.3 installed and thus > > can't yet build a snapshot, but I will do that if this test succeeds) > > - here's the patch that hopefully fixes the problem. 
(and hopefully > > gmail doesn't clobber the tabs) > > > > Index: sys/dev/pv/vioscsi.c > > === > > RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v > > retrieving revision 1.30 > > diff -u -p -u -p -r1.30 vioscsi.c > > --- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30 > > +++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 - > > @@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s > >struct scsi_xfer *xs = vr->vr_xs; > >DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs); > > > > + if (xs == NULL) return; > >int isread = !!(xs->flags & SCSI_DATA_IN); > >bus_dmamap_sync(vsc->sc_dmat, vr->vr_control, > >offsetof(struct vioscsi_req, vr_req), > > > > -- Aaron Mason - Programmer, open source addict I've taken my software vows - for beta or for worse
Re: OpenBSD 7.2 on Oracle Cloud
Hi, what qemu version are you using? I cannot reproduce this with qemu 7.2. Can you try with a newer qemu? Cheers, Stefan Am 25.04.23 um 14:53 schrieb Aaron Mason: Yeah I'm getting the same thing. Trying a build in QEMU and transferring in to see if that helps. Will report back. Ok, good news, it still crashes at the same spot, but this time I've got more data. Copying in tech@ - if I've forgotten anything let me know and I'll fire up a fresh instance. [REDACTED] vioscsi_req_done(e,80024a00,fd803f81c338,e,80024a00,800 d3228) at vioscsi_req_done+0x26 [REDACTED] Ok, so based on the trace I got, I was able to trace the stop itself back to line 299 of vioscsi.c (thank. you. random relink. And anonymous CVS): 293 vioscsi_req_done(struct vioscsi_softc *sc, struct virtio_softc *vsc, 294 struct vioscsi_req *vr) 295 { 296 struct scsi_xfer *xs = vr->vr_xs; 297 DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs); 298 -->299 int isread = !!(xs->flags & SCSI_DATA_IN); 300 bus_dmamap_sync(vsc->sc_dmat, vr->vr_control, 301 offsetof(struct vioscsi_req, vr_req), 302 sizeof(struct virtio_scsi_req_hdr), 303 BUS_DMASYNC_POSTWRITE); Maybe if I follow the rabbit hole enough, I might find out what's going wrong between the driver and OCI. I've got a day off tomorrow (yay for war I guess), I'll give it a bash and see where we end up. -- Aaron Mason - Programmer, open source addict I've taken my software vows - for beta or for worse I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel with those drivers enabled, and got this: vioscsi0 at virtio1: qsize 128 scsibus0 at vioscsi0: 255 targets vioscsi_req_get: 0xfd803f80d338 vioscsi_scsi_cmd: enter vioscsi_scsi_cmd: polling... vioscsi_scsi_cmd: polling timeout vioscsi_scsi_cmd: done (timeout=0) vioscsi_scsi_cmd: enter vioscsi_scsi_cmd: polling... 
vioscsi_vq_done: enter vioscsi_vq_done: slot=127 vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58 vioscsi_req_done: done 0, 2, 0 vioscsi_vq_done: slot=127 vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0 uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e fatal page fault in supervisor mode trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e rsp 81606670 gsbase 0x813dfff0 kgsbase 0x0 panic: trap type 6, code=0, pc=810e6190 That "xs: 0x0" bit feels like a clue. It should be trivial to pick up and handle, but what would be the correct way to handle that? If I have it return if "xs" is found to be NULL, it continues - the debugging suggests it goes through each possible target before finishing up. I don't know if that's correct, but it seems to continue booting after that even if my example didn't detect the drive with the kernel I built (I used the RAMDISK kernel and it was pretty stripped down). I'm about to attempt a -STABLE build (I've got 7.3 installed and thus can't yet build a snapshot, but I will do that if this test succeeds) - here's the patch that hopefully fixes the problem. (and hopefully gmail doesn't clobber the tabs) Index: sys/dev/pv/vioscsi.c === RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v retrieving revision 1.30 diff -u -p -u -p -r1.30 vioscsi.c --- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30 +++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 - @@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s struct scsi_xfer *xs = vr->vr_xs; DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs); + if (xs == NULL) return; int isread = !!(xs->flags & SCSI_DATA_IN); bus_dmamap_sync(vsc->sc_dmat, vr->vr_control, offsetof(struct vioscsi_req, vr_req),
Re: OpenBSD 7.2 on Oracle Cloud
On Tue, Apr 25, 2023 at 10:53 PM Aaron Mason wrote: > > On Mon, Apr 24, 2023 at 3:47 PM Aaron Mason wrote: > > > > On Fri, Apr 21, 2023 at 2:50 PM Aaron Mason > > wrote: > > > > > > On Fri, Apr 21, 2023 at 1:39 PM Aaron Mason > > > wrote: > > > > > > > > On Fri, Apr 7, 2023 at 3:25 AM Antun Matanović > > > > wrote: > > > > > > > > > > On Thu, 6 Apr 2023 at 12:55, Fabio Martins wrote: > > > > > > > > > > > > Try to add an entry in grub like in this article: > > > > > > > > > > > > https://raby.sh/installing-openbsd-on-ovhs-vps-2016-kvm-machines.html > > > > > > > > > > I have tried that, but it did not resolve the issue. Sorry I forgot to > > > > > mention it originally. > > > > > > > > > > On Thu, 6 Apr 2023 at 14:24, Janne Johansson > > > > > wrote: > > > > > > > > > > > > That is very much not the same issue. The arm64 instances on Oracle > > > > > > finds the correct kernel and boots it, it just crashes at or after > > > > > > the > > > > > > scsi attachment. > > > > > > > > > > This has been my experience as well, except on the amd64 instance, > > > > > haven't tried arm64. > > > > > > > > > > > > > Yeah I'm getting the same thing. Trying a build in QEMU and > > > > transferring in to see if that helps. Will report back. > > > > > > > > > > Ok, good news, it still crashes at the same spot, but this time I've > > > got more data. Copying in tech@ - if I've forgotten anything let me > > > know and I'll fire up a fresh instance. > > > > > > [REDACTED] > > > vioscsi_req_done(e,80024a00,fd803f81c338,e,80024a00,800 > > > d3228) at vioscsi_req_done+0x26 > > > [REDACTED] > > > > Ok, so based on the trace I got, I was able to trace the stop itself > > back to line 299 of vioscsi.c (thank. you. random relink. 
And > > anonymous CVS): > > > >293 vioscsi_req_done(struct vioscsi_softc *sc, struct virtio_softc *vsc, > >294 struct vioscsi_req *vr) > >295 { > >296 struct scsi_xfer *xs = vr->vr_xs; > >297 DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs); > >298 > > -->299 int isread = !!(xs->flags & SCSI_DATA_IN); > >300 bus_dmamap_sync(vsc->sc_dmat, vr->vr_control, > >301 offsetof(struct vioscsi_req, vr_req), > >302 sizeof(struct virtio_scsi_req_hdr), > >303 BUS_DMASYNC_POSTWRITE); > > > > Maybe if I follow the rabbit hole enough, I might find out what's > > going wrong between the driver and OCI. I've got a day off tomorrow > > (yay for war I guess), I'll give it a bash and see where we end up. > > > > -- > > Aaron Mason - Programmer, open source addict > > I've taken my software vows - for beta or for worse > > I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel > with those drivers enabled, and got this: > > vioscsi0 at virtio1: qsize 128 > scsibus0 at vioscsi0: 255 targets > vioscsi_req_get: 0xfd803f80d338 > vioscsi_scsi_cmd: enter > vioscsi_scsi_cmd: polling... > vioscsi_scsi_cmd: polling timeout > vioscsi_scsi_cmd: done (timeout=0) > vioscsi_scsi_cmd: enter > vioscsi_scsi_cmd: polling... > vioscsi_vq_done: enter > vioscsi_vq_done: slot=127 > vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58 > vioscsi_req_done: done 0, 2, 0 > vioscsi_vq_done: slot=127 > vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0 > uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e > fatal page fault in supervisor mode > trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e > rsp 81606670 > gsbase 0x813dfff0 kgsbase 0x0 > panic: trap type 6, code=0, pc=810e6190 > > That "xs: 0x0" bit feels like a clue. It should be trivial to pick up > and handle, but what would be the correct way to handle that? > > If I have it return if "xs" is found to be NULL, it continues - the > debugging suggests it goes through each possible target before > finishing up. 
I don't know if that's correct, but it seems to continue > booting after that even if my example didn't detect the drive with the > kernel I built (I used the RAMDISK kernel and it was pretty stripped > down). > > I'm about to attempt a -STABLE build (I've got 7.3 installed and thus > can't yet build a snapshot, but I will do that if this test succeeds) > - here's the patch that hopefully fixes the problem. (and hopefully > gmail doesn't clobber the tabs) > > Index: sys/dev/pv/vioscsi.c > === > RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v > retrieving revision 1.30 > diff -u -p -u -p -r1.30 vioscsi.c > --- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30 > +++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 - > @@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s > struct scsi_xfer *xs = vr->vr_xs; > DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs); > > + if (xs == NULL) return; > int isread = !!(xs->flags & SCSI_DATA_IN); > bus_dmamap_sync(vsc->sc_dmat,
Re: OpenBSD 7.2 on Oracle Cloud
On Mon, Apr 24, 2023 at 3:47 PM Aaron Mason wrote: > > On Fri, Apr 21, 2023 at 2:50 PM Aaron Mason wrote: > > > > On Fri, Apr 21, 2023 at 1:39 PM Aaron Mason > > wrote: > > > > > > On Fri, Apr 7, 2023 at 3:25 AM Antun Matanović > > > wrote: > > > > > > > > On Thu, 6 Apr 2023 at 12:55, Fabio Martins wrote: > > > > > > > > > > Try to add an entry in grub like in this article: > > > > > > > > > > https://raby.sh/installing-openbsd-on-ovhs-vps-2016-kvm-machines.html > > > > > > > > I have tried that, but it did not resolve the issue. Sorry I forgot to > > > > mention it originally. > > > > > > > > On Thu, 6 Apr 2023 at 14:24, Janne Johansson > > > > wrote: > > > > > > > > > > That is very much not the same issue. The arm64 instances on Oracle > > > > > finds the correct kernel and boots it, it just crashes at or after the > > > > > scsi attachment. > > > > > > > > This has been my experience as well, except on the amd64 instance, > > > > haven't tried arm64. > > > > > > > > > > Yeah I'm getting the same thing. Trying a build in QEMU and > > > transferring in to see if that helps. Will report back. > > > > > > > Ok, good news, it still crashes at the same spot, but this time I've > > got more data. Copying in tech@ - if I've forgotten anything let me > > know and I'll fire up a fresh instance. > > > > [REDACTED] > > vioscsi_req_done(e,80024a00,fd803f81c338,e,80024a00,800 > > d3228) at vioscsi_req_done+0x26 > > [REDACTED] > > Ok, so based on the trace I got, I was able to trace the stop itself > back to line 299 of vioscsi.c (thank. you. random relink. 
And > anonymous CVS): > >293 vioscsi_req_done(struct vioscsi_softc *sc, struct virtio_softc *vsc, >294 struct vioscsi_req *vr) >295 { >296 struct scsi_xfer *xs = vr->vr_xs; >297 DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs); >298 > -->299 int isread = !!(xs->flags & SCSI_DATA_IN); >300 bus_dmamap_sync(vsc->sc_dmat, vr->vr_control, >301 offsetof(struct vioscsi_req, vr_req), >302 sizeof(struct virtio_scsi_req_hdr), >303 BUS_DMASYNC_POSTWRITE); > > Maybe if I follow the rabbit hole enough, I might find out what's > going wrong between the driver and OCI. I've got a day off tomorrow > (yay for war I guess), I'll give it a bash and see where we end up. > > -- > Aaron Mason - Programmer, open source addict > I've taken my software vows - for beta or for worse I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel with those drivers enabled, and got this: vioscsi0 at virtio1: qsize 128 scsibus0 at vioscsi0: 255 targets vioscsi_req_get: 0xfd803f80d338 vioscsi_scsi_cmd: enter vioscsi_scsi_cmd: polling... vioscsi_scsi_cmd: polling timeout vioscsi_scsi_cmd: done (timeout=0) vioscsi_scsi_cmd: enter vioscsi_scsi_cmd: polling... vioscsi_vq_done: enter vioscsi_vq_done: slot=127 vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58 vioscsi_req_done: done 0, 2, 0 vioscsi_vq_done: slot=127 vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0 uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e fatal page fault in supervisor mode trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e rsp 81606670 gsbase 0x813dfff0 kgsbase 0x0 panic: trap type 6, code=0, pc=810e6190 That "xs: 0x0" bit feels like a clue. It should be trivial to pick up and handle, but what would be the correct way to handle that? If I have it return if "xs" is found to be NULL, it continues - the debugging suggests it goes through each possible target before finishing up. 
I don't know if that's correct, but it seems to continue booting after that even if my example didn't detect the drive with the kernel I built (I used the RAMDISK kernel and it was pretty stripped down). I'm about to attempt a -STABLE build (I've got 7.3 installed and thus can't yet build a snapshot, but I will do that if this test succeeds) - here's the patch that hopefully fixes the problem. (and hopefully gmail doesn't clobber the tabs) Index: sys/dev/pv/vioscsi.c === RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v retrieving revision 1.30 diff -u -p -u -p -r1.30 vioscsi.c --- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30 +++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 - @@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s struct scsi_xfer *xs = vr->vr_xs; DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs); + if (xs == NULL) return; int isread = !!(xs->flags & SCSI_DATA_IN); bus_dmamap_sync(vsc->sc_dmat, vr->vr_control, offsetof(struct vioscsi_req, vr_req), -- Aaron Mason - Programmer, open source addict I've taken my software vows - for beta or for worse
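A side note on the fault address in the log above: "cr2 8" is what one would expect from reading xs->flags through a NULL xs if flags sits 8 bytes into the structure. The sketch below uses a made-up layout (one list pointer followed by the flags word) to show the arithmetic. It is an assumption for illustration and does not reproduce the real struct scsi_xfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real struct scsi_xfer:
 * one list-linkage pointer followed by the flags word. */
struct layout_xfer {
    struct layout_xfer *next;
    int flags;
};

/* Address touched by a read of xs->flags when xs has the given base address. */
static uintptr_t flags_fault_address(uintptr_t xs_base)
{
    /* Guard against wrap-around for very large base addresses. */
    assert(xs_base <= UINTPTR_MAX - offsetof(struct layout_xfer, flags));
    return xs_base + offsetof(struct layout_xfer, flags);
}
```

On an LP64 platform such as amd64, flags_fault_address(0) comes out as 8, which lines up with the 0x8 in the uvm_fault line and with line 299 being the faulting statement.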
Re: OpenBSD 7.2 on Oracle Cloud
On Fri, Apr 21, 2023 at 2:50 PM Aaron Mason wrote: > > On Fri, Apr 21, 2023 at 1:39 PM Aaron Mason wrote: > > > > On Fri, Apr 7, 2023 at 3:25 AM Antun Matanović > > wrote: > > > > > > On Thu, 6 Apr 2023 at 12:55, Fabio Martins wrote: > > > > > > > > Try to add an entry in grub like in this article: > > > > > > > > https://raby.sh/installing-openbsd-on-ovhs-vps-2016-kvm-machines.html > > > > > > I have tried that, but it did not resolve the issue. Sorry I forgot to > > > mention it originally. > > > > > > On Thu, 6 Apr 2023 at 14:24, Janne Johansson wrote: > > > > > > > > That is very much not the same issue. The arm64 instances on Oracle > > > > finds the correct kernel and boots it, it just crashes at or after the > > > > scsi attachment. > > > > > > This has been my experience as well, except on the amd64 instance, > > > haven't tried arm64. > > > > > > > Yeah I'm getting the same thing. Trying a build in QEMU and > > transferring in to see if that helps. Will report back. > > > > Ok, good news, it still crashes at the same spot, but this time I've > got more data. Copying in tech@ - if I've forgotten anything let me > know and I'll fire up a fresh instance. > > [REDACTED] > vioscsi_req_done(e,80024a00,fd803f81c338,e,80024a00,800 > d3228) at vioscsi_req_done+0x26 > [REDACTED] Ok, so based on the trace I got, I was able to trace the stop itself back to line 299 of vioscsi.c (thank. you. random relink. And anonymous CVS): 293 vioscsi_req_done(struct vioscsi_softc *sc, struct virtio_softc *vsc, 294 struct vioscsi_req *vr) 295 { 296 struct scsi_xfer *xs = vr->vr_xs; 297 DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs); 298 -->299 int isread = !!(xs->flags & SCSI_DATA_IN); 300 bus_dmamap_sync(vsc->sc_dmat, vr->vr_control, 301 offsetof(struct vioscsi_req, vr_req), 302 sizeof(struct virtio_scsi_req_hdr), 303 BUS_DMASYNC_POSTWRITE); Maybe if I follow the rabbit hole enough, I might find out what's going wrong between the driver and OCI. 
I've got a day off tomorrow (yay for war I guess), I'll give it a bash
and see where we end up.

--
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse
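
A note on reading that stop: the ddb output in this thread shows a fault at address 0x8 on `movl 0x8(%r15),%ebx`, and line 299 is the first dereference of xs, so one reading is that xs was NULL and flags sits 8 bytes into the structure. The sketch below is a user-space toy built on that assumption, not the driver and not a proposed patch. struct mock_xfer, struct mock_req and every mock_* name are hypothetical, the layout (one pointer-sized link ahead of flags) is assumed rather than copied from the kernel headers, and it assumes the slot's transfer pointer is cleared when a polled request times out. It shows how a completion handler could detect a late completion instead of faulting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real OpenBSD definitions. */
struct mock_xfer {
	void *link;	/* assumed: one pointer-sized list link comes first */
	int   flags;	/* so flags lands at offset sizeof(void *), 8 on LP64 */
	int   error;
};

#define MOCK_DATA_IN	0x1	/* hypothetical "device to host" flag */
#define MOCK_TIMEOUT	1	/* hypothetical error code */

struct mock_req {
	struct mock_xfer *xs;	/* NULL when the slot carries no transfer */
};

enum mock_result { MOCK_STALE = -1, MOCK_DONE_WRITE = 0, MOCK_DONE_READ = 1 };

/*
 * Polling gave up: fail the transfer and release the slot.
 * Nothing tells the device, so it may still complete the request later.
 */
void
mock_poll_timeout(struct mock_req *vr)
{
	assert(vr != NULL);
	if (vr->xs != NULL) {
		vr->xs->error = MOCK_TIMEOUT;
		vr->xs = NULL;
	}
}

/*
 * Completion handler with a guard. Without the NULL check, reading
 * xs->flags through a NULL xs touches address offsetof(flags), which
 * is the pattern the fault address 0x8 would match under the layout above.
 */
int
mock_req_done(struct mock_req *vr)
{
	assert(vr != NULL);
	struct mock_xfer *xs = vr->xs;

	if (xs == NULL)
		return MOCK_STALE;	/* late completion, transfer already gone */

	int isread = !!(xs->flags & MOCK_DATA_IN);
	vr->xs = NULL;			/* slot is free again */
	return isread ? MOCK_DONE_READ : MOCK_DONE_WRITE;
}
```

Calling mock_poll_timeout() and then mock_req_done() on the same slot returns MOCK_STALE rather than dereferencing NULL. A guard like this would only hide the symptom; it says nothing about why a completion arrives for a slot with no transfer attached.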
Re: OpenBSD 7.2 on Oracle Cloud
On Fri, Apr 21, 2023 at 1:39 PM Aaron Mason wrote:
>
> On Fri, Apr 7, 2023 at 3:25 AM Antun Matanović
> wrote:
> >
> > On Thu, 6 Apr 2023 at 12:55, Fabio Martins wrote:
> > >
> > > Try to add an entry in grub like in this article:
> > >
> > > https://raby.sh/installing-openbsd-on-ovhs-vps-2016-kvm-machines.html
> >
> > I have tried that, but it did not resolve the issue. Sorry I forgot to
> > mention it originally.
> >
> > On Thu, 6 Apr 2023 at 14:24, Janne Johansson wrote:
> > >
> > > That is very much not the same issue. The arm64 instances on Oracle
> > > finds the correct kernel and boots it, it just crashes at or after the
> > > scsi attachment.
> >
> > This has been my experience as well, except on the amd64 instance,
> > haven't tried arm64.
> >
>
> Yeah I'm getting the same thing. Trying a build in QEMU and
> transferring in to see if that helps. Will report back.
>

Ok, good news, it still crashes at the same spot, but this time I've
got more data. Copying in tech@ - if I've forgotten anything let me
know and I'll fire up a fresh instance.

OpenBSD 7.3 (GENERIC.MP) #1125: Sat Mar 25 10:36:29 MDT 2023
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 1056817152 (1007MB)
avail mem = 1005449216 (958MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf7050 (9 entries)
bios0: vendor SeaBIOS version "?-20171121_152543-x86-ol7-builder-01.us.oracle.c
om-4.el7.1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: ACPI 1.0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD EPYC 7551 32-Core Processor, 1997.30 MHz, 17-01-02
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,C
FLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MO
VBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,
LONG,LAHF,CMPLEG,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,TOPEXT,CPCTR,FSGSBASE,TSC_AD
JUST,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,VIRTSSBD,XSAVEOPT,
XSAVEC,XGETBV1,XSAVES
cpu0: 64KB 64b/line 2-way D-cache, 64KB 64b/line 2-way I-cache, 512KB 64b/line 1
6-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1000MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD EPYC 7551 32-Core Processor, 3377.48 MHz, 17-01-02
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,C
FLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MO
VBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,
LONG,LAHF,CMPLEG,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,TOPEXT,CPCTR,FSGSBASE,TSC_AD
JUST,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,VIRTSSBD,XSAVEOPT,
XSAVEC,XGETBV1,XSAVES
cpu1: 64KB 64b/line 2-way D-cache, 64KB 64b/line 2-way I-cache, 512KB 64b/line 1
6-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu1: smt 0, core 0, package 1
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins
acpihpet0 at acpi0: 1 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0
acpicmos0 at acpi0
com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
com0: console
"QEMU0001" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
pvbus0 at mainbus0: KVM
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 w
ired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
uhci0 at pci0 dev 1 function 2 "Intel 82371SB USB" rev 0x01: apic 0 int 11
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
iic0 at piixpm0
vga1 at pci0 dev 2 function 0 "Bochs VGA" rev 0x02
wsdisplay at vga1 not configured
virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio0: address 02:00:17:00:21:c1
virtio0: msix shared
virtio1 at pci0 dev 4 function 0 "Qumranet Virtio SCSI" rev 0x00
vioscsi0 at virtio1: qsize 128
scsibus1 at vioscsi0: 255 targets
uvm_fault(0x82533360, 0x8, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at vioscsi_req_done+0x26: movl 0x8(%r15),%ebx
TID PID UID PRFLAGS PFLAGS CPU COMMAND
* 0 0 0 0x1 0x2000K swapper