Re: OpenBSD 7.2 on Oracle Cloud

2023-05-04 Thread Stefan Fritsch



On Thu, 4 May 2023, Aaron Mason wrote:

> On Mon, May 1, 2023 at 4:56 AM Stefan Fritsch  wrote:
> >
> > Hi,
> >
> > what qemu version are you using? I cannot reproduce this with qemu 7.2.
> > Can you try with a newer qemu?
> >
> > Cheers,
> > Stefan
> >
> 
> What is the host OS where you're running QEMU? Just looking to
> eliminate any variables at play here.

I run it on Linux. Yesterday I tried to debug why a SCSI command
could time out at all, but without success so far. I also have no idea why
it should behave differently on Windows and Linux. It is probably the
initial TEST_UNIT_READY command that goes wrong.
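
For readers unfamiliar with the command mentioned above: TEST UNIT READY is a 6-byte SCSI command with opcode 0x00 and no data phase. A minimal sketch of building such a command block follows; `DEMO_TEST_UNIT_READY` and `demo_build_tur` are hypothetical names chosen for illustration and are not taken from the driver.

```c
#include <stdint.h>
#include <string.h>

/* SCSI opcode for TEST UNIT READY (hypothetical macro name). */
#define DEMO_TEST_UNIT_READY 0x00

/*
 * Fill a 6-byte command descriptor block for TEST UNIT READY:
 * byte 0 is the opcode, bytes 1-4 are reserved, byte 5 is the
 * control byte. All of them are zero for this command.
 */
static void
demo_build_tur(uint8_t cdb[6])
{
	memset(cdb, 0, 6);
	cdb[0] = DEMO_TEST_UNIT_READY;
}
```

Because the command carries no payload, a timeout on it would point at the request/response path rather than at data transfer.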


Re: OpenBSD 7.2 on Oracle Cloud

2023-05-03 Thread Aaron Mason
On Mon, May 1, 2023 at 4:56 AM Stefan Fritsch  wrote:
>
> Hi,
>
> what qemu version are you using? I cannot reproduce this with qemu 7.2.
> Can you try with a newer qemu?
>
> Cheers,
> Stefan
>

What is the host OS where you're running QEMU? Just looking to
eliminate any variables at play here.

-- 
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse



Re: OpenBSD 7.2 on Oracle Cloud

2023-05-02 Thread Aaron Mason
On Tue, May 2, 2023 at 11:30 AM Aaron Mason  wrote:
>
> On Tue, May 2, 2023 at 9:29 AM Aaron Mason  wrote:
> > [REDACTED]
> > >
> > > The actual problem is here. One request times out, but the driver does
> > > not tell qemu that it should abort the request. The queue entry then
> > > gets reused and the two responses from qemu overwrite each other. You
> > > could try if increasing the timeout here to e.g. 1 helps:
> > >
> > >  if (ISSET(xs->flags, SCSI_POLL)) {
> > >  DPRINTF("vioscsi_scsi_cmd: polling...\n");
> > >  int timeout = 1000;
> > >  do {
> > >  virtio_poll_intr(vsc);
> > >  if (vr->vr_xs != xs)
> > >  break;
> > >  delay(1000);
> > >  } while (--timeout > 0);
> > >  if (vr->vr_xs == xs) {
> > >  // TODO(matthew): Abort the request.
> > >  xs->error = XS_TIMEOUT;
> > >  xs->resid = xs->datalen;
> > >  DPRINTF("vioscsi_scsi_cmd: polling timeout\n");
> > >  scsi_done(xs);
> > >  }
> > >
> > > > Unfortunately, in order to properly abort the request, quite a bit of
> > > > infrastructure related to the control queue is still missing in the
> > > > driver.
> >
> > I'll give it a go and report back, thanks!
> >
>
> No dice, it only takes longer to generate a page fault.
>
> --
> Aaron Mason - Programmer, open source addict
> I've taken my software vows - for beta or for worse

Increasing it to 4 has the same effect, only longer.

-- 
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse



Re: OpenBSD 7.2 on Oracle Cloud

2023-05-01 Thread Aaron Mason
On Tue, May 2, 2023 at 9:29 AM Aaron Mason  wrote:
> [REDACTED]
> >
> > The actual problem is here. One request times out, but the driver does
> > not tell qemu that it should abort the request. The queue entry then
> > gets reused and the two responses from qemu overwrite each other. You
> > could try if increasing the timeout here to e.g. 1 helps:
> >
> >  if (ISSET(xs->flags, SCSI_POLL)) {
> >  DPRINTF("vioscsi_scsi_cmd: polling...\n");
> >  int timeout = 1000;
> >  do {
> >  virtio_poll_intr(vsc);
> >  if (vr->vr_xs != xs)
> >  break;
> >  delay(1000);
> >  } while (--timeout > 0);
> >  if (vr->vr_xs == xs) {
> >  // TODO(matthew): Abort the request.
> >  xs->error = XS_TIMEOUT;
> >  xs->resid = xs->datalen;
> >  DPRINTF("vioscsi_scsi_cmd: polling timeout\n");
> >  scsi_done(xs);
> >  }
> >
> > > Unfortunately, in order to properly abort the request, quite a bit of
> > infrastructure related to the control queue is still missing in the driver.
>
> I'll give it a go and report back, thanks!
>

No dice, it only takes longer to generate a page fault.

-- 
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse



Re: OpenBSD 7.2 on Oracle Cloud

2023-05-01 Thread Aaron Mason
On Mon, May 1, 2023 at 11:32 PM Stefan Fritsch  wrote:
>
> Dropping misc@ from cc
>
> Am 01.05.23 um 02:08 schrieb Aaron Mason:
> > I can reproduce it with this in QEMU 8.0 in Winders (thanks Antun who
> > sent something like this to the bugs@ list):
> >
> > qemu-system-x86_64 -accel whpx,kernel-irqchip=off -machine q35 \
> > -cpu EPYC-Rome,-monitor -m 8g -smp 6,sockets=1,cores=6 \
> > -nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22 -vga virtio \
> > -drive if=virtio,file=miniroot73.img -device virtio-scsi-pci,id=scsi
>
> It is probably depending on timing and the timing on my machine is
> different.
>
> > The temporary workaround patch results in a booting system.
>
> I fear that just returning from vioscsi_req_done may cause data
> corruption sometimes.

Yeah I was concerned that such a blanket drop might result in, to
quote a certain G-Man, "unforeseen consequences" (mostly out of lack
of experience), hence why I stopped well short of declaring the
problem fixed.

>
> >>> I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel
> >>> with those drivers enabled, and got this:
> >>>
> >>> vioscsi0 at virtio1: qsize 128
> >>> scsibus0 at vioscsi0: 255 targets
> >>> vioscsi_req_get: 0xfd803f80d338
> >>> vioscsi_scsi_cmd: enter
> >>> vioscsi_scsi_cmd: polling...
> >>> vioscsi_scsi_cmd: polling timeout
>
> The actual problem is here. One request times out, but the driver does
> not tell qemu that it should abort the request. The queue entry then
> gets reused and the two responses from qemu overwrite each other. You
> could try if increasing the timeout here to e.g. 1 helps:
>
>  if (ISSET(xs->flags, SCSI_POLL)) {
>  DPRINTF("vioscsi_scsi_cmd: polling...\n");
>  int timeout = 1000;
>  do {
>  virtio_poll_intr(vsc);
>  if (vr->vr_xs != xs)
>  break;
>  delay(1000);
>  } while (--timeout > 0);
>  if (vr->vr_xs == xs) {
>  // TODO(matthew): Abort the request.
>  xs->error = XS_TIMEOUT;
>  xs->resid = xs->datalen;
>  DPRINTF("vioscsi_scsi_cmd: polling timeout\n");
>  scsi_done(xs);
>  }
>
> Unfortunately, in order to properly abort the request, quite a bit of
> infrastructure related to the control queue is still missing in the driver.

I'll give it a go and report back, thanks!

>
> >>> vioscsi_scsi_cmd: done (timeout=0)
> >>> vioscsi_scsi_cmd: enter
> >>> vioscsi_scsi_cmd: polling...
> >>> vioscsi_vq_done: enter
> >>> vioscsi_vq_done: slot=127
> >>> vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58
> >>> vioscsi_req_done: done 0, 2, 0
> >>> vioscsi_vq_done: slot=127
> >>> vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0
> >>> uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e
> >>> fatal page fault in supervisor mode
> >>> trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e
> >>> rsp 81606670
> >>> gsbase 0x813dfff0  kgsbase 0x0
> >>> panic: trap type 6, code=0, pc=810e6190
> >>>
> >>> That "xs: 0x0" bit feels like a clue. It should be trivial to pick up
> >>> and handle, but what would be the correct way to handle that?
> >>>
> >>> If I have it return if "xs" is found to be NULL, it continues - the
> >>> debugging suggests it goes through each possible target before
> >>> finishing up. I don't know if that's correct, but it seems to continue
> >>> booting after that even if my example didn't detect the drive with the
> >>> kernel I built (I used the RAMDISK kernel and it was pretty stripped
> >>> down).
> >>>
> >>> I'm about to attempt a -STABLE build (I've got 7.3 installed and thus
> >>> can't yet build a snapshot, but I will do that if this test succeeds)
> >>> - here's the patch that hopefully fixes the problem. (and hopefully
> >>> gmail doesn't clobber the tabs)
> >>>
> >>> Index: sys/dev/pv/vioscsi.c
> >>> ===
> >>> RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v
> >>> retrieving revision 1.30
> >>> diff -u -p -u -p -r1.30 vioscsi.c
> >>> --- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30
> >>> +++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 -
> >>> @@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s
> >>> struct scsi_xfer *xs = vr->vr_xs;
> >>> DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);
> >>>
> >>> + if (xs == NULL) return;
> >>> int isread = !!(xs->flags & SCSI_DATA_IN);
> >>> bus_dmamap_sync(vsc->sc_dmat, vr->vr_control,
> >>> offsetof(struct vioscsi_req, vr_req),
> >>>
> >>>
> >
> >
> >



-- 
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse



Re: OpenBSD 7.2 on Oracle Cloud

2023-05-01 Thread Stefan Fritsch

Dropping misc@ from cc

Am 01.05.23 um 02:08 schrieb Aaron Mason:

I can reproduce it with this in QEMU 8.0 in Winders (thanks Antun who
sent something like this to the bugs@ list):

qemu-system-x86_64 -accel whpx,kernel-irqchip=off -machine q35 \
-cpu EPYC-Rome,-monitor -m 8g -smp 6,sockets=1,cores=6 \
-nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22 -vga virtio \
-drive if=virtio,file=miniroot73.img -device virtio-scsi-pci,id=scsi


It probably depends on timing, and the timing on my machine is
different.



The temporary workaround patch results in a booting system.


I fear that just returning from vioscsi_req_done may cause data 
corruption sometimes.



I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel
with those drivers enabled, and got this:

vioscsi0 at virtio1: qsize 128
scsibus0 at vioscsi0: 255 targets
vioscsi_req_get: 0xfd803f80d338
vioscsi_scsi_cmd: enter
vioscsi_scsi_cmd: polling...
vioscsi_scsi_cmd: polling timeout


The actual problem is here. One request times out, but the driver does 
not tell qemu that it should abort the request. The queue entry then 
gets reused and the two responses from qemu overwrite each other. You 
could try if increasing the timeout here to e.g. 1 helps:


	if (ISSET(xs->flags, SCSI_POLL)) {
		DPRINTF("vioscsi_scsi_cmd: polling...\n");
		int timeout = 1000;
		do {
			virtio_poll_intr(vsc);
			if (vr->vr_xs != xs)
				break;
			delay(1000);
		} while (--timeout > 0);
		if (vr->vr_xs == xs) {
			// TODO(matthew): Abort the request.
			xs->error = XS_TIMEOUT;
			xs->resid = xs->datalen;
			DPRINTF("vioscsi_scsi_cmd: polling timeout\n");
			scsi_done(xs);
		}

Unfortunately, in order to properly abort the request, quite a bit of
infrastructure related to the control queue is still missing in the driver.
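
To make the sequence described above concrete, here is a minimal user-space sketch of one way to read it. It is not driver code: `demo_req`, `demo_xfer` and the `demo_*` functions are hypothetical stand-ins, and the exact order of events (including the slot being cleared on timeout) is an assumption based on the debug log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI transfer. */
struct demo_xfer {
	int error;	/* 0 = ok, 1 = timed out */
	int done;	/* set once the transfer has been completed */
};

/* Hypothetical stand-in for one request slot in the queue. */
struct demo_req {
	struct demo_xfer *xs;	/* transfer currently owning this slot */
};

/* Hand the slot to a transfer, like queueing a command. */
static void
demo_submit(struct demo_req *vr, struct demo_xfer *xs)
{
	vr->xs = xs;
}

/*
 * Polling gave up: complete the transfer and free the slot.
 * The device is never told to abort the command.
 */
static void
demo_poll_timeout(struct demo_req *vr)
{
	assert(vr->xs != NULL);
	vr->xs->error = 1;
	vr->xs->done = 1;
	vr->xs = NULL;
}

/*
 * Device completion for the slot. Returns 0 if a transfer was
 * completed, -1 if the slot had no owner (a stale completion).
 */
static int
demo_complete(struct demo_req *vr)
{
	struct demo_xfer *xs = vr->xs;

	if (xs == NULL)
		return -1;
	xs->done = 1;
	vr->xs = NULL;
	return 0;
}

/*
 * Replays the assumed sequence and returns the result of the
 * second completion delivered for the same slot.
 */
static int
demo_replay(void)
{
	struct demo_req slot = { NULL };
	struct demo_xfer first = { 0, 0 }, second = { 0, 0 };

	demo_submit(&slot, &first);
	demo_poll_timeout(&slot);	/* first command times out */
	demo_submit(&slot, &second);	/* slot is reused */
	(void)demo_complete(&slot);	/* late answer to the first command,
					 * wrongly credited to the second */
	return demo_complete(&slot);	/* answer to the second: no owner */
}
```

In this sketch `demo_replay()` returns -1, which corresponds to the second `vioscsi_req_done` line in the log showing `xs: 0x0`. The NULL check only hides the symptom: the late first answer has already been credited to the wrong transfer, which is why a silent return could plausibly corrupt data.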



vioscsi_scsi_cmd: done (timeout=0)
vioscsi_scsi_cmd: enter
vioscsi_scsi_cmd: polling...
vioscsi_vq_done: enter
vioscsi_vq_done: slot=127
vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58
vioscsi_req_done: done 0, 2, 0
vioscsi_vq_done: slot=127
vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0
uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e
rsp 81606670
gsbase 0x813dfff0  kgsbase 0x0
panic: trap type 6, code=0, pc=810e6190

That "xs: 0x0" bit feels like a clue. It should be trivial to pick up
and handle, but what would be the correct way to handle that?

If I have it return if "xs" is found to be NULL, it continues - the
debugging suggests it goes through each possible target before
finishing up. I don't know if that's correct, but it seems to continue
booting after that even if my example didn't detect the drive with the
kernel I built (I used the RAMDISK kernel and it was pretty stripped
down).

I'm about to attempt a -STABLE build (I've got 7.3 installed and thus
can't yet build a snapshot, but I will do that if this test succeeds)
- here's the patch that hopefully fixes the problem. (and hopefully
gmail doesn't clobber the tabs)

Index: sys/dev/pv/vioscsi.c
===
RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v
retrieving revision 1.30
diff -u -p -u -p -r1.30 vioscsi.c
--- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30
+++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 -
@@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s
struct scsi_xfer *xs = vr->vr_xs;
DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);

+ if (xs == NULL) return;
int isread = !!(xs->flags & SCSI_DATA_IN);
bus_dmamap_sync(vsc->sc_dmat, vr->vr_control,
offsetof(struct vioscsi_req, vr_req),










Re: OpenBSD 7.2 on Oracle Cloud

2023-04-30 Thread Aaron Mason
On Mon, May 1, 2023 at 10:08 AM Aaron Mason  wrote:
>
> I can reproduce it with this in QEMU 8.0 in Winders (thanks Antun who
> sent something like this to the bugs@ list):
>
> qemu-system-x86_64 -accel whpx,kernel-irqchip=off -machine q35 \
>-cpu EPYC-Rome,-monitor -m 8g -smp 6,sockets=1,cores=6 \
>-nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22 -vga virtio \
>-drive if=virtio,file=miniroot73.img -device virtio-scsi-pci,id=scsi
>
> The temporary workaround patch results in a booting system.
>

The same occurs in 7.2 under Winders.

> On Mon, May 1, 2023 at 4:56 AM Stefan Fritsch  wrote:
> >
> > Hi,
> >
> > what qemu version are you using? I cannot reproduce this with qemu 7.2.
> > Can you try with a newer qemu?
> >
> > Cheers,
> > Stefan
> >
> > Am 25.04.23 um 14:53 schrieb Aaron Mason:
> > [REDACTED]



-- 
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse



Re: OpenBSD 7.2 on Oracle Cloud

2023-04-30 Thread Aaron Mason
I can reproduce it with this in QEMU 8.0 in Winders (thanks Antun who
sent something like this to the bugs@ list):

qemu-system-x86_64 -accel whpx,kernel-irqchip=off -machine q35 \
   -cpu EPYC-Rome,-monitor -m 8g -smp 6,sockets=1,cores=6 \
   -nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22 -vga virtio \
   -drive if=virtio,file=miniroot73.img -device virtio-scsi-pci,id=scsi

The temporary workaround patch results in a booting system.

On Mon, May 1, 2023 at 4:56 AM Stefan Fritsch  wrote:
>
> Hi,
>
> what qemu version are you using? I cannot reproduce this with qemu 7.2.
> Can you try with a newer qemu?
>
> Cheers,
> Stefan
>
> Am 25.04.23 um 14:53 schrieb Aaron Mason:
>  Yeah I'm getting the same thing. Trying a build in QEMU and
>  transferring in to see if that helps. Will report back.
> 
> >>>
> >>> Ok, good news, it still crashes at the same spot, but this time I've
> >>> got more data. Copying in tech@ - if I've forgotten anything let me
> >>> know and I'll fire up a fresh instance.
> >>>
> >>> [REDACTED]
> >>> vioscsi_req_done(e,80024a00,fd803f81c338,e,80024a00,800
> >>> d3228) at vioscsi_req_done+0x26
> >>> [REDACTED]
> >>
> >> Ok, so based on the trace I got, I was able to trace the stop itself
> >> back to line 299 of vioscsi.c (thank. you. random relink. And
> >> anonymous CVS):
> >>
> >> 293  vioscsi_req_done(struct vioscsi_softc *sc, struct virtio_softc *vsc,
> >> 294  struct vioscsi_req *vr)
> >> 295  {
> >> 296  struct scsi_xfer *xs = vr->vr_xs;
> >> 297  DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);
> >> 298
> >> -->299  int isread = !!(xs->flags & SCSI_DATA_IN);
> >> 300  bus_dmamap_sync(vsc->sc_dmat, vr->vr_control,
> >> 301  offsetof(struct vioscsi_req, vr_req),
> >> 302  sizeof(struct virtio_scsi_req_hdr),
> >> 303  BUS_DMASYNC_POSTWRITE);
> >>
> >> Maybe if I follow the rabbit hole enough, I might find out what's
> >> going wrong between the driver and OCI. I've got a day off tomorrow
> >> (yay for war I guess), I'll give it a bash and see where we end up.
> >>
> >> --
> >> Aaron Mason - Programmer, open source addict
> >> I've taken my software vows - for beta or for worse
> >
> > I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel
> > with those drivers enabled, and got this:
> >
> > vioscsi0 at virtio1: qsize 128
> > scsibus0 at vioscsi0: 255 targets
> > vioscsi_req_get: 0xfd803f80d338
> > vioscsi_scsi_cmd: enter
> > vioscsi_scsi_cmd: polling...
> > vioscsi_scsi_cmd: polling timeout
> > vioscsi_scsi_cmd: done (timeout=0)
> > vioscsi_scsi_cmd: enter
> > vioscsi_scsi_cmd: polling...
> > vioscsi_vq_done: enter
> > vioscsi_vq_done: slot=127
> > vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58
> > vioscsi_req_done: done 0, 2, 0
> > vioscsi_vq_done: slot=127
> > vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0
> > uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e
> > fatal page fault in supervisor mode
> > trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e
> > rsp 81606670
> > gsbase 0x813dfff0  kgsbase 0x0
> > panic: trap type 6, code=0, pc=810e6190
> >
> > That "xs: 0x0" bit feels like a clue. It should be trivial to pick up
> > and handle, but what would be the correct way to handle that?
> >
> > If I have it return if "xs" is found to be NULL, it continues - the
> > debugging suggests it goes through each possible target before
> > finishing up. I don't know if that's correct, but it seems to continue
> > booting after that even if my example didn't detect the drive with the
> > kernel I built (I used the RAMDISK kernel and it was pretty stripped
> > down).
> >
> > I'm about to attempt a -STABLE build (I've got 7.3 installed and thus
> > can't yet build a snapshot, but I will do that if this test succeeds)
> > - here's the patch that hopefully fixes the problem. (and hopefully
> > gmail doesn't clobber the tabs)
> >
> > Index: sys/dev/pv/vioscsi.c
> > ===
> > RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v
> > retrieving revision 1.30
> > diff -u -p -u -p -r1.30 vioscsi.c
> > --- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30
> > +++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 -
> > @@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s
> >struct scsi_xfer *xs = vr->vr_xs;
> >DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);
> >
> > + if (xs == NULL) return;
> >int isread = !!(xs->flags & SCSI_DATA_IN);
> >bus_dmamap_sync(vsc->sc_dmat, vr->vr_control,
> >offsetof(struct vioscsi_req, vr_req),
> >
> >



-- 
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse



Re: OpenBSD 7.2 on Oracle Cloud

2023-04-30 Thread Stefan Fritsch

Hi,

what qemu version are you using? I cannot reproduce this with qemu 7.2. 
Can you try with a newer qemu?


Cheers,
Stefan

Am 25.04.23 um 14:53 schrieb Aaron Mason:

Yeah I'm getting the same thing. Trying a build in QEMU and
transferring in to see if that helps. Will report back.



Ok, good news, it still crashes at the same spot, but this time I've
got more data. Copying in tech@ - if I've forgotten anything let me
know and I'll fire up a fresh instance.

[REDACTED]
vioscsi_req_done(e,80024a00,fd803f81c338,e,80024a00,800
d3228) at vioscsi_req_done+0x26
[REDACTED]


Ok, so based on the trace I got, I was able to trace the stop itself
back to line 299 of vioscsi.c (thank. you. random relink. And
anonymous CVS):

293  vioscsi_req_done(struct vioscsi_softc *sc, struct virtio_softc *vsc,
294  struct vioscsi_req *vr)
295  {
296  struct scsi_xfer *xs = vr->vr_xs;
297  DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);
298
-->299  int isread = !!(xs->flags & SCSI_DATA_IN);
300  bus_dmamap_sync(vsc->sc_dmat, vr->vr_control,
301  offsetof(struct vioscsi_req, vr_req),
302  sizeof(struct virtio_scsi_req_hdr),
303  BUS_DMASYNC_POSTWRITE);

Maybe if I follow the rabbit hole enough, I might find out what's
going wrong between the driver and OCI. I've got a day off tomorrow
(yay for war I guess), I'll give it a bash and see where we end up.

--
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse


I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel
with those drivers enabled, and got this:

vioscsi0 at virtio1: qsize 128
scsibus0 at vioscsi0: 255 targets
vioscsi_req_get: 0xfd803f80d338
vioscsi_scsi_cmd: enter
vioscsi_scsi_cmd: polling...
vioscsi_scsi_cmd: polling timeout
vioscsi_scsi_cmd: done (timeout=0)
vioscsi_scsi_cmd: enter
vioscsi_scsi_cmd: polling...
vioscsi_vq_done: enter
vioscsi_vq_done: slot=127
vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58
vioscsi_req_done: done 0, 2, 0
vioscsi_vq_done: slot=127
vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0
uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e
rsp 81606670
gsbase 0x813dfff0  kgsbase 0x0
panic: trap type 6, code=0, pc=810e6190

That "xs: 0x0" bit feels like a clue. It should be trivial to pick up
and handle, but what would be the correct way to handle that?

If I have it return if "xs" is found to be NULL, it continues - the
debugging suggests it goes through each possible target before
finishing up. I don't know if that's correct, but it seems to continue
booting after that even if my example didn't detect the drive with the
kernel I built (I used the RAMDISK kernel and it was pretty stripped
down).

I'm about to attempt a -STABLE build (I've got 7.3 installed and thus
can't yet build a snapshot, but I will do that if this test succeeds)
- here's the patch that hopefully fixes the problem. (and hopefully
gmail doesn't clobber the tabs)

Index: sys/dev/pv/vioscsi.c
===
RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v
retrieving revision 1.30
diff -u -p -u -p -r1.30 vioscsi.c
--- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30
+++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 -
@@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s
   struct scsi_xfer *xs = vr->vr_xs;
   DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);

+ if (xs == NULL) return;
   int isread = !!(xs->flags & SCSI_DATA_IN);
   bus_dmamap_sync(vsc->sc_dmat, vr->vr_control,
   offsetof(struct vioscsi_req, vr_req),






Re: OpenBSD 7.2 on Oracle Cloud

2023-04-27 Thread Aaron Mason
On Tue, Apr 25, 2023 at 10:53 PM Aaron Mason  wrote:
>
> On Mon, Apr 24, 2023 at 3:47 PM Aaron Mason  wrote:
> >
> > > On Fri, Apr 21, 2023 at 2:50 PM Aaron Mason wrote:
> > > >
> > > > On Fri, Apr 21, 2023 at 1:39 PM Aaron Mason wrote:
> > > >
> > > > On Fri, Apr 7, 2023 at 3:25 AM Antun Matanović
> > > >  wrote:
> > > > >
> > > > > On Thu, 6 Apr 2023 at 12:55, Fabio Martins  wrote:
> > > > > >
> > > > > > Try to add an entry in grub like in this article:
> > > > > >
> > > > > > https://raby.sh/installing-openbsd-on-ovhs-vps-2016-kvm-machines.html
> > > > >
> > > > > I have tried that, but it did not resolve the issue. Sorry I forgot to
> > > > > mention it originally.
> > > > >
> > > > > > On Thu, 6 Apr 2023 at 14:24, Janne Johansson wrote:
> > > > > > >
> > > > > > > That is very much not the same issue. The arm64 instances on Oracle
> > > > > > > find the correct kernel and boot it, it just crashes at or after the
> > > > > > > scsi attachment.
> > > > >
> > > > > This has been my experience as well, except on the amd64 instance,
> > > > > haven't tried arm64.
> > > > >
> > > >
> > > > Yeah I'm getting the same thing. Trying a build in QEMU and
> > > > transferring in to see if that helps. Will report back.
> > > >
> > >
> > > Ok, good news, it still crashes at the same spot, but this time I've
> > > got more data. Copying in tech@ - if I've forgotten anything let me
> > > know and I'll fire up a fresh instance.
> > >
> > > [REDACTED]
> > > vioscsi_req_done(e,80024a00,fd803f81c338,e,80024a00,800
> > > d3228) at vioscsi_req_done+0x26
> > > [REDACTED]
> >
> > Ok, so based on the trace I got, I was able to trace the stop itself
> > back to line 299 of vioscsi.c (thank. you. random relink. And
> > anonymous CVS):
> >
> >293  vioscsi_req_done(struct vioscsi_softc *sc, struct virtio_softc *vsc,
> >294  struct vioscsi_req *vr)
> >295  {
> >296  struct scsi_xfer *xs = vr->vr_xs;
> >297  DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);
> >298
> > -->299  int isread = !!(xs->flags & SCSI_DATA_IN);
> >300  bus_dmamap_sync(vsc->sc_dmat, vr->vr_control,
> >301  offsetof(struct vioscsi_req, vr_req),
> >302  sizeof(struct virtio_scsi_req_hdr),
> >303  BUS_DMASYNC_POSTWRITE);
> >
> > Maybe if I follow the rabbit hole enough, I might find out what's
> > going wrong between the driver and OCI. I've got a day off tomorrow
> > (yay for war I guess), I'll give it a bash and see where we end up.
> >
> > --
> > Aaron Mason - Programmer, open source addict
> > I've taken my software vows - for beta or for worse
>
> I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel
> with those drivers enabled, and got this:
>
> vioscsi0 at virtio1: qsize 128
> scsibus0 at vioscsi0: 255 targets
> vioscsi_req_get: 0xfd803f80d338
> vioscsi_scsi_cmd: enter
> vioscsi_scsi_cmd: polling...
> vioscsi_scsi_cmd: polling timeout
> vioscsi_scsi_cmd: done (timeout=0)
> vioscsi_scsi_cmd: enter
> vioscsi_scsi_cmd: polling...
> vioscsi_vq_done: enter
> vioscsi_vq_done: slot=127
> vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58
> vioscsi_req_done: done 0, 2, 0
> vioscsi_vq_done: slot=127
> vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0
> uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e
> fatal page fault in supervisor mode
> trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e
> rsp 81606670
> gsbase 0x813dfff0  kgsbase 0x0
> panic: trap type 6, code=0, pc=810e6190
>
> That "xs: 0x0" bit feels like a clue. It should be trivial to pick up
> and handle, but what would be the correct way to handle that?
>
> If I have it return if "xs" is found to be NULL, it continues - the
> debugging suggests it goes through each possible target before
> finishing up. I don't know if that's correct, but it seems to continue
> booting after that even if my example didn't detect the drive with the
> kernel I built (I used the RAMDISK kernel and it was pretty stripped
> down).
>
> I'm about to attempt a -STABLE build (I've got 7.3 installed and thus
> can't yet build a snapshot, but I will do that if this test succeeds)
> - here's the patch that hopefully fixes the problem. (and hopefully
> gmail doesn't clobber the tabs)
>
> Index: sys/dev/pv/vioscsi.c
> ===
> RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v
> retrieving revision 1.30
> diff -u -p -u -p -r1.30 vioscsi.c
> --- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30
> +++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 -
> @@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s
>   struct scsi_xfer *xs = vr->vr_xs;
>   DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);
>
> + if (xs == NULL) return;
>   int isread = !!(xs->flags & SCSI_DATA_IN);
>   bus_dmamap_sync(vsc->sc_dmat, 

Re: OpenBSD 7.2 on Oracle Cloud

2023-04-25 Thread Aaron Mason
On Mon, Apr 24, 2023 at 3:47 PM Aaron Mason  wrote:
>
> On Fri, Apr 21, 2023 at 2:50 PM Aaron Mason  wrote:
> >
> > > On Fri, Apr 21, 2023 at 1:39 PM Aaron Mason wrote:
> > >
> > > On Fri, Apr 7, 2023 at 3:25 AM Antun Matanović
> > >  wrote:
> > > >
> > > > On Thu, 6 Apr 2023 at 12:55, Fabio Martins  wrote:
> > > > >
> > > > > Try to add an entry in grub like in this article:
> > > > >
> > > > > https://raby.sh/installing-openbsd-on-ovhs-vps-2016-kvm-machines.html
> > > >
> > > > I have tried that, but it did not resolve the issue. Sorry I forgot to
> > > > mention it originally.
> > > >
> > > > > On Thu, 6 Apr 2023 at 14:24, Janne Johansson wrote:
> > > > > >
> > > > > > That is very much not the same issue. The arm64 instances on Oracle
> > > > > > find the correct kernel and boot it, it just crashes at or after the
> > > > > > scsi attachment.
> > > >
> > > > This has been my experience as well, except on the amd64 instance,
> > > > haven't tried arm64.
> > > >
> > >
> > > Yeah I'm getting the same thing. Trying a build in QEMU and
> > > transferring in to see if that helps. Will report back.
> > >
> >
> > Ok, good news, it still crashes at the same spot, but this time I've
> > got more data. Copying in tech@ - if I've forgotten anything let me
> > know and I'll fire up a fresh instance.
> >
> > [REDACTED]
> > vioscsi_req_done(e,80024a00,fd803f81c338,e,80024a00,800
> > d3228) at vioscsi_req_done+0x26
> > [REDACTED]
>
> Ok, so based on the trace I got, I was able to trace the stop itself
> back to line 299 of vioscsi.c (thank. you. random relink. And
> anonymous CVS):
>
>293  vioscsi_req_done(struct vioscsi_softc *sc, struct virtio_softc *vsc,
>294  struct vioscsi_req *vr)
>295  {
>296  struct scsi_xfer *xs = vr->vr_xs;
>297  DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);
>298
> -->299  int isread = !!(xs->flags & SCSI_DATA_IN);
>300  bus_dmamap_sync(vsc->sc_dmat, vr->vr_control,
>301  offsetof(struct vioscsi_req, vr_req),
>302  sizeof(struct virtio_scsi_req_hdr),
>303  BUS_DMASYNC_POSTWRITE);
>
> Maybe if I follow the rabbit hole enough, I might find out what's
> going wrong between the driver and OCI. I've got a day off tomorrow
> (yay for war I guess), I'll give it a bash and see where we end up.
>
> --
> Aaron Mason - Programmer, open source addict
> I've taken my software vows - for beta or for worse

I enabled debugging on the vioscsi driver, rebuilt the RAMDISK kernel
with those drivers enabled, and got this:

vioscsi0 at virtio1: qsize 128
scsibus0 at vioscsi0: 255 targets
vioscsi_req_get: 0xfd803f80d338
vioscsi_scsi_cmd: enter
vioscsi_scsi_cmd: polling...
vioscsi_scsi_cmd: polling timeout
vioscsi_scsi_cmd: done (timeout=0)
vioscsi_scsi_cmd: enter
vioscsi_scsi_cmd: polling...
vioscsi_vq_done: enter
vioscsi_vq_done: slot=127
vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0xfd803f8a5e58
vioscsi_req_done: done 0, 2, 0
vioscsi_vq_done: slot=127
vioscsi_req_done: enter vr: 0xfd803f80d338 xs: 0x0
uvm_fault(0x813ec2e0, 0x8, 0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 810e6190 cs 8 rflags 10286 cr2 8 cpl e
rsp 81606670
gsbase 0x813dfff0  kgsbase 0x0
panic: trap type 6, code=0, pc=810e6190

That "xs: 0x0" bit feels like a clue. It should be trivial to pick up
and handle, but what would be the correct way to handle that?
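
The fault address in the log (`cr2 8`) is consistent with reading a structure member at offset 8 through a NULL pointer, which is what `xs->flags` would be if `flags` follows a single leading pointer. A minimal sketch of that arithmetic, assuming a simplified layout: `demo_xfer` is a hypothetical stand-in, and the real `struct scsi_xfer` has more members and may be laid out differently.

```c
#include <stddef.h>

/* Hypothetical layout for illustration only. */
struct demo_xfer {
	struct demo_xfer *next;	/* stand-in for a leading list-linkage pointer */
	int flags;		/* the member read on the faulting line */
};

/*
 * Address a load of xs->flags would touch when xs is NULL:
 * simply the offset of the member within the structure.
 */
static size_t
demo_null_flags_addr(void)
{
	return offsetof(struct demo_xfer, flags);
}
```

On a 64-bit platform the leading pointer occupies 8 bytes, so under this assumed layout the faulting address would be 0x8, matching the `uvm_fault` line.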

If I have it return if "xs" is found to be NULL, it continues - the
debugging suggests it goes through each possible target before
finishing up. I don't know if that's correct, but it seems to continue
booting after that even if my example didn't detect the drive with the
kernel I built (I used the RAMDISK kernel and it was pretty stripped
down).

I'm about to attempt a -STABLE build (I've got 7.3 installed and thus
can't yet build a snapshot, but I will do that if this test succeeds)
- here's the patch that hopefully fixes the problem. (and hopefully
gmail doesn't clobber the tabs)

Index: sys/dev/pv/vioscsi.c
===================================================================
RCS file: /cvs/src/sys/dev/pv/vioscsi.c,v
retrieving revision 1.30
diff -u -p -u -p -r1.30 vioscsi.c
--- sys/dev/pv/vioscsi.c 16 Apr 2022 19:19:59 - 1.30
+++ sys/dev/pv/vioscsi.c 25 Apr 2023 12:51:16 -
@@ -296,6 +296,7 @@ vioscsi_req_done(struct vioscsi_softc *s
  struct scsi_xfer *xs = vr->vr_xs;
  DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);

+ if (xs == NULL) return;
  int isread = !!(xs->flags & SCSI_DATA_IN);
  bus_dmamap_sync(vsc->sc_dmat, vr->vr_control,
  offsetof(struct vioscsi_req, vr_req),


-- 
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse



Re: OpenBSD 7.2 on Oracle Cloud

2023-04-23 Thread Aaron Mason
On Fri, Apr 21, 2023 at 2:50 PM Aaron Mason  wrote:
>
> On Fri, Apr 21, 2023 at 1:39 PM Aaron Mason  wrote:
> >
> > On Fri, Apr 7, 2023 at 3:25 AM Antun Matanović
> >  wrote:
> > >
> > > On Thu, 6 Apr 2023 at 12:55, Fabio Martins  wrote:
> > > >
> > > > Try to add an entry in grub like in this article:
> > > >
> > > > https://raby.sh/installing-openbsd-on-ovhs-vps-2016-kvm-machines.html
> > >
> > > I have tried that, but it did not resolve the issue. Sorry I forgot to
> > > mention it originally.
> > >
> > > On Thu, 6 Apr 2023 at 14:24, Janne Johansson  wrote:
> > > >
> > > > That is very much not the same issue. The arm64 instances on Oracle
> > > > find the correct kernel and boot it, it just crashes at or after the
> > > > scsi attachment.
> > >
> > > This has been my experience as well, except on the amd64 instance,
> > > haven't tried arm64.
> > >
> >
> > Yeah I'm getting the same thing. Trying a build in QEMU and
> > transferring in to see if that helps. Will report back.
> >
>
> Ok, good news, it still crashes at the same spot, but this time I've
> got more data. Copying in tech@ - if I've forgotten anything let me
> know and I'll fire up a fresh instance.
>
> [REDACTED]
> vioscsi_req_done(e,80024a00,fd803f81c338,e,80024a00,800d3228) at vioscsi_req_done+0x26
> [REDACTED]

Ok, so based on the trace I got, I was able to trace the stop itself
back to line 299 of vioscsi.c (thank. you. random relink. And
anonymous CVS):

   293  vioscsi_req_done(struct vioscsi_softc *sc, struct virtio_softc *vsc,
   294  struct vioscsi_req *vr)
   295  {
   296  struct scsi_xfer *xs = vr->vr_xs;
   297  DPRINTF("vioscsi_req_done: enter vr: %p xs: %p\n", vr, xs);
   298
-->299  int isread = !!(xs->flags & SCSI_DATA_IN);
   300  bus_dmamap_sync(vsc->sc_dmat, vr->vr_control,
   301  offsetof(struct vioscsi_req, vr_req),
   302  sizeof(struct virtio_scsi_req_hdr),
   303  BUS_DMASYNC_POSTWRITE);

Maybe if I follow the rabbit hole enough, I might find out what's
going wrong between the driver and OCI. I've got a day off tomorrow
(yay for war I guess), I'll give it a bash and see where we end up.

-- 
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse



Re: OpenBSD 7.2 on Oracle Cloud

2023-04-20 Thread Aaron Mason
On Fri, Apr 21, 2023 at 1:39 PM Aaron Mason  wrote:
>
> On Fri, Apr 7, 2023 at 3:25 AM Antun Matanović
>  wrote:
> >
> > On Thu, 6 Apr 2023 at 12:55, Fabio Martins  wrote:
> > >
> > > Try to add an entry in grub like in this article:
> > >
> > > https://raby.sh/installing-openbsd-on-ovhs-vps-2016-kvm-machines.html
> >
> > I have tried that, but it did not resolve the issue. Sorry I forgot to
> > mention it originally.
> >
> > On Thu, 6 Apr 2023 at 14:24, Janne Johansson  wrote:
> > >
> > > That is very much not the same issue. The arm64 instances on Oracle
> > > finds the correct kernel and boots it, it just crashes at or after the
> > > scsi attachment.
> >
> > This has been my experience as well, except on the amd64 instance,
> > haven't tried arm64.
> >
>
> Yeah I'm getting the same thing. Trying a build in QEMU and
> transferring in to see if that helps. Will report back.
>

Ok, good news, it still crashes at the same spot, but this time I've
got more data. Copying in tech@ - if I've forgotten anything let me
know and I'll fire up a fresh instance.

OpenBSD 7.3 (GENERIC.MP) #1125: Sat Mar 25 10:36:29 MDT 2023
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 1056817152 (1007MB)
avail mem = 1005449216 (958MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf7050 (9 entries)
bios0: vendor SeaBIOS version "?-20171121_152543-x86-ol7-builder-01.us.oracle.c
om-4.el7.1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: ACPI 1.0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD EPYC 7551 32-Core Processor, 1997.30 MHz, 17-01-02
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,C
FLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MO
VBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,
LONG,LAHF,CMPLEG,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,TOPEXT,CPCTR,FSGSBASE,TSC_AD
JUST,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,VIRTSSBD,XSAVEOPT,
XSAVEC,XGETBV1,XSAVES
cpu0: 64KB 64b/line 2-way D-cache, 64KB 64b/line 2-way I-cache, 512KB 64b/line 1
6-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1000MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD EPYC 7551 32-Core Processor, 3377.48 MHz, 17-01-02
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,C
FLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MO
VBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,
LONG,LAHF,CMPLEG,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,TOPEXT,CPCTR,FSGSBASE,TSC_AD
JUST,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,VIRTSSBD,XSAVEOPT,
XSAVEC,XGETBV1,XSAVES
cpu1: 64KB 64b/line 2-way D-cache, 64KB 64b/line 2-way I-cache, 512KB 64b/line 1
6-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu1: smt 0, core 0, package 1
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins
acpihpet0 at acpi0: 1 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0
acpicmos0 at acpi0
com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
com0: console
"QEMU0001" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
pvbus0 at mainbus0: KVM
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 w
ired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
uhci0 at pci0 dev 1 function 2 "Intel 82371SB USB" rev 0x01: apic 0 int 11
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
iic0 at piixpm0
vga1 at pci0 dev 2 function 0 "Bochs VGA" rev 0x02
wsdisplay at vga1 not configured
virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio0: address 02:00:17:00:21:c1
virtio0: msix shared
virtio1 at pci0 dev 4 function 0 "Qumranet Virtio SCSI" rev 0x00
vioscsi0 at virtio1: qsize 128
scsibus1 at vioscsi0: 255 targets
uvm_fault(0x82533360, 0x8, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      vioscsi_req_done+0x26:  movl    0x8(%r15),%ebx
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
* 0  0  0 0x1  0x2000K swapper