Re: ldapd(8) hangs when receiving large data

2017-05-15 Thread Alexander Bluhm
On Fri, May 05, 2017 at 06:05:28PM +, Seiya Kawashima wrote:
> I'm not quite sure if this is the right way to fix the issue, but it looks
> like this issue is related to how ldapd(8) buffers LDAP messages from the
> client.

Thanks for the analysis.  I copied the code from libevent to
syslogd.  While adapting it to TLS, I did not consider that
ioctl(FIONREAD) does not make sense for TLS.  The code was then
copied on to ldapd without the problem being noticed.

I would like to keep this as much as possible in sync with libevent.
So just remove the problematic code in ldapd(8) and syslogd(8).

ok?

bluhm

Index: usr.sbin/ldapd/evbuffer_tls.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/usr.sbin/ldapd/evbuffer_tls.c,v
retrieving revision 1.2
diff -u -p -r1.2 evbuffer_tls.c
--- usr.sbin/ldapd/evbuffer_tls.c   3 Mar 2017 20:26:23 -   1.2
+++ usr.sbin/ldapd/evbuffer_tls.c   15 May 2017 22:32:32 -
@@ -298,21 +298,6 @@ evtls_read(struct evbuffer *buf, int fd,
 	size_t oldoff = buf->off;
 	int n = EVBUFFER_MAX_READ;
 
-	if (ioctl(fd, FIONREAD, &n) == -1 || n <= 0) {
-		n = EVBUFFER_MAX_READ;
-	} else if (n > EVBUFFER_MAX_READ && n > howmuch) {
-		/*
-		 * It's possible that a lot of data is available for
-		 * reading.  We do not want to exhaust resources
-		 * before the reader has a chance to do something
-		 * about it.  If the reader does not tell us how much
-		 * data we should read, we artifically limit it.
-		 */
-		if ((size_t)n > buf->totallen << 2)
-			n = buf->totallen << 2;
-		if (n < EVBUFFER_MAX_READ)
-			n = EVBUFFER_MAX_READ;
-	}
 	if (howmuch < 0 || howmuch > n)
 		howmuch = n;
 
Index: usr.sbin/syslogd/evbuffer_tls.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/usr.sbin/syslogd/evbuffer_tls.c,v
retrieving revision 1.10
diff -u -p -r1.10 evbuffer_tls.c
--- usr.sbin/syslogd/evbuffer_tls.c 3 Mar 2017 20:26:23 -   1.10
+++ usr.sbin/syslogd/evbuffer_tls.c 15 May 2017 22:32:15 -
@@ -298,21 +298,6 @@ evtls_read(struct evbuffer *buf, int fd,
 	size_t oldoff = buf->off;
 	int n = EVBUFFER_MAX_READ;
 
-	if (ioctl(fd, FIONREAD, &n) == -1 || n <= 0) {
-		n = EVBUFFER_MAX_READ;
-	} else if (n > EVBUFFER_MAX_READ && n > howmuch) {
-		/*
-		 * It's possible that a lot of data is available for
-		 * reading.  We do not want to exhaust resources
-		 * before the reader has a chance to do something
-		 * about it.  If the reader does not tell us how much
-		 * data we should read, we artifically limit it.
-		 */
-		if ((size_t)n > buf->totallen << 2)
-			n = buf->totallen << 2;
-		if (n < EVBUFFER_MAX_READ)
-			n = EVBUFFER_MAX_READ;
-	}
 	if (howmuch < 0 || howmuch > n)
 		howmuch = n;
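
To illustrate why the FIONREAD hint is misleading on a TLS socket, here
is a minimal sketch in plain C (it is not the committed code).  FIONREAD
reports how many raw bytes sit in the kernel socket buffer.  On a TLS
connection those bytes are encrypted records with their own headers and
MACs, and plaintext that the TLS library has already decrypted and
buffered is not counted at all, so the number says little about how much
a TLS read can return.  The function names and the SKETCH_MAX_READ value
of 4096 are assumptions made for this example, and the demo uses an
ordinary socketpair rather than a real TLS connection.

```c
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumed stand-in for EVBUFFER_MAX_READ; the value is illustrative. */
#define SKETCH_MAX_READ	4096

/* Raw bytes queued in the kernel socket buffer for fd, or -1 on error. */
int
raw_bytes_pending(int fd)
{
	int n = 0;

	if (ioctl(fd, FIONREAD, &n) == -1)
		return -1;
	return n;
}

/*
 * Read size without the FIONREAD hint: the caller's request, capped
 * at a fixed maximum.  A negative request means "no preference".
 */
int
bounded_read_size(int howmuch)
{
	int n = SKETCH_MAX_READ;

	if (howmuch < 0 || howmuch > n)
		howmuch = n;
	return howmuch;
}

/* Queue len bytes on a socketpair and report what FIONREAD sees. */
int
demo_pending_after_write(size_t len)
{
	char buf[256] = { 0 };
	int sv[2], n;

	if (len > sizeof(buf))
		return -1;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	if (write(sv[0], buf, len) != (ssize_t)len) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	n = raw_bytes_pending(sv[1]);
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

On a plain socket, demo_pending_after_write(100) reports the 100 bytes
just written, which is exactly the kind of wire-level count that does
not translate to TLS plaintext.  After the diff above, what remains in
evtls_read() amounts to the clamp in bounded_read_size().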
 



Re: Kernel panic on 6.1: init dies under load

2017-05-15 Thread Mike Belopuhov
On Mon, May 15, 2017 at 11:45 -0400, Dan Cross wrote:
> On Mon, May 15, 2017 at 11:28 AM, Mike Belopuhov wrote:
> 
> > On Mon, May 15, 2017 at 11:18 -0400, Dan Cross wrote:
> > > On Mon, May 15, 2017 at 11:01 AM, Mike Belopuhov wrote:
> > > >
> > > > Thanks for reporting this, however there's not enough info to follow
> > > > up on this right now.  What is clear is that your provider is using
> > > > an ancient version of Xen that doesn't even support the callback
> > > > vector interrupt delivery (the emulated xspd0 device is delivering
> > > > all interrupts).  We have developed code for Xen 4.5+ platforms and
> > > > there was only some testing done by users on 3.x.  So, in a way, you
> > > > can consider Xen 3.x to not be officially supported at this point.
> > >
> > > > That's unfortunate. Sadly, this is common across two different providers
> > > > (Panix and rootbsd.net). The latter, I'm sure, would at least be interested
> > > > in coordinating with you guys to get a fix. I'll open a trouble ticket with
> > > > them.
> > >
> > > Having said that, I've got a few questions:
> > > >
> > > >  - Do you see other write failures as well?
> > >
> > > Yes. E.g, syslogd had a similar write failure before panic.
> >
> > Can you reproduce any of these write failures at will?
> >
> 
> I'm not sure what you mean. If I induce the load conditions, then the VM
> will panic fairly reliably.
>

I was wondering if you have seen any other write errors apart
from those that cause the panic.

> > What happens when you just send a signal to dump the core?
> > You can test this by running "sleep 100", and then call
> > "pkill -ABRT -lf sleep".
> 
> 
> I'm not sure what this shows, but sure I can do that:
>

There are quite a number of different I/O codepaths in the
kernel and some are wonkier than others.

> : jaan; /bin/sleep 100&
> [1] 20701
> : jaan; pkill -ABRT -lf sleep
> 20701 sleep
> : jaan;
> [1]  + abort (core dumped)  /bin/sleep 100
> : jaan; ls -l sleep.core
> -rw-------  1 cross  staff  4208416 May 15 15:42 sleep.core
> : jaan;
> 
> The panic-inducing condition seems to be that, for whatever reason, the
> kernel gets into a funny state where processes like init(8) die due to
> having part of their VM image corrupted; the kernel then panics because
> `init` dies.
> 
> >  - Do you have swap enabled? (pstat -s)
> > >
> > >
> > > Yes; a gig:
> > >
> > > : jaan; pstat -s
> > > Device      1K-blocks     Used    Avail Capacity  Priority
> > > /dev/sd0b     1048249        0  1048249     0%    0
> > > : jaan;
> > >
> >
> > Do you see swap being used under your load?
> 
> 
> I'm not sure. I can try to crash a machine again and poke at a kernel
> var from ddb to see; anything in particular you want me to look at?
>

Indeed.  You can run a "show uvmexp" DDB command.

Please try running with the diff below.  It will log all polled
and bounced transfers as well as some additional info.



diff --git sys/dev/pv/xbf.c sys/dev/pv/xbf.c
index d5c44770acb..29e7615d0fc 100644
--- sys/dev/pv/xbf.c
+++ sys/dev/pv/xbf.c
@@ -36,11 +36,11 @@
 #include 
 #include 
 #include 
 #include 
 
-/* #define XBF_DEBUG */
+#define XBF_DEBUG
 
 #ifdef XBF_DEBUG
 #define DPRINTF(x...)  printf(x)
 #else
 #define DPRINTF(x...)
@@ -478,10 +478,11 @@ xbf_load_xs(struct scsi_xfer *xs, int desc)
sge->sge_first = i > 0 ? 0 :
((vaddr_t)xs->data & PAGE_MASK) >> XBF_SEC_SHIFT;
sge->sge_last = sge->sge_first +
(map->dm_segs[i].ds_len >> XBF_SEC_SHIFT) - 1;
 
+   if (ISSET(xs->flags, SCSI_POLL))
DPRINTF("%s:   seg %d/%d ref %lu len %lu first %u last %u\n",
sc->sc_dev.dv_xname, i + 1, map->dm_nsegs,
map->dm_segs[i].ds_addr, map->dm_segs[i].ds_len,
sge->sge_first, sge->sge_last);
 
@@ -640,10 +641,11 @@ xbf_submit_cmd(struct scsi_xfer *xs)
xrd->xrd_req.req_op = operation;
xrd->xrd_req.req_unit = (uint16_t)sc->sc_unit;
xrd->xrd_req.req_sector = lba;
 
if (operation == XBF_OP_READ || operation == XBF_OP_WRITE) {
+   if (ISSET(xs->flags, SCSI_POLL))
DPRINTF("%s: desc %d %s%s lba %llu nsec %u len %d\n",
sc->sc_dev.dv_xname, desc, operation == XBF_OP_READ ?
"read" : "write", ISSET(xs->flags, SCSI_POLL) ? "-poll" :
"", lba, nblk, xs->datalen);
 
@@ -718,10 +720,11 @@ xbf_complete_cmd(struct scsi_xfer *xs, int desc)
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(sc->sc_dmat, map);
 
sc->sc_xs[desc] = NULL;
 
+   if (ISSET(xs->flags, SCSI_POLL))
DPRINTF("%s: completing desc %d(%llu) op %u with error %d\n",
sc->sc_dev.dv_xname, desc, xrd->xrd_rsp.rsp_id,
xrd->xrd_rsp.rsp_op, xrd->xrd_rsp.rsp_status);
 
id = xrd->xrd_rsp.rsp_id;
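
The pattern used in the diff above, a debug printf guarded by a flag
test, can be sketched in userland as follows.  This is only an
illustration: SKETCH_POLL, XBF_DEBUG_SKETCH and log_if_polled() are
made-up names, the flag value is arbitrary, and standard C99 variadic
macros are used instead of the GNU-style "x..." form that xbf.c uses.
ISSET is written the way OpenBSD's <sys/param.h> defines it.

```c
#include <stdio.h>

/* Comment this out to compile the debug output away. */
#define XBF_DEBUG_SKETCH

#ifdef XBF_DEBUG_SKETCH
#define DPRINTF(...)	printf(__VA_ARGS__)
#else
#define DPRINTF(...)
#endif

#define ISSET(t, f)	((t) & (f))

/* Hypothetical flag bit standing in for SCSI_POLL. */
#define SKETCH_POLL	0x0002

/*
 * Log a transfer only when it is polled.
 * Returns 1 if the transfer qualified for logging, 0 otherwise.
 */
int
log_if_polled(int flags, int desc)
{
	if (!ISSET(flags, SKETCH_POLL))
		return 0;
	DPRINTF("desc %d read-poll\n", desc);
	return 1;
}
```

One detail worth noting: when the debug macro is disabled, an unbraced
"if (...) DPRINTF(...);" collapses to "if (...) ;", which is still valid
C, so the guard lines added by the diff are harmless in non-debug builds.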



Re: Kernel panic on 6.1: init dies under load

2017-05-15 Thread Dan Cross
On Mon, May 15, 2017 at 11:28 AM, Mike Belopuhov wrote:

> On Mon, May 15, 2017 at 11:18 -0400, Dan Cross wrote:
> > On Mon, May 15, 2017 at 11:01 AM, Mike Belopuhov wrote:
> > >
> > > Thanks for reporting this, however there's not enough info to follow
> > > up on this right now.  What is clear is that your provider is using
> > > an ancient version of Xen that doesn't even support the callback
> > > vector interrupt delivery (the emulated xspd0 device is delivering
> > > all interrupts).  We have developed code for Xen 4.5+ platforms and
> > > there was only some testing done by users on 3.x.  So, in a way, you
> > > can consider Xen 3.x to not be officially supported at this point.
> >
> > That's unfortunate. Sadly, this is common across two different providers
> > (Panix and rootbsd.net). The latter, I'm sure, would at least be interested
> > in coordinating with you guys to get a fix. I'll open a trouble ticket with
> > them.
> >
> > Having said that, I've got a few questions:
> > >
> > >  - Do you see other write failures as well?
> >
> > Yes. E.g, syslogd had a similar write failure before panic.
>
> Can you reproduce any of these write failures at will?
>

I'm not sure what you mean. If I induce the load conditions, then the VM
will panic fairly reliably.

> What happens when you just send a signal to dump the core?
> You can test this by running "sleep 100", and then call
> "pkill -ABRT -lf sleep".


I'm not sure what this shows, but sure I can do that:

: jaan; /bin/sleep 100&
[1] 20701
: jaan; pkill -ABRT -lf sleep
20701 sleep
: jaan;
[1]  + abort (core dumped)  /bin/sleep 100
: jaan; ls -l sleep.core
-rw-------  1 cross  staff  4208416 May 15 15:42 sleep.core
: jaan;

The panic-inducing condition seems to be that, for whatever reason, the
kernel gets into a funny state where processes like init(8) die due to
having part of their VM image corrupted; the kernel then panics because
`init` dies.

>  - Do you have swap enabled? (pstat -s)
> >
> >
> > Yes; a gig:
> >
> > : jaan; pstat -s
> > Device      1K-blocks     Used    Avail Capacity  Priority
> > /dev/sd0b     1048249        0  1048249     0%    0
> > : jaan;
> >
>
> Do you see swap being used under your load?


I'm not sure. I can try to crash a machine again and poke at a kernel
var from ddb to see; anything in particular you want me to look at?

> > >  - Do you see crashes when bsd.mp is used instead of a single processor
> > >    kernel (that's right, even on the single processor VM)?
> > >
> >
> > Yes; the panic happens whether using single- or multi-processor kernels.
>
> Good, nothing has slipped through those cracks again.
>

I can see the value in narrowing down the search space. :-)

- Dan C.


Re: Kernel panic on 6.1: init dies under load

2017-05-15 Thread Mike Belopuhov
On Mon, May 15, 2017 at 11:18 -0400, Dan Cross wrote:
> On Mon, May 15, 2017 at 11:01 AM, Mike Belopuhov  wrote:
> >
> > Thanks for reporting this, however there's not enough info to follow
> > up on this right now.  What is clear is that your provider is using
> > an ancient version of Xen that doesn't even support the callback
> > vector interrupt delivery (the emulated xspd0 device is delivering
> > all interrupts).  We have developed code for Xen 4.5+ platforms and
> > there was only some testing done by users on 3.x.  So, in a way, you
> > can consider Xen 3.x to not be officially supported at this point.
> >
> 
> That's unfortunate. Sadly, this is common across two different providers
> (Panix and rootbsd.net). The latter, I'm sure, would at least be interested
> in coordinating with you guys to get a fix. I'll open a trouble ticket with
> them.
> 
> > Having said that, I've got a few questions:
> >
> >  - Do you see other write failures as well?
> >
> 
> Yes. E.g, syslogd had a similar write failure before panic.
>

Can you reproduce any of these write failures at will?

What happens when you just send a signal to dump the core?
You can test this by running "sleep 100", and then call
"pkill -ABRT -lf sleep".
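
The same check can also be done from a small C program, shown here as a
rough sketch rather than anything taken from this thread: fork a child
that just sleeps, send it SIGABRT, and look at the wait status.  The
function name abort_child_and_check() is invented for this example.
WCOREDUMP is a common extension that POSIX does not require, so it is
guarded with #ifdef.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/*
 * Returns -1 on error, 0 if the child did not die from SIGABRT,
 * 1 if it died from SIGABRT, and 2 if the wait status additionally
 * reports that a core was dumped.
 */
int
abort_child_and_check(void)
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		/* Child: stand-in for "sleep 100". */
		sleep(100);
		_exit(0);
	}

	/* Parent: equivalent of "pkill -ABRT" aimed at the child. */
	if (kill(pid, SIGABRT) == -1)
		return -1;
	if (waitpid(pid, &status, 0) == -1)
		return -1;

	if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGABRT)
		return 0;
#ifdef WCOREDUMP
	if (WCOREDUMP(status))
		return 2;
#endif
	return 1;
}
```

A return value of 2 only means the wait status claims a core was
written; whether a "write failed" message shows up on the console still
has to be checked by hand.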

> >  - Do you have swap enabled? (pstat -s)
> 
> 
> Yes; a gig:
> 
> : jaan; pstat -s
> Device      1K-blocks     Used    Avail Capacity  Priority
> /dev/sd0b     1048249        0  1048249     0%    0
> : jaan;
>

Do you see swap being used under your load?

> >  - Do you see crashes when bsd.mp is used instead of a single processor
> >    kernel (that's right, even on the single processor VM)?
> >
> 
> Yes; the panic happens whether using single- or multi-processor kernels.
>

Good, nothing has slipped through those cracks again.



Re: Kernel panic on 6.1: init dies under load

2017-05-15 Thread Dan Cross
On Mon, May 15, 2017 at 11:01 AM, Mike Belopuhov  wrote:
>
> Thanks for reporting this, however there's not enough info to follow
> up on this right now.  What is clear is that your provider is using
> an ancient version of Xen that doesn't even support the callback
> vector interrupt delivery (the emulated xspd0 device is delivering
> all interrupts).  We have developed code for Xen 4.5+ platforms and
> there was only some testing done by users on 3.x.  So, in a way, you
> can consider Xen 3.x to not be officially supported at this point.
>

That's unfortunate. Sadly, this is common across two different providers
(Panix and rootbsd.net). The latter, I'm sure, would at least be interested
in coordinating with you guys to get a fix. I'll open a trouble ticket with
them.

> Having said that, I've got a few questions:
>
>  - Do you see other write failures as well?
>

Yes. E.g., syslogd had a similar write failure before the panic.

>  - Do you have swap enabled? (pstat -s)


Yes; a gig:

: jaan; pstat -s
Device      1K-blocks     Used    Avail Capacity  Priority
/dev/sd0b     1048249        0  1048249     0%    0
: jaan;

>  - Do you see crashes when bsd.mp is used instead of a single processor
>    kernel (that's right, even on the single processor VM)?
>

Yes; the panic happens whether using single- or multi-processor kernels.

- Dan C.



Re: Kernel panic on 6.1: init dies under load

2017-05-15 Thread Mike Belopuhov
Hi,

Thanks for reporting this, however there's not enough info to follow
up on this right now.  What is clear is that your provider is using
an ancient version of Xen that doesn't even support the callback
vector interrupt delivery (the emulated xspd0 device is delivering
all interrupts).  We have developed code for Xen 4.5+ platforms and
there was only some testing done by users on 3.x.  So, in a way, you
can consider Xen 3.x to not be officially supported at this point.

Having said that, I've got a few questions:

 - Do you see other write failures as well?

 - Do you have swap enabled? (pstat -s)

 - Do you see crashes when bsd.mp is used instead of a single processor
   kernel (that's right, even on the single processor VM)?

Regards,
Mike


Kernel panic on 6.1: init dies under load

2017-05-15 Thread Dan Cross
>Synopsis:  init dies causing kernel panic on virtualized hosts.
>Category:  system
>Environment:
System  : OpenBSD 6.1
Details : OpenBSD 6.1 (GENERIC) #6: Sat May  6 09:33:26 CEST 2017
 rob...@syspatch-61-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Kernel panics under moderate/heavy load when running under a
hypervisor (I believe my VPS provider is using Xen); init(8)
dies and the machine panics. `boot sync` does not work and
the filesystem requires manual fsck on reboot.

I have not seen this on hardware.

Console data from the panic is as follows:

: tempest; cat panic
coredump of syslogd(94574), write failed: errno 14
coredump of init(1), write failed: errno 14
panic: init died (signal 10, exit 0)
Stopped at  Debugger+0x9:   leave
TID  PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
*285197  1  0   0x802 0x20000  init
Debugger() at Debugger+0x9
panic() at panic+0xfe
exit1() at exit1+0x58d
trapsignal() at trapsignal+0x110
trap() at trap+0x309
--- trap (number 4) ---
end of kernel
end trace frame: 0xff, count: 10
0x18057281cfdc
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb>
: tempest;

>How-To-Repeat:
Run some CPU/memory intensive workload; for example, rebuilding
the Go compiler and toolchain.  Occasionally the system will survive,
but gets into a state where processes are dying.
>Fix:
Unknown.


dmesg:
OpenBSD 6.1 (GENERIC) #6: Sat May  6 09:33:26 CEST 2017
rob...@syspatch-61-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 520093696 (496MB)
avail mem = 499785728 (476MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xeb01f (10 entries)
bios0: vendor Xen version "3.4.4" date 07/15/2016
bios0: Xen HVM domU
acpi0 at bios0: rev 2
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
ioapic0 at mainbus0: apid 1 pa 0xfec0, version 11, 48 pins
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU L5640 @ 2.27GHz, 2267.15 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,HV,NXE,LONG,LAHF
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
"PNP0F13" at acpi0 not configured
"PNP0303" at acpi0 not configured
"PNP0700" at acpi0 not configured
"PNP0501" at acpi0 not configured
"PNP0400" at acpi0 not configured
pvbus0 at mainbus0: Xen 3.4
xen0 at pvbus0: features 0x5, 32 grant table frames, event channel 2
"vkbd" at xen0: device/vkbd/0 not configured
"vfb" at xen0: device/vfb/0 not configured
xbf0 at xen0 backend 0 channel 4: disk
scsibus1 at xbf0: 2 targets
sd0 at scsibus1 targ 0 lun 0:  SCSI3 0/direct fixed
sd0: 20480MB, 512 bytes/sector, 41943040 sectors
xnf0 at xen0 backend 0 channel 5: address 00:16:3e:15:9a:43
xnf1 at xen0 backend 0 channel 6: address 00:16:3e:48:5b:04
"console" at xen0: device/console/0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x01: SMBus disabled
vga1 at pci0 dev 2 function 0 "Cirrus Logic CL-GD5446" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
xspd0 at pci0 dev 3 function 0 "XenSource Platform Device" rev 0x01: apic 1 int 28
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: density unknown
fd1 at fdc0 drive 1: density unknown
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (e0bfc277bba6b729.a) swap on sd0b dump on sd0b

usbdevs:
usbdevs: no USB controll