VM crash on 7.2#4

2023-01-01 Thread Mischa

Hi,

I just noticed that one of the VMs greeted me with a ddb> prompt.
Both the host and the VM are running 7.2#4; the dmesg of the host is below.

I managed to get the following data from the VM:

ddb> show panic
*cpu0: kernel diagnostic assertion "m != NULL" failed: file "/usr/src/sys/dev/pv/if_vio.c", line 1006
ddb> trace
db_enter() at db_enter+0x10
panic(81f17485) at panic+0xb8
__assert(81f891d8,81f89d08,3ee,81f90540) at __assert+0x25
vio_rxeof(8003a000) at vio_rxeof+0x23f
vio_rx_intr(8003a050) at vio_rx_intr+0x38
virtio_check_vqs(80039400) at virtio_check_vqs+0xfe
virtio_pci_legacy_intr(80039400) at virtio_pci_legacy_intr+0x61
intr_handler(80002250c100,80049e80) at intr_handler+0x38
Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
cpu_idle_cycle() at cpu_idle_cycle+0x1f
end trace frame: 0x0, count: -10

root@r2:~ # syspatch -l
001_x509
002_asn1
003_ukbd
004_expat
005_pixman
006_vmm
007_unwind
008_pfsync
009_xserver
010_vmd
011_gpuinv
012_acme

root@r2:~ # dmesg
OpenBSD 7.2 (GENERIC.MP) #4: Mon Dec 12 06:06:42 MST 2022
    r...@syspatch-72-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

real mem = 412202078208 (393106MB)
avail mem = 399692173312 (381176MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7a32f000 (76 entries)
bios0: vendor Dell Inc. version "2.16.0" date 07/20/2022
bios0: Dell Inc. PowerEdge R630
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP MCEJ WD__ SLIC HPET APIC MCFG MSCT SLIT SRAT SSDT SSDT SSDT PRAD DMAR HEST BERT ERST EINJ
acpi0: wakeup devices PCI0(S4) BR1A(S4) BR1B(S4) BR2A(S4) BR2B(S4) BR2C(S4) BR2D(S4) BR3A(S4) BR3B(S4) BR3C(S4) BR3D(S4) XHC_(S0) RP02(S4) RP03(S4) RP05(S4) RP08(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 3200.03 MHz, 06-3f-02
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE
cpu1 at mainbus0: apid 16 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 3398.59 MHz, 06-3f-02
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
cpu1: smt 0, core 0, package 1
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 3399.01 MHz, 06-3f-02
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 18 (application processor)
cpu3: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 3400.00 MHz, 06-3f-02
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache


Re: NSD exit status 11 on 7.0

2021-10-28 Thread Mischa

Hi All,

Just to confirm that the patch below has been working like a charm.
Since the patch was applied (and only this patch), NSD has been behaving
properly.


Mischa


On 2021-10-20 12:33, Florian Obser wrote:

On 2021-10-20 07:55 +02, Otto Moerbeek  wrote:

On Wed, Oct 20, 2021 at 07:47:30AM +0200, Mischa wrote:

Unfortunately our joy was short-lived. This morning I noticed a lot of
Oct 20 07:44:15 name1 nsd[80814]: server 76410 died unexpectedly with status 11, restarting

It looks like there is a potential fix in version 4.3.8.

https://github.com/NLnetLabs/nsd/issues/195
https://github.com/NLnetLabs/nsd/issues/189

https://github.com/NLnetLabs/nsd/blob/NSD_4_3_8_REL/doc/ChangeLog
23 August 2021: Wouter
- Fix #189: nsd 4.3.7 crash answer_delegation: Assertion
`query->delegation_rrset' failed.

(Thanx Roger!)


That is not the correct fix, it only hides the problem and worse,
produces wrong results. Please try this, which is the fix for
https://github.com/NLnetLabs/nsd/issues/194

diff --git namedb.c namedb.c
index 06bef71147c..772e038b16d 100644
--- namedb.c
+++ namedb.c
@@ -583,10 +583,13 @@ domain_find_ns_rrsets(domain_type* domain, zone_type* zone, rrset_type **ns)
 {
 	/* return highest NS RRset in the zone that is a delegation above */
 	domain_type* result = NULL;
+	rrset_type* rrset = NULL;
 	while (domain && domain != zone->apex) {
-		*ns = domain_find_rrset(domain, zone, TYPE_NS);
-		if (*ns)
+		rrset = domain_find_rrset(domain, zone, TYPE_NS);
+		if (rrset) {
+			*ns = rrset;
 			result = domain;
+		}
 		domain = domain->parent;
 	}




As far as I can tell from the things Martijn found, it might be the case.


Will give that a try and report back.

Mischa


Are you going to try just the one-line fix or the whole of 4.3.8?
I suppose if we want to backport to -stable the one-line fix is
preferred.


Yes, except that we should go with the correct fix above ;) Nothing else is
interesting to backport in 4.3.8 as far as I can tell.



-Otto



I provided an explanation of what's going on in
https://github.com/NLnetLabs/nsd/issues/195#issuecomment-947505367
Reproduced here (slightly edited):

712296f (the one-line fix) only hides the problem; it doesn't fix
anything. The real fix is ba0002e (the diff above).

f.9.1.1.0.0.2.ip6.arpa. is an ENT (empty non-terminal) in ip6.arpa., and so is 2.ip6.arpa.
In line 1420 of query.c we have

	q->delegation_domain = domain_find_ns_rrsets(

and the unfixed domain_find_ns_rrsets would find the NS RRset for
9.1.1.0.0.2.ip6.arpa.

But it would then continue searching upwards, overwriting *ns, which is
q->delegation_rrset, until it hits 2.ip6.arpa., which has no NS
records. So q->delegation_rrset = NULL, but at the same time result !=
NULL, because we did find a delegation RRset along the way; we just
ignored it (at least for 9.1.1.0.0.2.ip6.arpa.; I didn't check whether there
was one further up).

domain_find_ns_rrsets returns non-NULL which means we found a
delegation, but at the same time it doesn't give us the delegation NS
RRset.

It is probably best to revert 712296f, since on its own it produces wrong
results. For example, adding it to 4.3.7 gives this:

$ dig @192.168.178.219 +norec f.9.1.1.0.0.2.ip6.arpa NS

; <<>> dig 9.10.8-P1 <<>> @192.168.178.219 +norec f.9.1.1.0.0.2.ip6.arpa NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10923
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;f.9.1.1.0.0.2.ip6.arpa.	IN	NS

;; AUTHORITY SECTION:
ip6.arpa.		3600	IN	SOA	b.ip6-servers.arpa. nstld.iana.org. 2021100154 1800 900 604800 3600

;; Query time: 0 msec
;; SERVER: 192.168.178.219#53(192.168.178.219)
;; WHEN: Wed Oct 20 10:24:56 CEST 2021
;; MSG SIZE  rcvd: 115

But the correct answer is this:

$ dig @::1 +norec f.9.1.1.0.0.2.ip6.arpa NS

; <<>> dig 9.10.8-P1 <<>> @::1 +norec f.9.1.1.0.0.2.ip6.arpa NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48090
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;f.9.1.1.0.0.2.ip6.arpa.	IN	NS

;; AUTHORITY SECTION:
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	r.arin.net.
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	u.arin.net.
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	x.arin.net.
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	y.arin.net.
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	z.arin.net.
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	arin.authdns.ripe.net.

;; Query time: 0 msec
;; SERVER: ::1#53(::1)
;; WHEN: Wed Oct 20 10:24:16 CEST 2021
;; MSG SIZE  rcvd: 171




Re: NSD exit status 11 on 7.0

2021-10-20 Thread Mischa

On 2021-10-20 12:33, Florian Obser wrote:

On 2021-10-20 07:55 +02, Otto Moerbeek  wrote:

On Wed, Oct 20, 2021 at 07:47:30AM +0200, Mischa wrote:

Unfortunately our joy was short-lived. This morning I noticed a lot of
Oct 20 07:44:15 name1 nsd[80814]: server 76410 died unexpectedly with status 11, restarting

It looks like there is a potential fix in version 4.3.8.

https://github.com/NLnetLabs/nsd/issues/195
https://github.com/NLnetLabs/nsd/issues/189

https://github.com/NLnetLabs/nsd/blob/NSD_4_3_8_REL/doc/ChangeLog
23 August 2021: Wouter
- Fix #189: nsd 4.3.7 crash answer_delegation: Assertion
`query->delegation_rrset' failed.

(Thanx Roger!)


That is not the correct fix, it only hides the problem and worse,
produces wrong results. Please try this, which is the fix for
https://github.com/NLnetLabs/nsd/issues/194

diff --git namedb.c namedb.c
index 06bef71147c..772e038b16d 100644
--- namedb.c
+++ namedb.c
@@ -583,10 +583,13 @@ domain_find_ns_rrsets(domain_type* domain, zone_type* zone, rrset_type **ns)
 {
 	/* return highest NS RRset in the zone that is a delegation above */
 	domain_type* result = NULL;
+	rrset_type* rrset = NULL;
 	while (domain && domain != zone->apex) {
-		*ns = domain_find_rrset(domain, zone, TYPE_NS);
-		if (*ns)
+		rrset = domain_find_rrset(domain, zone, TYPE_NS);
+		if (rrset) {
+			*ns = rrset;
 			result = domain;
+		}
 		domain = domain->parent;
 	}



Thanx Florian!
Will give that a go and let you know.

Mischa


Re: sysupgrade after upgrade shuts down VM

2020-09-24 Thread Mischa
> On 24 Sep 2020, at 13:52, Sebastien Marie  wrote:
> 
> On Thu, Sep 24, 2020 at 12:47:30PM +0200, Mischa wrote:
>>> 
>>> One quirk of the archive: It just creates a directory every night, no
>>> matter if a snap was built or not, you can check with what(1) if you
>>> actually have a different kernel to the one you already tested.
>> 
>> for i in $(jot -w %02d 15 10); do ftp -o bsd.rd-${i} https://ftp.hostserver.de/archive/2020-09-${i}-0105/snapshots/amd64/bsd.rd; done
>> 
>> The build from the 15th is the first one showing this issue; the one from the 14th
>> is fine.
>> 
>> tx# what /bsd.rd-15
>> /bsd.rd-15:
>>OpenBSD 6.8-beta (RAMDISK_CD) #65: Sun Sep 13 03:09:57 MDT 2020
>>PD KSH v5.2.14 99/07/13.2
>>$OpenBSD: cert.pem,v 1.21 2020/06/01 18:53:53 sthen Exp $
>> 
> 
> Could you provide the what(1) output for /bsd.rd-14 too?
> 
> Downloading https://ftp.hostserver.de/archive/2020-09-14-0105/snapshots/amd64/bsd.rd
> and https://ftp.hostserver.de/archive/2020-09-15-0105/snapshots/amd64/bsd.rd,
> I get the same file.
> 
> $ sha256 -b bsd.rd-14 bsd.rd-15
> SHA256 (bsd.rd-14) = wfNVV8gKxUP8gJvozr73T2bbz2uuKCX7p7JeS3kmTj8=
> SHA256 (bsd.rd-15) = wfNVV8gKxUP8gJvozr73T2bbz2uuKCX7p7JeS3kmTj8=
> 
> Is it the same on your side too? If yes, it means the hypervisor
> doesn't behave consistently.

I upgraded my -current host to the latest release and created two VMs, one
running 6.7-stable and one 6.8-current.
I rebooted each VM into bsd.rd around 10 times, and on 6.8 I don’t see this
happening at all.

On 6.7 I do indeed see inconsistent behaviour, but it never showed itself as
obviously as in the last couple of weeks.
It happens with both the 6.7 and the 6.8 bsd.rd; I just needed more attempts
to see it.

It seems this issue was addressed somewhere between 6.7 and 6.8.

Thank you all for sharing your insights and help.
I will keep this host running -current for the foreseeable future; if anybody
needs a VM for testing/breaking, let me know.

Mischa

PS: I love the fact that Ctrl-L works in ksh vi mode! Thank you for doing that!



Re: sysupgrade after upgrade shuts down VM

2020-09-24 Thread Mischa



> On 24 Sep 2020, at 12:23, Florian Obser  wrote:
> 
> On Thu, Sep 24, 2020 at 12:13:31PM +0200, Mischa wrote:
>> 
>> 
>>> On 24 Sep 2020, at 09:15, Florian Obser  wrote:
>>> 3a) does it also shutdown: bisect the hypervisor (tbh I expect the problem 
>>> here).
>> 
>> 6.8 bsd.rd shuts down
>> 6.7 bsd.rd reboots
>> 
>> Both VMs are running on the same host which is on 6.7.
>> # sysctl kern.version
>> kern.version=OpenBSD 6.7 (GENERIC.MP) #1: Sat May 16 16:33:02 MDT 2020
>>
>> r...@syspatch-67-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> 
>>> 3b) does it reboot: bisect bsd.rd
>> 
>> What do you mean?
> 
> Go to https://ftp.hostserver.de/archive/
> 
> There you will find an archive of old snapshots, about 100 days worth.
> i.e, the oldest:
> https://ftp.hostserver.de/archive/2020-06-20-0105/snapshots/amd64/ 
> and the newest:
> https://ftp.hostserver.de/archive/2020-09-24-0105/snapshots/amd64/ 
> 
> We already know that the newest is bad.
> 
> Pick a bsd.rd from the middle (I'm just eyeballing this):
> 
> https://ftp.hostserver.de/archive/2020-08-02-0105/snapshots/amd64/bsd.rd 
> 
> Does that one work?
> Yes: Pick one in the middle between 2020-08-02 and 2020-09-24
> No: Pick one in the middle between 2020-08-02 and 2020-06-20.
> 
> binary search...
> 
> One quirk of the archive: It just creates a directory every night, no
> matter if a snap was built or not, you can check with what(1) if you
> actually have a different kernel to the one you already tested.

for i in $(jot -w %02d 15 10); do ftp -o bsd.rd-${i} https://ftp.hostserver.de/archive/2020-09-${i}-0105/snapshots/amd64/bsd.rd; done

The build from the 15th is the first one showing this issue; the one from the 14th is
fine.

tx# what /bsd.rd-15
/bsd.rd-15:
OpenBSD 6.8-beta (RAMDISK_CD) #65: Sun Sep 13 03:09:57 MDT 2020
PD KSH v5.2.14 99/07/13.2
$OpenBSD: cert.pem,v 1.21 2020/06/01 18:53:53 sthen Exp $


Mischa



Re: sysupgrade after upgrade shuts down VM

2020-09-24 Thread Mischa



> On 24 Sep 2020, at 09:15, Florian Obser  wrote:
> 
> Hi Mischa,
> 
> On Thu, Sep 24, 2020 at 08:52:55AM +0200, Mischa wrote:
>> Hi All,
>> 
>> With the last couple of -current updates I noticed a VM doesn’t come back 
>> after running sysupgrade, which it used to do.
> 
> it's very unlikely that this is a sysupgrade problem.
> More likely something in the kernel changed (vm or hypervisor).
> 
>> I don’t know exactly when it started but something in the late #60s.
> 
> This number doesn't mean anything. Please provide build dates.
> 
>> 
>> Running sysupgrade from within the VM, it reboots and goes through the 
>> upgrade as normal. Once it’s done with the upgrade it shuts down.
>> Tail-end of the process from the latest sysupgrade.
>> 
>> Set name(s)? (or 'abort' or 'done') [done] done
>> Directory does not contain SHA256.sig. Continue without verification? [no] yes
>> Installing bsd          100% |**| 20383 KB    00:01
>> Installing bsd.rd       100% |**| 10141 KB    00:00
>> Installing base68.tgz   100% |**|   289 MB    01:42
>> Installing comp68.tgz   100% |**| 74305 KB    00:52
>> Installing man68.tgz    100% |**|  7484 KB    00:10
>> Installing game68.tgz   100% |**|  2739 KB    00:01
>> Installing xbase68.tgz  100% |**| 28866 KB    00:17
>> Installing xshare68.tgz 100% |**|  4499 KB    00:15
>> Installing xfont68.tgz  100% |**| 39342 KB    00:23
>> Installing xserv68.tgz  100% |**| 18333 KB    00:07
>> Location of sets? (disk http nfs or 'done') [done] done
>> Making all device nodes... done.
>> Relinking to create unique kernel... done.
>> 
>> CONGRATULATIONS! Your OpenBSD upgrade has been successfully completed!
>> 
>> syncing disks... done
>> vmmci0: powerdown
> 
>  ^
> Why does it think it should power down?
> 
> I never ran vmm, so this is all a wild guess, but here is how I would
> approach this:
> 
> 1) manually boot into bsd.rd, hit 's' to get to a shell prompt and
> type reboot, does it shutdown?

>> OpenBSD/amd64 BOOT 3.52  
>> 
boot> bsd.rd
booting hd0a:bsd.rd: 3818189+1573888+3878136+0+757760 [324353+128+468792+313530]=0xaa0780
entry point at 0x81001000
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2020 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 6.8-beta (RAMDISK_CD) #75: Wed Sep 23 15:43:49 MDT 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
[snip]
Welcome to the OpenBSD/amd64 6.8 installation program.
(I)nstall, (U)pgrade, (A)utoinstall or (S)hell? s
# reboot
syncing disks... done
vmmci0: powerdown
rebooting...

[EOT]

> 2) if yes, get a 6.7 bsd.rd (which I presume is known good) and retry

>> OpenBSD/amd64 BOOT 3.47
boot> bsd.rd
booting hd0a:bsd.rd: 3826379+1557504+3881976+0+598016 [301104+128+465696+311208]=0xa71778
entry point at 0x81001000
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2020 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 6.7 (RAMDISK_CD) #177: Thu May  7 11:19:02 MDT 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
[snip]
Welcome to the OpenBSD/amd64 6.7 installation program.
(I)nstall, (U)pgrade, (A)utoinstall or (S)hell? s
# reboot
syncing disks... done
vmmci0: powerdown
rebooting...
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 1022M a20=on] 
disk: hd0+
>> OpenBSD/amd64 BOOT 3.47
boot>

> 3a) does it also shutdown: bisect the hypervisor (tbh I expect the problem 
> here).

6.8 bsd.rd shuts down
6.7 bsd.rd reboots

Both VMs are running on the same host, which runs 6.7.
# sysctl kern.version
kern.version=OpenBSD 6.7 (GENERIC.MP) #1: Sat May 16 16:33:02 MDT 2020
    r...@syspatch-67-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

> 3b) does it reboot: bisect bsd.rd

What do you mean?

Mischa



Re: sysupgrade after upgrade shuts down VM

2020-09-24 Thread Mischa



> On 24 Sep 2020, at 09:15, Florian Obser  wrote:
> 
> Hi Mischa,
> 
> On Thu, Sep 24, 2020 at 08:52:55AM +0200, Mischa wrote:
>> Hi All,
>> 
>> With the last couple of -current updates I noticed a VM doesn’t come back 
>> after running sysupgrade, which it used to do.
> 
> it's very unlikely that this is a sysupgrade problem.
> More likely something in the kernel changed (vm or hypervisor).
> 
>> I don’t know exactly when it started but something in the late #60s.
> 
> This number doesn't mean anything. Please provide build dates.

Will try to find the one where it started.

>> Running sysupgrade from within the VM, it reboots and goes through the 
>> upgrade as normal. Once it’s done with the upgrade it shuts down.
>> Tail-end of the process from the latest sysupgrade.
>> 
>> Set name(s)? (or 'abort' or 'done') [done] done
>> Directory does not contain SHA256.sig. Continue without verification? [no] yes
>> Installing bsd          100% |**| 20383 KB    00:01
>> Installing bsd.rd       100% |**| 10141 KB    00:00
>> Installing base68.tgz   100% |**|   289 MB    01:42
>> Installing comp68.tgz   100% |**| 74305 KB    00:52
>> Installing man68.tgz    100% |**|  7484 KB    00:10
>> Installing game68.tgz   100% |**|  2739 KB    00:01
>> Installing xbase68.tgz  100% |**| 28866 KB    00:17
>> Installing xshare68.tgz 100% |**|  4499 KB    00:15
>> Installing xfont68.tgz  100% |**| 39342 KB    00:23
>> Installing xserv68.tgz  100% |**| 18333 KB    00:07
>> Location of sets? (disk http nfs or 'done') [done] done
>> Making all device nodes... done.
>> Relinking to create unique kernel... done.
>> 
>> CONGRATULATIONS! Your OpenBSD upgrade has been successfully completed!
>> 
>> syncing disks... done
>> vmmci0: powerdown
> 
>  ^
> Why does it think it should power down?

Indeed.

> I never ran vmm, so this is all a wild guess, but here is how I would
> approach this:
> 
> 1) manually boot into bsd.rd, hit 's' to get to a shell prompt and
> type reboot, does it shutdown?
> 2) if yes, get a 6.7 bsd.rd (which I presume is known good) and retry
> 3a) does it also shutdown: bisect the hypervisor (tbh I expect the problem 
> here).
> 3b) does it reboot: bisect bsd.rd
> 
> You can use https://ftp.hostserver.de/archive/ for bisecting.

6.7 is fine: the majority of the VMs are on 6.7-stable, and issuing a reboot there does an
actual reboot.
I will try with a -current bsd.rd and see what happens.

Mischa



sysupgrade after upgrade shuts down VM

2020-09-24 Thread Mischa
Hi All,

With the last couple of -current updates I noticed that a VM no longer comes back after
running sysupgrade, which it used to do.
I don’t know exactly when it started, but it was somewhere in the late #60s builds.

When I run sysupgrade from within the VM, it reboots and goes through the upgrade
as normal, but once it’s done with the upgrade it shuts down.
Here is the tail end of the process from the latest sysupgrade:

Set name(s)? (or 'abort' or 'done') [done] done
Directory does not contain SHA256.sig. Continue without verification? [no] yes
Installing bsd          100% |**| 20383 KB    00:01
Installing bsd.rd       100% |**| 10141 KB    00:00
Installing base68.tgz   100% |**|   289 MB    01:42
Installing comp68.tgz   100% |**| 74305 KB    00:52
Installing man68.tgz    100% |**|  7484 KB    00:10
Installing game68.tgz   100% |**|  2739 KB    00:01
Installing xbase68.tgz  100% |**| 28866 KB    00:17
Installing xshare68.tgz 100% |**|  4499 KB    00:15
Installing xfont68.tgz  100% |**| 39342 KB    00:23
Installing xserv68.tgz  100% |**| 18333 KB    00:07
Location of sets? (disk http nfs or 'done') [done] done
Making all device nodes... done.
Relinking to create unique kernel... done.

CONGRATULATIONS! Your OpenBSD upgrade has been successfully completed!

syncing disks... done
vmmci0: powerdown
rebooting...

[EOT]

# vmctl show tx
   ID   PID VCPUS  MAXMEM  CURMEM     TTY        OWNER    STATE NAME
    3     -     1    4.0G       -       -         root  stopped tx


Is there anything I can change to have the VM reboot instead of shutting down?

Mischa



Re: Panic captures of VM

2019-10-16 Thread Mischa Peters



> On 16 Oct 2019, at 21:35, Mike Larkin  wrote:
> 
> On Wed, Oct 16, 2019 at 06:14:55PM +0200, Mischa wrote:
>> Hi Stuart,
>> 
>> 
>>>> On 16 Oct 2019, at 18:07, Stuart Henderson  wrote:
>>> 
>>> On 2019/10/16 18:00, Mischa wrote:
>>>> Hi All,
>>>> 
>>>> One of the OpenBSD VMs running on 6.6-beta #313 is rebooting or panicking
>>>> in different ways.
>>>> Not sure if they are all relevant or useful but here are the ones we 
>>>> managed to capture.
>>> 
>>> There's not a lot of information in your mail... for starters, what are
>>> you running the VM in, and is there any difference in the config for that
>>> VM compared to other working ones?
>> 
>> Fair point.
>> 
>> There are 10 VMs running on this host, the host is running:
>> $ sysctl kern.version
>> kern.version=OpenBSD 6.6-beta (GENERIC.MP) #313: Tue Sep 10 23:30:52 MDT 2019
>>dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> 
>> I know of one other VM that reboots every once in a while, but I haven’t
>> seen any panics from it.
>> As for the other 8, I occasionally see a VM shut down, but I have no
>> capture of the console.
>> 
>>> Do you have any other VMs running the same OpenBSD snapshot successfully?
>> 
>> The rest of the VMs are on -stable as far as I am aware; other people are
>> operating these VMs.
>> 
>>> Can you boot an old kernel and get a dmesg?
>> 
>> Here is a dmesg which we managed to capture after one of the panics:
>> 
> 
> Are you in swap at all on that host?

Yes. :/

load averages:  0.02,  0.06,  0.13    server1.openbsd.amsterdam 21:41:11
71 processes: 70 idle, 1 on processor                 up 34 days,  2:15
CPU0:  0.7% user,  0.0% nice,  2.6% sys,  0.5% spin,  0.1% intr, 96.2% idle
CPU1:  0.7% user,  0.0% nice,  3.4% sys,  0.4% spin,  0.0% intr, 95.6% idle
CPU2:  6.2% user,  0.0% nice, 31.0% sys, 11.0% spin,  0.0% intr, 51.8% idle
CPU3:  0.7% user,  0.0% nice,  2.8% sys,  0.3% spin,  0.0% intr, 96.1% idle
Memory: Real: 5427M/7623M act/tot Free: 275M Cache: 2001M Swap: 41M/8405M

Mischa



Re: Panic captures of VM

2019-10-16 Thread Mischa
Hi Stuart,


> On 16 Oct 2019, at 18:07, Stuart Henderson  wrote:
> 
> On 2019/10/16 18:00, Mischa wrote:
>> Hi All,
>> 
>> One of the OpenBSD VMs running on 6.6-beta #313 is rebooting or panicking in
>> different ways.
>> Not sure if they are all relevant or useful but here are the ones we managed 
>> to capture.
> 
> There's not a lot of information in your mail... for starters, what are
> you running the VM in, and is there any difference in the config for that
> VM compared to other working ones?

Fair point.

There are 10 VMs running on this host, the host is running:
$ sysctl kern.version
kern.version=OpenBSD 6.6-beta (GENERIC.MP) #313: Tue Sep 10 23:30:52 MDT 2019
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

I know of one other VM that reboots every once in a while, but I haven’t
seen any panics from it.
As for the other 8, I occasionally see a VM shut down, but I have no
capture of the console.

> Do you have any other VMs running the same OpenBSD snapshot successfully?

The rest of the VMs are on -stable as far as I am aware; other people are
operating these VMs.

> Can you boot an old kernel and get a dmesg?

Here is a dmesg that we managed to capture after one of the panics:

fd0# panic: mtx 0x81f353f0: locking against myself
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 510M a20=on]
disk: hd0+ hd1+
>> OpenBSD/amd64 BOOT 3.45
/
com0: 115200 baud
switching console to com0
>> OpenBSD/amd64 BOOT 3.45
boot>
booting hd0a:/bsd: 12666184+2937864+332896+0+704512 [987630+128+1010256+738953]=0x127d750
entry point at 0x81001000
[ using 2738000 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2019 OpenBSD. All rights reserved.  https://www.OpenBSD.org
 
OpenBSD 6.6 (GENERIC) #325: Wed Oct  2 11:38:13 MDT 2019
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 520077312 (495MB)
avail mem = 491753472 (468MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf3f40 (10 entries)
bios0: vendor SeaBIOS version "1.11.0p2-OpenBSD-vmm" date 01/01/2011
bios0: OpenBSD VMM
acpi at bios0 not configured
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3101.63 MHz, 06-3a-09
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
tsc_timecounter_init: TSC skew=0 observed drift=0
cpu0: smt 0, core 0, package 0
cpu0: using VERW MDS workaround
pvbus0 at mainbus0: OpenBSD
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
viornd0 at virtio0
virtio0: irq 3
virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio1: address fe:e1:bb:d1:24:36
virtio1: irq 5
virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio2
scsibus1 at vioblk0: 2 targets
sd0 at scsibus1 targ 0 lun 0: 
sd0: 51200MB, 512 bytes/sector, 104857600 sectors
virtio2: irq 6
virtio3 at pci0 dev 4 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk1 at virtio3
scsibus2 at vioblk1: 2 targets
sd1 at scsibus2 targ 0 lun 0: 
sd1: 51200MB, 512 bytes/sector, 104857600 sectors
virtio3: irq 7
virtio4 at pci0 dev 5 function 0 "OpenBSD VMM Control" rev 0x00
vmmci0 at virtio4
virtio4: irq 9
isa0 at mainbus0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
com0: console
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd0a (d4c2875ac610c324.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted
Automatic boot in progress: starting file system checks.
/dev/sd0a (d4c2875ac610c324.a): 2323 files, 47973 used, 466466 free (178 frags, 58286 blocks, 0.0% fragmentation)
/dev/sd0a (d4c2875ac610c324.a): MARKING FILE SYSTEM CLEAN
/dev/rsd1a: file system is clean; not checking
/dev/rsd1k: file system is clean; not checking
/dev/rsd1d: file system is clean; not checking
/dev/rsd1f: file system is clean; not checking
/dev/rsd1g: file system is clean; not checking
/dev/rsd1h: file system is clean; not checking
/dev/rsd1j: file system is clean; not checking
/dev/rsd1i: file system is clean; not checking
/dev/rsd1e: file system is clean; not checking
/dev/sd0k (d4c2875ac610c324.k): 3992 files, 548865 used, 10831142 free (78 frags, 1353883 blocks, 0.0% fragmentation)
/dev/sd0k (d4c2875ac610c324.k): MARKING FILE SYSTEM CLEAN
/dev/sd0d (d4c2875ac610c324.d): 10 files, 7 used, 1697224 free (56 frags, 
212146 blo

Panic captures of VM

2019-10-16 Thread Mischa
Hi All,

One of the OpenBSD VMs running on 6.6-beta #313 is rebooting or panicking in
different ways.
Not sure if they are all relevant or useful, but here are the ones we managed to
capture.

Mischa

###

fd0$ uvm_fault(0x81f9def0, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      splassert_check+0x1:    cmpb    %cl,0(%rax)
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 510M a20=on]
disk: hd0+ hd1+
>> OpenBSD/amd64 BOOT 3.45
/
com0: 115200 baud
switching console to com0
>> OpenBSD/amd64 BOOT 3.45
boot>
booting hd0a:/bsd: 12641608+2937864+331376+0+704512 [980486+128+1009392+738445]=0x1274608
entry point at 0x81001000
[ using 2729480 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2019 OpenBSD. All rights reserved.  https://www.OpenBSD.org
 
OpenBSD 6.6-beta (GENERIC) #301: Tue Sep 24 15:28:24 MDT 2019
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

###

fd0# panic: mtx 0x81f353f0: locking against myself
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 510M a20=on]
disk: hd0+ hd1+
>> OpenBSD/amd64 BOOT 3.45
/
com0: 115200 baud
switching console to com0
>> OpenBSD/amd64 BOOT 3.45
boot>
booting hd0a:/bsd: 12666184+2937864+332896+0+704512 [987630+128+1010256+738953]=0x127d750
entry point at 0x81001000
[ using 2738000 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2019 OpenBSD. All rights reserved.  https://www.OpenBSD.org
 
OpenBSD 6.6 (GENERIC) #325: Wed Oct  2 11:38:13 MDT 2019
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

###

fd0# sysupgrade -s
SHA256.sig   100% |*|  2141   00:00
Signature Verified
Verifying old sets.
base66.tgz 2% | |  5109 KB00:01
fd0# uvm_fault(0x81fb5808, 0x3ff, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      amdgpu_atombios_encoder_setup_dig_transmitter+0x63:    testb   %r9b,0x(%r9,%rcx,4)
uvm_fault(0x81fb5808, 0x3fd, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 510M a20=on]
disk: hd0+ hd1+
>> OpenBSD/amd64 BOOT 3.45
/
com0: 115200 baud
switching console to com0
>> OpenBSD/amd64 BOOT 3.45
boot>
booting hd0a:/bsd: 12666184+2937872+333664+0+704512 [983130+128+1010256+738953]=0x127c5c0
entry point at 0x81001000
[ using 2733504 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2019 OpenBSD. All rights reserved.  https://www.OpenBSD.org
 
OpenBSD 6.6 (GENERIC) #334: Sat Oct  5 12:16:54 MDT 2019
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

###

OpenBSD/amd64 (fd0.openbsd.amsterdam) (tty00)
 
login: uvm_fault(0x81fdf560, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  Xintr_uvuvm_fault(0x81fdf560, 0x43a, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  Xintr_legacy5_untramp:  fcompl  0x42(%rdx)
ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
ddb> uvm_fault(0x81fdf560, 0x43f, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
ddb> uvm_fault(0x81fdf560, 0x43f, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -

Re: ifconfig bridge crashes host

2019-07-24 Thread Mischa



> On 23 Jul 2019, at 23:40, Hrvoje Popovski  wrote:
> 
> On 23.7.2019. 17:03, obs...@high5.nl wrote:
>>> Synopsis:   ifconfig bridge crashes host
>>> Category:   
>>> Environment:
>>  System  : OpenBSD 6.5
>>  Details : OpenBSD 6.5 (GENERIC.MP) #1: Mon May 27 18:27:59 CEST 2019
>>   
>> r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> 
>>  Architecture: OpenBSD.amd64
>>  Machine : amd64
>>> Description:
>>  After running the command "ifconfig bridge" twice on the host, the host
>>  became unresponsive. I was able to capture the trace from the console.
>>> How-To-Repeat:
>>  The host was running for some time, so I am uncertain if it's related to time,
>>  but I have seen this happen a couple of times now, and it seems that running the
>>  "ifconfig bridge" command multiple times triggers this.
> 
> Hi,
> 
> can you update your box with latest snapshot ?
> There were some problems with "ifconfig bridge" command few months ago..

Will give that a go.

Thanx!

Mischa



[no subject]

2019-05-09 Thread Mischa
>Synopsis:  ifconfig bridge crashed host
>Category:  
>Environment:
System  : OpenBSD 6.5
Details : OpenBSD 6.5 (GENERIC.MP) #0: Wed Apr 24 23:38:54 CEST 2019
 
r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:

Issued the command "ifconfig bridge" on a server running
vmm/vmd exclusively. After part of the output was shown, the
machine froze. When I connected to the console, it showed the debug
prompt.

I was only able to capture limited information at the debug
prompt; not sure if this is helpful:

ddb{3}> p
818993d1
ddb{3}> trace
savectx(6,0,2000,36,b75479de000,2000) at savectx+0xb1
end of kernel
end trace frame: 0x7f7c69d0, count: -1
ddb{3}> next
After 23 instructions
Stopped at  x86_ipi_db+0x2f:ret
ddb{3}> next
After 4 instructions
Stopped at  x86_ipi_db+0x2f:ret


>How-To-Repeat:
Have not tried to reproduce. :/
>Fix:



dmesg:
OpenBSD 6.5 (GENERIC.MP) #0: Wed Apr 24 23:38:54 CEST 2019

r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34329825280 (32739MB)
avail mem = 33279778816 (31738MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xcf49c000 (84 entries)
bios0: vendor Dell Inc. version "6.4.0" date 07/23/2013
bios0: Dell Inc. PowerEdge R610
acpi0 at bios0: rev 2
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP APIC SPCR HPET DMAR MCFG WD__ SLIC ERST HEST BERT EINJ SRAT TCPA SSDT
acpi0: wakeup devices PCI0(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 32 (boot processor)
cpu0: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 3325.40 MHz, 06-2c-02
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 1
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 133MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 0 (application processor)
cpu1: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 1596.01 MHz, 06-2c-02
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 0, package 0
cpu2 at mainbus0: apid 34 (application processor)
cpu2: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 3325.01 MHz, 06-2c-02
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 1
cpu3 at mainbus0: apid 2 (application processor)
cpu3: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 1596.01 MHz, 06-2c-02
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 0, core 1, package 0
cpu4 at mainbus0: apid 36 (application processor)
cpu4: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 3325.01 MHz, 06-2c-02
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
cpu4: 256KB 64b/line 8-way L2 cache
cpu4: smt 0, core 2, package 1
cpu5 at mainbus0: apid 4 (application processor)
cpu5: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 1596.01 MHz, 06-2c-02
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
cpu5: 256KB 64b/line 8-way L2 

Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument

2018-06-21 Thread Mischa
> On 19 Jun 2018, at 20:28, Mischa  wrote:
> 
>> On 19 Jun 2018, at 17:51, Mike Larkin  wrote:
>> On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
>>>> Synopsis:  VMs stop intermittently after vcpu_run_loop error
>>>> Category:  system
>>>> Environment:
>>> System  : OpenBSD 6.3
>>> Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>>  
>>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>> 
>>> Architecture: OpenBSD.amd64
>>> Machine : amd64
>>>> Description:
>>> Currently running 12 VMs on a single machine. After some random time,
>>> VMs shut down. Sometimes a single VM and sometimes more. A VM always
>>> stops after an error message like the following:
>>> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly  
>>>
>> 
>> This is almost surely the following bug, fixed in April (log from pmap.c):
>> 
>> revision 1.113
>> date: 2018/04/17 06:31:55;  author: mlarkin;  state: Exp;  lines: +275 -64;  
>> commitid: BaLjO2NVfYaZP00l;
>> Better way of allocating EPT entries.
>> 
>> Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
>> occasionally caused VMs to fail after random amounts of time due to
>> loading the pmap on the CPU and the processor updating A/D bits (which
>> are reserved bits in EPT). This ultimately manifested itself as errors
>> from vmd ("vcpu X run ioctl failed".)
>> 
>> tested by many, on different types of HW, no regressions noted
>> 
>> ---
>> 
>> Can you try -current and see if you can still reproduce this problem?

I tried -current today but got a kernel panic. It seems to be unrelated to vmd,
but I wasn't able to collect all the information needed to file a bug report;
I only got the trace. I will try -current again in a couple of days.

Below is what I was able to collect; I will do better next time.

panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file 
"/usr/src/sys/net/if.c", line 1382
Stopped at      db_enter+0x12:  popq    %r11
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 374176   6581    107    0x100010      0x400   3K  vmd
  94393   6581    107    0x100010      0x400    2  vmd
 311214   9135    107    0x100010      0x400    0  vmd
*299692  82346     91        0x12          0    1  snmpd
db_enter() at db_enter+0x12
panic() at panic+0x138
__assert(818eebd4,8000222de2e0,0,8000222de3d8) at __assert+0x24
ifa_ifwithaddr(1962634d45f6caa5,8000222de3d8) at ifa_ifwithaddr+0xed
in_pcbaddrisavail(c69add51b77cda1a,ff01f1cb6118,18,16) at in_pcbaddrisavail+0xd0
udp_output(8b483746c94285a0,237d,0,0) at udp_output+0x168
sosend(c197a5735fd92204,ff01edcbc798,80009bd8,6b,5,8000ffff9bf8) at sosend+0x351
sendit(cb9e953daf7fb585,80009bd8,8000222de6c0,8000222de5c0,8000222de6d0) at sendit+0x3fb
sys_sendmsg(e1aea756e75a2f45,1c0,80009bd8) at sys_sendmsg+0x15a
syscall(eee152956d09f98c) at syscall+0x32a
Xsyscall_untramp(6,0,0,0,0,1c) at Xsyscall_untramp+0xe4
end of kernel
end trace frame: 0x7f7c0410, count: 4
https://www.openbsd.org/ddb.html describes t

Re: VMM owner needs to be part of wheel

2018-06-20 Thread Mischa
Hi Reyk,

Thank you for your help on the kernel includes.
I have applied the patch to vmd -current and it works well on my setup.
A non-wheel user can indeed start/stop a VM and connect to the console.

Thanx!!

Mischa


On 19 Jun at 20:38, Reyk Floeter  wrote:
> Hi,
> 
> On Sun, Jun 17, 2018 at 10:35:27PM +0200, obs...@high5.nl wrote:
> > >Synopsis:  VMM owner needs to be part of group wheel in order to run vmctl 
> > >console|start|stop
> 
> the solution is not that easy as it seemed.
> 
> 1. Change the umask and let everyone access vmd, restrict the commands
> internally.
> 
> While this is a possible solution, I also agree that this allows any
> user (including system/privsep users) to trigger actions and imsgs in
> vmd; even if the result is permission denied as this is checked fairly
> late.
> 
> 2. Change the default owner group to root:_vmd.
> 
> It would be possible to define a hardcoded group, or use group _vmd,
> but this doesn't feel right. 
> 
> 3. Let the user configure the owner of the control socket.
> 
> This allows you to configure your own group, like "devops" (hrhr), or
> fetch the group from YP/LDAP, and let them mess with your VMs.  I think
> this is much more viable in a multi-user environment.  Add the
> following to the top/global section of /etc/vm.conf: "socket owner :devops"
> 
> privsep/pledge also makes it a bit more complicated for us because I
> don't want to allow the control process to chown the unix socket; so
> it is done by the parent and some messaging.
> 
> The attached diff implements 3. ... comments? OK? Should we use 2. instead?
> 
> Reyk
> 
> Index: usr.sbin/vmd/control.c
> ===
> RCS file: /cvs/src/usr.sbin/vmd/control.c,v
> retrieving revision 1.23
> diff -u -p -u -p -r1.23 control.c
> --- usr.sbin/vmd/control.c13 May 2018 22:48:11 -  1.23
> +++ usr.sbin/vmd/control.c19 Jun 2018 18:27:55 -
> @@ -103,6 +103,7 @@ control_dispatch_vmd(int fd, struct priv
>   break;
>   case IMSG_VMDOP_CONFIG:
>   config_getconfig(ps->ps_env, imsg);
> + proc_compose(ps, PROC_PARENT, IMSG_VMDOP_DONE, NULL, 0);
>   break;
>   case IMSG_CTL_RESET:
>   config_getreset(ps->ps_env, imsg);
> @@ -169,6 +170,18 @@ control_init(struct privsep *ps, struct 
>  
>   cs->cs_fd = fd;
>   cs->cs_env = ps;
> +
> + proc_compose(ps, PROC_PARENT, IMSG_VMDOP_DONE, NULL, 0);
> +
> + return (0);
> +}
> +
> +int
> +control_reset(struct control_sock *cs)
> +{
> + /* Updating owner of the control socket */
> + if (chown(cs->cs_name, cs->cs_uid, cs->cs_gid) == -1)
> + return (-1);
>  
>   return (0);
>  }
> Index: usr.sbin/vmd/parse.y
> ===
> RCS file: /cvs/src/usr.sbin/vmd/parse.y,v
> retrieving revision 1.35
> diff -u -p -u -p -r1.35 parse.y
> --- usr.sbin/vmd/parse.y  19 Jun 2018 17:12:34 -  1.35
> +++ usr.sbin/vmd/parse.y  19 Jun 2018 18:27:55 -
> @@ -119,7 +119,8 @@ typedef struct {
>  
>  %token   INCLUDE ERROR
>  %token   ADD BOOT CDROM DISABLE DISK DOWN ENABLE GROUP INTERFACE LLADDR LOCAL
> -%token   LOCKED MEMORY NIFS OWNER PATH PREFIX RDOMAIN SIZE SWITCH UP VM VMID
> +%token   LOCKED MEMORY NIFS OWNER PATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP
> +%token   VM VMID
>  %token NUMBER
>  %token STRING
>  %type  lladdr
> @@ -190,6 +191,10 @@ main : LOCAL PREFIX STRING {
>  
>   memcpy(&env->vmd_cfg.cfg_localprefix, &h, sizeof(h));
>   }
> + | SOCKET OWNER owner_id {
> + env->vmd_ps.ps_csock.cs_uid = $3.uid;
> + env->vmd_ps.ps_csock.cs_gid = $3.gid == -1 ? 0 : $3.gid;
> + }
>   ;
>  
>  switch   : SWITCH string {
> @@ -678,6 +683,7 @@ lookup(char *s)
>   { "prefix", PREFIX },
>   { "rdomain",RDOMAIN },
>   { "size",   SIZE },
> + { "socket", SOCKET },
>   { "switch", SWITCH },
>   { "up", UP },
>   { "vm", VM }
> Index: usr.sbin/vmd/proc.h
> ===
> RCS file: /cvs/src/usr.sbin/vmd/proc.h,v
> retrieving revision 1.12
> diff -u -p -u -

Re: VMM owner needs to be part of wheel

2018-06-19 Thread Mischa
On 19 Jun at 22:10, Theo de Raadt  wrote:
> Mischa  wrote:
> 
> > > 2. Change the default owner group to root:_vmd.
> > > 
> > > It would be possible to define a hardcoded group, or use group _vmd,
> > > but this doesn't feel right. 
> > 
> > Why would using _vmd not work?
> > Wouldn't that be the same for other daemons?
> > Or wouldn't these be used to assign other users to it?
> > One which comes to mind is _ladvd, albeit not in base.
> 
> Because it exists for a different purpose.  A security purpose.
> Good lord.

Makes perfect sense.

Mischa



Re: VMM owner needs to be part of wheel

2018-06-19 Thread Mischa
Hi Reyk,

On 19 Jun at 20:38, Reyk Floeter  wrote:
> Hi,
> 
> On Sun, Jun 17, 2018 at 10:35:27PM +0200, obs...@high5.nl wrote:
> > >Synopsis:  VMM owner needs to be part of group wheel in order to run vmctl 
> > >console|start|stop
> 
> the solution is not that easy as it seemed.
> 
> 1. Change the umask and let everyone access vmd, restrict the commands
> internally.
> 
> While this is a possible solution, I also agree that this allows any
> user (including system/privsep users) to trigger actions and imsgs in
> vmd; even if the result is permission denied as this is checked fairly
> late.
> 
> 2. Change the default owner group to root:_vmd.
> 
> It would be possible to define a hardcoded group, or use group _vmd,
> but this doesn't feel right. 

Why would using _vmd not work?
Wouldn't that be the same for other daemons?
Or wouldn't these be used to assign other users to it?
One which comes to mind is _ladvd, albeit not in base.

> 3. Let the user configure the owner of the control socket.
> 
> This allows you to configure your own group, like "devops" (hrhr), or
> fetch the group from YP/LDAP, and let them mess with your VMs.  I think
> this is much more viable in a multi-user environment.  Add the
> following to the top/global section of /etc/vm.conf: "socket owner :devops"

Works for me. And when vm.conf doesn't exist, or the owner doesn't exist, would it
fall back to root?

Will check out the patch asap!

Mischa
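For illustration, the fallback asked about above can be sketched in C. This is a minimal sketch under stated assumptions, not vmd's code: the quoted hunk only shows the "SOCKET OWNER owner_id" rule, not how owner_id itself is parsed, so the helper resolve_socket_owner() and its handling of "user", ":group" and "user:group" are hypothetical. Unspecified parts fall back to 0 (root), mirroring "$3.gid == -1 ? 0 : $3.gid" from the parse.y hunk.

```c
#define _POSIX_C_SOURCE 200809L
#include <grp.h>
#include <pwd.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not vmd's parser): resolve an owner spec of the
 * form "user", ":group" or "user:group" into a uid/gid pair.  Anything
 * left unspecified falls back to 0, like the gid fallback in the diff.
 * Returns 0 on success, or -1 if the spec is too long or a named user
 * or group does not exist.
 */
int
resolve_socket_owner(const char *spec, uid_t *uid, gid_t *gid)
{
	char		 buf[256];
	char		*user, *group;
	struct passwd	*pw;
	struct group	*gr;
	size_t		 len;

	/* Default: root-owned socket with gid 0. */
	*uid = 0;
	*gid = 0;

	if (spec == NULL || *spec == '\0')
		return (0);

	len = strlen(spec);
	if (len >= sizeof(buf))
		return (-1);
	memcpy(buf, spec, len + 1);

	/* Split "user:group" in place; either side may be empty. */
	user = buf;
	if ((group = strchr(buf, ':')) != NULL)
		*group++ = '\0';

	if (*user != '\0') {
		if ((pw = getpwnam(user)) == NULL)
			return (-1);
		*uid = pw->pw_uid;
	}
	if (group != NULL && *group != '\0') {
		if ((gr = getgrnam(group)) == NULL)
			return (-1);
		*gid = gr->gr_gid;
	}
	return (0);
}
```

With "socket owner :devops", for example, this sketch would yield uid 0 and the gid of group devops (if that group exists); the parent process could then hand that pair to chown(2), which is what control_reset() in the diff does with cs_uid and cs_gid.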

> privsep/pledge also makes it a bit more complicated for us because I
> don't want to allow the control process to chown the unix socket; so
> it is done by the parent and some messaging.
> 
> The attached diff implements 3. ... comments? OK? Should we use 2. instead?
> 
> Reyk
> 
> Index: usr.sbin/vmd/control.c
> ===
> RCS file: /cvs/src/usr.sbin/vmd/control.c,v
> retrieving revision 1.23
> diff -u -p -u -p -r1.23 control.c
> --- usr.sbin/vmd/control.c13 May 2018 22:48:11 -  1.23
> +++ usr.sbin/vmd/control.c19 Jun 2018 18:27:55 -
> @@ -103,6 +103,7 @@ control_dispatch_vmd(int fd, struct priv
>   break;
>   case IMSG_VMDOP_CONFIG:
>   config_getconfig(ps->ps_env, imsg);
> + proc_compose(ps, PROC_PARENT, IMSG_VMDOP_DONE, NULL, 0);
>   break;
>   case IMSG_CTL_RESET:
>   config_getreset(ps->ps_env, imsg);
> @@ -169,6 +170,18 @@ control_init(struct privsep *ps, struct 
>  
>   cs->cs_fd = fd;
>   cs->cs_env = ps;
> +
> + proc_compose(ps, PROC_PARENT, IMSG_VMDOP_DONE, NULL, 0);
> +
> + return (0);
> +}
> +
> +int
> +control_reset(struct control_sock *cs)
> +{
> + /* Updating owner of the control socket */
> + if (chown(cs->cs_name, cs->cs_uid, cs->cs_gid) == -1)
> + return (-1);
>  
>   return (0);
>  }
> Index: usr.sbin/vmd/parse.y
> ===
> RCS file: /cvs/src/usr.sbin/vmd/parse.y,v
> retrieving revision 1.35
> diff -u -p -u -p -r1.35 parse.y
> --- usr.sbin/vmd/parse.y  19 Jun 2018 17:12:34 -  1.35
> +++ usr.sbin/vmd/parse.y  19 Jun 2018 18:27:55 -
> @@ -119,7 +119,8 @@ typedef struct {
>  
>  %token   INCLUDE ERROR
>  %token   ADD BOOT CDROM DISABLE DISK DOWN ENABLE GROUP INTERFACE LLADDR LOCAL
> -%token   LOCKED MEMORY NIFS OWNER PATH PREFIX RDOMAIN SIZE SWITCH UP VM VMID
> +%token   LOCKED MEMORY NIFS OWNER PATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP
> +%token   VM VMID
>  %token NUMBER
>  %token STRING
>  %type  lladdr
> @@ -190,6 +191,10 @@ main : LOCAL PREFIX STRING {
>  
>   memcpy(&env->vmd_cfg.cfg_localprefix, &h, sizeof(h));
>   }
> + | SOCKET OWNER owner_id {
> + env->vmd_ps.ps_csock.cs_uid = $3.uid;
> + env->vmd_ps.ps_csock.cs_gid = $3.gid == -1 ? 0 : $3.gid;
> + }
>   ;
>  
>  switch   : SWITCH string {
> @@ -678,6 +683,7 @@ lookup(char *s)
>   { "prefix", PREFIX },
>   { "rdomain",RDOMAIN },
>   { "size",   SIZE },
> + { "socket", SOCKET },
>   { "switch", SWITCH },
>   { "up", UP },
>   { "vm", VM }
> Index: usr.sbin/vmd/proc.h
> ===

Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument

2018-06-19 Thread Mischa
> On 19 Jun 2018, at 17:51, Mike Larkin  wrote:
> On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
>>> Synopsis:   VMs stop intermittently after vcpu_run_loop error
>>> Category:   system
>>> Environment:
>>  System  : OpenBSD 6.3
>>  Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>   
>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> 
>>  Architecture: OpenBSD.amd64
>>  Machine : amd64
>>> Description:
>>  Currently running 12 VMs on a single machine. After some random time,
>> VMs shut down. Sometimes a single VM and sometimes more. A VM always
>> stops after an error message like the following:
>> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly   
>>   
> 
> This is almost surely the following bug, fixed in April (log from pmap.c):
> 
> revision 1.113
> date: 2018/04/17 06:31:55;  author: mlarkin;  state: Exp;  lines: +275 -64;  
> commitid: BaLjO2NVfYaZP00l;
> Better way of allocating EPT entries.
> 
> Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
> occasionally caused VMs to fail after random amounts of time due to
> loading the pmap on the CPU and the processor updating A/D bits (which
> are reserved bits in EPT). This ultimately manifested itself as errors
> from vmd ("vcpu X run ioctl failed".)
> 
> tested by many, on different types of HW, no regressions noted
> 
> ---
> 
> Can you try -current and see if you can still reproduce this problem?

Will do! Will probably be able to upgrade to current this week.

>> Side note: after a reboot of the host, all VMs stop at one point as it looks 
>> like VMM starts all the VMs at the same time. Looks like it's draining 
>> resources at that point.
>> 
> 
> Yes, this is a known issue, I've had it on my to-do list to have some sort
> of sequencing or delay, but never got around to it (Hint, hint, such a fix would
> likely be an easy intro-to-vmd-hacking effort for someone who wanted to dip their
> toe in the water).

I wish I were able to do something. I will get some more hardware up and running to
host OpenBSD VMs and donate part of it to the Foundation.

Mischa
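The sequencing or delay mentioned above could take many forms. One possibility, sketched here in C purely as an illustration (the function stagger_offsets() and its parallel and delay parameters are hypothetical, not part of vmd), is to start VMs in small batches with a fixed pause between batches, so they do not all compete for resources right after the host boots:

```c
#include <stddef.h>

/*
 * Hypothetical helper: compute when each VM should be started so that
 * at most `parallel` VMs boot at the same time and each batch starts
 * `delay` seconds after the previous one.  offsets[i] receives the
 * number of seconds to wait (from daemon startup) before starting VM i.
 */
void
stagger_offsets(size_t nvms, size_t parallel, unsigned int delay,
    unsigned int *offsets)
{
	size_t i;

	if (parallel == 0)
		parallel = 1;	/* treat 0 as "one VM at a time" */

	for (i = 0; i < nvms; i++)
		offsets[i] = (unsigned int)(i / parallel) * delay;
}
```

With 12 VMs, parallel = 2 and delay = 30, the last batch would start 150 seconds after the daemon; a caller would wait for each offset (for example with sleep(3) or a timer event) before starting the corresponding VM.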

> 
> -ml
> 
>>> How-To-Repeat:
>>  Unfortunately I have not found a way to reproduce this. I thought I was
>> on to something when I also loaded an Alpine Linux VM, but this is now
>> also happening without it running.
>> 
>>> Fix:
>>  No fix.
>> 
>> 
>> dmesg:
>> OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>
>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> real mem = 8544342016 (8148MB)
>> avail mem = 8278315008 (7894MB)
>> mpath0 at root
>> scsibus0 at mpath0: 256 targets
>> mainbus0 at root
>> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe9750 (56 entries)
>> bios0: vendor American Megatrends Inc. version "2.0b" date 09/17/2012
>> bios0: Supermicro X9SCL/X9SCM
>> acpi0 at bios0: rev 2
>> acpi0: sleep states S0 S1 S4 S5
>> acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SPMI SSDT SSDT EINJ ERST HEST BERT BGRT
>> acpi0: wakeup devices PS2K(S4) PS2M(S4) UAR1(S4) UAR2(S4) P0P1(S4) USB1(S4) USB2

Re: VMM owner needs to be part of wheel

2018-06-19 Thread Mischa
Hi Reyk,

Seems like a workable solution if the security part is restricted enough by vmd.

Mischa
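Reyk's quoted umask suggestion works because, on OpenBSD (and Linux), a client needs write permission on a Unix-domain socket file to connect(2) to it, so the socket's mode decides who can talk to vmd at all. The following is a minimal, hypothetical illustration of that mechanism, not vmd's control.c: create_control_socket() binds the socket under a restrictive umask and then sets the intended mode explicitly with chmod(2), so choosing 0660 (owner and group only) or 0666 (world-writable) is a single argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a listening Unix-domain control socket at
 * `path` whose permission bits end up as `mode` (e.g. 0660 or 0666).
 * Returns the listening descriptor, or -1 on error.
 */
int
create_control_socket(const char *path, mode_t mode)
{
	struct sockaddr_un	 s_un;
	mode_t			 old_umask;
	size_t			 len;
	int			 fd;

	memset(&s_un, 0, sizeof(s_un));
	s_un.sun_family = AF_UNIX;
	len = strlen(path);
	if (len >= sizeof(s_un.sun_path))
		return (-1);
	memcpy(s_un.sun_path, path, len + 1);

	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		return (-1);

	/* Remove a stale socket left behind by a previous run, if any. */
	(void)unlink(path);

	/* Create the socket file owner-only, so it is never briefly open. */
	old_umask = umask(S_IRWXG | S_IRWXO);
	if (bind(fd, (struct sockaddr *)&s_un, sizeof(s_un)) == -1) {
		(void)umask(old_umask);
		close(fd);
		return (-1);
	}
	(void)umask(old_umask);

	/* Then apply the intended permissions explicitly. */
	if (chmod(path, mode) == -1 || listen(fd, 5) == -1) {
		close(fd);
		(void)unlink(path);
		return (-1);
	}
	return (fd);
}
```

Creating the file with a tight umask first and widening it afterwards avoids a window in which the socket exists with broader permissions than intended; whether vmd should expose 0666 at all is exactly the trade-off discussed in this thread.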

> On 18 Jun 2018, at 00:05, Reyk Floeter  wrote:
> 
> Hi,
> 
> changing the umask in control.c could fix it. There’s no need to restrict it 
> to wheel since vmd checks the permissions based on configuration internally. 
> Having the vmd socket world-writable should be OK.
> 
> But we could eventually use a group _vmd to shield off users who shouldn’t 
> even be able to do anything. But this doesn’t make much sense - it would be a 
> bit like restricting users from running ps a.
> 
> I can make a diff tomorrow.
> 
> Reyk
> 
> Am 17.06.2018 um 22:35 schrieb obs...@high5.nl:
> 
>>> Synopsis: VMM owner needs to be part of group wheel in order to run
>>> vmctl console|start|stop
>>> Category: system
>>> Environment:
>>   System  : OpenBSD 6.3
>>   Details : OpenBSD 6.3 (GENERIC.MP) #3: Fri May 18 00:06:26 CEST 2018
>>
>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> 
>>   Architecture: OpenBSD.amd64
>>   Machine : amd64
>>> Description:
>>   Users who need any level of vmctl access are currently required to be
>> in group wheel. From a hosting perspective, it would be great to allow
>> users to control their own VM and attach to the console. I started a
>> small project to host OpenBSD VMs for the community out of Amsterdam, and
>> I would love to give users access to their own VM.
>>> How-To-Repeat:
>>   Setting the VM owner to a user who is not in wheel results in a
>> message like:
>>   vmctl: command failed: Operation not permitted
>>> Fix:
>>   The current workaround is to add the user to group wheel, which might
>> be OK for trusted users.
>> 
>> 
>> dmesg:
>> OpenBSD 6.3 (GENERIC.MP) #3: Fri May 18 00:06:26 CEST 2018
>>   
>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> real mem = 8544342016 (8148MB)
>> avail mem = 8278310912 (7894MB)
>> mpath0 at root
>> scsibus0 at mpath0: 256 targets
>> mainbus0 at root
>> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe9750 (56 entries)
>> bios0: vendor American Megatrends Inc. version "2.0b" date 09/17/2012
>> bios0: Supermicro X9SCL/X9SCM
>> acpi0 at bios0: rev 2
>> acpi0: sleep states S0 S1 S4 S5
>> acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SPMI SSDT SSDT EINJ ERST 
>> HEST BERT BGRT
>> acpi0: wakeup devices PS2K(S4) PS2M(S4) UAR1(S4) UAR2(S4) P0P1(S4) USB1(S4) 
>> USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) USB7(S4) PXSX(S4) RP01(S4) 
>> PXSX(S4) RP02(S4) [...]
>> acpitimer0 at acpi0: 3579545 Hz, 24 bits
>> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
>> cpu0 at mainbus0: apid 0 (boot processor)
>> cpu0: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.45 MHz
>> cpu0: 
>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
>> cpu0: 256KB 64b/line 8-way L2 cache
>> acpitimer0: recalibrated TSC frequency 3100015637 Hz
>> cpu0: smt 0, core 0, package 0
>> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
>> cpu0: apic clock running at 100MHz
>> cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
>> cpu1 at mainbus0: apid 2 (application processor)
>> cpu1: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.03 MHz
>> cpu1: 
>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
>> cpu1: 256KB 64b/line 8-way L2 cache
>> cpu1: smt 0, core 1, package 0
>> cpu2 at mainbus0: apid 4 (application processor)
>> cpu2: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.03 MHz
>> cpu2: 
>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
>> cpu2: 256KB 64b/line 8-way L2 cache
>> cpu2: smt 0, core 2, package 0

Re: em0 stops working after random amount of time in OpenBSD 5.8/5.9

2016-04-04 Thread Mischa

Hi Evgeniy,

Thank you for your suggestion. Unfortunately, there is already a card in the
only available slot.
I did notice that the two network interfaces use different chipsets:

em0 at pci4 dev 0 function 0 "Intel 82573E" rev 0x03: msi, address 00:30:48:96:42:06
em1 at pci5 dev 0 function 0 "Intel 82573L" rev 0x00: msi, address 00:30:48:96:42:07

I moved the cable to em1, and it has been behaving as expected since.

Thanx for responding!!

Mischa

> On 03 Apr 2016, at 02:13, Evgeniy Sudyr <eject.in...@gmail.com> wrote:
> 
> Mischa, please send a report with sendbug(1) including all the details to
> the developers, then wait and hope someone picks up the bug.
> 
> Personally, I would suggest not using that 11-year-old NIC, which
> causes such problems (I found some similar reports in the archives),
> and getting a better NIC that fits your needs.
> 
> Check out the Intel i350:
> http://www.intel.com/content/www/us/en/embedded/products/networking/ethernet-controller-i350-datasheet.html
> 
> You can get it for about $150 retail (dual port). I never had issues
> with this one, although I haven't used even half of its cool features on
> any platform.
> 
> 
> On Sat, Apr 2, 2016 at 10:14 PM, Mischa Peters <open...@high5.nl> wrote:
>> Hi Evgeniy,
>> 
>> One of the questions I had was indeed how to troubleshoot this. There is
>> nothing out of the ordinary in dmesg or the messages log, and I cannot
>> find anything that changes on the interface or in netstat.
>> 
>> Until the 18th of March this machine was running FreeBSD without any
>> issues. I moved from 9.3-RELEASE-pXX to OpenBSD 5.8. There are still 2
>> machines of the same type running FreeBSD 9.3 without any issues.
>> 
>> I do know there are unresolved issues with this NIC in FreeBSD 10, but
>> they mostly concern the driver not loading.
>> 
>> What is strange is that it works after a reboot: I can ping an IP. But
>> as soon as I run ftp or pkg_add, for example, it stops working.
>> 
>> Mischa
>> 
>>> On 02 Apr 2016, at 21:15, Evgeniy Sudyr <eject.in...@gmail.com> wrote:
>>> 
>>> Mischa,
>>> 
>>> 1) Consider using sendbug(1) to file a report (read the section saying
>>> "The following items should be contained in every bug report")
>>> 
>>> http://www.openbsd.org/report.html
>>> 
>>> 2) I suggest providing more details about your system configuration.
>>> Most interesting: has any sysctl tuning been done, and was this a
>>> working system or a new/fresh setup that never worked before?
>>> 
>>> 3) Could it be broken hardware? I just googled your board / NIC, and
>>> both are about 9 years old.
>>> 
>>> --
>>> Evgeniy
>>> 
>>>> On Sat, Apr 2, 2016 at 7:36 PM, Mischa <open...@high5.nl> wrote:
>>>> Hi All,
>>>> 
>>>> I just tried with: OpenBSD host 5.9 GENERIC.MP#1888 amd64
>>>> The result is still the same. Networking stops and sometimes continues 
>>>> after some time.
>>>> Could this be because of SMP networking?
>>>> 
>>>> What I am seeing on the switch is that the MAC address is still in the MAC 
>>>> table.
>>>> But there is no longer an ARP entry.
>>>> 
>>>> Mischa
>>>> 
>>>> 
>>>>> On 22 Mar 2016, at 12:18, Mischa <open...@high5.nl> wrote:
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> I would be happy to provide remote console access if that helps.
>>>>> 
>>>>> Mischa
>>>>> 
>>>>>> On 20 Mar 2016, at 14:52, Mischa <open...@high5.nl> wrote:
>>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> I am running OpenBSD 5.8, and tried 5.9 as well, on a SuperMicro PDSMi
>>>>>> which has an Intel 82573E.
>>>>>> For some reason networking just stops working after a random amount of
>>>>>> time, usually when I am SSH-ed into the machine.
>>>>>> When I am connected to the console, it seems to keep working longer. I
>>>>>> am testing this by pinging an IP address on the local subnet.
>>>>>> 
>>>>>> Unfortunately, I cannot find anything different from an interface or
>>>>>> subnet perspective, and nothing appears in the logs.
>>>>>> The problem goes away, temporarily, when I bounce the int

Re: em0 stops working after random amount of time in OpenBSD 5.8/5.9

2016-04-02 Thread Mischa Peters
Hi Evgeniy,

One of the questions I had was indeed how to troubleshoot this. There is
nothing out of the ordinary in dmesg or the messages log, and I cannot find
anything that changes on the interface or in netstat.

Until the 18th of March this machine was running FreeBSD without any issues. I
moved from 9.3-RELEASE-pXX to OpenBSD 5.8. There are still 2 machines of the
same type running FreeBSD 9.3 without any issues.

I do know there are unresolved issues with this NIC in FreeBSD 10, but they
mostly concern the driver not loading.

What is strange is that it works after a reboot: I can ping an IP. But as soon
as I run ftp or pkg_add, for example, it stops working.

Mischa

> On 02 Apr 2016, at 21:15, Evgeniy Sudyr <eject.in...@gmail.com> wrote:
> 
> Mischa,
> 
> 1) Consider using sendbug(1) to file a report (read the section saying
> "The following items should be contained in every bug report")
> 
> http://www.openbsd.org/report.html
> 
> 2) I suggest providing more details about your system configuration.
> Most interesting: has any sysctl tuning been done, and was this a working
> system or a new/fresh setup that never worked before?
> 
> 3) Could it be broken hardware? I just googled your board / NIC, and both
> are about 9 years old.
> 
> --
> Evgeniy
> 
>> On Sat, Apr 2, 2016 at 7:36 PM, Mischa <open...@high5.nl> wrote:
>> Hi All,
>> 
>> I just tried with: OpenBSD host 5.9 GENERIC.MP#1888 amd64
>> The result is still the same. Networking stops and sometimes continues after 
>> some time.
>> Could this be because of SMP networking?
>> 
>> What I am seeing on the switch is that the MAC address is still in the MAC 
>> table.
>> But there is no longer an ARP entry.
>> 
>> Mischa
>> 
>> 
>>> On 22 Mar 2016, at 12:18, Mischa <open...@high5.nl> wrote:
>>> 
>>> Hi All,
>>> 
>>> I would be happy to provide remote console access if that helps.
>>> 
>>> Mischa
>>> 
>>>> On 20 Mar 2016, at 14:52, Mischa <open...@high5.nl> wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> I am running OpenBSD 5.8, and tried 5.9 as well, on a SuperMicro PDSMi
>>>> which has an Intel 82573E.
>>>> For some reason networking just stops working after a random amount of
>>>> time, usually when I am SSH-ed into the machine.
>>>> When I am connected to the console, it seems to keep working longer. I am
>>>> testing this by pinging an IP address on the local subnet.
>>>> 
>>>> Unfortunately, I cannot find anything different from an interface or
>>>> subnet perspective, and nothing appears in the logs.
>>>> The problem goes away, temporarily, when I bounce the interface on the 
>>>> switch.
>>>> 
>>>> How can I best troubleshoot the cause?
>>>> 
>>>> # dmesg
>>>> em0 at pci4 dev 0 function 0 "Intel 82573E" rev 0x03: msi, address 
>>>> 00:30:48:96:42:06
>>>> 
>>>> # pcidump -v
>>>> 13:0:0: Intel 82573E
>>>>     0x0000: Vendor ID: 8086 Product ID: 108c
>>>>     0x0004: Command: 0107 Status: 0010
>>>>     0x0008: Class: 02 Subclass: 00 Interface: 00 Revision: 03
>>>>     0x000c: BIST: 00 Header Type: 00 Latency Timer: 00 Cache Line Size: 10
>>>>     0x0010: BAR mem 32bit addr: 0xe8a0/0x0002
>>>>     0x0014: BAR empty ()
>>>>     0x0018: BAR io addr: 0x5000/0x0020
>>>>     0x001c: BAR empty ()
>>>>     0x0020: BAR empty ()
>>>>     0x0024: BAR empty ()
>>>>     0x0028: Cardbus CIS:
>>>>     0x002c: Subsystem Vendor ID: 15d9 Product ID: 108c
>>>>     0x0030: Expansion ROM Base Address:
>>>>     0x0038:
>>>>     0x003c: Interrupt Pin: 01 Line: 0b Min Gnt: 00 Max Lat: 00
>>>>     0x00c8: Capability 0x01: Power Management
>>>>     0x00d0: Capability 0x05: Message Signaled Interrupts (MSI)
>>>>     0x00e0: Capability 0x10: PCI Express
>>>>         Link Speed: 2.5 / 2.5 GT/s Link Width: x1 / x1
>>>> 
>>>> # ifconfig em0
>>>> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>>         lladdr 00:30:48:96:42:06
>>>>         priority: 0
>>>>         groups: egress
>>>>         media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
>>>>         status: active
>>>>         inet  netmask 0xffffff00 broadcast 46.23.86.255
>>>> 
>>>> # netstat -nr
>>>> Internet:
>>>> Destination        Gateway            Flags  Refs      Use   Mtu  Prio Iface
>>>> default                               UGS       3       37     -     8 em0
>>>> /24                46.23.86.132       UC        1        0     -     8 em0
>>>>                    02:e0:52:9c:3c:56  UHLc      1        0     -     8 em0
>>>>                    00:30:48:96:42:06  HLl       0        0     -     1 lo0
>>>>                                       UHb       0        0     -     1 em0
>>>> 127/8              127.0.0.1          UGRS      0        0 32768     8 lo0
>>>> 127.0.0.1          127.0.0.1          UHl       1        0 32768     1 lo0
>>>> 224/4              127.0.0.1          URS       0        0 32768     8 lo0
>>>> 
>>>> Thanx!
>>>> 
>>>> Mischa
> 
> 
> 
> -- 
> --
> With regards,
> Eugene Sudyr



Re: em0 stops working after random amount of time in OpenBSD 5.8/5.9

2016-04-02 Thread Mischa
Hi All,

I just tried with: OpenBSD host 5.9 GENERIC.MP#1888 amd64
The result is still the same. Networking stops and sometimes continues after 
some time.
Could this be because of SMP networking?

What I am seeing on the switch is that the MAC address is still in the MAC 
table.
But there is no longer an ARP entry.

Mischa
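The switch-side observation (MAC address still learned, ARP entry gone) can be cross-checked from the OpenBSD host by looking at the gateway's entry in `netstat -nr` (or `arp -an`) while the link is down. In the routing table posted in this thread, the gateway's ARP entry is the host route with flags `UHLc`. The Python sketch below decodes those flag letters; the meanings are assumed from the flag descriptions in OpenBSD's netstat(1), only the letters seen in this thread are included, and `decode_flags` and `is_neighbor_arp_entry` are hypothetical helpers.

```python
from __future__ import annotations

# Route flag letters seen in the netstat -nr output in this thread, with the
# meanings assumed from OpenBSD's netstat(1).
ROUTE_FLAGS = {
    "U": "up",
    "G": "gateway",
    "S": "static",
    "H": "host",
    "L": "link-layer info (ARP)",
    "C": "cloning",
    "c": "cloned",
    "l": "local",
    "b": "broadcast",
    "R": "reject",
}


def decode_flags(flags: str) -> list[str]:
    """Translate a Flags column such as 'UHLc' into readable names."""
    return [ROUTE_FLAGS.get(ch, f"unknown ({ch})") for ch in flags]


def is_neighbor_arp_entry(flags: str) -> bool:
    """True for a resolved ARP entry of another host (not our own address)."""
    return "H" in flags and "L" in flags and "l" not in flags


if __name__ == "__main__":
    for flags in ("UGS", "UHLc", "HLl", "UHb"):
        print(flags, decode_flags(flags), is_neighbor_arp_entry(flags))
```

The `HLl` row is the host's own link-layer address on lo0, which is why the sketch excludes routes carrying the `l` (local) flag.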


> On 22 Mar 2016, at 12:18, Mischa <open...@high5.nl> wrote:
> 
> Hi All,
> 
> I would be happy to provide remote console access if that helps.
> 
> Mischa
> 
>> On 20 Mar 2016, at 14:52, Mischa <open...@high5.nl> wrote:
>> 
>> Hi All,
>> 
>> I am running OpenBSD 5.8, and tried 5.9 as well, on a SuperMicro PDSMi
>> which has an Intel 82573E.
>> For some reason networking just stops working after a random amount of
>> time, usually when I am SSH-ed into the machine.
>> When I am connected to the console, it seems to keep working longer. I am
>> testing this by pinging an IP address on the local subnet.
>> 
>> Unfortunately, I cannot find anything different from an interface or
>> subnet perspective, and nothing appears in the logs.
>> The problem goes away, temporarily, when I bounce the interface on the 
>> switch.
>> 
>> How can I best troubleshoot the cause?
>> 
>> # dmesg
>> em0 at pci4 dev 0 function 0 "Intel 82573E" rev 0x03: msi, address 
>> 00:30:48:96:42:06
>> 
>> # pcidump -v
>> 13:0:0: Intel 82573E
>>     0x0000: Vendor ID: 8086 Product ID: 108c
>>     0x0004: Command: 0107 Status: 0010
>>     0x0008: Class: 02 Subclass: 00 Interface: 00 Revision: 03
>>     0x000c: BIST: 00 Header Type: 00 Latency Timer: 00 Cache Line Size: 10
>>     0x0010: BAR mem 32bit addr: 0xe8a0/0x0002
>>     0x0014: BAR empty ()
>>     0x0018: BAR io addr: 0x5000/0x0020
>>     0x001c: BAR empty ()
>>     0x0020: BAR empty ()
>>     0x0024: BAR empty ()
>>     0x0028: Cardbus CIS:
>>     0x002c: Subsystem Vendor ID: 15d9 Product ID: 108c
>>     0x0030: Expansion ROM Base Address:
>>     0x0038:
>>     0x003c: Interrupt Pin: 01 Line: 0b Min Gnt: 00 Max Lat: 00
>>     0x00c8: Capability 0x01: Power Management
>>     0x00d0: Capability 0x05: Message Signaled Interrupts (MSI)
>>     0x00e0: Capability 0x10: PCI Express
>>         Link Speed: 2.5 / 2.5 GT/s Link Width: x1 / x1
>> 
>> # ifconfig em0
>> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>         lladdr 00:30:48:96:42:06
>>         priority: 0
>>         groups: egress
>>         media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
>>         status: active
>>         inet  netmask 0xffffff00 broadcast 46.23.86.255
>> 
>> # netstat -nr
>> Internet:
>> Destination        Gateway            Flags  Refs      Use   Mtu  Prio Iface
>> default                               UGS       3       37     -     8 em0
>> /24                46.23.86.132       UC        1        0     -     8 em0
>>                    02:e0:52:9c:3c:56  UHLc      1        0     -     8 em0
>>                    00:30:48:96:42:06  HLl       0        0     -     1 lo0
>>                                       UHb       0        0     -     1 em0
>> 127/8              127.0.0.1          UGRS      0        0 32768     8 lo0
>> 127.0.0.1          127.0.0.1          UHl       1        0 32768     1 lo0
>> 224/4              127.0.0.1          URS       0        0 32768     8 lo0
>> 
>> Thanx!
>> 
>> Mischa
>> 
>> 
> 



Re: em0 stops working after random amount of time in OpenBSD 5.8/5.9

2016-03-22 Thread Mischa
Hi All,

I would be happy to provide remote console access if that helps.

Mischa

> On 20 Mar 2016, at 14:52, Mischa <open...@high5.nl> wrote:
> 
> Hi All,
> 
> I am running OpenBSD 5.8, and tried 5.9 as well, on a SuperMicro PDSMi which
> has an Intel 82573E.
> For some reason networking just stops working after a random amount of time,
> usually when I am SSH-ed into the machine.
> When I am connected to the console, it seems to keep working longer. I am
> testing this by pinging an IP address on the local subnet.
> 
> Unfortunately, I cannot find anything different from an interface or subnet
> perspective, and nothing appears in the logs.
> The problem goes away, temporarily, when I bounce the interface on the switch.
> 
> How can I best troubleshoot the cause?
> 
> # dmesg
> em0 at pci4 dev 0 function 0 "Intel 82573E" rev 0x03: msi, address 
> 00:30:48:96:42:06
> 
> # pcidump -v
> 13:0:0: Intel 82573E
>     0x0000: Vendor ID: 8086 Product ID: 108c
>     0x0004: Command: 0107 Status: 0010
>     0x0008: Class: 02 Subclass: 00 Interface: 00 Revision: 03
>     0x000c: BIST: 00 Header Type: 00 Latency Timer: 00 Cache Line Size: 10
>     0x0010: BAR mem 32bit addr: 0xe8a0/0x0002
>     0x0014: BAR empty ()
>     0x0018: BAR io addr: 0x5000/0x0020
>     0x001c: BAR empty ()
>     0x0020: BAR empty ()
>     0x0024: BAR empty ()
>     0x0028: Cardbus CIS:
>     0x002c: Subsystem Vendor ID: 15d9 Product ID: 108c
>     0x0030: Expansion ROM Base Address:
>     0x0038:
>     0x003c: Interrupt Pin: 01 Line: 0b Min Gnt: 00 Max Lat: 00
>     0x00c8: Capability 0x01: Power Management
>     0x00d0: Capability 0x05: Message Signaled Interrupts (MSI)
>     0x00e0: Capability 0x10: PCI Express
>         Link Speed: 2.5 / 2.5 GT/s Link Width: x1 / x1
> 
> # ifconfig em0
> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:30:48:96:42:06
>         priority: 0
>         groups: egress
>         media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
>         status: active
>         inet  netmask 0xffffff00 broadcast 46.23.86.255
> 
> # netstat -nr
> Internet:
> Destination        Gateway            Flags  Refs      Use   Mtu  Prio Iface
> default                               UGS       3       37     -     8 em0
> /24                46.23.86.132       UC        1        0     -     8 em0
>                    02:e0:52:9c:3c:56  UHLc      1        0     -     8 em0
>                    00:30:48:96:42:06  HLl       0        0     -     1 lo0
>                                       UHb       0        0     -     1 em0
> 127/8              127.0.0.1          UGRS      0        0 32768     8 lo0
> 127.0.0.1          127.0.0.1          UHl       1        0 32768     1 lo0
> 224/4              127.0.0.1          URS       0        0 32768     8 lo0
> 
> Thanx!
> 
> Mischa
> 
> 



em0 stops working after random amount of time in OpenBSD 5.8/5.9

2016-03-20 Thread Mischa
Hi All,

I am running OpenBSD 5.8, and tried 5.9 as well, on a SuperMicro PDSMi which
has an Intel 82573E.
For some reason networking just stops working after a random amount of time,
usually when I am SSH-ed into the machine.
When I am connected to the console, it seems to keep working longer. I am
testing this by pinging an IP address on the local subnet.

Unfortunately, I cannot find anything different from an interface or subnet
perspective, and nothing appears in the logs.
The problem goes away, temporarily, when I bounce the interface on the switch.

How can I best troubleshoot the cause?

# dmesg
em0 at pci4 dev 0 function 0 "Intel 82573E" rev 0x03: msi, address 
00:30:48:96:42:06

# pcidump -v
13:0:0: Intel 82573E
    0x0000: Vendor ID: 8086 Product ID: 108c
    0x0004: Command: 0107 Status: 0010
    0x0008: Class: 02 Subclass: 00 Interface: 00 Revision: 03
    0x000c: BIST: 00 Header Type: 00 Latency Timer: 00 Cache Line Size: 10
    0x0010: BAR mem 32bit addr: 0xe8a0/0x0002
    0x0014: BAR empty ()
    0x0018: BAR io addr: 0x5000/0x0020
    0x001c: BAR empty ()
    0x0020: BAR empty ()
    0x0024: BAR empty ()
    0x0028: Cardbus CIS:
    0x002c: Subsystem Vendor ID: 15d9 Product ID: 108c
    0x0030: Expansion ROM Base Address:
    0x0038:
    0x003c: Interrupt Pin: 01 Line: 0b Min Gnt: 00 Max Lat: 00
    0x00c8: Capability 0x01: Power Management
    0x00d0: Capability 0x05: Message Signaled Interrupts (MSI)
    0x00e0: Capability 0x10: PCI Express
        Link Speed: 2.5 / 2.5 GT/s Link Width: x1 / x1

# ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:30:48:96:42:06
        priority: 0
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
        status: active
        inet  netmask 0xffffff00 broadcast 46.23.86.255

# netstat -nr
Internet:
Destination        Gateway            Flags  Refs      Use   Mtu  Prio Iface
default                               UGS       3       37     -     8 em0
/24                46.23.86.132       UC        1        0     -     8 em0
                   02:e0:52:9c:3c:56  UHLc      1        0     -     8 em0
                   00:30:48:96:42:06  HLl       0        0     -     1 lo0
                                      UHb       0        0     -     1 em0
127/8              127.0.0.1          UGRS      0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHl       1        0 32768     1 lo0
224/4              127.0.0.1          URS       0        0 32768     8 lo0

Thanx!

Mischa




Re: Potential MP problem in OpenBSD 5.8

2016-03-07 Thread Mischa

> On 07 Mar 2016, at 09:34, Mike Larkin <mlar...@azathoth.net> wrote:
> 
> On Sun, Mar 06, 2016 at 04:33:59PM +0100, Mischa wrote:
>> Hi,
>> 
>> Reyk asked me to post the following panics on this list.
>> I have seen multiple panics when running the stock relayd / httpd on both 
>> bhyve and bare metal.
>> Here are the 2 I captured.
>> 
>> The first trace is from OpenBSD 5.8 running on bhyve (FreeBSD 10.2).
>> 
>>  https://gist.github.com/mischapeters/11dd221087c2b04b7741
>> panic: mtx_enter: locking against myself
>> Stopped at  0x8133fc09: leave
>> RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
>> DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
>> ddb>
>> ddb> trace
>> (null)() at 0x8133fc09
>> (null)() at 0x811a310e
>> (null)() at 0x8132508f
>> (null)() at 0x811bc830
>> (null)() at 0x8123f060
>> (null)() at 0x81253abd
>> (null)() at 0x81254900
>> (null)() at 0x81193a95
>> (null)() at 0x81324632
>> (null)() at 0x8133b3cf
>> (null)() at 0x8119f8f5
>> (null)() at 0x811bb0f6
>> (null)() at 0x811bb6bc
>> (null)() at 0x811bee6e
>> (null)() at 0x813233ae
> 
> Please tell bhyve to implement proper ELF symbol loading in their bootloader.
> The trace above is pretty useless.

I guess you'll be seeing all of them at AsiaBSDCon or BSDCan, no? ;)

What about the other trace?

Mischa





Potential MP problem in OpenBSD 5.8

2016-03-06 Thread Mischa
                         89  3        0x90  kqread    relayd
  8280  11407  11407     89  3        0x90  kqread    relayd
 11407   9472  11407     89  3        0x90  kqread    relayd
 26863   9472  26863     89  3        0x90  kqread    relayd
 26108   9472  26108     89  3        0x90  kqread    relayd
  8232   9472   8232     89  7        0x10            relayd
  9472      1   9472      0  3        0x80  kqread    relayd
 16247  14642  14642     75  3        0x92  poll      bgpd
 18324  14642  14642     75  3        0x92  poll      bgpd
 14642      1  14642      0  3        0x80  poll      bgpd
 11830   2383  11830     91  3        0x90  kqread    snmpd
 31427   2383  31427     91  3        0x90  kqread    snmpd
  2383      1   2383      0  3        0x80  kqread    snmpd
  7874      1   7874      0  3        0x80  select    sshd
 20764   8955   7081     83  3        0x90  poll      ntpd
  8955   7081   7081     83  3        0x90  poll      ntpd
  7081      1   7081      0  3        0x80  poll      ntpd
  5327  29909  29909     74  3        0x90  bpf       pflogd
 29909      1  29909      0  3        0x80  netio     pflogd
 24314  11314  11314     73  3        0x90  kqread    syslogd
 11314      1  11314      0  3        0x80  netio     syslogd
 10994      0      0      0  3     0x14200  pgzero    zerothread
 23221      0      0      0  3     0x14200  aiodoned  aiodoned
  5596      0      0      0  3     0x14200  syncer    update
 17098      0      0      0  3     0x14200  cleaner   cleaner
 12769      0      0      0  3     0x14200  reaper    reaper
 28148      0      0      0  3     0x14200  pgdaemon  pagedaemon
 14465      0      0      0  3     0x14200  bored     srdis
 18524      0      0      0  3     0x14200  bored     crypto
 30899      0      0      0  3     0x14200  pftm      pfpurge
 18749      0      0      0  3     0x14200  usbtsk    usbtask
 15794      0      0      0  3     0x14200  usbatsk   usbatsk
 13494      0      0      0  3  0x40014200  acpi0     acpi0
 28997      0      0      0  3  0x40014200            idle3
 19921      0      0      0  3  0x40014200            idle2
 28399      0      0      0  3  0x40014200            idle1
 18074      0      0      0  3     0x14200  bored     sensors
  9023      0      0      0  7     0x14210            softnet
 24448      0      0      0  3     0x14200  bored     systqmp
 15997      0      0      0  3     0x14200  bored     systq
 28210      0      0      0  3  0x40014200            idle0
     1      0      1      0  3        0x82  wait      init
     0     -1      0      0  3     0x10200  scheduler swapper
ddb{0}>


Hopefully this will help.

Regards,

Mischa
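When reading a ddb `ps` listing like the one above, the S column is the process state. Assuming the numbering in OpenBSD's sys/proc.h (1 = SIDL through 7 = SONPROC), the rows in state 7 are the processes that were on a CPU when the kernel dropped into ddb; in the listing above those are relayd (PID 8232) and softnet (PID 9023). The Python sketch below pulls those rows out. `parse_ps_line` and `on_cpu` are hypothetical helpers, and the sample lines are rows from the listing with whitespace-separated columns.

```python
from __future__ import annotations

# Process states as numbered in OpenBSD's sys/proc.h (assumed here).
PROC_STATES = {
    1: "SIDL", 2: "SRUN", 3: "SSLEEP", 4: "SSTOP",
    5: "SZOMB", 6: "SDEAD", 7: "SONPROC",
}


def parse_ps_line(line: str) -> tuple[int, str, str]:
    """Split one ddb 'ps' row into (pid, state name, command).

    Columns: PID PPID PGRP UID S FLAGS [WAIT] COMMAND. WAIT may be empty,
    so the command is taken from the last field. A leading '*' (which marks
    the current process) is dropped.
    """
    fields = line.lstrip().lstrip("*").split()
    pid, state, command = int(fields[0]), int(fields[4]), fields[-1]
    return pid, PROC_STATES.get(state, f"state {state}"), command


def on_cpu(lines: list[str]) -> list[tuple[int, str]]:
    """Return (pid, command) for every row in state SONPROC."""
    rows = (parse_ps_line(line) for line in lines)
    return [(pid, cmd) for pid, state, cmd in rows if state == "SONPROC"]


SAMPLE = [
    "  8232   9472   8232     89  7        0x10            relayd",
    "  9472      1   9472      0  3        0x80  kqread    relayd",
    "  7874      1   7874      0  3        0x80  select    sshd",
    "  9023      0      0      0  7     0x14210            softnet",
]

if __name__ == "__main__":
    print(on_cpu(SAMPLE))
```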




Re: OpenBSD 5.8 GENERIC#1170 amd64 panics in bhyve

2015-11-21 Thread Mischa
Hi Steven,

> On 21 Nov 2015, at 18:00, Steven Chamberlain <ste...@pyro.eu.org> wrote:
> 
> Hello,
> 
> Mischa wrote:
>> I am running OpenBSD 5.8 GENERIC#1170 amd64 as a bhyve instance on FreeBSD 
>> 10.2-RELEASE-p7.
> 
> I suspect you should try this again using a snapshot kernel[0] and if
> the problem still happens, the ddb trace will be more detailed.
> 
> [0]: http://ftp.eu.openbsd.org/pub/OpenBSD/snapshots/amd64/

Will give this a go!

Thanx!

Mischa



OpenBSD 5.8 GENERIC#1170 amd64 panics in bhyve

2015-11-20 Thread Mischa
Hi All,

I am running OpenBSD 5.8 GENERIC#1170 amd64 as a bhyve instance on FreeBSD
10.2-RELEASE-p7.
The storage is provided by ZFS, from which the instance runs.

The only services running in this OpenBSD instance are relayd and httpd.
No content is hosted on this instance; relayd and httpd only act as a
reverse proxy.

I managed to catch the panic this time. Hopefully this provides some insight
into what happened.

lb1:~ $  panic: mtx_enter: locking against myself
Stopped at  0x8133fbf9: leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!

ddb> trace
(null)() at 0x8133fbf9
(null)() at 0x811a310e
(null)() at 0x8132507f
(null)() at 0x811bc830
(null)() at 0x8123f050
(null)() at 0x81253aad
(null)() at 0x812548f0
(null)() at 0x81193a95
(null)() at 0x81324622
(null)() at 0x8133b3bf
(null)() at 0x8119f8f5
(null)() at 0x811bb0f6
(null)() at 0x811bb6bc
(null)() at 0x811bee6e
(null)() at 0x8132339e
end of kernel
end trace frame: 0x13b598f9a190, count: -15

ddb> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT      COMMAND
  6780      1   6780      0  3        0x83  ttyin     ksh
 28980      1  28980      0  3        0x80  poll      cron
 20453  30323  30323     95  3        0x90  kqread    smtpd
  4183  30323  30323     95  3        0x90  kqread    smtpd
 14921  30323  30323     95  3        0x90  kqread    smtpd
 21549  30323  30323     95  3        0x90  kqread    smtpd
 21867  30323  30323     95  3        0x90  kqread    smtpd
  5556  30323  30323    103  3        0x90  kqread    smtpd
 30323      1  30323      0  3        0x80  kqread    smtpd
* 8116   8728   8728     89  7        0x10            relayd
 28617   8728   8728     89  3        0x90  kqread    relayd
 28666   8728   8728     89  3        0x90  kqread    relayd
 14234   8728   8728     89  3        0x90  kqread    relayd
  6772  20410  20410     89  3        0x90  kqread    relayd
  6064  20410  20410     89  3        0x90  kqread    relayd
 32481  20410  20410     89  3        0x90  kqread    relayd
 30101  20410  20410     89  3        0x90  kqread    relayd
 20410  22081  20410     89  3        0x90  kqread    relayd
  8728  22081   8728     89  3        0x90  kqread    relayd
  4449  22081   4449     89  3        0x90  kqread    relayd
 28012  22081  28012     89  3        0x90  kqread    relayd
 22081      1  22081      0  3        0x80  kqread    relayd
 23731  21123  23731     91  3        0x90  kqread    snmpd
 23092  21123  23092     91  3        0x90  kqread    snmpd
 21123      1  21123      0  3        0x80  kqread    snmpd
    65      1     65      0  3        0x80  select    sshd
 18184  21160  15260     83  3        0x90  poll      ntpd
 21160  15260  15260     83  3        0x90  poll      ntpd
 15260      1  15260      0  3        0x80  poll      ntpd
 25057  27605  27605     74  3        0x90  bpf       pflogd
 27605      1  27605      0  3        0x80  netio     pflogd
  7709   9861   9861     73  3        0x90  kqread    syslogd
  9861      1   9861      0  3        0x80  netio     syslogd
  2720      0      0      0  3     0x14200  pgzero    zerothread
  3008      0      0      0  3     0x14200  aiodoned  aiodoned
  9376      0      0      0  3     0x14200  syncer    update
 20457      0      0      0  3     0x14200  cleaner   cleaner
   281      0      0      0  3     0x14200  reaper    reaper
 22066      0      0      0  3     0x14200  pgdaemon  pagedaemon
 26663      0      0      0  3     0x14200  bored     crypto
 12531      0      0      0  3     0x14200  pftm      pfpurge
 28680      0      0      0  3  0x40014200  acpi0     acpi0
  1276      0      0      0  3     0x14200  bored     softnet
  6840      0      0      0  3     0x14200  bored     systqmp
 22866      0      0      0  3     0x14200  bored     systq
 29152      0      0      0  3  0x40014200            idle0
     1      0      1      0  3        0x82  wait      init
     0     -1      0      0  3     0x10200  scheduler swapper

Mischa
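The panic string in these reports, `mtx_enter: locking against myself`, is a diagnostic check: a CPU that already holds a spinning mutex tries to acquire it again, which without the check would spin forever. The Python sketch below is only a toy model of that check, not OpenBSD's implementation. `ToyMutex` and `LockingAgainstMyself` are hypothetical names, and a thread ID stands in for the owning CPU. In the kernel, one way to end up here is an interrupt handler on the same CPU taking a mutex that the interrupted code already holds, for example when the mutex was initialized with too low an IPL.

```python
import threading


class LockingAgainstMyself(RuntimeError):
    """Raised instead of spinning forever on a recursive acquire."""


class ToyMutex:
    """Non-recursive mutex that remembers its owner (toy model only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner = None  # ident of the owning thread; stands in for the CPU

    def enter(self) -> None:
        me = threading.get_ident()
        if self._owner == me:
            # Only the owner itself can have stored its own ident, so this
            # test is safe without holding the lock.
            raise LockingAgainstMyself("mtx_enter: locking against myself")
        self._lock.acquire()
        self._owner = me

    def leave(self) -> None:
        if self._owner != threading.get_ident():
            raise RuntimeError("leave() called by a thread that does not own the mutex")
        self._owner = None
        self._lock.release()


if __name__ == "__main__":
    m = ToyMutex()
    m.enter()
    try:
        m.enter()  # same "CPU" again: caught instead of deadlocking
    except LockingAgainstMyself as exc:
        print("panic:", exc)
    m.leave()
```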