VM crash on 7.2#4
Hi,

Just noticed one of the VMs greeted me with a ddb> prompt. The host is running 7.2#4, as is the VM; the dmesg of the host is below.

I managed to get the following data from the VM:

ddb> show panic
*cpu0: kernel diagnostic assertion "m != NULL" failed: file "/usr/src/sys/dev/pv/if_vio.c", line 1006
ddb> trace
db_enter() at db_enter+0x10
panic(81f17485) at panic+0xb8
__assert(81f891d8,81f89d08,3ee,81f90540) at __assert+0x25
vio_rxeof(8003a000) at vio_rxeof+0x23f
vio_rx_intr(8003a050) at vio_rx_intr+0x38
virtio_check_vqs(80039400) at virtio_check_vqs+0xfe
virtio_pci_legacy_intr(80039400) at virtio_pci_legacy_intr+0x61
intr_handler(80002250c100,80049e80) at intr_handler+0x38
Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
cpu_idle_cycle() at cpu_idle_cycle+0x1f
end trace frame: 0x0, count: -10

root@r2:~ # syspatch -l
001_x509
002_asn1
003_ukbd
004_expat
005_pixman
006_vmm
007_unwind
008_pfsync
009_xserver
010_vmd
011_gpuinv
012_acme
root@r2:~ # dmesg
OpenBSD 7.2 (GENERIC.MP) #4: Mon Dec 12 06:06:42 MST 2022
    r...@syspatch-72-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 412202078208 (393106MB)
avail mem = 399692173312 (381176MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7a32f000 (76 entries)
bios0: vendor Dell Inc. version "2.16.0" date 07/20/2022
bios0: Dell Inc. PowerEdge R630
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP MCEJ WD__ SLIC HPET APIC MCFG MSCT SLIT SRAT SSDT SSDT SSDT PRAD DMAR HEST BERT ERST EINJ
acpi0: wakeup devices PCI0(S4) BR1A(S4) BR1B(S4) BR2A(S4) BR2B(S4) BR2C(S4) BR2D(S4) BR3A(S4) BR3B(S4) BR3C(S4) BR3D(S4) XHC_(S0) RP02(S4) RP03(S4) RP05(S4) RP08(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits acpihpet0 at acpi0: 14318179 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 3200.03 MHz, 06-3f-02 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE cpu1 at mainbus0: apid 16 (application processor) cpu1: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 3398.59 MHz, 06-3f-02 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache cpu1: smt 0, core 0, package 1 cpu2 at mainbus0: apid 2 (application processor) cpu2: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 3399.01 MHz, 06-3f-02 cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache cpu2: smt 0, core 1, package 0 cpu3 at mainbus0: apid 18 (application processor) cpu3: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 3400.00 MHz, 06-3f-02 cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu3: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
Re: NSD exit status 11 on 7.0
Hi All,

Just to confirm that the patch below has been working like a charm. Since the patch was applied, and only this patch, NSD has been behaving properly.

Mischa

On 2021-10-20 12:33, Florian Obser wrote:
On 2021-10-20 07:55 +02, Otto Moerbeek wrote:
On Wed, Oct 20, 2021 at 07:47:30AM +0200, Mischa wrote:
Unfortunately our joy was short-lived. This morning I noticed a lot of

Oct 20 07:44:15 name1 nsd[80814]: server 76410 died unexpectedly with status 11, restarting

It looks like there is a potential fix in version 4.3.8.
https://github.com/NLnetLabs/nsd/issues/195
https://github.com/NLnetLabs/nsd/issues/189
https://github.com/NLnetLabs/nsd/blob/NSD_4_3_8_REL/doc/ChangeLog

23 August 2021: Wouter
- Fix #189: nsd 4.3.7 crash answer_delegation: Assertion
  `query->delegation_rrset' failed. (Thanx Roger!)

That is not the correct fix; it only hides the problem and, worse, produces wrong results.
Please try this, which is the fix for https://github.com/NLnetLabs/nsd/issues/194

diff --git namedb.c namedb.c
index 06bef71147c..772e038b16d 100644
--- namedb.c
+++ namedb.c
@@ -583,10 +583,13 @@ domain_find_ns_rrsets(domain_type* domain, zone_type* zone, rrset_type **ns)
 {
 	/* return highest NS RRset in the zone that is a delegation above */
 	domain_type* result = NULL;
+	rrset_type* rrset = NULL;
 	while (domain && domain != zone->apex) {
-		*ns = domain_find_rrset(domain, zone, TYPE_NS);
-		if (*ns)
+		rrset = domain_find_rrset(domain, zone, TYPE_NS);
+		if (rrset) {
+			*ns = rrset;
 			result = domain;
+		}
 		domain = domain->parent;
 	}

As far as I can tell from the things Martijn found, that might be the case. Will give that a try and report back.

Mischa

Are you going to try just the one-line fix or the whole of 4.3.8? I suppose if we want to backport to -stable, the one-line fix is preferred.

Yes, except we should go with the correct fix above ;) Nothing else is interesting to backport in 4.3.8 as far as I can tell.
-Otto

I provided an explanation of what's going on in https://github.com/NLnetLabs/nsd/issues/195#issuecomment-947505367
Reproduced here (slightly edited):

712296f (the one-line fix) only hides the problem; it doesn't fix anything. The real fix is ba0002e (the diff above).

f.9.1.1.0.0.2.ip6.arpa. is an ENT in ip6.arpa., and so is 2.ip6.arpa.

In line 1420 in query.c we have

q->delegation_domain = domain_find_ns_rrsets(

and the unfixed domain_find_ns_rrsets would find the NS RRset for 9.1.1.0.0.2.ip6.arpa. But it would then continue searching upwards, overwriting *ns, which is q->delegation_rrset, until it hits 2.ip6.arpa., which has no NS records. So q->delegation_rrset = NULL, but at the same time result != NULL, because we did find a delegation RRset along the way; we just ignored it (at least for 9.1.1.0.0.2.ip6.arpa.; I didn't check if there was one further up). domain_find_ns_rrsets returns non-NULL, which means we found a delegation, but at the same time it doesn't give us the delegation NS RRset.

It is probably best to revert 712296f, since on its own it produces wrong results, i.e. adding it to 4.3.7 gives this:

$ dig @192.168.178.219 +norec f.9.1.1.0.0.2.ip6.arpa NS

; <<>> dig 9.10.8-P1 <<>> @192.168.178.219 +norec f.9.1.1.0.0.2.ip6.arpa NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10923
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;f.9.1.1.0.0.2.ip6.arpa.		IN	NS

;; AUTHORITY SECTION:
ip6.arpa.		3600	IN	SOA	b.ip6-servers.arpa. nstld.iana.org. 2021100154 1800 900 604800 3600

;; Query time: 0 msec
;; SERVER: 192.168.178.219#53(192.168.178.219)
;; WHEN: Wed Oct 20 10:24:56 CEST 2021
;; MSG SIZE rcvd: 115

But the correct answer is this:

dig @::1 +norec f.9.1.1.0.0.2.ip6.arpa NS

; <<>> dig 9.10.8-P1 <<>> @::1 +norec f.9.1.1.0.0.2.ip6.arpa NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48090
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;f.9.1.1.0.0.2.ip6.arpa.		IN	NS

;; AUTHORITY SECTION:
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	r.arin.net.
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	u.arin.net.
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	x.arin.net.
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	y.arin.net.
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	z.arin.net.
9.1.1.0.0.2.ip6.arpa.	86400	IN	NS	arin.authdns.ripe.net.

;; Query time: 0 msec
;; SERVER: ::1#53(::1)
;; WHEN: Wed Oct 20 10:24:16 CEST 2021
;; MSG SIZE rcvd: 171
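Florian's explanation can be reproduced with a small model of the two loop variants. This is a sketch under simplifying assumptions: `struct domain`, `struct rrset`, `find_ns_buggy()` and `find_ns_fixed()` are hypothetical stand-ins invented for illustration, not NSD's `domain_type`, `rrset_type` or `domain_find_ns_rrsets()`, and each name carries at most one NS RRset. Only the loop bodies follow the namedb.c diff quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for NSD's rrset_type and domain_type. */
struct rrset {
	const char *owner;		/* owner name of the NS RRset */
};

struct domain {
	const char *name;
	struct domain *parent;		/* next name up towards the zone apex */
	struct rrset *ns;		/* NULL if this name has no NS RRset */
};

/*
 * Unfixed loop: *ns is overwritten on every step, so a name without NS
 * records above the delegation resets it to NULL while result still
 * points at the delegation that was found.
 */
struct domain *
find_ns_buggy(struct domain *d, const struct domain *apex, struct rrset **ns)
{
	struct domain *result = NULL;

	assert(ns != NULL);
	while (d != NULL && d != apex) {
		*ns = d->ns;
		if (*ns != NULL)
			result = d;
		d = d->parent;
	}
	return result;
}

/*
 * Fixed loop (same shape as the diff): *ns is only updated together with
 * result, so whenever a domain is returned, *ns holds its NS RRset.
 */
struct domain *
find_ns_fixed(struct domain *d, const struct domain *apex, struct rrset **ns)
{
	struct domain *result = NULL;
	struct rrset *rrset;

	assert(ns != NULL);
	while (d != NULL && d != apex) {
		rrset = d->ns;
		if (rrset != NULL) {
			*ns = rrset;
			result = d;
		}
		d = d->parent;
	}
	return result;
}
```

Modelling the chain from f.9.1.1.0.0.2.ip6.arpa. up to the ip6.arpa. apex, with only 9.1.1.0.0.2.ip6.arpa. holding NS records, both functions return the 9.1.1.0.0.2.ip6.arpa. node, but only `find_ns_fixed()` leaves `*ns` pointing at its NS RRset. `find_ns_buggy()` ends with `*ns == NULL`, because 2.ip6.arpa., the last name visited before the apex, has no NS records: a delegation domain without a delegation RRset, which is the state Florian describes.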
Re: NSD exit status 11 on 7.0
On 2021-10-20 12:33, Florian Obser wrote:
On 2021-10-20 07:55 +02, Otto Moerbeek wrote:
On Wed, Oct 20, 2021 at 07:47:30AM +0200, Mischa wrote:
Unfortunately our joy was short-lived. This morning I noticed a lot of

Oct 20 07:44:15 name1 nsd[80814]: server 76410 died unexpectedly with status 11, restarting

It looks like there is a potential fix in version 4.3.8.
https://github.com/NLnetLabs/nsd/issues/195
https://github.com/NLnetLabs/nsd/issues/189
https://github.com/NLnetLabs/nsd/blob/NSD_4_3_8_REL/doc/ChangeLog

23 August 2021: Wouter
- Fix #189: nsd 4.3.7 crash answer_delegation: Assertion
  `query->delegation_rrset' failed. (Thanx Roger!)

That is not the correct fix; it only hides the problem and, worse, produces wrong results.
Please try this, which is the fix for https://github.com/NLnetLabs/nsd/issues/194

diff --git namedb.c namedb.c
index 06bef71147c..772e038b16d 100644
--- namedb.c
+++ namedb.c
@@ -583,10 +583,13 @@ domain_find_ns_rrsets(domain_type* domain, zone_type* zone, rrset_type **ns)
 {
 	/* return highest NS RRset in the zone that is a delegation above */
 	domain_type* result = NULL;
+	rrset_type* rrset = NULL;
 	while (domain && domain != zone->apex) {
-		*ns = domain_find_rrset(domain, zone, TYPE_NS);
-		if (*ns)
+		rrset = domain_find_rrset(domain, zone, TYPE_NS);
+		if (rrset) {
+			*ns = rrset;
 			result = domain;
+		}
 		domain = domain->parent;
 	}

Thanx Florian! Will give that a go and let you know.

Mischa

As far as I can tell from the things Martijn found, that might be the case. Will give that a try and report back.

Mischa

Are you going to try just the one-line fix or the whole of 4.3.8? I suppose if we want to backport to -stable, the one-line fix is preferred.

Yes, except we should go with the correct fix above ;) Nothing else is interesting to backport in 4.3.8 as far as I can tell.
-Otto

[snip]
Re: sysupgrade after upgrade shuts down VM
> On 24 Sep 2020, at 13:52, Sebastien Marie wrote:
>
> On Thu, Sep 24, 2020 at 12:47:30PM +0200, Mischa wrote:
>>>
>>> One quirk of the archive: It just creates a directory every night, no
>>> matter if a snap was built or not, you can check with what(1) if you
>>> actually have a different kernel to the one you already tested.
>>
>> for i in $(jot -w %02d 15 10); do ftp -o bsd.rd-${i} https://ftp.hostserver.de/archive/2020-09-${i}-0105/snapshots/amd64/bsd.rd; done
>>
>> The build from the 15th is the first showing this issue; the one from the
>> 14th is fine.
>>
>> tx# what /bsd.rd-15
>> /bsd.rd-15:
>>     OpenBSD 6.8-beta (RAMDISK_CD) #65: Sun Sep 13 03:09:57 MDT 2020
>>     PD KSH v5.2.14 99/07/13.2
>>     $OpenBSD: cert.pem,v 1.21 2020/06/01 18:53:53 sthen Exp $
>>
>
> Could you provide the what /bsd.rd-14 too?
>
> Downloading
> https://ftp.hostserver.de/archive/2020-09-14-0105/snapshots/amd64/bsd.rd
> and https://ftp.hostserver.de/archive/2020-09-15-0105/snapshots/amd64/bsd.rd,
> I have the same file.
>
> $ sha256 -b bsd.rd-14 bsd.rd-15
> SHA256 (bsd.rd-14) = wfNVV8gKxUP8gJvozr73T2bbz2uuKCX7p7JeS3kmTj8=
> SHA256 (bsd.rd-15) = wfNVV8gKxUP8gJvozr73T2bbz2uuKCX7p7JeS3kmTj8=
>
> Is it the same on your side too? If yes, it means the hypervisor
> doesn't behave consistently.

I upgraded my -current host to the latest release and created two VMs, one running 6.7-stable and one 6.8-current. I rebooted each VM into bsd.rd around 10 times, and on 6.8 I don't see this happening at all.
On 6.7 I do indeed see inconsistent behaviour, but it never showed itself as obviously as in the last couple of weeks. It happens with both the 6.7 and the 6.8 bsd.rd; I just needed more attempts to see it.
Somewhere between 6.7 and 6.8 this issue seems to have been addressed.

Thank you all for sharing your insights and help.
I will keep this host running -current for the foreseeable future; if anybody needs a VM for testing/breaking, let me know.

Mischa

PS: I love the fact that ctrl-l works in ksh vi mode! Thank you for doing that!
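Sebastien's sha256 comparison is the check that the archive quirk calls for: two nightly directories can hold the same bsd.rd, so they should count as one build when bisecting. A minimal sketch, assuming the digests have already been collected oldest-first (for example from `sha256 -b` output like the above); `distinct_builds()` is a hypothetical helper written for illustration, not part of any OpenBSD tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * digests[0..n-1] holds one bsd.rd digest per nightly archive directory,
 * oldest first. Writes into keep[] the index of the first night of each
 * run of identical digests and returns how many indices were written.
 * keep[] must have room for n entries.
 */
size_t
distinct_builds(const char *const digests[], size_t n, size_t keep[])
{
	size_t k = 0;

	assert(n == 0 || (digests != NULL && keep != NULL));
	for (size_t i = 0; i < n; i++) {
		/* A new build starts whenever the digest differs from the previous night. */
		if (i == 0 || strcmp(digests[i], digests[i - 1]) != 0)
			keep[k++] = i;
	}
	return k;
}
```

Only consecutive duplicates are collapsed, which fits the description of a directory being created every night whether or not a new snapshot was built. With the two identical digests above, bsd.rd-14 and bsd.rd-15 collapse into a single entry, so "the 14th is fine, the 15th is bad" cannot both be properties of the file itself.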
Re: sysupgrade after upgrade shuts down VM
> On 24 Sep 2020, at 12:23, Florian Obser wrote:
>
> On Thu, Sep 24, 2020 at 12:13:31PM +0200, Mischa wrote:
>>
>>
>>> On 24 Sep 2020, at 09:15, Florian Obser wrote:
>>> 3a) does it also shut down: bisect the hypervisor (tbh I expect the problem
>>> here).
>>
>> 6.8 bsd.rd shuts down
>> 6.7 bsd.rd reboots
>>
>> Both VMs are running on the same host, which is on 6.7.
>> # sysctl kern.version
>> kern.version=OpenBSD 6.7 (GENERIC.MP) #1: Sat May 16 16:33:02 MDT 2020
>>     r...@syspatch-67-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>
>>> 3b) does it reboot: bisect bsd.rd
>>
>> What do you mean?
>
> Go to https://ftp.hostserver.de/archive/
>
> There you will find an archive of old snapshots, about 100 days' worth,
> i.e., the oldest:
> https://ftp.hostserver.de/archive/2020-06-20-0105/snapshots/amd64/
> and the newest:
> https://ftp.hostserver.de/archive/2020-09-24-0105/snapshots/amd64/
>
> We already know that the newest is bad.
>
> Pick a bsd.rd from the middle (I'm just eyeballing this):
>
> https://ftp.hostserver.de/archive/2020-08-02-0105/snapshots/amd64/bsd.rd
>
> Does that one work?
> Yes: Pick one in the middle between 2020-08-02 and 2020-09-24.
> No: Pick one in the middle between 2020-08-02 and 2020-06-20.
>
> binary search...
>
> One quirk of the archive: It just creates a directory every night, no
> matter if a snap was built or not, you can check with what(1) if you
> actually have a different kernel to the one you already tested.

for i in $(jot -w %02d 15 10); do ftp -o bsd.rd-${i} https://ftp.hostserver.de/archive/2020-09-${i}-0105/snapshots/amd64/bsd.rd; done

The build from the 15th is the first showing this issue; the one from the 14th is fine.

tx# what /bsd.rd-15
/bsd.rd-15:
    OpenBSD 6.8-beta (RAMDISK_CD) #65: Sun Sep 13 03:09:57 MDT 2020
    PD KSH v5.2.14 99/07/13.2
    $OpenBSD: cert.pem,v 1.21 2020/06/01 18:53:53 sthen Exp $

Mischa
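The procedure Florian describes is a first-bad binary search over the nightly builds. A minimal sketch, assuming the builds are numbered oldest-first, that the oldest is good and the newest bad, and that every build after the first bad one is also bad; `first_bad()`, the `is_bad_fn` callback type and `bad_from()` are hypothetical names, and in practice the callback is a manual boot test of that night's bsd.rd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if build number `build` shows the problem. */
typedef bool (*is_bad_fn)(size_t build, void *ctx);

/*
 * Builds are numbered 0..n-1, oldest first. Requires build 0 to be good,
 * build n-1 to be bad, and the problem to persist once it appears.
 * Returns the index of the first bad build.
 */
size_t
first_bad(size_t n, is_bad_fn is_bad, void *ctx)
{
	size_t good = 0, bad;

	assert(n >= 2);
	bad = n - 1;
	while (bad - good > 1) {
		/* Test the build halfway between the known-good and known-bad ones. */
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Stand-in for a manual test: every build from *(size_t *)ctx on is bad. */
bool
bad_from(size_t build, void *ctx)
{
	return build >= *(const size_t *)ctx;
}
```

Numbering the directories from 2020-06-20 (index 0) to 2020-09-24 (index 96), the 2020-09-15 build is index 87, and the search reaches it after 7 tests. Sebastien's comparison later showed that the 14th and 15th were the same file, which breaks the assumption that each build reliably tests good or bad; with an intermittent problem like this one, a single run per build is not enough.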
Re: sysupgrade after upgrade shuts down VM
> On 24 Sep 2020, at 09:15, Florian Obser wrote:
>
> Hi Mischa,
>
> On Thu, Sep 24, 2020 at 08:52:55AM +0200, Mischa wrote:
>> Hi All,
>>
>> With the last couple of -current updates I noticed a VM doesn't come back
>> after running sysupgrade, which it used to do.
>
> It's very unlikely that this is a sysupgrade problem.
> More likely something in the kernel changed (vm or hypervisor).
>
>> I don't know exactly when it started but something in the late #60s.
>
> This number doesn't mean anything. Please provide build dates.
>
>>
>> Running sysupgrade from within the VM, it reboots and goes through the
>> upgrade as normal. Once it's done with the upgrade it shuts down.
>> Tail end of the process from the latest sysupgrade:
>>
>> [snip]
>> CONGRATULATIONS! Your OpenBSD upgrade has been successfully completed!
>>
>> syncing disks... done
>> vmmci0: powerdown
>
> ^
> Why does it think it should power down?
>
> I never ran vmm, so this is all a wild guess, but here is how I would
> approach this:
>
> 1) manually boot into bsd.rd, hit 's' to get to a shell prompt and
> type reboot. Does it shut down?
>> OpenBSD/amd64 BOOT 3.52
boot> bsd.rd
booting hd0a:bsd.rd: 3818189+1573888+3878136+0+757760 [324353+128+468792+313530]=0xaa0780
entry point at 0x81001000
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2020 OpenBSD. All rights reserved. https://www.OpenBSD.org

OpenBSD 6.8-beta (RAMDISK_CD) #75: Wed Sep 23 15:43:49 MDT 2020
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
[snip]
Welcome to the OpenBSD/amd64 6.8 installation program.
(I)nstall, (U)pgrade, (A)utoinstall or (S)hell? s
# reboot
syncing disks... done
vmmci0: powerdown
rebooting…
[EOT]

> 2) if yes, get a 6.7 bsd.rd (which I presume is known good) and retry

>> OpenBSD/amd64 BOOT 3.47
boot> bsd.rd
booting hd0a:bsd.rd: 3826379+1557504+3881976+0+598016 [301104+128+465696+311208]=0xa71778
entry point at 0x81001000
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2020 OpenBSD. All rights reserved. https://www.OpenBSD.org

OpenBSD 6.7 (RAMDISK_CD) #177: Thu May 7 11:19:02 MDT 2020
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
[snip]
Welcome to the OpenBSD/amd64 6.7 installation program.
(I)nstall, (U)pgrade, (A)utoinstall or (S)hell? s
# reboot
syncing disks... done
vmmci0: powerdown
rebooting...
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 1022M a20=on]
disk: hd0+
>> OpenBSD/amd64 BOOT 3.47
boot>

> 3a) does it also shut down: bisect the hypervisor (tbh I expect the problem
> here).

6.8 bsd.rd shuts down
6.7 bsd.rd reboots

Both VMs are running on the same host, which is on 6.7.
# sysctl kern.version
kern.version=OpenBSD 6.7 (GENERIC.MP) #1: Sat May 16 16:33:02 MDT 2020
    r...@syspatch-67-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

> 3b) does it reboot: bisect bsd.rd

What do you mean?

Mischa
Re: sysupgrade after upgrade shuts down VM
> On 24 Sep 2020, at 09:15, Florian Obser wrote:
>
> Hi Mischa,
>
> On Thu, Sep 24, 2020 at 08:52:55AM +0200, Mischa wrote:
>> Hi All,
>>
>> With the last couple of -current updates I noticed a VM doesn't come back
>> after running sysupgrade, which it used to do.
>
> It's very unlikely that this is a sysupgrade problem.
> More likely something in the kernel changed (vm or hypervisor).
>
>> I don't know exactly when it started but something in the late #60s.
>
> This number doesn't mean anything. Please provide build dates.

Will try to find the one where it started.

>> Running sysupgrade from within the VM, it reboots and goes through the
>> upgrade as normal. Once it's done with the upgrade it shuts down.
>> Tail end of the process from the latest sysupgrade:
>>
>> [snip]
>> CONGRATULATIONS! Your OpenBSD upgrade has been successfully completed!
>>
>> syncing disks... done
>> vmmci0: powerdown
>
> ^
> Why does it think it should power down?

Indeed.

> I never ran vmm, so this is all a wild guess, but here is how I would
> approach this:
>
> 1) manually boot into bsd.rd, hit 's' to get to a shell prompt and
> type reboot. Does it shut down?
> 2) if yes, get a 6.7 bsd.rd (which I presume is known good) and retry
> 3a) does it also shut down: bisect the hypervisor (tbh I expect the problem
> here).
> 3b) does it reboot: bisect bsd.rd
>
> You can use https://ftp.hostserver.de/archive/ for bisecting.

6.7 is fine: the majority of the VMs are on 6.7-stable, and issuing a reboot does an actual reboot.
Will try with the -current bsd.rd and see what happens.

Mischa
sysupgrade after upgrade shuts down VM
Hi All,

With the last couple of -current updates I noticed a VM doesn't come back after running sysupgrade, which it used to do. I don't know exactly when it started, but it was somewhere in the late #60s.

Running sysupgrade from within the VM, it reboots and goes through the upgrade as normal. Once it's done with the upgrade, it shuts down.
Here is the tail end of the process from the latest sysupgrade:

Set name(s)? (or 'abort' or 'done') [done] done
Directory does not contain SHA256.sig. Continue without verification? [no] yes
Installing bsd          100% |**| 20383 KB    00:01
Installing bsd.rd       100% |**| 10141 KB    00:00
Installing base68.tgz   100% |**|   289 MB    01:42
Installing comp68.tgz   100% |**| 74305 KB    00:52
Installing man68.tgz    100% |**|  7484 KB    00:10
Installing game68.tgz   100% |**|  2739 KB    00:01
Installing xbase68.tgz  100% |**| 28866 KB    00:17
Installing xshare68.tgz 100% |**|  4499 KB    00:15
Installing xfont68.tgz  100% |**| 39342 KB    00:23
Installing xserv68.tgz  100% |**| 18333 KB    00:07
Location of sets? (disk http nfs or 'done') [done] done
Making all device nodes... done.
Relinking to create unique kernel... done.

CONGRATULATIONS! Your OpenBSD upgrade has been successfully completed!

syncing disks... done
vmmci0: powerdown
rebooting...
[EOT]

# vmctl show tx
   ID   PID VCPUS  MAXMEM  CURMEM     TTY        OWNER    STATE NAME
    3     -     1    4.0G       -       -         root  stopped tx

Is there anything I can change to have the VM reboot and not shut down?

Mischa
Re: Panic captures of VM
> On 16 Oct 2019, at 21:35, Mike Larkin wrote:
>
> On Wed, Oct 16, 2019 at 06:14:55PM +0200, Mischa wrote:
>> Hi Stuart,
>>
>>
>>>> On 16 Oct 2019, at 18:07, Stuart Henderson wrote:
>>>
>>> On 2019/10/16 18:00, Mischa wrote:
>>>> Hi All,
>>>>
>>>> One of the OpenBSD VMs running on 6.6-beta #313 is rebooting or panicking
>>>> in different ways.
>>>> Not sure if they are all relevant or useful, but here are the ones we
>>>> managed to capture.
>>>
>>> There's not a lot of information in your mail... for starters, what are
>>> you running the VM in, and is there any difference in the config for that
>>> VM compared to other working ones?
>>
>> Fair point.
>>
>> There are 10 VMs running on this host; the host is running:
>> $ sysctl kern.version
>> kern.version=OpenBSD 6.6-beta (GENERIC.MP) #313: Tue Sep 10 23:30:52 MDT 2019
>>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>
>> I know of one other VM which is rebooting every once in a while, but I
>> haven't seen any panics.
>> As for the other 8, I can see a VM shut down every once in a while, but
>> no capture of the console.
>>
>>> Do you have any other VMs running the same OpenBSD snapshot successfully?
>>
>> The rest of the VMs are on -stable as far as I am aware. Other people are
>> operating these VMs.
>>
>>> Can you boot an old kernel and get a dmesg?
>>
>> Here is a dmesg which we managed to capture after one of the panics:
>>
>
> Are you in swap at all on that host?

Yes.
:/

load averages:  0.02,  0.06,  0.13    server1.openbsd.amsterdam 21:41:11
71 processes: 70 idle, 1 on processor                       up 34 days,  2:15
CPU0:  0.7% user,  0.0% nice,  2.6% sys,  0.5% spin,  0.1% intr, 96.2% idle
CPU1:  0.7% user,  0.0% nice,  3.4% sys,  0.4% spin,  0.0% intr, 95.6% idle
CPU2:  6.2% user,  0.0% nice, 31.0% sys, 11.0% spin,  0.0% intr, 51.8% idle
CPU3:  0.7% user,  0.0% nice,  2.8% sys,  0.3% spin,  0.0% intr, 96.1% idle
Memory: Real: 5427M/7623M act/tot Free: 275M Cache: 2001M Swap: 41M/8405M

Mischa

>
> -ml
>
>> fd0# panic: mtx 0x81f353f0: locking against myself
>> Using drive 0, partition 3.
>> Loading..
>> probing: pc0 com0 mem[638K 510M a20=on]
>> disk: hd0+ hd1+
>>>> OpenBSD/amd64 BOOT 3.45
>> /
>> com0: 115200 baud
>> switching console to com0
>>>> OpenBSD/amd64 BOOT 3.45
>> boot>
>> booting hd0a:/bsd: 12666184+2937864+332896+0+704512
>> [987630+128+1010256+738953]=0x127d750
>> entry point at 0x81001000
>> [ using 2738000 bytes of bsd ELF symbol table ]
>> Copyright (c) 1982, 1986, 1989, 1991, 1993
>>     The Regents of the University of California. All rights reserved.
>> Copyright (c) 1995-2019 OpenBSD. All rights reserved.
>> https://www.OpenBSD.org
>>
>> OpenBSD 6.6 (GENERIC) #325: Wed Oct 2 11:38:13 MDT 2019
>>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
>> real mem = 520077312 (495MB)
>> avail mem = 491753472 (468MB)
>> mpath0 at root
>> scsibus0 at mpath0: 256 targets
>> mainbus0 at root
>> bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf3f40 (10 entries)
>> bios0: vendor SeaBIOS version "1.11.0p2-OpenBSD-vmm" date 01/01/2011
>> bios0: OpenBSD VMM
>> acpi at bios0 not configured
>> cpu0 at mainbus0: (uniprocessor)
>> cpu0: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3101.63 MHz, 06-3a-09
>> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,MELTDOWN
>> cpu0: 256KB 64b/line 8-way L2 cache
>> tsc_timecounter_init: TSC skew=0 observed drift=0
>> cpu0: smt 0, core 0, package 0
>> cpu0: using VERW MDS workaround
>> pvbus0 at mainbus0: OpenBSD
>> pvclock0 at pvbus0
>> pci0 at mainbus0 bus 0
>> pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
>> virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
>> viornd0 at virtio0
>> virtio0: irq 3
>> virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
>> vio0 at virtio1: address fe:e1:bb:d1:24:36
>> virtio1: irq 5
>> virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00
>> vioblk0 at virtio2
>> scsibus1 at vioblk0: 2 targets
>> sd0 at scsibus1 targ 0 lun 0:
>> sd0: 51200MB, 512 bytes/sector, 104
Re: Panic captures of VM
Hi Stuart,

> On 16 Oct 2019, at 18:07, Stuart Henderson wrote:
>
> On 2019/10/16 18:00, Mischa wrote:
>> Hi All,
>>
>> One of the OpenBSD VMs running on 6.6-beta #313 is rebooting or panicking in
>> different ways.
>> Not sure if they are all relevant or useful, but here are the ones we managed
>> to capture.
>
> There's not a lot of information in your mail... for starters, what are
> you running the VM in, and is there any difference in the config for that
> VM compared to other working ones?

Fair point.

There are 10 VMs running on this host; the host is running:
$ sysctl kern.version
kern.version=OpenBSD 6.6-beta (GENERIC.MP) #313: Tue Sep 10 23:30:52 MDT 2019
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

I know of one other VM which is rebooting every once in a while, but I haven't seen any panics.
As for the other 8, I can see a VM shut down every once in a while, but no capture of the console.

> Do you have any other VMs running the same OpenBSD snapshot successfully?

The rest of the VMs are on -stable as far as I am aware. Other people are operating these VMs.

> Can you boot an old kernel and get a dmesg?

Here is a dmesg which we managed to capture after one of the panics:

fd0# panic: mtx 0x81f353f0: locking against myself
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 510M a20=on]
disk: hd0+ hd1+
>> OpenBSD/amd64 BOOT 3.45
/
com0: 115200 baud
switching console to com0
>> OpenBSD/amd64 BOOT 3.45
boot>
booting hd0a:/bsd: 12666184+2937864+332896+0+704512 [987630+128+1010256+738953]=0x127d750
entry point at 0x81001000
[ using 2738000 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2019 OpenBSD. All rights reserved.
https://www.OpenBSD.org

OpenBSD 6.6 (GENERIC) #325: Wed Oct 2 11:38:13 MDT 2019
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 520077312 (495MB)
avail mem = 491753472 (468MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf3f40 (10 entries)
bios0: vendor SeaBIOS version "1.11.0p2-OpenBSD-vmm" date 01/01/2011
bios0: OpenBSD VMM
acpi at bios0 not configured
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3101.63 MHz, 06-3a-09
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
tsc_timecounter_init: TSC skew=0 observed drift=0
cpu0: smt 0, core 0, package 0
cpu0: using VERW MDS workaround
pvbus0 at mainbus0: OpenBSD
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
viornd0 at virtio0
virtio0: irq 3
virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio1: address fe:e1:bb:d1:24:36
virtio1: irq 5
virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio2
scsibus1 at vioblk0: 2 targets
sd0 at scsibus1 targ 0 lun 0:
sd0: 51200MB, 512 bytes/sector, 104857600 sectors
virtio2: irq 6
virtio3 at pci0 dev 4 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk1 at virtio3
scsibus2 at vioblk1: 2 targets
sd1 at scsibus2 targ 0 lun 0:
sd1: 51200MB, 512 bytes/sector, 104857600 sectors
virtio3: irq 7
virtio4 at pci0 dev 5 function 0 "OpenBSD VMM Control" rev 0x00
vmmci0 at virtio4
virtio4: irq 9
isa0 at mainbus0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
com0: console
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd0a (d4c2875ac610c324.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted
Automatic boot in progress: starting file system checks.
/dev/sd0a (d4c2875ac610c324.a): 2323 files, 47973 used, 466466 free (178 frags, 58286 blocks, 0.0% fragmentation)
/dev/sd0a (d4c2875ac610c324.a): MARKING FILE SYSTEM CLEAN
/dev/rsd1a: file system is clean; not checking
/dev/rsd1k: file system is clean; not checking
/dev/rsd1d: file system is clean; not checking
/dev/rsd1f: file system is clean; not checking
/dev/rsd1g: file system is clean; not checking
/dev/rsd1h: file system is clean; not checking
/dev/rsd1j: file system is clean; not checking
/dev/rsd1i: file system is clean; not checking
/dev/rsd1e: file system is clean; not checking
/dev/sd0k (d4c2875ac610c324.k): 3992 files, 548865 used, 10831142 free (78 frags, 1353883 blocks, 0.0% fragmentation)
/dev/sd0k (d4c2875ac610c324.k): MARKING FILE SYSTEM CLEAN
/dev/sd0d (d4c2875ac610c324.d): 10 files, 7 used, 1697224 free (56 frags, 212146 blo
Panic captures of VM
Hi All,

One of the OpenBSD VMs running on 6.6-beta #313 is rebooting or panicking in different ways.
Not sure if they are all relevant or useful, but here are the ones we managed to capture.

Mischa

###
fd0$ uvm_fault(0x81f9def0, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      splassert_check+0x1:    cmpb    %cl,0(%rax)
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 510M a20=on]
disk: hd0+ hd1+
>> OpenBSD/amd64 BOOT 3.45
/
com0: 115200 baud
switching console to com0
>> OpenBSD/amd64 BOOT 3.45
boot>
booting hd0a:/bsd: 12641608+2937864+331376+0+704512 [980486+128+1009392+738445]=0x1274608
entry point at 0x81001000
[ using 2729480 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2019 OpenBSD. All rights reserved. https://www.OpenBSD.org
OpenBSD 6.6-beta (GENERIC) #301: Tue Sep 24 15:28:24 MDT 2019
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
###
fd0# panic: mtx 0x81f353f0: locking against myself
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 510M a20=on]
disk: hd0+ hd1+
>> OpenBSD/amd64 BOOT 3.45
/
com0: 115200 baud
switching console to com0
>> OpenBSD/amd64 BOOT 3.45
boot>
booting hd0a:/bsd: 12666184+2937864+332896+0+704512 [987630+128+1010256+738953]=0x127d750
entry point at 0x81001000
[ using 2738000 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2019 OpenBSD. All rights reserved. https://www.OpenBSD.org
OpenBSD 6.6 (GENERIC) #325: Wed Oct 2 11:38:13 MDT 2019
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
###
fd0# sysupgrade -s
SHA256.sig 100% |*| 2141 00:00
Signature Verified
Verifying old sets.
base66.tgz 2% | | 5109 KB00:01 fd0# uvm_fault(0x81fb5808, 0x3ff, 0, 1) -> e kernel: page fault trap, code=0 Stopped at amdgpu_atombios_encoder_setup_dig_transmitter+0x63: testb % r9b,0x(%r9,%rcx,4) uvm_fault(0x81fb5808, 0x3fd, 0, 1) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... uvm_fault(0x81f88c90, 0x894c1488, 0, 2) -> e Using drive 0, partition 3. Loading.. probing: pc0 com0 mem[638K 510M a20=on] disk: hd0+ hd1+ >> OpenBSD/amd64 BOOT 3.45 / com0: 115200 baud switching console to com0 >> OpenBSD/amd64 BOOT 3.45 boot> booting hd0a:/bsd: 12666184+2937872+333664+0+704512 [983130+128+1010256+738953]=0x127c5c0 entry point at 0x81001000 [ using 2733504 bytes of bsd ELF symbol table ] Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. Copyright (c) 1995-2019 OpenBSD. All rights reserved. https://www.OpenBSD.org OpenBSD 6.6 (GENERIC) #334: Sat Oct 5 12:16:54 MDT 2019 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC ### OpenBSD/amd64 (fd0.openbsd.amsterdam) (tty00) login: uvm_fault(0x81fdf560, 0x0, 0, 1) -> e kernel: page fault trap, code=0 Stopped at Xintr_uvuvm_fault(0x81fdf560, 0x43a, 0, 1) -> e kernel: page fault trap, code=0 Stopped at Xintr_legacy5_untramp: fcompl 0x42(%rdx) ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... 
ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... ddb> uvm_fault(0x81fdf560, 0x43f, 0, 1) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... ddb> uvm_fault(0x81fdf560, 0x43f, 0, 1) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -> e kernel: page fault trap, code=0 Faulted in DDB; continuing... ddb> uvm_fault(0x81fdf560, 0x0, 0, 1) -
Re: ifconfig bridge crashes host
> On 23 Jul 2019, at 23:40, Hrvoje Popovski wrote:
>
> On 23.7.2019. 17:03, obs...@high5.nl wrote:
>>> Synopsis: ifconfig bridge crashes host
>>> Category:
>>> Environment:
>> System : OpenBSD 6.5
>> Details : OpenBSD 6.5 (GENERIC.MP) #1: Mon May 27 18:27:59 CEST 2019
>>   r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>
>> Architecture: OpenBSD.amd64
>> Machine : amd64
>>> Description:
>> After running the command "ifconfig bridge" twice on the host, the host
>> became unresponsive. I was able to capture the trace from the console.
>>> How-To-Repeat:
>> The host was running for some time so I am uncertain if it's related to
>> time, but I have seen this happening a couple of times now, and it seems
>> running the "ifconfig bridge" command multiple times triggers this.
>
> Hi,
>
> can you update your box with latest snapshot ?
> There were some problems with "ifconfig bridge" command few months ago..

Will give that a go. Thanx!

Mischa
[no subject]
>Synopsis: ifconfig bridge crashed host
>Category:
>Environment:
System : OpenBSD 6.5
Details : OpenBSD 6.5 (GENERIC.MP) #0: Wed Apr 24 23:38:54 CEST 2019
    r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Issued the command "ifconfig bridge" on a server running vmm/vmd exclusively; after part of the output was shown, the machine froze. Connecting to the console, it showed the debug prompt. There is only limited information I was able to capture at the debug prompt, not sure if this is helpful:

ddb{3}> p 818993d1
ddb{3}> trace
savectx(6,0,2000,36,b75479de000,2000) at savectx+0xb1
end of kernel
end trace frame: 0x7f7c69d0, count: -1
ddb{3}> next
After 23 instructions
Stopped at      x86_ipi_db+0x2f:        ret
ddb{3}> next
After 4 instructions
Stopped at      x86_ipi_db+0x2f:        ret
>How-To-Repeat:
Have not tried to reproduce. :/
>Fix:

dmesg:
OpenBSD 6.5 (GENERIC.MP) #0: Wed Apr 24 23:38:54 CEST 2019
    r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34329825280 (32739MB)
avail mem = 33279778816 (31738MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xcf49c000 (84 entries)
bios0: vendor Dell Inc. version "6.4.0" date 07/23/2013
bios0: Dell Inc.
PowerEdge R610 acpi0 at bios0: rev 2 acpi0: sleep states S0 S4 S5 acpi0: tables DSDT FACP APIC SPCR HPET DMAR MCFG WD__ SLIC ERST HEST BERT EINJ SRAT TCPA SSDT acpi0: wakeup devices PCI0(S5) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 32 (boot processor) cpu0: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 3325.40 MHz, 06-2c-02 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 1 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 133MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE cpu1 at mainbus0: apid 0 (application processor) cpu1: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 1596.01 MHz, 06-2c-02 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 0, package 0 cpu2 at mainbus0: apid 34 (application processor) cpu2: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 3325.01 MHz, 06-2c-02 cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN cpu2: 256KB 64b/line 8-way L2 cache cpu2: smt 0, core 1, package 1 cpu3 at mainbus0: apid 2 (application processor) cpu3: Intel(R) Xeon(R) CPU X5675 @ 
3.07GHz, 1596.01 MHz, 06-2c-02 cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN cpu3: 256KB 64b/line 8-way L2 cache cpu3: smt 0, core 1, package 0 cpu4 at mainbus0: apid 36 (application processor) cpu4: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 3325.01 MHz, 06-2c-02 cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN cpu4: 256KB 64b/line 8-way L2 cache cpu4: smt 0, core 2, package 1 cpu5 at mainbus0: apid 4 (application processor) cpu5: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 1596.01 MHz, 06-2c-02 cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN cpu5: 256KB 64b/line 8-way L2
Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument
> On 19 Jun 2018, at 20:28, Mischa wrote:
>
>> On 19 Jun 2018, at 17:51, Mike Larkin wrote:
>> On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
>>>> Synopsis: VMs stop intermittently after vcpu_run_loop error
>>>> Category: system
>>>> Environment:
>>> System : OpenBSD 6.3
>>> Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>>   r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>>
>>> Architecture: OpenBSD.amd64
>>> Machine : amd64
>>>> Description:
>>> Currently running 12 VMs on a single machine. After some random time,
>>> VMs randomly shut down. Sometimes a single VM and sometimes more. It's
>>> always after an error message like the following a VM stops.
>>> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly
>>>
>>
>> This is almost surely the following bug, fixed in April (log from pmap.c):
>>
>> revision 1.113
>> date: 2018/04/17 06:31:55; author: mlarkin; state: Exp; lines: +275 -64; commitid: BaLjO2NVfYaZP00l;
>> Better way of allocating EPT entries.
>>
>> Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
>> occasionally caused VMs to fail after random amounts of time due to
>> loading the pmap on the CPU and the processor updating A/D bits (which
>> are reserved bits in EPT). This ultimately manifested itself as errors
>> from vmd ("vcpu X run ioctl failed".)
>>
>> tested by many, on different types of HW, no regressions noted
>>
>> ---
>>
>> Can you try -current and see if you can still reproduce this problem?

Tried -current today but got a kernel panic. It seems to be unrelated to vmd, but I wasn't able to collect all the information needed to file a bug; I only got the trace. Will try -current again in a couple of days.

Below is what I was able to collect; will do better next time.
panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file "/usr/src/sys/net/if.c", line 1382
Stopped at      db_enter+0x12:  popq    %r11
TID PID UID PRFLAGS PFLAGS CPU COMMAND
374176 65811070x100010 0x4003K vmd
94393 65811070x100010 0x4002 vmd
311214 91351070x100010 0x4000 vmd
*299692 82346 910x12 01 snmpd
db_enter() at db_enter+0x12
panic() at panic+0x138
__assert(818eebd4,8000222de2e0,0,8000222de3d8) at __assert+0x24
ifa_ifwithaddr(1962634d45f6caa5,8000222de3d8) at ifa_ifwithaddr+0xed
in_pcbaddrisavail(c69add51b77cda1a,ff01f1cb6118,18,16) at in_pcbaddrisavail+0xd0
udp_output(8b483746c94285a0,237d,0,0) at udp_output+0x168
sosend(c197a5735fd92204,ff01edcbc798,80009bd8,6b,5,8000ffff9bf8) at sosend+0x351
sendit(cb9e953daf7fb585,80009bd8,8000222de6c0,8000222de5c0,8000222de6d0) at sendit+0x3fb
sys_sendmsg(e1aea756e75a2f45,1c0,80009bd8) at sys_sendmsg+0x15a
syscall(eee152956d09f98c) at syscall+0x32a
Xsyscall_untramp(6,0,0,0,0,1c) at Xsyscall_untramp+0xe4
end of kernel
end trace frame: 0x7f7c0410, count: 4
https://www.openbsd.org/ddb.html describes t
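The pmap.c commit message quoted above explains the vmd failures in one sentence: when the CPU walks an EPT table as if it were an ordinary page table, it can set the accessed (A) and dirty (D) bits, and those positions are reserved in EPT. A minimal C sketch of that bit arithmetic, assuming the standard x86-64 positions (A = bit 5, D = bit 6) and that bits 7:3 of a non-leaf EPT entry must be zero as described in the Intel SDM; the macro and function names are hypothetical and are not taken from OpenBSD's pmap.c or vmm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ordinary x86-64 page-table flags (assumed positions, per the SDM). */
#define X86_PG_A		(1ULL << 5)	/* accessed */
#define X86_PG_M		(1ULL << 6)	/* dirty */

/* Bits 7:3 of a non-leaf EPT entry are reserved and must be zero. */
#define EPT_NONLEAF_RSVD	(0x1FULL << 3)

/* Compile-time check: both A and D lie entirely inside bits 7:3. */
static_assert(((X86_PG_A | X86_PG_M) & ~EPT_NONLEAF_RSVD) == 0,
    "A/D bits lie inside the reserved EPT range");

/* True if a non-leaf EPT entry has any reserved bit set. */
static bool
ept_nonleaf_misconfigured(uint64_t e)
{
	return ((e & EPT_NONLEAF_RSVD) != 0);
}

/* What an ordinary page-table walk may do to an entry it has used. */
static uint64_t
x86_mark_accessed_dirty(uint64_t e)
{
	return (e | X86_PG_A | X86_PG_M);
}
```

A non-leaf entry holding only read/write/execute bits (2:0) and a frame address passes the check, but the same entry after an ordinary A/D update does not, which is the kind of corruption the commit message describes.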
Re: VMM owner needs to be part of wheel
Hi Reyk,

Thank you for your help on the kernel includes. I have applied the patch to vmd -current and it works well on my setup. A non-wheel user can indeed start/stop a VM and connect to the console.

Thanx!!

Mischa

On 19 Jun at 20:38, Reyk Floeter wrote:
> Hi,
>
> On Sun, Jun 17, 2018 at 10:35:27PM +0200, obs...@high5.nl wrote:
> > >Synopsis: VMM owner needs to be part of group wheel in order to run vmctl
> > >console|start|stop
>
> the solution is not that easy as it seemed.
>
> 1. Change the umask and let everyone access vmd, restrict the commands
> internally.
>
> While this is a possible solution, I also agree that this allows any
> user (including system/privsep users) to trigger actions and imsgs in
> vmd; even if the result is permission denied as this is checked fairly
> late.
>
> 2. Change the default owner group to root:_vmd.
>
> It would be possible to define a hardcoded group, or use group _vmd,
> but this doesn't feel right.
>
> 3. Let the user configure the owner of the control socket.
>
> This allows you to configure your own group, like "devops" (hrhr), or
> fetch the group from YP/LDA, and let them mess with your VMs. I think
> this is much more viable in a multi-user environment. Add the
> following to the top/global section of /etc/vm.conf: "socket owner :devops"
>
> privsep/pledge also makes it a bit more complicated for us because I
> don't want to allow the control process to chown the unix socket; so
> it is done by the parent and some messaging.
>
> The attached diff implements 3. ... comments? OK? Should we use 2. instead?
>
> Reyk
>
> Index: usr.sbin/vmd/control.c
> ===
> RCS file: /cvs/src/usr.sbin/vmd/control.c,v
> retrieving revision 1.23
> diff -u -p -u -p -r1.23 control.c
> --- usr.sbin/vmd/control.c	13 May 2018 22:48:11 -	1.23
> +++ usr.sbin/vmd/control.c	19 Jun 2018 18:27:55 -
> @@ -103,6 +103,7 @@ control_dispatch_vmd(int fd, struct priv
>  		break;
>  	case IMSG_VMDOP_CONFIG:
>  		config_getconfig(ps->ps_env, imsg);
> +		proc_compose(ps, PROC_PARENT, IMSG_VMDOP_DONE, NULL, 0);
>  		break;
>  	case IMSG_CTL_RESET:
>  		config_getreset(ps->ps_env, imsg);
> @@ -169,6 +170,18 @@ control_init(struct privsep *ps, struct
>  
>  	cs->cs_fd = fd;
>  	cs->cs_env = ps;
> +
> +	proc_compose(ps, PROC_PARENT, IMSG_VMDOP_DONE, NULL, 0);
> +
> +	return (0);
> +}
> +
> +int
> +control_reset(struct control_sock *cs)
> +{
> +	/* Updating owner of the control socket */
> +	if (chown(cs->cs_name, cs->cs_uid, cs->cs_gid) == -1)
> +		return (-1);
>  
>  	return (0);
>  }
> Index: usr.sbin/vmd/parse.y
> ===
> RCS file: /cvs/src/usr.sbin/vmd/parse.y,v
> retrieving revision 1.35
> diff -u -p -u -p -r1.35 parse.y
> --- usr.sbin/vmd/parse.y	19 Jun 2018 17:12:34 -	1.35
> +++ usr.sbin/vmd/parse.y	19 Jun 2018 18:27:55 -
> @@ -119,7 +119,8 @@ typedef struct {
>  
>  %token	INCLUDE ERROR
>  %token	ADD BOOT CDROM DISABLE DISK DOWN ENABLE GROUP INTERFACE LLADDR LOCAL
> -%token	LOCKED MEMORY NIFS OWNER PATH PREFIX RDOMAIN SIZE SWITCH UP VM VMID
> +%token	LOCKED MEMORY NIFS OWNER PATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP
> +%token	VM VMID
>  %token	NUMBER
>  %token	STRING
>  %type	lladdr
> @@ -190,6 +191,10 @@ main		: LOCAL PREFIX STRING {
>  
>  			memcpy(&env->vmd_cfg.cfg_localprefix, &h, sizeof(h));
>  		}
> +		| SOCKET OWNER owner_id {
> +			env->vmd_ps.ps_csock.cs_uid = $3.uid;
> +			env->vmd_ps.ps_csock.cs_gid = $3.gid == -1 ? 0 : $3.gid;
> +		}
>  		;
>  
>  switch		: SWITCH string {
> @@ -678,6 +683,7 @@ lookup(char *s)
>  		{ "prefix",		PREFIX },
>  		{ "rdomain",		RDOMAIN },
>  		{ "size",		SIZE },
> +		{ "socket",		SOCKET },
>  		{ "switch",		SWITCH },
>  		{ "up",			UP },
>  		{ "vm",			VM }
> Index: usr.sbin/vmd/proc.h
> ===
> RCS file: /cvs/src/usr.sbin/vmd/proc.h,v
> retrieving revision 1.12
> diff -u -p -u -
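The `SOCKET OWNER owner_id` rule in this diff stores the configured uid and the configured gid, except that a gid of -1 is replaced by 0; `control_reset()` then applies the pair with `chown(2)`. A minimal C sketch of just that logic, assuming -1 is how `owner_id` reports a missing group (for example a `socket owner` line naming only a user); the `sock_owner` struct and both helpers are hypothetical, not vmd code.

```c
#include <sys/types.h>
#include <assert.h>
#include <unistd.h>

/* Hypothetical stand-in for the cs_uid/cs_gid pair kept in ps_csock. */
struct sock_owner {
	uid_t	uid;
	gid_t	gid;
};

/*
 * Mirror of the proposed parse.y rule: a gid of -1 (assumed to mean
 * "no group given") falls back to group 0; any other gid is kept.
 */
static struct sock_owner
resolve_socket_owner(uid_t uid, long gid)
{
	struct sock_owner o;

	o.uid = uid;
	o.gid = (gid == -1) ? 0 : (gid_t)gid;
	return (o);
}

/*
 * Same shape as control_reset() in the diff: hand the socket file at
 * `path` to the configured owner. Returns 0 on success, -1 on error.
 */
static int
apply_socket_owner(const char *path, struct sock_owner o)
{
	assert(path != NULL);
	if (chown(path, o.uid, o.gid) == -1)
		return (-1);
	return (0);
}
```

Group 0 is wheel on OpenBSD, so under that assumption a `socket owner` line without a group would still leave the socket's group as wheel.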
Re: VMM owner needs to be part of wheel
On 19 Jun at 22:10, Theo de Raadt wrote:
> Mischa wrote:
> > > 2. Change the default owner group to root:_vmd.
> > >
> > > It would be possible to define a hardcoded group, or use group _vmd,
> > > but this doesn't feel right.
> >
> > Why would using _vmd not work?
> > Wouldn't that be the same for other daemons?
> > Or wouldn't these be used to assign other users to it?
> > One which comes to mind is _ladvd, albeit not in base.
>
> Because it exists for a different purpose. A security purpose.
>
> Good lord.

Makes perfect sense.

Mischa
Re: VMM owner needs to be part of wheel
Hi Reyk,

On 19 Jun at 20:38, Reyk Floeter wrote:
> Hi,
>
> On Sun, Jun 17, 2018 at 10:35:27PM +0200, obs...@high5.nl wrote:
> > >Synopsis: VMM owner needs to be part of group wheel in order to run vmctl
> > >console|start|stop
>
> the solution is not that easy as it seemed.
>
> 1. Change the umask and let everyone access vmd, restrict the commands
> internally.
>
> While this is a possible solution, I also agree that this allows any
> user (including system/privsep users) to trigger actions and imsgs in
> vmd; even if the result is permission denied as this is checked fairly
> late.
>
> 2. Change the default owner group to root:_vmd.
>
> It would be possible to define a hardcoded group, or use group _vmd,
> but this doesn't feel right.

Why would using _vmd not work?
Wouldn't that be the same for other daemons?
Or wouldn't these be used to assign other users to it?
One which comes to mind is _ladvd, albeit not in base.

> 3. Let the user configure the owner of the control socket.
>
> This allows you to configure your own group, like "devops" (hrhr), or
> fetch the group from YP/LDA, and let them mess with your VMs. I think
> this is much more viable in a multi-user environment. Add the
> following to the top/global section of /etc/vm.conf: "socket owner :devops"

Works for me. And when vm.conf doesn't exist, or the owner doesn't exist, would it fall back to root?

Will check out the patch asap!

Mischa

> privsep/pledge also makes it a bit more complicated for us because I
> don't want to allow the control process to chown the unix socket; so
> it is done by the parent and some messaging.
>
> The attached diff implements 3. ... comments? OK? Should we use 2. instead?
>
> Reyk
>
> Index: usr.sbin/vmd/control.c
> ===
> RCS file: /cvs/src/usr.sbin/vmd/control.c,v
> retrieving revision 1.23
> diff -u -p -u -p -r1.23 control.c
> --- usr.sbin/vmd/control.c	13 May 2018 22:48:11 -	1.23
> +++ usr.sbin/vmd/control.c	19 Jun 2018 18:27:55 -
> @@ -103,6 +103,7 @@ control_dispatch_vmd(int fd, struct priv
>  		break;
>  	case IMSG_VMDOP_CONFIG:
>  		config_getconfig(ps->ps_env, imsg);
> +		proc_compose(ps, PROC_PARENT, IMSG_VMDOP_DONE, NULL, 0);
>  		break;
>  	case IMSG_CTL_RESET:
>  		config_getreset(ps->ps_env, imsg);
> @@ -169,6 +170,18 @@ control_init(struct privsep *ps, struct
>  
>  	cs->cs_fd = fd;
>  	cs->cs_env = ps;
> +
> +	proc_compose(ps, PROC_PARENT, IMSG_VMDOP_DONE, NULL, 0);
> +
> +	return (0);
> +}
> +
> +int
> +control_reset(struct control_sock *cs)
> +{
> +	/* Updating owner of the control socket */
> +	if (chown(cs->cs_name, cs->cs_uid, cs->cs_gid) == -1)
> +		return (-1);
>  
>  	return (0);
>  }
> Index: usr.sbin/vmd/parse.y
> ===
> RCS file: /cvs/src/usr.sbin/vmd/parse.y,v
> retrieving revision 1.35
> diff -u -p -u -p -r1.35 parse.y
> --- usr.sbin/vmd/parse.y	19 Jun 2018 17:12:34 -	1.35
> +++ usr.sbin/vmd/parse.y	19 Jun 2018 18:27:55 -
> @@ -119,7 +119,8 @@ typedef struct {
>  
>  %token	INCLUDE ERROR
>  %token	ADD BOOT CDROM DISABLE DISK DOWN ENABLE GROUP INTERFACE LLADDR LOCAL
> -%token	LOCKED MEMORY NIFS OWNER PATH PREFIX RDOMAIN SIZE SWITCH UP VM VMID
> +%token	LOCKED MEMORY NIFS OWNER PATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP
> +%token	VM VMID
>  %token	NUMBER
>  %token	STRING
>  %type	lladdr
> @@ -190,6 +191,10 @@ main		: LOCAL PREFIX STRING {
>  
>  			memcpy(&env->vmd_cfg.cfg_localprefix, &h, sizeof(h));
>  		}
> +		| SOCKET OWNER owner_id {
> +			env->vmd_ps.ps_csock.cs_uid = $3.uid;
> +			env->vmd_ps.ps_csock.cs_gid = $3.gid == -1 ? 0 : $3.gid;
> +		}
>  		;
>  
>  switch		: SWITCH string {
> @@ -678,6 +683,7 @@ lookup(char *s)
>  		{ "prefix",		PREFIX },
>  		{ "rdomain",		RDOMAIN },
>  		{ "size",		SIZE },
> +		{ "socket",		SOCKET },
>  		{ "switch",		SWITCH },
>  		{ "up",			UP },
>  		{ "vm",			VM }
> Index: usr.sbin/vmd/proc.h
> ===
Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument
> On 19 Jun 2018, at 17:51, Mike Larkin wrote:
> On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
>>> Synopsis: VMs stop intermittently after vcpu_run_loop error
>>> Category: system
>>> Environment:
>> System : OpenBSD 6.3
>> Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>   r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>
>> Architecture: OpenBSD.amd64
>> Machine : amd64
>>> Description:
>> Currently running 12 VMs on a single machine. After some random time,
>> VMs randomly shut down. Sometimes a single VM and sometimes more. It's always
>> after an error message like the following a VM stops.
>> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly
>>
>
> This is almost surely the following bug, fixed in April (log from pmap.c):
>
> revision 1.113
> date: 2018/04/17 06:31:55; author: mlarkin; state: Exp; lines: +275 -64; commitid: BaLjO2NVfYaZP00l;
> Better way of allocating EPT entries.
>
> Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
> occasionally caused VMs to fail after random amounts of time due to
> loading the pmap on the CPU and the processor updating A/D bits (which
> are reserved bits in EPT). This ultimately manifested itself as errors
> from vmd ("vcpu X run ioctl failed".)
>
> tested by many, on different types of HW, no regressions noted
>
> ---
>
> Can you try -current and see if you can still reproduce this problem?

Will do! Will probably be able to upgrade to -current this week.

>> Side note: after a reboot of the host, all VMs stop at one point as it looks
>> like VMM starts all the VMs at the same time. Looks like it's draining
>> resources at that point.
>>
>
> Yes, this is a known issue, I've had it on my to-do list to have some sort
> of sequencing or delay, but never got around to it (Hint, hint, such a fix
> would likely be an easy intro-to-vmd-hacking effort for someone who wanted
> to dip their toe in the water).

Wish I were able to do something about it. I will get some more hardware up and running to host OpenBSD VMs and donate a part to the Foundation.

Mischa

>
> -ml
>
>>> How-To-Repeat:
>> Unfortunately I have not found a way to reproduce this, I thought I was
>> on to something when I loaded an Alpine Linux VM as well, but this is now
>> also happening without it running.
>>
>>> Fix:
>> No fix.
>>
>>
>> dmesg:
>> OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>   r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> real mem = 8544342016 (8148MB)
>> avail mem = 8278315008 (7894MB)
>> mpath0 at root
>> scsibus0 at mpath0: 256 targets
>> mainbus0 at root
>> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe9750 (56 entries)
>> bios0: vendor American Megatrends Inc. version "2.0b" date 09/17/2012
>> bios0: Supermicro X9SCL/X9SCM
>> acpi0 at bios0: rev 2
>> acpi0: sleep states S0 S1 S4 S5
>> acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SPMI SSDT SSDT EINJ ERST HEST BERT BGRT
>> acpi0: wakeup devices PS2K(S4) PS2M(S4) UAR1(S4) UAR2(S4) P0P1(S4) USB1(S4) USB2
Re: VMM owner needs to be part of wheel
Hi Reyk,

Seems like a workable solution if the security part is restricted enough by vmd.

Mischa

> On 18 Jun 2018, at 00:05, Reyk Floeter wrote:
>
> Hi,
>
> changing the umask in control.c could fix it. There’s no need to restrict it
> to wheel since vmd checks the permissions based on configuration internally.
> Having the vmd socket world-writable should be OK.
>
> But we could eventually use a group _vmd to shield off users who shouldn’t
> even be able to do anything. But this doesn’t make much sense - it would be a
> bit like restricting users from running ps a.
>
> I can make a diff tomorrow.
>
> Reyk
>
> On 17.06.2018 at 22:35, obs...@high5.nl wrote:
>
>>> Synopsis: VMM owner needs to be part of group wheel in order to run
>>> vmctl console|start|stop
>>> Category: system
>>> Environment:
>> System : OpenBSD 6.3
>> Details : OpenBSD 6.3 (GENERIC.MP) #3: Fri May 18 00:06:26 CEST 2018
>>   r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>
>> Architecture: OpenBSD.amd64
>> Machine : amd64
>>> Description:
>> When some level of vmctl is needed for users they are currently required
>> to be part of group wheel. It would be great from a hosting perspective to
>> allow users to control their own VM and attach to the console. I started a
>> small project to host OpenBSD VMs for the community out of Amsterdam and I
>> would love to provide users access to their own VM.
>>> How-To-Repeat:
>> Setting an owner who is not in wheel will result in a message like:
>> vmctl: command failed: Operation not permitted
>>> Fix:
>> The current workaround is to add the user to group wheel, which might
>> be ok for trusted users.
>> >> >> dmesg: >> OpenBSD 6.3 (GENERIC.MP) #3: Fri May 18 00:06:26 CEST 2018 >> >> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP >> real mem = 8544342016 (8148MB) >> avail mem = 8278310912 (7894MB) >> mpath0 at root >> scsibus0 at mpath0: 256 targets >> mainbus0 at root >> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe9750 (56 entries) >> bios0: vendor American Megatrends Inc. version "2.0b" date 09/17/2012 >> bios0: Supermicro X9SCL/X9SCM >> acpi0 at bios0: rev 2 >> acpi0: sleep states S0 S1 S4 S5 >> acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SPMI SSDT SSDT EINJ ERST >> HEST BERT BGRT >> acpi0: wakeup devices PS2K(S4) PS2M(S4) UAR1(S4) UAR2(S4) P0P1(S4) USB1(S4) >> USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) USB7(S4) PXSX(S4) RP01(S4) >> PXSX(S4) RP02(S4) [...] >> acpitimer0 at acpi0: 3579545 Hz, 24 bits >> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat >> cpu0 at mainbus0: apid 0 (boot processor) >> cpu0: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.45 MHz >> cpu0: >> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN >> cpu0: 256KB 64b/line 8-way L2 cache >> acpitimer0: recalibrated TSC frequency 3100015637 Hz >> cpu0: smt 0, core 0, package 0 >> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges >> cpu0: apic clock running at 100MHz >> cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE >> cpu1 at mainbus0: apid 2 (application processor) >> cpu1: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.03 MHz >> cpu1: >> 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN >> cpu1: 256KB 64b/line 8-way L2 cache >> cpu1: smt 0, core 1, package 0 >> cpu2 at mainbus0: apid 4 (application processor) >> cpu2: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.03 MHz >> cpu2: >> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN >> cpu2: 256KB 64b/line 8-way L2 cache >> cpu2: smt 0, core 2, package 0
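Reyk's first suggestion, changing the umask in control.c, decides who may connect to the control socket purely by the permission bits the socket file is created with. A minimal POSIX sketch of that mechanism, not vmd's control.c: `bind_with_umask()` and the demo path are hypothetical. Under the usual rule that a newly bound socket file gets mode 0777 & ~umask, a umask of 0117 yields an owner-and-group-only 0660 socket, while 0111 yields the world-writable 0666 socket discussed in the thread.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Bind a UNIX-domain socket at `path` while `mask` is the process umask,
 * so the socket file gets mode 0777 & ~mask. The previous umask is
 * restored afterwards. Returns the socket fd, or -1 on error.
 */
static int
bind_with_umask(const char *path, mode_t mask)
{
	struct sockaddr_un sa;
	mode_t old;
	int fd, rc;

	assert(path != NULL);
	if (strlen(path) >= sizeof(sa.sun_path))
		return (-1);
	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		return (-1);

	memset(&sa, 0, sizeof(sa));
	sa.sun_family = AF_UNIX;
	strcpy(sa.sun_path, path);	/* length checked above */

	(void)unlink(path);		/* remove a stale socket, if any */
	old = umask(mask);
	rc = bind(fd, (struct sockaddr *)&sa, sizeof(sa));
	(void)umask(old);
	if (rc == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```

`strcpy()` is used after an explicit length check so the sketch also builds on systems without `strlcpy()`; on OpenBSD, `strlcpy()` would be the more idiomatic call.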
Re: em0 stops working after random amount of time in OpenBSD 5.8/5.9
Hi Evgeniy,

Thank you for your suggestion. Unfortunately there is already a card in the only available slot.

I did notice that the network cards have different chipsets:

em0 at pci4 dev 0 function 0 "Intel 82573E" rev 0x03: msi, address 00:30:48:96:42:06
em1 at pci5 dev 0 function 0 "Intel 82573L" rev 0x00: msi, address 00:30:48:96:42:07

I moved the cable to em1 and it has been behaving as expected since.

Thanx for responding!!

Mischa

> On 03 Apr 2016, at 02:13, Evgeniy Sudyr <eject.in...@gmail.com> wrote:
>
> Mischa, please send a sendbug(1) report with all details to the developers,
> then wait and hope someone will pick up that bug.
>
> Personally I would suggest not using that 11-year-old NIC, which
> causes such problems (in the archives I found some reports similar to
> yours), and getting a better NIC that fits your needs.
>
> Check the Intel i350:
> http://www.intel.com/content/www/us/en/embedded/products/networking/ethernet-controller-i350-datasheet.html
>
> You can get it for ~$150 retail (dual port). I never had issues
> with this one, and I haven't used even half of its cool features on
> any platform.
Re: em0 stops working after random amount of time in OpenBSD 5.8/5.9
Hi Evgeniy,

One of the questions I had was indeed how to troubleshoot this. Nothing out of the ordinary shows up in dmesg or messages, and I cannot find anything that changes on the interface or in netstat.

Until the 18th of March this machine was running FreeBSD without any issues. I moved from 9.3-RELEASE-pXX to OpenBSD 5.8. There are still 2 machines of the same type running FreeBSD 9.3 without any issues.

I do know there are issues in FreeBSD 10 with this NIC which haven't been resolved, but they primarily have to do with the driver not loading.

The strange thing is that it works after a reboot; I can ping an IP. But as soon as I run ftp or pkg_add, for example, it stops working.

Mischa

> On 02 Apr 2016, at 21:15, Evgeniy Sudyr <eject.in...@gmail.com> wrote:
>
> Mischa,
>
> 1) Consider using sendbug(1) to provide a report (read the section saying
> "The following items should be contained in every bug report")
>
> http://www.openbsd.org/report.html
>
> 2) I suggest providing more details about your system configuration.
> Most interesting is whether any sysctl tuning was done, and whether it was a working
> system or a new/fresh setup which never worked before.
>
> 3) Could it be broken hardware? I just googled your board / NIC
> and both are about 9 years old.
>
> --
> Evgeniy
Re: em0 stops working after random amount of time in OpenBSD 5.8/5.9
Hi All,

I just tried with: OpenBSD host 5.9 GENERIC.MP#1888 amd64
The result is still the same. Networking stops and sometimes resumes after a while.
Could this be because of SMP networking?

What I am seeing on the switch is that the MAC address is still in the MAC table,
but there is no longer an ARP entry.

Mischa

> On 22 Mar 2016, at 12:18, Mischa <open...@high5.nl> wrote:
>
> Hi All,
>
> I would be happy to provide remote console access if that helps.
>
> Mischa
Re: em0 stops working after random amount of time in OpenBSD 5.8/5.9
Hi All,

I would be happy to provide remote console access if that helps.

Mischa

> On 20 Mar 2016, at 14:52, Mischa <open...@high5.nl> wrote:
>
> Hi All,
>
> I am running OpenBSD 5.8, and have tried 5.9 as well, on a SuperMicro PDSMi which
> has an Intel 82573E.
> For some reason networking just stops working after a random amount of time,
> usually while I am SSH-ed into the machine.
> When I am connected to the console it seems to keep working longer. I am testing
> this by pinging an IP address on the local subnet.
>
> Unfortunately I cannot find anything different from an interface or subnet
> perspective, and nothing appears in the logs.
> The problem goes away, temporarily, when I bounce the interface on the switch.
>
> How can I best troubleshoot the cause?
em0 stops working after random amount of time in OpenBSD 5.8/5.9
Hi All,

I am running OpenBSD 5.8, and have tried 5.9 as well, on a SuperMicro PDSMi which has an Intel 82573E.
For some reason networking just stops working after a random amount of time, usually while I am SSH-ed into the machine.
When I am connected to the console it seems to keep working longer. I am testing this by pinging an IP address on the local subnet.

Unfortunately I cannot find anything different from an interface or subnet perspective, and nothing appears in the logs.
The problem goes away, temporarily, when I bounce the interface on the switch.

How can I best troubleshoot the cause?

# dmesg
em0 at pci4 dev 0 function 0 "Intel 82573E" rev 0x03: msi, address 00:30:48:96:42:06

# pcidump -v
13:0:0: Intel 82573E
	0x: Vendor ID: 8086 Product ID: 108c
	0x0004: Command: 0107 Status: 0010
	0x0008: Class: 02 Subclass: 00 Interface: 00 Revision: 03
	0x000c: BIST: 00 Header Type: 00 Latency Timer: 00 Cache Line Size: 10
	0x0010: BAR mem 32bit addr: 0xe8a0/0x0002
	0x0014: BAR empty ()
	0x0018: BAR io addr: 0x5000/0x0020
	0x001c: BAR empty ()
	0x0020: BAR empty ()
	0x0024: BAR empty ()
	0x0028: Cardbus CIS:
	0x002c: Subsystem Vendor ID: 15d9 Product ID: 108c
	0x0030: Expansion ROM Base Address:
	0x0038:
	0x003c: Interrupt Pin: 01 Line: 0b Min Gnt: 00 Max Lat: 00
	0x00c8: Capability 0x01: Power Management
	0x00d0: Capability 0x05: Message Signaled Interrupts (MSI)
	0x00e0: Capability 0x10: PCI Express
		Link Speed: 2.5 / 2.5 GT/s Link Width: x1 / x1

# ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	lladdr 00:30:48:96:42:06
	priority: 0
	groups: egress
	media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
	status: active
	inet netmask 0xff00 broadcast 46.23.86.255

# netstat -nr
Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default                               UGS        3       37     -     8 em0
/24                46.23.86.132       UC         1        0     -     8 em0
                   02:e0:52:9c:3c:56  UHLc       1        0     -     8 em0
                   00:30:48:96:42:06  HLl        0        0     -     1 lo0
                                      UHb        0        0     -     1 em0
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHl        1        0 32768     1 lo0
224/4              127.0.0.1          URS        0        0 32768     8 lo0

Thanx!

Mischa
Re: Potential MP problem in OpenBSD 5.8
> On 07 Mar 2016, at 09:34, Mike Larkin <mlar...@azathoth.net> wrote:
>
> On Sun, Mar 06, 2016 at 04:33:59PM +0100, Mischa wrote:
>> Hi,
>>
>> Reyk asked me to post the following panics on this list.
>> I have seen multiple panics when running the stock relayd / httpd on both
>> bhyve and bare metal.
>> Here are the 2 I captured.
>>
>> The first trace is from OpenBSD 5.8 running on bhyve (FreeBSD 10.2).
>>
>> https://gist.github.com/mischapeters/11dd221087c2b04b7741
>> panic: mtx_enter: locking against myself
>> Stopped at 0x8133fc09: leave
>> RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
>> DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
>> ddb>
>> ddb> trace
>> (null)() at 0x8133fc09
>> (null)() at 0x811a310e
>> (null)() at 0x8132508f
>> (null)() at 0x811bc830
>> (null)() at 0x8123f060
>> (null)() at 0x81253abd
>> (null)() at 0x81254900
>> (null)() at 0x81193a95
>> (null)() at 0x81324632
>> (null)() at 0x8133b3cf
>> (null)() at 0x8119f8f5
>> (null)() at 0x811bb0f6
>> (null)() at 0x811bb6bc
>> (null)() at 0x811bee6e
>> (null)() at 0x813233ae
>
> Please tell bhyve to implement proper elf symbol loading in their bootloader.
> The trace above is pretty useless.

I guess you are seeing all of them at AsiaBSDCon or BSDCan, no? ;)

What about the other trace?

Mischa
Potential MP problem in OpenBSD 5.8
                    89 3       0x90 kqread    relayd
  8280 11407 11407  89 3       0x90 kqread    relayd
 11407  9472 11407  89 3       0x90 kqread    relayd
 26863  9472 26863  89 3       0x90 kqread    relayd
 26108  9472 26108  89 3       0x90 kqread    relayd
  8232  9472  8232  89 7       0x10           relayd
  9472     1  9472   0 3       0x80 kqread    relayd
 16247 14642 14642  75 3       0x92 poll      bgpd
 18324 14642 14642  75 3       0x92 poll      bgpd
 14642     1 14642   0 3       0x80 poll      bgpd
 11830  2383 11830  91 3       0x90 kqread    snmpd
 31427  2383 31427  91 3       0x90 kqread    snmpd
  2383     1  2383   0 3       0x80 kqread    snmpd
  7874     1  7874   0 3       0x80 select    sshd
 20764  8955  7081  83 3       0x90 poll      ntpd
  8955  7081  7081  83 3       0x90 poll      ntpd
  7081     1  7081   0 3       0x80 poll      ntpd
  5327 29909 29909  74 3       0x90 bpf       pflogd
 29909     1 29909   0 3       0x80 netio     pflogd
 24314 11314 11314  73 3       0x90 kqread    syslogd
 11314     1 11314   0 3       0x80 netio     syslogd
 10994     0     0   0 3    0x14200 pgzero    zerothread
 23221     0     0   0 3    0x14200 aiodoned  aiodoned
  5596     0     0   0 3    0x14200 syncer    update
 17098     0     0   0 3    0x14200 cleaner   cleaner
 12769     0     0   0 3    0x14200 reaper    reaper
 28148     0     0   0 3    0x14200 pgdaemon  pagedaemon
 14465     0     0   0 3    0x14200 bored     srdis
 18524     0     0   0 3    0x14200 bored     crypto
 30899     0     0   0 3    0x14200 pftm      pfpurge
 18749     0     0   0 3    0x14200 usbtsk    usbtask
 15794     0     0   0 3    0x14200 usbatsk   usbatsk
 13494     0     0   0 3 0x40014200 acpi0     acpi0
 28997     0     0   0 3 0x40014200           idle3
 19921     0     0   0 3 0x40014200           idle2
 28399     0     0   0 3 0x40014200           idle1
 18074     0     0   0 3    0x14200 bored     sensors
  9023     0     0   0 7    0x14210           softnet
 24448     0     0   0 3    0x14200 bored     systqmp
 15997     0     0   0 3    0x14200 bored     systq
 28210     0     0   0 3 0x40014200           idle0
     1     0     1   0 3       0x82 wait      init
     0    -1     0   0 3    0x10200 scheduler swapper
ddb{0}>

Hopefully this will help.

Regards,
Mischa
Re: OpenBSD 5.8 GENERIC#1170 amd64 panics in bhyve
Hi Steven,

> On 21 Nov 2015, at 18:00, Steven Chamberlain <ste...@pyro.eu.org> wrote:
>
> Hello,
>
> Mischa wrote:
>> I am running OpenBSD 5.8 GENERIC#1170 amd64 as a bhyve instance on FreeBSD
>> 10.2-RELEASE-p7.
>
> I suspect you should try this again using a snapshot kernel[0] and if
> the problem still happens, the ddb trace will be more detailed.
>
> [0]: http://ftp.eu.openbsd.org/pub/OpenBSD/snapshots/amd64/

Will give this a go!

Thanx!

Mischa
OpenBSD 5.8 GENERIC#1170 amd64 panics in bhyve
Hi All,

I am running OpenBSD 5.8 GENERIC#1170 amd64 as a bhyve instance on FreeBSD 10.2-RELEASE-p7.
The storage is provided by ZFS, from which the instance runs.
The only services running in this OpenBSD instance are relayd and httpd.
No content is hosted on this instance; relayd / httpd only act as a reverse proxy.

I managed to catch the panic this time. Hopefully this provides some insight into what happened.

lb1:~ $ panic: mtx_enter: locking against myself
Stopped at 0x8133fbf9: leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb> trace
(null)() at 0x8133fbf9
(null)() at 0x811a310e
(null)() at 0x8132507f
(null)() at 0x811bc830
(null)() at 0x8123f050
(null)() at 0x81253aad
(null)() at 0x812548f0
(null)() at 0x81193a95
(null)() at 0x81324622
(null)() at 0x8133b3bf
(null)() at 0x8119f8f5
(null)() at 0x811bb0f6
(null)() at 0x811bb6bc
(null)() at 0x811bee6e
(null)() at 0x8132339e
end of kernel
end trace frame: 0x13b598f9a190, count: -15
ddb> ps
   PID  PPID  PGRP UID S      FLAGS WAIT      COMMAND
  6780     1  6780   0 3       0x83 ttyin     ksh
 28980     1 28980   0 3       0x80 poll      cron
 20453 30323 30323  95 3       0x90 kqread    smtpd
  4183 30323 30323  95 3       0x90 kqread    smtpd
 14921 30323 30323  95 3       0x90 kqread    smtpd
 21549 30323 30323  95 3       0x90 kqread    smtpd
 21867 30323 30323  95 3       0x90 kqread    smtpd
  5556 30323 30323 103 3       0x90 kqread    smtpd
 30323     1 30323   0 3       0x80 kqread    smtpd
* 8116  8728  8728  89 7       0x10           relayd
 28617  8728  8728  89 3       0x90 kqread    relayd
 28666  8728  8728  89 3       0x90 kqread    relayd
 14234  8728  8728  89 3       0x90 kqread    relayd
  6772 20410 20410  89 3       0x90 kqread    relayd
  6064 20410 20410  89 3       0x90 kqread    relayd
 32481 20410 20410  89 3       0x90 kqread    relayd
 30101 20410 20410  89 3       0x90 kqread    relayd
 20410 22081 20410  89 3       0x90 kqread    relayd
  8728 22081  8728  89 3       0x90 kqread    relayd
  4449 22081  4449  89 3       0x90 kqread    relayd
 28012 22081 28012  89 3       0x90 kqread    relayd
 22081     1 22081   0 3       0x80 kqread    relayd
 23731 21123 23731  91 3       0x90 kqread    snmpd
 23092 21123 23092  91 3       0x90 kqread    snmpd
 21123     1 21123   0 3       0x80 kqread    snmpd
    65     1    65   0 3       0x80 select    sshd
 18184 21160 15260  83 3       0x90 poll      ntpd
 21160 15260 15260  83 3       0x90 poll      ntpd
 15260     1 15260   0 3       0x80 poll      ntpd
 25057 27605 27605  74 3       0x90 bpf       pflogd
 27605     1 27605   0 3       0x80 netio     pflogd
  7709  9861  9861  73 3       0x90 kqread    syslogd
  9861     1  9861   0 3       0x80 netio     syslogd
  2720     0     0   0 3    0x14200 pgzero    zerothread
  3008     0     0   0 3    0x14200 aiodoned  aiodoned
  9376     0     0   0 3    0x14200 syncer    update
 20457     0     0   0 3    0x14200 cleaner   cleaner
   281     0     0   0 3    0x14200 reaper    reaper
 22066     0     0   0 3    0x14200 pgdaemon  pagedaemon
 26663     0     0   0 3    0x14200 bored     crypto
 12531     0     0   0 3    0x14200 pftm      pfpurge
 28680     0     0   0 3 0x40014200 acpi0     acpi0
  1276     0     0   0 3    0x14200 bored     softnet
  6840     0     0   0 3    0x14200 bored     systqmp
 22866     0     0   0 3    0x14200 bored     systq
 29152     0     0   0 3 0x40014200           idle0
     1     0     1   0 3       0x82 wait      init
     0    -1     0   0 3    0x10200 scheduler swapper

Mischa