Re: IPv4 on ix(4) slow/nothing - 7.4
> On Oct 18, 2023, at 15:44, Hrvoje Popovski wrote:
>
> On 18.10.2023. 15:35, Mischa wrote:
>> Hi All,
>>
>> Just upgraded a couple of machines to 7.4. Smooth as always!!
>>
>> I am however seeing issues with IPv4: slowness or no throughput at all.
>> The machines I have upgraded use an Intel X540-T network card and are
>> connected at 10G.
>>
>> ix0 at pci5 dev 0 function 0 "Intel X540T" rev 0x01, msix, 16 queues,
>> address b8:ca:3a:62:ee:40
>> ix1 at pci5 dev 0 function 1 "Intel X540T" rev 0x01, msix, 16 queues,
>> address b8:ca:3a:62:ee:42
>>
>> root@n2:~ # sysctl kern.version
>> kern.version=OpenBSD 7.4 (GENERIC.MP) #1397: Tue Oct 10 09:02:37 MDT 2023
>>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>
>> There are a bunch of VMs running on top of it. As soon as I want to
>> fetch something with ftp, for example, I don't get anything over IPv4;
>> with IPv6 everything is normal.
>>
>> mischa@www2:~ $ ftp -4 https://mirror.openbsd.amsterdam/pub/OpenBSD/7.4/amd64/install74.iso
>> Trying 46.23.88.18...
>> Requesting https://mirror.openbsd.amsterdam/pub/OpenBSD/7.4/amd64/install74.iso
>>   0% |                               |   512 KB - stalled -^C
>>
>> A trace on mirror / n2:
>>
>> n2:~ # tcpdump -i vport880 host 46.23.88.32
>> tcpdump: listening on vport880, link-type EN10MB
>> 15:16:08.730274 www2.high5.nl.1828 > n2.high5.nl.https: S 2182224746:2182224746(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 2899683458 0> (DF)
>> 15:16:08.730297 arp who-has www2.high5.nl tell n2.high5.nl
>> 15:16:08.731535 arp reply www2.high5.nl is-at fe:51:bb:1e:12:11
>> 15:16:08.731540 n2.high5.nl.https > www2.high5.nl.1828: S 633749938:633749938(0) ack 2182224747 win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 3129955106 2899683458> (DF)
>> 15:16:08.732017 www2.high5.nl.1828 > n2.high5.nl.https: . ack 1 win 256 (DF)
>> 15:16:08.785752 www2.high5.nl.1828 > n2.high5.nl.https: P 1:312(311) ack 1 win 256 (DF)
>> 15:16:08.786092 n2.high5.nl.https > www2.high5.nl.1828: P 1:128(127) ack 312 win 271 (DF)
>> 15:16:08.786376 n2.high5.nl.https > www2.high5.nl.1828: P 128:134(6) ack 312 win 271 (DF)
>> 15:16:08.786396 n2.high5.nl.https > www2.high5.nl.1828: P 134:166(32) ack 312 win 271 (DF)
>> 15:16:08.786455 n2.high5.nl.https > www2.high5.nl.1828: . 166:1614(1448) ack 312 win 271 (DF)
>> 15:16:08.786457 n2.high5.nl.https > www2.high5.nl.1828: . 1614:3062(1448) ack 312 win 271 <... 2899683510> (DF)
>> 15:16:08.786460 n2.high5.nl.https > www2.high5.nl.1828: P 3062:3803(741) ack 312 win 271 (DF)
>> 15:16:08.786943 www2.high5.nl.1828 > n2.high5.nl.https: . ack 134 win 255 (DF)
>> 15:16:08.796534 n2.high5.nl.https > www2.high5.nl.1828: P 3803:4345(542) ack 312 win 271 (DF)
>> 15:16:08.796577 n2.high5.nl.https > www2.high5.nl.1828: P 4345:4403(58) ack 312 win 271 (DF)
>> 15:16:08.797518 www2.high5.nl.1828 > n2.high5.nl.https: . ack 166 win 256 (DF)
>> 15:16:08.797522 www2.high5.nl.1828 > n2.high5.nl.https: . ack 166 win 256 (DF)
>> 15:16:09.790297 n2.high5.nl.https > www2.high5.nl.1828: . 166:1614(1448) ack 312 win 271 (DF)
>> 15:16:09.790902 www2.high5.nl.1828 > n2.high5.nl.https: . ack 1614 win 233 (DF)
>> 15:16:09.790917 n2.high5.nl.https > www2.high5.nl.1828: . 1614:3062(1448) ack 312 win 271 <... 2899684519> (DF)
>> 15:16:09.790923 n2.high5.nl.https > www2.high5.nl.1828: P 3062:4403(1341) ack 312 win 271 <... 2899684519> (DF)
>> 15:16:10.790299 n2.high5.nl.https > www2.high5.nl.1828: . 1614:3062(1448) ack 312 win 271 <... 2899684519> (DF)
>> 15:16:10.791204 www2.high5.nl.1828 > n2.high5.nl.https: . ack 3062 win 233 (DF)
>> 15:16:10.791223 n2.high5.nl.https > www2.high5.nl.1828: P 3062:4403(1341) ack 312 win 271 <... 2899685520> (DF)
>> 15:16:10.791692 www2.high5.nl.1828 > n2.high5.nl.https: . ack 4403 win 235 (DF)
>> 15:16:10.802647 www2.high5.nl.1828 > n2.high5.nl.https: P 312:318(6) ack 4403 win 256 (DF)
>> 15:16:11.000297 n2.high5.nl.https > www2.high5.nl.1828: . ack 318 win 271 (DF)
>> 15:16:11.001162 www2.high5.nl.1828 > n2.high5.nl.https: P 318:527(209) ack 4403 win 256 (DF)
>> 15:16:11.001860 n2.high5.nl.https > www2.high5.nl.1828: P 4403:5059(656) ack
IPv4 on ix(4) slow/nothing - 7.4
Hi All,

Just upgraded a couple of machines to 7.4. Smooth as always!!

I am however seeing issues with IPv4: slowness or no throughput at all. The machines I have upgraded use an Intel X540-T network card and are connected at 10G.

ix0 at pci5 dev 0 function 0 "Intel X540T" rev 0x01, msix, 16 queues, address b8:ca:3a:62:ee:40
ix1 at pci5 dev 0 function 1 "Intel X540T" rev 0x01, msix, 16 queues, address b8:ca:3a:62:ee:42

root@n2:~ # sysctl kern.version
kern.version=OpenBSD 7.4 (GENERIC.MP) #1397: Tue Oct 10 09:02:37 MDT 2023
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

There are a bunch of VMs running on top of it. As soon as I want to fetch something with ftp, for example, I don't get anything over IPv4; with IPv6 everything is normal.

mischa@www2:~ $ ftp -4 https://mirror.openbsd.amsterdam/pub/OpenBSD/7.4/amd64/install74.iso
Trying 46.23.88.18...
Requesting https://mirror.openbsd.amsterdam/pub/OpenBSD/7.4/amd64/install74.iso
  0% |                               |   512 KB - stalled -^C

A trace on mirror / n2:

n2:~ # tcpdump -i vport880 host 46.23.88.32
tcpdump: listening on vport880, link-type EN10MB
15:16:08.730274 www2.high5.nl.1828 > n2.high5.nl.https: S 2182224746:2182224746(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 2899683458 0> (DF)
15:16:08.730297 arp who-has www2.high5.nl tell n2.high5.nl
15:16:08.731535 arp reply www2.high5.nl is-at fe:51:bb:1e:12:11
15:16:08.731540 n2.high5.nl.https > www2.high5.nl.1828: S 633749938:633749938(0) ack 2182224747 win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 3129955106 2899683458> (DF)
15:16:08.732017 www2.high5.nl.1828 > n2.high5.nl.https: . ack 1 win 256 (DF)
15:16:08.785752 www2.high5.nl.1828 > n2.high5.nl.https: P 1:312(311) ack 1 win 256 (DF)
15:16:08.786092 n2.high5.nl.https > www2.high5.nl.1828: P 1:128(127) ack 312 win 271 (DF)
15:16:08.786376 n2.high5.nl.https > www2.high5.nl.1828: P 128:134(6) ack 312 win 271 (DF)
15:16:08.786396 n2.high5.nl.https > www2.high5.nl.1828: P 134:166(32) ack 312 win 271 (DF)
15:16:08.786455 n2.high5.nl.https > www2.high5.nl.1828: . 166:1614(1448) ack 312 win 271 (DF)
15:16:08.786457 n2.high5.nl.https > www2.high5.nl.1828: . 1614:3062(1448) ack 312 win 271 <... 2899683510> (DF)
15:16:08.786460 n2.high5.nl.https > www2.high5.nl.1828: P 3062:3803(741) ack 312 win 271 (DF)
15:16:08.786943 www2.high5.nl.1828 > n2.high5.nl.https: . ack 134 win 255 (DF)
15:16:08.796534 n2.high5.nl.https > www2.high5.nl.1828: P 3803:4345(542) ack 312 win 271 (DF)
15:16:08.796577 n2.high5.nl.https > www2.high5.nl.1828: P 4345:4403(58) ack 312 win 271 (DF)
15:16:08.797518 www2.high5.nl.1828 > n2.high5.nl.https: . ack 166 win 256 (DF)
15:16:08.797522 www2.high5.nl.1828 > n2.high5.nl.https: . ack 166 win 256 (DF)
15:16:09.790297 n2.high5.nl.https > www2.high5.nl.1828: . 166:1614(1448) ack 312 win 271 (DF)
15:16:09.790902 www2.high5.nl.1828 > n2.high5.nl.https: . ack 1614 win 233 (DF)
15:16:09.790917 n2.high5.nl.https > www2.high5.nl.1828: . 1614:3062(1448) ack 312 win 271 <... 2899684519> (DF)
15:16:09.790923 n2.high5.nl.https > www2.high5.nl.1828: P 3062:4403(1341) ack 312 win 271 <... 2899684519> (DF)
15:16:10.790299 n2.high5.nl.https > www2.high5.nl.1828: . 1614:3062(1448) ack 312 win 271 <... 2899684519> (DF)
15:16:10.791204 www2.high5.nl.1828 > n2.high5.nl.https: . ack 3062 win 233 (DF)
15:16:10.791223 n2.high5.nl.https > www2.high5.nl.1828: P 3062:4403(1341) ack 312 win 271 <... 2899685520> (DF)
15:16:10.791692 www2.high5.nl.1828 > n2.high5.nl.https: . ack 4403 win 235 (DF)
15:16:10.802647 www2.high5.nl.1828 > n2.high5.nl.https: P 312:318(6) ack 4403 win 256 (DF)
15:16:11.000297 n2.high5.nl.https > www2.high5.nl.1828: . ack 318 win 271 (DF)
15:16:11.001162 www2.high5.nl.1828 > n2.high5.nl.https: P 318:527(209) ack 4403 win 256 (DF)
15:16:11.001860 n2.high5.nl.https > www2.high5.nl.1828: P 4403:5059(656) ack 527 win 271 (DF)
15:16:11.001989 n2.high5.nl.https > www2.high5.nl.1828: . 5059:6507(1448) ack 527 win 271 <... 2899685730> (DF)
15:16:11.001992 n2.high5.nl.https > www2.high5.nl.1828: . 6507:7955(1448) ack 527 win 271 <... 2899685730> (DF)
15:16:11.195431 www2.high5.nl.1828 > n2.high5.nl.https: . ack 5059 win 256 (DF)
15:16:11.195447 n2.high5.nl.https > www2.high5.nl.1828: . 7955:9403(1448) ack 527 win 271 <... 2899685924> (DF)

Running a trace on www2 I am seeing:

www2:~ # tcpdump -i vio0 host 46.23.88.18
tcpdump: listening on vio0, link-type EN10MB
15:16:08.729974 www2.high5.nl.1828 > n2.high5.nl.https: S 2182224746:2182224746(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 2899683458 0> (DF)
15:16:08.731114 arp who-has www2.high5.nl tell n2.high5.nl
15:16:08.731229 arp reply www2.high5.nl is-at fe:51:bb:1e:12:11
15:16:08.731631 n2.high5.nl.https > www2.high5.nl.1828: S 633749938:633749938(0) ack
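The stall in the trace above shows up as the same segments (166:1614, 1614:3062) being resent at roughly one-second intervals. Repeats like that can be counted mechanically from a saved capture; the following is an illustrative sketch (not from the thread), assuming the classic tcpdump "src > dst: F seq:end(len) ..." TCP output format, with shortened placeholder hostnames in the usage:

```shell
# Count how often each (src, dst, seq-range) triple appears in tcpdump
# output on stdin; any count above 1 is a retransmission. $2 and $4 are
# the source and destination fields; match() picks out "seq:end(len)".
count_retransmits() {
    awk '
        match($0, /[0-9]+:[0-9]+\([0-9]+\)/) {
            key = $2 " > " $4 " " substr($0, RSTART, RLENGTH)
            n[key]++
        }
        END { for (k in n) if (n[k] > 1) print n[k], k }
    '
}

# Usage (hypothetical capture file):
# tcpdump -n -r capture.pcap | count_retransmits
```

Pure ack lines carry no "seq:end(len)" field, so they are skipped automatically.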
Re: vmd(8): disambiguate logging for vm's and devices.
Hi Dave,

I like it a lot!

Mischa

On 2023-09-23 19:50, Dave Voutila wrote:

It annoys me how all the log messages from different vmd vm's blur together. Here is a diff that makes them distinguishable. It also fixes dynamic toggling of verbosity levels in virtio devices using `vmctl log`, and now preserves the verbosity across vm reboots.

I chose the patterns "vm/<name>" and "vm/<name>/<device>" to distinguish between vmd procs and the names of vm's, since someone could name their vm "vmd" or "vmm" or "priv" etc.

Additionally, I'm proposing to change the proc titles for devices to use "<name>/<device>" instead of "<device>/[<name>]" to match the logging format. While here I changed the name "parent" to "vmd".

Feedback? Ok?

-dv

Sample abbreviated output of `$(which vmd) -dvv` (aka `vmctl log verbose`) when launching a vm named "alpine":

vmd: startup
vmd: start_vm_batch: done starting vms
priv: config_getconfig: priv retrieving config
agentx: config_getconfig: agentx retrieving config
vmm: config_getconfig: vmm retrieving config
control: config_getconfig: control retrieving config
vm/alpine: alpine: launching vioblk0
vm/alpine: virtio_dev_launch: sending 'd' type device struct
vm/alpine: virtio_dev_launch: marking fd 7 !close-on-exec
vm/alpine: virtio_dev_launch: sending vm message for 'alpine'
vm/alpine/vioblk: vioblk_main: got vioblk dev. num disk fds = 1, sync fd = 16, async fd = 18, capacity = 0 seg_max = 126, vmm fd = 5
vm/alpine/vioblk0: qc2_open: qcow2 disk version 3 size 42949672960 end 604592 snap 0
vm/alpine/vioblk0: vioblk_main: initialized vioblk0 with qcow2 image (capacity=83886080)
vm/alpine/vioblk0: vioblk_main: wiring in async vm event handler (fd=18)
vm/alpine/vioblk0: vm_device_pipe: initializing 'd' device pipe (fd=18)
vm/alpine/vioblk0: vioblk_main: wiring in sync channel handler (fd=16)
vm/alpine/vioblk0: vioblk_main: telling vm alpine device is ready
vm/alpine/vioblk0: vioblk_main: sending heartbeat
vm/alpine: virtio_dev_launch: receiving reply
vm/alpine: virtio_dev_launch: device reports ready via sync channel

diff refs/heads/master refs/heads/vmd-logging
commit - 6332535f639065d7cf3e5fc071339d6e7a72e767
commit + 0154c36753c63e89d2f4005e4fdc6a3ff17d174f
blob - a6b0db9c264a7ca77411c0bc68a958bc226b317a
blob + 1dd2a384fa24410474fd50de0c594e6f1e2e2bfc
--- usr.sbin/vmd/log.c
+++ usr.sbin/vmd/log.c
@@ -24,31 +24,12 @@
 #include
 #include

+#include "proc.h"
+
 static int debug;
 static int verbose;
-const char *log_procname;
+static char log_procname[2048];

-void log_init(int, int);
-void log_procinit(const char *);
-void log_setverbose(int);
-int  log_getverbose(void);
-void log_warn(const char *, ...)
-    __attribute__((__format__ (printf, 1, 2)));
-void log_warnx(const char *, ...)
-    __attribute__((__format__ (printf, 1, 2)));
-void log_info(const char *, ...)
-    __attribute__((__format__ (printf, 1, 2)));
-void log_debug(const char *, ...)
-    __attribute__((__format__ (printf, 1, 2)));
-void logit(int, const char *, ...)
-    __attribute__((__format__ (printf, 2, 3)));
-void vlog(int, const char *, va_list)
-    __attribute__((__format__ (printf, 2, 0)));
-__dead void fatal(const char *, ...)
-    __attribute__((__format__ (printf, 1, 2)));
-__dead void fatalx(const char *, ...)
-    __attribute__((__format__ (printf, 1, 2)));
-
 void
 log_init(int n_debug, int facility)
 {
@@ -56,7 +37,7 @@ log_init(int n_debug, int facility)
 	debug = n_debug;
 	verbose = n_debug;
-	log_procinit(__progname);
+	log_procinit("%s", __progname);

 	if (!debug)
 		openlog(__progname, LOG_PID | LOG_NDELAY, facility);
@@ -65,10 +46,12 @@ log_init(int n_debug, int facility)
 }

 void
-log_procinit(const char *procname)
+log_procinit(const char *fmt, ...)
 {
-	if (procname != NULL)
-		log_procname = procname;
+	va_list ap;
+	va_start(ap, fmt);
+	vsnprintf(log_procname, sizeof(log_procname), fmt, ap);
+	va_end(ap);
 }

 void
@@ -101,7 +84,7 @@ vlog(int pri, const char *fmt, va_list ap)

 	if (debug) {
 		/* best effort in out of mem situations */
-		if (asprintf(&nfmt, "%s\n", fmt) == -1) {
+		if (asprintf(&nfmt, "%s: %s\n", log_procname, fmt) == -1) {
 			vfprintf(stderr, fmt, ap);
 			fprintf(stderr, "\n");
 		} else {
blob - c9efad13ef3a504bbd43e2b26336784baca33db9
blob + 0b71a9a33e4ded6dac468196ba448fb1623ee137
--- usr.sbin/vmd/proc.c
+++ usr.sbin/vmd/proc.c
@@ -287,7 +287,7 @@ proc_setup(struct privsep *ps, struct privsep_proc *pr
 	struct privsep_pipes	*pp;

 	/* Initialize parent title, ps_instances and procs. */
-	ps->ps_title[PROC_PARENT] = "parent";
+	ps->ps_ti
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-06 19:38, Dave Voutila wrote:

Mischa writes:

On 2023-09-06 05:36, Dave Voutila wrote:

Mischa writes:

On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin writes:

On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:
/snip
> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not MAP_STACK

Starting 30 VMs this way caused the machine to become unresponsive again, but nothing on the console. :(

Mischa

Were you seeing these uvm errors before this diff? If so, this isn't causing the problem and something else is. I don't believe we solved any of the underlying uvm issues in Bruges last year.

Mischa, can you test with just the latest snapshot/-current? I'd imagine starting and stopping many vm's now is exacerbating the issue because of the fork/exec for devices plus the ioctl to do a uvm share into the device process address space.

If this diff causes the errors to occur, and without the diff it's fine, then we need to look into that. Also I think a pid number in that printf might be useful, I'll see what I can find. If it's not vmd causing this and rather some other process then that would be good to know also.

Sadly it looks like that printf doesn't spit out the offending pid. :(

Just to confirm: I am seeing this behavior on the latest snap without the patch as well.

Since this diff isn't the cause, I've committed it. Thanks for testing. I'll see if I can reproduce your MAP_STACK issues.

Just started 10 VMs with sleep 2, machine freezes, but nothing on the console. :(

For now, I'd recommend spacing out vm launches. I'm pretty sure it's related to the uvm corruption we saw last year when creating, starting, and destroying vm's rapidly in a loop.

That could very well be the case. I will adjust my start script; so far I've got good results with a 10 second sleep. Is there some additional debugging I can turn on that makes sense for this? I can easily replicate.

Highly doubtful, if the issue is what I think. The only thing would be making sure you're running in a way to see any panic and drop into ddb. If you're using X or not on the primary console or serial connection, it might just appear as a deadlocked system during a panic.

I am using the console via iDRAC, there isn't any information anymore. :(

Mischa
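The workaround discussed in this thread — spacing out vm launches — can be wrapped in a small script. A sketch, not from the thread verbatim: the base-image path and VM names are illustrative, and the first argument lets `echo` stand in for `vmctl` so the loop can be dry-run off an OpenBSD host:

```shell
# Staggered VM launcher. Pass `echo` as RUN for a dry run, `vmctl` on a
# real vmd(8) host. The delay between launches is what kept the host up
# in the thread (2s still froze it; 10s held).
launch_vms() {
	run=$1; delay=$2; base=$3; shift 3
	for name in "$@"; do
		"$run" create -b "$base" "/var/vmm/${name}.qcow2" || return 1
		"$run" start -L -d "/var/vmm/${name}.qcow2" -m 2G "$name" || return 1
		sleep "$delay"	# give vmd time to fork/exec devices
	done
}

# Dry run over three hypothetical guests:
launch_vms echo 0 /var/vmm/vm09.qcow2 vm10 vm11 vm12
```

On OpenBSD, `jot 10 10` generates the same vm10..vm19 sequence used in the thread's one-liner.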
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-06 05:36, Dave Voutila wrote:

Mischa writes:

On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin writes:

On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:
/snip
> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not MAP_STACK

Starting 30 VMs this way caused the machine to become unresponsive again, but nothing on the console. :(

Mischa

Were you seeing these uvm errors before this diff? If so, this isn't causing the problem and something else is. I don't believe we solved any of the underlying uvm issues in Bruges last year.

Mischa, can you test with just the latest snapshot/-current? I'd imagine starting and stopping many vm's now is exacerbating the issue because of the fork/exec for devices plus the ioctl to do a uvm share into the device process address space.

If this diff causes the errors to occur, and without the diff it's fine, then we need to look into that. Also I think a pid number in that printf might be useful, I'll see what I can find. If it's not vmd causing this and rather some other process then that would be good to know also.

Sadly it looks like that printf doesn't spit out the offending pid. :(

Just to confirm: I am seeing this behavior on the latest snap without the patch as well.

Since this diff isn't the cause, I've committed it. Thanks for testing. I'll see if I can reproduce your MAP_STACK issues.

Just started 10 VMs with sleep 2, machine freezes, but nothing on the console. :(

For now, I'd recommend spacing out vm launches. I'm pretty sure it's related to the uvm corruption we saw last year when creating, starting, and destroying vm's rapidly in a loop.

That could very well be the case. I will adjust my start script; so far I've got good results with a 10 second sleep.

Is there some additional debugging I can turn on that makes sense for this? I can easily replicate.

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin writes:

On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:
/snip
> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not MAP_STACK

Starting 30 VMs this way caused the machine to become unresponsive again, but nothing on the console. :(

Mischa

Were you seeing these uvm errors before this diff? If so, this isn't causing the problem and something else is. I don't believe we solved any of the underlying uvm issues in Bruges last year.

Mischa, can you test with just the latest snapshot/-current? I'd imagine starting and stopping many vm's now is exacerbating the issue because of the fork/exec for devices plus the ioctl to do a uvm share into the device process address space.

If this diff causes the errors to occur, and without the diff it's fine, then we need to look into that. Also I think a pid number in that printf might be useful, I'll see what I can find. If it's not vmd causing this and rather some other process then that would be good to know also.

Sadly it looks like that printf doesn't spit out the offending pid. :(

Just to confirm: I am seeing this behavior on the latest snap without the patch as well.

Just started 10 VMs with sleep 2, machine freezes, but nothing on the console. :(

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin writes:

On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:
/snip
> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not MAP_STACK

Starting 30 VMs this way caused the machine to become unresponsive again, but nothing on the console. :(

Mischa

Were you seeing these uvm errors before this diff? If so, this isn't causing the problem and something else is. I don't believe we solved any of the underlying uvm issues in Bruges last year.

Mischa, can you test with just the latest snapshot/-current?

Yes, after Mike's email I already started getting an extra machine up and running. Will finish that shortly and run the tests on the latest snap.

I'd imagine starting and stopping many vm's now is exacerbating the issue because of the fork/exec for devices plus the ioctl to do a uvm share into the device process address space.

I will adjust my scripts accordingly. I currently start as many VMs as there are cores in production. Will test if that is still possible.

Mischa

If this diff causes the errors to occur, and without the diff it's fine, then we need to look into that. Also I think a pid number in that printf might be useful, I'll see what I can find. If it's not vmd causing this and rather some other process then that would be good to know also.

Sadly it looks like that printf doesn't spit out the offending pid. :(
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-04 18:58, Mischa wrote:

On 2023-09-04 18:55, Mischa wrote:

On 2023-09-04 17:57, Dave Voutila wrote:

Mischa writes:

On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.

There are a couple of permanent VMs running on this host: 1 ToR node, an OpenBSD VM and a Debian VM. While they were running I started my stress script.

The first round I started 40 VMs with just bsd.rd and 2G memory. All good. Then I started 40 VMs with a base disk and 2G memory. After 20 VMs started I got the following messages on the console:

[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bfff: not MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acd1fff: not MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not MAP_STACK
[umd19808/286658 sp=7db1ed2a64d0 inside 70f685f41000-70f6867d0fff: not MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not MAP_STACK

Not sure if this is related to the starting of the VMs or something else; the ToR node was consuming 100%+ CPU at the time. :)

Mischa

I have not seen this; can you try without the ToR node some time and see if this still happens?

Testing again without any other VMs running. Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do
  vmctl create -b /var/vmm/vm09.qcow2 /var/vmm/vm${i}.qcow2 && \
  vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G vm${i}
done

Can you try adding a "sleep 2" or something in the loop? I can't think of a reason my changes would cause this. Do you see this on -current without the diff?

Adding the sleep 2 does indeed help. I managed to get 20 VMs started this way, before it would choke on 2-3.

Do I only need the unpatched kernel or also the vmd/vmctl from snap?

I do still get the same message on the console, but the machine isn't freezing up.

[umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not MAP_STACK

Starting 30 VMs this way caused the machine to become unresponsive again, but nothing on the console. :(

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-04 18:55, Mischa wrote:

On 2023-09-04 17:57, Dave Voutila wrote:

Mischa writes:

On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.

There are a couple of permanent VMs running on this host: 1 ToR node, an OpenBSD VM and a Debian VM. While they were running I started my stress script.

The first round I started 40 VMs with just bsd.rd and 2G memory. All good. Then I started 40 VMs with a base disk and 2G memory. After 20 VMs started I got the following messages on the console:

[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bfff: not MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acd1fff: not MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not MAP_STACK
[umd19808/286658 sp=7db1ed2a64d0 inside 70f685f41000-70f6867d0fff: not MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not MAP_STACK

Not sure if this is related to the starting of the VMs or something else; the ToR node was consuming 100%+ CPU at the time. :)

Mischa

I have not seen this; can you try without the ToR node some time and see if this still happens?

Testing again without any other VMs running. Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do
  vmctl create -b /var/vmm/vm09.qcow2 /var/vmm/vm${i}.qcow2 && \
  vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G vm${i}
done

Can you try adding a "sleep 2" or something in the loop? I can't think of a reason my changes would cause this. Do you see this on -current without the diff?

Adding the sleep 2 does indeed help. I managed to get 20 VMs started this way, before it would choke on 2-3.

Do I only need the unpatched kernel or also the vmd/vmctl from snap?

I do still get the same message on the console, but the machine isn't freezing up.

[umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not MAP_STACK

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-04 17:57, Dave Voutila wrote:

Mischa writes:

On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.

There are a couple of permanent VMs running on this host: 1 ToR node, an OpenBSD VM and a Debian VM. While they were running I started my stress script.

The first round I started 40 VMs with just bsd.rd and 2G memory. All good. Then I started 40 VMs with a base disk and 2G memory. After 20 VMs started I got the following messages on the console:

[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bfff: not MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acd1fff: not MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not MAP_STACK
[umd19808/286658 sp=7db1ed2a64d0 inside 70f685f41000-70f6867d0fff: not MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not MAP_STACK

Not sure if this is related to the starting of the VMs or something else; the ToR node was consuming 100%+ CPU at the time. :)

Mischa

I have not seen this; can you try without the ToR node some time and see if this still happens?

Testing again without any other VMs running. Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do
  vmctl create -b /var/vmm/vm09.qcow2 /var/vmm/vm${i}.qcow2 && \
  vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G vm${i}
done

Can you try adding a "sleep 2" or something in the loop? I can't think of a reason my changes would cause this. Do you see this on -current without the diff?

Adding the sleep 2 does indeed help. I managed to get 20 VMs started this way, before it would choke on 2-3.

Do I only need the unpatched kernel or also the vmd/vmctl from snap?

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.

There are a couple of permanent VMs running on this host: 1 ToR node, an OpenBSD VM and a Debian VM. While they were running I started my stress script.

The first round I started 40 VMs with just bsd.rd and 2G memory. All good. Then I started 40 VMs with a base disk and 2G memory. After 20 VMs started I got the following messages on the console:

[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bfff: not MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acd1fff: not MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not MAP_STACK
[umd19808/286658 sp=7db1ed2a64d0 inside 70f685f41000-70f6867d0fff: not MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not MAP_STACK

Not sure if this is related to the starting of the VMs or something else; the ToR node was consuming 100%+ CPU at the time. :)

Mischa

I have not seen this; can you try without the ToR node some time and see if this still happens?

Testing again without any other VMs running. Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do
  vmctl create -b /var/vmm/vm09.qcow2 /var/vmm/vm${i}.qcow2 && \
  vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G vm${i}
done

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-03 21:18, Dave Voutila wrote:

Mischa writes:

Nice!! Thanx Dave!

Running go brrr as we speak. Testing with someone who is running Debian.

Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time unless I hear of any issues.

There are a couple of permanent VMs running on this host: 1 ToR node, an OpenBSD VM and a Debian VM. While they were running I started my stress script.

The first round I started 40 VMs with just bsd.rd and 2G memory. All good. Then I started 40 VMs with a base disk and 2G memory. After 20 VMs started I got the following messages on the console:

[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bfff: not MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acd1fff: not MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not MAP_STACK
[umd19808/286658 sp=7db1ed2a64d0 inside 70f685f41000-70f6867d0fff: not MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not MAP_STACK

Not sure if this is related to the starting of the VMs or something else; the ToR node was consuming 100%+ CPU at the time. :)

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
Nice!! Thanx Dave! Running go brrr as we speak. Testing with someone who is running Debian.

Mischa

On 2023-09-01 21:50, Dave Voutila wrote:

Now that my i8259 fix is in, it's safe to expand the testing pool for this diff. (Without that fix, users would definitely hit the hung block device issue testing this one.) Hoping that folks that run non-OpenBSD guests or strange configurations can give it a spin.

This change removes an ioctl(2) call from the vcpu thread hot path in vmd. Instead of making that syscall to toggle on/off a pending interrupt flag on the vcpu object in vmm(4), it adds a flag into the vm_run_params struct sent with the VMM_IOC_RUN ioctl. The in-kernel vcpu runloop can now toggle the pending interrupt state prior to vm entry.

mbuhl@ and phessler@ have run this diff on their machines. Current observations are reduced average network latency for guests.

My terse measurements using the following btrace script show some promising changes in terms of reducing ioctl syscalls:

/* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */
syscall:ioctl:entry
/arg1 == 2148292102/
{
	@total[tid] = count();
	@running[tid] = count();
}

interval:hz:1
{
	print(@running);
	clear(@running);
}

Measuring from boot of an OpenBSD guest to after the guest finishes relinking (based on my manual observation of the libevent thread settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR ioctls for a single guest:

## -current
@total[433237]: 1325100	# vcpu thread (!!)
@total[187073]: 80239	# libevent thread

## with diff
@total[550347]: 42	# vcpu thread (!!)
@total[256550]: 86946	# libevent thread

Most of the VMM_IOC_INTR ioctls on the vcpu threads come from seabios and the bootloader prodding some of the emulated hardware, but even after the bootloader you'll see ~10-20k/s of ioctl's on -current vs. ~4-5k/s with the diff.
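For readers puzzling over the magic number in the btrace predicate: 2148292102 is simply VMM_IOC_INTR (0x800c5606) in decimal, since btrace compares the raw request word. The fields of that word can be pulled apart with the standard BSD ioctl encoding (layout assumed from <sys/ioccom.h>; a sketch, not vmm code):

```c
#include <assert.h>
#include <stddef.h>

/* BSD ioctl request word layout (per sys/ioccom.h):
 * bits 31-29: direction, bits 28-16: argument size,
 * bits 15-8: group character, bits 7-0: command number */
#define IOCPARM_MASK	0x1fffUL
#define IOC_IN		0x80000000UL	/* userland writes arg to kernel */

struct ioc_fields {
	unsigned long dir;
	size_t size;
	char group;
	unsigned cmd;
};

static struct ioc_fields
ioc_decode(unsigned long req)
{
	struct ioc_fields f;

	f.dir = req & 0xe0000000UL;	/* direction bits */
	f.size = (req >> 16) & IOCPARM_MASK;
	f.group = (char)((req >> 8) & 0xff);
	f.cmd = req & 0xff;
	return f;
}
```

Decoding 0x800c5606 this way gives direction IOC_IN, a 12-byte argument, group 'V', command 6, which is consistent with an _IOW('V', ...) style definition.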
At steady-state, the vcpu thread no longer makes the VMM_IOC_INTR calls at all and you should see the libevent thread calling it at a rate of ~100/s (probably hardclock?). *Without* the diff, I see a steady 650/s rate on the vcpu thread at idle. *With* the diff, it's 0/s at idle. :)

To test:
- rebuild & install new kernel
- copy/symlink vmmvar.h into /usr/include/machine/
- rebuild & re-install vmd & vmctl
- reboot

-dv

diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending
 M  sys/arch/amd64/amd64/vmm_machdep.c  |  10+  0-
 M  sys/arch/amd64/include/vmmvar.h     |   1+  0-
 M  usr.sbin/vmd/vm.c                   |   2+ 16-

3 files changed, 13 insertions(+), 16 deletions(-)

diff refs/heads/master refs/heads/vmm-vrp_intr_pending
commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb
commit + 10eeb8a0414ec927b6282473c50043a7027d6b41
blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46
blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e
--- sys/arch/amd64/amd64/vmm_machdep.c
+++ sys/arch/amd64/amd64/vmm_machdep.c
@@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct vm_run_params *
 	 */
 	irq = vrp->vrp_irq;
 
+	if (vrp->vrp_intr_pending)
+		vcpu->vc_intr = 1;
+	else
+		vcpu->vc_intr = 0;
+
 	if (vrp->vrp_continue) {
 		switch (vcpu->vc_gueststate.vg_exit_reason) {
 		case VMX_EXIT_IO:
@@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params *
 
 	irq = vrp->vrp_irq;
 
+	if (vrp->vrp_intr_pending)
+		vcpu->vc_intr = 1;
+	else
+		vcpu->vc_intr = 0;
+
 	/*
 	 * If we are returning from userspace (vmd) because we exited
 	 * last time, fix up any needed vcpu state first. Which state
blob - e9f8384cccfde33034d7ac9782610f93eb5dc640
blob + 88545b54b35dd60280ba87403e343db9463d7419
--- sys/arch/amd64/include/vmmvar.h
+++ sys/arch/amd64/include/vmmvar.h
@@ -456,6 +456,7 @@ struct vm_run_params {
 	uint32_t	vrp_vcpu_id;
 	uint8_t		vrp_continue;		/* Continuing from an exit */
 	uint16_t	vrp_irq;		/* IRQ to inject */
+	uint8_t		vrp_intr_pending;	/* Additional intrs pending? */
 
 	/* Input/output parameter to VMM_IOC_RUN */
 	struct vm_exit	*vrp_exit;		/* updated exit data */
blob - 5f598bcc14af5115372d34a4176254d377aad91c
blob + 447fc219adadf945de2bf25d5335993c2abdc26f
--- usr.sbin/vmd/vm.c
+++ usr.sbin/vmd/vm.c
@@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg)
 		} else
 			vrp->vrp_irq = 0x;
 
-		/* Still more pending? */
-		if (i8259_is_pending()) {
-			/*
-			 * XXX can probably avoid ioctls here by providing intr
-			 * in vrp
-			 */
-			if (vcpu_pic_intr(vrp->vrp_vm_id,
-			    vrp->vrp_vcpu_id, 1)) {
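The shape of the win can be shown with a toy model of the vcpu loop: before the change, a pending i8259 interrupt cost an extra ioctl per iteration; after, the flag rides along in vm_run_params with the single VMM_IOC_RUN ioctl. Everything below is a hypothetical, self-contained simplification for counting syscalls, not vmd code:

```c
#include <stdint.h>

/* toy vm_run_params with the new field from the diff */
struct vm_run_params {
	uint16_t vrp_irq;
	uint8_t  vrp_intr_pending;
};

static int ioctl_calls;			/* counts our pretend ioctl(2)s */
static int i8259_pending = 1;		/* pretend the PIC has an intr queued */

static void fake_ioctl(void) { ioctl_calls++; }

/* one loop iteration, -current style: pending intr costs a second ioctl */
static void
run_once_old(struct vm_run_params *vrp)
{
	(void)vrp;
	if (i8259_pending)
		fake_ioctl();	/* vcpu_pic_intr() -> VMM_IOC_INTR */
	fake_ioctl();		/* VMM_IOC_RUN */
}

/* one loop iteration with the diff: the flag is set in vrp instead */
static void
run_once_new(struct vm_run_params *vrp)
{
	vrp->vrp_intr_pending = i8259_pending ? 1 : 0;
	fake_ioctl();		/* VMM_IOC_RUN carries the flag */
}
```

At thousands of iterations per second, dropping that second syscall per iteration accounts for the order-of-magnitude reduction in the @total counts above.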
Re: only open /dev/vmm once in vmd(8)
Hi Dave, Applied the patch on top of the previous two you provided and all looks good. Running four proper VMs (installed 7.2, with different amount of memory allocated, one of them with rpki-client) and booted ~40 with just bsd.rd. Some log messages I am seeing, which I didn't see/notice before. Let me know if there is something specific I need to look out for. Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 0xfdd1a31ff998 Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509fc7501e Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 0xfdd1a31ff448 Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509fb5001e Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 0xfdd1a31ff008 Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509f9b401e Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 0xfdd1a31ff228 Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509f7b001e Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 0xfdd1a31ff778 Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509f51a01e Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 0xfdd1a31ffaa8 Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509f1c601e Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 0xfdd1a31ffdd8 Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509ee0901e Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 0xfdd09e98fcd0 Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509e98e01e Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 0xfdd09e98f780 Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509e46c01e Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 0xfdd09e98f340 Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509de2601e Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 25 Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, data=0x80010031 Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 26 Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, data=0x8001003 Dec 26 16:54:13 current /bsd: 1 Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID Dec 26 16:54:13 current /bsd: /ASID 27 Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, data=0x80010031 Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 28 Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, data=0x80010031 Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 29 Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, data=0x80010031 Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 30 Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 31 Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, data=0x80010031 Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, data=0x80010031 Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 32 Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, data=0x80010031 Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 33 Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 34 Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, data=0x80010031 Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, data=0x80010031 Dec 26 16:54:14 current /bsd: vmm_handle_cpuid: unsupported rax=0x4100 Dec 26 16:54:14 current last message repeated 5 times Dec 26 16:54:14 current /bsd: vmm_handle_cpuid: function 0x06 (thermal/power mgt) not supported Dec 26 16:54:14 current last message repeated 2 times Mischa On 2022-12-25 16:57, Dave Voutila wrote: During h2k22 there was some discussion around how vmd(8) manages vms and the vmm(4) device's role. While looking into something related, I found vmd opens /dev/vmm in each subprocess during the initial fork+execve dance. The only vmd process that needs /dev/vmm is the vmm process. The diff below changes it so that *only
Re: vmd(8): create a proper e820 bios memory map
On 2022-12-14 14:57, Dave Voutila wrote: Mischa writes: On 2022-12-13 20:29, Dave Voutila wrote: Dave Voutila writes: tech@, The below diff tweaks how vmd and vmm define memory ranges (adding a "type" attribute) so we can properly build an e820 memory map to hand to things like SeaBIOS or the OpenBSD ramdisk kernel (when direct booting bsd.rd). Why do it? We've been carrying a few patches to SeaBIOS in the ports tree to hack around how vmd articulates some memory range details. By finally implementing a proper bios memory map table we can drop some of those patches. (Diff to ports@ coming shortly.) Bonus is it cleans up how we were hacking a bios memory map for direct booting ramdisk kernels. Note: the below diff *will* work with the current SeaBIOS (vmm-firmware), so you do *not* need to build the port. You will, however, need to: - build, install, & reboot into a new kernel - make sure you update /usr/include/amd64/vmmvar.h with a copy of symlink to sys/arch/amd64/include/vmmvar.h - rebuild & install vmctl - rebuild & install vmd This should *not* result in any behavioral changes of current vmd guests. If you notice any, especially guests failing to start, please rebuild a kernel with VMM_DEBUG to help diagnose the regression. Updated diff to fix some accounting issues with guest memory. (vmctl should report the correct max mem now.) Booted... The memory display in vmctl show is normal again. root@current:~ # vmctl show ID PID VCPUS MAXMEM CURMEM TTYOWNERSTATE NAME 4 56252 11.0G989M ttyp4 runbsd04 running vm04 3 60536 18.0G2.2G ttyp3 runbsd running vm03 2 20642 1 16.0G3.4G ttyp2 runbsd running vm02 1 81947 1 30.0G5.6G ttyp1 runbsd running vm01 All seems to running normal. Anything specific I need to look out for? Other than the above, no not really. Going to keep this diff out on tech@ a few days to allow folks with a variety of guests to test before I ask for OK's to commit. The next change will be SeaBIOS (vmm-firmware) once this lands. Perfect! 
Will do some more tests and will let you know if I find something. Mischa
Re: vmd(8): create a proper e820 bios memory map
On 2022-12-13 20:29, Dave Voutila wrote: Dave Voutila writes: tech@, The below diff tweaks how vmd and vmm define memory ranges (adding a "type" attribute) so we can properly build an e820 memory map to hand to things like SeaBIOS or the OpenBSD ramdisk kernel (when direct booting bsd.rd). Why do it? We've been carrying a few patches to SeaBIOS in the ports tree to hack around how vmd articulates some memory range details. By finally implementing a proper bios memory map table we can drop some of those patches. (Diff to ports@ coming shortly.) Bonus is it cleans up how we were hacking a bios memory map for direct booting ramdisk kernels. Note: the below diff *will* work with the current SeaBIOS (vmm-firmware), so you do *not* need to build the port. You will, however, need to: - build, install, & reboot into a new kernel - make sure you update /usr/include/amd64/vmmvar.h with a copy of symlink to sys/arch/amd64/include/vmmvar.h - rebuild & install vmctl - rebuild & install vmd This should *not* result in any behavioral changes of current vmd guests. If you notice any, especially guests failing to start, please rebuild a kernel with VMM_DEBUG to help diagnose the regression. Updated diff to fix some accounting issues with guest memory. (vmctl should report the correct max mem now.) Booted... The memory display in vmctl show is normal again. root@current:~ # vmctl show ID PID VCPUS MAXMEM CURMEM TTYOWNERSTATE NAME 4 56252 11.0G989M ttyp4 runbsd04 running vm04 3 60536 18.0G2.2G ttyp3 runbsd running vm03 2 20642 1 16.0G3.4G ttyp2 runbsd running vm02 1 81947 1 30.0G5.6G ttyp1 runbsd running vm01 All seems to running normal. Anything specific I need to look out for? Mischa As a result, adds in an MMIO range type (previous diff counted that range towards guest mem, though we don't actually fault in virtual memory to represent it to the guest). This has the added benefit of removing more knowledge from vmm(4) of what an emulated machine looks like, i.e. 
why does it care what the pci mmio range is? vmd(8) is responsible for that. I did also remove the "multiple of 1M" requirement for guest memory. Since I transitioned things to bytes awhile ago, no need to prohibit that. -dv diff refs/heads/master refs/heads/vmd-e820 commit - 9be741fe9857107e3610acb9a39e2972330b122d commit + ad422400e2f72c14c73d7f124f8b96d01d4ad4c5 blob - 3f7e0ce405ae3c6b0b4a787de341839886f97436 blob + d69293fcd5fd98315181eb0dd77b653601530e9d --- sys/arch/amd64/amd64/vmm.c +++ sys/arch/amd64/amd64/vmm.c @@ -1631,8 +1631,8 @@ vmx_remote_vmclear(struct cpu_info *ci, struct vcpu *v * The last physical address may not exceed VMM_MAX_VM_MEM_SIZE. * * Return Values: - * The total memory size in MB if the checks were successful - * 0: One of the memory ranges was invalid, or VMM_MAX_VM_MEM_SIZE was + * The total memory size in bytes if the checks were successful + * 0: One of the memory ranges was invalid or VMM_MAX_VM_MEM_SIZE was * exceeded */ size_t @@ -1643,21 +1643,27 @@ vm_create_check_mem_ranges(struct vm_create_params *vc const paddr_t maxgpa = VMM_MAX_VM_MEM_SIZE; if (vcp->vcp_nmemranges == 0 || - vcp->vcp_nmemranges > VMM_MAX_MEM_RANGES) + vcp->vcp_nmemranges > VMM_MAX_MEM_RANGES) { + DPRINTF("invalid number of guest memory ranges\n"); return (0); + } for (i = 0; i < vcp->vcp_nmemranges; i++) { vmr = >vcp_memranges[i]; /* Only page-aligned addresses and sizes are permitted */ if ((vmr->vmr_gpa & PAGE_MASK) || (vmr->vmr_va & PAGE_MASK) || - (vmr->vmr_size & PAGE_MASK) || vmr->vmr_size == 0) + (vmr->vmr_size & PAGE_MASK) || vmr->vmr_size == 0) { + DPRINTF("memory range %zu is not page aligned\n", i); return (0); + } /* Make sure that VMM_MAX_VM_MEM_SIZE is not exceeded */ if (vmr->vmr_gpa >= maxgpa || - vmr->vmr_size > maxgpa - vmr->vmr_gpa) + vmr->vmr_size > maxgpa - vmr->vmr_gpa) { + DPRINTF("exceeded max memory size\n"); return (0); + } /* * Make sure that all virtual addresses are within the address @@ -1667,39 +1673,29 @@ 
vm_create_check_mem_ranges(struct vm_create_params *vc */ if (vmr->vmr_va < VM_MIN_ADDRESS || vmr->vmr_va >= VM_MAXUSER_ADDRESS || - vmr->vmr_size >= VM_MAXUSER_ADDRESS - vmr->vmr_va) +
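For context on what the typed memory ranges ultimately feed: an e820 map is just an array of (base, length, type) entries handed to the BIOS/guest kernel. The sketch below uses the standard e820 type numbers (1 = usable RAM, 2 = reserved) and hypothetical input structs loosely modeled on vmd's memranges, not the actual vmd/SeaBIOS code:

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM	1	/* usable RAM */
#define E820_RESERVED	2	/* reserved (e.g. the PCI MMIO hole) */

struct e820_entry {
	uint64_t base;
	uint64_t length;
	uint32_t type;
};

/* hypothetical typed guest range, standing in for vmd's memranges */
struct mem_range {
	uint64_t gpa;		/* guest physical base address */
	uint64_t size;
	int is_ram;		/* RAM vs. reserved/MMIO */
};

/* Build an e820 table from typed guest ranges; returns entry count. */
static size_t
build_e820(const struct mem_range *r, size_t n, struct e820_entry *out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		out[i].base = r[i].gpa;
		out[i].length = r[i].size;
		out[i].type = r[i].is_ram ? E820_RAM : E820_RESERVED;
	}
	return n;
}
```

With a real map like this, the firmware no longer has to guess which gaps are holes, which is what the carried SeaBIOS patches were hacking around.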
Re: vmd(8): create a proper e820 bios memory map
On 2022-12-12 16:36, Dave Voutila wrote:
> Mischa writes:
> > On 2022-12-12 16:02, Dave Voutila wrote:
> > > Mischa writes:
> > > > Hi Dave,
> > > > Great stuff!! Everything is patched, built and booted.
> > > > What is the best way to test this?
> > > Start guests as usual. I'd say the only thing definitively to manually
> > > check is that they see the same amount of physical memory as before
> > > the patch.
> > That is indeed different. Before the patch allocating 1G would be
> > displayed as 1G. Now a 1G allocation is 1.3G, 2G is 2.3G, 8G is 8.3G, etc.
> So I can reproduce, how are you measuring it?

All the VMs with 2.3G have 2G memory allocated.

root@current:~ # vmctl show
  ID   PID VCPUS  MAXMEM  CURMEM  TTY    OWNER     STATE    NAME
  41 27026     1    2.3G    328M  ttypl  root      running  vm29
  40 27031     1    2.3G    328M  ttypk  root      running  vm28
  39 66653     1    2.3G    328M  ttypj  root      running  vm27
  38 79530     1    2.3G    328M  ttypi  root      running  vm26
  37 24110     1    2.3G    328M  ttyph  root      running  vm25
  36 89659     1    2.3G    328M  ttypg  root      running  vm24
  35 19193     1    2.3G    328M  ttypf  root      running  vm23
  34 48946     1    2.3G    328M  ttype  root      running  vm22
  33 32065     1    2.3G    328M  ttypd  root      running  vm21
  32 61847     1    2.3G    328M  ttypc  root      running  vm20
  31 42429     1    2.3G    328M  ttypb  root      running  vm19
  30 50201     1    2.3G    328M  ttypa  root      running  vm18
  29 18652     1    2.3G    328M  ttyp9  root      running  vm17
  28 23312     1    2.3G    328M  ttyp8  root      running  vm16
  27 21314     1    2.3G    328M  ttyp7  root      running  vm15
  26 79420     1    2.3G    328M  ttyp6  root      running  vm14
  25 23214     1    2.3G    328M  ttyp5  root      running  vm13
  24 22755     1    2.3G    328M  ttyp4  root      running  vm12
  23  7716     1    2.3G    328M  ttyp3  root      running  vm11
  22  2758     1    2.3G    328M  ttyp2  root      running  vm10
   1     -     1   30.0G       -  -      runbsd    stopped  vm01
   2     -     1   16.0G       -  -      runbsd    stopped  vm02
   3     -     1    8.0G       -  -      runbsd    stopped  vm03
   4     -     1    1.0G       -  -      runbsd04  stopped  vm04
   5     -     1    1.0G       -  -      runbsd    stopped  vm42

After starting vm0[1-4].
root@current:~ # vmctl show | grep vm0[1-4] 4 61002 11.3G990M ttypq runbsd04 running vm04 3 60620 18.3G1.4G ttypp runbsd running vm03 2 94240 1 16.3G2.5G ttypo runbsd running vm02 1 32209 1 30.3G4.6G ttypn runbsd running vm01 Booting one of the VMs with console: root@current:~ # vmctl start -c vm04 Connected to /dev/ttypn (speed 115200) Using drive 0, partition 3. Loading.. probing: pc0 com0 mem[638K 1022M a20=on] disk: hd0+ OpenBSD/amd64 BOOT 3.55 \ com0: 115200 baud switching console to com0 OpenBSD/amd64 BOOT 3.55 boot> booting hd0a:/bsd: 15615256+3765256+310448+0+1171456 [1138229+128+1224792+927979]=0x170ac00 entry point at 0x81001000 [ using 3292160 bytes of bsd ELF symbol table ] Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. Copyright (c) 1995-2022 OpenBSD. All rights reserved. https://www.OpenBSD.org OpenBSD 7.2 (GENERIC) #0: Wed Oct 26 11:26:29 MDT 2022 r...@syspatch-72-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC real mem = 1056952320 (1007MB) avail mem = 1007689728 (961MB) Standard 7.2 vmd/vmm: root@server18:~ # vmctl start -c vm40 Connected to /dev/ttyp7 (speed 115200) Using drive 0, partition 3. Loading.. probing: pc0 com0 mem[638K 1022M a20=on] disk: hd0+ OpenBSD/amd64 BOOT 3.55 \ com0: 115200 baud switching console to com0 OpenBSD/amd64 BOOT 3.55 boot> booting hd0a:/bsd: 15615256+3769352+309808+0+1171456 [1143120+128+1224792+927979]=0x170cf18 entry point at 0x81001000 [ using 3297048 bytes of bsd ELF symbol table ] Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. Copyright (c) 1995-2022 OpenBSD. All rights reserved. 
https://www.OpenBSD.org OpenBSD 7.2 (GENERIC) #2: Thu Nov 24 23:52:58 MST 2022 r...@syspatch-72-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC real mem = 1056952320 (1007MB) avail mem = 1007681536 (961MB) root@server18:~ # vmctl show vm40 ID PID VCPUS MAXMEM CURMEM TTYOWNERSTATE NAME 8 94043 11.0G918M ttyp7 runbsd running vm40 Mischa Start a bunch of VMs with bsd.rd? Does this still need to be a decompressed bsd.rd? Booting compressed bsd.rd's has been working for awhile now, so no need to decompress them, btw. If you boot a bsd.rd, it's exercising the changes to loadfile_elf.c,
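The consistent ~0.3G inflation Mischa reports (1G shown as 1.3G, 2G as 2.3G) fits the accounting bug discussed later in the thread: the reserved MMIO range was being counted toward guest memory. A minimal sketch of the two accounting policies, using hypothetical types (the VM_MEM_RAM idea comes from the diff, the rest is illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical typed range, mirroring the VM_MEM_RAM concept */
enum range_type { MEM_RAM, MEM_MMIO };

struct range {
	uint64_t size;
	enum range_type type;
};

/* naive total: counts every range, so a 2G guest with a ~0.3G
 * MMIO hole reports as 2.3G */
static uint64_t
total_all(const struct range *r, size_t n)
{
	uint64_t t = 0;
	size_t i;

	for (i = 0; i < n; i++)
		t += r[i].size;
	return t;
}

/* corrected total: only RAM ranges count toward guest memory */
static uint64_t
total_ram(const struct range *r, size_t n)
{
	uint64_t t = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (r[i].type == MEM_RAM)
			t += r[i].size;
	return t;
}
```

The updated diff later in the thread takes the second approach, which is why vmctl show reports the correct max mem again.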
Re: vmd(8): create a proper e820 bios memory map
On 2022-12-12 16:02, Dave Voutila wrote: Mischa writes: Hi Dave, Great stuff!! Everything is patched, build and booted. What is the best way to test this? Start guests as usual. I'd say the only thing definitively to manually check is that they see the same amount of physical memory as before the patch. That is indeed different. Before the patch allocating 1G would be displayed as 1G. Now a 1G allocation is 1.3G, 2G is 2.3G, 8G is 8.3G, etc. Start a bunch of VMs with bsd.rd? Does this still need to be a decompressed bsd.rd? Booting compressed bsd.rd's has been working for awhile now, so no need to decompress them, btw. If you boot a bsd.rd, it's exercising the changes to loadfile_elf.c, which is important to test. I missed that completely, good to know. Starting/stopping a bunch of VMs with bsd.rd (only) at the moment. Not sure if this is helpful, but got this in the logs: Dec 12 16:21:47 current /bsd: vmm_handle_cpuid: function 0x06 (thermal/power mgt) not supported Dec 12 16:21:47 current /bsd: vcpu_run_vmx: unimplemented exit type 32 (WRMSR instruction) Dec 12 16:21:47 current /bsd: vcpu @ 0x800022e3a000 in long mode Dec 12 16:21:47 current /bsd: CPL=0 Dec 12 16:21:47 current /bsd: rax=0x0081 rbx=0x818f8008 rcx=0x01a0 Dec 12 16:21:47 current /bsd: rdx=0x rbp=0x81a06da0 rdi=0x01a0 Dec 12 16:21:47 current /bsd: rsi=0x81a06d88 r8=0x0028 r9=0xa9144070 Dec 12 16:21:47 current /bsd: r10=0x r11=0x81a06ce0 r12=0x81a06e28 Dec 12 16:21:47 current /bsd: r13=0x8147da20 r14=0x818f6ff0 r15=0x0006 Dec 12 16:21:47 current /bsd: rip=0x81221d50 rsp=0x81a06d80 Dec 12 16:21:47 current /bsd: rflags=0x0246 (cf PF af ZF sf tf IF df of nt rf vm ac vif vip id IOPL=0) Dec 12 16:21:47 current /bsd: cr0=0x80010031 (PG cd nw am WP NE ET ts em mp PE) Dec 12 16:21:47 current /bsd: cr2=0x Dec 12 16:21:47 current /bsd: cr3=0x7f7d8000 (pwt pcd) Dec 12 16:21:47 current /bsd: cr4=0x26b0 (pke smap smep osxsave pcide fsgsbase smxe VMXE OSXMMEXCPT OSFXSR pce PGE mce PAE PSE de tsd pvi vme) Dec 12 
16:21:47 current /bsd: --Guest Segment Info-- Dec 12 16:21:47 current /bsd: cs=0x0008 rpl=0 base=0x limit=0x a/r=0xa09b Dec 12 16:21:47 current /bsd: granularity=1 dib=0 l(64 bit)=1 present=1 sys=1 type=code, r/x, accessed Dec 12 16:21:47 current /bsd: ds=0x0010 rpl=0 base=0x limit=0x a/r=vmm_handle_cpuid: unsupported rax=0x4100 Dec 12 16:21:47 current /bsd: 0xa093 Dec 12 16:21:47 current /bsd: granularity=1 dib=0 l(64 bit)=1 present=1 sys=1 type=data, r/w, accessed Dec 12 16:21:47 current /bsd: es=0x0010 rpl=0 base=0x limit=0x a/r=0xa093 Dec 12 16:21:47 current /bsd: granularity=1 dib=0 l(64 bit)=1 present=1 sys=1 type=data, r/w, accessed Dec 12 16:21:47 current /bsd: fs=0x rpl=0 base=0x limit=0x a/r=0x1c000 Dec 12 16:21:47 current /bsd: (unusable) Dec 12 16:21:47 current /bsd: gs=0x rpl=0 base=0x818f6ff0 limit=0x a/r=0x1c000 Dec 12 16:21:47 current /bsd: (unusable) Dec 12 16:21:47 current /bsd: ss=0x0010 rpl=0 base=0x limit=0x a/r=0xa093 Dec 12 16:21:47 current /bsd: granularity=1 dib=0 l(64 bit)=1 present=1 sys=1 type=data, r/w, accessed Dec 12 16:21:47 current /bsd: tr=0x0030 base=0x818f5000 limit=0x0067 a/r=0x008b Dec 12 16:21:47 current /bsd: granularity=0 dib=0 l(64 bit)=0 present=1 sys=0 type=tss (busy) Dec 12 16:21:47 current /bsd: gdtr base=0x818f5068 limit=0x003f Dec 12 16:21:47 current /bsd: idtr base=0x8002 limit=0x0fff Dec 12 16:21:47 current /bsd: ldtr=0x base=0x limit=0x a/r=0x1c000 Dec 12 16:21:47 current /bsd: (unusable) Dec 12 16:21:47 current /bsd: vmm_handle_cpuid: function 0x06 (thermal/power mgt) not supported Dec 12 16:21:47 current /bsd: --Guest MSRs @ 0xfddea339d000 (paddr: 0x005ea339d000)-- Dec 12 16:21:47 current /bsd: MSR 0 @ 0xfddea339d000 : 0xc080 (EFER Dec 12 16:21:47 current /bsd: ), value=0x0d01 (SCE LME LMA NXE) Dec 12 16:21:47 current /bsd: MSR 1 @ 0xfddea339d010 : 0xc081 (STAR), value=0x001b0008 Dec 12 16:21:47 current /bsd: MSR 2 @ 0xfddea339d020 : 0xc082 (LSTAR), value=0x813b8000 Dec 12 16:21:47 current /bsd: MSR 3 @ 
0xfddea339d030 : 0xc083 (CSTAR), value=0x813ba000 Dec 12 16:21:47 current /bsd: MSR 4 @ Dec 12 16:21:47 current /bsd: 0xfddea339d040 : 0xc084 (SFMASK), val Dec 12 16:21:47 current /bsd: ue=0x000 Dec 12 16:21:47 current /bsd: 44701 Mischa On 2022-12-10 23:51, Dave
Re: vmd(8): create a proper e820 bios memory map
Hi Dave, Great stuff!! Everything is patched, build and booted. What is the best way to test this? Start a bunch of VMs with bsd.rd? Does this still need to be a decompressed bsd.rd? Mischa On 2022-12-10 23:51, Dave Voutila wrote: tech@, The below diff tweaks how vmd and vmm define memory ranges (adding a "type" attribute) so we can properly build an e820 memory map to hand to things like SeaBIOS or the OpenBSD ramdisk kernel (when direct booting bsd.rd). Why do it? We've been carrying a few patches to SeaBIOS in the ports tree to hack around how vmd articulates some memory range details. By finally implementing a proper bios memory map table we can drop some of those patches. (Diff to ports@ coming shortly.) Bonus is it cleans up how we were hacking a bios memory map for direct booting ramdisk kernels. Note: the below diff *will* work with the current SeaBIOS (vmm-firmware), so you do *not* need to build the port. You will, however, need to: - build, install, & reboot into a new kernel - make sure you update /usr/include/amd64/vmmvar.h with a copy of symlink to sys/arch/amd64/include/vmmvar.h - rebuild & install vmctl - rebuild & install vmd This should *not* result in any behavioral changes of current vmd guests. If you notice any, especially guests failing to start, please rebuild a kernel with VMM_DEBUG to help diagnose the regression. 
-dv

diff refs/heads/master refs/heads/vmd-e820
commit - a96642fb40af450c6576e205fab247cdbce0b5ed
commit + f3cb01998127d200e95ff9984a7503eb16c2a8d8
blob - 3f7e0ce405ae3c6b0b4a787de341839886f97436
blob + f2a464217838d3f0a50e4131b5b074b315e490fb
--- sys/arch/amd64/amd64/vmm.c
+++ sys/arch/amd64/amd64/vmm.c
@@ -1643,21 +1643,27 @@ vm_create_check_mem_ranges(struct vm_create_params *vc
 	const paddr_t maxgpa = VMM_MAX_VM_MEM_SIZE;
 
 	if (vcp->vcp_nmemranges == 0 ||
-	    vcp->vcp_nmemranges > VMM_MAX_MEM_RANGES)
+	    vcp->vcp_nmemranges > VMM_MAX_MEM_RANGES) {
+		DPRINTF("invalid number of guest memory ranges\n");
 		return (0);
+	}
 
 	for (i = 0; i < vcp->vcp_nmemranges; i++) {
 		vmr = &vcp->vcp_memranges[i];
 
 		/* Only page-aligned addresses and sizes are permitted */
 		if ((vmr->vmr_gpa & PAGE_MASK) || (vmr->vmr_va & PAGE_MASK) ||
-		    (vmr->vmr_size & PAGE_MASK) || vmr->vmr_size == 0)
+		    (vmr->vmr_size & PAGE_MASK) || vmr->vmr_size == 0) {
+			DPRINTF("memory range %zu is not page aligned\n", i);
 			return (0);
+		}
 
 		/* Make sure that VMM_MAX_VM_MEM_SIZE is not exceeded */
 		if (vmr->vmr_gpa >= maxgpa ||
-		    vmr->vmr_size > maxgpa - vmr->vmr_gpa)
+		    vmr->vmr_size > maxgpa - vmr->vmr_gpa) {
+			DPRINTF("exceeded max memory size\n");
 			return (0);
+		}
 
 		/*
 		 * Make sure that all virtual addresses are within the address
@@ -1667,39 +1673,55 @@ vm_create_check_mem_ranges(struct vm_create_params *vc
 		 */
 		if (vmr->vmr_va < VM_MIN_ADDRESS ||
 		    vmr->vmr_va >= VM_MAXUSER_ADDRESS ||
-		    vmr->vmr_size >= VM_MAXUSER_ADDRESS - vmr->vmr_va)
+		    vmr->vmr_size >= VM_MAXUSER_ADDRESS - vmr->vmr_va) {
+			DPRINTF("guest va not within range or wraps\n");
 			return (0);
+		}
 
 		/*
 		 * Specifying ranges within the PCI MMIO space is forbidden.
 		 * Disallow ranges that start inside the MMIO space:
 		 * [VMM_PCI_MMIO_BAR_BASE .. VMM_PCI_MMIO_BAR_END]
 		 */
-		if (vmr->vmr_gpa >= VMM_PCI_MMIO_BAR_BASE &&
-		    vmr->vmr_gpa <= VMM_PCI_MMIO_BAR_END)
+		if (vmr->vmr_type == VM_MEM_RAM &&
+		    vmr->vmr_gpa >= VMM_PCI_MMIO_BAR_BASE &&
+		    vmr->vmr_gpa <= VMM_PCI_MMIO_BAR_END) {
+			DPRINTF("guest RAM range %zu cannot being in mmio range"
+			    " (gpa=0x%lx)\n", i, vmr->vmr_gpa);
 			return (0);
+		}
 
 		/*
 		 * ... and disallow ranges that end inside the MMIO space:
 		 * (VMM_PCI_MMIO_BAR_BASE .. VMM_PCI_MMIO_BAR_END]
 		 */
-		if (vmr->vmr_gpa + vmr->vmr_size > VMM_PCI_MMIO_BAR_BASE &&
-		    vmr->vmr_gpa + vmr->vmr_size <= VMM_PCI_MMIO_BAR_END)
+		if (vmr->vmr_type == VM_MEM_RAM &&
+
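The validation rules in vm_create_check_mem_ranges() can be restated compactly: page-aligned base and size, nonzero size, within the overall memory cap, and RAM ranges must neither start nor end inside the PCI MMIO window. A self-contained sketch follows; the cap and the MMIO window addresses here are assumptions for illustration, not vmm's real constants:

```c
#include <stdint.h>

#define PAGE_MASK	0xfffULL
#define MAX_VM_MEM	(32ULL << 30)		/* assumed cap for the sketch */
#define MMIO_BASE	0xc0000000ULL		/* hypothetical MMIO window */
#define MMIO_END	0xcfffffffULL

/* Returns 1 if the guest physical range is acceptable, 0 otherwise,
 * mirroring the checks quoted from the diff above. */
static int
range_ok(uint64_t gpa, uint64_t size, int is_ram)
{
	/* only page-aligned, nonzero ranges */
	if ((gpa & PAGE_MASK) || (size & PAGE_MASK) || size == 0)
		return 0;
	/* must fit under the cap without wrapping */
	if (gpa >= MAX_VM_MEM || size > MAX_VM_MEM - gpa)
		return 0;
	/* RAM may not start inside the MMIO window: [BASE .. END] */
	if (is_ram && gpa >= MMIO_BASE && gpa <= MMIO_END)
		return 0;
	/* ... nor end inside it: (BASE .. END] */
	if (is_ram && gpa + size > MMIO_BASE && gpa + size <= MMIO_END)
		return 0;
	return 1;
}
```

Note how the is_ram qualifier is exactly what the diff adds: non-RAM (MMIO) ranges are now allowed to live inside the window, since vmd rather than vmm owns that layout decision.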
Re: move vmd vioblk handling to another thread
Hi David, Updated the machine to latest snap and applied the patch. The VMs I have on the machine I am testing with aren't booting properly. OpenBSD/amd64 BOOT 3.53 boot> NOTE: random seed is being reused. booting hd0a:/bsd: 15549720+3695624+345456+0+1171456 [1142327+128+1218000+922932]=0x16f0dc0 entry point at 0x81001000 [ using 3284416 bytes of bsd ELF symbol table ] Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. Copyright (c) 1995-2022 OpenBSD. All rights reserved. https://www.OpenBSD.org OpenBSD 7.1 (GENERIC) #3: Sun May 15 10:25:28 MDT 2022 r...@syspatch-71-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC real mem = 1056952320 (1007MB) avail mem = 1007792128 (961MB) random: good seed from bootblocks mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36e0 (10 entries) bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011 bios0: OpenBSD VMM acpi at bios0 not configured cpu0 at mainbus0: (uniprocessor) cpu0: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz, 2500.01 MHz, 06-2d-07 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,HV,NXE,PAGE1GB,LONG,LAHF,ITSC,MD_CLEAR,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 cpu0: using VERW MDS workaround pvbus0 at mainbus0: OpenBSD pvclock0 at pvbus0 pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00 virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00 viornd0 at virtio0 virtio0: irq 3 virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00 vio0 at virtio1: address fe:e1:bb:6a:00:04 virtio1: irq 5 virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00 vioblk0 at virtio2 scsibus1 at vioblk0: 1 targets sd0 at scsibus1 targ 0 lun 0: sd0: 51200MB, 512 bytes/sector, 104857600 sectors virtio2: irq 6 virtio3 at pci0 
dev 4 function 0 "OpenBSD VMM Control" rev 0x00 vmmci0 at virtio3 virtio3: irq 7 isa0 at mainbus0 isadma0 at isa0 com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo com0: console vscsi0 at root scsibus2 at vscsi0: 256 targets softraid0 at root scsibus3 at softraid0: 256 targets After this nothing happens. Mischa On 2022-11-11 16:52, David Gwynne wrote: this updates a diff i had from a few years ago to move the vioblk handling in vmd into a separate thread. basically disk io in your virtual machine should not block the vcpu from running now. just throwing this out so people can give it a go and kick it around. Index: Makefile === RCS file: /cvs/src/usr.sbin/vmd/Makefile,v retrieving revision 1.28 diff -u -p -r1.28 Makefile --- Makefile10 Nov 2022 11:46:39 - 1.28 +++ Makefile11 Nov 2022 15:51:50 - @@ -5,7 +5,7 @@ PROG= vmd SRCS= vmd.c control.c log.c priv.c proc.c config.c vmm.c SRCS+= vm.c loadfile_elf.c pci.c virtio.c i8259.c mc146818.c -SRCS+= ns8250.c i8253.c dhcp.c packet.c mmio.c +SRCS+= ns8250.c i8253.c dhcp.c packet.c mmio.c task.c SRCS+= parse.y atomicio.c vioscsi.c vioraw.c vioqcow2.c fw_cfg.c SRCS+= vm_agentx.c Index: task.c === RCS file: task.c diff -N task.c --- /dev/null 1 Jan 1970 00:00:00 - +++ task.c 11 Nov 2022 15:51:50 - @@ -0,0 +1,158 @@ +/* $OpenBSD: task.c,v 1.2 2018/06/19 17:12:34 reyk Exp $ */ + +/* + * Copyright (c) 2017 David Gwynne + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "task.h" + +#define ISSET(_v, _m) ((_v) & (_m)) +#define SET(_v, _m)((_v) |= (_m)) +#define CLR(_v, _m)((_v) &= ~(_m)) + +struct taskq { + pthread_t thread; + struct task_list list; + pthread_mutex_t mtx; + pthread_cond_tcv; +}; + +#define TASK_ONQUEUE (1 << 0) + +static void *taskq_run(void *);
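The task.c being added is a small pthread-backed work queue: a mutex and condvar protecting a task list, with a worker thread draining it. A self-contained sketch of the same pattern is below; it is a simplification for illustration, not the vmd code (which also has flags like TASK_ONQUEUE that this omits):

```c
#include <pthread.h>
#include <sys/queue.h>

struct task {
	TAILQ_ENTRY(task) entry;
	void (*fn)(void *);
	void *arg;
};
TAILQ_HEAD(task_list, task);

struct taskq {
	pthread_t thread;
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	struct task_list list;
	int dying;
};

static void *
taskq_run(void *v)
{
	struct taskq *tq = v;
	struct task *t;

	pthread_mutex_lock(&tq->mtx);
	for (;;) {
		while ((t = TAILQ_FIRST(&tq->list)) == NULL && !tq->dying)
			pthread_cond_wait(&tq->cv, &tq->mtx);
		if (t == NULL)
			break;			/* dying and drained */
		TAILQ_REMOVE(&tq->list, t, entry);
		pthread_mutex_unlock(&tq->mtx);
		t->fn(t->arg);			/* run outside the lock */
		pthread_mutex_lock(&tq->mtx);
	}
	pthread_mutex_unlock(&tq->mtx);
	return NULL;
}

static void
task_add(struct taskq *tq, struct task *t)
{
	pthread_mutex_lock(&tq->mtx);
	TAILQ_INSERT_TAIL(&tq->list, t, entry);
	pthread_cond_signal(&tq->cv);
	pthread_mutex_unlock(&tq->mtx);
}

static void
taskq_start(struct taskq *tq)
{
	TAILQ_INIT(&tq->list);
	pthread_mutex_init(&tq->mtx, NULL);
	pthread_cond_init(&tq->cv, NULL);
	tq->dying = 0;
	pthread_create(&tq->thread, NULL, taskq_run, tq);
}

static void
taskq_stop(struct taskq *tq)
{
	pthread_mutex_lock(&tq->mtx);
	tq->dying = 1;
	pthread_cond_signal(&tq->cv);
	pthread_mutex_unlock(&tq->mtx);
	pthread_join(tq->thread, NULL);
}
```

The point of moving vioblk onto such a queue is the one David states: disk I/O runs on the worker thread, so a slow read or write no longer stalls the vcpu thread's run loop.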
Re: vmd(8) fix error handling when hitting rlimit
Hi Dave,

This is great! Had some surprises when allocating 32G, I think the limit
is just a little bit lower than 32G, and indeed... lots of loud silence. :)

Mischa

On 2022-02-27 01:27, Dave Voutila wrote:

Following the discussion on misc@ and a diff from tedu@ [1], here's a
bit more work cleaning up the issue in vmd(8) to prevent vmd dying if a
user tries to create a vm with memory above the rlimit.

I changed tedu's diff a bit to make it less verbose and print a value
that's human readable using fmt_scaled(3). It also only prints the
message about data limits if the error was ENOMEM.

I've also incorporated some feedback off list and follow the error
condition further through vmd. Now vmctl receives a descriptive error
as well.

OK or feedback?

An example trying to start a vm named "test":

Before ==

vmctl:
$ doas vmctl start -Lc -d vm/openbsd.qcow2 -m4G test
vmctl: pipe closed

vmd:
$ doas vmd -d
startup
test: could not allocate guest memory - exiting: Cannot allocate memory
vmm: read vcp id
priv exiting, pid 36084
control exiting, pid 58866

(vmd exits, a dove cries)

After =

vmctl:
$ doas vmctl start -Lc -d vm/openbsd.qcow2 -m4G test
vmctl: start vm command failed: Cannot allocate memory

vmd:
$ doas obj/vmd -d
startup
test: could not allocate guest memory (data limit is 4.0G)
test: failed to start vm

(vmd still alive)

-dv

[1] https://marc.info/?l=openbsd-misc=164581487723923=2

diff e7fa9d3941282eed56a0d5808179cb0e321faae6 /usr/src
blob - 4c6c99f1133cec7cb1e38dfd22e595e4d2023842
file + usr.sbin/vmd/vm.c
--- usr.sbin/vmd/vm.c
+++ usr.sbin/vmd/vm.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -292,17 +293,24 @@ start_vm(struct vmd_vm *vm, int fd)
 	ret = alloc_guest_mem(vcp);

 	if (ret) {
+		struct rlimit lim;
+		char buf[FMT_SCALED_STRSIZE];
+		if (ret == ENOMEM && getrlimit(RLIMIT_DATA, &lim) == 0) {
+			if (fmt_scaled(lim.rlim_cur, buf) == 0)
+				fatalx("could not allocate guest memory (data "
+				    "limit is %s)", buf);
+		}
 		errno = ret;
-		fatal("could not allocate guest memory - exiting");
+		fatal("could not allocate guest memory");
 	}

 	ret = vmm_create_vm(vcp);
 	current_vm = vm;

 	/* send back the kernel-generated vm id (0 on error) */
-	if (write(fd, &vcp->vcp_id, sizeof(vcp->vcp_id)) !=
+	if (atomicio(vwrite, fd, &vcp->vcp_id, sizeof(vcp->vcp_id)) !=
 	    sizeof(vcp->vcp_id))
-		fatal("write vcp id");
+		fatal("failed to send created vm id to vmm process");

 	if (ret) {
 		errno = ret;
@@ -319,10 +327,9 @@ start_vm(struct vmd_vm *vm, int fd)
 		fatal("pledge");

 	if (vm->vm_state & VM_STATE_RECEIVED) {
-		ret = read(vm->vm_receive_fd, &vrp, sizeof(vrp));
-		if (ret != sizeof(vrp)) {
+		ret = atomicio(read, vm->vm_receive_fd, &vrp, sizeof(vrp));
+		if (ret != sizeof(vrp))
 			fatal("received incomplete vrp - exiting");
-		}
 		vrs = vrp.vrwp_regs;
 	} else {
 		/*

blob - a60291c17f1f5bb94572f531bdc7e6b2f6b707d5
file + usr.sbin/vmd/vmd.c
--- usr.sbin/vmd/vmd.c
+++ usr.sbin/vmd/vmd.c
@@ -399,9 +399,9 @@ vmd_dispatch_vmm(int fd, struct privsep_proc *p, struc
 	}

 	if (vmr.vmr_result) {
+		log_warnx("%s: failed to start vm", vcp->vcp_name);
+		vm_remove(vm, __func__);
 		errno = vmr.vmr_result;
-		log_warn("%s: failed to start vm", vcp->vcp_name);
-		vm_remove(vm, __func__);
 		break;
 	}

blob - eb75b4c587884ec43704420ef4172386a5b39bd9
file + usr.sbin/vmd/vmm.c
--- usr.sbin/vmd/vmm.c
+++ usr.sbin/vmd/vmm.c
@@ -51,6 +51,7 @@

 #include "vmd.h"
 #include "vmm.h"
+#include "atomicio.h"

 void	vmm_sighdlr(int, short, void *);
 int	vmm_start_vm(struct imsg *, uint32_t *, pid_t *);
@@ -145,7 +146,7 @@ vmm_dispatch_parent(int fd, struct privsep_proc *p, st
 	case IMSG_VMDOP_START_VM_END:
 		res = vmm_start_vm(imsg, &id, &pid);
 		/* Check if the ID can be mapped correctly */
-		if ((id = vm_id2vmid(id, NULL)) == 0)
+		if (res == 0 && (id = vm_id2vmid(id, NULL)) == 0)
 			res = ENOENT;
 		cmd = IMSG_VMDOP_START_VM_RESPONSE;
 		break;
@@ -615,7 +616,8 @@ vmm_start_vm(struct imsg *imsg, uint32_t *id, pid_t *p
 	struct vmd_vm	*vm;
 	int
Re: nsd 4.3.8
Got it.

Oct 20 20:47:19 name2 nsd[62305]: nsd starting (NSD 4.3.8)
Oct 20 20:47:19 name2 nsd[37128]: nsd started (NSD 4.3.8), pid 31864
Oct 20 22:07:09 name2 nsd[37128]: signal received, shutting down...
Oct 20 22:07:09 name2 nsd[39445]: nsd starting (NSD 4.3.7)
Oct 20 22:07:10 name2 nsd[72021]: nsd started (NSD 4.3.7), pid 192

So far so good.

Mischa

On 2021-10-20 21:56, Florian Obser wrote:

I mean the diff I sent to bugs@ in response to the thread you started
on misc. "Re: NSD exit status 11 on 7.0"

This thread is about upgrading nsd in current, but we also need to fix
7.0. I thought you are running stable in production? Anyway, having the
full upgrade tested is also valuable, so thanks for that. But if you
are running stable please try the patch from bugs@, I want to put that
one into an errata.

On 20 October 2021 21:44:19 CEST, Mischa wrote:

Is the below patch not needed? I did run it without the below patch
first, without any problems. After I applied the below patch and
compiled again.

Mischa

On 2021-10-20 21:34, Florian Obser wrote:

Uhm, could you please try the single patch from the other mail on 7.0?
We are probably not going to syspatch to a new nsd version in 7.0.

On 20 October 2021 21:18:17 CEST, Mischa Peters wrote:

Hi Florian,

Great stuff! Applied both patches and NSD has been running without
crashing since 20:47 CEST.

Oct 20 20:47:19 name2 nsd[62305]: nsd starting (NSD 4.3.8)
Oct 20 20:47:19 name2 nsd[37128]: nsd started (NSD 4.3.8), pid 31864
Oct 20 20:47:30 name2 /bsd: carp24: state transition: BACKUP -> MASTER
Oct 20 20:47:46 name2 /bsd: carp23: state transition: BACKUP -> MASTER

Thanx a lot for the quick patches!!

Mischa

On 2021-10-20 18:27, Florian Obser wrote:

On 2021-10-20 18:24 +02, Florian Obser wrote:

+4.3.8
+
+FEATURES:
+	- Set default for answer-cookie to no. Because in server deployments
+	  with mixed server software, a default of yes causes issues.

sthen and me think that we shouldn't flip-flop between cookie on and
cookie off since we shipped the cookie on default in 7.0. This is on
top of the 4.3.8 diff and reverts that behaviour to cookie on as we
have in 7.0.

OK?

diff --git nsd.conf.5.in nsd.conf.5.in
index 4ee4b1292f9..9ae376f288c 100644
--- nsd.conf.5.in
+++ nsd.conf.5.in
@@ -494,7 +494,7 @@ With the value 0 the rate is unlimited.
 .TP
 .B answer\-cookie:\fR
 Enable to answer to requests containig DNS Cookies as specified in RFC7873.
-Default is no.
+Default is yes.
 .TP
 .B cookie\-secret:\fR <128 bit hex string>
 Servers in an anycast deployment need to be able to verify each other's DNS
diff --git options.c options.c
index 6411959e8c6..d8fe022b412 100644
--- options.c
+++ options.c
@@ -131,7 +131,7 @@ nsd_options_create(region_type* region)
 	opt->tls_service_pem = NULL;
 	opt->tls_port = TLS_PORT;
 	opt->tls_cert_bundle = NULL;
-	opt->answer_cookie = 0;
+	opt->answer_cookie = 1;
 	opt->cookie_secret = NULL;
 	opt->cookie_secret_file = CONFIGDIR"/nsd_cookiesecrets.txt";
 	opt->control_enable = 0;
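For admins who don't want to depend on whichever compiled-in default ships, the cookie behaviour can be pinned explicitly in the config. A sketch of an nsd.conf fragment; the option names follow nsd.conf(5) and the secrets-file path matches the CONFIGDIR default visible in the options.c diff, but verify both against your installed version:

```
# Pin the 7.0 cookie behaviour regardless of the compiled-in default.
server:
	answer-cookie: yes
	cookie-secret-file: "/var/nsd/etc/nsd_cookiesecrets.txt"
```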
Re: vmd(8): simplify vcpu logic, removing uart & net reads
Hi Dave,

> On 3 Jul 2021, at 19:08, Matthias Schmidt wrote:
>
> Hi Dave,
>
> * Dave Voutila wrote:
>> Looking for some broader testing of the following diff. It cleans up
>> some complicated logic predominantly left over from the early days of
>> vmd prior to its having a dedicated device thread.
>>
>> In summary, this diff:
>>
>> - Removes vionet "rx pending" state handling and removes the code path
>> for the vcpu thread to possibly take control of the virtio net device
>> and attempt a read of the underlying tap(4). (virtio.{c,h}, vm.c)
>>
>> - Removes ns8250 "rcv pending" state handling and removes the code path
>> for the vcpu thread to read the pty via com_rcv(). (ns8250.{c,h})
>>
>> In both of the above cases, the event handling thread will be notified
>> of readable data and deal with it.
>>
>> Why remove them? The logic is overly complicated and hard to reason
>> about for zero gain. (This diff results in no intended functional
>> change.) Plus, some of the above logic I helped add to deal with the
>> race conditions and state corruption over a year ago. The logic was
>> needed once upon a time, but shouldn't be needed at present.
>>
>> I've had positive testing feedback from abieber@ so far with at least
>> the ns8250/uart diff, but want to cast a broader net here with both
>> before either part is committed. I debated splitting these up, but
>> they're thematically related.
>
> I have the diff running since one week on -current with stable/current
> and an Archlinux guest and have noticed no regression so far.
>
> Cheers
>
> Matthias

No issues on my side as well.

Mischa
Re: vmd(8): fix vmctl wait state corruption
> On 24 Apr 2021, at 20:56, Dave Voutila wrote: > > > Dave Voutila writes: > >> Dave Voutila writes: >> >>> vmd(8) users of tech@, >>> >>> NOTE: I have no intention to try to commit this prior to 6.9's release >>> due to its complexity, but I didn't want to "wait" to solicit testers or >>> potential feedback. >> >> Freeze is over, so bumping this thread with an updated diff below. >> > > Now that there's been some testing and snaps are building once again, > anyone willing to review & OK? Wanted to confirm here is as well. The patch works well. Ran this patch against -current with ~30 VMs owned by a user account. Issued vmctl stop -aw, ctrl-c-ed every 3-4 VM, and every time the last VM stopped shutdown properly. Even in rapid succession of vmctl stop -aw + ctrl-c, resulting in multiple VMs in "stopping” stage, all worked well. All VMs also started properly without any fsck needed. Mischa >>> I noticed recently that I could not have two vmctl(8) clients "wait" for >>> the same vm to shutdown as one would cancel the other. Worse yet, if you >>> cancel a wait (^C) you can effectively corrupt the state being used for >>> tracking the waiting client preventing future clients from waiting on >>> the vm. >>> >>> It turns out the socket fd of the vmctl(8) client is being sent by the >>> control process as the peerid in the imsg. This fd is being stored on >>> the vmd_vm structure in the vm_peerid member, but this fd only has >>> meaning in the scope of the control process. Consequently: >>> >>> - only 1 value can be stored at a time, meaning only 1 waiting client >>> can exist at a time >>> - since vm_peerid is used for storing if another vmd(8) process is >>> waiting on the vm, vm_peerid can be corrupted by vmctl(8) >>> - the control process cannot update waiting state on vmctl disconnects >>> and since fd's are reused it's possible the message could be sent to a >>> vmctl(8) client performing an operation other than a "wait" >>> >>> The below diff: >>> >>> 1. 
enables support for multiple vmctl(8) clients to wait on the same vm >>> to terminate >>> 2. keeps the wait state in the control process and out of the parent's >>> global vm state, tracking waiting parties in a TAILQ >>> 3. removes the waiting client state on client disconnect/cancellation >>> 4. simplifies vmd(8) by removing IMSG_VMDOP_WAIT_VM_REQUEST handling >>> from the vmm process, which isn't needed (and was partially >>> responsible for the corruption) >>> >> >> Above design still stands, but I've fixed some messaging issues related >> to the fact the parent process was forwarding >> IMSG_VMDOP_TERMINATE_VM_RESPONSE messages directly to the control >> process resulting in duplicate messages. This broke doing a `vmctl stop` >> for all vms (-a) and waiting (-w). It now only forwards errors. >> >>> There are some subsequent tweaks that may follow this diff, specifically >>> one related to the fact I've switched the logic to send >>> IMSG_VMDOP_TERMINATE_VM_EVENT messages to the control process (which >>> makes sense to me) but control relays a IMSG_VMDOP_TERMINATE_VM_RESPONSE >>> message to the waiting vmctl(8) client. I'd need to update vmctl(8) to >>> look for the other event and don't want to complicate the diff further. >>> >>> If any testers out there can try to break this for me it would be much >>> appreciated. :-) >>> >> >> Testers? I'd like to give people a few days to kick the tires before >> asking for OK to commit. 
>> >> -dv >> >> >> Index: control.c >> === >> RCS file: /cvs/src/usr.sbin/vmd/control.c,v >> retrieving revision 1.34 >> diff -u -p -r1.34 control.c >> --- control.c20 Apr 2021 21:11:56 - 1.34 >> +++ control.c21 Apr 2021 17:17:04 - >> @@ -41,6 +41,13 @@ >> >> struct ctl_connlist ctl_conns = TAILQ_HEAD_INITIALIZER(ctl_conns); >> >> +struct ctl_notify { >> +int ctl_fd; >> +uint32_tctl_vmid; >> +TAILQ_ENTRY(ctl_notify) entry; >> +}; >> +TAILQ_HEAD(ctl_notify_q, ctl_notify) ctl_notify_q = >> +TAILQ_HEAD_INITIALIZER(ctl_notify_q); >> void >> control_accept(int, short, void *); >> struct ctl_conn >> @@ -78,7 +85,10 @@ int >> control_di
Re: vmm crash on 6.9-beta
> On 22 Mar 2021, at 15:23, Otto Moerbeek wrote: > On Mon, Mar 22, 2021 at 03:20:37PM +0100, Mischa wrote: >>> On 22 Mar 2021, at 15:18, Otto Moerbeek wrote: >>> On Mon, Mar 22, 2021 at 03:06:40PM +0100, Mischa wrote: >>> >>>>> On 22 Mar 2021, at 15:05, Dave Voutila wrote: >>>>> Otto Moerbeek writes: >>>>>> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote: >>>>>>> Otto Moerbeek writes: >>>>>>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote: >>>>>>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it. >>>>>>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and >>>>>>>>>>>> waiting 240 seconds after each cycle. >>>>>>>>>>>> Similar to the staggered start based on the amount of CPUs. >>>>>>>>>> >>>>>>>>>>> For me this is not enough info to even try to reproduce, I know >>>>>>>>>>> little >>>>>>>>>>> of vmm or vmd and have no idea what "derive" means in this context. >>>>>>>>>> >>>>>>>>>> This is a big bit of information that was missing from the original >>>>>>>>> >>>>>>>>> Well.. could have been better described indeed. :)) >>>>>>>>> " I created 41 additional VMs based on a single qcow2 base image.” >>>>>>>>> >>>>>>>>>> report ;) qcow has a concept of a read-only base image (or 'backing >>>>>>>>>> file') which can be shared between VMs, with writes diverted to a >>>>>>>>>> separate image ('derived image'). >>>>>>>>>> >>>>>>>>>> So e.g. you can create a base image, do a simple OS install for a >>>>>>>>>> particular OS version to that base image, then you stop using that >>>>>>>>>> for a VM and just use it as a base to create derived images from. >>>>>>>>>> You then run VMs using the derived image and make whatever config >>>>>>>>>> changes. If you have a bunch of VMs using the same OS release then >>>>>>>>>> you save some disk space for the common files. 
>>>>>>>>>> >>>>>>>>>> Mischa did you leave a VM running which is working on the base >>>>>>>>>> image directly? That would certainly cause problems. >>>>>>>>> >>>>>>>>> I did indeed. Let me try that again without keeping the base image >>>>>>>>> running. >>>>>>>> >>>>>>>> Right. As a safeguard, I would change the base image to be r/o. >>>>>>> >>>>>>> vmd(8) should treating it r/o...the config process is responsible for >>>>>>> opening the disk files and passing the fd's to the vm process. In >>>>>>> config.c, the call to open(2) for the base images should be using the >>>>>>> flags O_RDONLY | O_NONBLOCK. >>>>>>> >>>>>>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new >>>>>>> disk image I based off the "alpine.qcow2" image: >>>>>>> >>>>>>> 20862 vmd CALL >>>>>>> open(0x7f7d4370,0x26) >>>>>>> 20862 vmd NAMI "/home/dave/vm/new.qcow2" >>>>>>> 20862 vmd RET open 10/0xa >>>>>>> 20862 vmd CALL fstat(10,0x7f7d42b8) >>>>>>> 20862 vmd STRU struct stat { dev=1051, ino=19531847, >>>>>>> mode=-rw--- , nlink=1, uid=1000<"dave">, gid=1000<"dave">, >>>>>>> rdev=78096304, atime=1616420730<"Mar 22 09:45:30 2021">.509011764, >>>>>>> mtime=1616420697<"Mar 22 09:44:57 2021">.189185158, >>>>>>> ctime=1616420697<"Mar 22 09:44:57 2021">.189185158, size=262144, >>&
Re: vmm crash on 6.9-beta
> On 22 Mar 2021, at 15:18, Otto Moerbeek wrote: > > On Mon, Mar 22, 2021 at 03:06:40PM +0100, Mischa wrote: > >>> On 22 Mar 2021, at 15:05, Dave Voutila wrote: >>> Otto Moerbeek writes: >>>> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote: >>>>> Otto Moerbeek writes: >>>>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote: >>>>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson >>>>>>>> wrote: >>>>>>>> >>>>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it. >>>>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and >>>>>>>>>> waiting 240 seconds after each cycle. >>>>>>>>>> Similar to the staggered start based on the amount of CPUs. >>>>>>>> >>>>>>>>> For me this is not enough info to even try to reproduce, I know little >>>>>>>>> of vmm or vmd and have no idea what "derive" means in this context. >>>>>>>> >>>>>>>> This is a big bit of information that was missing from the original >>>>>>> >>>>>>> Well.. could have been better described indeed. :)) >>>>>>> " I created 41 additional VMs based on a single qcow2 base image.” >>>>>>> >>>>>>>> report ;) qcow has a concept of a read-only base image (or 'backing >>>>>>>> file') which can be shared between VMs, with writes diverted to a >>>>>>>> separate image ('derived image'). >>>>>>>> >>>>>>>> So e.g. you can create a base image, do a simple OS install for a >>>>>>>> particular OS version to that base image, then you stop using that >>>>>>>> for a VM and just use it as a base to create derived images from. >>>>>>>> You then run VMs using the derived image and make whatever config >>>>>>>> changes. If you have a bunch of VMs using the same OS release then >>>>>>>> you save some disk space for the common files. >>>>>>>> >>>>>>>> Mischa did you leave a VM running which is working on the base >>>>>>>> image directly? That would certainly cause problems. >>>>>>> >>>>>>> I did indeed. Let me try that again without keeping the base image >>>>>>> running. 
>>>>>> >>>>>> Right. As a safeguard, I would change the base image to be r/o. >>>>> >>>>> vmd(8) should treating it r/o...the config process is responsible for >>>>> opening the disk files and passing the fd's to the vm process. In >>>>> config.c, the call to open(2) for the base images should be using the >>>>> flags O_RDONLY | O_NONBLOCK. >>>>> >>>>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new >>>>> disk image I based off the "alpine.qcow2" image: >>>>> >>>>> 20862 vmd CALL open(0x7f7d4370,0x26) >>>>> 20862 vmd NAMI "/home/dave/vm/new.qcow2" >>>>> 20862 vmd RET open 10/0xa >>>>> 20862 vmd CALL fstat(10,0x7f7d42b8) >>>>> 20862 vmd STRU struct stat { dev=1051, ino=19531847, >>>>> mode=-rw--- , nlink=1, uid=1000<"dave">, gid=1000<"dave">, >>>>> rdev=78096304, atime=1616420730<"Mar 22 09:45:30 2021">.509011764, >>>>> mtime=1616420697<"Mar 22 09:44:57 2021">.189185158, ctime=1616420697<"Mar >>>>> 22 09:44:57 2021">.189185158, size=262144, blocks=256, blksize=32768, >>>>> flags=0x0, gen=0xb64d5d98 } >>>>> 20862 vmd RET fstat 0 >>>>> 20862 vmd CALL kbind(0x7f7d39d8,24,0x2a9349e63ae9950c) >>>>> 20862 vmd RET kbind 0 >>>>> 20862 vmd CALL pread(10,0x7f7d42a8,0x68,0) >>>>> 20862 vmd GIO fd 10 read 104 bytes >>>>> >>>>> "QFI\M-{\0\0\0\^C\0\0\0\0\0\0\0h\0\0\0\f\0\0\0\^P\0\0\0\^E\0\0\0\0\0\0\ >>>>> >>>>> \0\0\0\0\0(\0\0\0\0\0\^A\0\0\0\0\0\0\0\^B\0\0\0\0\0\^A\0\0\0\0\0\0\0\ >>>>> >
Re: vmm crash on 6.9-beta
> On 22 Mar 2021, at 15:05, Dave Voutila wrote: > Otto Moerbeek writes: >> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote: >>> Otto Moerbeek writes: >>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote: >>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson wrote: >>>>>> >>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it. >>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and >>>>>>>> waiting 240 seconds after each cycle. >>>>>>>> Similar to the staggered start based on the amount of CPUs. >>>>>> >>>>>>> For me this is not enough info to even try to reproduce, I know little >>>>>>> of vmm or vmd and have no idea what "derive" means in this context. >>>>>> >>>>>> This is a big bit of information that was missing from the original >>>>> >>>>> Well.. could have been better described indeed. :)) >>>>> " I created 41 additional VMs based on a single qcow2 base image.” >>>>> >>>>>> report ;) qcow has a concept of a read-only base image (or 'backing >>>>>> file') which can be shared between VMs, with writes diverted to a >>>>>> separate image ('derived image'). >>>>>> >>>>>> So e.g. you can create a base image, do a simple OS install for a >>>>>> particular OS version to that base image, then you stop using that >>>>>> for a VM and just use it as a base to create derived images from. >>>>>> You then run VMs using the derived image and make whatever config >>>>>> changes. If you have a bunch of VMs using the same OS release then >>>>>> you save some disk space for the common files. >>>>>> >>>>>> Mischa did you leave a VM running which is working on the base >>>>>> image directly? That would certainly cause problems. >>>>> >>>>> I did indeed. Let me try that again without keeping the base image >>>>> running. >>>> >>>> Right. As a safeguard, I would change the base image to be r/o. 
>>> >>> vmd(8) should treating it r/o...the config process is responsible for >>> opening the disk files and passing the fd's to the vm process. In >>> config.c, the call to open(2) for the base images should be using the >>> flags O_RDONLY | O_NONBLOCK. >>> >>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new >>> disk image I based off the "alpine.qcow2" image: >>> >>> 20862 vmd CALL open(0x7f7d4370,0x26) >>> 20862 vmd NAMI "/home/dave/vm/new.qcow2" >>> 20862 vmd RET open 10/0xa >>> 20862 vmd CALL fstat(10,0x7f7d42b8) >>> 20862 vmd STRU struct stat { dev=1051, ino=19531847, mode=-rw--- >>> , nlink=1, uid=1000<"dave">, gid=1000<"dave">, rdev=78096304, >>> atime=1616420730<"Mar 22 09:45:30 2021">.509011764, mtime=1616420697<"Mar >>> 22 09:44:57 2021">.189185158, ctime=1616420697<"Mar 22 09:44:57 >>> 2021">.189185158, size=262144, blocks=256, blksize=32768, flags=0x0, >>> gen=0xb64d5d98 } >>> 20862 vmd RET fstat 0 >>> 20862 vmd CALL kbind(0x7f7d39d8,24,0x2a9349e63ae9950c) >>> 20862 vmd RET kbind 0 >>> 20862 vmd CALL pread(10,0x7f7d42a8,0x68,0) >>> 20862 vmd GIO fd 10 read 104 bytes >>> >>> "QFI\M-{\0\0\0\^C\0\0\0\0\0\0\0h\0\0\0\f\0\0\0\^P\0\0\0\^E\0\0\0\0\0\0\ >>>\0\0\0\0\0(\0\0\0\0\0\^A\0\0\0\0\0\0\0\^B\0\0\0\0\0\^A\0\0\0\0\0\0\0\ >>> >>> \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^D\0\ >>>\0\0h" >>> 20862 vmd RET pread 104/0x68 >>> 20862 vmd CALL pread(10,0x7f7d4770,0xc,0x68) >>> 20862 vmd GIO fd 10 read 12 bytes >>> "alpine.qcow2" >>> 20862 vmd RET pread 12/0xc >>> 20862 vmd CALL kbind(0x7f7d39d8,24,0x2a9349e63ae9950c) >>> 20862 vmd RET kbind 0 >>> 20862 vmd CALL kbind(0x7f7d39d8,24,0x2a9349e63ae9950c) >>> 20862 vmd RET kbind 0 >>> 20862 vmd CALL __realpath(0x7f7d3ea0,0x7f7d3680) >>> 20862 vmd NAMI "/home/dave/vm/alpine.qcow2" >>> 20862 vmd NAMI "/home/dave/vm/alpine.qcow2" >>> 20862 vmd RET __realpath 0 >>> 20862 vmd CALL open(0x7f7d4370,0x4) >>> 20862 vmd NAMI "/home/dave/vm/alpine.qcow2" >>> 20862 vmd RET open 
11/0xb >>> 20862 vmd CALL fstat(11,0x7f7d42b8) >>> >>> >>> I'm more familiar with the vmd(8) codebase than any ffs stuff, but I >>> don't think the issue is the base image being r/w. >>> >>> -Dave >> >> AFAIKS, the issue is that if you start a vm modifying the base because it >> uses it as a regular image, that r/o open for the other vms does not >> matter a lot, >> >> -OPtto > > Good point. I'm going to look into the feasibility of having the > control[1] process track what disks it's opened and in what mode to see > if there's a way to build in some protection against this from > happening. > > [1] I mistakenly called it the "config" process earlier. I guess that would help a lot of poor souls like myself to not make that mistake again. :) Mischa
Re: vmm crash on 6.9-beta
> On 22 Mar 2021, at 14:30, Otto Moerbeek wrote:
> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
>>> On 22 Mar 2021, at 13:43, Stuart Henderson wrote:
>>>
>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and waiting
>>>>> 240 seconds after each cycle.
>>>>> Similar to the staggered start based on the amount of CPUs.
>>>
>>>> For me this is not enough info to even try to reproduce, I know little
>>>> of vmm or vmd and have no idea what "derive" means in this context.
>>>
>>> This is a big bit of information that was missing from the original
>>
>> Well.. could have been better described indeed. :))
>> "I created 41 additional VMs based on a single qcow2 base image."
>>
>>> report ;) qcow has a concept of a read-only base image (or 'backing
>>> file') which can be shared between VMs, with writes diverted to a
>>> separate image ('derived image').
>>>
>>> So e.g. you can create a base image, do a simple OS install for a
>>> particular OS version to that base image, then you stop using that
>>> for a VM and just use it as a base to create derived images from.
>>> You then run VMs using the derived image and make whatever config
>>> changes. If you have a bunch of VMs using the same OS release then
>>> you save some disk space for the common files.
>>>
>>> Mischa did you leave a VM running which is working on the base
>>> image directly? That would certainly cause problems.
>>
>> I did indeed. Let me try that again without keeping the base image running.
>
> Right. As a safeguard, I would change the base image to be r/o.
>
> I was just looking at your script and scratching my head: why is Mischa
> starting vm01 ...
>
> -Otto

Normally I don't use derived images, was a way to get a bunch of VMs
running quickly to put some more load on veb/vport.

I have moved vm01 out of the way to vm00 and redid the whole process.
Seems much better now.
Thank you all for showing me the way. :) Mischa
Re: vmm crash on 6.9-beta
> On 22 Mar 2021, at 14:27, Bryan Steele wrote:
> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
>>
>>> On 22 Mar 2021, at 13:43, Stuart Henderson wrote:
>>>
>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and waiting
>>>>> 240 seconds after each cycle.
>>>>> Similar to the staggered start based on the amount of CPUs.
>>>
>>>> For me this is not enough info to even try to reproduce, I know little
>>>> of vmm or vmd and have no idea what "derive" means in this context.
>>>
>>> This is a big bit of information that was missing from the original
>>
>> Well.. could have been better described indeed. :))
>> "I created 41 additional VMs based on a single qcow2 base image."
>>
>>> report ;) qcow has a concept of a read-only base image (or 'backing
>>> file') which can be shared between VMs, with writes diverted to a
>>> separate image ('derived image').
>>>
>>> So e.g. you can create a base image, do a simple OS install for a
>>> particular OS version to that base image, then you stop using that
>>> for a VM and just use it as a base to create derived images from.
>>> You then run VMs using the derived image and make whatever config
>>> changes. If you have a bunch of VMs using the same OS release then
>>> you save some disk space for the common files.
>>>
>>> Mischa did you leave a VM running which is working on the base
>>> image directly? That would certainly cause problems.
>>
>> I did indeed. Let me try that again without keeping the base image running.
>>
>> Mischa
>
> I seemed to recall that the base image is not supposed to be modified,
> so this is a pretty big omission.
>
> Per original commit message:
>
> "A limitation of this format is that modifying the base image will
> corrupt the derived image."
>
> https://marc.info/?l=openbsd-cvs=153901633011716=2

Makes a lot of sense. I guess a man page patch is in order.

Mischa
Re: vmm crash on 6.9-beta
> On 22 Mar 2021, at 13:43, Stuart Henderson wrote: > >>> Created a fresh install qcow2 image and derived 35 new VMs from it. >>> Then I started all the VMs in four cycles, 10 VMs per cycle and waiting 240 >>> seconds after each cycle. >>> Similar to the staggered start based on the amount of CPUs. > >> For me this is not enough info to even try to reproduce, I know little >> of vmm or vmd and have no idea what "derive" means in this context. > > This is a big bit of information that was missing from the original Well.. could have been better described indeed. :)) " I created 41 additional VMs based on a single qcow2 base image.” > report ;) qcow has a concept of a read-only base image (or 'backing > file') which can be shared between VMs, with writes diverted to a > separate image ('derived image'). > > So e.g. you can create a base image, do a simple OS install for a > particular OS version to that base image, then you stop using that > for a VM and just use it as a base to create derived images from. > You then run VMs using the derived image and make whatever config > changes. If you have a bunch of VMs using the same OS release then > you save some disk space for the common files. > > Mischa did you leave a VM running which is working on the base > image directly? That would certainly cause problems. I did indeed. Let me try that again without keeping the base image running. Mischa > > >> Would it be possiblet for you to show the exact steps (preferably a >> script) to reproduce the issue? >> >> Though the specific hardware might play a role as well... >> >> -Otto >>> >>> Mischa >>> >>> OpenBSD 6.9-beta (GENERIC.MP) #421: Sun Mar 21 13:17:22 MDT 2021 >>>dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP >>> real mem = 137374924800 (131010MB) >>> avail mem = 133196165120 (127025MB) >>> random: good seed from bootblocks >>> mpath0 at root >>> scsibus0 at mpath0: 256 targets >>> mainbus0 at root >>> bios0 at mainbus0: SMBIOS rev. 
2.7 @ 0xbf42c000 (99 entries) >>> bios0: vendor Dell Inc. version "2.8.0" date 06/26/2019 >>> bios0: Dell Inc. PowerEdge R620 >>> acpi0 at bios0: ACPI 3.0 >>> acpi0: sleep states S0 S4 S5 >>> acpi0: tables DSDT FACP APIC SPCR HPET DMAR MCFG WD__ SLIC ERST HEST BERT >>> EINJ TCPA PC__ SRAT SSDT >>> acpi0: wakeup devices PCI0(S5) PCI1(S5) >>> acpitimer0 at acpi0: 3579545 Hz, 24 bits >>> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat >>> cpu0 at mainbus0: apid 0 (boot processor) >>> cpu0: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.34 MHz, 06-2d-07 >>> cpu0: >>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN >>> cpu0: 256KB 64b/line 8-way L2 cache >>> cpu0: smt 0, core 0, package 0 >>> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges >>> cpu0: apic clock running at 99MHz >>> cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE >>> cpu1 at mainbus0: apid 32 (application processor) >>> cpu1: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 1200.02 MHz, 06-2d-07 >>> cpu1: >>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN >>> cpu1: 256KB 64b/line 8-way L2 cache >>> cpu1: smt 0, core 0, package 1 >>> cpu2 at mainbus0: apid 2 (application processor) >>> cpu2: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.03 MHz, 06-2d-07 >>> cpu2: >>> 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN >>> cpu2: 256KB 64b/line 8-way L2 cache >>> cpu2: smt 0, core 1, package 0 >>> cpu3 at mainbus0: apid 34 (application processor) >>> cpu3: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 26
Re: vmm crash on 6.9-beta
> On 22 Mar 2021, at 13:08, Otto Moerbeek wrote: > On Mon, Mar 22, 2021 at 11:34:25AM +0100, Mischa wrote: > >>> On 21 Mar 2021, at 02:31, Theo de Raadt wrote: >>> Otto Moerbeek wrote: >>>> On Fri, Mar 19, 2021 at 04:15:31PM +, Stuart Henderson wrote: >>>> >>>>> On 2021/03/19 17:05, Jan Klemkow wrote: >>>>>> Hi, >>>>>> >>>>>> I had the same issue a few days ago a server hardware of mine. I just >>>>>> ran 'cvs up'. So, it looks like a generic bug in FFS and not related to >>>>>> vmm. >>>>> >>>>> This panic generally relates to filesystem corruption. If fsck doesn't >>>>> help then recreating which filesystem is triggering it is usually needed. >>>> >>>> Yeah, once in a while we see reports of it. It seems to be some nasty >>>> conspiracy between the generic filesystem code, ffs and fsck_ffs. >>>> Maybe even the device (driver) itself is involved. A possible >>>> underlying issue may be that some operation are re-ordered while they >>>> should not. >>> >>> Yes, it does hint at a reordering. >>> >>>> Now the strange thing is, fsck_ffs *should* be able to repair the >>>> inconsistency, but it appears in some cases it is not, and some bits >>>> on the disk remain to trigger it again. >>> >>> fsck_ffs can only repair one inconsistancy. There are a number of lockstep >>> operations, I suppose we can call them acid-in-lowercase, which allow fsck >>> to determine at which point the crashed system gave up the ghost. fsck then >>> removes the partial operations, leaving a viable filesystem. But if the >>> disk >>> layer lands later writes but not earlier writes, fsck cannot handle it. >> >> I managed to re-create the issue. >> >> Created a fresh install qcow2 image and derived 35 new VMs from it. >> Then I started all the VMs in four cycles, 10 VMs per cycle and waiting 240 >> seconds after each cycle. >> Similar to the staggered start based on the amount of CPUs. >> >> This time it was “only” one VM that was affected by this. VM four that got >> started. 
>>
>> ddb> show panic
>> ffs_valloc: dup alloc
>> ddb> trace
>> db_enter() at db_enter+0x10
>> panic(81dc21b2) at panic+0x12a
>> ffs_inode_alloc(fd803c94ef00,81a4,fd803f7bbf00,800014d728b8) at ffs_inode_alloc+0x442
>> ufs_makeinode(81a4,fd803c930908,800014d72bb0,800014d72c00) at ufs_makeinode+0x7f
>> ufs_create(800014d72960) at ufs_create+0x3c
>> VOP_CREATE(fd803c930908,800014d72bb0,800014d72c00,800014d729c0) at VOP_CREATE+0x4a
>> vn_open(800014d72b80,602,1a4) at vn_open+0x182
>> doopenat(8000c778,ff9c,f8fc28f00f4,601,1b6,800014d72d80) at doopenat+0x1d0
>> syscall(800014d72df0) at syscall+0x315
>> Xsyscall() at Xsyscall+0x128
>> end of kernel
>> end trace frame: 0x7f7be450, count: -10
>>
>> dmesg of the host below.
>
> For me this is not enough info to even try to reproduce, I know little
> of vmm or vmd and have no idea what "derive" means in this context.
>
> Would it be possible for you to show the exact steps (preferably a
> script) to reproduce the issue?

Hopefully the below helps.

If you have vmd running, create a VM (qcow2 format) with the normal installation process. The base image I created with:

vmctl create -s 50G /var/vmm/vm01.qcow2

I have dhcp set up so all the subsequent images will be able to pick up a different IP address. Once that is done, replicate the vm.conf config for all the other VMs. The config I used for the VMs is something like:

vm "vm01" {
    disable
    owner runbsd
    memory 1G
    disk "/var/vmm/vm01.qcow2" format qcow2
    interface tap {
        switch "uplink_veb911"
        lladdr fe:e1:bb:d4:d4:01
    }
}

I replicate them by running something like:

for i in $(jot 39 2); do vmctl create -b /var/vmm/vm01.qcow2 /var/vmm/vm${i}.qcow2; done

This uses the vm01.qcow2 image as the base and creates a derived image from it, which means only changes will be applied to the new image.
I start them with the following script:

#!/bin/sh
SLEEP=240
CPU=$(($(sysctl -n hw.ncpuonline)-2))
COUNTER=0
for i in $(vmctl show | sort | awk '/ - / {print $9}' | xargs); do
    VMS[${COUNTER}]=${i}
    COUNTER=$((${COUNTER}+1))
done
CYCLES=$((${#VMS[*]}/${CPU}+1))
echo "Starting ${#VMS[*]} VMs on ${CPU} CPUs in ${CYCLES} cycle(s), waiting ${SLEEP} seconds after each cycle."
COUNTER=0
for i in ${VMS[*]}; do
    COUNTER=$((${COUNTER}+1))
    vmctl start ${i}
    if [ $COUNTER -eq $CPU ]; then
        sleep ${SLEEP}
        COUNTER=0
    fi
done

This is to make sure they are "settled" and all processes are properly started before starting the next batch of VMs.

> Though the specific hardware might play a role as well…

I can also provide you access to the host itself.

Mischa
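The batching logic of that script reduces to a small portable sketch that runs without vmd(8). Here `start_vm` and the VM names are stand-ins for `vmctl start` and the real guest list, and the delay is set to 0 so the flow can be exercised directly:

```shell
#!/bin/sh
# Start items in batches of $BATCH, pausing between full batches.
# start_vm is a stand-in for 'vmctl start'; the list is hypothetical.
BATCH=3
SLEEP=0     # the real script above uses 240 seconds
start_vm() { echo "start $1"; }

COUNT=0
for vm in vm01 vm02 vm03 vm04 vm05 vm06 vm07; do
    start_vm "$vm"
    COUNT=$((COUNT + 1))
    if [ "$COUNT" -eq "$BATCH" ]; then
        sleep "$SLEEP"   # let the current batch settle
        COUNT=0          # begin the next batch
    fi
done
# Two full batches of 3 ran, leaving one VM in a final partial batch.
```

The counter resets after every full batch, so with 7 items and a batch size of 3 the loop ends with COUNT=1.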
Re: vmm crash on 6.9-beta
> On 21 Mar 2021, at 02:31, Theo de Raadt wrote:
>
> Otto Moerbeek wrote:
>> On Fri, Mar 19, 2021 at 04:15:31PM +, Stuart Henderson wrote:
>>
>>> On 2021/03/19 17:05, Jan Klemkow wrote:
>>>> Hi,
>>>>
>>>> I had the same issue a few days ago on server hardware of mine. I just
>>>> ran 'cvs up'. So, it looks like a generic bug in FFS and not related to
>>>> vmm.
>>>
>>> This panic generally relates to filesystem corruption. If fsck doesn't
>>> help then recreating whichever filesystem is triggering it is usually needed.
>>
>> Yeah, once in a while we see reports of it. It seems to be some nasty
>> conspiracy between the generic filesystem code, ffs and fsck_ffs.
>> Maybe even the device (driver) itself is involved. A possible
>> underlying issue may be that some operations are re-ordered while they
>> should not be.
>
> Yes, it does hint at a reordering.
>
>> Now the strange thing is, fsck_ffs *should* be able to repair the
>> inconsistency, but it appears in some cases it is not, and some bits
>> on the disk remain to trigger it again.
>
> fsck_ffs can only repair one inconsistency. There are a number of lockstep
> operations, I suppose we can call them acid-in-lowercase, which allow fsck
> to determine at which point the crashed system gave up the ghost. fsck then
> removes the partial operations, leaving a viable filesystem. But if the disk
> layer lands later writes but not earlier writes, fsck cannot handle it.

I managed to re-create the issue.

Created a fresh install qcow2 image and derived 35 new VMs from it.
Then I started all the VMs in four cycles, 10 VMs per cycle, waiting 240
seconds after each cycle.
Similar to the staggered start based on the amount of CPUs.

This time “only” one VM was affected: the fourth VM that got started.
ddb> show panic
ffs_valloc: dup alloc
ddb> trace
db_enter() at db_enter+0x10
panic(81dc21b2) at panic+0x12a
ffs_inode_alloc(fd803c94ef00,81a4,fd803f7bbf00,800014d728b8) at ffs_inode_alloc+0x442
ufs_makeinode(81a4,fd803c930908,800014d72bb0,800014d72c00) at ufs_makeinode+0x7f
ufs_create(800014d72960) at ufs_create+0x3c
VOP_CREATE(fd803c930908,800014d72bb0,800014d72c00,800014d729c0) at VOP_CREATE+0x4a
vn_open(800014d72b80,602,1a4) at vn_open+0x182
doopenat(8000c778,ff9c,f8fc28f00f4,601,1b6,800014d72d80) at doopenat+0x1d0
syscall(800014d72df0) at syscall+0x315
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7be450, count: -10

dmesg of the host below.

Mischa

OpenBSD 6.9-beta (GENERIC.MP) #421: Sun Mar 21 13:17:22 MDT 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 137374924800 (131010MB)
avail mem = 133196165120 (127025MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbf42c000 (99 entries)
bios0: vendor Dell Inc. version "2.8.0" date 06/26/2019
bios0: Dell Inc.
PowerEdge R620 acpi0 at bios0: ACPI 3.0 acpi0: sleep states S0 S4 S5 acpi0: tables DSDT FACP APIC SPCR HPET DMAR MCFG WD__ SLIC ERST HEST BERT EINJ TCPA PC__ SRAT SSDT acpi0: wakeup devices PCI0(S5) PCI1(S5) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.34 MHz, 06-2d-07 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE cpu1 at mainbus0: apid 32 (application processor) cpu1: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 1200.02 MHz, 06-2d-07 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 0, package 1 cpu2 at mainbus0: apid 2 (application processor) cpu2: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.03 MHz, 06-2d-07 cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PB
Re: vmm crash on 6.9-beta
> On 16 Mar 2021, at 21:17, Mischa wrote: > On 13 Mar at 09:17, Otto Moerbeek wrote: >> On Sat, Mar 13, 2021 at 12:08:52AM -0800, Mike Larkin wrote: >>> On Wed, Mar 10, 2021 at 08:30:32PM +0100, Mischa wrote: >>>> On 10 Mar at 18:59, Mike Larkin wrote: >>>>> On Wed, Mar 10, 2021 at 03:08:21PM +0100, Mischa wrote: >>>>>> Hi All, >>>>>> >>>>>> Currently I am running 6.9-beta on one of my hosts to test >>>>>> veb(4)/vport(4). >>>>>> >>>>>> root@server14:~ # sysctl kern.version >>>>>> kern.version=OpenBSD 6.9-beta (GENERIC.MP) #385: Mon Mar 8 12:57:12 MST >>>>>> 2021 >>>>>>dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP >>>>>> >>>>>> On order to add some load to the system I created 41 additional VMs >>>>>> based on a single qcow2 base image. >>>>>> A couple of those VMs crashed with the following ddb output. >>>>>> >>>>>> ddb> show panic >>>>>> ffs_valloc: dup alloc >>>>>> ddb> trace >>>>>> db_enter() at db_enter+0x10 >>>>>> panic(81dc0709) at panic+0x12a >>>>>> ffs_inode_alloc(fd80269831e0,8180,fd803f7bb540,800014e1e3e8) >>>>>> at ffs >>>>>> _inode_alloc+0x442 >>>>>> ufs_makeinode(8180,fd8026a386a0,800014e1e6e0,800014e1e730) >>>>>> at ufs_m >>>>>> akeinode+0x7f >>>>>> ufs_create(800014e1e490) at ufs_create+0x3c >>>>>> VOP_CREATE(fd8026a386a0,800014e1e6e0,800014e1e730,800014e1e4f0) >>>>>> at VOP_CREATE+0x4a >>>>>> vn_open(800014e1e6b0,10602,180) at vn_open+0x182 >>>>>> doopenat(800014e8a518,ff9c,70e0e92a500,10601,1b6,800014e1e8b0) >>>>>> at d >>>>>> oopenat+0x1d0 >>>>>> syscall(800014e1e920) at syscall+0x315 >>>>>> Xsyscall() at Xsyscall+0x128 >>>>>> end of kernel >>>>>> end trace frame: 0x7f7e5000, count: -10 >>>>>> >>>>>> Mischa >>>>>> >>>>> >>>>> Probably not vmm(4) related but thanks for reporting! >>>> >>>> Could it be qcow2 related? or is this general disk? At least that is what >>>> I think ffs_ is. :) >>>> >>>> Mischa >>>> >>> >>> likely completely unrelated to anything vmd(8) is doing. 
>>>
>>
>> Apart from kernel/ffs bugs, a dup alloc can also be caused by an
>> inconsistent fs. Please run a *forced* (-f) fsck on the fs (after
>> unmounting, of course).
>>
>> -Otto
>
> Thanx Otto, that indeed did the trick.
> It hasn't happened since and veb/vport seems to hold well.

Was running pkg_add, and during extraction of automake the VM hung with the same panic/trace.

root@server14:~ # vmctl console vm11
Connected to /dev/ttyp8 (speed 115200)

ddb> show panic
ffs_valloc: dup alloc
ddb> trace
db_enter() at db_enter+0x10
panic(81dc5a29) at panic+0x12a
ffs_inode_alloc(fd8035819e20,41ed,fd803f7bb780,800014e84ac8) at ffs_inode_alloc+0x442
ufs_mkdir(800014e84b20) at ufs_mkdir+0x9e
VOP_MKDIR(fd8038c3b820,800014e84c58,800014e84ca8,800014e84b88) at VOP_MKDIR+0x50
domkdirat(80008ef0,ff9c,f7e8cefbd80,1ff) at domkdirat+0xf6
syscall(800014e84e20) at syscall+0x315
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7c6480, count: -8
ddb>

After running fsck the VM isn't right anymore:

reordering libraries:install: unknown group bin
install: unknown group bin
failed.
install: unknown group utmp
starting early daemons: syslogd pflogd ntpd.
starting RPC daemons:.
savecore: no core dump
checking quotas: done.
kvm_mkdb: can't find kmem group: Undefined error: 0
chown: group is invalid: wheel
clearing /tmp

Will spin up a non-derived VM to see if this makes a difference.

Mischa
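For anyone following along, Otto's suggestion amounts to something like the following. The partition name is hypothetical; the filesystem must be unmounted first (or the guest booted single-user), since fsck on a mounted filesystem is unsafe:

```
# Hypothetical partition; run while the filesystem is unmounted.
umount /dev/sd0a
fsck_ffs -f /dev/sd0a    # -f forces a check even if the fs is marked clean
```

Without -f, fsck_ffs skips filesystems whose superblock claims they are clean, which is exactly the case Otto describes where stale bits survive on disk.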
Re: vmm crash on 6.9-beta
On 13 Mar at 09:17, Otto Moerbeek wrote: > On Sat, Mar 13, 2021 at 12:08:52AM -0800, Mike Larkin wrote: > > On Wed, Mar 10, 2021 at 08:30:32PM +0100, Mischa wrote: > > > On 10 Mar at 18:59, Mike Larkin wrote: > > > > On Wed, Mar 10, 2021 at 03:08:21PM +0100, Mischa wrote: > > > > > Hi All, > > > > > > > > > > Currently I am running 6.9-beta on one of my hosts to test > > > > > veb(4)/vport(4). > > > > > > > > > > root@server14:~ # sysctl kern.version > > > > > kern.version=OpenBSD 6.9-beta (GENERIC.MP) #385: Mon Mar 8 12:57:12 > > > > > MST 2021 > > > > > > > > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > > > > > > > > > On order to add some load to the system I created 41 additional VMs > > > > > based on a single qcow2 base image. > > > > > A couple of those VMs crashed with the following ddb output. > > > > > > > > > > ddb> show panic > > > > > ffs_valloc: dup alloc > > > > > ddb> trace > > > > > db_enter() at db_enter+0x10 > > > > > panic(81dc0709) at panic+0x12a > > > > > ffs_inode_alloc(fd80269831e0,8180,fd803f7bb540,800014e1e3e8) > > > > > at ffs > > > > > _inode_alloc+0x442 > > > > > ufs_makeinode(8180,fd8026a386a0,800014e1e6e0,800014e1e730) > > > > > at ufs_m > > > > > akeinode+0x7f > > > > > ufs_create(800014e1e490) at ufs_create+0x3c > > > > > VOP_CREATE(fd8026a386a0,800014e1e6e0,800014e1e730,800014e1e4f0) > > > > > at VOP_CREATE+0x4a > > > > > vn_open(800014e1e6b0,10602,180) at vn_open+0x182 > > > > > doopenat(800014e8a518,ff9c,70e0e92a500,10601,1b6,800014e1e8b0) > > > > > at d > > > > > oopenat+0x1d0 > > > > > syscall(800014e1e920) at syscall+0x315 > > > > > Xsyscall() at Xsyscall+0x128 > > > > > end of kernel > > > > > end trace frame: 0x7f7e5000, count: -10 > > > > > > > > > > Mischa > > > > > > > > > > > > > Probably not vmm(4) related but thanks for reporting! > > > > > > Could it be qcow2 related? or is this general disk? At least that is what > > > I think ffs_ is. 
:)
> > >
> > > Mischa
> >
> > likely completely unrelated to anything vmd(8) is doing.
>
> Apart from kernel/ffs bugs, a dup alloc can also be caused by an
> inconsistent fs. Please run a *forced* (-f) fsck on the fs (after
> unmounting, of course).
>
> -Otto

Thanx Otto, that indeed did the trick.
It hasn't happened since and veb/vport seems to hold well.

Mischa
Re: vmm crash on 6.9-beta
On 10 Mar at 18:59, Mike Larkin wrote: > On Wed, Mar 10, 2021 at 03:08:21PM +0100, Mischa wrote: > > Hi All, > > > > Currently I am running 6.9-beta on one of my hosts to test veb(4)/vport(4). > > > > root@server14:~ # sysctl kern.version > > kern.version=OpenBSD 6.9-beta (GENERIC.MP) #385: Mon Mar 8 12:57:12 MST > > 2021 > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > > > On order to add some load to the system I created 41 additional VMs based > > on a single qcow2 base image. > > A couple of those VMs crashed with the following ddb output. > > > > ddb> show panic > > ffs_valloc: dup alloc > > ddb> trace > > db_enter() at db_enter+0x10 > > panic(81dc0709) at panic+0x12a > > ffs_inode_alloc(fd80269831e0,8180,fd803f7bb540,800014e1e3e8) at > > ffs > > _inode_alloc+0x442 > > ufs_makeinode(8180,fd8026a386a0,800014e1e6e0,800014e1e730) at > > ufs_m > > akeinode+0x7f > > ufs_create(800014e1e490) at ufs_create+0x3c > > VOP_CREATE(fd8026a386a0,800014e1e6e0,800014e1e730,800014e1e4f0) > > at VOP_CREATE+0x4a > > vn_open(800014e1e6b0,10602,180) at vn_open+0x182 > > doopenat(800014e8a518,ff9c,70e0e92a500,10601,1b6,800014e1e8b0) > > at d > > oopenat+0x1d0 > > syscall(800014e1e920) at syscall+0x315 > > Xsyscall() at Xsyscall+0x128 > > end of kernel > > end trace frame: 0x7f7e5000, count: -10 > > > > Mischa > > > > Probably not vmm(4) related but thanks for reporting! Could it be qcow2 related? or is this general disk? At least that is what I think ffs_ is. :) Mischa
vmm crash on 6.9-beta
Hi All,

Currently I am running 6.9-beta on one of my hosts to test veb(4)/vport(4).

root@server14:~ # sysctl kern.version
kern.version=OpenBSD 6.9-beta (GENERIC.MP) #385: Mon Mar 8 12:57:12 MST 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

In order to add some load to the system I created 41 additional VMs based on a single qcow2 base image.
A couple of those VMs crashed with the following ddb output.

ddb> show panic
ffs_valloc: dup alloc
ddb> trace
db_enter() at db_enter+0x10
panic(81dc0709) at panic+0x12a
ffs_inode_alloc(fd80269831e0,8180,fd803f7bb540,800014e1e3e8) at ffs_inode_alloc+0x442
ufs_makeinode(8180,fd8026a386a0,800014e1e6e0,800014e1e730) at ufs_makeinode+0x7f
ufs_create(800014e1e490) at ufs_create+0x3c
VOP_CREATE(fd8026a386a0,800014e1e6e0,800014e1e730,800014e1e4f0) at VOP_CREATE+0x4a
vn_open(800014e1e6b0,10602,180) at vn_open+0x182
doopenat(800014e8a518,ff9c,70e0e92a500,10601,1b6,800014e1e8b0) at doopenat+0x1d0
syscall(800014e1e920) at syscall+0x315
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7e5000, count: -10

Mischa
Re: Port httpd(8) 'strip' directive to relayd(8)
By no means an official OK, but would love to see this in relayd! Mischa > On 3 Jan 2021, at 11:40, Denis Fondras wrote: > > Le Fri, Dec 11, 2020 at 10:53:56AM +, Olivier Cherrier a écrit : >> >> Hello tech@, >> >> Is there any interest for this feature to be commited? >> I find it very useful. Thank you Denis! >> > > Here is an up to date diff, looking for OKs. > > Index: parse.y > === > RCS file: /cvs/src/usr.sbin/relayd/parse.y,v > retrieving revision 1.250 > diff -u -p -r1.250 parse.y > --- parse.y 29 Dec 2020 19:48:06 - 1.250 > +++ parse.y 3 Jan 2021 10:38:26 - > @@ -175,7 +175,7 @@ typedef struct { > %tokenLOOKUP METHOD MODE NAT NO DESTINATION NODELAY NOTHING ON PARENT > PATH > %tokenPFTAG PORT PREFORK PRIORITY PROTO QUERYSTR REAL REDIRECT RELAY > REMOVE > %tokenREQUEST RESPONSE RETRY QUICK RETURN ROUNDROBIN ROUTE SACK > SCRIPT SEND > -%token SESSION SOCKET SPLICE SSL STICKYADDR STYLE TABLE TAG TAGGED TCP > +%token SESSION SOCKET SPLICE SSL STICKYADDR STRIP STYLE TABLE TAG > TAGGED TCP > %tokenTIMEOUT TLS TO ROUTER RTLABEL TRANSPARENT URL WITH TTL RTABLE > %tokenMATCH PARAMS RANDOM LEASTSTATES SRCHASH KEY CERTIFICATE > PASSWORD ECDHE > %tokenEDH TICKETS CONNECTION CONNECTIONS CONTEXT ERRORS STATE CHANGES > CHECKS > @@ -1549,6 +1549,20 @@ ruleopts : METHOD STRING > { > rule->rule_kv[keytype].kv_option = $2; > rule->rule_kv[keytype].kv_type = keytype; > } > + | PATH STRIP NUMBER { > + char*strip = NULL; > + > + if ($3 < 0 || $3 > INT_MAX) { > + yyerror("invalid strip number"); > + YYERROR; > + } > + if (asprintf(, "%lld", $3) <= 0) > + fatal("can't parse strip"); > + keytype = KEY_TYPE_PATH; > + rule->rule_kv[keytype].kv_option = KEY_OPTION_STRIP; > + rule->rule_kv[keytype].kv_value = strip; > + rule->rule_kv[keytype].kv_type = keytype; > + } > | QUERYSTR key_option STRING value { > switch ($2) { > case KEY_OPTION_APPEND: > @@ -2481,6 +2495,7 @@ lookup(char *s) > { "ssl",SSL }, > { "state", STATE }, > { "sticky-address", STICKYADDR }, > + { "strip", STRIP }, > 
{ "style", STYLE }, > { "table", TABLE }, > { "tag",TAG }, > Index: relay.c > === > RCS file: /cvs/src/usr.sbin/relayd/relay.c,v > retrieving revision 1.251 > diff -u -p -r1.251 relay.c > --- relay.c 14 May 2020 17:27:38 - 1.251 > +++ relay.c 3 Jan 2021 10:38:27 - > @@ -214,6 +214,9 @@ relay_ruledebug(struct relay_rule *rule) > case KEY_OPTION_LOG: > fprintf(stderr, "log "); > break; > + case KEY_OPTION_STRIP: > + fprintf(stderr, "strip "); > + break; > case KEY_OPTION_NONE: > break; > } > @@ -227,13 +230,15 @@ relay_ruledebug(struct relay_rule *rule) > break; > } > > + int kvv = (kv->kv_option == KEY_OPTION_STRIP || > + kv->kv_value == NULL); > fprintf(stderr, "%s%s%s%s%s%s ", > kv->kv_key == NULL ? "" : "\"", > kv->kv_key == NULL ? "" : kv->kv_key, > kv->kv_key == NULL ? "" : "\"", > - kv->kv_value == NULL ? "" : " value \"", > + kvv ? "" : " value \"", > kv->kv_value == NULL ? "" : kv->kv_value, > - kv->kv_value == NULL ? "" : "\""); > + kvv ? "" : "\""); > } > > if (rule->rule_tablename[0]) > Index: rela
Re: iwm (7260) on APU2 fails on -current #601
> On 21 Jan 2020, at 11:57, Stefan Sperling wrote: > > On Tue, Jan 21, 2020 at 11:34:28AM +0100, Mischa wrote: >> Hi All, >> >> I have an APU2 with a iwm card which keeps on acting up on a regular basis. >> >> apu2# dmesg | grep iwm >> iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless AC 7260" rev 0x73, >> msi >> iwm0: hw rev 0x140, fw ver 17.3216344376.0, address f8:16:54:06:b9:a9 >> >> After the APU has booted properly it fails to continue to use the device >> pretty quickly. >> Message like below are shown on the console: >> >> iwm0: device timeout >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> iwm0: acquiring device failed >> >> Managed to capture the above when running: sysupgrade >> Is this a hardware problem? > > Not sure. I would try moving the card to another minipcie slot. Good idea. Will give that a try. Thanx! Mischa
iwm (7260) on APU2 fails on -current #601
Hi All,

I have an APU2 with an iwm card which keeps acting up on a regular basis.

apu2# dmesg | grep iwm
iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless AC 7260" rev 0x73, msi
iwm0: hw rev 0x140, fw ver 17.3216344376.0, address f8:16:54:06:b9:a9

After the APU has booted properly it fails to continue to use the device pretty quickly.
Messages like the ones below are shown on the console:

iwm0: device timeout
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed

Managed to capture the above when running: sysupgrade
Is this a hardware problem?

Mischa

apu2# dmesg
OpenBSD 6.6-current (GENERIC.MP) #601: Sun Jan 12 22:51:04 MST 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4259917824 (4062MB)
avail mem = 4118347776 (3927MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev.
2.8 @ 0xcfe96020 (13 entries) bios0: vendor coreboot version "v4.10.0.3" date 11/07/2019 bios0: PC Engines apu2 acpi0 at bios0: ACPI 4.0 acpi0: sleep states S0 S1 S4 S5 acpi0: tables DSDT FACP SSDT MCFG TPM2 APIC HEST IVRS SSDT SSDT HPET acpi0: wakeup devices PWRB(S4) PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PBR8(S4) UOH1(S3) UOH2( acpitimer0 at acpi0: 3579545 Hz, 32 bits acpimcfg0 at acpi0 acpimcfg0: addr 0xf800, bus 0-64 acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: AMD GX-412TC SOC, 998.28 MHz, 16-30-01 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,F cpu0: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cac cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative cpu0: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, IBE cpu1 at mainbus0: apid 1 (application processor) cpu1: AMD GX-412TC SOC, 998.14 MHz, 16-30-01 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,F cpu1: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cac cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative cpu1: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 2 (application processor) cpu2: AMD GX-412TC SOC, 998.14 MHz, 16-30-01 cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,F cpu2: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cac cpu2: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative cpu2: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative cpu2: smt 0, core 2, package 0 cpu3 at mainbus0: apid 3 
(application processor) cpu3: AMD GX-412TC SOC, 998.14 MHz, 16-30-01 cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,F cpu3: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cac cpu3: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative cpu3: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative cpu3: smt 0, core 3, package 0 ioapic0 at mainbus0: apid 4 pa 0xfec0, version 21, 24 pins ioapic1 at mainbus0: apid 5 pa 0xfec2, version 21, 32 pins, remapped acpihpet0 at acpi0: 14318180 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus 1 (PBR4) acpiprt2 at acpi0: bus 2 (PBR5) acpiprt3 at acpi0: bus 3 (PBR6) acpiprt4 at acpi0: bus 4 (PBR7) acpiprt5 at acpi0: bus -1 (PBR8) "ACPI0007" at acpi0 not configured "ACPI0007" at acpi0 not configured "ACPI0007" at acpi0 not configured "ACPI0007" at acpi0 not configured "ACPI0007" at acpi0 not configured "ACPI0007" at acpi0 not configured "ACPI0007" at acpi0 not configured "ACPI0007" at acpi0 not configured acpibtn0 at acpi0: PWRB acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001 extent `acpipci0 pcibus' (0x0 - 0xff), flags=0 extent `acpipci0 pciio' (0x0 - 0x), flags=0 0xcf8 - 0xcff 0x1 - 0x extent `acpipci0 pcimem' (0x0 - 0x), flags=0 0x0 - 0x9 0xe - 0xcfff 0x1 - 0x acpicmos0 at acpi0 amdgpio0 at acpi0: GPIO uid 0 addr 0xfed81500/0x300 irq 7, 184 pins "PRP0001"
Re: [PATCH] staggered start of vms in vm.conf
> On 8 Dec 2019, at 11:08, Pratik Vyas wrote: > > Hi! > > This is an attempt to address 'thundering herd' problem when a lot of > vms are configured in vm.conf. A lot of vms booting in parallel can > overload the host and also mess up tsc calibration in openbsd guests as > it uses PIT which doesn't fire reliably if the host is overloaded. > > > This diff makes vmd start vms in a staggered fashion with default parallelism > of > number of cpus on the host and a delay of 30s. Default can be overridden with > a line like following in vm.conf > > staggered start parallel 4 delay 30 > > > Every non-disabled vm starts in waiting state. If you are eager to > start a vm that is way further in the list, you can vmctl start it. > > Discussed the idea with ori@, mlarkin@ and phessler@. > > Comments / ok? Great addition to stop -w. Like it! Mischa > -- > Pratik > > Index: usr.sbin/vmctl/vmctl.c > === > RCS file: /home/cvs/src/usr.sbin/vmctl/vmctl.c,v > retrieving revision 1.71 > diff -u -p -a -u -r1.71 vmctl.c > --- usr.sbin/vmctl/vmctl.c7 Sep 2019 09:11:14 -1.71 > +++ usr.sbin/vmctl/vmctl.c8 Dec 2019 09:29:39 - > @@ -716,6 +716,8 @@ vm_state(unsigned int mask) > { >if (mask & VM_STATE_PAUSED) >return "paused"; > +else if (mask & VM_STATE_WAITING) > +return "waiting"; >else if (mask & VM_STATE_RUNNING) >return "running"; >else if (mask & VM_STATE_SHUTDOWN) > Index: usr.sbin/vmd/parse.y > === > RCS file: /home/cvs/src/usr.sbin/vmd/parse.y,v > retrieving revision 1.52 > diff -u -p -a -u -r1.52 parse.y > --- usr.sbin/vmd/parse.y14 May 2019 06:05:45 -1.52 > +++ usr.sbin/vmd/parse.y8 Dec 2019 09:29:39 - > @@ -122,7 +122,8 @@ typedef struct { > %tokenINCLUDE ERROR > %tokenADD ALLOW BOOT CDROM DEVICE DISABLE DISK DOWN ENABLE FORMAT GROUP > %tokenINET6 INSTANCE INTERFACE LLADDR LOCAL LOCKED MEMORY NET NIFS OWNER > -%tokenPATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP VM VMID > +%tokenPATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP VM VMID STAGGERED START > +%token PARALLEL DELAY > %tokenNUMBER > 
%tokenSTRING > %typelladdr > @@ -217,6 +218,11 @@ main: LOCAL INET6 { >env->vmd_ps.ps_csock.cs_uid = $3.uid; >env->vmd_ps.ps_csock.cs_gid = $3.gid == -1 ? 0 : $3.gid; >} > +| STAGGERED START PARALLEL NUMBER DELAY NUMBER { > +env->vmd_cfg.cfg_flags |= VMD_CFG_STAGGERED_START; > +env->vmd_cfg.delay.tv_sec = $6; > +env->vmd_cfg.parallelism = $4; > +} >; > switch: SWITCH string{ > @@ -368,6 +374,8 @@ vm: VM string vm_instance{ >} else { >if (vcp_disable) >vm->vm_state |= VM_STATE_DISABLED; > +else > +vm->vm_state |= VM_STATE_WAITING; >log_debug("%s:%d: vm \"%s\" " >"registered (%s)", >file->name, yylval.lineno, > @@ -766,6 +774,7 @@ lookup(char *s) >{ "allow",ALLOW }, >{ "boot",BOOT }, >{ "cdrom",CDROM }, > +{ "delay",DELAY }, >{ "device",DEVICE }, >{ "disable",DISABLE }, >{ "disk",DISK }, > @@ -785,10 +794,13 @@ lookup(char *s) >{ "memory",MEMORY }, >{ "net",NET }, >{ "owner",OWNER }, > +{ "parallel",PARALLEL }, >{ "prefix",PREFIX }, >{ "rdomain",RDOMAIN }, >{ "size",SIZE }, >{ "socket",SOCKET }, > +{ "staggered",STAGGERED }, > +{ "start",START }, >{ "switch",SWITCH }, >{ "up",UP }, >{ "vm",VM } > Index: usr.sbin/vmd/vm.conf.5 > === > RCS file: /home/cvs/src/usr.sbin/vmd/vm.conf.5,v > retrieving revision 1.44 > diff -u -p -a -u -r1.44 vm.conf.5 > --- usr.sbin/vmd/vm.conf.514 May 2019 12:47:17 -1.44 > +++ usr.sbin/vmd/vm.conf.58 D
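For reference, with the staggered-start patch above applied, enabling the feature is a single vm.conf(5) line using the exact values from the proposal (the defaults, per the patch description, are the host's CPU count with a 30-second delay):

```
# vm.conf: start at most 4 VMs in parallel, waiting 30 seconds
# between batches; every non-disabled VM starts in "waiting" state.
staggered start parallel 4 delay 30
```

A VM further down the list can still be started immediately with vmctl start, as noted in the patch mail.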
Re: relayd(8): transparent forward
> On 6 Nov 2019, at 08:25, Stuart Henderson wrote:
>
> On 2019/11/05 20:46, Mischa Peters wrote:
>> When you are using transparent (Direct Server Return) you have to make sure
>> you disable ARP on the servers you are load balancing.
>
> Transparent is not "direct server return", that is done with "route to".

You are right indeed. However, according to the manual, transparent also retains the client IP, so it seems similar in operation, but I guess it's a different use-case.

>> What happens with transparent is that the server gets the client IP as
>> source, not the IP of relayd, and will respond directly to the client from
>> its own IP address. The client is expecting a response from the relayd IP
>> address and doesn’t respond to the server.
>
> The client is expecting a response from the address it sent packets to,
> "transparent" doesn't interfere with this.
>
> There is something fiddly with the config for "transparent" but it should
> be possible to do what OP wants if relayd is on a machine on the network
> path between client and destination (e.g. on a firewall/router).
>
Re: relayd(8): transparent forward
What are you trying to do?

When you are using transparent (Direct Server Return) you have to make sure you disable ARP on the servers you are load balancing. What happens with transparent is that the server gets the client IP as source, not the IP of relayd, and will respond directly to the client from its own IP address. The client is expecting a response from the relayd IP address and doesn’t respond to the server. Since you are going to the same server it might not be a good idea to use transparent. :)

If you want to get the client IP address on your destination you can add the client IP address in a header, if it's HTTP, with headers like:

match request header set "X-ClientIP" value "$REMOTE_ADDR"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"

Hope this helps.

Mischa

--
> On 5 Nov 2019, at 17:38, mp1...@gmx-topmail.de wrote:
>
> The configuration below works fine as soon as I remove the 'transparent'
> keyword but times out when running as transparent forwarder.
>
> What am I missing?
>
> Any help is being appreciated.
> > > - > # relayd.conf > > http protocol "httpsfilter" { >tcp { nodelay, sack } >return error >pass >tls keypair test-site > } > relay "httpsinspect" { >listen on 127.0.0.1 port 8443 tls >protocol "httpsfilter" >transparent forward with tls to destination > } > -- > > -- > # pf.conf > > set skip on lo > block return > pass > pass in on egress inet proto tcp to port https \ >divert-to 127.0.0.1 port 8443 > -- > > > Here's some debug output: > > root:/root:2# relayd -dvv > startup > socket_rlimit: max open files 1024 > socket_rlimit: max open files 1024 > socket_rlimit: max open files 1024 > pfe: filter init done > socket_rlimit: max open files 1024 > relay_load_certfiles: using certificate /etc/ssl/test-site.crt > relay_load_certfiles: using private key /etc/ssl/private/test-site.key > parent_tls_ticket_rekey: rekeying tickets > relay_privinit: adding relay httpsinspect > protocol 1: name httpsfilter >flags: used, return, relay flags: tls, tls client, divert >tcp flags: nodelay, sack >tls flags: tlsv1.2, cipher-server-preference >tls session tickets: disabled >type: http >pass request > ca_engine_init: using RSA privsep engine > ca_engine_init: using RSA privsep engine > ca_engine_init: using RSA privsep engine > ca_engine_init: using RSA privsep engine > init_tables: created 0 tables > relay_tls_ctx_create: loading certificate > relay_tls_ctx_create: loading certificate > relay_tls_ctx_create: loading certificate > relay_launch: running relay httpsinspect > relay_launch: running relay httpsinspect > relay_launch: running relay httpsinspect > relay_tls_transaction: session 1: scheduling on EV_READ > relay httpsinspect, tls session 1 established (1 active) > relay_connect: session 1: forward failed: Operation timed out > relay_close: sessions inflight decremented, now 0 > relay_tls_transaction: session 2: scheduling on EV_READ > relay httpsinspect, tls session 2 established (1 active) > relay_connect: session 2: forward failed: Operation timed out > relay_close: 
sessions inflight decremented, now 0 > ^Ckill_tables: deleted 0 tables > hce exiting, pid 46061 > ca exiting, pid 45725 > flush_rulesets: flushed rules > ca exiting, pid 26171 > ca exiting, pid 87096 > pfe exiting, pid 63649 > relay exiting, pid 69039 > relay exiting, pid 56446 > relay exiting, pid 69591 > parent terminating, pid 49439 > root:/root:3# >
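As a side note on the header approach suggested earlier in the thread: those match rules would sit inside the http protocol block of the quoted config, roughly like this (a sketch based on the quoted relayd.conf, not a tested configuration):

```
http protocol "httpsfilter" {
	tcp { nodelay, sack }
	return error

	# Hand the original client address to the backend in headers
	# instead of relying on "transparent".
	match request header set "X-ClientIP" value "$REMOTE_ADDR"
	match request header append "X-Forwarded-For" value "$REMOTE_ADDR"

	pass
	tls keypair test-site
}
```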
Re: vmd: static address for local interfaces, fix static tapX names
-- > On 25 Oct 2019, at 21:53, Mike Larkin wrote: > > On Fri, Oct 25, 2019 at 07:47:35PM +, Reyk Floeter wrote: >>> On Fri, Oct 25, 2019 at 12:27:25PM -0700, Mike Larkin wrote: >>> On Fri, Oct 25, 2019 at 06:15:59PM +, Reyk Floeter wrote: >>>> Hi, >>>> >>>> the attached diff is rather large and implements two things for vmd: >>>> >>>> 1) Allow to configure static IP address/gateway pairs local interfaces. >>>> 2) Skip statically configured interface names (eg. tap0) when >>>> allocating dynamic interfaces. >>>> >>>> Example: >>>> ---snip--- >>>> vm "foo" { >>>>disable >>>>local interface "tap0" { >>>>address 192.168.0.10/24 192.168.0.1 >>>>} >>>>local interface "tap1" >>>>disk "/home/vm/foo.qcow2" >>>> } >>>> >>>> vm "bar" { >>>>local interface >>>>disk "/home/vm/bar.qcow2" >>>> } >>>> ---snap--- >>>> >>>> >>>> 1) The VM "foo" has two interfaces: The first interface has a fixed >>>> IPv4 address with 192.168.0.1/24 on the gateway and 192.168.0.10/24 on >>>> the VM. 192.168.0.10/24 is assigned to the VM's first NIC via the >>>> built-in DHCP server. The second VM gets a default 100.64.x.x/31 IP. >>> >>> I'm not sure the above description matches what I'm seeing in the vm.conf >>> snippet above. >>> >>> What's "the gateway" here? Is this the host machine, or the actual >>> gateway, perhaps on some other machine? Does this just allow me to specify >>> the host-side tap(4) IP address for a corresponding given VM vio(4) >>> interface? >>> >> >> Ah, OK. I used the terms without explaining them: >> >> With local interfaces, vmd(8) uses two IPs per interface: one for the >> tap(4) on the host, one for the vio(4) on the VM. It configures the >> first one on the host and provides the second one via DHCP. The IP on >> the host IP is the default "gateway" router for the VM. >> > > Ah, I missed the fact that these are not "-i" style interfaces but rather > *local* interfaces (eg, "-L" style). 
>
>> The address syntax is currently reversed:
>>	address "address/prefix" "gateway"
>> Maybe I should change it to
>>	address "gateway" "address/prefix"
>> or
>>	address "address/prefix" gateway "gateway"
>
> I like the last one, but I probably won't be a heavy user of this ...

I think the last one is the most clear; I will be a heavy user of this. :))

Will the IPs which are assigned this way be mapped to the vlan/interface with the same subnet on the host to get out? Or does it still require an assigned interface in vm.conf?

Very cool Reyk! Happy you managed to spend some time on this.

Mischa

> -ml
>
>> I also wonder if we could technically use a non-local IP address for
>> the gateway. I currently enforce that the prefix matches, but I don't
>> enforce that both addresses are in the same subnet.
>>
>> When using the default auto-generated 100.64.0.0/31 method, it uses
>> the first IP in the subnet as the gateway and the second IP for the
>> VM.
>>
>>> And did you mean "The second interface" there instead of the "The second VM"?
>>> (Although I think the description fits for "The second VM" also...)
>>
>> Yes, both, the second interface is correct as well.
>>
>>> I think the idea is sound. As long as we don't end up adding extra command
>>> line args to vmctl to manually configure this, which it doesn't appear we are
>>> doing here. :)
>>
>> I don't want to add it to vmctl either.
>>
>>> I didn't read the diff in great detail, I'll wait until you say you have a
>>> final version.
>>
>> OK, thanks.
>>
>> Reyk
>>
>>> -ml
>>>
>>>> This idea came up when I talked with Mischa at EuroBSDCon about
>>>> OpenBSDAms: instead of using L2 and external static dhcpd for all VMs,
>>>> it could be a solution to use L3 and to avoid bridge(4) and dhcpd(8).
>>>> But it would need a way to serve static IPs via the internal dhcp
>>&
[patch] www.openbsd.org/events.html
Hi All, Does it make sense to add my talk on vmm/vmd at EuroBSDCon to the events.html page? If it does, below is the diff. (Thanx Paul! :)) Mischa Index: events.html === RCS file: /home/OpenBSD/cvs/www/events.html,v retrieving revision 1.1182 diff -u -p -r1.1182 events.html --- events.html 22 Sep 2019 19:36:35 - 1.1182 +++ events.html 4 Oct 2019 07:47:37 - @@ -61,6 +61,8 @@ September 19-22, 2019, Lillehammer, Norw (slides) Marc Espie - Advanced ports toolkit: near-perfect packing-list generation (slides) +Mischa Peters - The OpenBSD hypervisor in the wild, a short story. +(https://2019.eurobsdcon.org/slides/The%20OpenBSD%20hypervisor%20in%20the%20wild,%20a%20short%20story%20-%20Mischa%20Peters.pdf;>slides)
httpd rewrite support and REQUEST_URI - repost
Hi All,

With the XFF patch being committed (thank you very much Theo!), can someone have a look at the patch sent last year?

https://marc.info/?l=openbsd-tech&m=153303654230606

It's a patch by Tim Baumgard which sets the correct REQUEST_URI CGI variable. His git repo is at https://github.com/tbaumgard/openbsd-httpd-rewrite

It's another piece of the puzzle which makes httpd better suited to host even more.

Thanx!
Mischa
Re: httpd: New log format to log X-Forwarded-{For|Port} headers
-- > On 3 May 2019, at 19:19, Theo Buehler wrote: > >> On Mon, Mar 04, 2019 at 02:06:02PM +0100, Bruno Flueckiger wrote: >> Hi, >> >> I've completely reworked my patch for httpd(8). The last patch broke the >> log format combined. And the config option was ugly. This time I've >> added another log format called forwarded. It appends two fields to the >> log format combined: The first field contains the value of the header >> X-Forwarded-For and the second one the value of X-Forwarded-Port. If >> either of the headers is empty or missing a dash (-) is written. >> >> The new log format is compatible with log analyzing tools like Webalizer >> or GoAccess. If you run httpd(8) behind a proxy like relayd(8) the new >> log format finally gives you a way to track the origin of the requests. > > Committed, thanks! Great! You are making a lot of people very happy! Mischa
Re: httpd: New log format to log X-Forwarded-{For|Port} headers
> On 3 May 2019, at 04:59, Theo Buehler wrote: > >> On Fri, Mar 08, 2019 at 10:52:28AM +0100, Reyk Floeter wrote: >> Hi, >> >>> On Mon, Mar 04, 2019 at 02:06:02PM +0100, Bruno Flueckiger wrote: >>> I've completely reworked my patch for httpd(8). The last patch broke the >>> log format combined. And the config option was ugly. This time I've >>> added another log format called forwarded. It appends two fields to the >>> log format combined: The first field contains the value of the header >>> X-Forwarded-For and the second one the value of X-Forwarded-Port. If >>> either of the headers is empty or missing a dash (-) is written. >>> >>> The new log format is compatible with log analyzing tools like Webalizer >>> or GoAccess. If you run httpd(8) behind a proxy like relayd(8) the new >>> log format finally gives you a way to track the origin of the requests. >>> >> >> Your diff looks clean and makes a lot of sense. >> >> Especially since X-Forwarded-For is a feature in relayd that I first >> used and documented around 2006/2007. Adding the forwarded style to >> httpd is a complementary feature in OpenBSD and not something for a >> random external web stack. >> >> OK reyk@ >> >> Anyone else, any objections? > > That would be really nice to have. Did this slip through the cracks or > are there concerns with this diff? > I believe it fell through the cracks. Would be super useful. Mischa >> >> Reyk >> >>> Cheers, >>> Bruno >>> >>> Index: usr.sbin/httpd/httpd.conf.5 >>> === >>> RCS file: /cvs/src/usr.sbin/httpd/httpd.conf.5,v >>> retrieving revision 1.103 >>> diff -u -p -r1.103 httpd.conf.5 >>> --- usr.sbin/httpd/httpd.conf.519 Feb 2019 11:37:26 -1.103 >>> +++ usr.sbin/httpd/httpd.conf.527 Feb 2019 15:26:48 - >>> @@ -450,7 +450,8 @@ The >>> .Ar style >>> can be >>> .Cm common , >>> -.Cm combined >>> +.Cm combined , >>> +.Cm forwarded >>> or >>> .Cm connection . 
>>> The styles >>> @@ -459,6 +460,14 @@ and >>> .Cm combined >>> write a log entry after each request similar to the standard Apache >>> and nginx access log formats. >>> +The style >>> +.Cm forwarded >>> +extends the style >>> +.Cm combined >>> +by appending two fields containing the values of the headers >>> +.Ar X-Forwarded-For >>> +and >>> +.Ar X-Forwarded-Port . >>> The style >>> .Cm connection >>> writes a summarized log entry after each connection, >>> Index: usr.sbin/httpd/httpd.h >>> === >>> RCS file: /cvs/src/usr.sbin/httpd/httpd.h,v >>> retrieving revision 1.143 >>> diff -u -p -r1.143 httpd.h >>> --- usr.sbin/httpd/httpd.h19 Feb 2019 11:37:26 -1.143 >>> +++ usr.sbin/httpd/httpd.h27 Feb 2019 15:26:48 - >>> @@ -437,7 +437,8 @@ SPLAY_HEAD(client_tree, client); >>> enum log_format { >>>LOG_FORMAT_COMMON, >>>LOG_FORMAT_COMBINED, >>> -LOG_FORMAT_CONNECTION >>> +LOG_FORMAT_CONNECTION, >>> +LOG_FORMAT_FORWARDED >>> }; >>> >>> struct log_file { >>> Index: usr.sbin/httpd/parse.y >>> === >>> RCS file: /cvs/src/usr.sbin/httpd/parse.y,v >>> retrieving revision 1.110 >>> diff -u -p -r1.110 parse.y >>> --- usr.sbin/httpd/parse.y19 Feb 2019 11:37:26 -1.110 >>> +++ usr.sbin/httpd/parse.y27 Feb 2019 15:26:48 - >>> @@ -140,7 +140,7 @@ typedef struct { >>> %tokenPROTOCOLS REQUESTS ROOT SACK SERVER SOCKET STRIP STYLE SYSLOG TCP >>> TICKET >>> %tokenTIMEOUT TLS TYPE TYPES HSTS MAXAGE SUBDOMAINS DEFAULT PRELOAD >>> REQUEST >>> %tokenERROR INCLUDE AUTHENTICATE WITH BLOCK DROP RETURN PASS REWRITE >>> -%tokenCA CLIENT CRL OPTIONAL PARAM >>> +%tokenCA CLIENT CRL OPTIONAL PARAM FORWARDED >>> %tokenSTRING >>> %token NUMBER >>> %typeport >>> @@ -1024,6 +1024,11 @@ logstyle: COMMON{ >>>srv_conf->flags |= SRVFLAG_LOG; >>>srv_conf->logformat = LOG_FORMAT_CONNECTION; >>>} >>> +| FORWARDED{ >&g
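With the diff applied, enabling the new style should only take a log directive per server; a hypothetical httpd.conf snippet (server name and root are invented):

```
server "www.example.com" {
	listen on * port 80
	root "/htdocs/www.example.com"
	# combined format plus X-Forwarded-For and X-Forwarded-Port fields
	log style forwarded
}
```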
Re: Conditional sysupgrade
On 27 Apr at 22:57, Florian Obser wrote: > On Sat, Apr 27, 2019 at 09:53:08PM +0200, Mischa Peters wrote: > > Let me know if this needs more work. Love the idea of sysupgrade! > > Please shelf this for now, there is a lot of churn going on in the > tool in private and we are moving very fast. > > There are more subtleties to consider. Ok. Did get some good suggestions on my shell use, so might be able to put them to use at a later stage. Mischa
Re: Conditional sysupgrade
On 27 Apr at 17:52, Florian Obser wrote:
> On Sat, Apr 27, 2019 at 01:23:20PM +0100, Marco Bonetti wrote:
> > Hello folks,
> >
> > First of all congratulations on a new OpenBSD release and thanks for
> > introducing sysupgrade in -current.
> >
> > Before sysupgrade, I was using a custom script for achieving the same
> > result, with the only difference that I was checking if a new snapshot (or
> > release) is available by looking at BUILDINFO before starting the
> > upgrade process.
> >
> > The patch below introduces the same behaviour using SHA256.sig as a control
> > file. If you believe there is a valid use case for reinstalling already
> > applied sets to the running system please let me know and I can add a
> > -f force option.
>
> I see a need for the feature and also for the -f flag. One idea was if
> you messed up your shared libs you just type sysupgrade to
> unbreak things. (Doesn't quite work since not all the tools are
> statically linked.)
>
> I'm not happy with comparing the sha256 file, could you please use
> what(1) to compare the downloaded kernel with the running kernel?
>
> $ sysctl -n kern.version | head -1
> OpenBSD 6.5-current (GENERIC.MP) #32: Fri Apr 26 10:37:48 MDT 2019
> $ what /home/_sysupgrade/bsd.mp | tail -1
> OpenBSD 6.5-current (GENERIC.MP) #32: Fri Apr 26 10:37:48 MDT 2019
>
> You need to check if you are running MP or SP though.
>
> I have also suggested this to Mischa, added to Cc.

As Florian suggested, I compared kern.version to the what(1) output from both bsd and bsd.mp. I personally don't like the repetition in the code, but I don't know how to do this more elegantly. The other thing that might need to be adjusted is when to compare; I chose to do this all the way at the end, before bsd.rd gets copied to bsd.upgrade.

Let me know if this needs more work. Love the idea of sysupgrade!
--- /usr/sbin/sysupgradeFri Apr 26 18:23:15 2019 +++ sysupgrade Sat Apr 27 17:50:15 2019 @@ -149,6 +149,19 @@ unpriv signify -C -p "${SIGNIFY_KEY}" -x SHA256.sig ${SETS} +VERSION=$(sysctl -n kern.version | head -1) +BSDSP=$(what /home/_sysupgrade/bsd | tail -1 | awk '{$1=$1;print}') +BSDMP=$(what /home/_sysupgrade/bsd.mp | tail -1 | awk '{$1=$1;print}') + +if [[ ${VERSION} = ${BSDMP} ]]; then + echo "No update needed" + exit 1 +fi +if [[ ${VERSION} = ${BSDSP} ]]; then + echo "No update needed" + exit 1 +fi + cp bsd.rd /nbsd.upgrade ln /nbsd.upgrade /bsd.upgrade rm /nbsd.upgrade Mischa > > > > > Cheers, > > Marco > > > > Index: usr.sbin/sysupgrade/sysupgrade.8 > > === > > RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.8,v > > retrieving revision 1.2 > > diff -u -p -u -r1.2 sysupgrade.8 > > --- usr.sbin/sysupgrade/sysupgrade.826 Apr 2019 05:54:49 - > > 1.2 > > +++ usr.sbin/sysupgrade/sysupgrade.827 Apr 2019 11:54:40 - > > @@ -28,7 +28,7 @@ > > .Nm > > is a utility to upgrade > > .Ox > > -to the next release or a new snapshot. > > +to the next release or a new snapshot if available. 
> > .Pp > > .Nm > > downloads the necessary files to > > > > Index: usr.sbin/sysupgrade/sysupgrade.sh > > === > > RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.sh,v > > retrieving revision 1.6 > > diff -u -p -u -r1.6 sysupgrade.sh > > --- usr.sbin/sysupgrade/sysupgrade.sh 26 Apr 2019 21:52:39 - > > 1.6 > > +++ usr.sbin/sysupgrade/sysupgrade.sh 27 Apr 2019 11:54:48 - > > @@ -110,7 +110,19 @@ fi > > > > cd ${SETSDIR} > > > > -unpriv -f SHA256.sig ftp -Vmo SHA256.sig ${URL}SHA256.sig > > +unpriv -f SHA256.sig.tmp ftp -Vmo SHA256.sig.tmp ${URL}SHA256.sig > > +TMP_SHA=$(sha256 -q SHA256.sig.tmp) > > + > > +unpriv touch SHA256.sig > > +CUR_SHA=$(sha256 -q SHA256.sig) > > + > > +if [[ "${TMP_SHA}" = "${CUR_SHA}" ]]; then > > + rm SHA256.sig.tmp > > + return 0 > > +fi > > + > > +unpriv cat SHA256.sig.tmp >SHA256.sig > > +rm SHA256.sig.tmp > > > > _KEY=openbsd-${_KERNV[0]%.*}${_KERNV[0]#*.}-base.pub > > _NEXTKEY=openbsd-${NEXT_VERSION%.*}${NEXT_VERSION#*.}-base.pub > > > > -- > I'm not entirely sure you are real. >
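On the repetition Mischa mentions in the version check: the two identical if blocks could be folded into one helper that loops over the candidate kernels. A rough, untested sketch (the function name is made up; the variable contents follow the diff above):

```shell
# Print "No update needed" and fail if the running kernel version
# matches any of the downloaded kernels.
check_kernels() {
	running=$1; shift
	for kernel in "$@"; do
		if [ "$running" = "$kernel" ]; then
			echo "No update needed"
			return 1
		fi
	done
	return 0
}

# In sysupgrade this would be called along the lines of:
#   check_kernels "${VERSION}" "${BSDSP}" "${BSDMP}" || exit 1
```

This keeps a single exit path for the "already up to date" case, so adding further kernels (or changing the message) only touches one place.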
Re: [PATCH] httpd: Write X-Forwarded-For to access.log
> On 12 Feb 2019, at 14:52, Bruno Flueckiger wrote: > > On 12.11.18 12:40, Bruno Flueckiger wrote: >> On 11.11.18 18:43, Claudio Jeker wrote: >>> On Sun, Nov 11, 2018 at 06:32:53PM +0100, Bruno Flueckiger wrote: >>>> On 11.11.18 15:29, Florian Obser wrote: >>>>> On Sun, Nov 11, 2018 at 01:46:06PM +0100, Sebastian Benoit wrote: >>>>>> Bruno Flueckiger(inform...@gmx.net) on 2018.11.11 10:31:34 +0100: >>>>>>> Hi >>>>>>> >>>>>>> When I run httpd(8) behind relayd(8) the access log of httpd contains >>>>>>> the IP address of relayd, but not the IP address of the client. I've >>>>>>> tried to match the logs of relayd(8) and httpd(8) using some scripting >>>>>>> and failed. >>>>>>> >>>>>>> So I've written a patch for httpd(8). It stores the content of the >>>>>>> X-Forwarded-For header in the third field of the log entry: >>>>>>> >>>>>>> www.example.com 192.0.2.99 192.0.2.134 - [11/Nov/2018:09:28:48 ... >>>>>>> >>>>>>> Cheers, >>>>>>> Bruno >>>>>> >>>>>> I'm not sure we should do this unconditionally. With no relayd or other >>>>>> proxy infront of httpd, this is (more) data the client controls. >>>>> >>>>> Isn't what httpd(8) currently logs apache's common log format? If >>>>> people are shoving that through webalizer or something that will >>>>> break. I don't think we can do this without a config option. >>>>> Do we need LogFormat? >>>>> >>>>>> >>>>>> Could this be a problem? >>>>>> >>>>>> code reads ok. >>>>>> >>>>>> /Benno >>>>>> >>>> >>>> I've extended my patch with an option to the log directive. Both log xff >>>> and log combined must be set for a server to log the content of the >>>> X-Forwarded-For header. If only log combined is set the log entries >>>> remain in the well-known format. >>>> >>>> This prevents clients from getting unwanted data in the log by default. >>>> And it makes sure the log format remains default until one decides >>>> actively to change it. 
>>> >>> From my experience with webservices is that today logging the IP is >>> seldomly good enough. Please also include X-Forwarded-Port and maybe >>> X-Forwarded-Proto. >>> In general thanks to CG-NAT logging only IP is a bit pointless. >>> >>> -- >>> :wq Claudio >>> >> >> Thanks for the hint, Claudio. >> >> This version of my diff includes the two headers X-Forwarded-For and >> X-Forwarded-Port. Instead of using the third field of the log entry I >> add two additional fileds to it. The first one contains the value of >> X-Forwarded-For and the second one that of X-Forwarded-Port. >> >> I think that appending two fields might do less harm than replacing one >> field at the beginning of the log entry. I'm not sure that adding >> X-Forwarded-Proto to the log really brings a benefit, so I left it away >> in this diff. > > In the meantime I've run my diff on a webserver. In my experience > webalizer has no problem with the modified log format. GoAccess on the > other hand has troubles reading access.log, but that happens for me with > and without the diff applied. > > I think that most admins would profit from the diff. The setting is > optional so it doesn't affect everybody rightaway. And I believe that > those who enable it are ready to reconfigure whatever log parser they > use. > > Therefore I have reworked my diff so it applies to the -current tree. Would be a very welcome addition to httpd. Mischa > > Index: usr.sbin/httpd/config.c > === > RCS file: /cvs/src/usr.sbin/httpd/config.c,v > retrieving revision 1.55 > diff -u -p -r1.55 config.c > --- usr.sbin/httpd/config.c 20 Jun 2018 16:43:05 - 1.55 > +++ usr.sbin/httpd/config.c 12 Feb 2019 13:37:55 - > @@ -427,6 +427,10 @@ config_getserver_config(struct httpd *en > if ((srv_conf->flags & f) == 0) > srv_conf->flags |= parent->flags & f; > > + f = SRVFLAG_XFF|SRVFLAG_NO_XFF; > + if ((srv_conf->flags & f) == 0) > + srv_
Re: sbin/wsconsctl: show more data
I have to concur with Paul! Saw the new font yesterday and was pleasantly surprised. Very nice! Mischa -- > On 6 Jan 2019, at 15:51, Paul de Weerd wrote: > > Lots of negativity here, so I just wanted to chime in - really love > the new console font! Crisp and easily readable letters, big enough > to be readable, with a reasonable number of letters per line > (${COLUMNS}) en lines per screen (${LINES}). It does mean pretty big > characters on big screens when in console mode, but on big screens I > want to run X anyway, so it's all good. What I understand of the > algorithm to pick the font size makes a lot of sense to me. > > Thank you Frederic for all the effort you put into this font and > making it happen on the console and in X through the fonts/spleen > port! > > Cheers, > > Paul 'WEiRD' de Weerd > > -- >> [<++>-]<+++.>+++[<-->-]<.>+++[<+ > +++>-]<.>++[<>-]<+.--.[-] > http://www.weirdnet.nl/ >
Re: carp though bridge with vmd
Hi Reyk, If there is anything I can supply let me know, but I guess it's simple enough to replicate. Let me check carppeer anyway. Mischa > On 10 Dec 2018, at 09:55, Reyk Floeter wrote: > > Hi, > > as a general note for virtual switches and clouds that don’t support CARP due > to restrictions on multicast and/or additional MACs: I use carppeer and > lladdr of the parent interface in such cases. > > That doesn’t mean that you should need it with vmd and bridge and we have to > look into this. > > Reyk > >> Am 09.12.2018 um 16:56 schrieb Mischa : >> >> Hi All, >> >> Is there a way to get carp working through a bridge? >> I am currently testing to see whether I can have 2 vmd VMs on different >> hosts use carp between them. >> The current state that I am currently at is, both VMs are master. >> >> Setup on both hosts is the same, bridge1 with em0 as interface. >> >> # vm.conf >> switch "uplink_bridge1" { >> interface bridge1 >> } >> vm "lb1" { >> disable >> disk "/home/mischa/vmm/lb1.img" >> interface tap { >> switch "uplink_bridge1" >> } >> } >> >> lb1 carp config: >> inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass carpdev vio0 advbase >> 10 advskew 100 >> >> lb2 carp config: >> inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass carpdev vio0 advbase >> 10 advskew 110 >> >> Is there anything that can be configured on the bridge side? >> >> Mischa >> >
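For reference, Reyk's carppeer suggestion would turn the quoted lb1 config into something roughly like this (the peer address and password are made up; carppeer makes carp send its advertisements unicast to the named peer instead of via multicast, which a bridge or virtual switch may be dropping):

```
inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass mekmitasdigoat \
	carppeer 192.168.0.11 carpdev vio0 advbase 10 advskew 100
```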
Re: carp though bridge with vmd
Hi David,

Yes there is. Currently the machines are directly connected to each other on em0, and the VMs are able to reach each other.

VM1 -> bridge1 -> em0 --- em0 <- bridge1 <- VM2

Mischa

--
> On 10 Dec 2018, at 03:00, David Gwynne wrote:
>
> Is there a shared ethernet network between the bridges on each host?
>
>> On 10 Dec 2018, at 01:56, Mischa wrote:
>>
>> Hi All,
>>
>> Is there a way to get carp working through a bridge?
>> I am currently testing to see whether I can have 2 vmd VMs on different
>> hosts use carp between them.
>> The current state that I am currently at is, both VMs are master.
>>
>> Setup on both hosts is the same, bridge1 with em0 as interface.
>>
>> # vm.conf
>> switch "uplink_bridge1" {
>>   interface bridge1
>> }
>> vm "lb1" {
>>   disable
>>   disk "/home/mischa/vmm/lb1.img"
>>   interface tap {
>>     switch "uplink_bridge1"
>>   }
>> }
>>
>> lb1 carp config:
>> inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass carpdev vio0 advbase
>> 10 advskew 100
>>
>> lb2 carp config:
>> inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass carpdev vio0 advbase
>> 10 advskew 110
>>
>> Is there anything that can be configured on the bridge side?
>>
>> Mischa
carp though bridge with vmd
Hi All,

Is there a way to get carp working through a bridge? I am currently testing to see whether I can have 2 vmd VMs on different hosts use carp between them. The current state is that both VMs are master.

Setup on both hosts is the same: bridge1 with em0 as interface.

# vm.conf
switch "uplink_bridge1" {
	interface bridge1
}
vm "lb1" {
	disable
	disk "/home/mischa/vmm/lb1.img"
	interface tap {
		switch "uplink_bridge1"
	}
}

lb1 carp config:
inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass carpdev vio0 advbase 10 advskew 100

lb2 carp config:
inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass carpdev vio0 advbase 10 advskew 110

Is there anything that can be configured on the bridge side?

Mischa
Re: [PATCH] httpd: Write X-Forwarded-For to access.log
> On 11 Nov 2018, at 18:43, Claudio Jeker wrote: > >> On Sun, Nov 11, 2018 at 06:32:53PM +0100, Bruno Flueckiger wrote: >>> On 11.11.18 15:29, Florian Obser wrote: >>>> On Sun, Nov 11, 2018 at 01:46:06PM +0100, Sebastian Benoit wrote: >>>> Bruno Flueckiger(inform...@gmx.net) on 2018.11.11 10:31:34 +0100: >>>>> Hi >>>>> >>>>> When I run httpd(8) behind relayd(8) the access log of httpd contains >>>>> the IP address of relayd, but not the IP address of the client. I've >>>>> tried to match the logs of relayd(8) and httpd(8) using some scripting >>>>> and failed. >>>>> >>>>> So I've written a patch for httpd(8). It stores the content of the >>>>> X-Forwarded-For header in the third field of the log entry: >>>>> >>>>> www.example.com 192.0.2.99 192.0.2.134 - [11/Nov/2018:09:28:48 ... >>>>> >>>>> Cheers, >>>>> Bruno >>>> >>>> I'm not sure we should do this unconditionally. With no relayd or other >>>> proxy infront of httpd, this is (more) data the client controls. >>> >>> Isn't what httpd(8) currently logs apache's common log format? If >>> people are shoving that through webalizer or something that will >>> break. I don't think we can do this without a config option. >>> Do we need LogFormat? >>> >>>> >>>> Could this be a problem? >>>> >>>> code reads ok. >>>> >>>> /Benno >>>> >> >> I've extended my patch with an option to the log directive. Both log xff >> and log combined must be set for a server to log the content of the >> X-Forwarded-For header. If only log combined is set the log entries >> remain in the well-known format. >> >> This prevents clients from getting unwanted data in the log by default. >> And it makes sure the log format remains default until one decides >> actively to change it. > > From my experience with webservices is that today logging the IP is > seldomly good enough. Please also include X-Forwarded-Port and maybe > X-Forwarded-Proto. > In general thanks to CG-NAT logging only IP is a bit pointless. Or with relayd in front of it. 
:) Welcome addition to httpd. Mischa > > -- > :wq Claudio > >> Index: usr.sbin/httpd/config.c >> === >> RCS file: /cvs/src/usr.sbin/httpd/config.c,v >> retrieving revision 1.55 >> diff -u -p -r1.55 config.c >> --- usr.sbin/httpd/config.c20 Jun 2018 16:43:05 -1.55 >> +++ usr.sbin/httpd/config.c11 Nov 2018 14:45:47 - >> @@ -427,6 +427,10 @@ config_getserver_config(struct httpd *en >>if ((srv_conf->flags & f) == 0) >>srv_conf->flags |= parent->flags & f; >> >> +f = SRVFLAG_XFF|SRVFLAG_NO_XFF; >> +if ((srv_conf->flags & f) == 0) >> +srv_conf->flags |= parent->flags & f; >> + >>f = SRVFLAG_AUTH|SRVFLAG_NO_AUTH; >>if ((srv_conf->flags & f) == 0) { >>srv_conf->flags |= parent->flags & f; >> Index: usr.sbin/httpd/httpd.conf.5 >> === >> RCS file: /cvs/src/usr.sbin/httpd/httpd.conf.5,v >> retrieving revision 1.101 >> diff -u -p -r1.101 httpd.conf.5 >> --- usr.sbin/httpd/httpd.conf.520 Jun 2018 16:43:05 -1.101 >> +++ usr.sbin/httpd/httpd.conf.511 Nov 2018 14:45:47 - >> @@ -455,6 +455,14 @@ If not specified, the default is >> Enable or disable logging to >> .Xr syslog 3 >> instead of the log files. >> +.It Oo Ic no Oc Ic xff >> +Enable or disable logging of the request header >> +.Ar X-Forwarded-For >> +if >> +.Cm log combined >> +is set. This header can be set by a reverse proxy like >> +.Xr relayd 8 >> +and should contain the IP address of the client. >> .El >> .It Ic pass >> Disable any previous >> Index: usr.sbin/httpd/httpd.h >> === >> RCS file: /cvs/src/usr.sbin/httpd/httpd.h,v >> retrieving revision 1.142 >> diff -u -p -r1.142 httpd.h >> --- usr.sbin/httpd/httpd.h11 Oct 2018 09:52:22 -1.142 >> +++ usr.sbin/httpd/httpd.h11 Nov 2018 14:45:47 - >> @@ -400,6 +400,8 @@ SPLAY_HEAD(client_tree, client); >> #define SRVFLAG_DEFAULT_TYPE0x0080 >> #define SRVFLAG_PATH_R
Re: Clean install or upgrade to 6.4 fw_update error
That helps. Thanx! Mischa > On 18 Oct 2018, at 17:47, Stuart Henderson wrote: > > That error message is because there are no syspatch yet, it is not from the > firmware update. > > > On 18 October 2018 16:02:34 Mischa wrote: > >> Hi All, >> >> >> Just ran a couple of updates and clean installs and I am seeing the >> following error during boot. >> >> >> Clean install: >> running rc.firsttime >> Path to firmware: http://firmware.openbsd.org/firmware/6.4/ >> Installing: intel-firmware >> Checking for available binary patches...ftp: Error retrieving file: 404 Not >> Found >> >> >> Upgrade: >> running rc.firsttime >> Path to firmware: http://firmware.openbsd.org/firmware/6.4/ >> Updating: intel-firmware-20180807v0 >> Checking for available binary patches...ftp: Error retrieving file: 404 Not >> Found >> >> >> Doing this by hand after login is successful. >> # fw_update -v intel-firmware-20180807p0v0 >> Path to firmware: http://firmware.openbsd.org/firmware/6.4/ >> Updating: intel-firmware-20180807p0v0 >> >> >> Mischa > > >
Clean install or upgrade to 6.4 fw_update error
Hi All,

Just ran a couple of updates and clean installs and I am seeing the following error during boot.

Clean install:
running rc.firsttime
Path to firmware: http://firmware.openbsd.org/firmware/6.4/
Installing: intel-firmware
Checking for available binary patches...ftp: Error retrieving file: 404 Not Found

Upgrade:
running rc.firsttime
Path to firmware: http://firmware.openbsd.org/firmware/6.4/
Updating: intel-firmware-20180807v0
Checking for available binary patches...ftp: Error retrieving file: 404 Not Found

Doing this by hand after login is successful.
# fw_update -v intel-firmware-20180807p0v0
Path to firmware: http://firmware.openbsd.org/firmware/6.4/
Updating: intel-firmware-20180807p0v0

Mischa
Re: Reuse VM ids.
No idea if the code works yet. Hopefully I can try later. But love the idea.

Mischa

> On 8 Oct 2018, at 04:31, Ori Bernstein wrote:
>
> Keep a list of known vms, and reuse the VM IDs. This means that when using
> '-L', the IP addresses of the VMs are stable.
>
> diff --git usr.sbin/vmd/config.c usr.sbin/vmd/config.c
> index af12b790002..522bae32501 100644
> --- usr.sbin/vmd/config.c
> +++ usr.sbin/vmd/config.c
> @@ -61,7 +61,10 @@ config_init(struct vmd *env)
>  	if (what & CONFIG_VMS) {
>  		if ((env->vmd_vms = calloc(1, sizeof(*env->vmd_vms))) == NULL)
>  			return (-1);
> +		if ((env->vmd_known = calloc(1, sizeof(*env->vmd_known))) == NULL)
> +			return (-1);
>  		TAILQ_INIT(env->vmd_vms);
> +		TAILQ_INIT(env->vmd_known);
>  	}
>  	if (what & CONFIG_SWITCHES) {
>  		if ((env->vmd_switches = calloc(1,
> diff --git usr.sbin/vmd/vmd.c usr.sbin/vmd/vmd.c
> index 18a5e0d3d5d..732691b4381 100644
> --- usr.sbin/vmd/vmd.c
> +++ usr.sbin/vmd/vmd.c
> @@ -1169,6 +1169,27 @@ vm_remove(struct vmd_vm *vm, const char *caller)
>  	free(vm);
>  }
>
> +static uint32_t
> +claim_vmid(const char *name)
> +{
> +	struct name2id *n2i = NULL;
> +
> +	TAILQ_FOREACH(n2i, env->vmd_known, entry)
> +		if (strcmp(n2i->name, name) == 0)
> +			return n2i->id;
> +
> +	if (++env->vmd_nvm == 0)
> +		fatalx("too many vms");
> +	if ((n2i = calloc(1, sizeof(struct name2id))) == NULL)
> +		fatalx("could not alloc vm name");
> +	n2i->id = env->vmd_nvm;
> +	if (strlcpy(n2i->name, name, sizeof(n2i->name)) >= sizeof(n2i->name))
> +		fatalx("overlong vm name");
> +	TAILQ_INSERT_TAIL(env->vmd_known, n2i, entry);
> +
> +	return n2i->id;
> +}
> +
>  int
>  vm_register(struct privsep *ps, struct vmop_create_params *vmc,
>      struct vmd_vm **ret_vm, uint32_t id, uid_t uid)
> @@ -1300,11 +1321,8 @@ vm_register(struct privsep *ps, struct vmop_create_params *vmc,
>  	vm->vm_cdrom = -1;
>  	vm->vm_iev.ibuf.fd = -1;
>
> -	if (++env->vmd_nvm == 0)
> -		fatalx("too many vms");
> -
>  	/* Assign a new internal Id if not specified */
> -	vm->vm_vmid = id == 0 ? env->vmd_nvm : id;
> +	vm->vm_vmid = (id == 0) ? claim_vmid(vcp->vcp_name) : id;
>
>  	log_debug("%s: registering vm %d", __func__, vm->vm_vmid);
>  	TAILQ_INSERT_TAIL(env->vmd_vms, vm, vm_entry);
> diff --git usr.sbin/vmd/vmd.h usr.sbin/vmd/vmd.h
> index b7c012854e8..86fad536e59 100644
> --- usr.sbin/vmd/vmd.h
> +++ usr.sbin/vmd/vmd.h
> @@ -276,6 +276,13 @@ struct vmd_user {
>  };
>  TAILQ_HEAD(userlist, vmd_user);
>
> +struct name2id {
> +	char			name[VMM_MAX_NAME_LEN];
> +	int32_t			id;
> +	TAILQ_ENTRY(name2id)	entry;
> +};
> +TAILQ_HEAD(name2idlist, name2id);
> +
>  struct address {
>  	struct sockaddr_storage	ss;
>  	int			prefixlen;
> @@ -300,6 +307,7 @@ struct vmd {
>
>  	uint32_t		 vmd_nvm;
>  	struct vmlist		*vmd_vms;
> +	struct name2idlist	*vmd_known;
>  	uint32_t		 vmd_nswitches;
>  	struct switchlist	*vmd_switches;
>  	struct userlist		*vmd_users;
>
> --
>    Ori Bernstein
softraid offline
Hi All,

Rebooting after a ddb prompt, which I was unfortunately unable to capture, the softraid configuration seems to have been damaged.

root@j6:~ # dmesg | grep softraid
softraid0 at root
scsibus5 at softraid0: 256 targets
softraid0: trying to bring up sd8 degraded
softraid0: sd8 was not shutdown properly
softraid0: sd8 is offline, will not be brought online

root@j6:~ # bioctl -d sd8
bioctl: Can't locate sd8 device via /dev/bio

root@j6:~ # bioctl -c 5 -l /dev/sd3a,/dev/sd4a,/dev/sd5a,/dev/sd6a softraid0
softraid0: trying to bring up sd8 degraded
softraid0: sd8 was not shutdown properly
softraid0: sd8 is offline, will not be brought online

root@j6:~ # bioctl -R /dev/sd3a sd8
bioctl: Can't locate sd8 device via /dev/bio

All the /dev/sd8* nodes are there, so it looks like I am stuck with Schrödinger's softraid: it's both there and not there. Does anybody have pointers on how to get it back, or to remove it completely?

Thanx!
Mischa
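If the volume really is unrecoverable, one approach sometimes suggested is clearing the softraid metadata at the start of each chunk partition so the kernel stops trying to assemble the dead volume. This is my own assumption, not an answer from the thread, and it destroys the volume; the sketch below therefore runs against a scratch file standing in for a raw partition like /dev/rsd3a:

```shell
# Sketch (assumption): softraid(4) keeps its metadata at the start of
# each chunk partition; zeroing those sectors on every chunk and
# rebooting makes the kernel forget the half-dead volume.
# THIS DESTROYS THE VOLUME -- shown on a scratch file, not a real disk.
chunk=$(mktemp)                                             # stand-in for /dev/rsd3a
dd if=/dev/urandom of="$chunk" bs=512 count=64 2>/dev/null  # fake old metadata
dd if=/dev/zero of="$chunk" bs=512 count=64 conv=notrunc 2>/dev/null  # wipe it
echo "non-zero bytes left: $(tr -d '\0' < "$chunk" | wc -c | tr -d ' ')"
# prints: non-zero bytes left: 0
```

On real hardware you would repeat the wipe for each of /dev/rsd3a, /dev/rsd4a, /dev/rsd5a and /dev/rsd6a and reboot; verify this against current bioctl(8)/softraid(4) documentation before trying it.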
[diff] copyright notice index.html
Hi tech@,

Just noticed index.html still has 2017 in the copyright notice.

Mischa

Index: index.html
===================================================================
RCS file: /cvs/www/index.html,v
retrieving revision 1.726
diff -r1.726 index.html
7c7
<
Re: [diff] httpd.conf.5 - consistent IPs, added examples
> On 22 Jun 2018, at 22:20, Sebastian Benoit wrote:
>
> Jason McIntyre (j...@kerhand.co.uk) on 2018.06.22 19:38:55 +0100:
>>> On Fri, Jun 22, 2018 at 12:08:07AM +0200, Mischa wrote:
>>> Hi tech@,
>>>
>>> Changed httpd.conf.5 to be more consistent with IPs used, all documentation
>>> IPs now.
>>> And added a couple of examples. 2 for dynamic pages, cgi and php.
>>> One fairly commonly used rewrite rule for things like wordpress.
>>>
>>> Mischa
>>
>> hi.
>>
>> the diff reads ok to me, but i cannot easily verify it because it does
>> not apply cleanly.
>
> The parts that dont apply are already commited parts from reyks rewrite
> diff. They can be ignored.

My mistake. Will take a clean checkout.

> However i dont know if we want all those examples. We will get them in
> package readmes eventually.

I can also check out the pkg-readmes and expand those, as the php ones don't have any mention of what httpd.conf needs. Will check out WordPress as well. I do think adding extra examples can be helpful, as it's probably a knee-jerk reaction for people to go to the man page, or man.openbsd.org.

Mischa

>
> Reyk?
>
>>
>> reyk?
>>
>> jmc
>>
>>> Index: httpd.conf.5
>>> ===================================================================
>>> RCS file: /cvs/src/usr.sbin/httpd/httpd.conf.5,v
>>> retrieving revision 1.100
>>> diff -u -p -r1.100 httpd.conf.5
>>> --- httpd.conf.5	18 Jun 2018 06:04:25 -0000	1.100
>>> +++ httpd.conf.5	21 Jun 2018 21:55:38 -0000
>>> @@ -97,7 +97,7 @@ Macros are not expanded inside quotes.
>>>  .Pp
>>>  For example:
>>>  .Bd -literal -offset indent
>>> -ext_ip="10.0.0.1"
>>> +ext_ip="203.0.113.1"
>>>  server "default" {
>>>  	listen on $ext_ip port 80
>>>  }
>>> @@ -198,6 +198,8 @@ argument can be used with return codes i
>>>  .Sq Location:
>>>  header for redirection to a specified URI.
>>>  .Pp
>>> +It is possible to rewrite the request to redirect it to a different
>>> +external location.
>>>  The
>>>  .Ar uri
>>>  may contain predefined macros that will be expanded at runtime:
>>> @@ -396,10 +398,10 @@ the
>>>  using pattern matching instead of shell globbing rules,
>>>  see
>>>  .Xr patterns 7 .
>>> -The pattern may contain captures that can be used in the
>>> -.Ar uri
>>> -of an enclosed
>>> +The pattern may contain captures that can be used in an enclosed
>>>  .Ic block return
>>> +or
>>> +.Ic request rewrite
>>>  option.
>>>  .It Oo Ic no Oc Ic log Op Ar option
>>>  Set the specified logging options.
>>> @@ -462,6 +464,19 @@ in a location.
>>>  Configure the options for the request path.
>>>  Valid options are:
>>>  .Bl -tag -width Ds
>>> +.It Oo Ic no Oc Ic rewrite Ar path
>>> +Enable or disable rewriting of the request.
>>> +Unlike the redirection with
>>> +.Ic block return ,
>>> +this will change the request path internally before
>>> +.Nm httpd
>>> +makes a final decision about the matching location.
>>> +The
>>> +.Ar path
>>> +argument may contain predefined macros that will be expanded at runtime.
>>> +See the
>>> +.Ic block return
>>> +option for the list of supported macros.
>>>  .It Ic strip Ar number
>>>  Strip
>>>  .Ar number
>>> @@ -699,7 +714,7 @@ server "www.b.example.com" {
>>>  }
>>>
>>>  server "intranet.example.com" {
>>> -	listen on 10.0.0.1 port 80
>>> +	listen on 192.0.2.1 port 80
>>>  	root "/htdocs/intranet.example.com"
>>>  }
>>>  .Ed
>>> @@ -709,12 +724,43 @@ Simple redirections can be configured wi
>>>  directive:
>>>  .Bd -literal -offset indent
>>>  server "example.com" {
>>> -	listen on 10.0.0.1 port 80
>>> +	listen on 203.0.113.1 port 80
>>>  	block return 301 "http://www.example.com$REQUEST_URI"
>>>  }
>>>
>>>  server "www.example.com" {
>>> -	listen on 10.0.0.1 port 80
>>> +	listen on 203.0.113.1 port 80
>>> +}
>>> +.Ed
>>> +.Pp
>>> +Serving dynamic pages can be defined with the
>>> +.Ic location
>>> +directive:
>>> +.Bd -literal -offset indent
>>> +server "www.example.com" {
>>> +	listen on * port 80
>>> +	location "/*.cgi*" {
>>> +		fastcgi
>>> +		root "/cgi-bin/"
>>> +	}
>>> +	location "/*.php*" {
>>> +		fastcgi socket "/run/php-fpm.sock"
>>> +	}
>>> +}
>>> +.Ed
>>> +.Pp
>>> +The request can also be rewritten with the
>>> +.Ic request rewrite
>>> +directive:
>>> +.Bd -literal -offset indent
>>> +server "www.example.com" {
>>> +	listen on * port 80
>>> +	location match "/old/(.*)" {
>>> +		request rewrite "/new/%1"
>>> +	}
>>> +	location match "/([%a%d]+)" {
>>> +		request rewrite "/dynamic/index.php?q=%1"
>>> +	}
>>>  }
>>>  .Ed
>>>  .Sh SEE ALSO
[diff] httpd.conf.5 - consistent IPs, added examples
Hi tech@,

Changed httpd.conf.5 to be more consistent with the IPs used; all documentation IPs now. Also added a couple of examples: two for dynamic pages (CGI and PHP) and one fairly commonly used rewrite rule for things like WordPress.

Mischa

Index: httpd.conf.5
===================================================================
RCS file: /cvs/src/usr.sbin/httpd/httpd.conf.5,v
retrieving revision 1.100
diff -u -p -r1.100 httpd.conf.5
--- httpd.conf.5	18 Jun 2018 06:04:25 -0000	1.100
+++ httpd.conf.5	21 Jun 2018 21:55:38 -0000
@@ -97,7 +97,7 @@ Macros are not expanded inside quotes.
 .Pp
 For example:
 .Bd -literal -offset indent
-ext_ip="10.0.0.1"
+ext_ip="203.0.113.1"
 server "default" {
 	listen on $ext_ip port 80
 }
@@ -198,6 +198,8 @@ argument can be used with return codes i
 .Sq Location:
 header for redirection to a specified URI.
 .Pp
+It is possible to rewrite the request to redirect it to a different
+external location.
 The
 .Ar uri
 may contain predefined macros that will be expanded at runtime:
@@ -396,10 +398,10 @@ the
 using pattern matching instead of shell globbing rules,
 see
 .Xr patterns 7 .
-The pattern may contain captures that can be used in the
-.Ar uri
-of an enclosed
+The pattern may contain captures that can be used in an enclosed
 .Ic block return
+or
+.Ic request rewrite
 option.
 .It Oo Ic no Oc Ic log Op Ar option
 Set the specified logging options.
@@ -462,6 +464,19 @@ in a location.
 Configure the options for the request path.
 Valid options are:
 .Bl -tag -width Ds
+.It Oo Ic no Oc Ic rewrite Ar path
+Enable or disable rewriting of the request.
+Unlike the redirection with
+.Ic block return ,
+this will change the request path internally before
+.Nm httpd
+makes a final decision about the matching location.
+The
+.Ar path
+argument may contain predefined macros that will be expanded at runtime.
+See the
+.Ic block return
+option for the list of supported macros.
 .It Ic strip Ar number
 Strip
 .Ar number
@@ -699,7 +714,7 @@ server "www.b.example.com" {
 }

 server "intranet.example.com" {
-	listen on 10.0.0.1 port 80
+	listen on 192.0.2.1 port 80
 	root "/htdocs/intranet.example.com"
 }
 .Ed
@@ -709,12 +724,43 @@ Simple redirections can be configured wi
 directive:
 .Bd -literal -offset indent
 server "example.com" {
-	listen on 10.0.0.1 port 80
+	listen on 203.0.113.1 port 80
 	block return 301 "http://www.example.com$REQUEST_URI"
 }

 server "www.example.com" {
-	listen on 10.0.0.1 port 80
+	listen on 203.0.113.1 port 80
+}
+.Ed
+.Pp
+Serving dynamic pages can be defined with the
+.Ic location
+directive:
+.Bd -literal -offset indent
+server "www.example.com" {
+	listen on * port 80
+	location "/*.cgi*" {
+		fastcgi
+		root "/cgi-bin/"
+	}
+	location "/*.php*" {
+		fastcgi socket "/run/php-fpm.sock"
+	}
+}
+.Ed
+.Pp
+The request can also be rewritten with the
+.Ic request rewrite
+directive:
+.Bd -literal -offset indent
+server "www.example.com" {
+	listen on * port 80
+	location match "/old/(.*)" {
+		request rewrite "/new/%1"
+	}
+	location match "/([%a%d]+)" {
+		request rewrite "/dynamic/index.php?q=%1"
+	}
 }
 .Ed
 .Sh SEE ALSO
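The WordPress-style rule mentioned above could look roughly like this, combining the CGI/PHP and rewrite examples from the diff. This is a sketch under assumptions: the `request rewrite` keyword comes from reyk's rewrite support, the server name and socket path are illustrative, and a real permalink setup may need a different pattern:

```
server "blog.example.com" {
	listen on * port 80
	directory index "index.php"
	location "/*.php*" {
		fastcgi socket "/run/php-fpm.sock"
	}
	# Pretty permalinks: route anything else to the front controller.
	location match "/([%a%d%-/]+)$" {
		request rewrite "/index.php?p=%1"
	}
}
```

The capture uses patterns(7) classes (`%a` letters, `%d` digits, `%-` a literal dash) rather than regular expressions.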
Re: [patch] Skip background scan if bssid is set
> On 29 Apr 2018, at 11:43, Stuart Henderson <s...@spacehopper.org> wrote:
>
>> On 2018/04/29 10:17, Stefan Sperling wrote:
>>> On Sun, Apr 29, 2018 at 03:39:07AM +0200, Jesper Wallin wrote:
>>> Hi all,
>>>
>>> I recently learned that my AP behaves badly and I have packet loss when
>>> the background scan is running. I had a small chat with stsp@ about it,
>>> asking if there is a way to disable it. He kindly explained that if I'm
>>> connected to an AP with a weak signal, it will try to find another AP
>>> with better signal and use that one instead.
>>>
>>> Sadly, I only have a single AP at home and this doesn't really solve my
>>> problem. Though, you can also set a desired bssid to use, to force it
>>> to connect to a single AP. However, the background scan will still run
>>> even if this is set.
>>>
>>> Maybe the background scan has other use-cases that I'm not aware of, if
>>> so, I apologize in advance. The patch below simply check if a bssid is
>>> specified and if so, skip the background scan.
>>
>> I agree, even though it would be nice to understand the underlying
>> packet loss issue. But I cannot reproduce the problem unforunately :(
>> Have you verified that the problem only happens on this particular AP?
>
> It's very common for wifi clients to do background scans so I'd be
> interested to know whether non-OpenBSD clients also see packet loss,
> or whether OpenBSD with a different client device is any better. What
> are the AP and client devices? Are other firmware versions available? I
> guess bg scan must use power-saving to queue frames while the client is
> off channel so maybe the issue relates to this.
>
> I'm wondering if changing this may introduce problems when an AP moves
> to a different channel? Either by manual configuration, mechanisms
> like Ruckus' channelfly (still possible on single-AP even without a
> controller), radar detect on 5GHz, or even something as simple as
> rebooting an AP set to "auto" channel.
How does this play with roaming protocols on “enterprise” WiFi equipment, like 802.11k and 802.11v? Mischa
iwm0 doesn't connect automatically after reboot
acpi0: bus -1 (RP22) acpiprt23 at acpi0: bus -1 (RP23) acpiprt24 at acpi0: bus -1 (RP24) acpicpu0 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpicpu1 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpicpu2 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpicpu3 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpipwrres0 at acpi0: PUBS, resource for XHC_ acpipwrres1 at acpi0: WRST acpipwrres2 at acpi0: WRST acpitz0 at acpi0: critical temperature is 128 degC acpithinkpad0 at acpi0 acpiac0 at acpi0: AC unit offline acpibat0 at acpi0: BAT0 model "45N1113" serial 4754 type LION oem "LGC" acpibat1 at acpi0: BAT1 model "45N1775" serial 2821 type LION oem "SANYO" "INT3F0D" at acpi0 not configured "LEN0071" at acpi0 not configured "LEN2046" at acpi0 not configured "INT3515" at acpi0 not configured acpibtn0 at acpi0: SLPB "PNP0C14" at acpi0 not configured acpibtn1 at acpi0: LID_ "PNP0C14" at acpi0 not configured "PNP0C14" at acpi0 not configured "PNP0C14" at acpi0 not configured "INT3394" at acpi0 not configured acpivideo0 at acpi0: GFX0 acpivout at acpivideo0 not configured cpu0: Enhanced SpeedStep 1296 MHz: speeds: 2701, 2700, 2600, 2500, 2400, 2200, 2000, 1800, 1600, 1500, 1300, 1100, 800, 700, 600, 400 MHz pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel Core 7G Host" rev 0x02 inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 620" rev 0x02 drm0 at inteldrm0 inteldrm0: msi error: [drm:pid0:i915_firmware_load_error_print] *ERROR* failed to load firmware i915/kbl_dmc_ver1.bin (-22) inteldrm0: 1920x1080, 32bpp wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation) wsdisplay0: screen 1-5 added (std, vt100 emulation) xhci0 at pci0 dev 20 function 0 "Intel 100 Series xHCI" rev 0x21: msi usb0 at xhci0: USB revision 3.0 uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 
addr 1 pchtemp0 at pci0 dev 20 function 2 "Intel 100 Series Thermal" rev 0x21 dwiic0 at pci0 dev 21 function 0 "Intel 100 Series I2C" rev 0x21: apic 2 int 16 iic0 at dwiic0 dwiic1 at pci0 dev 21 function 1 "Intel 100 Series I2C" rev 0x21: apic 2 int 17 iic1 at dwiic1 "INT3515" at iic1 addr 0x38 not configured "Intel 100 Series MEI" rev 0x21 at pci0 dev 22 function 0 not configured ahci0 at pci0 dev 23 function 0 "Intel 100 Series AHCI" rev 0x21: msi, AHCI 1.3.1 ahci0: port 1: 6.0Gb/s scsibus1 at ahci0: 32 targets sd0 at scsibus1 targ 1 lun 0: <ATA, Samsung SSD 850, EMT0> SCSI3 0/direct fixed naa.5002538d423a4b80 sd0: 476940MB, 512 bytes/sector, 976773168 sectors, thin ppb0 at pci0 dev 28 function 0 "Intel 100 Series PCIE" rev 0xf1: msi pci1 at ppb0 bus 2 rtsx0 at pci1 dev 0 function 0 "Realtek RTS522A Card Reader" rev 0x01: msi sdmmc0 at rtsx0: 4-bit, dma ppb1 at pci0 dev 28 function 2 "Intel 100 Series PCIE" rev 0xf1: msi pci2 at ppb1 bus 3 iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev 0x78, msi pcib0 at pci0 dev 31 function 0 "Intel 200 Series LPC" rev 0x21 "Intel 100 Series PMC" rev 0x21 at pci0 dev 31 function 2 not configured azalia0 at pci0 dev 31 function 3 "Intel 200 Series HD Audio" rev 0x21: msi azalia0: codecs: Realtek/0x0298, Intel/0x280b, using Realtek/0x0298 audio0 at azalia0 ichiic0 at pci0 dev 31 function 4 "Intel 100 Series SMBus" rev 0x21: apic 2 int 16 iic2 at ichiic0 em0 at pci0 dev 31 function 6 "Intel I219-V" rev 0x21: msi, address 54:e1:ad:c3:1f:cc isa0 at pcib0 isadma0 at isa0 com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo pckbc0 at isa0 port 0x60/5 irq 1 irq 12 pckbd0 at pckbc0 (kbd slot) wskbd0 at pckbd0: console keyboard, using wsdisplay0 pms0 at pckbc0 (aux slot) wsmouse0 at pms0 mux 0 wsmouse1 at pms0 mux 0 pms0: Synaptics clickpad, firmware 8.2, 0x1e2b1 0x943300 pcppi0 at isa0 port 0x61 spkr0 at pcppi0 ugen0 at uhub0 port 7 "Intel Bluetooth" rev 2.00/0.10 addr 2 uvideo0 at uhub0 port 8 configuration 1 
interface 0 "Bison Integrated Camera" rev 2.00/37.27 addr 3 video0 at uvideo0 ugen1 at uhub0 port 9 "Validity Sensors product 0x0097" rev 2.00/1.64 addr 4 vscsi0 at root scsibus2 at vscsi0: 256 targets softraid0 at root scsibus3 at softraid0: 256 targets root on sd0a (e41ed128f96a0a39.a) swap on sd0b dump on sd0b iwm0: hw rev 0x230, fw ver 22.361476.0, address 28:c6:3f:90:ad:4e ### end dmesg ### Thanx! Mischa
Re: restrict carp use to ethernet interfaces
> On 11 Jan 2018, at 08:25, Matthieu Herrb <matth...@herrb.eu> wrote:
>
>> On Thu, Jan 11, 2018 at 10:29:17AM +1000, David Gwynne wrote:
>> carp interfaces output using ether_output, so it is reasonable to
>> require that they only get configured on top of ethernet interfaces
>> rather than just !IFT_CARP.
>>
> Hi,
>
> in this context are vlan interfaces also considered as IFT_ETHER ?
> I've use cases for carp over vlan interfaces. I'd hate not being able
> to do that anymore.

Doing the same at the moment. Super useful to be able to continue to do this.

Mischa