Re: IPv4 on ix(4) slow/nothing - 7.4

2023-10-18 Thread Mischa Peters


> On Oct 18, 2023, at 15:44, Hrvoje Popovski  wrote:
> 
> On 18.10.2023. 15:35, Mischa wrote:
>> Hi All,
>> 
>> Just upgraded a couple of machines to 7.4, smooth as always!!
>> 
>> I am, however, seeing issues with IPv4: slowness or no throughput at all.
>> The machines I have upgraded are using an Intel X540-T network card and
>> are connected at 10G.
>> 
>> ix0 at pci5 dev 0 function 0 "Intel X540T" rev 0x01, msix, 16 queues,
>> address b8:ca:3a:62:ee:40
>> ix1 at pci5 dev 0 function 1 "Intel X540T" rev 0x01, msix, 16 queues,
>> address b8:ca:3a:62:ee:42
>> 
>> root@n2:~ # sysctl kern.version
>> kern.version=OpenBSD 7.4 (GENERIC.MP) #1397: Tue Oct 10 09:02:37 MDT 2023
>> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> 
>> There are a bunch of VMs running on top of it. As soon as I want to
>> fetch something with ftp, for example, I don't get anything over IPv4;
>> with IPv6 everything is normal.
>> 
>> mischa@www2:~ $ ftp -4
>> https://mirror.openbsd.amsterdam/pub/OpenBSD/7.4/amd64/install74.iso
>> Trying 46.23.88.18...
>> Requesting
>> https://mirror.openbsd.amsterdam/pub/OpenBSD/7.4/amd64/install74.iso
>>   0% | |   512 KB  - stalled
>> -^C
>> 
>> A trace on mirror / n2:
>> 
>> n2:~ # tcpdump -i vport880 host 46.23.88.32
>> tcpdump: listening on vport880, link-type EN10MB
>> 15:16:08.730274 www2.high5.nl.1828 > n2.high5.nl.https: S
>> 2182224746:2182224746(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale
>> 6,nop,nop,timestamp 2899683458 0> (DF)
>> 15:16:08.730297 arp who-has www2.high5.nl tell n2.high5.nl
>> 15:16:08.731535 arp reply www2.high5.nl is-at fe:51:bb:1e:12:11
>> 15:16:08.731540 n2.high5.nl.https > www2.high5.nl.1828: S
>> 633749938:633749938(0) ack 2182224747 win 16384 <mss
>> 1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 3129955106
>> 2899683458> (DF)
>> 15:16:08.732017 www2.high5.nl.1828 > n2.high5.nl.https: . ack 1 win 256
>>  (DF)
>> 15:16:08.785752 www2.high5.nl.1828 > n2.high5.nl.https: P 1:312(311) ack
>> 1 win 256  (DF)
>> 15:16:08.786092 n2.high5.nl.https > www2.high5.nl.1828: P 1:128(127) ack
>> 312 win 271  (DF)
>> 15:16:08.786376 n2.high5.nl.https > www2.high5.nl.1828: P 128:134(6) ack
>> 312 win 271  (DF)
>> 15:16:08.786396 n2.high5.nl.https > www2.high5.nl.1828: P 134:166(32)
>> ack 312 win 271  (DF)
>> 15:16:08.786455 n2.high5.nl.https > www2.high5.nl.1828: . 166:1614(1448)
>> ack 312 win 271  (DF)
>> 15:16:08.786457 n2.high5.nl.https > www2.high5.nl.1828: .
>> 1614:3062(1448) ack 312 win 271 > 2899683510> (DF)
>> 15:16:08.786460 n2.high5.nl.https > www2.high5.nl.1828: P 3062:3803(741)
>> ack 312 win 271  (DF)
>> 15:16:08.786943 www2.high5.nl.1828 > n2.high5.nl.https: . ack 134 win
>> 255  (DF)
>> 15:16:08.796534 n2.high5.nl.https > www2.high5.nl.1828: P 3803:4345(542)
>> ack 312 win 271  (DF)
>> 15:16:08.796577 n2.high5.nl.https > www2.high5.nl.1828: P 4345:4403(58)
>> ack 312 win 271  (DF)
>> 15:16:08.797518 www2.high5.nl.1828 > n2.high5.nl.https: . ack 166 win
>> 256 >> (DF)
>> 15:16:08.797522 www2.high5.nl.1828 > n2.high5.nl.https: . ack 166 win
>> 256 >> (DF)
>> 15:16:09.790297 n2.high5.nl.https > www2.high5.nl.1828: . 166:1614(1448)
>> ack 312 win 271  (DF)
>> 15:16:09.790902 www2.high5.nl.1828 > n2.high5.nl.https: . ack 1614 win
>> 233 >> (DF)
>> 15:16:09.790917 n2.high5.nl.https > www2.high5.nl.1828: .
>> 1614:3062(1448) ack 312 win 271 > 2899684519> (DF)
>> 15:16:09.790923 n2.high5.nl.https > www2.high5.nl.1828: P
>> 3062:4403(1341) ack 312 win 271 > 2899684519> (DF)
>> 15:16:10.790299 n2.high5.nl.https > www2.high5.nl.1828: .
>> 1614:3062(1448) ack 312 win 271 > 2899684519> (DF)
>> 15:16:10.791204 www2.high5.nl.1828 > n2.high5.nl.https: . ack 3062 win
>> 233 >> (DF)
>> 15:16:10.791223 n2.high5.nl.https > www2.high5.nl.1828: P
>> 3062:4403(1341) ack 312 win 271 > 2899685520> (DF)
>> 15:16:10.791692 www2.high5.nl.1828 > n2.high5.nl.https: . ack 4403 win
>> 235  (DF)
>> 15:16:10.802647 www2.high5.nl.1828 > n2.high5.nl.https: P 312:318(6) ack
>> 4403 win 256  (DF)
>> 15:16:11.000297 n2.high5.nl.https > www2.high5.nl.1828: . ack 318 win
>> 271  (DF)
>> 15:16:11.001162 www2.high5.nl.1828 > n2.high5.nl.https: P 318:527(209)
>> ack 4403 win 256  (DF)
>> 15:16:11.001860 n2.high5.nl.https > www2.high5.nl.1828: P 4403:5059(656)
>> ack

IPv4 on ix(4) slow/nothing - 7.4

2023-10-18 Thread Mischa

Hi All,

Just upgraded a couple of machines to 7.4, smooth as always!!

I am, however, seeing issues with IPv4: slowness or no throughput at all.
The machines I have upgraded are using an Intel X540-T network card and
are connected at 10G.


ix0 at pci5 dev 0 function 0 "Intel X540T" rev 0x01, msix, 16 queues, 
address b8:ca:3a:62:ee:40
ix1 at pci5 dev 0 function 1 "Intel X540T" rev 0x01, msix, 16 queues, 
address b8:ca:3a:62:ee:42


root@n2:~ # sysctl kern.version
kern.version=OpenBSD 7.4 (GENERIC.MP) #1397: Tue Oct 10 09:02:37 MDT 2023
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

There are a bunch of VMs running on top of it. As soon as I want to
fetch something with ftp, for example, I don't get anything over IPv4;
with IPv6 everything is normal.


mischa@www2:~ $ ftp -4 
https://mirror.openbsd.amsterdam/pub/OpenBSD/7.4/amd64/install74.iso

Trying 46.23.88.18...
Requesting 
https://mirror.openbsd.amsterdam/pub/OpenBSD/7.4/amd64/install74.iso
  0% | |   512 KB  - stalled 
-^C


A trace on mirror / n2:

n2:~ # tcpdump -i vport880 host 46.23.88.32
tcpdump: listening on vport880, link-type EN10MB
15:16:08.730274 www2.high5.nl.1828 > n2.high5.nl.https: S
2182224746:2182224746(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale
6,nop,nop,timestamp 2899683458 0> (DF)

15:16:08.730297 arp who-has www2.high5.nl tell n2.high5.nl
15:16:08.731535 arp reply www2.high5.nl is-at fe:51:bb:1e:12:11
15:16:08.731540 n2.high5.nl.https > www2.high5.nl.1828: S
633749938:633749938(0) ack 2182224747 win 16384 <mss
1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 3129955106
2899683458> (DF)
15:16:08.732017 www2.high5.nl.1828 > n2.high5.nl.https: . ack 1 win 256 
 (DF)
15:16:08.785752 www2.high5.nl.1828 > n2.high5.nl.https: P 1:312(311) ack 
1 win 256  (DF)
15:16:08.786092 n2.high5.nl.https > www2.high5.nl.1828: P 1:128(127) ack 
312 win 271  (DF)
15:16:08.786376 n2.high5.nl.https > www2.high5.nl.1828: P 128:134(6) ack 
312 win 271  (DF)
15:16:08.786396 n2.high5.nl.https > www2.high5.nl.1828: P 134:166(32) 
ack 312 win 271  (DF)
15:16:08.786455 n2.high5.nl.https > www2.high5.nl.1828: . 166:1614(1448) 
ack 312 win 271  (DF)
15:16:08.786457 n2.high5.nl.https > www2.high5.nl.1828: . 
1614:3062(1448) ack 312 win 271 2899683510> (DF)
15:16:08.786460 n2.high5.nl.https > www2.high5.nl.1828: P 3062:3803(741) 
ack 312 win 271  (DF)
15:16:08.786943 www2.high5.nl.1828 > n2.high5.nl.https: . ack 134 win 
255  (DF)
15:16:08.796534 n2.high5.nl.https > www2.high5.nl.1828: P 3803:4345(542) 
ack 312 win 271  (DF)
15:16:08.796577 n2.high5.nl.https > www2.high5.nl.1828: P 4345:4403(58) 
ack 312 win 271  (DF)
15:16:08.797518 www2.high5.nl.1828 > n2.high5.nl.https: . ack 166 win 
256 > (DF)
15:16:08.797522 www2.high5.nl.1828 > n2.high5.nl.https: . ack 166 win 
256 > (DF)
15:16:09.790297 n2.high5.nl.https > www2.high5.nl.1828: . 166:1614(1448) 
ack 312 win 271  (DF)
15:16:09.790902 www2.high5.nl.1828 > n2.high5.nl.https: . ack 1614 win 
233 > (DF)
15:16:09.790917 n2.high5.nl.https > www2.high5.nl.1828: . 
1614:3062(1448) ack 312 win 271 2899684519> (DF)
15:16:09.790923 n2.high5.nl.https > www2.high5.nl.1828: P 
3062:4403(1341) ack 312 win 271 2899684519> (DF)
15:16:10.790299 n2.high5.nl.https > www2.high5.nl.1828: . 
1614:3062(1448) ack 312 win 271 2899684519> (DF)
15:16:10.791204 www2.high5.nl.1828 > n2.high5.nl.https: . ack 3062 win 
233 > (DF)
15:16:10.791223 n2.high5.nl.https > www2.high5.nl.1828: P 
3062:4403(1341) ack 312 win 271 2899685520> (DF)
15:16:10.791692 www2.high5.nl.1828 > n2.high5.nl.https: . ack 4403 win 
235  (DF)
15:16:10.802647 www2.high5.nl.1828 > n2.high5.nl.https: P 312:318(6) ack 
4403 win 256  (DF)
15:16:11.000297 n2.high5.nl.https > www2.high5.nl.1828: . ack 318 win 
271  (DF)
15:16:11.001162 www2.high5.nl.1828 > n2.high5.nl.https: P 318:527(209) 
ack 4403 win 256  (DF)
15:16:11.001860 n2.high5.nl.https > www2.high5.nl.1828: P 4403:5059(656) 
ack 527 win 271  (DF)
15:16:11.001989 n2.high5.nl.https > www2.high5.nl.1828: . 
5059:6507(1448) ack 527 win 271 2899685730> (DF)
15:16:11.001992 n2.high5.nl.https > www2.high5.nl.1828: . 
6507:7955(1448) ack 527 win 271 2899685730> (DF)
15:16:11.195431 www2.high5.nl.1828 > n2.high5.nl.https: . ack 5059 win 
256  (DF)
15:16:11.195447 n2.high5.nl.https > www2.high5.nl.1828: . 
7955:9403(1448) ack 527 win 271 2899685924> (DF)



Running a trace on www2 I am seeing:

www2:~ # tcpdump -i vio0 host 46.23.88.18
tcpdump: listening on vio0, link-type EN10MB
15:16:08.729974 www2.high5.nl.1828 > n2.high5.nl.https: S
2182224746:2182224746(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale
6,nop,nop,timestamp 2899683458 0> (DF)

15:16:08.731114 arp who-has www2.high5.nl tell n2.high5.nl
15:16:08.731229 arp reply www2.high5.nl is-at fe:51:bb:1e:12:11
15:16:08.731631 n2.high5.nl.https > www2.high5.nl.1828: S 
633749938:633749938(0) ack 

Re: vmd(8): disambiguate logging for vm's and devices.

2023-09-23 Thread Mischa

Hi Dave,

I like it a lot!

Mischa

On 2023-09-23 19:50, Dave Voutila wrote:

It annoys me how all the log messages from different vmd vm's blur
together. Here is a diff that makes them distinguishable. It also fixes
dynamic toggling of verbosity levels in virtio devices using `vmctl
log`, and now preserves the verbosity across vm reboots.
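
For context, the runtime toggling referred to here is done with vmctl(8)'s
log subcommand; a minimal example of what this diff makes work for the
device processes as well:

  # raise vmd's log verbosity at runtime; with the diff this now also
  # reaches the per-device (e.g. vioblk) processes and survives a vm reboot
  vmctl log verbose

  # drop back to the default, quieter logging
  vmctl log brief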

I chose the pattern "vm/<name>" and "vm/<name>/<device>" to
distinguish between vmd procs and the names of vm's since someone could
name their vm "vmd" or "vmm" or "priv" etc. Additionally, I'm proposing
to change the proc titles for devices to use "<vm name>/<device>"
instead of "<device>/[<vm name>]" to match the logging format.

While here I changed the name "parent" to "vmd".

Feedback? Ok?

-dv

Sample abbreviated output of `$(which vmd) -dvv` (aka `vmctl log
verbose`) when launching a vm named "alpine":

vmd: startup

vmd: start_vm_batch: done starting vms
priv: config_getconfig: priv retrieving config
agentx: config_getconfig: agentx retrieving config
vmm: config_getconfig: vmm retrieving config
control: config_getconfig: control retrieving config

vm/alpine: alpine: launching vioblk0
vm/alpine: virtio_dev_launch: sending 'd' type device struct
vm/alpine: virtio_dev_launch: marking fd 7 !close-on-exec
vm/alpine: virtio_dev_launch: sending vm message for 'alpine'
vm/alpine/vioblk: vioblk_main: got viblk dev. num disk fds = 1, sync fd 
= 16, async fd = 18, capacity = 0 seg_max = 126, vmm fd = 5
vm/alpine/vioblk0: qc2_open: qcow2 disk version 3 size 42949672960 end 
604592 snap 0
vm/alpine/vioblk0: vioblk_main: initialized vioblk0 with qcow2 image 
(capacity=83886080)
vm/alpine/vioblk0: vioblk_main: wiring in async vm event handler 
(fd=18)

vm/alpine/vioblk0: vm_device_pipe: initializing 'd' device pipe (fd=18)
vm/alpine/vioblk0: vioblk_main: wiring in sync channel handler (fd=16)
vm/alpine/vioblk0: vioblk_main: telling vm alpine device is ready
vm/alpine/vioblk0: vioblk_main: sending heartbeat
vm/alpine: virtio_dev_launch: receiving reply
vm/alpine: virtio_dev_launch: device reports ready via sync channel


diff refs/heads/master refs/heads/vmd-logging
commit - 6332535f639065d7cf3e5fc071339d6e7a72e767
commit + 0154c36753c63e89d2f4005e4fdc6a3ff17d174f
blob - a6b0db9c264a7ca77411c0bc68a958bc226b317a
blob + 1dd2a384fa24410474fd50de0c594e6f1e2e2bfc
--- usr.sbin/vmd/log.c
+++ usr.sbin/vmd/log.c
@@ -24,31 +24,12 @@
 #include 
 #include 

+#include "proc.h"
+
 static int  debug;
 static int  verbose;
-const char *log_procname;
+static char log_procname[2048];

-void   log_init(int, int);
-void   log_procinit(const char *);
-void   log_setverbose(int);
-intlog_getverbose(void);
-void   log_warn(const char *, ...)
-   __attribute__((__format__ (printf, 1, 2)));
-void   log_warnx(const char *, ...)
-   __attribute__((__format__ (printf, 1, 2)));
-void   log_info(const char *, ...)
-   __attribute__((__format__ (printf, 1, 2)));
-void   log_debug(const char *, ...)
-   __attribute__((__format__ (printf, 1, 2)));
-void   logit(int, const char *, ...)
-   __attribute__((__format__ (printf, 2, 3)));
-void   vlog(int, const char *, va_list)
-   __attribute__((__format__ (printf, 2, 0)));
-__dead void fatal(const char *, ...)
-   __attribute__((__format__ (printf, 1, 2)));
-__dead void fatalx(const char *, ...)
-   __attribute__((__format__ (printf, 1, 2)));
-
 void
 log_init(int n_debug, int facility)
 {
@@ -56,7 +37,7 @@ log_init(int n_debug, int facility)

debug = n_debug;
verbose = n_debug;
-   log_procinit(__progname);
+   log_procinit("%s", __progname);

if (!debug)
openlog(__progname, LOG_PID | LOG_NDELAY, facility);
@@ -65,10 +46,12 @@ log_init(int n_debug, int facility)
 }

 void
-log_procinit(const char *procname)
+log_procinit(const char *fmt, ...)
 {
-   if (procname != NULL)
-   log_procname = procname;
+   va_list ap;
+   va_start(ap, fmt);
+   vsnprintf(log_procname, sizeof(log_procname), fmt, ap);
+   va_end(ap);
 }

 void
@@ -101,7 +84,7 @@ vlog(int pri, const char *fmt, va_list ap)

if (debug) {
/* best effort in out of mem situations */
-   if (asprintf(&nfmt, "%s\n", fmt) == -1) {
+   if (asprintf(&nfmt, "%s: %s\n", log_procname, fmt) == -1) {
vfprintf(stderr, fmt, ap);
fprintf(stderr, "\n");
} else {
blob - c9efad13ef3a504bbd43e2b26336784baca33db9
blob + 0b71a9a33e4ded6dac468196ba448fb1623ee137
--- usr.sbin/vmd/proc.c
+++ usr.sbin/vmd/proc.c
@@ -287,7 +287,7 @@ proc_setup(struct privsep *ps, struct privsep_proc 
*pr

struct privsep_pipes*pp;

/* Initialize parent title, ps_instances and procs. */
-   ps->ps_title[PROC_PARENT] = "parent";
+   ps->ps_ti

Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-06 Thread Mischa

On 2023-09-06 19:38, Dave Voutila wrote:

Mischa  writes:

On 2023-09-06 05:36, Dave Voutila wrote:

Mischa  writes:

On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin  writes:


On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:

/snip


> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK
Starting 30 VMs this way caused the machine to become 
unresponsive

again,
but nothing on the console. :(
Mischa

Were you seeing these uvm errors before this diff? If so, this
isn't
causing the problem and something else is.
I don't believe we solved any of the underlying uvm issues in 
Bruges

last year. Mischa, can you test with just the latest
snapshot/-current?
I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a 
uvm

share into the device process address space.


If this diff causes the errors to occur, and without the diff it's
fine, then
we need to look into that.
Also I think a pid number in that printf might be useful, I'll see
what I can
find. If it's not vmd causing this and rather some other process
then that
would be good to know also.

Sadly it looks like that printf doesn't spit out the offending
pid. :(

Just to confirm I am seeing this behavior on the latest snap
without
the patch as well.

Since this diff isn't the cause, I've committed it. Thanks for
testing. I'll see if I can reproduce your MAP_STACK issues.

Just started 10 VMs with sleep 2, machine freezes, but nothing on 
the

console. :(

For now, I'd recommend spacing out vm launches. I'm pretty sure it's
related to the uvm corruption we saw last year when creating, 
starting,

and destroying vm's rapidly in a loop.


That could very well be the case. I will adjust my start script, so
far I've got good results with a 10 second sleep.

Is there some additional debugging I can turn that makes sense for
this? I can easily replicate.



Highly doubtful if the issue is what I think. The only thing would be
making sure you're running in a way to see any panic and drop into
ddb. If you're using X or not on the the primary console or serial
connection it might just appear as a deadlocked system during a panic.


I am using the console via iDRAC; there isn't any information anymore. :(


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-06 Thread Mischa

On 2023-09-06 05:36, Dave Voutila wrote:

Mischa  writes:

On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin  writes:


On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:

/snip


> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK
Starting 30 VMs this way caused the machine to become unresponsive
again,
but nothing on the console. :(
Mischa

Were you seeing these uvm errors before this diff? If so, this
isn't
causing the problem and something else is.

I don't believe we solved any of the underlying uvm issues in Bruges
last year. Mischa, can you test with just the latest 
snapshot/-current?

I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a uvm
share into the device process address space.


If this diff causes the errors to occur, and without the diff it's
fine, then
we need to look into that.
Also I think a pid number in that printf might be useful, I'll see
what I can
find. If it's not vmd causing this and rather some other process
then that
would be good to know also.

Sadly it looks like that printf doesn't spit out the offending
pid. :(


Just to confirm I am seeing this behavior on the latest snap without
the patch as well.


Since this diff isn't the cause, I've committed it. Thanks for
testing. I'll see if I can reproduce your MAP_STACK issues.


Just started 10 VMs with sleep 2, machine freezes, but nothing on the
console. :(


For now, I'd recommend spacing out vm launches. I'm pretty sure it's
related to the uvm corruption we saw last year when creating, starting,
and destroying vm's rapidly in a loop.


That could very well be the case. I will adjust my start script; so far
I've got good results with a 10 second sleep.
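
For reference, a minimal sketch of such a staggered start loop: the same
create/start invocation quoted earlier in this thread, with only the pause
added (base image path and vm names taken from that example):

  for i in $(jot 10 10); do
      # create a derived disk from the base image and start the vm
      vmctl create -b /var/vmm/vm09.qcow2 /var/vmm/vm${i}.qcow2 && \
          vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G vm${i}
      # space out launches so vmd's fork/exec of device processes and the
      # uvm share into their address space don't pile up
      sleep 10
  done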


Is there some additional debugging I can turn that makes sense for this? 
I can easily replicate.


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-05 Thread Mischa

On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin  writes:


On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:


/snip


> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK

Starting 30 VMs this way caused the machine to become unresponsive 
again,

but nothing on the console. :(

Mischa


Were you seeing these uvm errors before this diff? If so, this isn't
causing the problem and something else is.


I don't believe we solved any of the underlying uvm issues in Bruges
last year. Mischa, can you test with just the latest snapshot/-current?

I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a uvm
share into the device process address space.



If this diff causes the errors to occur, and without the diff it's 
fine, then

we need to look into that.


Also I think a pid number in that printf might be useful, I'll see 
what I can
find. If it's not vmd causing this and rather some other process then 
that

would be good to know also.


Sadly it looks like that printf doesn't spit out the offending pid. :(


Just to confirm I am seeing this behavior on the latest snap without the 
patch as well.
Just started 10 VMs with sleep 2, machine freezes, but nothing on the 
console. :(


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-05 Thread Mischa



On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin  writes:


On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:


/snip


> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK

Starting 30 VMs this way caused the machine to become unresponsive 
again,

but nothing on the console. :(

Mischa


Were you seeing these uvm errors before this diff? If so, this isn't
causing the problem and something else is.


I don't believe we solved any of the underlying uvm issues in Bruges
last year. Mischa, can you test with just the latest snapshot/-current?


Yes, after Mike's email I already started getting an extra machine up 
and running.

Will finish that shortly and run the tests on the latest snap.


I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a uvm
share into the device process address space.


I will adjust my scripts accordingly. I currently start as many VMs as 
there are cores in production. Will test if that is still possible.


Mischa

If this diff causes the errors to occur, and without the diff it's 
fine, then

we need to look into that.


Also I think a pid number in that printf might be useful, I'll see 
what I can
find. If it's not vmd causing this and rather some other process then 
that

would be good to know also.


Sadly it looks like that printf doesn't spit out the offending pid. :(




Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mischa

On 2023-09-04 18:58, Mischa wrote:

On 2023-09-04 18:55, Mischa wrote:

On 2023-09-04 17:57, Dave Voutila wrote:

Mischa  writes:

On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa  writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.
There are a couple of permanent VMs running on this host, 1 ToR
node,
OpenBSD VM and a Debian VM.
While they were running I started my stress script.
The first round I started 40 VMs with just bsd.rd, 2G memory
All good, then I started 40 VMs with a base disk and 2G memory.
After 20 VMs started I got the following messages on the console:
[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff:
not
MAP_STACK
[umd159360/355276 sp=783369$96750 inside 
7256d538c000-725645b8bFff:

not
MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 
75247a4d2000-75247acdifff:

not
MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: 
not

MAP_STACK
[umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: 
not

MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 
7845f168d000-7845f1e8cfff:

not
MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 
7b88bcb79000-7b88b4378fff:

not
MAP_STACK
Not sure if this is related to starting of the VMs or something
else, the
ToR node was consuming 100%+ CPU at the time. :)
Mischa
I have not seen this; can you try without the ToR node some time 
and

see if
this still happens?


Testing again without any other VMs running.
Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2
/var/vmm/vm${i}.qcow2 && vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 
2G

vm${i}; done


Can you try adding a "sleep 2" or something in the loop? I can't 
think

of a reason my changes would cause this. Do you see this on -current
without the diff?


Adding the sleep 2 does indeed help. I managed to get 20 VMs started 
this way, before it would choke on 2-3.


Do I only need the unpatched kernel or also the vmd/vmctl from snap?


I do still get the same message on the console, but the machine isn't 
freezing up.


[umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not 
MAP_STACK


Starting 30 VMs this way caused the machine to become unresponsive 
again, but nothing on the console. :(


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mischa




On 2023-09-04 18:55, Mischa wrote:

On 2023-09-04 17:57, Dave Voutila wrote:

Mischa  writes:

On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa  writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.
There are a couple of permanent VMs running on this host, 1 ToR
node,
OpenBSD VM and a Debian VM.
While they were running I started my stress script.
The first round I started 40 VMs with just bsd.rd, 2G memory
All good, then I started 40 VMs with a base disk and 2G memory.
After 20 VMs started I got the following messages on the console:
[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff:
not
MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff:
not
MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff:
not
MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: 
not

MAP_STACK
[umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: 
not

MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff:
not
MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff:
not
MAP_STACK
Not sure if this is related to starting of the VMs or something
else, the
ToR node was consuming 100%+ CPU at the time. :)
Mischa

I have not seen this; can you try without the ToR node some time and
see if
this still happens?


Testing again without any other VMs running.
Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2
/var/vmm/vm${i}.qcow2 && vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 
2G

vm${i}; done


Can you try adding a "sleep 2" or something in the loop? I can't think
of a reason my changes would cause this. Do you see this on -current
without the diff?


Adding the sleep 2 does indeed help. I managed to get 20 VMs started 
this way, before it would choke on 2-3.


Do I only need the unpatched kernel or also the vmd/vmctl from snap?


I do still get the same message on the console, but the machine isn't 
freezing up.


[umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not 
MAP_STACK


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mischa

On 2023-09-04 17:57, Dave Voutila wrote:

Mischa  writes:

On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa  writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.
There are a couple of permanent VMs running on this host, 1 ToR
node,
OpenBSD VM and a Debian VM.
While they were running I started my stress script.
The first round I started 40 VMs with just bsd.rd, 2G memory
All good, then I started 40 VMs with a base disk and 2G memory.
After 20 VMs started I got the following messages on the console:
[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff:
not
MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff:
not
MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff:
not
MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: 
not

MAP_STACK
[umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: 
not

MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff:
not
MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff:
not
MAP_STACK
Not sure if this is related to starting of the VMs or something
else, the
ToR node was consuming 100%+ CPU at the time. :)
Mischa

I have not seen this; can you try without the ToR node some time and
see if
this still happens?


Testing again without any other VMs running.
Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2
/var/vmm/vm${i}.qcow2 && vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G
vm${i}; done


Can you try adding a "sleep 2" or something in the loop? I can't think
of a reason my changes would cause this. Do you see this on -current
without the diff?


Adding the sleep 2 does indeed help. I managed to get 20 VMs started 
this way, before it would choke on 2-3.


Do I only need the unpatched kernel or also the vmd/vmctl from snap?

Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mischa

On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa  writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.

There are a couple of permanent VMs running on this host, 1 ToR node,
OpenBSD VM and a Debian VM.
While they were running I started my stress script.
The first round I started 40 VMs with just bsd.rd, 2G memory
All good, then I started 40 VMs with a base disk and 2G memory.
After 20 VMs started I got the following messages on the console:

[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not
MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff: 
not

MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff: 
not

MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not
MAP_STACK
[umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: not
MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: 
not

MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: 
not

MAP_STACK

Not sure if this is related to starting of the VMs or something else, 
the

ToR node was consuming 100%+ CPU at the time. :)

Mischa


I have not seen this; can you try without the ToR node some time and 
see if

this still happens?


Testing again without any other VMs running.
Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2 
/var/vmm/vm${i}.qcow2 && vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G 
vm${i}; done


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mischa

On 2023-09-03 21:18, Dave Voutila wrote:

Mischa  writes:


Nice!! Thanx Dave!

Running go brrr as we speak.
Testing with someone who is running Debian.


Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
unless I hear of any issues.


There are a couple of permanent VMs running on this host, 1 ToR node, 
OpenBSD VM and a Debian VM.

While they were running I started my stress script.
The first round I started 40 VMs with just bsd.rd, 2G memory
All good, then I started 40 VMs with a base disk and 2G memory.
After 20 VMs started I got the following messages on the console:

[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not 
MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff: not 
MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff: not 
MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not 
MAP_STACK
[umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: not 
MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not 
MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not 
MAP_STACK


Not sure if this is related to starting of the VMs or something else, 
the ToR node was consuming 100%+ CPU at the time. :)


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-03 Thread Mischa

Nice!! Thanx Dave!

Running go brrr as we speak.
Testing with someone who is running Debian.

Mischa

On 2023-09-01 21:50, Dave Voutila wrote:

Now that my i8259 fix is in, it's safe to expand the testing pool for
this diff. (Without that fix, users would definitely hit the hung block
device issue testing this one.) Hoping that folks that run non-OpenBSD
guests or strange configurations can give it a spin.

This change removes an ioctl(2) call from the vcpu thread hot path in
vmd. Instead of making that syscall to toggle on/off a pending 
interrupt
flag on the vcpu object in vmm(4), it adds a flag into the 
vm_run_params

struct sent with the VMM_IOC_RUN ioctl. The in-kernel vcpu runloop can
now toggle the pending interrupt state prior to vm entry.

mbuhl@ and phessler@ have run this diff on their machines. Current
observations are reduced average network latency for guests.

My terse measurements using the following btrace script show some
promising changes in terms of reducing ioctl syscalls:

  /* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */
  syscall:ioctl:entry
  /arg1 == 2148292102/
  {
@total[tid] = count();
@running[tid] = count();
  }
  interval:hz:1
  {
print(@running);
clear(@running);
  }
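
To run it, one would save the script to a file (the name below is a
placeholder) and point btrace(8) at it; @running is printed once per
second and the accumulated @total map when the trace is interrupted:

  # count VMM_IOC_INTR ioctls per thread while vmd is busy
  btrace vmm_ioctl.bt
  # stop with ^C to get the per-tid totals collected in @total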

Measuring from boot of an OpenBSD guest to after the guest finishes
relinking (based on my manual observation of the libevent thread
settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR
ioctls for a single guest:

## -current
@total[433237]: 1325100  # vcpu thread (!!)
@total[187073]: 80239# libevent thread

## with diff
@total[550347]: 42   # vcpu thread (!!)
@total[256550]: 86946# libevent thread

Most of the VMM_IOC_INTR ioctls on the vcpu threads come from seabios
and the bootloader prodding some of the emulated hardware, but even
after the bootloader you'll see ~10-20k/s of ioctl's on -current
vs. ~4-5k/s with the diff.

At steady-state, the vcpu thread no longer makes the VMM_IOC_INTR calls
at all and you should see the libevent thread calling it at a rate 
~100/s

(probably hardclock?). *Without* the diff, I see a steady 650/s rate on
the vcpu thread at idle. *With* the diff, it's 0/s at idle. :)

To test:
- rebuild & install new kernel
- copy/symlink vmmvar.h into /usr/include/machine/
- rebuild & re-install vmd & vmctl
- reboot
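
A rough sketch of those four steps, assuming a standard /usr/src checkout
and the stock GENERIC.MP config (adjust paths to your tree):

  # rebuild and install the kernel
  cd /sys/arch/amd64/compile/GENERIC.MP
  make obj && make config && make && make install

  # make the updated vmmvar.h visible to userland builds
  cp /usr/src/sys/arch/amd64/include/vmmvar.h /usr/include/machine/vmmvar.h

  # rebuild and reinstall vmd and vmctl
  cd /usr/src/usr.sbin/vmd && make obj && make && make install
  cd /usr/src/usr.sbin/vmctl && make obj && make && make install

  reboot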

-dv


diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending
 M  sys/arch/amd64/amd64/vmm_machdep.c  |  10+   0-
 M  sys/arch/amd64/include/vmmvar.h |   1+   0-
 M  usr.sbin/vmd/vm.c   |   2+  16-

3 files changed, 13 insertions(+), 16 deletions(-)

diff refs/heads/master refs/heads/vmm-vrp_intr_pending
commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb
commit + 10eeb8a0414ec927b6282473c50043a7027d6b41
blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46
blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e
--- sys/arch/amd64/amd64/vmm_machdep.c
+++ sys/arch/amd64/amd64/vmm_machdep.c
@@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct 
vm_run_params *

 */
irq = vrp->vrp_irq;

+   if (vrp->vrp_intr_pending)
+   vcpu->vc_intr = 1;
+   else
+   vcpu->vc_intr = 0;
+
if (vrp->vrp_continue) {
switch (vcpu->vc_gueststate.vg_exit_reason) {
case VMX_EXIT_IO:
@@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct 
vm_run_params *


irq = vrp->vrp_irq;

+   if (vrp->vrp_intr_pending)
+   vcpu->vc_intr = 1;
+   else
+   vcpu->vc_intr = 0;
+
/*
 * If we are returning from userspace (vmd) because we exited
 * last time, fix up any needed vcpu state first. Which state
blob - e9f8384cccfde33034d7ac9782610f93eb5dc640
blob + 88545b54b35dd60280ba87403e343db9463d7419
--- sys/arch/amd64/include/vmmvar.h
+++ sys/arch/amd64/include/vmmvar.h
@@ -456,6 +456,7 @@ struct vm_run_params {
uint32_tvrp_vcpu_id;
uint8_t vrp_continue;   /* Continuing from an exit */
uint16_tvrp_irq;/* IRQ to inject */
+   uint8_t vrp_intr_pending;   /* Additional intrs pending? */

/* Input/output parameter to VMM_IOC_RUN */
struct vm_exit  *vrp_exit;  /* updated exit data */
blob - 5f598bcc14af5115372d34a4176254d377aad91c
blob + 447fc219adadf945de2bf25d5335993c2abdc26f
--- usr.sbin/vmd/vm.c
+++ usr.sbin/vmd/vm.c
@@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg)
} else
vrp->vrp_irq = 0x;

-   /* Still more pending? */
-   if (i8259_is_pending()) {
-   /*
-* XXX can probably avoid ioctls here by providing intr
-* in vrp
-*/
-   if (vcpu_pic_intr(vrp->vrp_vm_id,
-   vrp->vrp_vcpu_id, 1)) {

Re: only open /dev/vmm once in vmd(8)

2022-12-26 Thread Mischa

Hi Dave,

Applied the patch on top of the previous two you provided and all looks 
good.
Running four proper VMs (installed 7.2, with different amounts of memory
allocated, one of them with rpki-client) and booted ~40 with just bsd.rd.


Some log messages I am seeing, which I didn't see/notice before.
Let me know if there is something specific I need to look out for.

Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 
0xfdd1a31ff998

Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509fc7501e
Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 
0xfdd1a31ff448

Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509fb5001e
Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 
0xfdd1a31ff008

Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509f9b401e
Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 
0xfdd1a31ff228

Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509f7b001e
Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 
0xfdd1a31ff778

Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509f51a01e
Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 
0xfdd1a31ffaa8

Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509f1c601e
Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 
0xfdd1a31ffdd8

Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509ee0901e
Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 
0xfdd09e98fcd0

Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509e98e01e
Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 
0xfdd09e98f780

Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509e46c01e
Dec 26 16:54:13 current /bsd: vm_impl_init_vmx: created vm_map @ 
0xfdd09e98f340

Dec 26 16:54:13 current /bsd: Guest EPTP = 0x509de2601e
Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 25
Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported
Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, 
data=0x80010031

Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 26
Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported
Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, 
data=0x8001003

Dec 26 16:54:13 current /bsd: 1
Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID
Dec 26 16:54:13 current /bsd: /ASID 27
Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported
Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, 
data=0x80010031

Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 28
Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported
Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, 
data=0x80010031

Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 29
Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported
Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, 
data=0x80010031

Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 30
Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 31
Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported
Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, 
data=0x80010031
Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported
Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, 
data=0x80010031

Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 32
Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported
Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, 
data=0x80010031

Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 33
Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported

Dec 26 16:54:13 current /bsd: vmm_alloc_vpid: allocated VPID/ASID 34
Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, 
data=0x80010031
Dec 26 16:54:13 current /bsd: vmm_handle_cpuid: function 0x0a (arch. 
perf mon) not supported
Dec 26 16:54:13 current /bsd: vmx_handle_cr: mov to cr0 @ 100148a, 
data=0x80010031
Dec 26 16:54:14 current /bsd: vmm_handle_cpuid: unsupported 
rax=0x4100

Dec 26 16:54:14 current last message repeated 5 times
Dec 26 16:54:14 current /bsd: vmm_handle_cpuid: function 0x06 
(thermal/power mgt) not supported

Dec 26 16:54:14 current last message repeated 2 times

Mischa


On 2022-12-25 16:57, Dave Voutila wrote:
During h2k22 there was some discussion around how vmd(8) manages vms 
and

the vmm(4) device's role. While looking into something related, I found
vmd opens /dev/vmm in each subprocess during the initial fork+execve
dance.

The only vmd process that needs /dev/vmm is the vmm process.

The diff below changes it so that *only

Re: vmd(8): create a proper e820 bios memory map

2022-12-14 Thread Mischa

On 2022-12-14 14:57, Dave Voutila wrote:

Mischa  writes:


On 2022-12-13 20:29, Dave Voutila wrote:

Dave Voutila  writes:


tech@,
The below diff tweaks how vmd and vmm define memory ranges (adding
a
"type" attribute) so we can properly build an e820 memory map to
hand to
things like SeaBIOS or the OpenBSD ramdisk kernel (when direct 
booting

bsd.rd).
Why do it? We've been carrying a few patches to SeaBIOS in the
ports
tree to hack around how vmd articulates some memory range details. 
By

finally implementing a proper bios memory map table we can drop
some of
those patches. (Diff to ports@ coming shortly.)
Bonus is it cleans up how we were hacking a bios memory map for
direct
booting ramdisk kernels.
Note: the below diff *will* work with the current SeaBIOS
(vmm-firmware), so you do *not* need to build the port.
You will, however, need to:
- build, install, & reboot into a new kernel
- make sure you update /usr/include/amd64/vmmvar.h with a copy of or
  symlink to sys/arch/amd64/include/vmmvar.h
- rebuild & install vmctl
- rebuild & install vmd
This should *not* result in any behavioral changes of current vmd
guests. If you notice any, especially guests failing to start, 
please

rebuild a kernel with VMM_DEBUG to help diagnose the regression.


Updated diff to fix some accounting issues with guest memory. (vmctl
should report the correct max mem now.)


Booted... The memory display in vmctl show is normal again.

root@current:~ # vmctl show
   ID   PID VCPUS  MAXMEM  CURMEM TTYOWNERSTATE NAME
4 56252 11.0G989M   ttyp4 runbsd04  running vm04
3 60536 18.0G2.2G   ttyp3   runbsd  running vm03
2 20642 1   16.0G3.4G   ttyp2   runbsd  running vm02
1 81947 1   30.0G5.6G   ttyp1   runbsd  running vm01

All seems to be running normally. Anything specific I need to look out for?



Other than the above, no not really. Going to keep this diff out on
tech@ a few days to allow folks with a variety of guests to test before
I ask for OK's to commit.

The next change will be SeaBIOS (vmm-firmware) once this lands.


Perfect! Will do some more tests and will let you know if I find 
something.


Mischa



Re: vmd(8): create a proper e820 bios memory map

2022-12-14 Thread Mischa

On 2022-12-13 20:29, Dave Voutila wrote:

Dave Voutila  writes:


tech@,

The below diff tweaks how vmd and vmm define memory ranges (adding a
"type" attribute) so we can properly build an e820 memory map to hand 
to

things like SeaBIOS or the OpenBSD ramdisk kernel (when direct booting
bsd.rd).

Why do it? We've been carrying a few patches to SeaBIOS in the ports
tree to hack around how vmd articulates some memory range details. By
finally implementing a proper bios memory map table we can drop some 
of

those patches. (Diff to ports@ coming shortly.)

Bonus is it cleans up how we were hacking a bios memory map for direct
booting ramdisk kernels.

Note: the below diff *will* work with the current SeaBIOS
(vmm-firmware), so you do *not* need to build the port.

You will, however, need to:
- build, install, & reboot into a new kernel
- make sure you update /usr/include/amd64/vmmvar.h with a copy of or
  symlink to sys/arch/amd64/include/vmmvar.h
- rebuild & install vmctl
- rebuild & install vmd

This should *not* result in any behavioral changes of current vmd
guests. If you notice any, especially guests failing to start, please
rebuild a kernel with VMM_DEBUG to help diagnose the regression.



Updated diff to fix some accounting issues with guest memory. (vmctl
should report the correct max mem now.)


Booted... The memory display in vmctl show is normal again.

root@current:~ # vmctl show
   ID   PID VCPUS  MAXMEM  CURMEM TTYOWNERSTATE NAME
4 56252 11.0G989M   ttyp4 runbsd04  running vm04
3 60536 18.0G2.2G   ttyp3   runbsd  running vm03
2 20642 1   16.0G3.4G   ttyp2   runbsd  running vm02
1 81947 1   30.0G5.6G   ttyp1   runbsd  running vm01

All seems to be running normally. Anything specific I need to look out for?

Mischa


As a result, adds in an MMIO range type (previous diff counted that
range towards guest mem, though we don't actually fault in virtual
memory to represent it to the guest).

This has the added benefit of removing more knowledge from vmm(4) of
what an emulated machine looks like, i.e. why does it care what the pci
mmio range is? vmd(8) is responsible for that.

I did also remove the "multiple of 1M" requirement for guest
memory. Since I transitioned things to bytes awhile ago, no need to
prohibit that.

-dv

diff refs/heads/master refs/heads/vmd-e820
commit - 9be741fe9857107e3610acb9a39e2972330b122d
commit + ad422400e2f72c14c73d7f124f8b96d01d4ad4c5
blob - 3f7e0ce405ae3c6b0b4a787de341839886f97436
blob + d69293fcd5fd98315181eb0dd77b653601530e9d
--- sys/arch/amd64/amd64/vmm.c
+++ sys/arch/amd64/amd64/vmm.c
@@ -1631,8 +1631,8 @@ vmx_remote_vmclear(struct cpu_info *ci, struct 
vcpu *v

  * The last physical address may not exceed VMM_MAX_VM_MEM_SIZE.
  *
  * Return Values:
- *   The total memory size in MB if the checks were successful
- *   0: One of the memory ranges was invalid, or VMM_MAX_VM_MEM_SIZE 
was

+ *   The total memory size in bytes if the checks were successful
+ *   0: One of the memory ranges was invalid or VMM_MAX_VM_MEM_SIZE 
was

  *   exceeded
  */
 size_t
@@ -1643,21 +1643,27 @@ vm_create_check_mem_ranges(struct 
vm_create_params *vc

const paddr_t maxgpa = VMM_MAX_VM_MEM_SIZE;

if (vcp->vcp_nmemranges == 0 ||
-   vcp->vcp_nmemranges > VMM_MAX_MEM_RANGES)
+   vcp->vcp_nmemranges > VMM_MAX_MEM_RANGES) {
+   DPRINTF("invalid number of guest memory ranges\n");
return (0);
+   }

for (i = 0; i < vcp->vcp_nmemranges; i++) {
vmr = &vcp->vcp_memranges[i];

/* Only page-aligned addresses and sizes are permitted */
if ((vmr->vmr_gpa & PAGE_MASK) || (vmr->vmr_va & PAGE_MASK) ||
-   (vmr->vmr_size & PAGE_MASK) || vmr->vmr_size == 0)
+   (vmr->vmr_size & PAGE_MASK) || vmr->vmr_size == 0) {
+   DPRINTF("memory range %zu is not page aligned\n", i);
return (0);
+   }

/* Make sure that VMM_MAX_VM_MEM_SIZE is not exceeded */
if (vmr->vmr_gpa >= maxgpa ||
-   vmr->vmr_size > maxgpa - vmr->vmr_gpa)
+   vmr->vmr_size > maxgpa - vmr->vmr_gpa) {
+   DPRINTF("exceeded max memory size\n");
return (0);
+   }

/*
 * Make sure that all virtual addresses are within the address
@@ -1667,39 +1673,29 @@ vm_create_check_mem_ranges(struct 
vm_create_params *vc

 */
if (vmr->vmr_va < VM_MIN_ADDRESS ||
vmr->vmr_va >= VM_MAXUSER_ADDRESS ||
-   vmr->vmr_size >= VM_MAXUSER_ADDRESS - vmr->vmr_va)
+ 

Re: vmd(8): create a proper e820 bios memory map

2022-12-12 Thread Mischa

On 2022-12-12 16:36, Dave Voutila wrote:

Mischa  writes:


On 2022-12-12 16:02, Dave Voutila wrote:

Mischa  writes:


Hi Dave,
Great stuff!!
Everything is patched, build and booted.
What is the best way to test this?

Start guests as usual. I'd say the only thing definitively to
manually
check is that they see the same amount of physical memory as before 
the

patch.


That is indeed different. Before the patch allocating 1G would be
displayed as 1G.
Now a 1G allocation is 1.3G, 2G is 2.3G, 8G is 8.3G, etc.


So I can reproduce, how are you measuring it?


All the VMs with 2.3G have 2G memory allocated.

root@current:~ # vmctl show
   ID   PID VCPUS  MAXMEM  CURMEM TTYOWNERSTATE NAME
   41 27026 12.3G328M   ttypl root  running vm29
   40 27031 12.3G328M   ttypk root  running vm28
   39 66653 12.3G328M   ttypj root  running vm27
   38 79530 12.3G328M   ttypi root  running vm26
   37 24110 12.3G328M   ttyph root  running vm25
   36 89659 12.3G328M   ttypg root  running vm24
   35 19193 12.3G328M   ttypf root  running vm23
   34 48946 12.3G328M   ttype root  running vm22
   33 32065 12.3G328M   ttypd root  running vm21
   32 61847 12.3G328M   ttypc root  running vm20
   31 42429 12.3G328M   ttypb root  running vm19
   30 50201 12.3G328M   ttypa root  running vm18
   29 18652 12.3G328M   ttyp9 root  running vm17
   28 23312 12.3G328M   ttyp8 root  running vm16
   27 21314 12.3G328M   ttyp7 root  running vm15
   26 79420 12.3G328M   ttyp6 root  running vm14
   25 23214 12.3G328M   ttyp5 root  running vm13
   24 22755 12.3G328M   ttyp4 root  running vm12
   23  7716 12.3G328M   ttyp3 root  running vm11
   22  2758 12.3G328M   ttyp2 root  running vm10
1 - 1   30.0G   -   -   runbsd  stopped vm01
2 - 1   16.0G   -   -   runbsd  stopped vm02
3 - 18.0G   -   -   runbsd  stopped vm03
4 - 11.0G   -   - runbsd04  stopped vm04
5 - 11.0G   -   -   runbsd  stopped vm42

After starting vm0[1-4].

root@current:~ # vmctl show | grep vm0[1-4]
4 61002 11.3G990M   ttypq runbsd04  running vm04
3 60620 18.3G1.4G   ttypp   runbsd  running vm03
2 94240 1   16.3G2.5G   ttypo   runbsd  running vm02
1 32209 1   30.3G4.6G   ttypn   runbsd  running vm01

Booting one of the VMs with console:

root@current:~ # vmctl start -c vm04
Connected to /dev/ttypn (speed 115200)
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 1022M a20=on]
disk: hd0+

OpenBSD/amd64 BOOT 3.55

\
com0: 115200 baud
switching console to com0

OpenBSD/amd64 BOOT 3.55

boot>
booting hd0a:/bsd: 15615256+3765256+310448+0+1171456 
[1138229+128+1224792+927979]=0x170ac00

entry point at 0x81001000
[ using 3292160 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights 
reserved.
Copyright (c) 1995-2022 OpenBSD. All rights reserved.  
https://www.OpenBSD.org


OpenBSD 7.2 (GENERIC) #0: Wed Oct 26 11:26:29 MDT 2022

r...@syspatch-72-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

real mem = 1056952320 (1007MB)
avail mem = 1007689728 (961MB)

Standard 7.2 vmd/vmm:

root@server18:~ # vmctl start -c vm40
Connected to /dev/ttyp7 (speed 115200)
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 1022M a20=on]
disk: hd0+

OpenBSD/amd64 BOOT 3.55

\
com0: 115200 baud
switching console to com0

OpenBSD/amd64 BOOT 3.55

boot>
booting hd0a:/bsd: 15615256+3769352+309808+0+1171456 
[1143120+128+1224792+927979]=0x170cf18

entry point at 0x81001000
[ using 3297048 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights 
reserved.
Copyright (c) 1995-2022 OpenBSD. All rights reserved.  
https://www.OpenBSD.org


OpenBSD 7.2 (GENERIC) #2: Thu Nov 24 23:52:58 MST 2022

r...@syspatch-72-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

real mem = 1056952320 (1007MB)
avail mem = 1007681536 (961MB)

root@server18:~ # vmctl show vm40
   ID   PID VCPUS  MAXMEM  CURMEM TTYOWNERSTATE NAME
8 94043 11.0G918M   ttyp7   runbsd  running vm40


Mischa



Start a bunch of VMs with bsd.rd? Does this still need to be a
decompressed bsd.rd?

Booting compressed bsd.rd's has been working for awhile now, so no
need
to decompress them, btw. If you boot a bsd.rd, it's exercising the
changes to loadfile_elf.c,

Re: vmd(8): create a proper e820 bios memory map

2022-12-12 Thread Mischa

On 2022-12-12 16:02, Dave Voutila wrote:

Mischa  writes:


Hi Dave,

Great stuff!!
Everything is patched, build and booted.

What is the best way to test this?


Start guests as usual. I'd say the only thing definitively to manually
check is that they see the same amount of physical memory as before the
patch.


That is indeed different. Before the patch allocating 1G would be 
displayed as 1G.

Now a 1G allocation is 1.3G, 2G is 2.3G, 8G is 8.3G, etc.


Start a bunch of VMs with bsd.rd? Does this still need to be a
decompressed bsd.rd?


Booting compressed bsd.rd's has been working for awhile now, so no need
to decompress them, btw. If you boot a bsd.rd, it's exercising the
changes to loadfile_elf.c, which is important to test.
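
As a concrete example (vm name is a placeholder), direct-booting a ramdisk
kernel with vmctl goes through exactly that loadfile_elf.c path, and a
gzip-compressed bsd.rd can be passed as-is:

  # boot bsd.rd directly (-b) with a serial console and 2G of memory
  vmctl start -c -b /bsd.rd -m 2G test-rd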


I missed that completely, good to know.

Starting/stopping a bunch of VMs with bsd.rd (only) at the moment.
Not sure if this is helpful, but got this in the logs:

Dec 12 16:21:47 current /bsd: vmm_handle_cpuid: function 0x06 
(thermal/power mgt) not supported
Dec 12 16:21:47 current /bsd: vcpu_run_vmx: unimplemented exit type 32 
(WRMSR instruction)

Dec 12 16:21:47 current /bsd: vcpu @ 0x800022e3a000 in long mode
Dec 12 16:21:47 current /bsd:  CPL=0
Dec 12 16:21:47 current /bsd:  rax=0x0081 
rbx=0x818f8008 rcx=0x01a0
Dec 12 16:21:47 current /bsd:  rdx=0x 
rbp=0x81a06da0 rdi=0x01a0
Dec 12 16:21:47 current /bsd:  rsi=0x81a06d88  
r8=0x0028  r9=0xa9144070
Dec 12 16:21:47 current /bsd:  r10=0x 
r11=0x81a06ce0 r12=0x81a06e28
Dec 12 16:21:47 current /bsd:  r13=0x8147da20 
r14=0x818f6ff0 r15=0x0006
Dec 12 16:21:47 current /bsd:  rip=0x81221d50 
rsp=0x81a06d80
Dec 12 16:21:47 current /bsd:  rflags=0x0246 (cf PF af ZF sf 
tf IF df of nt rf vm ac vif vip id IOPL=0)
Dec 12 16:21:47 current /bsd:  cr0=0x80010031 (PG cd nw am WP NE 
ET ts em mp PE)

Dec 12 16:21:47 current /bsd:  cr2=0x
Dec 12 16:21:47 current /bsd:  cr3=0x7f7d8000 (pwt pcd)
Dec 12 16:21:47 current /bsd:  cr4=0x26b0 (pke smap smep 
osxsave pcide fsgsbase smxe VMXE OSXMMEXCPT OSFXSR pce PGE mce PAE PSE 
de tsd pvi vme)

Dec 12 16:21:47 current /bsd:  --Guest Segment Info--
Dec 12 16:21:47 current /bsd:  cs=0x0008 rpl=0 base=0x 
limit=0x a/r=0xa09b
Dec 12 16:21:47 current /bsd:   granularity=1 dib=0 l(64 bit)=1 
present=1 sys=1 type=code, r/x, accessed
Dec 12 16:21:47 current /bsd:  ds=0x0010 rpl=0 base=0x 
limit=0x a/r=vmm_handle_cpuid: unsupported 
rax=0x4100

Dec 12 16:21:47 current /bsd: 0xa093
Dec 12 16:21:47 current /bsd:   granularity=1 dib=0 l(64 bit)=1 
present=1 sys=1 type=data, r/w, accessed
Dec 12 16:21:47 current /bsd:  es=0x0010 rpl=0 base=0x 
limit=0x a/r=0xa093
Dec 12 16:21:47 current /bsd:   granularity=1 dib=0 l(64 bit)=1 
present=1 sys=1 type=data, r/w, accessed
Dec 12 16:21:47 current /bsd:  fs=0x rpl=0 base=0x 
limit=0x a/r=0x1c000

Dec 12 16:21:47 current /bsd:   (unusable)
Dec 12 16:21:47 current /bsd:  gs=0x rpl=0 base=0x818f6ff0 
limit=0x a/r=0x1c000

Dec 12 16:21:47 current /bsd:   (unusable)
Dec 12 16:21:47 current /bsd:  ss=0x0010 rpl=0 base=0x 
limit=0x a/r=0xa093
Dec 12 16:21:47 current /bsd:   granularity=1 dib=0 l(64 bit)=1 
present=1 sys=1 type=data, r/w, accessed
Dec 12 16:21:47 current /bsd:  tr=0x0030 base=0x818f5000 
limit=0x0067 a/r=0x008b
Dec 12 16:21:47 current /bsd:   granularity=0 dib=0 l(64 bit)=0 
present=1 sys=0 type=tss (busy)
Dec 12 16:21:47 current /bsd:  gdtr base=0x818f5068 
limit=0x003f
Dec 12 16:21:47 current /bsd:  idtr base=0x8002 
limit=0x0fff
Dec 12 16:21:47 current /bsd:  ldtr=0x base=0x 
limit=0x a/r=0x1c000

Dec 12 16:21:47 current /bsd:   (unusable)
Dec 12 16:21:47 current /bsd: vmm_handle_cpuid: function 0x06 
(thermal/power mgt) not supported
Dec 12 16:21:47 current /bsd:  --Guest MSRs @ 0xfddea339d000 (paddr: 
0x005ea339d000)--
Dec 12 16:21:47 current /bsd:   MSR 0 @ 0xfddea339d000 : 0xc080 
(EFER
Dec 12 16:21:47 current /bsd: ), value=0x0d01 (SCE LME LMA 
NXE)
Dec 12 16:21:47 current /bsd:   MSR 1 @ 0xfddea339d010 : 0xc081 
(STAR), value=0x001b0008
Dec 12 16:21:47 current /bsd:   MSR 2 @ 0xfddea339d020 : 0xc082 
(LSTAR), value=0x813b8000
Dec 12 16:21:47 current /bsd:   MSR 3 @ 0xfddea339d030 : 0xc083 
(CSTAR), value=0x813ba000

Dec 12 16:21:47 current /bsd:   MSR 4 @
Dec 12 16:21:47 current /bsd: 0xfddea339d040 : 0xc084 (SFMASK), 
val

Dec 12 16:21:47 current /bsd: ue=0x000
Dec 12 16:21:47 current /bsd: 44701

Mischa





On 2022-12-10 23:51, Dave

Re: vmd(8): create a proper e820 bios memory map

2022-12-12 Thread Mischa

Hi Dave,

Great stuff!!
Everything is patched, built and booted.

What is the best way to test this?
Start a bunch of VMs with bsd.rd? Does this still need to be a 
decompressed bsd.rd?


Mischa

On 2022-12-10 23:51, Dave Voutila wrote:

tech@,

The below diff tweaks how vmd and vmm define memory ranges (adding a
"type" attribute) so we can properly build an e820 memory map to hand 
to

things like SeaBIOS or the OpenBSD ramdisk kernel (when direct booting
bsd.rd).

Why do it? We've been carrying a few patches to SeaBIOS in the ports
tree to hack around how vmd articulates some memory range details. By
finally implementing a proper bios memory map table we can drop some of
those patches. (Diff to ports@ coming shortly.)

Bonus is it cleans up how we were hacking a bios memory map for direct
booting ramdisk kernels.

Note: the below diff *will* work with the current SeaBIOS
(vmm-firmware), so you do *not* need to build the port.

You will, however, need to (see the rough sketch after this list):
- build, install, & reboot into a new kernel
- make sure you update /usr/include/amd64/vmmvar.h with a copy of or
  symlink to sys/arch/amd64/include/vmmvar.h
- rebuild & install vmctl
- rebuild & install vmd

This should *not* result in any behavioral changes of current vmd
guests. If you notice any, especially guests failing to start, please
rebuild a kernel with VMM_DEBUG to help diagnose the regression.
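
Building such a kernel is quick (a sketch; the config name is arbitrary):

  $ cd /usr/src/sys/arch/amd64/conf
  $ cp GENERIC.MP VMMDEBUG.MP
  $ echo 'option VMM_DEBUG' >> VMMDEBUG.MP
  $ config VMMDEBUG.MP
  $ cd ../compile/VMMDEBUG.MP && make && doas make install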

-dv

diff refs/heads/master refs/heads/vmd-e820
commit - a96642fb40af450c6576e205fab247cdbce0b5ed
commit + f3cb01998127d200e95ff9984a7503eb16c2a8d8
blob - 3f7e0ce405ae3c6b0b4a787de341839886f97436
blob + f2a464217838d3f0a50e4131b5b074b315e490fb
--- sys/arch/amd64/amd64/vmm.c
+++ sys/arch/amd64/amd64/vmm.c
@@ -1643,21 +1643,27 @@ vm_create_check_mem_ranges(struct 
vm_create_params *vc

const paddr_t maxgpa = VMM_MAX_VM_MEM_SIZE;

if (vcp->vcp_nmemranges == 0 ||
-   vcp->vcp_nmemranges > VMM_MAX_MEM_RANGES)
+   vcp->vcp_nmemranges > VMM_MAX_MEM_RANGES) {
+   DPRINTF("invalid number of guest memory ranges\n");
return (0);
+   }

for (i = 0; i < vcp->vcp_nmemranges; i++) {
vmr = &vcp->vcp_memranges[i];

/* Only page-aligned addresses and sizes are permitted */
if ((vmr->vmr_gpa & PAGE_MASK) || (vmr->vmr_va & PAGE_MASK) ||
-   (vmr->vmr_size & PAGE_MASK) || vmr->vmr_size == 0)
+   (vmr->vmr_size & PAGE_MASK) || vmr->vmr_size == 0) {
+   DPRINTF("memory range %zu is not page aligned\n", i);
return (0);
+   }

/* Make sure that VMM_MAX_VM_MEM_SIZE is not exceeded */
if (vmr->vmr_gpa >= maxgpa ||
-   vmr->vmr_size > maxgpa - vmr->vmr_gpa)
+   vmr->vmr_size > maxgpa - vmr->vmr_gpa) {
+   DPRINTF("exceeded max memory size\n");
return (0);
+   }

/*
 * Make sure that all virtual addresses are within the address
@@ -1667,39 +1673,55 @@ vm_create_check_mem_ranges(struct 
vm_create_params *vc

 */
if (vmr->vmr_va < VM_MIN_ADDRESS ||
vmr->vmr_va >= VM_MAXUSER_ADDRESS ||
-   vmr->vmr_size >= VM_MAXUSER_ADDRESS - vmr->vmr_va)
+   vmr->vmr_size >= VM_MAXUSER_ADDRESS - vmr->vmr_va) {
+   DPRINTF("guest va not within range or wraps\n");
return (0);
+   }

/*
 * Specifying ranges within the PCI MMIO space is forbidden.
 * Disallow ranges that start inside the MMIO space:
 * [VMM_PCI_MMIO_BAR_BASE .. VMM_PCI_MMIO_BAR_END]
 */
-   if (vmr->vmr_gpa >= VMM_PCI_MMIO_BAR_BASE &&
-   vmr->vmr_gpa <= VMM_PCI_MMIO_BAR_END)
+   if (vmr->vmr_type == VM_MEM_RAM &&
+   vmr->vmr_gpa >= VMM_PCI_MMIO_BAR_BASE &&
+   vmr->vmr_gpa <= VMM_PCI_MMIO_BAR_END) {
+   DPRINTF("guest RAM range %zu cannot being in mmio range"
+   " (gpa=0x%lx)\n", i, vmr->vmr_gpa);
return (0);
+   }

/*
 * ... and disallow ranges that end inside the MMIO space:
 * (VMM_PCI_MMIO_BAR_BASE .. VMM_PCI_MMIO_BAR_END]
 */
-   if (vmr->vmr_gpa + vmr->vmr_size > VMM_PCI_MMIO_BAR_BASE &&
-   vmr->vmr_gpa + vmr->vmr_size <= VMM_PCI_MMIO_BAR_END)
+   if (vmr->vmr_type == VM_MEM_RAM &&
+

Re: move vmd vioblk handling to another thread

2022-11-12 Thread Mischa

Hi David,

Updated the machine to latest snap and applied the patch.
The VMs I have on the machine I am testing with aren't booting properly.


OpenBSD/amd64 BOOT 3.53

boot>
NOTE: random seed is being reused.
booting hd0a:/bsd: 15549720+3695624+345456+0+1171456 
[1142327+128+1218000+922932]=0x16f0dc0

entry point at 0x81001000
[ using 3284416 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights 
reserved.
Copyright (c) 1995-2022 OpenBSD. All rights reserved.  
https://www.OpenBSD.org


OpenBSD 7.1 (GENERIC) #3: Sun May 15 10:25:28 MDT 2022

r...@syspatch-71-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

real mem = 1056952320 (1007MB)
avail mem = 1007792128 (961MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36e0 (10 entries)
bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
bios0: OpenBSD VMM
acpi at bios0 not configured
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz, 2500.01 MHz, 06-2d-07
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,HV,NXE,PAGE1GB,LONG,LAHF,ITSC,MD_CLEAR,MELTDOWN

cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
cpu0: using VERW MDS workaround
pvbus0 at mainbus0: OpenBSD
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
viornd0 at virtio0
virtio0: irq 3
virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio1: address fe:e1:bb:6a:00:04
virtio1: irq 5
virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio2
scsibus1 at vioblk0: 1 targets
sd0 at scsibus1 targ 0 lun 0: 
sd0: 51200MB, 512 bytes/sector, 104857600 sectors
virtio2: irq 6
virtio3 at pci0 dev 4 function 0 "OpenBSD VMM Control" rev 0x00
vmmci0 at virtio3
virtio3: irq 7
isa0 at mainbus0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
com0: console
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets


After this nothing happens.
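
For what it's worth, running vmd in the foreground with verbose output
usually shows what the device side is doing at that point (a sketch):

  $ doas rcctl stop vmd
  $ doas vmd -dvv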

Mischa


On 2022-11-11 16:52, David Gwynne wrote:

this updates a diff i had from a few years ago to move the vioblk
handling in vmd into a separate thread.

basically disk io in your virtual machine should not block the vcpu from
running now.

just throwing this out so people can give it a go and kick it around.
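
a simple way to feel the difference is to hammer the guest disk while
keeping an interactive console open (a sketch; the vm name is an example):

  $ doas vmctl console test
  (inside the guest, from a second session)
  $ dd if=/dev/zero of=/tmp/io.bin bs=1m count=4096
  # typing at the console should stay responsive while dd runs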

Index: Makefile
===
RCS file: /cvs/src/usr.sbin/vmd/Makefile,v
retrieving revision 1.28
diff -u -p -r1.28 Makefile
--- Makefile10 Nov 2022 11:46:39 -  1.28
+++ Makefile11 Nov 2022 15:51:50 -
@@ -5,7 +5,7 @@
 PROG=  vmd
 SRCS=  vmd.c control.c log.c priv.c proc.c config.c vmm.c
 SRCS+= vm.c loadfile_elf.c pci.c virtio.c i8259.c mc146818.c
-SRCS+= ns8250.c i8253.c dhcp.c packet.c mmio.c
+SRCS+= ns8250.c i8253.c dhcp.c packet.c mmio.c task.c
 SRCS+= parse.y atomicio.c vioscsi.c vioraw.c vioqcow2.c fw_cfg.c
 SRCS+= vm_agentx.c

Index: task.c
===
RCS file: task.c
diff -N task.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ task.c  11 Nov 2022 15:51:50 -
@@ -0,0 +1,158 @@
+/* $OpenBSD: task.c,v 1.2 2018/06/19 17:12:34 reyk Exp $ */
+
+/*
+ * Copyright (c) 2017 David Gwynne 
+ *
+ * Permission to use, copy, modify, and distribute this software for 
any
+ * purpose with or without fee is hereby granted, provided that the 
above

+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL 
WARRANTIES

+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE 
FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY 
DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN 
AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING 
OUT OF

+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "task.h"
+
+#define ISSET(_v, _m)  ((_v) & (_m))
+#define SET(_v, _m)((_v) |= (_m))
+#define CLR(_v, _m)((_v) &= ~(_m))
+
+struct taskq {
+   pthread_t thread;
+   struct task_list  list;
+   pthread_mutex_t   mtx;
+   pthread_cond_tcv;
+};
+
+#define TASK_ONQUEUE   (1 << 0)
+
+static void *taskq_run(void *);

Re: vmd(8) fix error handling when hitting rlimit

2022-02-27 Thread Mischa

Hi Dave,

This is great!
Had some surprises when allocating 32G, I think the limit is just a 
little bit lower than 32G, and indeed... lots of loud silence. :)
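
For anyone chasing the same surprise: the ceiling comes from the datasize
limit of the login class vmd runs under (a sketch; that this is the daemon
class is an assumption based on how rc starts daemons):

  $ ulimit -d                              # data limit of this shell, in KB
  $ grep -A 8 '^daemon:' /etc/login.conf   # look for datasize-cur/-max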


Mischa


On 2022-02-27 01:27, Dave Voutila wrote:
Following the discussion on misc@ and a diff from tedu@ [1], here's a bit
more work cleaning up the issue in vmd(8) to prevent vmd dying if a user
tries to create a vm with memory above the rlimit.

I changed tedu's diff a bit to make it less verbose and print a value
that's human readable using fmt_scaled(3). It also only prints the
message about data limits if the error was ENOMEM.

I've also incorporated some feedback off list and follow the error
condition further through vmd. Now vmctl receives a descriptive error as
well.

OK or feedback?

An example trying to start a vm named "test":

Before
==

vmctl:
  $ doas vmctl start -Lc -d vm/openbsd.qcow2 -m4G test
  vmctl: pipe closed

vmd:
  $ doas vmd -d
  startup
  test: could not allocate guest memory - exiting: Cannot allocate memory
  vmm: read vcp id
  priv exiting, pid 36084
  control exiting, pid 58866
  (vmd exits, a dove cries)

After
=

vmctl:
  $ doas vmctl start -Lc -d vm/openbsd.qcow2 -m4G test
  vmctl: start vm command failed: Cannot allocate memory

vmd:
  $ doas obj/vmd -d
  startup
  test: could not allocate guest memory (data limit is 4.0G)
  test: failed to start vm
  (vmd still alive)

-dv

[1] https://marc.info/?l=openbsd-misc=164581487723923=2


diff e7fa9d3941282eed56a0d5808179cb0e321faae6 /usr/src
blob - 4c6c99f1133cec7cb1e38dfd22e595e4d2023842
file + usr.sbin/vmd/vm.c
--- usr.sbin/vmd/vm.c
+++ usr.sbin/vmd/vm.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -292,17 +293,24 @@ start_vm(struct vmd_vm *vm, int fd)
ret = alloc_guest_mem(vcp);

if (ret) {
+   struct rlimit lim;
+   char buf[FMT_SCALED_STRSIZE];
+   if (ret == ENOMEM && getrlimit(RLIMIT_DATA, &lim) == 0) {
+   if (fmt_scaled(lim.rlim_cur, buf) == 0)
+   fatalx("could not allocate guest memory (data "
+   "limit is %s)", buf);
+   }
errno = ret;
-   fatal("could not allocate guest memory - exiting");
+   fatal("could not allocate guest memory");
}

ret = vmm_create_vm(vcp);
current_vm = vm;

/* send back the kernel-generated vm id (0 on error) */
-   if (write(fd, &vcp->vcp_id, sizeof(vcp->vcp_id)) !=
+   if (atomicio(vwrite, fd, &vcp->vcp_id, sizeof(vcp->vcp_id)) !=
sizeof(vcp->vcp_id))
-   fatal("write vcp id");
+   fatal("failed to send created vm id to vmm process");

if (ret) {
errno = ret;
@@ -319,10 +327,9 @@ start_vm(struct vmd_vm *vm, int fd)
fatal("pledge");

if (vm->vm_state & VM_STATE_RECEIVED) {
-   ret = read(vm->vm_receive_fd, &vrp, sizeof(vrp));
-   if (ret != sizeof(vrp)) {
+   ret = atomicio(read, vm->vm_receive_fd, &vrp, sizeof(vrp));
+   if (ret != sizeof(vrp))
fatal("received incomplete vrp - exiting");
-   }
vrs = vrp.vrwp_regs;
} else {
/*
blob - a60291c17f1f5bb94572f531bdc7e6b2f6b707d5
file + usr.sbin/vmd/vmd.c
--- usr.sbin/vmd/vmd.c
+++ usr.sbin/vmd/vmd.c
@@ -399,9 +399,9 @@ vmd_dispatch_vmm(int fd, struct privsep_proc *p, 
struc

}

if (vmr.vmr_result) {
+   log_warnx("%s: failed to start vm", vcp->vcp_name);
+   vm_remove(vm, __func__);
errno = vmr.vmr_result;
-   log_warn("%s: failed to start vm", vcp->vcp_name);
-   vm_remove(vm, __func__);
break;
}

blob - eb75b4c587884ec43704420ef4172386a5b39bd9
file + usr.sbin/vmd/vmm.c
--- usr.sbin/vmd/vmm.c
+++ usr.sbin/vmd/vmm.c
@@ -51,6 +51,7 @@

 #include "vmd.h"
 #include "vmm.h"
+#include "atomicio.h"

 void   vmm_sighdlr(int, short, void *);
 intvmm_start_vm(struct imsg *, uint32_t *, pid_t *);
@@ -145,7 +146,7 @@ vmm_dispatch_parent(int fd, struct privsep_proc *p, 
st

case IMSG_VMDOP_START_VM_END:
res = vmm_start_vm(imsg, &id, &pid);
/* Check if the ID can be mapped correctly */
-   if ((id = vm_id2vmid(id, NULL)) == 0)
+   if (res == 0 && (id = vm_id2vmid(id, NULL)) == 0)
res = ENOENT;
cmd = IMSG_VMDOP_START_VM_RESPONSE;
break;
@@ -615,7 +616,8 @@ vmm_start_vm(struct imsg *imsg, uint32_t *id, pid_t 
*p

struct vmd_vm   *vm;
int

Re: nsd 4.3.8

2021-10-20 Thread Mischa

Got it.

Oct 20 20:47:19 name2 nsd[62305]: nsd starting (NSD 4.3.8)
Oct 20 20:47:19 name2 nsd[37128]: nsd started (NSD 4.3.8), pid 31864
Oct 20 22:07:09 name2 nsd[37128]: signal received, shutting down...
Oct 20 22:07:09 name2 nsd[39445]: nsd starting (NSD 4.3.7)
Oct 20 22:07:10 name2 nsd[72021]: nsd started (NSD 4.3.7), pid 192

So far so good.

Mischa

On 2021-10-20 21:56, Florian Obser wrote:

I mean the diff I sent to bugs@ in response to the thread you started
on misc. "Re: NSD exit status 11 on 7.0"

This thread is about upgrading nsd in current, but we also need to fix
7.0. I thought you are running stable in production?


Anyway, having the full upgrade tested is also valuable, so thanks for
that. But if you are running stable please try the patch from bugs@, I
want to put that one into an errata.
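
In case it saves someone a step, applying a diff like that to a -stable
tree is just (a sketch; the diff file name is made up):

  $ cd /usr/src/usr.sbin/nsd
  $ patch -p0 < ~/nsd-cookie-fix.diff
  $ make obj && make && doas make install
  $ doas rcctl restart nsd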

On 20 October 2021 21:44:19 CEST, Mischa  wrote:

Is the below patch not needed?

I did run it without the below patch first, without any problems.
After that I applied the below patch and compiled again.

Mischa

On 2021-10-20 21:34, Florian Obser wrote:
Uhm, could you please try the single patch from the other mail on 
7.0?

We are probably not going to syspatch to a new nsd version in 7.0.

On 20 October 2021 21:18:17 CEST, Mischa Peters 
wrote:

Hi Florian,

Great stuff!
Applied both patches and NSD has been running without crashing since
20:47 CEST.

Oct 20 20:47:19 name2 nsd[62305]: nsd starting (NSD 4.3.8)
Oct 20 20:47:19 name2 nsd[37128]: nsd started (NSD 4.3.8), pid 31864
Oct 20 20:47:30 name2 /bsd: carp24: state transition: BACKUP -> 
MASTER
Oct 20 20:47:46 name2 /bsd: carp23: state transition: BACKUP -> 
MASTER


Thanx a lot for the quick patches!!

Mischa

On 2021-10-20 18:27, Florian Obser wrote:

On 2021-10-20 18:24 +02, Florian Obser  wrote:

+4.3.8
+
+FEATURES:
+   - Set default for answer-cookie to no. Because in server
deployments
+ with mixed server software, a default of yes causes issues.


sthen and me think that we shouldn’t flip-flop between cookie on and
cookie off since we shipped the cookie on default in 7.0.

This is on top of the 4.3.8 diff and reverts that behaviour to cookie on
as we have in 7.0.

OK?

diff --git nsd.conf.5.in nsd.conf.5.in
index 4ee4b1292f9..9ae376f288c 100644
--- nsd.conf.5.in
+++ nsd.conf.5.in
@@ -494,7 +494,7 @@ With the value 0 the rate is unlimited.
 .TP
 .B answer\-cookie:\fR 
 Enable to answer to requests containig DNS Cookies as specified in
RFC7873.
-Default is no.
+Default is yes.
 .TP
 .B cookie\-secret:\fR <128 bit hex string>
 Servers in an anycast deployment need to be able to  verify  each
other's DNS
diff --git options.c options.c
index 6411959e8c6..d8fe022b412 100644
--- options.c
+++ options.c
@@ -131,7 +131,7 @@ nsd_options_create(region_type* region)
opt->tls_service_pem = NULL;
opt->tls_port = TLS_PORT;
opt->tls_cert_bundle = NULL;
-   opt->answer_cookie = 0;
+   opt->answer_cookie = 1;
opt->cookie_secret = NULL;
opt->cookie_secret_file = CONFIGDIR"/nsd_cookiesecrets.txt";
opt->control_enable = 0;








Re: nsd 4.3.8

2021-10-20 Thread Mischa

Is the below patch not needed?

I did run it without the below patch first, without any problems.
After that I applied the below patch and compiled again.

Mischa

On 2021-10-20 21:34, Florian Obser wrote:

Uhm, could you please try the single patch from the other mail on 7.0?
We are probably not going to syspatch to a new nsd version in 7.0.

On 20 October 2021 21:18:17 CEST, Mischa Peters  
wrote:

Hi Florian,

Great stuff!
Applied both patches and NSD has been running without crashing since
20:47 CEST.

Oct 20 20:47:19 name2 nsd[62305]: nsd starting (NSD 4.3.8)
Oct 20 20:47:19 name2 nsd[37128]: nsd started (NSD 4.3.8), pid 31864
Oct 20 20:47:30 name2 /bsd: carp24: state transition: BACKUP -> MASTER
Oct 20 20:47:46 name2 /bsd: carp23: state transition: BACKUP -> MASTER

Thanx a lot for the quick patches!!

Mischa

On 2021-10-20 18:27, Florian Obser wrote:

On 2021-10-20 18:24 +02, Florian Obser  wrote:

+4.3.8
+
+FEATURES:
+	- Set default for answer-cookie to no. Because in server 
deployments

+ with mixed server software, a default of yes causes issues.


sthen and me think that we shouldn't flip-flop between cookie on and
cookie off since we shipped the cookie on default in 7.0.

This is on top of the 4.3.8 diff and reverts that behaviour to cookie
on
as we have in 7.0.

OK?

diff --git nsd.conf.5.in nsd.conf.5.in
index 4ee4b1292f9..9ae376f288c 100644
--- nsd.conf.5.in
+++ nsd.conf.5.in
@@ -494,7 +494,7 @@ With the value 0 the rate is unlimited.
 .TP
 .B answer\-cookie:\fR 
 Enable to answer to requests containig DNS Cookies as specified in
RFC7873.
-Default is no.
+Default is yes.
 .TP
 .B cookie\-secret:\fR <128 bit hex string>
 Servers in an anycast deployment need to be able to  verify  each
other's DNS
diff --git options.c options.c
index 6411959e8c6..d8fe022b412 100644
--- options.c
+++ options.c
@@ -131,7 +131,7 @@ nsd_options_create(region_type* region)
opt->tls_service_pem = NULL;
opt->tls_port = TLS_PORT;
opt->tls_cert_bundle = NULL;
-   opt->answer_cookie = 0;
+   opt->answer_cookie = 1;
opt->cookie_secret = NULL;
opt->cookie_secret_file = CONFIGDIR"/nsd_cookiesecrets.txt";
opt->control_enable = 0;






Re: nsd 4.3.8

2021-10-20 Thread Mischa Peters

Hi Florian,

Great stuff!
Applied both patches and NSD has been running without crashing since 
20:47 CEST.


Oct 20 20:47:19 name2 nsd[62305]: nsd starting (NSD 4.3.8)
Oct 20 20:47:19 name2 nsd[37128]: nsd started (NSD 4.3.8), pid 31864
Oct 20 20:47:30 name2 /bsd: carp24: state transition: BACKUP -> MASTER
Oct 20 20:47:46 name2 /bsd: carp23: state transition: BACKUP -> MASTER

Thanx a lot for the quick patches!!

Mischa

On 2021-10-20 18:27, Florian Obser wrote:

On 2021-10-20 18:24 +02, Florian Obser  wrote:

+4.3.8
+
+FEATURES:
+   - Set default for answer-cookie to no. Because in server deployments
+ with mixed server software, a default of yes causes issues.


sthen and me think that we shouldn't flip-flop between cookie on and
cookie off since we shipped the cookie on default in 7.0.

This is on top of the 4.3.8 diff and reverts that behaviour to cookie on
as we have in 7.0.

OK?

diff --git nsd.conf.5.in nsd.conf.5.in
index 4ee4b1292f9..9ae376f288c 100644
--- nsd.conf.5.in
+++ nsd.conf.5.in
@@ -494,7 +494,7 @@ With the value 0 the rate is unlimited.
 .TP
 .B answer\-cookie:\fR 
 Enable to answer to requests containig DNS Cookies as specified in 
RFC7873.

-Default is no.
+Default is yes.
 .TP
 .B cookie\-secret:\fR <128 bit hex string>
 Servers in an anycast deployment need to be able to  verify  each 
other's DNS

diff --git options.c options.c
index 6411959e8c6..d8fe022b412 100644
--- options.c
+++ options.c
@@ -131,7 +131,7 @@ nsd_options_create(region_type* region)
opt->tls_service_pem = NULL;
opt->tls_port = TLS_PORT;
opt->tls_cert_bundle = NULL;
-   opt->answer_cookie = 0;
+   opt->answer_cookie = 1;
opt->cookie_secret = NULL;
opt->cookie_secret_file = CONFIGDIR"/nsd_cookiesecrets.txt";
opt->control_enable = 0;




Re: vmd(8): simplify vcpu logic, removing uart & net reads

2021-07-05 Thread Mischa
Hi Dave,

> On 3 Jul 2021, at 19:08, Matthias Schmidt  wrote:
> 
> Hi Dave,
> 
> * Dave Voutila wrote:
>> Looking for some broader testing of the following diff. It cleans up
>> some complicated logic predominantly left over from the early days of
>> vmd prior to its having a dedicated device thread.
>> 
>> In summary, this diff:
>> 
>> - Removes vionet "rx pending" state handling and removes the code path
>>  for the vcpu thread to possibly take control of the virtio net device
>>  and attempt a read of the underlying tap(4). (virtio.{c,h}, vm.c)
>> 
>> - Removes ns8250 "rcv pending" state handling and removes the code path
>>  for the vcpu thread to read the pty via com_rcv(). (ns8250.{c,h})
>> 
>> In both of the above cases, the event handling thread will be notified
>> of readable data and deal with it.
>> 
>> Why remove them? The logic is overly complicated and hard to reason
>> about for zero gain. (This diff results in no intended functional
>> change.) Plus, some of the above logic I helped add to deal with the
>> race conditions and state corruption over a year ago. The logic was
>> needed once upon a time, but shouldn't be needed at present.
>> 
>> I've had positive testing feedback from abieber@ so far with at least
>> the ns8250/uart diff, but want to cast a broader net here with both
>> before either part is committed. I debated splitting these up, but
>> they're thematically related.
> 
> I have the diff running since one week on -current with stable/current
> and an Archlinux guest and have noticed no regression so far.
> 
> Cheers
> 
>   Matthias

No issues on my side as well.

Mischa



Re: vmd(8): fix vmctl wait state corruption

2021-04-26 Thread Mischa


> On 24 Apr 2021, at 20:56, Dave Voutila  wrote:
> 
> 
> Dave Voutila writes:
> 
>> Dave Voutila writes:
>> 
>>> vmd(8) users of tech@,
>>> 
>>> NOTE: I have no intention to try to commit this prior to 6.9's release
>>> due to its complexity, but I didn't want to "wait" to solicit testers or
>>> potential feedback.
>> 
>> Freeze is over, so bumping this thread with an updated diff below.
>> 
> 
> Now that there's been some testing and snaps are building once again,
> anyone willing to review & OK?

Wanted to confirm here as well. The patch works well.
Ran this patch against -current with ~30 VMs owned by a user account.

Issued vmctl stop -aw, ctrl-c-ed it every 3-4 VMs, and every time the last VM
still shut down properly.
Even in rapid succession of vmctl stop -aw + ctrl-c, resulting in multiple VMs
in the "stopping" stage, all worked well.
All VMs also started properly without any fsck needed.
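
The specific multi-waiter case is also easy to poke at directly (a sketch;
vm01 is an example):

  $ doas vmctl stop -w vm01 &     # first waiting client
  $ doas vmctl stop -w vm01       # second waiter; both should return cleanly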

Mischa

>>> I noticed recently that I could not have two vmctl(8) clients "wait" for
>>> the same vm to shutdown as one would cancel the other. Worse yet, if you
>>> cancel a wait (^C) you can effectively corrupt the state being used for
>>> tracking the waiting client preventing future clients from waiting on
>>> the vm.
>>> 
>>> It turns out the socket fd of the vmctl(8) client is being sent by the
>>> control process as the peerid in the imsg. This fd is being stored on
>>> the vmd_vm structure in the vm_peerid member, but this fd only has
>>> meaning in the scope of the control process. Consequently:
>>> 
>>> - only 1 value can be stored at a time, meaning only 1 waiting client
>>>  can exist at a time
>>> - since vm_peerid is used for storing if another vmd(8) process is
>>>  waiting on the vm, vm_peerid can be corrupted by vmctl(8)
>>> - the control process cannot update waiting state on vmctl disconnects
>>>  and since fd's are reused it's possible the message could be sent to a
>>>  vmctl(8) client performing an operation other than a "wait"
>>> 
>>> The below diff:
>>> 
>>> 1. enables support for multiple vmctl(8) clients to wait on the same vm
>>>   to terminate
>>> 2. keeps the wait state in the control process and out of the parent's
>>>   global vm state, tracking waiting parties in a TAILQ
>>> 3. removes the waiting client state on client disconnect/cancellation
>>> 4. simplifies vmd(8) by removing IMSG_VMDOP_WAIT_VM_REQUEST handling
>>>   from the vmm process, which isn't needed (and was partially
>>>   responsible for the corruption)
>>> 
>> 
>> Above design still stands, but I've fixed some messaging issues related
>> to the fact the parent process was forwarding
>> IMSG_VMDOP_TERMINATE_VM_RESPONSE messages directly to the control
>> process resulting in duplicate messages. This broke doing a `vmctl stop`
>> for all vms (-a) and waiting (-w). It now only forwards errors.
>> 
>>> There are some subsequent tweaks that may follow this diff, specifically
>>> one related to the fact I've switched the logic to send
>>> IMSG_VMDOP_TERMINATE_VM_EVENT messages to the control process (which
>>> makes sense to me) but control relays a IMSG_VMDOP_TERMINATE_VM_RESPONSE
>>> message to the waiting vmctl(8) client. I'd need to update vmctl(8) to
>>> look for the other event and don't want to complicate the diff further.
>>> 
>>> If any testers out there can try to break this for me it would be much
>>> appreciated. :-)
>>> 
>> 
>> Testers? I'd like to give people a few days to kick the tires before
>> asking for OK to commit.
>> 
>> -dv
>> 
>> 
>> Index: control.c
>> ===
>> RCS file: /cvs/src/usr.sbin/vmd/control.c,v
>> retrieving revision 1.34
>> diff -u -p -r1.34 control.c
>> --- control.c20 Apr 2021 21:11:56 -  1.34
>> +++ control.c21 Apr 2021 17:17:04 -
>> @@ -41,6 +41,13 @@
>> 
>> struct ctl_connlist ctl_conns = TAILQ_HEAD_INITIALIZER(ctl_conns);
>> 
>> +struct ctl_notify {
>> +int ctl_fd;
>> +uint32_tctl_vmid;
>> +TAILQ_ENTRY(ctl_notify) entry;
>> +};
>> +TAILQ_HEAD(ctl_notify_q, ctl_notify) ctl_notify_q =
>> +TAILQ_HEAD_INITIALIZER(ctl_notify_q);
>> void
>>   control_accept(int, short, void *);
>> struct ctl_conn
>> @@ -78,7 +85,10 @@ int
>> control_di

Re: vmm crash on 6.9-beta

2021-03-22 Thread Mischa
> On 22 Mar 2021, at 15:23, Otto Moerbeek  wrote:
> On Mon, Mar 22, 2021 at 03:20:37PM +0100, Mischa wrote:
>>> On 22 Mar 2021, at 15:18, Otto Moerbeek  wrote:
>>> On Mon, Mar 22, 2021 at 03:06:40PM +0100, Mischa wrote:
>>> 
>>>>> On 22 Mar 2021, at 15:05, Dave Voutila  wrote:
>>>>> Otto Moerbeek writes:
>>>>>> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote:
>>>>>>> Otto Moerbeek writes:
>>>>>>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
>>>>>>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson  
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>>>>>>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and 
>>>>>>>>>>>> waiting 240 seconds after each cycle.
>>>>>>>>>>>> Similar to the staggered start based on the amount of CPUs.
>>>>>>>>>> 
>>>>>>>>>>> For me this is not enough info to even try to reproduce, I know 
>>>>>>>>>>> little
>>>>>>>>>>> of vmm or vmd and have no idea what "derive" means in this context.
>>>>>>>>>> 
>>>>>>>>>> This is a big bit of information that was missing from the original
>>>>>>>>> 
>>>>>>>>> Well.. could have been better described indeed. :))
>>>>>>>>> " I created 41 additional VMs based on a single qcow2 base image.”
>>>>>>>>> 
>>>>>>>>>> report ;) qcow has a concept of a read-only base image (or 'backing
>>>>>>>>>> file') which can be shared between VMs, with writes diverted to a
>>>>>>>>>> separate image ('derived image').
>>>>>>>>>> 
>>>>>>>>>> So e.g. you can create a base image, do a simple OS install for a
>>>>>>>>>> particular OS version to that base image, then you stop using that
>>>>>>>>>> for a VM and just use it as a base to create derived images from.
>>>>>>>>>> You then run VMs using the derived image and make whatever config
>>>>>>>>>> changes. If you have a bunch of VMs using the same OS release then
>>>>>>>>>> you save some disk space for the common files.
>>>>>>>>>> 
>>>>>>>>>> Mischa did you leave a VM running which is working on the base
>>>>>>>>>> image directly? That would certainly cause problems.
>>>>>>>>> 
>>>>>>>>> I did indeed. Let me try that again without keeping the base image 
>>>>>>>>> running.
>>>>>>>> 
>>>>>>>> Right. As a safeguard, I would change the base image to be r/o.
>>>>>>> 
>>>>>>> vmd(8) should treating it r/o...the config process is responsible for
>>>>>>> opening the disk files and passing the fd's to the vm process. In
>>>>>>> config.c, the call to open(2) for the base images should be using the
>>>>>>> flags O_RDONLY | O_NONBLOCK.
>>>>>>> 
>>>>>>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new
>>>>>>> disk image I based off the "alpine.qcow2" image:
>>>>>>> 
>>>>>>> 20862 vmd  CALL  
>>>>>>> open(0x7f7d4370,0x26)
>>>>>>> 20862 vmd  NAMI  "/home/dave/vm/new.qcow2"
>>>>>>> 20862 vmd  RET   open 10/0xa
>>>>>>> 20862 vmd  CALL  fstat(10,0x7f7d42b8)
>>>>>>> 20862 vmd  STRU  struct stat { dev=1051, ino=19531847, 
>>>>>>> mode=-rw--- , nlink=1, uid=1000<"dave">, gid=1000<"dave">, 
>>>>>>> rdev=78096304, atime=1616420730<"Mar 22 09:45:30 2021">.509011764, 
>>>>>>> mtime=1616420697<"Mar 22 09:44:57 2021">.189185158, 
>>>>>>> ctime=1616420697<"Mar 22 09:44:57 2021">.189185158, size=262144, 
>>&

Re: vmm crash on 6.9-beta

2021-03-22 Thread Mischa



> On 22 Mar 2021, at 15:18, Otto Moerbeek  wrote:
> 
> On Mon, Mar 22, 2021 at 03:06:40PM +0100, Mischa wrote:
> 
>>> On 22 Mar 2021, at 15:05, Dave Voutila  wrote:
>>> Otto Moerbeek writes:
>>>> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote:
>>>>> Otto Moerbeek writes:
>>>>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
>>>>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson  
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>>>>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and 
>>>>>>>>>> waiting 240 seconds after each cycle.
>>>>>>>>>> Similar to the staggered start based on the amount of CPUs.
>>>>>>>> 
>>>>>>>>> For me this is not enough info to even try to reproduce, I know little
>>>>>>>>> of vmm or vmd and have no idea what "derive" means in this context.
>>>>>>>> 
>>>>>>>> This is a big bit of information that was missing from the original
>>>>>>> 
>>>>>>> Well.. could have been better described indeed. :))
>>>>>>> " I created 41 additional VMs based on a single qcow2 base image.”
>>>>>>> 
>>>>>>>> report ;) qcow has a concept of a read-only base image (or 'backing
>>>>>>>> file') which can be shared between VMs, with writes diverted to a
>>>>>>>> separate image ('derived image').
>>>>>>>> 
>>>>>>>> So e.g. you can create a base image, do a simple OS install for a
>>>>>>>> particular OS version to that base image, then you stop using that
>>>>>>>> for a VM and just use it as a base to create derived images from.
>>>>>>>> You then run VMs using the derived image and make whatever config
>>>>>>>> changes. If you have a bunch of VMs using the same OS release then
>>>>>>>> you save some disk space for the common files.
>>>>>>>> 
>>>>>>>> Mischa did you leave a VM running which is working on the base
>>>>>>>> image directly? That would certainly cause problems.
>>>>>>> 
>>>>>>> I did indeed. Let me try that again without keeping the base image 
>>>>>>> running.
>>>>>> 
>>>>>> Right. As a safeguard, I would change the base image to be r/o.
>>>>> 
>>>>> vmd(8) should treating it r/o...the config process is responsible for
>>>>> opening the disk files and passing the fd's to the vm process. In
>>>>> config.c, the call to open(2) for the base images should be using the
>>>>> flags O_RDONLY | O_NONBLOCK.
>>>>> 
>>>>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new
>>>>> disk image I based off the "alpine.qcow2" image:
>>>>> 
>>>>> 20862 vmd  CALL  open(0x7f7d4370,0x26)
>>>>> 20862 vmd  NAMI  "/home/dave/vm/new.qcow2"
>>>>> 20862 vmd  RET   open 10/0xa
>>>>> 20862 vmd  CALL  fstat(10,0x7f7d42b8)
>>>>> 20862 vmd  STRU  struct stat { dev=1051, ino=19531847, 
>>>>> mode=-rw--- , nlink=1, uid=1000<"dave">, gid=1000<"dave">, 
>>>>> rdev=78096304, atime=1616420730<"Mar 22 09:45:30 2021">.509011764, 
>>>>> mtime=1616420697<"Mar 22 09:44:57 2021">.189185158, ctime=1616420697<"Mar 
>>>>> 22 09:44:57 2021">.189185158, size=262144, blocks=256, blksize=32768, 
>>>>> flags=0x0, gen=0xb64d5d98 }
>>>>> 20862 vmd  RET   fstat 0
>>>>> 20862 vmd  CALL  kbind(0x7f7d39d8,24,0x2a9349e63ae9950c)
>>>>> 20862 vmd  RET   kbind 0
>>>>> 20862 vmd  CALL  pread(10,0x7f7d42a8,0x68,0)
>>>>> 20862 vmd  GIO   fd 10 read 104 bytes
>>>>>  
>>>>> "QFI\M-{\0\0\0\^C\0\0\0\0\0\0\0h\0\0\0\f\0\0\0\^P\0\0\0\^E\0\0\0\0\0\0\
>>>>>   
>>>>> \0\0\0\0\0(\0\0\0\0\0\^A\0\0\0\0\0\0\0\^B\0\0\0\0\0\^A\0\0\0\0\0\0\0\
>>>>>   
>

Re: vmm crash on 6.9-beta

2021-03-22 Thread Mischa
> On 22 Mar 2021, at 15:05, Dave Voutila  wrote:
> Otto Moerbeek writes:
>> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote:
>>> Otto Moerbeek writes:
>>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
>>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson  wrote:
>>>>>> 
>>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and 
>>>>>>>> waiting 240 seconds after each cycle.
>>>>>>>> Similar to the staggered start based on the amount of CPUs.
>>>>>> 
>>>>>>> For me this is not enough info to even try to reproduce, I know little
>>>>>>> of vmm or vmd and have no idea what "derive" means in this context.
>>>>>> 
>>>>>> This is a big bit of information that was missing from the original
>>>>> 
>>>>> Well.. could have been better described indeed. :))
>>>>> " I created 41 additional VMs based on a single qcow2 base image.”
>>>>> 
>>>>>> report ;) qcow has a concept of a read-only base image (or 'backing
>>>>>> file') which can be shared between VMs, with writes diverted to a
>>>>>> separate image ('derived image').
>>>>>> 
>>>>>> So e.g. you can create a base image, do a simple OS install for a
>>>>>> particular OS version to that base image, then you stop using that
>>>>>> for a VM and just use it as a base to create derived images from.
>>>>>> You then run VMs using the derived image and make whatever config
>>>>>> changes. If you have a bunch of VMs using the same OS release then
>>>>>> you save some disk space for the common files.
>>>>>> 
>>>>>> Mischa did you leave a VM running which is working on the base
>>>>>> image directly? That would certainly cause problems.
>>>>> 
>>>>> I did indeed. Let me try that again without keeping the base image 
>>>>> running.
>>>> 
>>>> Right. As a safeguard, I would change the base image to be r/o.
>>> 
>>> vmd(8) should treating it r/o...the config process is responsible for
>>> opening the disk files and passing the fd's to the vm process. In
>>> config.c, the call to open(2) for the base images should be using the
>>> flags O_RDONLY | O_NONBLOCK.
>>> 
>>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new
>>> disk image I based off the "alpine.qcow2" image:
>>> 
>>> 20862 vmd  CALL  open(0x7f7d4370,0x26)
>>> 20862 vmd  NAMI  "/home/dave/vm/new.qcow2"
>>> 20862 vmd  RET   open 10/0xa
>>> 20862 vmd  CALL  fstat(10,0x7f7d42b8)
>>> 20862 vmd  STRU  struct stat { dev=1051, ino=19531847, mode=-rw--- 
>>> , nlink=1, uid=1000<"dave">, gid=1000<"dave">, rdev=78096304, 
>>> atime=1616420730<"Mar 22 09:45:30 2021">.509011764, mtime=1616420697<"Mar 
>>> 22 09:44:57 2021">.189185158, ctime=1616420697<"Mar 22 09:44:57 
>>> 2021">.189185158, size=262144, blocks=256, blksize=32768, flags=0x0, 
>>> gen=0xb64d5d98 }
>>> 20862 vmd  RET   fstat 0
>>> 20862 vmd  CALL  kbind(0x7f7d39d8,24,0x2a9349e63ae9950c)
>>> 20862 vmd  RET   kbind 0
>>> 20862 vmd  CALL  pread(10,0x7f7d42a8,0x68,0)
>>> 20862 vmd  GIO   fd 10 read 104 bytes
>>>   
>>> "QFI\M-{\0\0\0\^C\0\0\0\0\0\0\0h\0\0\0\f\0\0\0\^P\0\0\0\^E\0\0\0\0\0\0\
>>>\0\0\0\0\0(\0\0\0\0\0\^A\0\0\0\0\0\0\0\^B\0\0\0\0\0\^A\0\0\0\0\0\0\0\
>>>
>>> \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^D\0\
>>>\0\0h"
>>> 20862 vmd  RET   pread 104/0x68
>>> 20862 vmd  CALL  pread(10,0x7f7d4770,0xc,0x68)
>>> 20862 vmd  GIO   fd 10 read 12 bytes
>>>   "alpine.qcow2"
>>> 20862 vmd  RET   pread 12/0xc
>>> 20862 vmd  CALL  kbind(0x7f7d39d8,24,0x2a9349e63ae9950c)
>>> 20862 vmd  RET   kbind 0
>>> 20862 vmd  CALL  kbind(0x7f7d39d8,24,0x2a9349e63ae9950c)
>>> 20862 vmd  RET   kbind 0
>>> 20862 vmd  CALL  __realpath(0x7f7d3ea0,0x7f7d3680)
>>> 20862 vmd  NAMI  "/home/dave/vm/alpine.qcow2"
>>> 20862 vmd  NAMI  "/home/dave/vm/alpine.qcow2"
>>> 20862 vmd  RET   __realpath 0
>>> 20862 vmd  CALL  open(0x7f7d4370,0x4)
>>> 20862 vmd  NAMI  "/home/dave/vm/alpine.qcow2"
>>> 20862 vmd  RET   open 11/0xb
>>> 20862 vmd  CALL  fstat(11,0x7f7d42b8)
>>> 
>>> 
>>> I'm more familiar with the vmd(8) codebase than any ffs stuff, but I
>>> don't think the issue is the base image being r/w.
>>> 
>>> -Dave
>> 
>> AFAIKS, the issue is that if you start a vm modifying the base because it
>> uses it as a regular image, that r/o open for the other vms does not
>> matter a lot,
>> 
>>  -OPtto
> 
> Good point. I'm going to look into the feasibility of having the
> control[1] process track what disks it's opened and in what mode to see
> if there's a way to build in some protection against this from
> happening.
> 
> [1] I mistakenly called it the "config" process earlier.

I guess that would help a lot of poor souls like myself to not make that 
mistake again. :)

Mischa



Re: vmm crash on 6.9-beta

2021-03-22 Thread Mischa
> On 22 Mar 2021, at 14:30, Otto Moerbeek  wrote:
> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
>>> On 22 Mar 2021, at 13:43, Stuart Henderson  wrote:
>>> 
>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and waiting 
>>>>> 240 seconds after each cycle.
>>>>> Similar to the staggered start based on the amount of CPUs.
>>> 
>>>> For me this is not enough info to even try to reproduce, I know little
>>>> of vmm or vmd and have no idea what "derive" means in this context.
>>> 
>>> This is a big bit of information that was missing from the original
>> 
>> Well.. could have been better described indeed. :))
>> " I created 41 additional VMs based on a single qcow2 base image.”
>> 
>>> report ;) qcow has a concept of a read-only base image (or 'backing
>>> file') which can be shared between VMs, with writes diverted to a
>>> separate image ('derived image').
>>> 
>>> So e.g. you can create a base image, do a simple OS install for a
>>> particular OS version to that base image, then you stop using that
>>> for a VM and just use it as a base to create derived images from.
>>> You then run VMs using the derived image and make whatever config
>>> changes. If you have a bunch of VMs using the same OS release then
>>> you save some disk space for the common files.
>>> 
>>> Mischa did you leave a VM running which is working on the base
>>> image directly? That would certainly cause problems.
>> 
>> I did indeed. Let me try that again without keeping the base image running.
> 
> Right. As a safeguard, I would change the base image to be r/o.
> 
> I was just looking at your script and scratching my head: why is Mischa
> starting vm01 ...
> 
>   -Otto

Normally I don’t use derived images; it was just a way to get a bunch of VMs
running quickly to put some more load on veb/vport.
I have moved vm01 out of the way to vm00 and redid the whole process.
Seems much better now.
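
For the record, the workflow that behaves is roughly (a sketch; the names
match my setup, and the chmod is just a guard against booting or writing
the base by accident):

  $ vmctl create -s 50G /var/vmm/vm00.qcow2      # install into this once
  $ vmctl create -b /var/vmm/vm00.qcow2 /var/vmm/vm01.qcow2
  $ vmctl create -b /var/vmm/vm00.qcow2 /var/vmm/vm02.qcow2
  $ chmod 400 /var/vmm/vm00.qcow2                # never boot or write it again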

Thank you all for showing me the way. :)

Mischa



Re: vmm crash on 6.9-beta

2021-03-22 Thread Mischa
> On 22 Mar 2021, at 14:27, Bryan Steele  wrote:
> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
>> 
>>> On 22 Mar 2021, at 13:43, Stuart Henderson  wrote:
>>> 
>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and waiting 
>>>>> 240 seconds after each cycle.
>>>>> Similar to the staggered start based on the amount of CPUs.
>>> 
>>>> For me this is not enough info to even try to reproduce, I know little
>>>> of vmm or vmd and have no idea what "derive" means in this context.
>>> 
>>> This is a big bit of information that was missing from the original
>> 
>> Well.. could have been better described indeed. :))
>> " I created 41 additional VMs based on a single qcow2 base image.”
>> 
>>> report ;) qcow has a concept of a read-only base image (or 'backing
>>> file') which can be shared between VMs, with writes diverted to a
>>> separate image ('derived image').
>>> 
>>> So e.g. you can create a base image, do a simple OS install for a
>>> particular OS version to that base image, then you stop using that
>>> for a VM and just use it as a base to create derived images from.
>>> You then run VMs using the derived image and make whatever config
>>> changes. If you have a bunch of VMs using the same OS release then
>>> you save some disk space for the common files.
>>> 
>>> Mischa did you leave a VM running which is working on the base
>>> image directly? That would certainly cause problems.
>> 
>> I did indeed. Let me try that again without keeping the base image running.
>> 
>> Mischa
> 
> I seemed to recall that the base image is not supposed to be modified,
> so this is a pretty big omission.
> 
> Per original commit message:
> 
> "A limitation of this format is that modifying the base image will
> corrupt the derived image."
> 
> https://marc.info/?l=openbsd-cvs=153901633011716=2

Makes a lot of sense. I guess a man page patch is in order. 

Mischa

Re: vmm crash on 6.9-beta

2021-03-22 Thread Mischa



> On 22 Mar 2021, at 13:43, Stuart Henderson  wrote:
> 
>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>>> Then I started all the VMs in four cycles, 10 VMs per cycle and waiting 240 
>>> seconds after each cycle.
>>> Similar to the staggered start based on the amount of CPUs.
> 
>> For me this is not enough info to even try to reproduce, I know little
>> of vmm or vmd and have no idea what "derive" means in this context.
> 
> This is a big bit of information that was missing from the original

Well.. could have been better described indeed. :))
" I created 41 additional VMs based on a single qcow2 base image.”

> report ;) qcow has a concept of a read-only base image (or 'backing
> file') which can be shared between VMs, with writes diverted to a
> separate image ('derived image').
> 
> So e.g. you can create a base image, do a simple OS install for a
> particular OS version to that base image, then you stop using that
> for a VM and just use it as a base to create derived images from.
> You then run VMs using the derived image and make whatever config
> changes. If you have a bunch of VMs using the same OS release then
> you save some disk space for the common files.
> 
> Mischa did you leave a VM running which is working on the base
> image directly? That would certainly cause problems.

I did indeed. Let me try that again without keeping the base image running.

Mischa

> 
> 
>> Would it be possiblet for you to show the exact steps (preferably a
>> script) to reproduce the issue?
>> 
>> Though the specific hardware might play a role as well...
>> 
>>  -Otto
>>> 
>>> Mischa
>>> 
>>> OpenBSD 6.9-beta (GENERIC.MP) #421: Sun Mar 21 13:17:22 MDT 2021
>>>dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>> real mem = 137374924800 (131010MB)
>>> avail mem = 133196165120 (127025MB)
>>> random: good seed from bootblocks
>>> mpath0 at root
>>> scsibus0 at mpath0: 256 targets
>>> mainbus0 at root
>>> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbf42c000 (99 entries)
>>> bios0: vendor Dell Inc. version "2.8.0" date 06/26/2019
>>> bios0: Dell Inc. PowerEdge R620
>>> acpi0 at bios0: ACPI 3.0
>>> acpi0: sleep states S0 S4 S5
>>> acpi0: tables DSDT FACP APIC SPCR HPET DMAR MCFG WD__ SLIC ERST HEST BERT 
>>> EINJ TCPA PC__ SRAT SSDT
>>> acpi0: wakeup devices PCI0(S5) PCI1(S5)
>>> acpitimer0 at acpi0: 3579545 Hz, 24 bits
>>> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
>>> cpu0 at mainbus0: apid 0 (boot processor)
>>> cpu0: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.34 MHz, 06-2d-07
>>> cpu0: 
>>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
>>> cpu0: 256KB 64b/line 8-way L2 cache
>>> cpu0: smt 0, core 0, package 0
>>> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
>>> cpu0: apic clock running at 99MHz
>>> cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
>>> cpu1 at mainbus0: apid 32 (application processor)
>>> cpu1: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 1200.02 MHz, 06-2d-07
>>> cpu1: 
>>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
>>> cpu1: 256KB 64b/line 8-way L2 cache
>>> cpu1: smt 0, core 0, package 1
>>> cpu2 at mainbus0: apid 2 (application processor)
>>> cpu2: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.03 MHz, 06-2d-07
>>> cpu2: 
>>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
>>> cpu2: 256KB 64b/line 8-way L2 cache
>>> cpu2: smt 0, core 1, package 0
>>> cpu3 at mainbus0: apid 34 (application processor)
>>> cpu3: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 26

Re: vmm crash on 6.9-beta

2021-03-22 Thread Mischa
> On 22 Mar 2021, at 13:08, Otto Moerbeek  wrote:
> On Mon, Mar 22, 2021 at 11:34:25AM +0100, Mischa wrote:
> 
>>> On 21 Mar 2021, at 02:31, Theo de Raadt  wrote:
>>> Otto Moerbeek  wrote:
>>>> On Fri, Mar 19, 2021 at 04:15:31PM +, Stuart Henderson wrote:
>>>> 
>>>>> On 2021/03/19 17:05, Jan Klemkow wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I had the same issue a few days ago a server hardware of mine.  I just
>>>>>> ran 'cvs up'.  So, it looks like a generic bug in FFS and not related to
>>>>>> vmm.
>>>>> 
>>>>> This panic generally relates to filesystem corruption. If fsck doesn't
>>>>> help then recreating which filesystem is triggering it is usually needed.
>>>> 
>>>> Yeah, once in a while we see reports of it. It seems to be some nasty
>>>> conspiracy between the generic filesystem code, ffs and fsck_ffs.
>>>> Maybe even the device (driver) itself is involved. A possible
>>>> underlying issue may be that some operation are re-ordered while they
>>>> should not.
>>> 
>>> Yes, it does hint at a reordering.
>>> 
>>>> Now the strange thing is, fsck_ffs *should* be able to repair the
>>>> inconsistency, but it appears in some cases it is not, and some bits
>>>> on the disk remain to trigger it again.
>>> 
>>> fsck_ffs can only repair one inconsistancy.  There are a number of lockstep
>>> operations, I suppose we can call them acid-in-lowercase, which allow fsck
>>> to determine at which point the crashed system gave up the ghost.  fsck then
>>> removes the partial operations, leaving a viable filesystem.  But if the 
>>> disk
>>> layer lands later writes but not earlier writes, fsck cannot handle it.
>> 
>> I managed to re-create the issue.
>> 
>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>> Then I started all the VMs in four cycles, 10 VMs per cycle and waiting 240 
>> seconds after each cycle.
>> Similar to the staggered start based on the amount of CPUs.
>> 
>> This time it was “only” one VM that was affected by this. VM four that got 
>> started.
>> 
>> ddb> show panic
>> ffs_valloc: dup alloc
>> ddb> trace
>> db_enter() at db_enter+0x10
>> panic(81dc21b2) at panic+0x12a
>> ffs_inode_alloc(fd803c94ef00,81a4,fd803f7bbf00,800014d728b8) at 
>> ffs
>> _inode_alloc+0x442
>> ufs_makeinode(81a4,fd803c930908,800014d72bb0,800014d72c00) at 
>> ufs_m
>> akeinode+0x7f
>> ufs_create(800014d72960) at ufs_create+0x3c
>> VOP_CREATE(fd803c930908,800014d72bb0,800014d72c00,800014d729c0)
>> at VOP_CREATE+0x4a
>> vn_open(800014d72b80,602,1a4) at vn_open+0x182
>> doopenat(8000c778,ff9c,f8fc28f00f4,601,1b6,800014d72d80) at 
>> doo
>> penat+0x1d0
>> syscall(800014d72df0) at syscall+0x315
>> Xsyscall() at Xsyscall+0x128
>> end of kernel
>> end trace frame: 0x7f7be450, count: -10
>> 
>> dmesg of the host below.
> 
> For me this is not enough info to even try to reproduce, I know little
> of vmm or vmd and have no idea what "derive" means in this context.
> 
> Would it be possiblet for you to show the exact steps (preferably a
> script) to reproduce the issue?

Hopefully the below helps.

If you do have vmd running, create a VM (qcow2 format) with the normal
installation process.
The base image I created with: vmctl create -s 50G /var/vmm/vm01.qcow2

I have dhcp set up so all the subsequent images will be able to pick up a
different IP address.

Once that is done replicate the vm.conf config for all the other VMs.
The config I used for the VMs is something like:

vm "vm01" {
disable
owner runbsd
memory 1G
disk "/var/vmm/vm01.qcow2" format qcow2
interface tap {
switch "uplink_veb911"
lladdr fe:e1:bb:d4:d4:01
}
}

I replicate them by running something like:
for i in $(jot 39 2); do vmctl create -b /var/vmm/vm01.qcow2 
/var/vmm/vm${i}.qcow2; done

This will use the vm01.qcow2 image as the base and create a derived image from
it, which means only changes will be applied to the new image.
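
A quick way to double-check which base a derived image points at (a sketch;
it relies on the backing file name sitting right after the qcow2 header, as
the pread in the ktrace output elsewhere in this thread shows):

  $ dd if=/var/vmm/vm02.qcow2 bs=512 count=1 2>/dev/null | strings | head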

I start them with the following script:

#!/bin/sh
SLEEP=240
CPU=$(($(sysctl -n hw.ncpuonline)-2))

COUNTER=0
for i in $(vmctl show | sort | awk '/ - / {print $9}' | xargs); do
VMS[${COUNTER}]=${i}
COUNTER=$((${COUNTER}+1))
done

CYCLES=$((${#VMS[*]}/${CPU}+1))
echo "Starting ${#VMS[*]} VMs on ${CPU} CPUs in ${CYCLES} cycle(s), waiting 
${SLEEP} seconds after each cycle."

COUNTER=0
for i in ${VMS[*]}; do
COUNTER=$((${COUNTER}+1))
vmctl start ${i}
if [ $COUNTER -eq $CPU ]; then
sleep ${SLEEP}
COUNTER=0
fi
done

This is to make sure they are “settled” and all processes are properly started
before starting the next batch of VMs.

> Though the specific hardware might play a role as well…

I can also provide you access to the host itself. 

Mischa



Re: vmm crash on 6.9-beta

2021-03-22 Thread Mischa
> On 21 Mar 2021, at 02:31, Theo de Raadt  wrote:
> Otto Moerbeek  wrote:
>> On Fri, Mar 19, 2021 at 04:15:31PM +, Stuart Henderson wrote:
>> 
>>> On 2021/03/19 17:05, Jan Klemkow wrote:
>>>> Hi,
>>>> 
>>>> I had the same issue a few days ago a server hardware of mine.  I just
>>>> ran 'cvs up'.  So, it looks like a generic bug in FFS and not related to
>>>> vmm.
>>> 
>>> This panic generally relates to filesystem corruption. If fsck doesn't
>>> help then recreating which filesystem is triggering it is usually needed.
>> 
>> Yeah, once in a while we see reports of it. It seems to be some nasty
>> conspiracy between the generic filesystem code, ffs and fsck_ffs.
>> Maybe even the device (driver) itself is involved. A possible
>> underlying issue may be that some operation are re-ordered while they
>> should not.
> 
> Yes, it does hint at a reordering.
> 
>> Now the strange thing is, fsck_ffs *should* be able to repair the
>> inconsistency, but it appears in some cases it is not, and some bits
>> on the disk remain to trigger it again.
> 
> fsck_ffs can only repair one inconsistancy.  There are a number of lockstep
> operations, I suppose we can call them acid-in-lowercase, which allow fsck
> to determine at which point the crashed system gave up the ghost.  fsck then
> removes the partial operations, leaving a viable filesystem.  But if the disk
> layer lands later writes but not earlier writes, fsck cannot handle it.

I managed to re-create the issue.

Created a fresh install qcow2 image and derived 35 new VMs from it.
Then I started all the VMs in four cycles, 10 VMs per cycle and waiting 240 
seconds after each cycle.
Similar to the staggered start based on the amount of CPUs.

This time “only” one VM was affected by this: the fourth VM that got started.

ddb> show panic
ffs_valloc: dup alloc
ddb> trace
db_enter() at db_enter+0x10
panic(81dc21b2) at panic+0x12a
ffs_inode_alloc(fd803c94ef00,81a4,fd803f7bbf00,800014d728b8) at ffs
_inode_alloc+0x442
ufs_makeinode(81a4,fd803c930908,800014d72bb0,800014d72c00) at ufs_m
akeinode+0x7f
ufs_create(800014d72960) at ufs_create+0x3c
VOP_CREATE(fd803c930908,800014d72bb0,800014d72c00,800014d729c0)
 at VOP_CREATE+0x4a
vn_open(800014d72b80,602,1a4) at vn_open+0x182
doopenat(8000c778,ff9c,f8fc28f00f4,601,1b6,800014d72d80) at doo
penat+0x1d0
syscall(800014d72df0) at syscall+0x315
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7be450, count: -10

dmesg of the host below.

Mischa

OpenBSD 6.9-beta (GENERIC.MP) #421: Sun Mar 21 13:17:22 MDT 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 137374924800 (131010MB)
avail mem = 133196165120 (127025MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbf42c000 (99 entries)
bios0: vendor Dell Inc. version "2.8.0" date 06/26/2019
bios0: Dell Inc. PowerEdge R620
acpi0 at bios0: ACPI 3.0
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP APIC SPCR HPET DMAR MCFG WD__ SLIC ERST HEST BERT EINJ 
TCPA PC__ SRAT SSDT
acpi0: wakeup devices PCI0(S5) PCI1(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.34 MHz, 06-2d-07
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 32 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 1200.02 MHz, 06-2d-07
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 0, package 1
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.03 MHz, 06-2d-07
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PB

Re: vmm crash on 6.9-beta

2021-03-19 Thread Mischa
> On 16 Mar 2021, at 21:17, Mischa  wrote:
> On 13 Mar at 09:17, Otto Moerbeek  wrote:
>> On Sat, Mar 13, 2021 at 12:08:52AM -0800, Mike Larkin wrote:
>>> On Wed, Mar 10, 2021 at 08:30:32PM +0100, Mischa wrote:
>>>> On 10 Mar at 18:59, Mike Larkin  wrote:
>>>>> On Wed, Mar 10, 2021 at 03:08:21PM +0100, Mischa wrote:
>>>>>> Hi All,
>>>>>> 
>>>>>> Currently I am running 6.9-beta on one of my hosts to test 
>>>>>> veb(4)/vport(4).
>>>>>> 
>>>>>> root@server14:~ # sysctl kern.version
>>>>>> kern.version=OpenBSD 6.9-beta (GENERIC.MP) #385: Mon Mar  8 12:57:12 MST 
>>>>>> 2021
>>>>>>dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>>>>> 
>>>>>> On order to add some load to the system I created 41 additional VMs 
>>>>>> based on a single qcow2 base image.
>>>>>> A couple of those VMs crashed with the following ddb output.
>>>>>> 
>>>>>> ddb> show panic
>>>>>> ffs_valloc: dup alloc
>>>>>> ddb> trace
>>>>>> db_enter() at db_enter+0x10
>>>>>> panic(81dc0709) at panic+0x12a
>>>>>> ffs_inode_alloc(fd80269831e0,8180,fd803f7bb540,800014e1e3e8) 
>>>>>> at ffs
>>>>>> _inode_alloc+0x442
>>>>>> ufs_makeinode(8180,fd8026a386a0,800014e1e6e0,800014e1e730) 
>>>>>> at ufs_m
>>>>>> akeinode+0x7f
>>>>>> ufs_create(800014e1e490) at ufs_create+0x3c
>>>>>> VOP_CREATE(fd8026a386a0,800014e1e6e0,800014e1e730,800014e1e4f0)
>>>>>> at VOP_CREATE+0x4a
>>>>>> vn_open(800014e1e6b0,10602,180) at vn_open+0x182
>>>>>> doopenat(800014e8a518,ff9c,70e0e92a500,10601,1b6,800014e1e8b0)
>>>>>>  at d
>>>>>> oopenat+0x1d0
>>>>>> syscall(800014e1e920) at syscall+0x315
>>>>>> Xsyscall() at Xsyscall+0x128
>>>>>> end of kernel
>>>>>> end trace frame: 0x7f7e5000, count: -10
>>>>>> 
>>>>>> Mischa
>>>>>> 
>>>>> 
>>>>> Probably not vmm(4) related but thanks for reporting!
>>>> 
>>>> Could it be qcow2 related? or is this general disk? At least that is what 
>>>> I think ffs_ is. :)
>>>> 
>>>> Mischa
>>>> 
>>> 
>>> likely completely unrelated to anything vmd(8) is doing.
>>> 
>> 
>> Appart form kernel/ffs bugs, a dup alloc can also be caused by an
>> inconsistent fs.  Please run a *forced* (-f) fsck on the fs. (after
>> unmounting of course).
>> 
>>  -Otto
> 
> Thanx Otto, that indeed did the trick.
> It hasn't happened since and veb/vport seems to hold well.

Was running pkg_add, and during the extraction of automake the VM hung with the 
same panic/trace.

root@server14:~ # vmctl console vm11
Connected to /dev/ttyp8 (speed 115200)

ddb> show panic
ffs_valloc: dup alloc
ddb> trace
db_enter() at db_enter+0x10
panic(81dc5a29) at panic+0x12a
ffs_inode_alloc(fd8035819e20,41ed,fd803f7bb780,800014e84ac8) at ffs
_inode_alloc+0x442
ufs_mkdir(800014e84b20) at ufs_mkdir+0x9e
VOP_MKDIR(fd8038c3b820,800014e84c58,800014e84ca8,800014e84b88) a
t VOP_MKDIR+0x50
domkdirat(80008ef0,ff9c,f7e8cefbd80,1ff) at domkdirat+0xf6
syscall(800014e84e20) at syscall+0x315
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7c6480, count: -8
ddb> 

After running fsck the VM isn’t right anymore; the boot output now shows:

reordering libraries:install: unknown group bin
install: unknown group bin
 failed.
install: unknown group utmp
starting early daemons: syslogd pflogd ntpd.
starting RPC daemons:.
savecore: no core dump
checking quotas: done.
kvm_mkdb: can't find kmem group: Undefined error: 0
chown: group is invalid: wheel
clearing /tmp

Will spin up a non-derived VM to see if this makes a difference.

Mischa





Re: vmm crash on 6.9-beta

2021-03-16 Thread Mischa
On 13 Mar at 09:17, Otto Moerbeek  wrote:
> On Sat, Mar 13, 2021 at 12:08:52AM -0800, Mike Larkin wrote:
> > On Wed, Mar 10, 2021 at 08:30:32PM +0100, Mischa wrote:
> > > On 10 Mar at 18:59, Mike Larkin  wrote:
> > > > On Wed, Mar 10, 2021 at 03:08:21PM +0100, Mischa wrote:
> > > > > Hi All,
> > > > >
> > > > > Currently I am running 6.9-beta on one of my hosts to test 
> > > > > veb(4)/vport(4).
> > > > >
> > > > > root@server14:~ # sysctl kern.version
> > > > > kern.version=OpenBSD 6.9-beta (GENERIC.MP) #385: Mon Mar  8 12:57:12 
> > > > > MST 2021
> > > > > 
> > > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > >
> > > > > On order to add some load to the system I created 41 additional VMs 
> > > > > based on a single qcow2 base image.
> > > > > A couple of those VMs crashed with the following ddb output.
> > > > >
> > > > > ddb> show panic
> > > > > ffs_valloc: dup alloc
> > > > > ddb> trace
> > > > > db_enter() at db_enter+0x10
> > > > > panic(81dc0709) at panic+0x12a
> > > > > ffs_inode_alloc(fd80269831e0,8180,fd803f7bb540,800014e1e3e8)
> > > > >  at ffs
> > > > > _inode_alloc+0x442
> > > > > ufs_makeinode(8180,fd8026a386a0,800014e1e6e0,800014e1e730)
> > > > >  at ufs_m
> > > > > akeinode+0x7f
> > > > > ufs_create(800014e1e490) at ufs_create+0x3c
> > > > > VOP_CREATE(fd8026a386a0,800014e1e6e0,800014e1e730,800014e1e4f0)
> > > > >  at VOP_CREATE+0x4a
> > > > > vn_open(800014e1e6b0,10602,180) at vn_open+0x182
> > > > > doopenat(800014e8a518,ff9c,70e0e92a500,10601,1b6,800014e1e8b0)
> > > > >  at d
> > > > > oopenat+0x1d0
> > > > > syscall(800014e1e920) at syscall+0x315
> > > > > Xsyscall() at Xsyscall+0x128
> > > > > end of kernel
> > > > > end trace frame: 0x7f7e5000, count: -10
> > > > >
> > > > > Mischa
> > > > >
> > > >
> > > > Probably not vmm(4) related but thanks for reporting!
> > >
> > > Could it be qcow2 related? or is this general disk? At least that is what 
> > > I think ffs_ is. :)
> > >
> > > Mischa
> > >
> > 
> > likely completely unrelated to anything vmd(8) is doing.
> > 
> 
> Appart form kernel/ffs bugs, a dup alloc can also be caused by an
> inconsistent fs.  Please run a *forced* (-f) fsck on the fs. (after
> unmounting of course).
> 
>   -Otto

Thanx Otto, that indeed did the trick.
It hasn't happened since and veb/vport seems to hold well.
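
For anyone hitting the same panic, a rough sketch of the forced check Otto 
suggested (the device and mount point are just examples, not the actual layout 
in the VM):

# umount /home/vm              # unmount the affected filesystem first
# fsck -f /dev/sd0a            # -f forces the check even if marked clean
# mount /home/vm

The filesystem has to be unmounted first (or the VM booted into bsd.rd or 
single-user mode) before running fsck -f on it.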

Mischa



Re: vmm crash on 6.9-beta

2021-03-10 Thread Mischa
On 10 Mar at 18:59, Mike Larkin  wrote:
> On Wed, Mar 10, 2021 at 03:08:21PM +0100, Mischa wrote:
> > Hi All,
> >
> > Currently I am running 6.9-beta on one of my hosts to test veb(4)/vport(4).
> >
> > root@server14:~ # sysctl kern.version
> > kern.version=OpenBSD 6.9-beta (GENERIC.MP) #385: Mon Mar  8 12:57:12 MST 
> > 2021
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >
> > On order to add some load to the system I created 41 additional VMs based 
> > on a single qcow2 base image.
> > A couple of those VMs crashed with the following ddb output.
> >
> > ddb> show panic
> > ffs_valloc: dup alloc
> > ddb> trace
> > db_enter() at db_enter+0x10
> > panic(81dc0709) at panic+0x12a
> > ffs_inode_alloc(fd80269831e0,8180,fd803f7bb540,800014e1e3e8) at 
> > ffs
> > _inode_alloc+0x442
> > ufs_makeinode(8180,fd8026a386a0,800014e1e6e0,800014e1e730) at 
> > ufs_m
> > akeinode+0x7f
> > ufs_create(800014e1e490) at ufs_create+0x3c
> > VOP_CREATE(fd8026a386a0,800014e1e6e0,800014e1e730,800014e1e4f0)
> >  at VOP_CREATE+0x4a
> > vn_open(800014e1e6b0,10602,180) at vn_open+0x182
> > doopenat(800014e8a518,ff9c,70e0e92a500,10601,1b6,800014e1e8b0) 
> > at d
> > oopenat+0x1d0
> > syscall(800014e1e920) at syscall+0x315
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x7f7e5000, count: -10
> >
> > Mischa
> >
> 
> Probably not vmm(4) related but thanks for reporting!

Could it be qcow2 related? Or is this the general disk layer? At least that is 
what I think ffs_ is. :)

Mischa



vmm crash on 6.9-beta

2021-03-10 Thread Mischa
Hi All,

Currently I am running 6.9-beta on one of my hosts to test veb(4)/vport(4).

root@server14:~ # sysctl kern.version
kern.version=OpenBSD 6.9-beta (GENERIC.MP) #385: Mon Mar  8 12:57:12 MST 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

In order to add some load to the system I created 41 additional VMs based on a 
single qcow2 base image.
A couple of those VMs crashed with the following ddb output.

ddb> show panic
ffs_valloc: dup alloc
ddb> trace
db_enter() at db_enter+0x10
panic(81dc0709) at panic+0x12a
ffs_inode_alloc(fd80269831e0,8180,fd803f7bb540,800014e1e3e8) at ffs
_inode_alloc+0x442
ufs_makeinode(8180,fd8026a386a0,800014e1e6e0,800014e1e730) at ufs_m
akeinode+0x7f
ufs_create(800014e1e490) at ufs_create+0x3c
VOP_CREATE(fd8026a386a0,800014e1e6e0,800014e1e730,800014e1e4f0)
 at VOP_CREATE+0x4a
vn_open(800014e1e6b0,10602,180) at vn_open+0x182
doopenat(800014e8a518,ff9c,70e0e92a500,10601,1b6,800014e1e8b0) at d
oopenat+0x1d0
syscall(800014e1e920) at syscall+0x315
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7e5000, count: -10

Mischa



Re: Port httpd(8) 'strip' directive to relayd(8)

2021-01-04 Thread Mischa
By no means an official OK, but would love to see this in relayd!
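
If I read the grammar in the diff correctly (PATH STRIP NUMBER as a filter rule 
option), usage in relayd.conf would look something like this; the protocol and 
relay names and the addresses are made up:

http protocol "web" {
        # my reading of the new directive: strip the first path component
        # before forwarding, e.g. /app/index.html -> /index.html
        match request path strip 1
}

relay "www" {
        listen on 203.0.113.1 port 80
        protocol "web"
        forward to 192.0.2.10 port 8080
}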

Mischa

> On 3 Jan 2021, at 11:40, Denis Fondras  wrote:
> 
> Le Fri, Dec 11, 2020 at 10:53:56AM +, Olivier Cherrier a écrit :
>> 
>>  Hello tech@,
>> 
>> Is there any interest for this feature to be commited?
>> I find it very useful. Thank you Denis!
>> 
> 
> Here is an up to date diff, looking for OKs.
> 
> Index: parse.y
> ===
> RCS file: /cvs/src/usr.sbin/relayd/parse.y,v
> retrieving revision 1.250
> diff -u -p -r1.250 parse.y
> --- parse.y   29 Dec 2020 19:48:06 -  1.250
> +++ parse.y   3 Jan 2021 10:38:26 -
> @@ -175,7 +175,7 @@ typedef struct {
> %tokenLOOKUP METHOD MODE NAT NO DESTINATION NODELAY NOTHING ON PARENT 
> PATH
> %tokenPFTAG PORT PREFORK PRIORITY PROTO QUERYSTR REAL REDIRECT RELAY 
> REMOVE
> %tokenREQUEST RESPONSE RETRY QUICK RETURN ROUNDROBIN ROUTE SACK 
> SCRIPT SEND
> -%token   SESSION SOCKET SPLICE SSL STICKYADDR STYLE TABLE TAG TAGGED TCP
> +%token   SESSION SOCKET SPLICE SSL STICKYADDR STRIP STYLE TABLE TAG 
> TAGGED TCP
> %tokenTIMEOUT TLS TO ROUTER RTLABEL TRANSPARENT URL WITH TTL RTABLE
> %tokenMATCH PARAMS RANDOM LEASTSTATES SRCHASH KEY CERTIFICATE 
> PASSWORD ECDHE
> %tokenEDH TICKETS CONNECTION CONNECTIONS CONTEXT ERRORS STATE CHANGES 
> CHECKS
> @@ -1549,6 +1549,20 @@ ruleopts   : METHOD STRING 
> {
>   rule->rule_kv[keytype].kv_option = $2;
>   rule->rule_kv[keytype].kv_type = keytype;
>   }
> + | PATH STRIP NUMBER {
> + char*strip = NULL;
> +
> + if ($3 < 0 || $3 > INT_MAX) {
> + yyerror("invalid strip number");
> + YYERROR;
> + }
> + if (asprintf(&strip, "%lld", $3) <= 0)
> + fatal("can't parse strip");
> + keytype = KEY_TYPE_PATH;
> + rule->rule_kv[keytype].kv_option = KEY_OPTION_STRIP;
> + rule->rule_kv[keytype].kv_value = strip;
> + rule->rule_kv[keytype].kv_type = keytype;
> + }
>   | QUERYSTR key_option STRING value  {
>   switch ($2) {
>   case KEY_OPTION_APPEND:
> @@ -2481,6 +2495,7 @@ lookup(char *s)
>   { "ssl",SSL },
>   { "state",  STATE },
>   { "sticky-address", STICKYADDR },
> + { "strip",  STRIP },
>   { "style",  STYLE },
>   { "table",  TABLE },
>   { "tag",TAG },
> Index: relay.c
> ===
> RCS file: /cvs/src/usr.sbin/relayd/relay.c,v
> retrieving revision 1.251
> diff -u -p -r1.251 relay.c
> --- relay.c   14 May 2020 17:27:38 -  1.251
> +++ relay.c   3 Jan 2021 10:38:27 -
> @@ -214,6 +214,9 @@ relay_ruledebug(struct relay_rule *rule)
>   case KEY_OPTION_LOG:
>   fprintf(stderr, "log ");
>   break;
> + case KEY_OPTION_STRIP:
> + fprintf(stderr, "strip ");
> + break;
>   case KEY_OPTION_NONE:
>   break;
>   }
> @@ -227,13 +230,15 @@ relay_ruledebug(struct relay_rule *rule)
>   break;
>   }
> 
> + int kvv = (kv->kv_option == KEY_OPTION_STRIP ||
> +  kv->kv_value == NULL);
>   fprintf(stderr, "%s%s%s%s%s%s ",
>   kv->kv_key == NULL ? "" : "\"",
>   kv->kv_key == NULL ? "" : kv->kv_key,
>   kv->kv_key == NULL ? "" : "\"",
> - kv->kv_value == NULL ? "" : " value \"",
> + kvv ? "" : " value \"",
>   kv->kv_value == NULL ? "" : kv->kv_value,
> - kv->kv_value == NULL ? "" : "\"");
> + kvv ? "" : "\"");
>   }
> 
>   if (rule->rule_tablename[0])
> Index: rela

Re: iwm (7260) on APU2 fails on -current #601

2020-01-21 Thread Mischa



> On 21 Jan 2020, at 11:57, Stefan Sperling  wrote:
> 
> On Tue, Jan 21, 2020 at 11:34:28AM +0100, Mischa wrote:
>> Hi All,
>> 
>> I have an APU2 with a iwm card which keeps on acting up on a regular basis.
>> 
>> apu2# dmesg | grep iwm
>> iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless AC 7260" rev 0x73, 
>> msi
>> iwm0: hw rev 0x140, fw ver 17.3216344376.0, address f8:16:54:06:b9:a9
>> 
>> After the APU has booted properly it fails to continue to use the device 
>> pretty quickly.
>> Message like below are shown on the console:
>> 
>> iwm0: device timeout
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> iwm0: acquiring device failed
>> 
>> Managed to capture the above when running: sysupgrade
>> Is this a hardware problem?
> 
> Not sure. I would try moving the card to another minipcie slot.

Good idea. Will give that a try. Thanx!

Mischa



iwm (7260) on APU2 fails on -current #601

2020-01-21 Thread Mischa
Hi All,

I have an APU2 with an iwm card which keeps acting up on a regular basis.

apu2# dmesg | grep iwm
iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless AC 7260" rev 0x73, msi
iwm0: hw rev 0x140, fw ver 17.3216344376.0, address f8:16:54:06:b9:a9

After the APU has booted properly it quickly stops being able to use the 
device.
Messages like the ones below are shown on the console:

iwm0: device timeout
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed
iwm0: acquiring device failed

Managed to capture the above when running: sysupgrade
Is this a hardware problem?

Mischa

apu2# dmesg
OpenBSD 6.6-current (GENERIC.MP) #601: Sun Jan 12 22:51:04 MST 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4259917824 (4062MB)
avail mem = 4118347776 (3927MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xcfe96020 (13 entries)
bios0: vendor coreboot version "v4.10.0.3" date 11/07/2019
bios0: PC Engines apu2
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP SSDT MCFG TPM2 APIC HEST IVRS SSDT SSDT HPET
acpi0: wakeup devices PWRB(S4) PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PBR8(S4) 
UOH1(S3) UOH2(
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf800, bus 0-64
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD GX-412TC SOC, 998.28 MHz, 16-30-01
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,F
cpu0: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 
16-way L2 cac
cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD GX-412TC SOC, 998.14 MHz, 16-30-01
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,F
cpu1: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 
16-way L2 cac
cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu1: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: AMD GX-412TC SOC, 998.14 MHz, 16-30-01
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,F
cpu2: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 
16-way L2 cac
cpu2: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu2: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: AMD GX-412TC SOC, 998.14 MHz, 16-30-01
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,F
cpu3: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 
16-way L2 cac
cpu3: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu3: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 4 pa 0xfec0, version 21, 24 pins
ioapic1 at mainbus0: apid 5 pa 0xfec2, version 21, 32 pins, remapped
acpihpet0 at acpi0: 14318180 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (PBR4)
acpiprt2 at acpi0: bus 2 (PBR5)
acpiprt3 at acpi0: bus 3 (PBR6)
acpiprt4 at acpi0: bus 4 (PBR7)
acpiprt5 at acpi0: bus -1 (PBR8)
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
acpibtn0 at acpi0: PWRB
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
extent `acpipci0 pcibus' (0x0 - 0xff), flags=0
extent `acpipci0 pciio' (0x0 - 0x), flags=0
 0xcf8 - 0xcff
 0x1 - 0x
extent `acpipci0 pcimem' (0x0 - 0x), flags=0
 0x0 - 0x9
 0xe - 0xcfff
 0x1 - 0x
acpicmos0 at acpi0
amdgpio0 at acpi0: GPIO uid 0 addr 0xfed81500/0x300 irq 7, 184 pins
"PRP0001" 

Re: [PATCH] staggered start of vms in vm.conf

2019-12-08 Thread Mischa Peters



> On 8 Dec 2019, at 11:08, Pratik Vyas  wrote:
> 
> Hi!
> 
> This is an attempt to address 'thundering herd' problem when a lot of
> vms are configured in vm.conf.  A lot of vms booting in parallel can
> overload the host and also mess up tsc calibration in openbsd guests as
> it uses PIT which doesn't fire reliably if the host is overloaded.
> 
> 
> This diff makes vmd start vms in a staggered fashion with default parallelism 
> of
> number of cpus on the host and a delay of 30s.  Default can be overridden with
> a line like following in vm.conf
> 
> staggered start parallel 4 delay 30
> 
> 
> Every non-disabled vm starts in waiting state.  If you are eager to
> start a vm that is way further in the list, you can vmctl start it.
> 
> Discussed the idea with ori@, mlarkin@ and phessler@.
> 
> Comments / ok? 

Great addition to stop -w. Like it!

Mischa


> --
> Pratik
> 
> Index: usr.sbin/vmctl/vmctl.c
> ===
> RCS file: /home/cvs/src/usr.sbin/vmctl/vmctl.c,v
> retrieving revision 1.71
> diff -u -p -a -u -r1.71 vmctl.c
> --- usr.sbin/vmctl/vmctl.c7 Sep 2019 09:11:14 -1.71
> +++ usr.sbin/vmctl/vmctl.c8 Dec 2019 09:29:39 -
> @@ -716,6 +716,8 @@ vm_state(unsigned int mask)
> {
>if (mask & VM_STATE_PAUSED)
>return "paused";
> +else if (mask & VM_STATE_WAITING)
> +return "waiting";
>else if (mask & VM_STATE_RUNNING)
>return "running";
>else if (mask & VM_STATE_SHUTDOWN)
> Index: usr.sbin/vmd/parse.y
> ===
> RCS file: /home/cvs/src/usr.sbin/vmd/parse.y,v
> retrieving revision 1.52
> diff -u -p -a -u -r1.52 parse.y
> --- usr.sbin/vmd/parse.y14 May 2019 06:05:45 -1.52
> +++ usr.sbin/vmd/parse.y8 Dec 2019 09:29:39 -
> @@ -122,7 +122,8 @@ typedef struct {
> %tokenINCLUDE ERROR
> %tokenADD ALLOW BOOT CDROM DEVICE DISABLE DISK DOWN ENABLE FORMAT GROUP
> %tokenINET6 INSTANCE INTERFACE LLADDR LOCAL LOCKED MEMORY NET NIFS OWNER
> -%tokenPATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP VM VMID
> +%tokenPATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP VM VMID STAGGERED START
> +%token  PARALLEL DELAY
> %tokenNUMBER
> %tokenSTRING
> %typelladdr
> @@ -217,6 +218,11 @@ main: LOCAL INET6 {
>env->vmd_ps.ps_csock.cs_uid = $3.uid;
>env->vmd_ps.ps_csock.cs_gid = $3.gid == -1 ? 0 : $3.gid;
>}
> +| STAGGERED START PARALLEL NUMBER DELAY NUMBER {
> +env->vmd_cfg.cfg_flags |= VMD_CFG_STAGGERED_START;
> +env->vmd_cfg.delay.tv_sec = $6;
> +env->vmd_cfg.parallelism = $4;
> +}
>;
> switch: SWITCH string{
> @@ -368,6 +374,8 @@ vm: VM string vm_instance{
>} else {
>if (vcp_disable)
>vm->vm_state |= VM_STATE_DISABLED;
> +else
> +vm->vm_state |= VM_STATE_WAITING;
>log_debug("%s:%d: vm \"%s\" "
>"registered (%s)",
>file->name, yylval.lineno,
> @@ -766,6 +774,7 @@ lookup(char *s)
>{ "allow",ALLOW },
>{ "boot",BOOT },
>{ "cdrom",CDROM },
> +{ "delay",DELAY },
>{ "device",DEVICE },
>{ "disable",DISABLE },
>{ "disk",DISK },
> @@ -785,10 +794,13 @@ lookup(char *s)
>{ "memory",MEMORY },
>{ "net",NET },
>{ "owner",OWNER },
> +{ "parallel",PARALLEL },
>{ "prefix",PREFIX },
>{ "rdomain",RDOMAIN },
>{ "size",SIZE },
>{ "socket",SOCKET },
> +{ "staggered",STAGGERED },
> +{ "start",START  },
>{ "switch",SWITCH },
>{ "up",UP },
>{ "vm",VM }
> Index: usr.sbin/vmd/vm.conf.5
> ===
> RCS file: /home/cvs/src/usr.sbin/vmd/vm.conf.5,v
> retrieving revision 1.44
> diff -u -p -a -u -r1.44 vm.conf.5
> --- usr.sbin/vmd/vm.conf.514 May 2019 12:47:17 -1.44
> +++ usr.sbin/vmd/vm.conf.58 D

Re: relayd(8): transparent forward

2019-11-05 Thread Mischa Peters



> On 6 Nov 2019, at 08:25, Stuart Henderson  wrote:
> 
> On 2019/11/05 20:46, Mischa Peters wrote:
>> When you are using transparent (Direct Server Return) you have to make sure 
>> you disable ARP on the servers you are load balancing.
> 
> Transparent is not "direct server return", that is done with "route to".

You are right indeed. 
However, according to the manual, transparent also retains the client IP, so it 
seems similar in operation, but I guess it's a different use case. 


> 
>> What happens with transparant is that the server gets the client IP as 
>> source, not the IP of relayd, and will respond directly to the client from 
>> its own IP address. The client is expecting a response from the relayd IP 
>> address and doesn’t respond to the server. 
> 
> The client is expecting a response from the address it sent packets to,
> "transparent" doesn't interfere with this.
> 
> There is something fiddly with the config for "transparent" but it should
> be possible to do what OP wants if relayd is on a machine on the network
> path between client and destination (e.g. on a firewall/router).
> 



Re: relayd(8): transparent forward

2019-11-05 Thread Mischa Peters
What are you trying to do?

When you are using transparent (Direct Server Return) you have to make sure you 
disable ARP on the servers you are load balancing.

What happens with transparent is that the server gets the client IP as source, 
not the IP of relayd, and will respond directly to the client from its own IP 
address. The client is expecting a response from the relayd IP address and 
doesn’t respond to the server. 

Since you are going to the same server it might not be a good idea to use 
transparent. :)

If you want to get the client IP address on your destination, you can add it to 
a header, if it's HTTP. 

With headers like:

match request header set "X-ClientIP" value "$REMOTE_ADDR"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"

Hope this helps. 

Mischa

--

> On 5 Nov 2019, at 17:38, mp1...@gmx-topmail.de wrote:
> 
> 
> The configuration below works fine as soon as I remove the 'transparent'
> keyword but times out when running as transparent forwarder.
> 
> What am I missing?
> 
> Any help is being appreciated.
> 
> 
> -
> # relayd.conf
> 
> http protocol "httpsfilter" {
>tcp { nodelay, sack }
>return error
>pass
>tls keypair test-site
> }
> relay "httpsinspect" {
>listen on 127.0.0.1 port 8443 tls
>protocol "httpsfilter"
>transparent forward with tls to destination
> }
> --
> 
> --
> # pf.conf
> 
> set skip on lo
> block return
> pass
> pass in on egress inet proto tcp to port https \
>divert-to 127.0.0.1 port 8443
> --
> 
> 
> Here's some debug output:
> 
> root:/root:2# relayd -dvv
> startup
> socket_rlimit: max open files 1024
> socket_rlimit: max open files 1024
> socket_rlimit: max open files 1024
> pfe: filter init done
> socket_rlimit: max open files 1024
> relay_load_certfiles: using certificate /etc/ssl/test-site.crt
> relay_load_certfiles: using private key /etc/ssl/private/test-site.key
> parent_tls_ticket_rekey: rekeying tickets
> relay_privinit: adding relay httpsinspect
> protocol 1: name httpsfilter
>flags: used, return, relay flags: tls, tls client, divert
>tcp flags: nodelay, sack
>tls flags: tlsv1.2, cipher-server-preference
>tls session tickets: disabled
>type: http
>pass request
> ca_engine_init: using RSA privsep engine
> ca_engine_init: using RSA privsep engine
> ca_engine_init: using RSA privsep engine
> ca_engine_init: using RSA privsep engine
> init_tables: created 0 tables
> relay_tls_ctx_create: loading certificate
> relay_tls_ctx_create: loading certificate
> relay_tls_ctx_create: loading certificate
> relay_launch: running relay httpsinspect
> relay_launch: running relay httpsinspect
> relay_launch: running relay httpsinspect
> relay_tls_transaction: session 1: scheduling on EV_READ
> relay httpsinspect, tls session 1 established (1 active)
> relay_connect: session 1: forward failed: Operation timed out
> relay_close: sessions inflight decremented, now 0
> relay_tls_transaction: session 2: scheduling on EV_READ
> relay httpsinspect, tls session 2 established (1 active)
> relay_connect: session 2: forward failed: Operation timed out
> relay_close: sessions inflight decremented, now 0
> ^Ckill_tables: deleted 0 tables
> hce exiting, pid 46061
> ca exiting, pid 45725
> flush_rulesets: flushed rules
> ca exiting, pid 26171
> ca exiting, pid 87096
> pfe exiting, pid 63649
> relay exiting, pid 69039
> relay exiting, pid 56446
> relay exiting, pid 69591
> parent terminating, pid 49439
> root:/root:3#
> 



Re: vmd: static address for local interfaces, fix static tapX names

2019-10-25 Thread Mischa Peters
--
> On 25 Oct 2019, at 21:53, Mike Larkin  wrote:
> 
> On Fri, Oct 25, 2019 at 07:47:35PM +, Reyk Floeter wrote:
>>> On Fri, Oct 25, 2019 at 12:27:25PM -0700, Mike Larkin wrote:
>>> On Fri, Oct 25, 2019 at 06:15:59PM +, Reyk Floeter wrote:
>>>> Hi,
>>>> 
>>>> the attached diff is rather large and implements two things for vmd:
>>>> 
>>>> 1) Allow to configure static IP address/gateway pairs local interfaces.
>>>> 2) Skip statically configured interface names (eg. tap0) when
>>>>  allocating dynamic interfaces.
>>>> 
>>>> Example:
>>>> ---snip---
>>>> vm "foo" {
>>>>disable
>>>>local interface "tap0" {
>>>>address 192.168.0.10/24 192.168.0.1
>>>>}
>>>>local interface "tap1"
>>>>disk "/home/vm/foo.qcow2"
>>>> }
>>>> 
>>>> vm "bar" {
>>>>local interface
>>>>disk "/home/vm/bar.qcow2"
>>>> }
>>>> ---snap---
>>>> 
>>>> 
>>>> 1) The VM "foo" has two interfaces: The first interface has a fixed
>>>> IPv4 address with 192.168.0.1/24 on the gateway and 192.168.0.10/24 on
>>>> the VM.  192.168.0.10/24 is assigned to the VM's first NIC via the
>>>> built-in DHCP server.  The second VM gets a default 100.64.x.x/31 IP.
>>> 
>>> I'm not sure the above description matches what I'm seeing in the vm.conf
>>> snippet above.
>>> 
>>> What's "the gateway" here? Is this the host machine, or the actual
>>> gateway, perhaps on some other machine? Does this just allow me to specify
>>> the host-side tap(4) IP address for a corresponding given VM vio(4) 
>>> interface?
>>> 
>> 
>> Ah, OK.  I used the terms without explaining them:
>> 
>> With local interfaces, vmd(8) uses two IPs per interface: one for the
>> tap(4) on the host, one for the vio(4) on the VM.  It configures the
>> first one on the host and provides the second one via DHCP.  The IP on
>> the host IP is the default "gateway" router for the VM.
>> 
> 
> Ah, I missed the fact that these are not "-i" style interfaces but rather
> *local* interfaces (eg, "-L" style). 
> 
>> The address syntax is currently reversed:
>>address "address/prefix" "gateway"
>> Maybe I should change it to
>>address "gateway" "address/prefix"
>> or
>>address "address/prefix" gateway "gateway"
> 
> I like the last one, but I probably won't be a heavy user of this ...

I think the last one is the clearest; I will be a heavy user of this. :))

Will the IPs which are assigned this way be mapped to the vlan/interface with 
the same subnet on the host to get out?

Or does it still require an assigned interface in vm.conf?

Very cool Reyk! Happy you managed to spend some time on this. 

Mischa

> 
> -ml
> 
>> 
>> I also wonder if we could technically use a non-local IP address for
>> the gateway.  I currently enforce that the prefix matches, but I don't
>> enforce that both addresses are in the same subnet.
>> 
>> When using the default auto-generated 100.64.0.0/31 method, it uses
>> the first IP in the subnet as the gateway and the second IP for the
>> VM.
>> 
>>> And did you mean "The second interface" there instead of the "The second 
>>> VM"?
>>> (Although I think the description fits for "The second VM" also...)
>>> 
>> 
>> Yes, both, the second interface is correct as well.
>> 
>>> I think the idea is sound. As long as we don't end up adding extra command
>>> line args to vmctl to manually configure this, which it doesn't appear we 
>>> are
>>> doing here. :)
>>> 
>> 
>> I don't want to add it to vmctl either.
>> 
>>> I didn't read the diff in great detail, I'll wait until you say you have a
>>> final version.
>>> 
>> 
>> OK, thanks.
>> 
>> Reyk
>> 
>>> -ml
>>> 
>>>>This idea came up when I talked with Mischa at EuroBSDCon about
>>>> OpenBSDAms: instead of using L2 and external static dhcpd for all VMs,
>>>> it could be a solution to use L3 and to avoid bridge(4) and dhcpd(8).
>>>> But it would need a way to serve static IPs via the internal dhcp
>>&

[patch] www.openbsd.org/events.html

2019-10-04 Thread Mischa
Hi All,

Does it make sense to add my talk on vmm/vmd at EuroBSDCon to the events.html 
page?
If it does, below is the diff. (Thanx Paul! :))

Mischa

Index: events.html
===
RCS file: /home/OpenBSD/cvs/www/events.html,v
retrieving revision 1.1182
diff -u -p -r1.1182 events.html
--- events.html 22 Sep 2019 19:36:35 -  1.1182
+++ events.html 4 Oct 2019 07:47:37 -
@@ -61,6 +61,8 @@ September 19-22, 2019, Lillehammer, Norw
(slides)
Marc Espie - Advanced ports toolkit: near-perfect packing-list generation
(slides)
+Mischa Peters - The OpenBSD hypervisor in the wild, a short story.
+(<a href="https://2019.eurobsdcon.org/slides/The%20OpenBSD%20hypervisor%20in%20the%20wild,%20a%20short%20story%20-%20Mischa%20Peters.pdf">slides</a>)





httpd rewrite support and REQUEST_URI - repost

2019-05-05 Thread Mischa
Hi All,

With the XFF patch being committed, thank you very much Theo!

Can someone have a look at the patch sent last year?
https://marc.info/?l=openbsd-tech=153303654230606

It's a patch by Tim Baumgard which sets the correct REQUEST_URI CGI variable.
His git repo is at https://github.com/tbaumgard/openbsd-httpd-rewrite

It's another piece of the puzzle which makes httpd better suited to host even 
more.

Thanx!

Mischa



Re: httpd: New log format to log X-Forwarded-{For|Port} headers

2019-05-03 Thread Mischa Peters


--
> On 3 May 2019, at 19:19, Theo Buehler  wrote:
> 
>> On Mon, Mar 04, 2019 at 02:06:02PM +0100, Bruno Flueckiger wrote:
>> Hi,
>> 
>> I've completely reworked my patch for httpd(8). The last patch broke the
>> log format combined. And the config option was ugly. This time I've
>> added another log format called forwarded. It appends two fields to the
>> log format combined: The first field contains the value of the header
>> X-Forwarded-For and the second one the value of X-Forwarded-Port. If
>> either of the headers is empty or missing a dash (-) is written.
>> 
>> The new log format is compatible with log analyzing tools like Webalizer
>> or GoAccess. If you run httpd(8) behind a proxy like relayd(8) the new
>> log format finally gives you a way to track the origin of the requests.
> 
> Committed, thanks!

Great! You are making a lot of people very happy!
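
For anyone who wants to use it: if I read the diff right, enabling the new 
style is just a matter of something like this in httpd.conf (the server name is 
only an example):

server "www.example.com" {
        listen on * port 80
        log style forwarded
}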

Mischa



Re: httpd: New log format to log X-Forwarded-{For|Port} headers

2019-05-02 Thread Mischa Peters



> On 3 May 2019, at 04:59, Theo Buehler  wrote:
> 
>> On Fri, Mar 08, 2019 at 10:52:28AM +0100, Reyk Floeter wrote:
>> Hi,
>> 
>>> On Mon, Mar 04, 2019 at 02:06:02PM +0100, Bruno Flueckiger wrote:
>>> I've completely reworked my patch for httpd(8). The last patch broke the
>>> log format combined. And the config option was ugly. This time I've
>>> added another log format called forwarded. It appends two fields to the
>>> log format combined: The first field contains the value of the header
>>> X-Forwarded-For and the second one the value of X-Forwarded-Port. If
>>> either of the headers is empty or missing a dash (-) is written.
>>> 
>>> The new log format is compatible with log analyzing tools like Webalizer
>>> or GoAccess. If you run httpd(8) behind a proxy like relayd(8) the new
>>> log format finally gives you a way to track the origin of the requests.
>>> 
>> 
>> Your diff looks clean and makes a lot of sense.
>> 
>> Especially since X-Forwarded-For is a feature in relayd that I first
>> used and documented around 2006/2007.  Adding the forwarded style to
>> httpd is a complementary feature in OpenBSD and not something for a
>> random external web stack.
>> 
>> OK reyk@
>> 
>> Anyone else, any objections?
> 
> That would be really nice to have. Did this slip through the cracks or
> are there concerns with this diff?
> 

I believe it fell through the cracks. Would be super useful. 

Mischa

>> 
>> Reyk
>> 
>>> Cheers,
>>> Bruno
>>> 
>>> Index: usr.sbin/httpd/httpd.conf.5
>>> ===
>>> RCS file: /cvs/src/usr.sbin/httpd/httpd.conf.5,v
>>> retrieving revision 1.103
>>> diff -u -p -r1.103 httpd.conf.5
>>> --- usr.sbin/httpd/httpd.conf.519 Feb 2019 11:37:26 -1.103
>>> +++ usr.sbin/httpd/httpd.conf.527 Feb 2019 15:26:48 -
>>> @@ -450,7 +450,8 @@ The
>>> .Ar style
>>> can be
>>> .Cm common ,
>>> -.Cm combined
>>> +.Cm combined ,
>>> +.Cm forwarded
>>> or
>>> .Cm connection .
>>> The styles
>>> @@ -459,6 +460,14 @@ and
>>> .Cm combined
>>> write a log entry after each request similar to the standard Apache
>>> and nginx access log formats.
>>> +The style
>>> +.Cm forwarded
>>> +extends the style
>>> +.Cm combined
>>> +by appending two fields containing the values of the headers
>>> +.Ar X-Forwarded-For
>>> +and
>>> +.Ar X-Forwarded-Port .
>>> The style
>>> .Cm connection
>>> writes a summarized log entry after each connection,
>>> Index: usr.sbin/httpd/httpd.h
>>> ===
>>> RCS file: /cvs/src/usr.sbin/httpd/httpd.h,v
>>> retrieving revision 1.143
>>> diff -u -p -r1.143 httpd.h
>>> --- usr.sbin/httpd/httpd.h19 Feb 2019 11:37:26 -1.143
>>> +++ usr.sbin/httpd/httpd.h27 Feb 2019 15:26:48 -
>>> @@ -437,7 +437,8 @@ SPLAY_HEAD(client_tree, client);
>>> enum log_format {
>>>LOG_FORMAT_COMMON,
>>>LOG_FORMAT_COMBINED,
>>> -LOG_FORMAT_CONNECTION
>>> +LOG_FORMAT_CONNECTION,
>>> +LOG_FORMAT_FORWARDED
>>> };
>>> 
>>> struct log_file {
>>> Index: usr.sbin/httpd/parse.y
>>> ===
>>> RCS file: /cvs/src/usr.sbin/httpd/parse.y,v
>>> retrieving revision 1.110
>>> diff -u -p -r1.110 parse.y
>>> --- usr.sbin/httpd/parse.y19 Feb 2019 11:37:26 -1.110
>>> +++ usr.sbin/httpd/parse.y27 Feb 2019 15:26:48 -
>>> @@ -140,7 +140,7 @@ typedef struct {
>>> %tokenPROTOCOLS REQUESTS ROOT SACK SERVER SOCKET STRIP STYLE SYSLOG TCP 
>>> TICKET
>>> %tokenTIMEOUT TLS TYPE TYPES HSTS MAXAGE SUBDOMAINS DEFAULT PRELOAD 
>>> REQUEST
>>> %tokenERROR INCLUDE AUTHENTICATE WITH BLOCK DROP RETURN PASS REWRITE
>>> -%tokenCA CLIENT CRL OPTIONAL PARAM
>>> +%tokenCA CLIENT CRL OPTIONAL PARAM FORWARDED
>>> %tokenSTRING
>>> %token  NUMBER
>>> %typeport
>>> @@ -1024,6 +1024,11 @@ logstyle: COMMON{
>>>srv_conf->flags |= SRVFLAG_LOG;
>>>srv_conf->logformat = LOG_FORMAT_CONNECTION;
>>>}
>>> +| FORWARDED{
>&g

Re: Conditional sysupgrade

2019-04-28 Thread Mischa
On 27 Apr at 22:57, Florian Obser  wrote:
> On Sat, Apr 27, 2019 at 09:53:08PM +0200, Mischa Peters wrote:
> > Let me know if this needs more work. Love the idea of sysupgrade!
> 
> Please shelf this for now, there is a lot of churn going on in the
> tool in private and we are moving very fast.
> 
> There are more subtleties to consider.

Ok. Did get some good suggestions on my shell use, so might be able to put them 
to use at a later stage.

Mischa



Re: Conditional sysupgrade

2019-04-27 Thread Mischa Peters
On 27 Apr at 17:52, Florian Obser  wrote:
> On Sat, Apr 27, 2019 at 01:23:20PM +0100, Marco Bonetti wrote:
> > Hello folks,
> > 
> > First of all congratulations on a new OpenBSD release and thanks for
> > introducing sysupgrade in -current.
> > 
> > Before sysupgrade, I was using a custom script for achieving the same
> > result with only difference that I was checking if a new snapshot (or
> > release) is available by looking at BUILDINFO before starting the
> > upgrade process.
> > 
> > Patch below introduce the same behaviour using SHA256.sig as control
> > file. If you believe there is a valid use case for reinstalling already
> > applied sets to the running system please let me know and I can add a
> > -f force option.
> 
> I see a need for the feature and also for the -f flag. One idea was if
> you messed up your shared libs you just type sysupgrade to
> unbreak things. (Doesn't quite work since not all the tools are
> statically linked).
> 
> I'm not happy with comparing the sha256 file, could you please use
> what(1) to compare the downloaded kernel with the running kernel?
> 
> $ sysctl -n kern.version | head -1
> OpenBSD 6.5-current (GENERIC.MP) #32: Fri Apr 26 10:37:48 MDT 2019
> $ what /home/_sysupgrade/bsd.mp | tail -1
>   OpenBSD 6.5-current (GENERIC.MP) #32: Fri Apr 26 10:37:48 MDT 2019
> 
> You need to check if you are running MP or SP though.
> 
> I have also suggested this to Mischa, added to Cc.

As Florian suggested, I compared kern.version to the what(1) output of both bsd 
and bsd.mp.
I personally don't like the repetition in the code, but I don't know how to do 
this more elegantly.

The other thing that might need to be adjusted is when to compare; I chose to 
do this all the way at the end, before bsd.rd gets copied to bsd.upgrade.

Let me know if this needs more work. Love the idea of sysupgrade!

--- /usr/sbin/sysupgradeFri Apr 26 18:23:15 2019
+++ sysupgrade  Sat Apr 27 17:50:15 2019
@@ -149,6 +149,19 @@

 unpriv signify -C -p "${SIGNIFY_KEY}" -x SHA256.sig ${SETS}

+VERSION=$(sysctl -n kern.version | head -1)
+BSDSP=$(what /home/_sysupgrade/bsd | tail -1 | awk '{$1=$1;print}')
+BSDMP=$(what /home/_sysupgrade/bsd.mp | tail -1 | awk '{$1=$1;print}')
+
+if [[ ${VERSION} = ${BSDMP} ]]; then
+   echo "No update needed"
+   exit 1
+fi
+if [[ ${VERSION} = ${BSDSP} ]]; then
+   echo "No update needed"
+   exit 1
+fi
+
 cp bsd.rd /nbsd.upgrade
 ln /nbsd.upgrade /bsd.upgrade
 rm /nbsd.upgrade


Mischa

> 
> > 
> > Cheers,
> > Marco
> > 
> > Index: usr.sbin/sysupgrade/sysupgrade.8
> > ===
> > RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.8,v
> > retrieving revision 1.2
> > diff -u -p -u -r1.2 sysupgrade.8
> > --- usr.sbin/sysupgrade/sysupgrade.826 Apr 2019 05:54:49 -  
> > 1.2
> > +++ usr.sbin/sysupgrade/sysupgrade.827 Apr 2019 11:54:40 -
> > @@ -28,7 +28,7 @@
> >  .Nm
> >  is a utility to upgrade
> >  .Ox
> > -to the next release or a new snapshot.
> > +to the next release or a new snapshot if available.
> >  .Pp
> >  .Nm
> >  downloads the necessary files to
> > 
> > Index: usr.sbin/sysupgrade/sysupgrade.sh
> > ===
> > RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.sh,v
> > retrieving revision 1.6
> > diff -u -p -u -r1.6 sysupgrade.sh
> > --- usr.sbin/sysupgrade/sysupgrade.sh   26 Apr 2019 21:52:39 -  
> > 1.6
> > +++ usr.sbin/sysupgrade/sysupgrade.sh   27 Apr 2019 11:54:48 -
> > @@ -110,7 +110,19 @@ fi
> >  
> >  cd ${SETSDIR}
> >  
> > -unpriv -f SHA256.sig ftp -Vmo SHA256.sig ${URL}SHA256.sig
> > +unpriv -f SHA256.sig.tmp ftp -Vmo SHA256.sig.tmp ${URL}SHA256.sig
> > +TMP_SHA=$(sha256 -q SHA256.sig.tmp)
> > +
> > +unpriv touch SHA256.sig
> > +CUR_SHA=$(sha256 -q SHA256.sig)
> > +
> > +if [[ "${TMP_SHA}" = "${CUR_SHA}" ]]; then
> > +   rm SHA256.sig.tmp
> > +   return 0
> > +fi
> > +
> > +unpriv cat SHA256.sig.tmp >SHA256.sig
> > +rm SHA256.sig.tmp
> >  
> >  _KEY=openbsd-${_KERNV[0]%.*}${_KERNV[0]#*.}-base.pub
> >  _NEXTKEY=openbsd-${NEXT_VERSION%.*}${NEXT_VERSION#*.}-base.pub
> > 
> 
> -- 
> I'm not entirely sure you are real.
> 



Re: [PATCH] httpd: Write X-Forwarded-For to access.log

2019-02-12 Thread Mischa


> On 12 Feb 2019, at 14:52, Bruno Flueckiger  wrote:
> 
> On 12.11.18 12:40, Bruno Flueckiger wrote:
>> On 11.11.18 18:43, Claudio Jeker wrote:
>>> On Sun, Nov 11, 2018 at 06:32:53PM +0100, Bruno Flueckiger wrote:
>>>> On 11.11.18 15:29, Florian Obser wrote:
>>>>> On Sun, Nov 11, 2018 at 01:46:06PM +0100, Sebastian Benoit wrote:
>>>>>> Bruno Flueckiger(inform...@gmx.net) on 2018.11.11 10:31:34 +0100:
>>>>>>> Hi
>>>>>>> 
>>>>>>> When I run httpd(8) behind relayd(8) the access log of httpd contains
>>>>>>> the IP address of relayd, but not the IP address of the client. I've
>>>>>>> tried to match the logs of relayd(8) and httpd(8) using some scripting
>>>>>>> and failed.
>>>>>>> 
>>>>>>> So I've written a patch for httpd(8). It stores the content of the
>>>>>>> X-Forwarded-For header in the third field of the log entry:
>>>>>>> 
>>>>>>> www.example.com 192.0.2.99 192.0.2.134 - [11/Nov/2018:09:28:48 ...
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Bruno
>>>>>> 
>>>>>> I'm not sure we should do this unconditionally. With no relayd or other
>>>>>> proxy infront of httpd, this is (more) data the client controls.
>>>>> 
>>>>> Isn't what httpd(8) currently logs apache's common log format?  If
>>>>> people are shoving that through webalizer or something that will
>>>>> break. I don't think we can do this without a config option.
>>>>> Do we need LogFormat?
>>>>> 
>>>>>> 
>>>>>> Could this be a problem?
>>>>>> 
>>>>>> code reads ok.
>>>>>> 
>>>>>> /Benno
>>>>>> 
>>>> 
>>>> I've extended my patch with an option to the log directive. Both log xff
>>>> and log combined must be set for a server to log the content of the
>>>> X-Forwarded-For header. If only log combined is set the log entries
>>>> remain in the well-known format.
>>>> 
>>>> This prevents clients from getting unwanted data in the log by default.
>>>> And it makes sure the log format remains default until one decides
>>>> actively to change it.
>>> 
>>> From my experience with webservices is that today logging the IP is
>>> seldomly good enough. Please also include X-Forwarded-Port and maybe
>>> X-Forwarded-Proto.
>>> In general thanks to CG-NAT logging only IP is a bit pointless.
>>> 
>>> -- 
>>> :wq Claudio
>>> 
>> 
>> Thanks for the hint, Claudio.
>> 
>> This version of my diff includes the two headers X-Forwarded-For and
>> X-Forwarded-Port. Instead of using the third field of the log entry I
>> add two additional fileds to it. The first one contains the value of
>> X-Forwarded-For and the second one that of X-Forwarded-Port.
>> 
>> I think that appending two fields might do less harm than replacing one
>> field at the beginning of the log entry. I'm not sure that adding
>> X-Forwarded-Proto to the log really brings a benefit, so I left it away
>> in this diff.
> 
> In the meantime I've run my diff on a webserver. In my experience
> webalizer has no problem with the modified log format. GoAccess on the
> other hand has troubles reading access.log, but that happens for me with
> and without the diff applied.
> 
> I think that most admins would profit from the diff. The setting is
> optional so it doesn't affect everybody rightaway. And I believe that
> those who enable it are ready to reconfigure whatever log parser they
> use.
> 
> Therefore I have reworked my diff so it applies to the -current tree.

Would be a very welcome addition to httpd.

Mischa

> 
> Index: usr.sbin/httpd/config.c
> ===
> RCS file: /cvs/src/usr.sbin/httpd/config.c,v
> retrieving revision 1.55
> diff -u -p -r1.55 config.c
> --- usr.sbin/httpd/config.c   20 Jun 2018 16:43:05 -  1.55
> +++ usr.sbin/httpd/config.c   12 Feb 2019 13:37:55 -
> @@ -427,6 +427,10 @@ config_getserver_config(struct httpd *en
>   if ((srv_conf->flags & f) == 0)
>   srv_conf->flags |= parent->flags & f;
> 
> + f = SRVFLAG_XFF|SRVFLAG_NO_XFF;
> + if ((srv_conf->flags & f) == 0)
> + srv_

Re: sbin/wsconsctl: show more data

2019-01-06 Thread Mischa Peters
I have to concur with Paul!
Saw the new font yesterday and was pleasantly surprised. Very nice!

Mischa

--

> On 6 Jan 2019, at 15:51, Paul de Weerd  wrote:
> 
> Lots of negativity here, so I just wanted to chime in - really love
> the new console font!  Crisp and easily readable letters, big enough
> to be readable, with a reasonable number of letters per line
> (${COLUMNS}) en lines per screen (${LINES}).  It does mean pretty big
> characters on big screens when in console mode, but on big screens I
> want to run X anyway, so it's all good.  What I understand of the
> algorithm to pick the font size makes a lot of sense to me.
> 
> Thank you Frederic for all the effort you put into this font and
> making it happen on the console and in X through the fonts/spleen
> port!
> 
> Cheers,
> 
> Paul 'WEiRD' de Weerd
> 
> -- 
>> [<++>-]<+++.>+++[<-->-]<.>+++[<+
> +++>-]<.>++[<>-]<+.--.[-]
> http://www.weirdnet.nl/ 
> 



Re: carp though bridge with vmd

2018-12-10 Thread Mischa
Hi Reyk,

If there is anything I can supply let me know, but I guess it's simple enough 
to replicate.
Let me check carppeer anyway.
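
A rough sketch of what I understand the carppeer suggestion to look like, as 
ifconfig commands (the peer addresses 192.168.0.11/12 are made up for the 
example, and "xxx" is a placeholder password):

# on lb1, assuming lb2's vio0 is 192.168.0.12
ifconfig carp0 create
ifconfig carp0 inet 192.168.0.100 netmask 255.255.255.0 vhid 1 pass xxx \
        carpdev vio0 carppeer 192.168.0.12 advbase 10 advskew 100

# on lb2, assuming lb1's vio0 is 192.168.0.11
ifconfig carp0 create
ifconfig carp0 inet 192.168.0.100 netmask 255.255.255.0 vhid 1 pass xxx \
        carpdev vio0 carppeer 192.168.0.11 advbase 10 advskew 110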

Mischa


> On 10 Dec 2018, at 09:55, Reyk Floeter  wrote:
> 
> Hi,
> 
> as a general note for virtual switches and clouds that don’t support CARP due 
> to restrictions on multicast and/or additional MACs: I use carppeer and 
> lladdr of the parent interface in such cases.
> 
> That doesn’t mean that you should need it with vmd and bridge and we have to 
> look into this.
> 
> Reyk
> 
>> Am 09.12.2018 um 16:56 schrieb Mischa :
>> 
>> Hi All,
>> 
>> Is there a way to get carp working through a bridge?
>> I am currently testing to see whether I can have 2 vmd VMs on different 
>> hosts use carp between them.
>> The current state that I am currently at is, both VMs are master.
>> 
>> Setup on both hosts is the same, bridge1 with em0 as interface.
>> 
>> # vm.conf
>> switch "uplink_bridge1" {
>>   interface bridge1
>> }
>> vm "lb1" {
>>   disable
>>   disk "/home/mischa/vmm/lb1.img"
>>   interface tap {
>>   switch "uplink_bridge1"
>>   }
>> }
>> 
>> lb1 carp config:
>> inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass  carpdev vio0 advbase 
>> 10 advskew 100
>> 
>> lb2 carp config:
>> inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass  carpdev vio0 advbase 
>> 10 advskew 110
>> 
>> Is there anything that can be configured on the bridge side?
>> 
>> Mischa
>> 
> 



Re: carp though bridge with vmd

2018-12-10 Thread Mischa Peters
Hi David,

Yes there is. Currently the machines are directly connected to each other on 
em0, and the VMs are able to reach each other. 

VM1 -> bridge1 -> em0 — em0 <- bridge1 <- VM2

Mischa

--

> On 10 Dec 2018, at 03:00, David Gwynne  wrote:
> 
> Is there a shared ethernet network between the bridges on each host?
> 
>> On 10 Dec 2018, at 01:56, Mischa  wrote:
>> 
>> Hi All,
>> 
>> Is there a way to get carp working through a bridge?
>> I am currently testing to see whether I can have 2 vmd VMs on different 
>> hosts use carp between them.
>> The current state that I am currently at is, both VMs are master.
>> 
>> Setup on both hosts is the same, bridge1 with em0 as interface.
>> 
>> # vm.conf
>> switch "uplink_bridge1" {
>>   interface bridge1
>> }
>> vm "lb1" {
>>   disable
>>   disk "/home/mischa/vmm/lb1.img"
>>   interface tap {
>>   switch "uplink_bridge1"
>>   }
>> }
>> 
>> lb1 carp config:
>> inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass  carpdev vio0 advbase 
>> 10 advskew 100
>> 
>> lb2 carp config:
>> inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass  carpdev vio0 advbase 
>> 10 advskew 110
>> 
>> Is there anything that can be configured on the bridge side?
>> 
>> Mischa
>> 
> 



carp though bridge with vmd

2018-12-09 Thread Mischa
Hi All,

Is there a way to get carp working through a bridge?
I am currently testing to see whether I can have 2 vmd VMs on different hosts 
use carp between them.
The current state I am at is that both VMs are master.

Setup on both hosts is the same, bridge1 with em0 as interface.

# vm.conf
switch "uplink_bridge1" {
interface bridge1
}
vm "lb1" {
disable
    disk "/home/mischa/vmm/lb1.img"
interface tap {
switch "uplink_bridge1"
}
}

lb1 carp config:
inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass  carpdev vio0 advbase 10 
advskew 100

lb2 carp config:
inet 192.168.0.100 255.255.255.0 NONE vhid 1 pass  carpdev vio0 advbase 10 
advskew 110

Is there anything that can be configured on the bridge side?

Mischa



Re: [PATCH] httpd: Write X-Forwarded-For to access.log

2018-11-11 Thread Mischa Peters



> On 11 Nov 2018, at 18:43, Claudio Jeker  wrote:
> 
>> On Sun, Nov 11, 2018 at 06:32:53PM +0100, Bruno Flueckiger wrote:
>>> On 11.11.18 15:29, Florian Obser wrote:
>>>> On Sun, Nov 11, 2018 at 01:46:06PM +0100, Sebastian Benoit wrote:
>>>> Bruno Flueckiger(inform...@gmx.net) on 2018.11.11 10:31:34 +0100:
>>>>> Hi
>>>>> 
>>>>> When I run httpd(8) behind relayd(8) the access log of httpd contains
>>>>> the IP address of relayd, but not the IP address of the client. I've
>>>>> tried to match the logs of relayd(8) and httpd(8) using some scripting
>>>>> and failed.
>>>>> 
>>>>> So I've written a patch for httpd(8). It stores the content of the
>>>>> X-Forwarded-For header in the third field of the log entry:
>>>>> 
>>>>> www.example.com 192.0.2.99 192.0.2.134 - [11/Nov/2018:09:28:48 ...
>>>>> 
>>>>> Cheers,
>>>>> Bruno
>>>> 
>>>> I'm not sure we should do this unconditionally. With no relayd or other
>>>> proxy infront of httpd, this is (more) data the client controls.
>>> 
>>> Isn't what httpd(8) currently logs apache's common log format?  If
>>> people are shoving that through webalizer or something that will
>>> break. I don't think we can do this without a config option.
>>> Do we need LogFormat?
>>> 
>>>> 
>>>> Could this be a problem?
>>>> 
>>>> code reads ok.
>>>> 
>>>> /Benno
>>>> 
>> 
>> I've extended my patch with an option to the log directive. Both log xff
>> and log combined must be set for a server to log the content of the
>> X-Forwarded-For header. If only log combined is set the log entries
>> remain in the well-known format.
>> 
>> This prevents clients from getting unwanted data in the log by default.
>> And it makes sure the log format remains default until one decides
>> actively to change it.
> 
> From my experience with webservices is that today logging the IP is
> seldomly good enough. Please also include X-Forwarded-Port and maybe
> X-Forwarded-Proto.
> In general thanks to CG-NAT logging only IP is a bit pointless.

Or with relayd in front of it. :)
Welcome addition to httpd. 

Mischa

> 
> -- 
> :wq Claudio
> 
>> Index: usr.sbin/httpd/config.c
>> ===
>> RCS file: /cvs/src/usr.sbin/httpd/config.c,v
>> retrieving revision 1.55
>> diff -u -p -r1.55 config.c
>> --- usr.sbin/httpd/config.c20 Jun 2018 16:43:05 -1.55
>> +++ usr.sbin/httpd/config.c11 Nov 2018 14:45:47 -
>> @@ -427,6 +427,10 @@ config_getserver_config(struct httpd *en
>>if ((srv_conf->flags & f) == 0)
>>srv_conf->flags |= parent->flags & f;
>> 
>> +f = SRVFLAG_XFF|SRVFLAG_NO_XFF;
>> +if ((srv_conf->flags & f) == 0)
>> +srv_conf->flags |= parent->flags & f;
>> +
>>f = SRVFLAG_AUTH|SRVFLAG_NO_AUTH;
>>if ((srv_conf->flags & f) == 0) {
>>srv_conf->flags |= parent->flags & f;
>> Index: usr.sbin/httpd/httpd.conf.5
>> ===
>> RCS file: /cvs/src/usr.sbin/httpd/httpd.conf.5,v
>> retrieving revision 1.101
>> diff -u -p -r1.101 httpd.conf.5
>> --- usr.sbin/httpd/httpd.conf.520 Jun 2018 16:43:05 -1.101
>> +++ usr.sbin/httpd/httpd.conf.511 Nov 2018 14:45:47 -
>> @@ -455,6 +455,14 @@ If not specified, the default is
>> Enable or disable logging to
>> .Xr syslog 3
>> instead of the log files.
>> +.It Oo Ic no Oc Ic xff
>> +Enable or disable logging of the request header
>> +.Ar X-Forwarded-For
>> +if
>> +.Cm log combined
>> +is set. This header can be set by a reverse proxy like
>> +.Xr relayd 8
>> +and should contain the IP address of the client.
>> .El
>> .It Ic pass
>> Disable any previous
>> Index: usr.sbin/httpd/httpd.h
>> ===
>> RCS file: /cvs/src/usr.sbin/httpd/httpd.h,v
>> retrieving revision 1.142
>> diff -u -p -r1.142 httpd.h
>> --- usr.sbin/httpd/httpd.h11 Oct 2018 09:52:22 -1.142
>> +++ usr.sbin/httpd/httpd.h11 Nov 2018 14:45:47 -
>> @@ -400,6 +400,8 @@ SPLAY_HEAD(client_tree, client);
>> #define SRVFLAG_DEFAULT_TYPE0x0080
>> #define SRVFLAG_PATH_R

Re: Clean install or upgrade to 6.4 fw_update error

2018-10-18 Thread Mischa
That helps. Thanx!

Mischa


> On 18 Oct 2018, at 17:47, Stuart Henderson  wrote:
> 
> That error message is because there are no syspatch yet, it is not from the 
> firmware update.
> 
> 
> On 18 October 2018 16:02:34 Mischa  wrote:
> 
>> Hi All,
>> 
>> 
>> Just ran a couple of updates and clean installs and I am seeing the 
>> following error during boot.
>> 
>> 
>> Clean install:
>> running rc.firsttime
>> Path to firmware: http://firmware.openbsd.org/firmware/6.4/
>> Installing: intel-firmware
>> Checking for available binary patches...ftp: Error retrieving file: 404 Not 
>> Found
>> 
>> 
>> Upgrade:
>> running rc.firsttime
>> Path to firmware: http://firmware.openbsd.org/firmware/6.4/
>> Updating: intel-firmware-20180807v0
>> Checking for available binary patches...ftp: Error retrieving file: 404 Not 
>> Found
>> 
>> 
>> Doing this by hand after login is successful.
>> # fw_update -v intel-firmware-20180807p0v0
>> Path to firmware: http://firmware.openbsd.org/firmware/6.4/
>> Updating: intel-firmware-20180807p0v0
>> 
>> 
>> Mischa
> 
> 
> 



Clean install or upgrade to 6.4 fw_update error

2018-10-18 Thread Mischa
Hi All,

Just ran a couple of updates and clean installs and I am seeing the following 
error during boot.

Clean install:
running rc.firsttime
Path to firmware: http://firmware.openbsd.org/firmware/6.4/
Installing: intel-firmware
Checking for available binary patches...ftp: Error retrieving file: 404 Not 
Found

Upgrade:
running rc.firsttime   
Path to firmware: http://firmware.openbsd.org/firmware/6.4/
Updating: intel-firmware-20180807v0
Checking for available binary patches...ftp: Error retrieving file: 404 Not 
Found 

Doing this by hand after login is successful.
# fw_update -v intel-firmware-20180807p0v0   
Path to firmware: http://firmware.openbsd.org/firmware/6.4/
Updating: intel-firmware-20180807p0v0

Mischa



Re: Reuse VM ids.

2018-10-07 Thread Mischa Peters
No idea if the code works yet.
Hopefully I can try later. But love the idea. 

Mischa

> On 8 Oct 2018, at 04:31, Ori Bernstein  wrote:
> 
> Keep a list of known vms, and reuse the VM IDs. This means that when using
> '-L', the IP addresses of the VMs are stable.
> 
> diff --git usr.sbin/vmd/config.c usr.sbin/vmd/config.c
> index af12b790002..522bae32501 100644
> --- usr.sbin/vmd/config.c
> +++ usr.sbin/vmd/config.c
> @@ -61,7 +61,10 @@ config_init(struct vmd *env)
>if (what & CONFIG_VMS) {
>if ((env->vmd_vms = calloc(1, sizeof(*env->vmd_vms))) == NULL)
>return (-1);
> +if ((env->vmd_known = calloc(1, sizeof(*env->vmd_known))) == NULL)
> +return (-1);
>TAILQ_INIT(env->vmd_vms);
> +TAILQ_INIT(env->vmd_known);
>}
>if (what & CONFIG_SWITCHES) {
>if ((env->vmd_switches = calloc(1,
> diff --git usr.sbin/vmd/vmd.c usr.sbin/vmd/vmd.c
> index 18a5e0d3d5d..732691b4381 100644
> --- usr.sbin/vmd/vmd.c
> +++ usr.sbin/vmd/vmd.c
> @@ -1169,6 +1169,27 @@ vm_remove(struct vmd_vm *vm, const char *caller)
>free(vm);
> }
> 
> +static uint32_t
> +claim_vmid(const char *name)
> +{
> +struct name2id *n2i = NULL;
> +
> +TAILQ_FOREACH(n2i, env->vmd_known, entry)
> +if (strcmp(n2i->name, name) == 0)
> +return n2i->id;
> +
> +if (++env->vmd_nvm == 0)
> +fatalx("too many vms");
> +if ((n2i = calloc(1, sizeof(struct name2id))) == NULL)
> +fatalx("could not alloc vm name");
> +n2i->id = env->vmd_nvm;
> +if (strlcpy(n2i->name, name, sizeof(n2i->name)) >= sizeof(n2i->name))
> +fatalx("overlong vm name");
> +TAILQ_INSERT_TAIL(env->vmd_known, n2i, entry);
> +
> +return n2i->id;
> +}
> +
> int
> vm_register(struct privsep *ps, struct vmop_create_params *vmc,
> struct vmd_vm **ret_vm, uint32_t id, uid_t uid)
> @@ -1300,11 +1321,8 @@ vm_register(struct privsep *ps, struct 
> vmop_create_params *vmc,
>vm->vm_cdrom = -1;
>vm->vm_iev.ibuf.fd = -1;
> 
> -if (++env->vmd_nvm == 0)
> -fatalx("too many vms");
> -
>/* Assign a new internal Id if not specified */
> -vm->vm_vmid = id == 0 ? env->vmd_nvm : id;
> +vm->vm_vmid = (id == 0) ? claim_vmid(vcp->vcp_name) : id;
> 
>log_debug("%s: registering vm %d", __func__, vm->vm_vmid);
>TAILQ_INSERT_TAIL(env->vmd_vms, vm, vm_entry);
> diff --git usr.sbin/vmd/vmd.h usr.sbin/vmd/vmd.h
> index b7c012854e8..86fad536e59 100644
> --- usr.sbin/vmd/vmd.h
> +++ usr.sbin/vmd/vmd.h
> @@ -276,6 +276,13 @@ struct vmd_user {
> };
> TAILQ_HEAD(userlist, vmd_user);
> 
> +struct name2id {
> +	char			name[VMM_MAX_NAME_LEN];
> +	int32_t			id;
> +	TAILQ_ENTRY(name2id)	entry;
> +};
> +TAILQ_HEAD(name2idlist, name2id);
> +
> struct address {
>struct sockaddr_storage ss;
>int prefixlen;
> @@ -300,6 +307,7 @@ struct vmd {
> 
>  	uint32_t		 vmd_nvm;
>  	struct vmlist		*vmd_vms;
> +	struct name2idlist	*vmd_known;
>  	uint32_t		 vmd_nswitches;
>  	struct switchlist	*vmd_switches;
>  	struct userlist		*vmd_users;
> 
> -- 
>Ori Bernstein
> 
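
A rough usage sketch of what the patch above buys you, with placeholder VM
names and disk images (testvm/disk.img are illustrative, not from the patch):

# vmctl start "testvm" -m 512M -L -i 1 -d disk.img   # first start, gets a fresh id
# vmctl status                                        # note the ID and the local (-L) address
# vmctl stop "testvm"
# vmctl start "testvm" -m 512M -L -i 1 -d disk.img   # without the patch: new id, new -L address
                                                      # with the patch: same id, same -L address

Without the patch the internal id comes from an ever-increasing counter, so
every restart moves the -L address; with the name2id list the name is looked
up first and the old id is reused.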



softraid offline

2018-07-18 Thread Mischa
Hi All,

After rebooting from a ddb prompt, which I was unfortunately unable to capture,
the softraid configuration seems to have been damaged.

root@j6:~ # dmesg | grep softraid
softraid0 at root
scsibus5 at softraid0: 256 targets
softraid0: trying to bring up sd8 degraded
softraid0: sd8 was not shutdown properly
softraid0: sd8 is offline, will not be brought online

root@j6:~ # bioctl -d sd8
bioctl: Can't locate sd8 device via /dev/bio

root@j6:~ # bioctl -c 5 -l /dev/sd3a,/dev/sd4a,/dev/sd5a,/dev/sd6a softraid0
softraid0: trying to bring up sd8 degraded
softraid0: sd8 was not shutdown properly
softraid0: sd8 is offline, will not be brought online

root@j6:~ # bioctl -R /dev/sd3a sd8
bioctl: Can't locate sd8 device via /dev/bio

All the /dev/sd8* are there.

So it looks like I am stuck with Schrödinger's softraid: it's both there and
not there.
Does anybody have pointers on how to get it back or to remove it completely?

Thanx!

Mischa
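
For the archive, one commonly suggested way out of this state, when the
array's contents are expendable, is to wipe the softraid metadata at the start
of each chunk and rebuild. A destructive sketch only, assuming sd3a-sd6a are
the chunks of the stuck volume, that nothing on them needs to survive, and
that the metadata sits at the front of each RAID partition:

# dd if=/dev/zero of=/dev/rsd3a bs=1m count=1   # clobber softraid metadata on each chunk
# dd if=/dev/zero of=/dev/rsd4a bs=1m count=1
# dd if=/dev/zero of=/dev/rsd5a bs=1m count=1
# dd if=/dev/zero of=/dev/rsd6a bs=1m count=1
# bioctl -c 5 -l /dev/sd3a,/dev/sd4a,/dev/sd5a,/dev/sd6a softraid0   # recreate the RAID 5 volume

After a reboot the old sd8 should no longer be assembled, since its metadata
is gone.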



[diff] copyright notice index.html

2018-06-26 Thread Mischa
Hi @tech,

Just noticed index.html still has 2017 in the copyright notice.

Mischa

Index: index.html
===
RCS file: /cvs/www/index.html,v
retrieving revision 1.726
diff -r1.726 index.html
7c7
< 

Re: [diff] httpd.conf.5 - consistent IPs, added examples

2018-06-22 Thread Mischa Peters


> On 22 Jun 2018, at 22:20, Sebastian Benoit  wrote:
> 
> Jason McIntyre(j...@kerhand.co.uk) on 2018.06.22 19:38:55 +0100:
>>> On Fri, Jun 22, 2018 at 12:08:07AM +0200, Mischa wrote:
>>> Hi tech@,
>>> 
>>> Changed httpd.conf.5 to be more consistent with the IPs used; they are all
>>> documentation IPs now.
>>> And added a couple of examples: two for dynamic pages (CGI and PHP), and
>>> one fairly commonly used rewrite rule for things like WordPress.
>>> 
>>> Mischa
>>> 
>> 
>> hi.
>> 
>> the diff reads ok to me, but i cannot easily verify it because it does
>> not apply cleanly.
> 
> The parts that dont apply are already commited parts from reyks rewrite
> diff. They can be ignored.

My mistake. Will take a clean checkout. 

> However i dont know if we want all those examples. We will get them in
> package readmes eventually.

I can also check out the pkg-readmes and expand those, as the PHP ones don't
mention the required httpd.conf bits. I will check out WordPress as well.

I do think adding extra examples can be helpful, as it's probably a knee-jerk
reaction for people to go to the man page or man.openbsd.org.

Mischa
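
To complement the examples in the diff quoted below, a minimal sketch of the
package side, assuming the PHP 7.2 package (the rc script name follows the
installed PHP version):

# pkg_add php            # pick the 7.2 flavour when prompted
# rcctl enable php72_fpm
# rcctl start php72_fpm  # socket at /var/www/run/php-fpm.sock, i.e. /run/php-fpm.sock inside the httpd chroot

httpd.conf then only needs the location "/*.php*" { fastcgi socket
"/run/php-fpm.sock" } block from the diff.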


> 
> Reyk?
> 
>> 
>> reyk?
>> 
>> jmc
>> 
>>> Index: httpd.conf.5
>>> ===
>>> RCS file: /cvs/src/usr.sbin/httpd/httpd.conf.5,v
>>> retrieving revision 1.100
>>> diff -u -p -r1.100 httpd.conf.5
>>> --- httpd.conf.5	18 Jun 2018 06:04:25 -0000	1.100
>>> +++ httpd.conf.5	21 Jun 2018 21:55:38 -0000
>>> @@ -97,7 +97,7 @@ Macros are not expanded inside quotes.
>>> .Pp
>>> For example:
>>> .Bd -literal -offset indent
>>> -ext_ip="10.0.0.1"
>>> +ext_ip="203.0.113.1"
>>> server "default" {
>>>listen on $ext_ip port 80
>>> }
>>> @@ -198,6 +198,8 @@ argument can be used with return codes i
>>> .Sq Location:
>>> header for redirection to a specified URI.
>>> .Pp
>>> +It is possible to rewrite the request to redirect it to a different
>>> +external location.
>>> The
>>> .Ar uri
>>> may contain predefined macros that will be expanded at runtime:
>>> @@ -396,10 +398,10 @@ the
>>> using pattern matching instead of shell globbing rules,
>>> see
>>> .Xr patterns 7 .
>>> -The pattern may contain captures that can be used in the
>>> -.Ar uri
>>> -of an enclosed
>>> +The pattern may contain captures that can be used in an enclosed
>>> .Ic block return
>>> +or
>>> +.Ic request rewrite
>>> option.
>>> .It Oo Ic no Oc Ic log Op Ar option
>>> Set the specified logging options.
>>> @@ -462,6 +464,19 @@ in a location.
>>> Configure the options for the request path.
>>> Valid options are:
>>> .Bl -tag -width Ds
>>> +.It Oo Ic no Oc Ic rewrite Ar path
>>> +Enable or disable rewriting of the request.
>>> +Unlike the redirection with
>>> +.Ic block return ,
>>> +this will change the request path internally before
>>> +.Nm httpd
>>> +makes a final decision about the matching location.
>>> +The
>>> +.Ar path
>>> +argument may contain predefined macros that will be expanded at runtime.
>>> +See the
>>> +.Ic block return
>>> +option for the list of supported macros.
>>> .It Ic strip Ar number
>>> Strip
>>> .Ar number
>>> @@ -699,7 +714,7 @@ server "www.b.example.com" {
>>> }
>>> 
>>> server "intranet.example.com" {
>>> -listen on 10.0.0.1 port 80
>>> +listen on 192.0.2.1 port 80
>>>root "/htdocs/intranet.example.com"
>>> }
>>> .Ed
>>> @@ -709,12 +724,43 @@ Simple redirections can be configured wi
>>> directive:
>>> .Bd -literal -offset indent
>>> server "example.com" {
>>> -listen on 10.0.0.1 port 80
>>> +listen on 203.0.113.1 port 80
>>> block return 301 "http://www.example.com$REQUEST_URI"
>>> }
>>> 
>>> server "www.example.com" {
>>> -listen on 10.0.0.1 port 80
>>> +listen on 203.0.113.1 port 80
>>> +}
>>> +.Ed
>>> +.Pp
>>> +Serving dynamic pages can be defined with the 
>>> +.Ic location
>>> +directive:
>>> +.Bd -literal -offset indent
>>> +server "www.example.com" {
>>> +listen on * port 80
>>> +location "/*.cgi*" {
>>> +fastcgi
>>> +root "/cgi-bin/"
>>> +}
>>> +location "/*.php*" {
>>> +fastcgi socket "/run/php-fpm.sock"
>>> +}
>>> +}
>>> +.Ed
>>> +.Pp
>>> +The request can also be rewritten with the
>>> +.Ic request rewrite
>>> +directive:
>>> +.Bd -literal -offset indent
>>> +server "www.example.com" {
>>> +listen on * port 80
>>> +location match "/old/(.*)" {
>>> +request rewrite "/new/%1"
>>> +}
>>> +location match "/([%a%d]+)" {
>>> +request rewrite "/dynamic/index.php?q=%1"
>>> +}
>>> }
>>> .Ed
>>> .Sh SEE ALSO
>>> 
>> 
> 



[diff] httpd.conf.5 - consistent IPs, added examples

2018-06-21 Thread Mischa
Hi tech@,

Changed httpd.conf.5 to be more consistent with the IPs used; they are all
documentation IPs now.
And added a couple of examples: two for dynamic pages (CGI and PHP), and one
fairly commonly used rewrite rule for things like WordPress.

Mischa

Index: httpd.conf.5
===
RCS file: /cvs/src/usr.sbin/httpd/httpd.conf.5,v
retrieving revision 1.100
diff -u -p -r1.100 httpd.conf.5
--- httpd.conf.5	18 Jun 2018 06:04:25 -0000	1.100
+++ httpd.conf.5	21 Jun 2018 21:55:38 -0000
@@ -97,7 +97,7 @@ Macros are not expanded inside quotes.
 .Pp
 For example:
 .Bd -literal -offset indent
-ext_ip="10.0.0.1"
+ext_ip="203.0.113.1"
 server "default" {
listen on $ext_ip port 80
 }
@@ -198,6 +198,8 @@ argument can be used with return codes i
 .Sq Location:
 header for redirection to a specified URI.
 .Pp
+It is possible to rewrite the request to redirect it to a different
+external location.
 The
 .Ar uri
 may contain predefined macros that will be expanded at runtime:
@@ -396,10 +398,10 @@ the
 using pattern matching instead of shell globbing rules,
 see
 .Xr patterns 7 .
-The pattern may contain captures that can be used in the
-.Ar uri
-of an enclosed
+The pattern may contain captures that can be used in an enclosed
 .Ic block return
+or
+.Ic request rewrite
 option.
 .It Oo Ic no Oc Ic log Op Ar option
 Set the specified logging options.
@@ -462,6 +464,19 @@ in a location.
 Configure the options for the request path.
 Valid options are:
 .Bl -tag -width Ds
+.It Oo Ic no Oc Ic rewrite Ar path
+Enable or disable rewriting of the request.
+Unlike the redirection with
+.Ic block return ,
+this will change the request path internally before
+.Nm httpd
+makes a final decision about the matching location.
+The
+.Ar path
+argument may contain predefined macros that will be expanded at runtime.
+See the
+.Ic block return
+option for the list of supported macros.
 .It Ic strip Ar number
 Strip
 .Ar number
@@ -699,7 +714,7 @@ server "www.b.example.com" {
 }
 
 server "intranet.example.com" {
-   listen on 10.0.0.1 port 80
+   listen on 192.0.2.1 port 80
root "/htdocs/intranet.example.com"
 }
 .Ed
@@ -709,12 +724,43 @@ Simple redirections can be configured wi
 directive:
 .Bd -literal -offset indent
 server "example.com" {
-   listen on 10.0.0.1 port 80
+   listen on 203.0.113.1 port 80
	block return 301 "http://www.example.com$REQUEST_URI"
 }
 
 server "www.example.com" {
-   listen on 10.0.0.1 port 80
+   listen on 203.0.113.1 port 80
+}
+.Ed
+.Pp
+Serving dynamic pages can be defined with the 
+.Ic location
+directive:
+.Bd -literal -offset indent
+server "www.example.com" {
+   listen on * port 80
+   location "/*.cgi*" {
+   fastcgi
+   root "/cgi-bin/"
+   }
+   location "/*.php*" {
+   fastcgi socket "/run/php-fpm.sock"
+   }
+}
+.Ed
+.Pp
+The request can also be rewritten with the
+.Ic request rewrite
+directive:
+.Bd -literal -offset indent
+server "www.example.com" {
+   listen on * port 80
+   location match "/old/(.*)" {
+   request rewrite "/new/%1"
+   }
+   location match "/([%a%d]+)" {
+   request rewrite "/dynamic/index.php?q=%1"
+   }
 }
 .Ed
 .Sh SEE ALSO



Re: [patch] Skip background scan if bssid is set

2018-04-29 Thread Mischa Peters

> On 29 Apr 2018, at 11:43, Stuart Henderson <s...@spacehopper.org> wrote:
> 
>> On 2018/04/29 10:17, Stefan Sperling wrote:
>>> On Sun, Apr 29, 2018 at 03:39:07AM +0200, Jesper Wallin wrote:
>>> Hi all,
>>> 
>>> I recently learned that my AP behaves badly and I have packet loss when
>>> the background scan is running.  I had a small chat with stsp@ about it,
>>> asking if there is a way to disable it.  He kindly explained that if I'm
>>> connected to an AP with a weak signal, it will try to find another AP
>>> with better signal and use that one instead.
>>> 
>>> Sadly, I only have a single AP at home and this doesn't really solve my
>>> problem.  Though, you can also set a desired bssid to use, to force it
>>> to connect to a single AP.  However, the background scan will still run
>>> even if this is set.
>>> 
>>> Maybe the background scan has other use-cases that I'm not aware of, if
>>> so, I apologize in advance.  The patch below simply checks if a bssid is
>>> specified and if so, skip the background scan.
>> 
>> I agree, even though it would be nice to understand the underlying
>> packet loss issue. But I cannot reproduce the problem unfortunately :(
>> Have you verified that the problem only happens on this particular AP?
> 
>> It's very common for wifi clients to do background scans, so I'd be
> interested to know whether non-OpenBSD clients also see packet loss,
> or whether OpenBSD with a different client device is any better. What
> are the AP and client devices? Are other firmware versions available? I
> guess bg scan must use power-saving to queue frames while the client is
> off channel so maybe the issue relates to this.
> 
> I'm wondering if changing this may introduce problems when an AP moves
> to a different channel? Either by manual configuration, mechanisms
> like Ruckus' channelfly (still possible on single-AP even without a
> controller), radar detect on 5GHz, or even something as simple as
> rebooting an AP set to "auto" channel.

How does this play with roaming protocols on “enterprise” WiFi equipment, like 
802.11k and 802.11v?

Mischa
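
For context, pinning a single AP the way Jesper describes is just the bssid
knob in ifconfig(8) / hostname.if(5); a minimal sketch with a placeholder
network name, key and BSSID:

# ifconfig iwm0 nwid homenet wpakey examplekey bssid 00:11:22:33:44:55

With the proposed patch, having the bssid set like this would also skip the
background scan.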



iwm0 doesn't connect automatically after reboot

2018-03-24 Thread Mischa
acpi0: bus -1 (RP22)
acpiprt23 at acpi0: bus -1 (RP23)
acpiprt24 at acpi0: bus -1 (RP24)
acpicpu0 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), 
C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), 
C1(1000@1 mwait.1), PSS
acpicpu2 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), 
C1(1000@1 mwait.1), PSS
acpicpu3 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), 
C1(1000@1 mwait.1), PSS
acpipwrres0 at acpi0: PUBS, resource for XHC_
acpipwrres1 at acpi0: WRST
acpipwrres2 at acpi0: WRST
acpitz0 at acpi0: critical temperature is 128 degC
acpithinkpad0 at acpi0
acpiac0 at acpi0: AC unit offline
acpibat0 at acpi0: BAT0 model "45N1113" serial  4754 type LION oem "LGC"
acpibat1 at acpi0: BAT1 model "45N1775" serial  2821 type LION oem "SANYO"
"INT3F0D" at acpi0 not configured
"LEN0071" at acpi0 not configured
"LEN2046" at acpi0 not configured
"INT3515" at acpi0 not configured
acpibtn0 at acpi0: SLPB
"PNP0C14" at acpi0 not configured
acpibtn1 at acpi0: LID_
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"INT3394" at acpi0 not configured
acpivideo0 at acpi0: GFX0
acpivout at acpivideo0 not configured
cpu0: Enhanced SpeedStep 1296 MHz: speeds: 2701, 2700, 2600, 2500, 2400, 2200, 
2000, 1800, 1600, 1500, 1300, 1100, 800, 700, 600, 400 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 7G Host" rev 0x02
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 620" rev 0x02
drm0 at inteldrm0
inteldrm0: msi
error: [drm:pid0:i915_firmware_load_error_print] *ERROR* failed to load 
firmware i915/kbl_dmc_ver1.bin (-22)
inteldrm0: 1920x1080, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
xhci0 at pci0 dev 20 function 0 "Intel 100 Series xHCI" rev 0x21: msi
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 
addr 1
pchtemp0 at pci0 dev 20 function 2 "Intel 100 Series Thermal" rev 0x21
dwiic0 at pci0 dev 21 function 0 "Intel 100 Series I2C" rev 0x21: apic 2 int 16
iic0 at dwiic0
dwiic1 at pci0 dev 21 function 1 "Intel 100 Series I2C" rev 0x21: apic 2 int 17
iic1 at dwiic1
"INT3515" at iic1 addr 0x38 not configured
"Intel 100 Series MEI" rev 0x21 at pci0 dev 22 function 0 not configured
ahci0 at pci0 dev 23 function 0 "Intel 100 Series AHCI" rev 0x21: msi, AHCI 
1.3.1
ahci0: port 1: 6.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 1 lun 0: <ATA, Samsung SSD 850, EMT0> SCSI3 0/direct fixed 
naa.5002538d423a4b80
sd0: 476940MB, 512 bytes/sector, 976773168 sectors, thin
ppb0 at pci0 dev 28 function 0 "Intel 100 Series PCIE" rev 0xf1: msi
pci1 at ppb0 bus 2
rtsx0 at pci1 dev 0 function 0 "Realtek RTS522A Card Reader" rev 0x01: msi
sdmmc0 at rtsx0: 4-bit, dma
ppb1 at pci0 dev 28 function 2 "Intel 100 Series PCIE" rev 0xf1: msi
pci2 at ppb1 bus 3
iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev 0x78, msi
pcib0 at pci0 dev 31 function 0 "Intel 200 Series LPC" rev 0x21
"Intel 100 Series PMC" rev 0x21 at pci0 dev 31 function 2 not configured
azalia0 at pci0 dev 31 function 3 "Intel 200 Series HD Audio" rev 0x21: msi
azalia0: codecs: Realtek/0x0298, Intel/0x280b, using Realtek/0x0298
audio0 at azalia0
ichiic0 at pci0 dev 31 function 4 "Intel 100 Series SMBus" rev 0x21: apic 2 int 
16
iic2 at ichiic0
em0 at pci0 dev 31 function 6 "Intel I219-V" rev 0x21: msi, address 
54:e1:ad:c3:1f:cc
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
wsmouse1 at pms0 mux 0
pms0: Synaptics clickpad, firmware 8.2, 0x1e2b1 0x943300
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
ugen0 at uhub0 port 7 "Intel Bluetooth" rev 2.00/0.10 addr 2
uvideo0 at uhub0 port 8 configuration 1 interface 0 "Bison Integrated Camera" 
rev 2.00/37.27 addr 3
video0 at uvideo0
ugen1 at uhub0 port 9 "Validity Sensors product 0x0097" rev 2.00/1.64 addr 4
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (e41ed128f96a0a39.a) swap on sd0b dump on sd0b
iwm0: hw rev 0x230, fw ver 22.361476.0, address 28:c6:3f:90:ad:4e
### end dmesg ###

Thanx!

Mischa



Re: restrict carp use to ethernet interfaces

2018-01-10 Thread Mischa Peters

> On 11 Jan 2018, at 08:25, Matthieu Herrb <matth...@herrb.eu> wrote:
> 
>> On Thu, Jan 11, 2018 at 10:29:17AM +1000, David Gwynne wrote:
>> carp interfaces output using ether_output, so it is reasonable to
>> require that they only get configured on top of ethernet interfaces
>> rather than just !IFT_CARP.
>> 
> Hi,
> 
> In this context, are vlan interfaces also considered IFT_ETHER?
> I have use cases for carp over vlan interfaces. I'd hate not being able
> to do that anymore.

Doing the same at the moment. Super useful to be able to continue to do this. 

Mischa
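
For reference, the setup Matthieu and I are talking about, carp stacked on a
vlan interface; a rough sketch with placeholder parent interface, vhid,
password and addresses:

# ifconfig vlan10 create
# ifconfig vlan10 parent em0 vnetid 10 up    # older releases: ifconfig vlan10 vlan 10 vlandev em0
# ifconfig carp0 create
# ifconfig carp0 carpdev vlan10 vhid 10 pass examplepass inet 192.0.2.10/24
# ifconfig carp0 up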