Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-06 Thread Mischa

On 2023-09-06 19:38, Dave Voutila wrote:

Mischa  writes:

On 2023-09-06 05:36, Dave Voutila wrote:

Mischa  writes:

On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin  writes:


On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:

/snip


> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK
Starting 30 VMs this way caused the machine to become unresponsive
again, but nothing on the console. :(
Mischa

Were you seeing these uvm errors before this diff? If so, this isn't
causing the problem and something else is.

I don't believe we solved any of the underlying uvm issues in Bruges
last year. Mischa, can you test with just the latest snapshot/-current?
I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a uvm
share into the device process address space.


If this diff causes the errors to occur, and without the diff it's
fine, then we need to look into that.
Also I think a pid number in that printf might be useful, I'll see
what I can find. If it's not vmd causing this and rather some other
process then that would be good to know also.

Sadly it looks like that printf doesn't spit out the offending pid. :(

Just to confirm I am seeing this behavior on the latest snap without
the patch as well.

Since this diff isn't the cause, I've committed it. Thanks for
testing. I'll see if I can reproduce your MAP_STACK issues.

Just started 10 VMs with sleep 2, machine freezes, but nothing on the
console. :(

For now, I'd recommend spacing out vm launches. I'm pretty sure it's
related to the uvm corruption we saw last year when creating, starting,
and destroying vm's rapidly in a loop.


That could very well be the case. I will adjust my start script, so
far I've got good results with a 10 second sleep.

Is there some additional debugging I can turn on that makes sense for
this? I can easily replicate it.



Highly doubtful, if the issue is what I think it is. The only thing would be
making sure you're running in a way to see any panic and drop into
ddb. If you're using X, or aren't on the primary console or a serial
connection, it might just appear as a deadlocked system during a panic.


I am using the console via iDRAC, but there isn't any information
anymore. :(


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-06 Thread Dave Voutila


Mischa  writes:

> On 2023-09-06 05:36, Dave Voutila wrote:
>> Mischa  writes:
>>> On 2023-09-05 14:27, Dave Voutila wrote:
 Mike Larkin  writes:

> On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:
>> On 2023-09-04 18:58, Mischa wrote:
>> > On 2023-09-04 18:55, Mischa wrote:
 /snip

>> > > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
>> > > this way, before it would choke on 2-3.
>> > >
>> > > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>> >
>> > I do still get the same message on the console, but the machine isn't
>> > freezing up.
>> >
>> > [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
>> > MAP_STACK
>> Starting 30 VMs this way caused the machine to become unresponsive
>> again,
>> but nothing on the console. :(
>> Mischa
> Were you seeing these uvm errors before this diff? If so, this
> isn't
> causing the problem and something else is.
 I don't believe we solved any of the underlying uvm issues in Bruges
 last year. Mischa, can you test with just the latest
 snapshot/-current?
 I'd imagine starting and stopping many vm's now is exacerbating the
 issue because of the fork/exec for devices plus the ioctl to do a uvm
 share into the device process address space.

> If this diff causes the errors to occur, and without the diff it's
> fine, then
> we need to look into that.
> Also I think a pid number in that printf might be useful, I'll see
> what I can
> find. If it's not vmd causing this and rather some other process
> then that
> would be good to know also.
 Sadly it looks like that printf doesn't spit out the offending
 pid. :(
>>> Just to confirm I am seeing this behavior on the latest snap
>>> without
>>> the patch as well.
>> Since this diff isn't the cause, I've committed it. Thanks for
>> testing. I'll see if I can reproduce your MAP_STACK issues.
>>
>>> Just started 10 VMs with sleep 2, machine freezes, but nothing on the
>>> console. :(
>> For now, I'd recommend spacing out vm launches. I'm pretty sure it's
>> related to the uvm corruption we saw last year when creating, starting,
>> and destroying vm's rapidly in a loop.
>
> That could very well be the case. I will adjust my start script, so
> far I've got good results with a 10 second sleep.
>
> Is there some additional debugging I can turn on that makes sense for
> this? I can easily replicate it.
>

Highly doubtful, if the issue is what I think it is. The only thing would be
making sure you're running in a way to see any panic and drop into
ddb. If you're using X, or aren't on the primary console or a serial
connection, it might just appear as a deadlocked system during a panic.
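
A minimal sketch of that setup, assuming stock ddb sysctls and a com0
serial console (adjust to the actual hardware; with ddb.panic at its
default of 1 a panic already drops into ddb, so the main thing is being
able to see and reach the prompt):

  sysctl ddb.console=1                      # allow breaking into ddb
                                            # (Ctrl+Alt+Esc on a glass
                                            # console, BREAK on serial)
  echo 'ddb.console=1' >> /etc/sysctl.conf  # keep it across reboots
  echo 'set tty com0' >> /etc/boot.conf     # if a serial console exists

At the ddb> prompt, 'trace' and 'ps' output is what's usually worth
capturing for a report.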

-dv



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-06 Thread Mischa

On 2023-09-06 05:36, Dave Voutila wrote:

Mischa  writes:

On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin  writes:


On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:

/snip


> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK
Starting 30 VMs this way caused the machine to become unresponsive
again,
but nothing on the console. :(
Mischa

Were you seeing these uvm errors before this diff? If so, this
isn't
causing the problem and something else is.

I don't believe we solved any of the underlying uvm issues in Bruges
last year. Mischa, can you test with just the latest snapshot/-current?
I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a uvm
share into the device process address space.


If this diff causes the errors to occur, and without the diff it's
fine, then
we need to look into that.
Also I think a pid number in that printf might be useful, I'll see
what I can
find. If it's not vmd causing this and rather some other process
then that
would be good to know also.

Sadly it looks like that printf doesn't spit out the offending
pid. :(


Just to confirm I am seeing this behavior on the latest snap without
the patch as well.


Since this diff isn't the cause, I've committed it. Thanks for
testing. I'll see if I can reproduce your MAP_STACK issues.


Just started 10 VMs with sleep 2, machine freezes, but nothing on the
console. :(


For now, I'd recommend spacing out vm launches. I'm pretty sure it's
related to the uvm corruption we saw last year when creating, starting,
and destroying vm's rapidly in a loop.


That could very well be the case. I will adjust my start script, so far 
I've got good results with a 10 second sleep.


Is there some additional debugging I can turn on that makes sense for
this? I can easily replicate it.


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-05 Thread Dave Voutila


Mischa  writes:

> On 2023-09-05 14:27, Dave Voutila wrote:
>> Mike Larkin  writes:
>>
>>> On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:
 On 2023-09-04 18:58, Mischa wrote:
 > On 2023-09-04 18:55, Mischa wrote:
>> /snip
>>
 > > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
 > > this way, before it would choke on 2-3.
 > >
 > > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
 >
 > I do still get the same message on the console, but the machine isn't
 > freezing up.
 >
 > [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
 > MAP_STACK
 Starting 30 VMs this way caused the machine to become unresponsive
 again,
 but nothing on the console. :(
 Mischa
>>> Were you seeing these uvm errors before this diff? If so, this
>>> isn't
>>> causing the problem and something else is.
>> I don't believe we solved any of the underlying uvm issues in Bruges
>> last year. Mischa, can you test with just the latest snapshot/-current?
>> I'd imagine starting and stopping many vm's now is exacerbating the
>> issue because of the fork/exec for devices plus the ioctl to do a uvm
>> share into the device process address space.
>>
>>> If this diff causes the errors to occur, and without the diff it's
>>> fine, then
>>> we need to look into that.
>>> Also I think a pid number in that printf might be useful, I'll see
>>> what I can
>>> find. If it's not vmd causing this and rather some other process
>>> then that
>>> would be good to know also.
>> Sadly it looks like that printf doesn't spit out the offending
>> pid. :(
>
> Just to confirm I am seeing this behavior on the latest snap without
> the patch as well.

Since this diff isn't the cause, I've committed it. Thanks for
testing. I'll see if I can reproduce your MAP_STACK issues.

> Just started 10 VMs with sleep 2, machine freezes, but nothing on the
> console. :(

For now, I'd recommend spacing out vm launches. I'm pretty sure it's
related to the uvm corruption we saw last year when creating, starting,
and destroying vm's rapidly in a loop.

-dv



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-05 Thread Mischa

On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin  writes:


On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:


/snip


> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK

Starting 30 VMs this way caused the machine to become unresponsive
again, but nothing on the console. :(

Mischa


Were you seeing these uvm errors before this diff? If so, this isn't
causing the problem and something else is.


I don't believe we solved any of the underlying uvm issues in Bruges
last year. Mischa, can you test with just the latest snapshot/-current?

I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a uvm
share into the device process address space.



If this diff causes the errors to occur, and without the diff it's
fine, then we need to look into that.

Also I think a pid number in that printf might be useful, I'll see
what I can find. If it's not vmd causing this and rather some other
process then that would be good to know also.


Sadly it looks like that printf doesn't spit out the offending pid. :(


Just to confirm I am seeing this behavior on the latest snap without the 
patch as well.
Just started 10 VMs with sleep 2, machine freezes, but nothing on the 
console. :(


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-05 Thread Mischa



On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin  writes:


On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:


/snip


> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK

Starting 30 VMs this way caused the machine to become unresponsive
again, but nothing on the console. :(

Mischa


Were you seeing these uvm errors before this diff? If so, this isn't
causing the problem and something else is.


I don't believe we solved any of the underlying uvm issues in Bruges
last year. Mischa, can you test with just the latest snapshot/-current?


Yes, after Mike's email I already started getting an extra machine up 
and running.

Will finish that shortly and run the tests on the latest snap.


I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a uvm
share into the device process address space.


I will adjust my scripts accordingly. I currently start as many VMs as 
there are cores in production. Will test if that is still possible.


Mischa

If this diff causes the errors to occur, and without the diff it's
fine, then we need to look into that.

Also I think a pid number in that printf might be useful, I'll see
what I can find. If it's not vmd causing this and rather some other
process then that would be good to know also.


Sadly it looks like that printf doesn't spit out the offending pid. :(




Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-05 Thread Dave Voutila


Mike Larkin  writes:

> On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:
>> On 2023-09-04 18:58, Mischa wrote:
>> > On 2023-09-04 18:55, Mischa wrote:

/snip

>> > > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
>> > > this way, before it would choke on 2-3.
>> > >
>> > > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>> >
>> > I do still get the same message on the console, but the machine isn't
>> > freezing up.
>> >
>> > [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
>> > MAP_STACK
>>
>> Starting 30 VMs this way caused the machine to become unresponsive again,
>> but nothing on the console. :(
>>
>> Mischa
>
> Were you seeing these uvm errors before this diff? If so, this isn't
> causing the problem and something else is.

I don't believe we solved any of the underlying uvm issues in Bruges
last year. Mischa, can you test with just the latest snapshot/-current?

I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a uvm
share into the device process address space.

>
> If this diff causes the errors to occur, and without the diff it's fine, then
> we need to look into that.
>
>
> Also I think a pid number in that printf might be useful, I'll see what I can
> find. If it's not vmd causing this and rather some other process then that
> would be good to know also.

Sadly it looks like that printf doesn't spit out the offending pid. :(

-dv



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mike Larkin
On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:
> On 2023-09-04 18:58, Mischa wrote:
> > On 2023-09-04 18:55, Mischa wrote:
> > > On 2023-09-04 17:57, Dave Voutila wrote:
> > > > Mischa  writes:
> > > > > On 2023-09-04 16:23, Mike Larkin wrote:
> > > > > > On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:
> > > > > > > On 2023-09-03 21:18, Dave Voutila wrote:
> > > > > > > > Mischa  writes:
> > > > > > > >
> > > > > > > > > Nice!! Thanx Dave!
> > > > > > > > >
> > > > > > > > > Running go brrr as we speak.
> > > > > > > > > Testing with someone who is running Debian.
> > > > > > > >
> > > > > > > > Great. I'll plan on committing this tomorrow afternoon (4 Sep) 
> > > > > > > > my time
> > > > > > > > unless I hear of any issues.
> > > > > > > There are a couple of permanent VMs running on this host, 1 ToR
> > > > > > > node,
> > > > > > > OpenBSD VM and a Debian VM.
> > > > > > > While they were running I started my stress script.
> > > > > > > The first round I started 40 VMs with just bsd.rd, 2G memory
> > > > > > > All good, then I started 40 VMs with a base disk and 2G memory.
> > > > > > > After 20 VMs started I got the following messages on the console:
> > > > > > > [umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff:
> > > > > > > not
> > > > > > > MAP_STACK
> > > > > > > [umd159360/355276 sp=783369$96750 inside
> > > > > > > 7256d538c000-725645b8bFff:
> > > > > > > not
> > > > > > > MAP_STACK
> > > > > > > [umd172263/319211 sp=70fb86794b60 inside
> > > > > > > 75247a4d2000-75247acdifff:
> > > > > > > not
> > > > > > > MAP_STACK
> > > > > > > [umd142824/38950 sp=7db1ed2a64d0 inside
> > > > > > > 756c57d18000-756c58517fff: not
> > > > > > > MAP_STACK
> > > > > > > [umd19808/286658 sp=7dbied2a64d0 inside
> > > > > > > 70f685f41000-70f6867dofff: not
> > > > > > > MAP_STACK
> > > > > > > [umd193279/488634 sp=72652c3e3da0 inside
> > > > > > > 7845f168d000-7845f1e8cfff:
> > > > > > > not
> > > > > > > MAP_STACK
> > > > > > > [umd155924/286116 sp=7eac5a1ff060 inside
> > > > > > > 7b88bcb79000-7b88b4378fff:
> > > > > > > not
> > > > > > > MAP_STACK
> > > > > > > Not sure if this is related to starting of the VMs or something
> > > > > > > else, the
> > > > > > > ToR node was consuming 100%+ CPU at the time. :)
> > > > > > > Mischa
> > > > > > I have not seen this; can you try without the ToR node
> > > > > > some time and
> > > > > > see if
> > > > > > this still happens?
> > > > >
> > > > > Testing again without any other VMs running.
> > > > > Things go wrong when I run the following command and wait a little.
> > > > >
> > > > > for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2
> > > > > /var/vmm/vm${i}.qcow2 && vmctl start -L -d
> > > > > /var/vmm/vm${i}.qcow2 -m 2G
> > > > > vm${i}; done
> > > >
> > > > Can you try adding a "sleep 2" or something in the loop? I can't
> > > > think
> > > > of a reason my changes would cause this. Do you see this on -current
> > > > without the diff?
> > >
> > > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > > this way, before it would choke on 2-3.
> > >
> > > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
> >
> > I do still get the same message on the console, but the machine isn't
> > freezing up.
> >
> > [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> > MAP_STACK
>
> Starting 30 VMs this way caused the machine to become unresponsive again,
> but nothing on the console. :(
>
> Mischa

Were you seeing these uvm errors before this diff? If so, this isn't
causing the problem and something else is.

If this diff causes the errors to occur, and without the diff it's fine, then
we need to look into that.


Also I think a pid number in that printf might be useful, I'll see what I can
find. If it's not vmd causing this and rather some other process then that
would be good to know also.



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mischa

On 2023-09-04 18:58, Mischa wrote:

On 2023-09-04 18:55, Mischa wrote:

On 2023-09-04 17:57, Dave Voutila wrote:

Mischa  writes:

On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa  writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.
There are a couple of permanent VMs running on this host, 1 ToR
node,
OpenBSD VM and a Debian VM.
While they were running I started my stress script.
The first round I started 40 VMs with just bsd.rd, 2G memory
All good, then I started 40 VMs with a base disk and 2G memory.
After 20 VMs started I got the following messages on the console:
[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff: not MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff: not MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not MAP_STACK
[umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: not MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not MAP_STACK
Not sure if this is related to starting of the VMs or something
else, the ToR node was consuming 100%+ CPU at the time. :)
Mischa

I have not seen this; can you try without the ToR node some time and
see if this still happens?


Testing again without any other VMs running.
Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2
/var/vmm/vm${i}.qcow2 && vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G
vm${i}; done


Can you try adding a "sleep 2" or something in the loop? I can't 
think

of a reason my changes would cause this. Do you see this on -current
without the diff?


Adding the sleep 2 does indeed help. I managed to get 20 VMs started 
this way, before it would choke on 2-3.


Do I only need the unpatched kernel or also the vmd/vmctl from snap?


I do still get the same message on the console, but the machine isn't 
freezing up.


[umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not 
MAP_STACK


Starting 30 VMs this way caused the machine to become unresponsive 
again, but nothing on the console. :(


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mischa




On 2023-09-04 18:55, Mischa wrote:

On 2023-09-04 17:57, Dave Voutila wrote:

Mischa  writes:

On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa  writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.
There are a couple of permanent VMs running on this host, 1 ToR
node,
OpenBSD VM and a Debian VM.
While they were running I started my stress script.
The first round I started 40 VMs with just bsd.rd, 2G memory
All good, then I started 40 VMs with a base disk and 2G memory.
After 20 VMs started I got the following messages on the console:
[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff: not MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff: not MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not MAP_STACK
[umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: not MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not MAP_STACK
Not sure if this is related to starting of the VMs or something
else, the
ToR node was consuming 100%+ CPU at the time. :)
Mischa

I have not seen this; can you try without the ToR node some time and
see if
this still happens?


Testing again without any other VMs running.
Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2
/var/vmm/vm${i}.qcow2 && vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G
vm${i}; done


Can you try adding a "sleep 2" or something in the loop? I can't think
of a reason my changes would cause this. Do you see this on -current
without the diff?


Adding the sleep 2 does indeed help. I managed to get 20 VMs started 
this way, before it would choke on 2-3.


Do I only need the unpatched kernel or also the vmd/vmctl from snap?


I do still get the same message on the console, but the machine isn't 
freezing up.


[umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not 
MAP_STACK


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mischa

On 2023-09-04 17:57, Dave Voutila wrote:

Mischa  writes:

On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa  writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.
There are a couple of permanent VMs running on this host, 1 ToR
node,
OpenBSD VM and a Debian VM.
While they were running I started my stress script.
The first round I started 40 VMs with just bsd.rd, 2G memory
All good, then I started 40 VMs with a base disk and 2G memory.
After 20 VMs started I got the following messages on the console:
[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff: not MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff: not MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not MAP_STACK
[umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: not MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not MAP_STACK
Not sure if this is related to starting of the VMs or something
else, the
ToR node was consuming 100%+ CPU at the time. :)
Mischa

I have not seen this; can you try without the ToR node some time and
see if
this still happens?


Testing again without any other VMs running.
Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2
/var/vmm/vm${i}.qcow2 && vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G
vm${i}; done


Can you try adding a "sleep 2" or something in the loop? I can't think
of a reason my changes would cause this. Do you see this on -current
without the diff?


Adding the sleep 2 does indeed help. I managed to get 20 VMs started 
this way, before it would choke on 2-3.


Do I only need the unpatched kernel or also the vmd/vmctl from snap?

Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Dave Voutila


Mischa  writes:

> On 2023-09-04 16:23, Mike Larkin wrote:
>> On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:
>>> On 2023-09-03 21:18, Dave Voutila wrote:
>>> > Mischa  writes:
>>> >
>>> > > Nice!! Thanx Dave!
>>> > >
>>> > > Running go brrr as we speak.
>>> > > Testing with someone who is running Debian.
>>> >
>>> > Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
>>> > unless I hear of any issues.
>>> There are a couple of permanent VMs running on this host, 1 ToR
>>> node,
>>> OpenBSD VM and a Debian VM.
>>> While they were running I started my stress script.
>>> The first round I started 40 VMs with just bsd.rd, 2G memory
>>> All good, then I started 40 VMs with a base disk and 2G memory.
>>> After 20 VMs started I got the following messages on the console:
>>> [umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff:
>>> not
>>> MAP_STACK
>>> [umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff:
>>> not
>>> MAP_STACK
>>> [umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff:
>>> not
>>> MAP_STACK
>>> [umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not
>>> MAP_STACK
>>> [umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: not
>>> MAP_STACK
>>> [umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff:
>>> not
>>> MAP_STACK
>>> [umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff:
>>> not
>>> MAP_STACK
>>> Not sure if this is related to starting of the VMs or something
>>> else, the
>>> ToR node was consuming 100%+ CPU at the time. :)
>>> Mischa
>> I have not seen this; can you try without the ToR node some time and
>> see if
>> this still happens?
>
> Testing again without any other VMs running.
> Things go wrong when I run the following command and wait a little.
>
> for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2
> /var/vmm/vm${i}.qcow2 && vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G
> vm${i}; done

Can you try adding a "sleep 2" or something in the loop? I can't think
of a reason my changes would cause this. Do you see this on -current
without the diff?
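
A sketch of that loop with the delay added (same image paths and vmctl
flags as the command quoted above; the length of the pause is a guess,
and later in the thread a 10 second sleep is reported to work reliably):

  for i in $(jot 10 10); do
      vmctl create -b /var/vmm/vm09.qcow2 /var/vmm/vm${i}.qcow2 &&
          vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G vm${i}
      sleep 2    # space out launches so the device processes settle
  done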

-dv



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mischa

On 2023-09-04 16:23, Mike Larkin wrote:

On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:

On 2023-09-03 21:18, Dave Voutila wrote:
> Mischa  writes:
>
> > Nice!! Thanx Dave!
> >
> > Running go brrr as we speak.
> > Testing with someone who is running Debian.
>
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.

There are a couple of permanent VMs running on this host, 1 ToR node,
OpenBSD VM and a Debian VM.
While they were running I started my stress script.
The first round I started 40 VMs with just bsd.rd, 2G memory
All good, then I started 40 VMs with a base disk and 2G memory.
After 20 VMs started I got the following messages on the console:

[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff: not MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff: not MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not MAP_STACK
[umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: not MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not MAP_STACK

Not sure if this is related to starting of the VMs or something else,
the ToR node was consuming 100%+ CPU at the time. :)

Mischa


I have not seen this; can you try without the ToR node some time and
see if this still happens?


Testing again without any other VMs running.
Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2 
/var/vmm/vm${i}.qcow2 && vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G 
vm${i}; done


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mike Larkin
On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:
> On 2023-09-03 21:18, Dave Voutila wrote:
> > Mischa  writes:
> >
> > > Nice!! Thanx Dave!
> > >
> > > Running go brrr as we speak.
> > > Testing with someone who is running Debian.
> >
> > Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> > unless I hear of any issues.
>
> There are a couple of permanent VMs running on this host, 1 ToR node,
> OpenBSD VM and a Debian VM.
> While they were running I started my stress script.
> The first round I started 40 VMs with just bsd.rd, 2G memory
> All good, then I started 40 VMs with a base disk and 2G memory.
> After 20 VMs started I got the following messages on the console:
>
> [umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not
> MAP_STACK
> [umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff: not
> MAP_STACK
> [umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff: not
> MAP_STACK
> [umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not
> MAP_STACK
> [umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: not
> MAP_STACK
> [umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not
> MAP_STACK
> [umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not
> MAP_STACK
>
> Not sure if this is related to starting of the VMs or something else, the
> ToR node was consuming 100%+ CPU at the time. :)
>
> Mischa

I have not seen this; can you try without the ToR node some time and see if
this still happens?



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-04 Thread Mischa

On 2023-09-03 21:18, Dave Voutila wrote:

Mischa  writes:


Nice!! Thanx Dave!

Running go brrr as we speak.
Testing with someone who is running Debian.


Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
unless I hear of any issues.


There are a couple of permanent VMs running on this host, 1 ToR node, 
OpenBSD VM and a Debian VM.

While they were running I started my stress script.
The first round I started 40 VMs with just bsd.rd, 2G memory
All good, then I started 40 VMs with a base disk and 2G memory.
After 20 VMs started I got the following messages on the console:

[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff: not MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff: not MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not MAP_STACK
[umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: not MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not MAP_STACK


Not sure if this is related to starting of the VMs or something else, 
the ToR node was consuming 100%+ CPU at the time. :)


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-03 Thread Dave Voutila


Mischa  writes:

> Nice!! Thanx Dave!
>
> Running go brrr as we speak.
> Testing with someone who is running Debian.

Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
unless I hear of any issues.



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-03 Thread Mischa

Nice!! Thanx Dave!

Running go brrr as we speak.
Testing with someone who is running Debian.

Mischa

On 2023-09-01 21:50, Dave Voutila wrote:

Now that my i8259 fix is in, it's safe to expand the testing pool for
this diff. (Without that fix, users would definitely hit the hung block
device issue testing this one.) Hoping that folks that run non-OpenBSD
guests or strange configurations can give it a spin.

This change removes an ioctl(2) call from the vcpu thread hot path in
vmd. Instead of making that syscall to toggle on/off a pending 
interrupt
flag on the vcpu object in vmm(4), it adds a flag into the 
vm_run_params

struct sent with the VMM_IOC_RUN ioctl. The in-kernel vcpu runloop can
now toggle the pending interrupt state prior to vm entry.

mbuhl@ and phessler@ have run this diff on their machines. Current
observations are reduced average network latency for guests.

My terse measurements using the following btrace script show some
promising changes in terms of reducing ioctl syscalls:

  /* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */
  syscall:ioctl:entry
  /arg1 == 2148292102/
  {
@total[tid] = count();
@running[tid] = count();
  }
  interval:hz:1
  {
print(@running);
clear(@running);
  }

Measuring from boot of an OpenBSD guest to after the guest finishes
relinking (based on my manual observation of the libevent thread
settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR
ioctls for a single guest:

## -current
@total[433237]: 1325100  # vcpu thread (!!)
@total[187073]: 80239# libevent thread

## with diff
@total[550347]: 42   # vcpu thread (!!)
@total[256550]: 86946# libevent thread

Most of the VMM_IOC_INTR ioctls on the vcpu threads come from seabios
and the bootloader prodding some of the emulated hardware, but even
after the bootloader you'll see ~10-20k/s of ioctl's on -current
vs. ~4-5k/s with the diff.

At steady-state, the vcpu thread no longer makes the VMM_IOC_INTR calls
at all and you should see the libevent thread calling it at a rate ~100/s
(probably hardclock?). *Without* the diff, I see a steady 650/s rate on
the vcpu thread at idle. *With* the diff, it's 0/s at idle. :)

To test:
- rebuild & install new kernel
- copy/symlink vmmvar.h into /usr/include/machine/
- rebuild & re-install vmd & vmctl
- reboot

-dv


diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending
 M  sys/arch/amd64/amd64/vmm_machdep.c  |  10+   0-
 M  sys/arch/amd64/include/vmmvar.h |   1+   0-
 M  usr.sbin/vmd/vm.c   |   2+  16-

3 files changed, 13 insertions(+), 16 deletions(-)

diff refs/heads/master refs/heads/vmm-vrp_intr_pending
commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb
commit + 10eeb8a0414ec927b6282473c50043a7027d6b41
blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46
blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e
--- sys/arch/amd64/amd64/vmm_machdep.c
+++ sys/arch/amd64/amd64/vmm_machdep.c
@@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct vm_run_params *

 */
irq = vrp->vrp_irq;

+   if (vrp->vrp_intr_pending)
+   vcpu->vc_intr = 1;
+   else
+   vcpu->vc_intr = 0;
+
if (vrp->vrp_continue) {
switch (vcpu->vc_gueststate.vg_exit_reason) {
case VMX_EXIT_IO:
@@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params *


irq = vrp->vrp_irq;

+   if (vrp->vrp_intr_pending)
+   vcpu->vc_intr = 1;
+   else
+   vcpu->vc_intr = 0;
+
/*
 * If we are returning from userspace (vmd) because we exited
 * last time, fix up any needed vcpu state first. Which state
blob - e9f8384cccfde33034d7ac9782610f93eb5dc640
blob + 88545b54b35dd60280ba87403e343db9463d7419
--- sys/arch/amd64/include/vmmvar.h
+++ sys/arch/amd64/include/vmmvar.h
@@ -456,6 +456,7 @@ struct vm_run_params {
uint32_tvrp_vcpu_id;
uint8_t vrp_continue;   /* Continuing from an exit */
uint16_tvrp_irq;/* IRQ to inject */
+   uint8_t vrp_intr_pending;   /* Additional intrs pending? */

/* Input/output parameter to VMM_IOC_RUN */
struct vm_exit  *vrp_exit;  /* updated exit data */
blob - 5f598bcc14af5115372d34a4176254d377aad91c
blob + 447fc219adadf945de2bf25d5335993c2abdc26f
--- usr.sbin/vmd/vm.c
+++ usr.sbin/vmd/vm.c
@@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg)
} else
vrp->vrp_irq = 0x;

-   /* Still more pending? */
-   if (i8259_is_pending()) {
-   /*
-* XXX can probably avoid ioctls here by providing intr
-* in vrp
-*/
-   if (vcpu_pic_intr(vrp->vrp_vm_id,
-   vrp->vrp_vcpu_id, 1)) {
-   fatal("can't set INTR");
-   }
-   } 

Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-03 Thread Mike Larkin
On Fri, Sep 01, 2023 at 03:50:31PM -0400, Dave Voutila wrote:
> Now that my i8259 fix is in, it's safe to expand the testing pool for
> this diff. (Without that fix, users would definitely hit the hung block
> device issue testing this one.) Hoping that folks that run non-OpenBSD
> guests or strange configurations can give it a spin.
>
> This change removes an ioctl(2) call from the vcpu thread hot path in
> vmd. Instead of making that syscall to toggle on/off a pending interrupt
> flag on the vcpu object in vmm(4), it adds a flag into the vm_run_params
> struct sent with the VMM_IOC_RUN ioctl. The in-kernel vcpu runloop can
> now toggle the pending interrupt state prior to vm entry.
>
> mbuhl@ and phessler@ have run this diff on their machines. Current
> observations are reduced average network latency for guests.
>
> My terse measurements using the following btrace script show some
> promising changes in terms of reducing ioctl syscalls:
>
>   /* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */
>   syscall:ioctl:entry
>   /arg1 == 2148292102/
>   {
> @total[tid] = count();
> @running[tid] = count();
>   }
>   interval:hz:1
>   {
> print(@running);
> clear(@running);
>   }
>
> Measuring from boot of an OpenBSD guest to after the guest finishes
> relinking (based on my manual observation of the libevent thread
> settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR
> ioctls for a single guest:
>
> ## -current
> @total[433237]: 1325100  # vcpu thread (!!)
> @total[187073]: 80239# libevent thread
>
> ## with diff
> @total[550347]: 42   # vcpu thread (!!)
> @total[256550]: 86946# libevent thread
>
> Most of the VMM_IOC_INTR ioctls on the vcpu threads come from seabios
> and the bootloader prodding some of the emulated hardware, but even
> after the bootloader you'll see ~10-20k/s of ioctl's on -current
> vs. ~4-5k/s with the diff.
>
> At steady-state, the vcpu thread no longer makes the VMM_IOC_INTR calls
> at all and you should see the libevent thread calling it at a rate ~100/s
> (probably hardclock?). *Without* the diff, I see a steady 650/s rate on
> the vcpu thread at idle. *With* the diff, it's 0/s at idle. :)
>
> To test:
> - rebuild & install new kernel
> - copy/symlink vmmvar.h into /usr/include/machine/
> - rebuild & re-install vmd & vmctl
> - reboot
>
> -dv
>
>

ok mlarkin, thanks!

> diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending
>  M  sys/arch/amd64/amd64/vmm_machdep.c  |  10+   0-
>  M  sys/arch/amd64/include/vmmvar.h |   1+   0-
>  M  usr.sbin/vmd/vm.c   |   2+  16-
>
> 3 files changed, 13 insertions(+), 16 deletions(-)
>
> diff refs/heads/master refs/heads/vmm-vrp_intr_pending
> commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb
> commit + 10eeb8a0414ec927b6282473c50043a7027d6b41
> blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46
> blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e
> --- sys/arch/amd64/amd64/vmm_machdep.c
> +++ sys/arch/amd64/amd64/vmm_machdep.c
> @@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct vm_run_params *
>*/
>   irq = vrp->vrp_irq;
>
> + if (vrp->vrp_intr_pending)
> + vcpu->vc_intr = 1;
> + else
> + vcpu->vc_intr = 0;
> +
>   if (vrp->vrp_continue) {
>   switch (vcpu->vc_gueststate.vg_exit_reason) {
>   case VMX_EXIT_IO:
> @@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params *
>
>   irq = vrp->vrp_irq;
>
> + if (vrp->vrp_intr_pending)
> + vcpu->vc_intr = 1;
> + else
> + vcpu->vc_intr = 0;
> +
>   /*
>* If we are returning from userspace (vmd) because we exited
>* last time, fix up any needed vcpu state first. Which state
> blob - e9f8384cccfde33034d7ac9782610f93eb5dc640
> blob + 88545b54b35dd60280ba87403e343db9463d7419
> --- sys/arch/amd64/include/vmmvar.h
> +++ sys/arch/amd64/include/vmmvar.h
> @@ -456,6 +456,7 @@ struct vm_run_params {
>   uint32_tvrp_vcpu_id;
>   uint8_t vrp_continue;   /* Continuing from an exit */
>   uint16_tvrp_irq;/* IRQ to inject */
> + uint8_t vrp_intr_pending;   /* Additional intrs pending? */
>
>   /* Input/output parameter to VMM_IOC_RUN */
>   struct vm_exit  *vrp_exit;  /* updated exit data */
> blob - 5f598bcc14af5115372d34a4176254d377aad91c
> blob + 447fc219adadf945de2bf25d5335993c2abdc26f
> --- usr.sbin/vmd/vm.c
> +++ usr.sbin/vmd/vm.c
> @@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg)
>   } else
>   vrp->vrp_irq = 0x;
>
> - /* Still more pending? */
> - if (i8259_is_pending()) {
> - /*
> -  * XXX can probably avoid ioctls here by providing intr
> -  * in vrp
> -  */
> - if (vcpu_pic_intr(vrp->vrp_vm_id,
> - vrp->vrp_vcpu_id, 1)) {
> -  

Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-02 Thread Stefan Sperling
On Sat, Sep 02, 2023 at 12:59:51PM +0200, Peter Hessler wrote:
> I just upgraded to -current and didn't have this patch in for a little
> bit, and woof that was super noticeable.  Still works for my big VM host.
> 
> OK

No issues here either with just one VM running for testing some
software, so mostly idle.



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-02 Thread Peter Hessler
I just upgraded to -current and didn't have this patch in for a little
bit, and woof that was super noticeable.  Still works for my big VM host.

OK


On 2023 Sep 01 (Fri) at 15:50:31 -0400 (-0400), Dave Voutila wrote:
:Now that my i8259 fix is in, it's safe to expand the testing pool for
:this diff. (Without that fix, users would definitely hit the hung block
:device issue testing this one.) Hoping that folks that run non-OpenBSD
:guests or strange configurations can give it a spin.
:
:This change removes an ioctl(2) call from the vcpu thread hot path in
:vmd. Instead of making that syscall to toggle on/off a pending interrupt
:flag on the vcpu object in vmm(4), it adds a flag into the vm_run_params
:struct sent with the VMM_IOC_RUN ioctl. The in-kernel vcpu runloop can
:now toggle the pending interrupt state prior to vm entry.
:
:mbuhl@ and phessler@ have run this diff on their machines. Current
:observations are reduced average network latency for guests.
:
:My terse measurements using the following btrace script show some
:promising changes in terms of reducing ioctl syscalls:
:
:  /* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */
:  syscall:ioctl:entry
:  /arg1 == 2148292102/
:  {
:@total[tid] = count();
:@running[tid] = count();
:  }
:  interval:hz:1
:  {
:print(@running);
:clear(@running);
:  }
:
:Measuring from boot of an OpenBSD guest to after the guest finishes
:relinking (based on my manual observation of the libevent thread
:settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR
:ioctls for a single guest:
:
:## -current
:@total[433237]: 1325100  # vcpu thread (!!)
:@total[187073]: 80239# libevent thread
:
:## with diff
:@total[550347]: 42   # vcpu thread (!!)
:@total[256550]: 86946# libevent thread
:
:Most of the VMM_IOC_INTR ioctls on the vcpu threads come from seabios
:and the bootloader prodding some of the emulated hardware, but even
:after the bootloader you'll see ~10-20k/s of ioctl's on -current
:vs. ~4-5k/s with the diff.
:
:At steady-state, the vcpu thread no longer makes the VMM_IOC_INTR calls
:at all and you should see the libevent thread calling it at a rate ~100/s
:(probably hardclock?). *Without* the diff, I see a steady 650/s rate on
:the vcpu thread at idle. *With* the diff, it's 0/s at idle. :)
:
:To test:
:- rebuild & install new kernel
:- copy/symlink vmmvar.h into /usr/include/machine/
:- rebuild & re-install vmd & vmctl
:- reboot
:
:-dv
:
:
:diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending
: M  sys/arch/amd64/amd64/vmm_machdep.c  |  10+   0-
: M  sys/arch/amd64/include/vmmvar.h |   1+   0-
: M  usr.sbin/vmd/vm.c   |   2+  16-
:
:3 files changed, 13 insertions(+), 16 deletions(-)
:
:diff refs/heads/master refs/heads/vmm-vrp_intr_pending
:commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb
:commit + 10eeb8a0414ec927b6282473c50043a7027d6b41
:blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46
:blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e
:--- sys/arch/amd64/amd64/vmm_machdep.c
:+++ sys/arch/amd64/amd64/vmm_machdep.c
:@@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct vm_run_params *
:*/
:   irq = vrp->vrp_irq;
:
:+  if (vrp->vrp_intr_pending)
:+  vcpu->vc_intr = 1;
:+  else
:+  vcpu->vc_intr = 0;
:+
:   if (vrp->vrp_continue) {
:   switch (vcpu->vc_gueststate.vg_exit_reason) {
:   case VMX_EXIT_IO:
:@@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params *
:
:   irq = vrp->vrp_irq;
:
:+  if (vrp->vrp_intr_pending)
:+  vcpu->vc_intr = 1;
:+  else
:+  vcpu->vc_intr = 0;
:+
:   /*
:* If we are returning from userspace (vmd) because we exited
:* last time, fix up any needed vcpu state first. Which state
:blob - e9f8384cccfde33034d7ac9782610f93eb5dc640
:blob + 88545b54b35dd60280ba87403e343db9463d7419
:--- sys/arch/amd64/include/vmmvar.h
:+++ sys/arch/amd64/include/vmmvar.h
:@@ -456,6 +456,7 @@ struct vm_run_params {
:   uint32_tvrp_vcpu_id;
:   uint8_t vrp_continue;   /* Continuing from an exit */
:   uint16_tvrp_irq;/* IRQ to inject */
:+  uint8_t vrp_intr_pending;   /* Additional intrs pending? */
:
:   /* Input/output parameter to VMM_IOC_RUN */
:   struct vm_exit  *vrp_exit;  /* updated exit data */
:blob - 5f598bcc14af5115372d34a4176254d377aad91c
:blob + 447fc219adadf945de2bf25d5335993c2abdc26f
:--- usr.sbin/vmd/vm.c
:+++ usr.sbin/vmd/vm.c
:@@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg)
:   } else
:   vrp->vrp_irq = 0x;
:
:-  /* Still more pending? */
:-  if (i8259_is_pending()) {
:-  /*
:-   * XXX can probably avoid ioctls here by providing intr
:-   * in vrp
:-   */
:-  if 

vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-01 Thread Dave Voutila
Now that my i8259 fix is in, it's safe to expand the testing pool for
this diff. (Without that fix, users would definitely hit the hung block
device issue testing this one.) Hoping that folks that run non-OpenBSD
guests or strange configurations can give it a spin.

This change removes an ioctl(2) call from the vcpu thread hot path in
vmd. Instead of making that syscall to toggle on/off a pending interrupt
flag on the vcpu object in vmm(4), it adds a flag into the vm_run_params
struct sent with the VMM_IOC_RUN ioctl. The in-kernel vcpu runloop can
now toggle the pending interrupt state prior to vm entry.

mbuhl@ and phessler@ have run this diff on their machines. Current
observations are reduced average network latency for guests.

My terse measurements using the following btrace script show some
promising changes in terms of reducing ioctl syscalls:

  /* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */
  syscall:ioctl:entry
  /arg1 == 2148292102/
  {
@total[tid] = count();
@running[tid] = count();
  }
  interval:hz:1
  {
print(@running);
clear(@running);
  }
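
For anyone reproducing the measurement: assuming the script above is saved
to a file (the name below is arbitrary), it can be run as root with
btrace(8) once dt(4) is enabled; the magic number in the predicate is just
VMM_IOC_INTR's ioctl request expressed in decimal:

  sysctl kern.allowdt=1     # dt(4) must be enabled for btrace
  echo $((0x800c5606))      # -> 2148292102, the value matched in the predicate
  btrace vmm_intr.bt        # prints per-tid VMM_IOC_INTR counts once per second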

Measuring from boot of an OpenBSD guest to after the guest finishes
relinking (based on my manual observation of the libevent thread
settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR
ioctls for a single guest:

## -current
@total[433237]: 1325100  # vcpu thread (!!)
@total[187073]: 80239# libevent thread

## with diff
@total[550347]: 42   # vcpu thread (!!)
@total[256550]: 86946# libevent thread

Most of the VMM_IOC_INTR ioctls on the vcpu threads come from seabios
and the bootloader prodding some of the emulated hardware, but even
after the bootloader you'll see ~10-20k/s of ioctl's on -current
vs. ~4-5k/s with the diff.

At steady-state, the vcpu thread no longer makes the VMM_IOC_INTR calls
at all and you should see the libevent thread calling it at a rate ~100/s
(probably hardclock?). *Without* the diff, I see a steady 650/s rate on
the vcpu thread at idle. *With* the diff, it's 0/s at idle. :)

To test:
- rebuild & install new kernel
- copy/symlink vmmvar.h into /usr/include/machine/
- rebuild & re-install vmd & vmctl
- reboot
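
A rough sketch of those steps, assuming a standard /usr/src checkout and a
GENERIC.MP kernel (adjust paths and kernel config to taste):

  cd /usr/src/sys/arch/amd64/compile/GENERIC.MP
  make obj && make config && make && make install    # rebuild & install kernel

  cp /usr/src/sys/arch/amd64/include/vmmvar.h /usr/include/machine/

  cd /usr/src/usr.sbin/vmd && make obj && make && make install
  cd /usr/src/usr.sbin/vmctl && make obj && make && make install

  reboot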

-dv


diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending
 M  sys/arch/amd64/amd64/vmm_machdep.c  |  10+   0-
 M  sys/arch/amd64/include/vmmvar.h |   1+   0-
 M  usr.sbin/vmd/vm.c   |   2+  16-

3 files changed, 13 insertions(+), 16 deletions(-)

diff refs/heads/master refs/heads/vmm-vrp_intr_pending
commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb
commit + 10eeb8a0414ec927b6282473c50043a7027d6b41
blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46
blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e
--- sys/arch/amd64/amd64/vmm_machdep.c
+++ sys/arch/amd64/amd64/vmm_machdep.c
@@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct vm_run_params *
 */
irq = vrp->vrp_irq;

+   if (vrp->vrp_intr_pending)
+   vcpu->vc_intr = 1;
+   else
+   vcpu->vc_intr = 0;
+
if (vrp->vrp_continue) {
switch (vcpu->vc_gueststate.vg_exit_reason) {
case VMX_EXIT_IO:
@@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params *

irq = vrp->vrp_irq;

+   if (vrp->vrp_intr_pending)
+   vcpu->vc_intr = 1;
+   else
+   vcpu->vc_intr = 0;
+
/*
 * If we are returning from userspace (vmd) because we exited
 * last time, fix up any needed vcpu state first. Which state
blob - e9f8384cccfde33034d7ac9782610f93eb5dc640
blob + 88545b54b35dd60280ba87403e343db9463d7419
--- sys/arch/amd64/include/vmmvar.h
+++ sys/arch/amd64/include/vmmvar.h
@@ -456,6 +456,7 @@ struct vm_run_params {
uint32_tvrp_vcpu_id;
uint8_t vrp_continue;   /* Continuing from an exit */
uint16_tvrp_irq;/* IRQ to inject */
+   uint8_t vrp_intr_pending;   /* Additional intrs pending? */

/* Input/output parameter to VMM_IOC_RUN */
struct vm_exit  *vrp_exit;  /* updated exit data */
blob - 5f598bcc14af5115372d34a4176254d377aad91c
blob + 447fc219adadf945de2bf25d5335993c2abdc26f
--- usr.sbin/vmd/vm.c
+++ usr.sbin/vmd/vm.c
@@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg)
} else
vrp->vrp_irq = 0x;

-   /* Still more pending? */
-   if (i8259_is_pending()) {
-   /*
-* XXX can probably avoid ioctls here by providing intr
-* in vrp
-*/
-   if (vcpu_pic_intr(vrp->vrp_vm_id,
-   vrp->vrp_vcpu_id, 1)) {
-   fatal("can't set INTR");
-   }
-   } else {
-   if (vcpu_pic_intr(vrp->vrp_vm_id,
-   vrp->vrp_vcpu_id, 0)) {
-