Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-06 19:38, Dave Voutila wrote:
> Highly doubtful if the issue is what I think. The only thing would be
> making sure you're running in a way to see any panic and drop into
> ddb. If you're using X or not on the primary console or serial
> connection it might just appear as a deadlocked system during a panic.

I am using the console via iDRAC, there isn't any information anymore. :(

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
Mischa writes:
> That could very well be the case. I will adjust my start script; so
> far I've got good results with a 10 second sleep.
>
> Is there some additional debugging I can turn on that makes sense for
> this? I can easily replicate.

Highly doubtful if the issue is what I think. The only thing would be
making sure you're running in a way to see any panic and drop into ddb.
If you're using X or not on the primary console or serial connection it
might just appear as a deadlocked system during a panic.

-dv
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-06 05:36, Dave Voutila wrote:
> Since this diff isn't the cause, I've committed it. Thanks for
> testing. I'll see if I can reproduce your MAP_STACK issues.
>
> For now, I'd recommend spacing out vm launches. I'm pretty sure it's
> related to the uvm corruption we saw last year when creating,
> starting, and destroying vm's rapidly in a loop.

That could very well be the case. I will adjust my start script; so far
I've got good results with a 10 second sleep.

Is there some additional debugging I can turn on that makes sense for
this? I can easily replicate.

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
Mischa writes:
> Just to confirm, I am seeing this behavior on the latest snap without
> the patch as well.
>
> Just started 10 VMs with sleep 2, machine freezes, but nothing on the
> console. :(

Since this diff isn't the cause, I've committed it. Thanks for testing.
I'll see if I can reproduce your MAP_STACK issues.

For now, I'd recommend spacing out vm launches. I'm pretty sure it's
related to the uvm corruption we saw last year when creating, starting,
and destroying vm's rapidly in a loop.

-dv
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-05 14:27, Dave Voutila wrote:
> I don't believe we solved any of the underlying uvm issues in Bruges
> last year. Mischa, can you test with just the latest
> snapshot/-current?
>
> Sadly it looks like that printf doesn't spit out the offending pid. :(

Just to confirm, I am seeing this behavior on the latest snap without
the patch as well.

Just started 10 VMs with sleep 2, machine freezes, but nothing on the
console. :(

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-05 14:27, Dave Voutila wrote:
> I don't believe we solved any of the underlying uvm issues in Bruges
> last year. Mischa, can you test with just the latest
> snapshot/-current?

Yes, after Mike's email I already started getting an extra machine up
and running. Will finish that shortly and run the tests on the latest
snap.

> I'd imagine starting and stopping many vm's now is exacerbating the
> issue because of the fork/exec for devices plus the ioctl to do a uvm
> share into the device process address space.

I will adjust my scripts accordingly. I currently start as many VMs as
there are cores in production. Will test if that is still possible.

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
Mike Larkin writes:
> Were you seeing these uvm errors before this diff? If so, this isn't
> causing the problem and something else is.

I don't believe we solved any of the underlying uvm issues in Bruges
last year. Mischa, can you test with just the latest snapshot/-current?
I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a uvm
share into the device process address space.

> If this diff causes the errors to occur, and without the diff it's
> fine, then we need to look into that.
>
> Also I think a pid number in that printf might be useful, I'll see
> what I can find. If it's not vmd causing this and rather some other
> process then that would be good to know also.

Sadly it looks like that printf doesn't spit out the offending pid. :(

-dv
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK
>
> Starting 30 VMs this way caused the machine to become unresponsive
> again, but nothing on the console. :(
>
> Mischa

Were you seeing these uvm errors before this diff? If so, this isn't
causing the problem and something else is.

If this diff causes the errors to occur, and without the diff it's
fine, then we need to look into that.

Also I think a pid number in that printf might be useful, I'll see what
I can find. If it's not vmd causing this and rather some other process
then that would be good to know also.
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-04 18:58, Mischa wrote:
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK

Starting 30 VMs this way caused the machine to become unresponsive
again, but nothing on the console. :(

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-04 18:55, Mischa wrote:
> Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> this way, before it would choke on 2-3.
>
> Do I only need the unpatched kernel or also the vmd/vmctl from snap?

I do still get the same message on the console, but the machine isn't
freezing up.

[umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
MAP_STACK

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-04 17:57, Dave Voutila wrote:
> Can you try adding a "sleep 2" or something in the loop? I can't think
> of a reason my changes would cause this. Do you see this on -current
> without the diff?

Adding the sleep 2 does indeed help. I managed to get 20 VMs started
this way, before it would choke on 2-3.

Do I only need the unpatched kernel or also the vmd/vmctl from snap?

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
Mischa writes:
> Testing again without any other VMs running.
> Things go wrong when I run the following command and wait a little.
>
> for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2 \
>     /var/vmm/vm${i}.qcow2 && vmctl start -L -d /var/vmm/vm${i}.qcow2 \
>     -m 2G vm${i}; done

Can you try adding a "sleep 2" or something in the loop? I can't think
of a reason my changes would cause this. Do you see this on -current
without the diff?

-dv
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-04 16:23, Mike Larkin wrote:
> I have not seen this; can you try without the ToR node some time and
> see if this still happens?

Testing again without any other VMs running.
Things go wrong when I run the following command and wait a little.

for i in $(jot 10 10); do vmctl create -b /var/vmm/vm09.qcow2 \
    /var/vmm/vm${i}.qcow2 && vmctl start -L -d /var/vmm/vm${i}.qcow2 \
    -m 2G vm${i}; done

Mischa
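[Editorial sketch of the workaround that emerged later in the thread: the
same clone-and-start loop, but with a sleep between launches. The paths,
VM names, and the DRYRUN/run helper are hypothetical additions for
illustration (a dry-run mode so the pacing logic can be exercised on a
machine without vmd); the literal `10 11 12` list stands in for the
BSD-only `jot` invocation.]

```shell
#!/bin/sh
# Staggered VM launches: clone a base qcow2 image and start each VM,
# sleeping between launches. Per the thread, back-to-back launches choked
# after 2-3 VMs, "sleep 2" got ~20 started, and 10 seconds proved stable.
DRYRUN=${DRYRUN:-1}   # 1: echo commands instead of running them
DELAY=${DELAY:-10}    # seconds between launches

run() {
    if [ "$DRYRUN" = "1" ]; then echo "$@"; else "$@"; fi
}

for i in 10 11 12; do
    run vmctl create -b /var/vmm/vm09.qcow2 /var/vmm/vm${i}.qcow2
    run vmctl start -L -d /var/vmm/vm${i}.qcow2 -m 2G vm${i}
    [ "$DRYRUN" = "1" ] || sleep "$DELAY"
done
```

Running it as-is prints the vmctl commands it would issue; set DRYRUN=0
on a real vmd host to execute them.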
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On Mon, Sep 04, 2023 at 02:30:23PM +0200, Mischa wrote:
> The first round I started 40 VMs with just bsd.rd, 2G memory.
> All good, then I started 40 VMs with a base disk and 2G memory.
> After 20 VMs started I got the following messages on the console:
>
> [umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not
> MAP_STACK
>
> Not sure if this is related to starting of the VMs or something else,
> the ToR node was consuming 100%+ CPU at the time. :)
>
> Mischa

I have not seen this; can you try without the ToR node some time and
see if this still happens?
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On 2023-09-03 21:18, Dave Voutila wrote:
> Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time
> unless I hear of any issues.

There are a couple of permanent VMs running on this host: 1 ToR node,
an OpenBSD VM, and a Debian VM.

While they were running I started my stress script. The first round I
started 40 VMs with just bsd.rd, 2G memory. All good, then I started 40
VMs with a base disk and 2G memory. After 20 VMs started I got the
following messages on the console:

[umd116390/221323 sp=752d7ac9f090 inside 75c264948000-75c26147fff: not
MAP_STACK
[umd159360/355276 sp=783369$96750 inside 7256d538c000-725645b8bFff: not
MAP_STACK
[umd172263/319211 sp=70fb86794b60 inside 75247a4d2000-75247acdifff: not
MAP_STACK
[umd142824/38950 sp=7db1ed2a64d0 inside 756c57d18000-756c58517fff: not
MAP_STACK
[umd19808/286658 sp=7dbied2a64d0 inside 70f685f41000-70f6867dofff: not
MAP_STACK
[umd193279/488634 sp=72652c3e3da0 inside 7845f168d000-7845f1e8cfff: not
MAP_STACK
[umd155924/286116 sp=7eac5a1ff060 inside 7b88bcb79000-7b88b4378fff: not
MAP_STACK

Not sure if this is related to starting of the VMs or something else;
the ToR node was consuming 100%+ CPU at the time. :)

Mischa
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
Mischa writes: > Nice!! Thanx Dave! > > Running go brrr as we speak. > Testing with someone who is running Debian. Great. I'll plan on committing this tomorrow afternoon (4 Sep) my time unless I hear of any issues.
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
Nice!! Thanx Dave! Running go brrr as we speak. Testing with someone who is running Debian. Mischa On 2023-09-01 21:50, Dave Voutila wrote: Now that my i8259 fix is in, it's safe to expand the testing pool for this diff. (Without that fix, users would definitely hit the hung block device issue testing this one.) Hoping that folks that run non-OpenBSD guests or strange configurations can give it a spin. This change removes an ioctl(2) call from the vcpu thread hot path in vmd. Instead of making that syscall to toggle on/off a pending interrupt flag on the vcpu object in vmm(4), it adds a flag into the vm_run_params struct sent with the VMM_IOC_RUN ioctl. The in-kernel vcpu runloop can now toggle the pending interrupt state prior to vm entry. mbuhl@ and phessler@ have run this diff on their machines. Current observations are reduced average network latency for guests. My terse measurements using the following btrace script show some promising changes in terms of reducing ioctl syscalls: /* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */ syscall:ioctl:entry /arg1 == 2148292102/ { @total[tid] = count(); @running[tid] = count(); } interval:hz:1 { print(@running); clear(@running); } Measuring from boot of an OpenBSD guest to after the guest finishes relinking (based on my manual observation of the libevent thread settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR ioctls for a single guest: ## -current @total[433237]: 1325100 # vcpu thread (!!) @total[187073]: 80239# libevent thread ## with diff @total[550347]: 42 # vcpu thread (!!) @total[256550]: 86946# libevent thread Most of the VMM_IOC_INTR ioctls on the vcpu threads come from seabios and the bootloader prodding some of the emulated hardware, but even after the bootloader you'll see ~10-20k/s of ioctl's on -current vs. ~4-5k/s with the diff. 
At steady-state, the vcpu thread no longer makes the VMM_IOC_INTR calls at all and you should see the libevent thread calling it at a rate ~100/s (probably hardclock?). *Without* the diff, I see a steady 650/s rate on the vcpu thread at idle. *With* the diff, it's 0/s at idle. :) To test: - rebuild & install new kernel - copy/symlink vmmvar.h into /usr/include/machine/ - rebuild & re-install vmd & vmctl - reboot -dv diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending M sys/arch/amd64/amd64/vmm_machdep.c | 10+ 0- M sys/arch/amd64/include/vmmvar.h | 1+ 0- M usr.sbin/vmd/vm.c | 2+ 16- 3 files changed, 13 insertions(+), 16 deletions(-) diff refs/heads/master refs/heads/vmm-vrp_intr_pending commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb commit + 10eeb8a0414ec927b6282473c50043a7027d6b41 blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46 blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e --- sys/arch/amd64/amd64/vmm_machdep.c +++ sys/arch/amd64/amd64/vmm_machdep.c @@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct vm_run_params * */ irq = vrp->vrp_irq; + if (vrp->vrp_intr_pending) + vcpu->vc_intr = 1; + else + vcpu->vc_intr = 0; + if (vrp->vrp_continue) { switch (vcpu->vc_gueststate.vg_exit_reason) { case VMX_EXIT_IO: @@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params * irq = vrp->vrp_irq; + if (vrp->vrp_intr_pending) + vcpu->vc_intr = 1; + else + vcpu->vc_intr = 0; + /* * If we are returning from userspace (vmd) because we exited * last time, fix up any needed vcpu state first. Which state blob - e9f8384cccfde33034d7ac9782610f93eb5dc640 blob + 88545b54b35dd60280ba87403e343db9463d7419 --- sys/arch/amd64/include/vmmvar.h +++ sys/arch/amd64/include/vmmvar.h @@ -456,6 +456,7 @@ struct vm_run_params { uint32_tvrp_vcpu_id; uint8_t vrp_continue; /* Continuing from an exit */ uint16_tvrp_irq;/* IRQ to inject */ + uint8_t vrp_intr_pending; /* Additional intrs pending? 
*/ /* Input/output parameter to VMM_IOC_RUN */ struct vm_exit *vrp_exit; /* updated exit data */ blob - 5f598bcc14af5115372d34a4176254d377aad91c blob + 447fc219adadf945de2bf25d5335993c2abdc26f --- usr.sbin/vmd/vm.c +++ usr.sbin/vmd/vm.c @@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg) } else vrp->vrp_irq = 0x; - /* Still more pending? */ - if (i8259_is_pending()) { - /* -* XXX can probably avoid ioctls here by providing intr -* in vrp -*/ - if (vcpu_pic_intr(vrp->vrp_vm_id, - vrp->vrp_vcpu_id, 1)) { - fatal("can't set INTR"); - } - }
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On Fri, Sep 01, 2023 at 03:50:31PM -0400, Dave Voutila wrote: > Now that my i8259 fix is in, it's safe to expand the testing pool for > this diff. (Without that fix, users would definitely hit the hung block > device issue testing this one.) Hoping that folks that run non-OpenBSD > guests or strange configurations can give it a spin. > > This change removes an ioctl(2) call from the vcpu thread hot path in > vmd. Instead of making that syscall to toggle on/off a pending interrupt > flag on the vcpu object in vmm(4), it adds a flag into the vm_run_params > struct sent with the VMM_IOC_RUN ioctl. The in-kernel vcpu runloop can > now toggle the pending interrupt state prior to vm entry. > > mbuhl@ and phessler@ have run this diff on their machines. Current > observations are reduced average network latency for guests. > > My terse measurements using the following btrace script show some > promising changes in terms of reducing ioctl syscalls: > > /* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */ > syscall:ioctl:entry > /arg1 == 2148292102/ > { > @total[tid] = count(); > @running[tid] = count(); > } > interval:hz:1 > { > print(@running); > clear(@running); > } > > Measuring from boot of an OpenBSD guest to after the guest finishes > relinking (based on my manual observation of the libevent thread > settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR > ioctls for a single guest: > > ## -current > @total[433237]: 1325100 # vcpu thread (!!) > @total[187073]: 80239# libevent thread > > ## with diff > @total[550347]: 42 # vcpu thread (!!) > @total[256550]: 86946# libevent thread > > Most of the VMM_IOC_INTR ioctls on the vcpu threads come from seabios > and the bootloader prodding some of the emulated hardware, but even > after the bootloader you'll see ~10-20k/s of ioctl's on -current > vs. ~4-5k/s with the diff. 
> > At steady-state, the vcpu thread no longer makes the VMM_IOC_INTR calls > at all and you should see the libevent thread calling it at a rate ~100/s > (probably hardclock?). *Without* the diff, I see a steady 650/s rate on > the vcpu thread at idle. *With* the diff, it's 0/s at idle. :) > > To test: > - rebuild & install new kernel > - copy/symlink vmmvar.h into /usr/include/machine/ > - rebuild & re-install vmd & vmctl > - reboot > > -dv > > ok mlarkin, thanks! > diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending > M sys/arch/amd64/amd64/vmm_machdep.c | 10+ 0- > M sys/arch/amd64/include/vmmvar.h | 1+ 0- > M usr.sbin/vmd/vm.c | 2+ 16- > > 3 files changed, 13 insertions(+), 16 deletions(-) > > diff refs/heads/master refs/heads/vmm-vrp_intr_pending > commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb > commit + 10eeb8a0414ec927b6282473c50043a7027d6b41 > blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46 > blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e > --- sys/arch/amd64/amd64/vmm_machdep.c > +++ sys/arch/amd64/amd64/vmm_machdep.c > @@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct vm_run_params * >*/ > irq = vrp->vrp_irq; > > + if (vrp->vrp_intr_pending) > + vcpu->vc_intr = 1; > + else > + vcpu->vc_intr = 0; > + > if (vrp->vrp_continue) { > switch (vcpu->vc_gueststate.vg_exit_reason) { > case VMX_EXIT_IO: > @@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params * > > irq = vrp->vrp_irq; > > + if (vrp->vrp_intr_pending) > + vcpu->vc_intr = 1; > + else > + vcpu->vc_intr = 0; > + > /* >* If we are returning from userspace (vmd) because we exited >* last time, fix up any needed vcpu state first. 
Which state > blob - e9f8384cccfde33034d7ac9782610f93eb5dc640 > blob + 88545b54b35dd60280ba87403e343db9463d7419 > --- sys/arch/amd64/include/vmmvar.h > +++ sys/arch/amd64/include/vmmvar.h > @@ -456,6 +456,7 @@ struct vm_run_params { > uint32_tvrp_vcpu_id; > uint8_t vrp_continue; /* Continuing from an exit */ > uint16_tvrp_irq;/* IRQ to inject */ > + uint8_t vrp_intr_pending; /* Additional intrs pending? */ > > /* Input/output parameter to VMM_IOC_RUN */ > struct vm_exit *vrp_exit; /* updated exit data */ > blob - 5f598bcc14af5115372d34a4176254d377aad91c > blob + 447fc219adadf945de2bf25d5335993c2abdc26f > --- usr.sbin/vmd/vm.c > +++ usr.sbin/vmd/vm.c > @@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg) > } else > vrp->vrp_irq = 0x; > > - /* Still more pending? */ > - if (i8259_is_pending()) { > - /* > - * XXX can probably avoid ioctls here by providing intr > - * in vrp > - */ > - if (vcpu_pic_intr(vrp->vrp_vm_id, > - vrp->vrp_vcpu_id, 1)) { > -
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
On Sat, Sep 02, 2023 at 12:59:51PM +0200, Peter Hessler wrote:
> I just upgraded to -current and didn't have this patch in for a little bit, and woof that was super noticeable. Still works for my big VM host.
>
> OK

No issues here either with just one VM running for testing some software, so mostly idle.
Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
I just upgraded to -current and didn't have this patch in for a little bit, and woof that was super noticable. Still works for my big VM host. OK On 2023 Sep 01 (Fri) at 15:50:31 -0400 (-0400), Dave Voutila wrote: :Now that my i8259 fix is in, it's safe to expand the testing pool for :this diff. (Without that fix, users would definitely hit the hung block :device issue testing this one.) Hoping that folks that run non-OpenBSD :guests or strange configurations can give it a spin. : :This change removes an ioctl(2) call from the vcpu thread hot path in :vmd. Instead of making that syscall to toggle on/off a pending interrupt :flag on the vcpu object in vmm(4), it adds a flag into the vm_run_params :struct sent with the VMM_IOC_RUN ioctl. The in-kernel vcpu runloop can :now toggle the pending interrupt state prior to vm entry. : :mbuhl@ and phessler@ have run this diff on their machines. Current :observations are reduced average network latency for guests. : :My terse measurements using the following btrace script show some :promising changes in terms of reducing ioctl syscalls: : : /* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */ : syscall:ioctl:entry : /arg1 == 2148292102/ : { :@total[tid] = count(); :@running[tid] = count(); : } : interval:hz:1 : { :print(@running); :clear(@running); : } : :Measuring from boot of an OpenBSD guest to after the guest finishes :relinking (based on my manual observation of the libevent thread :settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR :ioctls for a single guest: : :## -current :@total[433237]: 1325100 # vcpu thread (!!) :@total[187073]: 80239# libevent thread : :## with diff :@total[550347]: 42 # vcpu thread (!!) :@total[256550]: 86946# libevent thread : :Most of the VMM_IOC_INTR ioctls on the vcpu threads come from seabios :and the bootloader prodding some of the emulated hardware, but even :after the bootloader you'll see ~10-20k/s of ioctl's on -current :vs. ~4-5k/s with the diff. 
: :At steady-state, the vcpu thread no longer makes the VMM_IOC_INTR calls :at all and you should see the libevent thread calling it at a rate ~100/s :(probably hardclock?). *Without* the diff, I see a steady 650/s rate on :the vcpu thread at idle. *With* the diff, it's 0/s at idle. :) : :To test: :- rebuild & install new kernel :- copy/symlink vmmvar.h into /usr/include/machine/ :- rebuild & re-install vmd & vmctl :- reboot : :-dv : : :diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending : M sys/arch/amd64/amd64/vmm_machdep.c | 10+ 0- : M sys/arch/amd64/include/vmmvar.h | 1+ 0- : M usr.sbin/vmd/vm.c | 2+ 16- : :3 files changed, 13 insertions(+), 16 deletions(-) : :diff refs/heads/master refs/heads/vmm-vrp_intr_pending :commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb :commit + 10eeb8a0414ec927b6282473c50043a7027d6b41 :blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46 :blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e :--- sys/arch/amd64/amd64/vmm_machdep.c :+++ sys/arch/amd64/amd64/vmm_machdep.c :@@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct vm_run_params * :*/ : irq = vrp->vrp_irq; : :+ if (vrp->vrp_intr_pending) :+ vcpu->vc_intr = 1; :+ else :+ vcpu->vc_intr = 0; :+ : if (vrp->vrp_continue) { : switch (vcpu->vc_gueststate.vg_exit_reason) { : case VMX_EXIT_IO: :@@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params * : : irq = vrp->vrp_irq; : :+ if (vrp->vrp_intr_pending) :+ vcpu->vc_intr = 1; :+ else :+ vcpu->vc_intr = 0; :+ : /* :* If we are returning from userspace (vmd) because we exited :* last time, fix up any needed vcpu state first. 
Which state :blob - e9f8384cccfde33034d7ac9782610f93eb5dc640 :blob + 88545b54b35dd60280ba87403e343db9463d7419 :--- sys/arch/amd64/include/vmmvar.h :+++ sys/arch/amd64/include/vmmvar.h :@@ -456,6 +456,7 @@ struct vm_run_params { : uint32_tvrp_vcpu_id; : uint8_t vrp_continue; /* Continuing from an exit */ : uint16_tvrp_irq;/* IRQ to inject */ :+ uint8_t vrp_intr_pending; /* Additional intrs pending? */ : : /* Input/output parameter to VMM_IOC_RUN */ : struct vm_exit *vrp_exit; /* updated exit data */ :blob - 5f598bcc14af5115372d34a4176254d377aad91c :blob + 447fc219adadf945de2bf25d5335993c2abdc26f :--- usr.sbin/vmd/vm.c :+++ usr.sbin/vmd/vm.c :@@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg) : } else : vrp->vrp_irq = 0x; : :- /* Still more pending? */ :- if (i8259_is_pending()) { :- /* :- * XXX can probably avoid ioctls here by providing intr :- * in vrp :- */ :- if
vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
Now that my i8259 fix is in, it's safe to expand the testing pool for this diff. (Without that fix, users would definitely hit the hung block device issue testing this one.) Hoping that folks that run non-OpenBSD guests or strange configurations can give it a spin.

This change removes an ioctl(2) call from the vcpu thread hot path in vmd. Instead of making that syscall to toggle on/off a pending interrupt flag on the vcpu object in vmm(4), it adds a flag into the vm_run_params struct sent with the VMM_IOC_RUN ioctl. The in-kernel vcpu runloop can now toggle the pending interrupt state prior to vm entry.

mbuhl@ and phessler@ have run this diff on their machines. Current observations are reduced average network latency for guests.

My terse measurements using the following btrace script show some promising changes in terms of reducing ioctl syscalls:

  /* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */
  syscall:ioctl:entry
  /arg1 == 2148292102/
  {
      @total[tid] = count();
      @running[tid] = count();
  }
  interval:hz:1
  {
      print(@running);
      clear(@running);
  }

Measuring from boot of an OpenBSD guest to after the guest finishes relinking (based on my manual observation of the libevent thread settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR ioctls for a single guest:

  ## -current
  @total[433237]: 1325100  # vcpu thread (!!)
  @total[187073]: 80239    # libevent thread

  ## with diff
  @total[550347]: 42       # vcpu thread (!!)
  @total[256550]: 86946    # libevent thread

Most of the VMM_IOC_INTR ioctls on the vcpu threads come from seabios and the bootloader prodding some of the emulated hardware, but even after the bootloader you'll see ~10-20k/s of ioctl's on -current vs. ~4-5k/s with the diff.

At steady-state, the vcpu thread no longer makes the VMM_IOC_INTR calls at all and you should see the libevent thread calling it at a rate ~100/s (probably hardclock?). *Without* the diff, I see a steady 650/s rate on the vcpu thread at idle. *With* the diff, it's 0/s at idle. :)

To test:
- rebuild & install new kernel
- copy/symlink vmmvar.h into /usr/include/machine/
- rebuild & re-install vmd & vmctl
- reboot

-dv

diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending
 M  sys/arch/amd64/amd64/vmm_machdep.c  |  10+  0-
 M  sys/arch/amd64/include/vmmvar.h     |   1+  0-
 M  usr.sbin/vmd/vm.c                   |   2+ 16-

3 files changed, 13 insertions(+), 16 deletions(-)

diff refs/heads/master refs/heads/vmm-vrp_intr_pending
commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb
commit + 10eeb8a0414ec927b6282473c50043a7027d6b41
blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46
blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e
--- sys/arch/amd64/amd64/vmm_machdep.c
+++ sys/arch/amd64/amd64/vmm_machdep.c
@@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct vm_run_params *
 	 */
 	irq = vrp->vrp_irq;
 
+	if (vrp->vrp_intr_pending)
+		vcpu->vc_intr = 1;
+	else
+		vcpu->vc_intr = 0;
+
 	if (vrp->vrp_continue) {
 		switch (vcpu->vc_gueststate.vg_exit_reason) {
 		case VMX_EXIT_IO:
@@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params *
 
 	irq = vrp->vrp_irq;
 
+	if (vrp->vrp_intr_pending)
+		vcpu->vc_intr = 1;
+	else
+		vcpu->vc_intr = 0;
+
 	/*
 	 * If we are returning from userspace (vmd) because we exited
 	 * last time, fix up any needed vcpu state first. Which state
blob - e9f8384cccfde33034d7ac9782610f93eb5dc640
blob + 88545b54b35dd60280ba87403e343db9463d7419
--- sys/arch/amd64/include/vmmvar.h
+++ sys/arch/amd64/include/vmmvar.h
@@ -456,6 +456,7 @@ struct vm_run_params {
 	uint32_t	vrp_vcpu_id;
 	uint8_t		vrp_continue;		/* Continuing from an exit */
 	uint16_t	vrp_irq;		/* IRQ to inject */
+	uint8_t		vrp_intr_pending;	/* Additional intrs pending? */
 
 	/* Input/output parameter to VMM_IOC_RUN */
 	struct vm_exit	*vrp_exit;		/* updated exit data */
blob - 5f598bcc14af5115372d34a4176254d377aad91c
blob + 447fc219adadf945de2bf25d5335993c2abdc26f
--- usr.sbin/vmd/vm.c
+++ usr.sbin/vmd/vm.c
@@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg)
 		} else
 			vrp->vrp_irq = 0xFFFF;
 
-		/* Still more pending? */
-		if (i8259_is_pending()) {
-			/*
-			 * XXX can probably avoid ioctls here by providing intr
-			 * in vrp
-			 */
-			if (vcpu_pic_intr(vrp->vrp_vm_id,
-			    vrp->vrp_vcpu_id, 1)) {
-				fatal("can't set INTR");
-			}
-		} else {
-			if (vcpu_pic_intr(vrp->vrp_vm_id,
-			    vrp->vrp_vcpu_id, 0)) {
-