Hi, Sorry for the delay in response. Just landed yesterday from LPC.
Others have already commented on the naming, and I would agree that
"paravirt" is really misleading. I cannot say that the previous
"cpu-avoid" one was perfect, but it was much better.
It was my suggestion to switch names. "cpu-avoid" is definitely a
no-go, because it doesn't explain anything and only confuses. I
suggested 'paravirt' (notice - only suggested) because the patch
series is mainly discussing paravirtualized VMs. But now I'm not even
sure that the idea of the series is:
1. Applicable only to paravirtualized VMs; and
2. That preemption and rescheduling throttling require a new in-kernel
concept beyond nohz, isolcpus, cgroups and similar.
Shrikanth, can you please clarify the scope of the new feature? Would
it be useful for non-paravirtualized VMs, for example? Any other
task-to-CPU binding problems?
The current scope of the feature is virtualized environments, where the
idea is to do cooperative folding in each VM based on a hint (either a
HW hint or steal time).
If you look at it from a macro level, this is a framework which allows
one to avoid some vCPUs (in the guest) to achieve better throughput or
latency. So one could come up with more use cases even in
non-paravirtualized VMs. For example, one crazy idea is to avoid using
SMT siblings when system utilization is low, to achieve a higher IPC
(instructions per cycle) value.
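To illustrate the steal-time case, the per-vCPU decision could look roughly
like the sketch below. This is only an illustration of the idea, not code from
the series; should_avoid_vcpu(), last_steal and avoid_steal_threshold are
made-up names, and the threshold/units are arbitrary.

#include <linux/kernel_stat.h>
#include <linux/percpu.h>

/* Hypothetical tunable and state, for illustration only. */
static u64 avoid_steal_threshold;
static DEFINE_PER_CPU(u64, last_steal);

static bool should_avoid_vcpu(int cpu)
{
	/* Cumulative steal time accounted for this vCPU. */
	u64 steal = kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL];
	u64 delta = steal - per_cpu(last_steal, cpu);

	per_cpu(last_steal, cpu) = steal;

	/* Lots of recent steal time: the host keeps preempting this vCPU. */
	return delta > avoid_steal_threshold;
}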
In previous rounds you tried to implement the same with cgroups, as
far as I understood. Can you discuss that? What exactly can't be done
with the existing kernel APIs?
Thanks,
Yury
We discussed this in Sched-MC this year.
https://youtu.be/zf-MBoUIz1Q?t=8581
Currently explored options:
1. CPU hotplug - slow; some efforts are underway to speed it up.
2. Creating isolated cpusets - faster, but still involves sched domain rebuilds.
The reason why both of them won't work is that they break user affinities in
the guest. i.e. the guest user can do "taskset -c <some_vcpus> <workload>";
when the last vCPU in that list goes offline (guest vCPU hotplug), the task's
affinity mask is reset so the workload can run on any online vCPU, and the
mask is not set back to its earlier value afterwards. That is okay for hotplug
or isolated cpusets, since it is driven by the user in the guest, so the user
is aware of it.
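To spell out the "affinity mask is reset" part: when the last allowed CPU of a
task goes away, the guest scheduler's fallback path widens the task's mask,
roughly like the sketch below. This is a simplification for illustration only;
the real logic lives in select_fallback_rq() in kernel/sched/core.c.

static int fallback_cpu_sketch(struct task_struct *p)
{
	int cpu;

	/* Try any online CPU the task is still allowed on. */
	for_each_cpu(cpu, p->cpus_ptr) {
		if (cpu_online(cpu))
			return cpu;
	}

	/*
	 * Nothing left: widen the affinity mask. The user-set mask is
	 * lost and is not restored when the vCPU comes back online.
	 */
	do_set_cpus_allowed(p, cpu_possible_mask);
	return cpumask_any(cpu_online_mask);
}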
Whereas here, the change is driven by the system rather than the user in the
guest, so it cannot break user-space affinities.
So we need a new interface to drive this. I think it is better if it is a
non-cgroup-based framework, since cgroups are usually user driven
(correct me if I am wrong).
PS:
There was some confusion around this affinity breaking. Note that it is the
guest vCPU being marked and the guest vCPU being hotplugged, and the
task-affined workload was running in the guest. Host CPUs (pCPUs) are not
hotplugged.
---
I had a hallway discussion with Vincent; the idea is to use the push framework
bits, set the CPU capacity to 1 (the lowest value, treated as a special value),
and use a static key check so this is done only when the HW says to do so.
Something like this (keeping the name 'paravirt' for illustration):

static inline bool cpu_paravirt(int cpu)
{
	/*
	 * Capacity 1 is reserved as the "avoid this CPU" marker; only
	 * check it when the framework has been enabled via static key.
	 */
	if (static_branch_unlikely(&cpu_paravirt_framework))
		return arch_scale_cpu_capacity(cpu) == 1;

	return false;
}
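For what it is worth, a consumer of that helper could look roughly like this
(illustrative sketch only; pick_non_paravirt_cpu() is a made-up name and not
part of the series):

static int pick_non_paravirt_cpu(struct task_struct *p)
{
	int cpu, fallback = -1;

	for_each_cpu(cpu, p->cpus_ptr) {
		if (fallback < 0)
			fallback = cpu;
		/* Prefer CPUs that are not marked to be avoided. */
		if (!cpu_paravirt(cpu))
			return cpu;
	}

	/* All allowed CPUs are marked; fall back rather than starve the task. */
	return fallback;
}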
The rest of the bits remain the same. I found an issue with the current series
where setting affinity goes wrong after a CPU is marked paravirt; I will fix it
in the next version. Will do some more testing and send the next version in
2026.
Happy Holidays!