On Tue, Nov 08, 2022 at 01:09:27AM +,
"Huang, Kai" wrote:
> On Mon, 2022-11-07 at 13:46 -0800, Isaku Yamahata wrote:
> > > On Fri, Nov 04, 2022, Isaku Yamahata wrote:
> > > > Thanks for the patch series. I tried the rebased TDX KVM patch series
> > > > and it worked.
> > > > Since cpu offline needs
Hi Marc,
On Mon, Nov 7, 2022 at 1:16 AM Marc Zyngier wrote:
>
> Allow userspace to write ID_AA64DFR0_EL1, on the condition that only
> the PMUver field can be altered, and that it is at most the value that
> was initially computed for the guest.
>
> Signed-off-by: Marc Zyngier
> ---
> arch/arm64/kvm/sys_r
Hi Marc,
> > BTW, if we have no intention of supporting a mix of vCPUs with and
> > without PMU, I think it would be nice if we have a clear comment on
> > that in the code. Or I'm hoping to disallow it if possible though.
>
> I'm not sure we're in a position to do this right now. The current API
In the dirty ring case, we rely on the vcpu exiting due to a full dirty
ring. On an ARM64 system, there are 4096 host pages when the host
page size is 64KB. In this case, the vcpu never exits due to a full
dirty ring. A similar case is a 4KB page size on the host and a 64KB
page size in the guest. The vcpu
There are two states which need to be cleared before the next mode is
executed. Otherwise, we hit the failure indicated by the following
messages.
- The variable 'dirty_ring_vcpu_ring_full' is shared by the main and
  vcpu threads. It indicates whether the vcpu exited due to a full ring
  buffer. The value can be
In vcpu_map_dirty_ring(), the guest's page size is used to figure out
the offset in the virtual area. It works fine when the host and guest
have the same page size. However, it fails when the page sizes on host
and guest are different on arm64, as the error messages below indicate.
# ./dirty_log_t
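For reference, the fix boils down to computing the mmap offset from the
host page size rather than the guest's. A rough sketch (function and
variable names are illustrative, not the exact selftest code):

#include <sys/mman.h>
#include <unistd.h>
#include <linux/kvm.h>

/* The offset into the vcpu fd is expressed in *host* pages, so use
 * getpagesize() rather than the guest's vm->page_size. */
static void *map_dirty_ring(int vcpu_fd, size_t ring_bytes)
{
	off_t offset = (off_t)getpagesize() * KVM_DIRTY_LOG_PAGE_OFFSET;

	return mmap(NULL, ring_bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
		    vcpu_fd, offset);
}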
Enable ring-based dirty memory tracking on arm64 by selecting
CONFIG_HAVE_KVM_DIRTY_{RING_ACQ_REL, RING_WITH_BITMAP} and providing
the ring buffer's physical page offset (KVM_DIRTY_LOG_PAGE_OFFSET).
In addition, the ARM64-specific kvm_arch_allow_write_without_running_vcpu()
is added to override the generi
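The override is expected to be a one-liner — roughly (sketch, based on
the description above; no-vcpu writes only make sense for the VGIC/ITS):

bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
{
	/* Only the VGIC/ITS dirties memory without a running vcpu */
	return vgic_has_its(kvm);
}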
ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
enabled. This conflicts with the fact that ring-based dirty page
tracking always requires a running VCPU context.
Introduce a new flavor of dirty ring that requires the use of both VCPU
dirty rings and a dirty bitmap. The expectation
Not all architectures need to override the function as ARM64 does. Move
its declaration to kvm_dirty_ring.h to avoid the following compile
warning on ARM64 when the feature is enabled.
arch/arm64/kvm/../../../virt/kvm/dirty_ring.c:14:12:\
warning: no previous prototype for 'kvm_cpu_dirt
The VCPU isn't expected to be runnable when the dirty ring becomes soft
full, until the dirty pages are harvested and the dirty ring is reset
from userspace. So there is a check at each guest entry to see if the
dirty ring is soft full or not. The VCPU is stopped from running if
its dirty ring
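The entry-time check is expected to look roughly like this (sketch; it
mirrors the x86 pattern, not necessarily the exact final code):

	if (kvm_check_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu) &&
	    kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
		/* Leave the request pending until userspace resets the ring */
		kvm_make_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
		return 0;	/* bounce to userspace for harvesting */
	}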
This series enables the ring-based dirty memory tracking for ARM64.
The feature has been available and enabled on x86 for a while. It
is beneficial when the number of dirty pages is small in a checkpointing
system or live migration scenario. More details can be found in commit
fb04a1eddb1a ("KVM: X86: I
Hi Oliver,
On 11/8/22 9:13 AM, Oliver Upton wrote:
On Tue, Nov 08, 2022 at 08:44:52AM +0800, Gavin Shan wrote:
Frankly, I don't expect the capability to be disabled. Similar to
KVM_CAP_DIRTY_LOG_RING
or KVM_CAP_DIRTY_LOG_RING_ACQ_REL, it would be a one-shot capability and only
enablement is
allo
On Tue, Nov 08, 2022 at 08:44:52AM +0800, Gavin Shan wrote:
> Frankly, I don't expect the capability to be disabled. Similar to
> KVM_CAP_DIRTY_LOG_RING
> or KVM_CAP_DIRTY_LOG_RING_ACQ_REL, it would be a one-shot capability and only
> enablement is
> allowed. The disablement was suggested by Oliver
Hi Sean,
On 11/8/22 12:05 AM, Sean Christopherson wrote:
On Sat, Nov 05, 2022, Gavin Shan wrote:
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index fecbb7d75ad2..758679724447 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -21,6 +21,16 @@ u32 kvm_dirty_ring_get_r
Hi Marc,
On 11/7/22 7:33 PM, Marc Zyngier wrote:
On Mon, 07 Nov 2022 10:45:34 +,
Gavin Shan wrote:
On 11/5/22 7:40 AM, Gavin Shan wrote:
ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
enabled. This conflicts with the fact that ring-based dirty page
tracking always require
The stage-2 map walker has been made parallel-aware, and as such can be
called while only holding the read side of the MMU lock. Rip out the
conditional locking in user_mem_abort() and instead grab the read lock.
Continue to take the write lock from other callsites to
kvm_pgtable_stage2_map().
Sig
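The resulting fault path has a simple shape — roughly (sketch of the
locking change, not the verbatim diff; the flag name follows the series):

	read_lock(&kvm->mmu_lock);
	/* shared walks must tell the walker they run under the read lock */
	ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
				     __pfn_to_phys(pfn), prot, memcache,
				     KVM_PGTABLE_WALK_SHARED);
	read_unlock(&kvm->mmu_lock);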
stage2_map_walker_try_leaf() and friends now handle stage-2 PTEs
generically, and perform the correct flush when a table PTE is removed.
Additionally, they've been made parallel-aware, using an atomic break
to take ownership of the PTE.
Stop clearing the PTE in the pre-order callback and instead l
Convert stage2_map_walker_try_leaf() to use the new break-before-make
helpers, thereby making the handler parallel-aware. As before, avoid the
break-before-make if recreating the existing mapping. Additionally,
retry execution if another vCPU thread is modifying the same PTE.
Signed-off-by: Oliver
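The 'break' side of the sequence is an atomic exchange; if it fails,
another walker owns the PTE and the fault is retried. A condensed sketch
(the real helper also handles TLB invalidation and refcounting):

static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx)
{
	/* Take exclusive ownership of the PTE, or lose the race */
	if (cmpxchg(ctx->ptep, ctx->old, KVM_INVALID_PTE_LOCKED) != ctx->old)
		return false;	/* caller bails out with -EAGAIN */

	/* ... TLBI and accounting elided ... */
	return true;
}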
In order to service stage-2 faults in parallel, stage-2 table walkers
must take exclusive ownership of the PTE being worked on. An additional
requirement of the architecture is that software must perform a
'break-before-make' operation when changing the block size used for
mapping memory.
Roll the
The stage2 attr walker is already used for parallel walks. Since commit
f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission relaxation
during dirty logging"), KVM acquires the read lock when
write-unprotecting a PTE. However, the walker only uses a simple store
to update the PTE. This is
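The fix is to make the update atomic so it fails gracefully on a
concurrent change — roughly (sketch):

	/* A plain store could silently overwrite a racing walker's update;
	 * cmpxchg() detects the race instead. */
	if (cmpxchg(ctx->ptep, ctx->old, pte) != ctx->old)
		return false;	/* PTE changed under us; retry the walk */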
Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
release the RCU read lock when traversing the page tables. Defer the
freeing of table memory to an RCU callback. Indirect the calls into RCU
and provide stubs for hypervisor code, as RCU is not available in such a
context.
The
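The overall shape, per the description (sketch; 'page' and the callback
name are illustrative, and the begin/end helpers are stubbed at EL2):

	kvm_pgtable_walk_begin();	/* rcu_read_lock() outside the hyp */
	ret = _kvm_pgtable_walk(pgt, addr, size, walker);
	kvm_pgtable_walk_end();		/* rcu_read_unlock() */

	/* Unlinked table pages are only freed once all readers are done: */
	call_rcu(&page->rcu_head, stage2_free_table_rcu_cb);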
Create a helper to initialize a table and directly call
smp_store_release() to install it (for now). Prepare for a subsequent
change that generalizes PTE writes with a helper.
Signed-off-by: Oliver Upton
---
arch/arm64/kvm/hyp/pgtable.c | 20 ++--
1 file changed, 10 insertions(+)
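The helper's contract is "build fully, then publish with release
semantics" — roughly (sketch; the wrapper name is illustrative):

static void stage2_install_table(kvm_pte_t *ptep, kvm_pte_t *childp,
				 struct kvm_pgtable_mm_ops *mm_ops)
{
	kvm_pte_t pte = kvm_init_table_pte(childp, mm_ops);

	/* Make the initialized table visible before the PTE pointing to it */
	smp_store_release(ptep, pte);
}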
Use an opaque type for pteps and require visitors explicitly dereference
the pointer before using. Protecting page table memory with RCU requires
that KVM dereferences RCU-annotated pointers before using. However, RCU
is not available for use in the nVHE hypervisor and the opaque type can
be condit
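Conceptually, the conditional typing looks like this (sketch; the exact
annotations may differ from the final code):

#ifdef __KVM_NVHE_HYPERVISOR__
typedef kvm_pte_t *kvm_pteref_t;	/* no RCU at EL2: plain pointer */

static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
{
	return pteref;
}
#else
typedef kvm_pte_t __rcu *kvm_pteref_t;

static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
{
	/* exclusive (non-shared) walkers hold the write lock instead of RCU */
	return rcu_dereference_check(pteref, !shared);
}
#endif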
The break-before-make sequence is a bit annoying as it opens a window
wherein memory is unmapped from the guest. KVM should replace the PTE
as quickly as possible and avoid unnecessary work in between.
Presently, the stage-2 map walker tears down a removed table before
installing a block mapping w
As a prerequisite for getting visitors off of struct kvm_pgtable, pass
mm_ops through the visitor context.
No functional change intended.
Signed-off-by: Oliver Upton
---
arch/arm64/include/asm/kvm_pgtable.h | 1 +
arch/arm64/kvm/hyp/nvhe/setup.c | 3 +-
arch/arm64/kvm/hyp/pgtable.c
A subsequent change to KVM will move the tear down of an unlinked
stage-2 subtree out of the critical path of the break-before-make
sequence.
Introduce a new helper for tearing down unlinked stage-2 subtrees.
Leverage the existing stage-2 free walkers to do so, with a deep call
into __kvm_pgtable_
In order to tear down page tables from outside the context of
kvm_pgtable (such as an RCU callback), stop passing a pointer through
kvm_pgtable_walk_data.
No functional change intended.
Signed-off-by: Oliver Upton
---
arch/arm64/kvm/hyp/pgtable.c | 18 +-
1 file changed, 5 inser
Passing new arguments by value to the visitor callbacks is extremely
inflexible for stuffing new parameters used by only some of the
visitors. Use a context structure instead and pass the pointer through
to the visitor callback.
While at it, redefine the 'flags' parameter to the visitor to contain
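The context ends up carrying everything a visitor may need — along these
lines (sketch; the field set approximates the series):

struct kvm_pgtable_visit_ctx {
	kvm_pte_t			*ptep;
	kvm_pte_t			old;	/* snapshot of the PTE */
	void				*arg;
	struct kvm_pgtable_mm_ops	*mm_ops;
	u64				addr;
	u64				end;
	u32				level;
	enum kvm_pgtable_walk_flags	flags;
};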
Rather than reading the ptep all over the shop, read the ptep once from
__kvm_pgtable_visit() and stick it in the visitor context. Reread the
ptep after visiting a leaf in case the callback installed a new table
underneath.
No functional change intended.
Signed-off-by: Oliver Upton
---
arch/arm
Presently KVM only takes a read lock for stage 2 faults if it believes
the fault can be fixed by relaxing permissions on a PTE (write unprotect
for dirty logging). Otherwise, stage 2 faults grab the write lock, which
predictably can pile up all the vCPUs in a sufficiently large VM.
Like the TDP MM
On Fri, Nov 04, 2022 at 08:27:14PM +,
Sean Christopherson wrote:
> On Fri, Nov 04, 2022, Isaku Yamahata wrote:
> > Thanks for the patch series. I tried the rebased TDX KVM patch series
> > and it worked.
> > Since cpu offline needs to be rejected in some cases(To keep at least one
> > cpu
> > on
On 07/11/2022 18:02, Punit Agrawal wrote:
Usama Arif writes:
Implement the service call for configuring a shared structure between a
VCPU and the hypervisor, through which the hypervisor can tell whether the
VCPU is running or not.
The preempted field is zero if the VCPU is not preempted.
Any ot
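The shared structure itself is tiny — along these lines (sketch from the
description; the padding/alignment is illustrative):

struct pvlock_vcpu_state {
	__le64 preempted;	/* zero while the VCPU is running */
	u8 reserved[56];	/* pad the structure to 64 bytes */
};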
On 07/11/2022 17:58, Punit Agrawal wrote:
Usama Arif writes:
Add a new SMCCC compatible hypercalls for PV lock features:
ARM_SMCCC_KVM_FUNC_PV_LOCK: 0xC602
Also add the header file which defines the ABI for the paravirtualized
lock features we're about to add.
Signed-off-by: Zeng
On 07/11/2022 17:56, Punit Agrawal wrote:
Hi Usama,
Usama Arif writes:
Introduce a paravirtualization interface for KVM/arm64 to determine whether
the VCPU is currently running or not.
The PV lock structure of the guest is allocated by user space.
A hypercall interface is provided for the g
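On the guest side, the check then reduces to reading that field (sketch;
the per-cpu plumbing and names are illustrative):

static DEFINE_PER_CPU(struct pvlock_vcpu_state *, pvlock_state);

static bool pv_vcpu_is_preempted(int cpu)
{
	struct pvlock_vcpu_state *st = per_cpu(pvlock_state, cpu);

	/* zero means "running"; anything else means preempted */
	return st && le64_to_cpu(READ_ONCE(st->preempted));
}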
On Thu, Nov 03 2022, Peter Collingbourne wrote:
> Document both the restriction on VM_MTE_ALLOWED mappings and
> the relaxation for shared mappings.
>
> Signed-off-by: Peter Collingbourne
> Acked-by: Catalin Marinas
> ---
> Documentation/virt/kvm/api.rst | 5 +++--
> 1 file changed, 3 insertio
On Thu, Nov 03 2022, Peter Collingbourne wrote:
> Certain VMMs such as crosvm have features (e.g. sandboxing) that depend
> on being able to map guest memory as MAP_SHARED. The current restriction
> on sharing MAP_SHARED pages with the guest is preventing the use of
> those features with MTE. Now
On Thu, Nov 03 2022, Peter Collingbourne wrote:
> Previously we allowed creating a memslot containing a private mapping that
> was not VM_MTE_ALLOWED, but would later reject KVM_RUN with -EFAULT. Now
> we reject the memory region at memslot creation time.
>
> Since this is a minor tweak to the AB
On Thu, Nov 03 2022, Peter Collingbourne wrote:
> From: Catalin Marinas
>
> Initialising the tags and setting PG_mte_tagged flag for a page can race
> between multiple set_pte_at() on shared pages or setting the stage 2 pte
> via user_mem_abort(). Introduce a new PG_mte_lock flag as PG_arch_3 an
On Thu, Nov 03 2022, Peter Collingbourne wrote:
> From: Catalin Marinas
>
> Currently sanitise_mte_tags() checks if it's an online page before
> attempting to sanitise the tags. Such detection should be done in the
> caller via the VM_MTE_ALLOWED vma flag. Since kvm_set_spte_gfn() does
> not hav
On Sat, Nov 05, 2022, Gavin Shan wrote:
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index fecbb7d75ad2..758679724447 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -21,6 +21,16 @@ u32 kvm_dirty_ring_get_rsvd_entries(void)
> return KVM_DIRTY_RING_RSV
On Mon, 07 Nov 2022 14:47:10 +,
Leo Yan wrote:
>
> On Sat, Nov 05, 2022 at 01:28:40PM +, Marc Zyngier wrote:
>
> [...]
>
> > > Before:
> > >
> > > # perf kvm stat report --vcpu 27
> > >
> > > Analyze events for all VMs, VCPU 27:
> > >
> > >        VM-EXIT    Samples    Samp
On Mon, 07 Nov 2022 14:59:41 +,
Peter Xu wrote:
>
> On Mon, Nov 07, 2022 at 09:21:35AM +, Marc Zyngier wrote:
> > On Sun, 06 Nov 2022 21:06:43 +,
> > Peter Xu wrote:
> > >
> > > It's definitely not the case for x86, but if it's true for
> > > arm64, then could the DMA be spread acro
On Fri, Nov 04, 2022 at 10:55:03AM -0500, Rob Herring wrote:
> Convert all the SPE register defines to automatic generation. No
> functional changes.
>
> New registers and fields for SPEv1.2 are added with the conversion.
>
> Some of the PMBSR MSS field defines are kept as the automatic generati
On Mon, Nov 07, 2022 at 09:21:35AM +, Marc Zyngier wrote:
> On Sun, 06 Nov 2022 21:06:43 +,
> Peter Xu wrote:
> >
> > On Sun, Nov 06, 2022 at 08:12:22PM +, Marc Zyngier wrote:
> > > Hi Peter,
> > >
> > > On Sun, 06 Nov 2022 16:22:29 +,
> > > Peter Xu wrote:
> > > >
> > > > Hi,
On Sat, Nov 05, 2022 at 01:28:40PM +, Marc Zyngier wrote:
[...]
> > Before:
> >
> > # perf kvm stat report --vcpu 27
> >
> > Analyze events for all VMs, VCPU 27:
> >
> >        VM-EXIT    Samples    Samples%    Time%    Min Time    Max Time    Avg time
> >
> > Total
On Mon, Nov 07, 2022 at 09:38:24AM +, Marc Zyngier wrote:
> Peter said there is an undefined behaviour. I want to understand
> whether this is the case or not. QEMU is only one of the users of this
> stuff, as all the vendors have their own custom VMM, and they do
> things in funny ways.
It's
On 06/11/2022 16:35, Marc Zyngier wrote:
On Fri, 04 Nov 2022 06:20:59 +,
Usama Arif wrote:
This patchset adds support for vcpu_is_preempted in arm64, which
allows the guest to check if a vcpu was scheduled out, which is
useful to know in case it was holding a lock. vcpu_is_preempted can
On Mon, 07 Nov 2022 10:45:34 +,
Gavin Shan wrote:
>
> Hi Marc, Peter, Oliver and Sean,
>
> On 11/5/22 7:40 AM, Gavin Shan wrote:
> > ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> > enabled. This conflicts with the fact that ring-based dirty page
> > tracking always requir
Hi Marc,
On 11/7/22 5:47 PM, Marc Zyngier wrote:
On Sun, 06 Nov 2022 21:46:19 +,
Gavin Shan wrote:
On 11/6/22 11:50 PM, Marc Zyngier wrote:
On Fri, 04 Nov 2022 23:40:46 +,
Gavin Shan wrote:
Enable ring-based dirty memory tracking on arm64 by selecting
CONFIG_HAVE_KVM_DIRTY_{RING_AC
Hi Marc, Peter, Oliver and Sean,
On 11/5/22 7:40 AM, Gavin Shan wrote:
ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
enabled. This conflicts with the fact that ring-based dirty page
tracking always requires a running VCPU context.
Introduce a new flavor of dirty ring that requ
On Sun, 06 Nov 2022 21:46:19 +,
Gavin Shan wrote:
>
> Hi Marc,
>
> On 11/6/22 11:50 PM, Marc Zyngier wrote:
> > On Fri, 04 Nov 2022 23:40:46 +,
> > Gavin Shan wrote:
> >>
> >> Enable ring-based dirty memory tracking on arm64 by selecting
> >> CONFIG_HAVE_KVM_DIRTY_{RING_ACQ_REL, RING_W
On Sun, 06 Nov 2022 21:40:49 +,
Gavin Shan wrote:
>
> Hi Marc,
>
> On 11/6/22 11:43 PM, Marc Zyngier wrote:
> > On Fri, 04 Nov 2022 23:40:45 +,
> > Gavin Shan wrote:
> >>
> >> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> >> enabled. It's conflicting with tha
On Sun, 06 Nov 2022 21:23:13 +,
Gavin Shan wrote:
>
> Hi Peter and Marc,
>
> On 11/7/22 5:06 AM, Peter Xu wrote:
> > On Sun, Nov 06, 2022 at 08:12:22PM +, Marc Zyngier wrote:
> >> On Sun, 06 Nov 2022 16:22:29 +,
> >> Peter Xu wrote:
> >>> On Sun, Nov 06, 2022 at 03:43:17PM +, Ma
On Sun, 06 Nov 2022 21:06:43 +,
Peter Xu wrote:
>
> On Sun, Nov 06, 2022 at 08:12:22PM +, Marc Zyngier wrote:
> > Hi Peter,
> >
> > On Sun, 06 Nov 2022 16:22:29 +,
> > Peter Xu wrote:
> > >
> > > Hi, Marc,
> > >
> > > On Sun, Nov 06, 2022 at 03:43:17PM +, Marc Zyngier wrote:
>
Now that the infrastructure is in place, bump the PMU support up
to PMUv3p5.
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/pmu-emul.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index dc163e1a1fcf..26293f842b0f 100644
PMUv3p5 (which is mandatory with ARMv8.5) comes with some extra
features:
- All counters are 64bit
- The overflow point is controlled by the PMCR_EL0.LP bit
Add the required checks in the helpers that control counter
width and overflow, as well as the sysreg handling for the LP
bit. A new kvm_pm
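The two properties end up being distinguished roughly as follows
(sketch; close to, but not necessarily, the final helpers):

/* Is the accumulation done on 64 bits? */
static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
{
	return (select_idx == ARMV8_PMU_CYCLE_IDX || kvm_pmu_is_3p5(vcpu));
}

/* Does the counter overflow at the 64bit boundary rather than 32bit? */
static bool kvm_pmu_idx_has_64bit_overflow(struct kvm_vcpu *vcpu, u64 select_idx)
{
	u64 val = __vcpu_sys_reg(vcpu, PMCR_EL0);

	return (select_idx < ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LP)) ||
	       (select_idx == ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LC));
}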
As further patches will enable the selection of a PMU revision
from userspace, sample the supported PMU revision at VM creation
time, rather than computing it each time the ID_AA64DFR0_EL1 register
is accessed.
This shouldn't result in any change in behaviour.
Reviewed-by: Reiji Watanabe
Signed-off-
Allow userspace to write ID_DFR0_EL1, on the condition that only
the PerfMon field can be altered, and that it is compatible
with what was computed for the AArch64 view of the guest.
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/sys_regs.c | 54 ++-
Allow userspace to write ID_AA64DFR0_EL1, on the condition that only
the PMUver field can be altered, and that it is at most the value that
was initially computed for the guest.
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/sys_regs.c | 40 ++-
1 file changed, 39 insert
Even when the underlying HW doesn't offer the CHAIN event
(which happens with QEMU), we can always support it as we're
in control of the counter overflow.
Always advertise the event via PMCEID0_EL0.
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/pmu-emul.c | 2 ++
1 file changed, 2 insertions(+
The PMU architecture makes a subtle difference between a 64bit
counter and a counter that has a 64bit overflow. This is for example
the case of the cycle counter, which can generate an overflow on
a 32bit boundary if PMCR_EL0.LC==0 despite the accumulation being
done on 64 bits.
Use this distincti
The current PMU emulation sometimes narrows counters to 32bit
if the counter isn't the cycle counter. As this is going to
change with PMUv3p5, where the counters are all 64bit, fix
the couple of cases where this happens unconditionally.
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/pmu-emul.c |
Align the ID_DFR0_EL1.PerfMon values with ID_AA64DFR0_EL1.PMUver.
Reviewed-by: Oliver Upton
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/sysreg.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 7d301700d1a9.
Ricardo reported[0] that our PMU emulation was busted when it comes to
chained events, as we cannot expose the overflow on a 32bit boundary
(which the architecture requires).
This series aims at fixing this (by deleting a lot of code), and as a
bonus adds support for PMUv3p5, as this requires us t
Even when using PMUv3p5 (which implies 64bit counters), there is
no way for AArch32 to write to the top 32 bits of the counters.
The only way to influence these bits (other than by counting
events) is by writing PMCR.P==1.
Make sure we obey the architecture and preserve the top 32 bits
on a counte
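In other words, an AArch32 write can only replace the bottom half —
roughly (sketch; 'old' stands for the current 64-bit counter value):

	if (vcpu_mode_is_32bit(vcpu))
		val = lower_32_bits(val) | (old & GENMASK_ULL(63, 32));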
In order to reduce the boilerplate code, add two helpers returning
the counter register index (resp. the event register) in the vcpu
register file from the counter index.
Reviewed-by: Oliver Upton
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/pmu-emul.c | 33 ++---
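The helpers boil down to a pair of trivial mappings (sketch, following
the layout of the vcpu sysreg file):

static u8 counter_index_to_reg(u64 idx)
{
	return (idx == ARMV8_PMU_CYCLE_IDX) ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + idx;
}

static u8 counter_index_to_evtreg(u64 idx)
{
	return (idx == ARMV8_PMU_CYCLE_IDX) ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + idx;
}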
kvm_pmu_set_counter_value() is pretty odd, as it tries to update
the counter value while taking into account the value that is
currently held by the running perf counter.
This is not only complicated, this is quite wrong. Nowhere in
the architecture is it said that the counter would be offset
by s
Ricardo recently pointed out that the PMU chained counter emulation
in KVM wasn't quite behaving like the one on actual hardware, in
the sense that a chained counter would expose an overflow on
both of its halves, while KVM would only expose the
overflow on the top half.
The differen
For 64bit counters that overflow on a 32bit boundary, make
sure we only check the bottom 32bit to generate a CHAIN event.
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/pmu-emul.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu
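The gist of the change, roughly (sketch; 'i' is the counter being
incremented and 'reg' its new value):

	/* Only the bottom 32 bits matter for a 32bit overflow point */
	if (!kvm_pmu_idx_has_64bit_overflow(vcpu, i))
		reg = lower_32_bits(reg);

	if (!reg)	/* rolled over: feed a CHAIN event to the next counter */
		kvm_pmu_counter_increment(vcpu, BIT(i + 1),
					  ARMV8_PMUV3_PERFCTR_CHAIN);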