Hi All,
This RFC introduces "named" CPU models for ARM64 KVM guests. This
is foundational for cross-host live migration and management-stack
control over individual CPU features exposed to the guest.
TL;DR Examples:
# Boot with Grace CPU model
qemu-system-aarch64 -cpu grace-v1 -machine virt,accel=kvm ...
# Grace with a feature disabled
qemu-system-aarch64 -cpu grace-v1,feat_SHA1=off ...
# Host passthrough with individual feature control
qemu-system-aarch64 -cpu host,feat_AES=aes ...
# Neoverse V2 model on a Grace host
qemu-system-aarch64 -cpu neoverse-v2-v1
# Migration from Grace to Graviton3 (TBD)
qemu-system-aarch64 -cpu neoverse-v1-v1 ...
Relationship with Auger/Huck's customizable host model [1]:
We have been working on this series in parallel with [1]. Eric Auger and
Cornelia Huck's series [1] exposes raw SYSREG_<REG>_<FIELD> uint64
properties on -cpu host, providing the essential low-level knobs for ID
register customization. This RFC builds on the same KVM capability
and can be layered on top of [1], adding:
- Human-readable property names: feat_AES=pmull instead of
SYSREG_ID_AA64ISAR0_EL1_AES=2, with arch-defined named values
validated at set time.
- Default values and forward compatibility: CPU models start from a
known-zero baseline rather than the host view, so new fields/registers
introduced in future kernels do not silently leak into existing models.
- Named CPU models with hierarchical inheritance: grace-v1,
neoverse-v2-v1, etc.
The two series can coexist; this series can be rebased on top of [1].
[1]
https://lore.kernel.org/qemu-devel/[email protected]/
Problems with defining "named" CPU models for ARM64 KVM guests:
* Features are not single CPUID bits. They are mostly multi-bit fields
encoding version/level instead of just presence. A single field encodes
multiple ARM ARM defined features (FEAT_*) at different thresholds.
* KVM does not allow all registers and fields to be modified for a guest.
KVM does not virtualise some fields at all (SME) and supports only the
host value for others (BRPs, CWG, etc.). The set of writable fields is
evolving and differs between kernel versions.
* Unlike x86, ARM does not have a single natural granularity for CPU
models. ARM has architecture, reference-core and SoC levels, each more
granular than the last.
* ARM has dozens of vendors and it will be tricky to maintain models for
all of them.
* Previous designs started from the host values and then subtracted
undesirable features. This is not forward-compatible; the design
should work when a new ID register or field is introduced.
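To make the first problem above concrete, here is a minimal sketch (not
code from this series) of how one 4-bit field carries two features at
different thresholds. The helper names idreg_field(), has_feat_aes() and
has_feat_pmull() are made up for illustration; the field position and
values follow the ID_AA64ISAR0.AES example shown in the field table
section of this letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract an unsigned ID register field of 'len' bits starting at 'shift'. */
static inline uint64_t idreg_field(uint64_t reg, unsigned shift, unsigned len)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    return (reg >> shift) & ((UINT64_C(1) << len) - 1);
}

/* ID_AA64ISAR0.AES: shift 4, length 4 (0 = off, 1 = aes, 2 = pmull). */
#define AES_SHIFT 4
#define AES_LEN   4

/* FEAT_AES is present once the field reaches 0b0001. */
static inline bool has_feat_aes(uint64_t isar0)
{
    return idreg_field(isar0, AES_SHIFT, AES_LEN) >= 1;
}

/* FEAT_PMULL is present once the same field reaches 0b0010. */
static inline bool has_feat_pmull(uint64_t isar0)
{
    return idreg_field(isar0, AES_SHIFT, AES_LEN) >= 2;
}
```

A single property such as feat_AES therefore has to carry a level
("off", "aes", "pmull") rather than an on/off bit.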
With the above problems in mind, the design has 3 layers:
1. ARM ID Register Field Table:
- This layer maintains all architecturally defined ID registers and
ID register fields. It includes:
* Field name
* Field shift
* Field length
* Safe-value tag: LOWER, HIGHER, HIGHER_OR_ZERO, SIGNED_LOWER,
EXACT, ANY
This will be used to validate user-provided values during CPU
realization time against the host's value. I.e., if the host only
supports "aes", a CPU model that sets "pmull" should be rejected.
* Default value: The value to which the field is reset. This gives
CPU models a clean cpu.isar.idregs[] baseline instead of the host
view provided by the kernel, as in previous designs.
This also complements the forward-compatibility story. Given the
"default" values, higher levels need not worry about new
fields/registers being introduced.
* Architecturally defined named values like "off", "aes", "pmull",
etc.
* These values are derived from the kernel's ftr_bits array and
tools/sysreg file.
E.g:
IDREG_START(ID_AA64ISAR0)
IDREG_FIELD_START(ID_AA64ISAR0, AES, 4, 4, LOWER, 0)
IDREG_FIELD_ARCH_VAL(0b0000, "off")
IDREG_FIELD_ARCH_VAL(0b0001, "aes")
IDREG_FIELD_ARCH_VAL(0b0010, "pmull")
IDREG_FIELD_END(ID_AA64ISAR0, AES)
....
IDREG_END(ID_AA64ISAR0)
- This layer is the single source of truth for ARM64 ID registers.
The default values and safe-value tags are manually derived from the
kernel's ftr_bits array. Other boilerplate and arch-defined values are
script-generated.
- AArch32 ID registers are added with a single field so they can be
zeroed out on hosts that support AArch32.
- This layer also defines helpers for higher layers to extract and
manipulate ID register fields.
* arm_idregs_reset_to_defaults(): Reset all ID registers to their
default values.
* arm_idreg_field_read/write(): Read or write the value of an ID
register field.
* arm_arch_val_name/from_name(): Map a numeric field value to its
arch-defined name, and a name back to its numeric value.
* ...
- This layer creates the following tables using X-macro expansion:
* arm_idregs[]: Array of ID register descriptors.
* arm_field_locs[]: Array of field location descriptors.
(fieldIDx -> registerIndex, fieldIndex)
* ...
- The ArmIdReg struct also includes a writable_mask to track which
bits are writable by KVM. This is populated at runtime during
scratch VM creation, and is further used to validate that only
the writable bits are modified by the CPU model.
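For illustration, the X-macro approach used by this layer can be sketched
in a few lines. This is a simplified stand-in, not the series' actual
macros or structs: DEMO_FIELDS, FieldDesc, demo_field_read/write() and
demo_reset_to_defaults() are hypothetical names, only three ID_AA64ISAR0
fields are listed, and a single uint64_t stands in for the register
array.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SAFE_LOWER, SAFE_HIGHER, SAFE_EXACT, SAFE_ANY } SafeTag;

/* Hypothetical field list: X(reg, field, shift, len, tag, default). */
#define DEMO_FIELDS(X)                          \
    X(ID_AA64ISAR0, AES,   4, 4, SAFE_LOWER, 0) \
    X(ID_AA64ISAR0, SHA1,  8, 4, SAFE_LOWER, 0) \
    X(ID_AA64ISAR0, SHA2, 12, 4, SAFE_LOWER, 0)

/* Expansion 1: one enum constant per field. */
#define X_ENUM(reg, field, shift, len, tag, def) FIELD_##reg##_##field,
typedef enum { DEMO_FIELDS(X_ENUM) FIELD_COUNT } FieldId;

typedef struct {
    const char *name;
    unsigned shift, len;
    SafeTag tag;
    uint64_t def;
} FieldDesc;

/* Expansion 2: descriptor table indexed by FieldId. */
#define X_DESC(reg, field, shift, len, tag, def) \
    [FIELD_##reg##_##field] = { #reg "." #field, shift, len, tag, def },
static const FieldDesc demo_fields[FIELD_COUNT] = { DEMO_FIELDS(X_DESC) };

static uint64_t field_mask(const FieldDesc *f)
{
    return ((UINT64_C(1) << f->len) - 1) << f->shift;
}

static uint64_t demo_field_read(uint64_t reg, FieldId id)
{
    const FieldDesc *f = &demo_fields[id];
    return (reg & field_mask(f)) >> f->shift;
}

static uint64_t demo_field_write(uint64_t reg, FieldId id, uint64_t val)
{
    const FieldDesc *f = &demo_fields[id];
    assert(val < (UINT64_C(1) << f->len));
    return (reg & ~field_mask(f)) | (val << f->shift);
}

/* Build the clean baseline for the register from the per-field defaults. */
static uint64_t demo_reset_to_defaults(void)
{
    uint64_t reg = 0;
    for (int i = 0; i < FIELD_COUNT; i++) {
        reg = demo_field_write(reg, (FieldId)i, demo_fields[i].def);
    }
    return reg;
}
```

Because the enum and the descriptor table expand from the same list, a
new field only has to be added in one place.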
2. ARM Properties Layer:
A small property layer on top of the ID Register Field table is defined.
This series defines two types of properties with plans for one more
in the future:
- Single field properties: These represent ARM FEAT_X features
that correspond to a single ID register field. Example: feat_AES,
feat_SHA2, etc.
The property name is set as "feat_<FieldName>" and possible values
are the arch-defined named values. These can be further categorized
into:
* STRING: multi-bit fields (>=2 bits) with arch-defined named
values, example: feat_AES, feat_SHA2, etc.
* BOOLEAN: 1-bit fields only (true/false)
example: hw_prop_IDC, hw_prop_DIC, etc.
* NUMERIC: IDREG_ANY fields with no named values (raw integer)
example: hw_prop_BS, hw_prop_DZP, etc.
String property values are validated against the arch-defined named
values.
ID register fields that are not covered by single field properties
are also exposed as a property named hw_prop_<FieldName>. These are
usually implementation-defined values like cache geometry, debug
counter widths, etc. (CTR_EL0.*, DCZID_EL0.*, etc.)
Example: hw_prop_BS, hw_prop_DZP, etc.
Single field properties are defined as:
ARM_PROP("prop_name", type, reg, field)
Example:
ARM_PROP("feat_AES", STRING, ID_AA64ISAR0, AES)
* Validation based on safe-value tags is yet to be implemented.
- Fractional properties: These represent ARM FEAT_X features that
use two fields (base + frac) across registers. Example: feat_CSV2,
feat_MPAM, etc.
The property name is set as "feat_<BaseFieldName>" and possible
values are the arch-defined string values like "0.0", "1.0", "1.1",
etc.
Fractional properties are defined as:
ARM_FRACTIONAL_PROP("prop_name", base_reg, base_field,
frac_reg, frac_field)
Example:
ARM_FRACTIONAL_PROP("feat_CSV2", ID_AA64PFR0, CSV2,
ID_AA64PFR1, CSV2_FRAC)
When a fractional property is set, both the base field and frac
field values are set to the corresponding values.
E.g: feat_CSV2=1.1 will set ID_AA64PFR0.CSV2=1 and
ID_AA64PFR1.CSV2_FRAC=1.
- Composite properties (planned for v2):
These will act as master boolean switches that control a list of
fields. Example: pauth, sve, etc. Setting sve=on with a named model
will set all the SVE-related fields (ID_AA64ZFR0_EL1.*) along with
the sveNNN vector-length properties. Similarly, setting pauth=on will
set the APA, API, APA3, GPA, GPI and GPA3 fields based on the named
model.
- cpu_revision, cpu_partnum, etc. properties are introduced to expose
MIDR, REVIDR, AIDR fields.
Exceptions to the property naming are made for ID_AA64PFR0_EL1.ELx
fields, which are named elx_mode.
This series defines over 130 single field properties plus 4
fractional properties. All properties work with -cpu host also.
All properties change the cpu.isar.idregs[] values which are later
written back to KVM at the end of kvm_arch_init_vcpu().
* The arch-defined named values and property names are provisional and
can be iterated on until they make sense.
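The fractional-property mapping described in this layer (feat_CSV2=1.1
setting both a base and a frac field) can be sketched as below. This is
an illustration under assumptions, not the series' code: FracVal,
frac_val_from_name(), deposit() and set_feat_csv2() are hypothetical
names, the value table is only the subset quoted above, and the bit
positions (CSV2 at [59:56], CSV2_frac at [35:32]) are taken from the
Arm ARM register layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One named value of a fractional property (hypothetical layout). */
typedef struct {
    const char *name;   /* user-visible string, e.g. "1.1" */
    uint8_t base;       /* value for the base field */
    uint8_t frac;       /* value for the frac field */
} FracVal;

/* Illustrative subset only. */
static const FracVal csv2_vals[] = {
    { "0.0", 0, 0 },
    { "1.0", 1, 0 },
    { "1.1", 1, 1 },
};

/* Return the table entry for 's', or NULL if the string is not known. */
static const FracVal *frac_val_from_name(const FracVal *vals, size_t n,
                                         const char *s)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(vals[i].name, s) == 0) {
            return &vals[i];
        }
    }
    return NULL;
}

/* Replace the 'len'-bit field at 'shift' in 'reg' with 'val'. */
static uint64_t deposit(uint64_t reg, unsigned shift, unsigned len,
                        uint64_t val)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    uint64_t mask = ((UINT64_C(1) << len) - 1) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Set both halves of feat_CSV2; unknown strings leave the registers alone. */
static bool set_feat_csv2(uint64_t *pfr0, uint64_t *pfr1, const char *s)
{
    const FracVal *v = frac_val_from_name(
        csv2_vals, sizeof(csv2_vals) / sizeof(csv2_vals[0]), s);
    if (!v) {
        return false;
    }
    *pfr0 = deposit(*pfr0, 56, 4, v->base);   /* ID_AA64PFR0.CSV2 */
    *pfr1 = deposit(*pfr1, 32, 4, v->frac);   /* ID_AA64PFR1.CSV2_FRAC */
    return true;
}
```

Keeping the (base, frac) pair in one table entry means the two fields
cannot be set to a combination that has no arch-defined name.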
3. ARM CPU Model Hierarchy:
A small named model layer is defined on top of the properties. An ARM named
CPU model defines a list of property values and a parent model. A child
model naturally inherits all the properties from its parent and can
override them when needed.
The initial model hierarchy shipped here is:
kvm-base-v1                 KVM-imposed quirks
  arm-v8_4-a-v1             ARMv8.4-A architectural mandate
    neoverse-v1-v1          Neoverse V1
      graviton3-v1          AWS Graviton3
    arm-v9_0-a-v1           ARMv9.0-A architectural deltas on top
                            of arm-v8_4-a-v1
      neoverse-v2-v1        Neoverse V2
        grace-v1            NVIDIA Grace
(kvm-base-v1 and arm-vX are not meant to be realizable unless the
user provides values for implementation-defined fields)
So for example, grace-v1 defines Crypto fields and CTR_EL0.IDC/DIC on top
of neoverse-v2-v1, which leaves those fields vendor-configurable.
The hierarchy reflects a deliberate trade-off:
- Architecture-level models (arm-v8_4-a-v1) maximize migration
compatibility but lack implementation-defined values.
- Reference-core models (neoverse-v2-v1) enable migration across
SoCs sharing the same core design.
- SoC models (grace-v1) expose the full hardware feature set but
limit migration to hosts with the same SoC.
At model realization time,
1. a clean slate of cpu.isar.idregs[] is created using
arm_idregs_reset_to_defaults().
2. Then, a model's full parent-chain is walked and all properties are
applied in order from parent to child.
3. Finally, kvm_arm_writeback_idregs() compares the model's desired
ID-register values against the host-provided cpreg snapshot and
writes back the writable bits, warning on any non-writable
difference.
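The three realization steps above can be sketched as follows. This is a
toy model under stated assumptions, not the implementation in this
series: a single 64-bit register stands in for cpu.isar.idregs[], the
default baseline is zero, and Model, PropSetting, apply_chain(),
merge_writable(), realize() and the demo-base/demo-child models are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One property setting: write 'val' into the field at (shift, len). */
typedef struct { unsigned shift, len; uint64_t val; } PropSetting;

typedef struct Model {
    const char *name;
    const struct Model *parent;   /* NULL for the root model */
    const PropSetting *props;
    size_t nprops;
} Model;

static uint64_t deposit(uint64_t reg, unsigned shift, unsigned len,
                        uint64_t val)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    uint64_t mask = ((UINT64_C(1) << len) - 1) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Step 2: recurse to the root first so parents apply before children. */
static uint64_t apply_chain(const Model *m, uint64_t reg)
{
    if (!m) {
        return reg;
    }
    reg = apply_chain(m->parent, reg);
    for (size_t i = 0; i < m->nprops; i++) {
        reg = deposit(reg, m->props[i].shift, m->props[i].len,
                      m->props[i].val);
    }
    return reg;
}

/*
 * Step 3: take writable bits from the model, keep host bits elsewhere.
 * *blocked receives the non-writable bits where the model and the host
 * disagree, i.e. the case that would be reported with a warning.
 */
static uint64_t merge_writable(uint64_t desired, uint64_t host,
                               uint64_t writable_mask, uint64_t *blocked)
{
    *blocked = (desired ^ host) & ~writable_mask;
    return (desired & writable_mask) | (host & ~writable_mask);
}

/* Hypothetical two-level hierarchy; the child overrides the AES field. */
static const PropSetting base_props[]  = { { 4, 4, 1 } };
static const PropSetting child_props[] = { { 4, 4, 2 }, { 8, 4, 1 } };
static const Model demo_base  = { "demo-base", NULL, base_props, 1 };
static const Model demo_child = { "demo-child", &demo_base, child_props, 2 };

static uint64_t realize(const Model *m, uint64_t host,
                        uint64_t writable_mask, uint64_t *blocked)
{
    uint64_t reg = 0;              /* step 1: clean default baseline */
    reg = apply_chain(m, reg);     /* step 2: parent-to-child properties */
    return merge_writable(reg, host, writable_mask, blocked);  /* step 3 */
}
```

Applying parents first is what lets a child override an inherited value
simply by listing the same property again.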
Models will follow a monotonic versioning convention (grace-v1, grace-v2,
...) mirroring x86's scheme.
* Please take the CPU model property values with a grain of salt.
They are added based on what the guest-visible values are with "host"
model on available hardware.
Benefits of this design:
- General benefits that come with properties and named CPU models,
like cross-host live migration, management-stack control over
feature exposure, etc.
- Forward compatibility: when a new ID register or field is
introduced, CPU models need not change; during realization they
will be populated with the default values. Only ID register/field
information needs to be added to the field table.
- As CPU models are hierarchical, defining a new model is much easier.
- The property names and values are self-documenting.
NOTE: ~2200 of the ~3300 added lines are declarative (field table,
model definitions, properties, etc.)
Tested with KVM on an NVIDIA Grace host.
Relationship with existing code base:
- It does not change any TCG-based code paths.
- For KVM host passthrough it just adds property support.
- Does not change any existing properties or other code paths.
- Can layer on top of the SYSREG_ property series [1].
Planned Follow-ups:
- Composite properties with handling of sve, pauth for named models.
- CLIDR_EL1 and CCSIDR_EL1 handling.
- Safe-value based validation logic.
- QMP commands like query-cpu-model-expansion are not hooked yet.
Blockers and supported values (calculated using safe-value tags
and runtime KVM writable masks) will be reported through them.
E.g. libvirt could report:
<property name='feat_AES' type='string' value='pmull'
supports='off,aes,pmull'/>
and:
<cpu type='kvm' name='nvidia-grace-v1'
typename='arm-nvidia-grace-v1-arm-cpu' usable='no'>
<blocker name='feat_AES'/>
</cpu>
- DCZID_EL0 handling.
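As a rough sketch of what the planned safe-value validation could look
like (it is not implemented in this series), the check below compares a
requested field value with the host's value according to the tag. The
tag semantics are assumptions: LOWER means "at most the host value",
HIGHER "at least the host value", HIGHER_OR_ZERO additionally allows 0,
SIGNED_LOWER compares the field as a signed number, EXACT requires
equality and ANY accepts everything. value_is_safe() and
supported_values() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SAFE_LOWER, SAFE_HIGHER, SAFE_HIGHER_OR_ZERO,
    SAFE_SIGNED_LOWER, SAFE_EXACT, SAFE_ANY,
} SafeTag;

/* Sign-extend a 'len'-bit field value. */
static int64_t sext(uint64_t v, unsigned len)
{
    uint64_t sign = UINT64_C(1) << (len - 1);
    return (int64_t)((v ^ sign) - sign);
}

/* Can a guest be given 'want' when the host reports 'host'? (assumed rules) */
static bool value_is_safe(SafeTag tag, unsigned len, uint64_t want,
                          uint64_t host)
{
    switch (tag) {
    case SAFE_LOWER:          return want <= host;
    case SAFE_HIGHER:         return want >= host;
    case SAFE_HIGHER_OR_ZERO: return want == 0 || want >= host;
    case SAFE_SIGNED_LOWER:   return sext(want, len) <= sext(host, len);
    case SAFE_EXACT:          return want == host;
    case SAFE_ANY:            return true;
    }
    return false;
}

/* Bitmap of acceptable field values: bit n set means value n is safe. */
static uint32_t supported_values(SafeTag tag, unsigned len, uint64_t host)
{
    uint32_t set = 0;
    assert(len >= 1 && len <= 4);   /* keeps the bitmap within 16 bits */
    for (uint64_t v = 0; v < (UINT64_C(1) << len); v++) {
        if (value_is_safe(tag, len, v, host)) {
            set |= UINT32_C(1) << v;
        }
    }
    return set;
}
```

For a LOWER field where the host reports 2 ("pmull"), the bitmap covers
0, 1 and 2, which corresponds to supports='off,aes,pmull' in the libvirt
example above. A complete version would also intersect the result with
the arch-defined named values and the runtime KVM writable masks.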
Out of Scope:
- Inter-feature dependencies like FP <--> AdvSIMD, SM3 <--> SM4, etc.
Appendix: KVM Non-writable fields (kernel 6.18):
These fields pass the host value through to the guest unmodified on
6.18; an attempt to override them triggers a warning from
kvm_arm_writeback_idregs() and the slot retains the host value:
  #  Field                      #  Field
---  -----------------------  ---  -----------------------
  1  ID_AA64PFR0.FP            10  ID_AA64MMFR2.NV
  2  ID_AA64PFR0.AdvSIMD       11  ID_AA64MMFR2.CCIDX
  3  ID_AA64MMFR0.ASIDBITS     12  ID_AA64MMFR4.E2H0
  4  ID_AA64MMFR1.XNX          13  ID_AA64DFR0.CTX_CMPs
  5  ID_AA64MMFR1.VH           14  ID_AA64DFR0.BRPs
  6  ID_AA64MMFR1.VMIDBits     15  CTR_EL0.CWG
  7  ID_AA64MMFR2.EVT          16  CTR_EL0.ERG
  8  ID_AA64MMFR2.FWB          17  DCZID_EL0.*
  9  ID_AA64MMFR2.IDS
This list shifts with the kernel version. The runtime probe via
KVM_ARM_GET_REG_WRITABLE_MASKS is authoritative.
Warm Regards,
Shaju, Khushit
Khushit Shah (1):
target/arm/kvm: enable writable implementation ID registers
Shaju Abraham (12):
target/arm: named_cpu_model: define containers for ID registers and
fields
target/arm: named_cpu_model: Add ID Register Fields
target/arm: named_cpu_model: initialise additional sysregs
target/arm: named_cpu_model: generate tables for Arm64 ID registers
and fields
target/arm: named_cpu_model: replace FIELD macro with IDREG_FIELD
target/arm: named_cpu_model: data-structures required for the ARM
property layer.
target/arm: named_cpu_model: define ARM properties
target/arm: named_cpu_model: generate arm_cpu_props[] table
target/arm: named_cpu_model: Add ID register field helper functions
target/arm: named_cpu_model: Register Arm64 properties for host model
target/arm: named_cpu_model: introduce named CPU models for selected
CPUs
target/arm: named_cpu_model: writeback modified ID registers to KVM
hw/arm/virt.c | 8 +
target/arm/arm-cpu-frac.inc.h | 34 +
target/arm/arm-cpu-models.c | 214 ++++
target/arm/arm-cpu-models.h | 43 +
target/arm/arm-cpu-props.c | 259 +++++
target/arm/arm-cpu-props.h | 36 +
target/arm/arm-cpu-props.inc.h | 180 ++++
target/arm/arm-v8_4-a-v1.inc.h | 22 +
target/arm/arm-v9_0-a-v1.inc.h | 28 +
target/arm/cpu-features.h | 232 +----
target/arm/cpu-idregs.c | 232 +++++
target/arm/cpu-idregs.h | 132 +++
target/arm/cpu-idregs.h.inc | 1724 +++++++++++++++++++++++++++++++
target/arm/cpu-sysregs.h.inc | 5 +
target/arm/cpu64.c | 3 +-
target/arm/grace-v1.inc.h | 17 +
target/arm/graviton3-v1.inc.h | 16 +
target/arm/kvm-base-v1.inc.h | 13 +
target/arm/kvm.c | 167 ++-
target/arm/meson.build | 7 +-
target/arm/neoverse-v1-v1.inc.h | 64 ++
target/arm/neoverse-v2-v1.inc.h | 64 ++
target/arm/trace-events | 1 +
23 files changed, 3284 insertions(+), 217 deletions(-)
create mode 100644 target/arm/arm-cpu-frac.inc.h
create mode 100644 target/arm/arm-cpu-models.c
create mode 100644 target/arm/arm-cpu-models.h
create mode 100644 target/arm/arm-cpu-props.c
create mode 100644 target/arm/arm-cpu-props.h
create mode 100644 target/arm/arm-cpu-props.inc.h
create mode 100644 target/arm/arm-v8_4-a-v1.inc.h
create mode 100644 target/arm/arm-v9_0-a-v1.inc.h
create mode 100644 target/arm/cpu-idregs.c
create mode 100644 target/arm/cpu-idregs.h
create mode 100644 target/arm/cpu-idregs.h.inc
create mode 100644 target/arm/grace-v1.inc.h
create mode 100644 target/arm/graviton3-v1.inc.h
create mode 100644 target/arm/kvm-base-v1.inc.h
create mode 100644 target/arm/neoverse-v1-v1.inc.h
create mode 100644 target/arm/neoverse-v2-v1.inc.h
--
2.52.0