Hi All,

This RFC introduces "named" CPU models for ARM64 KVM guests. This
is foundational for cross-host live migration and management-stack
control over individual CPU features exposed to the guest.

TL;DR Examples:
  # Boot with Grace CPU model
  qemu-system-aarch64 -cpu grace-v1 -machine virt,accel=kvm ...

  # Grace with a feature disabled
  qemu-system-aarch64 -cpu grace-v1,feat_SHA1=off ...

  # Host passthrough with individual feature control
  qemu-system-aarch64 -cpu host,feat_AES=aes ...

  # Neoverse V2 on Grace
  qemu-system-aarch64 -cpu neoverse-v2-v1

  # Migration from Grace to Graviton3 (TBD)
  qemu-system-aarch64 -cpu neoverse-v1-v1 ...

Relationship with Auger/Huck's customizable host model [1]:
We have been working on this series in parallel with [1]. Eric Auger and
Cornelia Huck's series [1] exposes raw SYSREG_<REG>_<FIELD> uint64
properties on -cpu host, providing the essential low-level knobs for ID
register customization. This RFC builds on the same KVM capability
and can be layered on top of [1]:
  - Human-readable property names: feat_AES=pmull instead of
    SYSREG_ID_AA64ISAR0_EL1_AES=2, with arch-defined named values
    validated at set time.
  - Default values and forward compatibility: CPU models start from a
    known-zero baseline rather than the host view, so new fields/registers
    introduced in future kernels do not silently leak into existing models.
  - Named CPU models with hierarchical inheritance: grace-v1,
    neoverse-v2-v1, etc.

The two series can coexist; this series can be rebased on top of [1].

[1] https://lore.kernel.org/qemu-devel/[email protected]/

Problems with defining "named" CPU models for ARM64 KVM guests:
  * Features are not single CPUID bits. They are mostly multi-bit fields
    encoding a version/level instead of just presence. A single field can
    encode multiple ARM ARM defined features (FEAT_*) at different
    thresholds.
  * KVM does not allow all registers and fields to be modified for a guest.
    Some fields KVM does not virtualise at all (SME), and for others it
    supports only the host values (BRPs, CWG, etc.). This is evolving and
    differs between kernel versions.
  * Unlike x86, ARM does not have a single natural granularity for CPU
    models. ARM has architecture, reference-core and SoC levels, each more
    granular than the last.
  * ARM has dozens of vendors and it will be tricky to maintain models for
    all of them.
  * Previous designs started from the host values and then subtracted
    undesirable features. This is not forward-compatible; the design
    should work when a new ID register or field is introduced.

With the above problems in mind, the design has 3 layers:

1. ARM ID Register Field Table:
   - This layer maintains all architecturally defined ID registers and
     ID register fields. It includes:
       * Field name
       * Field shift
       * Field length
       * Safe-value tag: LOWER, HIGHER, HIGHER_OR_ZERO, SIGNED_LOWER,
         EXACT, ANY.
         This will be used to validate user-provided values against the
         host's value at CPU realization time. I.e., if the host only
         supports "aes", a CPU model that sets "pmull" should be rejected.
       * Default value: The value to which the field is reset. This gives
         CPU models a clean cpu.isar.idregs[] baseline instead of the
         host view provided by the kernel, as in previous designs.
         This also complements the forward-compatibility story: given
         the default values, higher layers need not worry about new
         fields/registers being introduced.
       * Architecturally defined named values like "off", "aes", "pmull",
         etc.
       * These values are derived from the kernel's ftr_bits array and
         tools/sysreg file.
    E.g:

     IDREG_START(ID_AA64ISAR0)
     IDREG_FIELD_START(ID_AA64ISAR0, AES, 4, 4, LOWER, 0)
     IDREG_FIELD_ARCH_VAL(0b0000, "off")
     IDREG_FIELD_ARCH_VAL(0b0001, "aes")
     IDREG_FIELD_ARCH_VAL(0b0010, "pmull")
     IDREG_FIELD_END(ID_AA64ISAR0, AES)
     ....
     IDREG_END(ID_AA64ISAR0)

   - This layer is the single source of truth for ARM64 ID registers.
     The default values and safe-value tags are manually derived from the
     kernel's ftr_bits array. Other boilerplate and arch-defined values are
     script-generated.

   - AArch32 ID registers are added with a single field so they can be
     zeroed out on hosts that support AArch32.

   - This layer also defines helpers for higher layers to extract and
     manipulate ID register fields.
       * arm_idregs_reset_to_defaults(): Reset all ID registers to their
         default values.
       * arm_idreg_field_read/write(): Read or write the value of an ID
         register field.
       * arm_arch_val_name/from_name(): Look up the arch-defined name for
         a numeric field value, or the numeric value for a name.
       * ...

   - This layer creates the following tables using X-macro expansion:
       * arm_idregs[]: Array of ID register descriptors.
       * arm_field_locs[]: Array of field location descriptors
         (field ID -> register index, field index).
       * ...

   - The ArmIdReg struct also includes a writable_mask to track which
     bits are writable by KVM. This is populated at runtime during
     scratch VM creation, and is further used to validate that only
     the writable bits are modified by the CPU model.
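
The field table and its safe-value tags can be illustrated with a small,
standalone X-macro sketch. Everything named Demo*/demo_* below is
hypothetical and is not the series' actual code. The AES/SHA1/SHA2 shifts are
the architectural ID_AA64ISAR0_EL1 positions. The interpretation of the tags
(LOWER meaning "a value at or below the host's is acceptable", and so on) is
an assumption modelled on the description above, since the series does not
implement this validation yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the safe-value tags listed in the cover letter. */
typedef enum { SAFE_LOWER, SAFE_HIGHER, SAFE_EXACT, SAFE_ANY } SafeTag;

/* X-macro list: name, shift, length, safe-value tag, default value. */
#define DEMO_FIELDS(X)            \
    X(AES,  4,  4, SAFE_LOWER, 0) \
    X(SHA1, 8,  4, SAFE_LOWER, 0) \
    X(SHA2, 12, 4, SAFE_LOWER, 0)

typedef struct {
    const char *name;
    unsigned shift;
    unsigned length;
    SafeTag safe;
    uint64_t def;
} DemoField;

/* First expansion: one enum constant per field, plus a count. */
#define AS_ENUM(n, s, l, t, d) DEMO_FIELD_##n,
enum { DEMO_FIELDS(AS_ENUM) DEMO_FIELD_COUNT };

/* Second expansion: the descriptor table, indexed by the enum above. */
#define AS_DESC(n, s, l, t, d) { #n, s, l, t, d },
static const DemoField demo_fields[] = { DEMO_FIELDS(AS_DESC) };

/* Extract a field from a 64-bit ID register value. */
static uint64_t demo_field_read(uint64_t reg, const DemoField *f)
{
    assert(f->length > 0 && f->length < 64);
    return (reg >> f->shift) & ((UINT64_C(1) << f->length) - 1);
}

/* Assumed tag semantics: is 'want' acceptable on a host reporting 'host'? */
static bool demo_field_is_safe(const DemoField *f, uint64_t want,
                               uint64_t host)
{
    switch (f->safe) {
    case SAFE_LOWER:
        return want <= host;
    case SAFE_HIGHER:
        return want >= host;
    case SAFE_EXACT:
        return want == host;
    case SAFE_ANY:
        return true;
    }
    return false;
}
```

With these assumed semantics, a model asking for AES=2 ("pmull") on a host
that reports AES=1 ("aes") fails the check, matching the rejection example
given for the safe-value tag.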

2. ARM Properties Layer:
   A small property layer on top of the ID Register Field table is defined.
   This series defines two types of properties with plans for one more
   in the future:
     - Single field properties: These represent ARM FEAT_X features
       that correspond to a single ID register field. Example: feat_AES,
       feat_SHA2, etc.

       The property name is set as "feat_<FieldName>" and the possible
       values are the arch-defined named values. These can be further
       categorized into:
         * STRING: multi-bit fields (>=2 bits) with arch-defined named
           values. Example: feat_AES, feat_SHA2, etc.
         * BOOLEAN: 1-bit fields only (true/false).
           Example: hw_prop_IDC, hw_prop_DIC, etc.
         * NUMERIC: IDREG_ANY fields with no named values (raw integer).
           Example: hw_prop_BS, hw_prop_DZP, etc.

       String property values are validated against the arch-defined
       named values.

       ID register fields that are not covered by single field properties
       are also exposed as a property named "hw_prop_<FieldName>". These
       are usually implementation-defined values like cache geometry,
       debug counter widths, etc. (CTR_EL0.*, DCZID_EL0.*, etc.)
       Example: hw_prop_BS, hw_prop_DZP, etc.

       Single field properties are defined as:

       ARM_PROP("prop_name", type, reg, field)
       Example:
       ARM_PROP("feat_AES", STRING, ID_AA64ISAR0, AES)

       * Validation based on safe-value tags is yet to be implemented.

     - Fractional properties: These represent ARM FEAT_X features that
       use two fields (base + frac) across registers. Example: feat_CSV2,
       feat_MPAM, etc.

       The property name is set as "feat_<BaseFieldName>" and the possible
       values are the arch-defined string values like "0.0", "1.0", "1.1",
       etc.

       Fractional properties are defined as:
       ARM_FRACTIONAL_PROP("prop_name", base_reg, base_field,
                           frac_reg, frac_field)
       Example:
       ARM_FRACTIONAL_PROP("feat_CSV2", ID_AA64PFR0, CSV2,
                           ID_AA64PFR1, CSV2_FRAC)

       When a fractional property is set, both the base field and the frac
       field are set to the corresponding values.
       E.g.: feat_CSV2=1.1 will set ID_AA64PFR0.CSV2=1 and
       ID_AA64PFR1.CSV2_FRAC=1.

     - Composite properties (planned for v2):
       These will act as master boolean switches that control a list of
       fields. Example: pauth, sve, etc. Setting sve=on with a named model
       will set all the SVE-related fields (ID_AA64ZFR0_EL1.*) along with
       the sveNNN vector-length properties. Similarly, setting pauth=on
       will set the APA, API, APA3, GPA, GPI and GPA3 fields based on the
       named model.

     - cpu_revision, cpu_partnum, etc. properties are introduced to expose
       MIDR, REVIDR, AIDR fields.

   Exceptions to the property naming are made for the ID_AA64PFR0_EL1.ELx
   fields, which are named elx_mode.

   This series defines over 130 single field properties plus 4
   fractional properties. All properties also work with -cpu host.

   All properties change the cpu.isar.idregs[] values, which are later
   written back to KVM at the end of kvm_arch_init_vcpu().
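
As a rough illustration of what setting a STRING property such as
feat_AES=pmull amounts to, here is a minimal standalone sketch: look up the
named value, then deposit it into the register image. The names
demo_aes_vals, demo_deposit and demo_set_feat_aes are hypothetical and do not
come from the series; the named values and the AES shift/length (4, 4) are
taken from the field table example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t value;
    const char *name;
} DemoArchVal;

/* Named values for ID_AA64ISAR0.AES, as in the field table example. */
static const DemoArchVal demo_aes_vals[] = {
    { 0x0, "off" },
    { 0x1, "aes" },
    { 0x2, "pmull" },
};

/* Replace 'length' bits starting at 'shift' in 'reg' with 'value'. */
static uint64_t demo_deposit(uint64_t reg, unsigned shift, unsigned length,
                             uint64_t value)
{
    assert(length > 0 && length < 64);
    uint64_t mask = ((UINT64_C(1) << length) - 1) << shift;
    return (reg & ~mask) | ((value << shift) & mask);
}

/*
 * Set the AES field by name. Returns false and leaves *reg untouched if
 * the name is not one of the known named values.
 */
static bool demo_set_feat_aes(uint64_t *reg, const char *name)
{
    size_t n = sizeof(demo_aes_vals) / sizeof(demo_aes_vals[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(demo_aes_vals[i].name, name) == 0) {
            *reg = demo_deposit(*reg, 4, 4, demo_aes_vals[i].value);
            return true;
        }
    }
    return false;
}
```

Rejecting unknown names at set time is the behaviour the cover letter
describes for string properties; how the real setter reports the error is not
shown here.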

   * The arch-defined named values and property names can be iterated on
     until they make sense.
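
The fractional properties described above (e.g. feat_CSV2=1.1 setting
ID_AA64PFR0.CSV2=1 and ID_AA64PFR1.CSV2_FRAC=1) could be backed by a simple
name table. The sketch below is a guess at one possible shape, not the
series' implementation; DemoFracVal and demo_frac_lookup are hypothetical
names, and the "0.0" and "1.0" rows assume the obvious base/frac split by
analogy with the "1.1" example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;  /* user-visible value, e.g. "1.1" */
    uint64_t base;     /* value for the base field (e.g. CSV2) */
    uint64_t frac;     /* value for the frac field (e.g. CSV2_FRAC) */
} DemoFracVal;

/* Only the three values mentioned in the cover letter are listed. */
static const DemoFracVal demo_csv2_vals[] = {
    { "0.0", 0, 0 },
    { "1.0", 1, 0 },
    { "1.1", 1, 1 },
};

/* Resolve a fractional value name into its (base, frac) pair. */
static bool demo_frac_lookup(const char *name, uint64_t *base,
                             uint64_t *frac)
{
    size_t n = sizeof(demo_csv2_vals) / sizeof(demo_csv2_vals[0]);

    assert(name != NULL && base != NULL && frac != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(demo_csv2_vals[i].name, name) == 0) {
            *base = demo_csv2_vals[i].base;
            *frac = demo_csv2_vals[i].frac;
            return true;
        }
    }
    return false;  /* unknown name: outputs are left unchanged */
}
```

A table keeps the set of legal combinations explicit, so a value that the
architecture does not define is rejected instead of being split blindly at
the dot.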

3. ARM CPU Model Hierarchy:

A small named model layer is defined on top of the properties. An ARM named
CPU model defines a list of property values and a parent model. A child
model naturally inherits all the properties from its parent and can
override them when needed.

The initial model hierarchy shipped here is:
    kvm-base-v1                  KVM-imposed quirks
      arm-v8_4-a-v1              ARMv8.4-A architectural mandate
        neoverse-v1-v1           Neoverse V1
          graviton3-v1           AWS Graviton3
        arm-v9_0-a-v1            ARMv9.0-A architectural deltas on top of
                                 arm-v8_4-a-v1
          neoverse-v2-v1         Neoverse V2
            grace-v1             NVIDIA Grace

(kvm-base-v1 and arm-vX are not meant to be realizable unless the
 user provides values for implementation-defined fields)

So for example, grace-v1 defines Crypto fields and CTR_EL0.IDC/DIC on top
of neoverse-v2-v1, which leaves those fields vendor-configurable.

The hierarchy reflects a deliberate trade-off:
  - Architecture-level models (arm-v8_4-a-v1) maximize migration
    compatibility but lack implementation-defined values.
  - Reference-core models (neoverse-v2-v1) enable migration across
    SoCs sharing the same core design.
  - SoC models (grace-v1) expose the full hardware feature set but
    limit migration to hosts with the same SoC.

At model realization time:
    1. A clean slate of cpu.isar.idregs[] is created using
       arm_idregs_reset_to_defaults().
    2. Then, the model's full parent chain is walked and all properties are
       applied in order from parent to child.
    3. Finally, kvm_arm_writeback_idregs() compares the model's desired
       ID-register values against the host-provided cpreg snapshot and
       writes back the writable bits, warning on any non-writable
       difference.
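
Steps 1 and 2 above can be sketched as a reset followed by a recursive walk
that applies the root of the parent chain first, so that child settings
override inherited ones. This is a simplified standalone model, not the
series' code: fields are stored as a flat array instead of packed registers,
and all Demo*/demo_* names and the model contents are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_F_AES, DEMO_F_SHA1, DEMO_F_IDC, DEMO_F_COUNT };

typedef struct {
    int field;
    uint64_t value;
} DemoSetting;

typedef struct DemoModel {
    const char *name;
    const struct DemoModel *parent;  /* NULL for the root model */
    const DemoSetting *settings;
    size_t nsettings;
} DemoModel;

/* Apply ancestors first, then this model, so children override parents. */
static void demo_apply_model(const DemoModel *m,
                             uint64_t fields[DEMO_F_COUNT])
{
    if (m == NULL) {
        return;
    }
    demo_apply_model(m->parent, fields);
    for (size_t i = 0; i < m->nsettings; i++) {
        int f = m->settings[i].field;
        assert(f >= 0 && f < DEMO_F_COUNT);
        fields[f] = m->settings[i].value;
    }
}

/* Step 1: reset to defaults (all zero here). Step 2: walk the chain. */
static void demo_realize(const DemoModel *m, uint64_t fields[DEMO_F_COUNT])
{
    for (int i = 0; i < DEMO_F_COUNT; i++) {
        fields[i] = 0;
    }
    demo_apply_model(m, fields);
}

/* Illustrative values only, not the contents of any real model. */
static const DemoSetting demo_core_settings[] = {
    { DEMO_F_AES, 1 }, { DEMO_F_SHA1, 1 },
};
static const DemoSetting demo_soc_settings[] = {
    { DEMO_F_AES, 2 }, { DEMO_F_IDC, 1 },
};
static const DemoModel demo_core = {
    "demo-core-v1", NULL, demo_core_settings, 2
};
static const DemoModel demo_soc = {
    "demo-soc-v1", &demo_core, demo_soc_settings, 2
};
```

Because every realization starts from the defaults, a field that no model in
the chain mentions (for instance one added to the table later) simply keeps
its default, which is the forward-compatibility property the design relies
on.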

Models will follow a monotonic versioning convention (grace-v1, grace-v2,
...) mirroring x86's scheme.

* Please take the CPU model property values with a grain of salt.
  They are added based on what the guest-visible values are with "host"
  model on available hardware.

Benefits of this design:
    - General benefits that come with properties and named CPU models,
      like cross-host live migration, management-stack control over
      feature exposure, etc.
    - Forward compatibility: when a new ID register or field is
      introduced, CPU models need not change; during realization they
      will be populated with the default values. Only ID register/field
      information needs to be added to the field table.
    - As CPU models are hierarchical, defining a new model is much easier.
    - The property names and values are self-documenting.

NOTE: ~2200 of the ~3300 added lines are declarative (field table,
model definitions, properties, etc.)

Tested with KVM on an NVIDIA Grace host.

Relationship with existing code base:
 - It does not change any TCG-based code paths.
 - For KVM host passthrough it just adds property support.
 - Does not change any existing properties or other code paths.
 - Can layer on top of the SYSREG_ property series [1].

Planned Follow-ups:
    - Composite properties with handling of sve, pauth for named models.
    - CLIDR_EL1 and CCSIDR_EL1 handling.
    - Safe-value based validation logic.
    - QMP commands like query-cpu-model-expansion are not hooked up yet.
      Blockers and supported values (calculated using safe-value tags
      and runtime KVM writable masks) will be reported through them.
      E.g. libvirt could report:
        <property name='feat_AES' type='string' value='pmull'
                  supports='off,aes,pmull'/>
      and:
        <cpu type='kvm' name='nvidia-grace-v1'
             typename='arm-nvidia-grace-v1-arm-cpu' usable='no'>
          <blocker name='feat_AES'/>
        </cpu>
    - DCZID_EL0 handling.

Out of Scope:
    - Inter-feature dependencies like FP <--> AdvSIMD, SM3 <--> SM4, etc.

Appendix: KVM Non-writable fields (kernel 6.18):

These fields pass the host value through to the guest unmodified on
6.18; an attempt to override them triggers a warning from
kvm_arm_writeback_idregs() and the slot retains the host value:

   #   Field                       #   Field
  ---  -----------------------    ---  -----------------------
   1   ID_AA64PFR0.FP              10  ID_AA64MMFR2.NV
   2   ID_AA64PFR0.AdvSIMD         11  ID_AA64MMFR2.CCIDX
   3   ID_AA64MMFR0.ASIDBITS       12  ID_AA64MMFR4.E2H0
   4   ID_AA64MMFR1.XNX            13  ID_AA64DFR0.CTX_CMPs
   5   ID_AA64MMFR1.VH             14  ID_AA64DFR0.BRPs
   6   ID_AA64MMFR1.VMIDBits       15  CTR_EL0.CWG
   7   ID_AA64MMFR2.EVT            16  CTR_EL0.ERG
   8   ID_AA64MMFR2.FWB            17  DCZID_EL0.*
   9   ID_AA64MMFR2.IDS

This list shifts with the kernel version. The runtime probe via
KVM_ARM_GET_REG_WRITABLE_MASKS is authoritative.
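
To make the effect of a writable mask concrete, the following standalone
sketch merges a model's desired register value with the host value under a
given mask and reports which differing bits had to be dropped. It is an
illustration only: DemoWriteback and demo_writeback are hypothetical names,
the mask is passed in by the caller instead of being probed from KVM, and the
real kvm_arm_writeback_idregs() may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t value;     /* value that would be written back to KVM */
    uint64_t rejected;  /* bits that differ from the host but are read-only */
} DemoWriteback;

/*
 * Keep host bits wherever the mask says "not writable", take the model's
 * bits elsewhere, and remember which requested changes were dropped so the
 * caller can warn about them.
 */
static DemoWriteback demo_writeback(uint64_t host, uint64_t desired,
                                    uint64_t writable_mask)
{
    DemoWriteback r;
    uint64_t diff = host ^ desired;

    r.rejected = diff & ~writable_mask;
    r.value = (host & ~writable_mask) | (desired & writable_mask);
    return r;
}
```

For example, with only bits [7:4] writable, host 0x21 and desired 0x12, the
merged value is 0x11 and the rejected bits are 0x03: the change in bits [7:4]
is applied, while bits [3:0] keep the host value and would trigger the
warning.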

Warm Regards,
 Shaju, Khushit

Khushit Shah (1):
  target/arm/kvm: enable writable implementation ID registers

Shaju Abraham (12):
  target/arm: named_cpu_model: define containers for ID registers and
    fields
  target/arm: named_cpu_model: Add ID Register Fields
  target/arm: named_cpu_model: initialise additional sysregs
  target/arm: named_cpu_model: generate tables for Arm64 ID registers
    and fields
  target/arm: named_cpu_model: replace FIELD macro with IDREG_FIELD
  target/arm: named_cpu_model: data-structures required for the ARM
    property layer.
  target/arm: named_cpu_model: define ARM properties
  target/arm: named_cpu_model: generate arm_cpu_props[] table
  target/arm: named_cpu_model: Add ID register field helper functions
  target/arm: named_cpu_model: Register Arm64 properties for host model
  target/arm: named_cpu_model: introduce named CPU models for selected
    CPUs
  target/arm: named_cpu_model: writeback modified ID registers to KVM

 hw/arm/virt.c                   |    8 +
 target/arm/arm-cpu-frac.inc.h   |   34 +
 target/arm/arm-cpu-models.c     |  214 ++++
 target/arm/arm-cpu-models.h     |   43 +
 target/arm/arm-cpu-props.c      |  259 +++++
 target/arm/arm-cpu-props.h      |   36 +
 target/arm/arm-cpu-props.inc.h  |  180 ++++
 target/arm/arm-v8_4-a-v1.inc.h  |   22 +
 target/arm/arm-v9_0-a-v1.inc.h  |   28 +
 target/arm/cpu-features.h       |  232 +----
 target/arm/cpu-idregs.c         |  232 +++++
 target/arm/cpu-idregs.h         |  132 +++
 target/arm/cpu-idregs.h.inc     | 1724 +++++++++++++++++++++++++++++++
 target/arm/cpu-sysregs.h.inc    |    5 +
 target/arm/cpu64.c              |    3 +-
 target/arm/grace-v1.inc.h       |   17 +
 target/arm/graviton3-v1.inc.h   |   16 +
 target/arm/kvm-base-v1.inc.h    |   13 +
 target/arm/kvm.c                |  167 ++-
 target/arm/meson.build          |    7 +-
 target/arm/neoverse-v1-v1.inc.h |   64 ++
 target/arm/neoverse-v2-v1.inc.h |   64 ++
 target/arm/trace-events         |    1 +
 23 files changed, 3284 insertions(+), 217 deletions(-)
 create mode 100644 target/arm/arm-cpu-frac.inc.h
 create mode 100644 target/arm/arm-cpu-models.c
 create mode 100644 target/arm/arm-cpu-models.h
 create mode 100644 target/arm/arm-cpu-props.c
 create mode 100644 target/arm/arm-cpu-props.h
 create mode 100644 target/arm/arm-cpu-props.inc.h
 create mode 100644 target/arm/arm-v8_4-a-v1.inc.h
 create mode 100644 target/arm/arm-v9_0-a-v1.inc.h
 create mode 100644 target/arm/cpu-idregs.c
 create mode 100644 target/arm/cpu-idregs.h
 create mode 100644 target/arm/cpu-idregs.h.inc
 create mode 100644 target/arm/grace-v1.inc.h
 create mode 100644 target/arm/graviton3-v1.inc.h
 create mode 100644 target/arm/kvm-base-v1.inc.h
 create mode 100644 target/arm/neoverse-v1-v1.inc.h
 create mode 100644 target/arm/neoverse-v2-v1.inc.h

--
2.52.0

