Re: [PATCH 20/20] target/riscv: add 'kvm_supported' class property

2023-09-04 Thread Andrew Jones
On Fri, Sep 01, 2023 at 05:57:46PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 8/31/23 09:47, Andrew Jones wrote:
> > On Fri, Aug 25, 2023 at 10:08:53AM -0300, Daniel Henrique Barboza wrote:
> > > This follows the same idea of 'tcg_support' property added in the
> > > previous patch. Note that we're now implementing the 'cpu_realizefn' for
> > > the KVMAccel class since this verification is done in realize() time.
> > > 
> > > Supporting vendor CPUs with KVM is not possible. We rely on the
> > > extension support of the KVM module running in the host, making it
> > > impossible to guarantee that a vendor CPU will have all the required
> > > extensions available. The only way to guarantee that a vendor CPU is KVM
> > > compatible is running KVM in a host that has the same vendor CPU, and
> > 
> > Or to attempt to enable each extension which the vendor CPU expects and
> > to attempt to disable everything else. If all those actions succeed, then
> > we can override the ID registers with those of the CPU we want to model
> > and go for it. There's still risk, though, that the guest kernel will see
> > the ID registers of the model and attempt to apply some errata workaround
> > which may or may not work and/or crash the guest.
> 
> This can also happen when migrating the guest from a host that happens to have
> an errata to one that doesn't have, regardless of the CPU type the guest
> is using (host CPU vs vendor CPU). The guest would need a power cycle to
> identify the current model ID.

We shouldn't migrate a 'host' CPU model anywhere other than to an exactly
identical host (same ID registers, same errata). Also, migration must
consider the host kernel. The aim is to support "ping-pong" migration,
i.e. migrate A->B->A, where B has a host kernel which is the same or more
recent than A. This is a reasonable level of support, as it supports host
upgrades with rollback. B cannot be older than A, as it may not handle
errata in the same way.

> 
> We don't have the tooling needed to mitigate this risk in QEMU I'm afraid. 
> Upper
> layers like libvirt are more able to deal with it.

And higher layers yet, libvirt daemons capture all the information of the
hosts they run on. Layers above libvirt compare information from all hosts
under their control to create sets of possible migration destinations for
each VM, considering the VM configurations.

> 
> > 
> > > for this case we already have the 'host' CPU type.
> > > 
> > > We're better of declaring that all vendors CPUs are not KVM capable.
> > > After this patch, running KVM accel with a vendor CPU will produce an
> > > error like the following:
> > > 
> > > $ ./qemu-system-riscv64 -M virt,accel=kvm -cpu veyron-v1
> > > qemu-system-riscv64: 'veyron-v1' CPU is not compatible with KVM 
> > > acceleration
> > 
> > Shouldn't we at least check if the host matches the requested CPU first?
> > So, if we happen to be on a veyron-v1, then the veyron-v1 model should
> > be equivalent to 'host'. (They may not be 100% equivalent in practice, but
> > theoretically they should be, which means trying it and debugging the bugs
> > should improve the CPU models on both sides.)
> 
> If we're really going this route we would need to match host and vendor CPU
> in the extension level, matching each vendor CPU extension with what the
> CPU can provide, failing if the host can't provide all extensions the vendor
> CPU requires.

We can't support arbitrary vendor CPU models on arbitrary hosts. I'm only
advocating for supporting CPU model XYZ when KVM is running on XYZ CPUs
or compatible CPUs (more on the compatible CPUs later).

To elaborate, I don't really see a problem with expecting KVM to provide a
VCPU which matches the CPU model of the physical CPU which KVM is running
on (minus M-mode). KVM should be steadily learning how to expose all
extensions of the CPUs it runs on to its guests. So, while it may not be
possible now to enable all extensions of a particular model, it should be
eventually. If there are extensions in the CPU model which cannot be
virtualized, then it may be tolerable for QEMU to just warn about them,
rather than abort the whole thing (hopefully we don't have any of those
anyway). And, the "VCPU only almost matching the CPU model" problem isn't
much different than the "VCPU not actually matching the host CPU when
using '-cpu host'" problem. In both cases, a user may not be pleased that
they didn't get exactly what they asked for. At least with the CPU model,
QEMU will be aware of the differences and can warn about them.

> I wouldn't even bother checking for things like machine ID since
> they can be easily impersonated (e.g. use a rv64 emulated host, edit 
> mvendorid)
> and can't be trusted.

We should definitely check the ID registers. If KVM says it's running on
XYZ CPUs, then we should consider allowing the XYZ model to be used with
KVM guests. If the host is emulated and the user configured things in
a strange way, then, when things blow up, they can 

Re: [PATCH 20/20] target/riscv: add 'kvm_supported' class property

2023-09-01 Thread Daniel Henrique Barboza




On 8/31/23 09:47, Andrew Jones wrote:

On Fri, Aug 25, 2023 at 10:08:53AM -0300, Daniel Henrique Barboza wrote:

This follows the same idea of 'tcg_support' property added in the
previous patch. Note that we're now implementing the 'cpu_realizefn' for
the KVMAccel class since this verification is done in realize() time.

Supporting vendor CPUs with KVM is not possible. We rely on the
extension support of the KVM module running in the host, making it
impossible to guarantee that a vendor CPU will have all the required
extensions available. The only way to guarantee that a vendor CPU is KVM
compatible is running KVM in a host that has the same vendor CPU, and


Or to attempt to enable each extension which the vendor CPU expects and
to attempt to disable everything else. If all those actions succeed, then
we can override the ID registers with those of the CPU we want to model
and go for it. There's still risk, though, that the guest kernel will see
the ID registers of the model and attempt to apply some errata workaround
which may or may not work and/or crash the guest.


This can also happen when migrating the guest from a host that happens to have
an errata to one that doesn't have, regardless of the CPU type the guest
is using (host CPU vs vendor CPU). The guest would need a power cycle to
identify the current model ID.

We don't have the tooling needed to mitigate this risk in QEMU I'm afraid. Upper
layers like libvirt are more able to deal with it.




for this case we already have the 'host' CPU type.

We're better of declaring that all vendors CPUs are not KVM capable.
After this patch, running KVM accel with a vendor CPU will produce an
error like the following:

$ ./qemu-system-riscv64 -M virt,accel=kvm -cpu veyron-v1
qemu-system-riscv64: 'veyron-v1' CPU is not compatible with KVM acceleration


Shouldn't we at least check if the host matches the requested CPU first?
So, if we happen to be on a veyron-v1, then the veyron-v1 model should
be equivalent to 'host'. (They may not be 100% equivalent in practice, but
theoretically they should be, which means trying it and debugging the bugs
should improve the CPU models on both sides.)


If we're really going this route we would need to match host and vendor CPU
in the extension level, matching each vendor CPU extension with what the
CPU can provide, failing if the host can't provide all extensions the vendor
CPU requires. I wouldn't even bother checking for things like machine ID since
they can be easily impersonated (e.g. use a rv64 emulated host, edit mvendorid)
and can't be trusted.

TBH I am not thrilled with the idea of supporting vendor CPUs with KVM. The user
can pick the 'host' CPU to have the most capable KVM CPU available in the host,
and that is already not trivial to support in cases like live migration and so
on. Vendor CPU KVM support will promote things like:

"I tried to use a veyron-v2 KVM CPU in a veyron-v1 host, why is that not 
possible
it should be it's not fair"

"why can't I use a vendor X KVM CPU A into a vendor Y CPU B host it surely 
should
work since CPU A is older than CPU B right"

And then, even if we decide to support vendor CPUs with KVM in a feasible way, 
with
a lot of conditions and training wheels, we'll be so restrictive that the user 
will
be better of using the 'host' CPU anyway.


All this said, there's a lot going on in this series already and this vendor 
CPU + KVM
discussion might deserve its own RFC/thread. I'll drop this patch from the 
series to
give us time to discuss this properly. Let's leave it as is for now.


Thanks,

Daniel






Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu-qom.h |  1 +
  target/riscv/cpu.c |  1 +
  target/riscv/kvm/kvm-cpu.c | 24 
  3 files changed, 26 insertions(+)

diff --git a/target/riscv/cpu-qom.h b/target/riscv/cpu-qom.h
index e86b76f9fe..32d9bb07b4 100644
--- a/target/riscv/cpu-qom.h
+++ b/target/riscv/cpu-qom.h
@@ -72,5 +72,6 @@ struct RISCVCPUClass {
  
  bool user_extension_properties;

  bool tcg_supported;
+bool kvm_supported;
  };
  #endif /* RISCV_CPU_QOM_H */
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f749ea2a2e..73302bb72a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1646,6 +1646,7 @@ static void riscv_dynamic_cpu_class_init(ObjectClass *c, 
void *data)
  
  rcc->user_extension_properties = true;

  rcc->tcg_supported = true;
+rcc->kvm_supported = true;
  }
  
  static void riscv_vendor_cpu_class_init(ObjectClass *c, void *data)

diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index 501384924b..85f3b8c80e 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -1289,6 +1289,7 @@ static void riscv_kvm_cpu_class_init(ObjectClass *c, void 
*data)
  RISCVCPUClass *rcc = RISCV_CPU_CLASS(c);
  
  rcc->user_extension_properties = true;

+rcc->kvm_supported = true;
  }
  
  static const TypeInfo 

Re: [PATCH 20/20] target/riscv: add 'kvm_supported' class property

2023-08-31 Thread Andrew Jones
On Fri, Aug 25, 2023 at 10:08:53AM -0300, Daniel Henrique Barboza wrote:
> This follows the same idea of 'tcg_support' property added in the
> previous patch. Note that we're now implementing the 'cpu_realizefn' for
> the KVMAccel class since this verification is done in realize() time.
> 
> Supporting vendor CPUs with KVM is not possible. We rely on the
> extension support of the KVM module running in the host, making it
> impossible to guarantee that a vendor CPU will have all the required
> extensions available. The only way to guarantee that a vendor CPU is KVM
> compatible is running KVM in a host that has the same vendor CPU, and

Or to attempt to enable each extension which the vendor CPU expects and
to attempt to disable everything else. If all those actions succeed, then
we can override the ID registers with those of the CPU we want to model
and go for it. There's still risk, though, that the guest kernel will see
the ID registers of the model and attempt to apply some errata workaround
which may or may not work and/or crash the guest.

> for this case we already have the 'host' CPU type.
> 
> We're better of declaring that all vendors CPUs are not KVM capable.
> After this patch, running KVM accel with a vendor CPU will produce an
> error like the following:
> 
> $ ./qemu-system-riscv64 -M virt,accel=kvm -cpu veyron-v1
> qemu-system-riscv64: 'veyron-v1' CPU is not compatible with KVM acceleration

Shouldn't we at least check if the host matches the requested CPU first?
So, if we happen to be on a veyron-v1, then the veyron-v1 model should
be equivalent to 'host'. (They may not be 100% equivalent in practice, but
theoretically they should be, which means trying it and debugging the bugs
should improve the CPU models on both sides.)

> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  target/riscv/cpu-qom.h |  1 +
>  target/riscv/cpu.c |  1 +
>  target/riscv/kvm/kvm-cpu.c | 24 
>  3 files changed, 26 insertions(+)
> 
> diff --git a/target/riscv/cpu-qom.h b/target/riscv/cpu-qom.h
> index e86b76f9fe..32d9bb07b4 100644
> --- a/target/riscv/cpu-qom.h
> +++ b/target/riscv/cpu-qom.h
> @@ -72,5 +72,6 @@ struct RISCVCPUClass {
>  
>  bool user_extension_properties;
>  bool tcg_supported;
> +bool kvm_supported;
>  };
>  #endif /* RISCV_CPU_QOM_H */
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index f749ea2a2e..73302bb72a 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -1646,6 +1646,7 @@ static void riscv_dynamic_cpu_class_init(ObjectClass 
> *c, void *data)
>  
>  rcc->user_extension_properties = true;
>  rcc->tcg_supported = true;
> +rcc->kvm_supported = true;
>  }
>  
>  static void riscv_vendor_cpu_class_init(ObjectClass *c, void *data)
> diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
> index 501384924b..85f3b8c80e 100644
> --- a/target/riscv/kvm/kvm-cpu.c
> +++ b/target/riscv/kvm/kvm-cpu.c
> @@ -1289,6 +1289,7 @@ static void riscv_kvm_cpu_class_init(ObjectClass *c, 
> void *data)
>  RISCVCPUClass *rcc = RISCV_CPU_CLASS(c);
>  
>  rcc->user_extension_properties = true;
> +rcc->kvm_supported = true;
>  }
>  
>  static const TypeInfo riscv_kvm_cpu_type_infos[] = {
> @@ -1302,6 +1303,28 @@ static const TypeInfo riscv_kvm_cpu_type_infos[] = {
>  
>  DEFINE_TYPES(riscv_kvm_cpu_type_infos)
>  
> +/*
> + * We'll get here via the following path:
> + *
> + * riscv_cpu_realize()
> + *   -> cpu_exec_realizefn()
> + *  -> kvm_cpu_realizefn() (via accel_cpu_realizefn())
> + */
> +static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
> +{
> +RISCVCPU *cpu = RISCV_CPU(cs);
> +RISCVCPUClass *rcc = RISCV_CPU_GET_CLASS(cpu);
> +
> +if (!rcc->kvm_supported) {
> +g_autofree char *name = riscv_cpu_get_name(rcc);
> +error_setg(errp, "'%s' CPU is not compatible with KVM acceleration",
> +   name);
> +return false;
> +}
> +
> +return true;
> +}
> +
>  static void kvm_cpu_instance_init(CPUState *cs)
>  {
>  Object *obj = OBJECT(RISCV_CPU(cs));
> @@ -1328,6 +1351,7 @@ static void kvm_cpu_accel_class_init(ObjectClass *oc, 
> void *data)
>  AccelCPUClass *acc = ACCEL_CPU_CLASS(oc);
>  
>  acc->cpu_instance_init = kvm_cpu_instance_init;
> +acc->cpu_realizefn = kvm_cpu_realizefn;
>  }
>  
>  static const TypeInfo kvm_cpu_accel_type_info = {
> -- 
> 2.41.0
> 
>

I don't think we want kvm_supported nor tcg_supported as they necessarily
bring accelerator knowledge into target/riscv/cpu.c, where the booleans
have to be set. It would be better if the decisions as to what is
supported or not are made in the accelerator's respective files. So, we
try to realize some model with some accel and let that accel do some
checks and error out when it can't. If riscv kvm only supports 'host',
then it's check would simply be "is the model host?" and the inverse for
tcg.

Thanks,
drew



[PATCH 20/20] target/riscv: add 'kvm_supported' class property

2023-08-25 Thread Daniel Henrique Barboza
This follows the same idea of 'tcg_support' property added in the
previous patch. Note that we're now implementing the 'cpu_realizefn' for
the KVMAccel class since this verification is done in realize() time.

Supporting vendor CPUs with KVM is not possible. We rely on the
extension support of the KVM module running in the host, making it
impossible to guarantee that a vendor CPU will have all the required
extensions available. The only way to guarantee that a vendor CPU is KVM
compatible is running KVM in a host that has the same vendor CPU, and
for this case we already have the 'host' CPU type.

We're better of declaring that all vendors CPUs are not KVM capable.
After this patch, running KVM accel with a vendor CPU will produce an
error like the following:

$ ./qemu-system-riscv64 -M virt,accel=kvm -cpu veyron-v1
qemu-system-riscv64: 'veyron-v1' CPU is not compatible with KVM acceleration

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu-qom.h |  1 +
 target/riscv/cpu.c |  1 +
 target/riscv/kvm/kvm-cpu.c | 24 
 3 files changed, 26 insertions(+)

diff --git a/target/riscv/cpu-qom.h b/target/riscv/cpu-qom.h
index e86b76f9fe..32d9bb07b4 100644
--- a/target/riscv/cpu-qom.h
+++ b/target/riscv/cpu-qom.h
@@ -72,5 +72,6 @@ struct RISCVCPUClass {
 
 bool user_extension_properties;
 bool tcg_supported;
+bool kvm_supported;
 };
 #endif /* RISCV_CPU_QOM_H */
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f749ea2a2e..73302bb72a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1646,6 +1646,7 @@ static void riscv_dynamic_cpu_class_init(ObjectClass *c, 
void *data)
 
 rcc->user_extension_properties = true;
 rcc->tcg_supported = true;
+rcc->kvm_supported = true;
 }
 
 static void riscv_vendor_cpu_class_init(ObjectClass *c, void *data)
diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index 501384924b..85f3b8c80e 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -1289,6 +1289,7 @@ static void riscv_kvm_cpu_class_init(ObjectClass *c, void 
*data)
 RISCVCPUClass *rcc = RISCV_CPU_CLASS(c);
 
 rcc->user_extension_properties = true;
+rcc->kvm_supported = true;
 }
 
 static const TypeInfo riscv_kvm_cpu_type_infos[] = {
@@ -1302,6 +1303,28 @@ static const TypeInfo riscv_kvm_cpu_type_infos[] = {
 
 DEFINE_TYPES(riscv_kvm_cpu_type_infos)
 
+/*
+ * We'll get here via the following path:
+ *
+ * riscv_cpu_realize()
+ *   -> cpu_exec_realizefn()
+ *  -> kvm_cpu_realizefn() (via accel_cpu_realizefn())
+ */
+static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
+{
+RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPUClass *rcc = RISCV_CPU_GET_CLASS(cpu);
+
+if (!rcc->kvm_supported) {
+g_autofree char *name = riscv_cpu_get_name(rcc);
+error_setg(errp, "'%s' CPU is not compatible with KVM acceleration",
+   name);
+return false;
+}
+
+return true;
+}
+
 static void kvm_cpu_instance_init(CPUState *cs)
 {
 Object *obj = OBJECT(RISCV_CPU(cs));
@@ -1328,6 +1351,7 @@ static void kvm_cpu_accel_class_init(ObjectClass *oc, 
void *data)
 AccelCPUClass *acc = ACCEL_CPU_CLASS(oc);
 
 acc->cpu_instance_init = kvm_cpu_instance_init;
+acc->cpu_realizefn = kvm_cpu_realizefn;
 }
 
 static const TypeInfo kvm_cpu_accel_type_info = {
-- 
2.41.0