Re: [PATCH v2] virtio: kconfig: memory devices are PCI only

2024-09-25 Thread Pankaj Gupta
> Virtio memory devices rely on PCI BARs to expose the contents of memory.
> Because of this they cannot be used (yet) with virtio-mmio or virtio-ccw.
> In fact the code that is common to virtio-mem and virtio-pmem, which
> is in hw/virtio/virtio-md-pci.c, is only included if CONFIG_VIRTIO_PCI
> is set.  Reproduce the same condition in the Kconfig file, only allowing
> VIRTIO_MEM and VIRTIO_PMEM to be defined if the transport supports it.
>
> Without this patch it is possible to create a configuration with
> CONFIG_VIRTIO_PCI=n and CONFIG_VIRTIO_MEM=y, but that causes a
> linking failure.
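The gating pattern the patch uses can be sketched in isolation (hypothetical symbol names marked SOME_*, not the actual hw/virtio/Kconfig):

```kconfig
# A transport that can host memory devices advertises it:
config SOME_TRANSPORT
    bool
    select VIRTIO_MD_SUPPORTED

# Set only when at least one such transport is built in:
config VIRTIO_MD_SUPPORTED
    bool

# The device is then only selectable when some transport supports it,
# so its code is never compiled into a transport-less build:
config SOME_MEMORY_DEVICE
    bool
    depends on VIRTIO_MD_SUPPORTED
```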

Just curious: what would be required to make virtio-mem and virtio-pmem
compatible with virtio-mmio?

Maybe late, but still:
Reviewed-by: Pankaj Gupta 

Thanks,
Pankaj

>
> Cc: David Hildenbrand 
> Reported-by: Michael Tokarev 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/virtio/Kconfig | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/hw/virtio/Kconfig b/hw/virtio/Kconfig
> index aa63ff7fd41..0afec2ae929 100644
> --- a/hw/virtio/Kconfig
> +++ b/hw/virtio/Kconfig
> @@ -16,6 +16,7 @@ config VIRTIO_PCI
>  default y if PCI_DEVICES
>  depends on PCI
>  select VIRTIO
> +select VIRTIO_MD_SUPPORTED
>
>  config VIRTIO_MMIO
>  bool
> @@ -35,10 +36,17 @@ config VIRTIO_CRYPTO
>  default y
>  depends on VIRTIO
>
> +# not all virtio transports support memory devices; if none does,
> +# no need to include the code
> +config VIRTIO_MD_SUPPORTED
> +bool
> +
>  config VIRTIO_MD
>  bool
> +depends on VIRTIO_MD_SUPPORTED
>  select MEM_DEVICE
>
> +# selected by the board if it has the required support code
>  config VIRTIO_PMEM_SUPPORTED
>  bool
>
> @@ -46,9 +54,11 @@ config VIRTIO_PMEM
>  bool
>  default y
>  depends on VIRTIO
> +depends on VIRTIO_MD_SUPPORTED
>  depends on VIRTIO_PMEM_SUPPORTED
>  select VIRTIO_MD
>
> +# selected by the board if it has the required support code
>  config VIRTIO_MEM_SUPPORTED
>  bool
>
> @@ -57,6 +67,7 @@ config VIRTIO_MEM
>  default y
>  depends on VIRTIO
>  depends on LINUX
> +depends on VIRTIO_MD_SUPPORTED
>  depends on VIRTIO_MEM_SUPPORTED
>  select VIRTIO_MD
>
> --
> 2.46.0
>
>



[PATCH 2/3] i386/sev: Move SEV_COMMON null check before dereferencing

2024-06-07 Thread Pankaj Gupta
Coverity #1546886

Fixes: 9861405a8f ("i386/sev: Invoke launch_updata_data() for SEV class")
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 7c9df621de..f18432f58e 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1529,11 +1529,12 @@ int
 sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp)
 {
 SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
-SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(sev_common);
+SevCommonStateClass *klass;
 
 if (!sev_common) {
 return 0;
 }
+klass = SEV_COMMON_GET_CLASS(sev_common);
 
 /* if SEV is in update state then encrypt the data else do nothing */
 if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) {
-- 
2.34.1




[PATCH 0/3] snp: fix coverity reported issues

2024-06-07 Thread Pankaj Gupta
Pankaj Gupta (3):
  i386/sev: fix unreachable code coverity issue
  i386/sev: Move SEV_COMMON null check before dereferencing
  i386/sev: Return when sev_common is null

 target/i386/sev.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.34.1




[PATCH 3/3] i386/sev: Return when sev_common is null

2024-06-07 Thread Pankaj Gupta
Coverity #1546885

Fixes: 16dcf200dc ("i386/sev: Introduce "sev-common" type to encapsulate common SEV state")
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index f18432f58e..c40562dce3 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -587,6 +587,7 @@ static SevCapability *sev_get_capabilities(Error **errp)
 sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
 if (!sev_common) {
 error_setg(errp, "SEV is not configured");
+return NULL;
 }
 
 sev_device = object_property_get_str(OBJECT(sev_common), "sev-device",
-- 
2.34.1




[PATCH 1/3] i386/sev: fix unreachable code coverity issue

2024-06-07 Thread Pankaj Gupta
Set 'finish->id_block_en' only after the id_block has been read. At that
point 'len' cannot be zero, so the zero branch of '(len) ? 1 : 0' was
unreachable.

Coverity #1546887

Fixes: 7b34df4426 ("i386/sev: Introduce 'sev-snp-guest' object")
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 004c667ac1..7c9df621de 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -2165,6 +2165,7 @@ sev_snp_guest_set_id_block(Object *obj, const char *value, Error **errp)
 struct kvm_sev_snp_launch_finish *finish = &sev_snp_guest->kvm_finish_conf;
 gsize len;
 
+finish->id_block_en = 0;
 g_free(sev_snp_guest->id_block);
 g_free((guchar *)finish->id_block_uaddr);
 
@@ -2184,7 +2185,7 @@ sev_snp_guest_set_id_block(Object *obj, const char *value, Error **errp)
 return;
 }
 
-finish->id_block_en = (len) ? 1 : 0;
+finish->id_block_en = 1;
 }
 
 static char *
-- 
2.34.1




[PATCH v4 11/31] i386/cpu: Set SEV-SNP CPUID bit when SNP enabled

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

SNP guests will rely on this bit to determine certain feature support.

Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 target/i386/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bc2dceb647..914bef442c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6979,6 +6979,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 if (sev_enabled()) {
 *eax = 0x2;
 *eax |= sev_es_enabled() ? 0x8 : 0;
+*eax |= sev_snp_enabled() ? 0x10 : 0;
 *ebx = sev_get_cbit_position() & 0x3f; /* EBX[5:0] */
 *ebx |= (sev_get_reduced_phys_bits() & 0x3f) << 6; /* EBX[11:6] */
 }
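As a sanity check of the bit layout, the EAX value QEMU now reports can be decoded as below; the bit positions are taken from the constants in the hunk above (0x2 = SEV, 0x8 = SEV-ES, 0x10 = SEV-SNP), and the macro names here are illustrative, not QEMU's:

```c
#include <assert.h>
#include <stdint.h>

/* CPUID 0x8000001F EAX feature bits, matching the hunk above. */
#define CPUID_8000_001F_EAX_SEV     (1u << 1)   /* 0x2  */
#define CPUID_8000_001F_EAX_SEV_ES  (1u << 3)   /* 0x8  */
#define CPUID_8000_001F_EAX_SEV_SNP (1u << 4)   /* 0x10 */

/* Returns nonzero when the SEV-SNP feature bit is advertised. */
static int has_sev_snp(uint32_t eax)
{
    return !!(eax & CPUID_8000_001F_EAX_SEV_SNP);
}
```

An SNP guest therefore sees EAX containing 0x2 | 0x8 | 0x10, i.e. all three features reported.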
-- 
2.34.1




[PATCH v4 08/31] i386/sev: Add a sev_snp_enabled() helper

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

Add a simple helper to check if the current guest type is SNP. Also have
SNP-enabled imply that SEV-ES is enabled as well, and fix up any places
where the sev_es_enabled() check is expecting a pure/non-SNP guest.

Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 13 -
 target/i386/sev.h |  2 ++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 841b45f59b..f4f1971202 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -333,12 +333,21 @@ sev_enabled(void)
 return !!object_dynamic_cast(OBJECT(cgs), TYPE_SEV_COMMON);
 }
 
+bool
+sev_snp_enabled(void)
+{
+ConfidentialGuestSupport *cgs = MACHINE(qdev_get_machine())->cgs;
+
+return !!object_dynamic_cast(OBJECT(cgs), TYPE_SEV_SNP_GUEST);
+}
+
 bool
 sev_es_enabled(void)
 {
 ConfidentialGuestSupport *cgs = MACHINE(qdev_get_machine())->cgs;
 
-return sev_enabled() && (SEV_GUEST(cgs)->policy & SEV_POLICY_ES);
+return sev_snp_enabled() ||
+(sev_enabled() && SEV_GUEST(cgs)->policy & SEV_POLICY_ES);
 }
 
 uint32_t
@@ -954,7 +963,9 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
"support", __func__);
 goto err;
 }
+}
 
+if (sev_es_enabled() && !sev_snp_enabled()) {
 if (!(status.flags & SEV_STATUS_FLAGS_CONFIG_ES)) {
 error_setg(errp, "%s: guest policy requires SEV-ES, but "
  "host SEV-ES support unavailable",
diff --git a/target/i386/sev.h b/target/i386/sev.h
index bedc667eeb..94295ee74f 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -45,9 +45,11 @@ typedef struct SevKernelLoaderContext {
 #ifdef CONFIG_SEV
 bool sev_enabled(void);
 bool sev_es_enabled(void);
+bool sev_snp_enabled(void);
 #else
 #define sev_enabled() 0
 #define sev_es_enabled() 0
+#define sev_snp_enabled() 0
 #endif
 
 uint32_t sev_get_cbit_position(void);
-- 
2.34.1




[PATCH v4 02/31] linux-headers: Update to current kvm/next

2024-05-30 Thread Pankaj Gupta
Co-developed-by: Michael Roth 
Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 linux-headers/asm-loongarch/bitsperlong.h | 23 ++
 linux-headers/asm-loongarch/kvm.h |  4 ++
 linux-headers/asm-loongarch/mman.h|  9 
 linux-headers/asm-riscv/kvm.h |  1 +
 linux-headers/asm-riscv/mman.h| 36 +++-
 linux-headers/asm-s390/mman.h | 36 +++-
 linux-headers/asm-x86/kvm.h   | 52 ++-
 linux-headers/linux/vhost.h   | 15 ---
 8 files changed, 166 insertions(+), 10 deletions(-)

diff --git a/linux-headers/asm-loongarch/bitsperlong.h b/linux-headers/asm-loongarch/bitsperlong.h
index 6dc0bb0c13..485d60bee2 100644
--- a/linux-headers/asm-loongarch/bitsperlong.h
+++ b/linux-headers/asm-loongarch/bitsperlong.h
@@ -1 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_BITSPERLONG_H
+#define __ASM_BITSPERLONG_H
+
+#define __BITS_PER_LONG 64
+
 #include 
+
+#endif /* __ASM_BITSPERLONG_H */
diff --git a/linux-headers/asm-loongarch/kvm.h b/linux-headers/asm-loongarch/kvm.h
index 109785922c..f9abef3823 100644
--- a/linux-headers/asm-loongarch/kvm.h
+++ b/linux-headers/asm-loongarch/kvm.h
@@ -17,6 +17,8 @@
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 #define KVM_DIRTY_LOG_PAGE_OFFSET  64
 
+#define KVM_GUESTDBG_USE_SW_BP 0x0001
+
 /*
  * for KVM_GET_REGS and KVM_SET_REGS
  */
@@ -72,6 +74,8 @@ struct kvm_fpu {
 
 #define KVM_REG_LOONGARCH_COUNTER  (KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 1)
 #define KVM_REG_LOONGARCH_VCPU_RESET   (KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 2)
+/* Debugging: Special instruction for software breakpoint */
+#define KVM_REG_LOONGARCH_DEBUG_INST   (KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 3)
 
 #define LOONGARCH_REG_SHIFT 3
 #define LOONGARCH_REG_64(TYPE, REG)(TYPE | KVM_REG_SIZE_U64 | (REG << LOONGARCH_REG_SHIFT))
diff --git a/linux-headers/asm-loongarch/mman.h b/linux-headers/asm-loongarch/mman.h
index 8eebf89f5a..d0dbfe9587 100644
--- a/linux-headers/asm-loongarch/mman.h
+++ b/linux-headers/asm-loongarch/mman.h
@@ -1 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __ASM_MMAN_H
+#define __ASM_MMAN_H
+
 #include 
+
+#define PROT_BTI   0x10/* BTI guarded page */
+#define PROT_MTE   0x20/* Normal Tagged mapping */
+
+#endif /* ! _UAPI__ASM_MMAN_H */
diff --git a/linux-headers/asm-riscv/kvm.h b/linux-headers/asm-riscv/kvm.h
index b1c503c295..e878e7cc39 100644
--- a/linux-headers/asm-riscv/kvm.h
+++ b/linux-headers/asm-riscv/kvm.h
@@ -167,6 +167,7 @@ enum KVM_RISCV_ISA_EXT_ID {
KVM_RISCV_ISA_EXT_ZFA,
KVM_RISCV_ISA_EXT_ZTSO,
KVM_RISCV_ISA_EXT_ZACAS,
+   KVM_RISCV_ISA_EXT_SSCOFPMF,
KVM_RISCV_ISA_EXT_MAX,
 };
 
diff --git a/linux-headers/asm-riscv/mman.h b/linux-headers/asm-riscv/mman.h
index 8eebf89f5a..8db7c2a3be 100644
--- a/linux-headers/asm-riscv/mman.h
+++ b/linux-headers/asm-riscv/mman.h
@@ -1 +1,35 @@
-#include 
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#ifndef _ASM_POWERPC_MMAN_H
+#define _ASM_POWERPC_MMAN_H
+
+#include 
+
+
+#define PROT_SAO   0x10/* Strong Access Ordering */
+
+#define MAP_RENAME  MAP_ANONYMOUS   /* In SunOS terminology */
+#define MAP_NORESERVE   0x40/* don't reserve swap pages */
+#define MAP_LOCKED 0x80
+
+#define MAP_GROWSDOWN  0x0100  /* stack-like segment */
+#define MAP_DENYWRITE  0x0800  /* ETXTBSY */
+#define MAP_EXECUTABLE 0x1000  /* mark it as an executable */
+
+
+#define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
+#define MCL_FUTURE  0x4000  /* lock all additions to address space */
+#define MCL_ONFAULT0x8000  /* lock all pages that are faulted in */
+
+/* Override any generic PKEY permission defines */
+#define PKEY_DISABLE_EXECUTE   0x4
+#undef PKEY_ACCESS_MASK
+#define PKEY_ACCESS_MASK   (PK

[PATCH v4 00/31] Add AMD Secure Nested Paging (SEV-SNP) support

2024-05-30 Thread Pankaj Gupta
UID 0x801f when SNP enabled
 - updated query-sev to handle differences between SEV and SEV-SNP
 - updated to work against v5 of SEV-SNP host kernel / hypervisor patches

Brijesh Singh (6):
  i386/sev: Introduce 'sev-snp-guest' object
  i386/sev: Add the SNP launch start context
  i386/sev: Add handling to encrypt/finalize guest launch data
  hw/i386/sev: Add function to get SEV metadata from OVMF header
  i386/sev: Add support for populating OVMF metadata pages
  hw/i386/sev: Add support to encrypt BIOS when SEV-SNP is enabled

Dov Murik (3):
  i386/sev: Extract build_kernel_loader_hashes
  i386/sev: Reorder struct declarations
  i386/sev: Allow measured direct kernel boot on SNP

Michael Roth (12):
  i386/sev: Introduce "sev-common" type to encapsulate common SEV state
  i386/sev: Add a sev_snp_enabled() helper
  i386/cpu: Set SEV-SNP CPUID bit when SNP enabled
  i386/sev: Don't return launch measurements for SEV-SNP guests
  i386/sev: Update query-sev QAPI format to handle SEV-SNP
  i386/sev: Set CPU state to protected once SNP guest payload is
finalized
  i386/sev: Add support for SNP CPUID validation
  hw/i386/sev: Use guest_memfd for legacy ROMs
  hw/i386: Add support for loading BIOS using guest_memfd
  hw/i386/sev: Allow use of pflash in conjunction with -bios
  i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE
  i386/sev: Enable KVM_HC_MAP_GPA_RANGE hcall for SNP guests

Pankaj Gupta (9):
  i386/sev: Replace error_report with error_setg
  linux-headers: Update to current kvm/next
  i386/sev: Move sev_launch_update to separate class method
  i386/sev: Move sev_launch_finish to separate class method
  i386/sev: Add sev_kvm_init() override for SEV class
  i386/sev: Add snp_kvm_init() override for SNP class
  i386/sev: Add a class method to determine KVM VM type for SNP guests
  i386/sev: Invoke launch_updata_data() for SEV class
  i386/sev: Invoke launch_updata_data() for SNP class

Xiaoyao Li (1):
  memory: Introduce memory_region_init_ram_guest_memfd()

 docs/system/i386/amd-memory-encryption.rst |   70 +-
 hw/i386/pc.c   |   14 +-
 hw/i386/pc_sysfw.c |   76 +-
 hw/i386/x86-common.c   |   24 +-
 include/exec/memory.h  |6 +
 include/hw/i386/pc.h   |   28 +
 include/hw/i386/x86.h  |2 +-
 linux-headers/asm-loongarch/bitsperlong.h  |   23 +
 linux-headers/asm-loongarch/kvm.h  |4 +
 linux-headers/asm-loongarch/mman.h |9 +
 linux-headers/asm-riscv/kvm.h  |1 +
 linux-headers/asm-riscv/mman.h |   36 +-
 linux-headers/asm-s390/mman.h  |   36 +-
 linux-headers/asm-x86/kvm.h|   52 +-
 linux-headers/linux/vhost.h|   15 +-
 qapi/misc-target.json  |   72 +-
 qapi/qom.json  |   97 +-
 system/memory.c|   24 +
 target/i386/cpu.c  |1 +
 target/i386/kvm/kvm.c  |   55 +
 target/i386/kvm/kvm_i386.h |1 +
 target/i386/kvm/trace-events   |1 +
 target/i386/sev-sysemu-stub.c  |2 +-
 target/i386/sev.c  | 1588 +++-
 target/i386/sev.h  |   13 +-
 target/i386/trace-events   |3 +
 26 files changed, 1833 insertions(+), 420 deletions(-)

-- 
2.34.1




[PATCH v4 13/31] i386/sev: Add a class method to determine KVM VM type for SNP guests

2024-05-30 Thread Pankaj Gupta
SEV guests can use either KVM_X86_DEFAULT_VM, KVM_X86_SEV_VM,
or KVM_X86_SEV_ES_VM depending on the configuration and what
the host kernel supports. SNP guests on the other hand can only
ever use KVM_X86_SNP_VM, so split determination of VM type out
into a separate class method that can be set accordingly for
sev-guest vs. sev-snp-guest objects and add handling for SNP.
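Stripped of the QOM machinery, the change introduces a per-class function pointer that each guest type installs in its class_init; a minimal, hypothetical sketch of that dispatch pattern (simplified stand-in constants and types, not the real QEMU declarations):

```c
#include <assert.h>

/* Simplified stand-ins for the KVM VM type constants. */
enum { KVM_X86_DEFAULT_VM = 0, KVM_X86_SEV_VM = 2, KVM_X86_SNP_VM = 4 };

/* The "class" carries the method; each guest type installs its own. */
typedef struct GuestClass {
    int (*kvm_type)(void);
} GuestClass;

/* sev-guest: picks a VM type based on configuration (simplified here). */
static int sev_kvm_type(void)
{
    return KVM_X86_SEV_VM;
}

/* sev-snp-guest: can only ever be an SNP VM. */
static int sev_snp_kvm_type(void)
{
    return KVM_X86_SNP_VM;
}

/* Equivalent of each class_init installing its own method. */
static const GuestClass sev_class = { .kvm_type = sev_kvm_type };
static const GuestClass snp_class = { .kvm_type = sev_snp_kvm_type };
```

Callers then dispatch through the class pointer without knowing the concrete guest type, which is exactly what sev_common_kvm_init() does via x86_klass->kvm_type() in the diff below.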

Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 458ff5040d..8ca486f5d2 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -100,6 +100,9 @@ struct SevGuestState {
 
 struct SevGuestStateClass {
 SevCommonStateClass parent_class;
+
+/* public */
+int (*kvm_type)(X86ConfidentialGuest *cg);
 };
 
 struct SevSnpGuestState {
@@ -117,6 +120,9 @@ struct SevSnpGuestState {
 
 struct SevSnpGuestStateClass {
 SevCommonStateClass parent_class;
+
+/* public */
+int (*kvm_type)(X86ConfidentialGuest *cg);
 };
 
 #define DEFAULT_GUEST_POLICY0x1 /* disable debug */
@@ -893,6 +899,11 @@ out:
 return sev_common->kvm_type;
 }
 
+static int sev_snp_kvm_type(X86ConfidentialGuest *cg)
+{
+return KVM_X86_SNP_VM;
+}
+
 static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
 char *devname;
@@ -902,6 +913,8 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 struct sev_user_data_status status = {};
 SevCommonState *sev_common = SEV_COMMON(cgs);
 SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(cgs);
+X86ConfidentialGuestClass *x86_klass =
+   X86_CONFIDENTIAL_GUEST_GET_CLASS(cgs);
 
 sev_common->state = SEV_STATE_UNINIT;
 
@@ -972,7 +985,7 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 }
 
 trace_kvm_sev_init();
-if (sev_kvm_type(X86_CONFIDENTIAL_GUEST(sev_common)) == KVM_X86_DEFAULT_VM) {
+if (x86_klass->kvm_type(X86_CONFIDENTIAL_GUEST(sev_common)) == KVM_X86_DEFAULT_VM) {
 cmd = sev_es_enabled() ? KVM_SEV_ES_INIT : KVM_SEV_INIT;
 
 ret = sev_ioctl(sev_common->sev_fd, cmd, NULL, &fw_error);
@@ -1451,10 +1464,8 @@ static void
 sev_common_class_init(ObjectClass *oc, void *data)
 {
ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
-X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
 
 klass->kvm_init = sev_common_kvm_init;
-x86_klass->kvm_type = sev_kvm_type;
 
 object_class_property_add_str(oc, "sev-device",
   sev_common_get_sev_device,
@@ -1539,10 +1550,12 @@ static void
 sev_guest_class_init(ObjectClass *oc, void *data)
 {
 SevCommonStateClass *klass = SEV_COMMON_CLASS(oc);
+X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
 
 klass->launch_start = sev_launch_start;
 klass->launch_finish = sev_launch_finish;
 klass->kvm_init = sev_kvm_init;
+x86_klass->kvm_type = sev_kvm_type;
 
 object_class_property_add_str(oc, "dh-cert-file",
   sev_guest_get_dh_cert_file,
@@ -1781,8 +1794,10 @@ static void
 sev_snp_guest_class_init(ObjectClass *oc, void *data)
 {
 SevCommonStateClass *klass = SEV_COMMON_CLASS(oc);
+X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
 
 klass->kvm_init = sev_snp_kvm_init;
+x86_klass->kvm_type = sev_snp_kvm_type;
 
 object_class_property_add(oc, "policy", "uint64",
   sev_snp_guest_get_policy,
-- 
2.34.1




[PATCH v4 12/31] i386/sev: Don't return launch measurements for SEV-SNP guests

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

For SEV-SNP guests, launch measurement is queried from within the guest
during attestation, so don't attempt to return it as part of
query-sev-launch-measure.

Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 56c1cce8e7..458ff5040d 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -803,7 +803,9 @@ sev_launch_get_measure(Notifier *notifier, void *unused)
 
 static char *sev_get_launch_measurement(void)
 {
-SevGuestState *sev_guest = SEV_GUEST(MACHINE(qdev_get_machine())->cgs);
+ConfidentialGuestSupport *cgs = MACHINE(qdev_get_machine())->cgs;
+SevGuestState *sev_guest =
+(SevGuestState *)object_dynamic_cast(OBJECT(cgs), TYPE_SEV_GUEST);
 
 if (sev_guest &&
 SEV_COMMON(sev_guest)->state >= SEV_STATE_LAUNCH_SECRET) {
-- 
2.34.1




[PATCH v4 22/31] i386/sev: Reorder struct declarations

2024-05-30 Thread Pankaj Gupta
From: Dov Murik 

Move the declaration of PaddedSevHashTable before SevSnpGuest so
we can add a new such field to the latter.

No functional change intended.

Signed-off-by: Dov Murik 
Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 56 +++
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 831745c02a..1b29fdbc9a 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -45,6 +45,34 @@ OBJECT_DECLARE_TYPE(SevCommonState, SevCommonStateClass, SEV_COMMON)
 OBJECT_DECLARE_TYPE(SevGuestState, SevGuestStateClass, SEV_GUEST)
 OBJECT_DECLARE_TYPE(SevSnpGuestState, SevSnpGuestStateClass, SEV_SNP_GUEST)
 
+/* hard code sha256 digest size */
+#define HASH_SIZE 32
+
+typedef struct QEMU_PACKED SevHashTableEntry {
+QemuUUID guid;
+uint16_t len;
+uint8_t hash[HASH_SIZE];
+} SevHashTableEntry;
+
+typedef struct QEMU_PACKED SevHashTable {
+QemuUUID guid;
+uint16_t len;
+SevHashTableEntry cmdline;
+SevHashTableEntry initrd;
+SevHashTableEntry kernel;
+} SevHashTable;
+
+/*
+ * Data encrypted by sev_encrypt_flash() must be padded to a multiple of
+ * 16 bytes.
+ */
+typedef struct QEMU_PACKED PaddedSevHashTable {
+SevHashTable ht;
+uint8_t padding[ROUND_UP(sizeof(SevHashTable), 16) - sizeof(SevHashTable)];
+} PaddedSevHashTable;
+
+QEMU_BUILD_BUG_ON(sizeof(PaddedSevHashTable) % 16 != 0);
+
 struct SevCommonState {
 X86ConfidentialGuest parent_obj;
 
@@ -154,34 +182,6 @@ typedef struct QEMU_PACKED SevHashTableDescriptor {
 uint32_t size;
 } SevHashTableDescriptor;
 
-/* hard code sha256 digest size */
-#define HASH_SIZE 32
-
-typedef struct QEMU_PACKED SevHashTableEntry {
-QemuUUID guid;
-uint16_t len;
-uint8_t hash[HASH_SIZE];
-} SevHashTableEntry;
-
-typedef struct QEMU_PACKED SevHashTable {
-QemuUUID guid;
-uint16_t len;
-SevHashTableEntry cmdline;
-SevHashTableEntry initrd;
-SevHashTableEntry kernel;
-} SevHashTable;
-
-/*
- * Data encrypted by sev_encrypt_flash() must be padded to a multiple of
- * 16 bytes.
- */
-typedef struct QEMU_PACKED PaddedSevHashTable {
-SevHashTable ht;
-uint8_t padding[ROUND_UP(sizeof(SevHashTable), 16) - sizeof(SevHashTable)];
-} PaddedSevHashTable;
-
-QEMU_BUILD_BUG_ON(sizeof(PaddedSevHashTable) % 16 != 0);
-
 static Error *sev_mig_blocker;
 
 static const char *const sev_fw_errlist[] = {
-- 
2.34.1




[PATCH v4 01/31] i386/sev: Replace error_report with error_setg

2024-05-30 Thread Pankaj Gupta
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index d30b68c11e..67ed32e5ea 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -952,13 +952,13 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 
 if (sev_es_enabled()) {
 if (!kvm_kernel_irqchip_allowed()) {
-error_report("%s: SEV-ES guests require in-kernel irqchip support",
- __func__);
+error_setg(errp, "%s: SEV-ES guests require in-kernel irqchip "
+   "support", __func__);
 goto err;
 }
 
 if (!(status.flags & SEV_STATUS_FLAGS_CONFIG_ES)) {
-error_report("%s: guest policy requires SEV-ES, but "
+error_setg(errp, "%s: guest policy requires SEV-ES, but "
  "host SEV-ES support unavailable",
  __func__);
 goto err;
-- 
2.34.1




[PATCH v4 14/31] i386/sev: Update query-sev QAPI format to handle SEV-SNP

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

Most of the current 'query-sev' command is relevant to both legacy
SEV/SEV-ES guests and SEV-SNP guests, with 2 exceptions:

  - 'policy' is a 64-bit field for SEV-SNP, not 32-bit, and
the meaning of the bit positions has changed
  - 'handle' is not relevant to SEV-SNP

To address this, this patch adds a new 'sev-type' field that can be
used as a discriminator to select between SEV and SEV-SNP-specific
fields/formats without breaking compatibility for existing management
tools (so long as management tools that add support for launching
SEV-SNP guest update their handling of query-sev appropriately).

The corresponding HMP command has also been fixed up similarly.
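The discriminated union maps naturally onto a tagged struct on the consumer side. For illustration only (field names follow the QAPI schema in this patch; the struct layout and sample values are hypothetical, not part of the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { SEV_GUEST_TYPE_SEV, SEV_GUEST_TYPE_SEV_SNP } SevGuestType;

/* Client-side model of the new query-sev reply. */
typedef struct {
    bool enabled;
    uint8_t api_major, api_minor, build_id;
    SevGuestType sev_type;                      /* the discriminator */
    union {
        struct {                                /* 'sev' branch */
            uint32_t policy;                    /* 32-bit legacy policy */
            uint32_t handle;
        } sev;
        struct {                                /* 'sev-snp' branch */
            uint64_t snp_policy;                /* 64-bit SNP policy */
        } sev_snp;
    } u;
} SevInfo;

/* The policy lives in a different branch depending on the guest type. */
static uint64_t sev_info_policy(const SevInfo *info)
{
    return info->sev_type == SEV_GUEST_TYPE_SEV_SNP
         ? info->u.sev_snp.snp_policy
         : info->u.sev.policy;
}
```

A management tool that dispatches on 'sev-type' this way keeps working for legacy SEV/SEV-ES guests while gaining the wider SNP policy field.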

Signed-off-by: Michael Roth 
Co-developed-by: Pankaj Gupta 
Signed-off-by: Pankaj Gupta 
---
 qapi/misc-target.json | 72 ++-
 target/i386/sev.c | 55 +
 target/i386/sev.h |  3 ++
 3 files changed, 96 insertions(+), 34 deletions(-)

diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..2d7d4d89bd 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -47,6 +47,50 @@
'send-update', 'receive-update' ],
   'if': 'TARGET_I386' }
 
+##
+# @SevGuestType:
+#
+# An enumeration indicating the type of SEV guest being run.
+#
+# @sev: The guest is a legacy SEV or SEV-ES guest.
+#
+# @sev-snp: The guest is an SEV-SNP guest.
+#
+# Since: 6.2
+##
+{ 'enum': 'SevGuestType',
+  'data': [ 'sev', 'sev-snp' ],
+  'if': 'TARGET_I386' }
+
+##
+# @SevGuestInfo:
+#
+# Information specific to legacy SEV/SEV-ES guests.
+#
+# @policy: SEV policy value
+#
+# @handle: SEV firmware handle
+#
+# Since: 2.12
+##
+{ 'struct': 'SevGuestInfo',
+  'data': { 'policy': 'uint32',
+'handle': 'uint32' },
+  'if': 'TARGET_I386' }
+
+##
+# @SevSnpGuestInfo:
+#
+# Information specific to SEV-SNP guests.
+#
+# @snp-policy: SEV-SNP policy value
+#
+# Since: 9.1
+##
+{ 'struct': 'SevSnpGuestInfo',
+  'data': { 'snp-policy': 'uint64' },
+  'if': 'TARGET_I386' }
+
 ##
 # @SevInfo:
 #
@@ -60,25 +104,25 @@
 #
 # @build-id: SEV FW build id
 #
-# @policy: SEV policy value
-#
 # @state: SEV guest state
 #
-# @handle: SEV firmware handle
+# @sev-type: Type of SEV guest being run
 #
 # Since: 2.12
 ##
-{ 'struct': 'SevInfo',
-'data': { 'enabled': 'bool',
-  'api-major': 'uint8',
-  'api-minor' : 'uint8',
-  'build-id' : 'uint8',
-  'policy' : 'uint32',
-  'state' : 'SevState',
-  'handle' : 'uint32'
-},
-  'if': 'TARGET_I386'
-}
+{ 'union': 'SevInfo',
+  'base': { 'enabled': 'bool',
+'api-major': 'uint8',
+'api-minor' : 'uint8',
+'build-id' : 'uint8',
+'state' : 'SevState',
+'sev-type' : 'SevGuestType' },
+  'discriminator': 'sev-type',
+  'data': {
+  'sev': 'SevGuestInfo',
+  'sev-snp': 'SevSnpGuestInfo' },
+  'if': 'TARGET_I386' }
+
 
 ##
 # @query-sev:
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 8ca486f5d2..101661bf71 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -377,25 +377,27 @@ static SevInfo *sev_get_info(void)
 {
 SevInfo *info;
 SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
-SevGuestState *sev_guest =
-(SevGuestState *)object_dynamic_cast(OBJECT(sev_common),
- TYPE_SEV_GUEST);
 
 info = g_new0(SevInfo, 1);
 info->enabled = sev_enabled();
 
 if (info->enabled) {
-if (sev_guest) {
-info->handle = sev_guest->handle;
-}
 info->api_major = sev_common->api_major;
 info->api_minor = sev_common->api_minor;
 info->build_id = sev_common->build_id;
 info->state = sev_common->state;
-/* we only report the lower 32-bits of policy for SNP, ok for now... */
-info->policy =
-(uint32_t)object_property_get_uint(OBJECT(sev_common),
-   "policy", NULL);
+
+if (sev_snp_enabled()) {
+info->sev_type = SEV_GUEST_TYPE_SEV_SNP;
+   

[PATCH v4 29/31] hw/i386/sev: Allow use of pflash in conjunction with -bios

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

SEV-ES and SEV-SNP support OVMF images with non-volatile storage in
cases where the storage area is generated as a separate image as part
of the OVMF build process.

Currently these are exposed with unit=0 corresponding to the actual BIOS
image, and unit=1 corresponding to the storage image. However, pflash
images are mapped guest memory using read-only memslots, which are not
allowed in conjunction with guest_memfd-backed ranges. This makes that
approach unusable for SEV-SNP, where the BIOS range will be encrypted
and mapped as private guest_memfd-backed memory. For this reason,
SEV-SNP will instead rely on -bios to handle loading the BIOS image.

To allow for pflash to still be used for the storage image, rework the
existing logic to remove assumptions that unit=0 contains the BIOS image
when SEV-SNP, so that it can instead be used to handle only the storage
image.

Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 hw/i386/pc_sysfw.c | 47 +-
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index def77a442d..7f97e62b16 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -125,21 +125,10 @@ void pc_system_flash_cleanup_unused(PCMachineState *pcms)
 }
 }
 
-/*
- * Map the pcms->flash[] from 4GiB downward, and realize.
- * Map them in descending order, i.e. pcms->flash[0] at the top,
- * without gaps.
- * Stop at the first pcms->flash[0] lacking a block backend.
- * Set each flash's size from its block backend.  Fatal error if the
- * size isn't a non-zero multiple of 4KiB, or the total size exceeds
- * pcms->max_fw_size.
- *
- * If pcms->flash[0] has a block backend, its memory is passed to
- * pc_isa_bios_init().  Merging several flash devices for isa-bios is
- * not supported.
- */
-static void pc_system_flash_map(PCMachineState *pcms,
-MemoryRegion *rom_memory)
+static void pc_system_flash_map_partial(PCMachineState *pcms,
+MemoryRegion *rom_memory,
+hwaddr offset,
+bool storage_only)
 {
 X86MachineState *x86ms = X86_MACHINE(pcms);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
@@ -154,6 +143,8 @@ static void pc_system_flash_map(PCMachineState *pcms,
 
 assert(PC_MACHINE_GET_CLASS(pcms)->pci_enabled);
 
+total_size = offset;
+
 for (i = 0; i < ARRAY_SIZE(pcms->flash); i++) {
 hwaddr gpa;
 
@@ -192,7 +183,7 @@ static void pc_system_flash_map(PCMachineState *pcms,
 sysbus_realize_and_unref(SYS_BUS_DEVICE(system_flash), &error_fatal);
 sysbus_mmio_map(SYS_BUS_DEVICE(system_flash), 0, gpa);
 
-if (i == 0) {
+if (i == 0 && !storage_only) {
 flash_mem = pflash_cfi01_get_memory(system_flash);
 if (pcmc->isa_bios_alias) {
 x86_isa_bios_init(&x86ms->isa_bios, rom_memory, flash_mem,
@@ -211,6 +202,25 @@ static void pc_system_flash_map(PCMachineState *pcms,
 }
 }
 
+/*
+ * Map the pcms->flash[] from 4GiB downward, and realize.
+ * Map them in descending order, i.e. pcms->flash[0] at the top,
+ * without gaps.
+ * Stop at the first pcms->flash[0] lacking a block backend.
+ * Set each flash's size from its block backend.  Fatal error if the
+ * size isn't a non-zero multiple of 4KiB, or the total size exceeds
+ * pcms->max_fw_size.
+ *
+ * If pcms->flash[0] has a block backend, its memory is passed to
+ * pc_isa_bios_init().  Merging several flash devices for isa-bios is
+ * not supported.
+ */
+static void pc_system_flash_map(PCMachineState *pcms,
+MemoryRegion *rom_memory)
+{
+pc_system_flash_map_partial(pcms, rom_memory, 0, false);
+}
+
 void pc_system_firmware_init(PCMachineState *pcms,
  MemoryRegion *rom_memory)
 {
@@ -238,9 +248,12 @@ void pc_system_firmware_init(PCMachineState *pcms,
 }
 }
 
-if (!pflash_blk[0]) {
+if (!pflash_blk[0] || sev_snp_enabled()) {
 /* Machine property pflash0 not set, use ROM mode */
 x86_bios_rom_init(X86_MACHINE(pcms), "bios.bin", rom_memory, false);
+if (sev_snp_enabled()) {
+pc_system_flash_map_partial(pcms, rom_memory, 3653632, true);
+}
 } else {
 if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
 /*
-- 
2.34.1




[PATCH v4 04/31] i386/sev: Introduce "sev-common" type to encapsulate common SEV state

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

Currently all SEV/SEV-ES functionality is managed through a single
'sev-guest' QOM type. With upcoming support for SEV-SNP, taking this
same approach won't work well since some of the properties/state
managed by 'sev-guest' is not applicable to SEV-SNP, which will instead
rely on a new QOM type with its own set of properties/state.

To prepare for this, this patch moves common state into an abstract
'sev-common' parent type to encapsulate properties/state that are
common to both SEV/SEV-ES and SEV-SNP, leaving only SEV/SEV-ES-specific
properties/state in the current 'sev-guest' type. This should not
affect current behavior or command-line options.

As part of this patch, some related changes are also made:

  - a static 'sev_guest' variable is currently used to keep track of
the 'sev-guest' instance. SEV-SNP would similarly introduce an
'sev_snp_guest' static variable. But these instances are now
available via qdev_get_machine()->cgs, so switch to using that
instead and drop the static variable.

  - 'sev_guest' is currently used as the name for the static variable
holding a pointer to the 'sev-guest' instance. Re-purpose the name
as a local variable referring to the 'sev-guest' instance, and use
that consistently throughout the code so it can be easily
distinguished from sev-common/sev-snp-guest instances.

  - 'sev' is generally used as the name for local variables holding a
pointer to the 'sev-guest' instance. In cases where that now points
to common state, use the name 'sev_common'; in cases where that now
points to state specific to the 'sev-guest' instance, use the name
'sev_guest'.

In order to enable kernel-hashes for SNP, pull it from
SevGuestProperties to its parent SevCommonProperties so
it will be available for both SEV and SNP.

Signed-off-by: Michael Roth 
Co-developed-by: Dov Murik 
Signed-off-by: Dov Murik 
Acked-by: Markus Armbruster  (QAPI schema)
Co-developed-by: Pankaj Gupta 
Signed-off-by: Pankaj Gupta 
---
 qapi/qom.json |  40 ++--
 target/i386/sev.c | 494 ++
 target/i386/sev.h |   3 +
 3 files changed, 306 insertions(+), 231 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 38dde6d785..056b38f491 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -875,20 +875,12 @@
   'data': { '*filename': 'str' } }
 
 ##
-# @SevGuestProperties:
+# @SevCommonProperties:
 #
-# Properties for sev-guest objects.
+# Properties common to objects that are derivatives of sev-common.
 #
 # @sev-device: SEV device to use (default: "/dev/sev")
 #
-# @dh-cert-file: guest owners DH certificate (encoded with base64)
-#
-# @session-file: guest owners session parameters (encoded with base64)
-#
-# @policy: SEV policy value (default: 0x1)
-#
-# @handle: SEV firmware handle (default: 0)
-#
 # @cbitpos: C-bit location in page table entry (default: 0)
 #
 # @reduced-phys-bits: number of bits in physical addresses that become
@@ -898,6 +890,27 @@
 # designated guest firmware page for measured boot with -kernel
 # (default: false) (since 6.2)
 #
+# Since: 9.1
+##
+{ 'struct': 'SevCommonProperties',
+  'data': { '*sev-device': 'str',
+'*cbitpos': 'uint32',
+'reduced-phys-bits': 'uint32',
+'*kernel-hashes': 'bool' } }
+
+##
+# @SevGuestProperties:
+#
+# Properties for sev-guest objects.
+#
+# @dh-cert-file: guest owners DH certificate (encoded with base64)
+#
+# @session-file: guest owners session parameters (encoded with base64)
+#
+# @policy: SEV policy value (default: 0x1)
+#
+# @handle: SEV firmware handle (default: 0)
+#
 # @legacy-vm-type: Use legacy KVM_SEV_INIT KVM interface for creating the VM.
 #  The newer KVM_SEV_INIT2 interface syncs additional vCPU
 #  state when initializing the VMSA structures, which will
@@ -909,14 +922,11 @@
 # Since: 2.12
 ##
 { 'struct': 'SevGuestProperties',
-  'data': { '*sev-device': 'str',
-'*dh-cert-file': 'str',
+  'base': 'SevCommonProperties',
+  'data': { '*dh-cert-file': 'str',
 '*session-file': 'str',
 '*policy': 'uint32',
 '*handle': 'uint32',
-'*cbitpos': 'uint32',
-'reduced-phys-bits': 'uint32',
-'*kernel-hashes': 'bool',
 '*legacy-vm-type': 'bool' } }
 
 ##
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 67ed32e5

[PATCH v4 15/31] i386/sev: Add the SNP launch start context

2024-05-30 Thread Pankaj Gupta
From: Brijesh Singh 

The SNP_LAUNCH_START is called first to create a cryptographic launch
context within the firmware.

Signed-off-by: Brijesh Singh 
Signed-off-by: Michael Roth 
Co-developed-by: Pankaj Gupta 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c| 39 +++
 target/i386/trace-events |  1 +
 2 files changed, 40 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 101661bf71..acbb22ff15 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -39,6 +39,7 @@
 #include "confidential-guest.h"
 #include "hw/i386/pc.h"
 #include "exec/address-spaces.h"
+#include "qemu/queue.h"
 
 OBJECT_DECLARE_TYPE(SevCommonState, SevCommonStateClass, SEV_COMMON)
 OBJECT_DECLARE_TYPE(SevGuestState, SevGuestStateClass, SEV_GUEST)
@@ -129,6 +130,16 @@ struct SevSnpGuestStateClass {
 #define DEFAULT_SEV_DEVICE  "/dev/sev"
 #define DEFAULT_SEV_SNP_POLICY  0x3
 
+typedef struct SevLaunchUpdateData {
+QTAILQ_ENTRY(SevLaunchUpdateData) next;
+hwaddr gpa;
+void *hva;
+uint64_t len;
+int type;
+} SevLaunchUpdateData;
+
+static QTAILQ_HEAD(, SevLaunchUpdateData) launch_update;
+
 #define SEV_INFO_BLOCK_GUID "00f771de-1a7e-4fcb-890e-68c77e2fb44e"
 typedef struct __attribute__((__packed__)) SevInfoBlock {
 /* SEV-ES Reset Vector Address */
@@ -688,6 +699,31 @@ sev_read_file_base64(const char *filename, guchar **data, gsize *len)
 return 0;
 }
 
+static int
+sev_snp_launch_start(SevCommonState *sev_common)
+{
+int fw_error, rc;
+SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(sev_common);
+struct kvm_sev_snp_launch_start *start = &sev_snp_guest->kvm_start_conf;
+
+trace_kvm_sev_snp_launch_start(start->policy,
+   sev_snp_guest->guest_visible_workarounds);
+
+rc = sev_ioctl(sev_common->sev_fd, KVM_SEV_SNP_LAUNCH_START,
+   start, &fw_error);
+if (rc < 0) {
+error_report("%s: SNP_LAUNCH_START ret=%d fw_error=%d '%s'",
+__func__, rc, fw_error, fw_error_to_str(fw_error));
+return 1;
+}
+
+QTAILQ_INIT(&launch_update);
+
+sev_set_guest_state(sev_common, SEV_STATE_LAUNCH_UPDATE);
+
+return 0;
+}
+
 static int
 sev_launch_start(SevCommonState *sev_common)
 {
@@ -1017,6 +1053,7 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 }
 
 ret = klass->launch_start(sev_common);
+
 if (ret) {
 error_setg(errp, "%s: failed to create encryption context", __func__);
 return -1;
@@ -1811,9 +1848,11 @@ sev_snp_guest_class_init(ObjectClass *oc, void *data)
 SevCommonStateClass *klass = SEV_COMMON_CLASS(oc);
 X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
 
+klass->launch_start = sev_snp_launch_start;
 klass->kvm_init = sev_snp_kvm_init;
 x86_klass->kvm_type = sev_snp_kvm_type;
 
+
 object_class_property_add(oc, "policy", "uint64",
   sev_snp_guest_get_policy,
   sev_snp_guest_set_policy, NULL, NULL);
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 2cd8726eeb..cb26d8a925 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -11,3 +11,4 @@ kvm_sev_launch_measurement(const char *value) "data %s"
 kvm_sev_launch_finish(void) ""
kvm_sev_launch_secret(uint64_t hpa, uint64_t hva, uint64_t secret, int len) "hpa 0x%" PRIx64 " hva 0x%" PRIx64 " data 0x%" PRIx64 " len %d"
 kvm_sev_attestation_report(const char *mnonce, const char *data) "mnonce %s data %s"
+kvm_sev_snp_launch_start(uint64_t policy, char *gosvw) "policy 0x%" PRIx64 " gosvw %s"
-- 
2.34.1




[PATCH v4 21/31] i386/sev: Extract build_kernel_loader_hashes

2024-05-30 Thread Pankaj Gupta
From: Dov Murik 

Extract the building of the kernel hashes table out from
sev_add_kernel_loader_hashes() to allow building it in
other memory areas (for SNP support).

No functional change intended.

Signed-off-by: Dov Murik 
Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 101 ++
 1 file changed, 58 insertions(+), 43 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 4388ffe867..831745c02a 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1751,45 +1751,16 @@ static const QemuUUID sev_cmdline_entry_guid = {
 0x4d, 0x36, 0xab, 0x2a)
 };
 
-/*
- * Add the hashes of the linux kernel/initrd/cmdline to an encrypted guest page
- * which is included in SEV's initial memory measurement.
- */
-bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
+static bool build_kernel_loader_hashes(PaddedSevHashTable *padded_ht,
+   SevKernelLoaderContext *ctx,
+   Error **errp)
 {
-uint8_t *data;
-SevHashTableDescriptor *area;
 SevHashTable *ht;
-PaddedSevHashTable *padded_ht;
 uint8_t cmdline_hash[HASH_SIZE];
 uint8_t initrd_hash[HASH_SIZE];
 uint8_t kernel_hash[HASH_SIZE];
 uint8_t *hashp;
 size_t hash_len = HASH_SIZE;
-hwaddr mapped_len = sizeof(*padded_ht);
-MemTxAttrs attrs = { 0 };
-bool ret = true;
-SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
-
-/*
- * Only add the kernel hashes if the sev-guest configuration explicitly
- * stated kernel-hashes=on.
- */
-if (!sev_common->kernel_hashes) {
-return false;
-}
-
-if (!pc_system_ovmf_table_find(SEV_HASH_TABLE_RV_GUID, &data, NULL)) {
-error_setg(errp, "SEV: kernel specified but guest firmware "
- "has no hashes table GUID");
-return false;
-}
-area = (SevHashTableDescriptor *)data;
-if (!area->base || area->size < sizeof(PaddedSevHashTable)) {
-error_setg(errp, "SEV: guest firmware hashes table area is invalid "
- "(base=0x%x size=0x%x)", area->base, area->size);
-return false;
-}
 
 /*
  * Calculate hash of kernel command-line with the terminating null byte. If
@@ -1826,16 +1797,6 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
 }
 assert(hash_len == HASH_SIZE);
 
-/*
- * Populate the hashes table in the guest's memory at the OVMF-designated
- * area for the SEV hashes table
- */
-padded_ht = address_space_map(&address_space_memory, area->base,
-  &mapped_len, true, attrs);
-if (!padded_ht || mapped_len != sizeof(*padded_ht)) {
-error_setg(errp, "SEV: cannot map hashes table guest memory area");
-return false;
-}
 ht = &padded_ht->ht;
 
 ht->guid = sev_hash_table_header_guid;
@@ -1856,7 +1817,61 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
 /* zero the excess data so the measurement can be reliably calculated */
 memset(padded_ht->padding, 0, sizeof(padded_ht->padding));
 
-if (sev_encrypt_flash((uint8_t *)padded_ht, sizeof(*padded_ht), errp) < 0) {
+return true;
+}
+
+/*
+ * Add the hashes of the linux kernel/initrd/cmdline to an encrypted guest page
+ * which is included in SEV's initial memory measurement.
+ */
+bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
+{
+uint8_t *data;
+SevHashTableDescriptor *area;
+PaddedSevHashTable *padded_ht;
+hwaddr mapped_len = sizeof(*padded_ht);
+MemTxAttrs attrs = { 0 };
+bool ret = true;
+SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
+
+/*
+ * Only add the kernel hashes if the sev-guest configuration explicitly
+ * stated kernel-hashes=on.
+ */
+if (!sev_common->kernel_hashes) {
+return false;
+}
+
+if (!pc_system_ovmf_table_find(SEV_HASH_TABLE_RV_GUID, &data, NULL)) {
+error_setg(errp, "SEV: kernel specified but guest firmware "
+ "has no hashes table GUID");
+return false;
+}
+
+area = (SevHashTableDescriptor *)data;
+if (!area->base || area->size < sizeof(PaddedSevHashTable)) {
+error_setg(errp, "SEV: guest firmware hashes table area is invalid "
+ "(base=0x%x size=0x%x)", area->base, area->size);
+return false;
+}
+
+/*
+ * Populate the hashes table in the guest's memory at the OVMF-designated
+ * area for the SEV hashes table
+ */
padded_ht = address_space_map(&address_space_memory, area->

[PATCH v4 28/31] hw/i386: Add support for loading BIOS using guest_memfd

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

When guest_memfd is enabled, the BIOS is generally part of the initial
encrypted guest image and will be accessed as private guest memory. Add
the necessary changes to set up the associated RAM region with a
guest_memfd backend to allow for this.

Current support centers around using -bios to load the BIOS data.
Support for loading the BIOS via pflash requires additional enablement
since those interfaces rely on the use of ROM memory regions which make
use of the KVM_MEM_READONLY memslot flag, which is not supported for
guest_memfd-backed memslots.

Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 hw/i386/x86-common.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index f41cb0a6a8..059de65f36 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -999,10 +999,18 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
 }
 if (bios_size <= 0 ||
 (bios_size % 65536) != 0) {
-goto bios_error;
+if (!machine_require_guest_memfd(MACHINE(x86ms))) {
+g_warning("%s: Unaligned BIOS size %d", __func__, bios_size);
+goto bios_error;
+}
+}
+if (machine_require_guest_memfd(MACHINE(x86ms))) {
+memory_region_init_ram_guest_memfd(&x86ms->bios, NULL, "pc.bios",
+   bios_size, &error_fatal);
+} else {
+memory_region_init_ram(&x86ms->bios, NULL, "pc.bios",
+   bios_size, &error_fatal);
 }
-memory_region_init_ram(&x86ms->bios, NULL, "pc.bios", bios_size,
-   &error_fatal);
 if (sev_enabled()) {
 /*
  * The concept of a "reset" simply doesn't exist for
@@ -1023,9 +1031,11 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
 }
 g_free(filename);
 
-/* map the last 128KB of the BIOS in ISA space */
-x86_isa_bios_init(&x86ms->isa_bios, rom_memory, &x86ms->bios,
-  !isapc_ram_fw);
+if (!machine_require_guest_memfd(MACHINE(x86ms))) {
+/* map the last 128KB of the BIOS in ISA space */
+x86_isa_bios_init(&x86ms->isa_bios, rom_memory, &x86ms->bios,
+  !isapc_ram_fw);
+}
 
 /* map all the bios at the top of memory */
 memory_region_add_subregion(rom_memory,
-- 
2.34.1




[PATCH v4 23/31] i386/sev: Allow measured direct kernel boot on SNP

2024-05-30 Thread Pankaj Gupta
From: Dov Murik 

In SNP, the hashes page is designated with a specific metadata entry
published in AmdSev OVMF.

Therefore, if the user enabled kernel hashes (for measured direct boot),
QEMU should prepare the content of the hashes table; during the
processing of the metadata entry, it copies the content into the
designated page and encrypts it.

Note that in SNP (unlike SEV and SEV-ES) the measurement is done in
whole 4KB pages.  Therefore QEMU zeros the whole page that includes the
hashes table, fills in the kernel hashes area in that page, and then
encrypts the whole page.  The rest of the page is reserved for SEV
launch secrets, which are not usable anyway on SNP.

If the user disabled kernel hashes, QEMU pre-validates the kernel hashes
page as a zero page.

Signed-off-by: Dov Murik 
Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 include/hw/i386/pc.h |  2 ++
 target/i386/sev.c| 35 +++
 2 files changed, 37 insertions(+)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index c653b8eeb2..ca7904ac2c 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -172,6 +172,8 @@ typedef enum {
 SEV_DESC_TYPE_SNP_SECRETS,
 /* The section contains address that can be used as a CPUID page */
 SEV_DESC_TYPE_CPUID,
+/* The section contains the region for kernel hashes for measured direct boot */
+SEV_DESC_TYPE_SNP_KERNEL_HASHES = 0x10,
 
 } ovmf_sev_metadata_desc_type;
 
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 1b29fdbc9a..1a78e98751 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -145,6 +145,9 @@ struct SevSnpGuestState {
 
 struct kvm_sev_snp_launch_start kvm_start_conf;
 struct kvm_sev_snp_launch_finish kvm_finish_conf;
+
+uint32_t kernel_hashes_offset;
+PaddedSevHashTable *kernel_hashes_data;
 };
 
 struct SevSnpGuestStateClass {
@@ -1187,6 +1190,23 @@ snp_launch_update_cpuid(uint32_t cpuid_addr, void *hva, uint32_t cpuid_len)
   KVM_SEV_SNP_PAGE_TYPE_CPUID);
 }
 
+static int
+snp_launch_update_kernel_hashes(SevSnpGuestState *sev_snp, uint32_t addr,
+void *hva, uint32_t len)
+{
+int type = KVM_SEV_SNP_PAGE_TYPE_ZERO;
+if (sev_snp->parent_obj.kernel_hashes) {
+assert(sev_snp->kernel_hashes_data);
+assert((sev_snp->kernel_hashes_offset +
+sizeof(*sev_snp->kernel_hashes_data)) <= len);
+memset(hva, 0, len);
+memcpy(hva + sev_snp->kernel_hashes_offset, sev_snp->kernel_hashes_data,
+   sizeof(*sev_snp->kernel_hashes_data));
+type = KVM_SEV_SNP_PAGE_TYPE_NORMAL;
+}
+return snp_launch_update_data(addr, hva, len, type);
+}
+
 static int
 snp_metadata_desc_to_page_type(int desc_type)
 {
@@ -1223,6 +1243,9 @@ snp_populate_metadata_pages(SevSnpGuestState *sev_snp,
 
 if (type == KVM_SEV_SNP_PAGE_TYPE_CPUID) {
 ret = snp_launch_update_cpuid(desc->base, hva, desc->len);
+} else if (desc->type == SEV_DESC_TYPE_SNP_KERNEL_HASHES) {
+ret = snp_launch_update_kernel_hashes(sev_snp, desc->base, hva,
+  desc->len);
 } else {
 ret = snp_launch_update_data(desc->base, hva, desc->len, type);
 }
@@ -1855,6 +1878,18 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
 return false;
 }
 
+if (sev_snp_enabled()) {
+/*
+ * SNP: Populate the hashes table in an area that later in
+ * snp_launch_update_kernel_hashes() will be copied to the guest memory
+ * and encrypted.
+ */
+SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(sev_common);
+sev_snp_guest->kernel_hashes_offset = area->base & ~TARGET_PAGE_MASK;
+sev_snp_guest->kernel_hashes_data = g_new0(PaddedSevHashTable, 1);
+return build_kernel_loader_hashes(sev_snp_guest->kernel_hashes_data, ctx, errp);
+}
+
 /*
  * Populate the hashes table in the guest's memory at the OVMF-designated
  * area for the SEV hashes table
-- 
2.34.1




[PATCH v4 24/31] hw/i386/sev: Add support to encrypt BIOS when SEV-SNP is enabled

2024-05-30 Thread Pankaj Gupta
From: Brijesh Singh 

As with SEV, an SNP guest requires that the BIOS be part of the initial
encrypted/measured guest payload. Extend sev_encrypt_flash() to handle
the SNP case and plumb through the GPA of the BIOS location since this
is needed for SNP.

Signed-off-by: Brijesh Singh 
Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 hw/i386/pc_sysfw.c| 12 +++-
 hw/i386/x86-common.c  |  2 +-
 include/hw/i386/x86.h |  2 +-
 target/i386/sev-sysemu-stub.c |  2 +-
 target/i386/sev.c | 15 +++
 target/i386/sev.h |  2 +-
 6 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index 048d0919c1..00464afcb4 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -148,6 +148,8 @@ static void pc_system_flash_map(PCMachineState *pcms,
 assert(PC_MACHINE_GET_CLASS(pcms)->pci_enabled);
 
 for (i = 0; i < ARRAY_SIZE(pcms->flash); i++) {
+hwaddr gpa;
+
 system_flash = pcms->flash[i];
 blk = pflash_cfi01_get_blk(system_flash);
 if (!blk) {
@@ -177,11 +179,11 @@ static void pc_system_flash_map(PCMachineState *pcms,
 }
 
 total_size += size;
+gpa = 0x100000000ULL - total_size; /* where the flash is mapped */
 qdev_prop_set_uint32(DEVICE(system_flash), "num-blocks",
  size / FLASH_SECTOR_SIZE);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(system_flash), &error_fatal);
-sysbus_mmio_map(SYS_BUS_DEVICE(system_flash), 0,
-0x100000000ULL - total_size);
+sysbus_mmio_map(SYS_BUS_DEVICE(system_flash), 0, gpa);
 
 if (i == 0) {
 flash_mem = pflash_cfi01_get_memory(system_flash);
@@ -196,7 +198,7 @@ static void pc_system_flash_map(PCMachineState *pcms,
 if (sev_enabled()) {
 flash_ptr = memory_region_get_ram_ptr(flash_mem);
 flash_size = memory_region_size(flash_mem);
-x86_firmware_configure(flash_ptr, flash_size);
+x86_firmware_configure(gpa, flash_ptr, flash_size);
 }
 }
 }
@@ -249,7 +251,7 @@ void pc_system_firmware_init(PCMachineState *pcms,
 pc_system_flash_cleanup_unused(pcms);
 }
 
-void x86_firmware_configure(void *ptr, int size)
+void x86_firmware_configure(hwaddr gpa, void *ptr, int size)
 {
 int ret;
 
@@ -270,6 +272,6 @@ void x86_firmware_configure(void *ptr, int size)
 exit(1);
 }
 
-sev_encrypt_flash(ptr, size, &error_fatal);
+sev_encrypt_flash(gpa, ptr, size, &error_fatal);
 }
 }
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index ee9046d9a8..f41cb0a6a8 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -1013,7 +1013,7 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
  */
 void *ptr = memory_region_get_ram_ptr(&x86ms->bios);
 load_image_size(filename, ptr, bios_size);
-x86_firmware_configure(ptr, bios_size);
+x86_firmware_configure(0x100000000ULL - bios_size, ptr, bios_size);
 } else {
 memory_region_set_readonly(&x86ms->bios, !isapc_ram_fw);
 ret = rom_add_file_fixed(bios_name, (uint32_t)(-bios_size), -1);
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index b006f16b8d..d43cb3908e 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -154,6 +154,6 @@ void ioapic_init_gsi(GSIState *gsi_state, Object *parent);
 DeviceState *ioapic_init_secondary(GSIState *gsi_state);
 
 /* pc_sysfw.c */
-void x86_firmware_configure(void *ptr, int size);
+void x86_firmware_configure(hwaddr gpa, void *ptr, int size);
 
 #endif
diff --git a/target/i386/sev-sysemu-stub.c b/target/i386/sev-sysemu-stub.c
index 96e1c15cc3..6af643e3a1 100644
--- a/target/i386/sev-sysemu-stub.c
+++ b/target/i386/sev-sysemu-stub.c
@@ -42,7 +42,7 @@ void qmp_sev_inject_launch_secret(const char *packet_header, const char *secret,
 error_setg(errp, "SEV is not available in this QEMU");
 }
 
-int sev_encrypt_flash(uint8_t *ptr, uint64_t len, Error **errp)
+int sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp)
 {
 g_assert_not_reached();
 }
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 1a78e98751..c5c703bc8d 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1522,7 +1522,7 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 }
 
 int
-sev_encrypt_flash(uint8_t *ptr, uint64_t len, Error **errp)
+sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp)
 {
 SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
 
@@ -1532,7 +1532,14 @@ sev_encrypt_flash(uint8_t *ptr, uint64_t len, Error **errp)
 
 /* if SEV is in update state then encrypt the data else do nothing */
 if (sev_check_state(sev_common, SEV_STAT

[PATCH v4 18/31] hw/i386/sev: Add function to get SEV metadata from OVMF header

2024-05-30 Thread Pankaj Gupta
From: Brijesh Singh 

A recent version of OVMF expanded the reset vector GUID list to add an
SEV-specific metadata GUID. The SEV metadata describes the reserved
memory regions such as the secrets and CPUID page used during the SEV-SNP
guest launch.

The pc_system_get_ovmf_sev_metadata_ptr() is used to retrieve the SEV
metadata pointer from the OVMF GUID list.

Signed-off-by: Brijesh Singh 
Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 hw/i386/pc_sysfw.c   |  4 
 include/hw/i386/pc.h | 26 ++
 target/i386/sev.c| 31 +++
 target/i386/sev.h|  2 ++
 4 files changed, 63 insertions(+)

diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index ac88ad4eb9..048d0919c1 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -260,6 +260,10 @@ void x86_firmware_configure(void *ptr, int size)
 pc_system_parse_ovmf_flash(ptr, size);
 
 if (sev_enabled()) {
+
+/* Copy the SEV metadata table (if it exists) */
+pc_system_parse_sev_metadata(ptr, size);
+
 ret = sev_es_save_reset_vector(ptr, size);
 if (ret) {
 error_report("failed to locate and/or save reset vector");
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index ad9c3d9ba8..c653b8eeb2 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -164,6 +164,32 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
 #define PCI_HOST_ABOVE_4G_MEM_SIZE "above-4g-mem-size"
 #define PCI_HOST_PROP_SMM_RANGES   "smm-ranges"
 
+typedef enum {
+SEV_DESC_TYPE_UNDEF,
+/* The section contains the region that must be validated by the VMM. */
+SEV_DESC_TYPE_SNP_SEC_MEM,
+/* The section contains the SNP secrets page */
+SEV_DESC_TYPE_SNP_SECRETS,
+/* The section contains address that can be used as a CPUID page */
+SEV_DESC_TYPE_CPUID,
+
+} ovmf_sev_metadata_desc_type;
+
+typedef struct __attribute__((__packed__)) OvmfSevMetadataDesc {
+uint32_t base;
+uint32_t len;
+ovmf_sev_metadata_desc_type type;
+} OvmfSevMetadataDesc;
+
+typedef struct __attribute__((__packed__)) OvmfSevMetadata {
+uint8_t signature[4];
+uint32_t len;
+uint32_t version;
+uint32_t num_desc;
+OvmfSevMetadataDesc descs[];
+} OvmfSevMetadata;
+
+OvmfSevMetadata *pc_system_get_ovmf_sev_metadata_ptr(void);
 
 void pc_pci_as_mapping_init(MemoryRegion *system_memory,
 MemoryRegion *pci_address_space);
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 2ca9a86bf3..d9d1d97f0c 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -611,6 +611,37 @@ SevCapability *qmp_query_sev_capabilities(Error **errp)
 return sev_get_capabilities(errp);
 }
 
+static OvmfSevMetadata *ovmf_sev_metadata_table;
+
+#define OVMF_SEV_META_DATA_GUID "dc886566-984a-4798-A75e-5585a7bf67cc"
+typedef struct __attribute__((__packed__)) OvmfSevMetadataOffset {
+uint32_t offset;
+} OvmfSevMetadataOffset;
+
+OvmfSevMetadata *pc_system_get_ovmf_sev_metadata_ptr(void)
+{
+return ovmf_sev_metadata_table;
+}
+
+void pc_system_parse_sev_metadata(uint8_t *flash_ptr, size_t flash_size)
+{
+OvmfSevMetadata *metadata;
+OvmfSevMetadataOffset  *data;
+
+if (!pc_system_ovmf_table_find(OVMF_SEV_META_DATA_GUID, (uint8_t **)&data,
+   NULL)) {
+return;
+}
+
+metadata = (OvmfSevMetadata *)(flash_ptr + flash_size - data->offset);
+if (memcmp(metadata->signature, "ASEV", 4) != 0) {
+return;
+}
+
+ovmf_sev_metadata_table = g_malloc(metadata->len);
+memcpy(ovmf_sev_metadata_table, metadata, metadata->len);
+}
+
 static SevAttestationReport *sev_get_attestation_report(const char *mnonce,
 Error **errp)
 {
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 5dc4767b1e..cc12824dd6 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -66,4 +66,6 @@ int sev_inject_launch_secret(const char *hdr, const char *secret,
 int sev_es_save_reset_vector(void *flash_ptr, uint64_t flash_size);
 void sev_es_set_reset_vector(CPUState *cpu);
 
+void pc_system_parse_sev_metadata(uint8_t *flash_ptr, size_t flash_size);
+
 #endif
-- 
2.34.1




[PATCH v4 09/31] i386/sev: Add sev_kvm_init() override for SEV class

2024-05-30 Thread Pankaj Gupta
Some aspects of the SEV init routine are specific to SEV and not
applicable to SNP guests, so move the SEV-specific bits into a
separate class method and retain only the common functionality.

Co-developed-by: Michael Roth 
Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 72 +--
 1 file changed, 51 insertions(+), 21 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index f4f1971202..2a9a77a2d9 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -73,6 +73,7 @@ struct SevCommonStateClass {
 /* public */
 int (*launch_start)(SevCommonState *sev_common);
 void (*launch_finish)(SevCommonState *sev_common);
+int (*kvm_init)(ConfidentialGuestSupport *cgs, Error **errp);
 };
 
 /**
@@ -890,7 +891,7 @@ out:
 return sev_common->kvm_type;
 }
 
-static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
+static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
 SevCommonState *sev_common = SEV_COMMON(cgs);
 char *devname;
@@ -900,12 +901,6 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 struct sev_user_data_status status = {};
 SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(cgs);
 
-ret = ram_block_discard_disable(true);
-if (ret) {
-error_report("%s: cannot disable RAM discard", __func__);
-return -1;
-}
-
 sev_common->state = SEV_STATE_UNINIT;
 
 host_cpuid(0x801F, 0, NULL, &ebx, NULL, NULL);
@@ -919,7 +914,7 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 if (host_cbitpos != sev_common->cbitpos) {
 error_setg(errp, "%s: cbitpos check failed, host '%d' requested '%d'",
__func__, host_cbitpos, sev_common->cbitpos);
-goto err;
+return -1;
 }
 
 /*
@@ -932,7 +927,7 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 error_setg(errp, "%s: reduced_phys_bits check failed,"
" it should be in the range of 1 to 63, requested '%d'",
__func__, sev_common->reduced_phys_bits);
-goto err;
+return -1;
 }
 
 devname = object_property_get_str(OBJECT(sev_common), "sev-device", NULL);
@@ -941,7 +936,7 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 error_setg(errp, "%s: Failed to open %s '%s'", __func__,
devname, strerror(errno));
 g_free(devname);
-goto err;
+return -1;
 }
 g_free(devname);
 
@@ -951,7 +946,7 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 error_setg(errp, "%s: failed to get platform status ret=%d "
"fw_error='%d: %s'", __func__, ret, fw_error,
fw_error_to_str(fw_error));
-goto err;
+return -1;
 }
 sev_common->build_id = status.build;
 sev_common->api_major = status.api_major;
@@ -961,7 +956,7 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 if (!kvm_kernel_irqchip_allowed()) {
 error_setg(errp, "%s: SEV-ES guests require in-kernel irqchip"
"support", __func__);
-goto err;
+return -1;
 }
 }
 
@@ -970,7 +965,7 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 error_setg(errp, "%s: guest policy requires SEV-ES, but "
  "host SEV-ES support unavailable",
  __func__);
-goto err;
+return -1;
 }
 }
 
@@ -988,25 +983,59 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 if (ret) {
 error_setg(errp, "%s: failed to initialize ret=%d fw_error=%d '%s'",
__func__, ret, fw_error, fw_error_to_str(fw_error));
-goto err;
+return -1;
 }
 
 ret = klass->launch_start(sev_common);
 if (ret) {
 error_setg(errp, "%s: failed to create encryption context", __func__);
-goto err;
+return -1;
+}
+
+if (klass->kvm_init && klass->kvm_init(cgs, errp)) {
+return -1;
 }
 
-ram_block_notifier_add(&sev_ram_notifier);
-qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
 qemu_add_vm_change_state_handler(sev_vm_state_change, sev_common);
 
 cgs->ready = true;
 
 return 0;
-err:
-ram_block_discard_disable(false);
-return -1;
+}
+
+static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
+{
+ int ret;
+
+/*
+ * SEV/SEV-ES rely on pinned memory to back guest RAM so discarding
+ * isn't actually possi

[PATCH v4 31/31] i386/sev: Enable KVM_HC_MAP_GPA_RANGE hcall for SNP guests

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

KVM will forward GHCB page-state change requests to userspace in the
form of KVM_HC_MAP_GPA_RANGE, so make sure the hypercall handling is
enabled for SNP guests.

Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 7d2f67e2f3..c1872ce3a4 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 
 #include 
+#include 
 #include 
 
 #include 
@@ -774,6 +775,10 @@ sev_snp_launch_start(SevCommonState *sev_common)
 trace_kvm_sev_snp_launch_start(start->policy,
sev_snp_guest->guest_visible_workarounds);
 
+if (!kvm_enable_hypercall(BIT_ULL(KVM_HC_MAP_GPA_RANGE))) {
+return 1;
+}
+
 rc = sev_ioctl(sev_common->sev_fd, KVM_SEV_SNP_LAUNCH_START,
start, &fw_error);
 if (rc < 0) {
-- 
2.34.1




[PATCH v4 07/31] i386/sev: Introduce 'sev-snp-guest' object

2024-05-30 Thread Pankaj Gupta
From: Brijesh Singh 

SEV-SNP support relies on a different set of properties/state than the
existing 'sev-guest' object. This patch introduces the 'sev-snp-guest'
object, which can be used to configure an SEV-SNP guest. For example,
a default-configured SEV-SNP guest with no additional information
passed in for use with attestation:

  -object sev-snp-guest,id=sev0

or a fully-specified SEV-SNP guest where all spec-defined binary
blobs are passed in as base64-encoded strings:

  -object sev-snp-guest,id=sev0, \
policy=0x3, \
init-flags=0, \
id-block=YWFhYWFhYWFhYWFhYWFhCg==, \
id-auth=CxHK/OKLkXGn/KpAC7Wl1FSiisWDbGTEKz..., \
auth-key-enabled=on, \
host-data=LNkCWBRC5CcdGXirbNUV1OrsR28s..., \
guest-visible-workarounds=AA==, \

See the QAPI schema updates included in this patch for more usage
details.

In some cases these blobs may be up to 4096 characters, but this is
generally well below the default limit for linux hosts where
command-line sizes are defined by the sysconf-configurable ARG_MAX
value, which defaults to 2097152 characters for Ubuntu hosts, for
example.

Signed-off-by: Brijesh Singh 
Co-developed-by: Michael Roth 
Acked-by: Markus Armbruster  (for QAPI schema)
Signed-off-by: Michael Roth 
Co-developed-by: Pankaj Gupta 
Signed-off-by: Pankaj Gupta 
---
 docs/system/i386/amd-memory-encryption.rst |  70 +-
 qapi/qom.json  |  57 +
 target/i386/sev.c  | 257 +
 target/i386/sev.h  |   1 +
 4 files changed, 383 insertions(+), 2 deletions(-)

diff --git a/docs/system/i386/amd-memory-encryption.rst b/docs/system/i386/amd-memory-encryption.rst
index e9bc142bc1..5849ec8972 100644
--- a/docs/system/i386/amd-memory-encryption.rst
+++ b/docs/system/i386/amd-memory-encryption.rst
@@ -25,8 +25,8 @@ support for notifying a guest's operating system when certain types of VMEXITs
 are about to occur. This allows the guest to selectively share information with
 the hypervisor to satisfy the requested function.
 
-Launching
----------
+Launching (SEV and SEV-ES)
+--------------------------
 
 Boot images (such as bios) must be encrypted before a guest can be booted. The
 ``MEMORY_ENCRYPT_OP`` ioctl provides commands to encrypt the images: ``LAUNCH_START``,
@@ -161,6 +161,72 @@ The value of GCTX.LD is
 If kernel hashes are not used, or SEV-ES is disabled, use empty blobs for
 ``kernel_hashes_blob`` and ``vmsas_blob`` as needed.
 
+Launching (SEV-SNP)
+-------------------
+Boot images (such as bios) must be encrypted before a guest can be booted. The
+``MEMORY_ENCRYPT_OP`` ioctl provides commands to encrypt the images:
+``SNP_LAUNCH_START``, ``SNP_LAUNCH_UPDATE``, and ``SNP_LAUNCH_FINISH``. These
+three commands communicate with SEV-SNP firmware to generate a fresh memory
+encryption key for the VM and encrypt the boot images for a successful launch. For
+more details on the SEV-SNP firmware interfaces used by these commands please
+see the SEV-SNP Firmware ABI.
+
+``SNP_LAUNCH_START`` is called first to create a cryptographic launch context
+within the firmware. To create this context, the guest owner must provide a
+guest policy and other parameters as described in the SEV-SNP firmware
+specification. The launch parameters should be specified as described in the
+QAPI schema for the sev-snp-guest object.
+
+``SNP_LAUNCH_START`` uses the following parameters, which can be configured
+through the corresponding properties documented in the QAPI schema for the
+'sev-snp-guest' object.
+
+++---+--+-+
+| key   | type  | default  | meaning  |
++---+-+
+| policy| hex   | 0x3  | a 64-bit guest policy|
++---+-+
+| guest-visible-workarounds | string| 0| 16-byte base64 encoded string|
+|   |   |  | for guest OS visible |
+|   |   |  | workarounds. |
++---+-+
+
+``SNP_LAUNCH_UPDATE`` encrypts the memory region using the cryptographic context
+created via the ``SNP_LAUNCH_START`` command. If required, this command can be
+called multiple times to encrypt different memory regions. The command also
+calculates the measurement of the memory contents as it encrypts.
+
+``SNP_LAUNCH_FINISH`` finalizes the guest launch flow. Optionally, while
+finalizing the launch the firmware can perform checks on the launch digest
+computed through ``SNP_LAUNCH_UPDATE``. To perform the check the user must
+supply the id block, authentication blob and host data that should be included
+in the attestation report. See t

[PATCH v4 17/31] i386/sev: Set CPU state to protected once SNP guest payload is finalized

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

Once KVM_SNP_LAUNCH_FINISH is called the vCPU state is copied into the
vCPU's VMSA page and measured/encrypted. Any attempt to read/write CPU
state afterward will only be acting on the initial data and so are
effectively no-ops.

Set the vCPU state to protected at this point so that QEMU doesn't
continue trying to re-sync vCPU data during guest runtime.

Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index cd47c195cd..2ca9a86bf3 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1011,6 +1011,7 @@ sev_snp_launch_finish(SevCommonState *sev_common)
 exit(1);
 }
 
+kvm_mark_guest_state_protected();
 sev_set_guest_state(sev_common, SEV_STATE_RUNNING);
 
 /* add migration blocker */
-- 
2.34.1




[PATCH v4 19/31] i386/sev: Add support for populating OVMF metadata pages

2024-05-30 Thread Pankaj Gupta
From: Brijesh Singh 

OVMF reserves various pages so they can be pre-initialized/validated
prior to launching the guest. Add support for populating these pages
with the expected content.

Signed-off-by: Brijesh Singh 
Signed-off-by: Michael Roth 
Co-developed-by: Pankaj Gupta 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 74 +++
 1 file changed, 74 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index d9d1d97f0c..504f641038 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1016,15 +1016,89 @@ sev_launch_finish(SevCommonState *sev_common)
 migrate_add_blocker(&sev_mig_blocker, &error_fatal);
 }
 
+static int
+snp_launch_update_data(uint64_t gpa, void *hva, uint32_t len, int type)
+{
+SevLaunchUpdateData *data;
+
+data = g_new0(SevLaunchUpdateData, 1);
+data->gpa = gpa;
+data->hva = hva;
+data->len = len;
+data->type = type;
+
+QTAILQ_INSERT_TAIL(&launch_update, data, next);
+
+return 0;
+}
+
+static int
+snp_metadata_desc_to_page_type(int desc_type)
+{
+switch (desc_type) {
+/* Add the unmeasured pre-validated pages as zero pages */
+case SEV_DESC_TYPE_SNP_SEC_MEM: return KVM_SEV_SNP_PAGE_TYPE_ZERO;
+case SEV_DESC_TYPE_SNP_SECRETS: return KVM_SEV_SNP_PAGE_TYPE_SECRETS;
+case SEV_DESC_TYPE_CPUID: return KVM_SEV_SNP_PAGE_TYPE_CPUID;
+default:
+ return KVM_SEV_SNP_PAGE_TYPE_ZERO;
+}
+}
+
+static void
+snp_populate_metadata_pages(SevSnpGuestState *sev_snp,
+OvmfSevMetadata *metadata)
+{
+OvmfSevMetadataDesc *desc;
+int type, ret, i;
+void *hva;
+MemoryRegion *mr = NULL;
+
+for (i = 0; i < metadata->num_desc; i++) {
+desc = &metadata->descs[i];
+
+type = snp_metadata_desc_to_page_type(desc->type);
+
+hva = gpa2hva(&mr, desc->base, desc->len, NULL);
+if (!hva) {
+error_report("%s: Failed to get HVA for GPA 0x%x sz 0x%x",
+ __func__, desc->base, desc->len);
+exit(1);
+}
+
+ret = snp_launch_update_data(desc->base, hva, desc->len, type);
+if (ret) {
+error_report("%s: Failed to add metadata page gpa 0x%x+%x type %d",
+ __func__, desc->base, desc->len, desc->type);
+exit(1);
+}
+}
+}
+
 static void
 sev_snp_launch_finish(SevCommonState *sev_common)
 {
 int ret, error;
 Error *local_err = NULL;
+OvmfSevMetadata *metadata;
 SevLaunchUpdateData *data;
 SevSnpGuestState *sev_snp = SEV_SNP_GUEST(sev_common);
 struct kvm_sev_snp_launch_finish *finish = &sev_snp->kvm_finish_conf;
 
+/*
+ * To boot the SNP guest, the hypervisor is required to populate the CPUID
+ * and Secrets page before finalizing the launch flow. The location of
+ * the secrets and CPUID page is available through the OVMF metadata GUID.
+ */
+metadata = pc_system_get_ovmf_sev_metadata_ptr();
+if (metadata == NULL) {
+error_report("%s: Failed to locate SEV metadata header", __func__);
+exit(1);
+}
+
+/* Populate all the metadata pages */
+snp_populate_metadata_pages(sev_snp, metadata);
+
 QTAILQ_FOREACH(data, &launch_update, next) {
 ret = sev_snp_launch_update(sev_snp, data);
 if (ret) {
-- 
2.34.1




[PATCH v4 25/31] i386/sev: Invoke launch_update_data() for SEV class

2024-05-30 Thread Pankaj Gupta
Add a launch_update_data() hook to SevCommonStateClass and
invoke it as sev_launch_update_data() for the SEV object.

Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index c5c703bc8d..7a0c2ee10f 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -102,6 +102,7 @@ struct SevCommonStateClass {
 /* public */
 int (*launch_start)(SevCommonState *sev_common);
 void (*launch_finish)(SevCommonState *sev_common);
+int (*launch_update_data)(hwaddr gpa, uint8_t *ptr, uint64_t len);
 int (*kvm_init)(ConfidentialGuestSupport *cgs, Error **errp);
 };
 
@@ -945,10 +946,11 @@ out:
 }
 
 static int
-sev_launch_update_data(SevGuestState *sev_guest, uint8_t *addr, uint64_t len)
+sev_launch_update_data(hwaddr gpa, uint8_t *addr, uint64_t len)
 {
 int ret, fw_error;
 struct kvm_sev_launch_update_data update;
+SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
 
 if (!addr || !len) {
 return 1;
@@ -957,7 +959,7 @@ sev_launch_update_data(SevGuestState *sev_guest, uint8_t *addr, uint64_t len)
 update.uaddr = (uintptr_t)addr;
 update.len = len;
 trace_kvm_sev_launch_update_data(addr, len);
-ret = sev_ioctl(SEV_COMMON(sev_guest)->sev_fd, KVM_SEV_LAUNCH_UPDATE_DATA,
+ret = sev_ioctl(sev_common->sev_fd, KVM_SEV_LAUNCH_UPDATE_DATA,
 &update, &fw_error);
 if (ret) {
 error_report("%s: LAUNCH_UPDATE ret=%d fw_error=%d '%s'",
@@ -1525,6 +1527,7 @@ int
 sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp)
 {
 SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
+SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(sev_common);
 
 if (!sev_common) {
 return 0;
@@ -1534,12 +1537,7 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp)
 if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) {
 int ret;
 
-if (sev_snp_enabled()) {
-ret = snp_launch_update_data(gpa, ptr, len,
- KVM_SEV_SNP_PAGE_TYPE_NORMAL);
-} else {
-ret = sev_launch_update_data(SEV_GUEST(sev_common), ptr, len);
-}
+ret = klass->launch_update_data(gpa, ptr, len);
 if (ret < 0) {
 error_setg(errp, "SEV: Failed to encrypt pflash rom");
 return ret;
@@ -2039,6 +2037,7 @@ sev_guest_class_init(ObjectClass *oc, void *data)
 
 klass->launch_start = sev_launch_start;
 klass->launch_finish = sev_launch_finish;
+klass->launch_update_data = sev_launch_update_data;
 klass->kvm_init = sev_kvm_init;
 x86_klass->kvm_type = sev_kvm_type;
 
-- 
2.34.1




[PATCH v4 30/31] i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

KVM_HC_MAP_GPA_RANGE will be used to send requests to userspace for
private/shared memory attribute updates requested by the guest.
Implement handling for that use-case along with some basic
infrastructure for enabling specific hypercall events.

Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 target/i386/kvm/kvm.c| 55 
 target/i386/kvm/kvm_i386.h   |  1 +
 target/i386/kvm/trace-events |  1 +
 3 files changed, 57 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6c864e4611..e72c295f77 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -21,6 +21,7 @@
 #include 
 
 #include <linux/kvm.h>
+#include <linux/kvm_para.h>
 #include "standard-headers/asm-x86/kvm_para.h"
 #include "hw/xen/interface/arch-x86/cpuid.h"
 
@@ -208,6 +209,13 @@ int kvm_get_vm_type(MachineState *ms)
 return kvm_type;
 }
 
+bool kvm_enable_hypercall(uint64_t enable_mask)
+{
+KVMState *s = KVM_STATE(current_accel());
+
+return !kvm_vm_enable_cap(s, KVM_CAP_EXIT_HYPERCALL, 0, enable_mask);
+}
+
 bool kvm_has_smm(void)
 {
 return kvm_vm_check_extension(kvm_state, KVM_CAP_X86_SMM);
@@ -5321,6 +5329,50 @@ static bool host_supports_vmx(void)
 return ecx & CPUID_EXT_VMX;
 }
 
+/*
+ * Currently the handling here only supports use of KVM_HC_MAP_GPA_RANGE
+ * to service guest-initiated memory attribute update requests so that
+ * KVM_SET_MEMORY_ATTRIBUTES can update whether or not a page should be
+ * backed by the private memory pool provided by guest_memfd, and as such
+ * is only applicable to guest_memfd-backed guests (e.g. SNP/TDX).
+ *
+ * Other use-cases for KVM_HC_MAP_GPA_RANGE, such as for SEV live
+ * migration, are not implemented here currently.
+ *
+ * For the guest_memfd use-case, these exits will generally be synthesized
+ * by KVM based on platform-specific hypercalls, like GHCB requests in the
+ * case of SEV-SNP, and not issued directly within the guest through the
+ * KVM_HC_MAP_GPA_RANGE hypercall. So in this case, KVM_HC_MAP_GPA_RANGE is
+ * not actually advertised to guests via the KVM CPUID feature bit, as
+ * opposed to SEV live migration where it would be. Since it is unlikely the
+ * SEV live migration use-case would be useful for guest-memfd backed guests,
+ * because private/shared page tracking is already provided through other
+ * means, these 2 use-cases should be treated as being mutually-exclusive.
+ */
+static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
+{
+uint64_t gpa, size, attributes;
+
+if (!machine_require_guest_memfd(current_machine)) {
+return -EINVAL;
+}
+
+gpa = run->hypercall.args[0];
+size = run->hypercall.args[1] * TARGET_PAGE_SIZE;
+attributes = run->hypercall.args[2];
+
+trace_kvm_hc_map_gpa_range(gpa, size, attributes, run->hypercall.flags);
+
+return kvm_convert_memory(gpa, size, attributes & KVM_MAP_GPA_RANGE_ENCRYPTED);
+}
+
+static int kvm_handle_hypercall(struct kvm_run *run)
+{
+if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) {
+return kvm_handle_hc_map_gpa_range(run);
+}
+
+return -EINVAL;
+}
+
 #define VMX_INVALID_GUEST_STATE 0x8021
 
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
@@ -5416,6 +5468,9 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = kvm_xen_handle_exit(cpu, &run->xen);
 break;
 #endif
+case KVM_EXIT_HYPERCALL:
+ret = kvm_handle_hypercall(run);
+break;
 default:
 fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
 ret = -1;
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 6b44844d95..34fc60774b 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -33,6 +33,7 @@
 bool kvm_has_smm(void);
 bool kvm_enable_x2apic(void);
 bool kvm_hv_vpindex_settable(void);
+bool kvm_enable_hypercall(uint64_t enable_mask);
 
 bool kvm_enable_sgx_provisioning(KVMState *s);
 bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp);
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index b365a8e8e2..74a6234ff7 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -5,6 +5,7 @@ kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap 
interrupt for GSI %"
 kvm_x86_add_msi_route(int virq) "Adding route entry for virq %d"
 kvm_x86_remove_msi_route(int virq) "Removing route entry for virq %d"
 kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
+kvm_hc_map_gpa_range(uint64_t gpa, uint64_t size, uint64_t attributes, uint64_t flags) "gpa 0x%" PRIx64 " size 0x%" PRIx64 " attributes 0x%" PRIx64 " flags 0x%" PRIx64
 
 # xen-emu.c
 kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
-- 
2.34.1




[PATCH v4 03/31] memory: Introduce memory_region_init_ram_guest_memfd()

2024-05-30 Thread Pankaj Gupta
From: Xiaoyao Li 

Introduce memory_region_init_ram_guest_memfd() to allocate a private
guest memfd at MemoryRegion initialization. It's for the use case of
TDVF, which must be private in the TDX case.

Signed-off-by: Xiaoyao Li 
Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 include/exec/memory.h |  6 ++
 system/memory.c   | 24 
 2 files changed, 30 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9cdd64e9c6..1be58f694c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1638,6 +1638,12 @@ bool memory_region_init_ram(MemoryRegion *mr,
 uint64_t size,
 Error **errp);
 
+bool memory_region_init_ram_guest_memfd(MemoryRegion *mr,
+Object *owner,
+const char *name,
+uint64_t size,
+Error **errp);
+
 /**
  * memory_region_init_rom: Initialize a ROM memory region.
  *
diff --git a/system/memory.c b/system/memory.c
index 9540caa8a1..74cd73ebc7 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -3649,6 +3649,30 @@ bool memory_region_init_ram(MemoryRegion *mr,
 return true;
 }
 
+bool memory_region_init_ram_guest_memfd(MemoryRegion *mr,
+Object *owner,
+const char *name,
+uint64_t size,
+Error **errp)
+{
+DeviceState *owner_dev;
+
+if (!memory_region_init_ram_flags_nomigrate(mr, owner, name, size,
+RAM_GUEST_MEMFD, errp)) {
+return false;
+}
+/* This will assert if owner is neither NULL nor a DeviceState.
+ * We only want the owner here for the purposes of defining a
+ * unique name for migration. TODO: Ideally we should implement
+ * a naming scheme for Objects which are not DeviceStates, in
+ * which case we can relax this restriction.
+ */
+owner_dev = DEVICE(owner);
+vmstate_register_ram(mr, owner_dev);
+
+return true;
+}
+
 bool memory_region_init_rom(MemoryRegion *mr,
 Object *owner,
 const char *name,
-- 
2.34.1




[PATCH v4 20/31] i386/sev: Add support for SNP CPUID validation

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

SEV-SNP firmware allows a special guest page to be populated with a
table of guest CPUID values so that they can be validated through
firmware before being loaded into encrypted guest memory where they can
be used in place of hypervisor-provided values[1].

As part of SEV-SNP guest initialization, use this interface to validate
the CPUID entries reported by KVM_GET_CPUID2 prior to initial guest
start and populate the CPUID page reserved by OVMF with the resulting
encrypted data.

[1] SEV SNP Firmware ABI Specification, Rev. 0.8, 8.13.2.6

Signed-off-by: Michael Roth 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 164 +-
 1 file changed, 162 insertions(+), 2 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 504f641038..4388ffe867 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -214,6 +214,36 @@ static const char *const sev_fw_errlist[] = {
 
 #define SEV_FW_MAX_ERROR  ARRAY_SIZE(sev_fw_errlist)
 
+/* <linux/kvm.h> doesn't expose this, so re-use the max from kvm.c */
+#define KVM_MAX_CPUID_ENTRIES 100
+
+typedef struct KvmCpuidInfo {
+struct kvm_cpuid2 cpuid;
+struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
+} KvmCpuidInfo;
+
+#define SNP_CPUID_FUNCTION_MAXCOUNT 64
+#define SNP_CPUID_FUNCTION_UNKNOWN 0xFFFFFFFF
+
+typedef struct {
+uint32_t eax_in;
+uint32_t ecx_in;
+uint64_t xcr0_in;
+uint64_t xss_in;
+uint32_t eax;
+uint32_t ebx;
+uint32_t ecx;
+uint32_t edx;
+uint64_t reserved;
+} __attribute__((packed)) SnpCpuidFunc;
+
+typedef struct {
+uint32_t count;
+uint32_t reserved1;
+uint64_t reserved2;
+SnpCpuidFunc entries[SNP_CPUID_FUNCTION_MAXCOUNT];
+} __attribute__((packed)) SnpCpuidInfo;
+
 static int
 sev_ioctl(int fd, int cmd, void *data, int *error)
 {
@@ -801,6 +831,35 @@ out:
 return ret;
 }
 
+static void
+sev_snp_cpuid_report_mismatches(SnpCpuidInfo *old,
+SnpCpuidInfo *new)
+{
+size_t i;
+
+if (old->count != new->count) {
+error_report("SEV-SNP: CPUID validation failed due to count mismatch, "
+ "provided: %d, expected: %d", old->count, new->count);
+return;
+}
+
+for (i = 0; i < old->count; i++) {
+SnpCpuidFunc *old_func, *new_func;
+
+old_func = &old->entries[i];
+new_func = &new->entries[i];
+
+if (memcmp(old_func, new_func, sizeof(SnpCpuidFunc))) {
+error_report("SEV-SNP: CPUID validation failed for function 0x%x, index: 0x%x, "
+ "provided: eax:0x%08x, ebx: 0x%08x, ecx: 0x%08x, edx: 0x%08x, "
+ "expected: eax:0x%08x, ebx: 0x%08x, ecx: 0x%08x, edx: 0x%08x",
+ old_func->eax_in, old_func->ecx_in,
+ old_func->eax, old_func->ebx, old_func->ecx, old_func->edx,
+ new_func->eax, new_func->ebx, new_func->ecx, new_func->edx);
+}
+}
+}
+
 static const char *
 snp_page_type_to_str(int type)
 {
@@ -819,6 +878,7 @@ sev_snp_launch_update(SevSnpGuestState *sev_snp_guest,
   SevLaunchUpdateData *data)
 {
 int ret, fw_error;
+SnpCpuidInfo snp_cpuid_info;
 struct kvm_sev_snp_launch_update update = {0};
 
 if (!data->hva || !data->len) {
@@ -828,6 +888,11 @@ sev_snp_launch_update(SevSnpGuestState *sev_snp_guest,
 return 1;
 }
 
+if (data->type == KVM_SEV_SNP_PAGE_TYPE_CPUID) {
+/* Save a copy for comparison in case the LAUNCH_UPDATE fails */
+memcpy(&snp_cpuid_info, data->hva, sizeof(snp_cpuid_info));
+}
+
 update.uaddr = (__u64)(unsigned long)data->hva;
 update.gfn_start = data->gpa >> TARGET_PAGE_BITS;
 update.len = data->len;
@@ -855,6 +920,11 @@ sev_snp_launch_update(SevSnpGuestState *sev_snp_guest,
 if (ret && ret != -EAGAIN) {
 error_report("SNP_LAUNCH_UPDATE ret=%d fw_error=%d '%s'",
  ret, fw_error, fw_error_to_str(fw_error));
+
+if (data->type == KVM_SEV_SNP_PAGE_TYPE_CPUID) {
+sev_snp_cpuid_report_mismatches(&snp_cpuid_info, data->hva);
+error_report("SEV-SNP: failed to update CPUID page");
+}
 break;
 }
 }
@@ -1017,7 +1087,8 @@ sev_launch_finish(SevCommonState *sev_common)
 }
 
 static int
-snp_launch_update_data(uint64_t gpa, void *hva, uint32_t len, int type)
+snp_launch_update_data(uint64_t gpa, void *hva,
+   uint32_t len, int type)
 {
 SevLaunchUpdateData *data;
 
@@ -1032,6 +1103,90 @@ snp_launch_update_data(uint64_t gpa, void *hva, uint32_t len, int type)
 return 0;
 }
 
+static int
+sev_snp_cpuid_info_fill(SnpCpuidInfo *snp_cpuid_info,
+   

[PATCH v4 05/31] i386/sev: Move sev_launch_start to separate class method

2024-05-30 Thread Pankaj Gupta
When sev-snp-guest objects are introduced there will be a number of
differences in how the launch process is handled compared to the existing
sev-guest object. Move sev_launch_start() to a class method to make it
easier to implement SNP-specific launch start functionality later.

Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 79eb21c7d0..3bdb88f2ed 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -69,6 +69,8 @@ struct SevCommonState {
 struct SevCommonStateClass {
 X86ConfidentialGuestClass parent_class;
 
+/* public */
+int (*launch_start)(SevCommonState *sev_common);
 };
 
 /**
@@ -636,16 +638,16 @@ sev_read_file_base64(const char *filename, guchar **data, gsize *len)
 }
 
 static int
-sev_launch_start(SevGuestState *sev_guest)
+sev_launch_start(SevCommonState *sev_common)
 {
 gsize sz;
 int ret = 1;
 int fw_error, rc;
+SevGuestState *sev_guest = SEV_GUEST(sev_common);
 struct kvm_sev_launch_start start = {
 .handle = sev_guest->handle, .policy = sev_guest->policy
 };
 guchar *session = NULL, *dh_cert = NULL;
-SevCommonState *sev_common = SEV_COMMON(sev_guest);
 
 if (sev_guest->session_file) {
 if (sev_read_file_base64(sev_guest->session_file, &session, &sz) < 0) {
@@ -866,6 +868,7 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 uint32_t ebx;
 uint32_t host_cbitpos;
 struct sev_user_data_status status = {};
+SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(cgs);
 
 ret = ram_block_discard_disable(true);
 if (ret) {
@@ -956,7 +959,7 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 goto err;
 }
 
-sev_launch_start(SEV_GUEST(sev_common));
+ret = klass->launch_start(sev_common);
 if (ret) {
 error_setg(errp, "%s: failed to create encryption context", __func__);
 goto err;
@@ -1455,6 +1458,10 @@ static void sev_guest_set_legacy_vm_type(Object *obj, bool value, Error **errp)
 static void
 sev_guest_class_init(ObjectClass *oc, void *data)
 {
+SevCommonStateClass *klass = SEV_COMMON_CLASS(oc);
+
+klass->launch_start = sev_launch_start;
+
 object_class_property_add_str(oc, "dh-cert-file",
   sev_guest_get_dh_cert_file,
   sev_guest_set_dh_cert_file);
-- 
2.34.1




[PATCH v4 06/31] i386/sev: Move sev_launch_finish to separate class method

2024-05-30 Thread Pankaj Gupta
When sev-snp-guest objects are introduced there will be a number of
differences in how the launch finish is handled compared to the existing
sev-guest object. Move sev_launch_finish() to a class method to make it
easier to implement SNP-specific launch finish functionality later.

Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 3bdb88f2ed..c141f4fed4 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -71,6 +71,7 @@ struct SevCommonStateClass {
 
 /* public */
 int (*launch_start)(SevCommonState *sev_common);
+void (*launch_finish)(SevCommonState *sev_common);
 };
 
 /**
@@ -805,12 +806,12 @@ static Notifier sev_machine_done_notify = {
 };
 
 static void
-sev_launch_finish(SevGuestState *sev_guest)
+sev_launch_finish(SevCommonState *sev_common)
 {
 int ret, error;
 
 trace_kvm_sev_launch_finish();
-ret = sev_ioctl(SEV_COMMON(sev_guest)->sev_fd, KVM_SEV_LAUNCH_FINISH, 0,
+ret = sev_ioctl(sev_common->sev_fd, KVM_SEV_LAUNCH_FINISH, 0,
 &error);
 if (ret) {
 error_report("%s: LAUNCH_FINISH ret=%d fw_error=%d '%s'",
@@ -818,7 +819,7 @@ sev_launch_finish(SevGuestState *sev_guest)
 exit(1);
 }
 
-sev_set_guest_state(SEV_COMMON(sev_guest), SEV_STATE_RUNNING);
+sev_set_guest_state(sev_common, SEV_STATE_RUNNING);
 
 /* add migration blocker */
 error_setg(&sev_mig_blocker,
@@ -830,10 +831,11 @@ static void
 sev_vm_state_change(void *opaque, bool running, RunState state)
 {
 SevCommonState *sev_common = opaque;
+SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(opaque);
 
 if (running) {
 if (!sev_check_state(sev_common, SEV_STATE_RUNNING)) {
-sev_launch_finish(SEV_GUEST(sev_common));
+klass->launch_finish(sev_common);
 }
 }
 }
@@ -1461,6 +1463,7 @@ sev_guest_class_init(ObjectClass *oc, void *data)
 SevCommonStateClass *klass = SEV_COMMON_CLASS(oc);
 
 klass->launch_start = sev_launch_start;
+klass->launch_finish = sev_launch_finish;
 
 object_class_property_add_str(oc, "dh-cert-file",
   sev_guest_get_dh_cert_file,
-- 
2.34.1




[PATCH v4 16/31] i386/sev: Add handling to encrypt/finalize guest launch data

2024-05-30 Thread Pankaj Gupta
From: Brijesh Singh 

Process any queued up launch data and encrypt/measure it into the SNP
guest instance prior to initial guest launch.

This also updates the KVM_SEV_SNP_LAUNCH_UPDATE call to handle partial
update responses.

Signed-off-by: Brijesh Singh 
Co-developed-by: Michael Roth 
Signed-off-by: Michael Roth 
Co-developed-by: Pankaj Gupta 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c| 112 ++-
 target/i386/trace-events |   2 +
 2 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index acbb22ff15..cd47c195cd 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -770,6 +770,76 @@ out:
 return ret;
 }
 
+static const char *
+snp_page_type_to_str(int type)
+{
+switch (type) {
+case KVM_SEV_SNP_PAGE_TYPE_NORMAL: return "Normal";
+case KVM_SEV_SNP_PAGE_TYPE_ZERO: return "Zero";
+case KVM_SEV_SNP_PAGE_TYPE_UNMEASURED: return "Unmeasured";
+case KVM_SEV_SNP_PAGE_TYPE_SECRETS: return "Secrets";
+case KVM_SEV_SNP_PAGE_TYPE_CPUID: return "Cpuid";
+default: return "unknown";
+}
+}
+
+static int
+sev_snp_launch_update(SevSnpGuestState *sev_snp_guest,
+  SevLaunchUpdateData *data)
+{
+int ret, fw_error;
+struct kvm_sev_snp_launch_update update = {0};
+
+if (!data->hva || !data->len) {
+error_report("SNP_LAUNCH_UPDATE called with invalid address"
+ "/ length: %p / %lx",
+ data->hva, data->len);
+return 1;
+}
+
+update.uaddr = (__u64)(unsigned long)data->hva;
+update.gfn_start = data->gpa >> TARGET_PAGE_BITS;
+update.len = data->len;
+update.type = data->type;
+
+/*
+ * KVM_SEV_SNP_LAUNCH_UPDATE requires that GPA ranges have the private
+ * memory attribute set in advance.
+ */
+ret = kvm_set_memory_attributes_private(data->gpa, data->len);
+if (ret) {
+error_report("SEV-SNP: failed to configure initial "
+ "private guest memory");
+goto out;
+}
+
+while (update.len || ret == -EAGAIN) {
+trace_kvm_sev_snp_launch_update(update.uaddr, update.gfn_start <<
+TARGET_PAGE_BITS, update.len,
+snp_page_type_to_str(update.type));
+
+ret = sev_ioctl(SEV_COMMON(sev_snp_guest)->sev_fd,
+KVM_SEV_SNP_LAUNCH_UPDATE,
+&update, &fw_error);
+if (ret && ret != -EAGAIN) {
+error_report("SNP_LAUNCH_UPDATE ret=%d fw_error=%d '%s'",
+ ret, fw_error, fw_error_to_str(fw_error));
+break;
+}
+}
+
+out:
+if (!ret && update.gfn_start << TARGET_PAGE_BITS != data->gpa + data->len) {
+error_report("SEV-SNP: expected update of GPA range %lx-%lx, "
+ "got GPA range %lx-%llx",
+ data->gpa, data->gpa + data->len, data->gpa,
+ update.gfn_start << TARGET_PAGE_BITS);
+ret = -EIO;
+}
+
+return ret;
+}
+
 static int
 sev_launch_update_data(SevGuestState *sev_guest, uint8_t *addr, uint64_t len)
 {
@@ -915,6 +985,46 @@ sev_launch_finish(SevCommonState *sev_common)
 migrate_add_blocker(&sev_mig_blocker, &error_fatal);
 }
 
+static void
+sev_snp_launch_finish(SevCommonState *sev_common)
+{
+int ret, error;
+Error *local_err = NULL;
+SevLaunchUpdateData *data;
+SevSnpGuestState *sev_snp = SEV_SNP_GUEST(sev_common);
+struct kvm_sev_snp_launch_finish *finish = &sev_snp->kvm_finish_conf;
+
+QTAILQ_FOREACH(data, &launch_update, next) {
+ret = sev_snp_launch_update(sev_snp, data);
+if (ret) {
+exit(1);
+}
+}
+
+trace_kvm_sev_snp_launch_finish(sev_snp->id_block, sev_snp->id_auth,
+sev_snp->host_data);
+ret = sev_ioctl(sev_common->sev_fd, KVM_SEV_SNP_LAUNCH_FINISH,
+finish, &error);
+if (ret) {
+error_report("SNP_LAUNCH_FINISH ret=%d fw_error=%d '%s'",
+ ret, error, fw_error_to_str(error));
+exit(1);
+}
+
+sev_set_guest_state(sev_common, SEV_STATE_RUNNING);
+
+/* add migration blocker */
+error_setg(&sev_mig_blocker,
+   "SEV-SNP: Migration is not implemented");
+ret = migrate_add_blocker(&sev_mig_blocker, &local_err);
+if (local_err) {
+error_report_err(local_err);
+error_free(sev_mig_blocker);
+exit(1);
+}
+}
+
+
 static void
 sev_vm_state_change(void *opaque, bool running, RunState state)
 {
@@ -1849,10 +1959,10 @@ se

[PATCH v4 27/31] hw/i386/sev: Use guest_memfd for legacy ROMs

2024-05-30 Thread Pankaj Gupta
From: Michael Roth 

Current SNP guest kernels will attempt to access these regions with
the C-bit set, so guest_memfd is needed to handle that. Otherwise,
kvm_convert_memory() will fail when the guest kernel tries to access it
and QEMU attempts to call KVM_SET_MEMORY_ATTRIBUTES to set these ranges
to private.

Whether guests should actually try to access ROM regions in this way (or
need to deal with legacy ROM regions at all), is a separate issue to be
addressed on kernel side, but current SNP guest kernels will exhibit
this behavior and so this handling is needed to allow QEMU to continue
running existing SNP guest kernels.

Signed-off-by: Michael Roth 
[pankaj: Added sev_snp_enabled() check]
Signed-off-by: Pankaj Gupta 
---
 hw/i386/pc.c   | 14 ++
 hw/i386/pc_sysfw.c | 13 ++---
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7b638da7aa..62c25ea1e9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -62,6 +62,7 @@
 #include "hw/mem/memory-device.h"
 #include "e820_memory_layout.h"
 #include "trace.h"
+#include "sev.h"
 #include CONFIG_DEVICES
 
 #ifdef CONFIG_XEN_EMU
@@ -1022,10 +1023,15 @@ void pc_memory_init(PCMachineState *pcms,
 pc_system_firmware_init(pcms, rom_memory);
 
 option_rom_mr = g_malloc(sizeof(*option_rom_mr));
-memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
-   &error_fatal);
-if (pcmc->pci_enabled) {
-memory_region_set_readonly(option_rom_mr, true);
+if (sev_snp_enabled()) {
+memory_region_init_ram_guest_memfd(option_rom_mr, NULL, "pc.rom",
+   PC_ROM_SIZE, &error_fatal);
+} else {
+memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
+   &error_fatal);
+if (pcmc->pci_enabled) {
+memory_region_set_readonly(option_rom_mr, true);
+}
 }
 memory_region_add_subregion_overlap(rom_memory,
 PC_ROM_MIN_VGA,
diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index 00464afcb4..def77a442d 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -51,8 +51,13 @@ static void pc_isa_bios_init(MemoryRegion *isa_bios, MemoryRegion *rom_memory,
 
 /* map the last 128KB of the BIOS in ISA space */
 isa_bios_size = MIN(flash_size, 128 * KiB);
-memory_region_init_ram(isa_bios, NULL, "isa-bios", isa_bios_size,
-   &error_fatal);
+if (sev_snp_enabled()) {
+memory_region_init_ram_guest_memfd(isa_bios, NULL, "isa-bios",
+   isa_bios_size, &error_fatal);
+} else {
+memory_region_init_ram(isa_bios, NULL, "isa-bios", isa_bios_size,
+   &error_fatal);
+}
 memory_region_add_subregion_overlap(rom_memory,
 0x10 - isa_bios_size,
 isa_bios,
@@ -65,7 +70,9 @@ static void pc_isa_bios_init(MemoryRegion *isa_bios, MemoryRegion *rom_memory,
((uint8_t*)flash_ptr) + (flash_size - isa_bios_size),
isa_bios_size);
 
-memory_region_set_readonly(isa_bios, true);
+if (!machine_require_guest_memfd(current_machine)) {
+memory_region_set_readonly(isa_bios, true);
+}
 }
 
 static PFlashCFI01 *pc_pflash_create(PCMachineState *pcms,
-- 
2.34.1




[PATCH v4 26/31] i386/sev: Invoke launch_update_data() for SNP class

2024-05-30 Thread Pankaj Gupta
Invoke launch_update_data() as sev_snp_launch_update_data() for the SNP object.

Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 7a0c2ee10f..7d2f67e2f3 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1108,6 +1108,14 @@ snp_launch_update_data(uint64_t gpa, void *hva,
 return 0;
 }
 
+static int
+sev_snp_launch_update_data(hwaddr gpa, uint8_t *ptr, uint64_t len)
+{
+   int ret = snp_launch_update_data(gpa, ptr, len,
+ KVM_SEV_SNP_PAGE_TYPE_NORMAL);
+   return ret;
+}
+
 static int
 sev_snp_cpuid_info_fill(SnpCpuidInfo *snp_cpuid_info,
 const KvmCpuidInfo *kvm_cpuid_info)
@@ -2282,6 +2290,7 @@ sev_snp_guest_class_init(ObjectClass *oc, void *data)
 
 klass->launch_start = sev_snp_launch_start;
 klass->launch_finish = sev_snp_launch_finish;
+klass->launch_update_data = sev_snp_launch_update_data;
 klass->kvm_init = sev_snp_kvm_init;
 x86_klass->kvm_type = sev_snp_kvm_type;
 
-- 
2.34.1




[PATCH v4 10/31] i386/sev: Add snp_kvm_init() override for SNP class

2024-05-30 Thread Pankaj Gupta
SNP does not support SMM and requires guest_memfd for
private guest memory, so add SNP specific kvm_init()
functionality in snp_kvm_init() class method.

Signed-off-by: Michael Roth 
Co-developed-by: Pankaj Gupta 
Signed-off-by: Pankaj Gupta 
---
 target/i386/sev.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 2a9a77a2d9..56c1cce8e7 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -893,12 +893,12 @@ out:
 
 static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
-SevCommonState *sev_common = SEV_COMMON(cgs);
 char *devname;
 int ret, fw_error, cmd;
 uint32_t ebx;
 uint32_t host_cbitpos;
 struct sev_user_data_status status = {};
+SevCommonState *sev_common = SEV_COMMON(cgs);
 SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(cgs);
 
 sev_common->state = SEV_STATE_UNINIT;
@@ -1038,6 +1038,23 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 return 0;
 }
 
+static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+X86MachineState *x86ms = X86_MACHINE(ms);
+
+if (x86ms->smm == ON_OFF_AUTO_AUTO) {
+x86ms->smm = ON_OFF_AUTO_OFF;
+} else if (x86ms->smm == ON_OFF_AUTO_ON) {
+error_setg(errp, "SEV-SNP does not support SMM.");
+ram_block_discard_disable(false);
+return -1;
+}
+ms->require_guest_memfd = true;
+
+return 0;
+}
+
 int
 sev_encrypt_flash(uint8_t *ptr, uint64_t len, Error **errp)
 {
@@ -1761,6 +1778,10 @@ sev_snp_guest_set_host_data(Object *obj, const char *value, Error **errp)
 static void
 sev_snp_guest_class_init(ObjectClass *oc, void *data)
 {
+SevCommonStateClass *klass = SEV_COMMON_CLASS(oc);
+
+klass->kvm_init = sev_snp_kvm_init;
+
 object_class_property_add(oc, "policy", "uint64",
   sev_snp_guest_get_policy,
   sev_snp_guest_set_policy, NULL, NULL);
-- 
2.34.1




Re: [RESEND PATCH] hw/mem/nvdimm: fix error message for 'unarmed' flag

2022-10-19 Thread Pankaj Gupta
> In the ACPI specification [1], the 'unarmed' bit is set when a device
> cannot accept a persistent write. This means that when a memdev is
> read-only, the 'unarmed' flag must be turned on. The logic is correct,
> just changing the error message.
>
> [1] ACPI NFIT NVDIMM Region Mapping Structure "NVDIMM State Flags" Bit 3
>
> Signed-off-by: Julia Suvorova 
> Reviewed-by: Stefan Hajnoczi 
> ---
>  hw/mem/nvdimm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> index 7c7d81..bfb76818c1 100644
> --- a/hw/mem/nvdimm.c
> +++ b/hw/mem/nvdimm.c
> @@ -149,7 +149,7 @@ static void nvdimm_prepare_memory_region(NVDIMMDevice *nvdimm, Error **errp)
>  if (!nvdimm->unarmed && memory_region_is_rom(mr)) {
>  HostMemoryBackend *hostmem = dimm->hostmem;
>
> -error_setg(errp, "'unarmed' property must be off since memdev %s "
> +error_setg(errp, "'unarmed' property must be on since memdev %s "
> "is read-only",
> object_get_canonical_path_component(OBJECT(hostmem)));
>  return;

With the suggested minor change.

Reviewed-by: Pankaj Gupta 



Re: [PATCH v12 4/8] qmp: add QMP command x-query-virtio-status

2022-02-10 Thread Pankaj Gupta
device.
> +#
> +# Since: 7.0
> +#
> +##
> +
> +{ 'struct': 'VirtioStatus',
> +  'data': { 'name': 'str',
> +'device-id': 'uint16',
> +'vhost-started': 'bool',
> +'device-endian': 'str',
> +'guest-features': 'uint64',
> +'host-features': 'uint64',
> +'backend-features': 'uint64',
> +'num-vqs': 'int',
> +'status': 'uint8',
> +'isr': 'uint8',
> +'queue-sel': 'uint16',
> +'vm-running': 'bool',
> +'broken': 'bool',
> +'disabled': 'bool',
> +'use-started': 'bool',
> +'started': 'bool',
> +'start-on-kick': 'bool',
> +'disable-legacy-check': 'bool',
> +'bus-name': 'str',
> +'use-guest-notifier-mask': 'bool',
> +'*vhost-dev': 'VhostStatus' } }
> +
> +##
> +# @x-query-virtio-status:
> +#
> +# Poll for a comprehensive status of a given virtio device
> +#
> +# @path: Canonical QOM path of the VirtIODevice
> +#
> +# Features:
> +# @unstable: This command is meant for debugging.
> +#
> +# Returns: VirtioStatus of the virtio device
> +#
> +# Since: 7.0
> +#
> +# Examples:
> +#
> +# 1. Poll for the status of virtio-crypto (no vhost-crypto active)
> +#
> +# -> { "execute": "x-query-virtio-status",
> +#  "arguments": { "path": "/machine/peripheral/crypto0/virtio-backend" }
> +#}
> +# <- { "return": {
> +#"device-endian": "little",
> +#"bus-name": "",
> +#"disable-legacy-check": false,
> +#"name": "virtio-crypto",
> +#"started": true,
> +#"device-id": 20,
> +#"backend-features": 0,
> +#"start-on-kick": false,
> +#"isr": 1,
> +#"broken": false,
> +#"status": 15,
> +#"num-vqs": 2,
> +#"guest-features": 5100273664,
> +#"host-features": 6325010432,
> +#"use-guest-notifier-mask": true,
> +#"vm-running": true,
> +#"queue-sel": 1,
> +#"disabled": false,
> +#"vhost-started": false,
> +#"use-started": true
> +#  }
> +#}
> +#
> +# 2. Poll for the status of virtio-net (vhost-net is active)
> +#
> +# -> { "execute": "x-query-virtio-status",
> +#  "arguments": { "path": 
> "/machine/peripheral-anon/device[1]/virtio-backend" }
> +#}
> +# <- { "return": {
> +#"device-endian": "little",
> +#"bus-name": "",
> +#"disabled-legacy-check": false,
> +#"name": "virtio-net",
> +#"started": true,
> +#"device-id": 1,
> +#"vhost-dev": {
> +#   "n-tmp-sections": 4,
> +#   "n-mem-sections": 4,
> +#   "max-queues": 1,
> +#   "backend-cap": 2,
> +#   "log-size": 0,
> +#   "backend-features": 0,
> +#   "nvqs": 2,
> +#   "protocol-features": 0,
> +#   "vq-index": 0,
> +#   "log-enabled": false,
> +#   "acked-features": 5100306432,
> +#   "features": 13908344832
> +#},
> +#"backend-features": 6337593319,
> +#"start-on-kick": false,
> +#"isr": 1,
> +#"broken": false,
> +#"status": 15,
> +#"num-vqs": 3,
> +#"guest-features": 5111807911,
> +#"host-features": 6337593319,
> +#"use-guest-notifier-mask": true,
> +#"vm-running": true,
> +#"queue-sel": 2,
> +#"disabled": false,
> +#"vhost-started": true,
> +#"use-started": true
> +#  }
> +#}
> +#
> +##
> +
> +{ 'command': 'x-query-virtio-status',
> +  'data': { 'path': 'str' },
> +  'returns': 'VirtioStatus',
> +  'features': [ 'unstable' ] }

Reviewed-by: Pankaj Gupta 



Re: [PATCH v12 3/8] qmp: add QMP command x-query-virtio

2022-02-10 Thread Pankaj Gupta
>  g_free(vdev->bus_name);
>  vdev->bus_name = NULL;
>  }
> @@ -3874,6 +3885,8 @@ static void virtio_device_class_init(ObjectClass *klass, void *data)
>  vdc->stop_ioeventfd = virtio_device_stop_ioeventfd_impl;
>
>  vdc->legacy_features |= VIRTIO_LEGACY_FEATURES;
> +
> +QTAILQ_INIT(&virtio_list);
>  }
>
>  bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
> @@ -3884,6 +3897,37 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
>  return virtio_bus_ioeventfd_enabled(vbus);
>  }
>
> +VirtioInfoList *qmp_x_query_virtio(Error **errp)
> +{
> +VirtioInfoList *list = NULL;
> +VirtioInfoList *node;
> +VirtIODevice *vdev;
> +
> +QTAILQ_FOREACH(vdev, &virtio_list, next) {
> +DeviceState *dev = DEVICE(vdev);
> +Error *err = NULL;
> +QObject *obj = qmp_qom_get(dev->canonical_path, "realized", &err);
> +
> +if (err == NULL) {
> +GString *is_realized = qobject_to_json_pretty(obj, true);
> +/* virtio device is NOT realized, remove it from list */
> +if (!strncmp(is_realized->str, "false", 4)) {
> +QTAILQ_REMOVE(&virtio_list, vdev, next);
> +} else {
> +node = g_new0(VirtioInfoList, 1);
> +node->value = g_new(VirtioInfo, 1);
> +node->value->path = g_strdup(dev->canonical_path);
> +node->value->name = g_strdup(vdev->name);
> +QAPI_LIST_PREPEND(list, node->value);
> +}
> +   g_string_free(is_realized, true);
> +}
> +qobject_unref(obj);
> +}
> +
> +return list;
> +}
> +
>  static const TypeInfo virtio_device_info = {
>  .name = TYPE_VIRTIO_DEVICE,
>  .parent = TYPE_DEVICE,
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 90e6080..8f4e4c1 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -110,6 +110,7 @@ struct VirtIODevice
>  bool use_guest_notifier_mask;
>  AddressSpace *dma_as;
>  QLIST_HEAD(, VirtQueue) *vector_queues;
> +QTAILQ_ENTRY(VirtIODevice) next;
>  };
>
>  struct VirtioDeviceClass {
> diff --git a/qapi/meson.build b/qapi/meson.build
> index c0c49c1..e332f28 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -48,6 +48,7 @@ qapi_all_modules = [
>'sockets',
>'trace',
>'transaction',
> +  'virtio',
>'yank',
>  ]
>  if have_system
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index 4912b97..1512ada 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -93,3 +93,4 @@
>  { 'include': 'audio.json' }
>  { 'include': 'acpi.json' }
>  { 'include': 'pci.json' }
> +{ 'include': 'virtio.json' }
> diff --git a/qapi/virtio.json b/qapi/virtio.json
> new file mode 100644
> index 000..aee0e40
> --- /dev/null
> +++ b/qapi/virtio.json
> @@ -0,0 +1,68 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +#
> +
> +##
> +# = Virtio devices
> +##
> +
> +##
> +# @VirtioInfo:
> +#
> +# Basic information about a given VirtIODevice
> +#
> +# @path: The VirtIODevice's canonical QOM path
> +#
> +# @name: Name of the VirtIODevice
> +#
> +# Since: 7.0
> +#
> +##
> +{ 'struct': 'VirtioInfo',
> +  'data': { 'path': 'str',
> +'name': 'str' } }
> +
> +##
> +# @x-query-virtio:
> +#
> +# Returns a list of all realized VirtIODevices
> +#
> +# Features:
> +# @unstable: This command is meant for debugging.
> +#
> +# Returns: List of gathered VirtIODevices
> +#
> +# Since: 7.0
> +#
> +# Example:
> +#
> +# -> { "execute": "x-query-virtio" }
> +# <- { "return": [
> +#{
> +#"path": "/machine/peripheral-anon/device[4]/virtio-backend",
> +#"name": "virtio-input"
> +#},
> +#{
> +#"path": "/machine/peripheral/crypto0/virtio-backend",
> +#"name": "virtio-crypto"
> +#},
> +#{
> +#"path": "/machine/peripheral-anon/device[2]/virtio-backend",
> +#"name": "virtio-scsi"
> +#},
> +#{
> +#"path": "/machine/peripheral-anon/device[1]/virtio-backend",
> +#"name": "virtio-net"
> +#},
> +#{
> +#"path": "/machine/peripheral-anon/device[0]/virtio-backend",
> +#"name": "virtio-serial"
> +#}
> +#  ]
> +#}
> +#
> +##
> +
> +{ 'command': 'x-query-virtio',
> +  'returns': [ 'VirtioInfo' ],
> +  'features': [ 'unstable' ] }
> diff --git a/tests/qtest/qmp-cmd-test.c b/tests/qtest/qmp-cmd-test.c
> index 7f103ea..fd00ee2 100644
> --- a/tests/qtest/qmp-cmd-test.c
> +++ b/tests/qtest/qmp-cmd-test.c
> @@ -103,6 +103,7 @@ static bool query_is_ignored(const char *cmd)
>  "query-gic-capabilities", /* arm */
>  /* Success depends on target-specific build configuration: */
>  "query-pci",  /* CONFIG_PCI */
> +"x-query-virtio", /* CONFIG_VIRTIO */
>  /* Success depends on launching SEV guest */
>  "query-sev-launch-measure",
>  /* Success depends on Host or Hypervisor SEV support */

Reviewed-by: Pankaj Gupta 

Re: [PATCH v12 1/8] virtio: drop name parameter for virtio_init()

2022-02-10 Thread Pankaj Gupta
>  return;
>  }
>
> -virtio_init(vdev, "virtio-rng", VIRTIO_ID_RNG, 0);
> +virtio_init(vdev, VIRTIO_ID_RNG, 0);
>
>  vrng->vq = virtio_add_queue(vdev, 8, handle_input);
>  vrng->quota_remaining = vrng->conf.max_bytes;
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index aae72fb..734b7fb 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -132,6 +132,56 @@ struct VirtQueue
>  QLIST_ENTRY(VirtQueue) node;
>  };
>
> +const char *virtio_device_names[] = {
> +[VIRTIO_ID_NET] = "virtio-net",
> +[VIRTIO_ID_BLOCK] = "virtio-blk",
> +[VIRTIO_ID_CONSOLE] = "virtio-serial",
> +[VIRTIO_ID_RNG] = "virtio-rng",
> +[VIRTIO_ID_BALLOON] = "virtio-balloon",
> +[VIRTIO_ID_IOMEM] = "virtio-iomem",
> +[VIRTIO_ID_RPMSG] = "virtio-rpmsg",
> +[VIRTIO_ID_SCSI] = "virtio-scsi",
> +[VIRTIO_ID_9P] = "virtio-9p",
> +[VIRTIO_ID_MAC80211_WLAN] = "virtio-mac-wlan",
> +[VIRTIO_ID_RPROC_SERIAL] = "virtio-rproc-serial",
> +[VIRTIO_ID_CAIF] = "virtio-caif",
> +[VIRTIO_ID_MEMORY_BALLOON] = "virtio-mem-balloon",
> +[VIRTIO_ID_GPU] = "virtio-gpu",
> +[VIRTIO_ID_CLOCK] = "virtio-clk",
> +[VIRTIO_ID_INPUT] = "virtio-input",
> +[VIRTIO_ID_VSOCK] = "vhost-vsock",
> +[VIRTIO_ID_CRYPTO] = "virtio-crypto",
> +[VIRTIO_ID_SIGNAL_DIST] = "virtio-signal",
> +[VIRTIO_ID_PSTORE] = "virtio-pstore",
> +[VIRTIO_ID_IOMMU] = "virtio-iommu",
> +[VIRTIO_ID_MEM] = "virtio-mem",
> +[VIRTIO_ID_SOUND] = "virtio-sound",
> +[VIRTIO_ID_FS] = "virtio-user-fs",
> +[VIRTIO_ID_PMEM] = "virtio-pmem",
> +[VIRTIO_ID_RPMB] = "virtio-rpmb",
> +[VIRTIO_ID_MAC80211_HWSIM] = "virtio-mac-hwsim",
> +[VIRTIO_ID_VIDEO_ENCODER] = "virtio-vid-encoder",
> +[VIRTIO_ID_VIDEO_DECODER] = "virtio-vid-decoder",
> +[VIRTIO_ID_SCMI] = "virtio-scmi",
> +[VIRTIO_ID_NITRO_SEC_MOD] = "virtio-nitro-sec-mod",
> +[VIRTIO_ID_I2C_ADAPTER] = "vhost-user-i2c",
> +[VIRTIO_ID_WATCHDOG] = "virtio-watchdog",
> +[VIRTIO_ID_CAN] = "virtio-can",
> +[VIRTIO_ID_DMABUF] = "virtio-dmabuf",
> +[VIRTIO_ID_PARAM_SERV] = "virtio-param-serv",
> +[VIRTIO_ID_AUDIO_POLICY] = "virtio-audio-pol",
> +[VIRTIO_ID_BT] = "virtio-bluetooth",
> +[VIRTIO_ID_GPIO] = "virtio-gpio"
> +};
> +
> +static const char *virtio_id_to_name(uint16_t device_id)
> +{
> +assert(device_id < G_N_ELEMENTS(virtio_device_names));
> +const char *name = virtio_device_names[device_id];
> +assert(name != NULL);
> +return name;
> +}
> +
>  /* Called within call_rcu().  */
>  static void virtio_free_region_cache(VRingMemoryRegionCaches *caches)
>  {
> @@ -3209,8 +3259,7 @@ void virtio_instance_init_common(Object *proxy_obj, void *data,
>  qdev_alias_all_properties(vdev, proxy_obj);
>  }
>
> -void virtio_init(VirtIODevice *vdev, const char *name,
> - uint16_t device_id, size_t config_size)
> +void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
>  {
>  BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
>  VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> @@ -3239,7 +3288,7 @@ void virtio_init(VirtIODevice *vdev, const char *name,
>  vdev->vq[i].host_notifier_enabled = false;
>  }
>
> -vdev->name = name;
> +vdev->name = virtio_id_to_name(device_id);
>  vdev->config_len = config_size;
>  if (vdev->config_len) {
>  vdev->config = g_malloc0(config_size);
> diff --git a/include/hw/virtio/vhost-vsock-common.h b/include/hw/virtio/vhost-vsock-common.h
> index d8b565b..076b7ab 100644
> --- a/include/hw/virtio/vhost-vsock-common.h
> +++ b/include/hw/virtio/vhost-vsock-common.h
> @@ -44,7 +44,7 @@ int vhost_vsock_common_start(VirtIODevice *vdev);
>  void vhost_vsock_common_stop(VirtIODevice *vdev);
>  int vhost_vsock_common_pre_save(void *opaque);
>  int vhost_vsock_common_post_load(void *opaque, int version_id);
> -void vhost_vsock_common_realize(VirtIODevice *vdev, const char *name);
> +void vhost_vsock_common_realize(VirtIODevice *vdev);
>  void vhost_vsock_common_unrealize(VirtIODevice *vdev);
>  uint64_t vhost_vsock_common_get_features(VirtIODevice *vdev, uint64_t features,
>   Error **errp);
> diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
> index 2179b75..afff9e1 100644
> --- a/include/hw/virtio/virtio-gpu.h
> +++ b/include/hw/virtio/virtio-gpu.h
> @@ -22,6 +22,7 @@
>  #include "sysemu/vhost-user-backend.h"
>
>  #include "standard-headers/linux/virtio_gpu.h"
> +#include "standard-headers/linux/virtio_ids.h"
>  #include "qom/object.h"
>
>  #define TYPE_VIRTIO_GPU_BASE "virtio-gpu-base"
> @@ -37,8 +38,6 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPUGL, VIRTIO_GPU_GL)
>  #define TYPE_VHOST_USER_GPU "vhost-user-gpu"
>  OBJECT_DECLARE_SIMPLE_TYPE(VhostUserGPU, VHOST_USER_GPU)
>
> -#define VIRTIO_ID_GPU 16
> -
>  struct virtio_gpu_simple_resource {
>  uint32_t resource_id;
>  uint32_t width;
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index f095637..2a0be70 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -165,8 +165,8 @@ struct VirtioDeviceClass {
>  void virtio_instance_init_common(Object *proxy_obj, void *data,
>   size_t vdev_size, const char *vdev_name);
>
> -void virtio_init(VirtIODevice *vdev, const char *name,
> - uint16_t device_id, size_t config_size);
> +void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size);
> +
>  void virtio_cleanup(VirtIODevice *vdev);
>
>  void virtio_error(VirtIODevice *vdev, const char *fmt, ...) GCC_FMT_ATTR(2, 3);

Reviewed-by: Pankaj Gupta 

Re: [PATCH v2] migration/rdma: set the REUSEADDR option for destination

2022-02-08 Thread Pankaj Gupta
> We hit following error during testing RDMA transport:
> in case of migration error, mgmt daemon pick one migration port,
> incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr
>
> Then try another -incoming rdma:[::]:8103, sometime it worked,
> sometimes need another try with other ports number.
>
> Set the REUSEADDR option for destination, This allow address could
> be reused to avoid rdma_bind_addr error out.
>
> Signed-off-by: Jack Wang 
> ---
> v2: extend commit message as discussed with Pankaj and David
> ---
>  migration/rdma.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index c7c7a384875b..663e1fbb096d 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2705,6 +2705,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>  char ip[40] = "unknown";
>  struct rdma_addrinfo *res, *e;
>  char port_str[16];
> +int reuse = 1;
>
>  for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>  rdma->wr_data[idx].control_len = 0;
> @@ -2740,6 +2741,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>  goto err_dest_init_bind_addr;
>  }
>
> +ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
> + &reuse, sizeof reuse);
> +if (ret) {
> +ERROR(errp, "Error: could not set REUSEADDR option");
> +goto err_dest_init_bind_addr;
> +}
>  for (e = res; e != NULL; e = e->ai_next) {
>  inet_ntop(e->ai_family,
>  &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);

Reviewed-by: Pankaj Gupta 

Re: [PATCH 1/2] migration/rdma: Increase the backlog from 5 to 128

2022-02-01 Thread Pankaj Gupta
> > > >  migration/rdma.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/migration/rdma.c b/migration/rdma.c
> > > > index c7c7a384875b..2e223170d06d 100644
> > > > --- a/migration/rdma.c
> > > > +++ b/migration/rdma.c
> > > > @@ -4238,7 +4238,7 @@ void rdma_start_incoming_migration(const char 
> > > > *host_port, Error **errp)
> > > >
> > > >  trace_rdma_start_incoming_migration_after_dest_init();
> > > >
> > > > -ret = rdma_listen(rdma->listen_id, 5);
> > > > +ret = rdma_listen(rdma->listen_id, 128);
> > >
> > > 128 backlog seems too much to me. Any reason for choosing this number.
> > > Any rationale to choose this number?
> > >
> > 128 is the default value of SOMAXCONN, I can use that if it is preferred.
>
> AFAICS backlog is only applicable with RDMA iWARP CM mode. Maybe we
> can increase it to 128.

Or maybe we first increase it to 20 or 32, to avoid memory overhead if
we are not using that many connections at the same time.

> Maybe you can also share any testing data for multiple concurrent live
> migrations using RDMA, please.
>
> Thanks,
> Pankaj
>
> Thanks,
> Pankaj



Re: [PATCH 1/2] migration/rdma: Increase the backlog from 5 to 128

2022-02-01 Thread Pankaj Gupta
> > > So it can handle more incoming requests.
> > >
> > > Signed-off-by: Jack Wang 
> > > ---
> > >  migration/rdma.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/migration/rdma.c b/migration/rdma.c
> > > index c7c7a384875b..2e223170d06d 100644
> > > --- a/migration/rdma.c
> > > +++ b/migration/rdma.c
> > > @@ -4238,7 +4238,7 @@ void rdma_start_incoming_migration(const char 
> > > *host_port, Error **errp)
> > >
> > >  trace_rdma_start_incoming_migration_after_dest_init();
> > >
> > > -ret = rdma_listen(rdma->listen_id, 5);
> > > +ret = rdma_listen(rdma->listen_id, 128);
> >
> > 128 backlog seems too much to me. Any reason for choosing this number.
> > Any rationale to choose this number?
> >
> 128 is the default value of SOMAXCONN, I can use that if it is preferred.

AFAICS the backlog is only applicable in RDMA iWARP CM mode. Maybe we
can increase it to 128.
Could you also share any testing data for multiple concurrent live
migrations using RDMA, please?

Thanks,
Pankaj

Thanks,
Pankaj



Re: [PATCH 2/2] migration/rdma: set the REUSEADDR option for destination

2022-02-01 Thread Pankaj Gupta
> > > This allow address could be reused to avoid rdma_bind_addr error
> > > out.
> >
> > Seems we are proposing to allow multiple connections on same source ip
> > port pair?
> according to the man page, it's more about the destination side which
> is the incoming side.[1]

By source here I meant the generic IP/port address pair being bound. In
this case it is on the destination live migration host.

> We hit the error on the migration target when there are many migration
> tests in parallel:
> "RDMA ERROR: Error: could not rdma_bind_addr!"
o.k. Worth adding this to the commit message.

>
> [1]https://manpages.debian.org/testing/librdmacm-dev/rdma_set_option.3.en.html
> > >
> > > Signed-off-by: Jack Wang 
> > > ---
> > >  migration/rdma.c | 7 +++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/migration/rdma.c b/migration/rdma.c
> > > index 2e223170d06d..b498ef013c77 100644
> > > --- a/migration/rdma.c
> > > +++ b/migration/rdma.c
> > > @@ -2705,6 +2705,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, 
> > > Error **errp)
> > >  char ip[40] = "unknown";
> > >  struct rdma_addrinfo *res, *e;
> > >  char port_str[16];
> > > +int reuse = 1;
> > >
> > >  for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
> > >  rdma->wr_data[idx].control_len = 0;
> > > @@ -2740,6 +2741,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, 
> > > Error **errp)
> > >  goto err_dest_init_bind_addr;
> > >  }
> > >
> > > +ret = rdma_set_option(listen_id, RDMA_OPTION_ID, 
> > > RDMA_OPTION_ID_REUSEADDR,
> > > + &reuse, sizeof reuse);
> >
> > maybe we can just write '1' directly on the argument list of 
> > 'rdma_set_option'.
> > Assuming reuseaddr does not effect core rdma transport? change seems ok to 
> > me.
> I feel it's cleaner to do it with a variable than force conversion of
> 1 to void *.

Passing a typecast 1 seems cleaner to me than adding another automatic
variable. Will leave this to the maintainers to decide.
>
> It's bound to the cm_id which is newly created a few lines above, so
> does not affect core rdma transport.

o.k.
>
> >
> > Thanks,
> > Pankaj
> Thanks for the review!
>
> Jinpu Wang
> >
> > > +if (ret) {
> > > +ERROR(errp, "Error: could not set REUSEADDR option");
> > > +goto err_dest_init_bind_addr;
> > > +}
> > >  for (e = res; e != NULL; e = e->ai_next) {
> > >  inet_ntop(e->ai_family,
> > >  &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, 
> > > sizeof ip);
> > > --
> > > 2.25.1
> > >



Re: [PATCH 2/2] migration/rdma: set the REUSEADDR option for destination

2022-02-01 Thread Pankaj Gupta
> This allow address could be reused to avoid rdma_bind_addr error
> out.

Seems we are proposing to allow multiple connections on the same source
IP/port pair?
>
> Signed-off-by: Jack Wang 
> ---
>  migration/rdma.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 2e223170d06d..b498ef013c77 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2705,6 +2705,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>  char ip[40] = "unknown";
>  struct rdma_addrinfo *res, *e;
>  char port_str[16];
> +int reuse = 1;
>
>  for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>  rdma->wr_data[idx].control_len = 0;
> @@ -2740,6 +2741,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>  goto err_dest_init_bind_addr;
>  }
>
> +ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
> + &reuse, sizeof reuse);

maybe we can just write '1' directly on the argument list of 'rdma_set_option'.
Assuming reuseaddr does not effect core rdma transport? change seems ok to me.

Thanks,
Pankaj

> +if (ret) {
> +ERROR(errp, "Error: could not set REUSEADDR option");
> +goto err_dest_init_bind_addr;
> +}
>  for (e = res; e != NULL; e = e->ai_next) {
>  inet_ntop(e->ai_family,
>  &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
> --
> 2.25.1
>



Re: [PATCH 1/2] migration/rdma: Increase the backlog from 5 to 128

2022-02-01 Thread Pankaj Gupta
> So it can handle more incoming requests.
>
> Signed-off-by: Jack Wang 
> ---
>  migration/rdma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index c7c7a384875b..2e223170d06d 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -4238,7 +4238,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>
>  trace_rdma_start_incoming_migration_after_dest_init();
>
> -ret = rdma_listen(rdma->listen_id, 5);
> +ret = rdma_listen(rdma->listen_id, 128);

A backlog of 128 seems too much to me. What is the rationale for
choosing this number?

Thanks,
Pankaj

>
>  if (ret) {
>  ERROR(errp, "listening on socket!");
> --
> 2.25.1
>



Re: [RFC] virtio_pmem: enable live migration support

2022-01-12 Thread Pankaj Gupta
> >  I mean, that would be fundamentally broken, because the fsync() would
> >  corrupt the file. So I assume in a sane environment, the dst could only
> >  have stale clean pagecache pages. And we'd have to get rid of these to
> >  re-read everything from file.
> > >>>
> > >>> In case of write back cache mode, we could still have stale dirty
> > >>> pages at the destination
> > >>> host and destination fsync is not the right thing to do. We need to
> > >>> invalidate these pages
> > >>> (Can we invalidate dirty pages resident in page cache with
> > >>> POSIX_FADV_DONTNEED as
> > >>> well?) man pages say, we cannot (unless i misunderstood it).
> > >>>
> > >>
> > >> I think you'd have to fsync + POSIX_FADV_DONTNEED. But I am still
> > >> confused how we could end up with dirty pagecache pages on the
> > >> destination. In my opinion, there should only be clean pagecache pages
> > >> -- can someone enlighten me? :)
> > >
> > > because of activity on the page cache pages corresponding to mmap region
> > > in the past which is not synced yet or not reclaimed yet. Maybe this
> > > is hypothetical
> > > or not possible, happy to learn?
> >
> > Right, but assume the following *sane*
> >
> > #1 H0 starts and runs VM.
> > #2 H0 migrates VM to H1.
> > #3 H1 runs VM.
> > #4 H1 migrates VM to H0.
> > #5 H0 runs VM.
> >
> > We'd expect a proper fsync during #2, writing back any dirty pages to
> > the memory backend. Otherwise, #3 would already be broken. Similarly,
> > we'd expect a proper fsync during #4.
> >
> > I assume during #4 we could find clean pagecache pages that are actually
> > invalid, because the underlying file was changed by H1. So we have to
> > make sure to invalidate all pagecache pages (all clean).
>
Yes, you mean fsync on the source host before migration starts. My
point is about something like another process mmap'ing the same backend
file on the destination host, and/or guest/QEMU crashing abruptly.

In that case we should not start the guest if we cannot invalidate all
the corresponding page cache pages before starting it, i.e. before
mmap'ing the virtio-pmem backend file.

Thank you for the discussion!

Best regards,
Pankaj



Re: [RFC] virtio_pmem: enable live migration support

2022-01-12 Thread Pankaj Gupta
>  I mean, that would be fundamentally broken, because the fsync() would
>  corrupt the file. So I assume in a sane environment, the dst could only
>  have stale clean pagecache pages. And we'd have to get rid of these to
>  re-read everything from file.
> >>>
> >>> In case of write back cache mode, we could still have stale dirty
> >>> pages at the destination
> >>> host and destination fsync is not the right thing to do. We need to
> >>> invalidate these pages
> >>> (Can we invalidate dirty pages resident in page cache with
> >>> POSIX_FADV_DONTNEED as
> >>> well?) man pages say, we cannot (unless i misunderstood it).
> >>>
> >>
> >> I think you'd have to fsync + POSIX_FADV_DONTNEED. But I am still
> >> confused how we could end up with dirty pagecache pages on the
> >> destination. In my opinion, there should only be clean pagecache pages
> >> -- can someone enlighten me? :)
> >
> > because of activity on the page cache pages corresponding to mmap region
> > in the past which is not synced yet or not reclaimed yet. Maybe this
> > is hypothetical
> > or not possible, happy to learn?
>
> Right, but assume the following *sane*
>
> #1 H0 starts and runs VM.
> #2 H0 migrates VM to H1.
> #3 H1 runs VM.
> #4 H1 migrates VM to H0.
> #5 H0 runs VM.
>
> We'd expect a proper fsync during #2, writing back any dirty pages to
> the memory backend. Otherwise, #3 would already be broken. Similarly,
> we'd expect a proper fsync during #4.
>
> I assume during #4 we could find clean pagecache pages that are actually
> invalid, because the underlying file was changed by H1. So we have to
> make sure to invalidate all pagecache pages (all clean).

Yes, you mean fsync on the source host before migration starts. My
point is about something like another process mmap'ing the same backend
file on the destination host, and/or guest/QEMU crashing abruptly.



Re: [RFC] virtio_pmem: enable live migration support

2022-01-12 Thread Pankaj Gupta
> >> I mean, that would be fundamentally broken, because the fsync() would
> >> corrupt the file. So I assume in a sane environment, the dst could only
> >> have stale clean pagecache pages. And we'd have to get rid of these to
> >> re-read everything from file.
> >
> > In case of write-back cache mode, we could still have stale dirty pages
> > at the destination host, and a destination-side fsync is not the right
> > thing to do. We need to invalidate these pages. (Can we invalidate dirty
> > pages resident in the page cache with POSIX_FADV_DONTNEED as well? The
> > man page says we cannot, unless I misunderstood it.)
> >
>
> I think you'd have to fsync + POSIX_FADV_DONTNEED. But I am still
> confused how we could end up with dirty pagecache pages on the
> destination. In my opinion, there should only be clean pagecache pages
> -- can someone enlighten me? :)

Because of past activity on the page cache pages corresponding to the mmap
region which is not synced or reclaimed yet. Maybe this is hypothetical or
not possible, happy to learn?

>
> >>
> >> IIRC, an existing mmap of the file on the dst should not really be
> >> problematic *as long as* we didn't actually access file content that way
> >> and faulted in the pages. So *maybe*, if we do the POSIX_FADV_DONTNEED
> >> on the dst before accessing file content via the mmap, there shouldn't
> >> be an issue. Unless the mmap itself is already problematic.
> >
> > mmap with share=on might result in stale dirty page cache pages?
>
> But only if actually accessing memory, especially writing to it, no?

yes.


>
> >
> >>
> >> I think we can assume that once QEMU starts on the dst and wants to mmap
> >> the file that it's not mapped into any other process yet. vhost-user
> >> will only mmap *after* being told from QEMU about the mmap region and
> >> the location in GPA.
> >
> > Maybe we have an old stale dirty page cache page even if there is no
> > mmaping process alive before mmaping the virtio-pmem backend file on the
> > destination?
>
> But how could that happen in a sane environment? As I said, any
> writeback on that file from the destination would actually corrupt the
> file that has been used+modified on the source in the meantime, no?
>
>
> --
> Thanks,
>
> David / dhildenb
>



Re: [RFC] virtio_pmem: enable live migration support

2022-01-12 Thread Pankaj Gupta
Thank you David for replying!

> > From: Pankaj Gupta >
> >
> > Enable live migration support for virtio-pmem device.
> > Tested this: with live migration on same host.
> >
> > Need suggestion on below points to support virtio-pmem live migration
> > between two separate host systems:
>
> I assume emulated NVDIMMs would have the exact same issue, right?
>
> There are two cases to consider I think:
>
> 1) Backing storage is migrated manually to the destination (i.e., a file
> that is copied/moved/transmitted during migration)
>
> 2) Backing storage is located on a shared network storage (i.e., a file
> that is not copied during migration)
>
> IIRC you're concerned about 2).

Yes.

>
> >
> > - There is still possibility of stale page cache page at the
> >   destination host which we cannot invalidate currently as done in 1]
> >   for write-back mode because virtio-pmem memory backend file is mmaped
> >   in guest address space and invalidating corresponding page cache pages
> >   would also fault all the other userspace process mappings on the same 
> > file.
> >   Or we make it strict no other process would mmap this backing file?
>
> I'd have assume that a simple fsync on the src once migration is about
> to switch over (e.g., pre_save/post_save handler) should be enough to
> trigger writeback to the backing storage, at which point the dst can
> take over. So handling the src is easy.
>
> So is the issue that the dst might still have stale pagecache
> information, because it already accessed some of that file previously,
> correct?

yes.

>
> >
> >   -- In commit 1] we first fsync and then invalidate all the pages from 
> > destination
> >  page cache. fsync would sync the stale dirty page cache page, Is this 
> > the right
> >  thing to do as we might end up in data discrepancy?
>
> It would be weird if
>
> a) The src used/modified the file and fsync'ed the modifications back to
>backing storage
> b) The dst has stale dirty pagecache pages that would result in a
>modification of backing storage during fsync()

Yes. That's what I thought we are currently doing with commit 1]; maybe Stefan
can confirm. If yes, it is broken as well.

>
> I mean, that would be fundamentally broken, because the fsync() would
> corrupt the file. So I assume in a sane environment, the dst could only
> have stale clean pagecache pages. And we'd have to get rid of these to
> re-read everything from file.

In case of write-back cache mode, we could still have stale dirty pages
at the destination host, and a destination-side fsync is not the right
thing to do. We need to invalidate these pages. (Can we invalidate dirty
pages resident in the page cache with POSIX_FADV_DONTNEED as well? The
man page says we cannot, unless I misunderstood it.)

>
> IIRC, an existing mmap of the file on the dst should not really be
> problematic *as long as* we didn't actually access file content that way
> and faulted in the pages. So *maybe*, if we do the POSIX_FADV_DONTNEED
> on the dst before accessing file content via the mmap, there shouldn't
> be an issue. Unless the mmap itself is already problematic.

mmap with share=on might result in stale dirty page cache pages?

>
> I think we can assume that once QEMU starts on the dst and wants to mmap
> the file that it's not mapped into any other process yet. vhost-user
> will only mmap *after* being told from QEMU about the mmap region and
> the location in GPA.

Maybe we have an old stale dirty page cache page even if there is no mmaping
process alive before mmaping the virtio-pmem backend file on the destination?
>
> So if the existing QEMU mmap is not problematic, it should be easy, just
> do the POSIX_FADV_DONTNEED on the destination when initializing
> virtio-pmem. If we have to POSIX_FADV_DONTNEED *before* performing the
> mmap, we might need a way to tell QEMU to POSIX_FADV_DONTNEED before
> doing the mmap. The could be a parameter for memory-backend-file like
> "flush=on", or doing that implicitly when we're told that we expect an
> incoming migration.

Yes, that's what I had in mind. Just wanted to confirm some of my doubts for
a correct implementation. As I see it, page cache coherency across multiple
host systems needs to be addressed for live migration to avoid such
scenarios.


Thanks,
Pankaj
>
> --
> Thanks,
>
> David / dhildenb



Re: [PATCH v1] util/oslib-posix: Fix missing unlock in the error path of os_mem_prealloc()

2022-01-11 Thread Pankaj Gupta
> We're missing an unlock in case installing the signal handler failed.
> Fortunately, we barely see this error in real life.
>
> Fixes: a960d6642d39 ("util/oslib-posix: Support concurrent os_mem_prealloc() 
> invocation")
> Fixes: CID 1468941
> Cc: Paolo Bonzini 
> Cc: Michael S. Tsirkin 
> Cc: Pankaj Gupta 
> Cc: Daniel P. Berrangé 
> Cc: Michal Privoznik 
> Signed-off-by: David Hildenbrand 
> ---
>  util/oslib-posix.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 9efdc74bba..ac0dbc2adc 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -683,6 +683,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, 
> int smp_cpus,
>
>  ret = sigaction(SIGBUS, &act, &sigbus_oldact);
>  if (ret) {
> +qemu_mutex_unlock(&sigbus_mutex);
>  error_setg_errno(errp, errno,
>          "os_mem_prealloc: failed to install signal handler");
>  return;
> --

Reviewed-by: Pankaj Gupta 

> 2.33.1
>



[RFC] virtio_pmem: enable live migration support

2021-12-31 Thread Pankaj Gupta
From: Pankaj Gupta

Enable live migration support for virtio-pmem device.
Tested this with live migration on the same host.

Need suggestion on below points to support virtio-pmem live migration
between two separate host systems:

- There is still a possibility of a stale page cache page at the
  destination host which we cannot invalidate currently as done in 1]
  for write-back mode, because the virtio-pmem memory backend file is mmaped
  in guest address space and invalidating the corresponding page cache pages
  would also fault all the other userspace process mappings on the same file.
  Or should we make it strict that no other process may mmap this backing file?

  -- In commit 1] we first fsync and then invalidate all the pages from the
     destination page cache. fsync would sync the stale dirty page cache page.
     Is this the right thing to do, as we might end up in data discrepancy?

- Alternatively, thinking if we transfer the information about the guest's
  active page cache pages from the source's active LRU list to the destination
  host and refault those pages. This would also help to enable a hot page
  cache on the destination host for the guest and solve the stale page cache
  issue as well. How can we achieve this so that we make sure we get rid of
  all the stale page cache pages on the destination host?

  Looking for suggestions on a recommended and feasible solution we can implement.
  Thank you!

1] dd577a26ff ("block/file-posix: implement bdrv_co_invalidate_cache() on Linux")

Signed-off-by: Pankaj Gupta 
---
 hw/virtio/virtio-pmem.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
index d1aeb90a31..a19619a387 100644
--- a/hw/virtio/virtio-pmem.c
+++ b/hw/virtio/virtio-pmem.c
@@ -123,6 +123,7 @@ static void virtio_pmem_realize(DeviceState *dev, Error 
**errp)
 }
 
 host_memory_backend_set_mapped(pmem->memdev, true);
+vmstate_register_ram(&pmem->memdev->mr, DEVICE(pmem));
 virtio_init(vdev, TYPE_VIRTIO_PMEM, VIRTIO_ID_PMEM,
 sizeof(struct virtio_pmem_config));
 pmem->rq_vq = virtio_add_queue(vdev, 128, virtio_pmem_flush);
@@ -133,6 +134,7 @@ static void virtio_pmem_unrealize(DeviceState *dev)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOPMEM *pmem = VIRTIO_PMEM(dev);
 
+vmstate_unregister_ram(&pmem->memdev->mr, DEVICE(pmem));
 host_memory_backend_set_mapped(pmem->memdev, false);
 virtio_delete_queue(pmem->rq_vq);
 virtio_cleanup(vdev);
@@ -157,6 +159,16 @@ static MemoryRegion 
*virtio_pmem_get_memory_region(VirtIOPMEM *pmem,
 return &pmem->memdev->mr;
 }
 
+static const VMStateDescription vmstate_virtio_pmem = {
+.name = "virtio-pmem",
+.minimum_version_id = 1,
+.version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+};
+
 static Property virtio_pmem_properties[] = {
 DEFINE_PROP_UINT64(VIRTIO_PMEM_ADDR_PROP, VirtIOPMEM, start, 0),
 DEFINE_PROP_LINK(VIRTIO_PMEM_MEMDEV_PROP, VirtIOPMEM, memdev,
@@ -171,6 +183,7 @@ static void virtio_pmem_class_init(ObjectClass *klass, void 
*data)
 VirtIOPMEMClass *vpc = VIRTIO_PMEM_CLASS(klass);
 
 device_class_set_props(dc, virtio_pmem_properties);
+dc->vmsd = &vmstate_virtio_pmem;
 
 vdc->realize = virtio_pmem_realize;
 vdc->unrealize = virtio_pmem_unrealize;
-- 
2.25.1




Re: [PATCH v1 3/3] virtio-mem: Set "unplugged-inaccessible=auto" for the 6.2 machine on x86

2021-12-08 Thread Pankaj Gupta
> Set the new default to "auto", keeping it set to "on" for compat
> machines. This property is only available for x86 targets.
>
> TODO: once 6.2 was released and we have compat machines, target the next
>   QEMU release.
>
> Signed-off-by: David Hildenbrand 
> ---
>  hw/i386/pc.c   | 1 +
>  hw/virtio/virtio-mem.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index a2ef40ecbc..045ba05431 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -99,6 +99,7 @@ GlobalProperty pc_compat_6_1[] = {
>  { TYPE_X86_CPU, "hv-version-id-major", "0x0006" },
>  { TYPE_X86_CPU, "hv-version-id-minor", "0x0001" },
>  { "ICH9-LPC", "x-keep-pci-slot-hpc", "false" },
> +{ "virtio-mem", "unplugged-inaccessible", "off" },
>  };
>  const size_t pc_compat_6_1_len = G_N_ELEMENTS(pc_compat_6_1);
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 1e57156e81..a5d26d414f 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -1169,7 +1169,7 @@ static Property virtio_mem_properties[] = {
>   TYPE_MEMORY_BACKEND, HostMemoryBackend *),
>  #if defined(VIRTIO_MEM_HAS_LEGACY_GUESTS)
>  DEFINE_PROP_ON_OFF_AUTO(VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP, 
> VirtIOMEM,
> -unplugged_inaccessible, ON_OFF_AUTO_OFF),
> +unplugged_inaccessible, ON_OFF_AUTO_AUTO),
>  #endif
>  DEFINE_PROP_END_OF_LIST(),
>  };

With correction in commit message pointed by Michal.
 Reviewed-by: Pankaj Gupta 



[PATCH v2] docs/nvdimm: Update nvdimm option value in machine example

2021-09-23 Thread Pankaj Gupta
Update nvdimm option value in example command from "-machine pc,nvdimm"
to "-machine pc,nvdimm=on", as the former complains with the below error:

"qemu-system-x86_64: -machine pc,nvdimm: Expected '=' after parameter 'nvdimm'"

Reviewed-by: Laurent Vivier 
Signed-off-by: Pankaj Gupta 
---
 docs/nvdimm.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 0aae682be3..fd7773dc5a 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -15,7 +15,7 @@ backend (i.e. memory-backend-file and memory-backend-ram). A 
simple
 way to create a vNVDIMM device at startup time is done via the
 following command line options:
 
- -machine pc,nvdimm
+ -machine pc,nvdimm=on
  -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
  -object 
memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE,readonly=off
  -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off
-- 
2.25.1




Re: [PATCH] docs/nvdimm: Update nvdimm option value in machine example

2021-09-23 Thread Pankaj Gupta
> > ping
>
> Could you post your patch with an email address ("From") that is the same as 
> the "Signed-off-by"?

Sure, sent a V2.

Thank you,
Pankaj

>
> https://wiki.qemu.org/Contribute/SubmitAPatch#Patch_emails_must_include_a_Signed-off-by:_line
>
> Thanks,
> Laurent
>
> >> Update nvdimm option value in example command from "-machine pc,nvdimm"
> >> to "-machine pc,nvdimm=on" as former complains with the below error:
> >>
> >> "qemu-system-x86_64: -machine pc,nvdimm: Expected '=' after parameter 
> >> 'nvdimm'"
> >>
> >> Signed-off-by: Pankaj Gupta 
> >> ---
> >>  docs/nvdimm.txt | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> >> index 0aae682be3..fd7773dc5a 100644
> >> --- a/docs/nvdimm.txt
> >> +++ b/docs/nvdimm.txt
> >> @@ -15,7 +15,7 @@ backend (i.e. memory-backend-file and 
> >> memory-backend-ram). A simple
> >>  way to create a vNVDIMM device at startup time is done via the
> >>  following command line options:
> >>
> >> - -machine pc,nvdimm
> >> + -machine pc,nvdimm=on
> >>   -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
> >>   -object 
> >> memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE,readonly=off
> >>   -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off
> >> --
> >> 2.25.1
> >>
> >
>



Re: [PATCH] docs/nvdimm: Update nvdimm option value in machine example

2021-09-23 Thread Pankaj Gupta
ping

> Update nvdimm option value in example command from "-machine pc,nvdimm"
> to "-machine pc,nvdimm=on" as former complains with the below error:
>
> "qemu-system-x86_64: -machine pc,nvdimm: Expected '=' after parameter 
> 'nvdimm'"
>
> Signed-off-by: Pankaj Gupta 
> ---
>  docs/nvdimm.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 0aae682be3..fd7773dc5a 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -15,7 +15,7 @@ backend (i.e. memory-backend-file and memory-backend-ram). 
> A simple
>  way to create a vNVDIMM device at startup time is done via the
>  following command line options:
>
> - -machine pc,nvdimm
> + -machine pc,nvdimm=on
>   -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
>   -object 
> memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE,readonly=off
>   -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off
> --
> 2.25.1
>



Re: [PATCH for-6.2 v5 06/14] machine: Prefer cores over sockets in smp parsing since 6.2

2021-08-16 Thread Pankaj Gupta
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 463a5514f9..2ae039b74f 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -247,6 +247,7 @@ struct MachineClass {
>  bool nvdimm_supported;
>  bool numa_mem_supported;
>  bool auto_enable_numa;
> +bool smp_prefer_sockets;
>  const char *default_ram_id;
>
>  HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 06f819177e..451d2cd817 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -238,7 +238,8 @@ SRST
>  Historically preference was given to the coarsest topology parameters
>  when computing missing values (ie sockets preferred over cores, which
>  were preferred over threads), however, this behaviour is considered
> -liable to change.
> +liable to change. Prior to 6.2 the preference was sockets over cores
> +over threads. Since 6.2 the preference is cores over sockets over 
> threads.
>  ERST
>
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> --

Reviewed-by: Pankaj Gupta 



Re: [PATCH for-6.2 v5 06/14] machine: Prefer cores over sockets in smp parsing since 6.2

2021-08-16 Thread Pankaj Gupta
> On 2021/8/14 0:30, Pankaj Gupta wrote:
> >> In the real SMP hardware topology world, it's much more likely that
> >> we have high cores-per-socket counts and few sockets totally. While
> >> the current preference of sockets over cores in smp parsing results
> >> in a virtual cpu topology with low cores-per-sockets counts and a
> >> large number of sockets, which is just contrary to the real world.
> >>
> >> Given that it is better to make the virtual cpu topology be more
> >> reflective of the real world and also for the sake of compatibility,
> >> we start to prefer cores over sockets over threads in smp parsing
> >> since machine type 6.2 for different arches.
> >>
> >> In this patch, a boolean "smp_prefer_sockets" is added, and we only
> >> enable the old preference on older machines and enable the new one
> >> since type 6.2 for all arches by using the machine compat mechanism.
> >>
> >> Suggested-by: Daniel P. Berrange 
> >> Signed-off-by: Yanan Wang 
> >> Acked-by: David Gibson 
> >> Acked-by: Cornelia Huck 
> >> Reviewed-by: Andrew Jones 
> >> ---
> >>   hw/arm/virt.c  |  1 +
> >>   hw/core/machine.c  | 35 ++-
> >>   hw/i386/pc.c   | 35 ++-
> >>   hw/i386/pc_piix.c  |  1 +
> >>   hw/i386/pc_q35.c   |  1 +
> >>   hw/ppc/spapr.c |  1 +
> >>   hw/s390x/s390-virtio-ccw.c |  1 +
> >>   include/hw/boards.h|  1 +
> >>   qemu-options.hx|  3 ++-
> >>   9 files changed, 60 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >> index 01165f7f53..7babea40dc 100644
> >> --- a/hw/arm/virt.c
> >> +++ b/hw/arm/virt.c
> >> @@ -2797,6 +2797,7 @@ static void virt_machine_6_1_options(MachineClass 
> >> *mc)
> >>   {
> >>   virt_machine_6_2_options(mc);
> >>   compat_props_add(mc->compat_props, hw_compat_6_1, hw_compat_6_1_len);
> >> +mc->smp_prefer_sockets = true;
> > Will this be set for virt_machine_6_2_?
> >
> Hi,
>
> We hope to start to assume cores over sockets on virt_6_2 machines
> and the later, and keep the old style of assuming sockets over cores
> on virt_6_1 machines and the older. Is there any concern here?

O.k. Thank you for clarifying.

Best regards,
Pankaj

>
> Thanks,
> Yanan
> >>   }
> >>   DEFINE_VIRT_MACHINE(6, 1)
> >>
> >> diff --git a/hw/core/machine.c b/hw/core/machine.c
> >> index bdce80df32..15b41c52e8 100644
> >> --- a/hw/core/machine.c
> >> +++ b/hw/core/machine.c
> >> @@ -748,6 +748,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> >>
> >>   static void smp_parse(MachineState *ms, SMPConfiguration *config, Error 
> >> **errp)
> >>   {
> >> +MachineClass *mc = MACHINE_GET_CLASS(ms);
> >>   unsigned cpus= config->has_cpus ? config->cpus : 0;
> >>   unsigned sockets = config->has_sockets ? config->sockets : 0;
> >>   unsigned cores   = config->has_cores ? config->cores : 0;
> >> @@ -759,7 +760,7 @@ static void smp_parse(MachineState *ms, 
> >> SMPConfiguration *config, Error **errp)
> >>   return;
> >>   }
> >>
> >> -/* compute missing values, prefer sockets over cores over threads */
> >> +/* compute missing values based on the provided ones */
> >>   if (cpus == 0 && maxcpus == 0) {
> >>   sockets = sockets > 0 ? sockets : 1;
> >>   cores = cores > 0 ? cores : 1;
> >> @@ -767,14 +768,30 @@ static void smp_parse(MachineState *ms, 
> >> SMPConfiguration *config, Error **errp)
> >>   } else {
> >>   maxcpus = maxcpus > 0 ? maxcpus : cpus;
> >>
> >> -if (sockets == 0) {
> >> -cores = cores > 0 ? cores : 1;
> >> -threads = threads > 0 ? threads : 1;
> >> -sockets = maxcpus / (cores * threads);
> >> -} else if (cores == 0) {
> >> -threads = threads > 0 ? threads : 1;
> >> -cores = maxcpus / (sockets * threads);
> >> -} else if (threads == 0) {
> >> +if (mc->smp_prefer_sockets) {
> >> +/* prefer sockets over cores before 6.2 */

Re: [PATCH for-6.2 v5 06/14] machine: Prefer cores over sockets in smp parsing since 6.2

2021-08-13 Thread Pankaj Gupta
> In the real SMP hardware topology world, it's much more likely that
> we have high cores-per-socket counts and few sockets totally. While
> the current preference of sockets over cores in smp parsing results
> in a virtual cpu topology with low cores-per-sockets counts and a
> large number of sockets, which is just contrary to the real world.
>
> Given that it is better to make the virtual cpu topology be more
> reflective of the real world and also for the sake of compatibility,
> we start to prefer cores over sockets over threads in smp parsing
> since machine type 6.2 for different arches.
>
> In this patch, a boolean "smp_prefer_sockets" is added, and we only
> enable the old preference on older machines and enable the new one
> since type 6.2 for all arches by using the machine compat mechanism.
>
> Suggested-by: Daniel P. Berrange 
> Signed-off-by: Yanan Wang 
> Acked-by: David Gibson 
> Acked-by: Cornelia Huck 
> Reviewed-by: Andrew Jones 
> ---
>  hw/arm/virt.c  |  1 +
>  hw/core/machine.c  | 35 ++-
>  hw/i386/pc.c   | 35 ++-
>  hw/i386/pc_piix.c  |  1 +
>  hw/i386/pc_q35.c   |  1 +
>  hw/ppc/spapr.c |  1 +
>  hw/s390x/s390-virtio-ccw.c |  1 +
>  include/hw/boards.h|  1 +
>  qemu-options.hx|  3 ++-
>  9 files changed, 60 insertions(+), 19 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 01165f7f53..7babea40dc 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2797,6 +2797,7 @@ static void virt_machine_6_1_options(MachineClass *mc)
>  {
>  virt_machine_6_2_options(mc);
>  compat_props_add(mc->compat_props, hw_compat_6_1, hw_compat_6_1_len);
> +mc->smp_prefer_sockets = true;

Will this be set for virt_machine_6_2_?


>  }
>  DEFINE_VIRT_MACHINE(6, 1)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index bdce80df32..15b41c52e8 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -748,6 +748,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>
>  static void smp_parse(MachineState *ms, SMPConfiguration *config, Error 
> **errp)
>  {
> +MachineClass *mc = MACHINE_GET_CLASS(ms);
>  unsigned cpus= config->has_cpus ? config->cpus : 0;
>  unsigned sockets = config->has_sockets ? config->sockets : 0;
>  unsigned cores   = config->has_cores ? config->cores : 0;
> @@ -759,7 +760,7 @@ static void smp_parse(MachineState *ms, SMPConfiguration 
> *config, Error **errp)
>  return;
>  }
>
> -/* compute missing values, prefer sockets over cores over threads */
> +/* compute missing values based on the provided ones */
>  if (cpus == 0 && maxcpus == 0) {
>  sockets = sockets > 0 ? sockets : 1;
>  cores = cores > 0 ? cores : 1;
> @@ -767,14 +768,30 @@ static void smp_parse(MachineState *ms, 
> SMPConfiguration *config, Error **errp)
>  } else {
>  maxcpus = maxcpus > 0 ? maxcpus : cpus;
>
> -if (sockets == 0) {
> -cores = cores > 0 ? cores : 1;
> -threads = threads > 0 ? threads : 1;
> -sockets = maxcpus / (cores * threads);
> -} else if (cores == 0) {
> -threads = threads > 0 ? threads : 1;
> -cores = maxcpus / (sockets * threads);
> -} else if (threads == 0) {
> +if (mc->smp_prefer_sockets) {
> +/* prefer sockets over cores before 6.2 */
> +if (sockets == 0) {
> +cores = cores > 0 ? cores : 1;
> +threads = threads > 0 ? threads : 1;
> +sockets = maxcpus / (cores * threads);
> +} else if (cores == 0) {
> +threads = threads > 0 ? threads : 1;
> +cores = maxcpus / (sockets * threads);
> +}
> +} else {
> +/* prefer cores over sockets since 6.2 */
> +if (cores == 0) {
> +sockets = sockets > 0 ? sockets : 1;
> +threads = threads > 0 ? threads : 1;
> +cores = maxcpus / (sockets * threads);
> +} else if (sockets == 0) {
> +threads = threads > 0 ? threads : 1;
> +sockets = maxcpus / (cores * threads);
> +}
> +}
> +
> +/* try to calculate omitted threads at last */
> +if (threads == 0) {
>  threads = maxcpus / (sockets * cores);
>  }
>  }
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index afd8b9c283..4b05ff7160 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -717,6 +717,7 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
> level)
>   */
>  static void pc_smp_parse(MachineState *ms, SMPConfiguration *config, Error 
> **errp)
>  {
> +MachineClass *mc = MACHINE_GET_CLASS(ms);
>  unsigned cpus= config->has_cpus ? config->cpus : 0;
>  unsigned sockets = config->has_sockets ? config->sockets : 0;
> unsigned dies= config->has_dies ? config->dies : 0;

Re: [PATCH for-6.2 v4 06/14] machine: Prefer cores over sockets in smp parsing since 6.2

2021-08-06 Thread Pankaj Gupta
> >> In the real SMP hardware topology world, it's much more likely that
> >> we have high cores-per-socket counts and few sockets totally. While
> >> the current preference of sockets over cores in smp parsing results
> >> in a virtual cpu topology with low cores-per-sockets counts and a
> >> large number of sockets, which is just contrary to the real world.
> >>
> >> Given that it is better to make the virtual cpu topology be more
> >> reflective of the real world and also for the sake of compatibility,
> >> we start to prefer cores over sockets over threads in smp parsing
> >> since machine type 6.2 for different arches.
> >>
> >> In this patch, a boolean "smp_prefer_sockets" is added, and we only
> >> enable the old preference on older machines and enable the new one
> >> since type 6.2 for all arches by using the machine compat mechanism.
> >>
> >> Reviewed-by: Andrew Jones 
> >> Acked-by: Cornelia Huck 
> >> Acked-by: David Gibson 
> >> Suggested-by: Daniel P. Berrange 
> >> Signed-off-by: Yanan Wang 
> >> ---
> >>   hw/arm/virt.c  |  1 +
> >>   hw/core/machine.c  | 36 ++--
> >>   hw/i386/pc.c   | 36 ++--
> >>   hw/i386/pc_piix.c  |  1 +
> >>   hw/i386/pc_q35.c   |  1 +
> >>   hw/ppc/spapr.c |  1 +
> >>   hw/s390x/s390-virtio-ccw.c |  1 +
> >>   include/hw/boards.h|  1 +
> >>   qemu-options.hx|  3 ++-
> >>   9 files changed, 60 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >> index 01165f7f53..7babea40dc 100644
> >> --- a/hw/arm/virt.c
> >> +++ b/hw/arm/virt.c
> >> @@ -2797,6 +2797,7 @@ static void virt_machine_6_1_options(MachineClass 
> >> *mc)
> >>   {
> >>   virt_machine_6_2_options(mc);
> >>   compat_props_add(mc->compat_props, hw_compat_6_1, hw_compat_6_1_len);
> >> +mc->smp_prefer_sockets = true;
> >>   }
> >>   DEFINE_VIRT_MACHINE(6, 1)
> >>
> >> diff --git a/hw/core/machine.c b/hw/core/machine.c
> >> index 458d9736e3..a8173a0f45 100644
> >> --- a/hw/core/machine.c
> >> +++ b/hw/core/machine.c
> >> @@ -746,6 +746,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> >>
> >>   static void smp_parse(MachineState *ms, SMPConfiguration *config, Error 
> >> **errp)
> >>   {
> >> +MachineClass *mc = MACHINE_GET_CLASS(ms);
> >>   unsigned cpus= config->has_cpus ? config->cpus : 0;
> >>   unsigned sockets = config->has_sockets ? config->sockets : 0;
> >>   unsigned cores   = config->has_cores ? config->cores : 0;
> >> @@ -757,7 +758,7 @@ static void smp_parse(MachineState *ms, 
> >> SMPConfiguration *config, Error **errp)
> >>   return;
> >>   }
> >>
> >> -/* compute missing values, prefer sockets over cores over threads */
> >> +/* compute missing values based on the provided ones */
> >>   if (cpus == 0 && maxcpus == 0) {
> >>   sockets = sockets > 0 ? sockets : 1;
> >>   cores = cores > 0 ? cores : 1;
> >> @@ -765,15 +766,30 @@ static void smp_parse(MachineState *ms, 
> >> SMPConfiguration *config, Error **errp)
> >>   } else {
> >>   maxcpus = maxcpus > 0 ? maxcpus : cpus;
> >>
> >> -if (sockets == 0) {
> >> -cores = cores > 0 ? cores : 1;
> >> -threads = threads > 0 ? threads : 1;
> >> -sockets = maxcpus / (cores * threads);
> >> -} else if (cores == 0) {
> >> -threads = threads > 0 ? threads : 1;
> >> -cores = maxcpus / (sockets * threads);
> >> -} else if (threads == 0) {
> >> -threads = maxcpus / (sockets * cores);
> >> +if (mc->smp_prefer_sockets) {
> >> +/* prefer sockets over cores over threads before 6.2 */
> >> +if (sockets == 0) {
> >> +cores = cores > 0 ? cores : 1;
> >> +threads = threads > 0 ? threads : 1;
> >> +sockets = maxcpus / (cores * threads);
> >> +} else if (cores == 0) {
> >> +threads = threads > 0 ? threads : 1;
> >> +cores = maxcpus / (sockets * threads);
> >> +} else if (threads == 0) {
> >> +threads = maxcpus / (sockets * cores);
> >> +}
> >> +} else {
> >> +/* prefer cores over sockets over threads since 6.2 */
> >> +if (cores == 0) {
> >> +sockets = sockets > 0 ? sockets : 1;
> >> +threads = threads > 0 ? threads : 1;
> >> +cores = maxcpus / (sockets * threads);
> >> +} else if (sockets == 0) {
> >> +threads = threads > 0 ? threads : 1;
> >> +sockets = maxcpus / (cores * threads);
> >> +} else if (threads == 0) {
> >> +threads = maxcpus / (sockets * cores);
> >> +}
> > I feel this code is repeated at multiple places. Also, the (threads == 0) case
> > at the end is common for all the cases.

Re: [PATCH-for-6.1 v2] softmmu/physmem: fix wrong assertion in qemu_ram_alloc_internal()

2021-08-05 Thread Pankaj Gupta
> When adding RAM_NORESERVE, we forgot to remove the old assertion when
> adding the updated one, most probably when reworking the patches or
> rebasing. We can easily crash QEMU by adding
>   -object memory-backend-ram,id=mem0,size=500G,reserve=off
> to the QEMU cmdline:
>   qemu-system-x86_64: ../softmmu/physmem.c:2146: qemu_ram_alloc_internal:
>   Assertion `(ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC))
>   == 0' failed.
>
> Fix it by removing the old assertion.
>
> Fixes: 8dbe22c6868b ("memory: Introduce RAM_NORESERVE and wire it up in 
> qemu_ram_mmap()")
> Reviewed-by: Peter Xu 
> Reviewed-by: Philippe Mathieu-Daudé 
> Tested-by: Philippe Mathieu-Daudé 
> Cc: Paolo Bonzini 
> Cc: Peter Xu 
> Cc: Philippe Mathieu-Daudé 
> Signed-off-by: David Hildenbrand 
> ---
>
> v1 -> v2:
> - Added rbs
> - Tagged for 6.1 inclusion
>
> ---
>  softmmu/physmem.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 3c1912a1a0..2e18947598 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -2143,7 +2143,6 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, 
> ram_addr_t max_size,
>  RAMBlock *new_block;
>  Error *local_err = NULL;
>
> -assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC)) == 0);
>  assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
>RAM_NORESERVE)) == 0);
>  assert(!host ^ (ram_flags & RAM_PREALLOC));
> --
> 2.31.1

Reviewed-by: Pankaj Gupta 

>
>



Re: [PATCH for-6.2 v4 02/14] machine: Uniformly use maxcpus to calculate the omitted parameters

2021-08-05 Thread Pankaj Gupta
   sockets = maxcpus / (dies * cores * threads);
>  } else if (cores == 0) {
>  threads = threads > 0 ? threads : 1;
> -cores = cpus / (sockets * dies * threads);
> -cores = cores > 0 ? cores : 1;
> +cores = maxcpus / (sockets * dies * threads);
>  } else if (threads == 0) {
> -threads = cpus / (cores * dies * sockets);
> -threads = threads > 0 ? threads : 1;
> -} else if (sockets * dies * cores * threads < cpus) {
> +threads = maxcpus / (sockets * dies * cores);
> +}
> +
> +if (sockets * dies * cores * threads < cpus) {
>  error_setg(errp, "cpu topology: "
> "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < "
> "smp_cpus (%u)",
> @@ -750,8 +752,6 @@ static void pc_smp_parse(MachineState *ms, 
> SMPConfiguration *config, Error **err
>  return;
>  }
>
> -maxcpus = maxcpus > 0 ? maxcpus : cpus;
> -
>  if (maxcpus < cpus) {
>  error_setg(errp, "maxcpus must be equal to or greater than smp");
>  return;

Reviewed-by: Pankaj Gupta 



Re: [PATCH for-6.2 v4 04/14] machine: Improve the error reporting of smp parsing

2021-08-05 Thread Pankaj Gupta
> We have two requirements for a valid SMP configuration:
> the product of "sockets * cores * threads" must represent all the
> possible cpus, i.e., max_cpus, and then must include the initially
> present cpus, i.e., smp_cpus.
>
> So we only need to ensure 1) "sockets * cores * threads == maxcpus"
> at first and then ensure 2) "maxcpus >= cpus". With a reasonable
> order of the sanity check, we can simplify the error reporting code.
> When reporting an error message we also report the exact value of
> each topology member to make it easy for users to see what's going on.
>
> Reviewed-by: Andrew Jones 
> Signed-off-by: Yanan Wang 
> ---
>  hw/core/machine.c | 22 +-
>  hw/i386/pc.c  | 24 ++--
>  2 files changed, 19 insertions(+), 27 deletions(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 958e6e7107..e879163c3b 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -777,25 +777,21 @@ static void smp_parse(MachineState *ms, 
> SMPConfiguration *config, Error **errp)
>  maxcpus = maxcpus > 0 ? maxcpus : sockets * cores * threads;
>  cpus = cpus > 0 ? cpus : maxcpus;
>
> -if (sockets * cores * threads < cpus) {
> -error_setg(errp, "cpu topology: "
> -   "sockets (%u) * cores (%u) * threads (%u) < "
> -   "smp_cpus (%u)",
> -   sockets, cores, threads, cpus);
> +if (sockets * cores * threads != maxcpus) {
> +error_setg(errp, "Invalid CPU topology: "
> +   "product of the hierarchy must match maxcpus: "
> +   "sockets (%u) * cores (%u) * threads (%u) "
> +   "!= maxcpus (%u)",
> +   sockets, cores, threads, maxcpus);
>  return;
>  }
>
>  if (maxcpus < cpus) {
> -error_setg(errp, "maxcpus must be equal to or greater than smp");
> -return;
> -}
> -
> -if (sockets * cores * threads != maxcpus) {
>  error_setg(errp, "Invalid CPU topology: "
> +   "maxcpus must be equal to or greater than smp: "
> "sockets (%u) * cores (%u) * threads (%u) "
> -   "!= maxcpus (%u)",
> -   sockets, cores, threads,
> -   maxcpus);
> +   "== maxcpus (%u) < smp_cpus (%u)",
> +   sockets, cores, threads, maxcpus, cpus);
>  return;
>  }
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 9ad7ae5254..fcf6905219 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -747,25 +747,21 @@ static void pc_smp_parse(MachineState *ms, 
> SMPConfiguration *config, Error **err
>  maxcpus = maxcpus > 0 ? maxcpus : sockets * dies * cores * threads;
>  cpus = cpus > 0 ? cpus : maxcpus;
>
> -if (sockets * dies * cores * threads < cpus) {
> -error_setg(errp, "cpu topology: "
> -   "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < "
> -   "smp_cpus (%u)",
> -   sockets, dies, cores, threads, cpus);
> +if (sockets * dies * cores * threads != maxcpus) {
> +error_setg(errp, "Invalid CPU topology: "
> +   "product of the hierarchy must match maxcpus: "
> +   "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
> +   "!= maxcpus (%u)",
> +   sockets, dies, cores, threads, maxcpus);
>  return;
>  }
>
>  if (maxcpus < cpus) {
> -error_setg(errp, "maxcpus must be equal to or greater than smp");
> -return;
> -}
> -
> -if (sockets * dies * cores * threads != maxcpus) {
> -error_setg(errp, "Invalid CPU topology deprecated: "
> +error_setg(errp, "Invalid CPU topology: "
> +   "maxcpus must be equal to or greater than smp: "
> "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
> -   "!= maxcpus (%u)",
> -   sockets, dies, cores, threads,
> -   maxcpus);
> +   "== maxcpus (%u) < smp_cpus (%u)",
> +   sockets, dies, cores, threads, maxcpus, cpus);
>  return;
>  }
>

Reviewed-by: Pankaj Gupta 



Re: [PATCH for-6.2 v4 12/14] machine: Put all sanity-check in the generic SMP parser

2021-08-05 Thread Pankaj Gupta
> Put both sanity-check of the input SMP configuration and sanity-check
> of the output SMP configuration uniformly in the generic parser. Then
> machine_set_smp() will become cleaner, also all the invalid scenarios
> can be tested only by calling the parser.
>
> Signed-off-by: Yanan Wang 
> ---
>  hw/core/machine.c | 63 +++
>  1 file changed, 31 insertions(+), 32 deletions(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 443ae5aa1b..3dd6c6f81e 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -811,6 +811,20 @@ static void smp_parse(MachineState *ms, SMPConfiguration 
> *config, Error **errp)
>  unsigned threads = config->has_threads ? config->threads : 0;
>  unsigned maxcpus = config->has_maxcpus ? config->maxcpus : 0;
>
> +/*
> + * A specified topology parameter must be greater than zero,
> + * explicit configuration like "cpus=0" is not allowed.
> + */
> +if ((config->has_cpus && config->cpus == 0) ||
> +(config->has_sockets && config->sockets == 0) ||
> +(config->has_dies && config->dies == 0) ||
> +(config->has_cores && config->cores == 0) ||
> +(config->has_threads && config->threads == 0) ||
> +(config->has_maxcpus && config->maxcpus == 0)) {
> +error_setg(errp, "CPU topology parameters must be greater than 
> zero");
> +return;
> +}
> +
>  /*
>   * If not supported by the machine, a topology parameter must be
>   * omitted or specified equal to 1.
> @@ -889,6 +903,22 @@ static void smp_parse(MachineState *ms, SMPConfiguration 
> *config, Error **errp)
> topo_msg, maxcpus, cpus);
>  return;
>  }
> +
> +if (ms->smp.cpus < mc->min_cpus) {
> +error_setg(errp, "Invalid SMP CPUs %d. The min CPUs "
> +   "supported by machine '%s' is %d",
> +   ms->smp.cpus,
> +   mc->name, mc->min_cpus);
> +return;
> +}
> +
> +if (ms->smp.max_cpus > mc->max_cpus) {
> +error_setg(errp, "Invalid SMP CPUs %d. The max CPUs "
> +   "supported by machine '%s' is %d",
> +   ms->smp.max_cpus,
> +   mc->name, mc->max_cpus);
> +return;
> +}
>  }
>
>  static void machine_get_smp(Object *obj, Visitor *v, const char *name,
> @@ -911,7 +941,6 @@ static void machine_get_smp(Object *obj, Visitor *v, 
> const char *name,
>  static void machine_set_smp(Object *obj, Visitor *v, const char *name,
>  void *opaque, Error **errp)
>  {
> -MachineClass *mc = MACHINE_GET_CLASS(obj);
>  MachineState *ms = MACHINE(obj);
>  SMPConfiguration *config;
>  ERRP_GUARD();
> @@ -920,40 +949,10 @@ static void machine_set_smp(Object *obj, Visitor *v, 
> const char *name,
>  return;
>  }
>
> -/*
> - * The CPU topology parameters must be specified greater than zero or
> - * just omitted, explicit configuration like "cpus=0" is not allowed.
> - */
> -if ((config->has_cpus && config->cpus == 0) ||
> -(config->has_sockets && config->sockets == 0) ||
> -(config->has_dies && config->dies == 0) ||
> -(config->has_cores && config->cores == 0) ||
> -(config->has_threads && config->threads == 0) ||
> -(config->has_maxcpus && config->maxcpus == 0)) {
> -error_setg(errp, "CPU topology parameters must be greater than 
> zero");
> -goto out_free;
> -}
> -
>  smp_parse(ms, config, errp);
>  if (errp) {
> -goto out_free;
> +qapi_free_SMPConfiguration(config);
>  }
> -
> -/* sanity-check smp_cpus and max_cpus against mc */
> -if (ms->smp.cpus < mc->min_cpus) {
> -error_setg(errp, "Invalid SMP CPUs %d. The min CPUs "
> -   "supported by machine '%s' is %d",
> -   ms->smp.cpus,
> -   mc->name, mc->min_cpus);
> -} else if (ms->smp.max_cpus > mc->max_cpus) {
> -error_setg(errp, "Invalid SMP CPUs %d. The max CPUs "
> -   "supported by machine '%s' is %d",
> -   ms->smp.max_cpus,
> -   mc->name, mc->max_cpus);
> -}
> -
> -out_free:
> -qapi_free_SMPConfiguration(config);
>  }
>
>  static void machine_class_init(ObjectClass *oc, void *data)

Looks good.

Reviewed-by: Pankaj Gupta 



Re: [PATCH for-6.2 v4 06/14] machine: Prefer cores over sockets in smp parsing since 6.2

2021-08-05 Thread Pankaj Gupta
> In the real SMP hardware topology world, it's much more likely that
> we have high cores-per-socket counts and few sockets totally. While
> the current preference of sockets over cores in smp parsing results
> in a virtual cpu topology with low cores-per-sockets counts and a
> large number of sockets, which is just contrary to the real world.
>
> Given that it is better to make the virtual cpu topology be more
> reflective of the real world and also for the sake of compatibility,
> we start to prefer cores over sockets over threads in smp parsing
> since machine type 6.2 for different arches.
>
> In this patch, a boolean "smp_prefer_sockets" is added, and we only
> enable the old preference on older machines and enable the new one
> since type 6.2 for all arches by using the machine compat mechanism.
>
> Reviewed-by: Andrew Jones 
> Acked-by: Cornelia Huck 
> Acked-by: David Gibson 
> Suggested-by: Daniel P. Berrange 
> Signed-off-by: Yanan Wang 
> ---
>  hw/arm/virt.c  |  1 +
>  hw/core/machine.c  | 36 ++--
>  hw/i386/pc.c   | 36 ++--
>  hw/i386/pc_piix.c  |  1 +
>  hw/i386/pc_q35.c   |  1 +
>  hw/ppc/spapr.c |  1 +
>  hw/s390x/s390-virtio-ccw.c |  1 +
>  include/hw/boards.h|  1 +
>  qemu-options.hx|  3 ++-
>  9 files changed, 60 insertions(+), 21 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 01165f7f53..7babea40dc 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2797,6 +2797,7 @@ static void virt_machine_6_1_options(MachineClass *mc)
>  {
>  virt_machine_6_2_options(mc);
>  compat_props_add(mc->compat_props, hw_compat_6_1, hw_compat_6_1_len);
> +mc->smp_prefer_sockets = true;
>  }
>  DEFINE_VIRT_MACHINE(6, 1)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 458d9736e3..a8173a0f45 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -746,6 +746,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>
>  static void smp_parse(MachineState *ms, SMPConfiguration *config, Error 
> **errp)
>  {
> +MachineClass *mc = MACHINE_GET_CLASS(ms);
>  unsigned cpus= config->has_cpus ? config->cpus : 0;
>  unsigned sockets = config->has_sockets ? config->sockets : 0;
>  unsigned cores   = config->has_cores ? config->cores : 0;
> @@ -757,7 +758,7 @@ static void smp_parse(MachineState *ms, SMPConfiguration 
> *config, Error **errp)
>  return;
>  }
>
> -/* compute missing values, prefer sockets over cores over threads */
> +/* compute missing values based on the provided ones */
>  if (cpus == 0 && maxcpus == 0) {
>  sockets = sockets > 0 ? sockets : 1;
>  cores = cores > 0 ? cores : 1;
> @@ -765,15 +766,30 @@ static void smp_parse(MachineState *ms, 
> SMPConfiguration *config, Error **errp)
>  } else {
>  maxcpus = maxcpus > 0 ? maxcpus : cpus;
>
> -if (sockets == 0) {
> -cores = cores > 0 ? cores : 1;
> -threads = threads > 0 ? threads : 1;
> -sockets = maxcpus / (cores * threads);
> -} else if (cores == 0) {
> -threads = threads > 0 ? threads : 1;
> -cores = maxcpus / (sockets * threads);
> -} else if (threads == 0) {
> -threads = maxcpus / (sockets * cores);
> +if (mc->smp_prefer_sockets) {
> +/* prefer sockets over cores over threads before 6.2 */
> +if (sockets == 0) {
> +cores = cores > 0 ? cores : 1;
> +threads = threads > 0 ? threads : 1;
> +sockets = maxcpus / (cores * threads);
> +} else if (cores == 0) {
> +threads = threads > 0 ? threads : 1;
> +cores = maxcpus / (sockets * threads);
> +} else if (threads == 0) {
> +threads = maxcpus / (sockets * cores);
> +}
> +} else {
> +/* prefer cores over sockets over threads since 6.2 */
> +if (cores == 0) {
> +sockets = sockets > 0 ? sockets : 1;
> +threads = threads > 0 ? threads : 1;
> +cores = maxcpus / (sockets * threads);
> +} else if (sockets == 0) {
> +threads = threads > 0 ? threads : 1;
> +sockets = maxcpus / (cores * threads);
> +} else if (threads == 0) {
> +threads = maxcpus / (sockets * cores);
> +}

I feel this code is repeated in multiple places. Also, the (threads == 0) case
at the end is common to both branches; can we move it out of the if-else?

>  }
>  }
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index afd8b9c283..5d7c3efc43 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -717,6 +717,7 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
> level)
>   */
>  static void pc_smp_parse(MachineState *ms, SMPConfiguration *config, Error 

Re: [PATCH v2 4/6] util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITE

2021-07-28 Thread Pankaj Gupta
> Let's simplify the case when we only want a single thread and don't have
> to mess with signal handlers.
>
> Signed-off-by: David Hildenbrand 
> ---
>  util/oslib-posix.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index a1d309d495..1483e985c6 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -568,6 +568,14 @@ static bool touch_all_pages(char *area, size_t 
> hpagesize, size_t numpages,
>  }
>
>  if (use_madv_populate_write) {
> +/* Avoid creating a single thread for MADV_POPULATE_WRITE */
> +if (context.num_threads == 1) {
> +if (qemu_madvise(area, hpagesize * numpages,
> + QEMU_MADV_POPULATE_WRITE)) {
> +return true;
> +}
> +return false;
> +}
>  touch_fn = do_madv_populate_write_pages;
>  } else {
>  touch_fn = do_touch_pages;

Reviewed-by: Pankaj Gupta 



Re: [PATCH v2 3/6] util/oslib-posix: Don't create too many threads with small memory or little pages

2021-07-28 Thread Pankaj Gupta
> Let's limit the number of threads to something sane, especially that
> - We don't have more threads than the number of pages we have
> - We don't have threads that initialize small (< 64 MiB) memory
>
> Signed-off-by: David Hildenbrand 
> ---
>  util/oslib-posix.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 2ae6c3aaab..a1d309d495 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -40,6 +40,7 @@
>  #include 
>  #include "qemu/cutils.h"
>  #include "qemu/compiler.h"
> +#include "qemu/units.h"
>
>  #ifdef CONFIG_LINUX
>  #include 
> @@ -529,7 +530,8 @@ static void *do_madv_populate_write_pages(void *arg)
>  return NULL;
>  }
>
> -static inline int get_memset_num_threads(int smp_cpus)
> +static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
> + int smp_cpus)
>  {
>  long host_procs = sysconf(_SC_NPROCESSORS_ONLN);
>  int ret = 1;
> @@ -537,6 +539,12 @@ static inline int get_memset_num_threads(int smp_cpus)
>  if (host_procs > 0) {
>  ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), smp_cpus);
>  }
> +
> +/* Especially with gigantic pages, don't create more threads than pages. 
> */
> +ret = MIN(ret, numpages);
> +/* Don't start threads to prealloc comparatively little memory. */
> +ret = MIN(ret, MAX(1, hpagesize * numpages / (64 * MiB)));
> +
>  /* In case sysconf() fails, we fall back to single threaded */
>  return ret;
>  }
> @@ -546,7 +554,7 @@ static bool touch_all_pages(char *area, size_t hpagesize, 
> size_t numpages,
>  {
>  static gsize initialized = 0;
>  MemsetContext context = {
> -.num_threads = get_memset_num_threads(smp_cpus),
> +.num_threads = get_memset_num_threads(hpagesize, numpages, smp_cpus),
>  };
>  size_t numpages_per_thread, leftover;
>  void *(*touch_fn)(void *);

Reviewed-by: Pankaj Gupta 



Re: [PATCH for-6.2 v3 07/11] machine: Use ms instead of global current_machine in sanity-check

2021-07-27 Thread Pankaj Gupta
> In the sanity-check of smp_cpus and max_cpus against mc in function
> machine_set_smp(), we are now using ms->smp.max_cpus for the check
> but using current_machine->smp.max_cpus in the error message.
> Tweak this by uniformly using the local ms.
>
> Reviewed-by: Andrew Jones 
> Signed-off-by: Yanan Wang 
> ---
>  hw/core/machine.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index a8173a0f45..e13a8f2f34 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -878,7 +878,7 @@ static void machine_set_smp(Object *obj, Visitor *v, 
> const char *name,
>  } else if (ms->smp.max_cpus > mc->max_cpus) {
>  error_setg(errp, "Invalid SMP CPUs %d. The max CPUs "
> "supported by machine '%s' is %d",
> -   current_machine->smp.max_cpus,
> +   ms->smp.max_cpus,
> mc->name, mc->max_cpus);
>  }
>

Reviewed-by: Pankaj Gupta 



[PATCH] docs/nvdimm: Update nvdimm option value in machine example

2021-07-26 Thread Pankaj Gupta
Update the nvdimm option value in the example command from "-machine pc,nvdimm"
to "-machine pc,nvdimm=on", as the former complains with the below error:

"qemu-system-x86_64: -machine pc,nvdimm: Expected '=' after parameter 'nvdimm'"

Signed-off-by: Pankaj Gupta 
---
 docs/nvdimm.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 0aae682be3..fd7773dc5a 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -15,7 +15,7 @@ backend (i.e. memory-backend-file and memory-backend-ram). A 
simple
 way to create a vNVDIMM device at startup time is done via the
 following command line options:
 
- -machine pc,nvdimm
+ -machine pc,nvdimm=on
  -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
  -object 
memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE,readonly=off
  -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off
-- 
2.25.1




Re: [PATCH for-6.2 v2 00/11] machine: smp parsing fixes and improvement

2021-07-21 Thread Pankaj Gupta
> > On Mon, Jul 19 2021, Yanan Wang  wrote:
> >
> >> Hi,
> >>
> >> This is v2 of the series [1] that I have posted to introduce some smp 
> >> parsing
> >> fixes and improvement, much more work has been processed compared to RFC 
> >> v1.
> >>
> >> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-07/msg00259.html
> >>
> >> The purpose of this series is to improve/fix the parsing logic. Explicitly
> >> specifying a CPU topology parameter as zero is not allowed any more, and
> >> maxcpus is now uniformly used to calculate the omitted parameters. It's 
> >> also
> >> suggested that we should start to prefer cores over sockets over threads on
> >> the newer machine types, which will make the computed virtual topology more
> >> reflective of the real hardware.
> >>
> >> In order to reduce code duplication and ease the code maintenance, 
> >> smp_parse
> >> in now converted into a parser generic enough for all arches, so that the 
> >> PC
> >> specific one can be removed. It's also convenient to introduce more 
> >> topology
> >> members (e.g. cluster) to the generic parser in the future.
> > Cc:ing Pierre, as he also had been looking at the smp parsing code (for
> > s390x) recently.
> >
> > Also, please keep me on cc: for patches that touch s390x.
> Sure, I will. Sorry about the missing. :)
>
> Thanks,
> Yanan
> .
> >> Finally, a QEMU unit test for the parsing of given SMP configuration is 
> >> added.
> >> Since all the parsing logic is in generic function smp_parse(), this test
> >> passes different SMP configurations to the function and compares the parsing
> >> result with what is expected. In the test, all possible collections of the
> >> topology parameters and the corresponding expected results are listed,
> >> including the valid and invalid ones. The preference of sockets over cores
> >> and the preference of cores over sockets, and the support of multi-dies are
> >> also taken into consideration.
> >>
> >> ---
> >>
> >> Changelogs:
> >>
> >> v1->v2:
> >> - disallow "anything=0" in the smp configuration (Andrew)
> >> - make function smp_parse() a generic helper for all arches
> >> - improve the error reporting in the parser
> >> - start to prefer cores over sockets since 6.2 (Daniel)
> >> - add a unit test for the smp parsing (Daniel)
> >>
> >> ---
> >>
> >> Yanan Wang (11):
> >>machine: Disallow specifying topology parameters as zero
> >>machine: Make smp_parse generic enough for all arches
> >>machine: Uniformly use maxcpus to calculate the omitted parameters
> >>machine: Use the computed parameters to calculate omitted cpus
> >>machine: Improve the error reporting of smp parsing
> >>hw: Add compat machines for 6.2
> >>machine: Prefer cores over sockets in smp parsing since 6.2
> >>machine: Use ms instead of global current_machine in sanity-check
> >>machine: Tweak the order of topology members in struct CpuTopology
> >>machine: Split out the smp parsing code
> >>tests/unit: Add a unit test for smp parsing
> >>
> >>   MAINTAINERS |2 +
> >>   hw/arm/virt.c   |   10 +-
> >>   hw/core/machine-smp.c   |  124 
> >>   hw/core/machine.c   |   68 +--
> >>   hw/core/meson.build |1 +
> >>   hw/i386/pc.c|   66 +--
> >>   hw/i386/pc_piix.c   |   15 +-
> >>   hw/i386/pc_q35.c|   14 +-
> >>   hw/ppc/spapr.c  |   16 +-
> >>   hw/s390x/s390-virtio-ccw.c  |   15 +-
> >>   include/hw/boards.h |   13 +-
> >>   include/hw/i386/pc.h|3 +
> >>   qapi/machine.json   |6 +-
> >>   qemu-options.hx |4 +-
> >>   tests/unit/meson.build  |1 +
> >>   tests/unit/test-smp-parse.c | 1117 +++
> >>   16 files changed, 1338 insertions(+), 137 deletions(-)
> >>   create mode 100644 hw/core/machine-smp.c
> >>   create mode 100644 tests/unit/test-smp-parse.c
> >>

Please add me in CC, as I reviewed one of the patches and was in the middle
of reviewing a few others but missed the latest version.

Thanks,
Pankaj



Re: [PATCH 11/16] microvm: Drop dead error handling in microvm_machine_state_init()

2021-07-20 Thread Pankaj Gupta
> Stillborn in commit 0ebf007dda "hw/i386: Introduce the microvm machine
> type".
>
> Cc: Sergio Lopez 
> Signed-off-by: Markus Armbruster 
> ---
>  hw/i386/microvm.c | 5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
> index aba0c83219..f257ec5a0b 100644
> --- a/hw/i386/microvm.c
> +++ b/hw/i386/microvm.c
> @@ -458,15 +458,10 @@ static void microvm_machine_state_init(MachineState 
> *machine)
>  {
>  MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>  X86MachineState *x86ms = X86_MACHINE(machine);
> -Error *local_err = NULL;
>
>  microvm_memory_init(mms);
>
>  x86_cpus_init(x86ms, CPU_VERSION_LATEST);
> -if (local_err) {
> -error_report_err(local_err);
> -    exit(1);
> -}
>
>  microvm_devices_init(mms);
>  }
> --
> 2.31.1

Reviewed-by: Pankaj Gupta 



Re: [PATCH 16/16] vl: Don't continue after -smp help.

2021-07-20 Thread Pankaj Gupta
> We continue after -smp help:
>
> $ qemu-system-x86_64 -smp help -display none -monitor stdio
> smp-opts options:
>   cores=
>   cpus=
>   dies=
>   maxcpus=
>   sockets=
>   threads=
> QEMU 6.0.50 monitor - type 'help' for more information
> (qemu)
>
> Other options, such as -object help and -device help, don't.
>
> Adjust -smp not to continue either.
>
> Cc: Paolo Bonzini 
> Signed-off-by: Markus Armbruster 
> ---
>  softmmu/vl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index ce0ecc736b..8f9d97635a 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -1543,7 +1543,7 @@ machine_parse_property_opt(QemuOptsList *opts_list, 
> const char *propname,
>  prop = keyval_parse(arg, opts_list->implied_opt_name, &help, 
> &error_fatal);
>  if (help) {
>  qemu_opts_print_help(opts_list, true);
> -    return;
> +exit(0);
>  }
>  opts = qdict_new();
>  qdict_put(opts, propname, prop);
> --

Reviewed-by: Pankaj Gupta 



Re: [PATCH 10/16] migration: Handle migration_incoming_setup() errors consistently

2021-07-20 Thread Pankaj Gupta
> Commit b673eab4e2 "multifd: Make multifd_load_setup() get an Error
> parameter" changed migration_incoming_setup() to take an Error **
> argument, and adjusted the callers accordingly.  It neglected to
> adjust multifd_load_setup(): it still exit()s on error.  Clean
> that up.
>
> The error now gets propagated up two call chains: via
> migration_fd_process_incoming() to rdma_accept_incoming_migration(),
> and via migration_ioc_process_incoming() to
> migration_channel_process_incoming().  Both chain ends report the
> error with error_report_err(), but otherwise ignore it.  Behavioral
> change: we no longer exit() on this error.
>
> This is consistent with how we handle other errors here, e.g. from
> multifd_recv_new_channel() via migration_ioc_process_incoming() to
> migration_channel_process_incoming().  Whether it's consistently right
> or consistently wrong I can't tell.
>
> Also clean up the return value from the unusual 0 on success, 1 on
> error to the more common true on success, false on error.
>
> Cc: Juan Quintela 
> Cc: Dr. David Alan Gilbert 
> Signed-off-by: Markus Armbruster 
> ---
>  migration/migration.c | 27 +--
>  1 file changed, 9 insertions(+), 18 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 231dc24414..c1c0a48647 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -609,30 +609,25 @@ fail:
>  }
>
>  /**
> - * @migration_incoming_setup: Setup incoming migration
> - *
> - * Returns 0 for no error or 1 for error
> - *
> + * migration_incoming_setup: Setup incoming migration
>   * @f: file for main migration channel
>   * @errp: where to put errors
> + *
> + * Returns: %true on success, %false on error.
>   */
> -static int migration_incoming_setup(QEMUFile *f, Error **errp)
> +static bool migration_incoming_setup(QEMUFile *f, Error **errp)
>  {
>  MigrationIncomingState *mis = migration_incoming_get_current();
> -Error *local_err = NULL;
>
> -if (multifd_load_setup(&local_err) != 0) {
> -/* We haven't been able to create multifd threads
> -   nothing better to do */
> -error_report_err(local_err);
> -exit(EXIT_FAILURE);
> +if (multifd_load_setup(errp) != 0) {
> +return false;
>  }
>
>  if (!mis->from_src_file) {
>  mis->from_src_file = f;
>  }
>  qemu_file_set_blocking(f, false);
> -return 0;
> +return true;
>  }
>
>  void migration_incoming_process(void)
> @@ -675,14 +670,11 @@ static bool postcopy_try_recover(QEMUFile *f)
>
>  void migration_fd_process_incoming(QEMUFile *f, Error **errp)
>  {
> -Error *local_err = NULL;
> -
>  if (postcopy_try_recover(f)) {
>  return;
>  }
>
> -if (migration_incoming_setup(f, &local_err)) {
> -error_propagate(errp, local_err);
> +if (!migration_incoming_setup(f, errp)) {
>  return;
>  }
>  migration_incoming_process();
> @@ -703,8 +695,7 @@ void migration_ioc_process_incoming(QIOChannel *ioc, 
> Error **errp)
>  return;
>  }
>
> -if (migration_incoming_setup(f, &local_err)) {
> -error_propagate(errp, local_err);
> +if (!migration_incoming_setup(f, errp)) {
>  return;
>  }
>

Reviewed-by: Pankaj Gupta 



Re: [PATCH v1 0/3] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()

2021-07-20 Thread Pankaj Gupta
> > #1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
> > global variables and prepare for concurrency and #3 makes os_mem_prealloc()
> > safe to be called from multiple threads concurrently.
> >
> > Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
> > Linux commit 4ca9b3859dac ("mm/madvise: introduce
> > MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
> > page patch [1].
> >
> > [1] https://lkml.kernel.org/r/20210712083917.16361-1-da...@redhat.com
> >
> > Cc: Paolo Bonzini 
> > Cc: "Michael S. Tsirkin" 
> > Cc: Igor Mammedov 
> > Cc: Eduardo Habkost 
> > Cc: Dr. David Alan Gilbert 
> > Cc: Marek Kedzierski 
> > Cc: Pankaj Gupta 
> >
> > David Hildenbrand (3):
> >   util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
> >   util/oslib-posix: Introduce and use MemsetContext for
> > touch_all_pages()
> >   util/oslib-posix: Support concurrent os_mem_prealloc() invocation
> >
> >  include/qemu/osdep.h |   7 ++
> >  util/oslib-posix.c   | 167 ++-
> >  2 files changed, 126 insertions(+), 48 deletions(-)
> >
>
> Nice implementation to avoid wear of memory device for prealloc case

For the prealloc case, I mean.

> and to avoid touching of
> all the memory and abrupt exit of VM because of lack of memory. Instead better
> way to populate the page tables with madvise.
>
> Plan is to use this infrastructure for virtio-mem, I guess?
>
> For the patches 1 & 3:
> Reviewed-by: Pankaj Gupta 
>
>
> Thanks,
> Pankaj



Re: [PATCH v1 0/3] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()

2021-07-20 Thread Pankaj Gupta
> #1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
> global variables and prepare for concurrency and #3 makes os_mem_prealloc()
> safe to be called from multiple threads concurrently.
>
> Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
> Linux commit 4ca9b3859dac ("mm/madvise: introduce
> MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
> page patch [1].
>
> [1] https://lkml.kernel.org/r/20210712083917.16361-1-da...@redhat.com
>
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Igor Mammedov 
> Cc: Eduardo Habkost 
> Cc: Dr. David Alan Gilbert 
> Cc: Marek Kedzierski 
> Cc: Pankaj Gupta 
>
> David Hildenbrand (3):
>   util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
>   util/oslib-posix: Introduce and use MemsetContext for
> touch_all_pages()
>   util/oslib-posix: Support concurrent os_mem_prealloc() invocation
>
>  include/qemu/osdep.h |   7 ++
>  util/oslib-posix.c   | 167 ++-
>  2 files changed, 126 insertions(+), 48 deletions(-)
>

Nice implementation: it avoids wearing out the memory device in the prealloc
case, and it avoids touching all of the memory (and an abrupt exit of the VM
due to lack of memory). Populating the page tables with madvise is the better
approach.

Plan is to use this infrastructure for virtio-mem, I guess?

For the patches 1 & 3:
Reviewed-by: Pankaj Gupta 


Thanks,
Pankaj



Re: [PATCH for-6.2 v2 06/11] hw: Add compat machines for 6.2

2021-07-19 Thread Pankaj Gupta
 pc_q35_6_1_machine_options);
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 81699d4f8b..d39fd4e644 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4685,15 +4685,26 @@ static void 
> spapr_machine_latest_class_options(MachineClass *mc)
>  }\
>  type_init(spapr_machine_register_##suffix)
>
> +/*
> + * pseries-6.2
> + */
> +static void spapr_machine_6_2_class_options(MachineClass *mc)
> +{
> +/* Defaults for the latest behaviour inherited from the base class */
> +}
> +
> +DEFINE_SPAPR_MACHINE(6_2, "6.2", true);
> +
>  /*
>   * pseries-6.1
>   */
>  static void spapr_machine_6_1_class_options(MachineClass *mc)
>  {
> -/* Defaults for the latest behaviour inherited from the base class */
> +spapr_machine_6_2_class_options(mc);
> +compat_props_add(mc->compat_props, hw_compat_6_1, hw_compat_6_1_len);
>  }
>
> -DEFINE_SPAPR_MACHINE(6_1, "6.1", true);
> +DEFINE_SPAPR_MACHINE(6_1, "6.1", false);
>
>  /*
>   * pseries-6.0
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index e4b18aef49..4d25278cf2 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -791,14 +791,26 @@ bool css_migration_enabled(void)
>  }
>  \
>  type_init(ccw_machine_register_##suffix)
>
> +static void ccw_machine_6_2_instance_options(MachineState *machine)
> +{
> +}
> +
> +static void ccw_machine_6_2_class_options(MachineClass *mc)
> +{
> +}
> +DEFINE_CCW_MACHINE(6_2, "6.2", true);
> +
>  static void ccw_machine_6_1_instance_options(MachineState *machine)
>  {
> +ccw_machine_6_2_instance_options(machine);
>  }
>
>  static void ccw_machine_6_1_class_options(MachineClass *mc)
>  {
> +ccw_machine_6_2_class_options(mc);
> +compat_props_add(mc->compat_props, hw_compat_6_1, hw_compat_6_1_len);
>  }
> -DEFINE_CCW_MACHINE(6_1, "6.1", true);
> +DEFINE_CCW_MACHINE(6_1, "6.1", false);
>
>  static void ccw_machine_6_0_instance_options(MachineState *machine)
>  {
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index b6161cee88..2832f0f8aa 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -354,6 +354,9 @@ struct MachineState {
>  } \
>  type_init(machine_initfn##_register_types)
>
> +extern GlobalProperty hw_compat_6_1[];
> +extern const size_t hw_compat_6_1_len;
> +
>  extern GlobalProperty hw_compat_6_0[];
>  extern const size_t hw_compat_6_0_len;
>
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 88dffe7517..97b4ab79b5 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -196,6 +196,9 @@ void pc_system_parse_ovmf_flash(uint8_t *flash_ptr, size_t flash_size);
>  void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
> const CPUArchIdList *apic_ids, GArray *entry);
>
> +extern GlobalProperty pc_compat_6_1[];
> +extern const size_t pc_compat_6_1_len;
> +
>  extern GlobalProperty pc_compat_6_0[];
>  extern const size_t pc_compat_6_0_len;
>
> --

Reviewed-by: Pankaj Gupta 



Re: [PATCH] hw/net/net_tx_pkt: Fix crash detected by fuzzer

2021-07-18 Thread Pankaj Gupta
> QEMU currently crashes when it's started like this:
>
> cat << EOF | ./qemu-system-i386 -device vmxnet3 -nodefaults -qtest stdio
> outl 0xcf8 0x80001014
> outl 0xcfc 0xe0001000
> outl 0xcf8 0x80001018
> outl 0xcf8 0x80001004
> outw 0xcfc 0x7
> outl 0xcf8 0x80001083
> write 0x0 0x1 0xe1
> write 0x1 0x1 0xfe
> write 0x2 0x1 0xbe
> write 0x3 0x1 0xba
> writeq 0xe0001020 0xefefff5ecafe
> writeq 0xe0001020 0x5e5ccafe0002
> EOF
>
> It hits this assertion:
>
> qemu-system-i386: ../qemu/hw/net/net_tx_pkt.c:453: net_tx_pkt_reset:
>  Assertion `pkt->raw' failed.
>
> This happens because net_tx_pkt_init() is called with max_frags == 0 and
> thus the allocation
>
> p->raw = g_new(struct iovec, max_frags);
>
> results in a NULL pointer that causes the
>
> assert(pkt->raw);
>
> in net_tx_pkt_reset() to fail later. To fix this issue we can check
> that max_raw_frags was not zero before asserting that pkt->raw is
> a non-NULL pointer.
>
> Buglink: https://bugs.launchpad.net/qemu/+bug/1890157
> Signed-off-by: Thomas Huth 
> ---
>  hw/net/net_tx_pkt.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
> index 1f9aa59eca..1cb1125d9f 100644
> --- a/hw/net/net_tx_pkt.c
> +++ b/hw/net/net_tx_pkt.c
> @@ -450,11 +450,13 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt)
>  pkt->payload_len = 0;
>  pkt->payload_frags = 0;
>
> -assert(pkt->raw);
> -for (i = 0; i < pkt->raw_frags; i++) {
> -assert(pkt->raw[i].iov_base);
> -pci_dma_unmap(pkt->pci_dev, pkt->raw[i].iov_base, pkt->raw[i].iov_len,
> -  DMA_DIRECTION_TO_DEVICE, 0);
> +if (pkt->max_raw_frags > 0) {
> +assert(pkt->raw);
> +for (i = 0; i < pkt->raw_frags; i++) {
> +    assert(pkt->raw[i].iov_base);
> +pci_dma_unmap(pkt->pci_dev, pkt->raw[i].iov_base,
> +  pkt->raw[i].iov_len, DMA_DIRECTION_TO_DEVICE, 0);
> +}
>  }
>  pkt->raw_frags = 0;
>

Reviewed-by: Pankaj Gupta 



Re: [PATCH v3 2/3] memory: add memory_region_is_mapped_shared()

2021-07-14 Thread Pankaj Gupta
> Add a function to query whether a memory region is mmap(MAP_SHARED).
> This will be used to check that vhost-user memory regions can be shared
> with the device backend process in the next patch.
>
> An inline function in "exec/memory.h" would have been nice but RAMBlock
> fields are only accessible from memory.c (see "exec/ramblock.h").
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/exec/memory.h | 11 +++
>  softmmu/memory.c  |  6 ++
>  2 files changed, 17 insertions(+)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index c3d417d317..5976f05a01 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2763,6 +2763,17 @@ static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
>  }
>  }
>
> +/**
> + * memory_region_is_mapped_shared: check whether a memory region is
> + * mmap(MAP_SHARED)
> + *
> + * Returns %true if a memory region is mmap(MAP_SHARED). This is always false
> + * on memory regions that do not support memory_region_get_ram_ptr().
> + *
> + * @mr: the memory region being queried
> + */
> +bool memory_region_is_mapped_shared(MemoryRegion *mr);
> +
>  /**
>   * address_space_read: read from an address space.
>   *
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index bfedaf9c4d..3c4510c809 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -1811,6 +1811,12 @@ bool memory_region_is_ram_device(MemoryRegion *mr)
>  return mr->ram_device;
>  }
>
> +bool memory_region_is_mapped_shared(MemoryRegion *mr)
> +{
> +return memory_access_is_direct(mr, false) &&
> +   (mr->ram_block->flags & RAM_SHARED);
> +}
> +
>  uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
>  {
>  uint8_t mask = mr->dirty_log_mask;

Reviewed-by: Pankaj Gupta 



Re: [PATCH v3 1/3] tests/qtest/vhost-user-test: use share=on with memfd

2021-07-14 Thread Pankaj Gupta
> Add share=on for consistency and to prevent future bugs in the test.
>
> Note that share=on is the default for memory-backend-memfd so existing
> tests work.
>
> Reviewed-by: Marc-André Lureau 
> Acked-by: Thomas Huth 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  tests/qtest/vhost-user-test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
> index 3d6337fb5c..6c0891d429 100644
> --- a/tests/qtest/vhost-user-test.c
> +++ b/tests/qtest/vhost-user-test.c
> @@ -40,7 +40,7 @@
>  #define QEMU_CMD_MEM" -m %d -object memory-backend-file,id=mem,size=%dM," \
>  "mem-path=%s,share=on -numa node,memdev=mem"
>  #define QEMU_CMD_MEMFD  " -m %d -object memory-backend-memfd,id=mem,size=%dM," \
> -" -numa node,memdev=mem"
> +"share=on -numa node,memdev=mem"
>  #define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s%s"
>  #define QEMU_CMD_NETDEV " -netdev vhost-user,id=hs0,chardev=%s,vhostforce"
>

Reviewed-by: Pankaj Gupta 



Re: [PATCH v2 3/4] qemu-options: tweak to show that CPU count is optional

2021-07-13 Thread Pankaj Gupta
> The initial CPU count number is not required, if any of the topology
> options are given, since it can be computed.
>
> Reviewed-by: Andrew Jones 
> Reviewed-by: Yanan Wang 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  qemu-options.hx | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 6b72617844..14ff35dd4e 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -196,7 +196,7 @@ SRST
>  ERST
>
>  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
> -"-smp [cpus=]n[,maxcpus=cpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]\n"
> +"-smp [[cpus=]n][,maxcpus=cpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]\n"
>  "set the number of CPUs to 'n' [default=1]\n"
>  "maxcpus= maximum number of total CPUs, including\n"
>  "offline CPUs for hotplug, etc\n"
> @@ -206,7 +206,7 @@ DEF("smp", HAS_ARG, QEMU_OPTION_smp,
>  "threads= number of threads on one CPU core\n",
>  QEMU_ARCH_ALL)
>  SRST
> -``-smp [cpus=]n[,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]``
> +``-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]``
>  Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
>  are supported. On Sparc32 target, Linux limits the number of usable
>  CPUs to 4. For the PC target, the number of cores per die, the
> --

Reviewed-by: Pankaj Gupta 



Re: [PATCH v2 4/4] qemu-options: rewrite help for -smp options

2021-07-13 Thread Pankaj Gupta
> The -smp option help is peculiarly specific about mentioning the CPU
> upper limits, but these are wrong. The "PC" target has varying max
> CPU counts depending on the machine type picked. Notes about guest
> OS limits are inappropriate for QEMU docs. There are way too many
> machine types for it to be practical to mention actual limits, and
> some limits are even modified by downstream distributions. Thus it
> is better to remove the specific limits entirely.
>
> The CPU topology reporting is also not necessarily specific to the
> PC platform and descriptions around the rules of usage are somewhat
> terse. Expand this information with some examples to show effects
> of defaulting.
>
> Reviewed-by: Andrew Jones 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  qemu-options.hx | 29 +
>  1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 14ff35dd4e..214c477dcc 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -207,14 +207,27 @@ DEF("smp", HAS_ARG, QEMU_OPTION_smp,
>  QEMU_ARCH_ALL)
>  SRST
> ``-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]``
> -Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
> -are supported. On Sparc32 target, Linux limits the number of usable
> -CPUs to 4. For the PC target, the number of cores per die, the
> -number of threads per cores, the number of dies per packages and the
> -total number of sockets can be specified. Missing values will be
> -computed. If any on the three values is given, the total number of
> -CPUs n can be omitted. maxcpus specifies the maximum number of
> -hotpluggable CPUs.
> +Simulate a SMP system with '\ ``n``\ ' CPUs initially present on
> +the machine type board. On boards supporting CPU hotplug, the optional
> +'\ ``maxcpus``\ ' parameter can be set to enable further CPUs to be
> +added at runtime. If omitted the maximum number of CPUs will be
> +set to match the initial CPU count. Both parameters are subject to
> +an upper limit that is determined by the specific machine type chosen.
> +
> +To control reporting of CPU topology information, the number of sockets,
> +dies per socket, cores per die, and threads per core can be specified.
> +The sum `` sockets * cores * dies * threads `` must be equal to the
> +maximum CPU count. CPU targets may only support a subset of the topology
> +parameters. Where a CPU target does not support use of a particular
> +topology parameter, its value should be assumed to be 1 for the purpose
> +of computing the CPU maximum count.
> +
> +Either the initial CPU count, or at least one of the topology parameters
> +must be specified. Values for any omitted parameters will be computed
> +from those which are given. Historically preference was given to the
> +coarsest topology parameters when computing missing values (ie sockets
> +preferred over cores, which were preferred over threads), however, this
> +behaviour is considered liable to change.
>  ERST
>
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> --

Reviewed-by: Pankaj Gupta 



Re: [PATCH v2 1/4] docs: fix typo s/Intel/AMD/ in CPU model notes

2021-07-13 Thread Pankaj Gupta
> Reviewed-by: Andrew Jones 
> Reviewed-by: Yanan Wang 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  docs/system/cpu-models-x86.rst.inc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/docs/system/cpu-models-x86.rst.inc b/docs/system/cpu-models-x86.rst.inc
> index f40ee03ecc..9119f5dff5 100644
> --- a/docs/system/cpu-models-x86.rst.inc
> +++ b/docs/system/cpu-models-x86.rst.inc
> @@ -227,7 +227,7 @@ features are included if using "Host passthrough" or "Host model".
>  Preferred CPU models for AMD x86 hosts
>  ^^
>
> -The following CPU models are preferred for use on Intel hosts.
> +The following CPU models are preferred for use on AMD hosts.
>  Administrators / applications are recommended to use the CPU model that
>  matches the generation of the host CPUs in use. In a deployment with a
>  mixture of host CPU models between machines, if live migration
> --

Reviewed-by: Pankaj Gupta 



Re: [PATCH v2 2/4] qemu-options: re-arrange CPU topology options

2021-07-13 Thread Pankaj Gupta
> The list of CPU topology options are presented in a fairly arbitrary
> order currently. Re-arrange them so that they're ordered from largest to
> smallest unit
>
> Reviewed-by: Andrew Jones 
> Reviewed-by: Yanan Wang 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  qemu-options.hx | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 8965dabc83..6b72617844 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -196,17 +196,17 @@ SRST
>  ERST
>
>  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
> -"-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
> +"-smp [cpus=]n[,maxcpus=cpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]\n"
>  "set the number of CPUs to 'n' [default=1]\n"
> -"maxcpus= maximum number of total cpus, including\n"
> +"maxcpus= maximum number of total CPUs, including\n"
>  "offline CPUs for hotplug, etc\n"
> -"cores= number of CPU cores on one socket (for PC, it's on one die)\n"
> -"threads= number of threads on one CPU core\n"
> +"sockets= number of discrete sockets in the system\n"
>  "dies= number of CPU dies on one socket (for PC only)\n"
> -"sockets= number of discrete sockets in the system\n",
> +"cores= number of CPU cores on one socket (for PC, it's on one die)\n"
> +"threads= number of threads on one CPU core\n",
>  QEMU_ARCH_ALL)
>  SRST
> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
> +``-smp [cpus=]n[,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]``
>  Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
>  are supported. On Sparc32 target, Linux limits the number of usable
>  CPUs to 4. For the PC target, the number of cores per die, the
> --

Reviewed-by: Pankaj Gupta 



Re: [RFC PATCH 6/6] machine: Tweak the order of topology members in struct CpuTopology

2021-07-12 Thread Pankaj Gupta
> Now that all the possible topology parameters are integrated in struct
> CpuTopology, tweak the order of topology members to be "cpus/sockets/
> dies/cores/threads/maxcpus" for readability and consistency. We also
> tweak the comment by adding explanation of dies parameter.
>
> Signed-off-by: Yanan Wang 
> ---
>  hw/core/machine.c   | 4 ++--
>  include/hw/boards.h | 7 ---
>  2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 55785feee2..8c538d2ba5 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -968,10 +968,10 @@ static void machine_initfn(Object *obj)
>  /* default to mc->default_cpus */
>  ms->smp.cpus = mc->default_cpus;
>  ms->smp.max_cpus = mc->default_cpus;
> -ms->smp.cores = 1;
> +ms->smp.sockets = 1;
>  ms->smp.dies = 1;
> +ms->smp.cores = 1;
>  ms->smp.threads = 1;
> -ms->smp.sockets = 1;
>  }
>
>  static void machine_finalize(Object *obj)
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 1eae4427e8..3b64757981 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -275,17 +275,18 @@ typedef struct DeviceMemoryState {
>  /**
>   * CpuTopology:
>   * @cpus: the number of present logical processors on the machine
> - * @cores: the number of cores in one package
> - * @threads: the number of threads in one core
>   * @sockets: the number of sockets on the machine
> + * @dies: the number of dies in one socket
> + * @cores: the number of cores in one die
> + * @threads: the number of threads in one core
>   * @max_cpus: the maximum number of logical processors on the machine
>   */
>  typedef struct CpuTopology {
>  unsigned int cpus;
> +unsigned int sockets;
>  unsigned int dies;
>  unsigned int cores;
>  unsigned int threads;
> -unsigned int sockets;
>  unsigned int max_cpus;
>  } CpuTopology;

Reviewed-by: Pankaj Gupta 



Re: [PATCH] io_uring: move LuringState typedef to block/aio.h

2021-07-12 Thread Pankaj Gupta
> The LuringState typedef is defined twice, in include/block/raw-aio.h and
> block/io_uring.c.  Move it in include/block/aio.h, which is included
> everywhere the typedef is needed, since include/block/aio.h already has
> to define the forward reference to the struct.
>
> Signed-off-by: Paolo Bonzini 
> ---
>  block/io_uring.c| 4 ++--
>  include/block/aio.h | 8 
>  include/block/raw-aio.h | 1 -
>  3 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 00a3ee9fb8..aa856a4c5d 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -46,7 +46,7 @@ typedef struct LuringQueue {
>  QSIMPLEQ_HEAD(, LuringAIOCB) submit_queue;
>  } LuringQueue;
>
> -typedef struct LuringState {
> +struct LuringState {
>  AioContext *aio_context;
>
>  struct io_uring ring;
> @@ -56,7 +56,7 @@ typedef struct LuringState {
>
>  /* I/O completion processing.  Only runs in I/O thread.  */
>  QEMUBH *completion_bh;
> -} LuringState;
> +};
>
>  /**
>   * luring_resubmit:
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 807edce9b5..8e2e4fe10f 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -54,7 +54,7 @@ typedef void IOHandler(void *opaque);
>  struct Coroutine;
>  struct ThreadPool;
>  struct LinuxAioState;
> -struct LuringState;
> +typedef struct LuringState LuringState;
>
>  /* Is polling disabled? */
>  bool aio_poll_disabled(AioContext *ctx);
> @@ -209,7 +209,7 @@ struct AioContext {
>   * State for Linux io_uring.  Uses aio_context_acquire/release for
>   * locking.
>   */
> -struct LuringState *linux_io_uring;
> +LuringState *linux_io_uring;
>
>  /* State for file descriptor monitoring using Linux io_uring */
>  struct io_uring fdmon_io_uring;
> @@ -513,10 +513,10 @@ struct LinuxAioState *aio_setup_linux_aio(AioContext *ctx, Error **errp);
>  struct LinuxAioState *aio_get_linux_aio(AioContext *ctx);
>
>  /* Setup the LuringState bound to this AioContext */
> -struct LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp);
> +LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp);
>
>  /* Return the LuringState bound to this AioContext */
> -struct LuringState *aio_get_linux_io_uring(AioContext *ctx);
> +LuringState *aio_get_linux_io_uring(AioContext *ctx);
>  /**
>   * aio_timer_new_with_attrs:
>   * @ctx: the aio context
> diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
> index 251b10d273..af0ea0fba4 100644
> --- a/include/block/raw-aio.h
> +++ b/include/block/raw-aio.h
> @@ -59,7 +59,6 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s);
>  #endif
>  /* io_uring.c - Linux io_uring implementation */
>  #ifdef CONFIG_LINUX_IO_URING
> -typedef struct LuringState LuringState;
>  LuringState *luring_init(Error **errp);
>  void luring_cleanup(LuringState *s);
>  int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
> --

Reviewed-by: Pankaj Gupta 



Re: [PATCH 2/4] qemu-options: re-arrange CPU topology options

2021-07-12 Thread Pankaj Gupta
> The list of CPU topology options are presented in a fairly arbitrary
> order currently. Re-arrange them so that they're ordered from largest to
> smallest unit
>
> Signed-off-by: Daniel P. Berrangé 
> ---
>  qemu-options.hx | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index ba3ca9da1d..aa33dfdcfd 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -196,17 +196,17 @@ SRST
>  ERST
>
>  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
> -"-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
> +"-smp [cpus=]n[,maxcpus=cpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]\n"
>  "set the number of CPUs to 'n' [default=1]\n"
>  "maxcpus= maximum number of total cpus, including\n"
>  "offline CPUs for hotplug, etc\n"
> +"sockets= number of discrete sockets in the system\n",
> +"dies= number of CPU dies on one socket (for PC only)\n"
>  "cores= number of CPU cores on one socket (for PC, it's on one die)\n"
>  "threads= number of threads on one CPU core\n"
> -"dies= number of CPU dies on one socket (for PC only)\n"
> -"sockets= number of discrete sockets in the system\n",
>  QEMU_ARCH_ALL)
>  SRST
> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
> +``-smp [cpus=]n[,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]``
>  Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
>  are supported. On Sparc32 target, Linux limits the number of usable
>  CPUs to 4. For the PC target, the number of cores per die, the

Looks cleaner. With the nit mentioned by Dan:

Reviewed-by: Pankaj Gupta 



Re: [PATCH v1] vfio: Fix CID 1458134 in vfio_register_ram_discard_listener()

2021-07-12 Thread Pankaj Gupta
>   CID 1458134:  Integer handling issues  (BAD_SHIFT)
> In expression "1 << ctz64(container->pgsizes)", left shifting by more
> than 31 bits has undefined behavior.  The shift amount,
> "ctz64(container->pgsizes)", is 64.
>
> Commit 5e3b981c330c ("vfio: Support for RamDiscardManager in the !vIOMMU
> case") added an assertion that our granularity is at least as big as the
> page size.
>
> Although unlikely, we could have a page size that does not fit into
> 32 bit. In that case, we'd try shifting by more than 31 bit.
>
> Let's use 1ULL instead and make sure we're not shifting by more than 63
> bit by asserting that any bit in container->pgsizes is set.
>
> Fixes: CID 1458134
> Cc: Alex Williamson 
> Cc: Eduardo Habkost 
> Cc: "Michael S. Tsirkin" 
> Cc: Paolo Bonzini 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/vfio/common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 3f0d111360..8728d4d5c2 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -783,7 +783,8 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
>  section->mr);
>
>  g_assert(vrdl->granularity && is_power_of_2(vrdl->granularity));
> -g_assert(vrdl->granularity >= 1 << ctz64(container->pgsizes));
> +g_assert(container->pgsizes &&
> + vrdl->granularity >= 1ULL << ctz64(container->pgsizes));
>
>  ram_discard_listener_init(&vrdl->listener,
>vfio_ram_discard_notify_populate,


Reviewed-by: Pankaj Gupta 



Re: [PATCH 1/2] numa: Report expected initiator

2021-07-09 Thread Pankaj Gupta
> When setting up NUMA with HMAT enabled there's a check performed
> in machine_set_cpu_numa_node() that reports an error when a NUMA
> node has a CPU but the node's initiator is not itself. The error
> message reported contains only the expected value and not the
> actual value (which is different because an error is being
> reported). Report both values in the error message.
>
> Signed-off-by: Michal Privoznik 
> ---
>  hw/core/machine.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 57c18f909a..6f59fb0b7f 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -728,7 +728,8 @@ void machine_set_cpu_numa_node(MachineState *machine,
>  if ((numa_info[props->node_id].initiator < MAX_NODES) &&
>  (props->node_id != numa_info[props->node_id].initiator)) {
>  error_setg(errp, "The initiator of CPU NUMA node %" PRId64
> -" should be itself", props->node_id);
> +   " should be itself (got %" PRIu16 ")",
> +   props->node_id, numa_info[props->node_id].initiator);
>  return;
>  }
>  numa_info[props->node_id].has_cpu = true;

Reviewed-by: Pankaj Gupta 



Re: [PATCH v2] remote/memory: Replace share parameter with ram_flags

2021-07-08 Thread Pankaj Gupta
> Fixes: d5015b801340 ("softmmu/memory: Pass ram_flags to
> qemu_ram_alloc_from_fd()")
>
> Signed-off-by: Yang Zhong 
> Reviewed-by: David Hildenbrand 
> ---
>  hw/remote/memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/remote/memory.c b/hw/remote/memory.c
> index 472ed2a272..6e21ab1a45 100644
> --- a/hw/remote/memory.c
> +++ b/hw/remote/memory.c
> @@ -46,7 +46,7 @@ void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp)
>  subregion = g_new(MemoryRegion, 1);
>  memory_region_init_ram_from_fd(subregion, NULL,
> name, sysmem_info->sizes[region],
> -   true, msg->fds[region],
> +   RAM_SHARED, msg->fds[region],
> sysmem_info->offsets[region],
> errp);
>
Reviewed-by: Pankaj Gupta 



Re: [PATCH] Fix libpmem configuration option

2021-07-07 Thread Pankaj Gupta
> For some reason, libpmem option setting was set to work in an opposite
> way (--enable-libpmem disabled it and vice versa). Fixing this so
> configuration works properly.
>
> Signed-off-by: Miroslav Rezanina 
> ---
>  configure | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/configure b/configure
> index 7994bdee92..ffa93cc5fd 100755
> --- a/configure
> +++ b/configure
> @@ -1501,9 +1501,9 @@ for opt do
>;;
>--disable-debug-mutex) debug_mutex=no
>;;
> -  --enable-libpmem) libpmem=disabled
> +  --enable-libpmem) libpmem="enabled"
>;;
> -  --disable-libpmem) libpmem=enabled
> +  --disable-libpmem) libpmem="disabled"
>;;
>--enable-xkbcommon) xkbcommon="enabled"
>;;

Reviewed-by: Pankaj Gupta 



Re: [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData"

2021-07-07 Thread Pankaj Gupta
> Let's define a proper type, just as we do for PrecopyNotifyData already.
>
> Cc: Wei Wang 
> Cc: Michael S. Tsirkin 
> Cc: Philippe Mathieu-Daudé 
> Cc: Alexander Duyck 
> Cc: Juan Quintela 
> Cc: "Dr. David Alan Gilbert" 
> Cc: Peter Xu 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/vhost-user.c   | 2 +-
>  migration/postcopy-ram.c | 2 +-
>  migration/postcopy-ram.h | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 1ac4a2ebec..42dad711bf 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1827,9 +1827,9 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)
>  static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
>  void *opaque)
>  {
> -struct PostcopyNotifyData *pnd = opaque;
>  struct vhost_user *u = container_of(notifier, struct vhost_user,
>   postcopy_notifier);
> +PostcopyNotifyData *pnd = opaque;
>  struct vhost_dev *dev = u->dev;
>
>  switch (pnd->reason) {
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 2e9697bdd2..ee4e1c7cf5 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -69,7 +69,7 @@ void postcopy_remove_notifier(NotifierWithReturn *n)
>
>  int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
>  {
> -struct PostcopyNotifyData pnd;
> +PostcopyNotifyData pnd;
>  pnd.reason = reason;
>  pnd.errp = errp;
>
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index 6d2b3cf124..01829c3562 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -125,10 +125,10 @@ enum PostcopyNotifyReason {
>  POSTCOPY_NOTIFY_INBOUND_END,
>  };
>
> -struct PostcopyNotifyData {
> +typedef struct PostcopyNotifyData {
>  enum PostcopyNotifyReason reason;
>  Error **errp;
> -};
> +} PostcopyNotifyData;
>
>  void postcopy_add_notifier(NotifierWithReturn *nn);
>  void postcopy_remove_notifier(NotifierWithReturn *n);
> --

Reviewed-by: Pankaj Gupta 



Re: [PATCHv2 1/1] Support monitor chardev hotswap with QMP

2021-05-03 Thread Pankaj Gupta
+CC Danpb

> >>> Marc-André, I'd like your opinion for this one, in particular the use of
> >>> g_source_remove().
> >>>
> >>
> >> My opinion isn't really worth much, my review would have a bit more value.
> >>
> >> GSource has indeed some peculiar lifetime management, that I got wrong in
> >> the past. So I would be extra careful.
> >>
> >> But before spending time on review, I would also clarify the motivation
> >> and ask for testing.
> >>
> >> Markus, hot-adding/removing monitors isn't supported?
> >>
> >>
> > I realize you answered my question below. That's surprising me. Wouldn't it
> > make more sense to support it rather than having a pre-opened null-based
> > monitor that can have its chardev swapped?
>
> Yes, it would.  Patches welcome.
>
> This patch is a somewhat ham-fisted and limited solution to the problem
> stated in the commit message.  However, it might *also* be a reasonable
> improvement to chardev-change on its own.  Not for me to judge.
>
> chardev-change comes with a number of restrictions.  Let's have a closer
> look.  It fails
>
> 1. when no such character device exists (d'oh)
>
> 2. for chardev-mux devices
>
> 3. in record/replay mode
>
> 4. when a backend is connected that doesn't implement the chr_be_change()
>method
>
> 5. when chr_be_change() fails
>
> 6. when creating the new chardev fails[*]
>
> Items 2, 3, 4 are restrictions.  I figure 2 and 4 are simply not
> implemented, yet.  I'm not sure about 3.
>
> Whether we want to accept patches lifting restrictions is up to the
> chardev maintainers.

Maybe we can handle these restrictions (or some are already handled) on the libvirt side?

>
> This patch lifts restriction 4 for QMP monitor backends.  Its monitor
> part looks acceptable to me, but I dislike its code duplication.  Before
> we spend time on cleaning that up (or on deciding to clean it up later),
> I'd like to hear the chardev mantainers' judgement, because that's about
> more serious matters than cleanliness.

Sure. But I also feel that allowing the monitor device to be changed is a useful
feature, independent of the monitor hotplug/unplug feature.

>
> Do I make sense?
>
> [...]
>
>
> [*] The code for creating the new chardev in the "no backend connected"
> case
>
> be = chr->be;
> if (!be) {
> /* easy case */
> object_unparent(OBJECT(chr));
> return qmp_chardev_add(id, backend, errp);
> }
>
> is problematic: when qmp_chardev_add() fails, we already destroyed the
> old chardev.  It should destroy the old chardev only when it can create
> its replacement.

 Good point. I agree. We should fix this.

Thanks,
Pankaj



Re: [PATCH v4 0/3] nvdimm: Enable sync-dax property for nvdimm

2021-05-03 Thread Pankaj Gupta
> > The proposal that "sync-dax=unsafe" for non-PPC architectures, is a
> > fundamental misrepresentation of how this is supposed to work. Rather
> > than make "sync-dax" a first class citizen of the device-description
> > interface I'm proposing that you make this a separate device-type.
> > This also solves the problem that "sync-dax" with an implicit
> > architecture backend assumption is not precise, but a new "non-nvdimm"
> > device type would make it explicit what the host is advertising to the
> > guest.
> >
>
> Currently, users can use a virtualized nvdimm support in Qemu to share
> host page cache to the guest via the below command
>
> -object memory-backend-file,id=memnvdimm1,mem-path=file_name_in_host_fs
> -device nvdimm,memdev=memnvdimm1
>
> Such usage can results in wrong application behavior because there is no
> hint to the application/guest OS that a cpu cache flush is not
> sufficient to ensure persistence.
>
> I understand that virtio-pmem is suggested as an alternative for that.
> But why not fix virtualized nvdimm if platforms can express the details.
>
> ie, can ACPI indicate to the guest OS that the device need a flush
> mechanism to ensure persistence in the above case?
>
> What this patch series did was to express that property via a device
> tree node and guest driver enables a hypercall based flush mechanism to
> ensure persistence.

Would a VIRTIO-based mechanism (entirely asynchronous, no trap on the host
side) be better than a hypercall-based one? Registering memory can be done
either way. We implemented the virtio-pmem flush mechanism with the below
considerations:

- Proper semantic for guest flush requests.
- Efficient mechanism for performance pov.

I am just asking myself if we have platform agnostic mechanism already
there, maybe
we can extend it to suit our needs? Maybe I am missing some points here.
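As a toy illustration of the host-side half of such a flush request (a sketch only -- not the actual virtio-pmem worker, which services requests through the virtqueue):

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Toy host-side handler for a guest flush request: persist the backing
 * file with fsync() and report 0 on success, 1 on error. This mirrors
 * the shape (not the code) of virtio-pmem's worker callback. */
static int handle_flush_request(int backing_fd)
{
    return fsync(backing_fd) == 0 ? 0 : 1;
}
```

The guest-visible semantics then reduce to: the request completes only after the host data reached stable storage.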

> >> On PPC, the default is "sync-dax=writeback" - so the ND_REGION_ASYNC
> >>
> >> is set for the region and the guest makes hcalls to issue fsync on the host.
> >>
> >>
> >> Are you suggesting me to keep it "unsafe" as default for all architectures
> >>
> >> including PPC and a user can set it to "writeback" if desired.
> >
> > No, I am suggesting that "sync-dax" is insufficient to convey this
> > property. This behavior warrants its own device type, not an ambiguous
> > property of the memory-backend-file with implicit architecture
> > assumptions attached.
> >
>
> Why is it insufficient?  Is it because other architectures don't have an
> ability express this detail to guest OS? Isn't that an arch limitations?



Re: [PATCH 2/2] Support monitor chardev hotswap with QMP

2021-04-12 Thread Pankaj Gupta
Hi Li,

> For some scenarios, it needs to hot-add a monitor device.
> But QEMU doesn't support hotplug yet. It also works by adding
> a monitor with null backend by default and then change its
> backend to socket by QMP command "chardev-change".
>
> So this patch is to support monitor chardev hotswap with QMP.
>
> Signed-off-by: Li Zhang 
> ---
>  monitor/monitor-internal.h |  3 +++
>  monitor/monitor.c  |  2 +-
>  monitor/qmp.c  | 42 +++---
>  3 files changed, 43 insertions(+), 4 deletions(-)
>
> diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
> index 40903d6386..2df6dd21de 100644
> --- a/monitor/monitor-internal.h
> +++ b/monitor/monitor-internal.h
> @@ -186,4 +186,7 @@ int hmp_compare_cmd(const char *name, const char *list);
>  void qmp_query_qmp_schema(QDict *qdict, QObject **ret_data,
>   Error **errp);
>
> +gboolean monitor_unblocked(GIOChannel *chan, GIOCondition cond,
> +   void *opaque);
> +
>  #endif
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index e94f532cf5..2d255bab18 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -157,7 +157,7 @@ static inline bool monitor_is_hmp_non_interactive(const Monitor *mon)
>
>  static void monitor_flush_locked(Monitor *mon);
>
> -static gboolean monitor_unblocked(GIOChannel *chan, GIOCondition cond,
> +gboolean monitor_unblocked(GIOChannel *chan, GIOCondition cond,
>void *opaque)
>  {
>  Monitor *mon = opaque;
> diff --git a/monitor/qmp.c b/monitor/qmp.c
> index 2326bd7f9b..55cfb230d9 100644
> --- a/monitor/qmp.c
> +++ b/monitor/qmp.c
> @@ -44,6 +44,7 @@ struct QMPRequest {
>  Error *err;
>  };
>  typedef struct QMPRequest QMPRequest;
> +static void monitor_qmp_set_handlers_bh(void *opaque);
>
>  QmpCommandList qmp_commands, qmp_cap_negotiation_commands;
>
> @@ -480,7 +481,35 @@ void monitor_data_destroy_qmp(MonitorQMP *mon)
>  g_queue_free(mon->qmp_requests);
>  }
>
> -static void monitor_qmp_setup_handlers_bh(void *opaque)
> +static int monitor_qmp_change(void *opaque)
> +{
> +MonitorQMP *mon = opaque;
> +
> +mon->common.use_io_thread =
> +qemu_chr_has_feature(mon->common.chr.chr, QEMU_CHAR_FEATURE_GCONTEXT);
> +
> +if (mon->common.use_io_thread) {
> +aio_bh_schedule_oneshot(iothread_get_aio_context(mon_iothread),
> +monitor_qmp_set_handlers_bh, mon);
> +} else {
> +qemu_chr_fe_set_handlers(&mon->common.chr, monitor_can_read,
> + monitor_qmp_read, monitor_qmp_event,
> + monitor_qmp_change, &mon->common, NULL, true);
> +}
> +
> +if (mon->common.out_watch) {
> +g_source_remove(mon->common.out_watch);
> +qemu_mutex_lock(&mon->common.mon_lock);

I am not sure the mutex is strictly required here, but it does protect
"out_watch" against concurrent updates.

> +mon->common.out_watch =
> +qemu_chr_fe_add_watch(&mon->common.chr, G_IO_OUT | G_IO_HUP,
> +   monitor_unblocked, &mon->common);
> +qemu_mutex_unlock(&mon->common.mon_lock);
> +}
> +
> +return 0;
> +}
> +
> +static void monitor_qmp_set_handlers_bh(void *opaque)
>  {
>  MonitorQMP *mon = opaque;
>  GMainContext *context;
> @@ -490,7 +519,14 @@ static void monitor_qmp_setup_handlers_bh(void *opaque)
>  assert(context);
>  qemu_chr_fe_set_handlers(&mon->common.chr, monitor_can_read,
>   monitor_qmp_read, monitor_qmp_event,
> - NULL, &mon->common, context, true);
> + monitor_qmp_change, &mon->common, context, true);
> +
> +}
> +
> +static void monitor_qmp_setup_handlers_bh(void *opaque)
> +{
> +MonitorQMP *mon = opaque;
> +monitor_qmp_set_handlers_bh(mon);
>  monitor_list_append(&mon->common);
>  }
>
> @@ -531,7 +567,7 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
>  } else {
>  qemu_chr_fe_set_handlers(&mon->common.chr, monitor_can_read,
>   monitor_qmp_read, monitor_qmp_event,
> - NULL, &mon->common, NULL, true);
> + monitor_qmp_change, &mon->common, NULL, true);
>  monitor_list_append(&mon->common);
>  }
>  }

Overall patch looks good to me.
Reviewed-by: Pankaj Gupta 



Re: [PATCH] virtio-pmem: fix virtio_pmem_resp assign problem

2021-03-17 Thread Pankaj Gupta
> ret in virtio_pmem_resp is a uint32_t variable, which should be assigned
> using virtio_stl_p.
>
> The kernel side driver does not guarantee virtio_pmem_resp to be initialized
> to zero in advance, So sometimes the flush operation will fail.
>
> Signed-off-by: Wang Liang 
> ---
>  hw/virtio/virtio-pmem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
> index a3e0688a89..d1aeb90a31 100644
> --- a/hw/virtio/virtio-pmem.c
> +++ b/hw/virtio/virtio-pmem.c
> @@ -47,7 +47,7 @@ static int worker_cb(void *opaque)
>  err = 1;
>  }
>
> -virtio_stw_p(req_data->vdev, &req_data->resp.ret, err);
> +virtio_stl_p(req_data->vdev, &req_data->resp.ret, err);
>
>  return 0;
>  }

Thanks!
Reviewed-by: Pankaj Gupta 
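For the record, the width mismatch that this one-liner fixes is easy to reproduce standalone (toy stand-ins for the virtio store helpers, not the real QEMU ones, which also handle cross-endian targets):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins for virtio_stw_p / virtio_stl_p: plain 16-bit and
 * 32-bit stores, ignoring the byte-swapping the real helpers may do. */
static void stw_p(void *p, uint16_t v) { memcpy(p, &v, sizeof v); }
static void stl_p(void *p, uint32_t v) { memcpy(p, &v, sizeof v); }

/* Fill a 32-bit response field with garbage (the guest driver need not
 * zero it), then store `err` with a 16- or 32-bit store and return the
 * value the guest would read back. */
static uint32_t store_ret(int use_32bit, uint32_t err)
{
    uint32_t ret;
    memset(&ret, 0xAA, sizeof ret);    /* uninitialized guest buffer */
    if (use_32bit) {
        stl_p(&ret, err);              /* whole field written */
    } else {
        stw_p(&ret, (uint16_t)err);    /* upper half keeps garbage */
    }
    return ret;
}
```

With the 16-bit store, a successful flush (err = 0) can still read back as nonzero in the guest because two stale bytes survive.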



Re: [PATCH v2] i386: Add the support for AMD EPYC 3rd generation processors

2021-03-02 Thread Pankaj Gupta
Hi Babu,

I confirm I can see both the ssbd & ibrs features in the guest with the
below patch. There was some issue at my end, sorry for the confusion.
Can you please post the official patch for inclusion?

Best regards,
Pankaj

On Mon, Mar 1, 2021 at 9:38 PM Babu Moger  wrote:
>
>
>
> > -Original Message-----
> > From: Pankaj Gupta 
> > Sent: Monday, March 1, 2021 2:22 PM
> > To: Moger, Babu 
> > Cc: Pankaj Gupta ; Paolo Bonzini
> > ; richard.hender...@linaro.org; Eduardo Habkost
> > ; Qemu Developers 
> > Subject: Re: [PATCH v2] i386: Add the support for AMD EPYC 3rd generation
> > processors
> >
> > > > Hi Babu,
> > > >
> > > > I tried to test below patch for AMD EPYC Rome CPU and I got below error
> > [1]:
> > > >
> > > > Also, I noticed SSBD CPU flag for guest was still available even
> > > > without this patch, I noticed that for the guest, AMD_SSBD not got set.
> > > >
> > > > Guest:
> > > > 0x8008 0x00: eax=0x3028 ebx=0x9205 ecx=0x2003
> > > > edx=0x
> > > >
> > > > [1]
> > > > [0.008000] unchecked MSR access error: WRMSR to 0x48 (tried to
> > > > write 0x) at rIP: 0x9245c9e4
> > > > (native_write_msr+0x4/0x20)
> > > > [0.008000]  [] ? x86_spec_ctrl_setup_ap+0x35/0x50
> > > > [0.008000]  [] ? identify_secondary_cpu+0x53/0x80
> > > > [0.008000]  [] ? start_secondary+0x6a/0x1b0
> > > >
> > > > 0.011970] unchecked MSR access error: RDMSR from 0x48 at rIP:
> > > > 0x9245c772 (native_read_msr+0x2/0x40)
> > >
> > > I did not see any problem with these patches.
> > > My guest setup.
> > > # lscpu |grep -o ssbd
> > > ssbd
> > > [root@rome-vm ~]# uname -r
> > > 4.18.0-147.el8.x86_64
> > > [root@rome-vm ~]# cat /etc/redhat-release Red Hat Enterprise Linux
> > > release 8.1 (Ootpa) # wrmsr 0x48 7 [root@rome-vm ~]# rdmsr 0x48
> > > 7
> > >
> > >
> > > My host os.
> > > # uname -r
> > > 4.18.0-193.el8.x86_64
> > > [root@rome images]# cat /etc/redhat-release Red Hat Enterprise Linux
> > > release 8.2 Beta (Ootpa)
> > >
> > > Also, I only see ssbd feature when add this patch(EPYC-Rome-v2).
> > > Otherwise, I don’t see ssbd feature.
> >
> > Thanks for checking!
> > Can you also see the ibrs feature inside guest with this patch?
>
> Yes, The feature is available with this patch. Otherwise it is not available.
> [root@rome-vm ~]# lscpu |grep -o ibrs
> ibrs
>
> >
> > Thanks,
> > Pankaj
> > >
> > > Thanks
> > > Babu
> > >
> > >
> > > >
> > > > Thanks,
> > > > Pankaj
> > > >
> > > > > > It is normally added as v2 for compatibility. Like this.
> > > > >
> > > > > o.k. Thanks!
> > > > > I will test this tomorrow.
> > > > >
> > > > > >
> > > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c index
> > > > > > 24db7ed892..f721d0db78 100644
> > > > > > --- a/target/i386/cpu.c
> > > > > > +++ b/target/i386/cpu.c
> > > > > > @@ -4179,6 +4179,20 @@ static X86CPUDefinition builtin_x86_defs[] =
> > {
> > > > > >  .xlevel = 0x801E,
> > > > > >  .model_id = "AMD EPYC-Rome Processor",
> > > > > >  .cache_info = &epyc_rome_cache_info,
> > > > > > +.versions = (X86CPUVersionDefinition[]) {
> > > > > > +{ .version = 1 },
> > > > > > +{
> > > > > > +.version = 2,
> > > > > > +.props = (PropValue[]) {
> > > > > > +{ "ibrs", "on" },
> > > > > > +{ "amd-ssbd", "on" },
> > > > > > +{ "model-id",
> > > > > > +  "AMD EPYC-Rome Processor" },
> > > > > > +{ /* end of list */ }
> > > > > > +}
> > > > > > +},
> > > > > > +{ /* end of list */ }
> > > > > > +}
> > > > > >  },
> > > > > >  {
> > > > > >  .name = "EPYC-Milan",



Re: [PATCH v2] i386: Add the support for AMD EPYC 3rd generation processors

2021-03-02 Thread Pankaj Gupta
> Pankaj, Sure.
>
> I will add signoff from you if it is fine with you.
Sure. Thank you!

>
> Signed-off-by: Pankaj Gupta 
Please use: Pankaj Gupta 

Thanks,
Pankaj



Re: [PATCH v2] i386: Add the support for AMD EPYC 3rd generation processors

2021-03-01 Thread Pankaj Gupta
> > Hi Babu,
> >
> > I tried to test below patch for AMD EPYC Rome CPU and I got below error [1]:
> >
> > Also, I noticed SSBD CPU flag for guest was still available even without 
> > this
> > patch, I noticed that for the guest, AMD_SSBD not got set.
> >
> > Guest:
> > 0x8008 0x00: eax=0x3028 ebx=0x9205 ecx=0x2003
> > edx=0x
> >
> > [1]
> > [0.008000] unchecked MSR access error: WRMSR to 0x48 (tried to
> > write 0x) at rIP: 0x9245c9e4
> > (native_write_msr+0x4/0x20)
> > [0.008000]  [] ? x86_spec_ctrl_setup_ap+0x35/0x50
> > [0.008000]  [] ? identify_secondary_cpu+0x53/0x80
> > [0.008000]  [] ? start_secondary+0x6a/0x1b0
> >
> > 0.011970] unchecked MSR access error: RDMSR from 0x48 at rIP:
> > 0x9245c772 (native_read_msr+0x2/0x40)
>
> I did not see any problem with these patches.
> My guest setup.
> # lscpu |grep -o ssbd
> ssbd
> [root@rome-vm ~]# uname -r
> 4.18.0-147.el8.x86_64
> [root@rome-vm ~]# cat /etc/redhat-release
> Red Hat Enterprise Linux release 8.1 (Ootpa)
> # wrmsr 0x48 7
> [root@rome-vm ~]# rdmsr 0x48
> 7
>
>
> My host os.
> # uname -r
> 4.18.0-193.el8.x86_64
> [root@rome images]# cat /etc/redhat-release
> Red Hat Enterprise Linux release 8.2 Beta (Ootpa)
>
> Also, I only see ssbd feature when add this patch(EPYC-Rome-v2).
> Otherwise, I don’t see ssbd feature.

Thanks for checking!
Can you also see the ibrs feature inside guest with this patch?

Thanks,
Pankaj
>
> Thanks
> Babu
>
>
> >
> > Thanks,
> > Pankaj
> >
> > > > It is normally added as v2 for compatibility. Like this.
> > >
> > > o.k. Thanks!
> > > I will test this tomorrow.
> > >
> > > >
> > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c index
> > > > 24db7ed892..f721d0db78 100644
> > > > --- a/target/i386/cpu.c
> > > > +++ b/target/i386/cpu.c
> > > > @@ -4179,6 +4179,20 @@ static X86CPUDefinition builtin_x86_defs[] = {
> > > >  .xlevel = 0x801E,
> > > >  .model_id = "AMD EPYC-Rome Processor",
> > > >  .cache_info = &epyc_rome_cache_info,
> > > > +.versions = (X86CPUVersionDefinition[]) {
> > > > +{ .version = 1 },
> > > > +{
> > > > +.version = 2,
> > > > +.props = (PropValue[]) {
> > > > +{ "ibrs", "on" },
> > > > +{ "amd-ssbd", "on" },
> > > > +{ "model-id",
> > > > +  "AMD EPYC-Rome Processor" },
> > > > +{ /* end of list */ }
> > > > +}
> > > > +},
> > > > +{ /* end of list */ }
> > > > +}
> > > >  },
> > > >  {
> > > >  .name = "EPYC-Milan",



Re: [PATCH v2] i386: Add the support for AMD EPYC 3rd generation processors

2021-03-01 Thread Pankaj Gupta
Hi Babu,

I tried to test the below patch for the AMD EPYC Rome CPU and got the
below error [1]:

Also, I noticed the SSBD CPU flag was still available in the guest even
without this patch; however, AMD_SSBD did not get set for the guest.

Guest:
0x8008 0x00: eax=0x3028 ebx=0x9205 ecx=0x2003 edx=0x

[1]
[0.008000] unchecked MSR access error: WRMSR to 0x48 (tried to
write 0x) at rIP: 0x9245c9e4
(native_write_msr+0x4/0x20)
[0.008000]  [] ? x86_spec_ctrl_setup_ap+0x35/0x50
[0.008000]  [] ? identify_secondary_cpu+0x53/0x80
[0.008000]  [] ? start_secondary+0x6a/0x1b0

0.011970] unchecked MSR access error: RDMSR from 0x48 at rIP:
0x9245c772 (native_read_msr+0x2/0x40)

Thanks,
Pankaj

> > It is normally added as v2 for compatibility. Like this.
>
> o.k. Thanks!
> I will test this tomorrow.
>
> >
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 24db7ed892..f721d0db78 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -4179,6 +4179,20 @@ static X86CPUDefinition builtin_x86_defs[] = {
> >  .xlevel = 0x801E,
> >  .model_id = "AMD EPYC-Rome Processor",
> >  .cache_info = &epyc_rome_cache_info,
> > +.versions = (X86CPUVersionDefinition[]) {
> > +{ .version = 1 },
> > +{
> > +.version = 2,
> > +.props = (PropValue[]) {
> > +{ "ibrs", "on" },
> > +{ "amd-ssbd", "on" },
> > +{ "model-id",
> > +  "AMD EPYC-Rome Processor" },
> > +{ /* end of list */ }
> > +}
> > +},
> > +{ /* end of list */ }
> > +}
> >  },
> >  {
> >  .name = "EPYC-Milan",


