date:20230705

This patch series updates the SapphireRapids CPU model and adds a new
CPU model, GraniteRapids.

Bit 13 (ARCH_CAP_SBDR_SSDP_NO), bit 14 (ARCH_CAP_FBSDP_NO) and bit 15
(ARCH_CAP_PSDP_NO) of MSR_IA32_ARCH_CAPABILITIES are enumerated starting
with SapphireRapids but are missing from the current SapphireRapids CPU
model, so add a new version of the model that exposes these bits.
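
For illustration only (not part of the series), here is a minimal sketch
of the three bits named above, using the bit positions given in this
cover letter. The helper names spr_v2_extra_arch_caps() and
arch_caps_has_spr_v2_bits() are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as given in the cover letter. */
#define MSR_ARCH_CAP_SBDR_SSDP_NO (UINT64_C(1) << 13)
#define MSR_ARCH_CAP_FBSDP_NO     (UINT64_C(1) << 14)
#define MSR_ARCH_CAP_PSDP_NO      (UINT64_C(1) << 15)

/* Compile-time check that the three bits cover exactly bits 13..15. */
static_assert((MSR_ARCH_CAP_SBDR_SSDP_NO | MSR_ARCH_CAP_FBSDP_NO |
               MSR_ARCH_CAP_PSDP_NO) == 0xE000, "bits 13..15");

/* Hypothetical helper: the mask a new model version would add. */
static inline uint64_t spr_v2_extra_arch_caps(void)
{
    return MSR_ARCH_CAP_SBDR_SSDP_NO | MSR_ARCH_CAP_FBSDP_NO |
           MSR_ARCH_CAP_PSDP_NO;
}

/* Hypothetical helper: true only if all three bits are set in caps. */
static inline bool arch_caps_has_spr_v2_bits(uint64_t caps)
{
    uint64_t want = spr_v2_extra_arch_caps();

    return (caps & want) == want;
}
```

A model that sets only two of the three bits would fail the check, which
is the situation the new model version is meant to avoid.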

GraniteRapids is Intel's successor to EmeraldRapids, an Intel 3 process
microarchitecture for enthusiasts and servers, which adds new features
based on SapphireRapids. The new features can be found in [1].

---

Changelog:

v2:
- Drop the part that duplicates patch [2]
- Drop the EmeraldRapids CPU model
- Reword the commit messages for clarity

v1: https://lore.kernel.org/all/20230616032311.19137-1-tao1...@linux.intel.com/

[1] https://cdrdv2.intel.com/v1/dl/getContent/671368
[2]
https://lore.kernel.org/all/63d85cc76d4cdc51e6c732478b81d8f13be11e5a.1687551881.git.pawan.kumar.gu...@linux.intel.com/


Lei Wang (1):
  target/i386: Add few security fix bits in ARCH_CAPABILITIES into
    SapphireRapids CPU model

Tao Su (5):
  target/i386: Add FEAT_7_1_EDX to adjust feature level
  target/i386: Add support for MCDT_NO in CPUID enumeration
  target/i386: Allow MCDT_NO if host supports
  target/i386: Add new bit definitions of MSR_IA32_ARCH_CAPABILITIES
  target/i386: Add new CPU model GraniteRapids

 target/i386/cpu.c | 172 ++
 target/i386/cpu.h |   8 ++
 target/i386/kvm/kvm.c |   4 +
 3 files changed, 184 insertions(+)


base-commit: 2a6ae69154542caa91dd17c40fd3f5ffbec300de
-- 
2.34.1

[PATCH v2 3/6] target/i386: Allow MCDT_NO if host supports

The MCDT_NO bit indicates that the hardware contains the security fix and
needs no mitigation to avoid data-dependent behaviour for certain
instructions. It needs no hypervisor support, so treat it as supported
whenever the host enumerates it, regardless of what KVM reports.
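
For illustration only, the rule above can be sketched as a pure function:
the host's MCDT_NO bit is OR-ed into whatever KVM reported, and no other
host bit leaks through. The function name supported_7_2_edx() is
hypothetical; the bit position (EDX bit 5) is the one used in this series.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=2):EDX bit 5, as defined in this series. */
#define CPUID_7_2_EDX_MCDT_NO (1U << 5)

static_assert(CPUID_7_2_EDX_MCDT_NO == 0x20, "MCDT_NO is EDX bit 5");

/*
 * Hypothetical helper: combine the value KVM reports for the word with
 * the host's own CPUID value, letting only MCDT_NO pass from the host.
 */
static inline uint32_t supported_7_2_edx(uint32_t kvm_reported,
                                         uint32_t host_edx)
{
    return kvm_reported | (host_edx & CPUID_7_2_EDX_MCDT_NO);
}
```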

Signed-off-by: Tao Su 
Reviewed-by: Xiaoyao Li 
---
 target/i386/kvm/kvm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index de531842f6..ebfaf3d24c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -432,6 +432,10 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 uint32_t eax;
 host_cpuid(7, 1, &eax, &unused, &unused, &unused);
 ret |= eax & (CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | CPUID_7_1_EAX_FSRC);
+} else if (function == 7 && index == 2 && reg == R_EDX) {
+uint32_t edx;
+host_cpuid(7, 2, &unused, &unused, &unused, &edx);
+ret |= edx & CPUID_7_2_EDX_MCDT_NO;
 } else if (function == 0xd && index == 0 &&
(reg == R_EAX || reg == R_EDX)) {
 /*
-- 
2.34.1

[PATCH v2 4/6] target/i386: Add new bit definitions of MSR_IA32_ARCH_CAPABILITIES

Bits 13, 14, 15 and 24 of MSR_IA32_ARCH_CAPABILITIES are now documented
as enumerating security fixes, so add definitions for those bits.

Signed-off-by: Tao Su 
Reviewed-by: Igor Mammedov 
---
 target/i386/cpu.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c196b0a482..e0771a1043 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1022,7 +1022,11 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define MSR_ARCH_CAP_PSCHANGE_MC_NO     (1U << 6)
 #define MSR_ARCH_CAP_TSX_CTRL_MSR       (1U << 7)
 #define MSR_ARCH_CAP_TAA_NO             (1U << 8)
+#define MSR_ARCH_CAP_SBDR_SSDP_NO       (1U << 13)
+#define MSR_ARCH_CAP_FBSDP_NO           (1U << 14)
+#define MSR_ARCH_CAP_PSDP_NO            (1U << 15)
 #define MSR_ARCH_CAP_FB_CLEAR           (1U << 17)
+#define MSR_ARCH_CAP_PBRSB_NO           (1U << 24)
 
 #define MSR_CORE_CAP_SPLIT_LOCK_DETECT  (1U << 5)
 
-- 
2.34.1

[PATCH v2 6/6] target/i386: Add new CPU model GraniteRapids

The GraniteRapids CPU model mainly adds the following new features
based on SapphireRapids:
- PREFETCHITI CPUID.(EAX=7,ECX=1):EDX[bit 14]
- AMX-FP16 CPUID.(EAX=7,ECX=1):EAX[bit 21]

It also adds the following bits, which enumerate fixes for the
corresponding vulnerabilities:
- MCDT_NO CPUID.(EAX=7,ECX=2):EDX[bit 5]
- SBDR_SSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 13]
- FBSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 14]
- PSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 15]
- PBRSB_NO MSR_IA32_ARCH_CAPABILITIES[bit 24]
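
For illustration only, the CPUID locations listed above can be written as
a small table of (leaf, sub-leaf, register, bit) entries. The type and
array names below are hypothetical and are not QEMU's data structures;
only the bit locations come from this commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { REG_EAX, REG_EBX, REG_ECX, REG_EDX } CpuidReg;

/* Hypothetical description of one CPUID feature bit. */
typedef struct {
    const char *name;
    uint32_t leaf;      /* CPUID input EAX */
    uint32_t subleaf;   /* CPUID input ECX */
    CpuidReg reg;       /* output register holding the bit */
    unsigned bit;       /* bit position within that register */
} CpuidFeature;

/* The CPUID bits GraniteRapids adds, as listed in the commit message. */
static const CpuidFeature gnr_new_cpuid_bits[] = {
    { "PREFETCHITI", 7, 1, REG_EDX, 14 },
    { "AMX-FP16",    7, 1, REG_EAX, 21 },
    { "MCDT_NO",     7, 2, REG_EDX, 5  },
};

/* Return the register mask that corresponds to one table entry. */
static inline uint32_t cpuid_feature_mask(const CpuidFeature *f)
{
    assert(f != NULL && f->bit < 32);
    return UINT32_C(1) << f->bit;
}
```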

Signed-off-by: Tao Su 
Tested-by: Xuelian Guo 
Reviewed-by: Xiaoyao Li 
---
 target/i386/cpu.c | 136 ++
 1 file changed, 136 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ec229072e7..97ad229d8b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3956,6 +3956,142 @@ static const X86CPUDefinition builtin_x86_defs[] = {
 { /* end of list */ }
 }
 },
+{
+.name = "GraniteRapids",
+.level = 0x20,
+.vendor = CPUID_VENDOR_INTEL,
+.family = 6,
+.model = 173,
+.stepping = 0,
+/*
+ * please keep the ascending order so that we can have a clear view of
+ * bit position of each feature.
+ */
+.features[FEAT_1_EDX] =
+CPUID_FP87 | CPUID_VME | CPUID_DE | CPUID_PSE | CPUID_TSC |
+CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC |
+CPUID_SEP | CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV |
+CPUID_PAT | CPUID_PSE36 | CPUID_CLFLUSH | CPUID_MMX | CPUID_FXSR |
+CPUID_SSE | CPUID_SSE2,
+.features[FEAT_1_ECX] =
+CPUID_EXT_SSE3 | CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSSE3 |
+CPUID_EXT_FMA | CPUID_EXT_CX16 | CPUID_EXT_PCID | CPUID_EXT_SSE41 |
+CPUID_EXT_SSE42 | CPUID_EXT_X2APIC | CPUID_EXT_MOVBE |
+CPUID_EXT_POPCNT | CPUID_EXT_TSC_DEADLINE_TIMER | CPUID_EXT_AES |
+CPUID_EXT_XSAVE | CPUID_EXT_AVX | CPUID_EXT_F16C | CPUID_EXT_RDRAND,
+.features[FEAT_8000_0001_EDX] =
+CPUID_EXT2_SYSCALL | CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB |
+CPUID_EXT2_RDTSCP | CPUID_EXT2_LM,
+.features[FEAT_8000_0001_ECX] =
+CPUID_EXT3_LAHF_LM | CPUID_EXT3_ABM | CPUID_EXT3_3DNOWPREFETCH,
+.features[FEAT_8000_0008_EBX] =
+CPUID_8000_0008_EBX_WBNOINVD,
+.features[FEAT_7_0_EBX] =
+CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_HLE |
+CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 |
+CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_INVPCID | CPUID_7_0_EBX_RTM |
+CPUID_7_0_EBX_AVX512F | CPUID_7_0_EBX_AVX512DQ |
+CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_ADX | CPUID_7_0_EBX_SMAP |
+CPUID_7_0_EBX_AVX512IFMA | CPUID_7_0_EBX_CLFLUSHOPT |
+CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_AVX512CD | CPUID_7_0_EBX_SHA_NI |
+CPUID_7_0_EBX_AVX512BW | CPUID_7_0_EBX_AVX512VL,
+.features[FEAT_7_0_ECX] =
+CPUID_7_0_ECX_AVX512_VBMI | CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU |
+CPUID_7_0_ECX_AVX512_VBMI2 | CPUID_7_0_ECX_GFNI |
+CPUID_7_0_ECX_VAES | CPUID_7_0_ECX_VPCLMULQDQ |
+CPUID_7_0_ECX_AVX512VNNI | CPUID_7_0_ECX_AVX512BITALG |
+CPUID_7_0_ECX_AVX512_VPOPCNTDQ | CPUID_7_0_ECX_LA57 |
+CPUID_7_0_ECX_RDPID | CPUID_7_0_ECX_BUS_LOCK_DETECT,
+.features[FEAT_7_0_EDX] =
+CPUID_7_0_EDX_FSRM | CPUID_7_0_EDX_SERIALIZE |
+CPUID_7_0_EDX_TSX_LDTRK | CPUID_7_0_EDX_AMX_BF16 |
+CPUID_7_0_EDX_AVX512_FP16 | CPUID_7_0_EDX_AMX_TILE |
+CPUID_7_0_EDX_AMX_INT8 | CPUID_7_0_EDX_SPEC_CTRL |
+CPUID_7_0_EDX_ARCH_CAPABILITIES | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
+.features[FEAT_ARCH_CAPABILITIES] =
+MSR_ARCH_CAP_RDCL_NO | MSR_ARCH_CAP_IBRS_ALL |
+MSR_ARCH_CAP_SKIP_L1DFL_VMENTRY | MSR_ARCH_CAP_MDS_NO |
+MSR_ARCH_CAP_PSCHANGE_MC_NO | MSR_ARCH_CAP_TAA_NO |
+MSR_ARCH_CAP_SBDR_SSDP_NO | MSR_ARCH_CAP_FBSDP_NO |
+MSR_ARCH_CAP_PSDP_NO | MSR_ARCH_CAP_PBRSB_NO,
+.features[FEAT_XSAVE] =
+CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
+CPUID_XSAVE_XGETBV1 | CPUID_XSAVE_XSAVES | CPUID_D_1_EAX_XFD,
+.features[FEAT_6_EAX] =
+CPUID_6_EAX_ARAT,
+.features[FEAT_7_1_EAX] =
+CPUID_7_1_EAX_AVX_VNNI | CPUID_7_1_EAX_AVX512_BF16 |
+CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | CPUID_7_1_EAX_FSRC |
+CPUID_7_1_EAX_AMX_FP16,
+.features[FEAT_7_1_EDX] =
+CPUID_7_1_EDX_PREFETCHITI,
+.features[FEAT_7_2_EDX] =
+CPUID_7_2_EDX_MCDT_NO,
+.features[FEAT_VMX_BASIC] =
+MSR_VMX_BASIC_INS_OUTS | MSR_VMX_BASIC_TRUE_CTLS,
+.features[FEAT_VMX_ENTRY_CTLS] =
+VMX_VM_ENTRY_LOAD_DEBUG_CONTROLS | VMX_VM_ENTRY_IA32E_MODE |
+

[PATCH v2 2/6] target/i386: Add support for MCDT_NO in CPUID enumeration

CPUID.(EAX=7,ECX=2):EDX[bit 5] enumerates MCDT_NO. Processors that
enumerate this bit as 1 do not exhibit MXCSR Configuration Dependent
Timing (MCDT) behavior and need no mitigation to avoid data-dependent
behavior for certain instructions.

Since MCDT_NO is in a new sub-leaf, add a new CPUID feature word,
FEAT_7_2_EDX. Also take FEAT_7_2_EDX into account when updating
cpuid_level_func7.

Signed-off-by: Tao Su 
Reviewed-by: Xiaoyao Li 
---
 target/i386/cpu.c | 26 ++
 target/i386/cpu.h |  4 
 2 files changed, 30 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 952744af97..852c45b965 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -739,6 +739,7 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
 #define TCG_7_1_EAX_FEATURES (CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | \
   CPUID_7_1_EAX_FSRC)
 #define TCG_7_1_EDX_FEATURES 0
+#define TCG_7_2_EDX_FEATURES 0
 #define TCG_APM_FEATURES 0
 #define TCG_6_EAX_FEATURES CPUID_6_EAX_ARAT
 #define TCG_XSAVE_FEATURES (CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XGETBV1)
@@ -993,6 +994,25 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 },
 .tcg_features = TCG_7_1_EDX_FEATURES,
 },
+[FEAT_7_2_EDX] = {
+.type = CPUID_FEATURE_WORD,
+.feat_names = {
+NULL, NULL, NULL, NULL,
+NULL, "mcdt-no", NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+},
+.cpuid = {
+.eax = 7,
+.needs_ecx = true, .ecx = 2,
+.reg = R_EDX,
+},
+.tcg_features = TCG_7_2_EDX_FEATURES,
+},
 [FEAT_8000_0007_EDX] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
@@ -6017,6 +6037,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 *edx = env->features[FEAT_7_1_EDX];
 *ebx = 0;
 *ecx = 0;
+} else if (count == 2) {
+*edx = env->features[FEAT_7_2_EDX];
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
 } else {
 *eax = 0;
 *ebx = 0;
@@ -6881,6 +6906,7 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
 x86_cpu_adjust_feat_level(cpu, FEAT_7_0_ECX);
 x86_cpu_adjust_feat_level(cpu, FEAT_7_1_EAX);
 x86_cpu_adjust_feat_level(cpu, FEAT_7_1_EDX);
+x86_cpu_adjust_feat_level(cpu, FEAT_7_2_EDX);
 x86_cpu_adjust_feat_level(cpu, FEAT_8000_0001_EDX);
 x86_cpu_adjust_feat_level(cpu, FEAT_8000_0001_ECX);
 x86_cpu_adjust_feat_level(cpu, FEAT_8000_0007_EDX);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 2c9b0d2ebc..c196b0a482 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -628,6 +628,7 @@ typedef enum FeatureWord {
 FEAT_XSAVE_XSS_LO, /* CPUID[EAX=0xd,ECX=1].ECX */
 FEAT_XSAVE_XSS_HI, /* CPUID[EAX=0xd,ECX=1].EDX */
 FEAT_7_1_EDX,   /* CPUID[EAX=7,ECX=1].EDX */
+FEAT_7_2_EDX,   /* CPUID[EAX=7,ECX=2].EDX */
 FEATURE_WORDS,
 } FeatureWord;
 
@@ -932,6 +933,9 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 /* PREFETCHIT0/1 Instructions */
 #define CPUID_7_1_EDX_PREFETCHITI   (1U << 14)
 
+/* Do not exhibit MXCSR Configuration Dependent Timing (MCDT) behavior */
+#define CPUID_7_2_EDX_MCDT_NO   (1U << 5)
+
 /* XFD Extend Feature Disabled */
 #define CPUID_D_1_EAX_XFD   (1U << 4)
 
-- 
2.34.1

[PATCH v2 1/6] target/i386: Add FEAT_7_1_EDX to adjust feature level

Consider the case where FEAT_7_1_EAX is 0 and FEAT_7_1_EDX is non-zero.
For example, when starting a VM on GraniteRapids with '-cpu host', the
guest sees two sub-leaves, CPUID_7_0 and CPUID_7_1, because both
CPUID_7_1_EAX and CPUID_7_1_EDX are non-zero. But if all FEAT_7_1_EAX
features are removed with
'-cpu host,-avx-vnni,-avx512-bf16,-fzrm,-fsrs,-fsrc,-amx-fp16', the guest
no longer gets the CPUID_7_1 sub-leaf even though CPUID_7_1_EDX is
non-zero.

So cpuid_level_func7 must also be updated from FEAT_7_1_EDX; otherwise the
guest may report the wrong maximum number of sub-leaves in leaf 07H.
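
For illustration only, the effect can be sketched with a toy model in
which the maximum sub-leaf of leaf 7 is the highest sub-leaf index among
the feature words that are non-zero. The Leaf7Word type and
leaf7_max_subleaf() are hypothetical and only mimic the idea behind
x86_cpu_adjust_feat_level(); they are not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one leaf-7 feature word and the sub-leaf it lives in. */
typedef struct {
    uint32_t subleaf;
    uint32_t value;
} Leaf7Word;

/*
 * Return the highest sub-leaf that has at least one non-zero feature
 * word among the words that were taken into account.
 */
static inline uint32_t leaf7_max_subleaf(const Leaf7Word *words, size_t n)
{
    uint32_t max = 0;

    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (words[i].value != 0 && words[i].subleaf > max) {
            max = words[i].subleaf;
        }
    }
    return max;
}
```

If the EDX word of sub-leaf 1 is left out of the list while the EAX word
is zero, the result drops to 0, which is the behaviour described above.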

Fixes: eaaa197d5b11 ("target/i386: Add support for AVX-VNNI-INT8 in CPUID enumeration")

Signed-off-by: Tao Su 
Reviewed-by: Xiaoyao Li 
---
 target/i386/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b5688cabb4..952744af97 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6880,6 +6880,7 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
 x86_cpu_adjust_feat_level(cpu, FEAT_6_EAX);
 x86_cpu_adjust_feat_level(cpu, FEAT_7_0_ECX);
 x86_cpu_adjust_feat_level(cpu, FEAT_7_1_EAX);
+x86_cpu_adjust_feat_level(cpu, FEAT_7_1_EDX);
 x86_cpu_adjust_feat_level(cpu, FEAT_8000_0001_EDX);
 x86_cpu_adjust_feat_level(cpu, FEAT_8000_0001_ECX);
 x86_cpu_adjust_feat_level(cpu, FEAT_8000_0007_EDX);
-- 
2.34.1

[PATCH v2 5/6] target/i386: Add few security fix bits in ARCH_CAPABILITIES into SapphireRapids CPU model

From: Lei Wang 

SapphireRapids has bits 13, 14 and 15 of MSR_IA32_ARCH_CAPABILITIES
enabled; they enumerate security fixes.

Add version 2 of the SapphireRapids CPU model with those bits enabled.
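
For illustration only, a toy model of versioned CPU definitions, assuming
(as a simplification) that selecting version N applies the props of every
version up to and including N. The VersionDef type and
count_props_for_version() are hypothetical; the property names are the
ones added by this patch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *prop;
    const char *value;
} PropValue;

/* Hypothetical, simplified version entry; version 0 ends the list. */
typedef struct {
    int version;
    const PropValue *props;   /* NULL, or a list ended by a NULL prop */
} VersionDef;

static const PropValue spr_v2_props[] = {
    { "sbdr-ssdp-no", "on" },
    { "fbsdp-no", "on" },
    { "psdp-no", "on" },
    { NULL, NULL }
};

static const VersionDef spr_versions[] = {
    { 1, NULL },
    { 2, spr_v2_props },
    { 0, NULL }
};

/* Count the props that would be applied when 'version' is selected. */
static inline size_t count_props_for_version(const VersionDef *defs,
                                             int version)
{
    size_t n = 0;

    assert(defs != NULL && version > 0);
    for (const VersionDef *v = defs; v->version != 0; v++) {
        for (const PropValue *p = v->props; p && p->prop; p++) {
            n++;
        }
        if (v->version == version) {
            break;
        }
    }
    return n;
}
```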

Signed-off-by: Lei Wang 
Signed-off-by: Tao Su 
---
 target/i386/cpu.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 852c45b965..ec229072e7 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3944,8 +3944,17 @@ static const X86CPUDefinition builtin_x86_defs[] = {
 .model_id = "Intel Xeon Processor (SapphireRapids)",
 .versions = (X86CPUVersionDefinition[]) {
 { .version = 1 },
-{ /* end of list */ },
-},
+{
+.version = 2,
+.props = (PropValue[]) {
+{ "sbdr-ssdp-no", "on" },
+{ "fbsdp-no", "on" },
+{ "psdp-no", "on" },
+{ /* end of list */ }
+}
+},
+{ /* end of list */ }
+}
 },
 {
 .name = "Denverton",
-- 
2.34.1

Re: Reducing vdpa migration downtime because of memory pin / maps

2023-07-05 Thread Eugenio Perez Martin

On Thu, Jul 6, 2023 at 2:13 AM Si-Wei Liu  wrote:
>
>
>
> On 7/5/2023 11:03 AM, Eugenio Perez Martin wrote:
> > On Tue, Jun 27, 2023 at 8:36 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 6/9/2023 7:32 AM, Eugenio Perez Martin wrote:
> >>> On Fri, Jun 9, 2023 at 12:39 AM Si-Wei Liu  wrote:
> >>>> On 6/7/23 01:08, Eugenio Perez Martin wrote:
> >>>>> On Wed, Jun 7, 2023 at 12:43 AM Si-Wei Liu  wrote:
> >>>>>> Sorry for reviving this old thread, I lost the best timing to follow up
> >>>>>> on this while I was on vacation. I have been working on this and found
> >>>>>> out some discrepancy, please see below.
> >>>>>>
> >>>>>> On 4/5/23 04:37, Eugenio Perez Martin wrote:
> >>>>>>> Hi!
> >>>>>>>
> >>>>>>> As mentioned in the last upstream virtio-networking meeting, one of
> >>>>>>> the factors that adds more downtime to migration is the handling of
> >>>>>>> the guest memory (pin, map, etc). At this moment this handling is
> >>>>>>> bound to the virtio life cycle (DRIVER_OK, RESET). In that sense, the
> >>>>>>> destination device waits until all the guest memory / state is
> >>>>>>> migrated to start pinning all the memory.
> >>>>>>>
> >>>>>>> The proposal is to bind it to the char device life cycle (open vs
> >>>>>>> close),
> >>>>>> Hmmm, really? If it's the life cycle for char device, the next guest /
> >>>>>> qemu launch on the same vhost-vdpa device node won't make it work.
> >>>>>>
> >>>>> Maybe my sentence was not accurate, but I think we're on the same page
> >>>>> here.
> >>>>>
> >>>>> Two qemu instances opening the same char device at the same time are
> >>>>> not allowed, and vhost_vdpa_release clean all the maps. So the next
> >>>>> qemu that opens the char device should see a clean device anyway.
> >>>> I mean the pin can't be done at the time of char device open, where the
> >>>> user address space is not known/bound yet. The earliest point possible
> >>>> for pinning would be until the vhost_attach_mm() call from SET_OWNER is
> >>>> done.
> >>> Maybe we are deviating, let me start again.
> >>>
> >>> Using QEMU code, what I'm proposing is to modify the lifecycle of the
> >>> .listener member of struct vhost_vdpa.
> >>>
> >>> At this moment, the memory listener is registered at
> >>> vhost_vdpa_dev_start(dev, started=true) call for the last vhost_dev,
> >>> and is unregistered in both vhost_vdpa_reset_status and
> >>> vhost_vdpa_cleanup.
> >>>
> >>> My original proposal was just to move the memory listener registration
> >>> to the last vhost_vdpa_init, and remove the unregister from
> >>> vhost_vdpa_reset_status. The calls to vhost_vdpa_dma_map/unmap would
> >>> be the same, the device should not realize this change.
> >> This can address LM downtime latency for sure, but it won't help
> >> downtime during dynamic SVQ switch - which still needs to go through the
> >> full unmap/map cycle (that includes the slow part for pinning) from
> >> passthrough to SVQ mode. Be noted not every device could work with a
> >> separate ASID for SVQ descriptors. The fix should expect to work on
> >> normal vDPA vendor devices without a separate descriptor ASID, with
> >> platform IOMMU underneath or with on-chip IOMMU.
> >>
> > At this moment the SVQ switch is very inefficient mapping-wise, as it
> > unmap all the GPA->HVA maps and overrides it. In particular, SVQ is
> > allocated in low regions of the iova space, and then the guest memory
> > is allocated in this new IOVA region incrementally.
> Yep. The key to build this fast path for SVQ switching I think is to
> maintain the identity mapping for the passthrough queues so that QEMU
> can reuse the old mappings for guest memory (e.g. GIOVA identity mapped
> to GPA) while incrementally adding new mappings for SVQ vrings.
>
> >
> > We can optimize that if we place SVQ in a free GPA area instead.
> Here's a question though: it might not be hard to find a free GPA range
> for the non-vIOMMU case (allocate iova from beyond the 48bit or 52bit
> ranges), but I'm not sure if easy to find a free GIOVA range for the
> vIOMMU case - particularly this has to work in the same entire 64bit
> IOVA address ranges that (for now) QEMU won't be able to "reserve" a
> specific IOVA ranges for SVQ from the vIOMMU. Do you foresee this can be
> done for every QEMU emulated vIOMMU (intel-iommu amd-iommu, arm smmu and
> virito-iommu) so that we can call it out as a generic means for SVQ
> switching optimization?
>

If the vIOMMU allocates a new block, we will use the same algorithm as now:
* Find a new free IOVA chunk of the same size
* Map this new SVQ IOVA, that may or may not be the same as SVQ

Since we must go through the translation phase to sanitize guest's
available descriptors anyway, it has zero added cost.
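
To make the first step above concrete, here is a minimal first-fit
sketch of "find a free IOVA chunk of a given size" over a sorted list of
used ranges. This is only an illustration under stated assumptions: the
IovaRange type and find_free_iova() are hypothetical and are not the
allocator QEMU actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical used range [start, start + size). */
typedef struct {
    uint64_t start;
    uint64_t size;
} IovaRange;

/*
 * First-fit search in the window [lo, hi). 'used' must be sorted by
 * start, non-overlapping, and contained in the window. On success the
 * start of the free chunk is stored in *out.
 */
static inline bool find_free_iova(const IovaRange *used, size_t n,
                                  uint64_t lo, uint64_t hi,
                                  uint64_t len, uint64_t *out)
{
    uint64_t cursor = lo;

    assert(out != NULL && len > 0 && lo <= hi);
    for (size_t i = 0; i < n; i++) {
        /* Is the hole in front of this used range large enough? */
        if (used[i].start >= cursor && used[i].start - cursor >= len) {
            *out = cursor;
            return true;
        }
        uint64_t end = used[i].start + used[i].size;
        if (end > cursor) {
            cursor = end;
        }
    }
    /* Finally try the tail of the window after the last used range. */
    if (hi >= cursor && hi - cursor >= len) {
        *out = cursor;
        return true;
    }
    return false;
}
```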

Another option would be to move the SVQ vring to a new region, but I
don't see any advantage in maintaining a 1:1 mapping at that point.

> If this QEMU/vIOMMU "hack" is not universally feasible, I would rather
> build a fast path in the kernel via a new vhost IOTLB

[PATCH v2 0/2] ppc/pnv: Set P10 core xscom region size to match hardware

2023-07-05 Thread Nicholas Piggin

Sorry about the paper bag bug in the first version of the patch -
I broke powernv8 and 9.

This adds an xscom_size core class field to change the P10 size without
changing the others.

It also adds a P10 xscom test, and make check passes.

Thanks,
Nick

Nicholas Piggin (2):
  ppc/pnv: Set P10 core xscom region size to match hardware
  tests/qtest: Add xscom tests for powernv10 machine

 hw/ppc/pnv_core.c|  6 +++--
 include/hw/ppc/pnv_core.h|  1 +
 include/hw/ppc/pnv_xscom.h   |  2 +-
 tests/qtest/pnv-xscom-test.c | 44 
 4 files changed, 41 insertions(+), 12 deletions(-)

-- 
2.40.1

[PATCH v2 2/2] tests/qtest: Add xscom tests for powernv10 machine

2023-07-05 Thread Nicholas Piggin

Add basic chip and core xscom tests for powernv10 machine, equivalent
to tests for powernv8 and 9.

Signed-off-by: Nicholas Piggin 
---
 tests/qtest/pnv-xscom-test.c | 44 
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/pnv-xscom-test.c b/tests/qtest/pnv-xscom-test.c
index 2c46d5cf6d..80903fa782 100644
--- a/tests/qtest/pnv-xscom-test.c
+++ b/tests/qtest/pnv-xscom-test.c
@@ -15,6 +15,7 @@ typedef enum PnvChipType {
 PNV_CHIP_POWER8,  /* AKA Venice */
 PNV_CHIP_POWER8NVL,   /* AKA Naples */
 PNV_CHIP_POWER9,  /* AKA Nimbus */
+PNV_CHIP_POWER10,
 } PnvChipType;
 
 typedef struct PnvChip {
@@ -46,13 +47,22 @@ static const PnvChip pnv_chips[] = {
 .cfam_id= 0x220d10498000ull,
 .first_core = 0x0,
 },
+{
+.chip_type  = PNV_CHIP_POWER10,
+.cpu_model  = "POWER10",
+.xscom_base = 0x000603fcull,
+.cfam_id= 0x120da0498000ull,
+.first_core = 0x0,
+},
 };
 
 static uint64_t pnv_xscom_addr(const PnvChip *chip, uint32_t pcba)
 {
 uint64_t addr = chip->xscom_base;
 
-if (chip->chip_type == PNV_CHIP_POWER9) {
+if (chip->chip_type == PNV_CHIP_POWER10) {
+addr |= ((uint64_t) pcba << 3);
+} else if (chip->chip_type == PNV_CHIP_POWER9) {
 addr |= ((uint64_t) pcba << 3);
 } else {
 addr |= (((uint64_t) pcba << 4) & ~0xffull) |
@@ -82,6 +92,8 @@ static void test_cfam_id(const void *data)
 
 if (chip->chip_type == PNV_CHIP_POWER9) {
 machine = "powernv9";
+} else if (chip->chip_type == PNV_CHIP_POWER10) {
+machine = "powernv10";
 }
 
 qts = qtest_initf("-M %s -accel tcg -cpu %s",
@@ -96,23 +108,35 @@ static void test_cfam_id(const void *data)
 (PNV_XSCOM_EX_CORE_BASE | ((uint64_t)(core) << 24))
 #define PNV_XSCOM_P9_EC_BASE(core) \
 ((uint64_t)(((core) & 0x1F) + 0x20) << 24)
+#define PNV_XSCOM_P10_EC_BASE(core) \
+((uint64_t)((((core) & ~0x3) + 0x20) << 24) + 0x20000 + (0x1000 << (3 - (core & 0x3))))
 
 #define PNV_XSCOM_EX_DTS_RESULT0 0x5
 
 static void test_xscom_core(QTestState *qts, const PnvChip *chip)
 {
-uint32_t first_core_dts0 = PNV_XSCOM_EX_DTS_RESULT0;
-uint64_t dts0;
+if (chip->chip_type == PNV_CHIP_POWER10) {
+uint32_t first_core_thread_state =
+ PNV_XSCOM_P10_EC_BASE(chip->first_core) + 0x412;
+uint64_t thread_state;
+
+thread_state = pnv_xscom_read(qts, chip, first_core_thread_state);
 
-if (chip->chip_type != PNV_CHIP_POWER9) {
-first_core_dts0 |= PNV_XSCOM_EX_BASE(chip->first_core);
+g_assert_cmphex(thread_state, ==, 0);
 } else {
-first_core_dts0 |= PNV_XSCOM_P9_EC_BASE(chip->first_core);
-}
+uint32_t first_core_dts0 = PNV_XSCOM_EX_DTS_RESULT0;
+uint64_t dts0;
 
-dts0 = pnv_xscom_read(qts, chip, first_core_dts0);
+if (chip->chip_type == PNV_CHIP_POWER9) {
+first_core_dts0 |= PNV_XSCOM_P9_EC_BASE(chip->first_core);
+} else { /* POWER8 */
+first_core_dts0 |= PNV_XSCOM_EX_BASE(chip->first_core);
+}
 
-g_assert_cmphex(dts0, ==, 0x26f024f023full);
+dts0 = pnv_xscom_read(qts, chip, first_core_dts0);
+
+g_assert_cmphex(dts0, ==, 0x26f024f023full);
+}
 }
 
 static void test_core(const void *data)
@@ -123,6 +147,8 @@ static void test_core(const void *data)
 
 if (chip->chip_type == PNV_CHIP_POWER9) {
 machine = "powernv9";
+} else if (chip->chip_type == PNV_CHIP_POWER10) {
+machine = "powernv10";
 }
 
 qts = qtest_initf("-M %s -accel tcg -cpu %s",
-- 
2.40.1

[PATCH v2 1/2] ppc/pnv: Set P10 core xscom region size to match hardware

2023-07-05 Thread Nicholas Piggin

The P10 core xscom memory regions overlap because the size is wrong.
The P10 core+L2 xscom region size is allocated as 0x1000 (with some
unused ranges). "EC" is used as a closer match, as "EX" includes L3
which has a disjoint xscom range that would require a different
region if it were implemented.
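
For illustration only, a sketch of why the region size matters. It
assumes that the per-core offset inside a quad is
0x1000 << (3 - (core & 0x3)), modelled on the EC base macros in this
series; p10_ec_offset() and xscom_regions_overlap() are hypothetical
helpers, and 0x2000 is just an illustrative oversized value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed per-core offset within a quad (see lead-in). */
static inline uint32_t p10_ec_offset(unsigned core)
{
    return UINT32_C(0x1000) << (3 - (core & 0x3));
}

/* True if two regions of equal size starting at the bases overlap. */
static inline bool xscom_regions_overlap(uint64_t base_a, uint64_t base_b,
                                         uint64_t size)
{
    assert(size > 0);
    uint64_t lo = base_a < base_b ? base_a : base_b;
    uint64_t hi = base_a < base_b ? base_b : base_a;

    return hi - lo < size;
}
```

With a size of 0x1000 the regions of cores 2 and 3 are adjacent but
disjoint; with any larger size they overlap.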

Signed-off-by: Nicholas Piggin 
---
 hw/ppc/pnv_core.c  | 6 --
 include/hw/ppc/pnv_core.h  | 1 +
 include/hw/ppc/pnv_xscom.h | 2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index 8a72171ce0..aa363e4b85 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -296,9 +296,8 @@ static void pnv_core_realize(DeviceState *dev, Error **errp)
 }
 
 snprintf(name, sizeof(name), "xscom-core.%d", cc->core_id);
-/* TODO: check PNV_XSCOM_EX_SIZE for p10 */
 pnv_xscom_region_init(&pc->xscom_regs, OBJECT(dev), pcc->xscom_ops,
-  pc, name, PNV_XSCOM_EX_SIZE);
+  pc, name, pcc->xscom_size);
 
 qemu_register_reset(pnv_core_reset, pc);
 return;
@@ -350,6 +349,7 @@ static void pnv_core_power8_class_init(ObjectClass *oc, void *data)
 PnvCoreClass *pcc = PNV_CORE_CLASS(oc);
 
 pcc->xscom_ops = &pnv_core_power8_xscom_ops;
+pcc->xscom_size = PNV_XSCOM_EX_SIZE;
 }
 
 static void pnv_core_power9_class_init(ObjectClass *oc, void *data)
@@ -357,6 +357,7 @@ static void pnv_core_power9_class_init(ObjectClass *oc, void *data)
 PnvCoreClass *pcc = PNV_CORE_CLASS(oc);
 
 pcc->xscom_ops = &pnv_core_power9_xscom_ops;
+pcc->xscom_size = PNV_XSCOM_EX_SIZE;
 }
 
 static void pnv_core_power10_class_init(ObjectClass *oc, void *data)
@@ -364,6 +365,7 @@ static void pnv_core_power10_class_init(ObjectClass *oc, void *data)
 PnvCoreClass *pcc = PNV_CORE_CLASS(oc);
 
 pcc->xscom_ops = &pnv_core_power10_xscom_ops;
+pcc->xscom_size = PNV10_XSCOM_EC_SIZE;
 }
 
 static void pnv_core_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
index 77ef00f47a..aa5ca281fc 100644
--- a/include/hw/ppc/pnv_core.h
+++ b/include/hw/ppc/pnv_core.h
@@ -46,6 +46,7 @@ struct PnvCoreClass {
 DeviceClass parent_class;
 
 const MemoryRegionOps *xscom_ops;
+uint64_t xscom_size;
 };
 
 #define PNV_CORE_TYPE_SUFFIX "-" TYPE_PNV_CORE
diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h
index f7da9a1dc6..a4c9d95dc5 100644
--- a/include/hw/ppc/pnv_xscom.h
+++ b/include/hw/ppc/pnv_xscom.h
@@ -133,7 +133,7 @@ struct PnvXScomInterfaceClass {
 
 #define PNV10_XSCOM_EC_BASE(core) \
 ((uint64_t) PNV10_XSCOM_EQ_BASE(core) | PNV10_XSCOM_EC(core & 0x3))
-#define PNV10_XSCOM_EC_SIZE        0x100000
+#define PNV10_XSCOM_EC_SIZE        0x1000
 
 #define PNV10_XSCOM_PSIHB_BASE 0x3011D00
 #define PNV10_XSCOM_PSIHB_SIZE 0x100
-- 
2.40.1

[PATCH v2 1/1] pcie: Add hotplug detect state register to cmask

2023-07-05 Thread Leonardo Bras

When trying to migrate a machine type pc-q35-6.0 or lower with these
command-line options:

-device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
-device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1

the following error occurs after all RAM pages have been sent:

qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
qemu-kvm: load of migration failed: Invalid argument

This happens on pc-q35-6.0 or lower because of:
{ "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }

In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
which sets dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to signal PCI
hotplug for the guest. After a while the guest will deal with this hotplug
and qemu will clear the above bit.

Then, during migration, get_pci_config_device() will compare the
configs of both the freshly created device and the one that is being
received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
and cause the bug to reproduce.

To avoid this fake incompatibility, there are three fields in PCIDevice that
can help:

- wmask: Used to implement R/W bytes
- w1cmask: Used to implement RW1C (Write 1 to Clear) bytes
- cmask: Used to enable config checks on load

According to PCI Express® Base Specification Revision 5.0 Version 1.0,
table 7-27 (Slot Status Register) bit 6, the "Presence Detect State" is
listed as RO (read-only), so it only makes sense to make use of the cmask
field.

So, clear PCI_EXP_SLTSTA_PDS bit on cmask, so the fake incompatibility on
get_pci_config_device() does not abort the migration.
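
For illustration only, the values from the error message above can be
run through a sketch of the load-time comparison. The sketch assumes the
check has the form (incoming ^ local) & cmask & ~wmask & ~w1cmask, which
is my reading of the failure rather than a quote of the source;
config_byte_rejected() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Presence Detect State, bit 6 of the Slot Status register. */
#define PCI_EXP_SLTSTA_PDS 0x0040

static_assert(PCI_EXP_SLTSTA_PDS == (1 << 6), "PDS is bit 6");

/*
 * Hypothetical per-byte check: a difference between the incoming and
 * the local config byte is fatal only in bits that are checked (cmask)
 * and are neither writable (wmask) nor write-1-to-clear (w1cmask).
 */
static inline bool config_byte_rejected(uint8_t incoming, uint8_t local,
                                        uint8_t cmask, uint8_t wmask,
                                        uint8_t w1cmask)
{
    return ((incoming ^ local) & cmask & ~wmask & ~w1cmask) != 0;
}
```

With "read: 0 device: 40 cmask: ff wmask: 0 w1cmask: 19" the byte is
rejected; once the PDS bit is cleared from cmask (0xff becomes 0xbf) the
same bytes are accepted.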

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
Signed-off-by: Leonardo Bras 
---
 hw/pci/pcie.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index b8c24cf45f..cae56bf1c8 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -659,6 +659,10 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
 pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
PCI_EXP_HP_EV_SUPPORTED);
 
+/* Avoid migration abortion when this device hot-removed by guest */
+pci_word_test_and_clear_mask(dev->cmask + pos + PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_PDS);
+
 dev->exp.hpev_notified = false;
 
 qbus_set_hotplug_handler(BUS(pci_bridge_get_sec_bus(PCI_BRIDGE(dev))),
-- 
2.41.0

Re: [PATCH 1/1] pcie: Add hotplug detect state register to w1cmask

2023-07-05 Thread Leonardo Bras Soares Passos

On Wed, Jul 5, 2023 at 3:40 AM Leonardo Bras Soares Passos
 wrote:
>
> On Tue, Jul 4, 2023 at 3:43 AM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jul 04, 2023 at 03:20:36AM -0300, Leonardo Brás wrote:
> > > Hello Peter and Michael, I have a few updates on this:
> > >
> > > On Mon, 2023-07-03 at 02:20 -0300, Leonardo Brás wrote:
> > > > Hello Peter and Michael, thanks for reviewing!
> > > >
> > > >
> > > > On Thu, 2023-06-29 at 16:56 -0400, Peter Xu wrote:
> > > > > On Thu, Jun 29, 2023 at 04:06:53PM -0400, Michael S. Tsirkin wrote:
> > > > > > On Thu, Jun 29, 2023 at 04:01:41PM -0400, Peter Xu wrote:
> > > > > > > On Thu, Jun 29, 2023 at 03:33:06PM -0400, Michael S. Tsirkin 
> > > > > > > wrote:
> > > > > > > > On Thu, Jun 29, 2023 at 01:01:53PM -0400, Peter Xu wrote:
> > > > > > > > > Hi, Leo,
> > > > > > > > >
> > > > > > > > > Thanks for figuring this out.  Let me copy a few more 
> > > > > > > > > potential reviewers
> > > > > > > > > from commit 17858a1695 ("hw/acpi/ich9: Set ACPI PCI hot-plug 
> > > > > > > > > as default on
> > > > > > > > > Q35").
> > > > > > > > >
> > > > > > > > > On Thu, Jun 29, 2023 at 06:05:00AM -0300, Leonardo Bras wrote:
> > > > > > > > > > When trying to migrate a machine type pc-q35-6.0 or lower, 
> > > > > > > > > > with this
> > > > > > > > > > cmdline options:
> > > > > > > > > >
> > > > > > > > > > -device 
> > > > > > > > > > driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12
> > > > > > > > > >  \
> > > > > > > > > > -device 
> > > > > > > > > > driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> > > > > > > > > >
> > > > > > > > > > the following bug happens after all ram pages were sent:
> > > > > > > > > >
> > > > > > > > > > qemu-kvm: get_pci_config_device: Bad config data: i=0x6e 
> > > > > > > > > > read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> > > > > > > > > > qemu-kvm: Failed to load PCIDevice:config
> > > > > > > > > > qemu-kvm: Failed to load 
> > > > > > > > > > pcie-root-port:parent_obj.parent_obj.parent_obj
> > > > > > > > > > qemu-kvm: error while loading state for instance 0x0 of 
> > > > > > > > > > device ':00:12.0/pcie-root-port'
> > > > > > > > > > qemu-kvm: load of migration failed: Invalid argument
> > > > > > > > > >
> > > > > > > > > > This happens on pc-q35-6.0 or lower because of:
> > > > > > > > > > { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> > > > > > > > > >
> > > > > > > > > > In this scenario, hotplug_handler_plug() calls 
> > > > > > > > > > pcie_cap_slot_plug_cb(),
> > > > > > > > > > which sets the bus dev->config byte 0x6e with bit 
> > > > > > > > > > PCI_EXP_SLTSTA_PDS to
> > > > > > > > > > signal PCI hotplug for the guest. After a while the guest 
> > > > > > > > > > will deal with
> > > > > > > > > > this hotplug and qemu will clear the above bit.
> > > > > > > >
> > > > > > > > Presence Detect State – This bit indicates the presence of an
> > > > > > > > adapter in the slot, reflected by the logical “OR” of the 
> > > > > > > > Physical
> > > > > > > > Layer in-band presence detect mechanism and, if present, any
> > > > > > > > out-of-band presence detect mechanism defined for the slot’s
> > > > > > > > corresponding form factor. Note that the in-band presence
> > > > > > > > detect mechanism requires that power be applied to an adapter
> > > > > > > > for its presence to be detected. Consequently, form factors that
> > > > > > > > require a power controller for hot-plug must implement a
> > > > > > > > physical pin presence detect mechanism.
> > > > > > > > RO
> > > > > > > > Defined encodings are:
> > > > > > > > 0b Slot Empty
> > > > > > > > 1b Card Present in slot
> > > > > > > > This bit must be implemented on all Downstream Ports that
> > > > > > > > implement slots. For Downstream Ports not connected to slots
> > > > > > > > (where the Slot Implemented bit of the PCI Express Capabilities
> > > > > > > > register is 0b), this bit must be hardwired to 1b.
> > > >
> > > > Thank you for providing this doc!
> > > > I am new to PCI stuff, could you please point this doc?
> > >
> > > (I mean, the link to the documentation)
> >
> > The pci specs are all here: https://pcisig.com/
> > Red Hat is a member so just register, it's free.
> >
> > I'd get the 5.0 version of pci express base:
> > https://members.pcisig.com/wg/PCI-SIG/document/13005
> >
> > 6.0 is out but they did something to make it take years to open,
> > and it shouldn't matter for this.
>
> This is great! Thanks for sharing!
>
> >
> > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > And this seems to match what QEMU is doing: it clears on unplug
> > > > > > > > not after guest deals with hotplug.
> > > >
> > > > Oh, that's weird.
> > > > It should not unplug the device, so IIUC it should not clear the bit.
> > > > Maybe something weird is happening in the guest, I will take a look.
> > >
> > > Updates on this:
> > > You are right! For some reason the guest is hot-unplugging the

[PATCH v2] riscv: Generate devicetree only after machine initialization is complete

2023-07-05 Thread Guenter Roeck

If the devicetree is created before machine initialization is complete,
it misses dynamic devices. Specifically, the tpm device is not added
to the devicetree file and is therefore not instantiated in Linux.
Load/create devicetree in virt_machine_done() to solve the problem.

Cc: Daniel Henrique Barboza 
Cc: Alistair Francis 
Cc: Daniel Henrique Barboza 
Fixes: 325b7c4e75 hw/riscv: Enable TPM backends
Signed-off-by: Guenter Roeck 
---
v2: Handle devicetree (load & create) entirely in machine_done function.

 hw/riscv/virt.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index ed4c27487e..1c4bd823df 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1248,6 +1248,17 @@ static void virt_machine_done(Notifier *notifier, void 
*data)
 uint64_t kernel_entry = 0;
 BlockBackend *pflash_blk0;
 
+/* load/create device tree */
+if (machine->dtb) {
+machine->fdt = load_device_tree(machine->dtb, &s->fdt_size);
+if (!machine->fdt) {
+error_report("load_device_tree() failed");
+exit(1);
+}
+} else {
+create_fdt(s, memmap);
+}
+
 /*
  * Only direct boot kernel is currently supported for KVM VM,
  * so the "-bios" parameter is not supported when KVM is enabled.
@@ -1508,17 +1519,6 @@ static void virt_machine_init(MachineState *machine)
 }
 virt_flash_map(s, system_memory);
 
-/* load/create device tree */
-if (machine->dtb) {
-machine->fdt = load_device_tree(machine->dtb, &s->fdt_size);
-if (!machine->fdt) {
-error_report("load_device_tree() failed");
-exit(1);
-}
-} else {
-create_fdt(s, memmap);
-}
-
 s->machine_done.notify = virt_machine_done;
 qemu_add_machine_init_done_notifier(&s->machine_done);
 }
-- 
2.39.2
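
The ordering problem this patch addresses can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyMachine`, `toy_add_device()`, `toy_build_fdt()` and `toy_fdt_has()` are hypothetical names, not QEMU APIs, and the "devicetree" is just a space-separated string of device names. It shows that a tree generated during init misses a device added afterwards, while a tree generated once all devices exist includes it.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_DEVICES 8

/* Hypothetical stand-in for a machine whose device list grows over time. */
typedef struct ToyMachine {
    const char *devices[TOY_MAX_DEVICES];
    int ndevices;
    char fdt[128];   /* toy "devicetree": space-separated device names */
} ToyMachine;

/* Register a device; returns 0 on success, -1 if the table is full. */
int toy_add_device(ToyMachine *m, const char *name)
{
    if (m->ndevices >= TOY_MAX_DEVICES) {
        return -1;
    }
    m->devices[m->ndevices++] = name;
    return 0;
}

/* Snapshot the devices known at this moment into the toy devicetree. */
void toy_build_fdt(ToyMachine *m)
{
    m->fdt[0] = '\0';
    for (int i = 0; i < m->ndevices; i++) {
        if (i > 0) {
            strncat(m->fdt, " ", sizeof(m->fdt) - strlen(m->fdt) - 1);
        }
        strncat(m->fdt, m->devices[i], sizeof(m->fdt) - strlen(m->fdt) - 1);
    }
}

/* True if the toy devicetree mentions the given device name. */
bool toy_fdt_has(const ToyMachine *m, const char *name)
{
    return strstr(m->fdt, name) != NULL;
}
```

Calling `toy_build_fdt()` before a late `toy_add_device(m, "tpm")` leaves "tpm" out of the snapshot; calling it afterwards, as a machine-done step would, picks it up.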

Re: [PATCH qemu v5] aspeed add montblanc bmc reference from fuji

2023-07-05 Thread Sittisak Sinprem

Hi Mike,

The FRUID data is used to define the BMC MAC address,
to enable the CIT, test_eeprom, and test_bmc_mac on QEMU.

On Thu, Jul 6, 2023 at 12:38 AM Mike Choi  wrote:

> Hi Sittisak,
>
>
>
> Minipack3 is not open-sourced yet, and we are unlikely to be able to
> upstream detailed data.
>
>
>
>1. What is this FRUID data for? Is it for testing?
>2. What other options do we have, since we are not able to upstream
>FRUID data? (It is still OK to upstream system configuration, but NOT the
>_fruid data arrays)
>
>
>
> Thanks,
>
> Mike
>
>
>
>
>
> *From: *Cédric Le Goater 
> *Date: *Tuesday, July 4, 2023 at 7:07 AM
> *To: *Sittisak Sinprem , Bin Huang <
> binhu...@meta.com>, Tao Ren , Mike Choi <
> mikec...@meta.com>
> *Cc: *qemu-devel@nongnu.org , qemu-...@nongnu.org <
> qemu-...@nongnu.org>, peter.mayd...@linaro.org ,
> and...@aj.id.au , Joel Stanley ,
> qemu-sta...@nongnu.org , srika...@celestica.com <
> srika...@celestica.com>, ssu...@celestica.com ,
> thangavel...@celestica.com ,
> kgen...@celestica.com , anandaram...@celestica.com
> 
> *Subject: *Re: [PATCH qemu v5] aspeed add montblanc bmc reference from
> fuji
>
>
>
> On 7/4/23 15:27, Sittisak Sinprem wrote:
> > Hi Meta Team,
> >
> > I think the details of the FRU EEPROM content are still confidential
> > for now. Please confirm: can we add the description in QEMU upstream,
> > following Cedric's request?
>
> We don't need all the details, and not the confidential part of course.
>
> C.
>
> >
> > On Tue, Jul 4, 2023 at 6:19 PM Cédric Le Goater <c...@kaod.org> wrote:
> >
> > On 7/4/23 13:06, ~ssinprem wrote:
> >  > From: Sittisak Sinprem <ssinp...@celestica.com>
> >  >
> >  > - I2C list follow I2C Tree v1.6 20230320
> >  > - fru eeprom data use FB FRU format version 4
> >  >
> >  > Signed-off-by: Sittisak Sinprem <ssinp...@celestica.com>
> >
> > You shoot too fast :) Please add some description for the EEPROM
> > contents.
> > What they enable when the OS/FW boots is good to know for QEMU.
> >
> > Thanks,
> >
> > C.
> >
> >
> >  > ---
> >  >   docs/system/arm/aspeed.rst |  1 +
> >  >   hw/arm/aspeed.c| 65
> ++
> >  >   hw/arm/aspeed_eeprom.c | 50 +
> >  >   hw/arm/aspeed_eeprom.h |  7 
> >  >   4 files changed, 123 insertions(+)
> >  >
> >  > diff --git a/docs/system/arm/aspeed.rst
> b/docs/system/arm/aspeed.rst
> >  > index 80538422a1..5e0824f48b 100644
> >  > --- a/docs/system/arm/aspeed.rst
> >  > +++ b/docs/system/arm/aspeed.rst
> >  > @@ -33,6 +33,7 @@ AST2600 SoC based machines :
> >  >   - ``tacoma-bmc``   OpenPOWER Witherspoon POWER9 AST2600
> BMC
> >  >   - ``rainier-bmc``  IBM Rainier POWER10 BMC
> >  >   - ``fuji-bmc`` Facebook Fuji BMC
> >  > +- ``montblanc-bmc``Facebook Montblanc BMC
> >  >   - ``bletchley-bmc``Facebook Bletchley BMC
> >  >   - ``fby35-bmc``Facebook fby35 BMC
> >  >   - ``qcom-dc-scm-v1-bmc``   Qualcomm DC-SCM V1 BMC
> >  > diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> >  > index 9fca644d92..bbb7a3392c 100644
> >  > --- a/hw/arm/aspeed.c
> >  > +++ b/hw/arm/aspeed.c
> >  > @@ -189,6 +189,10 @@ struct AspeedMachineState {
> >  >   #define FUJI_BMC_HW_STRAP10x
> >  >   #define FUJI_BMC_HW_STRAP20x
> >  >
> >  > +/* Montblanc hardware value */
> >  > +#define MONTBLANC_BMC_HW_STRAP10x
> >  > +#define MONTBLANC_BMC_HW_STRAP20x
> >  > +
> >  >   /* Bletchley hardware value */
> >  >   /* TODO: Leave same as EVB for now. */
> >  >   #define BLETCHLEY_BMC_HW_STRAP1 AST2600_EVB_HW_STRAP1
> >  > @@ -925,6 +929,41 @@ static void
> fuji_bmc_i2c_init(AspeedMachineState *bmc)
> >  >   }
> >  >   }
> >  >
> >  > +static void montblanc_bmc_i2c_init(AspeedMachineState *bmc)
> >  > +{
> >  > +AspeedSoCState *soc = &bmc->soc;
> >  > +I2CBus *i2c[16] = {};
> >  > +
> >  > +for (int i = 0; i < 16; i++) {
> >  > +i2c[i] = aspeed_i2c_get_bus(&soc->i2c, i);
> >  > +}
> >  > +
> >  > +/* Ref from Minipack3_I2C_Tree_V1.6 20230320 */
> >  > +at24c_eeprom_init_rom(i2c[3], 0x56, 8192,
> montblanc_scm_fruid,
> >  > +  montblanc_scm_fruid_len);
> >  > +at24c_eeprom_init_rom(i2c[6], 0x53, 8192,
> montblanc_fcm_fruid,
> >  > +  montblanc_fcm_fruid_len);
> >  > +
> >  > +/* CPLD and FPGA */
> >  > +at24c_eeprom_init(i2c[1], 0x35, 256);  /* SCM CPLD */
> >  > +at24c_eeprom_init(i2c[5], 0x35, 256);  /* COMe CPLD TODO:
> need to update */
> >  > +at24c_eeprom_init(i2c[12], 0x60, 256); /* MCB PWR CPLD */
> >  > +at24c_eeprom_init(i2c[13], 0x35, 256); /* IOB FPGA

[PATCH] ppc/pnv: Log all unimp warnings with similar message

2023-07-05 Thread Joel Stanley

Add the function name so there's an indication as to where the message
is coming from. Change all prints to use the offset instead of the
address.

Signed-off-by: Joel Stanley 
---
Happy to use the address instead of the offset (or print both), but I
like the idea of being consistent.
---
 hw/ppc/pnv_core.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index ffbc29cbf4f9..3eb95670d6a3 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -85,8 +85,8 @@ static uint64_t pnv_core_power8_xscom_read(void *opaque, 
hwaddr addr,
 val = 0x24full;
 break;
 default:
-qemu_log_mask(LOG_UNIMP, "Warning: reading reg=0x%" HWADDR_PRIx "\n",
-  addr);
+qemu_log_mask(LOG_UNIMP, "%s: unimp read 0x%08x\n", __func__,
+  offset);
 }
 
 return val;
@@ -95,8 +95,10 @@ static uint64_t pnv_core_power8_xscom_read(void *opaque, 
hwaddr addr,
 static void pnv_core_power8_xscom_write(void *opaque, hwaddr addr, uint64_t 
val,
 unsigned int width)
 {
-qemu_log_mask(LOG_UNIMP, "Warning: writing to reg=0x%" HWADDR_PRIx "\n",
-  addr);
+uint32_t offset = addr >> 3;
+
+qemu_log_mask(LOG_UNIMP, "%s: unimp write 0x%08x\n", __func__,
+  offset);
 }
 
 static const MemoryRegionOps pnv_core_power8_xscom_ops = {
@@ -140,8 +142,8 @@ static uint64_t pnv_core_power9_xscom_read(void *opaque, 
hwaddr addr,
 val = 0;
 break;
 default:
-qemu_log_mask(LOG_UNIMP, "Warning: reading reg=0x%" HWADDR_PRIx "\n",
-  addr);
+qemu_log_mask(LOG_UNIMP, "%s: unimp read 0x%08x\n", __func__,
+  offset);
 }
 
 return val;
@@ -157,8 +159,8 @@ static void pnv_core_power9_xscom_write(void *opaque, 
hwaddr addr, uint64_t val,
 case PNV9_XSCOM_EC_PPM_SPECIAL_WKUP_OTR:
 break;
 default:
-qemu_log_mask(LOG_UNIMP, "Warning: writing to reg=0x%" HWADDR_PRIx 
"\n",
-  addr);
+qemu_log_mask(LOG_UNIMP, "%s: unimp write 0x%08x\n", __func__,
+  offset);
 }
 }
 
@@ -189,8 +191,8 @@ static uint64_t pnv_core_power10_xscom_read(void *opaque, 
hwaddr addr,
 val = 0;
 break;
 default:
-qemu_log_mask(LOG_UNIMP, "Warning: reading reg=0x%" HWADDR_PRIx "\n",
-  addr);
+qemu_log_mask(LOG_UNIMP, "%s: unimp read 0x%08x\n", __func__,
+  offset);
 }
 
 return val;
@@ -203,8 +205,8 @@ static void pnv_core_power10_xscom_write(void *opaque, 
hwaddr addr,
 
 switch (offset) {
 default:
-qemu_log_mask(LOG_UNIMP, "Warning: writing to reg=0x%" HWADDR_PRIx 
"\n",
-  addr);
+qemu_log_mask(LOG_UNIMP, "%s: unimp write 0x%08x\n", __func__,
+  offset);
 }
 }
 
@@ -421,7 +423,7 @@ static uint64_t pnv_quad_power9_xscom_read(void *opaque, 
hwaddr addr,
 val = 0;
 break;
 default:
-qemu_log_mask(LOG_UNIMP, "%s: reading @0x%08x\n", __func__,
+qemu_log_mask(LOG_UNIMP, "%s: unimp read 0x%08x\n", __func__,
   offset);
 }
 
@@ -438,7 +440,7 @@ static void pnv_quad_power9_xscom_write(void *opaque, 
hwaddr addr, uint64_t val,
 case P9X_EX_NCU_SPEC_BAR + 0x400: /* Second EX */
 break;
 default:
-qemu_log_mask(LOG_UNIMP, "%s: writing @0x%08x\n", __func__,
+qemu_log_mask(LOG_UNIMP, "%s: unimp write 0x%08x\n", __func__,
   offset);
 }
 }
@@ -465,7 +467,7 @@ static uint64_t pnv_quad_power10_xscom_read(void *opaque, 
hwaddr addr,
 
 switch (offset) {
 default:
-qemu_log_mask(LOG_UNIMP, "%s: reading @0x%08x\n", __func__,
+qemu_log_mask(LOG_UNIMP, "%s: unimp read 0x%08x\n", __func__,
   offset);
 }
 
@@ -479,7 +481,7 @@ static void pnv_quad_power10_xscom_write(void *opaque, 
hwaddr addr,
 
 switch (offset) {
 default:
-qemu_log_mask(LOG_UNIMP, "%s: writing @0x%08x\n", __func__,
+qemu_log_mask(LOG_UNIMP, "%s: unimp write 0x%08x\n", __func__,
   offset);
 }
 }
-- 
2.40.1
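
The message format introduced by this patch can be sketched outside of QEMU as follows. This is an illustration only: `format_unimp()` is a hypothetical helper that writes into a caller-supplied buffer with `snprintf()` instead of calling `qemu_log_mask()`, and it derives the offset as `addr >> 3`, as the patched write handler does.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format an "unimplemented access" message in the style used by the patch:
 * the function name first, then the operation, then the XSCOM offset
 * (address >> 3) as eight hex digits. `op` is "read" or "write".
 * Returns the value returned by snprintf().
 */
int format_unimp(char *buf, size_t len, const char *func,
                 const char *op, uint64_t addr)
{
    uint32_t offset = (uint32_t)(addr >> 3);

    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s: unimp %s 0x%08" PRIx32 "\n",
                    func, op, offset);
}
```

In the patched handlers the function name comes from `__func__`, so every message identifies the handler that emitted it.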

Re: [PATCH] ppc/pnv: Set P10 core xscom region size to match hardware

2023-07-05 Thread Joel Stanley

On Wed, 5 Jul 2023 at 10:02, Cédric Le Goater  wrote:
>
> On 7/5/23 04:05, Joel Stanley wrote:
> > On Wed, 5 Jul 2023 at 01:27, Nicholas Piggin  wrote:
> >>
> >> The P10 core xscom memory regions overlap because the size is wrong.
> >> The P10 core+L2 xscom region size is allocated as 0x1000 (with some
> >> unused ranges). "EC" is used as a closer match, as "EX" includes L3
> >> which has a disjoint xscom range that would require a different
> >> region if it were implemented.
> >>
> >> Signed-off-by: Nicholas Piggin 
> >
> > Nice, that looks better:
> >
> > 0001-0001000f (prio 0, i/o): xscom-quad.0: 0x10
> > 000100108000-00010010 (prio 0, i/o): xscom-core.3: 0x8000
> > 00010011-000100117fff (prio 0, i/o): xscom-core.2: 0x8000
> > 00010012-000100127fff (prio 0, i/o): xscom-core.1: 0x8000
> > 00010014-000100147fff (prio 0, i/o): xscom-core.0: 0x8000
> > 00010800-0001080f (prio 0, i/o): xscom-quad.4: 0x10
> > 000108108000-00010810 (prio 0, i/o): xscom-core.7: 0x8000
> > 00010811-000108117fff (prio 0, i/o): xscom-core.6: 0x8000
> > 00010812-000108127fff (prio 0, i/o): xscom-core.5: 0x8000
> > 00010814-000108147fff (prio 0, i/o): xscom-core.4: 0x8000
> >
> > Reviewed-by: Joel Stanley 
>
> It'd be interesting to add some dummy SLW handlers to get rid of the
> XSCOM errors at boot and shutdown on P10 :
>
> [ 4824.393446266,3] XSCOM: write error gcid=0x0 pcb_addr=0x200e883c stat=0x0
> [ 4824.393588777,5] Unable to log error
> [ 4824.393650582,3] XSCOM: Write failed, ret =  -6
> [ 4824.394124623,3] Could not set special wakeup on 0:0: Unable to write 
> QME_SPWU_HYP.
> [ 4824.394368459,3] XSCOM: write error gcid=0x0 pcb_addr=0x200e883c stat=0x0
> [ 4824.394382007,5] Unable to log error
> [ 4824.394384603,3] XSCOM: Write failed, ret =  -6

Yes. I was looking at this yesterday. We need to figure out how to do
the xscom addressing for the QME. It sets (different) bits in order to
address a given core.

For a -smp 4 machine, the P10_QME_SPWU_HYP read comes in on these addresses:

case 0x200e883c:
case 0x200e483c:
case 0x200e283c:
case 0x200e183c:

ie, the fourth nibble selects the core.

For a -smp 8 machine, the address now has bit 24 set to select the
second quad, so we need to cover these addresses:

case 0x210e883c:
case 0x210e483c:
case 0x210e283c:
case 0x210e183c:

I am thinking about how to map this into an address range that a model
can claim.
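
As a starting point, the address pattern above can be decoded like this. It is a minimal sketch based only on the eight addresses listed: the select nibble is taken to be bits 15:12 (one-hot values 8, 4, 2, 1) and the quad select to be bit 24. The function and macro names are made up for illustration, the mapping from select value to core number is deliberately left open, and whether further quads use additional bits is not known from these observations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Derived from the observed addresses, with the select bits cleared. */
#define QME_SPWU_HYP_BASE   0x200e083cu
#define QME_CORE_SEL_SHIFT  12                       /* one-hot core select */
#define QME_CORE_SEL_MASK   (0xfu << QME_CORE_SEL_SHIFT)
#define QME_QUAD_SEL_BIT    (1u << 24)               /* set for second quad */

/*
 * Return true if pcba looks like a P10_QME_SPWU_HYP access and, if so,
 * report the quad bit (0 or 1) and the raw one-hot core select value.
 */
bool qme_spwu_hyp_decode(uint32_t pcba, unsigned *quad, unsigned *core_sel)
{
    uint32_t base = pcba & ~(QME_CORE_SEL_MASK | QME_QUAD_SEL_BIT);
    unsigned sel = (pcba & QME_CORE_SEL_MASK) >> QME_CORE_SEL_SHIFT;

    if (base != QME_SPWU_HYP_BASE) {
        return false;
    }
    /* Only one-hot selects (exactly one bit set) were observed. */
    if (sel == 0 || (sel & (sel - 1)) != 0) {
        return false;
    }

    *quad = (pcba & QME_QUAD_SEL_BIT) ? 1 : 0;
    *core_sel = sel;
    return true;
}
```

A model could use a predicate of this shape in place of the explicit case lists, at the cost of hard-coding assumptions about bits that have not been checked against the hardware documentation.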

Cheers,

Joel

PS. For reference, this is sufficient to silence xscom errors with
skiboot and -M powernv10 -smp 4. A different set of hacks is required
for p9.

--- a/hw/ppc/pnv_xscom.c
+++ b/hw/ppc/pnv_xscom.c
@@ -106,6 +106,26 @@ static uint64_t xscom_read_default(PnvChip *chip,
uint32_t pcba)
 case 0x401082a:
 case 0x4010828:
 return 0;
+
+/* P10_QME_SPWU_HYP */
+case 0x200e883c:
+case 0x200e483c:
+case 0x200e283c:
+case 0x200e183c:
+return 0;
+
+/* P10_QME_SSH_HYP */
+case 0x200e882c:
+case 0x200e482c:
+case 0x200e282c:
+case 0x200e182c:
+return 0;
+
+/* XPEC_P10_PCI_CPLT_CONF1 */
+case 0x0809:
+case 0x0909:
+return 0;
+
 default:
 return -1;
 }
@@ -152,6 +172,13 @@ static bool xscom_write_default(PnvChip *chip,
uint32_t pcba, uint64_t val)
 case PRD_P8_IPOLL_REG_STATUS:
 case PRD_P9_IPOLL_REG_MASK:
 case PRD_P9_IPOLL_REG_STATUS:
+
+/* P10_QME_SPWU_HYP */
+case 0x200e883c:
+case 0x200e483c:
+case 0x200e283c:
+case 0x200e183c:
+
 return true;
 default:
 return false;

Re: [PATCH v8 5/6] hw/pci: warn when PCIe device is plugged into non-zero slot of downstream port

2023-07-05 Thread Akihiko Odaki


On 2023/07/05 20:59, Ani Sinha wrote:

PCIe downstream ports only have a single device 0, so PCI Express devices can
only be plugged into slot 0 on a PCIe port. Add a warning to let users know
when the invalid configuration is used. We may enforce this more strongly later
once we get more clarity on whether we are introducing a bad regression for
users currently using the wrong configuration.

The change has been tested to not break or alter behaviors of ARI capable
devices by instantiating seven vfs on an emulated igb device (the maximum
number of vfs the igb device supports). The vfs are instantiated correctly
and are seen to have non-zero device/slot numbers in the conventional PCI BDF
representation.

CC: jus...@redhat.com
CC: imamm...@redhat.com
CC: m...@redhat.com
CC: akihiko.od...@daynix.com

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
Signed-off-by: Ani Sinha 
Reviewed-by: Julia Suvorova 


Reviewed-by: Akihiko Odaki
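
The condition described in the commit message can be sketched as a small predicate. This is not the code from the patch: `toy_pci_slot()` and `toy_should_warn_nonzero_slot()` are hypothetical names, and the two boolean inputs are assumptions standing in for whatever QEMU uses to recognise a PCIe downstream port and an ARI function (such as the igb VFs mentioned above). Only the devfn layout (slot in bits 7:3, function in bits 2:0) is the conventional PCI encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Conventional devfn layout: bits 7:3 are the slot, bits 2:0 the function. */
unsigned toy_pci_slot(uint8_t devfn)
{
    return (devfn >> 3) & 0x1f;
}

/*
 * Hypothetical check: warn when a device sits in a non-zero slot directly
 * below a PCIe downstream port. ARI functions are exempted here because
 * they legitimately show non-zero slot numbers in the conventional BDF view.
 */
bool toy_should_warn_nonzero_slot(uint8_t devfn,
                                  bool parent_is_pcie_downstream,
                                  bool is_ari_function)
{
    if (!parent_is_pcie_downstream || is_ari_function) {
        return false;
    }
    return toy_pci_slot(devfn) != 0;
}
```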

Re: [PATCH v2 00/14] PPC440 devices misc clean up


On Wed, 5 Jul 2023, Daniel Henrique Barboza wrote:

Zoltan,

Patches 1-9 are queued. Don't need to re-send those.


Thanks, the last two patches are also reviewed and they don't depend on 
the ones before so you could queue those too.


The only outstanding patches are those 3 that rename the type defines to 
match their string values. We could come up with better names but those 
suggested by Philippe are too long IMO so at least the patches in this 
series clean up the current mess and we could rename these later. I'd 
rather not change the string values too much as those are what QOM 
actually uses to identify the types but we're free to change the defines. 
Currently we have:

#define TYPE_PPC4xx_PCI_HOST_BRIDGE "ppc4xx-pcihost"
and then a "ppc4xx-host-bridge" type without a define which is another 
type which is quite confusing. I may have partly created this mess back 
when I first tried to add sam460ex and did not know much about this but at 
least I'd like to improve it a little and resolve some of it now.


Regards,
BALATON Zoltan

Re: [PATCH v2 00/14] PPC440 devices misc clean up


Zoltan,

Patches 1-9 are queued. Don't need to re-send those.


Thanks,

Daniel

On 7/5/23 17:12, BALATON Zoltan wrote:

These are some small misc clean ups to PPC440 related device models
which is all I have ready for now.

v2:
- Added R-b tags from Philippe
- Addressed review comments
- Added new patch to rename parent field of PPC460EXPCIEState to parent_obj

Patches needing review: 6 7 10-13

BALATON Zoltan (14):
   ppc440: Change ppc460ex_pcie_init() parameter type
   ppc440: Add cpu link property to PCIe controller model
   ppc440: Add a macro to shorten PCIe controller DCR registration
   ppc440: Rename parent field of PPC460EXPCIEState to match code style
   ppc440: Rename local variable in dcr_read_pcie()
   ppc440: Stop using system io region for PCIe buses
   ppc/sam460ex: Remove address_space_mem local variable
   ppc440: Add busnum property to PCIe controller model
   ppc440: Remove ppc460ex_pcie_init legacy init function
   ppc4xx_pci: Rename QOM type name define
   ppc4xx_pci: Add define for ppc4xx-host-bridge type name
   ppc440_pcix: Rename QOM type define abd move it to common header
   ppc440_pcix: Don't use iomem for regs
   ppc440_pcix: Stop using system io region for PCI bus

  hw/ppc/ppc440.h |   1 -
  hw/ppc/ppc440_bamboo.c  |   3 +-
  hw/ppc/ppc440_pcix.c|  28 +++---
  hw/ppc/ppc440_uc.c  | 192 +---
  hw/ppc/ppc4xx_pci.c |  10 +--
  hw/ppc/sam460ex.c   |  33 ---
  include/hw/ppc/ppc4xx.h |   5 +-
  7 files changed, 129 insertions(+), 143 deletions(-)

Re: [PATCH v2 07/14] ppc/sam460ex: Remove address_space_mem local variable





On 7/5/23 17:12, BALATON Zoltan wrote:

Some places already use  get_system_memory() directly so replace the
remaining uses and drop the local variable.

Signed-off-by: BALATON Zoltan 
---


Reviewed-by: Daniel Henrique Barboza 


  hw/ppc/sam460ex.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index aaa8d2f4a5..f098226974 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -266,7 +266,6 @@ static void main_cpu_reset(void *opaque)
  
  static void sam460ex_init(MachineState *machine)

  {
-MemoryRegion *address_space_mem = get_system_memory();
  MemoryRegion *isa = g_new(MemoryRegion, 1);
  MemoryRegion *l2cache_ram = g_new(MemoryRegion, 1);
  DeviceState *uic[4];
@@ -406,7 +405,8 @@ static void sam460ex_init(MachineState *machine)
  /* FIXME: remove this after fixing l2sram mapping in ppc440_uc.c? */
  memory_region_init_ram(l2cache_ram, NULL, "ppc440.l2cache_ram", 256 * KiB,
                          &error_abort);
-memory_region_add_subregion(address_space_mem, 0x4LL, l2cache_ram);
+memory_region_add_subregion(get_system_memory(), 0x4LL,
+l2cache_ram);
  
  /* USB */

  sysbus_create_simple(TYPE_PPC4xx_EHCI, 0x4bffd0400,
@@ -444,13 +444,13 @@ static void sam460ex_init(MachineState *machine)
  /* SoC has 4 UARTs
   * but board has only one wired and two are present in fdt */
  if (serial_hd(0) != NULL) {
-serial_mm_init(address_space_mem, 0x4ef600300, 0,
+serial_mm_init(get_system_memory(), 0x4ef600300, 0,
 qdev_get_gpio_in(uic[1], 1),
 PPC_SERIAL_MM_BAUDBASE, serial_hd(0),
 DEVICE_BIG_ENDIAN);
  }
  if (serial_hd(1) != NULL) {
-serial_mm_init(address_space_mem, 0x4ef600400, 0,
+serial_mm_init(get_system_memory(), 0x4ef600400, 0,
 qdev_get_gpio_in(uic[0], 1),
 PPC_SERIAL_MM_BAUDBASE, serial_hd(1),
 DEVICE_BIG_ENDIAN);

Re: Reducing vdpa migration downtime because of memory pin / maps

2023-07-05 Thread Si-Wei Liu





On 7/5/2023 11:03 AM, Eugenio Perez Martin wrote:

On Tue, Jun 27, 2023 at 8:36 AM Si-Wei Liu  wrote:



On 6/9/2023 7:32 AM, Eugenio Perez Martin wrote:

On Fri, Jun 9, 2023 at 12:39 AM Si-Wei Liu  wrote:

On 6/7/23 01:08, Eugenio Perez Martin wrote:

On Wed, Jun 7, 2023 at 12:43 AM Si-Wei Liu  wrote:

Sorry for reviving this old thread, I lost the best timing to follow up
on this while I was on vacation. I have been working on this and found
out some discrepancy, please see below.

On 4/5/23 04:37, Eugenio Perez Martin wrote:

Hi!

As mentioned in the last upstream virtio-networking meeting, one of
the factors that adds more downtime to migration is the handling of
the guest memory (pin, map, etc). At this moment this handling is
bound to the virtio life cycle (DRIVER_OK, RESET). In that sense, the
destination device waits until all the guest memory / state is
migrated to start pinning all the memory.

The proposal is to bind it to the char device life cycle (open vs
close),

Hmmm, really? If it's the life cycle for char device, the next guest /
qemu launch on the same vhost-vdpa device node won't make it work.


Maybe my sentence was not accurate, but I think we're on the same page here.

Two qemu instances opening the same char device at the same time are
not allowed, and vhost_vdpa_release cleans all the maps. So the next
qemu that opens the char device should see a clean device anyway.

I mean the pin can't be done at the time of char device open, where the
user address space is not known/bound yet. The earliest possible point
for pinning would be after the vhost_attach_mm() call from SET_OWNER is
done.

Maybe we are deviating, let me start again.

Using QEMU code, what I'm proposing is to modify the lifecycle of the
.listener member of struct vhost_vdpa.

At this moment, the memory listener is registered at
vhost_vdpa_dev_start(dev, started=true) call for the last vhost_dev,
and is unregistered in both vhost_vdpa_reset_status and
vhost_vdpa_cleanup.

My original proposal was just to move the memory listener registration
to the last vhost_vdpa_init, and remove the unregister from
vhost_vdpa_reset_status. The calls to vhost_vdpa_dma_map/unmap would
be the same, the device should not realize this change.

This can address LM downtime latency for sure, but it won't help
downtime during dynamic SVQ switch - which still needs to go through the
full unmap/map cycle (that includes the slow part for pinning) from
passthrough to SVQ mode. Note that not every device could work with a
separate ASID for SVQ descriptors. The fix should expect to work on
normal vDPA vendor devices without a separate descriptor ASID, with
platform IOMMU underneath or with on-chip IOMMU.


At this moment the SVQ switch is very inefficient mapping-wise, as it
unmaps all the GPA->HVA maps and overrides them. In particular, SVQ is
allocated in low regions of the iova space, and then the guest memory
is allocated in this new IOVA region incrementally.
Yep. The key to build this fast path for SVQ switching I think is to 
maintain the identity mapping for the passthrough queues so that QEMU 
can reuse the old mappings for guest memory (e.g. GIOVA identity mapped 
to GPA) while incrementally adding new mappings for SVQ vrings.




We can optimize that if we place SVQ in a free GPA area instead.
Here's a question though: it might not be hard to find a free GPA range 
for the non-vIOMMU case (allocate iova from beyond the 48bit or 52bit 
ranges), but I'm not sure it is easy to find a free GIOVA range for the 
vIOMMU case. In particular, this has to work within the same full 64bit 
IOVA address range, and (for now) QEMU won't be able to "reserve" a 
specific IOVA range for SVQ from the vIOMMU. Do you foresee that this can 
be done for every QEMU emulated vIOMMU (intel-iommu, amd-iommu, arm smmu 
and virtio-iommu) so that we can call it out as a generic means for SVQ 
switching optimization?


If this QEMU/vIOMMU "hack" is not universally feasible, I would rather 
build a fast path in the kernel via a new vhost IOTLB command, say 
INVALIDATE_AND_UPDATE_ALL, to atomically flush all existing 
(passthrough) mappings and update to use the SVQ ones in a single batch, 
while keeping the pages for guest memory always pinned (the kernel will 
make this decision). This doesn't expose pinning to userspace, and can 
also fix downtime issue.



  All
of the "translations" still need to be done, to ensure the guest
doesn't have access to SVQ vring. That way, qemu will not send all the
unmaps & maps, only the new ones. And vhost/vdpa does not need to call
unpin_user_page / pin_user_pages for all the guest memory.

More optimizations include the batching of the SVQ vrings.

Nods.




One of the concerns was that it could delay VM initialization, and I
didn't profile it but I think that may be the case.

Yes, that's the concern here - we should not introduce regression to
normal VM boot process/time. In case of large VM it's very easy to see
the side effect if we go

Re: [PATCH] ui/gtk: set the area of the scanout texture correctly

2023-07-05 Thread Kim, Dongwon


On 7/4/2023 9:07 AM, Marc-André Lureau wrote:

Hi

On Mon, Jun 26, 2023 at 9:49 PM Kim, Dongwon wrote:


Hi Marc-André Lureau,

On 6/26/2023 4:56 AM, Marc-André Lureau wrote:
> Hi
>
> On Wed, Jun 21, 2023 at 11:53 PM Dongwon Kim wrote:
>
>     x and y offsets and width and height of the scanout texture
>     is not correctly configured in case guest scanout frame is
>     dmabuf.
>
>     Cc: Gerd Hoffmann 
>     Cc: Marc-André Lureau 
>     Cc: Vivek Kasireddy 
>     Signed-off-by: Dongwon Kim 
>
>
> I find this a bit confusing, and I don't know how to actually test it.
>
> The only place where scanout_{width, height} are set is
> virtio_gpu_create_dmabuf() and there, they have the same values as
> width and height. it's too easy to get confused with the values imho.

Yes, scanout_width/height are the same as width/height as long as only
one guest display exists. But they will be different when there are
multiple displays on the guest side, configured in extended mode (when
the guest is running Xorg).

In this case, the blob for the guest display is the same for scanouts 1
and 2, but each scanout will have a different offset and
scanout_width/scanout_height to reference a sub-region in the same
blob (dmabuf).

I added x/y/scanout_width/scanout_height with a previous commit:

commit e86a93f55463c088aa0b5260e915ffbf9f86c62b
Author: Dongwon Kim 
Date:   Wed Nov 3 23:51:52 2021 -0700

 virtio-gpu: splitting one extended mode guest fb into n-scanouts

> I find the terminology we use for ScanoutTexture much clearer. It uses
> backing_{width, height} instead, which indicates quite clearly that
> the usual x/y/w/h are for the sub-region to be shown.
Yeah, agreed. Then dmabuf->width/height should be changed to
dmabuf->backing_width/height, and dmabuf->width/height will replace
dmabuf->scanout_width/scanout_height. I guess this is what you meant,
right?


right, can you send a new patch?
thanks


https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg01081.html

Thanks!



> I think we should have a preliminary commit that renames
> scanout_{width, height}.
>
> Please give some help/hints on how to actually test this code too.

So this patch is just to make things look consistent at the code level.
Having offset (0,0) in this function call for all the different scanouts
didn't look right to me. This code change won't change any behavior,
though, so no test is applicable.

>
> Thanks!
>
>
>     ---
>      ui/gtk-egl.c     | 3 ++-
>      ui/gtk-gl-area.c | 3 ++-
>      2 files changed, 4 insertions(+), 2 deletions(-)
>
>     diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
>     index 19130041bc..e99e3b0d8c 100644
>     --- a/ui/gtk-egl.c
>     +++ b/ui/gtk-egl.c
>     @@ -257,7 +257,8 @@ void
>     gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
>
>          gd_egl_scanout_texture(dcl, dmabuf->texture,
>                                 dmabuf->y0_top, dmabuf->width,
>     dmabuf->height,
>     -                           0, 0, dmabuf->width,
dmabuf->height);
>     +                           dmabuf->x, dmabuf->y,
>     dmabuf->scanout_width,
>     +  dmabuf->scanout_height);
>
>          if (dmabuf->allow_fences) {
>              vc->gfx.guest_fb.dmabuf = dmabuf;
>     diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
>     index c384a1516b..1605818bd1 100644
>     --- a/ui/gtk-gl-area.c
>     +++ b/ui/gtk-gl-area.c
>     @@ -299,7 +299,8 @@ void
>     gd_gl_area_scanout_dmabuf(DisplayChangeListener *dcl,
>
>          gd_gl_area_scanout_texture(dcl, dmabuf->texture,
>                                     dmabuf->y0_top, dmabuf->width,
>     dmabuf->height,
>     -                               0, 0, dmabuf->width,
dmabuf->height);
>     +                               dmabuf->x, dmabuf->y,
>     dmabuf->scanout_width,
>     +  dmabuf->scanout_height);
>
>          if (dmabuf->allow_fences) {
>              vc->gfx.guest_fb.dmabuf = dmabuf;
>     --
>     2.34.1
>
>
>
>
> --
> Marc-André Lureau



--
Marc-André Lureau

[PATCH] virtio-gpu-udmabuf: replacing scanout_width/height with backing_width/height

2023-07-05 Thread Dongwon Kim

'backing_width' and 'backing_height' are commonly used to indicate the size
of the whole backing region so it makes sense to use those terms for
VGPUDMABuf as well in place of 'scanout_width' and 'scanout_height'.

Cc: Gerd Hoffmann 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 hw/display/virtio-gpu-udmabuf.c | 8 
 include/ui/console.h| 4 ++--
 ui/dbus-listener.c  | 4 ++--
 ui/egl-helpers.c| 4 ++--
 ui/gtk-egl.c| 4 ++--
 ui/gtk-gl-area.c| 4 ++--
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/hw/display/virtio-gpu-udmabuf.c b/hw/display/virtio-gpu-udmabuf.c
index ef1a740de5..920d457d4a 100644
--- a/hw/display/virtio-gpu-udmabuf.c
+++ b/hw/display/virtio-gpu-udmabuf.c
@@ -186,8 +186,8 @@ static VGPUDMABuf
 dmabuf->buf.stride = fb->stride;
 dmabuf->buf.x = r->x;
 dmabuf->buf.y = r->y;
-dmabuf->buf.scanout_width = r->width;
-dmabuf->buf.scanout_height = r->height;
+dmabuf->buf.backing_width = r->width;
+dmabuf->buf.backing_height = r->height;
 dmabuf->buf.fourcc = qemu_pixman_to_drm_format(fb->format);
 dmabuf->buf.fd = res->dmabuf_fd;
 dmabuf->buf.allow_fences = true;
@@ -218,8 +218,8 @@ int virtio_gpu_update_dmabuf(VirtIOGPU *g,
 
 g->dmabuf.primary[scanout_id] = new_primary;
 qemu_console_resize(scanout->con,
-new_primary->buf.scanout_width,
-new_primary->buf.scanout_height);
+new_primary->buf.backing_width,
+new_primary->buf.backing_height);
 dpy_gl_scanout_dmabuf(scanout->con, &new_primary->buf);
 
 if (old_primary) {
diff --git a/include/ui/console.h b/include/ui/console.h
index f27b2aad4f..3e8b22d6c6 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -201,8 +201,8 @@ typedef struct QemuDmaBuf {
 uint32_t  texture;
 uint32_t  x;
 uint32_t  y;
-uint32_t  scanout_width;
-uint32_t  scanout_height;
+uint32_t  backing_width;
+uint32_t  backing_height;
 bool  y0_top;
 void  *sync;
 int   fence_fd;
diff --git a/ui/dbus-listener.c b/ui/dbus-listener.c
index 0240c39510..7d73681cbc 100644
--- a/ui/dbus-listener.c
+++ b/ui/dbus-listener.c
@@ -420,8 +420,8 @@ static void dbus_scanout_texture(DisplayChangeListener *dcl,
 .y0_top = backing_y_0_top,
 .x = x,
 .y = y,
-.scanout_width = w,
-.scanout_height = h,
+.backing_width = w,
+.backing_height = h,
 };
 
 assert(tex_id);
diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
index 8f9fbf583e..6b7be5753d 100644
--- a/ui/egl-helpers.c
+++ b/ui/egl-helpers.c
@@ -148,8 +148,8 @@ void egl_fb_blit(egl_fb *dst, egl_fb *src, bool flip)
 if (src->dmabuf) {
 x1 = src->dmabuf->x;
 y1 = src->dmabuf->y;
-w = src->dmabuf->scanout_width;
-h = src->dmabuf->scanout_height;
+w = src->dmabuf->backing_width;
+h = src->dmabuf->backing_height;
 }
 
 w = (x1 + w) > src->width ? src->width - x1 : w;
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index d59b8cd7d7..7604696d4a 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -259,8 +259,8 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
 
 gd_egl_scanout_texture(dcl, dmabuf->texture,
dmabuf->y0_top, dmabuf->width, dmabuf->height,
-   dmabuf->x, dmabuf->y, dmabuf->scanout_width,
-   dmabuf->scanout_height, NULL);
+   dmabuf->x, dmabuf->y, dmabuf->backing_width,
+   dmabuf->backing_height, NULL);
 
 if (dmabuf->allow_fences) {
 vc->gfx.guest_fb.dmabuf = dmabuf;
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index 7367dfd793..3337a4baa3 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -300,8 +300,8 @@ void gd_gl_area_scanout_dmabuf(DisplayChangeListener *dcl,
 
 gd_gl_area_scanout_texture(dcl, dmabuf->texture,
dmabuf->y0_top, dmabuf->width, dmabuf->height,
-   dmabuf->x, dmabuf->y, dmabuf->scanout_width,
-   dmabuf->scanout_height, NULL);
+   dmabuf->x, dmabuf->y, dmabuf->backing_width,
+   dmabuf->backing_height, NULL);
 
 if (dmabuf->allow_fences) {
 vc->gfx.guest_fb.dmabuf = dmabuf;
-- 
2.34.1

Re: [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns

2023-07-05 Thread Peter Xu

On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > Provide an explicit reason for qemu_file_shutdown()s, which can be
> > displayed in query-migrate when used.
> >
> 
> Can we consider this to cover the TODO:
> 
>  * TODO: convert to propagate Error objects instead of squashing
>  * to a fixed errno value
> 
> or would that need something fancier?

The TODO seems to say we want to allow qemu_file_shutdown() to report an
Error* when anything goes wrong (e.g. shutdown() failed)?  This patch, on
the other hand, stores a specific error string so that it shows up to the
user in a later query-migrate.  If so, IMHO they're two different things.

> 
> > This will make e.g. migrate-pause to display explicit error descriptions,
> > from:
> >
> > "error-desc": "Channel error: Input/output error"
> >
> > To:
> >
> > "error-desc": "Channel is explicitly shutdown by the user"
> >
> > in query-migrate.
> >
> > Signed-off-by: Peter Xu 
> > ---
> >  migration/qemu-file.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index 419b4092e7..ff605027de 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
> >   *  --> guest crash!
> >   */
> >  if (!f->last_error) {
> > -qemu_file_set_error(f, -EIO);
> > +Error *err = NULL;
> > +
> > +error_setg(&err, "Channel is explicitly shutdown by the user");
> 
> It is good that we can grep this message. However, I'm confused about
> who the "user" is meant to be here and how are they implicated in this
> error.

Ah, here the user is who sends the "migrate-pause" command, according to
the example of the commit message.

What I wanted to do is provide a clear message (besides -EIO) in
query-migrate, so we know more about how the migration was paused/stopped/...
Before this patch it shows the same thing as if any form of I/O error had
happened.
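
To illustrate the idea, here is a minimal, self-contained sketch. It is not
the QEMU code: MockFile, mock_set_error_desc(), mock_shutdown() and
mock_error_desc() are hypothetical stand-ins for QEMUFile and its helpers.
It only shows a descriptive string being recorded next to the errno so that
a later query can tell an explicit shutdown apart from a generic I/O error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QEMUFile: an errno plus an optional description. */
typedef struct MockFile {
    int last_error;              /* 0, or a negative errno value */
    const char *last_error_desc; /* NULL when only the errno is known */
} MockFile;

/* Record an error only if none is pending, so the first error wins. */
static void mock_set_error_desc(MockFile *f, int err, const char *desc)
{
    assert(err < 0);
    if (!f->last_error) {
        f->last_error = err;
        f->last_error_desc = desc;
    }
}

/* Shutdown still uses -EIO, but attaches the reason for the query side. */
static void mock_shutdown(MockFile *f)
{
    mock_set_error_desc(f, -EIO, "Channel is explicitly shutdown by the user");
}

/* What a query would display: the stored description, else strerror(). */
static const char *mock_error_desc(const MockFile *f)
{
    if (!f->last_error) {
        return "";
    }
    return f->last_error_desc ? f->last_error_desc : strerror(-f->last_error);
}
```

In this sketch an error that was already pending is left untouched by the
shutdown, so the original cause is not overwritten.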

Thanks,

-- 
Peter Xu

Re: [PATCH v2 6/7] qemufile: Always return a verbose error

2023-07-05 Thread Peter Xu

On Wed, Jul 05, 2023 at 06:54:37PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > There're a lot of cases where we only have an errno set in last_error but
> > without a detailed error description.  When this happens, try to generate
> > > an error that contains the errno as a descriptive error.
> >
> > This will be helpful in cases where one relies on the Error*.  E.g.,
> > migration state only caches Error* in MigrationState.error.  With this,
> > we'll display correct error messages in e.g. query-migrate when the error
> > was only set by qemu_file_set_error().
> >
> > Signed-off-by: Peter Xu 
> > ---
> > >  migration/qemu-file.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index acc282654a..419b4092e7 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -156,15 +156,24 @@ void qemu_file_set_hooks(QEMUFile *f, const 
> > QEMUFileHooks *hooks)
> >   *
> >   * Return negative error value if there has been an error on previous
> >   * operations, return 0 if no error happened.
> > > - * Optional, it returns Error* in errp, but it may be NULL even if return value
> > - * is not 0.
> >   *
> > + * If errp is specified, a verbose error message will be copied over.
> >   */
> >  int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
> >  {
> > +if (!f->last_error) {
> > +return 0;
> > +}
> > +
> > +/* There is an error */
> >  if (errp) {
> > -*errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
> > +if (f->last_error_obj) {
> > +*errp = error_copy(f->last_error_obj);
> > +} else {
> > +error_setg_errno(errp, -f->last_error, "Channel error");
> 
> There are a couple of places that do:
> 
> ret = vmstate_save(f, se, ms->vmdesc);
> if (ret) {
> qemu_file_set_error(f, ret);
> break;
> }
> 
> and vmstate_save() can return > 0 on error. This would make this message
> say "Unknown error". This is minor.
> 
> But take a look at qemu_fclose(). It can return f->last_error while the
> function documentation says it should return negative on error.
> 
> Should we make qemu_file_set_error() check 'ret' and always set a
> negative value for f->last_error?

Yeah, maybe we can add a sanity check, but logically it's better we just
fix vmstate_save() to make sure it always returns a <0 error.

It seems to me there're so many hooks in vmstate_save_state_v() that it can
return random things.  Which one did you spot?  If it's an obvious issue
we can fix it.
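
For reference, a sanity check along those lines could look roughly like the
sketch below. normalize_file_error() is a hypothetical helper, not an
existing QEMU function, and mapping unexpected positive values to -EIO is an
assumption made only for illustration.

```c
#include <errno.h>

/*
 * Hypothetical helper: force a value destined for last_error to be
 * negative.  Returns 0 for "no error".
 */
static int normalize_file_error(int ret)
{
    if (ret == 0) {
        return 0;
    }
    if (ret > 0) {
        /* A hook returned a positive value; assumed here to mean -EIO. */
        return -EIO;
    }
    return ret;
}
```

Fixing the callers so they never return a positive value would make such a
helper unnecessary; it would only act as a safety net.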

-- 
Peter Xu

Re: [PATCH v1 00/23] Q35 support for Xen

2023-07-05 Thread Bernhard Beschow




Am 5. Juli 2023 16:50:28 UTC schrieb Joel Upham :
>I believe it might have been master unstable branch. Last commit before my
>patches was:
>
>commit 19a720b74fde7e859d19f12c66a72e545947a657
>Merge: c6a5fc2ac7 367189efae
>Author: Richard Henderson 
>Date:   Thu Jun 1 08:30:29 2023 -0700

Indeed!

I've rebased your series and changed the first commit of your series to only 
touch pc_q35.c: https://github.com/shentok/qemu/commits/q35-xen . Judging just 
from compilation my work on decoupling Xen and PIIX seems to have provided a 
good blueprint for the ICH9 LPC device model (no changes needed there).

How can one run Xen/Q35? I've tried running Xen/PC with the PIIX4 rather than 
the usual PIIX3 south bridge before which essentially only changes the PCI IDs. 
But that didn't work. With Q35/ICH9 the differences would be way bigger...

Thanks,
Bernhard

P.S.: I'm waiting for Xen to become compilable again with my Linux distribution 
such that I could add PIIX4 support to Xen.

>
>-Joel
>
>On Thu, Jun 22, 2023 at 1:11 PM Bernhard Beschow  wrote:
>
>>
>>
>> Am 20. Juni 2023 17:24:33 UTC schrieb Joel Upham :
>> >These are the Qemu changes needed to support the q35 chipset for xen
>> >I based the patches from 2017 found on the mailing list here:
>> >
>> https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg01176.html
>> >
>> >I have been using a version of these patches on Xen 4.16 with Qemu
>> >version 4.1 for over 6 months.  The guest VMs are very stable, and PCIe
>> >PT is working as was designed (all of the PCIe devices are on the root
>> >PCIe device).  I have successfully passed through GPUs, NICs, etc. I was
>> >asked by those in the community to attempt to once again upstream the
>> >patches.  I have them working with Seabios and OVMF (patches are needed
>> >to OVMF which I will be sending to the mailing list). The Qemu patches
>> >allow for the xenvbd to properly unplug the AHCI SATA device, and all
>> >xen pv windows drivers work as intended.
>> >
>> >I used the original author of the patches to get a majority of this to
>> work:
>> >Alexey Gerasimenko.  I fixed the patches to be in line with the upstream
>> >Qemu and Xen versions.  Any original issues may still exist; however, I
>> >am sure in time they can be improved. If the code doesn't exist then they
>> >can't be actively looked at by the community.
>> >
>> >I am not an expert on the Q35 chipset or PCIe technology.  This is my
>> >first patch to this mailing list.
>>
>> Patchew was unable to apply this series onto master:
>> https://patchew.org/QEMU/cover.1687278381.git.jupham...@gmail.com/ What
>> revision is the series based on?
>>
>> Can you rebase? Rebasing this series will probably cause quite some work
>> since it will simplify here and there, as indicated by Igor and by my
>> comments in "version zero" of this series.
>>
>> Best regards,
>> Bernhard
>>
>> >
>> >
>> >Joel Upham (23):
>> >  pc/xen: Xen Q35 support: provide IRQ handling for PCI devices
>> >  pc/q35: Apply PCI bus BSEL property for Xen PCI device hotplug
>> >  q35/acpi/xen: Provide ACPI PCI hotplug interface for Xen on Q35
>> >  q35/xen: Add Xen platform device support for Q35
>> >  q35: Fix incorrect values for PCIEXBAR masks
>> >  xen/pt: XenHostPCIDevice: provide functions for PCI Capabilities and
>> >PCIe Extended Capabilities enumeration
>> >  xen/pt: avoid reading PCIe device type and cap version multiple times
>> >  xen/pt: determine the legacy/PCIe mode for a passed through device
>> >  xen/pt: Xen PCIe passthrough support for Q35: bypass PCIe topology
>> >check
>> >  xen/pt: add support for PCIe Extended Capabilities and larger config
>> >space
>> >  xen/pt: handle PCIe Extended Capabilities Next register
>> >  xen/pt: allow to hide PCIe Extended Capabilities
>> >  xen/pt: add Vendor-specific PCIe Extended Capability descriptor and
>> >sizing
>> >  xen/pt: add fixed-size PCIe Extended Capabilities descriptors
>> >  xen/pt: add AER PCIe Extended Capability descriptor and sizing
>> >  xen/pt: add descriptors and size calculation for
>> >RCLD/ACS/PMUX/DPA/MCAST/TPH/DPC PCIe Extended Capabilities
>> >  xen/pt: add Resizable BAR PCIe Extended Capability descriptor and
>> >sizing
>> >  xen/pt: add VC/VC9/MFVC PCIe Extended Capabilities descriptors and
>> >sizing
>> >  xen/pt: Fake capability id
>> >  xen platform: unplug ahci object
>> >  pc/q35: setup q35 for xen
>> >  qdev-monitor/pt: bypass root device check
>> >  s3 support: enabling s3 with q35
>> >
>> > hw/acpi/ich9.c|   22 +-
>> > hw/acpi/pcihp.c   |6 +-
>> > hw/core/machine.c |   19 +
>> > hw/i386/pc_piix.c |3 +-
>> > hw/i386/pc_q35.c  |   39 +-
>> > hw/i386/xen/xen-hvm.c |7 +-
>> > hw/i386/xen/xen_platform.c|   19 +-
>> > hw/isa/lpc_ich9.c |   53 +-
>> > hw/isa/piix3.c|2 +-
>> > hw/pci-host/q35.c |   28 +-
>> > hw/pci/pci.c  |   17

Re: [PATCH v8 02/20] hw/riscv/virt.c: skip 'mmu-type' FDT if satp mode not set

On 7/5/23 19:12, Conor Dooley wrote:

On Wed, Jul 05, 2023 at 07:00:52PM -0300, Daniel Henrique Barboza wrote:

On 7/5/23 18:49, Conor Dooley wrote:

On Wed, Jul 05, 2023 at 06:39:37PM -0300, Daniel Henrique Barboza wrote:

The absence of a satp mode in riscv_host_cpu_init() is causing the
following error:

$ ./qemu/build/qemu-system-riscv64 -machine virt,accel=kvm \
-m 2G -smp 1 -nographic -snapshot \
-kernel ./guest_imgs/Image \
-initrd ./guest_imgs/rootfs_kvm_riscv64.img \
-append "earlycon=sbi root=/dev/ram rw" \
-cpu host
**
ERROR:../target/riscv/cpu.c:320:satp_mode_str: code should not be
reached
Bail out! ERROR:../target/riscv/cpu.c:320:satp_mode_str: code should
not be reached
Aborted

The error is triggered from create_fdt_socket_cpus() in hw/riscv/virt.c.
It's trying to get satp_mode_str for a NULL cpu->cfg.satp_mode.map.

For this KVM cpu we would need to inherit the satp supported modes
from the RISC-V host. At this moment this is not possible because the
KVM driver does not support it. And even when it does we can't just leave
this broken for every other older kernel.

Since mmu-type is not a required node, according to [1], skip the
'mmu-type' FDT node if there's no satp_mode set. We'll revisit this
logic when we can get satp information from KVM.

[1]
https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/cpu.yaml

I don't think this is the correct link to reference as backup, as the
generic binding sets out no requirements. I think you would want to link
to the RISC-V specific cpus binding.

You mean this link?

https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/riscv/cpus.yaml

Yeah, that's the correct file. Should probably have linked it, sorry
about that. And in case it was not clear, not suggesting that this would
require a resend, since the reasoning is correct.

I don't mind amending this in case we need another version for any other reason.
Otherwise we'll hope that Alistair will be a true, real gentleman and amend the
commit msg for us :D

That said, things like FreeBSD and U-Boot appear to require mmu-type
https://lore.kernel.org/all/20230705-fondue-bagginess-66c25f1a4135@spud/
so I am wondering if we should in fact make the mmu-type a required
property in the RISC-V specific binding.

To make it required, as far as QEMU is concerned, we'll need to assume a
default value for the 'host' CPU type (e.g. sv57). In the future we can read the
satp host value directly when/if KVM provides satp_mode via get_one_reg().

I dunno if assuming is the right thing to do, since it could be actively
wrong. Leaving it out, as you are doing here, is, IMO, nicer to those
guests. Once there's an API for it, I think it could then be added and
then the additional guests would be supported.

Makes sense. We'll revisit this piece of code when that API I sent today finds
its way upstream. Thanks,

Daniel

Thanks,
Conor.

RE: [PATCH v2] Hexagon: move GETPC() calls to top level helpers

2023-07-05 Thread ltaylorsimpson




> -Original Message-
> From: Matheus Tavares Bernardino 
> Sent: Wednesday, July 5, 2023 12:35 PM
> To: qemu-devel@nongnu.org
> Cc: quic_mathb...@quicinc.com; bc...@quicinc.com;
> ltaylorsimp...@gmail.com; quic_mlie...@quicinc.com;
> richard.hender...@linaro.org
> Subject: [PATCH v2] Hexagon: move GETPC() calls to top level helpers
> 
> As docs/devel/loads-stores.rst states:
> 
>   ``GETPC()`` should be used with great care: calling
>   it in other functions that are *not* the top level
>   ``HELPER(foo)`` will cause unexpected behavior. Instead, the
>   value of ``GETPC()`` should be read from the helper and passed
>   if needed to the functions that the helper calls.
> 
> Let's fix the GETPC() usage in Hexagon, making sure it's always called from
> top level helpers and passed down to the places where it's needed. There
> are two snippets where that is not currently the case:
> 
> - probe_store(), which is only called from two helpers, so it's easy to
>   move GETPC() up.
> 
> - mem_load*() functions, which are also called directly from helpers,
>   but through the MEM_LOAD*() set of macros. Note that these are only
>   used when compiling with --disable-hexagon-idef-parser.
> 
>   In this case, we also take this opportunity to simplify the code,
>   unifying the mem_load*() functions.
> 
> Signed-off-by: Matheus Tavares Bernardino 
> ---
> v1:
> d40fabcf9d6e92e4cd8d6a144e9b2a9acf4580dc.1688420966.git.quic_mathber
> n...@quicinc.com
> 
> Changes in v2:
> - Fixed wrong cpu_ld* unification from previous version.
> - Passed retaddr down to check_noshuf() and further, as Taylor
>   suggested.
> - Reorganized macros for simplification.
> 
>  target/hexagon/macros.h    | 19 ++--
>  target/hexagon/op_helper.h | 11 ++-
>  target/hexagon/op_helper.c | 62 +++---
>  3 files changed, 29 insertions(+), 63 deletions(-)
> 
> diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
> index 5451b061ee..e44a932434 100644
> --- a/target/hexagon/macros.h
> +++ b/target/hexagon/macros.h
> @@ -173,15 +173,6 @@
>  #define fLOAD(NUM, SIZE, SIGN, EA, DST) \
> -DST = (size##SIZE##SIGN##_t)MEM_LOAD##SIZE##SIGN(EA)
> +DST =  (size##SIZE##SIGN##_t)({ \
> +check_noshuf(env, pkt_has_store_s1, slot, EA, SIZE, GETPC()); \
> +MEM_LOAD##SIZE(env, EA, GETPC()); \
> +})
>  #endif

This should be formatted as
#define fLOAD(...) \
do { \
check_noshuf(...); \
DST = ...; \
} while (0)
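
For illustration only, here is a self-contained sketch of that shape.
mock_check() and mock_load() are hypothetical stand-ins for check_noshuf()
and the MEM_LOAD*() macros, and FLOAD_SKETCH is not the real fLOAD. The
do/while (0) wrapper is standard C and behaves as a single statement, whereas
the ({ ... }) statement expression is a GNU extension.

```c
#include <stdint.h>

static int mock_check_calls;

/* Hypothetical stand-in for check_noshuf(): just counts invocations. */
static void mock_check(uint32_t ea, uintptr_t ra)
{
    (void)ea;
    (void)ra;
    mock_check_calls++;
}

/* Hypothetical stand-in for a MEM_LOAD*() macro: returns a fake value. */
static uint32_t mock_load(uint32_t ea, uintptr_t ra)
{
    (void)ra;
    return ea + 1;
}

/* The check runs first, then the load result is assigned to DST. */
#define FLOAD_SKETCH(EA, DST, RA) \
    do { \
        mock_check((EA), (RA)); \
        (DST) = mock_load((EA), (RA)); \
    } while (0)

/* Safe in an unbraced if/else because the macro is one statement. */
static uint32_t load_if(int cond, uint32_t ea)
{
    uint32_t dst = 0;

    if (cond)
        FLOAD_SKETCH(ea, dst, 0);
    else
        dst = 0xffffffffu;
    return dst;
}
```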

> diff --git a/target/hexagon/op_helper.h b/target/hexagon/op_helper.h
> index 8f3764d15e..7744e819ef 100644
> --- a/target/hexagon/op_helper.h
> +++ b/target/hexagon/op_helper.h
> +void check_noshuf(CPUHexagonState *env, bool pkt_has_store_s1,
> +  uint32_t slot, target_ulong vaddr, int size,
> +uintptr_t ra);

Are you sure this needs to be non-static?


Otherwise
Reviewed-by: Taylor Simpson

Re: [PATCH v8 02/20] hw/riscv/virt.c: skip 'mmu-type' FDT if satp mode not set

2023-07-05 Thread Conor Dooley

On Wed, Jul 05, 2023 at 07:00:52PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 7/5/23 18:49, Conor Dooley wrote:
> > On Wed, Jul 05, 2023 at 06:39:37PM -0300, Daniel Henrique Barboza wrote:
> > > The absence of a satp mode in riscv_host_cpu_init() is causing the
> > > following error:
> > > 
> > > $ ./qemu/build/qemu-system-riscv64  -machine virt,accel=kvm \
> > >  -m 2G -smp 1  -nographic -snapshot \
> > >  -kernel ./guest_imgs/Image \
> > >  -initrd ./guest_imgs/rootfs_kvm_riscv64.img \
> > >  -append "earlycon=sbi root=/dev/ram rw" \
> > >  -cpu host
> > > **
> > > ERROR:../target/riscv/cpu.c:320:satp_mode_str: code should not be
> > > reached
> > > Bail out! ERROR:../target/riscv/cpu.c:320:satp_mode_str: code should
> > > not be reached
> > > Aborted
> > > 
> > > The error is triggered from create_fdt_socket_cpus() in hw/riscv/virt.c.
> > > It's trying to get satp_mode_str for a NULL cpu->cfg.satp_mode.map.
> > > 
> > > For this KVM cpu we would need to inherit the satp supported modes
> > > from the RISC-V host. At this moment this is not possible because the
> > > > KVM driver does not support it. And even when it does we can't just leave
> > > this broken for every other older kernel.
> > > 
> > > Since mmu-type is not a required node, according to [1], skip the
> > > 'mmu-type' FDT node if there's no satp_mode set. We'll revisit this
> > > logic when we can get satp information from KVM.
> > > 
> > > [1] 
> > > https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/cpu.yaml
> > 
> > I don't think this is the correct link to reference as backup, as the
> > generic binding sets out no requirements. I think you would want to link
> > to the RISC-V specific cpus binding.
> 
> You mean this link?
> 
> https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/riscv/cpus.yaml

Yeah, that's the correct file. Should probably have linked it, sorry
about that. And in case it was not clear, not suggesting that this would
require a resend, since the reasoning is correct.

> > That said, things like FreeBSD and U-Boot appear to require mmu-type
> > https://lore.kernel.org/all/20230705-fondue-bagginess-66c25f1a4135@spud/
> > so I am wondering if we should in fact make the mmu-type a required
> > property in the RISC-V specific binding.
> 
> 
> To make it required, as far as QEMU is concerned, we'll need to assume a
> default value for the 'host' CPU type (e.g. sv57). In the future we can read
> the satp host value directly when/if KVM provides satp_mode via get_one_reg().

I dunno if assuming is the right thing to do, since it could be actively
wrong. Leaving it out, as you are doing here, is, IMO, nicer to those
guests. Once there's an API for it, I think it could then be added and
then the additional guests would be supported.
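
For anyone following along, the "leave it out" approach amounts to something
like the sketch below. It is not the hw/riscv/virt.c code: MockNode,
mock_setprop_string() and emit_mmu_type() are hypothetical, and the satp mode
is passed in as a string that is NULL when it is unknown.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, minimal stand-in for an FDT cpu node. */
typedef struct MockNode {
    char mmu_type[32]; /* empty string means the property is absent */
} MockNode;

static void mock_setprop_string(MockNode *n, const char *val)
{
    snprintf(n->mmu_type, sizeof(n->mmu_type), "%s", val);
}

/*
 * Emit "mmu-type" only when a satp mode is known.  Returns 1 if the
 * property was written and 0 if it was skipped.
 */
static int emit_mmu_type(MockNode *n, const char *satp_mode)
{
    char buf[32];

    if (satp_mode == NULL) {
        return 0; /* e.g. '-cpu host' under KVM today: leave it out */
    }
    snprintf(buf, sizeof(buf), "riscv,%s", satp_mode);
    mock_setprop_string(n, buf);
    return 1;
}
```

Skipping the property avoids advertising a mode that could be wrong, at the
cost of guests that insist on mmu-type being present.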

Thanks,
Conor.



Re: [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns

Peter Xu  writes:

> Provide an explicit reason for qemu_file_shutdown()s, which can be
> displayed in query-migrate when used.
>

Can we consider this to cover the TODO:

 * TODO: convert to propagate Error objects instead of squashing
 * to a fixed errno value

or would that need something fancier?

> This will make e.g. migrate-pause to display explicit error descriptions,
> from:
>
> "error-desc": "Channel error: Input/output error"
>
> To:
>
> "error-desc": "Channel is explicitly shutdown by the user"
>
> in query-migrate.
>
> Signed-off-by: Peter Xu 
> ---
>  migration/qemu-file.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 419b4092e7..ff605027de 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
>   *  --> guest crash!
>   */
>  if (!f->last_error) {
> -qemu_file_set_error(f, -EIO);
> +Error *err = NULL;
> +
> +error_setg(&err, "Channel is explicitly shutdown by the user");

It is good that we can grep this message. However, I'm confused about
who the "user" is meant to be here and how are they implicated in this
error.

> +qemu_file_set_error_obj(f, -EIO, err);
>  }
>  
>  if (!qio_channel_has_feature(f->ioc,

Re: [PATCH v8 02/20] hw/riscv/virt.c: skip 'mmu-type' FDT if satp mode not set

On 7/5/23 18:49, Conor Dooley wrote:

On Wed, Jul 05, 2023 at 06:39:37PM -0300, Daniel Henrique Barboza wrote:

The absence of a satp mode in riscv_host_cpu_init() is causing the
following error:

The error is triggered from create_fdt_socket_cpus() in hw/riscv/virt.c.
It's trying to get satp_mode_str for a NULL cpu->cfg.satp_mode.map.

Since mmu-type is not a required node, according to [1], skip the
'mmu-type' FDT node if there's no satp_mode set. We'll revisit this
logic when we can get satp information from KVM.

[1]
https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/cpu.yaml

I don't think this is the correct link to reference as backup, as the
generic binding sets out no requirements. I think you would want to link
to the RISC-V specific cpus binding.

You mean this link?

https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/riscv/cpus.yaml

Thanks,

Daniel

Since nommu is covered by an mmu type of "riscv,none", I am kinda
struggling to think of a case where it should be left out (while
describing real hardware at least).

Cheers,
Conor.

Re: [PATCH v2 6/7] qemufile: Always return a verbose error

Peter Xu  writes:

> There're a lot of cases where we only have an errno set in last_error but
> without a detailed error description.  When this happens, try to generate
> an error that contains the errno as a descriptive error.
>
> This will be helpful in cases where one relies on the Error*.  E.g.,
> migration state only caches Error* in MigrationState.error.  With this,
> we'll display correct error messages in e.g. query-migrate when the error
> was only set by qemu_file_set_error().
>
> Signed-off-by: Peter Xu 
> ---
>  migration/qemu-file.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index acc282654a..419b4092e7 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -156,15 +156,24 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
>   *
>   * Return negative error value if there has been an error on previous
>   * operations, return 0 if no error happened.
> - * Optional, it returns Error* in errp, but it may be NULL even if return value
> - * is not 0.
>   *
> + * If errp is specified, a verbose error message will be copied over.
>   */
>  int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
>  {
> +if (!f->last_error) {
> +return 0;
> +}
> +
> +/* There is an error */
>  if (errp) {
> -*errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
> +if (f->last_error_obj) {
> +*errp = error_copy(f->last_error_obj);
> +} else {
> +error_setg_errno(errp, -f->last_error, "Channel error");

There are a couple of places that do:

ret = vmstate_save(f, se, ms->vmdesc);
if (ret) {
qemu_file_set_error(f, ret);
break;
}

and vmstate_save() can return > 0 on error. This would make this message
say "Unknown error". This is minor.

But take a look at qemu_fclose(). It can return f->last_error while the
function documentation says it should return negative on error.

Should we make qemu_file_set_error() check 'ret' and always set a
negative value for f->last_error?

Re: [PATCH v2] hw/ide/piix: properly initialize the BMIBA register

2023-07-05 Thread Bernhard Beschow




Am 5. Juli 2023 10:01:21 UTC schrieb Olaf Hering :
>Tue, 4 Jul 2023 08:38:33 +0200 Paolo Bonzini :
>
>> I agree that calling pci_device_reset() would be a better match for 
>> pci_xen_ide_unplug().
>
>This change works as well:

Nice!

>
>--- a/hw/i386/xen/xen_platform.c
>+++ b/hw/i386/xen/xen_platform.c
>@@ -164,8 +164,9 @@ static void pci_unplug_nics(PCIBus *bus)
>  *
>  * [1] 
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/hvm-emulated-unplug.pandoc
>  */
>-static void pci_xen_ide_unplug(DeviceState *dev, bool aux)
>+static void pci_xen_ide_unplug(PCIDevice *d, bool aux)
> {
>+DeviceState *dev = DEVICE(d);
> PCIIDEState *pci_ide;
> int i;
> IDEDevice *idedev;
>@@ -195,7 +196,7 @@ static void pci_xen_ide_unplug(DeviceState *dev, bool aux)
> blk_unref(blk);
> }
> }
>-device_cold_reset(dev);
>+pci_device_reset(d);
> }
> 
> static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque)
>@@ -210,7 +211,7 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque)
> 
> switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) {
> case PCI_CLASS_STORAGE_IDE:
>-pci_xen_ide_unplug(DEVICE(d), aux);
>+pci_xen_ide_unplug(d, aux);
> break;
> 
> case PCI_CLASS_STORAGE_SCSI:
>--- a/hw/ide/piix.c
>+++ b/hw/ide/piix.c
>@@ -118,7 +118,6 @@ static void piix_ide_reset(DeviceState *dev)
> pci_set_word(pci_conf + PCI_COMMAND, 0x0000);
> pci_set_word(pci_conf + PCI_STATUS,
>  PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK);
>-pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */

I wonder if we should fix this line rather than dropping it. pci_device_reset() 
calls pci_reset_regions() which unconditionally clears all BARs to zero. While 
that works for PIIX IDE the VIA IDE device model intends to set BARs to the IDE 
compatibility addresses during reset but pci_reset_regions() overwrites it with 
zeroes again. So I wonder if pci_reset_regions() should be dropped such that 
pci_update_mappings() resets the BARs to whatever they were set in reset.

Of course this won't be an easy change, but I wonder if it would be more
correct, especially since there seems to be no way to let the device model
have the last word. Any opinions/suggestions?
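
A toy model of the ordering I mean (not QEMU code; all names are
hypothetical, and 0x1f0 is just used as an example legacy IDE port address):

```c
#include <stdint.h>

#define MOCK_NUM_BARS 6

typedef struct MockPciDev {
    uint32_t bar[MOCK_NUM_BARS];
} MockPciDev;

/* Device model reset: wants BAR0 at a legacy compatibility address. */
static void mock_model_reset(MockPciDev *d)
{
    d->bar[0] = 0x1f0;
}

/* Generic step: unconditionally clears every BAR. */
static void mock_reset_regions(MockPciDev *d)
{
    for (int i = 0; i < MOCK_NUM_BARS; i++) {
        d->bar[i] = 0;
    }
}

/* Ordering as described: model reset first, generic clearing last. */
static void mock_pci_device_reset(MockPciDev *d)
{
    mock_model_reset(d);
    mock_reset_regions(d);
}
```

In this model whatever the device model writes during its own reset is lost,
because the generic clearing runs afterwards.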

Thanks,
Bernhard

> }
> 
> static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)
>
>
>Olaf

Re: [PATCH v8 02/20] hw/riscv/virt.c: skip 'mmu-type' FDT if satp mode not set

2023-07-05 Thread Conor Dooley

On Wed, Jul 05, 2023 at 06:39:37PM -0300, Daniel Henrique Barboza wrote:
> The absence of a satp mode in riscv_host_cpu_init() is causing the
> following error:
> 
> $ ./qemu/build/qemu-system-riscv64  -machine virt,accel=kvm \
> -m 2G -smp 1  -nographic -snapshot \
> -kernel ./guest_imgs/Image \
> -initrd ./guest_imgs/rootfs_kvm_riscv64.img \
> -append "earlycon=sbi root=/dev/ram rw" \
> -cpu host
> **
> ERROR:../target/riscv/cpu.c:320:satp_mode_str: code should not be
> reached
> Bail out! ERROR:../target/riscv/cpu.c:320:satp_mode_str: code should
> not be reached
> Aborted
> 
> The error is triggered from create_fdt_socket_cpus() in hw/riscv/virt.c.
> It's trying to get satp_mode_str for a NULL cpu->cfg.satp_mode.map.
> 
> For this KVM cpu we would need to inherit the satp supported modes
> from the RISC-V host. At this moment this is not possible because the
> KVM driver does not support it. And even when it does we can't just leave
> this broken for every other older kernel.
> 
> Since mmu-type is not a required node, according to [1], skip the
> 'mmu-type' FDT node if there's no satp_mode set. We'll revisit this
> logic when we can get satp information from KVM.
> 
> [1] 
> https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/cpu.yaml

I don't think this is the correct link to reference as backup, as the
generic binding sets out no requirements. I think you would want to link
to the RISC-V specific cpus binding.

That said, things like FreeBSD and U-Boot appear to require mmu-type
https://lore.kernel.org/all/20230705-fondue-bagginess-66c25f1a4135@spud/
so I am wondering if we should in fact make the mmu-type a required
property in the RISC-V specific binding.

Since nommu is covered by an mmu type of "riscv,none", I am kinda
struggling to think of a case where it should be left out (while
describing real hardware at least).

Cheers,
Conor.


[PATCH v8 00/20] target/riscv, KVM: fixes and enhancements

Hi,

This version has a last minute change in patch 14. It's a bug fix and a
design change.

The bug fix: ioctl() will always error out with -1 and return the error
code in 'errno'. I was checking the ioctl() return value for EINVAL,
which doesn't work.
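
As a minimal illustration of that pitfall, here is a sketch that uses a mock
in place of the real KVM ioctl. mock_ioctl(), reg_is_missing_buggy() and
reg_is_missing() are hypothetical names and do not appear in the series.

```c
#include <errno.h>

/* Mock of an ioctl()-style call: returns -1 and reports via errno. */
static int mock_ioctl(int want_errno)
{
    if (want_errno) {
        errno = want_errno;
        return -1;
    }
    return 0;
}

/* Wrong: the return value is only ever 0 or -1, never an errno code. */
static int reg_is_missing_buggy(int want_errno)
{
    int ret = mock_ioctl(want_errno);

    return ret == ENOENT || ret == -ENOENT;
}

/* Right: on -1, inspect errno to learn what went wrong. */
static int reg_is_missing(int want_errno)
{
    int ret = mock_ioctl(want_errno);

    return ret == -1 && errno == ENOENT;
}
```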

The design change has to do with discussions between Andrew and Anup and
myself in our internal Slack. RISC-V KVM is overusing the EINVAL error
code in set_one_reg() and get_one_reg() APIs, making it very hard for
userspace (such as QEMU, crosvm, etc) to tell what went wrong. In our
case, patch 14 is making an EINVAL assumption that we're not confident
about because this error code can mean almost anything.

We'll push for a KVM change in the next few days. As far as QEMU goes
we're going to do what we consider the right thing: check for ENOENT
instead of EINVAL in patch 14. The reason why we're doing this change
right now in QEMU, instead of waiting for KVM to change first, can be
better explained by Drew's comment in version 7 [1]:

" But, also as discussed internally, based on our upcoming plans to use
ENOENT for missing registers, we should change this check to be for
ENOENT now. While that may seem premature, I think it's OK, because
until a KVM which returns ENOENT for missing registers exists and is
used, QEMU command lines which disable unknown registers will be
rejected. But, that will also happen even after a KVM that returns
ENOENT exists if an older KVM is used. In both cases that's fine, as
rejecting is the more conservative behavior for an error. Finally, if
the yet-to-be-posted KVM ENOENT patch never gets merged, then we may be
stuck rejecting forever anyway, since EINVAL is quite generic and
probably isn't safe to use for this purpose."

Checking for ENOENT is the right approach and we'll change QEMU to
implement it right out of the gate for 8.1. In case KVM refuses to change
we'll error out in all cases in patch 14, which is still a better
solution than making guesses about what EINVAL means.


Series based on top of Alistair's riscv-to-apply.next.

Patches missing review: 14

Changes from v7:
- Patch 14:
  - use 'errno' to check the error code from ioctl()
  - test for ENOENT instead of EINVAL
- v7 link: 
https://lore.kernel.org/qemu-devel/20230630100811.287315-1-dbarb...@ventanamicro.com/

[1] https://lore.kernel.org/qemu-devel/20230705-091906904fcc54a4ce96e625@orel/

Daniel Henrique Barboza (20):
  target/riscv: skip features setup for KVM CPUs
  hw/riscv/virt.c: skip 'mmu-type' FDT if satp mode not set
  target/riscv/cpu.c: restrict 'mvendorid' value
  target/riscv/cpu.c: restrict 'mimpid' value
  target/riscv/cpu.c: restrict 'marchid' value
  target/riscv: use KVM scratch CPUs to init KVM properties
  target/riscv: read marchid/mimpid in kvm_riscv_init_machine_ids()
  target/riscv: handle mvendorid/marchid/mimpid for KVM CPUs
  linux-headers: Update to v6.4-rc1
  target/riscv/kvm.c: init 'misa_ext_mask' with scratch CPU
  target/riscv/cpu: add misa_ext_info_arr[]
  target/riscv: add KVM specific MISA properties
  target/riscv/kvm.c: update KVM MISA bits
  target/riscv/kvm.c: add multi-letter extension KVM properties
  target/riscv/cpu.c: add satp_mode properties earlier
  target/riscv/cpu.c: remove priv_ver check from riscv_isa_string_ext()
  target/riscv/cpu.c: create KVM mock properties
  target/riscv: update multi-letter extension KVM properties
  target/riscv/kvm.c: add kvmconfig_get_cfg_addr() helper
  target/riscv/kvm.c: read/write (cbom|cboz)_blocksize in KVM

 hw/riscv/virt.c   |  14 +-
 include/standard-headers/linux/const.h|   2 +-
 include/standard-headers/linux/virtio_blk.h   |  18 +-
 .../standard-headers/linux/virtio_config.h|   6 +
 include/standard-headers/linux/virtio_net.h   |   1 +
 linux-headers/asm-arm64/kvm.h |  33 ++
 linux-headers/asm-riscv/kvm.h |  53 +-
 linux-headers/asm-riscv/unistd.h  |   9 +
 linux-headers/asm-s390/unistd_32.h|   1 +
 linux-headers/asm-s390/unistd_64.h|   1 +
 linux-headers/asm-x86/kvm.h   |   3 +
 linux-headers/linux/const.h   |   2 +-
 linux-headers/linux/kvm.h |  12 +-
 linux-headers/linux/psp-sev.h |   7 +
 linux-headers/linux/userfaultfd.h |  17 +-
 target/riscv/cpu.c| 341 ++--
 target/riscv/cpu.h|   7 +-
 target/riscv/kvm.c| 499 +-
 target/riscv/kvm_riscv.h  |   1 +
 19 files changed, 940 insertions(+), 87 deletions(-)

-- 
2.41.0

[PATCH v8 14/20] target/riscv/kvm.c: add multi-letter extension KVM properties

Let's add KVM user properties for the multi-letter extensions that KVM
currently supports: zicbom, zicboz, zihintpause, zbb, ssaia, sstc,
svinval and svpbmt.

As with MISA extensions, we're using the KVMCPUConfig type to hold
information about the state of each extension. However, multi-letter
extensions have more cases to cover than MISA extensions, so we're
adding an extra 'supported' flag as well. This flag will reflect if a
given extension is supported by KVM, i.e. KVM knows how to handle it.
This is determined during KVM extension discovery in
kvm_riscv_init_multiext_cfg(), where we test for ENOENT errors. Any
other error will cause an abort.

The use of the 'user_set' flag is similar to what we already do with MISA
extensions: the flag is set only if the user is changing the extension
state.

The 'supported' flag will be used later on to make an exception for
users that are disabling multi-letter extensions that are unknown to
KVM.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c |   8 +++
 target/riscv/kvm.c | 119 +
 2 files changed, 127 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 5c8832a030..31e591a938 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1860,6 +1860,14 @@ static void riscv_cpu_add_user_properties(Object *obj)
 riscv_cpu_add_misa_properties(obj);
 
 for (prop = riscv_cpu_extensions; prop && prop->name; prop++) {
+#ifndef CONFIG_USER_ONLY
+if (kvm_enabled()) {
+/* Check if KVM created the property already */
+if (object_property_find(obj, prop->name)) {
+continue;
+}
+}
+#endif
 qdev_property_add_static(dev, prop);
 }
 
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 7afd6024e6..f2545bd560 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -113,6 +113,7 @@ typedef struct KVMCPUConfig {
 target_ulong offset;
 int kvm_reg_id;
 bool user_set;
+bool supported;
 } KVMCPUConfig;
 
 #define KVM_MISA_CFG(_bit, _reg_id) \
@@ -197,6 +198,81 @@ static void kvm_riscv_update_cpu_misa_ext(RISCVCPU *cpu, 
CPUState *cs)
 }
 }
 
+#define CPUCFG(_prop) offsetof(struct RISCVCPUConfig, _prop)
+
+#define KVM_EXT_CFG(_name, _prop, _reg_id) \
+{.name = _name, .offset = CPUCFG(_prop), \
+ .kvm_reg_id = _reg_id}
+
+static KVMCPUConfig kvm_multi_ext_cfgs[] = {
+KVM_EXT_CFG("zicbom", ext_icbom, KVM_RISCV_ISA_EXT_ZICBOM),
+KVM_EXT_CFG("zicboz", ext_icboz, KVM_RISCV_ISA_EXT_ZICBOZ),
+KVM_EXT_CFG("zihintpause", ext_zihintpause, KVM_RISCV_ISA_EXT_ZIHINTPAUSE),
+KVM_EXT_CFG("zbb", ext_zbb, KVM_RISCV_ISA_EXT_ZBB),
+KVM_EXT_CFG("ssaia", ext_ssaia, KVM_RISCV_ISA_EXT_SSAIA),
+KVM_EXT_CFG("sstc", ext_sstc, KVM_RISCV_ISA_EXT_SSTC),
+KVM_EXT_CFG("svinval", ext_svinval, KVM_RISCV_ISA_EXT_SVINVAL),
+KVM_EXT_CFG("svpbmt", ext_svpbmt, KVM_RISCV_ISA_EXT_SVPBMT),
+};
+
+static void kvm_cpu_cfg_set(RISCVCPU *cpu, KVMCPUConfig *multi_ext,
+uint32_t val)
+{
+int cpu_cfg_offset = multi_ext->offset;
+bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
+
+*ext_enabled = val;
+}
+
+static uint32_t kvm_cpu_cfg_get(RISCVCPU *cpu,
+KVMCPUConfig *multi_ext)
+{
+int cpu_cfg_offset = multi_ext->offset;
+bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
+
+return *ext_enabled;
+}
+
+static void kvm_cpu_set_multi_ext_cfg(Object *obj, Visitor *v,
+  const char *name,
+  void *opaque, Error **errp)
+{
+KVMCPUConfig *multi_ext_cfg = opaque;
+RISCVCPU *cpu = RISCV_CPU(obj);
+bool value, host_val;
+
+if (!visit_type_bool(v, name, &value, errp)) {
+return;
+}
+
+host_val = kvm_cpu_cfg_get(cpu, multi_ext_cfg);
+
+/*
+ * Ignore if the user is setting the same value
+ * as the host.
+ */
+if (value == host_val) {
+return;
+}
+
+if (!multi_ext_cfg->supported) {
+/*
+ * Error out if the user is trying to enable an
+ * extension that KVM doesn't support. Ignore
+ * option otherwise.
+ */
+if (value) {
+error_setg(errp, "KVM does not support enabling extension %s",
+   multi_ext_cfg->name);
+}
+
+return;
+}
+
+multi_ext_cfg->user_set = true;
+kvm_cpu_cfg_set(cpu, multi_ext_cfg, value);
+}
+
 static void kvm_riscv_add_cpu_user_properties(Object *cpu_obj)
 {
 int i;
@@ -215,6 +291,15 @@ static void kvm_riscv_add_cpu_user_properties(Object 
*cpu_obj)
 object_property_set_description(cpu_obj, misa_cfg->name,
 misa_cfg->description);
 }
+
+for (i = 0; i < ARRAY_SIZE(kvm_multi_ext_cfgs); i++) {
+KVMCPUConfig *multi_cfg = &kvm_multi_ext_cfgs[i];
+
+object_property_add(cpu_obj, multi_cfg->name, "bool",
+

[PATCH v8 20/20] target/riscv/kvm.c: read/write (cbom|cboz)_blocksize in KVM

If we don't set a proper cbom_blocksize|cboz_blocksize in the FDT the
Linux Kernel will fail to detect the availability of the CBOM/CBOZ
extensions, regardless of the contents of the 'riscv,isa' DT prop.

The FDT is being written using the cpu->cfg.cbom|z_blocksize attributes,
so let's expose them as user properties like it is already done with
TCG.

This will also require us to determine proper blocksize values during
init() time since the FDT is already created during realize(). We'll
take a ride in kvm_riscv_init_multiext_cfg() to do it. Note that we
don't need to fetch both cbom and cboz blocksizes every time: check for
their parent extensions (icbom and icboz) and only read the blocksizes
if needed.

In contrast with cbom|z_blocksize properties from TCG, the user is not
able to set any value that is different from the 'host' value when
running KVM. KVM can be particularly harsh dealing with it: an ENOTSUPP
can be thrown for the mere attempt of executing kvm_set_one_reg() for
these 2 regs.

Fortunately, we don't need to call kvm_set_one_reg() for these regs.
We'll check if the user input matches the host value in
kvm_cpu_set_cbomz_blksize(), the set() accessor for both blocksize
properties, and fail fast on a mismatch since setting a different value
is already known to not be supported.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
---
 target/riscv/kvm.c | 70 ++
 1 file changed, 70 insertions(+)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index d503e03078..659942fded 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -276,6 +276,42 @@ static void kvm_cpu_set_multi_ext_cfg(Object *obj, Visitor 
*v,
 kvm_cpu_cfg_set(cpu, multi_ext_cfg, value);
 }
 
+static KVMCPUConfig kvm_cbom_blocksize = {
+.name = "cbom_blocksize",
+.offset = CPUCFG(cbom_blocksize),
+.kvm_reg_id = KVM_REG_RISCV_CONFIG_REG(zicbom_block_size)
+};
+
+static KVMCPUConfig kvm_cboz_blocksize = {
+.name = "cboz_blocksize",
+.offset = CPUCFG(cboz_blocksize),
+.kvm_reg_id = KVM_REG_RISCV_CONFIG_REG(zicboz_block_size)
+};
+
+static void kvm_cpu_set_cbomz_blksize(Object *obj, Visitor *v,
+  const char *name,
+  void *opaque, Error **errp)
+{
+KVMCPUConfig *cbomz_cfg = opaque;
+RISCVCPU *cpu = RISCV_CPU(obj);
+uint16_t value, *host_val;
+
+if (!visit_type_uint16(v, name, &value, errp)) {
+return;
+}
+
+host_val = kvmconfig_get_cfg_addr(cpu, cbomz_cfg);
+
+if (value != *host_val) {
+error_report("Unable to set %s to a different value than "
+ "the host (%u)",
+ cbomz_cfg->name, *host_val);
+exit(EXIT_FAILURE);
+}
+
+cbomz_cfg->user_set = true;
+}
+
 static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU *cpu, CPUState *cs)
 {
 CPURISCVState *env = &cpu->env;
@@ -329,6 +365,14 @@ static void kvm_riscv_add_cpu_user_properties(Object 
*cpu_obj)
 kvm_cpu_set_multi_ext_cfg,
 NULL, multi_cfg);
 }
+
+object_property_add(cpu_obj, "cbom_blocksize", "uint16",
+NULL, kvm_cpu_set_cbomz_blksize,
+NULL, &kvm_cbom_blocksize);
+
+object_property_add(cpu_obj, "cboz_blocksize", "uint16",
+NULL, kvm_cpu_set_cbomz_blksize,
+NULL, &kvm_cboz_blocksize);
 }
 
 static int kvm_riscv_get_regs_core(CPUState *cs)
@@ -644,6 +688,24 @@ static void kvm_riscv_init_misa_ext_mask(RISCVCPU *cpu,
 env->misa_ext = env->misa_ext_mask;
 }
 
+static void kvm_riscv_read_cbomz_blksize(RISCVCPU *cpu, KVMScratchCPU *kvmcpu,
+ KVMCPUConfig *cbomz_cfg)
+{
+CPURISCVState *env = &cpu->env;
+struct kvm_one_reg reg;
+int ret;
+
+reg.id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
+  cbomz_cfg->kvm_reg_id);
+reg.addr = (uint64_t)kvmconfig_get_cfg_addr(cpu, cbomz_cfg);
+ret = ioctl(kvmcpu->cpufd, KVM_GET_ONE_REG, &reg);
+if (ret != 0) {
+error_report("Unable to read KVM reg %s, error %d",
+ cbomz_cfg->name, ret);
+exit(EXIT_FAILURE);
+}
+}
+
 static void kvm_riscv_init_multiext_cfg(RISCVCPU *cpu, KVMScratchCPU *kvmcpu)
 {
 CPURISCVState *env = &cpu->env;
@@ -675,6 +737,14 @@ static void kvm_riscv_init_multiext_cfg(RISCVCPU *cpu, 
KVMScratchCPU *kvmcpu)
 
 kvm_cpu_cfg_set(cpu, multi_ext_cfg, val);
 }
+
+if (cpu->cfg.ext_icbom) {
+kvm_riscv_read_cbomz_blksize(cpu, kvmcpu, &kvm_cbom_blocksize);
+}
+
+if (cpu->cfg.ext_icboz) {
+kvm_riscv_read_cbomz_blksize(cpu, kvmcpu, &kvm_cboz_blocksize);
+}
 }
 
 void kvm_riscv_init_user_properties(Object *cpu_obj)
-- 
2.41.0

[PATCH v8 03/20] target/riscv/cpu.c: restrict 'mvendorid' value

We're going to change the handling of mvendorid/marchid/mimpid by the
KVM driver. Since these are always present in all CPUs let's put the
same validation for everyone.

It doesn't make sense to allow 'mvendorid' to differ from the value
already set in named (vendor) CPUs. Generic (dynamic) CPUs can have
any 'mvendorid' they want.

Change 'mvendorid' to be a class property created via
'object_class_property_add', instead of using the DEFINE_PROP_UINT32()
macro. This allows us to define a custom setter for it that will verify,
for named CPUs, if mvendorid is different from the value already set by
the CPU. This is the error thrown for the 'veyron-v1' CPU if 'mvendorid'
is set to an invalid value:

$ qemu-system-riscv64 -M virt -nographic -cpu veyron-v1,mvendorid=2
qemu-system-riscv64: can't apply global veyron-v1-riscv-cpu.mvendorid=2:
Unable to change veyron-v1-riscv-cpu mvendorid (0x61f)

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.c | 38 +-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 6232e6513b..a778241d9f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1812,7 +1812,6 @@ static void riscv_cpu_add_user_properties(Object *obj)
 static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("debug", RISCVCPU, cfg.debug, true),
 
-DEFINE_PROP_UINT32("mvendorid", RISCVCPU, cfg.mvendorid, 0),
 DEFINE_PROP_UINT64("marchid", RISCVCPU, cfg.marchid, RISCV_CPU_MARCHID),
 DEFINE_PROP_UINT64("mimpid", RISCVCPU, cfg.mimpid, RISCV_CPU_MIMPID),
 
@@ -1899,6 +1898,40 @@ static const struct TCGCPUOps riscv_tcg_ops = {
 #endif /* !CONFIG_USER_ONLY */
 };
 
+static bool riscv_cpu_is_dynamic(Object *cpu_obj)
+{
+return object_dynamic_cast(cpu_obj, TYPE_RISCV_DYNAMIC_CPU) != NULL;
+}
+
+static void cpu_set_mvendorid(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+bool dynamic_cpu = riscv_cpu_is_dynamic(obj);
+RISCVCPU *cpu = RISCV_CPU(obj);
+uint32_t prev_val = cpu->cfg.mvendorid;
+uint32_t value;
+
+if (!visit_type_uint32(v, name, &value, errp)) {
+return;
+}
+
+if (!dynamic_cpu && prev_val != value) {
+error_setg(errp, "Unable to change %s mvendorid (0x%x)",
+   object_get_typename(obj), prev_val);
+return;
+}
+
+cpu->cfg.mvendorid = value;
+}
+
+static void cpu_get_mvendorid(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+uint32_t value = RISCV_CPU(obj)->cfg.mvendorid;
+
+visit_type_uint32(v, name, &value, errp);
+}
+
 static void riscv_cpu_class_init(ObjectClass *c, void *data)
 {
 RISCVCPUClass *mcc = RISCV_CPU_CLASS(c);
@@ -1930,6 +1963,9 @@ static void riscv_cpu_class_init(ObjectClass *c, void 
*data)
 cc->gdb_get_dynamic_xml = riscv_gdb_get_dynamic_xml;
 cc->tcg_ops = &riscv_tcg_ops;
 
+object_class_property_add(c, "mvendorid", "uint32", cpu_get_mvendorid,
+  cpu_set_mvendorid, NULL, NULL);
+
 device_class_set_props(dc, riscv_cpu_properties);
 }
 
-- 
2.41.0

[PATCH v8 13/20] target/riscv/kvm.c: update KVM MISA bits

Our design philosophy with KVM properties can be summarized in two main
decisions based on KVM interface availability and what the user wants to
do:

- if the user disables an extension that the host KVM module doesn't
know about (i.e. it doesn't implement the kvm_get_one_reg() interface),
keep booting the CPU. This will avoid users having to deal with issues
with older KVM versions while disabling features they don't care about;

- for any other case we're going to error out immediately. If the user
wants to enable a feature that KVM doesn't know about, this is a problem
that is worth aborting for - the user must know that the feature wasn't
enabled in the hart. Likewise, if KVM knows about the extension, the user
wants to enable/disable it, and we fail to do so, that's also a problem
we can't shrug off.

In the case of MISA bits we won't even try enabling bits that aren't
already available in the host. The ioctl() is so likely to fail that
it's not worth trying. This check is already done in the previous patch,
in kvm_cpu_set_misa_ext_cfg(), thus we don't need to worry about it now.

In kvm_riscv_update_cpu_misa_ext() we'll go through every potential user
option and do as follows:

- if the user didn't set the property, or set it to the same value as
the host, do nothing;

- otherwise, disable the given extension in KVM. Error out if anything
goes wrong.
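
The two steps above can be sketched as a small loop. This is a model under
stated assumptions: 'misa_cfg', 'reg_writer', 'misa_apply' and
'fake_write_ok' are hypothetical names, and the register write is
abstracted behind a callback so the logic can be followed without KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct misa_cfg {
    uint32_t bit;      /* MISA bit, e.g. 1 << ('c' - 'a') */
    bool user_set;     /* true only if the user asked to disable it */
};

/* Hypothetical register writer: returns 0 on success. */
typedef int (*reg_writer)(uint32_t bit, uint64_t val);

static int misa_apply(uint32_t *misa_ext, const struct misa_cfg *cfgs,
                      size_t n, reg_writer write_reg)
{
    assert(misa_ext != NULL && write_reg != NULL);

    for (size_t i = 0; i < n; i++) {
        if (!cfgs[i].user_set) {
            continue;               /* untouched, or same as the host */
        }
        if (write_reg(cfgs[i].bit, 0) != 0) {
            return -1;              /* caller treats this as fatal */
        }
        *misa_ext &= ~cfgs[i].bit;
    }
    return 0;
}

/* Stand-in writer that always succeeds, for demonstration only. */
static int fake_write_ok(uint32_t bit, uint64_t val)
{
    (void)bit;
    (void)val;
    return 0;
}
```

The local copy of the MISA mask is only updated after the write succeeds,
so a failure never leaves the two views out of sync.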

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
---
 target/riscv/kvm.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index c55d0ec7ab..7afd6024e6 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -162,6 +162,41 @@ static void kvm_cpu_set_misa_ext_cfg(Object *obj, Visitor 
*v,
"enabled in the host", misa_ext_cfg->name);
 }
 
+static void kvm_riscv_update_cpu_misa_ext(RISCVCPU *cpu, CPUState *cs)
+{
+CPURISCVState *env = &cpu->env;
+uint64_t id, reg;
+int i, ret;
+
+for (i = 0; i < ARRAY_SIZE(kvm_misa_ext_cfgs); i++) {
+KVMCPUConfig *misa_cfg = &kvm_misa_ext_cfgs[i];
+target_ulong misa_bit = misa_cfg->offset;
+
+if (!misa_cfg->user_set) {
+continue;
+}
+
+/* If we're here we're going to disable the MISA bit */
+reg = 0;
+id = kvm_riscv_reg_id(env, KVM_REG_RISCV_ISA_EXT,
+  misa_cfg->kvm_reg_id);
+ret = kvm_set_one_reg(cs, id, &reg);
+if (ret != 0) {
+/*
+ * We're not checking for -EINVAL because if the bit is about
+ * to be disabled, it means that it was already enabled by
+ * KVM. We determined that by fetching the 'isa' register
+ * during init() time. Any error at this point is worth
+ * aborting.
+ */
+error_report("Unable to set KVM reg %s, error %d",
+ misa_cfg->name, ret);
+exit(EXIT_FAILURE);
+}
+env->misa_ext &= ~misa_bit;
+}
+}
+
 static void kvm_riscv_add_cpu_user_properties(Object *cpu_obj)
 {
 int i;
@@ -632,8 +667,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 if (!object_dynamic_cast(OBJECT(cpu), TYPE_RISCV_CPU_HOST)) {
 ret = kvm_vcpu_set_machine_ids(cpu, cs);
+if (ret != 0) {
+return ret;
+}
 }
 
+kvm_riscv_update_cpu_misa_ext(cpu, cs);
+
 return ret;
 }
 
-- 
2.41.0

[PATCH v8 02/20] hw/riscv/virt.c: skip 'mmu-type' FDT if satp mode not set

The absence of a satp mode in riscv_host_cpu_init() is causing the
following error:

$ ./qemu/build/qemu-system-riscv64  -machine virt,accel=kvm \
-m 2G -smp 1  -nographic -snapshot \
-kernel ./guest_imgs/Image \
-initrd ./guest_imgs/rootfs_kvm_riscv64.img \
-append "earlycon=sbi root=/dev/ram rw" \
-cpu host
**
ERROR:../target/riscv/cpu.c:320:satp_mode_str: code should not be
reached
Bail out! ERROR:../target/riscv/cpu.c:320:satp_mode_str: code should
not be reached
Aborted

The error is triggered from create_fdt_socket_cpus() in hw/riscv/virt.c.
It's trying to get satp_mode_str for a NULL cpu->cfg.satp_mode.map.

For this KVM cpu we would need to inherit the satp supported modes
from the RISC-V host. At this moment this is not possible because the
KVM driver does not support it. And even when it does we can't just
leave this broken for every older kernel.

Since mmu-type is not a required node, according to [1], skip the
'mmu-type' FDT node if there's no satp_mode set. We'll revisit this
logic when we can get satp information from KVM.
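
The guard can be sketched in isolation as below. This is an illustration,
not the virt.c code: 'highest_bit' and 'mmu_type' are hypothetical
helpers, the bitmap layout (one bit per satp MODE encoding) is an
assumption, and only the RV64 modes Sv39/Sv48/Sv57 are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* satp MODE encodings from the RISC-V privileged spec (RV64). */
enum { SATP_SV39 = 8, SATP_SV48 = 9, SATP_SV57 = 10 };

/* Index of the highest set bit in a non-zero bitmap. */
static unsigned highest_bit(uint32_t map)
{
    unsigned idx = 0;

    assert(map != 0);
    while ((map >>= 1) != 0) {
        idx++;
    }
    return idx;
}

/*
 * Return the "mmu-type" value, or NULL when the node should be skipped
 * because no satp mode information is available.
 */
static const char *mmu_type(uint32_t supported, uint32_t map)
{
    if (supported == 0 || map == 0) {
        return NULL;
    }
    switch (highest_bit(map)) {
    case SATP_SV39: return "riscv,sv39";
    case SATP_SV48: return "riscv,sv48";
    case SATP_SV57: return "riscv,sv57";
    default:        return NULL;
    }
}
```

A caller would add the FDT property only when the return value is not
NULL, which mirrors skipping the optional node.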

[1] 
https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/cpu.yaml

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
Reviewed-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/riscv/virt.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 8ff4b5fd71..ee77b005ef 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -244,13 +244,13 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, int 
socket,
 s->soc[socket].hartid_base + cpu);
 qemu_fdt_add_subnode(ms->fdt, cpu_name);
 
-satp_mode_max = satp_mode_max_from_map(
-s->soc[socket].harts[cpu].cfg.satp_mode.map);
-sv_name = g_strdup_printf("riscv,%s",
-  satp_mode_str(satp_mode_max, is_32_bit));
-qemu_fdt_setprop_string(ms->fdt, cpu_name, "mmu-type", sv_name);
-g_free(sv_name);
-
+if (cpu_ptr->cfg.satp_mode.supported != 0) {
+satp_mode_max = satp_mode_max_from_map(cpu_ptr->cfg.satp_mode.map);
+sv_name = g_strdup_printf("riscv,%s",
+  satp_mode_str(satp_mode_max, is_32_bit));
+qemu_fdt_setprop_string(ms->fdt, cpu_name, "mmu-type", sv_name);
+g_free(sv_name);
+}
 
 name = riscv_isa_string(cpu_ptr);
 qemu_fdt_setprop_string(ms->fdt, cpu_name, "riscv,isa", name);
-- 
2.41.0

[PATCH v8 10/20] target/riscv/kvm.c: init 'misa_ext_mask' with scratch CPU

At this moment we're retrieving env->misa_ext during
kvm_arch_init_vcpu(), leaving env->misa_ext_mask behind.

We want to set env->misa_ext_mask, and we want to set it as early as
possible. The reason is that we're going to use it in the validation
process of the KVM MISA properties we're going to add next. Setting it
during kvm_arch_init_vcpu() is too late for user validation.

Move the code to a new helper that is going to be called during init()
time, via kvm_riscv_init_user_properties(), like we're already doing for
the machine ID properties. Set both misa_ext and misa_ext_mask to the
same value retrieved by the 'isa' config reg.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
Acked-by: Alistair Francis 
---
 target/riscv/kvm.c | 34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 602727cdfd..4d0808cb9a 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -396,6 +396,28 @@ static void kvm_riscv_init_machine_ids(RISCVCPU *cpu, 
KVMScratchCPU *kvmcpu)
 }
 }
 
+static void kvm_riscv_init_misa_ext_mask(RISCVCPU *cpu,
+ KVMScratchCPU *kvmcpu)
+{
+CPURISCVState *env = &cpu->env;
+struct kvm_one_reg reg;
+int ret;
+
+reg.id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
+  KVM_REG_RISCV_CONFIG_REG(isa));
+reg.addr = (uint64_t)&env->misa_ext_mask;
+ret = ioctl(kvmcpu->cpufd, KVM_GET_ONE_REG, &reg);
+
+if (ret) {
+error_report("Unable to fetch ISA register from KVM, "
+ "error %d", ret);
+kvm_riscv_destroy_scratch_vcpu(kvmcpu);
+exit(EXIT_FAILURE);
+}
+
+env->misa_ext = env->misa_ext_mask;
+}
+
 void kvm_riscv_init_user_properties(Object *cpu_obj)
 {
 RISCVCPU *cpu = RISCV_CPU(cpu_obj);
@@ -406,6 +428,7 @@ void kvm_riscv_init_user_properties(Object *cpu_obj)
 }
 
 kvm_riscv_init_machine_ids(cpu, &kvmcpu);
+kvm_riscv_init_misa_ext_mask(cpu, &kvmcpu);
 
 kvm_riscv_destroy_scratch_vcpu(&kvmcpu);
 }
@@ -525,21 +548,10 @@ static int kvm_vcpu_set_machine_ids(RISCVCPU *cpu, 
CPUState *cs)
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 int ret = 0;
-target_ulong isa;
 RISCVCPU *cpu = RISCV_CPU(cs);
-CPURISCVState *env = &cpu->env;
-uint64_t id;
 
 qemu_add_vm_change_state_handler(kvm_riscv_vm_state_change, cs);
 
-id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
-  KVM_REG_RISCV_CONFIG_REG(isa));
-ret = kvm_get_one_reg(cs, id, &isa);
-if (ret) {
-return ret;
-}
-env->misa_ext = isa;
-
 if (!object_dynamic_cast(OBJECT(cpu), TYPE_RISCV_CPU_HOST)) {
 ret = kvm_vcpu_set_machine_ids(cpu, cs);
 }
-- 
2.41.0

[PATCH v8 09/20] linux-headers: Update to v6.4-rc1

Update to commit ac9a78681b92 ("Linux 6.4-rc1").

Signed-off-by: Daniel Henrique Barboza 
Acked-by: Alistair Francis 
---
 include/standard-headers/linux/const.h|  2 +-
 include/standard-headers/linux/virtio_blk.h   | 18 +++
 .../standard-headers/linux/virtio_config.h|  6 +++
 include/standard-headers/linux/virtio_net.h   |  1 +
 linux-headers/asm-arm64/kvm.h | 33 
 linux-headers/asm-riscv/kvm.h | 53 ++-
 linux-headers/asm-riscv/unistd.h  |  9 
 linux-headers/asm-s390/unistd_32.h|  1 +
 linux-headers/asm-s390/unistd_64.h|  1 +
 linux-headers/asm-x86/kvm.h   |  3 ++
 linux-headers/linux/const.h   |  2 +-
 linux-headers/linux/kvm.h | 12 +++--
 linux-headers/linux/psp-sev.h |  7 +++
 linux-headers/linux/userfaultfd.h | 17 +-
 14 files changed, 149 insertions(+), 16 deletions(-)

diff --git a/include/standard-headers/linux/const.h 
b/include/standard-headers/linux/const.h
index 5e48987251..1eb84b5087 100644
--- a/include/standard-headers/linux/const.h
+++ b/include/standard-headers/linux/const.h
@@ -28,7 +28,7 @@
 #define _BITUL(x)  (_UL(1) << (x))
 #define _BITULL(x) (_ULL(1) << (x))
 
-#define __ALIGN_KERNEL(x, a)   __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 
1)
+#define __ALIGN_KERNEL(x, a)   __ALIGN_KERNEL_MASK(x, 
(__typeof__(x))(a) - 1)
 #define __ALIGN_KERNEL_MASK(x, mask)   (((x) + (mask)) & ~(mask))
 
 #define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
diff --git a/include/standard-headers/linux/virtio_blk.h 
b/include/standard-headers/linux/virtio_blk.h
index 7155b1a470..d7be3cf5e4 100644
--- a/include/standard-headers/linux/virtio_blk.h
+++ b/include/standard-headers/linux/virtio_blk.h
@@ -138,11 +138,11 @@ struct virtio_blk_config {
 
/* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
struct virtio_blk_zoned_characteristics {
-   uint32_t zone_sectors;
-   uint32_t max_open_zones;
-   uint32_t max_active_zones;
-   uint32_t max_append_sectors;
-   uint32_t write_granularity;
+   __virtio32 zone_sectors;
+   __virtio32 max_open_zones;
+   __virtio32 max_active_zones;
+   __virtio32 max_append_sectors;
+   __virtio32 write_granularity;
uint8_t model;
uint8_t unused2[3];
} zoned;
@@ -239,11 +239,11 @@ struct virtio_blk_outhdr {
  */
 struct virtio_blk_zone_descriptor {
/* Zone capacity */
-   uint64_t z_cap;
+   __virtio64 z_cap;
/* The starting sector of the zone */
-   uint64_t z_start;
+   __virtio64 z_start;
/* Zone write pointer position in sectors */
-   uint64_t z_wp;
+   __virtio64 z_wp;
/* Zone type */
uint8_t z_type;
/* Zone state */
@@ -252,7 +252,7 @@ struct virtio_blk_zone_descriptor {
 };
 
 struct virtio_blk_zone_report {
-   uint64_t nr_zones;
+   __virtio64 nr_zones;
uint8_t reserved[56];
struct virtio_blk_zone_descriptor zones[];
 };
diff --git a/include/standard-headers/linux/virtio_config.h 
b/include/standard-headers/linux/virtio_config.h
index 965ee6ae23..8a7d0dc8b0 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -97,6 +97,12 @@
  */
 #define VIRTIO_F_SR_IOV 37
 
+/*
+ * This feature indicates that the driver passes extra data (besides
+ * identifying the virtqueue) in its device notifications.
+ */
+#define VIRTIO_F_NOTIFICATION_DATA 38
+
 /*
  * This feature indicates that the driver can reset a queue individually.
  */
diff --git a/include/standard-headers/linux/virtio_net.h 
b/include/standard-headers/linux/virtio_net.h
index c0e797067a..2325485f2c 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -61,6 +61,7 @@
 #define VIRTIO_NET_F_GUEST_USO6 55  /* Guest can handle USOv6 in. */
 #define VIRTIO_NET_F_HOST_USO  56  /* Host can handle USO in. */
 #define VIRTIO_NET_F_HASH_REPORT  57   /* Supports hash report */
+#define VIRTIO_NET_F_GUEST_HDRLEN  59  /* Guest provides the exact hdr_len 
value. */
 #define VIRTIO_NET_F_RSS 60  /* Supports RSS RX steering */
 #define VIRTIO_NET_F_RSC_EXT 61  /* extended coalescing info */
 #define VIRTIO_NET_F_STANDBY 62/* Act as standby for another device
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index d7e7bb885e..38e5957526 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -198,6 +198,15 @@ struct kvm_arm_copy_mte_tags {
__u64 reserved[2];
 };
 
+/*
+ * Counter/Timer offset structure. Describe the virtual/physical offset.
+ * To be used with KVM_ARM_SET_COUNTER_OFFSET.
+ */

[PATCH v8 08/20] target/riscv: handle mvendorid/marchid/mimpid for KVM CPUs

After changing user validation for mvendorid/marchid/mimpid to guarantee
that the value is validated at user input time, coupled with the work in
fetching KVM default values for them by using a scratch CPU, we're
certain that the values in cpu->cfg.(mvendorid|marchid|mimpid) are
already good to be written back to KVM.

There's no need to write the values back for 'host' type CPUs since the
values can't be changed, so let's do that just for generic CPUs.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
Acked-by: Alistair Francis 
---
 target/riscv/kvm.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index cd2974c663..602727cdfd 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -495,6 +495,33 @@ void kvm_arch_init_irq_routing(KVMState *s)
 {
 }
 
+static int kvm_vcpu_set_machine_ids(RISCVCPU *cpu, CPUState *cs)
+{
+CPURISCVState *env = &cpu->env;
+uint64_t id;
+int ret;
+
+id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
+  KVM_REG_RISCV_CONFIG_REG(mvendorid));
+ret = kvm_set_one_reg(cs, id, &cpu->cfg.mvendorid);
+if (ret != 0) {
+return ret;
+}
+
+id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
+  KVM_REG_RISCV_CONFIG_REG(marchid));
+ret = kvm_set_one_reg(cs, id, &cpu->cfg.marchid);
+if (ret != 0) {
+return ret;
+}
+
+id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
+  KVM_REG_RISCV_CONFIG_REG(mimpid));
+ret = kvm_set_one_reg(cs, id, &cpu->cfg.mimpid);
+
+return ret;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 int ret = 0;
@@ -513,6 +540,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 env->misa_ext = isa;
 
+if (!object_dynamic_cast(OBJECT(cpu), TYPE_RISCV_CPU_HOST)) {
+ret = kvm_vcpu_set_machine_ids(cpu, cs);
+}
+
 return ret;
 }
 
-- 
2.41.0

Re: [PATCH V3] migration: simplify blockers

2023-07-05 Thread Steven Sistare

On 7/5/2023 5:33 PM, Steven Sistare wrote:
> On 6/7/2023 11:58 AM, Peter Xu wrote:
>> On Wed, Jun 07, 2023 at 07:35:32AM -0700, Steve Sistare wrote:
>>> Modify migrate_add_blocker and migrate_del_blocker to take an Error **
>>> reason.  This allows migration to own the Error object, so that if
>>> an error occurs, migration code can free the Error and clear the client
>>> handle, simplifying client code.
>>>
>>> This is also a pre-requisite for future patches that will add a mode
>>> argument to migration requests to support live update, and will maintain
>>> a list of blockers for each mode.  A blocker may apply to a single mode
>>> or to multiple modes, and passing Error** will allow one Error object to
>>> be registered for multiple modes.
>>>
>>> No functional change.
>>>
>>> Signed-off-by: Steve Sistare 
>>
>> Reviewed-by: Peter Xu 
> 
> Hi Juan,
>   This stand-alone patch is ready to be pulled.


Ahh nope, it has been too long and no longer applies cleanly.
I will rebase and repost.

- Steve

[PATCH v8 15/20] target/riscv/cpu.c: add satp_mode properties earlier

riscv_cpu_add_user_properties() ended up with an excess of "#ifndef
CONFIG_USER_ONLY" blocks after changes that added KVM properties
handling.

KVM specific properties are required to be created earlier than their
TCG counterparts, but the remaining props can be created in any order.
Move riscv_add_satp_mode_properties() to the start of the function,
inside the !CONFIG_USER_ONLY block already present there, to remove the
last ifndef block.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
Reviewed-by: Philippe Mathieu-Daudé 
---
 target/riscv/cpu.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 31e591a938..deb3c0f035 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1852,6 +1852,8 @@ static void riscv_cpu_add_user_properties(Object *obj)
 DeviceState *dev = DEVICE(obj);
 
 #ifndef CONFIG_USER_ONLY
+riscv_add_satp_mode_properties(obj);
+
 if (kvm_enabled()) {
 kvm_riscv_init_user_properties(obj);
 }
@@ -1870,10 +1872,6 @@ static void riscv_cpu_add_user_properties(Object *obj)
 #endif
 qdev_property_add_static(dev, prop);
 }
-
-#ifndef CONFIG_USER_ONLY
-riscv_add_satp_mode_properties(obj);
-#endif
 }
 
 static Property riscv_cpu_properties[] = {
-- 
2.41.0

[PATCH v8 19/20] target/riscv/kvm.c: add kvmconfig_get_cfg_addr() helper

There are 2 places in which we need to get a pointer to a certain
property of the cpu->cfg struct based on property offset. Next patch
will add a couple more.

Create a helper to avoid repeating this code over and over.
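
The offset-based lookup such a helper performs can be sketched on its own.
This is a miniature under stated assumptions: 'cpu_cfg', 'cfg_desc' and
'cfg_addr' are hypothetical names, and the sketch uses 'char *' arithmetic
where the patch relies on the GNU extension of arithmetic on 'void *'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of the CPU config struct. */
struct cpu_cfg {
    bool ext_zbb;
    bool ext_sstc;
    uint16_t cbom_blocksize;
};

struct cfg_desc {
    const char *name;
    size_t offset;     /* offsetof() of the field inside struct cpu_cfg */
};

/* Return the address of the field a descriptor refers to. */
static void *cfg_addr(struct cpu_cfg *cfg, const struct cfg_desc *d)
{
    assert(cfg != NULL && d != NULL);
    assert(d->offset < sizeof(*cfg));
    return (char *)cfg + d->offset;   /* char * keeps this standard C */
}
```

The caller is responsible for casting the result to the type of the field
the descriptor was built from, e.g. 'bool *' or 'uint16_t *'.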

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
---
 target/riscv/kvm.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 55ea189520..d503e03078 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -215,11 +215,15 @@ static KVMCPUConfig kvm_multi_ext_cfgs[] = {
 KVM_EXT_CFG("svpbmt", ext_svpbmt, KVM_RISCV_ISA_EXT_SVPBMT),
 };
 
+static void *kvmconfig_get_cfg_addr(RISCVCPU *cpu, KVMCPUConfig *kvmcfg)
+{
+return (void *)&cpu->cfg + kvmcfg->offset;
+}
+
 static void kvm_cpu_cfg_set(RISCVCPU *cpu, KVMCPUConfig *multi_ext,
 uint32_t val)
 {
-int cpu_cfg_offset = multi_ext->offset;
-bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
+bool *ext_enabled = kvmconfig_get_cfg_addr(cpu, multi_ext);
 
 *ext_enabled = val;
 }
@@ -227,8 +231,7 @@ static void kvm_cpu_cfg_set(RISCVCPU *cpu, KVMCPUConfig 
*multi_ext,
 static uint32_t kvm_cpu_cfg_get(RISCVCPU *cpu,
 KVMCPUConfig *multi_ext)
 {
-int cpu_cfg_offset = multi_ext->offset;
-bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
+bool *ext_enabled = kvmconfig_get_cfg_addr(cpu, multi_ext);
 
 return *ext_enabled;
 }
-- 
2.41.0

[PATCH v8 16/20] target/riscv/cpu.c: remove priv_ver check from riscv_isa_string_ext()

riscv_isa_string_ext() is being used by riscv_isa_string(), which is
then used by boards to retrieve the 'riscv,isa' string to be written in
the FDT. All this happens after riscv_cpu_realize(), meaning that we're
already past riscv_cpu_validate_set_extensions() and, more importantly,
riscv_cpu_disable_priv_spec_isa_exts().

This means that all extensions that needed to be disabled due to
priv_spec mismatch are already disabled. Checking this again during
riscv_isa_string_ext() is unneeded. Remove it.

As a bonus, riscv_isa_string_ext() can now be used with the 'host'
KVM-only CPU type since it doesn't have an env->priv_ver assigned and it
would fail this check for no good reason.
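
What remains after the change is a plain "append every enabled extension"
loop, sketched below. The sketch is an assumption-laden stand-in: it uses
a fixed caller-provided buffer and snprintf() instead of the GLib string
helpers, and 'isa_ext' and 'isa_string_ext' here are hypothetical
miniatures, not the QEMU definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct isa_ext {
    const char *name;
    bool enabled;
};

/* Append "_<name>" to 'buf' for every enabled extension. */
static void isa_string_ext(char *buf, size_t size,
                           const struct isa_ext *exts, size_t n)
{
    size_t len = strlen(buf);

    assert(len < size);
    for (size_t i = 0; i < n; i++) {
        int w;

        if (!exts[i].enabled) {
            continue;
        }
        w = snprintf(buf + len, size - len, "_%s", exts[i].name);
        if (w < 0 || (size_t)w >= size - len) {
            buf[len] = '\0';    /* drop a partially written suffix */
            break;
        }
        len += (size_t)w;
    }
}
```

No version comparison is needed in the loop because, in this model as in
the commit message, disabled extensions were already filtered earlier.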

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
---
 target/riscv/cpu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index deb3c0f035..2acf77949f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -2124,8 +2124,7 @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str,
 int i;
 
 for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
-if (cpu->env.priv_ver >= isa_edata_arr[i].min_version &&
-isa_ext_is_enabled(cpu, &isa_edata_arr[i])) {
+if (isa_ext_is_enabled(cpu, &isa_edata_arr[i])) {
 new = g_strconcat(old, "_", isa_edata_arr[i].name, NULL);
 g_free(old);
 old = new;
-- 
2.41.0

[PATCH v8 06/20] target/riscv: use KVM scratch CPUs to init KVM properties

Certain validations, such as the validations done for the machine IDs
(mvendorid/marchid/mimpid), are done before starting the CPU.
Non-dynamic (named) CPUs try to match user input with a preset
default. As it is today we can't prefetch a KVM default for these cases
because we're only able to read/write KVM regs after the vcpu is
spinning.

Our target/arm friends use a concept called "scratch CPU", which
consists of creating a vcpu for doing queries and validations and so on,
which is discarded shortly after use [1]. This is a suitable solution
for what we need so let's implement it in target/riscv as well.

kvm_riscv_init_user_properties() will be used to do any pre-launch setup
for KVM CPUs, via riscv_cpu_add_user_properties(). The function will
create a KVM scratch CPU, fetch KVM regs that work as default values for
user properties, and then discard the scratch CPU afterwards.

We're starting by initializing 'mvendorid'. This concept will be used to
init other KVM specific properties in the next patches as well.

[1] target/arm/kvm.c, kvm_arm_create_scratch_host_vcpu()

Suggested-by: Andrew Jones 
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
Acked-by: Alistair Francis 
---
 target/riscv/cpu.c   |  6 +++
 target/riscv/kvm.c   | 85 
 target/riscv/kvm_riscv.h |  1 +
 3 files changed, 92 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 9080d021fa..0e1265bb17 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1792,6 +1792,12 @@ static void riscv_cpu_add_user_properties(Object *obj)
 Property *prop;
 DeviceState *dev = DEVICE(obj);
 
+#ifndef CONFIG_USER_ONLY
+if (kvm_enabled()) {
+kvm_riscv_init_user_properties(obj);
+}
+#endif
+
 riscv_cpu_add_misa_properties(obj);
 
 for (prop = riscv_cpu_extensions; prop && prop->name; prop++) {
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 0f932a5b96..37f0f70794 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -309,6 +309,91 @@ static void kvm_riscv_put_regs_timer(CPUState *cs)
 env->kvm_timer_dirty = false;
 }
 
+typedef struct KVMScratchCPU {
+int kvmfd;
+int vmfd;
+int cpufd;
+} KVMScratchCPU;
+
+/*
+ * Heavily inspired by kvm_arm_create_scratch_host_vcpu()
+ * from target/arm/kvm.c.
+ */
+static bool kvm_riscv_create_scratch_vcpu(KVMScratchCPU *scratch)
+{
+int kvmfd = -1, vmfd = -1, cpufd = -1;
+
+kvmfd = qemu_open_old("/dev/kvm", O_RDWR);
+if (kvmfd < 0) {
+goto err;
+}
+do {
+vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
+} while (vmfd == -1 && errno == EINTR);
+if (vmfd < 0) {
+goto err;
+}
+cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
+if (cpufd < 0) {
+goto err;
+}
+
+scratch->kvmfd =  kvmfd;
+scratch->vmfd = vmfd;
+scratch->cpufd = cpufd;
+
+return true;
+
+ err:
+if (cpufd >= 0) {
+close(cpufd);
+}
+if (vmfd >= 0) {
+close(vmfd);
+}
+if (kvmfd >= 0) {
+close(kvmfd);
+}
+
+return false;
+}
+
+static void kvm_riscv_destroy_scratch_vcpu(KVMScratchCPU *scratch)
+{
+close(scratch->cpufd);
+close(scratch->vmfd);
+close(scratch->kvmfd);
+}
+
+static void kvm_riscv_init_machine_ids(RISCVCPU *cpu, KVMScratchCPU *kvmcpu)
+{
+CPURISCVState *env = &cpu->env;
+struct kvm_one_reg reg;
+int ret;
+
+reg.id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
+  KVM_REG_RISCV_CONFIG_REG(mvendorid));
+reg.addr = (uint64_t)&cpu->cfg.mvendorid;
+ret = ioctl(kvmcpu->cpufd, KVM_GET_ONE_REG, &reg);
+if (ret != 0) {
+error_report("Unable to retrieve mvendorid from host, error %d", ret);
+}
+}
+
+void kvm_riscv_init_user_properties(Object *cpu_obj)
+{
+RISCVCPU *cpu = RISCV_CPU(cpu_obj);
+KVMScratchCPU kvmcpu;
+
+if (!kvm_riscv_create_scratch_vcpu(&kvmcpu)) {
+return;
+}
+
+kvm_riscv_init_machine_ids(cpu, &kvmcpu);
+
+kvm_riscv_destroy_scratch_vcpu(&kvmcpu);
+}
+
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
 };
diff --git a/target/riscv/kvm_riscv.h b/target/riscv/kvm_riscv.h
index ed281bdce0..e3ba935808 100644
--- a/target/riscv/kvm_riscv.h
+++ b/target/riscv/kvm_riscv.h
@@ -19,6 +19,7 @@
 #ifndef QEMU_KVM_RISCV_H
 #define QEMU_KVM_RISCV_H
 
+void kvm_riscv_init_user_properties(Object *cpu_obj);
 void kvm_riscv_reset_vcpu(RISCVCPU *cpu);
 void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level);
 
-- 
2.41.0

[PATCH v8 07/20] target/riscv: read marchid/mimpid in kvm_riscv_init_machine_ids()

Allow 'marchid' and 'mimpid' to also be initialized in
kvm_riscv_init_machine_ids().

After this change, the handling of mvendorid/marchid/mimpid for the
'host' CPU type will be equal to what we already have for TCG named
CPUs, i.e. the user is not able to set these values to anything other
than the value that is already preset.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
Acked-by: Alistair Francis 
---
 target/riscv/kvm.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 37f0f70794..cd2974c663 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -378,6 +378,22 @@ static void kvm_riscv_init_machine_ids(RISCVCPU *cpu, KVMScratchCPU *kvmcpu)
 if (ret != 0) {
 error_report("Unable to retrieve mvendorid from host, error %d", ret);
 }
+
+reg.id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
+  KVM_REG_RISCV_CONFIG_REG(marchid));
+reg.addr = (uint64_t)&cpu->cfg.marchid;
+ret = ioctl(kvmcpu->cpufd, KVM_GET_ONE_REG, &reg);
+if (ret != 0) {
+error_report("Unable to retrieve marchid from host, error %d", ret);
+}
+
+reg.id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
+  KVM_REG_RISCV_CONFIG_REG(mimpid));
+reg.addr = (uint64_t)&cpu->cfg.mimpid;
+ret = ioctl(kvmcpu->cpufd, KVM_GET_ONE_REG, &reg);
+if (ret != 0) {
+error_report("Unable to retrieve mimpid from host, error %d", ret);
+}
 }
 
 void kvm_riscv_init_user_properties(Object *cpu_obj)
-- 
2.41.0

[PATCH v8 05/20] target/riscv/cpu.c: restrict 'marchid' value

For named CPUs, 'marchid' shouldn't be set to a value different from the
one previously set.

For all other CPUs it shouldn't be freely set either - the spec requires
that 'marchid' can't have the MSB (most significant bit) set and every
other bit set to zero, i.e. 0x80000000 is an invalid 'marchid' value for
32 bit CPUs.
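
The rule can be written as a one-line comparison. The helper below is only
a sketch: marchid_is_valid() is a hypothetical name that is not part of the
patch, and 'mxlen' is assumed to be either 32 or 64.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: returns false for the one
 * 'marchid' value the rule forbids (MSB set, every other bit zero).
 * 'mxlen' is assumed to be 32 or 64.
 */
static bool marchid_is_valid(uint64_t value, unsigned mxlen)
{
    uint64_t invalid_val = 1ULL << (mxlen - 1);

    return value != invalid_val;
}
```

Any other value passes, including values that have the MSB set together
with other bits.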

As with 'mimpid', setting a default value based on the current QEMU
version is not a good idea because it implies that the CPU
implementation changes from one QEMU version to the other. Named CPUs
should set 'marchid' to a meaningful value instead, and generic CPUs can
set to any valid value.

For the 'veyron-v1' CPU this is the error thrown if 'marchid' is set to
a different value:

$ ./build/qemu-system-riscv64 -M virt -nographic -cpu veyron-v1,marchid=0x8000
qemu-system-riscv64: can't apply global veyron-v1-riscv-cpu.marchid=0x8000:
Unable to change veyron-v1-riscv-cpu marchid (0x8001)

And, for generic CPUs, this is the error when trying to set an
invalid value:

$ ./build/qemu-system-riscv64 -M virt -nographic -cpu rv64,marchid=0x8000
qemu-system-riscv64: can't apply global rv64-riscv-cpu.marchid=0x8000:
Unable to set marchid with MSB (64) bit set and the remaining bits zero

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.c | 60 --
 1 file changed, 53 insertions(+), 7 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 477f8f8f97..9080d021fa 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -39,11 +39,6 @@
 #include "tcg/tcg.h"
 
 /* RISC-V CPU definitions */
-
-#define RISCV_CPU_MARCHID   ((QEMU_VERSION_MAJOR << 16) | \
- (QEMU_VERSION_MINOR << 8)  | \
- (QEMU_VERSION_MICRO))
-
 static const char riscv_single_letter_exts[] = "IEMAFDQCPVH";
 
 struct isa_ext_data {
@@ -1811,8 +1806,6 @@ static void riscv_cpu_add_user_properties(Object *obj)
 static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("debug", RISCVCPU, cfg.debug, true),
 
-DEFINE_PROP_UINT64("marchid", RISCVCPU, cfg.marchid, RISCV_CPU_MARCHID),
-
 #ifndef CONFIG_USER_ONLY
 DEFINE_PROP_UINT64("resetvec", RISCVCPU, env.resetvec, DEFAULT_RSTVEC),
 #endif
@@ -1959,6 +1952,56 @@ static void cpu_get_mimpid(Object *obj, Visitor *v, const char *name,
 visit_type_bool(v, name, &value, errp);
 }
 
+static void cpu_set_marchid(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+bool dynamic_cpu = riscv_cpu_is_dynamic(obj);
+RISCVCPU *cpu = RISCV_CPU(obj);
+uint64_t prev_val = cpu->cfg.marchid;
+uint64_t value, invalid_val;
+uint32_t mxlen = 0;
+
+if (!visit_type_uint64(v, name, &value, errp)) {
+return;
+}
+
+if (!dynamic_cpu && prev_val != value) {
+error_setg(errp, "Unable to change %s marchid (0x%" PRIu64 ")",
+   object_get_typename(obj), prev_val);
+return;
+}
+
+switch (riscv_cpu_mxl(&cpu->env)) {
+case MXL_RV32:
+mxlen = 32;
+break;
+case MXL_RV64:
+case MXL_RV128:
+mxlen = 64;
+break;
+default:
+g_assert_not_reached();
+}
+
+invalid_val = 1LL << (mxlen - 1);
+
+if (value == invalid_val) {
+error_setg(errp, "Unable to set marchid with MSB (%u) bit set "
+ "and the remaining bits zero", mxlen);
+return;
+}
+
+cpu->cfg.marchid = value;
+}
+
+static void cpu_get_marchid(Object *obj, Visitor *v, const char *name,
+   void *opaque, Error **errp)
+{
+bool value = RISCV_CPU(obj)->cfg.marchid;
+
+visit_type_bool(v, name, &value, errp);
+}
+
 static void riscv_cpu_class_init(ObjectClass *c, void *data)
 {
 RISCVCPUClass *mcc = RISCV_CPU_CLASS(c);
@@ -1996,6 +2039,9 @@ static void riscv_cpu_class_init(ObjectClass *c, void *data)
 object_class_property_add(c, "mimpid", "uint64", cpu_get_mimpid,
   cpu_set_mimpid, NULL, NULL);
 
+object_class_property_add(c, "marchid", "uint64", cpu_get_marchid,
+  cpu_set_marchid, NULL, NULL);
+
 device_class_set_props(dc, riscv_cpu_properties);
 }
 
-- 
2.41.0

[PATCH v8 18/20] target/riscv: update multi-letter extension KVM properties

We're now ready to update the multi-letter extensions status for KVM.

kvm_riscv_update_cpu_cfg_isa_ext() is called during vcpu creation
time to verify which user options change host defaults (via the 'user_set'
flag) and tries to write them back to KVM.

Failure to commit a change to KVM is only ignored in case KVM doesn't
know about the extension (-EINVAL error code) and the user wanted to
disable the given extension. Otherwise we're going to abort the boot
process.
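
The policy described above can be sketched as a small predicate. This is
an illustration of the commit message only: kvm_ext_write_is_fatal() is a
hypothetical name that does not appear in the patch, and 'ret' is assumed
to be zero on success or a negative errno value from the register write.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a failed attempt to write an
 * extension setting to KVM must abort the boot. 'enable' is the value
 * the user asked for.
 */
static bool kvm_ext_write_is_fatal(int ret, bool enable)
{
    if (ret == 0) {
        return false;   /* the write succeeded */
    }
    if (ret == -EINVAL && !enable) {
        return false;   /* KVM doesn't know the extension; user disabled it */
    }
    return true;        /* every other failure is fatal */
}
```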

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
---
 target/riscv/kvm.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index f2545bd560..55ea189520 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -273,6 +273,32 @@ static void kvm_cpu_set_multi_ext_cfg(Object *obj, Visitor *v,
 kvm_cpu_cfg_set(cpu, multi_ext_cfg, value);
 }
 
+static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU *cpu, CPUState *cs)
+{
+CPURISCVState *env = &cpu->env;
+uint64_t id, reg;
+int i, ret;
+
+for (i = 0; i < ARRAY_SIZE(kvm_multi_ext_cfgs); i++) {
+KVMCPUConfig *multi_ext_cfg = &kvm_multi_ext_cfgs[i];
+
+if (!multi_ext_cfg->user_set) {
+continue;
+}
+
+id = kvm_riscv_reg_id(env, KVM_REG_RISCV_ISA_EXT,
+  multi_ext_cfg->kvm_reg_id);
+reg = kvm_cpu_cfg_get(cpu, multi_ext_cfg);
+ret = kvm_set_one_reg(cs, id, &reg);
+if (ret != 0) {
+error_report("Unable to %s extension %s in KVM, error %d",
+ reg ? "enable" : "disable",
+ multi_ext_cfg->name, ret);
+exit(EXIT_FAILURE);
+}
+}
+}
+
 static void kvm_riscv_add_cpu_user_properties(Object *cpu_obj)
 {
 int i;
@@ -792,6 +818,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 
 kvm_riscv_update_cpu_misa_ext(cpu, cs);
+kvm_riscv_update_cpu_cfg_isa_ext(cpu, cs);
 
 return ret;
 }
-- 
2.41.0

[PATCH v8 11/20] target/riscv/cpu: add misa_ext_info_arr[]

Next patch will add KVM specific user properties for both MISA and
multi-letter extensions. For MISA extensions we want to make use of what
is already available in misa_ext_cfgs[] to avoid code repetition.

misa_ext_info_arr[] array will hold name and description for each MISA
extension that misa_ext_cfgs[] is declaring. We'll then use this new
array in KVM code to avoid duplicating strings. Two getters were added
to allow KVM to retrieve the 'name' and 'description' for each MISA
property.
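
A reduced sketch of the idea, assuming a compiler that provides
__builtin_ctz() (GCC and Clang do). The EX_* names, ExtInfo and
ex_get_name() are made up for this example and only three extensions are
listed; the bit positions follow the MISA convention of one bit per letter
('A' is bit 0, 'C' is bit 2, 'I' is bit 8).

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative MISA bits: one bit per letter, 'A' being bit 0. */
#define EX_RVA (1u << 0)
#define EX_RVC (1u << 2)
#define EX_RVI (1u << 8)

typedef struct {
    const char *name;
    const char *description;
} ExtInfo;

/* The index is the position of the single set bit. */
#define EX_INFO_IDX(bit) __builtin_ctz(bit)
#define EX_INFO(bit, n, d) \
    [EX_INFO_IDX(bit)] = { .name = (n), .description = (d) }

static const ExtInfo ex_info_arr[] = {
    EX_INFO(EX_RVA, "a", "Atomic instructions"),
    EX_INFO(EX_RVC, "c", "Compressed instructions"),
    EX_INFO(EX_RVI, "i", "Base integer instruction set"),
};

/* Hypothetical getter: returns NULL for zero, gaps and out-of-range bits. */
static const char *ex_get_name(uint32_t bit)
{
    size_t idx;

    if (bit == 0) {
        return NULL;    /* __builtin_ctz(0) is undefined */
    }
    idx = (size_t)__builtin_ctz(bit);
    if (idx >= sizeof(ex_info_arr) / sizeof(ex_info_arr[0])) {
        return NULL;
    }
    return ex_info_arr[idx].name;   /* unlisted slots are zero-initialized */
}
```

The array is sparse: slots between the listed bits are zero-initialized,
so a getter has to treat a NULL name as "no such extension".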

There's nothing holding us back from doing the same with multi-letter
extensions. For now doing just with MISA extensions is enough.

It is worth documenting that even when using the __builtin_ctz() builtin to
populate misa_ext_info_arr[] we are forced to assign 'name' and
'description' during runtime in riscv_cpu_add_misa_properties(). The
reason is that some Gitlab runners ('clang-user' and 'tsan-build') will
throw errors like this if we fetch 'name' and 'description' from the
array in the MISA_CFG() macro:

../target/riscv/cpu.c:1624:5: error: initializer element is not a
  compile-time constant
MISA_CFG(RVA, true),
^~~
../target/riscv/cpu.c:1619:53: note: expanded from macro 'MISA_CFG'
{.name = misa_ext_info_arr[MISA_INFO_IDX(_bit)].name, \
 ~~~^~~~

gcc and other compilers/builders were fine with that change. We can't
ignore failures in the Gitlab pipeline though, so code was changed to
make every runner happy.

As a side effect, misa_ext_cfgs[] is no longer a 'const' array because
it must be set during runtime.

Suggested-by: Andrew Jones 
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Andrew Jones 
---
 target/riscv/cpu.c | 110 +
 target/riscv/cpu.h |   7 ++-
 2 files changed, 88 insertions(+), 29 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 0e1265bb17..35ba220c8f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1636,33 +1636,83 @@ static void cpu_get_misa_ext_cfg(Object *obj, Visitor *v, const char *name,
 visit_type_bool(v, name, &value, errp);
 }
 
-static const RISCVCPUMisaExtConfig misa_ext_cfgs[] = {
-{.name = "a", .description = "Atomic instructions",
- .misa_bit = RVA, .enabled = true},
-{.name = "c", .description = "Compressed instructions",
- .misa_bit = RVC, .enabled = true},
-{.name = "d", .description = "Double-precision float point",
- .misa_bit = RVD, .enabled = true},
-{.name = "f", .description = "Single-precision float point",
- .misa_bit = RVF, .enabled = true},
-{.name = "i", .description = "Base integer instruction set",
- .misa_bit = RVI, .enabled = true},
-{.name = "e", .description = "Base integer instruction set (embedded)",
- .misa_bit = RVE, .enabled = false},
-{.name = "m", .description = "Integer multiplication and division",
- .misa_bit = RVM, .enabled = true},
-{.name = "s", .description = "Supervisor-level instructions",
- .misa_bit = RVS, .enabled = true},
-{.name = "u", .description = "User-level instructions",
- .misa_bit = RVU, .enabled = true},
-{.name = "h", .description = "Hypervisor",
- .misa_bit = RVH, .enabled = true},
-{.name = "x-j", .description = "Dynamic translated languages",
- .misa_bit = RVJ, .enabled = false},
-{.name = "v", .description = "Vector operations",
- .misa_bit = RVV, .enabled = false},
-{.name = "g", .description = "General purpose (IMAFD_Zicsr_Zifencei)",
- .misa_bit = RVG, .enabled = false},
+typedef struct misa_ext_info {
+const char *name;
+const char *description;
+} MISAExtInfo;
+
+#define MISA_INFO_IDX(_bit) \
+__builtin_ctz(_bit)
+
+#define MISA_EXT_INFO(_bit, _propname, _descr) \
+[MISA_INFO_IDX(_bit)] = {.name = _propname, .description = _descr}
+
+static const MISAExtInfo misa_ext_info_arr[] = {
+MISA_EXT_INFO(RVA, "a", "Atomic instructions"),
+MISA_EXT_INFO(RVC, "c", "Compressed instructions"),
+MISA_EXT_INFO(RVD, "d", "Double-precision float point"),
+MISA_EXT_INFO(RVF, "f", "Single-precision float point"),
+MISA_EXT_INFO(RVI, "i", "Base integer instruction set"),
+MISA_EXT_INFO(RVE, "e", "Base integer instruction set (embedded)"),
+MISA_EXT_INFO(RVM, "m", "Integer multiplication and division"),
+MISA_EXT_INFO(RVS, "s", "Supervisor-level instructions"),
+MISA_EXT_INFO(RVU, "u", "User-level instructions"),
+MISA_EXT_INFO(RVH, "h", "Hypervisor"),
+MISA_EXT_INFO(RVJ, "x-j", "Dynamic translated languages"),
+MISA_EXT_INFO(RVV, "v", "Vector operations"),
+MISA_EXT_INFO(RVG, "g", "General purpose (IMAFD_Zicsr_Zifencei)"),
+};
+
+static int riscv_validate_misa_info_idx(uint32_t bit)
+{
+int idx;
+
+/*
+ * Our lowest valid input (RVA) is 1 and
+ * __builtin_ctz() is UB with zero.
+ */
+g_assert(bit != 0);
+idx =

[PATCH v8 17/20] target/riscv/cpu.c: create KVM mock properties

KVM-specific properties are being created inside target/riscv/kvm.c. But
at this moment we're gathering all the remaining properties from TCG and
adding them as is when running KVM. This creates a situation where
non-KVM properties are setting flags to 'true' due to their default
settings (e.g. Zawrs). Users can also freely enable them via command
line.

This doesn't impact runtime per se because KVM doesn't care about these
flags, but code such as riscv_isa_string_ext() takes those flags into
account. The result is that, for a KVM guest, setting non-KVM properties
will make them appear in the riscv,isa DT.

We want to keep the same API for both TCG and KVM and at the same time,
when running KVM, forbid non-KVM extensions to be enabled internally. We
accomplish both by changing riscv_cpu_add_user_properties() to add a
mock boolean property for every non-KVM extension in
riscv_cpu_extensions[]. Then, when running KVM, users are still free to
set extensions at will, but we'll error out if a non-KVM extension is
enabled. Setting such extension to 'false' will be ignored.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
---
 target/riscv/cpu.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 2acf77949f..b2883ca533 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1840,6 +1840,26 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+
+#ifndef CONFIG_USER_ONLY
+static void cpu_set_cfg_unavailable(Object *obj, Visitor *v,
+const char *name,
+void *opaque, Error **errp)
+{
+const char *propname = opaque;
+bool value;
+
+if (!visit_type_bool(v, name, &value, errp)) {
+return;
+}
+
+if (value) {
+error_setg(errp, "extension %s is not available with KVM",
+   propname);
+}
+}
+#endif
+
 /*
  * Add CPU properties with user-facing flags.
  *
@@ -1868,6 +1888,22 @@ static void riscv_cpu_add_user_properties(Object *obj)
 if (object_property_find(obj, prop->name)) {
 continue;
 }
+
+/*
+ * Set the default to disabled for every extension
+ * unknown to KVM and error out if the user attempts
+ * to enable any of them.
+ *
+ * We're giving a pass for non-bool properties since they're
+ * not related to the availability of extensions and can be
+ * safely ignored as is.
+ */
+if (prop->info == &qdev_prop_bool) {
+object_property_add(obj, prop->name, "bool",
+NULL, cpu_set_cfg_unavailable,
+NULL, (void *)prop->name);
+continue;
+}
 }
 #endif
 qdev_property_add_static(dev, prop);
-- 
2.41.0

[PATCH v8 12/20] target/riscv: add KVM specific MISA properties

Using all TCG user properties in KVM is tricky. First because KVM
supports only a small subset of what TCG provides, so most of the
cpu->cfg flags do nothing for KVM.

Second, and more importantly, we don't have a way of telling if any given
value is a user input or not. For TCG this has a small impact since we
just validate everything and error out if needed. But for KVM it would
be good to know if a given value was set by the user or if it's a value
already provided by KVM. Otherwise we don't know how to handle failed
kvm_set_one_reg() calls when writing the configurations back.

These characteristics make it overly complicated to use the same user
facing flags for both KVM and TCG. A simpler approach is to create KVM
specific properties that have specialized logic, forking KVM and TCG use
cases for those cases only. Fully separating KVM/TCG properties is
unneeded at this point - in fact we want the user experience to be as
equal as possible, regardless of the acceleration chosen.

We'll start this fork with the MISA properties, adding the MISA bits
that the KVM driver currently supports. A new KVMCPUConfig type is
introduced. It'll hold general information about an extension. For MISA
extensions we're going to use the newly created getters of
misa_ext_info_arr[] to populate their name and description. 'offset' holds
the MISA bit (RVA, RVC, ...). We're calling it 'offset' instead of
'misa_bit' because this same KVMCPUConfig struct will be used for
multi-letter extensions later on.

This new type also holds a 'user_set' flag. This flag will be set when
the user sets an option that's different from what is already configured
in the host, requiring KVM intervention to write the regs back during
kvm_arch_init_vcpu(). Similar mechanics will be implemented for
multi-letter extensions as well.
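
The 'user_set' logic for a MISA bit can be sketched as follows. This is a
simplified stand-alone version, not the patch's setter: ExCfg and
ex_set_ext() are hypothetical names, 'host_mask' stands for the MISA bits
the host reports, and an int return code replaces the Error ** reporting
used in QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced version of a per-extension config entry. */
typedef struct {
    uint32_t bit;       /* MISA bit of the extension */
    bool user_set;      /* true if the user changed the host default */
} ExCfg;

/*
 * Apply a user-provided boolean to an extension.
 * Returns 0 on success and -1 if the user tries to enable an
 * extension that the host does not have.
 */
static int ex_set_ext(ExCfg *cfg, uint32_t host_mask, bool value)
{
    bool host_bit = (host_mask & cfg->bit) != 0;

    if (value == host_bit) {
        return 0;               /* same as the host: nothing to write back */
    }
    if (!value) {
        cfg->user_set = true;   /* disabling a host extension */
        return 0;
    }
    return -1;                  /* enabling something the host lacks */
}
```

Only entries with 'user_set' set need to be written back to KVM later,
which keeps untouched host defaults out of the write path.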

There is no need to duplicate more code than necessary, so we're going
to use the existing kvm_riscv_init_user_properties() to add the KVM
specific properties. Any code that is adding a TCG user prop is then
changed slightly to verify first if there's a KVM prop with the same
name already added.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
---
 target/riscv/cpu.c |  5 +++
 target/riscv/kvm.c | 78 ++
 2 files changed, 83 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 35ba220c8f..5c8832a030 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1726,6 +1726,11 @@ static void riscv_cpu_add_misa_properties(Object *cpu_obj)
 misa_cfg->name = riscv_get_misa_ext_name(bit);
 misa_cfg->description = riscv_get_misa_ext_description(bit);
 
+/* Check if KVM already created the property */
+if (object_property_find(cpu_obj, misa_cfg->name)) {
+continue;
+}
+
 object_property_add(cpu_obj, misa_cfg->name, "bool",
 cpu_get_misa_ext_cfg,
 cpu_set_misa_ext_cfg,
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 4d0808cb9a..c55d0ec7ab 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -22,8 +22,10 @@
 #include 
 
 #include "qemu/timer.h"
+#include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
+#include "qapi/visitor.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
 #include "sysemu/kvm_int.h"
@@ -105,6 +107,81 @@ static uint64_t kvm_riscv_reg_id(CPURISCVState *env, uint64_t type,
 } \
 } while (0)
 
+typedef struct KVMCPUConfig {
+const char *name;
+const char *description;
+target_ulong offset;
+int kvm_reg_id;
+bool user_set;
+} KVMCPUConfig;
+
+#define KVM_MISA_CFG(_bit, _reg_id) \
+{.offset = _bit, .kvm_reg_id = _reg_id}
+
+/* KVM ISA extensions */
+static KVMCPUConfig kvm_misa_ext_cfgs[] = {
+KVM_MISA_CFG(RVA, KVM_RISCV_ISA_EXT_A),
+KVM_MISA_CFG(RVC, KVM_RISCV_ISA_EXT_C),
+KVM_MISA_CFG(RVD, KVM_RISCV_ISA_EXT_D),
+KVM_MISA_CFG(RVF, KVM_RISCV_ISA_EXT_F),
+KVM_MISA_CFG(RVH, KVM_RISCV_ISA_EXT_H),
+KVM_MISA_CFG(RVI, KVM_RISCV_ISA_EXT_I),
+KVM_MISA_CFG(RVM, KVM_RISCV_ISA_EXT_M),
+};
+
+static void kvm_cpu_set_misa_ext_cfg(Object *obj, Visitor *v,
+ const char *name,
+ void *opaque, Error **errp)
+{
+KVMCPUConfig *misa_ext_cfg = opaque;
+target_ulong misa_bit = misa_ext_cfg->offset;
+RISCVCPU *cpu = RISCV_CPU(obj);
+CPURISCVState *env = &cpu->env;
+bool value, host_bit;
+
+if (!visit_type_bool(v, name, &value, errp)) {
+return;
+}
+
+host_bit = env->misa_ext_mask & misa_bit;
+
+if (value == host_bit) {
+return;
+}
+
+if (!value) {
+misa_ext_cfg->user_set = true;
+return;
+}
+
+/*
+ * Forbid users to enable extensions that aren't
+ * available in the hart.
+ */
+error_setg(errp, "Enabling MISA bit '%s' is not allowed: it's not "
+   "enabled in the

[PATCH v8 01/20] target/riscv: skip features setup for KVM CPUs

As it is today it's not possible to use '-cpu host' if the RISC-V host
has RVH enabled. This is the resulting error:

$ ./qemu/build/qemu-system-riscv64 \
-machine virt,accel=kvm -m 2G -smp 1 \
-nographic -snapshot -kernel ./guest_imgs/Image  \
-initrd ./guest_imgs/rootfs_kvm_riscv64.img \
-append "earlycon=sbi root=/dev/ram rw" \
-cpu host
qemu-system-riscv64: H extension requires priv spec 1.12.0

This happens because we're checking for priv spec for all CPUs, and
since we're not setting env->priv_ver for the 'host' CPU, it defaults
to zero (i.e. PRIV_SPEC_1_10_0).

In reality env->priv_ver does not make sense when running with the KVM
'host' CPU. It's used to gate certain CSRs/extensions during translation
to make them unavailable if the hart declares an older spec version. It
doesn't have any other use. E.g. OpenSBI version 1.2 retrieves the spec
checking if the CSR_MCOUNTEREN, CSR_MCOUNTINHIBIT and CSR_MENVCFG CSRs
are available [1].

'priv_ver' is just one example. We're doing a lot of feature validation
and setup during riscv_cpu_realize() that doesn't apply to KVM CPUs.
Validating the feature set for those CPUs is a KVM problem that should
be handled in KVM specific code.

The new riscv_cpu_realize_tcg() helper contains all validation logic that
is applicable to TCG CPUs only. riscv_cpu_realize() verifies if we're
running TCG and, if that's the case, proceeds with the usual TCG realize()
logic.

[1] lib/sbi/sbi_hart.c, hart_detect_features()

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
---
 target/riscv/cpu.c | 35 +--
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index fd647534cf..6232e6513b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -34,6 +34,7 @@
 #include "migration/vmstate.h"
 #include "fpu/softfloat-helpers.h"
 #include "sysemu/kvm.h"
+#include "sysemu/tcg.h"
 #include "kvm_riscv.h"
 #include "tcg/tcg.h"
 
@@ -1386,20 +1387,12 @@ static void riscv_cpu_validate_misa_priv(CPURISCVState *env, Error **errp)
 }
 }
 
-static void riscv_cpu_realize(DeviceState *dev, Error **errp)
+static void riscv_cpu_realize_tcg(DeviceState *dev, Error **errp)
 {
-CPUState *cs = CPU(dev);
 RISCVCPU *cpu = RISCV_CPU(dev);
 CPURISCVState *env = &cpu->env;
-RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
 Error *local_err = NULL;
 
-cpu_exec_realizefn(cs, &local_err);
-if (local_err != NULL) {
-error_propagate(errp, local_err);
-return;
-}
-
 riscv_cpu_validate_misa_mxl(cpu, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
@@ -1434,7 +1427,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 }
 
 #ifndef CONFIG_USER_ONLY
-cs->tcg_cflags |= CF_PCREL;
+CPU(dev)->tcg_cflags |= CF_PCREL;
 
 if (cpu->cfg.ext_sstc) {
 riscv_timer_init(cpu);
@@ -1447,6 +1440,28 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 }
  }
 #endif
+}
+
+static void riscv_cpu_realize(DeviceState *dev, Error **errp)
+{
+CPUState *cs = CPU(dev);
+RISCVCPU *cpu = RISCV_CPU(dev);
+RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
+Error *local_err = NULL;
+
+cpu_exec_realizefn(cs, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+
+if (tcg_enabled()) {
+riscv_cpu_realize_tcg(dev, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+}
 
 riscv_cpu_finalize_features(cpu, &local_err);
 if (local_err != NULL) {
-- 
2.41.0

[PATCH v8 04/20] target/riscv/cpu.c: restrict 'mimpid' value

Following the same logic used with 'mvendorid' let's also restrict
'mimpid' for named CPUs. Generic CPUs keep setting the value freely.

Note that we're getting rid of the default RISCV_CPU_MIMPID value. The
reason is that this is not a good default since it's dynamic, changing
with every QEMU version, regardless of whether the actual
implementation of the CPU changed from one QEMU version to the other.
Named CPUs should set it to a meaningful value instead and generic CPUs
can set whatever they want.
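
To see why the old default is version-dependent, the macro's arithmetic
can be reproduced in a small function. This is a sketch only:
version_based_id() is a hypothetical name, and the version numbers used
with it are arbitrary inputs rather than claims about any QEMU release.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the shape of the removed default:
 * (major << 16) | (minor << 8) | micro.
 */
static uint64_t version_based_id(unsigned major, unsigned minor,
                                 unsigned micro)
{
    return ((uint64_t)major << 16) | ((uint64_t)minor << 8) | micro;
}
```

Two builds that differ only in their minor version already produce
different IDs, even if the emulated CPU is identical.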

This is the error thrown for an invalid 'mimpid' value for the veyron-v1
CPU:

$ ./qemu-system-riscv64 -M virt -nographic -cpu veyron-v1,mimpid=2
qemu-system-riscv64: can't apply global veyron-v1-riscv-cpu.mimpid=2:
Unable to change veyron-v1-riscv-cpu mimpid (0x111)

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.c | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a778241d9f..477f8f8f97 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -43,7 +43,6 @@
 #define RISCV_CPU_MARCHID   ((QEMU_VERSION_MAJOR << 16) | \
  (QEMU_VERSION_MINOR << 8)  | \
  (QEMU_VERSION_MICRO))
-#define RISCV_CPU_MIMPID    RISCV_CPU_MARCHID
 
 static const char riscv_single_letter_exts[] = "IEMAFDQCPVH";
 
@@ -1813,7 +1812,6 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("debug", RISCVCPU, cfg.debug, true),
 
 DEFINE_PROP_UINT64("marchid", RISCVCPU, cfg.marchid, RISCV_CPU_MARCHID),
-DEFINE_PROP_UINT64("mimpid", RISCVCPU, cfg.mimpid, RISCV_CPU_MIMPID),
 
 #ifndef CONFIG_USER_ONLY
 DEFINE_PROP_UINT64("resetvec", RISCVCPU, env.resetvec, DEFAULT_RSTVEC),
@@ -1932,6 +1930,35 @@ static void cpu_get_mvendorid(Object *obj, Visitor *v, const char *name,
 visit_type_bool(v, name, &value, errp);
 }
 
+static void cpu_set_mimpid(Object *obj, Visitor *v, const char *name,
+   void *opaque, Error **errp)
+{
+bool dynamic_cpu = riscv_cpu_is_dynamic(obj);
+RISCVCPU *cpu = RISCV_CPU(obj);
+uint64_t prev_val = cpu->cfg.mimpid;
+uint64_t value;
+
+if (!visit_type_uint64(v, name, &value, errp)) {
+return;
+}
+
+if (!dynamic_cpu && prev_val != value) {
+error_setg(errp, "Unable to change %s mimpid (0x%" PRIu64 ")",
+   object_get_typename(obj), prev_val);
+return;
+}
+
+cpu->cfg.mimpid = value;
+}
+
+static void cpu_get_mimpid(Object *obj, Visitor *v, const char *name,
+   void *opaque, Error **errp)
+{
+bool value = RISCV_CPU(obj)->cfg.mimpid;
+
+visit_type_bool(v, name, &value, errp);
+}
+
 static void riscv_cpu_class_init(ObjectClass *c, void *data)
 {
 RISCVCPUClass *mcc = RISCV_CPU_CLASS(c);
@@ -1966,6 +1993,9 @@ static void riscv_cpu_class_init(ObjectClass *c, void *data)
 object_class_property_add(c, "mvendorid", "uint32", cpu_get_mvendorid,
   cpu_set_mvendorid, NULL, NULL);
 
+object_class_property_add(c, "mimpid", "uint64", cpu_get_mimpid,
+  cpu_set_mimpid, NULL, NULL);
+
 device_class_set_props(dc, riscv_cpu_properties);
 }
 
-- 
2.41.0

Re: [PATCH V3] migration: simplify blockers

2023-07-05 Thread Steven Sistare

On 6/7/2023 11:58 AM, Peter Xu wrote:
> On Wed, Jun 07, 2023 at 07:35:32AM -0700, Steve Sistare wrote:
>> Modify migrate_add_blocker and migrate_del_blocker to take an Error **
>> reason.  This allows migration to own the Error object, so that if
>> an error occurs, migration code can free the Error and clear the client
>> handle, simplifying client code.
>>
>> This is also a pre-requisite for future patches that will add a mode
>> argument to migration requests to support live update, and will maintain
>> a list of blockers for each mode.  A blocker may apply to a single mode
>> or to multiple modes, and passing Error** will allow one Error object to
>> be registered for multiple modes.
>>
>> No functional change.
>>
>> Signed-off-by: Steve Sistare 
> 
> Reviewed-by: Peter Xu 

Hi Juan,
  This stand-alone patch is ready to be pulled.

- Steve

[PATCH v2] net: add initial support for AF_XDP network backend

2023-07-05 Thread Ilya Maximets

AF_XDP is a network socket family that allows communication directly
with the network device driver in the kernel, bypassing most or all
of the kernel networking stack.  In essence, the technology is
pretty similar to netmap.  But, unlike netmap, AF_XDP is Linux-native
and works with any network interface without driver modifications.
Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
require access to character devices or unix sockets.  Only access to
the network interface itself is necessary.

This patch implements a network backend that communicates with the
kernel by creating an AF_XDP socket.  A chunk of userspace memory
is shared between QEMU and the host kernel.  4 ring buffers (Tx, Rx,
Fill and Completion) are placed in that memory along with a pool of
memory buffers for the packet data.  Data transmission is done by
allocating one of the buffers, copying packet data into it and
placing the pointer into Tx ring.  After transmission, device will
return the buffer via Completion ring.  On Rx, device will take
a buffer from a pre-populated Fill ring, write the packet data into
it and place the buffer into Rx ring.
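
The Rx flow described above can be modelled in plain userspace C. The sketch below is only a simulation for illustration: it does not use the real AF_XDP socket or libxdp API, all names (`Ring`, `Umem`, `device_receive()` and so on) are hypothetical, and real ring descriptors carry more than a buffer offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE  8u            /* must be a power of two */
#define FRAME_SIZE 2048u
#define NUM_FRAMES 8u

/* Single-producer/single-consumer ring of buffer offsets. */
typedef struct Ring {
    uint32_t prod;               /* free-running producer index */
    uint32_t cons;               /* free-running consumer index */
    uint64_t desc[RING_SIZE];
} Ring;

static int ring_push(Ring *r, uint64_t addr)
{
    if (r->prod - r->cons == RING_SIZE) {
        return -1;               /* ring is full */
    }
    r->desc[r->prod++ & (RING_SIZE - 1)] = addr;
    return 0;
}

static int ring_pop(Ring *r, uint64_t *addr)
{
    if (r->prod == r->cons) {
        return -1;               /* ring is empty */
    }
    *addr = r->desc[r->cons++ & (RING_SIZE - 1)];
    return 0;
}

/* Shared memory area: packet buffers plus the Fill and Rx rings. */
typedef struct Umem {
    uint8_t frames[NUM_FRAMES * FRAME_SIZE];
    Ring fill;
    Ring rx;
} Umem;

/* Userspace side: hand every buffer to the device via the Fill ring. */
static void umem_populate_fill(Umem *u)
{
    for (uint64_t i = 0; i < NUM_FRAMES; i++) {
        int ret = ring_push(&u->fill, i * FRAME_SIZE);
        assert(ret == 0);
        (void)ret;
    }
}

/* Simulated device side: take a Fill buffer, write the packet, post to Rx. */
static int device_receive(Umem *u, const void *pkt, size_t len)
{
    uint64_t addr;

    if (len > FRAME_SIZE || ring_pop(&u->fill, &addr) < 0) {
        return -1;               /* no buffer available: packet dropped */
    }
    memcpy(u->frames + addr, pkt, len);
    return ring_push(&u->rx, addr);
}
```

The consumer of the Rx ring would pop the offset, read the packet at `frames + addr`, and eventually push the offset back to the Fill ring so the buffer can be reused.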

AF_XDP network backend takes on the communication with the host
kernel and the network interface and forwards packets to/from the
peer device in QEMU.

Usage example:

  -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
  -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1

XDP program bridges the socket with a network interface.  It can be
attached to the interface in 2 different modes:

1. skb - this mode should work for any interface and doesn't require
 driver support.  With a caveat of lower performance.

2. native - this does require support from the driver and allows
bypassing skb allocation in the kernel and potentially using
zero-copy while getting packets in/out of userspace.

By default, QEMU will try to use native mode and fall back to skb.
Mode can be forced via 'mode' option.  To force 'copy' even in native
mode, use 'force-copy=on' option.  This might be useful if there is
some issue with the driver.
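
The default "native first, then skb" behaviour can be sketched as follows. This is an illustration under stated assumptions, not the patch's code: `XdpMode`, `XdpAttachFn`, `xdp_attach()` and the demo driver `attach_skb_only()` are hypothetical names, and the attach step is reduced to a callback.

```c
#include <assert.h>
#include <stddef.h>

typedef enum XdpMode { XDP_MODE_NATIVE, XDP_MODE_SKB } XdpMode;

/* Hypothetical attach callback: returns 0 on success, negative on failure. */
typedef int (*XdpAttachFn)(XdpMode mode, void *opaque);

/*
 * If 'forced' is non-NULL, only that mode is tried.  Otherwise native
 * mode is tried first and skb mode is used as the fallback.  The mode
 * of the last attempt is stored in *used.
 */
static int xdp_attach(XdpAttachFn attach, void *opaque,
                      const XdpMode *forced, XdpMode *used)
{
    assert(attach != NULL && used != NULL);
    if (forced) {
        *used = *forced;
        return attach(*forced, opaque);
    }
    *used = XDP_MODE_NATIVE;
    if (attach(XDP_MODE_NATIVE, opaque) == 0) {
        return 0;
    }
    *used = XDP_MODE_SKB;
    return attach(XDP_MODE_SKB, opaque);
}

/* Demo driver without native XDP support; counts attempts in *opaque. */
static int attach_skb_only(XdpMode mode, void *opaque)
{
    int *attempts = opaque;

    (*attempts)++;
    return mode == XDP_MODE_SKB ? 0 : -1;
}
```

With the demo driver, the default path makes two attempts and ends up in skb mode, while forcing native mode fails after a single attempt.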

Option 'queues=N' specifies how many device queues should
be open.  Note that all the queues that are not open are still
functional and can receive traffic, but it will not be delivered to
QEMU.  So, the number of device queues should generally match the
QEMU configuration, unless the device is shared with something
else and the traffic re-direction to appropriate queues is correctly
configured on a device level (e.g. with ethtool -N).
'start-queue=M' option can be used to specify from which queue id
QEMU should start configuring 'N' queues.  It might also be necessary
to use this option with certain NICs, e.g. MLX5 NICs.  See the docs
for examples.

In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
capabilities in order to load default XSK/XDP programs to the
network interface and configure BPF maps.  It is possible, however,
to run with no capabilities.  For that to work, an external process
with admin capabilities will need to pre-load default XSK program,
create AF_XDP sockets and pass their file descriptors to QEMU process
on startup via 'sock-fds' option.  Network backend will need to be
configured with 'inhibit=on' to avoid loading of the program.
QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue
or CAP_IPC_LOCK.
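
As a rough aid for sizing the limit, the check below compares the soft RLIMIT_MEMLOCK against 32 MB per queue. It is a sketch only: the helper names are hypothetical, the 32 MB figure is taken from the paragraph above, and a process holding CAP_IPC_LOCK is not subject to the limit, which this code does not detect.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

/* 32 MB of locked memory per queue, as stated in the text above. */
#define XSK_MEMLOCK_PER_QUEUE (32ull * 1024 * 1024)

static uint64_t memlock_needed(unsigned queues)
{
    assert(queues > 0);
    return (uint64_t)queues * XSK_MEMLOCK_PER_QUEUE;
}

/*
 * Returns 1 if the soft RLIMIT_MEMLOCK covers 'queues' queues,
 * 0 if it does not, and -1 if the limit cannot be read.
 */
static int memlock_limit_sufficient(unsigned queues)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) < 0) {
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return 1;
    }
    return (uint64_t)rl.rlim_cur >= memlock_needed(queues);
}
```

For example, a backend configured with queues=4 would need 128 MB of locked memory under this rule.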

Alternatively, the file descriptor for 'xsks_map' can be passed via
'xsks-map-fd=N' option instead of passing socket file descriptors.
That will additionally require CAP_NET_RAW on QEMU side.  This is
useful, because 'sock-fds' may not be available with older libxdp.
'sock-fds' requires libxdp >= 1.4.0.

There are a few performance challenges with the current network backends.

First is that they do not support IO threads.  This means that data
path is handled by the main thread in QEMU and may slow down other
work or may be slowed down by some other work.  This also means that
taking advantage of multi-queue is generally not possible today.

Another thing is that data path is going through the device emulation
code, which is not really optimized for performance.  The fastest
"frontend" device is virtio-net.  But it's not optimized for heavy
traffic either, because it expects such use-cases to be handled via
some implementation of vhost (user, kernel, vdpa).  In practice, we
have virtio notifications and rcu lock/unlock on a per-packet basis
and not very efficient accesses to the guest memory.  Communication
channels between backend and frontend devices do not allow passing
more than one packet at a time as well.

Some of these challenges can be avoided in the future by adding better
batching into device emulation or by implementing vhost-af-xdp variant.

There are also a few kernel limitations.  AF_XDP sockets do not
support any kinds of checksum or segmentation offloading.  Buffers
are limited to a page size (4K), i.e. MTU is limited.  Multi-buffer
support

Re: [PATCH v2 3/7] migration: Introduce migrate_has_error()

Peter Xu  writes:

> Introduce a helper to detect whether MigrationState.error is set for
> whatever reason.  It intentionally does not take the error_mutex here because
> neither do we reference the pointer, nor do we modify the pointer.  State
> why it's safe to do so.
>
> This is preparation work for any thread (e.g. source return path thread) to
> set up errors in a unified way in MigrationState, rather than relying on
> its own way to set errors (mark_source_rp_bad()).
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH v2 2/7] migration: Let migrate_set_error() take ownership

Peter Xu  writes:

> migrate_set_error() used one error_copy() so it always copies the error.
> However that's not the major use case - the major use case is one would
> like to pass the error to migrate_set_error() without further touching the
> error.
>
> It can be proved if we see most of the callers are freeing the error
> explicitly right afterwards.  There're a few outliers (only if when the
> caller) where we can use error_copy() explicitly there.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas
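
The "take ownership" convention discussed in this patch can be sketched as below. This is not the QEMU implementation: `Error`, `MigState`, `state_set_error()` and `state_has_error()` are hypothetical stand-ins, locking is omitted, and keeping the first error while freeing later ones is an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for QEMU's Error and MigrationState. */
typedef struct Error {
    char *msg;
} Error;

typedef struct MigState {
    Error *error;
} MigState;

static Error *error_new(const char *msg)
{
    Error *err = malloc(sizeof(*err));
    size_t len = strlen(msg) + 1;

    assert(err != NULL);
    err->msg = malloc(len);
    assert(err->msg != NULL);
    memcpy(err->msg, msg, len);
    return err;
}

static void error_release(Error *err)
{
    if (err) {
        free(err->msg);
        free(err);
    }
}

/*
 * Takes ownership of 'err': the caller must not touch or free it
 * afterwards.  No copy is made on the common path.
 */
static void state_set_error(MigState *s, Error *err)
{
    assert(s != NULL && err != NULL);
    if (s->error) {
        error_release(err);      /* an earlier error is kept */
    } else {
        s->error = err;
    }
}

/* Only tests the pointer; it is neither dereferenced nor modified. */
static int state_has_error(const MigState *s)
{
    return s->error != NULL;
}
```

A caller that still needs its error after the call would pass a copy instead, which is the "outlier" case the commit message mentions.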

Re: [PATCH] target/arm: gdbstub: Guard M-profile code with CONFIG_TCG

Richard Henderson  writes:

> On 7/4/23 17:44, Peter Maydell wrote:
>>> IIUC tcg_enabled(), this guard shouldn't be necessary; if CONFIG_TCG
>>> is not defined, tcg_enabled() evaluates to 0, and the compiler should
>>> elide the whole block.
>> 
>> IME it's a bit optimistic to assume that the compiler will always
>> do that, especially with no optimisation enabled.
>
> There's plenty of other places that we do.
> The compiler is usually pretty good with "if (0)".
>
> My question is if
>
>>   if (arm_feature(env, ARM_FEATURE_M) && tcg_enabled()) { 
>
> needs to be written
>
>  if (tcg_enabled()) {
>  if (arm_feature(..., M) {
> ...
>  }
>  }

Yeah, that doesn't work either. I don't understand why in this
particular case the compiler seems unable to remove that code.

Can anyone else reproduce this or is it just happening on my setup?
Maybe something is broken on my side...
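
For comparison, an explicit preprocessor guard does not depend on the compiler eliminating dead code. The sketch below is a simplified illustration, not the patch: the `tcg_enabled()` macro here is a hypothetical two-state version, and `m_profile_reg_count()` and `gdb_extra_regs()` are invented names.

```c
#include <assert.h>

/* Hypothetical build switch; the real tcg_enabled() lives in QEMU headers. */
#ifdef CONFIG_TCG
#define tcg_enabled() 1
int m_profile_reg_count(void);   /* only built into TCG-enabled binaries */
#else
#define tcg_enabled() 0
#endif

/*
 * Returns the number of extra M-profile registers to expose.  Without
 * CONFIG_TCG the call below is never seen by the compiler, so no
 * undefined reference can be emitted even at -O0.
 */
static int gdb_extra_regs(int is_m_profile)
{
    assert(is_m_profile == 0 || is_m_profile == 1);
#ifdef CONFIG_TCG
    if (is_m_profile && tcg_enabled()) {
        return m_profile_reg_count();
    }
#endif
    return 0;
}
```

The trade-off is the usual one: the `#ifdef` form always links, while the `if (tcg_enabled())` form keeps both branches type-checked but relies on the optimizer to drop the call.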

Re: [PATCH 2/4] QGA VSS: Replace 'fprintf(stderr' with PRINT_DEBUG


On 5/7/23 16:12, Konstantin Kostiuk wrote:

Signed-off-by: Konstantin Kostiuk 
---
  qga/vss-win32/install.cpp   | 13 +++--
  qga/vss-win32/requester.cpp |  9 +
  2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/qga/vss-win32/install.cpp b/qga/vss-win32/install.cpp
index ff93b08a9e..c10a397e51 100644
--- a/qga/vss-win32/install.cpp
+++ b/qga/vss-win32/install.cpp
@@ -13,6 +13,7 @@
  #include "qemu/osdep.h"
  
  #include "vss-common.h"

+#include "vss-debug.h"
  #ifdef HAVE_VSS_SDK
  #include 
  #else
@@ -54,7 +55,7 @@ void errmsg(DWORD err, const char *text)
FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
NULL, err, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
(char *)&msg, 0, NULL);
-fprintf(stderr, "%.*s. (Error: %lx) %s\n", len, text, err, msg);
+PRINT_DEBUG("%.*s. (Error: %lx) %s\n", len, text, err, msg);


PRINT_DEBUG() ends up calling fprintf(stderr)...

Re: [PATCH 1/4] QGA VSS: Add wrapper to send log to debugger and stderr


Hi Konstantin,

On 5/7/23 16:12, Konstantin Kostiuk wrote:

Signed-off-by: Konstantin Kostiuk 
---
  qga/vss-win32/vss-debug.h | 31 +++
  1 file changed, 31 insertions(+)
  create mode 100644 qga/vss-win32/vss-debug.h




+#define PRINT_DEBUG(fmt, ...) {   \
+char user_sting[512] = { 0 }; \
+char full_string[640] = { 0 };\
+snprintf(user_sting, 512, fmt, ## __VA_ARGS__);   \
+snprintf(full_string, 640, QGA_PROVIDER_NAME"[%lu]: %s %s\n", \
+GetCurrentThreadId(), __func__, user_sting);  \
+OutputDebugString(full_string);   \
+fprintf(stderr, "%s", full_string);   \
+}


Why not simply use a plain function?


+#define PRINT_DEBUG_BEGIN PRINT_DEBUG("begin")
+#define PRINT_DEBUG_END PRINT_DEBUG("end")
+
+#endif
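
A plain-function version of the logging wrapper could look roughly like this. It is a portable sketch rather than a drop-in replacement: `DEBUG_PREFIX` is a placeholder for QGA_PROVIDER_NAME, the thread id is passed in instead of calling GetCurrentThreadId(), the OutputDebugString() call is left as a comment, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEBUG_PREFIX "provider"  /* placeholder for QGA_PROVIDER_NAME */

/* Builds "<prefix>[<tid>]: <func> <message>\n" into 'out'. */
static int vformat_debug(char *out, size_t out_size, unsigned long thread_id,
                         const char *func, const char *fmt, va_list ap)
{
    char user_string[512];

    assert(out != NULL && out_size > 0);
    vsnprintf(user_string, sizeof(user_string), fmt, ap);
    return snprintf(out, out_size, DEBUG_PREFIX "[%lu]: %s %s\n",
                    thread_id, func, user_string);
}

/* Variadic front end that only formats; handy for callers and tests. */
static int format_debug(char *out, size_t out_size, unsigned long thread_id,
                        const char *func, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vformat_debug(out, out_size, thread_id, func, fmt, ap);
    va_end(ap);
    return len;
}

/* Formats the message and writes it to stderr. */
static void print_debug(unsigned long thread_id, const char *func,
                        const char *fmt, ...)
{
    char full_string[640];
    va_list ap;

    va_start(ap, fmt);
    vformat_debug(full_string, sizeof(full_string), thread_id, func, fmt, ap);
    va_end(ap);
    /* The Windows build would also pass full_string to OutputDebugString(). */
    fputs(full_string, stderr);
}

/* A thin macro is still needed to capture the caller's __func__. */
#define PRINT_DEBUG(tid, ...) print_debug((tid), __func__, __VA_ARGS__)
```

The buffers then live in one function instead of being expanded at every call site, and the macro shrinks to a single call.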

Re: [PATCH v2 06/14] ppc440: Stop using system io region for PCIe buses


On 5/7/23 22:12, BALATON Zoltan wrote:

Add separate memory regions for the mem and io spaces of the PCIe bus
to avoid different buses using the same system io region.

Signed-off-by: BALATON Zoltan 
---
  hw/ppc/ppc440_uc.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 13/14] ppc440_pcix: Don't use iomem for regs


On 5/7/23 22:12, BALATON Zoltan wrote:

The iomem memory region is better used for the PCI IO space but is
currently used for registers. Stop using it for that to allow this to
be cleaned up in the next patch.

Signed-off-by: BALATON Zoltan 
---
  hw/ppc/ppc440_pcix.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v3] kconfig: Add PCIe devices to s390x machines


On 5/7/23 17:23, Cédric Le Goater wrote:

It is useful to extend the number of available PCI devices to KVM guests
for passthrough scenarios and also to expose these models to a different
(big endian) architecture. Include models for Intel Ethernet adapters
and one USB controller, which all support MSI-X. Devices only supporting
INTx won't work on s390x.

Signed-off-by: Cédric Le Goater 
---

  v3: PCI -> PCI_EXPRESS
  v2: select -> imply
   
  hw/s390x/Kconfig | 5 -

  1 file changed, 4 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 00/14] PPC440 devices misc clean up


On Wed, 5 Jul 2023, BALATON Zoltan wrote:

These are some small misc clean ups to PPC440 related device models
which is all I have ready for now.


Sorry, there was a typo in the email addresses in cc. Should I send it
again, or can you pick it up from the list?


Regards,
BALATON Zoltan


v2:
- Added R-b tags from Philippe
- Addressed review comments
- Added new patch to rename parent field of PPC460EXPCIEState to parent_obj

Patches needing review: 6 7 10-13

BALATON Zoltan (14):
 ppc440: Change ppc460ex_pcie_init() parameter type
 ppc440: Add cpu link property to PCIe controller model
 ppc440: Add a macro to shorten PCIe controller DCR registration
 ppc440: Rename parent field of PPC460EXPCIEState to match code style
 ppc440: Rename local variable in dcr_read_pcie()
 ppc440: Stop using system io region for PCIe buses
 ppc/sam460ex: Remove address_space_mem local variable
 ppc440: Add busnum property to PCIe controller model
 ppc440: Remove ppc460ex_pcie_init legacy init function
 ppc4xx_pci: Rename QOM type name define
 ppc4xx_pci: Add define for ppc4xx-host-bridge type name
 ppc440_pcix: Rename QOM type define and move it to common header
 ppc440_pcix: Don't use iomem for regs
 ppc440_pcix: Stop using system io region for PCI bus

hw/ppc/ppc440.h |   1 -
hw/ppc/ppc440_bamboo.c  |   3 +-
hw/ppc/ppc440_pcix.c|  28 +++---
hw/ppc/ppc440_uc.c  | 192 +---
hw/ppc/ppc4xx_pci.c |  10 +--
hw/ppc/sam460ex.c   |  33 ---
include/hw/ppc/ppc4xx.h |   5 +-
7 files changed, 129 insertions(+), 143 deletions(-)

[PATCH v2 04/14] ppc440: Rename parent field of PPC460EXPCIEState to match code style

QOM prefers to call the parent field parent_obj, change
PPC460EXPCIEState to match that convention.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/ppc/ppc440_uc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index b36dc409d7..22c74839ae 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -774,7 +774,7 @@ void ppc4xx_dma_init(CPUPPCState *env, int dcr_base)
 OBJECT_DECLARE_SIMPLE_TYPE(PPC460EXPCIEState, PPC460EX_PCIE_HOST)
 
 struct PPC460EXPCIEState {
-PCIExpressHost host;
+PCIExpressHost parent_obj;
 
 MemoryRegion iomem;
 qemu_irq irq[4];
-- 
2.30.9
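
As background for the naming convention, the sketch below shows why the parent is embedded as the first member: a pointer to the derived state is also a valid pointer to each parent. The types and the `BOARD_PCIE()` helper are hypothetical and only mimic the general shape of QOM casts; they are not QEMU's macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature object model, loosely following QOM naming. */
typedef struct Object {
    const char *type_name;
} Object;

typedef struct PCIEHost {
    Object parent_obj;           /* parent must be the first member */
    int bus_nr;
} PCIEHost;

typedef struct BoardPCIEState {
    PCIEHost parent_obj;         /* same field name at every level */
    int dcrn_base;
} BoardPCIEState;

#define TYPE_BOARD_PCIE "board-pcie-host"

/* Checked downcast from the base object to the device state. */
static BoardPCIEState *BOARD_PCIE(Object *obj)
{
    assert(obj != NULL && strcmp(obj->type_name, TYPE_BOARD_PCIE) == 0);
    return (BoardPCIEState *)obj;
}
```

Using one field name everywhere also makes accidental direct accesses to the parent easy to spot in review.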

[PATCH v2 09/14] ppc440: Remove ppc460ex_pcie_init legacy init function

After previous changes we can now remove the legacy init function and
move the device creation to board code.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/ppc/ppc440.h |  1 -
 hw/ppc/ppc440_uc.c  | 21 -
 hw/ppc/sam460ex.c   | 17 -
 include/hw/ppc/ppc4xx.h |  1 +
 4 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/hw/ppc/ppc440.h b/hw/ppc/ppc440.h
index ae42bcf0c8..909373fb38 100644
--- a/hw/ppc/ppc440.h
+++ b/hw/ppc/ppc440.h
@@ -18,6 +18,5 @@ void ppc4xx_cpr_init(CPUPPCState *env);
 void ppc4xx_sdr_init(CPUPPCState *env);
 void ppc4xx_ahb_init(CPUPPCState *env);
 void ppc4xx_dma_init(CPUPPCState *env, int dcr_base);
-void ppc460ex_pcie_init(PowerPCCPU *cpu);
 
 #endif /* PPC440_H */
diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index b74b2212fa..4181c843a8 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -770,7 +770,6 @@ void ppc4xx_dma_init(CPUPPCState *env, int dcr_base)
  */
 #include "hw/pci/pcie_host.h"
 
-#define TYPE_PPC460EX_PCIE_HOST "ppc460ex-pcie-host"
 OBJECT_DECLARE_SIMPLE_TYPE(PPC460EXPCIEState, PPC460EX_PCIE_HOST)
 
 struct PPC460EXPCIEState {
@@ -799,9 +798,6 @@ struct PPC460EXPCIEState {
 uint32_t cfg;
 };
 
-#define DCRN_PCIE0_BASE 0x100
-#define DCRN_PCIE1_BASE 0x120
-
 enum {
 PEGPL_CFGBAH = 0x0,
 PEGPL_CFGBAL,
@@ -1096,20 +1092,3 @@ static void ppc460ex_pcie_register(void)
 }
 
 type_init(ppc460ex_pcie_register)
-
-void ppc460ex_pcie_init(PowerPCCPU *cpu)
-{
-DeviceState *dev;
-
-dev = qdev_new(TYPE_PPC460EX_PCIE_HOST);
-qdev_prop_set_int32(dev, "busnum", 0);
-qdev_prop_set_int32(dev, "dcrn-base", DCRN_PCIE0_BASE);
-object_property_set_link(OBJECT(dev), "cpu", OBJECT(cpu), &error_abort);
-sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
-
-dev = qdev_new(TYPE_PPC460EX_PCIE_HOST);
-qdev_prop_set_int32(dev, "busnum", 1);
-qdev_prop_set_int32(dev, "dcrn-base", DCRN_PCIE1_BASE);
-object_property_set_link(OBJECT(dev), "cpu", OBJECT(cpu), &error_abort);
-sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
-}
diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index f098226974..d446cfc37b 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -45,6 +45,9 @@
 /* dd bs=1 skip=$(($(stat -c '%s' updater/updater-460) - 0x8)) \
  if=updater/updater-460 of=u-boot-sam460-20100605.bin */
 
+#define PCIE0_DCRN_BASE 0x100
+#define PCIE1_DCRN_BASE 0x120
+
 /* from Sam460 U-Boot include/configs/Sam460ex.h */
 #define FLASH_BASE 0xfff0
 #define FLASH_BASE_H   0x4
@@ -421,8 +424,20 @@ static void sam460ex_init(MachineState *machine)
 usb_create_simple(usb_bus_find(-1), "usb-kbd");
 usb_create_simple(usb_bus_find(-1), "usb-mouse");
 
+/* PCIe buses */
+dev = qdev_new(TYPE_PPC460EX_PCIE_HOST);
+qdev_prop_set_int32(dev, "busnum", 0);
+qdev_prop_set_int32(dev, "dcrn-base", PCIE0_DCRN_BASE);
+object_property_set_link(OBJECT(dev), "cpu", OBJECT(cpu), &error_abort);
+sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+
+dev = qdev_new(TYPE_PPC460EX_PCIE_HOST);
+qdev_prop_set_int32(dev, "busnum", 1);
+qdev_prop_set_int32(dev, "dcrn-base", PCIE1_DCRN_BASE);
+object_property_set_link(OBJECT(dev), "cpu", OBJECT(cpu), &error_abort);
+sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+
 /* PCI bus */
-ppc460ex_pcie_init(cpu);
 /* All PCI irqs are connected to the same UIC pin (cf. UBoot source) */
 dev = sysbus_create_simple("ppc440-pcix-host", 0xc0ec0,
qdev_get_gpio_in(uic[1], 0));
diff --git a/include/hw/ppc/ppc4xx.h b/include/hw/ppc/ppc4xx.h
index f8c86e09ec..39ca602442 100644
--- a/include/hw/ppc/ppc4xx.h
+++ b/include/hw/ppc/ppc4xx.h
@@ -30,6 +30,7 @@
 #include "hw/sysbus.h"
 
 #define TYPE_PPC4xx_PCI_HOST_BRIDGE "ppc4xx-pcihost"
+#define TYPE_PPC460EX_PCIE_HOST "ppc460ex-pcie-host"
 
 /*
  * Generic DCR device
-- 
2.30.9

[PATCH v2 11/14] ppc4xx_pci: Add define for ppc4xx-host-bridge type name

Add a QOM type name define for ppc4xx-host-bridge in the common header
and replace direct use of the string name with the constant.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/ppc440_pcix.c| 3 ++-
 hw/ppc/ppc4xx_pci.c | 4 ++--
 include/hw/ppc/ppc4xx.h | 1 +
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/ppc440_pcix.c b/hw/ppc/ppc440_pcix.c
index f10f93c533..dfec25ac83 100644
--- a/hw/ppc/ppc440_pcix.c
+++ b/hw/ppc/ppc440_pcix.c
@@ -495,7 +495,8 @@ static void ppc440_pcix_realize(DeviceState *dev, Error 
**errp)
  ppc440_pcix_map_irq, &s->irq, &s->busmem,
  get_system_io(), PCI_DEVFN(0, 0), 1, TYPE_PCI_BUS);
 
-s->dev = pci_create_simple(h->bus, PCI_DEVFN(0, 0), "ppc4xx-host-bridge");
+s->dev = pci_create_simple(h->bus, PCI_DEVFN(0, 0),
+   TYPE_PPC4xx_HOST_BRIDGE);
 
 memory_region_init(&s->bm, OBJECT(s), "bm-ppc440-pcix", UINT64_MAX);
 memory_region_add_subregion(&s->bm, 0x0, &s->busmem);
diff --git a/hw/ppc/ppc4xx_pci.c b/hw/ppc/ppc4xx_pci.c
index fbdf8266d8..6652119008 100644
--- a/hw/ppc/ppc4xx_pci.c
+++ b/hw/ppc/ppc4xx_pci.c
@@ -333,7 +333,7 @@ static void ppc4xx_pcihost_realize(DeviceState *dev, Error 
**errp)
   TYPE_PCI_BUS);
 h->bus = b;
 
-pci_create_simple(b, 0, "ppc4xx-host-bridge");
+pci_create_simple(b, 0, TYPE_PPC4xx_HOST_BRIDGE);
 
 /* XXX split into 2 memory regions, one for config space, one for regs */
 memory_region_init(>container, OBJECT(s), "pci-container", 
PCI_ALL_SIZE);
@@ -367,7 +367,7 @@ static void ppc4xx_host_bridge_class_init(ObjectClass 
*klass, void *data)
 }
 
 static const TypeInfo ppc4xx_host_bridge_info = {
-.name  = "ppc4xx-host-bridge",
+.name  = TYPE_PPC4xx_HOST_BRIDGE,
 .parent= TYPE_PCI_DEVICE,
 .instance_size = sizeof(PCIDevice),
 .class_init= ppc4xx_host_bridge_class_init,
diff --git a/include/hw/ppc/ppc4xx.h b/include/hw/ppc/ppc4xx.h
index e053b9751b..766d575e86 100644
--- a/include/hw/ppc/ppc4xx.h
+++ b/include/hw/ppc/ppc4xx.h
@@ -29,6 +29,7 @@
 #include "exec/memory.h"
 #include "hw/sysbus.h"
 
+#define TYPE_PPC4xx_HOST_BRIDGE "ppc4xx-host-bridge"
 #define TYPE_PPC4xx_PCI_HOST "ppc4xx-pci-host"
 #define TYPE_PPC460EX_PCIE_HOST "ppc460ex-pcie-host"
 
-- 
2.30.9

[PATCH v2 03/14] ppc440: Add a macro to shorten PCIe controller DCR registration

It is shorter and more readable to wrap the complex call to
ppc_dcr_register() in a macro than to repeat it several times.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/ppc/ppc440_uc.c | 76 +-
 1 file changed, 28 insertions(+), 48 deletions(-)

diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index b26c0cee1b..b36dc409d7 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -1002,56 +1002,36 @@ static void ppc460ex_set_irq(void *opaque, int irq_num, 
int level)
qemu_set_irq(s->irq[irq_num], level);
 }
 
+#define PPC440_PCIE_DCR(s, dcrn) \
+ppc_dcr_register(&(s)->cpu->env, (s)->dcrn_base + (dcrn), (s), \
+ &dcr_read_pcie, &dcr_write_pcie)
+
+
 static void ppc460ex_pcie_register_dcrs(PPC460EXPCIEState *s)
 {
-CPUPPCState *env = &s->cpu->env;
-
-ppc_dcr_register(env, s->dcrn_base + PEGPL_CFGBAH, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_CFGBAL, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_CFGMSK, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_MSGBAH, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_MSGBAL, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_MSGMSK, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR1BAH, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR1BAL, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR1MSKH, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR1MSKL, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR2BAH, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR2BAL, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR2MSKH, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR2MSKL, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR3BAH, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR3BAL, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR3MSKH, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR3MSKL, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_REGBAH, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_REGBAL, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_REGMSK, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_SPECIAL, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_CFG, s,
- &dcr_read_pcie, &dcr_write_pcie);
+PPC440_PCIE_DCR(s, PEGPL_CFGBAH);
+PPC440_PCIE_DCR(s, PEGPL_CFGBAL);
+PPC440_PCIE_DCR(s, PEGPL_CFGMSK);
+PPC440_PCIE_DCR(s, PEGPL_MSGBAH);
+PPC440_PCIE_DCR(s, PEGPL_MSGBAL);
+PPC440_PCIE_DCR(s, PEGPL_MSGMSK);
+PPC440_PCIE_DCR(s, PEGPL_OMR1BAH);
+PPC440_PCIE_DCR(s, PEGPL_OMR1BAL);
+PPC440_PCIE_DCR(s, PEGPL_OMR1MSKH);
+PPC440_PCIE_DCR(s, PEGPL_OMR1MSKL);
+PPC440_PCIE_DCR(s, PEGPL_OMR2BAH);
+PPC440_PCIE_DCR(s, PEGPL_OMR2BAL);
+PPC440_PCIE_DCR(s, PEGPL_OMR2MSKH);
+PPC440_PCIE_DCR(s, PEGPL_OMR2MSKL);
+PPC440_PCIE_DCR(s, PEGPL_OMR3BAH);
+PPC440_PCIE_DCR(s, PEGPL_OMR3BAL);
+PPC440_PCIE_DCR(s, PEGPL_OMR3MSKH);
+PPC440_PCIE_DCR(s, PEGPL_OMR3MSKL);
+PPC440_PCIE_DCR(s, PEGPL_REGBAH);
+PPC440_PCIE_DCR(s, PEGPL_REGBAL);
+PPC440_PCIE_DCR(s, PEGPL_REGMSK);
+PPC440_PCIE_DCR(s, PEGPL_SPECIAL);
+PPC440_PCIE_DCR(s, PEGPL_CFG);
 }
 
 static void ppc460ex_pcie_realize(DeviceState *dev, Error **errp)
-- 
2.30.9
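
The pattern used in this patch, a small macro that fills in the arguments common to every registration call, can be tried out in isolation with the sketch below. Everything in it is hypothetical (`DcrTable`, `dcr_register()`, `DemoDev`, `DEMO_DCR()`); it only mirrors the shape of PPC440_PCIE_DCR() and is not the QEMU DCR API.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DCRS 16

typedef uint32_t (*DcrReadFn)(void *opaque, int dcrn);

typedef struct DcrEntry {
    int dcrn;
    void *opaque;
    DcrReadFn read;
} DcrEntry;

/* Hypothetical registry standing in for the CPU's DCR table. */
typedef struct DcrTable {
    DcrEntry entry[MAX_DCRS];
    int count;
} DcrTable;

enum { REG_BASE_HI, REG_BASE_LO, REG_MASK, REG_COUNT };

typedef struct DemoDev {
    DcrTable *dcrs;
    int dcrn_base;
    uint32_t regs[REG_COUNT];
} DemoDev;

static void dcr_register(DcrTable *t, int dcrn, void *opaque, DcrReadFn read)
{
    assert(t->count < MAX_DCRS);
    t->entry[t->count++] = (DcrEntry){ dcrn, opaque, read };
}

static uint32_t demo_dcr_read(void *opaque, int dcrn)
{
    DemoDev *d = opaque;

    return d->regs[dcrn - d->dcrn_base];
}

/* Only the register offset varies between calls; the macro supplies the rest. */
#define DEMO_DCR(d, dcrn) \
    dcr_register((d)->dcrs, (d)->dcrn_base + (dcrn), (d), demo_dcr_read)

static void demo_register_dcrs(DemoDev *d)
{
    DEMO_DCR(d, REG_BASE_HI);
    DEMO_DCR(d, REG_BASE_LO);
    DEMO_DCR(d, REG_MASK);
}
```

A loop over the enum would be another option when the offsets are contiguous; the macro form keeps each register name visible and greppable.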

[PATCH v2 02/14] ppc440: Add cpu link property to PCIe controller model

The PCIe controller model uses PPC DCRs but cannot be modeled with
TYPE_PPC4xx_DCR_DEVICE as it derives from TYPE_PCIE_HOST_BRIDGE. Add a
cpu link property to it similar to other DCR devices to allow
registering DCRs from the device model.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/ppc/ppc440_uc.c | 114 -
 1 file changed, 62 insertions(+), 52 deletions(-)

diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index 8eb985d714..b26c0cee1b 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -779,6 +779,7 @@ struct PPC460EXPCIEState {
 MemoryRegion iomem;
 qemu_irq irq[4];
 int32_t dcrn_base;
+PowerPCCPU *cpu;
 
 uint64_t cfg_base;
 uint32_t cfg_mask;
@@ -1001,6 +1002,58 @@ static void ppc460ex_set_irq(void *opaque, int irq_num, 
int level)
qemu_set_irq(s->irq[irq_num], level);
 }
 
+static void ppc460ex_pcie_register_dcrs(PPC460EXPCIEState *s)
+{
+CPUPPCState *env = &s->cpu->env;
+
+ppc_dcr_register(env, s->dcrn_base + PEGPL_CFGBAH, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_CFGBAL, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_CFGMSK, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_MSGBAH, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_MSGBAL, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_MSGMSK, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR1BAH, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR1BAL, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR1MSKH, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR1MSKL, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR2BAH, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR2BAL, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR2MSKH, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR2MSKL, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR3BAH, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR3BAL, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR3MSKH, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_OMR3MSKL, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_REGBAH, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_REGBAL, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_REGMSK, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_SPECIAL, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+ppc_dcr_register(env, s->dcrn_base + PEGPL_CFG, s,
+ &dcr_read_pcie, &dcr_write_pcie);
+}
+
 static void ppc460ex_pcie_realize(DeviceState *dev, Error **errp)
 {
 PPC460EXPCIEState *s = PPC460EX_PCIE_HOST(dev);
@@ -1008,6 +1061,10 @@ static void ppc460ex_pcie_realize(DeviceState *dev, 
Error **errp)
 int i, id;
 char buf[16];
 
+if (!s->cpu) {
+error_setg(errp, "cpu link property must be set");
+return;
+}
 switch (s->dcrn_base) {
 case DCRN_PCIE0_BASE:
 id = 0;
@@ -1028,10 +1085,13 @@ static void ppc460ex_pcie_realize(DeviceState *dev, 
Error **errp)
 pci->bus = pci_register_root_bus(DEVICE(s), buf, ppc460ex_set_irq,
 pci_swizzle_map_irq_fn, s, &s->iomem,
 get_system_io(), 0, 4, TYPE_PCIE_BUS);
+ppc460ex_pcie_register_dcrs(s);
 }
 
 static Property ppc460ex_pcie_props[] = {
 DEFINE_PROP_INT32("dcrn-base", PPC460EXPCIEState, dcrn_base, -1),
+DEFINE_PROP_LINK("cpu", PPC460EXPCIEState, cpu, TYPE_POWERPC_CPU,
+ PowerPCCPU *),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1059,67 +1119,17 @@ static void ppc460ex_pcie_register(void)
 
 type_init(ppc460ex_pcie_register)
 
-static void ppc460ex_pcie_register_dcrs(PPC460EXPCIEState *s, CPUPPCState *env)
-{
-ppc_dcr_register(env, s->dcrn_base + PEGPL_CFGBAH, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_CFGBAL, s,
- &dcr_read_pcie, &dcr_write_pcie);
-ppc_dcr_register(env, s->dcrn_base + PEGPL_CFGMSK, s,
-

[PATCH v2 07/14] ppc/sam460ex: Remove address_space_mem local variable

Some places already use get_system_memory() directly so replace the
remaining uses and drop the local variable.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/sam460ex.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index aaa8d2f4a5..f098226974 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -266,7 +266,6 @@ static void main_cpu_reset(void *opaque)
 
 static void sam460ex_init(MachineState *machine)
 {
-MemoryRegion *address_space_mem = get_system_memory();
 MemoryRegion *isa = g_new(MemoryRegion, 1);
 MemoryRegion *l2cache_ram = g_new(MemoryRegion, 1);
 DeviceState *uic[4];
@@ -406,7 +405,8 @@ static void sam460ex_init(MachineState *machine)
 /* FIXME: remove this after fixing l2sram mapping in ppc440_uc.c? */
 memory_region_init_ram(l2cache_ram, NULL, "ppc440.l2cache_ram", 256 * KiB,
 &error_abort);
-memory_region_add_subregion(address_space_mem, 0x4LL, l2cache_ram);
+memory_region_add_subregion(get_system_memory(), 0x4LL,
+l2cache_ram);
 
 /* USB */
 sysbus_create_simple(TYPE_PPC4xx_EHCI, 0x4bffd0400,
@@ -444,13 +444,13 @@ static void sam460ex_init(MachineState *machine)
 /* SoC has 4 UARTs
  * but board has only one wired and two are present in fdt */
 if (serial_hd(0) != NULL) {
-serial_mm_init(address_space_mem, 0x4ef600300, 0,
+serial_mm_init(get_system_memory(), 0x4ef600300, 0,
qdev_get_gpio_in(uic[1], 1),
PPC_SERIAL_MM_BAUDBASE, serial_hd(0),
DEVICE_BIG_ENDIAN);
 }
 if (serial_hd(1) != NULL) {
-serial_mm_init(address_space_mem, 0x4ef600400, 0,
+serial_mm_init(get_system_memory(), 0x4ef600400, 0,
qdev_get_gpio_in(uic[0], 1),
PPC_SERIAL_MM_BAUDBASE, serial_hd(1),
DEVICE_BIG_ENDIAN);
-- 
2.30.9

[PATCH v2 10/14] ppc4xx_pci: Rename QOM type name define

Rename the TYPE_PPC4xx_PCI_HOST_BRIDGE define and its string value to
match each other and other similar types and to avoid confusion with
"ppc4xx-host-bridge" type defined in the same file.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/ppc440_bamboo.c  | 3 +--
 hw/ppc/ppc4xx_pci.c | 6 +++---
 include/hw/ppc/ppc4xx.h | 2 +-
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/ppc440_bamboo.c b/hw/ppc/ppc440_bamboo.c
index f061b8cf3b..45f409c838 100644
--- a/hw/ppc/ppc440_bamboo.c
+++ b/hw/ppc/ppc440_bamboo.c
@@ -205,8 +205,7 @@ static void bamboo_init(MachineState *machine)
 ppc4xx_sdram_ddr_enable(PPC4xx_SDRAM_DDR(dev));
 
 /* PCI */
-dev = sysbus_create_varargs(TYPE_PPC4xx_PCI_HOST_BRIDGE,
-PPC440EP_PCI_CONFIG,
+dev = sysbus_create_varargs(TYPE_PPC4xx_PCI_HOST, PPC440EP_PCI_CONFIG,
 qdev_get_gpio_in(uicdev, pci_irq_nrs[0]),
 qdev_get_gpio_in(uicdev, pci_irq_nrs[1]),
 qdev_get_gpio_in(uicdev, pci_irq_nrs[2]),
diff --git a/hw/ppc/ppc4xx_pci.c b/hw/ppc/ppc4xx_pci.c
index 1d4a50fa7c..fbdf8266d8 100644
--- a/hw/ppc/ppc4xx_pci.c
+++ b/hw/ppc/ppc4xx_pci.c
@@ -46,7 +46,7 @@ struct PCITargetMap {
 uint32_t la;
 };
 
-OBJECT_DECLARE_SIMPLE_TYPE(PPC4xxPCIState, PPC4xx_PCI_HOST_BRIDGE)
+OBJECT_DECLARE_SIMPLE_TYPE(PPC4xxPCIState, PPC4xx_PCI_HOST)
 
 #define PPC4xx_PCI_NR_PMMS 3
 #define PPC4xx_PCI_NR_PTMS 2
@@ -321,7 +321,7 @@ static void ppc4xx_pcihost_realize(DeviceState *dev, Error 
**errp)
 int i;
 
 h = PCI_HOST_BRIDGE(dev);
-s = PPC4xx_PCI_HOST_BRIDGE(dev);
+s = PPC4xx_PCI_HOST(dev);
 
 for (i = 0; i < ARRAY_SIZE(s->irq); i++) {
 sysbus_init_irq(sbd, &s->irq[i]);
@@ -386,7 +386,7 @@ static void ppc4xx_pcihost_class_init(ObjectClass *klass, 
void *data)
 }
 
 static const TypeInfo ppc4xx_pcihost_info = {
-.name  = TYPE_PPC4xx_PCI_HOST_BRIDGE,
+.name  = TYPE_PPC4xx_PCI_HOST,
 .parent= TYPE_PCI_HOST_BRIDGE,
 .instance_size = sizeof(PPC4xxPCIState),
 .class_init= ppc4xx_pcihost_class_init,
diff --git a/include/hw/ppc/ppc4xx.h b/include/hw/ppc/ppc4xx.h
index 39ca602442..e053b9751b 100644
--- a/include/hw/ppc/ppc4xx.h
+++ b/include/hw/ppc/ppc4xx.h
@@ -29,7 +29,7 @@
 #include "exec/memory.h"
 #include "hw/sysbus.h"
 
-#define TYPE_PPC4xx_PCI_HOST_BRIDGE "ppc4xx-pcihost"
+#define TYPE_PPC4xx_PCI_HOST "ppc4xx-pci-host"
 #define TYPE_PPC460EX_PCIE_HOST "ppc460ex-pcie-host"
 
 /*
-- 
2.30.9

[PATCH v2 13/14] ppc440_pcix: Don't use iomem for regs

The iomem memory region is better used for the PCI IO space but is
currently used for registers. Stop using it for that to allow this to
be cleaned up in the next patch.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/ppc440_pcix.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/ppc440_pcix.c b/hw/ppc/ppc440_pcix.c
index adfecf1e76..cf932e4b25 100644
--- a/hw/ppc/ppc440_pcix.c
+++ b/hw/ppc/ppc440_pcix.c
@@ -63,6 +63,7 @@ struct PPC440PCIXState {
 MemoryRegion container;
 MemoryRegion iomem;
 MemoryRegion busmem;
+MemoryRegion regs;
 };
 
 #define PPC440_REG_BASE 0x8
@@ -507,11 +508,11 @@ static void ppc440_pcix_realize(DeviceState *dev, Error 
**errp)
   h, "pci-conf-idx", 4);
 memory_region_init_io(&h->data_mem, OBJECT(s), &pci_host_data_le_ops,
   h, "pci-conf-data", 4);
-memory_region_init_io(&s->iomem, OBJECT(s), &pci_reg_ops, s,
-  "pci.reg", PPC440_REG_SIZE);
+memory_region_init_io(&s->regs, OBJECT(s), &pci_reg_ops, s, "pci-reg",
+  PPC440_REG_SIZE);
 memory_region_add_subregion(&s->container, PCIC0_CFGADDR, &h->conf_mem);
 memory_region_add_subregion(&s->container, PCIC0_CFGDATA, &h->data_mem);
-memory_region_add_subregion(&s->container, PPC440_REG_BASE, &s->iomem);
+memory_region_add_subregion(&s->container, PPC440_REG_BASE, &s->regs);
 sysbus_init_mmio(sbd, &s->container);
 }
 
-- 
2.30.9

[PATCH v2 05/14] ppc440: Rename local variable in dcr_read_pcie()

Rename local variable storing state struct in dcr_read_pcie() for
brevity and consistency with other functions.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/ppc/ppc440_uc.c | 50 +++---
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index 22c74839ae..5724db2702 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -828,78 +828,78 @@ enum {
 
 static uint32_t dcr_read_pcie(void *opaque, int dcrn)
 {
-PPC460EXPCIEState *state = opaque;
+PPC460EXPCIEState *s = opaque;
 uint32_t ret = 0;
 
-switch (dcrn - state->dcrn_base) {
+switch (dcrn - s->dcrn_base) {
 case PEGPL_CFGBAH:
-ret = state->cfg_base >> 32;
+ret = s->cfg_base >> 32;
 break;
 case PEGPL_CFGBAL:
-ret = state->cfg_base;
+ret = s->cfg_base;
 break;
 case PEGPL_CFGMSK:
-ret = state->cfg_mask;
+ret = s->cfg_mask;
 break;
 case PEGPL_MSGBAH:
-ret = state->msg_base >> 32;
+ret = s->msg_base >> 32;
 break;
 case PEGPL_MSGBAL:
-ret = state->msg_base;
+ret = s->msg_base;
 break;
 case PEGPL_MSGMSK:
-ret = state->msg_mask;
+ret = s->msg_mask;
 break;
 case PEGPL_OMR1BAH:
-ret = state->omr1_base >> 32;
+ret = s->omr1_base >> 32;
 break;
 case PEGPL_OMR1BAL:
-ret = state->omr1_base;
+ret = s->omr1_base;
 break;
 case PEGPL_OMR1MSKH:
-ret = state->omr1_mask >> 32;
+ret = s->omr1_mask >> 32;
 break;
 case PEGPL_OMR1MSKL:
-ret = state->omr1_mask;
+ret = s->omr1_mask;
 break;
 case PEGPL_OMR2BAH:
-ret = state->omr2_base >> 32;
+ret = s->omr2_base >> 32;
 break;
 case PEGPL_OMR2BAL:
-ret = state->omr2_base;
+ret = s->omr2_base;
 break;
 case PEGPL_OMR2MSKH:
-ret = state->omr2_mask >> 32;
+ret = s->omr2_mask >> 32;
 break;
 case PEGPL_OMR2MSKL:
-ret = state->omr3_mask;
+ret = s->omr3_mask;
 break;
 case PEGPL_OMR3BAH:
-ret = state->omr3_base >> 32;
+ret = s->omr3_base >> 32;
 break;
 case PEGPL_OMR3BAL:
-ret = state->omr3_base;
+ret = s->omr3_base;
 break;
 case PEGPL_OMR3MSKH:
-ret = state->omr3_mask >> 32;
+ret = s->omr3_mask >> 32;
 break;
 case PEGPL_OMR3MSKL:
-ret = state->omr3_mask;
+ret = s->omr3_mask;
 break;
 case PEGPL_REGBAH:
-ret = state->reg_base >> 32;
+ret = s->reg_base >> 32;
 break;
 case PEGPL_REGBAL:
-ret = state->reg_base;
+ret = s->reg_base;
 break;
 case PEGPL_REGMSK:
-ret = state->reg_mask;
+ret = s->reg_mask;
 break;
 case PEGPL_SPECIAL:
-ret = state->special;
+ret = s->special;
 break;
 case PEGPL_CFG:
-ret = state->cfg;
+ret = s->cfg;
 break;
 }
 
-- 
2.30.9

[PATCH v2 06/14] ppc440: Stop using system io region for PCIe buses

Add separate memory regions for the mem and io spaces of the PCIe bus
to avoid different buses using the same system io region.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/ppc440_uc.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index 5724db2702..663abf3449 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -776,6 +776,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(PPC460EXPCIEState, PPC460EX_PCIE_HOST)
 struct PPC460EXPCIEState {
 PCIExpressHost parent_obj;
 
+MemoryRegion busmem;
 MemoryRegion iomem;
 qemu_irq irq[4];
 int32_t dcrn_base;
@@ -1056,15 +1057,17 @@ static void ppc460ex_pcie_realize(DeviceState *dev, Error **errp)
 error_setg(errp, "invalid PCIe DCRN base");
 return;
 }
+snprintf(buf, sizeof(buf), "pcie%d-mem", id);
+memory_region_init(&s->busmem, OBJECT(s), buf, UINT64_MAX);
 snprintf(buf, sizeof(buf), "pcie%d-io", id);
-memory_region_init(&s->iomem, OBJECT(s), buf, UINT64_MAX);
+memory_region_init(&s->iomem, OBJECT(s), buf, 64 * KiB);
 for (i = 0; i < 4; i++) {
 sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq[i]);
 }
 snprintf(buf, sizeof(buf), "pcie.%d", id);
 pci->bus = pci_register_root_bus(DEVICE(s), buf, ppc460ex_set_irq,
-pci_swizzle_map_irq_fn, s, &s->iomem,
-get_system_io(), 0, 4, TYPE_PCIE_BUS);
+pci_swizzle_map_irq_fn, s, &s->busmem,
+&s->iomem, 0, 4, TYPE_PCIE_BUS);
 ppc460ex_pcie_register_dcrs(s);
 }
 
-- 
2.30.9

[PATCH v2 08/14] ppc440: Add busnum property to PCIe controller model

Instead of guessing controller number from dcrn_base add a property so
the device does not need knowledge about where it is used.
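
A rough, self-contained illustration of the idea (not the QEMU code
itself): both properties default to -1 so that realize can tell when the
board forgot to set them, and the bus number feeds the region names
directly instead of being derived from the DCR base. DemoPCIEState,
demo_pcie_init() and demo_pcie_realize() are hypothetical names used only
for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the device state; not the QEMU struct. */
typedef struct DemoPCIEState {
    int32_t num;        /* plays the role of "busnum", -1 means unset */
    int32_t dcrn_base;  /* plays the role of "dcrn-base", -1 means unset */
    char mem_name[20];
    char io_name[20];
} DemoPCIEState;

/* Apply the property defaults, as DEFINE_PROP_INT32(..., -1) would. */
static void demo_pcie_init(DemoPCIEState *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof(*s));
    s->num = -1;
    s->dcrn_base = -1;
}

/* Returns 0 on success, -1 if a required property was never set. */
static int demo_pcie_realize(DemoPCIEState *s)
{
    assert(s != NULL);
    if (s->num < 0 || s->dcrn_base < 0) {
        return -1;
    }
    /* The names depend only on the bus number the board passed in. */
    snprintf(s->mem_name, sizeof(s->mem_name), "pcie%d-mem", (int)s->num);
    snprintf(s->io_name, sizeof(s->io_name), "pcie%d-io", (int)s->num);
    return 0;
}
```

With this shape the device no longer needs a switch over known DCR base
values; the board sets both numbers explicitly.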

Signed-off-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/ppc/ppc440_uc.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index 663abf3449..b74b2212fa 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -779,6 +779,7 @@ struct PPC460EXPCIEState {
 MemoryRegion busmem;
 MemoryRegion iomem;
 qemu_irq irq[4];
+int32_t num;
 int32_t dcrn_base;
 PowerPCCPU *cpu;
 
@@ -1039,32 +1040,25 @@ static void ppc460ex_pcie_realize(DeviceState *dev, Error **errp)
 {
 PPC460EXPCIEState *s = PPC460EX_PCIE_HOST(dev);
 PCIHostState *pci = PCI_HOST_BRIDGE(dev);
-int i, id;
-char buf[16];
+int i;
+char buf[20];
 
 if (!s->cpu) {
 error_setg(errp, "cpu link property must be set");
 return;
 }
-switch (s->dcrn_base) {
-case DCRN_PCIE0_BASE:
-id = 0;
-break;
-case DCRN_PCIE1_BASE:
-id = 1;
-break;
-default:
-error_setg(errp, "invalid PCIe DCRN base");
+if (s->num < 0 || s->dcrn_base < 0) {
+error_setg(errp, "busnum and dcrn-base properties must be set");
 return;
 }
-snprintf(buf, sizeof(buf), "pcie%d-mem", id);
+snprintf(buf, sizeof(buf), "pcie%d-mem", s->num);
 memory_region_init(&s->busmem, OBJECT(s), buf, UINT64_MAX);
-snprintf(buf, sizeof(buf), "pcie%d-io", id);
+snprintf(buf, sizeof(buf), "pcie%d-io", s->num);
 memory_region_init(&s->iomem, OBJECT(s), buf, 64 * KiB);
 for (i = 0; i < 4; i++) {
 sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq[i]);
 }
-snprintf(buf, sizeof(buf), "pcie.%d", id);
+snprintf(buf, sizeof(buf), "pcie.%d", s->num);
 pci->bus = pci_register_root_bus(DEVICE(s), buf, ppc460ex_set_irq,
 pci_swizzle_map_irq_fn, s, &s->busmem,
 &s->iomem, 0, 4, TYPE_PCIE_BUS);
@@ -1072,6 +1066,7 @@ static void ppc460ex_pcie_realize(DeviceState *dev, Error **errp)
 }
 
 static Property ppc460ex_pcie_props[] = {
+DEFINE_PROP_INT32("busnum", PPC460EXPCIEState, num, -1),
 DEFINE_PROP_INT32("dcrn-base", PPC460EXPCIEState, dcrn_base, -1),
 DEFINE_PROP_LINK("cpu", PPC460EXPCIEState, cpu, TYPE_POWERPC_CPU,
  PowerPCCPU *),
@@ -1107,11 +1102,13 @@ void ppc460ex_pcie_init(PowerPCCPU *cpu)
 DeviceState *dev;
 
 dev = qdev_new(TYPE_PPC460EX_PCIE_HOST);
+qdev_prop_set_int32(dev, "busnum", 0);
 qdev_prop_set_int32(dev, "dcrn-base", DCRN_PCIE0_BASE);
 object_property_set_link(OBJECT(dev), "cpu", OBJECT(cpu), &error_abort);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
 
 dev = qdev_new(TYPE_PPC460EX_PCIE_HOST);
+qdev_prop_set_int32(dev, "busnum", 1);
 qdev_prop_set_int32(dev, "dcrn-base", DCRN_PCIE1_BASE);
 object_property_set_link(OBJECT(dev), "cpu", OBJECT(cpu), &error_abort);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
-- 
2.30.9

[PATCH v2 14/14] ppc440_pcix: Stop using system io region for PCI bus

Reduce the iomem region to 64K and use it for the PCI io space and map
it directly from the board without an intermediate alias that is not
really needed.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/ppc/ppc440_pcix.c | 9 ++---
 hw/ppc/sam460ex.c| 6 +-
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/ppc440_pcix.c b/hw/ppc/ppc440_pcix.c
index cf932e4b25..672090de94 100644
--- a/hw/ppc/ppc440_pcix.c
+++ b/hw/ppc/ppc440_pcix.c
@@ -23,6 +23,7 @@
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
+#include "qemu/units.h"
 #include "hw/irq.h"
 #include "hw/ppc/ppc.h"
 #include "hw/ppc/ppc4xx.h"
@@ -490,10 +491,11 @@ static void ppc440_pcix_realize(DeviceState *dev, Error **errp)
 s = PPC440_PCIX_HOST(dev);
 
 sysbus_init_irq(sbd, &s->irq);
-memory_region_init(&s->busmem, OBJECT(dev), "pci bus memory", UINT64_MAX);
+memory_region_init(&s->busmem, OBJECT(dev), "pci-mem", UINT64_MAX);
+memory_region_init(&s->iomem, OBJECT(dev), "pci-io", 64 * KiB);
 h->bus = pci_register_root_bus(dev, NULL, ppc440_pcix_set_irq,
- ppc440_pcix_map_irq, &s->irq, &s->busmem,
- get_system_io(), PCI_DEVFN(0, 0), 1, TYPE_PCI_BUS);
+ ppc440_pcix_map_irq, &s->irq, &s->busmem, &s->iomem,
+ PCI_DEVFN(0, 0), 1, TYPE_PCI_BUS);
 
 s->dev = pci_create_simple(h->bus, PCI_DEVFN(0, 0),
TYPE_PPC4xx_HOST_BRIDGE);
@@ -514,6 +516,7 @@ static void ppc440_pcix_realize(DeviceState *dev, Error **errp)
 memory_region_add_subregion(&s->container, PCIC0_CFGDATA, &h->data_mem);
 memory_region_add_subregion(&s->container, PPC440_REG_BASE, &s->regs);
 sysbus_init_mmio(sbd, &s->container);
+sysbus_init_mmio(sbd, &s->iomem);
 }
 
 static void ppc440_pcix_class_init(ObjectClass *klass, void *data)
diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index 8d0e551d14..1e615b8d35 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -269,7 +269,6 @@ static void main_cpu_reset(void *opaque)
 
 static void sam460ex_init(MachineState *machine)
 {
-MemoryRegion *isa = g_new(MemoryRegion, 1);
 MemoryRegion *l2cache_ram = g_new(MemoryRegion, 1);
 DeviceState *uic[4];
 int i;
@@ -441,12 +440,9 @@ static void sam460ex_init(MachineState *machine)
 /* All PCI irqs are connected to the same UIC pin (cf. UBoot source) */
 dev = sysbus_create_simple(TYPE_PPC440_PCIX_HOST, 0xc0ec0,
qdev_get_gpio_in(uic[1], 0));
+sysbus_mmio_map(SYS_BUS_DEVICE(dev), 1, 0xc0800);
 pci_bus = PCI_BUS(qdev_get_child_bus(dev, "pci.0"));
 
-memory_region_init_alias(isa, NULL, "isa_mmio", get_system_io(),
- 0, 0x1);
-memory_region_add_subregion(get_system_memory(), 0xc0800, isa);
-
 /* PCI devices */
 pci_create_simple(pci_bus, PCI_DEVFN(6, 0), "sm501");
 /* SoC has a single SATA port but we don't emulate that yet
-- 
2.30.9

[PATCH v2 12/14] ppc440_pcix: Rename QOM type define and move it to common header

Rename TYPE_PPC440_PCIX_HOST_BRIDGE to better match its string value,
move it to common header and use it also in sam460ex to replace hard
coded type name.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/ppc440_pcix.c| 9 -
 hw/ppc/sam460ex.c   | 2 +-
 include/hw/ppc/ppc4xx.h | 1 +
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/ppc440_pcix.c b/hw/ppc/ppc440_pcix.c
index dfec25ac83..adfecf1e76 100644
--- a/hw/ppc/ppc440_pcix.c
+++ b/hw/ppc/ppc440_pcix.c
@@ -44,8 +44,7 @@ struct PLBInMap {
 MemoryRegion mr;
 };
 
-#define TYPE_PPC440_PCIX_HOST_BRIDGE "ppc440-pcix-host"
-OBJECT_DECLARE_SIMPLE_TYPE(PPC440PCIXState, PPC440_PCIX_HOST_BRIDGE)
+OBJECT_DECLARE_SIMPLE_TYPE(PPC440PCIXState, PPC440_PCIX_HOST)
 
 #define PPC440_PCIX_NR_POMS 3
 #define PPC440_PCIX_NR_PIMS 3
@@ -397,7 +396,7 @@ static const MemoryRegionOps pci_reg_ops = {
 
 static void ppc440_pcix_reset(DeviceState *dev)
 {
-struct PPC440PCIXState *s = PPC440_PCIX_HOST_BRIDGE(dev);
+struct PPC440PCIXState *s = PPC440_PCIX_HOST(dev);
 int i;
 
 for (i = 0; i < PPC440_PCIX_NR_POMS; i++) {
@@ -487,7 +486,7 @@ static void ppc440_pcix_realize(DeviceState *dev, Error **errp)
 PCIHostState *h;
 
 h = PCI_HOST_BRIDGE(dev);
-s = PPC440_PCIX_HOST_BRIDGE(dev);
+s = PPC440_PCIX_HOST(dev);
 
 sysbus_init_irq(sbd, &s->irq);
 memory_region_init(&s->busmem, OBJECT(dev), "pci bus memory", UINT64_MAX);
@@ -525,7 +524,7 @@ static void ppc440_pcix_class_init(ObjectClass *klass, void *data)
 }
 
 static const TypeInfo ppc440_pcix_info = {
-.name  = TYPE_PPC440_PCIX_HOST_BRIDGE,
+.name  = TYPE_PPC440_PCIX_HOST,
 .parent= TYPE_PCI_HOST_BRIDGE,
 .instance_size = sizeof(PPC440PCIXState),
 .class_init= ppc440_pcix_class_init,
diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index d446cfc37b..8d0e551d14 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -439,7 +439,7 @@ static void sam460ex_init(MachineState *machine)
 
 /* PCI bus */
 /* All PCI irqs are connected to the same UIC pin (cf. UBoot source) */
-dev = sysbus_create_simple("ppc440-pcix-host", 0xc0ec0,
+dev = sysbus_create_simple(TYPE_PPC440_PCIX_HOST, 0xc0ec0,
qdev_get_gpio_in(uic[1], 0));
 pci_bus = PCI_BUS(qdev_get_child_bus(dev, "pci.0"));
 
diff --git a/include/hw/ppc/ppc4xx.h b/include/hw/ppc/ppc4xx.h
index 766d575e86..ea7740239b 100644
--- a/include/hw/ppc/ppc4xx.h
+++ b/include/hw/ppc/ppc4xx.h
@@ -31,6 +31,7 @@
 
 #define TYPE_PPC4xx_HOST_BRIDGE "ppc4xx-host-bridge"
 #define TYPE_PPC4xx_PCI_HOST "ppc4xx-pci-host"
+#define TYPE_PPC440_PCIX_HOST "ppc440-pcix-host"
 #define TYPE_PPC460EX_PCIE_HOST "ppc460ex-pcie-host"
 
 /*
-- 
2.30.9

[PATCH v2 00/14] PPC440 devices misc clean up

These are some small misc clean ups to PPC440 related device models
which is all I have ready for now.

v2:
- Added R-b tags from Philippe
- Addressed review comments
- Added new patch to rename parent field of PPC460EXPCIEState to parent_obj

Patches needing review: 6 7 10-13

BALATON Zoltan (14):
  ppc440: Change ppc460ex_pcie_init() parameter type
  ppc440: Add cpu link property to PCIe controller model
  ppc440: Add a macro to shorten PCIe controller DCR registration
  ppc440: Rename parent field of PPC460EXPCIEState to match code style
  ppc440: Rename local variable in dcr_read_pcie()
  ppc440: Stop using system io region for PCIe buses
  ppc/sam460ex: Remove address_space_mem local variable
  ppc440: Add busnum property to PCIe controller model
  ppc440: Remove ppc460ex_pcie_init legacy init function
  ppc4xx_pci: Rename QOM type name define
  ppc4xx_pci: Add define for ppc4xx-host-bridge type name
  ppc440_pcix: Rename QOM type define and move it to common header
  ppc440_pcix: Don't use iomem for regs
  ppc440_pcix: Stop using system io region for PCI bus

 hw/ppc/ppc440.h |   1 -
 hw/ppc/ppc440_bamboo.c  |   3 +-
 hw/ppc/ppc440_pcix.c|  28 +++---
 hw/ppc/ppc440_uc.c  | 192 +---
 hw/ppc/ppc4xx_pci.c |  10 +--
 hw/ppc/sam460ex.c   |  33 ---
 include/hw/ppc/ppc4xx.h |   5 +-
 7 files changed, 129 insertions(+), 143 deletions(-)

-- 
2.30.9

[PATCH v2 01/14] ppc440: Change ppc460ex_pcie_init() parameter type

Change parameter of ppc460ex_pcie_init() from env to cpu to allow
further refactoring.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/ppc/ppc440.h| 2 +-
 hw/ppc/ppc440_uc.c | 7 ---
 hw/ppc/sam460ex.c  | 2 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/ppc440.h b/hw/ppc/ppc440.h
index 7c24db8504..ae42bcf0c8 100644
--- a/hw/ppc/ppc440.h
+++ b/hw/ppc/ppc440.h
@@ -18,6 +18,6 @@ void ppc4xx_cpr_init(CPUPPCState *env);
 void ppc4xx_sdr_init(CPUPPCState *env);
 void ppc4xx_ahb_init(CPUPPCState *env);
 void ppc4xx_dma_init(CPUPPCState *env, int dcr_base);
-void ppc460ex_pcie_init(CPUPPCState *env);
+void ppc460ex_pcie_init(PowerPCCPU *cpu);
 
 #endif /* PPC440_H */
diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index 651263926e..8eb985d714 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -17,6 +17,7 @@
 #include "hw/qdev-properties.h"
 #include "hw/pci/pci.h"
 #include "sysemu/reset.h"
+#include "cpu.h"
 #include "ppc440.h"
 
 /*/
@@ -1108,17 +1109,17 @@ static void ppc460ex_pcie_register_dcrs(PPC460EXPCIEState *s, CPUPPCState *env)
  &dcr_read_pcie, &dcr_write_pcie);
 }
 
-void ppc460ex_pcie_init(CPUPPCState *env)
+void ppc460ex_pcie_init(PowerPCCPU *cpu)
 {
 DeviceState *dev;
 
 dev = qdev_new(TYPE_PPC460EX_PCIE_HOST);
 qdev_prop_set_int32(dev, "dcrn-base", DCRN_PCIE0_BASE);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
-ppc460ex_pcie_register_dcrs(PPC460EX_PCIE_HOST(dev), env);
+ppc460ex_pcie_register_dcrs(PPC460EX_PCIE_HOST(dev), &cpu->env);
 
 dev = qdev_new(TYPE_PPC460EX_PCIE_HOST);
 qdev_prop_set_int32(dev, "dcrn-base", DCRN_PCIE1_BASE);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
-ppc460ex_pcie_register_dcrs(PPC460EX_PCIE_HOST(dev), env);
+ppc460ex_pcie_register_dcrs(PPC460EX_PCIE_HOST(dev), &cpu->env);
 }
diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index cf065aae0e..aaa8d2f4a5 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -422,7 +422,7 @@ static void sam460ex_init(MachineState *machine)
 usb_create_simple(usb_bus_find(-1), "usb-mouse");
 
 /* PCI bus */
-ppc460ex_pcie_init(env);
+ppc460ex_pcie_init(cpu);
 /* All PCI irqs are connected to the same UIC pin (cf. UBoot source) */
 dev = sysbus_create_simple("ppc440-pcix-host", 0xc0ec0,
qdev_get_gpio_in(uic[1], 0));
-- 
2.30.9

Re: [PATCH qemu v5] aspeed add montblanc bmc reference from fuji

2023-07-05 Thread Mike Choi

Hi Sittisak,

Minipack3 is not open-sourced yet, and we are unlikely to be able to upstream 
detailed data.


  1.  What is this FRUID data for? Is it for testing?
  2.  What other options do we have, since we are not able to upstream the FRUID
data? (It is still OK to upstream the system configuration, but NOT the
_fruid data arrays.)

Thanks,
Mike


From: Cédric Le Goater 
Date: Tuesday, July 4, 2023 at 7:07 AM
To: Sittisak Sinprem , Bin Huang , 
Tao Ren , Mike Choi 
Cc: qemu-devel@nongnu.org , qemu-...@nongnu.org 
, peter.mayd...@linaro.org , 
and...@aj.id.au , Joel Stanley , 
qemu-sta...@nongnu.org , srika...@celestica.com 
, ssu...@celestica.com , 
thangavel...@celestica.com , kgen...@celestica.com 
, anandaram...@celestica.com 
Subject: Re: [PATCH qemu v5] aspeed add montblanc bmc reference from fuji

On 7/4/23 15:27, Sittisak Sinprem wrote:
> Hi Meta Team,
>
> I think the details of the FRU EEPROM content are still confidential for now.
> Please confirm: can we add the description to QEMU upstream, following
> Cedric's request?

We don't need all the details, and not the confidential part of course.

C.

>
> On Tue, Jul 4, 2023 at 6:19 PM Cédric Le Goater  > wrote:
>
> On 7/4/23 13:06, ~ssinprem wrote:
>  > From: Sittisak Sinprem  >
>  >
>  > - I2C list follow I2C Tree v1.6 20230320
>  > - fru eeprom data use FB FRU format version 4
>  >
>  > Signed-off-by: Sittisak Sinprem  >
>
> You shoot too fast :) Please add some description for the EEPROM contents.
> What they enable when the OS/FW boots is good to know for QEMU.
>
> Thanks,
>
> C.
>
>
>  > ---
>  >   docs/system/arm/aspeed.rst |  1 +
>  >   hw/arm/aspeed.c| 65 
> ++
>  >   hw/arm/aspeed_eeprom.c | 50 +
>  >   hw/arm/aspeed_eeprom.h |  7 
>  >   4 files changed, 123 insertions(+)
>  >
>  > diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
>  > index 80538422a1..5e0824f48b 100644
>  > --- a/docs/system/arm/aspeed.rst
>  > +++ b/docs/system/arm/aspeed.rst
>  > @@ -33,6 +33,7 @@ AST2600 SoC based machines :
>  >   - ``tacoma-bmc``   OpenPOWER Witherspoon POWER9 AST2600 BMC
>  >   - ``rainier-bmc``  IBM Rainier POWER10 BMC
>  >   - ``fuji-bmc`` Facebook Fuji BMC
>  > +- ``montblanc-bmc``Facebook Montblanc BMC
>  >   - ``bletchley-bmc``Facebook Bletchley BMC
>  >   - ``fby35-bmc``Facebook fby35 BMC
>  >   - ``qcom-dc-scm-v1-bmc``   Qualcomm DC-SCM V1 BMC
>  > diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
>  > index 9fca644d92..bbb7a3392c 100644
>  > --- a/hw/arm/aspeed.c
>  > +++ b/hw/arm/aspeed.c
>  > @@ -189,6 +189,10 @@ struct AspeedMachineState {
>  >   #define FUJI_BMC_HW_STRAP10x
>  >   #define FUJI_BMC_HW_STRAP20x
>  >
>  > +/* Montblanc hardware value */
>  > +#define MONTBLANC_BMC_HW_STRAP10x
>  > +#define MONTBLANC_BMC_HW_STRAP20x
>  > +
>  >   /* Bletchley hardware value */
>  >   /* TODO: Leave same as EVB for now. */
>  >   #define BLETCHLEY_BMC_HW_STRAP1 AST2600_EVB_HW_STRAP1
>  > @@ -925,6 +929,41 @@ static void fuji_bmc_i2c_init(AspeedMachineState 
> *bmc)
>  >   }
>  >   }
>  >
>  > +static void montblanc_bmc_i2c_init(AspeedMachineState *bmc)
>  > +{
>  > +AspeedSoCState *soc = &bmc->soc;
>  > +I2CBus *i2c[16] = {};
>  > +
>  > +for (int i = 0; i < 16; i++) {
>  > +i2c[i] = aspeed_i2c_get_bus(&soc->i2c, i);
>  > +}
>  > +
>  > +/* Ref from Minipack3_I2C_Tree_V1.6 20230320 */
>  > +at24c_eeprom_init_rom(i2c[3], 0x56, 8192, montblanc_scm_fruid,
>  > +  montblanc_scm_fruid_len);
>  > +at24c_eeprom_init_rom(i2c[6], 0x53, 8192, montblanc_fcm_fruid,
>  > +  montblanc_fcm_fruid_len);
>  > +
>  > +/* CPLD and FPGA */
>  > +at24c_eeprom_init(i2c[1], 0x35, 256);  /* SCM CPLD */
>  > +at24c_eeprom_init(i2c[5], 0x35, 256);  /* COMe CPLD TODO: need to 
> update */
>  > +at24c_eeprom_init(i2c[12], 0x60, 256); /* MCB PWR CPLD */
>  > +at24c_eeprom_init(i2c[13], 0x35, 256); /* IOB FPGA */
>  > +
>  > +/* on BMC board */
>  > +at24c_eeprom_init_rom(i2c[8], 0x51, 8192, montblanc_bmc_fruid,
>  > +  montblanc_bmc_fruid_len); /* BMC EEPROM */
>  > +i2c_slave_create_simple(i2c[8], TYPE_LM75, 0x48); /* Thermal 
> Sensor */
>  > +
>  > +/* COMe

Re: [PATCH 14/21] mac_via: work around underflow in TimeDBRA timing loop in SETUPTIMEK

2023-07-05 Thread Mark Cave-Ayland


On 03/07/2023 09:30, Philippe Mathieu-Daudé wrote:


On 2/7/23 17:48, Mark Cave-Ayland wrote:

The MacOS toolbox ROM calculates the number of branches that can be executed
per millisecond as part of its timer calibration. Since modern hosts are
considerably quicker than original hardware, the negative counter reaches zero
before the calibration completes leading to division by zero later in
CALCULATESLOD.

Instead of trying to fudge the timing loop (which won't work for 
TimeDBRA/TimeSCCDB
anyhow), use the pattern of access to the VIA1 registers to detect when 
SETUPTIMEK
has finished executing and write some well-known good timer values to TimeDBRA
and TimeSCCDB taken from real hardware with a suitable scaling factor.
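
A minimal sketch of the "write known-good values, then test for them"
idea, assuming a plain byte array in place of guest RAM. The four
constants are the ones quoted from the patch further down; demo_ram and
the demo_* helpers are hypothetical, and the exact predicate used by
via1_is_toolbox_timer_calibrated() is an assumption because its body is
cut off in the quote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants as quoted from the patch. */
#define MACOS_TIMEDBRA        0xd00
#define MACOS_TIMESCCB        0xd02
#define MACOS_TIMEDBRA_VALUE  (0x2a00 * 3)
#define MACOS_TIMESCCB_VALUE  (0x079d * 3)

/* Hypothetical stand-in for guest RAM. */
static uint8_t demo_ram[0x1000];

/* Big-endian 16-bit load, as the m68k guest would see the value. */
static uint16_t demo_lduw_be(uint32_t addr)
{
    assert(addr + 1 < sizeof(demo_ram));
    return (uint16_t)((demo_ram[addr] << 8) | demo_ram[addr + 1]);
}

/* Big-endian 16-bit store. */
static void demo_stw_be(uint32_t addr, uint16_t val)
{
    assert(addr + 1 < sizeof(demo_ram));
    demo_ram[addr] = (uint8_t)(val >> 8);
    demo_ram[addr + 1] = (uint8_t)(val & 0xff);
}

/* Assumed predicate: calibrated once both magic values are in place. */
static bool demo_timer_calibrated(void)
{
    return demo_lduw_be(MACOS_TIMEDBRA) == MACOS_TIMEDBRA_VALUE &&
           demo_lduw_be(MACOS_TIMESCCB) == MACOS_TIMESCCB_VALUE;
}

/* Overwrite whatever the guest computed with the scaled hardware values. */
static void demo_force_calibration(void)
{
    demo_stw_be(MACOS_TIMEDBRA, MACOS_TIMEDBRA_VALUE);
    demo_stw_be(MACOS_TIMESCCB, MACOS_TIMESCCB_VALUE);
}
```

Both scaled values still fit in 16 bits (0x2a00 * 3 is 0x7e00), so the
comparison against the 16-bit loads is exact.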

Signed-off-by: Mark Cave-Ayland 
---
  hw/misc/mac_via.c | 115 ++
  hw/misc/trace-events  |   1 +
  include/hw/misc/mac_via.h |   3 +
  3 files changed, 119 insertions(+)

diff --git a/hw/misc/mac_via.c b/hw/misc/mac_via.c
index baeb73eeb3..766a32a95d 100644
--- a/hw/misc/mac_via.c
+++ b/hw/misc/mac_via.c
@@ -16,6 +16,7 @@
   */
  #include "qemu/osdep.h"
+#include "exec/address-spaces.h"
  #include "migration/vmstate.h"
  #include "hw/sysbus.h"
  #include "hw/irq.h"




+/*
+ * Addresses and real values for TimeDBRA/TimeSCCB to allow timer calibration
+ * to succeed (NOTE: both values have been multiplied by 3 to cope with the
+ * speed of QEMU execution on a modern host)
+ */
+#define MACOS_TIMEDBRA    0xd00
+#define MACOS_TIMESCCB    0xd02
+
+#define MACOS_TIMEDBRA_VALUE  (0x2a00 * 3)
+#define MACOS_TIMESCCB_VALUE  (0x079d * 3)
+
+static bool via1_is_toolbox_timer_calibrated(void)
+{
+    /*
+ * Indicate whether the MacOS toolbox has been calibrated by checking
+ * for the value of our magic constants
+ */
+    uint16_t timedbra = lduw_be_phys(&address_space_memory, MACOS_TIMEDBRA);
+    uint16_t timesccdb = lduw_be_phys(&address_space_memory, MACOS_TIMESCCB);


Rather than using the global address_space_memory (which we secretly
try to remove entirely), could we pass a MemoryRegion link property
to the VIA1 device?


Hmmm good question. It seems to me that we're dispatching a write to the default 
address space (which includes all RAM and MMIO etc.) rather than a particular 
MemoryRegion so it feels as if AddressSpace is the right approach here. Unfortunately 
since AddressSpace is not a QOM type then it isn't possible to pass it as a link 
property.


There are existing examples in qtest that use first_cpu->as which seems a better 
option unless we want to move away from using first_cpu in the codebase?



ATB,

Mark.

Re: [PATCH v7 14/20] target/riscv/kvm.c: add multi-letter extension KVM properties





On 7/5/23 10:41, Andrew Jones wrote:

On Fri, Jun 30, 2023 at 07:08:05AM -0300, Daniel Henrique Barboza wrote:

Let's add KVM user properties for the multi-letter extensions that KVM
currently supports: zicbom, zicboz, zihintpause, zbb, ssaia, sstc,
svinval and svpbmt.

As with MISA extensions, we're using the KVMCPUConfig type to hold
information about the state of each extension. However, multi-letter
extensions have more cases to cover than MISA extensions, so we're
adding an extra 'supported' flag as well. This flag will reflect if a
given extension is supported by KVM, i.e. KVM knows how to handle it.
This is determined during KVM extension discovery in
kvm_riscv_init_multiext_cfg(), where we test for EINVAL errors. Any
other error different from EINVAL will cause an abort.

The use of the 'user_set' flag is similar to what we already do with MISA
extensions: the flag is set only if the user is changing the extension
state.

The 'supported' flag will be used later on to make an exception for
users that are disabling multi-letter extensions that are unknown to
KVM.
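
The offset-table approach described above can be sketched outside of
QEMU as follows. This is only an illustration under stated assumptions:
DemoCPUConfig stands in for RISCVCPUConfig, DemoExtCfg mirrors the shape
of KVMCPUConfig without kvm_reg_id, and the return-code convention of
demo_set_ext() is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RISCVCPUConfig: one bool per extension. */
struct DemoCPUConfig {
    bool ext_zbb;
    bool ext_sstc;
};

/* Same shape as KVMCPUConfig in the patch, minus the KVM register id. */
typedef struct DemoExtCfg {
    const char *name;
    size_t offset;   /* byte offset of the flag inside DemoCPUConfig */
    bool user_set;   /* true once the user changed the state */
    bool supported;  /* true if the hypervisor knows the extension */
} DemoExtCfg;

#define DEMO_EXT_CFG(_name, _prop) \
    { .name = _name, .offset = offsetof(struct DemoCPUConfig, _prop) }

/* Locate the flag through its recorded offset. */
static bool *demo_flag(struct DemoCPUConfig *cfg, const DemoExtCfg *ext)
{
    assert(ext->offset < sizeof(*cfg));
    return (bool *)((char *)cfg + ext->offset);
}

/*
 * Returns 0 when the request is applied or ignored, -1 when the user
 * tries to enable an extension that is not supported.
 */
static int demo_set_ext(struct DemoCPUConfig *cfg, DemoExtCfg *ext, bool value)
{
    bool *flag = demo_flag(cfg, ext);

    if (value == *flag) {
        return 0;               /* same value as the host: nothing to do */
    }
    if (!ext->supported) {
        return value ? -1 : 0;  /* enabling fails, disabling is ignored */
    }
    ext->user_set = true;
    *flag = value;
    return 0;
}
```

Keeping the offset in the table means one setter can serve every
extension, which is what lets the property callbacks share a single
implementation.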

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Andrew Jones 
---
  target/riscv/cpu.c |   8 +++
  target/riscv/kvm.c | 119 +
  2 files changed, 127 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a9df61c9b4..f348424170 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1778,6 +1778,14 @@ static void riscv_cpu_add_user_properties(Object *obj)
  riscv_cpu_add_misa_properties(obj);
  
  for (prop = riscv_cpu_extensions; prop && prop->name; prop++) {

+#ifndef CONFIG_USER_ONLY
+if (kvm_enabled()) {
+/* Check if KVM created the property already */
+if (object_property_find(obj, prop->name)) {
+continue;
+}
+}
+#endif
  qdev_property_add_static(dev, prop);
  }
  
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c

index 7afd6024e6..6ef81a6825 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -113,6 +113,7 @@ typedef struct KVMCPUConfig {
  target_ulong offset;
  int kvm_reg_id;
  bool user_set;
+bool supported;
  } KVMCPUConfig;
  
  #define KVM_MISA_CFG(_bit, _reg_id) \

@@ -197,6 +198,81 @@ static void kvm_riscv_update_cpu_misa_ext(RISCVCPU *cpu, CPUState *cs)
  }
  }
  
+#define CPUCFG(_prop) offsetof(struct RISCVCPUConfig, _prop)

+
+#define KVM_EXT_CFG(_name, _prop, _reg_id) \
+{.name = _name, .offset = CPUCFG(_prop), \
+ .kvm_reg_id = _reg_id}
+
+static KVMCPUConfig kvm_multi_ext_cfgs[] = {
+KVM_EXT_CFG("zicbom", ext_icbom, KVM_RISCV_ISA_EXT_ZICBOM),
+KVM_EXT_CFG("zicboz", ext_icboz, KVM_RISCV_ISA_EXT_ZICBOZ),
+KVM_EXT_CFG("zihintpause", ext_zihintpause, KVM_RISCV_ISA_EXT_ZIHINTPAUSE),
+KVM_EXT_CFG("zbb", ext_zbb, KVM_RISCV_ISA_EXT_ZBB),
+KVM_EXT_CFG("ssaia", ext_ssaia, KVM_RISCV_ISA_EXT_SSAIA),
+KVM_EXT_CFG("sstc", ext_sstc, KVM_RISCV_ISA_EXT_SSTC),
+KVM_EXT_CFG("svinval", ext_svinval, KVM_RISCV_ISA_EXT_SVINVAL),
+KVM_EXT_CFG("svpbmt", ext_svpbmt, KVM_RISCV_ISA_EXT_SVPBMT),
+};
+
+static void kvm_cpu_cfg_set(RISCVCPU *cpu, KVMCPUConfig *multi_ext,
+uint32_t val)
+{
+int cpu_cfg_offset = multi_ext->offset;
+bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
+
+*ext_enabled = val;
+}
+
+static uint32_t kvm_cpu_cfg_get(RISCVCPU *cpu,
+KVMCPUConfig *multi_ext)
+{
+int cpu_cfg_offset = multi_ext->offset;
+bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
+
+return *ext_enabled;
+}
+
+static void kvm_cpu_set_multi_ext_cfg(Object *obj, Visitor *v,
+  const char *name,
+  void *opaque, Error **errp)
+{
+KVMCPUConfig *multi_ext_cfg = opaque;
+RISCVCPU *cpu = RISCV_CPU(obj);
+bool value, host_val;
+
+if (!visit_type_bool(v, name, &value, errp)) {
+return;
+}
+
+host_val = kvm_cpu_cfg_get(cpu, multi_ext_cfg);
+
+/*
+ * Ignore if the user is setting the same value
+ * as the host.
+ */
+if (value == host_val) {
+return;
+}
+
+if (!multi_ext_cfg->supported) {
+/*
+ * Error out if the user is trying to enable an
+ * extension that KVM doesn't support. Ignore
+ * option otherwise.
+ */
+if (value) {
+error_setg(errp, "KVM does not support disabling extension %s",
+   multi_ext_cfg->name);
+}
+
+return;
+}
+
+multi_ext_cfg->user_set = true;
+kvm_cpu_cfg_set(cpu, multi_ext_cfg, value);
+}
+
  static void kvm_riscv_add_cpu_user_properties(Object *cpu_obj)
  {
  int i;
@@ -215,6 +291,15 @@ static void kvm_riscv_add_cpu_user_properties(Object *cpu_obj)
  object_property_set_description(cpu_obj, misa_cfg->name,
  misa_cfg->description);
  }
+
+

Re: [PATCH 11/21] swim: add trace events for IWM and ISM registers

2023-07-05 Thread Mark Cave-Ayland


On 03/07/2023 09:26, Philippe Mathieu-Daudé wrote:


On 2/7/23 17:48, Mark Cave-Ayland wrote:

Signed-off-by: Mark Cave-Ayland 
---
  hw/block/swim.c   | 14 ++
  hw/block/trace-events |  7 +++
  2 files changed, 21 insertions(+)



@@ -267,6 +275,7 @@ static void iwmctrl_write(void *opaque, hwaddr reg, uint64_t value,

  reg >>= REG_SHIFT;
  swimctrl->regs[reg >> 1] = reg & 1;
+    trace_swim_iwmctrl_write((reg >> 1), size, (reg & 1));
  if (swimctrl->regs[IWM_Q6] &&
  swimctrl->regs[IWM_Q7]) {
@@ -297,6 +306,7 @@ static void iwmctrl_write(void *opaque, hwaddr reg, uint64_t value,

  if (value == 0x57) {
  swimctrl->mode = SWIM_MODE_SWIM;
  swimctrl->iwm_switch = 0;
+    trace_swim_iwm_switch();
  }
  break;
  }
@@ -312,6 +322,7 @@ static uint64_t iwmctrl_read(void *opaque, hwaddr reg, unsigned size)

  swimctrl->regs[reg >> 1] = reg & 1;
+    trace_swim_iwmctrl_read((reg >> 1), size, (reg & 1));
  return 0;
  }



+swim_swimctrl_read(int reg, const char *name, unsigned size, uint64_t value) "reg=%d [%s] size=%u value=0x%"PRIx64
+swim_swimctrl_write(int reg, const char *name, unsigned size, uint64_t value) "reg=%d [%s] size=%u value=0x%"PRIx64
+swim_iwmctrl_read(int reg, unsigned size, uint64_t value) "reg=%d size=%u value=0x%"PRIx64
+swim_iwmctrl_write(int reg, unsigned size, uint64_t value) "reg=%d size=%u value=0x%"PRIx64


For these 2 functions, 'value' is 1 bit so could be 'unsigned' ;)


Indeed. In fact I'd be inclined to make them "unsigned int" just to be sure there is 
no confusion :)
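
For reference, the reason the value is a single bit: in the quoted hunks
the already-shifted register offset carries both the register index
(reg >> 1) and the new state (reg & 1). A small stand-alone sketch of
that decoding, with a hypothetical register count and demo_* names that
are not part of swim.c:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for the sketch: eight one-bit state registers. */
#define DEMO_IWM_NREGS 8

static uint8_t demo_regs[DEMO_IWM_NREGS];

/*
 * Decode one access: the low bit of the offset is the value, the
 * remaining bits select the register. Returns the stored value.
 */
static unsigned int demo_iwm_access(unsigned int reg)
{
    unsigned int index = reg >> 1;
    unsigned int value = reg & 1;

    assert(index < DEMO_IWM_NREGS);
    demo_regs[index] = (uint8_t)value;
    return value;
}
```

Since the value can only ever be 0 or 1, an unsigned int argument in the
trace event is more than wide enough.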



Reviewed-by: Philippe Mathieu-Daudé 



ATB,

Mark.

Re: [PATCH v3] kconfig: Add PCIe devices to s390x machines

2023-07-05 Thread Matthew Rosato

On 7/5/23 11:23 AM, Cédric Le Goater wrote:
> It is useful to extend the number of available PCI devices to KVM guests
> for passthrough scenarios and also to expose these models to a different
> (big endian) architecture. Include models for Intel Ethernet adapters
> and one USB controller, which all support MSI-X. Devices only supporting
> INTx won't work on s390x.
> 
> Signed-off-by: Cédric Le Goater 

Acked-by: Matthew Rosato 

> ---
> 
>  v3: PCI -> PCI_EXPRESS
>  v2: select -> imply
>   
>  hw/s390x/Kconfig | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/Kconfig b/hw/s390x/Kconfig
> index 5e7d8a2bae8b..ab62c9120545 100644
> --- a/hw/s390x/Kconfig
> +++ b/hw/s390x/Kconfig
> @@ -5,8 +5,11 @@ config S390_CCW_VIRTIO
>  imply VFIO_AP
>  imply VFIO_CCW
>  imply WDT_DIAG288
> -select PCI
> +select PCI_EXPRESS
>  select S390_FLIC
>  select SCLPCONSOLE
>  select VIRTIO_CCW
>  select MSI_NONBROKEN
> +imply E1000E_PCI_EXPRESS
> +imply IGB_PCI_EXPRESS
> +imply USB_XHCI_PCI

[PATCH v2] Hexagon: move GETPC() calls to top level helpers

2023-07-05 Thread Matheus Tavares Bernardino

As docs/devel/loads-stores.rst states:

  ``GETPC()`` should be used with great care: calling
  it in other functions that are *not* the top level
  ``HELPER(foo)`` will cause unexpected behavior. Instead, the
  value of ``GETPC()`` should be read from the helper and passed
  if needed to the functions that the helper calls.

Let's fix the GETPC() usage in Hexagon, making sure it's always called
from top level helpers and passed down to the places where it's
needed. There are two snippets where that is not currently the case:

- probe_store(), which is only called from two helpers, so it's easy to
  move GETPC() up.

- mem_load*() functions, which are also called directly from helpers,
  but through the MEM_LOAD*() set of macros. Note that these are only
  used when compiling with --disable-hexagon-idef-parser.

  In this case, we also take this opportunity to simplify the code,
  unifying the mem_load*() functions.
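
To make the pattern concrete, here is a small stand-alone sketch (not
Hexagon code) of reading the return address once in the top-level helper
and passing it down. It assumes GCC or Clang for
__builtin_return_address(); QEMU's GETPC() is, to my understanding,
built on the same builtin. DEMO_GETPC, demo_ram, demo_load1() and
helper_demo_load() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang builtin, evaluated in the function that uses it. */
#define DEMO_GETPC() ((uintptr_t)__builtin_return_address(0))

static uint8_t demo_ram[4] = { 0x11, 0x22, 0x33, 0x44 };
static uintptr_t demo_fault_ra;  /* return address recorded on a fault */

/* Inner function: takes the return address, never computes its own. */
static uint8_t demo_load1(uint32_t vaddr, uintptr_t ra)
{
    assert(ra != 0);
    if (vaddr >= sizeof(demo_ram)) {
        demo_fault_ra = ra;  /* a real helper would unwind using ra here */
        return 0;
    }
    return demo_ram[vaddr];
}

/* Top-level helper: the only place where DEMO_GETPC() is evaluated. */
uint8_t helper_demo_load(uint32_t vaddr)
{
    uintptr_t ra = DEMO_GETPC();

    return demo_load1(vaddr, ra);
}
```

If demo_load1() called DEMO_GETPC() itself, the address would point into
helper_demo_load() rather than into the helper's caller, which is the
kind of unexpected behavior the documentation warns about.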

Signed-off-by: Matheus Tavares Bernardino 
---
v1: 
d40fabcf9d6e92e4cd8d6a144e9b2a9acf4580dc.1688420966.git.quic_mathb...@quicinc.com

Changes in v2:
- Fixed wrong cpu_ld* unification from previous version.
- Passed retaddr down to check_noshuf() and further, as Taylor
  suggested.
- Reorganized macros for simplification.

 target/hexagon/macros.h| 19 ++--
 target/hexagon/op_helper.h | 11 ++-
 target/hexagon/op_helper.c | 62 +++---
 3 files changed, 29 insertions(+), 63 deletions(-)

diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 5451b061ee..e44a932434 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -173,15 +173,6 @@
 #define MEM_STORE8(VA, DATA, SLOT) \
 MEM_STORE8_FUNC(DATA)(cpu_env, VA, DATA, SLOT)
 #else
-#define MEM_LOAD1s(VA) ((int8_t)mem_load1(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD1u(VA) ((uint8_t)mem_load1(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD2s(VA) ((int16_t)mem_load2(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD2u(VA) ((uint16_t)mem_load2(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD4s(VA) ((int32_t)mem_load4(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD4u(VA) ((uint32_t)mem_load4(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD8s(VA) ((int64_t)mem_load8(env, pkt_has_store_s1, slot, VA))
-#define MEM_LOAD8u(VA) ((uint64_t)mem_load8(env, pkt_has_store_s1, slot, VA))
-
 #define MEM_STORE1(VA, DATA, SLOT) log_store32(env, VA, DATA, 1, SLOT)
 #define MEM_STORE2(VA, DATA, SLOT) log_store32(env, VA, DATA, 2, SLOT)
 #define MEM_STORE4(VA, DATA, SLOT) log_store32(env, VA, DATA, 4, SLOT)
@@ -530,8 +521,16 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 #ifdef QEMU_GENERATE
 #define fLOAD(NUM, SIZE, SIGN, EA, DST) MEM_LOAD##SIZE##SIGN(DST, EA)
 #else
+#define MEM_LOAD1 cpu_ldub_data_ra
+#define MEM_LOAD2 cpu_lduw_data_ra
+#define MEM_LOAD4 cpu_ldl_data_ra
+#define MEM_LOAD8 cpu_ldq_data_ra
+
 #define fLOAD(NUM, SIZE, SIGN, EA, DST) \
-DST = (size##SIZE##SIGN##_t)MEM_LOAD##SIZE##SIGN(EA)
+DST =  (size##SIZE##SIGN##_t)({ \
+check_noshuf(env, pkt_has_store_s1, slot, EA, SIZE, GETPC()); \
+MEM_LOAD##SIZE(env, EA, GETPC()); \
+})
 #endif
 
 #define fMEMOP(NUM, SIZE, SIGN, EA, FNTYPE, VALUE)
diff --git a/target/hexagon/op_helper.h b/target/hexagon/op_helper.h
index 8f3764d15e..7744e819ef 100644
--- a/target/hexagon/op_helper.h
+++ b/target/hexagon/op_helper.h
@@ -19,15 +19,8 @@
 #define HEXAGON_OP_HELPER_H
 
 /* Misc functions */
-uint8_t mem_load1(CPUHexagonState *env, bool pkt_has_store_s1,
-  uint32_t slot, target_ulong vaddr);
-uint16_t mem_load2(CPUHexagonState *env, bool pkt_has_store_s1,
-   uint32_t slot, target_ulong vaddr);
-uint32_t mem_load4(CPUHexagonState *env, bool pkt_has_store_s1,
-   uint32_t slot, target_ulong vaddr);
-uint64_t mem_load8(CPUHexagonState *env, bool pkt_has_store_s1,
-   uint32_t slot, target_ulong vaddr);
-
+void check_noshuf(CPUHexagonState *env, bool pkt_has_store_s1,
+  uint32_t slot, target_ulong vaddr, int size, uintptr_t ra);
 void log_store64(CPUHexagonState *env, target_ulong addr,
  int64_t val, int width, int slot);
 void log_store32(CPUHexagonState *env, target_ulong addr,
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 12967ac21e..abc9fc4724 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -95,9 +95,8 @@ void HELPER(debug_check_store_width)(CPUHexagonState *env, int slot, int check)
 }
 }
 
-void HELPER(commit_store)(CPUHexagonState *env, int slot_num)
+static void commit_store(CPUHexagonState *env, int slot_num, uintptr_t ra)
 {
-uintptr_t ra = GETPC();
 uint8_t width = env->mem_log_stores[slot_num].width;
 target_ulong va = env->mem_log_stores[slot_num].va;
 
@@ -119,6 +118,12 @@ void HELPER(commit_store)(CPUHexagonState *env, int slot_num)
 }
 }
 
+void HELPER(commit_store)(CPUHexagonState

Re: [PATCH v1 2/2] xen_arm: Initialize RAM and add hi/low memory regions

2023-07-05 Thread Vikram Garhwal


Hi Leo,

On 7/2/23 11:14 PM, Leo Yan wrote:

Hi Vikram,

On Thu, Jun 29, 2023 at 10:43:10AM -0700, Oleksandr Tyshchenko wrote:

[...]


  void arch_handle_ioreq(XenIOState *state, ioreq_t *req)
  {
  hw_error("Invalid ioreq type 0x%x\n", req->type);
@@ -135,6 +170,14 @@ static void xen_arm_init(MachineState *machine)
  
  xam->state =  g_new0(XenIOState, 1);
  
+if (machine->ram_size == 0) {

+DPRINTF("ram_size not specified. QEMU machine will be started without"
+" TPM, IOREQ and Virtio-MMIO backends\n");
+return;
+}
+
+xen_init_ram(machine);
+
  xen_register_ioreq(xam->state, machine->smp.cpus, xen_memory_listener);
  
  xen_create_virtio_mmio_devices(xam);

@@ -182,6 +225,8 @@ static void xen_arm_machine_class_init(ObjectClass *oc, void *data)
  mc->init = xen_arm_init;
  mc->max_cpus = 1;
  mc->default_machine_opts = "accel=xen";
+/* Set explicitly here to make sure that real ram_size is passed */
+mc->default_ram_size = 0;

This patch fails to apply on my side on QEMU 8.0.0.


  printf("CHECK for NEW BUILD\n");

The printf statement was introduced unintentionally, right?

I will rebase it on the latest tree and resend v2.
Thank you!


Thanks,
Leo


  #ifdef CONFIG_TPM
--
2.25.1

[PATCH] io: remove io watch if TLS channel is closed during handshake

2023-07-05 Thread Daniel P . Berrangé

The TLS handshake may take some time to complete, during which time an
I/O watch might be registered with the main loop. If the owner of the
I/O channel invokes qio_channel_close() while the handshake is waiting
to continue, the I/O watch must be removed. Failing to remove it will
later trigger the completion callback, which the owner is not expecting
to receive. In the case of the VNC server, this results in a SEGV as
vnc_disconnect_start() tries to shut down a client connection that is
already gone / NULL.
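
The bookkeeping pattern described above can be sketched in isolation. The
following is a minimal standalone model, not the QIOChannelTLS code: the
tiny watch table and every name in it (watch_add, watch_remove,
watch_dispatch_all, struct channel) are hypothetical stand-ins for the
main loop and the channel. It only shows the idea of remembering the tag,
clearing it when the callback runs, and removing the watch on close.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WATCHES 8

typedef void (*watch_cb)(void *opaque);

/* Hypothetical one-shot watch table standing in for the main loop. */
struct watch { unsigned tag; watch_cb cb; void *opaque; };
static struct watch watches[MAX_WATCHES];
static unsigned next_tag = 1;

/* Register a callback; returns a non-zero tag, or 0 if the table is full. */
static unsigned watch_add(watch_cb cb, void *opaque)
{
    for (size_t i = 0; i < MAX_WATCHES; i++) {
        if (watches[i].tag == 0) {
            watches[i] = (struct watch){ next_tag++, cb, opaque };
            return watches[i].tag;
        }
    }
    return 0;
}

/* Drop a pending watch so that its callback never fires. */
static bool watch_remove(unsigned tag)
{
    for (size_t i = 0; i < MAX_WATCHES; i++) {
        if (tag != 0 && watches[i].tag == tag) {
            watches[i].tag = 0;
            return true;
        }
    }
    return false;
}

/* Fire every pending watch once; returns how many callbacks ran. */
static int watch_dispatch_all(void)
{
    int fired = 0;
    for (size_t i = 0; i < MAX_WATCHES; i++) {
        if (watches[i].tag != 0) {
            struct watch w = watches[i];
            watches[i].tag = 0;
            w.cb(w.opaque);
            fired++;
        }
    }
    return fired;
}

/* Hypothetical channel that remembers its pending handshake watch. */
struct channel { unsigned hs_tag; bool closed; int callbacks_after_close; };

static void handshake_cb(void *opaque)
{
    struct channel *ch = opaque;
    ch->hs_tag = 0;                   /* the watch has fired, forget the tag */
    if (ch->closed) {
        ch->callbacks_after_close++;  /* the situation the fix prevents */
    }
}

static void channel_start_handshake(struct channel *ch)
{
    ch->hs_tag = watch_add(handshake_cb, ch);
}

static void channel_close(struct channel *ch)
{
    if (ch->hs_tag) {                 /* handshake still waiting: cancel it */
        watch_remove(ch->hs_tag);
        ch->hs_tag = 0;
    }
    ch->closed = true;
}
```

Closing a channel with a pending handshake leaves nothing for the
dispatcher to run, so the owner never sees a completion callback for a
connection it has already torn down.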

CVE-2023-3354
Reported-by: jiangyegen 
Signed-off-by: Daniel P. Berrangé 
---
 include/io/channel-tls.h |  1 +
 io/channel-tls.c         | 18 ++++++++++++------
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/io/channel-tls.h b/include/io/channel-tls.h
index 5672479e9e..26c67f17e2 100644
--- a/include/io/channel-tls.h
+++ b/include/io/channel-tls.h
@@ -48,6 +48,7 @@ struct QIOChannelTLS {
 QIOChannel *master;
 QCryptoTLSSession *session;
 QIOChannelShutdown shutdown;
+guint hs_ioc_tag;
 };
 
 /**
diff --git a/io/channel-tls.c b/io/channel-tls.c
index 9805dd0a3f..e327e6a5c2 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -198,12 +198,13 @@ static void qio_channel_tls_handshake_task(QIOChannelTLS *ioc,
 }
 
 trace_qio_channel_tls_handshake_pending(ioc, status);
-qio_channel_add_watch_full(ioc->master,
-   condition,
-   qio_channel_tls_handshake_io,
-   data,
-   NULL,
-   context);
+ioc->hs_ioc_tag =
+qio_channel_add_watch_full(ioc->master,
+   condition,
+   qio_channel_tls_handshake_io,
+   data,
+   NULL,
+   context);
 }
 }
 
@@ -218,6 +219,7 @@ static gboolean qio_channel_tls_handshake_io(QIOChannel *ioc,
 QIOChannelTLS *tioc = QIO_CHANNEL_TLS(
 qio_task_get_source(task));
 
+tioc->hs_ioc_tag = 0;
 g_free(data);
 qio_channel_tls_handshake_task(tioc, task, context);
 
@@ -378,6 +380,10 @@ static int qio_channel_tls_close(QIOChannel *ioc,
 {
 QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
 
+if (tioc->hs_ioc_tag) {
+g_source_remove(tioc->hs_ioc_tag);
+}
+
 return qio_channel_close(tioc->master, errp);
 }
 
-- 
2.41.0

Re: Reducing vdpa migration downtime because of memory pin / maps

2023-07-05 Thread Eugenio Perez Martin

On Tue, Jun 27, 2023 at 8:36 AM Si-Wei Liu  wrote:
>
>
>
> On 6/9/2023 7:32 AM, Eugenio Perez Martin wrote:
> > On Fri, Jun 9, 2023 at 12:39 AM Si-Wei Liu  wrote:
> >>
> >> On 6/7/23 01:08, Eugenio Perez Martin wrote:
> >>> On Wed, Jun 7, 2023 at 12:43 AM Si-Wei Liu  wrote:
>  Sorry for reviving this old thread, I lost the best timing to follow up
>  on this while I was on vacation. I have been working on this and found
>  out some discrepancy, please see below.
> 
>  On 4/5/23 04:37, Eugenio Perez Martin wrote:
> > Hi!
> >
> > As mentioned in the last upstream virtio-networking meeting, one of
> > the factors that adds more downtime to migration is the handling of
> > the guest memory (pin, map, etc). At this moment this handling is
> > bound to the virtio life cycle (DRIVER_OK, RESET). In that sense, the
> > destination device waits until all the guest memory / state is
> > migrated to start pinning all the memory.
> >
> > The proposal is to bind it to the char device life cycle (open vs
> > close),
>  Hmmm, really? If it's the life cycle for char device, the next guest /
>  qemu launch on the same vhost-vdpa device node won't make it work.
> 
> >>> Maybe my sentence was not accurate, but I think we're on the same page 
> >>> here.
> >>>
> >>> Two qemu instances opening the same char device at the same time are
> >>> not allowed, and vhost_vdpa_release cleans all the maps. So the next
> >>> qemu that opens the char device should see a clean device anyway.
> >> I mean the pin can't be done at the time of char device open, where the
> >> user address space is not known/bound yet. The earliest point possible
> >> for pinning would be until the vhost_attach_mm() call from SET_OWNER is
> >> done.
> > Maybe we are deviating, let me start again.
> >
> > Using QEMU code, what I'm proposing is to modify the lifecycle of the
> > .listener member of struct vhost_vdpa.
> >
> > At this moment, the memory listener is registered at
> > vhost_vdpa_dev_start(dev, started=true) call for the last vhost_dev,
> > and is unregistered in both vhost_vdpa_reset_status and
> > vhost_vdpa_cleanup.
> >
> > My original proposal was just to move the memory listener registration
> > to the last vhost_vdpa_init, and remove the unregister from
> > vhost_vdpa_reset_status. The calls to vhost_vdpa_dma_map/unmap would
> > be the same, the device should not realize this change.
> This can address LM downtime latency for sure, but it won't help
> downtime during dynamic SVQ switch - which still needs to go through the
> full unmap/map cycle (that includes the slow part for pinning) from
> passthrough to SVQ mode. Note that not every device could work with a
> separate ASID for SVQ descriptors. The fix should expect to work on
> normal vDPA vendor devices without a separate descriptor ASID, with
> platform IOMMU underneath or with on-chip IOMMU.
>

At this moment the SVQ switch is very inefficient mapping-wise, as it
unmaps all the GPA->HVA maps and overrides them. In particular, SVQ is
allocated in low regions of the iova space, and then the guest memory
is allocated in this new IOVA region incrementally.

We can optimize that if we place SVQ in a free GPA area instead. All
of the "translations" still need to be done, to ensure the guest
doesn't have access to SVQ vring. That way, qemu will not send all the
unmaps & maps, only the new ones. And vhost/vdpa does not need to call
unpin_user_page / pin_user_pages for all the guest memory.
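
To make the "only the new ones" idea concrete, here is a minimal
standalone sketch. It is not the vhost-vdpa or QEMU API: struct map_range,
range_present() and maps_to_add() are hypothetical names, and ranges are
compared by exact match only, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one IOVA mapping. */
struct map_range { uint64_t iova; uint64_t size; };

/* True if r is already mapped (exact match only, for simplicity). */
static bool range_present(const struct map_range *set, size_t n,
                          struct map_range r)
{
    for (size_t i = 0; i < n; i++) {
        if (set[i].iova == r.iova && set[i].size == r.size) {
            return true;
        }
    }
    return false;
}

/* Copy into out every wanted range that is not mapped yet and return
 * how many there are. out must have room for n_wanted entries. */
static size_t maps_to_add(const struct map_range *current, size_t n_current,
                          const struct map_range *wanted, size_t n_wanted,
                          struct map_range *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_wanted; i++) {
        if (!range_present(current, n_current, wanted[i])) {
            out[n_out++] = wanted[i];
        }
    }
    return n_out;
}
```

With the guest memory ranges unchanged and the SVQ vring placed in a
previously free area, the only range reported is the vring itself, so the
guest memory would not need to be unmapped and pinned again.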

More optimizations include the batching of the SVQ vrings.

> >
> > One of the concerns was that it could delay VM initialization, and I
> > didn't profile it but I think that may be the case.
> Yes, that's the concern here - we should not introduce regression to
> normal VM boot process/time. In case of large VM it's very easy to see
> the side effect if we go this way.
>
> >   I'm not sure about
> > the right fix but I think the change is easy to profile. If that is
> > the case, we could:
> > * use a flag (listener->address_space ?) and only register the
> > listener in _init if waiting for a migration, do it in _start
> > otherwise.
> Just doing this alone won't help SVQ mode switch downtime, see the
> reason stated above.
>
> > * something like io_uring, where the map can be done in parallel and
> > we can fail _start if some of them fail.
> This can alleviate the problem somewhat, but is still sub-optimal and not
> scalable with larger size. I'd like zero or least pinning to be done at
> the SVQ switch or migration time.
>

To reduce the pinning at SVQ time even further, we would need to
preallocate SVQ vrings before suspending the device.

> >
> >> Actually I think the counterpart vhost_detach_mm() only gets
> >> handled in vhost_vdpa_release() at device close time is a resulting
> >> artifact and amiss of today's vhost protocol - the opposite RESET_OWNER
> >> call is not made mandatory hence only seen implemented in vhost-net
> >>

Re: [PATCH] pnv/xive2: Always pass a presenter object when accessing the TIMA


Queued in gitlab.com/danielhb/qemu/tree/ppc-next. Thanks,


Daniel

On 7/5/23 05:14, Frederic Barrat wrote:

The low-level functions to access the TIMA take a presenter object as
a first argument. When accessing the TIMA from the IC BAR,
i.e. indirect calls, we currently pass a NULL pointer for the
presenter argument. While it appears ok with the current usage, it's
dangerous. And it's pretty easy to figure out the presenter in that
context, so this patch fixes it.

Signed-off-by: Frederic Barrat 
---
  hw/intc/pnv_xive2.c | 6 ++++--
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/intc/pnv_xive2.c b/hw/intc/pnv_xive2.c
index 82fcd3ea22..bbb44a533c 100644
--- a/hw/intc/pnv_xive2.c
+++ b/hw/intc/pnv_xive2.c
@@ -1624,6 +1624,7 @@ static uint64_t pnv_xive2_ic_tm_indirect_read(void *opaque, hwaddr offset,
unsigned size)
  {
  PnvXive2 *xive = PNV_XIVE2(opaque);
+XivePresenter *xptr = XIVE_PRESENTER(xive);
  hwaddr hw_page_offset;
  uint32_t pir;
  XiveTCTX *tctx;
@@ -1633,7 +1634,7 @@ static uint64_t pnv_xive2_ic_tm_indirect_read(void *opaque, hwaddr offset,
  hw_page_offset = pnv_xive2_ic_tm_get_hw_page_offset(xive, offset);
  tctx = pnv_xive2_get_indirect_tctx(xive, pir);
  if (tctx) {
-val = xive_tctx_tm_read(NULL, tctx, hw_page_offset, size);
+val = xive_tctx_tm_read(xptr, tctx, hw_page_offset, size);
  }
  
  return val;

@@ -1643,6 +1644,7 @@ static void pnv_xive2_ic_tm_indirect_write(void *opaque, hwaddr offset,
 uint64_t val, unsigned size)
  {
  PnvXive2 *xive = PNV_XIVE2(opaque);
+XivePresenter *xptr = XIVE_PRESENTER(xive);
  hwaddr hw_page_offset;
  uint32_t pir;
  XiveTCTX *tctx;
@@ -1651,7 +1653,7 @@ static void pnv_xive2_ic_tm_indirect_write(void *opaque, hwaddr offset,
  hw_page_offset = pnv_xive2_ic_tm_get_hw_page_offset(xive, offset);
  tctx = pnv_xive2_get_indirect_tctx(xive, pir);
  if (tctx) {
-xive_tctx_tm_write(NULL, tctx, hw_page_offset, val, size);
+xive_tctx_tm_write(xptr, tctx, hw_page_offset, val, size);
  }
  }

Re: [PATCH v2 0/4] ppc/pnv: SMT support for powernv


Queued in gitlab.com/danielhb/qemu/tree/ppc-next. Thanks,


Daniel

On 7/5/23 09:06, Nicholas Piggin wrote:

These patches implement enough to install a distro, boot, run SMP KVM
guests with libvirt with good performance using MTTCG (as reported by
Cedric).

There are a few more SPRs that need to be done, and per-LPAR SPRs are
mostly not annotated yet, so it can't run in 1 LPAR mode. But those can
be added in time. It will take a while to get everything exactly as
hardware does, so I consider this good enough to run common software
usefully.

Since RFC:
- Rebased against ppc-next (no conflicts vs upstream anyway).
- Add patch 4 avocado boot test with SMT, as was added with pseries SMT.
- Renamed POWERPC_FLAG_1LPAR to POWERPC_FLAG_SMT_1LPAR since it implies
   SMT.
- Fixed typos, patch 1, 3 changelogs improvement (hopefully).

Since v1:
- Fix clang compile bug
- Fix LPAR-per-thread bug in CTRL/DPDES/msgsndp in patch 1
- Add 2-socket test case to powernv Linux boot avocado test
- Remove SMT caveat from docs/system/ppc/powernv.rst

Thanks,
Nick

Nicholas Piggin (4):
   target/ppc: Add LPAR-per-core vs per-thread mode flag
   target/ppc: SMT support for the HID SPR
   ppc/pnv: SMT support for powernv
   tests/avocado: Add powernv machine test script

  docs/system/ppc/powernv.rst  |  5 ---
  hw/ppc/pnv.c | 12 +
  hw/ppc/pnv_core.c| 13 +++---
  hw/ppc/spapr_cpu_core.c  |  2 +
  target/ppc/cpu.h |  3 ++
  target/ppc/cpu_init.c| 14 +-
  target/ppc/excp_helper.c |  4 ++
  target/ppc/helper.h  |  1 +
  target/ppc/misc_helper.c | 29 
  target/ppc/spr_common.h  |  1 +
  target/ppc/translate.c   | 27 ---
  tests/avocado/ppc_powernv.py | 87 
  12 files changed, 179 insertions(+), 19 deletions(-)
  create mode 100644 tests/avocado/ppc_powernv.py

Re: [PATCH] pnv/xive: Print CPU target in all TIMA traces