Re: [PATCH v8 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-06-29 Thread Beata Michalska
On Mon, 29 Jun 2020 at 09:15, Andrew Jones  wrote:
>
> On Sun, Jun 28, 2020 at 04:04:58PM +0100, Beata Michalska wrote:
> > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
> > exception with no valid ISS info to be decoded. The lack of decode info
> > makes it at least tricky to emulate those instructions, which is one of the
> > (many) reasons why KVM will not even try to do so.
> >
> > Add support for handling those by requesting KVM to inject external
> > dabt into the guest.
> >
> > Signed-off-by: Beata Michalska 
> > ---
> >  target/arm/kvm.c | 57 +++-
> >  1 file changed, 56 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > index eef3bbd..2dd8a9a 100644
> > --- a/target/arm/kvm.c
> > +++ b/target/arm/kvm.c
> > @@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
> >
> >  static bool cap_has_mp_state;
> >  static bool cap_has_inject_serror_esr;
> > +static bool cap_has_inject_ext_dabt;
> >
> >  static ARMHostCPUFeatures arm_host_cpu_features;
> >
> > @@ -245,6 +246,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> >  ret = -EINVAL;
> >  }
> >
> > +if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
> > +if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
> > +error_report("Failed to enable KVM_CAP_ARM_NISV_TO_USER cap");
> > +} else {
> > +/* Set status for supporting the external dabt injection */
> > +cap_has_inject_ext_dabt = kvm_check_extension(s,
> > +KVM_CAP_ARM_INJECT_EXT_DABT);
> > +}
> > +}
> > +
> >  return ret;
> >  }
> >
> > @@ -810,6 +821,45 @@ void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
> >  }
> >  }
> >
> > +/**
> > + * kvm_arm_handle_dabt_nisv:
> > + * @cs: CPUState
> > + * @esr_iss: ISS encoding (limited) for the exception from Data Abort
> > + *   ISV bit set to '0b0' -> no valid instruction syndrome
> > + * @fault_ipa: faulting address for the synchronous data abort
> > + *
> > + * Returns: 0 if the exception has been handled, < 0 otherwise
> > + */
> > +static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
> > + uint64_t fault_ipa)
>
> The indent of 'uint64_t fault_ipa' is off. Should be under CPUState.
>
> > +{
> > +/*
> > + * Request KVM to inject the external data abort into the guest
> > + */
> > +if (cap_has_inject_ext_dabt) {
> > +struct kvm_vcpu_events events = { };
> > +/*
> > + * The external data abort event will be handled immediately by KVM
> > + * using the address fault that triggered the exit on given VCPU.
> > + * Requesting injection of the external data abort does not rely
> > + * on any other VCPU state. Therefore, in this particular case, the VCPU
> > + * synchronization can be exceptionally skipped.
> > + */
> > +events.exception.ext_dabt_pending = 1;
> > +/*
> > + * KVM_CAP_ARM_INJECT_EXT_DABT implies KVM_CAP_VCPU_EVENTS
> > + */
>
> Single line comments may be done with /* ... */
>
> > +return kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
> > +
>
> Extra blank line here.
>
> > +} else {
> > +error_report("Data abort exception triggered by guest memory access "
> > + "at physical address: 0x"  TARGET_FMT_lx,
> > + (target_ulong)fault_ipa);
> > +error_printf("KVM unable to emulate faulting instruction.\n");
> > +}
> > +return -1;
> > +}
> > +
> >  int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
> >  {
> >  int ret = 0;
> > @@ -820,7 +870,12 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
> >  ret = EXCP_DEBUG;
> >  } /* otherwise return to guest */
> >  break;
> > -default:
> > +case KVM_EXIT_ARM_NISV:
> > +/* External DABT with no valid iss to decode */
> > +ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
> > +   run->arm_nisv.fault_ipa);
> > +break;
> > + default:
>
> An extra space got added in front of 'default:'
>
> >  qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
> >__func__, run->exit_reason);
> >  break;
> > --
> > 2.7.4
> >
> >
>
> Besides the format changes
>
> Reviewed-by: Andrew Jones 
>
Done.
Thanks a lot for the reviews!

BR
Beata



[PATCH v9 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-06-29 Thread Beata Michalska
On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate those instructions, which is one of the
(many) reasons why KVM will not even try to do so.

Add support for handling those by requesting KVM to inject external
dabt into the guest.

Signed-off-by: Beata Michalska 
Reviewed-by: Andrew Jones 
---
 target/arm/kvm.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index eef3bbd..545d2ba 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool cap_has_mp_state;
 static bool cap_has_inject_serror_esr;
+static bool cap_has_inject_ext_dabt;
 
 static ARMHostCPUFeatures arm_host_cpu_features;
 
@@ -245,6 +246,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 ret = -EINVAL;
 }
 
+if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
+if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
+error_report("Failed to enable KVM_CAP_ARM_NISV_TO_USER cap");
+} else {
+/* Set status for supporting the external dabt injection */
+cap_has_inject_ext_dabt = kvm_check_extension(s,
+KVM_CAP_ARM_INJECT_EXT_DABT);
+}
+}
+
 return ret;
 }
 
@@ -810,6 +821,42 @@ void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
 }
 }
 
+/**
+ * kvm_arm_handle_dabt_nisv:
+ * @cs: CPUState
+ * @esr_iss: ISS encoding (limited) for the exception from Data Abort
+ *   ISV bit set to '0b0' -> no valid instruction syndrome
+ * @fault_ipa: faulting address for the synchronous data abort
+ *
+ * Returns: 0 if the exception has been handled, < 0 otherwise
+ */
+static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
+uint64_t fault_ipa)
+{
+/*
+ * Request KVM to inject the external data abort into the guest
+ */
+if (cap_has_inject_ext_dabt) {
+struct kvm_vcpu_events events = { };
+/*
+ * The external data abort event will be handled immediately by KVM
+ * using the address fault that triggered the exit on given VCPU.
+ * Requesting injection of the external data abort does not rely
+ * on any other VCPU state. Therefore, in this particular case, the VCPU
+ * synchronization can be exceptionally skipped.
+ */
+events.exception.ext_dabt_pending = 1;
+/* KVM_CAP_ARM_INJECT_EXT_DABT implies KVM_CAP_VCPU_EVENTS */
+return kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+} else {
+error_report("Data abort exception triggered by guest memory access "
+ "at physical address: 0x"  TARGET_FMT_lx,
+ (target_ulong)fault_ipa);
+error_printf("KVM unable to emulate faulting instruction.\n");
+}
+return -1;
+}
+
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 {
 int ret = 0;
@@ -820,6 +867,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = EXCP_DEBUG;
 } /* otherwise return to guest */
 break;
+case KVM_EXIT_ARM_NISV:
+/* External DABT with no valid iss to decode */
+ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
+   run->arm_nisv.fault_ipa);
+break;
 default:
 qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
   __func__, run->exit_reason);
-- 
2.7.4




[PATCH v9 2/2] target/arm: kvm: Handle misconfigured dabt injection

2020-06-29 Thread Beata Michalska
Injecting an external data abort through KVM might trigger
an issue on kernels that have not been updated to include the KVM fix.
On those kernels, for aarch32 guests, the injected abort gets misconfigured
as an implementation-defined exception. This leads to the guest
repeatedly re-running the faulting instruction.

Add support for handling that case.

[
  Fixed-by: 018f22f95e8a
('KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests')
  Fixed-by: 21aecdbd7f3a
('KVM: arm: Make inject_abt32() inject an external abort instead')
]

Signed-off-by: Beata Michalska 
Acked-by: Andrew Jones 
---
 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 30 +-
 target/arm/kvm32.c   | 34 ++
 target/arm/kvm64.c   | 49 +
 target/arm/kvm_arm.h | 10 ++
 5 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 677584e..ed0ff09 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -570,6 +570,8 @@ typedef struct CPUARMState {
 uint64_t esr;
 } serror;
 
+uint8_t ext_dabt_raised; /* Tracking/verifying injection of ext DABT */
+
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 545d2ba..603d431 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -749,6 +749,29 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
 
 void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+if (unlikely(env->ext_dabt_raised)) {
+/*
+ * Verifying that the ext DABT has been properly injected,
+ * otherwise risking indefinitely re-running the faulting instruction
+ * Covering a very narrow case for kernels 5.5..5.5.4
+ * when injected abort was misconfigured to be
+ * an IMPLEMENTATION DEFINED exception (for 32-bit EL1)
+ */
+if (!arm_feature(env, ARM_FEATURE_AARCH64) &&
+unlikely(!kvm_arm_verify_ext_dabt_pending(cs))) {
+
+error_report("Data abort exception with no valid ISS generated by "
+   "guest memory access. KVM unable to emulate faulting "
+   "instruction. Failed to inject an external data abort "
+   "into the guest.");
+abort();
+   }
+   /* Clear the status */
+   env->ext_dabt_raised = 0;
+}
 }
 
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
@@ -833,6 +856,8 @@ void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
 static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
 uint64_t fault_ipa)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
 /*
  * Request KVM to inject the external data abort into the guest
  */
@@ -847,7 +872,10 @@ static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
  */
 events.exception.ext_dabt_pending = 1;
 /* KVM_CAP_ARM_INJECT_EXT_DABT implies KVM_CAP_VCPU_EVENTS */
-return kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+if (!kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events)) {
+env->ext_dabt_raised = 1;
+return 0;
+}
 } else {
 error_report("Data abort exception triggered by guest memory access "
  "at physical address: 0x"  TARGET_FMT_lx,
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 7b3a19e..0af46b4 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -559,3 +559,37 @@ void kvm_arm_pmu_init(CPUState *cs)
 {
 qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 }
+
+#define ARM_REG_DFSR  ARM_CP15_REG32(0, 5, 0, 0)
+#define ARM_REG_TTBCR ARM_CP15_REG32(0, 2, 0, 2)
+/*
+ *DFSR:
+ *  TTBCR.EAE == 0
+ *  FS[4]   - DFSR[10]
+ *  FS[3:0] - DFSR[3:0]
+ *  TTBCR.EAE == 1
+ *  FS, bits [5:0]
+ */
+#define DFSR_FSC(lpae, v) \
+((lpae) ? ((v) & 0x3F) : (((v) >> 6) | ((v) & 0x1F)))
+
+#define DFSC_EXTABT(lpae) ((lpae) ? 0x10 : 0x08)
+
+bool kvm_arm_verify_ext_dabt_pending(CPUState *cs)
+{
+uint32_t dfsr_val;
+
+if (!kvm_get_one_reg(cs, ARM_REG_DFSR, &dfsr_val)) {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+uint32_t ttbcr;
+int lpae = 0;
+
+if (!kvm_get_one_reg(cs, ARM_REG_TTBCR, &ttbcr)) {
+lpae = arm_feature(env, ARM_FEATURE_LPAE) && (ttbcr & TTBCR_EAE);
+}
+/* The verification is based on the FS field of the DFSR reg only */
+return (DFSR_FSC(lpae, dfsr_val) == DFSC_EXTABT(lpae));
+}
+return false;
+}
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index f09ed9f..88cf10c 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c

[PATCH v9 0/2] target/arm: kvm: Support for KVM DABT with no valid ISS

2020-06-29 Thread Beata Michalska
Some of the ARMv7 & ARMv8 load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate the instruction, which is one of the
(many) reasons why KVM will not even try to do so.

So far, if a guest made an attempt to access memory outside a registered memory slot,
KVM reported a vague ENOSYS. As a result, QEMU exited with no useful information
being provided, nor even a clue as to what had just happened.

ARM KVM introduced support for notifying userspace of an attempt to execute
an instruction that resulted in a dabt with no valid ISS decoding info.
This still leaves QEMU to handle the case, but at least it now has more
control and a starting point for more meaningful handling of such cases.

This patchset relies on KVM to insert the external data abort into the guest.


--
v9:
 - formatting fixes

v8:
 - moving the comment re support for VCPU events - missed in previous patchset

v7:
 - Rephrasing the comment regarding abort injection and vcpu synchronization
 - switching to struct initialization instead of memset

v6:
 - replacing calling kvm_put_vcpu_events with an actual ioctl call
 - making the handler function static

v5:
 - Drop syncing vcpu regs in favour of calling kvm_put_vcpu_events directly
 - Fix decoding DFSC for LPAE case
 - Add/clarify comments
 - Switch to reporting error case failure when enabling the cap

v4:
 - Removing one of the patches as it is being picked-up separately
 target/arm: kvm: Inject events at the last stage of sync
 - Moving handling KVM issue to a separate patch
 - Minor changes wrt the review comments

v3:
 - Fix setting KVM cap per vm not per vcpu
 - Simplifying the handler to bare minimum with no default logging to address
   the potential risk of overflooding the host (adding support for rate
   limiting the logs turned out to be bit too invasive to justify the little
   add-on value from logs in this particular case)
 - Adding handling KVM bug (for small range of affected kernels):
   little bit of trade-off between what's reasonable and what's effective:
   aborting qemu when running on buggy host kernel

v2:
- Improving/re-phrasing messaging
- Dropping messing around with forced sync (@see [PATCH v2 1/2])
  and PC alignment


Beata Michalska (2):
  target/arm: kvm: Handle DABT with no valid ISS
  target/arm: kvm: Handle misconfigured dabt injection

 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 80 
 target/arm/kvm32.c   | 34 ++
 target/arm/kvm64.c   | 49 
 target/arm/kvm_arm.h | 10 +++
 5 files changed, 175 insertions(+)

-- 
2.7.4




[PATCH v8 2/2] target/arm: kvm: Handle misconfigured dabt injection

2020-06-28 Thread Beata Michalska
Injecting an external data abort through KVM might trigger
an issue on kernels that have not been updated to include the KVM fix.
On those kernels, for aarch32 guests, the injected abort gets misconfigured
as an implementation-defined exception. This leads to the guest
repeatedly re-running the faulting instruction.

Add support for handling that case.

[
  Fixed-by: 018f22f95e8a
('KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests')
  Fixed-by: 21aecdbd7f3a
('KVM: arm: Make inject_abt32() inject an external abort instead')
]

Signed-off-by: Beata Michalska 
---
 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 30 +-
 target/arm/kvm32.c   | 34 ++
 target/arm/kvm64.c   | 49 +
 target/arm/kvm_arm.h | 10 ++
 5 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 677584e..ed0ff09 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -570,6 +570,8 @@ typedef struct CPUARMState {
 uint64_t esr;
 } serror;
 
+uint8_t ext_dabt_raised; /* Tracking/verifying injection of ext DABT */
+
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 2dd8a9a..e7a596e 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -749,6 +749,29 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
 
 void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+if (unlikely(env->ext_dabt_raised)) {
+/*
+ * Verifying that the ext DABT has been properly injected,
+ * otherwise risking indefinitely re-running the faulting instruction
+ * Covering a very narrow case for kernels 5.5..5.5.4
+ * when injected abort was misconfigured to be
+ * an IMPLEMENTATION DEFINED exception (for 32-bit EL1)
+ */
+if (!arm_feature(env, ARM_FEATURE_AARCH64) &&
+unlikely(!kvm_arm_verify_ext_dabt_pending(cs))) {
+
+error_report("Data abort exception with no valid ISS generated by "
+   "guest memory access. KVM unable to emulate faulting "
+   "instruction. Failed to inject an external data abort "
+   "into the guest.");
+abort();
+   }
+   /* Clear the status */
+   env->ext_dabt_raised = 0;
+}
 }
 
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
@@ -833,6 +856,8 @@ void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
 static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
  uint64_t fault_ipa)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
 /*
  * Request KVM to inject the external data abort into the guest
  */
@@ -849,7 +874,10 @@ static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
 /*
  * KVM_CAP_ARM_INJECT_EXT_DABT implies KVM_CAP_VCPU_EVENTS
  */
-return kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+if (!kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events)) {
+env->ext_dabt_raised = 1;
+return 0;
+}
 
 } else {
 error_report("Data abort exception triggered by guest memory access "
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 7b3a19e..0af46b4 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -559,3 +559,37 @@ void kvm_arm_pmu_init(CPUState *cs)
 {
 qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 }
+
+#define ARM_REG_DFSR  ARM_CP15_REG32(0, 5, 0, 0)
+#define ARM_REG_TTBCR ARM_CP15_REG32(0, 2, 0, 2)
+/*
+ *DFSR:
+ *  TTBCR.EAE == 0
+ *  FS[4]   - DFSR[10]
+ *  FS[3:0] - DFSR[3:0]
+ *  TTBCR.EAE == 1
+ *  FS, bits [5:0]
+ */
+#define DFSR_FSC(lpae, v) \
+((lpae) ? ((v) & 0x3F) : (((v) >> 6) | ((v) & 0x1F)))
+
+#define DFSC_EXTABT(lpae) ((lpae) ? 0x10 : 0x08)
+
+bool kvm_arm_verify_ext_dabt_pending(CPUState *cs)
+{
+uint32_t dfsr_val;
+
+if (!kvm_get_one_reg(cs, ARM_REG_DFSR, &dfsr_val)) {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+uint32_t ttbcr;
+int lpae = 0;
+
+if (!kvm_get_one_reg(cs, ARM_REG_TTBCR, &ttbcr)) {
+lpae = arm_feature(env, ARM_FEATURE_LPAE) && (ttbcr & TTBCR_EAE);
+}
+/* The verification is based on the FS field of the DFSR reg only */
+return (DFSR_FSC(lpae, dfsr_val) == DFSC_EXTABT(lpae));
+}
+return false;
+}
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index f09ed9f..88cf10c 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -1497,3 +1497,52 @@ bool kvm_arm_handle_debug(CPUState *cs, struct kvm_debug_exit_arch *debug_exit)
 
 return false;
 }

[PATCH v8 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-06-28 Thread Beata Michalska
On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate those instructions, which is one of the
(many) reasons why KVM will not even try to do so.

Add support for handling those by requesting KVM to inject external
dabt into the guest.

Signed-off-by: Beata Michalska 
---
 target/arm/kvm.c | 57 +++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index eef3bbd..2dd8a9a 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool cap_has_mp_state;
 static bool cap_has_inject_serror_esr;
+static bool cap_has_inject_ext_dabt;
 
 static ARMHostCPUFeatures arm_host_cpu_features;
 
@@ -245,6 +246,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 ret = -EINVAL;
 }
 
+if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
+if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
+error_report("Failed to enable KVM_CAP_ARM_NISV_TO_USER cap");
+} else {
+/* Set status for supporting the external dabt injection */
+cap_has_inject_ext_dabt = kvm_check_extension(s,
+KVM_CAP_ARM_INJECT_EXT_DABT);
+}
+}
+
 return ret;
 }
 
@@ -810,6 +821,45 @@ void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
 }
 }
 
+/**
+ * kvm_arm_handle_dabt_nisv:
+ * @cs: CPUState
+ * @esr_iss: ISS encoding (limited) for the exception from Data Abort
+ *   ISV bit set to '0b0' -> no valid instruction syndrome
+ * @fault_ipa: faulting address for the synchronous data abort
+ *
+ * Returns: 0 if the exception has been handled, < 0 otherwise
+ */
+static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
+ uint64_t fault_ipa)
+{
+/*
+ * Request KVM to inject the external data abort into the guest
+ */
+if (cap_has_inject_ext_dabt) {
+struct kvm_vcpu_events events = { };
+/*
+ * The external data abort event will be handled immediately by KVM
+ * using the address fault that triggered the exit on given VCPU.
+ * Requesting injection of the external data abort does not rely
+ * on any other VCPU state. Therefore, in this particular case, the VCPU
+ * synchronization can be exceptionally skipped.
+ */
+events.exception.ext_dabt_pending = 1;
+/*
+ * KVM_CAP_ARM_INJECT_EXT_DABT implies KVM_CAP_VCPU_EVENTS
+ */
+return kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+
+} else {
+error_report("Data abort exception triggered by guest memory access "
+ "at physical address: 0x"  TARGET_FMT_lx,
+ (target_ulong)fault_ipa);
+error_printf("KVM unable to emulate faulting instruction.\n");
+}
+return -1;
+}
+
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 {
 int ret = 0;
@@ -820,7 +870,12 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = EXCP_DEBUG;
 } /* otherwise return to guest */
 break;
-default:
+case KVM_EXIT_ARM_NISV:
+/* External DABT with no valid iss to decode */
+ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
+   run->arm_nisv.fault_ipa);
+break;
+ default:
 qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
   __func__, run->exit_reason);
 break;
-- 
2.7.4




[PATCH v8 0/2] target/arm: kvm: Support for KVM DABT with no valid ISS

2020-06-28 Thread Beata Michalska
Some of the ARMv7 & ARMv8 load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate the instruction, which is one of the
(many) reasons why KVM will not even try to do so.

So far, if a guest made an attempt to access memory outside a registered memory slot,
KVM reported a vague ENOSYS. As a result, QEMU exited with no useful information
being provided, nor even a clue as to what had just happened.

ARM KVM introduced support for notifying userspace of an attempt to execute
an instruction that resulted in a dabt with no valid ISS decoding info.
This still leaves QEMU to handle the case, but at least it now has more
control and a starting point for more meaningful handling of such cases.

This patchset relies on KVM to insert the external data abort into the guest.


--
v8:
 - moving the comment re support for VCPU events - missed in previous patchset

v7:
 - Rephrasing the comment regarding abort injection and vcpu synchronization
 - switching to struct initialization instead of memset

v6:
 - replacing calling kvm_put_vcpu_events with an actual ioctl call
 - making the handler function static

v5:
 - Drop syncing vcpu regs in favour of calling kvm_put_vcpu_events directly
 - Fix decoding DFSC for LPAE case
 - Add/clarify comments
 - Switch to reporting error case failure when enabling the cap

v4:
 - Removing one of the patches as it is being picked-up separately
 target/arm: kvm: Inject events at the last stage of sync
 - Moving handling KVM issue to a separate patch
 - Minor changes wrt the review comments

v3:
 - Fix setting KVM cap per vm not per vcpu
 - Simplifying the handler to bare minimum with no default logging to address
   the potential risk of overflooding the host (adding support for rate
   limiting the logs turned out to be bit too invasive to justify the little
   add-on value from logs in this particular case)
 - Adding handling KVM bug (for small range of affected kernels):
   little bit of trade-off between what's reasonable and what's effective:
   aborting qemu when running on buggy host kernel

v2:
- Improving/re-phrasing messaging
- Dropping messing around with forced sync (@see [PATCH v2 1/2])
  and PC alignment

Beata Michalska (2):
  target/arm: kvm: Handle DABT with no valid ISS
  target/arm: kvm: Handle misconfigured dabt injection

 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 85 +++-
 target/arm/kvm32.c   | 34 +
 target/arm/kvm64.c   | 49 ++
 target/arm/kvm_arm.h | 10 +++
 5 files changed, 179 insertions(+), 1 deletion(-)

-- 
2.7.4




Re: [PATCH v6 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-06-27 Thread Beata Michalska
Hi Peter,
Hi Andrew

Thanks for quick review.
I have pushed the updated version.

BR
Beata

On Fri, 26 Jun 2020 at 13:59, Peter Maydell  wrote:
>
> On Fri, 26 Jun 2020 at 10:01, Andrew Jones  wrote:
> > nit: How about using '= {0}' when declaring the variable, rather than this
> > memset?
>
> We prefer "= {}" -- although "= {0}" is the C standard approved
> version, some compiler versions produce spurious warnings for
> it in some situations. (cf commit 039d4e3df0049bdd8f9).
>
> thanks
> -- PMM



[PATCH v7 2/2] target/arm: kvm: Handle misconfigured dabt injection

2020-06-27 Thread Beata Michalska
Injecting an external data abort through KVM might trigger
an issue on kernels that have not been updated to include the KVM fix.
On those kernels, for aarch32 guests, the injected abort gets misconfigured
as an implementation-defined exception. This leads to the guest
repeatedly re-running the faulting instruction.

Add support for handling that case.

[
  Fixed-by: 018f22f95e8a
('KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests')
  Fixed-by: 21aecdbd7f3a
('KVM: arm: Make inject_abt32() inject an external abort instead')
]

Signed-off-by: Beata Michalska 
---
 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 30 +-
 target/arm/kvm32.c   | 34 ++
 target/arm/kvm64.c   | 49 +
 target/arm/kvm_arm.h | 10 ++
 5 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 677584e..ed0ff09 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -570,6 +570,8 @@ typedef struct CPUARMState {
 uint64_t esr;
 } serror;
 
+uint8_t ext_dabt_raised; /* Tracking/verifying injection of ext DABT */
+
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 3ea6f9a..66d5ee5 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -749,6 +749,29 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
 
 void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+if (unlikely(env->ext_dabt_raised)) {
+/*
+ * Verifying that the ext DABT has been properly injected,
+ * otherwise risking indefinitely re-running the faulting instruction
+ * Covering a very narrow case for kernels 5.5..5.5.4
+ * when injected abort was misconfigured to be
+ * an IMPLEMENTATION DEFINED exception (for 32-bit EL1)
+ */
+if (!arm_feature(env, ARM_FEATURE_AARCH64) &&
+unlikely(!kvm_arm_verify_ext_dabt_pending(cs))) {
+
+error_report("Data abort exception with no valid ISS generated by "
+   "guest memory access. KVM unable to emulate faulting "
+   "instruction. Failed to inject an external data abort "
+   "into the guest.");
+abort();
+   }
+   /* Clear the status */
+   env->ext_dabt_raised = 0;
+}
 }
 
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
@@ -833,6 +856,8 @@ void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
 static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
  uint64_t fault_ipa)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
 /*
  * Request KVM to inject the external data abort into the guest
  */
@@ -851,7 +876,10 @@ static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
  */
 events.exception.ext_dabt_pending = 1;
 
-return kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+if (!kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events)) {
+env->ext_dabt_raised = 1;
+return 0;
+}
 } else {
 error_report("Data abort exception triggered by guest memory access "
  "at physical address: 0x"  TARGET_FMT_lx,
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 7b3a19e..0af46b4 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -559,3 +559,37 @@ void kvm_arm_pmu_init(CPUState *cs)
 {
 qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 }
+
+#define ARM_REG_DFSR  ARM_CP15_REG32(0, 5, 0, 0)
+#define ARM_REG_TTBCR ARM_CP15_REG32(0, 2, 0, 2)
+/*
+ *DFSR:
+ *  TTBCR.EAE == 0
+ *  FS[4]   - DFSR[10]
+ *  FS[3:0] - DFSR[3:0]
+ *  TTBCR.EAE == 1
+ *  FS, bits [5:0]
+ */
+#define DFSR_FSC(lpae, v) \
+((lpae) ? ((v) & 0x3F) : (((v) >> 6) | ((v) & 0x1F)))
+
+#define DFSC_EXTABT(lpae) ((lpae) ? 0x10 : 0x08)
+
+bool kvm_arm_verify_ext_dabt_pending(CPUState *cs)
+{
+uint32_t dfsr_val;
+
+if (!kvm_get_one_reg(cs, ARM_REG_DFSR, &dfsr_val)) {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+uint32_t ttbcr;
+int lpae = 0;
+
+if (!kvm_get_one_reg(cs, ARM_REG_TTBCR, &ttbcr)) {
+lpae = arm_feature(env, ARM_FEATURE_LPAE) && (ttbcr & TTBCR_EAE);
+}
+/* The verification is based on the FS field of the DFSR reg only */
+return (DFSR_FSC(lpae, dfsr_val) == DFSC_EXTABT(lpae));
+}
+return false;
+}
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index f09ed9f..88cf10c 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -1497,3 +1497,52 @@ bool kvm_arm_handle_debug(CPUState *cs, struct kvm_debug_exit_arch *debug_exit)

[PATCH v7 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-06-27 Thread Beata Michalska
On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate those instructions, which is one of the
(many) reasons why KVM will not even try to do so.

Add support for handling those by requesting KVM to inject external
dabt into the guest.

Signed-off-by: Beata Michalska 
---
 target/arm/kvm.c | 58 +++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index eef3bbd..3ea6f9a 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool cap_has_mp_state;
 static bool cap_has_inject_serror_esr;
+static bool cap_has_inject_ext_dabt;
 
 static ARMHostCPUFeatures arm_host_cpu_features;
 
@@ -245,6 +246,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 ret = -EINVAL;
 }
 
+if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
+if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
+error_report("Failed to enable KVM_CAP_ARM_NISV_TO_USER cap");
+} else {
+/* Set status for supporting the external dabt injection */
+cap_has_inject_ext_dabt = kvm_check_extension(s,
+KVM_CAP_ARM_INJECT_EXT_DABT);
+}
+}
+
 return ret;
 }
 
@@ -810,6 +821,46 @@ void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
 }
 }
 
+/**
+ * kvm_arm_handle_dabt_nisv:
+ * @cs: CPUState
+ * @esr_iss: ISS encoding (limited) for the exception from Data Abort
+ *   ISV bit set to '0b0' -> no valid instruction syndrome
+ * @fault_ipa: faulting address for the synchronous data abort
+ *
+ * Returns: 0 if the exception has been handled, < 0 otherwise
+ */
+static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
+ uint64_t fault_ipa)
+{
+/*
+ * Request KVM to inject the external data abort into the guest
+ */
+if (cap_has_inject_ext_dabt) {
+/*
+ * KVM_CAP_ARM_INJECT_EXT_DABT support implies one for
+ * KVM_CAP_VCPU_EVENTS
+ */
+struct kvm_vcpu_events events = { };
+/*
+ * The external data abort event will be handled immediately by KVM
+ * using the address fault that triggered the exit on given VCPU.
+ * Requesting injection of the external data abort does not rely
+ * on any other VCPU state. Therefore, in this particular case, the VCPU
+ * synchronization can be exceptionally skipped.
+ */
+events.exception.ext_dabt_pending = 1;
+
+return kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+} else {
+error_report("Data abort exception triggered by guest memory access "
+ "at physical address: 0x"  TARGET_FMT_lx,
+ (target_ulong)fault_ipa);
+error_printf("KVM unable to emulate faulting instruction.\n");
+}
+return -1;
+}
+
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 {
 int ret = 0;
@@ -820,7 +871,12 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = EXCP_DEBUG;
 } /* otherwise return to guest */
 break;
-default:
+case KVM_EXIT_ARM_NISV:
+/* External DABT with no valid ISS to decode */
+ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
+   run->arm_nisv.fault_ipa);
+break;
+ default:
 qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
   __func__, run->exit_reason);
 break;
-- 
2.7.4




[PATCH v7 0/2] target/arm: kvm: Support for KVM DABT with no valid ISS

2020-06-27 Thread Beata Michalska
Some of the ARMv7 & ARMv8 load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate the instruction, which is one of the
(many) reasons why KVM will not even try to do so.

So far, if a guest made an attempt to access memory outside the memory slot,
KVM reported vague ENOSYS. As a result QEMU exited with no useful information
being provided or even a clue on what has just happened.

ARM KVM introduced support for notifying of an attempt to execute
an instruction that resulted in dabt with no valid ISS decoding info.
This still leaves QEMU to handle the case, but at least now it gives more
control and a starting point for more meaningful handling of such cases.

This patchset relies on KVM to insert the external data abort into the guest.


--
v7:
 - Rephrasing the comment regarding abort injection and vcpu synchronization
 - switching to struct initialization instead of memset

v6:
 - replacing calling kvm_put_vcpu_events with an actual ioctl call
 - making the handler function static

v5:
 - Drop syncing vcpu regs in favour of calling kvm_put_vcpu_events directly
 - Fix decoding DFSC for LPAE case
 - Add/clarify comments
 - Switch to reporting error case failure when enabling the cap

v4:
 - Removing one of the patches as it is being picked-up separately
 target/arm: kvm: Inject events at the last stage of sync
 - Moving handling KVM issue to a separate patch
 - Minor changes wrt the review comments

v3:
 - Fix setting KVM cap per vm not per vcpu
 - Simplifying the handler to bare minimum with no default logging to address
   the potential risk of flooding the host (adding support for rate
   limiting the logs turned out to be a bit too invasive to justify the little
   add-on value from logs in this particular case)
 - Adding handling KVM bug (for small range of affected kernels):
   little bit of trade-off between what's reasonable and what's effective:
   aborting qemu when running on buggy host kernel

v2:
- Improving/re-phrasing messaging
- Dropping messing around with forced sync (@see [PATCH v2 1/2])
  and PC alignment



Beata Michalska (2):
  target/arm: kvm: Handle DABT with no valid ISS
  target/arm: kvm: Handle misconfigured dabt injection

 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 86 +++-
 target/arm/kvm32.c   | 34 +
 target/arm/kvm64.c   | 49 ++
 target/arm/kvm_arm.h | 10 ++
 5 files changed, 180 insertions(+), 1 deletion(-)

-- 
2.7.4




[PATCH v6 2/2] target/arm: kvm: Handle misconfigured dabt injection

2020-06-25 Thread Beata Michalska
Injecting external data abort through KVM might trigger
an issue on kernels that do not get updated to include the KVM fix.
For those and aarch32 guests, the injected abort gets misconfigured
to be an implementation defined exception. This leads to the guest
repeatedly re-running the faulting instruction.

Add support for handling that case.

[
  Fixed-by: 018f22f95e8a
('KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests')
  Fixed-by: 21aecdbd7f3a
('KVM: arm: Make inject_abt32() inject an external abort instead')
]

Signed-off-by: Beata Michalska 
---
 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 30 +-
 target/arm/kvm32.c   | 34 ++
 target/arm/kvm64.c   | 49 +
 target/arm/kvm_arm.h | 10 ++
 5 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 677584e..ed0ff09 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -570,6 +570,8 @@ typedef struct CPUARMState {
 uint64_t esr;
 } serror;
 
+uint8_t ext_dabt_raised; /* Tracking/verifying injection of ext DABT */
+
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 265c4b8..85a09ea 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -749,6 +749,29 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
 
 void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+if (unlikely(env->ext_dabt_raised)) {
+/*
+ * Verifying that the ext DABT has been properly injected,
+ * otherwise risking indefinitely re-running the faulting instruction
+ * Covering a very narrow case for kernels 5.5..5.5.4
+ * when injected abort was misconfigured to be
+ * an IMPLEMENTATION DEFINED exception (for 32-bit EL1)
+ */
+if (!arm_feature(env, ARM_FEATURE_AARCH64) &&
+unlikely(!kvm_arm_verify_ext_dabt_pending(cs))) {
+
+error_report("Data abort exception with no valid ISS generated by "
+   "guest memory access. KVM unable to emulate faulting "
+   "instruction. Failed to inject an external data abort "
+   "into the guest.");
+abort();
+   }
+   /* Clear the status */
+   env->ext_dabt_raised = 0;
+}
 }
 
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
@@ -833,6 +856,8 @@ void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
 static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
  uint64_t fault_ipa)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
 /*
  * Request KVM to inject the external data abort into the guest
  */
@@ -852,7 +877,10 @@ static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
  */
 events.exception.ext_dabt_pending = 1;
 
-return kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+if (!kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events)) {
+env->ext_dabt_raised = 1;
+return 0;
+}
 } else {
 error_report("Data abort exception triggered by guest memory access "
  "at physical address: 0x"  TARGET_FMT_lx,
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 7b3a19e..0af46b4 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -559,3 +559,37 @@ void kvm_arm_pmu_init(CPUState *cs)
 {
 qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 }
+
+#define ARM_REG_DFSR  ARM_CP15_REG32(0, 5, 0, 0)
+#define ARM_REG_TTBCR ARM_CP15_REG32(0, 2, 0, 2)
+/*
+ *DFSR:
+ *  TTBCR.EAE == 0
+ *  FS[4]   - DFSR[10]
+ *  FS[3:0] - DFSR[3:0]
+ *  TTBCR.EAE == 1
+ *  FS, bits [5:0]
+ */
+#define DFSR_FSC(lpae, v) \
+((lpae) ? ((v) & 0x3F) : (((v) >> 6) | ((v) & 0x1F)))
+
+#define DFSC_EXTABT(lpae) ((lpae) ? 0x10 : 0x08)
+
+bool kvm_arm_verify_ext_dabt_pending(CPUState *cs)
+{
+uint32_t dfsr_val;
+
+if (!kvm_get_one_reg(cs, ARM_REG_DFSR, &dfsr_val)) {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+uint32_t ttbcr;
+int lpae = 0;
+
+if (!kvm_get_one_reg(cs, ARM_REG_TTBCR, &ttbcr)) {
+lpae = arm_feature(env, ARM_FEATURE_LPAE) && (ttbcr & TTBCR_EAE);
+}
+/* The verification is based on the FS field of the DFSR reg only */
+return (DFSR_FSC(lpae, dfsr_val) == DFSC_EXTABT(lpae));
+}
+return false;
+}
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index f09ed9f..88cf10c 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -1497,3 +1497,52 @@ bool kvm_arm_handle_debug(CPUState *cs, struct kvm_debug_exit_arch

[PATCH v6 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-06-25 Thread Beata Michalska
On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate those instructions, which is one of the
(many) reasons why KVM will not even try to do so.

Add support for handling those by requesting KVM to inject external
dabt into the guest.

Signed-off-by: Beata Michalska 
---
 target/arm/kvm.c | 59 +++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index eef3bbd..265c4b8 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool cap_has_mp_state;
 static bool cap_has_inject_serror_esr;
+static bool cap_has_inject_ext_dabt;
 
 static ARMHostCPUFeatures arm_host_cpu_features;
 
@@ -245,6 +246,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 ret = -EINVAL;
 }
 
+if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
+if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
+error_report("Failed to enable KVM_CAP_ARM_NISV_TO_USER cap");
+} else {
+/* Set status for supporting the external dabt injection */
+cap_has_inject_ext_dabt = kvm_check_extension(s,
+KVM_CAP_ARM_INJECT_EXT_DABT);
+}
+}
+
 return ret;
 }
 
@@ -810,6 +821,47 @@ void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
 }
 }
 
+/**
+ * kvm_arm_handle_dabt_nisv:
+ * @cs: CPUState
+ * @esr_iss: ISS encoding (limited) for the exception from Data Abort
+ *   ISV bit set to '0b0' -> no valid instruction syndrome
+ * @fault_ipa: faulting address for the synchronous data abort
+ *
+ * Returns: 0 if the exception has been handled, < 0 otherwise
+ */
+static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
+ uint64_t fault_ipa)
+{
+/*
+ * Request KVM to inject the external data abort into the guest
+ */
+if (cap_has_inject_ext_dabt) {
+struct kvm_vcpu_events events;
+/*
+ * KVM_CAP_ARM_INJECT_EXT_DABT support implies one for
+ * KVM_CAP_VCPU_EVENTS
+ */
+memset(&events, 0, sizeof(events));
+/*
+ * Skipping all the overhead of syncing vcpu regs back and forth
+ * and messing around with the vcpu_dirty flag to avoid
+ * overwriting changes done by KVM : directly calling
+ * the associated ioctl with the status set for external data abort,
+ * which, in turn, will be directly delivered to the affected vcpu.
+ */
+events.exception.ext_dabt_pending = 1;
+
+return kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+} else {
+error_report("Data abort exception triggered by guest memory access "
+ "at physical address: 0x"  TARGET_FMT_lx,
+ (target_ulong)fault_ipa);
+error_printf("KVM unable to emulate faulting instruction.\n");
+}
+return -1;
+}
+
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 {
 int ret = 0;
@@ -820,7 +872,12 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = EXCP_DEBUG;
 } /* otherwise return to guest */
 break;
-default:
+case KVM_EXIT_ARM_NISV:
+/* External DABT with no valid ISS to decode */
+ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
+   run->arm_nisv.fault_ipa);
+break;
+ default:
 qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
   __func__, run->exit_reason);
 break;
-- 
2.7.4




[PATCH v6 0/2] target/arm: kvm: Support for KVM DABT with no valid ISS

2020-06-25 Thread Beata Michalska
Some of the ARMv7 & ARMv8 load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate the instruction, which is one of the
(many) reasons why KVM will not even try to do so.

So far, if a guest made an attempt to access memory outside the memory slot,
KVM reported vague ENOSYS. As a result QEMU exited with no useful information
being provided or even a clue on what has just happened.

ARM KVM introduced support for notifying of an attempt to execute
an instruction that resulted in dabt with no valid ISS decoding info.
This still leaves QEMU to handle the case, but at least now it gives more
control and a starting point for more meaningful handling of such cases.

This patchset relies on KVM to insert the external data abort into the guest.


--
v6:
 - replacing calling kvm_put_vcpu_events with an actual ioctl call
 - making the handler function static

v5:
 - Drop syncing vcpu regs in favour of calling kvm_put_vcpu_events directly
 - Fix decoding DFSC for LPAE case
 - Add/clarify comments
 - Switch to reporting error case failure when enabling the cap

v4:
 - Removing one of the patches as it is being picked-up separately
 target/arm: kvm: Inject events at the last stage of sync
 - Moving handling KVM issue to a separate patch
 - Minor changes wrt the review comments

v3:
 - Fix setting KVM cap per vm not per vcpu
 - Simplifying the handler to bare minimum with no default logging to address
   the potential risk of flooding the host (adding support for rate
   limiting the logs turned out to be a bit too invasive to justify the little
   add-on value from logs in this particular case)
 - Adding handling KVM bug (for small range of affected kernels):
   little bit of trade-off between what's reasonable and what's effective:
   aborting qemu when running on buggy host kernel

v2:
- Improving/re-phrasing messaging
- Dropping messing around with forced sync (@see [PATCH v2 1/2])
  and PC alignment


Beata Michalska (2):
  target/arm: kvm: Handle DABT with no valid ISS
  target/arm: kvm: Handle misconfigured dabt injection

 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 87 +++-
 target/arm/kvm32.c   | 34 
 target/arm/kvm64.c   | 49 +
 target/arm/kvm_arm.h | 10 ++
 5 files changed, 181 insertions(+), 1 deletion(-)

-- 
2.7.4




Re: [PATCH v5 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-06-16 Thread Beata Michalska
Hi Andrew,

Thanks for the feedback.

On Tue, 16 Jun 2020 at 09:33, Andrew Jones  wrote:
>
> Hi Beata,
>
> I see Peter just picked this up, so I'm a bit late getting to it. I do
> have a couple comments below though.
>
> Thanks,
> drew
>
> On Fri, May 29, 2020 at 12:27:56PM +0100, Beata Michalska wrote:
> > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
> > exception with no valid ISS info to be decoded. The lack of decode info
> > makes it at least tricky to emulate those instruction which is one of the
> > (many) reasons why KVM will not even try to do so.
> >
> > Add support for handling those by requesting KVM to inject external
> > dabt into the quest.
> >
> > Signed-off-by: Beata Michalska 
> > ---
> >  target/arm/cpu.h |  2 ++
> >  target/arm/kvm.c | 64 +++-
> >  target/arm/kvm_arm.h | 11 +
> >  3 files changed, 76 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index 677584e..3702f21 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -570,6 +570,8 @@ typedef struct CPUARMState {
> >  uint64_t esr;
> >  } serror;
> >
> > +uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
> > +
> >  /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
> >  uint32_t irq_line_state;
> >
> > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > index 4bdbe6d..bf84224 100644
> > --- a/target/arm/kvm.c
> > +++ b/target/arm/kvm.c
> > @@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] 
> > = {
> >
> >  static bool cap_has_mp_state;
> >  static bool cap_has_inject_serror_esr;
> > +static bool cap_has_inject_ext_dabt;
> >
> >  static ARMHostCPUFeatures arm_host_cpu_features;
> >
> > @@ -244,6 +245,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> >  ret = -EINVAL;
> >  }
> >
> > +if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
> > +if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
> > +error_report("Failed to enable KVM_CAP_ARM_NISV_TO_USER cap");
> > +} else {
> > +/* Set status for supporting the external dabt injection */
> > +cap_has_inject_ext_dabt = kvm_check_extension(s,
> > +KVM_CAP_ARM_INJECT_EXT_DABT);
> > +}
> > +}
> > +
> >  return ret;
> >  }
> >
> > @@ -703,9 +714,16 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
> >  events.exception.serror_esr = env->serror.esr;
> >  }
> >
> > +if (cap_has_inject_ext_dabt) {
> > +events.exception.ext_dabt_pending = env->ext_dabt_pending;
> > +}
> > +
> >  ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
> >  if (ret) {
> >  error_report("failed to put vcpu events");
> > +} else {
> > +/* Clear instantly if the call was successful */
> > +env->ext_dabt_pending = 0;
> >  }
> >
> >  return ret;
> > @@ -819,7 +837,12 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
> >  ret = EXCP_DEBUG;
> >  } /* otherwise return to guest */
> >  break;
> > -default:
> > +case KVM_EXIT_ARM_NISV:
> > +/* External DABT with no valid iss to decode */
> > +ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
> > +   run->arm_nisv.fault_ipa);
> > +break;
> > + default:
> >  qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
> >__func__, run->exit_reason);
> >  break;
> > @@ -955,3 +978,42 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> >  {
> >  return (data - 32) & 0xffff;
> >  }
> > +
> > +int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
> > + uint64_t fault_ipa)
>
> This function could be static since it's in the same file as its one
> and only caller.
>
Right. Will do.

> > +{
> > +ARMCPU *cpu = ARM_CPU(cs);
> > +CPUARMState *env = &cpu->env;
> > +
> > +   /*
> > +* ISS [23:14] is invalid so there is a limited info
> > +* on what has just happened so the only *useful* thing that can
> > +* be retrieved from ISS is WnR &

[PATCH v5 2/2] target/arm: kvm: Handle misconfigured dabt injection

2020-05-29 Thread Beata Michalska
Injecting external data abort through KVM might trigger
an issue on kernels that do not get updated to include the KVM fix.
For those and aarch32 guests, the injected abort gets misconfigured
to be an implementation defined exception. This leads to the guest
repeatedly re-running the faulting instruction.

Add support for handling that case.

[
  Fixed-by: 018f22f95e8a
('KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests')
  Fixed-by: 21aecdbd7f3a
('KVM: arm: Make inject_abt32() inject an external abort instead')
]

Signed-off-by: Beata Michalska 
---
 target/arm/cpu.h |  1 +
 target/arm/kvm.c | 30 +-
 target/arm/kvm32.c   | 34 ++
 target/arm/kvm64.c   | 49 +
 target/arm/kvm_arm.h | 10 ++
 5 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 3702f21..5ebfb72 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -571,6 +571,7 @@ typedef struct CPUARMState {
 } serror;
 
 uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
+uint8_t ext_dabt_raised; /* Tracking/verifying injection of ext DABT */
 
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index bf84224..ac73c67 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -721,7 +721,12 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 if (ret) {
 error_report("failed to put vcpu events");
-} else {
+} else if (env->ext_dabt_pending) {
+/*
+ * Mark that the external DABT has been injected,
+ * if one has been requested
+ */
+env->ext_dabt_raised = env->ext_dabt_pending;
 /* Clear instantly if the call was successful */
 env->ext_dabt_pending = 0;
 }
@@ -755,6 +760,29 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
 
 void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+if (unlikely(env->ext_dabt_raised)) {
+/*
+ * Verifying that the ext DABT has been properly injected,
+ * otherwise risking indefinitely re-running the faulting instruction
+ * Covering a very narrow case for kernels 5.5..5.5.4
+ * when injected abort was misconfigured to be
+ * an IMPLEMENTATION DEFINED exception (for 32-bit EL1)
+ */
+if (!arm_feature(env, ARM_FEATURE_AARCH64) &&
+unlikely(!kvm_arm_verify_ext_dabt_pending(cs))) {
+
+error_report("Data abort exception with no valid ISS generated by "
+   "guest memory access. KVM unable to emulate faulting "
+   "instruction. Failed to inject an external data abort "
+   "into the guest.");
+abort();
+   }
+   /* Clear the status */
+   env->ext_dabt_raised = 0;
+}
 }
 
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 7b3a19e..0af46b4 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -559,3 +559,37 @@ void kvm_arm_pmu_init(CPUState *cs)
 {
 qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 }
+
+#define ARM_REG_DFSR  ARM_CP15_REG32(0, 5, 0, 0)
+#define ARM_REG_TTBCR ARM_CP15_REG32(0, 2, 0, 2)
+/*
+ *DFSR:
+ *  TTBCR.EAE == 0
+ *  FS[4]   - DFSR[10]
+ *  FS[3:0] - DFSR[3:0]
+ *  TTBCR.EAE == 1
+ *  FS, bits [5:0]
+ */
+#define DFSR_FSC(lpae, v) \
+((lpae) ? ((v) & 0x3F) : (((v) >> 6) | ((v) & 0x1F)))
+
+#define DFSC_EXTABT(lpae) ((lpae) ? 0x10 : 0x08)
+
+bool kvm_arm_verify_ext_dabt_pending(CPUState *cs)
+{
+uint32_t dfsr_val;
+
+if (!kvm_get_one_reg(cs, ARM_REG_DFSR, &dfsr_val)) {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+uint32_t ttbcr;
+int lpae = 0;
+
+if (!kvm_get_one_reg(cs, ARM_REG_TTBCR, &ttbcr)) {
+lpae = arm_feature(env, ARM_FEATURE_LPAE) && (ttbcr & TTBCR_EAE);
+}
+/* The verification is based on the FS field of the DFSR reg only */
+return (DFSR_FSC(lpae, dfsr_val) == DFSC_EXTABT(lpae));
+}
+return false;
+}
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index f09ed9f..88cf10c 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -1497,3 +1497,52 @@ bool kvm_arm_handle_debug(CPUState *cs, struct kvm_debug_exit_arch *debug_exit)
 
 return false;
 }
+
+#define ARM64_REG_ESR_EL1 ARM64_SYS_REG(3, 0, 5, 2, 0)
+#define ARM64_REG_TCR_EL1 ARM64_SYS_REG(3, 0, 2, 0, 2)
+
+/*
+ * ESR_EL1
+ * ISS encoding
+ * AARCH64: DFSC,   bits [5:0]
+ * AARCH32:
+ *  TTBCR.EAE == 0
+ *  FS[4]   - DFSR[10]
+ *   

[PATCH v5 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-05-29 Thread Beata Michalska
On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate those instructions, which is one of the
(many) reasons why KVM will not even try to do so.

Add support for handling those by requesting KVM to inject external
dabt into the guest.

Signed-off-by: Beata Michalska 
---
 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 64 +++-
 target/arm/kvm_arm.h | 11 +
 3 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 677584e..3702f21 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -570,6 +570,8 @@ typedef struct CPUARMState {
 uint64_t esr;
 } serror;
 
+uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
+
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 4bdbe6d..bf84224 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool cap_has_mp_state;
 static bool cap_has_inject_serror_esr;
+static bool cap_has_inject_ext_dabt;
 
 static ARMHostCPUFeatures arm_host_cpu_features;
 
@@ -244,6 +245,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 ret = -EINVAL;
 }
 
+if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
+if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
+error_report("Failed to enable KVM_CAP_ARM_NISV_TO_USER cap");
+} else {
+/* Set status for supporting the external dabt injection */
+cap_has_inject_ext_dabt = kvm_check_extension(s,
+KVM_CAP_ARM_INJECT_EXT_DABT);
+}
+}
+
 return ret;
 }
 
@@ -703,9 +714,16 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
 events.exception.serror_esr = env->serror.esr;
 }
 
+if (cap_has_inject_ext_dabt) {
+events.exception.ext_dabt_pending = env->ext_dabt_pending;
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 if (ret) {
 error_report("failed to put vcpu events");
+} else {
+/* Clear instantly if the call was successful */
+env->ext_dabt_pending = 0;
 }
 
 return ret;
@@ -819,7 +837,12 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = EXCP_DEBUG;
 } /* otherwise return to guest */
 break;
-default:
+case KVM_EXIT_ARM_NISV:
+/* External DABT with no valid ISS to decode */
+ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
+   run->arm_nisv.fault_ipa);
+break;
+ default:
 qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
   __func__, run->exit_reason);
 break;
@@ -955,3 +978,42 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 return (data - 32) & 0xffff;
 }
+
+int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
+ uint64_t fault_ipa)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+   /*
+* ISS [23:14] is invalid, so there is limited info
+* on what has just happened; the only *useful* things that can
+* be retrieved from ISS are WnR & DFSC (though in some cases WnR
+* might be of less value as well)
+*/
+
+/*
+ * Request KVM to inject the external data abort into the guest
+ * by setting a pending exception on the affected vcpu.
+ */
+if (cap_has_inject_ext_dabt) {
+/* Set pending exception */
+env->ext_dabt_pending = 1;
+/*
+ * Even though at this point, the vcpu regs are out of sync,
+ * directly calling the KVM_SET_VCPU_EVENTS ioctl without
+ * explicitly synchronizing those, is enough and it also avoids
+ * overwriting changes done by KVM.
+ * The vcpu is not being marked as 'dirty' as all the changes
+ * needed to inject the abort are being handled by KVM only
+ * and there is no need for syncing either way
+ */
+return kvm_put_vcpu_events(cpu);
+} else {
+error_report("Data abort exception triggered by guest memory access "
+ "at physical address: 0x"  TARGET_FMT_lx,
+ (target_ulong)fault_ipa);
+error_printf("KVM unable to emulate faulting instruction.\n");
+return -1;
+}
+}
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 48bf5e1..e939e51 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -453,6 +453,17 @@ struct kvm_guest_debug_arch;
 void kvm_arm_copy_hw_debug_data(struct kvm_guest_debug_arch *ptr);
 
 /**
+ * kvm_arm_handle_dabt_nisv:
+ * @cs: CPUSt

[PATCH v5 0/2] target/arm: kvm: Support for KVM DABT with no valid ISS

2020-05-29 Thread Beata Michalska
Some of the ARMv7 & ARMv8 load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate the instruction, which is one of the
(many) reasons why KVM will not even try to do so.

So far, if a guest made an attempt to access memory outside the memory slot,
KVM reported vague ENOSYS. As a result QEMU exited with no useful information
being provided or even a clue on what has just happened.

ARM KVM introduced support for notifying of an attempt to execute
an instruction that resulted in dabt with no valid ISS decoding info.
This still leaves QEMU to handle the case, but at least now it gives more
control and a starting point for more meaningful handling of such cases.

This patchset relies on KVM to insert the external data abort into the guest.


Thanks for all the input on the previous version!
--

v5:
 - Drop syncing vcpu regs in favour of calling kvm_put_vcpu_events directly
 - Fix decoding DFSC for LPAE case
 - Add/clarify comments
 - Switch to reporting error case failure when enabling the cap

v4:
 - Removing one of the patches as it is being picked-up separately
 target/arm: kvm: Inject events at the last stage of sync
 - Moving handling KVM issue to a separate patch
 - Minor changes wrt the review comments

v3:
 - Fix setting KVM cap per vm not per vcpu
 - Simplifying the handler to bare minimum with no default logging to address
   the potential risk of flooding the host (adding support for rate
   limiting the logs turned out to be a bit too invasive to justify the little
   add-on value from logs in this particular case)
 - Adding handling KVM bug (for small range of affected kernels):
   little bit of trade-off between what's reasonable and what's effective:
   aborting qemu when running on buggy host kernel

v2:
- Improving/re-phrasing messaging
- Dropping messing around with forced sync (@see [PATCH v2 1/2])
  and PC alignment

Beata Michalska (2):
  target/arm: kvm: Handle DABT with no valid ISS
  target/arm: kvm: Handle misconfigured dabt injection

 target/arm/cpu.h |  3 ++
 target/arm/kvm.c | 92 +++-
 target/arm/kvm32.c   | 35 
 target/arm/kvm64.c   | 49 
 target/arm/kvm_arm.h | 21 
 5 files changed, 199 insertions(+), 1 deletion(-)

-- 
2.7.4




Re: [PATCH 0/4] memory: Add memory_region_msync() & make NVMe emulated device generic

2020-05-08 Thread Beata Michalska
On Fri, 8 May 2020 at 07:33, Paolo Bonzini  wrote:
>
> On 08/05/20 08:24, Philippe Mathieu-Daudé wrote:
> > It is not clear if dccvap_writefn() really needs
> > memory_region_writeback() or could use memory_region_msync().
>
> Indeed, I don't understand the code and why it matters that
> mr->dirty_log_mask is nonzero.
>
> mr->dirty_log_mask tells if dirty tracking has been enabled, not if the
> page is dirty.  It would always be true during live migration and when
> running on TCG, but otherwise it would always be false.
>
> Beata, can you explain what you had in mind? :)
>
It has been a while..., but the intention there was to skip the sync
if there is nothing to be synced in the first place - so for performance
reasons. I honestly do not recall why I went for the dirty_log_mask,
as that seems not to be the right choice.

BR
Beata
> Paolo
>



Re: [PATCH v4 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-04-18 Thread Beata Michalska
On Fri, 17 Apr 2020 at 14:10, Andrew Jones  wrote:
>
> On Fri, Apr 17, 2020 at 11:39:25AM +0100, Peter Maydell wrote:
> > On Mon, 23 Mar 2020 at 11:32, Beata Michalska
> >  wrote:
> > >
> > > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
> > > exception with no valid ISS info to be decoded. The lack of decode info
> > > makes it at least tricky to emulate those instruction which is one of the
> > > (many) reasons why KVM will not even try to do so.
> > >
> > > Add support for handling those by requesting KVM to inject external
> > > dabt into the quest.
> > >
> > > Signed-off-by: Beata Michalska 
> > > ---
> > >  target/arm/cpu.h |  2 ++
> > >  target/arm/kvm.c | 54 
> > > 
> > >  target/arm/kvm_arm.h | 11 +++
> > >  3 files changed, 67 insertions(+)
> > >
> > > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > > index 4ffd991..4f834c1 100644
> > > --- a/target/arm/cpu.h
> > > +++ b/target/arm/cpu.h
> > > @@ -560,6 +560,8 @@ typedef struct CPUARMState {
> > >  uint64_t esr;
> > >  } serror;
> > >
> > > +uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
> >
> > I was trying to work out whether we need to migrate this state,
> > and I'm not sure. Andrew, do you know? I think this comes down
> > to "at what points in QEMU's kvm run loop can migration kick in",
> > and specifically if we get a KVM_EXIT_ARM_NISV do we definitely
> > go round the loop and KVM_RUN again without ever checking
> > to see if we should do a migration ?
> >
>
> I'd prefer a migration expert confirm this, so I've CC'ed David and Juan,
> but afaict there's no way to break out of the KVM_RUN loop after a
> successful (ret=0) call to kvm_arch_handle_exit() until after the next
> KVM_RUN ioctl. This is because even if migration kicks the vcpus between
> kvm_arch_handle_exit() and the next run, the signal won't do anything
> other than prepare the vcpu for an immediate exit.
>
I am definitely not an expert on that one, but if I got things right,
by the time the 'exit_request' gets verified, the external abort
should already be set up, the pending status cleared (through
KVM_SET_VCPU_EVENTS) and the reg content verified (kvm_arch_pre_run),
as all of it is being triggered prior to checking the exit request.
So this should not need a dedicated migration state.

I will hold on with sending the new version though to get the confirmation
whether that is the case.
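For what it's worth, the ordering being argued here can be sketched as a toy model. All names below are hypothetical stand-ins for the real QEMU hooks (kvm_put_vcpu_events(), kvm_arch_pre_run(), the exit_request check), not the actual run loop — the point is only the order in which the steps happen:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of one vcpu loop iteration. The claim above is purely
 * about ordering: the pending abort is consumed (events put, then
 * the injection verified in pre_run) strictly before the exit
 * request is honoured, so a migration kick can never observe
 * ext_dabt_pending/ext_dabt_raised still set.
 */
enum step { PUT_EVENTS, PRE_RUN, CHECK_EXIT, KVM_RUN };

static enum step trace[4];
static int ntrace;

static void record(enum step s) { trace[ntrace++] = s; }

static void put_vcpu_events(void) { record(PUT_EVENTS); } /* abort injected, pending cleared    */
static void arch_pre_run(void)    { record(PRE_RUN); }    /* injection verified, raised cleared */
static bool exit_requested(void)  { record(CHECK_EXIT); return true; } /* migration kicked us   */

enum step step_at(int i) { return trace[i]; }

int simulate_iteration(void)
{
    ntrace = 0;
    put_vcpu_events();       /* done as part of the register sync */
    arch_pre_run();
    if (exit_requested()) {
        return ntrace;       /* vcpu thread leaves the loop here */
    }
    record(KVM_RUN);         /* the KVM_RUN ioctl would go here */
    return ntrace;
}
```

Running the iteration shows the abort state is fully consumed before the loop can be left for migration.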

Thanks,

BR
Beata
>
> Thanks,
> drew
>



Re: [PATCH v4 2/2] target/arm: kvm: Handle potential issue with dabt injection

2020-04-07 Thread Beata Michalska
On Tue, 7 Apr 2020 at 12:24, Peter Maydell  wrote:
>
> On Fri, 3 Apr 2020 at 09:44, Andrew Jones  wrote:
> >
> > On Mon, Mar 23, 2020 at 11:32:27AM +0000, Beata Michalska wrote:
> > > Injecting external data abort through KVM might trigger
> > > an issue on kernels that do not get updated to include the KVM fix.
> > > For those and aarch32 guests, the injected abort gets misconfigured
> > > to be an implementation defined exception. This leads to the guest
> > > repeatedly re-running the faulting instruction.
> > >
> > > Add support for handling that case.
> > > [
> > >   Fixed-by: 018f22f95e8a
> > >   ('KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests')
> > >   Fixed-by: 21aecdbd7f3a
> > >   ('KVM: arm: Make inject_abt32() inject an external abort instead')
> > > ]
> > >
>
> > I'll leave the decision to take this KVM bug workaround patch at all to Peter,
> > and I didn't actually review whether or not kvm_arm_verify_ext_dabt_pending
> > is doing what it claims it's doing, so I'm reluctant to give an r-b on
> > this patch. But, as far as the code goes, besides the comments above, it
> > looks fine to me.
>
> I think that having the workaround for the broken kernels is
> reasonable (in fact it might have been my suggestion).
>

I will update the current version to cover the review feedback
and resend the patches soon.

Thanks a lot!

BR
Beata
> thanks
> -- PMM


Re: [PATCH v4 2/2] target/arm: kvm: Handle potential issue with dabt injection

2020-04-07 Thread Beata Michalska
On Fri, 3 Apr 2020 at 09:44, Andrew Jones  wrote:
>
> On Mon, Mar 23, 2020 at 11:32:27AM +, Beata Michalska wrote:
> > Injecting external data abort through KVM might trigger
> > an issue on kernels that do not get updated to include the KVM fix.
> > For those and aarch32 guests, the injected abort gets misconfigured
> > to be an implementation defined exception. This leads to the guest
> > repeatedly re-running the faulting instruction.
> >
> > Add support for handling that case.
> > [
> >   Fixed-by: 018f22f95e8a
> >   ('KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests')
> >   Fixed-by: 21aecdbd7f3a
> >   ('KVM: arm: Make inject_abt32() inject an external abort instead')
> > ]
> >
> > Signed-off-by: Beata Michalska 
> > ---
> >  target/arm/cpu.h |  1 +
> >  target/arm/kvm.c | 30 +-
> >  target/arm/kvm32.c   | 25 +
> >  target/arm/kvm64.c   | 34 ++
> >  target/arm/kvm_arm.h | 10 ++
> >  5 files changed, 99 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index 4f834c1..868afc6 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -561,6 +561,7 @@ typedef struct CPUARMState {
> >  } serror;
> >
> >  uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
> > +uint8_t ext_dabt_raised; /* Tracking/verifying injection of ext DABT */
> >
> >  /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
> >  uint32_t irq_line_state;
> > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > index c088589..58ad734 100644
> > --- a/target/arm/kvm.c
> > +++ b/target/arm/kvm.c
> > @@ -721,7 +721,12 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
> >  ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
> >  if (ret) {
> >  error_report("failed to put vcpu events");
> > -} else {
> > +} else if (env->ext_dabt_pending) {
> > +/*
> > + * Mark that the external DABT has been injected,
> > + * if one has been requested
> > + */
> > +env->ext_dabt_raised = env->ext_dabt_pending;
> >  /* Clear instantly if the call was successful */
> >  env->ext_dabt_pending = 0;
> >  }
> > @@ -755,6 +760,29 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
> >
> >  void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
> >  {
> > +ARMCPU *cpu = ARM_CPU(cs);
> > +CPUARMState *env = &cpu->env;
> > +
> > +if (unlikely(env->ext_dabt_raised)) {
> > +/*
> > + * Verifying that the ext DABT has been properly injected,
> > + * otherwise risking indefinitely re-running the faulting instruction
> > + * Covering a very narrow case for kernels 5.5..5.5.4
> > + * when injected abort was misconfigured to be
> > + * an IMPLEMENTATION DEFINED exception (for 32-bit EL1)
> > + */
> > +if (!arm_feature(env, ARM_FEATURE_AARCH64) &&
> > +unlikely(!kvm_arm_verify_ext_dabt_pending(cs))) {
> > +
> > +error_report("Data abort exception with no valid ISS generated by "
> > +   "guest memory access. KVM unable to emulate faulting "
> > +   "instruction. Failed to inject an external data abort "
> > +   "into the guest.");
> > +abort();
> > +   }
> > +   /* Clear the status */
> > +   env->ext_dabt_raised = 0;
> > +}
> >  }
> >
> >  MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
> > diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
> > index f271181..86c4fe7 100644
> > --- a/target/arm/kvm32.c
> > +++ b/target/arm/kvm32.c
> > @@ -564,3 +564,28 @@ void kvm_arm_pmu_init(CPUState *cs)
> >  {
> >  qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> >  }
> > +
> > +#define ARM_REG_DFSR  ARM_CP15_REG32(0, 5, 0, 0)
> > +#define ARM_REG_TTBCR ARM_CP15_REG32(0, 2, 0, 2)
> > +
> > +#define DFSR_FSC(v)   (((v) >> 6 | (v)) & 0x1F)
> > +#define DFSC_EXTABT(lpae) (lpae) ? 0x10 : 0x08
>
> We should put () around the whole ?: expression when it's in a macro
>
> > +
> > +bool kvm_arm_verify_ext_dabt_pending(CPUState *cs)
> > +{
> > +uint32_t dfsr_val;
> > +
> > +if (!kvm_get_one_reg(cs, ARM_R

Re: [PATCH v4 2/2] target/arm: kvm: Handle potential issue with dabt injection

2020-03-25 Thread Beata Michalska
On Mon, 23 Mar 2020 at 18:44, Richard Henderson  wrote:
>
> On 3/23/20 4:32 AM, Beata Michalska wrote:
> >  uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
> > +uint8_t ext_dabt_raised; /* Tracking/verifying injection of ext DABT */
>
> Is there a reason these are uint8_t and not bool?
>
>
ext_dabt_pending mirrors the type of the corresponding KVM field,
and ext_dabt_raised follows it.

BR
Beata
> r~



Re: [PATCH v4 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-03-25 Thread Beata Michalska
Hi,

On Mon, 23 Mar 2020 at 12:44, Andrew Jones  wrote:
>
> On Mon, Mar 23, 2020 at 11:32:26AM +, Beata Michalska wrote:
> > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
> > exception with no valid ISS info to be decoded. The lack of decode info
> > makes it at least tricky to emulate those instructions, which is one of the
> > (many) reasons why KVM will not even try to do so.
> >
> > Add support for handling those by requesting KVM to inject external
> > dabt into the guest.
> >
> > Signed-off-by: Beata Michalska 
> > ---
> >  target/arm/cpu.h |  2 ++
> >  target/arm/kvm.c | 54 
> > 
> >  target/arm/kvm_arm.h | 11 +++
> >  3 files changed, 67 insertions(+)
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index 4ffd991..4f834c1 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -560,6 +560,8 @@ typedef struct CPUARMState {
> >  uint64_t esr;
> >  } serror;
> >
> > +uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
> > +
> >  /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
> >  uint32_t irq_line_state;
> >
> > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > index 85860e6..c088589 100644
> > --- a/target/arm/kvm.c
> > +++ b/target/arm/kvm.c
> > @@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] 
> > = {
> >
> >  static bool cap_has_mp_state;
> >  static bool cap_has_inject_serror_esr;
> > +static bool cap_has_inject_ext_dabt;
> >
> >  static ARMHostCPUFeatures arm_host_cpu_features;
> >
> > @@ -244,6 +245,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> >  ret = -EINVAL;
> >  }
> >
> > +if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
> > +if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
> > +warn_report("Failed to enable DABT NISV cap");
>
> Shouldn't this be an error? If KVM says it has KVM_CAP_ARM_NISV_TO_USER,
> then I think it should always work to enable it, unless userspace passes
> the wrong flags. Currently flags must be zero, but if they were to change
> then we'll need to add the flags to vmstate and fail migration when they
> aren't compatible, and I guess that failure would occur here.
>
That's a fair point. From the kernel point of view this one is pretty
straightforward, so it should not fail. I did not use an error here as
the lack of this cap is not really critical for the guest, but indeed
it might be worth having it here.

> > +} else {
> > +/* Set status for supporting the external dabt injection */
> > +cap_has_inject_ext_dabt = kvm_check_extension(s,
> > +KVM_CAP_ARM_INJECT_EXT_DABT);
> > +}
> > +}
> > +
> >  return ret;
> >  }
> >
> > @@ -703,9 +714,16 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
> >  events.exception.serror_esr = env->serror.esr;
> >  }
> >
> > +if (cap_has_inject_ext_dabt) {
> > +events.exception.ext_dabt_pending = env->ext_dabt_pending;
> > +}
> > +
> > >  ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
> >  if (ret) {
> >  error_report("failed to put vcpu events");
> > +} else {
> > +/* Clear instantly if the call was successful */
> > +env->ext_dabt_pending = 0;
> >  }
> >
> >  return ret;
> > @@ -819,6 +837,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run 
> > *run)
> >  ret = EXCP_DEBUG;
> >  } /* otherwise return to guest */
> >  break;
> > +case KVM_EXIT_ARM_NISV:
> > +/* External DABT with no valid iss to decode */
> > +ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
> > +   run->arm_nisv.fault_ipa);
> > +break;
> >  default:
> >  qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
> >__func__, run->exit_reason);
> > @@ -953,3 +976,34 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> >  {
> > >  return (data - 32) & 0xffff;
> >  }
> > +
> > +int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
> > + uint64_t fault_ipa)
> > +{
> > +ARMCPU *cpu = ARM_CPU(cs);
> > +CPUARMState *env = &cpu->env;

[PATCH v4 0/2] target/arm: kvm: Support for KVM DABT with no valid ISS

2020-03-23 Thread Beata Michalska
Some of the ARMv7 & ARMv8 load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate the instruction, which is one of the
(many) reasons why KVM will not even try to do so.

So far, if a guest made an attempt to access memory outside the memory slot,
KVM reported a vague ENOSYS. As a result, QEMU exited with no useful
information being provided, or even a clue as to what had just happened.

ARM KVM introduced support for notifying userspace of an attempt to execute
an instruction that resulted in a dabt with no valid ISS decoding info.
This still leaves QEMU to handle the case, but at least now it gives more
control and a starting point for more meaningful handling of such cases.

This patchset relies on KVM to insert the external data abort into the guest.

v4:
 - Removing one of the patches as it is being picked-up separately
 target/arm: kvm: Inject events at the last stage of sync
 - Moving handling KVM issue to a separate patch
 - Minor changes wrt the review comments

v3:
 - Fix setting KVM cap per vm not per vcpu
 - Simplifying the handler to the bare minimum, with no default logging, to
   address the potential risk of flooding the host (adding support for rate
   limiting the logs turned out to be a bit too invasive to justify the little
   added value of logs in this particular case)
 - Adding handling for the KVM bug (for a small range of affected kernels):
   a bit of a trade-off between what's reasonable and what's effective:
   aborting QEMU when running on a buggy host kernel

v2:
- Improving/re-phrasing messaging
- Dropping messing around with forced sync (@see [PATCH v2 1/2])
  and PC alignment

Beata Michalska (2):
  target/arm: kvm: Handle DABT with no valid ISS
  target/arm: kvm: Handle potential issue with dabt injection

 target/arm/cpu.h |  3 ++
 target/arm/kvm.c | 82 
 target/arm/kvm32.c   | 25 
 target/arm/kvm64.c   | 34 ++
 target/arm/kvm_arm.h | 21 ++
 5 files changed, 165 insertions(+)

-- 
2.7.4




[PATCH v4 1/2] target/arm: kvm: Handle DABT with no valid ISS

2020-03-23 Thread Beata Michalska
On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate those instructions, which is one of the
(many) reasons why KVM will not even try to do so.

Add support for handling those by requesting KVM to inject an external
dabt into the guest.

Signed-off-by: Beata Michalska 
---
 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 54 
 target/arm/kvm_arm.h | 11 +++
 3 files changed, 67 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 4ffd991..4f834c1 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -560,6 +560,8 @@ typedef struct CPUARMState {
 uint64_t esr;
 } serror;
 
+uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
+
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 85860e6..c088589 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool cap_has_mp_state;
 static bool cap_has_inject_serror_esr;
+static bool cap_has_inject_ext_dabt;
 
 static ARMHostCPUFeatures arm_host_cpu_features;
 
@@ -244,6 +245,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 ret = -EINVAL;
 }
 
+if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
+if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
+warn_report("Failed to enable DABT NISV cap");
+} else {
+/* Set status for supporting the external dabt injection */
+cap_has_inject_ext_dabt = kvm_check_extension(s,
+KVM_CAP_ARM_INJECT_EXT_DABT);
+}
+}
+
 return ret;
 }
 
@@ -703,9 +714,16 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
 events.exception.serror_esr = env->serror.esr;
 }
 
+if (cap_has_inject_ext_dabt) {
+events.exception.ext_dabt_pending = env->ext_dabt_pending;
+}
+
ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 if (ret) {
 error_report("failed to put vcpu events");
+} else {
+/* Clear instantly if the call was successful */
+env->ext_dabt_pending = 0;
 }
 
 return ret;
@@ -819,6 +837,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = EXCP_DEBUG;
 } /* otherwise return to guest */
 break;
+case KVM_EXIT_ARM_NISV:
+/* External DABT with no valid iss to decode */
+ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
+   run->arm_nisv.fault_ipa);
+break;
 default:
 qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
   __func__, run->exit_reason);
@@ -953,3 +976,34 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
return (data - 32) & 0xffff;
 }
+
+int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
+ uint64_t fault_ipa)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+    /*
+     * ISS [23:14] is invalid so there is limited info
+     * on what has just happened, so the only *useful* thing that can
+     * be retrieved from ISS is WnR & DFSC (though in some cases WnR
+     * might be of less value as well)
+     */
+
+/*
+ * Set pending ext dabt and trigger SET_EVENTS so that
+ * KVM can inject the abort
+ */
+if (cap_has_inject_ext_dabt) {
+kvm_cpu_synchronize_state(cs);
+env->ext_dabt_pending = 1;
+} else {
+error_report("Data abort exception triggered by guest memory access "
+ "at physical address: 0x"  TARGET_FMT_lx,
+ (target_ulong)fault_ipa);
+error_printf("KVM unable to emulate faulting instruction.\n");
+return -1;
+}
+
+return 0;
+}
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index ae9e075..39472d5 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -450,6 +450,17 @@ struct kvm_guest_debug_arch;
 void kvm_arm_copy_hw_debug_data(struct kvm_guest_debug_arch *ptr);
 
 /**
+ * kvm_arm_handle_dabt_nisv:
+ * @cs: CPUState
+ * @esr_iss: ISS encoding (limited) for the exception from Data Abort
+ *   ISV bit set to '0b0' -> no valid instruction syndrome
+ * @fault_ipa: faulting address for the synch data abort
+ *
+ * Returns: 0 if the exception has been handled, < 0 otherwise
+ */
+int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
+ uint64_t fault_ipa);
+/**
  * its_class_name:
  *
  * Return the ITS class name to use depending on whether KVM acceleration
-- 
2.7.4
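Since the only usable ISS fields on this exit are WnR and DFSC (as the comment in kvm_arm_handle_dabt_nisv notes), here is a standalone sketch of pulling them out. The bit positions follow the ARMv8 ESR_ELx data-abort ISS encoding; the helper name and the sample syndrome value are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Data-abort ISS field positions (ARMv8 ESR_ELx encoding). */
#define ISS_ISV(iss)   (((iss) >> 24) & 0x1)   /* instruction syndrome valid */
#define ISS_WNR(iss)   (((iss) >> 6) & 0x1)    /* write (1) / read (0) access */
#define ISS_DFSC(iss)  ((iss) & 0x3F)          /* data fault status code */

/* Decode the limited info available on a KVM_EXIT_ARM_NISV exit. */
void decode_nisv_iss(uint64_t esr_iss, int *wnr, int *dfsc)
{
    /* On this path ISV is 0 by definition: no valid syndrome. */
    assert(ISS_ISV(esr_iss) == 0);
    *wnr = ISS_WNR(esr_iss);
    *dfsc = ISS_DFSC(esr_iss);
}
```

Everything above ISS[13] besides ISV is unusable on this path, which is exactly why the handler cannot attempt emulation and falls back to injecting the abort.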




[PATCH v4 2/2] target/arm: kvm: Handle potential issue with dabt injection

2020-03-23 Thread Beata Michalska
Injecting external data abort through KVM might trigger
an issue on kernels that do not get updated to include the KVM fix.
For those and aarch32 guests, the injected abort gets misconfigured
to be an implementation defined exception. This leads to the guest
repeatedly re-running the faulting instruction.

Add support for handling that case.
[
  Fixed-by: 018f22f95e8a
('KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests')
  Fixed-by: 21aecdbd7f3a
('KVM: arm: Make inject_abt32() inject an external abort instead')
]

Signed-off-by: Beata Michalska 
---
 target/arm/cpu.h |  1 +
 target/arm/kvm.c | 30 +-
 target/arm/kvm32.c   | 25 +
 target/arm/kvm64.c   | 34 ++
 target/arm/kvm_arm.h | 10 ++
 5 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 4f834c1..868afc6 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -561,6 +561,7 @@ typedef struct CPUARMState {
 } serror;
 
 uint8_t ext_dabt_pending; /* Request for injecting ext DABT */
+uint8_t ext_dabt_raised; /* Tracking/verifying injection of ext DABT */
 
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index c088589..58ad734 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -721,7 +721,12 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 if (ret) {
 error_report("failed to put vcpu events");
-} else {
+} else if (env->ext_dabt_pending) {
+/*
+ * Mark that the external DABT has been injected,
+ * if one has been requested
+ */
+env->ext_dabt_raised = env->ext_dabt_pending;
 /* Clear instantly if the call was successful */
 env->ext_dabt_pending = 0;
 }
@@ -755,6 +760,29 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
 
 void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+if (unlikely(env->ext_dabt_raised)) {
+/*
+ * Verifying that the ext DABT has been properly injected,
+ * otherwise risking indefinitely re-running the faulting instruction
+ * Covering a very narrow case for kernels 5.5..5.5.4
+ * when injected abort was misconfigured to be
+ * an IMPLEMENTATION DEFINED exception (for 32-bit EL1)
+ */
+if (!arm_feature(env, ARM_FEATURE_AARCH64) &&
+unlikely(!kvm_arm_verify_ext_dabt_pending(cs))) {
+
+error_report("Data abort exception with no valid ISS generated by "
+   "guest memory access. KVM unable to emulate faulting "
+   "instruction. Failed to inject an external data abort "
+   "into the guest.");
+abort();
+   }
+   /* Clear the status */
+   env->ext_dabt_raised = 0;
+}
 }
 
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index f271181..86c4fe7 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -564,3 +564,28 @@ void kvm_arm_pmu_init(CPUState *cs)
 {
 qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 }
+
+#define ARM_REG_DFSR  ARM_CP15_REG32(0, 5, 0, 0)
+#define ARM_REG_TTBCR ARM_CP15_REG32(0, 2, 0, 2)
+
+#define DFSR_FSC(v)   (((v) >> 6 | (v)) & 0x1F)
+#define DFSC_EXTABT(lpae) (lpae) ? 0x10 : 0x08
+
+bool kvm_arm_verify_ext_dabt_pending(CPUState *cs)
+{
+uint32_t dfsr_val;
+
+if (!kvm_get_one_reg(cs, ARM_REG_DFSR, &dfsr_val)) {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+uint32_t ttbcr;
+int lpae = 0;
+
+if (!kvm_get_one_reg(cs, ARM_REG_TTBCR, &ttbcr)) {
+lpae = arm_feature(env, ARM_FEATURE_LPAE) && (ttbcr & TTBCR_EAE);
+}
+return !(DFSR_FSC(dfsr_val) != DFSC_EXTABT(lpae));
+}
+return false;
+}
+
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index be5b31c..18594e9 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -1430,3 +1430,37 @@ bool kvm_arm_handle_debug(CPUState *cs, struct 
kvm_debug_exit_arch *debug_exit)
 
 return false;
 }
+
+#define ARM64_REG_ESR_EL1 ARM64_SYS_REG(3, 0, 5, 2, 0)
+#define ARM64_REG_TCR_EL1 ARM64_SYS_REG(3, 0, 2, 0, 2)
+
+#define ESR_DFSC(aarch64, v)\
+((aarch64) ? ((v) & 0x3F)   \
+   : (((v) >> 6 | (v)) & 0x1F))
+
+#define ESR_DFSC_EXTABT(aarch64, lpae) \
+((aarch64) ? 0x10 : (lpae) ? 0x10 : 0x8)
+
+bool kvm_arm_verify_ext_dabt_pending(CPUState *cs)
+{
+uint64_t dfsr_val;
+
+if (!kvm_get_one_reg(cs, ARM64_REG_ESR_EL1, &dfsr_val)) {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+int aarch
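The aarch32 check this email truncates boils down to the FS folding below. This is a standalone sketch using the same macros as the kvm32.c hunk (with the parentheses around the `?:` that the review asked for); the expected codes 0x08 and 0x10 are the architectural synchronous-external-abort values for the short-descriptor and LPAE formats, and the helper name is made up:

```c
#include <assert.h>
#include <stdint.h>

/*
 * The short-descriptor DFSR keeps its 5-bit fault status split:
 * FS[3:0] in bits [3:0] and FS[4] in bit 10; the shift-and-OR folds
 * bit 10 down into bit 4 so a single compare works.  The LPAE format
 * keeps the status in the low 6 bits, so the fold is a no-op for the
 * values of interest here.
 */
#define DFSR_FSC(v)        (((v) >> 6 | (v)) & 0x1F)
#define DFSC_EXTABT(lpae)  ((lpae) ? 0x10 : 0x08)

int is_ext_dabt(uint32_t dfsr, int lpae)
{
    return DFSR_FSC(dfsr) == DFSC_EXTABT(lpae);
}
```

If the status read back from DFSR is anything other than the external-abort code, the injection was misconfigured by the buggy kernel, which is what the verify hook aborts on.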

Re: [PATCH v3 2/2] target/arm: kvm: Handle DABT with no valid ISS

2020-03-15 Thread Beata Michalska
On Thu, 12 Mar 2020 at 10:25, Andrew Jones  wrote:
>
> On Thu, Mar 12, 2020 at 12:34:01AM +, Beata Michalska wrote:
> > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
> > exception with no valid ISS info to be decoded. The lack of decode info
> > makes it at least tricky to emulate those instructions, which is one of the
> > (many) reasons why KVM will not even try to do so.
> >
> > Add support for handling those by requesting KVM to inject external
> > dabt into the guest.
> >
> > Signed-off-by: Beata Michalska 
> > ---
> >  target/arm/cpu.h |  3 ++
> >  target/arm/kvm.c | 81 
> > 
> >  target/arm/kvm32.c   | 26 +
> >  target/arm/kvm64.c   | 36 +++
> >  target/arm/kvm_arm.h | 22 ++
> >  5 files changed, 168 insertions(+)
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index 4ffd991..45fdd2e 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -560,6 +560,9 @@ typedef struct CPUARMState {
> >  uint64_t esr;
> >  } serror;
> >
> > +uint8_t ext_dabt_pending:1; /* Request for injecting ext DABT */
> > +uint8_t ext_dabt_raised:1; /* Tracking/verifying injection of ext DABT */
> > +
>
> Why the bit-fields? We don't use them anywhere else in cpu.h, and that's
> probably because they're not portable. We should just use bools.
>
Old habit of optimization. I can drop the bit-fields, but I'd rather
stay with the original type to be consistent with the KVM ones. I am
not sure, though, why in this case that would not be portable -
bit-fields can get tricky, but that should not be the case here (?)
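The portability point can be made concrete with a toy struct (not the real CPUARMState, and the helper names are made up): a `uint8_t` bit-field is itself a common compiler extension rather than a bit-field type C99 guarantees, and assignments get truncated modulo the field width, while `bool` normalises any non-zero value to 1:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the two flags under discussion. */
struct flags_bitfield {
    uint8_t pending:1;   /* uint8_t bit-fields: a common extension, not guaranteed by C99 */
};

struct flags_bool {
    bool pending;
};

int bitfield_after_assign_2(void)
{
    struct flags_bitfield f = {0};
    f.pending = 2;       /* truncated modulo 2 -> stored as 0 */
    return f.pending;
}

int bool_after_assign_2(void)
{
    struct flags_bool f = {0};
    f.pending = 2;       /* bool normalises non-zero -> stored as 1 */
    return f.pending;
}
```

This is why plain `bool` is the safer choice for flags that may be assigned arbitrary truthy values.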

> >  /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
> >  uint32_t irq_line_state;
> >
> > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > index 85860e6..8b7b708 100644
> > --- a/target/arm/kvm.c
> > +++ b/target/arm/kvm.c
> > @@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] 
> > = {
> >
> >  static bool cap_has_mp_state;
> >  static bool cap_has_inject_serror_esr;
> > +static bool cap_has_inject_ext_dabt;
> >
> >  static ARMHostCPUFeatures arm_host_cpu_features;
> >
> > @@ -244,6 +245,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> >  ret = -EINVAL;
> >  }
> >
> > +if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
> > +if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
> > +warn_report("Failed to enable DABT NISV cap");
> > +} else {
> > +/* Set status for supporting the external dabt injection */
> > +cap_has_inject_ext_dabt = kvm_check_extension(s,
> > +KVM_CAP_ARM_INJECT_EXT_DABT);
> > +}
> > +}
> > +
> >  return ret;
> >  }
> >
> > @@ -703,9 +714,20 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
> >  events.exception.serror_esr = env->serror.esr;
> >  }
> >
> > +if (cap_has_inject_ext_dabt) {
> > +events.exception.ext_dabt_pending = env->ext_dabt_pending;
> > +}
> > +
> > >  ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
> >  if (ret) {
> >  error_report("failed to put vcpu events");
> > +} else if (env->ext_dabt_pending) {
> > +/*
> > + * Mark that the external DABT has been injected,
> > + * if one has been requested
> > + */
> > +env->ext_dabt_raised = env->ext_dabt_pending;
> > +env->ext_dabt_pending = 0;
> >  }
> >
> >  return ret;
> > @@ -737,6 +759,30 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
> >
> >  void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
> >  {
> > +ARMCPU *cpu = ARM_CPU(cs);
> > +CPUARMState *env = &cpu->env;
> > +
> > +if (unlikely(env->ext_dabt_raised)) {
> > +/*
> > + * Verifying that the ext DABT has been properly injected,
> > + * otherwise risking indefinitely re-running the faulting instruction
> > + * Covering a very narrow case for kernels 5.5..5.5.4
>
> I'm still not convinced that QEMU needs to add workarounds for broken KVM,
> when KVM can be fixed, and even is already fixed. If you really want to
> keep it, then can you break this patch into two, splitting the dabt
> injection out from the workaround?
>
I can defi

Re: [PATCH v3 1/2] target/arm: kvm: Inject events at the last stage of sync

2020-03-15 Thread Beata Michalska
On Thu, 12 Mar 2020 at 16:33, Peter Maydell  wrote:
>
> On Thu, 12 Mar 2020 at 00:34, Beata Michalska  wrote:
> >
> > KVM_SET_VCPU_EVENTS might actually lead to vcpu registers being modified.
> > As such this should be the last step of sync to avoid potential overwriting
> > of whatever changes KVM might have done.
> >
> > Signed-off-by: Beata Michalska 
>
> Hi; I'm going to take patch 1 into target-arm.next since it
> seems worth having on its own and I'm doing a pullreq today
> anyway. Andrew's given you feedback on patch 2.
>
Hi,

Thanks for that. Will drop this one from the next version of the patchset
once I address all the comments.

BR
Beata
> thanks
> -- PMM



[PATCH v3 2/2] target/arm: kvm: Handle DABT with no valid ISS

2020-03-11 Thread Beata Michalska
On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate those instructions, which is one of the
(many) reasons why KVM will not even try to do so.

Add support for handling those by requesting KVM to inject an external
dabt into the guest.

Signed-off-by: Beata Michalska 
---
 target/arm/cpu.h |  3 ++
 target/arm/kvm.c | 81 
 target/arm/kvm32.c   | 26 +
 target/arm/kvm64.c   | 36 +++
 target/arm/kvm_arm.h | 22 ++
 5 files changed, 168 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 4ffd991..45fdd2e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -560,6 +560,9 @@ typedef struct CPUARMState {
 uint64_t esr;
 } serror;
 
+uint8_t ext_dabt_pending:1; /* Request for injecting ext DABT */
+uint8_t ext_dabt_raised:1; /* Tracking/verifying injection of ext DABT */
+
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 85860e6..8b7b708 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -39,6 +39,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool cap_has_mp_state;
 static bool cap_has_inject_serror_esr;
+static bool cap_has_inject_ext_dabt;
 
 static ARMHostCPUFeatures arm_host_cpu_features;
 
@@ -244,6 +245,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 ret = -EINVAL;
 }
 
+if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) {
+if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
+warn_report("Failed to enable DABT NISV cap");
+} else {
+/* Set status for supporting the external dabt injection */
+cap_has_inject_ext_dabt = kvm_check_extension(s,
+KVM_CAP_ARM_INJECT_EXT_DABT);
+}
+}
+
 return ret;
 }
 
@@ -703,9 +714,20 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
 events.exception.serror_esr = env->serror.esr;
 }
 
+if (cap_has_inject_ext_dabt) {
+events.exception.ext_dabt_pending = env->ext_dabt_pending;
+}
+
ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 if (ret) {
 error_report("failed to put vcpu events");
+} else if (env->ext_dabt_pending) {
+/*
+ * Mark that the external DABT has been injected,
+ * if one has been requested
+ */
+env->ext_dabt_raised = env->ext_dabt_pending;
+env->ext_dabt_pending = 0;
 }
 
 return ret;
@@ -737,6 +759,30 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
 
 void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+if (unlikely(env->ext_dabt_raised)) {
+/*
+ * Verifying that the ext DABT has been properly injected,
+ * otherwise risking indefinitely re-running the faulting instruction
+ * Covering a very narrow case for kernels 5.5..5.5.4
+ * when injected abort was misconfigured to be
+ * an IMPLEMENTATION DEFINED exception (for 32-bit EL1)
+ */
+if (!arm_feature(env, ARM_FEATURE_AARCH64) &&
+unlikely(kvm_arm_verify_ext_dabt_pending(cs))) {
+
+error_report("Data abort exception with no valid ISS generated by "
+   "guest memory access. KVM unable to emulate faulting "
+   "instruction. Failed to inject an external data abort "
+   "into the guest.");
+abort();
+   }
+   /* Clear the status */
+   env->ext_dabt_raised = 0;
+}
+
 }
 
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
@@ -819,6 +865,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = EXCP_DEBUG;
 } /* otherwise return to guest */
 break;
+case KVM_EXIT_ARM_NISV:
+/* External DABT with no valid iss to decode */
+ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
+   run->arm_nisv.fault_ipa);
+break;
 default:
 qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
   __func__, run->exit_reason);
@@ -953,3 +1004,33 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
return (data - 32) & 0xffff;
 }
+
+int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
+ uint64_t fault_ipa)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+    /*
+     * ISS [23:14] is invalid so there is limited info
+     * on what has just happened, so the only *useful* thing that can
+     * be retrieved from ISS is WnR & DFSC (though in

[PATCH v3 1/2] target/arm: kvm: Inject events at the last stage of sync

2020-03-11 Thread Beata Michalska
KVM_SET_VCPU_EVENTS might actually lead to vcpu registers being modified.
As such this should be the last step of sync to avoid potential overwriting
of whatever changes KVM might have done.

Signed-off-by: Beata Michalska 
---
 target/arm/kvm32.c | 15 ++-
 target/arm/kvm64.c | 15 ++-
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index f703c4f..f271181 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -409,17 +409,22 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
-ret = kvm_put_vcpu_events(cpu);
-if (ret) {
-return ret;
-}
-
 write_cpustate_to_list(cpu, true);
 
 if (!write_list_to_kvmstate(cpu, level)) {
 return EINVAL;
 }
 
+/*
+ * Setting VCPU events should be triggered after syncing the registers
+ * to avoid overwriting potential changes made by KVM upon calling
+ * KVM_SET_VCPU_EVENTS ioctl
+ */
+ret = kvm_put_vcpu_events(cpu);
+if (ret) {
+return ret;
+}
+
 kvm_arm_sync_mpstate_to_kvm(cpu);
 
 return ret;
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 93ba144..be5b31c 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -1094,17 +1094,22 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
-ret = kvm_put_vcpu_events(cpu);
-if (ret) {
-return ret;
-}
-
 write_cpustate_to_list(cpu, true);
 
 if (!write_list_to_kvmstate(cpu, level)) {
 return -EINVAL;
 }
 
+    /*
+     * Setting VCPU events should be triggered after syncing the registers
+     * to avoid overwriting potential changes made by KVM upon calling
+     * KVM_SET_VCPU_EVENTS ioctl
+     */
+ret = kvm_put_vcpu_events(cpu);
+if (ret) {
+return ret;
+}
+
 kvm_arm_sync_mpstate_to_kvm(cpu);
 
 return ret;
-- 
2.7.4




[PATCH v3 0/2] target/arm: kvm: Support for KVM DABT with no valid ISS

2020-03-11 Thread Beata Michalska
Some of the ARMv7 & ARMv8 load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate the instruction, which is one of the
(many) reasons why KVM will not even try to do so.

So far, if a guest made an attempt to access memory outside the memory slot,
KVM reported a vague ENOSYS. As a result, QEMU exited with no useful
information being provided, not even a clue on what had just happened.

ARM KVM introduced support for notifying userspace of an attempt to execute
an instruction that resulted in a dabt with no valid ISS decoding info.
This still leaves QEMU to handle the case, but at least now it gives more
control and a starting point for more meaningful handling of such cases.

This patchset relies on KVM to insert the external data abort into the guest.

v3:
 - Fix setting KVM cap per vm not per vcpu
 - Simplifying the handler to a bare minimum with no default logging to address
   the potential risk of flooding the host (adding support for rate
   limiting the logs turned out to be a bit too invasive to justify the
   little add-on value from logs in this particular case)
 - Adding handling for a KVM bug (for a small range of affected kernels):
   a bit of a trade-off between what's reasonable and what's effective:
   aborting QEMU when running on a buggy host kernel

v2:
- Improving/re-phrasing messaging
- Dropping messing around with forced sync (@see [PATCH v2 1/2])
  and PC alignment


Beata Michalska (2):
  target/arm: kvm: Inject events at the last stage of sync
  target/arm: kvm: Handle DABT with no valid ISS

 target/arm/cpu.h |  3 ++
 target/arm/kvm.c | 81 
 target/arm/kvm32.c   | 41 ++
 target/arm/kvm64.c   | 51 +
 target/arm/kvm_arm.h | 22 ++
 5 files changed, 188 insertions(+), 10 deletions(-)

-- 
2.7.4




Re: [PATCH v2 2/2] target/arm: kvm: Handle DABT with no valid ISS

2020-02-11 Thread Beata Michalska
On Fri, 7 Feb 2020 at 08:20, Andrew Jones  wrote:
>
> On Thu, Feb 06, 2020 at 09:48:05PM +, Beata Michalska wrote:
> > On Wed, 5 Feb 2020 at 16:57, Andrew Jones  wrote:
> > >
> > > On Wed, Jan 29, 2020 at 08:24:41PM +, Beata Michalska wrote:
> > > > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
> > > > exception with no valid ISS info to be decoded. The lack of decode info
> > > > makes it at least tricky to emulate those instructions which is one of the
> > > > (many) reasons why KVM will not even try to do so.
> > > >
> > > > Add support for handling those by requesting KVM to inject external
> > > > dabt into the guest.
> > > >
> > > > Signed-off-by: Beata Michalska 
> > > > ---
> > > >  target/arm/cpu.h |  2 ++
> > > >  target/arm/kvm.c | 96 
> > > > 
> > > >  target/arm/kvm32.c   |  3 ++
> > > >  target/arm/kvm64.c   |  3 ++
> > > >  target/arm/kvm_arm.h | 19 +++
> > > >  5 files changed, 123 insertions(+)
> > > >
> > > > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > > > index c1aedbe..e04a8d3 100644
> > > > --- a/target/arm/cpu.h
> > > > +++ b/target/arm/cpu.h
> > > > @@ -558,6 +558,8 @@ typedef struct CPUARMState {
> > > >  uint8_t has_esr;
> > > >  uint64_t esr;
> > > >  } serror;
> > > > +/* Status field for pending external dabt */
> > > > +uint8_t ext_dabt_pending;
> > > >
> > > >  /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
> > > >  uint32_t irq_line_state;
> > > > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > > > index 8d82889..e7bc9b7 100644
> > > > --- a/target/arm/kvm.c
> > > > +++ b/target/arm/kvm.c
> > > > @@ -37,6 +37,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
> > > >
> > > >  static bool cap_has_mp_state;
> > > >  static bool cap_has_inject_serror_esr;
> > > > +static bool cap_has_inject_ext_dabt; /* KVM_CAP_ARM_INJECT_EXT_DABT */
> > >
> > > nit: the KVM_CAP_ARM_INJECT_EXT_DABT comment is unnecessary
> >
> > Might be - I just find it handy when looking for related details.
> > I will remove that one though.
> >
> > >
> > > >
> > > >  static ARMHostCPUFeatures arm_host_cpu_features;
> > > >
> > > > @@ -62,6 +63,12 @@ void kvm_arm_init_serror_injection(CPUState *cs)
> > > >  KVM_CAP_ARM_INJECT_SERROR_ESR);
> > > >  }
> > > >
> > > > +void kvm_arm_init_ext_dabt_injection(CPUState *cs)
> > > > +{
> > > > +cap_has_inject_ext_dabt = kvm_check_extension(cs->kvm_state,
> > > > +KVM_CAP_ARM_INJECT_EXT_DABT);
> > > > +}
> > > > +
> > > >  bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
> > > >int *fdarray,
> > > >struct kvm_vcpu_init *init)
> > > > @@ -216,6 +223,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> > > >  ret = -EINVAL;
> > > >  }
> > > >
> > > > +if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER))
> > > > +if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
> > > > +warn_report("Failed to enable DABT NISV cap");
> > > > +}
> > > > +
> > >
> > > Missing {} around the outer block.
> >
> > Checkpatch didn't complain ...
> > Will fix that.
> >
> > >
> > > As KVM_CAP_ARM_INJECT_EXT_DABT is a VM capability then I think we should
> > > set cap_has_inject_ext_dabt here, like cap_has_mp_state gets set. I see
> > > you've followed the pattern used for cap_has_inject_serror_esr, but that
> > > looks wrong too since KVM_CAP_ARM_INJECT_SERROR_ESR is also a VM
> > > capability. The way it is now we just keep setting
> > > cap_has_inject_serror_esr to the same value, NR_VCPUS times.
> > >
> > You are totally right - I have completely missed that point! Thanks.
> >
> > > >  return ret;
> > > >  }
> > > >
> > > &

Re: [PATCH v2 1/2] target/arm: kvm: Inject events at the last stage of sync

2020-02-06 Thread Beata Michalska
On Tue, 4 Feb 2020 at 10:34, Andrew Jones  wrote:
>
> On Wed, Jan 29, 2020 at 08:24:40PM +, Beata Michalska wrote:
> > KVM_SET_VCPU_EVENTS might actually lead to vcpu registers being modified.
> > As such this should be the last step of sync to avoid potential overwriting
> > of whatever changes KVM might have done.
> >
> > Signed-off-by: Beata Michalska 
> > ---
> >  target/arm/kvm32.c | 20 ++--
> >  target/arm/kvm64.c | 20 ++--
> >  2 files changed, 20 insertions(+), 20 deletions(-)
> >
> > diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
> > index 32bf8d6..cf2b47f 100644
> > --- a/target/arm/kvm32.c
> > +++ b/target/arm/kvm32.c
> > @@ -386,17 +386,17 @@ int kvm_arch_put_registers(CPUState *cs, int level)
> >  return ret;
> >  }
> >
> > -ret = kvm_put_vcpu_events(cpu);
> > -if (ret) {
> > -return ret;
> > -}
> > -
> >  write_cpustate_to_list(cpu, true);
> >
> >  if (!write_list_to_kvmstate(cpu, level)) {
> >  return EINVAL;
> >  }
> >
> > +ret = kvm_put_vcpu_events(cpu);
> > +if (ret) {
> > +return ret;
> > +}
> > +
>
> I think we should put a comment above this that says basically the same
> thing as the commit message in order to explain why kvm_put_vcpu_events()
> *must* be after write_list_to_kvmstate().
>
Will do that.

> >  kvm_arm_sync_mpstate_to_kvm(cpu);
> >
> >  return ret;
> > @@ -462,11 +462,6 @@ int kvm_arch_get_registers(CPUState *cs)
> >  }
> >  vfp_set_fpscr(env, fpscr);
> >
> > -ret = kvm_get_vcpu_events(cpu);
> > -if (ret) {
> > -return ret;
> > -}
> > -
> >  if (!write_kvmstate_to_list(cpu)) {
> >  return EINVAL;
> >  }
> > @@ -475,6 +470,11 @@ int kvm_arch_get_registers(CPUState *cs)
> >   */
> >  write_list_to_cpustate(cpu);
> >
> > +ret = kvm_get_vcpu_events(cpu);
> > +if (ret) {
> > +return ret;
> > +}
> > +
>
> Why are we moving kvm_get_vcpu_events()?

This is only to make things consistent with put_registers.
There is no functional change per se.

BR

Beata

> >  kvm_arm_sync_mpstate_to_qemu(cpu);
> >
> >  return 0;
> > diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> > index 6344113..d06fd32 100644
> > --- a/target/arm/kvm64.c
> > +++ b/target/arm/kvm64.c
> > @@ -1043,17 +1043,17 @@ int kvm_arch_put_registers(CPUState *cs, int level)
> >  return ret;
> >  }
> >
> > -ret = kvm_put_vcpu_events(cpu);
> > -if (ret) {
> > -return ret;
> > -}
> > -
> >  write_cpustate_to_list(cpu, true);
> >
> >  if (!write_list_to_kvmstate(cpu, level)) {
> >  return -EINVAL;
> >  }
> >
> > +ret = kvm_put_vcpu_events(cpu);
> > +if (ret) {
> > +return ret;
> > +}
> > +
> >  kvm_arm_sync_mpstate_to_kvm(cpu);
> >
> >  return ret;
> > @@ -1251,11 +1251,6 @@ int kvm_arch_get_registers(CPUState *cs)
> >  }
> >  vfp_set_fpcr(env, fpr);
> >
> > -ret = kvm_get_vcpu_events(cpu);
> > -if (ret) {
> > -return ret;
> > -}
> > -
> >  if (!write_kvmstate_to_list(cpu)) {
> >  return -EINVAL;
> >  }
> > @@ -1264,6 +1259,11 @@ int kvm_arch_get_registers(CPUState *cs)
> >   */
> >  write_list_to_cpustate(cpu);
> >
> > +ret = kvm_get_vcpu_events(cpu);
> > +if (ret) {
> > +return ret;
> > +}
> > +
> >  kvm_arm_sync_mpstate_to_qemu(cpu);
> >
> >  /* TODO: other registers */
> > --
> > 2.7.4
> >
> >
>
> Same comments for kvm64.c as for kvm32.c
>
> Thanks,
> drew
>



Re: [PATCH v2 2/2] target/arm: kvm: Handle DABT with no valid ISS

2020-02-06 Thread Beata Michalska
On Wed, 5 Feb 2020 at 16:57, Andrew Jones  wrote:
>
> On Wed, Jan 29, 2020 at 08:24:41PM +, Beata Michalska wrote:
> > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
> > exception with no valid ISS info to be decoded. The lack of decode info
> > makes it at least tricky to emulate those instructions which is one of the
> > (many) reasons why KVM will not even try to do so.
> >
> > Add support for handling those by requesting KVM to inject external
> > dabt into the guest.
> >
> > Signed-off-by: Beata Michalska 
> > ---
> >  target/arm/cpu.h |  2 ++
> >  target/arm/kvm.c | 96 
> > 
> >  target/arm/kvm32.c   |  3 ++
> >  target/arm/kvm64.c   |  3 ++
> >  target/arm/kvm_arm.h | 19 +++
> >  5 files changed, 123 insertions(+)
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index c1aedbe..e04a8d3 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -558,6 +558,8 @@ typedef struct CPUARMState {
> >  uint8_t has_esr;
> >  uint64_t esr;
> >  } serror;
> > +/* Status field for pending external dabt */
> > +uint8_t ext_dabt_pending;
> >
> >  /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
> >  uint32_t irq_line_state;
> > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > index 8d82889..e7bc9b7 100644
> > --- a/target/arm/kvm.c
> > +++ b/target/arm/kvm.c
> > @@ -37,6 +37,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
> >
> >  static bool cap_has_mp_state;
> >  static bool cap_has_inject_serror_esr;
> > +static bool cap_has_inject_ext_dabt; /* KVM_CAP_ARM_INJECT_EXT_DABT */
>
> nit: the KVM_CAP_ARM_INJECT_EXT_DABT comment is unnecessary

Might be - I just find it handy when looking for related details.
I will remove that one though.

>
> >
> >  static ARMHostCPUFeatures arm_host_cpu_features;
> >
> > @@ -62,6 +63,12 @@ void kvm_arm_init_serror_injection(CPUState *cs)
> >  KVM_CAP_ARM_INJECT_SERROR_ESR);
> >  }
> >
> > +void kvm_arm_init_ext_dabt_injection(CPUState *cs)
> > +{
> > +cap_has_inject_ext_dabt = kvm_check_extension(cs->kvm_state,
> > +KVM_CAP_ARM_INJECT_EXT_DABT);
> > +}
> > +
> >  bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
> >int *fdarray,
> >struct kvm_vcpu_init *init)
> > @@ -216,6 +223,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> >  ret = -EINVAL;
> >  }
> >
> > +if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER))
> > +if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
> > +warn_report("Failed to enable DABT NISV cap");
> > +}
> > +
>
> Missing {} around the outer block.

Checkpatch didn't complain ...
Will fix that.

>
> As KVM_CAP_ARM_INJECT_EXT_DABT is a VM capability then I think we should
> set cap_has_inject_ext_dabt here, like cap_has_mp_state gets set. I see
> you've followed the pattern used for cap_has_inject_serror_esr, but that
> looks wrong too since KVM_CAP_ARM_INJECT_SERROR_ESR is also a VM
> capability. The way it is now we just keep setting
> cap_has_inject_serror_esr to the same value, NR_VCPUS times.
>
You are totally right - I have completely missed that point! Thanks.

> >  return ret;
> >  }
> >
> > @@ -598,6 +610,10 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
> >  events.exception.serror_esr = env->serror.esr;
> >  }
> >
> > +if (cap_has_inject_ext_dabt) {
> > +events.exception.ext_dabt_pending = env->ext_dabt_pending;
> > +}
> > +
> > > >  ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
> >  if (ret) {
> >  error_report("failed to put vcpu events");
> > @@ -627,6 +643,8 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
> >  env->serror.has_esr = events.exception.serror_has_esr;
> >  env->serror.esr = events.exception.serror_esr;
> >
> > +env->ext_dabt_pending = events.exception.ext_dabt_pending;
> > +
>
> afaict from Documentation/virt/kvm/api.txt and the KVM code you cannot
> get this state. Therefore the above line (and extra stray blank line)
> should be dropped.
>
That's true, though this is a lightweight way of resetting the vcpu state.
We would have to do that otherwise to mark that this case

[PATCH v2 0/2] target/arm: kvm: Support for KVM DABT without valid ISS

2020-01-29 Thread Beata Michalska
Some of the ARMv7 & ARMv8 load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate the instruction, which is one of the
(many) reasons why KVM will not even try to do so.

So far, if a guest made an attempt to access memory outside the memory slot,
KVM reported a vague ENOSYS. As a result, QEMU exited with no useful
information being provided, not even a clue on what had just happened.

ARM KVM introduced support for notifying userspace of an attempt to execute
an instruction that resulted in dabt with no valid ISS decoding info.
This still leaves QEMU to handle the case, but at least now, it can enable
further debugging of the encountered issue by being more verbose
in a (hopefully) useful way.

v2:
- Improving/re-phrasing messaging
- Dropping messing around with forced sync (@see [PATCH v2 1/2])
  and PC alignment


Beata Michalska (2):
  target/arm: kvm: Inject events at the last stage of sync
  target/arm: kvm: Handle DABT with no valid ISS

 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 96 
 target/arm/kvm32.c   | 23 +++--
 target/arm/kvm64.c   | 23 +++--
 target/arm/kvm_arm.h | 19 +++
 5 files changed, 143 insertions(+), 20 deletions(-)

-- 
2.7.4




[PATCH v2 2/2] target/arm: kvm: Handle DABT with no valid ISS

2020-01-29 Thread Beata Michalska
On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate those instructions which is one of the
(many) reasons why KVM will not even try to do so.

Add support for handling those by requesting KVM to inject external
dabt into the guest.

Signed-off-by: Beata Michalska 
---
 target/arm/cpu.h |  2 ++
 target/arm/kvm.c | 96 
 target/arm/kvm32.c   |  3 ++
 target/arm/kvm64.c   |  3 ++
 target/arm/kvm_arm.h | 19 +++
 5 files changed, 123 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c1aedbe..e04a8d3 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -558,6 +558,8 @@ typedef struct CPUARMState {
 uint8_t has_esr;
 uint64_t esr;
 } serror;
+/* Status field for pending external dabt */
+uint8_t ext_dabt_pending;
 
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 8d82889..e7bc9b7 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -37,6 +37,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool cap_has_mp_state;
 static bool cap_has_inject_serror_esr;
+static bool cap_has_inject_ext_dabt; /* KVM_CAP_ARM_INJECT_EXT_DABT */
 
 static ARMHostCPUFeatures arm_host_cpu_features;
 
@@ -62,6 +63,12 @@ void kvm_arm_init_serror_injection(CPUState *cs)
 KVM_CAP_ARM_INJECT_SERROR_ESR);
 }
 
+void kvm_arm_init_ext_dabt_injection(CPUState *cs)
+{
+cap_has_inject_ext_dabt = kvm_check_extension(cs->kvm_state,
+KVM_CAP_ARM_INJECT_EXT_DABT);
+}
+
 bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
   int *fdarray,
   struct kvm_vcpu_init *init)
@@ -216,6 +223,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 ret = -EINVAL;
 }
 
+if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER))
+if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
+warn_report("Failed to enable DABT NISV cap");
+}
+
 return ret;
 }
 
@@ -598,6 +610,10 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
 events.exception.serror_esr = env->serror.esr;
 }
 
+if (cap_has_inject_ext_dabt) {
+events.exception.ext_dabt_pending = env->ext_dabt_pending;
+}
+
ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 if (ret) {
 error_report("failed to put vcpu events");
@@ -627,6 +643,8 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
 env->serror.has_esr = events.exception.serror_has_esr;
 env->serror.esr = events.exception.serror_esr;
 
+env->ext_dabt_pending = events.exception.ext_dabt_pending;
+
 return 0;
 }
 
@@ -634,6 +652,7 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
 }
 
+
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
 {
 ARMCPU *cpu;
@@ -699,6 +718,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = EXCP_DEBUG;
 } /* otherwise return to guest */
 break;
+case KVM_EXIT_ARM_NISV:
+/* External DABT with no valid iss to decode */
+ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
+   run->arm_nisv.fault_ipa);
+break;
 default:
 qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
   __func__, run->exit_reason);
@@ -833,3 +857,75 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 return (data - 32) & 0x;
 }
+
+int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
+ uint64_t fault_ipa)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+uint32_t ins, ins_fetched;
+
+/*
+ * Hacky workaround for kernels that for aarch32 guests, instead of expected
+ * external data abort, inject the IMPLEMENTATION DEFINED exception with the
+ * lock-down. This is actually handled by the guest which results in
+ * re-running the faulting instruction.
+ * re-running the faulting instruction.
+ * This intends to break the vicious cycle.
+ */
+if (!is_a64(env)) {
+static uint8_t setback;
+
+/*
+ * The state has not been synchronized yet, so if this is re-occurrence
+ * of the same abort triggered by guest, the status for pending external
+ * abort should not get cleared yet
+ */
+if (unlikely(env->ext_dabt_pending)) {
+if (setback) {
+error_report("Most probably triggered kernel issue with"
+ " injecting external data abort.");
+error_printf("Giving up trying ...\n

[PATCH v2 1/2] target/arm: kvm: Inject events at the last stage of sync

2020-01-29 Thread Beata Michalska
KVM_SET_VCPU_EVENTS might actually lead to vcpu registers being modified.
As such this should be the last step of sync to avoid potential overwriting
of whatever changes KVM might have done.

Signed-off-by: Beata Michalska 
---
 target/arm/kvm32.c | 20 ++--
 target/arm/kvm64.c | 20 ++--
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 32bf8d6..cf2b47f 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -386,17 +386,17 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
-ret = kvm_put_vcpu_events(cpu);
-if (ret) {
-return ret;
-}
-
 write_cpustate_to_list(cpu, true);
 
 if (!write_list_to_kvmstate(cpu, level)) {
 return EINVAL;
 }
 
+ret = kvm_put_vcpu_events(cpu);
+if (ret) {
+return ret;
+}
+
 kvm_arm_sync_mpstate_to_kvm(cpu);
 
 return ret;
@@ -462,11 +462,6 @@ int kvm_arch_get_registers(CPUState *cs)
 }
 vfp_set_fpscr(env, fpscr);
 
-ret = kvm_get_vcpu_events(cpu);
-if (ret) {
-return ret;
-}
-
 if (!write_kvmstate_to_list(cpu)) {
 return EINVAL;
 }
@@ -475,6 +470,11 @@ int kvm_arch_get_registers(CPUState *cs)
  */
 write_list_to_cpustate(cpu);
 
+ret = kvm_get_vcpu_events(cpu);
+if (ret) {
+return ret;
+}
+
 kvm_arm_sync_mpstate_to_qemu(cpu);
 
 return 0;
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 6344113..d06fd32 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -1043,17 +1043,17 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
-ret = kvm_put_vcpu_events(cpu);
-if (ret) {
-return ret;
-}
-
 write_cpustate_to_list(cpu, true);
 
 if (!write_list_to_kvmstate(cpu, level)) {
 return -EINVAL;
 }
 
+ret = kvm_put_vcpu_events(cpu);
+if (ret) {
+return ret;
+}
+
 kvm_arm_sync_mpstate_to_kvm(cpu);
 
 return ret;
@@ -1251,11 +1251,6 @@ int kvm_arch_get_registers(CPUState *cs)
 }
 vfp_set_fpcr(env, fpr);
 
-ret = kvm_get_vcpu_events(cpu);
-if (ret) {
-return ret;
-}
-
 if (!write_kvmstate_to_list(cpu)) {
 return -EINVAL;
 }
@@ -1264,6 +1259,11 @@ int kvm_arch_get_registers(CPUState *cs)
  */
 write_list_to_cpustate(cpu);
 
+ret = kvm_get_vcpu_events(cpu);
+if (ret) {
+return ret;
+}
+
 kvm_arm_sync_mpstate_to_qemu(cpu);
 
 /* TODO: other registers */
-- 
2.7.4




Re: [PATCH v4 20/24] nvme: add support for scatter gather lists

2020-01-09 Thread Beata Michalska
Hi Klaus,

On Thu, 19 Dec 2019 at 13:09, Klaus Jensen  wrote:
>
> For now, support the Data Block, Segment and Last Segment descriptor
> types.
>
> See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").
>
> Signed-off-by: Klaus Jensen 
> Acked-by: Fam Zheng 
> ---
>  block/nvme.c  |  18 +-
>  hw/block/nvme.c   | 379 +++---
>  hw/block/trace-events |   3 +
>  include/block/nvme.h  |  62 ++-
>  4 files changed, 393 insertions(+), 69 deletions(-)
>
> diff --git a/block/nvme.c b/block/nvme.c
> index d41c4bda6e39..521f521054d5 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -446,7 +446,7 @@ static void nvme_identify(BlockDriverState *bs, int 
> namespace, Error **errp)
>  error_setg(errp, "Cannot map buffer for DMA");
>  goto out;
>  }
> -cmd.prp1 = cpu_to_le64(iova);
> +cmd.dptr.prp.prp1 = cpu_to_le64(iova);
>
>  if (nvme_cmd_sync(bs, s->queues[0], &cmd)) {
>  error_setg(errp, "Failed to identify controller");
> @@ -545,7 +545,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
> **errp)
>  }
>  cmd = (NvmeCmd) {
>  .opcode = NVME_ADM_CMD_CREATE_CQ,
> -.prp1 = cpu_to_le64(q->cq.iova),
> +.dptr.prp.prp1 = cpu_to_le64(q->cq.iova),
>  .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)),
>  .cdw11 = cpu_to_le32(0x3),
>  };
> @@ -556,7 +556,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
> **errp)
>  }
>  cmd = (NvmeCmd) {
>  .opcode = NVME_ADM_CMD_CREATE_SQ,
> -.prp1 = cpu_to_le64(q->sq.iova),
> +.dptr.prp.prp1 = cpu_to_le64(q->sq.iova),
>  .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)),
>  .cdw11 = cpu_to_le32(0x1 | (n << 16)),
>  };
> @@ -906,16 +906,16 @@ try_map:
>  case 0:
>  abort();
>  case 1:
> -cmd->prp1 = pagelist[0];
> -cmd->prp2 = 0;
> +cmd->dptr.prp.prp1 = pagelist[0];
> +cmd->dptr.prp.prp2 = 0;
>  break;
>  case 2:
> -cmd->prp1 = pagelist[0];
> -cmd->prp2 = pagelist[1];
> +cmd->dptr.prp.prp1 = pagelist[0];
> +cmd->dptr.prp.prp2 = pagelist[1];
>  break;
>  default:
> -cmd->prp1 = pagelist[0];
> -cmd->prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
> +cmd->dptr.prp.prp1 = pagelist[0];
> +cmd->dptr.prp.prp2 = cpu_to_le64(req->prp_list_iova + 
> sizeof(uint64_t));
>  break;
>  }
>  trace_nvme_cmd_map_qiov(s, cmd, req, qiov, entries);
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index f6591285b504..b3fca3c4ea58 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -73,7 +73,12 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr 
> addr)
>
>  static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>  {
> -if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
> +hwaddr hi = addr + size;
> +if (hi < addr) {

What is the actual use case for that?

> +return 1;
> +}
> +
> +if (n->cmbsz && nvme_addr_is_cmb(n, addr) && nvme_addr_is_cmb(n, hi)) {
> +memcpy(buf, (void *) &n->cmbuf[addr - n->ctrl_mem.addr], size);
>  return 0;
>  }
> @@ -301,17 +306,287 @@ unmap:
>  return status;
>  }
>
> -static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> -uint64_t prp1, uint64_t prp2, DMADirection dir, NvmeRequest *req)
> +static uint16_t nvme_map_to_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
> +size_t len)
> +{
> +hwaddr hi = addr + len;
> +if (hi < addr) {
> +return NVME_DATA_TRANSFER_ERROR;
> +}
> +
> +if (!nvme_addr_is_cmb(n, addr) || !nvme_addr_is_cmb(n, hi)) {
> +return NVME_DATA_TRANSFER_ERROR;
> +}
> +
> +qemu_iovec_add(iov, nvme_addr_to_cmb(n, addr), len);
> +
> +return NVME_SUCCESS;
> +}
> +
> +static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg,
> +QEMUIOVector *iov, NvmeSglDescriptor *segment, uint64_t nsgld,
> +uint32_t *len, bool is_cmb, NvmeRequest *req)
> +{
> +dma_addr_t addr, trans_len;
> +uint16_t status;
> +
> +for (int i = 0; i < nsgld; i++) {
> +if (NVME_SGL_TYPE(segment[i].type) != SGL_DESCR_TYPE_DATA_BLOCK) {
> +trace_nvme_dev_err_invalid_sgl_descriptor(nvme_cid(req),
> +NVME_SGL_TYPE(segment[i].type));
> +return NVME_SGL_DESCRIPTOR_TYPE_INVALID | NVME_DNR;
> +}
> +
> +if (*len == 0) {
> +if (!NVME_CTRL_SGLS_EXCESS_LENGTH(n->id_ctrl.sgls)) {
> +trace_nvme_dev_err_invalid_sgl_excess_length(nvme_cid(req));
> +return NVME_DATA_SGL_LENGTH_INVALID | NVME_DNR;
> +}
> +
> +break;
> +}
> +
> +addr = le64_to_cpu(segment[i].addr);
> +trans_len = MIN(*len, le64_to_cpu(segment[i].len));
> +
> +if (nvme_addr_is_cmb(n, addr)) {
> +/*
> +  

Re: [PATCH v4 17/24] nvme: allow multiple aios per command

2020-01-09 Thread Beata Michalska
Hi Klaus,

On Thu, 19 Dec 2019 at 13:09, Klaus Jensen  wrote:
>
> This refactors how the device issues asynchronous block backend
> requests. The NvmeRequest now holds a queue of NvmeAIOs that are
> associated with the command. This allows multiple aios to be issued for
> a command. Only when all requests have been completed will the device
> post a completion queue entry.
>
> Because the device is currently guaranteed to only issue a single aio
> request per command, the benefit is not immediately obvious. But this
> functionality is required to support metadata, the dataset management
> command and other features.
>
> Signed-off-by: Klaus Jensen 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c   | 422 ++
>  hw/block/nvme.h   | 126 +++--
>  hw/block/trace-events |   8 +
>  3 files changed, 461 insertions(+), 95 deletions(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index be554ae1e94c..56659bbe263a 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -19,7 +19,8 @@
>   *  -drive file=,if=none,id=
>   *  -device nvme,drive=,serial=,id=, \
>   *  cmb_size_mb=, \
> - *  num_queues=
> + *  num_queues=, \
> + *  mdts=
>   *
>   * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
>   * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
> @@ -55,6 +56,7 @@
>  } while (0)
>
>  static void nvme_process_sq(void *opaque);
> +static void nvme_aio_cb(void *opaque, int ret);
>
>  static inline void *nvme_addr_to_cmb(NvmeCtrl *n, hwaddr addr)
>  {
> @@ -339,6 +341,116 @@ static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, 
> uint32_t len,
>  return status;
>  }
>
> +static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +{
> +NvmeNamespace *ns = req->ns;
> +
> +uint32_t len = req->nlb << nvme_ns_lbads(ns);
> +uint64_t prp1 = le64_to_cpu(cmd->prp1);
> +uint64_t prp2 = le64_to_cpu(cmd->prp2);
> +
> +return nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, len, req);
> +}
> +
> +static void nvme_aio_destroy(NvmeAIO *aio)
> +{
> +g_free(aio);
> +}
> +
> +static NvmeAIO *nvme_aio_new(BlockBackend *blk, int64_t offset, size_t len,
> +QEMUSGList *qsg, QEMUIOVector *iov, NvmeRequest *req,
> +NvmeAIOCompletionFunc *cb)

Minor: The indentation here (and in a few other places across the patchset)
does not seem right. And maybe inline? Also: it seems that there are cases
when some of the parameters are not required (NULL); maybe having a
simplified version for those cases might be useful?

> +{
> +NvmeAIO *aio = g_malloc0(sizeof(*aio));
> +
> +*aio = (NvmeAIO) {
> +.blk = blk,
> +.offset = offset,
> +.len = len,
> +.req = req,
> +.qsg = qsg,
> +.iov = iov,
> +.cb = cb,
> +};
> +
> +return aio;
> +}
> +
> +static inline void nvme_req_register_aio(NvmeRequest *req, NvmeAIO *aio,
> +NvmeAIOOp opc)
> +{
> +aio->opc = opc;
> +
> +trace_nvme_dev_req_register_aio(nvme_cid(req), aio, blk_name(aio->blk),
> +aio->offset, aio->len, nvme_aio_opc_str(aio), req);
> +
> +if (req) {
> +QTAILQ_INSERT_TAIL(&req->aio_tailq, aio, tailq_entry);
> +}
> +}
> +
> +static void nvme_aio(NvmeAIO *aio)
> +{
> +BlockBackend *blk = aio->blk;
> +BlockAcctCookie *acct = &aio->acct;
> +BlockAcctStats *stats = blk_get_stats(blk);
> +
> +bool is_write, dma;
> +
> +switch (aio->opc) {
> +case NVME_AIO_OPC_NONE:
> +break;
> +
> +case NVME_AIO_OPC_FLUSH:
> +block_acct_start(stats, acct, 0, BLOCK_ACCT_FLUSH);
> +aio->aiocb = blk_aio_flush(blk, nvme_aio_cb, aio);
> +break;
> +
> +case NVME_AIO_OPC_WRITE_ZEROES:
> +block_acct_start(stats, acct, aio->len, BLOCK_ACCT_WRITE);
> +aio->aiocb = blk_aio_pwrite_zeroes(blk, aio->offset, aio->len,
> +BDRV_REQ_MAY_UNMAP, nvme_aio_cb, aio);
> +break;
> +
> +case NVME_AIO_OPC_READ:
> +case NVME_AIO_OPC_WRITE:
> +dma = aio->qsg != NULL;
> +is_write = (aio->opc == NVME_AIO_OPC_WRITE);
> +
> +block_acct_start(stats, acct, aio->len,
> +is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
> +
> +if (dma) {
> +aio->aiocb = is_write ?
> +dma_blk_write(blk, aio->qsg, aio->offset,
> +BDRV_SECTOR_SIZE, nvme_aio_cb, aio) :
> +dma_blk_read(blk, aio->qsg, aio->offset,
> +BDRV_SECTOR_SIZE, nvme_aio_cb, aio);
> +
> +return;
> +}
> +
> +aio->aiocb = is_write ?
> +blk_aio_pwritev(blk, aio->offset, aio->iov, 0,
> +nvme_aio_cb, aio) :
> +blk_aio_preadv(blk, aio->offset, aio->iov, 0,
> +nvme_aio_cb, aio);
> +
> +break;
> +}
> +}
> +
> +static void nvme_rw_aio(BlockBackend *blk, uint64_t offset, NvmeRequest *req)
> +{
> +   

Re: [PATCH v4 19/24] nvme: handle dma errors

2020-01-09 Thread Beata Michalska
Hi Klaus,

On Thu, 19 Dec 2019 at 13:09, Klaus Jensen  wrote:
>
> Handling DMA errors gracefully is required for the device to pass the
> block/011 test ("disable PCI device while doing I/O") in the blktests
> suite.
>
> With this patch the device passes the test by retrying "critical"
> transfers (posting of completion entries and processing of submission
> queue entries).
>
> If DMA errors occur at any other point in the execution of the command
> (say, while mapping the PRPs), the command is aborted with a Data
> Transfer Error status code.
>
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c   | 37 +
>  hw/block/trace-events |  2 ++
>  include/block/nvme.h  |  2 +-
>  3 files changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 56659bbe263a..f6591285b504 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -71,14 +71,14 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr 
> addr)
>  return addr >= low && addr < hi;
>  }
>
> -static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
> +static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>  {
>  if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
>  memcpy(buf, (void *) &n->cmbuf[addr - n->ctrl_mem.addr], size);
> -return;
> +return 0;
>  }
>
> -pci_dma_read(&n->parent_obj, addr, buf, size);
> +return pci_dma_read(&n->parent_obj, addr, buf, size);
>  }
>
>  static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
> @@ -216,7 +216,11 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList 
> *qsg, QEMUIOVector *iov,
>
>  nents = (len + n->page_size - 1) >> n->page_bits;
>  prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
> -nvme_addr_read(n, prp2, (void *) prp_list, prp_trans);
> +if (nvme_addr_read(n, prp2, (void *) prp_list, prp_trans)) {
> +trace_nvme_dev_err_addr_read(prp2);
> +status = NVME_DATA_TRANSFER_ERROR;
> +goto unmap;
> +}
>  while (len != 0) {
>  uint64_t prp_ent = le64_to_cpu(prp_list[i]);
>
> @@ -235,7 +239,11 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList 
> *qsg, QEMUIOVector *iov,
>  i = 0;
>  nents = (len + n->page_size - 1) >> n->page_bits;
>  prp_trans = MIN(n->max_prp_ents, nents) * 
> sizeof(uint64_t);
> -nvme_addr_read(n, prp_ent, (void *) prp_list, prp_trans);
> +if (nvme_addr_read(n, prp_ent, (void *) prp_list, 
> prp_trans)) {
> +trace_nvme_dev_err_addr_read(prp_ent);
> +status = NVME_DATA_TRANSFER_ERROR;
> +goto unmap;
> +}
>  prp_ent = le64_to_cpu(prp_list[i]);
>  }
>
> @@ -456,6 +464,7 @@ static void nvme_post_cqes(void *opaque)
>  NvmeCQueue *cq = opaque;
>  NvmeCtrl *n = cq->ctrl;
>  NvmeRequest *req, *next;
> +int ret;
>
>  QTAILQ_FOREACH_SAFE(req, >req_list, entry, next) {
>  NvmeSQueue *sq;
> @@ -471,9 +480,16 @@ static void nvme_post_cqes(void *opaque)
>  req->cqe.sq_id = cpu_to_le16(sq->sqid);
>  req->cqe.sq_head = cpu_to_le16(sq->head);
>  addr = cq->dma_addr + cq->tail * n->cqe_size;
> -nvme_inc_cq_tail(cq);
> -pci_dma_write(>parent_obj, addr, (void *)>cqe,
> +ret = pci_dma_write(>parent_obj, addr, (void *)>cqe,
>  sizeof(req->cqe));
> +if (ret) {
> +trace_nvme_dev_err_addr_write(addr);
> +QTAILQ_INSERT_TAIL(>req_list, req, entry);
> +timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> +100 * SCALE_MS);
> +break;
> +}
> +nvme_inc_cq_tail(cq);
>  QTAILQ_INSERT_TAIL(>req_list, req, entry);
>  }
>  if (cq->tail != cq->head) {
> @@ -1595,7 +1611,12 @@ static void nvme_process_sq(void *opaque)
>
>  while (!(nvme_sq_empty(sq) || QTAILQ_EMPTY(>req_list))) {
>  addr = sq->dma_addr + sq->head * n->sqe_size;
> -nvme_addr_read(n, addr, (void *), sizeof(cmd));
> +if (nvme_addr_read(n, addr, (void *), sizeof(cmd))) {
> +trace_nvme_dev_err_addr_read(addr);
> +timer_mod(sq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> +100 * SCALE_MS);
> +break;
> +}

Is there a chance we will end up repeatedly triggering the read error here,
as this will come back to the same memory location each time (the sq->head
is not moving here)?
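To illustrate the concern, here is a toy model (the names and types below are made up for illustration, nothing here is the actual device code) of a submission queue whose head only advances on a successful read: with a persistently failing DMA read, each timer tick would retry the very same SQE forever, unless something bounds the retries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: models the retry behaviour of nvme_process_sq()
 * when nvme_addr_read() keeps failing. Because sq->head is not advanced
 * on error, every timer tick re-reads the same SQE address. A bounded
 * retry counter is one possible way to break out of the loop. */

#define MAX_SQE_RETRIES 8

struct sq_state {
    uint64_t head;      /* submission queue head index */
    unsigned retries;   /* consecutive failed reads of the same SQE */
};

/* stand-in for nvme_addr_read(); returns nonzero on DMA error */
typedef int (*read_fn)(uint64_t addr);

/* Returns true when processing should stop (queue considered dead),
 * false when the caller should continue (or retry on the next tick). */
static bool process_one_sqe(struct sq_state *sq, read_fn read)
{
    if (read(sq->head)) {
        if (++sq->retries >= MAX_SQE_RETRIES) {
            return true;    /* give up instead of retrying forever */
        }
        return false;       /* same head will be retried next tick */
    }
    sq->retries = 0;
    sq->head++;             /* head only advances on a successful read */
    return false;
}

/* a DMA read that never succeeds, to exercise the retry path */
static int always_fail(uint64_t addr) { (void)addr; return 1; }
```

A bounded retry counter like MAX_SQE_RETRIES above is just one possible answer; the patch as posted retries indefinitely.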


BR
Beata

>  nvme_inc_sq_head(sq);
>
>  req = QTAILQ_FIRST(>req_list);
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 90a57fb6099a..09bfb3782dd0 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -83,6 +83,8 @@ 

Re: [RFC PATCH 1/1] target/arm: kvm: Handle DABT with no valid ISS

2020-01-07 Thread Beata Michalska
On Tue, 7 Jan 2020 at 14:28, Peter Maydell  wrote:
>
> On Fri, 20 Dec 2019 at 20:27, Beata Michalska
>  wrote:
> >
> > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
> > exception with no valid ISS info to be decoded. The lack of decode info
> > makes it at least tricky to emulate those instructions, which is one of the
> > (many) reasons why KVM will not even try to do so.
> >
> > Add support for handling those by requesting KVM to inject an external
> > dabt into the guest.
> >
> > Signed-off-by: Beata Michalska 
> > ---
> > +/*
> > + * Get current PC before it will get updated to except vector entry
> > + */
> > +target_ulong ins_addr = is_a64(env) ? env->pc
> > +/* AArch32 mode vs T32 aka Thumb mode */
> > +: env->regs[15] - (env->thumb ? 4 : 8);
>
> Another thing that occurred to me last night -- why do we need
> to do this adjustment of the PC/r15 ? If this is the kernel
> handing control to userspace to say "this is not an instruction
> I can handle, maybe you'd like to try" then surely it should
> do so with the PC pointing at the offending instruction?
> Similarly, if we ask the kernel to inject a data abort I
> would expect that the kernel would do the work of adjusting
> the PC forwards as the architecture requires when taking
> the exception.
>

The code here is only there to ease debugging from the QEMU perspective,
and that is the sole reason we read the PC at all: the kernel does not
need it in any way to inject the abort.
One can use the monitor to decode the instruction, provided it is still
available at the memory location pointed to by the PC (monitor_disas
comes in handy), which is why the address is logged along with the
decoded instruction; that is all that is being done here. Still, on ARM
the address of the instruction actually executed would be PC-8 (PC-4 for
Thumb), hence the adjustment.
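A minimal sketch of that adjustment (an illustrative helper, not QEMU code): on AArch32, a read of the PC observes the architectural pipeline offset of +8 in ARM state and +4 in Thumb state, while AArch64's env->pc needs no correction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative helper (assumed name, not from the patch): recover the
 * address of the instruction that actually faulted from the value QEMU
 * reads out of env->pc (AArch64) or env->regs[15] (AArch32). */
static uint64_t faulting_insn_addr(bool aarch64, bool thumb,
                                   uint64_t pc_or_r15)
{
    if (aarch64) {
        return pc_or_r15;               /* env->pc is the real address */
    }
    /* strip the AArch32 read-of-PC offset: +8 in ARM, +4 in Thumb */
    return pc_or_r15 - (thumb ? 4 : 8);
}
```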

BR
Beata


> thanks
> -- PMM



Re: [RFC PATCH 1/1] target/arm: kvm: Handle DABT with no valid ISS

2020-01-07 Thread Beata Michalska
On Mon, 6 Jan 2020 at 17:15, Peter Maydell  wrote:
>
> On Fri, 20 Dec 2019 at 20:27, Beata Michalska
>  wrote:
> >
> > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
> > exception with no valid ISS info to be decoded. The lack of decode info
> > makes it at least tricky to emulate those instructions, which is one of the
> > (many) reasons why KVM will not even try to do so.
> >
> > Add support for handling those by requesting KVM to inject an external
> > dabt into the guest.
> >
> > Signed-off-by: Beata Michalska 
> > ---
> >  accel/kvm/kvm-all.c| 15 +++
> >  accel/stubs/kvm-stub.c |  4 ++
> >  include/sysemu/kvm.h   |  1 +
> >  target/arm/cpu.h   |  3 +-
> >  target/arm/kvm.c   | 95 ++
> >  target/arm/kvm32.c |  3 ++
> >  target/arm/kvm64.c |  3 ++
> >  target/arm/kvm_arm.h   | 19 +
> >  8 files changed, 142 insertions(+), 1 deletion(-)
> >
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index ca00daa2f5..a3ee038142 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -2174,6 +2174,14 @@ static void do_kvm_cpu_synchronize_state(CPUState 
> > *cpu, run_on_cpu_data arg)
> >  }
> >  }
> >
> > +static void do_kvm_cpu_synchronize_state_force(CPUState *cpu,
> > +   run_on_cpu_data arg)
> > +{
> > +kvm_arch_get_registers(cpu);
> > +cpu->vcpu_dirty = true;
> > +}
>
> Why is this functionality special such that it needs a non-standard
> way of synchronizing state with KVM that nothing else does ?

We need the up-to-date state when handling the NISV, which is achieved by
calling the synchronize-cpu-state routine. This sets the 'dirty' flag,
as expected. In order to get KVM to insert the external abort we need to
set the vcpu events. So far so good. Still, because the cpu state is
marked 'dirty', before entering KVM_RUN again QEMU will overwrite the
relevant regs -> see kvm_arch_put_registers. That messes up the register
setup done by KVM when injecting the external abort. This is why we need
the sequence:
sync vcpu -> put events -> sync vcpu again
so that when entering KVM_RUN, QEMU has all the updates it needs.
Now, I could just set the 'dirty' flag to false and skip the regular
kvm_arch_put_registers after setting the vcpu events, but it seemed more
sensible to have a sync variant that performs the requested sync despite
the dirty flag, just to avoid a hacky way around the problem.
I was tempted to refactor the reg/event syncing slightly, but that seemed
too invasive for this particular change. I might try to do just that though.
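The ordering problem can be modelled in a few lines of toy C (names such as toy_vcpu are invented for illustration; this is not the QEMU code path itself): without the forced second sync, the stale cached registers are written back over the abort state KVM just set up.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the sequence described above. kvm_arch_put_registers()
 * runs before KVM_RUN whenever vcpu_dirty is set, and would clobber the
 * register state KVM wrote while injecting the abort; a forced second
 * get-registers refreshes QEMU's copy first. */
struct toy_vcpu {
    int kvm_regs;    /* register state as KVM sees it */
    int qemu_regs;   /* QEMU's cached copy */
    bool vcpu_dirty;
};

static void get_registers(struct toy_vcpu *v)   /* sync KVM -> QEMU */
{
    v->qemu_regs = v->kvm_regs;
    v->vcpu_dirty = true;
}

static void put_registers(struct toy_vcpu *v)   /* sync QEMU -> KVM */
{
    v->kvm_regs = v->qemu_regs;
    v->vcpu_dirty = false;
}

static void set_vcpu_events_inject_dabt(struct toy_vcpu *v)
{
    v->kvm_regs = 42;   /* KVM rewrites PC/PSTATE for the abort entry */
}

/* the sequence from the mail: sync -> put events -> (forced) sync again */
static int run_with_forced_resync(struct toy_vcpu *v, bool force)
{
    get_registers(v);                 /* kvm_cpu_synchronize_state() */
    set_vcpu_events_inject_dabt(v);
    if (force) {
        get_registers(v);             /* kvm_cpu_synchronize_state_force() */
    }
    if (v->vcpu_dirty) {
        put_registers(v);             /* happens before KVM_RUN */
    }
    return v->kvm_regs;               /* state KVM_RUN starts from */
}
```

Running the model both ways shows the clobbering: without the forced resync the injected state (42 in the toy) is lost; with it, it survives.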

>
> > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> > index 9fe233b9bf..0cacc61d8a 100644
> > --- a/include/sysemu/kvm.h
> > +++ b/include/sysemu/kvm.h
> > @@ -483,6 +483,7 @@ void kvm_cpu_synchronize_state(CPUState *cpu);
> >  void kvm_cpu_synchronize_post_reset(CPUState *cpu);
> >  void kvm_cpu_synchronize_post_init(CPUState *cpu);
> >  void kvm_cpu_synchronize_pre_loadvm(CPUState *cpu);
> > +void kvm_cpu_synchronize_state_force(CPUState *cpu);
> >
> >  void kvm_init_cpu_signals(CPUState *cpu);
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index 5f70e9e043..e11b5e7438 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -558,7 +558,8 @@ typedef struct CPUARMState {
> >  uint8_t has_esr;
> >  uint64_t esr;
> >  } serror;
> > -
> > +/* Status field for pending extarnal dabt */
>
> "external" (I think you have the same typo later in the patch too)

And I did run codespell ... Will blame it on the Christmas rush ...
And will fix all the typos.

>
> > +uint8_t ext_dabt_pending;
> >  /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
> >  uint32_t irq_line_state;
>
>
> > @@ -701,6 +719,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run 
> > *run)
> >  ret = EXCP_DEBUG;
> >  } /* otherwise return to guest */
> >  break;
> > +case KVM_EXIT_ARM_NISV:
> > +/* External DAB with no valid iss to decode */
>
> "DABT"
>
> > +ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
> > + run->arm_nisv.fault_ipa);
>
> (indentation looks odd here?)

It does indeed 
>
> > +break;
> >  default:
> >  qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
> >__f

[RFC PATCH 0/1] target/arm: kvm: Support for KVM DABT without valid ISS

2019-12-20 Thread Beata Michalska
Some of the ARMv7 & ARMv8 load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate the instruction which is one of the
(many) reasons why KVM will not even try to do so.

So far, if a guest made an attempt to access memory outside the memory slot,
KVM reported a vague ENOSYS. As a result, QEMU exited with no useful
information being provided, or even a clue as to what had just happened.

Recently, ARM KVM introduced support for notifying the guest of an attempt to
execute an instruction that resulted in a dabt with no valid ISS decoding info.
This still leaves QEMU to handle the case, but at least now it can enable
further debugging of the encountered issue by being more verbose
in a (hopefully) useful way.



Beata Michalska (1):
  target/arm: kvm: Handle DABT with no valid ISS

 accel/kvm/kvm-all.c| 15 +++
 accel/stubs/kvm-stub.c |  4 ++
 include/sysemu/kvm.h   |  1 +
 target/arm/cpu.h   |  3 +-
 target/arm/kvm.c   | 95 ++
 target/arm/kvm32.c |  3 ++
 target/arm/kvm64.c |  3 ++
 target/arm/kvm_arm.h   | 19 +
 8 files changed, 142 insertions(+), 1 deletion(-)

-- 
2.17.1




[RFC PATCH 1/1] target/arm: kvm: Handle DABT with no valid ISS

2019-12-20 Thread Beata Michalska
On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
exception with no valid ISS info to be decoded. The lack of decode info
makes it at least tricky to emulate those instructions, which is one of the
(many) reasons why KVM will not even try to do so.

Add support for handling those by requesting KVM to inject an external
dabt into the guest.

Signed-off-by: Beata Michalska 
---
 accel/kvm/kvm-all.c| 15 +++
 accel/stubs/kvm-stub.c |  4 ++
 include/sysemu/kvm.h   |  1 +
 target/arm/cpu.h   |  3 +-
 target/arm/kvm.c   | 95 ++
 target/arm/kvm32.c |  3 ++
 target/arm/kvm64.c |  3 ++
 target/arm/kvm_arm.h   | 19 +
 8 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ca00daa2f5..a3ee038142 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2174,6 +2174,14 @@ static void do_kvm_cpu_synchronize_state(CPUState *cpu, 
run_on_cpu_data arg)
 }
 }
 
+static void do_kvm_cpu_synchronize_state_force(CPUState *cpu,
+   run_on_cpu_data arg)
+{
+kvm_arch_get_registers(cpu);
+cpu->vcpu_dirty = true;
+}
+
+
 void kvm_cpu_synchronize_state(CPUState *cpu)
 {
 if (!cpu->vcpu_dirty) {
@@ -2181,6 +2189,13 @@ void kvm_cpu_synchronize_state(CPUState *cpu)
 }
 }
 
+void kvm_cpu_synchronize_state_force(CPUState *cpu)
+{
+/* Force the sync */
+run_on_cpu(cpu, do_kvm_cpu_synchronize_state_force, RUN_ON_CPU_NULL);
+}
+
+
 static void do_kvm_cpu_synchronize_post_reset(CPUState *cpu, run_on_cpu_data 
arg)
 {
 kvm_arch_put_registers(cpu, KVM_PUT_RESET_STATE);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 82f118d2df..e917d1d55e 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -58,6 +58,10 @@ void kvm_cpu_synchronize_post_init(CPUState *cpu)
 {
 }
 
+void kvm_cpu_synchronize_state_force(CPUState *cpu)
+{
+}
+
 int kvm_cpu_exec(CPUState *cpu)
 {
 abort();
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 9fe233b9bf..0cacc61d8a 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -483,6 +483,7 @@ void kvm_cpu_synchronize_state(CPUState *cpu);
 void kvm_cpu_synchronize_post_reset(CPUState *cpu);
 void kvm_cpu_synchronize_post_init(CPUState *cpu);
 void kvm_cpu_synchronize_pre_loadvm(CPUState *cpu);
+void kvm_cpu_synchronize_state_force(CPUState *cpu);
 
 void kvm_init_cpu_signals(CPUState *cpu);
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 5f70e9e043..e11b5e7438 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -558,7 +558,8 @@ typedef struct CPUARMState {
 uint8_t has_esr;
 uint64_t esr;
 } serror;
-
+/* Status field for pending extarnal dabt */
+uint8_t ext_dabt_pending;
 /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
 uint32_t irq_line_state;
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 5b82cefef6..10fe739c2d 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -37,6 +37,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool cap_has_mp_state;
 static bool cap_has_inject_serror_esr;
+static bool cap_has_inject_ext_dabt; /* KVM_CAP_ARM_INJECT_EXT_DABT */
 
 static ARMHostCPUFeatures arm_host_cpu_features;
 
@@ -62,6 +63,12 @@ void kvm_arm_init_serror_injection(CPUState *cs)
 KVM_CAP_ARM_INJECT_SERROR_ESR);
 }
 
+void kvm_arm_init_ext_dabt_injection(CPUState *cs)
+{
+cap_has_inject_ext_dabt = kvm_check_extension(cs->kvm_state,
+KVM_CAP_ARM_INJECT_EXT_DABT);
+}
+
 bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
   int *fdarray,
   struct kvm_vcpu_init *init)
@@ -218,6 +225,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 ret = -EINVAL;
 }
 
+if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER))
+if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
+warn_report("Failed to enable DABT NISV cap");
+}
+
 return ret;
 }
 
@@ -600,6 +612,10 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
 events.exception.serror_esr = env->serror.esr;
 }
 
+if (cap_has_inject_ext_dabt) {
+events.exception.ext_dabt_pending = env->ext_dabt_pending;
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, );
 if (ret) {
 error_report("failed to put vcpu events");
@@ -629,6 +645,8 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
 env->serror.has_esr = events.exception.serror_has_esr;
 env->serror.esr = events.exception.serror_esr;
 
+env->ext_dabt_pending = events.exception.ext_dabt_pending;
+
 return 0;
 }
 
@@ -701,6 +719,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = EXCP_DEBUG;
 } /* otherwise return to guest */

Re: Recent change pmem related breaks Xen migration

2019-12-19 Thread Beata Michalska
Hi Anthony,

On Thu, 19 Dec 2019 at 15:42, Anthony PERARD  wrote:
>
> Hi,
>
> Commit bd108a44bc29 ("migration: ram: Switch to ram block writeback")
> breaks migration on Xen. We have:
>   ramblock_ptr: Assertion `offset_in_ramblock(block, offset)' failed.
>
> I've track it down to qemu_ram_writeback() calling ramblock_ptr()
> unconditionally, even when the result will not be used.
>
> Maybe we could call ramblock_ptr() twice in that function? I've prepared
> a patch.
>
>
> FYI, full-ish trace on restore of a xen guest:
> #3  0x7f82d0848526 in __assert_fail () from /usr/lib/libc.so.6
> #4  0x562dc4578122 in ramblock_ptr (block=0x562dc5ebe2a0, offset=0) at 
> /root/build/qemu/include/exec/ram_addr.h:120
> #5  0x562dc457d1b7 in qemu_ram_writeback (block=0x562dc5ebe2a0, start=0, 
> length=515899392) at /root/build/qemu/exec.c:2169
> #6  0x562dc45e8941 in qemu_ram_block_writeback (block=0x562dc5ebe2a0) at 
> /root/build/qemu/include/exec/ram_addr.h:182
> #7  0x562dc45f0b56 in ram_load_cleanup (opaque=0x562dc510fe00 
> ) at /root/build/qemu/migration/ram.c:3983
> #8  0x562dc49970b6 in qemu_loadvm_state_cleanup () at 
> migration/savevm.c:2415
> #9  0x562dc4997548 in qemu_loadvm_state (f=0x562dc6a1c600) at 
> migration/savevm.c:2597
> #10 0x562dc4987be7 in process_incoming_migration_co (opaque=0x0) at 
> migration/migration.c:454
> #11 0x562dc4b907e5 in coroutine_trampoline (i0=-962514432, i1=22061) at 
> util/coroutine-ucontext.c:115
>
> And *block in ramblock_ptr():
> (gdb) p *block
> $2 = {
>   rcu = {
> next = 0x0,
> func = 0x0
>   },
>   mr = 0x562dc512e140 ,
>   host = 0x0,
>   colo_cache = 0x0,
>   offset = 0,
>   used_length = 515899392,
>   max_length = 515899392,
>   resized = 0x0,
>   flags = 16,
>   idstr = "xen.ram", '\000' ,
>   next = {
> le_next = 0x562dc67bf7e0,
> le_prev = 0x562dc510f1a0 
>   },
>   ramblock_notifiers = {
> lh_first = 0x0
>   },
>   fd = -1,
>   page_size = 4096,
>   bmap = 0x0,
>   receivedmap = 0x562dc6a24a60,
>   clear_bmap = 0x0,
>   clear_bmap_shift = 0 '\000'
> }
>
> Cheers,
>
> --
> Anthony PERARD

I have already replied to your patch submission.
Looks good, and thanks for fixing.

BR
Beata



Re: [PATCH] Memory: Only call ramblock_ptr when needed in qemu_ram_writeback

2019-12-19 Thread Beata Michalska
Hi Anthony,

On Thu, 19 Dec 2019 at 15:43, Anthony PERARD  wrote:
>
> It is possible that a ramblock doesn't have memory that QEMU can
> access, this is the case with the Xen hypervisor.
>
> In order to avoid to trigger an assert, only call ramblock_ptr() when
> needed in qemu_ram_writeback(). This should fix migration of Xen
> guests that was broken with bd108a44bc29 ("migration: ram: Switch to
> ram block writeback").
>
> Signed-off-by: Anthony PERARD 
> ---
>  exec.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index a34c34818404..b11010e0cb4c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2166,14 +2166,13 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t 
> newsize, Error **errp)
>   */
>  void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length)
>  {
> -void *addr = ramblock_ptr(block, start);
> -
>  /* The requested range should fit in within the block range */
>  g_assert((start + length) <= block->used_length);
>
>  #ifdef CONFIG_LIBPMEM
>  /* The lack of support for pmem should not block the sync */
>  if (ramblock_is_pmem(block)) {
> +void *addr = ramblock_ptr(block, start);
>  pmem_persist(addr, length);
>  return;
>  }
> @@ -2184,6 +2183,7 @@ void qemu_ram_writeback(RAMBlock *block, ram_addr_t 
> start, ram_addr_t length)
>   * specified as persistent (or is not one) - use the msync.
>   * Less optimal but still achieves the same goal
>   */
> +void *addr = ramblock_ptr(block, start);
>  if (qemu_msync(addr, length, block->fd)) {
>  warn_report("%s: failed to sync memory range: start: "
>  RAM_ADDR_FMT " length: " RAM_ADDR_FMT,

We could also do:
void *addr = block->host ? ramblock_ptr(block, start) : NULL;

Looks good to me though.
Thanks for fixing.

BR

Beata
> --
> Anthony PERARD
>



Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-12-09 Thread Beata Michalska
On Sat, 7 Dec 2019 at 09:33, gengdongjiu  wrote:
>
>
>
> On 2019/11/22 23:47, Beata Michalska wrote:
> > Hi,
> >
> > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng  wrote:
> >>
> >> From: Dongjiu Geng 
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When guest accesses the poisoned memory, it will generate a Synchronous
> >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> >> memory_failure() to unmapped the affected page in stage 2, finally
> >> returns to guest.
> >>
> >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> >> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> >> Qemu, Qemu records this error address into guest APEI GHES memory and
> >> notifes guest using Synchronous-External-Abort(SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and hold an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng 
> >> Signed-off-by: Xiang Zheng 
> >> Reviewed-by: Michael S. Tsirkin 
> >> ---
> >>  hw/acpi/acpi_ghes.c | 297 
> >>  include/hw/acpi/acpi_ghes.h |   4 +
> >>  include/sysemu/kvm.h|   3 +-
> >>  target/arm/cpu.h|   4 +
> >>  target/arm/helper.c |   2 +-
> >>  target/arm/internals.h  |   5 +-
> >>  target/arm/kvm64.c  |  64 
> >>  target/arm/tlb_helper.c |   2 +-
> >>  target/i386/cpu.h   |   2 +
> >>  9 files changed, 377 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> >> index 42c00ff3d3..f5b54990c0 100644
> >> --- a/hw/acpi/acpi_ghes.c
> >> +++ b/hw/acpi/acpi_ghes.c
> >> @@ -39,6 +39,34 @@
> >>  /* The max size in bytes for one error block */
> >>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   0x1000
> >>
> >> +/*
> >> + * The total size of Generic Error Data Entry
> >> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-343 Generic Error Data Entry
> >> + */
> >> +#define ACPI_GHES_DATA_LENGTH   72
> >> +
> >> +/*
> >> + * The memory section CPER size,
> >> + * UEFI 2.6: N.2.5 Memory Error Section
> >> + */
> >> +#define ACPI_GHES_MEM_CPER_LENGTH   80
> >> +
> >> +/*
> >> + * Masks for block_status flags
> >> + */
> >> +#define ACPI_GEBS_UNCORRECTABLE 1
> >
> > Why not listing all supported statuses ? Similar to error severity below ?
> >
> >> +
> >> +/*
> >> + * Values for error_severity field
> >> + */
> >> +enum AcpiGenericErrorSeverity {
> >> +ACPI_CPER_SEV_RECOVERABLE,
> >> +ACPI_CPER_SEV_FATAL,
> >> +ACPI_CPER_SEV_CORRECTED,
> >> +ACPI_CPER_SEV_NONE,
> >> +};
> >> +
> >>  /*
> >>   * Now only support ARMv8 SEA notification type error source
> >>   */
> >> @@ -49,6 +77,16 @@
> >>   */
> >>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >>
> >> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)\
> >> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 
> >> 0xff, \
> >> +((b) >> 8) & 0xff, (b) & 0xff,   \
> >> +((c) >> 8) & 0xff, (c) & 0xff,\
> >> +(d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> >> +
> >> +#define UEFI_CPER_SEC_PLATFORM_MEM   \
> >> +UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> >> +0xED, 0x7C, 0x83, 0xB1)
> &

Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-11-27 Thread Beata Michalska
On Wed, 27 Nov 2019 at 13:03, Igor Mammedov  wrote:
>
> On Wed, 27 Nov 2019 20:47:15 +0800
> Xiang Zheng  wrote:
>
> > Hi Beata,
> >
> > Thanks for you review!
> >
> > On 2019/11/22 23:47, Beata Michalska wrote:
> > > Hi,
> > >
> > > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng  wrote:
> > >>
> > >> From: Dongjiu Geng 
> > >>
> > >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> > >> translates the host VA delivered by host to guest PA, then fills this PA
> > >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> > >> type.
> > >>
> > >> When guest accesses the poisoned memory, it will generate a Synchronous
> > >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> > >> memory_failure() to unmapped the affected page in stage 2, finally
> > >> returns to guest.
> > >>
> > >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> > >> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> > >> Qemu, Qemu records this error address into guest APEI GHES memory and
> > >> notifes guest using Synchronous-External-Abort(SEA).
> > >>
> > >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> > >> in which we can setup the type of exception and the syndrome information.
> > >> When switching to guest, the target vcpu will jump to the synchronous
> > >> external abort vector table entry.
> > >>
> > >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> > >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> > >> not valid and hold an UNKNOWN value. These values will be set to KVM
> > >> register structures through KVM_SET_ONE_REG IOCTL.
> > >>
> > >> Signed-off-by: Dongjiu Geng 
> > >> Signed-off-by: Xiang Zheng 
> > >> Reviewed-by: Michael S. Tsirkin 
> > >> ---
> [...]
> > >> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> > >> index cb62ec9c7b..8e3c5b879e 100644
> > >> --- a/include/hw/acpi/acpi_ghes.h
> > >> +++ b/include/hw/acpi/acpi_ghes.h
> > >> @@ -24,6 +24,9 @@
> > >>
> > >>  #include "hw/acpi/bios-linker-loader.h"
> > >>
> > >> +#define ACPI_GHES_CPER_OK   1
> > >> +#define ACPI_GHES_CPER_FAIL 0
> > >> +
> > >
> > > Is there really a need to introduce those ?
> > >
> >
> > Don't you think it's more clear than using "1" or "0"? :)
>
> or maybe just reuse default libc return convention: 0 - ok, -1 - fail
> and drop custom macros
>

Totally agree.

BR
Beata
> >
> > >>  /*
> > >>   * Values for Hardware Error Notification Type field
> > >>   */
> [...]
>



Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-11-27 Thread Beata Michalska
Hi

On Wed, 27 Nov 2019 at 12:47, Xiang Zheng  wrote:
>
> Hi Beata,
>
> Thanks for you review!
>
YAW

> On 2019/11/22 23:47, Beata Michalska wrote:
> > Hi,
> >
> > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng  wrote:
> >>
> >> From: Dongjiu Geng 
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When guest accesses the poisoned memory, it will generate a Synchronous
> >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> >> memory_failure() to unmapped the affected page in stage 2, finally
> >> returns to guest.
> >>
> >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> >> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> >> Qemu, Qemu records this error address into guest APEI GHES memory and
> >> notifes guest using Synchronous-External-Abort(SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and hold an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng 
> >> Signed-off-by: Xiang Zheng 
> >> Reviewed-by: Michael S. Tsirkin 
> >> ---
> >>  hw/acpi/acpi_ghes.c | 297 
> >>  include/hw/acpi/acpi_ghes.h |   4 +
> >>  include/sysemu/kvm.h|   3 +-
> >>  target/arm/cpu.h|   4 +
> >>  target/arm/helper.c |   2 +-
> >>  target/arm/internals.h  |   5 +-
> >>  target/arm/kvm64.c  |  64 
> >>  target/arm/tlb_helper.c |   2 +-
> >>  target/i386/cpu.h   |   2 +
> >>  9 files changed, 377 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> >> index 42c00ff3d3..f5b54990c0 100644
> >> --- a/hw/acpi/acpi_ghes.c
> >> +++ b/hw/acpi/acpi_ghes.c
> >> @@ -39,6 +39,34 @@
> >>  /* The max size in bytes for one error block */
> >>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   0x1000
> >>
> >> +/*
> >> + * The total size of Generic Error Data Entry
> >> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-343 Generic Error Data Entry
> >> + */
> >> +#define ACPI_GHES_DATA_LENGTH   72
> >> +
> >> +/*
> >> + * The memory section CPER size,
> >> + * UEFI 2.6: N.2.5 Memory Error Section
> >> + */
> >> +#define ACPI_GHES_MEM_CPER_LENGTH   80
> >> +
> >> +/*
> >> + * Masks for block_status flags
> >> + */
> >> +#define ACPI_GEBS_UNCORRECTABLE 1
> >
> > Why not listing all supported statuses ? Similar to error severity below ?
> >
>
> We now only use the first bit for uncorrectable error. The correctable errors
> are handled in host and would not be delivered to QEMU.
>
> I think it's unnecessary to list all the bit masks.

I'm not sure we are using all the error severity types either, but fair enough.
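For reference, the full Block Status layout comes from ACPI 6.2, section 18.3.2.7.1 (Generic Error Status Block); a sketch of all the bits is below. The macro names other than ACPI_GEBS_UNCORRECTABLE are illustrative, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Block Status bits per ACPI 6.2, 18.3.2.7.1; the patch only defines
 * bit 0. Names besides ACPI_GEBS_UNCORRECTABLE are made up here. */
#define ACPI_GEBS_UNCORRECTABLE          (1u << 0)
#define ACPI_GEBS_CORRECTABLE            (1u << 1)
#define ACPI_GEBS_MULTI_UNCORRECTABLE    (1u << 2)
#define ACPI_GEBS_MULTI_CORRECTABLE      (1u << 3)
#define ACPI_GEBS_ERR_DATA_ENTRY_SHIFT   4          /* bits [13:4] */
#define ACPI_GEBS_ERR_DATA_ENTRY_MASK    0x3ffu

/* Error Data Entry Count occupies the 10 bits above the flag bits. */
static uint32_t gebs_error_entry_count(uint32_t block_status)
{
    return (block_status >> ACPI_GEBS_ERR_DATA_ENTRY_SHIFT)
           & ACPI_GEBS_ERR_DATA_ENTRY_MASK;
}
```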
>
> >> +
> >> +/*
> >> + * Values for error_severity field
> >> + */
> >> +enum AcpiGenericErrorSeverity {
> >> +ACPI_CPER_SEV_RECOVERABLE,
> >> +ACPI_CPER_SEV_FATAL,
> >> +ACPI_CPER_SEV_CORRECTED,
> >> +ACPI_CPER_SEV_NONE,
> >> +};
> >> +
> >>  /*
> >>   * Now only support ARMv8 SEA notification type error source
> >>   */
> >> @@ -49,6 +77,16 @@
> >>   */
> >>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >>
> >> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)\
> >> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 
> >> 0xff, \
> >> +((b) >> 8) & 0xff, (b) & 0xff,   \
> >> +((c) >>

Re: [PATCH v2 15/20] nvme: add support for scatter gather lists

2019-11-25 Thread Beata Michalska
On Mon, 25 Nov 2019 at 06:21, Klaus Birkelund  wrote:
>
> On Tue, Nov 12, 2019 at 03:25:18PM +, Beata Michalska wrote:
> > Hi Klaus,
> >
> > On Tue, 15 Oct 2019 at 11:57, Klaus Jensen  wrote:
> > > +static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg,
> > > +NvmeSglDescriptor sgl, uint32_t len, NvmeRequest *req)
> > > +{
> > > +const int MAX_NSGLD = 256;
> > > +
> > > +NvmeSglDescriptor segment[MAX_NSGLD];
> > > +uint64_t nsgld;
> > > +uint16_t status;
> > > +bool sgl_in_cmb = false;
> > > +hwaddr addr = le64_to_cpu(sgl.addr);
> > > +
> > > +trace_nvme_map_sgl(req->cid, NVME_SGL_TYPE(sgl.type), req->nlb, len);
> > > +
> > > +pci_dma_sglist_init(qsg, >parent_obj, 1);
> > > +
> > > +/*
> > > + * If the entire transfer can be described with a single data block 
> > > it can
> > > + * be mapped directly.
> > > + */
> > > +if (NVME_SGL_TYPE(sgl.type) == SGL_DESCR_TYPE_DATA_BLOCK) {
> > > +status = nvme_map_sgl_data(n, qsg, , 1, , req);
> > > +if (status) {
> > > +goto unmap;
> > > +}
> > > +
> > > +goto out;
> > > +}
> > > +
> > > +/*
> > > + * If the segment is located in the CMB, the submission queue of the
> > > + * request must also reside there.
> > > + */
> > > +if (nvme_addr_is_cmb(n, addr)) {
> > > +if (!nvme_addr_is_cmb(n, req->sq->dma_addr)) {
> > > +return NVME_INVALID_USE_OF_CMB | NVME_DNR;
> > > +}
> > > +
> > > +sgl_in_cmb = true;
> > > +}
> > > +
> > > +while (NVME_SGL_TYPE(sgl.type) == SGL_DESCR_TYPE_SEGMENT) {
> > > +bool addr_is_cmb;
> > > +
> > > +nsgld = le64_to_cpu(sgl.len) / sizeof(NvmeSglDescriptor);
> > > +
> > > +/* read the segment in chunks of 256 descriptors (4k) */
> > > +while (nsgld > MAX_NSGLD) {
> > > +nvme_addr_read(n, addr, segment, sizeof(segment));
> > Is there any chance this will go outside the CMB?
> >
>
> Yes, there certainly was a chance of that. This has been fixed in a
> general way for both nvme_map_sgl and nvme_map_sgl_data.
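As a sketch of the kind of check such a fix implies (illustrative types and names, not the actual patch): a read that starts inside the CMB must also end inside it before the memcpy path may be taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy controller: just enough state to express the CMB bounds check. */
struct toy_ctrl {
    uint64_t cmb_base;
    uint64_t cmb_size;
};

/* True only when the whole [addr, addr + size) range lies within the
 * CMB, so a memcpy from the controller buffer cannot run past it. */
static bool addr_range_in_cmb(const struct toy_ctrl *n,
                              uint64_t addr, uint64_t size)
{
    uint64_t lo = n->cmb_base, hi = n->cmb_base + n->cmb_size;

    /* size <= cmb_size guards against addr + size wrapping around */
    return addr >= lo && size <= n->cmb_size && addr + size <= hi;
}
```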
>
> > > +
> > > +status = nvme_map_sgl_data(n, qsg, segment, MAX_NSGLD, , 
> > > req);
> > > +if (status) {
> > > +goto unmap;
> > > +}
> > > +
> > > +nsgld -= MAX_NSGLD;
> > > +addr += MAX_NSGLD * sizeof(NvmeSglDescriptor);
> > > +}
> > > +
> > > +nvme_addr_read(n, addr, segment, nsgld * 
> > > sizeof(NvmeSglDescriptor));
> > > +
> > > +sgl = segment[nsgld - 1];
> > > +addr = le64_to_cpu(sgl.addr);
> > > +
> > > +/* an SGL is allowed to end with a Data Block in a regular 
> > > Segment */
> > > +if (NVME_SGL_TYPE(sgl.type) == SGL_DESCR_TYPE_DATA_BLOCK) {
> > > +status = nvme_map_sgl_data(n, qsg, segment, nsgld, , 
> > > req);
> > > +if (status) {
> > > +goto unmap;
> > > +}
> > > +
> > > +goto out;
> > > +}
> > > +
> > > +/* do not map last descriptor */
> > > +status = nvme_map_sgl_data(n, qsg, segment, nsgld - 1, , req);
> > > +if (status) {
> > > +goto unmap;
> > > +}
> > > +
> > > +/*
> > > + * If the next segment is in the CMB, make sure that the sgl was
> > > + * already located there.
> > > + */
> > > +addr_is_cmb = nvme_addr_is_cmb(n, addr);
> > > +if ((sgl_in_cmb && !addr_is_cmb) || (!sgl_in_cmb && addr_is_cmb)) {
> > > +status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
> > > +goto unmap;
> > > +}
> > > +}
> > > +
> > > +/*
> > > + * If the segment did not end with a Data Block or a Segment descriptor, it
> > > + * must be a Last Segment descriptor.
> > > + */
> > > +if (NVME_SGL_TYPE(sgl.type) != SGL_DESCR_TYPE_LAST_SEGMENT) {
> > > +trace_nvme_err_invalid_

Re: [PATCH v2 14/20] nvme: allow multiple aios per command

2019-11-25 Thread Beata Michalska
On Thu, 21 Nov 2019 at 11:57, Klaus Birkelund  wrote:
>
> On Tue, Nov 12, 2019 at 03:25:06PM +, Beata Michalska wrote:
> > Hi Klaus,
> >
> > On Tue, 15 Oct 2019 at 11:55, Klaus Jensen  wrote:
> > > @@ -341,19 +344,18 @@ static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> > Any reason why the nvme_dma_write_prp is missing the changes applied
> > to nvme_dma_read_prp ?
> >
>
> This was adressed by proxy through changes to the previous patch
> (by combining the read/write functions).
>
> > > +case NVME_AIO_OPC_WRITE_ZEROES:
> > > +block_acct_start(stats, acct, aio->iov.size, BLOCK_ACCT_WRITE);
> > > +aio->aiocb = blk_aio_pwrite_zeroes(aio->blk, aio->offset,
> > > +aio->iov.size, BDRV_REQ_MAY_UNMAP, nvme_aio_cb, aio);
> > Minor: aio->blk  => blk
> >
>
> Thanks. Fixed this in a couple of other places as well.
>
> > > @@ -621,8 +880,11 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd)
> > >  sq = n->sq[qid];
> > >  while (!QTAILQ_EMPTY(>out_req_list)) {
> > >  req = QTAILQ_FIRST(>out_req_list);
> > > -assert(req->aiocb);
> > > -blk_aio_cancel(req->aiocb);
> > > +while (!QTAILQ_EMPTY(>aio_tailq)) {
> > > +aio = QTAILQ_FIRST(>aio_tailq);
> > > +assert(aio->aiocb);
> > > +blk_aio_cancel(aio->aiocb);
> > What about releasing memory associated with given aio ?
>
> I believe the callback is still called when cancelled? That should take
> care of it. Or have I misunderstood that? At least for the DMAAIOCBs it
> is.
>
It seems that the completion callback is supposed to be called.
My bad.
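Just to spell out the flow we are agreeing on - a miniature model (names assumed, not QEMU's actual API) in which cancelling drives the completion callback, and the callback is the single place that unlinks and frees the aio, so nothing leaks on the cancel path:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* each request keeps a tailq of in-flight aios */
struct aio {
    TAILQ_ENTRY(aio) entry;
};
TAILQ_HEAD(aio_list, aio);

static void aio_complete(struct aio_list *list, struct aio *a)
{
    /* completion owns the cleanup - on normal and cancelled paths alike */
    TAILQ_REMOVE(list, a, entry);
    free(a);
}

static void request_cancel(struct aio_list *list)
{
    while (!TAILQ_EMPTY(list)) {
        struct aio *a = TAILQ_FIRST(list);
        /* blk_aio_cancel() would end up invoking the completion here */
        aio_complete(list, a);
    }
}

static int request_submit(struct aio_list *list, int n)
{
    for (int i = 0; i < n; i++) {
        struct aio *a = calloc(1, sizeof(*a));
        TAILQ_INSERT_TAIL(list, a, entry);
    }
    return n;
}
```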

BR
Beata
> > > +struct NvmeAIO {
> > > +NvmeRequest *req;
> > > +
> > > +NvmeAIOOp   opc;
> > > +int64_t offset;
> > > +BlockBackend*blk;
> > > +BlockAIOCB  *aiocb;
> > > +BlockAcctCookie acct;
> > > +
> > > +NvmeAIOCompletionFunc *cb;
> > > +void  *cb_arg;
> > > +
> > > +QEMUSGList   *qsg;
> > > +QEMUIOVector iov;
> >
> > There is a bit of inconsistency on the ownership of IOVs and SGLs.
> > SGLs now seem to be owned by request whereas IOVs by the aio.
> > WOuld be good to have that unified or documented at least.
> >
>
> Fixed this. The NvmeAIO only holds pointers now.
>
> > > +#define NVME_REQ_TRANSFER_DMA  0x1
> > This one does not seem to be used 
> >
>
> I have dropped the flags and reverted to a simple req->is_cmb as that is
> all that is really needed.
>



Re: [PATCH v2 13/20] nvme: refactor prp mapping

2019-11-25 Thread Beata Michalska
On Wed, 20 Nov 2019 at 09:39, Klaus Birkelund  wrote:
>
> On Tue, Nov 12, 2019 at 03:23:43PM +, Beata Michalska wrote:
> > Hi Klaus,
> >
> > On Tue, 15 Oct 2019 at 11:57, Klaus Jensen  wrote:
> > >
> > > Instead of handling both QSGs and IOVs in multiple places, simply use
> > > QSGs everywhere by assuming that the request does not involve the
> > > controller memory buffer (CMB). If the request is found to involve the
> > > CMB, convert the QSG to an IOV and issue the I/O. The QSG is converted
> > > to an IOV by the dma helpers anyway, so the CMB path is not unfairly
> > > affected by this simplifying change.
> > >
> >
> > Out of curiosity, in how many cases the SG list will have to
> > be converted to IOV ? Does that justify creating the SG list in vain ?
> >
>
> You got me wondering. Only using QSGs does not really remove much
> complexity, so I readded the direct use of IOVs for the CMB path. There
> is no harm in that.
>
> > > +static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
> > > +uint64_t prp2, uint32_t len, NvmeRequest *req)
> > >  {
> > >  hwaddr trans_len = n->page_size - (prp1 % n->page_size);
> > >  trans_len = MIN(len, trans_len);
> > >  int num_prps = (len >> n->page_bits) + 1;
> > > +uint16_t status = NVME_SUCCESS;
> > > +bool prp_list_in_cmb = false;
> > > +
> > > +trace_nvme_map_prp(req->cid, req->cmd.opcode, trans_len, len, prp1, prp2,
> > > +num_prps);
> > >
> > >  if (unlikely(!prp1)) {
> > >  trace_nvme_err_invalid_prp();
> > >  return NVME_INVALID_FIELD | NVME_DNR;
> > > -} else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
> > > -   prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
> > > -qsg->nsg = 0;
> > > -qemu_iovec_init(iov, num_prps);
> > > -qemu_iovec_add(iov, (void *)>cmbuf[prp1 - n->ctrl_mem.addr], trans_len);
> > > -} else {
> > > -pci_dma_sglist_init(qsg, >parent_obj, num_prps);
> > > -qemu_sglist_add(qsg, prp1, trans_len);
> > >  }
> > > +
> > > +if (nvme_addr_is_cmb(n, prp1)) {
> > > +req->is_cmb = true;
> > > +}
> > > +
> > This seems to be used here and within read/write functions which are calling
> > this one. Maybe there is a nicer way to track that instead of passing
> > the request
> > from multiple places ?
> >
>
> Hmm. Whether or not the command reads/writes from the CMB is really only
> something you can determine by looking at the PRPs (which is done in
> nvme_map_prp), so I think this is the right way to track it. Or do you
> have something else in mind?
>

I think what I mean is that it seems this variable is being used only in
functions that call map_prp directly, but in order to set that variable within
the req structure, the request needs to be passed along from the top level of
command submission. Maybe map_prp could instead take an additional param to
mark whether it is a CMB command or not. If that makes sense.
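Roughly what I have in mind, as a hypothetical sketch (all names illustrative, not the actual QEMU API) - CMB residency reported through an out-parameter instead of threading the request down:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* stand-in for the controller state carrying the CMB window */
typedef struct {
    uint64_t cmb_base;
    uint64_t cmb_size;
} Ctrl;

static bool addr_is_cmb(const Ctrl *n, uint64_t addr)
{
    return addr >= n->cmb_base && addr < n->cmb_base + n->cmb_size;
}

/* map_prp reports whether the transfer touches the CMB via *is_cmb,
 * so the caller can set req->is_cmb without map_prp seeing the request */
static int map_prp(const Ctrl *n, uint64_t prp1, bool *is_cmb)
{
    if (!prp1) {
        return -1;              /* NVME_INVALID_FIELD in the real code */
    }
    *is_cmb = addr_is_cmb(n, prp1);
    /* ... the scatter/gather list would be built here as before ... */
    return 0;
}
```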

BR
Beata
> > > +pci_dma_sglist_init(qsg, >parent_obj, num_prps);
> > > +qemu_sglist_add(qsg, prp1, trans_len);
> > > +
> > >  len -= trans_len;
> > >  if (len) {
> > >  if (unlikely(!prp2)) {
> > >  trace_nvme_err_invalid_prp2_missing();
> > > +status = NVME_INVALID_FIELD | NVME_DNR;
> > >  goto unmap;
> > >  }
> > > +
> > >  if (len > n->page_size) {
> > >  uint64_t prp_list[n->max_prp_ents];
> > >  uint32_t nents, prp_trans;
> > >  int i = 0;
> > >
> > > +if (nvme_addr_is_cmb(n, prp2)) {
> > > +prp_list_in_cmb = true;
> > > +}
> > > +
> > >  nents = (len + n->page_size - 1) >> n->page_bits;
> > >  prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
> > > -nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
> > > +nvme_addr_read(n, prp2, (void *) prp_list, prp_trans);
> > >  while (len != 0) {
> > > +bool addr_is_cmb;
> > >  uint64_t prp_ent = le64_to_cpu(prp_list[i]);
> > >
> > > 

Re: [PATCH v2 09/20] nvme: add support for the asynchronous event request command

2019-11-25 Thread Beata Michalska
On Tue, 19 Nov 2019 at 19:51, Klaus Birkelund  wrote:
>
> On Tue, Nov 12, 2019 at 03:04:59PM +, Beata Michalska wrote:
> > Hi Klaus,
> >
> > On Tue, 15 Oct 2019 at 11:49, Klaus Jensen  wrote:
> > > @@ -1188,6 +1326,9 @@ static int nvme_start_ctrl(NvmeCtrl *n)
> > >
> > >  nvme_set_timestamp(n, 0ULL);
> > >
> > > +n->aer_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_aers, n);
> > > +QTAILQ_INIT(>aer_queue);
> > > +
> >
> > Is the timer really needed here ? The CEQ can be posted either when 
> > requested
> > by host through AER, if there are any pending events, or once the
> > event is triggered
> > and there are active AER's.
> >
>
> I guess you are right. I mostly cribbed this from Keith's tree, but I
> see no reason to keep the timer.
>
> Keith, do you have any comments on this?
>
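A timer-less variant could be modeled along these lines (all names assumed, not the actual device code) - the retained-event queue is drained both when an event is enqueued and when a new AER arrives, so no periodic timer is needed:

```c
#include <assert.h>

#define MAX_QUEUED_EVENTS 8

typedef struct {
    int outstanding_aers;   /* AER commands posted by the host */
    int queued_events;      /* events retained by the controller */
    int completed;          /* completions posted back to the host */
} AerState;

/* pair up retained events with outstanding AERs */
static void process_aers(AerState *s)
{
    while (s->outstanding_aers && s->queued_events) {
        s->outstanding_aers--;
        s->queued_events--;
        s->completed++;
    }
}

/* drain at enqueue time... */
static void enqueue_event(AerState *s)
{
    if (s->queued_events < MAX_QUEUED_EVENTS) {
        s->queued_events++;
    }
    process_aers(s);
}

/* ...and when the host posts a new AER */
static void host_posts_aer(AerState *s)
{
    s->outstanding_aers++;
    process_aers(s);
}
```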
> > > @@ -1380,6 +1521,13 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
> > > "completion queue doorbell write"
> > > " for nonexistent queue,"
> > > " sqid=%"PRIu32", ignoring", qid);
> > > +
> > > +if (n->outstanding_aers) {
> > > +nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
> > > +NVME_AER_INFO_ERR_INVALID_DB_REGISTER,
> > > +NVME_LOG_ERROR_INFO);
> > > +}
> > > +
> > This one (as well as cases below) might not be entirely right
> > according to the spec. If given event is enabled for asynchronous
> > reporting the controller should retain that even. In this case, the event
> > will be ignored as there is no pending request.
> >
>
> I understand these notifications to be special cases (i.e. they cannot
> be enabled/disabled through the Asynchronous Event Configuration
> feature). See Section 4.1 of NVM Express 1.2.1. The spec specifically
> says that "... and an Asynchronous Event Request command is outstanding,
> ...).
>

 OK, I have missed that one.
Thanks for the reference.

BR
Beata
> > > @@ -1591,6 +1759,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
> > >  id->ver = cpu_to_le32(0x00010201);
> > >  id->oacs = cpu_to_le16(0);
> > >  id->acl = 3;
> > > +id->aerl = n->params.aerl;
> >
> > What about the configuration for the asynchronous events ?
> >
>
> It will default to an AEC vector of 0 (everything disabled).
>
>
> K



Re: [PATCH v2 12/20] nvme: bump supported specification version to 1.3

2019-11-25 Thread Beata Michalska
On Mon, 18 Nov 2019 at 09:48, Klaus Birkelund  wrote:
>
> On Tue, Nov 12, 2019 at 03:05:06PM +, Beata Michalska wrote:
> > Hi Klaus,
> >
> > On Tue, 15 Oct 2019 at 11:52, Klaus Jensen  wrote:
> > >
> > > +static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeCmd *c)
> > > +{
> > > +static const int len = 4096;
> > > +
> > > +struct ns_descr {
> > > +uint8_t nidt;
> > > +uint8_t nidl;
> > > +uint8_t rsvd2[2];
> > > +uint8_t nid[16];
> > > +};
> > > +
> > > +uint32_t nsid = le32_to_cpu(c->nsid);
> > > +uint64_t prp1 = le64_to_cpu(c->prp1);
> > > +uint64_t prp2 = le64_to_cpu(c->prp2);
> > > +
> > > +struct ns_descr *list;
> > > +uint16_t ret;
> > > +
> > > +trace_nvme_identify_ns_descr_list(nsid);
> > > +
> > > +if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
> > > +trace_nvme_err_invalid_ns(nsid, n->num_namespaces);
> > > +return NVME_INVALID_NSID | NVME_DNR;
> > > +}
> > > +
> > In theory this should abort the command for inactive NSIDs as well.
> > But I guess this will come later on.
> >
>
> At this point in the series, the device does not support multiple
> namespaces anyway and num_namespaces is always 1. But this has also been
> reported seperately in relation the patch adding multiple namespaces and
> is fixed in v3.
>
> > > +list = g_malloc0(len);
> > > +list->nidt = 0x3;
> > > +list->nidl = 0x10;
> > > +*(uint32_t *) >nid[12] = cpu_to_be32(nsid);
> > > +
> > Might be worth adding a comment here -> as per the NGUID/EUI64 format.
> > Also those are not specified currently in the namespace identity data
> > structure.
> >
>
> I'll add a comment for why the Namespace UUID is set to this value here.
> The NGUID/EUI64 fields are not set in the namespace identity data
> structure as they are not required. See the descriptions of NGUID and
> EUI64. Here for NGUID:
>
> "The controller shall specify a globally unique namespace identifier
> in this field, the EUI64 field, or a Namespace UUID in the Namespace
> Identification Descriptor..."
>
> Here, I chose to provide it in the Namespace Identification Descriptor
> (by setting `list->nidt = 0x3`).
>
> > > +ret = nvme_dma_read_prp(n, (uint8_t *) list, len, prp1, prp2);
> > > +g_free(list);
> > > +return ret;
> > > +}
> > > +
> > >  static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
> > >  {
> > >  NvmeIdentify *c = (NvmeIdentify *)cmd;
> > > @@ -934,7 +978,9 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
> > >  case 0x01:
> > >  return nvme_identify_ctrl(n, c);
> > >  case 0x02:
> > > -return nvme_identify_nslist(n, c);
> > > +return nvme_identify_ns_list(n, c);
> > > +case 0x03:
> > > +return nvme_identify_ns_descr_list(n, cmd);
> > >  default:
> > >  trace_nvme_err_invalid_identify_cns(le32_to_cpu(c->cns));
> > >  return NVME_INVALID_FIELD | NVME_DNR;
> > > @@ -1101,6 +1147,14 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> > >  blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
> > >  break;
> > >  case NVME_NUMBER_OF_QUEUES:
> > > +if (n->qs_created > 2) {
> > > +return NVME_CMD_SEQ_ERROR | NVME_DNR;
> > > +}
> > > +
> > I am not sure this is entirely correct, as the spec says:
> > "if any I/O Submission and/or Completion Queues (...)"
> > so a single created queue might be enough for this
> > command to fail.
> > Also I think that the condition here is to make sure that the number
> > of queues requested is set only once, at init phase. Currently this will
> > allow the setting to happen whenever there is no active queue -> so at any
> > point in time (provided the condition mentioned). I might be wrong here,
> > but it seems that what we need is a single flag saying whether any queue
> > has been created prior to the Set Features command at all.
> >
>
> Internally, the admin queue pair is counted in qs_created, which is the
> reason for checking if is above 2. The admin queues are created when the
> controller is enabled (m

Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support

2019-11-22 Thread Beata Michalska
Hi,

On Mon, 18 Nov 2019 at 12:50, gengdongjiu  wrote:
>
> Hi,Igor,
>Thanks for you review and time.
>
> >
> >> +/*
> >> + * Type:
> >> + * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> >> + */
> >> +build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
> >> +/*
> >> + * Source Id
> >
> >> + * Once we support more than one hardware error sources, we need to
> >> + * increase the value of this field.
> > I'm not sure ^^^ is correct, according to spec it's just unique id per
> > distinct error structure, so we just assign arbitrary values to each
> > declared source and that never changes once assigned.
> The source id is used to distinct the error source, for each source, the 
> ‘source id’ is unique,
> but different source has different source id. for example, the 'source id' of 
> the error source 0 is 0,
> the 'source id' of the error source 1 is 1.
>

I might be wrong, but the source id is not a sequence number and it can
have any value as long as it is unique, so the comment re 'increasing the
number' reads a bit wrong.

>
> >
> > For now I'd make source_id an enum with one member
> >   enum {
> > ACPI_HEST_SRC_ID_SEA = 0,
> > /* future ids go here */
> > ACPI_HEST_SRC_ID_RESERVED,
> >   }
> If we only have one error source, we can use enum instead of allocating magic 
> 0.
> But if we have more error source , such as 10 error source. using enum  maybe 
> not a good idea.
>
> for example, if there are 10 error sources, I can just using below loop
>
> for(i=0; i< 10; i++)
>build_ghes_v2(source_id++);
>

You can do that, but using an enum makes it more readable and maintainable.
Also you can keep the source id as a sequence number but still represent that
with an enum, as has been suggested, and use the 'RESERVED' member for
loop control.
I think it might be worth representing the HES type as an enum too:
enum {
ACPI_HES_TYPE_GHESv2 = 10,
...
};
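Something along these lines, as a rough sketch (illustrative names only) - the source ids stay stable once assigned, and the RESERVED sentinel doubles as the loop bound:

```c
#include <assert.h>

enum AcpiHestSourceId {
    ACPI_HEST_SRC_ID_SEA = 0,
    /* future ids go here - already assigned values must not change */
    ACPI_HEST_SRC_ID_RESERVED,
};

/* emit one GHESv2 entry per declared source; the real code would call
 * build_ghes_v2(i) here instead of recording the id */
static int build_all_ghes_v2(int *ids_out)
{
    int n = 0;

    for (int i = 0; i < ACPI_HEST_SRC_ID_RESERVED; i++) {
        ids_out[n++] = i;
    }
    return n;
}
```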

> >
> > and use that instead of allocating magic 0 at the beginning of the function.
> >  build_ghes_v2(ACPI_HEST_GHES_SEA);
> > Also add a comment to declaration that already assigned values are not to 
> > be changed
> >
> >> + */
> >> +build_append_int_noprefix(table_data, source_id, 2);
> >> +/* Related Source Id */
> >> +build_append_int_noprefix(table_data, 0x, 2);
> >> +/* Flags */
> >> +build_append_int_noprefix(table_data, 0, 1);
> >> +/* Enabled */
> >> +build_append_int_noprefix(table_data, 1, 1);
> >> +
> >> +/* Number of Records To Pre-allocate */
> >> +build_append_int_noprefix(table_data, 1, 4);
> >> +/* Max Sections Per Record */
> >> +build_append_int_noprefix(table_data, 1, 4);
> >> +/* Max Raw Data Length */
> >> +build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> >> +
> >> +/* Error Status Address */
> >> +build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> >> + 4 /* QWord access */, 0);
> >> +bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> >> +ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
> > it's fine only if GHESv2 is the only entries in HEST, but once
> > other types are added this macro will silently fall apart and
> > cause table corruption.
> >
> > Instead of offset from hest_start, I suggest to use offset relative
> > to GAS structure, here is an idea
> >
> > #define GAS_ADDR_OFFSET 4
> >
> > off = table->len
> > build_append_gas()
> > bios_linker_loader_add_pointer(...,
> > off + GAS_ADDR_OFFSET, ...
> I think your suggestion is good.
>
> >
> >> +ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> >> +source_id * ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +/*
> >> + * Notification Structure
> >> + * Now only enable ARMv8 SEA notification type
> >> + */
> >> +acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
> >> +
> >> +/* Error Status Block Length */
> >> +build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> >> +
> >> +/*
> >> + * Read Ack Register
> >> + * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> >> + * version 2 (GHESv2 - Type 10)
> >> + */
> >> +build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> >> + 4 /* QWord access */, 0);
> >> +bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> >> +ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, 0),
> > ditto
> >
> >> +ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> >> +(ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +/*
> >> + * Read Ack Preserve
> >> + * We only provide the first bit in Read Ack Register to OSPM to write
> >> + * while the other bits are preserved.
> >> + */
> >> +build_append_int_noprefix(table_data, ~0x1ULL, 8);
> >> +/* 

Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support

2019-11-22 Thread Beata Michalska
Hi Xiang,

On Mon, 11 Nov 2019 at 01:48, Xiang Zheng  wrote:
>
> From: Dongjiu Geng 
>
> This patch implements APEI GHES Table generation via fw_cfg blobs. Now
> it only supports ARMv8 SEA, a type of GHESv2 error source. Afterwards,
> we can extend the supported types if needed. For the CPER section,
> currently it is memory section because kernel mainly wants userspace to
> handle the memory errors.
>
> This patch follows the spec ACPI 6.2 to build the Hardware Error Source
> table. For more detailed information, please refer to document:
> docs/specs/acpi_hest_ghes.rst
>
> Suggested-by: Laszlo Ersek 
> Signed-off-by: Dongjiu Geng 
> Signed-off-by: Xiang Zheng 
> Reviewed-by: Michael S. Tsirkin 
> ---
>  default-configs/arm-softmmu.mak |   1 +
>  hw/acpi/Kconfig |   4 +
>  hw/acpi/Makefile.objs   |   1 +
>  hw/acpi/acpi_ghes.c | 267 
>  hw/acpi/aml-build.c |   2 +
>  hw/arm/virt-acpi-build.c|  12 ++
>  include/hw/acpi/acpi_ghes.h |  56 +++
>  include/hw/acpi/aml-build.h |   1 +
>  8 files changed, 344 insertions(+)
>  create mode 100644 hw/acpi/acpi_ghes.c
>  create mode 100644 include/hw/acpi/acpi_ghes.h
>
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 1f2e0e7fde..5722f3130e 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
>  CONFIG_FSL_IMX7=y
>  CONFIG_FSL_IMX6UL=y
>  CONFIG_SEMIHOSTING=y
> +CONFIG_ACPI_APEI=y
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index 12e3f1e86e..ed8c34d238 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -23,6 +23,10 @@ config ACPI_NVDIMM
>  bool
>  depends on ACPI
>
> +config ACPI_APEI
> +bool
> +depends on ACPI
> +
>  config ACPI_PCI
>  bool
>  depends on ACPI && PCI
> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> index 655a9c1973..84474b0ca8 100644
> --- a/hw/acpi/Makefile.objs
> +++ b/hw/acpi/Makefile.objs
> @@ -5,6 +5,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
>  common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
>  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
>  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> +common-obj-$(CONFIG_ACPI_APEI) += acpi_ghes.o

Minor: The 'acpi' prefix could be dropped - it does not seem to be used
for the other files (it is implied by the dir name).
This also applies to most of the naming within this patch.

>  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
>  common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
>  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> new file mode 100644
> index 00..42c00ff3d3
> --- /dev/null
> +++ b/hw/acpi/acpi_ghes.c
> @@ -0,0 +1,267 @@
> +/*
> + * Support for generating APEI tables and recording CPER for Guests
> + *
> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * Author: Dongjiu Geng 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/acpi_ghes.h"
> +#include "hw/nvram/fw_cfg.h"
> +#include "sysemu/sysemu.h"
> +#include "qemu/error-report.h"
> +
> +#define ACPI_GHES_ERRORS_FW_CFG_FILE"etc/hardware_errors"
> +#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
> +
> +/*
> + * The size of Address field in Generic Address Structure.
> + * ACPI 2.0/3.0: 5.2.3.1 Generic Address Structure.
> + */
> +#define ACPI_GHES_ADDRESS_SIZE  8
> +
As already mentioned, you can safely drop this and use sizeof(uint64_t).

> +/* The max size in bytes for one error block */
> +#define ACPI_GHES_MAX_RAW_DATA_LENGTH   0x1000
> +
> +/*
> + * Now only support ARMv8 SEA notification type error source
> + */
> +#define ACPI_GHES_ERROR_SOURCE_COUNT1
> +
> +/*
> + * Generic Hardware Error Source version 2
> + */
> +#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10

Minor: this is actually a type, so it would be good if the name reflected
that somehow.

> +
> +/*
> + * | +--+ 0
> + * | |Header|
> + * | +--+ 40---+-
> + * | | .|  |
> + * | | 

Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-11-22 Thread Beata Michalska
Hi,

On Mon, 11 Nov 2019 at 01:48, Xiang Zheng  wrote:
>
> From: Dongjiu Geng 
>
> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> translates the host VA delivered by host to guest PA, then fills this PA
> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> type.
>
> When guest accesses the poisoned memory, it will generate a Synchronous
> External Abort(SEA). Then host kernel gets an APEI notification and calls
> memory_failure() to unmapped the affected page in stage 2, finally
> returns to guest.
>
> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> Qemu, Qemu records this error address into guest APEI GHES memory and
> notifes guest using Synchronous-External-Abort(SEA).
>
> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> in which we can setup the type of exception and the syndrome information.
> When switching to guest, the target vcpu will jump to the synchronous
> external abort vector table entry.
>
> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> not valid and hold an UNKNOWN value. These values will be set to KVM
> register structures through KVM_SET_ONE_REG IOCTL.
>
> Signed-off-by: Dongjiu Geng 
> Signed-off-by: Xiang Zheng 
> Reviewed-by: Michael S. Tsirkin 
> ---
>  hw/acpi/acpi_ghes.c | 297 
>  include/hw/acpi/acpi_ghes.h |   4 +
>  include/sysemu/kvm.h|   3 +-
>  target/arm/cpu.h|   4 +
>  target/arm/helper.c |   2 +-
>  target/arm/internals.h  |   5 +-
>  target/arm/kvm64.c  |  64 
>  target/arm/tlb_helper.c |   2 +-
>  target/i386/cpu.h   |   2 +
>  9 files changed, 377 insertions(+), 6 deletions(-)
>
> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> index 42c00ff3d3..f5b54990c0 100644
> --- a/hw/acpi/acpi_ghes.c
> +++ b/hw/acpi/acpi_ghes.c
> @@ -39,6 +39,34 @@
>  /* The max size in bytes for one error block */
>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   0x1000
>
> +/*
> + * The total size of Generic Error Data Entry
> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> + * Table 18-343 Generic Error Data Entry
> + */
> +#define ACPI_GHES_DATA_LENGTH   72
> +
> +/*
> + * The memory section CPER size,
> + * UEFI 2.6: N.2.5 Memory Error Section
> + */
> +#define ACPI_GHES_MEM_CPER_LENGTH   80
> +
> +/*
> + * Masks for block_status flags
> + */
> +#define ACPI_GEBS_UNCORRECTABLE 1

Why not list all the supported statuses? Similar to the error severity values below?

> +
> +/*
> + * Values for error_severity field
> + */
> +enum AcpiGenericErrorSeverity {
> +ACPI_CPER_SEV_RECOVERABLE,
> +ACPI_CPER_SEV_FATAL,
> +ACPI_CPER_SEV_CORRECTED,
> +ACPI_CPER_SEV_NONE,
> +};
> +
>  /*
>   * Now only support ARMv8 SEA notification type error source
>   */
> @@ -49,6 +77,16 @@
>   */
>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>
> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)\
> > +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> +((b) >> 8) & 0xff, (b) & 0xff,   \
> +((c) >> 8) & 0xff, (c) & 0xff,\
> +(d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> +
> +#define UEFI_CPER_SEC_PLATFORM_MEM   \
> +UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> +0xED, 0x7C, 0x83, 0xB1)
> +
>  /*
>   * | +--+ 0
>   * | |Header|
> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>  uint64_t ghes_addr_le;
>  } AcpiGhesState;
>
> +/*
> + * Total size for Generic Error Status Block
> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> + * Table 18-380 Generic Error Status Block
> + */
> +#define ACPI_GHES_GESB_SIZE 20

Minor: This is not entirely correct: the GEDE is part of the GESB, so the total
length would be ACPI_GHES_GESB_SIZE + n * sizeof(GEDE).

> +/* The offset of Data Length in Generic Error Status Block */
> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
> +

If those were nicely represented as structures, you would get the offsets
easily without needing a number of defines. That could simplify the code and
make it more readable - see comments below.

> +/*
> + * Record the value of data length for each error status block to avoid 
> getting
> + * this value from guest.
> + */
> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> +
> +/*
> + * Generic Error Data Entry
> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> + */
> > +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> +uint32_t error_severity, uint16_t revision,
> +uint8_t validation_bits, uint8_t flags,
> +uint32_t error_data_length, QemuUUID fru_id,
> +   

Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-11-22 Thread Beata Michalska
Hi,

On Fri, 15 Nov 2019 at 16:54, Igor Mammedov  wrote:
>
> On Mon, 11 Nov 2019 09:40:47 +0800
> Xiang Zheng  wrote:
>
> > From: Dongjiu Geng 
> >
> > Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> > translates the host VA delivered by host to guest PA, then fills this PA
> > to guest APEI GHES memory, then notifies guest according to the SIGBUS
> > type.
> >
> > When guest accesses the poisoned memory, it will generate a Synchronous
> > External Abort(SEA). Then host kernel gets an APEI notification and calls
> > memory_failure() to unmapped the affected page in stage 2, finally
> > returns to guest.
> >
> > Guest continues to access the PG_hwpoison page, it will trap to KVM as
> > stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> > Qemu, Qemu records this error address into guest APEI GHES memory and
> > notifes guest using Synchronous-External-Abort(SEA).
> >
> > In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> > in which we can setup the type of exception and the syndrome information.
> > When switching to guest, the target vcpu will jump to the synchronous
> > external abort vector table entry.
> >
> > The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> > ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> > not valid and hold an UNKNOWN value. These values will be set to KVM
> > register structures through KVM_SET_ONE_REG IOCTL.
> >
> > Signed-off-by: Dongjiu Geng 
> > Signed-off-by: Xiang Zheng 
> > Reviewed-by: Michael S. Tsirkin 
> > ---
> >  hw/acpi/acpi_ghes.c | 297 
> >  include/hw/acpi/acpi_ghes.h |   4 +
> >  include/sysemu/kvm.h|   3 +-
> >  target/arm/cpu.h|   4 +
> >  target/arm/helper.c |   2 +-
> >  target/arm/internals.h  |   5 +-
> >  target/arm/kvm64.c  |  64 
> >  target/arm/tlb_helper.c |   2 +-
> >  target/i386/cpu.h   |   2 +
> >  9 files changed, 377 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> > index 42c00ff3d3..f5b54990c0 100644
> > --- a/hw/acpi/acpi_ghes.c
> > +++ b/hw/acpi/acpi_ghes.c
> > @@ -39,6 +39,34 @@
> >  /* The max size in bytes for one error block */
> >  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   0x1000
> >
> > +/*
> > + * The total size of Generic Error Data Entry
> > + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> > + * Table 18-343 Generic Error Data Entry
> > + */
> > +#define ACPI_GHES_DATA_LENGTH   72
> > +
> > +/*
> > + * The memory section CPER size,
> > + * UEFI 2.6: N.2.5 Memory Error Section
> > + */
> maybe use one line comment
>
> > +#define ACPI_GHES_MEM_CPER_LENGTH   80
> > +
> > +/*
> > + * Masks for block_status flags
> > + */
> ditto
>
> > +#define ACPI_GEBS_UNCORRECTABLE 1
> > +
> > +/*
> > + * Values for error_severity field
> > + */
> ditto
>
> > +enum AcpiGenericErrorSeverity {
> > +ACPI_CPER_SEV_RECOVERABLE,
> > +ACPI_CPER_SEV_FATAL,
> > +ACPI_CPER_SEV_CORRECTED,
> > +ACPI_CPER_SEV_NONE,
> I'd assign values explicitly here
>   foo = x,
>   ...
>
> > +};
> > +
> >  /*
> >   * Now only support ARMv8 SEA notification type error source
> >   */
> > @@ -49,6 +77,16 @@
> >   */
> >  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >
> > +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)\
> > > +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> > +((b) >> 8) & 0xff, (b) & 0xff,   \
> > +((c) >> 8) & 0xff, (c) & 0xff,\
> > +(d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> > +
> > +#define UEFI_CPER_SEC_PLATFORM_MEM   \
> > +UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> > +0xED, 0x7C, 0x83, 0xB1)
> > +
> >  /*
> >   * | +--+ 0
> >   * | |Header|
> > @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
> >  uint64_t ghes_addr_le;
> >  } AcpiGhesState;
> >
> > +/*
> > + * Total size for Generic Error Status Block
> > + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> > + * Table 18-380 Generic Error Status Block
> > + */
> > +#define ACPI_GHES_GESB_SIZE 20
>
> > +/* The offset of Data Length in Generic Error Status Block */
> > +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
>
> unused, drop it
>
> > +
> > +/*
> > + * Record the value of data length for each error status block to avoid getting
> > + * this value from guest.
> > + */
> > +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> > +
> > +/*
> > + * Generic Error Data Entry
> > + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> > + */
> > +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> > +uint32_t error_severity, uint16_t revision,
> > +uint8_t validation_bits, uint8_t flags,
> > +

[PATCH v3 4/4] target/arm: Add support for DC CVAP & DC CVADP ins

2019-11-20 Thread Beata Michalska
ARMv8.2 introduced support for Data Cache Clean instructions
to PoP (point-of-persistence) - DC CVAP - and to PoDP
(point-of-deep-persistence) - DC CVADP. Both specify conceptual points in a
memory system where all writes that reach them are considered persistent.
The support provided treats both points as the same, so there is no
distinction between the two. If no such point is available (there is no
backing store for the given memory), both will result in a Data Cache Clean
up to the point of coherency. Otherwise, a sync is performed for the
specified range.

Signed-off-by: Beata Michalska 
Reviewed-by: Richard Henderson 
---
 linux-user/elfload.c |  2 ++
 target/arm/cpu.h | 10 ++
 target/arm/cpu64.c   |  1 +
 target/arm/helper.c  | 56 
 4 files changed, 69 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index f6693e5..07b16cc 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -656,6 +656,7 @@ static uint32_t get_elf_hwcap(void)
 GET_FEATURE_ID(aa64_jscvt, ARM_HWCAP_A64_JSCVT);
 GET_FEATURE_ID(aa64_sb, ARM_HWCAP_A64_SB);
 GET_FEATURE_ID(aa64_condm_4, ARM_HWCAP_A64_FLAGM);
+GET_FEATURE_ID(aa64_dcpop, ARM_HWCAP_A64_DCPOP);
 
 return hwcaps;
 }
@@ -665,6 +666,7 @@ static uint32_t get_elf_hwcap2(void)
 ARMCPU *cpu = ARM_CPU(thread_cpu);
 uint32_t hwcaps = 0;
 
+GET_FEATURE_ID(aa64_dcpodp, ARM_HWCAP2_A64_DCPODP);
 GET_FEATURE_ID(aa64_condm_5, ARM_HWCAP2_A64_FLAGM2);
 GET_FEATURE_ID(aa64_frint, ARM_HWCAP2_A64_FRINT);
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 83a809d..c3c0bf5 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3616,6 +3616,16 @@ static inline bool isar_feature_aa64_frint(const ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FRINTTS) != 0;
 }
 
+static inline bool isar_feature_aa64_dcpop(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) != 0;
+}
+
+static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >= 2;
+}
+
 static inline bool isar_feature_aa64_fp16(const ARMISARegisters *id)
 {
 /* We always set the AdvSIMD and FP fields identically wrt FP16.  */
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index a39d6fc..61fd0ad 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -646,6 +646,7 @@ static void aarch64_max_initfn(Object *obj)
 cpu->isar.id_aa64isar0 = t;
 
 t = cpu->isar.id_aa64isar1;
+t = FIELD_DP64(t, ID_AA64ISAR1, DPB, 2);
 t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, APA, 1); /* PAuth, architected only */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index a089fb5..f90f3ec 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5929,6 +5929,52 @@ static const ARMCPRegInfo rndr_reginfo[] = {
   .access = PL0_R, .readfn = rndr_readfn },
 REGINFO_SENTINEL
 };
+
+#ifndef CONFIG_USER_ONLY
+static void dccvap_writefn(CPUARMState *env, const ARMCPRegInfo *opaque,
+  uint64_t value)
+{
+ARMCPU *cpu = env_archcpu(env);
+/* CTR_EL0 System register -> DminLine, bits [19:16] */
+uint64_t dline_size = 4 << ((cpu->ctr >> 16) & 0xF);
+uint64_t vaddr_in = (uint64_t) value;
+uint64_t vaddr = vaddr_in & ~(dline_size - 1);
+void *haddr;
+int mem_idx = cpu_mmu_index(env, false);
+
+/* This won't be crossing page boundaries */
+haddr = probe_read(env, vaddr, dline_size, mem_idx, GETPC());
+if (haddr) {
+
+ram_addr_t offset;
+MemoryRegion *mr;
+
+/* RCU lock is already being held */
+mr = memory_region_from_host(haddr, &offset);
+
+if (mr) {
+memory_region_do_writeback(mr, offset, dline_size);
+}
+}
+}
+
+static const ARMCPRegInfo dcpop_reg[] = {
+{ .name = "DC_CVAP", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 1,
+  .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
+  .accessfn = aa64_cacheop_access, .writefn = dccvap_writefn },
+REGINFO_SENTINEL
+};
+
+static const ARMCPRegInfo dcpodp_reg[] = {
+{ .name = "DC_CVADP", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 1,
+  .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
+  .accessfn = aa64_cacheop_access, .writefn = dccvap_writefn },
+REGINFO_SENTINEL
+};
+#endif /*CONFIG_USER_ONLY*/
+
 #endif
 
 static CPAccessResult access_predinv(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -6889,6 +6935,16 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 if (cpu_isar_feature(aa64_rndr, cpu)) {
 define_arm_cp_regs(cpu, rndr_reginfo);
 }
+#ifndef 

[PATCH v3 3/4] migration: ram: Switch to ram block writeback

2019-11-20 Thread Beata Michalska
Switch to ram block writeback for pmem migration.

Signed-off-by: Beata Michalska 
Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Acked-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5078f94..38070f1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -33,7 +33,6 @@
 #include "qemu/bitops.h"
 #include "qemu/bitmap.h"
 #include "qemu/main-loop.h"
-#include "qemu/pmem.h"
 #include "xbzrle.h"
 #include "ram.h"
 #include "migration.h"
@@ -3981,9 +3980,7 @@ static int ram_load_cleanup(void *opaque)
 RAMBlock *rb;
 
 RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
-if (ramblock_is_pmem(rb)) {
-pmem_persist(rb->host, rb->used_length);
-}
+qemu_ram_block_writeback(rb);
 }
 
 xbzrle_load_cleanup();
-- 
2.7.4




[PATCH v3 2/4] Memory: Enable writeback for given memory region

2019-11-20 Thread Beata Michalska
Add an option to trigger memory writeback to sync a given memory region
with the corresponding backing store, if one is available.
This extends the support for persistent memory, allowing syncing on demand.

Signed-off-by: Beata Michalska 
---
 exec.c  | 36 
 include/exec/memory.h   |  6 ++
 include/exec/ram_addr.h |  8 
 include/qemu/cutils.h   |  1 +
 memory.c| 12 
 util/cutils.c   | 38 ++
 6 files changed, 101 insertions(+)

diff --git a/exec.c b/exec.c
index ffdb518..a34c348 100644
--- a/exec.c
+++ b/exec.c
@@ -65,6 +65,8 @@
 #include "exec/ram_addr.h"
 #include "exec/log.h"
 
+#include "qemu/pmem.h"
+
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -2156,6 +2158,40 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 return 0;
 }
 
+/*
+ * Trigger sync on the given ram block for range [start, start + length]
+ * with the backing store if one is available.
+ * Otherwise no-op.
+ * @Note: this is supposed to be a synchronous op.
+ */
+void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length)
+{
+void *addr = ramblock_ptr(block, start);
+
+/* The requested range should fit within the block range */
+g_assert((start + length) <= block->used_length);
+
+#ifdef CONFIG_LIBPMEM
+/* The lack of support for pmem should not block the sync */
+if (ramblock_is_pmem(block)) {
+pmem_persist(addr, length);
+return;
+}
+#endif
+if (block->fd >= 0) {
+/**
+ * In case there is no support for PMEM, or the memory has not been
+ * specified as persistent (or is not), fall back to msync.
+ * Less optimal, but it still achieves the same goal.
+ */
+if (qemu_msync(addr, length, block->fd)) {
+warn_report("%s: failed to sync memory range: start: "
+RAM_ADDR_FMT " length: " RAM_ADDR_FMT,
+__func__, start, length);
+}
+}
+}
+
 /* Called with ram_list.mutex held */
 static void dirty_memory_extend(ram_addr_t old_ram_size,
 ram_addr_t new_ram_size)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e499dc2..27a84e0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1265,6 +1265,12 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr);
  */
 void memory_region_ram_resize(MemoryRegion *mr, ram_addr_t newsize,
   Error **errp);
+/**
+ * memory_region_do_writeback: Trigger writeback for selected address range
+ * [addr, addr + size]
+ *
+ */
+void memory_region_do_writeback(MemoryRegion *mr, hwaddr addr, hwaddr size);
 
 /**
  * memory_region_set_log: Turn dirty logging on or off for a region.
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index bed0554..5adebb0 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -174,6 +174,14 @@ void qemu_ram_free(RAMBlock *block);
 
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp);
 
+void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length);
+
+/* Write back the whole block of memory */
+static inline void qemu_ram_block_writeback(RAMBlock *block)
+{
+qemu_ram_writeback(block, 0, block->used_length);
+}
+
 #define DIRTY_CLIENTS_ALL ((1 << DIRTY_MEMORY_NUM) - 1)
 #define DIRTY_CLIENTS_NOCODE  (DIRTY_CLIENTS_ALL & ~(1 << DIRTY_MEMORY_CODE))
 
diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index b54c847..eb59852 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -130,6 +130,7 @@ const char *qemu_strchrnul(const char *s, int c);
 #endif
 time_t mktimegm(struct tm *tm);
 int qemu_fdatasync(int fd);
+int qemu_msync(void *addr, size_t length, int fd);
 int fcntl_setfl(int fd, int flag);
 int qemu_parse_fd(const char *param);
 int qemu_strtoi(const char *nptr, const char **endptr, int base,
diff --git a/memory.c b/memory.c
index 06484c2..0228cad 100644
--- a/memory.c
+++ b/memory.c
@@ -2207,6 +2207,18 @@ void memory_region_ram_resize(MemoryRegion *mr, ram_addr_t newsize, Error **errp
 qemu_ram_resize(mr->ram_block, newsize, errp);
 }
 
+
+void memory_region_do_writeback(MemoryRegion *mr, hwaddr addr, hwaddr size)
+{
+/*
+ * Might be extended, if needed, to cover
+ * different types of memory regions
+ */
+if (mr->ram_block && mr->dirty_log_mask) {
+qemu_ram_writeback(mr->ram_block, addr, size);
+}
+}
+
 /*
  * Call proper memory listeners about the change on the newly
  * added/removed CoalescedMemoryRange.
diff --git a/util/cutils.c b/util/cutils.c
index fd591ca..c76ed88 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -164,6 +164,44 @@ int qemu_fdatasync(int fd)
 #endif
 }
 
+/**
+ * Sync changes made to the 

[PATCH v3 1/4] tcg: cputlb: Add probe_read

2019-11-20 Thread Beata Michalska
Add probe_read alongside the write probing equivalent.

Signed-off-by: Beata Michalska 
Reviewed-by: Alex Bennée 
---
 include/exec/exec-all.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d85e610..350c4b4 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -339,6 +339,12 @@ static inline void *probe_write(CPUArchState *env, 
target_ulong addr, int size,
 return probe_access(env, addr, size, MMU_DATA_STORE, mmu_idx, retaddr);
 }
 
+static inline void *probe_read(CPUArchState *env, target_ulong addr, int size,
+   int mmu_idx, uintptr_t retaddr)
+{
+return probe_access(env, addr, size, MMU_DATA_LOAD, mmu_idx, retaddr);
+}
+
#define CODE_GEN_ALIGN   16 /* must be >= of the size of a icache line */
 
 /* Estimated block size for TB allocation.  */
-- 
2.7.4




[PATCH v3 0/4] target/arm: Support for Data Cache Clean up to PoP

2019-11-20 Thread Beata Michalska
ARMv8.2 introduced support for Data Cache Clean instructions to PoP
(point-of-persistence) and PoDP (point-of-deep-persistence):
ARMv8.2-DCCVAP &  ARMv8.2-DCCVADP respectively.
This patch set adds support for emulating both, though there is no
distinction between the two points: the PoDP is assumed to represent
the same point of persistence as PoP. If there is no such point specified
for the considered memory system, both will fall back to the DC CVAC
instruction (clean up to the point of coherency).
The changes introduced include adding probe_read to validate the mandatory
read memory access required by both cache clean instructions, along with
support for writing back requested memory regions through msync, where
available, falling back otherwise to fdatasync.

As currently the virt platform is missing support for NVDIMM,
the changes have been tested with [1] & [2]


[1] https://patchwork.kernel.org/cover/10830237/
[2] https://patchwork.kernel.org/project/qemu-devel/list/?series=159441

v3:
- Assert on invalid sync range for ram block
- Drop alignment handling from qemu_msync

v2:
- Moved the msync into a qemu wrapper with
  CONFIG_POSIX switch + additional comments
- Fixed length alignment
- Dropped treating the DC CVAP/CVADP as special case
  and moved those to conditional registration
- Dropped needless locking for grabbing mem region


Beata Michalska (4):
  tcg: cputlb: Add probe_read
  Memory: Enable writeback for given memory region
  migration: ram: Switch to ram block writeback
  target/arm: Add support for DC CVAP & DC CVADP ins

 exec.c  | 36 +++
 include/exec/exec-all.h |  6 ++
 include/exec/memory.h   |  6 ++
 include/exec/ram_addr.h |  8 +++
 include/qemu/cutils.h   |  1 +
 linux-user/elfload.c|  2 ++
 memory.c| 12 +++
 migration/ram.c |  5 +
 target/arm/cpu.h| 10 +
 target/arm/cpu64.c  |  1 +
 target/arm/helper.c | 56 +
 util/cutils.c   | 38 +
 12 files changed, 177 insertions(+), 4 deletions(-)

-- 
2.7.4




Re: [PATCH v2 06/20] nvme: add support for the abort command

2019-11-15 Thread Beata Michalska
Hi Klaus,

On Wed, 13 Nov 2019 at 06:12, Klaus Birkelund  wrote:
>
> On Tue, Nov 12, 2019 at 03:04:38PM +, Beata Michalska wrote:
> > Hi Klaus
> >
>
> Hi Beata,
>
> Thank you very much for your thorough reviews! I'll start going through
> them one by one :) You might have seen that I've posted a v3, but I will
> make sure to consolidate between v2 and v3!
>
> > On Tue, 15 Oct 2019 at 11:41, Klaus Jensen  wrote:
> > >
> > > Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> > > Section 5.1 ("Abort command").
> > >
> > > The Abort command is a best effort command; for now, the device always
> > > fails to abort the given command.
> > >
> > > Signed-off-by: Klaus Jensen 
> > > ---
> > >  hw/block/nvme.c | 16 
> > >  1 file changed, 16 insertions(+)
> > >
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index daa2367b0863..84e4f2ea7a15 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -741,6 +741,18 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
> > >  }
> > >  }
> > >
> > > +static uint16_t nvme_abort(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> > > +{
> > > +uint16_t sqid = le32_to_cpu(cmd->cdw10) & 0xffff;
> > > +
> > > +req->cqe.result = 1;
> > > +if (nvme_check_sqid(n, sqid)) {
> > > +return NVME_INVALID_FIELD | NVME_DNR;
> > > +}
> > > +
> > Shouldn't we validate the CID as well ?
> >
>
> According to the specification it is "implementation specific if/when a
> controller chooses to complete the command when the command to abort is
> not found".
>
> I'm interpreting this to mean that, yes, an invalid command identifier
> could be given in the command, but this implementation does not care
> about that.
>
> I still think the controller should check the validity of the submission
> queue identifier though. It is a general invariant that the sqid should
> be valid.
>
> > > +return NVME_SUCCESS;
> > > +}
> > > +
> > >  static inline void nvme_set_timestamp(NvmeCtrl *n, uint64_t ts)
> > >  {
> > >  trace_nvme_setfeat_timestamp(ts);
> > > @@ -859,6 +871,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> > >  trace_nvme_err_invalid_setfeat(dw10);
> > >  return NVME_INVALID_FIELD | NVME_DNR;
> > >  }
> > > +
> > >  return NVME_SUCCESS;
> > >  }
> > >
> > > @@ -875,6 +888,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> > >  return nvme_create_cq(n, cmd);
> > >  case NVME_ADM_CMD_IDENTIFY:
> > >  return nvme_identify(n, cmd);
> > > +case NVME_ADM_CMD_ABORT:
> > > +return nvme_abort(n, cmd, req);
> > >  case NVME_ADM_CMD_SET_FEATURES:
> > >  return nvme_set_feature(n, cmd, req);
> > >  case NVME_ADM_CMD_GET_FEATURES:
> > > @@ -1388,6 +1403,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> > >  id->ieee[2] = 0xb3;
> > >  id->ver = cpu_to_le32(0x00010201);
> > >  id->oacs = cpu_to_le16(0);
> > > +id->acl = 3;
> > So we are setting the max number of concurrent commands
> > but there is no logic to enforce that and wrap up with the
> > status suggested by specification.
> >
>
> That is true, but because the controller always completes the Abort
> command immediately this cannot happen. If the controller did try to
> abort executing commands, the Abort command would need to linger in the
> controller state until a completion queue entry is posted for the
> command to be aborted before the completion queue entry can be posted
> for the Abort command. This takes up resources in the controller and is
> the reason for the Abort Command Limit.
>
> You could argue that we should set ACL to 0 then, but the specification
> recommends a value of 3 and I do not see any harm in conveying a
> "reasonable", though inconsequential, value.

Could we potentially add some comment describing the above?

BR
Beata



Re: [PATCH v2 15/20] nvme: add support for scatter gather lists

2019-11-12 Thread Beata Michalska
Hi Klaus,

On Tue, 15 Oct 2019 at 11:57, Klaus Jensen  wrote:
>
> For now, support the Data Block, Segment and Last Segment descriptor
> types.
>
> See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").
>
> Signed-off-by: Klaus Jensen 
> ---
>  block/nvme.c  |  18 +-
>  hw/block/nvme.c   | 380 --
>  hw/block/trace-events |   3 +
>  include/block/nvme.h  |  62 ++-
>  4 files changed, 398 insertions(+), 65 deletions(-)
>
> diff --git a/block/nvme.c b/block/nvme.c
> index 5be3a39b632e..8825c19c72c2 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -440,7 +440,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
>  error_setg(errp, "Cannot map buffer for DMA");
>  goto out;
>  }
> -cmd.prp1 = cpu_to_le64(iova);
> +cmd.dptr.prp.prp1 = cpu_to_le64(iova);
>
>  if (nvme_cmd_sync(bs, s->queues[0], &cmd)) {
>  error_setg(errp, "Failed to identify controller");
> @@ -529,7 +529,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
>  }
>  cmd = (NvmeCmd) {
>  .opcode = NVME_ADM_CMD_CREATE_CQ,
> -.prp1 = cpu_to_le64(q->cq.iova),
> +.dptr.prp.prp1 = cpu_to_le64(q->cq.iova),
>  .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0xffff)),
>  .cdw11 = cpu_to_le32(0x3),
>  };
> @@ -540,7 +540,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
>  }
>  cmd = (NvmeCmd) {
>  .opcode = NVME_ADM_CMD_CREATE_SQ,
> -.prp1 = cpu_to_le64(q->sq.iova),
> +.dptr.prp.prp1 = cpu_to_le64(q->sq.iova),
>  .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0xffff)),
>  .cdw11 = cpu_to_le32(0x1 | (n << 16)),
>  };
> @@ -889,16 +889,16 @@ try_map:
>  case 0:
>  abort();
>  case 1:
> -cmd->prp1 = pagelist[0];
> -cmd->prp2 = 0;
> +cmd->dptr.prp.prp1 = pagelist[0];
> +cmd->dptr.prp.prp2 = 0;
>  break;
>  case 2:
> -cmd->prp1 = pagelist[0];
> -cmd->prp2 = pagelist[1];
> +cmd->dptr.prp.prp1 = pagelist[0];
> +cmd->dptr.prp.prp2 = pagelist[1];
>  break;
>  default:
> -cmd->prp1 = pagelist[0];
> -cmd->prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
> +cmd->dptr.prp.prp1 = pagelist[0];
> +cmd->dptr.prp.prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
>  break;
>  }
>  trace_nvme_cmd_map_qiov(s, cmd, req, qiov, entries);
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index f4b9bd36a04e..0a5cd079df9a 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -296,6 +296,198 @@ unmap:
>  return status;
>  }
>
> +static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg,
> +NvmeSglDescriptor *segment, uint64_t nsgld, uint32_t *len,
> +NvmeRequest *req)
> +{
> +dma_addr_t addr, trans_len;
> +
> +for (int i = 0; i < nsgld; i++) {
> +if (NVME_SGL_TYPE(segment[i].type) != SGL_DESCR_TYPE_DATA_BLOCK) {
> +trace_nvme_err_invalid_sgl_descriptor(req->cid,
> +NVME_SGL_TYPE(segment[i].type));
> +return NVME_SGL_DESCRIPTOR_TYPE_INVALID | NVME_DNR;
> +}
> +
> +if (*len == 0) {
> +if (!NVME_CTRL_SGLS_EXCESS_LENGTH(n->id_ctrl.sgls)) {
> +trace_nvme_err_invalid_sgl_excess_length(req->cid);
> +return NVME_DATA_SGL_LENGTH_INVALID | NVME_DNR;
> +}
> +
> +break;
> +}
> +
> +addr = le64_to_cpu(segment[i].addr);
> +trans_len = MIN(*len, le64_to_cpu(segment[i].len));
> +
> +if (nvme_addr_is_cmb(n, addr)) {
> +/*
> + * All data and metadata, if any, associated with a particular
> + * command shall be located in either the CMB or host memory. Thus,
> + * if an address is found to be in the CMB and we have already
> + * mapped data that is in host memory, the use is invalid.
> + */
> +if (!nvme_req_is_cmb(req) && qsg->size) {
> +return NVME_INVALID_USE_OF_CMB | NVME_DNR;
> +}
> +
> +nvme_req_set_cmb(req);
> +} else {
> +/*
> + * Similarly, if the address does not reference the CMB, but we
> + * have already established that the request has data or metadata
> + * in the CMB, the use is invalid.
> + */
> +if (nvme_req_is_cmb(req)) {
> +return NVME_INVALID_USE_OF_CMB | NVME_DNR;
> +}
> +}
> +
> +qemu_sglist_add(qsg, addr, trans_len);
> +
> +*len -= trans_len;
> +}
> +
> +return NVME_SUCCESS;
> +}
> +
> +static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg,
> +NvmeSglDescriptor sgl, uint32_t len, NvmeRequest *req)
> +{
> +const int MAX_NSGLD = 256;
> +
> +

Re: [PATCH v2 14/20] nvme: allow multiple aios per command

2019-11-12 Thread Beata Michalska
Hi Klaus,

On Tue, 15 Oct 2019 at 11:55, Klaus Jensen  wrote:
>
> This refactors how the device issues asynchronous block backend
> requests. The NvmeRequest now holds a queue of NvmeAIOs that are
> associated with the command. This allows multiple aios to be issued for
> a command. Only when all requests have been completed will the device
> post a completion queue entry.
>
> Because the device is currently guaranteed to only issue a single aio
> request per command, the benefit is not immediately obvious. But this
> functionality is required to support metadata.
>
> Signed-off-by: Klaus Jensen 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c   | 455 +-
>  hw/block/nvme.h   | 165 ---
>  hw/block/trace-events |   8 +
>  3 files changed, 511 insertions(+), 117 deletions(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index cbc0b6a660b6..f4b9bd36a04e 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -25,6 +25,8 @@
>   *  Default: 64
>   *   cmb_size_mb= : Size of Controller Memory Buffer in MBs.
>   *  Default: 0 (disabled)
> + *   mdts= : Maximum Data Transfer Size (power of two)
> + *  Default: 7
>   */
>
>  #include "qemu/osdep.h"
> @@ -56,6 +58,7 @@
>  } while (0)
>
>  static void nvme_process_sq(void *opaque);
> +static void nvme_aio_cb(void *opaque, int ret);
>
>  static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
>  {
> @@ -197,7 +200,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
>  }
>
>  if (nvme_addr_is_cmb(n, prp1)) {
> -req->is_cmb = true;
> +nvme_req_set_cmb(req);
>  }
>
>  pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
> @@ -255,8 +258,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
>  }
>
>  addr_is_cmb = nvme_addr_is_cmb(n, prp_ent);
> -if ((req->is_cmb && !addr_is_cmb) ||
> -(!req->is_cmb && addr_is_cmb)) {
> +if ((nvme_req_is_cmb(req) && !addr_is_cmb) ||
> +(!nvme_req_is_cmb(req) && addr_is_cmb)) {
>  status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
>  goto unmap;
>  }
> @@ -269,8 +272,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
>  }
>  } else {
>  bool addr_is_cmb = nvme_addr_is_cmb(n, prp2);
> -if ((req->is_cmb && !addr_is_cmb) ||
> -(!req->is_cmb && addr_is_cmb)) {
> +if ((nvme_req_is_cmb(req) && !addr_is_cmb) ||
> +(!nvme_req_is_cmb(req) && addr_is_cmb)) {
>  status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
>  goto unmap;
>  }
> @@ -312,7 +315,7 @@ static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
>  return status;
>  }
>
> -if (req->is_cmb) {
> +if (nvme_req_is_cmb(req)) {
>  QEMUIOVector iov;
>
>  qemu_iovec_init(&iov, qsg.nsg);
> @@ -341,19 +344,18 @@ static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
Any reason why the nvme_dma_write_prp is missing the changes applied
to nvme_dma_read_prp ?

>  static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
>  uint64_t prp1, uint64_t prp2, NvmeRequest *req)
>  {
> -QEMUSGList qsg;
>  uint16_t status = NVME_SUCCESS;
>
> -status = nvme_map_prp(n, &qsg, prp1, prp2, len, req);
> +status = nvme_map_prp(n, &req->qsg, prp1, prp2, len, req);
>  if (status) {
>  return status;
>  }
>
> -if (req->is_cmb) {
> +if (nvme_req_is_cmb(req)) {
>  QEMUIOVector iov;
>
> -qemu_iovec_init(&iov, qsg.nsg);
> -dma_to_cmb(n, &qsg, &iov);
> +qemu_iovec_init(&iov, req->qsg.nsg);
> +dma_to_cmb(n, &req->qsg, &iov);
>
>  if (unlikely(qemu_iovec_from_buf(&iov, 0, ptr, len) != len)) {
>  trace_nvme_err_invalid_dma();
> @@ -365,17 +367,137 @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
>  goto out;
>  }
>
> -if (unlikely(dma_buf_read(ptr, len, &qsg))) {
> +if (unlikely(dma_buf_read(ptr, len, &req->qsg))) {
>  trace_nvme_err_invalid_dma();
>  status = NVME_INVALID_FIELD | NVME_DNR;
>  }
>
>  out:
> -qemu_sglist_destroy(&qsg);
> +qemu_sglist_destroy(&req->qsg);
>
>  return status;
>  }
>
> +static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +{
> +NvmeNamespace *ns = req->ns;
> +
> +uint32_t len = req->nlb << nvme_ns_lbads(ns);
> +uint64_t prp1 = le64_to_cpu(cmd->prp1);
> +uint64_t prp2 = le64_to_cpu(cmd->prp2);
> +
> +return nvme_map_prp(n, &req->qsg, prp1, prp2, len, req);
> +}
> +
> +static void nvme_aio_destroy(NvmeAIO *aio)
> +{
> +if (aio->iov.nalloc) {
> +

Re: [PATCH v2 19/20] nvme: make lba data size configurable

2019-11-12 Thread Beata Michalska
Hi Klaus,

On Tue, 15 Oct 2019 at 11:50, Klaus Jensen  wrote:
>
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme-ns.c | 2 +-
>  hw/block/nvme-ns.h | 4 +++-
>  hw/block/nvme.c| 1 +
>  3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> index aa76bb63ef45..70ff622a5729 100644
> --- a/hw/block/nvme-ns.c
> +++ b/hw/block/nvme-ns.c
> @@ -18,7 +18,7 @@ static int nvme_ns_init(NvmeNamespace *ns)
>  {
>  NvmeIdNs *id_ns = >id_ns;
>
> -id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
> +id_ns->lbaf[0].ds = ns->params.lbads;
>  id_ns->nuse = id_ns->ncap = id_ns->nsze =
>  cpu_to_le64(nvme_ns_nlbas(ns));
>
> diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> index 64dd054cf6a9..aa1c81d85cde 100644
> --- a/hw/block/nvme-ns.h
> +++ b/hw/block/nvme-ns.h
> @@ -6,10 +6,12 @@
>  OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
>
>  #define DEFINE_NVME_NS_PROPERTIES(_state, _props) \
> -DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0)
> +DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0), \
> +DEFINE_PROP_UINT8("lbads", _state, _props.lbads, 9)
>
Could we actually use BDRV_SECTOR_BITS instead of magic numbers?

BR

Beata


>  typedef struct NvmeNamespaceParams {
>  uint32_t nsid;
> +uint8_t  lbads;
>  } NvmeNamespaceParams;
>
>  typedef struct NvmeNamespace {
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 67f92bf5a3ac..d0103c16cfe9 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -2602,6 +2602,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>  if (n->namespace.conf.blk) {
>  ns = >namespace;
>  ns->params.nsid = 1;
> +ns->params.lbads = 9;
>
>  if (nvme_ns_setup(n, ns, &local_err)) {
>  error_propagate_prepend(errp, local_err, "nvme_ns_setup: ");
> --
> 2.23.0
>
>



Re: [PATCH v2 13/20] nvme: refactor prp mapping

2019-11-12 Thread Beata Michalska
Hi Klaus,

On Tue, 15 Oct 2019 at 11:57, Klaus Jensen  wrote:
>
> Instead of handling both QSGs and IOVs in multiple places, simply use
> QSGs everywhere by assuming that the request does not involve the
> controller memory buffer (CMB). If the request is found to involve the
> CMB, convert the QSG to an IOV and issue the I/O. The QSG is converted
> to an IOV by the dma helpers anyway, so the CMB path is not unfairly
> affected by this simplifying change.
>

Out of curiosity, in how many cases will the SG list have to
be converted to an IOV? Does that justify creating the SG list in vain?

> As a side-effect, this patch also allows PRPs to be located in the CMB.
> The logic ensures that if some of the PRP is in the CMB, all of it must
> be located there, as per the specification.
>
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c   | 255 --
>  hw/block/nvme.h   |   4 +-
>  hw/block/trace-events |   1 +
>  include/block/nvme.h  |   1 +
>  4 files changed, 174 insertions(+), 87 deletions(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 1e2320b38b14..cbc0b6a660b6 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -179,138 +179,200 @@ static void nvme_set_error_page(NvmeCtrl *n, uint16_t sqid, uint16_t cid,
>  n->elp_index = (n->elp_index + 1) % n->params.elpe;
>  }
>
> -static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
> - uint64_t prp2, uint32_t len, NvmeCtrl *n)
> +static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
> +uint64_t prp2, uint32_t len, NvmeRequest *req)
>  {
>  hwaddr trans_len = n->page_size - (prp1 % n->page_size);
>  trans_len = MIN(len, trans_len);
>  int num_prps = (len >> n->page_bits) + 1;
> +uint16_t status = NVME_SUCCESS;
> +bool prp_list_in_cmb = false;
> +
> +trace_nvme_map_prp(req->cid, req->cmd.opcode, trans_len, len, prp1, prp2,
> +num_prps);
>
>  if (unlikely(!prp1)) {
>  trace_nvme_err_invalid_prp();
>  return NVME_INVALID_FIELD | NVME_DNR;
> -} else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
> -   prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
> -qsg->nsg = 0;
> -qemu_iovec_init(iov, num_prps);
> -qemu_iovec_add(iov, (void *)&n->cmbuf[prp1 - n->ctrl_mem.addr], trans_len);
> -} else {
> -pci_dma_sglist_init(qsg, >parent_obj, num_prps);
> -qemu_sglist_add(qsg, prp1, trans_len);
>  }
> +
> +if (nvme_addr_is_cmb(n, prp1)) {
> +req->is_cmb = true;
> +}
> +
This seems to be used here and within the read/write functions which call
this one. Maybe there is a nicer way to track that instead of passing the
request around from multiple places?

> +pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
> +qemu_sglist_add(qsg, prp1, trans_len);
> +
>  len -= trans_len;
>  if (len) {
>  if (unlikely(!prp2)) {
>  trace_nvme_err_invalid_prp2_missing();
> +status = NVME_INVALID_FIELD | NVME_DNR;
>  goto unmap;
>  }
> +
>  if (len > n->page_size) {
>  uint64_t prp_list[n->max_prp_ents];
>  uint32_t nents, prp_trans;
>  int i = 0;
>
> +if (nvme_addr_is_cmb(n, prp2)) {
> +prp_list_in_cmb = true;
> +}
> +
>  nents = (len + n->page_size - 1) >> n->page_bits;
>  prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
> -nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
> +nvme_addr_read(n, prp2, (void *) prp_list, prp_trans);
>  while (len != 0) {
> +bool addr_is_cmb;
>  uint64_t prp_ent = le64_to_cpu(prp_list[i]);
>
>  if (i == n->max_prp_ents - 1 && len > n->page_size) {
>  if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
>  trace_nvme_err_invalid_prplist_ent(prp_ent);
> +status = NVME_INVALID_FIELD | NVME_DNR;
> +goto unmap;
> +}
> +
> +addr_is_cmb = nvme_addr_is_cmb(n, prp_ent);
> +if ((prp_list_in_cmb && !addr_is_cmb) ||
> +(!prp_list_in_cmb && addr_is_cmb)) {

Minor: the same condition (based on different vars) is used in
multiple places. Might be worth moving it out into a helper and just
passing in the needed values.

> +status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
>  goto unmap;
>  }
>
>  i = 0;
>  nents = (len + n->page_size - 1) >> n->page_bits;
>  prp_trans = MIN(n->max_prp_ents, nents) * 
> sizeof(uint64_t);
> -nvme_addr_read(n, prp_ent, (void *)prp_list,
> -prp_trans);
> +   

Re: [PATCH v2 09/20] nvme: add support for the asynchronous event request command

2019-11-12 Thread Beata Michalska
Hi Klaus,

On Tue, 15 Oct 2019 at 11:49, Klaus Jensen  wrote:
>
> Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> Section 5.2 ("Asynchronous Event Request command").
>
> Mostly imported from Keith's qemu-nvme tree. Modified to not enqueue
> events if something of the same type is already queued (but not cleared
> by the host).
>
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c   | 180 --
>  hw/block/nvme.h   |  13 ++-
>  hw/block/trace-events |   8 ++
>  include/block/nvme.h  |   4 +-
>  4 files changed, 196 insertions(+), 9 deletions(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 4412a3bea3bc..5cdee37582f9 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -334,6 +334,46 @@ static void nvme_enqueue_req_completion(NvmeCQueue *cq, 
> NvmeRequest *req)
>  timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
>  }
>
> +static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type,
> +uint8_t event_info, uint8_t log_page)
> +{
> +NvmeAsyncEvent *event;
> +
> +trace_nvme_enqueue_event(event_type, event_info, log_page);
> +
> +/*
> + * Do not enqueue the event if something of this type is already queued.
> + * This bounds the size of the event queue and makes sure it does not 
> grow
> + * indefinitely when events are not processed by the host (i.e. does not
> + * issue any AERs).
> + */
> +if (n->aer_mask_queued & (1 << event_type)) {
> +trace_nvme_enqueue_event_masked(event_type);
> +return;
> +}
> +n->aer_mask_queued |= (1 << event_type);
> +
> +event = g_new(NvmeAsyncEvent, 1);
> +event->result = (NvmeAerResult) {
> +.event_type = event_type,
> +.event_info = event_info,
> +.log_page   = log_page,
> +};
> +
> +QTAILQ_INSERT_TAIL(&n->aer_queue, event, entry);
> +
> +timer_mod(n->aer_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
> +}
> +
> +static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
> +{
> +n->aer_mask &= ~(1 << event_type);
> +if (!QTAILQ_EMPTY(&n->aer_queue)) {
> +timer_mod(n->aer_timer,
> +qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
> +}
> +}
> +
>  static void nvme_rw_cb(void *opaque, int ret)
>  {
>  NvmeRequest *req = opaque;
> @@ -578,7 +618,7 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
>  return NVME_SUCCESS;
>  }
>
> -static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd,
> +static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
>  uint32_t buf_len, uint64_t off, NvmeRequest *req)
>  {
>  uint32_t trans_len;
> @@ -591,12 +631,16 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd 
> *cmd,
>
>  trans_len = MIN(sizeof(*n->elpes) * (n->params.elpe + 1) - off, buf_len);
>
> +if (!rae) {
> +nvme_clear_events(n, NVME_AER_TYPE_ERROR);
> +}
> +
>  return nvme_dma_read_prp(n, (uint8_t *) n->elpes + off, trans_len, prp1,
>  prp2);
>  }
>
> -static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> -uint64_t off, NvmeRequest *req)
> +static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
> +uint32_t buf_len, uint64_t off, NvmeRequest *req)
>  {
>  uint64_t prp1 = le64_to_cpu(cmd->prp1);
>  uint64_t prp2 = le64_to_cpu(cmd->prp2);
> @@ -646,6 +690,10 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd 
> *cmd, uint32_t buf_len,
>  smart.power_on_hours[0] = cpu_to_le64(
>  (((current_ms - n->starttime_ms) / 1000) / 60) / 60);
>
> +if (!rae) {
> +nvme_clear_events(n, NVME_AER_TYPE_SMART);
> +}
> +
>  return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1,
>  prp2);
>  }
> @@ -698,9 +746,9 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, 
> NvmeRequest *req)
>
>  switch (lid) {
>  case NVME_LOG_ERROR_INFO:
> -return nvme_error_info(n, cmd, len, off, req);
> +return nvme_error_info(n, cmd, rae, len, off, req);
>  case NVME_LOG_SMART_INFO:
> -return nvme_smart_info(n, cmd, len, off, req);
> +return nvme_smart_info(n, cmd, rae, len, off, req);
>  case NVME_LOG_FW_SLOT_INFO:
>  return nvme_fw_log_info(n, cmd, len, off, req);
>  default:
> @@ -958,6 +1006,9 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd 
> *cmd, NvmeRequest *req)
>  break;
>  case NVME_TIMESTAMP:
>  return nvme_get_feature_timestamp(n, cmd);
> +case NVME_ASYNCHRONOUS_EVENT_CONF:
> +result = cpu_to_le32(n->features.async_config);
> +break;
>  default:
>  trace_nvme_err_invalid_getfeat(dw10);
>  return NVME_INVALID_FIELD | NVME_DNR;
> @@ -993,6 +1044,12 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd 
> *cmd, NvmeRequest *req)
>  switch (dw10) {
>  case NVME_TEMPERATURE_THRESHOLD:
>  n->features.temp_thresh = dw11;
> +

Re: [PATCH v2 12/20] nvme: bump supported specification version to 1.3

2019-11-12 Thread Beata Michalska
Hi Klaus,

On Tue, 15 Oct 2019 at 11:52, Klaus Jensen  wrote:
>
> Add the new Namespace Identification Descriptor List (CNS 03h) and track
> creation of queues to enable the controller to return Command Sequence
> Error if Set Features is called for Number of Queues after any queues
> have been created.
>
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c   | 82 +++
>  hw/block/nvme.h   |  1 +
>  hw/block/trace-events |  8 +++--
>  include/block/nvme.h  | 30 +---
>  4 files changed, 100 insertions(+), 21 deletions(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index e7d46dcc6afe..1e2320b38b14 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -9,20 +9,22 @@
>   */
>
>  /**
> - * Reference Specification: NVM Express 1.2.1
> + * Reference Specification: NVM Express 1.3d
>   *
>   *   https://nvmexpress.org/resources/specifications/
>   */
>
>  /**
>   * Usage: add options:
> - *  -drive file=,if=none,id=
> - *  -device nvme,drive=,serial=,id=, \
> - *  cmb_size_mb=, \
> - *  num_queues=
> + * -drive file=,if=none,id=
> + * -device nvme,drive=,serial=,id=
>   *
> - * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
> - * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
> + * Advanced optional options:
> + *
> + *   num_queues=  : Maximum number of IO Queues.
> + *  Default: 64
> + *   cmb_size_mb= : Size of Controller Memory Buffer in MBs.
> + *  Default: 0 (disabled)
>   */
>
>  #include "qemu/osdep.h"
> @@ -345,6 +347,8 @@ static void nvme_post_cqes(void *opaque)
>  static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
>  {
>  assert(cq->cqid == req->sq->cqid);
> +
> +trace_nvme_enqueue_req_completion(req->cid, cq->cqid, req->status);
>  QTAILQ_REMOVE(&cq->sq->out_req_list, req, entry);
>  QTAILQ_INSERT_TAIL(&cq->req_list, req, entry);
>  timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
> @@ -530,6 +534,7 @@ static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n)
>  if (sq->sqid) {
>  g_free(sq);
>  }
> +n->qs_created--;
>  }
>
>  static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd)
> @@ -596,6 +601,7 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, 
> uint64_t dma_addr,
>  cq = n->cq[cqid];
>  QTAILQ_INSERT_TAIL(&(cq->sq_list), sq, entry);
>  n->sq[sqid] = sq;
> +n->qs_created++;
>  }
>
>  static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
> @@ -742,7 +748,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, 
> NvmeRequest *req)
>  uint32_t dw11 = le32_to_cpu(cmd->cdw11);
>  uint32_t dw12 = le32_to_cpu(cmd->cdw12);
>  uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> -uint16_t lid = dw10 & 0xff;
> +uint8_t  lid = dw10 & 0xff;
> +uint8_t  lsp = (dw10 >> 8) & 0xf;
>  uint8_t  rae = (dw10 >> 15) & 0x1;
>  uint32_t numdl, numdu;
>  uint64_t off, lpol, lpou;
> @@ -760,7 +767,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, 
> NvmeRequest *req)
>  return NVME_INVALID_FIELD | NVME_DNR;
>  }
>
> -trace_nvme_get_log(req->cid, lid, rae, len, off);
> +trace_nvme_get_log(req->cid, lid, lsp, rae, len, off);
>
>  switch (lid) {
>  case NVME_LOG_ERROR_INFO:
> @@ -784,6 +791,7 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
>  if (cq->cqid) {
>  g_free(cq);
>  }
> +n->qs_created--;
>  }
>
>  static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeCmd *cmd)
> @@ -824,6 +832,7 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, 
> uint64_t dma_addr,
>  msix_vector_use(&n->parent_obj, cq->vector);
>  n->cq[cqid] = cq;
>  cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
> +n->qs_created++;
>  }
>
>  static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
> @@ -897,7 +906,7 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
> NvmeIdentify *c)
>  prp1, prp2);
>  }
>
> -static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
> +static uint16_t nvme_identify_ns_list(NvmeCtrl *n, NvmeIdentify *c)
>  {
>  static const int data_len = 4 * KiB;
>  uint32_t min_nsid = le32_to_cpu(c->nsid);
> @@ -907,7 +916,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
> NvmeIdentify *c)
>  uint16_t ret;
>  int i, j = 0;
>
> -trace_nvme_identify_nslist(min_nsid);
> +trace_nvme_identify_ns_list(min_nsid);
>
>  list = g_malloc0(data_len);
>  for (i = 0; i < n->num_namespaces; i++) {
> @@ -924,6 +933,41 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
> NvmeIdentify *c)
>  return ret;
>  }
>
> +static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeCmd *c)
> +{
> +static const int len = 4096;
> +
> +struct ns_descr {
> +uint8_t nidt;
> +uint8_t nidl;
> +uint8_t rsvd2[2];
> +uint8_t nid[16];
> +};

Re: [PATCH v2 04/20] nvme: populate the mandatory subnqn and ver fields

2019-11-12 Thread Beata Michalska
Hi Klaus

On Tue, 15 Oct 2019 at 11:42, Klaus Jensen  wrote:
>
> Required for compliance with NVMe revision 1.2.1 or later. See NVM
> Express 1.2.1, Section 5.11 ("Identify command"), Figure 90 and Section
> 7.9 ("NVMe Qualified Names").
>
> This also bumps the supported version to 1.2.1.
>
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 277700fdcc58..16f0fba10b08 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -9,9 +9,9 @@
>   */
>
>  /**
> - * Reference Specs: http://www.nvmexpress.org, 1.2, 1.1, 1.0e
> + * Reference Specification: NVM Express 1.2.1
>   *
> - *  http://www.nvmexpress.org/resources/
> + *   https://nvmexpress.org/resources/specifications/
>   */
>
>  /**
> @@ -1366,6 +1366,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
> **errp)
>  id->ieee[0] = 0x00;
>  id->ieee[1] = 0x02;
>  id->ieee[2] = 0xb3;
> +id->ver = cpu_to_le32(0x00010201);
>  id->oacs = cpu_to_le16(0);
>  id->frmw = 7 << 1;
>  id->lpa = 1 << 0;
> @@ -1373,6 +1374,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
> **errp)
>  id->cqes = (0x4 << 4) | 0x4;
>  id->nn = cpu_to_le32(n->num_namespaces);
>  id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
> +
> +strcpy((char *) id->subnqn, "nqn.2019-08.org.qemu:");
> +pstrcat((char *) id->subnqn, sizeof(id->subnqn), n->params.serial);
> +
>  id->psd[0].mp = cpu_to_le16(0x9c4);
>  id->psd[0].enlat = cpu_to_le32(0x10);
>  id->psd[0].exlat = cpu_to_le32(0x4);
> @@ -1387,7 +1392,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
> **errp)
>  NVME_CAP_SET_CSS(n->bar.cap, 1);
>  NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
>
> -n->bar.vs = 0x00010200;
> +n->bar.vs = 0x00010201;

Very minor:

The version number is already being set twice within the patch series,
and in two different places. It might be worth making a #define out of it
so that only one location needs to be changed.
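A hedged sketch of such a define - the macro names are made up for illustration, and the encoding follows the MJR.MNR.TER layout of the VS register (1.2.1 -> 0x00010201):

```c
#include <stdint.h>

/*
 * Hypothetical single definition of the supported NVMe version.
 * Both id->ver and n->bar.vs could then reference the same macro,
 * so a future spec bump only touches one line.
 */
#define NVME_SPEC_VER(mjr, mnr, ter) \
    (((uint32_t)(mjr) << 16) | ((uint32_t)(mnr) << 8) | (uint32_t)(ter))

#define NVME_SUPPORTED_SPEC_VER NVME_SPEC_VER(1, 2, 1)
```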

BR


Beata
>  n->bar.intmc = n->bar.intms = 0;
>
>  if (n->params.cmb_size_mb) {
> --
> 2.23.0
>
>



Re: [PATCH v2 06/20] nvme: add support for the abort command

2019-11-12 Thread Beata Michalska
Hi Klaus

On Tue, 15 Oct 2019 at 11:41, Klaus Jensen  wrote:
>
> Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> Section 5.1 ("Abort command").
>
> The Abort command is a best effort command; for now, the device always
> fails to abort the given command.
>
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c | 16 
>  1 file changed, 16 insertions(+)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index daa2367b0863..84e4f2ea7a15 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -741,6 +741,18 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
>  }
>  }
>
> +static uint16_t nvme_abort(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +{
> +uint16_t sqid = le32_to_cpu(cmd->cdw10) & 0xffff;
> +
> +req->cqe.result = 1;
> +if (nvme_check_sqid(n, sqid)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
Shouldn't we validate the CID as well?
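For reference, a sketch of decoding both CDW10 fields of the Abort command - bits 15:00 carry the SQ identifier and bits 31:16 the command identifier of the command to abort. The helper name is hypothetical; the point is that the CID could be checked against outstanding requests too, not only the SQID:

```c
#include <stdint.h>

/* Hypothetical decode of the Abort command's CDW10 (NVMe 1.2.1, 5.1). */
static void nvme_abort_decode(uint32_t cdw10, uint16_t *sqid, uint16_t *cid)
{
    *sqid = cdw10 & 0xffff;         /* SQ identifier, bits 15:00 */
    *cid  = (cdw10 >> 16) & 0xffff; /* command identifier, bits 31:16 */
}
```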

> +return NVME_SUCCESS;
> +}
> +
>  static inline void nvme_set_timestamp(NvmeCtrl *n, uint64_t ts)
>  {
>  trace_nvme_setfeat_timestamp(ts);
> @@ -859,6 +871,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd 
> *cmd, NvmeRequest *req)
>  trace_nvme_err_invalid_setfeat(dw10);
>  return NVME_INVALID_FIELD | NVME_DNR;
>  }
> +
>  return NVME_SUCCESS;
>  }
>
> @@ -875,6 +888,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, 
> NvmeRequest *req)
>  return nvme_create_cq(n, cmd);
>  case NVME_ADM_CMD_IDENTIFY:
>  return nvme_identify(n, cmd);
> +case NVME_ADM_CMD_ABORT:
> +return nvme_abort(n, cmd, req);
>  case NVME_ADM_CMD_SET_FEATURES:
>  return nvme_set_feature(n, cmd, req);
>  case NVME_ADM_CMD_GET_FEATURES:
> @@ -1388,6 +1403,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
> **errp)
>  id->ieee[2] = 0xb3;
>  id->ver = cpu_to_le32(0x00010201);
>  id->oacs = cpu_to_le16(0);
> +id->acl = 3;
So we are setting the maximum number of concurrent Abort commands,
but there is no logic to enforce that limit and to return the
status suggested by the specification.

BR


Beata
>  id->frmw = 7 << 1;
>  id->lpa = 1 << 0;
>  id->sqes = (0x6 << 4) | 0x6;
> --
> 2.23.0
>
>



Re: [PATCH v2 08/20] nvme: add support for the get log page command

2019-11-12 Thread Beata Michalska
Hi Klaus,


On Tue, 15 Oct 2019 at 11:45, Klaus Jensen  wrote:
>
> Add support for the Get Log Page command and basic implementations
> of the mandatory Error Information, SMART/Health Information and
> Firmware Slot Information log pages.
>
> In violation of the specification, the SMART/Health Information log page
> does not persist information over the lifetime of the controller because
> the device has no place to store such persistent state.
>
> Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> Section 5.10 ("Get Log Page command").
>
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c   | 150 +-
>  hw/block/nvme.h   |   9 ++-
>  hw/block/trace-events |   2 +
>  include/block/nvme.h  |   2 +-
>  4 files changed, 160 insertions(+), 3 deletions(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 1fdb3b8655ed..4412a3bea3bc 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -44,6 +44,7 @@
>  #include "nvme.h"
>
>  #define NVME_MAX_QS PCI_MSIX_FLAGS_QSIZE
> +#define NVME_TEMPERATURE 0x143
>
>  #define NVME_GUEST_ERR(trace, fmt, ...) \
>  do { \
> @@ -577,6 +578,137 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd 
> *cmd)
>  return NVME_SUCCESS;
>  }
>
> +static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd,
> +uint32_t buf_len, uint64_t off, NvmeRequest *req)
> +{
> +uint32_t trans_len;
> +uint64_t prp1 = le64_to_cpu(cmd->prp1);
> +uint64_t prp2 = le64_to_cpu(cmd->prp2);
> +
> +if (off > sizeof(*n->elpes) * (n->params.elpe + 1)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +trans_len = MIN(sizeof(*n->elpes) * (n->params.elpe + 1) - off, buf_len);
> +
> +return nvme_dma_read_prp(n, (uint8_t *) n->elpes + off, trans_len, prp1,
> +prp2);
> +}
> +
> +static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> +uint64_t off, NvmeRequest *req)
> +{
> +uint64_t prp1 = le64_to_cpu(cmd->prp1);
> +uint64_t prp2 = le64_to_cpu(cmd->prp2);
> +uint32_t nsid = le32_to_cpu(cmd->nsid);
> +
> +uint32_t trans_len;
> +time_t current_ms;
> +uint64_t units_read = 0, units_written = 0, read_commands = 0,
> +write_commands = 0;
> +NvmeSmartLog smart;
> +BlockAcctStats *s;
> +
> +if (!nsid || (nsid != 0xffffffff && nsid > n->num_namespaces)) {
> +trace_nvme_err_invalid_ns(nsid, n->num_namespaces);
> +return NVME_INVALID_NSID | NVME_DNR;
> +}
> +
The LPA bit 0 is cleared now - which means there is no support
for per-namespace data. So in theory, if that was the aim, this condition
should check for values different than 0x0 and 0xffffffff and either
abort the command or treat those as a request for controller-specific data.
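A minimal sketch of the check being suggested - the helper name is hypothetical, and it assumes LPA bit 0 stays cleared (no per-namespace SMART data), so only NSID 0x0 and the broadcast value 0xffffffff would be accepted:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: with per-namespace SMART data unsupported,
 * only 0x0 and 0xffffffff (controller-wide) are valid NSID values for
 * the SMART/Health Information log page; anything else would be
 * rejected or treated as a controller-wide request.
 */
static bool nvme_smart_nsid_valid(uint32_t nsid)
{
    return nsid == 0x0 || nsid == 0xffffffff;
}
```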

> +s = blk_get_stats(n->conf.blk);
> +
> +units_read = s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
> +units_written = s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
> +read_commands = s->nr_ops[BLOCK_ACCT_READ];
> +write_commands = s->nr_ops[BLOCK_ACCT_WRITE];
> +
> +if (off > sizeof(smart)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +trans_len = MIN(sizeof(smart) - off, buf_len);
> +
> +memset(&smart, 0x0, sizeof(smart));
> +
> +smart.data_units_read[0] = cpu_to_le64(units_read / 1000);
> +smart.data_units_written[0] = cpu_to_le64(units_written / 1000);
> +smart.host_read_commands[0] = cpu_to_le64(read_commands);
> +smart.host_write_commands[0] = cpu_to_le64(write_commands);
> +
> +smart.number_of_error_log_entries[0] = cpu_to_le64(0);
> +smart.temperature[0] = n->temperature & 0xff;
> +smart.temperature[1] = (n->temperature >> 8) & 0xff;
> +
> +if (n->features.temp_thresh <= n->temperature) {
> +smart.critical_warning |= NVME_SMART_TEMPERATURE;
> +}
> +
> +current_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
> +smart.power_on_hours[0] = cpu_to_le64(
> +(((current_ms - n->starttime_ms) / 1000) / 60) / 60);
> +
> +return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1,
> +prp2);
> +}
> +
> +static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> +uint64_t off, NvmeRequest *req)
> +{
> +uint32_t trans_len;
> +uint64_t prp1 = le64_to_cpu(cmd->prp1);
> +uint64_t prp2 = le64_to_cpu(cmd->prp2);
> +NvmeFwSlotInfoLog fw_log;
> +
> +if (off > sizeof(fw_log)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +memset(&fw_log, 0, sizeof(NvmeFwSlotInfoLog));
> +
> +trans_len = MIN(sizeof(fw_log) - off, buf_len);
> +
> +return nvme_dma_read_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1,
> +prp2);
> +}
> +
> +static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +{
> +uint32_t dw10 = le32_to_cpu(cmd->cdw10);
> +uint32_t dw11 = le32_to_cpu(cmd->cdw11);
> +uint32_t dw12 = le32_to_cpu(cmd->cdw12);
> +uint32_t dw13 = 

Re: [PATCH v2 2/4] Memory: Enable writeback for given memory region

2019-11-07 Thread Beata Michalska
On Wed, 6 Nov 2019 at 12:20, Richard Henderson
 wrote:
>
> On 11/6/19 12:40 AM, Beata Michalska wrote:
> > +void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t 
> > length)
> > +{
> > +void *addr = ramblock_ptr(block, start);
> > +
> > +/*
> > + * The requested range might spread up to the very end of the block
> > + */
> > +if ((start + length) > block->used_length) {
> > +qemu_log("%s: sync range outside the block boundaries: "
> > + "start: " RAM_ADDR_FMT " length: " RAM_ADDR_FMT
> > + " block length: " RAM_ADDR_FMT " Narrowing down ..." ,
> > + __func__, start, length, block->used_length);
> > +length = block->used_length - start;
> > +}
>
> qemu_log_mask w/ GUEST_ERROR?  How do we expect the length to overflow?

In theory it shouldn't, at least with the current usage.
I guess probe_access will make sure of that.
This was more of a precaution to enable catching potential/future misuses,
i.e. for debugging purposes. I can get rid of it if that's playing too safe.
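A standalone model of the clamping in question, to make the precaution concrete - the helper name and fixed-width types are illustrative, not the QEMU code; in the real function the out-of-bounds case would additionally be reported, e.g. via qemu_log_mask(LOG_GUEST_ERROR, ...) as Richard suggests:

```c
#include <stdint.h>

typedef uint64_t ram_addr_t; /* stand-in for QEMU's ram_addr_t */

/*
 * If the requested [start, start + length) range runs past the block's
 * used length, narrow the length down to stay within the block.
 */
static ram_addr_t clamp_writeback_len(ram_addr_t start, ram_addr_t length,
                                      ram_addr_t used_length)
{
    if (start + length > used_length) {
        length = used_length - start;
    }
    return length;
}
```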

>
> > +#ifdef CONFIG_LIBPMEM
> > +/* The lack of support for pmem should not block the sync */
> > +if (ramblock_is_pmem(block)) {
> > +pmem_persist(addr, length);
> > +} else
> > +#endif
>
> Perhaps better to return out of that if block than have the dangling else.

Good idea
>
> > +/**
> > + * Sync changes made to the memory mapped file back to the backing
> > + * storage. For POSIX compliant systems this will simply fallback
> > + * to regular msync call (thus the required alignment). Otherwise
> > + * it will trigger a whole file sync (including the metadata, as
> > + * there is no support to skip that otherwise)
> > + *
> > + * @addr   - start of the memory area to be synced
> > + * @length - length of the area to be synced
> > + * @align  - alignment (expected to be PAGE_SIZE)
> > + * @fd - file descriptor for the file to be synced
> > + *   (mandatory only for POSIX non-compliant systems)
> > + */
> > +int qemu_msync(void *addr, size_t length, size_t align, int fd)
> > +{
> > +#ifdef CONFIG_POSIX
> > +size_t align_mask;
> > +
> > +/* Bare minimum of sanity checks on the alignment */
> > +/* The start address needs to be a multiple of PAGE_SIZE */
> > +align = MAX(align, qemu_real_host_page_size);
> > +align_mask = ~(qemu_real_host_page_size - 1);
> > +align = (align + ~align_mask) & align_mask;
> > +
> > +align_mask = ~(align - 1);
>
> I don't understand what you're trying to do with align.
>
> You pass in qemu_host_page_size from the one caller, and then adjust it for
> qemu_real_host_page_size?
>
> Why pass in anything at all, and just use qemu_real_host_page_mask?

qemu_msync was supposed to be generic and not tied to the current use case,
without any assumptions on the alignment or whether it would be an actual
host page size. So that was just to make sure it will be a multiple of that.
I can get rid of it with the assumption that all callers use the same alignment.
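To make the alignment math in qemu_msync easier to follow, here is a standalone model of it under the simplifying assumption of a fixed 4 KiB page (the real code derives the mask from qemu_real_host_page_size): the start address is rounded down to a page boundary, and the length is first grown to cover the rounded-down head and then rounded up to a multiple of the page size.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u /* assumed fixed page size for illustration */

/* Round (addr, length) out to page-aligned boundaries, msync-style. */
static void align_msync_range(uintptr_t *addr, size_t *length)
{
    size_t align_mask = ~(size_t)(PAGE_SIZE - 1);

    *length += *addr & (PAGE_SIZE - 1);                /* cover the head */
    *length  = (*length + PAGE_SIZE - 1) & align_mask; /* round length up */
    *addr   &= align_mask;                             /* round start down */
}
```

For example, a 0x10-byte range starting at 0x1010 becomes the whole page [0x1000, 0x2000).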

BR
Beata
>
> > +/**
> > + * There are no strict reqs as per the length of mapping
> > + * to be synced. Still the length needs to follow the address
> > + * alignment changes. Additionally - round the size to the multiple
> > + * of requested alignment (expected as PAGE_SIZE)
> > + */
> > +length += ((uintptr_t)addr & (align - 1));
> > +length = (length + ~align_mask) & align_mask;
> > +
> > +addr = (void *)((uintptr_t)addr & align_mask);
> > +
> > +return msync(addr, length, MS_SYNC);
> > +#else /* CONFIG_POSIX */
> > +/**
> > + * Perform the sync based on the file descriptor
> > + * The sync range will most probably be wider than the one
> > + * requested - but it will still get the job done
> > + */
> > +return qemu_fdatasync(fd);
> > +#endif /* CONFIG_POSIX */
> > +}
>
>
> r~
>



[PATCH v2 4/4] target/arm: Add support for DC CVAP & DC CVADP ins

2019-11-05 Thread Beata Michalska
ARMv8.2 introduced support for Data Cache Clean instructions
to the PoP (point-of-persistence) - DC CVAP - and to the PoDP
(point-of-deep-persistence) - DC CVADP. Both specify conceptual points in a
memory system where all writes that reach them are considered persistent.
The support provided here considers both to be actually the same, so there is
no distinction between the two. If no such point is available (there is no
backing store for the given memory), both will result in a Data Cache Clean
up to the point of coherency. Otherwise a sync for the specified range shall
be performed.
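As background for the write handler below: it aligns the input VA to a data cache line using CTR_EL0.DminLine (bits [19:16]), which encodes log2 of the smallest data cache line size in words, so the line size in bytes is 4 << DminLine. A standalone model of that arithmetic (the function name is made up for illustration):

```c
#include <stdint.h>

/*
 * Round a virtual address down to the base of its data cache line,
 * given a CTR_EL0 value. Line size in bytes = 4 << CTR_EL0.DminLine.
 */
static uint64_t dc_line_base(uint64_t ctr_el0, uint64_t vaddr)
{
    uint64_t dline_size = 4u << ((ctr_el0 >> 16) & 0xF);
    return vaddr & ~(dline_size - 1);
}
```

With DminLine = 4 (64-byte lines), an address like 0x1234 rounds down to 0x1200.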

Signed-off-by: Beata Michalska 
---
 linux-user/elfload.c |  2 ++
 target/arm/cpu.h | 10 ++
 target/arm/cpu64.c   |  1 +
 target/arm/helper.c  | 56 
 4 files changed, 69 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index f6693e5..07b16cc 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -656,6 +656,7 @@ static uint32_t get_elf_hwcap(void)
 GET_FEATURE_ID(aa64_jscvt, ARM_HWCAP_A64_JSCVT);
 GET_FEATURE_ID(aa64_sb, ARM_HWCAP_A64_SB);
 GET_FEATURE_ID(aa64_condm_4, ARM_HWCAP_A64_FLAGM);
+GET_FEATURE_ID(aa64_dcpop, ARM_HWCAP_A64_DCPOP);
 
 return hwcaps;
 }
@@ -665,6 +666,7 @@ static uint32_t get_elf_hwcap2(void)
 ARMCPU *cpu = ARM_CPU(thread_cpu);
 uint32_t hwcaps = 0;
 
+GET_FEATURE_ID(aa64_dcpodp, ARM_HWCAP2_A64_DCPODP);
 GET_FEATURE_ID(aa64_condm_5, ARM_HWCAP2_A64_FLAGM2);
 GET_FEATURE_ID(aa64_frint, ARM_HWCAP2_A64_FRINT);
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index e1a66a2..0dc22c6 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3617,6 +3617,16 @@ static inline bool isar_feature_aa64_frint(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FRINTTS) != 0;
 }
 
+static inline bool isar_feature_aa64_dcpop(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) != 0;
+}
+
+static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >= 2;
+}
+
 static inline bool isar_feature_aa64_fp16(const ARMISARegisters *id)
 {
 /* We always set the AdvSIMD and FP fields identically wrt FP16.  */
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 68baf04..e6a033e 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -661,6 +661,7 @@ static void aarch64_max_initfn(Object *obj)
 cpu->isar.id_aa64isar0 = t;
 
 t = cpu->isar.id_aa64isar1;
+t = FIELD_DP64(t, ID_AA64ISAR1, DPB, 2);
 t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, APA, 1); /* PAuth, architected only */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index be67e2c..00c72e4 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5924,6 +5924,52 @@ static const ARMCPRegInfo rndr_reginfo[] = {
   .access = PL0_R, .readfn = rndr_readfn },
 REGINFO_SENTINEL
 };
+
+#ifndef CONFIG_USER_ONLY
+static void dccvap_writefn(CPUARMState *env, const ARMCPRegInfo *opaque,
+  uint64_t value)
+{
+ARMCPU *cpu = env_archcpu(env);
+/* CTR_EL0 System register -> DminLine, bits [19:16] */
+uint64_t dline_size = 4 << ((cpu->ctr >> 16) & 0xF);
+uint64_t vaddr_in = (uint64_t) value;
+uint64_t vaddr = vaddr_in & ~(dline_size - 1);
+void *haddr;
+int mem_idx = cpu_mmu_index(env, false);
+
+/* This won't be crossing page boundaries */
+haddr = probe_read(env, vaddr, dline_size, mem_idx, GETPC());
+if (haddr) {
+
+ram_addr_t offset;
+MemoryRegion *mr;
+
+/* RCU lock is already being held */
+mr = memory_region_from_host(haddr, &offset);
+
+if (mr) {
+memory_region_do_writeback(mr, offset, dline_size);
+}
+}
+}
+
+static const ARMCPRegInfo dcpop_reg[] = {
+{ .name = "DC_CVAP", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 1,
+  .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
+  .accessfn = aa64_cacheop_access, .writefn = dccvap_writefn },
+REGINFO_SENTINEL
+};
+
+static const ARMCPRegInfo dcpodp_reg[] = {
+{ .name = "DC_CVADP", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 1,
+  .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
+  .accessfn = aa64_cacheop_access, .writefn = dccvap_writefn },
+REGINFO_SENTINEL
+};
+#endif /*CONFIG_USER_ONLY*/
+
 #endif
 
 static CPAccessResult access_predinv(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -6884,6 +6930,16 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 if (cpu_isar_feature(aa64_rndr, cpu)) {
 define_arm_cp_regs(cpu, rndr_reginfo);
 }
+#ifndef CONFIG_USER_ONLY
+/* Data Cache clean ins

[PATCH v2 3/4] migration: ram: Switch to ram block writeback

2019-11-05 Thread Beata Michalska
Switch to ram block writeback for pmem migration.

Signed-off-by: Beata Michalska 
Reviewed-by: Richard Henderson 
Acked-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5078f94..38070f1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -33,7 +33,6 @@
 #include "qemu/bitops.h"
 #include "qemu/bitmap.h"
 #include "qemu/main-loop.h"
-#include "qemu/pmem.h"
 #include "xbzrle.h"
 #include "ram.h"
 #include "migration.h"
@@ -3981,9 +3980,7 @@ static int ram_load_cleanup(void *opaque)
 RAMBlock *rb;
 
 RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
-if (ramblock_is_pmem(rb)) {
-pmem_persist(rb->host, rb->used_length);
-}
+qemu_ram_block_writeback(rb);
 }
 
 xbzrle_load_cleanup();
-- 
2.7.4




[PATCH v2 2/4] Memory: Enable writeback for given memory region

2019-11-05 Thread Beata Michalska
Add an option to trigger memory writeback to sync a given memory region
with the corresponding backing store, in case one is available.
This extends the support for persistent memory, allowing syncing on demand.

Signed-off-by: Beata Michalska 
---
 exec.c  | 43 +++
 include/exec/memory.h   |  6 ++
 include/exec/ram_addr.h |  8 
 include/qemu/cutils.h   |  1 +
 memory.c| 12 
 util/cutils.c   | 47 +++
 6 files changed, 117 insertions(+)

diff --git a/exec.c b/exec.c
index ffdb518..e1f06de 100644
--- a/exec.c
+++ b/exec.c
@@ -65,6 +65,8 @@
 #include "exec/ram_addr.h"
 #include "exec/log.h"
 
+#include "qemu/pmem.h"
+
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -2156,6 +2158,47 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, 
Error **errp)
 return 0;
 }
 
+/*
+ * Trigger sync on the given ram block for range [start, start + length]
+ * with the backing store if one is available.
+ * Otherwise no-op.
+ * @Note: this is supposed to be a synchronous op.
+ */
+void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length)
+{
+void *addr = ramblock_ptr(block, start);
+
+/*
+ * The requested range might spread up to the very end of the block
+ */
+if ((start + length) > block->used_length) {
+qemu_log("%s: sync range outside the block boundaries: "
+ "start: " RAM_ADDR_FMT " length: " RAM_ADDR_FMT
+ " block length: " RAM_ADDR_FMT " Narrowing down ..." ,
+ __func__, start, length, block->used_length);
+length = block->used_length - start;
+}
+
+#ifdef CONFIG_LIBPMEM
+/* The lack of support for pmem should not block the sync */
+if (ramblock_is_pmem(block)) {
+pmem_persist(addr, length);
+} else
+#endif
+if (block->fd >= 0) {
+/**
+ * In case there is no support for PMEM or the memory has not been
+ * specified as persistent (or is not persistent) - use msync.
+ * Less optimal but still achieves the same goal
+ */
+if (qemu_msync(addr, length, qemu_host_page_size, block->fd)) {
+warn_report("%s: failed to sync memory range: start: "
+RAM_ADDR_FMT " length: " RAM_ADDR_FMT,
+__func__, start, length);
+}
+}
+}
+
 /* Called with ram_list.mutex held */
 static void dirty_memory_extend(ram_addr_t old_ram_size,
 ram_addr_t new_ram_size)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e499dc2..27a84e0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1265,6 +1265,12 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr);
  */
 void memory_region_ram_resize(MemoryRegion *mr, ram_addr_t newsize,
   Error **errp);
+/**
+ * memory_region_do_writeback: Trigger writeback for selected address range
+ * [addr, addr + size]
+ *
+ */
+void memory_region_do_writeback(MemoryRegion *mr, hwaddr addr, hwaddr size);
 
 /**
  * memory_region_set_log: Turn dirty logging on or off for a region.
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index bed0554..5adebb0 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -174,6 +174,14 @@ void qemu_ram_free(RAMBlock *block);
 
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp);
 
+void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length);
+
+/* Clear whole block of mem */
+static inline void qemu_ram_block_writeback(RAMBlock *block)
+{
+qemu_ram_writeback(block, 0, block->used_length);
+}
+
 #define DIRTY_CLIENTS_ALL ((1 << DIRTY_MEMORY_NUM) - 1)
 #define DIRTY_CLIENTS_NOCODE  (DIRTY_CLIENTS_ALL & ~(1 << DIRTY_MEMORY_CODE))
 
diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index b54c847..41c5fa9 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -130,6 +130,7 @@ const char *qemu_strchrnul(const char *s, int c);
 #endif
 time_t mktimegm(struct tm *tm);
 int qemu_fdatasync(int fd);
+int qemu_msync(void *addr, size_t length, size_t alignment, int fd);
 int fcntl_setfl(int fd, int flag);
 int qemu_parse_fd(const char *param);
 int qemu_strtoi(const char *nptr, const char **endptr, int base,
diff --git a/memory.c b/memory.c
index c952eab..15734a0 100644
--- a/memory.c
+++ b/memory.c
@@ -2214,6 +2214,18 @@ void memory_region_ram_resize(MemoryRegion *mr, 
ram_addr_t newsize, Error **errp
 qemu_ram_resize(mr->ram_block, newsize, errp);
 }
 
+
+void memory_region_do_writeback(MemoryRegion *mr, hwaddr addr, hwaddr size)
+{
+/*
+ * Might need to be extended to cover
+ * different types of memory regions
+ */
+i

[PATCH v2 1/4] tcg: cputlb: Add probe_read

2019-11-05 Thread Beata Michalska
Add probe_read alongside the write probing equivalent.

Signed-off-by: Beata Michalska 
Reviewed-by: Alex Bennée 
---
 include/exec/exec-all.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d85e610..350c4b4 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -339,6 +339,12 @@ static inline void *probe_write(CPUArchState *env, 
target_ulong addr, int size,
 return probe_access(env, addr, size, MMU_DATA_STORE, mmu_idx, retaddr);
 }
 
+static inline void *probe_read(CPUArchState *env, target_ulong addr, int size,
+   int mmu_idx, uintptr_t retaddr)
+{
+return probe_access(env, addr, size, MMU_DATA_LOAD, mmu_idx, retaddr);
+}
+
 #define CODE_GEN_ALIGN   16 /* must be >= of the size of a icache line 
*/
 
 /* Estimated block size for TB allocation.  */
-- 
2.7.4




[PATCH v2 0/4] target/arm: Support for Data Cache Clean up to PoP

2019-11-05 Thread Beata Michalska
ARMv8.2 introduced support for Data Cache Clean instructions to PoP
(point-of-persistence) and PoDP (point-of-deep-persistence):
ARMv8.2-DCCVAP &  ARMv8.2-DCCVADP respectively.
This patch set adds support for emulating both, though there is no
distinction between the two points: the PoDP is assumed to represent
the same point of persistence as PoP. If no such point is specified for the
considered memory system, both fall back to the DC CVAC instruction
(clean up to the point of coherency).
The changes introduce probe_read to verify the mandatory read memory
access required by both cache clean instructions, along with support for
writing back requested memory regions through msync where available,
falling back to fdatasync otherwise.

As the virt platform is currently missing NVDIMM support,
the changes have been tested with [1] & [2]


[1] https://patchwork.kernel.org/cover/10830237/
[2] https://patchwork.kernel.org/project/qemu-devel/list/?series=159441

v2:
- Moved the msync into a qemu wrapper with
  CONFIG_POSIX switch + additional comments
- Fixed length alignment
- Dropped treating the DC CVAP/CVADP as special case
  and moved those to conditional registration
- Dropped needless locking for grabbing mem region


Beata Michalska (4):
  tcg: cputlb: Add probe_read
  Memory: Enable writeback for given memory region
  migration: ram: Switch to ram block writeback
  target/arm: Add support for DC CVAP & DC CVADP ins

 exec.c  | 43 +
 include/exec/exec-all.h |  6 ++
 include/exec/memory.h   |  6 ++
 include/exec/ram_addr.h |  8 +++
 include/qemu/cutils.h   |  1 +
 linux-user/elfload.c|  2 ++
 memory.c| 12 +++
 migration/ram.c |  5 +
 target/arm/cpu.h| 10 +
 target/arm/cpu64.c  |  1 +
 target/arm/helper.c | 56 +
 util/cutils.c   | 47 +
 12 files changed, 193 insertions(+), 4 deletions(-)

-- 
2.7.4




Re: [PATCH v6 6/9] target/arm/kvm64: max cpu: Enable SVE when available

2019-10-22 Thread Beata Michalska
Hi Andrew

On Wed, 16 Oct 2019 at 10:03, Andrew Jones  wrote:
>
> Enable SVE in the KVM guest when the 'max' cpu type is configured
> and KVM supports it. KVM SVE requires use of the new finalize
> vcpu ioctl, so we add that now too. For starters SVE can only be
> turned on or off, getting all vector lengths the host CPU supports
> when on. We'll add the other SVE CPU properties in later patches.
>
> Signed-off-by: Andrew Jones 
> Reviewed-by: Richard Henderson 
> Reviewed-by: Eric Auger 

Reviewed-by: Beata Michalska 

Thanks,

BR
Beata
> ---
>  target/arm/cpu64.c   | 17 ++---
>  target/arm/kvm.c |  5 +
>  target/arm/kvm64.c   | 20 +++-
>  target/arm/kvm_arm.h | 27 +++
>  tests/arm-cpu-features.c |  1 +
>  5 files changed, 66 insertions(+), 4 deletions(-)
>
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 34b0ba2cf6f7..a771a28daa56 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -493,6 +493,11 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, 
> const char *name,
>  return;
>  }
>
> +if (value && kvm_enabled() && !kvm_arm_sve_supported(CPU(cpu))) {
> +error_setg(errp, "'sve' feature not supported by KVM on this host");
> +return;
> +}
> +
>  t = cpu->isar.id_aa64pfr0;
>  t = FIELD_DP64(t, ID_AA64PFR0, SVE, value);
>  cpu->isar.id_aa64pfr0 = t;
> @@ -507,11 +512,16 @@ static void aarch64_max_initfn(Object *obj)
>  {
>  ARMCPU *cpu = ARM_CPU(obj);
>  uint32_t vq;
> +uint64_t t;
>
>  if (kvm_enabled()) {
>  kvm_arm_set_cpu_features_from_host(cpu);
> +if (kvm_arm_sve_supported(CPU(cpu))) {
> +t = cpu->isar.id_aa64pfr0;
> +t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1);
> +cpu->isar.id_aa64pfr0 = t;
> +}
>  } else {
> -uint64_t t;
>  uint32_t u;
>  aarch64_a57_initfn(obj);
>
> @@ -612,8 +622,6 @@ static void aarch64_max_initfn(Object *obj)
>
>  object_property_add(obj, "sve-max-vq", "uint32", 
> cpu_max_get_sve_max_vq,
>  cpu_max_set_sve_max_vq, NULL, NULL, 
> &error_fatal);
> -object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
> -cpu_arm_set_sve, NULL, NULL, &error_fatal);
>
>  for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
>  char name[8];
> @@ -622,6 +630,9 @@ static void aarch64_max_initfn(Object *obj)
>  cpu_arm_set_sve_vq, NULL, NULL, 
> &error_fatal);
>  }
>  }
> +
> +object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
> +cpu_arm_set_sve, NULL, NULL, &error_fatal);
>  }
>
>  struct ARMCPUInfo {
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index b473c63edb1c..f07332bbda30 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -51,6 +51,11 @@ int kvm_arm_vcpu_init(CPUState *cs)
>  return kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_INIT, &init);
>  }
>
> +int kvm_arm_vcpu_finalize(CPUState *cs, int feature)
> +{
> +return kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_FINALIZE, &feature);
> +}
> +
>  void kvm_arm_init_serror_injection(CPUState *cs)
>  {
>  cap_has_inject_serror_esr = kvm_check_extension(cs->kvm_state,
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 4c0b11d105a4..850da1b5e6aa 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -602,6 +602,13 @@ bool kvm_arm_aarch32_supported(CPUState *cpu)
>  return kvm_check_extension(s, KVM_CAP_ARM_EL1_32BIT);
>  }
>
> +bool kvm_arm_sve_supported(CPUState *cpu)
> +{
> +KVMState *s = KVM_STATE(current_machine->accelerator);
> +
> +return kvm_check_extension(s, KVM_CAP_ARM_SVE);
> +}
> +
>  #define ARM_CPU_ID_MPIDR   3, 0, 0, 0, 5
>
>  int kvm_arch_init_vcpu(CPUState *cs)
> @@ -630,13 +637,17 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_EL1_32BIT;
>  }
>  if (!kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PMU_V3)) {
> -cpu->has_pmu = false;
> +cpu->has_pmu = false;
>  }
>  if (cpu->has_pmu) {
>  cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_PMU_V3;
>  } else {
>  unset_feature(&env->features, ARM_FEATURE_PMU);
>  }
> +if (cpu_isar_feature(aa64_sve, cpu)) {
> +assert(kvm_arm_sve_supported(cs));
> +cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_SVE;
> +}
>
>  /* Do KVM_ARM_VCPU_

Re: [PATCH v6 1/9] target/arm/monitor: Introduce qmp_query_cpu_model_expansion

2019-10-22 Thread Beata Michalska
Hi Andrew,

On Wed, 16 Oct 2019 at 09:59, Andrew Jones  wrote:
>
> Add support for the query-cpu-model-expansion QMP command to Arm. We
> do this selectively, only exposing CPU properties which represent
> optional CPU features which the user may want to enable/disable.
> Additionally we restrict the list of queryable cpu models to 'max',
> 'host', or the current type when KVM is in use. And, finally, we only
> implement expansion type 'full', as Arm does not yet have a "base"
> CPU type. More details and example queries are described in a new
> document (docs/arm-cpu-features.rst).
>
> Note, certainly more features may be added to the list of advertised
> features, e.g. 'vfp' and 'neon'. The only requirement is that we can
> detect invalid configurations and emit failures at QMP query time.
> For 'vfp' and 'neon' this will require some refactoring to share a
> validation function between the QMP query and the CPU realize
> functions.
>
> Signed-off-by: Andrew Jones 
> Reviewed-by: Richard Henderson 
> Reviewed-by: Eric Auger 
> ---

Reviewed-by: Beata Michalska 

Thanks.

BR
Beata
>  docs/arm-cpu-features.rst | 137 +++
>  qapi/machine-target.json  |   6 +-
>  target/arm/monitor.c  | 146 ++
>  3 files changed, 286 insertions(+), 3 deletions(-)
>  create mode 100644 docs/arm-cpu-features.rst
>
> diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
> new file mode 100644
> index ..c79dcffb5556
> --- /dev/null
> +++ b/docs/arm-cpu-features.rst
> @@ -0,0 +1,137 @@
> +
> +ARM CPU Features
> +
> +
> +Examples of probing and using ARM CPU features
> +
> +Introduction
> +
> +
> +CPU features are optional features that a CPU of supporting type may
> +choose to implement or not.  In QEMU, optional CPU features have
> +corresponding boolean CPU proprieties that, when enabled, indicate
> +that the feature is implemented, and, conversely, when disabled,
> +indicate that it is not implemented. An example of an ARM CPU feature
> +is the Performance Monitoring Unit (PMU).  CPU types such as the
> +Cortex-A15 and the Cortex-A57, which respectively implement ARM
> +architecture reference manuals ARMv7-A and ARMv8-A, may both optionally
> +implement PMUs.  For example, if a user wants to use a Cortex-A15 without
> +a PMU, then the `-cpu` parameter should contain `pmu=off` on the QEMU
> +command line, i.e. `-cpu cortex-a15,pmu=off`.
> +
> +As not all CPU types support all optional CPU features, then whether or
> +not a CPU property exists depends on the CPU type.  For example, CPUs
> +that implement the ARMv8-A architecture reference manual may optionally
> +support the AArch32 CPU feature, which may be enabled by disabling the
> +`aarch64` CPU property.  A CPU type such as the Cortex-A15, which does
> +not implement ARMv8-A, will not have the `aarch64` CPU property.
> +
> +QEMU's support may be limited for some CPU features, only partially
> +supporting the feature or only supporting the feature under certain
> +configurations.  For example, the `aarch64` CPU feature, which, when
> +disabled, enables the optional AArch32 CPU feature, is only supported
> +when using the KVM accelerator and when running on a host CPU type that
> +supports the feature.
> +
> +CPU Feature Probing
> +===
> +
> +Determining which CPU features are available and functional for a given
> +CPU type is possible with the `query-cpu-model-expansion` QMP command.
> +Below are some examples where `scripts/qmp/qmp-shell` (see the top comment
> +block in the script for usage) is used to issue the QMP commands.
> +
> +(1) Determine which CPU features are available for the `max` CPU type
> +(Note, we started QEMU with qemu-system-aarch64, so `max` is
> + implementing the ARMv8-A reference manual in this case)::
> +
> +  (QEMU) query-cpu-model-expansion type=full model={"name":"max"}
> +  { "return": {
> +"model": { "name": "max", "props": {
> +"pmu": true, "aarch64": true
> +  
> +
> +We see that the `max` CPU type has the `pmu` and `aarch64` CPU features.
> +We also see that the CPU features are enabled, as they are all `true`.
> +
> +(2) Let's try to disable the PMU::
> +
> +  (QEMU) query-cpu-model-expansion type=full 
> model={"name":"max","props":{"pmu":false}}
> +  { "return": {
> +"model": { "name": "max", "props": {
> +"pmu": false, "aarch64": t

Re: [PATCH v5 1/9] target/arm/monitor: Introduce qmp_query_cpu_model_expansion

2019-10-22 Thread Beata Michalska
Hi Andrew,

On Tue, 22 Oct 2019 at 14:43, Andrew Jones  wrote:
>
> On Mon, Oct 21, 2019 at 04:07:14PM +0100, Beata Michalska wrote:
> > Indeed, the patch got bit messed-up. Apologies for that as well.
> > I have been testing manually but I did try the test you have provided
> > and yes it fails - there is a slight problem with the case when qdict_in
> > is empty,but this can be easily solved still keeping the single loop.
> > Otherwise I have seen you have posted a new patchest so I guess we  are
> > dropping the idea of refactoring ?
>
> Well, without a patch that applies, I couldn't really evaluate your
> proposal. And, TBH, I'd rather not hold this series up on a refactoring
> that doesn't provide measurable performance improvements, especially when
> it's not in a performance critical path. Indeed, I'd like to get this
> series merged as soon as possible, which is why I posted v6 with your
> visit_free() fix already.
>
> >
> > One more question: in case of querying a property which is not supported
> > by given cpu model - we are returning properties that are actually valid
> > (the test case for cortex-a15 and aarch64 prop).
> > Shouldn't we return an error there? I honestly must admit I do not know
> > what is the expected behaviour for the qmp query in such cases.
>
> We do generate an error for that case:
>
> (QEMU) query-cpu-model-expansion type=full model={"name":"cortex-a15"}
> {"return": {"model": {"name": "cortex-a15", "props": {"pmu": true
>
> (QEMU) query-cpu-model-expansion type=full 
> model={"name":"cortex-a15","props":{"aarch64":false}}
> {"error": {"class": "GenericError", "desc": "Property '.aarch64' not found"}}
>
>
> If you have any more comments on the series, please send them right away.
> I'd like Peter to be able to merge this soon, and I understand that he's
> waiting on your review.
>

I think we can proceed with the v6 as it is.

Thanks a lot.

BR
Beata

> Thanks,
> drew
>



Re: [PATCH v6 4/9] target/arm/cpu64: max cpu: Introduce sve properties

2019-10-22 Thread Beata Michalska
Hi Andrew

On Wed, 16 Oct 2019 at 09:57, Andrew Jones  wrote:
>
> Introduce cpu properties to give fine control over SVE vector lengths.
> We introduce a property for each valid length up to the current
> maximum supported, which is 2048-bits. The properties are named, e.g.
> sve128, sve256, sve384, sve512, ..., where the number is the number of
> bits. See the updates to docs/arm-cpu-features.rst for a description
> of the semantics and for example uses.
>
> Note, as sve-max-vq is still present and we'd like to be able to
> support qmp_query_cpu_model_expansion with guests launched with e.g.
> -cpu max,sve-max-vq=8 on their command lines, then we do allow
> sve-max-vq and sve properties to be provided at the same time, but
> this is not recommended, and is why sve-max-vq is not mentioned in the
> document.  If sve-max-vq is provided then it enables all lengths smaller
> than and including the max and disables all lengths larger. It also has
> the side-effect that no larger lengths may be enabled and that the max
> itself cannot be disabled. Smaller non-power-of-two lengths may,
> however, be disabled, e.g. -cpu max,sve-max-vq=4,sve384=off provides a
> guest the vector lengths 128, 256, and 512 bits.
>
> This patch has been co-authored with Richard Henderson, who reworked
> the target/arm/cpu64.c changes in order to push all the validation and
> auto-enabling/disabling steps into the finalizer, resulting in a nice
> LOC reduction.
>
> Signed-off-by: Andrew Jones 
> Reviewed-by: Richard Henderson 
> Reviewed-by: Eric Auger 

Reviewed-by: Beata Michalska 

Thanks.

BR
Beata
> ---
>  docs/arm-cpu-features.rst | 168 +++--
>  include/qemu/bitops.h |   1 +
>  target/arm/cpu.c  |  19 
>  target/arm/cpu.h  |  19 
>  target/arm/cpu64.c| 192 -
>  target/arm/helper.c   |  10 +-
>  target/arm/monitor.c  |  12 +++
>  tests/arm-cpu-features.c  | 194 ++
>  8 files changed, 606 insertions(+), 9 deletions(-)
>
> diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
> index c79dcffb5556..2ea4d6e90c02 100644
> --- a/docs/arm-cpu-features.rst
> +++ b/docs/arm-cpu-features.rst
> @@ -48,18 +48,31 @@ block in the script for usage) is used to issue the QMP 
> commands.
>(QEMU) query-cpu-model-expansion type=full model={"name":"max"}
>{ "return": {
>  "model": { "name": "max", "props": {
> -"pmu": true, "aarch64": true
> +"sve1664": true, "pmu": true, "sve1792": true, "sve1920": true,
> +"sve128": true, "aarch64": true, "sve1024": true, "sve": true,
> +"sve640": true, "sve768": true, "sve1408": true, "sve256": true,
> +"sve1152": true, "sve512": true, "sve384": true, "sve1536": true,
> +"sve896": true, "sve1280": true, "sve2048": true
>
>
> -We see that the `max` CPU type has the `pmu` and `aarch64` CPU features.
> -We also see that the CPU features are enabled, as they are all `true`.
> +We see that the `max` CPU type has the `pmu`, `aarch64`, `sve`, and many
> +`sve` CPU features.  We also see that all the CPU features are
> +enabled, as they are all `true`.  (The `sve` CPU features are all
> +optional SVE vector lengths (see "SVE CPU Properties").  While with TCG
> +all SVE vector lengths can be supported, when KVM is in use it's more
> +likely that only a few lengths will be supported, if SVE is supported at
> +all.)
>
>  (2) Let's try to disable the PMU::
>
>(QEMU) query-cpu-model-expansion type=full 
> model={"name":"max","props":{"pmu":false}}
>{ "return": {
>  "model": { "name": "max", "props": {
> -"pmu": false, "aarch64": true
> +"sve1664": true, "pmu": false, "sve1792": true, "sve1920": true,
> +"sve128": true, "aarch64": true, "sve1024": true, "sve": true,
> +"sve640": true, "sve768": true, "sve1408": true, "sve256": true,
> +"sve1152": true, "sve512": true, "sve384": true, "sve1536": true,
> +"sve896": true, "sve1280": true, "sve2048": true
>
>
>  We see it worked

Re: [PATCH v6 7/9] target/arm/kvm: scratch vcpu: Preserve input kvm_vcpu_init features

2019-10-22 Thread Beata Michalska
HI Andrew

On Wed, 16 Oct 2019 at 09:57, Andrew Jones  wrote:
>
> kvm_arm_create_scratch_host_vcpu() takes a struct kvm_vcpu_init
> parameter. Rather than just using it as an output parameter to
> pass back the preferred target, use it also as an input parameter,
> allowing a caller to pass a selected target if they wish and to
> also pass cpu features. If the caller doesn't want to select a
> target they can pass -1 for the target which indicates they want
> to use the preferred target and have it passed back like before.
>
> Signed-off-by: Andrew Jones 
> Reviewed-by: Richard Henderson 
> Reviewed-by: Eric Auger 

Reviewed-by: Beata Michalska 

Thanks

BR
Beata
> ---
>  target/arm/kvm.c   | 20 +++-
>  target/arm/kvm32.c |  6 +-
>  target/arm/kvm64.c |  6 +-
>  3 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index f07332bbda30..5b82cefef608 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -66,7 +66,7 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
> *cpus_to_try,
>int *fdarray,
>struct kvm_vcpu_init *init)
>  {
> -int ret, kvmfd = -1, vmfd = -1, cpufd = -1;
> +int ret = 0, kvmfd = -1, vmfd = -1, cpufd = -1;
>
>  kvmfd = qemu_open("/dev/kvm", O_RDWR);
>  if (kvmfd < 0) {
> @@ -86,7 +86,14 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
> *cpus_to_try,
>  goto finish;
>  }
>
> -ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, init);
> +if (init->target == -1) {
> +struct kvm_vcpu_init preferred;
> +
> +ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, &preferred);
> +if (!ret) {
> +init->target = preferred.target;
> +}
> +}
>  if (ret >= 0) {
>  ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
>  if (ret < 0) {
> @@ -98,10 +105,12 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
> *cpus_to_try,
>   * creating one kind of guest CPU which is its preferred
>   * CPU type.
>   */
> +struct kvm_vcpu_init try;
> +
>  while (*cpus_to_try != QEMU_KVM_ARM_TARGET_NONE) {
> -init->target = *cpus_to_try++;
> -memset(init->features, 0, sizeof(init->features));
> -ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
> +try.target = *cpus_to_try++;
> +memcpy(try.features, init->features, sizeof(init->features));
> +ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, &try);
>  if (ret >= 0) {
>  break;
>  }
> @@ -109,6 +118,7 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
> *cpus_to_try,
>  if (ret < 0) {
>  goto err;
>  }
> +init->target = try.target;
>  } else {
>  /* Treat a NULL cpus_to_try argument the same as an empty
>   * list, which means we will fail the call since this must
> diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
> index 2451a2d4bbef..32bf8d6757c4 100644
> --- a/target/arm/kvm32.c
> +++ b/target/arm/kvm32.c
> @@ -53,7 +53,11 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures 
> *ahcf)
>  QEMU_KVM_ARM_TARGET_CORTEX_A15,
>  QEMU_KVM_ARM_TARGET_NONE
>  };
> -struct kvm_vcpu_init init;
> +/*
> + * target = -1 informs kvm_arm_create_scratch_host_vcpu()
> + * to use the preferred target
> + */
> +struct kvm_vcpu_init init = { .target = -1, };
>
>  if (!kvm_arm_create_scratch_host_vcpu(cpus_to_try, fdarray, &init)) {
>  return false;
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 850da1b5e6aa..c7ecefbed720 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -502,7 +502,11 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures 
> *ahcf)
>  KVM_ARM_TARGET_CORTEX_A57,
>  QEMU_KVM_ARM_TARGET_NONE
>  };
> -struct kvm_vcpu_init init;
> +/*
> + * target = -1 informs kvm_arm_create_scratch_host_vcpu()
> + * to use the preferred target
> + */
> +struct kvm_vcpu_init init = { .target = -1, };
>
>  if (!kvm_arm_create_scratch_host_vcpu(cpus_to_try, fdarray, &init)) {
>  return false;
> --
> 2.21.0
>
>



Re: [PATCH v6 3/9] target/arm: Allow SVE to be disabled via a CPU property

2019-10-22 Thread Beata Michalska
Hi Andrew

On Wed, 16 Oct 2019 at 09:57, Andrew Jones  wrote:
>
> Since 97a28b0eeac14 ("target/arm: Allow VFP and Neon to be disabled via
> a CPU property") we can disable the 'max' cpu model's VFP and neon
> features, but there's no way to disable SVE. Add the 'sve=on|off'
> property to give it that flexibility. We also rename
> cpu_max_get/set_sve_vq to cpu_max_get/set_sve_max_vq in order for them
> to follow the typical *_get/set_ pattern.
>
> Signed-off-by: Andrew Jones 
> Reviewed-by: Richard Henderson 
> Reviewed-by: Eric Auger 

Reviewed-by: Beata Michalska 

Thanks.

BR
Beata
> ---
>  target/arm/cpu.c |  3 ++-
>  target/arm/cpu64.c   | 52 ++--
>  target/arm/monitor.c |  2 +-
>  tests/arm-cpu-features.c |  1 +
>  4 files changed, 49 insertions(+), 9 deletions(-)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 13813fb21354..2a1e95e90df3 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -200,7 +200,8 @@ static void arm_cpu_reset(CPUState *s)
>  env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
>  env->cp15.cptr_el[3] |= CPTR_EZ;
>  /* with maximum vector length */
> -env->vfp.zcr_el[1] = cpu->sve_max_vq - 1;
> +env->vfp.zcr_el[1] = cpu_isar_feature(aa64_sve, cpu) ?
> + cpu->sve_max_vq - 1 : 0;
>  env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
>  env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
>  /*
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index d7f5bf610a7d..89a8ae77fe84 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -256,15 +256,23 @@ static void aarch64_a72_initfn(Object *obj)
>  define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
>  }
>
> -static void cpu_max_get_sve_vq(Object *obj, Visitor *v, const char *name,
> -   void *opaque, Error **errp)
> +static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
> +   void *opaque, Error **errp)
>  {
>  ARMCPU *cpu = ARM_CPU(obj);
> -visit_type_uint32(v, name, &cpu->sve_max_vq, errp);
> +uint32_t value;
> +
> +/* All vector lengths are disabled when SVE is off. */
> +if (!cpu_isar_feature(aa64_sve, cpu)) {
> +value = 0;
> +} else {
> +value = cpu->sve_max_vq;
> +}
> +visit_type_uint32(v, name, &value, errp);
>  }
>
> -static void cpu_max_set_sve_vq(Object *obj, Visitor *v, const char *name,
> -   void *opaque, Error **errp)
> +static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
> +   void *opaque, Error **errp)
>  {
>  ARMCPU *cpu = ARM_CPU(obj);
>  Error *err = NULL;
> @@ -279,6 +287,34 @@ static void cpu_max_set_sve_vq(Object *obj, Visitor *v, 
> const char *name,
>  error_propagate(errp, err);
>  }
>
> +static void cpu_arm_get_sve(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +ARMCPU *cpu = ARM_CPU(obj);
> +bool value = cpu_isar_feature(aa64_sve, cpu);
> +
> +visit_type_bool(v, name, &value, errp);
> +}
> +
> +static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +ARMCPU *cpu = ARM_CPU(obj);
> +Error *err = NULL;
> +bool value;
> +uint64_t t;
> +
> +visit_type_bool(v, name, &value, &err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
> +
> +t = cpu->isar.id_aa64pfr0;
> +t = FIELD_DP64(t, ID_AA64PFR0, SVE, value);
> +cpu->isar.id_aa64pfr0 = t;
> +}
> +
>  /* -cpu max: if KVM is enabled, like -cpu host (best possible with this 
> host);
>   * otherwise, a CPU with as many features enabled as our emulation supports.
>   * The version of '-cpu max' for qemu-system-arm is defined in cpu.c;
> @@ -391,8 +427,10 @@ static void aarch64_max_initfn(Object *obj)
>  #endif
>
>  cpu->sve_max_vq = ARM_MAX_VQ;
> -object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_vq,
> -cpu_max_set_sve_vq, NULL, NULL, &error_fatal);
> +object_property_add(obj, "sve-max-vq", "uint32", 
> cpu_max_get_sve_max_vq,
> +cpu_max_set_sve_max_vq, NULL, NULL, 
> &error_fatal);
> +object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
> +cpu_arm_set_sve, NULL, NULL, &error_fatal);
>  }
>  

Re: [PATCH v5 1/9] target/arm/monitor: Introduce qmp_query_cpu_model_expansion

2019-10-21 Thread Beata Michalska
On Wed, 16 Oct 2019 at 17:16, Andrew Jones  wrote:
>
> On Wed, Oct 16, 2019 at 04:16:57PM +0100, Beata Michalska wrote:
> > On Wed, 16 Oct 2019 at 14:50, Andrew Jones  wrote:
> > >
> > > On Wed, Oct 16, 2019 at 02:24:50PM +0100, Beata Michalska wrote:
> > > > On Tue, 15 Oct 2019 at 12:56, Beata Michalska
> > > >  wrote:
> > > > >
> > > > > On Tue, 15 Oct 2019 at 11:56, Andrew Jones  wrote:
> > > > > >
> > > > > > On Tue, Oct 15, 2019 at 10:59:16AM +0100, Beata Michalska wrote:
> > > > > > > On Tue, 1 Oct 2019 at 14:04, Andrew Jones  
> > > > > > > wrote:
> > > > > > > > +
> > > > > > > > +obj = object_new(object_class_get_name(oc));
> > > > > > > > +
> > > > > > > > +if (qdict_in) {
> > > > > > > > +Visitor *visitor;
> > > > > > > > +Error *err = NULL;
> > > > > > > > +
> > > > > > > > +visitor = qobject_input_visitor_new(model->props);
> > > > > > > > +visit_start_struct(visitor, NULL, NULL, 0, &err);
> > > > > > > > +if (err) {
> > > > > > > > +object_unref(obj);
> > > > > > >
> > > > > > > Shouldn't we free the 'visitor' here as well ?
> > > > > >
> > > > > > Yes. Good catch. So we also need to fix
> > > > > > target/s390x/cpu_models.c:cpu_model_from_info(), which has the same
> > > > > > construction (the construction from which I derived this)
> > > > > >
> > > > > > >
> > > > > > > > +error_propagate(errp, err);
> > > > > > > > +return NULL;
> > > > > > > > +}
> > > > > > > > +
> > > > > >
> > > > > > What about the rest of the patch? With that fixed for v6 can I
> > > > > > add your r-b?
> > > > > >
> > > > >
> > > > > I still got this feeling that we could optimize that a bit - which I'm
> > > > > currently on, so hopefully I'll be able to add more comments soon if
> > > > > that proves to be the case.
> > > > >
> > > > > BR
> > > > > Beata
> > > >
> > > > I think there are few options that might be considered though the gain
> > > > is not huge .. but it's always smth:
> > > >
> > > > > +CpuModelExpansionInfo 
> > > > > *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
> > > > > + CpuModelInfo 
> > > > > *model,
> > > > > + Error **errp)
> > > > > +{
> > > > > +CpuModelExpansionInfo *expansion_info;
> > > > > +const QDict *qdict_in = NULL;
> > > > > +QDict *qdict_out;
> > > > > +ObjectClass *oc;
> > > > > +Object *obj;
> > > > > +const char *name;
> > > > > +int i;
> > > > > +
> > > > > +if (type != CPU_MODEL_EXPANSION_TYPE_FULL) {
> > > > > +error_setg(errp, "The requested expansion type is not 
> > > > > supported");
> > > > > +return NULL;
> > > > > +}
> > > > > +
> > > > > +if (!kvm_enabled() && !strcmp(model->name, "host")) {
> > > > > +error_setg(errp, "The CPU type '%s' requires KVM", 
> > > > > model->name);
> > > > > +return NULL;
> > > > > +}
> > > > > +
> > > > > +oc = cpu_class_by_name(TYPE_ARM_CPU, model->name);
> > > > > +if (!oc) {
> > > > > +error_setg(errp, "The CPU type '%s' is not a recognized ARM 
> > > > > CPU type",
> > > > > +   model->name);
> > > > > +return NULL;
> > > > > +}
> > > > > +
> > > > > +if (kvm_enabled()) {
> > > > > +const char *cpu_type = current_machine->cpu_type;
> > > > > +int len = strlen(cpu_ty

Re: [PATCH v5 1/9] target/arm/monitor: Introduce qmp_query_cpu_model_expansion

2019-10-16 Thread Beata Michalska
On Wed, 16 Oct 2019 at 14:50, Andrew Jones  wrote:
>
> On Wed, Oct 16, 2019 at 02:24:50PM +0100, Beata Michalska wrote:
> > On Tue, 15 Oct 2019 at 12:56, Beata Michalska
> >  wrote:
> > >
> > > On Tue, 15 Oct 2019 at 11:56, Andrew Jones  wrote:
> > > >
> > > > On Tue, Oct 15, 2019 at 10:59:16AM +0100, Beata Michalska wrote:
> > > > > On Tue, 1 Oct 2019 at 14:04, Andrew Jones  wrote:
> > > > > > +
> > > > > > +obj = object_new(object_class_get_name(oc));
> > > > > > +
> > > > > > +if (qdict_in) {
> > > > > > +Visitor *visitor;
> > > > > > +Error *err = NULL;
> > > > > > +
> > > > > > +visitor = qobject_input_visitor_new(model->props);
> > > > > > +visit_start_struct(visitor, NULL, NULL, 0, &err);
> > > > > > +if (err) {
> > > > > > +object_unref(obj);
> > > > >
> > > > > Shouldn't we free the 'visitor' here as well ?
> > > >
> > > > Yes. Good catch. So we also need to fix
> > > > target/s390x/cpu_models.c:cpu_model_from_info(), which has the same
> > > > construction (the construction from which I derived this)
> > > >
> > > > >
> > > > > > +error_propagate(errp, err);
> > > > > > +return NULL;
> > > > > > +}
> > > > > > +
> > > >
> > > > What about the rest of the patch? With that fixed for v6 can I
> > > > add your r-b?
> > > >
> > >
> > > I still got this feeling that we could optimize that a bit - which I'm
> > > currently on, so hopefully I'll be able to add more comments soon if
> > > that proves to be the case.
> > >
> > > BR
> > > Beata
> >
> > I think there are few options that might be considered though the gain
> > is not huge .. but it's always smth:
> >
> > > +CpuModelExpansionInfo 
> > > *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
> > > + CpuModelInfo *model,
> > > + Error **errp)
> > > +{
> > > +CpuModelExpansionInfo *expansion_info;
> > > +const QDict *qdict_in = NULL;
> > > +QDict *qdict_out;
> > > +ObjectClass *oc;
> > > +Object *obj;
> > > +const char *name;
> > > +int i;
> > > +
> > > +if (type != CPU_MODEL_EXPANSION_TYPE_FULL) {
> > > +error_setg(errp, "The requested expansion type is not 
> > > supported");
> > > +return NULL;
> > > +}
> > > +
> > > +if (!kvm_enabled() && !strcmp(model->name, "host")) {
> > > +error_setg(errp, "The CPU type '%s' requires KVM", model->name);
> > > +return NULL;
> > > +}
> > > +
> > > +oc = cpu_class_by_name(TYPE_ARM_CPU, model->name);
> > > +if (!oc) {
> > > +error_setg(errp, "The CPU type '%s' is not a recognized ARM CPU 
> > > type",
> > > +   model->name);
> > > +return NULL;
> > > +}
> > > +
> > > +if (kvm_enabled()) {
> > > +const char *cpu_type = current_machine->cpu_type;
> > > +int len = strlen(cpu_type) - strlen(ARM_CPU_TYPE_SUFFIX);
> > > +bool supported = false;
> > > +
> > > +if (!strcmp(model->name, "host") || !strcmp(model->name, "max")) 
> > > {
> > > +/* These are kvmarm's recommended cpu types */
> > > +supported = true;
> > > +} else if (strlen(model->name) == len &&
> > > +   !strncmp(model->name, cpu_type, len)) {
> > > +/* KVM is enabled and we're using this type, so it works. */
> > > +supported = true;
> > > +}
> > > +if (!supported) {
> > > +error_setg(errp, "We cannot guarantee the CPU type '%s' 
> > > works "
> > > + "with KVM on this host", model->name);
> > > +return NULL;
> > > +}
> > > +}
> >

Re: [PATCH v5 1/9] target/arm/monitor: Introduce qmp_query_cpu_model_expansion

2019-10-16 Thread Beata Michalska
On Tue, 15 Oct 2019 at 12:56, Beata Michalska
 wrote:
>
> On Tue, 15 Oct 2019 at 11:56, Andrew Jones  wrote:
> >
> > On Tue, Oct 15, 2019 at 10:59:16AM +0100, Beata Michalska wrote:
> > > On Tue, 1 Oct 2019 at 14:04, Andrew Jones  wrote:
> > > > +
> > > > +obj = object_new(object_class_get_name(oc));
> > > > +
> > > > +if (qdict_in) {
> > > > +Visitor *visitor;
> > > > +Error *err = NULL;
> > > > +
> > > > +visitor = qobject_input_visitor_new(model->props);
> > > > +visit_start_struct(visitor, NULL, NULL, 0, &err);
> > > > +if (err) {
> > > > +object_unref(obj);
> > >
> > > Shouldn't we free the 'visitor' here as well ?
> >
> > Yes. Good catch. So we also need to fix
> > target/s390x/cpu_models.c:cpu_model_from_info(), which has the same
> > construction (the construction from which I derived this)
> >
> > >
> > > > +error_propagate(errp, err);
> > > > +return NULL;
> > > > +}
> > > > +
> >
> > What about the rest of the patch? With that fixed for v6 can I
> > add your r-b?
> >
>
> I still got this feeling that we could optimize that a bit - which I'm
> currently on, so hopefully I'll be able to add more comments soon if
> that proves to be the case.
>
> BR
> Beata

I think there are a few options that might be considered, though the gain
is not huge ... but it's always something:

> +CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType 
> type,
> + CpuModelInfo *model,
> + Error **errp)
> +{
> +CpuModelExpansionInfo *expansion_info;
> +const QDict *qdict_in = NULL;
> +QDict *qdict_out;
> +ObjectClass *oc;
> +Object *obj;
> +const char *name;
> +int i;
> +
> +if (type != CPU_MODEL_EXPANSION_TYPE_FULL) {
> +error_setg(errp, "The requested expansion type is not supported");
> +return NULL;
> +}
> +
> +if (!kvm_enabled() && !strcmp(model->name, "host")) {
> +error_setg(errp, "The CPU type '%s' requires KVM", model->name);
> +return NULL;
> +}
> +
> +oc = cpu_class_by_name(TYPE_ARM_CPU, model->name);
> +if (!oc) {
> +error_setg(errp, "The CPU type '%s' is not a recognized ARM CPU 
> type",
> +   model->name);
> +return NULL;
> +}
> +
> +if (kvm_enabled()) {
> +const char *cpu_type = current_machine->cpu_type;
> +int len = strlen(cpu_type) - strlen(ARM_CPU_TYPE_SUFFIX);
> +bool supported = false;
> +
> +if (!strcmp(model->name, "host") || !strcmp(model->name, "max")) {
> +/* These are kvmarm's recommended cpu types */
> +supported = true;
> +} else if (strlen(model->name) == len &&
> +   !strncmp(model->name, cpu_type, len)) {
> +/* KVM is enabled and we're using this type, so it works. */
> +supported = true;
> +}
> +if (!supported) {
> +error_setg(errp, "We cannot guarantee the CPU type '%s' works "
> + "with KVM on this host", model->name);
> +return NULL;
> +}
> +}
> +

The above section can be slightly reduced and rearranged - preferably
moved to a separate function
-> get_cpu_model (...) ?

* You can check the 'host' model first and then validate the accelerator ->
if (!strcmp(model->name, "host"))
    if (!kvm_enabled())
        log_error & leave
    else
        goto cpu_class_by_name /* cpu_class_by_name moved after the
final model check @see below */

* the kvm_enabled section can then be slightly improved (dropping the
second compare against 'host')

  if (kvm_enabled() && strcmp(model->name, "max")) {
      /* Validate the current_machine->cpu_type against the
         model->name and report an error in case of mismatch,
         otherwise just fall through */
  }
 * cpu_class_by_name moved here ...
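A rough, self-contained sketch of the proposed ordering (stub state standing in for QEMU's kvm_enabled()/current_machine, and a hypothetical validate_model_name() helper), just to illustrate the control flow:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ARM_CPU_TYPE_SUFFIX "-arm-cpu"

/* Stand-ins for QEMU state -- purely illustrative. */
static bool kvm_on;
static const char *machine_cpu_type = "cortex-a57" ARM_CPU_TYPE_SUFFIX;

/*
 * Proposed ordering: reject "host" without KVM first, then (under KVM)
 * accept only "max" or the current machine type; the actual
 * cpu_class_by_name() lookup would happen after these checks.
 */
static bool validate_model_name(const char *name, const char **err)
{
    *err = NULL;
    if (!strcmp(name, "host")) {
        if (!kvm_on) {
            *err = "The CPU type 'host' requires KVM";
            return false;
        }
        return true; /* fall through to the class lookup */
    }
    if (kvm_on && strcmp(name, "max")) {
        size_t len = strlen(machine_cpu_type) - strlen(ARM_CPU_TYPE_SUFFIX);
        if (strlen(name) != len || strncmp(name, machine_cpu_type, len)) {
            *err = "CPU type not guaranteed to work with KVM on this host";
            return false;
        }
    }
    return true;
}
```

This drops the second "host" comparison, as the suggestion describes; everything else keeps the semantics of the original checks.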
> +if (model->props) {
Minor: the CPUModelInfo seems to have a dedicated field for that
verification -> has_props

> +qdict_in = qobject_to(QDict, model->props);
> +if (!qdict_in) {
> +error_setg(errp, QERR_INVALID_PARAMETER_TYPE, "props", "dict");
> +return NULL;
> 
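For context, the leak under discussion is in the early error return, which unrefs the object but never frees the input visitor. The shape of the fix can be illustrated with a self-contained toy (stand-in types and counters, not the real QAPI visitor API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_visitors, live_objects;

typedef struct { int dummy; } Visitor;
typedef struct { int dummy; } Object;

static Visitor *visitor_new(void)    { live_visitors++; return malloc(sizeof(Visitor)); }
static void visitor_free(Visitor *v) { live_visitors--; free(v); }
static Object *object_new(void)      { live_objects++; return malloc(sizeof(Object)); }
static void object_unref(Object *o)  { live_objects--; free(o); }

/* Parse step that can fail, standing in for visit_start_struct(). */
static Object *expand_model(bool fail)
{
    Object *obj = object_new();
    Visitor *v = visitor_new();

    if (fail) {
        /* Error path: release *both* resources before bailing out. */
        visitor_free(v);
        object_unref(obj);
        return NULL;
    }
    visitor_free(v);
    return obj;
}
```

The same two-release pattern would apply to the s390x cpu_model_from_info() construction mentioned above.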

Re: [PATCH v5 1/9] target/arm/monitor: Introduce qmp_query_cpu_model_expansion

2019-10-15 Thread Beata Michalska
On Tue, 15 Oct 2019 at 11:56, Andrew Jones  wrote:
>
> On Tue, Oct 15, 2019 at 10:59:16AM +0100, Beata Michalska wrote:
> > On Tue, 1 Oct 2019 at 14:04, Andrew Jones  wrote:
> > > +
> > > +obj = object_new(object_class_get_name(oc));
> > > +
> > > +if (qdict_in) {
> > > +Visitor *visitor;
> > > +Error *err = NULL;
> > > +
> > > +visitor = qobject_input_visitor_new(model->props);
> > > +visit_start_struct(visitor, NULL, NULL, 0, &err);
> > > +if (err) {
> > > +object_unref(obj);
> >
> > Shouldn't we free the 'visitor' here as well ?
>
> Yes. Good catch. So we also need to fix
> target/s390x/cpu_models.c:cpu_model_from_info(), which has the same
> construction (the construction from which I derived this)
>
> >
> > > +error_propagate(errp, err);
> > > +return NULL;
> > > +}
> > > +
>
> What about the rest of the patch? With that fixed for v6 can I
> add your r-b?
>

I still got this feeling that we could optimize that a bit - which I'm
currently on, so hopefully I'll be able to add more comments soon if
that proves to be the case.

BR
Beata

> Thanks,
> drew



Re: [PATCH v5 1/9] target/arm/monitor: Introduce qmp_query_cpu_model_expansion

2019-10-15 Thread Beata Michalska
On Tue, 1 Oct 2019 at 14:04, Andrew Jones  wrote:
>
> Add support for the query-cpu-model-expansion QMP command to Arm. We
> do this selectively, only exposing CPU properties which represent
> optional CPU features which the user may want to enable/disable.
> Additionally we restrict the list of queryable cpu models to 'max',
> 'host', or the current type when KVM is in use. And, finally, we only
> implement expansion type 'full', as Arm does not yet have a "base"
> CPU type. More details and example queries are described in a new
> document (docs/arm-cpu-features.rst).
>
> Note, certainly more features may be added to the list of advertised
> features, e.g. 'vfp' and 'neon'. The only requirement is that we can
> detect invalid configurations and emit failures at QMP query time.
> For 'vfp' and 'neon' this will require some refactoring to share a
> validation function between the QMP query and the CPU realize
> functions.
>
> Signed-off-by: Andrew Jones 
> Reviewed-by: Richard Henderson 
> Reviewed-by: Eric Auger 
> ---
>  docs/arm-cpu-features.rst | 137 +++
>  qapi/machine-target.json  |   6 +-
>  target/arm/monitor.c  | 145 ++
>  3 files changed, 285 insertions(+), 3 deletions(-)
>  create mode 100644 docs/arm-cpu-features.rst
>
> diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
> new file mode 100644
> index ..c79dcffb5556
> --- /dev/null
> +++ b/docs/arm-cpu-features.rst
> @@ -0,0 +1,137 @@
> +
> +ARM CPU Features
> +
> +
> +Examples of probing and using ARM CPU features
> +
> +Introduction
> +
> +
> +CPU features are optional features that a CPU of supporting type may
> +choose to implement or not.  In QEMU, optional CPU features have
> +corresponding boolean CPU proprieties that, when enabled, indicate
> +that the feature is implemented, and, conversely, when disabled,
> +indicate that it is not implemented. An example of an ARM CPU feature
> +is the Performance Monitoring Unit (PMU).  CPU types such as the
> +Cortex-A15 and the Cortex-A57, which respectively implement ARM
> +architecture reference manuals ARMv7-A and ARMv8-A, may both optionally
> +implement PMUs.  For example, if a user wants to use a Cortex-A15 without
> +a PMU, then the `-cpu` parameter should contain `pmu=off` on the QEMU
> +command line, i.e. `-cpu cortex-a15,pmu=off`.
> +
> +As not all CPU types support all optional CPU features, then whether or
> +not a CPU property exists depends on the CPU type.  For example, CPUs
> +that implement the ARMv8-A architecture reference manual may optionally
> +support the AArch32 CPU feature, which may be enabled by disabling the
> +`aarch64` CPU property.  A CPU type such as the Cortex-A15, which does
> +not implement ARMv8-A, will not have the `aarch64` CPU property.
> +
> +QEMU's support may be limited for some CPU features, only partially
> +supporting the feature or only supporting the feature under certain
> +configurations.  For example, the `aarch64` CPU feature, which, when
> +disabled, enables the optional AArch32 CPU feature, is only supported
> +when using the KVM accelerator and when running on a host CPU type that
> +supports the feature.
> +
> +CPU Feature Probing
> +===
> +
> +Determining which CPU features are available and functional for a given
> +CPU type is possible with the `query-cpu-model-expansion` QMP command.
> +Below are some examples where `scripts/qmp/qmp-shell` (see the top comment
> +block in the script for usage) is used to issue the QMP commands.
> +
> +(1) Determine which CPU features are available for the `max` CPU type
> +(Note, we started QEMU with qemu-system-aarch64, so `max` is
> + implementing the ARMv8-A reference manual in this case)::
> +
> +  (QEMU) query-cpu-model-expansion type=full model={"name":"max"}
> +  { "return": {
> +"model": { "name": "max", "props": {
> +"pmu": true, "aarch64": true
> +  
> +
> +We see that the `max` CPU type has the `pmu` and `aarch64` CPU features.
> +We also see that the CPU features are enabled, as they are all `true`.
> +
> +(2) Let's try to disable the PMU::
> +
> +  (QEMU) query-cpu-model-expansion type=full 
> model={"name":"max","props":{"pmu":false}}
> +  { "return": {
> +"model": { "name": "max", "props": {
> +"pmu": false, "aarch64": true
> +  
> +
> +We see it worked, as `pmu` is now `false`.
> +
> +(3) Let's try to disable `aarch64`, which enables the AArch32 CPU feature::
> +
> +  (QEMU) query-cpu-model-expansion type=full 
> model={"name":"max","props":{"aarch64":false}}
> +  {"error": {
> +   "class": "GenericError", "desc":
> +   "'aarch64' feature cannot be disabled unless KVM is enabled and 
> 32-bit EL1 is supported"
> +  }}
> +
> +It looks like this feature is limited to a configuration we do not
> +currently have.
> +
> +(4) Let's try probing CPU 

Re: [PATCH v5 4/9] target/arm/cpu64: max cpu: Introduce sve properties

2019-10-09 Thread Beata Michalska
On Tue, 1 Oct 2019 at 14:04, Andrew Jones  wrote:
>
> Introduce cpu properties to give fine control over SVE vector lengths.
> We introduce a property for each valid length up to the current
> maximum supported, which is 2048-bits. The properties are named, e.g.
> sve128, sve256, sve384, sve512, ..., where the number is the number of
> bits. See the updates to docs/arm-cpu-features.rst for a description
> of the semantics and for example uses.
>
> Note, as sve-max-vq is still present and we'd like to be able to
> support qmp_query_cpu_model_expansion with guests launched with e.g.
> -cpu max,sve-max-vq=8 on their command lines, then we do allow
> sve-max-vq and sve properties to be provided at the same time, but
> this is not recommended, and is why sve-max-vq is not mentioned in the
> document.  If sve-max-vq is provided then it enables all lengths smaller
> than and including the max and disables all lengths larger. It also has
> the side-effect that no larger lengths may be enabled and that the max
> itself cannot be disabled. Smaller non-power-of-two lengths may,
> however, be disabled, e.g. -cpu max,sve-max-vq=4,sve384=off provides a
> guest the vector lengths 128, 256, and 512 bits.
>
> This patch has been co-authored with Richard Henderson, who reworked
> the target/arm/cpu64.c changes in order to push all the validation and
> auto-enabling/disabling steps into the finalizer, resulting in a nice
> LOC reduction.
>

I have most probably missed part of the previous discussions around the vector
lengths, so apologies if the question is not relevant anymore, but ...
why has the idea of a bitmap representation for those been dropped?
Although the proposed solution indeed provides fine control over the vector
lengths, it also adds extra logic for handling corner cases and makes specifying
those on the command line rather cumbersome in some cases.
What if we could reconsider bitmaps and add an option for sve with a 'help'
switch to query the available options and present them (or just a subset
as an example) with additional information on how to interpret/modify it?
Something like:
    -cpu max,sve=help
which would print the bitmap of available lengths, with a note on what each bit
represents and which ones can be modified.
Then it should be pretty straightforward to enable/disable the selected lengths.
This could potentially simplify things a bit.
BR
Beata
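To make the bitmap idea concrete, here is a toy sketch (not QEMU code), where bit N set means the (N + 1) * 128-bit vector length is enabled; an sve=help handler would simply walk the map and print each bit:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SVE_MAX_VQ 16 /* up to 2048-bit vectors */

/* Bit N <=> vector length (N + 1) * 128 bits. */
static unsigned bit_to_len(unsigned bit) { return (bit + 1) * 128; }

static void sve_enable(uint16_t *map, unsigned len)  { *map |= 1u << (len / 128 - 1); }
static void sve_disable(uint16_t *map, unsigned len) { *map &= ~(1u << (len / 128 - 1)); }
static int  sve_enabled(uint16_t map, unsigned len)  { return (map >> (len / 128 - 1)) & 1; }

/* What a hypothetical -cpu max,sve=help could print. */
static void sve_print_help(uint16_t map)
{
    for (unsigned bit = 0; bit < SVE_MAX_VQ; bit++) {
        printf("bit %2u: %4u bits  [%s]\n", bit, bit_to_len(bit),
               (map >> bit) & 1 ? "on" : "off");
    }
}
```

This is only an illustration of the representation being suggested, not of how the per-length properties in the patch actually work.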

> Signed-off-by: Andrew Jones 
> ---
>  docs/arm-cpu-features.rst | 168 +++--
>  include/qemu/bitops.h |   1 +
>  target/arm/cpu.c  |  19 
>  target/arm/cpu.h  |  19 
>  target/arm/cpu64.c| 192 -
>  target/arm/helper.c   |  10 +-
>  target/arm/monitor.c  |  12 +++
>  tests/arm-cpu-features.c  | 194 ++
>  8 files changed, 606 insertions(+), 9 deletions(-)
>
> diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
> index c79dcffb5556..2ea4d6e90c02 100644
> --- a/docs/arm-cpu-features.rst
> +++ b/docs/arm-cpu-features.rst
> @@ -48,18 +48,31 @@ block in the script for usage) is used to issue the QMP 
> commands.
>(QEMU) query-cpu-model-expansion type=full model={"name":"max"}
>{ "return": {
>  "model": { "name": "max", "props": {
> -"pmu": true, "aarch64": true
> +"sve1664": true, "pmu": true, "sve1792": true, "sve1920": true,
> +"sve128": true, "aarch64": true, "sve1024": true, "sve": true,
> +"sve640": true, "sve768": true, "sve1408": true, "sve256": true,
> +"sve1152": true, "sve512": true, "sve384": true, "sve1536": true,
> +"sve896": true, "sve1280": true, "sve2048": true
>
>
> -We see that the `max` CPU type has the `pmu` and `aarch64` CPU features.
> -We also see that the CPU features are enabled, as they are all `true`.
> +We see that the `max` CPU type has the `pmu`, `aarch64`, `sve`, and many
> +`sve` CPU features.  We also see that all the CPU features are
> +enabled, as they are all `true`.  (The `sve` CPU features are all
> +optional SVE vector lengths (see "SVE CPU Properties").  While with TCG
> +all SVE vector lengths can be supported, when KVM is in use it's more
> +likely that only a few lengths will be supported, if SVE is supported at
> +all.)
>
>  (2) Let's try to disable the PMU::
>
>(QEMU) query-cpu-model-expansion type=full 
> model={"name":"max","props":{"pmu":false}}
>{ "return": {
>  "model": { "name": "max", "props": {
> -"pmu": false, "aarch64": true
> +"sve1664": true, "pmu": false, "sve1792": true, "sve1920": true,
> +"sve128": true, "aarch64": true, "sve1024": true, "sve": true,
> +"sve640": true, "sve768": true, "sve1408": true, "sve256": true,
> +"sve1152": true, "sve512": true, "sve384": true, "sve1536": true,
> +"sve896": true, "sve1280": true, "sve2048": true
>
>
>  We see it worked, as `pmu` is now 

Re: [Qemu-devel] [PATCH 4/4] target/arm: Add support for DC CVAP & DC CVADP ins

2019-10-09 Thread Beata Michalska
On Tue, 24 Sep 2019 at 02:16, Alex Bennée  wrote:
>
>
> Beata Michalska  writes:
>
> > ARMv8.2 introduced support for Data Cache Clean instructions
> > to PoP (point-of-persistence) - DC CVAP and PoDP (point-of-deep-persistence)
> > - DV CVADP. Both specify conceptual points in a memory system where all 
> > writes
> > that are to reach them are considered persistent.
> > The support provided considers both to be actually the same so there is no
> > distinction between the two. If none is available (there is no backing store
> > for given memory) both will result in Data Cache Clean up to the point of
> > coherency. Otherwise sync for the specified range shall be performed.
> >
> > Signed-off-by: Beata Michalska 
> > ---
> >  linux-user/elfload.c   | 18 +-
> >  target/arm/cpu.h   | 13 -
> >  target/arm/cpu64.c |  1 +
> >  target/arm/helper.c| 24 
> >  target/arm/helper.h|  1 +
> >  target/arm/op_helper.c | 36 
> >  target/arm/translate-a64.c |  5 +
> >  7 files changed, 96 insertions(+), 2 deletions(-)
> >
> > diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> > index 3365e192eb..1ec00308d5 100644
> > --- a/linux-user/elfload.c
> > +++ b/linux-user/elfload.c
> > @@ -609,7 +609,12 @@ enum {
> >  ARM_HWCAP_A64_PACG  = 1UL << 31,
> >  };
> >
> > +enum {
> > +ARM_HWCAP2_A64_DCPODP   = 1 << 0,
> > +};
> > +
> >  #define ELF_HWCAP get_elf_hwcap()
> > +#define ELF_HWCAP2 get_elf_hwcap2()
> >
> >  static uint32_t get_elf_hwcap(void)
> >  {
> > @@ -644,12 +649,23 @@ static uint32_t get_elf_hwcap(void)
> >  GET_FEATURE_ID(aa64_jscvt, ARM_HWCAP_A64_JSCVT);
> >  GET_FEATURE_ID(aa64_sb, ARM_HWCAP_A64_SB);
> >  GET_FEATURE_ID(aa64_condm_4, ARM_HWCAP_A64_FLAGM);
> > +GET_FEATURE_ID(aa64_dcpop, ARM_HWCAP_A64_DCPOP);
> >
> > -#undef GET_FEATURE_ID
> >
> >  return hwcaps;
> >  }
> >
> > +static uint32_t get_elf_hwcap2(void)
> > +{
> > +ARMCPU *cpu = ARM_CPU(thread_cpu);
> > +uint32_t hwcaps = 0;
> > +
> > +GET_FEATURE_ID(aa64_dcpodp, ARM_HWCAP2_A64_DCPODP);
> > +return hwcaps;
> > +}
> > +
> > +#undef GET_FEATURE_ID
> > +
> >  #endif /* not TARGET_AARCH64 */
> >  #endif /* TARGET_ARM */
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index 297ad5e47a..1713d76590 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -2229,7 +2229,8 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t 
> > cpregid)
> >  #define ARM_CP_NZCV  (ARM_CP_SPECIAL | 0x0300)
> >  #define ARM_CP_CURRENTEL (ARM_CP_SPECIAL | 0x0400)
> >  #define ARM_CP_DC_ZVA(ARM_CP_SPECIAL | 0x0500)
> > -#define ARM_LAST_SPECIAL ARM_CP_DC_ZVA
> > +#define ARM_CP_DC_CVAP   (ARM_CP_SPECIAL | 0x0600)
> > +#define ARM_LAST_SPECIAL ARM_CP_DC_CVAP
> >  #define ARM_CP_FPU   0x1000
> >  #define ARM_CP_SVE   0x2000
> >  #define ARM_CP_NO_GDB0x4000
> > @@ -3572,6 +3573,16 @@ static inline bool isar_feature_aa64_frint(const 
> > ARMISARegisters *id)
> >  return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FRINTTS) != 0;
> >  }
> >
> > +static inline bool isar_feature_aa64_dcpop(const ARMISARegisters *id)
> > +{
> > +return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) != 0;
> > +}
> > +
> > +static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
> > +{
> > +return (FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >> 1) != 0;
> > +}
> > +
> >  static inline bool isar_feature_aa64_fp16(const ARMISARegisters *id)
> >  {
> >  /* We always set the AdvSIMD and FP fields identically wrt FP16.  */
> > diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> > index d7f5bf610a..20094f980d 100644
> > --- a/target/arm/cpu64.c
> > +++ b/target/arm/cpu64.c
> > @@ -331,6 +331,7 @@ static void aarch64_max_initfn(Object *obj)
> >  cpu->isar.id_aa64isar0 = t;
> >
> >  t = cpu->isar.id_aa64isar1;
> > +t = FIELD_DP64(t, ID_AA64ISAR1, DPB, 2);
> >  t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);
> >  t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
> >  t = FIELD_DP64(t, ID_AA64ISAR1, APA, 1); /* PAuth, architected 
> > only */
> > diff -

Re: [Qemu-devel] [PATCH 2/4] Memory: Enable writeback for given memory region

2019-10-09 Thread Beata Michalska
On Tue, 24 Sep 2019 at 17:30, Richard Henderson
 wrote:
>
> On 9/10/19 2:56 AM, Beata Michalska wrote:
> > +void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t 
> > length);
> > +
> > +/* Clear whole block of mem */
> > +#define qemu_ram_block_writeback(rb)\
> > +qemu_ram_writeback(rb, 0, rb->used_length)
> > +
>
> Inline function, with its typechecking, is preferred.
>
>
Noted.
To be fixed in the next version.

BR
Beata
> r~



Re: [Qemu-devel] [PATCH 2/4] Memory: Enable writeback for given memory region

2019-10-09 Thread Beata Michalska
On Tue, 24 Sep 2019 at 17:28, Richard Henderson
 wrote:
>
> On 9/10/19 2:56 AM, Beata Michalska wrote:
> > +int main(void) {
> > +#if defined(_POSIX_MAPPED_FILES) && _POSIX_MAPPED_FILES > 0 \
> > +&& defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
> > +return msync(NULL,0, MS_SYNC);
> > +#else
> > +#error Not supported
> > +#endif
> > +}
>
> Is there any particular reason to check _POSIX_MAPPED_FILES &
> _POSIX_SYNCHRONIZED_IO?  IIRC, you can use those to "safely" use MS_SYNC.  But
> this is a configure test, and an error is in fact our defined failure case, so
> "safely" doesn't seem particularly relevant.
>
> Alternately, do we even support any systems (besides perhaps windows) that do
> not provide POSIX-2001 support, and so include msync + MS_SYNC?  My first 
> guess
> is that we don't.
>

Both flags are there to verify support for msync itself.
The check there is for POSIX systems, where the msync call is available
if both are set to a value greater than '0'.
AFAIK Windows is the only POSIX non-compliant system being supported, though
I might be wrong (?)
I might just drop the check here and use CONFIG_POSIX to handle the
msync call instead.

> > +msync((void *)((uintptr_t)addr & qemu_host_page_mask),
> > +   HOST_PAGE_ALIGN(length), MS_SYNC);
>
> This isn't quite right.  If you move addr down to a lower address via this 
> page
> mask, you must also increase length by the same amount, and only afterward
> increase length to the host page size.
>
> Consider addr == 0xffffff, length = 2. This covers two pages, so you'd expect
> the final parameters to be, for 4k page size, 0xfff000, 0x2000.
>
Thanks for catching this - guess I was too focused on the cache line
size, which would not cross page boundaries. Will fix that in the next version.

> r~

Thank you.

BR
Beata
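The fix Richard describes is: round the address down first, grow the length by the same delta, and only then round the length up to a page multiple. A self-contained sketch of that arithmetic (4 KiB pages assumed, standing in for qemu_host_page_mask / HOST_PAGE_ALIGN):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (~(uintptr_t)(PAGE_SIZE - 1))
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)

/* Expand [addr, addr + length) out to whole host pages. */
static void align_range(uintptr_t addr, uintptr_t length,
                        uintptr_t *out_addr, uintptr_t *out_len)
{
    uintptr_t start = addr & PAGE_MASK;
    /* Grow length by however far addr was rounded down... */
    length += addr - start;
    /* ...and only then round length up to a page multiple. */
    *out_addr = start;
    *out_len = PAGE_ALIGN(length);
}
```

With Richard's example, addr 0xffffff and length 2 become 0xfff000 and 0x2000, covering both pages the two bytes touch.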



Re: [Qemu-devel] [PATCH 2/4] Memory: Enable writeback for given memory region

2019-10-09 Thread Beata Michalska
On Tue, 24 Sep 2019 at 01:00, Alex Bennée  wrote:
>
>
> Beata Michalska  writes:
>
> > Add an option to trigger memory writeback to sync given memory region
> > with the corresponding backing store, case one is available.
> > This extends the support for persistent memory, allowing syncing on-demand.
> >
> > Also, adding verification for msync support on host.
> >
> > Signed-off-by: Beata Michalska 
> > ---
> >  configure   | 24 
> >  exec.c  | 38 ++
> >  include/exec/memory.h   |  6 ++
> >  include/exec/ram_addr.h |  6 ++
> >  memory.c| 12 
> >  5 files changed, 86 insertions(+)
> >
> > diff --git a/configure b/configure
> > index 95134c0180..bdb7dc47e9 100755
> > --- a/configure
> > +++ b/configure
> > @@ -5081,6 +5081,26 @@ if compile_prog "" "" ; then
> >  fdatasync=yes
> >  fi
> >
> > +##
> > +# verify support for msyc
> > +
> > +msync=no
> > +cat > $TMPC << EOF
> > +#include 
> > +#include 
> > +int main(void) {
> > +#if defined(_POSIX_MAPPED_FILES) && _POSIX_MAPPED_FILES > 0 \
> > +&& defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
> > +return msync(NULL,0, MS_SYNC);
> > +#else
> > +#error Not supported
> > +#endif
> > +}
> > +EOF
> > +if compile_prog "" "" ; then
> > +msync=yes
> > +fi
> > +
> >  ##
> >  # check if we have madvise
> >
> > @@ -6413,6 +6433,7 @@ echo "fdt support   $fdt"
> >  echo "membarrier$membarrier"
> >  echo "preadv support$preadv"
> >  echo "fdatasync $fdatasync"
> > +echo "msync $msync"
> >  echo "madvise   $madvise"
> >  echo "posix_madvise $posix_madvise"
> >  echo "posix_memalign$posix_memalign"
> > @@ -6952,6 +6973,9 @@ fi
> >  if test "$fdatasync" = "yes" ; then
> >echo "CONFIG_FDATASYNC=y" >> $config_host_mak
> >  fi
> > +if test "$msync" = "yes" ; then
> > +echo "CONFIG_MSYNC=y" >> $config_host_mak
> > +fi
>
> I think it's best to split this configure check into a new prequel patch 
> and...

I might just drop it in favour of CONFIG_POSIX switch ..
>
> >  if test "$madvise" = "yes" ; then
> >echo "CONFIG_MADVISE=y" >> $config_host_mak
> >  fi
> > diff --git a/exec.c b/exec.c
> > index 235d6bc883..5ed6908368 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -65,6 +65,8 @@
> >  #include "exec/ram_addr.h"
> >  #include "exec/log.h"
> >
> > +#include "qemu/pmem.h"
> > +
> >  #include "migration/vmstate.h"
> >
> >  #include "qemu/range.h"
> > @@ -2182,6 +2184,42 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t 
> > newsize, Error **errp)
> >  return 0;
> >  }
> >
> > +/*
> > + * Trigger sync on the given ram block for range [start, start + length]
> > + * with the backing store if available.
> > + * @Note: this is supposed to be a SYNC op.
> > + */
> > +void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t 
> > length)
> > +{
> > +void *addr = ramblock_ptr(block, start);
> > +
> > +/*
> > + * The requested range might spread up to the very end of the block
> > + */
> > +if ((start + length) > block->used_length) {
> > +error_report("%s: sync range outside the block boundires: "
> > + "start: " RAM_ADDR_FMT " length: " RAM_ADDR_FMT
> > + " block length: " RAM_ADDR_FMT " Narrowing down ..." ,
> > + __func__, start, length, block->used_length);
>
> Is this an error or just logging? error_report should be used for stuff
> that the user needs to know about so it will appear on the HMP console
> (or via stderr). If so what is the user expected to do? Have they
> misconfigured their system?
>

This should be logging rather than error reporting as such. My bad.
Will address that in the next version.
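The narrowing itself is easy to check in isolation; a self-contained sketch of the clamp (plain integers, not the QEMU RAMBlock API):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ram_addr_t;

/* Narrow [start, start + length) so it stays within used_length. */
static ram_addr_t clamp_length(ram_addr_t start, ram_addr_t length,
                               ram_addr_t used_length)
{
    if (start + length > used_length) {
        length = used_length - start; /* caller logs the narrowing */
    }
    return length;
}
```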

> > +length = block->used_length - start;
> > +}
>

Re: [Qemu-devel] [PATCH 4/4] target/arm: Add support for DC CVAP & DC CVADP ins

2019-10-09 Thread Beata Michalska
On Tue, 24 Sep 2019 at 18:22, Richard Henderson
 wrote:
>
> On 9/10/19 2:56 AM, Beata Michalska wrote:
> > @@ -2229,7 +2229,8 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t 
> > cpregid)
> >  #define ARM_CP_NZCV  (ARM_CP_SPECIAL | 0x0300)
> >  #define ARM_CP_CURRENTEL (ARM_CP_SPECIAL | 0x0400)
> >  #define ARM_CP_DC_ZVA(ARM_CP_SPECIAL | 0x0500)
> > -#define ARM_LAST_SPECIAL ARM_CP_DC_ZVA
> > +#define ARM_CP_DC_CVAP   (ARM_CP_SPECIAL | 0x0600)
> > +#define ARM_LAST_SPECIAL ARM_CP_DC_CVAP
>
> I don't see that this operation needs to be handled via "special".  It's a
> function call upon write, as for many other system registers.
>

Too inspired by ZVA I guess. Will make the appropriate changes in the
next version.

> > +static inline bool isar_feature_aa64_dcpop(const ARMISARegisters *id)
> > +{
> > +return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) != 0;
> > +}
> > +
> > +static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
> > +{
> > +return (FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >> 1) != 0;
> > +}
>
> The correct check is FIELD_EX(...) >= 2.  This is a 4-bit field, as with all 
> of
> the others.
>
Noted.
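For reference, ID_AA64ISAR1.DPB is the 4-bit field at bits [3:0], so the two feature tests reduce to >= 1 (DC CVAP) and >= 2 (DC CVADP). A standalone sketch of the corrected checks:

```c
#include <assert.h>
#include <stdint.h>

/* ID_AA64ISAR1.DPB occupies bits [3:0]. */
static unsigned isar1_dpb(uint64_t id_aa64isar1)
{
    return id_aa64isar1 & 0xf;
}

static int feat_dcpop(uint64_t isar1)  { return isar1_dpb(isar1) >= 1; } /* DC CVAP  */
static int feat_dcpodp(uint64_t isar1) { return isar1_dpb(isar1) >= 2; } /* DC CVADP */
```

The original `(FIELD_EX64(...) >> 1) != 0` would wrongly report DC CVADP for any field value >= 2 only by accident of the current encodings, and would misfire for future values such as 4; the `>= 2` form follows the usual ID-register convention.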
> > +static CPAccessResult aa64_cacheop_persist_access(CPUARMState *env,
> > +  const ARMCPRegInfo *ri,
> > +  bool isread)
> > +{
> > +ARMCPU *cpu = env_archcpu(env);
> > +/*
> > + * Access is UNDEF if lacking implementation for either DC CVAP or DC 
> > CVADP
> > + * DC CVAP ->  CRm: 0xc
> > + * DC CVADP -> CRm: 0xd
> > + */
> > +return (ri->crm == 0xc && !cpu_isar_feature(aa64_dcpop, cpu)) ||
> > +   (ri->crm == 0xd && !cpu_isar_feature(aa64_dcpodp, cpu))
> > +   ? CP_ACCESS_TRAP_UNCATEGORIZED
> > +   : aa64_cacheop_access(env, ri, isread);
> > +}
> ...
> > +{ .name = "DC_CVAP", .state = ARM_CP_STATE_AA64,
> > +  .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 1,
> > +  .access = PL0_W, .type = ARM_CP_DC_CVAP,
> > +  .accessfn = aa64_cacheop_persist_access },
> > +{ .name = "DC_CVADP", .state = ARM_CP_STATE_AA64,
> > +  .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 1,
> > +  .access = PL0_W, .type = ARM_CP_DC_CVAP,
> > +  .accessfn = aa64_cacheop_persist_access },
>
> While this access function works, it's better to simply not register these at
> all when they're not supported.  Compare the registration of rndr_reginfo.
>
> As I described above, I think this can use a normal write function.  In which
> case this would use .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END.
>
> (I believe that ARM_CP_IO is not required, but I'm not 100% sure.  Certainly
> there does not seem to be anything in dc_cvap() that affects state which can
> queried from another virtual cpu, so there does not appear to be any reason to
> grab the global i/o lock.  The host kernel should be just fine with concurrent
> msync syscalls, or whatever it is that libpmem uses.)
>
>
OK, will move that to conditional registration and double-check the type.
Thanks for the suggestion.

> > +void HELPER(dc_cvap)(CPUARMState *env, uint64_t vaddr_in)
> > +{
> > +#ifndef CONFIG_USER_ONLY
> > +ARMCPU *cpu = env_archcpu(env);
> > +/* CTR_EL0 System register -> DminLine, bits [19:16] */
> > +uint64_t dline_size = 4 << ((cpu->ctr >> 16) & 0xF);
> > +uint64_t vaddr = vaddr_in & ~(dline_size - 1);
> > +void *haddr;
> > +int mem_idx = cpu_mmu_index(env, false);
> > +
> > +/* This won't be crossing page boundaries */
> > +haddr = probe_read(env, vaddr, dline_size, mem_idx, GETPC());
> > +if (haddr) {
> > +
> > +ram_addr_t offset;
> > +MemoryRegion *mr;
> > +
> > +/*
> > + * RCU critical section + ref counting,
> > + * so that MR won't disappear behind the scene
> > + */
> > +rcu_read_lock();
> > +mr = memory_region_from_host(haddr, &offset);
> > +if (mr) {
> > +memory_region_ref(mr);
> > +}
> > +rcu_read_unlock();
> > +
> > +if (mr) {
> > +memory_region_do_writeback(mr, offset, dline_size);
> > +memory_region_unref(mr);
> > +}
> > +}
> > +#endif
>
>
> We hold the rcu lock whenever a TB is executing.  I don't believe there's any
> point in increasing the lock count here.  Similarly with memory_region
> refcounts -- they cannot vanish while we're executing a TB.
>
> Thus I believe that about half of this function can fold away.
>
>
So I was chasing the wrong locking here...
Indeed, if the RCU lock is already held, I can safely drop the locking here.

> r~

Thank you for the review,

BR
Beata
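As a side note on the helper itself: the DminLine extraction quoted above (CTR_EL0 bits [19:16], the log2 of the line size in 4-byte words) is easy to sanity-check in isolation. A minimal sketch, assuming nothing beyond that architectural field layout:

```c
#include <assert.h>
#include <stdint.h>

/* CTR_EL0.DminLine, bits [19:16]: log2(words) of the smallest D-cache line. */
static uint64_t dcache_line_bytes(uint64_t ctr_el0)
{
    return 4u << ((ctr_el0 >> 16) & 0xf);
}

/* Round a virtual address down to the start of its cache line,
 * as dc_cvap() does before probing the guest mapping. */
static uint64_t line_align(uint64_t vaddr, uint64_t ctr_el0)
{
    return vaddr & ~(dcache_line_bytes(ctr_el0) - 1);
}
```

For example, DminLine == 4 means 16 words, i.e. a 64-byte line, which is why the aligned range can never cross a page boundary.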



Re: [Qemu-devel] [PATCH 4/4] target/arm: Add support for DC CVAP & DC CVADP ins

2019-10-09 Thread Beata Michalska
On Tue, 24 Sep 2019 at 00:54, Alex Bennée  wrote:
>
>
> Beata Michalska  writes:
>
> > ARMv8.2 introduced support for Data Cache Clean instructions
> > to PoP (point-of-persistence) - DC CVAP and PoDP (point-of-deep-persistence)
> > - DV CVADP. Both specify conceptual points in a memory system where all 
> > writes
> > that are to reach them are considered persistent.
> > The support provided considers both to be actually the same so there is no
> > distinction between the two. If none is available (there is no backing store
> > for given memory) both will result in Data Cache Clean up to the point of
> > coherency. Otherwise sync for the specified range shall be performed.
> >
> > Signed-off-by: Beata Michalska 
> > ---
> >  linux-user/elfload.c   | 18 +-
>
> There are conflicts from the recent elfload.c tweaks to fix on your next 
> rebase.

Will address it in the next version.

> >  target/arm/cpu.h   | 13 -
> >  target/arm/cpu64.c |  1 +
> >  target/arm/helper.c| 24 
> >  target/arm/helper.h|  1 +
> >  target/arm/op_helper.c | 36 
> >  target/arm/translate-a64.c |  5 +
> >  7 files changed, 96 insertions(+), 2 deletions(-)
> >
> > diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> > index 3365e192eb..1ec00308d5 100644
> > --- a/linux-user/elfload.c
> > +++ b/linux-user/elfload.c
> > @@ -609,7 +609,12 @@ enum {
> >  ARM_HWCAP_A64_PACG  = 1UL << 31,
> >  };
> >
> > +enum {
> > +ARM_HWCAP2_A64_DCPODP   = 1 << 0,
> > +};
> > +
> >  #define ELF_HWCAP get_elf_hwcap()
> > +#define ELF_HWCAP2 get_elf_hwcap2()
> >
> >  static uint32_t get_elf_hwcap(void)
> >  {
> > @@ -644,12 +649,23 @@ static uint32_t get_elf_hwcap(void)
> >  GET_FEATURE_ID(aa64_jscvt, ARM_HWCAP_A64_JSCVT);
> >  GET_FEATURE_ID(aa64_sb, ARM_HWCAP_A64_SB);
> >  GET_FEATURE_ID(aa64_condm_4, ARM_HWCAP_A64_FLAGM);
> > +GET_FEATURE_ID(aa64_dcpop, ARM_HWCAP_A64_DCPOP);
> >
> > -#undef GET_FEATURE_ID
> >
> >  return hwcaps;
> >  }
> >
> > +static uint32_t get_elf_hwcap2(void)
> > +{
> > +ARMCPU *cpu = ARM_CPU(thread_cpu);
> > +uint32_t hwcaps = 0;
> > +
> > +GET_FEATURE_ID(aa64_dcpodp, ARM_HWCAP2_A64_DCPODP);
> > +return hwcaps;
> > +}
> > +
> > +#undef GET_FEATURE_ID
> > +
> >  #endif /* not TARGET_AARCH64 */
> >  #endif /* TARGET_ARM */
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index 297ad5e47a..1713d76590 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -2229,7 +2229,8 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t 
> > cpregid)
> >  #define ARM_CP_NZCV  (ARM_CP_SPECIAL | 0x0300)
> >  #define ARM_CP_CURRENTEL (ARM_CP_SPECIAL | 0x0400)
> >  #define ARM_CP_DC_ZVA(ARM_CP_SPECIAL | 0x0500)
> > -#define ARM_LAST_SPECIAL ARM_CP_DC_ZVA
> > +#define ARM_CP_DC_CVAP   (ARM_CP_SPECIAL | 0x0600)
> > +#define ARM_LAST_SPECIAL ARM_CP_DC_CVAP
> >  #define ARM_CP_FPU   0x1000
> >  #define ARM_CP_SVE   0x2000
> >  #define ARM_CP_NO_GDB0x4000
> > @@ -3572,6 +3573,16 @@ static inline bool isar_feature_aa64_frint(const 
> > ARMISARegisters *id)
> >  return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FRINTTS) != 0;
> >  }
> >
> > +static inline bool isar_feature_aa64_dcpop(const ARMISARegisters *id)
> > +{
> > +return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) != 0;
> > +}
> > +
> > +static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
> > +{
> > +return (FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >> 1) != 0;
> > +}
> > +
> >  static inline bool isar_feature_aa64_fp16(const ARMISARegisters *id)
> >  {
> >  /* We always set the AdvSIMD and FP fields identically wrt FP16.  */
> > diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> > index d7f5bf610a..20094f980d 100644
> > --- a/target/arm/cpu64.c
> > +++ b/target/arm/cpu64.c
> > @@ -331,6 +331,7 @@ static void aarch64_max_initfn(Object *obj)
> >  cpu->isar.id_aa64isar0 = t;
> >
> >  t = cpu->isar.id_aa64isar1;
> > +t = FIELD_DP64(t, ID_AA64ISAR1, DPB, 2);
> >  t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);
> >  t = FIELD_DP64(t, ID_AA64

Re: [Qemu-devel] [PATCH 3/4] migration: ram: Switch to ram block writeback

2019-09-12 Thread Beata Michalska
On Wed, 11 Sep 2019 at 11:36, Dr. David Alan Gilbert
 wrote:
>
> * Beata Michalska (beata.michal...@linaro.org) wrote:
> > On Tue, 10 Sep 2019 at 14:16, Dr. David Alan Gilbert
> >  wrote:
> > >
> > > * Beata Michalska (beata.michal...@linaro.org) wrote:
> > > > On Tue, 10 Sep 2019 at 12:26, Dr. David Alan Gilbert
> > > >  wrote:
> > > > >
> > > > > * Beata Michalska (beata.michal...@linaro.org) wrote:
> > > > > > Switch to ram block writeback for pmem migration.
> > > > > >
> > > > > > Signed-off-by: Beata Michalska 
> > > > > > ---
> > > > > >  migration/ram.c | 5 +
> > > > > >  1 file changed, 1 insertion(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > > > index b01a37e7ca..8ea0bd63fc 100644
> > > > > > --- a/migration/ram.c
> > > > > > +++ b/migration/ram.c
> > > > > > @@ -33,7 +33,6 @@
> > > > > >  #include "qemu/bitops.h"
> > > > > >  #include "qemu/bitmap.h"
> > > > > >  #include "qemu/main-loop.h"
> > > > > > -#include "qemu/pmem.h"
> > > > > >  #include "xbzrle.h"
> > > > > >  #include "ram.h"
> > > > > >  #include "migration.h"
> > > > > > @@ -4064,9 +4063,7 @@ static int ram_load_cleanup(void *opaque)
> > > > > >  RAMBlock *rb;
> > > > > >
> > > > > >  RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
> > > > > > -if (ramblock_is_pmem(rb)) {
> > > > > > -pmem_persist(rb->host, rb->used_length);
> > > > > > -}
> > > > > > +qemu_ram_block_writeback(rb);
> > > > >
> > > > > ACK for migration
> > > > >
> > > > > Although I do worry that if you really have pmem hardware, is it 
> > > > > better
> > > > > to fail the migration if you don't have libpmem available?
> > > >
> > > > According to the PMDK man page, pmem_persist is supposed to be
> > > > equivalent to msync.
> > >
> > > OK, but you do define qemu_ram_block_writeback to fall back to fdatasync;
> > > so that would be too little?
> >
> > Actually it shouldn't. On Linux, both will end up in vfs_fsync_range;
> > msync will just narrow the range.
> > fdatasync will trigger the same call, only with a wider range.
> > fdatasync will also fall back to fsync if it is not available.
> > So the fallback goes from the best case (in terms of performance and the
> > range of memory to be synced) towards the worst case one.
> >
> > I should probably double-check earlier versions of Linux.
> > I'll also try to verify that for other host variants.
>
> Well I guess it should probably follow whatever Posix says;  it's OK to
> make Linux specific assumptions for Linux specific bits - but you can't
> do it by code examination to guarantee it'll be right for other
> platforms, especially if this is in code ifdef'd for portability.
> Also it needs commenting to explain why it's safe to avoid someone else
> asking this question.
>
I will definitely address that in the next version.
Will just wait a bit to potentially gather more input
on the series.

> > BTW: Thank you for having a look at the changes.
>
> No problem.
>
Thanks again.

BR
Beata
> Dave
>
> > BR
> > Beata
> >
> > >
> > > > It's just more performant. So in the case of real pmem hardware it
> > > > should be all good.
> > > >
> > > > [http://pmem.io/pmdk/manpages/linux/v1.2/libpmem.3.html]
> > >
> > > Dave
> > >
> > > >
> > > > BR
> > > > Beata
> > > > >
> > > > > Dave
> > > > >
> > > > > >  }
> > > > > >
> > > > > >  xbzrle_load_cleanup();
> > > > > > --
> > > > > > 2.17.1
> > > > > >
> > > > > --
> > > > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > > --
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH 3/4] migration: ram: Switch to ram block writeback

2019-09-10 Thread Beata Michalska
On Tue, 10 Sep 2019 at 14:16, Dr. David Alan Gilbert
 wrote:
>
> * Beata Michalska (beata.michal...@linaro.org) wrote:
> > On Tue, 10 Sep 2019 at 12:26, Dr. David Alan Gilbert
> >  wrote:
> > >
> > > * Beata Michalska (beata.michal...@linaro.org) wrote:
> > > > Switch to ram block writeback for pmem migration.
> > > >
> > > > Signed-off-by: Beata Michalska 
> > > > ---
> > > >  migration/ram.c | 5 +
> > > >  1 file changed, 1 insertion(+), 4 deletions(-)
> > > >
> > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > index b01a37e7ca..8ea0bd63fc 100644
> > > > --- a/migration/ram.c
> > > > +++ b/migration/ram.c
> > > > @@ -33,7 +33,6 @@
> > > >  #include "qemu/bitops.h"
> > > >  #include "qemu/bitmap.h"
> > > >  #include "qemu/main-loop.h"
> > > > -#include "qemu/pmem.h"
> > > >  #include "xbzrle.h"
> > > >  #include "ram.h"
> > > >  #include "migration.h"
> > > > @@ -4064,9 +4063,7 @@ static int ram_load_cleanup(void *opaque)
> > > >  RAMBlock *rb;
> > > >
> > > >  RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
> > > > -if (ramblock_is_pmem(rb)) {
> > > > -pmem_persist(rb->host, rb->used_length);
> > > > -}
> > > > +qemu_ram_block_writeback(rb);
> > >
> > > ACK for migration
> > >
> > > Although I do worry that if you really have pmem hardware, is it better
> > > to fail the migration if you don't have libpmem available?
> >
> > According to the PMDK man page, pmem_persist is supposed to be
> > equivalent to msync.
>
> OK, but you do define qemu_ram_block_writeback to fall back to fdatasync;
> so that would be too little?

Actually it shouldn't. On Linux, both will end up in vfs_fsync_range;
msync will just narrow the range.
fdatasync will trigger the same call, only with a wider range.
fdatasync will also fall back to fsync if it is not available.
So the fallback goes from the best case (in terms of performance and the
range of memory to be synced) towards the worst case one.

I should probably double-check earlier versions of Linux.
I'll also try to verify that for other host variants.

BTW: Thank you for having a look at the changes.

BR
Beata

>
> > It's just more performant. So in the case of real pmem hardware it
> > should be all good.
> >
> > [http://pmem.io/pmdk/manpages/linux/v1.2/libpmem.3.html]
>
> Dave
>
> >
> > BR
> > Beata
> > >
> > > Dave
> > >
> > > >  }
> > > >
> > > >  xbzrle_load_cleanup();
> > > > --
> > > > 2.17.1
> > > >
> > > --
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


