Re: [PATCH v5 3/3] arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

2020-01-28 Thread Andrew Murray
On Mon, Jan 27, 2020 at 12:32:22PM +, Marc Zyngier wrote:
> On 2020-01-27 11:44, Andrew Murray wrote:
> > At present ARMv8 event counters are limited to 32-bits, though by
> > using the CHAIN event it's possible to combine adjacent counters to
> > achieve 64-bits. The perf config1:0 bit can be set to use such a
> > configuration.
> > 
> > With the introduction of ARMv8.5-PMU support, all event counters can
> > now be used as 64-bit counters.
> > 
> > Let's enable 64-bit event counters where support exists. Unless the
> > user sets config1:0 we will adjust the counter value such that it
> > overflows upon 32-bit overflow. This follows the same behaviour as
> > the cycle counter which has always been (and remains) 64-bits.
> > 
> > Signed-off-by: Andrew Murray 
> > Reviewed-by: Suzuki K Poulose 
> > ---
> >  arch/arm64/include/asm/perf_event.h |  3 +-
> >  arch/arm64/include/asm/sysreg.h |  1 +
> >  arch/arm64/kernel/perf_event.c  | 86
> > +
> >  include/linux/perf/arm_pmu.h|  1 +
> >  4 files changed, 73 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/perf_event.h
> > b/arch/arm64/include/asm/perf_event.h
> > index 2bdbc79..e7765b6 100644
> > --- a/arch/arm64/include/asm/perf_event.h
> > +++ b/arch/arm64/include/asm/perf_event.h
> > @@ -176,9 +176,10 @@
> >  #define ARMV8_PMU_PMCR_X   (1 << 4) /* Export to ETM */
> >  #define ARMV8_PMU_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive
> > debug*/
> >  #define ARMV8_PMU_PMCR_LC  (1 << 6) /* Overflow on 64 bit cycle counter
> > */
> > +#define ARMV8_PMU_PMCR_LP  (1 << 7) /* Long event counter enable */
> > >  #define ARMV8_PMU_PMCR_N_SHIFT  11   /* Number of counters supported */
> > >  #define ARMV8_PMU_PMCR_N_MASK   0x1f
> > > -#define ARMV8_PMU_PMCR_MASK 0x7f /* Mask for writable bits */
> > > +#define ARMV8_PMU_PMCR_MASK 0xff /* Mask for writable bits */
> > 
> >  /*
> >   * PMOVSR: counters overflow flag status reg
> > diff --git a/arch/arm64/include/asm/sysreg.h
> > b/arch/arm64/include/asm/sysreg.h
> > index 1009878..30c1e18 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -675,6 +675,7 @@
> >  #define ID_DFR0_PERFMON_SHIFT  24
> > 
> >  #define ID_DFR0_EL1_PMUVER_8_1 4
> > +#define ID_DFR0_EL1_PMUVER_8_4 5
> 
> This doesn't seem right, see below.

Yes you're right - I'll rename this to ID_AA64DFR0_EL1_PMUVER_8_4
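
For clarity, a sketch of how that hunk would read with the rename applied (the value
stays 5, i.e. PMUv3 for ARMv8.4; only the prefix changes from ID_DFR0 to ID_AA64DFR0):

#define ID_AA64DFR0_EL1_PMUVER_8_4	5

static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu)
{
	/* ARMv8.5-PMU is anything strictly newer than PMUv3 for ARMv8.4 */
	return (cpu_pmu->pmuver > ID_AA64DFR0_EL1_PMUVER_8_4);
}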


> 
> >  #define ID_AA64DFR0_EL1_PMUVER_8_1 4
> > 
> >  #define ID_ISAR5_RDM_SHIFT 24
> > diff --git a/arch/arm64/kernel/perf_event.c
> > b/arch/arm64/kernel/perf_event.c
> > index e40b656..4e27f90 100644
> > --- a/arch/arm64/kernel/perf_event.c
> > +++ b/arch/arm64/kernel/perf_event.c
> > @@ -285,6 +285,17 @@ static struct attribute_group
> > armv8_pmuv3_format_attr_group = {
> >  #defineARMV8_IDX_COUNTER_LAST(cpu_pmu) \
> > (ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
> > 
> > +
> > +/*
> > + * We unconditionally enable ARMv8.5-PMU long event counter support
> > + * (64-bit events) where supported. Indicate if this arm_pmu has long
> > + * event counter support.
> > + */
> > +static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu)
> > +{
> > +   return (cpu_pmu->pmuver > ID_DFR0_EL1_PMUVER_8_4);
> 
> Isn't the ID_DFR0 prefix for AArch32? Although this doesn't change much
> the final result (the values happen to be the same on both architectures),
> it is nonetheless a bit confusing.

Yes - ID_DFR0 is the AArch32 register relating to the AArch32 state, that's
mapped onto the AArch64 ID_DFR0_EL1 register. The ID_AA64DFR0_EL1 register
relates to the AArch64 state.


> 
> > +}
> > +
> >  /*
> >   * We must chain two programmable counters for 64 bit events,
> >   * except when we have allocated the 64bit cycle counter (for CPU
> > @@ -294,9 +305,11 @@ static struct attribute_group
> > armv8_pmuv3_format_attr_group = {
> >  static inline bool armv8pmu_event_is_chained(struct perf_event *event)
> >  {
> > int idx = event->hw.idx;
> > +   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
> > 
> > return !WARN_ON(idx < 0) &&
> >armv8pmu_event_is_64bit(event) &&
> > +  !armv8pmu_has_long_event(cpu_pmu) &&
> >(idx != ARMV8_IDX_CYCLE_COUNTER);
> >  }

[PATCH v5 1/3] arm64: cpufeature: Extract capped fields

2020-01-27 Thread Andrew Murray
When emulating ID registers there is often a need to cap the version
bits of a feature such that the guest will not use features that do
not yet exist.

Let's add a helper that extracts a field and caps the version to a
given value.

Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/cpufeature.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 4261d55..1462fd1 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -447,6 +447,22 @@ cpuid_feature_extract_unsigned_field(u64 features, int 
field)
return cpuid_feature_extract_unsigned_field_width(features, field, 4);
 }
 
+static inline u64 __attribute_const__
+cpuid_feature_cap_signed_field_width(u64 features, int field, int width,
+s64 cap)
+{
+   s64 val = cpuid_feature_extract_signed_field_width(features, field,
+  width);
+   u64 mask = GENMASK_ULL(field + width - 1, field);
+
+   if (val > cap) {
+   features &= ~mask;
+   features |= (cap << field) & mask;
+   }
+
+   return features;
+}
+
 static inline u64 arm64_ftr_mask(const struct arm64_ftr_bits *ftrp)
 {
return (u64)GENMASK(ftrp->shift + ftrp->width - 1, ftrp->shift);
-- 
2.7.4
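
For illustration only (not part of the patch): a standalone, userspace-style sketch of
the capping arithmetic. It re-implements the helper's logic locally rather than calling
the kernel header, and assumes a PMUVer-style signed field at bit 8, width 4.

#include <stdint.h>
#include <stdio.h>

/* mirrors cpuid_feature_cap_signed_field_width() for illustration only */
static uint64_t cap_signed_field(uint64_t features, int field, int width, int64_t cap)
{
	int64_t val = (int64_t)(features << (64 - width - field)) >> (64 - width);
	uint64_t mask = ((~0ULL >> (64 - width)) << field);

	if (val > cap) {
		features &= ~mask;
		features |= ((uint64_t)cap << field) & mask;
	}
	return features;
}

int main(void)
{
	uint64_t dfr0 = 6ULL << 8;	/* field reads 6, e.g. PMUv3 for ARMv8.5 */
	uint64_t capped = cap_signed_field(dfr0, 8, 4, 4);

	/* prints 4: the field has been capped to PMUv3 for ARMv8.1 */
	printf("capped field = %llu\n", (unsigned long long)((capped >> 8) & 0xf));
	return 0;
}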



[PATCH v5 0/3] arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

2020-01-27 Thread Andrew Murray
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.

With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters. Let's add support for 64-bit event
counters.

As KVM doesn't yet support 64-bit event counters (or other features
after PMUv3 for ARMv8.1), we also trap and emulate the Debug Feature
Registers to limit the PMU version a guest sees to PMUv3 for ARMv8.1.

Tested by running the following perf command on both guest and host
and ensuring that the figures are very similar:

perf stat -e armv8_pmuv3/inst_retired,long=1/ \
  -e armv8_pmuv3/inst_retired,long=0/ -e cycles

Changes since v4:

 - Limit KVM to PMUv3 for ARMv8.1 instead of 8.4
 - Reword second commit

Changes since v3:

 - Rebased onto v5.5-rc7
 - Instead of overriding trap access handler, update read_id_reg

Changes since v2:

 - Rebased onto v5.5-rc4
 - Mask 'cap' value to 'width' in cpuid_feature_cap_signed_field_width

Changes since v1:

 - Rebased onto v5.5-rc1



Andrew Murray (3):
  arm64: cpufeature: Extract capped fields
  KVM: arm64: limit PMU version to PMUv3 for ARMv8.1
  arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

 arch/arm64/include/asm/cpufeature.h | 16 +++
 arch/arm64/include/asm/perf_event.h |  3 +-
 arch/arm64/include/asm/sysreg.h |  6 +++
 arch/arm64/kernel/perf_event.c  | 86 +
 arch/arm64/kvm/sys_regs.c   | 11 +
 include/linux/perf/arm_pmu.h|  1 +
 6 files changed, 105 insertions(+), 18 deletions(-)

-- 
2.7.4



[PATCH v5 2/3] KVM: arm64: limit PMU version to PMUv3 for ARMv8.1

2020-01-27 Thread Andrew Murray
We currently expose the PMU version of the host to the guest via
emulation of the DFR0_EL1 and AA64DFR0_EL1 debug feature registers.
However many of the features offered beyond PMUv3 for ARMv8.1 are not
supported in KVM. Examples of this include support for the PMMIR
registers (added in PMUv3 for ARMv8.4) and 64-bit event counters
(added in PMUv3 for ARMv8.5).

Let's trap the Debug Feature Registers in order to limit
PMUVer/PerfMon in the Debug Feature Registers to PMUv3 for ARMv8.1
to avoid unexpected behaviour.

Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/sysreg.h |  5 +
 arch/arm64/kvm/sys_regs.c   | 11 +++
 2 files changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 6e919fa..1009878 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -672,6 +672,11 @@
 #define ID_AA64DFR0_TRACEVER_SHIFT 4
 #define ID_AA64DFR0_DEBUGVER_SHIFT 0
 
+#define ID_DFR0_PERFMON_SHIFT  24
+
+#define ID_DFR0_EL1_PMUVER_8_1 4
+#define ID_AA64DFR0_EL1_PMUVER_8_1 4
+
 #define ID_ISAR5_RDM_SHIFT 24
 #define ID_ISAR5_CRC32_SHIFT   16
 #define ID_ISAR5_SHA2_SHIFT12
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 9f21659..3f0f3cc 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1085,6 +1085,17 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 (0xfUL << ID_AA64ISAR1_API_SHIFT) |
 (0xfUL << ID_AA64ISAR1_GPA_SHIFT) |
 (0xfUL << ID_AA64ISAR1_GPI_SHIFT));
+   } else if (id == SYS_ID_AA64DFR0_EL1) {
+   /* Limit guests to PMUv3 for ARMv8.1 */
+   val = cpuid_feature_cap_signed_field_width(val,
+   ID_AA64DFR0_PMUVER_SHIFT,
+   4, ID_AA64DFR0_EL1_PMUVER_8_1);
+   } else if (id == SYS_ID_DFR0_EL1) {
+   /* Limit guests to PMUv3 for ARMv8.1 */
+   val = cpuid_feature_cap_signed_field_width(val,
+   ID_DFR0_PERFMON_SHIFT,
+   4, ID_DFR0_EL1_PMUVER_8_1);
+
}
 
return val;
-- 
2.7.4



[PATCH v5 3/3] arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

2020-01-27 Thread Andrew Murray
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.

With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters.

Let's enable 64-bit event counters where support exists. Unless the
user sets config1:0 we will adjust the counter value such that it
overflows upon 32-bit overflow. This follows the same behaviour as
the cycle counter which has always been (and remains) 64-bits.

Signed-off-by: Andrew Murray 
Reviewed-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/perf_event.h |  3 +-
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/perf_event.c  | 86 +
 include/linux/perf/arm_pmu.h|  1 +
 4 files changed, 73 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/perf_event.h 
b/arch/arm64/include/asm/perf_event.h
index 2bdbc79..e7765b6 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -176,9 +176,10 @@
 #define ARMV8_PMU_PMCR_X   (1 << 4) /* Export to ETM */
 #define ARMV8_PMU_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive debug*/
 #define ARMV8_PMU_PMCR_LC  (1 << 6) /* Overflow on 64 bit cycle counter */
+#define ARMV8_PMU_PMCR_LP  (1 << 7) /* Long event counter enable */
 #define ARMV8_PMU_PMCR_N_SHIFT  11   /* Number of counters supported */
 #define ARMV8_PMU_PMCR_N_MASK   0x1f
-#define ARMV8_PMU_PMCR_MASK 0x7f /* Mask for writable bits */
+#define ARMV8_PMU_PMCR_MASK 0xff /* Mask for writable bits */
 
 /*
  * PMOVSR: counters overflow flag status reg
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 1009878..30c1e18 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -675,6 +675,7 @@
 #define ID_DFR0_PERFMON_SHIFT  24
 
 #define ID_DFR0_EL1_PMUVER_8_1 4
+#define ID_DFR0_EL1_PMUVER_8_4 5
 #define ID_AA64DFR0_EL1_PMUVER_8_1 4
 
 #define ID_ISAR5_RDM_SHIFT 24
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index e40b656..4e27f90 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -285,6 +285,17 @@ static struct attribute_group 
armv8_pmuv3_format_attr_group = {
 #defineARMV8_IDX_COUNTER_LAST(cpu_pmu) \
(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
 
+
+/*
+ * We unconditionally enable ARMv8.5-PMU long event counter support
+ * (64-bit events) where supported. Indicate if this arm_pmu has long
+ * event counter support.
+ */
+static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu)
+{
+   return (cpu_pmu->pmuver > ID_DFR0_EL1_PMUVER_8_4);
+}
+
 /*
  * We must chain two programmable counters for 64 bit events,
  * except when we have allocated the 64bit cycle counter (for CPU
@@ -294,9 +305,11 @@ static struct attribute_group 
armv8_pmuv3_format_attr_group = {
 static inline bool armv8pmu_event_is_chained(struct perf_event *event)
 {
int idx = event->hw.idx;
+   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 
return !WARN_ON(idx < 0) &&
   armv8pmu_event_is_64bit(event) &&
+  !armv8pmu_has_long_event(cpu_pmu) &&
   (idx != ARMV8_IDX_CYCLE_COUNTER);
 }
 
@@ -345,7 +358,7 @@ static inline void armv8pmu_select_counter(int idx)
isb();
 }
 
-static inline u32 armv8pmu_read_evcntr(int idx)
+static inline u64 armv8pmu_read_evcntr(int idx)
 {
armv8pmu_select_counter(idx);
return read_sysreg(pmxevcntr_el0);
@@ -362,6 +375,44 @@ static inline u64 armv8pmu_read_hw_counter(struct 
perf_event *event)
return val;
 }
 
+/*
+ * The cycle counter is always a 64-bit counter. When ARMV8_PMU_PMCR_LP
+ * is set the event counters also become 64-bit counters. Unless the
+ * user has requested a long counter (attr.config1) then we want to
+ * interrupt upon 32-bit overflow - we achieve this by applying a bias.
+ */
+static bool armv8pmu_event_needs_bias(struct perf_event *event)
+{
+   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+   struct hw_perf_event *hwc = &event->hw;
+   int idx = hwc->idx;
+
+   if (armv8pmu_event_is_64bit(event))
+   return false;
+
+   if (armv8pmu_has_long_event(cpu_pmu) ||
+   idx == ARMV8_IDX_CYCLE_COUNTER)
+   return true;
+
+   return false;
+}
+
+static u64 armv8pmu_bias_long_counter(struct perf_event *event, u64 value)
+{
+   if (armv8pmu_event_needs_bias(event))
+   value |= GENMASK(63, 32);
+
+   return value;
+}
+
+static u64 armv8pmu_unbias_long_counter(struct perf_event *event, u64 value)
+{
+   if (armv8pmu_event_needs_bias(event))
+   value &= ~GENMASK(63, 32);
+
+   return value;
+}
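
For illustration only (not part of the patch): a standalone sketch showing the effect of
the bias - once the upper 32 bits are set to all-ones, the 64-bit counter wraps (and so
raises its overflow interrupt) exactly when the low 32 bits would have overflowed. The
start value below is an arbitrary example.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t bias = 0xffffffff00000000ULL;		/* GENMASK(63, 32) */
	uint64_t counter = bias | 0xfffffffeULL;	/* biased start value */

	counter += 2;	/* two more events: the low 32 bits wrap... */
	printf("64-bit counter wrapped: %s\n", counter == 0 ? "yes" : "no");
	return 0;
}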

[PATCH v4 2/3] KVM: arm64: limit PMU version to ARMv8.4

2020-01-23 Thread Andrew Murray
ARMv8.5-PMU introduces 64-bit event counters, however KVM doesn't yet
support this. Let's trap the Debug Feature Registers in order to limit
PMUVer/PerfMon in the Debug Feature Registers to PMUv3 for ARMv8.4.

Signed-off-by: Andrew Murray 
Reviewed-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/sysreg.h |  5 +
 arch/arm64/kvm/sys_regs.c   | 11 +++
 2 files changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 6e919fafb43d..d969df417f88 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -672,6 +672,11 @@
 #define ID_AA64DFR0_TRACEVER_SHIFT 4
 #define ID_AA64DFR0_DEBUGVER_SHIFT 0
 
+#define ID_DFR0_PERFMON_SHIFT  24
+
+#define ID_DFR0_EL1_PMUVER_8_4 5
+#define ID_AA64DFR0_EL1_PMUVER_8_4 5
+
 #define ID_ISAR5_RDM_SHIFT 24
 #define ID_ISAR5_CRC32_SHIFT   16
 #define ID_ISAR5_SHA2_SHIFT12
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 9f2165937f7d..028c93a88a51 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1085,6 +1085,17 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 (0xfUL << ID_AA64ISAR1_API_SHIFT) |
 (0xfUL << ID_AA64ISAR1_GPA_SHIFT) |
 (0xfUL << ID_AA64ISAR1_GPI_SHIFT));
+   } else if (id == SYS_ID_AA64DFR0_EL1) {
+   /* Limit guests to PMUv3 for ARMv8.4 */
+   val = cpuid_feature_cap_signed_field_width(val,
+   ID_AA64DFR0_PMUVER_SHIFT,
+   4, ID_AA64DFR0_EL1_PMUVER_8_4);
+   } else if (id == SYS_ID_DFR0_EL1) {
+   /* Limit guests to PMUv3 for ARMv8.4 */
+   val = cpuid_feature_cap_signed_field_width(val,
+   ID_DFR0_PERFMON_SHIFT,
+   4, ID_DFR0_EL1_PMUVER_8_4);
+
}
 
return val;
-- 
2.21.0



[PATCH v4 0/3] arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

2020-01-23 Thread Andrew Murray
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.

With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters. Let's add support for 64-bit event
counters.

As KVM doesn't yet support 64-bit event counters, we also trap
and emulate the Debug Feature Registers to limit the PMU version a
guest sees to PMUv3 for ARMv8.4.

Tested by running the following perf command on both guest and host
and ensuring that the figures are very similar:

perf stat -e armv8_pmuv3/inst_retired,long=1/ \
  -e armv8_pmuv3/inst_retired,long=0/ -e cycles

Changes since v3:

 - Rebased onto v5.5-rc7
 - Instead of overriding trap access handler, update read_id_reg

Changes since v2:

 - Rebased onto v5.5-rc4
 - Mask 'cap' value to 'width' in cpuid_feature_cap_signed_field_width

Changes since v1:

 - Rebased onto v5.5-rc1


Andrew Murray (3):
  arm64: cpufeature: Extract capped fields
  KVM: arm64: limit PMU version to ARMv8.4
  arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

 arch/arm64/include/asm/cpufeature.h | 16 ++
 arch/arm64/include/asm/perf_event.h |  3 +-
 arch/arm64/include/asm/sysreg.h |  5 ++
 arch/arm64/kernel/perf_event.c  | 86 +++--
 arch/arm64/kvm/sys_regs.c   | 11 
 include/linux/perf/arm_pmu.h|  1 +
 6 files changed, 104 insertions(+), 18 deletions(-)

-- 
2.21.0



[PATCH v4 3/3] arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

2020-01-23 Thread Andrew Murray
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.

With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters.

Let's enable 64-bit event counters where support exists. Unless the
user sets config1:0 we will adjust the counter value such that it
overflows upon 32-bit overflow. This follows the same behaviour as
the cycle counter which has always been (and remains) 64-bits.

Signed-off-by: Andrew Murray 
Reviewed-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/perf_event.h |  3 +-
 arch/arm64/kernel/perf_event.c  | 86 +++--
 include/linux/perf/arm_pmu.h|  1 +
 3 files changed, 72 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/perf_event.h 
b/arch/arm64/include/asm/perf_event.h
index 2bdbc79bbd01..e7765b62c712 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -176,9 +176,10 @@
 #define ARMV8_PMU_PMCR_X   (1 << 4) /* Export to ETM */
 #define ARMV8_PMU_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive debug*/
 #define ARMV8_PMU_PMCR_LC  (1 << 6) /* Overflow on 64 bit cycle counter */
+#define ARMV8_PMU_PMCR_LP  (1 << 7) /* Long event counter enable */
 #define ARMV8_PMU_PMCR_N_SHIFT  11   /* Number of counters supported */
 #define ARMV8_PMU_PMCR_N_MASK   0x1f
-#define ARMV8_PMU_PMCR_MASK 0x7f /* Mask for writable bits */
+#define ARMV8_PMU_PMCR_MASK 0xff /* Mask for writable bits */
 
 /*
  * PMOVSR: counters overflow flag status reg
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index e40b65645c86..4e27f90bb89e 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -285,6 +285,17 @@ static struct attribute_group 
armv8_pmuv3_format_attr_group = {
 #defineARMV8_IDX_COUNTER_LAST(cpu_pmu) \
(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
 
+
+/*
+ * We unconditionally enable ARMv8.5-PMU long event counter support
+ * (64-bit events) where supported. Indicate if this arm_pmu has long
+ * event counter support.
+ */
+static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu)
+{
+   return (cpu_pmu->pmuver > ID_DFR0_EL1_PMUVER_8_4);
+}
+
 /*
  * We must chain two programmable counters for 64 bit events,
  * except when we have allocated the 64bit cycle counter (for CPU
@@ -294,9 +305,11 @@ static struct attribute_group 
armv8_pmuv3_format_attr_group = {
 static inline bool armv8pmu_event_is_chained(struct perf_event *event)
 {
int idx = event->hw.idx;
+   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 
return !WARN_ON(idx < 0) &&
   armv8pmu_event_is_64bit(event) &&
+  !armv8pmu_has_long_event(cpu_pmu) &&
   (idx != ARMV8_IDX_CYCLE_COUNTER);
 }
 
@@ -345,7 +358,7 @@ static inline void armv8pmu_select_counter(int idx)
isb();
 }
 
-static inline u32 armv8pmu_read_evcntr(int idx)
+static inline u64 armv8pmu_read_evcntr(int idx)
 {
armv8pmu_select_counter(idx);
return read_sysreg(pmxevcntr_el0);
@@ -362,6 +375,44 @@ static inline u64 armv8pmu_read_hw_counter(struct 
perf_event *event)
return val;
 }
 
+/*
+ * The cycle counter is always a 64-bit counter. When ARMV8_PMU_PMCR_LP
+ * is set the event counters also become 64-bit counters. Unless the
+ * user has requested a long counter (attr.config1) then we want to
+ * interrupt upon 32-bit overflow - we achieve this by applying a bias.
+ */
+static bool armv8pmu_event_needs_bias(struct perf_event *event)
+{
+   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+   struct hw_perf_event *hwc = &event->hw;
+   int idx = hwc->idx;
+
+   if (armv8pmu_event_is_64bit(event))
+   return false;
+
+   if (armv8pmu_has_long_event(cpu_pmu) ||
+   idx == ARMV8_IDX_CYCLE_COUNTER)
+   return true;
+
+   return false;
+}
+
+static u64 armv8pmu_bias_long_counter(struct perf_event *event, u64 value)
+{
+   if (armv8pmu_event_needs_bias(event))
+   value |= GENMASK(63, 32);
+
+   return value;
+}
+
+static u64 armv8pmu_unbias_long_counter(struct perf_event *event, u64 value)
+{
+   if (armv8pmu_event_needs_bias(event))
+   value &= ~GENMASK(63, 32);
+
+   return value;
+}
+
 static u64 armv8pmu_read_counter(struct perf_event *event)
 {
struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -377,10 +428,10 @@ static u64 armv8pmu_read_counter(struct perf_event *event)
else
value = armv8pmu_read_hw_counter(event);
 
-   return value;
+   return  armv8pmu_unbias_long_counter(event, value);
 }
 
-static inline void armv8pmu_write_evcnt

[PATCH v4 1/3] arm64: cpufeature: Extract capped fields

2020-01-23 Thread Andrew Murray
When emulating ID registers there is often a need to cap the version
bits of a feature such that the guest will not use features that do
not yet exist.

Let's add a helper that extracts a field and caps the version to a
given value.

Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/cpufeature.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 4261d55e8506..1462fd1101e3 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -447,6 +447,22 @@ cpuid_feature_extract_unsigned_field(u64 features, int 
field)
return cpuid_feature_extract_unsigned_field_width(features, field, 4);
 }
 
+static inline u64 __attribute_const__
+cpuid_feature_cap_signed_field_width(u64 features, int field, int width,
+s64 cap)
+{
+   s64 val = cpuid_feature_extract_signed_field_width(features, field,
+  width);
+   u64 mask = GENMASK_ULL(field + width - 1, field);
+
+   if (val > cap) {
+   features &= ~mask;
+   features |= (cap << field) & mask;
+   }
+
+   return features;
+}
+
 static inline u64 arm64_ftr_mask(const struct arm64_ftr_bits *ftrp)
 {
return (u64)GENMASK(ftrp->shift + ftrp->width - 1, ftrp->shift);
-- 
2.21.0



Re: [PATCH v3 2/3] KVM: arm64: limit PMU version to ARMv8.4

2020-01-21 Thread Andrew Murray
On Mon, Jan 20, 2020 at 05:44:33PM +, Will Deacon wrote:
> On Thu, Jan 02, 2020 at 12:39:04PM +0000, Andrew Murray wrote:
> > ARMv8.5-PMU introduces 64-bit event counters, however KVM doesn't yet
> > support this. Let's trap the Debug Feature Registers in order to limit
> > PMUVer/PerfMon in the Debug Feature Registers to PMUv3 for ARMv8.4.
> > 
> > Signed-off-by: Andrew Murray 
> > Reviewed-by: Suzuki K Poulose 
> > ---
> >  arch/arm64/include/asm/sysreg.h |  4 
> >  arch/arm64/kvm/sys_regs.c   | 36 +++--
> >  2 files changed, 38 insertions(+), 2 deletions(-)
> 
> I'll need an ack from the kvm side for this.
> 
> > diff --git a/arch/arm64/include/asm/sysreg.h 
> > b/arch/arm64/include/asm/sysreg.h
> > index 6e919fafb43d..1b74f275a115 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -672,6 +672,10 @@
> >  #define ID_AA64DFR0_TRACEVER_SHIFT 4
> >  #define ID_AA64DFR0_DEBUGVER_SHIFT 0
> >  
> > +#define ID_DFR0_PERFMON_SHIFT  24
> > +
> > +#define ID_DFR0_EL1_PMUVER_8_4 5
> > +
> >  #define ID_ISAR5_RDM_SHIFT 24
> >  #define ID_ISAR5_CRC32_SHIFT   16
> >  #define ID_ISAR5_SHA2_SHIFT12
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 9f2165937f7d..61b984d934d1 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -668,6 +668,37 @@ static bool 
> > pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu)
> > return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_ER | 
> > ARMV8_PMU_USERENR_EN);
> >  }
> >  
> > +static bool access_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
> > +  struct sys_reg_params *p,
> > +  const struct sys_reg_desc *rd)
> > +{
> > +   if (p->is_write)
> > +   return write_to_read_only(vcpu, p, rd);
> > +
> > +   /* Limit guests to PMUv3 for ARMv8.4 */
> > +   p->regval = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
> > +   p->regval = cpuid_feature_cap_signed_field_width(p->regval,
> > +   ID_AA64DFR0_PMUVER_SHIFT,
> > +   4, ID_DFR0_EL1_PMUVER_8_4);
> 
> nit: I'd probably have a separate define for the field value of the 64-bit
> register, since there's no guarantee other values will be encoded the same
> way. (i.e. add ID_AA64DFR0_PMUVER_8_4 as well).

Yes that seems reasonable, i'll update it.

> 
> > +
> > +   return p->regval;
> > +}
> > +
> > +static bool access_id_dfr0_el1(struct kvm_vcpu *vcpu, struct 
> > sys_reg_params *p,
> > +  const struct sys_reg_desc *rd)
> > +{
> > +   if (p->is_write)
> > +   return write_to_read_only(vcpu, p, rd);
> > +
> > +   /* Limit guests to PMUv3 for ARMv8.4 */
> > +   p->regval = read_sanitised_ftr_reg(SYS_ID_DFR0_EL1);
> > +   p->regval = cpuid_feature_cap_signed_field_width(p->regval,
> 
> You could just return the result here (same above).

Or perhaps a bool - sigh.

Thanks,

Andrew Murray

> 
> Will


Re: [PATCH v3 2/3] KVM: arm64: limit PMU version to ARMv8.4

2020-01-21 Thread Andrew Murray
On Mon, Jan 20, 2020 at 05:55:17PM +, Marc Zyngier wrote:
> On 2020-01-02 12:39, Andrew Murray wrote:
> > ARMv8.5-PMU introduces 64-bit event counters, however KVM doesn't yet
> > support this. Let's trap the Debug Feature Registers in order to limit
> > PMUVer/PerfMon in the Debug Feature Registers to PMUv3 for ARMv8.4.
> > 
> > Signed-off-by: Andrew Murray 
> > Reviewed-by: Suzuki K Poulose 
> > ---
> >  arch/arm64/include/asm/sysreg.h |  4 
> >  arch/arm64/kvm/sys_regs.c   | 36 +++--
> >  2 files changed, 38 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/sysreg.h
> > b/arch/arm64/include/asm/sysreg.h
> > index 6e919fafb43d..1b74f275a115 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -672,6 +672,10 @@
> >  #define ID_AA64DFR0_TRACEVER_SHIFT 4
> >  #define ID_AA64DFR0_DEBUGVER_SHIFT 0
> > 
> > +#define ID_DFR0_PERFMON_SHIFT  24
> > +
> > +#define ID_DFR0_EL1_PMUVER_8_4 5
> > +
> >  #define ID_ISAR5_RDM_SHIFT 24
> >  #define ID_ISAR5_CRC32_SHIFT   16
> >  #define ID_ISAR5_SHA2_SHIFT12
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 9f2165937f7d..61b984d934d1 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -668,6 +668,37 @@ static bool
> > pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu)
> > return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_ER |
> > ARMV8_PMU_USERENR_EN);
> >  }
> > 
> > +static bool access_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
> > +  struct sys_reg_params *p,
> > +  const struct sys_reg_desc *rd)
> > +{
> > +   if (p->is_write)
> > +   return write_to_read_only(vcpu, p, rd);
> > +
> > +   /* Limit guests to PMUv3 for ARMv8.4 */
> > +   p->regval = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
> > +   p->regval = cpuid_feature_cap_signed_field_width(p->regval,
> > +   ID_AA64DFR0_PMUVER_SHIFT,
> > +   4, ID_DFR0_EL1_PMUVER_8_4);
> > +
> > +   return p->regval;
> 
> If feels very odd to return the register value in place of a something
> that actually indicates whether we should update the PC or not. I have
> no idea what is happening here in this case.

This should have returned true. I have no idea why I did this.


> 
> > +}
> > +
> > +static bool access_id_dfr0_el1(struct kvm_vcpu *vcpu, struct
> > sys_reg_params *p,
> > +  const struct sys_reg_desc *rd)
> > +{
> > +   if (p->is_write)
> > +   return write_to_read_only(vcpu, p, rd);
> > +
> > +   /* Limit guests to PMUv3 for ARMv8.4 */
> > +   p->regval = read_sanitised_ftr_reg(SYS_ID_DFR0_EL1);
> > +   p->regval = cpuid_feature_cap_signed_field_width(p->regval,
> > +   ID_DFR0_PERFMON_SHIFT,
> > +   4, ID_DFR0_EL1_PMUVER_8_4);
> > +
> > +   return p->regval;
> 
> Same here.
> 
> > +}
> > +
> >  static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params
> > *p,
> > const struct sys_reg_desc *r)
> >  {
> > @@ -1409,7 +1440,8 @@ static const struct sys_reg_desc sys_reg_descs[] =
> > {
> > /* CRm=1 */
> > ID_SANITISED(ID_PFR0_EL1),
> > ID_SANITISED(ID_PFR1_EL1),
> > -   ID_SANITISED(ID_DFR0_EL1),
> > +   { SYS_DESC(SYS_ID_DFR0_EL1), access_id_dfr0_el1 },
> 
> How about the .get_user and .set_user accessors that were provided by
> ID_SANITISED and that are now dropped? You should probably define a
> new wrapper that allows you to override the .access method.

Yes I can do that, thus ensuring we continue to return sanitised values
rather than the current vcpu value.

However should I also update read_id_reg - thus ensuring the host sees
the same value that the guest sees? (I see this already does something
similar with AA64PFR0 and AA64ISAR1).

Thanks,

Andrew Murray

> 
> > +
> > ID_HIDDEN(ID_AFR0_EL1),
> > ID_SANITISED(ID_MMFR0_EL1),
> > ID_SANITISED(ID_MMFR1_EL1),
> > @@ -1448,7 +1480,7 @@ static const struct sys_reg_desc sys_reg_descs[] =
> > {
> > ID_UNALLOCATED(4,7),
> > 
> > /* CRm=5 */
> > -   ID_SANITISED(ID_AA64DFR0_EL1),
> > +   { SYS_DESC(SYS_ID_AA64DFR0_EL1), access_id_aa64dfr0_el1 },
> > ID_SANITISED(ID_AA64DFR1_EL1),
> > ID_UNALLOCATED(5,2),
> > ID_UNALLOCATED(5,3),
> 
> Thanks,
> 
> M.
> -- 
> Jazz is not dead. It just smells funny...


Re: [PATCH v3 2/3] KVM: arm64: limit PMU version to ARMv8.4

2020-01-21 Thread Andrew Murray
On Tue, Jan 21, 2020 at 09:04:21AM +, Will Deacon wrote:
> On Mon, Jan 20, 2020 at 05:55:17PM +, Marc Zyngier wrote:
> > On 2020-01-02 12:39, Andrew Murray wrote:
> > > ARMv8.5-PMU introduces 64-bit event counters, however KVM doesn't yet
> > > support this. Let's trap the Debug Feature Registers in order to limit
> > > PMUVer/PerfMon in the Debug Feature Registers to PMUv3 for ARMv8.4.
> > > 
> > > Signed-off-by: Andrew Murray 
> > > Reviewed-by: Suzuki K Poulose 
> > > ---
> > >  arch/arm64/include/asm/sysreg.h |  4 
> > >  arch/arm64/kvm/sys_regs.c   | 36 +++--
> > >  2 files changed, 38 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/sysreg.h
> > > b/arch/arm64/include/asm/sysreg.h
> > > index 6e919fafb43d..1b74f275a115 100644
> > > --- a/arch/arm64/include/asm/sysreg.h
> > > +++ b/arch/arm64/include/asm/sysreg.h
> > > @@ -672,6 +672,10 @@
> > >  #define ID_AA64DFR0_TRACEVER_SHIFT   4
> > >  #define ID_AA64DFR0_DEBUGVER_SHIFT   0
> > > 
> > > +#define ID_DFR0_PERFMON_SHIFT24
> > > +
> > > +#define ID_DFR0_EL1_PMUVER_8_4   5
> > > +
> > >  #define ID_ISAR5_RDM_SHIFT   24
> > >  #define ID_ISAR5_CRC32_SHIFT 16
> > >  #define ID_ISAR5_SHA2_SHIFT  12
> > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > > index 9f2165937f7d..61b984d934d1 100644
> > > --- a/arch/arm64/kvm/sys_regs.c
> > > +++ b/arch/arm64/kvm/sys_regs.c
> > > @@ -668,6 +668,37 @@ static bool
> > > pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu)
> > >   return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_ER |
> > > ARMV8_PMU_USERENR_EN);
> > >  }
> > > 
> > > +static bool access_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
> > > +struct sys_reg_params *p,
> > > +const struct sys_reg_desc *rd)
> > > +{
> > > + if (p->is_write)
> > > + return write_to_read_only(vcpu, p, rd);
> > > +
> > > + /* Limit guests to PMUv3 for ARMv8.4 */
> > > + p->regval = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
> > > + p->regval = cpuid_feature_cap_signed_field_width(p->regval,
> > > + ID_AA64DFR0_PMUVER_SHIFT,
> > > + 4, ID_DFR0_EL1_PMUVER_8_4);
> > > +
> > > + return p->regval;
> > 
> > If feels very odd to return the register value in place of a something
> > that actually indicates whether we should update the PC or not. I have
> > no idea what is happening here in this case.
> 
> Crikey, yes, I missed that and it probably explains why the code looks so
> odd. Andrew -- is there a missing hunk or something here?

Doh, it should always return true.

Nothing missing here - sometimes I also look at my own code and have no
idea what I was thinking.

Thanks,

Andrew Murray

> 
> Will


Re: [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime

2020-01-13 Thread Andrew Murray
On Sun, Dec 22, 2019 at 10:34:55AM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:17 +,
> Andrew Murray  wrote:
> > 
> > From: Sudeep Holla 
> > 
> > Now that we have all the save/restore mechanism in place, lets enable
> > the translation regime used by buffer from EL2 stage 1 to EL1 stage 1
> > on VHE systems.
> > 
> > Signed-off-by: Sudeep Holla 
> > [ Reword commit, don't trap to EL2 ]
> 
> Not trapping to EL2 for the case where we don't allow SPE in the
> guest is not acceptable.
> 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm64/kvm/hyp/switch.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 67b7c160f65b..6c153b79829b 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -100,6 +100,7 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
> >  
> > write_sysreg(val, cpacr_el1);
> >  
> > +   write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
> > write_sysreg(kvm_get_hyp_vector(), vbar_el1);
> >  }
> >  NOKPROBE_SYMBOL(activate_traps_vhe);
> > @@ -117,6 +118,7 @@ static void __hyp_text __activate_traps_nvhe(struct 
> > kvm_vcpu *vcpu)
> > __activate_traps_fpsimd32(vcpu);
> > }
> >  
> > +   write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
> 
> There is a _MASK macro that can replace this '3', and is in keeping
> with the rest of the code.
> 
> It still remains that it looks like the wrong place to do this, and
> vcpu_load seems much better. Why should you write to mdcr_el2 on each
> entry to the guest, since you know whether it has SPE enabled at the
> point where it gets scheduled?

For nVHE, the only reason we'd want to change E2PB on entry/exit of guest
would be if the host is also using SPE. If the host is using SPE whilst
the vcpu is 'loaded' but we're not in the guest, then host SPE could raise
an interrupt - we need the E2PB bits to allow access from EL1 (host).
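
As a small sketch of the _MASK suggestion above (same value as the bare '3', just
spelled with the existing macros; where the write ends up living is a separate
question):

	write_sysreg(vcpu->arch.mdcr_el2 |
		     (MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT), mdcr_el2);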

Thanks,

Andrew Murray

> 
>   M.
> 
> -- 
> Jazz is not dead, it just smells funny.


Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls

2020-01-10 Thread Andrew Murray
On Fri, Jan 10, 2020 at 11:51:39AM +, Marc Zyngier wrote:
> On 2020-01-10 11:04, Andrew Murray wrote:
> > On Fri, Jan 10, 2020 at 10:54:36AM +0000, Andrew Murray wrote:
> > > On Sat, Dec 21, 2019 at 02:13:25PM +, Marc Zyngier wrote:
> > > > On Fri, 20 Dec 2019 14:30:16 +
> > > > Andrew Murray  wrote:
> > > >
> > > > [somehow managed not to do a reply all, re-sending]
> > > >
> > > > > From: Sudeep Holla 
> > > > >
> > > > > Now that we can save/restore the full SPE controls, we can enable it
> > > > > if SPE is setup and ready to use in KVM. It's supported in KVM only if
> > > > > all the CPUs in the system supports SPE.
> > > > >
> > > > > However to support heterogenous systems, we need to move the check if
> > > > > host supports SPE and do a partial save/restore.
> > > >
> > > > No. Let's just not go down that path. For now, KVM on heterogeneous
> > > > systems do not get SPE. If SPE has been enabled on a guest and a CPU
> > > > comes up without SPE, this CPU should fail to boot (same as exposing a
> > > > feature to userspace).
> > > >
> > > > >
> > > > > Signed-off-by: Sudeep Holla 
> > > > > Signed-off-by: Andrew Murray 
> > > > > ---
> > > > >  arch/arm64/kvm/hyp/debug-sr.c | 33 -
> > > > >  include/kvm/arm_spe.h |  6 ++
> > > > >  2 files changed, 22 insertions(+), 17 deletions(-)
> > > > >
> > > > > diff --git a/arch/arm64/kvm/hyp/debug-sr.c 
> > > > > b/arch/arm64/kvm/hyp/debug-sr.c
> > > > > index 12429b212a3a..d8d857067e6d 100644
> > > > > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > > > > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > > > > @@ -86,18 +86,13 @@
> > > > >   }
> > > > >
> > > > >  static void __hyp_text
> > > > > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > > > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool 
> > > > > full_ctxt)
> > > > >  {
> > > > >   u64 reg;
> > > > >
> > > > >   /* Clear pmscr in case of early return */
> > > > >   ctxt->sys_regs[PMSCR_EL1] = 0;
> > > > >
> > > > > - /* SPE present on this CPU? */
> > > > > - if 
> > > > > (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > > > > -   
> > > > > ID_AA64DFR0_PMSVER_SHIFT))
> > > > > - return;
> > > > > -
> > > > >   /* Yes; is it owned by higher EL? */
> > > > >   reg = read_sysreg_s(SYS_PMBIDR_EL1);
> > > > >   if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > > > > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context 
> > > > > *ctxt, bool full_ctxt)
> > > > >  }
> > > > >
> > > > >  static void __hyp_text
> > > > > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool 
> > > > > full_ctxt)
> > > > > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool 
> > > > > full_ctxt)
> > > > >  {
> > > > >   if (!ctxt->sys_regs[PMSCR_EL1])
> > > > >   return;
> > > > > @@ -210,11 +205,14 @@ void __hyp_text 
> > > > > __debug_restore_guest_context(struct kvm_vcpu *vcpu)
> > > > >   struct kvm_guest_debug_arch *host_dbg;
> > > > >   struct kvm_guest_debug_arch *guest_dbg;
> > > > >
> > > > > + host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > > > + guest_ctxt = &vcpu->arch.ctxt;
> > > > > +
> > > > > + __debug_restore_spe_context(guest_ctxt, 
> > > > > kvm_arm_spe_v1_ready(vcpu));
> > > > > +
> > > > >   if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > > > >   return;
> > > > >
> > > > > - host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > > > - guest_ctxt = &vcpu->arch.ctxt;
> > > > >   host_dbg = &vcpu->arch.host_debug_state.regs;
> > > > >   guest_dbg = ke

Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls

2020-01-10 Thread Andrew Murray
On Fri, Jan 10, 2020 at 11:18:48AM +, Marc Zyngier wrote:
> On 2020-01-10 10:54, Andrew Murray wrote:
> > On Sat, Dec 21, 2019 at 02:13:25PM +, Marc Zyngier wrote:
> > > On Fri, 20 Dec 2019 14:30:16 +
> > > Andrew Murray  wrote:
> > > 
> > > [somehow managed not to do a reply all, re-sending]
> > > 
> > > > From: Sudeep Holla 
> > > >
> > > > Now that we can save/restore the full SPE controls, we can enable it
> > > > if SPE is setup and ready to use in KVM. It's supported in KVM only if
> > > > all the CPUs in the system supports SPE.
> > > >
> > > > However to support heterogenous systems, we need to move the check if
> > > > host supports SPE and do a partial save/restore.
> > > 
> > > No. Let's just not go down that path. For now, KVM on heterogeneous
> > > systems do not get SPE. If SPE has been enabled on a guest and a CPU
> > > comes up without SPE, this CPU should fail to boot (same as exposing a
> > > feature to userspace).
> > > 
> > > >
> > > > Signed-off-by: Sudeep Holla 
> > > > Signed-off-by: Andrew Murray 
> > > > ---
> > > >  arch/arm64/kvm/hyp/debug-sr.c | 33 -
> > > >  include/kvm/arm_spe.h |  6 ++
> > > >  2 files changed, 22 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/hyp/debug-sr.c 
> > > > b/arch/arm64/kvm/hyp/debug-sr.c
> > > > index 12429b212a3a..d8d857067e6d 100644
> > > > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > > > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > > > @@ -86,18 +86,13 @@
> > > > }
> > > >
> > > >  static void __hyp_text
> > > > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > >  {
> > > > u64 reg;
> > > >
> > > > /* Clear pmscr in case of early return */
> > > > ctxt->sys_regs[PMSCR_EL1] = 0;
> > > >
> > > > -   /* SPE present on this CPU? */
> > > > -   if 
> > > > (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > > > - 
> > > > ID_AA64DFR0_PMSVER_SHIFT))
> > > > -   return;
> > > > -
> > > > /* Yes; is it owned by higher EL? */
> > > > reg = read_sysreg_s(SYS_PMBIDR_EL1);
> > > > if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > > > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, 
> > > > bool full_ctxt)
> > > >  }
> > > >
> > > >  static void __hyp_text
> > > > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool 
> > > > full_ctxt)
> > > >  {
> > > > if (!ctxt->sys_regs[PMSCR_EL1])
> > > > return;
> > > > @@ -210,11 +205,14 @@ void __hyp_text 
> > > > __debug_restore_guest_context(struct kvm_vcpu *vcpu)
> > > > struct kvm_guest_debug_arch *host_dbg;
> > > > struct kvm_guest_debug_arch *guest_dbg;
> > > >
> > > > +   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > > +   guest_ctxt = &vcpu->arch.ctxt;
> > > > +
> > > > +   __debug_restore_spe_context(guest_ctxt, 
> > > > kvm_arm_spe_v1_ready(vcpu));
> > > > +
> > > > if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > > > return;
> > > >
> > > > -   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > > -   guest_ctxt = &vcpu->arch.ctxt;
> > > > host_dbg = &vcpu->arch.host_debug_state.regs;
> > > > guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> > > >
> > > > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct 
> > > > kvm_vcpu *vcpu)
> > > > host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > > guest_ctxt = &vcpu->arch.ctxt;
> > > >
> > > > -   if (!has_vhe())
> > > > -   __debug_restore_spe_nvhe(host_ctxt, false);
> > > >

Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls

2020-01-10 Thread Andrew Murray
On Fri, Jan 10, 2020 at 10:54:36AM +, Andrew Murray wrote:
> On Sat, Dec 21, 2019 at 02:13:25PM +, Marc Zyngier wrote:
> > On Fri, 20 Dec 2019 14:30:16 +
> > Andrew Murray  wrote:
> > 
> > [somehow managed not to do a reply all, re-sending]
> > 
> > > From: Sudeep Holla 
> > > 
> > > Now that we can save/restore the full SPE controls, we can enable it
> > > if SPE is setup and ready to use in KVM. It's supported in KVM only if
> > > all the CPUs in the system supports SPE.
> > > 
> > > However to support heterogenous systems, we need to move the check if
> > > host supports SPE and do a partial save/restore.
> > 
> > No. Let's just not go down that path. For now, KVM on heterogeneous
> > systems do not get SPE. If SPE has been enabled on a guest and a CPU
> > comes up without SPE, this CPU should fail to boot (same as exposing a
> > feature to userspace).
> > 
> > > 
> > > Signed-off-by: Sudeep Holla 
> > > Signed-off-by: Andrew Murray 
> > > ---
> > >  arch/arm64/kvm/hyp/debug-sr.c | 33 -
> > >  include/kvm/arm_spe.h |  6 ++
> > >  2 files changed, 22 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > > index 12429b212a3a..d8d857067e6d 100644
> > > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > > @@ -86,18 +86,13 @@
> > >   }
> > >  
> > >  static void __hyp_text
> > > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > >  {
> > >   u64 reg;
> > >  
> > >   /* Clear pmscr in case of early return */
> > >   ctxt->sys_regs[PMSCR_EL1] = 0;
> > >  
> > > - /* SPE present on this CPU? */
> > > - if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > > -   ID_AA64DFR0_PMSVER_SHIFT))
> > > - return;
> > > -
> > >   /* Yes; is it owned by higher EL? */
> > >   reg = read_sysreg_s(SYS_PMBIDR_EL1);
> > >   if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, 
> > > bool full_ctxt)
> > >  }
> > >  
> > >  static void __hyp_text
> > > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > >  {
> > >   if (!ctxt->sys_regs[PMSCR_EL1])
> > >   return;
> > > @@ -210,11 +205,14 @@ void __hyp_text 
> > > __debug_restore_guest_context(struct kvm_vcpu *vcpu)
> > >   struct kvm_guest_debug_arch *host_dbg;
> > >   struct kvm_guest_debug_arch *guest_dbg;
> > >  
> > > + host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > + guest_ctxt = &vcpu->arch.ctxt;
> > > +
> > > + __debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > > +
> > >   if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > >   return;
> > >  
> > > - host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > - guest_ctxt = &vcpu->arch.ctxt;
> > >   host_dbg = &vcpu->arch.host_debug_state.regs;
> > >   guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> > >  
> > > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct 
> > > kvm_vcpu *vcpu)
> > >   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > >   guest_ctxt = &vcpu->arch.ctxt;
> > >  
> > > - if (!has_vhe())
> > > - __debug_restore_spe_nvhe(host_ctxt, false);
> > > + __debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > 
> > So you now do an unconditional save/restore on the exit path for VHE as
> > well? Even if the host isn't using the SPE HW? That's not acceptable
> > as, in most cases, only the host /or/ the guest will use SPE. Here, you
> > put a measurable overhead on each exit.
> > 
> > If the host is not using SPE, then the restore/save should happen in
> > vcpu_load/vcpu_put. Only if the host is using SPE should you do
> > something in the run loop. Of course, this only applies to VHE and
> > non-VHE must switch eagerly.
> > 
> 
> On VHE where SPE is used in the guest only - we save/restore in 

Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls

2020-01-10 Thread Andrew Murray
On Sat, Dec 21, 2019 at 02:13:25PM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:16 +
> Andrew Murray  wrote:
> 
> [somehow managed not to do a reply all, re-sending]
> 
> > From: Sudeep Holla 
> > 
> > Now that we can save/restore the full SPE controls, we can enable it
> > if SPE is setup and ready to use in KVM. It's supported in KVM only if
> > all the CPUs in the system supports SPE.
> > 
> > However to support heterogenous systems, we need to move the check if
> > host supports SPE and do a partial save/restore.
> 
> No. Let's just not go down that path. For now, KVM on heterogeneous
> systems do not get SPE. If SPE has been enabled on a guest and a CPU
> comes up without SPE, this CPU should fail to boot (same as exposing a
> feature to userspace).
> 
> > 
> > Signed-off-by: Sudeep Holla 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm64/kvm/hyp/debug-sr.c | 33 -
> >  include/kvm/arm_spe.h |  6 ++
> >  2 files changed, 22 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > index 12429b212a3a..d8d857067e6d 100644
> > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > @@ -86,18 +86,13 @@
> > }
> >  
> >  static void __hyp_text
> > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> > u64 reg;
> >  
> > /* Clear pmscr in case of early return */
> > ctxt->sys_regs[PMSCR_EL1] = 0;
> >  
> > -   /* SPE present on this CPU? */
> > -   if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > - ID_AA64DFR0_PMSVER_SHIFT))
> > -   return;
> > -
> > /* Yes; is it owned by higher EL? */
> > reg = read_sysreg_s(SYS_PMBIDR_EL1);
> > if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, 
> > bool full_ctxt)
> >  }
> >  
> >  static void __hyp_text
> > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> > if (!ctxt->sys_regs[PMSCR_EL1])
> > return;
> > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct 
> > kvm_vcpu *vcpu)
> > struct kvm_guest_debug_arch *host_dbg;
> > struct kvm_guest_debug_arch *guest_dbg;
> >  
> > +   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > +   guest_ctxt = &vcpu->arch.ctxt;
> > +
> > +   __debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > +
> > if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > return;
> >  
> > -   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -   guest_ctxt = &vcpu->arch.ctxt;
> > host_dbg = &vcpu->arch.host_debug_state.regs;
> > guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> >  
> > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct 
> > kvm_vcpu *vcpu)
> > host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > guest_ctxt = &vcpu->arch.ctxt;
> >  
> > -   if (!has_vhe())
> > -   __debug_restore_spe_nvhe(host_ctxt, false);
> > +   __debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> 
> So you now do an unconditional save/restore on the exit path for VHE as
> well? Even if the host isn't using the SPE HW? That's not acceptable
> as, in most cases, only the host /or/ the guest will use SPE. Here, you
> put a measurable overhead on each exit.
> 
> If the host is not using SPE, then the restore/save should happen in
> vcpu_load/vcpu_put. Only if the host is using SPE should you do
> something in the run loop. Of course, this only applies to VHE and
> non-VHE must switch eagerly.
> 

On VHE where SPE is used in the guest only - we save/restore in vcpu_load/put.

On VHE where SPE is used in the host only - we save/restore in the run loop.

On VHE where SPE is used in guest and host - we save/restore in the run loop.

As the guest can't trace EL2 it doesn't matter if we restore guest SPE early
in the vcpu_load/put functions. (I assume it doesn't matter that we restore
an EL0/EL1 profiling buffer address at this point and enable tracing given
that there is nothing to trace until entering the guest).

However the reason for moving save/restore to vcp

Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2

2020-01-09 Thread Andrew Murray
On Thu, Jan 09, 2020 at 05:42:51PM +, Mark Rutland wrote:
> Hi Andrew,
> 
> On Thu, Jan 09, 2020 at 05:25:12PM +0000, Andrew Murray wrote:
> > On Mon, Dec 23, 2019 at 12:10:42PM +, Andrew Murray wrote:
> > > On Mon, Dec 23, 2019 at 12:05:12PM +, Marc Zyngier wrote:
> > > > On 2019-12-23 11:56, Andrew Murray wrote:
> > > > > My original concern in the cover letter was in how to prevent
> > > > > the guest from attempting to use these registers in the first
> > > > > place - I think the solution I was looking for is to
> > > > > trap-and-emulate ID_AA64DFR0_EL1 such that the PMSVer bits
> > > > > indicate that SPE is not emulated.
> > > > 
> > > > That, and active trapping of the SPE system registers resulting in 
> > > > injection
> > > > of an UNDEF into the offending guest.
> > > 
> > > Yes that's no problem.
> > 
> > The spec says that 'direct access to [these registers] are UNDEFINED' - is 
> > it
> > not more correct to handle this with trap_raz_wi than an undefined 
> > instruction?
> 
> The term UNDEFINED specifically means treated as an undefined
> instruction. The Glossary in ARM DDI 0487E.a says for UNDEFINED:
> 
> | Indicates cases where an attempt to execute a particular encoding bit
> | pattern generates an exception, that is taken to the current Exception
> | level, or to the default Exception level for taking exceptions if the
> | UNDEFINED encoding was executed at EL0. This applies to:
> |
> | * Any encoding that is not allocated to any instruction.
> |
> | * Any encoding that is defined as never accessible at the current
> |   Exception level.
> |
> | * Some cases where an enable, disable, or trap control means an
> |   encoding is not accessible at the current Exception level.
> 
> So these should trigger an UNDEFINED exception rather than behaving as
> RAZ/WI.

OK thanks for the clarification - I'll leave it as an undefined instruction.
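
For reference, a minimal sketch of what trapping an SPE register and injecting an
UNDEF could look like in sys_regs.c (the handler name is hypothetical;
kvm_inject_undefined() is the existing helper for this kind of injection):

static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
			   const struct sys_reg_desc *r)
{
	/* SPE not exposed to the guest: any direct access is UNDEFINED */
	kvm_inject_undefined(vcpu);
	return false;
}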

Thanks,

Andrew Murray

> 
> Thanks,
> Mark.


Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2

2020-01-09 Thread Andrew Murray
On Mon, Dec 23, 2019 at 12:10:42PM +, Andrew Murray wrote:
> On Mon, Dec 23, 2019 at 12:05:12PM +, Marc Zyngier wrote:
> > On 2019-12-23 11:56, Andrew Murray wrote:
> > > On Sun, Dec 22, 2019 at 10:42:05AM +, Marc Zyngier wrote:
> > > > On Fri, 20 Dec 2019 14:30:18 +,
> > > > Andrew Murray  wrote:
> > > > >
> > > > > As we now save/restore the profiler state there is no need to trap
> > > > > accesses to the statistical profiling controls. Let's unset the
> > > > > _TPMS bit.
> > > > >
> > > > > Signed-off-by: Andrew Murray 
> > > > > ---
> > > > >  arch/arm64/kvm/debug.c | 2 --
> > > > >  1 file changed, 2 deletions(-)
> > > > >
> > > > > diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> > > > > index 43487f035385..07ca783e7d9e 100644
> > > > > --- a/arch/arm64/kvm/debug.c
> > > > > +++ b/arch/arm64/kvm/debug.c
> > > > > @@ -88,7 +88,6 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu
> > > > *vcpu)
> > > > >   *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> > > > >   *  - Debug ROM Address (MDCR_EL2_TDRA)
> > > > >   *  - OS related registers (MDCR_EL2_TDOSA)
> > > > > - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> > > > >   *
> > > > >   * Additionally, KVM only traps guest accesses to the debug
> > > > registers if
> > > > >   * the guest is not actively using them (see the
> > > > KVM_ARM64_DEBUG_DIRTY
> > > > > @@ -111,7 +110,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu
> > > > *vcpu)
> > > > >*/
> > > > >   vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) &
> > > > MDCR_EL2_HPMN_MASK;
> > > > >   vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> > > > > - MDCR_EL2_TPMS |
> > > > 
> > > > No. This is an *optional* feature (the guest could not be presented
> > > > with the SPE feature, or the the support simply not be compiled in).
> > > > 
> > > > If the guest is not allowed to see the feature, for whichever
> > > > reason,
> > > > the traps *must* be enabled and handled.
> > > 
> > > I'll update this (and similar) to trap such registers when we don't
> > > support
> > > SPE in the guest.
> > > 
> > > My original concern in the cover letter was in how to prevent the guest
> > > from attempting to use these registers in the first place - I think the
> > > solution I was looking for is to trap-and-emulate ID_AA64DFR0_EL1 such
> > > that
> > > the PMSVer bits indicate that SPE is not emulated.
> > 
> > That, and active trapping of the SPE system registers resulting in injection
> > of an UNDEF into the offending guest.
> 
> Yes that's no problem.

The spec says that 'direct access to [these registers] are UNDEFINED' - is it
not more correct to handle this with trap_raz_wi than an undefined instruction?

Thanks,

Andrew Murray

> 
> Thanks,
> 
> Andrew Murray
> 
> > 
> > Thanks,
> > 
> > M.
> > -- 
> > Jazz is not dead. It just smells funny...


Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls

2020-01-09 Thread Andrew Murray
On Thu, Jan 09, 2020 at 11:23:37AM +, Andrew Murray wrote:
> On Wed, Jan 08, 2020 at 01:10:21PM +, Will Deacon wrote:
> > On Wed, Jan 08, 2020 at 12:36:11PM +, Marc Zyngier wrote:
> > > On 2020-01-08 11:58, Will Deacon wrote:
> > > > On Wed, Jan 08, 2020 at 11:17:16AM +, Marc Zyngier wrote:
> > > > > On 2020-01-07 15:13, Andrew Murray wrote:
> > > > > > Looking at the vcpu_load and related code, I don't see a way of 
> > > > > > saying
> > > > > > 'don't schedule this VCPU on this CPU' or bailing in any way.
> > > > > 
> > > > > That would actually be pretty easy to implement. In vcpu_load(), check
> > > > > that that the CPU physical has SPE. If not, raise a request for that
> > > > > vcpu.
> > > > > In the run loop, check for that request and abort if raised, returning
> > > > > to userspace.
> 
> I hadn't really noticed the kvm_make_request mechanism - however it's now
> clear how this could be implemented.
> 
> This approach gives userspace the responsibility for deciding which CPUs should
> be used, and if userspace gets it wrong then the KVM_RUN ioctl won't do very much.
> 
> 
> > > > > 
> > > > > Userspace can always check /sys/devices/arm_spe_0/cpumask and work out
> > > > > where to run that particular vcpu.
> > > > 
> > > > It's also worth considering systems where there are multiple
> > > > implementations
> > > > of SPE in play. Assuming we don't want to expose this to a guest, then
> > > > the
> > > > right interface here is probably for userspace to pick one SPE
> > > > implementation and expose that to the guest.
> 
> If I understand correctly then this implies the following:
> 
>  - If the host userspace indicates it wants support for SPE in the guest (via 
>KVM_SET_DEVICE_ATTR at start of day) - then we should check in vcpu_load 
> that
>the minimum version of SPE is present on the current CPU. 'minimum' because
>we don't know why userspace has selected the given cpumask.
> 
>  - Userspace can get it wrong, i.e. it can create a CPU mask with CPUs that
>have SPE with differing versions. If it does, and all CPUs have some form 
> of
>SPE then errors may occur in the guest. Perhaps this is OK and userspace
>shouldn't get it wrong?

Actually this could be guarded against by emulating ID_AA64DFR0_EL1 so as to
cap the version to the minimum SPE version - if absolutely required.
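
Something like the below (sketch only - it reuses the
cpuid_feature_cap_signed_field_width() helper from the capped-fields patch,
and 'min_pmsver' would have to be computed by KVM from the CPUs userspace has
chosen):

static u64 limit_guest_pmsver(u64 dfr0, s64 min_pmsver)
{
        /* Clamp the advertised PMSVer to the lowest version we can honour */
        return cpuid_feature_cap_signed_field_width(dfr0,
                                                    ID_AA64DFR0_PMSVER_SHIFT,
                                                    4, min_pmsver);
}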

Thanks,

Andrew Murray

> 
> 
> > > >  That fits with your idea
> > > > above,
> > > > where you basically get an immediate exit if we try to schedule a vCPU
> > > > onto
> > > > a CPU that isn't part of the SPE mask.
> > > 
> > > Then it means that the VM should be configured with a mask indicating
> > > which CPUs it is intended to run on, and setting such a mask is mandatory
> > > for SPE.
> > 
> > Yeah, and this could probably all be wrapped up by userspace so you just
> > pass the SPE PMU name or something and it grabs the corresponding cpumask
> > for you.
> > 
> > > > > > One solution could be to allow scheduling onto non-SPE VCPUs but 
> > > > > > wrap
> > > > > > the
> > > > > > SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready) 
> > > > > > that
> > > > > > reads the non-sanitised feature register. Therefore we don't go 
> > > > > > bang,
> > > > > > but
> > > > > > we also increase the size of any black-holes in SPE capturing. 
> > > > > > Though
> > > > > > this
> > > > > > feels like something that will cause grief down the line.
> > > > > >
> > > > > > Is there something else that can be done?
> > > > > 
> > > > > How does userspace deal with this? When SPE is only available on
> > > > > half of
> > > > > the CPUs, how does perf work in these conditions?
> > > > 
> > > > Not sure about userspace, but the kernel driver works by instantiating
> > > > an
> > > > SPE PMU instance only for the CPUs that have it and then that instance
> > > > profiles for only those CPUs. You also need to do something similar if
> > > > you had two CPU types with SPE, since the SPE configuration is likely to
> > > > be
> > > > different between them.
> > > 
> >

Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls

2020-01-09 Thread Andrew Murray
On Wed, Jan 08, 2020 at 01:10:21PM +, Will Deacon wrote:
> On Wed, Jan 08, 2020 at 12:36:11PM +, Marc Zyngier wrote:
> > On 2020-01-08 11:58, Will Deacon wrote:
> > > On Wed, Jan 08, 2020 at 11:17:16AM +, Marc Zyngier wrote:
> > > > On 2020-01-07 15:13, Andrew Murray wrote:
> > > > > Looking at the vcpu_load and related code, I don't see a way of saying
> > > > > 'don't schedule this VCPU on this CPU' or bailing in any way.
> > > > 
> > > > That would actually be pretty easy to implement. In vcpu_load(), check
> > > > that that the CPU physical has SPE. If not, raise a request for that
> > > > vcpu.
> > > > In the run loop, check for that request and abort if raised, returning
> > > > to userspace.

I hadn't really noticed the kvm_make_request mechanism - however it's now
clear how this could be implemented.

This approach gives userspace the responsibility for deciding which CPUs should
be used, and if userspace gets it wrong then the KVM_RUN ioctl won't do very much.
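
To make the shape concrete, a rough sketch (KVM_REQ_SPE_EXIT and
this_cpu_has_spe() are invented names, just to illustrate the idea):

/* called from kvm_arch_vcpu_load() */
static void kvm_spe_vcpu_load(struct kvm_vcpu *vcpu)
{
        if (kvm_arm_spe_v1_ready(vcpu) && !this_cpu_has_spe())
                kvm_make_request(KVM_REQ_SPE_EXIT, vcpu);
}

/* in the KVM_RUN loop, alongside the existing request checks */
if (kvm_check_request(KVM_REQ_SPE_EXIT, vcpu)) {
        run->exit_reason = KVM_EXIT_FAIL_ENTRY;
        return 0;       /* back to userspace to sort out the affinity */
}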


> > > > 
> > > > Userspace can always check /sys/devices/arm_spe_0/cpumask and work out
> > > > where to run that particular vcpu.
> > > 
> > > It's also worth considering systems where there are multiple
> > > implementations
> > > of SPE in play. Assuming we don't want to expose this to a guest, then
> > > the
> > > right interface here is probably for userspace to pick one SPE
> > > implementation and expose that to the guest.

If I understand correctly then this implies the following:

 - If the host userspace indicates it wants support for SPE in the guest (via 
   KVM_SET_DEVICE_ATTR at start of day) - then we should check in vcpu_load that
   the minimum version of SPE is present on the current CPU. 'minimum' because
   we don't know why userspace has selected the given cpumask.

 - Userspace can get it wrong, i.e. it can create a CPU mask with CPUs that
   have SPE with differing versions. If it does, and all CPUs have some form of
   SPE then errors may occur in the guest. Perhaps this is OK and userspace
   shouldn't get it wrong?


> > >  That fits with your idea
> > > above,
> > > where you basically get an immediate exit if we try to schedule a vCPU
> > > onto
> > > a CPU that isn't part of the SPE mask.
> > 
> > Then it means that the VM should be configured with a mask indicating
> > which CPUs it is intended to run on, and setting such a mask is mandatory
> > for SPE.
> 
> Yeah, and this could probably all be wrapped up by userspace so you just
> pass the SPE PMU name or something and it grabs the corresponding cpumask
> for you.
> 
> > > > > One solution could be to allow scheduling onto non-SPE VCPUs but wrap
> > > > > the
> > > > > SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready) that
> > > > > reads the non-sanitised feature register. Therefore we don't go bang,
> > > > > but
> > > > > we also increase the size of any black-holes in SPE capturing. Though
> > > > > this
> > > > > feels like something that will cause grief down the line.
> > > > >
> > > > > Is there something else that can be done?
> > > > 
> > > > How does userspace deal with this? When SPE is only available on
> > > > half of
> > > > the CPUs, how does perf work in these conditions?
> > > 
> > > Not sure about userspace, but the kernel driver works by instantiating
> > > an
> > > SPE PMU instance only for the CPUs that have it and then that instance
> > > profiles for only those CPUs. You also need to do something similar if
> > > you had two CPU types with SPE, since the SPE configuration is likely to
> > > be
> > > different between them.
> > 
> > So that's closer to what Andrew was suggesting above (running a guest on a
> > non-SPE CPU creates a profiling black hole). Except that we can't really
> > run a SPE-enabled guest on a non-SPE CPU, as the SPE sysregs will UNDEF
> > at EL1.
> 
> Right. I wouldn't suggest the "black hole" approach for VMs, but it works
> for userspace so that's why the driver does it that way.
> 
> > Conclusion: we need a mix of a cpumask to indicate which CPUs we want to
> > run on (generic, not-SPE related), 

If I understand correctly this mask isn't exposed to KVM (in the kernel), and
KVM (in the kernel) is unaware of how the CPUs on which KVM_RUN is called are
selected.

Thus this implies the cpumask is a feature of KVM tool or QEMU that would
need to be added there. (E.g. kvm_cmd_run_work would set some affinity when
creating pthreads, based on a CPU mask selected by setting the --spe flag)?
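
Something like the following on the userspace side (sketch only - parsing of
/sys/devices/arm_spe_0/cpumask is left out and spe_cpus is assumed to already
hold the parsed mask):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin a vcpu thread to the CPUs that implement SPE, before KVM_RUN */
static int pin_vcpu_to_spe_cpus(pthread_t vcpu_thread,
                                const cpu_set_t *spe_cpus)
{
        return pthread_setaffinity_np(vcpu_thread,
                                      sizeof(*spe_cpus), spe_cpus);
}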

Thanks,

Andrew Murray

> and a check for SPE-capable CPUs.
> > If any of these condition is not satisfied, the vcpu exits for userspace
> > to sort out the affinity.
> > 
> > I hate heterogeneous systems.
> 
> They hate you too ;)
> 
> Will


Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls

2020-01-07 Thread Andrew Murray
On Sat, Dec 21, 2019 at 02:13:25PM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:16 +
> Andrew Murray  wrote:
> 
> [somehow managed not to do a reply all, re-sending]
> 
> > From: Sudeep Holla 
> > 
> > Now that we can save/restore the full SPE controls, we can enable it
> > if SPE is setup and ready to use in KVM. It's supported in KVM only if
> > all the CPUs in the system supports SPE.
> > 
> > However to support heterogenous systems, we need to move the check if
> > host supports SPE and do a partial save/restore.
> 
> No. Let's just not go down that path. For now, KVM on heterogeneous
> systems do not get SPE.

At present these patches only offer the SPE feature to VCPUs where the
sanitised AA64DFR0 register indicates that all CPUs have this support
(kvm_arm_support_spe_v1) at the time of setting the attribute
(KVM_SET_DEVICE_ATTR).

Therefore if a new CPU comes online without SPE support, and an
existing VCPU is scheduled onto it, then bad things happen - which I guess
must have been the intention behind this patch.


> If SPE has been enabled on a guest and a CPU
> comes up without SPE, this CPU should fail to boot (same as exposing a
> feature to userspace).

I'm unclear as to how to prevent this. We can set the FTR_STRICT flag on
the sanitised register - thus tainting the kernel if such a non-SPE CPU
comes online - though that doesn't prevent KVM from blowing up, and I
don't believe we can prevent a CPU from coming up at all. At the moment
this is my preferred approach.

Looking at the vcpu_load and related code, I don't see a way of saying
'don't schedule this VCPU on this CPU' or bailing in any way.

One solution could be to allow scheduling onto non-SPE CPUs but wrap the
SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready) that
reads the non-sanitised feature register. Therefore we don't go bang, but
we also increase the size of any black-holes in SPE capturing. Though this
feels like something that will cause grief down the line.

Is there something else that can be done?

Thanks,

Andrew Murray

> 
> > 
> > Signed-off-by: Sudeep Holla 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm64/kvm/hyp/debug-sr.c | 33 -
> >  include/kvm/arm_spe.h |  6 ++
> >  2 files changed, 22 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > index 12429b212a3a..d8d857067e6d 100644
> > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > @@ -86,18 +86,13 @@
> > }
> >  
> >  static void __hyp_text
> > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> > u64 reg;
> >  
> > /* Clear pmscr in case of early return */
> > ctxt->sys_regs[PMSCR_EL1] = 0;
> >  
> > -   /* SPE present on this CPU? */
> > -   if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > - ID_AA64DFR0_PMSVER_SHIFT))
> > -   return;
> > -
> > /* Yes; is it owned by higher EL? */
> > reg = read_sysreg_s(SYS_PMBIDR_EL1);
> > if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, 
> > bool full_ctxt)
> >  }
> >  
> >  static void __hyp_text
> > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> > if (!ctxt->sys_regs[PMSCR_EL1])
> > return;
> > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct 
> > kvm_vcpu *vcpu)
> > struct kvm_guest_debug_arch *host_dbg;
> > struct kvm_guest_debug_arch *guest_dbg;
> >  
> > +   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > +   guest_ctxt = >arch.ctxt;
> > +
> > +   __debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > +
> > if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > return;
> >  
> > -   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -   guest_ctxt = >arch.ctxt;
> > host_dbg = >arch.host_debug_state.regs;
> > guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> >  
> > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct 
> > kvm_vcpu *vcpu)
> > host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > > > guest_ctxt = &vcpu->arch.ctxt;
> >  
> >

Re: [PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE)

2020-01-02 Thread Andrew Murray
On Tue, Dec 24, 2019 at 10:29:50AM +, Andrew Murray wrote:
> On Sat, Dec 21, 2019 at 01:12:14PM +, Marc Zyngier wrote:
> > On Fri, 20 Dec 2019 14:30:09 +
> > Andrew Murray  wrote:
> > 
> > > From: Sudeep Holla 
> > > 
> > > On VHE systems, the reset value for MDCR_EL2.E2PB=b00 which defaults
> > > to profiling buffer using the EL2 stage 1 translations. 
> > 
> > Does the reset value actually matter here? I don't see it being
> > specific to VHE systems, and all we're trying to achieve is to restore
> > the SPE configuration to a state where it can be used by the host.
> > 
> > > However if the
> > > guest are allowed to use profiling buffers changing E2PB settings, we
> > 
> > How can the guest be allowed to change E2PB settings? Or do you mean
> > here that allowing the guest to use SPE will mandate changes of the
> > E2PB settings, and that we'd better restore the hypervisor state once
> > we exit?
> > 
> > > need to ensure we resume back MDCR_EL2.E2PB=b00. Currently we just
> > > do bitwise '&' with MDCR_EL2_E2PB_MASK which will retain the value.
> > > 
> > > So fix it by clearing all the bits in E2PB.
> > > 
> > > Signed-off-by: Sudeep Holla 
> > > Signed-off-by: Andrew Murray 
> > > ---
> > >  arch/arm64/kvm/hyp/switch.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > > index 72fbbd86eb5e..250f13910882 100644
> > > --- a/arch/arm64/kvm/hyp/switch.c
> > > +++ b/arch/arm64/kvm/hyp/switch.c
> > > @@ -228,9 +228,7 @@ void deactivate_traps_vhe_put(void)
> > >  {
> > >   u64 mdcr_el2 = read_sysreg(mdcr_el2);
> > >  
> > > - mdcr_el2 &= MDCR_EL2_HPMN_MASK |
> > > - MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
> > > - MDCR_EL2_TPMS;
> > > + mdcr_el2 &= MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS;
> > >  
> > >   write_sysreg(mdcr_el2, mdcr_el2);
> > >  
> > 
> > I'm OK with this change, but I believe the commit message could use
> > some tidying up.
> 
> No problem, I'll update the commit message.

This is my new description:

arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest (VHE)

Upon leaving the guest on VHE systems we currently preserve the value of
MDCR_EL2.E2PB. This register determines if the SPE profiling buffer controls
are trapped and which translation regime they use.

In order to permit guest access to SPE we may use a different translation
regime whilst the vCPU is scheduled - therefore let's ensure that upon leaving
the guest we set E2PB back to the value expected by the host (0b00).

For nVHE systems we already explicitly set E2PB back to the expected value
of 0b11 in __deactivate_traps_nvhe.

Thanks,

Andrew Murray

> 
> Thanks,
> 
> Andrew Murray
> 
> > 
> > Thanks,
> > 
> > M.
> > -- 
> > Jazz is not dead. It just smells funny...


[PATCH v3 0/3] arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

2020-01-02 Thread Andrew Murray
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.

With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters. Let's add support for 64-bit event
counters.

As KVM doesn't yet support 64-bit event counters, we also trap
and emulate the Debug Feature Registers to limit the PMU version a
guest sees to PMUv3 for ARMv8.4.

Tested by running the following perf command on both guest and host
and ensuring that the figures are very similar:

perf stat -e armv8_pmuv3/inst_retired,long=1/ \
  -e armv8_pmuv3/inst_retired,long=0/ -e cycles

Changes since v2:

 - Rebased onto v5.5-rc4
 - Mask 'cap' value to 'width' in cpuid_feature_cap_signed_field_width

Changes since v1:

 - Rebased onto v5.5-rc1


Andrew Murray (3):
  arm64: cpufeature: Extract capped fields
  KVM: arm64: limit PMU version to ARMv8.4
  arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

 arch/arm64/include/asm/cpufeature.h | 16 ++
 arch/arm64/include/asm/perf_event.h |  3 +-
 arch/arm64/include/asm/sysreg.h |  4 ++
 arch/arm64/kernel/perf_event.c  | 86 +++--
 arch/arm64/kvm/sys_regs.c   | 36 +++-
 include/linux/perf/arm_pmu.h|  1 +
 6 files changed, 126 insertions(+), 20 deletions(-)

-- 
2.21.0



[PATCH v3 1/3] arm64: cpufeature: Extract capped fields

2020-01-02 Thread Andrew Murray
When emulating ID registers there is often a need to cap the version
bits of a feature such that the guest will not use features that do
not yet exist.

Let's add a helper that extracts a field and caps the version to a
given value.

Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/cpufeature.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 4261d55e8506..1462fd1101e3 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -447,6 +447,22 @@ cpuid_feature_extract_unsigned_field(u64 features, int 
field)
return cpuid_feature_extract_unsigned_field_width(features, field, 4);
 }
 
+static inline u64 __attribute_const__
+cpuid_feature_cap_signed_field_width(u64 features, int field, int width,
+s64 cap)
+{
+   s64 val = cpuid_feature_extract_signed_field_width(features, field,
+  width);
+   u64 mask = GENMASK_ULL(field + width - 1, field);
+
+   if (val > cap) {
+   features &= ~mask;
+   features |= (cap << field) & mask;
+   }
+
+   return features;
+}
+
 static inline u64 arm64_ftr_mask(const struct arm64_ftr_bits *ftrp)
 {
return (u64)GENMASK(ftrp->shift + ftrp->width - 1, ftrp->shift);
-- 
2.21.0



[PATCH v3 2/3] KVM: arm64: limit PMU version to ARMv8.4

2020-01-02 Thread Andrew Murray
ARMv8.5-PMU introduces 64-bit event counters, however KVM doesn't yet
support this. Let's trap the Debug Feature Registers in order to limit
PMUVer/PerfMon in the Debug Feature Registers to PMUv3 for ARMv8.4.

Signed-off-by: Andrew Murray 
Reviewed-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/sysreg.h |  4 
 arch/arm64/kvm/sys_regs.c   | 36 +++--
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 6e919fafb43d..1b74f275a115 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -672,6 +672,10 @@
 #define ID_AA64DFR0_TRACEVER_SHIFT 4
 #define ID_AA64DFR0_DEBUGVER_SHIFT 0
 
+#define ID_DFR0_PERFMON_SHIFT  24
+
+#define ID_DFR0_EL1_PMUVER_8_4 5
+
 #define ID_ISAR5_RDM_SHIFT 24
 #define ID_ISAR5_CRC32_SHIFT   16
 #define ID_ISAR5_SHA2_SHIFT12
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 9f2165937f7d..61b984d934d1 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -668,6 +668,37 @@ static bool pmu_access_event_counter_el0_disabled(struct 
kvm_vcpu *vcpu)
return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_ER | 
ARMV8_PMU_USERENR_EN);
 }
 
+static bool access_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
+  struct sys_reg_params *p,
+  const struct sys_reg_desc *rd)
+{
+   if (p->is_write)
+   return write_to_read_only(vcpu, p, rd);
+
+   /* Limit guests to PMUv3 for ARMv8.4 */
+   p->regval = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
+   p->regval = cpuid_feature_cap_signed_field_width(p->regval,
+   ID_AA64DFR0_PMUVER_SHIFT,
+   4, ID_DFR0_EL1_PMUVER_8_4);
+
+   return p->regval;
+}
+
+static bool access_id_dfr0_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+  const struct sys_reg_desc *rd)
+{
+   if (p->is_write)
+   return write_to_read_only(vcpu, p, rd);
+
+   /* Limit guests to PMUv3 for ARMv8.4 */
+   p->regval = read_sanitised_ftr_reg(SYS_ID_DFR0_EL1);
+   p->regval = cpuid_feature_cap_signed_field_width(p->regval,
+   ID_DFR0_PERFMON_SHIFT,
+   4, ID_DFR0_EL1_PMUVER_8_4);
+
+   return p->regval;
+}
+
 static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
 {
@@ -1409,7 +1440,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* CRm=1 */
ID_SANITISED(ID_PFR0_EL1),
ID_SANITISED(ID_PFR1_EL1),
-   ID_SANITISED(ID_DFR0_EL1),
+   { SYS_DESC(SYS_ID_DFR0_EL1), access_id_dfr0_el1 },
+
ID_HIDDEN(ID_AFR0_EL1),
ID_SANITISED(ID_MMFR0_EL1),
ID_SANITISED(ID_MMFR1_EL1),
@@ -1448,7 +1480,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_UNALLOCATED(4,7),
 
/* CRm=5 */
-   ID_SANITISED(ID_AA64DFR0_EL1),
+   { SYS_DESC(SYS_ID_AA64DFR0_EL1), access_id_aa64dfr0_el1 },
ID_SANITISED(ID_AA64DFR1_EL1),
ID_UNALLOCATED(5,2),
ID_UNALLOCATED(5,3),
-- 
2.21.0



[PATCH v3 3/3] arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

2020-01-02 Thread Andrew Murray
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.

With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters.

Let's enable 64-bit event counters where support exists. Unless the
user sets config1:0 we will adjust the counter value such that it
overflows upon 32-bit overflow. This follows the same behaviour as
the cycle counter which has always been (and remains) 64-bits.

Signed-off-by: Andrew Murray 
Reviewed-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/perf_event.h |  3 +-
 arch/arm64/kernel/perf_event.c  | 86 +++--
 include/linux/perf/arm_pmu.h|  1 +
 3 files changed, 72 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/perf_event.h 
b/arch/arm64/include/asm/perf_event.h
index 2bdbc79bbd01..e7765b62c712 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -176,9 +176,10 @@
 #define ARMV8_PMU_PMCR_X   (1 << 4) /* Export to ETM */
 #define ARMV8_PMU_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive debug*/
 #define ARMV8_PMU_PMCR_LC  (1 << 6) /* Overflow on 64 bit cycle counter */
+#define ARMV8_PMU_PMCR_LP  (1 << 7) /* Long event counter enable */
 #defineARMV8_PMU_PMCR_N_SHIFT  11   /* Number of counters 
supported */
 #defineARMV8_PMU_PMCR_N_MASK   0x1f
-#defineARMV8_PMU_PMCR_MASK 0x7f /* Mask for writable bits */
+#defineARMV8_PMU_PMCR_MASK 0xff /* Mask for writable bits */
 
 /*
  * PMOVSR: counters overflow flag status reg
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index e40b65645c86..4e27f90bb89e 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -285,6 +285,17 @@ static struct attribute_group 
armv8_pmuv3_format_attr_group = {
 #defineARMV8_IDX_COUNTER_LAST(cpu_pmu) \
(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
 
+
+/*
+ * We unconditionally enable ARMv8.5-PMU long event counter support
+ * (64-bit events) where supported. Indicate if this arm_pmu has long
+ * event counter support.
+ */
+static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu)
+{
+   return (cpu_pmu->pmuver > ID_DFR0_EL1_PMUVER_8_4);
+}
+
 /*
  * We must chain two programmable counters for 64 bit events,
  * except when we have allocated the 64bit cycle counter (for CPU
@@ -294,9 +305,11 @@ static struct attribute_group 
armv8_pmuv3_format_attr_group = {
 static inline bool armv8pmu_event_is_chained(struct perf_event *event)
 {
int idx = event->hw.idx;
+   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 
return !WARN_ON(idx < 0) &&
   armv8pmu_event_is_64bit(event) &&
+  !armv8pmu_has_long_event(cpu_pmu) &&
   (idx != ARMV8_IDX_CYCLE_COUNTER);
 }
 
@@ -345,7 +358,7 @@ static inline void armv8pmu_select_counter(int idx)
isb();
 }
 
-static inline u32 armv8pmu_read_evcntr(int idx)
+static inline u64 armv8pmu_read_evcntr(int idx)
 {
armv8pmu_select_counter(idx);
return read_sysreg(pmxevcntr_el0);
@@ -362,6 +375,44 @@ static inline u64 armv8pmu_read_hw_counter(struct 
perf_event *event)
return val;
 }
 
+/*
+ * The cycle counter is always a 64-bit counter. When ARMV8_PMU_PMCR_LP
+ * is set the event counters also become 64-bit counters. Unless the
+ * user has requested a long counter (attr.config1) then we want to
+ * interrupt upon 32-bit overflow - we achieve this by applying a bias.
+ */
+static bool armv8pmu_event_needs_bias(struct perf_event *event)
+{
+   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+   struct hw_perf_event *hwc = &event->hw;
+   int idx = hwc->idx;
+
+   if (armv8pmu_event_is_64bit(event))
+   return false;
+
+   if (armv8pmu_has_long_event(cpu_pmu) ||
+   idx == ARMV8_IDX_CYCLE_COUNTER)
+   return true;
+
+   return false;
+}
+
+static u64 armv8pmu_bias_long_counter(struct perf_event *event, u64 value)
+{
+   if (armv8pmu_event_needs_bias(event))
+   value |= GENMASK(63, 32);
+
+   return value;
+}
+
+static u64 armv8pmu_unbias_long_counter(struct perf_event *event, u64 value)
+{
+   if (armv8pmu_event_needs_bias(event))
+   value &= ~GENMASK(63, 32);
+
+   return value;
+}
+
 static u64 armv8pmu_read_counter(struct perf_event *event)
 {
struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -377,10 +428,10 @@ static u64 armv8pmu_read_counter(struct perf_event *event)
else
value = armv8pmu_read_hw_counter(event);
 
-   return value;
+   return  armv8pmu_unbias_long_counter(event, value);
 }
 
-static inline void armv8pmu_write_evcnt

Re: [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls

2019-12-24 Thread Andrew Murray
On Tue, Dec 24, 2019 at 10:49:30AM +, Andrew Murray wrote:
> On Sat, Dec 21, 2019 at 01:57:55PM +, Marc Zyngier wrote:
> > On Fri, 20 Dec 2019 14:30:15 +
> > Andrew Murray  wrote:
> > 
> > > From: Sudeep Holla 
> > > 
> > > Currently since we don't support profiling using SPE in the guests,
> > > we just save the PMSCR_EL1, flush the profiling buffers and disable
> > > sampling. However in order to support simultaneous sampling both in
> > 
> > Is the sampling actually simultaneous? I don't believe so (the whole
> > series would be much simpler if it was).
> 
> No the SPE is used by either the guest or host at any one time. I guess
> the term simultaneous was used to refer to illusion given to both guest
> and host that they are able to use it whenever they like. I'll update
> the commit message to drop the magic.
>  
> 
> > 
> > > the host and guests, we need to save and reatore the complete SPE
> > 
> > s/reatore/restore/
> 
> Noted.
> 
> 
> > 
> > > profiling buffer controls' context.
> > > 
> > > Let's add the support for the same and keep it disabled for now.
> > > We can enable it conditionally only if guests are allowed to use
> > > SPE.
> > > 
> > > Signed-off-by: Sudeep Holla 
> > > [ Clear PMBSR bit when saving state to prevent spurious interrupts ]
> > > Signed-off-by: Andrew Murray 
> > > ---
> > >  arch/arm64/kvm/hyp/debug-sr.c | 51 +--
> > >  1 file changed, 43 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > > index 8a70a493345e..12429b212a3a 100644
> > > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > > @@ -85,7 +85,8 @@
> > >   default:write_debug(ptr[0], reg, 0);\
> > >   }
> > >  
> > > -static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context 
> > > *ctxt)
> > > +static void __hyp_text
> > > +__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > 
> > nit: don't split lines like this if you can avoid it. You can put the
> > full_ctxt parameter on a separate line instead.
> 
> Yes understood.
> 
> 
> > 
> > >  {
> > >   u64 reg;
> > >  
> > > @@ -102,22 +103,46 @@ static void __hyp_text __debug_save_spe_nvhe(struct 
> > > kvm_cpu_context *ctxt)
> > >   if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > >   return;
> > >  
> > > - /* No; is the host actually using the thing? */
> > > - reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
> > > - if (!(reg & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)))
> > > + /* Save the control register and disable data generation */
> > > + ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
> > > +
> > > + if (!ctxt->sys_regs[PMSCR_EL1])
> > 
> > Shouldn't you check the enable bits instead of relying on the whole
> > thing being zero?
> 
> Yes that would make more sense (E1SPE and E0SPE).
> 
> I feel that this check makes an assumption about the guest/host SPE
> driver... What happens if the SPE driver writes to some SPE registers
> but doesn't enable PMSCR? If the guest is also using SPE then those
> writes will be lost, when the host returns and the SPE driver enables
> SPE it won't work.
> 
> With a quick look at the SPE driver I'm not sure this will happen, but
> even so it makes me nervous relying on these assumptions. I wonder if
> this risk is present in other devices?

In fact, this may be a good reason to trap the SPE registers - this would
allow you to conditionally save/restore based on a dirty bit. It would
also allow you to re-evaluate the SPE interrupt (for example when the guest
clears the status register) and thus potentially reduce any black hole.
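
Roughly like this (KVM_ARM64_SPE_DIRTY is an invented flag name, just to show
the idea - the access is emulated against the shadow copy and the dirty flag
tells the world switch to do the full save/restore):

static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
                           const struct sys_reg_desc *r)
{
        /* Remember that the guest has touched SPE state */
        vcpu->arch.flags |= KVM_ARM64_SPE_DIRTY;

        /* Emulate the access against the shadow copy of the register */
        if (p->is_write)
                vcpu_write_sys_reg(vcpu, p->regval, r->reg);
        else
                p->regval = vcpu_read_sys_reg(vcpu, r->reg);

        return true;
}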

Thanks,

Andrew Murray

> 
> 
> > 
> > >   return;
> > >  
> > >   /* Yes; save the control register and disable data generation */
> > > - ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
> > 
> > You've already saved the control register...
> 
> I'll remove that.
> 
> 
> > 
> > >   write_sysreg_el1(0, SYS_PMSCR);
> > >   isb();
> > >  
> > >   /* Now drain all buffered data to memory */
> > >   psb_csync();
> > >   dsb(nsh);
> > > +
> > > + if (!full_ctxt)
> > > + return;
> > > +
> > > + ctxt->sys_r

Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE

2019-12-24 Thread Andrew Murray
On Tue, Dec 24, 2019 at 01:22:46PM +, Marc Zyngier wrote:
> On 2019-12-24 13:08, Andrew Murray wrote:
> > On Tue, Dec 24, 2019 at 12:42:02PM +, Marc Zyngier wrote:
> > > On 2019-12-24 11:50, Andrew Murray wrote:
> > > > On Sun, Dec 22, 2019 at 12:07:50PM +, Marc Zyngier wrote:
> > > > > On Fri, 20 Dec 2019 14:30:21 +,
> > > > > Andrew Murray  wrote:
> > > > > >
> > > > > > Upon the exit of a guest, let's determine if the SPE device
> > > has
> > > > > generated
> > > > > > an interrupt - if so we'll inject a virtual interrupt to the
> > > > > guest.
> > > > > >
> > > > > > Upon the entry and exit of a guest we'll also update the state
> > > of
> > > > > the
> > > > > > physical IRQ such that it is active when a guest interrupt is
> > > > > pending
> > > > > > and the guest is running.
> > > > > >
> > > > > > Finally we map the physical IRQ to the virtual IRQ such that
> > > the
> > > > > guest
> > > > > > can deactivate the interrupt when it handles the interrupt.
> > > > > >
> > > > > > Signed-off-by: Andrew Murray 
> > > > > > ---
> > > > > >  include/kvm/arm_spe.h |  6 
> > > > > >  virt/kvm/arm/arm.c|  5 ++-
> > > > > >  virt/kvm/arm/spe.c| 71
> > > > > +++
> > > > > >  3 files changed, 81 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > > > > > index 9c65130d726d..91b2214f543a 100644
> > > > > > --- a/include/kvm/arm_spe.h
> > > > > > +++ b/include/kvm/arm_spe.h
> > > > > > @@ -37,6 +37,9 @@ static inline bool
> > > kvm_arm_support_spe_v1(void)
> > > > > >   
> > > > > > ID_AA64DFR0_PMSVER_SHIFT);
> > > > > >  }
> > > > > >
> > > > > > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
> > > > > > +inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
> > > > > > +
> > > > > >  int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> > > > > > struct kvm_device_attr *attr);
> > > > > >  int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> > > > > > @@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu
> > > > > *vcpu);
> > > > > >  #define kvm_arm_support_spe_v1()   (false)
> > > > > >  #define kvm_arm_spe_irq_initialized(v) (false)
> > > > > >
> > > > > > +static inline void kvm_spe_flush_hwstate(struct kvm_vcpu
> > > *vcpu)
> > > > > {}
> > > > > > +static inline void kvm_spe_sync_hwstate(struct kvm_vcpu
> > > *vcpu) {}
> > > > > > +
> > > > > >  static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu
> > > *vcpu,
> > > > > >   struct kvm_device_attr *attr)
> > > > > >  {
> > > > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > > > > > index 340d2388ee2c..a66085c8e785 100644
> > > > > > --- a/virt/kvm/arm/arm.c
> > > > > > +++ b/virt/kvm/arm/arm.c
> > > > > > @@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct
> > > kvm_vcpu
> > > > > *vcpu, struct kvm_run *run)
> > > > > > preempt_disable();
> > > > > >
> > > > > > kvm_pmu_flush_hwstate(vcpu);
> > > > > > +   kvm_spe_flush_hwstate(vcpu);
> > > > > >
> > > > > > local_irq_disable();
> > > > > >
> > > > > > @@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct
> > > kvm_vcpu
> > > > > *vcpu, struct kvm_run *run)
> > > > > > kvm_request_pending(vcpu)) {
> > > > > > vcpu->mode = OUTSIDE_GUEST_MODE;
> > > > > > isb(); /* Ensure work in x_flush_hwstate is 
> > > > > > committed */
> > > > > > +   kvm_spe_s

Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE

2019-12-24 Thread Andrew Murray
On Tue, Dec 24, 2019 at 12:42:02PM +, Marc Zyngier wrote:
> On 2019-12-24 11:50, Andrew Murray wrote:
> > On Sun, Dec 22, 2019 at 12:07:50PM +, Marc Zyngier wrote:
> > > On Fri, 20 Dec 2019 14:30:21 +0000,
> > > Andrew Murray  wrote:
> > > >
> > > > Upon the exit of a guest, let's determine if the SPE device has
> > > generated
> > > > an interrupt - if so we'll inject a virtual interrupt to the
> > > guest.
> > > >
> > > > Upon the entry and exit of a guest we'll also update the state of
> > > the
> > > > physical IRQ such that it is active when a guest interrupt is
> > > pending
> > > > and the guest is running.
> > > >
> > > > Finally we map the physical IRQ to the virtual IRQ such that the
> > > guest
> > > > can deactivate the interrupt when it handles the interrupt.
> > > >
> > > > Signed-off-by: Andrew Murray 
> > > > ---
> > > >  include/kvm/arm_spe.h |  6 
> > > >  virt/kvm/arm/arm.c|  5 ++-
> > > >  virt/kvm/arm/spe.c| 71
> > > +++
> > > >  3 files changed, 81 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > > > index 9c65130d726d..91b2214f543a 100644
> > > > --- a/include/kvm/arm_spe.h
> > > > +++ b/include/kvm/arm_spe.h
> > > > @@ -37,6 +37,9 @@ static inline bool kvm_arm_support_spe_v1(void)
> > > >   
> > > > ID_AA64DFR0_PMSVER_SHIFT);
> > > >  }
> > > >
> > > > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
> > > > +inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
> > > > +
> > > >  int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> > > > struct kvm_device_attr *attr);
> > > >  int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> > > > @@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu
> > > *vcpu);
> > > >  #define kvm_arm_support_spe_v1()   (false)
> > > >  #define kvm_arm_spe_irq_initialized(v) (false)
> > > >
> > > > +static inline void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
> > > {}
> > > > +static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
> > > > +
> > > >  static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> > > >   struct kvm_device_attr *attr)
> > > >  {
> > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > > > index 340d2388ee2c..a66085c8e785 100644
> > > > --- a/virt/kvm/arm/arm.c
> > > > +++ b/virt/kvm/arm/arm.c
> > > > @@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
> > > *vcpu, struct kvm_run *run)
> > > > preempt_disable();
> > > >
> > > > kvm_pmu_flush_hwstate(vcpu);
> > > > +   kvm_spe_flush_hwstate(vcpu);
> > > >
> > > > local_irq_disable();
> > > >
> > > > @@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
> > > *vcpu, struct kvm_run *run)
> > > > kvm_request_pending(vcpu)) {
> > > > vcpu->mode = OUTSIDE_GUEST_MODE;
> > > > isb(); /* Ensure work in x_flush_hwstate is 
> > > > committed */
> > > > +   kvm_spe_sync_hwstate(vcpu);
> > > > kvm_pmu_sync_hwstate(vcpu);
> > > > > if (static_branch_unlikely(&userspace_irqchip_in_use))
> > > > kvm_timer_sync_hwstate(vcpu);
> > > > @@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
> > > *vcpu, struct kvm_run *run)
> > > > kvm_arm_clear_debug(vcpu);
> > > >
> > > > /*
> > > > -* We must sync the PMU state before the vgic state so
> > > > +* We must sync the PMU and SPE state before the vgic 
> > > > state so
> > > >  * that the vgic can properly sample the updated state 
> > > > of the
> > > >  * interrupt line.
> > > >  

Re: [PATCH v2 00/18] arm64: KVM: add SPE profiling support

2019-12-24 Thread Andrew Murray
On Sun, Dec 22, 2019 at 12:22:10PM +, Marc Zyngier wrote:
> On Sat, 21 Dec 2019 10:48:16 +,
> Marc Zyngier  wrote:
> > 
> > [fixing email addresses]
> > 
> > Hi Andrew,
> > 
> > On 2019-12-20 14:30, Andrew Murray wrote:
> > > This series implements support for allowing KVM guests to use the Arm
> > > Statistical Profiling Extension (SPE).
> > 
> > Thanks for this. In future, please Cc me and Will on email addresses
> > we can actually read.
> > 
> > > It has been tested on a model to ensure that both host and guest can
> > > simultaneously use SPE with valid data. E.g.
> > > 
> > > $ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
> > > dd if=/dev/zero of=/dev/null count=1000
> > > $ perf report --dump-raw-trace > spe_buf.txt
> > > 
> > > As we save and restore the SPE context, the guest can access the SPE
> > > registers directly, thus in this version of the series we remove the
> > > trapping and emulation.
> > > 
> > > In the previous series of this support, when KVM SPE isn't
> > > supported (e.g. via CONFIG_KVM_ARM_SPE) we were able to return a
> > > value of 0 to all reads of the SPE registers - as we can no longer
> > > do this there isn't a mechanism to prevent the guest from using
> > > SPE - thus I'm keen for feedback on the best way of resolving
> > > this.
> > 
> > Surely there is a way to conditionally trap SPE registers, right? You
> > should still be able to do this if SPE is not configured for a given
> > guest (as we do for other feature such as PtrAuth).
> > 
> > > It appears necessary to pin the entire guest memory in order to
> > > provide guest SPE access - otherwise it is possible for the guest
> > > to receive Stage-2 faults.
> > 
> > Really? How can the guest receive a stage-2 fault? This doesn't fit
> > what I understand of the ARMv8 exception model. Or do you mean a SPE
> > interrupt describing a S2 fault?

Yes the latter.


> > 
> > And this is not just pinning the memory either. You have to ensure that
> > all S2 page tables are created ahead of SPE being able to DMA to guest
> > memory. This may have some impacts on the THP code...
> > 
> > I'll have a look at the actual series ASAP (but that's not very soon).
> 
> I found some time to go through the series, and there is clearly a lot
> of work left to do:
> 
> - There so nothing here to handle memory pinning whatsoever. If it
>   works, it is only thanks to some side effect.
> 
> - The missing trapping is deeply worrying. Given that this is an
>   optional feature, you cannot just let the guest do whatever it wants
>   in an uncontrolled manner.

Yes I'll add this.


> 
> - The interrupt handling is busted. You mix concepts picked from both
>   the PMU and the timer code, while the SPE device doesn't behave like
>   any of these two (it is neither a fully emulated device, nor a
>   device that is exclusively owned by a guest at any given time).
> 
> I expect some level of discussion on the list including at least Will
> and myself before you respin this.

Thanks for the quick feedback.

Andrew Murray

> 
>   M.
> 
> -- 
> Jazz is not dead, it just smells funny.


Re: [PATCH v2 00/18] arm64: KVM: add SPE profiling support

2019-12-24 Thread Andrew Murray
On Fri, Dec 20, 2019 at 05:55:25PM +, Mark Rutland wrote:
> Hi Andrew,
> 
> On Fri, Dec 20, 2019 at 02:30:07PM +0000, Andrew Murray wrote:
> > This series implements support for allowing KVM guests to use the Arm
> > Statistical Profiling Extension (SPE).
> > 
> > It has been tested on a model to ensure that both host and guest can
> > simultaneously use SPE with valid data. E.g.
> > 
> > $ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
> > dd if=/dev/zero of=/dev/null count=1000
> > $ perf report --dump-raw-trace > spe_buf.txt
> 
> What happens if I run perf record on the VMM, or on the CPU(s) that the
> VMM is running on? i.e.
> 
> $ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
> lkvm ${OPTIONS_FOR_GUEST_USING_SPE}
> 

By default perf excludes the guest, so this works as expected, just recording
activity of the process when it is outside the guest. (perf report appears
to give valid output).

Patch 15 currently prevents using perf to record inside the guest.


> ... or:
> 
> $ perf record -a -c 0 -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
> sleep 1000 &
> $ taskset -c 0 lkvm ${OPTIONS_FOR_GUEST_USING_SPE} &
> 
> > As we save and restore the SPE context, the guest can access the SPE
> > registers directly, thus in this version of the series we remove the
> > trapping and emulation.
> > 
> > In the previous series of this support, when KVM SPE isn't supported
> > (e.g. via CONFIG_KVM_ARM_SPE) we were able to return a value of 0 to
> > all reads of the SPE registers - as we can no longer do this there isn't
> > a mechanism to prevent the guest from using SPE - thus I'm keen for
> > feedback on the best way of resolving this.
> 
> When not providing SPE to the guest, surely we should be trapping the
> registers and injecting an UNDEF?

Yes we should, I'll update the series.


> 
> What happens today, without these patches?
> 

Prior to this series MDCR_EL2_TPMS is set and E2PB is unset resulting in all
SPE registers being trapped (with NULL handlers).


> > It appears necessary to pin the entire guest memory in order to provide
> > guest SPE access - otherwise it is possible for the guest to receive
> > Stage-2 faults.
> 
> AFAICT these patches do not implement this. I assume that's what you're
> trying to point out here, but I just want to make sure that's explicit.

That's right.


> 
> Maybe this is a reason to trap+emulate if there's something more
> sensible that hyp can do if it sees a Stage-2 fault.

Yes it's not really clear to me at the moment what to do about this.

Thanks,

Andrew Murray

> 
> Thanks,
> Mark.


Re: [PATCH v2 13/18] perf: arm_spe: Add KVM structure for obtaining IRQ info

2019-12-24 Thread Andrew Murray
On Sun, Dec 22, 2019 at 11:24:13AM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:20 +,
> Andrew Murray  wrote:
> > 
> > KVM requires knowledge of the physical SPE IRQ number such that it can
> > associate it with any virtual IRQ for guests that require SPE emulation.
> 
> This is at best extremely odd. The only reason for KVM to obtain this
> IRQ number is if it has exclusive access to the device.  This
> obviously isn't the case, as this device is shared between host and
> guest.

This was an attempt to set the interrupt as active such that the host SPE driver
doesn't get spurious interrupts due to guest SPE activity. Though let's save
that discussion for patch 14.


> 
> > Let's create a structure to hold this information and an accessor that
> > KVM can use to retrieve this information.
> > 
> > We expect that each SPE device will have the same physical PPI number
> > and thus will warn when this is not the case.
> > 
> > Signed-off-by: Andrew Murray 
> > ---
> >  drivers/perf/arm_spe_pmu.c | 23 +++
> >  include/kvm/arm_spe.h  |  6 ++
> >  2 files changed, 29 insertions(+)
> > 
> > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> > index 4e4984a55cd1..2d24af4cfcab 100644
> > --- a/drivers/perf/arm_spe_pmu.c
> > +++ b/drivers/perf/arm_spe_pmu.c
> > @@ -34,6 +34,9 @@
> >  #include 
> >  #include 
> >  
> > +#include 
> > +#include 
> > +
> >  #include 
> >  #include 
> >  #include 
> > @@ -1127,6 +1130,24 @@ static void arm_spe_pmu_dev_teardown(struct 
> > arm_spe_pmu *spe_pmu)
> > free_percpu_irq(spe_pmu->irq, spe_pmu->handle);
> >  }
> >  
> > +#ifdef CONFIG_KVM_ARM_SPE
> > +static struct arm_spe_kvm_info arm_spe_kvm_info;
> > +
> > +struct arm_spe_kvm_info *arm_spe_get_kvm_info(void)
> > +{
> > +   return &arm_spe_kvm_info;
> > +}
> 
> How does this work when SPE is built as a module?
> 
> > +
> > +static void arm_spe_populate_kvm_info(struct arm_spe_pmu *spe_pmu)
> > +{
> > +   WARN_ON_ONCE(arm_spe_kvm_info.physical_irq != 0 &&
> > +arm_spe_kvm_info.physical_irq != spe_pmu->irq);
> > +   arm_spe_kvm_info.physical_irq = spe_pmu->irq;
> 
> What does 'physical' means here? It's an IRQ in the Linux sense, so
> it's already some random number that bears no relation to anything
> 'physical'.

It's some random number relating to the SPE device as opposed to the virtual
SPE device.

Thanks,

Andrew Murray

> 
> > +}
> > +#else
> > +static void arm_spe_populate_kvm_info(struct arm_spe_pmu *spe_pmu) {}
> > +#endif
> > +
> >  /* Driver and device probing */
> >  static int arm_spe_pmu_irq_probe(struct arm_spe_pmu *spe_pmu)
> >  {
> > @@ -1149,6 +1170,8 @@ static int arm_spe_pmu_irq_probe(struct arm_spe_pmu 
> > *spe_pmu)
> > }
> >  
> > spe_pmu->irq = irq;
> > +   arm_spe_populate_kvm_info(spe_pmu);
> > +
> > return 0;
> >  }
> >  
> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > index d1f3c564dfd0..9c65130d726d 100644
> > --- a/include/kvm/arm_spe.h
> > +++ b/include/kvm/arm_spe.h
> > @@ -17,6 +17,12 @@ struct kvm_spe {
> > bool irq_level;
> >  };
> >  
> > +struct arm_spe_kvm_info {
> > +   int physical_irq;
> > +};
> > +
> > +struct arm_spe_kvm_info *arm_spe_get_kvm_info(void);
> > +
> >  #ifdef CONFIG_KVM_ARM_SPE
> >  #define kvm_arm_spe_v1_ready(v)((v)->arch.spe.ready)
> >  #define kvm_arm_spe_irq_initialized(v) \
> 
>   M.
> 
> -- 
> Jazz is not dead, it just smells funny.


Re: [PATCH v2 12/18] KVM: arm64: add a new vcpu device control group for SPEv1

2019-12-24 Thread Andrew Murray
On Sun, Dec 22, 2019 at 11:03:04AM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:19 +,
> Andrew Murray  wrote:
> > 
> > From: Sudeep Holla 
> > 
> > To configure the virtual SPEv1 overflow interrupt number, we use the
> > vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_V1_IRQ
> > attribute within the KVM_ARM_VCPU_SPE_V1_CTRL group.
> > 
> > After configuring the SPEv1, call the vcpu ioctl with attribute
> > KVM_ARM_VCPU_SPE_V1_INIT to initialize the SPEv1.
> > 
> > Signed-off-by: Sudeep Holla 
> > Signed-off-by: Andrew Murray 
> > ---
> >  Documentation/virt/kvm/devices/vcpu.txt |  28 
> >  arch/arm64/include/asm/kvm_host.h   |   2 +-
> >  arch/arm64/include/uapi/asm/kvm.h   |   4 +
> >  arch/arm64/kvm/Makefile |   1 +
> >  arch/arm64/kvm/guest.c  |   6 +
> >  arch/arm64/kvm/reset.c  |   3 +
> >  include/kvm/arm_spe.h   |  45 +++
> >  include/uapi/linux/kvm.h|   1 +
> >  virt/kvm/arm/arm.c  |   1 +
> >  virt/kvm/arm/spe.c  | 163 
> >  10 files changed, 253 insertions(+), 1 deletion(-)
> >  create mode 100644 virt/kvm/arm/spe.c
> > 
> > diff --git a/Documentation/virt/kvm/devices/vcpu.txt 
> > b/Documentation/virt/kvm/devices/vcpu.txt
> > index 6f3bd64a05b0..cefad056d677 100644
> > --- a/Documentation/virt/kvm/devices/vcpu.txt
> > +++ b/Documentation/virt/kvm/devices/vcpu.txt
> > @@ -74,3 +74,31 @@ Specifies the base address of the stolen time structure 
> > for this VCPU. The
> >  base address must be 64 byte aligned and exist within a valid guest memory
> >  region. See Documentation/virt/kvm/arm/pvtime.txt for more information
> >  including the layout of the stolen time structure.
> > +
> > +4. GROUP: KVM_ARM_VCPU_SPE_V1_CTRL
> > +Architectures: ARM64
> > +
> > +4.1. ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_IRQ
> > +Parameters: in kvm_device_attr.addr the address for SPE buffer overflow 
> > interrupt
> > +   is a pointer to an int
> > +Returns: -EBUSY: The SPE overflow interrupt is already set
> > + -ENXIO: The overflow interrupt not set when attempting to get it
> > + -ENODEV: SPEv1 not supported
> > + -EINVAL: Invalid SPE overflow interrupt number supplied or
> > +  trying to set the IRQ number without using an in-kernel
> > +  irqchip.
> > +
> > +A value describing the SPEv1 (Statistical Profiling Extension v1) overflow
> > +interrupt number for this vcpu. This interrupt should be PPI and the 
> > interrupt
> > +type and number must be same for each vcpu.
> > +
> > +4.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_INIT
> > +Parameters: no additional parameter in kvm_device_attr.addr
> > +Returns: -ENODEV: SPEv1 not supported or GIC not initialized
> > + -ENXIO: SPEv1 not properly configured or in-kernel irqchip not
> > + configured as required prior to calling this attribute
> > + -EBUSY: SPEv1 already initialized
> > +
> > +Request the initialization of the SPEv1.  If using the SPEv1 with an 
> > in-kernel
> > +virtual GIC implementation, this must be done after initializing the 
> > in-kernel
> > +irqchip.
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 333c6491bec7..d00f450dc4cd 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -39,7 +39,7 @@
> >  
> >  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
> >  
> > -#define KVM_VCPU_MAX_FEATURES 7
> > +#define KVM_VCPU_MAX_FEATURES 8
> >  
> >  #define KVM_REQ_SLEEP \
> > KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> > b/arch/arm64/include/uapi/asm/kvm.h
> > index 820e5751ada7..905a73f30079 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -106,6 +106,7 @@ struct kvm_regs {
> >  #define KVM_ARM_VCPU_SVE   4 /* enable SVE for this CPU */
> >  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS   5 /* VCPU uses address 
> > authentication */
> >  #define KVM_ARM_VCPU_PTRAUTH_GENERIC   6 /* VCPU uses generic 
> > authentication */
> > +#define KVM_ARM_VCPU_SPE_V17 /* Support guest SPEv1 */
> >  
> >  struct kvm_vcpu_init {
> > __u32 target;
> > @@ -326,6 +327,9 @@ struct 

Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls

2019-12-24 Thread Andrew Murray
On Fri, Dec 20, 2019 at 06:06:58PM +, Mark Rutland wrote:
> On Fri, Dec 20, 2019 at 02:30:16PM +0000, Andrew Murray wrote:
> > From: Sudeep Holla 
> > 
> > Now that we can save/restore the full SPE controls, we can enable it
> > if SPE is setup and ready to use in KVM. It's supported in KVM only if
> > all the CPUs in the system supports SPE.
> > 
> > However to support heterogenous systems, we need to move the check if
> > host supports SPE and do a partial save/restore.
> 
> I don't think that it makes sense to support this for heterogeneous
> systems, given their SPE capabilities and IMP DEF details will differ.
> 
> Is there some way we can limit this to homogeneous systems?

No problem, I'll see how to limit this.

Thanks,

Andrew Murray

> 
> Thanks,
> Mark.
> 
> > 
> > Signed-off-by: Sudeep Holla 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm64/kvm/hyp/debug-sr.c | 33 -
> >  include/kvm/arm_spe.h |  6 ++
> >  2 files changed, 22 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > index 12429b212a3a..d8d857067e6d 100644
> > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > @@ -86,18 +86,13 @@
> > }
> >  
> >  static void __hyp_text
> > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> > u64 reg;
> >  
> > /* Clear pmscr in case of early return */
> > ctxt->sys_regs[PMSCR_EL1] = 0;
> >  
> > -   /* SPE present on this CPU? */
> > -   if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > - ID_AA64DFR0_PMSVER_SHIFT))
> > -   return;
> > -
> > /* Yes; is it owned by higher EL? */
> > reg = read_sysreg_s(SYS_PMBIDR_EL1);
> > if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, 
> > bool full_ctxt)
> >  }
> >  
> >  static void __hyp_text
> > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> > if (!ctxt->sys_regs[PMSCR_EL1])
> > return;
> > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct 
> > kvm_vcpu *vcpu)
> > struct kvm_guest_debug_arch *host_dbg;
> > struct kvm_guest_debug_arch *guest_dbg;
> >  
> > +   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > +   guest_ctxt = &vcpu->arch.ctxt;
> > +
> > +   __debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > +
> > if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > return;
> >  
> > -   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > -   guest_ctxt = &vcpu->arch.ctxt;
> > > host_dbg = &vcpu->arch.host_debug_state.regs;
> > guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> >  
> > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct 
> > kvm_vcpu *vcpu)
> > host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > guest_ctxt = &vcpu->arch.ctxt;
> >  
> > -   if (!has_vhe())
> > -   __debug_restore_spe_nvhe(host_ctxt, false);
> > +   __debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> >  
> > if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > return;
> > @@ -249,19 +246,21 @@ void __hyp_text __debug_restore_host_context(struct 
> > kvm_vcpu *vcpu)
> >  
> >  void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
> >  {
> > -   /*
> > -* Non-VHE: Disable and flush SPE data generation
> > -* VHE: The vcpu can run, but it can't hide.
> > -*/
> > struct kvm_cpu_context *host_ctxt;
> >  
> > host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -   if (!has_vhe())
> > -   __debug_save_spe_nvhe(host_ctxt, false);
> > +   if (cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > +ID_AA64DFR0_PMSVER_SHIFT))
> > +   __debug_save_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> >  }
> >  
> >  void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
> >  {
> > +   bool kvm_spe

Re: [PATCH v2 03/18] arm64: KVM: define SPE data structure for each vcpu

2019-12-24 Thread Andrew Murray
On Sat, Dec 21, 2019 at 01:19:36PM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:10 +
> Andrew Murray  wrote:
> 
> > From: Sudeep Holla 
> > 
> > In order to support virtual SPE for guests, define some basic structs.
> > This feature depends on the host having hardware with SPE support.
> > 
> > Since we can support this only on ARM64, add a separate config symbol
> > for the same.
> > 
> > Signed-off-by: Sudeep Holla 
> > [ Add irq_level, rename irq to irq_num for kvm_spe ]
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  2 ++
> >  arch/arm64/kvm/Kconfig|  7 +++
> >  include/kvm/arm_spe.h | 19 +++
> >  3 files changed, 28 insertions(+)
> >  create mode 100644 include/kvm/arm_spe.h
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index c61260cf63c5..f5dcff912645 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -35,6 +35,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
> >  
> > @@ -302,6 +303,7 @@ struct kvm_vcpu_arch {
> > struct vgic_cpu vgic_cpu;
> > struct arch_timer_cpu timer_cpu;
> > struct kvm_pmu pmu;
> > +   struct kvm_spe spe;
> >  
> > /*
> >  * Anything that is not used directly from assembly code goes
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index a475c68cbfec..af5be2c57dcb 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -35,6 +35,7 @@ config KVM
> > select HAVE_KVM_EVENTFD
> > select HAVE_KVM_IRQFD
> > select KVM_ARM_PMU if HW_PERF_EVENTS
> > +   select KVM_ARM_SPE if (HW_PERF_EVENTS && ARM_SPE_PMU)
> > select HAVE_KVM_MSI
> > select HAVE_KVM_IRQCHIP
> > select HAVE_KVM_IRQ_ROUTING
> > @@ -61,6 +62,12 @@ config KVM_ARM_PMU
> >   Adds support for a virtual Performance Monitoring Unit (PMU) in
> >   virtual machines.
> >  
> > +config KVM_ARM_SPE
> > +   bool
> > +   ---help---
> > + Adds support for a virtual Statistical Profiling Extension(SPE) in
> > + virtual machines.
> > +
> >  config KVM_INDIRECT_VECTORS
> > def_bool KVM && (HARDEN_BRANCH_PREDICTOR || HARDEN_EL2_VECTORS)
> >  
> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > new file mode 100644
> > index ..48d118fdb174
> > --- /dev/null
> > +++ b/include/kvm/arm_spe.h
> > @@ -0,0 +1,19 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 ARM Ltd.
> > + */
> > +
> > +#ifndef __ASM_ARM_KVM_SPE_H
> > +#define __ASM_ARM_KVM_SPE_H
> > +
> > +#include 
> > +#include 
> 
> I don't believe these are required at this stage.
> 
> > +
> > +struct kvm_spe {
> > +   int irq_num;
> 
> 'irq' was the right name *if* this represents a Linux irq. If this
> instead represents a guest PPI, then it should be named 'intid'.
> 
> In either case, please document what this represents.
> 
> > +   bool ready; /* indicates that SPE KVM instance is ready for use */
> > +   bool created; /* SPE KVM instance is created, may not be ready yet */
> > +   bool irq_level;
> 
> What does this represent? The state of the interrupt on the host? The
> guest? Something else? Also, please consider grouping related fields
> together.

It should be the state of the interrupt on the guest.

> 
> > +};
> 
> If you've added a config option that controls the selection of the SPE
> feature, why doesn't this result in an empty structure when it isn't
> selected?

OK, all noted.
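
For the record, something along these lines is what I have in mind (only a
sketch - the 'intid' rename, the comments and the exact grouping are my
assumptions at this point, not final code):

#ifdef CONFIG_KVM_ARM_SPE
struct kvm_spe {
	/* Guest PPI used for the SPE buffer management interrupt */
	int intid;
	/* State of that interrupt as presented to the guest */
	bool irq_level;

	/* Instance created via the vcpu device API, may not be ready yet */
	bool created;
	/* Instance fully configured and ready for use */
	bool ready;
};
#else
struct kvm_spe {
};
#endif

That also lets the structure compile away to nothing when the config option
isn't selected.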

Andrew Murray

> 
> > +
> > +#endif /* __ASM_ARM_KVM_SPE_H */
> 
> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...


Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE

2019-12-24 Thread Andrew Murray
On Sun, Dec 22, 2019 at 12:07:50PM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:21 +,
> Andrew Murray  wrote:
> > 
> > Upon the exit of a guest, let's determine if the SPE device has generated
> > an interrupt - if so we'll inject a virtual interrupt to the guest.
> > 
> > Upon the entry and exit of a guest we'll also update the state of the
> > physical IRQ such that it is active when a guest interrupt is pending
> > and the guest is running.
> > 
> > Finally we map the physical IRQ to the virtual IRQ such that the guest
> > can deactivate the interrupt when it handles the interrupt.
> > 
> > Signed-off-by: Andrew Murray 
> > ---
> >  include/kvm/arm_spe.h |  6 
> >  virt/kvm/arm/arm.c|  5 ++-
> >  virt/kvm/arm/spe.c| 71 +++
> >  3 files changed, 81 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > index 9c65130d726d..91b2214f543a 100644
> > --- a/include/kvm/arm_spe.h
> > +++ b/include/kvm/arm_spe.h
> > @@ -37,6 +37,9 @@ static inline bool kvm_arm_support_spe_v1(void)
> >   ID_AA64DFR0_PMSVER_SHIFT);
> >  }
> >  
> > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
> > +inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
> > +
> >  int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> > struct kvm_device_attr *attr);
> >  int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> > @@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu);
> >  #define kvm_arm_support_spe_v1()   (false)
> >  #define kvm_arm_spe_irq_initialized(v) (false)
> >  
> > +static inline void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu) {}
> > +static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
> > +
> >  static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> >   struct kvm_device_attr *attr)
> >  {
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 340d2388ee2c..a66085c8e785 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> > struct kvm_run *run)
> > preempt_disable();
> >  
> > kvm_pmu_flush_hwstate(vcpu);
> > +   kvm_spe_flush_hwstate(vcpu);
> >  
> > local_irq_disable();
> >  
> > @@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> > struct kvm_run *run)
> > kvm_request_pending(vcpu)) {
> > vcpu->mode = OUTSIDE_GUEST_MODE;
> > isb(); /* Ensure work in x_flush_hwstate is committed */
> > +   kvm_spe_sync_hwstate(vcpu);
> > kvm_pmu_sync_hwstate(vcpu);
> > if (static_branch_unlikely(&userspace_irqchip_in_use))
> > kvm_timer_sync_hwstate(vcpu);
> > @@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> > struct kvm_run *run)
> > kvm_arm_clear_debug(vcpu);
> >  
> > /*
> > -* We must sync the PMU state before the vgic state so
> > +* We must sync the PMU and SPE state before the vgic state so
> >  * that the vgic can properly sample the updated state of the
> >  * interrupt line.
> >  */
> > kvm_pmu_sync_hwstate(vcpu);
> > +   kvm_spe_sync_hwstate(vcpu);
> 
> The *HUGE* difference is that the PMU is purely a virtual interrupt,
> while you're trying to deal with a HW interrupt here.
> 
> >  
> > /*
> >  * Sync the vgic state before syncing the timer state because
> > diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
> > index 83ac2cce2cc3..097ed39014e4 100644
> > --- a/virt/kvm/arm/spe.c
> > +++ b/virt/kvm/arm/spe.c
> > @@ -35,6 +35,68 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
> > return 0;
> >  }
> >  
> > +static inline void set_spe_irq_phys_active(struct arm_spe_kvm_info *info,
> > +  bool active)
> > +{
> > +   int r;
> > +   r = irq_set_irqchip_state(info->physical_irq, IRQCHIP_STATE_ACTIVE,
> > + active);
> > +   WARN_ON(r);
> > +}
> > +
> > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
> > +{
> > +   struct k

Re: [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime

2019-12-24 Thread Andrew Murray
On Sun, Dec 22, 2019 at 10:34:55AM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:17 +,
> Andrew Murray  wrote:
> > 
> > From: Sudeep Holla 
> > 
> > Now that we have all the save/restore mechanism in place, lets enable
> > the translation regime used by buffer from EL2 stage 1 to EL1 stage 1
> > on VHE systems.
> > 
> > Signed-off-by: Sudeep Holla 
> > [ Reword commit, don't trap to EL2 ]
> 
> Not trapping to EL2 for the case where we don't allow SPE in the
> guest is not acceptable.

Yes understood (because of this I had meant to send the series as RFC btw).


> 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm64/kvm/hyp/switch.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 67b7c160f65b..6c153b79829b 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -100,6 +100,7 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
> >  
> > write_sysreg(val, cpacr_el1);
> >  
> > +   write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
> > write_sysreg(kvm_get_hyp_vector(), vbar_el1);
> >  }
> >  NOKPROBE_SYMBOL(activate_traps_vhe);
> > @@ -117,6 +118,7 @@ static void __hyp_text __activate_traps_nvhe(struct 
> > kvm_vcpu *vcpu)
> > __activate_traps_fpsimd32(vcpu);
> > }
> >  
> > +   write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
> 
> There is a _MASK macro that can replace this '3', and is in keeping
> with the rest of the code.

OK.


> 
> It still remains that it looks like the wrong place to do this, and
> vcpu_load seems much better. Why should you write to mdcr_el2 on each
> entry to the guest, since you know whether it has SPE enabled at the
> point where it gets scheduled?

Yes OK, I'll move what I can to vcpu_load.
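
Something like this, called from the vcpu_load path rather than on every
guest entry (a sketch only - kvm_spe_setup_mdcr_el2() is a made-up name,
and it assumes the kvm_arm_spe_v1_ready() helper from this series):

static void kvm_spe_setup_mdcr_el2(struct kvm_vcpu *vcpu)
{
	u64 mdcr = vcpu->arch.mdcr_el2;

	/* E2PB=0b11: buffer uses the EL1&0 stage 1 translation regime */
	if (kvm_arm_spe_v1_ready(vcpu))
		mdcr |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;

	write_sysreg(mdcr, mdcr_el2);
}

That also keeps E2PB=0b00 (and hence the traps) for vcpus that aren't
allowed to use SPE.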

Thanks,

Andrew Murray


> 
>   M.
> 
> -- 
> Jazz is not dead, it just smells funny.


Re: [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls

2019-12-24 Thread Andrew Murray
On Sat, Dec 21, 2019 at 01:57:55PM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:15 +
> Andrew Murray  wrote:
> 
> > From: Sudeep Holla 
> > 
> > Currently since we don't support profiling using SPE in the guests,
> > we just save the PMSCR_EL1, flush the profiling buffers and disable
> > sampling. However in order to support simultaneous sampling both in
> 
> Is the sampling actually simultaneous? I don't believe so (the whole
> series would be much simpler if it was).

No, the SPE is used by either the guest or the host at any one time. I guess
the term 'simultaneous' was used to refer to the illusion given to both guest
and host that they are able to use it whenever they like. I'll update
the commit message to drop the magic.
 

> 
> > the host and guests, we need to save and reatore the complete SPE
> 
> s/reatore/restore/

Noted.


> 
> > profiling buffer controls' context.
> > 
> > Let's add the support for the same and keep it disabled for now.
> > We can enable it conditionally only if guests are allowed to use
> > SPE.
> > 
> > Signed-off-by: Sudeep Holla 
> > [ Clear PMBSR bit when saving state to prevent spurious interrupts ]
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm64/kvm/hyp/debug-sr.c | 51 +--
> >  1 file changed, 43 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > index 8a70a493345e..12429b212a3a 100644
> > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > @@ -85,7 +85,8 @@
> > default:write_debug(ptr[0], reg, 0);\
> > }
> >  
> > -static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
> > +static void __hyp_text
> > +__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> 
> nit: don't split lines like this if you can avoid it. You can put the
> full_ctxt parameter on a separate line instead.

Yes understood.


> 
> >  {
> > u64 reg;
> >  
> > @@ -102,22 +103,46 @@ static void __hyp_text __debug_save_spe_nvhe(struct 
> > kvm_cpu_context *ctxt)
> > if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > return;
> >  
> > -   /* No; is the host actually using the thing? */
> > -   reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
> > -   if (!(reg & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)))
> > +   /* Save the control register and disable data generation */
> > +   ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
> > +
> > +   if (!ctxt->sys_regs[PMSCR_EL1])
> 
> Shouldn't you check the enable bits instead of relying on the whole
> thing being zero?

Yes that would make more sense (E1SPE and E0SPE).
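
i.e. in __debug_save_spe_context() the early return would become something
like this (sketch only, untested - it assumes the E0SPE/E1SPE shift
definitions from sysreg.h):

	if (!(ctxt->sys_regs[PMSCR_EL1] &
	      (BIT(SYS_PMSCR_EL1_E0SPE_SHIFT) |
	       BIT(SYS_PMSCR_EL1_E1SPE_SHIFT))))
		return;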

I feel that this check makes an assumption about the guest/host SPE
driver... What happens if the SPE driver writes to some SPE registers
but doesn't enable PMSCR? If the guest is also using SPE then those
writes will be lost; when the host returns and its SPE driver enables
SPE, it won't work.

With a quick look at the SPE driver I'm not sure this will happen, but
even so it makes me nervous relying on these assumptions. I wonder if
this risk is present in other devices?


> 
> > return;
> >  
> > /* Yes; save the control register and disable data generation */
> > -   ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
> 
> You've already saved the control register...

I'll remove that.


> 
> > write_sysreg_el1(0, SYS_PMSCR);
> > isb();
> >  
> > /* Now drain all buffered data to memory */
> > psb_csync();
> > dsb(nsh);
> > +
> > +   if (!full_ctxt)
> > +   return;
> > +
> > +   ctxt->sys_regs[PMBLIMITR_EL1] = read_sysreg_s(SYS_PMBLIMITR_EL1);
> > +   write_sysreg_s(0, SYS_PMBLIMITR_EL1);
> > +
> > +   /*
> > +* As PMBSR is conditionally restored when returning to the host we
> > +* must ensure the service bit is unset here to prevent a spurious
> > +* host SPE interrupt from being raised.
> > +*/
> > +   ctxt->sys_regs[PMBSR_EL1] = read_sysreg_s(SYS_PMBSR_EL1);
> > +   write_sysreg_s(0, SYS_PMBSR_EL1);
> > +
> > +   isb();
> > +
> > +   ctxt->sys_regs[PMSICR_EL1] = read_sysreg_s(SYS_PMSICR_EL1);
> > +   ctxt->sys_regs[PMSIRR_EL1] = read_sysreg_s(SYS_PMSIRR_EL1);
> > +   ctxt->sys_regs[PMSFCR_EL1] = read_sysreg_s(SYS_PMSFCR_EL1);
> > +   ctxt->sys_regs[PMSEVFR_EL1] = read_sysreg_s(SYS_PMSEVFR_EL1);
> > +   ctxt->sys_regs[PMSLATFR_EL1] = read

Re: [PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE)

2019-12-24 Thread Andrew Murray
On Sat, Dec 21, 2019 at 01:12:14PM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:09 +
> Andrew Murray  wrote:
> 
> > From: Sudeep Holla 
> > 
> > On VHE systems, the reset value for MDCR_EL2.E2PB=b00 which defaults
> > to profiling buffer using the EL2 stage 1 translations. 
> 
> Does the reset value actually matter here? I don't see it being
> specific to VHE systems, and all we're trying to achieve is to restore
> the SPE configuration to a state where it can be used by the host.
> 
> > However if the
> > guest are allowed to use profiling buffers changing E2PB settings, we
> 
> How can the guest be allowed to change E2PB settings? Or do you mean
> here that allowing the guest to use SPE will mandate changes of the
> E2PB settings, and that we'd better restore the hypervisor state once
> we exit?
> 
> > need to ensure we resume back MDCR_EL2.E2PB=b00. Currently we just
> > do bitwise '&' with MDCR_EL2_E2PB_MASK which will retain the value.
> > 
> > So fix it by clearing all the bits in E2PB.
> > 
> > Signed-off-by: Sudeep Holla 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm64/kvm/hyp/switch.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 72fbbd86eb5e..250f13910882 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -228,9 +228,7 @@ void deactivate_traps_vhe_put(void)
> >  {
> > u64 mdcr_el2 = read_sysreg(mdcr_el2);
> >  
> > -   mdcr_el2 &= MDCR_EL2_HPMN_MASK |
> > -   MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
> > -   MDCR_EL2_TPMS;
> > +   mdcr_el2 &= MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS;
> >  
> > write_sysreg(mdcr_el2, mdcr_el2);
> >  
> 
> I'm OK with this change, but I believe the commit message could use
> some tidying up.

No problem, I'll update the commit message.

Thanks,

Andrew Murray

> 
> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...


Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2

2019-12-23 Thread Andrew Murray
On Mon, Dec 23, 2019 at 12:05:12PM +, Marc Zyngier wrote:
> On 2019-12-23 11:56, Andrew Murray wrote:
> > On Sun, Dec 22, 2019 at 10:42:05AM +, Marc Zyngier wrote:
> > > On Fri, 20 Dec 2019 14:30:18 +0000,
> > > Andrew Murray  wrote:
> > > >
> > > > As we now save/restore the profiler state there is no need to trap
> > > > accesses to the statistical profiling controls. Let's unset the
> > > > _TPMS bit.
> > > >
> > > > Signed-off-by: Andrew Murray 
> > > > ---
> > > >  arch/arm64/kvm/debug.c | 2 --
> > > >  1 file changed, 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> > > > index 43487f035385..07ca783e7d9e 100644
> > > > --- a/arch/arm64/kvm/debug.c
> > > > +++ b/arch/arm64/kvm/debug.c
> > > > @@ -88,7 +88,6 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu
> > > *vcpu)
> > > >   *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> > > >   *  - Debug ROM Address (MDCR_EL2_TDRA)
> > > >   *  - OS related registers (MDCR_EL2_TDOSA)
> > > > - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> > > >   *
> > > >   * Additionally, KVM only traps guest accesses to the debug
> > > registers if
> > > >   * the guest is not actively using them (see the
> > > KVM_ARM64_DEBUG_DIRTY
> > > > @@ -111,7 +110,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu
> > > *vcpu)
> > > >  */
> > > > vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) &
> > > MDCR_EL2_HPMN_MASK;
> > > > vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> > > > -   MDCR_EL2_TPMS |
> > > 
> > > No. This is an *optional* feature (the guest could not be presented
> > > with the SPE feature, or the support simply not be compiled in).
> > > 
> > > If the guest is not allowed to see the feature, for whichever
> > > reason,
> > > the traps *must* be enabled and handled.
> > 
> > I'll update this (and similar) to trap such registers when we don't
> > support
> > SPE in the guest.
> > 
> > My original concern in the cover letter was in how to prevent the guest
> > from attempting to use these registers in the first place - I think the
> > solution I was looking for is to trap-and-emulate ID_AA64DFR0_EL1 such
> > that
> > the PMSVer bits indicate that SPE is not emulated.
> 
> That, and active trapping of the SPE system registers resulting in injection
> of an UNDEF into the offending guest.

Yes that's no problem.
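
Something like the following is what I have in mind for the access handler
(a sketch - access_spe_reg() is a made-up name and it assumes the SPE state
lives in the vcpu sys_regs array as in this series):

static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
			   const struct sys_reg_desc *r)
{
	if (!kvm_arm_spe_v1_ready(vcpu)) {
		/* SPE not exposed to this guest: UNDEF the access */
		kvm_inject_undefined(vcpu);
		return false;
	}

	if (p->is_write)
		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
	else
		p->regval = vcpu_read_sys_reg(vcpu, r->reg);

	return true;
}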

Thanks,

Andrew Murray

> 
> Thanks,
> 
> M.
> -- 
> Jazz is not dead. It just smells funny...


Re: [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags

2019-12-23 Thread Andrew Murray
On Sun, Dec 22, 2019 at 12:10:52PM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:22 +,
> Andrew Murray  wrote:
> > 
> > A side effect of supporting the SPE in guests is that we prevent the
> > host from collecting data whilst inside a guest thus creating a black-out
> > window. This occurs because instead of emulating the SPE, we share it
> > with our guests.
> > 
> > Let's accurately describe our capabilities by using the perf exclude
> > flags to prevent !exclude_guest and exclude_host flags from being used.
> > 
> > Signed-off-by: Andrew Murray 
> > ---
> >  drivers/perf/arm_spe_pmu.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> > index 2d24af4cfcab..3703dbf459de 100644
> > --- a/drivers/perf/arm_spe_pmu.c
> > +++ b/drivers/perf/arm_spe_pmu.c
> > @@ -679,6 +679,9 @@ static int arm_spe_pmu_event_init(struct perf_event 
> > *event)
> > if (attr->exclude_idle)
> > return -EOPNOTSUPP;
> >  
> > +   if (!attr->exclude_guest || attr->exclude_host)
> > +   return -EOPNOTSUPP;
> > +
> 
> I have the opposite approach. If the host decides to profile the
> guest, why should that be denied? If there is a black hole, it should
> take place in the guest. Today, the host does expect this to work, and
> there is no way that we unconditionally allow it to regress.

That seems reasonable.

Upon entering the guest we'd have to detect if the host is using SPE, and if
so choose not to restore the guest registers. Instead we'd have to trap them
and let the guest read/write emulated values until the host has finished with
SPE - at which time we could restore the guest SPE registers to hardware.

Does that approach make sense?

Thanks,

Andrew Murray

> 
>   M.
> 
> -- 
> Jazz is not dead, it just smells funny.


Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2

2019-12-23 Thread Andrew Murray
On Sun, Dec 22, 2019 at 10:42:05AM +, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:18 +,
> Andrew Murray  wrote:
> > 
> > As we now save/restore the profiler state there is no need to trap
> > accesses to the statistical profiling controls. Let's unset the
> > _TPMS bit.
> > 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm64/kvm/debug.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> > index 43487f035385..07ca783e7d9e 100644
> > --- a/arch/arm64/kvm/debug.c
> > +++ b/arch/arm64/kvm/debug.c
> > @@ -88,7 +88,6 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
> >   *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> >   *  - Debug ROM Address (MDCR_EL2_TDRA)
> >   *  - OS related registers (MDCR_EL2_TDOSA)
> > - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> >   *
> >   * Additionally, KVM only traps guest accesses to the debug registers if
> >   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
> > @@ -111,7 +110,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
> >  */
> > vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
> > vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> > -   MDCR_EL2_TPMS |
> 
> No. This is an *optional* feature (the guest could not be presented
> with the SPE feature, or the support simply not be compiled in).
> 
> If the guest is not allowed to see the feature, for whichever reason,
> the traps *must* be enabled and handled.

I'll update this (and similar) to trap such registers when we don't support
SPE in the guest.

My original concern in the cover letter was in how to prevent the guest
from attempting to use these registers in the first place - I think the
solution I was looking for is to trap-and-emulate ID_AA64DFR0_EL1 such that
the PMSVer bits indicate that SPE is not emulated.
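
i.e. something like this when emulating reads of ID_AA64DFR0_EL1 (a sketch
only - it assumes a read handler along the lines of the existing ID
register emulation in sys_regs.c):

static u64 read_id_aa64dfr0(struct kvm_vcpu *vcpu)
{
	u64 val = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);

	/* Hide SPE (PMSVer) from guests that aren't allowed to use it */
	if (!kvm_arm_spe_v1_ready(vcpu))
		val &= ~(0xfUL << ID_AA64DFR0_PMSVER_SHIFT);

	return val;
}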

Thanks,

Andrew Murray


> 
>   M.
> 
> -- 
> Jazz is not dead, it just smells funny.


[PATCH v2 17/18, KVMTOOL] update_headers: Sync kvm UAPI headers with linux v5.5-rc2

2019-12-20 Thread Andrew Murray
The local copies of the kvm user API headers are getting stale.

In preparation for some arch-specific updated, this patch reflects
a re-run of util/update_headers.sh to pull in upstream updates from
linux v5.5-rc2.

Signed-off-by: Sudeep Holla 
[ Update headers to v5.5-rc2 ]
Signed-off-by: Andrew Murray 
---
 arm/aarch32/include/asm/kvm.h |  7 +--
 arm/aarch64/include/asm/kvm.h | 13 +++--
 include/linux/kvm.h   | 18 ++
 powerpc/include/asm/kvm.h |  3 +++
 4 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/arm/aarch32/include/asm/kvm.h b/arm/aarch32/include/asm/kvm.h
index a4217c1a5d01..03cd7c19a683 100644
--- a/arm/aarch32/include/asm/kvm.h
+++ b/arm/aarch32/include/asm/kvm.h
@@ -131,8 +131,9 @@ struct kvm_vcpu_events {
struct {
__u8 serror_pending;
__u8 serror_has_esr;
+   __u8 ext_dabt_pending;
/* Align it to 8 bytes */
-   __u8 pad[6];
+   __u8 pad[5];
__u64 serror_esr;
} exception;
__u32 reserved[12];
@@ -266,8 +267,10 @@ struct kvm_vcpu_events {
 #define   KVM_DEV_ARM_ITS_CTRL_RESET   4
 
 /* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_VCPU2_SHIFT28
+#define KVM_ARM_IRQ_VCPU2_MASK 0xf
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
-#define KVM_ARM_IRQ_TYPE_MASK  0xff
+#define KVM_ARM_IRQ_TYPE_MASK  0xf
 #define KVM_ARM_IRQ_VCPU_SHIFT 16
 #define KVM_ARM_IRQ_VCPU_MASK  0xff
 #define KVM_ARM_IRQ_NUM_SHIFT  0
diff --git a/arm/aarch64/include/asm/kvm.h b/arm/aarch64/include/asm/kvm.h
index 9a507716ae2f..905a73f30079 100644
--- a/arm/aarch64/include/asm/kvm.h
+++ b/arm/aarch64/include/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE   4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS   5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC   6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_SPE_V17 /* Support guest SPEv1 */
 
 struct kvm_vcpu_init {
__u32 target;
@@ -164,8 +165,9 @@ struct kvm_vcpu_events {
struct {
__u8 serror_pending;
__u8 serror_has_esr;
+   __u8 ext_dabt_pending;
/* Align it to 8 bytes */
-   __u8 pad[6];
+   __u8 pad[5];
__u64 serror_esr;
} exception;
__u32 reserved[12];
@@ -323,10 +325,17 @@ struct kvm_vcpu_events {
 #define KVM_ARM_VCPU_TIMER_CTRL1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER1
+#define KVM_ARM_VCPU_PVTIME_CTRL   2
+#define   KVM_ARM_VCPU_PVTIME_IPA  0
+#define KVM_ARM_VCPU_SPE_V1_CTRL   3
+#define   KVM_ARM_VCPU_SPE_V1_IRQ  0
+#define   KVM_ARM_VCPU_SPE_V1_INIT 1
 
 /* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_VCPU2_SHIFT28
+#define KVM_ARM_IRQ_VCPU2_MASK 0xf
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
-#define KVM_ARM_IRQ_TYPE_MASK  0xff
+#define KVM_ARM_IRQ_TYPE_MASK  0xf
 #define KVM_ARM_IRQ_VCPU_SHIFT 16
 #define KVM_ARM_IRQ_VCPU_MASK  0xff
 #define KVM_ARM_IRQ_NUM_SHIFT  0
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 5e3f12d5359e..1a362c230e4a 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_S390_STSI25
 #define KVM_EXIT_IOAPIC_EOI   26
 #define KVM_EXIT_HYPERV   27
+#define KVM_EXIT_ARM_NISV 28
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -243,6 +244,8 @@ struct kvm_hyperv_exit {
 #define KVM_INTERNAL_ERROR_SIMUL_EX2
 /* Encounter unexpected vm-exit due to delivery event. */
 #define KVM_INTERNAL_ERROR_DELIVERY_EV 3
+/* Encounter unexpected vm-exit reason */
+#define KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON  4
 
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
@@ -392,6 +395,11 @@ struct kvm_run {
} eoi;
/* KVM_EXIT_HYPERV */
struct kvm_hyperv_exit hyperv;
+   /* KVM_EXIT_ARM_NISV */
+   struct {
+   __u64 esr_iss;
+   __u64 fault_ipa;
+   } arm_nisv;
/* Fix the size of the union. */
char padding[256];
};
@@ -996,6 +1004,12 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
 #define KVM_CAP_PMU_EVENT_FILTER 173
+#define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174
+#define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 175
+#define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176
+#define KVM_CAP_ARM_NISV_TO_USER 177
+#define KVM_CAP_ARM_INJECT_EXT_DABT 178
+#define KVM_CAP_ARM_SPE_V1 179
 
 #ifdef KVM_CAP_IRQ_ROUTING

[PATCH v2 13/18] perf: arm_spe: Add KVM structure for obtaining IRQ info

2019-12-20 Thread Andrew Murray
KVM requires knowledge of the physical SPE IRQ number such that it can
associate it with any virtual IRQ for guests that require SPE emulation.

Let's create a structure to hold this information and an accessor that
KVM can use to retrieve this information.

We expect that each SPE device will have the same physical PPI number
and thus will warn when this is not the case.

Signed-off-by: Andrew Murray 
---
 drivers/perf/arm_spe_pmu.c | 23 +++
 include/kvm/arm_spe.h  |  6 ++
 2 files changed, 29 insertions(+)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 4e4984a55cd1..2d24af4cfcab 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -34,6 +34,9 @@
 #include 
 #include 
 
+#include 
+#include 
+
 #include 
 #include 
 #include 
@@ -1127,6 +1130,24 @@ static void arm_spe_pmu_dev_teardown(struct arm_spe_pmu 
*spe_pmu)
free_percpu_irq(spe_pmu->irq, spe_pmu->handle);
 }
 
+#ifdef CONFIG_KVM_ARM_SPE
+static struct arm_spe_kvm_info arm_spe_kvm_info;
+
+struct arm_spe_kvm_info *arm_spe_get_kvm_info(void)
+{
+   return &arm_spe_kvm_info;
+}
+
+static void arm_spe_populate_kvm_info(struct arm_spe_pmu *spe_pmu)
+{
+   WARN_ON_ONCE(arm_spe_kvm_info.physical_irq != 0 &&
+arm_spe_kvm_info.physical_irq != spe_pmu->irq);
+   arm_spe_kvm_info.physical_irq = spe_pmu->irq;
+}
+#else
+static void arm_spe_populate_kvm_info(struct arm_spe_pmu *spe_pmu) {}
+#endif
+
 /* Driver and device probing */
 static int arm_spe_pmu_irq_probe(struct arm_spe_pmu *spe_pmu)
 {
@@ -1149,6 +1170,8 @@ static int arm_spe_pmu_irq_probe(struct arm_spe_pmu 
*spe_pmu)
}
 
spe_pmu->irq = irq;
+   arm_spe_populate_kvm_info(spe_pmu);
+
return 0;
 }
 
diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
index d1f3c564dfd0..9c65130d726d 100644
--- a/include/kvm/arm_spe.h
+++ b/include/kvm/arm_spe.h
@@ -17,6 +17,12 @@ struct kvm_spe {
bool irq_level;
 };
 
+struct arm_spe_kvm_info {
+   int physical_irq;
+};
+
+struct arm_spe_kvm_info *arm_spe_get_kvm_info(void);
+
 #ifdef CONFIG_KVM_ARM_SPE
 #define kvm_arm_spe_v1_ready(v)((v)->arch.spe.ready)
 #define kvm_arm_spe_irq_initialized(v) \
-- 
2.21.0



[PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags

2019-12-20 Thread Andrew Murray
A side effect of supporting the SPE in guests is that we prevent the
host from collecting data whilst inside a guest thus creating a black-out
window. This occurs because instead of emulating the SPE, we share it
with our guests.

Let's accurately describe our capabilities by using the perf exclude
flags to prevent !exclude_guest and exclude_host flags from being used.

Signed-off-by: Andrew Murray 
---
 drivers/perf/arm_spe_pmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 2d24af4cfcab..3703dbf459de 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -679,6 +679,9 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
if (attr->exclude_idle)
return -EOPNOTSUPP;
 
+   if (!attr->exclude_guest || attr->exclude_host)
+   return -EOPNOTSUPP;
+
/*
 * Feedback-directed frequency throttling doesn't work when we
 * have a buffer of samples. We'd need to manually count the
-- 
2.21.0



[PATCH v2 18/18, KVMTOOL] kvm: add a vcpu feature for SPEv1 support

2019-12-20 Thread Andrew Murray
This is a runtime configuration option for KVM tool to enable Statistical
Profiling Extension version 1 support in the guest kernel. The command line
option --spe is required to use it.

Signed-off-by: Sudeep Holla 
[ Add SPE to init features ]
Signed-off-by: Andrew Murray 
---
 Makefile  |  2 +-
 arm/aarch64/arm-cpu.c |  2 +
 arm/aarch64/include/kvm/kvm-config-arch.h |  2 +
 arm/aarch64/include/kvm/kvm-cpu-arch.h|  3 +-
 arm/aarch64/kvm-cpu.c |  4 ++
 arm/include/arm-common/kvm-config-arch.h  |  1 +
 arm/include/arm-common/spe.h  |  4 ++
 arm/spe.c | 81 +++
 8 files changed, 97 insertions(+), 2 deletions(-)
 create mode 100644 arm/include/arm-common/spe.h
 create mode 100644 arm/spe.c

diff --git a/Makefile b/Makefile
index 3862112c5ec6..04dddb3e7699 100644
--- a/Makefile
+++ b/Makefile
@@ -158,7 +158,7 @@ endif
 # ARM
 OBJS_ARM_COMMON:= arm/fdt.o arm/gic.o arm/gicv2m.o 
arm/ioport.o \
   arm/kvm.o arm/kvm-cpu.o arm/pci.o arm/timer.o \
-  arm/pmu.o
+  arm/pmu.o arm/spe.o
 HDRS_ARM_COMMON:= arm/include
 ifeq ($(ARCH), arm)
DEFINES += -DCONFIG_ARM
diff --git a/arm/aarch64/arm-cpu.c b/arm/aarch64/arm-cpu.c
index d7572b7790b1..6ccea033f361 100644
--- a/arm/aarch64/arm-cpu.c
+++ b/arm/aarch64/arm-cpu.c
@@ -6,6 +6,7 @@
 #include "arm-common/gic.h"
 #include "arm-common/timer.h"
 #include "arm-common/pmu.h"
+#include "arm-common/spe.h"
 
 #include 
 #include 
@@ -17,6 +18,7 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm)
gic__generate_fdt_nodes(fdt, kvm->cfg.arch.irqchip);
timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
pmu__generate_fdt_nodes(fdt, kvm);
+   spe__generate_fdt_nodes(fdt, kvm);
 }
 
 static int arm_cpu__vcpu_init(struct kvm_cpu *vcpu)
diff --git a/arm/aarch64/include/kvm/kvm-config-arch.h 
b/arm/aarch64/include/kvm/kvm-config-arch.h
index 04be43dfa9b2..9968e1666de5 100644
--- a/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -6,6 +6,8 @@
"Run AArch32 guest"),   \
OPT_BOOLEAN('\0', "pmu", &(cfg)->has_pmuv3, \
"Create PMUv3 device"), \
+   OPT_BOOLEAN('\0', "spe", &(cfg)->has_spev1, \
+   "Create SPEv1 device"), \
OPT_U64('\0', "kaslr-seed", &(cfg)->kaslr_seed, \
"Specify random seed for Kernel Address Space " \
"Layout Randomization (KASLR)"),
diff --git a/arm/aarch64/include/kvm/kvm-cpu-arch.h 
b/arm/aarch64/include/kvm/kvm-cpu-arch.h
index 8dfb82ecbc37..080183fa4f81 100644
--- a/arm/aarch64/include/kvm/kvm-cpu-arch.h
+++ b/arm/aarch64/include/kvm/kvm-cpu-arch.h
@@ -8,7 +8,8 @@
 #define ARM_VCPU_FEATURE_FLAGS(kvm, cpuid) {   
\
[0] = ((!!(cpuid) << KVM_ARM_VCPU_POWER_OFF) |  
\
   (!!(kvm)->cfg.arch.aarch32_guest << KVM_ARM_VCPU_EL1_32BIT) |
\
-  (!!(kvm)->cfg.arch.has_pmuv3 << KVM_ARM_VCPU_PMU_V3))
\
+  (!!(kvm)->cfg.arch.has_pmuv3 << KVM_ARM_VCPU_PMU_V3) |   
\
+  (!!(kvm)->cfg.arch.has_spev1 << KVM_ARM_VCPU_SPE_V1))
\
 }
 
 #define ARM_MPIDR_HWID_BITMASK 0xFF00FFUL
diff --git a/arm/aarch64/kvm-cpu.c b/arm/aarch64/kvm-cpu.c
index 9f3e8586880c..90c2e1784e97 100644
--- a/arm/aarch64/kvm-cpu.c
+++ b/arm/aarch64/kvm-cpu.c
@@ -140,6 +140,10 @@ void kvm_cpu__select_features(struct kvm *kvm, struct 
kvm_vcpu_init *init)
/* Enable SVE if available */
if (kvm__supports_extension(kvm, KVM_CAP_ARM_SVE))
init->features[0] |= 1UL << KVM_ARM_VCPU_SVE;
+
+   /* Enable SPE if available */
+   if (kvm__supports_extension(kvm, KVM_CAP_ARM_SPE_V1))
+   init->features[0] |= 1UL << KVM_ARM_VCPU_SPE_V1;
 }
 
 int kvm_cpu__configure_features(struct kvm_cpu *vcpu)
diff --git a/arm/include/arm-common/kvm-config-arch.h 
b/arm/include/arm-common/kvm-config-arch.h
index 5734c46ab9e6..742733e289af 100644
--- a/arm/include/arm-common/kvm-config-arch.h
+++ b/arm/include/arm-common/kvm-config-arch.h
@@ -9,6 +9,7 @@ struct kvm_config_arch {
boolvirtio_trans_pci;
boolaarch32_guest;
boolhas_pmuv3;
+   boolhas_spev1;
u64 kaslr_seed;
enum irqchip_type irqchip;
u64 fw_addr;
diff --git a/arm

[PATCH v2 04/18] arm64: KVM: add SPE system registers to sys_reg_descs

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

Add the Statistical Profiling Extension (SPE) Profiling Buffer control
registers such that we can provide initial register values and use the
sys_regs structure as a store for our SPE context.

Signed-off-by: Sudeep Holla 
[ Reword commit, remove access/reset handlers, defer kvm_arm_support_spe_v1 ]
Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/kvm_host.h | 12 
 arch/arm64/kvm/sys_regs.c | 11 +++
 2 files changed, 23 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index f5dcff912645..9eb85f14df90 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -145,6 +145,18 @@ enum vcpu_sysreg {
MDCCINT_EL1,/* Monitor Debug Comms Channel Interrupt Enable Reg */
DISR_EL1,   /* Deferred Interrupt Status Register */
 
+   /* Statistical Profiling Extension Registers */
+   PMSCR_EL1,
+   PMSICR_EL1,
+   PMSIRR_EL1,
+   PMSFCR_EL1,
+   PMSEVFR_EL1,
+   PMSLATFR_EL1,
+   PMSIDR_EL1,
+   PMBLIMITR_EL1,
+   PMBPTR_EL1,
+   PMBSR_EL1,
+
/* Performance Monitors Registers */
PMCR_EL0,   /* Control Register */
PMSELR_EL0, /* Event Counter Selection Register */
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 46822afc57e0..955b157f9cc5 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1506,6 +1506,17 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
 
+   { SYS_DESC(SYS_PMSCR_EL1), NULL, reset_val, PMSCR_EL1, 0 },
+   { SYS_DESC(SYS_PMSICR_EL1), NULL, reset_val, PMSICR_EL1, 0 },
+   { SYS_DESC(SYS_PMSIRR_EL1), NULL, reset_val, PMSIRR_EL1, 0 },
+   { SYS_DESC(SYS_PMSFCR_EL1), NULL, reset_val, PMSFCR_EL1, 0 },
+   { SYS_DESC(SYS_PMSEVFR_EL1), NULL, reset_val, PMSEVFR_EL1, 0 },
+   { SYS_DESC(SYS_PMSLATFR_EL1), NULL, reset_val, PMSLATFR_EL1, 0 },
+   { SYS_DESC(SYS_PMSIDR_EL1), NULL, reset_val, PMSIDR_EL1, 0 },
+   { SYS_DESC(SYS_PMBLIMITR_EL1), NULL, reset_val, PMBLIMITR_EL1, 0 },
+   { SYS_DESC(SYS_PMBPTR_EL1), NULL, reset_val, PMBPTR_EL1, 0 },
+   { SYS_DESC(SYS_PMBSR_EL1), NULL, reset_val, PMBSR_EL1, 0 },
+
{ SYS_DESC(SYS_PMINTENSET_EL1), access_pminten, reset_unknown, 
PMINTENSET_EL1 },
{ SYS_DESC(SYS_PMINTENCLR_EL1), access_pminten, NULL, PMINTENSET_EL1 },
 
-- 
2.21.0



[PATCH v2 16/18] KVM: arm64: enable SPE support

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

We have all the bits and pieces to enable SPE for the guest in place, so
let's enable it.

Signed-off-by: Sudeep Holla 
Signed-off-by: Andrew Murray 
---
 virt/kvm/arm/arm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a66085c8e785..fb3ad0835255 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -611,6 +611,10 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
return ret;
 
ret = kvm_arm_pmu_v3_enable(vcpu);
+   if (ret)
+   return ret;
+
+   ret = kvm_arm_spe_v1_enable(vcpu);
 
return ret;
 }
-- 
2.21.0



[PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

Now that we can save/restore the full SPE controls, we can enable it
if SPE is set up and ready to use in KVM. It's supported in KVM only if
all the CPUs in the system support SPE.

However, to support heterogeneous systems, we need to move the check for
host SPE support and do a partial save/restore.

Signed-off-by: Sudeep Holla 
Signed-off-by: Andrew Murray 
---
 arch/arm64/kvm/hyp/debug-sr.c | 33 -
 include/kvm/arm_spe.h |  6 ++
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 12429b212a3a..d8d857067e6d 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -86,18 +86,13 @@
}
 
 static void __hyp_text
-__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
+__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
 {
u64 reg;
 
/* Clear pmscr in case of early return */
ctxt->sys_regs[PMSCR_EL1] = 0;
 
-   /* SPE present on this CPU? */
-   if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
- ID_AA64DFR0_PMSVER_SHIFT))
-   return;
-
/* Yes; is it owned by higher EL? */
reg = read_sysreg_s(SYS_PMBIDR_EL1);
if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
@@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool 
full_ctxt)
 }
 
 static void __hyp_text
-__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
+__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
 {
if (!ctxt->sys_regs[PMSCR_EL1])
return;
@@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct 
kvm_vcpu *vcpu)
struct kvm_guest_debug_arch *host_dbg;
struct kvm_guest_debug_arch *guest_dbg;
 
+   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+   guest_ctxt = &vcpu->arch.ctxt;
+
+   __debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
+
if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
return;
 
-   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-   guest_ctxt = &vcpu->arch.ctxt;
-   host_dbg = &vcpu->arch.host_debug_state.regs;
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
 
@@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct 
kvm_vcpu *vcpu)
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
guest_ctxt = &vcpu->arch.ctxt;
 
-   if (!has_vhe())
-   __debug_restore_spe_nvhe(host_ctxt, false);
+   __debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
 
if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
return;
@@ -249,19 +246,21 @@ void __hyp_text __debug_restore_host_context(struct 
kvm_vcpu *vcpu)
 
 void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
 {
-   /*
-* Non-VHE: Disable and flush SPE data generation
-* VHE: The vcpu can run, but it can't hide.
-*/
struct kvm_cpu_context *host_ctxt;
 
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-   if (!has_vhe())
-   __debug_save_spe_nvhe(host_ctxt, false);
+   if (cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
+ID_AA64DFR0_PMSVER_SHIFT))
+   __debug_save_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
 }
 
 void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
 {
+   bool kvm_spe_ready = kvm_arm_spe_v1_ready(vcpu);
+
+   /* SPE present on this vCPU? */
+   if (kvm_spe_ready)
+   __debug_save_spe_context(&vcpu->arch.ctxt, kvm_spe_ready);
 }
 
 u32 __hyp_text __kvm_get_mdcr_el2(void)
diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
index 48d118fdb174..30c40b1bc385 100644
--- a/include/kvm/arm_spe.h
+++ b/include/kvm/arm_spe.h
@@ -16,4 +16,10 @@ struct kvm_spe {
bool irq_level;
 };
 
+#ifdef CONFIG_KVM_ARM_SPE
+#define kvm_arm_spe_v1_ready(v)((v)->arch.spe.ready)
+#else
+#define kvm_arm_spe_v1_ready(v)(false)
+#endif /* CONFIG_KVM_ARM_SPE */
+
 #endif /* __ASM_ARM_KVM_SPE_H */
-- 
2.21.0



[PATCH v2 12/18] KVM: arm64: add a new vcpu device control group for SPEv1

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

To configure the virtual SPEv1 overflow interrupt number, we use the
vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_V1_IRQ
attribute within the KVM_ARM_VCPU_SPE_V1_CTRL group.

After configuring the SPEv1, call the vcpu ioctl with attribute
KVM_ARM_VCPU_SPE_V1_INIT to initialize the SPEv1.

Signed-off-by: Sudeep Holla 
Signed-off-by: Andrew Murray 
---
 Documentation/virt/kvm/devices/vcpu.txt |  28 
 arch/arm64/include/asm/kvm_host.h   |   2 +-
 arch/arm64/include/uapi/asm/kvm.h   |   4 +
 arch/arm64/kvm/Makefile |   1 +
 arch/arm64/kvm/guest.c  |   6 +
 arch/arm64/kvm/reset.c  |   3 +
 include/kvm/arm_spe.h   |  45 +++
 include/uapi/linux/kvm.h|   1 +
 virt/kvm/arm/arm.c  |   1 +
 virt/kvm/arm/spe.c  | 163 
 10 files changed, 253 insertions(+), 1 deletion(-)
 create mode 100644 virt/kvm/arm/spe.c

diff --git a/Documentation/virt/kvm/devices/vcpu.txt 
b/Documentation/virt/kvm/devices/vcpu.txt
index 6f3bd64a05b0..cefad056d677 100644
--- a/Documentation/virt/kvm/devices/vcpu.txt
+++ b/Documentation/virt/kvm/devices/vcpu.txt
@@ -74,3 +74,31 @@ Specifies the base address of the stolen time structure for 
this VCPU. The
 base address must be 64 byte aligned and exist within a valid guest memory
 region. See Documentation/virt/kvm/arm/pvtime.txt for more information
 including the layout of the stolen time structure.
+
+4. GROUP: KVM_ARM_VCPU_SPE_V1_CTRL
+Architectures: ARM64
+
+4.1. ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_IRQ
+Parameters: in kvm_device_attr.addr the address for SPE buffer overflow 
interrupt
+   is a pointer to an int
+Returns: -EBUSY: The SPE overflow interrupt is already set
+ -ENXIO: The overflow interrupt not set when attempting to get it
+ -ENODEV: SPEv1 not supported
+ -EINVAL: Invalid SPE overflow interrupt number supplied or
+  trying to set the IRQ number without using an in-kernel
+  irqchip.
+
+A value describing the SPEv1 (Statistical Profiling Extension v1) overflow
+interrupt number for this vcpu. This interrupt should be PPI and the interrupt
+type and number must be same for each vcpu.
+
+4.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_INIT
+Parameters: no additional parameter in kvm_device_attr.addr
+Returns: -ENODEV: SPEv1 not supported or GIC not initialized
+ -ENXIO: SPEv1 not properly configured or in-kernel irqchip not
+ configured as required prior to calling this attribute
+ -EBUSY: SPEv1 already initialized
+
+Request the initialization of the SPEv1.  If using the SPEv1 with an in-kernel
+virtual GIC implementation, this must be done after initializing the in-kernel
+irqchip.
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 333c6491bec7..d00f450dc4cd 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -39,7 +39,7 @@
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
-#define KVM_VCPU_MAX_FEATURES 7
+#define KVM_VCPU_MAX_FEATURES 8
 
 #define KVM_REQ_SLEEP \
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index 820e5751ada7..905a73f30079 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE   4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS   5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC   6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_SPE_V17 /* Support guest SPEv1 */
 
 struct kvm_vcpu_init {
__u32 target;
@@ -326,6 +327,9 @@ struct kvm_vcpu_events {
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER1
 #define KVM_ARM_VCPU_PVTIME_CTRL   2
 #define   KVM_ARM_VCPU_PVTIME_IPA  0
+#define KVM_ARM_VCPU_SPE_V1_CTRL   3
+#define   KVM_ARM_VCPU_SPE_V1_IRQ  0
+#define   KVM_ARM_VCPU_SPE_V1_INIT 1
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT28
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5ffbdc39e780..526f3bf09321 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -37,3 +37,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
 kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
+kvm-$(CONFIG_KVM_ARM_SPE) += $(KVM)/arm/spe.o
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 2fff06114a8f..50fea538b8bd 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -874,6 +874,8 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
break;
case KVM_ARM_VCPU_PVTIME_CTRL

[PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE)

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

On VHE systems, the reset value for MDCR_EL2.E2PB=b00 which defaults
to profiling buffer using the EL2 stage 1 translations. However if the
guest are allowed to use profiling buffers changing E2PB settings, we
need to ensure we resume back MDCR_EL2.E2PB=b00. Currently we just
do bitwise '&' with MDCR_EL2_E2PB_MASK which will retain the value.

So fix it by clearing all the bits in E2PB.

Signed-off-by: Sudeep Holla 
Signed-off-by: Andrew Murray 
---
 arch/arm64/kvm/hyp/switch.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 72fbbd86eb5e..250f13910882 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -228,9 +228,7 @@ void deactivate_traps_vhe_put(void)
 {
u64 mdcr_el2 = read_sysreg(mdcr_el2);
 
-   mdcr_el2 &= MDCR_EL2_HPMN_MASK |
-   MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
-   MDCR_EL2_TPMS;
+   mdcr_el2 &= MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS;
 
write_sysreg(mdcr_el2, mdcr_el2);
 
-- 
2.21.0



[PATCH v2 03/18] arm64: KVM: define SPE data structure for each vcpu

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

In order to support virtual SPE for guests, define some basic structs.
This feature depends on the host having hardware with SPE support.

Since we can support this only on ARM64, add a separate config symbol
for the same.

Signed-off-by: Sudeep Holla 
[ Add irq_level, rename irq to irq_num for kvm_spe ]
Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/kvm/Kconfig|  7 +++
 include/kvm/arm_spe.h | 19 +++
 3 files changed, 28 insertions(+)
 create mode 100644 include/kvm/arm_spe.h

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index c61260cf63c5..f5dcff912645 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
@@ -302,6 +303,7 @@ struct kvm_vcpu_arch {
struct vgic_cpu vgic_cpu;
struct arch_timer_cpu timer_cpu;
struct kvm_pmu pmu;
+   struct kvm_spe spe;
 
/*
 * Anything that is not used directly from assembly code goes
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a475c68cbfec..af5be2c57dcb 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -35,6 +35,7 @@ config KVM
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
select KVM_ARM_PMU if HW_PERF_EVENTS
+   select KVM_ARM_SPE if (HW_PERF_EVENTS && ARM_SPE_PMU)
select HAVE_KVM_MSI
select HAVE_KVM_IRQCHIP
select HAVE_KVM_IRQ_ROUTING
@@ -61,6 +62,12 @@ config KVM_ARM_PMU
  Adds support for a virtual Performance Monitoring Unit (PMU) in
  virtual machines.
 
+config KVM_ARM_SPE
+   bool
+   ---help---
+ Adds support for a virtual Statistical Profiling Extension(SPE) in
+ virtual machines.
+
 config KVM_INDIRECT_VECTORS
def_bool KVM && (HARDEN_BRANCH_PREDICTOR || HARDEN_EL2_VECTORS)
 
diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
new file mode 100644
index ..48d118fdb174
--- /dev/null
+++ b/include/kvm/arm_spe.h
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 ARM Ltd.
+ */
+
+#ifndef __ASM_ARM_KVM_SPE_H
+#define __ASM_ARM_KVM_SPE_H
+
+#include 
+#include 
+
+struct kvm_spe {
+   int irq_num;
+   bool ready; /* indicates that SPE KVM instance is ready for use */
+   bool created; /* SPE KVM instance is created, may not be ready yet */
+   bool irq_level;
+};
+
+#endif /* __ASM_ARM_KVM_SPE_H */
-- 
2.21.0



[PATCH v2 05/18] arm64: KVM/VHE: enable the use PMSCR_EL12 on VHE systems

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

Currently, we are just using PMSCR_EL1 in the host for non-VHE systems.
We already have the {read,write}_sysreg_el*() accessors for accessing
particular ELs' sysregs in the presence of VHE.

Let's just define PMSCR_EL12 and start making use of it here, which will
access the right register on both VHE and non-VHE systems. This change
is required to add SPE guest support on VHE systems.

Signed-off-by: Sudeep Holla 
Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/sysreg.h | 1 +
 arch/arm64/kvm/hyp/debug-sr.c   | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 6e919fafb43d..6c0b0ad97688 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -468,6 +468,7 @@
 #define SYS_AFSR1_EL12 sys_reg(3, 5, 5, 1, 1)
 #define SYS_ESR_EL12   sys_reg(3, 5, 5, 2, 0)
 #define SYS_FAR_EL12   sys_reg(3, 5, 6, 0, 0)
+#define SYS_PMSCR_EL12 sys_reg(3, 5, 9, 9, 0)
 #define SYS_MAIR_EL12  sys_reg(3, 5, 10, 2, 0)
 #define SYS_AMAIR_EL12 sys_reg(3, 5, 10, 3, 0)
 #define SYS_VBAR_EL12  sys_reg(3, 5, 12, 0, 0)
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 0fc9872a1467..98be2f11c16c 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -108,8 +108,8 @@ static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
return;
 
/* Yes; save the control register and disable data generation */
-   *pmscr_el1 = read_sysreg_s(SYS_PMSCR_EL1);
-   write_sysreg_s(0, SYS_PMSCR_EL1);
+   *pmscr_el1 = read_sysreg_el1(SYS_PMSCR);
+   write_sysreg_el1(0, SYS_PMSCR);
isb();
 
/* Now drain all buffered data to memory */
@@ -126,7 +126,7 @@ static void __hyp_text __debug_restore_spe_nvhe(u64 
pmscr_el1)
isb();
 
/* Re-enable data generation */
-   write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
+   write_sysreg_el1(pmscr_el1, SYS_PMSCR);
 }
 
 static void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
-- 
2.21.0



[PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE

2019-12-20 Thread Andrew Murray
Upon the exit of a guest, let's determine if the SPE device has generated
an interrupt - if so we'll inject a virtual interrupt to the guest.

Upon the entry and exit of a guest we'll also update the state of the
physical IRQ such that it is active when a guest interrupt is pending
and the guest is running.

Finally we map the physical IRQ to the virtual IRQ such that the guest
can deactivate the interrupt when it handles the interrupt.

Signed-off-by: Andrew Murray 
---
 include/kvm/arm_spe.h |  6 
 virt/kvm/arm/arm.c|  5 ++-
 virt/kvm/arm/spe.c| 71 +++
 3 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
index 9c65130d726d..91b2214f543a 100644
--- a/include/kvm/arm_spe.h
+++ b/include/kvm/arm_spe.h
@@ -37,6 +37,9 @@ static inline bool kvm_arm_support_spe_v1(void)
  ID_AA64DFR0_PMSVER_SHIFT);
 }
 
+void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
+inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
+
 int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
 int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
@@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu);
 #define kvm_arm_support_spe_v1()   (false)
 #define kvm_arm_spe_irq_initialized(v) (false)
 
+static inline void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu) {}
+static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
+
 static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
  struct kvm_device_attr *attr)
 {
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 340d2388ee2c..a66085c8e785 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
preempt_disable();
 
kvm_pmu_flush_hwstate(vcpu);
+   kvm_spe_flush_hwstate(vcpu);
 
local_irq_disable();
 
@@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
kvm_request_pending(vcpu)) {
vcpu->mode = OUTSIDE_GUEST_MODE;
isb(); /* Ensure work in x_flush_hwstate is committed */
+   kvm_spe_sync_hwstate(vcpu);
kvm_pmu_sync_hwstate(vcpu);
if (static_branch_unlikely(&userspace_irqchip_in_use))
kvm_timer_sync_hwstate(vcpu);
@@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
kvm_arm_clear_debug(vcpu);
 
/*
-* We must sync the PMU state before the vgic state so
+* We must sync the PMU and SPE state before the vgic state so
 * that the vgic can properly sample the updated state of the
 * interrupt line.
 */
kvm_pmu_sync_hwstate(vcpu);
+   kvm_spe_sync_hwstate(vcpu);
 
/*
 * Sync the vgic state before syncing the timer state because
diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
index 83ac2cce2cc3..097ed39014e4 100644
--- a/virt/kvm/arm/spe.c
+++ b/virt/kvm/arm/spe.c
@@ -35,6 +35,68 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
return 0;
 }
 
+static inline void set_spe_irq_phys_active(struct arm_spe_kvm_info *info,
+  bool active)
+{
+   int r;
+   r = irq_set_irqchip_state(info->physical_irq, IRQCHIP_STATE_ACTIVE,
+ active);
+   WARN_ON(r);
+}
+
+void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
+{
+   struct kvm_spe *spe = >arch.spe;
+   bool phys_active = false;
+   struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
+
+   if (!kvm_arm_spe_v1_ready(vcpu))
+   return;
+
+   if (irqchip_in_kernel(vcpu->kvm))
+   phys_active = kvm_vgic_map_is_active(vcpu, spe->irq_num);
+
+   phys_active |= spe->irq_level;
+
+   set_spe_irq_phys_active(info, phys_active);
+}
+
+void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
+{
+   struct kvm_spe *spe = >arch.spe;
+   u64 pmbsr;
+   int r;
+   bool service;
+   struct kvm_cpu_context *ctxt = >arch.ctxt;
+   struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
+
+   if (!kvm_arm_spe_v1_ready(vcpu))
+   return;
+
+   set_spe_irq_phys_active(info, false);
+
+   pmbsr = ctxt->sys_regs[PMBSR_EL1];
+   service = !!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT));
+   if (spe->irq_level == service)
+   return;
+
+   spe->irq_level = service;
+
+   if (likely(irqchip_in_kernel(vcpu->kvm))) {
+   r = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+ 

[PATCH v2 07/18] arm64: KVM/debug: drop pmscr_el1 and use sys_regs[PMSCR_EL1] in kvm_cpu_context

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

kvm_cpu_context now has support to stash the complete SPE buffer control
context. We no longer need the pmscr_el1 kvm_vcpu_arch and it can be
dropped.

Signed-off-by: Sudeep Holla 
Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/kvm_host.h |  2 --
 arch/arm64/kvm/hyp/debug-sr.c | 26 +++---
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 9eb85f14df90..333c6491bec7 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -307,8 +307,6 @@ struct kvm_vcpu_arch {
struct {
/* {Break,watch}point registers */
struct kvm_guest_debug_arch regs;
-   /* Statistical profiling extension */
-   u64 pmscr_el1;
} host_debug_state;
 
/* VGIC state */
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index c803daebd596..8a70a493345e 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -85,19 +85,19 @@
default:write_debug(ptr[0], reg, 0);\
}
 
-static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
+static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
 {
u64 reg;
 
/* Clear pmscr in case of early return */
-   *pmscr_el1 = 0;
+   ctxt->sys_regs[PMSCR_EL1] = 0;
 
/* SPE present on this CPU? */
if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
  ID_AA64DFR0_PMSVER_SHIFT))
return;
 
-   /* Yes; is it owned by EL3? */
+   /* Yes; is it owned by higher EL? */
reg = read_sysreg_s(SYS_PMBIDR_EL1);
if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
return;
@@ -108,7 +108,7 @@ static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
return;
 
/* Yes; save the control register and disable data generation */
-   *pmscr_el1 = read_sysreg_el1(SYS_PMSCR);
+   ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
write_sysreg_el1(0, SYS_PMSCR);
isb();
 
@@ -117,16 +117,16 @@ static void __hyp_text __debug_save_spe_nvhe(u64 
*pmscr_el1)
dsb(nsh);
 }
 
-static void __hyp_text __debug_restore_spe_nvhe(u64 pmscr_el1)
+static void __hyp_text __debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt)
 {
-   if (!pmscr_el1)
+   if (!ctxt->sys_regs[PMSCR_EL1])
return;
 
/* The host page table is installed, but not yet synchronised */
isb();
 
/* Re-enable data generation */
-   write_sysreg_el1(pmscr_el1, SYS_PMSCR);
+   write_sysreg_el1(ctxt->sys_regs[PMSCR_EL1], SYS_PMSCR);
 }
 
 static void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
@@ -194,14 +194,15 @@ void __hyp_text __debug_restore_host_context(struct 
kvm_vcpu *vcpu)
struct kvm_guest_debug_arch *host_dbg;
struct kvm_guest_debug_arch *guest_dbg;
 
+   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+   guest_ctxt = >arch.ctxt;
+
if (!has_vhe())
-   __debug_restore_spe_nvhe(vcpu->arch.host_debug_state.pmscr_el1);
+   __debug_restore_spe_nvhe(host_ctxt);
 
if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
return;
 
-   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-   guest_ctxt = >arch.ctxt;
host_dbg = >arch.host_debug_state.regs;
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
 
@@ -217,8 +218,11 @@ void __hyp_text __debug_save_host_context(struct kvm_vcpu 
*vcpu)
 * Non-VHE: Disable and flush SPE data generation
 * VHE: The vcpu can run, but it can't hide.
 */
+   struct kvm_cpu_context *host_ctxt;
+
+   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
if (!has_vhe())
-   __debug_save_spe_nvhe(>arch.host_debug_state.pmscr_el1);
+   __debug_save_spe_nvhe(host_ctxt);
 }
 
 void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
-- 
2.21.0



[PATCH v2 06/18] arm64: KVM: split debug save restore across vm/traps activation

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

If we enable profiling buffer controls at EL1, a trap exception is generated
to EL2; it also changes the profiling buffer to use the EL1&0 stage 1
translation regime in the case of VHE. To support SPE in both the guest and
the host, we need to first stop profiling and flush the profiling buffers
before we activate/switch the VM or enable/disable the traps.

In preparation for that, let's split the debug save/restore functionality
into 4 steps:
1. debug_save_host_context - saves the host context
2. debug_restore_guest_context - restore the guest context
3. debug_save_guest_context - saves the guest context
4. debug_restore_host_context - restores the host context

Let's rename the existing __debug_switch_to_{host,guest} functions so they
align with the above, and add placeholders for the new ones introduced here,
as we need them to support SPE in guests.

Signed-off-by: Sudeep Holla 
Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/kvm_hyp.h |  6 --
 arch/arm64/kvm/hyp/debug-sr.c| 25 -
 arch/arm64/kvm/hyp/switch.c  | 12 
 3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 97f21cc66657..011e7963f772 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -69,8 +69,10 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context 
*ctxt);
 void __sysreg32_save_state(struct kvm_vcpu *vcpu);
 void __sysreg32_restore_state(struct kvm_vcpu *vcpu);
 
-void __debug_switch_to_guest(struct kvm_vcpu *vcpu);
-void __debug_switch_to_host(struct kvm_vcpu *vcpu);
+void __debug_save_host_context(struct kvm_vcpu *vcpu);
+void __debug_restore_guest_context(struct kvm_vcpu *vcpu);
+void __debug_save_guest_context(struct kvm_vcpu *vcpu);
+void __debug_restore_host_context(struct kvm_vcpu *vcpu);
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 98be2f11c16c..c803daebd596 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -168,20 +168,13 @@ static void __hyp_text __debug_restore_state(struct 
kvm_vcpu *vcpu,
write_sysreg(ctxt->sys_regs[MDCCINT_EL1], mdccint_el1);
 }
 
-void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
+void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
 {
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
struct kvm_guest_debug_arch *host_dbg;
struct kvm_guest_debug_arch *guest_dbg;
 
-   /*
-* Non-VHE: Disable and flush SPE data generation
-* VHE: The vcpu can run, but it can't hide.
-*/
-   if (!has_vhe())
-   __debug_save_spe_nvhe(>arch.host_debug_state.pmscr_el1);
-
if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
return;
 
@@ -194,7 +187,7 @@ void __hyp_text __debug_switch_to_guest(struct kvm_vcpu 
*vcpu)
__debug_restore_state(vcpu, guest_dbg, guest_ctxt);
 }
 
-void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
+void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
 {
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
@@ -218,6 +211,20 @@ void __hyp_text __debug_switch_to_host(struct kvm_vcpu 
*vcpu)
vcpu->arch.flags &= ~KVM_ARM64_DEBUG_DIRTY;
 }
 
+void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
+{
+   /*
+* Non-VHE: Disable and flush SPE data generation
+* VHE: The vcpu can run, but it can't hide.
+*/
+   if (!has_vhe())
+   __debug_save_spe_nvhe(>arch.host_debug_state.pmscr_el1);
+}
+
+void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
+{
+}
+
 u32 __hyp_text __kvm_get_mdcr_el2(void)
 {
return read_sysreg(mdcr_el2);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 250f13910882..67b7c160f65b 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -626,6 +626,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
guest_ctxt = >arch.ctxt;
 
sysreg_save_host_state_vhe(host_ctxt);
+   __debug_save_host_context(vcpu);
 
/*
 * ARM erratum 1165522 requires us to configure both stage 1 and
@@ -642,7 +643,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
__activate_traps(vcpu);
 
sysreg_restore_guest_state_vhe(guest_ctxt);
-   __debug_switch_to_guest(vcpu);
+   __debug_restore_guest_context(vcpu);
 
__set_guest_arch_workaround_state(vcpu);
 
@@ -656,6 +657,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
__set_host_arch_workaround_state(vcpu);
 
sysreg_save_guest_state_vhe(guest_ctxt);
+   __debug_save_guest_context(vcpu);
 
__deactivate_traps(vcpu);
 
@@ -664,7 +666,7 

[PATCH v2 01/18] dt-bindings: ARM SPE: highlight the need for PPI partitions on heterogeneous systems

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

It's not entirely clear from the binding document that the only way to
express ARM SPE affinity to a subset of CPUs on a heterogeneous system
is through the use of PPI partitions available in the interrupt
controller bindings.

Let's make it clear.

Signed-off-by: Sudeep Holla 
Signed-off-by: Andrew Murray 
---
 Documentation/devicetree/bindings/arm/spe-pmu.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/spe-pmu.txt 
b/Documentation/devicetree/bindings/arm/spe-pmu.txt
index 93372f2a7df9..4f4815800f6e 100644
--- a/Documentation/devicetree/bindings/arm/spe-pmu.txt
+++ b/Documentation/devicetree/bindings/arm/spe-pmu.txt
@@ -9,8 +9,9 @@ performance sample data using an in-memory trace buffer.
   "arm,statistical-profiling-extension-v1"
 
 - interrupts : Exactly 1 PPI must be listed. For heterogeneous systems where
-   SPE is only supported on a subset of the CPUs, please consult
-  the arm,gic-v3 binding for details on describing a PPI partition.
+   SPE is only supported on a subset of the CPUs, a PPI partition
+  described in the arm,gic-v3 binding must be used to describe
+  the set of CPUs this interrupt is affine to.
 
 ** Example:
 
-- 
2.21.0



[PATCH v2 00/18] arm64: KVM: add SPE profiling support

2019-12-20 Thread Andrew Murray
This series implements support for allowing KVM guests to use the Arm
Statistical Profiling Extension (SPE).

It has been tested on a model to ensure that both host and guest can
simultaneously use SPE with valid data. E.g.

$ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
dd if=/dev/zero of=/dev/null count=1000
$ perf report --dump-raw-trace > spe_buf.txt

As we save and restore the SPE context, the guest can access the SPE
registers directly; thus, in this version of the series we remove the
trapping and emulation.

In the previous version of this series, when KVM SPE wasn't supported
(e.g. via CONFIG_KVM_ARM_SPE) we were able to return a value of 0 for
all reads of the SPE registers. As we can no longer do this, there is no
mechanism to prevent the guest from using SPE - thus I'm keen for
feedback on the best way of resolving this.

It appears necessary to pin the entire guest memory in order to provide
guest SPE access - otherwise it is possible for the guest to receive
Stage-2 faults.

The last two extra patches are for the kvmtool if someone wants to play
with it.

Changes since v1:
- Rebased on v5.5-rc2
- Renamed kvm_spe structure 'irq' member to 'irq_num'
- Added irq_level to kvm_spe structure
- Clear PMBSR service bit on save to avoid spurious interrupts
- Update kvmtool headers to 5.4
- Enabled SPE in KVM init features
- No longer trap and emulate
- Add support for guest/host exclusion flags
- Fix virq support for SPE
- Adjusted sysreg_elx_s macros with merged clang build support

Andrew Murray (4):
  KVM: arm64: don't trap Statistical Profiling controls to EL2
  perf: arm_spe: Add KVM structure for obtaining IRQ info
  KVM: arm64: spe: Provide guest virtual interrupts for SPE
  perf: arm_spe: Handle guest/host exclusion flags

Sudeep Holla (12):
  dt-bindings: ARM SPE: highlight the need for PPI partitions on
heterogeneous systems
  arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the
guest(VHE)
  arm64: KVM: define SPE data structure for each vcpu
  arm64: KVM: add SPE system registers to sys_reg_descs
  arm64: KVM/VHE: enable the use PMSCR_EL12 on VHE systems
  arm64: KVM: split debug save restore across vm/traps activation
  arm64: KVM/debug: drop pmscr_el1 and use sys_regs[PMSCR_EL1] in
kvm_cpu_context
  arm64: KVM: add support to save/restore SPE profiling buffer controls
  arm64: KVM: enable conditional save/restore full SPE profiling buffer
controls
  arm64: KVM/debug: use EL1&0 stage 1 translation regime
  KVM: arm64: add a new vcpu device control group for SPEv1
  KVM: arm64: enable SPE support
  KVMTOOL: update_headers: Sync kvm UAPI headers with linux v5.5-rc2
  KVMTOOL: kvm: add a vcpu feature for SPEv1 support

 .../devicetree/bindings/arm/spe-pmu.txt   |   5 +-
 Documentation/virt/kvm/devices/vcpu.txt   |  28 +++
 arch/arm64/include/asm/kvm_host.h |  18 +-
 arch/arm64/include/asm/kvm_hyp.h  |   6 +-
 arch/arm64/include/asm/sysreg.h   |   1 +
 arch/arm64/include/uapi/asm/kvm.h |   4 +
 arch/arm64/kvm/Kconfig|   7 +
 arch/arm64/kvm/Makefile   |   1 +
 arch/arm64/kvm/debug.c|   2 -
 arch/arm64/kvm/guest.c|   6 +
 arch/arm64/kvm/hyp/debug-sr.c | 105 +---
 arch/arm64/kvm/hyp/switch.c   |  18 +-
 arch/arm64/kvm/reset.c|   3 +
 arch/arm64/kvm/sys_regs.c |  11 +
 drivers/perf/arm_spe_pmu.c|  26 ++
 include/kvm/arm_spe.h |  82 ++
 include/uapi/linux/kvm.h  |   1 +
 virt/kvm/arm/arm.c|  10 +-
 virt/kvm/arm/spe.c| 234 ++
 19 files changed, 521 insertions(+), 47 deletions(-)
 create mode 100644 include/kvm/arm_spe.h
 create mode 100644 virt/kvm/arm/spe.c

-- 
2.21.0



[PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2

2019-12-20 Thread Andrew Murray
As we now save/restore the profiler state there is no need to trap
accesses to the statistical profiling controls. Let's unset the
_TPMS bit.

Signed-off-by: Andrew Murray 
---
 arch/arm64/kvm/debug.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 43487f035385..07ca783e7d9e 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -88,7 +88,6 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
  *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
  *  - Debug ROM Address (MDCR_EL2_TDRA)
  *  - OS related registers (MDCR_EL2_TDOSA)
- *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
  *
  * Additionally, KVM only traps guest accesses to the debug registers if
  * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
@@ -111,7 +110,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 */
vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
-   MDCR_EL2_TPMS |
MDCR_EL2_TPMCR |
MDCR_EL2_TDRA |
MDCR_EL2_TDOSA);
-- 
2.21.0



[PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

Currently, since we don't support profiling using SPE in guests,
we just save the PMSCR_EL1, flush the profiling buffers and disable
sampling. However, in order to support simultaneous sampling in both
the host and guests, we need to save and restore the complete SPE
profiling buffer controls' context.

Let's add support for this and keep it disabled for now. We can
enable it conditionally only if guests are allowed to use SPE.

Signed-off-by: Sudeep Holla 
[ Clear PMBSR bit when saving state to prevent spurious interrupts ]
Signed-off-by: Andrew Murray 
---
 arch/arm64/kvm/hyp/debug-sr.c | 51 +--
 1 file changed, 43 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 8a70a493345e..12429b212a3a 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -85,7 +85,8 @@
default:write_debug(ptr[0], reg, 0);\
}
 
-static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
+static void __hyp_text
+__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
 {
u64 reg;
 
@@ -102,22 +103,46 @@ static void __hyp_text __debug_save_spe_nvhe(struct 
kvm_cpu_context *ctxt)
if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
return;
 
-   /* No; is the host actually using the thing? */
-   reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
-   if (!(reg & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)))
+   /* Save the control register and disable data generation */
+   ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
+
+   if (!ctxt->sys_regs[PMSCR_EL1])
return;
 
/* Yes; save the control register and disable data generation */
-   ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
write_sysreg_el1(0, SYS_PMSCR);
isb();
 
/* Now drain all buffered data to memory */
psb_csync();
dsb(nsh);
+
+   if (!full_ctxt)
+   return;
+
+   ctxt->sys_regs[PMBLIMITR_EL1] = read_sysreg_s(SYS_PMBLIMITR_EL1);
+   write_sysreg_s(0, SYS_PMBLIMITR_EL1);
+
+   /*
+* As PMBSR is conditionally restored when returning to the host we
+* must ensure the service bit is unset here to prevent a spurious
+* host SPE interrupt from being raised.
+*/
+   ctxt->sys_regs[PMBSR_EL1] = read_sysreg_s(SYS_PMBSR_EL1);
+   write_sysreg_s(0, SYS_PMBSR_EL1);
+
+   isb();
+
+   ctxt->sys_regs[PMSICR_EL1] = read_sysreg_s(SYS_PMSICR_EL1);
+   ctxt->sys_regs[PMSIRR_EL1] = read_sysreg_s(SYS_PMSIRR_EL1);
+   ctxt->sys_regs[PMSFCR_EL1] = read_sysreg_s(SYS_PMSFCR_EL1);
+   ctxt->sys_regs[PMSEVFR_EL1] = read_sysreg_s(SYS_PMSEVFR_EL1);
+   ctxt->sys_regs[PMSLATFR_EL1] = read_sysreg_s(SYS_PMSLATFR_EL1);
+   ctxt->sys_regs[PMBPTR_EL1] = read_sysreg_s(SYS_PMBPTR_EL1);
 }
 
-static void __hyp_text __debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt)
+static void __hyp_text
+__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
 {
if (!ctxt->sys_regs[PMSCR_EL1])
return;
@@ -126,6 +151,16 @@ static void __hyp_text __debug_restore_spe_nvhe(struct 
kvm_cpu_context *ctxt)
isb();
 
/* Re-enable data generation */
+   if (full_ctxt) {
+   write_sysreg_s(ctxt->sys_regs[PMBPTR_EL1], SYS_PMBPTR_EL1);
+   write_sysreg_s(ctxt->sys_regs[PMBLIMITR_EL1], 
SYS_PMBLIMITR_EL1);
+   write_sysreg_s(ctxt->sys_regs[PMSFCR_EL1], SYS_PMSFCR_EL1);
+   write_sysreg_s(ctxt->sys_regs[PMSEVFR_EL1], SYS_PMSEVFR_EL1);
+   write_sysreg_s(ctxt->sys_regs[PMSLATFR_EL1], SYS_PMSLATFR_EL1);
+   write_sysreg_s(ctxt->sys_regs[PMSIRR_EL1], SYS_PMSIRR_EL1);
+   write_sysreg_s(ctxt->sys_regs[PMSICR_EL1], SYS_PMSICR_EL1);
+   write_sysreg_s(ctxt->sys_regs[PMBSR_EL1], SYS_PMBSR_EL1);
+   }
write_sysreg_el1(ctxt->sys_regs[PMSCR_EL1], SYS_PMSCR);
 }
 
@@ -198,7 +233,7 @@ void __hyp_text __debug_restore_host_context(struct 
kvm_vcpu *vcpu)
guest_ctxt = >arch.ctxt;
 
if (!has_vhe())
-   __debug_restore_spe_nvhe(host_ctxt);
+   __debug_restore_spe_nvhe(host_ctxt, false);
 
if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
return;
@@ -222,7 +257,7 @@ void __hyp_text __debug_save_host_context(struct kvm_vcpu 
*vcpu)
 
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
if (!has_vhe())
-   __debug_save_spe_nvhe(host_ctxt);
+   __debug_save_spe_nvhe(host_ctxt, false);
 }
 
 void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
-- 
2.21.0



[PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime

2019-12-20 Thread Andrew Murray
From: Sudeep Holla 

Now that we have all the save/restore mechanisms in place, let's switch
the translation regime used by the buffer from EL2 stage 1 to EL1 stage 1
on VHE systems.

Signed-off-by: Sudeep Holla 
[ Reword commit, don't trap to EL2 ]
Signed-off-by: Andrew Murray 
---
 arch/arm64/kvm/hyp/switch.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 67b7c160f65b..6c153b79829b 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -100,6 +100,7 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
 
write_sysreg(val, cpacr_el1);
 
+   write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
write_sysreg(kvm_get_hyp_vector(), vbar_el1);
 }
 NOKPROBE_SYMBOL(activate_traps_vhe);
@@ -117,6 +118,7 @@ static void __hyp_text __activate_traps_nvhe(struct 
kvm_vcpu *vcpu)
__activate_traps_fpsimd32(vcpu);
}
 
+   write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
write_sysreg(val, cptr_el2);
 
if (cpus_have_const_cap(ARM64_WORKAROUND_1319367)) {
-- 
2.21.0



[PATCH v2 0/3] arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

2019-12-10 Thread Andrew Murray
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.

With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters. Let's add support for 64-bit event
counters.

As KVM doesn't yet support 64-bit event counters, we also trap
and emulate the Debug Feature Registers to limit the PMU version a
guest sees to PMUv3 for ARMv8.4.

Tested by running the following perf command on both guest and host
and ensuring that the figures are very similar:

perf stat -e armv8_pmuv3/inst_retired,long=1/ \
  -e armv8_pmuv3/inst_retired,long=0/ -e cycles

Changes since v1:

 - Rebased onto v5.5-rc1


Andrew Murray (3):
  arm64: cpufeature: Extract capped fields
  KVM: arm64: limit PMU version to ARMv8.4
  arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

 arch/arm64/include/asm/cpufeature.h | 15 +
 arch/arm64/include/asm/perf_event.h |  3 +-
 arch/arm64/include/asm/sysreg.h |  4 ++
 arch/arm64/kernel/perf_event.c  | 86 +++--
 arch/arm64/kvm/sys_regs.c   | 36 +++-
 include/linux/perf/arm_pmu.h|  1 +
 6 files changed, 125 insertions(+), 20 deletions(-)

-- 
2.21.0



[PATCH v2 3/3] arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

2019-12-10 Thread Andrew Murray
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.

With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters.

Let's enable 64-bit event counters where support exists. Unless the
user sets config1:0 we will adjust the counter value such that it
overflows upon 32-bit overflow. This follows the same behaviour as
the cycle counter which has always been (and remains) 64-bits.
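
To illustrate the bias arithmetic, here is a minimal userspace sketch
(illustrative only, not the kernel code): the upper 32 bits of the 64-bit
hardware counter are pre-set so that a 32-bit wrap and the 64-bit overflow
interrupt coincide.

#include <stdint.h>
#include <stdio.h>

#define BIAS	UINT64_C(0xffffffff00000000)

static uint64_t bias(uint64_t v)   { return v | BIAS; }
static uint64_t unbias(uint64_t v) { return v & ~BIAS; }

int main(void)
{
	/* perf programs a 32-bit event value close to overflow */
	uint64_t counter = bias(0xfffffff0);
	int i;

	/* 16 events later the 64-bit counter wraps, just as 32 bits would */
	for (i = 0; i < 16; i++)
		counter++;

	printf("raw=%#llx seen=%#llx overflowed=%d\n",
	       (unsigned long long)counter,
	       (unsigned long long)unbias(counter),
	       counter == 0);
	return 0;
}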

Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/perf_event.h |  3 +-
 arch/arm64/kernel/perf_event.c  | 86 +++--
 include/linux/perf/arm_pmu.h|  1 +
 3 files changed, 72 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/perf_event.h 
b/arch/arm64/include/asm/perf_event.h
index 2bdbc79bbd01..e7765b62c712 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -176,9 +176,10 @@
 #define ARMV8_PMU_PMCR_X   (1 << 4) /* Export to ETM */
 #define ARMV8_PMU_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive debug*/
 #define ARMV8_PMU_PMCR_LC  (1 << 6) /* Overflow on 64 bit cycle counter */
+#define ARMV8_PMU_PMCR_LP  (1 << 7) /* Long event counter enable */
 #defineARMV8_PMU_PMCR_N_SHIFT  11   /* Number of counters 
supported */
 #defineARMV8_PMU_PMCR_N_MASK   0x1f
-#defineARMV8_PMU_PMCR_MASK 0x7f /* Mask for writable bits */
+#defineARMV8_PMU_PMCR_MASK 0xff /* Mask for writable bits */
 
 /*
  * PMOVSR: counters overflow flag status reg
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index e40b65645c86..4e27f90bb89e 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -285,6 +285,17 @@ static struct attribute_group 
armv8_pmuv3_format_attr_group = {
 #defineARMV8_IDX_COUNTER_LAST(cpu_pmu) \
(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
 
+
+/*
+ * We unconditionally enable ARMv8.5-PMU long event counter support
+ * (64-bit events) where supported. Indicate if this arm_pmu has long
+ * event counter support.
+ */
+static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu)
+{
+   return (cpu_pmu->pmuver > ID_DFR0_EL1_PMUVER_8_4);
+}
+
 /*
  * We must chain two programmable counters for 64 bit events,
  * except when we have allocated the 64bit cycle counter (for CPU
@@ -294,9 +305,11 @@ static struct attribute_group 
armv8_pmuv3_format_attr_group = {
 static inline bool armv8pmu_event_is_chained(struct perf_event *event)
 {
int idx = event->hw.idx;
+   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 
return !WARN_ON(idx < 0) &&
   armv8pmu_event_is_64bit(event) &&
+  !armv8pmu_has_long_event(cpu_pmu) &&
   (idx != ARMV8_IDX_CYCLE_COUNTER);
 }
 
@@ -345,7 +358,7 @@ static inline void armv8pmu_select_counter(int idx)
isb();
 }
 
-static inline u32 armv8pmu_read_evcntr(int idx)
+static inline u64 armv8pmu_read_evcntr(int idx)
 {
armv8pmu_select_counter(idx);
return read_sysreg(pmxevcntr_el0);
@@ -362,6 +375,44 @@ static inline u64 armv8pmu_read_hw_counter(struct 
perf_event *event)
return val;
 }
 
+/*
+ * The cycle counter is always a 64-bit counter. When ARMV8_PMU_PMCR_LP
+ * is set the event counters also become 64-bit counters. Unless the
+ * user has requested a long counter (attr.config1) then we want to
+ * interrupt upon 32-bit overflow - we achieve this by applying a bias.
+ */
+static bool armv8pmu_event_needs_bias(struct perf_event *event)
+{
+   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+   struct hw_perf_event *hwc = >hw;
+   int idx = hwc->idx;
+
+   if (armv8pmu_event_is_64bit(event))
+   return false;
+
+   if (armv8pmu_has_long_event(cpu_pmu) ||
+   idx == ARMV8_IDX_CYCLE_COUNTER)
+   return true;
+
+   return false;
+}
+
+static u64 armv8pmu_bias_long_counter(struct perf_event *event, u64 value)
+{
+   if (armv8pmu_event_needs_bias(event))
+   value |= GENMASK(63, 32);
+
+   return value;
+}
+
+static u64 armv8pmu_unbias_long_counter(struct perf_event *event, u64 value)
+{
+   if (armv8pmu_event_needs_bias(event))
+   value &= ~GENMASK(63, 32);
+
+   return value;
+}
+
 static u64 armv8pmu_read_counter(struct perf_event *event)
 {
struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -377,10 +428,10 @@ static u64 armv8pmu_read_counter(struct perf_event *event)
else
value = armv8pmu_read_hw_counter(event);
 
-   return value;
+   return  armv8pmu_unbias_long_counter(event, value);
 }
 
-static inline void armv8pmu_write_evcntr(int idx, u32 value)
+static inline voi

[PATCH v2 1/3] arm64: cpufeature: Extract capped fields

2019-12-10 Thread Andrew Murray
When emulating ID registers there is often a need to cap the version
bits of a feature such that the guest will not use features that do
not yet exist.

Let's add a helper that extracts a field and caps the version to a
given value.
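
As a rough illustration of what capping means here, a self-contained
userspace sketch with hypothetical values (not the kernel helper itself):

#include <stdint.h>
#include <stdio.h>

/* cap a signed 4-bit ID register field to a maximum value */
static uint64_t cap_signed_field(uint64_t reg, int shift, int width,
				 int64_t cap)
{
	/* sign-extend the field */
	int64_t val = (int64_t)(reg << (64 - shift - width)) >> (64 - width);

	if (val > cap) {
		uint64_t mask = (((uint64_t)1 << width) - 1) << shift;

		reg = (reg & ~mask) | ((uint64_t)cap << shift);
	}
	return reg;
}

int main(void)
{
	/* pretend the field at bits [11:8] reads as 6; cap it at 5 */
	uint64_t reg = 6ULL << 8;

	printf("%#llx -> %#llx\n", (unsigned long long)reg,
	       (unsigned long long)cap_signed_field(reg, 8, 4, 5));
	return 0;
}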

Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/cpufeature.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 4261d55e8506..19f051ec1610 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -447,6 +447,21 @@ cpuid_feature_extract_unsigned_field(u64 features, int 
field)
return cpuid_feature_extract_unsigned_field_width(features, field, 4);
 }
 
+static inline u64 __attribute_const__
+cpuid_feature_cap_signed_field_width(u64 features, int field, int width,
+s64 cap)
+{
+   s64 val = cpuid_feature_extract_signed_field_width(features, field,
+  width);
+
+   if (val > cap) {
+   features &= ~GENMASK_ULL(field + width - 1, field);
+   features |= cap << field;
+   }
+
+   return features;
+}
+
 static inline u64 arm64_ftr_mask(const struct arm64_ftr_bits *ftrp)
 {
return (u64)GENMASK(ftrp->shift + ftrp->width - 1, ftrp->shift);
-- 
2.21.0



[PATCH v2 2/3] KVM: arm64: limit PMU version to ARMv8.4

2019-12-10 Thread Andrew Murray
ARMv8.5-PMU introduces 64-bit event counters; however, KVM doesn't yet
support this. Let's trap the Debug Feature Registers in order to limit
the PMUVer/PerfMon fields they report to PMUv3 for ARMv8.4.

Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/sysreg.h |  4 
 arch/arm64/kvm/sys_regs.c   | 36 +++--
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 6e919fafb43d..1b74f275a115 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -672,6 +672,10 @@
 #define ID_AA64DFR0_TRACEVER_SHIFT 4
 #define ID_AA64DFR0_DEBUGVER_SHIFT 0
 
+#define ID_DFR0_PERFMON_SHIFT  24
+
+#define ID_DFR0_EL1_PMUVER_8_4 5
+
 #define ID_ISAR5_RDM_SHIFT 24
 #define ID_ISAR5_CRC32_SHIFT   16
 #define ID_ISAR5_SHA2_SHIFT12
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 46822afc57e0..e0cd95ca41fd 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -668,6 +668,37 @@ static bool pmu_access_event_counter_el0_disabled(struct 
kvm_vcpu *vcpu)
return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_ER | 
ARMV8_PMU_USERENR_EN);
 }
 
+static bool access_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
+  struct sys_reg_params *p,
+  const struct sys_reg_desc *rd)
+{
+   if (p->is_write)
+   return write_to_read_only(vcpu, p, rd);
+
+   /* Limit guests to PMUv3 for ARMv8.4 */
+   p->regval = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
+   p->regval = cpuid_feature_cap_signed_field_width(p->regval,
+   ID_AA64DFR0_PMUVER_SHIFT,
+   4, ID_DFR0_EL1_PMUVER_8_4);
+
+   return p->regval;
+}
+
+static bool access_id_dfr0_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+  const struct sys_reg_desc *rd)
+{
+   if (p->is_write)
+   return write_to_read_only(vcpu, p, rd);
+
+   /* Limit guests to PMUv3 for ARMv8.4 */
+   p->regval = read_sanitised_ftr_reg(SYS_ID_DFR0_EL1);
+   p->regval = cpuid_feature_cap_signed_field_width(p->regval,
+   ID_DFR0_PERFMON_SHIFT,
+   4, ID_DFR0_EL1_PMUVER_8_4);
+
+   return p->regval;
+}
+
 static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
 {
@@ -1409,7 +1440,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* CRm=1 */
ID_SANITISED(ID_PFR0_EL1),
ID_SANITISED(ID_PFR1_EL1),
-   ID_SANITISED(ID_DFR0_EL1),
+   { SYS_DESC(SYS_ID_DFR0_EL1), access_id_dfr0_el1 },
+
ID_HIDDEN(ID_AFR0_EL1),
ID_SANITISED(ID_MMFR0_EL1),
ID_SANITISED(ID_MMFR1_EL1),
@@ -1448,7 +1480,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_UNALLOCATED(4,7),
 
/* CRm=5 */
-   ID_SANITISED(ID_AA64DFR0_EL1),
+   { SYS_DESC(SYS_ID_AA64DFR0_EL1), access_id_aa64dfr0_el1 },
ID_SANITISED(ID_AA64DFR1_EL1),
ID_UNALLOCATED(5,2),
ID_UNALLOCATED(5,3),
-- 
2.21.0



Re: [RFC 2/3] KVM: arm64: pmu: Fix chained SW_INCR counters

2019-12-06 Thread Andrew Murray
On Fri, Dec 06, 2019 at 03:35:03PM +, Marc Zyngier wrote:
> On 2019-12-06 15:21, Andrew Murray wrote:
> > On Thu, Dec 05, 2019 at 02:52:26PM +, Marc Zyngier wrote:
> > > On 2019-12-05 14:06, Auger Eric wrote:
> > > > Hi Marc,


> > > > >
> > > > > I think the whole function is a bit of a mess, and could be
> > > better
> > > > > structured to treat 64bit counters as a first class citizen.
> > > > >
> > > > > I'm suggesting something along those lines, which tries to
> > > > > streamline things a bit and keep the flow uniform between the
> > > > > two word sizes. IMHO, it helps reasonning about it and gives
> > > > > scope to the ARMv8.5 full 64bit counters... It is of course
> > > > > completely untested.
> > > >
> > > > Looks OK to me as well. One remark though, don't we need to test
> > > if the
> > > > n+1th reg is enabled before incrementing it?
> > 
> > Indeed - we don't want to indicate an overflow on a disabled counter.
> > 
> > 
> > > 
> > > Hmmm. I'm not sure. I think we should make sure that we don't flag
> > > a counter as being chained if the odd counter is disabled, rather
> > > than checking it here. As long as the odd counter is not chained
> > > *and* enabled, we shouldn't touch it.
> > 
> > Does this mean that we don't care if the low counter is enabled or not
> > when deciding if the pair is chained?
> > 
> > I would find the code easier to follow if we had an explicit 'is the
> > high counter enabled here' check (at the point of deciding where to
> > put the overflow).
> 
> Sure. But the point is that we're spreading that kind of checks all over
> the map, and that we don't have a way to even reason about the state of
> a 64bit counter. Doesn't it strike you as being mildly broken?
> 

Yup! To the point where I can't trust the function names and have to look
at what the code does...


> > > @@ -645,7 +647,8 @@ static void kvm_pmu_update_pmc_chained(struct
> > > kvm_vcpu
> > > *vcpu, u64 select_idx)
> > >   struct kvm_pmu *pmu = >arch.pmu;
> > >   struct kvm_pmc *pmc = >pmc[select_idx];
> > > 
> > > - if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx)) {
> > > + if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx) &&
> > > + kvm_pmu_counter_is_enabled(vcpu, pmc->idx)) {
> > 
> > I.e. here we don't care what the state of enablement is for the low
> > counter.
> > 
> > Also at present, this may break the following use-case
> > 
> >  - User creates and uses a pair of chained counters
> >  - User disables odd/high counter
> >  - User reads values of both counters
> >  - User rewrites CHAIN event to odd/high counter OR user re-enables
> > just the even/low counter
> >  - User reads value of both counters <- this may now different to the
> > last read
> 
> Hey, I didn't say it was perfect ;-). But for sure we can't let the
> PMU bitrot more than it already has, and I'm not sure this is heading
> the right way.

I think we're aligned here. To me this code is becoming very fragile and
difficult to make sense of, and it is stretching the abstractions we've made.
This is why it is difficult to enhance it without breaking something. It's why
I felt it was safer to add 'an extra check' in the SWINCR path than to risk
touching something I couldn't be confident was correct.


> 
> I'm certainly going to push back on new PMU features until we can properly
> reason about 64bit counters as a top-level entity (as opposed to a bunch
> of discrete counters).

Thanks,

Andrew Murray

> 
> Thanks,
> 
> M.
> -- 
> Jazz is not dead. It just smells funny...


Re: [RFC 2/3] KVM: arm64: pmu: Fix chained SW_INCR counters

2019-12-06 Thread Andrew Murray
FG1_KVM_PMU_CHAINED 0x1
> > 
> > @@ -298,6 +299,7 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu
> > *vcpu, u64 val)
> >   * For high counters of chained events we must recreate the
> >   * perf event with the long (64bit) attribute set.
> >   */
> > +    kvm_pmu_update_pmc_chained(vcpu, i);
> >  if (kvm_pmu_pmc_is_chained(pmc) &&
> >  kvm_pmu_idx_is_high_counter(i)) {
> >  kvm_pmu_create_perf_event(vcpu, i);
> > @@ -645,7 +647,8 @@ static void kvm_pmu_update_pmc_chained(struct
> > kvm_vcpu *vcpu, u64 select_idx)
> >  struct kvm_pmu *pmu = >arch.pmu;
> >  struct kvm_pmc *pmc = >pmc[select_idx];
> > 
> > -    if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx)) {
> > +    if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx) &&
> > +    kvm_pmu_counter_is_enabled(vcpu, pmc->idx)) {
> 
> In create_perf_event(), has_chain_evtype() is used and a 64b sample
> period would be chosen even if the counters are disjoined (since the odd
> is disabled). We would need to use pmc_is_chained() instead.

So in this use-case, someone has configured a pair of chained counters
but only enabled the lower half. In this case we only create a 32bit backing
event (no PERF_ATTR_CFG1_KVM_PMU_CHAINED flag) - I guess this means the
perf event will trigger on 64bit period(?) despite the high counter being
disabled. The guest will see an interrupt in their disabled high counter.

This is a known limitation - see the comment "For chained counters we only
support overflow interrupts on the high counter".

Though upon looking at this it seems a little more broken. I guess where
both counters are enabled we want to overflow at 64bits and raise the
overflow to the high counter. When the high counter is disabled we want to
overflow on 32bits and raise the overflow to the low counter.

Perhaps something like the following would help:

--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -585,15 +585,16 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu 
*vcpu, u64 select_idx)
 
counter = kvm_pmu_get_pair_counter_value(vcpu, pmc);
 
-   if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx)) {
+   if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx) &&
+   kvm_pmu_counter_is_enabled(vcpu, pmc->idx + 1))
+
/**
 * The initial sample period (overflow count) of an event. For
 * chained counters we only support overflow interrupts on the
 * high counter.
 */
attr.sample_period = (-counter) & GENMASK(63, 0);
-   if (kvm_pmu_counter_is_enabled(vcpu, pmc->idx + 1))
-   attr.config1 |= PERF_ATTR_CFG1_KVM_PMU_CHAINED;
+   attr.config1 |= PERF_ATTR_CFG1_KVM_PMU_CHAINED;
 
event = perf_event_create_kernel_counter(, -1, current,
     kvm_pmu_perf_overflow,


It's not clear to me what is supposed to happen with overflow handling on the
low counter on chained counters (where the high counter is disabled).
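
(For reference, the chaining being discussed amounts to the toy model
below - purely illustrative: the odd counter, programmed with the CHAIN
event, counts overflows of the even counter.)

#include <stdint.h>
#include <stdio.h>

struct chained_pair {
	uint32_t low;	/* even counter: counts the programmed event */
	uint32_t high;	/* odd counter: programmed with the CHAIN event */
};

static void count_event(struct chained_pair *p)
{
	if (++p->low == 0)	/* low counter wrapped ... */
		p->high++;	/* ... so the CHAIN event fires */
}

int main(void)
{
	struct chained_pair p = { .low = 0xfffffffe, .high = 0 };
	int i;

	for (i = 0; i < 4; i++)
		count_event(&p);

	printf("low=%#x high=%#x (64-bit view %#llx)\n", p.low, p.high,
	       ((unsigned long long)p.high << 32) | p.low);
	return 0;
}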


> 
> With perf_events, the check of whether the odd register is enabled is
> properly done (create_perf_event). Then I understand whenever there is a
> change in enable state or type we delete the previous perf event and
> re-create a new one. Enable state check just is missing for SW_INCR.
> 
> Some other questions:
> - do we need a perf event to be created even if the counter is not
> enabled? For instance on counter resets, create_perf_events get called.

That would suggest we create and destroy them each time the guest enables
and disables the counters - I would expect guests to do that a lot (every
context switch) - so my assumption is that the current approach has less
overhead for a running guest.


> - also actions are made for counters which are not implemented. loop
> until ARMV8_PMU_MAX_COUNTERS. Do you think it is valuable to have a
> bitmask of supported counters stored before pmu readiness?
> I can propose such changes if you think they are valuable.

Are they? Many of the calls into this file come from
arch/arm64/kvm/sys_regs.c where we apply a mask (value from
kvm_pmu_valid_counter_mask) to ignore unsupported counters.
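
(For anyone following along: that mask is conceptually just the low
PMCR_EL0.N bits for the event counters plus bit 31 for the cycle counter.
A rough userspace sketch, not the kernel code:)

#include <stdint.h>
#include <stdio.h>

#define CYCLE_IDX	31

static uint64_t valid_counter_mask(unsigned int nr_counters)
{
	uint64_t mask = (uint64_t)1 << CYCLE_IDX;	/* cycle counter */

	if (nr_counters)
		mask |= ((uint64_t)1 << nr_counters) - 1; /* event counters */

	return mask;
}

int main(void)
{
	uint64_t mask = valid_counter_mask(6);	/* e.g. 6 event counters */

	/* a guest write touching counters 0-9 is trimmed to 0-5 */
	printf("mask=%#llx, 0x3ff & mask = %#llx\n",
	       (unsigned long long)mask,
	       (unsigned long long)(0x3ffULL & mask));
	return 0;
}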

Thanks,

Andrew Murray


> 
> Thanks
> 
> Eric
> 
> >  /*
> >   * During promotion from !chained to chained we must ensure
> >   * the adjacent counter is stopped and its event destroyed
> > 
> > What do you think?
> > 
> >     M.
> 


Re: [RFC 2/3] KVM: arm64: pmu: Fix chained SW_INCR counters

2019-12-06 Thread Andrew Murray
vm_pmu_update_pmc_chained(struct kvm_vcpu *vcpu, u64
> select_idx);
> 
>  #define PERF_ATTR_CFG1_KVM_PMU_CHAINED 0x1
> 
> @@ -298,6 +299,7 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu,
> u64 val)
>* For high counters of chained events we must recreate the
>* perf event with the long (64bit) attribute set.
>*/
> + kvm_pmu_update_pmc_chained(vcpu, i);
>   if (kvm_pmu_pmc_is_chained(pmc) &&
>   kvm_pmu_idx_is_high_counter(i)) {
>   kvm_pmu_create_perf_event(vcpu, i);
> @@ -645,7 +647,8 @@ static void kvm_pmu_update_pmc_chained(struct kvm_vcpu
> *vcpu, u64 select_idx)
>   struct kvm_pmu *pmu = >arch.pmu;
>   struct kvm_pmc *pmc = >pmc[select_idx];
> 
> - if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx)) {
> + if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx) &&
> + kvm_pmu_counter_is_enabled(vcpu, pmc->idx)) {

I.e. here we don't care what the state of enablement is for the low counter.

Also at present, this may break the following use-case

 - User creates and uses a pair of chained counters
 - User disables odd/high counter
 - User reads values of both counters
 - User rewrites CHAIN event to odd/high counter OR user re-enables just the 
even/low counter
 - User reads value of both counters <- this may now different to the last read

Thanks,

Andrew Murray

>   /*
>* During promotion from !chained to chained we must ensure
>* the adjacent counter is stopped and its event destroyed
> 
> What do you think?
> 
> M.
> -- 
> Jazz is not dead. It just smells funny...


Re: [RFC 1/3] KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset

2019-12-06 Thread Andrew Murray
On Wed, Dec 04, 2019 at 09:44:24PM +0100, Eric Auger wrote:
> The specification says PMSWINC increments PMEVCNTR_EL1 by 1
> if PMEVCNTR_EL0 is enabled and configured to count SW_INCR.
> 
> For PMEVCNTR_EL0 to be enabled, we need both PMCNTENSET to
> be set for the corresponding event counter but we also need
> the PMCR.E bit to be set.
> 
> Fixes: 7a0adc7064b8 ("arm64: KVM: Add access handler for PMSWINC register")
> Signed-off-by: Eric Auger 
> ---
>  virt/kvm/arm/pmu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 8731dfeced8b..c3f8b059881e 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -486,6 +486,9 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, 
> u64 val)
>   if (val == 0)
>   return;
>  
> + if (!(__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E))
> + return;
> +

Reviewed-by: Andrew Murray 


>   enable = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
>   for (i = 0; i < ARMV8_PMU_CYCLE_IDX; i++) {
>   if (!(val & BIT(i)))
> -- 
> 2.20.1
> 


[PATCH v1 3/3] arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

2019-10-25 Thread Andrew Murray
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.

With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters.

Let's enable 64-bit event counters where support exists. Unless the
user sets config1:0 we will adjust the counter value such that it
overflows upon 32-bit overflow. This follows the same behaviour as
the cycle counter which has always been (and remains) 64-bits.

Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/perf_event.h |  3 +-
 arch/arm64/kernel/perf_event.c  | 86 +++--
 include/linux/perf/arm_pmu.h|  1 +
 3 files changed, 72 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/perf_event.h 
b/arch/arm64/include/asm/perf_event.h
index 2bdbc79bbd01..e7765b62c712 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -176,9 +176,10 @@
 #define ARMV8_PMU_PMCR_X   (1 << 4) /* Export to ETM */
 #define ARMV8_PMU_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive debug*/
 #define ARMV8_PMU_PMCR_LC  (1 << 6) /* Overflow on 64 bit cycle counter */
+#define ARMV8_PMU_PMCR_LP  (1 << 7) /* Long event counter enable */
 #defineARMV8_PMU_PMCR_N_SHIFT  11   /* Number of counters 
supported */
 #defineARMV8_PMU_PMCR_N_MASK   0x1f
-#defineARMV8_PMU_PMCR_MASK 0x7f /* Mask for writable bits */
+#defineARMV8_PMU_PMCR_MASK 0xff /* Mask for writable bits */
 
 /*
  * PMOVSR: counters overflow flag status reg
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index a0b4f1bca491..d8a3fa060abc 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -344,6 +344,17 @@ static struct attribute_group 
armv8_pmuv3_format_attr_group = {
 #defineARMV8_IDX_COUNTER_LAST(cpu_pmu) \
(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
 
+
+/*
+ * We unconditionally enable ARMv8.5-PMU long event counter support
+ * (64-bit events) where supported. Indicate if this arm_pmu has long
+ * event counter support.
+ */
+static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu)
+{
+   return (cpu_pmu->pmuver > ID_DFR0_EL1_PMUVER_8_4);
+}
+
 /*
  * We must chain two programmable counters for 64 bit events,
  * except when we have allocated the 64bit cycle counter (for CPU
@@ -353,9 +364,11 @@ static struct attribute_group 
armv8_pmuv3_format_attr_group = {
 static inline bool armv8pmu_event_is_chained(struct perf_event *event)
 {
int idx = event->hw.idx;
+   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 
return !WARN_ON(idx < 0) &&
   armv8pmu_event_is_64bit(event) &&
+  !armv8pmu_has_long_event(cpu_pmu) &&
   (idx != ARMV8_IDX_CYCLE_COUNTER);
 }
 
@@ -404,7 +417,7 @@ static inline void armv8pmu_select_counter(int idx)
isb();
 }
 
-static inline u32 armv8pmu_read_evcntr(int idx)
+static inline u64 armv8pmu_read_evcntr(int idx)
 {
armv8pmu_select_counter(idx);
return read_sysreg(pmxevcntr_el0);
@@ -421,6 +434,44 @@ static inline u64 armv8pmu_read_hw_counter(struct 
perf_event *event)
return val;
 }
 
+/*
+ * The cycle counter is always a 64-bit counter. When ARMV8_PMU_PMCR_LP
+ * is set the event counters also become 64-bit counters. Unless the
+ * user has requested a long counter (attr.config1) then we want to
+ * interrupt upon 32-bit overflow - we achieve this by applying a bias.
+ */
+static bool armv8pmu_event_needs_bias(struct perf_event *event)
+{
+   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+   struct hw_perf_event *hwc = >hw;
+   int idx = hwc->idx;
+
+   if (armv8pmu_event_is_64bit(event))
+   return false;
+
+   if (armv8pmu_has_long_event(cpu_pmu) ||
+   idx == ARMV8_IDX_CYCLE_COUNTER)
+   return true;
+
+   return false;
+}
+
+static u64 armv8pmu_bias_long_counter(struct perf_event *event, u64 value)
+{
+   if (armv8pmu_event_needs_bias(event))
+   value |= GENMASK(63, 32);
+
+   return value;
+}
+
+static u64 armv8pmu_unbias_long_counter(struct perf_event *event, u64 value)
+{
+   if (armv8pmu_event_needs_bias(event))
+   value &= ~GENMASK(63, 32);
+
+   return value;
+}
+
 static u64 armv8pmu_read_counter(struct perf_event *event)
 {
struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -436,10 +487,10 @@ static u64 armv8pmu_read_counter(struct perf_event *event)
else
value = armv8pmu_read_hw_counter(event);
 
-   return value;
+   return  armv8pmu_unbias_long_counter(event, value);
 }
 
-static inline void armv8pmu_write_evcntr(int idx, u32 value)
+static inline voi

[PATCH v1 1/3] arm64: cpufeature: Extract capped fields

2019-10-25 Thread Andrew Murray
When emulating ID registers there is often a need to cap the version
bits of a feature such that the guest will not use features that do
not yet exist.

Let's add a helper that extracts a field and caps the version to a
given value.

Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/cpufeature.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 9cde5d2e768f..6b5bbf770969 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -447,6 +447,21 @@ cpuid_feature_extract_unsigned_field(u64 features, int 
field)
return cpuid_feature_extract_unsigned_field_width(features, field, 4);
 }
 
+static inline u64 __attribute_const__
+cpuid_feature_cap_signed_field_width(u64 features, int field, int width,
+s64 cap)
+{
+   s64 val = cpuid_feature_extract_signed_field_width(features, field,
+  width);
+
+   if (val > cap) {
+   features &= ~GENMASK_ULL(field + width - 1, field);
+   features |= cap << field;
+   }
+
+   return features;
+}
+
 static inline u64 arm64_ftr_mask(const struct arm64_ftr_bits *ftrp)
 {
return (u64)GENMASK(ftrp->shift + ftrp->width - 1, ftrp->shift);
-- 
2.21.0



[PATCH v1 2/3] KVM: arm64: limit PMU version to ARMv8.4

2019-10-25 Thread Andrew Murray
ARMv8.5-PMU introduces 64-bit event counters; however, KVM doesn't yet
support this. Let's trap the Debug Feature Registers in order to limit
the PMUVer/PerfMon fields they report to PMUv3 for ARMv8.4.

Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/sysreg.h |  4 
 arch/arm64/kvm/sys_regs.c   | 36 +++--
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 972d196c7714..0e82e210e22b 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -672,6 +672,10 @@
 #define ID_AA64DFR0_TRACEVER_SHIFT 4
 #define ID_AA64DFR0_DEBUGVER_SHIFT 0
 
+#define ID_DFR0_PERFMON_SHIFT  24
+
+#define ID_DFR0_EL1_PMUVER_8_4 5
+
 #define ID_ISAR5_RDM_SHIFT 24
 #define ID_ISAR5_CRC32_SHIFT   16
 #define ID_ISAR5_SHA2_SHIFT12
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2071260a275b..829838c9af3f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -666,6 +666,37 @@ static bool pmu_access_event_counter_el0_disabled(struct 
kvm_vcpu *vcpu)
return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_ER | 
ARMV8_PMU_USERENR_EN);
 }
 
+static bool access_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
+  struct sys_reg_params *p,
+  const struct sys_reg_desc *rd)
+{
+   if (p->is_write)
+   return write_to_read_only(vcpu, p, rd);
+
+   /* Limit guests to PMUv3 for ARMv8.4 */
+   p->regval = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
+   p->regval = cpuid_feature_cap_signed_field_width(p->regval,
+   ID_AA64DFR0_PMUVER_SHIFT,
+   4, ID_DFR0_EL1_PMUVER_8_4);
+
+   return p->regval;
+}
+
+static bool access_id_dfr0_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+  const struct sys_reg_desc *rd)
+{
+   if (p->is_write)
+   return write_to_read_only(vcpu, p, rd);
+
+   /* Limit guests to PMUv3 for ARMv8.4 */
+   p->regval = read_sanitised_ftr_reg(SYS_ID_DFR0_EL1);
+   p->regval = cpuid_feature_cap_signed_field_width(p->regval,
+   ID_DFR0_PERFMON_SHIFT,
+   4, ID_DFR0_EL1_PMUVER_8_4);
+
+   return p->regval;
+}
+
 static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
 {
@@ -1405,7 +1436,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* CRm=1 */
ID_SANITISED(ID_PFR0_EL1),
ID_SANITISED(ID_PFR1_EL1),
-   ID_SANITISED(ID_DFR0_EL1),
+   { SYS_DESC(SYS_ID_DFR0_EL1), access_id_dfr0_el1 },
+
ID_HIDDEN(ID_AFR0_EL1),
ID_SANITISED(ID_MMFR0_EL1),
ID_SANITISED(ID_MMFR1_EL1),
@@ -1444,7 +1476,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_UNALLOCATED(4,7),
 
/* CRm=5 */
-   ID_SANITISED(ID_AA64DFR0_EL1),
+   { SYS_DESC(SYS_ID_AA64DFR0_EL1), access_id_aa64dfr0_el1 },
ID_SANITISED(ID_AA64DFR1_EL1),
ID_UNALLOCATED(5,2),
ID_UNALLOCATED(5,3),
-- 
2.21.0



[PATCH v1 0/3] arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

2019-10-25 Thread Andrew Murray
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.

With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters. Let's add support for 64-bit event
counters.

As KVM doesn't yet support 64-bit event counters, we also trap
and emulate the Debug Feature Registers to limit the PMU version a
guest sees to PMUv3 for ARMv8.4.

Tested by running the following perf command on both guest and host
and ensuring that the figures are very similar:

perf stat -e armv8_pmuv3/inst_retired,long=1/ \
  -e armv8_pmuv3/inst_retired,long=0/ -e cycles


Andrew Murray (3):
  arm64: cpufeature: Extract capped fields
  KVM: arm64: limit PMU version to ARMv8.4
  arm64: perf: Add support for ARMv8.5-PMU 64-bit counters

 arch/arm64/include/asm/cpufeature.h | 15 +
 arch/arm64/include/asm/perf_event.h |  3 +-
 arch/arm64/include/asm/sysreg.h |  4 ++
 arch/arm64/kernel/perf_event.c  | 86 +++--
 arch/arm64/kvm/sys_regs.c   | 36 +++-
 include/linux/perf/arm_pmu.h|  1 +
 6 files changed, 125 insertions(+), 20 deletions(-)

-- 
2.21.0



Re: [PATCH v3 4/4] KVM: arm64: pmu: Reset sample period on overflow handling

2019-10-15 Thread Andrew Murray
On Fri, Oct 11, 2019 at 01:39:54PM +0100, Marc Zyngier wrote:
> The PMU emulation code uses the perf event sample period to trigger
> the overflow detection. This works fine  for the *first* overflow
> handling, but results in a huge number of interrupts on the host,
> unrelated to the number of interrupts handled in the guest (a x20
> factor is pretty common for the cycle counter). On a slow system
> (such as a SW model), this can result in the guest only making
> forward progress at a glacial pace.
> 
> It turns out that the clue is in the name. The sample period is
> exactly that: a period. And once the an overflow has occured,
> the following period should be the full width of the associated
> counter, instead of whatever the guest had initially programed.
> 
> Reset the sample period to the architected value in the overflow
> handler, which now results in a number of host interrupts that is
> much closer to the number of interrupts in the guest.
> 
> Fixes: b02386eb7dac ("arm64: KVM: Add PMU overflow interrupt routing")
> Signed-off-by: Marc Zyngier 
> ---

Reviewed-by: Andrew Murray 

>  virt/kvm/arm/pmu.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index f291d4ac3519..8731dfeced8b 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -8,6 +8,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -442,8 +443,25 @@ static void kvm_pmu_perf_overflow(struct perf_event 
> *perf_event,
> struct pt_regs *regs)
>  {
>   struct kvm_pmc *pmc = perf_event->overflow_handler_context;
> + struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu);
>   struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>   int idx = pmc->idx;
> + u64 period;
> +
> + cpu_pmu->pmu.stop(perf_event, PERF_EF_UPDATE);
> +
> + /*
> +  * Reset the sample period to the architectural limit,
> +  * i.e. the point where the counter overflows.
> +  */
> + period = -(local64_read(_event->count));
> +
> + if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> + period &= GENMASK(31, 0);
> +
> + local64_set(_event->hw.period_left, 0);
> + perf_event->attr.sample_period = period;
> + perf_event->hw.sample_period = period;
>  
>   __vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(idx);
>  
> @@ -451,6 +469,8 @@ static void kvm_pmu_perf_overflow(struct perf_event 
> *perf_event,
>   kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);
>   }
> +
> + cpu_pmu->pmu.start(perf_event, PERF_EF_RELOAD);
>  }
>  
>  /**
> -- 
> 2.20.1
> 


Re: [PATCH v3 3/4] KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel event

2019-10-11 Thread Andrew Murray
On Fri, Oct 11, 2019 at 01:39:53PM +0100, Marc Zyngier wrote:
> The current convention for KVM to request a chained event from the
> host PMU is to set bit[0] in attr.config1 (PERF_ATTR_CFG1_KVM_PMU_CHAINED).
> 
> But as it turns out, this bit gets set *after* we create the kernel
> event that backs our virtual counter, meaning that we never get
> a 64bit counter.
> 
> Moving the setting to an earlier point solves the problem.
> 
> Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
> Signed-off-by: Marc Zyngier 

Reviewed-by: Andrew Murray 

> ---
>  virt/kvm/arm/pmu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index c30c3a74fc7f..f291d4ac3519 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -569,12 +569,12 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu 
> *vcpu, u64 select_idx)
>* high counter.
>*/
>   attr.sample_period = (-counter) & GENMASK(63, 0);
> + if (kvm_pmu_counter_is_enabled(vcpu, pmc->idx + 1))
> + attr.config1 |= PERF_ATTR_CFG1_KVM_PMU_CHAINED;
> +
>   event = perf_event_create_kernel_counter(&attr, -1, current,
>kvm_pmu_perf_overflow,
>pmc + 1);
> -
> - if (kvm_pmu_counter_is_enabled(vcpu, pmc->idx + 1))
> - attr.config1 |= PERF_ATTR_CFG1_KVM_PMU_CHAINED;
>   } else {
>   /* The initial sample period (overflow count) of an event. */
>   if (kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> -- 
> 2.20.1
> 


Re: [PATCH v2 5/5] KVM: arm64: pmu: Reset sample period on overflow handling

2019-10-11 Thread Andrew Murray
On Fri, Oct 11, 2019 at 12:28:48PM +0100, Marc Zyngier wrote:
> On Tue, 8 Oct 2019 23:42:22 +0100
> Andrew Murray  wrote:
> 
> > On Tue, Oct 08, 2019 at 05:01:28PM +0100, Marc Zyngier wrote:
> > > The PMU emulation code uses the perf event sample period to trigger
> > > the overflow detection. This works fine  for the *first* overflow
> > > handling, but results in a huge number of interrupts on the host,
> > > unrelated to the number of interrupts handled in the guest (a x20
> > > factor is pretty common for the cycle counter). On a slow system
> > > (such as a SW model), this can result in the guest only making
> > > forward progress at a glacial pace.
> > > 
> > > It turns out that the clue is in the name. The sample period is
> > > exactly that: a period. And once an overflow has occurred,
> > > the following period should be the full width of the associated
> > > counter, instead of whatever the guest had initially programmed.
> > > 
> > > Reset the sample period to the architected value in the overflow
> > > handler, which now results in a number of host interrupts that is
> > > much closer to the number of interrupts in the guest.
> > > 
> > > Fixes: b02386eb7dac ("arm64: KVM: Add PMU overflow interrupt routing")
> > > Signed-off-by: Marc Zyngier 
> > > ---
> > >  virt/kvm/arm/pmu.c | 15 +++
> > >  1 file changed, 15 insertions(+)
> > > 
> > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> > > index 25a483a04beb..8b524d74c68a 100644
> > > --- a/virt/kvm/arm/pmu.c
> > > +++ b/virt/kvm/arm/pmu.c
> > > @@ -442,6 +442,20 @@ static void kvm_pmu_perf_overflow(struct perf_event 
> > > *perf_event,
> > >   struct kvm_pmc *pmc = perf_event->overflow_handler_context;
> > >   struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
> > >   int idx = pmc->idx;
> > > + u64 period;
> > > +
> > > + /*
> > > +  * Reset the sample period to the architectural limit,
> > > +  * i.e. the point where the counter overflows.
> > > +  */
> > > + period = -(local64_read(&pmc->perf_event->count));
> > > +
> > > + if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> > > + period &= GENMASK(31, 0);
> > > +
> > > + local64_set(&pmc->perf_event->hw.period_left, 0);
> > > + pmc->perf_event->attr.sample_period = period;
> > > + pmc->perf_event->hw.sample_period = period;  
> > 
> > I believe that above, you are reducing the period by the amount period_left
> > would have been - they cancel each other out.
> 
> That's not what I see happening, having put some traces:
> 
>  kvm_pmu_perf_overflow: count = 308 left = 129
>  kvm_pmu_perf_overflow: count = 409 left = 47
>  kvm_pmu_perf_overflow: count = 585 left = 223
>  kvm_pmu_perf_overflow: count = 775 left = 413
>  kvm_pmu_perf_overflow: count = 1368 left = 986
>  kvm_pmu_perf_overflow: count = 2086 left = 1716
>  kvm_pmu_perf_overflow: count = 958 left = 584
>  kvm_pmu_perf_overflow: count = 1907 left = 1551
>  kvm_pmu_perf_overflow: count = 7292 left = 6932

Indeed.

> 
> although I've now moved the stop/start calls inside the overflow
> handler so that I don't have to mess with the PMU backend.
> 
> > Given that kvm_pmu_perf_overflow is now always called between a
> > cpu_pmu->pmu.stop and a cpu_pmu->pmu.start, it means armpmu_event_update
> > has been called prior to this function, and armpmu_event_set_period will
> > be called after...
> > 
> > Therefore, I think the above could be reduced to:
> > 
> > +   /*
> > +* Reset the sample period to the architectural limit,
> > +* i.e. the point where the counter overflows.
> > +*/
> > +   u64 period = GENMASK(63, 0);
> > +   if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> > +   period = GENMASK(31, 0);
> > +
> > +   pmc->perf_event->attr.sample_period = period;
> > +   pmc->perf_event->hw.sample_period = period;
> > 
> > This is because armpmu_event_set_period takes into account the overflow
> > and the counter wrapping via the "if (unlikely(left <= 0)) {" block.
> 
> I think that's an oversimplification. As shown above, the counter has
> moved forward, and there is a delta to be accounted for.
> 

Yeah, I probably need to spend more time understanding this...

> > Though this code confuses me easily, so I may be talking rubbish.
> 
> Same here! ;-)
> 
> > 
> > >  
> > >   __vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(idx);

Re: [PATCH v2 5/5] KVM: arm64: pmu: Reset sample period on overflow handling

2019-10-08 Thread Andrew Murray
On Tue, Oct 08, 2019 at 05:01:28PM +0100, Marc Zyngier wrote:
> The PMU emulation code uses the perf event sample period to trigger
> the overflow detection. This works fine  for the *first* overflow
> handling, but results in a huge number of interrupts on the host,
> unrelated to the number of interrupts handled in the guest (a x20
> factor is pretty common for the cycle counter). On a slow system
> (such as a SW model), this can result in the guest only making
> forward progress at a glacial pace.
> 
> It turns out that the clue is in the name. The sample period is
> exactly that: a period. And once an overflow has occurred,
> the following period should be the full width of the associated
> counter, instead of whatever the guest had initially programmed.
> 
> Reset the sample period to the architected value in the overflow
> handler, which now results in a number of host interrupts that is
> much closer to the number of interrupts in the guest.
> 
> Fixes: b02386eb7dac ("arm64: KVM: Add PMU overflow interrupt routing")
> Signed-off-by: Marc Zyngier 
> ---
>  virt/kvm/arm/pmu.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 25a483a04beb..8b524d74c68a 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -442,6 +442,20 @@ static void kvm_pmu_perf_overflow(struct perf_event 
> *perf_event,
>   struct kvm_pmc *pmc = perf_event->overflow_handler_context;
>   struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>   int idx = pmc->idx;
> + u64 period;
> +
> + /*
> +  * Reset the sample period to the architectural limit,
> +  * i.e. the point where the counter overflows.
> +  */
> + period = -(local64_read(&pmc->perf_event->count));
> +
> + if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> + period &= GENMASK(31, 0);
> +
> + local64_set(&pmc->perf_event->hw.period_left, 0);
> + pmc->perf_event->attr.sample_period = period;
> + pmc->perf_event->hw.sample_period = period;

I believe that above, you are reducing the period by the amount that
period_left would have been - so the two adjustments cancel each other out.

Given that kvm_pmu_perf_overflow is now always called between a
cpu_pmu->pmu.stop and a cpu_pmu->pmu.start, it means armpmu_event_update
has been called prior to this function, and armpmu_event_set_period will
be called after...

Therefore, I think the above could be reduced to:

+   /*
+* Reset the sample period to the architectural limit,
+* i.e. the point where the counter overflows.
+*/
+   u64 period = GENMASK(63, 0);
+   if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
+   period = GENMASK(31, 0);
+
+   pmc->perf_event->attr.sample_period = period;
+   pmc->perf_event->hw.sample_period = period;

This is because armpmu_event_set_period takes into account the overflow
and the counter wrapping via the "if (unlikely(left <= 0)) {" block.
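For reference, the relevant logic is roughly the following (paraphrased and
abbreviated sketch of armpmu_event_set_period() from drivers/perf/arm_pmu.c,
untested; write_counter_sketch() is a made-up stand-in for the driver's
write_counter callback):

/* Stand-in for the driver's write_counter() callback (illustrative only) */
static void write_counter_sketch(struct perf_event *event, u64 val);

static int armpmu_event_set_period_sketch(struct perf_event *event,
                                          u64 max_period)
{
        struct hw_perf_event *hwc = &event->hw;
        s64 left = local64_read(&hwc->period_left);
        s64 period = hwc->sample_period;
        int ret = 0;

        if (unlikely(left <= -period)) {
                left = period;
                local64_set(&hwc->period_left, left);
                hwc->last_period = period;
                ret = 1;
        }

        if (unlikely(left <= 0)) {
                /* The overflow/wrap is folded back into the next period */
                left += period;
                local64_set(&hwc->period_left, left);
                hwc->last_period = period;
                ret = 1;
        }

        /* Program the counter so it overflows after 'left' more events */
        local64_set(&hwc->prev_count, (u64)-left);
        write_counter_sketch(event, (u64)(-left) & max_period);

        return ret;
}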

Though this code confuses me easily, so I may be talking rubbish.

>  
>   __vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(idx);
>  
> @@ -557,6 +571,7 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu 
> *vcpu, u64 select_idx)
>   attr.exclude_host = 1; /* Don't count host events */
>   attr.config = (pmc->idx == ARMV8_PMU_CYCLE_IDX) ?
>   ARMV8_PMUV3_PERFCTR_CPU_CYCLES : eventsel;
> + attr.config1 = PERF_ATTR_CFG1_RELOAD_EVENT;

I'm not sure that this flag, or patch 4 is really needed. As the perf
events created by KVM are pinned to the task and exclude_(host,hv) are set -
I think the perf event is not active at this point. Therefore if you change
the sample period, you can wait until the perf event gets scheduled back in
(when you return to the guest) where its call to pmu.start will result in
armpmu_event_set_period being called. In other words the pmu.start and
pmu.stop you add in patch 4 is effectively being done for you by perf when
the KVM task is switched out.

I'd be interested to see if the following works:

+   WARN_ON(pmc->perf_event->state == PERF_EVENT_STATE_ACTIVE)
+
+   /*
+* Reset the sample period to the architectural limit,
+* i.e. the point where the counter overflows.
+*/
+   u64 period = GENMASK(63, 0);
+   if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
+   period = GENMASK(31, 0);
+
+   pmc->perf_event->attr.sample_period = period;
+   pmc->perf_event->hw.sample_period = period;

>  
>   counter = kvm_pmu_get_pair_counter_value(vcpu, pmc);
>  

What about ARM 32 bit support for this?

Thanks,

Andrew Murray

> -- 
> 2.20.1
> 


Re: [PATCH v2 4/5] arm64: perf: Add reload-on-overflow capability

2019-10-08 Thread Andrew Murray
On Tue, Oct 08, 2019 at 05:01:27PM +0100, Marc Zyngier wrote:
> As KVM uses perf as a way to emulate an ARMv8 PMU, it needs to
> be able to change the sample period as part of the overflow
> handling (once an overflow has taken place, the following
> overflow point is the overflow of the virtual counter).
> 
> Deleting and recreating the in-kernel event is difficult, as
> we're in interrupt context. Instead, we can teach the PMU driver
> a new trick, which is to stop the event before the overflow handling,
> and reprogram it once it has been handled. This would give KVM
> the opportunity to adjust the next sample period. This feature
> is gated on a new flag that can get set by KVM in a subsequent
> patch.
> 
> Whilst we're at it, move the CHAINED flag from the KVM emulation
> to the perf_event.h file and adjust the PMU code accordingly.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/perf_event.h | 4 
>  arch/arm64/kernel/perf_event.c  | 8 +++-
>  virt/kvm/arm/pmu.c  | 4 +---
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/perf_event.h 
> b/arch/arm64/include/asm/perf_event.h
> index 2bdbc79bbd01..8b6b38f2db8e 100644
> --- a/arch/arm64/include/asm/perf_event.h
> +++ b/arch/arm64/include/asm/perf_event.h
> @@ -223,4 +223,8 @@ extern unsigned long perf_misc_flags(struct pt_regs 
> *regs);
>   (regs)->pstate = PSR_MODE_EL1h; \
>  }
>  
> +/* Flags used by KVM, among others */
> +#define PERF_ATTR_CFG1_CHAINED_EVENT (1U << 0)
> +#define PERF_ATTR_CFG1_RELOAD_EVENT  (1U << 1)
> +
>  #endif
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index a0b4f1bca491..98907c9e5508 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -322,7 +322,7 @@ PMU_FORMAT_ATTR(long, "config1:0");
>  
>  static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
>  {
> - return event->attr.config1 & 0x1;
> + return event->attr.config1 & PERF_ATTR_CFG1_CHAINED_EVENT;

I'm pleased to see this be replaced with a define, it helps readers see the
link between this and the KVM driver.

>  }
>  
>  static struct attribute *armv8_pmuv3_format_attrs[] = {
> @@ -736,8 +736,14 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu 
> *cpu_pmu)
>   if (!armpmu_event_set_period(event))
>   continue;
>  
> + if (event->attr.config1 & PERF_ATTR_CFG1_RELOAD_EVENT)
> + cpu_pmu->pmu.stop(event, PERF_EF_RELOAD);

I believe PERF_EF_RELOAD is only intended to be used in the start calls. I'd
suggest that you replace it with PERF_EF_UPDATE in the stop call instead; this
tells the PMU to update the counter with the latest value from the hardware.
(Though the ARM PMU driver always does this regardless of the flag anyway.)
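For reference, the flag semantics I have in mind (paraphrased from
include/linux/perf_event.h - worth double-checking against your tree):

/* Paraphrased from include/linux/perf_event.h */
#define PERF_EF_START   0x01    /* start the counter when adding     */
#define PERF_EF_RELOAD  0x02    /* reload the counter when starting  */
#define PERF_EF_UPDATE  0x04    /* update the counter when stopping  */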

Thanks,

Andrew Murray

> +
>   if (perf_event_overflow(event, &data, regs))
>   cpu_pmu->disable(event);
> +
> + if (event->attr.config1 & PERF_ATTR_CFG1_RELOAD_EVENT)
> + cpu_pmu->pmu.start(event, PERF_EF_RELOAD);
>   }
>   armv8pmu_start(cpu_pmu);
>  
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index f291d4ac3519..25a483a04beb 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -15,8 +15,6 @@
>  
>  static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx);
>  
> -#define PERF_ATTR_CFG1_KVM_PMU_CHAINED 0x1
> -
>  /**
>   * kvm_pmu_idx_is_64bit - determine if select_idx is a 64bit counter
>   * @vcpu: The vcpu pointer
> @@ -570,7 +568,7 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu 
> *vcpu, u64 select_idx)
>*/
>   attr.sample_period = (-counter) & GENMASK(63, 0);
>   if (kvm_pmu_counter_is_enabled(vcpu, pmc->idx + 1))
> - attr.config1 |= PERF_ATTR_CFG1_KVM_PMU_CHAINED;
> + attr.config1 |= PERF_ATTR_CFG1_CHAINED_EVENT;
>  
>   event = perf_event_create_kernel_counter(, -1, current,
>kvm_pmu_perf_overflow,
> -- 
> 2.20.1
> 


Re: [PATCH v2 3/5] KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel event

2019-10-08 Thread Andrew Murray
On Tue, Oct 08, 2019 at 05:01:26PM +0100, Marc Zyngier wrote:
> The current convention for KVM to request a chained event from the
> host PMU is to set bit[0] in attr.config1 (PERF_ATTR_CFG1_KVM_PMU_CHAINED).
> 
> But as it turns out, this bit gets set *after* we create the kernel
> event that backs our virtual counter, meaning that we never get
> a 64bit counter.
> 
> Moving the setting to an earlier point solves the problem.
> 
> Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
> Signed-off-by: Marc Zyngier 

Reviewed-by: Andrew Murray 

> ---
>  virt/kvm/arm/pmu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index c30c3a74fc7f..f291d4ac3519 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -569,12 +569,12 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu 
> *vcpu, u64 select_idx)
>* high counter.
>*/
>   attr.sample_period = (-counter) & GENMASK(63, 0);
> + if (kvm_pmu_counter_is_enabled(vcpu, pmc->idx + 1))
> + attr.config1 |= PERF_ATTR_CFG1_KVM_PMU_CHAINED;
> +
>   event = perf_event_create_kernel_counter(&attr, -1, current,
>kvm_pmu_perf_overflow,
>pmc + 1);
> -
> - if (kvm_pmu_counter_is_enabled(vcpu, pmc->idx + 1))
> - attr.config1 |= PERF_ATTR_CFG1_KVM_PMU_CHAINED;
>   } else {
>   /* The initial sample period (overflow count) of an event. */
>   if (kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> -- 
> 2.20.1
> 


Re: [PATCH 3/3] KVM: arm64: pmu: Reset sample period on overflow handling

2019-10-07 Thread Andrew Murray
On Mon, Oct 07, 2019 at 11:48:33AM +0100, Marc Zyngier wrote:
> On Mon, 07 Oct 2019 10:43:27 +0100,
> Andrew Murray  wrote:
> > 
> > On Sun, Oct 06, 2019 at 11:46:36AM +0100, m...@kernel.org wrote:
> > > From: Marc Zyngier 
> > > 
> > > The PMU emulation code uses the perf event sample period to trigger
> > > the overflow detection. This works fine  for the *first* overflow
> > > handling
> > 
> > Although, even though the first overflow is timed correctly, the value
> > the guest reads may be wrong...
> > 
> > Assuming a Linux guest with the arm_pmu.c driver, if I recall correctly
> > this writes the negated remaining period to the counter upon stopping/starting.
> > In the case of a perf_event that is pinned to a task, this will happen
> > upon every context switch of that task. If the counter was getting close
> > to overflow before the context switch, then the value written to the
> > guest counter will be very high and thus the sample_period written in KVM
> > will be very low...
> > 
> > The best scenario is when the host handles the overflow, the guest
> > handles its overflow and rewrites the guest counter (resulting in a new
> > host perf_event) - all before the first host perf_event fires again. This
> > is clearly the assumption the code makes.
> > 
> > Or - the host handles its overflow and kicks the guest, but the guest
> > doesn't respond in time, so we end up endlessly and pointlessly kicking it
> > for each host overflow - thus resulting in the large difference between 
> > number
> > of interrupts between host and guest. This isn't ideal, because when the
> > guest does read its counter, the value isn't correct (because it overflowed
> > a zillion times at a value less than the architected max).
> > 
> > Worse still is when the sample_period is so small, the host doesn't
> > even keep up.
> 
> Well, there are plenty of ways to make this code go mad. The
> overarching reason is that we abuse the notion of sampling period to
> generate interrupts, while what we'd really like is something that
> says "call be back in that many events", rather than the sampling
> period which doesn't match the architecture.
> 
> Yes, small values will results in large drifts. Nothing we can do
> about it.
> 
> > 
> > > , but results in a huge number of interrupts on the host,
> > > unrelated to the number of interrupts handled in the guest (a x20
> > > factor is pretty common for the cycle counter). On a slow system
> > > (such as a SW model), this can result in the guest only making
> > > forward progress at a glacial pace.
> > > 
> > > It turns out that the clue is in the name. The sample period is
> > > exactly that: a period. And once an overflow has occurred,
> > > the following period should be the full width of the associated
> > > counter, instead of whatever the guest had initially programmed.
> > > 
> > > Reset the sample period to the architected value in the overflow
> > > handler, which now results in a number of host interrupts that is
> > > much closer to the number of interrupts in the guest.
> > 
> > This seems a reasonable pragmatic approach - though of course you will end
> > up counting slightly slower due to the host interrupt latency. But that's
> > better than the status quo.
> 
> Slower than what?
> 

Slower than the guest should expect. Assuming a cycle counter (with LC) is
initially programmed to 0, you'd target a guest interrupt period of 2^64 x cycle
period...

But I'm wrong in saying that you end up counting slightly slower - as you're
not restarting the perf counter or changing the value, there should be no
change in the interrupt period seen by the guest.

I was considering the case where the kernel perf event is recreated in the
overflow handler, in which case, unless you consider the time elapsed between
the event firing and changing the sample_period, you end up with a larger
period.

> > 
> > It may be possible with perf to have a single-fire counter (this mitigates
> > against my third scenario but you still end up with a loss of precision) -
> > See PERF_EVENT_IOC_REFRESH.
> 
> Unfortunately, that's a userspace interface, not something that's
> available to the kernel at large...

The mechanism to change the value of event->event_limit is only available via
ioctl, though I was implying that an in-kernel mechanism could be provided.
This would be trivial. (But it doesn't help, as I don't think you could create
another perf kernel event in that context).
 
> 
> > Ideally the PERF_EVENT_IOC_REFRESH type of functionality could be updated
> > to reload to a different value after the first hit.

Re: [PATCH 3/3] KVM: arm64: pmu: Reset sample period on overflow handling

2019-10-07 Thread Andrew Murray
On Sun, Oct 06, 2019 at 11:46:36AM +0100, m...@kernel.org wrote:
> From: Marc Zyngier 
> 
> The PMU emulation code uses the perf event sample period to trigger
> the overflow detection. This works fine  for the *first* overflow
> handling

Although, even though the first overflow is timed correctly, the value
the guest reads may be wrong...

Assuming a Linux guest with the arm_pmu.c driver, if I recall correctly
this writes the negated remaining period to the counter upon stopping/starting.
In the case of a perf_event that is pinned to a task, this will happen
upon every context switch of that task. If the counter was getting close
to overflow before the context switch, then the value written to the
guest counter will be very high and thus the sample_period written in KVM
will be very low...

The best scenario is when the host handles the overflow, the guest
handles its overflow and rewrites the guest counter (resulting in a new
host perf_event) - all before the first host perf_event fires again. This
is clearly the assumption the code makes.

Or - the host handles its overflow and kicks the guest, but the guest
doesn't respond in time, so we end up endlessly and pointlessly kicking it
for each host overflow - thus resulting in the large difference between number
of interrupts between host and guest. This isn't ideal, because when the
guest does read its counter, the value isn't correct (because it overflowed
a zillion times at a value less than the architected max).

Worse still is when the sample_period is so small, the host doesn't
even keep up.

> , but results in a huge number of interrupts on the host,
> unrelated to the number of interrupts handled in the guest (a x20
> factor is pretty common for the cycle counter). On a slow system
> (such as a SW model), this can result in the guest only making
> forward progress at a glacial pace.
> 
> It turns out that the clue is in the name. The sample period is
> exactly that: a period. And once an overflow has occurred,
> the following period should be the full width of the associated
> counter, instead of whatever the guest had initially programmed.
> 
> Reset the sample period to the architected value in the overflow
> handler, which now results in a number of host interrupts that is
> much closer to the number of interrupts in the guest.

This seems a reasonable pragmatic approach - though of course you will end
up counting slightly slower due to the host interrupt latency. But that's
better than the status quo.

It may be possible with perf to have a single-fire counter (this mitigates
against my third scenario but you still end up with a loss of precision) -
See PERF_EVENT_IOC_REFRESH.

Ideally the PERF_EVENT_IOC_REFRESH type of functionality could be updated
to reload to a different value after the first hit.
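For illustration, the single-fire behaviour looks roughly like this from
userspace (untested sketch, error handling omitted; the function name and
period value are made up):

#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Arm a cycle counter that delivers exactly one overflow, then stops */
static int open_single_fire_cycles(unsigned long long period)
{
        struct perf_event_attr attr;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.sample_period = period;
        attr.disabled = 1;

        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);

        /* Allow one more overflow; the kernel disables the event after that */
        ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);

        return fd;
}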

This problem also exists on arch/x86/kvm/pmu.c (though I'm not sure what
their PMU drivers do with respect to the value they write).

> 
> Fixes: b02386eb7dac ("arm64: KVM: Add PMU overflow interrupt routing")
> Signed-off-by: Marc Zyngier 
> ---
>  virt/kvm/arm/pmu.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index c30c3a74fc7f..3ca4761fc0f5 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -444,6 +444,18 @@ static void kvm_pmu_perf_overflow(struct perf_event 
> *perf_event,
>   struct kvm_pmc *pmc = perf_event->overflow_handler_context;
>   struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>   int idx = pmc->idx;
> + u64 val, period;
> +
> + /* Start by resetting the sample period to the architectural limit */
> + val = kvm_pmu_get_pair_counter_value(vcpu, pmc);
> +
> + if (kvm_pmu_idx_is_64bit(vcpu, pmc->idx))

This is correct, because in this case we *do* care about _PMCR_LC.

> + period = (-val) & GENMASK(63, 0);
> + else
> + period = (-val) & GENMASK(31, 0);
> +
> + pmc->perf_event->attr.sample_period = period;
> + pmc->perf_event->hw.sample_period = period;

I'm not sure about the above line - does direct manipulation of sample_period
work on a running perf event? As far as I can tell this is already done in the
kernel with __perf_event_period - however this also does other stuff (such as
disable and re-enable the event).
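For what it's worth, the kind of bracketing I'd expect to be needed looks
roughly like this (untested sketch; kvm_pmu_update_period_sketch is a made-up
name, and the stop/start callbacks are the PMU's own):

/* Sketch: update the sample period while the event is stopped around it */
static void kvm_pmu_update_period_sketch(struct perf_event *event, u64 period)
{
        struct pmu *pmu = event->pmu;

        /* Fold the current hardware count back into event->count */
        pmu->stop(event, PERF_EF_UPDATE);

        /* Discard the stale residue and install the new period */
        local64_set(&event->hw.period_left, 0);
        event->attr.sample_period = period;
        event->hw.sample_period = period;

        /* Reprogram the counter with the new period */
        pmu->start(event, PERF_EF_RELOAD);
}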

>  

Thanks,

Andrew Murray

>   __vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(idx);
>  
> -- 
> 2.20.1
> 


Re: [PATCH 1/3] KVM: arm64: pmu: Fix cycle counter truncation

2019-10-07 Thread Andrew Murray
On Sun, Oct 06, 2019 at 11:46:34AM +0100, m...@kernel.org wrote:
> From: Marc Zyngier 
> 
> When a counter is disabled, its value is sampled before the event
> is being disabled, and the value written back in the shadow register.
> 
> In that process, the value gets truncated to 32bit, which is adequate
> for any counter but the cycle counter (defined as a 64bit counter).
> 
> This obviously results in a corrupted counter, and things like
> "perf record -e cycles" not working at all when run in a guest...
> A similar, but less critical bug exists in kvm_pmu_get_counter_value.
> 
> Make the truncation conditional on the counter not being the cycle
> counter, which results in a minor code reorganisation.
> 
> Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
> Cc: Andrew Murray 
> Reported-by: Julien Thierry 
> Signed-off-by: Marc Zyngier 
> ---

Reviewed-by: Andrew Murray 

>  virt/kvm/arm/pmu.c | 22 --
>  1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 362a01886bab..c30c3a74fc7f 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -146,8 +146,7 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 
> select_idx)
>   if (kvm_pmu_pmc_is_chained(pmc) &&
>   kvm_pmu_idx_is_high_counter(select_idx))
>   counter = upper_32_bits(counter);
> -
> - else if (!kvm_pmu_idx_is_64bit(vcpu, select_idx))
> + else if (select_idx != ARMV8_PMU_CYCLE_IDX)
>   counter = lower_32_bits(counter);
>  
>   return counter;
> @@ -193,7 +192,7 @@ static void kvm_pmu_release_perf_event(struct kvm_pmc 
> *pmc)
>   */
>  static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc)
>  {
> - u64 counter, reg;
> + u64 counter, reg, val;
>  
>   pmc = kvm_pmu_get_canonical_pmc(pmc);
>   if (!pmc->perf_event)
> @@ -201,16 +200,19 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, 
> struct kvm_pmc *pmc)
>  
>   counter = kvm_pmu_get_pair_counter_value(vcpu, pmc);
>  
> - if (kvm_pmu_pmc_is_chained(pmc)) {
> - reg = PMEVCNTR0_EL0 + pmc->idx;
> - __vcpu_sys_reg(vcpu, reg) = lower_32_bits(counter);
> - __vcpu_sys_reg(vcpu, reg + 1) = upper_32_bits(counter);
> + if (pmc->idx == ARMV8_PMU_CYCLE_IDX) {
> + reg = PMCCNTR_EL0;
> + val = counter;
>   } else {
> - reg = (pmc->idx == ARMV8_PMU_CYCLE_IDX)
> -? PMCCNTR_EL0 : PMEVCNTR0_EL0 + pmc->idx;
> - __vcpu_sys_reg(vcpu, reg) = lower_32_bits(counter);
> + reg = PMEVCNTR0_EL0 + pmc->idx;
> + val = lower_32_bits(counter);
>   }
>  
> + __vcpu_sys_reg(vcpu, reg) = val;
> +
> + if (kvm_pmu_pmc_is_chained(pmc))
> + __vcpu_sys_reg(vcpu, reg + 1) = upper_32_bits(counter);
> +
>   kvm_pmu_release_perf_event(pmc);
>  }
>  
> -- 
> 2.20.1
> 


Re: [PATCH] KVM: arm64: pmu: Fix cycle counter truncation on counter stop

2019-10-04 Thread Andrew Murray
On Fri, Oct 04, 2019 at 03:42:57PM +0100, Marc Zyngier wrote:
> On Fri, 04 Oct 2019 15:10:06 +0100,
> Andrew Murray  wrote:
> > 
> > On Fri, Oct 04, 2019 at 11:08:29AM +0100, Marc Zyngier wrote:
> > > On Fri, 4 Oct 2019 09:55:55 +0100
> > > Andrew Murray  wrote:
> > > 
> > > > On Thu, Oct 03, 2019 at 06:24:00PM +0100, Marc Zyngier wrote:
> > > > > When a counter is disabled, its value is sampled before the event
> > > > > is being disabled, and the value written back in the shadow register.
> > > > > 
> > > > > In that process, the value gets truncated to 32bit, which is adequate 
> > > > >  
> > > > 
> > > > Doh, that shouldn't have happened.
> > > > 
> > > > > for any counter but the cycle counter, which can be configured to
> > > > > hold a 64bit value. This obviously results in a corrupted counter,
> > > > > and things like "perf record -e cycles" not working at all when
> > > > > run in a guest...
> > > > > 
> > > > > Make the truncation conditional on the counter not being 64bit.
> > > > > 
> > > > > Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
> > > > > Cc: Andrew Murray 
> > > > > Reported-by: Julien Thierry 
> > > > > Signed-off-by: Marc Zyngier 
> > > > > ---
> > > > >  virt/kvm/arm/pmu.c | 4 +++-
> > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> > > > > index 362a01886bab..d716aef2bae9 100644
> > > > > --- a/virt/kvm/arm/pmu.c
> > > > > +++ b/virt/kvm/arm/pmu.c
> > > > > @@ -206,9 +206,11 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu 
> > > > > *vcpu, struct kvm_pmc *pmc)
> > > > >   __vcpu_sys_reg(vcpu, reg) = lower_32_bits(counter);
> > > > >   __vcpu_sys_reg(vcpu, reg + 1) = upper_32_bits(counter);
> > > > >   } else {
> > > > > + if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> > > > > + counter = lower_32_bits(counter);
> > > > >   reg = (pmc->idx == ARMV8_PMU_CYCLE_IDX)
> > > > >  ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + pmc->idx;
> > > > > - __vcpu_sys_reg(vcpu, reg) = lower_32_bits(counter);
> > > > > + __vcpu_sys_reg(vcpu, reg) = counter;  
> > > > 
> > > > The other uses of lower_32_bits look OK to me.
> > > > 
> > > > Reviewed-by: Andrew Murray 
> > > > 
> > > > As a side note, I'm not convinced that the implementation (or perhaps 
> > > > the
> > > > use of) kvm_pmu_idx_is_64bit is correct:
> > > > 
> > > > static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
> > > > {
> > > > return (select_idx == ARMV8_PMU_CYCLE_IDX &&
> > > > __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
> > > > }
> > > > 
> > > > We shouldn't truncate the value of a cycle counter to 32 bits just 
> > > > because
> > > > _PMCR_LC is unset. We should only be interested in _PMCR_LC when setting
> > > > the sample_period.
> > > 
> > > That's a good point. The ARMv8 ARM says:
> > > 
> > > "Long cycle counter enable. Determines when unsigned overflow is
> > > recorded by the cycle counter overflow bit."
> > > 
> > > which doesn't say anything about the counter being truncated one way or
> > > another.
> > > 
> > > > If you agree this is wrong, I'll spin a change.
> > > 
> > > I still think kvm_pmu_idx_is_64bit() correct, and would be easily
> > > extended to supporting the ARMv8.5-PMU extension. However, it'd be
> > > better to just detect the cycle counter in the current patch rather
> > > than relying on the above helper:
> > 
> > I guess at present kvm_pmu_idx_is_64bit has the meaning "does the counter
> > have a 64 bit overflow". (And we check for the CYCLE_IDX because at
> > present that's the only thing that *can* have a 64bit overflow.)
> 
> Exactly. The function is badly named, but hey, we'll live with it
> until we refactor this further.
> 

Re: [PATCH] KVM: arm64: pmu: Fix cycle counter truncation on counter stop

2019-10-04 Thread Andrew Murray
On Fri, Oct 04, 2019 at 11:08:29AM +0100, Marc Zyngier wrote:
> On Fri, 4 Oct 2019 09:55:55 +0100
> Andrew Murray  wrote:
> 
> > On Thu, Oct 03, 2019 at 06:24:00PM +0100, Marc Zyngier wrote:
> > > When a counter is disabled, its value is sampled before the event
> > > is being disabled, and the value written back in the shadow register.
> > > 
> > > In that process, the value gets truncated to 32bit, which is adequate  
> > 
> > Doh, that shouldn't have happened.
> > 
> > > for any counter but the cycle counter, which can be configured to
> > > hold a 64bit value. This obviously results in a corrupted counter,
> > > and things like "perf record -e cycles" not working at all when
> > > run in a guest...
> > > 
> > > Make the truncation conditional on the counter not being 64bit.
> > > 
> > > Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
> > > Cc: Andrew Murray 
> > > Reported-by: Julien Thierry 
> > > Signed-off-by: Marc Zyngier 
> > > ---
> > >  virt/kvm/arm/pmu.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> > > index 362a01886bab..d716aef2bae9 100644
> > > --- a/virt/kvm/arm/pmu.c
> > > +++ b/virt/kvm/arm/pmu.c
> > > @@ -206,9 +206,11 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu 
> > > *vcpu, struct kvm_pmc *pmc)
> > >   __vcpu_sys_reg(vcpu, reg) = lower_32_bits(counter);
> > >   __vcpu_sys_reg(vcpu, reg + 1) = upper_32_bits(counter);
> > >   } else {
> > > + if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> > > + counter = lower_32_bits(counter);
> > >   reg = (pmc->idx == ARMV8_PMU_CYCLE_IDX)
> > >  ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + pmc->idx;
> > > - __vcpu_sys_reg(vcpu, reg) = lower_32_bits(counter);
> > > + __vcpu_sys_reg(vcpu, reg) = counter;  
> > 
> > The other uses of lower_32_bits look OK to me.
> > 
> > Reviewed-by: Andrew Murray 
> > 
> > As a side note, I'm not convinced that the implementation (or perhaps the
> > use of) kvm_pmu_idx_is_64bit is correct:
> > 
> > static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
> > {
> > return (select_idx == ARMV8_PMU_CYCLE_IDX &&
> > __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
> > }
> > 
> > We shouldn't truncate the value of a cycle counter to 32 bits just because
> > _PMCR_LC is unset. We should only be interested in _PMCR_LC when setting
> > the sample_period.
> 
> That's a good point. The ARMv8 ARM says:
> 
> "Long cycle counter enable. Determines when unsigned overflow is
> recorded by the cycle counter overflow bit."
> 
> which doesn't say anything about the counter being truncated one way or
> another.
> 
> > If you agree this is wrong, I'll spin a change.
> 
> I still think kvm_pmu_idx_is_64bit() correct, and would be easily
> extended to supporting the ARMv8.5-PMU extension. However, it'd be
> better to just detect the cycle counter in the current patch rather
> than relying on the above helper:

I guess at present kvm_pmu_idx_is_64bit has the meaning "does the counter
have a 64 bit overflow". (And we check for the CYCLE_IDX because at
present that's the only thing that *can* have a 64bit overflow.)
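
To make the distinction explicit, I'd picture something like this
(illustrative names only, untested):

/* Is the counter architecturally 64 bits wide? (governs value truncation) */
static bool kvm_pmu_idx_is_64bit_wide(struct kvm_vcpu *vcpu, u64 select_idx)
{
        return select_idx == ARMV8_PMU_CYCLE_IDX;
}

/* Does the counter overflow at 64 bits? (governs the sample period) */
static bool kvm_pmu_idx_has_64bit_overflow(struct kvm_vcpu *vcpu,
                                           u64 select_idx)
{
        return select_idx == ARMV8_PMU_CYCLE_IDX &&
               (__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
}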

> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index d716aef2bae9..90a90d8f7280 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -206,7 +206,7 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, 
> struct kvm_pmc *pmc)
>   __vcpu_sys_reg(vcpu, reg) = lower_32_bits(counter);
>   __vcpu_sys_reg(vcpu, reg + 1) = upper_32_bits(counter);
>   } else {
> - if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> + if (pmc->idx != ARMV8_PMU_CYCLE_IDX)
>   counter = lower_32_bits(counter);
>   reg = (pmc->idx == ARMV8_PMU_CYCLE_IDX)
>  ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + pmc->idx;
> 

That looks fine to me.

> 
> As for revamping the rest of the code, that's 5.5 material.

The only other change required would be as follows:

diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 362a01886bab..2435119b8524 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -147,7 +147,7 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx)
kvm_pm

Re: [PATCH] KVM: arm64: pmu: Fix cycle counter truncation on counter stop

2019-10-04 Thread Andrew Murray
On Thu, Oct 03, 2019 at 06:24:00PM +0100, Marc Zyngier wrote:
> When a counter is disabled, its value is sampled before the event
> is being disabled, and the value written back in the shadow register.
> 
> In that process, the value gets truncated to 32bit, which is adequate

Doh, that shouldn't have happened.

> for any counter but the cycle counter, which can be configured to
> hold a 64bit value. This obviously results in a corrupted counter,
> and things like "perf record -e cycles" not working at all when
> run in a guest...
> 
> Make the truncation conditional on the counter not being 64bit.
> 
> Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
> Cc: Andrew Murray 
> Reported-by: Julien Thierry 
> Signed-off-by: Marc Zyngier 
> ---
>  virt/kvm/arm/pmu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 362a01886bab..d716aef2bae9 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -206,9 +206,11 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, 
> struct kvm_pmc *pmc)
>   __vcpu_sys_reg(vcpu, reg) = lower_32_bits(counter);
>   __vcpu_sys_reg(vcpu, reg + 1) = upper_32_bits(counter);
>   } else {
> + if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> + counter = lower_32_bits(counter);
>   reg = (pmc->idx == ARMV8_PMU_CYCLE_IDX)
>  ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + pmc->idx;
> - __vcpu_sys_reg(vcpu, reg) = lower_32_bits(counter);
> + __vcpu_sys_reg(vcpu, reg) = counter;

The other uses of lower_32_bits look OK to me.

Reviewed-by: Andrew Murray 

As a side note, I'm not convinced that the implementation (or perhaps the
use of) kvm_pmu_idx_is_64bit is correct:

static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
{
return (select_idx == ARMV8_PMU_CYCLE_IDX &&
__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
}

We shouldn't truncate the value of a cycle counter to 32 bits just because
_PMCR_LC is unset. We should only be interested in _PMCR_LC when setting
the sample_period.

If you agree this is wrong, I'll spin a change.

Though unsetting _PMCR_LC is deprecated so I can't imagine this causes any
issue.

Thanks,

Andrew Murray

>   }
>  
>   kvm_pmu_release_perf_event(pmc);
> -- 
> 2.20.1
> 


Re: [PATCH 04/35] irqchip/gic-v3: Detect GICv4.1 supporting RVPEID

2019-09-24 Thread Andrew Murray
On Tue, Sep 24, 2019 at 11:49:24AM +0100, Marc Zyngier wrote:
> On 24/09/2019 11:24, Andrew Murray wrote:
> > On Mon, Sep 23, 2019 at 07:25:35PM +0100, Marc Zyngier wrote:
> >> GICv4.1 supports the RVPEID ("Residency per vPE ID"), which allows for
> >> a much more efficient way of making virtual CPUs resident (to allow direct
> >> injection of interrupts).
> >>
> >> The functionality needs to be discovered on each and every redistributor
> >> in the system, and disabled if the settings are inconsistent.
> >>
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  drivers/irqchip/irq-gic-v3.c   | 21 ++---
> >>  include/linux/irqchip/arm-gic-v3.h |  2 ++
> >>  2 files changed, 20 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> >> index 422664ac5f53..0b545e2c3498 100644
> >> --- a/drivers/irqchip/irq-gic-v3.c
> >> +++ b/drivers/irqchip/irq-gic-v3.c
> >> @@ -849,8 +849,21 @@ static int __gic_update_rdist_properties(struct 
> >> redist_region *region,
> >> void __iomem *ptr)
> >>  {
> >>u64 typer = gic_read_typer(ptr + GICR_TYPER);
> >> +
> >>gic_data.rdists.has_vlpis &= !!(typer & GICR_TYPER_VLPIS);
> >> -  gic_data.rdists.has_direct_lpi &= !!(typer & GICR_TYPER_DirectLPIS);
> >> +
> >> +  /* RVPEID implies some form of DirectLPI, no matter what the doc 
> >> says... :-/ */
> > 
> > I think the doc says, RVPEID is *always* 1 for GICv4.1 (and presumably 
> > beyond)
> > and when RVPEID==1 then DirectLPI is *always* 0 - but that's OK because for
> > GICv4.1 support for direct LPIs is mandatory.
> 
> Well, v4.1 support for DirectLPI is pretty patchy. It has just enough
> features to make it useful.
> 
> > 
> >> +  gic_data.rdists.has_rvpeid &= !!(typer & GICR_TYPER_RVPEID);
> >> +  gic_data.rdists.has_direct_lpi &= (!!(typer & GICR_TYPER_DirectLPIS) |
> >> + gic_data.rdists.has_rvpeid);
> >> +
> >> +  /* Detect non-sensical configurations */
> >> +  if (WARN_ON_ONCE(gic_data.rdists.has_rvpeid && 
> >> !gic_data.rdists.has_vlpis)) {
> > 
> > How feasible is the following situation? All the redistributors in the
> > system have vlpis=0, and only the first redistributor has rvpeid=1 (with
> > the remaining ones rvpeid=0). If we evaluate this WARN_ON_ONCE on each
> > call to __gic_update_rdist_properties we end up without direct LPI
> > support, however if we evaluated this after iterating through all the
> > redistributors then we'd end up with direct LPI support and a
> > non-essential WARN.
> > 
> > Should we do the WARN after iterating through all the redistributors once we
> > know what the final values of these flags will be, perhaps in
> > gic_update_rdist_properties?
> 
> What does it gains us?

It prevents an unnecessary WARN.

If the first redistributor has rvpeid=1, vlpis=0, direct_lpi=1, and the others
have rvpeid=0, vlpis=0, direct_lpi=0. At the end of iteration, without the
WARN if statement, you end up with rvpeid=0, vlpis=0, direct_lpi=0. I.e. it's
done the right thing. In this use-case the WARN doesn't achieve anything other
than give the user a pointless WARN. If the WARN was moved to after iteration
then the WARN wouldn't fire.

I have no idea how likely this use-case is.
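
Roughly what I had in mind (untested sketch; it assumes the existing
gic_iterate_rdists() iteration helper and keeps the reporting as it is in
the patch):

static void gic_update_rdist_properties(void)
{
        gic_data.ppi_nr = UINT_MAX;
        gic_iterate_rdists(__gic_update_rdist_properties);

        /* Only warn once the aggregated view of all redistributors is known */
        if (WARN_ON(gic_data.rdists.has_rvpeid && !gic_data.rdists.has_vlpis)) {
                gic_data.rdists.has_direct_lpi = false;
                gic_data.rdists.has_vlpis = false;
                gic_data.rdists.has_rvpeid = false;
        }

        if (WARN_ON(gic_data.ppi_nr == UINT_MAX))
                gic_data.ppi_nr = 0;
        pr_info("%d PPIs implemented\n", gic_data.ppi_nr);
        pr_info("%sVLPI support, %sdirect LPI support, %sRVPEID support\n",
                !gic_data.rdists.has_vlpis ? "no " : "",
                !gic_data.rdists.has_direct_lpi ? "no " : "",
                !gic_data.rdists.has_rvpeid ? "no " : "");
}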

> The moment we've detected any inconsistency, any
> use of DirectLPI or VLPIs is a big nono, because the redistributors care
> not designed to communicate with each other, and we might as well do
> that early. Frankly, the HW should have stayed in someone's lab. The
> only reason I have that code in is to detect the FVP model being
> misconfigured, which is pretty easy to do

Thanks,

Andrew Murray

> 
>   M.
> -- 
> Jazz is not dead, it just smells funny...


Re: [PATCH 05/35] irqchip/gic-v3: Add GICv4.1 VPEID size discovery

2019-09-24 Thread Andrew Murray
On Mon, Sep 23, 2019 at 07:25:36PM +0100, Marc Zyngier wrote:
> While GICv4.0 mandates 16 bit worth of VPEIDs, GICv4.1 allows smaller

s/VPEIDs/vPEIDs/

> implementations to be built. Add the required glue to dynamically
> compute the limit.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  drivers/irqchip/irq-gic-v3-its.c   | 11 ++-
>  drivers/irqchip/irq-gic-v3.c   |  3 +++
>  include/linux/irqchip/arm-gic-v3.h |  5 +
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index c94eb287393b..17b77a0b9d97 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -119,7 +119,16 @@ struct its_node {
>  #define ITS_ITT_ALIGNSZ_256
>  
>  /* The maximum number of VPEID bits supported by VLPI commands */
> -#define ITS_MAX_VPEID_BITS   (16)
> +#define ITS_MAX_VPEID_BITS   \
> + ({  \
> + int nvpeid = 16;\
> + if (gic_rdists->has_rvpeid &&   \

We use rvpeid as a way of determining whether this is a GICv4.1
implementation - are there any other means of determining this? If we use it
in this way, is there any benefit to having a has_gicv4_1 type of flag
instead?

Also for 'insane' configurations we set has_rvpeid to false, thus preventing
this feature. Does it make sense to do that?

GICD_TYPER2 is reserved in GICv4, however I understand this reads as RES0,
can we just rely on that instead? (We read it below anyway).
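
i.e. something like this (untested sketch, relying on GICD_TYPER2 reading as
RES0 on pre-4.1 implementations so the has_rvpeid check becomes redundant):

/* The maximum number of VPEID bits supported by VLPI commands */
#define ITS_MAX_VPEID_BITS                                              \
        ({                                                              \
                int nvpeid = 16;                                        \
                if (gic_rdists->gicd_typer2 & GICD_TYPER2_VIL)          \
                        nvpeid = 1 + (gic_rdists->gicd_typer2 &         \
                                      GICD_TYPER2_VID);                 \
                                                                        \
                nvpeid;                                                 \
        })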

> + gic_rdists->gicd_typer2 & GICD_TYPER2_VIL)  \
> + nvpeid = 1 + (gic_rdists->gicd_typer2 & \
> +   GICD_TYPER2_VID); \
> + \
> + nvpeid; \
> + })
>  #define ITS_MAX_VPEID(1 << (ITS_MAX_VPEID_BITS))
>  
>  /* Convert page order to size in bytes */
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 0b545e2c3498..fb6360161d6c 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -1556,6 +1556,9 @@ static int __init gic_init_bases(void __iomem 
> *dist_base,
>  
>   pr_info("%d SPIs implemented\n", GIC_LINE_NR - 32);
>   pr_info("%d Extended SPIs implemented\n", GIC_ESPI_NR);
> +
> + gic_data.rdists.gicd_typer2 = readl_relaxed(gic_data.dist_base + 
> GICD_TYPER2);
> +
>   gic_data.domain = irq_domain_create_tree(handle, &gic_irq_domain_ops,
>            &gic_data);
>   irq_domain_update_bus_token(gic_data.domain, DOMAIN_BUS_WIRED);
> diff --git a/include/linux/irqchip/arm-gic-v3.h 
> b/include/linux/irqchip/arm-gic-v3.h
> index b34e0c113697..71730b9def0c 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -13,6 +13,7 @@
>  #define GICD_CTLR0x
>  #define GICD_TYPER   0x0004
>  #define GICD_IIDR0x0008
> +#define GICD_TYPER2  0x000C
>  #define GICD_STATUSR 0x0010
>  #define GICD_SETSPI_NSR  0x0040
>  #define GICD_CLRSPI_NSR  0x0048
> @@ -89,6 +90,9 @@
>  #define GICD_TYPER_ESPIS(typer)  
> \
>   (((typer) & GICD_TYPER_ESPI) ? GICD_TYPER_SPIS((typer) >> 27) : 0)
>  
> +#define GICD_TYPER2_VIL  (1U << 7)
> +#define GICD_TYPER2_VID  GENMASK(4, 0)

Given that the 4th bit is reserved for future expansion and values greater
than 0xF are reserved, is there value in changing this to GENMASK(3, 0)?

> +
>  #define GICD_IROUTER_SPI_MODE_ONE(0U << 31)
>  #define GICD_IROUTER_SPI_MODE_ANY(1U << 31)
>  
> @@ -613,6 +617,7 @@ struct rdists {
>   void*prop_table_va;
>   u64 flags;
>   u32 gicd_typer;
> + u32 gicd_typer2;
>   boolhas_vlpis;
>   boolhas_rvpeid;
>   boolhas_direct_lpi;
> -- 
> 2.20.1
> 


Re: [PATCH 04/35] irqchip/gic-v3: Detect GICv4.1 supporting RVPEID

2019-09-24 Thread Andrew Murray
On Mon, Sep 23, 2019 at 07:25:35PM +0100, Marc Zyngier wrote:
> GICv4.1 supports the RVPEID ("Residency per vPE ID"), which allows for
> a much more efficient way of making virtual CPUs resident (to allow direct
> injection of interrupts).
> 
> The functionality needs to be discovered on each and every redistributor
> in the system, and disabled if the settings are inconsistent.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  drivers/irqchip/irq-gic-v3.c   | 21 ++---
>  include/linux/irqchip/arm-gic-v3.h |  2 ++
>  2 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 422664ac5f53..0b545e2c3498 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -849,8 +849,21 @@ static int __gic_update_rdist_properties(struct 
> redist_region *region,
>void __iomem *ptr)
>  {
>   u64 typer = gic_read_typer(ptr + GICR_TYPER);
> +
>   gic_data.rdists.has_vlpis &= !!(typer & GICR_TYPER_VLPIS);
> - gic_data.rdists.has_direct_lpi &= !!(typer & GICR_TYPER_DirectLPIS);
> +
> + /* RVPEID implies some form of DirectLPI, no matter what the doc 
> says... :-/ */

I think the doc says, RVPEID is *always* 1 for GICv4.1 (and presumably beyond)
and when RVPEID==1 then DirectLPI is *always* 0 - but that's OK because for
GICv4.1 support for direct LPIs is mandatory.

> + gic_data.rdists.has_rvpeid &= !!(typer & GICR_TYPER_RVPEID);
> + gic_data.rdists.has_direct_lpi &= (!!(typer & GICR_TYPER_DirectLPIS) |
> +gic_data.rdists.has_rvpeid);
> +
> + /* Detect non-sensical configurations */
> + if (WARN_ON_ONCE(gic_data.rdists.has_rvpeid && 
> !gic_data.rdists.has_vlpis)) {

How feasible is the following situation? All the redistributors in the system
have vlpis=0, and only the first redistributor has rvpeid=1 (with the
remaining ones rvpeid=0). If we evaluate this WARN_ON_ONCE on each call to
__gic_update_rdist_properties we end up without direct LPI support, however
if we evaluated this after iterating through all the redistributors then we'd
end up with direct LPI support and a non-essential WARN.

Should we do the WARN after iterating through all the redistributors once we
know what the final values of these flags will be, perhaps in
gic_update_rdist_properties?

> + gic_data.rdists.has_direct_lpi = false;
> + gic_data.rdists.has_vlpis = false;
> + gic_data.rdists.has_rvpeid = false;
> + }
> +
>   gic_data.ppi_nr = min(GICR_TYPER_NR_PPIS(typer), gic_data.ppi_nr);
>  
>   return 1;
> @@ -863,9 +876,10 @@ static void gic_update_rdist_properties(void)
>   if (WARN_ON(gic_data.ppi_nr == UINT_MAX))
>   gic_data.ppi_nr = 0;
>   pr_info("%d PPIs implemented\n", gic_data.ppi_nr);
> - pr_info("%sVLPI support, %sdirect LPI support\n",
> + pr_info("%sVLPI support, %sdirect LPI support, %sRVPEID support\n",
>   !gic_data.rdists.has_vlpis ? "no " : "",
> - !gic_data.rdists.has_direct_lpi ? "no " : "");
> + !gic_data.rdists.has_direct_lpi ? "no " : "",
> + !gic_data.rdists.has_rvpeid ? "no " : "");
>  }
>  
>  /* Check whether it's single security state view */
> @@ -1546,6 +1560,7 @@ static int __init gic_init_bases(void __iomem 
> *dist_base,
>    &gic_data);
>   irq_domain_update_bus_token(gic_data.domain, DOMAIN_BUS_WIRED);
>   gic_data.rdists.rdist = alloc_percpu(typeof(*gic_data.rdists.rdist));
> + gic_data.rdists.has_rvpeid = true;
>   gic_data.rdists.has_vlpis = true;
>   gic_data.rdists.has_direct_lpi = true;
>  
> diff --git a/include/linux/irqchip/arm-gic-v3.h 
> b/include/linux/irqchip/arm-gic-v3.h
> index 5cc10cf7cb3e..b34e0c113697 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -234,6 +234,7 @@
>  #define GICR_TYPER_VLPIS (1U << 1)
>  #define GICR_TYPER_DirectLPIS(1U << 3)
>  #define GICR_TYPER_LAST  (1U << 4)
> +#define GICR_TYPER_RVPEID(1U << 7)
>  
>  #define GIC_V3_REDIST_SIZE   0x2
>  
> @@ -613,6 +614,7 @@ struct rdists {
>   u64 flags;
>   u32 gicd_typer;
>   boolhas_vlpis;
> + boolhas_rvpeid;
>   boolhas_direct_lpi;
>  };
>  
> -- 
> 2.20.1
> 


Re: [PATCH 02/35] irqchip/gic-v3-its: Factor out wait_for_syncr primitive

2019-09-24 Thread Andrew Murray
On Mon, Sep 23, 2019 at 07:25:33PM +0100, Marc Zyngier wrote:
> Waiting for a redistributor to have performed an operation is a
> common thing to do, and the idiom is already spread around.
> As we're going to make even more use of this, let's have a primitive
> that does just that.
> 
> Signed-off-by: Marc Zyngier 

Reviewed-by: Andrew Murray 

> ---
>  drivers/irqchip/irq-gic-v3-its.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index 62e54f1a248b..58cb233cf138 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1075,6 +1075,12 @@ static void lpi_write_config(struct irq_data *d, u8 
> clr, u8 set)
>   dsb(ishst);
>  }
>  
> +static void wait_for_syncr(void __iomem *rdbase)
> +{
> + while (gic_read_lpir(rdbase + GICR_SYNCR) & 1)
> + cpu_relax();
> +}
> +
>  static void lpi_update_config(struct irq_data *d, u8 clr, u8 set)
>  {
>   struct its_device *its_dev = irq_data_get_irq_chip_data(d);
> @@ -2757,8 +2763,7 @@ static void its_vpe_db_proxy_move(struct its_vpe *vpe, 
> int from, int to)
>  
>   rdbase = per_cpu_ptr(gic_rdists->rdist, from)->rd_base;
>   gic_write_lpir(vpe->vpe_db_lpi, rdbase + GICR_CLRLPIR);
> - while (gic_read_lpir(rdbase + GICR_SYNCR) & 1)
> - cpu_relax();
> + wait_for_syncr(rdbase);
>  
>   return;
>   }
> @@ -2914,8 +2919,7 @@ static void its_vpe_send_inv(struct irq_data *d)
>  
>   rdbase = per_cpu_ptr(gic_rdists->rdist, vpe->col_idx)->rd_base;
>   gic_write_lpir(vpe->vpe_db_lpi, rdbase + GICR_INVLPIR);
> - while (gic_read_lpir(rdbase + GICR_SYNCR) & 1)
> - cpu_relax();
> + wait_for_syncr(rdbase);
>   } else {
>   its_vpe_send_cmd(vpe, its_send_inv);
>   }
> @@ -2957,8 +2961,7 @@ static int its_vpe_set_irqchip_state(struct irq_data *d,
>   gic_write_lpir(vpe->vpe_db_lpi, rdbase + GICR_SETLPIR);
>   } else {
>   gic_write_lpir(vpe->vpe_db_lpi, rdbase + GICR_CLRLPIR);
> - while (gic_read_lpir(rdbase + GICR_SYNCR) & 1)
> - cpu_relax();
> + wait_for_syncr(rdbase);
>   }
>   } else {
>   if (state)
> -- 
> 2.20.1
> 


Re: [PATCH] KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put

2019-09-05 Thread Andrew Murray
raction by adding this comment? Surely we just
want to indicate that we're done with ITS for now - do whatever you need to do.

This would have made more sense to me if the comment above was removed in this
patch rather than added.

>*/
>   preempt_disable();
>   kvm_vgic_vmcr_sync(vcpu);
> + vgic_v4_put(vcpu, true);
>   preempt_enable();
> -
> - kvm_vgic_v4_enable_doorbell(vcpu);
>  }
>  
>  void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
>  {
> - kvm_vgic_v4_disable_doorbell(vcpu);
> + preempt_disable();
> + vgic_v4_load(vcpu);
> + preempt_enable();
>  }
>  
>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 8d69f007dd0c..48307a9eb1d8 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -664,6 +664,8 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
>  
>   if (has_vhe())
>   __vgic_v3_activate_traps(vcpu);
> +
> + WARN_ON(vgic_v4_load(vcpu));
>  }
>  
>  void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
> @@ -676,6 +678,8 @@ void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
>  
>  void vgic_v3_put(struct kvm_vcpu *vcpu)
>  {
> + WARN_ON(vgic_v4_put(vcpu, false));
> +
>   vgic_v3_vmcr_sync(vcpu);
>  
>   kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
> diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> index 477af6aebb97..3a8a28854b13 100644
> --- a/virt/kvm/arm/vgic/vgic-v4.c
> +++ b/virt/kvm/arm/vgic/vgic-v4.c
> @@ -85,6 +85,10 @@ static irqreturn_t vgic_v4_doorbell_handler(int irq, void 
> *info)
>  {
>   struct kvm_vcpu *vcpu = info;
>  
> + /* We got the message, no need to fire again */
> + if (!irqd_irq_disabled(&irq_to_desc(irq)->irq_data))
> + disable_irq_nosync(irq);
> +
>   vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
>   kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>   kvm_vcpu_kick(vcpu);

This is because the doorbell will fire each time any guest device interrupts,
whereas we only need to tell the guest once that something has happened -
right?

> @@ -192,20 +196,30 @@ void vgic_v4_teardown(struct kvm *kvm)
>   its_vm->vpes = NULL;
>  }
>  
> -int vgic_v4_sync_hwstate(struct kvm_vcpu *vcpu)
> +int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db)
>  {
> - if (!vgic_supports_direct_msis(vcpu->kvm))
> + struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
> + struct irq_desc *desc = irq_to_desc(vpe->irq);
> +
> + if (!vgic_supports_direct_msis(vcpu->kvm) || !vpe->resident)
>   return 0;

Are we using !vpe->resident to avoid pointlessly doing work we've
already done?

>  
> - return its_schedule_vpe(&vcpu->arch.vgic_cpu.vgic_v3.its_vpe, false);
> + /*
> +  * If blocking, a doorbell is required. Undo the nested
> +  * disable_irq() calls...
> +  */
> + while (need_db && irqd_irq_disabled(&desc->irq_data))
> + enable_irq(vpe->irq);
> +
> + return its_schedule_vpe(vpe, false);
>  }
>  
> -int vgic_v4_flush_hwstate(struct kvm_vcpu *vcpu)
> +int vgic_v4_load(struct kvm_vcpu *vcpu)
>  {
> - int irq = vcpu->arch.vgic_cpu.vgic_v3.its_vpe.irq;
> + struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
>   int err;
>  
> - if (!vgic_supports_direct_msis(vcpu->kvm))
> + if (!vgic_supports_direct_msis(vcpu->kvm) || vpe->resident)
>   return 0;
>  
>   /*
> @@ -214,11 +228,14 @@ int vgic_v4_flush_hwstate(struct kvm_vcpu *vcpu)
>* doc in drivers/irqchip/irq-gic-v4.c to understand how this
>* turns into a VMOVP command at the ITS level.
>*/
> - err = irq_set_affinity(irq, cpumask_of(smp_processor_id()));
> + err = irq_set_affinity(vpe->irq, cpumask_of(smp_processor_id()));
>   if (err)
>   return err;
>  
> - err = its_schedule_vpe(&vcpu->arch.vgic_cpu.vgic_v3.its_vpe, true);
> + /* Disabled the doorbell, as we're about to enter the guest */
> + disable_irq(vpe->irq);
> +
> + err = its_schedule_vpe(vpe, true);
>   if (err)
>   return err;

Given that the doorbell corresponds with vpe residency, it could make sense
to add a helper here that pairs its_schedule_vpe with the matching
[disable,enable]_irq call - see the sketch below. Though I see that
vgic_v3_put calls vgic_v4_put with need_db=false; I wonder what the effect
of setting that to true would be for vgic_v3_put? Is it known that v3 will
never need need_db to be true?
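
Something like this is what I have in mind (a hypothetical helper only, and
it deliberately ignores the nested disable_irq()/need_db handling above):

/* Hypothetical sketch - pair the residency change with the doorbell
 * state in one place. The nested-disable and need_db handling from
 * vgic_v4_put() is left out here for brevity.
 */
static int vgic_v4_set_resident(struct its_vpe *vpe, bool resident)
{
	if (resident)
		disable_irq(vpe->irq);	/* no doorbell needed while resident */
	else
		enable_irq(vpe->irq);	/* doorbell wanted when non-resident */

	return its_schedule_vpe(vpe, resident);
}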

Thanks,

Andrew Murray

>  
> @@ -226,9 +243,7 @@ int vgic_v4_flush_hwstate(struct kvm_vcpu *vcpu)
>* Now that the VPE is resident, 

Re: BUG: KASAN: slab-out-of-bounds in kvm_pmu_get_canonical_pmc+0x48/0x78

2019-07-16 Thread Andrew Murray
On Tue, Jul 16, 2019 at 11:14:37PM +0800, Zenghui Yu wrote:
> 
> On 2019/7/16 23:05, Zenghui Yu wrote:
> > Hi folks,
> > 
> > Running the latest kernel with KASAN enabled, we will hit the following
> > KASAN BUG during guest's boot process.
> > 
> > I'm in commit 9637d517347e80ee2fe1c5d8ce45ba1b88d8b5cd.
> > 
> > Any problems in the chained PMU code? Or just a false positive?
> > 
> > ---8<---
> > 
> > [  654.706268]
> > ==
> > [  654.706280] BUG: KASAN: slab-out-of-bounds in
> > kvm_pmu_get_canonical_pmc+0x48/0x78
> > [  654.706286] Read of size 8 at addr 801d6c8fea38 by task
> > qemu-kvm/23268
> > 
> > [  654.706296] CPU: 2 PID: 23268 Comm: qemu-kvm Not tainted 5.2.0+ #178
> > [  654.706301] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58
> > 10/24/2018
> > [  654.706305] Call trace:
> > [  654.706311]  dump_backtrace+0x0/0x238
> > [  654.706317]  show_stack+0x24/0x30
> > [  654.706325]  dump_stack+0xe0/0x134
> > [  654.706332]  print_address_description+0x80/0x408
> > [  654.706338]  __kasan_report+0x164/0x1a0
> > [  654.706343]  kasan_report+0xc/0x18
> > [  654.706348]  __asan_load8+0x88/0xb0
> > [  654.706353]  kvm_pmu_get_canonical_pmc+0x48/0x78
> 
> I noticed that we will use "pmc->idx" and the "chained" bitmap to
> determine if the pmc is chained, in kvm_pmu_pmc_is_chained().
> 
> Should we initialize the idx and the bitmap appropriately before
> doing kvm_pmu_stop_counter()?  Like:

Hi Zenghui,

Thanks for spotting this and investigating - I'll make sure to use KASAN
in the future when testing...

> 
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 3dd8238..cf3119a 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -224,12 +224,12 @@ void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
>   int i;
>   struct kvm_pmu *pmu = &vcpu->arch.pmu;
> 
> + bitmap_zero(vcpu->arch.pmu.chained, ARMV8_PMU_MAX_COUNTER_PAIRS);
> +
>   for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) {
> - kvm_pmu_stop_counter(vcpu, &pmu->pmc[i]);
>   pmu->pmc[i].idx = i;
> + kvm_pmu_stop_counter(vcpu, &pmu->pmc[i]);
>   }
> -
> - bitmap_zero(vcpu->arch.pmu.chained, ARMV8_PMU_MAX_COUNTER_PAIRS);
>  }

We have to be a little careful here, as the vcpu may be reset after use.
Upon resetting we must ensure that any existing perf_events are released -
this is why kvm_pmu_stop_counter is called before bitmap_zero (as
kvm_pmu_stop_counter relies on kvm_pmu_pmc_is_chained).

(For example, by clearing the bitmap before stopping the counters, we would
attempt to release the perf event for both pmcs in a chained pair, whereas
we should only release the canonical pmc. It's actually OK right now because
the non-canonical pmc's perf_event is set to NULL - but who knows whether that
will hold true in the future. The code makes the assumption that the
non-canonical perf event isn't touched on a chained pair.)

The KASAN bug gets fixed by moving the assignment of idx before 
kvm_pmu_stop_counter. Therefore I'd suggest you drop the bitmap_zero hunks.
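
For clarity, an untested sketch of what the reset path would then look like
(this is just the hunks above combined):

void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
{
	int i;
	struct kvm_pmu *pmu = &vcpu->arch.pmu;

	for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) {
		/* idx must be valid before kvm_pmu_stop_counter() uses
		 * kvm_pmu_pmc_is_chained() on this pmc */
		pmu->pmc[i].idx = i;
		kvm_pmu_stop_counter(vcpu, &pmu->pmc[i]);
	}

	bitmap_zero(vcpu->arch.pmu.chained, ARMV8_PMU_MAX_COUNTER_PAIRS);
}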

Can you send a patch with just the idx assignment hunk please?

Thanks,

Andrew Murray

> 
>  /**
> 
> 
> Thanks,
> zenghui
> 
> > [  654.706358]  kvm_pmu_stop_counter+0x28/0x118
> > [  654.706363]  kvm_pmu_vcpu_reset+0x60/0xa8
> > [  654.706369]  kvm_reset_vcpu+0x30/0x4d8
> > [  654.706376]  kvm_arch_vcpu_ioctl+0xa04/0xc18
> > [  654.706381]  kvm_vcpu_ioctl+0x17c/0xde8
> > [  654.706387]  do_vfs_ioctl+0x150/0xaf8
> > [  654.706392]  ksys_ioctl+0x84/0xb8
> > [  654.706397]  __arm64_sys_ioctl+0x4c/0x60
> > [  654.706403]  el0_svc_common.constprop.0+0xb4/0x208
> > [  654.706409]  el0_svc_handler+0x3c/0xa8
> > [  654.706414]  el0_svc+0x8/0xc
> > 
> > [  654.706422] Allocated by task 23268:
> > [  654.706429]  __kasan_kmalloc.isra.0+0xd0/0x180
> > [  654.706435]  kasan_slab_alloc+0x14/0x20
> > [  654.706440]  kmem_cache_alloc+0x17c/0x4a8
> > [  654.706445]  kvm_arch_vcpu_create+0xa0/0x130
> > [  654.706451]  kvm_vm_ioctl+0x844/0x1218
> > [  654.706456]  do_vfs_ioctl+0x150/0xaf8
> > [  654.706461]  ksys_ioctl+0x84/0xb8
> > [  654.706466]  __arm64_sys_ioctl+0x4c/0x60
> > [  654.706472]  el0_svc_common.constprop.0+0xb4/0x208
> > [  654.706478]  el0_svc_handler+0x3c/0xa8
> > [  654.706482]  el0_svc+0x8/0xc
> > 
> > [  654.706490] Freed by task 0:
> > [  654.706493] (stack is not available)
> > 
> > [  654.706501] The buggy address belongs to the object at 801d6c

Re: [PATCH] KVM: ARM64: Update perf event when setting PMU count value

2019-06-17 Thread Andrew Murray
On Wed, Jun 12, 2019 at 09:47:05PM +0800, Xiang Zheng wrote:
> 
> On 2019/5/22 0:44, Andrew Murray wrote:
> > On Sun, May 19, 2019 at 06:05:59PM +0800, Xiang Zheng wrote:
> >> Guest will adjust the sample period and set PMU counter value when
> >> it takes a long time to handle the PMU interrupts.
> >>
> >> However, we don't have a corresponding change on the virtual PMU
> >> which is emulated via a perf event. It could cause a large number
> >> of PMU interrupts injected to guest. Then guest will get hang for
> >> handling these interrupts.
> > 
> > Yes this is indeed an issue. I believe I've addressed this in my 'chained
> > pmu' series - the relevant patch is here...
> > 
> > https://lists.cs.columbia.edu/pipermail/kvmarm/2019-May/035933.html
> > 
> > Some other comments below.
> > 
> 
> Sorry for that I didn't notice your patches...
> I will test your patch series.

Thanks.

> 
> >>
> >> So update the sample_period of perf event if the counter value is
> >> changed to avoid this case.
> >>
> >> Signed-off-by: Xiang Zheng 
> >> ---
> >>  virt/kvm/arm/pmu.c | 54 
> >> +-
> >>  1 file changed, 45 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> >> index 1c5b76c..cbad3ec 100644
> >> --- a/virt/kvm/arm/pmu.c
> >> +++ b/virt/kvm/arm/pmu.c
> >> @@ -24,6 +24,11 @@
> >>  #include 
> >>  #include 
> >>  
> >> +static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc 
> >> *pmc);
> >> +static struct perf_event *kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu,
> >> +  struct kvm_pmc *pmc,
> >> +  struct perf_event_attr 
> >> *attr);
> >> +
> >>  /**
> >>   * kvm_pmu_get_counter_value - get PMU counter value
> >>   * @vcpu: The vcpu pointer
> >> @@ -57,11 +62,29 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, 
> >> u64 select_idx)
> >>   */
> >>  void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 
> >> val)
> >>  {
> >> -  u64 reg;
> >> +  u64 reg, counter, old_sample_period;
> >> +  struct kvm_pmu *pmu = &vcpu->arch.pmu;
> >> +  struct kvm_pmc *pmc = &pmu->pmc[select_idx];
> >> +  struct perf_event *event;
> >> +  struct perf_event_attr attr;
> >>  
> >>reg = (select_idx == ARMV8_PMU_CYCLE_IDX)
> >>  ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx;
> >>__vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, 
> >> select_idx);
> >> +
> >> +  if (pmc->perf_event) {
> >> +  attr = pmc->perf_event->attr;
> >> +  old_sample_period = attr.sample_period;
> >> +  counter = kvm_pmu_get_counter_value(vcpu, select_idx);
> >> +  attr.sample_period = (-counter) & pmc->bitmask;
> >> +  if (attr.sample_period == old_sample_period)
> >> +  return;
> > 
> > I'd be interested to know how often this would evaluate to true.
> > 
> 
> I have counted it while running my test script; the result shows that it
> evaluated to true 1552288 times and false 8294235 times.
> 
> I think different testcases may produce different results.

You may find that this occurs more often when you are using pinned events, e.g.
when the counter is pinned to a process. When this happens perf stops and starts
the counter each time the process is switched in/out. The ARM pmu
(drivers/perf/arm_pmu.c) resets the period each time the counter is restarted
(armpmu_start), and thus rewrites the same value to the hardware counter (I 
think).

If you run "perf stat -e instructions" you'll probably find the number reduces.

I guess there is a balance between doing unnecessary work (recreating the
kernel event) and code complexity here. However there is scope for similar
optimisations, such as not recreating the event when someone writes the same
event type (kvm_pmu_set_counter_event_type) - see the sketch below.
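
For illustration only (untested, and simply layered on top of
kvm_pmu_set_counter_event_type as it appears in v10 of the chained-counter
series), such a guard could look like:

void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
				    u64 select_idx)
{
	u64 reg, event_type = data & ARMV8_PMU_EVTYPE_MASK;

	reg = (select_idx == ARMV8_PMU_CYCLE_IDX)
	      ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + select_idx;

	/* Same type as before - keep the existing perf event */
	if (__vcpu_sys_reg(vcpu, reg) == event_type)
		return;

	__vcpu_sys_reg(vcpu, reg) = event_type;
	kvm_pmu_create_perf_event(vcpu, select_idx);
}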

> 
> >> +
> >> +  kvm_pmu_stop_counter(vcpu, pmc);
> >> +  event = kvm_pmu_create_perf_event(vcpu, pmc, &attr);
> > 
> > I'm not sure it's necessary to change the prototype of 
> > kvm_pmu_create_perf_event
> > as this function will recalculate the sample period based on the updated 
> > counter
> > value anyway.
> >

[PATCH v10 3/5] KVM: arm/arm64: re-create event when setting counter value

2019-06-17 Thread Andrew Murray
The perf event sample_period is currently set based upon the current
counter value, when PMXEVTYPER is written to and the perf event is created.
However the user may choose to write the type before the counter value in
which case sample_period will be set incorrectly. Let's instead decouple
event creation from PMXEVTYPER and (re)create the event in either
situation.

Signed-off-by: Andrew Murray 
Reviewed-by: Julien Thierry 
Reviewed-by: Suzuki K Poulose 
---
 virt/kvm/arm/pmu.c | 42 +-
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 6e7c179103a6..ae1e886d4a1a 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 
+static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx);
 /**
  * kvm_pmu_get_counter_value - get PMU counter value
  * @vcpu: The vcpu pointer
@@ -62,6 +63,9 @@ void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx, u64 val)
reg = (select_idx == ARMV8_PMU_CYCLE_IDX)
  ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx;
__vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, 
select_idx);
+
+   /* Recreate the perf event to reflect the updated sample_period */
+   kvm_pmu_create_perf_event(vcpu, select_idx);
 }
 
 /**
@@ -378,23 +382,21 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu 
*vcpu, u64 select_idx)
 }
 
 /**
- * kvm_pmu_set_counter_event_type - set selected counter to monitor some event
+ * kvm_pmu_create_perf_event - create a perf event for a counter
  * @vcpu: The vcpu pointer
- * @data: The data guest writes to PMXEVTYPER_EL0
  * @select_idx: The number of selected counter
- *
- * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count 
an
- * event with given hardware event number. Here we call perf_event API to
- * emulate this action and create a kernel perf event for it.
  */
-void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
-   u64 select_idx)
+static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
 {
struct kvm_pmu *pmu = &vcpu->arch.pmu;
struct kvm_pmc *pmc = &pmu->pmc[select_idx];
struct perf_event *event;
struct perf_event_attr attr;
-   u64 eventsel, counter;
+   u64 eventsel, counter, reg, data;
+
+   reg = (select_idx == ARMV8_PMU_CYCLE_IDX)
+ ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + select_idx;
+   data = __vcpu_sys_reg(vcpu, reg);
 
kvm_pmu_stop_counter(vcpu, pmc);
eventsel = data & ARMV8_PMU_EVTYPE_EVENT;
@@ -431,6 +433,28 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u64 data,
pmc->perf_event = event;
 }
 
+/**
+ * kvm_pmu_set_counter_event_type - set selected counter to monitor some event
+ * @vcpu: The vcpu pointer
+ * @data: The data guest writes to PMXEVTYPER_EL0
+ * @select_idx: The number of selected counter
+ *
+ * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count 
an
+ * event with given hardware event number. Here we call perf_event API to
+ * emulate this action and create a kernel perf event for it.
+ */
+void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
+   u64 select_idx)
+{
+   u64 reg, event_type = data & ARMV8_PMU_EVTYPE_MASK;
+
+   reg = (select_idx == ARMV8_PMU_CYCLE_IDX)
+ ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + select_idx;
+
+   __vcpu_sys_reg(vcpu, reg) = event_type;
+   kvm_pmu_create_perf_event(vcpu, select_idx);
+}
+
 bool kvm_arm_support_pmu_v3(void)
 {
/*
-- 
2.21.0



[PATCH v10 5/5] KVM: arm/arm64: support chained PMU counters

2019-06-17 Thread Andrew Murray
ARMv8 provides support for chained PMU counters: when an event type
of 0x001E is set for an odd-numbered counter, that counter will
increment by one for each overflow of the preceding even-numbered
counter. Let's emulate this in KVM by creating a 64-bit perf counter
when a user chains two emulated counters together.

For chained events we only support generating an overflow interrupt
on the high counter. We use the attributes of the low counter to
determine the attributes of the perf event.

Suggested-by: Marc Zyngier 
Signed-off-by: Andrew Murray 
Reviewed-by: Julien Thierry 
Reviewed-by: Suzuki K Poulose 
---
 include/kvm/arm_pmu.h |   2 +
 virt/kvm/arm/pmu.c| 252 +++---
 2 files changed, 217 insertions(+), 37 deletions(-)

diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 2f0e28dc5a9e..589f49ed8cf8 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -22,6 +22,7 @@
 #include 
 
 #define ARMV8_PMU_CYCLE_IDX	(ARMV8_PMU_MAX_COUNTERS - 1)
+#define ARMV8_PMU_MAX_COUNTER_PAIRS	((ARMV8_PMU_MAX_COUNTERS + 1) >> 1)
 
 #ifdef CONFIG_KVM_ARM_PMU
 
@@ -33,6 +34,7 @@ struct kvm_pmc {
 struct kvm_pmu {
int irq_num;
struct kvm_pmc pmc[ARMV8_PMU_MAX_COUNTERS];
+   DECLARE_BITMAP(chained, ARMV8_PMU_MAX_COUNTER_PAIRS);
bool ready;
bool created;
bool irq_level;
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 0588ad0ddb77..21191c9dfba7 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -26,6 +26,8 @@
 
 static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx);
 
+#define PERF_ATTR_CFG1_KVM_PMU_CHAINED 0x1
+
 /**
  * kvm_pmu_idx_is_64bit - determine if select_idx is a 64bit counter
  * @vcpu: The vcpu pointer
@@ -37,29 +39,126 @@ static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, 
u64 select_idx)
__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
 }
 
+static struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
+{
+   struct kvm_pmu *pmu;
+   struct kvm_vcpu_arch *vcpu_arch;
+
+   pmc -= pmc->idx;
+   pmu = container_of(pmc, struct kvm_pmu, pmc[0]);
+   vcpu_arch = container_of(pmu, struct kvm_vcpu_arch, pmu);
+   return container_of(vcpu_arch, struct kvm_vcpu, arch);
+}
+
 /**
- * kvm_pmu_get_counter_value - get PMU counter value
+ * kvm_pmu_pmc_is_chained - determine if the pmc is chained
+ * @pmc: The PMU counter pointer
+ */
+static bool kvm_pmu_pmc_is_chained(struct kvm_pmc *pmc)
+{
+   struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
+
+   return test_bit(pmc->idx >> 1, vcpu->arch.pmu.chained);
+}
+
+/**
+ * kvm_pmu_idx_is_high_counter - determine if select_idx is a high/low counter
+ * @select_idx: The counter index
+ */
+static bool kvm_pmu_idx_is_high_counter(u64 select_idx)
+{
+   return select_idx & 0x1;
+}
+
+/**
+ * kvm_pmu_get_canonical_pmc - obtain the canonical pmc
+ * @pmc: The PMU counter pointer
+ *
+ * When a pair of PMCs are chained together we use the low counter (canonical)
+ * to hold the underlying perf event.
+ */
+static struct kvm_pmc *kvm_pmu_get_canonical_pmc(struct kvm_pmc *pmc)
+{
+   if (kvm_pmu_pmc_is_chained(pmc) &&
+   kvm_pmu_idx_is_high_counter(pmc->idx))
+   return pmc - 1;
+
+   return pmc;
+}
+
+/**
+ * kvm_pmu_idx_has_chain_evtype - determine if the event type is chain
  * @vcpu: The vcpu pointer
  * @select_idx: The counter index
  */
-u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx)
+static bool kvm_pmu_idx_has_chain_evtype(struct kvm_vcpu *vcpu, u64 select_idx)
 {
-   u64 counter, reg, enabled, running;
-   struct kvm_pmu *pmu = &vcpu->arch.pmu;
-   struct kvm_pmc *pmc = &pmu->pmc[select_idx];
+   u64 eventsel, reg;
 
-   reg = (select_idx == ARMV8_PMU_CYCLE_IDX)
- ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx;
-   counter = __vcpu_sys_reg(vcpu, reg);
+   select_idx |= 0x1;
+
+   if (select_idx == ARMV8_PMU_CYCLE_IDX)
+   return false;
+
+   reg = PMEVTYPER0_EL0 + select_idx;
+   eventsel = __vcpu_sys_reg(vcpu, reg) & ARMV8_PMU_EVTYPE_EVENT;
+
+   return eventsel == ARMV8_PMUV3_PERFCTR_CHAIN;
+}
+
+/**
+ * kvm_pmu_get_pair_counter_value - get PMU counter value
+ * @vcpu: The vcpu pointer
+ * @pmc: The PMU counter pointer
+ */
+static u64 kvm_pmu_get_pair_counter_value(struct kvm_vcpu *vcpu,
+ struct kvm_pmc *pmc)
+{
+   u64 counter, counter_high, reg, enabled, running;
 
-   /* The real counter value is equal to the value of counter register plus
+   if (kvm_pmu_pmc_is_chained(pmc)) {
+   pmc = kvm_pmu_get_canonical_pmc(pmc);
+   reg = PMEVCNTR0_EL0 + pmc->idx;
+
+   counter = __vcpu_sys_reg(vcpu, reg);
+   counter_high = __vcpu_sys_reg(vcpu, reg + 1);
+
+   counter = lower_32

[PATCH v10 4/5] KVM: arm/arm64: remove pmc->bitmask

2019-06-17 Thread Andrew Murray
We currently use pmc->bitmask to determine the width of the pmc - however
it's superfluous, as the pmc index already tells us whether the pmc is a
cycle counter or an event counter, and the architecture clearly defines the
widths of these counters.

Let's remove the bitmask to simplify the code.

Signed-off-by: Andrew Murray 
---
 include/kvm/arm_pmu.h |  1 -
 virt/kvm/arm/pmu.c| 30 --
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index b73f31baca52..2f0e28dc5a9e 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -28,7 +28,6 @@
 struct kvm_pmc {
u8 idx; /* index into the pmu->pmc array */
struct perf_event *perf_event;
-   u64 bitmask;
 };
 
 struct kvm_pmu {
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index ae1e886d4a1a..0588ad0ddb77 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -25,6 +25,18 @@
 #include 
 
 static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx);
+
+/**
+ * kvm_pmu_idx_is_64bit - determine if select_idx is a 64bit counter
+ * @vcpu: The vcpu pointer
+ * @select_idx: The counter index
+ */
+static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
+{
+   return (select_idx == ARMV8_PMU_CYCLE_IDX &&
+   __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
+}
+
 /**
  * kvm_pmu_get_counter_value - get PMU counter value
  * @vcpu: The vcpu pointer
@@ -47,7 +59,10 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx)
counter += perf_event_read_value(pmc->perf_event, &enabled,
 &running);
 
-   return counter & pmc->bitmask;
+   if (!kvm_pmu_idx_is_64bit(vcpu, select_idx))
+   counter = lower_32_bits(counter);
+
+   return counter;
 }
 
 /**
@@ -113,7 +128,6 @@ void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) {
kvm_pmu_stop_counter(vcpu, &pmu->pmc[i]);
pmu->pmc[i].idx = i;
-   pmu->pmc[i].bitmask = 0xUL;
}
 }
 
@@ -348,8 +362,6 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 
val)
  */
 void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
 {
-   struct kvm_pmu *pmu = &vcpu->arch.pmu;
-   struct kvm_pmc *pmc;
u64 mask;
int i;
 
@@ -368,11 +380,6 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
for (i = 0; i < ARMV8_PMU_CYCLE_IDX; i++)
kvm_pmu_set_counter_value(vcpu, i, 0);
}
-
-   if (val & ARMV8_PMU_PMCR_LC) {
-   pmc = &pmu->pmc[ARMV8_PMU_CYCLE_IDX];
-   pmc->bitmask = 0xUL;
-   }
 }
 
 static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx)
@@ -420,7 +427,10 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu 
*vcpu, u64 select_idx)
 
counter = kvm_pmu_get_counter_value(vcpu, select_idx);
/* The initial sample period (overflow count) of an event. */
-   attr.sample_period = (-counter) & pmc->bitmask;
+   if (kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
+   attr.sample_period = (-counter) & GENMASK(63, 0);
+   else
+   attr.sample_period = (-counter) & GENMASK(31, 0);
 
event = perf_event_create_kernel_counter(&attr, -1, current,
 kvm_pmu_perf_overflow, pmc);
-- 
2.21.0



[PATCH v10 2/5] KVM: arm/arm64: extract duplicated code to own function

2019-06-17 Thread Andrew Murray
Let's reduce code duplication by extracting common code to its own
function.

Signed-off-by: Andrew Murray 
Reviewed-by: Suzuki K Poulose 
---
 virt/kvm/arm/pmu.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index c5a722ad283f..6e7c179103a6 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -64,6 +64,19 @@ void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx, u64 val)
__vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, 
select_idx);
 }
 
+/**
+ * kvm_pmu_release_perf_event - remove the perf event
+ * @pmc: The PMU counter pointer
+ */
+static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc)
+{
+   if (pmc->perf_event) {
+   perf_event_disable(pmc->perf_event);
+   perf_event_release_kernel(pmc->perf_event);
+   pmc->perf_event = NULL;
+   }
+}
+
 /**
  * kvm_pmu_stop_counter - stop PMU counter
  * @pmc: The PMU counter pointer
@@ -79,9 +92,7 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, 
struct kvm_pmc *pmc)
reg = (pmc->idx == ARMV8_PMU_CYCLE_IDX)
   ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + pmc->idx;
__vcpu_sys_reg(vcpu, reg) = counter;
-   perf_event_disable(pmc->perf_event);
-   perf_event_release_kernel(pmc->perf_event);
-   pmc->perf_event = NULL;
+   kvm_pmu_release_perf_event(pmc);
}
 }
 
@@ -112,15 +123,8 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
int i;
struct kvm_pmu *pmu = &vcpu->arch.pmu;
 
-   for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) {
-   struct kvm_pmc *pmc = >pmc[i];
-
-   if (pmc->perf_event) {
-   perf_event_disable(pmc->perf_event);
-   perf_event_release_kernel(pmc->perf_event);
-   pmc->perf_event = NULL;
-   }
-   }
+   for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
+   kvm_pmu_release_perf_event(&pmu->pmc[i]);
 }
 
 u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
-- 
2.21.0



[PATCH v10 0/5] KVM: arm/arm64: add support for chained counters

2019-06-17 Thread Andrew Murray
ARMv8 provides support for chained PMU counters: when an event type
of 0x001E is set for an odd-numbered counter, that counter will
increment by one for each overflow of the preceding even-numbered
counter. Let's emulate this in KVM by creating a 64-bit perf counter
when a user chains two emulated counters together.

Testing has been performed by hard-coding hwc->sample_period in
__hw_perf_event_init (arm_pmu.c) to a small value; this results in
regular overflows (for non-sampling events). The following command
was then used to measure chained and non-chained instruction counts:

perf stat -e armv8_pmuv3/long=1,inst_retired/u \
  -e armv8_pmuv3/long=0,inst_retired/u dd if=/dev/zero bs=1M \
  count=10 | gzip > /dev/null

The reported values were identical (and the non-chained value was in the
same ballpark when running on a kernel without this patchset). Debug
was added to verify that the guest received overflow interrupts for
the chained counter.

The test was also repeated using the cycle counter (cycle:u).

For chained events we only support generating an overflow interrupt
on the high counter. We use the attributes of the low counter to
determine the attributes of the perf event.

Changes since v9:

 - Ensure only 32 bits of cycle counter is returned when !PMCR_LC

 - Add a helper to test for 64 bit counters (e.g. long cycle counter)

 - Rename kvm_pmu_pmc_is_high_counter to kvm_pmu_idx_is_high_counter to
   reflect arguments passed to it

Changes since v8:

 - Correctly calculate the sample_period for the cycle counter

 - Drop "arm64: perf: extract chain helper into header" patch

Changes since v7:

 - Remove pmc->bitmask

 - Remove a couple of instances of using kvm_pmu_get_canonical_pmc
   when not needed

 - Remove unused perf_event variable

Changes since v6:

 - Drop kvm_pmu_{get,set}_perf_event

 - Avoid duplicate work by using kvm_pmu_get_pair_counter_value inside
   kvm_pmu_stop_counter

 - Use GENMASK for 64bit mask

Changes since v5:

 - Use kvm_pmu_pmc_is_high_counter instead of open coding

 - Rename kvm_pmu_event_is_chained to kvm_pmu_idx_has_chain_evtype

 - Use kvm_pmu_get_canonical_pmc only where needed and reintroduce
   the kvm_pmu_{set, get}_perf_event functions

 - Drop masking of counter in kvm_pmu_get_pair_counter_value

 - Only initialise pmc once in kvm_pmu_create_perf_event and other
   minor changes.

Changes since v4:

 - Track pairs of chained counters with a bitmap instead of using
   a struct kvm_pmc_pair.

 - Rebase onto kvmarm/queue

Changes since v3:

 - Simplify approach by not creating events lazily and by introducing
   a struct kvm_pmc_pair to represent the relationship between
   adjacent counters.

 - Rebase onto v5.1-rc2

Changes since v2:

 - Rebased onto v5.0-rc7

 - Add check for cycle counter in correct patch

 - Minor style, naming and comment changes

 - Extract armv8pmu_evtype_is_chain from arch/arm64/kernel/perf_event.c
   into a common header that KVM can use

Changes since v1:

 - Rename kvm_pmu_{enable,disable}_counter to reflect that they can
   operate on multiple counters at once and use these functions where
   possible

 - Fix bugs with overflow handing, kvm_pmu_get_counter_value did not
   take into consideration the perf counter value overflowing the low
   counter

 - Ensure PMCCFILTR_EL0 is used when operating on the cycle counter

 - Rename kvm_pmu_reenable_enabled_{pair, single} and similar

 - Always create perf event disabled to simplify logic elsewhere

 - Move PMCNTENSET_EL0 test to kvm_pmu_enable_counter_mask


Andrew Murray (5):
  KVM: arm/arm64: rename kvm_pmu_{enable/disable}_counter functions
  KVM: arm/arm64: extract duplicated code to own function
  KVM: arm/arm64: re-create event when setting counter value
  KVM: arm/arm64: remove pmc->bitmask
  KVM: arm/arm64: support chained PMU counters

 arch/arm64/kvm/sys_regs.c |   4 +-
 include/kvm/arm_pmu.h |  11 +-
 virt/kvm/arm/pmu.c| 350 ++
 3 files changed, 291 insertions(+), 74 deletions(-)

-- 
2.21.0



[PATCH v10 1/5] KVM: arm/arm64: rename kvm_pmu_{enable/disable}_counter functions

2019-06-17 Thread Andrew Murray
The kvm_pmu_{enable/disable}_counter functions can enable/disable
multiple counters at once, as they operate on a bitmask. Let's
make this clearer by renaming the functions.

Suggested-by: Suzuki K Poulose 
Signed-off-by: Andrew Murray 
Reviewed-by: Julien Thierry 
Reviewed-by: Suzuki K Poulose 
---
 arch/arm64/kvm/sys_regs.c |  4 ++--
 include/kvm/arm_pmu.h |  8 
 virt/kvm/arm/pmu.c| 12 ++--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 9d02643bc601..8e98fb173ed3 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -876,12 +876,12 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
if (r->Op2 & 0x1) {
/* accessing PMCNTENSET_EL0 */
__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val;
-   kvm_pmu_enable_counter(vcpu, val);
+   kvm_pmu_enable_counter_mask(vcpu, val);
kvm_vcpu_pmu_restore_guest(vcpu);
} else {
/* accessing PMCNTENCLR_EL0 */
__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val;
-   kvm_pmu_disable_counter(vcpu, val);
+   kvm_pmu_disable_counter_mask(vcpu, val);
}
} else {
p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index f87fe20fcb05..b73f31baca52 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -46,8 +46,8 @@ void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx, u64 val);
 u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu);
 void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu);
-void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val);
-void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val);
+void kvm_pmu_disable_counter_mask(struct kvm_vcpu *vcpu, u64 val);
+void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
 void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
 bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu);
@@ -83,8 +83,8 @@ static inline u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu 
*vcpu)
 }
 static inline void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) {}
 static inline void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu) {}
-static inline void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val) {}
-static inline void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val) {}
+static inline void kvm_pmu_disable_counter_mask(struct kvm_vcpu *vcpu, u64 
val) {}
+static inline void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val) 
{}
 static inline void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {}
 static inline void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {}
 static inline bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu)
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 1c5b76c46e26..c5a722ad283f 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -135,13 +135,13 @@ u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
 }
 
 /**
- * kvm_pmu_enable_counter - enable selected PMU counter
+ * kvm_pmu_enable_counter_mask - enable selected PMU counters
  * @vcpu: The vcpu pointer
  * @val: the value guest writes to PMCNTENSET register
  *
  * Call perf_event_enable to start counting the perf event
  */
-void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val)
+void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
 {
int i;
struct kvm_pmu *pmu = &vcpu->arch.pmu;
@@ -164,13 +164,13 @@ void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 
val)
 }
 
 /**
- * kvm_pmu_disable_counter - disable selected PMU counter
+ * kvm_pmu_disable_counter_mask - disable selected PMU counters
  * @vcpu: The vcpu pointer
  * @val: the value guest writes to PMCNTENCLR register
  *
  * Call perf_event_disable to stop counting the perf event
  */
-void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val)
+void kvm_pmu_disable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
 {
int i;
struct kvm_pmu *pmu = &vcpu->arch.pmu;
@@ -347,10 +347,10 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
 
mask = kvm_pmu_valid_counter_mask(vcpu);
if (val & ARMV8_PMU_PMCR_E) {
-   kvm_pmu_enable_counter(vcpu,
+   kvm_pmu_enable_counter_mask(vcpu,
   __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
} else {
-   kvm_pmu_disable_counter(vcpu, mask);
+   kvm_pmu_disable_counter_mask(vcpu, mask);
}
 
if (val & ARMV8_PMU_PMCR_C)
-- 
2.21.0



Re: [PATCH v9 4/5] KVM: arm/arm64: remove pmc->bitmask

2019-06-17 Thread Andrew Murray
On Mon, Jun 17, 2019 at 05:33:40PM +0100, Suzuki K Poulose wrote:
> 
> 
> On 17/06/2019 16:43, Andrew Murray wrote:
> > On Thu, Jun 13, 2019 at 05:50:43PM +0100, Suzuki K Poulose wrote:
> > > 
> > > 
> > > On 13/06/2019 10:39, Andrew Murray wrote:
> > > > On Thu, Jun 13, 2019 at 08:30:51AM +0100, Julien Thierry wrote:
> 
> > > > > > index ae1e886d4a1a..88ce24ae0b45 100644
> > > > > > --- a/virt/kvm/arm/pmu.c
> > > > > > +++ b/virt/kvm/arm/pmu.c
> > > > > > @@ -47,7 +47,10 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu 
> > > > > > *vcpu, u64 select_idx)
> > > > > > counter += perf_event_read_value(pmc->perf_event, &enabled,
> > > > > >  &running);
> > > > > > -   return counter & pmc->bitmask;
> > > > > > +   if (select_idx != ARMV8_PMU_CYCLE_IDX)
> > > > > > +   counter = lower_32_bits(counter);
> > > > > 
> > > > > Shouldn't this depend on PMCR.LC as well? If PMCR.LC is clear we only
> > > > > want the lower 32bits of the cycle counter.
> > > > 
> > > > Yes that's correct. The hunk should look like this:
> > > > 
> > > > -   return counter & pmc->bitmask;
> > > > +   if (!(select_idx == ARMV8_PMU_CYCLE_IDX &&
> > > > + __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC))
> > > > +   counter = lower_32_bits(counter);
> > > > +
> > > > +   return counter;
> > > 
> > > May be you could add a macro :
> > > 
> > > #define vcpu_pmu_counter_is_64bit(vcpu, idx) ?
> > 
> > Yes I think a helper would be useful - though I'd prefer the name
> > 'kvm_pmu_idx_is_long_cycle_counter'. This seems a little clearer as
> > you could otherwise argue that a chained counter is also 64 bits.
> 
> When you get to add 64bit PMU counter (v8.5), this would be handy. So
> having it de-coupled from the cycle counter may be a good idea. Anyways,
> I leave that to you.

Yes, that thought crossed my mind. I'll take your suggestion after all.
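
For reference, the helper I've gone with in v10 (patch 4/5) looks like this:

static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
{
	return (select_idx == ARMV8_PMU_CYCLE_IDX &&
		__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
}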

Thanks,

Andrew Murray

> 
> Cheers
> Suzuki