Re: [PATCH v3 2/8] target/arm: v8.3 PAC ID_AA64ISAR[12] feature-detection

2023-06-12 Thread Aaron Lindsay
On Jun 09 13:51, Richard Henderson wrote:
> On 6/9/23 10:23, Aaron Lindsay wrote:
> > +static inline int isar_feature_pauth_get_features(const ARMISARegisters *id)
> > +{
> > +if (isar_feature_aa64_pauth_arch_qarma5(id)) {
> > +return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA);
> > +} else if (isar_feature_aa64_pauth_arch_qarma3(id)) {
> > +return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3);
> > +} else {
> > +return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, API);
> > +}
> > +}
> 
> As I mentioned in previous review, exactly one of these fields will be
> non-zero, so you can just OR them all together without the conditionals.

Sorry I missed this last time around - I've queued this change for v4.
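
For reference, a rough sketch of the OR'd version I have queued (relying
on exactly one of APA, API and APA3 being non-zero, as you note):

static inline int isar_feature_pauth_get_features(const ARMISARegisters *id)
{
    /*
     * Exactly one of these fields can be non-zero for a given
     * configuration, so OR-ing them yields whichever one is in use.
     */
    return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA) |
           FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, API) |
           FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3);
}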

Thanks!

-Aaron



Re: [PATCH v3 1/8] target/arm: Add ID_AA64ISAR2_EL1

2023-06-12 Thread Aaron Lindsay
On Jun 09 13:49, Richard Henderson wrote:
> On 6/9/23 10:23, Aaron Lindsay wrote:
> > --- a/target/arm/hvf/hvf.c
> > +++ b/target/arm/hvf/hvf.c
> > @@ -847,6 +847,7 @@ static bool hvf_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
> >   { HV_SYS_REG_ID_AA64DFR1_EL1, &host_isar.id_aa64dfr1 },
> >   { HV_SYS_REG_ID_AA64ISAR0_EL1, &host_isar.id_aa64isar0 },
> >   { HV_SYS_REG_ID_AA64ISAR1_EL1, &host_isar.id_aa64isar1 },
> > +{ HV_SYS_REG_ID_AA64ISAR2_EL1, &host_isar.id_aa64isar2 },
> 
> Sadly not defined for MacOSX13.1.sdk, and it's an enum so you can't #ifdef it 
> either.
> 
> You'll need a meson probe for it.

I'm not very familiar with HVF or meson - I am not sure I understand
what you're suggesting here (and a few attempts to grep around for an
example didn't turn up anything that looked helpful). Are you suggesting
some sort of build-time auto-detection, a "dumb" configuration switch
that a user could use to manually enable this, or something else? And/or
is there an example you could point me to of what you're thinking?
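
If you mean the former, is something along these lines what you have in
mind? (Untested, and the macro name below is just a placeholder I made
up: a meson.build check such as
cc.has_header_symbol('Hypervisor/Hypervisor.h', 'HV_SYS_REG_ID_AA64ISAR2_EL1')
could define it, and the new table entry would then be guarded in C.)

#ifdef CONFIG_HVF_ID_AA64ISAR2  /* placeholder for a meson-detected macro */
    { HV_SYS_REG_ID_AA64ISAR2_EL1, &host_isar.id_aa64isar2 },
#endif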

-Aaron



[PATCH v3 0/8] Implement Most ARMv8.3 Pointer Authentication Features

2023-06-09 Thread Aaron Lindsay
Changes from v2 of this patchset [0]:
- Remove properties for EPAC, Pauth2, FPAC, FPACCombined
- Separate out aa64isar2 addition into its own patch
- Comment clarifications
- Several code formatting/simplifications
- Rebase on top of latest upstream changes (for example, those which
  reorganized decoding PAC branch instructions)

[0] - https://lists.nongnu.org/archive/html/qemu-devel/2023-02/msg06494.html

Aaron Lindsay (8):
  target/arm: Add ID_AA64ISAR2_EL1
  target/arm: v8.3 PAC ID_AA64ISAR[12] feature-detection
  target/arm: Implement v8.3 QARMA3 PAC cipher
  target/arm: Implement v8.3 EnhancedPAC
  target/arm: Implement v8.3 Pauth2
  target/arm: Inform helpers whether a PAC instruction is 'combined'
  target/arm: Implement v8.3 FPAC and FPACCOMBINE
  target/arm: Add CPU property for QARMA3, enable FPACCombined by
default
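
For context, the ID register values these patches key on (the same
thresholds used by the feature-detection helpers in patch 2/8) are,
roughly, the following; the names below are purely illustrative and are
not added by the series:

/* ID_AA64ISAR1.{APA,API} / ID_AA64ISAR2.APA3 values */
enum {
    PAUTH_IMP         = 0b0001, /* FEAT_PAuth       */
    PAUTH_EPAC        = 0b0010, /* FEAT_EPAC        */
    PAUTH_PAUTH2      = 0b0011, /* FEAT_PAuth2      */
    PAUTH_FPAC        = 0b0100, /* FEAT_FPAC        */
    PAUTH_FPACCOMBINE = 0b0101, /* FEAT_FPACCOMBINE */
};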

 target/arm/cpu.h   |  67 +++-
 target/arm/cpu64.c |  48 ++---
 target/arm/helper.c|   4 +-
 target/arm/hvf/hvf.c   |   1 +
 target/arm/kvm64.c |   2 +
 target/arm/syndrome.h  |   7 ++
 target/arm/tcg/helper-a64.h|   4 +
 target/arm/tcg/pauth_helper.c  | 189 ++---
 target/arm/tcg/translate-a64.c |  12 +--
 9 files changed, 270 insertions(+), 64 deletions(-)

-- 
2.25.1




[PATCH v3 2/8] target/arm: v8.3 PAC ID_AA64ISAR[12] feature-detection

2023-06-09 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h  | 65 +--
 target/arm/tcg/pauth_helper.c |  2 +-
 2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index df04c9a9ab..22dd898577 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3714,18 +3714,77 @@ static inline bool isar_feature_aa64_pauth(const 
ARMISARegisters *id)
 (FIELD_DP64(0, ID_AA64ISAR1, APA, 0xf) |
  FIELD_DP64(0, ID_AA64ISAR1, API, 0xf) |
  FIELD_DP64(0, ID_AA64ISAR1, GPA, 0xf) |
- FIELD_DP64(0, ID_AA64ISAR1, GPI, 0xf))) != 0;
+ FIELD_DP64(0, ID_AA64ISAR1, GPI, 0xf))) != 0 ||
+   (id->id_aa64isar2 &
+(FIELD_DP64(0, ID_AA64ISAR2, APA3, 0xf) |
+ FIELD_DP64(0, ID_AA64ISAR2, GPA3, 0xf))) != 0;
 }
 
-static inline bool isar_feature_aa64_pauth_arch(const ARMISARegisters *id)
+static inline bool isar_feature_aa64_pauth_arch_qarma5(const ARMISARegisters 
*id)
 {
 /*
- * Return true if pauth is enabled with the architected QARMA algorithm.
+ * Return true if pauth is enabled with the architected QARMA5 algorithm.
  * QEMU will always set APA+GPA to the same value.
  */
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA) != 0;
 }
 
+static inline bool isar_feature_aa64_pauth_arch_qarma3(const ARMISARegisters 
*id)
+{
+/*
+ * Return true if pauth is enabled with the architected QARMA3 algorithm.
+ * QEMU will always set APA3+GPA3 to the same result.
+ */
+return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3) != 0;
+}
+
+static inline bool isar_feature_aa64_pauth_arch(const ARMISARegisters *id)
+{
+return isar_feature_aa64_pauth_arch_qarma5(id) ||
+isar_feature_aa64_pauth_arch_qarma3(id);
+}
+
+static inline int isar_feature_pauth_get_features(const ARMISARegisters *id)
+{
+if (isar_feature_aa64_pauth_arch_qarma5(id)) {
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA);
+} else if (isar_feature_aa64_pauth_arch_qarma3(id)) {
+return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3);
+} else {
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, API);
+}
+}
+
+static inline bool isar_feature_aa64_pauth_epac(const ARMISARegisters *id)
+{
+/*
+ * Note that unlike most AArch64 features, EPAC is treated (in the ARM
+ * pseudocode, at least) as not being implemented by larger values of this
+ * field. Our usage of '>=' rather than '==' here causes our implementation
+ * of PAC logic to diverge from ARM pseudocode - we must check that
+ * isar_feature_aa64_pauth2() returns false AND
+ * isar_feature_aa64_pauth_epac() returns true, where the pseudocode reads
+ * as if EPAC is not implemented if the value of this register is > 0b10.
+ * See the implementation of pauth_addpac() for an example.
+ */
+return isar_feature_pauth_get_features(id) >= 0b0010;
+}
+
+static inline bool isar_feature_aa64_pauth2(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) >= 0b0011;
+}
+
+static inline bool isar_feature_aa64_fpac(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) >= 0b0100;
+}
+
+static inline bool isar_feature_aa64_fpac_combine(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) >= 0b0101;
+}
+
 static inline bool isar_feature_aa64_tlbirange(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TLB) == 2;
diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 62af569341..3ff4610a26 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -282,7 +282,7 @@ static uint64_t pauth_computepac_impdef(uint64_t data, 
uint64_t modifier,
 static uint64_t pauth_computepac(CPUARMState *env, uint64_t data,
  uint64_t modifier, ARMPACKey key)
 {
-if (cpu_isar_feature(aa64_pauth_arch, env_archcpu(env))) {
+if (cpu_isar_feature(aa64_pauth_arch_qarma5, env_archcpu(env))) {
 return pauth_computepac_architected(data, modifier, key);
 } else {
 return pauth_computepac_impdef(data, modifier, key);
-- 
2.25.1




[PATCH v3 7/8] target/arm: Implement v8.3 FPAC and FPACCOMBINE

2023-06-09 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/syndrome.h |  7 +++
 target/arm/tcg/pauth_helper.c | 16 
 2 files changed, 23 insertions(+)

diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index d27d1bc31f..bf79c539d9 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -49,6 +49,7 @@ enum arm_exception_class {
 EC_SYSTEMREGISTERTRAP = 0x18,
 EC_SVEACCESSTRAP  = 0x19,
 EC_ERETTRAP   = 0x1a,
+EC_PACFAIL= 0x1c,
 EC_SMETRAP= 0x1d,
 EC_INSNABORT  = 0x20,
 EC_INSNABORT_SAME_EL  = 0x21,
@@ -231,6 +232,12 @@ static inline uint32_t syn_smetrap(SMEExceptionType etype, 
bool is_16bit)
 | (is_16bit ? 0 : ARM_EL_IL) | etype;
 }
 
+static inline uint32_t syn_pacfail(bool data, int keynumber)
+{
+int error_code = (data << 1) | keynumber;
+return (EC_PACFAIL << ARM_EL_EC_SHIFT) | ARM_EL_IL | error_code;
+}
+
 static inline uint32_t syn_pactrap(void)
 {
 return EC_PACTRAP << ARM_EL_EC_SHIFT;
diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 278d6d36bc..f42945257f 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -395,6 +395,13 @@ static uint64_t pauth_original_ptr(uint64_t ptr, 
ARMVAParameters param)
 }
 }
 
+static G_NORETURN
+void pauth_fail_exception(CPUARMState *env, bool data, int keynumber, 
uintptr_t ra)
+{
+int target_el = exception_target_el(env);
+raise_exception_ra(env, EXCP_UDEF, syn_pacfail(data, keynumber), 
target_el, ra);
+}
+
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
ARMPACKey *key, bool data, int keynumber,
uintptr_t ra, bool is_combined)
@@ -414,6 +421,15 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, 
uint64_t modifier,
 uint64_t xor_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit + 1) &
 ~MAKE_64BIT_MASK(55, 1);
 result = ptr ^ (pac & xor_mask);
+if (cpu_isar_feature(aa64_fpac_combine, cpu)
+|| (cpu_isar_feature(aa64_fpac, cpu) && !is_combined)) {
+int fpac_top = param.tbi ? 55 : 64;
+uint64_t fpac_mask = MAKE_64BIT_MASK(bot_bit, fpac_top - bot_bit);
+test = (result ^ sextract64(result, 55, 1)) & fpac_mask;
+if (unlikely(test)) {
+pauth_fail_exception(env, data, keynumber, ra);
+}
+}
 } else {
 test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
 if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
-- 
2.25.1




[PATCH v3 3/8] target/arm: Implement v8.3 QARMA3 PAC cipher

2023-06-09 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/tcg/pauth_helper.c | 54 ---
 1 file changed, 44 insertions(+), 10 deletions(-)

diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 3ff4610a26..68942015e1 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -96,6 +96,21 @@ static uint64_t pac_sub(uint64_t i)
 return o;
 }
 
+static uint64_t pac_sub1(uint64_t i)
+{
+static const uint8_t sub1[16] = {
+0xa, 0xd, 0xe, 0x6, 0xf, 0x7, 0x3, 0x5,
+0x9, 0x8, 0x0, 0xc, 0xb, 0x1, 0x2, 0x4,
+};
+uint64_t o = 0;
+int b;
+
+for (b = 0; b < 64; b += 4) {
+o |= (uint64_t)sub1[(i >> b) & 0xf] << b;
+}
+return o;
+}
+
 static uint64_t pac_inv_sub(uint64_t i)
 {
 static const uint8_t inv_sub[16] = {
@@ -209,7 +224,7 @@ static uint64_t tweak_inv_shuffle(uint64_t i)
 }
 
 static uint64_t pauth_computepac_architected(uint64_t data, uint64_t modifier,
- ARMPACKey key)
+ ARMPACKey key, bool isqarma3)
 {
 static const uint64_t RC[5] = {
 0xull,
@@ -219,6 +234,7 @@ static uint64_t pauth_computepac_architected(uint64_t data, 
uint64_t modifier,
 0x452821E638D01377ull,
 };
 const uint64_t alpha = 0xC0AC29B7C97C50DDull;
+int iterations = isqarma3 ? 2 : 4;
 /*
  * Note that in the ARM pseudocode, key0 contains bits <127:64>
  * and key1 contains bits <63:0> of the 128-bit key.
@@ -231,7 +247,7 @@ static uint64_t pauth_computepac_architected(uint64_t data, 
uint64_t modifier,
 runningmod = modifier;
 workingval = data ^ key0;
 
-for (i = 0; i <= 4; ++i) {
+for (i = 0; i <= iterations; ++i) {
 roundkey = key1 ^ runningmod;
 workingval ^= roundkey;
 workingval ^= RC[i];
@@ -239,32 +255,48 @@ static uint64_t pauth_computepac_architected(uint64_t 
data, uint64_t modifier,
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
 }
-workingval = pac_sub(workingval);
+if (isqarma3) {
+workingval = pac_sub1(workingval);
+} else {
+workingval = pac_sub(workingval);
+}
 runningmod = tweak_shuffle(runningmod);
 }
 roundkey = modk0 ^ runningmod;
 workingval ^= roundkey;
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
-workingval = pac_sub(workingval);
+if (isqarma3) {
+workingval = pac_sub1(workingval);
+} else {
+workingval = pac_sub(workingval);
+}
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
 workingval ^= key1;
 workingval = pac_cell_inv_shuffle(workingval);
-workingval = pac_inv_sub(workingval);
+if (isqarma3) {
+workingval = pac_sub1(workingval);
+} else {
+workingval = pac_inv_sub(workingval);
+}
 workingval = pac_mult(workingval);
 workingval = pac_cell_inv_shuffle(workingval);
 workingval ^= key0;
 workingval ^= runningmod;
-for (i = 0; i <= 4; ++i) {
-workingval = pac_inv_sub(workingval);
-if (i < 4) {
+for (i = 0; i <= iterations; ++i) {
+if (isqarma3) {
+workingval = pac_sub1(workingval);
+} else {
+workingval = pac_inv_sub(workingval);
+}
+if (i < iterations) {
 workingval = pac_mult(workingval);
 workingval = pac_cell_inv_shuffle(workingval);
 }
 runningmod = tweak_inv_shuffle(runningmod);
 roundkey = key1 ^ runningmod;
-workingval ^= RC[4 - i];
+workingval ^= RC[iterations - i];
 workingval ^= roundkey;
 workingval ^= alpha;
 }
@@ -283,7 +315,9 @@ static uint64_t pauth_computepac(CPUARMState *env, uint64_t 
data,
  uint64_t modifier, ARMPACKey key)
 {
 if (cpu_isar_feature(aa64_pauth_arch_qarma5, env_archcpu(env))) {
-return pauth_computepac_architected(data, modifier, key);
+return pauth_computepac_architected(data, modifier, key, false);
+} else if (cpu_isar_feature(aa64_pauth_arch_qarma3, env_archcpu(env))) {
+return pauth_computepac_architected(data, modifier, key, true);
 } else {
 return pauth_computepac_impdef(data, modifier, key);
 }
-- 
2.25.1




[PATCH v3 5/8] target/arm: Implement v8.3 Pauth2

2023-06-09 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/tcg/pauth_helper.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 1e9159c313..b0282d1a05 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -352,7 +352,9 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
  */
 test = sextract64(ptr, bot_bit, top_bit - bot_bit);
 if (test != 0 && test != -1) {
-if (cpu_isar_feature(aa64_pauth_epac, cpu)) {
+if (cpu_isar_feature(aa64_pauth2, cpu)) {
+/* No action required */
+} else if (cpu_isar_feature(aa64_pauth_epac, cpu)) {
 pac = 0;
 } else {
 /*
@@ -367,6 +369,9 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
  * Preserve the determination between upper and lower at bit 55,
  * and insert pointer authentication code.
  */
+if (cpu_isar_feature(aa64_pauth2, cpu)) {
+pac ^= ptr;
+}
 if (param.tbi) {
 ptr &= ~MAKE_64BIT_MASK(bot_bit, 55 - bot_bit + 1);
 pac &= MAKE_64BIT_MASK(bot_bit, 54 - bot_bit + 1);
@@ -393,26 +398,34 @@ static uint64_t pauth_original_ptr(uint64_t ptr, 
ARMVAParameters param)
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
ARMPACKey *key, bool data, int keynumber)
 {
+ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data, false);
 int bot_bit, top_bit;
-uint64_t pac, orig_ptr, test;
+uint64_t pac, orig_ptr, test, result;
 
 orig_ptr = pauth_original_ptr(ptr, param);
 pac = pauth_computepac(env, orig_ptr, modifier, *key);
 bot_bit = 64 - param.tsz;
 top_bit = 64 - 8 * param.tbi;
 
-test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
-if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
-int error_code = (keynumber << 1) | (keynumber ^ 1);
-if (param.tbi) {
-return deposit64(orig_ptr, 53, 2, error_code);
-} else {
-return deposit64(orig_ptr, 61, 2, error_code);
+if (cpu_isar_feature(aa64_pauth2, cpu)) {
+uint64_t xor_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit + 1) &
+~MAKE_64BIT_MASK(55, 1);
+result = ptr ^ (pac & xor_mask);
+} else {
+test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
+if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
+int error_code = (keynumber << 1) | (keynumber ^ 1);
+if (param.tbi) {
+return deposit64(orig_ptr, 53, 2, error_code);
+} else {
+return deposit64(orig_ptr, 61, 2, error_code);
+}
 }
+result = orig_ptr;
 }
-return orig_ptr;
+return result;
 }
 
 static uint64_t pauth_strip(CPUARMState *env, uint64_t ptr, bool data)
-- 
2.25.1




[PATCH v3 8/8] target/arm: Add CPU property for QARMA3, enable FPACCombined by default

2023-06-09 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h   |  1 +
 target/arm/cpu64.c | 48 +++---
 2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 22dd898577..0c4c6c9c82 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1061,6 +1061,7 @@ struct ArchCPU {
  */
 bool prop_pauth;
 bool prop_pauth_impdef;
+bool prop_pauth_qarma3;
 bool prop_lpa2;
 
 /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 6eaf8e32cf..b0a5af7a31 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -473,9 +473,6 @@ void aarch64_add_sme_properties(Object *obj)
 
 void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 {
-int arch_val = 0, impdef_val = 0;
-uint64_t t;
-
 /* Exit early if PAuth is enabled, and fall through to disable it */
 if ((kvm_enabled() || hvf_enabled()) && cpu->prop_pauth) {
 if (!cpu_isar_feature(aa64_pauth, cpu)) {
@@ -486,30 +483,50 @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 return;
 }
 
-/* TODO: Handle HaveEnhancedPAC, HaveEnhancedPAC2, HaveFPAC. */
+/* Write the features into the correct field for the algorithm in use */
 if (cpu->prop_pauth) {
+uint64_t t;
+
+if (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) {
+error_setg(errp, "Cannot set both qarma3 ('pauth-qarma3') and "
+"impdef ('pauth-impdef') pointer authentication ciphers");
+return;
+}
+
+/* Implement FEAT_FPACCOMBINE for address authentication and enable
+ * generic authentication for the chosen cipher.
+ */
+int address_auth = 0b0101;
+int generic_auth = 0b0001;
+
 if (cpu->prop_pauth_impdef) {
-impdef_val = 1;
+t = cpu->isar.id_aa64isar1;
+t = FIELD_DP64(t, ID_AA64ISAR1, API, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR1, GPI, generic_auth);
+cpu->isar.id_aa64isar1 = t;
+} else if (cpu->prop_pauth_qarma3) {
+t = cpu->isar.id_aa64isar2;
+t = FIELD_DP64(t, ID_AA64ISAR2, APA3, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR2, GPA3, generic_auth);
+cpu->isar.id_aa64isar2 = t;
 } else {
-arch_val = 1;
+t = cpu->isar.id_aa64isar1;
+t = FIELD_DP64(t, ID_AA64ISAR1, APA, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR1, GPA, generic_auth);
+cpu->isar.id_aa64isar1 = t;
 }
-} else if (cpu->prop_pauth_impdef) {
-error_setg(errp, "cannot enable pauth-impdef without pauth");
+} else if (cpu->prop_pauth_impdef || cpu->prop_pauth_qarma3) {
+error_setg(errp, "cannot enable pauth-impdef or pauth-qarma3 without 
pauth");
 error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
 }
-
-t = cpu->isar.id_aa64isar1;
-t = FIELD_DP64(t, ID_AA64ISAR1, APA, arch_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, GPA, arch_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, API, impdef_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, GPI, impdef_val);
-cpu->isar.id_aa64isar1 = t;
 }
 
 static Property arm_cpu_pauth_property =
 DEFINE_PROP_BOOL("pauth", ARMCPU, prop_pauth, true);
 static Property arm_cpu_pauth_impdef_property =
 DEFINE_PROP_BOOL("pauth-impdef", ARMCPU, prop_pauth_impdef, false);
+static Property arm_cpu_pauth_qarma3_property =
+DEFINE_PROP_BOOL("pauth-qarma3", ARMCPU, prop_pauth_qarma3, false);
 
 void aarch64_add_pauth_properties(Object *obj)
 {
@@ -529,6 +546,7 @@ void aarch64_add_pauth_properties(Object *obj)
 cpu->prop_pauth = cpu_isar_feature(aa64_pauth, cpu);
 } else {
qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_impdef_property);
+qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_qarma3_property);
 }
 }
 
-- 
2.25.1




[PATCH v3 4/8] target/arm: Implement v8.3 EnhancedPAC

2023-06-09 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/tcg/pauth_helper.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 68942015e1..1e9159c313 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -326,6 +326,7 @@ static uint64_t pauth_computepac(CPUARMState *env, uint64_t 
data,
 static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
  ARMPACKey *key, bool data)
 {
+ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data, false);
 uint64_t pac, ext_ptr, ext, test;
@@ -351,11 +352,15 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
  */
 test = sextract64(ptr, bot_bit, top_bit - bot_bit);
 if (test != 0 && test != -1) {
-/*
- * Note that our top_bit is one greater than the pseudocode's
- * version, hence "- 2" here.
- */
-pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+if (cpu_isar_feature(aa64_pauth_epac, cpu)) {
+pac = 0;
+} else {
+/*
+ * Note that our top_bit is one greater than the pseudocode's
+ * version, hence "- 2" here.
+ */
+pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+}
 }
 
 /*
-- 
2.25.1




[PATCH v3 1/8] target/arm: Add ID_AA64ISAR2_EL1

2023-06-09 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h | 1 +
 target/arm/helper.c  | 4 ++--
 target/arm/hvf/hvf.c | 1 +
 target/arm/kvm64.c   | 2 ++
 4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 36c608f0e6..df04c9a9ab 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1023,6 +1023,7 @@ struct ArchCPU {
 uint32_t dbgdevid1;
 uint64_t id_aa64isar0;
 uint64_t id_aa64isar1;
+uint64_t id_aa64isar2;
 uint64_t id_aa64pfr0;
 uint64_t id_aa64pfr1;
 uint64_t id_aa64mmfr0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index d4bee43bd0..4ced2f71e5 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8204,11 +8204,11 @@ void register_cp_regs_for_features(ARMCPU *cpu)
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
   .resetvalue = cpu->isar.id_aa64isar1 },
-{ .name = "ID_AA64ISAR2_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
+{ .name = "ID_AA64ISAR2_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 2,
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
-  .resetvalue = 0 },
+  .resetvalue = cpu->isar.id_aa64isar2 },
 { .name = "ID_AA64ISAR3_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 3,
   .access = PL1_R, .type = ARM_CP_CONST,
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 8f72624586..bf567b24db 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -847,6 +847,7 @@ static bool hvf_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
  { HV_SYS_REG_ID_AA64DFR1_EL1, &host_isar.id_aa64dfr1 },
  { HV_SYS_REG_ID_AA64ISAR0_EL1, &host_isar.id_aa64isar0 },
  { HV_SYS_REG_ID_AA64ISAR1_EL1, &host_isar.id_aa64isar1 },
+{ HV_SYS_REG_ID_AA64ISAR2_EL1, &host_isar.id_aa64isar2 },
  { HV_SYS_REG_ID_AA64MMFR0_EL1, &host_isar.id_aa64mmfr0 },
  { HV_SYS_REG_ID_AA64MMFR1_EL1, &host_isar.id_aa64mmfr1 },
  { HV_SYS_REG_ID_AA64MMFR2_EL1, &host_isar.id_aa64mmfr2 },
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 94bbd9661f..e2d05d7fc0 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -306,6 +306,8 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
   ARM64_SYS_REG(3, 0, 0, 6, 0));
err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar1,
  ARM64_SYS_REG(3, 0, 0, 6, 1));
+err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar2,
+  ARM64_SYS_REG(3, 0, 0, 6, 2));
err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr0,
  ARM64_SYS_REG(3, 0, 0, 7, 0));
err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr1,
-- 
2.25.1




[PATCH v3 6/8] target/arm: Inform helpers whether a PAC instruction is 'combined'

2023-06-09 Thread Aaron Lindsay
An instruction is a 'combined' Pointer Authentication instruction if it
does something in addition to PAC - for instance, branching to or
loading an address from the authenticated pointer. Knowing whether a PAC
operation is 'combined' is needed to implement the FPACCOMBINE feature
for ARMv8.3.
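
As a rough illustration (variable names are made up; the real call sites
are in the translate-a64.c hunk below), a standalone AUTIA keeps calling
the existing helper, while a combined instruction such as BLRAA calls
the new *_combined entry point so that FEAT_FPACCOMBINE (patch 7/8) can
raise the PAC-fail exception for it:

/* Standalone AUTIA: FEAT_FPAC alone is enough to fault on a bad PAC. */
gen_helper_autia(dst, cpu_env, src, modifier);

/* BLRAA authenticates and then branches, so it is 'combined': only
 * FEAT_FPACCOMBINE faults here. */
gen_helper_autia_combined(dst, cpu_env, src, modifier);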

Signed-off-by: Aaron Lindsay 
Reviewed-by: Richard Henderson 
---
 target/arm/tcg/helper-a64.h|  4 ++
 target/arm/tcg/pauth_helper.c  | 71 +++---
 target/arm/tcg/translate-a64.c | 12 +++---
 3 files changed, 68 insertions(+), 19 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index 3d5957c11f..57cfd68569 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -90,9 +90,13 @@ DEF_HELPER_FLAGS_3(pacda, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacdb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacga, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autia, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autia_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autib, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autib_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autda, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autda_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autdb, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autdb_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_2(xpaci, TCG_CALL_NO_RWG_SE, i64, env, i64)
 DEF_HELPER_FLAGS_2(xpacd, TCG_CALL_NO_RWG_SE, i64, env, i64)
 
diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index b0282d1a05..278d6d36bc 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -396,7 +396,8 @@ static uint64_t pauth_original_ptr(uint64_t ptr, 
ARMVAParameters param)
 }
 
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
-   ARMPACKey *key, bool data, int keynumber)
+   ARMPACKey *key, bool data, int keynumber,
+   uintptr_t ra, bool is_combined)
 {
 ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
@@ -518,44 +519,88 @@ uint64_t HELPER(pacga)(CPUARMState *env, uint64_t x, 
uint64_t y)
 return pac & 0xull;
 }
 
-uint64_t HELPER(autia)(CPUARMState *env, uint64_t x, uint64_t y)
+static uint64_t pauth_autia(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnIA)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apia, false, 0);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apia, false, 0, ra, is_combined);
 }
 
-uint64_t HELPER(autib)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autia)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autia(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autia_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autia(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autib(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnIB)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apib, false, 1);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apib, false, 1, ra, is_combined);
 }
 
-uint64_t HELPER(autda)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autib)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autib(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autib_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autib(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autda(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnDA)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apda, true, 0);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apda, true, 0, ra, is_combined);
 }
 
-uint64_t HELPER(autdb)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autda)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autda(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autda_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autda(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autdb(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int e

Re: [PATCH v2 7/7] target/arm: Add CPU properties for most v8.3 PAC features

2023-03-22 Thread Aaron Lindsay
On Feb 22 12:14, Richard Henderson wrote:
> On 2/22/23 09:35, Aaron Lindsay wrote:
> > +static Property arm_cpu_pauth2_property =
> > +DEFINE_PROP_BOOL("pauth2", ARMCPU, prop_pauth2, false);
> > +static Property arm_cpu_pauth_fpac_property =
> > +DEFINE_PROP_BOOL("pauth-fpac", ARMCPU, prop_pauth_fpac, false);
> > +static Property arm_cpu_pauth_fpac_combine_property =
> > +DEFINE_PROP_BOOL("pauth-fpac-combine", ARMCPU, 
> > prop_pauth_fpac_combine, false);
> 
> For -cpu max, I would expect these to default on.
> Or perhaps not expose these or epac as properties at all.

I've removed these properties, and epac's as well. It now defaults to
the equivalent of prop_pauth_fpac_combine==true in my previous patch.

> I see that qarma3 does about half the work of qarma5, so it would be
> interesting to measure the relative speed of the 3 implementations on a boot
> of kernel + selftests.
> 
> You may want to look a the code generated and play with flatten and noinline
> attributes around pauth_computepac and subroutines.  E.g.
> 
> static uint64_t __attribute__((flatten, noinline))
> pauth_computepac_qarma5(uint64_t data, uint64_t modifier, ARMPACKey key)
> {
> return pauth_computepac_architected(data, modifier, key, false);
> }
> 
> static uint64_t __attribute__((flatten, noinline))
> pauth_computepac_qarma3(uint64_t data, uint64_t modifier, ARMPACKey key)
> {
> return pauth_computepac_architected(data, modifier, key, true);
> }
> 
> static uint64_t __attribute__((flatten, noinline))
> pauth_computepac_impdef(uint64_t data, uint64_t modifier, ARMPACKey key)
> {
> return qemu_xxhash64_4(data, modifier, key.lo, key.hi);
> }
> 
> static uint64_t pauth_computepac(CPUARMState *env, uint64_t data,
>  uint64_t modifier, ARMPACKey key)
> {
> if (cpu_isar_feature(aa64_pauth_arch_qarma5, env_archcpu(env))) {
> return pauth_computepac_qarma5(data, modifier, key);
> } else if (cpu_isar_feature(aa64_pauth_arch_qarma3, env_archcpu(env))) {
> return pauth_computepac_qarma3(data, modifier, key);
> } else {
> return pauth_computepac_impdef(data, modifier, key);
> }
> }

I have not played around with this further. Do you feel this is
important to look into prior to merging this patchset (since QARMA3
isn't the default)?

-Aaron



Re: [PATCH v2 6/7] target/arm: Implement v8.3 FPAC and FPACCOMBINE

2023-03-22 Thread Aaron Lindsay
On Feb 22 11:37, Richard Henderson wrote:
> On 2/22/23 09:35, Aaron Lindsay wrote:
> > @@ -406,6 +421,16 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t 
> > ptr, uint64_t modifier,
> >   uint64_t xor_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit + 
> > 1) &
> >   ~MAKE_64BIT_MASK(55, 1);
> >   result = ((ptr ^ pac) & xor_mask) | (ptr & ~xor_mask);
> > +if (cpu_isar_feature(aa64_fpac_combine, env_archcpu(env)) ||
> > +(cpu_isar_feature(aa64_fpac, env_archcpu(env)) &&
> > + !is_combined)) {
> 
> Indentation is off.

I pulled `env_archcpu(env)` out of this if-statement in my latest
patchset, in addition to fixing the indentation, but I am not confident I have
done what you intended. The QEMU Coding Style guide doesn't seem to
address longer statements like this in its section on indentation, so I
attempted to follow other examples in the code, but I'll take further
direction here.
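
Concretely, this is what v3 now has (the hoisted lookup, with the '||'
leading the continuation line), excerpted from the 7/8 patch:

    ARMCPU *cpu = env_archcpu(env);
    ...
    if (cpu_isar_feature(aa64_fpac_combine, cpu)
        || (cpu_isar_feature(aa64_fpac, cpu) && !is_combined)) {
        ...
    }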

-Aaron



[PATCH v3 4/8] target/arm: Implement v8.3 EnhancedPAC

2023-03-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/tcg/pauth_helper.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 122c208de2..7682f139ef 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -326,6 +326,7 @@ static uint64_t pauth_computepac(CPUARMState *env, uint64_t 
data,
 static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
  ARMPACKey *key, bool data)
 {
+ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data);
 uint64_t pac, ext_ptr, ext, test;
@@ -351,11 +352,15 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
  */
 test = sextract64(ptr, bot_bit, top_bit - bot_bit);
 if (test != 0 && test != -1) {
-/*
- * Note that our top_bit is one greater than the pseudocode's
- * version, hence "- 2" here.
- */
-pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+if (cpu_isar_feature(aa64_pauth_epac, cpu)) {
+pac = 0;
+} else {
+/*
+ * Note that our top_bit is one greater than the pseudocode's
+ * version, hence "- 2" here.
+ */
+pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+}
 }
 
 /*
-- 
2.25.1




[PATCH v3 1/8] target/arm: Add ID_AA64ISAR2_EL1

2023-03-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h | 1 +
 target/arm/helper.c  | 4 ++--
 target/arm/hvf/hvf.c | 1 +
 target/arm/kvm64.c   | 2 ++
 4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c097cae988..f0f27f259d 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1015,6 +1015,7 @@ struct ArchCPU {
 uint32_t dbgdevid1;
 uint64_t id_aa64isar0;
 uint64_t id_aa64isar1;
+uint64_t id_aa64isar2;
 uint64_t id_aa64pfr0;
 uint64_t id_aa64pfr1;
 uint64_t id_aa64mmfr0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 2297626bfb..32426495c0 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8204,11 +8204,11 @@ void register_cp_regs_for_features(ARMCPU *cpu)
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
   .resetvalue = cpu->isar.id_aa64isar1 },
-{ .name = "ID_AA64ISAR2_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
+{ .name = "ID_AA64ISAR2_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 2,
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
-  .resetvalue = 0 },
+  .resetvalue = cpu->isar.id_aa64isar2 },
 { .name = "ID_AA64ISAR3_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 3,
   .access = PL1_R, .type = ARM_CP_CONST,
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index ad65603445..4d7366b761 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -507,6 +507,7 @@ static bool hvf_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
  { HV_SYS_REG_ID_AA64DFR1_EL1, &host_isar.id_aa64dfr1 },
  { HV_SYS_REG_ID_AA64ISAR0_EL1, &host_isar.id_aa64isar0 },
  { HV_SYS_REG_ID_AA64ISAR1_EL1, &host_isar.id_aa64isar1 },
+{ HV_SYS_REG_ID_AA64ISAR2_EL1, &host_isar.id_aa64isar2 },
  { HV_SYS_REG_ID_AA64MMFR0_EL1, &host_isar.id_aa64mmfr0 },
  { HV_SYS_REG_ID_AA64MMFR1_EL1, &host_isar.id_aa64mmfr1 },
  { HV_SYS_REG_ID_AA64MMFR2_EL1, &host_isar.id_aa64mmfr2 },
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 1197253d12..4b71306f92 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -590,6 +590,8 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
   ARM64_SYS_REG(3, 0, 0, 6, 0));
err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar1,
  ARM64_SYS_REG(3, 0, 0, 6, 1));
+err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar2,
+  ARM64_SYS_REG(3, 0, 0, 6, 2));
err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr0,
  ARM64_SYS_REG(3, 0, 0, 7, 0));
err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr1,
-- 
2.25.1




[PATCH v3 8/8] target/arm: Add CPU property for QARMA3, enable FPACCombined by default

2023-03-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h   |  1 +
 target/arm/cpu64.c | 48 +++---
 2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 868d844d5a..80683c428f 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1053,6 +1053,7 @@ struct ArchCPU {
  */
 bool prop_pauth;
 bool prop_pauth_impdef;
+bool prop_pauth_qarma3;
 bool prop_lpa2;
 
 /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 0fb07cc7b6..a5f4540c73 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -591,9 +591,6 @@ static void aarch64_add_sme_properties(Object *obj)
 
 void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 {
-int arch_val = 0, impdef_val = 0;
-uint64_t t;
-
 /* Exit early if PAuth is enabled, and fall through to disable it */
 if ((kvm_enabled() || hvf_enabled()) && cpu->prop_pauth) {
 if (!cpu_isar_feature(aa64_pauth, cpu)) {
@@ -604,30 +601,50 @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 return;
 }
 
-/* TODO: Handle HaveEnhancedPAC, HaveEnhancedPAC2, HaveFPAC. */
+/* Write the features into the correct field for the algorithm in use */
 if (cpu->prop_pauth) {
+uint64_t t;
+
+if (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) {
+error_setg(errp, "Cannot set both qarma3 ('pauth-qarma3') and "
+"impdef ('pauth-impdef') pointer authentication ciphers");
+return;
+}
+
+/* Implement FEAT_FPACCOMBINE for address authentication and enable
+ * generic authentication for the chosen cipher.
+ */
+int address_auth = 0b0101;
+int generic_auth = 0b0001;
+
 if (cpu->prop_pauth_impdef) {
-impdef_val = 1;
+t = cpu->isar.id_aa64isar1;
+t = FIELD_DP64(t, ID_AA64ISAR1, API, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR1, GPI, generic_auth);
+cpu->isar.id_aa64isar1 = t;
+} else if (cpu->prop_pauth_qarma3) {
+t = cpu->isar.id_aa64isar2;
+t = FIELD_DP64(t, ID_AA64ISAR2, APA3, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR2, GPA3, generic_auth);
+cpu->isar.id_aa64isar2 = t;
 } else {
-arch_val = 1;
+t = cpu->isar.id_aa64isar1;
+t = FIELD_DP64(t, ID_AA64ISAR1, APA, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR1, GPA, generic_auth);
+cpu->isar.id_aa64isar1 = t;
 }
-} else if (cpu->prop_pauth_impdef) {
-error_setg(errp, "cannot enable pauth-impdef without pauth");
+} else if (cpu->prop_pauth_impdef || cpu->prop_pauth_qarma3) {
+error_setg(errp, "cannot enable pauth-impdef or pauth-qarma3 without 
pauth");
 error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
 }
-
-t = cpu->isar.id_aa64isar1;
-t = FIELD_DP64(t, ID_AA64ISAR1, APA, arch_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, GPA, arch_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, API, impdef_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, GPI, impdef_val);
-cpu->isar.id_aa64isar1 = t;
 }
 
 static Property arm_cpu_pauth_property =
 DEFINE_PROP_BOOL("pauth", ARMCPU, prop_pauth, true);
 static Property arm_cpu_pauth_impdef_property =
 DEFINE_PROP_BOOL("pauth-impdef", ARMCPU, prop_pauth_impdef, false);
+static Property arm_cpu_pauth_qarma3_property =
+DEFINE_PROP_BOOL("pauth-qarma3", ARMCPU, prop_pauth_qarma3, false);
 
 static void aarch64_add_pauth_properties(Object *obj)
 {
@@ -647,6 +664,7 @@ static void aarch64_add_pauth_properties(Object *obj)
 cpu->prop_pauth = cpu_isar_feature(aa64_pauth, cpu);
 } else {
qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_impdef_property);
+qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_qarma3_property);
 }
 }
 
-- 
2.25.1




[PATCH v3 7/8] target/arm: Implement v8.3 FPAC and FPACCOMBINE

2023-03-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/syndrome.h |  7 +++
 target/arm/tcg/pauth_helper.c | 16 
 2 files changed, 23 insertions(+)

diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index d27d1bc31f..bf79c539d9 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -49,6 +49,7 @@ enum arm_exception_class {
 EC_SYSTEMREGISTERTRAP = 0x18,
 EC_SVEACCESSTRAP  = 0x19,
 EC_ERETTRAP   = 0x1a,
+EC_PACFAIL= 0x1c,
 EC_SMETRAP= 0x1d,
 EC_INSNABORT  = 0x20,
 EC_INSNABORT_SAME_EL  = 0x21,
@@ -231,6 +232,12 @@ static inline uint32_t syn_smetrap(SMEExceptionType etype, 
bool is_16bit)
 | (is_16bit ? 0 : ARM_EL_IL) | etype;
 }
 
+static inline uint32_t syn_pacfail(bool data, int keynumber)
+{
+int error_code = (data << 1) | keynumber;
+return (EC_PACFAIL << ARM_EL_EC_SHIFT) | ARM_EL_IL | error_code;
+}
+
 static inline uint32_t syn_pactrap(void)
 {
 return EC_PACTRAP << ARM_EL_EC_SHIFT;
diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 90ad6453e5..bb3dc7ff54 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -411,6 +411,13 @@ uint64_t pauth_ptr_mask(CPUARMState *env, uint64_t ptr, 
bool data)
 return pauth_ptr_mask_internal(param);
 }
 
+static G_NORETURN
+void pauth_fail_exception(CPUARMState *env, bool data, int keynumber, 
uintptr_t ra)
+{
+int target_el = exception_target_el(env);
+raise_exception_ra(env, EXCP_UDEF, syn_pacfail(data, keynumber), 
target_el, ra);
+}
+
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
ARMPACKey *key, bool data, int keynumber,
uintptr_t ra, bool is_combined)
@@ -430,6 +437,15 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, 
uint64_t modifier,
 uint64_t xor_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit + 1) &
 ~MAKE_64BIT_MASK(55, 1);
 result = ptr ^ (pac & xor_mask);
+if (cpu_isar_feature(aa64_fpac_combine, cpu)
+|| (cpu_isar_feature(aa64_fpac, cpu) && !is_combined)) {
+int fpac_top = param.tbi ? 55 : 64;
+uint64_t fpac_mask = MAKE_64BIT_MASK(bot_bit, fpac_top - bot_bit);
+test = (result ^ sextract64(result, 55, 1)) & fpac_mask;
+if (unlikely(test)) {
+pauth_fail_exception(env, data, keynumber, ra);
+}
+}
 } else {
 test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
 if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
-- 
2.25.1




[PATCH v3 5/8] target/arm: Implement v8.3 Pauth2

2023-03-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/tcg/pauth_helper.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 7682f139ef..1148a21ce6 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -352,7 +352,9 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
  */
 test = sextract64(ptr, bot_bit, top_bit - bot_bit);
 if (test != 0 && test != -1) {
-if (cpu_isar_feature(aa64_pauth_epac, cpu)) {
+if (cpu_isar_feature(aa64_pauth2, cpu)) {
+/* No action required */
+} else if (cpu_isar_feature(aa64_pauth_epac, cpu)) {
 pac = 0;
 } else {
 /*
@@ -367,6 +369,9 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
  * Preserve the determination between upper and lower at bit 55,
  * and insert pointer authentication code.
  */
+if (cpu_isar_feature(aa64_pauth2, cpu)) {
+pac ^= ptr;
+}
 if (param.tbi) {
 ptr &= ~MAKE_64BIT_MASK(bot_bit, 55 - bot_bit + 1);
 pac &= MAKE_64BIT_MASK(bot_bit, 54 - bot_bit + 1);
@@ -409,26 +414,34 @@ uint64_t pauth_ptr_mask(CPUARMState *env, uint64_t ptr, 
bool data)
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
ARMPACKey *key, bool data, int keynumber)
 {
+ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data);
 int bot_bit, top_bit;
-uint64_t pac, orig_ptr, test;
+uint64_t pac, orig_ptr, test, result;
 
 orig_ptr = pauth_original_ptr(ptr, param);
 pac = pauth_computepac(env, orig_ptr, modifier, *key);
 bot_bit = 64 - param.tsz;
 top_bit = 64 - 8 * param.tbi;
 
-test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
-if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
-int error_code = (keynumber << 1) | (keynumber ^ 1);
-if (param.tbi) {
-return deposit64(orig_ptr, 53, 2, error_code);
-} else {
-return deposit64(orig_ptr, 61, 2, error_code);
+if (cpu_isar_feature(aa64_pauth2, cpu)) {
+uint64_t xor_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit + 1) &
+~MAKE_64BIT_MASK(55, 1);
+result = ptr ^ (pac & xor_mask);
+} else {
+test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
+if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
+int error_code = (keynumber << 1) | (keynumber ^ 1);
+if (param.tbi) {
+return deposit64(orig_ptr, 53, 2, error_code);
+} else {
+return deposit64(orig_ptr, 61, 2, error_code);
+}
 }
+result = orig_ptr;
 }
-return orig_ptr;
+return result;
 }
 
 static uint64_t pauth_strip(CPUARMState *env, uint64_t ptr, bool data)
-- 
2.25.1




[PATCH v3 3/8] target/arm: Implement v8.3 QARMA3 PAC cipher

2023-03-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/tcg/pauth_helper.c | 54 ---
 1 file changed, 44 insertions(+), 10 deletions(-)

diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 6bb3b5b9e5..122c208de2 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -96,6 +96,21 @@ static uint64_t pac_sub(uint64_t i)
 return o;
 }
 
+static uint64_t pac_sub1(uint64_t i)
+{
+static const uint8_t sub1[16] = {
+0xa, 0xd, 0xe, 0x6, 0xf, 0x7, 0x3, 0x5,
+0x9, 0x8, 0x0, 0xc, 0xb, 0x1, 0x2, 0x4,
+};
+uint64_t o = 0;
+int b;
+
+for (b = 0; b < 64; b += 4) {
+o |= (uint64_t)sub1[(i >> b) & 0xf] << b;
+}
+return o;
+}
+
 static uint64_t pac_inv_sub(uint64_t i)
 {
 static const uint8_t inv_sub[16] = {
@@ -209,7 +224,7 @@ static uint64_t tweak_inv_shuffle(uint64_t i)
 }
 
 static uint64_t pauth_computepac_architected(uint64_t data, uint64_t modifier,
- ARMPACKey key)
+ ARMPACKey key, bool isqarma3)
 {
 static const uint64_t RC[5] = {
 0xull,
@@ -219,6 +234,7 @@ static uint64_t pauth_computepac_architected(uint64_t data, 
uint64_t modifier,
 0x452821E638D01377ull,
 };
 const uint64_t alpha = 0xC0AC29B7C97C50DDull;
+int iterations = isqarma3 ? 2 : 4;
 /*
  * Note that in the ARM pseudocode, key0 contains bits <127:64>
  * and key1 contains bits <63:0> of the 128-bit key.
@@ -231,7 +247,7 @@ static uint64_t pauth_computepac_architected(uint64_t data, 
uint64_t modifier,
 runningmod = modifier;
 workingval = data ^ key0;
 
-for (i = 0; i <= 4; ++i) {
+for (i = 0; i <= iterations; ++i) {
 roundkey = key1 ^ runningmod;
 workingval ^= roundkey;
 workingval ^= RC[i];
@@ -239,32 +255,48 @@ static uint64_t pauth_computepac_architected(uint64_t 
data, uint64_t modifier,
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
 }
-workingval = pac_sub(workingval);
+if (isqarma3) {
+workingval = pac_sub1(workingval);
+} else {
+workingval = pac_sub(workingval);
+}
 runningmod = tweak_shuffle(runningmod);
 }
 roundkey = modk0 ^ runningmod;
 workingval ^= roundkey;
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
-workingval = pac_sub(workingval);
+if (isqarma3) {
+workingval = pac_sub1(workingval);
+} else {
+workingval = pac_sub(workingval);
+}
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
 workingval ^= key1;
 workingval = pac_cell_inv_shuffle(workingval);
-workingval = pac_inv_sub(workingval);
+if (isqarma3) {
+workingval = pac_sub1(workingval);
+} else {
+workingval = pac_inv_sub(workingval);
+}
 workingval = pac_mult(workingval);
 workingval = pac_cell_inv_shuffle(workingval);
 workingval ^= key0;
 workingval ^= runningmod;
-for (i = 0; i <= 4; ++i) {
-workingval = pac_inv_sub(workingval);
-if (i < 4) {
+for (i = 0; i <= iterations; ++i) {
+if (isqarma3) {
+workingval = pac_sub1(workingval);
+} else {
+workingval = pac_inv_sub(workingval);
+}
+if (i < iterations) {
 workingval = pac_mult(workingval);
 workingval = pac_cell_inv_shuffle(workingval);
 }
 runningmod = tweak_inv_shuffle(runningmod);
 roundkey = key1 ^ runningmod;
-workingval ^= RC[4 - i];
+workingval ^= RC[iterations - i];
 workingval ^= roundkey;
 workingval ^= alpha;
 }
@@ -283,7 +315,9 @@ static uint64_t pauth_computepac(CPUARMState *env, uint64_t 
data,
  uint64_t modifier, ARMPACKey key)
 {
 if (cpu_isar_feature(aa64_pauth_arch_qarma5, env_archcpu(env))) {
-return pauth_computepac_architected(data, modifier, key);
+return pauth_computepac_architected(data, modifier, key, false);
+} else if (cpu_isar_feature(aa64_pauth_arch_qarma3, env_archcpu(env))) {
+return pauth_computepac_architected(data, modifier, key, true);
 } else {
 return pauth_computepac_impdef(data, modifier, key);
 }
-- 
2.25.1




[PATCH v3 0/8] Implement Most ARMv8.3 Pointer Authentication Features

2023-03-22 Thread Aaron Lindsay
Changes from v2 of this patchset [0]:
- Remove properties for EPAC, Pauth2, FPAC, FPACCombined
- Put aa64isar2 addition/initialization into separate patch
- Clarified several comments (particularly one regarding our divergence
  from ARM's pseudocode around EPAC feature-detection)
- Several code formatting fixes and logic simplifications

[0] - https://lists.nongnu.org/archive/html/qemu-devel/2023-02/msg06494.html

Aaron Lindsay (8):
  target/arm: Add ID_AA64ISAR2_EL1
  target/arm: v8.3 PAC ID_AA64ISAR[12] feature-detection
  target/arm: Implement v8.3 QARMA3 PAC cipher
  target/arm: Implement v8.3 EnhancedPAC
  target/arm: Implement v8.3 Pauth2
  target/arm: Inform helpers whether a PAC instruction is 'combined'
  target/arm: Implement v8.3 FPAC and FPACCOMBINE
  target/arm: Add CPU property for QARMA3, enable FPACCombined by
default

 target/arm/cpu.h   |  67 +++-
 target/arm/cpu64.c |  48 ++---
 target/arm/helper-a64.h|   4 +
 target/arm/helper.c|   4 +-
 target/arm/hvf/hvf.c   |   1 +
 target/arm/kvm64.c |   2 +
 target/arm/syndrome.h  |   7 ++
 target/arm/tcg/pauth_helper.c  | 189 ++---
 target/arm/tcg/translate-a64.c |  20 ++--
 9 files changed, 274 insertions(+), 68 deletions(-)

-- 
2.25.1




[PATCH v3 2/8] target/arm: v8.3 PAC ID_AA64ISAR[12] feature-detection

2023-03-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h  | 65 +--
 target/arm/tcg/pauth_helper.c |  2 +-
 2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index f0f27f259d..868d844d5a 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3705,18 +3705,77 @@ static inline bool isar_feature_aa64_pauth(const 
ARMISARegisters *id)
 (FIELD_DP64(0, ID_AA64ISAR1, APA, 0xf) |
  FIELD_DP64(0, ID_AA64ISAR1, API, 0xf) |
  FIELD_DP64(0, ID_AA64ISAR1, GPA, 0xf) |
- FIELD_DP64(0, ID_AA64ISAR1, GPI, 0xf))) != 0;
+ FIELD_DP64(0, ID_AA64ISAR1, GPI, 0xf))) != 0 ||
+   (id->id_aa64isar2 &
+(FIELD_DP64(0, ID_AA64ISAR2, APA3, 0xf) |
+ FIELD_DP64(0, ID_AA64ISAR2, GPA3, 0xf))) != 0;
 }
 
-static inline bool isar_feature_aa64_pauth_arch(const ARMISARegisters *id)
+static inline bool isar_feature_aa64_pauth_arch_qarma5(const ARMISARegisters 
*id)
 {
 /*
- * Return true if pauth is enabled with the architected QARMA algorithm.
+ * Return true if pauth is enabled with the architected QARMA5 algorithm.
  * QEMU will always set APA+GPA to the same value.
  */
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA) != 0;
 }
 
+static inline bool isar_feature_aa64_pauth_arch_qarma3(const ARMISARegisters 
*id)
+{
+/*
+ * Return true if pauth is enabled with the architected QARMA3 algorithm.
+ * QEMU will always set APA3+GPA3 to the same result.
+ */
+return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3) != 0;
+}
+
+static inline bool isar_feature_aa64_pauth_arch(const ARMISARegisters *id)
+{
+return isar_feature_aa64_pauth_arch_qarma5(id) ||
+isar_feature_aa64_pauth_arch_qarma3(id);
+}
+
+static inline int isar_feature_pauth_get_features(const ARMISARegisters *id)
+{
+if (isar_feature_aa64_pauth_arch_qarma5(id)) {
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA);
+} else if (isar_feature_aa64_pauth_arch_qarma3(id)) {
+return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3);
+} else {
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, API);
+}
+}
+
+static inline bool isar_feature_aa64_pauth_epac(const ARMISARegisters *id)
+{
+/*
+ * Note that unlike most AArch64 features, EPAC is treated (in the ARM
+ * pseudocode, at least) as not being implemented by larger values of this
+ * field. Our usage of '>=' rather than '==' here causes our implementation
+ * of PAC logic to diverge from ARM pseudocode - we must check that
+ * isar_feature_aa64_pauth2() returns false AND
+ * isar_feature_aa64_pauth_epac() returns true, where the pseudocode reads
+ * as if EPAC is not implemented if the value of this register is > 0b10.
+ * See the implementation of pauth_addpac() for an example.
+ */
+return isar_feature_pauth_get_features(id) >= 0b0010;
+}
+
+static inline bool isar_feature_aa64_pauth2(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) >= 0b0011;
+}
+
+static inline bool isar_feature_aa64_fpac(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) >= 0b0100;
+}
+
+static inline bool isar_feature_aa64_fpac_combine(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) >= 0b0101;
+}
+
 static inline bool isar_feature_aa64_tlbirange(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TLB) == 2;
diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 20f347332d..6bb3b5b9e5 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -282,7 +282,7 @@ static uint64_t pauth_computepac_impdef(uint64_t data, 
uint64_t modifier,
 static uint64_t pauth_computepac(CPUARMState *env, uint64_t data,
  uint64_t modifier, ARMPACKey key)
 {
-if (cpu_isar_feature(aa64_pauth_arch, env_archcpu(env))) {
+if (cpu_isar_feature(aa64_pauth_arch_qarma5, env_archcpu(env))) {
 return pauth_computepac_architected(data, modifier, key);
 } else {
 return pauth_computepac_impdef(data, modifier, key);
-- 
2.25.1




[PATCH v3 6/8] target/arm: Inform helpers whether a PAC instruction is 'combined'

2023-03-22 Thread Aaron Lindsay
An instruction is a 'combined' Pointer Authentication instruction if it
does something in addition to PAC - for instance, branching to or
loading an address from the authenticated pointer. Knowing whether a PAC
operation is 'combined' is needed to implement the FPACCOMBINE feature
for ARMv8.3.

Signed-off-by: Aaron Lindsay 
Reviewed-by: Richard Henderson 
---
 target/arm/helper-a64.h|  4 ++
 target/arm/tcg/pauth_helper.c  | 71 +++---
 target/arm/tcg/translate-a64.c | 20 +-
 3 files changed, 72 insertions(+), 23 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index ff56807247..79d06e820a 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -90,9 +90,13 @@ DEF_HELPER_FLAGS_3(pacda, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacdb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacga, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autia, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autia_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autib, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autib_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autda, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autda_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autdb, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autdb_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_2(xpaci, TCG_CALL_NO_RWG_SE, i64, env, i64)
 DEF_HELPER_FLAGS_2(xpacd, TCG_CALL_NO_RWG_SE, i64, env, i64)
 
diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 1148a21ce6..90ad6453e5 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -412,7 +412,8 @@ uint64_t pauth_ptr_mask(CPUARMState *env, uint64_t ptr, 
bool data)
 }
 
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
-   ARMPACKey *key, bool data, int keynumber)
+   ARMPACKey *key, bool data, int keynumber,
+   uintptr_t ra, bool is_combined)
 {
 ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
@@ -534,44 +535,88 @@ uint64_t HELPER(pacga)(CPUARMState *env, uint64_t x, 
uint64_t y)
 return pac & 0xull;
 }
 
-uint64_t HELPER(autia)(CPUARMState *env, uint64_t x, uint64_t y)
+static uint64_t pauth_autia(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnIA)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apia, false, 0);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apia, false, 0, ra, is_combined);
 }
 
-uint64_t HELPER(autib)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autia)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autia(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autia_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autia(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autib(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnIB)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apib, false, 1);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apib, false, 1, ra, is_combined);
 }
 
-uint64_t HELPER(autda)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autib)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autib(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autib_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autib(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autda(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnDA)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apda, true, 0);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apda, true, 0, ra, is_combined);
 }
 
-uint64_t HELPER(autdb)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autda)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autda(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autda_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autda(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autdb(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int e

[PATCH v2 6/7] target/arm: Implement v8.3 FPAC and FPACCOMBINE

2023-02-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/pauth_helper.c | 35 ++-
 target/arm/syndrome.h |  7 +++
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index 96770d7860..db6cf9b5bc 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -388,9 +388,24 @@ static uint64_t pauth_original_ptr(uint64_t ptr, 
ARMVAParameters param)
 return deposit64(ptr, bot_pac_bit, top_pac_bit - bot_pac_bit, extfield);
 }
 
+static G_NORETURN
+void pauth_fail_exception(CPUARMState *env, bool data, int keynumber, 
uintptr_t ra)
+{
+int target_el = arm_current_el(env);
+if (target_el == 0) {
+uint64_t hcr = arm_hcr_el2_eff(env);
+if (arm_is_el2_enabled(env) && (hcr & HCR_TGE))
+target_el = 2;
+else
+target_el = 1;
+}
+
+raise_exception_ra(env, EXCP_UDEF, syn_pacfail(data, keynumber), 
target_el, ra);
+}
+
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
ARMPACKey *key, bool data, int keynumber,
-   bool is_combined)
+   uintptr_t ra, bool is_combined)
 {
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data);
@@ -406,6 +421,16 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, 
uint64_t modifier,
 uint64_t xor_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit + 1) &
 ~MAKE_64BIT_MASK(55, 1);
 result = ((ptr ^ pac) & xor_mask) | (ptr & ~xor_mask);
+if (cpu_isar_feature(aa64_fpac_combine, env_archcpu(env)) ||
+(cpu_isar_feature(aa64_fpac, env_archcpu(env)) &&
+ !is_combined)) {
+int fpac_top = param.tbi ? 55 : 64;
+uint64_t fpac_mask = MAKE_64BIT_MASK(bot_bit, fpac_top - bot_bit);
+test = (result ^ sextract64(result, 55, 1)) & fpac_mask;
+if (unlikely(test)) {
+pauth_fail_exception(env, data, keynumber, ra);
+}
+}
 } else {
 test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
 if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
@@ -519,7 +544,7 @@ static uint64_t pauth_autia(CPUARMState *env, uint64_t x, 
uint64_t y,
 return x;
 }
 pauth_check_trap(env, el, ra);
-return pauth_auth(env, x, y, &env->keys.apia, false, 0, is_combined);
+return pauth_auth(env, x, y, &env->keys.apia, false, 0, ra, is_combined);
 }
 
 uint64_t HELPER(autia)(CPUARMState *env, uint64_t x, uint64_t y)
@@ -540,7 +565,7 @@ static uint64_t pauth_autib(CPUARMState *env, uint64_t x, 
uint64_t y,
 return x;
 }
 pauth_check_trap(env, el, ra);
-return pauth_auth(env, x, y, &env->keys.apib, false, 1, is_combined);
+return pauth_auth(env, x, y, &env->keys.apib, false, 1, ra, is_combined);
 }
 
 uint64_t HELPER(autib)(CPUARMState *env, uint64_t x, uint64_t y)
@@ -561,7 +586,7 @@ static uint64_t pauth_autda(CPUARMState *env, uint64_t x, 
uint64_t y,
 return x;
 }
 pauth_check_trap(env, el, ra);
-return pauth_auth(env, x, y, &env->keys.apda, true, 0, is_combined);
+return pauth_auth(env, x, y, &env->keys.apda, true, 0, ra, is_combined);
 }
 
 uint64_t HELPER(autda)(CPUARMState *env, uint64_t x, uint64_t y)
@@ -582,7 +607,7 @@ static uint64_t pauth_autdb(CPUARMState *env, uint64_t x, 
uint64_t y,
 return x;
 }
 pauth_check_trap(env, el, ra);
-return pauth_auth(env, x, y, &env->keys.apdb, true, 1, is_combined);
+return pauth_auth(env, x, y, &env->keys.apdb, true, 1, ra, is_combined);
 }
 
 uint64_t HELPER(autdb)(CPUARMState *env, uint64_t x, uint64_t y)
diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index 73df5e3793..99ed4c7d3d 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -48,6 +48,7 @@ enum arm_exception_class {
 EC_AA64_SMC   = 0x17,
 EC_SYSTEMREGISTERTRAP = 0x18,
 EC_SVEACCESSTRAP  = 0x19,
+EC_PACFAIL= 0x1c,
 EC_SMETRAP= 0x1d,
 EC_INSNABORT  = 0x20,
 EC_INSNABORT_SAME_EL  = 0x21,
@@ -221,6 +222,12 @@ static inline uint32_t syn_smetrap(SMEExceptionType etype, 
bool is_16bit)
 | (is_16bit ? 0 : ARM_EL_IL) | etype;
 }
 
+static inline uint32_t syn_pacfail(bool data, int keynumber)
+{
+int error_code = ((data ? 1 : 0) << 1) | (keynumber);
+return (EC_PACFAIL << ARM_EL_EC_SHIFT) | ARM_EL_IL | error_code;
+}
+
 static inline uint32_t syn_pactrap(void)
 {
 return EC_PACTRAP << ARM_EL_EC_SHIFT;
-- 
2.25.1




[PATCH v2 3/7] target/arm: Implement v8.3 EnhancedPAC

2023-02-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
---
 target/arm/pauth_helper.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index f525ef7fad..a83956652f 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -347,11 +347,15 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
  */
 test = sextract64(ptr, bot_bit, top_bit - bot_bit);
 if (test != 0 && test != -1) {
-/*
- * Note that our top_bit is one greater than the pseudocode's
- * version, hence "- 2" here.
- */
-pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+if (cpu_isar_feature(aa64_pauth_epac, env_archcpu(env))) {
+pac = 0;
+} else {
+/*
+ * Note that our top_bit is one greater than the pseudocode's
+ * version, hence "- 2" here.
+ */
+pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+}
 }
 
 /*
-- 
2.25.1




[PATCH v2 7/7] target/arm: Add CPU properties for most v8.3 PAC features

2023-02-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h   |  5 +++
 target/arm/cpu64.c | 81 ++
 2 files changed, 72 insertions(+), 14 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9c3cbc9a29..40b4631f11 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1039,6 +1039,11 @@ struct ArchCPU {
  */
 bool prop_pauth;
 bool prop_pauth_impdef;
+bool prop_pauth_qarma3;
+bool prop_pauth_epac;
+bool prop_pauth2; // also known as EnhancedPAC2/EPAC2
+bool prop_pauth_fpac;
+bool prop_pauth_fpac_combine;
 bool prop_lpa2;
 
 /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 0e021960fb..315acabbe2 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -590,8 +590,7 @@ static void aarch64_add_sme_properties(Object *obj)
 
 void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 {
-int arch_val = 0, impdef_val = 0;
-uint64_t t;
+int address_auth = 0, generic_auth = 0;
 
 /* Exit early if PAuth is enabled, and fall through to disable it */
 if ((kvm_enabled() || hvf_enabled()) && cpu->prop_pauth) {
@@ -603,30 +602,79 @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 return;
 }
 
-/* TODO: Handle HaveEnhancedPAC, HaveEnhancedPAC2, HaveFPAC. */
+if (cpu->prop_pauth_epac &&
+(cpu->prop_pauth2 ||
+ cpu->prop_pauth_fpac ||
+ cpu->prop_pauth_fpac_combine)) {
+error_setg(errp, "'pauth-epac' feature not compatible with any of "
+   "'pauth-2', 'pauth-fpac', or 'pauth-fpac-combine'");
+return;
+}
+
+/* Determine the PAC features independently of the algorithm */
+if (cpu->prop_pauth_fpac_combine) {
+address_auth = 0b0101;
+} else if (cpu->prop_pauth_fpac) {
+address_auth = 0b0100;
+} else if (cpu->prop_pauth2) {
+address_auth = 0b0011;
+} else if (cpu->prop_pauth_epac) {
+address_auth = 0b0010;
+}
+
+/* Write the features into the correct field for the algorithm in use */
 if (cpu->prop_pauth) {
+uint64_t t;
+
+if (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) {
+error_setg(errp, "Cannot set both qarma3 ('pauth-qarma3') and "
+"impdef ('pauth-impdef') pointer authentication ciphers");
+return;
+}
+
+if (address_auth == 0)
+address_auth = 0b0001;
+generic_auth = 1;
+
 if (cpu->prop_pauth_impdef) {
-impdef_val = 1;
+t = cpu->isar.id_aa64isar1;
+t = FIELD_DP64(t, ID_AA64ISAR1, API, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR1, GPI, generic_auth);
+cpu->isar.id_aa64isar1 = t;
+} else if (cpu->prop_pauth_qarma3) {
+t = cpu->isar.id_aa64isar2;
+t = FIELD_DP64(t, ID_AA64ISAR2, APA3, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR2, GPA3, generic_auth);
+cpu->isar.id_aa64isar2 = t;
 } else {
-arch_val = 1;
+t = cpu->isar.id_aa64isar1;
+t = FIELD_DP64(t, ID_AA64ISAR1, APA, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR1, GPA, generic_auth);
+cpu->isar.id_aa64isar1 = t;
 }
-} else if (cpu->prop_pauth_impdef) {
-error_setg(errp, "cannot enable pauth-impdef without pauth");
+} else if (cpu->prop_pauth_impdef || cpu->prop_pauth_qarma3) {
+error_setg(errp, "cannot enable pauth-impdef or pauth-qarma3 without 
pauth");
+error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
+} else if (address_auth != 0) {
+error_setg(errp, "cannot enable any pauth* features without pauth");
 error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
 }
-
-t = cpu->isar.id_aa64isar1;
-t = FIELD_DP64(t, ID_AA64ISAR1, APA, arch_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, GPA, arch_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, API, impdef_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, GPI, impdef_val);
-cpu->isar.id_aa64isar1 = t;
 }
 
 static Property arm_cpu_pauth_property =
 DEFINE_PROP_BOOL("pauth", ARMCPU, prop_pauth, true);
 static Property arm_cpu_pauth_impdef_property =
 DEFINE_PROP_BOOL("pauth-impdef", ARMCPU, prop_pauth_impdef, false);
+static Property arm_cpu_pauth_qarma3_property =
+DEFINE_PROP_BOOL("pauth-qarma3", ARMCPU, prop_pauth_qarma3, false);
+static Property arm_cpu_pauth_epac_property =
+DEFINE_PROP_BOOL("pauth-epac", ARMCPU, prop_pauth_epac, false);
+static Property arm_cpu_pauth2_property =
+DEFINE_PROP_BOOL("pauth2", ARMCPU, pro

[PATCH v2 2/7] target/arm: Implement v8.3 QARMA3 PAC cipher

2023-02-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
---
 target/arm/pauth_helper.c | 50 +++
 1 file changed, 40 insertions(+), 10 deletions(-)

diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index e5206453f6..f525ef7fad 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -96,6 +96,21 @@ static uint64_t pac_sub(uint64_t i)
 return o;
 }
 
+static uint64_t pac_sub1(uint64_t i)
+{
+static const uint8_t sub1[16] = {
+0xa, 0xd, 0xe, 0x6, 0xf, 0x7, 0x3, 0x5,
+0x9, 0x8, 0x0, 0xc, 0xb, 0x1, 0x2, 0x4,
+};
+uint64_t o = 0;
+int b;
+
+for (b = 0; b < 64; b += 4) {
+o |= (uint64_t)sub1[(i >> b) & 0xf] << b;
+}
+return o;
+}
+
 static uint64_t pac_inv_sub(uint64_t i)
 {
 static const uint8_t inv_sub[16] = {
@@ -209,7 +224,7 @@ static uint64_t tweak_inv_shuffle(uint64_t i)
 }
 
 static uint64_t pauth_computepac_architected(uint64_t data, uint64_t modifier,
- ARMPACKey key)
+ ARMPACKey key, bool isqarma3)
 {
 static const uint64_t RC[5] = {
 0xull,
@@ -219,6 +234,7 @@ static uint64_t pauth_computepac_architected(uint64_t data, 
uint64_t modifier,
 0x452821E638D01377ull,
 };
 const uint64_t alpha = 0xC0AC29B7C97C50DDull;
+int iterations = isqarma3 ? 2 : 4;
 /*
  * Note that in the ARM pseudocode, key0 contains bits <127:64>
  * and key1 contains bits <63:0> of the 128-bit key.
@@ -231,7 +247,7 @@ static uint64_t pauth_computepac_architected(uint64_t data, 
uint64_t modifier,
 runningmod = modifier;
 workingval = data ^ key0;
 
-for (i = 0; i <= 4; ++i) {
+for (i = 0; i <= iterations; ++i) {
 roundkey = key1 ^ runningmod;
 workingval ^= roundkey;
 workingval ^= RC[i];
@@ -239,32 +255,44 @@ static uint64_t pauth_computepac_architected(uint64_t 
data, uint64_t modifier,
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
 }
-workingval = pac_sub(workingval);
+if (isqarma3)
+workingval = pac_sub1(workingval);
+else
+workingval = pac_sub(workingval);
 runningmod = tweak_shuffle(runningmod);
 }
 roundkey = modk0 ^ runningmod;
 workingval ^= roundkey;
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
-workingval = pac_sub(workingval);
+if (isqarma3)
+workingval = pac_sub1(workingval);
+else
+workingval = pac_sub(workingval);
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
 workingval ^= key1;
 workingval = pac_cell_inv_shuffle(workingval);
-workingval = pac_inv_sub(workingval);
+if (isqarma3)
+workingval = pac_sub1(workingval);
+else
+workingval = pac_inv_sub(workingval);
 workingval = pac_mult(workingval);
 workingval = pac_cell_inv_shuffle(workingval);
 workingval ^= key0;
 workingval ^= runningmod;
-for (i = 0; i <= 4; ++i) {
-workingval = pac_inv_sub(workingval);
-if (i < 4) {
+for (i = 0; i <= iterations; ++i) {
+if (isqarma3)
+workingval = pac_sub1(workingval);
+else
+workingval = pac_inv_sub(workingval);
+if (i < iterations) {
 workingval = pac_mult(workingval);
 workingval = pac_cell_inv_shuffle(workingval);
 }
 runningmod = tweak_inv_shuffle(runningmod);
 roundkey = key1 ^ runningmod;
-workingval ^= RC[4 - i];
+workingval ^= RC[iterations - i];
 workingval ^= roundkey;
 workingval ^= alpha;
 }
@@ -283,7 +311,9 @@ static uint64_t pauth_computepac(CPUARMState *env, uint64_t 
data,
  uint64_t modifier, ARMPACKey key)
 {
 if (cpu_isar_feature(aa64_pauth_arch_qarma5, env_archcpu(env))) {
-return pauth_computepac_architected(data, modifier, key);
+return pauth_computepac_architected(data, modifier, key, false);
+} else if (cpu_isar_feature(aa64_pauth_arch_qarma3, env_archcpu(env))) {
+return pauth_computepac_architected(data, modifier, key, true);
 } else {
 return pauth_computepac_impdef(data, modifier, key);
 }
-- 
2.25.1




[PATCH v2 0/7] Implement Most ARMv8.3 Pointer Authentication Features

2023-02-22 Thread Aaron Lindsay
Changes from v1 of this patchset [0]:

* Changed ISAR feature detection to use '>=' rather than '=='
* Switched around the logic handling EPAC/Pauth2 to play nicely with the
  '>=' ISAR feature detection for EPAC
* Re-organized a stray fragment of a patch to be bisect-friendly
* Moved call-sites for GETPC() to top-level helpers
* Calculate FPAC error code inside syn_pacfail() instead of the
  callsite. 

I have NOT yet made any changes to the properties/documentation (see
"target/arm: Add CPU properties for most v8.3 PAC features") since my
previous patchset - I'm planning to await further discussion about the
appropriate way to organize them before making those changes and
particularly welcome further review there.

-Aaron

[0] https://lists.nongnu.org/archive/html/qemu-devel/2023-02/msg00660.html

Aaron Lindsay (7):
  target/arm: v8.3 PAC ID_AA64ISAR[12] feature-detection
  target/arm: Implement v8.3 QARMA3 PAC cipher
  target/arm: Implement v8.3 EnhancedPAC
  target/arm: Implement v8.3 Pauth2
  target/arm: Inform helpers whether a PAC instruction is 'combined'
  target/arm: Implement v8.3 FPAC and FPACCOMBINE
  target/arm: Add CPU properties for most v8.3 PAC features

 target/arm/cpu.h   |  66 -
 target/arm/cpu64.c |  81 +---
 target/arm/helper-a64.h|   4 +
 target/arm/helper.c|   4 +-
 target/arm/pauth_helper.c  | 192 +
 target/arm/syndrome.h  |   7 ++
 target/arm/translate-a64.c |  20 ++--
 7 files changed, 307 insertions(+), 67 deletions(-)

-- 
2.25.1




[PATCH v2 5/7] target/arm: Inform helpers whether a PAC instruction is 'combined'

2023-02-22 Thread Aaron Lindsay
An instruction is a 'combined' Pointer Authentication instruction if it
does something in addition to PAC - for instance, branching to or
loading an address from the authenticated pointer. Knowing whether a PAC
operation is 'combined' is needed to implement the FPACCOMBINE feature
for ARMv8.3.

Signed-off-by: Aaron Lindsay 
---
 target/arm/helper-a64.h|  4 +++
 target/arm/pauth_helper.c  | 71 +++---
 target/arm/translate-a64.c | 20 +--
 3 files changed, 72 insertions(+), 23 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 7b706571bb..829aaf4919 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -98,9 +98,13 @@ DEF_HELPER_FLAGS_3(pacda, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacdb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacga, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autia, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autia_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autib, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autib_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autda, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autda_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autdb, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autdb_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_2(xpaci, TCG_CALL_NO_RWG_SE, i64, env, i64)
 DEF_HELPER_FLAGS_2(xpacd, TCG_CALL_NO_RWG_SE, i64, env, i64)
 
diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index c4ee040da7..96770d7860 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -389,7 +389,8 @@ static uint64_t pauth_original_ptr(uint64_t ptr, 
ARMVAParameters param)
 }
 
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
-   ARMPACKey *key, bool data, int keynumber)
+   ARMPACKey *key, bool data, int keynumber,
+   bool is_combined)
 {
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data);
@@ -510,44 +511,88 @@ uint64_t HELPER(pacga)(CPUARMState *env, uint64_t x, 
uint64_t y)
 return pac & 0xull;
 }
 
-uint64_t HELPER(autia)(CPUARMState *env, uint64_t x, uint64_t y)
+static uint64_t pauth_autia(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnIA)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apia, false, 0);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apia, false, 0, is_combined);
 }
 
-uint64_t HELPER(autib)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autia)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autia(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autia_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autia(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autib(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnIB)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apib, false, 1);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apib, false, 1, is_combined);
 }
 
-uint64_t HELPER(autda)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autib)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autib(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autib_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autib(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autda(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnDA)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apda, true, 0);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apda, true, 0, is_combined);
 }
 
-uint64_t HELPER(autdb)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autda)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autda(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autda_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autda(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autdb(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabl

[PATCH v2 4/7] target/arm: Implement v8.3 Pauth2

2023-02-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
---
 target/arm/pauth_helper.c | 32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index a83956652f..c4ee040da7 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -347,7 +347,9 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
  */
 test = sextract64(ptr, bot_bit, top_bit - bot_bit);
 if (test != 0 && test != -1) {
-if (cpu_isar_feature(aa64_pauth_epac, env_archcpu(env))) {
+if (cpu_isar_feature(aa64_pauth2, env_archcpu(env))) {
+/* No action required */
+} else if (cpu_isar_feature(aa64_pauth_epac, env_archcpu(env))) {
 pac = 0;
 } else {
 /*
@@ -362,6 +364,9 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
  * Preserve the determination between upper and lower at bit 55,
  * and insert pointer authentication code.
  */
+if (cpu_isar_feature(aa64_pauth2, env_archcpu(env))) {
+pac ^= ptr;
+}
 if (param.tbi) {
 ptr &= ~MAKE_64BIT_MASK(bot_bit, 55 - bot_bit + 1);
 pac &= MAKE_64BIT_MASK(bot_bit, 54 - bot_bit + 1);
@@ -389,23 +394,30 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data);
 int bot_bit, top_bit;
-uint64_t pac, orig_ptr, test;
+uint64_t pac, orig_ptr, test, result;
 
 orig_ptr = pauth_original_ptr(ptr, param);
 pac = pauth_computepac(env, orig_ptr, modifier, *key);
 bot_bit = 64 - param.tsz;
 top_bit = 64 - 8 * param.tbi;
 
-test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
-if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
-int error_code = (keynumber << 1) | (keynumber ^ 1);
-if (param.tbi) {
-return deposit64(orig_ptr, 53, 2, error_code);
-} else {
-return deposit64(orig_ptr, 61, 2, error_code);
+if (cpu_isar_feature(aa64_pauth2, env_archcpu(env))) {
+uint64_t xor_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit + 1) &
+~MAKE_64BIT_MASK(55, 1);
+result = ((ptr ^ pac) & xor_mask) | (ptr & ~xor_mask);
+} else {
+test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
+if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
+int error_code = (keynumber << 1) | (keynumber ^ 1);
+if (param.tbi) {
+return deposit64(orig_ptr, 53, 2, error_code);
+} else {
+return deposit64(orig_ptr, 61, 2, error_code);
+}
 }
+result = orig_ptr;
 }
-return orig_ptr;
+return result;
 }
 
 static uint64_t pauth_strip(CPUARMState *env, uint64_t ptr, bool data)
-- 
2.25.1




[PATCH v2 1/7] target/arm: v8.3 PAC ID_AA64ISAR[12] feature-detection

2023-02-22 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h  | 61 +--
 target/arm/helper.c   |  4 +--
 target/arm/pauth_helper.c |  2 +-
 3 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 8cf70693be..9c3cbc9a29 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1001,6 +1001,7 @@ struct ArchCPU {
 uint32_t dbgdevid1;
 uint64_t id_aa64isar0;
 uint64_t id_aa64isar1;
+uint64_t id_aa64isar2;
 uint64_t id_aa64pfr0;
 uint64_t id_aa64pfr1;
 uint64_t id_aa64mmfr0;
@@ -3902,18 +3903,72 @@ static inline bool isar_feature_aa64_pauth(const 
ARMISARegisters *id)
 (FIELD_DP64(0, ID_AA64ISAR1, APA, 0xf) |
  FIELD_DP64(0, ID_AA64ISAR1, API, 0xf) |
  FIELD_DP64(0, ID_AA64ISAR1, GPA, 0xf) |
- FIELD_DP64(0, ID_AA64ISAR1, GPI, 0xf))) != 0;
+ FIELD_DP64(0, ID_AA64ISAR1, GPI, 0xf))) != 0 ||
+   (id->id_aa64isar2 &
+(FIELD_DP64(0, ID_AA64ISAR2, APA3, 0xf) |
+ FIELD_DP64(0, ID_AA64ISAR2, GPA3, 0xf))) != 0;
 }
 
-static inline bool isar_feature_aa64_pauth_arch(const ARMISARegisters *id)
+static inline bool isar_feature_aa64_pauth_arch_qarma5(const ARMISARegisters 
*id)
 {
 /*
- * Return true if pauth is enabled with the architected QARMA algorithm.
+ * Return true if pauth is enabled with the architected QARMA5 algorithm.
  * QEMU will always set APA+GPA to the same value.
  */
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA) != 0;
 }
 
+static inline bool isar_feature_aa64_pauth_arch_qarma3(const ARMISARegisters 
*id)
+{
+/*
+ * Return true if pauth is enabled with the architected QARMA3 algorithm.
+ * QEMU will always set APA3+GPA3 to the same value.
+ */
+return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3) != 0;
+}
+
+static inline bool isar_feature_aa64_pauth_arch(const ARMISARegisters *id)
+{
+return isar_feature_aa64_pauth_arch_qarma5(id) ||
+isar_feature_aa64_pauth_arch_qarma3(id);
+}
+
+static inline uint8_t isar_feature_pauth_get_features(const ARMISARegisters 
*id)
+{
+if (isar_feature_aa64_pauth_arch_qarma5(id))
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA);
+else if (isar_feature_aa64_pauth_arch_qarma3(id))
+return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3);
+else
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, API);
+}
+
+static inline bool isar_feature_aa64_pauth_epac(const ARMISARegisters *id)
+{
+/*
+ * Note that unlike most AArch64 features, EPAC is treated (in the ARM
+ * pseudocode, at least) as not being implemented by larger values of this
+ * field. Our usage of '>=' rather than '==' here causes our implementation
+ * of PAC logic to diverge slightly from ARM pseudocode.
+ */
+return isar_feature_pauth_get_features(id) >= 0b0010;
+}
+
+static inline bool isar_feature_aa64_pauth2(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) >= 0b0011;
+}
+
+static inline bool isar_feature_aa64_fpac(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) >= 0b0100;
+}
+
+static inline bool isar_feature_aa64_fpac_combine(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) >= 0b0101;
+}
+
 static inline bool isar_feature_aa64_tlbirange(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TLB) == 2;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 72b37b7cf1..448ebf8301 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8028,11 +8028,11 @@ void register_cp_regs_for_features(ARMCPU *cpu)
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
   .resetvalue = cpu->isar.id_aa64isar1 },
-{ .name = "ID_AA64ISAR2_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
+{ .name = "ID_AA64ISAR2_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 2,
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
-  .resetvalue = 0 },
+  .resetvalue = cpu->isar.id_aa64isar2 },
 { .name = "ID_AA64ISAR3_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 3,
   .access = PL1_R, .type = ARM_CP_CONST,
diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index d0483bf051..e5206453f6 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -282,7 +282,7 @@ static uint64_t pauth_computepac_impdef(uint64_t data, 
uint64_t modifier,
 static uint64_t pauth_computepac(CPUARMState *env, uint64_t data,
  uint64_t modifier, 

Re: [PATCH 1/7] target/arm: v8.3 PAC ID_AA64ISAR[12] feature-detection

2023-02-21 Thread Aaron Lindsay
On Feb 13 16:01, Peter Maydell wrote:
> On Thu, 2 Feb 2023 at 21:13, Aaron Lindsay  
> wrote:
> > +static inline bool isar_feature_aa64_pauth_epac(const ARMISARegisters *id)
> > +{
> > +return isar_feature_pauth_get_features(id) == 0b0010;
> 
> This should ideally be ">= 0b0010", but it depends a bit on
> where we call it.

FYI - I did make this `>= 0b0010` after changing the logic around elsewhere to
be compatible as you suggested. I'm also planning to add a comment like this:

 /*
  * Note that unlike most AArch64 features, EPAC is treated (in the ARM
  * pseudocode, at least) as not being implemented by larger values of this
  * field. Our usage of '>=' rather than '==' here causes our implementation
  * of PAC logic to diverge slightly from ARM pseudocode.
  */
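
For reference, with that change the resulting check is then just:

    static inline bool isar_feature_aa64_pauth_epac(const ARMISARegisters *id)
    {
        return isar_feature_pauth_get_features(id) >= 0b0010;
    }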


> This field, like most ID register fields, follows the "standard
> scheme", where the value increases and larger numbers always
> imply "all of the functionality from the lower values, plus
> some more". In QEMU we implement this by doing a >= type
> comparison on the field value.
> 
> The PAC related ID fields are documented slightly confusingly,
> but they do work this way. The documentation suggests it might not
> be quite that way for FEAT_EPAC because it says that
> HaveEnhancedPAC() returns TRUE for 2 and FALSE for 3 and up.
> However this is more because the definition of the architectural
> features means that FEAT_EPAC is irrelevant, and it's an accident
> of the way the pseudocode was written that means that
> HaveEnhancedPAC() ever gets called when FEAT_PAuth2 is present.
> 
> Other than EPAC, the rest of the values in these fields are
> straightforward and we can implement the isar_feature functions
> with a single >= comparison.

Thanks for your review!

I've made a number of your (simpler) suggested changes already, and will
target getting a new patchset out in the next couple weeks after I spend
more time withi the the remaining GETPC() changes that need a little
more thought/rework, and we sort out what the command-line options
should look like.

-Aaron



Re: [PATCH 7/7] target/arm: Add CPU properties for most v8.3 PAC features

2023-02-21 Thread Aaron Lindsay
On Feb 13 17:11, Peter Maydell wrote:
> On Thu, 2 Feb 2023 at 21:12, Aaron Lindsay  
> wrote:
> >
> > Signed-off-by: Aaron Lindsay 
> > ---
> >  target/arm/cpu.h   |  5 +++
> >  target/arm/cpu64.c | 81 ++
> >  2 files changed, 72 insertions(+), 14 deletions(-)
> 
> Do we really need all these properties ? Generally we don't
> add CPU properties unless there's a good reason for the
> user (or the board/SoC code) to want to flip them. The
> more usual case is that we simply enable them on the 'max'
> CPU by setting the ID register fields appropriately.

Honestly, I wasn't sure where to draw the line... so I didn't. Though I
won't claim to have perfect knowledge of the evolution of this feature,
it felt like there were 4 distinct levels that I could imagine might be
wanted - I've starred those 4 below:

* 1) no PAC   (APA/API=0b)
* 2) PAC without EPAC/Pauth2, QEMU's highest PAC implementation previous
 to this patchset (APA/API=0b0001)
* 3) EPAC (APA/API=0b0010)
  4) Pauth2   (APA/API=0b0011) 
  5) FPAC (APA/API=0b0100) 
* 6) FPACCombined (APA/API=0b0101)

And I am not sure how likely 4) and 5) are to be implemented, but once I
was already up to four, adding the last two didn't feel like much more
work!

I half-considered trying to make `pauth` a single option which took a
string instead of a handful of separate boolean arguments. The possible
options might be `pauth=off`, `pauth=no-epac` (no EPAC), `pauth=epac`,
`pauth=pauth2`, `pauth=fpac-combine`.
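
To make that concrete, here is a rough sketch of what the mapping might look
like - the function name and exact strings are made up for discussion, not an
existing QEMU API - with each string simply selecting one of the APA/API/APA3
field values used above:

    #include <string.h>

    /* Hypothetical mapping, for discussion only. */
    static int pauth_level_from_string(const char *s)
    {
        if (!strcmp(s, "off"))          return 0b0000; /* no PAC */
        if (!strcmp(s, "no-epac"))      return 0b0001; /* base PAuth */
        if (!strcmp(s, "epac"))         return 0b0010; /* FEAT_EPAC */
        if (!strcmp(s, "pauth2"))       return 0b0011; /* FEAT_PAuth2 */
        if (!strcmp(s, "fpac-combine")) return 0b0101; /* FEAT_FPACCOMBINE */
        return -1;                                     /* unknown value */
    }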

All this to say: I'm more than happy to take guidance here!

> Somewhere in this series you need to add documentation of
> the features being implemented to docs/system/arm/emulation.rst
> (just a one-liner per FEAT_whatever).

Will do in my next patchset based on what we decide upon above.

Thanks!

-Aaron



[PATCH 6/7] target/arm: Implement v8.3 FPAC and FPACCOMBINE

2023-02-02 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/pauth_helper.c | 26 ++
 target/arm/syndrome.h |  6 ++
 2 files changed, 32 insertions(+)

diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index 66dc90a289..3a2772de0e 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -385,6 +385,21 @@ static uint64_t pauth_original_ptr(uint64_t ptr, 
ARMVAParameters param)
 return deposit64(ptr, bot_pac_bit, top_pac_bit - bot_pac_bit, extfield);
 }
 
+static G_NORETURN
+void pauth_fail_exception(CPUARMState *env, int error_code)
+{
+int target_el = arm_current_el(env);
+if (target_el == 0) {
+uint64_t hcr = arm_hcr_el2_eff(env);
+if (arm_is_el2_enabled(env) && (hcr & HCR_TGE))
+target_el = 2;
+else
+target_el = 1;
+}
+
+raise_exception_ra(env, EXCP_UDEF, syn_pacfail(error_code), target_el, 
GETPC());
+}
+
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
ARMPACKey *key, bool data, int keynumber,
bool is_combined)
@@ -403,6 +418,17 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, 
uint64_t modifier,
 uint64_t xor_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit + 1) &
 ~MAKE_64BIT_MASK(55, 1);
 result = ((ptr ^ pac) & xor_mask) | (ptr & ~xor_mask);
+if (cpu_isar_feature(aa64_fpac_combine, env_archcpu(env)) ||
+(cpu_isar_feature(aa64_fpac, env_archcpu(env)) &&
+ !is_combined)) {
+int fpac_top = param.tbi ? 55 : 64;
+uint64_t fpac_mask = MAKE_64BIT_MASK(bot_bit, fpac_top - bot_bit);
+test = (result ^ sextract64(result, 55, 1)) & fpac_mask;
+if (unlikely(test)) {
+int error_code = ((data ? 1 : 0) << 1) | (keynumber);
+pauth_fail_exception(env, error_code);
+}
+}
 } else {
 test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
 if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index 73df5e3793..885a85735c 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -48,6 +48,7 @@ enum arm_exception_class {
 EC_AA64_SMC   = 0x17,
 EC_SYSTEMREGISTERTRAP = 0x18,
 EC_SVEACCESSTRAP  = 0x19,
+EC_PACFAIL= 0x1c,
 EC_SMETRAP= 0x1d,
 EC_INSNABORT  = 0x20,
 EC_INSNABORT_SAME_EL  = 0x21,
@@ -221,6 +222,11 @@ static inline uint32_t syn_smetrap(SMEExceptionType etype, 
bool is_16bit)
 | (is_16bit ? 0 : ARM_EL_IL) | etype;
 }
 
+static inline uint32_t syn_pacfail(int error_code)
+{
+return (EC_PACFAIL << ARM_EL_EC_SHIFT) | error_code;
+}
+
 static inline uint32_t syn_pactrap(void)
 {
 return EC_PACTRAP << ARM_EL_EC_SHIFT;
-- 
2.25.1




[PATCH 4/7] target/arm: Implement v8.3 Pauth2

2023-02-02 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/pauth_helper.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index a83956652f..6ebf6df75c 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -349,7 +349,7 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
 if (test != 0 && test != -1) {
 if (cpu_isar_feature(aa64_pauth_epac, env_archcpu(env))) {
 pac = 0;
-} else {
+} else if (! cpu_isar_feature(aa64_pauth2, env_archcpu(env))) {
 /*
  * Note that our top_bit is one greater than the pseudocode's
  * version, hence "- 2" here.
@@ -362,6 +362,8 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
  * Preserve the determination between upper and lower at bit 55,
  * and insert pointer authentication code.
  */
+if (cpu_isar_feature(aa64_pauth2, env_archcpu(env)))
+pac ^= ptr;
 if (param.tbi) {
 ptr &= ~MAKE_64BIT_MASK(bot_bit, 55 - bot_bit + 1);
 pac &= MAKE_64BIT_MASK(bot_bit, 54 - bot_bit + 1);
@@ -389,23 +391,30 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data);
 int bot_bit, top_bit;
-uint64_t pac, orig_ptr, test;
+uint64_t pac, orig_ptr, test, result;
 
 orig_ptr = pauth_original_ptr(ptr, param);
 pac = pauth_computepac(env, orig_ptr, modifier, *key);
 bot_bit = 64 - param.tsz;
 top_bit = 64 - 8 * param.tbi;
 
-test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
-if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
-int error_code = (keynumber << 1) | (keynumber ^ 1);
-if (param.tbi) {
-return deposit64(orig_ptr, 53, 2, error_code);
-} else {
-return deposit64(orig_ptr, 61, 2, error_code);
+if (cpu_isar_feature(aa64_pauth2, env_archcpu(env))) {
+uint64_t xor_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit + 1) &
+~MAKE_64BIT_MASK(55, 1);
+result = ((ptr ^ pac) & xor_mask) | (ptr & ~xor_mask);
+} else {
+test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
+if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
+int error_code = (keynumber << 1) | (keynumber ^ 1);
+if (param.tbi) {
+return deposit64(orig_ptr, 53, 2, error_code);
+} else {
+return deposit64(orig_ptr, 61, 2, error_code);
+}
 }
+result = orig_ptr;
 }
-return orig_ptr;
+return result;
 }
 
 static uint64_t pauth_strip(CPUARMState *env, uint64_t ptr, bool data)
-- 
2.25.1




[PATCH 2/7] target/arm: Implement v8.3 QARMA3 PAC cipher

2023-02-02 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/pauth_helper.c | 48 +++
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index a0c9bea06b..f525ef7fad 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -96,6 +96,21 @@ static uint64_t pac_sub(uint64_t i)
 return o;
 }
 
+static uint64_t pac_sub1(uint64_t i)
+{
+static const uint8_t sub1[16] = {
+0xa, 0xd, 0xe, 0x6, 0xf, 0x7, 0x3, 0x5,
+0x9, 0x8, 0x0, 0xc, 0xb, 0x1, 0x2, 0x4,
+};
+uint64_t o = 0;
+int b;
+
+for (b = 0; b < 64; b += 4) {
+o |= (uint64_t)sub1[(i >> b) & 0xf] << b;
+}
+return o;
+}
+
 static uint64_t pac_inv_sub(uint64_t i)
 {
 static const uint8_t inv_sub[16] = {
@@ -209,7 +224,7 @@ static uint64_t tweak_inv_shuffle(uint64_t i)
 }
 
 static uint64_t pauth_computepac_architected(uint64_t data, uint64_t modifier,
- ARMPACKey key)
+ ARMPACKey key, bool isqarma3)
 {
 static const uint64_t RC[5] = {
 0xull,
@@ -219,6 +234,7 @@ static uint64_t pauth_computepac_architected(uint64_t data, 
uint64_t modifier,
 0x452821E638D01377ull,
 };
 const uint64_t alpha = 0xC0AC29B7C97C50DDull;
+int iterations = isqarma3 ? 2 : 4;
 /*
  * Note that in the ARM pseudocode, key0 contains bits <127:64>
  * and key1 contains bits <63:0> of the 128-bit key.
@@ -231,7 +247,7 @@ static uint64_t pauth_computepac_architected(uint64_t data, 
uint64_t modifier,
 runningmod = modifier;
 workingval = data ^ key0;
 
-for (i = 0; i <= 4; ++i) {
+for (i = 0; i <= iterations; ++i) {
 roundkey = key1 ^ runningmod;
 workingval ^= roundkey;
 workingval ^= RC[i];
@@ -239,32 +255,44 @@ static uint64_t pauth_computepac_architected(uint64_t 
data, uint64_t modifier,
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
 }
-workingval = pac_sub(workingval);
+if (isqarma3)
+workingval = pac_sub1(workingval);
+else
+workingval = pac_sub(workingval);
 runningmod = tweak_shuffle(runningmod);
 }
 roundkey = modk0 ^ runningmod;
 workingval ^= roundkey;
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
-workingval = pac_sub(workingval);
+if (isqarma3)
+workingval = pac_sub1(workingval);
+else
+workingval = pac_sub(workingval);
 workingval = pac_cell_shuffle(workingval);
 workingval = pac_mult(workingval);
 workingval ^= key1;
 workingval = pac_cell_inv_shuffle(workingval);
-workingval = pac_inv_sub(workingval);
+if (isqarma3)
+workingval = pac_sub1(workingval);
+else
+workingval = pac_inv_sub(workingval);
 workingval = pac_mult(workingval);
 workingval = pac_cell_inv_shuffle(workingval);
 workingval ^= key0;
 workingval ^= runningmod;
-for (i = 0; i <= 4; ++i) {
-workingval = pac_inv_sub(workingval);
-if (i < 4) {
+for (i = 0; i <= iterations; ++i) {
+if (isqarma3)
+workingval = pac_sub1(workingval);
+else
+workingval = pac_inv_sub(workingval);
+if (i < iterations) {
 workingval = pac_mult(workingval);
 workingval = pac_cell_inv_shuffle(workingval);
 }
 runningmod = tweak_inv_shuffle(runningmod);
 roundkey = key1 ^ runningmod;
-workingval ^= RC[4 - i];
+workingval ^= RC[iterations - i];
 workingval ^= roundkey;
 workingval ^= alpha;
 }
@@ -284,6 +312,8 @@ static uint64_t pauth_computepac(CPUARMState *env, uint64_t 
data,
 {
 if (cpu_isar_feature(aa64_pauth_arch_qarma5, env_archcpu(env))) {
 return pauth_computepac_architected(data, modifier, key, false);
+} else if (cpu_isar_feature(aa64_pauth_arch_qarma3, env_archcpu(env))) {
+return pauth_computepac_architected(data, modifier, key, true);
 } else {
 return pauth_computepac_impdef(data, modifier, key);
 }
-- 
2.25.1




[PATCH 1/7] target/arm: v8.3 PAC ID_AA64ISAR[12] feature-detection

2023-02-02 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h  | 57 ---
 target/arm/helper.c   |  4 +--
 target/arm/pauth_helper.c |  4 +--
 3 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 8cf70693be..9be59163ff 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1001,6 +1001,7 @@ struct ArchCPU {
 uint32_t dbgdevid1;
 uint64_t id_aa64isar0;
 uint64_t id_aa64isar1;
+uint64_t id_aa64isar2;
 uint64_t id_aa64pfr0;
 uint64_t id_aa64pfr1;
 uint64_t id_aa64mmfr0;
@@ -3902,18 +3903,68 @@ static inline bool isar_feature_aa64_pauth(const 
ARMISARegisters *id)
 (FIELD_DP64(0, ID_AA64ISAR1, APA, 0xf) |
  FIELD_DP64(0, ID_AA64ISAR1, API, 0xf) |
  FIELD_DP64(0, ID_AA64ISAR1, GPA, 0xf) |
- FIELD_DP64(0, ID_AA64ISAR1, GPI, 0xf))) != 0;
+ FIELD_DP64(0, ID_AA64ISAR1, GPI, 0xf))) != 0 ||
+   (id->id_aa64isar2 &
+(FIELD_DP64(0, ID_AA64ISAR2, APA3, 0xf) |
+ FIELD_DP64(0, ID_AA64ISAR2, GPA3, 0xf))) != 0;
 }
 
-static inline bool isar_feature_aa64_pauth_arch(const ARMISARegisters *id)
+static inline bool isar_feature_aa64_pauth_arch_qarma5(const ARMISARegisters 
*id)
 {
 /*
- * Return true if pauth is enabled with the architected QARMA algorithm.
+ * Return true if pauth is enabled with the architected QARMA5 algorithm.
  * QEMU will always set APA+GPA to the same value.
  */
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA) != 0;
 }
 
+static inline bool isar_feature_aa64_pauth_arch_qarma3(const ARMISARegisters 
*id)
+{
+/*
+ * Return true if pauth is enabled with the architected QARMA3 algorithm.
+ * QEMU will always set APA3+GPA3 to the same value.
+ */
+return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3) != 0;
+}
+
+static inline bool isar_feature_aa64_pauth_arch(const ARMISARegisters *id)
+{
+return isar_feature_aa64_pauth_arch_qarma5(id) ||
+isar_feature_aa64_pauth_arch_qarma3(id);
+}
+
+static inline uint8_t isar_feature_pauth_get_features(const ARMISARegisters 
*id)
+{
+if (isar_feature_aa64_pauth_arch_qarma5(id))
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA);
+else if (isar_feature_aa64_pauth_arch_qarma3(id))
+return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3);
+else
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, API);
+}
+
+static inline bool isar_feature_aa64_pauth_epac(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) == 0b0010;
+}
+
+static inline bool isar_feature_aa64_fpac_combine(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) == 0b0101;
+}
+
+static inline bool isar_feature_aa64_fpac(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) == 0b0100 ||
+isar_feature_aa64_fpac_combine(id);
+}
+
+static inline bool isar_feature_aa64_pauth2(const ARMISARegisters *id)
+{
+return isar_feature_pauth_get_features(id) == 0b0011 ||
+isar_feature_aa64_fpac(id);
+}
+
 static inline bool isar_feature_aa64_tlbirange(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TLB) == 2;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 72b37b7cf1..448ebf8301 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8028,11 +8028,11 @@ void register_cp_regs_for_features(ARMCPU *cpu)
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
   .resetvalue = cpu->isar.id_aa64isar1 },
-{ .name = "ID_AA64ISAR2_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
+{ .name = "ID_AA64ISAR2_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 2,
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
-  .resetvalue = 0 },
+  .resetvalue = cpu->isar.id_aa64isar2 },
 { .name = "ID_AA64ISAR3_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 3,
   .access = PL1_R, .type = ARM_CP_CONST,
diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index d0483bf051..a0c9bea06b 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -282,8 +282,8 @@ static uint64_t pauth_computepac_impdef(uint64_t data, 
uint64_t modifier,
 static uint64_t pauth_computepac(CPUARMState *env, uint64_t data,
  uint64_t modifier, ARMPACKey key)
 {
-if (cpu_isar_feature(aa64_pauth_arch, env_archcpu(env))) {
-return pauth_computepac_architected(data, modifier, key);
+if (cpu_isar_feature(aa64_pauth_arch_qarma5, env_archcpu(env))) {
+return pauth_computepac_architect

[PATCH 7/7] target/arm: Add CPU properties for most v8.3 PAC features

2023-02-02 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h   |  5 +++
 target/arm/cpu64.c | 81 ++
 2 files changed, 72 insertions(+), 14 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9be59163ff..a9420bae67 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1039,6 +1039,11 @@ struct ArchCPU {
  */
 bool prop_pauth;
 bool prop_pauth_impdef;
+bool prop_pauth_qarma3;
+bool prop_pauth_epac;
+bool prop_pauth2; // also known as EnhancedPAC2/EPAC2
+bool prop_pauth_fpac;
+bool prop_pauth_fpac_combine;
 bool prop_lpa2;
 
 /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 0e021960fb..315acabbe2 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -590,8 +590,7 @@ static void aarch64_add_sme_properties(Object *obj)
 
 void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 {
-int arch_val = 0, impdef_val = 0;
-uint64_t t;
+int address_auth = 0, generic_auth = 0;
 
 /* Exit early if PAuth is enabled, and fall through to disable it */
 if ((kvm_enabled() || hvf_enabled()) && cpu->prop_pauth) {
@@ -603,30 +602,79 @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 return;
 }
 
-/* TODO: Handle HaveEnhancedPAC, HaveEnhancedPAC2, HaveFPAC. */
+if (cpu->prop_pauth_epac &&
+(cpu->prop_pauth2 ||
+ cpu->prop_pauth_fpac ||
+ cpu->prop_pauth_fpac_combine)) {
+error_setg(errp, "'pauth-epac' feature not compatible with any of "
+   "'pauth-2', 'pauth-fpac', or 'pauth-fpac-combine'");
+return;
+}
+
+/* Determine the PAC features independently of the algorithm */
+if (cpu->prop_pauth_fpac_combine) {
+address_auth = 0b0101;
+} else if (cpu->prop_pauth_fpac) {
+address_auth = 0b0100;
+} else if (cpu->prop_pauth2) {
+address_auth = 0b0011;
+} else if (cpu->prop_pauth_epac) {
+address_auth = 0b0010;
+}
+
+/* Write the features into the correct field for the algorithm in use */
 if (cpu->prop_pauth) {
+uint64_t t;
+
+if (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) {
+error_setg(errp, "Cannot set both qarma3 ('pauth-qarma3') and "
+"impdef ('pauth-impdef') pointer authentication ciphers");
+return;
+}
+
+if (address_auth == 0)
+address_auth = 0b0001;
+generic_auth = 1;
+
 if (cpu->prop_pauth_impdef) {
-impdef_val = 1;
+t = cpu->isar.id_aa64isar1;
+t = FIELD_DP64(t, ID_AA64ISAR1, API, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR1, GPI, generic_auth);
+cpu->isar.id_aa64isar1 = t;
+} else if (cpu->prop_pauth_qarma3) {
+t = cpu->isar.id_aa64isar2;
+t = FIELD_DP64(t, ID_AA64ISAR2, APA3, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR2, GPA3, generic_auth);
+cpu->isar.id_aa64isar2 = t;
 } else {
-arch_val = 1;
+t = cpu->isar.id_aa64isar1;
+t = FIELD_DP64(t, ID_AA64ISAR1, APA, address_auth);
+t = FIELD_DP64(t, ID_AA64ISAR1, GPA, generic_auth);
+cpu->isar.id_aa64isar1 = t;
 }
-} else if (cpu->prop_pauth_impdef) {
-error_setg(errp, "cannot enable pauth-impdef without pauth");
+} else if (cpu->prop_pauth_impdef || cpu->prop_pauth_qarma3) {
+error_setg(errp, "cannot enable pauth-impdef or pauth-qarma3 without 
pauth");
+error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
+} else if (address_auth != 0) {
+error_setg(errp, "cannot enable any pauth* features without pauth");
 error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
 }
-
-t = cpu->isar.id_aa64isar1;
-t = FIELD_DP64(t, ID_AA64ISAR1, APA, arch_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, GPA, arch_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, API, impdef_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, GPI, impdef_val);
-cpu->isar.id_aa64isar1 = t;
 }
 
 static Property arm_cpu_pauth_property =
 DEFINE_PROP_BOOL("pauth", ARMCPU, prop_pauth, true);
 static Property arm_cpu_pauth_impdef_property =
 DEFINE_PROP_BOOL("pauth-impdef", ARMCPU, prop_pauth_impdef, false);
+static Property arm_cpu_pauth_qarma3_property =
+DEFINE_PROP_BOOL("pauth-qarma3", ARMCPU, prop_pauth_qarma3, false);
+static Property arm_cpu_pauth_epac_property =
+DEFINE_PROP_BOOL("pauth-epac", ARMCPU, prop_pauth_epac, false);
+static Property arm_cpu_pauth2_property =
+DEFINE_PROP_BOOL("pauth2", ARMCPU, pro

[PATCH 0/7] Implement Most ARMv8.3 Pointer Authentication Features

2023-02-02 Thread Aaron Lindsay
Hello,

I've taken a first pass at implementing many of the ARMv8.3 Pointer
Authentication features and welcome your review.

Thanks!

-Aaron

Aaron Lindsay (7):
  target/arm: v8.3 PAC ID_AA64ISAR[12] feature-detection
  target/arm: Implement v8.3 QARMA3 PAC cipher
  target/arm: Implement v8.3 EnhancedPAC
  target/arm: Implement v8.3 Pauth2
  target/arm: Inform helpers whether a PAC instruction is 'combined'
  target/arm: Implement v8.3 FPAC and FPACCOMBINE
  target/arm: Add CPU properties for most v8.3 PAC features

 target/arm/cpu.h   |  62 -
 target/arm/cpu64.c |  81 ++---
 target/arm/helper-a64.h|   4 +
 target/arm/helper.c|   4 +-
 target/arm/pauth_helper.c  | 182 ++---
 target/arm/syndrome.h  |   6 ++
 target/arm/translate-a64.c |  20 ++--
 7 files changed, 296 insertions(+), 63 deletions(-)

-- 
2.25.1




[PATCH 5/7] target/arm: Inform helpers whether a PAC instruction is 'combined'

2023-02-02 Thread Aaron Lindsay
An instruction is a 'combined' Pointer Authentication instruction if it
does something in addition to PAC - for instance, branching to or
loading an address from the authenticated pointer. Knowing whether a PAC
operation is 'combined' is needed to implement the FPACCOMBINE feature
for ARMv8.3.

Signed-off-by: Aaron Lindsay 
---
 target/arm/helper-a64.h|  4 +++
 target/arm/pauth_helper.c  | 63 --
 target/arm/translate-a64.c | 20 ++--
 3 files changed, 68 insertions(+), 19 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 7b706571bb..829aaf4919 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -98,9 +98,13 @@ DEF_HELPER_FLAGS_3(pacda, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacdb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacga, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autia, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autia_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autib, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autib_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autda, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autda_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autdb, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autdb_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_2(xpaci, TCG_CALL_NO_RWG_SE, i64, env, i64)
 DEF_HELPER_FLAGS_2(xpacd, TCG_CALL_NO_RWG_SE, i64, env, i64)
 
diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index 6ebf6df75c..66dc90a289 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -386,7 +386,8 @@ static uint64_t pauth_original_ptr(uint64_t ptr, 
ARMVAParameters param)
 }
 
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
-   ARMPACKey *key, bool data, int keynumber)
+   ARMPACKey *key, bool data, int keynumber,
+   bool is_combined)
 {
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data);
@@ -507,44 +508,88 @@ uint64_t HELPER(pacga)(CPUARMState *env, uint64_t x, 
uint64_t y)
 return pac & 0xull;
 }
 
-uint64_t HELPER(autia)(CPUARMState *env, uint64_t x, uint64_t y)
+static uint64_t pauth_autia(CPUARMState *env, uint64_t x, uint64_t y,
+bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnIA)) {
 return x;
 }
 pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apia, false, 0);
+return pauth_auth(env, x, y, &env->keys.apia, false, 0, is_combined);
 }
 
-uint64_t HELPER(autib)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autia)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autia(env, x, y, false);
+}
+
+uint64_t HELPER(autia_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autia(env, x, y, true);
+}
+
+static uint64_t pauth_autib(CPUARMState *env, uint64_t x, uint64_t y,
+bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnIB)) {
 return x;
 }
 pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apib, false, 1);
+return pauth_auth(env, x, y, &env->keys.apib, false, 1, is_combined);
 }
 
-uint64_t HELPER(autda)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autib)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autib(env, x, y, false);
+}
+
+uint64_t HELPER(autib_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autib(env, x, y, true);
+}
+
+static uint64_t pauth_autda(CPUARMState *env, uint64_t x, uint64_t y,
+bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnDA)) {
 return x;
 }
 pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apda, true, 0);
+return pauth_auth(env, x, y, &env->keys.apda, true, 0, is_combined);
 }
 
-uint64_t HELPER(autdb)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autda)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autda(env, x, y, false);
+}
+
+uint64_t HELPER(autda_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autda(env, x, y, true);
+}
+
+static uint64_t pauth_autdb(CPUARMState *env, uint64_t x, uint64_t y,
+bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnDB)) {
 return x;
 }
 pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apdb, true, 1);
+return pauth_auth(env, x, y, &env->keys.apdb, true, 1, is_combined);

[PATCH 3/7] target/arm: Implement v8.3 EnhancedPAC

2023-02-02 Thread Aaron Lindsay
Signed-off-by: Aaron Lindsay 
---
 target/arm/pauth_helper.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index f525ef7fad..a83956652f 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -347,11 +347,15 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
  */
 test = sextract64(ptr, bot_bit, top_bit - bot_bit);
 if (test != 0 && test != -1) {
-/*
- * Note that our top_bit is one greater than the pseudocode's
- * version, hence "- 2" here.
- */
-pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+if (cpu_isar_feature(aa64_pauth_epac, env_archcpu(env))) {
+pac = 0;
+} else {
+/*
+ * Note that our top_bit is one greater than the pseudocode's
+ * version, hence "- 2" here.
+ */
+pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+}
 }
 
 /*
-- 
2.25.1




Re: [PATCH 1/4] plugins: fix optimization in plugin_gen_disable_mem_helpers

2023-01-10 Thread Aaron Lindsay
On Jan 08 11:47, Emilio Cota wrote:
> We were mistakenly checking tcg_ctx->plugin_insn as a canary to know
> whether the TB had emitted helpers that might have accessed memory.
> 
> The problem is that tcg_ctx->plugin_insn gets updated on every
> instruction in the TB, which results in us wrongly performing the
> optimization (i.e. not clearing cpu->plugin_mem_cbs) way too often,
> since it's not rare that the last instruction in the TB doesn't
> use helpers.
> 
> Fix it by tracking a per-TB canary.
> 
> While at it, expand documentation.
> 
> Related: #1381
> 
> Signed-off-by: Emilio Cota 
> ---
>  accel/tcg/plugin-gen.c | 26 ++
>  include/qemu/plugin.h  |  7 +++
>  2 files changed, 25 insertions(+), 8 deletions(-)

Tested-by: Aaron Lindsay 



Re: [PATCH 2/4] translator: always pair plugin_gen_insn_{start,end} calls

2023-01-10 Thread Aaron Lindsay
On Jan 08 11:47, Emilio Cota wrote:
> Related: #1381
> 
> Signed-off-by: Emilio Cota 
> ---
>  accel/tcg/translator.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)

Tested-by: Aaron Lindsay 



Re: Plugin Memory Callback Debugging

2022-12-19 Thread Aaron Lindsay
Emilio,

On Dec 18 00:24, Emilio Cota wrote:
> On Tue, Nov 29, 2022 at 15:37:51 -0500, Aaron Lindsay wrote:
> (snip)
> > > Does this hint that there are cases where reset cpu->plugin_mem_cbs to 
> > > NULL is
> > > getting optimized away, but not the code to set it in the first place?
> > 
> > Is there anyone who could help take a look at this from the code gen
> > perspective?
> 
> Thanks for the report. Just adding assertions was enough to uncover
> several bugs. I did not reproduce the use-after-free, but by calling
> reset from a callback it's easy to see how it can occur.
> 
> I have fixes in https://github.com/cota/qemu/tree/plugins
> 
> Can you please give those a try?
> 
> BTW I created an issue on gitlab to track this
>   https://gitlab.com/qemu-project/qemu/-/issues/1381

Thanks so much for digging into this!

I rebased your plugins branch on top of v7.2.0 and tested with several
scenarios which reliably triggered the bug for me. None of them
reproduced the original problem (or hit other bugs!) with your fixes.

-Aaron



Re: Plugin Memory Callback Debugging

2022-11-29 Thread Aaron Lindsay via
On Nov 22 10:57, Aaron Lindsay wrote:
> On Nov 21 18:22, Richard Henderson wrote:
> > On 11/21/22 13:51, Alex Bennée wrote:
> > > 
> > > Aaron Lindsay  writes:
> > > 
> > > > On Nov 15 22:36, Alex Bennée wrote:
> > > > > Aaron Lindsay  writes:
> > > > > > I believe the code *should* always reset `cpu->plugin_mem_cbs` to 
> > > > > > NULL at the
> > > > > > end of an instruction/TB's execution, so its not exactly clear to 
> > > > > > me how this
> > > > > > is occurring. However, I suspect it may be relevant that we are 
> > > > > > calling
> > > > > > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
> > > > > 
> > > > > Hmm I'm going to have to remind myself about how this bit works.
> > > > 
> > > > When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
> > > > set for an instruction? Is it guaranteed it is reset by the end of the
> > > > tb?
> > > 
> > > It should be by the end of the instruction. See
> > > inject_mem_disable_helper() which inserts TCG code to disable the
> > > helpers. We also have plugin_gen_disable_mem_helpers() which should
> > > catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
> > > why qemu_plugin_disable_mem_helpers() is only really concerned about
> > > when we longjmp out of the loop.
> > > 
> > > > If I were to put an assertion in cpu_tb_exec() just after the call
> > > > to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
> > > > there?
> > > 
> > > Yes I think so.
> > 
> > Indeed.
> 
> Well, the good news is that if this is an assumption we're relying on, it is
> now trivial to reproduce the problem!
> 
> Compile some simple program (doesn't really matter, the issue gets triggered
> early):
> 
> $ echo "int main() { return 0; }" > simple.c && gcc simple.c -o simple
> 
> Make this change to cpu_tb_exec():
> 
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 356fe348de..50a010327d 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -436,6 +436,9 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int 
> > *tb_exit)
> > 
> >  qemu_thread_jit_execute();
> >  ret = tcg_qemu_tb_exec(env, tb_ptr);
> > +if (cpu->plugin_mem_cbs != NULL) {
> > +g_assert_not_reached();
> > +}
> >  cpu->can_do_io = 1;
> >  /*
> >   * TODO: Delay swapping back to the read-write region of the TB
> 
> And run:
> 
> $ ./build/qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin 
> ./simple
> 
> You should fairly quickly see something like:
> 
> > [snip]
> > 0, 0x5502814d04, 0xb482, ""
> > 0, 0x5502814d08, 0xf9400440, "", load, 0x5502844ed0
> > 0, 0x5502814d0c, 0xf1001c1f, ""
> > **
> > ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
> > Bail out! ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be 
> > reached
> 
> When digging through my other failure in `rr` I saw the cpu->plugin_mem_cbs
> pointer changing from one non-null value to another (which also seems to
> indicate it is not being cleared between instructions).
> 
> Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
> getting optimized away, but not the code to set it in the first place?

Is there anyone who could help take a look at this from the code gen
perspective?

-Aaron



Re: Plugin Memory Callback Debugging

2022-11-22 Thread Aaron Lindsay via
On Nov 21 22:02, Alex Bennée wrote:
> 
> Aaron Lindsay  writes:
> 
> > Sorry, left off the very end of my timeline:
> >
> > On Nov 18 16:58, Aaron Lindsay wrote:
> >> I have, so far, discovered the following timeline:
> >> 1. My plugin receives a instruction execution callback for a load
> >>instruction. At this time, cpu->plugin_mem_cbs points to the same
> >>memory which will later be freed
> >> 2. During the handling of this callback, my plugin calls
> >qemu_plugin_reset()
> 
> The final plugin reset should only execute in the safe async context
> (i.e. no other vCPUs running code). That flushes all current generated
> code.
> 
> >> 3. Ostensibly something goes wrong here with the cleanup of
> >>cpu->plugin_mem_cbs???
> 
> This may be missed by the reset path (hence your patch) but it should be
> being reset every instruction we instrument.
> 
> >> 4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
> >>to by cpu->plugin_mem_cbs 
> >
> > 5. A store exclusive instruction is translated and then executed, which
> >requires the use of a helper. When executed, this helper checks
> >cpu->plugin_mem_cbs, which is non-null, so it attempts to dereference
> >and use it, resulting in the assertion.
> 
> It should be being reset for each instruction I think.

FYI - I suspect my above presentation of the problem suffered from the
"searching where the streetlamp is instead of where you lost something"
problem. In other words, I did/do observe the error at reset, but I now
believe that is merely where it is easiest to observe: cpu->plugin_mem_cbs
doesn't appear to be reset at the end of instructions, and the problem only
manifests at reset because that's when the underlying memory is freed.

-Aaron



Re: Plugin Memory Callback Debugging

2022-11-22 Thread Aaron Lindsay via
On Nov 21 18:22, Richard Henderson wrote:
> On 11/21/22 13:51, Alex Bennée wrote:
> > 
> > Aaron Lindsay  writes:
> > 
> > > On Nov 15 22:36, Alex Bennée wrote:
> > > > Aaron Lindsay  writes:
> > > > > I believe the code *should* always reset `cpu->plugin_mem_cbs` to 
> > > > > NULL at the
> > > > > end of an instruction/TB's execution, so its not exactly clear to me 
> > > > > how this
> > > > > is occurring. However, I suspect it may be relevant that we are 
> > > > > calling
> > > > > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
> > > > 
> > > > Hmm I'm going to have to remind myself about how this bit works.
> > > 
> > > When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
> > > set for an instruction? Is it guaranteed it is reset by the end of the
> > > tb?
> > 
> > It should be by the end of the instruction. See
> > inject_mem_disable_helper() which inserts TCG code to disable the
> > helpers. We also have plugin_gen_disable_mem_helpers() which should
> > catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
> > why qemu_plugin_disable_mem_helpers() is only really concerned about
> > when we longjmp out of the loop.
> > 
> > > If I were to put an assertion in cpu_tb_exec() just after the call
> > > to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
> > > there?
> > 
> > Yes I think so.
> 
> Indeed.

Well, the good news is that if this is an assumption we're relying on, it is
now trivial to reproduce the problem!

Compile some simple program (doesn't really matter, the issue gets triggered
early):

$ echo "int main() { return 0; }" > simple.c && gcc simple.c -o simple

Make this change to cpu_tb_exec():

> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 356fe348de..50a010327d 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -436,6 +436,9 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int 
> *tb_exit)
> 
>  qemu_thread_jit_execute();
>  ret = tcg_qemu_tb_exec(env, tb_ptr);
> +if (cpu->plugin_mem_cbs != NULL) {
> +g_assert_not_reached();
> +}
>  cpu->can_do_io = 1;
>  /*
>   * TODO: Delay swapping back to the read-write region of the TB

And run:

$ ./build/qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin ./simple

You should fairly quickly see something like:

> [snip]
> 0, 0x5502814d04, 0xb482, ""
> 0, 0x5502814d08, 0xf9400440, "", load, 0x5502844ed0
> 0, 0x5502814d0c, 0xf1001c1f, ""
> **
> ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
> Bail out! ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be 
> reached

When digging through my other failure in `rr` I saw the cpu->plugin_mem_cbs
pointer changing from one non-null value to another (which also seems to
indicate it is not being cleared between instructions).

Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
getting optimized away, but not the code to set it in the first place?

-Aaron



Re: Plugin Memory Callback Debugging

2022-11-21 Thread Aaron Lindsay via
On Nov 15 22:36, Alex Bennée wrote:
> Aaron Lindsay  writes:
> > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at 
> > the
> > end of an instruction/TB's execution, so its not exactly clear to me how 
> > this
> > is occurring. However, I suspect it may be relevant that we are calling
> > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
> 
> Hmm I'm going to have to remind myself about how this bit works.

When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
set for an instruction? Is it guaranteed it is reset by the end of the
tb? If I were to put an assertion in cpu_tb_exec() just after the call
to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL there?

In my debugging, I *think* I'm seeing a tb set cpu->plugin_mem_cbs
for an instruction, and then not reset it to NULL. I'm wondering if it's
getting optimized away or something, but want to make sure I've got my
assumptions correct about how this is intended to be working.

Thanks!

-Aaron



Re: Plugin Memory Callback Debugging

2022-11-18 Thread Aaron Lindsay via
On Nov 15 22:36, Alex Bennée wrote:
> 
> Aaron Lindsay  writes:
> 
> > Hello,
> >
> > I have been wrestling with what might be a bug in the plugin memory
> > callbacks. The immediate error is that I hit the
> > `g_assert_not_reached()` in the 'default:' case in
> > qemu_plugin_vcpu_mem_cb, indicating the callback type was invalid. When
> > breaking on this assertion in gdb, the contents of cpu->plugin_mem_cbs
> > are obviously bogus (`len` was absurdly high, for example).  After doing
> > some further digging/instrumenting, I eventually found that
> > `free_dyn_cb_arr(void *p, ...)` is being called shortly before the
> > assertion is hit with `p` pointing to the same address as
> > `cpu->plugin_mem_cbs` will later hold at assertion-time. We are freeing
> > the memory still pointed to by `cpu->plugin_mem_cbs`.
> >
> > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at 
> > the
> > end of an instruction/TB's execution, so its not exactly clear to me how 
> > this
> > is occurring. However, I suspect it may be relevant that we are calling
> > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
> 
> Hmm I'm going to have to remind myself about how this bit works.
> 
> >
> > I have additionally found that the below addition allows me to run 
> > successfully
> > without hitting the assert:
> >
> > diff --git a/plugins/core.c b/plugins/core.c
> > --- a/plugins/core.c
> > +++ b/plugins/core.c
> > @@ -427,9 +427,14 @@ static bool free_dyn_cb_arr(void *p, uint32_t h, void 
> > *userp)
> >
> >  void qemu_plugin_flush_cb(void)
> >  {
> > +CPUState *cpu;
>  qht_iter_remove(&plugin.dyn_cb_arr_ht, free_dyn_cb_arr, NULL);
>  qht_reset(&plugin.dyn_cb_arr_ht);
> >
> > +CPU_FOREACH(cpu) {
> > +cpu->plugin_mem_cbs = NULL;
> > +}
> > +
> 
> This is essentially qemu_plugin_disable_mem_helpers() but for all CPUs.
> I think we should be able to treat the CPUs separately.

I agree it's similar to qemu_plugin_disable_mem_helpers(), but for all
CPUs. Though a perhaps important distinction is that its occurring
unconditionally in conjunction with the event which flushes the TBs and
frees the callback arrays.

Isn't the code calling into qemu_plugin_flush_cb() already clearing TBs
for all CPUs? Can you help me understand why treating the CPUs
separately would be better?
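
To make sure I'm picturing the same thing, the per-CPU version would be
roughly the below - i.e. my diff above, but reusing the existing helper
(untested sketch; I'm assuming qemu_plugin_disable_mem_helpers() is safe
to call from this context):

void qemu_plugin_flush_cb(void)
{
    CPUState *cpu;

    qht_iter_remove(&plugin.dyn_cb_arr_ht, free_dyn_cb_arr, NULL);
    qht_reset(&plugin.dyn_cb_arr_ht);

    /* drop each vCPU's pointer into the callback arrays we just freed */
    CPU_FOREACH(cpu) {
        qemu_plugin_disable_mem_helpers(cpu);
    }

    plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
}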

> >  plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
> >  }
> >
> > Unfortunately, the workload/setup I have encountered this bug with are
> > difficult to reproduce in a way suitable for sharing upstream (admittedly
> > potentially because I do not fully understand the conditions necessary to
> > trigger it). It is also deep into a run
> 
> How many full TB flushes have there been? You only see
> qemu_plugin_flush_cb when we flush whole translation buffer (which is
> something we do more often when plugins exit).

There have been maybe hundreds of TB flushes at this point (I, erm, use
qemu_plugin_reset() somewhat liberally in this plugin). I believe it is
the most recent such flush that is problematic - I observe the call to
free_dyn_cb_arr() mentioned above as a result of it.

> Does lowering tb-size make it easier to hit the failure mode?

Hrm, interesting, I have not tried that. I'll poke at that if the rr
debug doesn't pan out.

> > , and I haven't found a good way
> > to break in gdb immediately prior to it happening in order to inspect
> > it, without perturbing it enough such that it doesn't happen...
> 
> This is exactly the sort of thing rr is great for. Can you trigger it in
> that?
> 
>   https://rr-project.org/

I had not used rr before, thanks for the push to do so!

I have, so far, discovered the following timeline:
1. My plugin receives an instruction execution callback for a load
   instruction. At this time, cpu->plugin_mem_cbs points to the same
   memory which will later be freed
2. During the handling of this callback, my plugin calls qemu_plugin_reset()
3. Ostensibly something goes wrong here with the cleanup of
   cpu->plugin_mem_cbs??? 
4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
   to by cpu->plugin_mem_cbs 

Since I have this nicely recorded now with rr, it ought to be easier to
poke at, though I admit I'm not entirely sure how to poke at the
generated code to see what's going wrong (i.e. why wouldn't the tb exit
stuff be clearing this pointer like normal?).

> > I welcome any feedback or insights on how to further nail down the
> > failure case and/or help in working towards an appropriate solution.

-Aaron



Re: Plugin Memory Callback Debugging

2022-11-18 Thread Aaron Lindsay
Sorry, left off the very end of my timeline:

On Nov 18 16:58, Aaron Lindsay wrote:
> I have, so far, discovered the following timeline:
> 1. My plugin receives a instruction execution callback for a load
>instruction. At this time, cpu->plugin_mem_cbs points to the same
>memory which will later be freed
> 2. During the handling of this callback, my plugin calls qemu_plugin_reset()
> 3. Ostensibly something goes wrong here with the cleanup of
>cpu->plugin_mem_cbs??? 
> 4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
>to by cpu->plugin_mem_cbs 

5. A store exclusive instruction is translated and then executed, which
   requires the use of a helper. When executed, this helper checks
   cpu->plugin_mem_cbs, which is non-null, so it attempts to dereference
   and use it, resulting in the assertion. 

-Aaron



Plugin Memory Callback Debugging

2022-11-15 Thread Aaron Lindsay
Hello,

I have been wrestling with what might be a bug in the plugin memory
callbacks. The immediate error is that I hit the
`g_assert_not_reached()` in the 'default:' case in
qemu_plugin_vcpu_mem_cb, indicating the callback type was invalid. When
breaking on this assertion in gdb, the contents of cpu->plugin_mem_cbs
are obviously bogus (`len` was absurdly high, for example).  After doing
some further digging/instrumenting, I eventually found that
`free_dyn_cb_arr(void *p, ...)` is being called shortly before the
assertion is hit with `p` pointing to the same address as
`cpu->plugin_mem_cbs` will later hold at assertion-time. We are freeing
the memory still pointed to by `cpu->plugin_mem_cbs`.

I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
end of an instruction/TB's execution, so it's not exactly clear to me how this
is occurring. However, I suspect it may be relevant that we are calling
`free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`. 

I have additionally found that the below addition allows me to run successfully
without hitting the assert:

diff --git a/plugins/core.c b/plugins/core.c
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -427,9 +427,14 @@ static bool free_dyn_cb_arr(void *p, uint32_t h, void 
*userp)

 void qemu_plugin_flush_cb(void)
 {
+CPUState *cpu;
 qht_iter_remove(&plugin.dyn_cb_arr_ht, free_dyn_cb_arr, NULL);
 qht_reset(&plugin.dyn_cb_arr_ht);

+CPU_FOREACH(cpu) {
+cpu->plugin_mem_cbs = NULL;
+}
+
 plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
 }

Unfortunately, the workload/setup I have encountered this bug with are
difficult to reproduce in a way suitable for sharing upstream (admittedly
potentially because I do not fully understand the conditions necessary to
trigger it). It is also deep into a run, and I haven't found a good way
to break in gdb immediately prior to it happening in order to inspect
it, without perturbing it enough such that it doesn't happen... 

I welcome any feedback or insights on how to further nail down the
failure case and/or help in working towards an appropriate solution.

Thanks!

-Aaron



Re: [BUG] AArch64 boot hang with -icount and -smp >1 (iothread locking issue?)

2022-10-21 Thread Aaron Lindsay
On Oct 21 17:00, Peter Maydell wrote:
> On Fri, 21 Oct 2022 at 16:48, Aaron Lindsay
>  wrote:
> >
> > Hello,
> >
> > I am encountering one or more bugs when using -icount and -smp >1 that I am
> > attempting to sort out. My current theory is that it is an iothread locking
> > issue.
> 
> Weird coincidence, that is a bug that's been in the tree for months
> but was only reported to me earlier this week. Try reverting
> commit a82fd5a4ec24d923ff1e -- that should fix it.

I can confirm that reverting a82fd5a4ec24d923ff1e fixes it for me.
Thanks for the help and fast response!

-Aaron



[BUG] AArch64 boot hang with -icount and -smp >1 (iothread locking issue?)

2022-10-21 Thread Aaron Lindsay
Hello,

I am encountering one or more bugs when using -icount and -smp >1 that I am
attempting to sort out. My current theory is that it is an iothread locking
issue.

I am using a command-line like the following where $kernel is a recent upstream
AArch64 Linux kernel Image (I can provide a binary if that would be helpful -
let me know how best to post it):

qemu-system-aarch64 \
-M virt -cpu cortex-a57 -m 1G \
-nographic \
-smp 2 \
-icount 0 \
-kernel $kernel

For any/all of the symptoms described below, they seem to disappear when I
either remove `-icount 0` or change smp to `-smp 1`. In other words, it is the
combination of `-smp >1` and `-icount` which triggers what I'm seeing.

I am seeing two different (but seemingly related) behaviors. The first (and
what I originally started debugging) shows up as a boot hang. When booting
using the above command after Peter's "icount: Take iothread lock when running
QEMU timers" patch [1], the kernel boots for a while and then hangs after:

> ...snip...
> [0.010764] Serial: AMBA PL011 UART driver
> [0.016334] 900.pl011: ttyAMA0 at MMIO 0x900 (irq = 13, base_baud 
> = 0) is a PL011 rev1
> [0.016907] printk: console [ttyAMA0] enabled
> [0.017624] KASLR enabled
> [0.031986] HugeTLB: registered 16.0 GiB page size, pre-allocated 0 pages
> [0.031986] HugeTLB: 16320 KiB vmemmap can be freed for a 16.0 GiB page
> [0.031986] HugeTLB: registered 512 MiB page size, pre-allocated 0 pages
> [0.031986] HugeTLB: 448 KiB vmemmap can be freed for a 512 MiB page
> [0.031986] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
> [0.031986] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page

When it hangs here, I drop into QEMU's console, attach to the gdbserver, and it
always reports that it is at address 0x88dc42e8 (as shown below from an
objdump of the vmlinux). I note this is in the middle of messing with timer
system registers - which makes me suspect we're attempting to take the iothread
lock when it's already held:

>   88dc42b8 :
>   88dc42b8:   d503201fnop
>   88dc42bc:   d503201fnop
>   88dc42c0:   d503233fpaciasp
>   88dc42c4:   d53be321mrs x1, cntv_ctl_el0
>   88dc42c8:   3221orr w1, w1, #0x1
>   88dc42cc:   d5033fdfisb
>   88dc42d0:   d53be042mrs x2, cntvct_el0
>   88dc42d4:   ca020043eor x3, x2, x2
>   88dc42d8:   8b2363e3add x3, sp, x3
>   88dc42dc:   f940007fldr xzr, [x3]
>   88dc42e0:   8b02add x0, x0, x2
>   88dc42e4:   d51be340msr cntv_cval_el0, x0
> * 88dc42e8:   927ef820and x0, x1, #0xfffd
>   88dc42ec:   d51be320msr cntv_ctl_el0, x0
>   88dc42f0:   d5033fdfisb
>   88dc42f4:   5280mov w0, #0x0
> // #0
>   88dc42f8:   d50323bfautiasp
>   88dc42fc:   d65f03c0ret 

The second behavior is that prior to Peter's "icount: Take iothread lock when
running QEMU timers" patch [1], I observe the following message (same command
as above):

> ERROR:../accel/tcg/tcg-accel-ops.c:79:tcg_handle_interrupt: assertion failed: 
> (qemu_mutex_iothread_locked())
> Aborted (core dumped)

This is the same behavior described in Gitlab issue 1130 [0] and addressed by
[1]. I bisected the appearance of this assertion, and found it was introduced
by Pavel's "replay: rewrite async event handling" commit [2]. Commits prior to
that one boot successfully (neither assertions nor hangs) with `-icount 0 -smp
2`.

I've looked over these two commits ([1], [2]), but it is not obvious to me
how/why they might be interacting to produce the boot hangs I'm seeing and
I welcome any help investigating further.

Thanks!

-Aaron Lindsay

[0] - https://gitlab.com/qemu-project/qemu/-/issues/1130
[1] - 
https://gitlab.com/qemu-project/qemu/-/commit/c7f26ded6d5065e4116f630f6a490b55f6c5f58e
[2] - 
https://gitlab.com/qemu-project/qemu/-/commit/60618e2d77691e44bb78e23b2b0cf07b5c405e56



Re: Plugins Not Reporting AArch64 SVE Memory Operations

2022-03-29 Thread Aaron Lindsay via
On Mar 28 16:30, Alex Bennée wrote:
> 
> Aaron Lindsay  writes:
> 
> > Hi folks,
> >
> > I see there has been some previous discussion [1] about 1.5 years ago
> > around the fact that AArch64 SVE instructions do not emit any memory
> > operations via the plugin interface, as one might expect them to.
> 
> To help I updated one of the tests and extended the exec plugin. See:
> 
>   Subject: [PATCH  v1 0/2] some tests and plugin tweaks for SVE
>   Date: Mon, 28 Mar 2022 16:26:12 +0100
>   Message-Id: <20220328152614.2452259-1-alex.ben...@linaro.org>

This looks helpful, thanks!

-Aaron



Plugins Not Reporting AArch64 SVE Memory Operations

2022-03-24 Thread Aaron Lindsay
Hi folks,

I see there has been some previous discussion [1] about 1.5 years ago
around the fact that AArch64 SVE instructions do not emit any memory
operations via the plugin interface, as one might expect them to.

I am interested in being able to more accurately trace the memory
operations of SVE instructions using the plugin interface - has there
been any further discussion or work on this topic off-list (or that
escaped my searching)?

In the previous discussion [1], Richard raised some interesting
questions:

> The plugin interface needs extension for this.  How should I signal that 256
> consecutive byte loads have occurred?  How should I signal that the 
> controlling
> predicate was not all true, so only 250 of those 256 were actually active?  
> How
> should I signal 59 non-consecutive (gather) loads have occurred?
> 
> If the answer is simply that you want 256 or 250 or 59 plugin callbacks
> respectively, then we might be able to force the memory operations into the
> slow path, and hook the operation there.  As if it were an i/o operation.

My initial reaction is that simply sending individual callbacks for each
access (only the ones which were active, in the case of predication)
seems to fit reasonably well with the existing plugin interface. For
instance, I think we already receive two callbacks for each AArch64
`LDP` instruction, right?
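
If per-element callbacks are the agreed direction, I don't think a plugin
would need anything new to cope. As a rough sketch (names are made up,
single vCPU assumed, so the hash table is left unlocked), something like
the below just counts how many memory callbacks each PC generates - an SVE
load touching N active elements would simply contribute N per execution:

#include <inttypes.h>
#include <stdio.h>
#include <glib.h>

#include <qemu-plugin.h>

QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

/* pc -> number of memory callbacks observed for that instruction */
static GHashTable *counts;

static void mem_cb(unsigned int cpu_index, qemu_plugin_meminfo_t info,
                   uint64_t vaddr, void *udata)
{
    guint n = GPOINTER_TO_UINT(g_hash_table_lookup(counts, udata));
    g_hash_table_insert(counts, udata, GUINT_TO_POINTER(n + 1));
}

static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    size_t n = qemu_plugin_tb_n_insns(tb);

    for (size_t i = 0; i < n; i++) {
        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
        uint64_t pc = qemu_plugin_insn_vaddr(insn);

        qemu_plugin_register_vcpu_mem_cb(insn, mem_cb, QEMU_PLUGIN_CB_NO_REGS,
                                         QEMU_PLUGIN_MEM_RW, (void *)pc);
    }
}

static void print_count(gpointer key, gpointer value, gpointer user_data)
{
    char line[64];

    snprintf(line, sizeof(line), "0x%" PRIx64 ": %u accesses\n",
             (uint64_t)(uintptr_t)key, GPOINTER_TO_UINT(value));
    qemu_plugin_outs(line);
}

static void plugin_exit(qemu_plugin_id_t id, void *p)
{
    g_hash_table_foreach(counts, print_count, NULL);
}

QEMU_PLUGIN_EXPORT
int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
                        int argc, char **argv)
{
    counts = g_hash_table_new(NULL, NULL);
    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
    return 0;
}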

If this is an agreeable solution that wouldn't take too much effort to
implement (and no one else is doing it), would someone mind pointing me
in the right direction to get started?

Thanks!

-Aaron

[1] https://lists.nongnu.org/archive/html/qemu-discuss/2020-12/msg00015.html



Re: [PATCH v1 12/22] plugins: stxp test case from Aaron (!upstream)

2022-02-02 Thread Aaron Lindsay via
On Feb 01 15:29, Alex Bennée wrote:
> 
> Aaron Lindsay  writes:
> 
> > On Jan 24 20:15, Alex Bennée wrote:
> >> Signed-off-by: Alex Bennée 
> >> Cc: Aaron Lindsay 
> >> Message-ID: 
> >> 
> >> ---
> >> [AJB] this was for testing, I think you can show the same stuff with
> >> the much more complete execlog now.
> >
> > Is it true that execlog can also reproduce the duplicate loads which are
> > still an outstanding issue?
> 
> Are we still seeing duplicate loads? I thought that had been fixed.

I have not explicitly tested for the duplicate loads on atomics lately
(though I have seen some transient behavior related to atomics that I
have struggled to reliably reproduce, but I believe that's a different
issue). I hadn't seen a subsequent fix come through after the initial
fix for stores and assumed it was still an issue. Sorry for my
assumption, particularly if I just missed it.

-Aaron

> >> ---
> >>  contrib/plugins/stxp-plugin.c | 50 +++
> >>  tests/tcg/aarch64/stxp.c  | 28 +
> >>  contrib/plugins/Makefile  |  1 +
> >>  tests/tcg/aarch64/Makefile.target |  3 ++
> >>  4 files changed, 82 insertions(+)
> >>  create mode 100644 contrib/plugins/stxp-plugin.c
> >>  create mode 100644 tests/tcg/aarch64/stxp.c
> >> 
> >> diff --git a/contrib/plugins/stxp-plugin.c b/contrib/plugins/stxp-plugin.c
> >> new file mode 100644
> >> index 00..432cf8c1ed
> >> --- /dev/null
> >> +++ b/contrib/plugins/stxp-plugin.c
> >> @@ -0,0 +1,50 @@
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> >> +
> >> +void qemu_logf(const char *str, ...)
> >> +{
> >> +char message[1024];
> >> +va_list args;
> >> +va_start(args, str);
> >> +vsnprintf(message, 1023, str, args);
> >> +
> >> +qemu_plugin_outs(message);
> >> +
> >> +va_end(args);
> >> +}
> >> +
> >> +void before_insn_cb(unsigned int cpu_index, void *udata)
> >> +{
> >> +uint64_t pc = (uint64_t)udata;
> >> +qemu_logf("Executing PC: 0x%" PRIx64 "\n", pc);
> >> +}
> >> +
> >> +static void mem_cb(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo, 
> >> uint64_t va, void *udata)
> >> +{
> >> +uint64_t pc = (uint64_t)udata;
> >> +qemu_logf("PC 0x%" PRIx64 " accessed memory at 0x%" PRIx64 "\n", pc, 
> >> va);
> >> +}
> >> +
> >> +static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
> >> +{
> >> +size_t n = qemu_plugin_tb_n_insns(tb);
> >> +
> >> +for (size_t i = 0; i < n; i++) {
> >> +struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
> >> +uint64_t pc = qemu_plugin_insn_vaddr(insn);
> >> +
> >> +qemu_plugin_register_vcpu_insn_exec_cb(insn, before_insn_cb, 
> >> QEMU_PLUGIN_CB_R_REGS, (void *)pc);
> >> +qemu_plugin_register_vcpu_mem_cb(insn, mem_cb, 
> >> QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_MEM_RW, (void*)pc);
> >> +}
> >> +}
> >> +
> >> +QEMU_PLUGIN_EXPORT
> >> +int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
> >> +int argc, char **argv)
> >> +{
> >> +qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
> >> +return 0;
> >> +}
> >> diff --git a/tests/tcg/aarch64/stxp.c b/tests/tcg/aarch64/stxp.c
> >> new file mode 100644
> >> index 00..fb8ef6a46d
> >> --- /dev/null
> >> +++ b/tests/tcg/aarch64/stxp.c
> >> @@ -0,0 +1,28 @@
> >> +
> >> +
> >> +void stxp_issue_demo(void *arr)
> >> +{
> >> +asm(".align 8\n\t"
> >> +"mov x0, %[in]\n\t"
> >> +"mov x18, 0x1000\n\t"
> >> +"mov x2, 0x0\n\t"
> >> +"mov x3, 0x0\n\t"
> >> +"loop:\n\t"
> >> +"prfm  pstl1strm, [x0]\n\t"
> >> +"ldxp  x16, x17, [x0]\n\t"
> >> +"stxp  w16, x2, x3, [x0]\n\t"
> >> +"\n\t"
> >> +"subs x18, x18, 1\n\t"
> >&g

Re: [PATCH v1 12/22] plugins: stxp test case from Aaron (!upstream)

2022-02-01 Thread Aaron Lindsay via
On Jan 25 09:17, Thomas Huth wrote:
> On 24/01/2022 21.15, Alex Bennée wrote:
> > Signed-off-by: Alex Bennée 
> > Cc: Aaron Lindsay 
> > Message-ID: 
> > 
> > ---
> > [AJB] this was for testing, I think you can show the same stuff with
> > the much more complete execlog now.
> > ---
> >   contrib/plugins/stxp-plugin.c | 50 +++
> >   tests/tcg/aarch64/stxp.c  | 28 +
> >   contrib/plugins/Makefile  |  1 +
> >   tests/tcg/aarch64/Makefile.target |  3 ++
> >   4 files changed, 82 insertions(+)
> >   create mode 100644 contrib/plugins/stxp-plugin.c
> >   create mode 100644 tests/tcg/aarch64/stxp.c
> > 
> > diff --git a/contrib/plugins/stxp-plugin.c b/contrib/plugins/stxp-plugin.c
> > new file mode 100644
> > index 00..432cf8c1ed
> > --- /dev/null
> > +++ b/contrib/plugins/stxp-plugin.c
> > @@ -0,0 +1,50 @@
> > +#include 
> > +#include 
> > +#include 
> > +
> > +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> > +
> > +void qemu_logf(const char *str, ...)
> > +{
> > +char message[1024];
> > +va_list args;
> > +va_start(args, str);
> > +vsnprintf(message, 1023, str, args);
> > +
> > +qemu_plugin_outs(message);
> > +
> > +va_end(args);
> > +}
> > +
> > +void before_insn_cb(unsigned int cpu_index, void *udata)
> > +{
> > +uint64_t pc = (uint64_t)udata;
> > +qemu_logf("Executing PC: 0x%" PRIx64 "\n", pc);
> > +}
> > +
> > +static void mem_cb(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo, 
> > uint64_t va, void *udata)
> 
> Could you please break the line to avoid checkpatch errors:
> 
> ERROR: line over 90 characters
> #63: FILE: contrib/plugins/stxp-plugin.c:25:
> +static void mem_cb(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo,
> uint64_t va, void *udata)
> 
> ERROR: line over 90 characters
> #77: FILE: contrib/plugins/stxp-plugin.c:39:
> +qemu_plugin_register_vcpu_insn_exec_cb(insn, before_insn_cb,
> QEMU_PLUGIN_CB_R_REGS, (void *)pc);
> 
> ERROR: line over 90 characters
> #78: FILE: contrib/plugins/stxp-plugin.c:40:
> +qemu_plugin_register_vcpu_mem_cb(insn, mem_cb,
> QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_MEM_RW, (void*)pc);
> 
> ERROR: "(foo*)" should be "(foo *)"
> #78: FILE: contrib/plugins/stxp-plugin.c:40:
> +qemu_plugin_register_vcpu_mem_cb(insn, mem_cb,
> QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_MEM_RW, (void*)pc);
> 
> total: 4 errors, 1 warnings, 92 lines checked

For whatever it's worth, I don't think Alex's intention is to upstream
this code.

-Aaron



Re: [PATCH v1 12/22] plugins: stxp test case from Aaron (!upstream)

2022-02-01 Thread Aaron Lindsay via
On Jan 24 20:15, Alex Bennée wrote:
> Signed-off-by: Alex Bennée 
> Cc: Aaron Lindsay 
> Message-ID: 
> 
> ---
> [AJB] this was for testing, I think you can show the same stuff with
> the much more complete execlog now.

Is it true that execlog can also reproduce the duplicate loads which are
still an outstanding issue?

-Aaron

> ---
>  contrib/plugins/stxp-plugin.c | 50 +++
>  tests/tcg/aarch64/stxp.c  | 28 +
>  contrib/plugins/Makefile  |  1 +
>  tests/tcg/aarch64/Makefile.target |  3 ++
>  4 files changed, 82 insertions(+)
>  create mode 100644 contrib/plugins/stxp-plugin.c
>  create mode 100644 tests/tcg/aarch64/stxp.c
> 
> diff --git a/contrib/plugins/stxp-plugin.c b/contrib/plugins/stxp-plugin.c
> new file mode 100644
> index 00..432cf8c1ed
> --- /dev/null
> +++ b/contrib/plugins/stxp-plugin.c
> @@ -0,0 +1,50 @@
> +#include 
> +#include 
> +#include 
> +
> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> +
> +void qemu_logf(const char *str, ...)
> +{
> +char message[1024];
> +va_list args;
> +va_start(args, str);
> +vsnprintf(message, 1023, str, args);
> +
> +qemu_plugin_outs(message);
> +
> +va_end(args);
> +}
> +
> +void before_insn_cb(unsigned int cpu_index, void *udata)
> +{
> +uint64_t pc = (uint64_t)udata;
> +qemu_logf("Executing PC: 0x%" PRIx64 "\n", pc);
> +}
> +
> +static void mem_cb(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo, 
> uint64_t va, void *udata)
> +{
> +uint64_t pc = (uint64_t)udata;
> +qemu_logf("PC 0x%" PRIx64 " accessed memory at 0x%" PRIx64 "\n", pc, va);
> +}
> +
> +static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
> +{
> +size_t n = qemu_plugin_tb_n_insns(tb);
> +
> +for (size_t i = 0; i < n; i++) {
> +struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
> +uint64_t pc = qemu_plugin_insn_vaddr(insn);
> +
> +qemu_plugin_register_vcpu_insn_exec_cb(insn, before_insn_cb, 
> QEMU_PLUGIN_CB_R_REGS, (void *)pc);
> +qemu_plugin_register_vcpu_mem_cb(insn, mem_cb, 
> QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_MEM_RW, (void*)pc);
> +}
> +}
> +
> +QEMU_PLUGIN_EXPORT
> +int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
> +int argc, char **argv)
> +{
> +qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
> +return 0;
> +}
> diff --git a/tests/tcg/aarch64/stxp.c b/tests/tcg/aarch64/stxp.c
> new file mode 100644
> index 00..fb8ef6a46d
> --- /dev/null
> +++ b/tests/tcg/aarch64/stxp.c
> @@ -0,0 +1,28 @@
> +
> +
> +void stxp_issue_demo(void *arr)
> +{
> +asm(".align 8\n\t"
> +"mov x0, %[in]\n\t"
> +"mov x18, 0x1000\n\t"
> +"mov x2, 0x0\n\t"
> +"mov x3, 0x0\n\t"
> +"loop:\n\t"
> +"prfm  pstl1strm, [x0]\n\t"
> +"ldxp  x16, x17, [x0]\n\t"
> +"stxp  w16, x2, x3, [x0]\n\t"
> +"\n\t"
> +"subs x18, x18, 1\n\t"
> +"beq done\n\t"
> +"b loop\n\t"
> +"done:\n\t"
> +: /* none out */
> +: [in] "r" (arr) /* in */
> +: "x0", "x2", "x3", "x16", "x17", "x18"); /* clobbers */
> +}
> +
> +int main()
> +{
> +char arr[16];
> +stxp_issue_demo();
> +}
> diff --git a/contrib/plugins/Makefile b/contrib/plugins/Makefile
> index 54ac5ccd9f..576ed5875a 100644
> --- a/contrib/plugins/Makefile
> +++ b/contrib/plugins/Makefile
> @@ -20,6 +20,7 @@ NAMES += howvec
>  NAMES += lockstep
>  NAMES += hwprofile
>  NAMES += cache
> +NAMES += stxp-plugin
>  
>  SONAMES := $(addsuffix .so,$(addprefix lib,$(NAMES)))
>  
> diff --git a/tests/tcg/aarch64/Makefile.target 
> b/tests/tcg/aarch64/Makefile.target
> index 1d967901bd..54b2e90d00 100644
> --- a/tests/tcg/aarch64/Makefile.target
> +++ b/tests/tcg/aarch64/Makefile.target
> @@ -72,4 +72,7 @@ endif
>  
>  endif
>  
> +# Load/Store exclusive test
> +AARCH64_TESTS += stxp
> +
>  TESTS += $(AARCH64_TESTS)
> -- 
> 2.30.2
> 



Re: plugins: Missing Store Exclusive Memory Accesses

2021-10-21 Thread Aaron Lindsay via
On Oct 21 13:28, Alex Bennée wrote:
> It's a bit clearer if you use the contrib/execlog plugin:
> 
>   ./qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin  
> ./tests/tcg/aarch64-linux-user/stxp
> 
>   0, 0x400910, 0xf9800011, "prfm pstl1strm, [x0]
>   0, 0x400914, 0xc87f4410, "ldxp x16, x17, [x0]", load, 0x55007fffd0, load, 
> 0x55007fffd8 
>   0, 0x400918, 0xc8300c02, "stxp w16, x2, x3, [x0]", load, 0x55007fffd0, 
> load, 0x55007fffd8, store, 0x55007fffd0, store, 0x55007fffd8 
>   0, 0x40091c, 0xf1000652, "subs x18, x18, #1"
>   0, 0x400920, 0x5440, "b.eq #0x400928"
>   0, 0x400924, 0x17fb, "b #0x400910"
>   0, 0x400910, 0xf9800011, "prfm pstl1strm, [x0]
>   0, 0x400914, 0xc87f4410, "ldxp x16, x17, [x0]", load, 0x55007fffd0, load, 
> 0x55007fffd8 
>   0, 0x400918, 0xc8300c02, "stxp w16, x2, x3, [x0]", load, 0x55007fffd0, 
> load, 0x55007fffd8, store, 0x55007fffd0, store, 0x55007fffd8 
>   0, 0x40091c, 0xf1000652, "subs x18, x18, #1"
>   0, 0x400920, 0x5440, "b.eq #0x400928"
>   0, 0x400924, 0x17fb, "b #0x400910"
>   0, 0x400910, 0xf9800011, "prfm pstl1strm, [x0]
>   0, 0x400914, 0xc87f4410, "ldxp x16, x17, [x0]", load, 0x55007fffd0, load, 
> 0x55007fffd8 
>   0, 0x400918, 0xc8300c02, "stxp w16, x2, x3, [x0]", load, 0x55007fffd0, 
> load, 0x55007fffd8, store, 0x55007fffd0, store, 0x55007fffd8 
>   0, 0x40091c, 0xf1000652, "subs x18, x18, #1"
>   0, 0x400920, 0x5440, "b.eq #0x400928"
>   0, 0x400924, 0x17fb, "b #0x400910"
>   0, 0x400910, 0xf9800011, "prfm pstl1strm, [x0]
>   0, 0x400914, 0xc87f4410, "ldxp x16, x17, [x0]", load, 0x55007fffd0, load, 
> 0x55007fffd8 
>   0, 0x400918, 0xc8300c02, "stxp w16, x2, x3, [x0]", load, 0x55007fffd0, 
> load, 0x55007fffd8, store, 0x55007fffd0, store, 0x55007fffd8 
>   0, 0x40091c, 0xf1000652, "subs x18, x18, #1"
>   0, 0x400920, 0x5440, "b.eq #0x400928"
>   0, 0x400924, 0x17fb, "b #0x400910"
>   0, 0x400910, 0xf9800011, "prfm pstl1strm, [x0]
>   0, 0x400914, 0xc87f4410, "ldxp x16, x17, [x0]", load, 0x55007fffd0, load, 
> 0x55007fffd8 
>   0, 0x400918, 0xc8300c02, "stxp w16, x2, x3, [x0]", load, 0x55007fffd0, 
> load, 0x55007fffd8, store, 0x55007fffd0, store, 0x55007fffd8 
>   0, 0x40091c, 0xf1000652, "subs x18, x18, #1"
>   0, 0x400920, 0x5440, "b.eq #0x400928"
>   0, 0x400924, 0x17fb, "b #0x400910"
> 
> Although you can see stxp looks a bit weird on account of the loads it
> does during the cmpxchng. So consider me stumped. The only thing I can
> thing of next is to see how closely I can replicate your build
> environment.

I apologize, I had apparently gotten farther behind upstream than I
realized since originally encountering this. I tried the latest upstream
code and am now able to observe the same thing as you. Somewhere between
v6.1.0 and now, the original issue I reported has been resolved.

However, I am not sure reporting loads for a store exclusive makes sense
here, either. My understanding is that the stxp needs to check if
it still has exclusive access and QEMU's implementation results in the
extra loads, but I would expect that the plugin interface would only
report architectural loads.

Is there any obvious way to omit the loads from the plugin interface
here?

-Aaron



Re: plugins: Missing Store Exclusive Memory Accesses

2021-10-20 Thread Aaron Lindsay via
On Oct 20 18:54, Alex Bennée wrote:
> Have you got a test case you are using so I can try and replicate the
> failure you are seeing? So far by inspection everything looks OK to me.

I took some time today to put together a minimal(ish) reproducer using
usermode. The source files used are below, I compiled the test binary on an
AArch64 system using:

$ gcc -g -o stxp stxp.s stxp.c

Then built the plugin from stxp_plugin.cc, and ran it all like:

qemu-aarch64 \
-cpu cortex-a57 \
-D stxp_plugin.log \
-d plugin \
-plugin 'stxp_plugin.so' \
./stxp
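
The plugin itself was built with something roughly like the following (the
flags and include path are from memory and depend on your QEMU checkout, so
treat them as illustrative):

$ g++ -shared -fPIC -o stxp_plugin.so stxp_plugin.cc \
    -I/path/to/qemu/include/qemu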

I observe that, for me, the objdump of stxp contains:
0040070c :
  40070c:   f9800011prfmpstl1strm, [x0]
  400710:   c87f4410ldxpx16, x17, [x0]
  400714:   c8300c02stxpw16, x2, x3, [x0]
  400718:   f1000652subsx18, x18, #0x1
  40071c:   5440b.eq400724   // b.none
  400720:   17fbb   40070c 

But the output in stxp_plugin.log looks something like:
Executing PC: 0x40070c
Executing PC: 0x400710
PC 0x400710 accessed memory at 0x550080ec70
PC 0x400710 accessed memory at 0x550080ec78
Executing PC: 0x400714
Executing PC: 0x400718
Executing PC: 0x40071c
Executing PC: 0x400720

From this, I believe the ldxp instruction at PC 0x400710 is reporting two
memory accesses but the stxp instruction at 0x400714 is not.

-Aaron

--- stxp.c ---
void stxp_issue_demo(void *arr);

int main() {
__attribute__((aligned(16))) char arr[16];
stxp_issue_demo(arr);
}

--- stxp.s ---
.align 8

stxp_issue_demo:
mov x18, 0x1000
mov x2, 0x0
mov x3, 0x0
loop:
prfm  pstl1strm, [x0]
ldxp  x16, x17, [x0]
stxp  w16, x2, x3, [x0]

subs x18, x18, 1
beq done
b loop
done:
ret

.global stxp_issue_demo

--- stxp_plugin.cc ---
#include <inttypes.h>
#include <stdarg.h>
#include <stdio.h>

extern "C" {

#include <qemu-plugin.h>

QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

void qemu_logf(const char *str, ...)
{
char message[1024];
va_list args;
va_start(args, str);
vsnprintf(message, 1023, str, args);

qemu_plugin_outs(message);

va_end(args);
}

void before_insn_cb(unsigned int cpu_index, void *udata)
{
uint64_t pc = (uint64_t)udata;
qemu_logf("Executing PC: 0x%" PRIx64 "\n", pc);
}

static void mem_cb(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo, 
uint64_t va, void *udata)
{
uint64_t pc = (uint64_t)udata;
qemu_logf("PC 0x%" PRIx64 " accessed memory at 0x%" PRIx64 "\n", pc, va);
}

static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
size_t n = qemu_plugin_tb_n_insns(tb);

for (size_t i = 0; i < n; i++) {
struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
uint64_t pc = qemu_plugin_insn_vaddr(insn);

qemu_plugin_register_vcpu_insn_exec_cb(insn, before_insn_cb, 
QEMU_PLUGIN_CB_R_REGS, (void *)pc);
qemu_plugin_register_vcpu_mem_cb(insn, mem_cb, QEMU_PLUGIN_CB_NO_REGS, 
QEMU_PLUGIN_MEM_RW, (void*)pc);
}
}

QEMU_PLUGIN_EXPORT
int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
int argc, char **argv)
{
qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
return 0;
}

}



Re: plugins: Missing Store Exclusive Memory Accesses

2021-10-20 Thread Aaron Lindsay via
On Sep 22 16:22, Aaron Lindsay wrote:
> On Sep 21 16:28, Aaron Lindsay wrote:
> > On Sep 17 12:05, Alex Bennée wrote:
> > > Aaron Lindsay  writes:
> > > > I recently noticed that the plugin interface does not appear to be
> > > > emitting callbacks to functions registered via
> > > > `qemu_plugin_register_vcpu_mem_cb` for AArch64 store exclusives. This
> > > > would include instructions like `stxp  w16, x2, x3, [x4]` (encoding:
> > > > 0xc8300c82). Seeing as how I'm only running with a single CPU, I don't
> > > > see how this could be due to losing exclusivity after the preceding
> > > > `ldxp`.
> > > 
> > > The exclusive handling is a bit special due to the need to emulate it's
> > > behaviour using cmpxchg primitives.
> > > 
> > > >
> > > > In looking at QEMU's source, I *think* this is because the
> > > > `gen_store_exclusive` function in translate-a64.c is not making the same
> > > > calls to `plugin_gen_mem_callbacks` & company that are being made by
> > > > "normal" stores handled by functions like `tcg_gen_qemu_st_i64` (at
> > > > least in my case; I do see some code paths under `gen_store_exclusive`
> > > > call down into `tcg_gen_qemu_st_i64` eventually, but it appears not all
> > > > of them do?).
> > > 
> > > The key TCG operation is the cmpxchg which does the effective store. For
> > > -smp 1 we should use normal ld and st tcg ops. For > 1 it eventually
> > > falls to tcg_gen_atomic_cmpxchg_XX which is a helper. That eventually
> > > ends up at:
> > > 
> > >   atomic_trace_rmw_post
> > > 
> > > which should be where things are hooked.
> > 
> > When I open this up in gdb, I see that I'm getting the following call
> > graph for the `stxp` instruction in question (for -smp 1):
> > 
> > gen_store_exclusive -> gen_helper_paired_cmpxchg64_le
> > 
> > In other words, I'm taking the `s->be_data == MO_LE` else/if clause.
> > 
> > I do not see where the helper behind that (defined in helper-a64.c as
> > `uint64_t HELPER(paired_cmpxchg64_le)...`) is calling in to generate
> > plugin callbacks in this case. Am I missing something?
> 
> Richard, Alex,
> 
> The more I look at this, the more it feels like the following
> AArch64-specific helpers may have been overlooked when adding
> tracing/plugin hooks:
>   gen_helper_paired_cmpxchg64_le
>   gen_helper_paired_cmpxchg64_be
> 
> But... I'm still not sure I fully understand how everything I'm digging
> into interacts; I am happy to keep investigating and work towards a fix,
> but think I need a nudge in the right direction.

Ping?

I'm happy to spend some more time digging into this issue, and would
love to be pointed in the right direction if someone is able!

Thanks!

-Aaron



Re: plugins: Missing Store Exclusive Memory Accesses

2021-09-22 Thread Aaron Lindsay via
On Sep 21 16:28, Aaron Lindsay wrote:
> On Sep 17 12:05, Alex Bennée wrote:
> > Aaron Lindsay  writes:
> > > I recently noticed that the plugin interface does not appear to be
> > > emitting callbacks to functions registered via
> > > `qemu_plugin_register_vcpu_mem_cb` for AArch64 store exclusives. This
> > > would include instructions like `stxp  w16, x2, x3, [x4]` (encoding:
> > > 0xc8300c82). Seeing as how I'm only running with a single CPU, I don't
> > > see how this could be due to losing exclusivity after the preceding
> > > `ldxp`.
> > 
> > The exclusive handling is a bit special due to the need to emulate it's
> > behaviour using cmpxchg primitives.
> > 
> > >
> > > In looking at QEMU's source, I *think* this is because the
> > > `gen_store_exclusive` function in translate-a64.c is not making the same
> > > calls to `plugin_gen_mem_callbacks` & company that are being made by
> > > "normal" stores handled by functions like `tcg_gen_qemu_st_i64` (at
> > > least in my case; I do see some code paths under `gen_store_exclusive`
> > > call down into `tcg_gen_qemu_st_i64` eventually, but it appears not all
> > > of them do?).
> > 
> > The key TCG operation is the cmpxchg which does the effective store. For
> > -smp 1 we should use normal ld and st tcg ops. For > 1 it eventually
> > falls to tcg_gen_atomic_cmpxchg_XX which is a helper. That eventually
> > ends up at:
> > 
> >   atomic_trace_rmw_post
> > 
> > which should be where things are hooked.
> 
> When I open this up in gdb, I see that I'm getting the following call
> graph for the `stxp` instruction in question (for -smp 1):
> 
> gen_store_exclusive -> gen_helper_paired_cmpxchg64_le
> 
> In other words, I'm taking the `s->be_data == MO_LE` else/if clause.
> 
> I do not see where the helper behind that (defined in helper-a64.c as
> `uint64_t HELPER(paired_cmpxchg64_le)...`) is calling in to generate
> plugin callbacks in this case. Am I missing something?

Richard, Alex,

The more I look at this, the more it feels like the following
AArch64-specific helpers may have been overlooked when adding
tracing/plugin hooks:
gen_helper_paired_cmpxchg64_le
gen_helper_paired_cmpxchg64_be

But... I'm still not sure I fully understand how everything I'm digging
into interacts; I am happy to keep investigating and work towards a fix,
but think I need a nudge in the right direction.

Thanks for any nudges,

Aaron



Re: plugins: Missing Store Exclusive Memory Accesses

2021-09-21 Thread Aaron Lindsay via
On Sep 17 12:05, Alex Bennée wrote:
> Aaron Lindsay  writes:
> > I recently noticed that the plugin interface does not appear to be
> > emitting callbacks to functions registered via
> > `qemu_plugin_register_vcpu_mem_cb` for AArch64 store exclusives. This
> > would include instructions like `stxp  w16, x2, x3, [x4]` (encoding:
> > 0xc8300c82). Seeing as how I'm only running with a single CPU, I don't
> > see how this could be due to losing exclusivity after the preceding
> > `ldxp`.
> 
> The exclusive handling is a bit special due to the need to emulate it's
> behaviour using cmpxchg primitives.
> 
> >
> > In looking at QEMU's source, I *think* this is because the
> > `gen_store_exclusive` function in translate-a64.c is not making the same
> > calls to `plugin_gen_mem_callbacks` & company that are being made by
> > "normal" stores handled by functions like `tcg_gen_qemu_st_i64` (at
> > least in my case; I do see some code paths under `gen_store_exclusive`
> > call down into `tcg_gen_qemu_st_i64` eventually, but it appears not all
> > of them do?).
> 
> The key TCG operation is the cmpxchg which does the effective store. For
> -smp 1 we should use normal ld and st tcg ops. For > 1 it eventually
> falls to tcg_gen_atomic_cmpxchg_XX which is a helper. That eventually
> ends up at:
> 
>   atomic_trace_rmw_post
> 
> which should be where things are hooked.

When I open this up in gdb, I see that I'm getting the following call
graph for the `stxp` instruction in question (for -smp 1):

gen_store_exclusive -> gen_helper_paired_cmpxchg64_le

In other words, I'm taking the `s->be_data == MO_LE` else/if clause.

I do not see where the helper behind that (defined in helper-a64.c as
`uint64_t HELPER(paired_cmpxchg64_le)...`) is calling in to generate
plugin callbacks in this case. Am I missing something?

-Aaron



Re: plugins: Missing Store Exclusive Memory Accesses

2021-09-17 Thread Aaron Lindsay via
On Sep 17 12:05, Alex Bennée wrote:
> Aaron Lindsay  writes:
> > In looking at QEMU's source, I *think* this is because the
> > `gen_store_exclusive` function in translate-a64.c is not making the same
> > calls to `plugin_gen_mem_callbacks` & company that are being made by
> > "normal" stores handled by functions like `tcg_gen_qemu_st_i64` (at
> > least in my case; I do see some code paths under `gen_store_exclusive`
> > call down into `tcg_gen_qemu_st_i64` eventually, but it appears not all
> > of them do?).
> 
> The key TCG operation is the cmpxchg which does the effective store. For
> -smp 1 we should use normal ld and st tcg ops. For > 1 it eventually
> falls to tcg_gen_atomic_cmpxchg_XX which is a helper. That eventually
> ends up at:
> 
>   atomic_trace_rmw_post
> 
> which should be where things are hooked.

If I am understanding you correctly, it seems like my `stxp` should be using
the "normal" load and store tcg ops since I am running with `-smp 1`, and
therefore correctly emitting plugin memory callbacks.

I think my next step is to figure out exactly which tcg code path is being used
for this instruction to remove any doubt about what's going on here.
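
(Probably by building QEMU with debug info and breaking on the functions
named above - something like the following, where the exact invocation is
only illustrative:

$ gdb --args ./qemu-aarch64 -plugin ./myplugin.so ./testcase
(gdb) break gen_store_exclusive
(gdb) break helper_paired_cmpxchg64_le
(gdb) run

and then seeing which breakpoints actually fire for the stxp.)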

> > Does my initial guess check out? And, if so, does anyone have insight
> > into how to fix this issue most cleanly/generically? I suspect if/when I
> > debug my particular case I can discover one code path to fix, but I'm
> > wondering if my discovery may be part of a larger class of cases which
> > fell through the cracks and ought to be fixed together.
> 
> Have you got simple example of a test case?

My test case is reasonably simple - I can reproduce the issue reliably and in
under 5 minutes - but I don't currently have a self-contained version in a form
I can share.

Here is the surrounding dynamic instruction stream, as reported by the plugin
interface (via callbacks registered with
`qemu_plugin_register_vcpu_insn_exec_cb`), along with corresponding memory
accesses (reported via callbacks registered with
`qemu_plugin_register_vcpu_mem_cb`):

  pc   ( opcode   ): `disassembly`
--|-|-
0x082076b4 (0x9436c8a9): `bl#0x08fb9958`
0x08fb9958 (0xf9800091): `prfm  pstl1strm, [x4]`
0x08fb995c (0xc87f4490): `ldxp  x16, x17, [x4]`
^ accesses virtual addresses: 0x8002fffdde60, 0x8002fffdde68
0x08fb9960 (0xca000210): `eor   x16, x16, x0`
0x08fb9964 (0xca010231): `eor   x17, x17, x1`
0x08fb9968 (0xaa110211): `orr   x17, x16, x17`
0x08fb996c (0xb571): `cbnz  x17, #0x08fb9978`
0x08fb9970 (0xc8300c82): `stxp  w16, x2, x3, [x4]`
0x08fb9974 (0x3550): `cbnz  w16, #0x08fb995c`
0x08fb9978 (0xaa1103e0): `mov   x0, x17`
0x08fb997c (0xd65f03c0): `ret   `
0x082076b8 (0xd503201f): `nop   `
0x082076bc (0xd503201f): `nop   `
0x082076c0 (0xd503201f): `nop   `
0x082076c4 (0xb94010a1): `ldr   w1, [x5, #0x10]`
^ accesses virtual addresses: 0x8002f18b5cd0
0x082076c8 (0x51000421): `sub   w1, w1, #1`
0x082076cc (0xb90010a1): `str   w1, [x5, #0x10]`
^ accesses virtual addresses: 0x8002f18b5cd0
0x082076d0 (0x3561): `cbnz  w1, #0x082076dc`

Notice that the `stxp` receives no corresponding callbacks via
`qemu_plugin_register_vcpu_mem_cb` like the `ldxp`, `ldr`, and `str` do.

-Aaron



plugins: Missing Store Exclusive Memory Accesses

2021-09-16 Thread Aaron Lindsay
Hello,

I recently noticed that the plugin interface does not appear to be
emitting callbacks to functions registered via
`qemu_plugin_register_vcpu_mem_cb` for AArch64 store exclusives. This
would include instructions like `stxp  w16, x2, x3, [x4]` (encoding:
0xc8300c82). Seeing as how I'm only running with a single CPU, I don't
see how this could be due to losing exclusivity after the preceding
`ldxp`.

In looking at QEMU's source, I *think* this is because the
`gen_store_exclusive` function in translate-a64.c is not making the same
calls to `plugin_gen_mem_callbacks` & company that are being made by
"normal" stores handled by functions like `tcg_gen_qemu_st_i64` (at
least in my case; I do see some code paths under `gen_store_exclusive`
call down into `tcg_gen_qemu_st_i64` eventually, but it appears not all
of them do?).

Does my initial guess check out? And, if so, does anyone have insight
into how to fix this issue most cleanly/generically? I suspect if/when I
debug my particular case I can discover one code path to fix, but I'm
wondering if my discovery may be part of a larger class of cases which
fell through the cracks and ought to be fixed together.

Thanks for any help,

Aaron



Re: [PATCH] plugins: Fix physical address calculation for IO regions

2021-07-20 Thread Aaron Lindsay
For reference, this patch is intended to address this conversation:
https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg01293.html

This appears to be better than the previous version in my testing, but I
absolutely welcome being told there is a better way to solve this!

Thanks!

-Aaron

On Jul 20 15:57, Aaron Lindsay wrote:
> The address calculation for IO regions introduced by
> 
> commit 787148bf928a54b5cc86f5b434f9399e9737679c
> Author: Aaron Lindsay 
> plugins: Expose physical addresses instead of device offsets
> 
> is not always accurate. Use the more correct
> MemoryRegionSection.offset_within_address_space.

Whoops, forgot my:

Signed-off-by: Aaron Lindsay 

> ---
>  plugins/api.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/plugins/api.c b/plugins/api.c
> index 5c1a413928..ba14e6f2b2 100644
> --- a/plugins/api.c
> +++ b/plugins/api.c
> @@ -319,7 +319,7 @@ uint64_t qemu_plugin_hwaddr_phys_addr(const struct 
> qemu_plugin_hwaddr *haddr)
>  return block->offset + offset + block->mr->addr;
>  } else {
>  MemoryRegionSection *mrs = haddr->v.io.section;
> -return haddr->v.io.offset + mrs->mr->addr;
> +return mrs->offset_within_address_space + haddr->v.io.offset;
>  }
>  }
>  #endif
> -- 
> 2.17.1
> 



[PATCH] plugins: Fix physical address calculation for IO regions

2021-07-20 Thread Aaron Lindsay
The address calculation for IO regions introduced by

commit 787148bf928a54b5cc86f5b434f9399e9737679c
Author: Aaron Lindsay 
plugins: Expose physical addresses instead of device offsets

is not always accurate. Use the more correct
MemoryRegionSection.offset_within_address_space.
---
 plugins/api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/plugins/api.c b/plugins/api.c
index 5c1a413928..ba14e6f2b2 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -319,7 +319,7 @@ uint64_t qemu_plugin_hwaddr_phys_addr(const struct 
qemu_plugin_hwaddr *haddr)
 return block->offset + offset + block->mr->addr;
 } else {
 MemoryRegionSection *mrs = haddr->v.io.section;
-return haddr->v.io.offset + mrs->mr->addr;
+return mrs->offset_within_address_space + haddr->v.io.offset;
 }
 }
 #endif
-- 
2.17.1




Re: Plugin virtual-to-physical translation incorrect for some IO accesses

2021-07-07 Thread Aaron Lindsay via
On Jul 07 07:35, Aaron Lindsay wrote:
> On Jul 07 09:53, Philippe Mathieu-Daudé wrote:
> > On 7/6/21 11:56 PM, Aaron Lindsay wrote:
> > > On Jul 06 23:10, Philippe Mathieu-Daudé wrote:
> > >> +Peter/Paolo
> > >>
> > >> On 7/6/21 10:47 PM, Aaron Lindsay wrote:
> > >>> Hello,
> > >>>
> > >>> I previously supplied a patch which modified the plugin interface such
> > >>> that it will return physical addresses for IO regions [0]. However, I
> > >>> have now found a case where the interface does not appear to correctly
> > >>> return the full physical addresses.
> > >>>
> > >>> In particular, when in qemu_plugin_hwaddr_phys_addr() for a particular
> > >>> store to IO memory (haddr->is_io==true), I find that haddr->v.io.offset
> > >>> is 0x0 and mrs->mr->addr is 0x3000, meaning 0x3000 is the returned
> > >>> "physical address".
> > 
> > v.io.offset is filled with iotlb_to_section() which use
> > AddressSpaceDispatch:
> > 
> > MemoryRegionSection *iotlb_to_section(CPUState *cpu,
> >   hwaddr index, MemTxAttrs attrs)
> > {
> > int asidx = cpu_asidx_from_attrs(cpu, attrs);
> > CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
> > AddressSpaceDispatch *d = qatomic_rcu_read(&cpuas->memory_dispatch);
> > MemoryRegionSection *sections = d->map.sections;
> > 
> > return &sections[index & ~TARGET_PAGE_MASK];
> > }
> > 
> > IIUC AddressSpaceDispatch is already adapted from the flatview to the
> > CPU (AS view). So v.io.offset is relative to each CPUAddressSpace.

What does CPUAddressSpace represent here? In my initial reading, I
assumed there might be one CPUAddressSpace for secure and one for
non-secure in the ARM world. But from my observation so far, v.io.offset
seems to be an offset relative to the beginning of a given memory region
(i.e. one device's portion of the memory map), rather than to the
address space as a whole (in terms of S/NS).

> > Assuming an ARM Cortex-M core having it's secure world mapped at
> > 0x80 and non-secure mapped at 0x00, the QEMU cpu
> > will have 2 CPUAddressSpaces, each CPUAddressSpace root MemoryRegion
> > is zero-based.
> > 
> > IOW the iotlb_to_section() API returns the relative offset (to the
> > CPUAddressSpace), not absolute (based on your expected 0x80).
> > 
> > > However, I also find that
> > >>> mrs->offset_within_address_space is 0x807000 (and also that
> > >>> 0x807000 matches up with what an actual translation would be from
> > >>> inspecting the page tables).
> > >>>
> > >>> Would it be 'safe' to *always* begin using
> > >>> mrs->offset_within_address_space as the returned physical address here
> > >>> instead of `haddr->v.io.offset + mrs->mr->addr`, or is there a reason we
> > >>> should not do that?
> > > 
> > > I realized this was perhaps not clear, so for clarification, I am
> > > imagining the formula for calculating the address would be:
> > > `mrs->offset_within_address_space + mrs->mr->addr`. Perhaps this was a
> > > confusing example since the offset into the region is 0x0...

Whoops, I replaced the wrong term in my clarification. What I really,
really meant was:

`mrs->offset_within_address_space + haddr->v.io.offset`

> > Yes, however remember this won't be the absolute address from the CPU
> > view, but the absolute address from address space (think of physical
> > bus) view. For example for a PCI BAR, this won't be the physical address
> > mapped on the CPU view, but the physical address on the PCI bus.
> 
> I believe I want the CPU view here (i.e. I want the physical address
> that would have been returned from a page table walk by the CPU for this
> access). Given that, I think what I'm hearing is that
> mrs->offset_within_address_space is *not* what I want (even though it
> appears to be in this case, since they happen to align). But also that
> v.io.offset is not sufficient without first adding an offset for the
> address space into which the access is being made.
> 
> Do I have that right? If so, can you point me in the right direction for
> getting back to the address space correctly?
> 
> Alex, I seem to recall you mentioned maybe wanting the plugins to know
> more about address spaces when I posted the original patch. At the time,
> I think I understood the concern to be mostly that the plugins may want
> to know which address space an access was to, not that it may be
> interfering with our ability to return correct addresses (at least as
> the CPU understands them). My initial thoughts are that we could adjust
> the address here for the address space without necessarily reporting it.
> Do you have thoughts about this?
> 
> -Aaron



Re: Plugin virtual-to-physical translation incorrect for some IO accesses

2021-07-07 Thread Aaron Lindsay via
On Jul 07 09:53, Philippe Mathieu-Daudé wrote:
> On 7/6/21 11:56 PM, Aaron Lindsay wrote:
> > On Jul 06 23:10, Philippe Mathieu-Daudé wrote:
> >> +Peter/Paolo
> >>
> >> On 7/6/21 10:47 PM, Aaron Lindsay wrote:
> >>> Hello,
> >>>
> >>> I previously supplied a patch which modified the plugin interface such
> >>> that it will return physical addresses for IO regions [0]. However, I
> >>> have now found a case where the interface does not appear to correctly
> >>> return the full physical addresses.
> >>>
> >>> In particular, when in qemu_plugin_hwaddr_phys_addr() for a particular
> >>> store to IO memory (haddr->is_io==true), I find that haddr->v.io.offset
> >>> is 0x0 and mrs->mr->addr is 0x3000, meaning 0x3000 is the returned
> >>> "physical address".
> 
> v.io.offset is filled with iotlb_to_section() which use
> AddressSpaceDispatch:
> 
> MemoryRegionSection *iotlb_to_section(CPUState *cpu,
>   hwaddr index, MemTxAttrs attrs)
> {
> int asidx = cpu_asidx_from_attrs(cpu, attrs);
> CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
> AddressSpaceDispatch *d = qatomic_rcu_read(&cpuas->memory_dispatch);
> MemoryRegionSection *sections = d->map.sections;
> 
> return &sections[index & ~TARGET_PAGE_MASK];
> }
> 
> IIUC AddressSpaceDispatch is already adapted from the flatview to the
> CPU (AS view). So v.io.offset is relative to each CPUAddressSpace.
> 
> Assuming an ARM Cortex-M core having its secure world mapped at
> 0x80 and non-secure mapped at 0x00, the QEMU cpu
> will have 2 CPUAddressSpaces, each CPUAddressSpace root MemoryRegion
> is zero-based.
> 
> IOW the iotlb_to_section() API returns the relative offset (to the
> CPUAddressSpace), not absolute (based on your expected 0x80).
> 
> > However, I also find that
> >>> mrs->offset_within_address_space is 0x807000 (and also that
> >>> 0x807000 matches up with what an actual translation would be from
> >>> inspecting the page tables).
> >>>
> >>> Would it be 'safe' to *always* begin using
> >>> mrs->offset_within_address_space as the returned physical address here
> >>> instead of `haddr->v.io.offset + mrs->mr->addr`, or is there a reason we
> >>> should not do that?
> > 
> > I realized this was perhaps not clear, so for clarification, I am
> > imagining the formula for calculating the address would be:
> > `mrs->offset_within_address_space + mrs->mr->addr`. Perhaps this was a
> > confusing example since the offset into the region is 0x0...
> 
> Yes, however remember this won't be the absolute address from the CPU
> view, but the absolute address from address space (think of physical
> bus) view. For example for a PCI BAR, this won't be the physical address
> mapped on the CPU view, but the physical address on the PCI bus.

I believe I want the CPU view here (i.e. I want the physical address
that would have been returned from a page table walk by the CPU for this
access). Given that, I think what I'm hearing is that
mrs->offset_within_address_space is *not* what I want (even though it
appears to be in this case, since they happen to align). But also that
v.io.offset is not sufficient without first adding an offset for the
address space into which the access is being made.

Do I have that right? If so, can you point me in the right direction for
getting back to the address space correctly?

Alex, I seem to recall you mentioned maybe wanting the plugins to know
more about address spaces when I posted the original patch. At the time,
I think I understood the concern to be mostly that the plugins may want
to know which address space an access was to, not that it may be
interfering with our ability to return correct addresses (at least as
the CPU understands them). My initial thoughts are that we could adjust
the address here for the address space without necessarily reporting it.
Do you have thoughts about this?

-Aaron



Re: Plugin virtual-to-physical translation incorrect for some IO accesses

2021-07-06 Thread Aaron Lindsay via
On Jul 06 23:10, Philippe Mathieu-Daudé wrote:
> +Peter/Paolo
> 
> On 7/6/21 10:47 PM, Aaron Lindsay wrote:
> > Hello,
> > 
> > I previously supplied a patch which modified the plugin interface such
> > that it will return physical addresses for IO regions [0]. However, I
> > have now found a case where the interface does not appear to correctly
> > return the full physical addresses.
> > 
> > In particular, when in qemu_plugin_hwaddr_phys_addr() for a particular
> > store to IO memory (haddr->is_io==true), I find that haddr->v.io.offset
> > is 0x0 and mrs->mr->addr is 0x3000, meaning 0x3000 is the returned
> > "physical address". However, I also find that
> > mrs->offset_within_address_space is 0x807000 (and also that
> > 0x807000 matches up with what an actual translation would be from
> > inspecting the page tables).
> > 
> > Would it be 'safe' to *always* begin using
> > mrs->offset_within_address_space as the returned physical address here
> > instead of `haddr->v.io.offset + mrs->mr->addr`, or is there a reason we
> > should not do that?

I realized this was perhaps not clear, so for clarification, I am
imagining the formula for calculating the address would be:
`mrs->offset_within_address_space + mrs->mr->addr`. Perhaps this was a
confusing example since the offset into the region is 0x0...

> 'safety' is not my area, but using mrs->offset_within_address_space
> sounds correct to me.

I'm not sure I was as clear as I could be here, either. My primary
concern was/is correctness of the address calculation, so perhaps 'safe'
was not the right way to put this.

-Aaron



Plugin virtual-to-physical translation incorrect for some IO accesses

2021-07-06 Thread Aaron Lindsay
Hello,

I previously supplied a patch which modified the plugin interface such
that it will return physical addresses for IO regions [0]. However, I
have now found a case where the interface does not appear to correctly
return the full physical addresses.

In particular, when in qemu_plugin_hwaddr_phys_addr() for a particular
store to IO memory (haddr->is_io==true), I find that haddr->v.io.offset
is 0x0 and mrs->mr->addr is 0x3000, meaning 0x3000 is the returned
"physical address". However, I also find that
mrs->offset_within_address_space is 0x807000 (and also that
0x807000 matches up with what an actual translation would be from
inspecting the page tables).

Would it be 'safe' to *always* begin using
mrs->offset_within_address_space as the returned physical address here
instead of `haddr->v.io.offset + mrs->mr->addr`, or is there a reason we
should not do that?

Thanks!

-Aaron

[0] https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg03137.html



Re: [RFC] tcg plugin: Additional plugin interface

2021-04-28 Thread Aaron Lindsay
On Apr 26 18:42, Alex Bennée wrote:
> 
> Min-Yih Hsu  writes:
> 
> > Hi Alex,
> >
> >> On Apr 23, 2021, at 8:44 AM, Alex Bennée  wrote:
> >> 
> >> 
> >> Min-Yih Hsu  writes:
> >> 
> >>> Hi Alex and QEMU developers,
> >>> 
> >>> Recently I was working with the TCG plugin. I found that 
> >>> `qemu_plugin_cb_flags` seems to reserve the functionality to
> >>> read / write CPU register state, I'm wondering if you can share some
> >>> roadmap or thoughts on this feature?
> >> 
> >> I think reading the CPU register state is certainly on the roadmap,
> >> writing registers presents a more philosophical question of if it opens
> >> the way to people attempting a GPL bypass via plugins. However reading
> >> registers would certainly be a worthwhile addition to the API.
> >
> > Interesting…I’ve never thought about this problem before.
> >
> >> 
> >>> Personally I see reading the CPU register state as (kind of) low-hanging 
> >>> fruit. The most straightforward way to implement
> >>> it will be adding another function that can be called by insn_exec 
> >>> callbacks to read (v)CPU register values. What do you
> >>> think about this?
> >> 
> >> It depends on your definition of low hanging fruit ;-)
> >> 
> >> Yes the implementation would be a simple helper which could be called
> >> from a callback - I don't think we need to limit it to just insn_exec. I
> >> think the challenge is proving a non-ugly API that works cleanly across
> >> all the architectures. I'm not keen on exposing arbitrary gdb register
> >> IDs to the plugins.
> >> 
> >> There has been some discussion previously on the list which is probably
> >> worth reviewing:
> >> 
> >>  Date: Mon, 7 Dec 2020 16:03:24 -0500
> >>  From: Aaron Lindsay 
> >>  Subject: Plugin Register Accesses
> >>  Message-ID: 
> >> 
> >> But in short I think we need a new subsystem in QEMU where frontends can
> >> register registers (sic) and then provide a common API for various
> >> users. This common subsystem would then be the source of data for:
> >> 
> >>  - plugins
> >>  - gdbstub
> >>  - monitor (info registers)
> >>  - -d LOG_CPU logging
> >> 
> >> If you are interested in tackling such a project I'm certainly happy to
> >> provide pointers and review.
> >
> > Thank you! Yeah I’m definitely going to scratch a prototype for this
> > register reading plugin interface. I’ll take a look at related email
> > discussions.
> 
> Awesome - please CC me on any patches you come up with (as well as
> qemu-devel of course ;-).

I would love to be copied on any patches as well. I've wanted to look
into doing this properly for some time now, but have not made time.

-Aaron



Re: [PATCH v1 11/14] plugins: expand kernel-doc for instruction query and instrumentation

2021-03-16 Thread Aaron Lindsay via
On Mar 16 13:48, Alex Bennée wrote:
> Aaron Lindsay  writes:
> > On Mar 12 17:28, Alex Bennée wrote:
> >> + * @insn: opaque instruction handle from qemu_plugin_tb_get_insn()
> >> + *
> >> + * Returns: hardware (physical) address of instruction
> >> + */
> >>  void *qemu_plugin_insn_haddr(const struct qemu_plugin_insn *insn);
> >
> > Is this the physical address of the instruction on the host or target?
> 
> target.

An observation: We're exposing a target physical address here as a void
* and for memory accesses (qemu_plugin_hwaddr_phys_addr) as a uint64_t.

-Aaron



Re: [PATCH v1 08/14] plugins: add qemu_plugin_cb_flags to kernel-doc

2021-03-16 Thread Aaron Lindsay via
On Mar 16 13:40, Alex Bennée wrote:
> 
> Aaron Lindsay  writes:
> 
> > On Mar 12 17:28, Alex Bennée wrote:
> >> Also add a note to explain currently they are unused.
> >> 
> >> Signed-off-by: Alex Bennée 
> >
> > I'm personally interested in one clarification below, but don't think
> > that affects my:
> >
> > Reviewed-by: Aaron Lindsay 
> >
> >> ---
> >>  include/qemu/qemu-plugin.h | 16 +---
> >>  1 file changed, 13 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> >> index 9ae3940d89..c98866a637 100644
> >> --- a/include/qemu/qemu-plugin.h
> >> +++ b/include/qemu/qemu-plugin.h
> >> @@ -207,10 +207,20 @@ struct qemu_plugin_tb;
> >>  /** struct qemu_plugin_insn - Opaque handle for a translated instruction 
> >> */
> >>  struct qemu_plugin_insn;
> >>  
> >> +/**
> >> + * enum qemu_plugin_cb_flags - type of callback
> >> + *
> >> + * @QEMU_PLUGIN_CB_NO_REGS: callback does not access the CPU's regs
> >> + * @QEMU_PLUGIN_CB_R_REGS: callback reads the CPU's regs
> >> + * @QEMU_PLUGIN_CB_RW_REGS: callback reads and writes the CPU's regs
> >> + *
> >> + * Note: currently unused, plugins cannot read or change system
> >> + * register state.
> >
> > They are unused in the sense that the current plugin interface does not
> > provide a way to make use of them. But are they completely free from
> > side effects?
> 
> They are free of side effects visible to the plugin. Under the covers it
> uses the existing TCG_CALL_NO_* mechanics to ensure that register state
> is synced to/from TCG temporaries before the callback.

I would currently find it useful to have that information included in
the documentation since there is no register state exposed and I am
basically hacking something together for my own use in the meantime...
but I understand that is in tension with the general philosophy of the
plugins to not expose implementation details.
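
For what it's worth, the only contact a plugin has with these flags today
is as a hint at registration time, e.g. (fragment; `insn` and `insn_cb`
are placeholder names):

    /* declare that the callback never inspects register state; currently
     * only a hint, but picking the most restrictive flag that is true
     * keeps the plugin correct if the flags are ever enforced */
    qemu_plugin_register_vcpu_insn_exec_cb(insn, insn_cb,
                                           QEMU_PLUGIN_CB_NO_REGS, NULL);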

-Aaron



Re: [PATCH v1 12/14] plugins: expand kernel-doc for memory query and instrumentation

2021-03-12 Thread Aaron Lindsay via
On Mar 12 17:28, Alex Bennée wrote:
> Signed-off-by: Alex Bennée 

Small comment below, but otherwise:

Reviewed-by: Aaron Lindsay 

> ---
>  include/qemu/qemu-plugin.h | 35 ---
>  1 file changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index d4adce730a..aed868d42a 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -392,24 +392,45 @@ uint64_t qemu_plugin_insn_vaddr(const struct 
> qemu_plugin_insn *insn);
>   */
>  void *qemu_plugin_insn_haddr(const struct qemu_plugin_insn *insn);
>  
> -/*
> - * Memory Instrumentation
> - *
> - * The anonymous qemu_plugin_meminfo_t and qemu_plugin_hwaddr types
> - * can be used in queries to QEMU to get more information about a
> - * given memory access.
> +/**
> + * typedef qemu_plugin_meminfo_t - opaque memory transaction handle

Would it still be useful to include the types of things you can do
with a qemu_plugin_meminfo_t here?
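
Something as simple as listing the query helpers might be enough, e.g.
(fragment; `info` and `vaddr` are the arguments handed to a memory
callback):

    unsigned size = 1 << qemu_plugin_mem_size_shift(info); /* bytes */
    bool is_store = qemu_plugin_mem_is_store(info);
    bool is_signed = qemu_plugin_mem_is_sign_extended(info);
    bool is_be = qemu_plugin_mem_is_big_endian(info);
    struct qemu_plugin_hwaddr *hw = qemu_plugin_get_hwaddr(info, vaddr);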



Re: [PATCH v1 09/14] plugins: add qemu_plugin_id_t to kernel-doc

2021-03-12 Thread Aaron Lindsay via
On Mar 12 17:28, Alex Bennée wrote:
> Signed-off-by: Alex Bennée 

Reviewed-by: Aaron Lindsay 

> ---
>  include/qemu/qemu-plugin.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index c98866a637..5ac6fe5f02 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -32,6 +32,9 @@
>#define QEMU_PLUGIN_LOCAL  __attribute__((visibility("hidden")))
>  #endif
>  
> +/**
> + * typedef qemu_plugin_id_t - Unique plugin ID
> + */
>  typedef uint64_t qemu_plugin_id_t;
>  
>  /*
> -- 
> 2.20.1
> 



Re: [PATCH v1 08/14] plugins: add qemu_plugin_cb_flags to kernel-doc

2021-03-12 Thread Aaron Lindsay via
On Mar 12 17:28, Alex Bennée wrote:
> Also add a note to explain currently they are unused.
> 
> Signed-off-by: Alex Bennée 

I'm personally interested in one clarification below, but don't think
that affects my:

Reviewed-by: Aaron Lindsay 

> ---
>  include/qemu/qemu-plugin.h | 16 +---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index 9ae3940d89..c98866a637 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -207,10 +207,20 @@ struct qemu_plugin_tb;
>  /** struct qemu_plugin_insn - Opaque handle for a translated instruction */
>  struct qemu_plugin_insn;
>  
> +/**
> + * enum qemu_plugin_cb_flags - type of callback
> + *
> + * @QEMU_PLUGIN_CB_NO_REGS: callback does not access the CPU's regs
> + * @QEMU_PLUGIN_CB_R_REGS: callback reads the CPU's regs
> + * @QEMU_PLUGIN_CB_RW_REGS: callback reads and writes the CPU's regs
> + *
> + * Note: currently unused, plugins cannot read or change system
> + * register state.

They are unused in the sense that the current plugin interface does not
provide a way to make use of them. But are they completely free from
side effects?

-Aaron



Re: [PATCH v1 06/14] plugins: expand the callback typedef kernel-docs

2021-03-12 Thread Aaron Lindsay via
On Mar 12 17:28, Alex Bennée wrote:
> Signed-off-by: Alex Bennée 

One nit below, but otherwise:

Reviewed-by: Aaron Lindsay 

> ---
>  include/qemu/qemu-plugin.h | 25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index ac1bb318da..09b235f0b4 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -99,17 +99,36 @@ QEMU_PLUGIN_EXPORT int 
> qemu_plugin_install(qemu_plugin_id_t id,
> const qemu_info_t *info,
> int argc, char **argv);
>  
> -/*
> - * Prototypes for the various callback styles we will be registering
> - * in the following functions.
> +/**
> + * typedef qemu_plugin_simple_cb_t - simple callback
> + * @id: the unique qemu_plugin_id_t
> + *
> + * This call-back passes no information aside from the unique @id.

Should we be consistent about always using 'callback' or 'call-back'
instead of alternating? I tend to use 'callback', but I'm not sure I
have a solid reason to prefer it.

-Aaron

>   */
>  typedef void (*qemu_plugin_simple_cb_t)(qemu_plugin_id_t id);
>  
> +/**
> + * typedef qemu_plugin_udata_cb_t - callback with user data
> + * @id: the unique qemu_plugin_id_t
> + * @userdata: a pointer to some user data supplied when the call-back
> + * was registered.
> + */
>  typedef void (*qemu_plugin_udata_cb_t)(qemu_plugin_id_t id, void *userdata);
>  
> +/**
> + * typedef qemu_plugin_vcpu_simple_cb_t - vcpu callback
> + * @id: the unique qemu_plugin_id_t
> + * @vcpu_index: the current vcpu context
> + */
>  typedef void (*qemu_plugin_vcpu_simple_cb_t)(qemu_plugin_id_t id,
>   unsigned int vcpu_index);
>  
> +/**
> + * typedef qemu_plugin_vcpu_udata_cb_t - vcpu callback
> + * @vcpu_index: the current vcpu context
> + * @userdata: a pointer to some user data supplied when the call-back
> + * was registered.
> + */
>  typedef void (*qemu_plugin_vcpu_udata_cb_t)(unsigned int vcpu_index,
>  void *userdata);
>  
> -- 
> 2.20.1
> 



Re: [PATCH v1 11/14] plugins: expand kernel-doc for instruction query and instrumentation

2021-03-12 Thread Aaron Lindsay via
A few clarifications inline:

On Mar 12 17:28, Alex Bennée wrote:
> Signed-off-by: Alex Bennée 
> ---
>  include/qemu/qemu-plugin.h | 52 --
>  1 file changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index dc05fc1932..d4adce730a 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -327,21 +327,69 @@ void qemu_plugin_register_vcpu_insn_exec_inline(struct 
> qemu_plugin_insn *insn,
>  enum qemu_plugin_op op,
>  void *ptr, uint64_t imm);
>  
> -/*
> - * Helpers to query information about the instructions in a block
> +/**
> + * qemu_plugin_tb_n_insns() - query helper for number of insns in TB
> + * @tb: opaque handle to TB passed to callback
> + *
> + * Returns: number of instructions in this block
>   */
>  size_t qemu_plugin_tb_n_insns(const struct qemu_plugin_tb *tb);
>  
> +/**
> + * qemu_plugin_tb_vaddr() - query helper for vaddr of TB start
> + * @tb: opaque handle to TB passed to callback
> + *
> + * Returns: virtual address of block start
> + */
>  uint64_t qemu_plugin_tb_vaddr(const struct qemu_plugin_tb *tb);
>  
> +/**
> + * qemu_plugin_tb_get_insn() - retrieve handle for instruction
> + * @tb: opaque handle to TB passed to callback
> + * @idx: instruction number, 0 indexed
> + *
> + * The returned handle can be used in follow up helper queries as well
> + * as when instrumenting an instruction. It is only valid for the
> + * lifetime of the callback.
> + *
> + * Returns: opaque handle to instruction
> + */
>  struct qemu_plugin_insn *
>  qemu_plugin_tb_get_insn(const struct qemu_plugin_tb *tb, size_t idx);
>  
> +/**
> + * qemu_plugin_insn_data() - return ptr to instruction data
> + * @insn: opaque instruction handle from qemu_plugin_tb_get_insn()
> + *
> + * Note: data is only valid for duration of callback. See
> + * qemu_plugin_insn_size() to calculate size of stream.
> + *
> + * Returns: pointer to a stream of bytes

Maybe this could be a little more explicit, something like "Returns:
pointer to a stream of bytes containing the value of this instruction's
opcode"?

> + */
>  const void *qemu_plugin_insn_data(const struct qemu_plugin_insn *insn);
>  
> +/**
> + * qemu_plugin_insn_size() - return size of instruction
> + * @insn: opaque instruction handle from qemu_plugin_tb_get_insn()
> + *
> + * Returns: size of instruction

size in bytes?

> + */
>  size_t qemu_plugin_insn_size(const struct qemu_plugin_insn *insn);
>  
> +/**
> + * qemu_plugin_insn_vaddr() - return vaddr of instruction
> + * @insn: opaque instruction handle from qemu_plugin_tb_get_insn()
> + *
> + * Returns: virtual address of instruction
> + */
>  uint64_t qemu_plugin_insn_vaddr(const struct qemu_plugin_insn *insn);
> +
> +/**
> + * qemu_plugin_insn_haddr() - return vaddr of instruction

Copypasta: s/vaddr/haddr/ ?

> + * @insn: opaque instruction handle from qemu_plugin_tb_get_insn()
> + *
> + * Returns: hardware (physical) address of instruction
> + */
>  void *qemu_plugin_insn_haddr(const struct qemu_plugin_insn *insn);

Is this the physical address of the instruction on the host or target?

-Aaron
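
(For context, a rough sketch of how I'd expect these helpers to be used
together from a translation callback; illustrative only, assuming glib
and the usual includes:)

static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    for (size_t i = 0; i < qemu_plugin_tb_n_insns(tb); i++) {
        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
        const uint8_t *bytes = qemu_plugin_insn_data(insn); /* opcode bytes */
        size_t len = qemu_plugin_insn_size(insn);           /* length in bytes */
        uint64_t va = qemu_plugin_insn_vaddr(insn);
        void *ha = qemu_plugin_insn_haddr(insn);

        g_autoptr(GString) s = g_string_new(NULL);
        g_string_append_printf(s, "0x%" PRIx64 " (haddr %p):", va, ha);
        for (size_t b = 0; b < len; b++) {
            g_string_append_printf(s, " %02x", bytes[b]);
        }
        g_string_append_c(s, '\n');
        qemu_plugin_outs(s->str);
    }
}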



Re: [PATCH v1 10/14] plugins: expand inline exec kernel-doc documentation.

2021-03-12 Thread Aaron Lindsay via
On Mar 12 17:28, Alex Bennée wrote:
> Remove the extraneous @cb parameter and document the non-atomic nature
> of the INLINE_ADD_U64 operation.
> 
> Signed-off-by: Alex Bennée 

Reviewed-by: Aaron Lindsay 

> ---
>  include/qemu/qemu-plugin.h | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index 5ac6fe5f02..dc05fc1932 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -269,6 +269,14 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct 
> qemu_plugin_tb *tb,
>enum qemu_plugin_cb_flags flags,
>void *userdata);
>  
> +/**
> + * enum qemu_plugin_op - describes an inline op
> + *
> + * @QEMU_PLUGIN_INLINE_ADD_U64: add an immediate value uint64_t
> + *
> + * Note: currently only a single inline op is supported.
> + */
> +
>  enum qemu_plugin_op {
>  QEMU_PLUGIN_INLINE_ADD_U64,
>  };
> @@ -283,6 +291,9 @@ enum qemu_plugin_op {
>   * Insert an inline op to every time a translated unit executes.
>   * Useful if you just want to increment a single counter somewhere in
>   * memory.
> + *
> + * Note: ops are not atomic so in multi-threaded/multi-smp situations
> + * you will get inexact results.
>   */
>  void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
>enum qemu_plugin_op op,
> @@ -305,7 +316,6 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct 
> qemu_plugin_insn *insn,
>  /**
>   * qemu_plugin_register_vcpu_insn_exec_inline() - insn execution inline op
>   * @insn: the opaque qemu_plugin_insn handle for an instruction
> - * @cb: callback function
>   * @op: the type of qemu_plugin_op (e.g. ADD_U64)
>   * @ptr: the target memory location for the op
>   * @imm: the op data (e.g. 1)
> -- 
> 2.20.1
> 
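
(As an aside, in practice the inline op gets used like this; an
illustrative fragment from a translation callback, where `bb_count` is
just an example counter and, per the note above, the increment is not
atomic:)

static uint64_t bb_count;

static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    /* bump the counter on every execution of this block, with no
     * callback into the plugin */
    qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
                                             &bb_count, 1);
}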



Re: [PATCH v1 07/14] plugins: expand the typedef kernel-docs for translation

2021-03-12 Thread Aaron Lindsay via
On Mar 12 17:28, Alex Bennée wrote:
> Signed-off-by: Alex Bennée 

Reviewed-by: Aaron Lindsay 

> ---
>  include/qemu/qemu-plugin.h | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index 09b235f0b4..9ae3940d89 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -202,11 +202,9 @@ void qemu_plugin_register_vcpu_idle_cb(qemu_plugin_id_t 
> id,
>  void qemu_plugin_register_vcpu_resume_cb(qemu_plugin_id_t id,
>   qemu_plugin_vcpu_simple_cb_t cb);
>  
> -/*
> - * Opaque types that the plugin is given during the translation and
> - * instrumentation phase.
> - */
> +/** struct qemu_plugin_tb - Opaque handle for a translation block */
>  struct qemu_plugin_tb;
> +/** struct qemu_plugin_insn - Opaque handle for a translated instruction */
>  struct qemu_plugin_insn;
>  
>  enum qemu_plugin_cb_flags {
> @@ -221,6 +219,14 @@ enum qemu_plugin_mem_rw {
>  QEMU_PLUGIN_MEM_RW,
>  };
>  
> +/**
> + * typedef qemu_plugin_vcpu_tb_trans_cb_t - translation callback
> + * @id: unique plugin id
> + * @tb: opaque handle used for querying and instrumenting a block.
> + */
> +typedef void (*qemu_plugin_vcpu_tb_trans_cb_t)(qemu_plugin_id_t id,
> +   struct qemu_plugin_tb *tb);
> +
>  /**
>   * qemu_plugin_register_vcpu_tb_trans_cb() - register a translate cb
>   * @id: plugin ID
> @@ -233,9 +239,6 @@ enum qemu_plugin_mem_rw {
>   * callbacks to be triggered when the block or individual instruction
>   * executes.
>   */
> -typedef void (*qemu_plugin_vcpu_tb_trans_cb_t)(qemu_plugin_id_t id,
> -   struct qemu_plugin_tb *tb);
> -
>  void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
> qemu_plugin_vcpu_tb_trans_cb_t 
> cb);
>  
> -- 
> 2.20.1
> 



Re: [PATCH v1 05/14] plugins: cleanup kernel-doc for qemu_plugin_install

2021-03-12 Thread Aaron Lindsay via
On Mar 12 17:28, Alex Bennée wrote:
> kernel-doc doesn't like multiple Note sections. Also add an explicit
> Return.
> 
> Signed-off-by: Alex Bennée 

Reviewed-by: Aaron Lindsay 

> ---
>  include/qemu/qemu-plugin.h | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index 4b84c6c293..ac1bb318da 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -85,15 +85,15 @@ typedef struct qemu_info_t {
>   * @argc: number of arguments
>   * @argv: array of arguments (@argc elements)
>   *
> - * All plugins must export this symbol.
> - *
> - * Note: Calling qemu_plugin_uninstall() from this function is a bug. To 
> raise
> - * an error during install, return !0.
> + * All plugins must export this symbol which is called when the plugin
> + * is first loaded. Calling qemu_plugin_uninstall() from this function
> + * is a bug.
>   *
>   * Note: @info is only live during the call. Copy any information we
> - * want to keep.
> + * want to keep. @argv remains valid throughout the lifetime of the
> + * loaded plugin.
>   *
> - * Note: @argv remains valid throughout the lifetime of the loaded plugin.
> + * Return: 0 on successful loading, !0 for an error.
>   */
>  QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
> const qemu_info_t *info,
> -- 
> 2.20.1
> 



Re: [PATCH v1 04/14] plugins: expand kernel-doc for qemu_info_t

2021-03-12 Thread Aaron Lindsay via
On Mar 12 17:28, Alex Bennée wrote:
> It seems kernel-doc struggles a bit with typedef structs but with
> enough encouragement we can get something out of it.
> 
> Signed-off-by: Alex Bennée 

Reviewed-by: Aaron Lindsay 

> ---
>  include/qemu/qemu-plugin.h | 22 +++---
>  1 file changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index 3303dce862..4b84c6c293 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -49,22 +49,30 @@ extern QEMU_PLUGIN_EXPORT int qemu_plugin_version;
>  
>  #define QEMU_PLUGIN_VERSION 1
>  
> -typedef struct {
> -/* string describing architecture */
> +/**
> + * struct qemu_info_t - system information for plugins
> + *
> + * This structure provides for some limited information about the
> + * system to allow the plugin to make decisions on how to proceed. For
> + * example it might only be suitable for running on some guest
> + * architectures or when under full system emulation.
> + */
> +typedef struct qemu_info_t {
> +/** @target_name: string describing architecture */
>  const char *target_name;
> +/** @version: minimum and current plugin API level */
>  struct {
>  int min;
>  int cur;
>  } version;
> -/* is this a full system emulation? */
> +/** @system_emulation: is this a full system emulation? */
>  bool system_emulation;
>  union {
> -/*
> - * smp_vcpus may change if vCPUs can be hot-plugged, max_vcpus
> - * is the system-wide limit.
> - */
> +/** @system: information relevant to system emulation */
>  struct {
> +/** @system.smp_vcpus: initial number of vCPUs */
>  int smp_vcpus;
> +/** @system.max_vcpus: maximum possible number of vCPUs */
>  int max_vcpus;
>  } system;
>  };
> -- 
> 2.20.1
> 



Re: [PATCH v1 03/14] docs/devel: include the plugin API information from the headers

2021-03-12 Thread Aaron Lindsay via
On Mar 12 17:28, Alex Bennée wrote:
> We have kerneldoc tags for the headers so we might as well extract
> them into our developer documentation whilst we are at it.
> 
> Signed-off-by: Alex Bennée 

Reviewed-by: Aaron Lindsay 

> ---
>  docs/devel/tcg-plugins.rst | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
> index 39ce86ed96..18c6581d85 100644
> --- a/docs/devel/tcg-plugins.rst
> +++ b/docs/devel/tcg-plugins.rst
> @@ -63,6 +63,11 @@ valid during the lifetime of the callback so it is 
> important that any
>  information that is needed is extracted during the callback and saved
>  by the plugin.
>  
> +API
> +===
> +
> +.. kernel-doc:: include/qemu/qemu-plugin.h
> +
>  Usage
>  =
>  
> -- 
> 2.20.1
> 



[PATCH v2] plugins: Expose physical addresses instead of device offsets

2021-03-09 Thread Aaron Lindsay
This allows plugins to query for full virtual-to-physical address
translation for a given `qemu_plugin_hwaddr` and stops exposing the
offset within the device itself. As this change breaks the API,
QEMU_PLUGIN_VERSION is incremented.

Signed-off-by: Aaron Lindsay 
---
 contrib/plugins/hotpages.c  |  2 +-
 contrib/plugins/hwprofile.c |  2 +-
 include/qemu/qemu-plugin.h  | 32 +---
 plugins/api.c   | 17 -
 4 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/contrib/plugins/hotpages.c b/contrib/plugins/hotpages.c
index eacc678eac..bf53267532 100644
--- a/contrib/plugins/hotpages.c
+++ b/contrib/plugins/hotpages.c
@@ -122,7 +122,7 @@ static void vcpu_haddr(unsigned int cpu_index, 
qemu_plugin_meminfo_t meminfo,
 }
 } else {
 if (hwaddr && !qemu_plugin_hwaddr_is_io(hwaddr)) {
-page = (uint64_t) qemu_plugin_hwaddr_device_offset(hwaddr);
+page = (uint64_t) qemu_plugin_hwaddr_phys_addr(hwaddr);
 } else {
 page = vaddr;
 }
diff --git a/contrib/plugins/hwprofile.c b/contrib/plugins/hwprofile.c
index 6dac1d5f85..faf216ac00 100644
--- a/contrib/plugins/hwprofile.c
+++ b/contrib/plugins/hwprofile.c
@@ -201,7 +201,7 @@ static void vcpu_haddr(unsigned int cpu_index, 
qemu_plugin_meminfo_t meminfo,
 return;
 } else {
 const char *name = qemu_plugin_hwaddr_device_name(hwaddr);
-uint64_t off = qemu_plugin_hwaddr_device_offset(hwaddr);
+uint64_t off = qemu_plugin_hwaddr_phys_addr(hwaddr);
 bool is_write = qemu_plugin_mem_is_store(meminfo);
 DeviceCounts *counts;
 
diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index c66507fe8f..3303dce862 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -47,7 +47,7 @@ typedef uint64_t qemu_plugin_id_t;
 
 extern QEMU_PLUGIN_EXPORT int qemu_plugin_version;
 
-#define QEMU_PLUGIN_VERSION 0
+#define QEMU_PLUGIN_VERSION 1
 
 typedef struct {
 /* string describing architecture */
@@ -307,8 +307,8 @@ bool qemu_plugin_mem_is_sign_extended(qemu_plugin_meminfo_t 
info);
 bool qemu_plugin_mem_is_big_endian(qemu_plugin_meminfo_t info);
 bool qemu_plugin_mem_is_store(qemu_plugin_meminfo_t info);
 
-/*
- * qemu_plugin_get_hwaddr():
+/**
+ * qemu_plugin_get_hwaddr() - return handle for memory operation
  * @vaddr: the virtual address of the memory operation
  *
  * For system emulation returns a qemu_plugin_hwaddr handle to query
@@ -323,12 +323,30 @@ struct qemu_plugin_hwaddr 
*qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
   uint64_t vaddr);
 
 /*
- * The following additional queries can be run on the hwaddr structure
- * to return information about it. For non-IO accesses the device
- * offset will be into the appropriate block of RAM.
+ * The following additional queries can be run on the hwaddr structure to
+ * return information about it - namely whether it is for an IO access and the
+ * physical address associated with the access.
+ */
+
+/**
+ * qemu_plugin_hwaddr_is_io() - query whether memory operation is IO
+ * @haddr: address handle from qemu_plugin_get_hwaddr()
+ *
+ * Returns true if the handle's memory operation is to memory-mapped IO, or
+ * false if it is to RAM
  */
 bool qemu_plugin_hwaddr_is_io(const struct qemu_plugin_hwaddr *haddr);
-uint64_t qemu_plugin_hwaddr_device_offset(const struct qemu_plugin_hwaddr 
*haddr);
+
+/**
+ * qemu_plugin_hwaddr_phys_addr() - query physical address for memory operation
+ * @haddr: address handle from qemu_plugin_get_hwaddr()
+ *
+ * Returns the physical address associated with the memory operation
+ *
+ * Note that the returned physical address may not be unique if you are dealing
+ * with multiple address spaces.
+ */
+uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr *haddr);
 
 /*
  * Returns a string representing the device. The string is valid for
diff --git a/plugins/api.c b/plugins/api.c
index 0b04380d57..3c7dc406e3 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -40,6 +40,7 @@
 #include "sysemu/sysemu.h"
 #include "tcg/tcg.h"
 #include "exec/exec-all.h"
+#include "exec/ram_addr.h"
 #include "disas/disas.h"
 #include "plugin.h"
 #ifndef CONFIG_USER_ONLY
@@ -298,19 +299,25 @@ bool qemu_plugin_hwaddr_is_io(const struct 
qemu_plugin_hwaddr *haddr)
 #endif
 }
 
-uint64_t qemu_plugin_hwaddr_device_offset(const struct qemu_plugin_hwaddr 
*haddr)
+uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr *haddr)
 {
 #ifdef CONFIG_SOFTMMU
 if (haddr) {
 if (!haddr->is_io) {
-ram_addr_t ram_addr = qemu_ram_addr_from_host((void *) 
haddr->v.ram.hostaddr);
-if (ram_addr == RAM_ADDR_INVALID) {
+RAMBlock *block;
+ram_addr_t offset;
+void *hostaddr = (void *) haddr->v.

Re: [PATCH] plugins: Expose physical addresses instead of device offsets

2021-03-09 Thread Aaron Lindsay via
On Mar 09 17:45, Alex Bennée wrote:
> Aaron Lindsay  writes:
> > On Mar 09 10:08, Peter Maydell wrote:
> >> On Mon, 8 Mar 2021 at 20:14, Aaron Lindsay  
> >> wrote:
> >> >
> >> > This allows plugins to query for full virtual-to-physical address
> >> > translation for a given `qemu_plugin_hwaddr` and stops exposing the
> >> > offset within the device itself. As this change breaks the API,
> >> > QEMU_PLUGIN_VERSION is incremented.
> >> >
> >> > Signed-off-by: Aaron Lindsay 
> >> > ---
> >> 
> >> 
> >> > diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> >> > index c66507fe8f..2252ecf2f0 100644
> >> > --- a/include/qemu/qemu-plugin.h
> >> > +++ b/include/qemu/qemu-plugin.h
> >> > @@ -47,7 +47,7 @@ typedef uint64_t qemu_plugin_id_t;
> >> >
> >> >  extern QEMU_PLUGIN_EXPORT int qemu_plugin_version;
> >> >
> >> > -#define QEMU_PLUGIN_VERSION 0
> >> > +#define QEMU_PLUGIN_VERSION 1
> >> >
> >> >  typedef struct {
> >> >  /* string describing architecture */
> >> > @@ -328,7 +328,7 @@ struct qemu_plugin_hwaddr 
> >> > *qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
> >> >   * offset will be into the appropriate block of RAM.
> >> >   */
> >> >  bool qemu_plugin_hwaddr_is_io(const struct qemu_plugin_hwaddr *haddr);
> >> > -uint64_t qemu_plugin_hwaddr_device_offset(const struct 
> >> > qemu_plugin_hwaddr *haddr);
> >> > +uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr 
> >> > *haddr);
> >> 
> >> 
> >> This should have a documentation comment, since this is the public-facing 
> >> API.
> >
> > I now see I neglected to update the comment right here at the function
> > declaration, and will do so for v2.
> >
> > But are you asking for additional documentation beyond that change? If
> > so, where is the right place to add this? docs/devel/tcg-plugins.rst
> > doesn't seem to have much in the way of documentation for the actual
> > calls.
> 
> The calls should be documented in @kerneldoc style comments in the main
> plugin header. Which reminds me I should be able to extract them into
> the tcg-plugins.rst document via sphinx.

I just sent out v2, in which I took a pass at updating this
documentation. Let me know what you think.

-Aaron



Re: [PATCH] plugins: Expose physical addresses instead of device offsets

2021-03-09 Thread Aaron Lindsay
On Mar 09 10:08, Peter Maydell wrote:
> On Mon, 8 Mar 2021 at 20:14, Aaron Lindsay  
> wrote:
> >
> > This allows plugins to query for full virtual-to-physical address
> > translation for a given `qemu_plugin_hwaddr` and stops exposing the
> > offset within the device itself. As this change breaks the API,
> > QEMU_PLUGIN_VERSION is incremented.
> >
> > Signed-off-by: Aaron Lindsay 
> > ---
> 
> 
> > diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> > index c66507fe8f..2252ecf2f0 100644
> > --- a/include/qemu/qemu-plugin.h
> > +++ b/include/qemu/qemu-plugin.h
> > @@ -47,7 +47,7 @@ typedef uint64_t qemu_plugin_id_t;
> >
> >  extern QEMU_PLUGIN_EXPORT int qemu_plugin_version;
> >
> > -#define QEMU_PLUGIN_VERSION 0
> > +#define QEMU_PLUGIN_VERSION 1
> >
> >  typedef struct {
> >  /* string describing architecture */
> > @@ -328,7 +328,7 @@ struct qemu_plugin_hwaddr 
> > *qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
> >   * offset will be into the appropriate block of RAM.
> >   */
> >  bool qemu_plugin_hwaddr_is_io(const struct qemu_plugin_hwaddr *haddr);
> > -uint64_t qemu_plugin_hwaddr_device_offset(const struct qemu_plugin_hwaddr 
> > *haddr);
> > +uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr 
> > *haddr);
> 
> 
> This should have a documentation comment, since this is the public-facing API.

I now see I neglected to update the comment right here at the function
declaration, and will do so for v2.

But are you asking for additional documentation beyond that change? If
so, where is the right place to add this? docs/devel/tcg-plugins.rst
doesn't seem to have much in the way of documentation for the actual
calls.

> Also, physical addresses aren't uniquely identifying, they're only valid
> in the context of a particular address space (think TrustZone, for instance),
> so the plugin-facing API should probably have some concept of how it
> distinguishes "this is an access for Secure 0x4000" from "this is an
> access for Non-Secure 0x4000".

I agree it could be handy to expose address spaces in addition to the
addresses themselves. Do you believe doing so would change the form of
the interface in this patch, or could that be a logically separate
addition?

I have a secondary concern that it might be hard to expose address
spaces in an architecture-agnostic yet still-helpful way, but haven't
thought through that enough for it to be a firm opinion.

-Aaron



[PATCH] plugins: Expose physical addresses instead of device offsets

2021-03-08 Thread Aaron Lindsay
This allows plugins to query for full virtual-to-physical address
translation for a given `qemu_plugin_hwaddr` and stops exposing the
offset within the device itself. As this change breaks the API,
QEMU_PLUGIN_VERSION is incremented.

Signed-off-by: Aaron Lindsay 
---
 contrib/plugins/hotpages.c  |  2 +-
 contrib/plugins/hwprofile.c |  2 +-
 include/qemu/qemu-plugin.h  |  4 ++--
 plugins/api.c   | 16 +++-
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/contrib/plugins/hotpages.c b/contrib/plugins/hotpages.c
index eacc678eac..bf53267532 100644
--- a/contrib/plugins/hotpages.c
+++ b/contrib/plugins/hotpages.c
@@ -122,7 +122,7 @@ static void vcpu_haddr(unsigned int cpu_index, 
qemu_plugin_meminfo_t meminfo,
 }
 } else {
 if (hwaddr && !qemu_plugin_hwaddr_is_io(hwaddr)) {
-page = (uint64_t) qemu_plugin_hwaddr_device_offset(hwaddr);
+page = (uint64_t) qemu_plugin_hwaddr_phys_addr(hwaddr);
 } else {
 page = vaddr;
 }
diff --git a/contrib/plugins/hwprofile.c b/contrib/plugins/hwprofile.c
index 6dac1d5f85..faf216ac00 100644
--- a/contrib/plugins/hwprofile.c
+++ b/contrib/plugins/hwprofile.c
@@ -201,7 +201,7 @@ static void vcpu_haddr(unsigned int cpu_index, 
qemu_plugin_meminfo_t meminfo,
 return;
 } else {
 const char *name = qemu_plugin_hwaddr_device_name(hwaddr);
-uint64_t off = qemu_plugin_hwaddr_device_offset(hwaddr);
+uint64_t off = qemu_plugin_hwaddr_phys_addr(hwaddr);
 bool is_write = qemu_plugin_mem_is_store(meminfo);
 DeviceCounts *counts;
 
diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index c66507fe8f..2252ecf2f0 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -47,7 +47,7 @@ typedef uint64_t qemu_plugin_id_t;
 
 extern QEMU_PLUGIN_EXPORT int qemu_plugin_version;
 
-#define QEMU_PLUGIN_VERSION 0
+#define QEMU_PLUGIN_VERSION 1
 
 typedef struct {
 /* string describing architecture */
@@ -328,7 +328,7 @@ struct qemu_plugin_hwaddr 
*qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
  * offset will be into the appropriate block of RAM.
  */
 bool qemu_plugin_hwaddr_is_io(const struct qemu_plugin_hwaddr *haddr);
-uint64_t qemu_plugin_hwaddr_device_offset(const struct qemu_plugin_hwaddr 
*haddr);
+uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr *haddr);
 
 /*
  * Returns a string representing the device. The string is valid for
diff --git a/plugins/api.c b/plugins/api.c
index 0b04380d57..e7352df3e3 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -40,6 +40,7 @@
 #include "sysemu/sysemu.h"
 #include "tcg/tcg.h"
 #include "exec/exec-all.h"
+#include "exec/ram_addr.h"
 #include "disas/disas.h"
 #include "plugin.h"
 #ifndef CONFIG_USER_ONLY
@@ -298,19 +299,24 @@ bool qemu_plugin_hwaddr_is_io(const struct 
qemu_plugin_hwaddr *haddr)
 #endif
 }
 
-uint64_t qemu_plugin_hwaddr_device_offset(const struct qemu_plugin_hwaddr 
*haddr)
+uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr *haddr)
 {
 #ifdef CONFIG_SOFTMMU
 if (haddr) {
 if (!haddr->is_io) {
-ram_addr_t ram_addr = qemu_ram_addr_from_host((void *) 
haddr->v.ram.hostaddr);
-if (ram_addr == RAM_ADDR_INVALID) {
+RAMBlock *block;
+ram_addr_t offset;
+
+block = qemu_ram_block_from_host((void *) haddr->v.ram.hostaddr, 
false, &offset);
+if (!block) {
 error_report("Bad ram pointer %"PRIx64"", 
haddr->v.ram.hostaddr);
 abort();
 }
-return ram_addr;
+
+return block->offset + offset + block->mr->addr;
 } else {
-return haddr->v.io.offset;
+MemoryRegionSection *mrs = haddr->v.io.section;
+return haddr->v.io.offset + mrs->mr->addr;
 }
 }
 #endif
-- 
2.17.1




Re: [PATCH] plugins: Expose physical addresses instead of device offsets

2021-03-08 Thread Aaron Lindsay
Alex,

I've now tested this change, and it is giving what appear to be valid
and correct physical addresses for both RAM and IO accesses in all the
cases I've thrown at it. My main concern with this patch at this point
is that I may be breaking your new plugin here:

> +++ b/contrib/plugins/hwprofile.c
> @@ -201,7 +201,7 @@ static void vcpu_haddr(unsigned int cpu_index, 
> qemu_plugin_meminfo_t meminfo,
>  return;
>  } else {
>  const char *name = qemu_plugin_hwaddr_device_name(hwaddr);
> -uint64_t off = qemu_plugin_hwaddr_device_offset(hwaddr);
> +uint64_t off = qemu_plugin_hwaddr_phys_addr(hwaddr);

How angry is the plugin going to be that these are now physical
addresses instead of offsets?

-Aaron



Re: Plugin Address Translations Inconsistent/Incorrect?

2021-03-02 Thread Aaron Lindsay via
On Mar 02 16:06, Alex Bennée wrote:
> 
> Aaron Lindsay  writes:
> 
> > On Feb 23 15:53, Aaron Lindsay wrote:
> >> On Feb 22 15:48, Aaron Lindsay wrote:
> >> > On Feb 22 19:30, Alex Bennée wrote:
> >> > > Aaron Lindsay  writes:
> >> > > That said I think we could add an additional helper to translate a
> >> > > hwaddr to a global address space address. I'm open to suggestions of 
> >> > > the
> >> > > best way to structure this.
> >> > 
> >> > Haven't put a ton of thought into it, but what about something like this
> >> > (untested):
> >> > 
> >> > uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr 
> >> > *haddr)
> >> > {
> >> > #ifdef CONFIG_SOFTMMU
> >> > if (haddr) {
> >> > if (!haddr->is_io) {
> >> > RAMBlock *block;
> >> > ram_addr_t offset;
> >> > 
> >> > block = qemu_ram_block_from_host((void *) 
> >> > haddr->v.ram.hostaddr, false, &offset);
> >> > if (!block) {
> >> > error_report("Bad ram pointer %"PRIx64"", 
> >> > haddr->v.ram.hostaddr);
> >> > abort();
> >> > }
> >> > 
> >> > return block->offset + offset + block->mr->addr;
> >> > } else {
> >> > MemoryRegionSection *mrs = haddr->v.io.section;
> >> > return haddr->v.io.offset + mrs->mr->addr;
> >> > }
> >> > }
> >> > #endif
> >> > return 0;
> >> > }
> >> 
> >> This appears to successfully return correct physical addresses for RAM
> >> at least, though I've not tested it thoroughly for MMIO yet.
> >> 
> >> If it ends up being desirable based on the discussion elsewhere on this
> >> thread I am willing to perform more complete testing, turn this into a
> >> patch, and submit it.
> >
> > Ping - Is this something worth me pursuing?
> 
> Yes please. 

Okay, I'll work on it. Is your thinking that this would be a
separate call as shown above, or a replacement of the existing
qemu_plugin_hwaddr_device_offset function? And, if a replacement, should
we keep the name similar to retain compatibility, or make a clean break?

It seemed like Peter may have been saying the device offset shouldn't be
exposed at all (leading me to consider full replacement), but I also
don't see a definitive resolution of that conversation.

-Aaron



Re: Plugin Address Translations Inconsistent/Incorrect?

2021-03-02 Thread Aaron Lindsay via
On Feb 23 15:53, Aaron Lindsay wrote:
> On Feb 22 15:48, Aaron Lindsay wrote:
> > On Feb 22 19:30, Alex Bennée wrote:
> > > Aaron Lindsay  writes:
> > > That said I think we could add an additional helper to translate a
> > > hwaddr to a global address space address. I'm open to suggestions of the
> > > best way to structure this.
> > 
> > Haven't put a ton of thought into it, but what about something like this
> > (untested):
> > 
> > uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr 
> > *haddr)
> > {
> > #ifdef CONFIG_SOFTMMU
> > if (haddr) {
> > if (!haddr->is_io) {
> > RAMBlock *block;
> > ram_addr_t offset;
> > 
> > block = qemu_ram_block_from_host((void *) 
> > haddr->v.ram.hostaddr, false, &offset);
> > if (!block) {
> > error_report("Bad ram pointer %"PRIx64"", 
> > haddr->v.ram.hostaddr);
> > abort();
> > }
> > 
> > return block->offset + offset + block->mr->addr;
> > } else {
> > MemoryRegionSection *mrs = haddr->v.io.section;
> > return haddr->v.io.offset + mrs->mr->addr;
> > }
> > }
> > #endif
> > return 0;
> > }
> 
> This appears to successfully return correct physical addresses for RAM
> at least, though I've not tested it thoroughly for MMIO yet.
> 
> If it ends up being desirable based on the discussion elsewhere on this
> thread I am willing to perform more complete testing, turn this into a
> patch, and submit it.

Ping - Is this something worth me pursuing?

-Aaron



Re: Plugin Address Translations Inconsistent/Incorrect?

2021-02-23 Thread Aaron Lindsay via
On Feb 22 15:48, Aaron Lindsay wrote:
> On Feb 22 19:30, Alex Bennée wrote:
> > Aaron Lindsay  writes:
> > That said I think we could add an additional helper to translate a
> > hwaddr to a global address space address. I'm open to suggestions of the
> > best way to structure this.
> 
> Haven't put a ton of thought into it, but what about something like this
> (untested):
> 
> uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr *haddr)
> {
> #ifdef CONFIG_SOFTMMU
> if (haddr) {
> if (!haddr->is_io) {
> RAMBlock *block;
> ram_addr_t offset;
> 
> block = qemu_ram_block_from_host((void *) haddr->v.ram.hostaddr, 
> false, &offset);
> if (!block) {
> error_report("Bad ram pointer %"PRIx64"", 
> haddr->v.ram.hostaddr);
> abort();
> }
> 
> return block->offset + offset + block->mr->addr;
> } else {
> MemoryRegionSection *mrs = haddr->v.io.section;
> return haddr->v.io.offset + mrs->mr->addr;
> }
> }
> #endif
> return 0;
> }

This appears to successfully return correct physical addresses for RAM
at least, though I've not tested it thoroughly for MMIO yet.

If it ends up being desirable based on the discussion elsewhere on this
thread I am willing to perform more complete testing, turn this into a
patch, and submit it.

-Aaron



Re: Plugin Address Translations Inconsistent/Incorrect?

2021-02-22 Thread Aaron Lindsay via
On Feb 22 19:30, Alex Bennée wrote:
> Aaron Lindsay  writes:
> > If I call (inside a memory callback):
> >
> > `uint64_t pa = qemu_plugin_hwaddr_device_offset(hwaddr);`
> >
> > I see that `pa` takes the value 0xe0e58760. If, however, I plumb
> > `cpu_get_phys_page_debug` through to the plugin interface and call it
> > like:
> >
> > `pa = cpu_get_phys_page_debug(current_cpu, va);`
> >
> > I see it takes the value 0x120e58760.
> >
> > I notice that 0x120e58760-0xe0e58760 is exactly one gigabyte, which is
> > also the offset of the beginning of RAM for the 'virt' AArch64 machine
> > I'm using. Furthermore, I see the name of the plugin function includes
> > "device_offset", so perhaps this discrepancy is by design. However, it
> > seems awkward to not be able to get a true physical address.
> 
> It certainly is by design. The comment for the helper states:
> 
>   /*
>* The following additional queries can be run on the hwaddr structure
>* to return information about it. For non-IO accesses the device
>* offset will be into the appropriate block of RAM.
>*/
> 
> > I've done some digging and found that inside `qemu_ram_addr_from_host`
> > (called by `qemu_plugin_hwaddr_device_offset`), `block->mr->addr`
> > appears to hold the offset of the beginning of RAM. 
> >
> > Do you think it would be reasonable to modify
> > `qemu_plugin_hwaddr_device_offset` to add the beginning of the RAM block
> > or otherwise return the true physical address (or at least expose a way
> > to find the beginning of it through the plugin interface)?
> 
> Well the problem here is what is the address map? For example if you
> have a secure block of RAM you might have two physical addresses which
> are the same. That said with the current qemu_plugin_hwaddr_device_name
> helper both will get reported as "RAM" so maybe it's not that helpful
> yet.

I don't think I yet understand why this is a problem. It seems to me
that the current implementation of `qemu_plugin_hwaddr_device_offset`
returns offsets which may already be ambiguous without additional
information about the underlying device/memory, and I'm not sure why
translating to full physical addresses would make that worse. It's
possible I'm not correctly interpreting your concern.

> I also worry about what happens if devices get moved around. Do you end
> up with aliasing of address space have a remap of the HW.

Would the `block->mr->addr` field I mentioned above be updated in such a
case?

> That said I think we could add an additional helper to translate a
> hwaddr to a global address space address. I'm open to suggestions of the
> best way to structure this.

Haven't put a ton of thought into it, but what about something like this
(untested):

uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr *haddr)
{
#ifdef CONFIG_SOFTMMU
if (haddr) {
if (!haddr->is_io) {
RAMBlock *block;
ram_addr_t offset;

block = qemu_ram_block_from_host((void *) haddr->v.ram.hostaddr, 
false, &offset);
if (!block) {
error_report("Bad ram pointer %"PRIx64"", 
haddr->v.ram.hostaddr);
abort();
}

return block->offset + offset + block->mr->addr;
} else {
MemoryRegionSection *mrs = haddr->v.io.section;
return haddr->v.io.offset + mrs->mr->addr;
}
}
#endif
return 0;
}

The key differences from `qemu_plugin_hwaddr_device_offset` are using
`qemu_ram_block_from_host` directly instead of `qemu_ram_addr_from_host` (to
get a pointer to the RAMBlock), and adding `block->mr->addr` and
`mrs->mr->addr` to the returns for RAM and IO, respectively.

-Aaron



Plugin Address Translations Inconsistent/Incorrect?

2021-02-22 Thread Aaron Lindsay
Hello,

I've been doing some more work with plugins and found something I didn't
expect with regards to address translation.

If I call (inside a memory callback):

`uint64_t pa = qemu_plugin_hwaddr_device_offset(hwaddr);`

I see that `pa` takes the value 0xe0e58760. If, however, I plumb
`cpu_get_phys_page_debug` through to the plugin interface and call it
like:

`pa = cpu_get_phys_page_debug(current_cpu, va);`

I see it takes the value 0x120e58760.

I notice that 0x120e58760-0xe0e58760 is exactly one gigabyte, which is
also the offset of the beginning of RAM for the 'virt' AArch64 machine
I'm using. Furthermore, I see the name of the plugin function includes
"device_offset", so perhaps this discrepancy is by design. However, it
seems awkward to not be able to get a true physical address.

I've done some digging and found that inside `qemu_ram_addr_from_host`
(called by `qemu_plugin_hwaddr_device_offset`), `block->mr->addr`
appears to hold the offset of the beginning of RAM. 

Do you think it would be reasonable to modify
`qemu_plugin_hwaddr_device_offset` to add the beginning of the RAM block
or otherwise return the true physical address (or at least expose a way
to find the beginning of it through the plugin interface)?

Thanks!

-Aaron



Re: [PATCH v3 20/23] accel/tcg: allow plugin instrumentation to be disable via cflags

2021-02-17 Thread Aaron Lindsay via
On Feb 13 13:03, Alex Bennée wrote:
> When icount is enabled and we recompile an MMIO access we end up
> double counting the instruction execution. To avoid this we introduce
> the CF_MEMI cflag which only allows memory instrumentation for the
> next TB (which won't yet have been counted). As this is part of the
> hashed compile flags we will only execute the generated TB while
> coming out of a cpu_io_recompile.
> 
> While we are at it delete the old TODO. We might as well keep the
> translation handy as it's likely you will repeatedly hit it on each
> MMIO access.
> 
> Reported-by: Aaron Lindsay 
> Signed-off-by: Alex Bennée 
> Reviewed-by: Richard Henderson 
> Message-Id: <20210210221053.18050-21-alex.ben...@linaro.org>

This resolves the issue for me - I'm now seeing one instruction callback
and one memory callback for both MMIO load and store instructions, as
expected.

Tested-by: Aaron Lindsay 

Thanks!

-Aaron

> 
> ---
> v3
>   - s/CF_NOINSTR/CF_MEMI_ONY/
>   - Limit instrumentation at API call sites instead of skipping altogether
>   - clean-up commit log message
> ---
>  include/exec/exec-all.h   |  6 +++---
>  include/exec/plugin-gen.h |  4 ++--
>  include/qemu/plugin.h |  4 
>  accel/tcg/plugin-gen.c|  3 ++-
>  accel/tcg/translate-all.c | 18 +-
>  accel/tcg/translator.c|  5 -
>  plugins/api.c | 36 +---
>  7 files changed, 49 insertions(+), 27 deletions(-)
> 
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index e08179de34..77a2dc044d 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -454,14 +454,14 @@ struct TranslationBlock {
>  uint32_t cflags;/* compile flags */
>  #define CF_COUNT_MASK  0x7fff
>  #define CF_LAST_IO 0x8000 /* Last insn may be an IO access.  */
> +#define CF_MEMI_ONLY   0x0001 /* Only instrument memory ops */
>  #define CF_USE_ICOUNT  0x0002
>  #define CF_INVALID 0x0004 /* TB is stale. Set with @jmp_lock held */
>  #define CF_PARALLEL0x0008 /* Generate code for a parallel context */
>  #define CF_CLUSTER_MASK 0xff00 /* Top 8 bits are cluster ID */
>  #define CF_CLUSTER_SHIFT 24
> -/* cflags' mask for hashing/comparison */
> -#define CF_HASH_MASK   \
> -(CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL | 
> CF_CLUSTER_MASK)
> +/* cflags' mask for hashing/comparison, basically ignore CF_INVALID */
> +#define CF_HASH_MASK   (~CF_INVALID)
>  
>  /* Per-vCPU dynamic tracing state used to generate this TB */
>  uint32_t trace_vcpu_dstate;
> diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
> index 4834a9e2f4..b1b72b5d90 100644
> --- a/include/exec/plugin-gen.h
> +++ b/include/exec/plugin-gen.h
> @@ -19,7 +19,7 @@ struct DisasContextBase;
>  
>  #ifdef CONFIG_PLUGIN
>  
> -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb);
> +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
> supress);
>  void plugin_gen_tb_end(CPUState *cpu);
>  void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
>  void plugin_gen_insn_end(void);
> @@ -41,7 +41,7 @@ static inline void plugin_insn_append(const void *from, 
> size_t size)
>  #else /* !CONFIG_PLUGIN */
>  
>  static inline
> -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
> supress)
>  {
>  return false;
>  }
> diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
> index 841deed79c..c5a79a89f0 100644
> --- a/include/qemu/plugin.h
> +++ b/include/qemu/plugin.h
> @@ -92,6 +92,7 @@ struct qemu_plugin_dyn_cb {
>  };
>  };
>  
> +/* Internal context for instrumenting an instruction */
>  struct qemu_plugin_insn {
>  GByteArray *data;
>  uint64_t vaddr;
> @@ -99,6 +100,7 @@ struct qemu_plugin_insn {
>  GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
>  bool calls_helpers;
>  bool mem_helper;
> +bool mem_only;
>  };
>  
>  /*
> @@ -128,6 +130,7 @@ static inline struct qemu_plugin_insn 
> *qemu_plugin_insn_alloc(void)
>  return insn;
>  }
>  
> +/* Internal context for this TranslationBlock */
>  struct qemu_plugin_tb {
>  GPtrArray *insns;
>  size_t n;
> @@ -135,6 +138,7 @@ struct qemu_plugin_tb {
>  uint64_t vaddr2;
>  void *haddr1;
>  void *haddr2;
> +bool mem_only;
>  GArray *cbs[PLUGIN_N_CB_SUBTYPES];
>  };
>  
> diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
> index 8a1bb801e0..c3dc3effe7 100644
> --- a/accel/tcg/plugin-gen.c
> +++

Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags

2021-02-17 Thread Aaron Lindsay via
On Feb 16 10:34, Alex Bennée wrote:
> 
> Aaron Lindsay  writes:
> 
> > On Feb 12 16:04, Alex Bennée wrote:
> >> Do you see two stores or one store? I think I got the sense the wrong
> >> way around because the store is instrumented before the mmu code,
> >> hence should be skipped on a re-instrumented block.
> >
> > I only see one store between the instruction callback for the store and
> > the instruction callback for the subsequent instruction.
> 
> I've posted:
> 
>   Subject: [PATCH  v3 00/23] plugins/next pre-PR (hwprofile, regression 
> fixes, icount count fix)
>   Date: Sat, 13 Feb 2021 13:03:02 +
>   Message-Id: <20210213130325.14781-1-alex.ben...@linaro.org>
> 
> which I think solves it. Could you have a look?

Just did, and it looks good to me - Thanks!

-Aaron

> >
> > -Aaron
> >
> >> On Fri, 12 Feb 2021 at 15:41, Aaron Lindsay
> >>  wrote:
> >> >
> >> > On Feb 12 14:43, Alex Bennée wrote:
> >> > > Aaron Lindsay  writes:
> >> > > > On Feb 10 22:10, Alex Bennée wrote:
> >> > > >> When icount is enabled and we recompile an MMIO access we end up
> >> > > >> double counting the instruction execution. To avoid this we 
> >> > > >> introduce
> >> > > >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> >> > > >> As this is part of the hashed compile flags we will only execute the
> >> > > >> generated TB while coming out of a cpu_io_recompile.
> >> > > >
> >> > > > Unfortunately this patch works a little too well!
> >> > > >
> >> > > > With this change, the memory access callbacks registered via
> >> > > > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> >> > > > re-translated instruction making the IO access, since we've disabled 
> >> > > > all
> >> > > > instrumentation.
> >> > > >
> >> > > > Is it possible to selectively disable only instruction callbacks 
> >> > > > using
> >> > > > this mechanism, while still allowing others that would not yet have 
> >> > > > been
> >> > > > called for the re-translated instruction?
> >> > >
> >> > > Can you try the following fugly patch on top of this series:
> >> >
> >> > This patch does allow me to successfully observe memory callbacks for
> >> > stores in this case. It seems from looking at the patch that you
> >> > intentionally only allowed memory callbacks for stores in this case, and
> >> > I still don't see any callbacks for loads.
> >> >
> >> > -Aaron
> >> >
> >> > > --8<---cut here---start->8---
> >> > > diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
> >> > > index 4834a9e2f4..b1b72b5d90 100644
> >> > > --- a/include/exec/plugin-gen.h
> >> > > +++ b/include/exec/plugin-gen.h
> >> > > @@ -19,7 +19,7 @@ struct DisasContextBase;
> >> > >
> >> > >  #ifdef CONFIG_PLUGIN
> >> > >
> >> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb);
> >> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, 
> >> > > bool supress);
> >> > >  void plugin_gen_tb_end(CPUState *cpu);
> >> > >  void plugin_gen_insn_start(CPUState *cpu, const struct 
> >> > > DisasContextBase *db);
> >> > >  void plugin_gen_insn_end(void);
> >> > > @@ -41,7 +41,7 @@ static inline void plugin_insn_append(const void 
> >> > > *from, size_t size)
> >> > >  #else /* !CONFIG_PLUGIN */
> >> > >
> >> > >  static inline
> >> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> >> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, 
> >> > > bool supress)
> >> > >  {
> >> > >  return false;
> >> > >  }
> >> > > diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
> >> > > index 841deed79c..2a26a2277f 100644
> >> > > --- a/include/qemu/plugin.h
> >> > > +++ b/include/qemu/plugin.h
> >> > > @@ -92,6 +92,7 @@ struct qemu_plugin_dyn_cb {
> >> &g

Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags

2021-02-12 Thread Aaron Lindsay via
On Feb 12 16:00, Alex Bennée wrote:
> 
> Alex Bennée  writes:
> 
> > Aaron Lindsay  writes:
> >
> >> On Feb 10 22:10, Alex Bennée wrote:
> >>> When icount is enabled and we recompile an MMIO access we end up
> >>> double counting the instruction execution. To avoid this we introduce
> >>> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> >>> As this is part of the hashed compile flags we will only execute the
> >>> generated TB while coming out of a cpu_io_recompile.
> >>
> >> Unfortunately this patch works a little too well!
> >>
> >> With this change, the memory access callbacks registered via
> >> `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> >> re-translated instruction making the IO access, since we've disabled all
> >> instrumentation.
> >>
> >> Is it possible to selectively disable only instruction callbacks using
> >> this mechanism, while still allowing others that would not yet have been
> >> called for the re-translated instruction?
> >
> > Can you try the following fugly patch on top of this series:
> >
> 
> > @@ -120,8 +128,13 @@ void qemu_plugin_register_vcpu_mem_cb(struct 
> > qemu_plugin_insn *insn,
> >enum qemu_plugin_mem_rw rw,
> >void *udata)
> >  {
> > -
> > plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > -cb, flags, rw, udata);
> > +if (insn->store_only && (rw & QEMU_PLUGIN_MEM_W)) {
> > +
> > plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > +cb, flags, QEMU_PLUGIN_MEM_W, udata);
> > +} else {
> > +
> > plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > +cb, flags, rw, udata);
> > +}
> >  }
> 
> 
> Actually I'm wondering if I've got my sense the wrong way around. Should
> it be loads only:
> 
>   void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
> qemu_plugin_vcpu_mem_cb_t cb,
> enum qemu_plugin_cb_flags flags,
> enum qemu_plugin_mem_rw rw,
> void *udata)
>   {
>   if (insn->store_only && (rw & QEMU_PLUGIN_MEM_R)) {
>   
> plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
>   cb, flags, QEMU_PLUGIN_MEM_R, udata);
>   } else {
>   
> plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
>   cb, flags, rw, udata);
>   }
>   }
> 
> obviously I'd have to rename the variables :-/

This gets me only loads and no stores. I've modified it to be just:

void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
  qemu_plugin_vcpu_mem_cb_t cb,
  enum qemu_plugin_cb_flags flags,
  enum qemu_plugin_mem_rw rw,
  void *udata)
{
plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
cb, flags, rw, udata);
}

And that appears to get me one memory callback for both loads and stores.

-Aaron



Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags

2021-02-12 Thread Aaron Lindsay via
On Feb 12 16:04, Alex Bennée wrote:
> Do you see two stores or one store? I think I got the sense the wrong
> way around because the store is instrumented before the mmu code,
> hence should be skipped on a re-instrumented block.

I only see one store between the instruction callback for the store and
the instruction callback for the subsequent instruction.

-Aaron

> On Fri, 12 Feb 2021 at 15:41, Aaron Lindsay
>  wrote:
> >
> > On Feb 12 14:43, Alex Bennée wrote:
> > > Aaron Lindsay  writes:
> > > > On Feb 10 22:10, Alex Bennée wrote:
> > > >> When icount is enabled and we recompile an MMIO access we end up
> > > >> double counting the instruction execution. To avoid this we introduce
> > > >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> > > >> As this is part of the hashed compile flags we will only execute the
> > > >> generated TB while coming out of a cpu_io_recompile.
> > > >
> > > > Unfortunately this patch works a little too well!
> > > >
> > > > With this change, the memory access callbacks registered via
> > > > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> > > > re-translated instruction making the IO access, since we've disabled all
> > > > instrumentation.
> > > >
> > > > Is it possible to selectively disable only instruction callbacks using
> > > > this mechanism, while still allowing others that would not yet have been
> > > > called for the re-translated instruction?
> > >
> > > Can you try the following fugly patch on top of this series:
> >
> > This patch does allow me to successfully observe memory callbacks for
> > stores in this case. It seems from looking at the patch that you
> > intentionally only allowed memory callbacks for stores in this case, and
> > I still don't see any callbacks for loads.
> >
> > -Aaron
> >
> > > --8<---cut here---start->8---
> > > diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
> > > index 4834a9e2f4..b1b72b5d90 100644
> > > --- a/include/exec/plugin-gen.h
> > > +++ b/include/exec/plugin-gen.h
> > > @@ -19,7 +19,7 @@ struct DisasContextBase;
> > >
> > >  #ifdef CONFIG_PLUGIN
> > >
> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb);
> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
> > > supress);
> > >  void plugin_gen_tb_end(CPUState *cpu);
> > >  void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase 
> > > *db);
> > >  void plugin_gen_insn_end(void);
> > > @@ -41,7 +41,7 @@ static inline void plugin_insn_append(const void *from, 
> > > size_t size)
> > >  #else /* !CONFIG_PLUGIN */
> > >
> > >  static inline
> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
> > > supress)
> > >  {
> > >  return false;
> > >  }
> > > diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
> > > index 841deed79c..2a26a2277f 100644
> > > --- a/include/qemu/plugin.h
> > > +++ b/include/qemu/plugin.h
> > > @@ -92,6 +92,7 @@ struct qemu_plugin_dyn_cb {
> > >  };
> > >  };
> > >
> > > +/* Internal context for instrumenting an instruction */
> > >  struct qemu_plugin_insn {
> > >  GByteArray *data;
> > >  uint64_t vaddr;
> > > @@ -99,6 +100,7 @@ struct qemu_plugin_insn {
> > >  GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
> > >  bool calls_helpers;
> > >  bool mem_helper;
> > > +bool store_only;
> > >  };
> > >
> > >  /*
> > > @@ -128,6 +130,7 @@ static inline struct qemu_plugin_insn 
> > > *qemu_plugin_insn_alloc(void)
> > >  return insn;
> > >  }
> > >
> > > +/* Internal context for this TranslationBlock */
> > >  struct qemu_plugin_tb {
> > >  GPtrArray *insns;
> > >  size_t n;
> > > @@ -135,6 +138,7 @@ struct qemu_plugin_tb {
> > >  uint64_t vaddr2;
> > >  void *haddr1;
> > >  void *haddr2;
> > > +bool store_only;
> > >  GArray *cbs[PLUGIN_N_CB_SUBTYPES];
> > >  };
> > >
> > > diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-

Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags

2021-02-12 Thread Aaron Lindsay via
On Feb 12 14:43, Alex Bennée wrote:
> Aaron Lindsay  writes:
> > On Feb 10 22:10, Alex Bennée wrote:
> >> When icount is enabled and we recompile an MMIO access we end up
> >> double counting the instruction execution. To avoid this we introduce
> >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> >> As this is part of the hashed compile flags we will only execute the
> >> generated TB while coming out of a cpu_io_recompile.
> >
> > Unfortunately this patch works a little too well!
> >
> > With this change, the memory access callbacks registered via
> > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> > re-translated instruction making the IO access, since we've disabled all
> > instrumentation.
> >
> > Is it possible to selectively disable only instruction callbacks using
> > this mechanism, while still allowing others that would not yet have been
> > called for the re-translated instruction?
> 
> Can you try the following fugly patch on top of this series:

This patch does allow me to successfully observe memory callbacks for
stores in this case. It seems from looking at the patch that you
intentionally only allowed memory callbacks for stores in this case, and
I still don't see any callbacks for loads.

-Aaron

> --8<---cut here---start->8---
> diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
> index 4834a9e2f4..b1b72b5d90 100644
> --- a/include/exec/plugin-gen.h
> +++ b/include/exec/plugin-gen.h
> @@ -19,7 +19,7 @@ struct DisasContextBase;
>  
>  #ifdef CONFIG_PLUGIN
>  
> -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb);
> +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
> supress);
>  void plugin_gen_tb_end(CPUState *cpu);
>  void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
>  void plugin_gen_insn_end(void);
> @@ -41,7 +41,7 @@ static inline void plugin_insn_append(const void *from, 
> size_t size)
>  #else /* !CONFIG_PLUGIN */
>  
>  static inline
> -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
> supress)
>  {
>  return false;
>  }
> diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
> index 841deed79c..2a26a2277f 100644
> --- a/include/qemu/plugin.h
> +++ b/include/qemu/plugin.h
> @@ -92,6 +92,7 @@ struct qemu_plugin_dyn_cb {
>  };
>  };
>  
> +/* Internal context for instrumenting an instruction */
>  struct qemu_plugin_insn {
>  GByteArray *data;
>  uint64_t vaddr;
> @@ -99,6 +100,7 @@ struct qemu_plugin_insn {
>  GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
>  bool calls_helpers;
>  bool mem_helper;
> +bool store_only;
>  };
>  
>  /*
> @@ -128,6 +130,7 @@ static inline struct qemu_plugin_insn 
> *qemu_plugin_insn_alloc(void)
>  return insn;
>  }
>  
> +/* Internal context for this TranslationBlock */
>  struct qemu_plugin_tb {
>  GPtrArray *insns;
>  size_t n;
> @@ -135,6 +138,7 @@ struct qemu_plugin_tb {
>  uint64_t vaddr2;
>  void *haddr1;
>  void *haddr2;
> +bool store_only;
>  GArray *cbs[PLUGIN_N_CB_SUBTYPES];
>  };
>  
> diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
> index 8a1bb801e0..137b91282e 100644
> --- a/accel/tcg/plugin-gen.c
> +++ b/accel/tcg/plugin-gen.c
> @@ -842,7 +842,7 @@ static void plugin_gen_inject(const struct qemu_plugin_tb 
> *plugin_tb)
>  pr_ops();
>  }
>  
> -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
> store_only)
>  {
>  struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
>  bool ret = false;
> @@ -855,6 +855,7 @@ bool plugin_gen_tb_start(CPUState *cpu, const 
> TranslationBlock *tb)
>  ptb->vaddr2 = -1;
>  get_page_addr_code_hostp(cpu->env_ptr, tb->pc, &ptb->haddr1);
>  ptb->haddr2 = NULL;
> +ptb->store_only = store_only;
>  
>  plugin_gen_empty_callback(PLUGIN_GEN_FROM_TB);
>  }
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index 14d1ea795d..082f2c8ee1 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, 
> DisasContextBase *db,
>  ops->tb_start(db, cpu);
>  tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
>  
> -plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && 
> pl

Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags

2021-02-12 Thread Aaron Lindsay via
On Feb 12 11:22, Alex Bennée wrote:
> Aaron Lindsay  writes:
> > On Feb 10 22:10, Alex Bennée wrote:
> >> When icount is enabled and we recompile an MMIO access we end up
> >> double counting the instruction execution. To avoid this we introduce
> >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> >> As this is part of the hashed compile flags we will only execute the
> >> generated TB while coming out of a cpu_io_recompile.
> >
> > Unfortunately this patch works a little too well!
> >
> > With this change, the memory access callbacks registered via
> > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> > re-translated instruction making the IO access, since we've disabled all
> > instrumentation.
> 
> Hmm well we correctly don't instrument stores (as we have already
> executed the plugin for them) - but of course the load instrumentation
> is after the fact so we are now missing them.

I do not believe I am seeing memory callbacks for stores, either. Are
you saying I definitely should be?

My original observation was that the callbacks for store instructions to
IO followed the same pattern as loads:

1) Initial instruction callback (presumably as part of larger block)
2) Second instruction callback (presumably as part of single-instruction block)
3) Memory callback (presumably as part of single-instruction block)

After applying v2 of your patchset I now see only 1), even for stores.
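
(For reference, a generic sketch of the kind of instrumentation behind these
observations: a tb_trans callback that registers both an instruction-execution
and a memory callback for every instruction. It uses only existing plugin API
and is not the exact plugin used here.)

#include <qemu-plugin.h>

QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

/* 1) and 2) above: fires once per executed instruction */
static void insn_exec(unsigned int cpu_index, void *udata) { }

/* 3) above: fires once per memory access of an instrumented instruction */
static void vcpu_mem(unsigned int cpu_index, qemu_plugin_meminfo_t info,
                     uint64_t vaddr, void *udata) { }

static void tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    size_t n = qemu_plugin_tb_n_insns(tb);
    for (size_t i = 0; i < n; i++) {
        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
        qemu_plugin_register_vcpu_insn_exec_cb(insn, insn_exec,
                                               QEMU_PLUGIN_CB_NO_REGS, NULL);
        qemu_plugin_register_vcpu_mem_cb(insn, vcpu_mem,
                                         QEMU_PLUGIN_CB_NO_REGS,
                                         QEMU_PLUGIN_MEM_RW, NULL);
    }
}

QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                           const qemu_info_t *info,
                                           int argc, char **argv)
{
    qemu_plugin_register_vcpu_tb_trans_cb(id, tb_trans);
    return 0;
}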

> > Is it possible to selectively disable only instruction callbacks using
> > this mechanism, while still allowing others that would not yet have been
> > called for the re-translated instruction?
> 
> Hmmm let me see if I can finesse the CF_NOINSTR logic to allow
> plugin_gen_insn_end() without the rest? It probably needs a better name
> for the flag as well. 

Funny, the first time reading through this patch I was unsure for a
second whether "CF_NOINSTR" stood for "NO INSTRuction callbacks" or "NO
INSTRumentation"!

-Aaron



Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags

2021-02-11 Thread Aaron Lindsay via
On Feb 10 22:10, Alex Bennée wrote:
> When icount is enabled and we recompile an MMIO access we end up
> double counting the instruction execution. To avoid this we introduce
> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> As this is part of the hashed compile flags we will only execute the
> generated TB while coming out of a cpu_io_recompile.

Unfortunately this patch works a little too well!

With this change, the memory access callbacks registered via
`qemu_plugin_register_vcpu_mem_cb()` are never called for the
re-translated instruction making the IO access, since we've disabled all
instrumentation.

Is it possible to selectively disable only instruction callbacks using
this mechanism, while still allowing others that would not yet have been
called for the re-translated instruction?

-Aaron

> While we are at it delete the old TODO. We might as well keep the
> translation handy as it's likely you will repeatedly hit it on each
> MMIO access.
> 
> Reported-by: Aaron Lindsay 
> Signed-off-by: Alex Bennée 
> Reviewed-by: Richard Henderson 
> Message-Id: <20210209182749.31323-12-alex.ben...@linaro.org>
> 
> ---
> v2
>   - squashed CH_HASHMASK to ~CF_INVALID
> ---
>  include/exec/exec-all.h   |  6 +++---
>  accel/tcg/translate-all.c | 17 -
>  accel/tcg/translator.c|  2 +-
>  3 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index e08179de34..299282cc59 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -454,14 +454,14 @@ struct TranslationBlock {
>  uint32_t cflags;/* compile flags */
>  #define CF_COUNT_MASK  0x7fff
>  #define CF_LAST_IO 0x8000 /* Last insn may be an IO access.  */
> +#define CF_NOINSTR 0x0001 /* Disable instrumentation of TB */
>  #define CF_USE_ICOUNT  0x0002
>  #define CF_INVALID 0x0004 /* TB is stale. Set with @jmp_lock held */
>  #define CF_PARALLEL0x0008 /* Generate code for a parallel context */
>  #define CF_CLUSTER_MASK 0xff00 /* Top 8 bits are cluster ID */
>  #define CF_CLUSTER_SHIFT 24
> -/* cflags' mask for hashing/comparison */
> -#define CF_HASH_MASK   \
> -(CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL | 
> CF_CLUSTER_MASK)
> +/* cflags' mask for hashing/comparison, basically ignore CF_INVALID */
> +#define CF_HASH_MASK   (~CF_INVALID)
>  
>  /* Per-vCPU dynamic tracing state used to generate this TB */
>  uint32_t trace_vcpu_dstate;
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 0666f9ef14..32a3d8fe24 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -2399,7 +2399,8 @@ void tb_check_watchpoint(CPUState *cpu, uintptr_t 
> retaddr)
>  }
>  
>  #ifndef CONFIG_USER_ONLY
> -/* in deterministic execution mode, instructions doing device I/Os
> +/*
> + * In deterministic execution mode, instructions doing device I/Os
>   * must be at the end of the TB.
>   *
>   * Called by softmmu_template.h, with iothread mutex not held.
> @@ -2430,19 +2431,17 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t 
> retaddr)
>  n = 2;
>  }
>  
> -/* Generate a new TB executing the I/O insn.  */
> -cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
> +/*
> + * Exit the loop and potentially generate a new TB executing the
> + * just the I/O insns. We also disable instrumentation so we don't
> + * double count the instruction.
> + */
> +cpu->cflags_next_tb = curr_cflags() | CF_NOINSTR | CF_LAST_IO | n;
>  
>  qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
> "cpu_io_recompile: rewound execution of TB to "
> TARGET_FMT_lx "\n", tb->pc);
>  
> -/* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
> - * the first in the TB) then we end up generating a whole new TB and
> - *  repeating the fault, which is horribly inefficient.
> - *  Better would be to execute just this insn uncached, or generate a
> - *  second new TB.
> - */
>  cpu_loop_exit_noexc(cpu);
>  }
>  
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index a49a794065..14d1ea795d 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, 
> DisasContextBase *db,
>  ops->tb_start(db, cpu);
>  tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
>  
> -plugin_enabled = plugin_gen_tb_start(cpu, tb);
> +plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && 
> plugin_gen_tb_start(cpu, tb);
>  
>  while (true) {
>  db->num_insns++;
> -- 
> 2.20.1
> 


