On Wed, Nov 16, 2016 at 11:08:42AM -0500, Christopher Covington wrote: > On 11/16/2016 08:01 AM, Andrew Jones wrote: > > On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote: > >> > >> > >> On 11/14/2016 09:12 AM, Christopher Covington wrote: > >>> Hi Drew, Wei, > >>> > >>> On 11/14/2016 05:05 AM, Andrew Jones wrote: > >>>> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: > >>>>> > >>>>> > >>>>> On 11/11/2016 01:43 AM, Andrew Jones wrote: > >>>>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: > >>>>>>> From: Christopher Covington <c...@codeaurora.org> > >>>>>>> > >>>>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, > >>>>>>> even for the smallest delta of two subsequent reads. > >>>>>>> > >>>>>>> Signed-off-by: Christopher Covington <c...@codeaurora.org> > >>>>>>> Signed-off-by: Wei Huang <w...@redhat.com> > >>>>>>> --- > >>>>>>> arm/pmu.c | 98 > >>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>>>>>> 1 file changed, 98 insertions(+) > >>>>>>> > >>>>>>> diff --git a/arm/pmu.c b/arm/pmu.c > >>>>>>> index 0b29088..d5e3ac3 100644 > >>>>>>> --- a/arm/pmu.c > >>>>>>> +++ b/arm/pmu.c > >>>>>>> @@ -14,6 +14,7 @@ > >>>>>>> */ > >>>>>>> #include "libcflat.h" > >>>>>>> > >>>>>>> +#define PMU_PMCR_E (1 << 0) > >>>>>>> #define PMU_PMCR_N_SHIFT 11 > >>>>>>> #define PMU_PMCR_N_MASK 0x1f > >>>>>>> #define PMU_PMCR_ID_SHIFT 16 > >>>>>>> @@ -21,6 +22,10 @@ > >>>>>>> #define PMU_PMCR_IMP_SHIFT 24 > >>>>>>> #define PMU_PMCR_IMP_MASK 0xff > >>>>>>> > >>>>>>> +#define PMU_CYCLE_IDX 31 > >>>>>>> + > >>>>>>> +#define NR_SAMPLES 10 > >>>>>>> + > >>>>>>> #if defined(__arm__) > >>>>>>> static inline uint32_t pmcr_read(void) > >>>>>>> { > >>>>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) > >>>>>>> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); > >>>>>>> return ret; > >>>>>>> } > >>>>>>> + > >>>>>>> +static inline void pmcr_write(uint32_t value) > >>>>>>> +{ > >>>>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); > >>>>>>> +} > >>>>>>> + > >>>>>>> +static inline void pmselr_write(uint32_t value) > >>>>>>> +{ > >>>>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); > >>>>>>> +} > >>>>>>> + > >>>>>>> +static inline void pmxevtyper_write(uint32_t value) > >>>>>>> +{ > >>>>>>> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); > >>>>>>> +} > >>>>>>> + > >>>>>>> +/* > >>>>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, > >>>>>>> returning 64 > >>>>>>> + * bits doesn't seem worth the trouble when differential usage of > >>>>>>> the result is > >>>>>>> + * expected (with differences that can easily fit in 32 bits). So > >>>>>>> just return > >>>>>>> + * the lower 32 bits of the cycle count in AArch32. > >>>>>> > >>>>>> Like I said in the last review, I'd rather we not do this. We should > >>>>>> return the full value and then the test case should confirm the upper > >>>>>> 32 bits are zero. > >>>>> > >>>>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit > >>>>> register. We can force it to a more coarse-grained cycle counter with > >>>>> PMCR.D bit=1 (see below). But it is still not a 64-bit register. > >>> > >>> AArch32 System Register Descriptions > >>> Performance Monitors registers > >>> PMCCNTR, Performance Monitors Cycle Count Register > >>> > >>> To access the PMCCNTR when accessing as a 32-bit register: > >>> MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt > >>> MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are > >>> unchanged > >>> > >>> To access the PMCCNTR when accessing as a 64-bit register: > >>> MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] > >>> into Rt2 > >>> MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to > >>> PMCCNTR[63:32] > >>> > >> > >> Thanks. I did some research based on your info and came back with the > >> following proposals (Cov, correct me if I am wrong): > >> > >> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I > >> think this 64-bit cycle register is only available when running under > >> aarch32 compatibility mode on ARMv8 because it is not specified in A15 > >> TRM. > > That interpretation sounds really strange to me. My recollection is that the > cycle counter was available as a 64 bit register in ARMv7 as well. I would > expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference > Manual is the complete and authoritative source.
Yes, the v7 ARM ARM is the authoritative source, and it says 32-bit. Whereas the v8 ARM ARM wrt to AArch32 mode says it's both 32 and 64. > > >> To further verify it, I tested 32-bit pmu code on QEMU with TCG > >> mode. The result is: accessing 64-bit PMCCNTR using the following > >> assembly failed on A15: > >> > >> volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi)); > >> or > >> volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val)); > > The PMU implementation on QEMU TCG mode is infantile. (I was trying to > write these tests to help guide fixes and enhancements in a > test-driven-development manner.) I would not trust QEMU TCG to behave > properly here. If you want to execute those instructions, is there anything > preventing you from doing it on hardware, or at least the Foundation Model? > > >> Given this difference, I think there are two solutions for 64-bit > >> AArch32 pmccntr_read, as requested by Drew: > >> > >> 1) The PMU unit testing code tells if it is running under ARMv7 or under > >> AArch32-compability mode. When it is running ARMv7, such as A15, let us > >> use "MRC p15,0,<Rt>,c9,c13,0" and clear the upper 32-bit as 0. Otherwise > >> use "MRRC p15,0,<Rt>,<Rt2>,c9". > >> > >> 2) Returns 64-bit results for ARM pmccntr_read(). But we only uses "MRC > >> p15,0,<Rt>,c9,c13,0" and always clear the upper 32-bit as 0. This will > >> be the same as the original code. > > > > 3) For the basic test do (2), but add an additional test for AArch32 > > mode that also does the MRRC. That way on AArch32 we test both access > > types. > > The upper bits being non-zero is an insane corner case. The bits will most likely be zero, but how do you know without checking? Also, just invoking an mrrc vs. mrc is a big difference when testing emulation. We shouldn't skip it just because we expect it to give us a boring result. > > I'd really prefer to first build some momentum with checks for issues that > are A) likely to occur and 2) not too difficult to check, like whether PMCR > is writeable (especially relevant to KVM mode where it's not by default). Looks like we're on the right track for this starter series then. Thanks, drew > > Thanks, > Cov > > -- > Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm > Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code > Aurora Forum, a Linux Foundation Collaborative Project. >