Hi Giacomo, 

Many thanks. This time it works fine, and I feel that I really understand how
the DTB, the GIC and the gem5 code interact! After declaring the PMU correctly
in the DTB as you did, we get this confirmation at boot time that
the Linux kernel sees it:

> [ 0.239967] hw perfevents: enabled with armv8_pmuv3 PMU driver, 32 counters available

Just one thing. On my real ARM hardware, I used perf_event with the
PERF_TYPE_HARDWARE event type. This does not work on my gem5 simulated
system: perf_event was not able to establish a correspondence between the
gem5 events and the architectural events, even though the event numbers are
the same and correspond to the ARMv8 specification. I don't know the reason.
The workaround is to use the PERF_TYPE_RAW event type, passing the event IDs
declared in the ArmPMU.py file directly in our C source code.
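
To make the workaround concrete, here is a minimal sketch of the attribute
setup. It is written in Python rather than C only to keep it short and easy to
check: it packs the leading fields of struct perf_event_attr the way our C code
fills them before calling perf_event_open(), and does not issue the syscall.
The helper name raw_event_attr is hypothetical. 0x10 is the ARMv8 architectural
event number for mispredicted branches (BR_MIS_PRED); whether a given gem5
configuration registers that ID in ArmPMU.py is an assumption to verify.

```python
import struct

# Constants from <linux/perf_event.h>.
PERF_TYPE_HARDWARE = 0    # generic events; did not work on the simulated system
PERF_TYPE_RAW = 4         # raw, PMU-specific event numbers
PERF_ATTR_SIZE_VER0 = 64  # size in bytes of the first perf_event_attr layout

# ARMv8 architectural event number, used directly as the raw config value.
ARMV8_BR_MIS_PRED = 0x10


def raw_event_attr(event_id):
    """Pack a minimal perf_event_attr selecting a raw PMU event (hypothetical helper).

    Leading fields of the struct: u32 type, u32 size, u64 config.
    All remaining bytes (sample period, flag bits, ...) are left at zero.
    """
    head = struct.pack("=IIQ", PERF_TYPE_RAW, PERF_ATTR_SIZE_VER0, event_id)
    return head + bytes(PERF_ATTR_SIZE_VER0 - len(head))


attr = raw_event_attr(ARMV8_BR_MIS_PRED)
attr_type, attr_size, attr_config = struct.unpack_from("=IIQ", attr)
print(attr_type, attr_size, hex(attr_config))
```

In C, the equivalent is to set attr.type = PERF_TYPE_RAW and attr.config to the
event ID instead of a PERF_COUNT_HW_* constant.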

I will look into how to send patches and learn how to use Gerrit. Thanks for
your help.

Best, 
Pierre 

> From: "Giacomo Travaglini" <giacomo.travagl...@arm.com>
> To: "gem5-users" <gem5-users@gem5.org>
> Cc: "Pierre Ayoub" <pierre.ay...@irisa.fr>
> Sent: Tuesday, 29 September 2020 22:22:15
> Subject: RE: Using perf_event with the ARM PMU inside gem5 on Linux

> Hey Pierre,

> You are actually very close to getting it right! The problem is that there
> should be a single PMU instantiation.

> What you need to do in the BaseCPU is:

>     # Generate nodes from the BaseCPU children.
>     # Please note: this is mainly needed for the ISA class
>     for node in self.recurseDeviceTree(state):
>         yield node

> Please feel free to push these BaseCPU and ArmISA changes as separate patches
> to gerrit if you want (I have implemented it in the same way). I will post the
> PMU one (it is similar to what you are doing, but I have done some other
> refactoring).

> Another thing: you are using PPIs for the PMU (good).

> PPIs are per-CPU interrupts; since they are local to a PE, there is no need to
> have a different PPI number per core (and the GIC/PMU driver might actually
> complain).

> So rather than doing:

> ints = [20, 21, 22, 23]

> You should do something like (example)

> ints = [22, 22, 22, 22]
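
As a sanity check on the system script side, the rule above can be sketched as
a small helper. This is only an illustration: check_pmu_ppis is a hypothetical
name, not a gem5 function. It relies on the fact that PPIs occupy interrupt IDs
16 to 31 in the GIC ID space.

```python
def check_pmu_ppis(ints):
    """Return the single PPI number shared by all cores, or raise ValueError.

    Hypothetical helper for a gem5 config script; not part of gem5 itself.
    """
    if not ints:
        raise ValueError("no PMU interrupt given")
    for num in ints:
        # In the GIC interrupt ID space, PPIs occupy IDs 16 to 31.
        if not 16 <= num <= 31:
            raise ValueError("%d is not a PPI (expected 16..31)" % num)
    if len(set(ints)) != 1:
        raise ValueError("PMU PPIs differ between cores: %r" % (ints,))
    return ints[0]


shared_ppi = check_pmu_ppis([22, 22, 22, 22])
print(shared_ppi)
```

With the per-core list [20, 21, 22, 23], the helper raises ValueError instead
of returning a number.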

> Kind Regards

> Giacomo

> From: Pierre Ayoub via gem5-users <gem5-users@gem5.org>
> Sent: 29 September 2020 18:05
> To: Giacomo Travaglini <giacomo.travagl...@arm.com>
> Cc: gem5-users <gem5-users@gem5.org>; Pierre Ayoub <pierre.ay...@irisa.fr>
> Subject: [gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

> Hi Giacomo,

> Thank you for your reply. Your hint about the DTB gave me a great starting
> point for researching it and its relation to the Linux kernel and the ARM
> PMU. I thought that I would be able to fix this myself by studying how gem5
> generates the DTB and how the PMU is declared in a DTB. However, although I
> have learned a lot, I was wrong.

> In my system script, I declare and attach a PMU like this:

>> ints = [20, 21, 22, 23]
>> assert len(ints) == len(system.cpu_cluster.cpus)

>> for cpu, pint in zip(system.cpu_cluster.cpus, ints):
>>     for isa in cpu.isa:
>>         isa.pmu = ArmPMU(interrupt=ArmPPI(num=pint))
>>         isa.pmu.addArchEvents(
>>             cpu=cpu, dtb=cpu.dtb, itb=cpu.itb,
>>             icache=getattr(cpu, "icache", None),
>>             dcache=getattr(cpu, "dcache", None),
>>             l2cache=getattr(system.cpu_cluster, "l2", None))

> And I applied this patch to gem5:

>> diff --git i/src/arch/arm/ArmISA.py w/src/arch/arm/ArmISA.py
>> index 2641ec3fb..3d85c1b75 100644
>> --- i/src/arch/arm/ArmISA.py
>> +++ w/src/arch/arm/ArmISA.py
>> @@ -36,6 +36,7 @@
>>  from m5.params import *
>>  from m5.proxy import *

>> +from m5.SimObject import SimObject
>>  from m5.objects.ArmPMU import ArmPMU
>>  from m5.objects.ArmSystem import SveVectorLength
>>  from m5.objects.BaseISA import BaseISA
>> @@ -49,6 +50,8 @@ class ArmISA(BaseISA):
>>      cxx_class = 'ArmISA::ISA'
>>      cxx_header = "arch/arm/isa.hh"

>> +    generateDeviceTree = SimObject.recurseDeviceTree
>> +
>>      system = Param.System(Parent.any, "System this ISA object belongs to")

>>      pmu = Param.ArmPMU(NULL, "Performance Monitoring Unit")
>> diff --git i/src/arch/arm/ArmPMU.py w/src/arch/arm/ArmPMU.py
>> index 047e908b3..58553fbf9 100644
>> --- i/src/arch/arm/ArmPMU.py
>> +++ w/src/arch/arm/ArmPMU.py
>> @@ -40,6 +40,7 @@ from m5.params import *
>>  from m5.params import isNullPointer
>>  from m5.proxy import *
>>  from m5.objects.Gic import ArmInterruptPin
>> +from m5.util.fdthelper import *

>>  class ProbeEvent(object):
>>      def __init__(self, pmu, _eventId, obj, *listOfNames):
>> @@ -76,6 +77,17 @@ class ArmPMU(SimObject):

>>      _events = None

>> +    def generateDeviceTree(self, state):
>> +        node = FdtNode("pmu")
>> +        node.appendCompatible("arm,armv8-pmuv3")
>> +        # gem5 uses GIC controller interrupt notation, where PPI interrupts
>> +        # start at 16. However, the Linux kernel starts from 0 and uses a tag
>> +        # (set to 1) to indicate the PPI interrupt type.
>> +        node.append(FdtPropertyWords("interrupts", [
>> +            1, int(self.interrupt.num) - 16, 0xf04
>> +        ]))
>> +        yield node
>> +
>>      def addEvent(self, newObject):
>>          if not (isinstance(newObject, ProbeEvent)
>>              or isinstance(newObject, SoftwareIncrement)):
>> diff --git i/src/cpu/BaseCPU.py w/src/cpu/BaseCPU.py
>> index ab70d1d7f..e5d0ed3dd 100644
>> --- i/src/cpu/BaseCPU.py
>> +++ w/src/cpu/BaseCPU.py
>> @@ -302,6 +302,9 @@ class BaseCPU(ClockedObject):
>>              node.appendPhandle(phandle_key)
>>              cpus_node.append(node)

>> +        for subnode in self.recurseDeviceTree(state):
>> +            node.append(subnode)
>> +
>>          yield cpus_node

>>      def __init__(self, **kwargs):

> I end up with a DTB with this:

>> pmu {
>>     compatible = "arm,armv8-pmuv3";
>>     interrupts = <0x01 0x04 0xf04>;
>> };
>> pmu {
>>     compatible = "arm,armv8-pmuv3";
>>     interrupts = <0x01 0x05 0xf04>;
>> };
>> pmu {
>>     compatible = "arm,armv8-pmuv3";
>>     interrupts = <0x01 0x06 0xf04>;
>> };
>> pmu {
>>     compatible = "arm,armv8-pmuv3";
>>     interrupts = <0x01 0x07 0xf04>;
>> };
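
The interrupt cells in these nodes follow directly from the arithmetic in the
ArmPMU.py patch. As a minimal sketch (pmu_interrupt_cells is a hypothetical
name used only for illustration), the conversion from a gem5/GIC interrupt
number to the three device tree cells is:

```python
GIC_PPI_BASE = 16      # first PPI in the GIC interrupt ID space
FDT_IRQ_TYPE_PPI = 1   # first cell of the specifier: 1 marks a PPI
FDT_IRQ_FLAGS = 0xf04  # flags cell, copied unchanged from the patch


def pmu_interrupt_cells(gic_num):
    """Map a GIC PPI number to the three "interrupts" cells (hypothetical helper)."""
    return [FDT_IRQ_TYPE_PPI, gic_num - GIC_PPI_BASE, FDT_IRQ_FLAGS]


for gic_num in (20, 21, 22, 23):
    print(" ".join(hex(cell) for cell in pmu_interrupt_cells(gic_num)))
```

PPI numbers 20 to 23 give second cells 4 to 7, which matches the four nodes
shown here.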

> That is, one PMU declaration per core. However, it does not work. I don't
> even know if this kind of declaration is correct; maybe we have to declare
> the PMU once for all cores instead of once per core?

> Note that the kernel configuration (in /proc/config.gz) is correct for
> initializing perf_event.

> Many thanks for any help, and many thanks as well if you post a patch in the
> future.

> Best,

> Pierre

>> From: "Giacomo Travaglini" <giacomo.travagl...@arm.com>
>> To: "gem5-users" <gem5-users@gem5.org>
>> Cc: "Pierre Ayoub" <pierre.ay...@irisa.fr>
>> Sent: Thursday, 24 September 2020 12:09:17
>> Subject: RE: Using perf_event with the ARM PMU inside gem5 on Linux

>> Hi Pierre,

>> First of all, many thanks for explaining your problem in detail. This is
>> very helpful.

>> The reason why you are not able to use perf_events is probably that the
>> kernel is not aware of the presence of PMUs. This is usually communicated to
>> Linux via the DTB. I can see that we are not enabling DTB autogen for the
>> ArmPMU.

>> I will post a patch

>> Kind Regards

>> Giacomo

>> From: Pierre Ayoub via gem5-users <gem5-users@gem5.org>
>> Sent: 23 September 2020 08:45
>> To: gem5-users@gem5.org
>> Cc: Pierre Ayoub <pierre.ay...@irisa.fr>
>> Subject: [gem5-users] Using perf_event with the ARM PMU inside gem5 on Linux

>> Hi gem5 users,

>> TL;DR:
>> ------

>> I know from the gem5 source code and some publications that the ARM PMU is
>> partially implemented. I have a binary which uses perf_event to access the
>> PMU on a Linux-based OS, on an ARM processor, on real hardware. Could it use
>> perf_event inside a gem5 full-system simulation with a Linux kernel, under
>> the ARM ISA? So far, I haven't found the right way to do it. If someone
>> knows, I will be very grateful!

>> Detailed information:
>> ---------------------

>> I have a binary (developed by myself) which uses perf_event on real ARM
>> hardware to get cache misses and mispredicted branches, and it works well.
>> My "perf_event_attr.type" is configured with "PERF_TYPE_HARDWARE", and the
>> ".config" field with "PERF_COUNT_HW_CACHE_MISSES" for one event and
>> "PERF_COUNT_HW_BRANCH_MISSES" for the other. However, when I ran this binary
>> in a gem5 FS simulation, configured with the DerivO3CPU, ArmSystem, and
>> RealView platform, I got the following error:

>> "ENOENT (2): No such file or directory"

>> The perf_event file descriptor is not created by the kernel (it is equal to
>> -1). To be precise, this error is returned by the perf_event_open() syscall.
>> The error is documented in the perf_event_open.2 manpage and also discussed
>> here. However, that did not help me understand the error in the context of
>> gem5.

>> I don't know whether we can access the PMU through perf_event in gem5. If
>> so, maybe we have to use raw events? (That is, do you know whether
>> perf_event is supposed to be initialized with PERF_TYPE_HARDWARE or
>> PERF_TYPE_RAW when used with gem5?) In the gem5 example code under configs,
>> I found a snippet in devices.py which "Instantiates 1 ArmPMU per PE"
>> (addPMUs()). However, after a few tries, I still don't understand how to use
>> it correctly and how it relates to perf_event.

>> I used code similar to addPMUs() in devices.py, with PPI interrupt numbers
>> 20, 21, 22, and 23 (one per core) according to the RealView interrupt
>> mapping, with the ArmPPI class. However, perf_event_open() still returns the
>> same error. Note also that I got this message during boot:

>> src/arch/arm/pmu.cc:293: warn: Not doing anything for write to miscreg pmuserenr_el0.

>> This register is documented in the ARMv8-A architecture manual. I checked
>> the pmu.cc file and saw that writing to this register is not implemented
>> (TODO state). Normally, this should not be a problem, since this register
>> enables (when set to 1) userland access to the PMU, which I don't need
>> because I want to access the PMU through the Linux kernel perf_event
>> interface.

>> With --debug-flags=PMUVerbose, I get the following:

>> 0: system.cpu_cluster.cpus0.isa.pmu: Initializing the PMU.
>> [...]
>> 0: system.cpu_cluster.cpus0.isa.pmu: PMU: Adding Probe Driven event with id '0x2'as probe system.cpu_cluster.cpus0.itb:Refills
>> [...]
>> 8687351673751: system.cpu_cluster.cpus0.isa.pmu: Assigning PMU to ContextID 0.
>> [...]
>> 8687351673751: system.cpu_cluster.cpus0.isa.pmu: updateCounter(31): Disabling counter
>> [...]

>> Now, you know all I know about this issue!

>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended 
>> recipient,
>> please notify the sender immediately and do not disclose the contents to any
>> other person, use it for any purpose, or store or copy the information in any
>> medium. Thank you.