[gem5-users] Implementing a trace cache in gem5

2015-12-09 Thread Marcelo Brandalero
Hi all,

I want to modify gem5 for my computer architecture research. In
particular, I want to modify the O3 CPU model by adding a structure
similar to a trace cache.

I have some idea of where to start, based on other simulators I've
already used. My question is whether anyone has done something similar
before and could point me in the right direction (what I need to
understand in the source code first, where most of the modifications will
go, etc.).

I thought of defining a new class (one that behaves like a trace cache)
that stores sequences of previously executed *StaticInst* objects together
with the starting PC of each sequence (encoded in the *PCState* class). I
would then modify the *DefaultFetch* class to check for a matching entry in
the trace cache (current *PCState* == stored *PCState*); on a match, it
would use the stored *StaticInst* list to generate the dynamic instructions
(*BaseO3DynInst*) and feed them to the pipeline.
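The lookup/fill behavior described above can be prototyped outside gem5 before touching DefaultFetch. The Python model below is only a sketch under assumptions: `TraceCache`, `lookup()` and `fill()` are hypothetical names, not gem5 classes; a stored instruction is a `(PC, disassembly)` tuple standing in for a *StaticInst*; and the capacity, maximum trace length and LRU replacement are arbitrary choices. It also leaves out something a real design would likely need: checking that the branch outcomes embedded in a trace still agree with the branch predictor.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

# Hypothetical stand-in for a StaticInst: (PC, disassembly). Not a gem5 type.
TraceEntry = Tuple[int, str]


class TraceCache:
    """Toy trace cache keyed by the start PC of each stored trace."""

    def __init__(self, num_traces: int = 64, max_trace_len: int = 16) -> None:
        self.num_traces = num_traces        # assumed capacity, in traces
        self.max_trace_len = max_trace_len  # assumed max instructions per trace
        # start PC -> list of TraceEntry, kept in least-to-most recently used order
        self._traces = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, pc: int) -> Optional[List[TraceEntry]]:
        """Return the trace that starts at pc, or None on a miss."""
        trace = self._traces.get(pc)
        if trace is None:
            self.misses += 1
            return None
        self.hits += 1
        self._traces.move_to_end(pc)  # mark as most recently used
        return trace

    def fill(self, executed: List[TraceEntry]) -> None:
        """Store an executed sequence, truncated to max_trace_len."""
        if not executed:
            return
        trace = list(executed[: self.max_trace_len])
        start_pc = trace[0][0]
        self._traces[start_pc] = trace
        self._traces.move_to_end(start_pc)
        if len(self._traces) > self.num_traces:
            self._traces.popitem(last=False)  # evict least recently used trace
```

In this model, a hit would let fetch walk the returned list and build one dynamic instruction per entry, while a miss would fall back to the normal I-cache path, with some fill logic later calling `fill()` on the sequence that actually executed. Keying on the start PC alone mirrors the "current PCState == stored PCState" check proposed above.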

Could anyone confirm whether this makes sense? Are there any potential
issues I may run into?

Thanks in advance!

-- 
Marcelo Brandalero
PhD student | Graduate Program in Computer Science
Federal University of Rio Grande do Sul
Porto Alegre/RS, Brazil
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] o3cpu: cache ports

2016-04-28 Thread Marcelo Brandalero
 all,
>>
>> In the O3 LSQ there is a variable called "cachePorts" which controls the
>> number of stores that can be made each cycle (see lines 790-795 in
>> lsq_unit_impl.hh).
>> cachePorts defaults to 200 (see O3CPU.py), so in practice, there is no
>> limit on the number of stores that are written back to the D-Cache, and
>> everything works out fine.
>>
>> Now, silly me wanted to be a bit more realistic and set cachePorts to
>> one, so that only one store per cycle could be issued to the D-Cache.
>> In a few SPEC programs, this caused the SQFullEvents count to get very
>> high, which I assumed was reasonable since fewer stores could be written
>> back per cycle.
>> However, after looking into it, I found that the variable "usedPorts"
>> (stores may write back only while it is less than "cachePorts") is
>> incremented by stores when they write back (which is fine), but also by
>> *loads* when they access the D-Cache (see lines 768 and 814 in
>> lsq_unit.hh). Yet the number of loads that can access the D-Cache each
>> cycle is controlled by the number of load functional units, and not at
>> all by "cachePorts".
>>
>> This means that if I set cachePorts to 1 and have 2 load FUs, I can do
>> 2 loads per cycle, but as soon as I do one load, I cannot write back any
>> store that cycle (because "usedPorts" will already be 1 or 2 when gem5
>> enters writebackStores() in lsq_unit_impl.hh). On the other hand, if I set
>> cachePorts to 3, I can do 2 loads and one store per cycle, but I can also
>> write back three stores in a single cycle, which is not what I wanted.
>>
>> This should be addressed either by not incrementing "usedPorts" when
>> loads access the D-Cache and being explicit about which variable
>> constrains what (i.e., loads are constrained by load FUs and stores by
>> "cachePorts"), or by also constraining loads by "cachePorts" (which will
>> be hard, since arbitration would potentially be needed between loads and
>> stores, and since store WBs happen after load accesses in gem5, this can
>> get messy). As of now, the code does a bit of both, and performance looks
>> fine at first, but it really isn't.
>>
>> I can write a small patch for the first solution (don't increase
>> "usedPorts" on load accesses), but I am not sure this corresponds to the
>> philosophy of the code. What do you think would be the best course of
>> action?
>>
>> Best,
>>
>> Arthur.
>>
>> --
>> Arthur Perais
>> INRIA Bretagne Atlantique
>> Bâtiment 12E, Bureau E303, Campus de Beaulieu
>> 35042 Rennes, France
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy the
>> information in any medium. Thank you.
>>
>
>
>
> --
> Marcelo Brandalero
> PhD student
> Programa de Pós Graduação em Computação
> Universidade Federal do Rio Grande do Sul
>
>



-- 
Marcelo Brandalero
PhD student
Programa de Pós Graduação em Computação
Universidade Federal do Rio Grande do Sul
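The port accounting Arthur describes in the quoted message can be reduced to a small Python model, which makes the cachePorts=1 and cachePorts=3 cases easy to check. This is a sketch, not gem5's LSQ code: the function names are hypothetical, each load or store is assumed to consume exactly one port, and loads are assumed to access the D-Cache before writebackStores() runs in the same cycle, as the message states. Setting `loads_use_ports=False` models his first proposed fix (don't increment "usedPorts" on load accesses).

```python
def loads_this_cycle(ready_loads: int, load_fus: int) -> int:
    """Loads that access the D-Cache this cycle: bounded only by load FUs."""
    return min(ready_loads, load_fus)


def stores_written_back(cache_ports: int, loads_done: int, pending_stores: int,
                        loads_use_ports: bool = True) -> int:
    """Stores written back this cycle under a usedPorts < cachePorts check.

    Loads are modeled as accessing the D-Cache before store writeback in the
    same cycle, so with loads_use_ports=True they have already used ports.
    loads_use_ports=False models not counting loads against cachePorts.
    """
    used_ports = loads_done if loads_use_ports else 0
    written = 0
    while written < pending_stores and used_ports < cache_ports:
        used_ports += 1  # each store writeback takes one port
        written += 1
    return written


if __name__ == "__main__":
    loads = loads_this_cycle(ready_loads=4, load_fus=2)  # 2 loads issue
    print(stores_written_back(1, loads, pending_stores=4))  # 0 stores
    print(stores_written_back(3, loads, pending_stores=4))  # 1 store
    print(stores_written_back(3, 0, pending_stores=4))      # 3 stores
    print(stores_written_back(1, loads, 4, loads_use_ports=False))  # 1 store
```

The first three calls reproduce the two problems from the thread (no store bandwidth left after loads with one port, and three stores per cycle once the limit is raised to 3), while the last shows that decoupling loads from "usedPorts" gives the intended one store per cycle.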

[gem5-users] RISCV ISA : "C" (compressed) extension supported?

2018-05-24 Thread Marcelo Brandalero
Hi all,

I recently switched from gem5/x86 to gem5/RISCV due to some advantages of
this ISA.

I'm getting some weird simulation results, and I realized my compiler was
generating instructions from the compressed RISC-V ISA extension (chapter 12
of the user-level ISA specification <https://riscv.org/specifications/>).
The weirdness disappears when I use *--march* to exclude these extensions.

*So the question is: does gem5/RISCV support this ISA extension?* If so, I
can share the weird results (maybe I'm missing something), but basically a
wide-issue O3 processor fetches at most 1 instruction/cycle when it should
probably be fetching more.

If it isn't supported, that's all OK; I just find it a bit odd that the
program executes normally without any warnings.

Best regards,

-- 
Marcelo Brandalero
PhD Candidate
Programa de Pós Graduação em Computação
Universidade Federal do Rio Grande do Sul

Re: [gem5-users] RISCV ISA : "C" (compressed) extension supported?

2018-05-24 Thread Marcelo Brandalero
Hi Jason, Alec,

Thanks for the fast responses!

I managed to run a lot of benchmarks on O3 and none of them crashed. I did
notice, however, that their performance on O3 processors of different
widths differed only slightly (on x86, the differences were much more
significant).

I only ran into this particular issue today, though, so all I can say is
that it *seems* *to affect only binaries compiled with the C extension*.

I'll run the tests both of you suggested and reply here in case I find
anything interesting.

Best regards,


On Thu, May 24, 2018 at 9:29 PM, Marcelo Brandalero 
wrote:

>
> On Thu, May 24, 2018 at 9:06 PM, Alec Roelke  wrote:
>
>> Hi Marcelo,
>>
>> Yes, gem5 does support the C extension (64-bit version only, though).  I
>> don't know what could be causing your particular issue.  I'm not sure
>> advancePC is the issue, though, because all that essentially does is call
>> PCState::advance(), which is inherited unchanged from
>> GenericISA::UPCState.  Try doing as Jason suggests and run your simulation
>> with the Fetch debug flag enabled, and maybe that will shed some light on
>> the issue.
>>
>> -Alec
>>
>> On Thu, May 24, 2018 at 7:20 PM, Jason Lowe-Power 
>> wrote:
>>
>>> Hi Marcelo,
>>>
>>> I'm not sure if RISC-V has been tested with the out-of-order CPU at all!
>>> I'm happy that at least it doesn't completely fail!
>>>
>>> For your problem of fetching only 1 instruction per cycle... I think it's
>>> going to take some digging. My first guess would be that it could be a
>>> problem with the advancePC() function that's implemented in the RISC-V
>>> decoder (in gem5/arch/riscv), but I don't really have any specific reason
>>> to think that :).
>>>
>>> You could try turning on some debug flags for the O3 CPU. Specifically,
>>> Fetch might be helpful.
>>>
>>> Cheers,
>>> Jason
>>>
>
>
>
> --
> Marcelo Brandalero
>



-- 
Marcelo Brandalero
PhD Candidate
Programa de Pós Graduação em Computação
Universidade Federal do Rio Grande do Sul

Re: [gem5-users] RISCV ISA : "C" (compressed) extension supported?

2018-05-25 Thread Marcelo Brandalero
220: system.cpu.fetch: [tid:0]: Instruction PC 0x101b0 (0) created [sn:8128].
4050220: system.cpu.fetch: [tid:0]: Instruction is: c_add a0, a0, a0
4050220: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4050220: system.cpu.fetch: Branch detected with PC = (0x101b0=>0x101b2).(0=>1)*
4050220: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4050220: system.cpu.fetch: [tid:0][sn:8128]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4050533: system.cpu.fetch: Running stage.
4050533: system.cpu.fetch: Attempting to fetch from [tid:0]
4050533: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4050533: system.cpu.fetch: [tid:0]: Instruction PC 0x101b2 (0) created [sn:8129].
4050533: system.cpu.fetch: [tid:0]: Instruction is: c_add a2, a2, a2
4050533: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4050533: system.cpu.fetch: Branch detected with PC = (0x101b2=>0x101b4).(0=>1)*
4050533: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4050533: system.cpu.fetch: [tid:0][sn:8129]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4050846: system.cpu.fetch: Running stage.
4050846: system.cpu.fetch: Attempting to fetch from [tid:0]
4050846: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4050846: system.cpu.fetch: [tid:0]: Instruction PC 0x101b4 (0) created [sn:8130].
4050846: system.cpu.fetch: [tid:0]: Instruction is: c_add a3, a3, a3
4050846: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4050846: system.cpu.fetch: Branch detected with PC = (0x101b4=>0x101b6).(0=>1)*
4050846: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4050846: system.cpu.fetch: [tid:0][sn:8130]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
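In the log above, every c_add advances the PC by only 2 bytes (e.g. 0x101b0=>0x101b2), yet fetch reports "Branch detected" and stops with "predicted branch instruction encountered", so each cycle delivers a single instruction. One plausible cause, which nobody in this thread confirmed, is a check that decides whether the PC is "branching" by comparing the next PC with PC + 4. The Python sketch below uses hypothetical function names, not gem5's, to contrast such a check with one that takes the instruction length from its first 16-bit parcel: per the RISC-V C extension, encodings whose two lowest bits are not `0b11` are 16-bit compressed instructions. Longer (48/64-bit) encodings are deliberately ignored here.

```python
def inst_length(parcel: int) -> int:
    """Byte length of a RISC-V instruction given its first 16-bit parcel.

    Low two bits != 0b11: 16-bit compressed (C extension) instruction.
    Low two bits == 0b11: treated as a 32-bit instruction; longer
    encodings are ignored in this sketch.
    """
    return 4 if (parcel & 0b11) == 0b11 else 2


def naive_branching(pc: int, npc: int) -> bool:
    """Hypothetical check that assumes every instruction is 4 bytes long."""
    return npc != pc + 4


def length_aware_branching(pc: int, npc: int, parcel: int) -> bool:
    """Flag a control-flow change only if npc is not the next sequential PC."""
    return npc != pc + inst_length(parcel)


if __name__ == "__main__":
    C_ADD_A0 = 0x952A  # c.add a0, a0 (16-bit CR format)
    NOP = 0x0013       # low parcel of addi x0, x0, 0 (32-bit)
    print(naive_branching(0x101B0, 0x101B2))                   # True
    print(length_aware_branching(0x101B0, 0x101B2, C_ADD_A0))  # False
    print(length_aware_branching(0x1000, 0x1004, NOP))         # False
```

Under this assumption, the naive check would misclassify every sequential compressed instruction as a branch, which would match both the log and the fetch-rate statistics that follow.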

I'm not sure whether it's a decoder problem or something else, but it seems
to affect only instructions in the compressed format. It shows up in the
statistics as the following abnormal behavior:

system.cpu.fetch.rateDist::0           13812   23.92%   23.92%  # Number of instructions fetched each cycle (Total)
*system.cpu.fetch.rateDist::1          42910   74.32%   98.24%  # Number of instructions fetched each cycle (Total)*
system.cpu.fetch.rateDist::2             624    1.08%   99.32%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::3             256    0.44%   99.77%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::4              59    0.10%   99.87%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::5              50    0.09%   99.95%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::6               5    0.01%   99.96%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::7               2    0.00%   99.97%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::8              19    0.03%  100.00%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::overflows       0    0.00%  100.00%  # Number of instructions fetched each cycle (Total)
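To put a single number on the rateDist histogram above, the mean fetch rate it implies can be computed directly. This short Python calculation only re-derives figures from the posted stats (it assumes the empty "overflows" bucket can be dropped); it does not reproduce the simulation.

```python
# Per-cycle fetch histogram copied from the stats above:
# instructions fetched in a cycle -> number of cycles
rate_dist = {0: 13812, 1: 42910, 2: 624, 3: 256, 4: 59,
             5: 50, 6: 5, 7: 2, 8: 19}

total_cycles = sum(rate_dist.values())
total_insts = sum(k * n for k, n in rate_dist.items())
avg_fetch_rate = total_insts / total_cycles
share_at_most_one = (rate_dist[0] + rate_dist[1]) / total_cycles

print(f"{total_cycles} cycles, {total_insts} instructions fetched")
# prints: 57737 cycles, 45608 instructions fetched
print(f"average fetch rate: {avg_fetch_rate:.2f} instructions/cycle")
# prints: average fetch rate: 0.79 instructions/cycle
print(f"cycles fetching at most one instruction: {share_at_most_one:.2%}")
# prints: cycles fetching at most one instruction: 98.24%
```

An average below one instruction per cycle, with 98.24% of the counted cycles fetching at most one instruction, is consistent with the "max 1 instruction/cycle" symptom reported at the start of the thread.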

I won't be digging further into this, since compiling without the
compressed extension seems to fix the issue and is enough for my use case.
I just thought this information could be useful to someone.

Cheers!


[gem5-users] RISC-V ISA + Gathering stats for ROI only

2019-07-25 Thread Marcelo Brandalero
Hi all,

I have an application compiled for the RISC-V ISA running in gem5 SE mode.
I want to profile, i.e. get the execution stats, only for a specific kernel
/ Region of Interest (ROI) inside the code.

I understand the standard way to approach this would be to insert m5_ops
that call m5_reset_stats() and m5_dump_stats() but, if I understand
correctly, m5_ops support for RISC-V is still unavailable (thread from Dec
2017 <https://gem5-users.gem5.narkive.com/h2DCOtDb/m5ops-with-riscv>).

Another approach I see is to modify the configuration script by inserting
calls to *system.cpu.scheduleInstStop* and scheduling an event, triggered at
a certain instruction count, that resets and dumps the stats. However, I
cannot figure out an easy way to find the exact instruction counts where the
ROI begins and ends.
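If the instruction-count route were taken, the missing counts might be recovered from a separate profiling run. The sketch below is built on assumptions: a first run has already produced the ordered list of committed PCs (how to obtain it, e.g. from a gem5 debug trace, is left open), the ROI boundaries are known code addresses such as the entry points of two marker functions, and the program behaves deterministically between the profiling run and the measured run. `roi_inst_counts` is a hypothetical helper, not part of gem5.

```python
from typing import Iterable, Optional, Tuple


def roi_inst_counts(committed_pcs: Iterable[int], roi_start_pc: int,
                    roi_end_pc: int) -> Tuple[Optional[int], Optional[int]]:
    """Count the instructions committed before the ROI begins and ends.

    committed_pcs holds the PC of every committed instruction, in order.
    The begin count is the number of instructions committed before the
    first instruction at roi_start_pc; the end count is the number
    committed before the first later instruction at roi_end_pc.
    Missing boundaries are reported as None.
    """
    begin: Optional[int] = None
    for count, pc in enumerate(committed_pcs):
        if begin is None:
            if pc == roi_start_pc:
                begin = count
        elif pc == roi_end_pc:
            return begin, count
    return begin, None


if __name__ == "__main__":
    trace = [0x100, 0x104, 0x200, 0x204, 0x300, 0x304]
    print(roi_inst_counts(trace, roi_start_pc=0x200, roi_end_pc=0x300))  # (2, 4)
```

The two counts would then be the instruction thresholds at which a second, measured run resets and dumps the stats.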

I wonder if there is another simple way to approach this. Is there some way
to access the application's *stdout* from inside the Python configuration
script? That way I could insert printf("ROI BEGIN\n"); in my code, compile
it without any external library, and run it in gem5 while monitoring
*stdout* from inside the Python script. When that output line is found, the

Does anyone have suggestions on how to gather stats for the ROI only in a
RISC-V program, given that m5_ops are unavailable?

Thank you and best regards

-- 
Marcelo Brandalero