[gem5-users] Separate cache line size

2024-02-14 Thread Nazmus Sakib via gem5-users
Hello.
Is there a way to change the cache line size for different levels of cache?
Example: the L1 cache line size is 64 bytes and the L2 cache line size is 128 bytes.
If there is no direct way (changing some parameter from Python), what would
be the issues with building this?

Things I know:
1. The fetch buffer size has to be less than or equal to the cache line size,
otherwise gem5 panics (although I don't know why, and would like to know).
2. In src/mem/cache/cache.cc, the Cache constructor calls BaseCache()
with p.system->cacheLineSize(). I am guessing that changing this to a user-defined
value would let me get a cache with whatever cache line size I want. However, I saw in
cache.cc, Cache::satisfyRequest():

    // determine if this read is from a (coherent) cache or not
    if (pkt->fromCache()) {
        assert(pkt->getSize() == blkSize);

From the comment, it looks like this covers a request either from another
cache at the same level (two L1 caches in two processors) or from L1 to L2.
The assertion makes me think that separate cache line sizes will not work here.
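
To make the concern above concrete, here is a minimal Python sketch of the size check. It is a toy model, not gem5 code: the names Packet, ToyCache and l1_miss_to_l2 are hypothetical, and the only thing taken from the thread is the quoted condition that a packet coming from a cache must be exactly one block in size.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Toy stand-in for a memory packet (hypothetical, not gem5's Packet)."""
    addr: int
    size: int
    from_cache: bool


class ToyCache:
    """Toy cache that mirrors only the size check quoted from satisfyRequest()."""

    def __init__(self, blk_size: int):
        self.blk_size = blk_size

    def satisfy_request(self, pkt: Packet) -> bool:
        # Same condition as the quoted assertion: a request that comes
        # from another cache must cover exactly one block of this cache.
        if pkt.from_cache:
            assert pkt.size == self.blk_size, (
                f"packet size {pkt.size} != block size {self.blk_size}"
            )
        return True


def l1_miss_to_l2(l1_blk: int, l2_blk: int) -> bool:
    """Return True if an L1 line fill passes the L2 size check."""
    l2 = ToyCache(l2_blk)
    fill = Packet(addr=0x1000, size=l1_blk, from_cache=True)
    try:
        return l2.satisfy_request(fill)
    except AssertionError:
        return False


if __name__ == "__main__":
    print(l1_miss_to_l2(64, 64))   # matching line sizes
    print(l1_miss_to_l2(64, 128))  # 64-byte L1 below a 128-byte L2
```

In this toy model the mismatched configuration fails the check, which is the behaviour the question anticipates. Whether other parts of the real cache code make the same assumption is not something this sketch can show.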

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Dumping network traces from gem5 for Trace-based NoC simulation

2024-02-14 Thread Hansika Madushan Weerasena Loku Kattadige via gem5-users
Hi everyone,

I have a requirement to dump traffic traces of running a program in gem5 and
replay it in another NoC simulator (e.g., Noxim). I have two questions, and any
help or pointers would be appreciated.


  1. I want to dump a trace of all inter-node traffic (e.g., a read request from an
L1 cache to a directory at another node). Since I don't need a detailed NoC
simulation for the trace (I only need the input/output of the interconnection
network), I plan to use the simple network instead of Garnet. I want each row of
the trace to have at least the source, destination, and packet size. I want to
know in which files I should consider placing the DPRINTF statements to dump
the traces into a debug file.

  2. I'm planning to conduct a dependency-based simulation using Netrace
(https://www.cs.utexas.edu/~netrace/) on traffic traced from gem5. However, the
Netrace library does not provide APIs or functions for preprocessing trace
files to identify dependencies. Has anyone used Netrace to dump gem5 traffic
traces and preprocess them for dependencies? If so, what parameters were dumped,
and how did you determine the dependencies between trace entries?


Also, I would like to hear about any other approaches to dumping network traces
from gem5.


Thanks, and Regards,
Hansika Weerasena
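
As an illustration of the post-processing step implied by question 1, the sketch below turns debug-log lines into rows of (tick, source, destination, size). Everything about the line format is an assumption: the "NetTrace src=... dst=... size=..." message is a hypothetical string that a custom DPRINTF might print, and the function names parse_trace and to_csv are made up for this example.

```python
import csv
import io
import re

# Hypothetical line format, e.g. what a custom DPRINTF might print:
#   "<tick>: <object name>: NetTrace src=<id> dst=<id> size=<bytes>"
LINE_RE = re.compile(
    r"^\s*(?P<tick>\d+):\s+\S+:\s+NetTrace\s+"
    r"src=(?P<src>\d+)\s+dst=(?P<dst>\d+)\s+size=(?P<size>\d+)\s*$"
)


def parse_trace(lines):
    """Return (tick, src, dst, size) tuples for every matching log line."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # silently skip unrelated debug output
            rows.append(
                tuple(int(match.group(k)) for k in ("tick", "src", "dst", "size"))
            )
    return rows


def to_csv(rows):
    """Serialise parsed rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["tick", "src", "dst", "size"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = [
        "1000: system.ruby.network: NetTrace src=3 dst=7 size=72",
        "1500: system.cpu: some unrelated message",
    ]
    print(to_csv(parse_trace(sample)), end="")
```

The column order and the CSV output are arbitrary choices; the target NoC simulator's input format would determine the real layout.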




[gem5-users] Re: Architectural state of registers - O3CPU

2024-02-14 Thread Eliot Moss via gem5-users

On 2/14/2024 1:14 PM, Eliot Moss via gem5-users wrote:

On 2/14/2024 12:52 PM, reverent.green--- via gem5-users wrote:


I would like to add some additional information. The register number does
vary in each iteration, sometimes it is above 100. So I think it should be
the physical register value.  If my understanding is correct, the physical
register should be set during the IEW stage before the instruction is
committed or squashed at the last stage. Otherwise out-of-order execution
wouldn't be possible.  So in the end I am searching for the point at which the
physical register is set and marked as ready for subsequent instructions
that depend on this specific register.


Yes, it makes sense that it is a physical register.  For arithmetic, register
to register move, etc., it would be written in IEW.  But for loads, it cannot
be written until LSQ processing, which is later in the pipeline.  I believe
there is a notion of the register being *ready*, and it will be marked ready
when it is written.  Likewise, once all of an instruction's input registers
are ready, that instruction may be executed (the instruction itself becomes
ready).  You can look for the 'writeback' function in lsq_unit.cc.  It clearly
has some relationship to IEW, but it explicitly calls completeAcc, which does
the actual write into the register.  The specific code for that came from the
instruction's template.  This is necessarily so - consider the difference
between loading a byte (say) vs a word, and sign- vs zero-extended values.


See also function writebackInsts in iew.cc.  EM
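
The notion of readiness described above can be illustrated with a small toy model. This is a generic sketch, not gem5's implementation: the class ToyScoreboard and its methods are hypothetical, and the register number 54 and value 0x53000 are simply reused from the debug message quoted earlier in the thread.

```python
class ToyScoreboard:
    """Tracks which physical registers hold a valid value (toy model)."""

    def __init__(self, num_phys_regs):
        self.ready = [False] * num_phys_regs
        self.values = [0] * num_phys_regs

    def writeback(self, preg, value):
        # Writing the value and marking the register ready happen together,
        # whether the producer is an ALU operation or a completed load.
        self.values[preg] = value
        self.ready[preg] = True

    def can_issue(self, src_pregs):
        # An instruction becomes ready once all of its sources are ready.
        return all(self.ready[p] for p in src_pregs)


if __name__ == "__main__":
    sb = ToyScoreboard(128)
    sb.writeback(10, 1)              # result of an earlier ALU instruction
    print(sb.can_issue([10, 54]))    # consumer still waits on register 54
    sb.writeback(54, 0x53000)        # load data arrives and is written
    print(sb.can_issue([10, 54]))    # consumer may now issue
```

Note that nothing in this model involves commit: readiness only says a speculative value is available to dependent instructions.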


[gem5-users] Re: Architectural state of registers - O3CPU

2024-02-14 Thread Eliot Moss via gem5-users

On 2/14/2024 12:52 PM, reverent.green--- via gem5-users wrote:


I would like to add some additional information. The register number does
vary in each iteration, sometimes it is above 100. So I think it should be
the physical register value.  If my understanding is correct, the physical
register should be set during the IEW stage before the instruction is
committed or squashed at the last stage. Otherwise out-of-order execution
wouldn't be possible.  So in the end I am searching for the point at which the
physical register is set and marked as ready for subsequent instructions
that depend on this specific register.


Yes, it makes sense that it is a physical register.  For arithmetic, register
to register move, etc., it would be written in IEW.  But for loads, it cannot
be written until LSQ processing, which is later in the pipeline.  I believe
there is a notion of the register being *ready*, and it will be marked ready
when it is written.  Likewise, once all of an instruction's input registers
are ready, that instruction may be executed (the instruction itself becomes
ready).  You can look for the 'writeback' function in lsq_unit.cc.  It clearly
has some relationship to IEW, but it explicitly calls completeAcc, which does
the actual write into the register.  The specific code for that came from the
instruction's template.  This is necessarily so - consider the difference
between loading a byte (say) vs a word, and sign- vs zero-extended values.

Regards - EM


[gem5-users] Re: Architectural state of registers - O3CPU

2024-02-14 Thread reverent.green--- via gem5-users

I would like to add some additional information. The register number does vary in each iteration, sometimes it is above 100. So I think it should be the physical register value.
If my understanding is correct, the physical register should be set during the IEW stage before the instruction is committed or squashed at the last stage. Otherwise out-of-order execution wouldn't be possible.

So in the end I am searching for the point at which the physical register is set and marked as ready for subsequent instructions that depend on this specific register.


Sent: Wednesday, 14 February 2024 at 18:35
From: "Eliot Moss"
To: "The gem5 Users mailing list"
Cc: reverent.gr...@web.de
Subject: Re: [gem5-users] Re: Architectural state of registers - O3CPU

On 2/14/2024 12:26 PM, reverent.green--- via gem5-users wrote:
> Hey Eliot,
> thank you for your answer. I have a follow-up question.
> I know that there are more physical registers than architectural ones and that the architectural state should be set in
> the final commit stage.
> So if the debug message linked in my earlier mail shows e.g.: "Setting int register 54 to 0x53000", this "register 54"
> should be a physical register and it can be used without setting the architectural state?
> Do you know, at which point in the O3 steps this physical register is set after an instruction?

That's something where I'd need to dig into the code to make sure. However,
the number 54 is fairly large so my first impression is that it is a physical
register number, not a logical (architectural) one. On the other hand, if you
count up integer registers, floating point registers, vector registers, etc.,
54 could be in the range of the architectural registers. I do know that if
you request debug trace information from gem5, it will tend to refer to
architectural registers.

I don't know precisely where the physical register is set, but my first
thought is IEW - the W part stands for Writeback, i.e., when registers
typically are written. However, loads are probably written later since they
are not computational but wait for a response from the cache. As I recall,
the load/store queue processing is a separate step in the pipeline, coming
later than IEW.

EM




[gem5-users] Re: Architectural state of registers - O3CPU

2024-02-14 Thread Eliot Moss via gem5-users

On 2/14/2024 12:26 PM, reverent.green--- via gem5-users wrote:

Hey Eliot,
thank you for your answer. I have a follow-up question.
I know that there are more physical registers than architectural ones and that the architectural state should be set in
the final commit stage.
So if the debug message linked in my earlier mail shows e.g.: "Setting int register 54 to 0x53000", this "register 54" 
should be a physical register and it can be used without setting the architectural state?

Do you know, at which point in the O3 steps this physical register is set after 
an instruction?


That's something where I'd need to dig into the code to make sure.  However,
the number 54 is fairly large so my first impression is that it is a physical
register number, not a logical (architectural) one.  On the other hand, if you
count up integer registers, floating point registers, vector registers, etc.,
54 could be in the range of the architectural registers.  I do know that if
you request debug trace information from gem5, it will tend to refer to
architectural registers.

I don't know precisely where the physical register is set, but my first
thought is IEW - the W part stands for Writeback, i.e., when registers
typically are written.  However, loads are probably written later since they
are not computational but wait for a response from the cache.  As I recall,
the load/store queue processing is a separate step in the pipeline, coming
later than IEW.

EM


[gem5-users] Re: Architectural state of registers - O3CPU

2024-02-14 Thread reverent.green--- via gem5-users
Hey Eliot,

thank you for your answer. I have a follow-up question.

I know that there are more physical registers than architectural ones and that the architectural state should be set in the final commit stage.
So if the debug message linked in my earlier mail shows e.g.: "Setting int register 54 to 0x53000", this "register 54" should be a physical register and it can be used without setting the architectural state?

Do you know at which point in the O3 steps this physical register is set after an instruction?

Kind regards

Sent: Wednesday, 14 February 2024 at 17:47
From: "Eliot Moss"
To: "The gem5 Users mailing list"
Cc: reverent.gr...@web.de
Subject: Re: [gem5-users] Architectural state of registers - O3CPU

On 2/14/2024 11:19 AM, reverent.green--- via gem5-users wrote:
> Hello everyone,
> can someone give me a hint, where exactly in the code the architectural state of (load) instructions is getting set and
> becomes visible? I tried to trace instructions during the execution via log outputs, but got a bit lost during the IEW
> stage.
> I know, that instructions, which depend on specific registers will wait until the register is marked ready from an
> earlier usage. (https://github.com/gem5/gem5/blob/stable/src/cpu/o3/regfile.hh#L273)
> But is this already equivalent to the architectural state?
>
> And how is this handled during a wrong speculative execution because of the following rollback/squashing?
> Kind regards
> Robin

A typical out-of-order processor does register renaming, so there are
generally *many* more physical registers than architectural ones, and the
hardware maintains a dynamic mapping. If necessary, the architectural state
can be constructed, but generally would not be unless you're switching threads
or something. While IEW may update the registers (I believe), it is the
commit stage that makes the change "permanent".

Does that help?

Eliot Moss




[gem5-users] Re: Architectural state of registers - O3CPU

2024-02-14 Thread Eliot Moss via gem5-users

On 2/14/2024 11:19 AM, reverent.green--- via gem5-users wrote:

Hello everyone,
can someone give me a hint, where exactly in the code the architectural state of (load) instructions is getting set and 
becomes visible? I tried to trace instructions during the execution via log outputs, but got a bit lost during the IEW 
stage.
I know, that instructions, which depend on specific registers will wait until the register is marked ready from an 
earlier usage. (https://github.com/gem5/gem5/blob/stable/src/cpu/o3/regfile.hh#L273)

But is this already equivalent to the architectural state?

And how is this handled during a wrong speculative execution because of the 
following rollback/squashing?
Kind regards
Robin


A typical out-of-order processor does register renaming, so there are
generally *many* more physical registers than architectural ones, and the
hardware maintains a dynamic mapping.  If necessary, the architectural state
can be constructed, but generally would not be unless you're switching threads
or something.  While IEW may update the registers (I believe), it is the
commit stage that makes the change "permanent".

Does that help?

Eliot Moss
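
The renaming idea in the reply above can be sketched as a toy model with a speculative map and a committed map. This is a generic textbook-style illustration under stated assumptions, not how gem5's O3 rename stage is written: ToyRenamer and all of its fields are hypothetical names.

```python
class ToyRenamer:
    """Toy register renamer with speculative and committed mappings."""

    def __init__(self, num_arch, num_phys):
        # Initially architectural register i lives in physical register i.
        self.commit_map = list(range(num_arch))
        self.spec_map = list(range(num_arch))
        self.free = list(range(num_arch, num_phys))
        self.in_flight = []  # (arch, new_preg, old_preg) in program order

    def rename_dest(self, arch):
        """Allocate a fresh physical register for a destination."""
        new = self.free.pop(0)
        old = self.spec_map[arch]
        self.spec_map[arch] = new
        self.in_flight.append((arch, new, old))
        return new

    def commit_oldest(self):
        """Make the oldest in-flight mapping architecturally visible."""
        arch, new, old = self.in_flight.pop(0)
        self.commit_map[arch] = new
        self.free.append(old)  # the previous mapping is no longer needed

    def squash_all(self):
        """Undo every uncommitted mapping, youngest first."""
        while self.in_flight:
            arch, new, old = self.in_flight.pop()
            self.spec_map[arch] = old
            self.free.append(new)


if __name__ == "__main__":
    r = ToyRenamer(num_arch=4, num_phys=8)
    r.rename_dest(1)      # first writer of architectural register 1
    r.rename_dest(1)      # a younger, still speculative writer
    r.commit_oldest()     # only the first write becomes "permanent"
    r.squash_all()        # misspeculation: the younger write is discarded
    print(r.commit_map, r.spec_map)
```

In this model a physical register can be written and read by dependent instructions long before commit; a squash only has to restore the mapping and free the speculative registers.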


[gem5-users] Architectural state of registers - O3CPU

2024-02-14 Thread reverent.green--- via gem5-users
Hello everyone,

can someone give me a hint where exactly in the code the architectural state of (load) instructions is set and becomes visible? I tried to trace instructions during execution via log outputs, but got a bit lost during the IEW stage.

I know, that instructions, which depend on specific registers will wait until the register is marked ready from an earlier usage. (https://github.com/gem5/gem5/blob/stable/src/cpu/o3/regfile.hh#L273)

But is this already equivalent to the architectural state?

And how is this handled during a wrong speculative execution because of the following rollback/squashing?

Kind regards

Robin


[gem5-users] Re: Fwd: Simulation of Hybrid Memory in Gem5

2024-02-14 Thread claire8967--- via gem5-users
Sorry, can you post your code again, the file is no longer valid, thanks a lot!


[gem5-users] Attribute Error in build hybrid memory (configs/nvm/sweep_hybrid.py)

2024-02-14 Thread claire8967--- via gem5-users
Hello,
I would like to simulate hybrid memory in gem5,
but when I execute the file configs/nvm/sweep_hybrid.py,
I get the following message: Attribute reference on bound proxy
(Parent.clk_domain.getValue)
My code is attached below.

Thanks, best regards.
# Copyright (c) 2020 ARM Limited
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder.  You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import argparse
import math

import m5
from m5.objects import *
from m5.stats import periodicStatDump
from m5.util import addToPath

addToPath("../")

from common import (
    MemConfig,
    ObjectList,
)

# this script is helpful to sweep the efficiency of a specific memory
# controller configuration, by varying the number of banks accessed,
# and the sequential stride size (how many bytes per activate), and
# observe what bus utilisation (bandwidth) is achieved

parser = argparse.ArgumentParser()

hybrid_generators = {"HYBRID": lambda x: x.createHybrid}

# Use a single-channel DDR3-1600 x64 (8x8 topology) by default
parser.add_argument(
    "--nvm-type",
    default="NVM_2400_1x64",
    choices=ObjectList.mem_list.get_names(),
    help="type of memory to use",
)

parser.add_argument(
    "--mem-type",
    default="DDR4_2400_16x4",
    choices=ObjectList.mem_list.get_names(),
    help="type of memory to use",
)

parser.add_argument(
    "--nvm-ranks",
    "-n",
    type=int,
    default=1,
    help="Number of ranks to iterate across",
)

parser.add_argument(
    "--mem-ranks",
    "-r",
    type=int,
    default=2,
    help="Number of ranks to iterate across",
)

parser.add_argument(
    "--rd-perc", type=int, default=100, help="Percentage of read commands"
)

parser.add_argument(
    "--nvm-perc", type=int, default=100, help="Percentage of NVM commands"
)

parser.add_argument(
    "--mode",
    default="HYBRID",
    choices=hybrid_generators.keys(),
    help="Hybrid: Random DRAM + NVM traffic",
)

parser.add_argument(
    "--addr-map",
    choices=ObjectList.dram_addr_map_list.get_names(),
    default="RoRaBaCoCh",
    help="NVM address map policy",
)

args = parser.parse_args()

# at the moment we stay with the default open-adaptive page policy,
# and address mapping

# start with the system itself, using a multi-layer 2.0 GHz
# crossbar, delivering 64 bytes / 3 cycles (one header cycle)
# which amounts to 42.7 GByte/s per layer and thus per port
system = System(membus=IOXBar(width=32))
system.clk_domain = SrcClockDomain(
    clock="2.0GHz", voltage_domain=VoltageDomain(voltage="1V")
)

# set 2 ranges, the first, smaller range for DDR
# the second, larger (1024) range for NVM
# the NVM range starts directly after the DRAM range
system.mem_ranges = [
    AddrRange("128MB"),
    AddrRange(Addr("128MB"), size="1024MB"),
]

# do not worry about reserving space for the backing store
system.mmap_using_noreserve = True

# force a single channel to match the assumptions in the DRAM traffic
# generator
args.mem_channels = 1
args.external_memory_system = 0
args.hybrid_channel = True
MemConfig.config_mem(args, system)