[gem5-users] Separate cache line size
Hello,

Is there a way to change the cache line size for different levels of cache? For example, an L1 cache line size of 64 bytes and an L2 cache line size of 128 bytes.

If there is no direct way (changing some parameter from Python), what would be the issues with building this?

Things I know:

1. The fetch buffer size has to be less than or equal to the cache line size; otherwise gem5 panics (although I don't know why, and would like to know).

2. In src/mem/cache/cache.cc, the constructor for cache::basecache() is called with p.system->cacheLineSize(). I am guessing that changing this to a user-defined value will let me get a cache with whatever cache line size I want.

However, in Cache::satisfyRequest() in cache.cc I saw:

    // determine if this read is from a (coherent) cache or not
    if (pkt->fromCache()) {
        assert(pkt->getSize() == blkSize);

From the comment, it looks like this is for a request coming either from another cache at the same level (two L1 caches in two processors) or from an L1 to an L2. The assertion makes me think that separate cache line sizes will not work here.

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org
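The concern about the assertion can be illustrated with a toy model. This is a minimal sketch, not gem5 code: the Packet class and check_request function are hypothetical names, and the only rule modeled is the one in the quoted assertion (a request coming from another cache must be exactly one line of the receiving cache).

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical stand-in for a memory request; not gem5's Packet class."""
    size: int          # bytes requested
    from_cache: bool   # True if issued by an upstream (coherent) cache


def check_request(pkt: Packet, blk_size: int) -> bool:
    """Return True if a cache with line size blk_size would accept pkt under
    the rule in the quoted assertion: a request coming from another cache
    must cover exactly one line of the receiving cache."""
    if pkt.from_cache:
        return pkt.size == blk_size
    # Assumption for this toy model: CPU requests may be smaller than a line.
    return pkt.size <= blk_size


# An L1 with 64-byte lines misses and asks the next level for one L1 line.
l1_fill = Packet(size=64, from_cache=True)

same_size_ok = check_request(l1_fill, blk_size=64)     # L2 also uses 64 bytes
mixed_size_ok = check_request(l1_fill, blk_size=128)   # L2 uses 128 bytes

print(same_size_ok, mixed_size_ok)
```

In this model the 64-byte fill is accepted by a 64-byte L2 and rejected by a 128-byte L2, which is the situation the assertion would catch.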
[gem5-users] Dumping network traces from gem5 for Trace-based NoC simulation
Hi everyone,

I need to dump traffic traces from a program running in gem5 and replay them in another NoC simulator (e.g., Noxim). I have two questions, and any help or pointers would be appreciated.

1. I want to dump a trace of all inter-node traffic (e.g., a read request from an L1 cache to a directory at another node). Since I don't need a detailed NoC simulation for the trace (I only need the input/output of the interconnection network), I plan to use the simple network instead of Garnet. Each trace row should contain at least the source, destination, and packet size. Which files should I look at for placing the DPRINTF statements that dump the traces to a debug file?

2. I'm planning to run a dependency-based simulation using Netrace (https://www.cs.utexas.edu/~netrace/) on traffic traced from gem5. However, the Netrace library does not provide APIs or functions for preprocessing trace files to identify dependencies. Has anyone used Netrace with gem5 traffic traces and preprocessed them for dependencies? If so, which parameters did you dump, and how did you establish the dependencies between trace entries?

I would also like to hear about any other approaches to dumping network traces from gem5.

Thanks, and Regards,
Hansika Weerasena
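The post-processing side of question 1 could look roughly like the sketch below. It assumes a purely hypothetical debug line format, "tick: object: src=N dst=N size=N"; the real format depends entirely on the DPRINTF statement that is added, and the object name system.network is made up for the sample. It reduces matching lines to rows of tick, source, destination, and size.

```python
import csv
import io
import re

# Hypothetical line format; the real one depends on the DPRINTF you add.
LINE_RE = re.compile(
    r"^\s*(?P<tick>\d+):.*?src=(?P<src>\d+)\s+dst=(?P<dst>\d+)\s+size=(?P<size>\d+)"
)


def parse_trace(lines):
    """Yield (tick, src, dst, size) tuples from matching debug lines."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield tuple(int(m.group(k)) for k in ("tick", "src", "dst", "size"))


# Made-up sample of a debug file; non-matching lines are skipped.
sample = """\
1000: system.network: src=0 dst=3 size=8
1000: system.cpu: unrelated debug output
1500: system.network: src=3 dst=0 size=72
"""

records = list(parse_trace(sample.splitlines()))

# Write one CSV row per packet, ready to be converted for another simulator.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["tick", "src", "dst", "size"])
writer.writerows(records)
print(out.getvalue())
```

Keeping the tick in each row costs little and leaves room for ordering or dependency analysis later.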
[gem5-users] Re: Architectural state of registers - O3CPU
On 2/14/2024 1:14 PM, Eliot Moss via gem5-users wrote:
> On 2/14/2024 12:52 PM, reverent.green--- via gem5-users wrote:
>> I would like to add some additional information. The register number does vary in each iteration; sometimes it is above 100. So I think it should be the physical register value. If my understanding is correct, the physical register should be set during the IEW stage, before the instruction is committed or squashed at the last stage. Otherwise out-of-order execution wouldn't be possible. So in the end I am searching for the point at which the physical register is set and marked as ready for subsequent instructions that depend on this specific register.
>
> Yes, it makes sense that it is a physical register. For arithmetic, register-to-register moves, etc., it would be written in IEW. But for loads, it cannot be written until LSQ processing, which is later in the pipeline. I believe there is a notion of the register being *ready*, and it will be marked ready when it is written. Likewise, once all of an instruction's input registers are ready, that instruction may be executed (the instruction itself becomes ready).
>
> You can look for the 'writeback' function in lsq_unit.cc. It clearly has some relationship to IEW, but it explicitly calls completeAcc, which does the actual write into the register. The specific code for that comes from the instruction's template. This is necessarily so - consider the difference between loading a byte (say) vs a word, and sign- vs zero-extended values.

See also function writebackInsts in iew.cc.

EM
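The notion of a register being written and marked ready, after which its dependents may execute, can be sketched with a toy ready-bit tracker. This is an illustration only: the Scoreboard class and its methods are hypothetical names and do not mirror gem5's O3 classes, and the register numbers are borrowed from the thread purely as examples.

```python
class Scoreboard:
    """Toy ready-bit tracker for physical registers (hypothetical names)."""

    def __init__(self, num_phys_regs):
        self.values = [0] * num_phys_regs
        self.ready = [False] * num_phys_regs

    def writeback(self, reg, value):
        # Writing the result and marking the register ready happen together.
        self.values[reg] = value
        self.ready[reg] = True

    def can_issue(self, src_regs):
        # An instruction may execute once all of its sources are ready.
        return all(self.ready[r] for r in src_regs)


sb = Scoreboard(num_phys_regs=128)
sb.writeback(10, 7)              # an earlier ALU result, written early

consumer_srcs = [10, 54]         # consumer needs p10 and a loaded value in p54
before = sb.can_issue(consumer_srcs)

sb.writeback(54, 0x53000)        # load data arrives later
after = sb.can_issue(consumer_srcs)

print(before, after)
```

The consumer cannot issue until the load's destination register has been written, even though its other source was ready much earlier.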
[gem5-users] Re: Architectural state of registers - O3CPU
On 2/14/2024 12:52 PM, reverent.green--- via gem5-users wrote:
> I would like to add some additional information. The register number does vary in each iteration; sometimes it is above 100. So I think it should be the physical register value. If my understanding is correct, the physical register should be set during the IEW stage, before the instruction is committed or squashed at the last stage. Otherwise out-of-order execution wouldn't be possible. So in the end I am searching for the point at which the physical register is set and marked as ready for subsequent instructions that depend on this specific register.

Yes, it makes sense that it is a physical register. For arithmetic, register-to-register moves, etc., it would be written in IEW. But for loads, it cannot be written until LSQ processing, which is later in the pipeline. I believe there is a notion of the register being *ready*, and it will be marked ready when it is written. Likewise, once all of an instruction's input registers are ready, that instruction may be executed (the instruction itself becomes ready).

You can look for the 'writeback' function in lsq_unit.cc. It clearly has some relationship to IEW, but it explicitly calls completeAcc, which does the actual write into the register. The specific code for that comes from the instruction's template. This is necessarily so - consider the difference between loading a byte (say) vs a word, and sign- vs zero-extended values.

Regards - EM
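The point about byte vs word loads and sign- vs zero-extension can be made concrete with a small sketch. It is not gem5's completeAcc code; extend_loaded_value is a hypothetical helper, and little-endian byte order is an assumption. It only shows that the same bytes from memory produce different register values depending on the load instruction.

```python
def extend_loaded_value(raw: bytes, signed: bool) -> int:
    """Interpret little-endian bytes returned by memory as a register value."""
    return int.from_bytes(raw, byteorder="little", signed=signed)


memory_bytes = bytes([0x80, 0x01, 0x00, 0x00])   # what the cache returned

byte_signed = extend_loaded_value(memory_bytes[:1], signed=True)     # load byte, sign-extend
byte_unsigned = extend_loaded_value(memory_bytes[:1], signed=False)  # load byte, zero-extend
word_signed = extend_loaded_value(memory_bytes, signed=True)         # load 32-bit word

print(byte_signed, byte_unsigned, word_signed)
```

Because the final value depends on the instruction's width and signedness, the register write has to go through instruction-specific code.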
[gem5-users] Re: Architectural state of registers - O3CPU
I would like to add some additional information. The register number does vary in each iteration; sometimes it is above 100. So I think it should be the physical register value. If my understanding is correct, the physical register should be set during the IEW stage, before the instruction is committed or squashed at the last stage. Otherwise out-of-order execution wouldn't be possible. So in the end I am searching for the point at which the physical register is set and marked as ready for subsequent instructions that depend on this specific register.

Sent: Wednesday, 14 February 2024 at 18:35
From: "Eliot Moss"
To: "The gem5 Users mailing list"
Cc: reverent.gr...@web.de
Subject: Re: [gem5-users] Re: Architectural state of registers - O3CPU

On 2/14/2024 12:26 PM, reverent.green--- via gem5-users wrote:
> Hey Eliot,
> thank you for your answer. I have a follow-up question.
> I know that there are more physical registers than architectural ones and that the architectural state should be set in the final commit stage.
> So if the debug message linked in my earlier mail shows e.g. "Setting int register 54 to 0x53000", this "register 54" should be a physical register, and it can be used without setting the architectural state?
> Do you know at which point in the O3 steps this physical register is set after an instruction?

That's something where I'd need to dig into the code to make sure. However, the number 54 is fairly large, so my first impression is that it is a physical register number, not a logical (architectural) one. On the other hand, if you count up integer registers, floating point registers, vector registers, etc., 54 could be in the range of the architectural registers. I do know that if you request debug trace information from gem5, it will tend to refer to architectural registers.

I don't know precisely where the physical register is set, but my first thought is IEW - the W part stands for Writeback, i.e., when registers typically are written. However, loads are probably written later, since they are not computational but wait for a response from the cache. As I recall, the load/store queue processing is a separate step in the pipeline, coming later than IEW.

EM
[gem5-users] Re: Architectural state of registers - O3CPU
On 2/14/2024 12:26 PM, reverent.green--- via gem5-users wrote:
> Hey Eliot,
> thank you for your answer. I have a follow-up question.
> I know that there are more physical registers than architectural ones and that the architectural state should be set in the final commit stage.
> So if the debug message linked in my earlier mail shows e.g. "Setting int register 54 to 0x53000", this "register 54" should be a physical register, and it can be used without setting the architectural state?
> Do you know at which point in the O3 steps this physical register is set after an instruction?

That's something where I'd need to dig into the code to make sure. However, the number 54 is fairly large, so my first impression is that it is a physical register number, not a logical (architectural) one. On the other hand, if you count up integer registers, floating point registers, vector registers, etc., 54 could be in the range of the architectural registers. I do know that if you request debug trace information from gem5, it will tend to refer to architectural registers.

I don't know precisely where the physical register is set, but my first thought is IEW - the W part stands for Writeback, i.e., when registers typically are written. However, loads are probably written later, since they are not computational but wait for a response from the cache. As I recall, the load/store queue processing is a separate step in the pipeline, coming later than IEW.

EM
[gem5-users] Re: Architectural state of registers - O3CPU
Hey Eliot,

thank you for your answer. I have a follow-up question.

I know that there are more physical registers than architectural ones and that the architectural state should be set in the final commit stage. So if the debug message linked in my earlier mail shows e.g. "Setting int register 54 to 0x53000", this "register 54" should be a physical register, and it can be used without setting the architectural state?

Do you know at which point in the O3 steps this physical register is set after an instruction?

Kind regards

Sent: Wednesday, 14 February 2024 at 17:47
From: "Eliot Moss"
To: "The gem5 Users mailing list"
Cc: reverent.gr...@web.de
Subject: Re: [gem5-users] Architectural state of registers - O3CPU

On 2/14/2024 11:19 AM, reverent.green--- via gem5-users wrote:
> Hello everyone,
> can someone give me a hint where exactly in the code the architectural state of (load) instructions is set and becomes visible? I tried to trace instructions during execution via log outputs, but got a bit lost during the IEW stage.
> I know that instructions which depend on specific registers will wait until the register is marked ready from an earlier usage. (https://github.com/gem5/gem5/blob/stable/src/cpu/o3/regfile.hh#L273)
> But is this already equivalent to the architectural state?
>
> And how is this handled during wrong speculative execution, given the subsequent rollback/squashing?
> Kind regards
> Robin

A typical out-of-order processor does register renaming, so there are generally *many* more physical registers than architectural ones, and the hardware maintains a dynamic mapping. If necessary, the architectural state can be constructed, but generally would not be unless you're switching threads or something. While IEW may update the registers (I believe), it is the commit stage that makes the change "permanent".

Does that help?

Eliot Moss
[gem5-users] Re: Architectural state of registers - O3CPU
On 2/14/2024 11:19 AM, reverent.green--- via gem5-users wrote:
> Hello everyone,
> can someone give me a hint where exactly in the code the architectural state of (load) instructions is set and becomes visible? I tried to trace instructions during execution via log outputs, but got a bit lost during the IEW stage.
> I know that instructions which depend on specific registers will wait until the register is marked ready from an earlier usage. (https://github.com/gem5/gem5/blob/stable/src/cpu/o3/regfile.hh#L273)
> But is this already equivalent to the architectural state?
>
> And how is this handled during wrong speculative execution, given the subsequent rollback/squashing?
> Kind regards
> Robin

A typical out-of-order processor does register renaming, so there are generally *many* more physical registers than architectural ones, and the hardware maintains a dynamic mapping. If necessary, the architectural state can be constructed, but generally would not be unless you're switching threads or something. While IEW may update the registers (I believe), it is the commit stage that makes the change "permanent".

Does that help?

Eliot Moss
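The relationship between renaming, commit, and squashing described in this reply can be sketched with a toy rename map. This is a simplified illustration with hypothetical names (RenameState and its methods); it does not reflect how gem5's O3 rename and commit stages are actually implemented, and it squashes all uncommitted work at once.

```python
class RenameState:
    """Toy register renaming with a speculative and a committed map."""

    def __init__(self, num_arch, num_phys):
        # Initially architectural register i lives in physical register i.
        self.committed = list(range(num_arch))
        self.speculative = list(range(num_arch))
        self.free = list(range(num_arch, num_phys))
        self.in_flight = []            # allocated but not yet committed
        self.phys = [0] * num_phys     # physical register file

    def rename_dest(self, arch_reg):
        """Allocate a fresh physical register for a new result."""
        new_phys = self.free.pop(0)
        self.speculative[arch_reg] = new_phys
        self.in_flight.append(new_phys)
        return new_phys

    def commit(self, arch_reg, phys_reg):
        """Make the mapping permanent and recycle the old physical register."""
        self.in_flight.remove(phys_reg)
        self.free.append(self.committed[arch_reg])
        self.committed[arch_reg] = phys_reg

    def squash(self):
        """Discard all uncommitted mappings after a misspeculation."""
        self.free.extend(self.in_flight)
        self.in_flight.clear()
        self.speculative = list(self.committed)

    def arch_value(self, arch_reg):
        """Architecturally visible value: read through the committed map."""
        return self.phys[self.committed[arch_reg]]


rs = RenameState(num_arch=4, num_phys=8)

p = rs.rename_dest(1)                    # speculative write to arch register 1
rs.phys[p] = 0x53000                     # value sits in the physical file...
seen_before_commit = rs.arch_value(1)    # ...but is not architectural yet
rs.commit(1, p)
seen_after_commit = rs.arch_value(1)

q = rs.rename_dest(2)                    # wrong-path instruction
rs.phys[q] = 99
rs.squash()                              # rollback: mapping is thrown away
seen_after_squash = rs.arch_value(2)

print(seen_before_commit, hex(seen_after_commit), seen_after_squash)
```

In this model a physical register can hold a result that dependents could already read, while the architectural view changes only at commit and a squash simply drops the speculative mapping.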
[gem5-users] Architectural state of registers - O3CPU
Hello everyone,

can someone give me a hint where exactly in the code the architectural state of (load) instructions is set and becomes visible? I tried to trace instructions during execution via log outputs, but got a bit lost during the IEW stage.

I know that instructions which depend on specific registers will wait until the register is marked ready from an earlier usage. (https://github.com/gem5/gem5/blob/stable/src/cpu/o3/regfile.hh#L273)
But is this already equivalent to the architectural state?

And how is this handled during wrong speculative execution, given the subsequent rollback/squashing?

Kind regards
Robin
[gem5-users] Re: Fwd: Simulation of Hybrid Memory in Gem5
Sorry, could you post your code again? The file is no longer valid. Thanks a lot!
[gem5-users] Attribute Error in build hybrid memory (configs/nvm/sweep_hybrid.py)
Hello,

I would like to simulate hybrid memory with gem5, but when I run configs/nvm/sweep_hybrid.py I get the following message:

Attribute reference on bound proxy (Parent.clk_domain.getValue)

My code is attached below.

Thanks, best regards.

# Copyright (c) 2020 ARM Limited
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder. You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import argparse
import math

import m5
from m5.objects import *
from m5.stats import periodicStatDump
from m5.util import addToPath

addToPath("../")

from common import (
    MemConfig,
    ObjectList,
)

# this script is helpful to sweep the efficiency of a specific memory
# controller configuration, by varying the number of banks accessed,
# and the sequential stride size (how many bytes per activate), and
# observe what bus utilisation (bandwidth) is achieved

parser = argparse.ArgumentParser()

hybrid_generators = {"HYBRID": lambda x: x.createHybrid}

# Use a single-channel DDR3-1600 x64 (8x8 topology) by default
parser.add_argument(
    "--nvm-type",
    default="NVM_2400_1x64",
    choices=ObjectList.mem_list.get_names(),
    help="type of memory to use",
)

parser.add_argument(
    "--mem-type",
    default="DDR4_2400_16x4",
    choices=ObjectList.mem_list.get_names(),
    help="type of memory to use",
)

parser.add_argument(
    "--nvm-ranks",
    "-n",
    type=int,
    default=1,
    help="Number of ranks to iterate across",
)

parser.add_argument(
    "--mem-ranks",
    "-r",
    type=int,
    default=2,
    help="Number of ranks to iterate across",
)

parser.add_argument(
    "--rd-perc", type=int, default=100, help="Percentage of read commands"
)

parser.add_argument(
    "--nvm-perc", type=int, default=100, help="Percentage of NVM commands"
)

parser.add_argument(
    "--mode",
    default="HYBRID",
    choices=hybrid_generators.keys(),
    help="Hybrid: Random DRAM + NVM traffic",
)

parser.add_argument(
    "--addr-map",
    choices=ObjectList.dram_addr_map_list.get_names(),
    default="RoRaBaCoCh",
    help="NVM address map policy",
)

args = parser.parse_args()

# at the moment we stay with the default open-adaptive page policy,
# and address mapping

# start with the system itself, using a multi-layer 2.0 GHz
# crossbar, delivering 64 bytes / 3 cycles (one header cycle)
# which amounts to 42.7 GByte/s per layer and thus per port
system = System(membus=IOXBar(width=32))
system.clk_domain = SrcClockDomain(
    clock="2.0GHz", voltage_domain=VoltageDomain(voltage="1V")
)

# set 2 ranges, the first, smaller range for DDR
# the second, larger (1024) range for NVM
# the NVM range starts directly after the DRAM range
system.mem_ranges = [
    AddrRange("128MB"),
    AddrRange(Addr("128MB"), size="1024MB"),
]

# do not worry about reserving space for the backing store
system.mmap_using_noreserve = True

# force a single channel to match the assumptions in the DRAM traffic
# generator
args.mem_channels = 1
args.external_memory_system = 0
args.hybrid_channel = True
MemConfig.config_mem(args, system)