Re: [gem5-users] Errors on compiling gem5.fast
Hi Nishant,

That's unfortunate. You can always compile without LTO by adding --no-lto to the scons command line. I suspect it is an issue with your compiler. Could you verify that it is indeed built with LTO support (and linker-plugin support)?

Andreas

On 8/26/14, 2:49 AM, Nishant Borse via gem5-users gem5-users@gem5.org wrote:

Hi,

I am unable to build gem5.fast; it is terminating with the following error:

g++: error: -fuse-linker-plugin is not supported in this configuration

I am using gcc 4.8.2. I ran into a similar error about --plugin with gcc 4.6.3. Is anyone aware of why I would be running into this while compiling gem5.fast? I was able to build gem5.opt without any such issues.

Thanks,
Nishant Borse

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590. ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782.

___ gem5-users mailing list gem5-users@gem5.org http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
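[Editor's sketch, not an official gem5 script.] Andreas's suggestion to verify linker-plugin support can be checked directly: compile a trivial program with the flags gem5.fast's LTO build passes (-flto together with -fuse-linker-plugin) and see whether the compiler accepts them. The helper name below is my own.

```python
# Check whether the local g++ accepts -flto with -fuse-linker-plugin,
# which gem5.fast's default LTO build requires.
import os
import subprocess
import tempfile

def supports_linker_plugin(compiler="g++"):
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "t.cc")
        with open(src, "w") as f:
            f.write("int main() { return 0; }\n")
        try:
            # Compile with the same flag combination gem5.fast uses.
            r = subprocess.run(
                [compiler, "-flto", "-fuse-linker-plugin", src,
                 "-o", os.path.join(d, "t")],
                capture_output=True,
            )
        except FileNotFoundError:
            return False  # compiler not installed at all
        return r.returncode == 0

ok = supports_linker_plugin()
print(ok)
```

If this prints False, the toolchain was built without linker-plugin support and --no-lto is the way to go.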
Re: [gem5-users] Gem5 on multiple cores
Thank you, Andreas. (*moved to gem5-users :)

On Tue, Aug 26, 2014 at 8:39 AM, Andreas Hansson andreas.hans...@arm.com wrote:

Hi Hussain,

I'd suggest asking on the gem5-users list for everyone's benefit. Multi-threading invariably comes at a cost, and if you want to run, say, 10 experiments, they are embarrassingly parallel. As one of the main purposes of gem5 is design-space exploration, most users will be running 10s or 100s of experiments. Thus, instead of making gem5 multi-threaded and "throwing performance away", it is kept efficient as a single-threaded simulator, and I suggest running your experiments in parallel to make use of your many cores/servers, etc.

Andreas

From: Hussain Asad x7xcloudstr...@gmail.com
Date: Tuesday, 26 August 2014 04:13
To: Andreas Hansson andreas.hans...@arm.com
Subject: Gem5 on multiple cores

Hi Andreas,

I have a quick question. I am running a gem5 build on a Core i7 system, but gem5 uses just one core of the available 8 (4 cores + 4 threads). Is this feature not yet implemented, or am I compiling the system incorrectly? I would assume that if it were using all my CPU cores, the simulation would be much faster. I am running gem5 on Ubuntu 14 LTS, Core i7, 8 GB of RAM at the moment. Should I move my work to the University servers? Would it be faster on a server system?

Thanks,
Best Regards,
Hussain
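[Editor's sketch, not an official gem5 tool.] Andreas's advice above amounts to running N independent gem5 processes, one per core, and letting the OS schedule them. A minimal way to drive that from Python (the command lines below are placeholders, not real gem5 invocations):

```python
# Run several independent experiments in parallel, one process per core.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run(cmd):
    # Threads are fine here: each thread just blocks on its own
    # subprocess, and the OS spreads the processes across cores.
    return subprocess.run(cmd, capture_output=True).returncode

# Placeholder commands; in practice each entry would be a gem5 command
# line such as ["./build/X86/gem5.opt", "configs/example/se.py", ...].
cmds = [[sys.executable, "-c", f"print('experiment {i}')"] for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    codes = list(pool.map(run, cmds))

print(codes)  # -> [0, 0, 0, 0] when every run exits cleanly
```

Each run should also be given its own --outdir (or equivalent) so the stats files do not collide.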
Re: [gem5-users] Gem5 on multiple cores
I'll mention that gem5 does have the foundation for parallelizing a single simulation across multiple cores; see, for example, http://repo.gem5.org/gem5/rev/2cce74fe359e. However, if you want to model a non-trivial configuration (i.e., one where there is communication between threads), then you have to insert synchronization, and that does limit your speedup, as Andreas has mentioned.

Steve

On Tue, Aug 26, 2014 at 3:03 AM, Hussain Asad via gem5-users gem5-users@gem5.org wrote: [quoted message trimmed; see the previous message in this thread]
[gem5-users] O3 fetch throughput when i-cache hit latency is more than 1 cycle
Hi,

Looking at the code for the fetch unit in O3, I realized that the fetch unit does not take advantage of non-blocking i-caches: it does not initiate a new i-cache request while it is waiting for an i-cache response. Since the O3 fetch unit does not pipeline i-cache requests, fetch throughput drops significantly when the i-cache hit latency is more than 1 cycle. I expected the fetch unit to be able to initiate a new i-cache request each cycle (based on the BTB target or the next sequential fetch address) even while it is waiting for i-cache responses. Any thoughts on this? I understand a large fetch buffer can mitigate this to some degree...

Thanks,
Amin
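[Editor's note.] The throughput loss Amin describes can be seen with a back-of-the-envelope model (my own sketch, not gem5 code): a blocking fetch unit delivers one fetch-width block per i-cache hit latency, while a pipelined one can deliver a block every cycle.

```python
# Idealized fetch bandwidth model: instructions delivered per cycle,
# ignoring branches, misses, and fetch-buffer effects.
def fetch_throughput(fetch_width, icache_latency, pipelined):
    # Blocking fetch: only one request outstanding, so a block of
    # fetch_width instructions arrives every icache_latency cycles.
    # Pipelined fetch: a new request every cycle hides the latency.
    return fetch_width if pipelined else fetch_width / icache_latency

print(fetch_throughput(8, 1, False))            # 8.0  (1-cycle hit: no loss)
print(round(fetch_throughput(8, 3, False), 2))  # 2.67 (blocking, 3-cycle hit)
print(fetch_throughput(8, 3, True))             # 8    (pipelined hides latency)
```

So with a 3-cycle i-cache hit, a non-pipelined fetch unit sustains only a third of its width, which is exactly the degradation described above.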
Re: [gem5-users] O3 fetch throughput when i-cache hit latency is more than 1 cycle
Yep, I've thought of the need for a fully pipelined fetch as well. However, my current method is to fake longer instruction cache latencies by leaving the delay at 1 cycle and making up for it by adding additional fetchToDecode delay. This makes the front-end latency and branch mispredict penalty the same (for branches resolved at decode as well as at execute). I haven't yet seen a case where adding this additional latency later, to make up for the lack of real instruction cache latency, makes much of a difference.

On Tue, Aug 26, 2014 at 11:32 AM, Amin Farmahini via gem5-users gem5-users@gem5.org wrote: [quoted message trimmed; see the previous message in this thread]
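[Editor's sketch.] In a gem5 config, Mitch's workaround might look like the fragment below. The parameter names (fetchToDecodeDelay on the O3 CPU, hit_latency on the classic cache) follow the 2014-era models and should be checked against your gem5 version; the 3-cycle i-cache is an assumed example.

```python
# Keep the i-cache hit latency at 1 cycle so fetch stays effectively
# pipelined, and fold the remaining cycles of the intended i-cache
# latency into the fetch-to-decode delay. The total front-end depth
# (and thus the branch mispredict penalty) is unchanged.
cpu = DerivO3CPU()
cpu.icache = L1Cache(size='32kB', hit_latency=1)  # stay at 1 cycle

extra_icache_cycles = 2                            # modelling a 3-cycle i-cache
cpu.fetchToDecodeDelay = 1 + extra_icache_cycles   # base delay + the difference
```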
[gem5-users] Kernel version vs Gem5 version
Hi all,

It seems that using the kernel version x86_64-vmlinux-2.6.22.9.smp may have solved the problem I posted about in this thread: http://www.mail-archive.com/gem5-users@gem5.org/msg10387.html

However, I am using the latest gem5 version (gem5-stable-aaf017eaad7d), and I have only tested the atomic CPU without any checkpointing or fast-forwarding. Are there any problems related to using an older kernel version (such as x86_64-vmlinux-2.6.22.9.smp) with gem5?

Best,
Fulya
[gem5-users] How to add shared nonblocking L3 cache in gem5?
Hi Users,

I am new to gem5, and I want to add a non-blocking shared last-level cache (L3). I can see L3 cache options in Options.py with default values set; however, there is no entry for L3 in Caches.py and CacheConfig.py. Would extending Caches.py and CacheConfig.py be enough to create an L3 cache?

Thanks,
Prathap
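[Editor's sketch.] An L3 class in configs/common/Caches.py might look like the fragment below. The parameter names follow the 2014-era classic BaseCache model and the values are only illustrative defaults; verify both against your gem5 version.

```python
# Candidate L3 definition alongside the existing L1/L2 classes in
# configs/common/Caches.py.
from m5.objects import BaseCache

class L3Cache(BaseCache):
    size = '4MB'
    assoc = 16
    hit_latency = 20
    response_latency = 20
    mshrs = 32          # multiple MSHRs are what make the cache non-blocking
    tgts_per_mshr = 12
```

Defining the class is not enough by itself: CacheConfig.py (or your own script) must also instantiate the L3 between the L2s and the memory bus, including an additional crossbar connecting the L2s to the L3, mirroring how the existing L2 is wired up there.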
Re: [gem5-users] O3 fetch throughput when i-cache hit latency is more than 1 cycle
Thanks for the response, Mitch. It seems like a nice way to fake a pipelined fetch.

Amin

On Tue, Aug 26, 2014 at 10:54 AM, Mitch Hayenga mitch.hayenga+g...@gmail.com wrote: [quoted message trimmed; see the previous messages in this thread]