[gem5-users] Branch predictor
Hi all, I have observed that call and return instructions are predicted as if they were conditional branches. Such predictions even modify the branch history register. As a consequence, these instructions are often predicted not-taken and the squash function must be called after they are executed. Moreover, when a call instruction is predicted taken and no target address is found in the BTB, the prediction is changed to not-taken and the RAS entry associated with this call is removed. The squash function is called after the call instruction is executed to update the BTB, but the RAS is not updated, so the return instruction will get a wrong address. Am I missing something? Thanks, Adrian ___ gem5-users mailing list gem5-users@gem5.org http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
[gem5-users] Hit time simulation
Hi, Does gem5 also simulate estimates for cache hit time?
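In gem5's classic memory system, cache hit latency is a configurable parameter rather than something the simulator derives, so you supply your own estimate (e.g. from CACTI). A minimal config-fragment sketch, assuming the 2014-era BaseCache parameter names (`hit_latency`, `response_latency`, in cycles); the names may differ in other gem5 versions:

```python
# Config-script fragment (not standalone): timing in the classic
# cache model is set via parameters, here with illustrative values.
from m5.objects import BaseCache

class L1DCache(BaseCache):
    size = '32kB'
    assoc = 2
    hit_latency = 2        # cycles charged on a hit (assumed name)
    response_latency = 2   # cycles to return the response
    mshrs = 4
    tgts_per_mshr = 20
```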
Re: [gem5-users] Fwd: Sharing L2 cache
Hello Yujie, Here is the code I wrote for enabling private L2 caches. Unfortunately, I couldn't find the CacheConfig.py for sharing L2 in an even fashion, but I did it the same way. I hope this helps you. Best regards, Kumail Ahmed TU Kaiserslautern, Germany

On Thu, Nov 13, 2014 at 2:19 AM, Yujie Chen cyjseag...@163.com wrote: hello Kumail, I modified CacheConfig.py as you guided, but unfortunately, when I execute the command:

build/X86/gem5.opt \
    --stats-file=DRAM_state.log \
    configs/example/updated_se.py \
    --mem-type=LPDDR2_S4_1066_x32 \
    --caches \
    --cpu-type=detailed --num-cpus=4 \
    --l1i_size 32kB --l1d_size 32kB \
    --l2cache --l2_size 4MB \
    --l3cache --l3_size 8MB \
    -c cyj_test/bechmarks/hello

I got these warnings:

warn: add_child('l2'): child 'l2' already has parent
warn: add_child('tol2bus'): child 'tol2bus' already has parent
warn: add_child('l2'): child 'l2' already has parent
warn: add_child('tol2bus'): child 'tol2bus' already has parent

*and soon after, it exited unexpectedly*. I have no idea what happened. Can you explain it in more detail, or show me your configuration file? The attachment is my configuration file. Thanks for your response ^_^. Best wishes to you! -- YujieChen School of Computer Science and Technology Cluster and Grid Computing Lab Services Computing Technology and System Lab Huazhong University of Science and Technology Wuhan, 430074, China Email:yujiechen_h...@163.com

At 2014-11-12 22:36:24, Kumail Ahmed kumai...@gmail.com wrote: Hello Yujie, In the for-loop that allocates the CPUs, you can use the modulus operator to apply L2 caches evenly; you will also need to instantiate the private L1 caches in a way that lets you connect them to system.tol2bus.
Thank you for your answer :) Kumail Ahmed TU Kaiserslautern, Germany

On Wed, Nov 12, 2014 at 3:13 PM, Yujie Chen cyjseag...@163.com wrote: hello Kumail, you can look at the original cache configuration file, namely configs/common/CacheConfig.py. In the unmodified CacheConfig.py the L2 cache is shared among all CPUs, and the L1 caches are all private. You can create an L3 cache and an L3 cache bus based on BaseCache and CoherentBus, and define the L3 cache's cpu_side and mem_side. You can find a detailed answer on the Internet. By the way, I'm also confused about how to share 2 L2 caches between 4 cores so that every two CPUs share one L2 cache. You said you made it work; how did you do it? I'll appreciate it if you respond. Thanks very much! -- YujieChen School of Computer Science and Technology Cluster and Grid Computing Lab Services Computing Technology and System Lab Huazhong University of Science and Technology Wuhan, 430074, China Email:yujiechen_h...@163.com

At 2014-11-11 21:01:32, Kumail Ahmed via gem5-users gem5-users@gem5.org wrote: Thank you very much Andreas :) I did it! Can you tell me how I can add an L3 cache in the classic memory model? Do I have to create a new L3 cache class and share it on the L2 bus? Thanks again, Kumail Ahmed Masters Student TU Kaiserslautern, Germany

On Tue, Nov 11, 2014 at 1:56 PM, Andreas Hansson andreas.hans...@arm.com wrote: Hi Kumail, The crossbar in gem5 supports address striping, so you can create a "toL2Bus" that interleaves between two L2 caches. Have a look at configs/common/MemConfig.py for how the interleaving is configured (for the memory channels). You should be able to do something similar. Andreas

From: Kumail Ahmed via gem5-users gem5-users@gem5.org Reply-To: Kumail Ahmed kumai...@gmail.com, gem5 users mailing list gem5-users@gem5.org Date: Tuesday, 11 November 2014 10:45 To: gem5-users@gem5.org Subject: [gem5-users] Sharing L2 cache Hello, How can I share two L2 caches between 4 CPU cores in gem5?
I guess I have to change the code in CacheConfig.py. Can someone help me with this? Thanks, Kumail Ahmed
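Kumail's modulus suggestion above can be sketched roughly as follows inside CacheConfig.py's CPU loop. The class names (L1Cache, L2Cache, CoherentBus), port names, and helper methods follow the 2014-era classic configs and are assumptions to adapt, not exact gem5 code:

```python
# Hypothetical CacheConfig.py fragment: 4 CPUs sharing 2 L2 caches,
# two CPUs per L2, paired up with the modulus operator.
num_l2 = 2
system.tol2buses = [CoherentBus() for _ in range(num_l2)]
system.l2caches = [L2Cache(size='1MB', assoc=8) for _ in range(num_l2)]

for j in range(num_l2):
    # each L2 sits between its own bus and the memory bus
    system.l2caches[j].cpu_side = system.tol2buses[j].master
    system.l2caches[j].mem_side = system.membus.slave

for i in range(options.num_cpus):
    # private split L1 caches per core
    system.cpu[i].addPrivateSplitL1Caches(
        L1Cache(size=options.l1i_size),
        L1Cache(size=options.l1d_size))
    # i % num_l2 spreads the CPUs evenly over the two L2 buses
    system.cpu[i].connectAllPorts(system.tol2buses[i % num_l2],
                                  system.membus)
```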
Re: [gem5-users] Branch predictor
Hi, I suspect one of a few things might be behind what you think you are seeing.

1) "I have observed that call and return instructions are predicted as if these instructions were conditional branches."

The ARMv7 ISA actually does have conditional calls and returns (a consequence of letting almost any instruction be predicated). The disassembly generated by gem5 often lacks these conditional specifiers, even though the instructions are executed properly. If this is some other ISA (like x86), I'd verify that the IsCondCtrl/IsUncondCtrl flags on the given instruction are properly set; this could be an ISA-specific error. This code in the predictor is supposed to guard against that case, but it's up to the ISA implementer to do things right:

    if (inst->isUncondCtrl()) {
        DPRINTF(Branch, "[tid:%i]: Unconditional control.\n", tid);
        pred_taken = true;
        // Tell the BP there was an unconditional branch.
        uncondBranch(bp_history);
    } else {
        ++condPredicted;
        pred_taken = lookup(pc.instAddr(), bp_history);
        DPRINTF(Branch, "[tid:%i]: [sn:%i] Branch predictor predicted %i "
                "for PC %s\n", tid, seqNum, pred_taken, pc);
    }

2) "Moreover, when a call instruction is predicted taken and no target address is found in the BTB, the prediction is changed to not-taken and the RAS entry associated with this call is removed."

The RAS maintenance is a fun bit of code in the predictor, and bugs have been discovered and fixed in it recently. It seems the fix is just to push the proper new RAS entry in the squash function for the call. This has probably survived so long because it is unlikely to be exercised often (since calls will normally hit in the BTB).

On Thu, Nov 13, 2014 at 4:51 AM, Adrián Colaso Diego via gem5-users gem5-users@gem5.org wrote: Hi all, I have observed that call and return instructions are predicted as if they were conditional branches. Such predictions even modify the branch history register. As a consequence, these instructions are often predicted not-taken and the squash function must be called after they are executed. Moreover, when a call instruction is predicted taken and no target address is found in the BTB, the prediction is changed to not-taken and the RAS entry associated with this call is removed. The squash function is called after the call instruction is executed to update the BTB, but the RAS is not updated, so the return instruction will get a wrong address. Am I missing something? Thanks, Adrian
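The RAS bug described above can be made concrete with a toy model: a predicted call pushes its return address, a BTB miss that flips the prediction pops it, and the squash that later fixes the BTB must also re-push the entry or the matching return mispredicts. This is an illustration of the invariant, not gem5's actual bpred code:

```python
# Toy return-address-stack model illustrating the maintenance bug.
class RAS:
    def __init__(self):
        self.stack = []

    def push(self, ret_addr):   # on a predicted-taken call
        self.stack.append(ret_addr)

    def pop(self):              # on a predicted return (or on undo)
        return self.stack.pop() if self.stack else None

ras = RAS()
ras.push(0x1004)   # call at 0x1000 predicts a return to 0x1004
ras.pop()          # BTB miss: prediction flipped, entry dropped
# ... the call actually executes taken; squash updates the BTB ...
ras.push(0x1004)   # the fix: squash must also restore the RAS entry
assert ras.pop() == 0x1004   # the return now gets the right address
```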
[gem5-users] Interrupting execution in gem5
Hello, How can I interrupt a program during its execution in SE mode? Best regards, Kumail
Re: [gem5-users] How to compile gem5 to statically linked program?
You'll probably have to modify the linker flags toward the bottom of src/SConscript. Steve

On Wed, Nov 12, 2014 at 4:44 PM, ni...@outlook.com via gem5-users gem5-users@gem5.org wrote: Thanks. Do you mean adding this in the SConstruct file, or just on the command line? By the way, I tried adding the -static option to CCFLAGS, but it did not work as I expected. -- nifan

*From:* Kumail Ahmed kumai...@gmail.com *Date:* 2014-11-13 00:19 *To:* ni...@outlook.com; gem5 users mailing list gem5-users@gem5.org *Subject:* Re: [gem5-users] How to compile gem5 to statically linked program? Hello Nifan, try using the -shared flag in SCons. I haven't tried it myself yet.

On Wed, Nov 12, 2014 at 10:14 AM, ni...@outlook.com via gem5-users gem5-users@gem5.org wrote: Hi, all: Does anyone know how to compile gem5 statically so it can run on other machines without recompilation? Thanks in advance. -- nifan
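Following Steve's pointer, a static build usually comes down to adding -static to the final link flags near the bottom of src/SConscript. A sketch of the kind of change, assuming the build environment variable is named `main` as in gem5's SCons scripts of that era:

```python
# Hypothetical src/SConscript tweak: force a statically linked gem5.
# Note glibc's NSS machinery resists fully static linking, so linker
# warnings about getaddrinfo/dlopen are expected and usually benign.
main.Append(LINKFLAGS=['-static', '-static-libgcc', '-static-libstdc++'])
```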
[gem5-users] Threading in Gem5
Hi! I am new to the gem5 world and I have some questions about running multi-threaded application simulations.

1. How does one pin the threads of the application to the cores of the simulator? I do not want SMT; I want multiple cores, with each application thread on its own core.
2. I want the same thing with the threads, but now on an SMT core, so that I can measure contention.
3. Is there a way to remove the caches from the memory hierarchy? I want the processor to go directly to main memory.

Thanks
[gem5-users] DRAMCTRL: Seeing unusual behaviour with the FR-FCFS scheduler
Hi Users, Consider the following scenario:

Read0 Read1 Read2 Read3 Read4 Read5 Read6 Read7 Read8 Read9 Read10 Read11

There are 12 reads in the read queue, numbered in order of arrival. Read0 to Read3 access the same row of Bank1, Read4 accesses Bank0, Read5 to Read8 access the same row of Bank2, and Read9 to Read11 access the same row of Bank3. According to the FR-FCFS scheduler, even though there is only a single request (Read4) to Bank0, it should be scheduled right after Read0 to Read3, because within the window of Read0-Read3, Read4 would have completed its precharge and activate and be ready to schedule. Though Read5 and Read9 are also ready, Read4 should be scheduled as the next row hit, per FCFS order. However, I see different behaviour: Read4 is scheduled only after all the other row hits to Bank2 and Bank3 are scheduled. I also noticed from the debug prints that Read4 never becomes a row hit. Are we failing to mark the read as a row hit after its precharge and activate? I am trying to figure this out. Is my understanding correct? Thanks, Prathap
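The expected arbitration can be stated as a toy model: among the queued requests whose bank has the right row open, prefer the oldest (first-ready); if nothing is a row hit, fall back to plain arrival order (FCFS). This is an illustration of the policy, not gem5's actual dram_ctrl code:

```python
# Toy FR-FCFS arbiter: requests are (arrival, bank, row) tuples,
# open_rows maps bank -> currently open row.
def fr_fcfs(queue, open_rows):
    hits = [r for r in queue if open_rows.get(r[1]) == r[2]]
    candidates = hits if hits else queue   # prefer row hits
    return min(candidates, key=lambda r: r[0])  # oldest first

# After Read0-Read3 drain, Bank0 has Read4's row X open,
# so Read4 should be picked ahead of the younger hits.
queue = [(4, 0, 'X'), (5, 2, 'B'), (9, 3, 'C')]
open_rows = {0: 'X', 1: 'A'}
print(fr_fcfs(queue, open_rows))
```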
[gem5-users] Gem5 freezing with x86 timing cpu
Hi all, I have this weird problem where gem5 starts executing a benchmark (some PARSEC and some others) or starts booting in FS mode but then just freezes. I was wondering how you would recommend getting to the bottom of this. Can I figure out what it's doing? I know I can attach a debugger, but this happens after many hours, so with the debugger it'll be even slower, and I'm not sure gdb's backtrace will really help me. This happens with the latest development and stable versions. Here is how I run gem5 for FS (similar for SE):

./build/X86_MESI_Two_Level/gem5.opt --remote-gdb-port=0 \
    --outdir=results/$1_none_x86_dir --dump-config=$1_config \
    --redirect-stdout --redirect-stderr --stdout-file=m5sim.out \
    --stderr-file=$1_err --stats-file=$1_stats \
    configs/example/fs.py --script=./runscripts/$1 \
    --disk-image=all_benchmarks.img --kernel=x86_64-vmlinux-2.6.28.4-smp \
    --cpu-type=timing --caches --prefetcher-type=none --l2cache \
    --sys-clock=2GHz --cpu-clock=2GHz --num-dirs=8 --num-l2caches=8 \
    --num-l3caches=4 --l2_size=4MB --l3_size=32MB --l1d_assoc=2 \
    --l1i_assoc=2 --l2_assoc=8 --l3_assoc=16 --mem-type=DDR3_1600_x64 \
    --cacheline_size=64 --checkpoint-dir=checkpoints --num-cpus=64

I tried running with atomic CPUs and that seems to work, but with x86 I can't restore with a different CPU type, nor use switchcpu during execution, as I understand the current limitations. Plus I need timing for my SE runs. The only modification I really made is to move the prefetcher to the L2 by editing CacheConfig.py:

    if options.prefetcher_type == "stride":
        system.l2.prefetch_on_access = 'true'
        system.l2.prefetcher = StridePrefetcher(degree=8, latency=1)
    elif options.prefetcher_type == "tagged":
        system.l2.prefetch_on_access = 'true'
        system.l2.prefetcher = TaggedPrefetcher(degree=8, latency=1)
    elif options.prefetcher_type == "none":
        system.l2.prefetch_on_access = 'false'
    else:
        print "Unknown Prefetcher Type"
        sys.exit(1)

Freezing happens with all prefetchers though.
Notice that I don't use Ruby. Thank you in advance, George M
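A few ways to see what a frozen gem5 process is doing without slowing the whole run; paths, PIDs, and tick values below are illustrative:

```shell
# 1. Attach gdb only AFTER the hang, so normal execution stays fast;
#    a backtrace of all threads often shows the stuck event loop.
gdb -p "$(pgrep -f gem5.opt)" -batch -ex 'thread apply all bt'

# 2. Re-run with instruction tracing enabled only near the hang,
#    using gem5's built-in debug flags (--debug-start takes a tick).
./build/X86_MESI_Two_Level/gem5.opt \
    --debug-flags=Exec --debug-start=123456789000 \
    configs/example/fs.py ...

# 3. Check whether simulated time is still advancing at all by
#    watching the redirected output files.
tail -f results/run_none_x86_dir/m5sim.out
```

If the gdb backtrace shows the event queue still cycling, the simulation is likely livelocked in the simulated system (e.g. the kernel spinning) rather than gem5 itself being hung.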