Hi Saptarshi,

Command line for *Case 1:*

/mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/build/X86/gem5.fast --outdir=/mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/RUNS/14_single/m5out_cactusADM --remote-gdb-port=0 /mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/configs/example/se_AMD_multicore.py -n 1 --cpu-type=detailed --caches --l2cache --num-l2caches=1 --l1d_size=64kB --l1i_size=64kB --l1d_assoc=2 --l1i_assoc=2 --l2_size=1MB --l2_assoc=16 --fast-forward=2000000000 --bench="cactusADM" --max_total_inst=100000000 --clock=2.1GHz
Command line for *Case 2:*

/mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/build/X86/gem5.fast --outdir=/mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/RUNS/14_hom/m5out_cactusADM-cactusADM-cactusADM-cactusADM --remote-gdb-port=0 /mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/configs/example/se_AMD_multicore.py -n 4 --cpu-type=detailed --caches --l2cache --num-l2caches=4 --l1d_size=64kB --l1i_size=64kB --l1d_assoc=2 --l1i_assoc=2 --l2_size=1MB --l2_assoc=16 --fast-forward=2000000000 --bench="cactusADM-cactusADM-cactusADM-cactusADM" --max_total_inst=400000000 --clock=2.1GHz

To clarify, I turned off remote gdb because I was getting an error about listeners when running on the cluster, and I am not doing anything with gdb. My gem5 is modified such that I count the number of instructions executed after switching (which I implemented in the O3 CPU definition). The --max_total_inst option determines the total number of instructions executed across all cores after switching, which is why it is set to 400 million in Case 2. I also checked and verified this by looking at the stats.txt file (a rough sketch of pulling these numbers out of stats.txt is included below the quoted thread). Let me know if you also need to check my se_AMD_multicore.py file; it has been modified to add all the SPEC benchmarks and their binary and input file paths.

Thanks,
Fulya

On Wed, Nov 6, 2013 at 11:03 AM, Saptarshi Mallick <[email protected]> wrote:
> Hello Fulya,
> Can you please give the command line for both the cases which you run for
> getting the results? I also had the same kind of problem; maybe there is
> some mistake in the command line which we are using.
>
> On Tuesday, November 5, 2013, Fulya Kaplan <[email protected]> wrote:
> > Number of committed instructions (system.switch_cpus.committedInsts_total) for:
> > Case 1: 100,000,000
> > Case 2: cpu0 -> 100,045,354
> >         cpu1 -> 100,310,197
> >         cpu2 ->  99,884,333
> >         cpu3 ->  99,760,117
> >
> > Number of cycles for:
> > Case 1: 150,570,516
> > Case 2: 139,230,042
> >
> > For both cases, the CPUs switch to detailed mode at instruction #2 billion.
> > All the reported data correspond to the 100 million instructions in
> > detailed mode.
> >
> > From the config.ini files I can see that separate L2 caches are defined
> > for Case 2. My modified CacheConfig.py file, which gives each core a
> > private L2 cache, looks like:
> >
> > def config_cache(options, system):
> >     if options.cpu_type == "arm_detailed":
> >         try:
> >             from O3_ARM_v7a import *
> >         except:
> >             print "arm_detailed is unavailable. Did you compile the O3 model?"
> >             sys.exit(1)
> >         dcache_class, icache_class, l2_cache_class = \
> >             O3_ARM_v7a_DCache, O3_ARM_v7a_ICache, O3_ARM_v7aL2
> >     else:
> >         dcache_class, icache_class, l2_cache_class = \
> >             L1Cache, L1Cache, L2Cache
> >
> >     if options.l2cache:
> >         # Provide a clock for the L2 and the L1-to-L2 bus here as they
> >         # are not connected using addTwoLevelCacheHierarchy. Use the
> >         # same clock as the CPUs, and set the L1-to-L2 bus width to 32
> >         # bytes (256 bits).
> >         system.l2 = [l2_cache_class(clock=options.clock,
> >                                     size=options.l2_size,
> >                                     assoc=options.l2_assoc,
> >                                     block_size=options.cacheline_size)
> >                      for i in xrange(options.num_cpus)]
> >
> >         system.tol2bus = [CoherentBus(clock=options.clock, width=32)
> >                           for i in xrange(options.num_cpus)]
> >         #system.l2.cpu_side = system.tol2bus.master
> >         #system.l2.mem_side = system.membus.slave
> >
> >     for i in xrange(options.num_cpus):
> >         if options.caches:
> >             icache = icache_class(size=options.l1i_size,
> >                                   assoc=options.l1i_assoc,
> >                                   block_size=options.cacheline_size)
> >             dcache = dcache_class(size=options.l1d_size,
> >                                   assoc=options.l1d_assoc,
> >                                   block_size=options.cacheline_size)
> >             # When connecting the caches, the clock is also inherited
> >             # from the CPU in question
> >             if buildEnv['TARGET_ISA'] == 'x86':
> >                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache,
> >                                                       PageTableWalkerCache(),
> >                                                       PageTableWalkerCache())
> >             else:
> >                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache)
> >         system.cpu[i].createInterruptController()
> >         if options.l2cache:
> >             system.l2[i].cpu_side = system.tol2bus[i].master
> >             system.l2[i].mem_side = system.membus.slave
> >             system.cpu[i].connectAllPorts(system.tol2bus[i], system.membus)
> >         else:
> >             system.cpu[i].connectAllPorts(system.membus)
> >
> >     return system
> >
> > Best,
> > Fulya
> >
> > On Mon, Nov 4, 2013 at 10:35 PM, biswabandan panda <[email protected]> wrote:
> >
> > Hi,
> > Could you report the number of committedInsts for both the cases?
> >
> > On Tue, Nov 5, 2013 at 7:04 AM, fulya <[email protected]> wrote:
> >
> > In the single-core case, there is a 1 MB L2 cache. In the 4-core case, each
> > core has its own private L2 cache of size 1 MB. As they are not shared, I
> > don't understand the reason for the different cache miss rates.
> >
> > Best,
> > Fulya Kaplan
> >
> > On Nov 4, 2013, at 7:55 PM, "Tao Zhang" <[email protected]> wrote:
> >
> > Hi Fulya,
> >
> > What's the L2 cache size of the 1-core test? Is it equal to the total
> > capacity of the 4-core case? The stats indicate that the 4-core test has a
> > lower L2 cache miss rate, which may be the reason for the IPC improvement.
> >
> > -Tao
> >
> > From: [email protected] [mailto:[email protected]] On Behalf Of Fulya Kaplan
> > Sent: Monday, November 04, 2013 10:20 AM
> > To: gem5 users mailing list
> > Subject: [gem5-users] Weird IPC statistics for Spec2006 Multiprogram mode
> >
> > Hi all,
> >
> > I am running SPEC 2006 on X86 with version gem5-stable-07352f119e48. I am
> > using multiprogram mode with syscall emulation. I am trying to compare the
> > IPC statistics for 2 cases:
> >
> > 1) Running benchmark A on a single core
> >
> > 2) Running 4 instances of benchmark A on a 4-core system with 1 MB private
> > L2 caches.
> >
> > All parameters are the same for the 2 runs except the number of cores.
> >
> > I am expecting some IPC decrease for the 4-core case as the cores will
> > share the same system bus. However, for the CactusADM and Soplex
> > benchmarks, I see higher IPC for Case 2 compared to Case 1.
> >
> > I look at the same phase of execution for both runs. I fast-forward for 2
> > billion instructions and grab the IPC for each of the cores over the next
> > 100 million instructions in detailed mode.
> >
> > I'll report some other statistics for CactusADM to give a better idea of
> > what is going on.
> >
> > Case 1: ipc=0.664141, L2_overall_accesses=573746, L2_miss_rate=0.616
> >
> > Case 2: cpu0_ipc=0.718562, cpu1_ipc=0.720464, cpu2_ipc=0.717405, cpu3_ipc=0.716513
> >         L2_0_accesses=591607, L2_1_accesses=581846, L2_2_accesses=568095, L2_3_accesses=561180
> >         L2_0_missrate=0.452978, L2_1_missrate=0.454510, L2_2_missrate=0.475646, L2_3_missrate=0.488171
> >
> > Case 1: Running Time for 100M insts = 0.0716
>
> --
> Thank you,
> Saptarshi Mallick
> Department of Electrical and Computer Engineering
> Utah State University
> Utah, USA.

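P.S. As a quick sanity check, the Case 1 numbers quoted in this thread are consistent with each other. This is just the arithmetic on values already reported above, assuming the 2.1 GHz clock from the command line:

# Sanity check on the Case 1 (single-core cactusADM) numbers quoted above.
committed_insts = 100000000      # system.switch_cpus.committedInsts_total
cycles = 150570516               # cycle count reported for Case 1
clock_hz = 2.1e9                 # --clock=2.1GHz from the command line

ipc = committed_insts / float(cycles)   # ~0.664141, matches the reported IPC
runtime_s = cycles / clock_hz           # ~0.0717, close to the 0.0716 "Running Time"
                                        # above, if that number is simulated seconds

print("IPC     = %.6f" % ipc)
print("runtime = %.4f" % runtime_s)
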
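And, as mentioned at the top of this mail, here is a minimal sketch of the kind of script that can pull the per-core numbers out of stats.txt. Apart from committedInsts_total, which appears above, the stat-name substrings below are assumptions and may need adjusting to whatever names your stats.txt actually uses:

# Minimal sketch: pull per-CPU committed instructions, cycle counts and IPC
# out of a gem5 stats.txt dump. Apart from committedInsts_total (quoted in
# this thread), the substrings below are assumed stat names; check your
# own stats.txt and adjust them as needed.
import re
import sys

def parse_stats(path):
    stats = {}
    with open(path) as f:
        for line in f:
            # stats.txt lines look like: <stat name> <value> # <description>
            m = re.match(r'^(\S+)\s+(\S+)', line)
            if m:
                stats[m.group(1)] = m.group(2)
    return stats

def grep(stats, substrings):
    # Return (name, value) pairs whose stat name contains every substring.
    return sorted((k, v) for k, v in stats.items()
                  if all(s in k for s in substrings))

if __name__ == '__main__':
    path = sys.argv[1] if len(sys.argv) > 1 else 'stats.txt'
    stats = parse_stats(path)
    for key in (['switch_cpus', 'committedInsts'],  # insts committed after switching
                ['switch_cpus', 'numCycles'],       # assumed per-CPU cycle stat
                ['switch_cpus', 'ipc']):            # assumed per-CPU IPC stat
        for name, value in grep(stats, key):
            print("%s = %s" % (name, value))

Run it with the path to the stats.txt of interest as the only argument (it defaults to ./stats.txt in the current directory).
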
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
