Hi Saptarshi,

Command line for *Case 1:*

/mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/build/X86/gem5.fast --outdir=/mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/RUNS/14_single/m5out_cactusADM --remote-gdb-port=0 /mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/configs/example/se_AMD_multicore.py -n 1 --cpu-type=detailed --caches --l2cache --num-l2caches=1 --l1d_size=64kB --l1i_size=64kB --l1d_assoc=2 --l1i_assoc=2 --l2_size=1MB --l2_assoc=16 --fast-forward=2000000000 --bench="cactusADM" --max_total_inst=100000000 --clock=2.1GHz
Command line for *Case 2:*

/mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/build/X86/gem5.fast --outdir=/mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/RUNS/14_hom/m5out_cactusADM-cactusADM-cactusADM-cactusADM --remote-gdb-port=0 /mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/configs/example/se_AMD_multicore.py -n 4 --cpu-type=detailed --caches --l2cache --num-l2caches=4 --l1d_size=64kB --l1i_size=64kB --l1d_assoc=2 --l1i_assoc=2 --l2_size=1MB --l2_assoc=16 --fast-forward=2000000000 --bench="cactusADM-cactusADM-cactusADM-cactusADM" --max_total_inst=400000000 --clock=2.1GHz

To clarify, I turned off remote gdb because I was getting an error about listeners when running on the cluster, and I am not doing anything with gdb. My gem5 is modified such that I count the number of instructions executed after switching (which I implemented in the O3 CPU definition). The --max_total_inst option determines the total number of instructions executed across all cores after switching, which is why it is set to 400 million in Case 2. I also checked and verified this by looking at the stats.txt file (a rough sketch of pulling these numbers out of stats.txt is included below the quoted thread). Let me know if you also need to check my se_AMD_multicore.py file; it has been modified to add all the SPEC benchmarks and their binary and input file paths.

Thanks,
Fulya

On Wed, Nov 6, 2013 at 11:03 AM, Saptarshi Mallick <[email protected]> wrote:
> Hello Fulya,
> Can you please give the command line for both the cases which you run for
> getting the results? I also had the same kind of problem; maybe there is
> some mistake in the command line which we are using.
>
> On Tuesday, November 5, 2013, Fulya Kaplan <[email protected]> wrote:
> > Number of committed instructions (system.switch_cpus.committedInsts_total) for:
> > Case 1: 100,000,000
> > Case 2: cpu0 -> 100,045,354
> >         cpu1 -> 100,310,197
> >         cpu2 ->  99,884,333
> >         cpu3 ->  99,760,117
> >
> > Number of cycles for:
> > Case 1: 150,570,516
> > Case 2: 139,230,042
> >
> > For both cases, the CPUs switch to detailed mode at instruction #2 billion.
> > All the reported data correspond to the 100 million instructions in
> > detailed mode.
> >
> > From the config.ini files I can see that separate L2 caches are defined
> > for Case 2. My modified CacheConfig.py file, which gives each core a
> > private L2 cache, looks like:
> >
> > def config_cache(options, system):
> >     if options.cpu_type == "arm_detailed":
> >         try:
> >             from O3_ARM_v7a import *
> >         except:
> >             print "arm_detailed is unavailable. Did you compile the O3 model?"
> >             sys.exit(1)
> >         dcache_class, icache_class, l2_cache_class = \
> >             O3_ARM_v7a_DCache, O3_ARM_v7a_ICache, O3_ARM_v7aL2
> >     else:
> >         dcache_class, icache_class, l2_cache_class = \
> >             L1Cache, L1Cache, L2Cache
> >
> >     if options.l2cache:
> >         # Provide a clock for the L2 and the L1-to-L2 bus here as they
> >         # are not connected using addTwoLevelCacheHierarchy. Use the
> >         # same clock as the CPUs, and set the L1-to-L2 bus width to 32
> >         # bytes (256 bits).
> >         system.l2 = [l2_cache_class(clock=options.clock,
> >                                     size=options.l2_size,
> >                                     assoc=options.l2_assoc,
> >                                     block_size=options.cacheline_size)
> >                      for i in xrange(options.num_cpus)]
> >
> >         system.tol2bus = [CoherentBus(clock=options.clock, width=32)
> >                           for i in xrange(options.num_cpus)]
> >         #system.l2.cpu_side = system.tol2bus.master
> >         #system.l2.mem_side = system.membus.slave
> >
> >     for i in xrange(options.num_cpus):
> >         if options.caches:
> >             icache = icache_class(size=options.l1i_size,
> >                                   assoc=options.l1i_assoc,
> >                                   block_size=options.cacheline_size)
> >             dcache = dcache_class(size=options.l1d_size,
> >                                   assoc=options.l1d_assoc,
> >                                   block_size=options.cacheline_size)
> >             # When connecting the caches, the clock is also inherited
> >             # from the CPU in question
> >             if buildEnv['TARGET_ISA'] == 'x86':
> >                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache,
> >                                                       PageTableWalkerCache(),
> >                                                       PageTableWalkerCache())
> >             else:
> >                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache)
> >         system.cpu[i].createInterruptController()
> >         if options.l2cache:
> >             system.l2[i].cpu_side = system.tol2bus[i].master
> >             system.l2[i].mem_side = system.membus.slave
> >             system.cpu[i].connectAllPorts(system.tol2bus[i], system.membus)
> >         else:
> >             system.cpu[i].connectAllPorts(system.membus)
> >
> >     return system
> >
> > Best,
> > Fulya
> >
> > On Mon, Nov 4, 2013 at 10:35 PM, biswabandan panda <[email protected]> wrote:
> >
> > Hi,
> > Could you report the number of committedInsts for both the cases?
> >
> > On Tue, Nov 5, 2013 at 7:04 AM, fulya <[email protected]> wrote:
> >
> > In the single-core case, there is a 1 MB L2 cache. In the 4-core case, each
> > core has its own private L2 cache of size 1 MB. As they are not shared, I
> > don't understand the reason for the different cache miss rates.
> >
> > Best,
> > Fulya Kaplan
> >
> > On Nov 4, 2013, at 7:55 PM, "Tao Zhang" <[email protected]> wrote:
> >
> > Hi Fulya,
> >
> > What's the L2 cache size of the 1-core test? Is it equal to the total
> > capacity of the 4-core case? The stats indicate that the 4-core test has a
> > lower L2 cache miss rate, which may be the reason for the IPC improvement.
> >
> > -Tao
> >
> > From: [email protected] [mailto:[email protected]] On Behalf Of Fulya Kaplan
> > Sent: Monday, November 04, 2013 10:20 AM
> > To: gem5 users mailing list
> > Subject: [gem5-users] Weird IPC statistics for Spec2006 Multiprogram mode
> >
> > Hi all,
> >
> > I am running SPEC 2006 on X86 with version gem5-stable-07352f119e48. I am
> > using multiprogram mode with syscall emulation. I am trying to compare the
> > IPC statistics for 2 cases:
> >
> > 1) Running benchmark A on a single core
> >
> > 2) Running 4 instances of benchmark A on a 4-core system with 1 MB private
> > L2 caches.
> >
> > All parameters are the same for the 2 runs except the number of cores.
> >
> > I am expecting some IPC decrease for the 4-core case as the cores will
> > share the same system bus. However, for the CactusADM and Soplex
> > benchmarks, I see higher IPC for Case 2 compared to Case 1.
> >
> > I look at the same phase of execution for both runs. I fast-forward for 2
> > billion instructions and grab the IPC for each of the cores over the next
> > 100 million instructions in detailed mode.
> >
> > I'll report some other statistics for CactusADM to give a better idea of
> > what is going on.
> >
> > Case 1: ipc=0.664141, L2_overall_accesses=573746, L2_miss_rate=0.616
> >
> > Case 2: cpu0_ipc=0.718562, cpu1_ipc=0.720464, cpu2_ipc=0.717405, cpu3_ipc=0.716513
> >         L2_0_accesses=591607, L2_1_accesses=581846, L2_2_accesses=568095, L2_3_accesses=561180
> >         L2_0_missrate=0.452978, L2_1_missrate=0.454510, L2_2_missrate=0.475646, L2_3_missrate=0.488171
> >
> > Case 1: Running Time for 100M insts = 0.0716
>
> --
> Thank you,
> Saptarshi Mallick
> Department of Electrical and Computer Engineering
> Utah State University
> Utah, USA.

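P.S. As a quick sanity check, the Case 1 numbers quoted in this thread are consistent with each other. This is just the arithmetic on values already reported above, assuming the 2.1 GHz clock from the command line:

# Sanity check on the Case 1 (single-core cactusADM) numbers quoted above.
committed_insts = 100000000      # system.switch_cpus.committedInsts_total
cycles = 150570516               # cycle count reported for Case 1
clock_hz = 2.1e9                 # --clock=2.1GHz from the command line

ipc = committed_insts / float(cycles)   # ~0.664141, matches the reported IPC
runtime_s = cycles / clock_hz           # ~0.0717, close to the 0.0716 "Running Time"
                                        # above, if that number is simulated seconds

print("IPC     = %.6f" % ipc)
print("runtime = %.4f" % runtime_s)
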
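And, as mentioned at the top of this mail, here is a minimal sketch of the kind of script that can pull the per-core numbers out of stats.txt. Apart from committedInsts_total, which appears above, the stat-name substrings below are assumptions and may need adjusting to whatever names your stats.txt actually uses:

# Minimal sketch: pull per-CPU committed instructions, cycle counts and IPC
# out of a gem5 stats.txt dump. Apart from committedInsts_total (quoted in
# this thread), the substrings below are assumed stat names; check your
# own stats.txt and adjust them as needed.
import re
import sys

def parse_stats(path):
    stats = {}
    with open(path) as f:
        for line in f:
            # stats.txt lines look like: <stat name> <value> # <description>
            m = re.match(r'^(\S+)\s+(\S+)', line)
            if m:
                stats[m.group(1)] = m.group(2)
    return stats

def grep(stats, substrings):
    # Return (name, value) pairs whose stat name contains every substring.
    return sorted((k, v) for k, v in stats.items()
                  if all(s in k for s in substrings))

if __name__ == '__main__':
    path = sys.argv[1] if len(sys.argv) > 1 else 'stats.txt'
    stats = parse_stats(path)
    for key in (['switch_cpus', 'committedInsts'],  # insts committed after switching
                ['switch_cpus', 'numCycles'],       # assumed per-CPU cycle stat
                ['switch_cpus', 'ipc']):            # assumed per-CPU IPC stat
        for name, value in grep(stats, key):
            print("%s = %s" % (name, value))

Run it with the path to the stats.txt of interest as the only argument (it defaults to ./stats.txt in the current directory).
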
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
