Could you check your config.dot.pdf inside m5out?

On Wed, Nov 6, 2013 at 9:50 PM, Fulya Kaplan <[email protected]> wrote:

> I was also wondering if my CacheConfig.py file looks ok in terms of
> defining private L2 caches.
> Best,
> Fulya
>
>
> On Wed, Nov 6, 2013 at 11:18 AM, Fulya Kaplan <[email protected]> wrote:
>
>> Hi Saptarshi,
>> Command line for *Case 1:*
>> /mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/build/X86/gem5.fast
>> --outdir=/mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/RUNS/14_single/m5out_cactusADM
>> --remote-gdb-port=0
>> /mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/configs/example/se_AMD_multicore.py
>>  -n 1 --cpu-type=detailed --caches --l2cache --num-l2caches=1
>> --l1d_size=64kB --l1i_size=64kB --l1d_assoc=2 --l1i_assoc=2 --l2_size=1MB
>> --l2_assoc=16 --fast-forward=2000000000 --bench="cactusADM"
>> --max_total_inst=100000000 --clock=2.1GHz
>>
>> Command line for *Case 2:*
>> /mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/build/X86/gem5.fast
>> --outdir=/mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/RUNS/14_hom/m5out_cactusADM-cactusADM-cactusADM-cactusADM
>> --remote-gdb-port=0
>> /mnt/nokrb/fkaplan3/gem5/gem5-stable-07352f119e48/configs/example/se_AMD_multicore.py
>>  -n 4 --cpu-type=detailed --caches --l2cache --num-l2caches=4
>> --l1d_size=64kB --l1i_size=64kB --l1d_assoc=2 --l1i_assoc=2 --l2_size=1MB
>> --l2_assoc=16 --fast-forward=2000000000
>> --bench="cactusADM-cactusADM-cactusADM-cactusADM"
>> --max_total_inst=400000000 --clock=2.1GHz
>>
>> For clarity: I turned off remote-gdb because I was getting an error about
>> listeners when running on the cluster, and I am not doing anything with gdb.
>> My gem5 is modified so that I count the number of instructions executed after
>> switching (which I implemented in the O3 CPU definition). The max_total_inst
>> option sets the total number of instructions executed across all cores after
>> switching, which is why it is set to 400 million in Case 2. I also verified
>> this by looking at the stats.txt file.
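>> (In case it helps: my actual change lives in the O3 CPU model, but a rough,
>> hypothetical sketch of the same idea at the config-script level, assuming a
>> max_total_inst option and a switch_cpus list of detailed CPUs, would be
>> something like:
>>
>>     # Hypothetical sketch, not my actual patch: split the total post-switch
>>     # instruction budget evenly across the detailed CPUs, e.g. 400M / 4 = 100M.
>>     per_core_insts = options.max_total_inst / options.num_cpus
>>     for cpu in switch_cpus:
>>         # max_insts_any_thread is a standard BaseCPU parameter that raises
>>         # an exit event once the CPU commits this many instructions.
>>         cpu.max_insts_any_thread = per_core_insts
>>
>> The difference is that this caps each core separately, whereas my patch
>> counts the total across all cores.)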
>> Let me know if you also need to check my se_AMD_multicore.py file; it has
>> been modified to add all the SPEC benchmarks and their binary and input
>> file paths.
>>
>> Thanks,
>> Fulya
>>
>>
>> On Wed, Nov 6, 2013 at 11:03 AM, Saptarshi Mallick <
>> [email protected]> wrote:
>>
>>> Hello Fulya,
>>> Could you please give the command lines you used to get the results for
>>> both cases? I ran into the same kind of problem; maybe there is a mistake
>>> in the command line we are using.
>>>
>>> On Tuesday, November 5, 2013, Fulya Kaplan <[email protected]> wrote:
>>> > Number of committed instructions (system.switch_cpus.committedInsts_total) for:
>>> > Case 1: 100,000,000
>>> > Case 2: cpu0->100,045,354
>>> >             cpu1->100,310,197
>>> >             cpu2-> 99,884,333
>>> >             cpu3-> 99,760,117
>>> > Number of cycles for:
>>> > Case 1: 150,570,516
>>> > Case 2: 139,230,042
>>> > For both cases, the CPUs switch to detailed mode at instruction 2 billion.
>>> > All the reported data correspond to the 100 million instructions executed
>>> > in detailed mode.
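>>> > (As a quick sanity check, these counts are consistent with the IPCs quoted
>>> > further down the thread: 100,000,000 / 150,570,516 ≈ 0.664 for Case 1, and
>>> > 100,045,354 / 139,230,042 ≈ 0.719 for cpu0 in Case 2.)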
>>> > From the config.ini files I can see that separate L2 caches are defined for
>>> > Case 2. My CacheConfig.py, modified to create private L2 caches, looks like
>>> > this:
>>> > def config_cache(options, system):
>>> >     if options.cpu_type == "arm_detailed":
>>> >         try:
>>> >             from O3_ARM_v7a import *
>>> >         except:
>>> >             print "arm_detailed is unavailable. Did you compile the O3 model?"
>>> >             sys.exit(1)
>>> >         dcache_class, icache_class, l2_cache_class = \
>>> >             O3_ARM_v7a_DCache, O3_ARM_v7a_ICache, O3_ARM_v7aL2
>>> >     else:
>>> >         dcache_class, icache_class, l2_cache_class = \
>>> >             L1Cache, L1Cache, L2Cache
>>> >     if options.l2cache:
>>> >         # Provide a clock for the L2 and the L1-to-L2 bus here as they
>>> >         # are not connected using addTwoLevelCacheHierarchy. Use the
>>> >         # same clock as the CPUs, and set the L1-to-L2 bus width to 32
>>> >         # bytes (256 bits).
>>> >         system.l2 = [l2_cache_class(clock=options.clock,
>>> >                                     size=options.l2_size,
>>> >                                     assoc=options.l2_assoc,
>>> >                                     block_size=options.cacheline_size)
>>> >                      for i in xrange(options.num_cpus)]
>>> >
>>> >         system.tol2bus = [CoherentBus(clock=options.clock, width=32)
>>> >                           for i in xrange(options.num_cpus)]
>>> >         #system.l2.cpu_side = system.tol2bus.master
>>> >         #system.l2.mem_side = system.membus.slave
>>> >     for i in xrange(options.num_cpus):
>>> >         if options.caches:
>>> >             icache = icache_class(size=options.l1i_size,
>>> >                                   assoc=options.l1i_assoc,
>>> >                                   block_size=options.cacheline_size)
>>> >             dcache = dcache_class(size=options.l1d_size,
>>> >                                   assoc=options.l1d_assoc,
>>> >                                   block_size=options.cacheline_size)
>>> >             # When connecting the caches, the clock is also inherited
>>> >             # from the CPU in question
>>> >             if buildEnv['TARGET_ISA'] == 'x86':
>>> >                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache,
>>> >                                                       PageTableWalkerCache(),
>>> >                                                       PageTableWalkerCache())
>>> >             else:
>>> >                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache)
>>> >         system.cpu[i].createInterruptController()
>>> >         if options.l2cache:
>>> >             system.l2[i].cpu_side = system.tol2bus[i].master
>>> >             system.l2[i].mem_side = system.membus.slave
>>> >             system.cpu[i].connectAllPorts(system.tol2bus[i], system.membus)
>>> >         else:
>>> >             system.cpu[i].connectAllPorts(system.membus)
>>> >     return system
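>>> > (If it is useful, a throwaway script along these lines could dump the port
>>> > wiring of every L2- and tol2bus-related section from m5out/config.ini, to
>>> > confirm each core really has its own private L2; this is just a hypothetical
>>> > helper, not part of gem5:
>>> >
>>> >     import ConfigParser  # Python 2, matching the gem5 scripts of this era
>>> >
>>> >     cfg = ConfigParser.ConfigParser()
>>> >     cfg.read('m5out/config.ini')
>>> >     for sec in sorted(cfg.sections()):
>>> >         # Only look at the L2 caches and the per-core L1-to-L2 buses.
>>> >         if 'l2' in sec:
>>> >             for key, val in cfg.items(sec):
>>> >                 # Port entries record the peer they are connected to.
>>> >                 if key in ('cpu_side', 'mem_side', 'master', 'slave'):
>>> >                     print sec, key, '->', val
>>> >
>>> > Each L2 entry should point at its own tol2bus on the cpu_side and at
>>> > system.membus on the mem_side.)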
>>> >
>>> >
>>> >
>>> > Best,
>>> > Fulya
>>> >
>>> >
>>> >
>>> >
>>> > On Mon, Nov 4, 2013 at 10:35 PM, biswabandan panda <
>>> [email protected]> wrote:
>>> >
>>> > Hi,
>>> > Could you report the number of committedInsts for both cases?
>>> >
>>> >
>>> > On Tue, Nov 5, 2013 at 7:04 AM, fulya <[email protected]> wrote:
>>> >
>>> > In the single-core case, there is a 1 MB L2 cache. In the 4-core case, each
>>> > core has its own private 1 MB L2 cache. Since they are not shared, I don't
>>> > understand the reason for the different cache miss rates.
>>> >
>>> > Best,
>>> > Fulya Kaplan
>>> > On Nov 4, 2013, at 7:55 PM, "Tao Zhang" <[email protected]>
>>> wrote:
>>> >
>>> > Hi Fulya,
>>> >
>>> >
>>> >
>>> > What is the L2 cache size of the 1-core test? Is it equal to the total
>>> > capacity of the 4-core case? The stats indicate that the 4-core test has a
>>> > lower L2 cache miss rate, which may be the reason for the IPC improvement.
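>>> > (Working that out from the numbers quoted below: Case 1 sees roughly
>>> > 573,746 x 0.616 ≈ 353k L2 misses over its 100M instructions, while each
>>> > core in Case 2 sees roughly 264k-274k, so every core in the 4-core run
>>> > takes noticeably fewer L2 misses for the same amount of work.)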
>>> >
>>> >
>>> >
>>> > -Tao
>>> >
>>> >
>>> >
>>> > From: [email protected] [mailto:[email protected]]
>>> On Behalf Of Fulya Kaplan
>>> > Sent: Monday, November 04, 2013 10:20 AM
>>> > To: gem5 users mailing list
>>> > Subject: [gem5-users] Weird IPC statistics for Spec2006 Multiprogram
>>> mode
>>> >
>>> >
>>> >
>>> > Hi all,
>>> >
>>> > I am running SPEC 2006 on X86 with gem5-stable-07352f119e48, using
>>> > multiprogram mode with syscall emulation. I am trying to compare the IPC
>>> > statistics for two cases:
>>> >
>>> > 1) Running benchmark A on a single core.
>>> >
>>> > 2) Running 4 instances of benchmark A on a 4-core system with 1 MB private
>>> > L2 caches.
>>> >
>>> > All parameters are the same for the 2 runs except the number of cores.
>>> >
>>> > I was expecting some IPC decrease in the 4-core case, since the cores share
>>> > the same system bus. However, for the CactusADM and Soplex benchmarks, I see
>>> > a higher IPC for Case 2 than for Case 1.
>>> >
>>> > I look at the same phase of execution for both runs: I fast-forward for 2
>>> > billion instructions and record the IPC of each core over the next 100
>>> > million instructions in detailed mode.
>>> >
>>> > I'll report some other statistics for CactusADM to give a better idea of
>>> > what is going on.
>>> >
>>> > Case 1: ipc=0.664141, L2_overall_accesses=573746, L2_miss_rate=0.616
>>> >
>>> > Case 2: cpu0_ipc=0.718562, cpu1_ipc= 0.720464, cpu2_ipc=0.717405,
>>> cpu3_ipc= 0.716513
>>> >
>>> >             L2_0_accesses=591607, L2_1_accesses=581846,
>>> L2_2_accesses=568095, L2_3_accesses=561180, L2_0_missrate=0.452978,
>>> L2_1_missrate=0.454510, L2_2_missrate=0.475646, L2_3_missrate=0.488171
>>> >
>>> >
>>> >
>>> > Case 1: Running time for 100M insts = 0.0716
>>>
>>> --
>>> Thank you,
>>> Saptarshi Mallick
>>> Department of Electrical and Computer Engineering
>>> Utah State University
>>> Utah, USA.
>>>
>>>
>>>
>>
>>
>
>



-- 


Thanks & regards,
BISWABANDAN
http://www.cse.iitm.ac.in/~biswa/

“We might fall down, but we will never lay down. We might not be the best,
but we will beat the best! We might not be at the top, but we will rise.”
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
