I am using the default DRAM model; I just changed the memory latency to 100ns.
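
For reference, the change is roughly the following in the config script (a
minimal sketch, assuming the stock SimpleMemory model and the system/membus
objects that se.py already creates; everything else is left at its defaults):

    from m5.objects import SimpleMemory

    # default DRAM model, with only the latency raised to 100ns
    system.physmem = SimpleMemory(latency='100ns')
    system.physmem.port = system.membus.master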


On Wed, Nov 6, 2013 at 11:17 AM, biswabandan panda <[email protected]> wrote:

> Hi,
>         What about your DRAM model? Are you using DDR3 or something similar?
>
>
> On Wed, Nov 6, 2013 at 9:33 PM, Saptarshi Mallick <
> [email protected]> wrote:
>
>> Hello Fulya,
>>                  Could you please give the command lines you used for both
>> cases to produce these results? I had the same kind of problem; maybe there
>> is a mistake in the command line we are using.
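>>
>> For reference, the command line I have been using is roughly the following
>> (a sketch from memory, so treat the option values as placeholders; the
>> 1-core run just drops --num-cpus=4 and passes a single program to -c, and
>> if I remember the multiprogram syntax correctly the four copies are
>> separated with ';'):
>>
>>   ./build/X86/gem5.opt configs/example/se.py \
>>       --cpu-type=detailed --caches --l2cache --num-cpus=4 \
>>       --l1d_size=32kB --l1i_size=32kB --l2_size=1MB \
>>       --fast-forward=2000000000 --maxinsts=100000000 \
>>       -c "<bench>;<bench>;<bench>;<bench>" -o "<bench-args>"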
>>
>> On Tuesday, November 5, 2013, Fulya Kaplan <[email protected]> wrote:
>> > Number of committed instructions (system.switch_cpus.committedInsts_total):
>> > Case 1: 100,000,000
>> > Case 2: cpu0 -> 100,045,354
>> >         cpu1 -> 100,310,197
>> >         cpu2 ->  99,884,333
>> >         cpu3 ->  99,760,117
>> > Number of cycles:
>> > Case 1: 150,570,516
>> > Case 2: 139,230,042
>> > For both cases, the CPUs switch to detailed mode at instruction 2 billion.
>> > All the reported data correspond to the 100 million instructions executed
>> > in detailed mode.
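>> > (As a sanity check, the IPC values follow directly from these counts:
>> > 100,000,000 / 150,570,516 ~ 0.664 for Case 1, and for example
>> > 100,045,354 / 139,230,042 ~ 0.719 for cpu0 in Case 2, matching the ipc
>> > stats in my earlier mail quoted below.)
>> >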
>> > From the config.ini files I can see that separate L2 caches are defined
>> > for Case 2. My CacheConfig.py, modified to create private per-core L2
>> > caches, looks like this:
>> > def config_cache(options, system):
>> >     if options.cpu_type == "arm_detailed":
>> >         try:
>> >             from O3_ARM_v7a import *
>> >         except:
>> >             print "arm_detailed is unavailable. Did you compile the O3 model?"
>> >             sys.exit(1)
>> >         dcache_class, icache_class, l2_cache_class = \
>> >             O3_ARM_v7a_DCache, O3_ARM_v7a_ICache, O3_ARM_v7aL2
>> >     else:
>> >         dcache_class, icache_class, l2_cache_class = \
>> >             L1Cache, L1Cache, L2Cache
>> >
>> >     if options.l2cache:
>> >         # Provide a clock for the L2 and the L1-to-L2 bus here as they
>> >         # are not connected using addTwoLevelCacheHierarchy. Use the
>> >         # same clock as the CPUs, and set the L1-to-L2 bus width to 32
>> >         # bytes (256 bits).
>> >         system.l2 = [l2_cache_class(clock=options.clock,
>> >                                     size=options.l2_size,
>> >                                     assoc=options.l2_assoc,
>> >                                     block_size=options.cacheline_size)
>> >                      for i in xrange(options.num_cpus)]
>> >         system.tol2bus = [CoherentBus(clock=options.clock, width=32)
>> >                           for i in xrange(options.num_cpus)]
>> >         #system.l2.cpu_side = system.tol2bus.master
>> >         #system.l2.mem_side = system.membus.slave
>> >
>> >     for i in xrange(options.num_cpus):
>> >         if options.caches:
>> >             icache = icache_class(size=options.l1i_size,
>> >                                   assoc=options.l1i_assoc,
>> >                                   block_size=options.cacheline_size)
>> >             dcache = dcache_class(size=options.l1d_size,
>> >                                   assoc=options.l1d_assoc,
>> >                                   block_size=options.cacheline_size)
>> >             # When connecting the caches, the clock is also inherited
>> >             # from the CPU in question
>> >             if buildEnv['TARGET_ISA'] == 'x86':
>> >                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache,
>> >                                                       PageTableWalkerCache(),
>> >                                                       PageTableWalkerCache())
>> >             else:
>> >                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache)
>> >         system.cpu[i].createInterruptController()
>> >         if options.l2cache:
>> >             system.l2[i].cpu_side = system.tol2bus[i].master
>> >             system.l2[i].mem_side = system.membus.slave
>> >             system.cpu[i].connectAllPorts(system.tol2bus[i], system.membus)
>> >         else:
>> >             system.cpu[i].connectAllPorts(system.membus)
>> >     return system
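>> >
>> > The call site in configs/example/se.py is unchanged; if I remember it
>> > correctly it is still just the usual
>> >
>> >     CacheConfig.config_cache(options, system)
>> >
>> > so only the body above differs from the stock file.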
>> >
>> >
>> >
>> > Best,
>> > Fulya
>> >
>> >
>> >
>> >
>> > On Mon, Nov 4, 2013 at 10:35 PM, biswabandan panda <[email protected]>
>> wrote:
>> >
>> > Hi,
>> >        Could you report the number of committedInsts for both cases?
>> >
>> >
>> > On Tue, Nov 5, 2013 at 7:04 AM, fulya <[email protected]> wrote:
>> >
>> > In the single-core case, there is a 1 MB L2 cache. In the 4-core case,
>> > each core has its own private 1 MB L2 cache. As they are not shared, I
>> > don't understand the reason for the different cache miss rates.
>> >
>> > Best,
>> > Fulya Kaplan
>> > On Nov 4, 2013, at 7:55 PM, "Tao Zhang" <[email protected]>
>> wrote:
>> >
>> > Hi Fulya,
>> >
>> >
>> >
>> > What is the L2 cache size of the 1-core test? Is it equal to the total
>> > capacity of the 4-core case? The stats indicate that the 4-core test has a
>> > lower L2 cache miss rate, which may be the reason for the IPC improvement.
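>> >
>> > As a back-of-the-envelope relation, memory stall cycles per instruction
>> > are roughly (L2 misses per instruction) x (average miss penalty), so even
>> > a modest drop in the L2 miss rate translates directly into IPC when each
>> > miss costs on the order of a main-memory access.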
>> >
>> >
>> >
>> > -Tao
>> >
>> >
>> >
>> > From: [email protected] [mailto:[email protected]]
>> On Behalf Of Fulya Kaplan
>> > Sent: Monday, November 04, 2013 10:20 AM
>> > To: gem5 users mailing list
>> > Subject: [gem5-users] Weird IPC statistics for Spec2006 Multiprogram
>> mode
>> >
>> >
>> >
>> > Hi all,
>> >
>> > I am running SPEC 2006 on X86 with version gem5-stable-07352f119e48, in
>> > multiprogram mode with syscall emulation. I am trying to compare the IPC
>> > statistics for two cases:
>> >
>> > 1) Running benchmark A on a single core
>> >
>> > 2) Running 4 instances of benchmark A on a 4-core system with 1 MB private
>> > L2 caches
>> >
>> > All parameters are the same for the two runs except the number of cores.
>> >
>> > I expect some IPC decrease in the 4-core case, since the cores share the
>> > same system bus. However, for the CactusADM and Soplex benchmarks, I see a
>> > higher IPC in Case 2 than in Case 1.
>> >
>> > I look at the same phase of execution for both runs: I fast-forward 2
>> > billion instructions and collect the IPC of each core over the next 100
>> > million instructions in detailed mode.
>> >
>> > I'll report some other statistics for CactusADM to give a better idea of
>> > what is going on.
>> >
>> > Case 1: ipc=0.664141, L2_overall_accesses=573746, L2_miss_rate=0.616
>> >
>> > Case 2: cpu0_ipc=0.718562, cpu1_ipc=0.720464, cpu2_ipc=0.717405,
>> >         cpu3_ipc=0.716513
>> >         L2_0_accesses=591607, L2_1_accesses=581846, L2_2_accesses=568095,
>> >         L2_3_accesses=561180
>> >         L2_0_missrate=0.452978, L2_1_missrate=0.454510,
>> >         L2_2_missrate=0.475646, L2_3_missrate=0.488171
>> >
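>> > In absolute terms that is roughly 573,746 x 0.616 ~ 353,000 L2 misses in
>> > Case 1, versus about 591,607 x 0.453 ~ 268,000 for cpu0 in Case 2 (and
>> > similarly for the other cores), so each core in the 4-core run misses
>> > noticeably less often over the same 100M-instruction window.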
>> >
>> >
>> > Case 1: Running Time for 100M insts = 0.0716
>>
>> --
>> Thank you,
>> Saptarshi Mallick
>> Department of Electrical and Computer Engineering
>> Utah State University
>> Utah, USA.
>>
>>
>
>
>
> --
>
>
> thanks & regards,
> BISWABANDAN
> http://www.cse.iitm.ac.in/~biswa/
>
> “We might fall down, but we will never lay down. We might not be the best,
> but we will beat the best! We might not be at the top, but we will rise.”
>
>
>
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
