Hello Fulya,
                 Can you please share the exact command lines you ran for both cases? I ran into the same kind of problem, and the cause may be a mistake in the command lines we are using.
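
For reference, the kind of invocation I have in mind looks roughly like the lines below (the binary paths are placeholders and I am assuming the stock se.py options, so please adjust for anything that differs in your setup):

    # Case 1: one core, 1 MB private L2 (hypothetical paths)
    ./build/X86/gem5.opt configs/example/se.py --cpu-type=detailed \
        --caches --l2cache --l2_size=1MB --num-cpus=1 \
        --fast-forward=2000000000 --maxinsts=100000000 \
        -c /path/to/benchmarkA

    # Case 2: four cores, one copy of benchmark A per core
    ./build/X86/gem5.opt configs/example/se.py --cpu-type=detailed \
        --caches --l2cache --l2_size=1MB --num-cpus=4 \
        --fast-forward=2000000000 --maxinsts=100000000 \
        -c "/path/to/benchmarkA;/path/to/benchmarkA;/path/to/benchmarkA;/path/to/benchmarkA"

Incidentally, the counts you report below are self-consistent: IPC = committedInsts / numCycles gives 100,000,000 / 150,570,516 ≈ 0.664 for case 1 and 100,045,354 / 139,230,042 ≈ 0.719 for cpu0 in case 2, both matching the IPCs in your original post. So the stats themselves look sane; the question is where the extra cycles in case 1 go.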

On Tuesday, November 5, 2013, Fulya Kaplan <[email protected]> wrote:
> Number of committed instructions (system.switch_cpus.committedInsts_total) for:
> Case 1: 100,000,000
> Case 2: cpu0 -> 100,045,354
>         cpu1 -> 100,310,197
>         cpu2 ->  99,884,333
>         cpu3 ->  99,760,117
> Number of cycles for:
> Case 1: 150,570,516
> Case 2: 139,230,042
> For both cases, the CPUs switch to detailed mode at the 2-billion-instruction mark. All the reported data correspond to the next 100 million instructions in detailed mode.
> From the config.ini files I can see that separate L2 caches are indeed defined for case 2. My CacheConfig.py, modified to give each core a private L2, looks like this:
> def config_cache(options, system):
>     if options.cpu_type == "arm_detailed":
>         try:
>             from O3_ARM_v7a import *
>         except:
>             print "arm_detailed is unavailable. Did you compile the O3 model?"
>             sys.exit(1)
>         dcache_class, icache_class, l2_cache_class = \
>             O3_ARM_v7a_DCache, O3_ARM_v7a_ICache, O3_ARM_v7aL2
>     else:
>         dcache_class, icache_class, l2_cache_class = \
>             L1Cache, L1Cache, L2Cache
>
>     if options.l2cache:
>         # Provide a clock for the L2 and the L1-to-L2 bus here as they
>         # are not connected using addTwoLevelCacheHierarchy. Use the
>         # same clock as the CPUs, and set the L1-to-L2 bus width to 32
>         # bytes (256 bits). One private L2 and one L1-to-L2 bus per core.
>         system.l2 = [l2_cache_class(clock=options.clock,
>                                     size=options.l2_size,
>                                     assoc=options.l2_assoc,
>                                     block_size=options.cacheline_size)
>                      for i in xrange(options.num_cpus)]
>         system.tol2bus = [CoherentBus(clock=options.clock, width=32)
>                           for i in xrange(options.num_cpus)]
>         # The per-core L2s are wired up in the per-CPU loop below.
>         #system.l2.cpu_side = system.tol2bus.master
>         #system.l2.mem_side = system.membus.slave
>
>     for i in xrange(options.num_cpus):
>         if options.caches:
>             icache = icache_class(size=options.l1i_size,
>                                   assoc=options.l1i_assoc,
>                                   block_size=options.cacheline_size)
>             dcache = dcache_class(size=options.l1d_size,
>                                   assoc=options.l1d_assoc,
>                                   block_size=options.cacheline_size)
>             # When connecting the caches, the clock is also inherited
>             # from the CPU in question.
>             if buildEnv['TARGET_ISA'] == 'x86':
>                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache,
>                                                       PageTableWalkerCache(),
>                                                       PageTableWalkerCache())
>             else:
>                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache)
>         system.cpu[i].createInterruptController()
>         if options.l2cache:
>             # Connect core i's L1s to its own L1-to-L2 bus and private
>             # L2, and the L2 to the shared memory bus.
>             system.l2[i].cpu_side = system.tol2bus[i].master
>             system.l2[i].mem_side = system.membus.slave
>             system.cpu[i].connectAllPorts(system.tol2bus[i], system.membus)
>         else:
>             system.cpu[i].connectAllPorts(system.membus)
>     return system
>
> Best,
> Fulya
>
> On Mon, Nov 4, 2013 at 10:35 PM, biswabandan panda <[email protected]> wrote:
>
> Hi,
>        Could you report the number of committedInsts for both cases?
>
>
> On Tue, Nov 5, 2013 at 7:04 AM, fulya <[email protected]> wrote:
> In the single-core case, there is a 1 MB L2 cache. In the 4-core case, each core has its own private 1 MB L2 cache. As the L2s are not shared, I don't understand why the cache miss rates differ between the two cases.
>
> Best,
> Fulya Kaplan
> On Nov 4, 2013, at 7:55 PM, "Tao Zhang" <[email protected]> wrote:
>
> Hi Fulya,
>
> What is the L2 cache size in the 1-core test? Is it equal to the total capacity of the 4-core case? The stats indicate that the 4-core test has a lower L2 miss rate, which may be the reason for the IPC improvement.
>
> -Tao
>
> From: [email protected] [mailto:[email protected]] On
Behalf Of Fulya Kaplan
> Sent: Monday, November 04, 2013 10:20 AM
> To: gem5 users mailing list
> Subject: [gem5-users] Weird IPC statistics for Spec2006 Multiprogram mode
>
> Hi all,
>
> I am running SPEC 2006 on x86 with version gem5-stable-07352f119e48, in multiprogram mode with syscall emulation. I am trying to compare the IPC statistics for 2 cases:
>
> 1) Running benchmark A on a single core.
>
> 2) Running 4 instances of benchmark A on a 4-core system with 1 MB private L2 caches.
>
> All parameters are the same for the 2 runs except the number of cores.
>
> I am expecting some IPC decrease in the 4-core case, as the cores share the same system bus. However, for the CactusADM and Soplex benchmarks, I see a higher IPC in case 2 than in case 1.
>
> I look at the same phase of execution in both runs: I fast-forward for 2 billion instructions and grab the IPC of each core over the next 100 million instructions in detailed mode.
>
> I'll report some other statistics for CactusADM to give a better idea of what is going on.
>
> Case 1: ipc=0.664141, L2_overall_accesses=573746, L2_miss_rate=0.616
>
> Case 2: cpu0_ipc=0.718562, cpu1_ipc=0.720464, cpu2_ipc=0.717405, cpu3_ipc=0.716513
>
>         L2_0_accesses=591607, L2_1_accesses=581846, L2_2_accesses=568095, L2_3_accesses=561180
>         L2_0_missrate=0.452978, L2_1_missrate=0.454510, L2_2_missrate=0.475646, L2_3_missrate=0.488171
>
> Case 1: Running Time for 100M insts = 0.0716

-- 
Thank you,
Saptarshi Mallick
Department of Electrical and Computer Engineering
Utah State University
Utah, USA.
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
