Re: [m5-users] running multi core- instruction count in all cores is zero except 1... why?

Robert Pulumbarit Mon, 30 Jun 2008 14:37:37 -0700

Robert Pulumbarit <robertpulumbarit <at> gmail.com> writes:
> 
> 
> Steve,
>  
> As best I could, I gleaned the aspects of the example you gave that are
> relevant to our setup and incorporated them.  The main change we made was
> to switch from EIO traces to LiveProcess workloads from cpu2000.py.  This
> seems to have solved our problem of only one CPU in a multicore system
> executing any instructions.
> 
>  
> With some modifications to cpu2000.py to accommodate the SPEC CPU2000
> directory structure we have here, I think we've got it partially working.
> From the 48 benchmarks listed in cpu2000.py, we select a random number of
> them to run on our multicore/multithreaded system.
> 
>  
> For certain benchmark combinations, the simulation successfully reaches
> max_insts_all_threads.  For others, we get an abort message.  For example
> (I snipped some portions of the output for compactness):
>  
> max_insts_all_threads = 10000000
> num of cpus = 4
> num threads per cpu = 2
> system.cpu[0].workload = [gzip_log('alpha','tru64','ref').makeLiveProcess(),
>                           lucas('alpha','tru64','ref').makeLiveProcess()]
> system.cpu[1].workload = [gcc_expr('alpha','tru64','ref').makeLiveProcess(),
>                           eon_cook('alpha','tru64','ref').makeLiveProcess()]
> system.cpu[2].workload = [vpr_place('alpha','tru64','ref').makeLiveProcess(),
>                           vortex2('alpha','tru64','ref').makeLiveProcess()]
> system.cpu[3].workload = [eon_rushmeier('alpha','tru64','ref').makeLiveProcess(),
>                           gzip_graphic('alpha','tru64','ref').makeLiveProcess()]
> Global frequency set at 1000000000000 ticks per second
> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
> (output snipped)
> 0: system.remote_gdb.listener: listening for remote gdb #7 on port 7007
> **** REAL SIMULATION ****
> warn: Entering event queue @ 0.  Starting simulation...
> warn: Increasing stack size by one page.
> (output snipped)
> warn: Increasing stack size by one page.
> warn: ignoring syscall sigprocmask(1, 1073070158, ...)
> warn: ignoring syscall setrlimit(3, 4831387568, ...)
> warn: ignoring syscall sigaction(8, 4831387472, ...)
> warn: ignoring syscall sigaction(13, 4831387472, ...)
> warn: ignoring syscall sigprocmask(3, 0, ...)
> m5.opt: build/ALPHA_SE/mem/cache/cache_impl.hh:313: bool Cache<TagStore>::access(Packet*, typename TagStore::BlkType*&, int&, PacketList&) [with TagStore = LRU]: Assertion `blkSize == pkt->getSize()' failed.
> Program aborted at cycle 90073000
> Abort
> 
> Can you give us any info on what could be causing the assertion to fail? 
>  
> Thank you,
> Robert Pulumbarit
> 
>  
> On Thu, Jun 26, 2008 at 8:09 AM, Steve Reinhardt <stever <at> gmail.com>
> wrote:
> On Thu, Jun 26, 2008 at 4:30 AM, Robert Pulumbarit
> <robertpulumbarit <at> gmail.com> wrote:
> > Hello,
> >
> > In reading through the thread, unless I'm not following correctly, it
> > seems like the EIO trace files that we happen to be using may be a
> > possible culprit.  As a test to try to eliminate those EIO trace files
> > as suspects, just using the default config files and workloads that come
> > with m5-stable, is there a way to run a detailed (O3) multicore system
> > in which all cores have non-zero "# Number of instructions committed"
> > (system.cpux.commit.COM:count)?
>
> See the quick/01.hello-2T-smt regression test.  Run it with this command
> line:
>
>   build/ALPHA_SE/m5.debug tests/run.py quick/01.hello-2T-smt/alpha/linux/o3-timing
>
> > I've tried it using a fresh unmodified installation of m5-stable, using
> > this command line:
> >
> > (run from the /configs/example directory)
> >
> > ../../../m5-stable/build/ALPHA_SE/m5.opt -d se se.py -n 4 --detailed --caches
> >
> > The output file se/m5stats.txt shows that only cpu2 commits
> > instructions.  The three other CPUs commit 0 instructions.
> >
> > What command line options should I use to create a 4 core system and
> > assign each core to run its own separate copy of the default "hello
> > world"?  Can anyone tell me what I am missing in the above command line
> > to do this?
>
> You can't do this directly from the command line with se.py.  The
> script creates a single workload (stored in the 'process' variable in
> the script) and then assigns that same workload instance to all the
> CPUs.  This is the right thing to do if you have a multithreaded
> program (e.g., SPLASH); when the application forks new threads, they
> will run on the other CPUs.  For a single-threaded program like "hello
> world", it will never fork new threads, so only one CPU gets used.
>
> Steve
> 
> 
> _______________________________________________
> m5-users mailing list
> m5-users <at> m5sim.org
> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users



Just to clarify the earlier post: we are using m5-stable in SE mode with
LiveProcess workloads and max_insts_all_threads set to 10 million.  We are
running multicore, multithreaded experiments, and we tend to hit the error
once we get to systems with 2 cores and 2 threads per core, or larger.  If
we reduce the number of cores, threads, and max_insts_all_threads, we seem
less likely to hit the error.
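
To make the setup concrete, here is a toy Python sketch of the difference
between handing one workload object to every CPU (the se.py behaviour
described earlier in the thread) and giving each hardware thread its own
process (what our modified config does).  Process, Cpu, and the helper
functions are hypothetical stand-ins that only model object identity; they
are not the m5 classes or API.

```python
# Toy stand-ins, NOT the m5 API: they only model which object each CPU holds.
class Process:
    def __init__(self, name):
        self.name = name


class Cpu:
    def __init__(self):
        self.workload = []


def assign_shared(cpus, process):
    """Every CPU gets the SAME process object (single shared workload)."""
    for cpu in cpus:
        cpu.workload = [process]


def assign_separate(cpus, names, threads_per_cpu):
    """Every hardware thread gets its own distinct process object."""
    remaining = iter(names)
    for cpu in cpus:
        cpu.workload = [Process(next(remaining)) for _ in range(threads_per_cpu)]


def distinct_processes(cpus):
    """Count how many different process objects are assigned overall."""
    return len({id(p) for cpu in cpus for p in cpu.workload})


cpus = [Cpu() for _ in range(4)]

assign_shared(cpus, Process("hello"))
shared_count = distinct_processes(cpus)

# Benchmark names taken from the run quoted above.
names = ["gzip_log", "lucas", "gcc_expr", "eon_cook",
         "vpr_place", "vortex2", "eon_rushmeier", "gzip_graphic"]
assign_separate(cpus, names, threads_per_cpu=2)
separate_count = distinct_processes(cpus)
```

With a single-threaded program, the shared case leaves only one process to
run anywhere, which matches the symptom of one CPU committing instructions.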

We added a line to /mem/cache/cache_impl.hh just before the assert statement
around line 312 to print the offending values of blkSize and pkt->getSize().

Every time we get the error, blkSize is exactly twice pkt->getSize().

I assume this has something to do with our cache setup, in which the block
size doubles at each level: 16B in L1, 32B in L2, and 64B in L3.

For example, in one erroneous run, we had blkSize = 32, pkt->getSize() = 16.

In another erroneous run, we had blkSize = 64, pkt->getSize() = 32.
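
One hypothesis consistent with those numbers, sketched as a toy model in
Python: if some whole-block packet is sized by the upper cache's block size
and then checked against the lower cache's block size, every adjacent pair
of levels with different block sizes would trip the check.  This is only a
guess on our part.  ToyCache and find_mismatches are hypothetical names, and
nothing here is taken from the m5 source beyond the shape of the assertion.

```python
# Toy model of the failing check; NOT m5 source code.
class ToyCache:
    def __init__(self, name, blk_size):
        self.name = name
        self.blk_size = blk_size

    def whole_block_pkt_size(self):
        # Assumption: a whole-block packet leaving this cache carries
        # exactly one of THIS cache's blocks.
        return self.blk_size

    def accepts(self, pkt_size):
        # Same shape as the assertion: blkSize == pkt->getSize()
        return self.blk_size == pkt_size


def find_mismatches(levels):
    """Return (upper, lower, pkt_size, blk_size) for each adjacent pair of
    cache levels whose block sizes would fail the equality check."""
    mismatches = []
    for upper, lower in zip(levels, levels[1:]):
        pkt_size = upper.whole_block_pkt_size()
        if not lower.accepts(pkt_size):
            mismatches.append((upper.name, lower.name, pkt_size, lower.blk_size))
    return mismatches


# Our hierarchy: block size doubles at each level.
ours = [ToyCache("L1", 16), ToyCache("L2", 32), ToyCache("L3", 64)]
ours_mismatches = find_mismatches(ours)

# A hierarchy with one uniform block size, for comparison.
uniform = [ToyCache("L1", 64), ToyCache("L2", 64), ToyCache("L3", 64)]
uniform_mismatches = find_mismatches(uniform)
```

In this model our hierarchy yields the same two pairs we observed
(pkt 16 against blkSize 32, and pkt 32 against blkSize 64), while a uniform
block size yields none.  If the model is right, running with the same block
size at every level would be a quick way to confirm it.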

We tried to trace where the mismatched blkSize (or pkt->getSize()) value
originates.

Here is as far as we traced:
The assertion in /mem/cache/cache_impl.hh, line 313, is inside the method
access(...).
  access(...) is called by:
    line 426:  bool Cache<TagStore>::timingAccess(PacketPtr pkt)
      timingAccess() is called by:
        line 1347:  bool Cache<TagStore>::CpuSidePort::recvTiming(PacketPtr pkt)
        recvTiming() is called by sendTiming():
          ./src/mem/port.hh:187:    bool sendTiming(PacketPtr pkt) { return peer->recvTiming(pkt); }
            ../../src/cpu/o3/lsq_unit_impl.hh:        if (!dcachePort->sendTiming(data_pkt)) {
            ../../src/cpu/o3/lsq_unit_impl.hh:        if (dcachePort->sendTiming(retryPkt)) {
            ../../src/cpu/o3/lsq_unit.hh:        if (!dcachePort->sendTiming(data_pkt)) {
            ../../src/cpu/o3/fetch_impl.hh:        if (!icachePort->sendTiming(data_pkt)) {
            ../../src/cpu/o3/fetch_impl.hh:        if (icachePort->sendTiming(retryPkt)) {
            ../../src/dev/io_device.cc:        result = sendTiming(pkt);
            ../../src/dev/io_device.cc:            result = sendTiming(pkt);
            ../../src/mem/tport.cc:    bool success = sendTiming(dp.pkt);
            ../../src/mem/cache/cache_impl.hh:            memSidePort->sendTiming(snoopPkt);
            ../../src/mem/cache/cache_impl.hh:        cpuSidePort->sendTiming(snoopPkt);
            ../../src/mem/cache/cache_impl.hh:        bool success = sendTiming(transmitList.front().pkt);
            ../../src/mem/cache/cache_impl.hh:            bool success = sendTiming(pkt);
            ../../src/mem/bus.cc:                bool success M5_VAR_USED = p->sendTiming(pkt);
            ../../src/mem/bus.cc:        if (!dest_port->sendTiming(pkt))  {
            ../../src/mem/bridge.cc:    if (sendTiming(pkt)) {
            
We added a trace statement (fprintf) before every one of those sendTiming()
calls, but the output was overwhelming, since many if not all of the listed
sendTiming() calls are actually made.

Short of us adding even more traces to every one of the sendTiming() callers,
can anyone with more knowledge of the code help narrow down the problem?
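
In the meantime, one way to cut the trace output down is to post-process it
rather than read it raw.  The Python sketch below tallies (caller, size)
pairs and lists only the call sites that ever sent a packet of an offending
size (16 or 32 bytes in our failing runs).  The trace line format
"sendTiming caller=<file> size=<bytes>" is hypothetical; the pattern would
have to be adjusted to whatever the fprintf calls really print.

```python
import re
from collections import Counter

# Hypothetical trace format, e.g. "sendTiming caller=bus.cc size=32".
TRACE_RE = re.compile(r"sendTiming caller=(?P<caller>\S+) size=(?P<size>\d+)")


def tally(lines):
    """Count how often each (caller, packet size) pair appears in the trace."""
    counts = Counter()
    for line in lines:
        match = TRACE_RE.search(line)
        if match:  # lines that are not trace statements are skipped
            counts[(match.group("caller"), int(match.group("size")))] += 1
    return counts


def suspects(counts, offending_sizes):
    """Return the sorted call sites that sent at least one offending size."""
    return sorted({caller for (caller, size) in counts if size in offending_sizes})


# Made-up sample lines, for illustration only.
sample = [
    "sendTiming caller=lsq_unit_impl.hh size=8",
    "sendTiming caller=cache_impl.hh size=16",
    "sendTiming caller=cache_impl.hh size=16",
    "sendTiming caller=bus.cc size=32",
    "warn: Increasing stack size by one page.",
]
counts = tally(sample)
suspect_callers = suspects(counts, {16, 32})
```

Call sites that never send a packet of an offending size can be ruled out,
which should leave far fewer places to instrument further.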


_______________________________________________
m5-users mailing list
m5-users <at> m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
