Robert Pulumbarit <robertpulumbarit <at> gmail.com> writes:
>
>
> Steve,
>
> As best as I could, I tried to glean the aspects from the example you gave
that are relevant to our setup and incorporated them. The main change we did
was to switch to LiveProcess workloads from cpu2000.py instead of EIO traces.
This seems to have solved our problem of only one cpu in a multicore system
executing any instructions.
>
>
> With some modifications to cpu2000.py to accommodate the SPECCPU 2000
directory structure we have here, I think we've got it partially working.
From the 48 benchmarks listed in cpu2000.py, we select a random number of them
to run on our multicore/multithreaded system.
>
>
> For certain benchmark combinations, the simulation successfully reaches
max_insts_all_threads. For others, we get an abort message. For example:
> (I snipped some portions of the output for compactness)
>
> max_insts_all_threads = 10000000num of cpus = 4num threads per cpu =
2system.cpu[0].workload = [gzip_log('alpha','tru64','ref').makeLiveProcess(),
lucas('alpha','tru64','ref').makeLiveProcess()]
> system.cpu[1].workload = [gcc_expr('alpha','tru64','ref').makeLiveProcess(),
eon_cook('alpha','tru64','ref').makeLiveProcess()]system.cpu[2].workload =
[vpr_place('alpha','tru64','ref').makeLiveProcess(), vortex2
('alpha','tru64','ref').makeLiveProcess()]
> system.cpu[3].workload = [eon_rushmeier
('alpha','tru64','ref').makeLiveProcess(), gzip_graphic
('alpha','tru64','ref').makeLiveProcess()]Global frequency set at
1000000000000 ticks per second
> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
(output snipped)0: system.remote_gdb.listener: listening for remote gdb #7 on
port 7007**** REAL SIMULATION ****warn: Entering event queue <at> 0.
Starting simulation...
> warn: Increasing stack size by one page.(output snipped)warn: Increasing
stack size by one page.warn: ignoring syscall sigprocmask(1, 1073070158, ...)
warn: ignoring syscall setrlimit(3, 4831387568, ...)
> warn: ignoring syscall sigaction(8, 4831387472, ...)warn: ignoring syscall
sigaction(13, 4831387472, ...)warn: ignoring syscall sigprocmask(3, 0, ...)
m5.opt: build/ALPHA_SE/mem/cache/cache_impl.hh:313: bool
Cache<TagStore>::access(Packet*, typename TagStore::BlkType*&, int&,
PacketList&) [with TagStore = LRU]: Assertion `blkSize == pkt->getSize()'
failed.
> Program aborted at cycle 90073000Abort
>
> Can you give us any info on what could be causing the assertion to fail?
>
> Thank you,
> Robert Pulumbarit
>
>
> On Thu, Jun 26, 2008 at 8:09 AM, Steve Reinhardt <stever <at> gmail.com>
wrote:
> On Thu, Jun 26, 2008 at 4:30 AM, Robert Pulumbarit<robertpulumbarit <at>
gmail.com> wrote:>> Hello,>> In reading through the thread, unless I'm not
following correctly, it seems
> > like the EIO trace files that we happen to be using may be a possible>
culprit. As a test to try to eliminate those EIO trace files as suspects,>
just using the default config files and workloads that come with m5-stable, is
> > there a way to run a detailed (O3) multicore system in which all cores
have> non-zero "# Number of instructions committed"
(system.cpux.commit.COM:count)?
> See the quick/01.hello-2T-smt regression test. Run it with this command
line:build/ALPHA_SE/m5.debug tests/run.py quick/01.hello-2T-smt/alpha/linux/o3-
timing
> >> I've tried it using a fresh unmodified installation of m5-stable, using
this> command line:>> (run from the /configs/example directory)>> ../../../m5-
stable/build/ALPHA_SE/m5.opt -d se se.py -n 4 --detailed --caches
> >> The output file se/m5stats.txt shows that only cpu2 commits
instructions. The> three other cpu's commit 0 instructions.>> What command
line options should I use to create a 4 core system and assign
> > each core to run its own separate copy of the default "hello world"? Can>
anyone tell me what I am missing in the above command line to do this?
> You can't do this directly from the command line with se.py. The
> script creates a single workload (stored in the 'process' variable inthe
script) and then assigns that same workload instance to all theCPUs. This is
the right thing to do if you have a multithreadedprogram (e.g., SPLASH); when
the application forks new threads, they
> will run on the other CPUs. For a single-threaded program
like "helloworld", it will never fork new threads, so only one CPU gets
used.Steve
>
>
> _______________________________________________m5-users mailing listm5-users
<at> m5sim.orghttp://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>
>
>
>
>
>
>
> _______________________________________________
> m5-users mailing list
> m5-users <at> m5sim.org
> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
Just to clarify the earlier post, we are using m5-stable, SE mode, with
LiveProcess workloads, 10 million max_insts_all_threads. We are running
multicore-multithread experiments, and we tend to get the error when we get to
2 core, 2 threads per core systems and higher. If we reduce the number of
cores, threads, and max_insts_all_threads, it seems we are less likely to get
the error.
We added a line to /mem/cache/cache_impl.hh just before the assert statement
around line 312 to see what the offending values of blkSize and pkt->getSIze()
were.
Every time we get the error, blkSize is always twice the size of pkt-geSize().
I assume this has something to do with our cache setup, in which our L1 cache
has 16B block size, L2 cache is twice that at 32B block size, and L3 is twice
that at 64B block size.
For example, in one erroneous run, we had blkSize = 32, pkt->getSize() = 16.
In another erroneous run, we had blkSize = 64, pkt->getSize() = 32.
We made some attempts to trace where the faulty blkSize (or faulty pkt->getSize
()) originated from.
Here is as far as we traced:
The assertion in /mem/cache/cache_impl.hh, line 313, is contained within the
method access(...).
access(...) is called by:
line 426: bool Cache<TagStore>::timingAccess(PacketPtr pkt)
timingAccess() is called by:
line 1347: bool Cache<TagStore>::CpuSidePort::recvTiming(PacketPtr
pkt)
recvTiming() is called by sendTiing():
./src/mem/port.hh:187: bool sendTiming(PacketPtr pkt) { return
peer->recvTiming(pkt); }
../../src/cpu/o3/lsq_unit_impl.hh: if (!dcachePort-
>sendTiming(data_pkt)) {
../../src/cpu/o3/lsq_unit_impl.hh: if (dcachePort-
>sendTiming(retryPkt)) {
../../src/cpu/o3/lsq_unit.hh: if (!dcachePort->sendTiming
(data_pkt)) {
../../src/cpu/o3/fetch_impl.hh: if (!icachePort->sendTiming
(data_pkt)) {
../../src/cpu/o3/fetch_impl.hh: if (icachePort->sendTiming
(retryPkt)) {
../../src/dev/io_device.cc: result = sendTiming(pkt);
../../src/dev/io_device.cc: result = sendTiming(pkt);
../../src/mem/tport.cc: bool success = sendTiming(dp.pkt);
../../src/mem/cache/cache_impl.hh: memSidePort-
>sendTiming(snoopPkt);
../../src/mem/cache/cache_impl.hh: cpuSidePort->sendTiming
(snoopPkt);
../../src/mem/cache/cache_impl.hh: bool success = sendTiming
(transmitList.front().pkt);
../../src/mem/cache/cache_impl.hh: bool success =
sendTiming(pkt);
../../src/mem/bus.cc: bool success M5_VAR_USED = p-
>sendTiming(pkt);
../../src/mem/bus.cc: if (!dest_port->sendTiming(pkt)) {
../../src/mem/bridge.cc: if (sendTiming(pkt)) {
We added a trace statement (fprintf) before every one of those sendTiming()
calls, and unfortunately we were overwhelmed with output statements, since
many if not all of those listed sendTiming() calls are made.
Short of us adding even more traces to every one of the sendTming() callers,
can anyone with more knowledge of the code help narrow down the problem?
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users