Re: [gem5-users] Checkpointing possible with Ruby, X86, TimingSimpleCPU and O3CPU?

Nilay Vaish Tue, 28 Aug 2012 13:28:12 -0700

The cause of the assert failure was tracked down recently by Jason Power. The patch is on the review board. Here is the link: http://reviews.gem5.org/r/1365


It will be committed to the mainline soon.


--
Nilay


On Tue, 28 Aug 2012, Marco Elver wrote:

Hi all,

I would like to ask if what I am trying to do is even possible (and if
so, how), as I have been running into a few problems despite following
the advice I could find in older mailing-list threads and on the wiki.
My goal is to run a full-system simulation with Ruby (using
MOESI_CMP_directory), multiple processors of type O3CPU and the X86 ISA;
I create a checkpoint after the Linux kernel has loaded and before the
benchmark enters the ROI.

With revision 9174:2171e04a2ee5 (Mon Aug 27 20:53:20 2012 -0400) from
the dev repository, I tried the following:
   (1) Take a checkpoint with ruby_fs, the *MOESI_hammer* protocol
(the only one supporting checkpoints, according to the wiki) and the
TimingSimpleCPU (succeeds):
          $> build/X86_MOESI_hammer/gem5.opt
--outdir=m5out/rawdata/fluidanimate/ckpt configs/example/ruby_fs.py -n
16 --cpu-type=timing --kernel=system/x86_64-vmlinux-2.6.28.smp
--checkpoint-dir=m5out/checkpoints/fluidanimate --max-checkpoints=1
--script=contrib/initscripts/parsec/fluidanimate.sh

   (2) Resume from the checkpoint with the O3CPU, restore with
TimingSimpleCPU (fails):
          $> build/X86_MOESI_hammer/gem5.opt
--outdir=m5out/rawdata/fluidanimate/detailed configs/example/ruby_fs.py
-n 16 --cpu-type=detailed --kernel=system/x86_64-vmlinux-2.6.28.smp
--checkpoint-dir=m5out/checkpoints/fluidanimate -r 0
--restore-with-cpu=timing
          [...]
          Switch at curTick count:10000
          info: Entering event queue @ 0.  Starting simulation...
          Runtime Error at MOESI_hammer-dir.sm:1270, Ruby Time:
1111185: assert failure, PID: 2742
          press return to continue.

          Program aborted at cycle 555592500

   (3) Resuming from the checkpoint with the TimingSimpleCPU alone fails
in the same way as (2); in (2) the failure occurs before the CPU is even
switched to the O3CPU.

   (4) However, if I take a checkpoint right after starting the
simulator (after ~10000000000 cycles, kernel still booting) and then
restore with the TimingSimpleCPU, it works as expected; only the O3CPU
fails, with a segfault and the following backtrace:
       #0  0x0000000000cdff56 in MasterPort::sendTimingReq
(this=<optimized out>, pkt=0x6f8a060)
           at build/X86/mem/port.cc:136
       #1  0x00000000005fbac5 in sendTiming (pkt=0x6f8a060,
sendingState=0x61a7cc0, this=0x49a9e60)
           at build/X86/arch/x86/pagetable_walker.cc:173
       #2  X86ISA::Walker::WalkerState::sendPackets (this=0x61a7cc0)
           at build/X86/arch/x86/pagetable_walker.cc:631
       #3  0x00000000005fc8c2 in
X86ISA::Walker::WalkerState::recvPacket (this=this@entry=0x61a7cc0,
           pkt=pkt@entry=0x1e99920) at
build/X86/arch/x86/pagetable_walker.cc:590
       #4  0x00000000005fcb98 in X86ISA::Walker::recvTimingResp
(this=0x43706c0, pkt=0x1e99920)
           at build/X86/arch/x86/pagetable_walker.cc:129
       #5  0x0000000000ce1f5b in PacketQueue::trySendTiming
(this=0x42ba5e0)
           at build/X86/mem/packet_queue.cc:152
       #6  0x0000000000ce2929 in PacketQueue::sendDeferredPacket
(this=0x42ba5e0)
           at build/X86/mem/packet_queue.cc:190
       #7  0x0000000000c391be in EventQueue::serviceOne
(this=<optimized out>) at build/X86/sim/eventq.cc:204
       #8  0x0000000000c7d342 in simulate
(num_cycles=9223372036854785807) at build/X86/sim/simulate.cc:71
       #9  0x0000000000b8e17c in _wrap_simulate__SWIG_0
(args=<optimized out>)
           at build/X86/python/swig/event_wrap.cc:4755
       #10 _wrap_simulate (self=<optimized out>, args=<optimized out>)
           at build/X86/python/swig/event_wrap.cc:4804
       #11 0x00007fb32a094fc6 in PyEval_EvalFrameEx () from
/lib/libpython2.7.so.1.0

Trying to restore with ruby using MOESI_CMP_directory and the
TimingSimpleCPU results in the same error as (2), with the difference
that it finishes loading the checkpoint, resumes, but then fails after
about a minute ("Runtime Error at MOESI_CMP_directory-dir.sm:485, Ruby
Time: 12038425921: assert failure, PID: 19169"). Using the O3CPU still
results in the same error as (4).
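
For reference, the invocations in (1) and (2) differ only in the output
directory, the CPU type and a few trailing flags. A minimal sketch that
assembles both from shared settings is given here; the helper name
gem5_command and its parameters are hypothetical, the flags are copied
from the commands above, and the build/X86_<protocol> path pattern is
assumed from the build directory used in (1). It only prints the command
lines and does not run gem5.

```python
import shlex


def gem5_command(protocol, outdir, cpu_type, extra):
    """Assemble a ruby_fs.py invocation as an argument list.

    Hypothetical helper: it only mirrors the flags used in (1) and (2).
    """
    return [
        # Assumed path pattern, taken from build/X86_MOESI_hammer above.
        f"build/X86_{protocol}/gem5.opt",
        f"--outdir={outdir}",
        "configs/example/ruby_fs.py",
        "-n", "16",
        f"--cpu-type={cpu_type}",
        "--kernel=system/x86_64-vmlinux-2.6.28.smp",
        "--checkpoint-dir=m5out/checkpoints/fluidanimate",
    ] + list(extra)


# (1) Take the checkpoint with the TimingSimpleCPU.
take = gem5_command(
    "MOESI_hammer",
    "m5out/rawdata/fluidanimate/ckpt",
    "timing",
    ["--max-checkpoints=1",
     "--script=contrib/initscripts/parsec/fluidanimate.sh"],
)

# (2) Restore with the TimingSimpleCPU, then switch to the O3CPU.
restore = gem5_command(
    "MOESI_hammer",
    "m5out/rawdata/fluidanimate/detailed",
    "detailed",
    ["-r", "0", "--restore-with-cpu=timing"],
)

print(shlex.join(take))
print(shlex.join(restore))
```

Passing "MOESI_CMP_directory" as the protocol gives the corresponding
command line for the MOESI_CMP_directory attempt, under the same path
assumption.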

In addition, I have seen workflows that (1) create a checkpoint without
Ruby, using the AtomicSimpleCPU, and then (2) load the checkpoint with
Ruby and the TimingSimpleCPU. I tried this, and it works if I set
--restore-with-cpu=timing. But trying this with the O3CPU doesn't work,
resulting in the same backtrace as (4).

Is what I'm trying to do possible? If so, any workarounds I should know of?

Thanks,
Marco


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users