[m5-dev] Checkpoint restoration problems in simple-timing and o3

soumyaroop roy Mon, 08 Feb 2010 08:41:26 -0800

I thought it would be wiser to start a new thread for this issue:

I just stumbled upon a problem while taking a checkpoint in O3 and
then restoring from it. The same problem also occurs with simple-timing!


Here is what taking a checkpoint at instruction #100 on gcc_integrate
looks like:

command line: build/ALPHA_SE/m5.fast
--outdir=./m5out/o3-timing/100drain configs/example/se.py
--bench=gcc_integrate --detailed --caches --l2cache
--take-checkpoint=100 --at-instruction
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb on port 7000
Creating checkpoint at inst:100
info: Entering event queue @ 0.  Starting simulation...
info: Increasing stack size by one page.
hack: be nice to actually delete the event here
exit cause = a thread reached the max instruction count
info: Entering event queue @ 1193000.  Starting simulation...
Writing checkpoint
Checkpoint written.
Exiting @ cycle 1250000 because a thread reached the max instruction count

Here is the error that results while resuming from the same checkpoint:

command line: build/ALPHA_SE/m5.fast
--outdir=./m5out/o3-timing/100resume configs/example/se.py
--bench=gcc_integrate --detailed --caches --l2cache
--checkpoint-dir=./m5out/o3-timing/100drain --checkpoint-restore=100
--at-instruction --max-inst=100
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb on port 7000
Restoring checkpoint ...
Restoring from checkpoint
fatal: Can't unserialize 'system.cpu:locked'
 @ cycle 1250000
[paramIn:build/ALPHA_SE/sim/serialize.cc, line 203]
Memory Usage: 585384 KBytes
For more information see: http://www.m5sim.org/fatal/60de9f5a

Inspecting the m5.cpt files gave me an idea about the problem:
for simple-atomic:
...
[system.cpu]
so_state=2
locked=false
_status=1
...

for simple-timing:
...
[system.cpu]
so_state=2
_status=1

for o3-timing:
...
[system.cpu]
so_state=2
...
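
The comparison above can also be done mechanically. What follows is only a
rough sketch, assuming that m5.cpt is close enough to INI syntax for Python's
configparser to read it. The three strings are the excerpts quoted above (with
the elided lines left out), and the helper names section_keys() and
missing_keys() are made up for this illustration. Because the excerpts are
truncated, a key reported as missing may simply have been elided; only
'locked' is confirmed by the fatal message.

```python
import configparser

# Excerpts copied from the m5.cpt files quoted above; elided lines are omitted,
# so these are NOT complete [system.cpu] sections.
CHECKPOINTS = {
    "simple-atomic": "[system.cpu]\nso_state=2\nlocked=false\n_status=1\n",
    "simple-timing": "[system.cpu]\nso_state=2\n_status=1\n",
    "o3-timing": "[system.cpu]\nso_state=2\n",
}


def section_keys(text, section="system.cpu"):
    """Return the set of parameter names found in one checkpoint section."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # treat only '=' as the key/value separator
        interpolation=None,  # do not interpret '%' in values
        strict=False,        # tolerate repeated sections or keys
    )
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(text)
    return set(parser.options(section))


def missing_keys(reference_text, other_text):
    """Keys present in the reference checkpoint but absent from the other."""
    return sorted(section_keys(reference_text) - section_keys(other_text))


atomic = CHECKPOINTS["simple-atomic"]
for name, text in CHECKPOINTS.items():
    print(name, "lacks:", missing_keys(atomic, text))
```

To run this against real checkpoints, the strings would be replaced with the
contents of the m5.cpt files in the respective checkpoint directories.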

So, I was able to gather some more information about the error:

It looks like restoring from a checkpoint always starts off with the
"simple-atomic" CPU (check the setCPUClass() routine in Simulation.py)
and then switches to the other CPU (timing, O3, or InOrder). Therefore,
AtomicSimpleCPU::unserialize() is called on a checkpoint that was
created by one of the other CPUs, which never wrote the 'locked'
parameter, and that is what causes the error!

But when I altered the script so that it restores directly into the
target CPU instead, the simulation does not proceed after restoration
and simply exits:
command line: /home/sroy-local/research/m5-arm/build/ALPHA_SE/m5.debug
--outdir=./m5out/simple-timing/50rest configs/example/se.py
--bench=gcc_integrate --timing
--checkpoint-dir=./m5out/simple-timing/50drain --checkpoint-restore=50
--at-instruction --max-inst=1000
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
Restoring checkpoint ...
Restoring from checkpoint
warn: optional parameter system.cpu.workload:M5_pid not present
For more information see: http://www.m5sim.org/warn/aa78cda1
Done.
**** REAL SIMULATION ****
info: Entering event queue @ 2162000.  Starting simulation...
Exiting @ cycle 9223372036854775807 because simulate() limit reached
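
One detail worth noting in that last line: the exit cycle is exactly the
largest signed 64-bit integer, 2^63 - 1. The short check below confirms the
arithmetic. Reading this as "nothing stopped the run before the default
simulate() limit" (that is, no instructions were executed and the --max-inst
limit never triggered) is my guess from the log, not something I have verified
in the source; the name is_limit_tick() is made up for this illustration.

```python
# Values copied from the log above.
RESUME_TICK = 2162000
EXIT_TICK = 9223372036854775807

# Largest value representable in a signed 64-bit integer.
INT64_MAX = 2**63 - 1


def is_limit_tick(tick):
    """True if the tick equals the signed 64-bit maximum."""
    return tick == INT64_MAX


print("exit tick is the 64-bit maximum:", is_limit_tick(EXIT_TICK))
print("ticks between resume and exit:", EXIT_TICK - RESUME_TICK)
```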

So, I am not sure whether the problem is in the Python script, in the
C++ code, or in both.

Any ideas?

regards,
Soumyaroop

-- 
Soumyaroop Roy
Ph.D. Candidate
Department of Computer Science and Engineering
University of South Florida, Tampa
http://www.csee.usf.edu/~sroy
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
