I thought it would be wiser to start a new thread for this issue: I just stumbled upon a problem while taking a checkpoint in O3 and then restoring from it. The problem also exists with simple-timing!
Here is what taking a checkpoint at instruction #100 on gcc_integrate looks like:

command line: build/ALPHA_SE/m5.fast --outdir=./m5out/o3-timing/100drain configs/example/se.py --bench=gcc_integrate --detailed --caches --l2cache --take-checkpoint=100 --at-instruction
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb on port 7000
Creating checkpoint at inst:100
info: Entering event queue @ 0. Starting simulation...
info: Increasing stack size by one page.
hack: be nice to actually delete the event here
exit cause = a thread reached the max instruction count
info: Entering event queue @ 1193000. Starting simulation...
Writing checkpoint
Checkpoint written.
Exiting @ cycle 1250000 because a thread reached the max instruction count

Here is the error that results when resuming from the same checkpoint:

command line: build/ALPHA_SE/m5.fast --outdir=./m5out/o3-timing/100resume configs/example/se.py --bench=gcc_integrate --detailed --caches --l2cache --checkpoint-dir=./m5out/o3-timing/100drain --checkpoint-restore=100 --at-instruction --max-inst=100
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb on port 7000
Restoring checkpoint ...
Restoring from checkpoint
fatal: Can't unserialize 'system.cpu:locked'
 @ cycle 1250000
[paramIn:build/ALPHA_SE/sim/serialize.cc, line 203]
Memory Usage: 585384 KBytes
For more information see: http://www.m5sim.org/fatal/60de9f5a

Inspecting the m5.cpt files gave me an idea about the problem. The [system.cpu] section differs by CPU model, and only the simple-atomic checkpoint contains a 'locked' entry:

for simple-atomic:
...
[system.cpu]
so_state=2
locked=false
_status=1
...

for simple-timing:
...
[system.cpu]
so_state=2
_status=1

for o3-timing:
...
[system.cpu]
so_state=2
...
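To make the mismatch concrete, here is a small, hypothetical Python model of the failure (not M5 code). It parses the three [system.cpu] excerpts quoted above with configparser and restores them with a stand-in for the atomic CPU's unserialize step. It assumes, as the fatal message suggests, that the atomic CPU treats 'locked' as a required key; the function names are made up for illustration.

```python
import configparser

# The [system.cpu] excerpts quoted above (all other sections omitted).
CHECKPOINTS = {
    "simple-atomic": "[system.cpu]\nso_state=2\nlocked=false\n_status=1\n",
    "simple-timing": "[system.cpu]\nso_state=2\n_status=1\n",
    "o3-timing": "[system.cpu]\nso_state=2\n",
}


class UnserializeError(Exception):
    """Stands in for M5's fatal() in this sketch."""


def param_in(cp, section, key):
    # Hypothetical required-parameter read: a missing key is fatal,
    # mirroring "fatal: Can't unserialize 'system.cpu:locked'".
    if not cp.has_option(section, key):
        raise UnserializeError(f"Can't unserialize '{section}:{key}'")
    return cp.get(section, key)


def atomic_cpu_unserialize(text):
    # Hypothetical model: assumes the atomic CPU requires 'locked'.
    cp = configparser.ConfigParser()
    cp.optionxform = str  # keep key case as written
    cp.read_string(text)
    return {"locked": param_in(cp, "system.cpu", "locked")}


def restore_report(checkpoints):
    # Restore every checkpoint with the atomic model and record the outcome.
    results = {}
    for name, text in checkpoints.items():
        try:
            atomic_cpu_unserialize(text)
            results[name] = "ok"
        except UnserializeError as e:
            results[name] = f"fatal: {e}"
    return results


if __name__ == "__main__":
    for name, status in restore_report(CHECKPOINTS).items():
        print(f"{name}: {status}")
```

Only the simple-atomic checkpoint restores cleanly; the simple-timing and o3-timing ones hit the same missing-key failure as the log above.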
So, I was able to gather some more information about the error. It looks like, when restoring from a checkpoint, the script always starts with the simple-atomic CPU (see the setCPUClass() routine in Simulation.py) and then switches to the requested CPU (timing, O3, or InOrder). Therefore, AtomicSimpleCPU::unserialize() is called on a checkpoint that was written by a different CPU model, and that is what causes the error: it looks for 'locked', which the timing and O3 CPUs never write.

However, when I altered the script to change that, the simulation does not proceed after restoration and simply exits:

command line: /home/sroy-local/research/m5-arm/build/ALPHA_SE/m5.debug --outdir=./m5out/simple-timing/50rest configs/example/se.py --bench=gcc_integrate --timing --checkpoint-dir=./m5out/simple-timing/50drain --checkpoint-restore=50 --at-instruction --max-inst=1000
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
Restoring checkpoint ...
Restoring from checkpoint
warn: optional parameter system.cpu.workload:M5_pid not present
For more information see: http://www.m5sim.org/warn/aa78cda1
Done.
**** REAL SIMULATION ****
info: Entering event queue @ 2162000. Starting simulation...
Exiting @ cycle 9223372036854775807 because simulate() limit reached

So I am not sure whether the problem lies in the Python script, in the C++ code, or in both. Any ideas?

regards,
Soumyaroop

--
Soumyaroop Roy
Ph.D. Candidate
Department of Computer Science and Engineering
University of South Florida, Tampa
http://www.csee.usf.edu/~sroy
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev