I just stumbled upon a problem when taking a checkpoint in O3 and then restoring from it. The same problem also occurs with simple-timing!
Here is what taking a checkpoint at instruction #100 on gcc_integrate looks like:

  command line: build/ALPHA_SE/m5.fast --outdir=./m5out/o3-timing/100drain configs/example/se.py --bench=gcc_integrate --detailed --caches --l2cache --take-checkpoint=100 --at-instruction
  Global frequency set at 1000000000000 ticks per second
  0: system.remote_gdb.listener: listening for remote gdb on port 7000
  Creating checkpoint at inst:100
  info: Entering event queue @ 0. Starting simulation...
  info: Increasing stack size by one page.
  hack: be nice to actually delete the event here
  exit cause = a thread reached the max instruction count
  info: Entering event queue @ 1193000. Starting simulation...
  Writing checkpoint
  Checkpoint written.
  Exiting @ cycle 1250000 because a thread reached the max instruction count

Here is the error that results when resuming from the same checkpoint:

  command line: build/ALPHA_SE/m5.fast --outdir=./m5out/o3-timing/100resume configs/example/se.py --bench=gcc_integrate --detailed --caches --l2cache --checkpoint-dir=./m5out/o3-timing/100drain --checkpoint-restore=100 --at-instruction --max-inst=100
  Global frequency set at 1000000000000 ticks per second
  0: system.remote_gdb.listener: listening for remote gdb on port 7000
  Restoring checkpoint ...
  Restoring from checkpoint
  fatal: Can't unserialize 'system.cpu:locked'
   @ cycle 1250000
  [paramIn:build/ALPHA_SE/sim/serialize.cc, line 203]
  Memory Usage: 585384 KBytes
  For more information see: http://www.m5sim.org/fatal/60de9f5a

Inspecting the m5.cpt files reveals the problem: only the simple-atomic checkpoint contains the 'locked' entry that the restore tries to unserialize.

for simple-atomic:
  ...
  [system.cpu]
  so_state=2
  locked=false
  _status=1
  ...

for simple-timing:
  ...
  [system.cpu]
  so_state=2
  _status=1

for o3-timing:
  ....
  [system.cpu]
  so_state=2
  ...

How do I fix it?
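The manual inspection above could be automated. What follows is only a minimal sketch, assuming the m5.cpt files parse as INI-style text with Python's standard configparser module. The helper names (load_checkpoint, missing_keys) are hypothetical and not part of M5, and the inline sample sections are trimmed copies of the excerpts above rather than full checkpoints.

```python
import configparser


def load_checkpoint(text):
    """Parse checkpoint text as INI data (an assumption about the m5.cpt format)."""
    # interpolation=None: treat '%' in values literally.
    # strict=False: tolerate repeated sections or keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names case-sensitive, e.g. '_status'
    parser.read_string(text)
    return parser


def missing_keys(reference_text, candidate_text, section):
    """Return the sorted keys that the reference section has but the candidate lacks."""
    ref = load_checkpoint(reference_text)
    cand = load_checkpoint(candidate_text)
    ref_keys = set(ref[section]) if ref.has_section(section) else set()
    cand_keys = set(cand[section]) if cand.has_section(section) else set()
    return sorted(ref_keys - cand_keys)


# Sample data: trimmed from the [system.cpu] excerpts in this message.
SIMPLE_ATOMIC = "[system.cpu]\nso_state=2\nlocked=false\n_status=1\n"
SIMPLE_TIMING = "[system.cpu]\nso_state=2\n_status=1\n"
O3_TIMING = "[system.cpu]\nso_state=2\n"

if __name__ == "__main__":
    for name, text in (("simple-timing", SIMPLE_TIMING), ("o3-timing", O3_TIMING)):
        print(name, "is missing:", missing_keys(SIMPLE_ATOMIC, text, "system.cpu"))
```

For real checkpoints, the contents of each m5.cpt file would be read from disk and passed in place of the sample strings.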

Regards,
Soumyaroop

On Sat, Feb 6, 2010 at 2:39 PM, soumyaroop roy <s...@cse.usf.edu> wrote:
> On Sat, Feb 6, 2010 at 12:22 PM, Steve Reinhardt <ste...@gmail.com> wrote:
>> On Sat, Feb 6, 2010 at 9:08 AM, soumyaroop roy <s...@cse.usf.edu> wrote:
>>> Let me rephrase my last question:
>>> Is there any way that a comparison could be performed between the
>>> checkpoint output by inorder with that output by a simple CPU? Say, if
>>> the number of committed instructions in both are the same when the
>>> checkpoints are dumped, should I expect that the register and memory
>>> state for both CPUs should be identical?
>>
>> I would expect so, as long as the inorder pipeline is drained and only
>> truly architectural state is checkpointed.
>
> Actually, earlier I was trying to compare the same between O3 and
> simple-atomic and instead of paying attention to
> system.cpu.commit.commitCommittedInsts I was looking at
> system.cpu.commitInsts and that's why I wasn't seeing identical
> architectural register file contents between the two. Does that seem
> reasonable? The physical memory dumps are not identical though.
> Similar observations with inorder.
>
>>> In any case, what kind of testing strategy should be put in place to
>>> test the correctness of checkpointing apart from just taking
>>> checkpoints for programs/benchmarks, resuming from those checkpoints,
>>> running the programs till completion, and finally verifying the
>>> correctness of the outputs?
>>
>> We've had discussions on this before, though I don't recall all the
>> details of what we've come up with. Certainly the basic idea of
>> taking a checkpoint, restoring from that checkpoint, and then
>> comparing with an execution that didn't stop is a common one. You
>> wouldn't have to run all the way to the end; you could dump statistics
>> and perhaps another checkpoint at a common point further on but not
>> all the way at completion. There is an issue in that the drain
>> operation will perturb the state slightly, so you can't expect the
>> results to be identical unless you either (1) artificially induce a
>> drain without a checkpoint at the same spot in the non-checkpointed
>> execution or (2) do a stats reset at a common point in both
>> simulations after the checkpoint restore. The latter seems a lot
>> simpler to me.
>
> Thanks, I will try this. Is there an option in any of the scripts to
> reset stats at an instruction?
>
>> Steve

--
Soumyaroop Roy
Ph.D. Candidate
Department of Computer Science and Engineering
University of South Florida, Tampa
http://www.csee.usf.edu/~sroy
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev