Trying O3 was just a random attempt.  My intent is to be able to use
simple timing for what I'm doing.  But checkpoint taking and restoring is
not working even in pure simple atomic mode, so that's what I'm trying to
figure out first.

Working with a simple helloworld program (and several others), I take a
checkpoint in simple atomic mode at instruction N.  The checkpoint is
written, and then the program exits because the 'thread reached max
instruction count' [which is not what I'm concerned with].  When I then
restore from that same checkpoint, still in simple atomic (and not
switching to anything else), I get
"warn: optional parameter system.cpu.workload:M5_pid not present", followed
by the program going into "**** REAL SIMULATION ****" and then a
segfault.  I'm looking into the segfault now, but am unsure what the
M5_pid warning relates to; I'm only using one CPU and one thread.
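
One way to see which parameters a checkpoint actually contains is to read
the checkpoint file directly.  The sketch below is only an illustration: it
assumes the checkpoint is an INI-style text file with one section per
simulated object (for example [system.cpu.workload]).  The function name
and the sample contents are made up for the example and are not part of m5.

```python
import configparser


def missing_checkpoint_params(cpt_text, section, names):
    """Return the entries of `names` that are absent from `section`.

    `cpt_text` is the text of an INI-style checkpoint file (assumed format).
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case, so "M5_pid" is not lowercased
    parser.read_string(cpt_text)
    if not parser.has_section(section):
        # The whole section is missing, so every name is missing too.
        return list(names)
    return [name for name in names if not parser.has_option(section, name)]


if __name__ == "__main__":
    # Hypothetical checkpoint contents, invented for illustration only.
    sample = """
[Globals]
curTick=1945913882000

[system.cpu]
instCnt=1000

[system.cpu.workload]
brk_point=4096
"""
    print(missing_checkpoint_params(sample, "system.cpu", ["locked"]))
    print(missing_checkpoint_params(sample, "system.cpu.workload", ["M5_pid"]))
```

Run against a real checkpoint file's text, a name that comes back as
missing would be one the restore path cannot find in that section, which is
the condition the "not present" warning and the "Can't unserialize" fatal
both complain about.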

I am using m5.fast, though the same bugs happen with m5.opt.  I'll be
trying m5.debug next, at least in the hopes of getting more useful info
with gdb.

In short, are the bugs I've mentioned among the common 'longstanding'
bugs that people have had to deal with?

-Griffin

On Tue, 29 Mar 2011 09:57:53 -0700, Steve Reinhardt <[email protected]>
wrote:
> In theory, these should all work, though as Ali said things will break
> if you take a checkpoint in a system with caches because the caches
> will likely have dirty memory blocks that don't get saved.  So since
> O3 doesn't work without caches, in practice you can't create a
> checkpoint from it.  But strictly speaking that's a shortcoming of the
> caches and not the CPU model.
> 
> In practice, people generally create checkpoints with atomic mode
> (since it's fast), then restore to atomic mode and switch to
> timing/detailed.  So if you're having problems with a
> checkpoint/restore in atomic mode then that's definitely a bug of some
> kind.
> 
> Problems in other modes may well be bugs too but they may be
> longstanding ones that people have just learned to work around.
> 
> Steve
> 
> On Tue, Mar 29, 2011 at 9:13 AM, Griffin Wright <[email protected]> wrote:
>> In what situations is checkpoint taking/restoring actually supported in
>> m5?
>>
>> I have tried creating and restoring checkpoints with different programs
>> in simple atomic, simple timing, and detailed (O3CPUTim) modes, and they
>> all fail due to unserialization errors somewhere, either with
>> system.cpu:locked or Globals.curTick (in the case of detailed mode).
>> I'm not sure what I'm missing, and would at least like some
>> clarification on how m5 supports checkpointing in any of these modes.
>> I've looked at various unserialize methods, and can't tell what
>> functionality they might be lacking which causes these troubles.
>>
>> In all cases, once I create the checkpoint, the program exits due to a
>> "thread reached the max instruction count", but that doesn't concern me
>> because at that point, the checkpoint has become available for use.
>>
>> Thanks,
>>
>> Griffin Wright
>>
>>
>>
>> On Sun, 27 Mar 2011 12:04:53 -0500, Ali Saidi <[email protected]> wrote:
>>
>> Why are you taking checkpoints with a timing cpu and not an atomic one?
>> It's faster and the caches don't save their state, so if you're using
>> caches with the timing CPU you'll get an incomplete checkpoint.
>>
>> Ali
>> On Mar 27, 2011, at 11:15 AM, Griffin Wright wrote:
>>
>> Hello,
>>
>> I'm working with checkpoints on simulations with an ARM_SE setup on a
>> simple timing CPU, and while I can take a checkpoint in simple timing
>> mode just fine, when I attempt to restore from a checkpoint, I get the
>> following:
>>
>> fatal: Can't unserialize 'system.cpu:locked'
>>  @ cycle 1945913882000
>> [paramIn:build/ARM_SE/sim/serialize.cc, line 211]
>> Memory Usage: 559288 KBytes
>> For more information see: http://www.m5sim.org/fatal/60de9f5a
>>
>> That link points to nothing, but that's no biggie.  I skimmed through
>> the users' archive and found some related queries, with the quote at
>> the end of this message being a solution.  I'm wondering if this is in
>> fact the only way to use the provided restore-checkpoint feature in m5,
>> or if progress has been made with regard to the simple timing CPU since
>> the post quoted below, or if I'm looking at my error in the wrong light
>> altogether.
>>
>> The code that is failing is as follows, showing that system.cpu:locked
>> cannot be unserialized:
>>
>>     if (!cp->find(section, name, str) || !parseParam(str, param)) {
>>         fatal("Can't unserialize '%s:%s'\n", section, name);
>>     }
>>
>> "Resume() is the opposite of drain() which means the
>> system can continue issuing requests and acting as normal. serialize()
>> needs to save all of the state the CPU needs to put itself in the same
>> state as it was executing and unserialize() restores that saved state.
>> Looking at other implementations of save()/restore() is the easiest
>> way to do this. Finally, if you want to be able to switch to/from the
>> inorder cpu switchOut() and takeOverFrom() need to be implemented."
>>
>> Thank you,
>> Griffin Wright
>>
>>
>> _______________________________________________
>> m5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
