I have made some progress on checkpointing.

Here are a few issues with checkpointing for O3 (which I was using as
the basis for implementing the same in inorder):

1. The _status member was not being serialized, which prevented new
events from being scheduled and left the main event queue empty
(the simple CPUs do serialize this item).
2. The list of activeThreads was not being serialized, which prevented
fetch from fetching any instructions (a "no active threads to fetch
from" sort of message; O3 goes into an infinite loop).
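
To make the two items above concrete, here is a minimal standalone
sketch of saving and restoring a status field and an active-thread
list. ToyCpu, Status and Checkpoint are made-up names for illustration
only; this is not the M5 serialization API and not the actual O3 code.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for a checkpoint section: key -> string value.
using Checkpoint = std::map<std::string, std::string>;

// Hypothetical CPU states, for illustration only.
enum class Status { Idle, Running, SwitchedOut };

struct ToyCpu {
    Status status = Status::Idle;
    std::list<int> activeThreads;

    // Write both pieces of state into the checkpoint.
    void serialize(Checkpoint &cp) const {
        cp["_status"] = std::to_string(static_cast<int>(status));
        std::ostringstream os;
        for (int tid : activeThreads)
            os << tid << ' ';
        cp["activeThreads"] = os.str();
    }

    // Restore both pieces of state. A checkpoint missing either key is
    // the failure mode described above, so insist that both are present.
    void unserialize(const Checkpoint &cp) {
        assert(cp.count("_status") && cp.count("activeThreads"));
        status = static_cast<Status>(std::stoi(cp.at("_status")));
        activeThreads.clear();
        std::istringstream is(cp.at("activeThreads"));
        for (int tid; is >> tid; )
            activeThreads.push_back(tid);
    }
};
```

With both fields restored, a resumed CPU in this toy model starts out
Running with a non-empty thread list instead of the default Idle state
with no threads.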

But fixing the two items above did not solve the problem. I figured
(from the takeOverFrom() routines) that the commit stage needs to reset
its flags so that it does not go and squash the first instruction
where the restoration is supposed to start from. Since I am not very
familiar with the O3 code, I did not spend much time looking into it.

For inorder, I added a couple more things:
1. In the resume routine, I added logic to activate all the active
threads in all resources in the resource pool.
2. The nextNPC value is maintained separately in inorder and,
therefore, when the thread contexts are serialized (using the base
class SimpleThread), an incorrect value is stored in the checkpoint.

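    // Next-next PC. Only tracked when the ISA has a delay slot;
    // otherwise it is derived from nextPC.
    uint64_t readNextNPC()
    {
#if ISA_HAS_DELAY_SLOT
        return nextNPC;
#else
        return nextPC + sizeof(TheISA::MachInst);
#endif
    }

    // Without a delay slot, val is ignored and nothing is stored.
    void setNextNPC(uint64_t val)
    {
#if ISA_HAS_DELAY_SLOT
        nextNPC = val;
#endif
    }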


I currently have an ugly fix for it.
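
For illustration only (this is not the actual patch), one way to handle
this kind of mismatch is to copy the CPU-side value into the thread
context just before the thread context serializes itself. The sketch
below is standalone; ToyThreadContext, ToyInOrderCpu and Checkpoint are
hypothetical names, and only nextNPC is synced because that is the only
value known to diverge here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a checkpoint section: key -> value.
using Checkpoint = std::map<std::string, uint64_t>;

// Stand-in for the base thread context, which serializes its own copy
// of nextNPC.
struct ToyThreadContext {
    uint64_t nextNPC = 0;

    void serialize(Checkpoint &cp) const { cp["nextNPC"] = nextNPC; }

    void unserialize(const Checkpoint &cp) {
        assert(cp.count("nextNPC"));
        nextNPC = cp.at("nextNPC");
    }
};

// Stand-in for a CPU model that tracks nextNPC itself, so the copy in
// the thread context can be stale.
struct ToyInOrderCpu {
    uint64_t nextNPC = 0;
    ToyThreadContext tc;

    void serialize(Checkpoint &cp) {
        // Sync the CPU-side value into the thread context first, so the
        // checkpoint does not pick up the stale copy.
        tc.nextNPC = nextNPC;
        tc.serialize(cp);
    }
};
```

Without the sync line in this toy model, the checkpoint would record
whatever value the thread context last held, which is the symptom
described in item 2.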

So, now I am seeing inorder proceed for about 100 instructions, after
which the PC is set to 0x0 (following a squash). I have to look into
it later. Which trace flags should I use to see the actual
instructions?

regards,
Soumyaroop

On Mon, Feb 8, 2010 at 1:09 PM, soumyaroop roy <s...@cse.usf.edu> wrote:
> Thanks for creating the wiki page. I will populate it after I am done
> testing checkpoint restoration. Waiting for a resolution on the
> restoration issue.
>
> regards,
> Soumyaroop
>
> On 2/8/10, Korey Sewell <ksew...@umich.edu> wrote:
>>
>> > Yes, that is absolutely correct! The list of instructions to be
>> > removed during a tick by the cleanUpRemovedInsts() routine comprises
>> > both instructions that graduate and instructions that get squashed!
>> > And, cleanUpRemovedInsts() just removes the instruction schedule
>> > associated with an instruction (and its register dependencies) before
>> > removing it from "instList" (the global list of all instructions in
>> > flight in the pipeline). So, it appears ok to add the logic to drain
>> > the cpu here. Could you look into cleanUpRemovedInsts() one more
>> > time?
>> >
>> OK, that looks right. The "removeList" kind of serves as the
>> synchronizing point for all squashed and graduated instructions and
>> that cleanUp function traverses that list. I was remarking on memory
>> mgmt., but I forgot that those are ref-counted pointers (for the
>> insts), so as soon as they are taken off the instruction list they
>> should delete themselves.
>>
>> So after all the instructions are "cleaned" in that function, it looks
>> safe to add some drain logic that will check all the per-thread global
>> instruction lists and if they are all empty then signal the drain
>> complete.
>>
>> >
>> > I was looking into the takeOverFrom() routine and it appears to
>> > involve a lot more work! But, I'll work on it later.
>> >
>> In the interim, checkpointing from/to the same cpu model is a proof of
>> concept and then switching between models could be the next step.
>>
>> One last thing: If you get some time, can you add some details to the
>> WIKI about Checkpointing for the InOrder model? This type of stuff
>> gets lost in the development process easily but if someone just takes
>> notes as they go along, then the next person who is interested in
>> checkpointing has a faster path to understanding the code and
>> eventually tweaking the process if need be (especially if that next
>> person is me :) ... )
>>
>> I created a page there that we can edit as the solution gets panned out:
>> http://m5sim.org/wiki/index.php/InOrder.
>>
>> --
>> - Korey
>>
>> _______________________________________________
>>  m5-dev mailing list
>>  m5-...@m5sim.org
>>  http://m5sim.org/mailman/listinfo/m5-dev
>>
>>
>
>
> --
> Soumyaroop Roy
> Ph.D. Candidate
> Department of Computer Science and Engineering
> University of South Florida, Tampa
> http://www.csee.usf.edu/~sroy
>



-- 
Soumyaroop Roy
Ph.D. Candidate
Department of Computer Science and Engineering
University of South Florida, Tampa
http://www.csee.usf.edu/~sroy
