Hi Tim,

I have not been completely following this thread, but I can answer your question about unserializing cache contents.

The benefit of creating a trace, rather than just inserting data into the cache, is twofold. First, by creating a trace from a very large cache system, one can warm up caches of different sizes, associativities, and even completely different cache hierarchies/configurations from a single trace. Second, and probably more important, Ruby protocols rely on timing requests to set cache block state to the unique states used by a particular protocol. Ruby is often used to compare different protocols, and this process allows us to compare protocols using the exact same checkpoint.

I hope that helps,
Brad

-----Original Message-----
From: gem5-dev [mailto:gem5-dev-boun...@gem5.org] On Behalf Of Timothy M Jones
Sent: Wednesday, June 17, 2015 3:16 AM
To: gem5 Developer List
Subject: Re: [gem5-dev] Ruby serialize removing event queue head

Thanks Nilay and Joel for the information.

I've been playing around with this over the past few days and I can't work out what the point of the flush is. The CacheRecorder already has a copy of all the data blocks in the trace before the flush starts. Removing the flush event and the subsequent simulation produces exactly the same system.ruby.cache.gz file as leaving them in, so I guess it's safe to remove them.

So, with that out of the way, I can create checkpoints and exit the simulator correctly. I'm not 100% sure about restoring the checkpoint, though, and it seems a little hacky. Is there a reason it has to unserialise by inserting memory requests into the event queue? Couldn't it just write the data directly into the correct locations in the caches?

There's also a question about whether ruby should be recording its state at all. Shouldn't it do the same as the classic memory system caches and implement memWriteback() to flush all dirty data out before checkpointing happens? Then it wouldn't need to trace anything. (Maybe I'm opening a can of worms, but I thought I'd just ask!)
Cheers
Tim

On 13/06/2015 18:03, Joel Hestness wrote:
> Hey guys,
>
> I'm pretty sure Tim is correct that the checkpointing bugs were
> introduced earlier than the changeset Nilay points to; gem5-gpu is
> currently using gem5 rev 10645
> <http://repo.gem5.org/gem5/rev/cd95d4d51659>, and we cannot get
> reliable checkpoint and restore with it. Note that Tim's bug may not
> be the only checkpointing bug that exists right now.
>
> To answer Tim's question: while taking a checkpoint, Ruby
> commandeers the event queue to inject flushing memory accesses into
> the caches. This is used to generate a trace of cache contents, which
> can be used to warm up the caches on checkpoint restore. To take over
> control of the event queue, Ruby clears the event at the queue head (I
> think this assumes there is only one event in the queue? This should
> probably be checked), and then adds its own event for the cache
> flushing operation. After the caches have been flushed (the simulate()
> call in RubySystem::serialize()), Ruby restores the head event that
> was in the queue and rolls back the current tick.
>
> One way to check whether this cooldown operation is at fault for
> unreliable checkpointing is to simply comment out the event queue
> commandeering and try to take a checkpoint. You may also be able to
> test checkpoint restore by commenting out the cache warm-up code in
> RubySystem::unserialize(). If checkpoint and restore work without the
> event queue commandeering, it is likely that the event queue
> manipulation is problematic.
>
> I'd also recommend trying to take a checkpoint and restore while
> specifying the gem5 flag --debug-flags=RubyCacheTrace, which will show
> what the cache flushing and warm-up are doing, respectively.
>
> Joel
>
> On Sat, Jun 13, 2015 at 9:48 AM, Nilay Vaish <ni...@cs.wisc.edu> wrote:
>
> > Your bisection is not right.
> > You might want to take a look at the following changeset:
> >
> >   changeset:   10756:f9c0692f73ec
> >   user:        Curtis Dunham <curtis.dun...@arm.com>
> >   date:        Mon Mar 23 06:57:36 2015 -0400
> >   summary:     sim: Reuse the same limit_event in simulate()
> >
> > I suggest that you revert this changeset in your repo while I think
> > about what needs to be done.
> >
> > --
> > Nilay
> >
> > On Sat, 13 Jun 2015, Timothy M Jones wrote:
> >
> > > Hi again,
> > >
> > > Further to this message, I've used hg bisect to find the
> > > revision that breaks checkpointing with ruby. The first bad
> > > changeset is revision 10524, which Nilay committed in November.
> > > It fails with the panic() on the missing event that I wrote
> > > about previously.
> > >
> > > I've scanned through the diff and can't immediately see any
> > > reason why this would break serialisation, although it does
> > > remove some of the code to serialise ruby state.
> > >
> > > Could anyone (Nilay?) give me a hint as to why this might break
> > > checkpointing with ruby?
> > >
> > > I've compiled with the MOESI_hammer protocol for x86, then run
> > > with this command line:
> > >
> > >   ./build/X86/gem5.opt --remote-gdb-port=0 -d <outdir> \
> > >     configs/example/fs.py -n 1 --kernel <my-kernel> \
> > >     --script configs/boot/hack_back_ckpt.rcS --max-checkpoints 1 \
> > >     --checkpoint-dir <cptdir> --disk-image <my-disk-image> \
> > >     --cpu-type timing --restore-with timing --ruby
> > >
> > > Any help would be appreciated. I don't know ruby at all, so
> > > trying to work out what's going on is slow.
> > >
> > > Cheers
> > > Tim
> > >
> > > On 11/06/2015 20:48, Timothy M Jones wrote:
> > >
> > > > Hello,
> > > >
> > > > Could someone tell me why we need to take the head event off the
> > > > event queue in RubySystem::serialize() in
> > > > src/mem/ruby/system/System.cc?
> > > >
> > > >     Event* eventq_head = eventq->replaceHead(NULL);
> > > >
> > > > The problem I'm getting is that when simulate() is called a few
> > > > lines later, it tries to reschedule the simulate_limit_event, but
> > > > that causes a panic because it's no longer on the event queue.
> > > > This is happening when trying to take a checkpoint with ruby. I
> > > > can't work out from the comments why the head event needs to be
> > > > taken off in the first place.
> > > >
> > > > This is basically the reason behind the problems in this thread:
> > > >
> > > > https://www.mail-archive.com/gem5-users@gem5.org/msg11701.html
> > > >
> > > > Thanks
> > > > Tim
> > > >
> > > > --
> > > > Timothy M. Jones
> > > > http://www.cl.cam.ac.uk/~tmj32/
> > > > _______________________________________________
> > > > gem5-dev mailing list
> > > > gem5-dev@gem5.org
> > > > http://m5sim.org/mailman/listinfo/gem5-dev
>
> --
> Joel Hestness
> PhD Candidate, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://pages.cs.wisc.edu/~hestness/

--
Timothy M. Jones
http://www.cl.cam.ac.uk/~tmj32/
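Brad's first point, that one trace recorded from a large cache can warm up caches of other sizes and associativities, can be illustrated with a toy model. The sketch below is not Ruby's CacheRecorder or its trace format: ToyCache, Trace and warmUp() are hypothetical names, the trace is assumed to be a plain ordered list of byte addresses, and replacement is plain LRU. It deliberately ignores coherence state, which is Brad's second, protocol-specific reason for replaying timing requests instead of writing blocks into the caches directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Hypothetical trace format: the ordered byte addresses of the blocks
// recorded from a large cache. Not Ruby's actual trace format.
using Trace = std::vector<std::uint64_t>;

// Hypothetical toy set-associative cache with LRU replacement.
class ToyCache {
  public:
    ToyCache(std::size_t numSets, std::size_t assoc, std::size_t blockBytes)
        : sets_(numSets), assoc_(assoc), blockBytes_(blockBytes) {
        assert(numSets > 0 && assoc > 0 && blockBytes > 0);
    }

    // Touch one address; returns true on a hit, false on a miss (fill).
    bool access(std::uint64_t addr) {
        std::uint64_t block = addr / blockBytes_;
        std::list<std::uint64_t> &set = sets_[block % sets_.size()];
        for (auto it = set.begin(); it != set.end(); ++it) {
            if (*it == block) {
                set.splice(set.begin(), set, it);  // move to MRU position
                return true;
            }
        }
        set.push_front(block);       // fill as most recently used
        if (set.size() > assoc_)
            set.pop_back();          // evict the least recently used block
        return false;
    }

    // True if the block holding addr is currently resident.
    bool contains(std::uint64_t addr) const {
        std::uint64_t block = addr / blockBytes_;
        const std::list<std::uint64_t> &set = sets_[block % sets_.size()];
        for (std::uint64_t b : set)
            if (b == block)
                return true;
        return false;
    }

  private:
    std::vector<std::list<std::uint64_t>> sets_;  // front = most recently used
    std::size_t assoc_;
    std::size_t blockBytes_;
};

// Warm up a cache of any geometry by replaying the same recorded trace.
void warmUp(ToyCache &cache, const Trace &trace) {
    for (std::uint64_t addr : trace)
        cache.access(addr);
}
```

For example, replaying the trace {0x0, 0x40, 0x80, 0x1000, 0x40} with 64-byte blocks leaves all four blocks resident in a 64-set, 8-way cache, while a cache with a single 2-way set ends up holding only the blocks at 0x40 and 0x1000: one trace, two different warm states.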
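Joel's description of how RubySystem::serialize() takes over the event queue (set the pending events aside, run the flush, restore them and roll back the tick), and the panic Tim reports when simulate() reschedules simulate_limit_event, can be explored with a toy model. This is not gem5's EventQueue: ToyEvent, ToyEventQueue, detachAll() and commandeerAndFlush() are hypothetical names. detachAll() assumes that the whole pending list is set aside at once, which is exactly the replaceHead() behaviour Joel suggests checking, so treat it as an assumption rather than a description of gem5.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Tick = std::uint64_t;

// Hypothetical toy event, not gem5's Event class.
struct ToyEvent {
    std::string name;
    Tick when = 0;
    bool scheduled = false;
    std::function<void()> action;
};

// Hypothetical toy event queue: a tick-ordered list of event pointers.
class ToyEventQueue {
  public:
    Tick curTick = 0;

    void schedule(ToyEvent *ev, Tick when) {
        ev->when = when;
        ev->scheduled = true;
        auto it = events_.begin();
        while (it != events_.end() && (*it)->when <= when)
            ++it;  // keep equal-tick events in FIFO order
        events_.insert(it, ev);
    }
    // Removing an event this queue does not hold is an error, loosely
    // mirroring the "missing event" panic reported in the thread.
    void deschedule(ToyEvent *ev) {
        for (auto it = events_.begin(); it != events_.end(); ++it) {
            if (*it == ev) {
                events_.erase(it);
                ev->scheduled = false;
                return;
            }
        }
        throw std::logic_error("event not found: " + ev->name);
    }
    void reschedule(ToyEvent *ev, Tick when) {
        if (ev->scheduled)
            deschedule(ev);
        schedule(ev, when);
    }
    // Set every pending event aside; they still believe they are scheduled.
    std::list<ToyEvent *> detachAll() {
        std::list<ToyEvent *> old;
        old.swap(events_);
        return old;
    }
    // Put previously detached events back (the queue must be drained first).
    void reattach(std::list<ToyEvent *> saved) {
        assert(events_.empty());
        events_ = std::move(saved);
    }
    // Run events in tick order until none are left.
    void runAll() {
        while (!events_.empty()) {
            ToyEvent *ev = events_.front();
            events_.pop_front();
            ev->scheduled = false;
            curTick = ev->when;
            if (ev->action)
                ev->action();
        }
    }
    std::size_t size() const { return events_.size(); }

  private:
    std::list<ToyEvent *> events_;
};

// Save the queue, run only the flush events, then restore the saved
// events and roll back the tick. Returns how many flush events ran.
int commandeerAndFlush(ToyEventQueue &q, int numFlushes) {
    std::list<ToyEvent *> saved = q.detachAll();
    const Tick savedTick = q.curTick;
    int ran = 0;
    std::vector<ToyEvent> flushes(numFlushes);  // sized once: pointers stay valid
    for (int i = 0; i < numFlushes; ++i) {
        flushes[i].name = "flush" + std::to_string(i);
        flushes[i].action = [&ran] { ++ran; };
        q.schedule(&flushes[i], savedTick + 1 + i);
    }
    q.runAll();              // drains only the flush events
    q.curTick = savedTick;   // roll back the current tick
    q.reattach(std::move(saved));
    return ran;
}
```

With a limit event pending at tick 100, commandeerAndFlush(q, 3) runs three flush events, rolls curTick back to 0 and leaves the limit event queued again. If instead something calls reschedule() on the limit event while it is set aside, deschedule() cannot find it and throws, which is loosely the shape of the panic Tim describes.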