Hi Joel,

Thank you for looking into this. Since you say you have already partially fixed the issue, I'll wait for the final patches.
-- Marco

On 31/08/12 20:52, Joel Hestness wrote:
> Hi Marco,
>   Thanks for sending this. Based on what I see here, I'm pretty
> confident that one of my new patches will fix this issue.
> Unfortunately, sending you that patch would only get you to the next
> bug that currently exists in Ruby's draining functionality.
>
>   It appears as though these deeper bugs in Ruby were introduced when
> the gem5 ports were split to introduce queued ports (changeset
> 8914:8c3bd7bea667). I'm hoping to have this all sorted out over the
> next couple days, but if you have an urgent need to run simulations,
> you could try running sims from before the queued port change
> (% hg update -r 8913, recompile, run).
>
>   I'll keep you posted on debugging. Thanks again,
>   Joel
>
> On Fri, Aug 31, 2012 at 2:34 PM, Marco Elver <[email protected]> wrote:
>> Hi Joel,
>>
>> I ran with 1 CPU and 16 CPUs and get essentially the same result.
>>
>> Attachments:
>> - gdb-n1.log: Terminal output of gdb session for the 1 CPU case.
>> - gdb-n16.log: Terminal output of gdb session for 16 CPU case.
>> - gem5-n1.log.bz2: Gem5 output for 1 CPU case.
>> - gem5-n16.log.bz2: Gem5 output for 16 CPU case.
>>
>> Both of them crash right after printing information about
>> "[...]Got long mode PDP entry[...]".
>>
>> I hope the gdb and gem5 output logs are sufficient for you to
>> replicate this bug; my current hg parent is 9181:42807286d6cb,
>> when the patch mentioned below was applied.
>>
>> -- Marco
>>
>> On 30/08/12 22:30, Joel Hestness wrote:
>>> Hi Marco,
>>>   I'm currently trying to track down bugs in checkpoint restore
>>> to get x86+Ruby+O3CPU working, and I'm having trouble replicating
>>> your bug. Could you please compile
>>> build/X86_MOESI_hammer/gem5.debug and run the same tests you have
>>> here to grab this backtrace? Also, can you collect and restore
>>> from checkpoint with a single CPU core and see what happens?
>>>
>>>   Thanks!
>>>   Joel
>>>
>>> On Wed, Aug 29, 2012 at 5:11 PM, Marco Elver <[email protected]> wrote:
>>>> Thank you, with the patch I can confirm that the assertion
>>>> problem has been fixed (after recreating the checkpoint).
>>>>
>>>> My problems with the O3CPU persist, and I was wondering whether
>>>> this is a problem specific to X86 or a general problem?
>>>>
>>>> -- Marco
>>>>
>>>> On 28/08/12 21:28, Nilay Vaish wrote:
>>>>> The cause of the assert failure was tracked down recently by
>>>>> Jason Power. The patch is on the review board. Here is the link -
>>>>> http://reviews.gem5.org/r/1365
>>>>>
>>>>> It will be committed to the mainline soon.
>>>>>
>>>>> --
>>>>> Nilay
>>>>>
>>>>> On Tue, 28 Aug 2012, Marco Elver wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I would like to ask if what I am trying to do is even possible
>>>>>> (and if so, how??), as I have been running into a few problems,
>>>>>> despite following the advice I could find in older mailing-list
>>>>>> threads or the wiki. My goal would be to run a full-system with
>>>>>> ruby (with MOESI_CMP_directory), multiple processors of type
>>>>>> O3CPU and the X86 ISA; I create a snapshot after the Linux
>>>>>> kernel loaded and before the benchmark enters the ROI.
>>>>>>
>>>>>> With revision 9174:2171e04a2ee5 (Mon Aug 27 20:53:20 2012 -0400)
>>>>>> from the dev repository, I tried the following:
>>>>>>
>>>>>> (1) Take a checkpoint with ruby_fs, the *MOESI_hammer* protocol
>>>>>> (only one supporting checkpoints, according to Wiki) and the
>>>>>> TimingSimpleCPU (succeeds):
>>>>>>
>>>>>>     $> build/X86_MOESI_hammer/gem5.opt
>>>>>>        --outdir=m5out/rawdata/fluidanimate/ckpt
>>>>>>        configs/example/ruby_fs.py -n 16 --cpu-type=timing
>>>>>>        --kernel=system/x86_64-vmlinux-2.6.28.smp
>>>>>>        --checkpoint-dir=m5out/checkpoints/fluidanimate
>>>>>>        --max-checkpoints=1
>>>>>>        --script=contrib/initscripts/parsec/fluidanimate.sh
>>>>>>
>>>>>> (2) Resume from the checkpoint with the O3CPU, restore with
>>>>>> TimingSimpleCPU (fails):
>>>>>>
>>>>>>     $> build/X86_MOESI_hammer/gem5.opt
>>>>>>        --outdir=m5out/rawdata/fluidanimate/detailed
>>>>>>        configs/example/ruby_fs.py -n 16 --cpu-type=detailed
>>>>>>        --kernel=system/x86_64-vmlinux-2.6.28.smp
>>>>>>        --checkpoint-dir=m5out/checkpoints/fluidanimate -r 0
>>>>>>        --restore-with-cpu=timing
>>>>>>     [...]
>>>>>>     Switch at curTick count:10000
>>>>>>     info: Entering event queue @ 0. Starting simulation...
>>>>>>     Runtime Error at MOESI_hammer-dir.sm:1270, Ruby Time:
>>>>>>     1111185: assert failure, PID: 2742
>>>>>>     press return to continue.
>>>>>>
>>>>>>     Program aborted at cycle 555592500
>>>>>>
>>>>>> (3) Resume from the checkpoint with the TimingSimpleCPU fails in
>>>>>> the same way as (2), as in (2) the CPU isn't even switched to
>>>>>> the O3CPU before it fails.
>>>>>>
>>>>>> (4) Though if I try taking a snapshot right after starting the
>>>>>> simulator (after ~ 10000000000 cycles, kernel still booting) and
>>>>>> then try to restore with the TimingSimpleCPU, it works as
>>>>>> expected; only the O3CPU fails with a segfault and the following
>>>>>> backtrace:
>>>>>>
>>>>>>     #0  0x0000000000cdff56 in MasterPort::sendTimingReq
>>>>>>         (this=<optimized out>, pkt=0x6f8a060)
>>>>>>         at build/X86/mem/port.cc:136
>>>>>>     #1  0x00000000005fbac5 in sendTiming (pkt=0x6f8a060,
>>>>>>         sendingState=0x61a7cc0, this=0x49a9e60)
>>>>>>         at build/X86/arch/x86/pagetable_walker.cc:173
>>>>>>     #2  X86ISA::Walker::WalkerState::sendPackets (this=0x61a7cc0)
>>>>>>         at build/X86/arch/x86/pagetable_walker.cc:631
>>>>>>     #3  0x00000000005fc8c2 in
>>>>>>         X86ISA::Walker::WalkerState::recvPacket
>>>>>>         (this=this@entry=0x61a7cc0, pkt=pkt@entry=0x1e99920)
>>>>>>         at build/X86/arch/x86/pagetable_walker.cc:590
>>>>>>     #4  0x00000000005fcb98 in X86ISA::Walker::recvTimingResp
>>>>>>         (this=0x43706c0, pkt=0x1e99920)
>>>>>>         at build/X86/arch/x86/pagetable_walker.cc:129
>>>>>>     #5  0x0000000000ce1f5b in PacketQueue::trySendTiming
>>>>>>         (this=0x42ba5e0)
>>>>>>         at build/X86/mem/packet_queue.cc:152
>>>>>>     #6  0x0000000000ce2929 in PacketQueue::sendDeferredPacket
>>>>>>         (this=0x42ba5e0)
>>>>>>         at build/X86/mem/packet_queue.cc:190
>>>>>>     #7  0x0000000000c391be in EventQueue::serviceOne
>>>>>>         (this=<optimized out>) at build/X86/sim/eventq.cc:204
>>>>>>     #8  0x0000000000c7d342 in simulate
>>>>>>         (num_cycles=9223372036854785807)
>>>>>>         at build/X86/sim/simulate.cc:71
>>>>>>     #9  0x0000000000b8e17c in _wrap_simulate__SWIG_0
>>>>>>         (args=<optimized out>)
>>>>>>         at build/X86/python/swig/event_wrap.cc:4755
>>>>>>     #10 _wrap_simulate (self=<optimized out>,
>>>>>>         args=<optimized out>)
>>>>>>         at build/X86/python/swig/event_wrap.cc:4804
>>>>>>     #11 0x00007fb32a094fc6 in PyEval_EvalFrameEx () from
>>>>>>         /lib/libpython2.7.so.1.0
>>>>>>
>>>>>> Trying to restore with ruby using MOESI_CMP_directory and the
>>>>>> TimingSimpleCPU results in the same error as (2), with the
>>>>>> difference that it finishes loading the checkpoint, resumes, but
>>>>>> then fails after about a minute ("Runtime Error at
>>>>>> MOESI_CMP_directory-dir.sm:485, Ruby Time: 12038425921: assert
>>>>>> failure, PID: 19169"). Using the O3CPU still results in the same
>>>>>> error as (4).
>>>>>>
>>>>>> In addition, I have seen workflows of: 1) create checkpoint
>>>>>> without ruby and with the AtomicSimpleCPU 2) load checkpoint
>>>>>> with ruby and the TimingSimpleCPU. I tried this, and it works if
>>>>>> I set --restore-with-cpu=timing. But trying this with the O3CPU
>>>>>> doesn't work, resulting in the same backtrace as (4).
>>>>>>
>>>>>> Is what I'm trying to do possible? If so, any workarounds I
>>>>>> should know of?
>>>>>>
>>>>>> Thanks,
>>>>>> Marco
>>>
>>> --
>>> Joel Hestness
>>> PhD Student, Computer Architecture
>>> Dept. of Computer Science, University of Wisconsin - Madison
>>> Dept. of Computer Science, University of Texas - Austin
>>> http://www.cs.utexas.edu/~hestness
>
> --
> Joel Hestness
> PhD Student, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://www.cs.utexas.edu/~hestness
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
_______________________________________________ gem5-users mailing list [email protected] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
