Another follow-up on this: the "deadlock_threshold" parameter doesn't
propagate to the MemTester CPU.
So when I'm testing 64 CPUs, memtester.cc still has this code:
"
    if (!tickEvent.scheduled())
        schedule(tickEvent, curTick() + ticks(1));

    if (++noResponseCycles >= 500000) {
        if (issueDmas) {
            cerr << "DMA tester ";
        }
        cerr << name() << ": deadlocked at cycle " << curTick() << endl;
        fatal("");
    }
"
That hardcoded 500000 is not a great number (as people have said): as your
topologies and memory hierarchies change, the maximum number of cycles that
you have to wait for a response can also change, right?
Increasing that number by hand is an arduous thing to do, so maybe it should
come from a parameter, and maybe we should also "warn" there that a
deadlock is possible after some inordinate wait time.
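
To make the parameter idea concrete, here is a rough standalone sketch (not
a patch against memtester.cc). The class name ResponseWatchdog and the
warnInterval / deadlockThreshold parameters are made up for illustration;
they are not existing MemTest parameters. It also assumes the counter is
reset whenever a response comes back. It warns every warnInterval cycles
and only reports a deadlock once deadlockThreshold is reached (0 disables
either check):

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical helper, NOT part of memtester.cc: counts cycles without a
// response, warns periodically, and flags a deadlock only once a
// configurable threshold is reached.
class ResponseWatchdog
{
  public:
    enum class Status { Ok, Warned, Deadlocked };

    // warnInterval and deadlockThreshold are assumed parameters; a value
    // of 0 disables the corresponding check.
    ResponseWatchdog(std::string name, uint64_t warnInterval,
                     uint64_t deadlockThreshold)
        : name_(std::move(name)), warnInterval_(warnInterval),
          deadlockThreshold_(deadlockThreshold)
    {}

    // Call once for every tick in which no response arrived.
    Status tickWithoutResponse()
    {
        ++noResponseCycles_;
        if (deadlockThreshold_ != 0 &&
            noResponseCycles_ >= deadlockThreshold_) {
            std::cerr << name_ << ": possible deadlock after "
                      << noResponseCycles_ << " cycles\n";
            return Status::Deadlocked;
        }
        if (warnInterval_ != 0 &&
            noResponseCycles_ % warnInterval_ == 0) {
            std::cerr << "warn: " << name_ << " has waited for "
                      << noResponseCycles_ << " cycles\n";
            return Status::Warned;
        }
        return Status::Ok;
    }

    // Call when a response comes back (assumed to reset the counter).
    void responseReceived() { noResponseCycles_ = 0; }

    uint64_t noResponseCycles() const { return noResponseCycles_; }

  private:
    std::string name_;
    uint64_t warnInterval_;
    uint64_t deadlockThreshold_;
    uint64_t noResponseCycles_ = 0;
};
```

In the tester, the caller would decide what to do with Deadlocked (for
example call fatal()), so the threshold could be set per configuration
instead of being edited by hand.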
The fix could be as simple as warning about a long wait after an inordinate
period. Something like this, I think:
"
    if (++noResponseCycles % 500000 == 0) {
        warn("cpu X has waited for %i cycles", noResponseCycles);
    }
"
Lastly, should the memtester really send out a memory access on every tick?
The actual injection rate could be much higher than the rate at which we
resolve contention.
Maybe we should consider capping each CPU at X outstanding requests, as a
more realistic measure that can still stress the system without making the
noResponseCycles stat (?) grow to such a high number.
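
A rough sketch of the "X outstanding requests per CPU" idea, again
standalone and not taken from memtester.cc. OutstandingLimiter and
maxOutstanding are hypothetical names. The tester would call tryIssue()
before sending an access and complete() when the response arrives, and
would simply skip issuing on ticks where the cap is already reached:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, NOT part of memtester.cc: caps the number of
// in-flight requests a single tester CPU may have at any time.
class OutstandingLimiter
{
  public:
    explicit OutstandingLimiter(std::size_t maxOutstanding)
        : max_(maxOutstanding)
    {}

    // Returns true and records the request if there is room; returns
    // false if the cap is reached and the caller should not issue.
    bool tryIssue()
    {
        if (outstanding_ >= max_)
            return false;
        ++outstanding_;
        return true;
    }

    // Call when a response arrives; frees one slot.
    void complete()
    {
        if (outstanding_ > 0)
            --outstanding_;
    }

    std::size_t outstanding() const { return outstanding_; }

  private:
    std::size_t max_;
    std::size_t outstanding_ = 0;
};
```

With a cap like this the injection rate is tied to the response rate, so a
long wait would more likely point to a real protocol problem than to plain
burstiness.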
On Mon, Feb 7, 2011 at 1:27 PM, Beckmann, Brad <[email protected]> wrote:
> Yep, if I increase the deadlock threshold to 5 million cycles, the deadlock
> warning is not encountered. However, I don't think that we should increase
> the default deadlock threshold by an order of magnitude. Instead, let's
> just increase the threshold for the mem tester. How about I check in the
> following small patch?
>
> Brad
>
>
> diff --git a/configs/example/ruby_mem_test.py
> b/configs/example/ruby_mem_test.py
> --- a/configs/example/ruby_mem_test.py
> +++ b/configs/example/ruby_mem_test.py
> @@ -135,6 +135,12 @@
> cpu.test = system.ruby.cpu_ruby_ports[i].port
> cpu.functional = system.funcmem.port
>
> + #
> + # Since the memtester is incredibly bursty, increase the deadlock
> + # threshold to 5 million cycles
> + #
> + system.ruby.cpu_ruby_ports[i].deadlock_threshold = 5000000
> +
> for (i, dma) in enumerate(dmas):
> #
> # Tie the dma memtester ports to the correct functional port
> diff --git a/tests/configs/memtest-ruby.py b/tests/configs/memtest-ruby.py
> --- a/tests/configs/memtest-ruby.py
> +++ b/tests/configs/memtest-ruby.py
> @@ -96,6 +96,12 @@
> #
> cpus[i].test = ruby_port.port
> cpus[i].functional = system.funcmem.port
> +
> + #
> + # Since the memtester is incredibly bursty, increase the deadlock
> + # threshold to 5 million cycles
> + #
> + ruby_port.deadlock_threshold = 5000000
>
> # -----------------------
> # run simulation
>
>
>
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]]
> > On Behalf Of Nilay Vaish
> > Sent: Monday, February 07, 2011 9:12 AM
> > To: M5 Developer List
> > Subject: Re: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory
> > protocol
> >
> > Brad, I also see the protocol getting into a deadlock. I tried to get a
> > trace, but I get a segmentation fault (yes, the segmentation fault only
> > occurs when the trace flag ProtocolTrace is supplied). It seems to me that
> > memory is getting corrupted somewhere, because the fault occurs in malloc
> > itself.
> >
> > It could be that the protocol is actually not in a deadlock. Both Arka and
> > I had increased the deadlock threshold while testing the protocol. I will
> > try with an increased threshold later in the day.
> >
> > One more thing: the Orion 2.0 code that was committed last night makes use
> > of printf(). It did not compile cleanly for me. I had to change it to
> > fatal() and include the header file base/misc.hh.
> >
> > --
> > Nilay
> >
> > On Mon, 7 Feb 2011, Beckmann, Brad wrote:
> >
> > > FYI: if my local regression tests are correct, this patch does not
> > > fix all the problems with the MESI_CMP_directory protocol. One of the
> > > patches I just checked in fixes a subtle bug in the ruby_mem_test.
> > > Fixing this bug exposes more deadlock problems in the
> > > MESI_CMP_directory protocol.
> > >
> > > To reproduce the regression tester's sequencer deadlock error, set the
> > > Randomization flag to false in the file
> > > configs/example/ruby_mem_test.py then run the following command:
> > >
> > > build/ALPHA_SE_MESI_CMP_directory/m5.debug
> > > configs/example/ruby_mem_test.py -n 8
> > >
> > > Let me know if you have any questions,
> > >
> > > Brad
> > >
> > >
> > >> -----Original Message-----
> > >> From: [email protected] [mailto:m5-dev-
> > [email protected]] On
> > >> Behalf Of Nilay Vaish
> > >> Sent: Thursday, January 13, 2011 8:50 PM
> > >> To: [email protected]
> > >> Subject: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory
> > >> protocol
> > >>
> > >> changeset 8f37a23e02d7 in /z/repo/m5
> > >> details: http://repo.m5sim.org/m5?cmd=changeset;node=8f37a23e02d7
> > >> description:
> > >> Ruby: Fixes MESI CMP directory protocol
> > >> The current implementation of MESI CMP directory protocol is broken.
> > >> This patch, from Arkaprava Basu, fixes the protocol.
> > >>
> > >> diffstat:
> > >>
> >
> > _______________________________________________
> > m5-dev mailing list
> > [email protected]
> > http://m5sim.org/mailman/listinfo/m5-dev
>
>
--
- Korey