These are the steps I use:
1. First, run with the default threshold values.
2. If it deadlocks, take a trace and check whether there is an evident
reason for the deadlock.
3. If not, double the default threshold value and run again.
4. If the same test passes with the larger threshold, the deadlock was
not actually there, so life is good. If not, I need to dig further
into the trace to see what's going on.
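
The steps above can be sketched roughly as follows. This is only an
illustration, not gem5 code: `run_test` and `trace_explains_deadlock`
are hypothetical hooks standing in for "run the tester with this
threshold" and "look at the trace by hand", and the threshold numbers
in the example are arbitrary.

```python
from typing import Callable


def triage(
    run_test: Callable[[int], bool],
    default_threshold: int,
    trace_explains_deadlock: Callable[[], bool] = lambda: False,
) -> str:
    """Classify a possible-deadlock report using the four steps above.

    run_test(threshold) is a hypothetical hook that returns True when the
    run finishes without a deadlock report at that threshold.
    trace_explains_deadlock() is a hypothetical stand-in for manually
    reading the protocol trace.
    """
    # Step 1: run with the default threshold.
    if run_test(default_threshold):
        return "passed"

    # Step 2: a deadlock was reported; check the trace for an evident cause.
    if trace_explains_deadlock():
        return "real deadlock (evident in trace)"

    # Step 3: no evident cause, so double the threshold and run again.
    if run_test(2 * default_threshold):
        # Step 4: passing with the larger threshold means a false alarm.
        return "false alarm"

    # Step 4 (otherwise): still failing, so more trace analysis is needed.
    return "dig deeper into trace"


if __name__ == "__main__":
    # Toy run that only completes cleanly when the threshold is >= 75000.
    print(triage(lambda threshold: threshold >= 75_000, 50_000))
```

In the toy run the first attempt fails at 50000 and the retry at 100000
passes, so the report is classified as a false alarm.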
@Nilay:
By the end of today, I will share with you the patch that seems to
have fixed that protocol.
Thanks
Arka
On 01/04/2011 12:51 PM, Nilay Vaish wrote:
What threshold do you use?
On Tue, 4 Jan 2011, Arkaprava Basu wrote:
Hi Nilay,
On the deadlock issue with MESI_CMP_directory:
Yes, this can happen, as ruby_tester or the Sequencer only reports
*possible* deadlocks. With a higher number of processors there is more
contention (and thus latency), and it can mistakenly report a deadlock.
I generally look at the protocol trace to figure out whether there is
actually a deadlock. You can also try doubling the Sequencer deadlock
threshold and see if the problem goes away. If it's a true deadlock,
it will break again.
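
To illustrate why a threshold-based check can only report *possible*
deadlocks, here is a toy model. It is an assumption-laden sketch, not
the actual Sequencer logic: it assumes a request is flagged once it has
been outstanding for more than `threshold` cycles, and the function name
and cycle counts are made up for the example.

```python
from typing import Optional


def watchdog_reports(completion_latency: Optional[int], threshold: int) -> bool:
    """Return True if a request would be flagged as a possible deadlock.

    completion_latency is the number of cycles the request needs to
    complete; None models a request that never completes (a true
    deadlock). The "outstanding longer than threshold" rule is an
    assumption made for this sketch.
    """
    if completion_latency is None:
        # A request that never completes exceeds any finite threshold.
        return True
    return completion_latency > threshold


if __name__ == "__main__":
    threshold = 50_000  # arbitrary value chosen for the example
    slow_request = 60_000  # slow because of contention, but it does finish
    stuck_request = None  # never finishes

    for name, latency in (("slow", slow_request), ("stuck", stuck_request)):
        for t in (threshold, 2 * threshold):
            print(name, t, watchdog_reports(latency, t))
```

In this model the slow request is flagged at the original threshold but
not at the doubled one, while the stuck request is flagged at both,
which is the behaviour the doubling test relies on.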
On a related note, as Brad has pointed out, MESI_CMP_directory has
its share of issues. Recently one of Prof. Sarita Adve's students
e-mailed us (Multifacet) about 6 bugs he found while model checking
MESI_CMP_directory (including a major one). I took some time to
look at them, and it seems that MESI_CMP_directory is now fixed
(hopefully). The modified protocol now passes 1M checks with 16
processors with multiple random seeds. I can coordinate with you
locally on this, if you want.
Thanks
Arka
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev