These are the steps I follow:

1. First, run with the default threshold value.
2. If a deadlock is reported, take a trace and check whether there is an evident reason for the deadlock.
3. If not, double the default threshold value and run again.
4. If the same test passes with the larger threshold, the deadlock was not actually there, so life is good. If it still fails, dig further into the trace to see what's going on.
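
The steps above can be sketched roughly as follows. This is only an illustration, not gem5 code: `run_test` is a hypothetical hook that launches one tester run with the given deadlock threshold and returns True if no deadlock was reported, and the numbers in the demo are made up. Step 2 (reading the trace) stays manual.

```python
from typing import Callable


def classify_deadlock(run_test: Callable[[int], bool], default_threshold: int) -> str:
    """Apply the double-the-threshold procedure to one test.

    run_test(threshold) is a hypothetical hook: it should run the test once
    with that deadlock threshold and return True if no deadlock was reported.
    """
    # Step 1: run with the default threshold.
    if run_test(default_threshold):
        return "pass"
    # Step 2 (looking at the trace for an evident cause) is done by hand.
    # Step 3: double the threshold and run again.
    if run_test(2 * default_threshold):
        # Step 4: the report was a false positive caused by latency.
        return "false-positive"
    # Step 4, other branch: still failing, so the trace needs a closer look.
    return "needs-trace-analysis"


def make_fake_run(worst_latency: float) -> Callable[[int], bool]:
    """Toy stand-in for a real run: passes if the slowest request beats the threshold."""
    return lambda threshold: worst_latency < threshold


if __name__ == "__main__":
    # Made-up latencies, in cycles, against a made-up default threshold of 1000.
    print(classify_deadlock(make_fake_run(500), 1000))
    print(classify_deadlock(make_fake_run(1500), 1000))
    print(classify_deadlock(make_fake_run(float("inf")), 1000))
```

A run that still fails at twice the threshold is not proof of a true deadlock, only a sign that the trace deserves a closer look.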

@Nilay:
By the end of today, I will share with you the patch that seems to have fixed that protocol.

Thanks
Arka

On 01/04/2011 12:51 PM, Nilay Vaish wrote:
What threshold do you use?

On Tue, 4 Jan 2011, Arkaprava Basu wrote:

Hi Nilay,

  On the deadlock issue with MESI_CMP_directory:
Yes, this can happen, as ruby_tester or the Sequencer only reports *possible* deadlocks. With a higher number of processors there is more contention (and thus higher latency), and it can mistakenly report a deadlock. I generally look at the protocol trace to figure out whether there is actually a deadlock or not. You can also try doubling the Sequencer deadlock threshold and see if the problem goes away. If it's a true deadlock, it will break again.
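
To show why such a report is only a *possible* deadlock, here is a toy model of a threshold check. It is an assumption-laden sketch, not the Sequencer's actual code: the function name, the request names, and the cycle counts are all invented for illustration.

```python
def possible_deadlocks(outstanding, now, threshold):
    """Return ids of requests that have been waiting at least `threshold` cycles.

    outstanding maps a request id to the cycle at which it was issued.
    A flagged request is merely old; it may still complete later.
    """
    return sorted(req for req, issued in outstanding.items()
                  if now - issued >= threshold)


if __name__ == "__main__":
    # Invented example: "ld_a" has waited 1100 cycles, "st_b" only 300.
    outstanding = {"ld_a": 100, "st_b": 900}
    print(possible_deadlocks(outstanding, now=1200, threshold=1000))
    print(possible_deadlocks(outstanding, now=1200, threshold=2000))
```

In this model a slow but still progressing request looks the same as a stuck one, which is why a larger threshold makes a false report disappear while a true deadlock shows up again.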

On a related note, as Brad has pointed out, MESI_CMP_directory has its share of issues. Recently one of Prof. Sarita Adve's students e-mailed us (Multifacet) about 6 bugs he found while model checking MESI_CMP_directory (including a major one). I took some time to look at them, and it seems that MESI_CMP_directory is now fixed (hopefully). The modified protocol now passes 1M checks with 16 processors with multiple random seeds. I can coordinate with you locally on this, if you want.

Thanks
Arka

_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
