I found the error in MESI CMP. In the L2 cache controller, when the TBE is
deallocated, the pointer was not being set to NULL.
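
For illustration, here is a toy Python analogue of this kind of bug. It is
not the actual SLICC code: the class names, the has_transaction() check and
the finish_buggy()/finish_fixed() methods are all hypothetical, and None
stands in for NULL.

```python
class TBE:
    """Hypothetical transaction buffer entry; fields are illustrative only."""

    def __init__(self, addr):
        self.addr = addr
        self.valid = True


class TBETable:
    """Toy table keyed by address (not the real Ruby TBE table)."""

    def __init__(self):
        self.entries = {}

    def allocate(self, addr):
        tbe = TBE(addr)
        self.entries[addr] = tbe
        return tbe

    def deallocate(self, addr):
        # Remove the entry and mark it dead so stale users can be detected.
        self.entries.pop(addr).valid = False


class Controller:
    """Toy controller that caches a reference to its current TBE."""

    def __init__(self):
        self.table = TBETable()
        self.tbe = None  # stands in for the TBE pointer

    def start(self, addr):
        self.tbe = self.table.allocate(addr)

    def finish_buggy(self, addr):
        # Entry is deallocated, but the cached reference is left stale.
        self.table.deallocate(addr)

    def finish_fixed(self, addr):
        # Entry is deallocated and the cached reference is cleared.
        self.table.deallocate(addr)
        self.tbe = None

    def has_transaction(self):
        # Any "is the pointer set?" check is fooled by a stale reference.
        return self.tbe is not None
```

With finish_buggy() the controller still believes a transaction is in
flight even though its entry is gone; with finish_fixed() it does not.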
--
Nilay
On Mon, 7 Feb 2011, Arkaprava Basu wrote:
Nilay,
If the same test completes with a larger threshold then it is certainly a
case of a false positive and certainly NOT a deadlock (but may be a case of
starvation). If it were actually a deadlock, it would have just reported
deadlock after some more time of simulation.
On extending stall and wait to other protocols, you are absolutely correct.
Many of the starvation issues (and thus perceived deadlock) show up due to
"unfairness" in handling coherence requests. After the protocol trace
segmentation issue is solved, I can get MESI_CMP_directory to use stall and
wait.
I fully agree with Brad's argument about bumping up threshold for testers.
And having a large threshold (i.e. 5 M) does not hurt much. It will take a
bit more simulation time to report the deadlock, but if there is an actual
deadlock it would report it anyway. So I would vote to stick with Brad's
threshold number in the patch.
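
The threshold argument can be sketched with a toy model. This is only a
sketch, not the Ruby deadlock checker: the function name is hypothetical,
the request is assumed to be issued at cycle 0, and the 2,000,000-cycle
latency of the starved request is an invented number.

```python
def deadlock_report_cycle(completion_cycle, threshold):
    """Return the cycle at which a deadlock would be reported, or None.

    completion_cycle: cycle at which the request completes, or None if it
                      never completes (a real deadlock).
    threshold:        cycles a request may stay outstanding before the
                      checker reports a deadlock.
    The request is assumed to be issued at cycle 0.
    """
    if completion_cycle is not None and completion_cycle <= threshold:
        return None  # request finished before the checker fired
    return threshold  # checker fires once the threshold is reached


STARVED = 2_000_000   # hypothetical slow-but-finishing request
DEADLOCKED = None     # request that never completes

# Threshold of 1,000,000: the starved request is flagged (false positive).
small_starved = deadlock_report_cycle(STARVED, 1_000_000)
# Threshold of 5,000,000: the starved request completes, nothing reported.
large_starved = deadlock_report_cycle(STARVED, 5_000_000)
# A real deadlock is reported under either threshold, only later.
small_dead = deadlock_report_cycle(DEADLOCKED, 1_000_000)
large_dead = deadlock_report_cycle(DEADLOCKED, 5_000_000)
```

In this model the larger threshold removes the false positive and only
delays the report of a real deadlock.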
Thanks
Arka
On 02/07/2011 12:39 PM, Nilay Vaish wrote:
Brad,
I think 5,000,000 is a lot. IIRC, a million worked the last time I tested
the protocol. We can check the patch in, though I am of the view that we
should let it remain as is till we can generate the protocol trace and make
sure that this is not an actual deadlock. I first need to find the reason
for the segmentation fault that occurs only when the trace is being collected.
Another issue is that we need to extend the stall and wait to other
protocols as well. This, I believe, may help in reducing such deadlock
instances. While working on MESI CMP, I often saw earlier requests remain
unfulfilled because of later requests for the same address.
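
As a rough sketch of the idea, assuming stall and wait parks blocked
requests in a per-address queue and wakes them in arrival order once the
address is free again. The class and method names below are hypothetical
and are not the Ruby implementation.

```python
from collections import defaultdict, deque


class StallAndWaitBuffer:
    """Toy per-address wait queue that preserves arrival order."""

    def __init__(self):
        self.waiting = defaultdict(deque)

    def stall(self, addr, request):
        # Park a request that cannot be handled while addr is busy.
        self.waiting[addr].append(request)

    def wake_up(self, addr):
        # Release every request parked on addr, oldest first, and clear
        # the queue so nothing is woken twice.
        return list(self.waiting.pop(addr, ()))


buf = StallAndWaitBuffer()
buf.stall(0x40, "req_A")   # earliest request for 0x40
buf.stall(0x40, "req_B")   # later request for the same address
buf.stall(0x80, "req_C")   # unrelated address
woken = buf.wake_up(0x40)  # earlier request comes out first
```

Because the queue is FIFO per address, a later request for the same address
cannot overtake an earlier one in this model.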
--
Nilay
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev