Hi guys,

I am currently tracking down a bug that shows up in large (16-core) configurations using the MOESI / private-L2 + directory setup. Has that configuration been tested under high contention?

I have walked through large traces of this, and the problem boils down to the directory sending an update message to an L2 that it thinks is the owner of the cache line. The receiving L2, however, has the cache line in Shared. In that state it expects update messages to carry a proper new cache-line state, so it hits a NULL pointer dereference when it reads the state from the m_arg argument:

    MOESILogic::handle_interconn_hit -> case MOESI_SHARED: -> *state = *(W8*)(queueEntry->m_arg);

There are a number of things I do not understand, and I would be extremely grateful if someone could help me understand what is going on here:

* When is it legal to send an update message?
* What is the purpose of the directory sending an update message?
* What is the invariant of the directory's owner / present fields? Are they supposed to be precise? (They are not, i.e., the owner and present fields seem to be out of sync with the real state in the caches.)
* What is the difference between all the source / origin / dest / responder etc. fields?

While digging through this, I have found a number of unexplained things, which may be bugs or may be rooted in my misunderstanding:

* the directory sometimes wrongly interprets the sender of an evict message because it seems to merge it with a DirContBufferEntry of the still ongoing request that caused the evict ... this breaks the owner tracking
* the directory does not seem to wait for an evict etc. to be fully propagated through the cache hierarchy
* the content of the update message depends on the state in the receiver (M/O vs. E/S); is that intentional?
* if one L2 sends its data directly to another L2, the receiving L2 is confused and complains about an unknown source
* there is no explicit distinction between requests and responses; it is all rather implicit (upper / lower interconnect, hasData), so is there a rule of thumb to keep the two apart?
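
To make the question about the owner / present invariant concrete, here is a minimal, standalone sketch of the check I would expect to pass if the directory were meant to be precise. It is not taken from the MARSS sources: State, DirEntry, kNumCaches, directory_is_precise and check_or_abort are hypothetical names, and the rules it encodes (present bit set exactly when the cache holds the line, owner pointing at the single cache in M, O or E) are my assumption of a textbook MOESI directory, not a statement of how MARSS is designed.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model; none of these names come from MARSS.
enum class State { Invalid, Shared, Exclusive, Owned, Modified };

constexpr std::size_t kNumCaches = 16;   // one private L2 per core

struct DirEntry {
    int owner = -1;                      // -1: directory records no owner
    std::bitset<kNumCaches> present;     // bit i set: L2 i holds the line
};

// Returns true if the directory entry agrees with the real per-cache
// states, ASSUMING the directory is meant to be precise.
bool directory_is_precise(const DirEntry& dir,
                          const std::array<State, kNumCaches>& caches) {
    int owners = 0;
    for (std::size_t i = 0; i < kNumCaches; ++i) {
        // Present bit must be set exactly when the cache holds the line.
        const bool holds = caches[i] != State::Invalid;
        if (dir.present.test(i) != holds) return false;

        // Assumption: M, O and E all count as "owner" for the directory.
        const bool owns = caches[i] == State::Modified ||
                          caches[i] == State::Owned ||
                          caches[i] == State::Exclusive;
        if (owns) {
            ++owners;
            if (dir.owner != static_cast<int>(i)) return false;
        }
    }
    // A recorded owner must really exist; no owner means none was found.
    return owners == (dir.owner >= 0 ? 1 : 0);
}

// Debug hook: abort as soon as directory and caches disagree.
inline void check_or_abort(const DirEntry& dir,
                           const std::array<State, kNumCaches>& caches) {
    assert(directory_is_precise(dir, caches));
}
```

With a directory entry that names cache 3 as owner while cache 3 actually holds the line in Shared (the situation from my trace), directory_is_precise returns false. Calling something like check_or_abort after every directory transition would be one way to find the first point where the two views diverge.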

In general, is there a high-level description of how the directory system is supposed to work and which invariants hold at which points? Has the model been tested, and is it safe to use?

Thanks,
Stephan

_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
