Hello,

Thank you for developing marssx86, which is a very useful tool for research. I have a question about the MESI protocol implemented in marssx86.
I ran marss in simulation mode with a dual-core configuration: the private MESI L1 cache latency is 4 cycles, the private MESI L2 cache latency is 11 cycles, and the shared L3 cache latency is 30 cycles. I also have a microbenchmark that measures the cache coherence miss latency using the RDTSC instruction. In the microbenchmark, two threads alternately write to the same cache-line-sized block of data, so every write is a miss.

For instance, each write on core1 would miss and send an invalidation message to core2; core2 would then write the data back to the shared L3 cache so that core1 can fetch it from there and update its local L1 and L2 caches. Therefore, each write should incur a latency of at least 4 + 11 + 11 + 4 + 30 + 30 = 90 cycles. However, the results show that one write takes 37 cycles on average. It looks like the contended cache line is bouncing back and forth between the two cores without involving the shared L3 cache.

Did I misunderstand the MESI protocol, or does marssx86 implement a different flavor of the protocol?

--
Jie Chen
Research Assistant
Electrical and Computer Engineering, George Washington University
Email: [email protected]
Phone: 202-994-6427
http://home.gwu.edu/~jiec
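
P.S. To make the estimate above explicit, here is a minimal sketch of the arithmetic in Python. The grouping of the terms into request, invalidation, write-back, and refill steps is only my reading of the sum, and the function and variable names are illustrative; none of them come from marssx86 itself.

```python
# Latencies from the configuration described above (cycles).
L1_LATENCY = 4
L2_LATENCY = 11
L3_LATENCY = 30


def expected_write_latency(l1, l2, l3):
    """Estimate assuming the line always travels through the shared L3.

    This grouping is an assumption about the coherence path, not a
    description of what marssx86 actually does.
    """
    request_path = l1 + l2      # requesting core misses in its L1 and L2
    invalidate_path = l2 + l1   # peer core's L2 and L1 handle the invalidation
    writeback = l3              # peer writes the dirty line back to L3
    refill = l3                 # requester reads the line back from L3
    return request_path + invalidate_path + writeback + refill


expected = expected_write_latency(L1_LATENCY, L2_LATENCY, L3_LATENCY)
measured = 37  # average cycles per write reported by the microbenchmark

print("expected:", expected)
print("measured:", measured)
print("gap:", expected - measured)
```

The gap between the two numbers is what makes me suspect that the line is not going through the L3 on every write.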
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
