Hello,

Thank you for developing marssx86, which is a very useful tool for
research. I have a question about the MESI protocol implemented in
marssx86.

I ran marss in simulation mode with a dual-core configuration in which
the private MESI L1 cache latency is 4 cycles, the private MESI L2 cache
latency is 11 cycles, and the shared L3 cache latency is 30 cycles.

I also have a microbenchmark that measures the cache coherence miss
latency using the RDTSC instruction. In the microbenchmark, two threads
alternately write to the same cache-line-sized data, so every write is a
miss. For instance, each write on core1 misses and sends an invalidation
message to core2; core2 writes the data back to the shared L3 cache, from
which core1 fetches it and updates its local L1 and L2 caches. Therefore,
each write should incur a latency of at least
4 + 11 + 11 + 4 + 30 + 30 = 90 cycles.

However, the results show that one write takes 37 cycles on average. It
looks as if the contended data bounces back and forth between the two
cores without involving the shared L3 cache. Did I misunderstand the MESI
protocol, or does marssx86 implement a different flavor of the protocol?

-- 
Jie Chen
Research Assistant
Electrical and Computer Engineering,
George Washington University,
Phone: 202-994-6427
http://home.gwu.edu/~jiec
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel