Hey Stavros,

Could you take a look at the equation in Appendix A6, which is then used in Figure 6, in the paper you referenced?

The paper uses MEM_UNCORE_RETIRED.REMOTE_HITM as the primary metric, which corresponds to the number of times an LLC reference hit a modified line on a remote socket (the Intel help page, for reference: http://i.imgur.com/CfZIZFD.png). Yet Figure 6 is used to illustrate the on-chip interconnect. I think MEM_UNCORE_RETIRED.LOCAL_HITM would be a better counter to use in this case. Better yet would be to add the two counters together to approximate a many-core architecture where there is no inter-socket traffic.

So is that a typo, or am I missing something?

Thanks,
Tri

From: Volos Stavros
Sent: Tuesday, July 2, 2013 4:36 AM
To: Tri M. Nguyen, [email protected]

Dear Tri,

Thanks for your interest in CloudSuite. Please refer to the appendix of the journal version (http://infoscience.epfl.ch/record/182529). You can find all the details on the methodology used in the "Clearing the Clouds" paper.

Regards,
-Stavros.
________________________________________
From: Tri M. Nguyen [[email protected]]
Sent: Tuesday, July 02, 2013 3:15 AM
To: [email protected]
Subject: Performance counters used in Clearing the Clouds

Hi there,

I'm sorry if this has been asked before, but what are the exact performance counters that you used for the paper? I want to replicate the reported results.

The reason I ask is that there are different combinations of counters for measuring the same parameter. For example, to measure the L2 miss rate, I can either use `l2_access` and `l2_hit`, or replace `l2_access` with `l1_miss`, or use `l2_access` and `l3_access`, or any of the above in IBS mode. Worse yet, I found that the results can be quite conflicting.

I am using Intel VTune with an Intel Westmere processor (Xeon E7-4870), which I believe is of the same generation as the one you used.

Many thanks!
Tri
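
To illustrate the ambiguity raised in the original question, here is a minimal Python sketch of the three L2 miss-rate formulations it mentions. The counter names are the ones from the message. The way each pair is combined into a ratio is one possible reading, and the counts are made-up placeholders, not measurements from VTune or from the paper.

```python
def miss_rate(misses: int, accesses: int) -> float:
    """Return misses / accesses, or 0.0 when there were no accesses."""
    return misses / accesses if accesses else 0.0


# Hypothetical raw counts for one run (illustrative values only, chosen so
# that the three formulations disagree).
counts = {
    "l2_access": 1_000_000,
    "l2_hit": 900_000,
    "l1_miss": 1_050_000,
    "l3_access": 120_000,
}

# Formulation A (assumed reading): L2 misses = l2_access - l2_hit.
rate_a = miss_rate(counts["l2_access"] - counts["l2_hit"], counts["l2_access"])

# Formulation B (assumed reading): l1_miss stands in for l2_access.
rate_b = miss_rate(counts["l1_miss"] - counts["l2_hit"], counts["l1_miss"])

# Formulation C (assumed reading): l3_access stands in for the L2 miss count.
rate_c = miss_rate(counts["l3_access"], counts["l2_access"])

for name, rate in (("A", rate_a), ("B", rate_b), ("C", rate_c)):
    print(f"formulation {name}: {rate:.4f}")
```

Whenever the underlying events do not count exactly the same set of requests, the three ratios diverge, which would be consistent with the conflicting results described above.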
