Hey Stavros,

Could you take a look at the equation in Appendix A6, which is then used in 
Figure 6, in the paper you referenced?

The paper uses MEM_UNCORE_RETIRED.REMOTE_HITM as the primary metric, which 
corresponds to the number of times an LLC reference hit a modified line on a 
remote socket (see the Intel help page for reference: 
http://i.imgur.com/CfZIZFD.png).

Yet Figure 6 is used to illustrate the on-chip interconnect. I think 
MEM_UNCORE_RETIRED.LOCAL_HITM would be a better counter to use in this case. 
Better yet would be to add the two counters together to approximate a many-core 
architecture where there is no inter-socket traffic.
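
A rough sketch of that sum, assuming the two event counts have already been 
exported from the profiler into a dictionary. The numbers are made-up 
placeholders, not measurements, and the helper name is hypothetical:

```python
# Hypothetical raw event counts; placeholder values, not real measurements.
counts = {
    "MEM_UNCORE_RETIRED.LOCAL_HITM": 1200,
    "MEM_UNCORE_RETIRED.REMOTE_HITM": 800,
}


def total_hitm(counts):
    """Add local and remote HITM counts to approximate a single-chip design
    in which all modified-line hits would stay on the same die."""
    return (counts["MEM_UNCORE_RETIRED.LOCAL_HITM"]
            + counts["MEM_UNCORE_RETIRED.REMOTE_HITM"])


print(total_hitm(counts))
```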

So is that a typo or am I missing something?

Thanks,
Tri

From: Volos Stavros
Sent: Tuesday, July 2, 2013 4:36 AM
To: Tri M. Nguyen, [email protected]

Dear Tri,

Thanks for your interest in CloudSuite.

Please refer to the appendix of the journal version 
(http://infoscience.epfl.ch/record/182529). There you can find all the details 
of the methodology used in the "Clearing the Clouds" paper.

Regards,
-Stavros.
________________________________________
From: Tri M. Nguyen [[email protected]]
Sent: Tuesday, July 02, 2013 3:15 AM
To: [email protected]
Subject: Performance counters used in Clearing the Clouds

Hi there,

I'm sorry if this has been asked before, but what are the exact performance 
counters that you guys used for the paper? I want to replicate the reported 
results.

I ask because there are different ways to measure the same parameter. For 
example, to measure the L2 miss rate, I can use `l2_access` and `l2_hit`, or 
replace `l2_access` with `l1_miss`, or use `l2_access` and `l3_access`, or any 
of the above in IBS mode. Worse yet, I found that the results can conflict 
quite a bit.
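
To make the ambiguity concrete, here is a minimal sketch of three ways the 
same miss rate might be derived from those counters. The formulas are one 
plausible reading of each combination (an assumption, not the paper's 
methodology), the function name is hypothetical, and the counts are made-up 
placeholders:

```python
def l2_miss_rate_variants(l2_access, l2_hit, l1_miss, l3_access):
    """Return the L2 miss rate computed three different ways.

    Each variant assumes a particular relationship between the counters:
      access_hit: misses = l2_access - l2_hit
      l1miss_hit: every L1 miss is an L2 access, so l1_miss is the denominator
      l3_over_l2: every L2 miss becomes an L3 access
    """
    return {
        "access_hit": (l2_access - l2_hit) / l2_access,
        "l1miss_hit": (l1_miss - l2_hit) / l1_miss,
        "l3_over_l2": l3_access / l2_access,
    }


# Placeholder counts, chosen only to show that the variants can disagree.
rates = l2_miss_rate_variants(l2_access=1000, l2_hit=700,
                              l1_miss=950, l3_access=320)
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

If prefetches or other non-demand traffic are counted by one event but not 
another, the three figures diverge, which is one way such conflicts can arise.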

I am using Intel VTune with an Intel Westmere processor (Xeon E7-4870), which I 
believe is of the same generation as the one you used.

Many thanks!
Tri
