On Mon, Aug 06, 2018 at 04:07:09PM -0700, Reinette Chatre wrote:

> I've modified your suggestion slightly in an attempt to gain accuracy.
> Now it looks like:
>
> local_irq_disable();
> /* disable hw prefetchers */
> /* init local vars to loop through pseudo-locked mem */
> perf_event_read_local(l2_hit_event, &l2_hits_before, NULL, NULL);
> perf_event_read_local(l2_miss_event, &l2_miss_before, NULL, NULL);
> /* loop through pseudo-locked mem */
> perf_event_read_local(l2_hit_event, &l2_hits_after, NULL, NULL);
> perf_event_read_local(l2_miss_event, &l2_miss_after, NULL, NULL);
> /* enable hw prefetchers */
> local_irq_enable();
>
> With the above I do not see the impact of an interference workload
> anymore but the results are not yet accurate:
>
> pseudo_lock_mea-538 [002] .... 113.296084: pseudo_lock_l2: hits=4103 miss=2
An error of ~0.2% sounds pretty good to me ;-)

FWIW, how long is that IRQ-disabled section? It looks like something
that could be taking a fair bit of time, and we have these people that
care about IRQ latency.

> While the results do seem close, reporting a cache miss on memory that
> is set up to be locked in cache is significant.

If it is 'locked', then how does perf cause misses? I suppose I should
go read that earlier email.

> In an attempt to improve the accuracy of the above I modified it to the
> following:
>
> /* create the two events as before in "enabled" state */
> l2_hit_pmcnum = l2_hit_event->hw.event_base_rdpmc;
> l2_miss_pmcnum = l2_miss_event->hw.event_base_rdpmc;
> local_irq_disable();
> /* disable hw prefetchers */
> /* init local vars to loop through pseudo-locked mem */
> l2_hits_before = native_read_pmc(l2_hit_pmcnum);
> l2_miss_before = native_read_pmc(l2_miss_pmcnum);
> /* loop through pseudo-locked mem */
> l2_hits_after = native_read_pmc(l2_hit_pmcnum);
> l2_miss_after = native_read_pmc(l2_miss_pmcnum);
> /* enable hw prefetchers */
> local_irq_enable();

The above has a number of practical problems:

 - event_base_rdpmc is subject to change as long as interrupts are
   enabled.

 - I don't much fancy people accessing the guts of events like that;
   would an inline function like:

	static inline u64 x86_perf_rdpmc(struct perf_event *event)
	{
		u64 val;

		lockdep_assert_irqs_disabled();

		rdpmcl(event->hw.event_base_rdpmc, val);
		return val;
	}

   work for you?

 - native_read_pmc(): are you 100% sure this code only ever runs on
   native hardware and not in some dodgy virt environment? (Also, I
   suppose all hardware supporting this resctrl locking muck actually
   has rdpmc ;-).)

 - You used perf_event_attr::pinned=1, which means we could have failed
   to schedule the event; you need to check the error state somewhere
   and have credible error handling.
For checking the error state, you could use perf_event_read_local()
early after disabling IRQs; it needs a little extra test though:

	if (event->attr.pinned && event->oncpu != smp_processor_id())
		return -EBUSY;

to correctly deal with pinned failure. Place it such that it becomes
the last sanity check.

Also, while you disable IRQs, your fancy-pants loop is still subject to
NMIs that can/will perturb your measurements. How do you deal with
those?