Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-30 Thread Namhyung Kim
Hi Stephane,

On Mon, 29 Oct 2012 16:15:46 +0100, Stephane Eranian wrote:
> + /*
> +  * use the mapping table for bit 0-15
> +  */
> + val = pebs_data_source[dse.ld_dse];

I guess you meant bit 0-3, right?

Thanks,
Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Andi Kleen
> I agree. I did that because it was the easiest thing I could think of.
> Discovered I had to deal with all of this just two days ago when I
> rebased. Wasn't too happy to have to deal with this at the last minute.

I'll repost, as soon as I debugged why Jiri's version of my 31/33 parser patch
doesn't work.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Stephane Eranian
On Mon, Oct 29, 2012 at 10:16 PM, Andi Kleen  wrote:
>> > Why do you need to replace the whole table?
>> >
>> Because I am extending them with one or two events based on cpu
>> model. That was the easiest way of doing this instead of playing
>> some kind of malloc+copy trick.
>
> I did malloc and copy.
>
>>
>> > BTW I still think my approach in the v4 Haswell patchkit
>> > is simpler and didn't rely on hardcoding these events.
>> >
>> I don't care about those events. As I found out, they are not even
>> used by perf because they are all hardcoded and that's what gets
>> used. I assume they are exposed for reference only. I don't object
>> to that. But I think the right mechanism would be one where you
>> can add events at boot time based on CPU model. It could be used
>> to add the common events as well in the common part of the init
>> code.
>
> Yes that's what I did.
>
> I don't think copying everything for everything new is a good
> approach.
>
I agree. I did that because it was the easiest thing I could think of.
Discovered I had to deal with all of this just two days ago when I
rebased. Wasn't too happy to have to deal with this at the last minute.

In any case, if you have something, then looks like I need to wait until
it's in before I can adjust this patch set.


> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Andi Kleen
> > Why do you need to replace the whole table?
> >
> Because I am extending them with one or two events based on cpu
> model. That was the easiest way of doing this instead of playing
> some kind of malloc+copy trick.

I did malloc and copy.

> 
> > BTW I still think my approach in the v4 Haswell patchkit
> > is simpler and didn't rely on hardcoding these events.
> >
> I don't care about those events. As I found out, they are not even
> used by perf because they are all hardcoded and that's what gets
> used. I assume they are exposed for reference only. I don't object
> to that. But I think the right mechanism would be one where you
> can add events at boot time based on CPU model. It could be used
> to add the common events as well in the common part of the init
> code.

Yes that's what I did. 

I don't think copying everything for everything new is a good 
approach.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 21:39 +0100, Stephane Eranian wrote:
> But I think the right mechanism would be one where you
> can add events at boot time based on CPU model. It could be used
> to add the common events as well in the common part of the init
> code. 

mlin once posted something like that, it did some scary things with
sysfs without using the driver model -- which might or might not be
'right' (TM).

If someone has the stomach to stare at that code and then convince
gregkh that its not 'wrong' I'm all for it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Stephane Eranian
On Mon, Oct 29, 2012 at 8:42 PM, Andi Kleen  wrote:
>> +
>> +struct attribute *nhm_events_attrs[] = {
>> + EVENT_PTR(CPU_CYCLES),
>> + EVENT_PTR(INSTRUCTIONS),
>> + EVENT_PTR(CACHE_REFERENCES),
>> + EVENT_PTR(CACHE_MISSES),
>> + EVENT_PTR(BRANCH_INSTRUCTIONS),
>> + EVENT_PTR(BRANCH_MISSES),
>> + EVENT_PTR(BUS_CYCLES),
>> + EVENT_PTR(STALLED_CYCLES_FRONTEND),
>> + EVENT_PTR(STALLED_CYCLES_BACKEND),
>> + EVENT_PTR(REF_CPU_CYCLES),
>> + EVENT_PTR(mem_ld_nhm),
>> + NULL,
>> +};
>
> I thought Jiri's patch already exports all the generic ones?
>
> Why do you need to replace the whole table?
>
Because I am extending them with one or two events based on cpu
model. That was the easiest way of doing this instead of playing
some kind of malloc+copy trick.

> BTW I still think my approach in the v4 Haswell patchkit
> is simpler and didn't rely on hardcoding these events.
>
I don't care about those events. As I found out, they are not even
used by perf because they are all hardcoded and that's what gets
used. I assume they are exposed for reference only. I don't object
to that. But I think the right mechanism would be one where you
can add events at boot time based on CPU model. It could be used
to add the common events as well in the common part of the init
code.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Andi Kleen
> +
> +struct attribute *nhm_events_attrs[] = {
> + EVENT_PTR(CPU_CYCLES),
> + EVENT_PTR(INSTRUCTIONS),
> + EVENT_PTR(CACHE_REFERENCES),
> + EVENT_PTR(CACHE_MISSES),
> + EVENT_PTR(BRANCH_INSTRUCTIONS),
> + EVENT_PTR(BRANCH_MISSES),
> + EVENT_PTR(BUS_CYCLES),
> + EVENT_PTR(STALLED_CYCLES_FRONTEND),
> + EVENT_PTR(STALLED_CYCLES_BACKEND),
> + EVENT_PTR(REF_CPU_CYCLES),
> + EVENT_PTR(mem_ld_nhm),
> + NULL,
> +};

I thought Jiri's patch already exports all the generic ones?

Why do you need to replace the whole table?

BTW I still think my approach in the v4 Haswell patchkit 
is simpler and didn't rely on hardcoding these events.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 16:43 +0100, Stephane Eranian wrote:
> You meant fll, instead I think.

Oh, yes, too small font I guess.

> Well, that would work too, but I am trying to factorize the code
> with Precise Store which is a later patch. 

Yeah, just found that, its fine the way it is. Just looked funneh on
first reading.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Stephane Eranian
On Mon, Oct 29, 2012 at 4:38 PM, Peter Zijlstra  wrote:
> On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
>> +   fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT;
>> +
>> perf_sample_data_init(&data, 0, event->hw.last_period);
>>
>> +   data.period = event->hw.last_period;
>> +   sample_type = event->attr.sample_type;
>> +
>> +   /*
>> +* if PEBS-LL or PreciseStore
>> +*/
>> +   if (fll) {
>> +   if (sample_type & PERF_SAMPLE_ADDR)
>> +   data.addr = pebs->dla;
>> +
>> +   /*
>> +* Use latency for cost (only avail with PEBS-LL)
>> +*/
>> +   if (fll && (sample_type & PERF_SAMPLE_COST))
>> +   data.cost = pebs->lat;
>> +
>> +   /*
>> +* data.dsrc encodes the data source
>> +*/
>> +   if (sample_type & PERF_SAMPLE_DSRC) {
>> +   if (fll)
>> +   data.dsrc.val = load_latency_data(pebs->dse);
>> +   }
>> +   }
>
>
> Why does fl1 exist? the above looks really convoluted, all fl1 checks
> inside are pointless. Also 'fl1' is a bizarre name, 'pebs_ll' would be a
> better name I guess.
>
You meant fll, instead I think.

> What's wrong with:
>
> if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) {
> if (sample_type & PERF_SAMPLE_ADDR)
> data.addr = pebs->dla;
>
> if (sample_type & PERF_SAMPLE_COST)
> data.cost = perf->lat;
>
> if (sample_type & PERF_SAMPLE_DSRC)
> data.dsrc.val = load_latency_data(pebs->dse);
> }
>
Well, that would work too, but I am trying to factorize the code
with Precise Store which is a later patch. But we could clone
your proposal and do:

> if (event->hw.flags & PERF_X86_EVENT_PEBS_ST) {
> if (sample_type & PERF_SAMPLE_ADDR)
> data.addr = pebs->dla;
>
> if (sample_type & PERF_SAMPLE_DSRC)
> data.dsrc.val = precise_store_data(pebs->dse);
> }

I am fine with that too.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Stephane Eranian
On Mon, Oct 29, 2012 at 4:35 PM, Peter Zijlstra  wrote:
> On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
>> +static u64 load_latency_data(u64 status)
>> +{
>> +   union intel_x86_pebs_dse dse;
>> +   u64 val;
>> +   int model = boot_cpu_data.x86_model;
>> +   int fam = boot_cpu_data.x86;
>> +
>> +   dse.val = status;
>> +
>> +   /*
>> +* use the mapping table for bit 0-15
>> +*/
>> +   val = pebs_data_source[dse.ld_dse];
>> +
>> +   /*
>> +* Nehalem models do not support TLB, Lock infos
>> +*/
>> +   if (fam == 0x6 && (model == 26 || model == 30
>> +   || model == 31 || model == 46)) {
>> +   val |= P(TLB, NA) | P(LOCK, NA);
>> +   return val;
>> +   }
>
> I'm so 100% sure we'll forget to add a nhm model number if we ever find
> we missed one.
>
> Could we either add a classification enum to x86_pmu that is set in the
> big model switch on init, or do this with your new constraints flags,
> where we have a different flag for NHM_LL vs SNB_LL or so?
>
Let's add something in x86_pmu. We could, for instance, generalize
x86_pmu.er_flags into x86_pmu.flags. Instead of adding yet another
flag field.

> Or if all else fails, add a quirk to the Intel Debugstore bits bitfield,
> something like pebs_ll_nhm.
>
>> +   /*
>> +* bit 4: TLB access
>> +* 0 = did not miss 2nd level TLB
>> +* 1 = missed 2nd level TLB
>> +*/
>> +   if (dse.ld_stlb_miss)
>> +   val |= P(TLB, MISS) | P(TLB, L2);
>> +   else
>> +   val |= P(TLB, HIT) | P(TLB,L1) | P(TLB, L2);
>> +
>> +   /*
>> +* bit 5: locked prefix
>> +*/
>> +   if (dse.ld_locked)
>> +   val |= P(LOCK, LOCKED);
>> +
>> +   return val;
>> +}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
> +   fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT;
> +
> perf_sample_data_init(&data, 0, event->hw.last_period);
>  
> +   data.period = event->hw.last_period;
> +   sample_type = event->attr.sample_type;
> +
> +   /*
> +* if PEBS-LL or PreciseStore
> +*/
> +   if (fll) {
> +   if (sample_type & PERF_SAMPLE_ADDR)
> +   data.addr = pebs->dla;
> +
> +   /*
> +* Use latency for cost (only avail with PEBS-LL)
> +*/
> +   if (fll && (sample_type & PERF_SAMPLE_COST))
> +   data.cost = pebs->lat;
> +
> +   /*
> +* data.dsrc encodes the data source
> +*/
> +   if (sample_type & PERF_SAMPLE_DSRC) {
> +   if (fll)
> +   data.dsrc.val = load_latency_data(pebs->dse);
> +   }
> +   } 


Why does fl1 exist? the above looks really convoluted, all fl1 checks
inside are pointless. Also 'fl1' is a bizarre name, 'pebs_ll' would be a
better name I guess.

What's wrong with:

if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) {
if (sample_type & PERF_SAMPLE_ADDR)
data.addr = pebs->dla;

if (sample_type & PERF_SAMPLE_COST)
data.cost = perf->lat;

if (sample_type & PERF_SAMPLE_DSRC)
data.dsrc.val = load_latency_data(pebs->dse);
}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
> +static u64 load_latency_data(u64 status)
> +{
> +   union intel_x86_pebs_dse dse;
> +   u64 val;
> +   int model = boot_cpu_data.x86_model;
> +   int fam = boot_cpu_data.x86;
> +
> +   dse.val = status;
> +
> +   /*
> +* use the mapping table for bit 0-15
> +*/
> +   val = pebs_data_source[dse.ld_dse];
> +
> +   /*
> +* Nehalem models do not support TLB, Lock infos
> +*/
> +   if (fam == 0x6 && (model == 26 || model == 30
> +   || model == 31 || model == 46)) {
> +   val |= P(TLB, NA) | P(LOCK, NA);
> +   return val;
> +   }

I'm so 100% sure we'll forget to add a nhm model number if we ever find
we missed one.

Could we either add a classification enum to x86_pmu that is set in the
big model switch on init, or do this with your new constraints flags,
where we have a different flag for NHM_LL vs SNB_LL or so?

Or if all else fails, add a quirk to the Intel Debugstore bits bitfield,
something like pebs_ll_nhm.

> +   /*
> +* bit 4: TLB access
> +* 0 = did not miss 2nd level TLB
> +* 1 = missed 2nd level TLB
> +*/
> +   if (dse.ld_stlb_miss)
> +   val |= P(TLB, MISS) | P(TLB, L2);
> +   else
> +   val |= P(TLB, HIT) | P(TLB,L1) | P(TLB, L2);
> +
> +   /*
> +* bit 5: locked prefix
> +*/
> +   if (dse.ld_locked)
> +   val |= P(LOCK, LOCKED);
> +
> +   return val;
> +} 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Stephane Eranian
On Mon, Oct 29, 2012 at 4:23 PM, Peter Zijlstra  wrote:
> On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
>> +EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x100b,umask=0x1,ldlat=3");
>> +EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3");
>
> I haven't fully grokked the macro magic yet, but event=0x100b seems
> wrong, event only takes 8 bit on Intel.

Yeah, my bad. I messed up on this one. should be:
event=0x0b,umask=0x10,ldlat=3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
> +EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x100b,umask=0x1,ldlat=3");
> +EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3"); 

I haven't fully grokked the macro magic yet, but event=0x100b seems
wrong, event only takes 8 bit on Intel.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Stephane Eranian
This patch adds support for memory profiling using the
PEBS Load Latency facility.

Load accesses are sampled by HW and the instruction
address, data address, load latency, data source, tlb,
locked information can be saved in the sampling buffer
if using the PERF_SAMPLE_COST (for latency),
PERF_SAMPLE_ADDR, PERF_SAMPLE_DSRC types.

To enable PEBS Load Latency, users have to use the
model specific event:
- on NHM/WSM: MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD
- on SNB/IVB: MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD

To make things easier, this patch also exports a generic
alias via sysfs: mem-loads. It export the right event
encoding based on the host CPU and can be used directly
by the perf tool.

Loosely based on Intel's Lin Ming patch posted on LKML
in July 2011.

Signed-off-by: Stephane Eranian 
---
 arch/x86/include/asm/msr-index.h  |1 +
 arch/x86/kernel/cpu/perf_event.c  |3 +
 arch/x86/kernel/cpu/perf_event.h  |   22 -
 arch/x86/kernel/cpu/perf_event_intel.c|   56 
 arch/x86/kernel/cpu/perf_event_intel_ds.c |  133 +++--
 5 files changed, 206 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 7f0edce..5edc9c0 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -66,6 +66,7 @@
 #define MSR_IA32_PEBS_ENABLE   0x03f1
 #define MSR_IA32_DS_AREA   0x0600
 #define MSR_IA32_PERF_CAPABILITIES 0x0345
+#define MSR_PEBS_LD_LAT_THRESHOLD  0x03f6
 
 #define MSR_MTRRfix64K_0   0x0250
 #define MSR_MTRRfix16K_8   0x0258
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 9b56c20..2fc34d6 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1475,6 +1475,9 @@ static int __init init_hw_perf_events(void)
x86_pmu.attr_rdpmc = 1; /* enable userspace RDPMC usage by default */
x86_pmu_format_group.attrs = x86_pmu.format_attrs;
 
+   if (x86_pmu.event_attrs)
+   x86_pmu_events_group.attrs = x86_pmu.event_attrs;
+
if (!x86_pmu.events_sysfs_show)
x86_pmu_events_group.attrs = &empty_attrs;
else
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index f78d6e8..3c5aa72 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -46,6 +46,7 @@ enum extra_reg_type {
EXTRA_REG_RSP_0 = 0,/* offcore_response_0 */
EXTRA_REG_RSP_1 = 1,/* offcore_response_1 */
EXTRA_REG_LBR   = 2,/* lbr_select */
+   EXTRA_REG_LDLAT = 3,/* ld_lat_threshold */
 
EXTRA_REG_MAX   /* number of entries needed */
 };
@@ -61,6 +62,10 @@ struct event_constraint {
int overlap;
int flags;
 };
+/*
+ * struct event_constraint flags
+ */
+#define PERF_X86_EVENT_PEBS_LDLAT  0x1 /* ld+ldlat data address sampling */
 
 struct amd_nb {
int nb_id;  /* NorthBridge id */
@@ -233,6 +238,10 @@ struct cpu_hw_events {
 #define INTEL_UEVENT_CONSTRAINT(c, n)  \
EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK)
 
+#define INTEL_PLD_CONSTRAINT(c, n) \
+   __EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
+  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LDLAT)
+
 #define EVENT_CONSTRAINT_END   \
EVENT_CONSTRAINT(0, 0, 0)
 
@@ -262,12 +271,22 @@ struct extra_reg {
.msr = (ms),\
.config_mask = (m), \
.valid_mask = (vm), \
-   .idx = EXTRA_REG_##i\
+   .idx = EXTRA_REG_##i,   \
}
 
 #define INTEL_EVENT_EXTRA_REG(event, msr, vm, idx) \
EVENT_EXTRA_REG(event, msr, ARCH_PERFMON_EVENTSEL_EVENT, vm, idx)
 
+#define INTEL_UEVENT_EXTRA_REG(event, msr, vm, idx) \
+   EVENT_EXTRA_REG(event, msr, ARCH_PERFMON_EVENTSEL_EVENT | \
+   ARCH_PERFMON_EVENTSEL_UMASK, vm, idx)
+
+#define INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(c) \
+   INTEL_UEVENT_EXTRA_REG(c, \
+  MSR_PEBS_LD_LAT_THRESHOLD, \
+  0x, \
+  LDLAT)
+
 #define EVENT_EXTRA_END EVENT_EXTRA_REG(0, 0, 0, 0, RSP_0)
 
 union perf_capabilities {
@@ -355,6 +374,7 @@ struct x86_pmu {
 */
int attr_rdpmc;
struct attribute **format_attrs;
+   struct attribute **event_attrs;
 
ssize_t (*events_sysfs_show)(char *page, u64 config);
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 57d6527..059fc8c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -81,6 +81,7 @@ static struct event_constraint 
intel_nehalem_event_constraints[] __read_mostly =
 static struct extra_reg intel_nehalem_extra_regs[] __read_mostly =
 {
INTEL_EVENT_EXTRA_REG(0xb7, MS