Re: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-13 Thread Megha Dey
On Mon, 2017-11-13 at 21:25 +0100, Thomas Gleixner wrote:
> On Mon, 13 Nov 2017, Dey, Megha wrote:
> > >-Original Message-
> > >From: Peter Zijlstra [mailto:pet...@infradead.org]
> > >Sent: Monday, November 13, 2017 1:00 AM
> > >To: Megha Dey 
> > >Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-
> 
> Please fix your mail client so it does not add this completely useless
> information to the reply.

Will fix this.
> 
> > >On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
> > >> +/*
> > >> + * Unmask the NMI bit of the local APIC the first time task is
> > >> +scheduled
> > >> + * on a particular CPU.
> > >> + */
> > >> +static void intel_bm_unmask_nmi(void) {
> > >> +this_cpu_write(bm_unmask_apic, 0);
> > >> +
> > >> +if (!(this_cpu_read(bm_unmask_apic))) {
> > >> +apic_write(APIC_LVTPC, APIC_DM_NMI);
> > >> +this_cpu_inc(bm_unmask_apic);
> > >> +}
> > >> +}
> > >
> > >What? Why?
> > 
> 
> > Normally, other drivers using perf create an event on every CPU (thereby
> > calling perf_init on every CPU), where this bit (APIC_DM_NMI) is explicitly
> > unmasked.  In our driver, we do not do this (since we are worried only
> > about a particular task) and hence this bit is only disabled on the local
> > APIC where the perf event is initialized.
> >
> > As such, if the task is scheduled out to some other CPU, this bit is set
> > and hence would stop the interrupt from reaching the processing core.
> 
> Still that code makes no sense at all and certainly does not do what you
> claim it does:
> 
> > >> +this_cpu_write(bm_unmask_apic, 0);
> > >> +
> > >> +if (!(this_cpu_read(bm_unmask_apic))) {
> 
> So first you write the per cpu variable to 0 and then you check whether it
> is zero, which is pointless obviously.

yes, I see your point. The logic is flawed. Will fix this.
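For reference, a minimal sketch of the intended fix (assuming the per-CPU
flag only needs to record whether this CPU's LVTPC entry was already
unmasked):

	static void intel_bm_unmask_nmi(void)
	{
		/* Unmask the LVTPC NMI entry only once per CPU */
		if (!this_cpu_read(bm_unmask_apic)) {
			apic_write(APIC_LVTPC, APIC_DM_NMI);
			this_cpu_inc(bm_unmask_apic);
		}
	}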
> 
> > >
> > >> +static int intel_bm_event_add(struct perf_event *event, int mode) {
> 
> Please move the opening bracket of the function into the next line. See the
> kernel coding style documentation.

Will do.
> 
> Thanks,
> 
>   tglx




RE: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-13 Thread Thomas Gleixner
On Mon, 13 Nov 2017, Dey, Megha wrote:
> >-Original Message-
> >From: Peter Zijlstra [mailto:pet...@infradead.org]
> >Sent: Monday, November 13, 2017 1:00 AM
> >To: Megha Dey 
> >Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-

Please fix your mail client so it does not add this completely useless
information to the reply.

> >On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
> >> +/*
> >> + * Unmask the NMI bit of the local APIC the first time task is
> >> +scheduled
> >> + * on a particular CPU.
> >> + */
> >> +static void intel_bm_unmask_nmi(void) {
> >> +  this_cpu_write(bm_unmask_apic, 0);
> >> +
> >> +  if (!(this_cpu_read(bm_unmask_apic))) {
> >> +  apic_write(APIC_LVTPC, APIC_DM_NMI);
> >> +  this_cpu_inc(bm_unmask_apic);
> >> +  }
> >> +}
> >
> >What? Why?
> 

> Normally, other drivers using perf create an event on every CPU (thereby
> calling perf_init on every CPU), where this bit (APIC_DM_NMI) is explicitly
> unmasked.  In our driver, we do not do this (since we are worried only
> about a particular task) and hence this bit is only disabled on the local
> APIC where the perf event is initialized.
>
> As such, if the task is scheduled out to some other CPU, this bit is set
> and hence would stop the interrupt from reaching the processing core.

Still that code makes no sense at all and certainly does not do what you
claim it does:

> >> +  this_cpu_write(bm_unmask_apic, 0);
> >> +
> >> +  if (!(this_cpu_read(bm_unmask_apic))) {

So first you write the per cpu variable to 0 and then you check whether it
is zero, which is pointless obviously.

> >
> >> +static int intel_bm_event_add(struct perf_event *event, int mode) {

Please move the opening bracket of the function into the next line. See the
kernel coding style documentation.

Thanks,

tglx


RE: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-13 Thread Dey, Megha


>-Original Message-
>From: Peter Zijlstra [mailto:pet...@infradead.org]
>Sent: Monday, November 13, 2017 1:00 AM
>To: Megha Dey <megha@linux.intel.com>
>Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-
>d...@vger.kernel.org; t...@linutronix.de; mi...@redhat.com;
>h...@zytor.com; andriy.shevche...@linux.intel.com;
>kstew...@linuxfoundation.org; Yu, Yu-cheng <yu-cheng...@intel.com>;
>Brown, Len <len.br...@intel.com>; gre...@linuxfoundation.org;
>a...@kernel.org; alexander.shish...@linux.intel.com; jo...@redhat.com;
>namhy...@kernel.org; vikas.shiva...@linux.intel.com;
>pombreda...@nexb.com; m...@kylehuey.com; b...@suse.de; Andrejczuk,
>Grzegorz <grzegorz.andrejc...@intel.com>; Luck, Tony
><tony.l...@intel.com>; cor...@lwn.net; Shankar, Ravi V
><ravi.v.shan...@intel.com>; Dey, Megha <megha@intel.com>
>Subject: Re: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch
>Monitoring support
>
>On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
>> Currently, the Cannonlake family of Intel processors supports the
>> branch monitoring feature. Intel's branch monitoring feature uses
>> heuristics to detect the occurrence of an ROP (Return Oriented
>> Programming) attack.
>>
>> A perf-based kernel driver is used to monitor the occurrence of
>> one of the 6 branch monitoring events. There are 2 counters, each of
>> which can select one of these events for evaluation over a specified
>> instruction window size (0 to 1023). For each counter, a threshold
>> value (0 to 127) can be configured to set a point at which the ROP
>> detection event action is taken (determined by user-space). Each task
>> can monitor a maximum of 2 events at any given time.
>>
>> Apart from window_size (global) and threshold (per-counter), various
>> sysfs entries are provided for the user to configure: guest_disable,
>> lbr_freeze, window_cnt_sel, cnt_and_mode (all global) and
>> mispred_evt_cnt (per-counter).
>> For all events belonging to the same task, the global parameters are
>> shared.
>
>Is there any sensible documentation on this except the MSR listings?

I have documented these sysfs entries in the next patch of this patch set
("Add Documentation for branch monitoring"). Apart from that, unfortunately
there are only the MSR listings.

>
>>
>> Every time a task is scheduled out, we save the current window and count
>> associated with the event being monitored. When the task is scheduled
>> next, we start counting from the previous count associated with this event.
>> Thus, a full context switch in this case is not necessary.
>
>What? That doesn't make any sense. The fact that we scheduled out and
>then in again _is_ a full context switch no?

What I meant was that we need not save and restore all the branch monitoring
MSRs during a context switch. I agree this is confusing. Will remove this line.
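To illustrate, only the branch monitoring status MSR needs to be snapshotted
at sched-out, since it holds both the per-counter counts and the current
window (a sketch, mirroring the _update()/_add() code below):

	/* sched-out: save counts and current window from the status MSR */
	rdmsrl(BR_DETECT_STATUS_MSR, stat.raw);
	local64_set(&event->hw.prev_count, stat.raw);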
>
>>
>> Signed-off-by: Megha Dey <megha@linux.intel.com>
>> Signed-off-by: Yu-Cheng Yu <yu-cheng...@intel.com>
>
>That SoB chain is buggered.

Will change the ordering.
>
>
>> +static int intel_bm_event_nmi_handler(unsigned int cmd, struct
>> +pt_regs *regs) {
>> +struct perf_event *event;
>> +union bm_detect_status stat;
>> +int i;
>> +unsigned long x;
>> +
>> +rdmsrl(BR_DETECT_STATUS_MSR, stat.raw);
>
>	if (!stat.event)
>		return NMI_DONE;
>
>saves you a whole bunch of indentation, no?

Yep, it does.
>
>> +
>> +if (stat.event) {
>> +wrmsrl(BR_DETECT_STATUS_MSR, 0);
>> +apic_write(APIC_LVTPC, APIC_DM_NMI);
>> +/*
>> + * Issue wake-up to corresponding polling event
>> + */
>> +x = stat.ctrl_hit;
>> +for_each_set_bit(i, &x, BM_MAX_COUNTERS) {
>> +event = current->thread.bm_counter_owner[i];
>> +local64_inc(&event->count);
>> +atomic_set(&event->hw.bm_poll, POLLIN);
>> +event->pending_wakeup = 1;
>> +irq_work_queue(&event->pending);
>> +}
>> +return NMI_HANDLED;
>> +}
>> +return NMI_DONE;
>> +}
>> +
>> +/*
>> + * Unmask the NMI bit of the local APIC the first time task is
>> +scheduled
>> + * on a particular CPU.
>> + */
>> +static void intel_bm_unmask_nmi(void) {
>> +this_cpu_write(bm_unmask_apic, 0);
>> +
>> +if (!(this_cpu_read(bm_unmask_apic))) {
>> +apic_write(APIC_LVTPC, APIC_DM_NMI);
>

Re: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-13 Thread Peter Zijlstra
On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
> Currently, the Cannonlake family of Intel processors supports the
> branch monitoring feature. Intel's branch monitoring feature uses
> heuristics to detect the occurrence of an ROP (Return Oriented
> Programming) attack.
> 
> A perf-based kernel driver is used to monitor the occurrence of
> one of the 6 branch monitoring events. There are 2 counters, each of
> which can select one of these events for evaluation over a specified
> instruction window size (0 to 1023). For each counter, a threshold value
> (0 to 127) can be configured to set a point at which the ROP detection
> event action is taken (determined by user-space). Each task can monitor
> a maximum of 2 events at any given time.
> 
> Apart from window_size (global) and threshold (per-counter), various sysfs
> entries are provided for the user to configure: guest_disable, lbr_freeze,
> window_cnt_sel, cnt_and_mode (all global) and mispred_evt_cnt (per-counter).
> For all events belonging to the same task, the global parameters are
> shared.

Is there any sensible documentation on this except the MSR listings?

> 
> Every time a task is scheduled out, we save the current window and count
> associated with the event being monitored. When the task is scheduled
> next, we start counting from the previous count associated with this event.
> Thus, a full context switch in this case is not necessary.

What? That doesn't make any sense. The fact that we scheduled out and
then in again _is_ a full context switch no?

> 
> Signed-off-by: Megha Dey 
> Signed-off-by: Yu-Cheng Yu 

That SoB chain is buggered.


> +static int intel_bm_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
> +{
> + struct perf_event *event;
> + union bm_detect_status stat;
> + int i;
> + unsigned long x;
> +
> + rdmsrl(BR_DETECT_STATUS_MSR, stat.raw);

	if (!stat.event)
		return NMI_DONE;

saves you a whole bunch of indentation, no?
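Something like the following, an untested sketch of the same handler with
the early return folded in:

	static int intel_bm_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
	{
		struct perf_event *event;
		union bm_detect_status stat;
		unsigned long x;
		int i;

		rdmsrl(BR_DETECT_STATUS_MSR, stat.raw);
		if (!stat.event)
			return NMI_DONE;

		wrmsrl(BR_DETECT_STATUS_MSR, 0);
		apic_write(APIC_LVTPC, APIC_DM_NMI);
		/* Issue wake-up to the corresponding polling events */
		x = stat.ctrl_hit;
		for_each_set_bit(i, &x, BM_MAX_COUNTERS) {
			event = current->thread.bm_counter_owner[i];
			local64_inc(&event->count);
			atomic_set(&event->hw.bm_poll, POLLIN);
			event->pending_wakeup = 1;
			irq_work_queue(&event->pending);
		}
		return NMI_HANDLED;
	}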

> +
> + if (stat.event) {
> + wrmsrl(BR_DETECT_STATUS_MSR, 0);
> + apic_write(APIC_LVTPC, APIC_DM_NMI);
> + /*
> +  * Issue wake-up to corresponding polling event
> +  */
> + x = stat.ctrl_hit;
> + for_each_set_bit(i, &x, BM_MAX_COUNTERS) {
> + event = current->thread.bm_counter_owner[i];
> + local64_inc(&event->count);
> + atomic_set(&event->hw.bm_poll, POLLIN);
> + event->pending_wakeup = 1;
> + irq_work_queue(&event->pending);
> + }
> + return NMI_HANDLED;
> + }
> + return NMI_DONE;
> +}
> +
> +/*
> + * Unmask the NMI bit of the local APIC the first time task is scheduled
> + * on a particular CPU.
> + */
> +static void intel_bm_unmask_nmi(void)
> +{
> + this_cpu_write(bm_unmask_apic, 0);
> +
> + if (!(this_cpu_read(bm_unmask_apic))) {
> + apic_write(APIC_LVTPC, APIC_DM_NMI);
> + this_cpu_inc(bm_unmask_apic);
> + }
> +}

What? Why?

> +static int intel_bm_event_add(struct perf_event *event, int mode)
> +{
> + union bm_detect_status cur_stat, prev_stat;
> +
> + WARN_ON(event->hw.id >= BM_MAX_COUNTERS);
> +
> + prev_stat.raw = local64_read(&event->hw.prev_count);
> +
> + /*
> +  * Start counting from previous count associated with this event
> +  */
> + rdmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);
> +
> + cur_stat.count[event->hw.id] = prev_stat.count[event->hw.id];
> + cur_stat.count_window = prev_stat.count_window;
> + wrmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);

Why are you writing back the value you read? Just to waste cycles?

> + wrmsrl(BR_DETECT_CONTROL_MSR, event->hw.bm_ctrl);
> +
> + intel_bm_unmask_nmi();
> +
> + wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + event->hw.id,
> + (event->hw.bm_counter_conf | 1));

Please use a named construct for that enable bit.
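For example (the macro name below is illustrative, not part of the patch):

	/* enable bit of the BR_DETECT_COUNTER_CONFIG_* MSRs */
	#define BM_COUNTER_CONF_EN	BIT(0)

	wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + event->hw.id,
	       event->hw.bm_counter_conf | BM_COUNTER_CONF_EN);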

> +
> + return 0;
> +}
> +
> +static void intel_bm_event_update(struct perf_event *event)
> +{
> + union bm_detect_status cur_stat;
> +
> + rdmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);
> + local64_set(&event->hw.prev_count, (uint64_t)cur_stat.raw);
> +}

That looks wrong... the general point of update functions is to update
the count, the above does not in fact do that.
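A conventional update would fold the new hardware value into event->count,
along these lines (a sketch only, assuming the per-counter count field of
the status MSR is the quantity that should accumulate):

	static void intel_bm_event_update(struct perf_event *event)
	{
		union bm_detect_status cur, prev;

		rdmsrl(BR_DETECT_STATUS_MSR, cur.raw);
		prev.raw = local64_xchg(&event->hw.prev_count, cur.raw);
		/* add the delta since the last snapshot to the perf count */
		local64_add(cur.count[event->hw.id] - prev.count[event->hw.id],
			    &event->count);
	}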

> +
> +static void intel_bm_event_del(struct perf_event *event, int flags)
> +{
> + WARN_ON(event->hw.id >= BM_MAX_COUNTERS);
> +
> + wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + event->hw.id,
> + (event->hw.bm_counter_conf & ~1));

Either that EN bit is part of the bm_counter_conf, in which case you
didn't need to add it in _add(), or its not and you don't need to clear
it here. Make up your mind.
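That is, if the enable bit is kept out of bm_counter_conf, the disable path
could simply write the configuration back as-is:

	wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + event->hw.id,
	       event->hw.bm_counter_conf);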

> +
> + intel_bm_event_update(event);

Except of course, that does not in fact update...

> +}
> +
> +static void intel_bm_event_destroy(struct perf_event 

[PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-11 Thread Megha Dey
Currently, the Cannonlake family of Intel processors supports the
branch monitoring feature. Intel's branch monitoring feature uses
heuristics to detect the occurrence of an ROP (Return Oriented
Programming) attack.

A perf-based kernel driver is used to monitor the occurrence of
one of the 6 branch monitoring events. There are 2 counters, each of
which can select one of these events for evaluation over a specified
instruction window size (0 to 1023). For each counter, a threshold value
(0 to 127) can be configured to set a point at which the ROP detection
event action is taken (determined by user-space). Each task can monitor
a maximum of 2 events at any given time.

Apart from window_size (global) and threshold (per-counter), various sysfs
entries are provided for the user to configure: guest_disable, lbr_freeze,
window_cnt_sel, cnt_and_mode (all global) and mispred_evt_cnt (per-counter).
For all events belonging to the same task, the global parameters are
shared.

Every time a task is scheduled out, we save the current window and count
associated with the event being monitored. When the task is scheduled
next, we start counting from the previous count associated with this event.
Thus, a full context switch in this case is not necessary.

To monitor a user-space application for ROP-related events, the perf
command line can be used as follows:

perf stat -e <event> <application>

e.g. for the following test program (test.c) and threshold = 100
(echo 100 > /sys/devices/intel_bm/threshold)
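(Assuming the same sysfs layout, the other global parameters would be set
the same way, e.g. echo 1023 > /sys/devices/intel_bm/window_size.)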

void func(void)
{
return;
}

void main(void)
{
int i;

for (i = 0; i < 128; i++) {
func();
}

return;
}

perf stat -e intel_bm/rets/ ./test

 Performance counter stats for './test':

 1  intel_bm/rets/

   0.104705937 seconds time elapsed

perf returns the number of branch monitoring interrupts that occurred
during the execution of the user-space application.

Signed-off-by: Megha Dey 
Signed-off-by: Yu-Cheng Yu 
---
 arch/x86/events/Kconfig  |  10 +
 arch/x86/events/intel/Makefile   |   2 +
 arch/x86/events/intel/bm.c   | 618 +++
 arch/x86/include/asm/msr-index.h |   5 +
 arch/x86/include/asm/processor.h |   4 +
 include/linux/perf_event.h   |   9 +-
 kernel/events/core.c |  16 +
 7 files changed, 663 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/events/intel/bm.c

diff --git a/arch/x86/events/Kconfig b/arch/x86/events/Kconfig
index 9a7a144..40903ca 100644
--- a/arch/x86/events/Kconfig
+++ b/arch/x86/events/Kconfig
@@ -9,6 +9,16 @@ config PERF_EVENTS_INTEL_UNCORE
Include support for Intel uncore performance events. These are
available on NehalemEX and more modern processors.
 
+config PERF_EVENTS_INTEL_BM
+   bool "Intel Branch Monitoring support"
+   depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
+   ---help---
+ Include support for Intel Branch monitoring. This feature utilizes
+ heuristics for detecting ROP (Return Oriented Programming)-like
+ attacks. These heuristics are based on certain performance
+ monitoring statistics, measured dynamically over a short
+ configurable window period.
+
 config PERF_EVENTS_INTEL_RAPL
tristate "Intel rapl performance events"
depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
diff --git a/arch/x86/events/intel/Makefile b/arch/x86/events/intel/Makefile
index 3468b0c..14235ec 100644
--- a/arch/x86/events/intel/Makefile
+++ b/arch/x86/events/intel/Makefile
@@ -2,6 +2,8 @@
 obj-$(CONFIG_CPU_SUP_INTEL)+= core.o bts.o
 obj-$(CONFIG_CPU_SUP_INTEL)+= ds.o knc.o
 obj-$(CONFIG_CPU_SUP_INTEL)+= lbr.o p4.o p6.o pt.o
+obj-$(CONFIG_PERF_EVENTS_INTEL_BM) += intel-bm-perf.o
+intel-bm-perf-objs := bm.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL)   += intel-rapl-perf.o
 intel-rapl-perf-objs   := rapl.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += intel-uncore.o
diff --git a/arch/x86/events/intel/bm.c b/arch/x86/events/intel/bm.c
new file mode 100644
index 000..923c6e9
--- /dev/null
+++ b/arch/x86/events/intel/bm.c
@@ -0,0 +1,618 @@
+/*
+ * Support for Intel branch monitoring counters
+ *
+ * Intel branch monitoring MSRs are specified in the Intel® 64 and IA-32
+ * Software Developer’s Manual Volume 4 section 2.16.2 (October 2017)
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ *
+ * Contact Information:
+ * Megha Dey 
+ * Yu-Cheng Yu 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A