Re: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-13 Thread Megha Dey
On Mon, 2017-11-13 at 21:25 +0100, Thomas Gleixner wrote:
> On Mon, 13 Nov 2017, Dey, Megha wrote:
> > >-Original Message-
> > >From: Peter Zijlstra [mailto:pet...@infradead.org]
> > >Sent: Monday, November 13, 2017 1:00 AM
> > >To: Megha Dey 
> > >Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-
> 
> Please fix your mail client so it does not add this complete useless
> information to the reply.

Will fix this.
> 
> > >On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
> > >> +/*
> > >> + * Unmask the NMI bit of the local APIC the first time task is
> > >> +scheduled
> > >> + * on a particular CPU.
> > >> + */
> > >> +static void intel_bm_unmask_nmi(void) {
> > >> +this_cpu_write(bm_unmask_apic, 0);
> > >> +
> > >> +if (!(this_cpu_read(bm_unmask_apic))) {
> > >> +apic_write(APIC_LVTPC, APIC_DM_NMI);
> > >> +this_cpu_inc(bm_unmask_apic);
> > >> +}
> > >> +}
> > >
> > >What? Why?
> > 
> 
> > Normally, other drivers using perf create an event on every CPU (thereby
> > calling perf_init on every CPU), where this bit (APIC_DM_NMI) is explicitly
> > unmasked.  In our driver, we do not do this (since we are worried only
> > about a particular task) and hence this bit is only disabled on the local
> > APIC where the perf event is initialized.
> >
> > As such, if the task is scheduled out to some other CPU, this bit is set
> > and hence would stop the interrupt from reaching the processing core.
> 
> Still that code makes no sense at all and certainly does not do what you
> claim it does:
> 
> > >> +this_cpu_write(bm_unmask_apic, 0);
> > >> +
> > >> +if (!(this_cpu_read(bm_unmask_apic))) {
> 
> So first you write the per cpu variable to 0 and then you check whether it
> is zero, which is pointless obviously.

yes, I see your point. The logic is flawed. Will fix this.
> 
> > >
> > >> +static int intel_bm_event_add(struct perf_event *event, int mode) {
> 
> Please move the opening bracket of the function into the next line. See the
> kernel coding style documentation.

Will do.
> 
> Thanks,
> 
>   tglx


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-13 Thread Thomas Gleixner
On Mon, 13 Nov 2017, Dey, Megha wrote:
> >-Original Message-
> >From: Peter Zijlstra [mailto:pet...@infradead.org]
> >Sent: Monday, November 13, 2017 1:00 AM
> >To: Megha Dey 
> >Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-

Please fix your mail client so it does not add this complete useless
information to the reply.

> >On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
> >> +/*
> >> + * Unmask the NMI bit of the local APIC the first time task is
> >> +scheduled
> >> + * on a particular CPU.
> >> + */
> >> +static void intel_bm_unmask_nmi(void) {
> >> +  this_cpu_write(bm_unmask_apic, 0);
> >> +
> >> +  if (!(this_cpu_read(bm_unmask_apic))) {
> >> +  apic_write(APIC_LVTPC, APIC_DM_NMI);
> >> +  this_cpu_inc(bm_unmask_apic);
> >> +  }
> >> +}
> >
> >What? Why?
> 

> Normally, other drivers using perf create an event on every CPU (thereby
> calling perf_init on every CPU), where this bit (APIC_DM_NMI) is explicitly
> unmasked.  In our driver, we do not do this (since we are worried only
> about a particular task) and hence this bit is only disabled on the local
> APIC where the perf event is initialized.
>
> As such, if the task is scheduled out to some other CPU, this bit is set
> and hence would stop the interrupt from reaching the processing core.

Still that code makes no sense at all and certainly does not do what you
claim it does:

> >> +  this_cpu_write(bm_unmask_apic, 0);
> >> +
> >> +  if (!(this_cpu_read(bm_unmask_apic))) {

So first you write the per cpu variable to 0 and then you check whether it
is zero, which is pointless obviously.

> >
> >> +static int intel_bm_event_add(struct perf_event *event, int mode) {

Please move the opening bracket of the function into the next line. See the
kernel coding style documentation.

Thanks,

tglx


RE: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-13 Thread Dey, Megha


>-Original Message-
>From: Peter Zijlstra [mailto:pet...@infradead.org]
>Sent: Monday, November 13, 2017 1:00 AM
>To: Megha Dey <megha@linux.intel.com>
>Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-
>d...@vger.kernel.org; t...@linutronix.de; mi...@redhat.com;
>h...@zytor.com; andriy.shevche...@linux.intel.com;
>kstew...@linuxfoundation.org; Yu, Yu-cheng <yu-cheng...@intel.com>;
>Brown, Len <len.br...@intel.com>; gre...@linuxfoundation.org;
>a...@kernel.org; alexander.shish...@linux.intel.com; jo...@redhat.com;
>namhy...@kernel.org; vikas.shiva...@linux.intel.com;
>pombreda...@nexb.com; m...@kylehuey.com; b...@suse.de; Andrejczuk,
>Grzegorz <grzegorz.andrejc...@intel.com>; Luck, Tony
><tony.l...@intel.com>; cor...@lwn.net; Shankar, Ravi V
><ravi.v.shan...@intel.com>; Dey, Megha <megha@intel.com>
>Subject: Re: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch
>Monitoring support
>
>On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
>> Currently, the Cannonlake family of Intel processors supports the
>> branch monitoring feature. Intel's Branch monitoring feature is trying
>> to utilize heuristics to detect the occurrence of an ROP (Return
>> Oriented Programming) attack.
>>
>> A perf-based kernel driver has been used to monitor the occurrence of
>> one of the 6 branch monitoring events. There are 2 counters that each
>> can select between one of these events for evaluation over a specified
>> instruction window size (0 to 1023). For each counter, a threshold
>> value
>> (0 to 127) can be configured to set a point at which ROP detection
>> event action is taken (determined by user-space). Each task can
>> monitor a maximum of 2 events at any given time.
>>
>> Apart from window_size (global) and threshold (per-counter), various
>> sysfs entries are provided for the user to configure: guest_disable,
>> lbr_freeze, window_cnt_sel, cnt_and_mode (all global) and
>mispred_evt_cnt(per-counter).
>> For all events belonging to the same task, the global parameters are
>> shared.
>
>Is there any sensible documentation on this except the MSR listings?

I have documented these sysfs entries in the next patch of this patch set:
Add Documentation for branch monitoring. Apart from that, unfortunately
there is only the MSR listings.

>
>>
>> Every time a task is scheduled out, we save the current window and count
>> associated with the event being monitored. When the task is scheduled
>> next, we start counting from previous count associated with this event.
>> Thus, a full context switch in this case is not necessary.
>
>What? That doesn't make any sense. The fact that we scheduled out and
>then in again _is_ a full context switch no?

What I meant was we need not save and restore all the branch monitoring 
MSRs during a context switch. I agree this is confusing. Will remove this line. 
>
>>
>> Signed-off-by: Megha Dey <megha@linux.intel.com>
>> Signed-off-by: Yu-Cheng Yu <yu-cheng...@intel.com>
>
>That SoB chain is buggered.

Will change the ordering.
>
>
>> +static int intel_bm_event_nmi_handler(unsigned int cmd, struct
>> +pt_regs *regs) {
>> +struct perf_event *event;
>> +union bm_detect_status stat;
>> +int i;
>> +unsigned long x;
>> +
>> +rdmsrl(BR_DETECT_STATUS_MSR, stat.raw);
>
>   if (!stat.event)
>   return NMI_DONE;
>
>saves you a whole bunch of indentation, no?

Yep, it does.
>
>> +
>> +if (stat.event) {
>> +wrmsrl(BR_DETECT_STATUS_MSR, 0);
>> +apic_write(APIC_LVTPC, APIC_DM_NMI);
>> +/*
>> + * Issue wake-up to corresponding polling event
>> + */
>> +x = stat.ctrl_hit;
> >> +for_each_set_bit(i, &x, BM_MAX_COUNTERS) {
> >> +event = current->thread.bm_counter_owner[i];
> >> +local64_inc(&event->count);
> >> +atomic_set(&event->hw.bm_poll, POLLIN);
> >> +event->pending_wakeup = 1;
> >> +irq_work_queue(&event->pending);
>> +}
>> +return NMI_HANDLED;
>> +}
>> +return NMI_DONE;
>> +}
>> +
>> +/*
>> + * Unmask the NMI bit of the local APIC the first time task is
>> +scheduled
>> + * on a particular CPU.
>> + */
>> +static void intel_bm_unmask_nmi(void) {
>> +this_cpu_write(bm_unmask_apic, 0);
>> +
>> +if (!(this_cpu_read(bm_unmask_apic))) {
>> +apic_write(APIC_LVTPC, APIC_DM_NMI);
>

Re: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-13 Thread Peter Zijlstra
On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
> Currently, the Cannonlake family of Intel processors supports the
> branch monitoring feature. Intel's Branch monitoring feature is trying
> to utilize heuristics to detect the occurrence of an ROP (Return
> Oriented Programming) attack.
> 
> A perf-based kernel driver has been used to monitor the occurrence of
> one of the 6 branch monitoring events. There are 2 counters that each
> can select between one of these events for evaluation over a specified
> instruction window size (0 to 1023). For each counter, a threshold value
> (0 to 127) can be configured to set a point at which ROP detection event
> action is taken (determined by user-space). Each task can monitor
> a maximum of 2 events at any given time.
> 
> Apart from window_size(global) and threshold(per-counter), various sysfs
> entries are provided for the user to configure: guest_disable, lbr_freeze,
> window_cnt_sel, cnt_and_mode (all global) and mispred_evt_cnt (per-counter).
> For all events belonging to the same task, the global parameters are
> shared.

Is there any sensible documentation on this except the MSR listings?

> 
> Every time a task is scheduled out, we save the current window and count
> associated with the event being monitored. When the task is scheduled
> next, we start counting from previous count associated with this event.
> Thus, a full context switch in this case is not necessary.

What? That doesn't make any sense. The fact that we scheduled out and
then in again _is_ a full context switch no?

> 
> Signed-off-by: Megha Dey 
> Signed-off-by: Yu-Cheng Yu 

That SoB chain is buggered.


> +static int intel_bm_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
> +{
> + struct perf_event *event;
> + union bm_detect_status stat;
> + int i;
> + unsigned long x;
> +
> + rdmsrl(BR_DETECT_STATUS_MSR, stat.raw);

if (!stat.event)
return NMI_DONE;

saves you a whole bunch of indentation, no?

> +
> + if (stat.event) {
> + wrmsrl(BR_DETECT_STATUS_MSR, 0);
> + apic_write(APIC_LVTPC, APIC_DM_NMI);
> + /*
> +  * Issue wake-up to corresponding polling event
> +  */
> + x = stat.ctrl_hit;
> + for_each_set_bit(i, &x, BM_MAX_COUNTERS) {
> + event = current->thread.bm_counter_owner[i];
> + local64_inc(&event->count);
> + atomic_set(&event->hw.bm_poll, POLLIN);
> + event->pending_wakeup = 1;
> + irq_work_queue(&event->pending);
> + }
> + return NMI_HANDLED;
> + }
> + return NMI_DONE;
> +}
> +
> +/*
> + * Unmask the NMI bit of the local APIC the first time task is scheduled
> + * on a particular CPU.
> + */
> +static void intel_bm_unmask_nmi(void)
> +{
> + this_cpu_write(bm_unmask_apic, 0);
> +
> + if (!(this_cpu_read(bm_unmask_apic))) {
> + apic_write(APIC_LVTPC, APIC_DM_NMI);
> + this_cpu_inc(bm_unmask_apic);
> + }
> +}

What? Why?

> +static int intel_bm_event_add(struct perf_event *event, int mode)
> +{
> + union bm_detect_status cur_stat, prev_stat;
> +
> + WARN_ON(event->hw.id >= BM_MAX_COUNTERS);
> +
> + prev_stat.raw = local64_read(&event->hw.prev_count);
> +
> + /*
> +  * Start counting from previous count associated with this event
> +  */
> + rdmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);
> +
> + cur_stat.count[event->hw.id] = prev_stat.count[event->hw.id];
> + cur_stat.count_window = prev_stat.count_window;
> + wrmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);

Why are you writing back the value you read? Just to waste cycles?

> + wrmsrl(BR_DETECT_CONTROL_MSR, event->hw.bm_ctrl);
> +
> + intel_bm_unmask_nmi();
> +
> + wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + event->hw.id,
> + (event->hw.bm_counter_conf | 1));

Please use a named construct for that enable bit.

> +
> + return 0;
> +}
> +
> +static void intel_bm_event_update(struct perf_event *event)
> +{
> + union bm_detect_status cur_stat;
> +
> + rdmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);
> + local64_set(&event->hw.prev_count, (uint64_t)cur_stat.raw);
> +}

That looks wrong... the general point of update functions is to update
the count, the above does not in fact do that.

> +
> +static void intel_bm_event_del(struct perf_event *event, int flags)
> +{
> + WARN_ON(event->hw.id >= BM_MAX_COUNTERS);
> +
> + wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + event->hw.id,
> + (event->hw.bm_counter_conf & ~1));

Either that EN bit is part of the bm_counter_conf, in which case you
didn't need to add it in _add(), or its not and you don't need to clear
it here. Make up your mind.

> +
> + intel_bm_event_update(event);

Except of course, that does not in fact update...

> +}
> +
> +static void intel_bm_event_destroy(struct perf_event