Re: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support
On Mon, 2017-11-13 at 21:25 +0100, Thomas Gleixner wrote:
> On Mon, 13 Nov 2017, Dey, Megha wrote:
> > >-----Original Message-----
> > >From: Peter Zijlstra [mailto:pet...@infradead.org]
> > >Sent: Monday, November 13, 2017 1:00 AM
> > >To: Megha Dey
> > >Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-
>
> Please fix your mail client so it does not add this complete useless
> information to the reply.

Will fix this.

> > >On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
> > >> +/*
> > >> + * Unmask the NMI bit of the local APIC the first time task is scheduled
> > >> + * on a particular CPU.
> > >> + */
> > >> +static void intel_bm_unmask_nmi(void)
> > >> +{
> > >> +	this_cpu_write(bm_unmask_apic, 0);
> > >> +
> > >> +	if (!(this_cpu_read(bm_unmask_apic))) {
> > >> +		apic_write(APIC_LVTPC, APIC_DM_NMI);
> > >> +		this_cpu_inc(bm_unmask_apic);
> > >> +	}
> > >> +}
> > >
> > >What? Why?
> >
> > Normally, other drivers using perf create an event on every CPU (thereby
> > calling perf_init on every CPU), where this bit (APIC_DM_NMI) is explicitly
> > unmasked. In our driver, we do not do this (since we are worried only
> > about a particular task) and hence this bit is only disabled on the local
> > APIC where the perf event is initialized.
> >
> > As such, if the task is scheduled out to some other CPU, this bit is set
> > and hence would stop the interrupt from reaching the processing core.
>
> Still that code makes no sense at all and certainly does not do what you
> claim it does:
>
> > >> +	this_cpu_write(bm_unmask_apic, 0);
> > >> +
> > >> +	if (!(this_cpu_read(bm_unmask_apic))) {
>
> So first you write the per cpu variable to 0 and then you check whether it
> is zero, which is pointless obviously.

Yes, I see your point. The logic is flawed. Will fix this.

> > >> +static int intel_bm_event_add(struct perf_event *event, int mode) {
>
> Please move the opening bracket of the function into the next line. See the
> kernel coding style documentation.

Will do.

> Thanks,
>
> 	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support
On Mon, 13 Nov 2017, Dey, Megha wrote:
> >-----Original Message-----
> >From: Peter Zijlstra [mailto:pet...@infradead.org]
> >Sent: Monday, November 13, 2017 1:00 AM
> >To: Megha Dey
> >Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-

Please fix your mail client so it does not add this complete useless
information to the reply.

> >On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
> >> +/*
> >> + * Unmask the NMI bit of the local APIC the first time task is scheduled
> >> + * on a particular CPU.
> >> + */
> >> +static void intel_bm_unmask_nmi(void)
> >> +{
> >> +	this_cpu_write(bm_unmask_apic, 0);
> >> +
> >> +	if (!(this_cpu_read(bm_unmask_apic))) {
> >> +		apic_write(APIC_LVTPC, APIC_DM_NMI);
> >> +		this_cpu_inc(bm_unmask_apic);
> >> +	}
> >> +}
> >
> >What? Why?
>
> Normally, other drivers using perf create an event on every CPU (thereby
> calling perf_init on every CPU), where this bit (APIC_DM_NMI) is explicitly
> unmasked. In our driver, we do not do this (since we are worried only
> about a particular task) and hence this bit is only disabled on the local
> APIC where the perf event is initialized.
>
> As such, if the task is scheduled out to some other CPU, this bit is set
> and hence would stop the interrupt from reaching the processing core.

Still that code makes no sense at all and certainly does not do what you
claim it does:

> >> +	this_cpu_write(bm_unmask_apic, 0);
> >> +
> >> +	if (!(this_cpu_read(bm_unmask_apic))) {

So first you write the per cpu variable to 0 and then you check whether it
is zero, which is pointless obviously.

> >> +static int intel_bm_event_add(struct perf_event *event, int mode) {

Please move the opening bracket of the function into the next line. See the
kernel coding style documentation.

Thanks,

	tglx
RE: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support
>-----Original Message-----
>From: Peter Zijlstra [mailto:pet...@infradead.org]
>Sent: Monday, November 13, 2017 1:00 AM
>To: Megha Dey <megha@linux.intel.com>
>Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-d...@vger.kernel.org;
>t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
>andriy.shevche...@linux.intel.com; kstew...@linuxfoundation.org;
>Yu, Yu-cheng <yu-cheng...@intel.com>; Brown, Len <len.br...@intel.com>;
>gre...@linuxfoundation.org; a...@kernel.org;
>alexander.shish...@linux.intel.com; jo...@redhat.com; namhy...@kernel.org;
>vikas.shiva...@linux.intel.com; pombreda...@nexb.com; m...@kylehuey.com;
>b...@suse.de; Andrejczuk, Grzegorz <grzegorz.andrejc...@intel.com>;
>Luck, Tony <tony.l...@intel.com>; cor...@lwn.net;
>Shankar, Ravi V <ravi.v.shan...@intel.com>; Dey, Megha <megha@intel.com>
>Subject: Re: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch
>Monitoring support
>
>On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
>> Currently, the cannonlake family of Intel processors support the
>> branch monitoring feature. Intel's Branch monitoring feature is trying
>> to utilize heuristics to detect the occurrence of an ROP (Return
>> Oriented Programming) attack.
>>
>> A perf-based kernel driver has been used to monitor the occurrence of
>> one of the 6 branch monitoring events. There are 2 counters that each
>> can select between one of these events for evaluation over a specified
>> instruction window size (0 to 1023). For each counter, a threshold
>> value (0 to 127) can be configured to set a point at which ROP
>> detection event action is taken (determined by user-space). Each task
>> can monitor a maximum of 2 events at any given time.
>>
>> Apart from window_size(global) and threshold(per-counter), various
>> sysfs entries are provided for the user to configure: guest_disable,
>> lbr_freeze, window_cnt_sel, cnt_and_mode (all global) and
>> mispred_evt_cnt(per-counter).
>> For all events belonging to the same task, the global parameters are
>> shared.
>
>Is there any sensible documentation on this except the MSR listings?

I have documented these sysfs entries in the next patch of this patch set:
Add Documentation for branch monitoring. Apart from that, unfortunately
there is only the MSR listings.

>
>> Everytime a task is scheduled out, we save current window and count
>> associated with the event being monitored. When the task is scheduled
>> next, we start counting from previous count associated with this event.
>> Thus, a full context switch in this case is not necessary.
>
>What? That doesn't make any sense. The fact that we scheduled out and
>then in again _is_ a full context switch no?

What I meant was we need not save and restore all the branch monitoring
MSRs during a context switch. I agree this is confusing. Will remove this
line.

>
>> Signed-off-by: Megha Dey <megha@linux.intel.com>
>> Signed-off-by: Yu-Cheng Yu <yu-cheng...@intel.com>
>
>That SoB chain is buggered.

Will change the ordering.

>
>> +static int intel_bm_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>> +{
>> +	struct perf_event *event;
>> +	union bm_detect_status stat;
>> +	int i;
>> +	unsigned long x;
>> +
>> +	rdmsrl(BR_DETECT_STATUS_MSR, stat.raw);
>
>	if (!stat.event)
>		return NMI_DONE;
>
>saves you a whole bunch of indentation, no?

Yep, it does.

>
>> +
>> +	if (stat.event) {
>> +		wrmsrl(BR_DETECT_STATUS_MSR, 0);
>> +		apic_write(APIC_LVTPC, APIC_DM_NMI);
>> +		/*
>> +		 * Issue wake-up to corresponding polling event
>> +		 */
>> +		x = stat.ctrl_hit;
>> +		for_each_set_bit(i, &x, BM_MAX_COUNTERS) {
>> +			event = current->thread.bm_counter_owner[i];
>> +			local64_inc(&event->count);
>> +			atomic_set(&event->hw.bm_poll, POLLIN);
>> +			event->pending_wakeup = 1;
>> +			irq_work_queue(&event->pending);
>> +		}
>> +		return NMI_HANDLED;
>> +	}
>> +	return NMI_DONE;
>> +}
>> +
>> +/*
>> + * Unmask the NMI bit of the local APIC the first time task is scheduled
>> + * on a particular CPU.
>> + */
>> +static void intel_bm_unmask_nmi(void)
>> +{
>> +	this_cpu_write(bm_unmask_apic, 0);
>> +
>> +	if (!(this_cpu_read(bm_unmask_apic))) {
>> +		apic_write(APIC_LVTPC, APIC_DM_NMI);
>
Re: [PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support
On Sat, Nov 11, 2017 at 01:20:05PM -0800, Megha Dey wrote:
> Currently, the cannonlake family of Intel processors support the
> branch monitoring feature. Intel's Branch monitoring feature is trying
> to utilize heuristics to detect the occurrence of an ROP (Return
> Oriented Programming) attack.
>
> A perf-based kernel driver has been used to monitor the occurrence of
> one of the 6 branch monitoring events. There are 2 counters that each
> can select between one of these events for evaluation over a specified
> instruction window size (0 to 1023). For each counter, a threshold value
> (0 to 127) can be configured to set a point at which ROP detection event
> action is taken (determined by user-space). Each task can monitor
> a maximum of 2 events at any given time.
>
> Apart from window_size(global) and threshold(per-counter), various sysfs
> entries are provided for the user to configure: guest_disable, lbr_freeze,
> window_cnt_sel, cnt_and_mode (all global) and mispred_evt_cnt(per-counter).
> For all events belonging to the same task, the global parameters are
> shared.

Is there any sensible documentation on this except the MSR listings?

> Everytime a task is scheduled out, we save current window and count
> associated with the event being monitored. When the task is scheduled
> next, we start counting from previous count associated with this event.
> Thus, a full context switch in this case is not necessary.

What? That doesn't make any sense. The fact that we scheduled out and
then in again _is_ a full context switch no?

> Signed-off-by: Megha Dey
> Signed-off-by: Yu-Cheng Yu

That SoB chain is buggered.

> +static int intel_bm_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
> +{
> +	struct perf_event *event;
> +	union bm_detect_status stat;
> +	int i;
> +	unsigned long x;
> +
> +	rdmsrl(BR_DETECT_STATUS_MSR, stat.raw);

	if (!stat.event)
		return NMI_DONE;

saves you a whole bunch of indentation, no?
> +
> +	if (stat.event) {
> +		wrmsrl(BR_DETECT_STATUS_MSR, 0);
> +		apic_write(APIC_LVTPC, APIC_DM_NMI);
> +		/*
> +		 * Issue wake-up to corresponding polling event
> +		 */
> +		x = stat.ctrl_hit;
> +		for_each_set_bit(i, &x, BM_MAX_COUNTERS) {
> +			event = current->thread.bm_counter_owner[i];
> +			local64_inc(&event->count);
> +			atomic_set(&event->hw.bm_poll, POLLIN);
> +			event->pending_wakeup = 1;
> +			irq_work_queue(&event->pending);
> +		}
> +		return NMI_HANDLED;
> +	}
> +	return NMI_DONE;
> +}
> +
> +/*
> + * Unmask the NMI bit of the local APIC the first time task is scheduled
> + * on a particular CPU.
> + */
> +static void intel_bm_unmask_nmi(void)
> +{
> +	this_cpu_write(bm_unmask_apic, 0);
> +
> +	if (!(this_cpu_read(bm_unmask_apic))) {
> +		apic_write(APIC_LVTPC, APIC_DM_NMI);
> +		this_cpu_inc(bm_unmask_apic);
> +	}
> +}

What? Why?

> +static int intel_bm_event_add(struct perf_event *event, int mode)
> +{
> +	union bm_detect_status cur_stat, prev_stat;
> +
> +	WARN_ON(event->hw.id >= BM_MAX_COUNTERS);
> +
> +	prev_stat.raw = local64_read(&event->hw.prev_count);
> +
> +	/*
> +	 * Start counting from previous count associated with this event
> +	 */
> +	rdmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);
> +
> +	cur_stat.count[event->hw.id] = prev_stat.count[event->hw.id];
> +	cur_stat.count_window = prev_stat.count_window;
> +	wrmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);

Why are you writing back the value you read? Just to waste cycles?

> +	wrmsrl(BR_DETECT_CONTROL_MSR, event->hw.bm_ctrl);
> +
> +	intel_bm_unmask_nmi();
> +
> +	wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + event->hw.id,
> +	       (event->hw.bm_counter_conf | 1));

Please use a named construct for that enable bit.

> +
> +	return 0;
> +}
> +
> +static void intel_bm_event_update(struct perf_event *event)
> +{
> +	union bm_detect_status cur_stat;
> +
> +	rdmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);
> +	local64_set(&event->hw.prev_count, (uint64_t)cur_stat.raw);
> +}

That looks wrong...
the general point of update functions is to update the count, the above
does not in fact do that.

> +
> +static void intel_bm_event_del(struct perf_event *event, int flags)
> +{
> +	WARN_ON(event->hw.id >= BM_MAX_COUNTERS);
> +
> +	wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + event->hw.id,
> +	       (event->hw.bm_counter_conf & ~1));

Either that EN bit is part of the bm_counter_conf, in which case you
didn't need to add it in _add(), or its not and you don't need to clear
it here. Make up your mind.

> +
> +	intel_bm_event_update(event);

Except of course, that does not in fact update...

> +}
> +
> +static void intel_bm_event_destroy(struct perf_event