Re: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Peter Zijlstra
On Thu, Jul 10, 2014 at 03:59:43AM -0700, kan.li...@intel.com wrote:
 From: Kan Liang kan.li...@intel.com
 
 x86, perf: Protect LBR and extra_regs against KVM lying
 
 With -cpu host, KVM reports LBR and extra_regs support, if the host has 
 support.
 When the guest perf driver tries to access LBR or extra_regs MSR,
 it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support.
 So check the related MSRs access right once at initialization time to avoid 
 the error access at runtime.
 
 For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y 
 (for host kernel).
 And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel).
 Start the guest with -cpu host.
 Run perf record with --branch-any or --branch-filter in guest to trigger LBR 
 #GP.
 Run perf stat offcore events (E.g. LLC-loads/LLC-load-misses ...) in guest to 
 trigger offcore_rsp #GP

This is still not properly wrapped at 78 chars.

 Signed-off-by: Kan Liang kan.li...@intel.com
 
 V2: Move the check code to initialization time.
 V3: Add flag for each extra register.
 Check all LBR MSRs at initialization time.
 V4: Remove lbr_msr_access. For LBR msr, simply set lbr_nr to 0 if check_msr 
 failed.
 Disable all extra msrs in creation places if check_msr failed.
 V5: Fix check_msr broken
 Don't check any more MSRs after the first fail
 Return error when checking fail to stop creating the event
 Remove the checking code path which never get

These things should go below the --- so they get thrown away when
applying the patch, its of no relevance once applied.

 ---
  arch/x86/kernel/cpu/perf_event.c   |  3 +++
  arch/x86/kernel/cpu/perf_event.h   | 45 
 ++
  arch/x86/kernel/cpu/perf_event_intel.c | 38 +++-
  3 files changed, 85 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kernel/cpu/perf_event.c 
 b/arch/x86/kernel/cpu/perf_event.c
 index 2bdfbff..a7c5e4b 100644
 --- a/arch/x86/kernel/cpu/perf_event.c
 +++ b/arch/x86/kernel/cpu/perf_event.c
 @@ -118,6 +118,9 @@ static int x86_pmu_extra_regs(u64 config, struct 
 perf_event *event)
   continue;
   if (event-attr.config1  ~er-valid_mask)
   return -EINVAL;
 + /* Check if the extra msrs can be safely accessed*/
 + if (!x86_pmu.extra_msr_access[er-idx])
 + return -EFAULT;

This is not a correct usage of -EFAULT. Event creation did not fail
because we took a fault dereferencing a user provided pointer. Possibly
ENXIO is appropriate.

   reg-idx = er-idx;
   reg-config = event-attr.config1;
 diff --git a/arch/x86/kernel/cpu/perf_event.h 
 b/arch/x86/kernel/cpu/perf_event.h
 index 3b2f9bd..992c678 100644
 --- a/arch/x86/kernel/cpu/perf_event.h
 +++ b/arch/x86/kernel/cpu/perf_event.h
 @@ -464,6 +464,12 @@ struct x86_pmu {
*/
   struct extra_reg *extra_regs;
   unsigned int er_flags;
 + /*
 +  * EXTRA REG MSR can be accessed
 +  * The extra registers are completely unrelated to each other.
 +  * So it needs a flag for each extra register.
 +  */
 + boolextra_msr_access[EXTRA_REG_MAX];

So why not in struct extra_reg again? You didn't give a straight answer
there.

 +/*
 + * Under certain circumstances, access certain MSR may cause #GP.
 + * The function tests if the input MSR can be safely accessed.
 + */
 +static inline bool check_msr(unsigned long msr)
 +{

This reads like a generic function;

 + u64 val_old, val_new, val_tmp;
 +
 + /*
 +  * Read the current value, change it and read it back to see if it
 +  * matches, this is needed to detect certain hardware emulators
 +  * (qemu/kvm) that don't trap on the MSR access and always return 0s.
 +  */
 + if (rdmsrl_safe(msr, val_old))
 + goto msr_fail;
 + /*
 +  * Only chagne it slightly,
 +  * since the higher bits of some MSRs cannot be updated by wrmsrl.
 +  * E.g. MSR_LBR_TOS
 +  */
 + val_tmp = val_old ^ 0x3UL;

but this is not generally true; not all MSRs can write the 2 LSB, can
they? One option would be to extend the function with a u64 mask.

 + if (wrmsrl_safe(msr, val_tmp) ||
 + rdmsrl_safe(msr, val_new))
 + goto msr_fail;
 +
 + if (val_new != val_tmp)
 + goto msr_fail;
 +
 + /* Here it's sure that the MSR can be safely accessed.
 +  * Restore the old value and return.
 +  */
 + wrmsrl(msr, val_old);
 +
 + return true;
 +
 +msr_fail:
 + return false;
 +}

Also, by now this function is far too large to be inline and in a
header.

 + /*
 +  * Access LBR MSR may cause #GP under certain circumstances.
 +  * E.g. KVM doesn't support LBR MSR
 +  * Check all LBT MSR here.
 +  * Disable LBR access if any LBR MSRs can not be accessed.
 +  */
 + if (x86_pmu.lbr_nr) {
 + if 

Re: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Peter Zijlstra
On Thu, Jul 10, 2014 at 03:59:43AM -0700, kan.li...@intel.com wrote:
 +/*
 + * Under certain circumstances, access certain MSR may cause #GP.
 + * The function tests if the input MSR can be safely accessed.
 + */
 +static inline bool check_msr(unsigned long msr)
 +{
 + u64 val_old, val_new, val_tmp;
 +
 + /*
 +  * Read the current value, change it and read it back to see if it
 +  * matches, this is needed to detect certain hardware emulators
 +  * (qemu/kvm) that don't trap on the MSR access and always return 0s.
 +  */
 + if (rdmsrl_safe(msr, val_old))
 + goto msr_fail;
 + /*
 +  * Only chagne it slightly,
 +  * since the higher bits of some MSRs cannot be updated by wrmsrl.
 +  * E.g. MSR_LBR_TOS
 +  */
 + val_tmp = val_old ^ 0x3UL;
 + if (wrmsrl_safe(msr, val_tmp) ||
 + rdmsrl_safe(msr, val_new))
 + goto msr_fail;
 +
 + if (val_new != val_tmp)
 + goto msr_fail;
 +
 + /* Here it's sure that the MSR can be safely accessed.
 +  * Restore the old value and return.
 +  */
 + wrmsrl(msr, val_old);
 +
 + return true;
 +
 +msr_fail:
 + return false;
 +}

I don't think we need the msr_fail thing, there's no state to clean up,
you can return false at all places you now have goto.


pgpkp9qBX8LOU.pgp
Description: PGP signature


Re: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Paolo Bonzini

Il 10/07/2014 12:59, kan.li...@intel.com ha scritto:

From: Kan Liang kan.li...@intel.com

x86, perf: Protect LBR and extra_regs against KVM lying

With -cpu host, KVM reports LBR and extra_regs support, if the host has support.
When the guest perf driver tries to access LBR or extra_regs MSR,
it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support.
So check the related MSRs access right once at initialization time to avoid the 
error access at runtime.

For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y 
(for host kernel).
And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel).


I'm not sure this is a useful patch.

This is #GP'ing just because of a limitation in the PMU; just compile 
the kernel with CONFIG_PARAVIRT, or split the rdmsr is always 
rdmsr_safe behavior out of CONFIG_PARAVIRT and into a new Kconfig symbol.


In fact there's no reason why LBR cannot be virtualized (though it does 
need support from the processor), and it may even be possible to support 
OFFCORE_RSP_X in the KVM virtual PMU.


Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Peter Zijlstra
On Mon, Jul 14, 2014 at 01:55:03PM +0200, Paolo Bonzini wrote:
 Il 10/07/2014 12:59, kan.li...@intel.com ha scritto:
 From: Kan Liang kan.li...@intel.com
 
 x86, perf: Protect LBR and extra_regs against KVM lying
 
 With -cpu host, KVM reports LBR and extra_regs support, if the host has 
 support.
 When the guest perf driver tries to access LBR or extra_regs MSR,
 it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support.
 So check the related MSRs access right once at initialization time to avoid 
 the error access at runtime.
 
 For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y 
 (for host kernel).
 And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel).
 
 I'm not sure this is a useful patch.
 
 This is #GP'ing just because of a limitation in the PMU; just compile the
 kernel with CONFIG_PARAVIRT

How's that going to help? If you run kvm -host the VM is lying through
its teeth is the kernel is going to assume all those MSRs present,
PARAVIRT isn't going to help with this.

 , or split the rdmsr is always rdmsr_safe
 behavior out of CONFIG_PARAVIRT and into a new Kconfig symbol.

That's not useful either, because non of these code-paths are going to
check the return value.

 In fact there's no reason why LBR cannot be virtualized (though it does need
 support from the processor), and it may even be possible to support
 OFFCORE_RSP_X in the KVM virtual PMU.

But its not, so something needs to be done, right?


pgpmnEwGpgwpV.pgp
Description: PGP signature


Re: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Paolo Bonzini

Il 14/07/2014 14:09, Peter Zijlstra ha scritto:

On Mon, Jul 14, 2014 at 01:55:03PM +0200, Paolo Bonzini wrote:

Il 10/07/2014 12:59, kan.li...@intel.com ha scritto:

From: Kan Liang kan.li...@intel.com

x86, perf: Protect LBR and extra_regs against KVM lying

With -cpu host, KVM reports LBR and extra_regs support, if the host has support.
When the guest perf driver tries to access LBR or extra_regs MSR,
it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support.
So check the related MSRs access right once at initialization time to avoid the 
error access at runtime.

For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y 
(for host kernel).
And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel).


I'm not sure this is a useful patch.

This is #GP'ing just because of a limitation in the PMU; just compile the
kernel with CONFIG_PARAVIRT


How's that going to help? If you run kvm -host the VM is lying through
its teeth is the kernel is going to assume all those MSRs present,
PARAVIRT isn't going to help with this.


, or split the rdmsr is always rdmsr_safe
behavior out of CONFIG_PARAVIRT and into a new Kconfig symbol.


That's not useful either, because non of these code-paths are going to
check the return value.


Hmmm, I thought rdmsr_safe was going to return zero, but it just returns 
whatever happened to be in edx:eax which maybe should also be fixed.


Kan Liang, what happens if CONFIG_PARAVIRT=y?  Do you get garbage or 
just no events reported?



In fact there's no reason why LBR cannot be virtualized (though it does need
support from the processor), and it may even be possible to support
OFFCORE_RSP_X in the KVM virtual PMU.


But its not, so something needs to be done, right?


A first thing that could be done, is to have a way for the kernel to 
detect absence of LBR, for example an all-1s setting of the LBR format 
field of IA32_PERF_CAPABILITIES.  If Intel can tell us all 1s will 
never be used, we can have KVM set the field that way.  The kernel then 
should disable LBR.


Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Peter Zijlstra
On Mon, Jul 14, 2014 at 02:40:33PM +0200, Paolo Bonzini wrote:
 Hmmm, I thought rdmsr_safe was going to return zero, but it just returns
 whatever happened to be in edx:eax which maybe should also be fixed.
 
 Kan Liang, what happens if CONFIG_PARAVIRT=y?  Do you get garbage or just no
 events reported?

Either way; its not functioning according to 'spec'. Therefore we should
'disable' things.

 In fact there's no reason why LBR cannot be virtualized (though it does need
 support from the processor), and it may even be possible to support
 OFFCORE_RSP_X in the KVM virtual PMU.
 
 But its not, so something needs to be done, right?
 
 A first thing that could be done, is to have a way for the kernel to detect
 absence of LBR

Which is exactly what this patch does, no?

 , for example an all-1s setting of the LBR format field of
 IA32_PERF_CAPABILITIES.  If Intel can tell us all 1s will never be used,
 we can have KVM set the field that way.  The kernel then should disable LBR.

So what's wrong with testing if the MSRs that provide LBR actually work
or not? That doesn't require any magic co-operation of the host/vm
machinery and is therefore far more robust.


pgpBAg5uoyJMK.pgp
Description: PGP signature


RE: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Liang, Kan


  For reproducing the issue, please build the kernel with
 CONFIG_KVM_INTEL = y (for host kernel).
  And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest
 kernel).
 
  I'm not sure this is a useful patch.
 
  This is #GP'ing just because of a limitation in the PMU; just compile
  the kernel with CONFIG_PARAVIRT
 
  How's that going to help? If you run kvm -host the VM is lying through
  its teeth is the kernel is going to assume all those MSRs present,
  PARAVIRT isn't going to help with this.
 
  , or split the rdmsr is always rdmsr_safe
  behavior out of CONFIG_PARAVIRT and into a new Kconfig symbol.
 
  That's not useful either, because non of these code-paths are going to
  check the return value.
 
 Hmmm, I thought rdmsr_safe was going to return zero, but it just returns
 whatever happened to be in edx:eax which maybe should also be fixed.
 
 Kan Liang, what happens if CONFIG_PARAVIRT=y?  Do you get garbage or
 just no events reported?
 

Guest rdmsr/wrmsr will eventually call rdmsr_safe/wrmsr_safe. They will handle 
the #GP. So there is no error report in guest.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Paolo Bonzini

Il 14/07/2014 15:36, Liang, Kan ha scritto:

 Kan Liang, what happens if CONFIG_PARAVIRT=y?  Do you get garbage or
 just no events reported?


Guest rdmsr/wrmsr will eventually call rdmsr_safe/wrmsr_safe. They will handle 
the #GP. So there is no error report in guest.


Yeah, but what's the effect on the output of perf?

Paolo

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Liang, Kan


 -Original Message-
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 Sent: Monday, July 14, 2014 9:40 AM
 To: Liang, Kan; Peter Zijlstra
 Cc: a...@firstfloor.org; linux-ker...@vger.kernel.org; kvm@vger.kernel.org
 Subject: Re: [PATCH V5 1/2] perf ignore LBR and extra_regs
 
 Il 14/07/2014 15:36, Liang, Kan ha scritto:
   Kan Liang, what happens if CONFIG_PARAVIRT=y?  Do you get garbage
   or just no events reported?
  
  Guest rdmsr/wrmsr will eventually call rdmsr_safe/wrmsr_safe. They will
 handle the #GP. So there is no error report in guest.
 
 Yeah, but what's the effect on the output of perf?

They can be still counted/sampled, but the results are garbage.

Kan

 
 Paolo

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Paolo Bonzini

Il 14/07/2014 14:48, Peter Zijlstra ha scritto:

In fact there's no reason why LBR cannot be virtualized (though it does need
support from the processor), and it may even be possible to support
OFFCORE_RSP_X in the KVM virtual PMU.


But its not, so something needs to be done, right?


A first thing that could be done, is to have a way for the kernel to detect
absence of LBR


Which is exactly what this patch does, no?


A documented (and recommended) way for the kernel to detect it is 
superior though...



, for example an all-1s setting of the LBR format field of
IA32_PERF_CAPABILITIES.  If Intel can tell us all 1s will never be used,
we can have KVM set the field that way.  The kernel then should disable LBR.


So what's wrong with testing if the MSRs that provide LBR actually work
or not? That doesn't require any magic co-operation of the host/vm
machinery and is therefore far more robust.


... because the difference is that whenever we hack the kernel, Windows 
and vTune will remain broken forever.  Whenever we get Intel to make the 
hack part of the spec, there is some hope that Windows and vTune will 
eventually get fixed.


The hack certainly works, I'm not disputing that.

Paolo

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Liang, Kan


  diff --git a/arch/x86/kernel/cpu/perf_event.h
  b/arch/x86/kernel/cpu/perf_event.h
  index 3b2f9bd..992c678 100644
  --- a/arch/x86/kernel/cpu/perf_event.h
  +++ b/arch/x86/kernel/cpu/perf_event.h
  @@ -464,6 +464,12 @@ struct x86_pmu {
   */
  struct extra_reg *extra_regs;
  unsigned int er_flags;
  +   /*
  +* EXTRA REG MSR can be accessed
  +* The extra registers are completely unrelated to each other.
  +* So it needs a flag for each extra register.
  +*/
  +   boolextra_msr_access[EXTRA_REG_MAX];
 
 So why not in struct extra_reg again? You didn't give a straight answer there.

I think I did in the email.
You mentioned that there's still (only) 4 empty bytes at the tail of extra_reg 
itself. 
However, the extra_reg_type may be extended in the near future.
So that may not be a reason to move to extra_reg.

Furthermore, if we move extra_msr_access to extra_reg, 
I guess we have to modify all the related micros (i.e EVENT_EXTRA_REG) to 
initialize the new items.
That could be a big change.

On the other side, in x86_pmu structure, there are extra_regs related items 
defined under the comments Extra registers for events.
 And the bit holes are enough for current usage and future extension.

So I guess  x86_pmu should be a good place to store the availability of the reg.

 /* --- cacheline 6 boundary (384 bytes) --- */
 bool   lbr_double_abort; /*   384 1 */
 
 /* XXX 7 bytes hole, try to pack */
 
 struct extra_reg * extra_regs;   /*   392 8 */
 unsigned int   er_flags; /*   400 4 */
 
 /* XXX 4 bytes hole, try to pack */
 
 struct perf_guest_switch_msr * (*guest_get_msrs)(int *); /*   408 
8 */
 
 /* size: 416, cachelines: 7, members: 64 */
 /* sum members: 391, holes: 6, sum holes: 25 */
 /* bit holes: 1, sum bit holes: 27 bits */
 /* last cacheline: 32 bytes */

 
  +/*
  + * Under certain circumstances, access certain MSR may cause #GP.
  + * The function tests if the input MSR can be safely accessed.
  + */
  +static inline bool check_msr(unsigned long msr) {
 
 This reads like a generic function;
 
  +   u64 val_old, val_new, val_tmp;
  +
  +   /*
  +* Read the current value, change it and read it back to see if it
  +* matches, this is needed to detect certain hardware emulators
  +* (qemu/kvm) that don't trap on the MSR access and always return
 0s.
  +*/
  +   if (rdmsrl_safe(msr, val_old))
  +   goto msr_fail;
  +   /*
  +* Only chagne it slightly,
  +* since the higher bits of some MSRs cannot be updated by wrmsrl.
  +* E.g. MSR_LBR_TOS
  +*/
  +   val_tmp = val_old ^ 0x3UL;
 
 but this is not generally true; not all MSRs can write the 2 LSB, can they? 
 One
 option would be to extend the function with a u64 mask.

Right, the function should be easily used to check all MSRs, not just for the 
MSRs I tested.
I will pass a mask as a parameter of the function.  

 
  +   if (wrmsrl_safe(msr, val_tmp) ||
  +   rdmsrl_safe(msr, val_new))
  +   goto msr_fail;
  +
  +   if (val_new != val_tmp)
  +   goto msr_fail;
  +
  +   /* Here it's sure that the MSR can be safely accessed.
  +* Restore the old value and return.
  +*/
  +   wrmsrl(msr, val_old);
  +
  +   return true;
  +
  +msr_fail:
  +   return false;
  +}
 
 Also, by now this function is far too large to be inline and in a header.

OK. I will move it to perf_event_intel.c as a static function.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 1/2] perf ignore LBR and extra_regs

2014-07-14 Thread Peter Zijlstra

so once more; and then I'm going to route your emails to /dev/null, wrap
text at 78 chars.

On Mon, Jul 14, 2014 at 02:28:36PM +, Liang, Kan wrote:
   +++ b/arch/x86/kernel/cpu/perf_event.h
   @@ -464,6 +464,12 @@ struct x86_pmu {
  */
 struct extra_reg *extra_regs;
 unsigned int er_flags;
   + /*
   +  * EXTRA REG MSR can be accessed
   +  * The extra registers are completely unrelated to each other.
   +  * So it needs a flag for each extra register.
   +  */
   + boolextra_msr_access[EXTRA_REG_MAX];
  
  So why not in struct extra_reg again? You didn't give a straight answer 
  there.
 
 I think I did in the email.
 You mentioned that there's still (only) 4 empty bytes at the tail of
 extra_reg itself.  However, the extra_reg_type may be extended in the
 near future.  So that may not be a reason to move to extra_reg.

Well, you can always grow. Also be explicit, 'may be' is an empty
statement.

 Furthermore, if we move extra_msr_access to extra_reg, I guess we have
 to modify all the related micros (i.e EVENT_EXTRA_REG) to initialize
 the new items.  That could be a big change.

Nah, trivial stuff :-)

 On the other side, in x86_pmu structure, there are extra_regs related
 items defined under the comments Extra registers for events.  And
 the bit holes are enough for current usage and future extension.
 
 So I guess  x86_pmu should be a good place to store the availability
 of the reg.

It just doesn't make sense to me to have multiple arrays of the same
thing.



pgpllkEQRnDii.pgp
Description: PGP signature