RE: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

2020-06-19 Thread Kang, Luwei
> > > > Suppose your KVM thing claims counter 0/2 (ICL/SKL) for some
> > > > random PEBS event, and then the host wants to use PREC_DIST.. Then
> > > > one of them will be screwed for no reason whatsoever.
> > > >
> > >
> > > The multiplexing should be triggered.
> > >
> > > For the host, if both user A and user B require PREC_DIST,
> > > multiplexing should be triggered for them.
> > > Now, user B is KVM. I don't think there is any difference;
> > > multiplexing should still be triggered. Why is it screwed?
> >
> > Because if KVM isn't PREC_DIST we should be able to reschedule it to a
> > different counter.
> >
> > > > How is that not destroying scheduling freedom? Any other situation
> > > > we'd have moved the !PREC_DIST PEBS event to another counter.
> > > >
> > >
> > > All counters are equivalent for them. It doesn't matter if we move
> > > it to another counter. There is no impact for the user.
> >
> > But we cannot move it to another counter, because you're pinning it.
> 
> Hi Peter,
> 
> To avoid pinning counters, I have done some evaluation of patching the
> PEBS records for the guest in KVM. In this approach, guest PEBS PMI
> handler latency increased by about ~30% (e.g. perf record -e
> branch-loads:p -c 1000 ~/Tools/br_instr a).
> 
> Some implementation details are as below:
> 1. Patching the guest PEBS records' "Applicable Counters" field when the
>  counter required by the guest is not the same as the host's. The guest
>  PEBS driver will drop a PEBS record if its "Applicable Counters" field
>  does not match the required counter index.
> 2. Trapping the guest driver's behavior (VM-exit) of disabling PEBS.
>  This happens before reading PEBS records (e.g. in the PEBS PMI handler,
>  before application exit and so on).
> 3. To patch the guest PEBS records in KVM, we need to get the HPA of the
>  guest PEBS buffer.
>  <1> Trap the guest write of the IA32_DS_AREA register and get the GVA
>  of the guest DS_AREA.
>  <2> Translate the DS AREA GVA to a GPA (kvm_mmu_gva_to_gpa_read)
>  and get the GVA of the guest PEBS buffer from the DS AREA
>  (kvm_vcpu_read_guest_atomic).
>  <3> Although we have got the GVA of the PEBS buffer, we need to do the
>  address translation (GVA->GPA->HPA) for each page, because we can't
>  assume the GPAs of the guest PEBS buffer are always contiguous.
> 
> But we met another issue with the PEBS counter reset field in the DS AREA.
> pebs_event_reset in the DS area has to be set for auto reload, and it is per
> counter.
> Guest and host may use different counters. Let's say the guest wants to use
> counter 0, but the host assigns counter 1 to the guest. The guest sets the
> reset value in pebs_event_reset[0]. However, since counter 1 is the one that
> is eventually scheduled, HW will use pebs_event_reset[1] as the reset value.
> 
> We can't copy the value of the guest's pebs_event_reset[0] to
> pebs_event_reset[1] directly (patching the DS AREA) because the guest driver
> may be confused, and we can't assume guest counters 0 and 1 are never used by
> the same PEBS task at the same time. What's more, KVM can't be aware of guest
> reads/writes to the DS AREA because it is just general memory to the guest.
> 
> What is your opinion or do you have a better proposal?

Kindly ping~

Thanks,
Luwei Kang

> 
> Thanks,
> Luwei Kang
> 
> >
> > > In the new proposal, the KVM user is treated the same as other host
> > > events with event constraints. The scheduler is free to choose
> > > whether or not to assign a counter for it.
> >
> > That's what it does, I understand that. I'm saying that that is
> > creating artificial contention.
> >
> >
> > Why is this needed anyway? Can't we force the guest to flush and then
> > move it over to a new counter?


RE: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

2020-06-11 Thread Kang, Luwei
> > > Suppose your KVM thing claims counter 0/2 (ICL/SKL) for some random
> > > PEBS event, and then the host wants to use PREC_DIST.. Then one of
> > > them will be screwed for no reason whatsoever.
> > >
> >
> > The multiplexing should be triggered.
> >
> > For the host, if both user A and user B require PREC_DIST,
> > multiplexing should be triggered for them.
> > Now, user B is KVM. I don't think there is any difference;
> > multiplexing should still be triggered. Why is it screwed?
> 
> Because if KVM isn't PREC_DIST we should be able to reschedule it to a
> different counter.
> 
> > > How is that not destroying scheduling freedom? Any other situation
> > > we'd have moved the !PREC_DIST PEBS event to another counter.
> > >
> >
> > All counters are equivalent for them. It doesn't matter if we move it
> > to another counter. There is no impact for the user.
> 
> But we cannot move it to another counter, because you're pinning it.

Hi Peter,

To avoid pinning counters, I have done some evaluation of patching the
PEBS records for the guest in KVM. In this approach, guest PEBS PMI
handler latency increased by about ~30% (e.g. perf record -e
branch-loads:p -c 1000 ~/Tools/br_instr a).

Some implementation details are as below:
1. Patching the guest PEBS records' "Applicable Counters" field when the
 counter required by the guest is not the same as the host's. The guest
 PEBS driver will drop a PEBS record if its "Applicable Counters" field
 does not match the required counter index.
2. Trapping the guest driver's behavior (VM-exit) of disabling PEBS.
 This happens before reading PEBS records (e.g. in the PEBS PMI handler,
 before application exit and so on).
3. To patch the guest PEBS records in KVM, we need to get the HPA of the
 guest PEBS buffer.
 <1> Trap the guest write of the IA32_DS_AREA register and get the GVA
 of the guest DS_AREA.
 <2> Translate the DS AREA GVA to a GPA (kvm_mmu_gva_to_gpa_read)
 and get the GVA of the guest PEBS buffer from the DS AREA
 (kvm_vcpu_read_guest_atomic).
 <3> Although we have got the GVA of the PEBS buffer, we need to do the
 address translation (GVA->GPA->HPA) for each page, because we can't
 assume the GPAs of the guest PEBS buffer are always contiguous.
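The per-page walk in step <3> can be sketched as below. This is a toy model, not KVM code: gva_to_gpa() and gpa_to_hpa() are hypothetical stand-ins for kvm_mmu_gva_to_gpa_read() and the memslot lookup, and all addresses are made up, but it shows why a single translation of the buffer's base GVA is not enough when the GPAs are not contiguous.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096ULL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Toy translation tables standing in for the guest page-table walk and
 * the EPT/memslot lookup.  The GPAs of two adjacent guest-virtual pages
 * are deliberately NOT contiguous. */
static uint64_t gva_to_gpa(uint64_t gva)
{
	switch (gva & PAGE_MASK) {
	case 0x7f0000000000ULL: return 0x10000ULL | (gva & ~PAGE_MASK);
	case 0x7f0000001000ULL: return 0x90000ULL | (gva & ~PAGE_MASK); /* jump */
	default: return (uint64_t)-1;
	}
}

static uint64_t gpa_to_hpa(uint64_t gpa)
{
	if (gpa == (uint64_t)-1)
		return (uint64_t)-1;
	return 0x400000000ULL + gpa; /* toy host mapping for the demo */
}

/* Translate one page of the PEBS buffer at a time; translating only the
 * base GVA would mis-address every later page of the buffer. */
static int translate_pebs_buffer(uint64_t gva, size_t len,
				 uint64_t *hpa_out, size_t max_pages)
{
	size_t i = 0;
	uint64_t end = gva + len;

	while (gva < end && i < max_pages) {
		uint64_t hpa = gpa_to_hpa(gva_to_gpa(gva));

		if (hpa == (uint64_t)-1)
			return -1;
		hpa_out[i++] = hpa;
		gva = (gva & PAGE_MASK) + PAGE_SIZE; /* next guest page */
	}
	return (int)i;
}
```

The test below checks that two virtually adjacent pages land on non-adjacent host pages, which is exactly the situation that forces the per-page loop.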

But we met another issue with the PEBS counter reset field in the DS AREA.
pebs_event_reset in the DS area has to be set for auto reload, and it is per
counter. Guest and host may use different counters. Let's say the guest wants
to use counter 0, but the host assigns counter 1 to the guest. The guest sets
the reset value in pebs_event_reset[0]. However, since counter 1 is the one
that is eventually scheduled, HW will use pebs_event_reset[1] as the reset
value.

We can't copy the value of the guest's pebs_event_reset[0] to
pebs_event_reset[1] directly (patching the DS AREA) because the guest driver
may be confused, and we can't assume guest counters 0 and 1 are never used by
the same PEBS task at the same time. What's more, KVM can't be aware of guest
reads/writes to the DS AREA because it is just general memory to the guest.
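The counter-index mismatch can be condensed into a toy model (all names below are hypothetical, not kernel structures): the guest writes its reset value into the DS-area slot for the counter it thinks it owns, while hardware reloads from the slot of the counter that was actually scheduled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the DS-area reset array and the hardware auto-reload. */
struct toy_ds_area {
	uint64_t pebs_event_reset[8]; /* indexed by *hardware* counter */
};

struct toy_counter {
	uint64_t value;
};

/* The guest thinks it owns counter guest_idx and writes its reset there. */
static void guest_program_reset(struct toy_ds_area *ds, int guest_idx,
				uint64_t reset)
{
	ds->pebs_event_reset[guest_idx] = reset;
}

/* Hardware reloads the counter that is actually scheduled (hw_idx). */
static void hw_pebs_autoreload(const struct toy_ds_area *ds,
			       struct toy_counter *ctr, int hw_idx)
{
	ctr->value = ds->pebs_event_reset[hw_idx];
}
```

With guest counter 0 mapped to host counter 1, the reload picks up a stale (zero) value, which is the failure mode described above.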

What is your opinion or do you have a better proposal?

Thanks,
Luwei Kang

> 
> > In the new proposal, the KVM user is treated the same as other host
> > events with event constraints. The scheduler is free to choose whether
> > or not to assign a counter for it.
> 
> That's what it does, I understand that. I'm saying that that is creating
> artificial contention.
> 
> 
> Why is this needed anyway? Can't we force the guest to flush and then move it
> over to a new counter?


RE: [RFC v1 3/9] KVM: x86: Implement MSR_IA32_PEBS_ENABLE read/write emulation

2019-08-29 Thread Kang, Luwei
> > +   case MSR_IA32_PEBS_ENABLE:
> > +   if (pmu->pebs_enable == data)
> > +   return 0;
> > +   if (!(data & pmu->pebs_enable_mask) &&
> > +   (data & MSR_IA32_PEBS_OUTPUT_MASK) ==
> > +   MSR_IA32_PEBS_OUTPUT_PT) {
> > +   pebs_enable_changed(pmu, data);
> > +   return 0;
> > +   }
> 
> Need #GP for bad values

Yes, this function will return 1 if neither of the above two conditions is 
true, and a #GP will be injected into the guest.
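The accept-or-#GP convention can be sketched as a standalone function. The mask constants below are placeholders chosen only for illustration (the real values come from the CPUID-derived pmu state and the SDM), and the function name is hypothetical; it mirrors the control flow of the quoted vmx_set_msr() hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration only. */
#define PEBS_OUTPUT_MASK (3ULL << 61)  /* toy PEBS output-format field */
#define PEBS_OUTPUT_PT   (1ULL << 61)  /* toy "output to Intel PT" value */
#define PEBS_ENABLE_RSVD 0xFFFFFFF0ULL /* toy reserved-bit mask */

/* Returns 0 when the write is accepted, 1 to tell the caller to inject
 * #GP into the guest -- the same convention as KVM's MSR handlers. */
static int set_pebs_enable(uint64_t *pebs_enable, uint64_t data)
{
	if (*pebs_enable == data)
		return 0;
	if (!(data & PEBS_ENABLE_RSVD) &&
	    (data & PEBS_OUTPUT_MASK) == PEBS_OUTPUT_PT) {
		*pebs_enable = data; /* accepted: commit the new value */
		return 0;
	}
	return 1; /* bad value: caller injects #GP */
}
```

A write with reserved bits set, or one that selects anything other than the PT output mode, falls through to the `return 1` path.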

Thanks,
Luwei Kang



RE: [RFC v1 1/9] KVM: x86: Add base address parameter for get_fixed_pmc function

2019-08-29 Thread Kang, Luwei
> >  /* returns fixed PMC with the specified MSR */
> > -static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
> > +static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr,
> > +   int base)
> 
> Better define a __get_fixed_pmc just for this case, with the existing
> get_fixed_pmc being a wrapper.
> 
> This would avoid changing all the callers below.

Do you mean change the code like this, and call "__get_fixed_pmc" in my patch?
We already have a similar function to get the gp counters:
struct kvm_pmc *get_gp_pmc(struct kvm_pmu *pmu, u32 msr, u32 base)  // get gp counters
struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)         // get fixed counters

-/* returns fixed PMC with the specified MSR */
-static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
+static inline struct kvm_pmc *__get_fixed_pmc(struct kvm_pmu *pmu, u32 msr,
+   u32 base)
 {
-   int base = MSR_CORE_PERF_FIXED_CTR0;
-
    if (msr >= base && msr < base + pmu->nr_arch_fixed_counters)
        return &pmu->fixed_counters[msr - base];

    return NULL;
 }

+/* returns fixed PMC with the specified MSR */
+static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
+{
+   return __get_fixed_pmc(pmu, msr, MSR_CORE_PERF_FIXED_CTR0);
+}

Thanks,
Luwei Kang

> 
> 
> -Andi


RE: [RFC v1 1/9] KVM: x86: Add base address parameter for get_fixed_pmc function

2019-08-29 Thread Kang, Luwei
> > PEBS output to Intel PT introduces some new MSRs (MSR_RELOAD_FIXED_CTRx)
> > for the fixed function counters, which are used to auto-reload the preset
> > value after writing out a PEBS record.
> >
> > Introduce a base MSR address parameter so that this function can get the
> > performance monitor counter structure by the MSR_RELOAD_FIXED_CTRx
> > registers.
> >
> > Signed-off-by: Luwei Kang 
> > ---
> >  arch/x86/kvm/pmu.h   |  5 ++---
> >  arch/x86/kvm/vmx/pmu_intel.c | 14 +-
> >  2 files changed, 11 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index
> > 58265f7..c62a1ff 100644
> > --- a/arch/x86/kvm/pmu.h
> > +++ b/arch/x86/kvm/pmu.h
> > @@ -93,10 +93,9 @@ static inline struct kvm_pmc *get_gp_pmc(struct
> > kvm_pmu *pmu, u32 msr,  }
> >
> >  /* returns fixed PMC with the specified MSR */
> > -static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
> > +static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr,
> > +   int base)
> >  {
> > -   int base = MSR_CORE_PERF_FIXED_CTR0;
> > -
> > if (msr >= base && msr < base + pmu->nr_arch_fixed_counters)
> > return &pmu->fixed_counters[msr - base];
> 
> IIUC, these new MSRs aren't new fixed PMCs, but are values to be reloaded
> into the existing fixed PMCs when a PEBS event has been written. This change
> makes it look like you are introducing an additional set of fixed PMCs.

Yes, you are right. They are not new fixed counters.
Each fixed/general-purpose counter has a "kvm_pmc" structure in KVM. We
already have a function to get a general-purpose counter by either its
counter MSR or its event selector, as below:
pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)   // by gp counter
pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0)     // by gp event selector
So I extended the get_fixed_pmc() function to support getting fixed counters
by MSR_RELOAD_FIXED_CTRx. Actually, they all get the "kvm_pmc" structure by
an offset from a base MSR.
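The lookup-by-offset pattern described above can be sketched as follows; the MSR numbers and struct names are toy placeholders, not the real KVM definitions, but the mechanism (same counter array reachable from more than one MSR base) is the one being discussed.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical MSR bases for the demo. */
#define TOY_FIXED_CTR0        0x309
#define TOY_RELOAD_FIXED_CTR0 0x1309
#define NR_FIXED 3

struct toy_pmc { int idx; };

static struct toy_pmc fixed_counters[NR_FIXED] = { {0}, {1}, {2} };

/* One lookup routine serves both MSR ranges: the counter structure is
 * found by subtracting whichever base the caller passes in. */
static struct toy_pmc *get_fixed_pmc_base(uint32_t msr, uint32_t base)
{
	if (msr >= base && msr < base + NR_FIXED)
		return &fixed_counters[msr - base];
	return NULL;
}
```

Both `TOY_FIXED_CTR0 + n` and `TOY_RELOAD_FIXED_CTR0 + n` resolve to the same nth counter, which is why extending get_fixed_pmc() with a base parameter (or a `__get_fixed_pmc` helper) is enough to cover the reload MSRs.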

Thanks,
Luwei Kang





RE: [RFC v1 0/9] PEBS enabling in KVM guest

2019-08-29 Thread Kang, Luwei
> On Thu, Aug 29, 2019 at 01:34:00PM +0800, Luwei Kang wrote:
> > Intel's new hardware introduces some Precise Event-Based Sampling (PEBS)
> > extensions that output the PEBS record to the Intel PT stream instead of
> > the DS area. The PEBS record is packaged in a specific format when
> > outputting to Intel PT.
> >
> > This patch set enables PEBS functionality in the KVM guest by PEBS
> > output to Intel PT. The native driver is at [1] (still under review).
> >
> > [1] https://www.spinics.net/lists/kernel/msg3215354.html
> 
> Please use:
> 
>   https://lkml.kernel.org/r/$MSGID
> 
> then I don't have to touch a browser but can find the email in my MUA.

Thanks. The link to the native driver should be:
https://lkml.kernel.org/r/20190806084606.4021-1-alexander.shish...@linux.intel.com

Luwei Kang


RE: [PATCH] KVM: LAPIC: Do not mask the local interrupts when LAPIC is sw disabled

2019-05-31 Thread Kang, Luwei



> -Original Message-
> From: Christopherson, Sean J
> Sent: Friday, May 31, 2019 2:46 AM
> To: Kang, Luwei 
> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org; pbonz...@redhat.com; 
> rkrc...@redhat.com; t...@linutronix.de;
> mi...@redhat.com; b...@alien8.de; h...@zytor.com; x...@kernel.org
> Subject: Re: [PATCH] KVM: LAPIC: Do not mask the local interrupts when LAPIC 
> is sw disabled
> 
> On Tue, May 21, 2019 at 06:44:15PM +0800, Luwei Kang wrote:
> > The current code will mask all the local interrupts in the local
> > vector table when the LAPIC is disabled by the SVR (Spurious-Interrupt
> > Vector Register) "APIC Software Enable/Disable" flag (bit 8).
> > This may block local interrupts from being delivered to the target vCPU
> > even if the LAPIC is enabled by setting SVR (bit 8 == 1) afterward.
> 
> The current code aligns with the SDM, which states:
> 
>   Local APIC State After It Has Been Software Disabled
> 
>   When the APIC software enable/disable flag in the spurious interrupt
>   vector register has been explicitly cleared (as opposed to being cleared
>   during a power up or reset), the local APIC is temporarily disabled.
>   The operation and response of a local APIC while in this software-
>   disabled state is as follows:
> 
> - The mask bits for all the LVT entries are set. Attempts to reset
>   these bits will be ignored.

Thanks for Sean's reminder.
I made this patch because I found the PMI from Intel PT can't be injected 
into the target vCPU when there are multiple vCPUs in the guest and the Intel 
PT interrupt happens on a vCPU other than the first one (i.e. not vCPU0). The 
interrupt is blocked in the kvm_apic_local_deliver() function and can't pass 
the APIC_LVT_MASKED flag check (LVTPC stays masked from start to end). During 
VM boot-up the KVM guest enables the LVTPC while the LAPIC is software-disabled 
and enables the LAPIC afterward, but LVTPC is still masked. The guest PT driver 
doesn't enable LVTPC before enabling PT either, whereas the guest performance 
monitor counter driver enables LVTPC every time before using the PMU. I will do 
more checking on this. Thank you.
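The SDM behavior quoted above can be modeled in a few lines of toy code (hypothetical names, not KVM's LAPIC emulation): software-disabling via SVR sets every LVT mask bit, and re-enabling the APIC does not clear them, so LVTPC delivery stays blocked until someone explicitly unmasks it.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_SVR_ENABLE (1u << 8)
#define LVT_MASKED      (1u << 16)
#define NR_LVT 6

struct toy_lapic {
	uint32_t svr;
	uint32_t lvt[NR_LVT];
};

static void lapic_set_svr(struct toy_lapic *a, uint32_t val)
{
	/* SDM: on the enabled -> software-disabled transition, the mask
	 * bits of all LVT entries are set. */
	if (!(val & APIC_SVR_ENABLE) && (a->svr & APIC_SVR_ENABLE)) {
		for (int i = 0; i < NR_LVT; i++)
			a->lvt[i] |= LVT_MASKED;
	}
	a->svr = val; /* re-enabling clears nothing in the LVT */
}

static int lvt_delivery_blocked(const struct toy_lapic *a, int lvt)
{
	return !!(a->lvt[lvt] & LVT_MASKED);
}
```

This reproduces the symptom: an LVT entry unmasked while the LAPIC was software-disabled (or never re-unmasked after re-enable) keeps blocking delivery, which is why the guest PMU driver re-enabling LVTPC before each use avoids the problem.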

Luwei Kang



RE: [PATCH v3 7/8] KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.

2019-03-10 Thread Kang, Luwei
> > For guest XSS, right now only bit 11 (user states) and bit 12
> > (supervisor states) are supported; if other bits are to be set, the
> > KVM_SUPPORTED_XSS macro needs to be modified to support them.
> >
> > Signed-off-by: Zhang Yi Z 
> > Signed-off-by: Yang Weijiang 
> > ---
> >  arch/x86/kvm/vmx.c | 8 +---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> > d32cee9ee079..68908ed7b151 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -47,6 +47,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -4336,12 +4337,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, 
> > struct msr_data *msr_info)
> > case MSR_IA32_XSS:
> > if (!vmx_xsaves_supported())
> > return 1;
> > +
> > /*
> > -* The only supported bit as of Skylake is bit 8, but
> > -* it is not supported on KVM.
> > +* Check bits being set are supported in KVM.
> >  */
> > -   if (data != 0)
> > +   if (data & ~kvm_supported_xss())
> > return 1;
> > +
> > vcpu->arch.ia32_xss = data;
> > if (vcpu->arch.ia32_xss != host_xss)
> > add_atomic_switch_msr(vmx, MSR_IA32_XSS,
> >
> 
> Luwei, can you work with Weijiang to add XSAVES support for Processor Tracing?

Hi Paolo,
As we talked about some time ago, I will do that.

Thanks,
Luwei Kang

> 
> Paolo


RE: [PATCH V3] KVM: x86: Sync the pending Posted-Interrupts

2019-01-31 Thread Kang, Luwei
> > Some Posted-Interrupts from passthrough devices may be lost or
> > overwritten when the vCPU is in runnable state.
> >
> > The SN (Suppress Notification) bit of the PID (Posted Interrupt Descriptor)
> > will be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE
> > state but not running on a physical CPU). If a posted interrupt comes
> > at this time, the irq remapping facility will set the bit of PIR
> > (Posted Interrupt Requests) without ON (Outstanding Notification).
> > So this interrupt can't be synced to the APIC virtualization register and
> > will not be handled by the Guest because ON is zero.
> >
> > Signed-off-by: Luwei Kang 
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 5 +
> >  arch/x86/kvm/x86.c | 2 +-
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > 4341175..8ed9634 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1221,6 +1221,11 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu
> *vcpu, int cpu)
> > new.sn = 0;
> > } while (cmpxchg64(&pi_desc->control, old.control,
> >new.control) != old.control);
> > +
> 
>   /*
>* Clear SN before reading the bitmap.  The VT-d firmware
>* writes the bitmap and reads SN atomically (5.2.3 in the
>* spec), so it doesn't really have a memory barrier that
>* pairs with this, but we cannot do that and we need one.
>*/
> 
> > +   smp_mb__after_atomic();
> > +
> > +   if (!bitmap_empty((unsigned long *)pi_desc->pir, NR_VECTORS))
> > +   pi_test_and_set_on(pi_desc);
> 
> You can add pi_set_on for use here.  The fast path with pi_clear_sn should
> also be removed.


Do you mean remove the below code in the vmx_vcpu_pi_load() function so that 
ON can be set if PIR is not zero?

--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1192,21 +1192,6 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int 
cpu)
if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu)
return;

-   /*
-* First handle the simple case where no cmpxchg is necessary; just
-* allow posting non-urgent interrupts.
-*
-* If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change
-* PI.NDST: pi_post_block will do it for us and the wakeup_handler
-* expects the VCPU to be on the blocked_vcpu_list that matches
-* PI.NDST.
-*/
-   if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR ||
-   vcpu->cpu == cpu) {
-   pi_clear_sn(pi_desc);
-   return;
-   }

Thanks,
Luwei Kang


RE: [PATCH v2] KVM: x86: Sync the pending Posted-Interrupts

2019-01-30 Thread Kang, Luwei
> > >> This is not what I asked.  You should instead do the check after 
> > >> pi_clear_sn.
> > >>
> > >
> > > I think the SN has been cleared here before test the bitmap.
> > > The SN will be set when the vCPU is scheduled out. ID:
> > > 28b835d60fcc2498e717cf5e6f0c3691c24546f7
> > > But SN will be cleared when sched in.
> > >
> > > Another place is when the vCPU exits the vcpu_run() function:
> > >  kvm_arch_vcpu_ioctl_run()
> > >  vcpu_load(vcpu); -> kvm_arch_vcpu_load -> vmx_vcpu_load -> 
> > > vmx_vcpu_pi_load -> new.sn = 0;
> > >  vcpu_run(vcpu);
> > >  for(;;)
> > >  vcpu_put(vcpu); -> kvm_arch_vcpu_put -> vmx_vcpu_put ->
> > > vmx_vcpu_pi_put -> pi_set_sn() But SN will be cleared in vcpu_load()
> > > before back to vcpu_run()
> >
> > Yes, but you're changing the wrong path.  The patch is affecting _all_ 
> > vmentries, not just those after PID.SN has been cleared.
> >
> > As I mentioned in the previous email, KVM relies on the SDM's
> > invariant that PID.ON=1 whenever PID.PIR!=0.  Invariants are your 
> > best friend when dealing with complicated multi-processor
> code so I don't want to change that.
> >
> > It's the VT-d pi_clear_sn path that I want to be changed, because it's
> > VT-d and specifically SN that complicates the very simple definition in the 
> > SDM.  By modifying the pi_clear_sn path, you ensure the
> invariant is respected and everyone is happy.
> 
> Hi Paolo,
> How about like this:
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 
> 820a03b..dfc5e3d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1219,6 +1219,9 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int 
> cpu)
> new.ndst = (dest << 8) & 0xFF00;
> 
> new.sn = 0;
> +
> +   if (!bitmap_empty((unsigned long *)new.pir, NR_VECTORS))
> +   new.on = 1;

Sorry, should be:
+   if (!bitmap_empty((unsigned long *)pi_desc->pir, NR_VECTORS))

Luwei Kang

> } while (cmpxchg64(&pi_desc->control, old.control,
>new.control) != old.control);  }
> 
> Thanks,
> Luwei Kang


RE: [PATCH v2] KVM: x86: Sync the pending Posted-Interrupts

2019-01-30 Thread Kang, Luwei
> >> This is not what I asked.  You should instead do the check after 
> >> pi_clear_sn.
> >>
> >
> > I think the SN has been cleared here before test the bitmap.
> > The SN will be set when the vCPU is scheduled out. ID:
> > 28b835d60fcc2498e717cf5e6f0c3691c24546f7
> > But SN will be cleared when sched in.
> >
> > Another place is when the vCPU exits the vcpu_run() function:
> >  kvm_arch_vcpu_ioctl_run()
> >  vcpu_load(vcpu); -> kvm_arch_vcpu_load -> vmx_vcpu_load -> 
> > vmx_vcpu_pi_load -> new.sn = 0;
> >  vcpu_run(vcpu);
> >  for(;;)
> >  vcpu_put(vcpu); -> kvm_arch_vcpu_put -> vmx_vcpu_put ->
> > vmx_vcpu_pi_put -> pi_set_sn() But SN will be cleared in vcpu_load()
> > before back to vcpu_run()
> 
> Yes, but you're changing the wrong path.  The patch is affecting _all_ 
> vmentries, not just those after PID.SN has been cleared.
> 
> As I mentioned in the previous email, KVM relies on the SDM's invariant that 
> PID.ON=1 whenever PID.PIR!=0.  Invariants are your
> best friend when dealing with complicated multi-processor code so I don't 
> want to change that.
> 
> It's the VT-d pi_clear_sn path that I want to be changed, because it's VT-d 
> and specifically SN that complicates the very simple definition in
> the SDM.  By modifying the pi_clear_sn path, you ensure the invariant is 
> respected and everyone is happy.

Hi Paolo,
How about like this:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 820a03b..dfc5e3d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1219,6 +1219,9 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int 
cpu)
new.ndst = (dest << 8) & 0xFF00;

new.sn = 0;
+
+   if (!bitmap_empty((unsigned long *)new.pir, NR_VECTORS))
+   new.on = 1;
} while (cmpxchg64(&pi_desc->control, old.control,
   new.control) != old.control);
 }

Thanks,
Luwei Kang


RE: [PATCH v2] KVM: x86: Sync the pending Posted-Interrupts

2019-01-30 Thread Kang, Luwei
> > Some Posted-Interrupts from passthrough devices may be lost or
> > overwritten when the vCPU is in runnable state.
> >
> > The SN (Suppress Notification) bit of the PID (Posted Interrupt Descriptor)
> > will be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE
> > state but not running on a physical CPU). If a posted interrupt comes
> > at this time, the irq remapping facility will set the bit of PIR
> > (Posted Interrupt Requests) without ON (Outstanding Notification).
> > So this interrupt can't be synced to the APIC virtualization register and
> > will not be handled by the Guest because ON is zero.
> >
> > Signed-off-by: Luwei Kang 
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 2 +-
> >  arch/x86/kvm/x86.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > f6915f1..820a03b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> > bool max_irr_updated;
> >
> > WARN_ON(!vcpu->arch.apicv_active);
> > -   if (pi_test_on(>pi_desc)) {
> > +   if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
> > pi_clear_on(>pi_desc);
> > /*
> >  * IOMMU can write to PIR.ON, so the barrier matters even on UP.
> 
> This is not what I asked.  You should instead do the check after pi_clear_sn.
> 

I think the SN has been cleared here before testing the bitmap.
The SN will be set when the vCPU is scheduled out. ID: 
28b835d60fcc2498e717cf5e6f0c3691c24546f7
But SN will be cleared when it is scheduled in.

Another place is when the vCPU exits the vcpu_run() function:
 kvm_arch_vcpu_ioctl_run()
 vcpu_load(vcpu); -> kvm_arch_vcpu_load -> vmx_vcpu_load -> 
vmx_vcpu_pi_load -> new.sn = 0;
 vcpu_run(vcpu);
 for(;;)
 vcpu_put(vcpu); -> kvm_arch_vcpu_put -> vmx_vcpu_put -> 
vmx_vcpu_pi_put -> pi_set_sn()
But SN will be cleared in vcpu_load() before going back to vcpu_run().

Thanks,
Luwei Kang



RE: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

2019-01-29 Thread Kang, Luwei
> >>> However, you should at least change the comment in vcpu_enter_guest
> >>> to mention "before reading PIR" instead of "before reading PIR.ON".
> >>
> >> Will do that. I think the "checking PIR.ON" should be PID.ON. I will fix 
> >> it.
> >>
> > Hi Paolo,
> > I reconsidered the comment in vcpu_enter_guest() and think it may not
> > need to change. The original comment is as below:
> >  * 2) For APICv, we should set ->mode before checking  PIR.ON.  This
> >  * pairs with the memory barrier implicit in pi_test_and_set_on
> >  * (see vmx_deliver_posted_interrupt).
> >
> > I think "before checking PIR.ON" means "before checking PID.PIR and
> > PID.ON".
> 
> I can say it only means PID.ON because I wrote the comment. :)
> 
> The idea is that checking ON is enough: KVM assumes that PID.PIR is only set 
> if PID.ON is set, because it follows the definition of ON in table
> 29-1 of the SDM: "If this bit is set, there is a notification outstanding for 
> one or more posted interrupts in bits 255:0".
> 
> VT-D breaks this assumption whenever SN=1 ("hardware does not generate 
> notification event nor modify the ON field"), resulting in
> nonzero PID.PIR but PID.ON=0.  I'm sure there was a reason for that, but it 
> does result in inconsistency between the PID definitions in the
> SDM and the VT-D specification.  The right fix is definitely to reconcile 
> this difference and test the bitmap after SN is cleared (with
> smp_mb__after_atomic after clearing SN), and set ON=1 if the bitmap is not 
> clear.
> 
Hi Paolo
Thanks for your elaboration, it is very clear to me. I will fix it in the 
next version.

Thanks,
Luwei Kang





RE: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

2019-01-29 Thread Kang, Luwei
> > However, you should at least change the comment in vcpu_enter_guest to
> > mention "before reading PIR" instead of "before reading PIR.ON".
> 
> Will do that. I think the "checking PIR.ON" should be PID.ON. I will fix it.
> 
Hi Paolo,
I reconsidered the comment in vcpu_enter_guest() and think it may not need 
to change. The original comment is as below:
 * 2) For APICv, we should set ->mode before checking  PIR.ON.  This
 * pairs with the memory barrier implicit in pi_test_and_set_on
 * (see vmx_deliver_posted_interrupt).

I think "before checking PIR.ON" means "before checking PID.PIR and 
PID.ON", because both flags are indeed checked in the 
vmx_deliver_posted_interrupt() function. If PID.ON is already set at the time 
hardware posts an interrupt to the PIR field, no notification event is 
generated (from VT-d spec 9.12).

Thanks,
Luwei Kang


RE: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

2019-01-28 Thread Kang, Luwei
> > Some Posted-Interrupts from passthrough devices may be lost or
> > overwritten when the vCPU is in runnable state.
> >
> > The SN (Suppress Notification) bit of the PID (Posted Interrupt Descriptor)
> > will be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE
> > state but not running on a physical CPU). If a posted interrupt comes
> > at this time, the irq remapping facility will set the bit of PIR
> > (Posted Interrupt Requests) but ON (Outstanding Notification).
> > So this interrupt can't be synced to the APIC virtualization register and
> > will not be handled by the Guest because ON is zero.
> >
> > Signed-off-by: Luwei Kang 
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > f6915f1..820a03b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> > bool max_irr_updated;
> >
> > WARN_ON(!vcpu->arch.apicv_active);
> > -   if (pi_test_on(>pi_desc)) {
> > +   if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
> > pi_clear_on(>pi_desc);
> > /*
> >  * IOMMU can write to PIR.ON, so the barrier matters even on UP.
> >
> 
> This is a very delicate path.  The bitmap check here is ordered after the 
> vcpu->mode write in vcpu_enter_guest, matching the check of
> vcpu->mode in vmx_deliver_posted_interrupt (which comes after a write of
> PIR.ON):
> 
>   sender  receiver
>   write PIR
>   write PIR.ONvcpu->mode = IN_GUEST_MODE
>   smp_mb()smp_mb()
>   read vcpu->mode sync_pir_to_irr
> read PIR.ON
> 
> What you did should work, since PIR is written after PIR.ON anyway.

Hi Paolo:
I think what you mentioned as PIR.ON here is PID.ON:
PID: Posted Interrupt Descriptor (a structure for PI which includes 512 bits)
ON:  Outstanding Notification (one bit of the PID)
PIR: Posted Interrupt Requests (256 bits in the PID, one per interrupt vector)

Before VT-d irq remapping, there were just PIR and ON in the PID. The posted 
interrupts introduced by APICv can be used to send IPIs. The working flow of 
sending a posted interrupt is shown in vmx_deliver_posted_interrupt():
1. Set the bit in PIR of the PID.
2. Set ON of the PID.
3. Send an IPI to the target vCPU with the notification vector 
(POSTED_INTR_VECTOR) if the target vCPU is running on a pCPU; there is no 
VM-exit and it is handled by the guest interrupt handler directly.
4. If the target vCPU is not running on a pCPU, invoke kvm_vcpu_kick() to 
wake up this vCPU, or send a RESCHEDULE_VECTOR IPI to the target pCPU to make 
the vCPU run as soon as possible.
5. Following 4, the vCPU prepares to run (vcpu_enter_guest) and syncs the 
posted interrupt if ON is set.

It looks like this works well. 
The VT-d irq remapping facility introduces SN, NV and NDST in the PID. These 
are used by the irq remapping facility, and the CPU doesn't care about these 
flags.
6. The SN bit will be set when the vCPU is preempted (runnable but not 
running).
 commit 28b835d60fcc2498e717cf5e6f0c3691c24546f7
  KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
7. If an interrupt comes at this moment, the irq remapping facility just sets 
PIR but does *not* set ON (VT-d spec 9.12).
So, here, the interrupt can't be synced to IRR because ON is 0.

I added some logging here and found some interrupts recorded in PIR while ON 
was zero. This will impact the performance of passthrough devices.
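The interaction in steps 6 and 7 can be condensed into a toy model (hypothetical types, not the real struct pi_desc): with SN=1 the IOMMU records the vector in PIR without setting ON, so a sync path that tests only ON misses the interrupt, while testing the PIR bitmap itself, as the patch does, catches it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy posted-interrupt descriptor: PIR bitmap plus ON/SN flags. */
struct toy_pid {
	uint64_t pir[4]; /* 256 interrupt vectors */
	int on;
	int sn;
};

/* VT-d spec 9.12: with SN=1 the remapping hardware records the vector
 * in PIR but generates no notification and does not set ON. */
static void iommu_post_irq(struct toy_pid *p, int vec)
{
	p->pir[vec / 64] |= 1ULL << (vec % 64);
	if (!p->sn)
		p->on = 1;
}

static int pir_empty(const struct toy_pid *p)
{
	return !(p->pir[0] | p->pir[1] | p->pir[2] | p->pir[3]);
}

/* The proposed check: sync when PIR is non-empty, not only when ON=1. */
static int should_sync_pir_to_irr(const struct toy_pid *p)
{
	return !pir_empty(p);
}
```

An interrupt posted while SN=1 leaves ON at 0, so an ON-only check would skip the sync and the vector would sit in PIR unhandled; the bitmap check does not.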

> However, you should at least change the comment in vcpu_enter_guest to 
> mention "before reading PIR" instead of "before reading
> PIR.ON".

Will do that. I think the "checking PIR.ON" should be PID.ON. I will fix it.

> 
> Alternatively, would it be possible to instead set ON when SN is cleared?  
> The clearing of SN is in pi_clear_sn, and you would have instead
> something like

SN is cleared when the corresponding vCPU is running on a pCPU. I think we 
can't set ON when SN is cleared, because the VT-d spec (9.12) says:
If ON is set at the time of hardware posting an interrupt to the PIR field, 
a notification event is not generated.

> 
>   WRITE_ONCE(*(u16 *)&pi_desc->on_sn, POSTED_INTR_ON);

We already have a function (pi_test_on) to check the POSTED_INTR_ON bit, so I 
think it is unnecessary.

Thanks,
Luwei Kang

> 
> where on_sn is added to struct pi_desc like this:
> 
> @@ -61,4 +60,5 @@ struct pi_desc {
>   u32 ndst;
>   };
> + u16 on_sn;
>   u64 control;
>   };
> 
> Paolo


RE: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

2019-01-27 Thread Kang, Luwei
> > Some Posted-Interrupts from passthrough devices may be lost or
> > overwritten when the vCPU is in runnable state.
> >
> > The SN (Suppress Notification) of PID (Posted Interrupt Descriptor)
> > will be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE
> > state but not running on physical CPU). If a posted interrupt coming
> > at this time, the irq remmaping facility will set the bit of PIR
> > (Posted Interrupt Requests) but ON (Outstanding Notification).
> 
> s/but ON/and ON is set too/?

Sorry. If the interrupt remapping facility receives an interrupt from a device 
while the current SN bit is 1, it will not set ON and will not send a 
notification event to the CPU; it just sets the corresponding bit in PIR.

Thanks,
Luwei Kang

> > So this interrupt can't be sync to APIC virtualization register and
> > will not be handled by Guest because ON is zero.
> >
> > Signed-off-by: Luwei Kang 
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > f6915f1..820a03b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> > bool max_irr_updated;
> >
> > WARN_ON(!vcpu->arch.apicv_active);
> > -   if (pi_test_on(&vmx->pi_desc)) {
> > +   if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
> > pi_clear_on(&vmx->pi_desc);
> > /*
> >  * IOMMU can write to PIR.ON, so the barrier matters even on UP.


RE: [PATCH][next] KVM: x86: Fix bit shifting in update_intel_pt_cfg

2018-12-27 Thread Kang, Luwei



> -Original Message-
> From: Gustavo A. R. Silva [mailto:gust...@embeddedor.com]
> Sent: Thursday, December 27, 2018 4:41 AM
> To: Kang, Luwei ; Paolo Bonzini ; 
> Radim Krčmář ; Thomas Gleixner
> ; Ingo Molnar ; Borislav Petkov 
> ; H. Peter Anvin ;
> x...@kernel.org
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org; Gustavo A. R. Silva 
> 
> Subject: [PATCH][next] KVM: x86: Fix bit shifting in update_intel_pt_cfg
> 
> ctl_bitmask in pt_desc is of type u64. When an integer like 0xf is being left 
> shifted more than 32 bits, the behavior is undefined.
> 
> Fix this by adding suffix ULL to integer 0xf.
> 
> Addresses-Coverity-ID: 1476095 ("Bad bit shift operation")
> Fixes: 6c0f0bba85a0 ("KVM: x86: Introduce a function to initialize the PT 
> configuration")
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  arch/x86/kvm/vmx/vmx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 
> cbd55e7aeae5..251c68a74bbe 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7012,7 +7012,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
> 
>   /* unmask address range configure area */
>   for (i = 0; i < vmx->pt_desc.addr_range; i++)
> - vmx->pt_desc.ctl_bitmask &= ~(0xf << (32 + i * 4));
> + vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
>  }

Looks good to me, thanks.

Reviewed-by: Luwei Kang 

> 
>  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> --
> 2.20.1



RE: [Resend PATCH v13 08/12] KVM: x86: Add Intel PT context switch for each vcpu

2018-11-04 Thread Kang, Luwei
> >> If you "have to enable or disable anything" it means you have to
> >> override the default.  But the default in this patches is "no change
> >> compared to before the patches", leaving tracing of both host and
> >> guest entirely to the host, so I don't understand your remark.  What
> >> workflow is broken?
> >>
> >>> There already are controls in perf that enable/disable guest tracing.
> >>
> >> You are confusing "tracing guest from the host" and "the guest can
> >> trace itself".  This patchset is adding support for the latter, and
> >> that
> 
> I'm not confusing anything. In the terminology that you're using, the latter 
> breaks the former. This cannot happen.
> 
> >> affects directly whether the tracing CPUID leaf can be added to the
> >> guest.  Therefore it's not perf that can decide whether to turn it
> >> on; KVM must know it when /dev/kvm is opened, which is why it is a
> >> module parameter.
> 
> There is a control in the perf event attribute that enables tracing the 
> guest. If this control is enabled, the kvm needs to stay away from any
> PT related MSRs. Conversely, if kvm is using PT (or, as you say, "the guest 
> is tracing itself"), the host should not be allowed to ask for tracing
> the guest at the same time.

I think the "perf event attribute" you mentioned is the "struct perf_event_attr 
-> exclude_host/exclude_guest" parameter.
"exclude_host" can be used by the vPMU in pmc_reprogram_counter() to make the 
counter disabled before VM-exit; "exclude_guest" can be used by the host PMU so 
that the counter does not include values from the guest.
In the vPMU implementation there are several counters on each logical CPU, and 
host and guest can use different counters at the same time; the guest is not 
made to "stay away from" the PMU. Is that right? I don't think this "perf event 
attribute" fits Intel PT.

Intel PT is different: there is just one set of MSRs on each logical CPU, so 
this hardware can be used by the host OR the guest. In Host-Guest mode, the PT 
feature will be exposed to the guest, and a guest that detects this feature can 
certainly use it just like on native hardware.
"If this control is enabled, the kvm needs to stay away from any PT related 
MSRs."
Suppose we pay a cloud vendor for an ICL virtual machine that supports Intel PT, 
but we can't use it because the cloud vendor says the host (or another virtual 
machine) is using this feature. Why should that be acceptable?

Currently, the Intel SDM describes three modes of tracing (System-Wide Tracing, 
Host-Only Tracing and Guest-Only Tracing). Below is copied from the SDM:
System-Wide Tracing (35.5.2.1): When a host or VMM configures Intel PT to 
collect trace packets of the entire system, it can leave the relevant VMX 
controls clear to allow VMX-specific packets to provide information across VMX 
transitions. 
Host-Only Tracing (35.5.2.2): If trace packets in VMX non-root operation are not 
desired, the VMM can use the VM-entry MSR-load area to load IA32_RTIT_CTL 
(clearing TraceEn) to disable trace-packet generation in guests, and use the 
VM-exit MSR-load area to load IA32_RTIT_CTL to set TraceEn.
Guest-Only Tracing (35.5.2.3): A VMM can configure trace-packet generation 
while in VMX non-root operation for guests executing normally. 

In Host-Only mode we need to disable PT before VM-entry (no matter whether PT is 
supported/enabled in the guest or not), and load the guest PT state if PT is 
supported in the guest. The Host-Guest mode in this patch set just combines the 
Host-Only mode and the Guest-Only mode, because there is no conflict between 
these two modes.

As for the host PT being disabled before VM-entry without the host being aware 
of it: I think this is a use-case limitation of the different working modes, and 
please EXPECT this to happen. PT MUST be disabled before VM-entry, switching to 
the guest PT state, in Host-Guest mode (following the description in the SDM). 
Otherwise, please use the default System-Wide mode.

Thanks,
Luwei Kang


RE: [PATCH v13 06/12] KVM: x86: Add Intel PT virtualization work mode

2018-10-30 Thread Kang, Luwei
> > > >> This part is in the " Intel® Architecture Instruction Set Extensions 
> > > >> and Future Features Programming Reference"
> > > >> https://software.intel.com/sites/default/files/managed/c5/15/arch
> > > >> itec ture-instruction-set-extensions-programming-reference.pdf
> > > >>
> > > > Yet another PDF which will change it's location sooner than later.
> > > > Can you please stick that into the kernel.org bugzilla and
> > > > reference the BZ in the change log, so we have something for posterity?
> > >
> > > Hopefully posterity will be able to read it in the SDM.  But I agree
> > > it's a good idea to add it in the commit log.  Let's also wait for 
> > > Alexander to clarify what he thinks needs to be done.
> >
> > I will create a new bug for this feature in Bugzilla, please help
> > confirm if can like this first (it my first time to create bug in
> > kernel.org :) )
> >
> 
> Looks good.
> 

Have done.
https://bugzilla.kernel.org/show_bug.cgi?id=201565

Thanks,
Luwei Kang


RE: [PATCH v13 06/12] KVM: x86: Add Intel PT virtualization work mode

2018-10-30 Thread Kang, Luwei
> >> This part is in the " Intel® Architecture Instruction Set Extensions and 
> >> Future Features Programming Reference"
> >> https://software.intel.com/sites/default/files/managed/c5/15/architec
> >> ture-instruction-set-extensions-programming-reference.pdf
> >>
> > Yet another PDF which will change it's location sooner than later. Can
> > you please stick that into the kernel.org bugzilla and reference the
> > BZ in the change log, so we have something for posterity?
> 
> Hopefully posterity will be able to read it in the SDM.  But I agree it's a 
> good idea to add it in the commit log.  Let's also wait for Alexander
> to clarify what he thinks needs to be done.

I will create a new bug for this feature in Bugzilla. Please help confirm 
whether the following is OK first (it's my first time creating a bug on 
kernel.org :) )

Product: Virtualization
Component: KVM
Severity: normal   (option: high, normal, low, enhancement)
Hardware: i386
Kernel Version:  4.19
Summary: Intel Processor Trace enabling in KVM
Description: 
Intel Processor Trace (Intel PT) is an extension of Intel Architecture that 
captures information about software execution using dedicated hardware 
facilities that cause only minimal performance perturbation to the software 
being traced. Details on the Intel PT infrastructure and trace capabilities can 
be found in the Intel 64 and IA-32 Architectures Software Developer’s 
Manual, Volume 3C.

This suite of architecture changes serves to simplify the process of 
virtualizing Intel PT for use by guest software. There are two primary elements 
to the new VMX support improvements made for Intel PT.
1. Addition of a new guest IA32_RTIT_CTL value field to the VMCS.
  — This serves to speed and simplify the process of disabling trace on VM 
exit, and restoring it on VM entry.
2. Enabling use of EPT to redirect PT output.
  — This enables the VMM to elect to virtualize the PT output buffer using EPT. 
In this mode, the CPU will treat PT output addresses as Guest Physical 
Addresses (GPAs) and translate them using EPT. This means that Intel PT output 
reads (of the ToPA table) and writes (of trace output) can cause EPT 
violations, and other output events.

Intel Processor Trace virtualization can work in one of 2 possible modes, 
selected by the new "pt_mode" option. The default is System-Wide mode.
a. System-Wide mode (default):
   When the host configures Intel PT to collect trace packets of the entire 
system, it can leave the relevant VMX controls
   clear to allow VMX-specific packets to provide information across VMX 
transitions.
   The KVM guest will not be aware of this feature in this mode, and both host 
and KVM guest traces will be output to the host buffer.

b. Host-Guest mode:
   The host can configure trace-packet generation in VMX non-root operation for 
guests and in root operation
   for native execution.
   Intel PT will be exposed to the KVM guest in this mode, and the traces are 
output to the respective buffers of host and guest.
   In this mode, the PT state will be saved and PT disabled before VM-entry, 
and restored after VM-exit, when tracing a virtual machine.

Attachment: < the PDF file >

Thanks,
Luwei Kang


RE: [PATCH v13 06/12] KVM: x86: Add Intel PT virtualization work mode

2018-10-24 Thread Kang, Luwei
> > From: Chao Peng 
> >
> > Intel Processor Trace virtualization can be work in one of 2 possible
> > modes:
> >
> > a. System-Wide mode (default):
> >When the host configures Intel PT to collect trace packets
> >of the entire system, it can leave the relevant VMX controls
> >clear to allow VMX-specific packets to provide information
> >across VMX transitions.
> >KVM guest will not aware this feature in this mode and both
> >host and KVM guest trace will output to host buffer.
> >
> > b. Host-Guest mode:
> >Host can configure trace-packet generation while in
> >VMX non-root operation for guests and root operation
> >for native executing normally.
> >Intel PT will be exposed to KVM guest in this mode, and
> >the trace output to respective buffer of host and guest.
> >In this mode, tht status of PT will be saved and disabled
> >before VM-entry and restored after VM-exit if trace
> >a virtual machine.
> >
> > Signed-off-by: Chao Peng 
> > Signed-off-by: Luwei Kang 
> > ---
> 
> > +#define SECONDARY_EXEC_PT_USE_GPA  0x0100
> > +#define VM_EXIT_CLEAR_IA32_RTIT_CTL0x0200
> > +#define VM_ENTRY_LOAD_IA32_RTIT_CTL0x0004
> 
> Where are all of these bits documented? I'm looking at the latest SDM, volume 
> 3 (325384-067US), and none of these bits aredocumented
> there.

This part is in the " Intel® Architecture Instruction Set Extensions and Future 
Features Programming Reference"
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

> 
> > +   GUEST_IA32_RTIT_CTL = 0x2814,
> > +   GUEST_IA32_RTIT_CTL_HIGH= 0x2815,
> 
> Where is this VMCS field documented?
> 
> > +/* Default is SYSTEM mode. */
> > +static int __read_mostly pt_mode = PT_MODE_SYSTEM;
> > +module_param(pt_mode, int, S_IRUGO);
> 
> As a module parameter, this doesn't allow much flexibility. Is it possible to 
> make this decision per-VM, using a VM capability that can be set
> by userspace? (In that case, it may make sense to have a module parameter 
> which allows/disallows the per-VM capability.)

It is a good idea from my point of view. I think it needs more discussion, and 
it can be implemented in a next phase if there is a strong requirement.

> 
> 
> > +static inline bool cpu_has_vmx_intel_pt(void) {
> > +   u64 vmx_msr;
> > +
> > +   rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
> > +   return !!(vmx_msr & MSR_IA32_VMX_MISC_INTEL_PT); }
> 
> Instead of the rdmsr here, wouldn't it be better to cache the IA32_VMX_MISC 
> MSR in vmcs_config?
> Nit: throughout this change, the '!!' isn't necessary when casting an integer 
> type to bool.

MSR_IA32_VMX_MISC is not read frequently; it is read just once in this patch 
set, during initialization.

Thanks,
Luwei Kang



RE: [PATCH v13 08/12] KVM: x86: Add Intel PT context switch for each vcpu

2018-10-24 Thread Kang, Luwei
> > +static void pt_guest_enter(struct vcpu_vmx *vmx) {
> > +   if (pt_mode == PT_MODE_SYSTEM)
> > +   return;
> > +
> > +   /* Save host state before VM entry */
> > +   rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
> > +
> > +   /*
> > +* Set guest state of MSR_IA32_RTIT_CTL MSR (PT will be disabled
> > +* on VM entry when it has been disabled in guest before).
> > +*/
> > +   vmcs_write64(GUEST_IA32_RTIT_CTL, vmx->pt_desc.guest.ctl);
> > +
> > +   if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
> > +   wrmsrl(MSR_IA32_RTIT_CTL, 0);
> > +   pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
> > +   pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range);
> > +   }
> > +}
> 
> From my side this is still a NAK, because [1].
> 
> [1] https://marc.info/?l=kvm=153847567226248=2
> 

This code saves the host PT state and loads the guest PT state before VM-entry 
when working in Host-Guest mode.

Thanks,
Luwei Kang



RE: [PATCH v9 09/12] KVM: x86: Introduce a function to initialize the PT configuration

2018-06-11 Thread Kang, Luwei
> >>> + /* Initialize and clear the no dependency bits */
> >>> + vmx->pt_desc.ctl_bitmask = ~0ULL;
> >> This looks redundant, doesn't it?
> > This is a bit mask for the RTIT_CTL MSR; it will be ANDed with the value of 
> > RTIT_CTL from the guest.
> > The reserved bits will be 1 in this bit mask and the settable bits are 0.
> >
> >>> + vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS |
> >>> + RTIT_CTL_USR | RTIT_CTL_TSC_EN |
> >> RTIT_CTL_DISRETC);
> 
> But it's set twice, so the first assignment is superfluous.

Oh, I see. Will remove it.

Thanks,
Luwei Kang


RE: [PATCH v9 11/12] KVM: x86: Set intercept for Intel PT MSRs read/write

2018-06-08 Thread Kang, Luwei
> > From: Chao Peng 
> >
> > Disable intercept Intel PT MSRs only when Intel PT is enabled in
> > guest. But MSR_IA32_RTIT_CTL will alway be intercept.
> 
> "I'd like to offer some suggestions as to how to make the commit message
> friendlier for reviewing.
> 
> Generally, for every patch, we want to explain the following: what we want,
> why we want it and how we want to go about getting it. We also would
> prefer to do it in english rather than in C, because for the latter we can 
> just
> look at the code." [1]
> 
> I apologize for quoting myself or if I'm stating the obvious, but it looks
> appropriate here.
> 
> [1] https://marc.info/?l=linux-kernel=152233031020263
> 

Thank you very much. My English is not good, and I lack experience in making 
descriptions clearer.
I will improve it.

Thanks,
Luwei Kang



RE: [PATCH v9 10/12] KVM: x86: Implement Intel Processor Trace MSRs read/write emulation

2018-06-08 Thread Kang, Luwei
> > +   /*
> > +* Any MSR write that attempts to change bits marked reserved will
> > +* case a #GP fault.
> > +*/
> > +   if (data & vmx->pt_desc.ctl_bitmask)
> > +   return 1;
> > +
> > +   /*
> > +* Any attempt to modify IA32_RTIT_CTL while TraceEn is set will
> > +* result in a #GP unless the same write also clears TraceEn.
> > +*/
> > +   if ((vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) &&
> > +   ((vmx->pt_desc.guest.ctl ^ data) & ~RTIT_CTL_TRACEEN))
> > +   return 1;
> > +
> > +   /*
> > +* WRMSR to IA32_RTIT_CTL that sets TraceEn but clears this bit
> > +* and FabricEn would cause #GP, if
> > +* CPUID.(EAX=14H, ECX=0):ECX.SNGLRGNOUT[bit 2] = 0
> > +*/
> > +   if ((data & RTIT_CTL_TRACEEN) && !(data & RTIT_CTL_TOPA) &&
> > +   !(data & RTIT_CTL_FABRIC_EN) &&
> > +   !pt_cap_decode(vmx->pt_desc.caps,
> PT_CAP_single_range_output))
> > +   return 1;
> 
> Ah, I see. But afaict this is still wrong: PT_CAP_single_range_output is only 
>  about allowing !RTIT_CTL_TOPA, 

This follows the description in the SDM (35.2.7.2 IA32_RTIT_CTL MSR -> ToPA -> 
Note: ...).

> FABRIC_EN should be checked separately against the new capability that you 
> added in 4/12.

Yes, it has been handled in patch 9.
+   /* If CPUID.(EAX=14H,ECX=0):ECX[3]=1 FabricEn can be set */
+   if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_output_subsys))
+   vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN;

Thanks,
Luwei Kang
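The three WRMSR checks quoted in this thread can be modeled outside the kernel. Below is a minimal, self-contained sketch of the validation rules under discussion — not the actual KVM code; the function name, the ctl_bitmask parameter and the has_single_range_output flag are illustrative stand-ins:

```c
#include <assert.h>
#include <stdint.h>

#define RTIT_CTL_TRACEEN   (1ULL << 0)
#define RTIT_CTL_FABRIC_EN (1ULL << 6)
#define RTIT_CTL_TOPA      (1ULL << 8)

/* Returns 1 if the guest WRMSR to IA32_RTIT_CTL should raise #GP,
 * 0 if the write is allowed. ctl_bitmask has a 1 in every bit that
 * is reserved for this guest. */
int rtit_ctl_write_faults(uint64_t old_ctl, uint64_t data,
                          uint64_t ctl_bitmask,
                          int has_single_range_output)
{
    /* Any write that sets a reserved bit causes a #GP. */
    if (data & ctl_bitmask)
        return 1;

    /* While TraceEn is set, a write may only change other bits if it
     * also clears TraceEn. */
    if ((old_ctl & RTIT_CTL_TRACEEN) &&
        ((old_ctl ^ data) & ~RTIT_CTL_TRACEEN))
        return 1;

    /* Setting TraceEn with both ToPA and FabricEn clear requires
     * single-range output support (SDM 35.2.7.2, ToPA note). */
    if ((data & RTIT_CTL_TRACEEN) && !(data & RTIT_CTL_TOPA) &&
        !(data & RTIT_CTL_FABRIC_EN) && !has_single_range_output)
        return 1;

    return 0;
}
```

In this model, the FabricEn case from patch 4/12 is handled implicitly: when the output_subsys capability is absent, FabricEn simply stays set in ctl_bitmask, so the first check rejects it.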


RE: [PATCH v9 09/12] KVM: x86: Introduce a function to initialize the PT configuration

2018-06-08 Thread Kang, Luwei
> On Tue, May 22, 2018 at 12:52:12PM +0800, Luwei Kang wrote:
> > Initialize the Intel PT configuration when cpuid update.
> 
> Is it the CPUID configuration? Is it the MSR configuration? Is it both?
> Kind of looks like both. Not sure what is the cpuid update, though.
> 
> > Include cpuid inforamtion, rtit_ctl bit mask and the number of address
> > ranges.
> >
> > Signed-off-by: Luwei Kang 
> > ---
> >  arch/x86/kvm/vmx.c | 70
> > ++
> >  1 file changed, 70 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> > 11fb90a..952ddf4 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -10411,6 +10411,72 @@ static void
> > nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)  #undef
> > cr4_fixed1_update  }
> >
> > +static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) {
> > +   struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +   struct kvm_cpuid_entry2 *best = NULL;
> > +   int i;
> > +
> > +   for (i = 0; i < PT_CPUID_LEAVES; i++) {
> > +   best = kvm_find_cpuid_entry(vcpu, 0x14, i);
> > +   if (!best)
> > +   return;
> > +   vmx->pt_desc.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM]
> = best->eax;
> > +   vmx->pt_desc.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM]
> = best->ebx;
> > +   vmx->pt_desc.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM]
> = best->ecx;
> > +   vmx->pt_desc.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM]
> = best->edx;
> > +   }
> > +
> > +   /* Get the number of configurable Address Ranges for filtering */
> > +   vmx->pt_desc.addr_range = pt_cap_decode(vmx->pt_desc.caps,
> > +
>   PT_CAP_num_address_ranges);
> > +
> > +   /* Initialize and clear the no dependency bits */
> > +   vmx->pt_desc.ctl_bitmask = ~0ULL;
> 
> This looks redundant, doesn't it?

This is a bit mask for the RTIT_CTL MSR; it is ANDed with the RTIT_CTL value 
from the guest.
The reserved bits are 1 in this bit mask and the settable bits are 0.

> 
> > +   vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS |
> > +   RTIT_CTL_USR | RTIT_CTL_TSC_EN |
> RTIT_CTL_DISRETC);
> > +
> > +   /* If CPUID.(EAX=14H,ECX=0):EBX[0]=1 CR3Filter can be set */
> 
> This comment makes it less clear than it would have been otherwise.

What about this?
/* If CPUID.(EAX=14H,ECX=0):EBX[0]=1, CR3Filter can be set; otherwise a #GP 
will be injected */

> 
> > +   if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_cr3_filtering))
> > +   vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_CR3EN;
> > +
> > +   /*
> > +* If CPUID.(EAX=14H,ECX=0):EBX[1]=1 CYCEn, CycThresh and
> > +* PSBFreq can be set
> > +*/
> > +   if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_psb_cyc))
> > +   vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_CYCLEACC |
> > +   RTIT_CTL_CYC_THRESH | RTIT_CTL_PSB_FREQ);
> > +
> > +   /*
> > +* If CPUID.(EAX=14H,ECX=0):EBX[3]=1 MTCEn BranchEn and
> > +* MTCFreq can be set
> > +*/
> > +   if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_mtc))
> > +   vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_MTC_EN |
> > +   RTIT_CTL_BRANCH_EN | RTIT_CTL_MTC_RANGE);
> > +
> > +   /* If CPUID.(EAX=14H,ECX=0):EBX[4]=1 FUPonPTW and PTWEn can
> be set */
> > +   if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_ptwrite))
> > +   vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_FUP_ON_PTW |
> > +   RTIT_CTL_PTW_EN);
> > +
> > +   /* If CPUID.(EAX=14H,ECX=0):EBX[5]=1 PwrEvEn can be set */
> > +   if (pt_cap_decode(vmx->pt_desc.caps,
> PT_CAP_power_event_trace))
> > +   vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_PWR_EVT_EN;
> > +
> > +   /* If CPUID.(EAX=14H,ECX=0):ECX[0]=1 ToPA can be set */
> > +   if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_topa_output))
> > +   vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_TOPA;
> 
> If you want to be thorough, there's also PT_CAP_single_range_output,
> which tells us if RTIT_CTL_TOPA can be *unset*. Otherwise it's required.

Following the description in the SDM, the default value of this bit 
(RTIT_CTL.ToPA) is 0, so I think we don't need to touch this bit when ToPA is 
not supported.

> 
> > +   /* If CPUID.(EAX=14H,ECX=0):ECX[3]=1 FabricEn can be set */
> > +   if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_output_subsys))
> > +   vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN;
> 
> Are we sure we want to virtualize this and that it's safe?

It depends on the CPUID information virtualized in the guest. I think it can 
work (e.g. we can pass a PCI card through to the guest and stream the trace to 
an MMIO address).

> 
> > +
> > +   /* unmask address range configure area */
> > +   for (i = 0; i < vmx->pt_desc.addr_range; i++)
> > +   vmx->pt_desc.ctl_bitmask &= ~(0xf << (32 + i * 4));
> 
> So, the ctl_bitmask is all the bits that are not allowed?

Yes, you are right.

Thanks,
Luwei Kang
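To make the "ctl_bitmask holds the bits that are not allowed" idea concrete, here is a hedged, stand-alone sketch of how such a mask could be derived. It is an illustrative model of update_intel_pt_cfg(), not the real code: the capability set is reduced to three parameters, and the bit positions follow the RTIT_CTL layout in the SDM:

```c
#include <assert.h>
#include <stdint.h>

#define RTIT_CTL_TRACEEN (1ULL << 0)
#define RTIT_CTL_OS      (1ULL << 2)
#define RTIT_CTL_USR     (1ULL << 3)
#define RTIT_CTL_CR3EN   (1ULL << 7)
#define RTIT_CTL_TOPA    (1ULL << 8)
#define RTIT_CTL_TSC_EN  (1ULL << 10)
#define RTIT_CTL_DISRETC (1ULL << 11)

/* Build a mask with 1 in every RTIT_CTL bit the guest may NOT set. */
uint64_t build_ctl_bitmask(int cr3_filtering, int topa_output,
                           int num_addr_ranges)
{
    /* Bits with no CPUID dependency are always settable. */
    uint64_t mask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS | RTIT_CTL_USR |
                      RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC);
    int i;

    if (cr3_filtering)
        mask &= ~RTIT_CTL_CR3EN;
    if (topa_output)
        mask &= ~RTIT_CTL_TOPA;

    /* Unmask the 4-bit ADDRn_CFG fields starting at bit 32. The ULL
     * suffix matters here: shifting a plain int by 32 or more bits is
     * undefined behaviour in C. */
    for (i = 0; i < num_addr_ranges; i++)
        mask &= ~(0xfULL << (32 + i * 4));

    return mask;
}
```

A guest WRMSR value `data` is then rejected with `if (data & ctl_bitmask)`, matching the check in patch 10/12.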


RE: [PATCH v9 04/12] perf/x86/intel/pt: add new capability for Intel PT

2018-06-08 Thread Kang, Luwei
> > On Tue, May 22, 2018 at 12:52:07PM +0800, Luwei Kang wrote:
> > > CPUID(EAX=14H,ECX=0):EBX[bit 3] = 1 indicates support of output to
> > > Trace Transport subsystem.
> > > MSR IA32_RTIT_CTL.FabricEn[bit 6] is reserved if CPUID.(EAX=14H,
> > > ECX=0):ECX[bit 3] = 0.
> >
> > This should instead say:
> >
> > This adds support for "output to Trace Transport subsystem" capability
> > of Intel PT, as documented in IA SDM Chapter 36.x.y.z. It means that
> > PT can output its trace to an MMIO address range rather than system
> > memory buffer.
> 
> The $subject also needs attention.

I don't quite understand. What does "$subject" mean?

Thanks,
Luwei Kang


RE: [PATCH v9 04/12] perf/x86/intel/pt: add new capability for Intel PT

2018-06-08 Thread Kang, Luwei
> > CPUID(EAX=14H,ECX=0):EBX[bit 3] = 1 indicates support of output to
> > Trace Transport subsystem.
> > MSR IA32_RTIT_CTL.FabricEn[bit 6] is reserved if CPUID.(EAX=14H,
> > ECX=0):ECX[bit 3] = 0.
> 
> This should instead say:
> 
> This adds support for "output to Trace Transport subsystem" capability of
> Intel PT, as documented in IA SDM Chapter 36.x.y.z. It means that PT can
> output its trace to an MMIO address range rather than system memory
> buffer.

Yes, you are right. From KVM's point of view, this bit is used for an MSR 
access security check.
KVM tracks MSR accesses in the guest; a #GP will be injected into the guest if 
it tries to set IA32_RTIT_CTL.FabricEn[bit 6] when "output_subsys" 
(CPUID.(EAX=14H, ECX=0):ECX[bit 3] = 0) is not supported.

Thanks,
Luwei Kang

> 
> > This is use for emulate IA32_RTIT_CTL MSR read/write in KVM. KVM guest
> > write IA32_RTIT_CTL will trap to root mode and a #GP would be injected
> > to guest if set IA32_RTIT_CTL.FabricEn with CPUID.(EAX=14H,
> > ECX=0):ECX[bit 3] = 0.
> 
> I'm not sure what this means, this patch has nothing to do with KVM as far as
> I can tell.
> 
> Aside from the commit message, this is a valid patch.
> 
> Thanks,
> --
> Alex



RE: [PATCH v9 03/12] perf/x86/intel/pt: Add new bit definitions for Intel PT MSRs

2018-06-08 Thread Kang, Luwei
> > These bit definitions are use for emulate MSRs read/write for KVM. For
> > example, IA32_RTIT_CTL.FabricEn[bit 6] is available only when
> > CPUID.(EAX=14H, ECX=0):ECX[bit 3] = 1. If KVM guest try to set this
> > bit with CPUID.(EAX=14H, ECX=0):ECX[bit3] = 0 a #GP would be injected
> > to KVM guest.
> 
> Do we have anything in the guest that this feature will work with?
> 

It depends on the PT driver. KVM needs to do some security checks when a KVM 
guest (Linux or another OS) tries to set any bits of these MSRs.

Thanks,
Luwei Kang


RE: [PATCH v9 06/12] KVM: x86: Add Intel Processor Trace virtualization mode

2018-06-08 Thread Kang, Luwei
> > From: Chao Peng 
> >
> > Intel PT virtualization can be work in one of 2 possible modes:
> > a. system-wide: trace both host and guest and output to host buffer;
> > b. host-guest: trace host/guest simultaneous and output to their
> >respective buffer.
> 
> I think we discussed this before. That's not the choice that the user needs to
> make. The only choice that I see is what happens if the guest wants to use PT
> while it's already in use by the host. Otherwise, each of them gets to use PT
> as they would.
> 

Hi,
Thanks for your reply. If the guest wants to use PT, we need to expose the 
feature to the guest, and the guest saves the trace packets to its own buffer. 
The host can't get the trace produced by the KVM guest. This is Host_Guest 
mode.
On the other hand, the host may want to trace both the hypervisor and KVM 
guest execution and output the trace natively. This is SYSTEM mode. PT can't 
be exposed to the guest in this mode, or the KVM guest trace would be lost in 
the native buffer.
These are two different use cases and they can't coexist.

Thanks,
Luwei Kang




RE: [PATCH v9 01/12] perf/x86/intel/pt: Move Intel-PT MSRs bit definitions to a public header

2018-06-07 Thread Kang, Luwei
> -Original Message-
> From: Kang, Luwei
> Sent: Tuesday, May 22, 2018 12:52 PM
> To: k...@vger.kernel.org
> Cc: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; 
> chao.p.p...@linux.intel.com;
> thomas.lenda...@amd.com; b...@suse.de; Liang, Kan ; 
> janakarajan.natara...@amd.com;
> d...@amazon.co.uk; linux-kernel@vger.kernel.org; 
> alexander.shish...@linux.intel.com; pet...@infradead.org;
> mathieu.poir...@linaro.org; kstew...@linuxfoundation.org; 
> gre...@linuxfoundation.org; pbonz...@redhat.com;
> rkrc...@redhat.com; da...@redhat.com; b...@redhat.com; 
> yu.c.zh...@linux.intel.com; j...@8bytes.org; Kang, Luwei
> 
> Subject: [PATCH v9 01/12] perf/x86/intel/pt: Move Intel-PT MSRs bit 
> definitions to a public header
> 
> From: Chao Peng 
> 
> Intel Processor Trace virtualization enabling in KVM guest need to access 
> these MSRs bit definitions, so move them to public header
> file msr-index.h.
> 
> Signed-off-by: Chao Peng 
> Signed-off-by: Luwei Kang 
> ---
>  arch/x86/events/intel/pt.h   | 37 -
>  arch/x86/include/asm/msr-index.h | 33 +
>  2 files changed, 33 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h index 
> 0eb41d0..0050ca1 100644
> --- a/arch/x86/events/intel/pt.h
> +++ b/arch/x86/events/intel/pt.h
> @@ -20,43 +20,6 @@
>  #define __INTEL_PT_H__
> 
>  /*
> - * PT MSR bit definitions
> - */
> -#define RTIT_CTL_TRACEEN BIT(0)
> -#define RTIT_CTL_CYCLEACCBIT(1)
> -#define RTIT_CTL_OS  BIT(2)
> -#define RTIT_CTL_USR BIT(3)
> -#define RTIT_CTL_PWR_EVT_EN  BIT(4)
> -#define RTIT_CTL_FUP_ON_PTW  BIT(5)
> -#define RTIT_CTL_CR3EN   BIT(7)
> -#define RTIT_CTL_TOPABIT(8)
> -#define RTIT_CTL_MTC_EN  BIT(9)
> -#define RTIT_CTL_TSC_EN  BIT(10)
> -#define RTIT_CTL_DISRETC BIT(11)
> -#define RTIT_CTL_PTW_EN  BIT(12)
> -#define RTIT_CTL_BRANCH_EN   BIT(13)
> -#define RTIT_CTL_MTC_RANGE_OFFSET14
> -#define RTIT_CTL_MTC_RANGE   (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
> -#define RTIT_CTL_CYC_THRESH_OFFSET   19
> -#define RTIT_CTL_CYC_THRESH  (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
> -#define RTIT_CTL_PSB_FREQ_OFFSET 24
> -#define RTIT_CTL_PSB_FREQ(0x0full << 
> RTIT_CTL_PSB_FREQ_OFFSET)
> -#define RTIT_CTL_ADDR0_OFFSET32
> -#define RTIT_CTL_ADDR0   (0x0full << RTIT_CTL_ADDR0_OFFSET)
> -#define RTIT_CTL_ADDR1_OFFSET36
> -#define RTIT_CTL_ADDR1   (0x0full << RTIT_CTL_ADDR1_OFFSET)
> -#define RTIT_CTL_ADDR2_OFFSET40
> -#define RTIT_CTL_ADDR2   (0x0full << RTIT_CTL_ADDR2_OFFSET)
> -#define RTIT_CTL_ADDR3_OFFSET44
> -#define RTIT_CTL_ADDR3   (0x0full << RTIT_CTL_ADDR3_OFFSET)
> -#define RTIT_STATUS_FILTEREN BIT(0)
> -#define RTIT_STATUS_CONTEXTENBIT(1)
> -#define RTIT_STATUS_TRIGGERENBIT(2)
> -#define RTIT_STATUS_BUFFOVF  BIT(3)
> -#define RTIT_STATUS_ERRORBIT(4)
> -#define RTIT_STATUS_STOPPED  BIT(5)
> -
> -/*
>   * Single-entry ToPA: when this close to region boundary, switch
>   * buffers to avoid losing data.
>   */
> diff --git a/arch/x86/include/asm/msr-index.h 
> b/arch/x86/include/asm/msr-index.h
> index 53d5b1b..afe4e13 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -106,7 +106,40 @@
>  #define MSR_PEBS_LD_LAT_THRESHOLD0x03f6
> 
>  #define MSR_IA32_RTIT_CTL0x0570
> +#define RTIT_CTL_TRACEEN BIT(0)
> +#define RTIT_CTL_CYCLEACCBIT(1)
> +#define RTIT_CTL_OS  BIT(2)
> +#define RTIT_CTL_USR BIT(3)
> +#define RTIT_CTL_PWR_EVT_EN  BIT(4)
> +#define RTIT_CTL_FUP_ON_PTW  BIT(5)
> +#define RTIT_CTL_CR3EN   BIT(7)
> +#define RTIT_CTL_TOPABIT(8)
> +#define RTIT_CTL_MTC_EN  BIT(9)
> +#define RTIT_CTL_TSC_EN  BIT(10)
> +#define RTIT_CTL_DISRETC BIT(11)
> +#define RTIT_CTL_PTW_EN  BIT(12)
> +#define RTIT_CTL_BRANCH_EN   BIT(13)
> +#define RTIT_CTL_MTC_RANGE_OFFSET14
> +#define RTIT_CTL_MTC_RANGE   (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
> +#define RTIT_CTL_CYC_THRESH_OFFSET   19
> +#define RTIT_CTL_CYC_THRESH   


RE: [PATCH v8 01/12] perf/x86/intel/pt: Move Intel-PT MSRs bit definitions to a public header

2018-05-16 Thread Kang, Luwei
> > From: Chao Peng 
> >
> > Intel Processor Trace virtualization enabling in KVM guest need to
> > access these MSRs bit definitions, so move them to public header file
> > msr-index.h.
> > @@ -115,6 +148,7 @@
> >  #define MSR_IA32_RTIT_ADDR2_B  0x0585
> >  #define MSR_IA32_RTIT_ADDR3_A  0x0586
> >  #define MSR_IA32_RTIT_ADDR3_B  0x0587
> > +#define MSR_IA32_RTIT_ADDR_RANGE   4
> 
> This one wasn't there before, so belongs in a different patch.

What about moving this definition to "arch/x86/include/asm/intel_pt.h" and 
adding it in patch 8 (the context switch implementation)?

Thanks,
Luwei Kang

> 
> Regards,
> --
> Alex


RE: [PATCH v7 05/13] perf/x86/intel/pt: Introduce a new function to get capability of Intel PT

2018-05-03 Thread Kang, Luwei
> >> Can you elaborate on this, what information do we need besides
> >> MSR_IA32_VMX_MISC[14]?
> >
> > Enable Intel PT in guest depend on SECONDARY_EXEC_PT_USE_GPA, EPT,
> > MSR_IA32_VMX_MISC[14] and the capability of " PT_CAP_topa_output" on
> > hardware. PT driver will return " -ENODEV" if hardware not support
> > "PT_CAP_topa_output".
> 
> I actually don't understand why PT_CAP_topa_output matters for the purpose of 
> enabling PT in the guest.  However you still need
> __pt_cap_get() in the CPUID checks.

Hi Paolo,
I think we should only expose Intel Processor Trace to the guest when it can 
be detected and initialized there. But without "PT_CAP_topa_output", Intel PT 
can't work in Linux, so I added this capability as a precondition.
As for why the driver needs this check, I think Alexander may know the reason.

Thanks,
Luwei Kang

> 
> Paolo


RE: [PATCH v7 05/13] perf/x86/intel/pt: Introduce a new function to get capability of Intel PT

2018-05-03 Thread Kang, Luwei
> > > > New function __pt_cap_get() will be invoked in KVM to check if a
> > > > specific capability is availiable in KVM guest.
> > > > Another function pt_cap_get() can only check the hardware
> > > > capabilities but this may different with KVM guest because some
> > > > features may not be exposed to guest.
> > >
> > > Do we really need both in KVM?
> >
> > Yes, KVM need get host capability to estimate if can expose this
> > feature to guest
> 
> Can you elaborate on this, what information do we need besides 
> MSR_IA32_VMX_MISC[14]?

Enabling Intel PT in the guest depends on SECONDARY_EXEC_PT_USE_GPA, EPT, 
MSR_IA32_VMX_MISC[14] and the "PT_CAP_topa_output" capability on the hardware. 
The PT driver will return -ENODEV if the hardware does not support 
"PT_CAP_topa_output".

Thanks,
Luwei Kang

> 
> Thanks,
> --
> Alex
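As background for the pt_cap_get()/__pt_cap_get() distinction, the decode step itself is simple: each capability is a (subleaf, register, mask) triple looked up in a saved dump of CPUID leaf 0x14. The sketch below is an illustrative model; the triple used in the demo (subleaf 1, EAX[2:0] for the number of address ranges) follows the SDM enumeration but is an assumption here:

```c
#include <assert.h>
#include <stdint.h>

enum { CPUID_EAX, CPUID_EBX, CPUID_ECX, CPUID_EDX, PT_CPUID_REGS_NUM };
#define PT_CPUID_LEAVES 2

/* Decode one capability field from a flat dump of CPUID leaf 0x14:
 * caps[reg + subleaf * PT_CPUID_REGS_NUM] holds the register value. */
uint32_t pt_cap_decode(const uint32_t *caps, int subleaf, int reg,
                       uint32_t mask)
{
    uint32_t v = caps[reg + subleaf * PT_CPUID_REGS_NUM] & mask;

    /* Shift the masked field down to its low bits. */
    return v >> __builtin_ctz(mask);
}

/* Demo: a guest whose CPUID reports two configurable address ranges. */
uint32_t demo_num_address_ranges(void)
{
    uint32_t caps[PT_CPUID_LEAVES * PT_CPUID_REGS_NUM] = {0};

    caps[CPUID_EAX + 1 * PT_CPUID_REGS_NUM] = 2;
    return pt_cap_decode(caps, 1, CPUID_EAX, 0x7);
}
```

The host-side variant would read the real CPUID, while the guest-side variant (the __pt_cap_get() being discussed) would read the snapshot KVM keeps per vCPU — the decode logic is the same either way.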



RE: [PATCH v7 00/13] Intel Processor Trace virtualization enabling

2018-05-03 Thread Kang, Luwei
> On Thu, May 03, 2018 at 10:23:12AM +0000, Kang, Luwei wrote:
> > Send this patch set twice because path 2 send fail in the first time.
> 
> Then just resend patch 2; also I have two copies of patch 2, so clearly it 
> _did_ send.

Cc all the maintainers and contributors.

Thanks,
Luwei Kang.


RE: [PATCH v7 05/13] perf/x86/intel/pt: Introduce a new function to get capability of Intel PT

2018-05-03 Thread Kang, Luwei
> > New function __pt_cap_get() will be invoked in KVM to check if a
> > specific capability is availiable in KVM guest.
> > Another function pt_cap_get() can only check the hardware capabilities
> > but this may different with KVM guest because some features may not be
> > exposed to guest.
> 
> Do we really need both in KVM?

Yes. KVM needs the host capability to determine whether it can expose this feature 
to the guest, and the guest capability to confirm whether some bits of the MSRs are accessible.

Thanks,
Luwei Kang

> 
> > diff --git a/arch/x86/include/asm/intel_pt.h
> > b/arch/x86/include/asm/intel_pt.h index 2de4db0..3a4f524 100644
> > --- a/arch/x86/include/asm/intel_pt.h
> > +++ b/arch/x86/include/asm/intel_pt.h
> > @@ -27,9 +27,11 @@ enum pt_capabilities {  #if
> > defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)  void
> > cpu_emergency_stop_pt(void);  extern u32 pt_cap_get(enum
> > pt_capabilities cap);
> > +extern u32 __pt_cap_get(u32 *caps, enum pt_capabilities cap);
> 
> I'd call it something like pt_cap_decode().



RE: [PATCH v7 00/13] Intel Processor Trace virtualization enabling

2018-05-03 Thread Kang, Luwei
> Hi All,
> 
> Here is a patch series which adds Processor Trace enabling in KVM guests. 
> You can get its software developer manual from:
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> In Chapter 4 INTEL PROCESSOR TRACE: VMX IMPROVEMENTS.
> 
> Introduction:
> Intel Processor Trace (Intel PT) is an extension of Intel Architecture that 
> captures information about software execution using
> dedicated hardware facilities that cause only minimal performance 
> perturbation to the software being traced. Details on the Intel PT
> infrastructure and trace capabilities can be found in the Intel 64 and IA-32 
> Architectures Software Developer’s Manual, Volume 3C.
> 
> The suite of architecture changes serves to simplify the process of 
> virtualizing Intel PT for use by guest software. There are two
> primary elements to this new architecture support for the VMX 
> improvements made for Intel PT.
> 1. Addition of a new guest IA32_RTIT_CTL value field to the VMCS.
>   — This serves to speed and simplify the process of disabling trace on VM 
> exit, and restoring it on VM entry.
> 2. Enabling use of EPT to redirect PT output.
>   — This enables the VMM to elect to virtualize the PT output buffer using 
> EPT. In this mode, the CPU will treat PT output addresses
> as Guest Physical Addresses (GPAs) and translate them using EPT. This means 
> that Intel PT output reads (of the ToPA table) and
> writes (of trace output) can cause EPT violations, and other output events.
> 
> Processor Trace virtualization can work in one of three possible modes, 
> selected by the new "pt_mode" option. The default value is system mode.
>  a. system-wide: trace both host/guest and output to the host buffer;
>  b. host-only: trace only the host and output to the host buffer;
>  c. host-guest: trace host/guest simultaneously and output to their respective buffers.
> 
> From V6:
>  - split patches 1~2 into four separate patches (these patches did 2 things) and 
> add more descriptions.
> 
> From V5:
>  - rename the function from pt_cap_get_ex() to __pt_cap_get();
>  - replace the most of function from vmx_pt_supported() to "pt_mode == 
> PT_MODE_HOST_GUEST"(or !=).
> 
> From V4:
>  - add data check when setting the value of MSR_IA32_RTIT_CTL;
>  - Invoke new interface to set the intercept of MSRs read/write after "MSR 
> bitmap per-vcpu" patches.
> 
> From V3:
>  - change default mode to SYSTEM mode;
>  - add a new patch to move PT out of scattered features;
>  - add a new function kvm_get_pt_addr_cnt() to get the number of address 
> ranges;
>  - add a new function vmx_set_rtit_ctl() to set the value of guest RTIT_CTL, 
> GUEST_IA32_RTIT_CTL and MSRs intercept.
> 
> From v2:
>  - replace *_PT_SUPPRESS_PIP to *_PT_CONCEAL_PIP;
>  - clean SECONDARY_EXEC_PT_USE_GPA, VM_EXIT_CLEAR_IA32_RTIT_CTL and 
> VM_ENTRY_LOAD_IA32_RTIT_CTL in SYSTEM mode.
> These bits must be all set or all clean;
>  - move processor tracing out of scattered features;
>  - add a new function to enable/disable intercept MSRs read/write;
>  - add all Intel PT MSRs read/write and disable intercept when PT is enabled 
> in guest;
>  - disable Intel PT and enable intercept MSRs when L1 guest VMXON;
>  - performance optimization.
>In Host only mode. we just need to save host RTIT_CTL before vm-entry and 
> restore host RTIT_CTL after vm-exit;
>In HOST_GUEST mode. we need to save and restore all MSRs only when PT has 
> enabled in guest.
>  - use XSAVES/XRSTORS to implement context switch.
>Not implemented in this version and still being debugged; will work on 
> this in a separate patch.
> 
> From v1:
>  - remove guest-only mode because guest-only mode can be covered by 
> host-guest mode;
>  - always set "use GPA for processor tracing" in secondary execution control 
> if it can be;
>  - trap RTIT_CTL read/write. Forbid write this msr when VMXON in L1 
> hypervisor.
> 
> Chao Peng (8):
>   perf/x86/intel/pt: Move Intel-PT MSRs bit definitions to a public
> header
>   perf/x86/intel/pt: Change pt_cap_get() to a public function
>   KVM: x86: Add Intel Processor Trace virtualization mode
>   KVM: x86: Add Intel Processor Trace cpuid emulation
>   KVM: x86: Add Intel processor trace context for each vcpu
>   KVM: x86: Implement Intel Processor Trace context switch
>   KVM: x86: Implement Intel Processor Trace MSRs read/write
>   KVM: x86: Set intercept for Intel PT MSRs read/write
> 
> Luwei Kang (5):
>   perf/x86/intel/pt: Add new bit definitions for Intel PT MSRs
>   perf/x86/intel/pt: add new capability for Intel PT
>   perf/x86/intel/pt: Introduce a new function to get capability of Intel
> PT
>   KVM: x86: Introduce a function to initialize the PT configuration
>   KVM: x86: Disable Intel Processor Trace when VMXON in L1 guest
> 
>  arch/x86/events/intel/pt.c   |  12 +-
>  arch/x86/events/intel/pt.h   |  58 --
>  arch/x86/include/asm/intel_pt.h  |  40 
>  

RE: [PATCH v6 01/11] perf/x86/intel/pt: Move Intel-PT MSR bit definitions to a public header

2018-05-02 Thread Kang, Luwei
> > Intel Processor Trace virtualization enabling in guest need to use
> > these MSR bits, so move them to public header msr-index.h.
> > Introduce RTIT_CTL_FABRIC_EN and sync the definitions to latest spec.
> 
> You forgot to Cc the maintainers.
> 
> Also, this patch does 2 things, I think we have a rule that says that's a bad 
> thing.

I will split it into two patches: one to move these #defines into a public 
header and another to add the new bits for the Intel Processor Trace MSRs. 
Thanks for the review.

Thanks,
Luwei Kang


RE: [PATCH v5 11/11] KVM: x86: Disable Intel Processor Trace when VMXON in L1 guest

2018-03-19 Thread Kang, Luwei
> > +   if (pt_mode == PT_MODE_HOST_GUEST) {
> 
> This would be vmx_pt_supported(), but I think it's better to remove that 
> function and just test pt_mode ==
> PT_MODE_HOST_GUEST everywhere (or !=).

Hi Paolo,
There are many vmx_pt_supported() calls in the MSR read/write patch (patch 9), 
and I think they can be replaced by a pt_mode check.
But I think this function needs to be kept in patch 5 (cpuid emulation). 
What is your opinion?

Thanks,
Luwei Kang

> 
> Paolo
> 
> > +   vmx->pt_desc.guest.ctl = 0;
> > +   vmcs_write64(GUEST_IA32_RTIT_CTL, 0);
> > +   pt_set_intercept_for_msr(vmx, 1);
> > +   }
> > +



RE: [PATCH v5 03/11] perf/x86/intel/pt: Introduce a new function to get the capability of Intel PT

2018-03-19 Thread Kang, Luwei
> > +u32 pt_cap_get_ex(u32 *caps, enum pt_capabilities cap) {
> > +   struct pt_cap_desc *cd = &pt_caps[cap];
> > +   u32 c = caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
> > +   unsigned int shift = __ffs(cd->mask);
> > +
> > +   return (c & cd->mask) >> shift;
> > +}
> > +EXPORT_SYMBOL_GPL(pt_cap_get_ex);
> > +
> 
> You should change pt_cap_get to use this function.  Also, "_ex" is not a very 
> common suffix, so perhaps you can call it
> __pt_cap_get.
> 
> I don't have any other comments on patches 1-3, so when you resend we can ask 
> the x86 maintainers for approval.
> 

Hi Paolo,
Do you mean something like this?

--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -76,14 +76,20 @@
PT_CAP(psb_periods, 1, CPUID_EBX, 0x),
 };

-u32 pt_cap_get(enum pt_capabilities cap)
+u32 __pt_cap_get(u32 *caps, enum pt_capabilities cap)
 {
	struct pt_cap_desc *cd = &pt_caps[cap];
-   u32 c = pt_pmu.caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
+   u32 c = caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
unsigned int shift = __ffs(cd->mask);

return (c & cd->mask) >> shift;
 }
+EXPORT_SYMBOL_GPL(__pt_cap_get);
+
+u32 pt_cap_get(enum pt_capabilities cap)
+{
+   return __pt_cap_get(pt_pmu.caps, cap);
+}
 EXPORT_SYMBOL_GPL(pt_cap_get);

 static ssize_t pt_cap_show(struct device *cdev,
diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h
index 2de4db0..3a4f524 100644
--- a/arch/x86/include/asm/intel_pt.h
+++ b/arch/x86/include/asm/intel_pt.h
@@ -27,9 +27,11 @@ enum pt_capabilities {
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
 void cpu_emergency_stop_pt(void);
 extern u32 pt_cap_get(enum pt_capabilities cap);
+extern u32 __pt_cap_get(u32 *caps, enum pt_capabilities cap);
 #else
 static inline void cpu_emergency_stop_pt(void) {}
 static inline u32 pt_cap_get(enum pt_capabilities cap) { return 0; }
+static inline u32 __pt_cap_get(u32 *caps, enum pt_capabilities cap) { return 0; }


Thanks,
Luwei Kang


RE: [PATCH] KVM: VMX: use same MSR bitmaps for 32-/64-bit modes, fix MSR bitmaps for processor tracing

2018-01-07 Thread Kang, Luwei

> -Original Message-
> From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo 
> Bonzini
> Sent: Friday, January 5, 2018 11:43 PM
> To: linux-kernel@vger.kernel.org; k...@vger.kernel.org
> Cc: Kang, Luwei <luwei.k...@intel.com>
> Subject: [PATCH] KVM: VMX: use same MSR bitmaps for 32-/64-bit modes, fix MSR 
> bitmaps for processor tracing
> 
> KVM has a small optimization where it doesn't save/restore MSR_KERNEL_GS_BASE 
> if the guest is in 32-bit mode.  However, this
> complicates the code noticeably by doubling the number of possible MSR 
> bitmaps.  In addition, pt_disable_intercept_for_msr was
> only updating the "basic" MSR bitmap, because x2apic and x2apic_apicv are 
> memcpy'd just once in hardware_setup.
> 
> Remove the long-mode bitmaps set, and touch all the three remaining bitmaps 
> in vmx_disable_intercept_for_msr and
> vmx_enable_intercept_for_msr.
> 
> Fixes: 3bd1f85e893daec4f3982d1d45b6bfc0683442c4
> Cc: Luwei Kang <luwei.k...@intel.com>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 126 
> -
>  1 file changed, 48 insertions(+), 78 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 
> 11321cc4129a..ea5b52dc2d78 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -935,12 +935,9 @@ static bool nested_vmx_is_page_fault_vmexit(struct 
> vmcs12 *vmcs12,  static
> DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> 
>  enum {
> - VMX_MSR_BITMAP_LEGACY,
> - VMX_MSR_BITMAP_LONGMODE,
> - VMX_MSR_BITMAP_LEGACY_X2APIC_APICV,
> - VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV,
> - VMX_MSR_BITMAP_LEGACY_X2APIC,
> - VMX_MSR_BITMAP_LONGMODE_X2APIC,
> + VMX_MSR_BITMAP_NOX2APIC,
> + VMX_MSR_BITMAP_X2APIC_APICV,
> + VMX_MSR_BITMAP_X2APIC,
>   VMX_VMREAD_BITMAP,
>   VMX_VMWRITE_BITMAP,
>   VMX_BITMAP_NR
> @@ -948,12 +945,9 @@ enum {
> 
>  static unsigned long *vmx_bitmap[VMX_BITMAP_NR];
> 
> -#define vmx_msr_bitmap_legacy
> (vmx_bitmap[VMX_MSR_BITMAP_LEGACY])
> -#define vmx_msr_bitmap_longmode  
> (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE])
> -#define vmx_msr_bitmap_legacy_x2apic_apicv   
> (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC_APICV])
> -#define vmx_msr_bitmap_longmode_x2apic_apicv 
> (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV])
> -#define vmx_msr_bitmap_legacy_x2apic 
> (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC])
> -#define vmx_msr_bitmap_longmode_x2apic   
> (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC])
> +#define vmx_msr_bitmap_nox2apic  
> (vmx_bitmap[VMX_MSR_BITMAP_NOX2APIC])
> +#define vmx_msr_bitmap_x2apic_apicv  
> (vmx_bitmap[VMX_MSR_BITMAP_X2APIC_APICV])
> +#define vmx_msr_bitmap_x2apic
> (vmx_bitmap[VMX_MSR_BITMAP_X2APIC])
>  #define vmx_vmread_bitmap(vmx_bitmap[VMX_VMREAD_BITMAP])
>  #define vmx_vmwrite_bitmap   (vmx_bitmap[VMX_VMWRITE_BITMAP])
> 
> @@ -2203,8 +2197,7 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
> 
>  #ifdef CONFIG_X86_64
>   rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
> - if (is_long_mode(&vmx->vcpu))
> - wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
> + wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
>  #endif
>   if (boot_cpu_has(X86_FEATURE_MPX))
>   rdmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
> @@ -,8 +2215,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
>   ++vmx->vcpu.stat.host_state_reload;
>   vmx->host_state.loaded = 0;
>  #ifdef CONFIG_X86_64
> - if (is_long_mode(&vmx->vcpu))
> - rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
> + rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
>  #endif
>   if (vmx->host_state.gs_ldt_reload_needed) {
>   kvm_load_ldt(vmx->host_state.ldt_sel);
> @@ -2635,21 +2627,12 @@ static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
>(vmcs_read32(SECONDARY_VM_EXEC_CONTROL) &
> SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) {
>   if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) {
> - if (is_long_mode(vcpu))
> - msr_bitmap = 
> vmx_msr_bitmap_longmode_x2apic_apicv;
> - else
> - msr_bitmap = vmx_msr_bitmap_legacy_x2apic_apicv;
> + msr_bitmap = vmx_msr_bitmap_x2apic_apicv;
>   } else {
> - if (is

RE: [PATCH v3 7/9] KVM: x86: Implement Intel Processor Trace MSRs read/write

2017-12-03 Thread Kang, Luwei
> >>> + case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B: {
> >>> + u32 eax, ebx, ecx, edx;
> >>> +
> > > + cpuid_count(0x14, 1, &eax, &ebx, &ecx, &edx);
> >>
> >> Please cache the cpuid_count result, or do the cpuid_count after testing
> >> vmx_pt_supported() (which you can use instead of going through 
> >> kvm_x86_ops).
> >
> > Hi Paolo,
> > Thanks for your reply. I have cache EAX in "struct pt_desc" and will 
> > initialize in vmx_vcpu_setup().
> > +struct pt_desc {
> > +   unsigned int addr_num;
> > +   struct pt_ctx host;
> > +   struct pt_ctx guest;
> > +};
> > But kvm_init_msr_list() is invoked too early, I have to read from 
> > hardware.  So, what about change like this.
> >
> > -   cpuid_count(0x14, 1, &eax, &ebx, &ecx, &edx);
> > -   if (!kvm_x86_ops->pt_supported() || msrs_to_save[i] -
> > -   MSR_IA32_RTIT_ADDR0_A >= (eax & 0x7))
> > +   if (!kvm_x86_ops->pt_supported())
> > continue;
> > +   cpuid_count(0x14, 1, &eax, &ebx, &ecx, &edx);
> > +   if (msrs_to_save[i] -
> > +   MSR_IA32_RTIT_ADDR0_A >= (eax & 0x7))
> > +   continue;
> 
> For kvm_init_msr_list it's okay.  But can you please add a
> pt_msr_count() function?
>

Of course, I will fix it in the next version. Thanks!

Luwei Kang

> >>
> >>> + if (!kvm_x86_ops->pt_supported() || msrs_to_save[i] -
> >>> + MSR_IA32_RTIT_ADDR0_A >= (eax & 0x7))
> >>> + continue;
> >>> + break;
> >



RE: [PATCH v3 7/9] KVM: x86: Implement Intel Processor Trace MSRs read/write

2017-11-30 Thread Kang, Luwei
> > +   case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B: {
> > +   u32 eax, ebx, ecx, edx;
> > +
> > +   cpuid_count(0x14, 1, &eax, &ebx, &ecx, &edx);
> 
> Please cache the cpuid_count result, or do the cpuid_count after testing
> vmx_pt_supported() (which you can use instead of going through kvm_x86_ops).

Hi Paolo,
Thanks for your reply. I have cached EAX in "struct pt_desc" and will 
initialize it in vmx_vcpu_setup().
+struct pt_desc {
+   unsigned int addr_num;
+   struct pt_ctx host;
+   struct pt_ctx guest;
+}; 
But kvm_init_msr_list() is invoked too early, so I have to read from the 
hardware there. What about a change like this?

-   cpuid_count(0x14, 1, &eax, &ebx, &ecx, &edx);
-   if (!kvm_x86_ops->pt_supported() || msrs_to_save[i] -
-   MSR_IA32_RTIT_ADDR0_A >= (eax & 0x7))
+   if (!kvm_x86_ops->pt_supported())
continue;
+   cpuid_count(0x14, 1, &eax, &ebx, &ecx, &edx);
+   if (msrs_to_save[i] -
+   MSR_IA32_RTIT_ADDR0_A >= (eax & 0x7))
+   continue;

Thanks,
Luwei Kang

> 
> Thanks,
> 
> Paolo
> 
> > +   if (!kvm_x86_ops->pt_supported() || msrs_to_save[i] -
> > +   MSR_IA32_RTIT_ADDR0_A >= (eax & 0x7))
> > +   continue;
> > +   break;



RE: [patch v2 3/8] KVM: x86: add Intel processor trace virtualization mode

2017-11-13 Thread Kang, Luwei
> > +   if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_PT_USE_GPA) ||
> > +   !(_vmexit_control & VM_EXIT_CLEAR_IA32_RTIT_CTL) ||
> > +   !(_vmentry_control & VM_ENTRY_LOAD_IA32_RTIT_CTL)) {
> > +   _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_PT_USE_GPA;
> 
> Also, you are not checking anywhere if the SUPPRESS_PIP controls are 
> available.  This is probably the best place.

SUPPRESS_PIP (it should be "CONCEAL"; I will fix that) controls processor 
trace packet generation.
I think we should clear it in SYSTEM mode (for example, PIPs are generated 
on VM exit with NonRoot=0, and on VM exit to SMM, VMCS packets are additionally 
generated). Why do we need to check this here?

> 
> > +   _vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
> > +   _vmentry_control &= ~VM_ENTRY_LOAD_IA32_RTIT_CTL;
> 
> These two are not needed; disabling SECONDARY_EXEC_PT_USE_GPA is enough.
> The tracing mode will revert to PT_SYSTEM, which does not use the load/clear 
> RTIT_CTL controls.
> 

The status of the *_RTIT_CTL entry/exit controls must match 
SECONDARY_EXEC_PT_USE_GPA, or VM entry would fail.
(Architecture Instruction Set Extensions Programming Reference, section 5.2.3)


RE: [patch v2 3/8] KVM: x86: add Intel processor trace virtualization mode

2017-11-13 Thread Kang, Luwei
> > +#define VM_EXIT_PT_SUPPRESS_PIP0x0100
> > +#define VM_EXIT_CLEAR_IA32_RTIT_CTL0x0200
> >
> >  #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR  0x00036dff
> >
> > @@ -108,6 +112,8 @@
> >  #define VM_ENTRY_LOAD_IA32_PAT 0x4000
> >  #define VM_ENTRY_LOAD_IA32_EFER 0x8000
> >  #define VM_ENTRY_LOAD_BNDCFGS   0x0001
> > +#define VM_ENTRY_PT_SUPPRESS_PIP   0x0002
> > +#define VM_ENTRY_LOAD_IA32_RTIT_CTL0x0004
> 
> 
> Please use PT_CONCEAL instead of PT_SUPPRESS_PIP, to better match the SDM 
> (for both vmexit and vmentry controls).
> 
> > +   if (!enable_ept)
> > +   vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
> > +
> 
> Why is this (and the similar bit-clear operation in vmx_vmentry_control) 
> needed only for !enable_ept?
> 
> Shouldn't it be like
> 
>   if (pt_mode == PT_MODE_SYSTEM) {
>   vmexit_control &= ~VM_EXIT_PT_SUPPRESS_PIP;
>   vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
>   }
> 
> and
> 
>   if (pt_mode == PT_MODE_SYSTEM) {
>   vmentry_control &= ~VM_ENTRY_PT_SUPPRESS_PIP;
>   vmentry_control &= ~VM_ENTRY_LOAD_IA32_RTIT_CTL;
>   }
> 

I think I misunderstood "always set "use GPA for processor tracing" in 
secondary execution control if it can be".
"use GPA for processor tracing" can't be set in SYSTEM mode even if the 
hardware supports the bit, because the guest would still treat the address as a 
GPA and translate it through EPT. In fact, RTIT_OUTPUT_BASE is always an HPA in 
SYSTEM mode.
Will fix in the next version.

Thanks,
Luwei Kang



RE: [patch v2 8/8] KVM: x86: Disable intercept for Intel processor trace MSRs

2017-11-13 Thread Kang, Luwei
> > +   if (pt_mode == PT_MODE_HOST_GUEST) {
> > +   u32 i, eax, ebx, ecx, edx;
> > +
> > +   cpuid_count(0x14, 1, &eax, &ebx, &ecx, &edx);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_STATUS, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_BASE, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_MASK, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_CR3_MATCH, false);
> > +   for (i = 0; i < (eax & 0x7); i++)
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_ADDR0_A + i,
> > +   false);
> > +   }
> > +
> 
> As I mentioned earlier, this probably makes vmentry/vmexit too expensive when 
> guests are not using processor tracing.  I would do  it only if guest 
> TRACEEN=1 (since anyway the values have to be correct if guest TRACEEN=1, and 
> a change in TRACEEN always causes a vmexit).
> 

Will change in next version.

Thanks,
Luwei Kang

> 
> > return alloc_kvm_area();
> >
> >  out:
> >



RE: [patch v2 7/8] KVM: x86: add Intel PT msr RTIT_CTL read/write

2017-11-13 Thread Kang, Luwei
> >  static DEFINE_PER_CPU(struct vmcs *, vmxarea);  static
> > DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -3384,6 +3385,11 @@
> > static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > return 1;
> > msr_info->data = vcpu->arch.ia32_xss;
> > break;
> > +   case MSR_IA32_RTIT_CTL:
> > +   if (!vmx_pt_supported())
> > +   return 1;
> > +   msr_info->data = vmcs_read64(GUEST_IA32_RTIT_CTL);
> > +   break;
> > case MSR_TSC_AUX:
> > if (!msr_info->host_initiated &&
> > !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP)) @@ -3508,6 
> > +3514,11
> > @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > else
> > clear_atomic_switch_msr(vmx, MSR_IA32_XSS);
> > break;
> > +   case MSR_IA32_RTIT_CTL:
> > +   if (!vmx_pt_supported() || to_vmx(vcpu)->nested.vmxon)
> > +   return 1;
> 
> VMXON must also clear TraceEn bit (see 23.4 in the SDM).

Will clear the TraceEn bit in the handle_vmon() function.

> 
> You also need to support R/W of all the other MSRs, in order to make them 
> accessible to userspace, and add them in msrs_to_save and kvm_init_msr_list.
> 

Will add it in the next version. This is used for live migration, is that right?

> Regarding the save/restore of the state, I am now wondering if you could also 
> use XSAVES and XRSTORS instead of multiple rdmsr/wrmsr in a loop.
> The cost is two MSR writes on vmenter (because the guest must run with 
> IA32_XSS=0) and two on vmexit.
> 

If we use XSAVES and XRSTORS for the context switch:
1. Before kvm_load_guest_fpu(vcpu), we need to save the host RTIT_CTL, disable 
PT, and restore the value of "vmx->pt_desc.guest.ctl" to GUEST_IA32_RTIT_CTL. 
Is that right?
2. After VM-exit (stepping out of kvm_x86_ops->run(vcpu)), we need to save the 
status of GUEST_IA32_RTIT_CTL. At that point TRACEEN=0, but the other MSRs 
still hold guest state. Where should we re-enable PT in host-guest mode? I 
think we should enable PT right after VM-exit, but that may cause a #GP: "If 
XRSTORS would restore (or initialize) PT state and IA32_RTIT_CTL.TraceEn = 1, 
the instruction causes a general-protection exception." (SDM 13.5.6). If we 
enable it after kvm_put_guest_fpu(), I think it is too late.

Thanks,
Luwei Kang
> 
> > +   vmcs_write64(GUEST_IA32_RTIT_CTL, data);
> > +   break;
> > case MSR_TSC_AUX:
> > if (!msr_info->host_initiated &&
> > !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> >



RE: [patch v2 4/8] KVM: x86: add Intel processor trace cpuid emulataion

2017-11-13 Thread Kang, Luwei
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index
> > 0099e10..ef19a11 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
> >  /* These are scattered features in cpufeatures.h. */
> >  #define KVM_CPUID_BIT_AVX512_4VNNIW 2
> >  #define KVM_CPUID_BIT_AVX512_4FMAPS 3
> > +#define KVM_CPUID_BIT_INTEL_PT 25
> 
> This is not necessary, because there is no need to place processor tracing in 
> scattered features.  Can you replace this hunk, and the KF usage below, with 
> the following patch?
> 

Yes, this looks good to me. Will fix in the next version. 

Thanks,
Luwei Kang

>  8< -
> From: Paolo Bonzini 
> Subject: [PATCH] x86: cpufeature: move processor tracing out of scattered 
> features
> 
> Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX), so do not 
> duplicate it in the scattered features word.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/include/asm/cpufeatures.h | 3 ++-
>  arch/x86/kernel/cpu/scattered.c| 1 -
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index 2519c6c801c9..839781e78763 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -199,7 +199,7 @@
>  #define X86_FEATURE_SME  ( 7*32+10) /* AMD Secure Memory 
> Encryption */
> 
>  #define X86_FEATURE_INTEL_PPIN   ( 7*32+14) /* Intel Processor Inventory 
> Number */
> -#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
> +
>  #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network 
> Instructions */  #define
> X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single 
> precision */
> 
> @@ -238,6 +238,7 @@
>  #define X86_FEATURE_AVX512IFMA  ( 9*32+21) /* AVX-512 Integer Fused 
> Multiply-Add instructions */
>  #define X86_FEATURE_CLFLUSHOPT   ( 9*32+23) /* CLFLUSHOPT instruction */
>  #define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */
> +#define X86_FEATURE_INTEL_PT ( 9*32+25) /* Intel Processor Trace */
>  #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
>  #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and 
> Reciprocal */
>  #define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */
> diff --git a/arch/x86/kernel/cpu/scattered.c 
> b/arch/x86/kernel/cpu/scattered.c index 05459ad3db46..d0e69769abfd 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -21,7 +21,6 @@ struct cpuid_bit {
>  static const struct cpuid_bit cpuid_bits[] = {
>   { X86_FEATURE_APERFMPERF,   CPUID_ECX,  0, 0x0006, 0 },
>   { X86_FEATURE_EPB,  CPUID_ECX,  3, 0x0006, 0 },
> - { X86_FEATURE_INTEL_PT, CPUID_EBX, 25, 0x0007, 0 },
>   { X86_FEATURE_AVX512_4VNNIW,CPUID_EDX,  2, 0x0007, 0 },
>   { X86_FEATURE_AVX512_4FMAPS,CPUID_EDX,  3, 0x0007, 0 },
>   { X86_FEATURE_CAT_L3,   CPUID_EBX,  1, 0x0010, 0 },
> 
> >  #define KF(x) bit(KVM_CPUID_BIT_##x)
> >
> >  int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -327,6 +328,7 @@
> > static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 
> > function,
> > unsigned f_invpcid = kvm_x86_ops->invpcid_supported() ? F(INVPCID) : 0;
> > unsigned f_mpx = kvm_mpx_supported() ? F(MPX) : 0;
> > unsigned f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
> > +   unsigned f_intel_pt = kvm_x86_ops->pt_supported() ? KF(INTEL_PT) :
> > +0;
> >
> > /* cpuid 1.edx */
> > const u32 kvm_cpuid_1_edx_x86_features =


RE: [patch v2 0/8] Intel Processor Trace virtulization enabling

2017-11-03 Thread Kang, Luwei
> > From v1:
> >  - remove guest-only mode because guest-only mode can be covered by 
> > host-guest mode;
> >  - always set "use GPA for processor tracing" in secondary execution 
> > control if it can be;
> >  - trap RTIT_CTL read/write. Forbid write this msr when VMXON in L1 
> > hypervisor.
> 
> Sounds fine.  However, it's likely that this will only be in Linux 4.16.
> 
OK.  I hope to get more comments on these patches.  Thanks.

Luwei Kang


RE: [PATCH 0/9] Intel Processor Trace virtulization enabling

2017-10-23 Thread Kang, Luwei
> > HI Paolo, Thanks for your clarify. Have understood. So, we should set
> > "use GPA for processor tracing" in any way( if we can do it) even in
> > system mode. There don't have problem in no nested but have problem in
> > nested if not set this bit. Still talking with  hardware designer but
> > please don't expect it can be change in SDM or hardware(fail vmentry
> > if they are not respected) soon.
> 
> No change in hardware is needed.
> 
> What I'm asking for is to define a bit in some architectural MSR such that, 
> _if the bit is 1_, you must have one of:
> 
> - RTIT_CTL = 0
> 
> - enable EPT = 0
> 
> - enable EPT = use GPA for processor tracing = 1, RTIT_CTL != 0
> 
> or vmentry would fail.
> 
> If the bit is 1 and RTIT_CTL = 0 and enable EPT = 1 and use GPA for processor 
> tracing = 0, the hypervisor must trap RTIT_CTL writes
> or behavior is undefined.
> 
> Processors would just set it to 0 and have absolutely no change in behavior.
> 

Got it. I will update you when the hardware designers have any response.

> > So, can we enable it in L1
> > guest only first?  I think it is not worth to disable EPT for L1 to
> > enable intel PT. what is your opinion?
> 
> Yes, we can enable it.  But since KVM sets IA32_VMX_MISC[14]=0, your patches 
> must forbid enabling processor trace during VMX
> operation.

The L1 hypervisor won't see the capability "TraceEn can be set in VMX 
operation", since we report IA32_VMX_MISC[bit 14] as 0.
We need to trap whether the L1 hypervisor has executed VMXON, and forbid 
enabling PT while it is in VMX operation. Is that right? Or is something else 
needed?

Thanks,
Luwei Kang

> 
> (In fact, another source of complexity is that we'd have to write the VMPTRLD 
> packet ourselves to the guest's processor trace
> buffer).
> 
> Paolo


RE: [PATCH 0/9] Intel Processor Trace virtulization enabling

2017-10-19 Thread Kang, Luwei
> On 19/10/2017 07:54, Kang, Luwei wrote:
> >>> Get it. I have feedback to hardware architect. I hope it can be applied 
> >>> but it may need wait a long time.
> >> Note that the hardware need not do anything.  However it would be
> >> nice if the SDM can define a bit _for the hypervisors_ to enforce the 
> >> above constraint and fail vmentry if they are not respected.
> >
> > Hi Paolo,
> > Thanks for your response. I have a question want to ask for you. As
> > you mentioned in previous mail " We would like the nested hypervisor
> > to be forced to set the "use GPA for processor tracing"". Is there
> > have any problem if we don't set "use GPA for processor tracing" in
> > nested hypervisor?
> 
> If the nested hypervisor doesn't set "use GPA for processor tracing", the 
> processor should use L1 addresses for processor tracing.
> This however is not possible without shadowing the ToPA, same as in 
> non-nested virtualization for <=Skylake processors.
> 
> So we have
> 
> |  mode
> |---
> nested? |  system-wide  |  host-guest
> +---+---
> no  |  use HPA for tracing  |  use GPA for tracing
> |  (no EPT) |  (EPT is GPA->HPA)
> +---+---
> yes |  use GPA for tracing  |  use nGPA for tracing
> |  (EPT is nGPA->HPA!!) |  (EPT is nGPA->HPA, so ok)
> 
> (for nested, L0 mode of course must be host-guest).  If the nested hypervisor 
> wants to use system-wide tracing, it cannot use "use
> GPA for tracing" because the EPT table doesn't have the right mapping of 
> L1->L0 physical address.
> 
> So if you want to do system-wide L1 tracing you have to disable EPT for L1, 
> and if you want to do host-guest L1 tracing you have to
> enable it.
> 
Hi Paolo,
Thanks for your clarification; I understand now. So we should set "use GPA for 
processor tracing" whenever we can, even in system mode. There is no problem in 
the non-nested case, but there is a problem in the nested case if this bit is 
not set.
I am still talking with the hardware designers, but please don't expect the 
SDM or hardware to change (failing vmentry if the constraints are not 
respected) soon. So, can we enable it for the L1 guest only first? I think it 
is not worth disabling EPT for L1 to enable Intel PT. What is your opinion?

Thanks,
Luwei Kang


RE: [PATCH 0/9] Intel Processor Trace virtulization enabling

2017-10-18 Thread Kang, Luwei
>  Nested virtualization is interesting.  We would like the nested
>  hypervisor to be forced to set the "use GPA for processor tracing"
>  secondary execution control whenever "enable EPT" is set and
>  RTIT_CTL is nonzero.  There is no way to encode that in
>  IA32_VMX_PROCBASED_CTLS2, however.  It would be nice if Intel could
>  reserve a bit in IA32_VMX_EPT_VPID_CAP for KVM to express that
>  constraint.
> >>>
> >>> Do you mean if nested hypervisor get the capability of "Guest PT use
> >>> GPA" and EPT has enable. Highly recommend nested hypervisor set "
> >>> Guest PT use GPA " as well.
> >>
> >> Well, it's required more than recommended.  However, it's only required if 
> >> "enable EPT" is set and RTIT_CTL is nonzero.
> >>
> >>> If nested hypervisor is also KVM, "use GPA for processor tracing"
> >>> will be set for sure. But other hypervisor may not do that. So, we'd
> >>> better add a flag in IA32_VMX_EPT_VPID_CAP to express that constraint.
> >>
> >> Correct.  The constraint would be:
> >>
> >> * RTIT_CTL on entry is zero if EPT is disabled
> >>
> >> * RTIT_CTL on entry is zero if EPT is enabled and "Guest PT uses GPA"
> >> is zero
> >>
> >> Maybe IA32_VMX_EPT_VPID_CAP is not the best place.  I'll let Intel decide 
> >> that.
> >
> > Get it. I have feedback to hardware architect. I hope it can be applied but 
> > it may need wait a long time.
> 
> Note that the hardware need not do anything.  However it would be nice if the 
> SDM can define a bit _for the hypervisors_ to
> enforce the above constraint and fail vmentry if they are not respected.
> 

Hi Paolo,
Thanks for your response. I have a question for you. As you mentioned in a 
previous mail, "We would like the nested hypervisor to be forced to set the 
'use GPA for processor tracing'".
Is there any problem if we don't set "use GPA for processor tracing" in the 
nested hypervisor? If so, what should we do? In patch 9, I pass through the 
PT MSRs (IA32_RTIT_*) to the guest.

Thanks,
Luwei Kang


RE: [PATCH 0/9] Intel Processor Trace virtulization enabling

2017-10-18 Thread Kang, Luwei
> >> Nested virtualization is interesting.  We would like the nested
> >> hypervisor to be forced to set the "use GPA for processor tracing"
> >> secondary execution control whenever "enable EPT" is set and RTIT_CTL
> >> is nonzero.  There is no way to encode that in
> >> IA32_VMX_PROCBASED_CTLS2, however.  It would be nice if Intel could
> >> reserve a bit in IA32_VMX_EPT_VPID_CAP for KVM to express that
> >> constraint.
> >
> > Do you mean if nested hypervisor get the capability of "Guest PT use
> > GPA" and EPT has enable. Highly recommend nested hypervisor set "
> > Guest PT use GPA " as well.
> 
> Well, it's required more than recommended.  However, it's only required if 
> "enable EPT" is set and RTIT_CTL is nonzero.
> 
> > If nested hypervisor is also KVM, "use GPA for processor tracing"
> > will be set for sure. But other hypervisor may not do that. So, we'd
> > better add a flag in IA32_VMX_EPT_VPID_CAP to express that constraint.
> 
> Correct.  The constraint would be:
> 
> * RTIT_CTL on entry is zero if EPT is disabled
> 
> * RTIT_CTL on entry is zero if EPT is enabled and "Guest PT uses GPA" is zero
> 
> Maybe IA32_VMX_EPT_VPID_CAP is not the best place.  I'll let Intel decide 
> that.
> 

Got it. I have given this feedback to the hardware architects. I hope it can 
be applied, but it may take a long time.

Thanks,
Luwei Kang
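
The vmentry constraint quoted above can be sketched as a consistency check. 
This is an illustrative model only — the struct and field names are invented 
here, not actual KVM identifiers:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check for the constraint discussed above: a nested
 * vmentry should fail if L1 leaves RTIT_CTL nonzero while either
 * EPT is disabled or "use GPA for processor tracing" is clear.
 */
struct pt_entry_state {
    uint64_t guest_rtit_ctl; /* IA32_RTIT_CTL loaded on entry */
    bool enable_ept;         /* "enable EPT" secondary control */
    bool pt_use_gpa;         /* "use GPA for processor tracing" */
};

/* Return true if the vmentry should fail. */
static bool pt_entry_invalid(const struct pt_entry_state *s)
{
    if (s->guest_rtit_ctl == 0)
        return false;        /* tracing disabled: always valid */
    if (!s->enable_ept)
        return true;         /* nonzero RTIT_CTL requires EPT... */
    return !s->pt_use_gpa;   /* ...and "use GPA" must be set */
}
```

This is the check a capability bit (wherever it ends up, IA32_VMX_EPT_VPID_CAP 
or elsewhere) would let L1 hypervisors advertise and L0 enforce.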


RE: [PATCH 0/9] Intel Processor Trace virtulization enabling

2017-10-18 Thread Kang, Luwei
> On 16/10/2017 14:09, Luwei Kang wrote:
> > 2. Enabling use of EPT to redirect PT output.
> >   — This enables the VMM to elect to virtualize the PT output buffer using 
> > EPT. In this mode, the CPU will treat PT output
> addresses as Guest Physical Addresses (GPAs) and translate them using EPT. 
> This means that Intel PT output reads (of the ToPA
> table) and writes (of trace output) can cause EPT violations, and other 
> output events.
> 
> Yay yay yay! ;)
> 
> > Intel Processor Trace virtualization can be work in one of 4 possible modes 
> > by set new option "pt_mode". Default is host guest
> mode.
> >  a. system-wide: trace both host/guest and output to host buffer;  b.
> > host-only: only trace host and output to host buffer;  c. guest-only:
> > only trace guest and output to guest buffer;  d. host-guest: trace
> > host/guest simultaneous and output to their respective buffer.
> 
> I think the guest-only mode isn't particularly useful, and I would remove it.

Agree, will remove it.

> 
> Nested virtualization is interesting.  We would like the nested hypervisor to 
> be forced to set the "use GPA for processor tracing"
> secondary execution control whenever "enable EPT" is set and RTIT_CTL is 
> nonzero.  There is no way to encode that in
> IA32_VMX_PROCBASED_CTLS2, however.  It would be nice if Intel could reserve a 
> bit in IA32_VMX_EPT_VPID_CAP for KVM to
> express that constraint.

Do you mean: if the nested hypervisor reports the "Guest PT uses GPA" 
capability and EPT is enabled, it is highly recommended that the nested 
hypervisor set "Guest PT uses GPA" as well?
If the nested hypervisor is also KVM, "use GPA for processor tracing" will be 
set for sure, but other hypervisors may not do that. So we'd better add a 
flag in IA32_VMX_EPT_VPID_CAP to express that constraint. Is that right?

Thanks,
Luwei Kang
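
The four operating modes listed in the cover letter can be sketched as an 
option parser. The names and default below are illustrative — the real patch 
series defines its own PT_MODE_* constants, and guest-only is slated for 
removal per the discussion above:

```c
#include <string.h>

/* Illustrative encoding of the proposed "pt_mode" option. */
enum pt_mode {
    PT_MODE_SYSTEM,     /* a. trace host+guest into host buffer */
    PT_MODE_HOST,       /* b. trace host only */
    PT_MODE_GUEST,      /* c. trace guest only (to be dropped) */
    PT_MODE_HOST_GUEST, /* d. separate buffers (stated default) */
};

static enum pt_mode parse_pt_mode(const char *opt)
{
    if (!strcmp(opt, "system")) return PT_MODE_SYSTEM;
    if (!strcmp(opt, "host"))   return PT_MODE_HOST;
    if (!strcmp(opt, "guest"))  return PT_MODE_GUEST;
    return PT_MODE_HOST_GUEST;  /* default, per the cover letter */
}
```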


RE: [PATCH 9/9] KVM: x86: Disable intercept for Intel processor trace MSRs

2017-10-18 Thread Kang, Luwei
> On October 16, 2017 8:14:11 AM EDT, Luwei Kang  wrote:
> >From: Chao Peng 
> >
> >Trap for Intel processor trace is none sense. Pass through to guest
> >directly.
> 
> 
> And none of those MSRs can be subverted by the guest? That is none of these 
> should be filtered / audited first?
> 
   Currently, I think it is not necessary. For example, in host-guest mode the 
host context (MSRs) is saved and the guest context restored before VM entry, 
and IA32_RTIT_CTL is loaded from the associated guest-state field of the VMCS 
on VM entry. I am not quite clear on your point; could you describe a 
specific scenario?

Thanks,
Luwei Kang

> 
> >
> >Signed-off-by: Chao Peng 
> >Signed-off-by: Luwei Kang 
> >---
> > arch/x86/kvm/vmx.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> >diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> >3c9ce3e..58606ce 100644
> >--- a/arch/x86/kvm/vmx.c
> >+++ b/arch/x86/kvm/vmx.c
> >@@ -7076,6 +7076,20 @@ static __init int hardware_setup(void)
> > if (pt_mode == PT_MODE_GUEST)
> > pt_register_virt_ops(&pt_virt_ops);
> >
> >+if (pt_mode == PT_MODE_GUEST || pt_mode == PT_MODE_HOST_GUEST) {
> >+u32 i, eax, ebx, ecx, edx;
> >+
> >+cpuid_count(0x14, 1, &eax, &ebx, &ecx, &edx);
> >+vmx_disable_intercept_for_msr(MSR_IA32_RTIT_CTL, false);
> >+vmx_disable_intercept_for_msr(MSR_IA32_RTIT_STATUS, false);
> >+vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_BASE, false);
> >+vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_MASK, false);
> >+vmx_disable_intercept_for_msr(MSR_IA32_RTIT_CR3_MATCH, false);
> >+for (i = 0; i < (eax & 0x7); i++)
> >+vmx_disable_intercept_for_msr(MSR_IA32_RTIT_ADDR0_A + i,
> >+false);
> >+}
> >+
> > return alloc_kvm_area();
> >
> > out:
> 
> 
> Thanks!
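
Conceptually, each vmx_disable_intercept_for_msr() call in the patch above 
clears the MSR's bit in the VMX MSR bitmap, so guest accesses to that MSR no 
longer cause a vmexit. A simplified model of the low-MSR portion of that 
bitmap (the real KVM layout also covers high MSRs and keeps separate 
read/write halves):

```c
#include <stdint.h>
#include <string.h>

/*
 * Simplified model: one bit per low MSR (0x0000-0x1FFF); a set bit
 * means access vmexits, a clear bit means the guest accesses the
 * MSR directly.
 */
#define LOW_MSR_COUNT     0x2000
#define MSR_IA32_RTIT_CTL 0x570  /* per the Intel SDM MSR listing */

static void disable_intercept(uint8_t *bitmap, uint32_t msr)
{
    if (msr < LOW_MSR_COUNT)
        bitmap[msr / 8] &= (uint8_t)~(1u << (msr % 8));
}
```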


RE: [PATCH] KVM: x86: Expose more Intel AVX512 feature to guest

2016-08-11 Thread Kang, Luwei
> Expose AVX512DQ, AVX512BW, AVX512VL feature to guest.
> Its spec can be found at:
> https://software.intel.com/sites/default/files/managed/b4/3a/319433-024.pdf
> 
> Signed-off-by: Luwei Kang 
> ---
>  arch/x86/kvm/cpuid.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 
> 7597b42..a2d007c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -366,7 +366,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
> *entry, u32 function,
>   F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
>   F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) |
>   F(ADX) | F(SMAP) | F(AVX512F) | F(AVX512PF) | F(AVX512ER) |
> - F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(PCOMMIT);
> + F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(PCOMMIT) |
> + F(AVX512DQ) | F(AVX512BW) | F(AVX512VL);
> 

Hi Paolo,
What is your opinion?

>   /* cpuid 0xD.1.eax */
>   const u32 kvm_cpuid_D_1_eax_x86_features =
> --
> 2.7.4
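
The diff above adds the three feature flags to KVM's whitelist mask; the 
guest then sees a CPUID bit only if both KVM and the host CPU support it. A 
sketch of that masking, using the Intel SDM bit positions for 
CPUID.(EAX=07H,ECX=0):EBX:

```c
#include <stdint.h>

/* CPUID.(EAX=07H,ECX=0):EBX bit positions, per the Intel SDM. */
#define F_AVX512DQ (1u << 17)
#define F_AVX512BW (1u << 30)
#define F_AVX512VL (1u << 31)

/*
 * Sketch of the masking __do_cpuid_ent performs: the guest-visible
 * EBX is the host's EBX ANDed with KVM's supported-feature mask.
 * "host_ebx" would come from executing CPUID on the host.
 */
static uint32_t guest_visible_ebx(uint32_t host_ebx)
{
    const uint32_t kvm_mask = F_AVX512DQ | F_AVX512BW | F_AVX512VL;
    return host_ebx & kvm_mask;
}
```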


