On Fri, Sep 28, 2012 at 02:07:26AM +0000, Auld, Will wrote:
> Marcelo,
> 
> I tagged my comments below with "[auld]" to make it easier to read. 
> 
> Thanks,
> 
> Will
> 
> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosa...@redhat.com] 
> Sent: Thursday, September 27, 2012 4:49 AM
> To: Auld, Will
> Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong
> Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> On Thu, Sep 27, 2012 at 08:31:22AM -0300, Marcelo Tosatti wrote:
> > On Thu, Sep 27, 2012 at 12:50:16AM +0000, Auld, Will wrote:
> > > Marcelo,
> > > 
> > > I think I am missing something. There should be no needed changes to 
> > > current algorithms that exist today. Does it seem that I have broken 
> > > Zachary's implementation somehow?
> > 
> > Yes. compute_guest_tsc() function must take ia32_tsc_adjust into 
> > account. guest_read_tsc (and the SVM equivalent) also.
> 
> [auld] I don't see how that function is broken. 

compute_guest_tsc() should return the TSC value accordingly to what is
emulated via vcpu->arch.virtual_tsc_mult, but this can be fixed later.

> Also, must take into account VMX->SVM migration. In that case, you should 
> export IA32_TSC_ADJUST along with IA32_TSC MSR.
> 
> [auld] I'll give this more thought. Two different ways to go, allow this to 
> only work on host processors with this feature or enable this for all VM 
> independent of the underlying host processor capability. In the former case 
> migrating cross architecture might be disallowed. In the later case sending 
> only IA32_TSC on migration should be enough as the delta would be accounted 
> for in tsc_offset of the control structure.

That is fine, yes, if you want to migrate across, don't expose the
feature.

> 
> Which brings us back to the initial question, if there are other means to 
> provide stable TSC, why use this MSR? For example, VMWare guests have no need 
> to use this MSR (because the hypervisor provides TSC guarantees).
> 
> [auld] Using this MSR simplifies the process of synchronizing the tsc for 
> each logical processor because its value does not change with the clock. How 
> do you write the same value to all the IA32_TIME_STAMP_COUNTER MSR? Well, 
> figure out what you want to write there, get all the processors to rendezvous 
> at the same time, have all logical processors complete their writes in a very 
> small amount of time. This is in contrast to deciding the offset to write and 
> then having all the logical processors write the offset. No worries about 
> rendezvous, synchronization of the writes in time and such.  
> 
> Then we come back to the two questions: 
> 
> - Is there anyone from Intel working on the Linux host side, where it
>   makes sense to use this?
> 
> [auld] I am not aware of anyone working on this for Linux.
> 
> - Are you sure its worthwhile to expose this to KVM guests?
> 
> [auld] At least one OS is moving to implement this that is commonly used as a 
> guest. 

OK thanks.

> 
> > > 
> > > Thanks,
> > > 
> > > Will
> > > 
> > > -----Original Message-----
> > > From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> > > Sent: Wednesday, September 26, 2012 5:29 PM
> > > To: Auld, Will
> > > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong
> > > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > 
> > > On Wed, Sep 26, 2012 at 10:58:46PM +0000, Auld, Will wrote:
> > > > Avi, Still working on your suggestions.
> > > > 
> > > > Marcelo,
> > > > 
> > > > The purpose is to be able to run guests that implement this change and 
> > > > not require they revert to the older method of adjusting the TSC. I am 
> > > > making no assumption about whether the guest checks to see if the times 
> > > > are good enough or just runs an algorithm every time but in any case 
> > > > this would allow the simpler, cleaner and less expensive algorithm to 
> > > > run if it exists. 
> > > 
> > > Will, you can choose to not expose the feature. Correct?
> > > 
> > > Because this conflicts with the model that has been envisioned and 
> > > developed by Zachary... for that model to continue to be functional 
> > > you'll have to make sure the TSC emulation is adjusted accordingly to 
> > > consider IA32_TSC_ADJUST (for example, when trapping TSC).
> > > 
> > > >From that point of view, the patch below is incomplete.
> > > 
> > > ... or KVM can choose to never expose the feature via CPUID and handle 
> > > TSC consistency itself (i understand your perspective of getting a task 
> > > complete, but unfortunately from my POV its not so simple).
> > > 
> > > > Thanks,
> > > > 
> > > > Will
> > > > 
> > > > >The purpose of the IA32_TSC_ADJUST control is to make it easier for 
> > > > >the operating system >(host) to decrease the delta between cores to an 
> > > > >acceptable value, so that applications >can make use of direct RDTSC, 
> > > > >correct?
> > > > >
> > > > >Why is it necessary for the guests to make use of such interface, if 
> > > > >the hypervisor >could provide proper TSC?
> > > > >
> > > > >(not against exposing it to the guests, just thinking out loud).
> > > > >
> > > > >That is, if the purpose of the IA32_TSC_ADJUST is to provide proper 
> > > > >synchronized TSC >across cores, and newer guests which should already 
> > > > >make use of paravirt clock >interface, what is the point of exposing 
> > > > >the feature?
> > > > 
> > > > -----Original Message-----
> > > > From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> > > > Sent: Wednesday, September 26, 2012 2:35 PM
> > > > To: Auld, Will
> > > > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao
> > > > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > > 
> > > > On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote:
> > > > > >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 
> > > > > >00:00:00
> > > > > >2001
> > > > > From: Will Auld <will.a...@intel.com>
> > > > > Date: Wed, 12 Sep 2012 18:10:56 -0700
> > > > > Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> > > > > 
> > > > > CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is 
> > > > > supported
> > > > > 
> > > > > Basic design is to emulate the MSR by allowing reads and writes to a 
> > > > > guest vcpu specific location to store the value of the emulated MSR 
> > > > > while adding the value to the vmcs tsc_offset. In this way the 
> > > > > IA32_TSC_ADJUST value will be included in all reads to the TSC MSR 
> > > > > whether through rdmsr or rdtsc. This is of course as long as the "use 
> > > > > TSC counter offsetting" VM-execution control is enabled as well as 
> > > > > the IA32_TSC_ADJUST control.
> > > > > 
> > > > > However, because hardware will only return the TSC + IA32_TSC_ADJUST 
> > > > > + vmsc tsc_offset for a guest process when it does and rdtsc (with 
> > > > > the correct settings) the value of our virtualized IA32_TSC_ADJUST 
> > > > > must be stored in one of these three locations.
> > > > 
> > > > The purpose of the IA32_TSC_ADJUST control is to make it easier for the 
> > > > operating system (host) to decrease the delta between cores to an 
> > > > acceptable value, so that applications can make use of direct RDTSC, 
> > > > correct?
> > > > 
> > > > Why is it necessary for the guests to make use of such interface, if 
> > > > the hypervisor could provide proper TSC?
> > > > 
> > > > (not against exposing it to the guests, just thinking out loud).
> > > > 
> > > > That is, if the purpose of the IA32_TSC_ADJUST is to provide proper 
> > > > synchronized TSC across cores, and newer guests which should already 
> > > > make use of paravirt clock interface, what is the point of exposing the 
> > > > feature?
> > > > 
> > > > >  The argument against storing it in the actual MSR is performance.
> > > > >  This is likely to be seldom used while the save/restore is 
> > > > > required on every transition. IA32_TSC_ADJUST was created as a 
> > > > > way to solve some issues with writing TSC itself so that is not an 
> > > > > option  either.
> > > > > The remaining option, defined above as our solution has  the 
> > > > > problem of returning incorrect vmcs tsc_offset values (unless  
> > > > > we intercept and fix, not done here) as mentioned above. 
> > > > > However,  more problematic is that storing the data in vmcs 
> > > > > tsc_offset will  have a different semantic effect on the system 
> > > > > than does using  the actual MSR. This is illustrated in the 
> > > > > following example: The  hypervisor set the IA32_TSC_ADJUST, then the 
> > > > > guest sets it and a  guest process perfor!
> > > > > ms a rdtsc. In this case the guest process will  get TSC + 
> > > > > IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including 
> > > > > IA32_TSC_ADJUST_guest. While the total system semantics changed 
> > > > > the semantics as seen by the guest do not and hence this will 
> > > > > not cause a problem.
> > > > > ---
> > > > 
> > > > >  arch/x86/include/asm/cpufeature.h |    1 +
> > > > >  arch/x86/include/asm/kvm_host.h   |    2 ++
> > > > >  arch/x86/include/asm/msr-index.h  |    1 +
> > > > >  arch/x86/kvm/cpuid.c              |    4 ++--
> > > > >  arch/x86/kvm/vmx.c                |   12 ++++++++++++
> > > > >  arch/x86/kvm/x86.c                |    1 +
> > > > >  6 files changed, 19 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/arch/x86/include/asm/cpufeature.h
> > > > > b/arch/x86/include/asm/cpufeature.h
> > > > > index 6b7ee5f..e574d81 100644
> > > > > --- a/arch/x86/include/asm/cpufeature.h
> > > > > +++ b/arch/x86/include/asm/cpufeature.h
> > > > > @@ -199,6 +199,7 @@
> > > > >  
> > > > >  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 
> > > > > 9 */
> > > > >  #define X86_FEATURE_FSGSBASE (9*32+ 0) /* {RD/WR}{FS/GS}BASE 
> > > > > instructions*/
> > > > > +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 
> > > > > +0x3b */
> > > > >  #define X86_FEATURE_BMI1     (9*32+ 3) /* 1st group bit manipulation 
> > > > > extensions */
> > > > >  #define X86_FEATURE_HLE              (9*32+ 4) /* Hardware Lock 
> > > > > Elision */
> > > > >  #define X86_FEATURE_AVX2     (9*32+ 5) /* AVX2 instructions */
> > > > > diff --git a/arch/x86/include/asm/kvm_host.h 
> > > > > b/arch/x86/include/asm/kvm_host.h index 09155d6..8a001a4 100644
> > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
> > > > >       u32 virtual_tsc_mult;
> > > > >       u32 virtual_tsc_khz;
> > > > >  
> > > > > +     s64 tsc_adjust;
> > > > > +
> > > > >       atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
> > > > >       unsigned nmi_pending; /* NMI queued after currently running 
> > > > > handler */
> > > > >       bool nmi_injected;    /* Trying to inject an NMI this entry */
> > > > > diff --git a/arch/x86/include/asm/msr-index.h
> > > > > b/arch/x86/include/asm/msr-index.h
> > > > > index 957ec87..8e82e29 100644
> > > > > --- a/arch/x86/include/asm/msr-index.h
> > > > > +++ b/arch/x86/include/asm/msr-index.h
> > > > > @@ -231,6 +231,7 @@
> > > > >  #define MSR_IA32_EBL_CR_POWERON              0x0000002a
> > > > >  #define MSR_EBC_FREQUENCY_ID         0x0000002c
> > > > >  #define MSR_IA32_FEATURE_CONTROL        0x0000003a
> > > > > +#define MSR_TSC_ADJUST                               0x0000003b
> > > > >  
> > > > >  #define FEATURE_CONTROL_LOCKED                               (1<<0)
> > > > >  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX     (1<<1)
> > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 
> > > > > 0595f13..8f5943e 100644
> > > > > --- a/arch/x86/kvm/cpuid.c
> > > > > +++ b/arch/x86/kvm/cpuid.c
> > > > > @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct 
> > > > > kvm_cpuid_entry2 *entry, u32 function,
> > > > >  
> > > > >       /* cpuid 7.0.ebx */
> > > > >       const u32 kvm_supported_word9_x86_features =
> > > > > -             F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> > > > > -             F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> > > > > +             F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> > > > > +             F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | 
> > > > > F(RTM);
> > > > >  
> > > > >       /* all calls to cpuid_count() should be made on the same cpu */
> > > > >       get_cpu();
> > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> > > > > c00f03d..35d11b3 100644
> > > > > --- a/arch/x86/kvm/vmx.c
> > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, 
> > > > > u32 msr_index, u64 *pdata)
> > > > >       case MSR_IA32_SYSENTER_ESP:
> > > > >               data = vmcs_readl(GUEST_SYSENTER_ESP);
> > > > >               break;
> > > > > +     case MSR_TSC_ADJUST:
> > > > > +             data = (u64)vcpu->arch.tsc_adjust;
> > > > > +             break;
> > > > >       case MSR_TSC_AUX:
> > > > >               if (!to_vmx(vcpu)->rdtscp_enabled)
> > > > >                       return 1;
> > > > > @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, 
> > > > > u32 msr_index, u64 data)
> > > > >               }
> > > > >               ret = kvm_set_msr_common(vcpu, msr_index, data);
> > > > >               break;
> > > > > +     case MSR_TSC_ADJUST:
> > > > > +#define DUMMY 1
> > > > > +             vmx_adjust_tsc_offset(vcpu,
> > > > > +                             (s64)(data-vcpu->arch.tsc_adjust),
> > > > > +                             (bool)DUMMY);
> > > > > +             vcpu->arch.tsc_adjust = (s64)data;
> > > > > +             break;
> > > > >       case MSR_TSC_AUX:
> > > > >               if (!vmx->rdtscp_enabled)
> > > > >                       return 1;
> > > > > @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu
> > > > > *vcpu)
> > > > >  
> > > > >       vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << 
> > > > > VCPU_REGS_RSP));
> > > > >  
> > > > > +     vcpu->arch.tsc_adjust = 0x0;
> > > > > +
> > > > >       vmx->rmode.vm86_active = 0;
> > > > >  
> > > > >       vmx->soft_vnmi_blocked = 0;
> > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
> > > > > 42bce48..6c50f6c 100644
> > > > > --- a/arch/x86/kvm/x86.c
> > > > > +++ b/arch/x86/kvm/x86.c
> > > > > @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = {  static 
> > > > > unsigned num_msrs_to_save;
> > > > >  
> > > > >  static u32 emulated_msrs[] = {
> > > > > +     MSR_TSC_ADJUST,
> > > > >       MSR_IA32_TSCDEADLINE,
> > > > >       MSR_IA32_MISC_ENABLE,
> > > > >       MSR_IA32_MCG_STATUS,
> > > > > --
> > > > > 1.7.1
> > > > > 
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe kvm" 
> > > > > in the body of a message to majord...@vger.kernel.org More 
> > > > > majordomo info at http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in the 
> > body of a message to majord...@vger.kernel.org More majordomo info at  
> > http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to