On Fri, Sep 28, 2012 at 02:07:26AM +0000, Auld, Will wrote: > Marcelo, > > I tagged my comments below with "[auld]" to make it easier to read. > > Thanks, > > Will > > -----Original Message----- > From: Marcelo Tosatti [mailto:mtosa...@redhat.com] > Sent: Thursday, September 27, 2012 4:49 AM > To: Auld, Will > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM > > On Thu, Sep 27, 2012 at 08:31:22AM -0300, Marcelo Tosatti wrote: > > On Thu, Sep 27, 2012 at 12:50:16AM +0000, Auld, Will wrote: > > > Marcelo, > > > > > > I think I am missing something. There should be no needed changes to > > > current algorithms that exist today. Does it seem that I have broken > > > Zachary's implementation somehow? > > > > Yes. compute_guest_tsc() function must take ia32_tsc_adjust into > > account. guest_read_tsc (and the SVM equivalent) also. > > [auld] I don't see how that function is broken.
compute_guest_tsc() should return the TSC value accordingly to what is emulated via vcpu->arch.virtual_tsc_mult, but this can be fixed later. > Also, must take into account VMX->SVM migration. In that case, you should > export IA32_TSC_ADJUST along with IA32_TSC MSR. > > [auld] I'll give this more thought. Two different ways to go, allow this to > only work on host processors with this feature or enable this for all VM > independent of the underlying host processor capability. In the former case > migrating cross architecture might be disallowed. In the later case sending > only IA32_TSC on migration should be enough as the delta would be accounted > for in tsc_offset of the control structure. That is fine, yes, if you want to migrate across, don't expose the feature. > > Which brings us back to the initial question, if there are other means to > provide stable TSC, why use this MSR? For example, VMWare guests have no need > to use this MSR (because the hypervisor provides TSC guarantees). > > [auld] Using this MSR simplifies the process of synchronizing the tsc for > each logical processor because its value does not change with the clock. How > do you write the same value to all the IA32_TIME_STAMP_COUNTER MSR? Well, > figure out what you want to write there, get all the processors to rendezvous > at the same time, have all logical processors complete their writes in a very > small amount of time. This is in contrast to deciding the offset to write and > then having all the logical processors write the offset. No worries about > rendezvous, synchronization of the writes in time and such. > > Then we come back to the two questions: > > - Is there anyone from Intel working on the Linux host side, where it > makes sense to use this? > > [auld] I am not aware of anyone working on this for Linux. > > - Are you sure its worthwhile to expose this to KVM guests? > > [auld] At least one OS is moving to implement this that is commonly used as a > guest. OK thanks. > > > > > > > Thanks, > > > > > > Will > > > > > > -----Original Message----- > > > From: Marcelo Tosatti [mailto:mtosa...@redhat.com] > > > Sent: Wednesday, September 26, 2012 5:29 PM > > > To: Auld, Will > > > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong > > > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM > > > > > > On Wed, Sep 26, 2012 at 10:58:46PM +0000, Auld, Will wrote: > > > > Avi, Still working on your suggestions. > > > > > > > > Marcelo, > > > > > > > > The purpose is to be able to run guests that implement this change and > > > > not require they revert to the older method of adjusting the TSC. I am > > > > making no assumption about whether the guest checks to see if the times > > > > are good enough or just runs an algorithm every time but in any case > > > > this would allow the simpler, cleaner and less expensive algorithm to > > > > run if it exists. > > > > > > Will, you can choose to not expose the feature. Correct? > > > > > > Because this conflicts with the model that has been envisioned and > > > developed by Zachary... for that model to continue to be functional > > > you'll have to make sure the TSC emulation is adjusted accordingly to > > > consider IA32_TSC_ADJUST (for example, when trapping TSC). > > > > > > >From that point of view, the patch below is incomplete. > > > > > > ... or KVM can choose to never expose the feature via CPUID and handle > > > TSC consistency itself (i understand your perspective of getting a task > > > complete, but unfortunately from my POV its not so simple). > > > > > > > Thanks, > > > > > > > > Will > > > > > > > > >The purpose of the IA32_TSC_ADJUST control is to make it easier for > > > > >the operating system >(host) to decrease the delta between cores to an > > > > >acceptable value, so that applications >can make use of direct RDTSC, > > > > >correct? > > > > > > > > > >Why is it necessary for the guests to make use of such interface, if > > > > >the hypervisor >could provide proper TSC? > > > > > > > > > >(not against exposing it to the guests, just thinking out loud). > > > > > > > > > >That is, if the purpose of the IA32_TSC_ADJUST is to provide proper > > > > >synchronized TSC >across cores, and newer guests which should already > > > > >make use of paravirt clock >interface, what is the point of exposing > > > > >the feature? > > > > > > > > -----Original Message----- > > > > From: Marcelo Tosatti [mailto:mtosa...@redhat.com] > > > > Sent: Wednesday, September 26, 2012 2:35 PM > > > > To: Auld, Will > > > > Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao > > > > Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM > > > > > > > > On Wed, Sep 19, 2012 at 05:44:46PM +0000, Auld, Will wrote: > > > > > >From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 > > > > > >00:00:00 > > > > > >2001 > > > > > From: Will Auld <will.a...@intel.com> > > > > > Date: Wed, 12 Sep 2012 18:10:56 -0700 > > > > > Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM > > > > > > > > > > CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is > > > > > supported > > > > > > > > > > Basic design is to emulate the MSR by allowing reads and writes to a > > > > > guest vcpu specific location to store the value of the emulated MSR > > > > > while adding the value to the vmcs tsc_offset. In this way the > > > > > IA32_TSC_ADJUST value will be included in all reads to the TSC MSR > > > > > whether through rdmsr or rdtsc. This is of course as long as the "use > > > > > TSC counter offsetting" VM-execution control is enabled as well as > > > > > the IA32_TSC_ADJUST control. > > > > > > > > > > However, because hardware will only return the TSC + IA32_TSC_ADJUST > > > > > + vmsc tsc_offset for a guest process when it does and rdtsc (with > > > > > the correct settings) the value of our virtualized IA32_TSC_ADJUST > > > > > must be stored in one of these three locations. > > > > > > > > The purpose of the IA32_TSC_ADJUST control is to make it easier for the > > > > operating system (host) to decrease the delta between cores to an > > > > acceptable value, so that applications can make use of direct RDTSC, > > > > correct? > > > > > > > > Why is it necessary for the guests to make use of such interface, if > > > > the hypervisor could provide proper TSC? > > > > > > > > (not against exposing it to the guests, just thinking out loud). > > > > > > > > That is, if the purpose of the IA32_TSC_ADJUST is to provide proper > > > > synchronized TSC across cores, and newer guests which should already > > > > make use of paravirt clock interface, what is the point of exposing the > > > > feature? > > > > > > > > > The argument against storing it in the actual MSR is performance. > > > > > This is likely to be seldom used while the save/restore is > > > > > required on every transition. IA32_TSC_ADJUST was created as a > > > > > way to solve some issues with writing TSC itself so that is not an > > > > > option either. > > > > > The remaining option, defined above as our solution has the > > > > > problem of returning incorrect vmcs tsc_offset values (unless > > > > > we intercept and fix, not done here) as mentioned above. > > > > > However, more problematic is that storing the data in vmcs > > > > > tsc_offset will have a different semantic effect on the system > > > > > than does using the actual MSR. This is illustrated in the > > > > > following example: The hypervisor set the IA32_TSC_ADJUST, then the > > > > > guest sets it and a guest process perfor! > > > > > ms a rdtsc. In this case the guest process will get TSC + > > > > > IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including > > > > > IA32_TSC_ADJUST_guest. While the total system semantics changed > > > > > the semantics as seen by the guest do not and hence this will > > > > > not cause a problem. > > > > > --- > > > > > > > > > arch/x86/include/asm/cpufeature.h | 1 + > > > > > arch/x86/include/asm/kvm_host.h | 2 ++ > > > > > arch/x86/include/asm/msr-index.h | 1 + > > > > > arch/x86/kvm/cpuid.c | 4 ++-- > > > > > arch/x86/kvm/vmx.c | 12 ++++++++++++ > > > > > arch/x86/kvm/x86.c | 1 + > > > > > 6 files changed, 19 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/arch/x86/include/asm/cpufeature.h > > > > > b/arch/x86/include/asm/cpufeature.h > > > > > index 6b7ee5f..e574d81 100644 > > > > > --- a/arch/x86/include/asm/cpufeature.h > > > > > +++ b/arch/x86/include/asm/cpufeature.h > > > > > @@ -199,6 +199,7 @@ > > > > > > > > > > /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word > > > > > 9 */ > > > > > #define X86_FEATURE_FSGSBASE (9*32+ 0) /* {RD/WR}{FS/GS}BASE > > > > > instructions*/ > > > > > +#define X86_FEATURE_TSC_ADJUST (9*32+ 1) /* TSC adjustment MSR > > > > > +0x3b */ > > > > > #define X86_FEATURE_BMI1 (9*32+ 3) /* 1st group bit manipulation > > > > > extensions */ > > > > > #define X86_FEATURE_HLE (9*32+ 4) /* Hardware Lock > > > > > Elision */ > > > > > #define X86_FEATURE_AVX2 (9*32+ 5) /* AVX2 instructions */ > > > > > diff --git a/arch/x86/include/asm/kvm_host.h > > > > > b/arch/x86/include/asm/kvm_host.h index 09155d6..8a001a4 100644 > > > > > --- a/arch/x86/include/asm/kvm_host.h > > > > > +++ b/arch/x86/include/asm/kvm_host.h > > > > > @@ -442,6 +442,8 @@ struct kvm_vcpu_arch { > > > > > u32 virtual_tsc_mult; > > > > > u32 virtual_tsc_khz; > > > > > > > > > > + s64 tsc_adjust; > > > > > + > > > > > atomic_t nmi_queued; /* unprocessed asynchronous NMIs */ > > > > > unsigned nmi_pending; /* NMI queued after currently running > > > > > handler */ > > > > > bool nmi_injected; /* Trying to inject an NMI this entry */ > > > > > diff --git a/arch/x86/include/asm/msr-index.h > > > > > b/arch/x86/include/asm/msr-index.h > > > > > index 957ec87..8e82e29 100644 > > > > > --- a/arch/x86/include/asm/msr-index.h > > > > > +++ b/arch/x86/include/asm/msr-index.h > > > > > @@ -231,6 +231,7 @@ > > > > > #define MSR_IA32_EBL_CR_POWERON 0x0000002a > > > > > #define MSR_EBC_FREQUENCY_ID 0x0000002c > > > > > #define MSR_IA32_FEATURE_CONTROL 0x0000003a > > > > > +#define MSR_TSC_ADJUST 0x0000003b > > > > > > > > > > #define FEATURE_CONTROL_LOCKED (1<<0) > > > > > #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1) > > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index > > > > > 0595f13..8f5943e 100644 > > > > > --- a/arch/x86/kvm/cpuid.c > > > > > +++ b/arch/x86/kvm/cpuid.c > > > > > @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct > > > > > kvm_cpuid_entry2 *entry, u32 function, > > > > > > > > > > /* cpuid 7.0.ebx */ > > > > > const u32 kvm_supported_word9_x86_features = > > > > > - F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) | > > > > > - F(BMI2) | F(ERMS) | f_invpcid | F(RTM); > > > > > + F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) | > > > > > + F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | > > > > > F(RTM); > > > > > > > > > > /* all calls to cpuid_count() should be made on the same cpu */ > > > > > get_cpu(); > > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index > > > > > c00f03d..35d11b3 100644 > > > > > --- a/arch/x86/kvm/vmx.c > > > > > +++ b/arch/x86/kvm/vmx.c > > > > > @@ -2173,6 +2173,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, > > > > > u32 msr_index, u64 *pdata) > > > > > case MSR_IA32_SYSENTER_ESP: > > > > > data = vmcs_readl(GUEST_SYSENTER_ESP); > > > > > break; > > > > > + case MSR_TSC_ADJUST: > > > > > + data = (u64)vcpu->arch.tsc_adjust; > > > > > + break; > > > > > case MSR_TSC_AUX: > > > > > if (!to_vmx(vcpu)->rdtscp_enabled) > > > > > return 1; > > > > > @@ -2241,6 +2244,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, > > > > > u32 msr_index, u64 data) > > > > > } > > > > > ret = kvm_set_msr_common(vcpu, msr_index, data); > > > > > break; > > > > > + case MSR_TSC_ADJUST: > > > > > +#define DUMMY 1 > > > > > + vmx_adjust_tsc_offset(vcpu, > > > > > + (s64)(data-vcpu->arch.tsc_adjust), > > > > > + (bool)DUMMY); > > > > > + vcpu->arch.tsc_adjust = (s64)data; > > > > > + break; > > > > > case MSR_TSC_AUX: > > > > > if (!vmx->rdtscp_enabled) > > > > > return 1; > > > > > @@ -3931,6 +3941,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu > > > > > *vcpu) > > > > > > > > > > vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << > > > > > VCPU_REGS_RSP)); > > > > > > > > > > + vcpu->arch.tsc_adjust = 0x0; > > > > > + > > > > > vmx->rmode.vm86_active = 0; > > > > > > > > > > vmx->soft_vnmi_blocked = 0; > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index > > > > > 42bce48..6c50f6c 100644 > > > > > --- a/arch/x86/kvm/x86.c > > > > > +++ b/arch/x86/kvm/x86.c > > > > > @@ -824,6 +824,7 @@ static u32 msrs_to_save[] = { static > > > > > unsigned num_msrs_to_save; > > > > > > > > > > static u32 emulated_msrs[] = { > > > > > + MSR_TSC_ADJUST, > > > > > MSR_IA32_TSCDEADLINE, > > > > > MSR_IA32_MISC_ENABLE, > > > > > MSR_IA32_MCG_STATUS, > > > > > -- > > > > > 1.7.1 > > > > > > > > > > -- > > > > > To unsubscribe from this list: send the line "unsubscribe kvm" > > > > > in the body of a message to majord...@vger.kernel.org More > > > > > majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe kvm" in the > > body of a message to majord...@vger.kernel.org More majordomo info at > > http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html