Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Mon, 13 Feb 2017, Andy Lutomirski wrote: > On Sun, Feb 12, 2017 at 11:49 PM, Dexuan Cuiwrote: > >> From: Thomas Gleixner [mailto:t...@linutronix.de] > >> Sent: Saturday, February 11, 2017 02:02 > >> ... > >> That's important if the stuff happens cross CPU. If the update happens on > >> the same CPU then this is a different story and as there are VMexits > >> involved they might provide the required ordering already. But I can't tell > >> as I have no idea how that host side thing is done. > >> > >> tglx > > > > IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock, > > So I would guess "the structure is only updated just before reentering the > > guest > > after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/), > > that is, the update should happen on the same CPU, I guess. > > If the patch is correct, there is one of these shared by all vCPUs, so > this is not a sufficient explanation. Right, because that's ony ONE TSC page for the whole guest and then the seq stuff really lacks barriers. Thanks, tglx ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Sun, Feb 12, 2017 at 11:49 PM, Dexuan Cuiwrote: >> From: Thomas Gleixner [mailto:t...@linutronix.de] >> Sent: Saturday, February 11, 2017 02:02 >> ... >> That's important if the stuff happens cross CPU. If the update happens on >> the same CPU then this is a different story and as there are VMexits >> involved they might provide the required ordering already. But I can't tell >> as I have no idea how that host side thing is done. >> >> tglx > > IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock, > So I would guess "the structure is only updated just before reentering the > guest > after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/), > that is, the update should happen on the same CPU, I guess. If the patch is correct, there is one of these shared by all vCPUs, so this is not a sufficient explanation. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Mon, 13 Feb 2017, Dexuan Cui wrote: > > From: Thomas Gleixner [mailto:t...@linutronix.de] > > Sent: Saturday, February 11, 2017 02:02 > > ... > > That's important if the stuff happens cross CPU. If the update happens on > > the same CPU then this is a different story and as there are VMexits > > involved they might provide the required ordering already. But I can't tell > > as I have no idea how that host side thing is done. > > > > IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock, > So I would guess "the structure is only updated just before reentering the > guest > after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/), > that is, the update should happen on the same CPU, I guess. Guessing does not help much. There are a gazillion ways to implement such a thing. I'm sure that there exists proper documentation of the mechanism inside your company. Thanks, tglx ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
> From: Thomas Gleixner [mailto:t...@linutronix.de] > Sent: Saturday, February 11, 2017 02:02 > ... > That's important if the stuff happens cross CPU. If the update happens on > the same CPU then this is a different story and as there are VMexits > involved they might provide the required ordering already. But I can't tell > as I have no idea how that host side thing is done. > > tglx IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock, So I would guess "the structure is only updated just before reentering the guest after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/), that is, the update should happen on the same CPU, I guess. Thanks, -- Dexuan ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Fri, 10 Feb 2017, Stephen Hemminger wrote: > Since sequence count algorithm is done by hypervisor, better to not reuse > seqcount. > Still concerned that the code is racy. That's a different question and can only be answered by the hypervisor folks. Dunno, whether they have barrier requirements. The seqcount stuff relies on: do { seq = READ_ONCE(s->sequence); smp_rmb(); sample_data(); smp_rmb(); } while (s->sequence != seq); which pairs with the writer side: s->sequence++; smp_wmb(); update_data(); smp_wmb(); s->sequence++; That's important if the stuff happens cross CPU. If the update happens on the same CPU then this is a different story and as there are VMexits involved they might provide the required ordering already. But I can't tell as I have no idea how that host side thing is done. Thanks, tglx ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Since sequence count algorithm is done by hypervisor, better to not reuse seqcount. Still concerned that the code is racy. -Original Message- From: Thomas Gleixner [mailto:t...@linutronix.de] Sent: Friday, February 10, 2017 4:28 AM To: Vitaly Kuznetsov <vkuzn...@redhat.com> Cc: Stephen Hemminger <sthem...@microsoft.com>; x...@kernel.org; Andy Lutomirski <l...@amacapital.net>; Ingo Molnar <mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; Dexuan Cui <de...@microsoft.com>; linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote: > Stephen Hemminger <sthem...@microsoft.com> writes: > > > Why not use existing seqlock's? > > > > To be honest I don't quite understand how we could use it -- the > sequence locking here is done against the page updated by the > hypersior, we're not creating new structures (so I don't understand > how we could use struct seqcount which we don't have) but I may be > misunderstanding something. You can't use seqlock, but you might be able to use seqcount. Though I doubt it given the 0 check Thanks, tglx ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote: > Stephen Hemmingerwrites: > > > Why not use existing seqlock's? > > > > To be honest I don't quite understand how we could use it -- the > sequence locking here is done against the page updated by the > hypersior, we're not creating new structures (so I don't understand how > we could use struct seqcount which we don't have) but I may be > misunderstanding something. You can't use seqlock, but you might be able to use seqcount. Though I doubt it given the 0 check Thanks, tglx ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Stephen Hemmingerwrites: > Why not use existing seqlock's? > To be honest I don't quite understand how we could use it -- the sequence locking here is done against the page updated by the hypersior, we're not creating new structures (so I don't understand how we could use struct seqcount which we don't have) but I may be misunderstanding something. BTW, I just occured to me that I should've probably put the TSC reading code to mshyperv.h and use it from both vDSO and read_hv_clock_tsc() -- what do you thing? [snip] -- Vitaly ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Andy Lutomirski <l...@amacapital.net> writes: > On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <k...@microsoft.com> wrote: >> >> >>> -Original Message- >>> From: Thomas Gleixner [mailto:t...@linutronix.de] >>> Sent: Thursday, February 9, 2017 9:08 AM >>> To: Vitaly Kuznetsov <vkuzn...@redhat.com> >>> Cc: x...@kernel.org; Andy Lutomirski <l...@amacapital.net>; Ingo Molnar >>> <mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>; KY Srinivasan >>> <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; Stephen >>> Hemminger <sthem...@microsoft.com>; Dexuan Cui >>> <de...@microsoft.com>; linux-ker...@vger.kernel.org; >>> de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org >>> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read >>> method >>> >>> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: >>> > +#ifdef CONFIG_HYPERV_TSCPAGE >>> > +static notrace u64 vread_hvclock(int *mode) >>> > +{ >>> > + const struct ms_hyperv_tsc_page *tsc_pg = >>> > + (const struct ms_hyperv_tsc_page *)_page; >>> > + u64 sequence, scale, offset, current_tick, cur_tsc; >>> > + >>> > + while (1) { >>> > + sequence = READ_ONCE(tsc_pg->tsc_sequence); >>> > + if (!sequence) >>> > + break; >>> > + >>> > + scale = READ_ONCE(tsc_pg->tsc_scale); >>> > + offset = READ_ONCE(tsc_pg->tsc_offset); >>> > + rdtscll(cur_tsc); >>> > + >>> > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; >>> > + >>> > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) >>> > + return current_tick; >>> >>> That sequence stuff lacks still a sensible explanation. It's fundamentally >>> different from the sequence counting we do in the kernel, so documentation >>> for it is really required. >> >> The host is updating multiple fields in this shared TSC page and the >> sequence number is >> used to ensure that the guest sees a consistent set values published. If I >> remember >> correctly, Xen has a similar mechanism. > > So what's the actual protocol? When the hypervisor updates the page, > does it freeze all guest cpus? If not, how does it maintain > atomicity? I don't really know how it is implemented server-side but I *think* that freezing all CPUs is only required when we want to update *both* ReferenceTscScale and ReferenceTscOffset at the same time (as Hyper-V is 64-bit only so it can always atomically update 64-bit values)... -- Vitaly ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote: > Thomas Gleixnerwrites: > > > On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: > >> +#ifdef CONFIG_HYPERV_TSCPAGE > >> +static notrace u64 vread_hvclock(int *mode) > >> +{ > >> + const struct ms_hyperv_tsc_page *tsc_pg = > >> + (const struct ms_hyperv_tsc_page *)_page; > >> + u64 sequence, scale, offset, current_tick, cur_tsc; > >> + > >> + while (1) { > >> + sequence = READ_ONCE(tsc_pg->tsc_sequence); > >> + if (!sequence) > >> + break; > >> + > >> + scale = READ_ONCE(tsc_pg->tsc_scale); > >> + offset = READ_ONCE(tsc_pg->tsc_offset); > >> + rdtscll(cur_tsc); > >> + > >> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; > >> + > >> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) > >> + return current_tick; > > > > That sequence stuff lacks still a sensible explanation. It's fundamentally > > different from the sequence counting we do in the kernel, so documentation > > for it is really required. > > Sure, do you think the following would do? Yes > diff --git a/arch/x86/entry/vdso/vclock_gettime.c > b/arch/x86/entry/vdso/vclock_gettime.c > index 4af10b4..886b600 100644 > --- a/arch/x86/entry/vdso/vclock_gettime.c > +++ b/arch/x86/entry/vdso/vclock_gettime.c > @@ -154,6 +154,22 @@ static notrace u64 vread_hvclock(int *mode) > (const struct ms_hyperv_tsc_page *)_page; > u64 sequence, scale, offset, current_tick, cur_tsc; > > + /* > +* The protocol for reading Hyper-V TSC page is specified in > Hypervisor > +* Top-Level Functional Specification ver. 3.0 and above. To get the > +* reference time we must do the following: > +* - READ ReferenceTscSequence > +* A special '0' value indicates the time source is unreliable and > we > +* need to use something else. The currently published specification > +* versions (up to 4.0b) contain a mistake and wrongly claim '-1' > +* instead of '0' as the special value, see commit c35b82ef0294. > +* - ReferenceTime = > +*((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset > +* - READ ReferenceTscSequence again. In case its value has changed > +* since our first reading we need to discard ReferenceTime and > repeat > +* the whole sequence as the hypervisor was updating the page in > +* between. > +*/ > while (1) { > sequence = READ_ONCE(tsc_pg->tsc_sequence); > if (!sequence) > > -- > Vitaly > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Thomas Gleixnerwrites: > On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: >> +#ifdef CONFIG_HYPERV_TSCPAGE >> +static notrace u64 vread_hvclock(int *mode) >> +{ >> +const struct ms_hyperv_tsc_page *tsc_pg = >> +(const struct ms_hyperv_tsc_page *)_page; >> +u64 sequence, scale, offset, current_tick, cur_tsc; >> + >> +while (1) { >> +sequence = READ_ONCE(tsc_pg->tsc_sequence); >> +if (!sequence) >> +break; >> + >> +scale = READ_ONCE(tsc_pg->tsc_scale); >> +offset = READ_ONCE(tsc_pg->tsc_offset); >> +rdtscll(cur_tsc); >> + >> +current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; >> + >> +if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) >> +return current_tick; > > That sequence stuff lacks still a sensible explanation. It's fundamentally > different from the sequence counting we do in the kernel, so documentation > for it is really required. Sure, do you think the following would do? diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c index 4af10b4..886b600 100644 --- a/arch/x86/entry/vdso/vclock_gettime.c +++ b/arch/x86/entry/vdso/vclock_gettime.c @@ -154,6 +154,22 @@ static notrace u64 vread_hvclock(int *mode) (const struct ms_hyperv_tsc_page *)_page; u64 sequence, scale, offset, current_tick, cur_tsc; + /* +* The protocol for reading Hyper-V TSC page is specified in Hypervisor +* Top-Level Functional Specification ver. 3.0 and above. To get the +* reference time we must do the following: +* - READ ReferenceTscSequence +* A special '0' value indicates the time source is unreliable and we +* need to use something else. The currently published specification +* versions (up to 4.0b) contain a mistake and wrongly claim '-1' +* instead of '0' as the special value, see commit c35b82ef0294. +* - ReferenceTime = +*((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset +* - READ ReferenceTscSequence again. In case its value has changed +* since our first reading we need to discard ReferenceTime and repeat +* the whole sequence as the hypervisor was updating the page in +* between. +*/ while (1) { sequence = READ_ONCE(tsc_pg->tsc_sequence); if (!sequence) -- Vitaly ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Thu, 9 Feb 2017 14:55:50 -0800 Andy Lutomirski <l...@amacapital.net> wrote: > On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <k...@microsoft.com> wrote: > > > > > >> -Original Message- > >> From: Thomas Gleixner [mailto:t...@linutronix.de] > >> Sent: Thursday, February 9, 2017 9:08 AM > >> To: Vitaly Kuznetsov <vkuzn...@redhat.com> > >> Cc: x...@kernel.org; Andy Lutomirski <l...@amacapital.net>; Ingo Molnar > >> <mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>; KY Srinivasan > >> <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; Stephen > >> Hemminger <sthem...@microsoft.com>; Dexuan Cui > >> <de...@microsoft.com>; linux-ker...@vger.kernel.org; > >> de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org > >> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read > >> method > >> > >> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: > >> > +#ifdef CONFIG_HYPERV_TSCPAGE > >> > +static notrace u64 vread_hvclock(int *mode) > >> > +{ > >> > + const struct ms_hyperv_tsc_page *tsc_pg = > >> > + (const struct ms_hyperv_tsc_page *)_page; > >> > + u64 sequence, scale, offset, current_tick, cur_tsc; > >> > + > >> > + while (1) { > >> > + sequence = READ_ONCE(tsc_pg->tsc_sequence); > >> > + if (!sequence) > >> > + break; > >> > + > >> > + scale = READ_ONCE(tsc_pg->tsc_scale); > >> > + offset = READ_ONCE(tsc_pg->tsc_offset); > >> > + rdtscll(cur_tsc); > >> > + > >> > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; > >> > + > >> > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) > >> > + return current_tick; > >> > >> That sequence stuff lacks still a sensible explanation. It's fundamentally > >> different from the sequence counting we do in the kernel, so documentation > >> for it is really required. > > > > The host is updating multiple fields in this shared TSC page and the > > sequence number is > > used to ensure that the guest sees a consistent set values published. If I > > remember > > correctly, Xen has a similar mechanism. > > So what's the actual protocol? When the hypervisor updates the page, > does it freeze all guest cpus? If not, how does it maintain > atomicity? The protocol looks a lot like Linux seqlock, but it has an extra protection which is missing here. The host needs to update sequence number twice in order to guarantee ordering. Otherwise it is possible that Host and guest can race. Host Write offset Write scale Set tsc_sequence = N Guest read sequence = N Read scale Write scale Write offset Read Offset Check sequence == N Set tsc_sequence = N +1 Look like the current host side protocol is wrong. The solution that Andi Kleen invented, and I used in seqlock was for the writer to update sequence at start and end of transaction. If sequence number is odd, then the reader knows it is looking at stale data. Host Write offset Write scale Set tsc_sequence = N (end of transaction) Guest read sequence = N Spin until sequence is even (N is even) Read scale Set tsc_sequence += 1 Write scale Write offset Read Offset Check sequence == N? (fails is N + 1) Set tsc_sequence += 1 (end of transaction) read sequence = N+2 Spin until sequence is even (ie N +2) Read scale Read Offset Check sequence == N +2? (yes ok). Also it is faster to just read scale and offset with this loop and save the reading of TSC and doing multiply until after scale/offset has been acquired. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <k...@microsoft.com> wrote: > > >> -Original Message- >> From: Thomas Gleixner [mailto:t...@linutronix.de] >> Sent: Thursday, February 9, 2017 9:08 AM >> To: Vitaly Kuznetsov <vkuzn...@redhat.com> >> Cc: x...@kernel.org; Andy Lutomirski <l...@amacapital.net>; Ingo Molnar >> <mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>; KY Srinivasan >> <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; Stephen >> Hemminger <sthem...@microsoft.com>; Dexuan Cui >> <de...@microsoft.com>; linux-ker...@vger.kernel.org; >> de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org >> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read >> method >> >> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: >> > +#ifdef CONFIG_HYPERV_TSCPAGE >> > +static notrace u64 vread_hvclock(int *mode) >> > +{ >> > + const struct ms_hyperv_tsc_page *tsc_pg = >> > + (const struct ms_hyperv_tsc_page *)_page; >> > + u64 sequence, scale, offset, current_tick, cur_tsc; >> > + >> > + while (1) { >> > + sequence = READ_ONCE(tsc_pg->tsc_sequence); >> > + if (!sequence) >> > + break; >> > + >> > + scale = READ_ONCE(tsc_pg->tsc_scale); >> > + offset = READ_ONCE(tsc_pg->tsc_offset); >> > + rdtscll(cur_tsc); >> > + >> > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; >> > + >> > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) >> > + return current_tick; >> >> That sequence stuff lacks still a sensible explanation. It's fundamentally >> different from the sequence counting we do in the kernel, so documentation >> for it is really required. > > The host is updating multiple fields in this shared TSC page and the sequence > number is > used to ensure that the guest sees a consistent set values published. If I > remember > correctly, Xen has a similar mechanism. So what's the actual protocol? When the hypervisor updates the page, does it freeze all guest cpus? If not, how does it maintain atomicity? --Andy ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
> -Original Message- > From: Thomas Gleixner [mailto:t...@linutronix.de] > Sent: Thursday, February 9, 2017 9:08 AM > To: Vitaly Kuznetsov <vkuzn...@redhat.com> > Cc: x...@kernel.org; Andy Lutomirski <l...@amacapital.net>; Ingo Molnar > <mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>; KY Srinivasan > <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; Stephen > Hemminger <sthem...@microsoft.com>; Dexuan Cui > <de...@microsoft.com>; linux-ker...@vger.kernel.org; > de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org > Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read > method > > On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: > > +#ifdef CONFIG_HYPERV_TSCPAGE > > +static notrace u64 vread_hvclock(int *mode) > > +{ > > + const struct ms_hyperv_tsc_page *tsc_pg = > > + (const struct ms_hyperv_tsc_page *)_page; > > + u64 sequence, scale, offset, current_tick, cur_tsc; > > + > > + while (1) { > > + sequence = READ_ONCE(tsc_pg->tsc_sequence); > > + if (!sequence) > > + break; > > + > > + scale = READ_ONCE(tsc_pg->tsc_scale); > > + offset = READ_ONCE(tsc_pg->tsc_offset); > > + rdtscll(cur_tsc); > > + > > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; > > + > > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) > > + return current_tick; > > That sequence stuff lacks still a sensible explanation. It's fundamentally > different from the sequence counting we do in the kernel, so documentation > for it is really required. The host is updating multiple fields in this shared TSC page and the sequence number is used to ensure that the guest sees a consistent set values published. If I remember correctly, Xen has a similar mechanism. K. Y > > Thanks, > > tglx ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Why not use existing seqlock's? -Original Message- From: Thomas Gleixner [mailto:t...@linutronix.de] Sent: Thursday, February 9, 2017 9:08 AM To: Vitaly Kuznetsov <vkuzn...@redhat.com> Cc: x...@kernel.org; Andy Lutomirski <l...@amacapital.net>; Ingo Molnar <mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; Stephen Hemminger <sthem...@microsoft.com>; Dexuan Cui <de...@microsoft.com>; linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: > +#ifdef CONFIG_HYPERV_TSCPAGE > +static notrace u64 vread_hvclock(int *mode) { > + const struct ms_hyperv_tsc_page *tsc_pg = > + (const struct ms_hyperv_tsc_page *)_page; > + u64 sequence, scale, offset, current_tick, cur_tsc; > + > + while (1) { > + sequence = READ_ONCE(tsc_pg->tsc_sequence); > + if (!sequence) > + break; > + > + scale = READ_ONCE(tsc_pg->tsc_scale); > + offset = READ_ONCE(tsc_pg->tsc_offset); > + rdtscll(cur_tsc); > + > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; > + > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) > + return current_tick; That sequence stuff lacks still a sensible explanation. It's fundamentally different from the sequence counting we do in the kernel, so documentation for it is really required. Thanks, tglx ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: > +#ifdef CONFIG_HYPERV_TSCPAGE > +static notrace u64 vread_hvclock(int *mode) > +{ > + const struct ms_hyperv_tsc_page *tsc_pg = > + (const struct ms_hyperv_tsc_page *)_page; > + u64 sequence, scale, offset, current_tick, cur_tsc; > + > + while (1) { > + sequence = READ_ONCE(tsc_pg->tsc_sequence); > + if (!sequence) > + break; > + > + scale = READ_ONCE(tsc_pg->tsc_scale); > + offset = READ_ONCE(tsc_pg->tsc_offset); > + rdtscll(cur_tsc); > + > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; > + > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) > + return current_tick; That sequence stuff lacks still a sensible explanation. It's fundamentally different from the sequence counting we do in the kernel, so documentation for it is really required. Thanks, tglx ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Hyper-V TSC page clocksource is suitable for vDSO, however, the protocol defined by the hypervisor is different from VCLOCK_PVCLOCK. Implement the required support by adding hvclock_page VVAR. Signed-off-by: Vitaly Kuznetsov--- arch/x86/entry/vdso/vclock_gettime.c | 36 +++ arch/x86/entry/vdso/vdso-layout.lds.S | 3 ++- arch/x86/entry/vdso/vdso2c.c | 3 +++ arch/x86/entry/vdso/vma.c | 7 +++ arch/x86/hyperv/hv_init.c | 3 +++ arch/x86/include/asm/clocksource.h| 3 ++- arch/x86/include/asm/vdso.h | 1 + 7 files changed, 54 insertions(+), 2 deletions(-) diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c index 9d4d6e1..4af10b4 100644 --- a/arch/x86/entry/vdso/vclock_gettime.c +++ b/arch/x86/entry/vdso/vclock_gettime.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -32,6 +33,11 @@ extern u8 pvclock_page __attribute__((visibility("hidden"))); #endif +#ifdef CONFIG_HYPERV_TSCPAGE +extern u8 hvclock_page + __attribute__((visibility("hidden"))); +#endif + #ifndef BUILD_VDSO32 notrace static long vdso_fallback_gettime(long clock, struct timespec *ts) @@ -141,6 +147,32 @@ static notrace u64 vread_pvclock(int *mode) return last; } #endif +#ifdef CONFIG_HYPERV_TSCPAGE +static notrace u64 vread_hvclock(int *mode) +{ + const struct ms_hyperv_tsc_page *tsc_pg = + (const struct ms_hyperv_tsc_page *)_page; + u64 sequence, scale, offset, current_tick, cur_tsc; + + while (1) { + sequence = READ_ONCE(tsc_pg->tsc_sequence); + if (!sequence) + break; + + scale = READ_ONCE(tsc_pg->tsc_scale); + offset = READ_ONCE(tsc_pg->tsc_offset); + rdtscll(cur_tsc); + + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; + + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) + return current_tick; + } + + *mode = VCLOCK_NONE; + return 0; +} +#endif notrace static u64 vread_tsc(void) { @@ -173,6 +205,10 @@ notrace static inline u64 vgetsns(int *mode) else if (gtod->vclock_mode == VCLOCK_PVCLOCK) cycles = vread_pvclock(mode); #endif +#ifdef CONFIG_HYPERV_TSCPAGE + else if (gtod->vclock_mode == VCLOCK_HVCLOCK) + cycles = vread_hvclock(mode); +#endif else return 0; v = (cycles - gtod->cycle_last) & gtod->mask; diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S index a708aa9..8ebb4b6 100644 --- a/arch/x86/entry/vdso/vdso-layout.lds.S +++ b/arch/x86/entry/vdso/vdso-layout.lds.S @@ -25,7 +25,7 @@ SECTIONS * segment. */ - vvar_start = . - 2 * PAGE_SIZE; + vvar_start = . - 3 * PAGE_SIZE; vvar_page = vvar_start; /* Place all vvars at the offsets in asm/vvar.h. */ @@ -36,6 +36,7 @@ SECTIONS #undef EMIT_VVAR pvclock_page = vvar_start + PAGE_SIZE; + hvclock_page = vvar_start + 2 * PAGE_SIZE; . = SIZEOF_HEADERS; diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c index 491020b..0780a44 100644 --- a/arch/x86/entry/vdso/vdso2c.c +++ b/arch/x86/entry/vdso/vdso2c.c @@ -74,6 +74,7 @@ enum { sym_vvar_page, sym_hpet_page, sym_pvclock_page, + sym_hvclock_page, sym_VDSO_FAKE_SECTION_TABLE_START, sym_VDSO_FAKE_SECTION_TABLE_END, }; @@ -82,6 +83,7 @@ const int special_pages[] = { sym_vvar_page, sym_hpet_page, sym_pvclock_page, + sym_hvclock_page, }; struct vdso_sym { @@ -94,6 +96,7 @@ struct vdso_sym required_syms[] = { [sym_vvar_page] = {"vvar_page", true}, [sym_hpet_page] = {"hpet_page", true}, [sym_pvclock_page] = {"pvclock_page", true}, + [sym_hvclock_page] = {"hvclock_page", true}, [sym_VDSO_FAKE_SECTION_TABLE_START] = { "VDSO_FAKE_SECTION_TABLE_START", false }, diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c index 10820f6..b256a3b 100644 --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -21,6 +21,7 @@ #include #include #include +#include #if defined(CONFIG_X86_64) unsigned int __read_mostly vdso64_enabled = 1; @@ -120,6 +121,12 @@ static int vvar_fault(const struct vm_special_mapping *sm, vmf->address, __pa(pvti) >> PAGE_SHIFT); } + } else if (sym_offset == image->sym_hvclock_page) { + struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page(); + + if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK)) + ret = vm_insert_pfn(vma, vmf->address, +