Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-13 Thread Thomas Gleixner
On Mon, 13 Feb 2017, Andy Lutomirski wrote:

> On Sun, Feb 12, 2017 at 11:49 PM, Dexuan Cui  wrote:
> >> From: Thomas Gleixner [mailto:t...@linutronix.de]
> >> Sent: Saturday, February 11, 2017 02:02
> >>  ...
> >> That's important if the stuff happens cross CPU. If the update happens on
> >> the same CPU then this is a different story and as there are VMexits
> >> involved they might provide the required ordering already. But I can't tell
> >> as I have no idea how that host side thing is done.
> >>
> >>   tglx
> >
> > IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
> > So I would guess "the structure is only updated just before reentering the 
> > guest
> > after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
> > that is, the update should happen on the same CPU, I guess.
> 
> If the patch is correct, there is one of these shared by all vCPUs, so
> this is not a sufficient explanation.

Right, because that's ony ONE TSC page for the whole guest and then the seq
stuff really lacks barriers.

Thanks,

tglx

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-13 Thread Andy Lutomirski
On Sun, Feb 12, 2017 at 11:49 PM, Dexuan Cui  wrote:
>> From: Thomas Gleixner [mailto:t...@linutronix.de]
>> Sent: Saturday, February 11, 2017 02:02
>>  ...
>> That's important if the stuff happens cross CPU. If the update happens on
>> the same CPU then this is a different story and as there are VMexits
>> involved they might provide the required ordering already. But I can't tell
>> as I have no idea how that host side thing is done.
>>
>>   tglx
>
> IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
> So I would guess "the structure is only updated just before reentering the 
> guest
> after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
> that is, the update should happen on the same CPU, I guess.

If the patch is correct, there is one of these shared by all vCPUs, so
this is not a sufficient explanation.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-13 Thread Thomas Gleixner
On Mon, 13 Feb 2017, Dexuan Cui wrote:
> > From: Thomas Gleixner [mailto:t...@linutronix.de]
> > Sent: Saturday, February 11, 2017 02:02
> >  ...
> > That's important if the stuff happens cross CPU. If the update happens on
> > the same CPU then this is a different story and as there are VMexits
> > involved they might provide the required ordering already. But I can't tell
> > as I have no idea how that host side thing is done.
> > 
> 
> IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
> So I would guess "the structure is only updated just before reentering the 
> guest
> after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
> that is, the update should happen on the same CPU, I guess.

Guessing does not help much. There are a gazillion ways to implement such a
thing. I'm sure that there exists proper documentation of the mechanism
inside your company.

Thanks,

tglx

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-12 Thread Dexuan Cui via Virtualization
> From: Thomas Gleixner [mailto:t...@linutronix.de]
> Sent: Saturday, February 11, 2017 02:02
>  ...
> That's important if the stuff happens cross CPU. If the update happens on
> the same CPU then this is a different story and as there are VMexits
> involved they might provide the required ordering already. But I can't tell
> as I have no idea how that host side thing is done.
> 
>   tglx

IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
So I would guess "the structure is only updated just before reentering the guest
after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
that is, the update should happen on the same CPU, I guess.

Thanks,
-- Dexuan
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-10 Thread Thomas Gleixner
On Fri, 10 Feb 2017, Stephen Hemminger wrote:

> Since sequence count algorithm is done by hypervisor, better to not reuse 
> seqcount.
> Still concerned that the code is racy.

That's a different question and can only be answered by the hypervisor
folks. Dunno, whether they have barrier requirements. The seqcount stuff
relies on:

do {
seq = READ_ONCE(s->sequence);

smp_rmb();

sample_data();
  
smp_rmb();
} while (s->sequence != seq);

which pairs with the writer side:

   s->sequence++;
   
   smp_wmb();

   update_data();

   smp_wmb();

   s->sequence++;

That's important if the stuff happens cross CPU. If the update happens on
the same CPU then this is a different story and as there are VMexits
involved they might provide the required ordering already. But I can't tell
as I have no idea how that host side thing is done.

Thanks,

tglx

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-10 Thread Stephen Hemminger via Virtualization
Since sequence count algorithm is done by hypervisor, better to not reuse 
seqcount.
Still concerned that the code is racy.

-Original Message-
From: Thomas Gleixner [mailto:t...@linutronix.de] 
Sent: Friday, February 10, 2017 4:28 AM
To: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>; x...@kernel.org; Andy 
Lutomirski <l...@amacapital.net>; Ingo Molnar <mi...@redhat.com>; H. Peter 
Anvin <h...@zytor.com>; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang 
<haiya...@microsoft.com>; Dexuan Cui <de...@microsoft.com>; 
linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; 
virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote:

> Stephen Hemminger <sthem...@microsoft.com> writes:
> 
> > Why not use existing seqlock's?
> >
> 
> To be honest I don't quite understand how we could use it -- the 
> sequence locking here is done against the page updated by the 
> hypersior, we're not creating new structures (so I don't understand 
> how we could use struct seqcount which we don't have) but I may be 
> misunderstanding something.

You can't use seqlock, but you might be able to use seqcount. Though I doubt it 
given the 0 check 

Thanks,

tglx
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-10 Thread Thomas Gleixner
On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote:

> Stephen Hemminger  writes:
> 
> > Why not use existing seqlock's?
> >
> 
> To be honest I don't quite understand how we could use it -- the
> sequence locking here is done against the page updated by the
> hypersior, we're not creating new structures (so I don't understand how
> we could use struct seqcount which we don't have) but I may be
> misunderstanding something.

You can't use seqlock, but you might be able to use seqcount. Though I
doubt it given the 0 check 

Thanks,

tglx
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-10 Thread Vitaly Kuznetsov
Stephen Hemminger  writes:

> Why not use existing seqlock's?
>

To be honest I don't quite understand how we could use it -- the
sequence locking here is done against the page updated by the
hypersior, we're not creating new structures (so I don't understand how
we could use struct seqcount which we don't have) but I may be
misunderstanding something.

BTW, I just occured to me that I should've probably put the TSC reading
code to mshyperv.h and use it from both vDSO and read_hv_clock_tsc() --
what do you thing?

[snip]

-- 
  Vitaly
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-10 Thread Vitaly Kuznetsov
Andy Lutomirski <l...@amacapital.net> writes:

> On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <k...@microsoft.com> wrote:
>>
>>
>>> -Original Message-
>>> From: Thomas Gleixner [mailto:t...@linutronix.de]
>>> Sent: Thursday, February 9, 2017 9:08 AM
>>> To: Vitaly Kuznetsov <vkuzn...@redhat.com>
>>> Cc: x...@kernel.org; Andy Lutomirski <l...@amacapital.net>; Ingo Molnar
>>> <mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>; KY Srinivasan
>>> <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; Stephen
>>> Hemminger <sthem...@microsoft.com>; Dexuan Cui
>>> <de...@microsoft.com>; linux-ker...@vger.kernel.org;
>>> de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org
>>> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
>>> method
>>>
>>> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
>>> > +#ifdef CONFIG_HYPERV_TSCPAGE
>>> > +static notrace u64 vread_hvclock(int *mode)
>>> > +{
>>> > +   const struct ms_hyperv_tsc_page *tsc_pg =
>>> > +   (const struct ms_hyperv_tsc_page *)_page;
>>> > +   u64 sequence, scale, offset, current_tick, cur_tsc;
>>> > +
>>> > +   while (1) {
>>> > +   sequence = READ_ONCE(tsc_pg->tsc_sequence);
>>> > +   if (!sequence)
>>> > +   break;
>>> > +
>>> > +   scale = READ_ONCE(tsc_pg->tsc_scale);
>>> > +   offset = READ_ONCE(tsc_pg->tsc_offset);
>>> > +   rdtscll(cur_tsc);
>>> > +
>>> > +   current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
>>> > +
>>> > +   if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
>>> > +   return current_tick;
>>>
>>> That sequence stuff lacks still a sensible explanation. It's fundamentally
>>> different from the sequence counting we do in the kernel, so documentation
>>> for it is really required.
>>
>> The host is updating multiple fields in this shared TSC page and the 
>> sequence number is
>> used to ensure that the guest sees a consistent set values published. If I 
>> remember
>> correctly, Xen has a similar mechanism.
>
> So what's the actual protocol?  When the hypervisor updates the page,
> does it freeze all guest cpus?  If not, how does it maintain
> atomicity?

I don't really know how it is implemented server-side but I *think* that
freezing all CPUs is only required when we want to update *both*
ReferenceTscScale and ReferenceTscOffset at the same time (as Hyper-V is
64-bit only so it can always atomically update 64-bit values)...

-- 
  Vitaly
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-10 Thread Thomas Gleixner
On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote:
> Thomas Gleixner  writes:
> 
> > On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> >> +#ifdef CONFIG_HYPERV_TSCPAGE
> >> +static notrace u64 vread_hvclock(int *mode)
> >> +{
> >> +  const struct ms_hyperv_tsc_page *tsc_pg =
> >> +  (const struct ms_hyperv_tsc_page *)_page;
> >> +  u64 sequence, scale, offset, current_tick, cur_tsc;
> >> +
> >> +  while (1) {
> >> +  sequence = READ_ONCE(tsc_pg->tsc_sequence);
> >> +  if (!sequence)
> >> +  break;
> >> +
> >> +  scale = READ_ONCE(tsc_pg->tsc_scale);
> >> +  offset = READ_ONCE(tsc_pg->tsc_offset);
> >> +  rdtscll(cur_tsc);
> >> +
> >> +  current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> >> +
> >> +  if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> >> +  return current_tick;
> >
> > That sequence stuff lacks still a sensible explanation. It's fundamentally
> > different from the sequence counting we do in the kernel, so documentation
> > for it is really required.
> 
> Sure, do you think the following would do?

Yes

> diff --git a/arch/x86/entry/vdso/vclock_gettime.c 
> b/arch/x86/entry/vdso/vclock_gettime.c
> index 4af10b4..886b600 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -154,6 +154,22 @@ static notrace u64 vread_hvclock(int *mode)
> (const struct ms_hyperv_tsc_page *)_page;
> u64 sequence, scale, offset, current_tick, cur_tsc;
>  
> +   /*
> +* The protocol for reading Hyper-V TSC page is specified in 
> Hypervisor
> +* Top-Level Functional Specification ver. 3.0 and above. To get the
> +* reference time we must do the following:
> +* - READ ReferenceTscSequence
> +*   A special '0' value indicates the time source is unreliable and 
> we
> +*   need to use something else. The currently published specification
> +*   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
> +*   instead of '0' as the special value, see commit c35b82ef0294.
> +* - ReferenceTime =
> +*((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
> +* - READ ReferenceTscSequence again. In case its value has changed
> +*   since our first reading we need to discard ReferenceTime and 
> repeat
> +*   the whole sequence as the hypervisor was updating the page in
> +*   between.
> +*/
> while (1) {
> sequence = READ_ONCE(tsc_pg->tsc_sequence);
> if (!sequence)
> 
> -- 
>   Vitaly
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-10 Thread Vitaly Kuznetsov
Thomas Gleixner  writes:

> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
>> +#ifdef CONFIG_HYPERV_TSCPAGE
>> +static notrace u64 vread_hvclock(int *mode)
>> +{
>> +const struct ms_hyperv_tsc_page *tsc_pg =
>> +(const struct ms_hyperv_tsc_page *)_page;
>> +u64 sequence, scale, offset, current_tick, cur_tsc;
>> +
>> +while (1) {
>> +sequence = READ_ONCE(tsc_pg->tsc_sequence);
>> +if (!sequence)
>> +break;
>> +
>> +scale = READ_ONCE(tsc_pg->tsc_scale);
>> +offset = READ_ONCE(tsc_pg->tsc_offset);
>> +rdtscll(cur_tsc);
>> +
>> +current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
>> +
>> +if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
>> +return current_tick;
>
> That sequence stuff lacks still a sensible explanation. It's fundamentally
> different from the sequence counting we do in the kernel, so documentation
> for it is really required.

Sure, do you think the following would do?

diff --git a/arch/x86/entry/vdso/vclock_gettime.c 
b/arch/x86/entry/vdso/vclock_gettime.c
index 4af10b4..886b600 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -154,6 +154,22 @@ static notrace u64 vread_hvclock(int *mode)
(const struct ms_hyperv_tsc_page *)_page;
u64 sequence, scale, offset, current_tick, cur_tsc;
 
+   /*
+* The protocol for reading Hyper-V TSC page is specified in Hypervisor
+* Top-Level Functional Specification ver. 3.0 and above. To get the
+* reference time we must do the following:
+* - READ ReferenceTscSequence
+*   A special '0' value indicates the time source is unreliable and we
+*   need to use something else. The currently published specification
+*   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
+*   instead of '0' as the special value, see commit c35b82ef0294.
+* - ReferenceTime =
+*((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
+* - READ ReferenceTscSequence again. In case its value has changed
+*   since our first reading we need to discard ReferenceTime and repeat
+*   the whole sequence as the hypervisor was updating the page in
+*   between.
+*/
while (1) {
sequence = READ_ONCE(tsc_pg->tsc_sequence);
if (!sequence)

-- 
  Vitaly
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-09 Thread Stephen Hemminger
On Thu, 9 Feb 2017 14:55:50 -0800
Andy Lutomirski <l...@amacapital.net> wrote:

> On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <k...@microsoft.com> wrote:
> >
> >  
> >> -Original Message-
> >> From: Thomas Gleixner [mailto:t...@linutronix.de]
> >> Sent: Thursday, February 9, 2017 9:08 AM
> >> To: Vitaly Kuznetsov <vkuzn...@redhat.com>
> >> Cc: x...@kernel.org; Andy Lutomirski <l...@amacapital.net>; Ingo Molnar
> >> <mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>; KY Srinivasan
> >> <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; Stephen
> >> Hemminger <sthem...@microsoft.com>; Dexuan Cui
> >> <de...@microsoft.com>; linux-ker...@vger.kernel.org;
> >> de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org
> >> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
> >> method
> >>
> >> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:  
> >> > +#ifdef CONFIG_HYPERV_TSCPAGE
> >> > +static notrace u64 vread_hvclock(int *mode)
> >> > +{
> >> > +   const struct ms_hyperv_tsc_page *tsc_pg =
> >> > +   (const struct ms_hyperv_tsc_page *)_page;
> >> > +   u64 sequence, scale, offset, current_tick, cur_tsc;
> >> > +
> >> > +   while (1) {
> >> > +   sequence = READ_ONCE(tsc_pg->tsc_sequence);
> >> > +   if (!sequence)
> >> > +   break;
> >> > +
> >> > +   scale = READ_ONCE(tsc_pg->tsc_scale);
> >> > +   offset = READ_ONCE(tsc_pg->tsc_offset);
> >> > +   rdtscll(cur_tsc);
> >> > +
> >> > +   current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> >> > +
> >> > +   if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> >> > +   return current_tick;  
> >>
> >> That sequence stuff lacks still a sensible explanation. It's fundamentally
> >> different from the sequence counting we do in the kernel, so documentation
> >> for it is really required.  
> >
> > The host is updating multiple fields in this shared TSC page and the 
> > sequence number is
> > used to ensure that the guest sees a consistent set values published. If I 
> > remember
> > correctly, Xen has a similar mechanism.  
> 
> So what's the actual protocol?  When the hypervisor updates the page,
> does it freeze all guest cpus?  If not, how does it maintain
> atomicity?

The protocol looks a lot like Linux seqlock, but it has an extra protection
which is missing here.

The host needs to update sequence number twice in order to guarantee ordering.
Otherwise it is possible that Host and guest can race.

Host
Write offset
Write scale
Set tsc_sequence = N
  Guest
read sequence = N
Read scale
Write scale
Write offset

Read Offset
Check sequence == N
Set tsc_sequence = N +1

Look like the current host side protocol is wrong.

The solution that Andi Kleen invented, and I used in seqlock was for the writer 
to update
sequence at start and end of transaction. If sequence number is odd, then the 
reader knows
it is looking at stale data.
Host
Write offset
Write scale
Set tsc_sequence = N (end of 
transaction)
  Guest
read sequence = N
Spin until sequence is even (N is even)
Read scale
Set tsc_sequence += 1
Write scale
Write offset

Read Offset
Check sequence == N? (fails is N + 1)
Set tsc_sequence += 1 (end of 
transaction)
read sequence = N+2
Spin until sequence is even (ie N +2)
Read scale  
Read Offset
Check sequence == N +2? (yes ok).

Also it is faster to just read scale and offset with this loop and save
the reading of TSC and doing multiply until after scale/offset has been 
acquired.




___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-09 Thread Andy Lutomirski
On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <k...@microsoft.com> wrote:
>
>
>> -Original Message-
>> From: Thomas Gleixner [mailto:t...@linutronix.de]
>> Sent: Thursday, February 9, 2017 9:08 AM
>> To: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> Cc: x...@kernel.org; Andy Lutomirski <l...@amacapital.net>; Ingo Molnar
>> <mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>; KY Srinivasan
>> <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; Stephen
>> Hemminger <sthem...@microsoft.com>; Dexuan Cui
>> <de...@microsoft.com>; linux-ker...@vger.kernel.org;
>> de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org
>> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
>> method
>>
>> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
>> > +#ifdef CONFIG_HYPERV_TSCPAGE
>> > +static notrace u64 vread_hvclock(int *mode)
>> > +{
>> > +   const struct ms_hyperv_tsc_page *tsc_pg =
>> > +   (const struct ms_hyperv_tsc_page *)_page;
>> > +   u64 sequence, scale, offset, current_tick, cur_tsc;
>> > +
>> > +   while (1) {
>> > +   sequence = READ_ONCE(tsc_pg->tsc_sequence);
>> > +   if (!sequence)
>> > +   break;
>> > +
>> > +   scale = READ_ONCE(tsc_pg->tsc_scale);
>> > +   offset = READ_ONCE(tsc_pg->tsc_offset);
>> > +   rdtscll(cur_tsc);
>> > +
>> > +   current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
>> > +
>> > +   if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
>> > +   return current_tick;
>>
>> That sequence stuff lacks still a sensible explanation. It's fundamentally
>> different from the sequence counting we do in the kernel, so documentation
>> for it is really required.
>
> The host is updating multiple fields in this shared TSC page and the sequence 
> number is
> used to ensure that the guest sees a consistent set values published. If I 
> remember
> correctly, Xen has a similar mechanism.

So what's the actual protocol?  When the hypervisor updates the page,
does it freeze all guest cpus?  If not, how does it maintain
atomicity?

--Andy
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-09 Thread KY Srinivasan via Virtualization


> -Original Message-
> From: Thomas Gleixner [mailto:t...@linutronix.de]
> Sent: Thursday, February 9, 2017 9:08 AM
> To: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Cc: x...@kernel.org; Andy Lutomirski <l...@amacapital.net>; Ingo Molnar
> <mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>; KY Srinivasan
> <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; Stephen
> Hemminger <sthem...@microsoft.com>; Dexuan Cui
> <de...@microsoft.com>; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org
> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
> method
> 
> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> > +#ifdef CONFIG_HYPERV_TSCPAGE
> > +static notrace u64 vread_hvclock(int *mode)
> > +{
> > +   const struct ms_hyperv_tsc_page *tsc_pg =
> > +   (const struct ms_hyperv_tsc_page *)_page;
> > +   u64 sequence, scale, offset, current_tick, cur_tsc;
> > +
> > +   while (1) {
> > +   sequence = READ_ONCE(tsc_pg->tsc_sequence);
> > +   if (!sequence)
> > +   break;
> > +
> > +   scale = READ_ONCE(tsc_pg->tsc_scale);
> > +   offset = READ_ONCE(tsc_pg->tsc_offset);
> > +   rdtscll(cur_tsc);
> > +
> > +   current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> > +
> > +   if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> > +   return current_tick;
> 
> That sequence stuff lacks still a sensible explanation. It's fundamentally
> different from the sequence counting we do in the kernel, so documentation
> for it is really required.

The host is updating multiple fields in this shared TSC page and the sequence 
number is
used to ensure that the guest sees a consistent set values published. If I 
remember
correctly, Xen has a similar mechanism.

K. Y
> 
> Thanks,
> 
>   tglx
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-09 Thread Stephen Hemminger via Virtualization
Why not use existing seqlock's?

-Original Message-
From: Thomas Gleixner [mailto:t...@linutronix.de] 
Sent: Thursday, February 9, 2017 9:08 AM
To: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: x...@kernel.org; Andy Lutomirski <l...@amacapital.net>; Ingo Molnar 
<mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>; KY Srinivasan 
<k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; Stephen Hemminger 
<sthem...@microsoft.com>; Dexuan Cui <de...@microsoft.com>; 
linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; 
virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> +#ifdef CONFIG_HYPERV_TSCPAGE
> +static notrace u64 vread_hvclock(int *mode) {
> + const struct ms_hyperv_tsc_page *tsc_pg =
> + (const struct ms_hyperv_tsc_page *)_page;
> + u64 sequence, scale, offset, current_tick, cur_tsc;
> +
> + while (1) {
> + sequence = READ_ONCE(tsc_pg->tsc_sequence);
> + if (!sequence)
> + break;
> +
> + scale = READ_ONCE(tsc_pg->tsc_scale);
> + offset = READ_ONCE(tsc_pg->tsc_offset);
> + rdtscll(cur_tsc);
> +
> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> +
> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> + return current_tick;

That sequence stuff lacks still a sensible explanation. It's fundamentally 
different from the sequence counting we do in the kernel, so documentation for 
it is really required.

Thanks,

tglx
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-09 Thread Thomas Gleixner
On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> +#ifdef CONFIG_HYPERV_TSCPAGE
> +static notrace u64 vread_hvclock(int *mode)
> +{
> + const struct ms_hyperv_tsc_page *tsc_pg =
> + (const struct ms_hyperv_tsc_page *)_page;
> + u64 sequence, scale, offset, current_tick, cur_tsc;
> +
> + while (1) {
> + sequence = READ_ONCE(tsc_pg->tsc_sequence);
> + if (!sequence)
> + break;
> +
> + scale = READ_ONCE(tsc_pg->tsc_scale);
> + offset = READ_ONCE(tsc_pg->tsc_offset);
> + rdtscll(cur_tsc);
> +
> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> +
> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> + return current_tick;

That sequence stuff lacks still a sensible explanation. It's fundamentally
different from the sequence counting we do in the kernel, so documentation
for it is really required.

Thanks,

tglx
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-09 Thread Vitaly Kuznetsov
Hyper-V TSC page clocksource is suitable for vDSO, however, the protocol
defined by the hypervisor is different from VCLOCK_PVCLOCK. Implement the
required support by adding hvclock_page VVAR.

Signed-off-by: Vitaly Kuznetsov 
---
 arch/x86/entry/vdso/vclock_gettime.c  | 36 +++
 arch/x86/entry/vdso/vdso-layout.lds.S |  3 ++-
 arch/x86/entry/vdso/vdso2c.c  |  3 +++
 arch/x86/entry/vdso/vma.c |  7 +++
 arch/x86/hyperv/hv_init.c |  3 +++
 arch/x86/include/asm/clocksource.h|  3 ++-
 arch/x86/include/asm/vdso.h   |  1 +
 7 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c 
b/arch/x86/entry/vdso/vclock_gettime.c
index 9d4d6e1..4af10b4 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -32,6 +33,11 @@ extern u8 pvclock_page
__attribute__((visibility("hidden")));
 #endif
 
+#ifdef CONFIG_HYPERV_TSCPAGE
+extern u8 hvclock_page
+   __attribute__((visibility("hidden")));
+#endif
+
 #ifndef BUILD_VDSO32
 
 notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
@@ -141,6 +147,32 @@ static notrace u64 vread_pvclock(int *mode)
return last;
 }
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+static notrace u64 vread_hvclock(int *mode)
+{
+   const struct ms_hyperv_tsc_page *tsc_pg =
+   (const struct ms_hyperv_tsc_page *)_page;
+   u64 sequence, scale, offset, current_tick, cur_tsc;
+
+   while (1) {
+   sequence = READ_ONCE(tsc_pg->tsc_sequence);
+   if (!sequence)
+   break;
+
+   scale = READ_ONCE(tsc_pg->tsc_scale);
+   offset = READ_ONCE(tsc_pg->tsc_offset);
+   rdtscll(cur_tsc);
+
+   current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
+
+   if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
+   return current_tick;
+   }
+
+   *mode = VCLOCK_NONE;
+   return 0;
+}
+#endif
 
 notrace static u64 vread_tsc(void)
 {
@@ -173,6 +205,10 @@ notrace static inline u64 vgetsns(int *mode)
else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
cycles = vread_pvclock(mode);
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+   else if (gtod->vclock_mode == VCLOCK_HVCLOCK)
+   cycles = vread_hvclock(mode);
+#endif
else
return 0;
v = (cycles - gtod->cycle_last) & gtod->mask;
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S 
b/arch/x86/entry/vdso/vdso-layout.lds.S
index a708aa9..8ebb4b6 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -25,7 +25,7 @@ SECTIONS
 * segment.
 */
 
-   vvar_start = . - 2 * PAGE_SIZE;
+   vvar_start = . - 3 * PAGE_SIZE;
vvar_page = vvar_start;
 
/* Place all vvars at the offsets in asm/vvar.h. */
@@ -36,6 +36,7 @@ SECTIONS
 #undef EMIT_VVAR
 
pvclock_page = vvar_start + PAGE_SIZE;
+   hvclock_page = vvar_start + 2 * PAGE_SIZE;
 
. = SIZEOF_HEADERS;
 
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 491020b..0780a44 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -74,6 +74,7 @@ enum {
sym_vvar_page,
sym_hpet_page,
sym_pvclock_page,
+   sym_hvclock_page,
sym_VDSO_FAKE_SECTION_TABLE_START,
sym_VDSO_FAKE_SECTION_TABLE_END,
 };
@@ -82,6 +83,7 @@ const int special_pages[] = {
sym_vvar_page,
sym_hpet_page,
sym_pvclock_page,
+   sym_hvclock_page,
 };
 
 struct vdso_sym {
@@ -94,6 +96,7 @@ struct vdso_sym required_syms[] = {
[sym_vvar_page] = {"vvar_page", true},
[sym_hpet_page] = {"hpet_page", true},
[sym_pvclock_page] = {"pvclock_page", true},
+   [sym_hvclock_page] = {"hvclock_page", true},
[sym_VDSO_FAKE_SECTION_TABLE_START] = {
"VDSO_FAKE_SECTION_TABLE_START", false
},
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 10820f6..b256a3b 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if defined(CONFIG_X86_64)
 unsigned int __read_mostly vdso64_enabled = 1;
@@ -120,6 +121,12 @@ static int vvar_fault(const struct vm_special_mapping *sm,
vmf->address,
__pa(pvti) >> PAGE_SHIFT);
}
+   } else if (sym_offset == image->sym_hvclock_page) {
+   struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();
+
+   if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
+   ret = vm_insert_pfn(vma, vmf->address,
+