Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-07-14 Thread Andy Lutomirski
On Thu, Jul 13, 2017 at 5:46 AM, Vitaly Kuznetsov  wrote:
> Andy Lutomirski  writes:
>
>> On Tue, May 23, 2017 at 5:36 AM, Vitaly Kuznetsov  
>> wrote:
>>> Andy Lutomirski  writes:
>>>

 Also, can you share the benchmark you used for these patches?
>>>
>>> I didn't do much while writing the patchset, mostly I was running the
>>> attached dumb thrasher (32 pthreads doing mmap/munmap). On a 16 vCPU
>>> Hyper-V 2016 guest I get the following (just re-did the test with
>>> 4.12-rc1):
>>>
>>> Before the patchset:
>>> # time ./pthread_mmap ./randfile
>>>
>>> real    3m33.118s
>>> user    0m3.698s
>>> sys     3m16.624s
>>>
>>> After the patchset:
>>> # time ./pthread_mmap ./randfile
>>>
>>> real    2m19.920s
>>> user    0m2.662s
>>> sys     2m9.948s
>>>
>>> K. Y.'s guys at Microsoft did additional testing for the patchset on
>>> different Hyper-V deployments, including Azure; they may share their
>>> findings too.
>>
>> I ran this benchmark on my big TLB patchset, mainly to make sure I
>> didn't regress your test.  I seem to have sped it up by 30% or so
>> instead.  I need to study this a little bit to figure out why to make
>> sure that the reason isn't that I'm failing to do flushes I need to
>> do.
>
> Got back to this and tested everything on WS2016 Hyper-V guest (24
> vCPUs) with my slightly modified benchmark. The numbers are:
>
> 1) pre-patch:
>
> real    1m15.775s
> user    0m0.850s
> sys     1m31.515s
>
> 2) your 'x86/pcid' series (PCID feature is not passed to the guest so this
> is mainly your lazy tlb optimization):
>
> real    0m55.135s
> user    0m1.168s
> sys     1m3.810s
>
> 3) My 'pv tlb shootdown' patchset on top of your 'x86/pcid' series:
>
> real    0m48.891s
> user    0m1.052s
> sys     0m52.591s
>
> As far as I understand I need to add
> 'setup_clear_cpu_cap(X86_FEATURE_PCID)' to my series to make things work
> properly if this feature appears in the guest.
>
> Other than that, there is additional room for optimization:
> tlb_single_page_flush_ceiling. I'm not sure the default value of 33 is
> optimal with Hyper-V's PV flush, but that investigation can be done
> separately.
>
> AFAIU, with your TLB preparatory work which went into 4.13, our series
> have become untangled and can go through different trees. I'll rebase mine
> and send it to K. Y. to push through Greg's char-misc tree.
>
> Is there anything blocking your PCID series from going into 4.14? It
> seems to be a huge improvement for some workloads.

No.  All but one patch should land in 4.13.

It would also be nifty if someone were to augment my work to allow one
CPU to tell another CPU that it just flushed on that CPU's behalf.
Basically, a properly atomic and/or locked operation that finds a
given ctx_id in the remote CPU's cpu_tlbstate and, if tlb_gen <= x,
sets tlb_gen to x.  Some read operations might be useful, too.  This
*might* be doable with cmpxchg16b, but spinlocks would be easier.  The
idea would be for paravirt remote flushes to be able to see, for real,
which remote CPUs need flushes, do the flushes, and then update the
remote tlb_gen to record that they've been done.
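
(For illustration only: a userspace-compilable sketch of the "bump the
remote tlb_gen" operation described above, with made-up names and a
simplified layout; the real cpu_tlbstate would need ctx_id and tlb_gen
handled together, which is where the cmpxchg16b-or-spinlock question
comes in.)

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for one per-CPU TLB context slot; the field
 * names do not match the real arch/x86/mm/tlb.c definitions. */
struct demo_tlb_ctx {
	uint64_t ctx_id;		/* which mm this slot tracks */
	_Atomic uint64_t tlb_gen;	/* generation already flushed on that CPU */
};

/*
 * Record that a flush up to generation 'gen' was done on the remote
 * CPU's behalf: if the slot tracks 'ctx_id' and its tlb_gen is older,
 * advance it (i.e. tlb_gen = max(tlb_gen, gen)).  Returns true if the
 * value was advanced.  A real version must read ctx_id and tlb_gen
 * consistently (cmpxchg16b or a lock), which this sketch glosses over.
 */
static bool demo_note_remote_flush(struct demo_tlb_ctx *slot,
				   uint64_t ctx_id, uint64_t gen)
{
	uint64_t old;

	if (slot->ctx_id != ctx_id)	/* slot tracks some other mm */
		return false;

	old = atomic_load(&slot->tlb_gen);
	while (old < gen) {
		if (atomic_compare_exchange_weak(&slot->tlb_gen, &old, gen))
			return true;	/* we advanced it */
		/* 'old' was reloaded by the failed CAS; retry if still behind */
	}
	return false;
}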

FWIW, I read the HV TLB docs, and it's entirely unclear to me how it
interacts with PCID or whether PCID is supported at all.  It would be
real nice to get PCID *and* paravirt flush on the major hypervisor
platforms.

--Andy


Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-07-13 Thread Vitaly Kuznetsov
Andy Lutomirski  writes:

> On Tue, May 23, 2017 at 5:36 AM, Vitaly Kuznetsov  wrote:
>> Andy Lutomirski  writes:
>>
>>>
>>> Also, can you share the benchmark you used for these patches?
>>
>> I didn't do much while writing the patchset, mostly I was running the
>> attached dumb thrasher (32 pthreads doing mmap/munmap). On a 16 vCPU
>> Hyper-V 2016 guest I get the following (just re-did the test with
>> 4.12-rc1):
>>
>> Before the patchset:
>> # time ./pthread_mmap ./randfile
>>
>> real    3m33.118s
>> user    0m3.698s
>> sys     3m16.624s
>>
>> After the patchset:
>> # time ./pthread_mmap ./randfile
>>
>> real    2m19.920s
>> user    0m2.662s
>> sys     2m9.948s
>>
>> K. Y.'s guys at Microsoft did additional testing for the patchset on
>> different Hyper-V deployments, including Azure; they may share their
>> findings too.
>
> I ran this benchmark on my big TLB patchset, mainly to make sure I
> didn't regress your test.  I seem to have sped it up by 30% or so
> instead.  I need to study this a little bit to figure out why to make
> sure that the reason isn't that I'm failing to do flushes I need to
> do.

Got back to this and tested everything on a WS2016 Hyper-V guest (24
vCPUs) with my slightly modified benchmark. The numbers are:

1) pre-patch:

real    1m15.775s
user    0m0.850s
sys     1m31.515s

2) your 'x86/pcid' series (PCID feature is not passed to the guest so this
is mainly your lazy tlb optimization):

real    0m55.135s
user    0m1.168s
sys     1m3.810s

3) My 'pv tlb shootdown' patchset on top of your 'x86/pcid' series:

real    0m48.891s
user    0m1.052s
sys     0m52.591s

As far as I understand I need to add
'setup_clear_cpu_cap(X86_FEATURE_PCID)' to my series to make things work
properly if this feature appears in the guest.
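
(A minimal sketch of what that might look like, assuming the check sits
in hyperv_setup_mmu_ops() next to the hint test and the hook assignment;
this is illustrative only, not the actual follow-up patch:)

	/* arch/x86/hyperv/mmu.c -- illustrative sketch, not the real code */
	void __init hyperv_setup_mmu_ops(void)
	{
		if (!(ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED))
			return;

		/*
		 * The PV flush path does not keep PCID state coherent yet,
		 * so hide the feature from the rest of the kernel in case
		 * the host ever starts exposing it to the guest.
		 */
		setup_clear_cpu_cap(X86_FEATURE_PCID);

		pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
	}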

Other than that, there is additional room for optimization:
tlb_single_page_flush_ceiling. I'm not sure the default value of 33 is
optimal with Hyper-V's PV flush, but that investigation can be done
separately.

AFAIU, with your TLB preparatory work which went into 4.13, our series
have become untangled and can go through different trees. I'll rebase mine
and send it to K. Y. to push through Greg's char-misc tree.

Is there anything blocking your PCID series from going into 4.14? It
seems to be a huge improvement for some workloads.

-- 
  Vitaly


Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-06-26 Thread Andy Lutomirski
On Tue, May 23, 2017 at 5:36 AM, Vitaly Kuznetsov  wrote:
> Andy Lutomirski  writes:
>
>>
>> Also, can you share the benchmark you used for these patches?
>
> I didn't do much while writing the patchset, mostly I was running the
> attached dumb thrasher (32 pthreads doing mmap/munmap). On a 16 vCPU
> Hyper-V 2016 guest I get the following (just re-did the test with
> 4.12-rc1):
>
> Before the patchset:
> # time ./pthread_mmap ./randfile
>
> real    3m33.118s
> user    0m3.698s
> sys     3m16.624s
>
> After the patchset:
> # time ./pthread_mmap ./randfile
>
> real    2m19.920s
> user    0m2.662s
> sys     2m9.948s
>
> K. Y.'s guys at Microsoft did additional testing for the patchset on
> different Hyper-V deployments, including Azure; they may share their
> findings too.

I ran this benchmark on my big TLB patchset, mainly to make sure I
didn't regress your test.  I seem to have sped it up by 30% or so
instead.  I need to study this a little bit to figure out why to make
sure that the reason isn't that I'm failing to do flushes I need to
do.


RE: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-23 Thread KY Srinivasan


> -Original Message-
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On
> Behalf Of Vitaly Kuznetsov
> Sent: Tuesday, May 23, 2017 5:37 AM
> To: Andy Lutomirski <l...@kernel.org>
> Cc: Stephen Hemminger <sthem...@microsoft.com>; Jork Loeser
> <jork.loe...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>;
> X86 ML <x...@kernel.org>; linux-ker...@vger.kernel.org; Steven Rostedt
> <rost...@goodmis.org>; Ingo Molnar <mi...@redhat.com>; H. Peter Anvin
> <h...@zytor.com>; de...@linuxdriverproject.org; Thomas Gleixner
> <t...@linutronix.de>
> Subject: Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB
> flush
> 
> Andy Lutomirski <l...@kernel.org> writes:
> 
> > On Mon, May 22, 2017 at 3:43 AM, Vitaly Kuznetsov <vkuzn...@redhat.com> wrote:
> >> Andy Lutomirski <l...@kernel.org> writes:
> >>
> >>> On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
> >>>> The Hyper-V host can suggest that we use a hypercall for remote TLB
> >>>> flushes; this is supposed to be faster than IPIs.
> >>>>
> >>>> Implementation details: to do HvFlushVirtualAddress{Space,List}
> >>>> hypercalls we need to put the input somewhere in memory, and we don't
> >>>> really want to allocate memory on each call, so we pre-allocate per-cpu
> >>>> memory areas on boot. These areas are of fixed size; limit them to an
> >>>> arbitrary number of 16 (16 gvas can specify 16 * 4096 pages).
> >>>>
> >>>> pv_ops patching happens very early, so we need to separate
> >>>> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
> >>>>
> >>>> It is possible and easy to implement local TLB flushing too, and there
> >>>> is even a hint for that. However, I don't see room for optimization on
> >>>> the host side, as both the hypercall and a native TLB flush result in a
> >>>> vmexit. The hint is also not set on modern Hyper-V versions.
> >>>
> >>> Why do local flushes exit?
> >>
> >> "exist"? I don't know, to be honest. To me it makes no difference from
> >> hypervisor's point of view as intercepting tlb flushing instructions is
> >> not any different from implementing a hypercall.
> >>
> >> Hyper-V gives its guests 'hints' to indicate if they need to use
> >> hypercalls for remote/local TLB flush and I don't remember seeing
> >> 'local' bit set.
> >
> > What I meant was: why aren't local flushes handled directly in the
> > guest without exiting to the host?  Or are they?  In principle,
> > INVPCID should just work, right?  Even reading and writing CR3 back
> > should work if the hypervisor sets up the magic list of allowed CR3
> > values, right?
> >
> > I guess on older CPUs there might not be any way to flush the local
> > TLB without exiting, but I'm not *that* familiar with the details of
> > the virtualization extensions.
> >
> 
> Right, local flushes should 'just work'. If for whatever reason the
> hypervisor decides to trap us, there's nothing we can do about it.
> 
> >>
> >>>
> >>>> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
> >>>> +				    struct mm_struct *mm, unsigned long start,
> >>>> +				    unsigned long end)
> >>>> +{
> >>>
> >>> What tree will this go through?  I'm about to send a signature change
> >>> for this function for tip:x86/mm.
> >>
> >> I think this was going to get through Greg's char-misc tree but if we
> >> need to synchronize I think we can push this through x86.
> >
> > Works for me.  Linus can probably resolve the trivial conflict.  But
> > going through the x86 tree might make sense here if that's okay with
> > you.
> >
> 
> Definitely fine with me, I'll leave this decision up to x86 maintainers,
> Hyper-V maintainers, and Greg.
> 
> >>
> >>>
> >>> Also, how would this interact with PCID?  I have PCID patches that I'm
> >>> pretty happy with now, and I'm hoping to support PCID in 4.13.
> >>>
> >>
> >> Sorry, I wasn't following this work closely. .flush_tlb_others() hook is
> >> not going away from pv_mmu_ops, right? In that case we can have both in
> >> 4.13. Or do you see any other clashes?

Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-23 Thread Vitaly Kuznetsov
Andy Lutomirski  writes:

> On Mon, May 22, 2017 at 3:43 AM, Vitaly Kuznetsov  wrote:
>> Andy Lutomirski  writes:
>>
>>> On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
The Hyper-V host can suggest that we use a hypercall for remote TLB flushes;
this is supposed to be faster than IPIs.

Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
we need to put the input somewhere in memory, and we don't really want to
allocate memory on each call, so we pre-allocate per-cpu memory areas
on boot. These areas are of fixed size; limit them to an arbitrary number
of 16 (16 gvas can specify 16 * 4096 pages).

pv_ops patching happens very early, so we need to separate
hyperv_setup_mmu_ops() and hyper_alloc_mmu().

It is possible and easy to implement local TLB flushing too, and there is
even a hint for that. However, I don't see room for optimization on the
host side, as both the hypercall and a native TLB flush result in a vmexit.
The hint is also not set on modern Hyper-V versions.
>>>
>>> Why do local flushes exit?
>>
>> "exist"? I don't know, to be honest. To me it makes no difference from
>> hypervisor's point of view as intercepting tlb flushing instructions is
>> not any different from implementing a hypercall.
>>
>> Hyper-V gives its guests 'hints' to indicate if they need to use
>> hypercalls for remote/local TLB flush and I don't remember seeing
>> 'local' bit set.
>
> What I meant was: why aren't local flushes handled directly in the
> guest without exiting to the host?  Or are they?  In principle,
> INVPCID should just work, right?  Even reading and writing CR3 back
> should work if the hypervisor sets up the magic list of allowed CR3
> values, right?
>
> I guess on older CPUs there might not be any way to flush the local
> TLB without exiting, but I'm not *that* familiar with the details of
> the virtualization extensions.
>

Right, local flushes should 'just work'. If for whatever reason the
hypervisor decides to trap us, there's nothing we can do about it.

>>
>>>
 +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
 +struct mm_struct *mm, unsigned long start,
 +unsigned long end)
 +{
>>>
>>> What tree will this go through?  I'm about to send a signature change
>>> for this function for tip:x86/mm.
>>
>> I think this was going to get through Greg's char-misc tree but if we
>> need to synchronize I think we can push this through x86.
>
> Works for me.  Linus can probably resolve the trivial conflict.  But
> going through the x86 tree might make sense here if that's okay with
> you.
>

Definitely fine with me, I'll leave this decision up to x86 maintainers,
Hyper-V maintainers, and Greg.

>>
>>>
>>> Also, how would this interact with PCID?  I have PCID patches that I'm
>>> pretty happy with now, and I'm hoping to support PCID in 4.13.
>>>
>>
>> Sorry, I wasn't following this work closely. .flush_tlb_others() hook is
>> not going away from pv_mmu_ops, right? In that case we can have both in
>> 4.13. Or do you see any other clashes?
>>
>
> The issue is that I'm changing the whole flush algorithm.  The main
> patch that affects this is here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pcid&id=a67bff42e1e55666fdbaddf233a484a8773688c1
>
> The interactions between that patch and paravirt flush helpers may be
> complex, and it'll need some thought.  PCID makes everything even more
> subtle, so just turning off PCID when paravirt flush is involved seems
> the safest for now.  Ideally we'd eventually support PCID and paravirt
> flushes together (and even eventual native remote flushes assuming
> they ever get added).

I see. On Hyper-V, the HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST hypercall's
interface is:
1) A list of entries to flush. Each entry is a PFN, and its lower 12 bits
encode the number of additional pages after that one (the page defined by
the PFN) that we'd like to flush. We can pass up to 509 entries in one
hypercall (this can be extended, but that requires a pre-allocated memory
region).

2) A processor mask.

3) An address space id (all 64 bits of CR3; not sure how it's used within
the hypervisor).

HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX is more or less the same, but we
need more space to specify > 64 vCPUs, so we'll be able to pass fewer than
509 entries.
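
(A small, self-contained illustration of the entry encoding described in
(1), assuming 4 KiB pages; the names are made up, but the packing follows
the interface as described above and roughly mirrors what the patch at the
end of this thread does. The 509-entry limit simply falls out of the 4 KiB
hypercall input page: (4096 - 24 bytes of header) / 8 = 509.)

#include <stdint.h>

#define DEMO_PAGE_SHIFT	12
#define DEMO_PAGE_SIZE	(1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK	(~(DEMO_PAGE_SIZE - 1))

/*
 * Pack the range [start, end) into "gva list" entries: each entry holds
 * a page-aligned GVA in the upper bits and, in the low 12 bits, the
 * number of *additional* pages after that one to flush.  Returns the
 * number of entries written, or -1 if max_entries is not enough (the
 * caller would then fall back to flushing the whole address space).
 */
static int demo_fill_gva_list(uint64_t *list, int max_entries,
			      unsigned long start, unsigned long end)
{
	unsigned long cur = start;
	int n = 0;

	while (cur < end) {
		/* pages left to cover, rounding any partial page up */
		unsigned long diff =
			(end - cur + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;

		if (n == max_entries)
			return -1;

		list[n] = cur & DEMO_PAGE_MASK;
		if (diff >= DEMO_PAGE_SIZE) {
			/* one entry covers at most this page + 4095 more */
			list[n] |= DEMO_PAGE_SIZE - 1;
			cur += DEMO_PAGE_SIZE * DEMO_PAGE_SIZE;
		} else {
			list[n] |= diff - 1;	/* this page + (diff - 1) more */
			cur = end;
		}
		n++;
	}
	return n;
}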

The main advantage compared to sending IPIs, as far as I understand, is
that virtual CPUs which are not currently scheduled don't need flushing,
and we can't know which ones those are from within the guest.

I agree that disabling PCID for paravirt flush users is a good option for
now; let's get it merged and tested without this additional complexity and
make another round afterwards.

>
> Also, can you share the benchmark you used for these patches?

I didn't do much while writing the patchset, mostly I was running the
attached dumb thrasher (32 pthreads doing mmap/munmap).
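
(The attachment itself isn't preserved in this archive. Purely as an
illustration of the kind of load described -- 32 threads doing
mmap/munmap -- a minimal thrasher might look like the sketch below. It is
not Vitaly's actual pthread_mmap: that one takes a file argument
("./randfile"), while this sketch just uses anonymous mappings.)

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define NTHREADS	32
#define MAP_LEN		(4 * 1024 * 1024UL)
#define ITERATIONS	20000

/* Each thread repeatedly maps, touches and unmaps anonymous memory in
 * the shared address space, forcing frequent TLB shootdowns to every
 * CPU the process is running on. */
static void *thrash(void *arg)
{
	(void)arg;
	for (long i = 0; i < ITERATIONS; i++) {
		char *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		for (size_t off = 0; off < MAP_LEN; off += 4096)
			p[off] = 1;		/* populate the pages */
		munmap(p, MAP_LEN);		/* triggers remote TLB flushes */
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, thrash, NULL);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}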

Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-22 Thread Andy Lutomirski
On Mon, May 22, 2017 at 3:43 AM, Vitaly Kuznetsov  wrote:
> Andy Lutomirski  writes:
>
>> On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
>>> The Hyper-V host can suggest that we use a hypercall for remote TLB
>>> flushes; this is supposed to be faster than IPIs.
>>>
>>> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
>>> we need to put the input somewhere in memory, and we don't really want to
>>> allocate memory on each call, so we pre-allocate per-cpu memory areas
>>> on boot. These areas are of fixed size; limit them to an arbitrary number
>>> of 16 (16 gvas can specify 16 * 4096 pages).
>>>
>>> pv_ops patching happens very early, so we need to separate
>>> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
>>>
>>> It is possible and easy to implement local TLB flushing too, and there is
>>> even a hint for that. However, I don't see room for optimization on the
>>> host side, as both the hypercall and a native TLB flush result in a vmexit.
>>> The hint is also not set on modern Hyper-V versions.
>>
>> Why do local flushes exit?
>
> "exist"? I don't know, to be honest. To me it makes no difference from
> hypervisor's point of view as intercepting tlb flushing instructions is
> not any different from implementing a hypercall.
>
> Hyper-V gives its guests 'hints' to indicate if they need to use
> hypercalls for remote/local TLB flush and I don't remember seeing
> 'local' bit set.

What I meant was: why aren't local flushes handled directly in the
guest without exiting to the host?  Or are they?  In principle,
INVPCID should just work, right?  Even reading and writing CR3 back
should work if the hypervisor sets up the magic list of allowed CR3
values, right?

I guess on older CPUs there might not be any way to flush the local
TLB without exiting, but I'm not *that* familiar with the details of
the virtualization extensions.

>
>>
>>> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>>> +struct mm_struct *mm, unsigned long start,
>>> +unsigned long end)
>>> +{
>>
>> What tree will this go through?  I'm about to send a signature change
>> for this function for tip:x86/mm.
>
> I think this was going to get through Greg's char-misc tree but if we
> need to synchronize I think we can push this through x86.

Works for me.  Linus can probably resolve the trivial conflict.  But
going through the x86 tree might make sense here if that's okay with
you.

>
>>
>> Also, how would this interact with PCID?  I have PCID patches that I'm
>> pretty happy with now, and I'm hoping to support PCID in 4.13.
>>
>
> Sorry, I wasn't following this work closely. .flush_tlb_others() hook is
> not going away from pv_mmu_ops, right? In that case we can have both in
> 4.13. Or do you see any other clashes?
>

The issue is that I'm changing the whole flush algorithm.  The main
patch that affects this is here:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pcid&id=a67bff42e1e55666fdbaddf233a484a8773688c1

The interactions between that patch and paravirt flush helpers may be
complex, and it'll need some thought.  PCID makes everything even more
subtle, so just turning off PCID when paravirt flush is involved seems
the safest for now.  Ideally we'd eventually support PCID and paravirt
flushes together (and even eventual native remote flushes assuming
they ever get added).

Also, can you share the benchmark you used for these patches?


RE: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-22 Thread KY Srinivasan


> -Original Message-
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On
> Behalf Of Vitaly Kuznetsov
> Sent: Monday, May 22, 2017 3:44 AM
> To: Andy Lutomirski <l...@kernel.org>
> Cc: Stephen Hemminger <sthem...@microsoft.com>; Jork Loeser
> <jork.loe...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>;
> x...@kernel.org; linux-ker...@vger.kernel.org; Steven Rostedt
> <rost...@goodmis.org>; Ingo Molnar <mi...@redhat.com>; H. Peter Anvin
> <h...@zytor.com>; de...@linuxdriverproject.org; Thomas Gleixner
> <t...@linutronix.de>
> Subject: Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB
> flush
> 
> Andy Lutomirski <l...@kernel.org> writes:
> 
> > On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
> >> The Hyper-V host can suggest that we use a hypercall for remote TLB
> >> flushes; this is supposed to be faster than IPIs.
> >>
> >> Implementation details: to do HvFlushVirtualAddress{Space,List}
> >> hypercalls we need to put the input somewhere in memory, and we don't
> >> really want to allocate memory on each call, so we pre-allocate per-cpu
> >> memory areas on boot. These areas are of fixed size; limit them to an
> >> arbitrary number of 16 (16 gvas can specify 16 * 4096 pages).
> >>
> >> pv_ops patching happens very early, so we need to separate
> >> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
> >>
> >> It is possible and easy to implement local TLB flushing too, and there
> >> is even a hint for that. However, I don't see room for optimization on
> >> the host side, as both the hypercall and a native TLB flush result in a
> >> vmexit. The hint is also not set on modern Hyper-V versions.
> >
> > Why do local flushes exit?
> 
> "exist"? I don't know, to be honest. To me it makes no difference from
> hypervisor's point of view as intercepting tlb flushing instructions is
> not any different from implementing a hypercall.
> 
> Hyper-V gives its guests 'hints' to indicate if they need to use
> hypercalls for remote/local TLB flush and I don't remember seeing
> 'local' bit set.
> 
> Microsoft folks can probably shed some light on why this was added.

As Vitaly has indicated, these are based on hints from the hypervisor.
Not sure what the perf impact might be for the local flush enlightenment.
> 
> >
> >> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
> >> +				    struct mm_struct *mm, unsigned long start,
> >> +				    unsigned long end)
> >> +{
> >
> > What tree will this go through?  I'm about to send a signature change
> > for this function for tip:x86/mm.
> 
> I think this was going to get through Greg's char-misc tree but if we
> need to synchronize I think we can push this through x86.

It will be good to take this through Greg's tree as that would simplify
coordination with other changes.
> 
> >
> > Also, how would this interact with PCID?  I have PCID patches that I'm
> > pretty happy with now, and I'm hoping to support PCID in 4.13.
> >
> 
> Sorry, I wasn't following this work closely. .flush_tlb_others() hook is
> not going away from pv_mmu_ops, right? In that case we can have both in
> 4.13. Or do you see any other clashes?
> 
> --
>   Vitaly


Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-22 Thread Vitaly Kuznetsov
Andy Lutomirski  writes:

> On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
>> Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
>> this is supposed to work faster than IPIs.
>>
>> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
>> we need to put the input somewhere in memory and we don't really want to
>> have memory allocation on each call so we pre-allocate per cpu memory areas
>> on boot. These areas are of fixes size, limit them with an arbitrary number
>> of 16 (16 gvas are able to specify 16 * 4096 pages).
>>
>> pv_ops patching is happening very early so we need to separate
>> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
>>
>> It is possible and easy to implement local TLB flushing too and there is
>> even a hint for that. However, I don't see a room for optimization on the
>> host side as both hypercall and native tlb flush will result in vmexit. The
>> hint is also not set on modern Hyper-V versions.
>
> Why do local flushes exit?

"exist"? I don't know, to be honest. To me it makes no difference from
hypervisor's point of view as intercepting tlb flushing instructions is
not any different from implementing a hypercall.

Hyper-V gives its guests 'hints' to indicate if they need to use
hypercalls for remote/local TLB flush and I don't remember seeing
'local' bit set.

Microsoft folks can probably shed some light on why this was added.

>
>> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>> +struct mm_struct *mm, unsigned long start,
>> +unsigned long end)
>> +{
>
> What tree will this go through?  I'm about to send a signature change
> for this function for tip:x86/mm.

I think this was going to get through Greg's char-misc tree but if we
need to synchronize I think we can push this through x86.

>
> Also, how would this interact with PCID?  I have PCID patches that I'm
> pretty happy with now, and I'm hoping to support PCID in 4.13.
>

Sorry, I wasn't following this work closely. .flush_tlb_others() hook is
not going away from pv_mmu_ops, right? In that case we can have both in
4.13. Or do you see any other clashes?

-- 
  Vitaly


Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-20 Thread Andy Lutomirski

On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:

The Hyper-V host can suggest that we use a hypercall for remote TLB flushes;
this is supposed to be faster than IPIs.

Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
we need to put the input somewhere in memory, and we don't really want to
allocate memory on each call, so we pre-allocate per-cpu memory areas
on boot. These areas are of fixed size; limit them to an arbitrary number
of 16 (16 gvas can specify 16 * 4096 pages).

pv_ops patching happens very early, so we need to separate
hyperv_setup_mmu_ops() and hyper_alloc_mmu().

It is possible and easy to implement local TLB flushing too, and there is
even a hint for that. However, I don't see room for optimization on the
host side, as both the hypercall and a native TLB flush result in a vmexit.
The hint is also not set on modern Hyper-V versions.


Why do local flushes exit?


+static void hyperv_flush_tlb_others(const struct cpumask *cpus,
+   struct mm_struct *mm, unsigned long start,
+   unsigned long end)
+{


What tree will this go through?  I'm about to send a signature change 
for this function for tip:x86/mm.


Also, how would this interact with PCID?  I have PCID patches that I'm 
pretty happy with now, and I'm hoping to support PCID in 4.13.



[PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-19 Thread Vitaly Kuznetsov
The Hyper-V host can suggest that we use a hypercall for remote TLB flushes;
this is supposed to be faster than IPIs.

Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
we need to put the input somewhere in memory, and we don't really want to
allocate memory on each call, so we pre-allocate per-cpu memory areas
on boot. These areas are of fixed size; limit them to an arbitrary number
of 16 (16 gvas can specify 16 * 4096 pages).

pv_ops patching happens very early, so we need to separate
hyperv_setup_mmu_ops() and hyper_alloc_mmu().

It is possible and easy to implement local TLB flushing too, and there is
even a hint for that. However, I don't see room for optimization on the
host side, as both the hypercall and a native TLB flush result in a vmexit.
The hint is also not set on modern Hyper-V versions.

Signed-off-by: Vitaly Kuznetsov 
Acked-by: K. Y. Srinivasan 
Tested-by: Simon Xiao 
Tested-by: Srikanth Myakam 
---
 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |   2 +
 arch/x86/hyperv/mmu.c  | 117 +
 arch/x86/include/asm/mshyperv.h|   3 +
 arch/x86/include/uapi/asm/hyperv.h |   7 +++
 arch/x86/kernel/cpu/mshyperv.c |   1 +
 6 files changed, 131 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/hyperv/mmu.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 171ae09..367a820 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1 +1 @@
-obj-y  := hv_init.o
+obj-y  := hv_init.o mmu.o
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 7fd9cd3..df3252f 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -140,6 +140,8 @@ void hyperv_init(void)
hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
+   hyper_alloc_mmu();
+
/*
 * Register Hyper-V specific clocksource.
 */
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
new file mode 100644
index 000..e3ab9b9
--- /dev/null
+++ b/arch/x86/hyperv/mmu.c
@@ -0,0 +1,117 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
+struct hv_flush_pcpu {
+   __u64 address_space;
+   __u64 flags;
+   __u64 processor_mask;
+   __u64 gva_list[];
+};
+
+static struct hv_flush_pcpu __percpu *pcpu_flush;
+
+static void hyperv_flush_tlb_others(const struct cpumask *cpus,
+   struct mm_struct *mm, unsigned long start,
+   unsigned long end)
+{
+   struct hv_flush_pcpu *flush;
+   unsigned long cur, flags;
+   u64 status = -1ULL;
+   int cpu, vcpu, gva_n, max_gvas;
+
+   if (!pcpu_flush || !hv_hypercall_pg)
+   goto do_native;
+
+   if (cpumask_empty(cpus))
+   return;
+
+   local_irq_save(flags);
+
+   flush = this_cpu_ptr(pcpu_flush);
+
+   if (mm) {
+   flush->address_space = virt_to_phys(mm->pgd);
+   flush->flags = 0;
+   } else {
+   flush->address_space = 0;
+   flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+   }
+
+   flush->processor_mask = 0;
+   if (cpumask_equal(cpus, cpu_present_mask)) {
+   flush->flags |= HV_FLUSH_ALL_PROCESSORS;
+   } else {
+   for_each_cpu(cpu, cpus) {
+   vcpu = hv_cpu_number_to_vp_number(cpu);
+   if (vcpu != -1 && vcpu < 64)
+   flush->processor_mask |= 1 << vcpu;
+   else
+   goto do_native;
+   }
+   }
+
+   /*
+* We can flush not more than max_gvas with one hypercall. Flush the
+* whole address space if we were asked to do more.
+*/
+   max_gvas = (PAGE_SIZE - sizeof(*flush)) / 8;
+
+   if (end == TLB_FLUSH_ALL ||
+   (end && ((end - start)/(PAGE_SIZE*PAGE_SIZE)) > max_gvas)) {
+   if (end == TLB_FLUSH_ALL)
+   flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
+   status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
+flush, NULL);
+   } else {
+   cur = start;
+   gva_n = 0;
+   do {
+   flush->gva_list[gva_n] = cur & PAGE_MASK;
+   /*
+* Lower 12 bits encode the number of additional
+* pages to flush (in addition to the 'cur' page).
+*/
+   if (end >= cur + PAGE_SIZE * PAGE_SIZE)
+