Re: [PATCH v5 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-30 Thread Stephen Hemminger
On Tue, 30 May 2017 19:17:46 +0000
Jork Loeser <jork.loe...@microsoft.com> wrote:

> > -----Original Message-----
> > From: Andy Shevchenko [mailto:andy.shevche...@gmail.com]
> > Sent: Tuesday, May 30, 2017 09:53
> > To: Vitaly Kuznetsov <vkuzn...@redhat.com>
> > Cc: x...@kernel.org; de...@linuxdriverproject.org; linux-
> > ker...@vger.kernel.org; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
> > <haiya...@microsoft.com>; Stephen Hemminger <sthem...@microsoft.com>;
> > Thomas Gleixner <t...@linutronix.de>; Ingo Molnar <mi...@redhat.com>; H.
> > Peter Anvin <h...@zytor.com>; Steven Rostedt <rost...@goodmis.org>; Jork
> > Loeser <jork.loe...@microsoft.com>; Simon Xiao <six...@microsoft.com>;
> > Andy Lutomirski <l...@kernel.org>
> > Subject: Re: [PATCH v5 08/10] x86/hyper-v: use hypercall for remote TLB 
> > flush
> > 
> > On Tue, May 30, 2017 at 2:34 PM, Vitaly Kuznetsov <vkuzn...@redhat.com>
> > wrote:  
> > > +#define HV_FLUSH_ALL_PROCESSORS             0x0001
> > > +#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES 0x0002
> > > +#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY   0x0004
> > > +#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT  0x0008
> > 
> > BIT() ?  
> 
> Certainly a matter of taste. Given that the Hyper-V spec lists these as hex 
> numbers, I find the explicit numbers appropriate.
> 
> Regards,
> Jork

Keep the hex numbers; it makes more sense not to change them, since the rest
of arch/x86/hyperv uses hex values.


Re: [PATCH v5 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-30 Thread Andy Shevchenko
On Tue, May 30, 2017 at 10:17 PM, Jork Loeser  wrote:

>> > +#define HV_FLUSH_ALL_PROCESSORS             0x0001
>> > +#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES 0x0002
>> > +#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY   0x0004
>> > +#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT  0x0008
>>
>> BIT() ?
>
> Certainly a matter of taste.

That's why the question mark is there. Still, the BIT() macros are slightly
easier to parse, which is also less error prone.

> Given that the Hyper-V spec lists these as hex numbers, I find the explicit 
> numbers appropriate.

Yes, but since this introduces a full set of the flags, I don't see any
stylistic disadvantage to BIT() here.
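
For reference, the two spellings are equivalent. A minimal sketch, assuming
the flag values from the patch (BIT() lives in linux/bitops.h in this era;
the *_ALT names are made up for the comparison):

    #include <linux/bitops.h>

    /* Explicit hex, as in the patch (mirrors the Hyper-V spec): */
    #define HV_FLUSH_ALL_PROCESSORS              0x0001
    #define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES  0x0002

    /* The BIT() form suggested in review expands to the same values: */
    #define HV_FLUSH_ALL_PROCESSORS_ALT              BIT(0)
    #define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES_ALT  BIT(1)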

-- 
With Best Regards,
Andy Shevchenko


RE: [PATCH v5 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-30 Thread Jork Loeser
> -----Original Message-----
> From: Andy Shevchenko [mailto:andy.shevche...@gmail.com]
> Sent: Tuesday, May 30, 2017 09:53
> To: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Cc: x...@kernel.org; de...@linuxdriverproject.org; linux-
> ker...@vger.kernel.org; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
> <haiya...@microsoft.com>; Stephen Hemminger <sthem...@microsoft.com>;
> Thomas Gleixner <t...@linutronix.de>; Ingo Molnar <mi...@redhat.com>; H.
> Peter Anvin <h...@zytor.com>; Steven Rostedt <rost...@goodmis.org>; Jork
> Loeser <jork.loe...@microsoft.com>; Simon Xiao <six...@microsoft.com>;
> Andy Lutomirski <l...@kernel.org>
> Subject: Re: [PATCH v5 08/10] x86/hyper-v: use hypercall for remote TLB flush
> 
> On Tue, May 30, 2017 at 2:34 PM, Vitaly Kuznetsov <vkuzn...@redhat.com>
> wrote:
> > +#define HV_FLUSH_ALL_PROCESSORS             0x0001
> > +#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES 0x0002
> > +#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY   0x0004
> > +#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT  0x0008
> 
> BIT() ?

Certainly a matter of taste. Given that the Hyper-V spec lists these as hex 
numbers, I find the explicit numbers appropriate.

Regards,
Jork



Re: [PATCH v5 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-30 Thread Andy Shevchenko
On Tue, May 30, 2017 at 2:34 PM, Vitaly Kuznetsov  wrote:
> The Hyper-V host can suggest that we use a hypercall for remote TLB flushes;
> this is supposed to be faster than IPIs.
>
> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
> we need to put the input somewhere in memory, and we don't really want to
> allocate memory on each call, so we pre-allocate per-cpu memory areas
> on boot. These areas are of fixed size; limit them to an arbitrary number
> of 16 (16 gvas are enough to specify 16 * 4096 pages).
>
> pv_ops patching happens very early, so we need to separate
> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
>
> It is possible and easy to implement local TLB flushing too, and there is
> even a hint for that. However, I don't see room for optimization on the
> host side, as both a hypercall and a native TLB flush result in a vmexit. The
> hint is also not set on modern Hyper-V versions.

> @@ -0,0 +1,121 @@
> +#include 
> +#include 
> +#include 
> +#include 

Alphabetical order, please.

Plus an empty line between the two groups of includes.

> +#include 
> +#include 
> +#include 
> +#include 

Can be alphabetically ordered?

> +/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
> +struct hv_flush_pcpu {
> +   __u64 address_space;
> +   __u64 flags;
> +   __u64 processor_mask;
> +   __u64 gva_list[];
> +};

I don't know what the style is there, but in Linux the __uXX types are
usually reserved for the user-space API.
Is that the case here? Can we use plain uXX types instead?
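
For illustration, the same layout with plain kernel types, as suggested (a
sketch; the struct is kernel-internal here, so nothing else would change):

    #include <linux/types.h>

    /* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
    struct hv_flush_pcpu {
            u64 address_space;
            u64 flags;
            u64 processor_mask;
            u64 gva_list[];
    };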

> +/* Each gva in gva_list encodes up to 4096 pages to flush */
> +#define HV_TLB_FLUSH_UNIT (PAGE_SIZE * PAGE_SIZE)

According to the comment it should rather be
(4096 * PAGE_SIZE)

Yes, theoretically PAGE_SIZE may be something other than 4096.
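
To make the unit explicit, a sketch of the arithmetic behind the comment
(HV_PAGES_PER_GVA is a made-up name; the lower 12 bits of a gva_list entry
count additional pages, so one entry covers at most 4096 pages):

    /* One gva_list entry = page-aligned gva | 12-bit "additional pages"
     * count, so it can name at most 1 + 0xFFF = 4096 pages. */
    #define HV_PAGES_PER_GVA    4096UL
    #define HV_TLB_FLUSH_UNIT   (HV_PAGES_PER_GVA * PAGE_SIZE)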

> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
> +   struct mm_struct *mm, unsigned long start,
> +   unsigned long end)
> +{

> +   if (cpumask_equal(cpus, cpu_present_mask)) {
> +   flush->flags |= HV_FLUSH_ALL_PROCESSORS;
> +   } else {
> +   for_each_cpu(cpu, cpus) {
> +   vcpu = hv_cpu_number_to_vp_number(cpu);

> +   if (vcpu != -1 && vcpu < 64)

Just
if (vcpu < 64)
?

> +   __set_bit(vcpu, (unsigned long *)
> > + &flush->processor_mask);
> +   else
> +   goto do_native;
> +   }
> +   }

> +   if (end == TLB_FLUSH_ALL) {
> +   flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> +   status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
> +flush, NULL);
> +   } else if (end && ((end - start)/HV_TLB_FLUSH_UNIT) > max_gvas) {
> +   status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
> +flush, NULL);

Yes! Looks much cleaner.

> +   } else {
> +   cur = start;
> +   gva_n = 0;
> +   do {
> +   flush->gva_list[gva_n] = cur & PAGE_MASK;

> +   /*
> +* Lower 12 bits encode the number of additional
> +* pages to flush (in addition to the 'cur' page).
> +*/
> +   if (end >= cur + HV_TLB_FLUSH_UNIT)
> +   flush->gva_list[gva_n] |= ~PAGE_MASK;
> +   else if (end > cur)
> +   flush->gva_list[gva_n] |=
> +   (end - cur - 1) >> PAGE_SHIFT;

You can also simplify this slightly by introducing

unsigned long diff = end > cur ? end - cur : 0;

if (diff >= HV_TLB_FLUSH_UNIT)
        flush->gva_list[gva_n] |= ~PAGE_MASK;
else if (diff)
        flush->gva_list[gva_n] |= (diff - 1) >> PAGE_SHIFT;
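
A worked example of the encoding, assuming 4K pages (the numbers are made up
for illustration):

    unsigned long cur  = 0x1000;        /* page-aligned start */
    unsigned long end  = 0x5000;        /* four pages later */
    unsigned long diff = end - cur;     /* 0x4000, < HV_TLB_FLUSH_UNIT */
    /* The entry becomes 0x1000 | ((0x4000 - 1) >> 12) == 0x1000 | 3:
     * the 'cur' page plus three additional pages. */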

> +
> +   cur += HV_TLB_FLUSH_UNIT;

> +   ++gva_n;

Make it a post-increment. Better for the reader (no need to pay extra
attention to why it's a pre-increment).

> +
> +   } while (cur < end);

> +   if (!(status & 0xFFFF))

Not first time I see this magic.

Perhaps

#define STATUS_BLA_BLA_MASK GENMASK(15, 0)

if (!(status & STATUS_BLA_BLA_MASK))

in all appropriate places?
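
A minimal sketch of such a definition (the names are hypothetical; per the
Hyper-V spec the low 16 bits of a hypercall result carry the status code,
and a status of 0 means success):

    #include <linux/bitops.h>

    #define HV_HYPERCALL_RESULT_MASK    GENMASK_ULL(15, 0)

    /* True iff the 16-bit status field reports success (0). */
    static inline bool hv_result_success(u64 status)
    {
            return (status & HV_HYPERCALL_RESULT_MASK) == 0;
    }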

> +#define HV_FLUSH_ALL_PROCESSORS             0x0001
> +#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES 0x0002
> +#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY   0x0004
> +#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT  0x0008

BIT() ?

-- 
With Best Regards,
Andy Shevchenko


[PATCH v5 08/10] x86/hyper-v: use hypercall for remote TLB flush

2017-05-30 Thread Vitaly Kuznetsov
The Hyper-V host can suggest that we use a hypercall for remote TLB flushes;
this is supposed to be faster than IPIs.

Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
we need to put the input somewhere in memory, and we don't really want to
allocate memory on each call, so we pre-allocate per-cpu memory areas
on boot. These areas are of fixed size; limit them to an arbitrary number
of 16 (16 gvas are enough to specify 16 * 4096 pages).

pv_ops patching happens very early, so we need to separate
hyperv_setup_mmu_ops() and hyper_alloc_mmu().

It is possible and easy to implement local TLB flushing too, and there is
even a hint for that. However, I don't see room for optimization on the
host side, as both a hypercall and a native TLB flush result in a vmexit. The
hint is also not set on modern Hyper-V versions.
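
For context, a sketch of what the boot-time pre-allocation described above
could look like (inferred from this description, not quoted from the patch
below; one page-aligned input page per cpu, so a hypercall's input never
crosses a page boundary):

    /* Called from hyperv_init() once the hypercall page is set up. */
    void hyper_alloc_mmu(void)
    {
            /* Assumption: PAGE_SIZE bytes per cpu, PAGE_SIZE aligned. */
            pcpu_flush = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
    }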

Signed-off-by: Vitaly Kuznetsov 
Acked-by: K. Y. Srinivasan 
---
Changes since v4:
- Define HV_TLB_FLUSH_UNIT, use __set_bit(), minor code style changes
  [Andy Shevchenko]
---
 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |   2 +
 arch/x86/hyperv/mmu.c  | 121 +
 arch/x86/include/asm/mshyperv.h|   3 +
 arch/x86/include/uapi/asm/hyperv.h |   7 +++
 arch/x86/kernel/cpu/mshyperv.c |   1 +
 6 files changed, 135 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/hyperv/mmu.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 171ae09..367a820 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1 +1 @@
-obj-y  := hv_init.o
+obj-y  := hv_init.o mmu.o
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 7fd9cd3..df3252f 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -140,6 +140,8 @@ void hyperv_init(void)
hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
+   hyper_alloc_mmu();
+
/*
 * Register Hyper-V specific clocksource.
 */
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
new file mode 100644
index 0000000..8ccd680
--- /dev/null
+++ b/arch/x86/hyperv/mmu.c
@@ -0,0 +1,121 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
+struct hv_flush_pcpu {
+   __u64 address_space;
+   __u64 flags;
+   __u64 processor_mask;
+   __u64 gva_list[];
+};
+
+/* Each gva in gva_list encodes up to 4096 pages to flush */
+#define HV_TLB_FLUSH_UNIT (PAGE_SIZE * PAGE_SIZE)
+
+static struct hv_flush_pcpu __percpu *pcpu_flush;
+
+static void hyperv_flush_tlb_others(const struct cpumask *cpus,
+   struct mm_struct *mm, unsigned long start,
+   unsigned long end)
+{
+   struct hv_flush_pcpu *flush;
+   unsigned long cur, flags;
+   u64 status = U64_MAX;
+   int cpu, vcpu, gva_n, max_gvas;
+
+   if (!pcpu_flush || !hv_hypercall_pg)
+   goto do_native;
+
+   if (cpumask_empty(cpus))
+   return;
+
+   local_irq_save(flags);
+
+   flush = this_cpu_ptr(pcpu_flush);
+
+   if (mm) {
+   flush->address_space = virt_to_phys(mm->pgd);
+   flush->flags = 0;
+   } else {
+   flush->address_space = 0;
+   flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+   }
+
+   flush->processor_mask = 0;
+   if (cpumask_equal(cpus, cpu_present_mask)) {
+   flush->flags |= HV_FLUSH_ALL_PROCESSORS;
+   } else {
+   for_each_cpu(cpu, cpus) {
+   vcpu = hv_cpu_number_to_vp_number(cpu);
+   if (vcpu != -1 && vcpu < 64)
+   __set_bit(vcpu, (unsigned long *)
+ &flush->processor_mask);
+   else
+   goto do_native;
+   }
+   }
+
+   /*
+* We can flush not more than max_gvas with one hypercall. Flush the
+* whole address space if we were asked to do more.
+*/
+   max_gvas = (PAGE_SIZE - sizeof(*flush)) / 8;
+
+   if (end == TLB_FLUSH_ALL) {
+   flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
+   status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
+flush, NULL);
+   } else if (end && ((end - start)/HV_TLB_FLUSH_UNIT) > max_gvas) {
+   status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
+flush, NULL);
+   } else {
+   cur = start;
+   gva_n = 0;
+   do {
+   flush->gva_list[gva_n] = cur & PAGE_MASK;
+   /*
+* Lower 12 bits encode the number of additional
+* pages to flush (in addition to the 'cur' page).
+*/
