from:"Tang Chen"

[PATCH 0/2] Fix node meminfo corruption.

2014-10-31 Thread Tang Chen

There are two problems when calculating node meminfo:

1. When hot-adding a node without onlining any page, MemTotal corrupted.

# hot-add node2 (memory not onlined)
# cat /sys/device/system/node/node2/meminfo
Node 2 MemTotal:   33554432 kB  /* corrupted */
Node 2 MemFree:   0 kB
Node 2 MemUsed:33554432 kB
Node 2 Active:0 kB
..


2. When onlining memory on node2, MemFree of node3 corrupted.

# for ((i = 2048; i  2064; i++)); do echo online_movable  
/sys/devices/system/node/node2/memory$i/state; done
# cat /sys/devices/system/node/node2/meminfo
Node 2 MemTotal:   33554432 kB
Node 2 MemFree:33549092 kB
Node 2 MemUsed:5340 kB
..
# cat /sys/devices/system/node/node3/meminfo
Node 3 MemTotal:  0 kB
Node 3 MemFree:   248 kB/* corrupted */
Node 3 MemUsed:   0 kB
..

This patch-set fixes them.

Tang Chen (2):
  mem-hotplug: Reset node managed pages when hot-adding a new pgdat.
  mem-hotplug: Fix wrong check for zone-pageset initialization in
online_pages().

 include/linux/bootmem.h |  1 +
 include/linux/mm.h  |  1 +
 mm/bootmem.c|  9 +
 mm/memory_hotplug.c | 15 ++-
 mm/nobootmem.c  |  8 +---
 mm/page_alloc.c |  5 +
 6 files changed, 31 insertions(+), 8 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/2] Fix node meminfo corruption.

2014-10-31 Thread Tang Chen

There are two problems when calculating node meminfo:

1. When hot-adding a node without onlining any page, MemTotal corrupted.

# hot-add node2 (memory not onlined)
# cat /sys/device/system/node/node2/meminfo
Node 2 MemTotal:   33554432 kB  /* corrupted */
Node 2 MemFree:   0 kB
Node 2 MemUsed:33554432 kB
Node 2 Active:0 kB
..


2. When onlining memory on node2, MemFree of node3 corrupted.

# for ((i = 2048; i  2064; i++)); do echo online_movable  
/sys/devices/system/node/node2/memory$i/state; done
# cat /sys/devices/system/node/node2/meminfo
Node 2 MemTotal:   33554432 kB
Node 2 MemFree:33549092 kB
Node 2 MemUsed:5340 kB
..
# cat /sys/devices/system/node/node3/meminfo
Node 3 MemTotal:  0 kB
Node 3 MemFree:   248 kB/* corrupted */
Node 3 MemUsed:   0 kB
..

This patch-set fixes them.

Tang Chen (2):
  mem-hotplug: Reset node managed pages when hot-adding a new pgdat.
  mem-hotplug: Fix wrong check for zone-pageset initialization in
online_pages().

 include/linux/bootmem.h |  1 +
 include/linux/mm.h  |  1 +
 mm/bootmem.c|  9 +
 mm/memory_hotplug.c | 15 ++-
 mm/nobootmem.c  |  8 +---
 mm/page_alloc.c |  5 +
 6 files changed, 31 insertions(+), 8 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] mem-hotplug: Fix wrong check for zone-pageset initialization in online_pages().

2014-10-31 Thread Tang Chen

When we are doing memory hot-add, the following functions are called:

add_memory()
|-- hotadd_new_pgdat()
 |-- free_area_init_node()
  |-- free_area_init_core()
   |-- zone-present_pages = realsize;   /* 1. zone is 
populated */
|-- zone_pcp_init()
 |-- zone-pageset = boot_pageset;  /* 2. 
zone-pageset is set to boot_pageset */

There are two problems here:
1. Zones could be populated before any memory is onlined.
2. All the zones on a newly added node have the same pageset pointing to 
boot_pageset.

The above two problems will result in the following problem:
When we online memory on one node, e.g node2, the following code is executed:

online_pages()
{
..
if (!populated_zone(zone)) {
need_zonelists_rebuild = 1;
build_all_zonelists(NULL, zone);
}
..
}

Because of problem 1, the zone has been populated, and the build_all_zonelists()
  will never called. zone-pageset won't be updated.
Because of problem 2, All the zones on a newly added node have the same pageset
  pointing to boot_pageset.
And as a result, when we online memory on node2, node3's meminfo will corrupt.
Pages on node2 may be freed to node3.

# for ((i = 2048; i  2064; i++)); do echo online_movable  
/sys/devices/system/node/node2/memory$i/state; done
# cat /sys/devices/system/node/node2/meminfo
Node 2 MemTotal:   33554432 kB
Node 2 MemFree:33549092 kB
Node 2 MemUsed:5340 kB
..
# cat /sys/devices/system/node/node3/meminfo
Node 3 MemTotal:  0 kB
Node 3 MemFree:   248 kB/* corrupted */
Node 3 MemUsed:   0 kB
..

We have to populate some zones before onlining memory, otherwise no memory 
could be onlined.
So when onlining pages, we should also check if zone-pageset is pointing to 
boot_pageset.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 include/linux/mm.h  | 1 +
 mm/memory_hotplug.c | 6 +-
 mm/page_alloc.c | 5 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 02d11ee..83e6505 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1732,6 +1732,7 @@ void warn_alloc_failed(gfp_t gfp_mask, int order, const 
char *fmt, ...);
 
 extern void setup_per_cpu_pageset(void);
 
+extern bool zone_pcp_initialized(struct zone *zone);
 extern void zone_pcp_update(struct zone *zone);
 extern void zone_pcp_reset(struct zone *zone);
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3ab01b2..bc0de0f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1013,9 +1013,13 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages, int online_typ
 * If this zone is not populated, then it is not in zonelist.
 * This means the page allocator ignores this zone.
 * So, zonelist must be updated after online.
+*
+* If this zone is populated, zone-pageset could be initialized
+* to boot_pageset for the first time a node is added. If so,
+* zone-pageset should be allocated.
 */
mutex_lock(zonelists_mutex);
-   if (!populated_zone(zone)) {
+   if (!populated_zone(zone) || !zone_pcp_initialized(zone)) {
need_zonelists_rebuild = 1;
build_all_zonelists(NULL, zone);
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 736d8e1..4ff1540 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6456,6 +6456,11 @@ void __meminit zone_pcp_update(struct zone *zone)
 }
 #endif
 
+bool zone_pcp_initialized(struct zone *zone)
+{
+   return (zone-pageset != boot_pageset);
+}
+
 void zone_pcp_reset(struct zone *zone)
 {
unsigned long flags;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] mem-hotplug: Reset node managed pages when hot-adding a new pgdat.

2014-10-31 Thread Tang Chen

In free_area_init_core(), zone-managed_pages is set to an approximate
value for lowmem, and will be adjusted when the bootmem allocator frees
pages into the buddy system. But free_area_init_core() is also called
by hotadd_new_pgdat() when hot-adding memory. As a result, zone-managed_pages
of the newly added node's pgdat is set to an approximate value in the
very beginning. Even if the memory on that node has node been onlined,
/sys/device/system/node/nodeXXX/meminfo has wrong value.

hot-add node2 (memory not onlined)
cat /sys/device/system/node/node2/meminfo
Node 2 MemTotal:   33554432 kB
Node 2 MemFree:   0 kB
Node 2 MemUsed:33554432 kB
Node 2 Active:0 kB

This patch fixes this problem by reset node managed pages to 0 after hot-adding
a new node.

1. Move reset_managed_pages_done from reset_node_managed_pages() to 
reset_all_zones_managed_pages()
2. Make reset_node_managed_pages() non-static
3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat is 
initialized

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 include/linux/bootmem.h | 1 +
 mm/bootmem.c| 9 +
 mm/memory_hotplug.c | 9 +
 mm/nobootmem.c  | 8 +---
 4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 4e2bd4c..0995c2d 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -46,6 +46,7 @@ extern unsigned long init_bootmem_node(pg_data_t *pgdat,
 extern unsigned long init_bootmem(unsigned long addr, unsigned long memend);
 
 extern unsigned long free_all_bootmem(void);
+extern void reset_node_managed_pages(pg_data_t *pgdat);
 extern void reset_all_zones_managed_pages(void);
 
 extern void free_bootmem_node(pg_data_t *pgdat,
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 8a000ce..477be69 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -243,13 +243,10 @@ static unsigned long __init 
free_all_bootmem_core(bootmem_data_t *bdata)
 
 static int reset_managed_pages_done __initdata;
 
-static inline void __init reset_node_managed_pages(pg_data_t *pgdat)
+void reset_node_managed_pages(pg_data_t *pgdat)
 {
struct zone *z;
 
-   if (reset_managed_pages_done)
-   return;
-
for (z = pgdat-node_zones; z  pgdat-node_zones + MAX_NR_ZONES; z++)
z-managed_pages = 0;
 }
@@ -258,8 +255,12 @@ void __init reset_all_zones_managed_pages(void)
 {
struct pglist_data *pgdat;
 
+   if (reset_managed_pages_done)
+   return;
+
for_each_online_pgdat(pgdat)
reset_node_managed_pages(pgdat);
+
reset_managed_pages_done = 1;
 }
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 29d8693..3ab01b2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -31,6 +31,7 @@
 #include linux/stop_machine.h
 #include linux/hugetlb.h
 #include linux/memblock.h
+#include linux/bootmem.h
 
 #include asm/tlbflush.h
 
@@ -1096,6 +1097,14 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 
start)
build_all_zonelists(pgdat, NULL);
mutex_unlock(zonelists_mutex);
 
+   /*
+*  zone-managed_pages is set to an approximate value in
+*  free_area_init_core(), which will cause
+*  /sys/device/system/node/nodeX/meminfo has wrong data.
+*  So reset it to 0 before any memory is onlined.
+*/
+   reset_node_managed_pages(pgdat);
+
return pgdat;
 }
 
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 7c7ab32..90b5046 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -145,12 +145,10 @@ static unsigned long __init 
free_low_memory_core_early(void)
 
 static int reset_managed_pages_done __initdata;
 
-static inline void __init reset_node_managed_pages(pg_data_t *pgdat)
+void reset_node_managed_pages(pg_data_t *pgdat)
 {
struct zone *z;
 
-   if (reset_managed_pages_done)
-   return;
for (z = pgdat-node_zones; z  pgdat-node_zones + MAX_NR_ZONES; z++)
z-managed_pages = 0;
 }
@@ -159,8 +157,12 @@ void __init reset_all_zones_managed_pages(void)
 {
struct pglist_data *pgdat;
 
+   if (reset_managed_pages_done)
+   return;
+
for_each_online_pgdat(pgdat)
reset_node_managed_pages(pgdat);
+
reset_managed_pages_done = 1;
 }
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-25 Thread Tang Chen



On 09/25/2014 09:43 PM, Paolo Bonzini wrote:

Il 25/09/2014 10:19, Tang Chen ha scritto:

Hi Paolo,

I'd like to help to test the patches.
Would you please tell me what is the best way to test this patch-set ?

How did _you_ test the patches?...


I just added "-cpu xxx,-x2apic" option, start the guest, using numactl 
to bind it
to a node (eg: node1). And offline the pages on node1. And repeat above 
thing

for a while. And it is OK.

I'm not sure if this is enough.

Thanks.




I think ept page is being used by regular guest.
Is adding "-cpu xxx,-x2apic option" able to make sure guest is using
apic page ?

Yes.

Paolo
.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-25 Thread Tang Chen


Hi Paolo,

I'd like to help to test the patches.
Would you please tell me what is the best way to test this patch-set ?

I think ept page is being used by regular guest.
Is adding "-cpu xxx,-x2apic option" able to make sure guest is using
apic page ?

Thanks.

On 09/24/2014 04:20 PM, Paolo Bonzini wrote:

Il 24/09/2014 09:57, Tang Chen ha scritto:

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

This patch-set is based on Linux 3.17.0-rc5.

NOTE: Tested with -cpu xxx,-x2apic option.
   But since nested vm pins some other pages in memory, if user uses nested
   vm, memory hot-remove will not work.

Change log v7->v8:
1. Patch 1/9~3/9 were applied to kvm/queue by Paolo Bonzini 
.
Just resend them, no changes.
2. Removed previous patch 4/9, which added unnecessary hook 
has_secondary_apic_access().
3. Set kvm_x86_ops->set_apic_access_page_addr to NULL when hardware had no 
flexpriority
functionality which actually exists only on x86.
4. Moved declaration of kvm_arch_mmu_notifier_invalidate_page() to 
arch/*/include/asm/kvm_host.h.
5. Removed useless set_apic_access_page_addr() hook for svm.

Tang Chen (8):
   kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
   kvm: Remove ept_identity_pagetable from struct kvm_arch.
   kvm: Make init_rmode_identity_map() return 0 on success.
   kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().
   kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and
 make it non-static.
   kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
 running.
   kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access
 migration.
   kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

  arch/arm/include/asm/kvm_host.h |   5 ++
  arch/arm64/include/asm/kvm_host.h   |   5 ++
  arch/ia64/include/asm/kvm_host.h|   7 ++
  arch/mips/include/asm/kvm_host.h|   6 ++
  arch/powerpc/include/asm/kvm_host.h |   5 ++
  arch/s390/include/asm/kvm_host.h|   8 +++
  arch/x86/include/asm/kvm_host.h |   7 +-
  arch/x86/kvm/svm.c  |   3 +-
  arch/x86/kvm/vmx.c  | 130 
  arch/x86/kvm/x86.c  |  45 +++--
  include/linux/kvm_host.h|   2 +
  virt/kvm/kvm_main.c |  13 ++--
  12 files changed, 180 insertions(+), 56 deletions(-)


Thanks for your persistence!  The patches look good, I'll test them and
apply to kvm/queue.

Paolo
.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-25 Thread Tang Chen



On 09/25/2014 09:43 PM, Paolo Bonzini wrote:

Il 25/09/2014 10:19, Tang Chen ha scritto:

Hi Paolo,

I'd like to help to test the patches.
Would you please tell me what is the best way to test this patch-set ?

How did _you_ test the patches?...


I just added -cpu xxx,-x2apic option, start the guest, using numactl 
to bind it
to a node (eg: node1). And offline the pages on node1. And repeat above 
thing

for a while. And it is OK.

I'm not sure if this is enough.

Thanks.




I think ept page is being used by regular guest.
Is adding -cpu xxx,-x2apic option able to make sure guest is using
apic page ?

Yes.

Paolo
.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-25 Thread Tang Chen


Hi Paolo,

I'd like to help to test the patches.
Would you please tell me what is the best way to test this patch-set ?

I think ept page is being used by regular guest.
Is adding -cpu xxx,-x2apic option able to make sure guest is using
apic page ?

Thanks.

On 09/24/2014 04:20 PM, Paolo Bonzini wrote:

Il 24/09/2014 09:57, Tang Chen ha scritto:

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

This patch-set is based on Linux 3.17.0-rc5.

NOTE: Tested with -cpu xxx,-x2apic option.
   But since nested vm pins some other pages in memory, if user uses nested
   vm, memory hot-remove will not work.

Change log v7-v8:
1. Patch 1/9~3/9 were applied to kvm/queue by Paolo Bonzini 
pbonz...@redhat.com.
Just resend them, no changes.
2. Removed previous patch 4/9, which added unnecessary hook 
has_secondary_apic_access().
3. Set kvm_x86_ops-set_apic_access_page_addr to NULL when hardware had no 
flexpriority
functionality which actually exists only on x86.
4. Moved declaration of kvm_arch_mmu_notifier_invalidate_page() to 
arch/*/include/asm/kvm_host.h.
5. Removed useless set_apic_access_page_addr() hook for svm.

Tang Chen (8):
   kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
   kvm: Remove ept_identity_pagetable from struct kvm_arch.
   kvm: Make init_rmode_identity_map() return 0 on success.
   kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().
   kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and
 make it non-static.
   kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
 running.
   kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access
 migration.
   kvm, mem-hotplug: Unpin and remove kvm_arch-apic_access_page.

  arch/arm/include/asm/kvm_host.h |   5 ++
  arch/arm64/include/asm/kvm_host.h   |   5 ++
  arch/ia64/include/asm/kvm_host.h|   7 ++
  arch/mips/include/asm/kvm_host.h|   6 ++
  arch/powerpc/include/asm/kvm_host.h |   5 ++
  arch/s390/include/asm/kvm_host.h|   8 +++
  arch/x86/include/asm/kvm_host.h |   7 +-
  arch/x86/kvm/svm.c  |   3 +-
  arch/x86/kvm/vmx.c  | 130 
  arch/x86/kvm/x86.c  |  45 +++--
  include/linux/kvm_host.h|   2 +
  virt/kvm/kvm_main.c |  13 ++--
  12 files changed, 180 insertions(+), 56 deletions(-)


Thanks for your persistence!  The patches look good, I'll test them and
apply to kvm/queue.

Paolo
.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-24 Thread Tang Chen



On 09/24/2014 04:20 PM, Paolo Bonzini wrote:

Il 24/09/2014 09:57, Tang Chen ha scritto:

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

This patch-set is based on Linux 3.17.0-rc5.

NOTE: Tested with -cpu xxx,-x2apic option.
   But since nested vm pins some other pages in memory, if user uses nested
   vm, memory hot-remove will not work.

Change log v7->v8:
1. Patch 1/9~3/9 were applied to kvm/queue by Paolo Bonzini 
.
Just resend them, no changes.
2. Removed previous patch 4/9, which added unnecessary hook 
has_secondary_apic_access().
3. Set kvm_x86_ops->set_apic_access_page_addr to NULL when hardware had no 
flexpriority
functionality which actually exists only on x86.
4. Moved declaration of kvm_arch_mmu_notifier_invalidate_page() to 
arch/*/include/asm/kvm_host.h.
5. Removed useless set_apic_access_page_addr() hook for svm.

Tang Chen (8):
   kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
   kvm: Remove ept_identity_pagetable from struct kvm_arch.
   kvm: Make init_rmode_identity_map() return 0 on success.
   kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().
   kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and
 make it non-static.
   kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
 running.
   kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access
 migration.
   kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

  arch/arm/include/asm/kvm_host.h |   5 ++
  arch/arm64/include/asm/kvm_host.h   |   5 ++
  arch/ia64/include/asm/kvm_host.h|   7 ++
  arch/mips/include/asm/kvm_host.h|   6 ++
  arch/powerpc/include/asm/kvm_host.h |   5 ++
  arch/s390/include/asm/kvm_host.h|   8 +++
  arch/x86/include/asm/kvm_host.h |   7 +-
  arch/x86/kvm/svm.c  |   3 +-
  arch/x86/kvm/vmx.c  | 130 
  arch/x86/kvm/x86.c  |  45 +++--
  include/linux/kvm_host.h|   2 +
  virt/kvm/kvm_main.c |  13 ++--
  12 files changed, 180 insertions(+), 56 deletions(-)


Thanks for your persistence!  The patches look good, I'll test them and
apply to kvm/queue.


Sure, thank you very much. :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v8 3/8] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-24 Thread Tang Chen

In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and make init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4fb84ad..72a0470 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(>slots_lock);
 
-   if (likely(kvm->arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(>srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r < 0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
, i * sizeof(tmp), sizeof(tmp));
-   if (r < 0)
+   if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(>srcu, idx);
-
 out2:
mutex_unlock(>slots_lock);
return ret;
@@ -7604,11 +7601,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
*kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   /* Set err to -ENOMEM to handle memory allocation error. */
+   err = -ENOMEM;
+
vmx->guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx->guest_msrs[0])
 > PAGE_SIZE);
 
-   err = -ENOMEM;
if (!vmx->guest_msrs) {
goto uninit_vcpu;
}
@@ -7641,8 +7640,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
if (!kvm->arch.ept_identity_map_addr)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
-   err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   err = init_rmode_identity_map(kvm);
+   if (err < 0)
goto free_vmcs;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v8 5/8] kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static.

2014-09-24 Thread Tang Chen

Since different architectures need different handling, we will add some arch 
specific
code later. The code may need to make cpu requests outside kvm_main.c, so make 
it
non-static and rename it to kvm_make_all_cpus_request().

Reviewed-by: Paolo Bonzini 
Signed-off-by: Tang Chen 
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c23236a..73de13c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -580,6 +580,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 void kvm_make_mclock_inprogress_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request(struct kvm *kvm);
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
 
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..0f8b6f6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -152,7 +152,7 @@ static void ack_flush(void *_completed)
 {
 }
 
-static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 {
int i, cpu, me;
cpumask_var_t cpus;
@@ -189,7 +189,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
long dirty_count = kvm->tlbs_dirty;
 
smp_mb();
-   if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
+   if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
++kvm->stat.remote_tlb_flush;
cmpxchg(>tlbs_dirty, dirty_count, 0);
 }
@@ -197,17 +197,17 @@ EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
 
 void kvm_reload_remote_mmus(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
 }
 
 void kvm_make_mclock_inprogress_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
 }
 
 void kvm_make_scan_ioapic_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
 }
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v8 8/8] kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

2014-09-24 Thread Tang Chen

To make apic access page migratable, we do not pin it in memory now.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpu will access
kvm_arch->apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch->apic_access_page anymore. Since
apic access page is not pinned in memory now, we can remove
kvm_arch->apic_access_page. When we need to write its physical address
into vmcs, use gfn_to_page() to get its page struct, which will also
pin it. And unpin it after then.

Suggested-by: Gleb Natapov 
Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 17 ++---
 arch/x86/kvm/x86.c  | 16 ++--
 3 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 408b944..e27e1f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 40bb9fc..4069075 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4010,7 +4010,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(>slots_lock);
-   if (kvm->arch.apic_access_page)
+   if (kvm->arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4026,7 +4026,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm->arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm->arch.apic_access_page_done = true;
 out:
mutex_unlock(>slots_lock);
return r;
@@ -4541,9 +4546,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_write32(TPR_THRESHOLD, 0);
}
 
-   if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-
page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
+   /* Reload apic access page in case it was migrated. */
+   kvm_vcpu_reload_apic_access_page(vcpu);
 
if (vmx_vm_has_apicv(vcpu->kvm))
memset(>pi_desc, 0, sizeof(struct pi_desc));
@@ -8026,8 +8030,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e042ef6..f7cbc36 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,6 +5991,8 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page = NULL;
+
/*
 * If platform doesn't have 2nd exec virtualize apic access affinity,
 * set_apic_access_page_addr() will be set to NULL in hardware_setup(),
@@ -6004,10 +6006,14 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu 
*vcpu)
 * migrated, GUP will wait till the migrate entry is replaced
 * with the new pte entry pointing to the new page.
 */
-   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
-   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-   kvm_x86_ops->set_apic_access_page_addr(vcpu,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu, page_to_phys(page));
+
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
@@ -7272,8 +7278,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm->arch.vpic);
kfree(kvm->arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm->arch.apic_access_page)
-   put_page(kvm->arch.apic_access_page);
kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 }

[PATCH v8 4/8] kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().

2014-09-24 Thread Tang Chen

We wants to migrate apic access page pinned by guest (L1 and L2) to make memory
hotplug available.

There are two situations need to be handled for apic access page used by L2 vm:
1. L1 prepares a separate apic access page for L2.

   L2 pins a lot of pages in memory. Even if we can migrate apic access page,
   memory hotplug is not available when L2 is running. So do not handle this
   now. Migrate L1's apic access page only.

2. L1 and L2 share one apic access page.

   Since we will migrate L1's apic access page, we should do some handling when
   migration happens in the following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch handles 1) and 2).

Since we don't handle L1 ans L2 have separate apic access pages situation,
when we update vmcs, we need to check if we are in L2 and if L1 prepares an
non-shared apic access page for L2. We do this in 
vmx_set_apic_access_page_addr()
when trying to set new apic access page's hpa like this:

   if (!is_guest_mode(vcpu) ||
   !(vmx->nested.current_vmcs12->secondary_vm_exec_control &
 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/vmx.c  | 39 ++-
 arch/x86/kvm/x86.c  | 23 +++
 include/linux/kvm_host.h|  1 +
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..582cd0f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72a0470..1411bab 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3108,9 +3108,17 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_unrestricted_guest())
enable_unrestricted_guest = 0;
 
-   if (!cpu_has_vmx_flexpriority())
+   if (!cpu_has_vmx_flexpriority()) {
flexpriority_enabled = 0;
 
+   /*
+* set_apic_access_page_addr() is used to reload apic access
+* page in case it is migrated for memory hotplug reason. If
+* platform doesn't have this affinity, no need to handle it.
+*/
+   kvm_x86_ops->set_apic_access_page_addr = NULL;
+   }
+
if (!cpu_has_vmx_tpr_shadow())
kvm_x86_ops->update_cr8_intercept = NULL;
 
@@ -7090,6 +7098,34 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   /*
+* This function is used to reload apic access page in case it is
+* migrated for memory hotplug reason. And only L1 and L2 share the
+* same apic access page situation is handled.
+*
+* 1) If vcpu is not in guest mode (in L1), reload the page for L1.
+*And L2's page will be reloaded in the next L1->L2 entry by
+*prepare_vmcs02().
+*
+* 2) If vcpu is in guest mode (in L2), but L1 didn't not prepare an
+*apic access page for L2 (current_vmcs12->secondary_vm_exec_control
+*does not have SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES bit set),
+*reload the page for L2.
+*And L1's page will be reloaded in the next L2->L1 exit.
+*
+* 3) Otherwise, do nothing. L2's specific apic access page is still
+*pinned in memory, and not hotpluggable.
+*/
+   if (!is_guest_mode(vcpu) ||
+   !(vmx->nested.current_vmcs12->secondary_vm_exec_control &
+ SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8909,6 +8945,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_ir

[PATCH v8 6/8] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-24 Thread Tang Chen

We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch handles 3).

In L0->L2 entry, L2's vmcs will be updated in prepare_vmcs02() called by
nested_vm_run(). So we need to do nothing.

In L2->L1 exit, this patch requests apic access page reload in L2->L1 vmexit.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx.c  | 6 ++
 arch/x86/kvm/x86.c  | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 582cd0f..66480fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1046,6 +1046,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1411bab..40bb9fc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8826,6 +8826,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
}
 
/*
+* We are now running in L2, mmu_notifier will force to reload the
+* page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
+*/
+   kvm_vcpu_reload_apic_access_page(vcpu);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f0c99a..c064ca6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
/*
 * If platform doesn't have 2nd exec virtualize apic access affinity,
@@ -6009,6 +6009,7 @@ static void kvm_vcpu_reload_apic_access_page(struct 
kvm_vcpu *vcpu)
kvm_x86_ops->set_apic_access_page_addr(vcpu,
page_to_phys(vcpu->kvm->arch.apic_access_page));
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v8 7/8] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-24 Thread Tang Chen

We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch force a L1->L0 exit or L2->L0 exit when shared apic access page is
migrated using mmu notifier. Since apic access page is only used on intel x86,
this is arch specific code.

Signed-off-by: Tang Chen 
---
 arch/arm/include/asm/kvm_host.h |  5 +
 arch/arm64/include/asm/kvm_host.h   |  5 +
 arch/ia64/include/asm/kvm_host.h|  7 +++
 arch/mips/include/asm/kvm_host.h|  6 ++
 arch/powerpc/include/asm/kvm_host.h |  5 +
 arch/s390/include/asm/kvm_host.h|  8 
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c  | 11 +++
 virt/kvm/kvm_main.c |  3 +++
 9 files changed, 52 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..f5b3f53 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -182,6 +182,11 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e10c45a..594873a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -192,6 +192,11 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index db95f57..282e71f 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -574,6 +574,13 @@ static inline struct kvm_pt_regs *vcpu_regs(struct 
kvm_vcpu *v)
return (struct kvm_pt_regs *) ((unsigned long) v + KVM_STK_OFFSET) - 1;
 }
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
+
 typedef int kvm_vmm_entry(void);
 typedef void kvm_tramp_entry(union context *host, union context *guest);
 
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 7a3fc67..4826d29 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -767,5 +767,11 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t 
*opc,
 extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
 extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 98d9dd5..e40402d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -61,6 +61,11 @@ extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+
 #define HPTEG_CACHE_NUM(1 << 15)
 #define HPTEG_HASH_BITS_PTE13
 #define HPTEG_HASH_BITS_PTE_LONG   12
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 773bef7..e4d6708 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -450,4 +450,12 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern char sie_exit;
+
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arc

[PATCH v8 2/8] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-24 Thread Tang Chen

kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 47 +++--
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..4fb84ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
+
+   /* Protect kvm->arch.ept_identity_pagetable_done. */
+   mutex_lock(>slots_lock);
+
+   if (likely(kvm->arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm->arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(>srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(>srcu, idx);
+
+out2:
+   mutex_unlock(>slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,20 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /* Called with kvm->slots_lock held. */
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(>slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
+   BUG_ON(kvm->arch.ept_identity_pagetable_done);
+
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, _userspace_mem);
-   if (r)
-   goto out;
-
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
 
-   kvm->arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(>slots_lock);
return r;
 }
 
@@ -7643,8 +7642,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7239,8 +7239,6 @@ void kvm_ar

[PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-24 Thread Tang Chen

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

This patch-set is based on Linux 3.17.0-rc5.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v7->v8:
1. Patch 1/9~3/9 were applied to kvm/queue by Paolo Bonzini 
.
   Just resend them, no changes.
2. Removed previous patch 4/9, which added unnecessary hook 
has_secondary_apic_access().
3. Set kvm_x86_ops->set_apic_access_page_addr to NULL when hardware had no 
flexpriority
   functionality which actually exists only on x86. 
4. Moved declaration of kvm_arch_mmu_notifier_invalidate_page() to 
arch/*/include/asm/kvm_host.h.
5. Removed useless set_apic_access_page_addr() hook for svm.

Tang Chen (8):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().
  kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and
make it non-static.
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access
migration.
  kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

 arch/arm/include/asm/kvm_host.h |   5 ++
 arch/arm64/include/asm/kvm_host.h   |   5 ++
 arch/ia64/include/asm/kvm_host.h|   7 ++
 arch/mips/include/asm/kvm_host.h|   6 ++
 arch/powerpc/include/asm/kvm_host.h |   5 ++
 arch/s390/include/asm/kvm_host.h|   8 +++
 arch/x86/include/asm/kvm_host.h |   7 +-
 arch/x86/kvm/svm.c  |   3 +-
 arch/x86/kvm/vmx.c  | 130 
 arch/x86/kvm/x86.c  |  45 +++--
 include/linux/kvm_host.h|   2 +
 virt/kvm/kvm_main.c |  13 ++--
 12 files changed, 180 insertions(+), 56 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v8 1/8] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-24 Thread Tang Chen

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee0, which is also the address 
of
apic access page. So use this macro.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(>vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee0ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, _userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(>vcpu, 0);
-   apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(>vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-24 Thread Tang Chen



On 09/24/2014 03:08 PM, Jan Kiszka wrote:

On 2014-09-24 04:09, Tang Chen wrote:

Hi Paolo,

I'm not sure if this patch is following your comment. Please review.
And all the other comments are followed. If this patch is OK, I'll
send v8 soon.

Thanks.

We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
   vmcs in the next L1->L2 entry.

2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
   L0->L1 entry and L2's vmcs in the next L1->L2 entry.

3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
   L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch force a L1->L0 exit or L2->L0 exit when shared apic access page is
migrated using mmu notifier. Since apic access page is only used on intel x86,
this is arch specific code.
---
  arch/arm/include/asm/kvm_host.h |  6 ++
  arch/arm64/include/asm/kvm_host.h   |  6 ++
  arch/ia64/include/asm/kvm_host.h|  8 
  arch/mips/include/asm/kvm_host.h|  7 +++
  arch/powerpc/include/asm/kvm_host.h |  6 ++
  arch/s390/include/asm/kvm_host.h|  9 +
  arch/x86/include/asm/kvm_host.h |  2 ++
  arch/x86/kvm/x86.c  | 11 +++
  virt/kvm/kvm_main.c |  3 +++
  9 files changed, 58 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..79bbf7d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -182,6 +182,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
  }
  
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,

+unsigned long address)
+{
+   return;

Redundant return, more cases below.


OK, will remove it. Thanks.



Jan


+}
+
  struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
  struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
  
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h

index e10c45a..ee89fad 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -192,6 +192,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
  }
  
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,

+unsigned long address)
+{
+   return;
+}
+
  struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
  struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
  
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h

index db95f57..326ac55 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -574,6 +574,14 @@ static inline struct kvm_pt_regs *vcpu_regs(struct 
kvm_vcpu *v)
return (struct kvm_pt_regs *) ((unsigned long) v + KVM_STK_OFFSET) - 1;
  }
  
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER

+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
+
  typedef int kvm_vmm_entry(void);
  typedef void kvm_tramp_entry(union context *host, union context *guest);
  
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h

index 7a3fc67..c392705 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -767,5 +767,12 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t 
*opc,
  extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
  extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
  
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER

+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
  
  #endif /* __MIPS_KVM_HOST_H__ */

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 98d9dd5..c16a573 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -61,6 +61,12 @@ extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
  extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
  extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
  
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,

+unsigned long address)
+{
+   return;
+}
+
  #define HPTEG_CACHE_NUM   (1 << 15)
  #define HPTEG_HASH_BITS_PTE   13
  #define HPTEG_HASH_BITS_PTE_L

Re: [PATCH 1/1] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-24 Thread Tang Chen



On 09/24/2014 03:08 PM, Jan Kiszka wrote:

On 2014-09-24 04:09, Tang Chen wrote:

Hi Paolo,

I'm not sure if this patch is following your comment. Please review.
And all the other comments are followed. If this patch is OK, I'll
send v8 soon.

Thanks.

We are handling L1 and L2 share one apic access page situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

1) when L0 is running: Update L1's vmcs in the next L0-L1 entry and L2's
   vmcs in the next L1-L2 entry.

2) when L1 is running: Force a L1-L0 exit, update L1's vmcs in the next
   L0-L1 entry and L2's vmcs in the next L1-L2 entry.

3) when L2 is running: Force a L2-L0 exit, update L2's vmcs in the next
   L0-L2 entry and L1's vmcs in the next L2-L1 exit.

This patch force a L1-L0 exit or L2-L0 exit when shared apic access page is
migrated using mmu notifier. Since apic access page is only used on intel x86,
this is arch specific code.
---
  arch/arm/include/asm/kvm_host.h |  6 ++
  arch/arm64/include/asm/kvm_host.h   |  6 ++
  arch/ia64/include/asm/kvm_host.h|  8 
  arch/mips/include/asm/kvm_host.h|  7 +++
  arch/powerpc/include/asm/kvm_host.h |  6 ++
  arch/s390/include/asm/kvm_host.h|  9 +
  arch/x86/include/asm/kvm_host.h |  2 ++
  arch/x86/kvm/x86.c  | 11 +++
  virt/kvm/kvm_main.c |  3 +++
  9 files changed, 58 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..79bbf7d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -182,6 +182,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
  }
  
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,

+unsigned long address)
+{
+   return;

Redundant return, more cases below.


OK, will remove it. Thanks.



Jan


+}
+
  struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
  struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
  
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h

index e10c45a..ee89fad 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -192,6 +192,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
  }
  
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,

+unsigned long address)
+{
+   return;
+}
+
  struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
  struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
  
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h

index db95f57..326ac55 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -574,6 +574,14 @@ static inline struct kvm_pt_regs *vcpu_regs(struct 
kvm_vcpu *v)
return (struct kvm_pt_regs *) ((unsigned long) v + KVM_STK_OFFSET) - 1;
  }
  
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER

+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
+
  typedef int kvm_vmm_entry(void);
  typedef void kvm_tramp_entry(union context *host, union context *guest);
  
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h

index 7a3fc67..c392705 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -767,5 +767,12 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t 
*opc,
  extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
  extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
  
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER

+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
  
  #endif /* __MIPS_KVM_HOST_H__ */

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 98d9dd5..c16a573 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -61,6 +61,12 @@ extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
  extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
  extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
  
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,

+unsigned long address)
+{
+   return;
+}
+
  #define HPTEG_CACHE_NUM   (1  15)
  #define HPTEG_HASH_BITS_PTE   13
  #define HPTEG_HASH_BITS_PTE_LONG  12
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390

[PATCH v8 1/8] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-24 Thread Tang Chen

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee0, which is also the address 
of
apic access page. So use this macro.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm-asid_generation = 0;
init_vmcb(svm);
 
-   svm-vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   svm-vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(svm-vcpu))
svm-vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee0ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx-vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vmx-vcpu, 0);
-   apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(vmx-vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v8 2/8] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-24 Thread Tang Chen

kvm_arch-ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch-ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch-ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 47 +++--
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..4fb84ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm-arch.ept_identity_pagetable)) {
-   printk(KERN_ERR EPT: identity-mapping pagetable 
-   haven't been allocated!\n);
-   return 0;
+
+   /* Protect kvm-arch.ept_identity_pagetable_done. */
+   mutex_lock(kvm-slots_lock);
+
+   if (likely(kvm-arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm-arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm-arch.ept_identity_map_addr  PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(kvm-srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r  0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(kvm-srcu, idx);
+
+out2:
+   mutex_unlock(kvm-slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,20 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /* Called with kvm-slots_lock held. */
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(kvm-slots_lock);
-   if (kvm-arch.ept_identity_pagetable)
-   goto out;
+   BUG_ON(kvm-arch.ept_identity_pagetable_done);
+
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm-arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
-   if (r)
-   goto out;
-
-   page = gfn_to_page(kvm, kvm-arch.ept_identity_map_addr  PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
 
-   kvm-arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(kvm-slots_lock);
return r;
 }
 
@@ -7643,8 +7642,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm-arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7239,8 +7239,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_free_vcpus(kvm);
if (kvm

[PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-24 Thread Tang Chen

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

This patch-set is based on Linux 3.17.0-rc5.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v7-v8:
1. Patch 1/9~3/9 were applied to kvm/queue by Paolo Bonzini 
pbonz...@redhat.com.
   Just resend them, no changes.
2. Removed previous patch 4/9, which added unnecessary hook 
has_secondary_apic_access().
3. Set kvm_x86_ops-set_apic_access_page_addr to NULL when hardware had no 
flexpriority
   functionality which actually exists only on x86. 
4. Moved declaration of kvm_arch_mmu_notifier_invalidate_page() to 
arch/*/include/asm/kvm_host.h.
5. Removed useless set_apic_access_page_addr() hook for svm.

Tang Chen (8):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().
  kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and
make it non-static.
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access
migration.
  kvm, mem-hotplug: Unpin and remove kvm_arch-apic_access_page.

 arch/arm/include/asm/kvm_host.h |   5 ++
 arch/arm64/include/asm/kvm_host.h   |   5 ++
 arch/ia64/include/asm/kvm_host.h|   7 ++
 arch/mips/include/asm/kvm_host.h|   6 ++
 arch/powerpc/include/asm/kvm_host.h |   5 ++
 arch/s390/include/asm/kvm_host.h|   8 +++
 arch/x86/include/asm/kvm_host.h |   7 +-
 arch/x86/kvm/svm.c  |   3 +-
 arch/x86/kvm/vmx.c  | 130 
 arch/x86/kvm/x86.c  |  45 +++--
 include/linux/kvm_host.h|   2 +
 virt/kvm/kvm_main.c |  13 ++--
 12 files changed, 180 insertions(+), 56 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v8 6/8] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-24 Thread Tang Chen

We are handling L1 and L2 share one apic access page situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0-L1 entry and L2's
  vmcs in the next L1-L2 entry.

   2) when L1 is running: Force a L1-L0 exit, update L1's vmcs in the next
  L0-L1 entry and L2's vmcs in the next L1-L2 entry.

   3) when L2 is running: Force a L2-L0 exit, update L2's vmcs in the next
  L0-L2 entry and L1's vmcs in the next L2-L1 exit.

This patch handles 3).

In L0-L2 entry, L2's vmcs will be updated in prepare_vmcs02() called by
nested_vm_run(). So we need to do nothing.

In L2-L1 exit, this patch requests apic access page reload in L2-L1 vmexit.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx.c  | 6 ++
 arch/x86/kvm/x86.c  | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 582cd0f..66480fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1046,6 +1046,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1411bab..40bb9fc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8826,6 +8826,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
}
 
/*
+* We are now running in L2, mmu_notifier will force to reload the
+* page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
+*/
+   kvm_vcpu_reload_apic_access_page(vcpu);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f0c99a..c064ca6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
/*
 * If platform doesn't have 2nd exec virtualize apic access affinity,
@@ -6009,6 +6009,7 @@ static void kvm_vcpu_reload_apic_access_page(struct 
kvm_vcpu *vcpu)
kvm_x86_ops-set_apic_access_page_addr(vcpu,
page_to_phys(vcpu-kvm-arch.apic_access_page));
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v8 7/8] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-24 Thread Tang Chen

We are handling L1 and L2 share one apic access page situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0-L1 entry and L2's
  vmcs in the next L1-L2 entry.

   2) when L1 is running: Force a L1-L0 exit, update L1's vmcs in the next
  L0-L1 entry and L2's vmcs in the next L1-L2 entry.

   3) when L2 is running: Force a L2-L0 exit, update L2's vmcs in the next
  L0-L2 entry and L1's vmcs in the next L2-L1 exit.

This patch force a L1-L0 exit or L2-L0 exit when shared apic access page is
migrated using mmu notifier. Since apic access page is only used on intel x86,
this is arch specific code.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/arm/include/asm/kvm_host.h |  5 +
 arch/arm64/include/asm/kvm_host.h   |  5 +
 arch/ia64/include/asm/kvm_host.h|  7 +++
 arch/mips/include/asm/kvm_host.h|  6 ++
 arch/powerpc/include/asm/kvm_host.h |  5 +
 arch/s390/include/asm/kvm_host.h|  8 
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c  | 11 +++
 virt/kvm/kvm_main.c |  3 +++
 9 files changed, 52 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..f5b3f53 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -182,6 +182,11 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e10c45a..594873a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -192,6 +192,11 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index db95f57..282e71f 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -574,6 +574,13 @@ static inline struct kvm_pt_regs *vcpu_regs(struct 
kvm_vcpu *v)
return (struct kvm_pt_regs *) ((unsigned long) v + KVM_STK_OFFSET) - 1;
 }
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
+
 typedef int kvm_vmm_entry(void);
 typedef void kvm_tramp_entry(union context *host, union context *guest);
 
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 7a3fc67..4826d29 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -767,5 +767,11 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t 
*opc,
 extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
 extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 98d9dd5..e40402d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -61,6 +61,11 @@ extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+
 #define HPTEG_CACHE_NUM(1  15)
 #define HPTEG_HASH_BITS_PTE13
 #define HPTEG_HASH_BITS_PTE_LONG   12
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 773bef7..e4d6708 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -450,4 +450,12 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern char sie_exit;
+
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm

[PATCH v8 4/8] kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().

2014-09-24 Thread Tang Chen

We wants to migrate apic access page pinned by guest (L1 and L2) to make memory
hotplug available.

There are two situations need to be handled for apic access page used by L2 vm:
1. L1 prepares a separate apic access page for L2.

   L2 pins a lot of pages in memory. Even if we can migrate apic access page,
   memory hotplug is not available when L2 is running. So do not handle this
   now. Migrate L1's apic access page only.

2. L1 and L2 share one apic access page.

   Since we will migrate L1's apic access page, we should do some handling when
   migration happens in the following situations:

   1) when L0 is running: Update L1's vmcs in the next L0-L1 entry and L2's
  vmcs in the next L1-L2 entry.

   2) when L1 is running: Force a L1-L0 exit, update L1's vmcs in the next
  L0-L1 entry and L2's vmcs in the next L1-L2 entry.

   3) when L2 is running: Force a L2-L0 exit, update L2's vmcs in the next
  L0-L2 entry and L1's vmcs in the next L2-L1 exit.

This patch handles 1) and 2).

Since we don't handle L1 ans L2 have separate apic access pages situation,
when we update vmcs, we need to check if we are in L2 and if L1 prepares an
non-shared apic access page for L2. We do this in 
vmx_set_apic_access_page_addr()
when trying to set new apic access page's hpa like this:

   if (!is_guest_mode(vcpu) ||
   !(vmx-nested.current_vmcs12-secondary_vm_exec_control 
 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/vmx.c  | 39 ++-
 arch/x86/kvm/x86.c  | 23 +++
 include/linux/kvm_host.h|  1 +
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..582cd0f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72a0470..1411bab 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3108,9 +3108,17 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_unrestricted_guest())
enable_unrestricted_guest = 0;
 
-   if (!cpu_has_vmx_flexpriority())
+   if (!cpu_has_vmx_flexpriority()) {
flexpriority_enabled = 0;
 
+   /*
+* set_apic_access_page_addr() is used to reload apic access
+* page in case it is migrated for memory hotplug reason. If
+* platform doesn't have this affinity, no need to handle it.
+*/
+   kvm_x86_ops-set_apic_access_page_addr = NULL;
+   }
+
if (!cpu_has_vmx_tpr_shadow())
kvm_x86_ops-update_cr8_intercept = NULL;
 
@@ -7090,6 +7098,34 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   /*
+* This function is used to reload apic access page in case it is
+* migrated for memory hotplug reason. And only L1 and L2 share the
+* same apic access page situation is handled.
+*
+* 1) If vcpu is not in guest mode (in L1), reload the page for L1.
+*And L2's page will be reloaded in the next L1-L2 entry by
+*prepare_vmcs02().
+*
+* 2) If vcpu is in guest mode (in L2), but L1 didn't not prepare an
+*apic access page for L2 (current_vmcs12-secondary_vm_exec_control
+*does not have SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES bit set),
+*reload the page for L2.
+*And L1's page will be reloaded in the next L2-L1 exit.
+*
+* 3) Otherwise, do nothing. L2's specific apic access page is still
+*pinned in memory, and not hotpluggable.
+*/
+   if (!is_guest_mode(vcpu) ||
+   !(vmx-nested.current_vmcs12-secondary_vm_exec_control 
+ SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8909,6 +8945,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept

[PATCH v8 8/8] kvm, mem-hotplug: Unpin and remove kvm_arch-apic_access_page.

2014-09-24 Thread Tang Chen

To make apic access page migratable, we do not pin it in memory now.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpu will access
kvm_arch-apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch-apic_access_page anymore. Since
apic access page is not pinned in memory now, we can remove
kvm_arch-apic_access_page. When we need to write its physical address
into vmcs, use gfn_to_page() to get its page struct, which will also
pin it. And unpin it after then.

Suggested-by: Gleb Natapov g...@kernel.org
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 17 ++---
 arch/x86/kvm/x86.c  | 16 ++--
 3 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 408b944..e27e1f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 40bb9fc..4069075 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4010,7 +4010,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(kvm-slots_lock);
-   if (kvm-arch.apic_access_page)
+   if (kvm-arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4026,7 +4026,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm-arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm-arch.apic_access_page_done = true;
 out:
mutex_unlock(kvm-slots_lock);
return r;
@@ -4541,9 +4546,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_write32(TPR_THRESHOLD, 0);
}
 
-   if (vm_need_virtualize_apic_accesses(vmx-vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-
page_to_phys(vmx-vcpu.kvm-arch.apic_access_page));
+   /* Reload apic access page in case it was migrated. */
+   kvm_vcpu_reload_apic_access_page(vcpu);
 
if (vmx_vm_has_apicv(vcpu-kvm))
memset(vmx-pi_desc, 0, sizeof(struct pi_desc));
@@ -8026,8 +8030,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
} else if (vm_need_virtualize_apic_accesses(vmx-vcpu.kvm)) {
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu-kvm-arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e042ef6..f7cbc36 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,6 +5991,8 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page = NULL;
+
/*
 * If platform doesn't have 2nd exec virtualize apic access affinity,
 * set_apic_access_page_addr() will be set to NULL in hardware_setup(),
@@ -6004,10 +6006,14 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu 
*vcpu)
 * migrated, GUP will wait till the migrate entry is replaced
 * with the new pte entry pointing to the new page.
 */
-   vcpu-kvm-arch.apic_access_page = gfn_to_page(vcpu-kvm,
-   APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
-   kvm_x86_ops-set_apic_access_page_addr(vcpu,
-   page_to_phys(vcpu-kvm-arch.apic_access_page));
+   page = gfn_to_page(vcpu-kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
+   kvm_x86_ops-set_apic_access_page_addr(vcpu, page_to_phys(page));
+
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
@@ -7272,8 +7278,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm-arch.vpic);
kfree(kvm-arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm-arch.apic_access_page)
-   put_page(kvm-arch.apic_access_page);
kfree(rcu_dereference_check(kvm-arch.apic_map, 1));
 }
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel

[PATCH v8 5/8] kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static.

2014-09-24 Thread Tang Chen

Since different architectures need different handling, we will add some arch 
specific
code later. The code may need to make cpu requests outside kvm_main.c, so make 
it
non-static and rename it to kvm_make_all_cpus_request().

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c23236a..73de13c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -580,6 +580,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 void kvm_make_mclock_inprogress_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request(struct kvm *kvm);
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
 
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..0f8b6f6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -152,7 +152,7 @@ static void ack_flush(void *_completed)
 {
 }
 
-static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 {
int i, cpu, me;
cpumask_var_t cpus;
@@ -189,7 +189,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
long dirty_count = kvm-tlbs_dirty;
 
smp_mb();
-   if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
+   if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
++kvm-stat.remote_tlb_flush;
cmpxchg(kvm-tlbs_dirty, dirty_count, 0);
 }
@@ -197,17 +197,17 @@ EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
 
 void kvm_reload_remote_mmus(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
 }
 
 void kvm_make_mclock_inprogress_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
 }
 
 void kvm_make_scan_ioapic_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
 }
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v8 3/8] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-24 Thread Tang Chen

In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and make init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4fb84ad..72a0470 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm-arch.ept_identity_pagetable_done. */
mutex_lock(kvm-slots_lock);
 
-   if (likely(kvm-arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm-arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm-arch.ept_identity_map_addr  PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(kvm-srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r  0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i  PT32_ENT_PER_PAGE; i++) {
tmp = (i  22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
tmp, i * sizeof(tmp), sizeof(tmp));
-   if (r  0)
+   if (ret)
goto out;
}
kvm-arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(kvm-srcu, idx);
-
 out2:
mutex_unlock(kvm-slots_lock);
return ret;
@@ -7604,11 +7601,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
*kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   /* Set err to -ENOMEM to handle memory allocation error. */
+   err = -ENOMEM;
+
vmx-guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx-guest_msrs[0])
  PAGE_SIZE);
 
-   err = -ENOMEM;
if (!vmx-guest_msrs) {
goto uninit_vcpu;
}
@@ -7641,8 +7640,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
if (!kvm-arch.ept_identity_map_addr)
kvm-arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
-   err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   err = init_rmode_identity_map(kvm);
+   if (err  0)
goto free_vmcs;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-24 Thread Tang Chen



On 09/24/2014 04:20 PM, Paolo Bonzini wrote:

Il 24/09/2014 09:57, Tang Chen ha scritto:

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

This patch-set is based on Linux 3.17.0-rc5.

NOTE: Tested with -cpu xxx,-x2apic option.
   But since nested vm pins some other pages in memory, if user uses nested
   vm, memory hot-remove will not work.

Change log v7-v8:
1. Patch 1/9~3/9 were applied to kvm/queue by Paolo Bonzini 
pbonz...@redhat.com.
Just resend them, no changes.
2. Removed previous patch 4/9, which added unnecessary hook 
has_secondary_apic_access().
3. Set kvm_x86_ops-set_apic_access_page_addr to NULL when hardware had no 
flexpriority
functionality which actually exists only on x86.
4. Moved declaration of kvm_arch_mmu_notifier_invalidate_page() to 
arch/*/include/asm/kvm_host.h.
5. Removed useless set_apic_access_page_addr() hook for svm.

Tang Chen (8):
   kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
   kvm: Remove ept_identity_pagetable from struct kvm_arch.
   kvm: Make init_rmode_identity_map() return 0 on success.
   kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().
   kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and
 make it non-static.
   kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
 running.
   kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access
 migration.
   kvm, mem-hotplug: Unpin and remove kvm_arch-apic_access_page.

  arch/arm/include/asm/kvm_host.h |   5 ++
  arch/arm64/include/asm/kvm_host.h   |   5 ++
  arch/ia64/include/asm/kvm_host.h|   7 ++
  arch/mips/include/asm/kvm_host.h|   6 ++
  arch/powerpc/include/asm/kvm_host.h |   5 ++
  arch/s390/include/asm/kvm_host.h|   8 +++
  arch/x86/include/asm/kvm_host.h |   7 +-
  arch/x86/kvm/svm.c  |   3 +-
  arch/x86/kvm/vmx.c  | 130 
  arch/x86/kvm/x86.c  |  45 +++--
  include/linux/kvm_host.h|   2 +
  virt/kvm/kvm_main.c |  13 ++--
  12 files changed, 180 insertions(+), 56 deletions(-)


Thanks for your persistence!  The patches look good, I'll test them and
apply to kvm/queue.


Sure, thank you very much. :)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/1] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-23 Thread Tang Chen

Hi Paolo, 

I'm not sure if this patch is following your comment. Please review.
And all the other comments are followed. If this patch is OK, I'll 
send v8 soon.

Thanks.

We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch force a L1->L0 exit or L2->L0 exit when shared apic access page is
migrated using mmu notifier. Since apic access page is only used on intel x86,
this is arch specific code.
---
 arch/arm/include/asm/kvm_host.h |  6 ++
 arch/arm64/include/asm/kvm_host.h   |  6 ++
 arch/ia64/include/asm/kvm_host.h|  8 
 arch/mips/include/asm/kvm_host.h|  7 +++
 arch/powerpc/include/asm/kvm_host.h |  6 ++
 arch/s390/include/asm/kvm_host.h|  9 +
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c  | 11 +++
 virt/kvm/kvm_main.c |  3 +++
 9 files changed, 58 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..79bbf7d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -182,6 +182,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e10c45a..ee89fad 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -192,6 +192,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index db95f57..326ac55 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -574,6 +574,14 @@ static inline struct kvm_pt_regs *vcpu_regs(struct 
kvm_vcpu *v)
return (struct kvm_pt_regs *) ((unsigned long) v + KVM_STK_OFFSET) - 1;
 }
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
+
 typedef int kvm_vmm_entry(void);
 typedef void kvm_tramp_entry(union context *host, union context *guest);
 
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 7a3fc67..c392705 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -767,5 +767,12 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t 
*opc,
 extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
 extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 98d9dd5..c16a573 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -61,6 +61,12 @@ extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 #define HPTEG_CACHE_NUM(1 << 15)
 #define HPTEG_HASH_BITS_PTE13
 #define HPTEG_HASH_BITS_PTE_LONG   12
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 773bef7..693290f 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -450,4 +450,13 @@ void

[PATCH 1/1] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-23 Thread Tang Chen

Hi Paolo, 

I'm not sure if this patch is following your comment. Please review.
And all the other comments are followed. If this patch is OK, I'll 
send v8 soon.

Thanks.

We are handling L1 and L2 share one apic access page situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0-L1 entry and L2's
  vmcs in the next L1-L2 entry.

   2) when L1 is running: Force a L1-L0 exit, update L1's vmcs in the next
  L0-L1 entry and L2's vmcs in the next L1-L2 entry.

   3) when L2 is running: Force a L2-L0 exit, update L2's vmcs in the next
  L0-L2 entry and L1's vmcs in the next L2-L1 exit.

This patch force a L1-L0 exit or L2-L0 exit when shared apic access page is
migrated using mmu notifier. Since apic access page is only used on intel x86,
this is arch specific code.
---
 arch/arm/include/asm/kvm_host.h |  6 ++
 arch/arm64/include/asm/kvm_host.h   |  6 ++
 arch/ia64/include/asm/kvm_host.h|  8 
 arch/mips/include/asm/kvm_host.h|  7 +++
 arch/powerpc/include/asm/kvm_host.h |  6 ++
 arch/s390/include/asm/kvm_host.h|  9 +
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c  | 11 +++
 virt/kvm/kvm_main.c |  3 +++
 9 files changed, 58 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..79bbf7d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -182,6 +182,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e10c45a..ee89fad 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -192,6 +192,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index db95f57..326ac55 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -574,6 +574,14 @@ static inline struct kvm_pt_regs *vcpu_regs(struct 
kvm_vcpu *v)
return (struct kvm_pt_regs *) ((unsigned long) v + KVM_STK_OFFSET) - 1;
 }
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
+
 typedef int kvm_vmm_entry(void);
 typedef void kvm_tramp_entry(union context *host, union context *guest);
 
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 7a3fc67..c392705 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -767,5 +767,12 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t 
*opc,
 extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
 extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 98d9dd5..c16a573 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -61,6 +61,12 @@ extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 #define HPTEG_CACHE_NUM(1  15)
 #define HPTEG_HASH_BITS_PTE13
 #define HPTEG_HASH_BITS_PTE_LONG   12
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 773bef7..693290f 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -450,4 +450,13 @@ void kvm_arch_async_page_present(struct kvm_vcpu

[PATCH v7 3/9] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-20 Thread Tang Chen

In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and make init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4fb84ad..72a0470 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(>slots_lock);
 
-   if (likely(kvm->arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(>srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r < 0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
, i * sizeof(tmp), sizeof(tmp));
-   if (r < 0)
+   if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(>srcu, idx);
-
 out2:
mutex_unlock(>slots_lock);
return ret;
@@ -7604,11 +7601,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
*kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   /* Set err to -ENOMEM to handle memory allocation error. */
+   err = -ENOMEM;
+
vmx->guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx->guest_msrs[0])
 > PAGE_SIZE);
 
-   err = -ENOMEM;
if (!vmx->guest_msrs) {
goto uninit_vcpu;
}
@@ -7641,8 +7640,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
if (!kvm->arch.ept_identity_map_addr)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
-   err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   err = init_rmode_identity_map(kvm);
+   if (err < 0)
goto free_vmcs;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 5/9] kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().

2014-09-20 Thread Tang Chen

We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch handles 1) and 2) by making a new vcpu request named 
KVM_REQ_APIC_PAGE_RELOAD
to reload apic access page in L0->L1 entry.

Since we don't handle L1 ans L2 have separate apic access pages situation,
when we update vmcs, we need to check if we are in L2 and if L2's secondary
exec virtualzed apic accesses is enabled.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 23 +++
 include/linux/kvm_host.h|  1 +
 5 files changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69fe032..56156eb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -740,6 +740,7 @@ struct kvm_x86_ops {
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
bool (*has_secondary_apic_access)(struct kvm_vcpu *vcpu);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9c8ae32..99378d7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3624,6 +3624,11 @@ static bool svm_has_secondary_apic_access(struct 
kvm_vcpu *vcpu)
return false;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4379,6 +4384,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
.has_secondary_apic_access = svm_has_secondary_apic_access,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0b541d9..c8e90ec 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7098,6 +7098,11 @@ static bool vmx_has_secondary_apic_access(struct 
kvm_vcpu *vcpu)
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8918,6 +8923,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
.has_secondary_apic_access = vmx_has_secondary_apic_access,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..fc54fa6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,27 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* Only APIC access page shared by L1 and L2 vm is handled. The APIC
+* access page prepared by L1 for L2's execution is still pinned in
+* memory, and it cannot be migrated.
+*/
+   if (!is_guest_mode(vcpu) ||
+   !kvm_x86_ops->has_secondary_apic_access(vcpu)) {
+   /*
+* APIC access page could be migrated. When the page is being
+* migrated, GUP will wait till the migrate entry is replaced
+* with the new pte entry pointing to the new page.
+*/
+   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+   page_to_phys(vcpu->kvm->arch.apic_access_pa

[PATCH v7 2/9] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-20 Thread Tang Chen

kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 47 +++--
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..4fb84ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
+
+   /* Protect kvm->arch.ept_identity_pagetable_done. */
+   mutex_lock(>slots_lock);
+
+   if (likely(kvm->arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm->arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(>srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(>srcu, idx);
+
+out2:
+   mutex_unlock(>slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,20 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /* Called with kvm->slots_lock held. */
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(>slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
+   BUG_ON(kvm->arch.ept_identity_pagetable_done);
+
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, _userspace_mem);
-   if (r)
-   goto out;
-
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
 
-   kvm->arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(>slots_lock);
return r;
 }
 
@@ -7643,8 +7642,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7239,8 +7239,6 @@ void kvm_ar

[PATCH v7 1/9] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-20 Thread Tang Chen

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee0, which is also the address 
of
apic access page. So use this macro.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(>vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee0ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, _userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(>vcpu, 0);
-   apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(>vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 7/9] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-20 Thread Tang Chen

We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch handles 3).

In L0->L2 entry, L2's vmcs will be updated in prepare_vmcs02() called by
nested_vm_run(). So we need to do nothing.

In L2->L1 exit, this patch requests apic access page reload in L2->L1 vmexit.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx.c  | 6 ++
 arch/x86/kvm/x86.c  | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 56156eb..1a8317e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1047,6 +1047,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c8e90ec..baac78a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8803,6 +8803,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
}
 
/*
+* We are now running in L2, mmu_notifier will force to reload the
+* page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
+*/
+   kvm_vcpu_reload_apic_access_page(vcpu);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc54fa6..2ae2dc7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
/*
 * Only APIC access page shared by L1 and L2 vm is handled. The APIC
@@ -6009,6 +6009,7 @@ static void kvm_vcpu_reload_apic_access_page(struct 
kvm_vcpu *vcpu)
page_to_phys(vcpu->kvm->arch.apic_access_page));
}
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 9/9] kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

2014-09-20 Thread Tang Chen

To make apic access page migratable, we do not pin it in memory now.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpu will access
kvm_arch->apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch->apic_access_page anymore. Since
apic access page is not pinned in memory now, we can remove
kvm_arch->apic_access_page. When we need to write its physical address
into vmcs, use gfn_to_page() to get its page struct, which will also
pin it. And unpin it after then.

Suggested-by: Gleb Natapov 
Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 15 +--
 arch/x86/kvm/x86.c  | 16 +++-
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1a8317e..9fb3d4c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index baac78a..12f0715 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(>slots_lock);
-   if (kvm->arch.apic_access_page)
+   if (kvm->arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm->arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm->arch.apic_access_page_done = true;
 out:
mutex_unlock(>slots_lock);
return r;
@@ -4534,8 +4539,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
}
 
if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-
page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
 
if (vmx_vm_has_apicv(vcpu->kvm))
memset(>pi_desc, 0, sizeof(struct pi_desc));
@@ -8003,8 +8007,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7dd4179..996af6e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,6 +5991,8 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page = NULL;
+
/*
 * Only APIC access page shared by L1 and L2 vm is handled. The APIC
 * access page prepared by L1 for L2's execution is still pinned in
@@ -6003,10 +6005,16 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu 
*vcpu)
 * migrated, GUP will wait till the migrate entry is replaced
 * with the new pte entry pointing to the new page.
 */
-   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
-   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   page = gfn_to_page(vcpu->kvm,
+  APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+  page_to_phys(page));
+
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
}
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
@@ -7272,8 +7280,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm->arch.vpic);
kfree(kvm->arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm->arch.apic_access_page)
-   put_page(kvm->arch.apic_access_page);
kfree(rcu_dereferen

[PATCH v7 6/9] kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static.

2014-09-20 Thread Tang Chen

Since different architectures need different handling, we will add some arch 
specific
code later. The code may need to make cpu requests outside kvm_main.c, so make 
it
non-static and rename it to kvm_make_all_cpus_request().

Signed-off-by: Tang Chen 
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c23236a..73de13c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -580,6 +580,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 void kvm_make_mclock_inprogress_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request(struct kvm *kvm);
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
 
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..0f8b6f6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -152,7 +152,7 @@ static void ack_flush(void *_completed)
 {
 }
 
-static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 {
int i, cpu, me;
cpumask_var_t cpus;
@@ -189,7 +189,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
long dirty_count = kvm->tlbs_dirty;
 
smp_mb();
-   if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
+   if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
++kvm->stat.remote_tlb_flush;
cmpxchg(>tlbs_dirty, dirty_count, 0);
 }
@@ -197,17 +197,17 @@ EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
 
 void kvm_reload_remote_mmus(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
 }
 
 void kvm_make_mclock_inprogress_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
 }
 
 void kvm_make_scan_ioapic_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
 }
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 8/9] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-20 Thread Tang Chen

We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch force a L1->L0 exit or L2->L0 exit when shared apic access page is
migrated using mmu notifier. Since apic access page is only used on intel x86,
this is arch specific code.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/x86.c   | 11 +++
 include/linux/kvm_host.h | 14 +-
 virt/kvm/kvm_main.c  |  3 +++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ae2dc7..7dd4179 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6011,6 +6011,17 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu 
*vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
+void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+  unsigned long address)
+{
+   /*
+* The physical address of apic access page is stored in VMCS.
+* Update it when it becomes invalid.
+*/
+   if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
+   kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 73de13c..b6e4d38 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -917,7 +917,19 @@ static inline int mmu_notifier_retry(struct kvm *kvm, 
unsigned long mmu_seq)
return 1;
return 0;
 }
-#endif
+
+#ifdef _ASM_X86_KVM_HOST_H
+void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+  unsigned long address);
+#else /* _ASM_X86_KVM_HOST_H */
+inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+ unsigned long address)
+{
+   return;
+}
+#endif /* _ASM_X86_KVM_HOST_H */
+
+#endif /* CONFIG_MMU_NOTIFIER & KVM_ARCH_WANT_MMU_NOTIFIER */
 
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0f8b6f6..5427973d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -295,6 +295,9 @@ static void kvm_mmu_notifier_invalidate_page(struct 
mmu_notifier *mn,
kvm_flush_remote_tlbs(kvm);
 
spin_unlock(>mmu_lock);
+
+   kvm_arch_mmu_notifier_invalidate_page(kvm, address);
+
srcu_read_unlock(>srcu, idx);
 }
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 4/9] kvm: Add interface to check if secondary exec virtualzed apic accesses is enabled.

2014-09-20 Thread Tang Chen

We wants to migrate apic access page pinned by guest (L1 and L2) to make memory
hotplug available.

There are two situations need to be handled for apic access page used by L2 vm:
1. L1 prepares a separate apic access page for L2.

   L2 pins a lot of pages in memory. Even if we can migrate apic access page,
   memory hotplug is not available when L2 is running. So do not handle this
   now. Migrate apic access page only.

2. L1 and L2 share one apic access page.

   Since we will migrate L1's apic access page, we should do some handling when
   migration happens in the following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

Since we don't handle L1 ans L2 have separate apic access pages situation,
when we update vmcs, we need to check if we are in L2 and if L2's secondary
exec virtualzed apic accesses is enabled.

This patch adds an interface to check if L2's secondary exec virtualzed apic
accesses is enabled, because vmx cannot be accessed outside vmx.c.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm.c  | 6 ++
 arch/x86/kvm/vmx.c  | 9 +
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..69fe032 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   bool (*has_secondary_apic_access)(struct kvm_vcpu *vcpu);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..9c8ae32 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
+static bool svm_has_secondary_apic_access(struct kvm_vcpu *vcpu)
+{
+   return false;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .has_secondary_apic_access = svm_has_secondary_apic_access,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72a0470..0b541d9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7090,6 +7090,14 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static bool vmx_has_secondary_apic_access(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   return (vmx->nested.current_vmcs12->secondary_vm_exec_control &
+   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8909,6 +8917,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .has_secondary_apic_access = vmx_has_secondary_apic_access,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 0/9] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-20 Thread Tang Chen

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

This patch-set is based on Linux 3.17.0-rc5.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v6 -> v7:
1. Patch 1/9~3/9 are applied to kvm/queue by Paolo Bonzini 
.
   Just resend them, no changes.
2. In the new patch 4/9, add a new interface to check if secondary exec 
   virtualized apic access is enabled.
3. In new patch 6/9, rename make_all_cpus_request() to 
kvm_make_all_cpus_request() 
   and make it non-static so that we can use it in other patches.
4. In new patch 8/9, add arch specific function to make apic access page reload 
   request in mmu notifier.

Tang Chen (9):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm: Add interface to check if secondary exec virtualzed apic accesses
is enabled.
  kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().
  kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and
make it non-static.
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access
migration.
  kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

 arch/x86/include/asm/kvm_host.h |   6 ++-
 arch/x86/kvm/svm.c  |  15 +-
 arch/x86/kvm/vmx.c  | 104 
 arch/x86/kvm/x86.c  |  47 --
 include/linux/kvm_host.h|  16 ++-
 virt/kvm/kvm_main.c |  13 +++--
 6 files changed, 146 insertions(+), 55 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 0/9] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-20 Thread Tang Chen

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

This patch-set is based on Linux 3.17.0-rc5.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v6 - v7:
1. Patch 1/9~3/9 are applied to kvm/queue by Paolo Bonzini 
pbonz...@redhat.com.
   Just resend them, no changes.
2. In the new patch 4/9, add a new interface to check if secondary exec 
   virtualized apic access is enabled.
3. In new patch 6/9, rename make_all_cpus_request() to 
kvm_make_all_cpus_request() 
   and make it non-static so that we can use it in other patches.
4. In new patch 8/9, add arch specific function to make apic access page reload 
   request in mmu notifier.

Tang Chen (9):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm: Add interface to check if secondary exec virtualzed apic accesses
is enabled.
  kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().
  kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and
make it non-static.
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access
migration.
  kvm, mem-hotplug: Unpin and remove kvm_arch-apic_access_page.

 arch/x86/include/asm/kvm_host.h |   6 ++-
 arch/x86/kvm/svm.c  |  15 +-
 arch/x86/kvm/vmx.c  | 104 
 arch/x86/kvm/x86.c  |  47 --
 include/linux/kvm_host.h|  16 ++-
 virt/kvm/kvm_main.c |  13 +++--
 6 files changed, 146 insertions(+), 55 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 4/9] kvm: Add interface to check if secondary exec virtualzed apic accesses is enabled.

2014-09-20 Thread Tang Chen

We wants to migrate apic access page pinned by guest (L1 and L2) to make memory
hotplug available.

There are two situations need to be handled for apic access page used by L2 vm:
1. L1 prepares a separate apic access page for L2.

   L2 pins a lot of pages in memory. Even if we can migrate apic access page,
   memory hotplug is not available when L2 is running. So do not handle this
   now. Migrate apic access page only.

2. L1 and L2 share one apic access page.

   Since we will migrate L1's apic access page, we should do some handling when
   migration happens in the following situations:

   1) when L0 is running: Update L1's vmcs in the next L0-L1 entry and L2's
  vmcs in the next L1-L2 entry.

   2) when L1 is running: Force a L1-L0 exit, update L1's vmcs in the next
  L0-L1 entry and L2's vmcs in the next L1-L2 entry.

   3) when L2 is running: Force a L2-L0 exit, update L2's vmcs in the next
  L0-L2 entry and L1's vmcs in the next L2-L1 exit.

Since we don't handle L1 ans L2 have separate apic access pages situation,
when we update vmcs, we need to check if we are in L2 and if L2's secondary
exec virtualzed apic accesses is enabled.

This patch adds an interface to check if L2's secondary exec virtualzed apic
accesses is enabled, because vmx cannot be accessed outside vmx.c.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm.c  | 6 ++
 arch/x86/kvm/vmx.c  | 9 +
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..69fe032 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   bool (*has_secondary_apic_access)(struct kvm_vcpu *vcpu);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..9c8ae32 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
+static bool svm_has_secondary_apic_access(struct kvm_vcpu *vcpu)
+{
+   return false;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .has_secondary_apic_access = svm_has_secondary_apic_access,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72a0470..0b541d9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7090,6 +7090,14 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static bool vmx_has_secondary_apic_access(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   return (vmx-nested.current_vmcs12-secondary_vm_exec_control 
+   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8909,6 +8917,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .has_secondary_apic_access = vmx_has_secondary_apic_access,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 7/9] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-20 Thread Tang Chen

We are handling L1 and L2 share one apic access page situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0-L1 entry and L2's
  vmcs in the next L1-L2 entry.

   2) when L1 is running: Force a L1-L0 exit, update L1's vmcs in the next
  L0-L1 entry and L2's vmcs in the next L1-L2 entry.

   3) when L2 is running: Force a L2-L0 exit, update L2's vmcs in the next
  L0-L2 entry and L1's vmcs in the next L2-L1 exit.

This patch handles 3).

In L0-L2 entry, L2's vmcs will be updated in prepare_vmcs02() called by
nested_vm_run(). So we need to do nothing.

In L2-L1 exit, this patch requests apic access page reload in L2-L1 vmexit.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx.c  | 6 ++
 arch/x86/kvm/x86.c  | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 56156eb..1a8317e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1047,6 +1047,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c8e90ec..baac78a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8803,6 +8803,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
}
 
/*
+* We are now running in L2, mmu_notifier will force to reload the
+* page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
+*/
+   kvm_vcpu_reload_apic_access_page(vcpu);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc54fa6..2ae2dc7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
/*
 * Only APIC access page shared by L1 and L2 vm is handled. The APIC
@@ -6009,6 +6009,7 @@ static void kvm_vcpu_reload_apic_access_page(struct 
kvm_vcpu *vcpu)
page_to_phys(vcpu-kvm-arch.apic_access_page));
}
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 9/9] kvm, mem-hotplug: Unpin and remove kvm_arch-apic_access_page.

2014-09-20 Thread Tang Chen

To make apic access page migratable, we do not pin it in memory now.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpu will access
kvm_arch-apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch-apic_access_page anymore. Since
apic access page is not pinned in memory now, we can remove
kvm_arch-apic_access_page. When we need to write its physical address
into vmcs, use gfn_to_page() to get its page struct, which will also
pin it. And unpin it after then.

Suggested-by: Gleb Natapov g...@kernel.org
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 15 +--
 arch/x86/kvm/x86.c  | 16 +++-
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1a8317e..9fb3d4c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index baac78a..12f0715 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(kvm-slots_lock);
-   if (kvm-arch.apic_access_page)
+   if (kvm-arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm-arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm-arch.apic_access_page_done = true;
 out:
mutex_unlock(kvm-slots_lock);
return r;
@@ -4534,8 +4539,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
}
 
if (vm_need_virtualize_apic_accesses(vmx-vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-
page_to_phys(vmx-vcpu.kvm-arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
 
if (vmx_vm_has_apicv(vcpu-kvm))
memset(vmx-pi_desc, 0, sizeof(struct pi_desc));
@@ -8003,8 +8007,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
} else if (vm_need_virtualize_apic_accesses(vmx-vcpu.kvm)) {
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu-kvm-arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7dd4179..996af6e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,6 +5991,8 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page = NULL;
+
/*
 * Only APIC access page shared by L1 and L2 vm is handled. The APIC
 * access page prepared by L1 for L2's execution is still pinned in
@@ -6003,10 +6005,16 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu 
*vcpu)
 * migrated, GUP will wait till the migrate entry is replaced
 * with the new pte entry pointing to the new page.
 */
-   vcpu-kvm-arch.apic_access_page = gfn_to_page(vcpu-kvm,
-   APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
+   page = gfn_to_page(vcpu-kvm,
+  APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
kvm_x86_ops-set_apic_access_page_addr(vcpu-kvm,
-   page_to_phys(vcpu-kvm-arch.apic_access_page));
+  page_to_phys(page));
+
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
}
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
@@ -7272,8 +7280,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm-arch.vpic);
kfree(kvm-arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm-arch.apic_access_page)
-   put_page(kvm-arch.apic_access_page);
kfree(rcu_dereference_check(kvm-arch.apic_map, 1));
 }
 
-- 
1.8.3.1

--
To unsubscribe from this list

[PATCH v7 6/9] kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static.

2014-09-20 Thread Tang Chen

Since different architectures need different handling, we will add some arch 
specific
code later. The code may need to make cpu requests outside kvm_main.c, so make 
it
non-static and rename it to kvm_make_all_cpus_request().

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c23236a..73de13c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -580,6 +580,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 void kvm_make_mclock_inprogress_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request(struct kvm *kvm);
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
 
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..0f8b6f6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -152,7 +152,7 @@ static void ack_flush(void *_completed)
 {
 }
 
-static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 {
int i, cpu, me;
cpumask_var_t cpus;
@@ -189,7 +189,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
long dirty_count = kvm-tlbs_dirty;
 
smp_mb();
-   if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
+   if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
++kvm-stat.remote_tlb_flush;
cmpxchg(kvm-tlbs_dirty, dirty_count, 0);
 }
@@ -197,17 +197,17 @@ EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
 
 void kvm_reload_remote_mmus(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
 }
 
 void kvm_make_mclock_inprogress_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
 }
 
 void kvm_make_scan_ioapic_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
 }
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 8/9] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-20 Thread Tang Chen

We are handling L1 and L2 share one apic access page situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0-L1 entry and L2's
  vmcs in the next L1-L2 entry.

   2) when L1 is running: Force a L1-L0 exit, update L1's vmcs in the next
  L0-L1 entry and L2's vmcs in the next L1-L2 entry.

   3) when L2 is running: Force a L2-L0 exit, update L2's vmcs in the next
  L0-L2 entry and L1's vmcs in the next L2-L1 exit.

This patch force a L1-L0 exit or L2-L0 exit when shared apic access page is
migrated using mmu notifier. Since apic access page is only used on intel x86,
this is arch specific code.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/x86.c   | 11 +++
 include/linux/kvm_host.h | 14 +-
 virt/kvm/kvm_main.c  |  3 +++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ae2dc7..7dd4179 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6011,6 +6011,17 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu 
*vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
+void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+  unsigned long address)
+{
+   /*
+* The physical address of apic access page is stored in VMCS.
+* Update it when it becomes invalid.
+*/
+   if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT))
+   kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 73de13c..b6e4d38 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -917,7 +917,19 @@ static inline int mmu_notifier_retry(struct kvm *kvm, 
unsigned long mmu_seq)
return 1;
return 0;
 }
-#endif
+
+#ifdef _ASM_X86_KVM_HOST_H
+void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+  unsigned long address);
+#else /* _ASM_X86_KVM_HOST_H */
+inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+ unsigned long address)
+{
+   return;
+}
+#endif /* _ASM_X86_KVM_HOST_H */
+
+#endif /* CONFIG_MMU_NOTIFIER  KVM_ARCH_WANT_MMU_NOTIFIER */
 
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0f8b6f6..5427973d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -295,6 +295,9 @@ static void kvm_mmu_notifier_invalidate_page(struct 
mmu_notifier *mn,
kvm_flush_remote_tlbs(kvm);
 
spin_unlock(kvm-mmu_lock);
+
+   kvm_arch_mmu_notifier_invalidate_page(kvm, address);
+
srcu_read_unlock(kvm-srcu, idx);
 }
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 2/9] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-20 Thread Tang Chen

kvm_arch-ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch-ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch-ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 47 +++--
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..4fb84ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm-arch.ept_identity_pagetable)) {
-   printk(KERN_ERR EPT: identity-mapping pagetable 
-   haven't been allocated!\n);
-   return 0;
+
+   /* Protect kvm-arch.ept_identity_pagetable_done. */
+   mutex_lock(kvm-slots_lock);
+
+   if (likely(kvm-arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm-arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm-arch.ept_identity_map_addr  PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(kvm-srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r  0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(kvm-srcu, idx);
+
+out2:
+   mutex_unlock(kvm-slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,20 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /* Called with kvm-slots_lock held. */
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(kvm-slots_lock);
-   if (kvm-arch.ept_identity_pagetable)
-   goto out;
+   BUG_ON(kvm-arch.ept_identity_pagetable_done);
+
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm-arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
-   if (r)
-   goto out;
-
-   page = gfn_to_page(kvm, kvm-arch.ept_identity_map_addr  PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
 
-   kvm-arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(kvm-slots_lock);
return r;
 }
 
@@ -7643,8 +7642,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm-arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7239,8 +7239,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_free_vcpus(kvm);
if (kvm

[PATCH v7 1/9] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-20 Thread Tang Chen

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee0, which is also the address 
of
apic access page. So use this macro.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm-asid_generation = 0;
init_vmcb(svm);
 
-   svm-vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   svm-vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(svm-vcpu))
svm-vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee0ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx-vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vmx-vcpu, 0);
-   apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(vmx-vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 5/9] kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().

2014-09-20 Thread Tang Chen

We are handling L1 and L2 share one apic access page situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0-L1 entry and L2's
  vmcs in the next L1-L2 entry.

   2) when L1 is running: Force a L1-L0 exit, update L1's vmcs in the next
  L0-L1 entry and L2's vmcs in the next L1-L2 entry.

   3) when L2 is running: Force a L2-L0 exit, update L2's vmcs in the next
  L0-L2 entry and L1's vmcs in the next L2-L1 exit.

This patch handles 1) and 2) by making a new vcpu request named 
KVM_REQ_APIC_PAGE_RELOAD
to reload apic access page in L0-L1 entry.

Since we don't handle L1 ans L2 have separate apic access pages situation,
when we update vmcs, we need to check if we are in L2 and if L2's secondary
exec virtualzed apic accesses is enabled.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 23 +++
 include/linux/kvm_host.h|  1 +
 5 files changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69fe032..56156eb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -740,6 +740,7 @@ struct kvm_x86_ops {
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
bool (*has_secondary_apic_access)(struct kvm_vcpu *vcpu);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9c8ae32..99378d7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3624,6 +3624,11 @@ static bool svm_has_secondary_apic_access(struct 
kvm_vcpu *vcpu)
return false;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4379,6 +4384,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
.has_secondary_apic_access = svm_has_secondary_apic_access,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0b541d9..c8e90ec 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7098,6 +7098,11 @@ static bool vmx_has_secondary_apic_access(struct 
kvm_vcpu *vcpu)
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8918,6 +8923,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
.has_secondary_apic_access = vmx_has_secondary_apic_access,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..fc54fa6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,27 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* Only APIC access page shared by L1 and L2 vm is handled. The APIC
+* access page prepared by L1 for L2's execution is still pinned in
+* memory, and it cannot be migrated.
+*/
+   if (!is_guest_mode(vcpu) ||
+   !kvm_x86_ops-has_secondary_apic_access(vcpu)) {
+   /*
+* APIC access page could be migrated. When the page is being
+* migrated, GUP will wait till the migrate entry is replaced
+* with the new pte entry pointing to the new page.
+*/
+   vcpu-kvm-arch.apic_access_page = gfn_to_page(vcpu-kvm,
+   APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
+   kvm_x86_ops-set_apic_access_page_addr(vcpu-kvm,
+   page_to_phys(vcpu-kvm-arch.apic_access_page));
+   }
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue

[PATCH v7 3/9] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-20 Thread Tang Chen

In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and make init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4fb84ad..72a0470 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm-arch.ept_identity_pagetable_done. */
mutex_lock(kvm-slots_lock);
 
-   if (likely(kvm-arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm-arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm-arch.ept_identity_map_addr  PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(kvm-srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r  0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i  PT32_ENT_PER_PAGE; i++) {
tmp = (i  22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
tmp, i * sizeof(tmp), sizeof(tmp));
-   if (r  0)
+   if (ret)
goto out;
}
kvm-arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(kvm-srcu, idx);
-
 out2:
mutex_unlock(kvm-slots_lock);
return ret;
@@ -7604,11 +7601,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
*kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   /* Set err to -ENOMEM to handle memory allocation error. */
+   err = -ENOMEM;
+
vmx-guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx-guest_msrs[0])
  PAGE_SIZE);
 
-   err = -ENOMEM;
if (!vmx-guest_msrs) {
goto uninit_vcpu;
}
@@ -7641,8 +7640,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
if (!kvm-arch.ept_identity_map_addr)
kvm-arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
-   err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   err = init_rmode_identity_map(kvm);
+   if (err  0)
goto free_vmcs;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-09-17 Thread Tang Chen



On 09/16/2014 07:24 PM, Paolo Bonzini wrote:

Il 16/09/2014 12:42, Tang Chen ha scritto:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..0df82c1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -210,6 +210,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
  }
  
+void kvm_reload_apic_access_page(struct kvm *kvm)

+{
+   make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
+}
+
  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
  {
struct page *page;
@@ -294,6 +299,13 @@ static void kvm_mmu_notifier_invalidate_page(struct 
mmu_notifier *mn,
if (need_tlb_flush)
kvm_flush_remote_tlbs(kvm);
  
+	/*

+* The physical address of apic access page is stored in VMCS.
+* Update it when it becomes invalid.
+*/
+   if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
+   kvm_reload_apic_access_page(kvm);

This cannot be in the generic code.  It is architecture-specific.


Yes.


Please add a new function kvm_arch_mmu_notifier_invalidate_page, and
call it outside the mmu_lock.


Then I think we need a macro to control the calling of this arch function
since other architectures do not have it.



kvm_reload_apic_access_page need not be in virt/kvm/kvm_main.c, either.


Since kvm_reload_apic_access_page() only calls make_all_cpus_request(),
and make_all_cpus_request() is static, I'd like to make it non-static, 
rename
it to kvm_make_all_cpus_request() and call it directly in 
kvm_arch_mmu_notifier_invalidate_page().

we don't need kvm_reload_apic_access_page() actually.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-09-17 Thread Tang Chen



On 09/16/2014 07:24 PM, Paolo Bonzini wrote:

Il 16/09/2014 12:42, Tang Chen ha scritto:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..0df82c1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -210,6 +210,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
  }
  
+void kvm_reload_apic_access_page(struct kvm *kvm)

+{
+   make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
+}
+
  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
  {
struct page *page;
@@ -294,6 +299,13 @@ static void kvm_mmu_notifier_invalidate_page(struct 
mmu_notifier *mn,
if (need_tlb_flush)
kvm_flush_remote_tlbs(kvm);
  
+	/*

+* The physical address of apic access page is stored in VMCS.
+* Update it when it becomes invalid.
+*/
+   if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT))
+   kvm_reload_apic_access_page(kvm);

This cannot be in the generic code.  It is architecture-specific.


Yes.


Please add a new function kvm_arch_mmu_notifier_invalidate_page, and
call it outside the mmu_lock.


Then I think we need a macro to control the calling of this arch function
since other architectures do not have it.



kvm_reload_apic_access_page need not be in virt/kvm/kvm_main.c, either.


Since kvm_reload_apic_access_page() only calls make_all_cpus_request(),
and make_all_cpus_request() is static, I'd like to make it non-static, 
rename
it to kvm_make_all_cpus_request() and call it directly in 
kvm_arch_mmu_notifier_invalidate_page().

we don't need kvm_reload_apic_access_page() actually.

Thanks.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 3/6] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-16 Thread Tang Chen

In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and make init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4fb84ad..72a0470 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(>slots_lock);
 
-   if (likely(kvm->arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(>srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r < 0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
, i * sizeof(tmp), sizeof(tmp));
-   if (r < 0)
+   if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(>srcu, idx);
-
 out2:
mutex_unlock(>slots_lock);
return ret;
@@ -7604,11 +7601,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
*kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   /* Set err to -ENOMEM to handle memory allocation error. */
+   err = -ENOMEM;
+
vmx->guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx->guest_msrs[0])
 > PAGE_SIZE);
 
-   err = -ENOMEM;
if (!vmx->guest_msrs) {
goto uninit_vcpu;
}
@@ -7641,8 +7640,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
if (!kvm->arch.ept_identity_map_addr)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
-   err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   err = init_rmode_identity_map(kvm);
+   if (err < 0)
goto free_vmcs;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 1/6] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-16 Thread Tang Chen

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee0, which is also the address 
of
apic access page. So use this macro.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(>vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee0ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, _userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(>vcpu, 0);
-   apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(>vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 6/6] kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

2014-09-16 Thread Tang Chen

To make apic access page migratable, we do not pin it in memory now.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpu will access
kvm_arch->apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch->apic_access_page anymore. Since
apic access page is not pinned in memory now, we can remove
kvm_arch->apic_access_page. When we need to write its physical address
into vmcs, use gfn_to_page() to get its page struct, which will also
pin it. And unpin it after then.

Suggested-by: Gleb Natapov 
Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 15 +--
 arch/x86/kvm/x86.c  | 15 +--
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 92b3e72..9daf754 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d0d5981..61f3854 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(>slots_lock);
-   if (kvm->arch.apic_access_page)
+   if (kvm->arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm->arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm->arch.apic_access_page_done = true;
 out:
mutex_unlock(>slots_lock);
return r;
@@ -4534,8 +4539,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
}
 
if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-
page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
 
if (vmx_vm_has_apicv(vcpu->kvm))
memset(>pi_desc, 0, sizeof(struct pi_desc));
@@ -7995,8 +7999,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3f458b2..9094e13 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,15 +5991,20 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page;
+
/*
 * apic access page could be migrated. When the page is being migrated,
 * GUP will wait till the migrate entry is replaced with the new pte
 * entry pointing to the new page.
 */
-   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
-   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm, page_to_phys(page));
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
@@ -7253,8 +7258,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm->arch.vpic);
kfree(kvm->arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm->arch.apic_access_page)
-   put_page(kvm->arch.apic_access_page);
kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 }
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-16 Thread Tang Chen

This patch only handle "L1 and L2 vm share one apic access page" situation.

When L1 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all the vcpus' vmcs (which is done by patch 5/6). And when it enters L2 vm, 
L2's vmcs
will be updated in prepare_vmcs02() called by nested_vm_run(). So we need to do
nothing.

When L2 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all L2 vmcs. And this patch requests apic access page reload in L2->L1 vmexit.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx.c  | 6 ++
 arch/x86/kvm/x86.c  | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..92b3e72 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1046,6 +1046,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a1a9797..d0d5981 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8795,6 +8795,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
}
 
/*
+* We are now running in L2, mmu_notifier will force to reload the
+* page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
+*/
+   kvm_vcpu_reload_apic_access_page(vcpu);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 27c3d30..3f458b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
/*
 * apic access page could be migrated. When the page is being migrated,
@@ -6001,6 +6001,7 @@ static void kvm_vcpu_reload_apic_access_page(struct 
kvm_vcpu *vcpu)
kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
page_to_phys(vcpu->kvm->arch.apic_access_page));
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 2/6] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-16 Thread Tang Chen

kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 47 +++--
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..4fb84ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
+
+   /* Protect kvm->arch.ept_identity_pagetable_done. */
+   mutex_lock(>slots_lock);
+
+   if (likely(kvm->arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm->arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(>srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(>srcu, idx);
+
+out2:
+   mutex_unlock(>slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,20 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /* Called with kvm->slots_lock held. */
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(>slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
+   BUG_ON(kvm->arch.ept_identity_pagetable_done);
+
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, _userspace_mem);
-   if (r)
-   goto out;
-
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
 
-   kvm->arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(>slots_lock);
return r;
 }
 
@@ -7643,8 +7642,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7239,8 +7239,6 @@ void kvm_ar

[PATCH v6 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-16 Thread Tang Chen

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v5 -> v6:
1. Patch 1/6 has been applied by Paolo Bonzini , just 
resend it.
2. Simplify comment in alloc_identity_pagetable() and add a BUG_ON() in patch 
2/6.
3. Move err initialization forward in patch 3/6.
4. Rename vcpu_reload_apic_access_page() to kvm_vcpu_reload_apic_access_page() 
and 
   use it instead of kvm_reload_apic_access_page() in nested_vmx_vmexit() in 
patch 5/6.
5. Reuse kvm_vcpu_reload_apic_access_page() in prepare_vmcs02() and 
vmx_vcpu_reset() in patch 6/6.
6. Remove original patch 7 since we are not able to handle the situation in 
nested vm.

Tang Chen (6):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1' apic access page on migration in
vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

 arch/x86/include/asm/kvm_host.h |  5 ++-
 arch/x86/kvm/svm.c  |  9 +++-
 arch/x86/kvm/vmx.c  | 95 +++--
 arch/x86/kvm/x86.c  | 25 +--
 include/linux/kvm_host.h|  2 +
 virt/kvm/kvm_main.c | 12 ++
 6 files changed, 99 insertions(+), 49 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-09-16 Thread Tang Chen

apic access page is pinned in memory. As a result, it cannot be 
migrated/hot-removed.
Actually, it is not necessary to be pinned.

The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When
the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the
corresponding ept entry. This patch introduces a new vcpu request named
KVM_REQ_APIC_PAGE_RELOAD, and makes this request to all the vcpus at this time,
and force all the vcpus exit guest, and re-enter guest till they updates the 
VMCS
APIC_ACCESS_ADDR pointer to the new apic access page address, and updates
kvm->arch.apic_access_page to the new page.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 15 +++
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72a0470..a1a9797 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7090,6 +7090,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8909,6 +8914,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..27c3d30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* apic access page could be migrated. When the page is being migrated,
+* GUP will wait till the migrate entry is replaced with the new pte
+* entry pointing to the new page.
+*/
+   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+   page_to_phys(vcpu->kvm->arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+   if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h

[PATCH v6 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-16 Thread Tang Chen

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v5 - v6:
1. Patch 1/6 has been applied by Paolo Bonzini pbonz...@redhat.com, just 
resend it.
2. Simplify comment in alloc_identity_pagetable() and add a BUG_ON() in patch 
2/6.
3. Move err initialization forward in patch 3/6.
4. Rename vcpu_reload_apic_access_page() to kvm_vcpu_reload_apic_access_page() 
and 
   use it instead of kvm_reload_apic_access_page() in nested_vmx_vmexit() in 
patch 5/6.
5. Reuse kvm_vcpu_reload_apic_access_page() in prepare_vmcs02() and 
vmx_vcpu_reset() in patch 6/6.
6. Remove original patch 7 since we are not able to handle the situation in 
nested vm.

Tang Chen (6):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1' apic access page on migration in
vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Unpin and remove kvm_arch-apic_access_page.

 arch/x86/include/asm/kvm_host.h |  5 ++-
 arch/x86/kvm/svm.c  |  9 +++-
 arch/x86/kvm/vmx.c  | 95 +++--
 arch/x86/kvm/x86.c  | 25 +--
 include/linux/kvm_host.h|  2 +
 virt/kvm/kvm_main.c | 12 ++
 6 files changed, 99 insertions(+), 49 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-09-16 Thread Tang Chen

apic access page is pinned in memory. As a result, it cannot be 
migrated/hot-removed.
Actually, it is not necessary to be pinned.

The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When
the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the
corresponding ept entry. This patch introduces a new vcpu request named
KVM_REQ_APIC_PAGE_RELOAD, and makes this request to all the vcpus at this time,
and force all the vcpus exit guest, and re-enter guest till they updates the 
VMCS
APIC_ACCESS_ADDR pointer to the new apic access page address, and updates
kvm-arch.apic_access_page to the new page.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 15 +++
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72a0470..a1a9797 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7090,6 +7090,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8909,6 +8914,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..27c3d30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* apic access page could be migrated. When the page is being migrated,
+* GUP will wait till the migrate entry is replaced with the new pte
+* entry pointing to the new page.
+*/
+   vcpu-kvm-arch.apic_access_page = gfn_to_page(vcpu-kvm,
+   APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
+   kvm_x86_ops-set_apic_access_page_addr(vcpu-kvm,
+   page_to_phys(vcpu-kvm-arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+   if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3

[PATCH v6 2/6] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-16 Thread Tang Chen

kvm_arch-ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch-ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch-ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 47 +++--
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..4fb84ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm-arch.ept_identity_pagetable)) {
-   printk(KERN_ERR EPT: identity-mapping pagetable 
-   haven't been allocated!\n);
-   return 0;
+
+   /* Protect kvm-arch.ept_identity_pagetable_done. */
+   mutex_lock(kvm-slots_lock);
+
+   if (likely(kvm-arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm-arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm-arch.ept_identity_map_addr  PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(kvm-srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r  0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(kvm-srcu, idx);
+
+out2:
+   mutex_unlock(kvm-slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,20 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /* Called with kvm-slots_lock held. */
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(kvm-slots_lock);
-   if (kvm-arch.ept_identity_pagetable)
-   goto out;
+   BUG_ON(kvm-arch.ept_identity_pagetable_done);
+
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm-arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
-   if (r)
-   goto out;
-
-   page = gfn_to_page(kvm, kvm-arch.ept_identity_map_addr  PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
 
-   kvm-arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(kvm-slots_lock);
return r;
 }
 
@@ -7643,8 +7642,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm-arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7239,8 +7239,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_free_vcpus(kvm);
if (kvm

[PATCH v6 6/6] kvm, mem-hotplug: Unpin and remove kvm_arch-apic_access_page.

2014-09-16 Thread Tang Chen

To make apic access page migratable, we do not pin it in memory now.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpu will access
kvm_arch-apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch-apic_access_page anymore. Since
apic access page is not pinned in memory now, we can remove
kvm_arch-apic_access_page. When we need to write its physical address
into vmcs, use gfn_to_page() to get its page struct, which will also
pin it. And unpin it after then.

Suggested-by: Gleb Natapov g...@kernel.org
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 15 +--
 arch/x86/kvm/x86.c  | 15 +--
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 92b3e72..9daf754 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d0d5981..61f3854 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(kvm-slots_lock);
-   if (kvm-arch.apic_access_page)
+   if (kvm-arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm-arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm-arch.apic_access_page_done = true;
 out:
mutex_unlock(kvm-slots_lock);
return r;
@@ -4534,8 +4539,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
}
 
if (vm_need_virtualize_apic_accesses(vmx-vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-
page_to_phys(vmx-vcpu.kvm-arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
 
if (vmx_vm_has_apicv(vcpu-kvm))
memset(vmx-pi_desc, 0, sizeof(struct pi_desc));
@@ -7995,8 +7999,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
} else if (vm_need_virtualize_apic_accesses(vmx-vcpu.kvm)) {
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu-kvm-arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3f458b2..9094e13 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,15 +5991,20 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page;
+
/*
 * apic access page could be migrated. When the page is being migrated,
 * GUP will wait till the migrate entry is replaced with the new pte
 * entry pointing to the new page.
 */
-   vcpu-kvm-arch.apic_access_page = gfn_to_page(vcpu-kvm,
-   APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
-   kvm_x86_ops-set_apic_access_page_addr(vcpu-kvm,
-   page_to_phys(vcpu-kvm-arch.apic_access_page));
+   page = gfn_to_page(vcpu-kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
+   kvm_x86_ops-set_apic_access_page_addr(vcpu-kvm, page_to_phys(page));
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
@@ -7253,8 +7258,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm-arch.vpic);
kfree(kvm-arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm-arch.apic_access_page)
-   put_page(kvm-arch.apic_access_page);
kfree(rcu_dereference_check(kvm-arch.apic_map, 1));
 }
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-16 Thread Tang Chen

This patch only handle L1 and L2 vm share one apic access page situation.

When L1 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all the vcpus' vmcs (which is done by patch 5/6). And when it enters L2 vm, 
L2's vmcs
will be updated in prepare_vmcs02() called by nested_vm_run(). So we need to do
nothing.

When L2 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all L2 vmcs. And this patch requests apic access page reload in L2-L1 vmexit.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx.c  | 6 ++
 arch/x86/kvm/x86.c  | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..92b3e72 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1046,6 +1046,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a1a9797..d0d5981 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8795,6 +8795,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
}
 
/*
+* We are now running in L2, mmu_notifier will force to reload the
+* page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
+*/
+   kvm_vcpu_reload_apic_access_page(vcpu);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 27c3d30..3f458b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
/*
 * apic access page could be migrated. When the page is being migrated,
@@ -6001,6 +6001,7 @@ static void kvm_vcpu_reload_apic_access_page(struct 
kvm_vcpu *vcpu)
kvm_x86_ops-set_apic_access_page_addr(vcpu-kvm,
page_to_phys(vcpu-kvm-arch.apic_access_page));
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 3/6] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-16 Thread Tang Chen

In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and make init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4fb84ad..72a0470 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm-arch.ept_identity_pagetable_done. */
mutex_lock(kvm-slots_lock);
 
-   if (likely(kvm-arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm-arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm-arch.ept_identity_map_addr  PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(kvm-srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r  0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i  PT32_ENT_PER_PAGE; i++) {
tmp = (i  22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
tmp, i * sizeof(tmp), sizeof(tmp));
-   if (r  0)
+   if (ret)
goto out;
}
kvm-arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(kvm-srcu, idx);
-
 out2:
mutex_unlock(kvm-slots_lock);
return ret;
@@ -7604,11 +7601,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
*kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   /* Set err to -ENOMEM to handle memory allocation error. */
+   err = -ENOMEM;
+
vmx-guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx-guest_msrs[0])
  PAGE_SIZE);
 
-   err = -ENOMEM;
if (!vmx-guest_msrs) {
goto uninit_vcpu;
}
@@ -7641,8 +7640,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
if (!kvm-arch.ept_identity_map_addr)
kvm-arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
-   err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   err = init_rmode_identity_map(kvm);
+   if (err  0)
goto free_vmcs;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 1/6] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-16 Thread Tang Chen

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee0, which is also the address 
of
apic access page. So use this macro.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm-asid_generation = 0;
init_vmcb(svm);
 
-   svm-vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   svm-vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(svm-vcpu))
svm-vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee0ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx-vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vmx-vcpu, 0);
-   apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(vmx-vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5 3/7] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-10 Thread Tang Chen

In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and make init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 953d529..63c4c3e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(>slots_lock);
 
-   if (likely(kvm->arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(>srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r < 0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
, i * sizeof(tmp), sizeof(tmp));
-   if (r < 0)
+   if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(>srcu, idx);
-
 out2:
mutex_unlock(>slots_lock);
return ret;
@@ -7645,7 +7642,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   if (init_rmode_identity_map(kvm))
goto free_vmcs;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5 7/7] kvm, mem-hotplug: Unpin and remove nested_vmx->apic_access_page.

2014-09-10 Thread Tang Chen

Just like we removed kvm_arch->apic_access_page, nested_vmx->apic_access_page
becomes useless for the same reason. This patch removes 
nested_vmx->apic_access_page,
and use gfn_to_page() to pin it in memory when we need it, and unpin it after 
then.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 058c373..4aa73cb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -374,11 +374,6 @@ struct nested_vmx {
u64 vmcs01_tsc_offset;
/* L2 must run next, and mustn't decide to exit to L1. */
bool nested_run_pending;
-   /*
-* Guest pages referred to in vmcs02 with host-physical pointers, so
-* we must keep them pinned while L2 runs.
-*/
-   struct page *apic_access_page;
u64 msr_ia32_feature_control;
 
struct hrtimer preemption_timer;
@@ -6154,11 +6149,6 @@ static void free_nested(struct vcpu_vmx *vmx)
nested_release_vmcs12(vmx);
if (enable_shadow_vmcs)
free_vmcs(vmx->nested.current_shadow_vmcs);
-   /* Unpin physical memory we referred to in current vmcs02 */
-   if (vmx->nested.apic_access_page) {
-   nested_release_page(vmx->nested.apic_access_page);
-   vmx->nested.apic_access_page = 0;
-   }
 
nested_free_all_saved_vmcss(vmx);
 }
@@ -7983,28 +7973,31 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
struct vmcs12 *vmcs12)
exec_control |= vmcs12->secondary_vm_exec_control;
 
if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
+   struct page *page;
/*
 * Translate L1 physical address to host physical
 * address for vmcs02. Keep the page pinned, so this
 * physical address remains valid. We keep a reference
 * to it so we can release it later.
 */
-   if (vmx->nested.apic_access_page) /* shouldn't happen */
-   
nested_release_page(vmx->nested.apic_access_page);
-   vmx->nested.apic_access_page =
-   nested_get_page(vcpu, vmcs12->apic_access_addr);
+   page = nested_get_page(vcpu, vmcs12->apic_access_addr);
/*
 * If translation failed, no matter: This feature asks
 * to exit when accessing the given address, and if it
 * can never be accessed, this feature won't do
 * anything anyway.
 */
-   if (!vmx->nested.apic_access_page)
+   if (!page)
exec_control &=
  ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
else
vmcs_write64(APIC_ACCESS_ADDR,
- page_to_phys(vmx->nested.apic_access_page));
+page_to_phys(page));
+   /*
+* Do not pin nested vm's apic access page in memory so
+* that memory hotplug process is able to migrate it.
+*/
+   put_page(page);
} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
struct page *page = gfn_to_page(vmx->vcpu.kvm,
APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
@@ -8807,12 +8800,6 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
/* This is needed for same reason as it was needed in prepare_vmcs02 */
vmx->host_rsp = 0;
 
-   /* Unpin physical memory we referred to in vmcs02 */
-   if (vmx->nested.apic_access_page) {
-   nested_release_page(vmx->nested.apic_access_page);
-   vmx->nested.apic_access_page = 0;
-   }
-
/*
 * Do not call kvm_reload_apic_access_page() because we are now
 * running, mmu_notifier will force to reload the page's hpa for L2
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5 2/7] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-10 Thread Tang Chen

kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 50 -
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..953d529 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
+
+   /* Protect kvm->arch.ept_identity_pagetable_done. */
+   mutex_lock(>slots_lock);
+
+   if (likely(kvm->arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm->arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(>srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(>srcu, idx);
+
+out2:
+   mutex_unlock(>slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,23 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /*
+* In init_rmode_identity_map(), kvm->arch.ept_identity_pagetable_done
+* is checked before calling this function and set to true after the
+* calling. The access to kvm->arch.ept_identity_pagetable_done should
+* be protected by kvm->slots_lock.
+*/
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(>slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, _userspace_mem);
-   if (r)
-   goto out;
 
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
-
-   kvm->arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(>slots_lock);
return r;
 }
 
@@ -7643,8 +7645,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto fr

[PATCH v5 1/7] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-10 Thread Tang Chen

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee0, which is also the address 
of
apic access page. So use this macro.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(>vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee0ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, _userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(>vcpu, 0);
-   apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(>vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5 0/7] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-10 Thread Tang Chen

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v4 -> v5:
1. Patch 5/7: Call kvm_reload_apic_access_page() unconditionally in 
nested_vmx_vmexit().
   (From Gleb Natapov )
2. Patch 6/7: Remove kvm_arch->apic_access_page. (From Gleb Natapov 
)
3. Patch 7/7: Remove nested_vmx->apic_access_page.

Tang Chen (7):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1' apic access page on migration in
vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.
  kvm, mem-hotplug: Unpin and remove nested_vmx->apic_access_page.

 arch/x86/include/asm/kvm_host.h |   4 +-
 arch/x86/kvm/svm.c  |   9 ++-
 arch/x86/kvm/vmx.c  | 139 ++--
 arch/x86/kvm/x86.c  |  24 +--
 include/linux/kvm_host.h|   2 +
 virt/kvm/kvm_main.c |  13 
 6 files changed, 122 insertions(+), 69 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5 4/7] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-09-10 Thread Tang Chen

apic access page is pinned in memory. As a result, it cannot be 
migrated/hot-removed.
Actually, it is not necessary to be pinned.

The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When
the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the
corresponding ept entry. This patch introduces a new vcpu request named
KVM_REQ_APIC_PAGE_RELOAD, and makes this request to all the vcpus at this time,
and force all the vcpus exit guest, and re-enter guest till they updates the 
VMCS
APIC_ACCESS_ADDR pointer to the new apic access page address, and updates
kvm->arch.apic_access_page to the new page.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 15 +++
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 63c4c3e..da6d55d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7093,6 +7093,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8910,6 +8915,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..96f4188 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* apic access page could be migrated. When the page is being migrated,
+* GUP will wait till the migrate entry is replaced with the new pte
+* entry pointing to the new page.
+*/
+   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+   page_to_phys(vcpu->kvm->arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+   if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+   vcpu_reload_apic_access_page(vcpu);
}
 
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h

[PATCH v5 5/7] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-10 Thread Tang Chen

This patch only handle "L1 and L2 vm share one apic access page" situation.

When L1 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all the vcpus' vmcs (which is done by patch 5/6). And when it enters L2 vm, 
L2's vmcs
will be updated in prepare_vmcs02() called by nested_vm_run(). So we need to do
nothing.

When L2 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all L2 vmcs. And this patch requests apic access page reload in L2->L1 vmexit.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c  | 7 +++
 virt/kvm/kvm_main.c | 1 +
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index da6d55d..e7704b2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8796,6 +8796,13 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
}
 
/*
+* Do not call kvm_reload_apic_access_page() because we are now
+* running, mmu_notifier will force to reload the page's hpa for L2
+* vmcs. Need to reload it for L1 before entering L1.
+*/
+   kvm_reload_apic_access_page(vcpu->kvm);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d8280de..784127e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -214,6 +214,7 @@ void kvm_reload_apic_access_page(struct kvm *kvm)
 {
make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
 }
+EXPORT_SYMBOL_GPL(kvm_reload_apic_access_page);
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5 6/7] kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

2014-09-10 Thread Tang Chen

To make apic access page migratable, we do not pin it in memory now.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpu will access
kvm_arch->apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch->apic_access_page anymore. Since
apic access page is not pinned in memory now, we can remove
kvm_arch->apic_access_page. When we need to write its physical address
into vmcs, use gfn_to_page() to get its page struct, which will also
pin it. And unpin it after then.

Suggested-by: Gleb Natapov 
Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 32 +---
 arch/x86/kvm/x86.c  | 15 +--
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..70f0d2d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e7704b2..058c373 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(>slots_lock);
-   if (kvm->arch.apic_access_page)
+   if (kvm->arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm->arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm->arch.apic_access_page_done = true;
 out:
mutex_unlock(>slots_lock);
return r;
@@ -4536,9 +4541,16 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_write32(TPR_THRESHOLD, 0);
}
 
-   if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-
page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
+   if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
+   struct page *page = gfn_to_page(vmx->vcpu.kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page));
+   /*
+* Do not pin apic access page in memory so that memory
+* hotplug process is able to migrate it.
+*/
+   put_page(page);
+   }
 
if (vmx_vm_has_apicv(vcpu->kvm))
memset(>pi_desc, 0, sizeof(struct pi_desc));
@@ -7994,10 +8006,16 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
struct vmcs12 *vmcs12)
vmcs_write64(APIC_ACCESS_ADDR,
  page_to_phys(vmx->nested.apic_access_page));
} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
+   struct page *page = gfn_to_page(vmx->vcpu.kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page));
+   /*
+* Do not pin apic access page in memory so that memory
+* hotplug process is able to migrate it.
+*/
+   put_page(page);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 96f4188..6da0b93 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,15 +5991,20 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page;
+
/*
 * apic access page could be migrated. When the page is being migrated,
 * GUP will wait till the migrate entry is replaced with the new pte
 * entry pointing to the new page.
 */
-   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
-   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
-

[PATCH v5 6/7] kvm, mem-hotplug: Unpin and remove kvm_arch-apic_access_page.

2014-09-10 Thread Tang Chen

To make apic access page migratable, we do not pin it in memory now.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpu will access
kvm_arch-apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch-apic_access_page anymore. Since
apic access page is not pinned in memory now, we can remove
kvm_arch-apic_access_page. When we need to write its physical address
into vmcs, use gfn_to_page() to get its page struct, which will also
pin it. And unpin it after then.

Suggested-by: Gleb Natapov g...@kernel.org
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 32 +---
 arch/x86/kvm/x86.c  | 15 +--
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..70f0d2d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e7704b2..058c373 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(kvm-slots_lock);
-   if (kvm-arch.apic_access_page)
+   if (kvm-arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm-arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm-arch.apic_access_page_done = true;
 out:
mutex_unlock(kvm-slots_lock);
return r;
@@ -4536,9 +4541,16 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_write32(TPR_THRESHOLD, 0);
}
 
-   if (vm_need_virtualize_apic_accesses(vmx-vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-
page_to_phys(vmx-vcpu.kvm-arch.apic_access_page));
+   if (vm_need_virtualize_apic_accesses(vmx-vcpu.kvm)) {
+   struct page *page = gfn_to_page(vmx-vcpu.kvm,
+   APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
+   vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page));
+   /*
+* Do not pin apic access page in memory so that memory
+* hotplug process is able to migrate it.
+*/
+   put_page(page);
+   }
 
if (vmx_vm_has_apicv(vcpu-kvm))
memset(vmx-pi_desc, 0, sizeof(struct pi_desc));
@@ -7994,10 +8006,16 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
struct vmcs12 *vmcs12)
vmcs_write64(APIC_ACCESS_ADDR,
  page_to_phys(vmx-nested.apic_access_page));
} else if (vm_need_virtualize_apic_accesses(vmx-vcpu.kvm)) {
+   struct page *page = gfn_to_page(vmx-vcpu.kvm,
+   APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu-kvm-arch.apic_access_page));
+   vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page));
+   /*
+* Do not pin apic access page in memory so that memory
+* hotplug process is able to migrate it.
+*/
+   put_page(page);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 96f4188..6da0b93 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,15 +5991,20 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page;
+
/*
 * apic access page could be migrated. When the page is being migrated,
 * GUP will wait till the migrate entry is replaced with the new pte
 * entry pointing to the new page.
 */
-   vcpu-kvm-arch.apic_access_page = gfn_to_page(vcpu-kvm,
-   APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
-   kvm_x86_ops-set_apic_access_page_addr(vcpu-kvm,
-   page_to_phys(vcpu-kvm-arch.apic_access_page));
+   page

[PATCH v5 1/7] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-10 Thread Tang Chen

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee0, which is also the address 
of
apic access page. So use this macro.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm-asid_generation = 0;
init_vmcb(svm);
 
-   svm-vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   svm-vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(svm-vcpu))
svm-vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee0ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx-vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vmx-vcpu, 0);
-   apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(vmx-vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5 0/7] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-10 Thread Tang Chen

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v4 - v5:
1. Patch 5/7: Call kvm_reload_apic_access_page() unconditionally in 
nested_vmx_vmexit().
   (From Gleb Natapov g...@kernel.org)
2. Patch 6/7: Remove kvm_arch-apic_access_page. (From Gleb Natapov 
g...@kernel.org)
3. Patch 7/7: Remove nested_vmx-apic_access_page.

Tang Chen (7):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1' apic access page on migration in
vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Unpin and remove kvm_arch-apic_access_page.
  kvm, mem-hotplug: Unpin and remove nested_vmx-apic_access_page.

 arch/x86/include/asm/kvm_host.h |   4 +-
 arch/x86/kvm/svm.c  |   9 ++-
 arch/x86/kvm/vmx.c  | 139 ++--
 arch/x86/kvm/x86.c  |  24 +--
 include/linux/kvm_host.h|   2 +
 virt/kvm/kvm_main.c |  13 
 6 files changed, 122 insertions(+), 69 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5 2/7] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-10 Thread Tang Chen

kvm_arch-ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch-ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch-ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 50 -
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..953d529 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm-arch.ept_identity_pagetable)) {
-   printk(KERN_ERR EPT: identity-mapping pagetable 
-   haven't been allocated!\n);
-   return 0;
+
+   /* Protect kvm-arch.ept_identity_pagetable_done. */
+   mutex_lock(kvm-slots_lock);
+
+   if (likely(kvm-arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm-arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm-arch.ept_identity_map_addr  PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(kvm-srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r  0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(kvm-srcu, idx);
+
+out2:
+   mutex_unlock(kvm-slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,23 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /*
+* In init_rmode_identity_map(), kvm-arch.ept_identity_pagetable_done
+* is checked before calling this function and set to true after the
+* calling. The access to kvm-arch.ept_identity_pagetable_done should
+* be protected by kvm-slots_lock.
+*/
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(kvm-slots_lock);
-   if (kvm-arch.ept_identity_pagetable)
-   goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm-arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
-   if (r)
-   goto out;
 
-   page = gfn_to_page(kvm, kvm-arch.ept_identity_map_addr  PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
-
-   kvm-arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(kvm-slots_lock);
return r;
 }
 
@@ -7643,8 +7645,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm-arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm

[PATCH v5 4/7] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-09-10 Thread Tang Chen

apic access page is pinned in memory. As a result, it cannot be 
migrated/hot-removed.
Actually, it is not necessary to be pinned.

The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When
the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the
corresponding ept entry. This patch introduces a new vcpu request named
KVM_REQ_APIC_PAGE_RELOAD, and makes this request to all the vcpus at this time,
and force all the vcpus exit guest, and re-enter guest till they updates the 
VMCS
APIC_ACCESS_ADDR pointer to the new apic access page address, and updates
kvm-arch.apic_access_page to the new page.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 15 +++
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 63c4c3e..da6d55d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7093,6 +7093,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8910,6 +8915,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..96f4188 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* apic access page could be migrated. When the page is being migrated,
+* GUP will wait till the migrate entry is replaced with the new pte
+* entry pointing to the new page.
+*/
+   vcpu-kvm-arch.apic_access_page = gfn_to_page(vcpu-kvm,
+   APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
+   kvm_x86_ops-set_apic_access_page_addr(vcpu-kvm,
+   page_to_phys(vcpu-kvm-arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+   if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+   vcpu_reload_apic_access_page(vcpu);
}
 
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..8be076a

[PATCH v5 5/7] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-10 Thread Tang Chen

This patch only handle L1 and L2 vm share one apic access page situation.

When L1 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all the vcpus' vmcs (which is done by patch 5/6). And when it enters L2 vm, 
L2's vmcs
will be updated in prepare_vmcs02() called by nested_vm_run(). So we need to do
nothing.

When L2 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all L2 vmcs. And this patch requests apic access page reload in L2-L1 vmexit.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c  | 7 +++
 virt/kvm/kvm_main.c | 1 +
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index da6d55d..e7704b2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8796,6 +8796,13 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
}
 
/*
+* Do not call kvm_reload_apic_access_page() because we are now
+* running, mmu_notifier will force to reload the page's hpa for L2
+* vmcs. Need to reload it for L1 before entering L1.
+*/
+   kvm_reload_apic_access_page(vcpu-kvm);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d8280de..784127e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -214,6 +214,7 @@ void kvm_reload_apic_access_page(struct kvm *kvm)
 {
make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
 }
+EXPORT_SYMBOL_GPL(kvm_reload_apic_access_page);
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5 7/7] kvm, mem-hotplug: Unpin and remove nested_vmx-apic_access_page.

2014-09-10 Thread Tang Chen

Just like we removed kvm_arch-apic_access_page, nested_vmx-apic_access_page
becomes useless for the same reason. This patch removes 
nested_vmx-apic_access_page,
and use gfn_to_page() to pin it in memory when we need it, and unpin it after 
then.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c | 31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 058c373..4aa73cb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -374,11 +374,6 @@ struct nested_vmx {
u64 vmcs01_tsc_offset;
/* L2 must run next, and mustn't decide to exit to L1. */
bool nested_run_pending;
-   /*
-* Guest pages referred to in vmcs02 with host-physical pointers, so
-* we must keep them pinned while L2 runs.
-*/
-   struct page *apic_access_page;
u64 msr_ia32_feature_control;
 
struct hrtimer preemption_timer;
@@ -6154,11 +6149,6 @@ static void free_nested(struct vcpu_vmx *vmx)
nested_release_vmcs12(vmx);
if (enable_shadow_vmcs)
free_vmcs(vmx-nested.current_shadow_vmcs);
-   /* Unpin physical memory we referred to in current vmcs02 */
-   if (vmx-nested.apic_access_page) {
-   nested_release_page(vmx-nested.apic_access_page);
-   vmx-nested.apic_access_page = 0;
-   }
 
nested_free_all_saved_vmcss(vmx);
 }
@@ -7983,28 +7973,31 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
struct vmcs12 *vmcs12)
exec_control |= vmcs12-secondary_vm_exec_control;
 
if (exec_control  SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
+   struct page *page;
/*
 * Translate L1 physical address to host physical
 * address for vmcs02. Keep the page pinned, so this
 * physical address remains valid. We keep a reference
 * to it so we can release it later.
 */
-   if (vmx-nested.apic_access_page) /* shouldn't happen */
-   
nested_release_page(vmx-nested.apic_access_page);
-   vmx-nested.apic_access_page =
-   nested_get_page(vcpu, vmcs12-apic_access_addr);
+   page = nested_get_page(vcpu, vmcs12-apic_access_addr);
/*
 * If translation failed, no matter: This feature asks
 * to exit when accessing the given address, and if it
 * can never be accessed, this feature won't do
 * anything anyway.
 */
-   if (!vmx-nested.apic_access_page)
+   if (!page)
exec_control =
  ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
else
vmcs_write64(APIC_ACCESS_ADDR,
- page_to_phys(vmx-nested.apic_access_page));
+page_to_phys(page));
+   /*
+* Do not pin nested vm's apic access page in memory so
+* that memory hotplug process is able to migrate it.
+*/
+   put_page(page);
} else if (vm_need_virtualize_apic_accesses(vmx-vcpu.kvm)) {
struct page *page = gfn_to_page(vmx-vcpu.kvm,
APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
@@ -8807,12 +8800,6 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
/* This is needed for same reason as it was needed in prepare_vmcs02 */
vmx-host_rsp = 0;
 
-   /* Unpin physical memory we referred to in vmcs02 */
-   if (vmx-nested.apic_access_page) {
-   nested_release_page(vmx-nested.apic_access_page);
-   vmx-nested.apic_access_page = 0;
-   }
-
/*
 * Do not call kvm_reload_apic_access_page() because we are now
 * running, mmu_notifier will force to reload the page's hpa for L2
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5 3/7] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-10 Thread Tang Chen

In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and make init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 953d529..63c4c3e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm-arch.ept_identity_pagetable_done. */
mutex_lock(kvm-slots_lock);
 
-   if (likely(kvm-arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm-arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm-arch.ept_identity_map_addr  PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(kvm-srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r  0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i  PT32_ENT_PER_PAGE; i++) {
tmp = (i  22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
tmp, i * sizeof(tmp), sizeof(tmp));
-   if (r  0)
+   if (ret)
goto out;
}
kvm-arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(kvm-srcu, idx);
-
 out2:
mutex_unlock(kvm-slots_lock);
return ret;
@@ -7645,7 +7642,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm-arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   if (init_rmode_identity_map(kvm))
goto free_vmcs;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 3/6] kvm: Make init_rmode_identity_map() return 0 on success.

2014-08-27 Thread Tang Chen

In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and make init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 953d529..63c4c3e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(>slots_lock);
 
-   if (likely(kvm->arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(>srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r < 0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
, i * sizeof(tmp), sizeof(tmp));
-   if (r < 0)
+   if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(>srcu, idx);
-
 out2:
mutex_unlock(>slots_lock);
return ret;
@@ -7645,7 +7642,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   if (init_rmode_identity_map(kvm))
goto free_vmcs;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 2/6] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-08-27 Thread Tang Chen

kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 50 -
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..953d529 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
+
+   /* Protect kvm->arch.ept_identity_pagetable_done. */
+   mutex_lock(>slots_lock);
+
+   if (likely(kvm->arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm->arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(>srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(>srcu, idx);
+
+out2:
+   mutex_unlock(>slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,23 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /*
+* In init_rmode_identity_map(), kvm->arch.ept_identity_pagetable_done
+* is checked before calling this function and set to true after the
+* calling. The access to kvm->arch.ept_identity_pagetable_done should
+* be protected by kvm->slots_lock.
+*/
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(>slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, _userspace_mem);
-   if (r)
-   goto out;
 
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
-
-   kvm->arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(>slots_lock);
return r;
 }
 
@@ -7643,8 +7645,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff -

[PATCH v4 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-08-27 Thread Tang Chen

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v3 -> v4:
1. The original patch 6 is now patch 5. ( by Jan Kiszka  )
2. The original patch 1 is now patch 6 since we should unpin apic access page
   at the very last moment.


Tang Chen (6):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1' apic access page on migration in
vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Do not pin apic access page in memory.

 arch/x86/include/asm/kvm_host.h |   3 +-
 arch/x86/kvm/svm.c  |  15 +-
 arch/x86/kvm/vmx.c  | 103 +++-
 arch/x86/kvm/x86.c  |  22 +++--
 include/linux/kvm_host.h|   3 ++
 virt/kvm/kvm_main.c |  30 +++-
 6 files changed, 135 insertions(+), 41 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-08-27 Thread Tang Chen

apic access page is pinned in memory. As a result, it cannot be 
migrated/hot-removed.
Actually, it is not necessary to be pinned.

The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When
the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the
corresponding ept entry. This patch introduces a new vcpu request named
KVM_REQ_APIC_PAGE_RELOAD, and makes this request to all the vcpus at this time,
and force all the vcpus exit guest, and re-enter guest till they updates the 
VMCS
APIC_ACCESS_ADDR pointer to the new apic access page address, and updates
kvm->arch.apic_access_page to the new page.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 15 +++
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 63c4c3e..da6d55d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7093,6 +7093,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8910,6 +8915,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..96f4188 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* apic access page could be migrated. When the page is being migrated,
+* GUP will wait till the migrate entry is replaced with the new pte
+* entry pointing to the new page.
+*/
+   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+   page_to_phys(vcpu->kvm->arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+   if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+   vcpu_reload_apic_access_page(vcpu);
}
 
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h

[PATCH v4 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-08-27 Thread Tang Chen

This patch only handle "L1 and L2 vm share one apic access page" situation.

When L1 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all the vcpus' vmcs (which is done by patch 5/6). And when it enters L2 vm, 
L2's vmcs
will be updated in prepare_vmcs02() called by nested_vm_run(). So we need to do
nothing.

When L2 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all L2 vmcs. And this patch requests apic access page reload in L2->L1 vmexit.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  | 32 
 arch/x86/kvm/x86.c  |  3 +++
 virt/kvm/kvm_main.c |  1 +
 5 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..13fbb62 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -740,6 +740,7 @@ struct kvm_x86_ops {
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
+   void (*set_nested_apic_page_migrated)(struct kvm_vcpu *vcpu, bool set);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f2eacc4..da88646 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3624,6 +3624,11 @@ static void svm_set_apic_access_page_addr(struct kvm 
*kvm, hpa_t hpa)
return;
 }
 
+static void svm_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4379,6 +4384,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
.set_apic_access_page_addr = svm_set_apic_access_page_addr,
+   .set_nested_apic_page_migrated = svm_set_nested_apic_page_migrated,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index da6d55d..9035fd1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -379,6 +379,16 @@ struct nested_vmx {
 * we must keep them pinned while L2 runs.
 */
struct page *apic_access_page;
+   /*
+* L1's apic access page can be migrated. When L1 and L2 are sharing
+* the apic access page, after the page is migrated when L2 is running,
+* we have to reload it to L1 vmcs before we enter L1.
+*
+* When the shared apic access page is migrated in L1 mode, we don't
+* need to do anything else because we reload apic access page each
+* time when entering L2 in prepare_vmcs02().
+*/
+   bool apic_access_page_migrated;
u64 msr_ia32_feature_control;
 
struct hrtimer preemption_timer;
@@ -7098,6 +7108,12 @@ static void vmx_set_apic_access_page_addr(struct kvm 
*kvm, hpa_t hpa)
vmcs_write64(APIC_ACCESS_ADDR, hpa);
 }
 
+static void vmx_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+   vmx->nested.apic_access_page_migrated = set;
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8796,6 +8812,21 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
}
 
/*
+* When shared (L1 & L2) apic access page is migrated during L2 is
+* running, mmu_notifier will force to reload the page's hpa for L2
+* vmcs. Need to reload it for L1 before entering L1.
+*/
+   if (vmx->nested.apic_access_page_migrated) {
+   /*
+* Do not call kvm_reload_apic_access_page() because we are now
+* in L2. We should not call make_all_cpus_request() to exit to
+* L0, otherwise we will reload for L2 vmcs again.
+*/
+   kvm_reload_apic_access_page(vcpu->kvm);
+   vmx->nested.apic_access_page_migrated = false;
+   }
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
@@ -8916,6 +8947,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.update_cr8_int

[PATCH v4 1/6] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-08-27 Thread Tang Chen

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee0, which is also the address 
of
apic access page. So use this macro.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(>vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee0ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, _userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(>vcpu, 0);
-   apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(>vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 6/6] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-08-27 Thread Tang Chen

gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin the page
in memory by calling GUP functions. This function unpins the page.

After this patch, acpi access page is able to be migrated.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c   |  2 +-
 arch/x86/kvm/x86.c   |  4 +---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 17 -
 4 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9035fd1..e0043a5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4022,7 +4022,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;
 
-   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 131b6e8..2edbeb9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5996,7 +5996,7 @@ static void vcpu_reload_apic_access_page(struct kvm_vcpu 
*vcpu)
 * GUP will wait till the migrate entry is replaced with the new pte
 * entry pointing to the new page.
 */
-   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+   vcpu->kvm->arch.apic_access_page = gfn_to_page_no_pin(vcpu->kvm,
APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
page_to_phys(vcpu->kvm->arch.apic_access_page));
@@ -7255,8 +7255,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm->arch.vpic);
kfree(kvm->arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm->arch.apic_access_page)
-   put_page(kvm->arch.apic_access_page);
kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8be076a..02cbcb1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -526,6 +526,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, 
struct page **pages,
int nr_pages);
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
 unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 784127e..19d90d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1386,9 +1386,24 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 
return kvm_pfn_to_page(pfn);
 }
-
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn)
+{
+   struct page *page = gfn_to_page(kvm, gfn);
+
+   /*
+* gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin
+* the page in memory by calling GUP functions. This function unpins
+* the page.
+*/
+   if (!is_error_page(page))
+   put_page(page);
+
+   return page;
+}
+EXPORT_SYMBOL_GPL(gfn_to_page_no_pin);
+
 void kvm_release_page_clean(struct page *page)
 {
WARN_ON(is_error_page(page));
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-08-27 Thread Tang Chen

apic access page is pinned in memory. As a result, it cannot be 
migrated/hot-removed.
Actually, it is not necessary to be pinned.

The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When
the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the
corresponding ept entry. This patch introduces a new vcpu request named
KVM_REQ_APIC_PAGE_RELOAD, and makes this request to all the vcpus at this time,
and force all the vcpus exit guest, and re-enter guest till they updates the 
VMCS
APIC_ACCESS_ADDR pointer to the new apic access page address, and updates
kvm-arch.apic_access_page to the new page.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 15 +++
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 63c4c3e..da6d55d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7093,6 +7093,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8910,6 +8915,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..96f4188 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* apic access page could be migrated. When the page is being migrated,
+* GUP will wait till the migrate entry is replaced with the new pte
+* entry pointing to the new page.
+*/
+   vcpu-kvm-arch.apic_access_page = gfn_to_page(vcpu-kvm,
+   APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
+   kvm_x86_ops-set_apic_access_page_addr(vcpu-kvm,
+   page_to_phys(vcpu-kvm-arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+   if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+   vcpu_reload_apic_access_page(vcpu);
}
 
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..8be076a

[PATCH v4 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-08-27 Thread Tang Chen

This patch only handle L1 and L2 vm share one apic access page situation.

When L1 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all the vcpus' vmcs (which is done by patch 5/6). And when it enters L2 vm, 
L2's vmcs
will be updated in prepare_vmcs02() called by nested_vm_run(). So we need to do
nothing.

When L2 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all L2 vmcs. And this patch requests apic access page reload in L2-L1 vmexit.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  | 32 
 arch/x86/kvm/x86.c  |  3 +++
 virt/kvm/kvm_main.c |  1 +
 5 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..13fbb62 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -740,6 +740,7 @@ struct kvm_x86_ops {
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
+   void (*set_nested_apic_page_migrated)(struct kvm_vcpu *vcpu, bool set);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f2eacc4..da88646 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3624,6 +3624,11 @@ static void svm_set_apic_access_page_addr(struct kvm 
*kvm, hpa_t hpa)
return;
 }
 
+static void svm_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4379,6 +4384,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
.set_apic_access_page_addr = svm_set_apic_access_page_addr,
+   .set_nested_apic_page_migrated = svm_set_nested_apic_page_migrated,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index da6d55d..9035fd1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -379,6 +379,16 @@ struct nested_vmx {
 * we must keep them pinned while L2 runs.
 */
struct page *apic_access_page;
+   /*
+* L1's apic access page can be migrated. When L1 and L2 are sharing
+* the apic access page, after the page is migrated when L2 is running,
+* we have to reload it to L1 vmcs before we enter L1.
+*
+* When the shared apic access page is migrated in L1 mode, we don't
+* need to do anything else because we reload apic access page each
+* time when entering L2 in prepare_vmcs02().
+*/
+   bool apic_access_page_migrated;
u64 msr_ia32_feature_control;
 
struct hrtimer preemption_timer;
@@ -7098,6 +7108,12 @@ static void vmx_set_apic_access_page_addr(struct kvm 
*kvm, hpa_t hpa)
vmcs_write64(APIC_ACCESS_ADDR, hpa);
 }
 
+static void vmx_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+   vmx-nested.apic_access_page_migrated = set;
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8796,6 +8812,21 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
}
 
/*
+* When shared (L1  L2) apic access page is migrated during L2 is
+* running, mmu_notifier will force to reload the page's hpa for L2
+* vmcs. Need to reload it for L1 before entering L1.
+*/
+   if (vmx-nested.apic_access_page_migrated) {
+   /*
+* Do not call kvm_reload_apic_access_page() because we are now
+* in L2. We should not call make_all_cpus_request() to exit to
+* L0, otherwise we will reload for L2 vmcs again.
+*/
+   kvm_reload_apic_access_page(vcpu-kvm);
+   vmx-nested.apic_access_page_migrated = false;
+   }
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
@@ -8916,6 +8947,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.update_cr8_intercept

[PATCH v4 1/6] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-08-27 Thread Tang Chen

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee0, which is also the address 
of
apic access page. So use this macro.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm-asid_generation = 0;
init_vmcb(svm);
 
-   svm-vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   svm-vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(svm-vcpu))
svm-vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee0ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx-vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vmx-vcpu, 0);
-   apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(vmx-vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 3/6] kvm: Make init_rmode_identity_map() return 0 on success.

2014-08-27 Thread Tang Chen

In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and make init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 953d529..63c4c3e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm-arch.ept_identity_pagetable_done. */
mutex_lock(kvm-slots_lock);
 
-   if (likely(kvm-arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm-arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm-arch.ept_identity_map_addr  PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(kvm-srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r  0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i  PT32_ENT_PER_PAGE; i++) {
tmp = (i  22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
tmp, i * sizeof(tmp), sizeof(tmp));
-   if (r  0)
+   if (ret)
goto out;
}
kvm-arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(kvm-srcu, idx);
-
 out2:
mutex_unlock(kvm-slots_lock);
return ret;
@@ -7645,7 +7642,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm-arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   if (init_rmode_identity_map(kvm))
goto free_vmcs;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 2/6] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-08-27 Thread Tang Chen

kvm_arch-ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch-ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch-ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 50 -
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..953d529 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm-arch.ept_identity_pagetable)) {
-   printk(KERN_ERR EPT: identity-mapping pagetable 
-   haven't been allocated!\n);
-   return 0;
+
+   /* Protect kvm-arch.ept_identity_pagetable_done. */
+   mutex_lock(kvm-slots_lock);
+
+   if (likely(kvm-arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm-arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm-arch.ept_identity_map_addr  PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(kvm-srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r  0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(kvm-srcu, idx);
+
+out2:
+   mutex_unlock(kvm-slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,23 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /*
+* In init_rmode_identity_map(), kvm-arch.ept_identity_pagetable_done
+* is checked before calling this function and set to true after the
+* calling. The access to kvm-arch.ept_identity_pagetable_done should
+* be protected by kvm-slots_lock.
+*/
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(kvm-slots_lock);
-   if (kvm-arch.ept_identity_pagetable)
-   goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm-arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
-   if (r)
-   goto out;
 
-   page = gfn_to_page(kvm, kvm-arch.ept_identity_map_addr  PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
-
-   kvm-arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(kvm-slots_lock);
return r;
 }
 
@@ -7643,8 +7645,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm-arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644

[PATCH v4 6/6] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-08-27 Thread Tang Chen

gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin the page
in memory by calling GUP functions. This function unpins the page.

After this patch, acpi access page is able to be migrated.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c   |  2 +-
 arch/x86/kvm/x86.c   |  4 +---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 17 -
 4 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9035fd1..e0043a5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4022,7 +4022,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;
 
-   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 131b6e8..2edbeb9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5996,7 +5996,7 @@ static void vcpu_reload_apic_access_page(struct kvm_vcpu 
*vcpu)
 * GUP will wait till the migrate entry is replaced with the new pte
 * entry pointing to the new page.
 */
-   vcpu-kvm-arch.apic_access_page = gfn_to_page(vcpu-kvm,
+   vcpu-kvm-arch.apic_access_page = gfn_to_page_no_pin(vcpu-kvm,
APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
kvm_x86_ops-set_apic_access_page_addr(vcpu-kvm,
page_to_phys(vcpu-kvm-arch.apic_access_page));
@@ -7255,8 +7255,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm-arch.vpic);
kfree(kvm-arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm-arch.apic_access_page)
-   put_page(kvm-arch.apic_access_page);
kfree(rcu_dereference_check(kvm-arch.apic_map, 1));
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8be076a..02cbcb1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -526,6 +526,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, 
struct page **pages,
int nr_pages);
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
 unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 784127e..19d90d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1386,9 +1386,24 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 
return kvm_pfn_to_page(pfn);
 }
-
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn)
+{
+   struct page *page = gfn_to_page(kvm, gfn);
+
+   /*
+* gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin
+* the page in memory by calling GUP functions. This function unpins
+* the page.
+*/
+   if (!is_error_page(page))
+   put_page(page);
+
+   return page;
+}
+EXPORT_SYMBOL_GPL(gfn_to_page_no_pin);
+
 void kvm_release_page_clean(struct page *page)
 {
WARN_ON(is_error_page(page));
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-08-27 Thread Tang Chen

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v3 - v4:
1. The original patch 6 is now patch 5. ( by Jan Kiszka jan.kis...@web.de )
2. The original patch 1 is now patch 6 since we should unpin apic access page
   at the very last moment.


Tang Chen (6):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1' apic access page on migration in
vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Do not pin apic access page in memory.

 arch/x86/include/asm/kvm_host.h |   3 +-
 arch/x86/kvm/svm.c  |  15 +-
 arch/x86/kvm/vmx.c  | 103 +++-
 arch/x86/kvm/x86.c  |  22 +++--
 include/linux/kvm_host.h|   3 ++
 virt/kvm/kvm_main.c |  30 +++-
 6 files changed, 135 insertions(+), 41 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/1] memblock, memhotplug: Fix wrong type in memblock_find_in_range_node().

2014-08-10 Thread Tang Chen

In memblock_find_in_range_node(), we defeind ret as int. But it shoule
be phys_addr_t because it is used to store the return value from
__memblock_find_range_bottom_up().

The bug has not been triggered because when allocating low memory near
the kernel end, the "int ret" won't turn out to be minus. When we started
to allocate memory on other nodes, and the "int ret" could be minus.
Then the kernel will panic.

A simple way to reproduce this: comment out the following code in numa_init(),

memblock_set_bottom_up(false);

and the kernel won't boot.

Reported-by: Xishi Qiu 
Signed-off-by: Tang Chen 
---
 mm/memblock.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 6d2f219..70fad0c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -192,8 +192,7 @@ phys_addr_t __init_memblock 
memblock_find_in_range_node(phys_addr_t size,
phys_addr_t align, phys_addr_t start,
phys_addr_t end, int nid)
 {
-   int ret;
-   phys_addr_t kernel_end;
+   phys_addr_t kernel_end, ret;
 
/* pump up @end */
if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/1] memblock, memhotplug: Fix wrong type in memblock_find_in_range_node().

2014-08-10 Thread Tang Chen

In memblock_find_in_range_node(), we defeind ret as int. But it shoule
be phys_addr_t because it is used to store the return value from
__memblock_find_range_bottom_up().

The bug has not been triggered because when allocating low memory near
the kernel end, the int ret won't turn out to be minus. When we started
to allocate memory on other nodes, and the int ret could be minus.
Then the kernel will panic.

A simple way to reproduce this: comment out the following code in numa_init(),

memblock_set_bottom_up(false);

and the kernel won't boot.

Reported-by: Xishi Qiu qiuxi...@huawei.com
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 mm/memblock.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 6d2f219..70fad0c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -192,8 +192,7 @@ phys_addr_t __init_memblock 
memblock_find_in_range_node(phys_addr_t size,
phys_addr_t align, phys_addr_t start,
phys_addr_t end, int nid)
 {
-   int ret;
-   phys_addr_t kernel_end;
+   phys_addr_t kernel_end, ret;
 
/* pump up @end */
if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

< 1 2 3 4 5 6 7 8 9 10 >

201 - 300 of 2061 matches

Mail list logo