[RFC PATCH 1/1] Move two pinned pages to non-movable node in kvm.

2014-06-17 Thread Tang Chen
Hi,

I met a problem when offlining memory with a kvm guest running.


[Problem]
When qemu creates vcpus, it will call the following two functions
to allocate two pages:
1. alloc_apic_access_page(): allocates the apic access page for FlexPriority
on Intel CPUs.
2. alloc_identity_pagetable(): allocates the ept identity pagetable for real
mode.

And unfortunately, these two pages are pinned in memory and cannot be
migrated. As a result, they cannot be offlined, and memory hot-remove will
fail.



[The way I tried]
I tried to migrate these two pages, but I could not find a proper way
to migrate them.

Let's take the ept identity pagetable as an example:
In my opinion, since it is a pagetable, the CPU will access this page every
time the guest reads or writes memory. For example, the following code
accesses memory:
int a;
a = 0;
So this ept identity pagetable page can be accessed by the CPU at any time.



[Solution]
I have a basic idea to solve this problem: allocate these two pages on
non-movable nodes.
(For now, we can only hot-remove memory on movable nodes.)

alloc_identity_pagetable()
|-> __kvm_set_memory_region()
|   |-> kvm_arch_prepare_memory_region()
|   |-> userspace_addr = vm_mmap();
|   |-> memslot->userspace_addr = userspace_addr;  /* map userspace address (qemu) */
|
|   /*
|    * Here, set the memory policy for the mapped but not yet allocated page,
|    * so that it can only be allocated on non-movable nodes.
|    * (We can reuse the "numa_kernel_nodes" node mask from the movable_node
|    * functionality.)
|    */
|
|-> page = gfn_to_page()  /* allocate and pin the page */

Please refer to the attached patch for details.
I did some basic testing with the patch, and it makes memory offlining
succeed.



[Questions]
And by the way, could you please answer the following questions for me ?

1. What's the ept identity pagetable for ?  Is one page enough ?

2. Is the ept identity pagetable only used in real mode ?
   Can we free it once the guest is up (vcpu in protected mode) ?

3. Currently, the ept identity pagetable is allocated in qemu userspace.
   Can we allocate it in kernel space ?

4. If I want to migrate these two pages, what do you think is the best way ?

Thanks.


Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/numa.h | 1 +
 arch/x86/kvm/vmx.c  | 5 +
 arch/x86/kvm/x86.c  | 1 +
 arch/x86/mm/numa.c  | 3 ++-
 include/linux/mempolicy.h   | 6 ++
 mm/mempolicy.c  | 9 +
 6 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 4064aca..6312577 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -30,6 +30,7 @@ extern int numa_off;
  */
 extern s16 __apicid_to_node[MAX_LOCAL_APIC];
 extern nodemask_t numa_nodes_parsed __initdata;
+extern nodemask_t numa_kernel_nodes;
 
 extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
 extern void __init numa_set_distance(int from, int to, int distance);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332e..4a3b5b5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "kvm_cache_regs.h"
 #include "x86.h"
 
@@ -3988,6 +3989,8 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;
 
+   numa_bind_non_movable(kvm_userspace_mem.userspace_addr, PAGE_SIZE);
+
page = gfn_to_page(kvm, 0xfee00);
if (is_error_page(page)) {
r = -EFAULT;
@@ -4018,6 +4021,8 @@ static int alloc_identity_pagetable(struct kvm *kvm)
if (r)
goto out;
 
+   numa_bind_non_movable(kvm_userspace_mem.userspace_addr, PAGE_SIZE);
+
page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..3962a23 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7295,6 +7295,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
return PTR_ERR((void *)userspace_addr);
 
memslot->userspace_addr = userspace_addr;
+   mem->userspace_addr = userspace_addr;
}
 
return 0;
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index a32b706..d706148 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -22,6 +22,8 @@
 
 int __initdata numa_off;
 nodemask_t numa_nodes_parsed __initdata;
+nodemask_t numa_kernel_nodes;
+EXPORT_SYMBOL(numa_kernel_nodes);
 
 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
@@ -557,7 +559,6 @@ static void __init numa_init_array(void)
 static void __init numa_clear_kernel_node_hotplug(void)
 {
int i, nid;
-   nodemask_t numa_kernel_nodes = NODE_MASK_NONE;
unsigned long start, end;

Re: [RFC PATCH 1/1] Move two pinned pages to non-movable node in kvm.

2014-06-17 Thread Tang Chen

Hi Gleb,

Thanks for the quick reply. Please see below.

On 06/18/2014 02:12 PM, Gleb Natapov wrote:

On Wed, Jun 18, 2014 at 01:50:00PM +0800, Tang Chen wrote:

[Questions]
And by the way, could you please answer the following questions for me ?

1. What's the ept identity pagetable for ?  Is one page enough ?

2. Is the ept identity pagetable only used in real mode ?
Can we free it once the guest is up (vcpu in protected mode) ?

3. Currently, the ept identity pagetable is allocated in qemu userspace.
Can we allocate it in kernel space ?

What would be the benefit?


I think the benefit is that we can hot-remove the host memory a kvm guest
is using.

For now, only memory in ZONE_MOVABLE can be migrated/hot-removed, and the
kernel will never use ZONE_MOVABLE memory. So if we can allocate these two
pages in kernel space, we can pin them without any trouble, and when doing
memory hot-remove, the kernel will not try to migrate these two pages.





4. If I want to migrate these two pages, what do you think is the best way ?


I answered most of those here: 
http://www.mail-archive.com/kvm@vger.kernel.org/msg103718.html


I'm sorry, I must have missed this email.

Seeing your advice, we can unpin these two pages and repin them on the
next ept violation. So for this problem, which solution would you prefer:
allocate these two pages in kernel space, or migrate them before memory
hot-remove ?

I think the first solution is simpler. But I'm not quite sure whether there
are any other pages pinned in memory. If we have the same problem with other
kvm pages, I think it is better to solve it in the second way.

What do you think ?

Thanks.






--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 1/1] Move two pinned pages to non-movable node in kvm.

2014-06-19 Thread Tang Chen

Hi Marcelo,

Thanks for your reply. Please see below.

On 06/20/2014 03:00 AM, Marcelo Tosatti wrote:
..

Removing the pinning is preferable. In fact, for the identity pagetable
it looks trivial: just don't pin. The APIC access page is a little bit
more complicated, since its physical address needs to be tracked to be
updated in the VMCS.


Yes, and there will soon be new users of page pinning as well (see the PEBS
threads on kvm-devel).

I was thinking of a notifier scheme. Perhaps:

->begin_page_unpin(struct page *page)
- Remove any possible access to page.

->end_page_unpin(struct page *page)
- Reinstantiate any possible access to page.

For KVM:

->begin_page_unpin()
- Remove APIC-access page address from VMCS.
  or
- Remove spte translation to pinned page.

- Put vcpu in state where no VM-entries are allowed.

->end_page_unpin()
- Setup APIC-access page, ...
- Allow vcpu to VM-entry.


Because allocating the APIC access page from a distant NUMA node can
be a performance problem, I believe.


Yes, I understand this.



I'd be happy to know why notifiers are overkill.


The notifiers are not overkill. I have been thinking about a similar idea.

In fact, we have met the same pinned-pages problem in the AIO subsystem.
The aio ring pages are pinned in memory and cannot be migrated.

And I believe there are some other places in the kernel where pages are
pinned.



So I was thinking of a notifier framework to solve this problem.
But I can see some problems:

1. When getting a page, the migration thread doesn't know who is using this
   page and how. So we need a callback for each page, to be called before
   and after it is migrated.
   (A little over-thinking, maybe. Please see below.)

2. When migrating a shared page, one callback is not enough because the
   page could be shared by different subsystems, which may have different
   ways to pin and unpin the page.

3. Where should we put the callback? Only file-backed pages have one and
   only one address_space->address_space_operations->migratepage(). For
   anonymous pages, there is nowhere to put the callback.

   (A basic idea: define a global radix tree or hash table to manage the
   pinned pages and their callbacks. Mel Gorman mentioned this idea when
   handling the aio ring page problem. I'm not sure if this is acceptable.)


The idea above may be over-thinking. Actually, we can reuse the memory
hotplug notify chain if pinned-page migration is only needed by the memory
hotplug subsystem.

The basic idea is: each subsystem registers a callback on the memory hotplug
notify chain, and unpins and repins its pages before and after page
migration.

But I think we will finally meet this problem: how to remember/manage the
pinned pages in each subsystem.

For example, for kvm, the ept identity pagetable page and the apic access
page are pinned. Since these two pages' struct page pointer and user address
are remembered in kvm, they are easy to handle. If we pin a page and
remember it only in a stack variable, it could be difficult to handle.


For now, for kvm, I think notifiers can solve this problem.

Thanks for the advice. If you guys have any idea about this problem, please
share it with me.

Thanks.


Re: [RFC PATCH 1/1] Move two pinned pages to non-movable node in kvm.

2014-06-22 Thread Tang Chen

Hi Marcelo, Gleb,

Sorry for the delayed reply, and thanks for the advice.

On 06/21/2014 04:39 AM, Marcelo Tosatti wrote:

On Fri, Jun 20, 2014 at 05:31:46PM -0300, Marcelo Tosatti wrote:

IIRC your shadow page pinning patch series supports flushing of ptes
by mmu notifier by forcing MMU reload and, as a result, faulting in of
pinned pages during the next entry.  Your patch series does not pin pages
by elevating their page count.


No, but the PEBS series does, and it's required to stop swap-out
of the page.


Well actually no because of mmu notifiers.

Tang, can you implement mmu notifiers for the other breaker of
mem hotplug ?


I'll try the mmu notifier idea and send a patch soon.

Thanks.


Re: [RFC PATCH 1/1] Move two pinned pages to non-movable node in kvm.

2014-06-29 Thread Tang Chen

On 06/21/2014 04:39 AM, Marcelo Tosatti wrote:

On Fri, Jun 20, 2014 at 05:31:46PM -0300, Marcelo Tosatti wrote:

IIRC your shadow page pinning patch series supports flushing of ptes
by mmu notifier by forcing MMU reload and, as a result, faulting in of
pinned pages during the next entry.  Your patch series does not pin pages
by elevating their page count.


No, but the PEBS series does, and it's required to stop swap-out
of the page.


Well actually no because of mmu notifiers.

Tang, can you implement mmu notifiers for the other breaker of
mem hotplug ?


Hi Marcelo,

I made a patch to update the ept and apic pages when finding them in the
next ept violation, and I also updated the APIC_ACCESS_ADDR phys_addr.
The pages can be migrated, but the guest crashed.

How do I stop the guest from accessing the apic pages in the mmu_notifier
when page migration starts ?  Do I need to stop all the vcpus by setting the
vcpu state to KVM_MP_STATE_HALTED ?  If so, the vcpu will not be able to get
to the next ept violation.

So, may I write some specific value into APIC_ACCESS_ADDR to stop the guest
from accessing the apic page ?

Thanks.


Re: [RFC PATCH 1/1] Move two pinned pages to non-movable node in kvm.

2014-06-30 Thread Tang Chen

Hi Gleb,

On 06/30/2014 02:00 PM, Gleb Natapov wrote:

On Mon, Jun 30, 2014 at 09:45:32AM +0800, Tang Chen wrote:

On 06/21/2014 04:39 AM, Marcelo Tosatti wrote:

On Fri, Jun 20, 2014 at 05:31:46PM -0300, Marcelo Tosatti wrote:

IIRC your shadow page pinning patch series supports flushing of ptes
by mmu notifier by forcing MMU reload and, as a result, faulting in of
pinned pages during the next entry.  Your patch series does not pin pages
by elevating their page count.


No, but the PEBS series does, and it's required to stop swap-out
of the page.


Well actually no because of mmu notifiers.

Tang, can you implement mmu notifiers for the other breaker of
mem hotplug ?


Hi Marcelo,

I made a patch to update the ept and apic pages when finding them in the
next ept violation, and I also updated the APIC_ACCESS_ADDR phys_addr.
The pages can be migrated, but the guest crashed.

How does it crash?


It just stopped running. The guest system is dead.
I'll try to debug it and give some more info.

How do I stop the guest from accessing the apic pages in the mmu_notifier
when page migration starts ?  Do I need to stop all the vcpus by setting the
vcpu state to KVM_MP_STATE_HALTED ?  If so, the vcpu will not be able to get
to the next ept violation.

When the apic access page is unmapped from the ept pages by mmu notifiers, you
need to set its value in the VMCS to a physical address that will never be
mapped into guest memory, zero for instance. You can do it by introducing a
new KVM_REQ_ bit and setting the VMCS value during the vcpu's next vmentry. On
ept violation you need to update the VMCS pointer to the newly allocated
physical address; you can use the same KVM_REQ_ mechanism again.



So, may I write some specific value into APIC_ACCESS_ADDR to stop the guest
from accessing the apic page ?


Any phys address that will never be mapped into guest's memory should work.


Thanks for the advice. I'll try it.

Thanks.


[PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.

2014-07-02 Thread Tang Chen
The apic access page is pinned in memory, and as a result it cannot be
migrated/hot-removed.

Actually, it doesn't need to be pinned in memory.

This patch introduces a new vcpu request: KVM_REQ_MIGRATE_APIC. This request
is made in kvm_mmu_notifier_invalidate_page() when the page is unmapped from
the qemu user space, to reset the APIC_ACCESS_ADDR pointer in each online
vcpu to 0. It is also made when an ept violation happens, to reset
APIC_ACCESS_ADDR to the new page's phys_addr (host phys_addr).
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu.c  | 15 +++
 arch/x86/kvm/vmx.c  |  9 -
 arch/x86/kvm/x86.c  | 20 
 include/linux/kvm_host.h|  1 +
 virt/kvm/kvm_main.c | 15 +++
 6 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8771c0f..f104b87 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -575,6 +575,7 @@ struct kvm_arch {
 
unsigned int tss_addr;
struct page *apic_access_page;
+   bool apic_access_page_migrated;
 
gpa_t wall_clock;
 
@@ -739,6 +740,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c0d72f6..a655444 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3436,6 +3436,21 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
kvm_make_request(KVM_REQ_MIGRATE_EPT, vcpu);
}
 
+   if (gpa == VMX_APIC_ACCESS_PAGE_ADDR &&
+   vcpu->kvm->arch.apic_access_page_migrated) {
+   int i;
+
+   vcpu->kvm->arch.apic_access_page_migrated = false;
+
+   /*
+    * We need to update the APIC_ACCESS_ADDR pointer in the VMCS of
+    * all the online vcpus.
+    */
+   for (i = 0; i < atomic_read(&vcpu->kvm->online_vcpus); i++)
+   kvm_make_request(KVM_REQ_MIGRATE_APIC,
+vcpu->kvm->vcpus[i]);
+   }
+
spin_unlock(&vcpu->kvm->mmu_lock);
 
return r;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c336cb3..abc152f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3988,7 +3988,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;
 
-   page = gfn_to_page(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -7075,6 +7075,12 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   if (vm_need_virtualize_apic_accesses(kvm))
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8846,6 +8852,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a26524f..14e7174 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5943,6 +5943,24 @@ static void vcpu_migrated_page_update_ept(struct kvm_vcpu *vcpu)
}
 }
 
+static void vcpu_migrated_page_update_apic(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+
+   if (kvm->arch.apic_access_page_migrated) {
+   if (kvm->arch.apic_access_page)
+   kvm->arch.apic_access_page = pfn_to_page(0);
+   kvm_x86_ops->set_apic_access_page_addr(kvm, 0x0ull);
+   } else {
+   struct page *page;
+   page = gfn_to_page_no_pin(kvm,
+   VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
+   kvm->arch.apic_access_page = page;
+   kvm_x86_ops->set_apic_access_page_addr(kvm,
+  page_to_phys(page));
+   }
+}
+
 /*
  * Returns 1

[PATCH 2/4] kvm: Add macro VMX_APIC_ACCESS_PAGE_ADDR

2014-07-02 Thread Tang Chen
Define the guest phys_addr of the apic access page.
---
 arch/x86/include/asm/vmx.h | 2 +-
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 7 ---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 7004d21..c4672d1 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -422,7 +422,7 @@ enum vmcs_field {
 #define VMX_EPT_DIRTY_BIT  (1ull << 9)
 
 #define VMX_EPT_IDENTITY_PAGETABLE_ADDR0xfffbc000ul
-
+#define VMX_APIC_ACCESS_PAGE_ADDR  0xfee00000ull
 
 #define ASM_VMX_VMCLEAR_RAX   ".byte 0x66, 0x0f, 0xc7, 0x30"
 #define ASM_VMX_VMLAUNCH  ".byte 0x0f, 0x01, 0xc2"
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..22aa2ae 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = VMX_APIC_ACCESS_PAGE_ADDR |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&svm->vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332e..366b5b3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3982,13 +3982,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+   kvm_userspace_mem.guest_phys_addr = VMX_APIC_ACCESS_PAGE_ADDR;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4460,7 +4460,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
-   apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = VMX_APIC_ACCESS_PAGE_ADDR |
+MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&vmx->vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1



[PATCH 3/4] kvm, memory-hotplug: Update ept identity pagetable when it is migrated.

2014-07-02 Thread Tang Chen
The ept identity pagetable is pinned in memory, and as a result it cannot be
migrated/hot-removed.

But actually it doesn't need to be pinned in memory.

This patch introduces a new vcpu request, KVM_REQ_MIGRATE_EPT, to reset the
ept identity pagetable related variables. This request is made in
kvm_mmu_notifier_invalidate_page() when the page is unmapped from the qemu
user space, to reset kvm->arch.ept_identity_pagetable to NULL. It is also
made when an ept violation happens, to reset
kvm->arch.ept_identity_pagetable to the new page.
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.c  | 11 +++
 arch/x86/kvm/vmx.c  |  3 ++-
 arch/x86/kvm/x86.c  | 16 
 include/linux/kvm_host.h|  1 +
 virt/kvm/kvm_main.c |  6 ++
 6 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..8771c0f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -581,6 +581,7 @@ struct kvm_arch {
struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
+   bool ept_identity_pagetable_migrated;
 
unsigned long irq_sources_bitmap;
s64 kvmclock_offset;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..c0d72f6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3425,6 +3425,17 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
r = __direct_map(vcpu, gpa, write, map_writable,
 level, gfn, pfn, prefault);
+
+   /*
+    * Update the ept identity pagetable page and the apic access page if
+    * they are migrated.
+    */
+   if (gpa == vcpu->kvm->arch.ept_identity_map_addr &&
+   vcpu->kvm->arch.ept_identity_pagetable_migrated) {
+   vcpu->kvm->arch.ept_identity_pagetable_migrated = false;
+   kvm_make_request(KVM_REQ_MIGRATE_EPT, vcpu);
+   }
+
spin_unlock(&vcpu->kvm->mmu_lock);
 
return r;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 366b5b3..c336cb3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4018,7 +4018,8 @@ static int alloc_identity_pagetable(struct kvm *kvm)
if (r)
goto out;
 
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm,
+   kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..a26524f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5929,6 +5929,20 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void vcpu_migrated_page_update_ept(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+
+   if (kvm->arch.ept_identity_pagetable_migrated)
+   kvm->arch.ept_identity_pagetable = NULL;
+   else {
+   struct page *page;
+   page = gfn_to_page_no_pin(kvm,
+   kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
+   kvm->arch.ept_identity_pagetable = page;
+   }
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -5989,6 +6003,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+   if (kvm_check_request(KVM_REQ_MIGRATE_EPT, vcpu))
+   vcpu_migrated_page_update_ept(vcpu);
}
 
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7c58d9d..4b7e51a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -136,6 +136,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_GLOBAL_CLOCK_UPDATE 22
 #define KVM_REQ_ENABLE_IBS23
 #define KVM_REQ_DISABLE_IBS   24
+#define KVM_REQ_MIGRATE_EPT   25
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID   1
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6091849..d271e89 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -294,6 +294,12 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
if (need_tlb_flush)
kvm_flush_remote_tlbs(kvm);
 
+   if (address ==
+   gfn_to_hva(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT)) {
+   kvm->arch.ept_identity_pagetab

[PATCH 0/4] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-07-02 Thread Tang Chen
Hi Gleb, Marcelo,

Please help to review this patch-set.

NOTE: This patch-set doesn't work properly.


The ept identity pagetable and the apic access page in kvm are pinned in
memory. As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

This patch-set introduces two new vcpu requests: KVM_REQ_MIGRATE_EPT and
KVM_REQ_MIGRATE_APIC. These two requests are made when the two pages are
unmapped by the mmu_notifier, to reset the related variables to an unusable
value, and again when an ept violation happens, to set up the new pages.


[Known problem]
After this patch-set is applied, the two pages can be migrated/hot-removed.
But after migrating the apic access page, the guest died.

The host physical address of the apic access page is stored in the VMCS. I
reset it to 0 to stop the guest from accessing it when it is unmapped by
kvm_mmu_notifier_invalidate_page(), and reset it to the new page's host
physical address in tdp_page_fault(). But it seems that the guest accesses
the apic page directly by its host physical address.


Tang Chen (4):
  kvm: Add gfn_to_page_no_pin()
  kvm: Add macro VMX_APIC_ACCESS_PAGE_ADDR
  kvm, memory-hotplug: Update ept identity pagetable when it is
migrated.
  kvm, mem-hotplug: Update apic access page when it is migrated.

 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/include/asm/vmx.h  |  2 +-
 arch/x86/kvm/mmu.c  | 26 ++
 arch/x86/kvm/svm.c  |  3 ++-
 arch/x86/kvm/vmx.c  | 17 +
 arch/x86/kvm/x86.c  | 36 
 include/linux/kvm_host.h|  3 +++
 virt/kvm/kvm_main.c | 38 +-
 8 files changed, 121 insertions(+), 7 deletions(-)

-- 
1.8.3.1



[PATCH 1/4] kvm: Add gfn_to_page_no_pin()

2014-07-02 Thread Tang Chen
Used by the following patches.
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 17 -
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ec4e3bd..7c58d9d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -541,6 +541,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
int nr_pages);
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
 unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b6c01b..6091849 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1371,9 +1371,24 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 
return kvm_pfn_to_page(pfn);
 }
-
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn)
+{
+   struct page *page = gfn_to_page(kvm, gfn);
+
+   /*
+    * gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin
+    * the page in memory by calling GUP functions. This function unpins
+    * the page.
+    */
+   if (!is_error_page(page))
+   put_page(page);
+
+   return page;
+}
+EXPORT_SYMBOL_GPL(gfn_to_page_no_pin);
+
 void kvm_release_page_clean(struct page *page)
 {
WARN_ON(is_error_page(page));
-- 
1.8.3.1





Re: [PATCH 0/4] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-07-02 Thread Tang Chen

Hi Gleb,

On 07/02/2014 05:00 PM, Tang Chen wrote:

Hi Gleb, Marcelo,

Please help to review this patch-set.

NOTE: This patch-set doesn't work properly.


ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

This patch-set introduces two new vcpu requests: KVM_REQ_MIGRATE_EPT and 
KVM_REQ_MIGRATE_APIC.
These two requests are made when the two pages are migrated by the mmu_notifier,
to reset the related variables to an unusable value. They will also be made when
an ept violation happens, to set up the new pages.


[Known problem]
After this patch-set is applied, the two pages can be migrated/hot-removed.
But after migrating apic access page, the guest died.

The host physical address of apic access page is stored in VMCS. I reset
it to 0 to stop guest from accessing it when it is unmapped by
kvm_mmu_notifier_invalidate_page(). And reset it to new page's host physical
address in tdp_page_fault(). But it seems that guest will access apic page
directly by the host physical address.


Would you please give some advice about this problem ?

Thanks.


Re: [PATCH 2/4] kvm: Add macro VMX_APIC_ACCESS_PAGE_ADDR

2014-07-02 Thread Tang Chen

On 07/03/2014 12:24 AM, Gleb Natapov wrote:

On Wed, Jul 02, 2014 at 05:00:35PM +0800, Tang Chen wrote:

Define guest phys_addr of apic access page.
---
  arch/x86/include/asm/vmx.h | 2 +-
  arch/x86/kvm/svm.c | 3 ++-
  arch/x86/kvm/vmx.c | 7 ---
  3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 7004d21..c4672d1 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -422,7 +422,7 @@ enum vmcs_field {
  #define VMX_EPT_DIRTY_BIT (1ull<<  9)

  #define VMX_EPT_IDENTITY_PAGETABLE_ADDR   0xfffbc000ul
-
+#define VMX_APIC_ACCESS_PAGE_ADDR  0xfee00000ull


It has nothing to do with VMX and there is already a define for that:
APIC_DEFAULT_PHYS_BASE


OK, followed.



Re: [PATCH 3/4] kvm, memory-hotplug: Update ept identity pagetable when it is migrated.

2014-07-02 Thread Tang Chen

On 07/03/2014 12:34 AM, Gleb Natapov wrote:

On Wed, Jul 02, 2014 at 05:00:36PM +0800, Tang Chen wrote:

ept identity pagetable is pinned in memory, and as a result it cannot be
migrated/hot-removed.

But actually it doesn't need to be pinned in memory.

This patch introduces a new vcpu request: KVM_REQ_MIGRATE_EPT to reset ept
identity pagetable related variable. This request will be made when
kvm_mmu_notifier_invalidate_page() is called when the page is unmapped
from the qemu user space to reset kvm->arch.ept_identity_pagetable to NULL.
And will also be made when ept violation happens to reset
kvm->arch.ept_identity_pagetable to the new page.


kvm->arch.ept_identity_pagetable is never used as a page address, just as a
boolean null/!null to see if the identity pagetable is initialized. I do
not see why we would want to track its address at all. Changing it to bool
and assigning true during initialization should be enough.


OK, followed.


Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.

2014-07-03 Thread Tang Chen

Hi Gleb,

Thanks for the advices. Please see below.

On 07/03/2014 09:55 PM, Gleb Natapov wrote:
..

@@ -575,6 +575,7 @@ struct kvm_arch {

unsigned int tss_addr;
struct page *apic_access_page;
+   bool apic_access_page_migrated;

Better have two requests KVM_REQ_APIC_PAGE_MAP, KVM_REQ_APIC_PAGE_UNMAP IMO.



vcpu->requests is an unsigned long, so we can only have 64 requests. Isn't
adding two requests for the apic page and another similar two for the ept page
too many ? Not sure.



gpa_t wall_clock;

@@ -739,6 +740,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c0d72f6..a655444 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3436,6 +3436,21 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
kvm_make_request(KVM_REQ_MIGRATE_EPT, vcpu);
}

+   if (gpa == VMX_APIC_ACCESS_PAGE_ADDR &&
+   vcpu->kvm->arch.apic_access_page_migrated) {

Why check arch.apic_access_page_migrated here? Isn't it enough that the fault
is on the apic address?



True. It's enough. Followed.


+   int i;
+
+   vcpu->kvm->arch.apic_access_page_migrated = false;
+
+   /*
+* We need to update the APIC_ACCESS_ADDR pointer in each VMCS of
+* all the online vcpus.
+*/
+   for (i = 0; i < atomic_read(&vcpu->kvm->online_vcpus); i++)
+   kvm_make_request(KVM_REQ_MIGRATE_APIC,
+vcpu->kvm->vcpus[i]);

make_all_cpus_request(). You need to kick all vcpus from a guest mode.



OK, followed. But would you please explain more about this ? :)
Why do we need to kick all vcpus out of guest mode when making a request to all
vcpus ?



+   }
+
spin_unlock(&vcpu->kvm->mmu_lock);

return r;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c336cb3..abc152f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3988,7 +3988,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;

-   page = gfn_to_page(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -7075,6 +7075,12 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
  }

+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   if (vm_need_virtualize_apic_accesses(kvm))

This shouldn't even be called if the apic access page is not supported. Neither
the mmu_notifier path nor the tdp_page_fault path should ever see the 0xfee00000
address. BUG() is more appropriate here.



I don't quite understand. Why would calling this function here lead to a bug ?
(Sorry, I don't quite understand the internals of KVM. Please help.)




+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
  static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
  {
u16 status;
@@ -8846,6 +8852,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,

svm needs that too.



OK, will add one for svm.


.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a26524f..14e7174 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5943,6 +5943,24 @@ static void vcpu_migrated_page_update_ept(struct 
kvm_vcpu *vcpu)
}
  }

+static void vcpu_migrated_page_update_apic(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+
+   if (kvm->arch.apic_access_page_migrated) {
+   if (kvm->arch.apic_access_page)
+   kvm->arch.apic_access_page = pfn_to_page(0);

All vcpus will access apic_access_page without locking here. May be
set kvm->arch.apic_access_page to zero in mmu_notifier and here call
  kvm_x86_ops->set_apic_access_page_addr(kvm, kvm->arch.apic_access_page);



I'm a little confused. apic access page's phys_addr is stored in vmcs, and
I think it will be used by vcpu directly to access the physical page.
Setting kvm->arch.apic_access_page to zero will not stop it, right ?

Re: [PATCH 3/4] kvm, memory-hotplug: Update ept identity pagetable when it is migrated.

2014-07-03 Thread Tang Chen

Hi Gleb,

On 07/03/2014 12:34 AM, Gleb Natapov wrote:

On Wed, Jul 02, 2014 at 05:00:36PM +0800, Tang Chen wrote:

ept identity pagetable is pinned in memory, and as a result it cannot be
migrated/hot-removed.

But actually it doesn't need to be pinned in memory.

This patch introduces a new vcpu request: KVM_REQ_MIGRATE_EPT to reset ept
identity pagetable related variable. This request will be made when
kvm_mmu_notifier_invalidate_page() is called when the page is unmapped
from the qemu user space to reset kvm->arch.ept_identity_pagetable to NULL.
And will also be made when ept violation happens to reset
kvm->arch.ept_identity_pagetable to the new page.


kvm->arch.ept_identity_pagetable is never used as a page address, just as a
boolean null/!null to see if the identity pagetable is initialized. I do
not see why we would want to track its address at all. Changing it to bool
and assigning true during initialization should be enough.


We already have kvm->arch.ept_identity_pagetable_done to indicate if the 
ept
identity table is initialized. If we make kvm->arch.ept_identity_pagetable a
bool, do you mean we have:

kvm->arch.ept_identity_pagetable: indicate if ept page is allocated,
kvm->arch.ept_identity_pagetable_done: indicate if ept page is initialized ?

I don't think we need this. Shall we remove 
kvm->arch.ept_identity_pagetable ?


Thanks.



Re: [PATCH 0/4] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-07-03 Thread Tang Chen

Hi Gleb,

On 07/03/2014 02:04 PM, Gleb Natapov wrote:

On Thu, Jul 03, 2014 at 09:17:59AM +0800, Tang Chen wrote:

Hi Gleb,

On 07/02/2014 05:00 PM, Tang Chen wrote:

Hi Gleb, Marcelo,

Please help to review this patch-set.

NOTE: This patch-set doesn't work properly.


ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

This patch-set introduces two new vcpu requests: KVM_REQ_MIGRATE_EPT and 
KVM_REQ_MIGRATE_APIC.
These two requests are made when the two pages are migrated by the mmu_notifier,
to reset the related variables to an unusable value. They will also be made when
an ept violation happens, to set up the new pages.


[Known problem]
After this patch-set is applied, the two pages can be migrated/hot-removed.
But after migrating apic access page, the guest died.

The host physical address of apic access page is stored in VMCS. I reset
it to 0 to stop guest from accessing it when it is unmapped by
kvm_mmu_notifier_invalidate_page(). And reset it to new page's host physical
address in tdp_page_fault(). But it seems that guest will access apic page
directly by the host physical address.


Would you please give some advice about this problem ?


I haven't reviewed third patch yet, will do ASAP.



I printed some info in the kernel, and I found that mmu_notifier 
unmapped the
apic page and set VMCS APIC_ACCESS_ADDR to 0. But apic page ept 
violation didn't

happen. And the guest stopped running.

I think when the guest tried to access the apic page, no ept violation
happened.

And as a result, VMCS APIC_ACCESS_ADDR was not correctly set.

Referring to the Intel Software Developer's Manual Vol 3B, when accessing the
apic page using a translation with a large page (2M, 4M, 1G), an APIC VM exit
will not happen.


How do you think about this ?

Thanks. :)






Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.

2014-07-06 Thread Tang Chen

Hi Gleb,

Thanks for all the advices. Please see below.

On 07/04/2014 06:13 PM, Gleb Natapov wrote:
..

+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   if (vm_need_virtualize_apic_accesses(kvm))

This shouldn't even be called if the apic access page is not supported. Neither
the mmu_notifier path nor the tdp_page_fault path should ever see the 0xfee00000
address. BUG() is more appropriate here.



I don't quite understand. Why would calling this function here lead to a bug ?
(Sorry, I don't quite understand the internals of KVM. Please help.)

I didn't say that calling this function here will lead to a bug. I am saying 
that
if vm_need_virtualize_apic_accesses() is false this function should not be 
called
at all, so this check is redundant.



Do you mean when vm_need_virtualize_apic_accesses() is false, it should 
not be called ?

It has to be true ?

..

+   if (kvm->arch.apic_access_page_migrated) {
+   if (kvm->arch.apic_access_page)
+   kvm->arch.apic_access_page = pfn_to_page(0);

All vcpus will access apic_access_page without locking here. May be
set kvm->arch.apic_access_page to zero in mmu_notifier and here call
  kvm_x86_ops->set_apic_access_page_addr(kvm, kvm->arch.apic_access_page);



I'm a little confused. apic access page's phys_addr is stored in vmcs, and
I think it will be used by vcpu directly to access the physical page.
Setting kvm->arch.apic_access_page to zero will not stop it, right ?


Right, kvm->arch.apic_access_page is just a shadow value for whatever is written
in vmcs. After setting it all vcpus need to update their vmcs values.


I'm wondering what happens when apic page is migrated, but the vmcs is still
holding its old phys_addr before the vcpu request is handled.


The apic page should not be migrated until all vcpus are forced out of guest
mode and instructed to reload the new value on the next guest entry. That's
what we are trying to achieve here.



So, setting the VMCS APIC_ACCESS_ADDR pointer to zero will not stop a vcpu from
accessing the apic access page, right ?

If so, all the vcpus have to stop till the apic page finishes its migration and
the new value is set in each vcpu, which means we should stop the guest, right ?

Thanks.





Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.

2014-07-07 Thread Tang Chen

Hi Gleb,

The guest hang problem has been solved.

When mmu_notifier is called, I set VMCS APIC_ACCESS_ADDR to the new value
instead of setting it to 0. And only update kvm->arch.apic_access_page in
the next ept violation.

The guest is running well now.

I'll post the new patches tomorrow. ;)

Thanks.


On 07/04/2014 06:13 PM, Gleb Natapov wrote:

On Fri, Jul 04, 2014 at 10:18:25AM +0800, Tang Chen wrote:

Hi Gleb,

Thanks for the advices. Please see below.

On 07/03/2014 09:55 PM, Gleb Natapov wrote:
..

@@ -575,6 +575,7 @@ struct kvm_arch {

unsigned int tss_addr;
struct page *apic_access_page;
+   bool apic_access_page_migrated;

Better have two requests KVM_REQ_APIC_PAGE_MAP, KVM_REQ_APIC_PAGE_UNMAP IMO.



vcpu->requests is an unsigned long, so we can only have 64 requests. Isn't
adding two requests for the apic page and another similar two for the ept page
too many ? Not sure.


Lets not worry about that for now. May be it is enough to have only one
KVM_REQ_APIC_PAGE_RELOAD request set apic_access_page to a new value
before sending the request and reload whatever is in apic_access_page
during KVM_REQ_APIC_PAGE_RELOAD processing. Or we can even reload
apic_access_page as part of mmu reload and reuse KVM_REQ_MMU_RELOAD.



gpa_t wall_clock;

@@ -739,6 +740,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c0d72f6..a655444 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3436,6 +3436,21 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
kvm_make_request(KVM_REQ_MIGRATE_EPT, vcpu);
}

+   if (gpa == VMX_APIC_ACCESS_PAGE_ADDR &&
+   vcpu->kvm->arch.apic_access_page_migrated) {

Why check arch.apic_access_page_migrated here? Isn't it enough that the fault
is on the apic address?



True. It's enough. Followed.


+   int i;
+
+   vcpu->kvm->arch.apic_access_page_migrated = false;
+
+   /*
+* We need to update the APIC_ACCESS_ADDR pointer in each VMCS of
+* all the online vcpus.
+*/
+   for (i = 0; i < atomic_read(&vcpu->kvm->online_vcpus); i++)
+   kvm_make_request(KVM_REQ_MIGRATE_APIC,
+vcpu->kvm->vcpus[i]);

make_all_cpus_request(). You need to kick all vcpus from a guest mode.



OK, followed. But would you please explain more about this ? :)
Why do we need to kick all vcpus out of guest mode when making a request to all
vcpus ?

Because if you do not force other vcpus out of guest mode they will not reload
the apic_access_page value till the next vmexit; but since the EPT page table
now has a mapping for 0xfee00000, access to this address will not cause an EPT
violation and will not cause an apic exit either.




+   }
+
spin_unlock(&vcpu->kvm->mmu_lock);

return r;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c336cb3..abc152f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3988,7 +3988,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;

-   page = gfn_to_page(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -7075,6 +7075,12 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
  }

+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   if (vm_need_virtualize_apic_accesses(kvm))

This shouldn't even be called if the apic access page is not supported. Neither
the mmu_notifier path nor the tdp_page_fault path should ever see the 0xfee00000
address. BUG() is more appropriate here.



I don't quite understand. Why would calling this function here lead to a bug ?
(Sorry, I don't quite understand the internals of KVM. Please help.)

I didn't say that calling this function here will lead to a bug. I am saying 
that
if vm_need_virtualize_apic_accesses() is false this function should not be 
called
at all, so this check is redundant.






+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
  static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
  {
u16 status;
@@ -8846,6 +8852,7 @@ static struct kvm_x86_ops vmx_x86_o

Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.

2014-07-07 Thread Tang Chen

Hi Nadav,

Thanks for the reply, please see below.

On 07/07/2014 08:10 PM, Nadav Amit wrote:

On 7/7/14, 2:54 PM, Gleb Natapov wrote:

On Mon, Jul 07, 2014 at 02:42:27PM +0300, Nadav Amit wrote:

Tang,

Running some (unrelated) tests I see that KVM does not handle APIC base
relocation correctly. When the base is changed, kvm_lapic_set_base just
changes lapic->base_address without taking further action (i.e.,
modifying
the VMCS apic address in VMX).

This patch follows KVM bad behavior by using the constant
VMX_APIC_ACCESS_PAGE_ADDR instead of lapic->base_address.

There is no OS out there that relocates the APIC base (in fact it was not
always relocatable on real HW), so there is no point in complicating the
code to support it. In fact, the current APIC_ACCESS_ADDR handling relies on
the fact that all vcpus have the apic mapped at the same address.



Anyhow, I didn't see anything that would make my life (in fixing the
lapic
base issue) too difficult. Yet, feel free in making it more
"fix-friendly".


Why would you want to fix it?


If there is no general need, I will not send a fix. However, I think at the
very least a warning message should appear if the guest relocates the
APIC base.


Maybe I didn't understand your question correctly. If I'm wrong, please
tell me.


This patch does not relocate the APIC base in the guest, but in the host. The
host migrates the apic page somewhere else, and KVM updates the ept pagetable
to track it. In the guest, the apic base address (gpa) doesn't change.

Is this lapic->base_address a hpa ?

Is there anywhere I need to update in my patch ?

Thanks.



Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.

2014-07-08 Thread Tang Chen

Hi Wanpeng,

On 07/07/2014 06:35 PM, Wanpeng Li wrote:

On Wed, Jul 02, 2014 at 05:00:37PM +0800, Tang Chen wrote:

apic access page is pinned in memory, and as a result it cannot be
migrated/hot-removed.

Actually it doesn't need to be pinned in memory.

This patch introduces a new vcpu request: KVM_REQ_MIGRATE_EPT. This request


s/KVM_REQ_MIGRATE_EPT/KVM_REQ_MIGRATE_APIC


Thanks, will fix it in the next version.

Thanks.




will be made when kvm_mmu_notifier_invalidate_page() is called when the page
is unmapped from the qemu user space to reset APIC_ACCESS_ADDR pointer in
each online vcpu to 0. And will also be made when ept violation happens to
reset APIC_ACCESS_ADDR to the new page phys_addr (host phys_addr).
---
arch/x86/include/asm/kvm_host.h |  2 ++
arch/x86/kvm/mmu.c  | 15 +++
arch/x86/kvm/vmx.c  |  9 -
arch/x86/kvm/x86.c  | 20 
include/linux/kvm_host.h|  1 +
virt/kvm/kvm_main.c | 15 +++
6 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8771c0f..f104b87 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -575,6 +575,7 @@ struct kvm_arch {

unsigned int tss_addr;
struct page *apic_access_page;
+   bool apic_access_page_migrated;

gpa_t wall_clock;

@@ -739,6 +740,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c0d72f6..a655444 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3436,6 +3436,21 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
kvm_make_request(KVM_REQ_MIGRATE_EPT, vcpu);
}

+   if (gpa == VMX_APIC_ACCESS_PAGE_ADDR &&
+   vcpu->kvm->arch.apic_access_page_migrated) {
+   int i;
+
+   vcpu->kvm->arch.apic_access_page_migrated = false;
+
+   /*
+* We need to update the APIC_ACCESS_ADDR pointer in each VMCS of
+* all the online vcpus.
+*/
+   for (i = 0; i < atomic_read(&vcpu->kvm->online_vcpus); i++)
+   kvm_make_request(KVM_REQ_MIGRATE_APIC,
+vcpu->kvm->vcpus[i]);
+   }
+
spin_unlock(&vcpu->kvm->mmu_lock);

return r;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c336cb3..abc152f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3988,7 +3988,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;

-   page = gfn_to_page(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -7075,6 +7075,12 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
}

+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   if (vm_need_virtualize_apic_accesses(kvm))
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
{
u16 status;
@@ -8846,6 +8852,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a26524f..14e7174 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5943,6 +5943,24 @@ static void vcpu_migrated_page_update_ept(struct 
kvm_vcpu *vcpu)
}
}

+static void vcpu_migrated_page_update_apic(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+
+   if (kvm->arch.apic_access_page_migrated) {
+   if (kvm->arch.apic_access_page)
+   kvm->arch.apic_access_page = pfn_to_page(0);
+   kvm_x86_ops->set_apic_access_page_addr(kvm, 0x0ull);
+   } else {
+   struct page *page;
+   page = gfn_to_page_no_pin(kvm,
+   

[PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-07-08 Thread Tang Chen
apic access page is pinned in memory. As a result, it cannot be 
migrated/hot-removed.
Actually, it does not need to be pinned.

The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When
the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the
corresponding ept entry. This patch introduces a new vcpu request named
KVM_REQ_APIC_PAGE_RELOAD, makes this request to all the vcpus at this time,
and forces all the vcpus to exit guest mode and re-enter it only after they
have updated the VMCS APIC_ACCESS_ADDR pointer to the new apic access page
address and updated kvm->arch.apic_access_page to the new page.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.c  | 11 +++
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  8 +++-
 arch/x86/kvm/x86.c  | 14 ++
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 7 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 62f973e..9ce6bfd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -737,6 +737,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..551693d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3427,6 +3427,17 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
 level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
 
+   /*
+* apic access page could be migrated. When the guest tries to access
+* the apic access page, ept violation will occur, and we can use GUP
+* to find the new page.
+*
+* GUP will wait till the migrate entry is replaced with the new page.
+*/
+   if (gpa == APIC_DEFAULT_PHYS_BASE)
+   vcpu->kvm->arch.apic_access_page = gfn_to_page_no_pin(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+
return r;
 
 out_unlock:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 576b525..dc76f29 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3612,6 +3612,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4365,6 +4370,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5532ac8..f7c6313 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3992,7 +3992,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;
 
-   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -7073,6 +7073,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8842,6 +8847,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ffbe557..7080eda 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5929,6 +5929,18 @@ static void vcpu_scan_ioapic(struct kvm_vcpu

[PATCH v2 0/5] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-07-08 Thread Tang Chen
ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Simply do not pin it. When it is migrated, the guest will find the new page
at the next ept violation.

[For apic access page]
The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR pointer.
When the apic access page is migrated, we additionally update the VMCS
APIC_ACCESS_ADDR pointer for each vcpu.

Change log v1 -> v2:
1. Add [PATCH 4/5] to remove unnecessary kvm_arch->ept_identity_pagetable.
2. In [PATCH 3/5], only introduce KVM_REQ_APIC_PAGE_RELOAD request.
3. In [PATCH 3/5], add set_apic_access_page_addr() for svm.


Tang Chen (5):
  kvm: Add gfn_to_page_no_pin() to translate gfn to page without
pinning.
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm, mem-hotplug: Do not pin ept identity pagetable in memory.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm, mem-hotplug: Do not pin apic access page in memory.

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu.c  | 11 +++
 arch/x86/kvm/svm.c  |  9 -
 arch/x86/kvm/vmx.c  | 40 ++--
 arch/x86/kvm/x86.c  | 16 ++--
 include/linux/kvm_host.h|  3 +++
 virt/kvm/kvm_main.c | 29 -
 7 files changed, 87 insertions(+), 23 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/5] kvm, mem-hotplug: Do not pin ept identity pagetable in memory.

2014-07-08 Thread Tang Chen
The ept identity page is pinned in memory. As a result, it cannot be
migrated/hot-removed.

Actually, this page does not need to be pinned. When it is migrated,
mmu_notifier_invalidate_page() in try_to_unmap_one() will invalidate the ept
entry so that the guest can no longer access the page. On the next ept
violation, the new page will be found by the ept violation handler.

This patch simply unpins the ept identity page, since pinning it is unnecessary.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 3 ++-
 arch/x86/kvm/x86.c | 2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0e1117c..0918635e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4018,7 +4018,8 @@ static int alloc_identity_pagetable(struct kvm *kvm)
if (r)
goto out;
 
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm,
+   kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..ffbe557 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7177,8 +7177,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_free_vcpus(kvm);
if (kvm->arch.apic_access_page)
put_page(kvm->arch.apic_access_page);
-   if (kvm->arch.ept_identity_pagetable)
-   put_page(kvm->arch.ept_identity_pagetable);
kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 }
 
-- 
1.8.3.1



[PATCH v2 1/5] kvm: Add gfn_to_page_no_pin() to translate gfn to page without pinning.

2014-07-08 Thread Tang Chen
gfn_to_page() ultimately calls hva_to_pfn() to get the pfn, pinning the page
in memory via the GUP functions. The new gfn_to_page_no_pin() performs the
same lookup but then drops that reference, leaving the page unpinned.

This will be used by the following patches.

Signed-off-by: Tang Chen 
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 17 -
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ec4e3bd..7c58d9d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -541,6 +541,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, 
struct page **pages,
int nr_pages);
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
 unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b6c01b..6091849 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1371,9 +1371,24 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 
return kvm_pfn_to_page(pfn);
 }
-
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn)
+{
+   struct page *page = gfn_to_page(kvm, gfn);
+
+   /*
+* gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin
+* the page in memory by calling GUP functions. This function unpins
+* the page.
+*/
+   if (!is_error_page(page))
+   put_page(page);
+
+   return page;
+}
+EXPORT_SYMBOL_GPL(gfn_to_page_no_pin);
+
 void kvm_release_page_clean(struct page *page)
 {
WARN_ON(is_error_page(page));
-- 
1.8.3.1



[PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-07-08 Thread Tang Chen
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page, but
it is never used to actually refer to the page.

During vcpu initialization, it indicates two things:
1. whether the ept identity page has been allocated
2. whether a memory slot for the identity page has been initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell whether the
ept identity pagetable is initialized, so we can remove ept_identity_pagetable.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 25 +++--
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..62f973e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -578,7 +578,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0918635e..5532ac8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu 
*vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 static bool vmx_mpx_supported(void);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3921,21 +3922,21 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
-   }
if (likely(kvm->arch.ept_identity_pagetable_done))
return 1;
-   ret = 0;
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   mutex_lock(&kvm->slots_lock);
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3953,6 +3954,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+   mutex_unlock(&kvm->slots_lock);
return ret;
 }
 
@@ -4006,9 +4010,6 @@ static int alloc_identity_pagetable(struct kvm *kvm)
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
@@ -4025,9 +4026,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
goto out;
}
 
-   kvm->arch.ept_identity_pagetable = page;
 out:
-   mutex_unlock(&kvm->slots_lock);
return r;
 }
 
@@ -7583,8 +7582,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
-- 
1.8.3.1



[PATCH v2 2/5] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-07-08 Thread Tang Chen
We already have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also
the address of the apic access page. So use this macro instead of the magic
number.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..576b525 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&svm->vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332e..0e1117c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3982,13 +3982,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4460,7 +4460,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
-   apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&vmx->vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1



Re: [PATCH v2 0/5] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-07-08 Thread Tang Chen

On 07/08/2014 09:01 PM, Tang Chen wrote:

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

Change log v1 ->  v2:
1. Add [PATCH 4/5] to remove unnecessary kvm_arch->ept_identity_pagetable.
2. In [PATCH 3/5], only introduce KVM_REQ_APIC_PAGE_RELOAD request.


 s/[PATCH 3/5]/[PATCH 5/5]


3. In [PATCH 3/5], add set_apic_access_page_addr() for svm.


 s/[PATCH 3/5]/[PATCH 5/5]




Tang Chen (5):
   kvm: Add gfn_to_page_no_pin() to translate gfn to page without
 pinning.
   kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
   kvm, mem-hotplug: Do not pin ept identity pagetable in memory.
   kvm: Remove ept_identity_pagetable from struct kvm_arch.
   kvm, mem-hotplug: Do not pin apic access page in memory.

  arch/x86/include/asm/kvm_host.h |  2 +-
  arch/x86/kvm/mmu.c  | 11 +++
  arch/x86/kvm/svm.c  |  9 -
  arch/x86/kvm/vmx.c  | 40 ++--
  arch/x86/kvm/x86.c  | 16 ++--
  include/linux/kvm_host.h|  3 +++
  virt/kvm/kvm_main.c | 29 -
  7 files changed, 87 insertions(+), 23 deletions(-)




Re: [PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-07-08 Thread Tang Chen

On 07/08/2014 09:01 PM, Tang Chen wrote:

kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

Signed-off-by: Tang Chen
---
  arch/x86/include/asm/kvm_host.h |  1 -
  arch/x86/kvm/vmx.c  | 25 +++--
  2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..62f973e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -578,7 +578,6 @@ struct kvm_arch {

gpa_t wall_clock;

-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0918635e..5532ac8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu 
*vcpu);
  static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
  static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
  static bool vmx_mpx_supported(void);
+static int alloc_identity_pagetable(struct kvm *kvm);

  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3921,21 +3922,21 @@ out:

  static int init_rmode_identity_map(struct kvm *kvm)
  {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;

if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
-   }
if (likely(kvm->arch.ept_identity_pagetable_done))
return 1;
-   ret = 0;
identity_map_pfn = kvm->arch.ept_identity_map_addr>>  PAGE_SHIFT;
+
+   mutex_lock(&kvm->slots_lock);
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out;


s/goto out/goto out2

Will resend the patch soon.


+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r<  0)
@@ -3953,6 +3954,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
  out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+   mutex_unlock(&kvm->slots_lock);
return ret;
  }

@@ -4006,9 +4010,6 @@ static int alloc_identity_pagetable(struct kvm *kvm)
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;

-   mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
@@ -4025,9 +4026,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
goto out;
}

-   kvm->arch.ept_identity_pagetable = page;
  out:
-   mutex_unlock(&kvm->slots_lock);
return r;
  }

@@ -7583,8 +7582,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}



[RESEND PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-07-08 Thread Tang Chen
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 25 +++--
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..62f973e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -578,7 +578,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0918635e..fe2e5f4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu 
*vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 static bool vmx_mpx_supported(void);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3921,21 +3922,21 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
-   }
if (likely(kvm->arch.ept_identity_pagetable_done))
return 1;
-   ret = 0;
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   mutex_lock(&kvm->slots_lock);
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3953,6 +3954,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+   mutex_unlock(&kvm->slots_lock);
return ret;
 }
 
@@ -4006,9 +4010,6 @@ static int alloc_identity_pagetable(struct kvm *kvm)
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
@@ -4025,9 +4026,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
goto out;
}
 
-   kvm->arch.ept_identity_pagetable = page;
 out:
-   mutex_unlock(&kvm->slots_lock);
return r;
 }
 
@@ -7583,8 +7582,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
-- 
1.8.3.1



Re: [PATCH v2 0/5] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-07-10 Thread Tang Chen

hi Gleb, Marcelo, Nadav,

Would you please help to review these patches ?

Thanks. :)

On 07/08/2014 09:01 PM, Tang Chen wrote:

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

Change log v1 ->  v2:
1. Add [PATCH 4/5] to remove unnecessary kvm_arch->ept_identity_pagetable.
2. In [PATCH 3/5], only introduce KVM_REQ_APIC_PAGE_RELOAD request.
3. In [PATCH 3/5], add set_apic_access_page_addr() for svm.


Tang Chen (5):
   kvm: Add gfn_to_page_no_pin() to translate gfn to page without
 pinning.
   kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
   kvm, mem-hotplug: Do not pin ept identity pagetable in memory.
   kvm: Remove ept_identity_pagetable from struct kvm_arch.
   kvm, mem-hotplug: Do not pin apic access page in memory.

  arch/x86/include/asm/kvm_host.h |  2 +-
  arch/x86/kvm/mmu.c  | 11 +++
  arch/x86/kvm/svm.c  |  9 -
  arch/x86/kvm/vmx.c  | 40 ++--
  arch/x86/kvm/x86.c  | 16 ++--
  include/linux/kvm_host.h|  3 +++
  virt/kvm/kvm_main.c | 29 -
  7 files changed, 87 insertions(+), 23 deletions(-)




Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-07-14 Thread Tang Chen

Hi Gleb,

Thanks for the reply. Please see below.

On 07/12/2014 04:04 PM, Gleb Natapov wrote:

On Tue, Jul 08, 2014 at 09:01:32PM +0800, Tang Chen wrote:

apic access page is pinned in memory. As a result, it cannot be
migrated/hot-removed. Actually, it does not need to be pinned.

The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR pointer.
When the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate
the corresponding ept entry. This patch introduces a new vcpu request named
KVM_REQ_APIC_PAGE_RELOAD, makes this request to all the vcpus at that time, and
forces all the vcpus to exit the guest. Before re-entering the guest, each vcpu
updates its VMCS APIC_ACCESS_ADDR pointer to the new apic access page address,
and kvm->arch.apic_access_page is updated to the new page.


By default kvm Linux guest uses x2apic, so APIC_ACCESS_ADDR mechanism
is not used since no MMIO access to APIC is ever done. Have you tested
this with "-cpu modelname,-x2apic" qemu flag?


I used the following command line to test the patches:

# /usr/libexec/qemu-kvm -m 512M -hda /home/tangchen/xxx.img -enable-kvm -smp 2


And I think the guest used the APIC_ACCESS_ADDR mechanism, because the
previous patch-set had a problem that would show up when the apic page was
accessed, and it did happen.

I'll test this patch-set with the "-cpu modelname,-x2apic" flag.
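
For context, a command line of the shape Gleb suggests might look as follows.
This is a hedged sketch only: the image path is the one used above, and the
CPU model name (qemu64 here) is an assumption standing in for "modelname".

```shell
# Mask the x2apic CPUID bit so the guest falls back to xAPIC MMIO,
# exercising the APIC_ACCESS_ADDR path. Model name is illustrative.
/usr/libexec/qemu-kvm -m 512M -hda /home/tangchen/xxx.img \
    -enable-kvm -smp 2 -cpu qemu64,-x2apic
```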




Signed-off-by: Tang Chen
---
  arch/x86/include/asm/kvm_host.h |  1 +
  arch/x86/kvm/mmu.c  | 11 +++
  arch/x86/kvm/svm.c  |  6 ++
  arch/x86/kvm/vmx.c  |  8 +++-
  arch/x86/kvm/x86.c  | 14 ++
  include/linux/kvm_host.h|  2 ++
  virt/kvm/kvm_main.c | 12 
  7 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 62f973e..9ce6bfd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -737,6 +737,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..551693d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3427,6 +3427,17 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
 level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);

+   /*
+* apic access page could be migrated. When the guest tries to access
+* the apic access page, ept violation will occur, and we can use GUP
+* to find the new page.
+*
+* GUP will wait till the migrate entry be replaced with the new page.
+*/
+   if (gpa == APIC_DEFAULT_PHYS_BASE)
+   vcpu->kvm->arch.apic_access_page = gfn_to_page_no_pin(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE>>  PAGE_SHIFT);

Shouldn't you make KVM_REQ_APIC_PAGE_RELOAD request here?


I don't think we need to make a KVM_REQ_APIC_PAGE_RELOAD request here.

I made the request in kvm_mmu_notifier_invalidate_page(). The handler called
gfn_to_page_no_pin() to get the new page, which waits until the migration has
finished, and then updated the VMCS APIC_ACCESS_ADDR pointer. So when the
vcpus were forced to exit guest mode, they would wait until the VMCS
APIC_ACCESS_ADDR pointer had been updated.

As a result, we don't need to make the request here.





+
return r;

  out_unlock:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 576b525..dc76f29 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3612,6 +3612,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
  }

+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
  static int svm_vm_has_apicv(struct kvm *kvm)
  {
return 0;
@@ -4365,6 +4370,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5532ac8..f7c6313 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3992,7 +3992,7 @@ static int alloc_apic_

Re: [RESEND PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-07-14 Thread Tang Chen

Hi Gleb,

Please see below.

On 07/12/2014 03:44 PM, Gleb Natapov wrote:

On Wed, Jul 09, 2014 at 10:08:03AM +0800, Tang Chen wrote:

kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

Signed-off-by: Tang Chen
---
  arch/x86/include/asm/kvm_host.h |  1 -
  arch/x86/kvm/vmx.c  | 25 +++--
  2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..62f973e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -578,7 +578,6 @@ struct kvm_arch {

gpa_t wall_clock;

-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0918635e..fe2e5f4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu 
*vcpu);
  static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
  static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
  static bool vmx_mpx_supported(void);
+static int alloc_identity_pagetable(struct kvm *kvm);

  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3921,21 +3922,21 @@ out:

  static int init_rmode_identity_map(struct kvm *kvm)
  {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;

if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
-   }
if (likely(kvm->arch.ept_identity_pagetable_done))
return 1;
-   ret = 0;
identity_map_pfn = kvm->arch.ept_identity_map_addr>>  PAGE_SHIFT;
+
+   mutex_lock(&kvm->slots_lock);

Why move this out of alloc_identity_pagetable()?



Referring to the original code, I think mutex_lock(&kvm->slots_lock) is used
to protect kvm->arch.ept_identity_pagetable. If two or more threads try to
modify it at the same time, the mutex ensures that the identity table is only
allocated once.

Now we have dropped kvm->arch.ept_identity_pagetable, and we use
kvm->arch.ept_identity_pagetable_done to check whether the identity table is
allocated and initialized. So we should protect the memory slot operation in
alloc_identity_pagetable() and kvm->arch.ept_identity_pagetable_done with
this mutex.

Of course, I can see that the name "slots_lock" suggests it may be meant to
protect memory slot operations only, so maybe moving it out here is not
suitable.

If I'm wrong, please tell me.


+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
	if (r < 0)
@@ -3953,6 +3954,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
  out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+   mutex_unlock(&kvm->slots_lock);
return ret;
  }

@@ -4006,9 +4010,6 @@ static int alloc_identity_pagetable(struct kvm *kvm)
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;

-   mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
@@ -4025,9 +4026,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
goto out;
}

-   kvm->arch.ept_identity_pagetable = page;

I think we can drop gfn_to_page() above too now. Why would we need it?



Yes, will remove it in the next version.

Thanks.


  out:
-   mutex_unlock(&kvm->slots_lock);
return r;
  }

@@ -7583,8 +7582,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
--
1.8.3.1



--
Gleb.

Re: [RESEND PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-07-15 Thread Tang Chen

On 07/14/2014 10:27 PM, Gleb Natapov wrote:
..

if (likely(kvm->arch.ept_identity_pagetable_done))
return 1;
-   ret = 0;
	identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   mutex_lock(&kvm->slots_lock);

Why move this out of alloc_identity_pagetable()?



Referring to the original code, I think mutex_lock(&kvm->slots_lock) is used
to protect kvm->arch.ept_identity_pagetable. If two or more threads try to
modify it at the same time, the mutex ensures that the identity table is only
allocated once.

Now we have dropped kvm->arch.ept_identity_pagetable, and use
kvm->arch.ept_identity_pagetable_done to check whether the identity table is
allocated and initialized. So we should protect the memory slot operation in
alloc_identity_pagetable() and kvm->arch.ept_identity_pagetable_done with
this mutex.

Of course, I can see that the name "slots_lock" indicates that it may be used
to protect memory slot operations only. Maybe moving it out here is not
suitable.

If I'm wrong, please tell me.


No, you are right that besides memory slot creation slots_lock protects
checking of ept_identity_pagetable here, but after your patch
ept_identity_pagetable_done is tested outside of slots_lock, so the
allocation can happen twice, no?


Oh, yes. Will fix it in the next version.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-07-15 Thread Tang Chen

On 07/15/2014 07:52 PM, Jan Kiszka wrote:

On 2014-07-14 16:58, Gleb Natapov wrote:

..

+   struct page *page = gfn_to_page_no_pin(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);

If you do not use kvm->arch.apic_access_page to get the current address, why
not drop it entirely?



I should also update kvm->arch.apic_access_page here. It is used in other
places in kvm, so I don't think we should drop it. Will update the patch.

What other places? The only other place I see is in nested kvm code, and you
can call gfn_to_page_no_pin() there instead of using
kvm->arch.apic_access_page directly. But as far as I see, nested kvm code
cannot handle a change of the APIC_ACCESS_ADDR phys address. If
APIC_ACCESS_ADDR changes while a nested guest runs, the non-nested vmcs will
still have the old physical address. One way to fix that is to set
KVM_REQ_APIC_PAGE_RELOAD during nested exit.




Hi Jan,

Thanks for the reply. Please see below.


I cannot follow your concerns yet. Specifically, how should
APIC_ACCESS_ADDR (the VMCS field, right?) change while L2 is running? We
currently pin/unpin on L1->L2/L2->L1, respectively. Or what do you mean?



Currently, we pin the nested apic page in memory. As a result, the page
cannot be migrated/hot-removed, just like the apic page for the L1 vm.

What we want to do here is NOT pin the page in memory. When it is migrated,
we track the hpa of the page and update the VMCS field at the proper time.

Please refer to patch 5/5, I have done this for the L1 vm. The solution is:
1. When the apic page is migrated, invalidate the ept entry of the apic page
   in the mmu_notifier registered by kvm, which is
   kvm_mmu_notifier_invalidate_page() here.
2. Introduce a new vcpu request named KVM_REQ_APIC_PAGE_RELOAD, make this
   request to all the vcpus, and force all the vcpus to exit from guest mode.
3. In the request handler, use a GUP function to find the new apic page and
   update the VMCS field.

I think Gleb is trying to say that we have to face the same problem in the
nested vm.


Thanks.


Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-07-15 Thread Tang Chen

On 07/15/2014 08:09 PM, Gleb Natapov wrote:

On Tue, Jul 15, 2014 at 01:52:40PM +0200, Jan Kiszka wrote:

..


I cannot follow your concerns yet. Specifically, how should
APIC_ACCESS_ADDR (the VMCS field, right?) change while L2 is running? We
currently pin/unpin on L1->L2/L2->L1, respectively. Or what do you mean?


I am talking about this case:
  if (cpu_has_secondary_exec_ctrls()) {
  } else {
  exec_control |=
 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 vmcs_write64(APIC_ACCESS_ADDR,
 page_to_phys(vcpu->kvm->arch.apic_access_page));
  }
We do not pin here.



Hi Gleb,


7905 if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
..
7912 if (vmx->nested.apic_access_page) /* shouldn't happen */
7913 nested_release_page(vmx->nested.apic_access_page);
7914 vmx->nested.apic_access_page =
7915 nested_get_page(vcpu, vmcs12->apic_access_addr);


I thought you were talking about the problem here. We pin
vmcs12->apic_access_addr in memory. And I think we should do the same thing
to this page as to the L1 vm.

Right ?

..
7922 if (!vmx->nested.apic_access_page)
7923 exec_control &=
7924 ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
7925 else
7926 vmcs_write64(APIC_ACCESS_ADDR,
7927 page_to_phys(vmx->nested.apic_access_page));
7928 } else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
7929 exec_control |=
7930 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
7931 vmcs_write64(APIC_ACCESS_ADDR,
7932 page_to_phys(vcpu->kvm->arch.apic_access_page));
7933 }

And yes, we have the problem you said here. We can migrate the page while L2
vm is running.
So I think we should enforce L2 vm to exit to L1. Right ?

Thanks.




Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-07-15 Thread Tang Chen

On 07/15/2014 08:40 PM, Gleb Natapov wrote:

On Tue, Jul 15, 2014 at 08:28:22PM +0800, Tang Chen wrote:

On 07/15/2014 08:09 PM, Gleb Natapov wrote:

On Tue, Jul 15, 2014 at 01:52:40PM +0200, Jan Kiszka wrote:

..


I cannot follow your concerns yet. Specifically, how should
APIC_ACCESS_ADDR (the VMCS field, right?) change while L2 is running? We
currently pin/unpin on L1->L2/L2->L1, respectively. Or what do you mean?


I am talking about this case:
  if (cpu_has_secondary_exec_ctrls()) {
  } else {
  exec_control |=
 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 vmcs_write64(APIC_ACCESS_ADDR,
 page_to_phys(vcpu->kvm->arch.apic_access_page));
  }
We do not pin here.



Hi Gleb,


7905 if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
..
7912 if (vmx->nested.apic_access_page) /* shouldn't happen */
7913 nested_release_page(vmx->nested.apic_access_page);
7914 vmx->nested.apic_access_page =
7915 nested_get_page(vcpu, vmcs12->apic_access_addr);

I thought you were talking about the problem here. We pin
vmcs12->apic_access_addr
in memory. And I think we should do the same thing to this page as to L1 vm.
Right ?

Nested kvm pins a lot of pages, it will probably be not easy to handle all of
them, so for now I am concerned with the non-nested case only (but nested
should continue to work obviously, just pin pages like it does now).


True. I will work on it.

And also, when using PCI passthrough, kvm_pin_pages() also pins some pages.
This is also in my todo list.

But sorry, a little strange. I didn't find where vmcs12->apic_access_addr is
allocated or initialized... Would you please tell me ?





..
7922 if (!vmx->nested.apic_access_page)
7923 exec_control &=
7924 ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
7925 else
7926 vmcs_write64(APIC_ACCESS_ADDR,
7927 page_to_phys(vmx->nested.apic_access_page));
7928 } else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
7929 exec_control |=
7930 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
7931 vmcs_write64(APIC_ACCESS_ADDR,
7932 page_to_phys(vcpu->kvm->arch.apic_access_page));
7933 }

And yes, we have the problem you said here. We can migrate the page while L2
vm is running.
So I think we should enforce L2 vm to exit to L1. Right ?


We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.



apic pages for L2 and L1 are not the same page, right ?

I think, just like we are doing in patch 5/5, we cannot wait for the next
L2->L1 vmexit.
We should enforce a L2->L1 vmexit in mmu_notifier, just like
make_all_cpus_request() does.

Am I right ?

Thanks.



Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-07-17 Thread Tang Chen

Hi Gleb,

Sorry for the delay. Please see below.

On 07/15/2014 10:40 PM, Gleb Natapov wrote:
..



We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.



apic pages for L2 and L1 are not the same page, right ?


If L2 guest enable apic access page then they are different, otherwise
they are the same.


I think, just like we are doing in patch 5/5, we cannot wait for the next
L2->L1 vmexit.
We should enforce a L2->L1 vmexit in mmu_notifier, just like
make_all_cpus_request() does.

Am I right ?


I do not see why forcing APIC_ACCESS_ADDR reload during L2->L1 exit is not
enough.


Yes, you are right. APIC_ACCESS_ADDR reload should be done during L2->L1
vmexit.

I mean, before the page is moved to another place, we have to force a L2->L1
vmexit, not wait for the next L2->L1 vmexit. When the page is being moved,
if the L2 vm is still running, it could access the apic page directly, and
the vm may be corrupted.

In the mmu_notifier called before the page is moved, we have to force a
L2->L1 vmexit, and ask the vcpus to reload APIC_ACCESS_ADDR for the L2 vm.
The process will wait till the page migration is completed, update
APIC_ACCESS_ADDR, and re-enter guest mode.

Thanks.


Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-07-17 Thread Tang Chen

Hi Gleb,

On 07/15/2014 08:40 PM, Gleb Natapov wrote:
..


And yes, we have the problem you said here. We can migrate the page while L2
vm is running.
So I think we should enforce L2 vm to exit to L1. Right ?


We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.



Sorry, I think I don't quite understand the procedure you are talking about
here.


Referring to the code, I think we have three machines: L0(host), L1 and L2.
And we have two types of vmexit: L2->L1 and L2->L0.  Right ?

We are now talking about this case: L2 and L1 shares the apic page.

Using patch 5/5, when the apic page is migrated on L0, mmu_notifier will
notify L1, and update L1's VMCS. At this time, we are in L0, not L2. Why
cannot we update the L2's VMCS at the same time ?  Is it because we don't
know how many L2 vms there are in L1 ?

And, when will L2->L1 vmexit happen ?  When we enforce L1 to exit to L0 by
calling make_all_cpus_request(), is L2->L1 vmexit triggered automatically ?

Thanks.


Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-07-18 Thread Tang Chen

Hi Gleb,

On 07/17/2014 09:57 PM, Gleb Natapov wrote:

On Thu, Jul 17, 2014 at 09:34:20PM +0800, Tang Chen wrote:

Hi Gleb,

On 07/15/2014 08:40 PM, Gleb Natapov wrote:
..


And yes, we have the problem you said here. We can migrate the page while L2
vm is running.
So I think we should enforce L2 vm to exit to L1. Right ?


We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.



Sorry, I think I don't quite understand the procedure you are talking about
here.

Referring to the code, I think we have three machines: L0(host), L1 and L2.
And we have two types of vmexit: L2->L1 and L2->L0.  Right ?

We are now talking about this case: L2 and L1 shares the apic page.

Using patch 5/5, when apic page is migrated on L0, mmu_notifier will notify
L1,
and update L1's VMCS. At this time, we are in L0, not L2. Why cannot we

Using patch 5/5, when apic page is migrated on L0, mmu_notifier will notify
L1 or L2 VMCS depending on which one happens to be running right now.
If it is L1 then L2's VMCS will be updated during vmentry emulation,


OK, this is easy to understand.


if it is
L2 we need to request reload during vmexit emulation to make sure L1's VMCS is
updated.



I'm a little confused here. In patch 5/5, I called make_all_cpus_request()
to force all vcpus to exit to the host. If we are in L2, where will the
vcpus exit to ?  L1 or L0 ?

Thanks.



[PATCH v3 2/6] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-07-23 Thread Tang Chen
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the
address of the apic access page. So use this macro.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..576b525 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&svm->vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332e..0e1117c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3982,13 +3982,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4460,7 +4460,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
-   apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&vmx->vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1



[PATCH v3 4/6] kvm: Make init_rmode_identity_map() return 0 on success.

2014-07-23 Thread Tang Chen
In init_rmode_identity_map(), there are two variables indicating the return
value, r and ret, and it returns 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and makes
init_rmode_identity_map() return 0 on success, -errno on failure.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b8bf47d..6ab4f87 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3922,45 +3922,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(&kvm->slots_lock);
 
-   if (likely(kvm->arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(&kvm->srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r < 0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
&tmp, i * sizeof(tmp), sizeof(tmp));
-   if (r < 0)
+   if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(&kvm->srcu, idx);
-
 out2:
mutex_unlock(&kvm->slots_lock);
return ret;
@@ -7584,7 +7581,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   if (init_rmode_identity_map(kvm))
goto free_vmcs;
}
 
-- 
1.8.3.1



[PATCH v3 3/6] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-07-23 Thread Tang Chen
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, the ept identity pagetable page is pinned in memory.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch->ept_identity_pagetable is removed, the ept identity pagetable page
  is no longer pinned in memory, and it can be migrated/hot-removed.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 50 -
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..62f973e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -578,7 +578,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0e1117c..b8bf47d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 static bool vmx_mpx_supported(void);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3921,21 +3922,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
+
+   /* Protect kvm->arch.ept_identity_pagetable_done. */
+   mutex_lock(&kvm->slots_lock);
+
+   if (likely(kvm->arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm->arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3953,6 +3960,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+   mutex_unlock(&kvm->slots_lock);
return ret;
 }
 
@@ -4002,31 +4012,23 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /*
+* In init_rmode_identity_map(), kvm->arch.ept_identity_pagetable_done
+* is checked before calling this function and set to true after the
+* calling. The access to kvm->arch.ept_identity_pagetable_done should
+* be protected by kvm->slots_lock.
+*/
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
-   if (r)
-   goto out;
 
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
-
-   kvm->arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(&kvm->slots_lock);
return r;
 }
 
@@ -7582,8 +7584,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
 

[PATCH v3 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-07-23 Thread Tang Chen
ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

NOTE: Patch 1~5 are tested with -cpu xxx,-x2apic option, and they work well.
  Patch 6 is not tested yet, not sure if it is right.

Change log v2 -> v3:
1. Remove original [PATCH 3/6] since ept_identity_pagetable has been removed
   in new [PATCH 3/6].
2. In [PATCH 3/6], fix the problem that kvm->slots_lock does not protect 
   kvm->arch.ept_identity_pagetable_done checking.
3. In [PATCH 3/6], drop gfn_to_page() since ept_identity_pagetable has been 
   removed.
4. Add new [PATCH 4/6], remove redundant variable in init_rmode_identity_map(), 
   and make it return 0 on success.
5. In [PATCH 5/6], drop put_page(kvm->arch.apic_access_page) from x86.c .
6. In [PATCH 5/6], update kvm->arch.apic_access_page in 
vcpu_reload_apic_access_page().
7. Add new [PATCH 6/6], reload apic access page in L2->L1 exit.

Change log v1 -> v2:
1. Add [PATCH 4/5] to remove unnecessary kvm_arch->ept_identity_pagetable.
2. In [PATCH 5/5], only introduce KVM_REQ_APIC_PAGE_RELOAD request.
3. In [PATCH 5/5], add set_apic_access_page_addr() for svm.


Tang Chen (6):
  kvm: Add gfn_to_page_no_pin() to translate gfn to page without
pinning.
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Do not pin apic access page in memory.
  kvm, mem-hotplug: Reload L1's apic access page if it is migrated when
L2 is running.

 arch/x86/include/asm/kvm_host.h |   3 +-
 arch/x86/kvm/svm.c  |  15 +-
 arch/x86/kvm/vmx.c  | 108 +++-
 arch/x86/kvm/x86.c  |  22 ++--
 include/linux/kvm_host.h|   3 ++
 virt/kvm/kvm_main.c |  29 ++-
 6 files changed, 139 insertions(+), 41 deletions(-)

-- 
1.8.3.1



[PATCH v3 5/6] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-07-23 Thread Tang Chen
The apic access page is pinned in memory. As a result, it cannot be
migrated/hot-removed. Actually, it does not need to be pinned.

The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the page is migrated, kvm_mmu_notifier_invalidate_page() will
invalidate the corresponding ept entry. This patch introduces a new vcpu
request named KVM_REQ_APIC_PAGE_RELOAD, makes this request to all the vcpus
at this time, and forces all the vcpus to exit the guest and re-enter it
only after they have updated the VMCS APIC_ACCESS_ADDR pointer to the new
apic access page address and updated kvm->arch.apic_access_page to the new
page.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  8 +++-
 arch/x86/kvm/x86.c  | 17 +++--
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 6 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 62f973e..9ce6bfd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -737,6 +737,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 576b525..dc76f29 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3612,6 +3612,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4365,6 +4370,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6ab4f87..c123c1d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3995,7 +3995,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;
 
-   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -7072,6 +7072,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8841,6 +8846,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ffbe557..7541a66 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5929,6 +5929,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* apic access page could be migrated. When the page is being migrated,
+* GUP will wait till the migrate entry is replaced with the new pte
+* entry pointing to the new page.
+*/
+   vcpu->kvm->arch.apic_access_page = gfn_to_page_no_pin(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+   page_to_phys(vcpu->kvm->arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -5989,6 +6002,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vc

[PATCH v3 1/6] kvm: Add gfn_to_page_no_pin() to translate gfn to page without pinning.

2014-07-23 Thread Tang Chen
gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin the page
in memory by calling GUP functions. The new gfn_to_page_no_pin() drops that
reference again, so the page is not pinned.

Will be used by the following patches.

Signed-off-by: Tang Chen 
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 17 -
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ec4e3bd..7c58d9d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -541,6 +541,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
int nr_pages);
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
 unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b6c01b..6091849 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1371,9 +1371,24 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 
return kvm_pfn_to_page(pfn);
 }
-
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn)
+{
+   struct page *page = gfn_to_page(kvm, gfn);
+
+   /*
+* gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin
+* the page in memory by calling GUP functions. This function unpins
+* the page.
+*/
+   if (!is_error_page(page))
+   put_page(page);
+
+   return page;
+}
+EXPORT_SYMBOL_GPL(gfn_to_page_no_pin);
+
 void kvm_release_page_clean(struct page *page)
 {
WARN_ON(is_error_page(page));
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 6/6] kvm, mem-hotplug: Reload L1's apic access page if it is migrated when L2 is running.

2014-07-23 Thread Tang Chen
This patch only handles the "L1 and L2 vm share one apic access page" situation.

When the L1 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload the apic access
page physical address for all the vcpus' vmcs (which is done by patch 5/6).
When L2 is entered, L2's vmcs will be updated in prepare_vmcs02(), called by
nested_vm_run(). So we need to do nothing.

When the L2 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload the apic access
page physical address for all L2 vmcs. And this patch requests an apic access
page reload in the L2->L1 vmexit.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  | 37 +
 arch/x86/kvm/x86.c  |  3 +++
 4 files changed, 47 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9ce6bfd..613ee7f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -738,6 +738,7 @@ struct kvm_x86_ops {
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
+   void (*set_nested_apic_page_migrated)(struct kvm_vcpu *vcpu, bool set);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index dc76f29..87273ef 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3617,6 +3617,11 @@ static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
return;
 }
 
+static void svm_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4371,6 +4376,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
.set_apic_access_page_addr = svm_set_apic_access_page_addr,
+   .set_nested_apic_page_migrated = svm_set_nested_apic_page_migrated,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c123c1d..9231afe 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -379,6 +379,16 @@ struct nested_vmx {
 * we must keep them pinned while L2 runs.
 */
struct page *apic_access_page;
+   /*
+* L1's apic access page can be migrated. When L1 and L2 are sharing
+* the apic access page, after the page is migrated when L2 is running,
+* we have to reload it to L1 vmcs before we enter L1.
+*
+* When the shared apic access page is migrated in L1 mode, we don't
+* need to do anything else because we reload apic access page each
+* time when entering L2 in prepare_vmcs02().
+*/
+   bool apic_access_page_migrated;
u64 msr_ia32_feature_control;
 
struct hrtimer preemption_timer;
@@ -7077,6 +7087,12 @@ static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
vmcs_write64(APIC_ACCESS_ADDR, hpa);
 }
 
+static void vmx_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+   vmx->nested.apic_access_page_migrated = set;
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8727,6 +8743,26 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
}
 
/*
+* When shared (L1 & L2) apic access page is migrated during L2 is
+* running, mmu_notifier will force to reload the page's hpa for L2
+* vmcs. Need to reload it for L1 before entering L1.
+*/
+   if (vmx->nested.apic_access_page_migrated) {
+   /*
+* Do not call kvm_reload_apic_access_page() because we are now
+* in L2. We should not call make_all_cpus_request() to exit to
+* L0, otherwise we will reload for L2 vmcs again.
+*/
+   int i;
+
+   for (i = 0; i < atomic_read(&vcpu->kvm->online_vcpus); i++)
+   kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD,
+vcpu->kvm->vcpus[i]);
+
+   vmx->nested.apic_access_page_migrated = false;
+   }
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMR

[PATCH v4 1/6] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-08-27 Thread Tang Chen
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the
address of the apic access page. So use this macro instead of the literal.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&svm->vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
-   apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&vmx->vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1



[PATCH v4 4/6] kvm, mem-hotplug: Reload L1's apic access page on migration in vcpu_enter_guest().

2014-08-27 Thread Tang Chen
The apic access page is pinned in memory. As a result, it cannot be
migrated/hot-removed. Actually, it does not need to be pinned.

The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR pointer.
When the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate
the corresponding ept entry. This patch introduces a new vcpu request named
KVM_REQ_APIC_PAGE_RELOAD, makes this request to all the vcpus at that time, and
forces all the vcpus to exit the guest and not re-enter until they have updated
the VMCS APIC_ACCESS_ADDR pointer to the new apic access page address and
updated kvm->arch.apic_access_page to the new page.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 15 +++
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 63c4c3e..da6d55d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7093,6 +7093,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8910,6 +8915,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..96f4188 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* apic access page could be migrated. When the page is being migrated,
+* GUP will wait till the migrate entry is replaced with the new pte
+* entry pointing to the new page.
+*/
+   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+   page_to_phys(vcpu->kvm->arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+   if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+   vcpu_reload_apic_access_page(vcpu);
}
 
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index

[PATCH v4 6/6] kvm, mem-hotplug: Do not pin apic access page in memory.

2014-08-27 Thread Tang Chen
gfn_to_page() ultimately calls hva_to_pfn() to get the pfn, pinning the page
in memory via the GUP functions. gfn_to_page_no_pin() drops that pin again.

After this patch, the apic access page is able to be migrated.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c   |  2 +-
 arch/x86/kvm/x86.c   |  4 +---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 17 -
 4 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9035fd1..e0043a5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4022,7 +4022,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;
 
-   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 131b6e8..2edbeb9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5996,7 +5996,7 @@ static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 * GUP will wait till the migrate entry is replaced with the new pte
 * entry pointing to the new page.
 */
-   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+   vcpu->kvm->arch.apic_access_page = gfn_to_page_no_pin(vcpu->kvm,
APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
page_to_phys(vcpu->kvm->arch.apic_access_page));
@@ -7255,8 +7255,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm->arch.vpic);
kfree(kvm->arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm->arch.apic_access_page)
-   put_page(kvm->arch.apic_access_page);
kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8be076a..02cbcb1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -526,6 +526,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, 
struct page **pages,
int nr_pages);
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
 unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 784127e..19d90d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1386,9 +1386,24 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 
return kvm_pfn_to_page(pfn);
 }
-
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn)
+{
+   struct page *page = gfn_to_page(kvm, gfn);
+
+   /*
+* gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin
+* the page in memory by calling GUP functions. This function unpins
+* the page.
+*/
+   if (!is_error_page(page))
+   put_page(page);
+
+   return page;
+}
+EXPORT_SYMBOL_GPL(gfn_to_page_no_pin);
+
 void kvm_release_page_clean(struct page *page)
 {
WARN_ON(is_error_page(page));
-- 
1.8.3.1



[PATCH v4 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-08-27 Thread Tang Chen
ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

NOTE: Tested with the -cpu xxx,-x2apic option.
  But since a nested vm pins some other pages in memory, memory hot-remove
  will not work if the user runs a nested vm.

Change log v3 -> v4:
1. The original patch 6 is now patch 5. ( by Jan Kiszka  )
2. The original patch 1 is now patch 6 since we should unpin apic access page
   at the very last moment.


Tang Chen (6):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1's apic access page on migration in
vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Do not pin apic access page in memory.

 arch/x86/include/asm/kvm_host.h |   3 +-
 arch/x86/kvm/svm.c  |  15 +-
 arch/x86/kvm/vmx.c  | 103 +++-
 arch/x86/kvm/x86.c  |  22 +++--
 include/linux/kvm_host.h|   3 ++
 virt/kvm/kvm_main.c |  30 +++-
 6 files changed, 135 insertions(+), 41 deletions(-)

-- 
1.8.3.1



[PATCH v4 2/6] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-08-27 Thread Tang Chen
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page, but
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. whether the ept page is allocated
2. whether a memory slot for the identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell whether the
ept identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, the ept identity pagetable page is pinned in memory.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch->ept_identity_pagetable is removed, the ept identity pagetable page
  is no longer pinned in memory, and it can be migrated/hot-removed.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 50 -
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..953d529 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
+
+   /* Protect kvm->arch.ept_identity_pagetable_done. */
+   mutex_lock(&kvm->slots_lock);
+
+   if (likely(kvm->arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm->arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+   mutex_unlock(&kvm->slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,23 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /*
+* In init_rmode_identity_map(), kvm->arch.ept_identity_pagetable_done
+* is checked before calling this function and set to true after the
+* calling. The access to kvm->arch.ept_identity_pagetable_done should
+* be protected by kvm->slots_lock.
+*/
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
-   if (r)
-   goto out;
 
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
-
-   kvm->arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(&kvm->slots_lock);
return r;
 }
 
@@ -7643,8 +7645,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_m

[PATCH v4 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-08-27 Thread Tang Chen
This patch only handles the "L1 and L2 vm share one apic access page" situation.

When the L1 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload the apic access
page physical address for all the vcpus' vmcs (which is done by patch 5/6).
When L2 is entered, L2's vmcs will be updated in prepare_vmcs02(), called by
nested_vm_run(). So we need to do nothing.

When the L2 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload the apic access
page physical address for all L2 vmcs. And this patch requests an apic access
page reload in the L2->L1 vmexit.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  | 32 
 arch/x86/kvm/x86.c  |  3 +++
 virt/kvm/kvm_main.c |  1 +
 5 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..13fbb62 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -740,6 +740,7 @@ struct kvm_x86_ops {
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
+   void (*set_nested_apic_page_migrated)(struct kvm_vcpu *vcpu, bool set);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f2eacc4..da88646 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3624,6 +3624,11 @@ static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
return;
 }
 
+static void svm_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4379,6 +4384,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
.set_apic_access_page_addr = svm_set_apic_access_page_addr,
+   .set_nested_apic_page_migrated = svm_set_nested_apic_page_migrated,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index da6d55d..9035fd1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -379,6 +379,16 @@ struct nested_vmx {
 * we must keep them pinned while L2 runs.
 */
struct page *apic_access_page;
+   /*
+* L1's apic access page can be migrated. When L1 and L2 are sharing
+* the apic access page, after the page is migrated when L2 is running,
+* we have to reload it to L1 vmcs before we enter L1.
+*
+* When the shared apic access page is migrated in L1 mode, we don't
+* need to do anything else because we reload apic access page each
+* time when entering L2 in prepare_vmcs02().
+*/
+   bool apic_access_page_migrated;
u64 msr_ia32_feature_control;
 
struct hrtimer preemption_timer;
@@ -7098,6 +7108,12 @@ static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
vmcs_write64(APIC_ACCESS_ADDR, hpa);
 }
 
+static void vmx_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+   vmx->nested.apic_access_page_migrated = set;
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8796,6 +8812,21 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
}
 
/*
+* When shared (L1 & L2) apic access page is migrated during L2 is
+* running, mmu_notifier will force to reload the page's hpa for L2
+* vmcs. Need to reload it for L1 before entering L1.
+*/
+   if (vmx->nested.apic_access_page_migrated) {
+   /*
+* Do not call kvm_reload_apic_access_page() because we are now
+* in L2. We should not call make_all_cpus_request() to exit to
+* L0, otherwise we will reload for L2 vmcs again.
+*/
+   kvm_reload_apic_access_page(vcpu->kvm);
+   vmx->nested.apic_access_page_migrated = false;
+   }
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
@@ -8916,6 +8947,7 @@ static struct kvm_x86_ops vmx_x86_o

[PATCH v4 3/6] kvm: Make init_rmode_identity_map() return 0 on success.

2014-08-27 Thread Tang Chen
In init_rmode_identity_map(), there are two variables indicating the return
value, r and ret, and the function returns 0 on error and 1 on success. It
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r and makes init_rmode_identity_map()
return 0 on success and -errno on failure.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 953d529..63c4c3e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(&kvm->slots_lock);
 
-   if (likely(kvm->arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(&kvm->srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r < 0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
&tmp, i * sizeof(tmp), sizeof(tmp));
-   if (r < 0)
+   if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(&kvm->srcu, idx);
-
 out2:
mutex_unlock(&kvm->slots_lock);
return ret;
@@ -7645,7 +7642,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   if (init_rmode_identity_map(kvm))
goto free_vmcs;
}
 
-- 
1.8.3.1



[PATCH v5 5/7] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-10 Thread Tang Chen
This patch only handles the "L1 and L2 vm share one apic access page" situation.

When the L1 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload the apic access
page physical address for all the vcpus' vmcs (which is done by patch 5/6).
When L2 is entered, L2's vmcs will be updated in prepare_vmcs02(), called by
nested_vm_run(). So we need to do nothing.

When the L2 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload the apic access
page physical address for all L2 vmcs. And this patch requests an apic access
page reload in the L2->L1 vmexit.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c  | 7 +++
 virt/kvm/kvm_main.c | 1 +
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index da6d55d..e7704b2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8796,6 +8796,13 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
}
 
/*
+* When shared (L1 & L2) apic access page is migrated during L2 is
+* running, mmu_notifier will force to reload the page's hpa for L2
+* vmcs. Need to reload it for L1 before entering L1.
+*/
+   kvm_reload_apic_access_page(vcpu->kvm);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d8280de..784127e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -214,6 +214,7 @@ void kvm_reload_apic_access_page(struct kvm *kvm)
 {
make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
 }
+EXPORT_SYMBOL_GPL(kvm_reload_apic_access_page);
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 {
-- 
1.8.3.1



[PATCH v5 6/7] kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

2014-09-10 Thread Tang Chen
To make the apic access page migratable, we do not pin it in memory now.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpus would access
kvm_arch->apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch->apic_access_page anymore. Since
the apic access page is not pinned in memory now, we can remove
kvm_arch->apic_access_page. When we need to write its physical address
into a vmcs, use gfn_to_page() to get its page struct, which will also
pin it, and unpin it afterwards.

Suggested-by: Gleb Natapov 
Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 32 +---
 arch/x86/kvm/x86.c  | 15 +--
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..70f0d2d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e7704b2..058c373 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.apic_access_page)
+   if (kvm->arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm->arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm->arch.apic_access_page_done = true;
 out:
mutex_unlock(&kvm->slots_lock);
return r;
@@ -4536,9 +4541,16 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_write32(TPR_THRESHOLD, 0);
}
 
-   if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-
page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
+   if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
+   struct page *page = gfn_to_page(vmx->vcpu.kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page));
+   /*
+* Do not pin apic access page in memory so that memory
+* hotplug process is able to migrate it.
+*/
+   put_page(page);
+   }
 
if (vmx_vm_has_apicv(vcpu->kvm))
memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
@@ -7994,10 +8006,16 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
struct vmcs12 *vmcs12)
vmcs_write64(APIC_ACCESS_ADDR,
  page_to_phys(vmx->nested.apic_access_page));
} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
+   struct page *page = gfn_to_page(vmx->vcpu.kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page));
+   /*
+* Do not pin apic access page in memory so that memory
+* hotplug process is able to migrate it.
+*/
+   put_page(page);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 96f4188..6da0b93 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,15 +5991,20 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page;
+
/*
 * apic access page could be migrated. When the page is being migrated,
 * GUP will wait till the migrate entry is replaced with the new pte
 * entry pointing to the new page.
 */
-   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
-   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-   kvm_x86_ops->set_apic_access_page_addr

[PATCH v5 4/7] kvm, mem-hotplug: Reload L1's apic access page on migration in vcpu_enter_guest().

2014-09-10 Thread Tang Chen
The apic access page is pinned in memory. As a result, it cannot be
migrated/hot-removed. Actually, it does not need to be pinned.

The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the page is migrated, kvm_mmu_notifier_invalidate_page()
will invalidate the corresponding ept entry. This patch introduces a new
vcpu request named KVM_REQ_APIC_PAGE_RELOAD, makes this request to all
the vcpus at that time, and forces all the vcpus to exit the guest and
re-enter it once they have updated the VMCS APIC_ACCESS_ADDR pointer to
the new apic access page address and updated kvm->arch.apic_access_page
to the new page.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 15 +++
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 63c4c3e..da6d55d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7093,6 +7093,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8910,6 +8915,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..96f4188 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* apic access page could be migrated. When the page is being migrated,
+* GUP will wait till the migrate entry is replaced with the new pte
+* entry pointing to the new page.
+*/
+   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+   page_to_phys(vcpu->kvm->arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+   if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+   vcpu_reload_apic_access_page(vcpu);
}
 
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index

[PATCH v5 3/7] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-10 Thread Tang Chen
In init_rmode_identity_map(), there are two variables indicating the
return value, r and ret, and the function returns 0 on error, 1 on
success. It is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and makes
init_rmode_identity_map() return 0 on success, -errno on failure.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 953d529..63c4c3e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(&kvm->slots_lock);
 
-   if (likely(kvm->arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(&kvm->srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r < 0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
&tmp, i * sizeof(tmp), sizeof(tmp));
-   if (r < 0)
+   if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(&kvm->srcu, idx);
-
 out2:
mutex_unlock(&kvm->slots_lock);
return ret;
@@ -7645,7 +7642,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   if (init_rmode_identity_map(kvm))
goto free_vmcs;
}
 
-- 
1.8.3.1



[PATCH v5 7/7] kvm, mem-hotplug: Unpin and remove nested_vmx->apic_access_page.

2014-09-10 Thread Tang Chen
Just like we removed kvm_arch->apic_access_page, nested_vmx->apic_access_page
becomes useless for the same reason. This patch removes
nested_vmx->apic_access_page, and instead uses gfn_to_page() to pin the
page in memory when we need it, unpinning it afterwards.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 058c373..4aa73cb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -374,11 +374,6 @@ struct nested_vmx {
u64 vmcs01_tsc_offset;
/* L2 must run next, and mustn't decide to exit to L1. */
bool nested_run_pending;
-   /*
-* Guest pages referred to in vmcs02 with host-physical pointers, so
-* we must keep them pinned while L2 runs.
-*/
-   struct page *apic_access_page;
u64 msr_ia32_feature_control;
 
struct hrtimer preemption_timer;
@@ -6154,11 +6149,6 @@ static void free_nested(struct vcpu_vmx *vmx)
nested_release_vmcs12(vmx);
if (enable_shadow_vmcs)
free_vmcs(vmx->nested.current_shadow_vmcs);
-   /* Unpin physical memory we referred to in current vmcs02 */
-   if (vmx->nested.apic_access_page) {
-   nested_release_page(vmx->nested.apic_access_page);
-   vmx->nested.apic_access_page = 0;
-   }
 
nested_free_all_saved_vmcss(vmx);
 }
@@ -7983,28 +7973,31 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
struct vmcs12 *vmcs12)
exec_control |= vmcs12->secondary_vm_exec_control;
 
if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
+   struct page *page;
/*
 * Translate L1 physical address to host physical
 * address for vmcs02. Keep the page pinned, so this
 * physical address remains valid. We keep a reference
 * to it so we can release it later.
 */
-   if (vmx->nested.apic_access_page) /* shouldn't happen */
-   
nested_release_page(vmx->nested.apic_access_page);
-   vmx->nested.apic_access_page =
-   nested_get_page(vcpu, vmcs12->apic_access_addr);
+   page = nested_get_page(vcpu, vmcs12->apic_access_addr);
/*
 * If translation failed, no matter: This feature asks
 * to exit when accessing the given address, and if it
 * can never be accessed, this feature won't do
 * anything anyway.
 */
-   if (!vmx->nested.apic_access_page)
+   if (!page)
exec_control &=
  ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
else
vmcs_write64(APIC_ACCESS_ADDR,
- page_to_phys(vmx->nested.apic_access_page));
+page_to_phys(page));
+   /*
+* Do not pin nested vm's apic access page in memory so
+* that memory hotplug process is able to migrate it.
+*/
+   put_page(page);
} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
struct page *page = gfn_to_page(vmx->vcpu.kvm,
APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
@@ -8807,12 +8800,6 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
/* This is needed for same reason as it was needed in prepare_vmcs02 */
vmx->host_rsp = 0;
 
-   /* Unpin physical memory we referred to in vmcs02 */
-   if (vmx->nested.apic_access_page) {
-   nested_release_page(vmx->nested.apic_access_page);
-   vmx->nested.apic_access_page = 0;
-   }
-
/*
 * Do not call kvm_reload_apic_access_page() because we are now
 * running, mmu_notifier will force to reload the page's hpa for L2
-- 
1.8.3.1



[PATCH v5 2/7] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-10 Thread Tang Chen
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page,
but it is never actually used to refer to the page.

In vcpu initialization, it indicates two things:
1. whether the ept page is allocated
2. whether a memory slot for the identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell whether
the ept identity pagetable is initialized, so we can remove
ept_identity_pagetable.

NOTE: In the original code, the ept identity pagetable page is pinned in
  memory. As a result, it cannot be migrated/hot-removed. After this
  patch, since kvm_arch->ept_identity_pagetable is removed, the ept
  identity pagetable page is no longer pinned in memory, and it can be
  migrated/hot-removed.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 50 -
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..953d529 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
+
+   /* Protect kvm->arch.ept_identity_pagetable_done. */
+   mutex_lock(&kvm->slots_lock);
+
+   if (likely(kvm->arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm->arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+   mutex_unlock(&kvm->slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,23 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /*
+* In init_rmode_identity_map(), kvm->arch.ept_identity_pagetable_done
+* is checked before calling this function and set to true after the
+* calling. The access to kvm->arch.ept_identity_pagetable_done should
+* be protected by kvm->slots_lock.
+*/
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
-   if (r)
-   goto out;
 
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
-
-   kvm->arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(&kvm->slots_lock);
return r;
 }
 
@@ -7643,8 +7645,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
 

[PATCH v5 0/7] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-10 Thread Tang Chen
The ept identity pagetable and apic access page in kvm are pinned in
memory. As a result, they cannot be migrated/hot-removed.

But actually they do not need to be pinned.

[For the ept identity page]
Just do not pin it. When it is migrated, the guest will find the new
page on the next ept violation.

[For the apic access page]
The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the apic access page is migrated, we additionally update
the VMCS APIC_ACCESS_ADDR pointer for each vcpu.

NOTE: Tested with the -cpu xxx,-x2apic option.
  But since a nested vm pins some other pages in memory, memory
  hot-remove will not work if the user runs a nested vm.

Change log v4 -> v5:
1. Patch 5/7: Call kvm_reload_apic_access_page() unconditionally in 
nested_vmx_vmexit().
   (From Gleb Natapov )
2. Patch 6/7: Remove kvm_arch->apic_access_page. (From Gleb Natapov 
)
3. Patch 7/7: Remove nested_vmx->apic_access_page.

Tang Chen (7):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1's apic access page on migration in
vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.
  kvm, mem-hotplug: Unpin and remove nested_vmx->apic_access_page.

 arch/x86/include/asm/kvm_host.h |   4 +-
 arch/x86/kvm/svm.c  |   9 ++-
 arch/x86/kvm/vmx.c  | 139 ++--
 arch/x86/kvm/x86.c  |  24 +--
 include/linux/kvm_host.h|   2 +
 virt/kvm/kvm_main.c |  13 
 6 files changed, 122 insertions(+), 69 deletions(-)

-- 
1.8.3.1



[PATCH v5 1/7] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-10 Thread Tang Chen
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the
address of the apic access page. So use this macro.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&svm->vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
-   apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&vmx->vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1



[PATCH v6 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-16 Thread Tang Chen
This patch only handles the situation where L1 and L2 vms share one apic
access page.

When the L1 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0 and reload the apic
access page physical address for all the vcpus' vmcs (which is done by
patch 5/6). And when it enters the L2 vm, L2's vmcs will be updated in
prepare_vmcs02(), called by nested_vm_run(). So we need to do nothing.

When the L2 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0 and reload the apic
access page physical address for all L2 vmcs. And this patch requests an
apic access page reload in the L2->L1 vmexit.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx.c  | 6 ++
 arch/x86/kvm/x86.c  | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..92b3e72 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1046,6 +1046,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a1a9797..d0d5981 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8795,6 +8795,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
}
 
/*
+* We are now running in L2, mmu_notifier will force to reload the
+* page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
+*/
+   kvm_vcpu_reload_apic_access_page(vcpu);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 27c3d30..3f458b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
/*
 * apic access page could be migrated. When the page is being migrated,
@@ -6001,6 +6001,7 @@ static void kvm_vcpu_reload_apic_access_page(struct 
kvm_vcpu *vcpu)
kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
page_to_phys(vcpu->kvm->arch.apic_access_page));
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
-- 
1.8.3.1



[PATCH v6 6/6] kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

2014-09-16 Thread Tang Chen
To make the apic access page migratable, we no longer pin it in memory.
When it is migrated, we should reload its physical address into all
vmcses. But when we tried to do this, all vcpus would access
kvm_arch->apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch->apic_access_page anymore. Since the
apic access page is no longer pinned in memory, we can remove
kvm_arch->apic_access_page. Whenever we need to write its physical
address into a vmcs, use gfn_to_page() to get its page struct, which
will also pin it, and unpin it afterwards.

Suggested-by: Gleb Natapov 
Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 15 +--
 arch/x86/kvm/x86.c  | 15 +--
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 92b3e72..9daf754 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d0d5981..61f3854 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.apic_access_page)
+   if (kvm->arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm->arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm->arch.apic_access_page_done = true;
 out:
mutex_unlock(&kvm->slots_lock);
return r;
@@ -4534,8 +4539,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
}
 
if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-
page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
 
if (vmx_vm_has_apicv(vcpu->kvm))
memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
@@ -7995,8 +7999,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3f458b2..9094e13 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,15 +5991,20 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page;
+
/*
 * apic access page could be migrated. When the page is being migrated,
 * GUP will wait till the migrate entry is replaced with the new pte
 * entry pointing to the new page.
 */
-   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
-   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm, page_to_phys(page));
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
@@ -7253,8 +7258,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm->arch.vpic);
kfree(kvm->arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm->arch.apic_access_page)
-   put_page(kvm->arch.apic_access_page);
kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 }
 
-- 
1.8.3.1



[PATCH v6 4/6] kvm, mem-hotplug: Reload L1's apic access page on migration in vcpu_enter_guest().

2014-09-16 Thread Tang Chen
The apic access page is pinned in memory. As a result, it cannot be
migrated/hot-removed. Actually, it does not need to be pinned.

The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the page is migrated, kvm_mmu_notifier_invalidate_page()
will invalidate the corresponding ept entry. This patch introduces a new
vcpu request named KVM_REQ_APIC_PAGE_RELOAD, makes this request to all
the vcpus at that time, and forces all the vcpus to exit the guest and
re-enter it once they have updated the VMCS APIC_ACCESS_ADDR pointer to
the new apic access page address and updated kvm->arch.apic_access_page
to the new page.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 15 +++
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72a0470..a1a9797 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7090,6 +7090,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8909,6 +8914,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..27c3d30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* apic access page could be migrated. When the page is being migrated,
+* GUP will wait till the migrate entry is replaced with the new pte
+* entry pointing to the new page.
+*/
+   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+   page_to_phys(vcpu->kvm->arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+   if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h

[PATCH v6 3/6] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-16 Thread Tang Chen
In init_rmode_identity_map(), there are two variables indicating the return
value, r and ret, and the function returns 0 on error and 1 on success. The
function is only called by vmx_create_vcpu(), so r is redundant.

This patch removes the redundant variable r, and makes
init_rmode_identity_map() return 0 on success and -errno on failure.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4fb84ad..72a0470 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(&kvm->slots_lock);
 
-   if (likely(kvm->arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(&kvm->srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r < 0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
&tmp, i * sizeof(tmp), sizeof(tmp));
-   if (r < 0)
+   if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(&kvm->srcu, idx);
-
 out2:
mutex_unlock(&kvm->slots_lock);
return ret;
@@ -7604,11 +7601,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   /* Set err to -ENOMEM to handle memory allocation error. */
+   err = -ENOMEM;
+
vmx->guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx->guest_msrs[0])
 > PAGE_SIZE);
 
-   err = -ENOMEM;
if (!vmx->guest_msrs) {
goto uninit_vcpu;
}
@@ -7641,8 +7640,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
if (!kvm->arch.ept_identity_map_addr)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
-   err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   err = init_rmode_identity_map(kvm);
+   if (err < 0)
goto free_vmcs;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-16 Thread Tang Chen
ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, the guest will be able to find the
new page at the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v5 -> v6:
1. Patch 1/6 has been applied by Paolo Bonzini, just resend it.
2. Simplify comment in alloc_identity_pagetable() and add a BUG_ON() in patch 2/6.
3. Move err initialization forward in patch 3/6.
4. Rename vcpu_reload_apic_access_page() to kvm_vcpu_reload_apic_access_page()
   and use it instead of kvm_reload_apic_access_page() in nested_vmx_vmexit()
   in patch 5/6.
5. Reuse kvm_vcpu_reload_apic_access_page() in prepare_vmcs02() and
   vmx_vcpu_reset() in patch 6/6.
6. Remove original patch 7 since we are not able to handle the situation in
   nested vm.

Tang Chen (6):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1's apic access page on migration in
vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

 arch/x86/include/asm/kvm_host.h |  5 ++-
 arch/x86/kvm/svm.c  |  9 +++-
 arch/x86/kvm/vmx.c  | 95 +++--
 arch/x86/kvm/x86.c  | 25 +--
 include/linux/kvm_host.h|  2 +
 virt/kvm/kvm_main.c | 12 ++
 6 files changed, 99 insertions(+), 49 deletions(-)

-- 
1.8.3.1



[PATCH v6 2/6] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-16 Thread Tang Chen
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memory.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 47 +++--
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..4fb84ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
+
+   /* Protect kvm->arch.ept_identity_pagetable_done. */
+   mutex_lock(&kvm->slots_lock);
+
+   if (likely(kvm->arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm->arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+   mutex_unlock(&kvm->slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,20 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /* Called with kvm->slots_lock held. */
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
+   BUG_ON(kvm->arch.ept_identity_pagetable_done);
+
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
-   if (r)
-   goto out;
-
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
 
-   kvm->arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(&kvm->slots_lock);
return r;
 }
 
@@ -7643,8 +7642,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644
--- a/arch/x86/kvm/x86.c
++

[PATCH v6 1/6] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-16 Thread Tang Chen
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the
address of the apic access page. So use this macro.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&svm->vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
-   apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&vmx->vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1



Re: [PATCH v6 4/6] kvm, mem-hotplug: Reload L1's apic access page on migration in vcpu_enter_guest().

2014-09-17 Thread Tang Chen


On 09/16/2014 07:24 PM, Paolo Bonzini wrote:

On 16/09/2014 12:42, Tang Chen wrote:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..0df82c1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -210,6 +210,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
  }
  
+void kvm_reload_apic_access_page(struct kvm *kvm)

+{
+   make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
+}
+
  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
  {
struct page *page;
@@ -294,6 +299,13 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
if (need_tlb_flush)
kvm_flush_remote_tlbs(kvm);
  
+	/*

+* The physical address of apic access page is stored in VMCS.
+* Update it when it becomes invalid.
+*/
+   if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
+   kvm_reload_apic_access_page(kvm);

This cannot be in the generic code.  It is architecture-specific.


Yes.


Please add a new function kvm_arch_mmu_notifier_invalidate_page, and
call it outside the mmu_lock.


Then I think we need a macro to control the calling of this arch function,
since other architectures do not have it.



kvm_reload_apic_access_page need not be in virt/kvm/kvm_main.c, either.


Since kvm_reload_apic_access_page() only calls make_all_cpus_request(),
and make_all_cpus_request() is static, I'd like to make it non-static,
rename it to kvm_make_all_cpus_request(), and call it directly in
kvm_arch_mmu_notifier_invalidate_page().

That way we don't need kvm_reload_apic_access_page() at all.

Thanks.



[PATCH v7 2/9] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-20 Thread Tang Chen
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memory.
  As a result, it cannot be migrated/hot-removed. After this patch, since
  kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
  is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 47 +++--
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..4fb84ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
+
+   /* Protect kvm->arch.ept_identity_pagetable_done. */
+   mutex_lock(&kvm->slots_lock);
+
+   if (likely(kvm->arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm->arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+   mutex_unlock(&kvm->slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,20 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /* Called with kvm->slots_lock held. */
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
+   BUG_ON(kvm->arch.ept_identity_pagetable_done);
+
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
-   if (r)
-   goto out;
-
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
 
-   kvm->arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(&kvm->slots_lock);
return r;
 }
 
@@ -7643,8 +7642,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644
--- a/arch/x86/kvm/x86.c
++

[PATCH v7 8/9] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-20 Thread Tang Chen
We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch forces an L1->L0 or L2->L0 exit via the mmu notifier when the
shared apic access page is migrated. Since the apic access page is only used
on Intel x86, this is arch specific code.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/x86.c   | 11 +++
 include/linux/kvm_host.h | 14 +-
 virt/kvm/kvm_main.c  |  3 +++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ae2dc7..7dd4179 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6011,6 +6011,17 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
+void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+  unsigned long address)
+{
+   /*
+* The physical address of apic access page is stored in VMCS.
+* Update it when it becomes invalid.
+*/
+   if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
+   kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 73de13c..b6e4d38 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -917,7 +917,19 @@ static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
return 1;
return 0;
 }
-#endif
+
+#ifdef _ASM_X86_KVM_HOST_H
+void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+  unsigned long address);
+#else /* _ASM_X86_KVM_HOST_H */
+inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+ unsigned long address)
+{
+   return;
+}
+#endif /* _ASM_X86_KVM_HOST_H */
+
+#endif /* CONFIG_MMU_NOTIFIER & KVM_ARCH_WANT_MMU_NOTIFIER */
 
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0f8b6f6..5427973d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -295,6 +295,9 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
kvm_flush_remote_tlbs(kvm);
 
spin_unlock(&kvm->mmu_lock);
+
+   kvm_arch_mmu_notifier_invalidate_page(kvm, address);
+
srcu_read_unlock(&kvm->srcu, idx);
 }
 
-- 
1.8.3.1



[PATCH v7 7/9] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-20 Thread Tang Chen
We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch handles 3).

In L0->L2 entry, L2's vmcs will be updated in prepare_vmcs02() called by
nested_vmx_run(), so nothing needs to be done there.

In L2->L1 exit, this patch requests apic access page reload in L2->L1 vmexit.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx.c  | 6 ++
 arch/x86/kvm/x86.c  | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 56156eb..1a8317e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1047,6 +1047,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c8e90ec..baac78a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8803,6 +8803,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
}
 
/*
+* We are now running in L2, mmu_notifier will force to reload the
+* page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
+*/
+   kvm_vcpu_reload_apic_access_page(vcpu);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc54fa6..2ae2dc7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
/*
 * Only APIC access page shared by L1 and L2 vm is handled. The APIC
@@ -6009,6 +6009,7 @@ static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
page_to_phys(vcpu->kvm->arch.apic_access_page));
}
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
-- 
1.8.3.1



[PATCH v7 9/9] kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

2014-09-20 Thread Tang Chen
To make the apic access page migratable, we no longer pin it in memory.
When it is migrated, we should reload its physical address in all vmcses.
But when we tried to do this, all vcpus would access
kvm_arch->apic_access_page without any locking, which is not safe.

Actually, we do not need kvm_arch->apic_access_page anymore. Since the
apic access page is no longer pinned in memory, we can remove
kvm_arch->apic_access_page. When we need to write its physical address
into a vmcs, use gfn_to_page() to get its page struct, which also pins
it, and unpin it with put_page() afterwards.

Suggested-by: Gleb Natapov 
Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 15 +--
 arch/x86/kvm/x86.c  | 16 +++-
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1a8317e..9fb3d4c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index baac78a..12f0715 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.apic_access_page)
+   if (kvm->arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm->arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm->arch.apic_access_page_done = true;
 out:
mutex_unlock(&kvm->slots_lock);
return r;
@@ -4534,8 +4539,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
}
 
if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
 
if (vmx_vm_has_apicv(vcpu->kvm))
memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
@@ -8003,8 +8007,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7dd4179..996af6e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,6 +5991,8 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page = NULL;
+
/*
 * Only APIC access page shared by L1 and L2 vm is handled. The APIC
 * access page prepared by L1 for L2's execution is still pinned in
@@ -6003,10 +6005,16 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 * migrated, GUP will wait till the migrate entry is replaced
 * with the new pte entry pointing to the new page.
 */
-   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
-   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   page = gfn_to_page(vcpu->kvm,
+  APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+  page_to_phys(page));
+
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
}
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
@@ -7272,8 +7280,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm->arch.vpic);
kfree(kvm->arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm->arch.apic_access_page)
-   put_page(kvm->arch.apic_access_pa

[PATCH v7 5/9] kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().

2014-09-20 Thread Tang Chen
We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch handles 1) and 2) by making a new vcpu request named
KVM_REQ_APIC_PAGE_RELOAD to reload the apic access page in L0->L1 entry.

Since we don't handle the situation where L1 and L2 have separate apic access
pages, when we update the vmcs we need to check whether we are in L2 and
whether L2's secondary exec "virtualize apic accesses" control is enabled.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 23 +++
 include/linux/kvm_host.h|  1 +
 5 files changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69fe032..56156eb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -740,6 +740,7 @@ struct kvm_x86_ops {
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
bool (*has_secondary_apic_access)(struct kvm_vcpu *vcpu);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9c8ae32..99378d7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3624,6 +3624,11 @@ static bool svm_has_secondary_apic_access(struct kvm_vcpu *vcpu)
return false;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4379,6 +4384,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
.has_secondary_apic_access = svm_has_secondary_apic_access,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0b541d9..c8e90ec 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7098,6 +7098,11 @@ static bool vmx_has_secondary_apic_access(struct kvm_vcpu *vcpu)
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8918,6 +8923,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
.has_secondary_apic_access = vmx_has_secondary_apic_access,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..fc54fa6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,27 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* Only APIC access page shared by L1 and L2 vm is handled. The APIC
+* access page prepared by L1 for L2's execution is still pinned in
+* memory, and it cannot be migrated.
+*/
+   if (!is_guest_mode(vcpu) ||
+   !kvm_x86_ops->has_secondary_apic_access(vcpu)) {
+   /*
+* APIC access page could be migrated. When the page is being
+* migrated, GUP will wait till the migrate entry is replaced
+* with the new pte entry pointing to the new page.
+*/
+   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+   page_to_phys(vcpu->kvm->arch.apic_access_page));

[PATCH v7 6/9] kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static.

2014-09-20 Thread Tang Chen
Since different architectures need different handling, we will add some arch
specific code later. The code may need to make cpu requests outside kvm_main.c,
so make it non-static and rename it to kvm_make_all_cpus_request().

Signed-off-by: Tang Chen 
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c23236a..73de13c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -580,6 +580,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 void kvm_make_mclock_inprogress_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request(struct kvm *kvm);
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
 
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..0f8b6f6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -152,7 +152,7 @@ static void ack_flush(void *_completed)
 {
 }
 
-static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 {
int i, cpu, me;
cpumask_var_t cpus;
@@ -189,7 +189,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
long dirty_count = kvm->tlbs_dirty;
 
smp_mb();
-   if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
+   if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
++kvm->stat.remote_tlb_flush;
cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
 }
@@ -197,17 +197,17 @@ EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
 
 void kvm_reload_remote_mmus(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
 }
 
 void kvm_make_mclock_inprogress_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
 }
 
 void kvm_make_scan_ioapic_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
 }
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 4/9] kvm: Add interface to check if secondary exec virtualized apic accesses is enabled.

2014-09-20 Thread Tang Chen
We want to migrate the apic access page pinned by the guest (L1 and L2) to make
memory hotplug available.

There are two situations that need to be handled for the apic access page used
by an L2 vm:
1. L1 prepares a separate apic access page for L2.

   L2 pins a lot of pages in memory. Even if we can migrate the apic access
   page, memory hotplug is not available while L2 is running. So do not handle
   this case now; migrate only the shared apic access page.

2. L1 and L2 share one apic access page.

   Since we will migrate L1's apic access page, we should do some handling when
   migration happens in the following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

Since we don't handle the situation in which L1 and L2 have separate apic
access pages, when we update the vmcs we need to check whether we are in L2 and
whether L2's secondary exec virtualized apic accesses control is enabled.

This patch adds an interface to check if L2's secondary exec virtualized apic
accesses control is enabled, because the vmx state cannot be accessed outside
vmx.c.

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm.c  | 6 ++
 arch/x86/kvm/vmx.c  | 9 +
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..69fe032 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   bool (*has_secondary_apic_access)(struct kvm_vcpu *vcpu);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..9c8ae32 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
return;
 }
 
+static bool svm_has_secondary_apic_access(struct kvm_vcpu *vcpu)
+{
+   return false;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .has_secondary_apic_access = svm_has_secondary_apic_access,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72a0470..0b541d9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7090,6 +7090,14 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static bool vmx_has_secondary_apic_access(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   return (vmx->nested.current_vmcs12->secondary_vm_exec_control &
+   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8909,6 +8917,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .has_secondary_apic_access = vmx_has_secondary_apic_access,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
-- 
1.8.3.1



[PATCH v7 0/9] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-20 Thread Tang Chen
ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

This patch-set is based on Linux 3.17.0-rc5.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v6 -> v7:
1. Patches 1/9~3/9 were applied to kvm/queue by Paolo Bonzini. Just resend
   them, no changes.
2. In the new patch 4/9, add a new interface to check if secondary exec
   virtualized apic access is enabled.
3. In the new patch 6/9, rename make_all_cpus_request() to
   kvm_make_all_cpus_request() and make it non-static so that we can use it
   in other patches.
4. In the new patch 8/9, add an arch specific function to make the apic access
   page reload request in the mmu notifier.

Tang Chen (9):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm: Add interface to check if secondary exec virtualized apic accesses
is enabled.
  kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().
  kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and
make it non-static.
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access
migration.
  kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

 arch/x86/include/asm/kvm_host.h |   6 ++-
 arch/x86/kvm/svm.c  |  15 +-
 arch/x86/kvm/vmx.c  | 104 
 arch/x86/kvm/x86.c  |  47 --
 include/linux/kvm_host.h|  16 ++-
 virt/kvm/kvm_main.c |  13 +++--
 6 files changed, 146 insertions(+), 55 deletions(-)

-- 
1.8.3.1



[PATCH v7 3/9] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-20 Thread Tang Chen
In init_rmode_identity_map(), there are two variables indicating the return
value, r and ret, and the function returns 0 on error and 1 on success. It is
only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r and makes init_rmode_identity_map()
return 0 on success and -errno on failure.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4fb84ad..72a0470 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(&kvm->slots_lock);
 
-   if (likely(kvm->arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(&kvm->srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r < 0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
&tmp, i * sizeof(tmp), sizeof(tmp));
-   if (r < 0)
+   if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(&kvm->srcu, idx);
-
 out2:
mutex_unlock(&kvm->slots_lock);
return ret;
@@ -7604,11 +7601,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   /* Set err to -ENOMEM to handle memory allocation error. */
+   err = -ENOMEM;
+
vmx->guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx->guest_msrs[0])
 > PAGE_SIZE);
 
-   err = -ENOMEM;
if (!vmx->guest_msrs) {
goto uninit_vcpu;
}
@@ -7641,8 +7640,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
if (!kvm->arch.ept_identity_map_addr)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
-   err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   err = init_rmode_identity_map(kvm);
+   if (err < 0)
goto free_vmcs;
}
 
-- 
1.8.3.1



[PATCH v7 1/9] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-20 Thread Tang Chen
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the
address of the apic access page. So use this macro.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&svm->vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
-   apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&vmx->vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1



[PATCH 1/1] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-23 Thread Tang Chen
Hi Paolo,

I'm not sure if this patch follows your comment. Please review.
All the other comments have been addressed. If this patch is OK, I'll
send v8 soon.

Thanks.

We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch forces an L1->L0 exit or an L2->L0 exit when the shared apic access
page is migrated, using the mmu notifier. Since the apic access page is only
used on Intel x86, this is arch specific code.
---
 arch/arm/include/asm/kvm_host.h |  6 ++
 arch/arm64/include/asm/kvm_host.h   |  6 ++
 arch/ia64/include/asm/kvm_host.h|  8 
 arch/mips/include/asm/kvm_host.h|  7 +++
 arch/powerpc/include/asm/kvm_host.h |  6 ++
 arch/s390/include/asm/kvm_host.h|  9 +
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c  | 11 +++
 virt/kvm/kvm_main.c |  3 +++
 9 files changed, 58 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..79bbf7d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -182,6 +182,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e10c45a..ee89fad 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -192,6 +192,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index db95f57..326ac55 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -574,6 +574,14 @@ static inline struct kvm_pt_regs *vcpu_regs(struct kvm_vcpu *v)
return (struct kvm_pt_regs *) ((unsigned long) v + KVM_STK_OFFSET) - 1;
 }
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
+
 typedef int kvm_vmm_entry(void);
 typedef void kvm_tramp_entry(union context *host, union context *guest);
 
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 7a3fc67..c392705 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -767,5 +767,12 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc,
 extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
 extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 98d9dd5..c16a573 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -61,6 +61,12 @@ extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 #define HPTEG_CACHE_NUM(1 << 15)
 #define HPTEG_HASH_BITS_PTE13
 #define HPTEG_HASH_BITS_PTE_LONG   12
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 773bef7..693290f 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -450,4 +450,13 @@ void kvm_arch_async_page_present(struct

Re: [PATCH 1/1] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-24 Thread Tang Chen


On 09/24/2014 03:08 PM, Jan Kiszka wrote:

On 2014-09-24 04:09, Tang Chen wrote:

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..79bbf7d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -182,6 +182,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
return 0;
  }
  
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,

+unsigned long address)
+{
+   return;

Redundant return, more cases below.


OK, will remove it. Thanks.



Jan


[PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-24 Thread Tang Chen
ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

This patch-set is based on Linux 3.17.0-rc5.

NOTE: Tested with -cpu xxx,-x2apic option.
  But since nested vm pins some other pages in memory, if user uses nested
  vm, memory hot-remove will not work.

Change log v7->v8:
1. Patches 1/9~3/9 were applied to kvm/queue by Paolo Bonzini. Just resend
   them, no changes.
2. Removed previous patch 4/9, which added the unnecessary hook
   has_secondary_apic_access().
3. Set kvm_x86_ops->set_apic_access_page_addr to NULL when the hardware has no
   flexpriority functionality, which actually exists only on x86.
4. Moved the declaration of kvm_arch_mmu_notifier_invalidate_page() to
   arch/*/include/asm/kvm_host.h.
5. Removed the useless set_apic_access_page_addr() hook for svm.

Tang Chen (8):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().
  kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and
make it non-static.
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
running.
  kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access
migration.
  kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

 arch/arm/include/asm/kvm_host.h |   5 ++
 arch/arm64/include/asm/kvm_host.h   |   5 ++
 arch/ia64/include/asm/kvm_host.h|   7 ++
 arch/mips/include/asm/kvm_host.h|   6 ++
 arch/powerpc/include/asm/kvm_host.h |   5 ++
 arch/s390/include/asm/kvm_host.h|   8 +++
 arch/x86/include/asm/kvm_host.h |   7 +-
 arch/x86/kvm/svm.c  |   3 +-
 arch/x86/kvm/vmx.c  | 130 
 arch/x86/kvm/x86.c  |  45 +++--
 include/linux/kvm_host.h|   2 +
 virt/kvm/kvm_main.c |  13 ++--
 12 files changed, 180 insertions(+), 56 deletions(-)

-- 
1.8.3.1



[PATCH v8 6/8] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-24 Thread Tang Chen
We are handling "L1 and L2 share one apic access page" situation when migrating
apic access page. We should do some handling when migration happens in the
following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch handles 3).

In the L0->L2 entry, L2's vmcs will be updated in prepare_vmcs02(), called by
nested_vmx_run(). So we do not need to do anything.

In the L2->L1 exit, this patch requests an apic access page reload in the
L2->L1 vmexit path.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx.c  | 6 ++
 arch/x86/kvm/x86.c  | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 582cd0f..66480fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1046,6 +1046,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1411bab..40bb9fc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8826,6 +8826,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
}
 
/*
+* We have been running in L2, and the mmu notifier may have reloaded
+* the page's hpa for the L2 vmcs. Reload it for L1 before entering L1.
+*/
+   kvm_vcpu_reload_apic_access_page(vcpu);
+
+   /*
 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 * success or failure flag accordingly.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f0c99a..c064ca6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
/*
 * If platform doesn't have 2nd exec virtualize apic access affinity,
@@ -6009,6 +6009,7 @@ static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
kvm_x86_ops->set_apic_access_page_addr(vcpu,
page_to_phys(vcpu->kvm->arch.apic_access_page));
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 8/8] kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

2014-09-24 Thread Tang Chen
To make the apic access page migratable, we no longer pin it in memory.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpus would access
kvm_arch->apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch->apic_access_page anymore. Since the
apic access page is no longer pinned in memory, we can remove
kvm_arch->apic_access_page. When we need to write its physical address
into a vmcs, use gfn_to_page() to get its page struct, which will also
pin it, and unpin it afterwards.

Suggested-by: Gleb Natapov 
Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 17 ++---
 arch/x86/kvm/x86.c  | 16 ++--
 3 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 408b944..e27e1f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 40bb9fc..4069075 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4010,7 +4010,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.apic_access_page)
+   if (kvm->arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4026,7 +4026,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-   kvm->arch.apic_access_page = page;
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
+   kvm->arch.apic_access_page_done = true;
 out:
mutex_unlock(&kvm->slots_lock);
return r;
@@ -4541,9 +4546,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_write32(TPR_THRESHOLD, 0);
}
 
-   if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
-   vmcs_write64(APIC_ACCESS_ADDR,
-
page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
+   /* Reload apic access page in case it was migrated. */
+   kvm_vcpu_reload_apic_access_page(vcpu);
 
if (vmx_vm_has_apicv(vcpu->kvm))
memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
@@ -8026,8 +8030,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-   vmcs_write64(APIC_ACCESS_ADDR,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e042ef6..f7cbc36 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,6 +5991,8 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page = NULL;
+
/*
 * If platform doesn't have 2nd exec virtualize apic access affinity,
 * set_apic_access_page_addr() will be set to NULL in hardware_setup(),
@@ -6004,10 +6006,14 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 * migrated, GUP will wait till the migrate entry is replaced
 * with the new pte entry pointing to the new page.
 */
-   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
-   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-   kvm_x86_ops->set_apic_access_page_addr(vcpu,
-   page_to_phys(vcpu->kvm->arch.apic_access_page));
+   page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu, page_to_phys(page));
+
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
@@ -7272,8 +7278,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm->arch.vpic);
kfree(kvm->arch.vioapic);
kvm_free_vcpus(kvm);
-   if (kvm->arch.apic_access_page)
-   put_page(kvm->arch.apic_access_page);
kfree(rcu_dereference_check(kvm-

[PATCH v8 7/8] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-24 Thread Tang Chen
We are handling the "L1 and L2 share one apic access page" situation when
migrating the apic access page. Some handling is needed when migration happens
in the following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch forces an L1->L0 exit or an L2->L0 exit when the shared apic access
page is migrated, using an mmu notifier. Since the apic access page is only
used on Intel x86, this is arch-specific code.

Signed-off-by: Tang Chen 
---
 arch/arm/include/asm/kvm_host.h |  5 +
 arch/arm64/include/asm/kvm_host.h   |  5 +
 arch/ia64/include/asm/kvm_host.h|  7 +++
 arch/mips/include/asm/kvm_host.h|  6 ++
 arch/powerpc/include/asm/kvm_host.h |  5 +
 arch/s390/include/asm/kvm_host.h|  8 
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c  | 11 +++
 virt/kvm/kvm_main.c |  3 +++
 9 files changed, 52 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..f5b3f53 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -182,6 +182,11 @@ static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e10c45a..594873a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -192,6 +192,11 @@ static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index db95f57..282e71f 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -574,6 +574,13 @@ static inline struct kvm_pt_regs *vcpu_regs(struct kvm_vcpu *v)
return (struct kvm_pt_regs *) ((unsigned long) v + KVM_STK_OFFSET) - 1;
 }
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
+
 typedef int kvm_vmm_entry(void);
 typedef void kvm_tramp_entry(union context *host, union context *guest);
 
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 7a3fc67..4826d29 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -767,5 +767,11 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc,
 extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
 extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 98d9dd5..e40402d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -61,6 +61,11 @@ extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+}
+
 #define HPTEG_CACHE_NUM(1 << 15)
 #define HPTEG_HASH_BITS_PTE13
 #define HPTEG_HASH_BITS_PTE_LONG   12
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 773bef7..e4d6708 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -450,4 +450,12 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern char sie_exit;
+
+#ifdef KVM_ARCH_WANT_MMU_NOTIFI

[PATCH v8 4/8] kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().

2014-09-24 Thread Tang Chen
We want to migrate the apic access page pinned by the guest (L1 and L2) to
make memory hotplug available.

There are two situations to handle for the apic access page used by an L2 vm:
1. L1 prepares a separate apic access page for L2.

   L2 pins a lot of pages in memory. Even if we could migrate the apic access
   page, memory hotplug is not available while L2 is running. So do not handle
   this now; migrate L1's apic access page only.

2. L1 and L2 share one apic access page.

   Since we will migrate L1's apic access page, we should do some handling when
   migration happens in the following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
  vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
  L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
  L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch handles 1) and 2).

Since we don't handle the situation where L1 and L2 have separate apic access
pages, when we update the vmcs we need to check whether we are in L2 and
whether L1 prepared a non-shared apic access page for L2. We do this in
vmx_set_apic_access_page_addr() when trying to set the new apic access page's
hpa, like this:

   if (!is_guest_mode(vcpu) ||
   !(vmx->nested.current_vmcs12->secondary_vm_exec_control &
 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))

Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/vmx.c  | 39 ++-
 arch/x86/kvm/x86.c  | 23 +++
 include/linux/kvm_host.h|  1 +
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..582cd0f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72a0470..1411bab 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3108,9 +3108,17 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_unrestricted_guest())
enable_unrestricted_guest = 0;
 
-   if (!cpu_has_vmx_flexpriority())
+   if (!cpu_has_vmx_flexpriority()) {
flexpriority_enabled = 0;
 
+   /*
+* set_apic_access_page_addr() is used to reload apic access
+* page in case it is migrated for memory hotplug reason. If
+* platform doesn't have this affinity, no need to handle it.
+*/
+   kvm_x86_ops->set_apic_access_page_addr = NULL;
+   }
+
if (!cpu_has_vmx_tpr_shadow())
kvm_x86_ops->update_cr8_intercept = NULL;
 
@@ -7090,6 +7098,34 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   /*
+* This function is used to reload apic access page in case it is
+* migrated for memory hotplug reason. And only L1 and L2 share the
+* same apic access page situation is handled.
+*
+* 1) If vcpu is not in guest mode (in L1), reload the page for L1.
+*And L2's page will be reloaded in the next L1->L2 entry by
+*prepare_vmcs02().
+*
+* 2) If vcpu is in guest mode (in L2), but L1 did not prepare an
+*apic access page for L2 (current_vmcs12->secondary_vm_exec_control
+*does not have SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES bit set),
+*reload the page for L2.
+*And L1's page will be reloaded in the next L2->L1 exit.
+*
+* 3) Otherwise, do nothing. L2's specific apic access page is still
+*pinned in memory, and not hotpluggable.
+*/
+   if (!is_guest_mode(vcpu) ||
+   !(vmx->nested.current_vmcs12->secondary_vm_exec_control &
+ SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -890

[PATCH v8 2/8] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-24 Thread Tang Chen
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page, but
it is never actually used to refer to the page.

In vcpu initialization, it indicates two things:
1. whether the ept page is allocated
2. whether a memory slot for the identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell whether the
ept identity pagetable is initialized, so we can remove ept_identity_pagetable.

NOTE: In the original code, the ept identity pagetable page is pinned in
  memory. As a result, it cannot be migrated/hot-removed. After this patch,
  since kvm_arch->ept_identity_pagetable is removed, the ept identity
  pagetable page is no longer pinned in memory, and it can be
  migrated/hot-removed.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 47 +++--
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..4fb84ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-   if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-   printk(KERN_ERR "EPT: identity-mapping pagetable "
-   "haven't been allocated!\n");
-   return 0;
+
+   /* Protect kvm->arch.ept_identity_pagetable_done. */
+   mutex_lock(&kvm->slots_lock);
+
+   if (likely(kvm->arch.ept_identity_pagetable_done)) {
+   ret = 1;
+   goto out2;
}
-   if (likely(kvm->arch.ept_identity_pagetable_done))
-   return 1;
-   ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+   mutex_unlock(&kvm->slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,20 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /* Called with kvm->slots_lock held. */
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-   mutex_lock(&kvm->slots_lock);
-   if (kvm->arch.ept_identity_pagetable)
-   goto out;
+   BUG_ON(kvm->arch.ept_identity_pagetable_done);
+
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
-   if (r)
-   goto out;
-
-   page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-   if (is_error_page(page)) {
-   r = -EFAULT;
-   goto out;
-   }
 
-   kvm->arch.ept_identity_pagetable = page;
-out:
-   mutex_unlock(&kvm->slots_lock);
return r;
 }
 
@@ -7643,8 +7642,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644
--- a/arch/x86/kvm/x86.c
++

[PATCH v8 5/8] kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static.

2014-09-24 Thread Tang Chen
Since different architectures need different handling, we will add some
arch-specific code later. The code may need to make cpu requests outside
kvm_main.c, so make it non-static and rename it to
kvm_make_all_cpus_request().

Reviewed-by: Paolo Bonzini 
Signed-off-by: Tang Chen 
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c23236a..73de13c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -580,6 +580,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 void kvm_make_mclock_inprogress_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request(struct kvm *kvm);
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
 
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..0f8b6f6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -152,7 +152,7 @@ static void ack_flush(void *_completed)
 {
 }
 
-static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 {
int i, cpu, me;
cpumask_var_t cpus;
@@ -189,7 +189,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
long dirty_count = kvm->tlbs_dirty;
 
smp_mb();
-   if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
+   if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
++kvm->stat.remote_tlb_flush;
cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
 }
@@ -197,17 +197,17 @@ EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
 
 void kvm_reload_remote_mmus(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
 }
 
 void kvm_make_mclock_inprogress_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
 }
 
 void kvm_make_scan_ioapic_request(struct kvm *kvm)
 {
-   make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
 }
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
-- 
1.8.3.1



[PATCH v8 3/8] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-24 Thread Tang Chen
In init_rmode_identity_map() there are two variables indicating the return
value, r and ret, and the function returns 0 on error and 1 on success. It
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r and makes init_rmode_identity_map()
return 0 on success and -errno on failure.

Signed-off-by: Tang Chen 
---
 arch/x86/kvm/vmx.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4fb84ad..72a0470 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(&kvm->slots_lock);
 
-   if (likely(kvm->arch.ept_identity_pagetable_done)) {
-   ret = 1;
+   if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
-   }
 
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
idx = srcu_read_lock(&kvm->srcu);
-   r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-   if (r < 0)
+   ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+   if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-   r = kvm_write_guest_page(kvm, identity_map_pfn,
+   ret = kvm_write_guest_page(kvm, identity_map_pfn,
&tmp, i * sizeof(tmp), sizeof(tmp));
-   if (r < 0)
+   if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
srcu_read_unlock(&kvm->srcu, idx);
-
 out2:
mutex_unlock(&kvm->slots_lock);
return ret;
@@ -7604,11 +7601,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   /* Set err to -ENOMEM to handle memory allocation error. */
+   err = -ENOMEM;
+
vmx->guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx->guest_msrs[0])
 > PAGE_SIZE);
 
-   err = -ENOMEM;
if (!vmx->guest_msrs) {
goto uninit_vcpu;
}
@@ -7641,8 +7640,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
if (!kvm->arch.ept_identity_map_addr)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
-   err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   err = init_rmode_identity_map(kvm);
+   if (err < 0)
goto free_vmcs;
}
 
-- 
1.8.3.1



[PATCH v8 1/8] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-24 Thread Tang Chen
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the
address of the apic access page. So use this macro.

Signed-off-by: Tang Chen 
Reviewed-by: Gleb Natapov 
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
 
-   svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+  MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&svm->vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-   kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+   kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+   page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
-   apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&vmx->vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1



Re: [PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-24 Thread Tang Chen


On 09/24/2014 04:20 PM, Paolo Bonzini wrote:

On 24/09/2014 09:57, Tang Chen wrote:

ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

This patch-set is based on Linux 3.17.0-rc5.

NOTE: Tested with -cpu xxx,-x2apic option.
   But since nested vm pins some other pages in memory, if user uses nested
   vm, memory hot-remove will not work.

Change log v7->v8:
1. Patches 1/9~3/9 were applied to kvm/queue by Paolo Bonzini.
Just resend them, no changes.
2. Removed previous patch 4/9, which added unnecessary hook 
has_secondary_apic_access().
3. Set kvm_x86_ops->set_apic_access_page_addr to NULL when hardware had no 
flexpriority
functionality which actually exists only on x86.
4. Moved declaration of kvm_arch_mmu_notifier_invalidate_page() to 
arch/*/include/asm/kvm_host.h.
5. Removed useless set_apic_access_page_addr() hook for svm.

Tang Chen (8):
   kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
   kvm: Remove ept_identity_pagetable from struct kvm_arch.
   kvm: Make init_rmode_identity_map() return 0 on success.
   kvm, mem-hotplug: Reload L1's apic access page in vcpu_enter_guest().
   kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and
 make it non-static.
   kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
 running.
   kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access
 migration.
   kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

  arch/arm/include/asm/kvm_host.h |   5 ++
  arch/arm64/include/asm/kvm_host.h   |   5 ++
  arch/ia64/include/asm/kvm_host.h|   7 ++
  arch/mips/include/asm/kvm_host.h|   6 ++
  arch/powerpc/include/asm/kvm_host.h |   5 ++
  arch/s390/include/asm/kvm_host.h|   8 +++
  arch/x86/include/asm/kvm_host.h |   7 +-
  arch/x86/kvm/svm.c  |   3 +-
  arch/x86/kvm/vmx.c  | 130 
  arch/x86/kvm/x86.c  |  45 +++--
  include/linux/kvm_host.h|   2 +
  virt/kvm/kvm_main.c |  13 ++--
  12 files changed, 180 insertions(+), 56 deletions(-)


Thanks for your persistence!  The patches look good, I'll test them and
apply to kvm/queue.


Sure, thank you very much. :)


Re: [PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-25 Thread Tang Chen

Hi Paolo,

I'd like to help to test the patches.
Would you please tell me what is the best way to test this patch-set ?

I think ept page is being used by regular guest.
Is adding "-cpu xxx,-x2apic option" able to make sure guest is using
apic page ?

Thanks.

On 09/24/2014 04:20 PM, Paolo Bonzini wrote:

Thanks for your persistence!  The patches look good, I'll test them and
apply to kvm/queue.

Paolo





Re: [PATCH v8 0/8] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-25 Thread Tang Chen


On 09/25/2014 09:43 PM, Paolo Bonzini wrote:

On 25/09/2014 10:19, Tang Chen wrote:

Hi Paolo,

I'd like to help to test the patches.
Would you please tell me what is the best way to test this patch-set ?

How did _you_ test the patches?...


I just added the "-cpu xxx,-x2apic" option, started the guest, used
numactl to bind it to a node (e.g. node1), and then offlined the memory
on node1. I repeated this for a while, and it worked.

I'm not sure if this is enough.

Thanks.




I think the ept page is being used by a regular guest.
Is adding the "-cpu xxx,-x2apic" option enough to make sure the guest is
using the apic access page ?

Yes.

Paolo





Re: [PATCH v4] KVM: x86: fix access memslots w/o hold srcu read lock

2014-11-10 Thread Tang Chen

Hi Wanpeng,

I'm afraid I totally missed this thread.
I enabled lockdep and RCU debugging and tried 3.18-rc1, but I didn't
get the warning.

My steps are:

1. Use numactl to bind a qemu process to node1.
2. Offline all of node1's memory; the qemu process keeps running.

Would you please tell me how you reproduced it ?

Thanks.

On 11/02/2014 03:07 PM, Wanpeng Li wrote:

The srcu read lock must be held while accessing memslots (e.g.
when using gfn_to_* functions). However, commit c24ae0dcd3e8
("kvm: x86: Unpin and remove kvm_arch->apic_access_page") calls
gfn_to_page() in kvm_vcpu_reload_apic_access_page() without holding it
in the vmx_vcpu_reset() path, which leads to a suspicious
rcu_dereference_check() usage warning. Fix it by replacing the direct
reload with kvm_make_request(), deferring the work to the vCPU entry
path, where the srcu read lock is already held.


[ INFO: suspicious RCU usage. ]
3.18.0-rc2-test2+ #70 Not tainted
---
include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/2371:
  #0:  (&vcpu->mutex){+.+...}, at: [] vcpu_load+0x20/0xd0 
[kvm]

stack backtrace:
CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
  0001 880209983ca8 816f514f 
  8802099b8990 880209983cd8 810bd687 000fee00
  880208a2c000 880208a1 88020ef50040 880209983d08
Call Trace:
  [] dump_stack+0x4e/0x71
  [] lockdep_rcu_suspicious+0xe7/0x120
  [] gfn_to_memslot+0xd5/0xe0 [kvm]
  [] __gfn_to_pfn+0x33/0x60 [kvm]
  [] gfn_to_page+0x25/0x90 [kvm]
  [] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
  [] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
  [] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
  [] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
  [] kvm_vm_ioctl+0x1d0/0x780 [kvm]
  [] ? __lock_is_held+0x54/0x80
  [] do_vfs_ioctl+0x300/0x520
  [] ? __fget+0x5/0x250
  [] ? __fget_light+0x2a/0xe0
  [] SyS_ioctl+0x81/0xa0
  [] system_call_fastpath+0x16/0x1b

Reported-by: Takashi Iwai 
Reported-by: Alexei Starovoitov 
Suggested-by: Paolo Bonzini 
Signed-off-by: Wanpeng Li 
---
v3 -> v4:
  * bypass the problem altogether by using kvm_make_request()
v2 -> v3:
  * take care of all vmx_vcpu_reset() call paths
v1 -> v2:
  * just hold the srcu read lock in the vmx_vcpu_reset() path

  arch/x86/kvm/vmx.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a0f78db..3e556c6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4579,7 +4579,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 		vmcs_write32(TPR_THRESHOLD, 0);
 	}
 
-	kvm_vcpu_reload_apic_access_page(vcpu);
+	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
 
 	if (vmx_vm_has_apicv(vcpu->kvm))
 		memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));




Re: [PATCH v4] KVM: x86: fix access memslots w/o hold srcu read lock

2014-11-13 Thread Tang Chen


Thanks for sharing. I will do more tests. :)

On 11/14/2014 07:39 AM, Wanpeng Li wrote:

Hi Tang,
On Tue, Nov 11, 2014 at 01:35:29PM +0800, Tang Chen wrote:

Hi Wanpeng,


Sorry for the late reply.


I'm afraid I totally missed this thread.
I enabled lockdep and RCU debugging and tried 3.18-rc1, but I didn't
get the warning.

I also enabled lockdep and RCU debugging, and tried 3.18.0-rc2 on an Ivy
Bridge machine; the warning is triggered immediately after running qemu.
There is no need to try any hotplug-related stuff.

In addition, Paolo's patch was merged upstream to fix this:

commit a73896cb5bbdce672945745db8224352a689f580
Author: Paolo Bonzini 
Date:   Sun Nov 2 07:54:30 2014 +0100

KVM: vmx: defer load of APIC access page address during reset

Regards,
Wanpeng Li


My steps are:

1. Use numactl to bind a qemu process to node1.
2. Offline all of node1's memory; the qemu process keeps running.

Would you please tell me how you reproduced it ?

Thanks.


