Re: KVM call for agenda for 2014-09-16

2014-09-16 Thread Juan Quintela
Juan Quintela quint...@redhat.com wrote:
 Hi

 Please, send any topic that you are interested in covering.

 People have complained in the past that I don't cancel the call until
 the very last minute.  So, what do you think of making the deadline for
 submitting topics 23:00 UTC on Monday?

Ok, no topics, no call.

Have a productive week.

Later, Juan.




 Call details:

   15:00 CEST
   13:00 UTC
   09:00 EDT

 Every two weeks

 By popular demand, a google calendar public entry with it

   
 https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

(Let me know if you have any problems with the calendar entry)

 If you need phone number details,  contact me privately

 Thanks, Juan.


Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off

2014-09-16 Thread Paolo Bonzini
On 16/09/2014 04:06, Andrew Jones wrote:
 We shouldn't try Load-Exclusive instructions unless we've enabled memory
 management, as these instructions depend on the data cache unit's
 coherency monitor. This patch adds a new setup boolean, initialized to false,
 that is used to guard Load-Exclusive instructions. Eventually we'll add more
 setup code that sets it true.
 
 Note: This problem became visible on boards with Cortex-A7 processors. Testing
 with Cortex-A15 didn't expose it, as those may have an external coherency
 monitor that still allows the instruction to execute (on A7s we got data
 aborts). Although even on A15's it's not clear from the spec if the
 instructions will behave as expected while caches are off, so we no longer
 allow Load-Exclusive instructions on those processors without caches enabled
 either.
 
 Signed-off-by: Andrew Jones drjo...@redhat.com
 ---
  lib/arm/asm/setup.h |  2 ++
  lib/arm/setup.c |  1 +
  lib/arm/spinlock.c  | 10 ++
  3 files changed, 13 insertions(+)
 
 diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h
 index 21445ef2085fc..9c54c184e2866 100644
 --- a/lib/arm/asm/setup.h
 +++ b/lib/arm/asm/setup.h
 @@ -20,6 +20,8 @@ extern phys_addr_t __phys_offset, __phys_end;
 #define PHYS_SIZE	(1ULL << PHYS_SHIFT)
 #define PHYS_MASK	(PHYS_SIZE - 1ULL)
  
 +extern bool mem_caches_enabled;
 +
  #define L1_CACHE_SHIFT   6
 #define L1_CACHE_BYTES	(1 << L1_CACHE_SHIFT)
  #define SMP_CACHE_BYTES  L1_CACHE_BYTES
 diff --git a/lib/arm/setup.c b/lib/arm/setup.c
 index 3941c9757dcb2..f7ed639c9d499 100644
 --- a/lib/arm/setup.c
 +++ b/lib/arm/setup.c
 @@ -25,6 +25,7 @@ u32 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0UL) };
  int nr_cpus;
  
  phys_addr_t __phys_offset, __phys_end;
 +bool mem_caches_enabled;
  
  static void cpu_set(int fdtnode __unused, u32 regval, void *info __unused)
  {
 diff --git a/lib/arm/spinlock.c b/lib/arm/spinlock.c
 index d8a6d4c3383d6..43539c5e84062 100644
 --- a/lib/arm/spinlock.c
 +++ b/lib/arm/spinlock.c
 @@ -1,12 +1,22 @@
  #include "libcflat.h"
  #include <asm/spinlock.h>
  #include <asm/barrier.h>
 +#include <asm/setup.h>
  
  void spin_lock(struct spinlock *lock)
  {
   u32 val, fail;
  
   dmb();
 +
 + /*
 +  * Without caches enabled Load-Exclusive instructions may fail.
 +  * In that case we do nothing, and just hope the caller knows
 +  * what they're doing.
 +  */
 + if (!mem_caches_enabled)
 + return;

Should it at least write 1 to the spinlock?

Paolo

   do {
   asm volatile(
  "1:	ldrex	%0, [%2]\n"
 



Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)

2014-09-16 Thread Paolo Bonzini
On 15/09/2014 20:14, Matt Mullins wrote:
 On Tue, Sep 09, 2014 at 11:53:49PM -0700, Matt Mullins wrote:
 On Mon, Sep 08, 2014 at 06:18:46PM +0200, Paolo Bonzini wrote:
 What version of QEMU?  Can you try the 12.04 qemu (which IIRC is 1.0) on
 top of the newer kernel?

 I did reproduce this on qemu 1.0.1.

 What would you like me to test next?  I've got both VM hosts currently
 running 3.17-rc4, so I'll know tomorrow if that works.

 I've been looking into this off-and-on for a while now, and this time
 around I may have found other folks experiencing the same issue:
 https://issues.apache.org/jira/browse/CLOUDSTACK-6788
 That one's a little light on details (do y'all know more than I do
 about who that bug was known to?), but
 https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1297218
 sees similar symptoms due to running NTP on the host.  I may try
 disabling that after the current round of testing 3.17-rc4 finishes up.
 
 A summary of my testing from last week:
   * 3.17-rc4 (with qemu 2.0) seems to have had the same problem as well.
   * Disabling ntpd didn't help either.
   * <timer name='kvmclock' present='no'/> _does_ seem to make my VM
     migrate without issue (see the domain XML sketch after this exchange).
 
 I'm not really sure what to look at from here.  I suppose leaving kvmclock
 disabled is a workaround for now, but is there a major disadvantage to doing
 so?

Sorry for not following up.  I think we have QEMU patches to fix this
issue.  I'll reply as soon as they are available in a git tree.

Paolo
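
For reference, the workaround above sits in the libvirt domain XML like
this (a minimal sketch; the enclosing clock element and its offset value
are assumptions about a typical domain definition, not taken from the
thread):

  <clock offset='utc'>
    <timer name='kvmclock' present='no'/>
  </clock>

With this element, libvirt starts the guest without the kvmclock
paravirtual clock, which is what made the migration in the test above
succeed.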


Re: [PATCH] Using the tlb flush util function where applicable

2014-09-16 Thread Paolo Bonzini
On 16/09/2014 00:49, Liang Chen wrote:
  ---
  (And what about a possible followup patch that replaces
   kvm_mmu_flush_tlb() with kvm_make_request() again?
   It would free the namespace a bit and we could call something
   similarly named from vcpu_enter_guest() to do the job.)
 That seems good. I can take the labor to make the followup patch ;)

Ok, so I'll not apply your first patch.

Thanks!

Paolo


Re: [Qemu-devel] QEMU with KVM does not start Win8 on kernel 3.4.67 and core2duo

2014-09-16 Thread Erik Rull
 On September 12, 2014 at 7:29 PM Jan Kiszka jan.kis...@siemens.com wrote:


 On 2014-09-12 19:15, Jan Kiszka wrote:
  On 2014-09-12 14:29, Erik Rull wrote:
  On September 11, 2014 at 3:32 PM Jan Kiszka jan.kis...@siemens.com
  wrote:
 
 
  On 2014-09-11 15:25, Erik Rull wrote:
  On August 6, 2014 at 1:19 PM Erik Rull erik.r...@rdsoftware.de wrote:
 
 
  Hi all,
 
   I did already several tests and I'm not completely sure what's going
   wrong, but here is my scenario:
 
   When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a
   vanilla kernel 3.4.67 to run a Windows 8.0 guest, the guest freezes
   at boot without any error. When I dump the CPU registers via "info
   registers", nothing changes; that means the system really stalled.
   Same happens with QEMU 2.0.0.
 
   But - when I run the very same guest using kernel 2.6.32.12 and QEMU
   1.7.0 on the host side, it works on the Core2Duo. Also on the system
   above, but just with an i3 or i5 CPU, it works, too.
 
   I already disabled networking and USB for the guest and changed the
   graphics card - no effect. I assume that some mean bits and bytes
   have to be set up properly to get the thing running.
 
  Any hint what to change / test would be really appreciated.
 
  Thanks in advance,
 
  Best regards,
 
  Erik
 
 
  Hi all,
 
   I opened a qemu bug report on that and Jan helped me create a kvm
   trace. I attached it to the bug report.
   https://bugs.launchpad.net/qemu/+bug/1366836
 
  If you have further questions, please let me know.
 
  File possibly truncated. Need at least 346583040, but file size is
  133414912.
 
  Does trace-cmd report work for you? Is your file larger?
 
  Again, please also validate the behavior on latest next branch from
  kvm.git.
 
  Jan
 
 
  Hi all,
 
   confirmed. The issue still exists in the kvm.git version of the
   kernel. The trace.tgz was uploaded to the bugtracker.
 
   Thanks. Could you provide a good-case trace of your setup as well,
   i.e. with that older kernel version? At least I'm not yet seeing
   something obviously wrong.

 Well, except that we have continuous EXTERNAL_INTERRUPTs, vector 0xf6,
 throughout most of the trace. Maybe a self-IPI (this is single-core),
 maybe something external that is stuck. You could do a full trace (-e
 all) and check for what happens after things like

 kvm_exit: reason EXTERNAL_INTERRUPT rip 0x8168ed83 info 0 80ef

 Jan
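
A full trace as suggested can be captured with trace-cmd; a sketch of
one plausible invocation (the output file name and event selection are
illustrative, not taken from the thread):

  trace-cmd record -e all -o trace.dat    # record all trace events
  # ... reproduce the guest hang, then stop recording with Ctrl-C
  trace-cmd report -i trace.dat | less    # inspect the recorded events

trace-cmd report is also the tool that raised the "file possibly
truncated" warning on the uploaded trace earlier in this thread.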


The huge number of interrupts seems to be rescheduling interrupts from
qemu/kvm. I disabled SMP (kernel cmdline nosmp) and retried - same
effect, Windows 8 does not boot. But I was able to get rid of the
rescheduling interrupts. The trace after / around a kvm_exit looks like
this:

 qemu-system-x86-954 [001] 261013.227405: kvm_entry: vcpu 0
 qemu-system-x86-952 [000] 261013.227405: kmem_cache_free: call_site=c10ef001 ptr=0xf1d2ae48
 qemu-system-x86-952 [000] 261013.227406: mm_filemap_delete_from_page_cache: dev 0:3 ino 0 page=0xf5bcc9c0 pfn=4122790336 ofs=507641856
 qemu-system-x86-952 [000] 261013.227406: kmem_cache_free: call_site=c10ef001 ptr=0xf1d2ae10
 qemu-system-x86-952 [000] 261013.227406: mm_filemap_delete_from_page_cache: dev 0:3 ino 0 page=0xf5bcc9e0 pfn=4122790368 ofs=507645952
 qemu-system-x86-954 [001] 261013.227406: kvm_exit: reason EXCEPTION_NMI rip 0x812a1d83 info 80201120 8b0e
 qemu-system-x86-954 [001] 261013.227407: kvm_page_fault: address 80201120 error_code 3
 qemu-system-x86-952 [000] 261013.227407: mm_page_free_batched: page=0xf5bcc9e0 pfn=4122790368 order=0 cold=0
 qemu-system-x86-954 [001] 261013.227407: kvm_mmu_pagetable_walk: addr 80201120 pferr 3 P|W
 qemu-system-x86-952 [000] 261013.227407: mm_page_free: page=0xf5bcc9e0 pfn=4122790368 order=0
 qemu-system-x86-954 [001] 261013.227407: kvm_mmu_paging_element: pte 188001 level 3
 qemu-system-x86-952 [000] 261013.227407: mm_page_free_batched: page=0xf5bcc9c0 pfn=4122790336 order=0 cold=0
 qemu-system-x86-954 [001] 261013.227407: kvm_mmu_paging_element: pte 39b863 level 2

or

 qemu-system-x86-954 [001] 261013.276282: kvm_mmu_paging_element: pte 188001 level 3
 qemu-system-x86-954 [001] 261013.276283: kvm_mmu_paging_element: pte 39b863 level 2
 qemu-system-x86-954 [001] 261013.276283: kvm_mmu_paging_element: pte 80188963 level 1
 qemu-system-x86-954 [001] 261013.276284: rcu_utilization: Start context switch
 qemu-system-x86-954 [001] 261013.276284: rcu_utilization: End context switch
 qemu-system-x86-954 [001] 261013.276284: kvm_entry: vcpu 0
 qemu-system-x86-954 [001] 261013.276285: kvm_exit: reason EXCEPTION_NMI rip 0x812a1d83 info 80201120 8b0e
 qemu-system-x86-954 [001] 261013.276286: kvm_page_fault: address 80201120 error_code 3
 qemu-system-x86-954 [001] 261013.276286: kvm_mmu_pagetable_walk: addr 80201120 pferr 3 P|W
 qemu-system-x86-954 [001] 261013.276286: kvm_mmu_paging_element: pte 188001 level 3
 

Re: [PATCH v3 0/4] Make kvm_device_ops registration dynamic

2014-09-16 Thread Paolo Bonzini
On 02/09/2014 11:27, Will Deacon wrote:
 The mpic, flic and xics are still not ported over, as I don't want to
 risk breaking those devices

Actually FLIC is ported. :)

 
  arch/s390/kvm/kvm-s390.c |   3 +-
  arch/s390/kvm/kvm-s390.h |   1 +
  include/linux/kvm_host.h |   4 +-
  include/uapi/linux/kvm.h |  22 +--
  virt/kvm/arm/vgic.c  | 157 ---
  virt/kvm/kvm_main.c  |  57 +
  virt/kvm/vfio.c  |  22 ---
  7 files changed, 142 insertions(+), 124 deletions(-)
 

Thanks, applying to kvm/queue.  Alex (Graf) and Paul, can you look at
MPIC and XICS?

Paolo


[PATCH v6 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-16 Thread Tang Chen
This patch only handles the situation where the L1 and L2 VMs share one
apic access page.

When the L1 VM is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload the apic
access page physical address for all the vcpus' vmcs (which is done by
patch 5/6). And when it enters the L2 VM, L2's vmcs will be updated in
prepare_vmcs02() called by nested_vm_run(). So we need to do nothing.

When the L2 VM is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload the apic
access page physical address for all L2 vmcs. And this patch requests an
apic access page reload in the L2->L1 vmexit.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx.c  | 6 ++
 arch/x86/kvm/x86.c  | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..92b3e72 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1046,6 +1046,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a1a9797..d0d5981 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8795,6 +8795,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
}
 
 	/*
+	 * We are now running in L2, mmu_notifier will force to reload the
+	 * page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
+	 */
+	kvm_vcpu_reload_apic_access_page(vcpu);
+
+	/*
 	 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 	 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 	 * success or failure flag accordingly.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 27c3d30..3f458b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
/*
 * apic access page could be migrated. When the page is being migrated,
@@ -6001,6 +6001,7 @@ static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
 			page_to_phys(vcpu->kvm->arch.apic_access_page));
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
-- 
1.8.3.1



[PATCH v6 6/6] kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

2014-09-16 Thread Tang Chen
To make the apic access page migratable, we do not pin it in memory now.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpus would access
kvm_arch->apic_access_page without any locking. This is not safe.

Actually, we do not need kvm_arch->apic_access_page anymore. Since the
apic access page is not pinned in memory now, we can remove
kvm_arch->apic_access_page. When we need to write its physical address
into the vmcs, use gfn_to_page() to get its page struct, which will also
pin it. And unpin it afterwards.

Suggested-by: Gleb Natapov g...@kernel.org
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c  | 15 +--
 arch/x86/kvm/x86.c  | 15 +--
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 92b3e72..9daf754 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
 
unsigned int tss_addr;
-   struct page *apic_access_page;
+   bool apic_access_page_done;
 
gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d0d5981..61f3854 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int r = 0;
 
 	mutex_lock(&kvm->slots_lock);
-	if (kvm->arch.apic_access_page)
+	if (kvm->arch.apic_access_page_done)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
@@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
}
 
-	kvm->arch.apic_access_page = page;
+	/*
+	 * Do not pin apic access page in memory so that memory hotplug
+	 * process is able to migrate it.
+	 */
+	put_page(page);
+	kvm->arch.apic_access_page_done = true;
 out:
 	mutex_unlock(&kvm->slots_lock);
return r;
@@ -4534,8 +4539,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
}
 
 	if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
-		vmcs_write64(APIC_ACCESS_ADDR,
-			page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
+		kvm_vcpu_reload_apic_access_page(vcpu);
 
 	if (vmx_vm_has_apicv(vcpu->kvm))
 		memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
@@ -7995,8 +7999,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
 		exec_control |=
 			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-		vmcs_write64(APIC_ACCESS_ADDR,
-			page_to_phys(vcpu->kvm->arch.apic_access_page));
+		kvm_vcpu_reload_apic_access_page(vcpu);
}
 
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3f458b2..9094e13 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,15 +5991,20 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+   struct page *page;
+
/*
 * apic access page could be migrated. When the page is being migrated,
 * GUP will wait till the migrate entry is replaced with the new pte
 * entry pointing to the new page.
 */
-	vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
-			APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
-			page_to_phys(vcpu->kvm->arch.apic_access_page));
+	page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm, page_to_phys(page));
+   /*
+* Do not pin apic access page in memory so that memory hotplug
+* process is able to migrate it.
+*/
+   put_page(page);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
@@ -7253,8 +7258,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kfree(kvm->arch.vpic);
 	kfree(kvm->arch.vioapic);
 	kvm_free_vcpus(kvm);
-	if (kvm->arch.apic_access_page)
-		put_page(kvm->arch.apic_access_page);
 	kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 }
 
-- 
1.8.3.1



[PATCH v6 4/6] kvm, mem-hotplug: Reload L1's apic access page on migration in vcpu_enter_guest().

2014-09-16 Thread Tang Chen
The apic access page is pinned in memory. As a result, it cannot be
migrated/hot-removed. Actually, it is not necessary for it to be pinned.

The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the page is migrated, kvm_mmu_notifier_invalidate_page()
will invalidate the corresponding ept entry. This patch introduces a new
vcpu request named KVM_REQ_APIC_PAGE_RELOAD, makes this request to all
the vcpus at this time, and forces all the vcpus to exit the guest and
re-enter until they have updated the VMCS APIC_ACCESS_ADDR pointer to
the new apic access page address and updated kvm->arch.apic_access_page
to the new page.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  6 ++
 arch/x86/kvm/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 15 +++
 include/linux/kvm_host.h|  2 ++
 virt/kvm/kvm_main.c | 12 
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72a0470..a1a9797 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7090,6 +7090,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
u16 status;
@@ -8909,6 +8914,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..27c3d30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * apic access page could be migrated. When the page is being migrated,
+	 * GUP will wait till the migrate entry is replaced with the new pte
+	 * entry pointing to the new page.
+	 */
+	vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+			APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+			page_to_phys(vcpu->kvm->arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+   if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+   kvm_vcpu_reload_apic_access_page(vcpu);
}
 
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 
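
Taken together, the patch wires up two sides of one handshake. The
following is a condensed restatement of hunks from this series (the
requester is the kvm_reload_apic_access_page() helper this series adds
to virt/kvm/kvm_main.c), not new functionality:

  /* Requester side: run from the mmu_notifier when the backing
   * page of the apic access page is about to move. */
  void kvm_reload_apic_access_page(struct kvm *kvm)
  {
  	make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
  }

  /* Handler side: each vcpu observes the request in vcpu_enter_guest()
   * before re-entering the guest and rewrites APIC_ACCESS_ADDR via
   * kvm_vcpu_reload_apic_access_page(). */
  if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
  	kvm_vcpu_reload_apic_access_page(vcpu);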

[PATCH v6 3/6] kvm: Make init_rmode_identity_map() return 0 on success.

2014-09-16 Thread Tang Chen
In init_rmode_identity_map(), there are two variables indicating the
return value, r and ret, and it returns 0 on error, 1 on success. The
function is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and makes
init_rmode_identity_map() return 0 on success, -errno on failure.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4fb84ad..72a0470 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret = 0;
+   int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
-   return 1;
+   return 0;
 
 	/* Protect kvm->arch.ept_identity_pagetable_done. */
 	mutex_lock(&kvm->slots_lock);
 
-	if (likely(kvm->arch.ept_identity_pagetable_done)) {
-		ret = 1;
+	if (likely(kvm->arch.ept_identity_pagetable_done))
 		goto out2;
-	}
 
 	identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-   r = alloc_identity_pagetable(kvm);
-   if (r)
+   ret = alloc_identity_pagetable(kvm);
+   if (ret)
goto out2;
 
 	idx = srcu_read_lock(&kvm->srcu);
-	r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-	if (r < 0)
+	ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+	if (ret)
 		goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i  PT32_ENT_PER_PAGE; i++) {
 		tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
 			_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-		r = kvm_write_guest_page(kvm, identity_map_pfn,
+		ret = kvm_write_guest_page(kvm, identity_map_pfn,
 				&tmp, i * sizeof(tmp), sizeof(tmp));
-		if (r < 0)
+		if (ret)
 			goto out;
}
kvm-arch.ept_identity_pagetable_done = true;
-   ret = 1;
+
 out:
 	srcu_read_unlock(&kvm->srcu, idx);
-
 out2:
 	mutex_unlock(&kvm->slots_lock);
return ret;
@@ -7604,11 +7601,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   /* Set err to -ENOMEM to handle memory allocation error. */
+   err = -ENOMEM;
+
 	vmx->guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
 	BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx->guest_msrs[0])
 		     > PAGE_SIZE);
 
-   err = -ENOMEM;
 	if (!vmx->guest_msrs) {
goto uninit_vcpu;
}
@@ -7641,8 +7640,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	if (!kvm->arch.ept_identity_map_addr)
 		kvm->arch.ept_identity_map_addr =
 			VMX_EPT_IDENTITY_PAGETABLE_ADDR;
-   err = -ENOMEM;
-   if (!init_rmode_identity_map(kvm))
+   err = init_rmode_identity_map(kvm);
+		if (err < 0)
goto free_vmcs;
}
 
-- 
1.8.3.1



[PATCH v6 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-16 Thread Tang Chen
The ept identity pagetable and the apic access page in kvm are pinned in
memory. As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For the ept identity page]
Just do not pin it. When it is migrated, the guest will be able to find
the new page in the next ept violation.

[For the apic access page]
The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the apic access page is migrated, we additionally update
the VMCS APIC_ACCESS_ADDR pointer for each vcpu.

NOTE: Tested with the -cpu xxx,-x2apic option.
      But since a nested vm pins some other pages in memory, if the user
      uses a nested vm, memory hot-remove will not work.

Change log v5 -> v6:
1. Patch 1/6 has been applied by Paolo Bonzini pbonz...@redhat.com, just
   resend it.
2. Simplify the comment in alloc_identity_pagetable() and add a BUG_ON()
   in patch 2/6.
3. Move the err initialization forward in patch 3/6.
4. Rename vcpu_reload_apic_access_page() to
   kvm_vcpu_reload_apic_access_page() and use it instead of
   kvm_reload_apic_access_page() in nested_vmx_vmexit() in patch 5/6.
5. Reuse kvm_vcpu_reload_apic_access_page() in prepare_vmcs02() and
   vmx_vcpu_reset() in patch 6/6.
6. Remove original patch 7 since we are not able to handle the situation
   in a nested vm.

Tang Chen (6):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1's apic access page on migration in
    vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
    running.
  kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

 arch/x86/include/asm/kvm_host.h |  5 ++-
 arch/x86/kvm/svm.c  |  9 +++-
 arch/x86/kvm/vmx.c  | 95 +++--
 arch/x86/kvm/x86.c  | 25 +--
 include/linux/kvm_host.h|  2 +
 virt/kvm/kvm_main.c | 12 ++
 6 files changed, 99 insertions(+), 49 deletions(-)

-- 
1.8.3.1



[PATCH v6 2/6] kvm: Remove ept_identity_pagetable from struct kvm_arch.

2014-09-16 Thread Tang Chen
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page.
But it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. whether the ept page is allocated
2. whether a memory slot for the identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the
ept identity pagetable is initialized. So we can remove
ept_identity_pagetable.

NOTE: In the original code, the ept identity pagetable page is pinned in
      memory. As a result, it cannot be migrated/hot-removed. After this
      patch, since kvm_arch->ept_identity_pagetable is removed, the ept
      identity pagetable page is no longer pinned in memory. And it can
      be migrated/hot-removed.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c  | 47 +++--
 arch/x86/kvm/x86.c  |  2 --
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
-   struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..4fb84ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-   int i, idx, r, ret;
+   int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
 
if (!enable_ept)
return 1;
-	if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-		printk(KERN_ERR "EPT: identity-mapping pagetable "
-			"haven't been allocated!\n");
-		return 0;
+
+	/* Protect kvm->arch.ept_identity_pagetable_done. */
+	mutex_lock(&kvm->slots_lock);
+
+	if (likely(kvm->arch.ept_identity_pagetable_done)) {
+		ret = 1;
+		goto out2;
 	}
-	if (likely(kvm->arch.ept_identity_pagetable_done))
-		return 1;
-	ret = 0;
+
 	identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+   r = alloc_identity_pagetable(kvm);
+   if (r)
+   goto out2;
+
 	idx = srcu_read_lock(&kvm->srcu);
 	r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
 	if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
 out:
 	srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+	mutex_unlock(&kvm->slots_lock);
return ret;
 }
 
@@ -4019,31 +4029,20 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-   struct page *page;
+   /* Called with kvm-slots_lock held. */
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
 
-	mutex_lock(&kvm->slots_lock);
-	if (kvm->arch.ept_identity_pagetable)
-		goto out;
+	BUG_ON(kvm->arch.ept_identity_pagetable_done);
+
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
 			kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
-   if (r)
-   goto out;
-
-	page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-	if (is_error_page(page)) {
-		r = -EFAULT;
-		goto out;
-	}
 
-	kvm->arch.ept_identity_pagetable = page;
-out:
-	mutex_unlock(&kvm->slots_lock);
return r;
 }
 
@@ -7643,8 +7642,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 		kvm->arch.ept_identity_map_addr =
 			VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
-   if (alloc_identity_pagetable(kvm) != 0)
-   goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7239,8 +7239,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_free_vcpus(kvm);
if 

[PATCH v6 1/6] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

2014-09-16 Thread Tang Chen
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the
address of the apic access page. So use this macro.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	svm->asid_generation = 0;
 	init_vmcb(svm);
 
-	svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+	svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+				   MSR_IA32_APICBASE_ENABLE;
 	if (kvm_vcpu_is_bsp(&svm->vcpu))
 		svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
-	kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+	kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, kvm_userspace_mem);
if (r)
goto out;
 
-   page = gfn_to_page(kvm, 0xfee00);
+	page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
 	kvm_set_cr8(&vmx->vcpu, 0);
-	apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+	apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
 	if (kvm_vcpu_is_bsp(&vmx->vcpu))
 		apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
 	apic_base_msr.host_initiated = true;
apic_base_msr.host_initiated = true;
-- 
1.8.3.1



Re: [PATCH v6 4/6] kvm, mem-hotplug: Reload L1's apic access page on migration in vcpu_enter_guest().

2014-09-16 Thread Paolo Bonzini
On 16/09/2014 12:42, Tang Chen wrote:
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 33712fb..0df82c1 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -210,6 +210,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
   make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
  }
  
 +void kvm_reload_apic_access_page(struct kvm *kvm)
 +{
 + make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
 +}
 +
  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
  {
   struct page *page;
 @@ -294,6 +299,13 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
   if (need_tlb_flush)
   kvm_flush_remote_tlbs(kvm);
  
 + /*
 +  * The physical address of apic access page is stored in VMCS.
 +  * Update it when it becomes invalid.
 +  */
 +	if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
 + kvm_reload_apic_access_page(kvm);

This cannot be in the generic code.  It is architecture-specific.
Please add a new function kvm_arch_mmu_notifier_invalidate_page, and
call it outside the mmu_lock.

kvm_reload_apic_access_page need not be in virt/kvm/kvm_main.c, either.

Paolo

  	spin_unlock(&kvm->mmu_lock);
  	srcu_read_unlock(&kvm->srcu, idx);
  }
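
A minimal sketch of the suggested split (the hook name follows the
suggestion above; the weak-default wiring is an assumption about how it
would be plumbed in, not code from this series):

  /* virt/kvm/kvm_main.c: weak no-op default for architectures that
   * need nothing beyond the generic invalidation handling. */
  void __weak kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
  						    unsigned long address)
  {
  }

  /* arch/x86/kvm/x86.c: x86 override, called outside mmu_lock. */
  void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
  					     unsigned long address)
  {
  	/*
  	 * The physical address of the apic access page is stored in
  	 * the VMCS, so request a reload when the page goes away.
  	 */
  	if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
  		kvm_reload_apic_access_page(kvm);
  }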



Re: [PATCH v6 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.

2014-09-16 Thread Paolo Bonzini
On 16/09/2014 12:42, Tang Chen wrote:
 This patch only handles the situation where the L1 and L2 VMs share
 one apic access page.
 
 When the L1 VM is running, if the shared apic access page is migrated,
 mmu_notifier will request all vcpus to exit to L0, and reload the apic
 access page physical address for all the vcpus' vmcs (which is done by
 patch 5/6). And when it enters the L2 VM, L2's vmcs will be updated in
 prepare_vmcs02() called by nested_vm_run(). So we need to do nothing.
 
 When the L2 VM is running, if the shared apic access page is migrated,
 mmu_notifier will request all vcpus to exit to L0, and reload the apic
 access page physical address for all L2 vmcs. And this patch requests
 an apic access page reload in the L2->L1 vmexit.
 
 Signed-off-by: Tang Chen tangc...@cn.fujitsu.com

But if kvm_vcpu_reload_apic_access_page is called when the active VMCS
is a VMCS02, the APIC access address will be corrupted, no?

So, even if you are not touching the pages pinned by nested virt, you
need an

	if (!is_guest_mode(vcpu) ||
	    !(vmx->nested.current_vmcs12->secondary_vm_exec_control &
	      SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))

as suggested by Gleb in the review of v5.

Paolo

 ---
  arch/x86/include/asm/kvm_host.h | 1 +
  arch/x86/kvm/vmx.c  | 6 ++
  arch/x86/kvm/x86.c  | 3 ++-
  3 files changed, 9 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index 514183e..92b3e72 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -1046,6 +1046,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
  int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
  void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
 +void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
  
  void kvm_define_shared_msr(unsigned index, u32 msr);
  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index a1a9797..d0d5981 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -8795,6 +8795,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
   }
  
   /*
 +  * We are now running in L2, mmu_notifier will force to reload the
 +  * page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
 +  */
 + kvm_vcpu_reload_apic_access_page(vcpu);
 +
 + /*
* Exiting from L2 to L1, we're now back to L1 which thinks it just
* finished a VMLAUNCH or VMRESUME instruction, so we need to set the
* success or failure flag accordingly.
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 27c3d30..3f458b2 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
   kvm_apic_update_tmr(vcpu, tmr);
  }
  
 -static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 +void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
  {
   /*
* apic access page could be migrated. When the page is being migrated,
 @@ -6001,6 +6001,7 @@ static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
  	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
  			page_to_phys(vcpu->kvm->arch.apic_access_page));
  }
 +EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
  
  /*
   * Returns 1 to let __vcpu_run() continue the guest execution loop without
 



[PATCH] kvm: Make init_rmode_tss() return 0 on success.

2014-09-16 Thread Paolo Bonzini
In init_rmode_tss(), there are two variables indicating the return
value, r and ret, and it returns 0 on error, 1 on success. The function
is only called by vmx_set_tss_addr(), and r is redundant.

This patch removes the redundant variable, by making init_rmode_tss()
return 0 on success, -errno on failure.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/vmx.c |   13 +++--
 1 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 41bbddb..1e6b493 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3930,7 +3930,7 @@ static int init_rmode_tss(struct kvm *kvm)
 {
gfn_t fn;
u16 data = 0;
-   int r, idx, ret = 0;
+   int idx, r;
 
 	idx = srcu_read_lock(&kvm->srcu);
 	fn = kvm->arch.tss_addr >> PAGE_SHIFT;
@@ -3952,13 +3952,9 @@ static int init_rmode_tss(struct kvm *kvm)
 	r = kvm_write_guest_page(kvm, fn, &data,
 				 RMODE_TSS_SIZE - 2 * PAGE_SIZE - 1,
 				 sizeof(u8));
-	if (r < 0)
-		goto out;
-
-   ret = 1;
 out:
 	srcu_read_unlock(&kvm->srcu, idx);
-   return ret;
+   return r;
 }
 
 static int init_rmode_identity_map(struct kvm *kvm)
@@ -4750,10 +4746,7 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
if (ret)
return ret;
 	kvm->arch.tss_addr = addr;
-   if (!init_rmode_tss(kvm))
-   return  -ENOMEM;
-
-   return 0;
+   return init_rmode_tss(kvm);
 }
 
 static bool rmode_exception(struct kvm_vcpu *vcpu, int vec)
-- 
1.7.1



Re: [PATCH v6 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-09-16 Thread Paolo Bonzini
On 16/09/2014 12:41, Tang Chen wrote:
 ept identity pagetable and apic access page in kvm are pinned in memory.
 As a result, they cannot be migrated/hot-removed.
 
 But actually they don't need to be pinned in memory.
 
 [For ept identity page]
 Just do not pin it. When it is migrated, guest will be able to find the
 new page in the next ept violation.
 
 [For apic access page]
 The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
 When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
 for each vcpu in addition.
 
 NOTE: Tested with -cpu xxx,-x2apic option.
   But since nested vm pins some other pages in memory, if user uses nested
   vm, memory hot-remove will not work.
 
 Change log v5 -> v6:
 1. Patch 1/6 has been applied by Paolo Bonzini pbonz...@redhat.com,
    just resend it.
 2. Simplify comment in alloc_identity_pagetable() and add a BUG_ON()
    in patch 2/6.
 3. Move err initialization forward in patch 3/6.
 4. Rename vcpu_reload_apic_access_page() to
    kvm_vcpu_reload_apic_access_page() and use it instead of
    kvm_reload_apic_access_page() in nested_vmx_vmexit() in patch 5/6.
 5. Reuse kvm_vcpu_reload_apic_access_page() in prepare_vmcs02() and
    vmx_vcpu_reset() in patch 6/6.
 6. Remove original patch 7 since we are not able to handle the
    situation in a nested vm.

I'll push 1-3 soon to kvm/queue.  I think v7 will be good. :)

Paolo


Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off

2014-09-16 Thread Andrew Jones


- Original Message -
 On 16/09/2014 04:06, Andrew Jones wrote:
  We shouldn't try Load-Exclusive instructions unless we've enabled memory
  management, as these instructions depend on the data cache unit's
  coherency monitor. This patch adds a new setup boolean, initialized to
  false,
  that is used to guard Load-Exclusive instructions. Eventually we'll add
  more
  setup code that sets it true.
  
  Note: This problem became visible on boards with Cortex-A7 processors.
  Testing
  with Cortex-A15 didn't expose it, as those may have an external coherency
  monitor that still allows the instruction to execute (on A7s we got data
  aborts). Although even on A15's it's not clear from the spec if the
  instructions will behave as expected while caches are off, so we no longer
  allow Load-Exclusive instructions on those processors without caches
  enabled
  either.
  
  Signed-off-by: Andrew Jones drjo...@redhat.com
  ---
   lib/arm/asm/setup.h |  2 ++
   lib/arm/setup.c |  1 +
   lib/arm/spinlock.c  | 10 ++
   3 files changed, 13 insertions(+)
  
  diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h
  index 21445ef2085fc..9c54c184e2866 100644
  --- a/lib/arm/asm/setup.h
  +++ b/lib/arm/asm/setup.h
  @@ -20,6 +20,8 @@ extern phys_addr_t __phys_offset, __phys_end;
    #define PHYS_SIZE  (1ULL << PHYS_SHIFT)
   #define PHYS_MASK  (PHYS_SIZE - 1ULL)
   
  +extern bool mem_caches_enabled;
  +
   #define L1_CACHE_SHIFT 6
    #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
   #define SMP_CACHE_BYTESL1_CACHE_BYTES
  diff --git a/lib/arm/setup.c b/lib/arm/setup.c
  index 3941c9757dcb2..f7ed639c9d499 100644
  --- a/lib/arm/setup.c
  +++ b/lib/arm/setup.c
  @@ -25,6 +25,7 @@ u32 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0UL) };
   int nr_cpus;
   
   phys_addr_t __phys_offset, __phys_end;
  +bool mem_caches_enabled;
   
   static void cpu_set(int fdtnode __unused, u32 regval, void *info __unused)
   {
  diff --git a/lib/arm/spinlock.c b/lib/arm/spinlock.c
  index d8a6d4c3383d6..43539c5e84062 100644
  --- a/lib/arm/spinlock.c
  +++ b/lib/arm/spinlock.c
  @@ -1,12 +1,22 @@
    #include "libcflat.h"
    #include <asm/spinlock.h>
    #include <asm/barrier.h>
   +#include <asm/setup.h>
   
   void spin_lock(struct spinlock *lock)
   {
  u32 val, fail;
   
  dmb();
  +
  +   /*
  +* Without caches enabled Load-Exclusive instructions may fail.
  +* In that case we do nothing, and just hope the caller knows
  +* what they're doing.
  +*/
  +   if (!mem_caches_enabled)
  +   return;
 
 Should it at least write 1 to the spinlock?

I thought about that. On one hand we might get a somewhat functional
synchronization mechanism, which may be enough for some unit test that
doesn't enable caches, but still needs it. On the other hand, we know
it's broken, so we don't really want any unit tests that need
synchronization and don't enable caches. I chose not to write a 1 in
the hope that if a unit test introduces a race, that race will be
easier to expose and fix. That said, I'm not strongly biased, as we'd
still have a race either way, which may or may not be easy to expose.
So if the majority prefers a best-effort approach, then I'll spin a v2
(see the sketch after this message).

drew

 
 Paolo
 
  do {
  asm volatile(
   "1:	ldrex	%0, [%2]\n"
  
 
 
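For concreteness, the best-effort v2 variant discussed above could look
like this (a hypothetical sketch, assuming the lock word is struct
spinlock's int member v; the plain store gives no mutual exclusion, it
only keeps the lock word consistent for spin_unlock()):

  void spin_lock(struct spinlock *lock)
  {
  	u32 val, fail;

  	dmb();

  	/*
  	 * Without caches enabled, Load-Exclusive may fault, so fall
  	 * back to a plain store. This is NOT a real lock: it cannot
  	 * exclude a concurrent locker, it only marks the lock taken.
  	 */
  	if (!mem_caches_enabled) {
  		lock->v = 1;
  		return;
  	}

  	do {
  		asm volatile(
  		"1:	ldrex	%0, [%2]\n"
  		"	teq	%0, #0\n"
  		"	bne	1b\n"
  		"	mov	%0, #1\n"
  		"	strex	%1, %0, [%2]\n"
  		: "=&r" (val), "=&r" (fail)
  		: "r" (&lock->v)
  		: "cc" );
  	} while (fail);
  	dmb();
  }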


[PATCH 3/3] KVM: x86: Using cpuid structs in KVM

2014-09-16 Thread Nadav Amit
Use cpuid structs in KVM to eliminate cryptic code with many bit
operations. The code does not introduce functional changes.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/cpuid.c | 36 ++--
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..a7479ab 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -18,6 +18,7 @@
 #include <linux/uaccess.h>
 #include <asm/user.h>
 #include <asm/xsave.h>
+#include <asm/cpuid_def.h>
 #include "cpuid.h"
 #include "lapic.h"
 #include "mmu.h"
@@ -357,16 +358,18 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
}
/* function 4 has additional index. */
case 4: {
-   int i, cache_type;
+   int i;
 
 		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
/* read more entries until cache_type is zero */
for (i = 1; ; ++i) {
+   union cpuid4_eax eax;
+
if (*nent = maxnent)
goto out;
 
-			cache_type = entry[i - 1].eax & 0x1f;
-			if (!cache_type)
+			eax.full = entry[i - 1].eax;
+			if (!eax.split.cache_type)
break;
do_cpuid_1_ent(entry[i], function, i);
entry[i].flags |=
@@ -423,16 +426,18 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
}
/* function 0xb has additional index. */
case 0xb: {
-   int i, level_type;
+   int i;
 
 		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
/* read more entries until level_type is zero */
for (i = 1; ; ++i) {
+   union cpuid11_ecx ecx;
+
if (*nent = maxnent)
goto out;
 
-			level_type = entry[i - 1].ecx & 0xff00;
-			if (!level_type)
+			ecx.full = entry[i - 1].ecx;
+			if (!ecx.split.level_type)
break;
do_cpuid_1_ent(entry[i], function, i);
entry[i].flags |=
@@ -505,13 +510,13 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		entry->eax = entry->ebx = entry->ecx = 0;
 		break;
 	case 0x80000008: {
-		unsigned g_phys_as = (entry->eax >> 16) & 0xff;
-		unsigned virt_as = max((entry->eax >> 8) & 0xff, 48U);
-		unsigned phys_as = entry->eax & 0xff;
+		union cpuid_8_8_eax eax;
 
-		if (!g_phys_as)
-			g_phys_as = phys_as;
-		entry->eax = g_phys_as | (virt_as << 8);
+		eax.full = entry->eax;
+		eax.split.virt_as = max_t(int, eax.split.virt_as, 48);
+		if (!eax.split.guest_phys_as)
+			eax.split.guest_phys_as = eax.split.phys_as;
+		entry->eax = eax.full;
 		entry->ebx = entry->edx = 0;
break;
}
@@ -724,13 +729,16 @@ EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
+   union cpuid_8_8_eax eax;
 
 	best = kvm_find_cpuid_entry(vcpu, 0x80000000, 0);
 	if (!best || best->eax < 0x80000008)
 		goto not_found;
 	best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
-	if (best)
-		return best->eax & 0xff;
+	if (best) {
+		eax.full = best->eax;
+		return eax.split.phys_as;
+	}
 not_found:
return 36;
 }
-- 
1.9.1



[PATCH 2/3] x86: Use new cpuid structs in cpuid functions

2014-09-16 Thread Nadav Amit
The current code that decodes cpuid fields is somewhat cryptic, since it
uses many bit operations. Use cpuid structs instead to clarify the code.
This introduces no functional change.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kernel/cpu/common.c | 56 +++-
 1 file changed, 34 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e4ab2b4..b57c160 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -41,6 +41,7 @@
 #include <asm/pat.h>
 #include <asm/microcode.h>
 #include <asm/microcode_intel.h>
+#include <asm/cpuid_def.h>
 
 #ifdef CONFIG_X86_LOCAL_APIC
 #include <asm/uv/uv.h>
@@ -444,13 +445,17 @@ static void get_model_name(struct cpuinfo_x86 *c)
 
 void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
 {
-   unsigned int n, dummy, ebx, ecx, edx, l2size;
+   unsigned int n, dummy, dummy2, l2size;
+   union cpuid8_5_ecx_edx ecx5, edx5;
+   union cpuid8_6_ebx ebx6;
+   union cpuid8_6_ecx ecx6;
 
 	n = c->extended_cpuid_level;
 
 	if (n >= 0x80000005) {
-		cpuid(0x80000005, &dummy, &ebx, &ecx, &edx);
-		c->x86_cache_size = (ecx>>24) + (edx>>24);
+		cpuid(0x80000005, &dummy, &dummy2, &ecx5.full, &edx5.full);
+		c->x86_cache_size = ecx5.split.cache_size +
+				    edx5.split.cache_size;
 #ifdef CONFIG_X86_64
/* On K8 L1 TLB is inclusive, so don't count it */
c-x86_tlbsize = 0;
@@ -460,11 +465,11 @@ void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
 	if (n < 0x80000006)	/* Some chips just has a large L1. */
 		return;
 
-	cpuid(0x80000006, &dummy, &ebx, &ecx, &edx);
-	l2size = ecx >> 16;
+	cpuid(0x80000006, &dummy, &ebx6.full, &ecx6.full, &dummy2);
+	l2size = ecx6.split.cache_size;
 
 #ifdef CONFIG_X86_64
-	c->x86_tlbsize += ((ebx >> 16) & 0xfff) + (ebx & 0xfff);
+	c->x86_tlbsize += ebx6.split.dtlb_entries + ebx6.split.itlb_entries;
 #else
/* do processor-specific cache resizing */
 	if (this_cpu->legacy_cache_size)
@@ -505,9 +510,10 @@ void cpu_detect_tlb(struct cpuinfo_x86 *c)
 void detect_ht(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_X86_HT
-   u32 eax, ebx, ecx, edx;
+   u32 eax, ecx, edx;
int index_msb, core_bits;
static bool printed;
+   union cpuid1_ebx ebx;
 
if (!cpu_has(c, X86_FEATURE_HT))
return;
@@ -518,9 +524,9 @@ void detect_ht(struct cpuinfo_x86 *c)
if (cpu_has(c, X86_FEATURE_XTOPOLOGY))
return;
 
-	cpuid(1, &eax, &ebx, &ecx, &edx);
+	cpuid(1, &eax, &ebx.full, &ecx, &edx);
 
-	smp_num_siblings = (ebx & 0xff0000) >> 16;
+	smp_num_siblings = ebx.split.max_logical_proc;
 
if (smp_num_siblings == 1) {
 		printk_once(KERN_INFO "CPU0: Hyper-Threading is disabled\n");
@@ -591,20 +597,22 @@ void cpu_detect(struct cpuinfo_x86 *c)
c-x86 = 4;
/* Intel-defined flags: level 0x0001 */
 	if (c->cpuid_level >= 0x00000001) {
-   u32 junk, tfms, cap0, misc;
+   u32 junk, cap0;
+   union cpuid1_eax tfms;
+   union cpuid1_ebx misc;
 
-		cpuid(0x00000001, &tfms, &misc, &junk, &cap0);
-		c->x86 = (tfms >> 8) & 0xf;
-		c->x86_model = (tfms >> 4) & 0xf;
-		c->x86_mask = tfms & 0xf;
+		cpuid(0x00000001, &tfms.full, &misc.full, &junk, &cap0);
+		c->x86 = tfms.split.family_id;
+		c->x86_model = tfms.split.model;
+		c->x86_mask = tfms.split.stepping_id;
 
 		if (c->x86 == 0xf)
-			c->x86 += (tfms >> 20) & 0xff;
+			c->x86 += tfms.split.extended_family_id;
 		if (c->x86 >= 0x6)
-			c->x86_model += ((tfms >> 16) & 0xf) << 4;
+			c->x86_model += tfms.split.extended_model_id << 4;
 
-		if (cap0 & (1<<19)) {
-			c->x86_clflush_size = ((misc >> 8) & 0xff) * 8;
+		if (cap0 & (1 << (X86_FEATURE_CLFLUSH & 31))) {
+			c->x86_clflush_size = misc.split.clflush_size * 8;
 			c->x86_cache_alignment = c->x86_clflush_size;
 		}
}
@@ -654,10 +662,11 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
}
 
 	if (c->extended_cpuid_level >= 0x80000008) {
-		u32 eax = cpuid_eax(0x80000008);
+		union cpuid_8_8_eax eax;
 
-		c->x86_virt_bits = (eax >> 8) & 0xff;
-		c->x86_phys_bits = eax & 0xff;
+		eax.full = cpuid_eax(0x80000008);
+		c->x86_virt_bits = eax.split.virt_as;
+		c->x86_phys_bits = eax.split.phys_as;
}
 #ifdef CONFIG_X86_32
else if (cpu_has(c, X86_FEATURE_PAE) || cpu_has(c, X86_FEATURE_PSE36))
@@ -814,7 +823,10 @@ static void generic_identify(struct cpuinfo_x86 *c)

[PATCH 0/3] x86: structs for cpuid info in x86

2014-09-16 Thread Nadav Amit
The code that deals with x86 cpuid fields is hard to follow since it
performs many bit operations and does not refer to cpuid fields
explicitly. To eliminate the need to open the spec whenever dealing
with cpuid fields, this patch set introduces structs that reflect the
various cpuid functions.

Thanks for reviewing the patch set.

Nadav Amit (3):
  x86: Adding structs to reflect cpuid fields
  x86: Use new cpuid structs in cpuid functions
  KVM: x86: Using cpuid structs in KVM

 arch/x86/include/asm/cpuid_def.h | 163 +++
 arch/x86/kernel/cpu/common.c |  56 --
 arch/x86/kvm/cpuid.c |  36 +
 3 files changed, 219 insertions(+), 36 deletions(-)
 create mode 100644 arch/x86/include/asm/cpuid_def.h

-- 
1.9.1
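
As a standalone illustration of the intent, here is a minimal userspace
sketch (not part of the series; the union mirrors cpuid_8_8_eax from
patch 1/3, while the reserved field and the raw value are made up for
the example):

  #include <stdio.h>

  union cpuid_8_8_eax {
  	struct {
  		unsigned int phys_as:8;
  		unsigned int virt_as:8;
  		unsigned int guest_phys_as:8;
  		unsigned int reserved:8;
  	} split;
  	unsigned int full;
  };

  int main(void)
  {
  	unsigned int raw = 0x00003028;	/* pretend CPUID.80000008H:EAX */
  	union cpuid_8_8_eax eax = { .full = raw };

  	/* Old style: phys = raw & 0xff; virt = (raw >> 8) & 0xff; */
  	printf("phys_as=%u virt_as=%u\n",
  	       eax.split.phys_as, eax.split.virt_as);	/* 40 and 48 */
  	return 0;
  }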



[PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-16 Thread Nadav Amit
Add structs that reflect various cpuid fields in the x86 architecture.
Structs were added only for functions that are not pure bitmaps.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/cpuid_def.h | 163 +++
 1 file changed, 163 insertions(+)
 create mode 100644 arch/x86/include/asm/cpuid_def.h

diff --git a/arch/x86/include/asm/cpuid_def.h b/arch/x86/include/asm/cpuid_def.h
new file mode 100644
index 000..0cf43ba
--- /dev/null
+++ b/arch/x86/include/asm/cpuid_def.h
@@ -0,0 +1,163 @@
+#ifndef ARCH_X86_KVM_CPUID_DEF_H
+#define ARCH_X86_KVM_CPUID_DEF_H
+
+union cpuid1_eax {
+   struct {
+   unsigned int stepping_id:4;
+   unsigned int model:4;
+   unsigned int family_id:4;
+   unsigned int processor_type:2;
+   unsigned int reserved:2;
+   unsigned int extended_model_id:4;
+   unsigned int extended_family_id:8;
+   unsigned int reserved2:4;
+   } split;
+   unsigned int full;
+};
+
+union cpuid1_ebx {
+   struct {
+   unsigned int brand_index:8;
+   unsigned int clflush_size:8;
+   unsigned int max_logical_proc:8;
+   unsigned int initial_apicid:8;
+   } split;
+   unsigned int full;
+};
+
+union cpuid4_eax {
+   struct {
+   unsigned int cache_type:5;
+   unsigned int cache_level:3;
+   unsigned int self_init_cache_level:1;
+   unsigned int fully_associative:1;
+   unsigned int reserved:4;
+   unsigned int max_logical_proc:12;
+   unsigned int max_package_proc:6;
+   } split;
+   unsigned int full;
+};
+
+union cpuid4_ebx {
+   struct {
+   unsigned int coherency_line_size:12;
+   unsigned int physical_line_partitions:10;
+   unsigned int ways:10;
+   } split;
+   unsigned int full;
+};
+
+union cpuid5_eax {
+   struct {
+   unsigned int min_monitor_line_size:16;
+   unsigned int reserved:16;
+   } split;
+   unsigned int full;
+};
+
+union cpuid5_ebx {
+   struct {
+   unsigned int max_monitor_line_size:16;
+   unsigned int reserved:16;
+   } split;
+   unsigned int full;
+};
+
+union cpuid5_ecx {
+   struct {
+   unsigned int monitor_mwait_ext:1;
+   unsigned int interrupt_as_break:1;
+   } split;
+   unsigned int full;
+};
+
+union cpuid5_edx {
+   struct {
+   unsigned int c0_sub_states:4;
+   unsigned int c1_sub_states:4;
+   unsigned int c2_sub_states:4;
+   unsigned int c3_sub_states:4;
+   unsigned int c4_sub_states:4;
+   unsigned int c5_sub_states:4;
+   unsigned int c6_sub_states:4;
+   unsigned int c7_sub_states:4;
+   } split;
+   unsigned int full;
+};
+
+union cpuid11_eax {
+   struct {
+   unsigned int x2apic_shift:5;
+   unsigned int reserved:27;
+   } split;
+   unsigned int full;
+};
+
+union cpuid11_ebx {
+   struct {
+   unsigned int logical_proc:16;
+   unsigned int reserved:16;
+   } split;
+   unsigned int full;
+};
+
+union cpuid11_ecx {
+   struct {
+   unsigned int level_number:8;
+   unsigned int level_type:8;
+   unsigned int reserved:16;
+   } split;
+   unsigned int full;
+};
+
+union cpuid8_5_eax_ebx {
+   struct {
+   unsigned int itlb_entries:8;
+   unsigned int itlb_assoc:8;
+   unsigned int dtlb_entries:8;
+   unsigned int dtlb_assoc:8;
+   } split;
+   unsigned int full;
+};
+
+union cpuid8_5_ecx_edx {
+   struct {
+   unsigned int line_size:8;
+   unsigned int lines_per_tag:8;
+   unsigned int associativity:8;
+   unsigned int cache_size:8;
+   } split;
+   unsigned int full;
+};
+
+union cpuid8_6_ebx {
+   struct {
+   unsigned int itlb_entries:12;
+   unsigned int itlb_assoc:4;
+   unsigned int dtlb_entries:12;
+   unsigned int dtlb_assoc:4;
+   } split;
+   unsigned int full;
+};
+
+union cpuid8_6_ecx {
+   struct {
+   unsigned int cache_line_size:8;
+   unsigned int lines_per_tag:4;
+   unsigned int l2_associativity:4;
+   unsigned int cache_size:16;
+   } split;
+   unsigned int full;
+};
+
+union cpuid_8_8_eax {
+   struct {
+   unsigned int phys_as:8;
+   unsigned int virt_as:8;
+   unsigned int guest_phys_as:8;
+   unsigned int reserved:8;
+   } split;
+   unsigned int full;
+};
+
+#endif
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
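
A hedged aside on this header: C leaves bitfield ordering and packing up to
the implementation, so a cheap compile-time width check would guard the
.full/.split aliasing. A possible sketch, not part of the patch (BUILD_BUG_ON
and u32 come from the usual kernel headers):

#include <linux/bug.h>
#include <linux/types.h>
#include <asm/cpuid_def.h>

/* Sketch: every union in cpuid_def.h must stay exactly 32 bits wide,
 * or reading .split after writing .full silently breaks. */
static inline void cpuid_def_size_checks(void)
{
	BUILD_BUG_ON(sizeof(union cpuid1_eax) != sizeof(u32));
	BUILD_BUG_ON(sizeof(union cpuid1_ebx) != sizeof(u32));
	BUILD_BUG_ON(sizeof(union cpuid_8_8_eax) != sizeof(u32));
}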

Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off

2014-09-16 Thread Paolo Bonzini
Il 16/09/2014 14:12, Andrew Jones ha scritto:
  Should it at least write 1 to the spinlock?
 I thought about that. So on one hand we might get a somewhat functional
 synchronization mechanism, which may be enough for some unit test that
 doesn't enable caches, but still needs it. On the other hand, we know
 it's broken, so we don't really want any unit tests that need synchronization
 and don't enable caches. I chose to not write a 1 in the hope that if
 a unit test introduces a race, that that race will be easier to expose
 and fix. That said, I'm not strongly biased, as we'd still have a race,
 which may or may not be easy to expose, either way. So if the majority
 prefers a best effort approach, then I'll spin a v2.

The case I was thinking about was something like

spin_lock()
enable caches
start other processors
spin_unlock()

I'm not sure if it makes sense though. :)

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off

2014-09-16 Thread Andrew Jones


- Original Message -
 Il 16/09/2014 14:12, Andrew Jones ha scritto:
   Should it at least write 1 to the spinlock?
  I thought about that. So on one hand we might get a somewhat functional
  synchronization mechanism, which may be enough for some unit test that
  doesn't enable caches, but still needs it. On the other hand, we know
  it's broken, so we don't really want any unit tests that need
  synchronization
  and don't enable caches. I chose to not write a 1 in the hope that if
  a unit test introduces a race, that that race will be easier to expose
  and fix. That said, I'm not strongly biased, as we'd still have a race,
  which may or may not be easy to expose, either way. So if the majority
  prefers a best effort approach, then I'll spin a v2.
 
 The case I was thinking about was something like
 
 spin_lock()
 enable caches
 start other processors
 spin_unlock()
 
 I'm not sure if it makes sense though. :)

I don't think we need to worry about this case. AFAIU, enabling the
caches for a particular cpu shouldn't require any synchronization.
So we should be able to do

enable caches
spin_lock
start other processors
spin_unlock

drew

 
 Paolo
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off

2014-09-16 Thread Paolo Bonzini
Il 16/09/2014 14:43, Andrew Jones ha scritto:
 I don't think we need to worry about this case. AFAIU, enabling the
 caches for a particular cpu shouldn't require any synchronization.
 So we should be able to do
 
 enable caches
 spin_lock
 start other processors
 spin_unlock

Ok, I'll test and apply your patch then.

Once you change the code to enable caches, please consider hanging on
spin_lock with caches disabled.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off

2014-09-16 Thread Andrew Jones


- Original Message -
 
 
 - Original Message -
  Il 16/09/2014 14:12, Andrew Jones ha scritto:
Should it at least write 1 to the spinlock?
   I thought about that. So on one hand we might get a somewhat functional
   synchronization mechanism, which may be enough for some unit test that
   doesn't enable caches, but still needs it. On the other hand, we know
   it's broken, so we don't really want any unit tests that need
   synchronization
   and don't enable caches. I chose to not write a 1 in the hope that if
   a unit test introduces a race, that that race will be easier to expose
   and fix. That said, I'm not strongly biased, as we'd still have a race,
   which may or may not be easy to expose, either way. So if the majority
   prefers a best effort approach, then I'll spin a v2.
  
  The case I was thinking about was something like
  
  spin_lock()
  enable caches
  start other processors
  spin_unlock()
  
  I'm not sure if it makes sense though. :)
 
 I don't think we need to worry about this case. AFAIU, enabling the
 caches for a particular cpu shouldn't require any synchronization.
 So we should be able to do
 
 enable caches
 spin_lock
 start other processors
 spin_unlock
 

Oh drat. I just introduced an issue with enabling caches without working
spin locks: this new boolean will need to be per cpu.

I'll send a v2.



 drew
 
  
  Paolo
  
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off

2014-09-16 Thread Andrew Jones


- Original Message -
 Il 16/09/2014 14:43, Andrew Jones ha scritto:
  I don't think we need to worry about this case. AFAIU, enabling the
  caches for a particular cpu shouldn't require any synchronization.
  So we should be able to do
  
  enable caches
  spin_lock
  start other processors
  spin_unlock
 
 Ok, I'll test and apply your patch then.

Actually, yeah, please apply now in order to get A7 boards working.
I'll do a follow-on patch to fix the case above (which will require
deciding how to handle per cpu data).

drew

 
 Once you change the code to enable caches, please consider hanging on
 spin_lock with caches disabled.
 
 Paolo
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off

2014-09-16 Thread Andrew Jones


- Original Message -
 Il 16/09/2014 14:43, Andrew Jones ha scritto:
  I don't think we need to worry about this case. AFAIU, enabling the
  caches for a particular cpu shouldn't require any synchronization.
  So we should be able to do
  
  enable caches
  spin_lock
  start other processors
  spin_unlock
 
 Ok, I'll test and apply your patch then.
 
 Once you change the code to enable caches, please consider hanging on
 spin_lock with caches disabled.

Unfortunately I can't do that without changing spin_lock into a wrapper
function. Early setup code calls functions that use spin_locks, e.g.
puts(), and we don't want to move the cache enablement into early setup
code, as that should be left for unit tests to turn on and off as they wish.
Thus we either need to be able to change the spin_lock implementation
dynamically, or just leave the test/return as is.
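
A minimal sketch of the dynamic-implementation option, assuming invented
names (this is not what the patch does, just the idea spelled out):

/* Hypothetical: spin_lock becomes a thin wrapper around a switchable
 * implementation; a test flips it after enabling its caches. */
static void spin_lock_noop(struct spinlock *lock)
{
	(void)lock;	/* no coherency monitor, so no real locking possible */
}

extern void spin_lock_ldrex(struct spinlock *lock);	/* the real asm version */

static void (*spin_lock_impl)(struct spinlock *lock) = spin_lock_noop;

void spin_lock(struct spinlock *lock)
{
	spin_lock_impl(lock);
}

void spinlock_select_impl(bool caches_enabled)
{
	spin_lock_impl = caches_enabled ? spin_lock_ldrex : spin_lock_noop;
}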

drew

 
 Paolo
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] x86: structs for cpuid info in x86

2014-09-16 Thread Ingo Molnar

* Nadav Amit na...@cs.technion.ac.il wrote:

 The code that deals with x86 cpuid fields is hard to follow since it performs
 many bit operations and does not refer to cpuid fields explicitly.  To
 eliminate the need to open a spec whenever dealing with cpuid fields, this
 patch-set introduces structs that reflect the various cpuid functions.
 
 Thanks for reviewing the patch-set.
 
 Nadav Amit (3):
   x86: Adding structs to reflect cpuid fields
   x86: Use new cpuid structs in cpuid functions
   KVM: x86: Using cpuid structs in KVM
 
  arch/x86/include/asm/cpuid_def.h | 163 +++
  arch/x86/kernel/cpu/common.c |  56 --
  arch/x86/kvm/cpuid.c |  36 +
  3 files changed, 219 insertions(+), 36 deletions(-)
  create mode 100644 arch/x86/include/asm/cpuid_def.h

I personally like bitfields in theory (they provide type clarity 
and abstract robustness, compared to open-coded bitmask numeric 
literals that are often used in cpuid using code, obfuscating 
cpuid usage), with the big caveat that for many years I didn't 
like bitfields in practice: older versions of GCC did a really 
poor job of optimizing them.

So such a series would only be acceptable if it's demonstrated 
that both 'latest' and 'reasonably old' GCC versions do a good 
job in that department, compared to the old open-coded bitmask 
ops ...

Comparing the 'size vmlinux' output of before/after kernels would 
probably be a good start in seeing the impact of such a change.

If those results are positive then this technique could be 
propagated to all cpuid using code in arch/x86/, of which
there's plenty.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-16 Thread Paolo Bonzini
Il 15/09/2014 22:11, Andres Lagar-Cavilla ha scritto:
 + if (!locked) {
 + BUG_ON(npages != -EBUSY);

VM_BUG_ON perhaps?

 @@ -1177,9 +1210,15 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
   npages = get_user_page_nowait(current, current->mm,
 addr, write_fault, page);
   up_read(&current->mm->mmap_sem);
 - } else
 - npages = get_user_pages_fast(addr, 1, write_fault,
 -  page);
 + } else {
 + /*
 +  * By now we have tried gup_fast, and possible async_pf, and we
 +  * are certainly not atomic. Time to retry the gup, allowing
 +  * mmap semaphore to be relinquished in the case of IO.
 +  */
 + npages = kvm_get_user_page_retry(current, current->mm, addr,
 +  write_fault, page);

This is a separate logical change.  Was this:

down_read(&mm->mmap_sem);
npages = get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
up_read(&mm->mmap_sem);

the intention rather than get_user_pages_fast?

I think a first patch should introduce kvm_get_user_page_retry (Retry a
fault after a gup with FOLL_NOWAIT.) and the second would add
FOLL_TRIED (This properly relinquishes mmap semaphore if the
filemap/swap has to wait on page lock (and retries the gup to completion
after that).

Apart from this, the patch looks good.  The mm/ parts are minimal, so I
think it's best to merge it through the KVM tree with someone's Acked-by.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call for agend for 2014-09-16

2014-09-16 Thread Peter Maydell
On 16 September 2014 01:10, Juan Quintela quint...@redhat.com wrote:
 Juan Quintela quint...@redhat.com wrote:
 Hi

 Please, send any topic that you are interested in covering.

 People have complained on the past that I don't cancel the call until
 the very last minute.  So, what do you think that deadline for
 submitting topics is 23:00UTC on Monday?

 ok, no topics no call.

I can't make it, but weren't the people discussing
reverse-execution saying they wanted to talk about that
on the call?

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call for agend for 2014-09-16

2014-09-16 Thread Juan Quintela
Peter Maydell peter.mayd...@linaro.org wrote:
 On 16 September 2014 01:10, Juan Quintela quint...@redhat.com wrote:
 Juan Quintela quint...@redhat.com wrote:
 Hi

 Please, send any topic that you are interested in covering.

 People have complained on the past that I don't cancel the call until
 the very last minute.  So, what do you think that deadline for
 submitting topics is 23:00UTC on Monday?

 ok, no topics no call.

 I can't make it, but weren't the people discussing
 reverse-execution saying they wanted to talk about that
 on the call?

Dunno, but clearly they didn't answer the call for topics; I didn't
see that thread.

I can put it for the next call.

Later, Juan.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvmtool/arm{,64}: fix ARM initrd functionality

2014-09-16 Thread Andre Przywara
lkvm -i is currently broken on ARM/ARM64.
We should not try to convert smaller-than-4GB addresses into 64-bit
big endian and then stuff them into u32 variables if we expect to read
anything other than 0 out of them.
Adjust the type to u64 to write the proper address in BE format into
the /chosen node (and also match the address size we formerly posted)
and let Linux thus read the right values.

Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 tools/kvm/arm/fdt.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/arm/fdt.c b/tools/kvm/arm/fdt.c
index 186a718..1a76b07c 100644
--- a/tools/kvm/arm/fdt.c
+++ b/tools/kvm/arm/fdt.c
@@ -119,8 +119,8 @@ static int setup_fdt(struct kvm *kvm)
 
/* Initrd */
if (kvm->arch.initrd_size != 0) {
-   u32 ird_st_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start);
-   u32 ird_end_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start +
+   u64 ird_st_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start);
+   u64 ird_end_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start +
   kvm->arch.initrd_size);
 
_FDT(fdt_property(fdt, linux,initrd-start,
-- 
1.7.9.5
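
A hedged demonstration of the bug the commit message describes (userspace
sketch; __builtin_bswap64 stands in for cpu_to_fdt64 on a little-endian
host, and 0x80000000 is just an example sub-4GB initrd address):

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint64_t be = __builtin_bswap64(0x80000000ULL);	/* "cpu_to_fdt64" */
	uint32_t truncated = (uint32_t)be;	/* what the old u32 variable kept */

	assert(be == 0x0000008000000000ULL);
	assert(truncated == 0);	/* all the real bytes sit in the high half */
	return 0;
}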

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off

2014-09-16 Thread Andrew Jones


- Original Message -
 
 
 - Original Message -
  Il 16/09/2014 14:43, Andrew Jones ha scritto:
   I don't think we need to worry about this case. AFAIU, enabling the
   caches for a particular cpu shouldn't require any synchronization.
   So we should be able to do
   
   enable caches
   spin_lock
   start other processors
   spin_unlock
  
  Ok, I'll test and apply your patch then.
 
 Actually, yeah, please apply now in order to get A7 boards working.
  I'll do a follow-on patch to fix the case above (which will require
  deciding how to handle per cpu data).

Post coffee, I don't see why I shouldn't just use SCTLR.C as my
boolean, which is of course per cpu, and means the same thing,
i.e. caches enabled or not. I'll send a v2 that drops
mem_caches_enabled, and modifies the logic of the spin_lock asm.
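
For reference, a sketch of what that check could look like on ARMv7 (the
CP15 encoding is the architectural one; note, as it turns out later in the
thread, that the read is privileged and therefore fails from usr mode):

static inline int dcache_enabled(void)
{
	unsigned int sctlr;

	/* SCTLR lives in CP15 c1; this mrc traps outside privileged modes. */
	asm volatile("mrc p15, 0, %0, c1, c0, 0" : "=r" (sctlr));

	return sctlr & (1U << 2);	/* bit 2 is SCTLR.C, data cache enable */
}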

drew

 
 drew
 
  
  Once you change the code to enable caches, please consider hanging on
  spin_lock with caches disabled.
  
  Paolo
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [question] virtio-blk performance degradation happened with virito-serial

2014-09-16 Thread Zhang Haoyu
 If virtio-blk and virtio-serial share an IRQ, the guest operating
system has to check each virtqueue for activity. Maybe there is some
inefficiency doing that.
 AFAIK virtio-serial registers 64 virtqueues (on 31 ports +
console) even if everything is unused.

 That could be the case if MSI is disabled.

 Do the windows virtio drivers enable MSIs, in their inf file?

 It depends on the version of the drivers, but it is a reasonable guess
 at what differs between Linux and Windows. Haoyu, can you give us the
 output of lspci from a Linux guest?

 I ran a test with fio on a rhel-6.5 guest; the same degradation
happened too, and it can be reproduced on the rhel-6.5 guest 100% of the time.
 virtio_console module installed:
 64K-write-sequence: 285 MBPS, 4380 IOPS
 virtio_console module uninstalled:
 64K-write-sequence: 370 MBPS, 5670 IOPS

I used top -d 1 -H -p qemu-pid to monitor the cpu usage, and found that:
virtio_console module installed:
qemu main thread cpu usage: 98%
virtio_console module uninstalled:
qemu main thread cpu usage: 60%


I found that the statement err =
register_virtio_driver(&virtio_console); in the virtio_console module's
init() function causes the degradation: if I return directly before err =
register_virtio_driver(&virtio_console);, the degradation disappears;
if I return directly after err =
register_virtio_driver(&virtio_console);, the degradation is still there.
I will try the test cases below:
1. Do not emulate the virtio-serial device, then install/uninstall the
virtio_console driver in the guest,
to see whether there is a difference in virtio-blk performance and cpu usage.
2. Do not emulate the virtio-serial device, then install the virtio_balloon
driver (while also not emulating the virtio-balloon device),
to see whether the virtio-blk performance degradation will happen.
3. Emulate the virtio-balloon device instead of the virtio-serial device,
then see whether the virtio-blk performance is hampered.

Based on the test results, corresponding analysis will be performed.
Any ideas?

Thanks,
Zhang Haoyu
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read()

2014-09-16 Thread Michael Büsch
On Tue, 16 Sep 2014 08:27:40 +0800
Amos Kong ak...@redhat.com wrote:

 Set timeout to 10:
   non-smp guest with quick backend (1.2M/s -> about 490K/s)

That sounds like an awful lot. This is a 60% loss in throughput.
I don't think we can live with that.

-- 
Michael




Re: [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes

2014-09-16 Thread Rusty Russell
Amos Kong ak...@redhat.com writes:
 On Sun, Sep 14, 2014 at 01:12:58AM +0800, Amos Kong wrote:
 On Thu, Sep 11, 2014 at 09:08:03PM +0930, Rusty Russell wrote:
  Amos Kong ak...@redhat.com writes:
   When I check hwrng attributes in sysfs, cat process always gets
   stuck if guest has only 1 vcpu and uses a slow rng backend.
  
   Currently we check if there is any tasks waiting to be run on
   current cpu in rng_dev_read() by need_resched(). But need_resched()
   doesn't work because rng_dev_read() is executing in user context.
  
  I don't understand this explanation?  I'd expect the sysfs process to be
  woken by the mutex_unlock().
 
 But actually sysfs process's not woken always, this is they the
 process gets stuck.

 %s/they/why/

 Hi Rusty,


 Reference:
 http://www.linuxgrill.com/anonymous/fire/netfilter/kernel-hacking-HOWTO-2.html

Sure, that was true when I wrote it, and is still true when preempt is
off.

 read() syscall of /dev/hwrng will enter the kernel; the read operation is
 rng_dev_read(), which runs in userspace context (not interrupt context).

 Userspace context doesn't allow other user contexts to run on that CPU,
 unless the kernel code sleeps for some reason.

This is true assuming preempt is off, yes.

 In this case, the need_resched() doesn't work.

This is exactly what need_resched() is for: it should return true if
there's another process of sufficient priority waiting to be run.  It
implies that schedule() would run it.

git blame doesn't offer any enlightenment here, as to why we use
schedule_timeout_interruptible() at all.

I would expect mutex_unlock() to wake the other reader.  The code
certainly seems to, so it should now be runnable and need_resched()
should return true.

I suspect something else is happening which makes this work.
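
For context, the loop tail under discussion has roughly this shape (a
paraphrased sketch of drivers/char/hw_random/core.c from this period, not a
verbatim quote):

	while (size) {
		/* ... copy rng bytes to the user buffer under rng_mutex ... */

		mutex_unlock(&rng_mutex);	/* should wake a blocked reader */

		if (need_resched())
			schedule_timeout_interruptible(1);

		if (signal_pending(current)) {
			err = -ERESTARTSYS;
			break;
		}
	}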

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-16 Thread Andres Lagar-Cavilla
On Tue, Sep 16, 2014 at 9:52 AM, Andres Lagar-Cavilla
andre...@google.com wrote:

Apologies to all. Resend as lists rejected my gmail-formatted version.
Now on plain text. Won't happen again.

 On Tue, Sep 16, 2014 at 6:51 AM, Paolo Bonzini pbonz...@redhat.com wrote:

 Il 15/09/2014 22:11, Andres Lagar-Cavilla ha scritto:
  + if (!locked) {
  + BUG_ON(npages != -EBUSY);

 VM_BUG_ON perhaps?

 Sure.


  @@ -1177,9 +1210,15 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
    npages = get_user_page_nowait(current, current->mm,
  addr, write_fault, page);
    up_read(&current->mm->mmap_sem);
  - } else
  - npages = get_user_pages_fast(addr, 1, write_fault,
  -  page);
  + } else {
  + /*
  +  * By now we have tried gup_fast, and possible async_pf,
  and we
  +  * are certainly not atomic. Time to retry the gup,
  allowing
  +  * mmap semaphore to be relinquished in the case of IO.
  +  */
  + npages = kvm_get_user_page_retry(current, current->mm,
  addr,
  +  write_fault, page);

 This is a separate logical change.  Was this:

 down_read(&mm->mmap_sem);
 npages = get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
 up_read(&mm->mmap_sem);

 the intention rather than get_user_pages_fast?


 Nope. The intention was to pass FAULT_FLAG_RETRY to the vma fault handler
 (without _NOWAIT). And once you do that, if you come back without holding
 the mmap sem, you need to call yet again.

 By that point in the call chain I felt comfortable dropping the _fast. All
 paths that get there have already tried _fast (and some have tried _NOWAIT).


 I think a first patch should introduce kvm_get_user_page_retry (Retry a
 fault after a gup with FOLL_NOWAIT.) and the second would add
 FOLL_TRIED (This properly relinquishes mmap semaphore if the
 filemap/swap has to wait on page lock (and retries the gup to completion
 after that).


 That's not what FOLL_TRIED does. The relinquishing of mmap semaphore is done
 by this patch minus the FOLL_TRIED bits. FOLL_TRIED will let the fault
 handler (e.g. filemap) know that we've been there and waited on the IO
 already, so in the common case we won't need to redo the IO.

 Have a look at how FAULT_FLAG_TRIED is used in e.g. arch/x86/mm/fault.c.



 Apart from this, the patch looks good.  The mm/ parts are minimal, so I
 think it's best to merge it through the KVM tree with someone's Acked-by.


 Thanks!
 Andres



 Paolo




 --
 Andres Lagar-Cavilla | Google Cloud Platform | andre...@google.com |
 647-778-4380



-- 
Andres Lagar-Cavilla | Google Cloud Platform | andre...@google.com |
647-778-4380
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off

2014-09-16 Thread Andrew Jones
On Tue, Sep 16, 2014 at 10:38:11AM -0400, Andrew Jones wrote:
 
 
 - Original Message -
  
  
  - Original Message -
   Il 16/09/2014 14:43, Andrew Jones ha scritto:
I don't think we need to worry about this case. AFAIU, enabling the
caches for a particular cpu shouldn't require any synchronization.
So we should be able to do

enable caches
spin_lock
start other processors
spin_unlock
   
   Ok, I'll test and apply your patch then.
  
  Actually, yeah, please apply now in order to get A7 boards working.
  I'll do a follow-on patch to fix the case above (which will require
  deciding how to hand per cpu data).
 
 Post coffee, I don't see why I shouldn't just use SCTLR.C as my
 boolean, which is of course per cpu, and means the same thing,
 i.e. caches enabled or not. I'll send a v2 that drops
 mem_caches_enabled, and modifies the logic of the spin_lock asm.

Scratch this idea. It breaks the use of spin_lock from usr mode.
I think plan B is the best, which is: apply this patch, and then I'll
make mem_caches_enabled per cpu later, once we support per cpu data.

  
   
   Once you change the code to enable caches, please consider hanging on
   spin_lock with caches disabled.
   
   Paolo
   --
   To unsubscribe from this list: send the line unsubscribe kvm in
   the body of a message to majord...@vger.kernel.org
   More majordomo info at  http://vger.kernel.org/majordomo-info.html
   
  ___
  kvmarm mailing list
  kvm...@lists.cs.columbia.edu
  https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
  
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-16 Thread Paolo Bonzini
Il 16/09/2014 18:52, Andres Lagar-Cavilla ha scritto:
 Was this:
 
 down_read(&mm->mmap_sem);
 npages = get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
 up_read(&mm->mmap_sem);
 
 the intention rather than get_user_pages_fast?

I meant the intention of the original author, not yours.

 By that point in the call chain I felt comfortable dropping the _fast.
 All paths that get there have already tried _fast (and some have tried
 _NOWAIT).

Yes, understood.

 I think a first patch should introduce kvm_get_user_page_retry (Retry a
 fault after a gup with FOLL_NOWAIT.) and the second would add
 FOLL_TRIED (This properly relinquishes mmap semaphore if the
 filemap/swap has to wait on page lock (and retries the gup to completion
 after that).
 
 That's not what FOLL_TRIED does. The relinquishing of mmap semaphore is
 done by this patch minus the FOLL_TRIED bits. FOLL_TRIED will let the
 fault handler (e.g. filemap) know that we've been there and waited on
 the IO already, so in the common case we won't need to redo the IO.

Yes, that's not what FOLL_TRIED does.  But it's the difference between
get_user_pages and kvm_get_user_page_retry, right?

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-16 Thread Andres Lagar-Cavilla
On Tue, Sep 16, 2014 at 11:29 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 16/09/2014 18:52, Andres Lagar-Cavilla ha scritto:
 Was this:

 down_read(&mm->mmap_sem);
 npages = get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
 up_read(&mm->mmap_sem);

 the intention rather than get_user_pages_fast?

 I meant the intention of the original author, not yours.

Yes, in all likelihood. I hope!


 By that point in the call chain I felt comfortable dropping the _fast.
 All paths that get there have already tried _fast (and some have tried
 _NOWAIT).

 Yes, understood.

 I think a first patch should introduce kvm_get_user_page_retry (Retry a
 fault after a gup with FOLL_NOWAIT.) and the second would add
 FOLL_TRIED (This properly relinquishes mmap semaphore if the
 filemap/swap has to wait on page lock (and retries the gup to completion
 after that).

 That's not what FOLL_TRIED does. The relinquishing of mmap semaphore is
 done by this patch minus the FOLL_TRIED bits. FOLL_TRIED will let the
 fault handler (e.g. filemap) know that we've been there and waited on
 the IO already, so in the common case we won't need to redo the IO.

 Yes, that's not what FOLL_TRIED does.  But it's the difference between
 get_user_pages and kvm_get_user_page_retry, right?

Unfortunately get_user_pages does not expose the param (int
*nonblocking) that __gup will use to set FAULT_FLAG_ALLOW_RETRY. So
that's one difference. The second difference is that kvm_gup_retry
will call twice if necessary (the second time without _RETRY but with
_TRIED).
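
In other words, the flow is roughly this (a paraphrase of what the thread
describes, not the literal patch):

	int locked = 1;

	/* First pass: let the fault handler drop mmap_sem for IO. */
	down_read(&mm->mmap_sem);
	npages = __get_user_pages(tsk, mm, addr, 1, flags, pagep, NULL,
				  &locked);
	if (!locked) {
		/* mmap_sem was released while waiting for IO; retry to
		 * completion, telling the handler the IO was already
		 * waited on (FOLL_TRIED). */
		down_read(&mm->mmap_sem);
		npages = __get_user_pages(tsk, mm, addr, 1,
					  flags | FOLL_TRIED, pagep, NULL,
					  NULL);
	}
	up_read(&mm->mmap_sem);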

Thanks
Andres

 Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvmtool/arm{,64}: fix ARM initrd functionality

2014-09-16 Thread Marc Zyngier

On 2014-09-16 15:37, Andre Przywara wrote:

lkvm -i is currently broken on ARM/ARM64.
We should not try to convert smaller-than-4GB addresses into 64-bit
big endian and then stuff them into u32 variables if we expect to read
anything other than 0 out of them.
Adjust the type to u64 to write the proper address in BE format into
the /chosen node (and also match the address size we formerly posted)
and let Linux thus read the right values.

Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 tools/kvm/arm/fdt.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/arm/fdt.c b/tools/kvm/arm/fdt.c
index 186a718..1a76b07c 100644
--- a/tools/kvm/arm/fdt.c
+++ b/tools/kvm/arm/fdt.c
@@ -119,8 +119,8 @@ static int setup_fdt(struct kvm *kvm)

/* Initrd */
if (kvm->arch.initrd_size != 0) {
-   u32 ird_st_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start);
-   u32 ird_end_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start +
+   u64 ird_st_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start);
+   u64 ird_end_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start +
   kvm->arch.initrd_size);

_FDT(fdt_property(fdt, linux,initrd-start,


Who needs an initrd anyway? ;-)

Acked-by: Marc Zyngier marc.zyng...@arm.com

M.
--
Fast, cheap, reliable. Pick two.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] x86: structs for cpuid info in x86

2014-09-16 Thread Nadav Amit


On 9/16/14 4:22 PM, Ingo Molnar wrote:
 
 * Nadav Amit na...@cs.technion.ac.il wrote:
 
 The code that deals with x86 cpuid fields is hard to follow since it performs
 many bit operations and does not refer to cpuid fields explicitly.  To
 eliminate the need to open a spec whenever dealing with cpuid fields, this
 patch-set introduces structs that reflect the various cpuid functions.

 Thanks for reviewing the patch-set.

 Nadav Amit (3):
   x86: Adding structs to reflect cpuid fields
   x86: Use new cpuid structs in cpuid functions
   KVM: x86: Using cpuid structs in KVM

  arch/x86/include/asm/cpuid_def.h | 163 +++
  arch/x86/kernel/cpu/common.c |  56 --
  arch/x86/kvm/cpuid.c |  36 +
  3 files changed, 219 insertions(+), 36 deletions(-)
  create mode 100644 arch/x86/include/asm/cpuid_def.h
 
 I personally like bitfields in theory (they provide type clarity 
 and abstract robustness, compared to open-coded bitmask numeric 
 literals that are often used in cpuid using code, obfuscating 
 cpuid usage), with the big caveat that for many years I didn't 
 like bitfields in practice: older versions of GCC did a really 
 poor job of optimizing them.
 
 So such a series would only be acceptable if it's demonstrated 
 that both 'latest' and 'reasonably old' GCC versions do a good 
 job in that department, compared to the old open-coded bitmask 
 ops ...
 
 Comparing the 'size vmlinux' output of before/after kernels would 
 probably be a good start in seeing the impact of such a change.
 
 If those results are positive then this technique could be 
 propagated to all cpuid using code in arch/x86/, of which
 there's plenty.

Thanks for the quick response. I was not aware GCC behaves this way. I
made some small experiments with GCC-4.8 and GCC-4.4, and in brief my
conclusions are:
1. The assembled code for bitmasks and bitfields is indeed different.
2. GCC-4.8 and GCC-4.4 behave pretty much the same, yet GCC-4.8 appears
to do better instruction reordering.
3. Loading/storing a single bitfield seems to be pretty well optimized
(marginal code-size advantage for bitmasks, same number of instructions).
4. Loading/storing multiple bitfields seems to be somewhat
under-optimized: multiple accesses to the original value result in ~30%
more instructions and code size.

So you are correct: bitfields are less optimized. Nonetheless, since
cpuid data is mostly used during startup, and otherwise a single
bitfield is usually accessed in each function, I wonder whether it is
worth keeping the optimized but obfuscated code. Obviously, I can guess
your answer to this question...
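
The pattern behind point 4, as a sketch (an assumed test shape, not the
actual benchmark; union cpuid1_eax is from patch 1):

#include <asm/cpuid_def.h>

unsigned int fms_masks(unsigned int v)
{
	/* open-coded: v stays in a register, three shift/mask sequences */
	return ((v >> 8) & 0xf) + ((v >> 4) & 0xf) + (v & 0xf);
}

unsigned int fms_bitfields(union cpuid1_eax eax)
{
	/* bitfields: each access may re-extract from the same word,
	 * which is where the ~30% extra instructions show up */
	return eax.split.family_id + eax.split.model + eax.split.stepping_id;
}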

Nadav
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-16 Thread Radim Krčmář
2014-09-15 13:11-0700, Andres Lagar-Cavilla:
 +int kvm_get_user_page_retry(struct task_struct *tsk, struct mm_struct *mm,

The suffix '_retry' is not best suited for this.
On first reading, I imagined we will be retrying something from before,
possibly calling it in a loop, but we are actually doing the first and
last try in one call.

Hard to find something that conveys our lock-dropping mechanic,
'_polite' is my best candidate at the moment.

 + int flags = FOLL_TOUCH | FOLL_HWPOISON |

(FOLL_HWPOISON wasn't used before, but it's harmless.)

2014-09-16 15:51+0200, Paolo Bonzini:
 Il 15/09/2014 22:11, Andres Lagar-Cavilla ha scritto:
  @@ -1177,9 +1210,15 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
   npages = get_user_page_nowait(current, current->mm,
 addr, write_fault, page);
   up_read(&current->mm->mmap_sem);
  -   } else
  -   npages = get_user_pages_fast(addr, 1, write_fault,
  -page);
  +   } else {
  +   /*
  +* By now we have tried gup_fast, and possible async_pf, and we
^
(If we really tried get_user_pages_fast, we wouldn't be here, so I'd
 prepend two underscores here as well.)

  +* are certainly not atomic. Time to retry the gup, allowing
  +* mmap semaphore to be relinquished in the case of IO.
  +*/
  +   npages = kvm_get_user_page_retry(current, current->mm, addr,
  +write_fault, page);
 
 This is a separate logical change.  Was this:
 
   down_read(&mm->mmap_sem);
   npages = get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
   up_read(&mm->mmap_sem);
 
 the intention rather than get_user_pages_fast?

I believe so as well.

(Looking at get_user_pages_fast and __get_user_pages_fast made my
 abstraction detector very sad.)

 I think a first patch should introduce kvm_get_user_page_retry (Retry a
 fault after a gup with FOLL_NOWAIT.) and the second would add
 FOLL_TRIED (This properly relinquishes mmap semaphore if the
 filemap/swap has to wait on page lock (and retries the gup to completion
 after that).

Not sure if that would help to understand the goal ...

 Apart from this, the patch looks good.  The mm/ parts are minimal, so I
 think it's best to merge it through the KVM tree with someone's Acked-by.

I would prefer to have the last hunk in a separate patch, but still,

Acked-by: Radim Krčmář rkrc...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-16 Thread Andres Lagar-Cavilla
On Tue, Sep 16, 2014 at 1:51 PM, Radim Krčmář rkrc...@redhat.com wrote:
 2014-09-15 13:11-0700, Andres Lagar-Cavilla:
 +int kvm_get_user_page_retry(struct task_struct *tsk, struct mm_struct *mm,

 The suffix '_retry' is not best suited for this.
 On first reading, I imagined we will be retrying something from before,
 possibly calling it in a loop, but we are actually doing the first and
 last try in one call.

We are doing ... the second and third in most scenarios. async_pf did
the first with _NOWAIT. We call this from the async pf retrier, or if
async pf couldn't be notified to the guest.


 Hard to find something that conveys our lock-dropping mechanic,
 '_polite' is my best candidate at the moment.

I'm at a loss towards finding a better name than '_retry'.


 + int flags = FOLL_TOUCH | FOLL_HWPOISON |

 (FOLL_HWPOISON wasn't used before, but it's harmless.)

Ok. Wasn't 100% sure TBH.


 2014-09-16 15:51+0200, Paolo Bonzini:
 Il 15/09/2014 22:11, Andres Lagar-Cavilla ha scritto:
  @@ -1177,9 +1210,15 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
  npages = get_user_page_nowait(current, current->mm,
addr, write_fault, page);
  up_read(&current->mm->mmap_sem);
  -   } else
  -   npages = get_user_pages_fast(addr, 1, write_fault,
  -page);
  +   } else {
  +   /*
  +* By now we have tried gup_fast, and possible async_pf, and we
 ^
 (If we really tried get_user_pages_fast, we wouldn't be here, so I'd
  prepend two underscores here as well.)

Yes, async pf tries and fails to do fast, and then we fallback to
slow, and so on.


  +* are certainly not atomic. Time to retry the gup, allowing
  +* mmap semaphore to be relinquished in the case of IO.
  +*/
  +   npages = kvm_get_user_page_retry(current, current->mm, addr,
  +write_fault, page);

 This is a separate logical change.  Was this:

   down_read(&mm->mmap_sem);
   npages = get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
   up_read(&mm->mmap_sem);

 the intention rather than get_user_pages_fast?

 I believe so as well.

 (Looking at get_user_pages_fast and __get_user_pages_fast made my
  abstraction detector very sad.)

It's clunky, but a separate battle.


 I think a first patch should introduce kvm_get_user_page_retry (Retry a
 fault after a gup with FOLL_NOWAIT.) and the second would add
 FOLL_TRIED (This properly relinquishes mmap semaphore if the
 filemap/swap has to wait on page lock (and retries the gup to completion
 after that).

 Not sure if that would help to understand the goal ...

 Apart from this, the patch looks good.  The mm/ parts are minimal, so I
 think it's best to merge it through the KVM tree with someone's Acked-by.

 I would prefer to have the last hunk in a separate patch, but still,

 Acked-by: Radim Krčmář rkrc...@redhat.com
Awesome, thanks much.

I'll recut with the VM_BUG_ON from Paolo and your Ack. LMK if anything
else from this email should go into the recut.

Andres


-- 
Andres Lagar-Cavilla | Google Kernel Team | andre...@google.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/4 resend] Introduce device assignment flag operation helper function

2014-09-16 Thread Bjorn Helgaas
On Tue, Sep 09, 2014 at 10:21:24AM +0800, Ethan Zhao wrote:
 This patch set introduces three PCI device flag operation helper functions
 for setting a PCI device PF/VF to assigned or deassigned status and for
 checking it; patches 2, 3 and 4 apply these helper functions to KVM, XEN
 and PCI.
 
 v2: simplify an unnecessary ternary operation in function pci_is_dev_assigned().
 v3: amend helper function naming.
 v4: fix incorrect type in return expression warning found by Fengguang Wu.
 
 Appreciate suggestion from
 alex.william...@redhat.com,
 david.vra...@citrix.com,
 alexander.h.du...@intel.com
 
 Resend for v3.18 building.

Applied to pci/virtualization for v3.18, thanks.

 
 Thanks,
 Ethan
 ---
 Ethan Zhao (4):
   PCI: introduce helper functions for device flag operation
   KVM: use pci device flag operation helper functions
   xen-pciback: use pci device flag operation helper function
   PCI: use device flag operation helper function in iov.c
 
  drivers/pci/iov.c  |2 +-
  drivers/xen/xen-pciback/pci_stub.c |4 ++--
  include/linux/pci.h|   13 +
  virt/kvm/assigned-dev.c|2 +-
  virt/kvm/iommu.c   |4 ++--
  5 files changed, 19 insertions(+), 6 deletions(-)
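
For readers without the patches at hand, the helpers presumably look like
this (a sketch inferred from the helper names in the cover letter, not
copied from the series):

#include <linux/pci.h>

static inline void pci_set_dev_assigned(struct pci_dev *pdev)
{
	pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED;
}

static inline void pci_clear_dev_assigned(struct pci_dev *pdev)
{
	pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED;
}

static inline bool pci_is_dev_assigned(struct pci_dev *pdev)
{
	return (pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED) != 0;
}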
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/4] kvmtool: ARM: Use KVM_ARM_PREFERRED_TARGET vm ioctl to determine target cpu

2014-09-16 Thread Anup Patel
On Thu, Sep 11, 2014 at 9:24 PM, Andre Przywara andre.przyw...@arm.com wrote:
 Hi Anup,

 On 08/09/14 09:17, Anup Patel wrote:
 Instead of trying out each and every target type, we should
 use the KVM_ARM_PREFERRED_TARGET vm ioctl to determine the target type
 for KVM ARM/ARM64.

 If the KVM_ARM_PREFERRED_TARGET vm ioctl fails, then we fall back to the
 old method of trying all known target types.

 If the KVM_ARM_PREFERRED_TARGET vm ioctl succeeds but the returned
 target type is not known to KVMTOOL, then we forcefully init the
 VCPU with the target type returned by the KVM_ARM_PREFERRED_TARGET vm ioctl.

 Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
 Signed-off-by: Anup Patel anup.pa...@linaro.org
 ---
  tools/kvm/arm/kvm-cpu.c |   52 +--
  1 file changed, 41 insertions(+), 11 deletions(-)

 diff --git a/tools/kvm/arm/kvm-cpu.c b/tools/kvm/arm/kvm-cpu.c
 index aeaa4cf..ba7a762 100644
 --- a/tools/kvm/arm/kvm-cpu.c
 +++ b/tools/kvm/arm/kvm-cpu.c
 @@ -33,7 +33,8 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
   struct kvm_arm_target *target;
   struct kvm_cpu *vcpu;
   int coalesced_offset, mmap_size, err = -1;
 - unsigned int i;
 + unsigned int i, target_type;
 + struct kvm_vcpu_init preferred_init;
   struct kvm_vcpu_init vcpu_init = {
   .features = ARM_VCPU_FEATURE_FLAGS(kvm, cpu_id)
   };
 @@ -55,19 +56,47 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
   if (vcpu->kvm_run == MAP_FAILED)
   die("unable to mmap vcpu fd");

 - /* Find an appropriate target CPU type. */
 - for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
 - if (!kvm_arm_targets[i])
 - continue;
 - target = kvm_arm_targets[i];
 - vcpu_init.target = target->id;
 - err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);
 - if (!err)
 - break;
 + /*
 +  * If preferred target ioctl successful then use preferred target
 +  * else try each and every target type.
 +  */
 + err = ioctl(kvm->vm_fd, KVM_ARM_PREFERRED_TARGET, &preferred_init);
 + if (!err) {
 + /* Match preferred target CPU type. */
 + target = NULL;
 + for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
 + if (!kvm_arm_targets[i])
 + continue;
 + if (kvm_arm_targets[i]->id == preferred_init.target) {
 + target = kvm_arm_targets[i];
 + target_type = kvm_arm_targets[i]->id;
 + break;
 + }
 + }
 + if (!target) {
 + target = kvm_arm_targets[0];

 I think you missed the part of the patch which adds the now magic zero
 member of kvm_arm_targets[]. A simple static initializer should work.

Yes, good catch. I will fix this.


 + target_type = preferred_init.target;

 Can't you move that out of the loop, in front of it actually? Then you
 can get rid of the line above setting the target_type also, since you
 always use the same value now, regardless whether you found that CPU in
 the list or not.

Sure, I'll rearrange this.


 + }
 + } else {
 + /* Find an appropriate target CPU type. */
 + for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
 + if (!kvm_arm_targets[i])
 + continue;
 + target = kvm_arm_targets[i];
 + target_type = target->id;
 + vcpu_init.target = target_type;
 + err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);
 + if (!err)
 + break;
 + }
 + if (err)
 + die("Unable to find matching target");
   }

 + vcpu_init.target = target_type;
 + err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);

 You should do this only in the if-branch above, since you (try to) call
 KVM_ARM_VCPU_INIT already in the else branch before. Otherwise in the
 latter case you would do it twice.

I was trying to follow the old code which was doing KVM_ARM_VCPU_INIT
twice for the same VCPU but it makes sense to do it only once.

Regards,
Anup


 Regards,
 Andre.

   if (err || target->init(vcpu))
 - die("Unable to initialise ARM vcpu");
 + die("Unable to initialise vcpu");

   coalesced_offset = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION,
KVM_CAP_COALESCED_MMIO);
 @@ -81,6 +110,7 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
   vcpu->cpu_type  = target->id;
   vcpu->cpu_compatible= target->compatible;
   vcpu->is_running= true;
 +
   return vcpu;
  }



Re: [PATCH v3 2/4] kvmtool: ARM64: Add target type potenza for aarch64

2014-09-16 Thread Anup Patel
On Thu, Sep 11, 2014 at 9:37 PM, Andre Przywara andre.przyw...@arm.com wrote:
 Anup,

 On 08/09/14 09:17, Anup Patel wrote:
 The VCPU target type KVM_ARM_TARGET_XGENE_POTENZA is available
 in Linux-3.16-rcX or higher, hence register an aarch64 target
 type for it.

 This patch enables us to run KVMTOOL on X-Gene Potenza host.

 Why do you need this still if the previous patch got rid of the need for
 naming each and every CPU in kvmtool?
 Do you care about kernels older than 3.12? I wouldn't bother so much
 since you'd need a much newer kvmtool anyway.

Yes, actually the APM SW team uses the 3.12 kernel.


 Can you consider dropping this patch then?
 I'd rather avoid adding CPUs to this list needlessly from now on.

I think let's keep this patch because there are quite a few X-Gene
users on older kernels.

Regards,
Anup


 Regards,
 Andre.


 Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
 Signed-off-by: Anup Patel anup.pa...@linaro.org
 ---
  tools/kvm/arm/aarch64/arm-cpu.c |9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

 diff --git a/tools/kvm/arm/aarch64/arm-cpu.c b/tools/kvm/arm/aarch64/arm-cpu.c
 index ce5ea2f..ce526e3 100644
 --- a/tools/kvm/arm/aarch64/arm-cpu.c
 +++ b/tools/kvm/arm/aarch64/arm-cpu.c
 @@ -41,10 +41,17 @@ static struct kvm_arm_target target_cortex_a57 = {
   .init   = arm_cpu__vcpu_init,
  };

 +static struct kvm_arm_target target_potenza = {
 + .id = KVM_ARM_TARGET_XGENE_POTENZA,
 + .compatible = "arm,arm-v8",
 + .init   = arm_cpu__vcpu_init,
 +};
 +
  static int arm_cpu__core_init(struct kvm *kvm)
  {
   return (kvm_cpu__register_kvm_arm_target(&target_aem_v8) ||
   kvm_cpu__register_kvm_arm_target(&target_foundation_v8) ||
 - kvm_cpu__register_kvm_arm_target(&target_cortex_a57));
 + kvm_cpu__register_kvm_arm_target(&target_cortex_a57) ||
 + kvm_cpu__register_kvm_arm_target(&target_potenza));
  }
  core_init(arm_cpu__core_init);


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/4] kvmtool: Handle exit reason KVM_EXIT_SYSTEM_EVENT

2014-09-16 Thread Anup Patel
On Thu, Sep 11, 2014 at 9:56 PM, Andre Przywara andre.przyw...@arm.com wrote:

 On 08/09/14 09:17, Anup Patel wrote:
 The KVM_EXIT_SYSTEM_EVENT exit reason was added to define
 architecture-independent system-wide events for a Guest.

 Currently, it is used by the in-kernel PSCI-0.2 emulation of
 KVM ARM/ARM64 to inform user space about a PSCI SYSTEM_OFF
 or PSCI SYSTEM_RESET request.

 For now, we simply treat all system-wide guest events as
 shutdown requests in KVMTOOL.

 Is that really a good idea to default to exit_kvm?
 I find a shutdown a rather drastic default.
 Also I'd like to see RESET not easily mapped to shutdown. If the user
 resets the box explicitly, it's probably expected to come up again (to
 load an updated kernel or proceed with an install).

Absolutely correct, but we don't have a VM reboot API in KVMTOOL,
so I chose this route.

 So what about a more explicit message like: ... please restart the VM
 until we gain proper reboot support in kvmtool?

Sure, I will print additional info for reset event such as:
INFO: VM reboot support not available
INFO: Please restart the VM manually

Regards,
Anup


 Regards,
 Andre.

 Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
 Signed-off-by: Anup Patel anup.pa...@linaro.org
 ---
  tools/kvm/kvm-cpu.c |   19 +++
  1 file changed, 19 insertions(+)

 diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
 index ee0a8ec..6d01192 100644
 --- a/tools/kvm/kvm-cpu.c
 +++ b/tools/kvm/kvm-cpu.c
 @@ -160,6 +160,25 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
   goto exit_kvm;
   case KVM_EXIT_SHUTDOWN:
   goto exit_kvm;
 + case KVM_EXIT_SYSTEM_EVENT:
 + /*
 +  * Print the type of system event and
 +  * treat all system events as shutdown request.
 +  */
 + switch (cpu->kvm_run->system_event.type) {
 + case KVM_SYSTEM_EVENT_SHUTDOWN:
 + printf("  # Info: shutdown system event\n");
 + break;
 + case KVM_SYSTEM_EVENT_RESET:
 + printf("  # Info: reset system event\n");
 + break;
 + default:
 + printf("  # Warning: unknown system event type=%d\n",
 +cpu->kvm_run->system_event.type);
 + break;
 + };
 + printf("  # Info: exiting KVMTOOL\n");
 + goto exit_kvm;
   default: {
   bool ret;



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-16 Thread Radim Krčmář
[Emergency posting to fix the tag and couldn't find unmangled Cc list,
 so some recipients were dropped, sorry.  (I guess you are glad though).]

2014-09-16 14:01-0700, Andres Lagar-Cavilla:
 On Tue, Sep 16, 2014 at 1:51 PM, Radim Krčmář rkrc...@redhat.com wrote:
  2014-09-15 13:11-0700, Andres Lagar-Cavilla:
  +int kvm_get_user_page_retry(struct task_struct *tsk, struct
  mm_struct *mm,
 
  The suffix '_retry' is not best suited for this.
  On first reading, I imagined we will be retrying something from
  before,
  possibly calling it in a loop, but we are actually doing the first and
  last try in one call.
 
 We are doing ... the second and third in most scenarios. async_pf did
 the first with _NOWAIT. We call this from the async pf retrier, or if
 async pf couldn't be notified to the guest.

I was thinking more about what the function does, not how we currently
use it -- nothing prevents us from using it as the first try somewhere --
but yeah, even comments would be off then.

  Apart from this, the patch looks good.  The mm/ parts are minimal, so
  I
  think it's best to merge it through the KVM tree with someone's
  Acked-by.
 
  I would prefer to have the last hunk in a separate patch, but still,
 
  Acked-by: Radim Krčmář rkrc...@redhat.com
 
 Awesome, thanks much.
 
 I'll recut with the VM_BUG_ON from Paolo and your Ack. LMK if anything
 else from this email should go into the recut.

Ah, sorry, I'm not maintaining mm ... what I meant was

Reviewed-by: Radim Krčmář rkrc...@redhat.com

and I had to leave before I could find a good apology for
VM_WARN_ON_ONCE(), so if you are replacing BUG_ON, you might want to
look at that one as well.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/4] kvmtool: ARM: Use KVM_ARM_PREFERRED_TARGET vm ioctl to determine target cpu

2014-09-16 Thread Anup Patel
On Wed, Sep 17, 2014 at 3:43 AM, Anup Patel apa...@apm.com wrote:
 On Thu, Sep 11, 2014 at 9:24 PM, Andre Przywara andre.przyw...@arm.com 
 wrote:
 Hi Anup,

 On 08/09/14 09:17, Anup Patel wrote:
  Instead of trying out each and every target type, we should
  use the KVM_ARM_PREFERRED_TARGET vm ioctl to determine the target type
  for KVM ARM/ARM64.

  If the KVM_ARM_PREFERRED_TARGET vm ioctl fails, then we fall back to the
  old method of trying all known target types.

  If the KVM_ARM_PREFERRED_TARGET vm ioctl succeeds but the returned
  target type is not known to KVMTOOL, then we forcefully init the
  VCPU with the target type returned by the KVM_ARM_PREFERRED_TARGET vm ioctl.

 Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
 Signed-off-by: Anup Patel anup.pa...@linaro.org
 ---
  tools/kvm/arm/kvm-cpu.c |   52 
 +--
  1 file changed, 41 insertions(+), 11 deletions(-)

 diff --git a/tools/kvm/arm/kvm-cpu.c b/tools/kvm/arm/kvm-cpu.c
 index aeaa4cf..ba7a762 100644
 --- a/tools/kvm/arm/kvm-cpu.c
 +++ b/tools/kvm/arm/kvm-cpu.c
 @@ -33,7 +33,8 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
  	struct kvm_arm_target *target;
  	struct kvm_cpu *vcpu;
  	int coalesced_offset, mmap_size, err = -1;
 -	unsigned int i;
 +	unsigned int i, target_type;
 +	struct kvm_vcpu_init preferred_init;
  	struct kvm_vcpu_init vcpu_init = {
  		.features = ARM_VCPU_FEATURE_FLAGS(kvm, cpu_id)
  	};
 @@ -55,19 +56,47 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
  	if (vcpu->kvm_run == MAP_FAILED)
  		die("unable to mmap vcpu fd");
 
 -	/* Find an appropriate target CPU type. */
 -	for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
 -		if (!kvm_arm_targets[i])
 -			continue;
 -		target = kvm_arm_targets[i];
 -		vcpu_init.target = target->id;
 -		err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);
 -		if (!err)
 -			break;
 +	/*
 +	 * If preferred target ioctl successful then use preferred target
 +	 * else try each and every target type.
 +	 */
 +	err = ioctl(kvm->vm_fd, KVM_ARM_PREFERRED_TARGET, &preferred_init);
 +	if (!err) {
 +		/* Match preferred target CPU type. */
 +		target = NULL;
 +		for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
 +			if (!kvm_arm_targets[i])
 +				continue;
 +			if (kvm_arm_targets[i]->id == preferred_init.target) {
 +				target = kvm_arm_targets[i];
 +				target_type = kvm_arm_targets[i]->id;
 +				break;
 +			}
 +		}
 +		if (!target) {
 +			target = kvm_arm_targets[0];

 I think you missed the part of the patch which adds the now magic zero
 member of kvm_arm_targets[]. A simple static initializer should work.

 Yes, good catch. I will fix this.


 + target_type = preferred_init.target;

 Can't you move that out of the loop, in front of it actually? Then you
 can get rid of the line above setting target_type as well, since you
 always use the same value now, regardless of whether you found that CPU
 in the list or not.

 Sure, I'll rearrange this.


 +		}
 +	} else {
 +		/* Find an appropriate target CPU type. */
 +		for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
 +			if (!kvm_arm_targets[i])
 +				continue;
 +			target = kvm_arm_targets[i];
 +			target_type = target->id;
 +			vcpu_init.target = target_type;
 +			err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);
 +			if (!err)
 +				break;
 +		}
 +		if (err)
 +			die("Unable to find matching target");
  	}
 
 +	vcpu_init.target = target_type;
 +	err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);

 You should do this only in the if-branch above, since you (try to) call
 KVM_ARM_VCPU_INIT already in the else branch before. Otherwise in the
 latter case you would do it twice.

 I was trying to follow the old code, which was doing KVM_ARM_VCPU_INIT
 twice for the same VCPU, but it makes sense to do it only once.

 Regards,
 Anup


 Regards,
 Andre.

  	if (err || target->init(vcpu))
 -		die("Unable to initialise ARM vcpu");
 +		die("Unable to initialise vcpu");
 
  	coalesced_offset = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION,
  				 KVM_CAP_COALESCED_MMIO);
 @@ -81,6 +110,7 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
  	vcpu->cpu_type		= target->id;
  	vcpu->cpu_compatible	= target->compatible;
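
Putting Andre's suggestions together, a rough sketch of how the rearranged
selection logic could look (hypothetical, not the final patch; it assumes
kvm_arm_targets[0] becomes a generic fallback entry via the static
initializer Andre mentions):

	err = ioctl(kvm->vm_fd, KVM_ARM_PREFERRED_TARGET, &preferred_init);
	if (!err) {
		/* Always init with the preferred id, even if unknown to kvmtool. */
		target_type = preferred_init.target;
		target = kvm_arm_targets[0];	/* assumed generic fallback entry */
		for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
			if (kvm_arm_targets[i] &&
			    kvm_arm_targets[i]->id == preferred_init.target) {
				target = kvm_arm_targets[i];
				break;
			}
		}
		vcpu_init.target = target_type;
		err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);
	} else {
		/* Old method: probing each target already inits the vcpu. */
		for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
			if (!kvm_arm_targets[i])
				continue;
			target = kvm_arm_targets[i];
			vcpu_init.target = target->id;
			err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);
			if (!err)
				break;
		}
	}
	if (err || target->init(vcpu))
		die("unable to initialise vcpu");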
   

Re: [PATCH v3 3/4] kvmtool: Handle exit reason KVM_EXIT_SYSTEM_EVENT

2014-09-16 Thread Anup Patel
On Wed, Sep 17, 2014 at 3:59 AM, Anup Patel apa...@apm.com wrote:
 On Thu, Sep 11, 2014 at 9:56 PM, Andre Przywara andre.przyw...@arm.com 
 wrote:

 On 08/09/14 09:17, Anup Patel wrote:
 The KVM_EXIT_SYSTEM_EVENT exit reason was added to define
 architecture-independent system-wide events for a Guest.

 Currently, it is used by the in-kernel PSCI-0.2 emulation of
 KVM ARM/ARM64 to inform user space about a PSCI SYSTEM_OFF
 or PSCI SYSTEM_RESET request.

 For now, we simply treat all system-wide guest events as a
 shutdown request in KVMTOOL.

 Is that really a good idea to default to exit_kvm?
 I find a shutdown a rather drastic default.
 Also I'd like to see RESET not easily mapped to shutdown. If the user
 resets the box explicitly, it's probably expected to come up again (to
 load an updated kernel or proceed with an install).

 Absolutely correct, but we don't have a VM reboot API in KVMTOOL,
 so I chose this route.

 So what about a more explicit message like: ... please restart the VM
 until we gain proper reboot support in kvmtool?

 Sure, I will print additional info for the reset event, such as:
 INFO: VM reboot support not available
 INFO: Please restart the VM manually

 Regards,
 Anup


 Regards,
 Andre.

 Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
 Signed-off-by: Anup Patel anup.pa...@linaro.org
 ---
  tools/kvm/kvm-cpu.c |   19 +++
  1 file changed, 19 insertions(+)

 diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
 index ee0a8ec..6d01192 100644
 --- a/tools/kvm/kvm-cpu.c
 +++ b/tools/kvm/kvm-cpu.c
 @@ -160,6 +160,25 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
  			goto exit_kvm;
  		case KVM_EXIT_SHUTDOWN:
  			goto exit_kvm;
 +		case KVM_EXIT_SYSTEM_EVENT:
 +			/*
 +			 * Print the type of system event and
 +			 * treat all system events as shutdown request.
 +			 */
 +			switch (cpu->kvm_run->system_event.type) {
 +			case KVM_SYSTEM_EVENT_SHUTDOWN:
 +				printf("  # Info: shutdown system event\n");
 +				break;
 +			case KVM_SYSTEM_EVENT_RESET:
 +				printf("  # Info: reset system event\n");
 +				break;
 +			default:
 +				printf("  # Warning: unknown system event type=%d\n",
 +				       cpu->kvm_run->system_event.type);
 +				break;
 +			};
 +			printf("  # Info: exiting KVMTOOL\n");
 +			goto exit_kvm;
  		default: {
  			bool ret;
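
Until kvmtool gains real reboot support, the reset case with the extra
messages Anup proposes above might look like this (exact wording assumed;
a sketch rather than the final patch):

	case KVM_SYSTEM_EVENT_RESET:
		printf("  # Info: reset system event\n");
		printf("  # Info: VM reboot support not available\n");
		printf("  # Info: please restart the VM manually\n");
		break;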


 CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
 is for the sole use of the intended recipient(s) and contains information
 that is confidential and proprietary to Applied Micro Circuits Corporation 
 or its subsidiaries.
 It is to be used solely for the purpose of furthering the parties' business 
 relationship.
 All unauthorized review, use, disclosure or distribution is prohibited.
 If you are not the intended recipient, please contact the sender by reply 
 e-mail
 and destroy all copies of the original message.


Please ignore this notice. I used the wrong email in my last reply.

Regards,
Anup



Re: [PATCH v3 2/4] kvmtool: ARM64: Add target type potenza for aarch64

2014-09-16 Thread Anup Patel
On Wed, Sep 17, 2014 at 3:54 AM, Anup Patel apa...@apm.com wrote:
 On Thu, Sep 11, 2014 at 9:37 PM, Andre Przywara andre.przyw...@arm.com 
 wrote:
 Anup,

 On 08/09/14 09:17, Anup Patel wrote:
 The VCPU target type KVM_ARM_TARGET_XGENE_POTENZA is available
 in Linux-3.16-rcX or higher, hence register an aarch64 target
 type for it.

 This patch enables us to run KVMTOOL on an X-Gene Potenza host.

 Why do you need this still if the previous patch got rid of the need for
 naming each and every CPU in kvmtool?
 Do you care about kernels older than 3.12? I wouldn't bother so much
 since you'd need a much newer kvmtool anyway.

 Yes, actually the APM SW team uses a 3.12 kernel.


 Can you consider dropping this patch then?
 I'd rather avoid adding CPUs to this list needlessly from now on.

 I think let's keep this patch, because there are quite a few X-Gene
 users on older kernels.

 Regards,
 Anup


 Regards,
 Andre.


 Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
 Signed-off-by: Anup Patel anup.pa...@linaro.org
 ---
  tools/kvm/arm/aarch64/arm-cpu.c |9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

 diff --git a/tools/kvm/arm/aarch64/arm-cpu.c b/tools/kvm/arm/aarch64/arm-cpu.c
 index ce5ea2f..ce526e3 100644
 --- a/tools/kvm/arm/aarch64/arm-cpu.c
 +++ b/tools/kvm/arm/aarch64/arm-cpu.c
 @@ -41,10 +41,17 @@ static struct kvm_arm_target target_cortex_a57 = {
  	.init		= arm_cpu__vcpu_init,
  };
 
 +static struct kvm_arm_target target_potenza = {
 +	.id		= KVM_ARM_TARGET_XGENE_POTENZA,
 +	.compatible	= "arm,arm-v8",
 +	.init		= arm_cpu__vcpu_init,
 +};
 +
  static int arm_cpu__core_init(struct kvm *kvm)
  {
  	return (kvm_cpu__register_kvm_arm_target(&target_aem_v8) ||
  		kvm_cpu__register_kvm_arm_target(&target_foundation_v8) ||
 -		kvm_cpu__register_kvm_arm_target(&target_cortex_a57));
 +		kvm_cpu__register_kvm_arm_target(&target_cortex_a57) ||
 +		kvm_cpu__register_kvm_arm_target(&target_potenza));
  }
  core_init(arm_cpu__core_init);

 CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
 is for the sole use of the intended recipient(s) and contains information
 that is confidential and proprietary to Applied Micro Circuits Corporation 
 or its subsidiaries.
 It is to be used solely for the purpose of furthering the parties' business 
 relationship.
 All unauthorized review, use, disclosure or distribution is prohibited.
 If you are not the intended recipient, please contact the sender by reply 
 e-mail
 and destroy all copies of the original message.


Please ignore this notice. I used the wrong email in my last reply.

Regards,
Anup



Re: [PATCH] Using the tlb flush util function where applicable

2014-09-16 Thread Wanpeng Li
Hi Radim,
On Mon, Sep 15, 2014 at 09:33:52PM +0200, Radim Krčmář wrote:
2014-09-12 17:06-0400, Liang Chen:
 Use kvm_mmu_flush_tlb() as the other places do, to make sure the vcpu
  stat is incremented
 
 Signed-off-by: Liang Chen liangchen.li...@gmail.com
 ---

Good catch.

  arch/x86/kvm/vmx.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index bfe11cf..439682e 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -1810,7 +1810,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
  	struct desc_ptr *gdt = &__get_cpu_var(host_gdt);
  	unsigned long sysenter_esp;
  
 -	kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 +	kvm_mmu_flush_tlb(vcpu);
  	local_irq_disable();
  	crash_disable_local_vmclear(cpu);
  

And to hijack this thread ...
I noticed three other deficits in stat.tlb_flush, patch below.

Do you prefer the current behavior?

--- 8< ---
KVM: x86: count actual tlb flushes

- we count KVM_REQ_TLB_FLUSH requests, not actual flushes

So there may be multiple requests accumulated at the point of
kvm_check_request(); does your patch account for these accumulations correctly?

Regards,
Wanpeng Li 

  (KVM can have multiple requests for one flush)
- flushes from kvm_flush_remote_tlbs aren't counted
- it's easy to make a direct request by mistake

Solve these by postponing the counting to kvm_check_request().

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
(And what about a possible followup patch that replaces
 kvm_mmu_flush_tlb() with kvm_make_request() again?
 It would free the namespace a bit and we could call something
 similarly named from vcpu_enter_guest() to do the job.)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..b41fd97 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3452,7 +3452,6 @@ static void nonpaging_init_context(struct kvm_vcpu *vcpu,
 
 void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
 {
-	++vcpu->stat.tlb_flush;
   kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_flush_tlb);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 916e895..0f0ad08 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6041,8 +6041,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
   }
   if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
   kvm_mmu_sync_roots(vcpu);
-	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
+	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
+		++vcpu->stat.tlb_flush;
 		kvm_x86_ops->tlb_flush(vcpu);
+	}
 	if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
 		vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
   r = 0;
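
For context on Wanpeng's question above: kvm_make_request() only sets a bit
in vcpu->requests, and kvm_check_request() tests and clears that bit, so any
number of accumulated requests collapse into a single flush, which is exactly
what the moved counter counts. A rough sketch of the pair, assuming the
bitmap-based vcpu->requests of that era (from memory, not a verbatim copy):

	static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
	{
		/* Setting an already-set bit is idempotent: requests merge. */
		set_bit(req, &vcpu->requests);
	}

	static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
	{
		if (test_bit(req, &vcpu->requests)) {
			/* One check consumes all accumulated requests. */
			clear_bit(req, &vcpu->requests);
			return true;
		}
		return false;
	}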


Re: [RFC PATCH 07/17] COLO buffer: implement colo buffer as well as QEMUFileOps based on it

2014-09-16 Thread Hongyang Yang

Hi

On 08/01/2014 10:52 PM, Dr. David Alan Gilbert wrote:

* Yang Hongyang (yan...@cn.fujitsu.com) wrote:

We need a buffer to store migration data.

On the save side:
   all saved data is written into the colo buffer first, so that we can know
the total size of the migration data. This also separates the data
transmission from the colo control data; we use colo control data over a
socket fd to synchronize both sides' state.

On the restore side:
   all migration data is read into the colo buffer first, and then loaded
from the buffer: if a network error happens during data transmission,
the slave side can still function because the migration data is not yet
loaded.


This is very similar to the QEMUSizedBuffer-based QEMUFiles that Stefan Berger
wrote and that I use in both my postcopy and BER patchsets:

  http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg00846.html

  (and to the similar code from Isaku Yamahata).

I think we should be able to use a shared version even if we need some changes.


I've been using this patch as the COLO buffer; see my latest branch:
https://github.com/macrosheep/qemu/tree/colo_v0.4

But this QEMUSizedBuffer does not exactly match our needs, although it can work.
We need a static buffer that lives through the whole COLO process, so that we
don't need to alloc/free buffers at every checkpoint. Currently, opening the
buffer for write always allocates a new buffer, so I think I need the following
modifications (see the sketch after this list):

1. the ability to open an existing QEMUSizedBuffer for write
2. a reset_buf() API that clears the buffered data; or just rewinding the
   QEMUFile related to the buffer would be enough.
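
To make the intent concrete, here is a self-contained sketch of that reuse
pattern (all names are hypothetical, not the QEMU or QEMUSizedBuffer API):
one long-lived buffer is opened for write at each checkpoint and reset
afterwards, instead of being reallocated:

	#include <stdint.h>
	#include <stdlib.h>
	#include <string.h>

	typedef struct ColoBuffer {
	    uint8_t *data;
	    size_t size;   /* allocated capacity */
	    size_t used;   /* bytes written during the current checkpoint */
	} ColoBuffer;

	/* Allocate once, when COLO starts. */
	static int colo_buffer_init(ColoBuffer *b, size_t size)
	{
	    b->data = malloc(size);
	    b->size = size;
	    b->used = 0;
	    return b->data ? 0 : -1;
	}

	/* "Open for write" on the existing buffer: just rewind, no realloc. */
	static void colo_buffer_reset(ColoBuffer *b)
	{
	    b->used = 0;
	}

	static int colo_buffer_write(ColoBuffer *b, const void *buf, size_t len)
	{
	    if (b->used + len > b->size)
	        return -1;  /* a real version would grow, up to a maximum */
	    memcpy(b->data + b->used, buf, len);
	    b->used += len;
	    return 0;
	}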





Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
  migration-colo.c | 112 +++
  1 file changed, 112 insertions(+)

diff --git a/migration-colo.c b/migration-colo.c
index d566b9d..b90d9b6 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -11,6 +11,7 @@
  #include "qemu/main-loop.h"
  #include "qemu/thread.h"
  #include "block/coroutine.h"
 +#include "qemu/error-report.h"
  #include "migration/migration-colo.h"

  static QEMUBH *colo_bh;
@@ -20,14 +21,122 @@ bool colo_supported(void)
  return true;
  }

+/* colo buffer */
+
+#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
+#define COLO_BUFFER_MAX_SIZE (1000*1000*1000*10ULL)


Powers of 2 are nicer!
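
For illustration, power-of-two variants of the two limits could read as
follows (values assumed, not taken from the thread):

	#define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024ULL)          /* 4 MiB */
	#define COLO_BUFFER_MAX_SIZE  (8 * 1024 * 1024 * 1024ULL)   /* 8 GiB */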

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



--
Thanks,
Yang.


How KVM hypervisor allocates physical pages to the VM.

2014-09-16 Thread Steven
Dear KVM Developers:
I have some questions about how the KVM hypervisor requests and allocates
physical pages for the VM. I am using kernel version 3.2.14.

I run a microbenchmark in the VM which declares an array of a certain
size and then assigns a value to every element of the array. This causes
page faults that are captured by the hypervisor, which then allocates
machine physical pages.
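
A minimal sketch of the kind of microbenchmark described (array size and
touch pattern assumed; this is not the original source):

	#include <stdlib.h>

	int main(int argc, char **argv)
	{
		/* Array size in MB, e.g. 10, 20 or 40 as in the traces below. */
		size_t size = (argc > 1 ? (size_t)atol(argv[1]) : 10) * 1024 * 1024;
		char *array = malloc(size);
		size_t i;

		if (!array)
			return 1;
		/* Assigning to every element faults in each 4 KiB page. */
		for (i = 0; i < size; i++)
			array[i] = 1;
		free(array);
		return 0;
	}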

Depending on the array size, I got two different trace results.

(1) When the array size is 10 MB or 20 MB, the trace file has the page
allocation events for the qemu-kvm process as follows:

  qemu-kvm-14402 [004] 1912063.581683: kmalloc_node:
call_site=813df54a ptr=880779a42800 bytes_req=576
bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT node=-1

qemu-kvm-14402 [004] 1912063.581701: kmem_cache_alloc_node:
call_site=813e302c ptr=8800229ff500 bytes_req=256
bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_REPEAT node=-1

qemu-kvm-14402 [004] 1912063.581701: kmalloc_node:
call_site=813df54a ptr=880779a42800 bytes_req=576
bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT node=-1

qemu-kvm-14402 [004] 1912063.581710: kmem_cache_alloc_node:
call_site=813e302c ptr=8800229ff800 bytes_req=256
bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_REPEAT node=-1
qemu-kvm-14402 [004] 1912063.581710: kmalloc_node:
call_site=813df54a ptr=880779a47800 bytes_req=640
bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT node=-1
qemu-kvm-14402 [004] 1912063.581728: kmem_cache_alloc_node:
call_site=813e302c ptr=8800229ffe00 bytes_req=256
bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_REPEAT
 ... ...

(2) When the array size is 40 MB, the trace file has page allocation events as follows:

qemu-kvm-14450 [005] 1911006.440538: mm_page_alloc:
page=ea0002570f40 pfn=613437 order=0 migratetype=2
gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO

qemu-kvm-14450 [005] 1911006.440542: mm_page_alloc:
page=ea0002577480 pfn=613842 order=0 migratetype=2
gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO

qemu-kvm-14450 [005] 1911006.440545: mm_page_alloc:
page=ea70a3c0 pfn=115343 order=0 migratetype=2
gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO

qemu-kvm-14450 [005] 1911006.440549: mm_page_alloc:
page=ea0001a9a500 pfn=435860 order=0 migratetype=2
gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO

qemu-kvm-14450 [005] 1911006.440552: mm_page_alloc:
page=ea00016f7e80 pfn=376314 order=0 migratetype=2
gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO

qemu-kvm-14450 [005] 1911006.440556: mm_page_alloc:
page=ea0001a9d700 pfn=436060 order=0 migratetype=2
gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO

qemu-kvm-14450 [005] 1911006.440559: mm_page_alloc:
page=ea0002576880 pfn=613794 order=0 migratetype=2
gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO

qemu-kvm-14450 [005] 1911006.440563: mm_page_alloc:
page=ea00016f7b40 pfn=376301 order=0 migratetype=2
gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO

qemu-kvm-14450 [005] 1911006.440569: mm_page_alloc:
page=ea00016f7f80 pfn=376318 order=0 migratetype=2
gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO


--

When the size is 10 MB or 20 MB, it looks like KVM uses
kmem_cache_alloc_node and kmalloc_node to allocate physical pages.
However, when the size is 40 MB, the KVM hypervisor uses mm_page_alloc to
allocate physical pages. The former is based on the slab allocator,
while the latter allocates directly from the buddy allocator.

So what heuristic does KVM use to decide between the slab allocator
and allocating directly from the buddy allocator? Or is there
anything wrong with my trace file? Thanks in advance.


- Hui


Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-16 Thread Andres Lagar-Cavilla
On Tue, Sep 16, 2014 at 3:34 PM, Radim Krčmář rkrc...@redhat.com wrote:
 [Emergency posting to fix the tag and couldn't find unmangled Cc list,
  so some recipients were dropped, sorry.  (I guess you are glad though).]

 2014-09-16 14:01-0700, Andres Lagar-Cavilla:
 On Tue, Sep 16, 2014 at 1:51 PM, Radim Krčmář rkrc...@redhat.com wrote:
  2014-09-15 13:11-0700, Andres Lagar-Cavilla:
  +int kvm_get_user_page_retry(struct task_struct *tsk, struct
  mm_struct *mm,
 
  The suffix '_retry' is not best suited for this.
  On first reading, I imagined we would be retrying something from before,
  possibly calling it in a loop, but we are actually doing the first and
  last try in one call.

 We are doing ... the second and third in most scenarios. async_pf did
 the first with _NOWAIT. We call this from the async pf retrier, or if
 async pf couldn't be notified to the guest.

 I was thinking more about what the function does, not how we currently
 use it (nothing prevents us from using it as the first try somewhere), but
 yeah, even the comments would be off then.


Good point. Happy to expand comments. What about _complete? _io? _full?

  Apart from this, the patch looks good.  The mm/ parts are minimal, so I
  think it's best to merge it through the KVM tree with someone's Acked-by.
 
  I would prefer to have the last hunk in a separate patch, but still,
 
  Acked-by: Radim Krčmář rkrc...@redhat.com

 Awesome, thanks much.

 I'll recut with the VM_BUG_ON from Paolo and your Ack. LMK if anything
 else from this email should go into the recut.

 Ah, sorry, I'm not maintaining mm ... what I meant was

 Reviewed-by: Radim Krčmář rkrc...@redhat.com

Cool cool cool
Andres


 and I had to leave before I could find a good apology for
 VM_WARN_ON_ONCE(), so if you are replacing BUG_ON, you might want to
 look at that one as well.
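
For readers following the '_retry' naming discussion: a heavily hedged
sketch (reconstructed, not the actual patch) of the flow described above.
The first try happens elsewhere, in async_pf with FOLL_NOWAIT; the helper
under discussion then does the remaining tries in one call, retaking
mmap_sem if the fault had to wait for I/O, with FOLL_TRIED marking the
final attempt:

	int npages;
	int locked = 1;
	int flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_GET |
		    (write_fault ? FOLL_WRITE : 0);

	down_read(&mm->mmap_sem);
	/* "Second" try: may drop mmap_sem and sleep while I/O completes. */
	npages = __get_user_pages(tsk, mm, addr, 1, flags, pagep, NULL, &locked);
	if (!locked) {
		/* mmap_sem was released while waiting for I/O; retake it. */
		down_read(&mm->mmap_sem);
		/* "Last" try: FOLL_TRIED signals the fault was already tried. */
		npages = __get_user_pages(tsk, mm, addr, 1, flags | FOLL_TRIED,
					  pagep, NULL, NULL);
	}
	up_read(&mm->mmap_sem);
	return npages;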