date:20120703

Re: [PATCH 5/6 v5] deal with guest panicked event according to -onpanic parameter

2012-07-03 Thread Wen Congyang

At 06/28/2012 04:26 PM, Jan Kiszka Wrote:
 On 2012-06-28 03:15, Wen Congyang wrote:
 At 06/27/2012 10:39 PM, Jan Kiszka Wrote:
 On 2012-06-27 09:02, Wen Congyang wrote:
 When the guest is panicked, it will write 0x1 to the port KVM_PV_PORT.
 So if qemu reads 0x1 from this port, we can do the following four
 things according to the parameter -onpanic:
 1. emit QEVENT_GUEST_PANICKED only
 2. emit QEVENT_GUEST_PANICKED and pause the guest
 3. emit QEVENT_GUEST_PANICKED and poweroff the guest
 4. emit QEVENT_GUEST_PANICKED and reset the guest

 Note: if we emit QEVENT_GUEST_PANICKED only, and the management
 application does not receive this event (the management may not
 run when the event is emitted), the management won't know the
 guest is panicked.

 Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
 ---
  kvm-all.c       |  101 +++
  kvm-stub.c      |    9 +
  kvm.h           |    3 ++
  qemu-options.hx |   15 
  vl.c            |   10 +
  5 files changed, 138 insertions(+), 0 deletions(-)

 diff --git a/kvm-all.c b/kvm-all.c
 index f8e4328..9494dd2 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -19,6 +19,8 @@
  #include <stdarg.h>
  
  #include <linux/kvm.h>
 +#include <linux/kvm_para.h>
 +#include <asm/kvm_para.h>
  
  #include "qemu-common.h"
  #include "qemu-barrier.h"
 @@ -32,6 +34,9 @@
  #include "bswap.h"
  #include "memory.h"
  #include "exec-memory.h"
 +#include "iorange.h"
 +#include "qemu-objects.h"
 +#include "monitor.h"
  
  /* This check must be after config-host.h is included */
  #ifdef CONFIG_EVENTFD
 @@ -1931,3 +1936,99 @@ int kvm_on_sigbus(int code, void *addr)
  {
      return kvm_arch_on_sigbus(code, addr);
  }
 +
 +/* Possible values for action parameter. */
 +#define PANICKED_REPORT     1   /* emit QEVENT_GUEST_PANICKED only */
 +#define PANICKED_PAUSE      2   /* emit QEVENT_GUEST_PANICKED and pause VM */
 +#define PANICKED_POWEROFF   3   /* emit QEVENT_GUEST_PANICKED and quit VM */
 +#define PANICKED_RESET      4   /* emit QEVENT_GUEST_PANICKED and reset VM */
 +
 +static int panicked_action = PANICKED_REPORT;
 +
 +static void kvm_pv_port_read(IORange *iorange, uint64_t offset, unsigned width,
 +                             uint64_t *data)
 +{
 +    *data = (1 << KVM_PV_FEATURE_PANICKED);
 +}
 +
 +static void panicked_mon_event(const char *action)
 +{
 +    QObject *data;
 +
 +    data = qobject_from_jsonf("{ 'action': %s }", action);
 +    monitor_protocol_event(QEVENT_GUEST_PANICKED, data);
 +    qobject_decref(data);
 +}
 +
 +static void panicked_perform_action(void)
 +{
 +    switch (panicked_action) {
 +    case PANICKED_REPORT:
 +        panicked_mon_event("report");
 +        break;
 +
 +    case PANICKED_PAUSE:
 +        panicked_mon_event("pause");
 +        vm_stop(RUN_STATE_GUEST_PANICKED);
 +        break;
 +
 +    case PANICKED_POWEROFF:
 +        panicked_mon_event("poweroff");
 +        exit(0);
 +        break;
 +    case PANICKED_RESET:
 +        panicked_mon_event("reset");
 +        qemu_system_reset_request();
 +        break;
 +    }
 +}
 +
 +static void kvm_pv_port_write(IORange *iorange, uint64_t offset, unsigned width,
 +                              uint64_t data)
 +{
 +    if (data == KVM_PV_PANICKED) {
 +        panicked_perform_action();
 +    }
 +}
 +
 +static void kvm_pv_port_destructor(IORange *iorange)
 +{
 +    g_free(iorange);
 +}
 +
 +static IORangeOps pv_io_range_ops = {
 +    .read = kvm_pv_port_read,
 +    .write = kvm_pv_port_write,
 +    .destructor = kvm_pv_port_destructor,
 +};
 +
 +#if defined(KVM_PV_PORT)
 +void kvm_pv_port_init(void)
 +{
 +    IORange *pv_io_range = g_malloc(sizeof(IORange));
 +
 +    iorange_init(pv_io_range, &pv_io_range_ops, KVM_PV_PORT, 1);
 +    ioport_register(pv_io_range);

 This modeling is still not ok. We don't open-code ports anymore, we
 introduce devices. And this doesn't belong into generic code as it is

 Do you mean introducing a new device instead of I/O port?
 
 I mean encapsulating the I/O registration (PIO or MMIO) in a QOM device

A QOM device? Do you mean introducing a new device? If so, the guest would
need a driver for such a device. Another problem is that we cannot use
such a device while the kernel is starting (the device's driver is not ready yet).
If we used a new device, I think virtio-serial would be enough. The reason
I do not use virtio-serial is that I want the feature to also work while the
kernel is starting.

Thanks
Wen Congyang
 and building that device only for target archs that support it. Already
 pointed you to examples in hw/kvm/.
 
 Jan
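A minimal C sketch of the port protocol and the -onpanic dispatch described in the patch, simulated in plain userspace code rather than QEMU. Assumptions: PV_FEATURE_PANICKED (taken as bit 0) and PV_PANICKED are hypothetical stand-ins for KVM_PV_FEATURE_PANICKED and KVM_PV_PANICKED (only the 0x1 panic value comes from the commit message), parse_onpanic() is a hypothetical parser for the option (the vl.c part of the patch is not quoted here), and the actions only record the event name instead of stopping, exiting or resetting a VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-ins for KVM_PV_FEATURE_PANICKED and KVM_PV_PANICKED;
 * only the 0x1 panic value comes from the patch description.
 */
#define PV_FEATURE_PANICKED 0      /* assumed feature bit index */
#define PV_PANICKED         0x1    /* value the guest writes on panic */

/* Same action codes as the PANICKED_* defines in the patch. */
#define PANICKED_REPORT     1
#define PANICKED_PAUSE      2
#define PANICKED_POWEROFF   3
#define PANICKED_RESET      4

static int panicked_action = PANICKED_REPORT;
static const char *last_event;     /* records the event instead of emitting QMP */

/* Hypothetical -onpanic parser: returns a PANICKED_* code, or -1 if unknown. */
static int parse_onpanic(const char *arg)
{
    static const char *const names[] = { "report", "pause", "poweroff", "reset" };

    assert(arg != NULL);
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (strcmp(arg, names[i]) == 0) {
            return (int)i + 1;     /* "report" -> 1 ... "reset" -> 4 */
        }
    }
    return -1;
}

/* Port read: advertise the panic-notification feature bit to the guest. */
static uint64_t pv_port_read(void)
{
    return UINT64_C(1) << PV_FEATURE_PANICKED;
}

/* Port write: only PV_PANICKED triggers the configured action. */
static void pv_port_write(uint64_t data)
{
    if (data != PV_PANICKED) {
        return;
    }
    switch (panicked_action) {
    case PANICKED_REPORT:
        last_event = "report";
        break;
    case PANICKED_PAUSE:
        last_event = "pause";      /* QEMU would also call vm_stop() */
        break;
    case PANICKED_POWEROFF:
        last_event = "poweroff";   /* QEMU would also exit */
        break;
    case PANICKED_RESET:
        last_event = "reset";      /* QEMU would also request a reset */
        break;
    }
}
```

Writes of any other value are ignored, matching the `data == KVM_PV_PANICKED` check in kvm_pv_port_write().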
 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] mm: mmu_notifier: fix freed page still mapped in secondary MMU

2012-07-03 Thread Xiao Guangrong

mmu_notifier_release() is called when the process exits and deletes
all of its MMU notifiers. At that point, however, the pages belonging
to the process are still present in the page table and on the LRU
list, so the following race can happen:

          CPU 0                                 CPU 1
mmu_notifier_release:                    try_to_unmap:
   hlist_del_init_rcu(&mn->hlist);
                                            ptep_clear_flush_notify:
                                         mmu notifier not found
                                         free page  !!
                                         /*
                                          * At this point, the page has been
                                          * freed, but it is still mapped in
                                          * the secondary MMU.
                                          */

  mn->ops->release(mn, mm);

As a result, the machine becomes unstable and we sometimes hit this bug:
[  738.075923] BUG: Bad page state in process migrate-perf  pfn:03bec
[  738.075931] page:ea0efb00 count:0 mapcount:0 mapping:          (null) index:0x8076
[  738.075936] page flags: 0x200014(referenced|dirty)

The same issue exists in mmu_notifier_unregister().

We can call ->release before deleting the notifier to ensure that
the page has been unmapped from the secondary MMU before it is
freed.

Signed-off-by: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>
---
 mm/mmu_notifier.c |   45 +++--
 1 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 9a611d3..862b608 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -33,6 +33,24 @@
 void __mmu_notifier_release(struct mm_struct *mm)
 {
 	struct mmu_notifier *mn;
+	struct hlist_node *n;
+
+	/*
+	 * RCU here will block mmu_notifier_unregister until
+	 * ->release returns.
+	 */
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist)
+		/*
+		 * if ->release runs before mmu_notifier_unregister it
+		 * must be handled as it's the only way for the driver
+		 * to flush all existing sptes and stop the driver
+		 * from establishing any more sptes before all the
+		 * pages in the mm are freed.
+		 */
+		if (mn->ops->release)
+			mn->ops->release(mn, mm);
+	rcu_read_unlock();
 
 	spin_lock(&mm->mmu_notifier_mm->lock);
 	while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
@@ -46,23 +64,6 @@ void __mmu_notifier_release(struct mm_struct *mm)
 		 * mmu_notifier_unregister to return.
 		 */
 		hlist_del_init_rcu(&mn->hlist);
-		/*
-		 * RCU here will block mmu_notifier_unregister until
-		 * ->release returns.
-		 */
-		rcu_read_lock();
-		spin_unlock(&mm->mmu_notifier_mm->lock);
-		/*
-		 * if ->release runs before mmu_notifier_unregister it
-		 * must be handled as it's the only way for the driver
-		 * to flush all existing sptes and stop the driver
-		 * from establishing any more sptes before all the
-		 * pages in the mm are freed.
-		 */
-		if (mn->ops->release)
-			mn->ops->release(mn, mm);
-		rcu_read_unlock();
-		spin_lock(&mm->mmu_notifier_mm->lock);
 	}
 	spin_unlock(&mm->mmu_notifier_mm->lock);

@@ -284,16 +285,13 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
 {
 	BUG_ON(atomic_read(&mm->mm_count) <= 0);
 
-	spin_lock(&mm->mmu_notifier_mm->lock);
 	if (!hlist_unhashed(&mn->hlist)) {
-		hlist_del_rcu(&mn->hlist);
-
 		/*
 		 * RCU here will force exit_mmap to wait ->release to finish
 		 * before freeing the pages.
 		 */
 		rcu_read_lock();
-		spin_unlock(&mm->mmu_notifier_mm->lock);
+
 		/*
 		 * exit_mmap will block in mmu_notifier_release to
 		 * guarantee ->release is called before freeing the
@@ -302,8 +300,11 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
 		if (mn->ops->release)
 			mn->ops->release(mn, mm);
 		rcu_read_unlock();
-	} else
+
+		spin_lock(&mm->mmu_notifier_mm->lock);
+		hlist_del_rcu(&mn->hlist);
 		spin_unlock(&mm->mmu_notifier_mm->lock);
+	}
 
 	/*
 	 * Wait any running method to finish, of course including
-- 
1.7.7.6
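A single-threaded toy model of the ordering this patch establishes. It is a sketch under strong simplifications: RCU, the notifier spinlock and the concurrent try_to_unmap() path are left out, and every name (struct toy_notifier, toy_notifier_release_all(), toy_free_page()) is hypothetical. It only demonstrates the invariant the fix relies on: each ->release callback runs before its notifier is unlinked, so once the list is empty no secondary MMU still maps the page that is about to be freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page; "freed" marks that it went back to the allocator. */
struct toy_page {
    bool freed;
};

struct toy_notifier;

struct toy_notifier_ops {
    void (*release)(struct toy_notifier *n);
};

/* Stand-in for struct mmu_notifier plus the secondary MMU it represents. */
struct toy_notifier {
    const struct toy_notifier_ops *ops;
    struct toy_notifier *next;     /* singly linked stand-in for the hlist */
    struct toy_page *mapped;       /* page still mapped in the secondary MMU */
};

/* ->release: flush all secondary mappings (the "sptes"). */
static void toy_release(struct toy_notifier *n)
{
    n->mapped = NULL;
}

static const struct toy_notifier_ops toy_ops = { .release = toy_release };

/* Order used by the patch: run every ->release first, then unlink. */
static void toy_notifier_release_all(struct toy_notifier **head)
{
    for (struct toy_notifier *n = *head; n != NULL; n = n->next) {
        if (n->ops->release)
            n->ops->release(n);
    }
    while (*head != NULL) {
        struct toy_notifier *n = *head;
        *head = n->next;
        n->next = NULL;
    }
}

/* Free the page only if no secondary MMU still maps it. */
static bool toy_free_page(struct toy_page *p, struct toy_notifier *const all[],
                          size_t count)
{
    assert(p != NULL && !p->freed);
    for (size_t i = 0; i < count; i++) {
        if (all[i]->mapped == p)
            return false;          /* would leave a stale secondary mapping */
    }
    p->freed = true;
    return true;
}
```

In the buggy ordering, the notifier would be unlinked while its mapping was still live, so a concurrent freer could no longer find it and would free a page the secondary MMU still maps.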


[PATCH 1/3] KVM: fix fault page leak

2012-07-03 Thread Xiao Guangrong

fault_page was never freed; free it in kvm_exit().

Signed-off-by: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>
---
 virt/kvm/kvm_main.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 02cb440..157226d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2857,6 +2857,7 @@ void kvm_exit(void)
 	kvm_arch_hardware_unsetup();
 	kvm_arch_exit();
 	free_cpumask_var(cpus_hardware_enabled);
+	__free_page(fault_page);
 	__free_page(hwpoison_page);
 	__free_page(bad_page);
 }
-- 
1.7.7.6
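The leak is an init/exit asymmetry: teardown has to release every page that setup allocated. The sketch below is a hypothetical userspace model with a counting allocator (sim_alloc_page(), sim_free_page(), sim_init() and sim_exit() are illustrative names); since the allocation side is not in the hunk, it assumes that fault_page is allocated together with bad_page and hwpoison_page during module init.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocation counter standing in for the page allocator. */
static int pages_outstanding;

static void *sim_alloc_page(void)
{
    void *p = calloc(1, 4096);

    if (p != NULL)
        pages_outstanding++;
    return p;
}

static void sim_free_page(void *p)
{
    if (p == NULL)
        return;
    free(p);
    pages_outstanding--;
    assert(pages_outstanding >= 0);
}

/* The three per-module pages whose __free_page() calls appear in the hunk. */
static void *bad_page, *hwpoison_page, *fault_page;

static int sim_init(void)
{
    bad_page = sim_alloc_page();
    hwpoison_page = sim_alloc_page();
    fault_page = sim_alloc_page();
    return (bad_page && hwpoison_page && fault_page) ? 0 : -1;
}

/* Teardown mirrors init: every page allocated there is freed here. */
static void sim_exit(void)
{
    sim_free_page(fault_page);      /* the call the patch adds */
    sim_free_page(hwpoison_page);
    sim_free_page(bad_page);
    fault_page = hwpoison_page = bad_page = NULL;
}
```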


[PATCH 2/3] KVM: MMU: fix shrinking page from the empty mmu

2012-07-03 Thread Xiao Guangrong

Fix:

 [ 3190.059226] BUG: unable to handle kernel NULL pointer dereference at           (null)
 [ 3190.062224] IP: [a02aac66] mmu_page_zap_pte+0x10/0xa7 [kvm]
 [ 3190.063760] PGD 104f50067 PUD 112bea067 PMD 0
 [ 3190.065309] Oops:  [#1] SMP DEBUG_PAGEALLOC
 [ 3190.066860] CPU 1
[ .. ]
 [ 3190.109629] Call Trace:
 [ 3190.111342]  [a02aada6] kvm_mmu_prepare_zap_page+0xa9/0x1fc [kvm]
 [ 3190.113091]  [a02ab2f5] mmu_shrink+0x11f/0x1f3 [kvm]
 [ 3190.114844]  [a02ab25d] ? mmu_shrink+0x87/0x1f3 [kvm]
 [ 3190.116598]  [81150c9d] ? prune_super+0x142/0x154
 [ 3190.118333]  [8110a4f4] ? shrink_slab+0x39/0x31e
 [ 3190.120043]  [8110a687] shrink_slab+0x1cc/0x31e
 [ 3190.121718]  [8110ca1d] do_try_to_free_pages

This is caused by shrinking pages from an empty MMU. Although we check
n_used_mmu_pages, the check is useless because it is done outside of mmu-lock.

Signed-off-by: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 24dd43d..cac3408 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3936,6 +3936,9 @@ static void kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm,
 {
 	struct kvm_mmu_page *page;
 
+	if (list_empty(&kvm->arch.active_mmu_pages))
+		return;
+
 	page = container_of(kvm->arch.active_mmu_pages.prev,
 			    struct kvm_mmu_page, link);
 	kvm_mmu_prepare_zap_page(kvm, page, invalid_list);
-- 
1.7.7.6
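The fix relies on checking for an empty list under the same lock that protects the removal. In a kernel-style circular list_head, prev of an empty list points back at the head itself, so container_of() on it yields a bogus page pointer. A minimal userspace sketch with a pthread mutex standing in for mmu-lock; struct mmu_state and zap_oldest_page() are hypothetical names, and the real split between the locking caller and the callee is folded into one function for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Insert n right after the head (newest entries at the front). */
static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

struct mmu_state {
    pthread_mutex_t lock;            /* stands in for mmu-lock */
    struct list_head active_pages;   /* stands in for active_mmu_pages */
};

/*
 * Remove the oldest entry (the list tail). The emptiness check happens
 * under the same lock as the removal, so no other thread can empty the
 * list between the check and the use of ->prev.
 */
static struct list_head *zap_oldest_page(struct mmu_state *s)
{
    struct list_head *victim = NULL;

    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    if (!list_empty(&s->active_pages)) {
        victim = s->active_pages.prev;
        list_del(victim);
    }
    pthread_mutex_unlock(&s->lock);
    return victim;
}
```

A check done before taking the lock (like the n_used_mmu_pages test) can be invalidated by another thread before the removal runs, which is the race the commit message describes.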


[PATCH 3/3] KVM: MMU: track the refcount when unmapping the page

2012-07-03 Thread Xiao Guangrong

Trigger a WARN_ON if a page has been freed while it is still used by
the MMU; this helps us detect mm bugs early.

Signed-off-by: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cac3408..af7e076 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -527,6 +527,14 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
 		return 0;
 
 	pfn = spte_to_pfn(old_spte);
+
+	/*
+	 * KVM does not hold the refcount of the page used by
+	 * kvm mmu, before reclaiming the page, we should
+	 * unmap it from mmu first.
+	 */
+	WARN_ON(!page_count(pfn_to_page(pfn)));
+
 	if (!shadow_accessed_mask || old_spte & shadow_accessed_mask)
 		kvm_set_pfn_accessed(pfn);
 	if (!shadow_dirty_mask || (old_spte & shadow_dirty_mask))
-- 
1.7.7.6
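A toy model of the new sanity check: clearing a shadow PTE that points at a page whose reference count has already dropped to zero means the page was freed while still mapped. struct sim_page, SIM_WARN_ON() and sim_clear_spte() are hypothetical names; the macro prints and counts instead of dumping a stack trace like the kernel's WARN_ON.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for struct page: only the reference count matters here. */
struct sim_page {
    int count;
};

static int warnings;

/* Hypothetical WARN_ON stand-in: print and count instead of dumping a stack. */
#define SIM_WARN_ON(cond)                                   \
    do {                                                    \
        if (cond) {                                         \
            warnings++;                                     \
            fprintf(stderr, "WARNING: %s\n", #cond);       \
        }                                                   \
    } while (0)

/*
 * Clear a toy shadow PTE. The page it pointed to must still hold a
 * reference; a zero count means it was freed while still mapped.
 */
static void sim_clear_spte(struct sim_page **sptep)
{
    struct sim_page *pg;

    assert(sptep != NULL);
    pg = *sptep;
    if (pg == NULL)
        return;                    /* nothing mapped */
    SIM_WARN_ON(pg->count == 0);
    *sptep = NULL;
}
```

The check only detects the bug after the fact; it does not prevent the use of a freed page, which is why it complements rather than replaces the mmu_notifier fix.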


Re: [PATCH 5/6 v5] deal with guest panicked event according to -onpanic parameter

2012-07-03 Thread Jan Kiszka

On 2012-07-03 08:07, Wen Congyang wrote:
 At 06/28/2012 04:26 PM, Jan Kiszka Wrote:
 On 2012-06-28 03:15, Wen Congyang wrote:
 At 06/27/2012 10:39 PM, Jan Kiszka Wrote:
 On 2012-06-27 09:02, Wen Congyang wrote:
 [...]

 This modeling is still not ok. We don't open-code ports anymore, we
 introduce devices. And this doesn't belong into generic code as it is

 Do you mean introducing a new device instead of I/O port?

 I mean encapsulating the I/O registration (PIO or MMIO) in a QOM device
 
 A QOM device? Do you mean introducing a new device? If so, the guest would
 need a driver for such a device. Another problem is that we cannot use
 such a device while the kernel is starting (the device's driver is not ready yet).
 If we used a new device, I think virtio-serial would be enough. The reason
 I do not use virtio-serial is that I want the feature to also work while the
 kernel is starting.

I'm not talking about changing the interface to the guest, I'm talking
about how to model it in QEMU. And that difference would be transparent
to the guest. I pointed you to examples like hw/kvm/clock.c.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

Re: [RFC PATCH] Expose tsc deadline timer feature to guest

2012-07-03 Thread Jan Kiszka

On 2012-07-02 13:08, Liu, Jinsong wrote:
 Eduardo, Jan, Andreas
 
 As we discussed 3 months ago, I waited until qemu 1.1 was done before
 rewriting the patch; now it is time to rewrite it on top of qemu 1.1.
 
 Attached is an RFC patch for exposing the TSC deadline timer to the guest.
 I have checked the current qemu 1.1 code and read the recent emails about
 CPUID exposure. However, I may have missed something (there has been so
 much discussion :-), so if you think anything is wrong, please point it out.
 
 Thanks,
 Jinsong
 
 
From 8b5b003f6f8834d2d5d71e18bb47b7f089bc4928 Mon Sep 17 00:00:00 2001
 From: Liu, Jinsong <jinsong@intel.com>
 Date: Tue, 3 Jul 2012 02:35:10 +0800
 Subject: [PATCH] Expose tsc deadline timer feature to guest
 
 This patch exposes the TSC deadline timer feature to the guest if
 1) the in-kernel irqchip is used, and
 2) KVM emulates the TSC deadline timer, and
 3) the user authorizes exposing the feature via -cpu or +/- tsc-deadline
 
 Signed-off-by: Liu, Jinsong <jinsong@intel.com>
 ---
  target-i386/cpu.h |    1 +
  target-i386/kvm.c |    5 +
  2 files changed, 6 insertions(+), 0 deletions(-)
 
 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index 79cc640..d1a4a04 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -400,6 +400,7 @@
  #define CPUID_EXT_X2APIC   (1 << 21)
  #define CPUID_EXT_MOVBE    (1 << 22)
  #define CPUID_EXT_POPCNT   (1 << 23)
 +#define CPUID_EXT_TSC_DEADLINE_TIMER (1 << 24)
  #define CPUID_EXT_XSAVE    (1 << 26)
  #define CPUID_EXT_OSXSAVE  (1 << 27)
  #define CPUID_EXT_HYPERVISOR  (1 << 31)
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 0d0d8f6..52b577f 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -361,8 +361,13 @@ int kvm_arch_init_vcpu(CPUX86State *env)
      env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
  
      i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
 +    j = env->cpuid_ext_features & CPUID_EXT_TSC_DEADLINE_TIMER;
      env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
      env->cpuid_ext_features |= i;
 +    if (j && kvm_irqchip_in_kernel() &&
 +        kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
 +        env->cpuid_ext_features |= CPUID_EXT_TSC_DEADLINE_TIMER;
 +    }
  
      env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
                                                               0, R_EDX);
 

Fine with me.

Acked-by: Jan Kiszka <jan.kis...@siemens.com>

Jan
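The CPUID logic in the kvm.c hunk can be modeled as a pure function. This is a sketch, not QEMU code: effective_ext_features() and its boolean parameters are hypothetical stand-ins for kvm_irqchip_in_kernel() and kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER). It assumes, as the ordering of the patch suggests, that the supported-CPUID mask does not itself report the TSC deadline bit, which is why the requested bit is saved in j before masking and added back only when all conditions hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXT_TSC_DEADLINE_TIMER (1u << 24)
#define CPUID_EXT_HYPERVISOR         (1u << 31)

/*
 * requested:          CPUID.1:ECX bits configured for the guest (-cpu ...)
 * supported:          what the "supported CPUID" query reports
 * irqchip_in_kernel:  stand-in for kvm_irqchip_in_kernel()
 * has_cap:            stand-in for KVM_CAP_TSC_DEADLINE_TIMER support
 */
static uint32_t effective_ext_features(uint32_t requested, uint32_t supported,
                                       bool irqchip_in_kernel, bool has_cap)
{
    uint32_t hyp = requested & CPUID_EXT_HYPERVISOR;          /* "i" */
    uint32_t tsc = requested & CPUID_EXT_TSC_DEADLINE_TIMER;  /* "j" */
    uint32_t ext = requested & supported;  /* mask by what KVM reports */

    ext |= hyp;                            /* keep the hypervisor bit */
    if (tsc && irqchip_in_kernel && has_cap)
        ext |= CPUID_EXT_TSC_DEADLINE_TIMER;  /* re-add only if all hold */
    return ext;
}
```

The sketch uses `1u << 31` because shifting a signed 1 into the sign bit is undefined behaviour in standard C.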


Re: [PATCH 5/6 v5] deal with guest panicked event according to -onpanic parameter

2012-07-03 Thread Wen Congyang

At 07/03/2012 02:36 PM, Jan Kiszka Wrote:
 On 2012-07-03 08:07, Wen Congyang wrote:
 At 06/28/2012 04:26 PM, Jan Kiszka Wrote:
 On 2012-06-28 03:15, Wen Congyang wrote:
 At 06/27/2012 10:39 PM, Jan Kiszka Wrote:
 On 2012-06-27 09:02, Wen Congyang wrote:
 [...]

 This modeling is still not ok. We don't open-code ports anymore, we
 introduce devices. And this doesn't belong into generic code as it is

 Do you mean introducing a new device instead of I/O port?

 I mean encapsulating the I/O registration (PIO or MMIO) in a QOM device

 A QOM device? Do you mean introducing a new device? If so, the guest would
 need a driver for such a device. Another problem is that we cannot use
 such a device while the kernel is starting (the device's driver is not ready yet).
 If we used a new device, I think virtio-serial would be enough. The reason
 I do not use virtio-serial is that I want the feature to also work while the
 kernel is starting.
 
 I'm not talking about changing the interface to the guest, I'm talking
 about how to model it in QEMU. And that difference would be transparent
 to the guest. I pointed you to examples like hw/kvm/clock.c.

OK, I will read the code in hw/kvm/clock.c

Thanks for your help.

Wen Congyang

 
 Jan
 


Re: kvm segfaults and bad page state in 3.4.0

2012-07-03 Thread Xiao Guangrong

Hi Fengguang,

I can reproduce this bug in my test case and have posted
a patch to fix it, which can be found at:
http://marc.info/?l=linux-mm&m=134129723504527&w=2

Could you please try it?

On 06/04/2012 07:46 PM, Fengguang Wu wrote:
 Hi,
 
 I'm running lots of kvm instances for doing kernel boot tests.
 Unfortunately the test system itself is not stable enough, I got scary
 errors in both kvm and the host kernel. Like this. 
 
 [294025.795382] kvm used greatest stack depth: 2896 bytes left
 [310388.622083] kvm[1864]: segfault at c ip 7f498e9f6a81 sp 7f4994b9fca0 error 4 in kvm[7f498e96+33b000]
 [310692.050589] kvm[4332]: segfault at 10 ip 7fca662620b9 sp 7fca70472af0 error 6 in kvm[7fca661cc000+33b000]
 [312608.950120] kvm[18931]: segfault at 8 ip 7f95962a10a5 sp 7f959d777170 error 4 in kvm[7f959620b000+33b000]
 [312622.941640] kvm[19123]: segfault at 10 ip 7f406f5580b9 sp 7f4077d8b350 error 6 in kvm[7f406f4c2000+33b000]
 [313917.860951] kvm[28789]: segfault at c ip 7f718f4dfa81 sp 7f7198459520 error 4 in kvm[7f718f449000+33b000]
 [313919.177192] kvm used greatest stack depth: 2864 bytes left
 [314061.390945] kvm used greatest stack depth: 2208 bytes left
 [327479.676068] BUG: Bad page state in process kvm  pfn:59ac9


Re: [PATCH 5/6 v5] deal with guest panicked event according to -onpanic parameter

2012-07-03 Thread Jan Kiszka

On 2012-07-03 08:43, Wen Congyang wrote:
 I'm not talking about changing the interface to the guest, I'm talking
 about how to model it in QEMU. And that difference would be transparent
 to the guest. I pointed you to examples like hw/kvm/clock.c.
 
 OK, I will read the code in hw/kvm/clock.c

Just to avoid confusion: that example is only good as a minimal
framework. It does vmstate saving, which you don't need since your
device is stateless. If you want to find out how to register PIO
ranges, also check e.g. hw/pcspk.c.

Jan


Re: kvm segfaults and bad page state in 3.4.0

2012-07-03 Thread Fengguang Wu

Hi Guangrong,

On Tue, Jul 03, 2012 at 02:41:02PM +0800, Xiao Guangrong wrote:
 Hi Fengguang,
 
 I can reproduce this bug in my test case and have posted
 a patch to fix it, which can be found at:
 http://marc.info/?l=linux-mm&m=134129723504527&w=2
 
 Could you please try it?

Thank you very much! I'm glad to try it out on my compile servers.
Note that I have not encountered the bug since then (it does not seem
very reproducible), so my feedback would be more like "the patch works
well" than a confirmation that it fixed the bug for me. Sorry about that.

Thanks,
Fengguang

 On 06/04/2012 07:46 PM, Fengguang Wu wrote:
  [...]

Biweekly KVM Test report, kernel ae7a2a3f... qemu a212f79f...

2012-07-03 Thread Ren, Yongjie

Hi All,

This is the KVM upstream test result against the kvm.git next branch and the qemu-kvm.git master branch.
kvm.git next branch: ae7a2a3fb6f8b784c2752863f4f1f20c656f76fb, based on kernel 3.5.0-rc1
qemu-kvm.git master branch: a212f79fc4596570124fb864425b980c157cd001

We found 1 new bug, and 1 bug was fixed in the past two weeks.

New issue (1):
1. [RAS] vCPU hot-add makes the guest abort. 
  https://bugs.launchpad.net/qemu/+bug/1019179

Fixed issue (1):
1. network in bridge mode doesn't work in SMP Linux guest 
  https://bugs.launchpad.net/qemu/+bug/1013467
  -- Jason Wang (from Red Hat) fixed this bug.

Old issues (3):
--
1. (Nested-virt) L1 (kvm on kvm) guest panics with parameter -cpu host on the qemu command line.
  https://bugs.launchpad.net/qemu/+bug/994378
2. Can't install or boot up 32bit win8 guest.
  https://bugs.launchpad.net/qemu/+bug/1007269
3. VT-d/SR-IOV doesn't work in the guest
  https://bugzilla.kernel.org/show_bug.cgi?id=43328

Test environment:
==
  Platform       Westmere-EP    Sandybridge-EP
  CPU Cores      24             32
  Memory size    24G            32G


Best Regards,
 Yongjie Ren  (Jay)


Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-07-03 Thread Marcelo Tosatti

On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:
 flush_tlb_others_ipi depends on a lot of statics in tlb.c. Replicated
 the flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt to
 paravirtualization.
 
 Use the vcpu state information inside the kvm_flush_tlb_others to
 avoid sending ipi to pre-empted vcpus.
 
 * Do not send ipi's to offline vcpus and set flush_on_enter flag
 * For online vcpus: Wait for them to clear the flag
 
 The approach was discussed here: https://lkml.org/lkml/2012/2/20/157
 
 Suggested-by: Peter Zijlstra <a.p.zijls...@chello.nl>
 Signed-off-by: Nikunj A. Dadhania <nik...@linux.vnet.ibm.com>
 
 --
 Pseudo Algo:
 
    Write()
    =======
 
        guest_exit()
            flush_on_enter[i]=0;
            running[i] = 0;
 
        guest_enter()
            running[i] = 1;
            smp_mb();
            if(flush_on_enter[i]) {
                tlb_flush()
                flush_on_enter[i]=0;
            }
 
 
    Read()
    ======
 
        GUEST                                   KVM-HV
 
    f->flushcpumask = cpumask - me;
 
 again:
    for_each_cpu(i, f->flushmask) {
 
        if (!running[i]) {
                                                case 1:
 
                                                running[n]=1
 
                                                (cpuN does not see
                                                flush_on_enter set,
                                                guest later finds it
                                                running and sends ipi,
                                                we are fine here, need
                                                to clear the flag on
                                                guest_exit)
 
            flush_on_enter[i] = 1;
                                                case2:
 
                                                running[n]=1
                                                (cpuN - will see flush
                                                on enter and an IPI as
                                                well - addressed in patch-4)
 
            if (!running[i])
                cpu_clear(f->flushmask);        All is well, vm_enter
                                                will do the fixup
        }
                                                case 3:
                                                running[n] = 0;
 
                                                (cpuN went to sleep,
                                                we saw it as awake,
                                                ipi sent, but wait
                                                will break without
                                                zero_mask and goto
                                                again will take care)
 
    }
    send_ipi(f->flushmask)
 
    wait_a_while_for_zero_mask();
 
    if (!zero_mask)
        goto again;

Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c 
of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should 
help.
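The flush_on_enter/running handshake from the pseudo algorithm can be walked through with C11 atomics. This is a single-threaded sketch of a concurrent protocol, and struct vcpu_state's fields, guest_enter_sim(), guest_exit_sim() and flush_needs_ipi() are hypothetical names. The default sequentially consistent atomic operations order the flusher's store to flush_on_enter before its re-read of running, and the explicit fence in guest_enter_sim() mirrors the smp_mb() in the pseudo algorithm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Per-vCPU state shared between guest and host (hypothetical layout). */
struct vcpu_state {
    atomic_int running;          /* 1 while the vCPU is in guest mode */
    atomic_int flush_on_enter;   /* set by a flusher for a preempted vCPU */
    int tlb_flushes;             /* counts local flushes, for the demo */
};

/* Host side: the vCPU is scheduled out. */
static void guest_exit_sim(struct vcpu_state *v)
{
    atomic_store(&v->flush_on_enter, 0);
    atomic_store(&v->running, 0);
}

/* Host side: the vCPU re-enters and performs any deferred flush. */
static void guest_enter_sim(struct vcpu_state *v)
{
    atomic_store(&v->running, 1);
    atomic_thread_fence(memory_order_seq_cst);   /* smp_mb() */
    if (atomic_load(&v->flush_on_enter)) {
        v->tlb_flushes++;                        /* tlb_flush() */
        atomic_store(&v->flush_on_enter, 0);
    }
}

/*
 * Guest side, for one target vCPU: returns true if an IPI is still
 * needed, false if the flush was deferred to the next guest entry
 * (the "cpu_clear(f->flushmask)" case).
 */
static bool flush_needs_ipi(struct vcpu_state *v)
{
    if (!atomic_load(&v->running)) {
        atomic_store(&v->flush_on_enter, 1);
        if (!atomic_load(&v->running))
            return false;        /* still preempted: guest_enter will flush */
    }
    return true;                 /* running (or just woke up): send the IPI */
}
```

If the vCPU wakes up between the flusher's two reads of running, the flusher sends an IPI and the vCPU may also flush on entry; that double flush is the "case2" noted in the pseudo algorithm.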

  arch/x86/include/asm/kvm_para.h |    3 +-
  arch/x86/include/asm/tlbflush.h |    9 ++
  arch/x86/kernel/kvm.c           |    1 +
  arch/x86/kvm/x86.c              |   14 -
  arch/x86/mm/tlb.c               |   61 +++
  5 files changed, 86 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index f57b5cc..684a285 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -55,7 +55,8 @@ struct kvm_steal_time {
  
  struct kvm_vcpu_state {
   __u32 state;
 - __u32 pad[15];
 + __u32 flush_on_enter;
 + __u32 pad[14];
  };
  
  #define KVM_VCPU_STATE_ALIGN_BITS 5
 diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
 index c0e108e..29470bd 100644
 --- a/arch/x86/include/asm/tlbflush.h
 +++ b/arch/x86/include/asm/tlbflush.h
 @@ -119,6 +119,12 @@ static inline void native_flush_tlb_others(const struct cpumask *cpumask,
  {
  }
  
 +static inline void kvm_flush_tlb_others(const struct cpumask *cpumask,
 + struct mm_struct *mm,
 + unsigned long va)
 +{
 +}
 +
  static inline void reset_lazy_tlbstate(void)
  {
  }
 @@ -145,6 +151,9 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
  void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm, unsigned long va);
  
 +void kvm_flush_tlb_others(const struct cpumask *cpumask,
 +   struct mm_struct *mm, unsigned long va);
 +
  #define TLBSTATE_OK  1
  #define TLBSTATE_LAZY 2
  
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index

Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-07-03 Thread Marcelo Tosatti

On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:
 flush_tlb_others_ipi depends on a lot of statics in tlb.c.  Replicated
 flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt it to
 paravirtualization.
 
 Use the vcpu state information inside kvm_flush_tlb_others to
 avoid sending IPIs to pre-empted vcpus.
 
 * Do not send IPIs to offline vcpus; set the flush_on_enter flag instead
 * For online vcpus: wait for them to clear the flag
 
 The approach was discussed here: https://lkml.org/lkml/2012/2/20/157
 
 Suggested-by: Peter Zijlstra <a.p.zijls...@chello.nl>
 Signed-off-by: Nikunj A. Dadhania <nik...@linux.vnet.ibm.com>
 
 --
 Pseudo Algo:
 
 Write()
 =======
 
   guest_exit()
       flush_on_enter[i] = 0;
       running[i] = 0;
 
   guest_enter()
       running[i] = 1;
       smp_mb();
       if (flush_on_enter[i]) {
           tlb_flush();
           flush_on_enter[i] = 0;
       }
 
 
 Read()
 ======
 
   GUEST                                    KVM-HV
 
   f->flushcpumask = cpumask - me;
 
 again:
   for_each_cpu(i, f->flushmask) {
 
       if (!running[i]) {
                                            case 1:
                                            running[n] = 1
                                            (cpuN does not see flush_on_enter
                                            set; guest later finds it running
                                            and sends ipi; we are fine here,
                                            need to clear the flag on
                                            guest_exit)
           flush_on_enter[i] = 1;
                                            case 2:
                                            running[n] = 1
                                            (cpuN will see flush on enter and
                                            an IPI as well; addressed in
                                            patch-4)
           if (!running[i])
               cpu_clear(f->flushmask);     /* All is well, vm_enter
                                               will do the fixup */
       }
                                            case 3:
                                            running[n] = 0
                                            (cpuN went to sleep, we saw it as
                                            awake, ipi sent, but wait will
                                            break without zero_mask and goto
                                            again will take care)
   }
   send_ipi(f->flushmask)
 
   wait_a_while_for_zero_mask();
 
   if (!zero_mask)
       goto again;
 ---
  arch/x86/include/asm/kvm_para.h |    3 +-
  arch/x86/include/asm/tlbflush.h |    9 ++
  arch/x86/kernel/kvm.c           |    1 +
  arch/x86/kvm/x86.c              |   14 -
  arch/x86/mm/tlb.c               |   61 +++
  5 files changed, 86 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index f57b5cc..684a285 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -55,7 +55,8 @@ struct kvm_steal_time {
  
  struct kvm_vcpu_state {
   __u32 state;
 - __u32 pad[15];
 + __u32 flush_on_enter;
 + __u32 pad[14];
  };
  
  #define KVM_VCPU_STATE_ALIGN_BITS 5
 diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
 index c0e108e..29470bd 100644
 --- a/arch/x86/include/asm/tlbflush.h
 +++ b/arch/x86/include/asm/tlbflush.h
 @@ -119,6 +119,12 @@ static inline void native_flush_tlb_others(const struct cpumask *cpumask,
  {
  }
  
 +static inline void kvm_flush_tlb_others(const struct cpumask *cpumask,
 + struct mm_struct *mm,
 + unsigned long va)
 +{
 +}
 +
  static inline void reset_lazy_tlbstate(void)
  {
  }
 @@ -145,6 +151,9 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
  void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm, unsigned long va);
  
 +void kvm_flush_tlb_others(const struct cpumask *cpumask,
 +   struct mm_struct *mm, unsigned long va);
 +
  #define TLBSTATE_OK  1
  #define TLBSTATE_LAZY 2
  
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index bb686a6..66db54e 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -465,6 +465,7 @@ void __init kvm_guest_init(void)
   }

Re: [PATCH v2 5/7] KVM: Introduce PV kick in flush tlb

2012-07-03 Thread Marcelo Tosatti

On Mon, Jun 04, 2012 at 10:38:17AM +0530, Nikunj A. Dadhania wrote:
 In place of looping continuously, introduce a halt if we do not succeed
 after some time.
 
 For vcpus that were running, an IPI is sent.  If a vcpu went to sleep
 in between, we will be doing flush_on_enter (harmless). But since a
 flush IPI was already sent, it will be processed in the IPI handler,
 which might result in something undesirable, i.e. it might clear the
 flush_mask of a new request.
 
 So after sending an IPI and waiting for a while, do a halt and wait
 for a kick from the last vcpu.
 
 Signed-off-by: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
 Signed-off-by: Nikunj A. Dadhania <nik...@linux.vnet.ibm.com>

Again, was it determined from benchmarking data on the
in-guest-mode/out-guest-mode patch that this is necessary?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [ANNOUNCE] qemu-kvm-1.1.0

2012-07-03 Thread Michael Tokarev

On 03.07.2012 03:32, Marcelo Tosatti wrote:
 
 qemu-kvm-1.1.0 is now available. This release is based on the upstream
 qemu 1.1.0, plus kvm-specific enhancements. Please see the
 original QEMU 1.1.0 release announcement [1] for details.

Why haven't the recent fixes from Jan been applied?  I mean these:

http://www.spinics.net/lists/kvm/msg75076.html
http://www.spinics.net/lists/kvm/msg75074.html

Thanks,

/mjt

Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-07-03 Thread Nikunj A Dadhania

On Tue, 3 Jul 2012 04:55:35 -0300, Marcelo Tosatti <mtosa...@redhat.com> wrote:
  
 if (!zero_mask)
 goto again;
 
 Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c 
 of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should 
 help.

Sure, will get back with the results.

  +   /*
  +    * Guest might have seen us offline and would have set
  +    * flush_on_enter.
  +    */
  +   kvm_read_guest_cached(vcpu->kvm, ghc, vs, 2*sizeof(__u32));
  +   if (vs->flush_on_enter)
  +       kvm_x86_ops->tlb_flush(vcpu);
 
 
 So flush_tlb_page which was an invlpg now flushes the entire TLB. Did
 you take that into account?
 
When the vcpu is sleeping/pre-empted out, multiple requests for flush_tlb
could have happened. So when we get here, we flush the entire TLB once,
which covers all of them.

One other approach would be to queue the addresses, which raises the
question: how many requests should be queued? This would also require
more synchronization between guest and host when updating the shared
area where these addresses are stored.
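A minimal userspace sketch of the host-side Write() protocol from the pseudo-algorithm quoted earlier in this thread, showing why several flush requests made while a vcpu is out collapse into a single full flush on re-entry. struct vcpu_state, host_vcpu_enter(), host_vcpu_exit() and the tlb_flushes counter are hypothetical stand-ins for the patch's kvm_vcpu_state, the guest entry/exit paths and kvm_x86_ops->tlb_flush(); __sync_synchronize() (a GCC/Clang builtin full barrier) plays the role of smp_mb().

```c
#include <stdint.h>

/* Hypothetical mirror of the first two fields of struct kvm_vcpu_state. */
struct vcpu_state {
	uint32_t state;          /* 1 = running in guest mode */
	uint32_t flush_on_enter; /* set by the guest while we are out */
};

/* Stand-in for kvm_x86_ops->tlb_flush(): just counts full flushes. */
unsigned int tlb_flushes;

void host_vcpu_exit(struct vcpu_state *vs)
{
	vs->flush_on_enter = 0;  /* case 1: a flag set while running was served by an IPI */
	vs->state = 0;           /* guest now treats this vcpu as preempted */
}

void host_vcpu_enter(struct vcpu_state *vs)
{
	vs->state = 1;
	__sync_synchronize();    /* smp_mb(): publish state before reading the flag */
	if (vs->flush_on_enter) {
		tlb_flushes++;       /* one full flush covers every request made while out */
		vs->flush_on_enter = 0;
	}
}
```

Setting flush_on_enter several times while the vcpu is out still produces one flush on the next host_vcpu_enter(); that is the trade-off raised above, where a targeted invlpg becomes a full TLB flush.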

  +again:
  +   for_each_cpu(cpu, to_cpumask(f->flush_cpumask)) {
  +   v_state = per_cpu(vcpu_state, cpu);
  +
  +   if (!v_state->state) {
 
 Should use ACCESS_ONCE to make sure the value is not register cached.
 
  +   v_state->flush_on_enter = 1;
  +   smp_mb();
  +   if (!v_state->state)
 
 And here.
 
Sure, will add ACCESS_ONCE for both in my next version.

  +   cpumask_clear_cpu(cpu, to_cpumask(f->flush_cpumask));
  +   }
  +   }
  +
  +   if (cpumask_empty(to_cpumask(f->flush_cpumask)))
  +   goto out;
  +
  +   apic->send_IPI_mask(to_cpumask(f->flush_cpumask),
  +   INVALIDATE_TLB_VECTOR_START + sender);
  +
  +   loop = 1000;
  +   while (!cpumask_empty(to_cpumask(f->flush_cpumask)) && --loop)
  +   cpu_relax();
  +
  +   if (!cpumask_empty(to_cpumask(f->flush_cpumask)))
  +   goto again;
 
 Is this necessary in addition to the in-guest-mode/out-guest-mode
 detection? If so, why?
 
This is case 3, where we initially saw that the vcpu was running and a
flush IPI was sent to it. During this time the vcpu might be pre-empted,
so we come out of the loop=1000 with a non-empty flushmask. We then
re-verify the flushmask against the currently running vcpus, make sure
that the vcpu that was pre-empted is unmarked, and proceed out of
kvm_flush_tlb_others_ipi without waiting for sleeping/pre-empted vcpus.
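The case 3 retry described above can be modeled in userspace as follows. This is a single-threaded sketch under stated assumptions, not the patch's kvm_flush_tlb_others(): the 64-bit flush_mask, the vs[] array, send_flush_ipi() and the preempt_during_ipi hook are all hypothetical. READ_ONCE_U32() is a volatile read in the spirit of the ACCESS_ONCE() requested in review, and __sync_synchronize() stands in for smp_mb().

```c
#include <stdint.h>

#define NR_VCPUS 4
#define BIT(n) (UINT64_C(1) << (n))

/* Volatile read in the spirit of ACCESS_ONCE(): forces a fresh load. */
#define READ_ONCE_U32(x) (*(volatile uint32_t *)&(x))

/* Hypothetical mirror of the first two fields of struct kvm_vcpu_state. */
struct vcpu_state {
	uint32_t state;          /* 1 = in guest mode, 0 = preempted */
	uint32_t flush_on_enter; /* host flushes the TLB before re-entry */
};

struct vcpu_state vs[NR_VCPUS];
volatile uint64_t flush_mask;     /* vcpus that still owe us a flush */
uint64_t preempt_during_ipi;      /* simulation hook: case 3 victims */

/* Simulated IPI delivery: running targets flush and clear their bit;
 * targets in preempt_during_ipi are descheduled first (case 3). */
static void send_flush_ipi(uint64_t mask)
{
	for (int cpu = 0; cpu < NR_VCPUS; cpu++) {
		if (!(mask & BIT(cpu)))
			continue;
		if (preempt_during_ipi & BIT(cpu))
			vs[cpu].state = 0;         /* IPI left pending */
		else
			flush_mask &= ~BIT(cpu);   /* handler did the flush */
	}
	preempt_during_ipi = 0;
}

void pv_flush_tlb_others(uint64_t cpumask, int me)
{
	flush_mask = cpumask & ~BIT(me);
again:
	for (int cpu = 0; cpu < NR_VCPUS; cpu++) {
		if (!(flush_mask & BIT(cpu)) || READ_ONCE_U32(vs[cpu].state))
			continue;
		vs[cpu].flush_on_enter = 1;
		__sync_synchronize();             /* smp_mb() */
		if (!READ_ONCE_U32(vs[cpu].state))
			flush_mask &= ~BIT(cpu);      /* host flushes on entry */
	}
	if (!flush_mask)
		return;
	send_flush_ipi(flush_mask);
	for (int loop = 1000; flush_mask && --loop; )
		;                                 /* cpu_relax() */
	if (flush_mask)
		goto again;   /* case 3: recheck who is still running */
}
```

In the example run, vcpu 3 is already out and is handled through flush_on_enter, vcpu 1 handles the IPI, and vcpu 2 is preempted right after the IPI goes out; the spin times out and the second pass drops vcpu 2 from the mask. In this model, a target that keeps running but never handles the IPI would make the loop spin indefinitely; patch 5/7 in this thread replaces the continuous looping with a halt.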

Regards
Nikunj


Re: [PATCH v2 5/7] KVM: Introduce PV kick in flush tlb

2012-07-03 Thread Nikunj A Dadhania

On Tue, 3 Jul 2012 05:07:13 -0300, Marcelo Tosatti <mtosa...@redhat.com> wrote:
 On Mon, Jun 04, 2012 at 10:38:17AM +0530, Nikunj A. Dadhania wrote:
  In place of looping continuously, introduce a halt if we do not succeed
  after some time.
  
  For vcpus that were running, an IPI is sent.  If a vcpu went to sleep
  in between, we will be doing flush_on_enter (harmless). But since a
  flush IPI was already sent, it will be processed in the IPI handler,
  which might result in something undesirable, i.e. it might clear the
  flush_mask of a new request.
  
  So after sending an IPI and waiting for a while, do a halt and wait
  for a kick from the last vcpu.
  
  Signed-off-by: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
  Signed-off-by: Nikunj A. Dadhania <nik...@linux.vnet.ibm.com>
 
 Again, was it determined that this is necessary from data of 
 benchmarking on the in-guest-mode/out-guest-mode patch?
 
No, this is more of a fix to the algorithm.
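A userspace analogy of the halt-and-kick scheme from patch 5/7: the sender spins for a bounded time, then blocks, and whichever target clears the last pending bit wakes it up. This is only a sketch under stated assumptions: a pthread condition variable stands in for the halt and kick, and pending, ack_flush(), wait_for_acks() and vcpu_thread() are hypothetical names (build with -pthread).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

static _Atomic uint64_t pending;      /* bits of vcpus that have not flushed yet */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t kick = PTHREAD_COND_INITIALIZER;

/* Run by each target after flushing: clear its bit, and if it was the
 * last one outstanding, kick the halted sender. */
void ack_flush(int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;

	if (atomic_fetch_and(&pending, ~bit) == bit) {
		pthread_mutex_lock(&lock);   /* sender checks under this lock: no lost wakeup */
		pthread_cond_signal(&kick);
		pthread_mutex_unlock(&lock);
	}
}

/* Sender: spin briefly (loop = 1000, as in the patch), then "halt". */
void wait_for_acks(void)
{
	for (int loop = 1000; atomic_load(&pending) && --loop; )
		;                            /* cpu_relax() */

	pthread_mutex_lock(&lock);
	while (atomic_load(&pending))    /* recheck under the lock before sleeping */
		pthread_cond_wait(&kick, &lock);
	pthread_mutex_unlock(&lock);
}

/* Hypothetical target vcpu: handles the flush IPI, then acks. */
void *vcpu_thread(void *arg)
{
	ack_flush((int)(intptr_t)arg);
	return NULL;
}
```

Only the target that observes its own bit as the last one set sends the kick, so the sender is woken exactly once per flush request in this model.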


Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-07-03 Thread Nikunj A Dadhania

On Tue, 3 Jul 2012 05:11:35 -0300, Marcelo Tosatti <mtosa...@redhat.com> wrote:
 On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:

   arch/x86/include/asm/kvm_para.h |    3 +-
   arch/x86/include/asm/tlbflush.h |    9 ++
   arch/x86/kernel/kvm.c           |    1 +
   arch/x86/kvm/x86.c              |   14 -
   arch/x86/mm/tlb.c               |   61 +++
   5 files changed, 86 insertions(+), 2 deletions(-)
  

[...]

  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index 264f172..4714a7b 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
 
 Please split the guest and host parts (arch/x86/kernel/kvm.c etc. vs.
 arch/x86/kvm/) into separate patches.
 
Ok

 Please document guest/host interface
 (Documentation/virtual/kvm/paravirt-tlb-flush.txt, add a pointer to it
 from msr.txt).
 
Sure.


[PATCH v9 00/16] KVM/ARM Implementation

2012-07-03 Thread Christoffer Dall

The following series implements KVM support for ARM processors,
specifically on the Cortex-A15 platform.  Work is done in
collaboration between Columbia University, Virtual Open Systems and
ARM/Linaro.

The patch series applies to kvm/next, specifically commit:
ae7a2a3fb6f8b784c2752863f4f1f20c656f76fb

This is Version 9 of the patch series, but the first two versions
were reviewed outside of the KVM mailing list. Changes can also be
pulled from:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v9

A non-flattened edition of the patch series can be found at:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v9-stage

The implementation is broken up into a logical set of patches, the first
five are preparatory patches:
 1. ARM: Add mem_type prot_pte accessor
 2. ARM: ARM_VIRT_EXT config option
 3. ARM: Section based HYP idmaps
 4. KVM: Move KVM_IRQ_LINE to arch-generic code
 5. KVM: Guard code with CONFIG_MMU_NOTIFIER (repost)

KVM guys, please consider pulling the KVM generic patches as early as
possible. Thanks.

The main implementation is broken up into separate patches, the first
containing a skeleton of files, makefile changes, the basic user space
interface and KVM architecture specific stubs.  Subsequent patches
implement parts of the system as listed:
 1.  Skeleton
 2.  Reset values for the Cortex-A15 type processor
 3.  Hypervisor initialization
 4.  Hypervisor module unloading
 5.  Memory virtualization setup (hyp mode mappings and 2nd stage)
 6.  Inject IRQs and FIQs from userspace
 7.  World-switch implementation and Hyp exception vectors
 8.  Emulation framework and CP15 emulation
 9.  Handle guest user memory aborts
 10. Handle guest MMIO aborts
 11. Support guest wait-for-interrupt instructions

Testing:
Testing has been limited, but GCC has been run inside the guest to
compile a small hello-world program, which then ran successfully. For
v9, both ARM and Thumb-2 kernels were tested as host and guest, and both
a compiled-in and a kernel-module version of KVM were tested. Hardware is
still unavailable to me, so all testing has been done on ARM Fast Models.

For a guide on how to set up a testing environment and try out these
patches, see:
 http://www.virtualopensystems.com/media/pdf/kvm-arm-guide.pdf

There is an issue list available using the issue tracker on:
https://github.com/virtualopensystems/linux-kvm-arm

Additionally a few major milestones are coming up shortly:
 - Support Thumb MMIO emulation and test MMIO emulation code (under way)
 - Merge Marc Zyngier's patch series for VGIC and timers (review in
   progress)
 - Change from SMC based install to relying on booting the kernel in Hyp
   mode. This requires some larger changes, but will allow a guest
   kernel to boot with KVM configured.

Changes since v8:
 - Support cache maintenance on SMP through set/way
 - Hyp mode idmaps are now section based and happen at kernel init
 - Handle aborts in Hyp mode
 - Inject undefined exceptions into the guest on error
 - Kernel-side reset of all crucial registers
 - Specifically state which target CPU is being virtualized
 - Exit statistics in debugfs
 - Some L2CTLR cp15 emulation cleanups
 - Support spte_hva for MMU notifiers and take write faults
 - FIX: Race condition in VMID generation
 - BUG: Run exit handling code with disabled preemption
 - Save/Restore abort fault register during world switch

Changes since v7:
 - Traps accesses to ACTLR
 - Do not trap WFE execution
 - Upgrade barriers and TLB operations to inner-shareable domain
 - Restructure hyp_pgd related code to be more opaque
 - Random SMP fixes
 - Random BUG fixes
 - Improve commenting
 - Support module loading/unloading of KVM/ARM
 - Thumb-2 support for host kernel and KVM
 - Unaligned cross-page wide guest Thumb instruction fetching
 - Support ITSTATE fields in CPSR for Thumb guests
 - Document HCR settings

Changes since v6:
 - Support for MMU notifiers to not pin user pages in memory
 - Support build with log debugging
 - Bugfix: v6 clobbered r7 in init code
 - Simplify hyp code mapping
 - Cleanup of register access code
 - Table-based CP15 emulation from Rusty Russell
 - Various other bug fixes and cleanups

Changes since v5:
 - General bugfixes and nit fixes from reviews
 - Implemented re-use of VMIDs
 - Cleaned up the Hyp-mapping code to be readable by non-mm hackers
   (including myself)
 - Integrated preliminary SMP support in base patches
 - Lock-less interrupt injection and WFI support
 - Fixed signal handling while in guest (increases overall stability)

Changes since v4:
 - Addressed reviewer comments from v4
* cleanup debug and trace code
* remove printks
* fixup kvm_arch_vcpu_ioctl_run
* add trace details to mmio emulation
 - Fix from Marc Zyngier: Move kvm_guest_enter/exit into non-preemptible
   section (squashed into world-switch patch)
 - Cleanup create_hyp_mappings/remove_hyp_mappings from Marc Zyngier
   (squashed into hypervisor initialization patch)
 - Removed the

[PATCH v9 01/16] ARM: add mem_type prot_pte accessor

2012-07-03 Thread Christoffer Dall

From: Marc Zyngier <marc.zyng...@arm.com>

The KVM hypervisor mmu code requires access to the
mem_type prot_pte field when setting up page tables pointing
to a device. Unfortunately, the mem_type structure is opaque.

Add an accessor (get_mem_type_prot_pte()) to retrieve the
prot_pte value.

Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
Signed-off-by: Christoffer Dall <c.d...@virtualopensystems.com>
---
 arch/arm/include/asm/mach/map.h |1 +
 arch/arm/mm/mmu.c   |6 ++
 2 files changed, 7 insertions(+)

diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index a6efcdd..3787c9f 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -37,6 +37,7 @@ extern void iotable_init(struct map_desc *, int);
 
 struct mem_type;
 extern const struct mem_type *get_mem_type(unsigned int type);
+extern pteval_t get_mem_type_prot_pte(unsigned int type);
 /*
  * external interface to remap single page with appropriate type
  */
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index e5dad60..f7439e7 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -301,6 +301,12 @@ const struct mem_type *get_mem_type(unsigned int type)
 }
 EXPORT_SYMBOL(get_mem_type);
 
+pteval_t get_mem_type_prot_pte(unsigned int type)
+{
+   return get_mem_type(type)->prot_pte;
+}
+EXPORT_SYMBOL(get_mem_type_prot_pte);
+
 /*
  * Adjust the PMD section entries according to the CPU in use.
  */
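A self-contained sketch of the accessor pattern this patch relies on: outside mmu.c, struct mem_type is only forward-declared, so code such as the KVM hyp mmu code can obtain prot_pte through a function instead of dereferencing the struct. The table contents, the MT_*_SKETCH indices and the 64-bit pteval_t below are hypothetical (the bit patterns are made up); the NULL-returning bounds check in get_mem_type() is an assumption about the surrounding code, and, like the patch's accessor, get_mem_type_prot_pte() assumes a valid type.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pteval_t;          /* assumption: 64-bit entries, as with LPAE */

/* Private to mmu.c in the real kernel; other files only see
 * "struct mem_type;" and so cannot read ->prot_pte directly. */
struct mem_type {
	pteval_t prot_pte;
	pteval_t prot_sect;
};

enum { MT_DEVICE_SKETCH, MT_MEMORY_SKETCH };   /* hypothetical indices */

static const struct mem_type mem_types[] = {
	[MT_DEVICE_SKETCH] = { .prot_pte = 0x403, .prot_sect = 0x401 },
	[MT_MEMORY_SKETCH] = { .prot_pte = 0x70f, .prot_sect = 0x70d },
};

const struct mem_type *get_mem_type(unsigned int type)
{
	return type < sizeof(mem_types) / sizeof(mem_types[0]) ?
		&mem_types[type] : NULL;
}

/* The accessor: exposes one field without exposing the struct layout. */
pteval_t get_mem_type_prot_pte(unsigned int type)
{
	return get_mem_type(type)->prot_pte;
}
```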


[PATCH v9 02/16] ARM: Add config option ARM_VIRT_EXT

2012-07-03 Thread Christoffer Dall

Select this option for ARM processors equipped with hardware
Virtualization Extensions.

Signed-off-by: Christoffer Dall <c.d...@virtualopensystems.com>
---
 arch/arm/mm/Kconfig |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 101b968..037dc53 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -597,6 +597,16 @@ config ARM_LPAE
 
  If unsure, say N.
 
+config ARM_VIRT_EXT
+   bool "Support for ARM Virtualization Extensions"
+   depends on ARM_LPAE
+   help
+ Say Y if you have an ARMv7 processor supporting the ARM hardware
+ Virtualization extensions. KVM depends on this feature and will
+ not run without it being selected. If you say Y here, the kernel
+ will not boot on a machine without virtualization extensions and
+ will not boot as a KVM guest.
+
 config ARCH_PHYS_ADDR_T_64BIT
def_bool ARM_LPAE
 


[PATCH v9 03/16] ARM: Section based HYP idmap

2012-07-03 Thread Christoffer Dall

Add a HYP pgd to the core code (so it can benefit all Linux
hypervisors).

Populate this pgd with an identity mapping of the code contained
in the .hyp.idmap.text section.

Offer a method to drop this identity mapping through
hyp_idmap_teardown and re-create it through hyp_idmap_setup.

Make all the above depend on CONFIG_ARM_VIRT_EXT.

Cc: Will Deacon <will.dea...@arm.com>
Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
Signed-off-by: Christoffer Dall <c.d...@virtualopensystems.com>
---
 arch/arm/include/asm/idmap.h|7 ++
 arch/arm/include/asm/pgtable-3level-hwdef.h |1 
 arch/arm/kernel/vmlinux.lds.S   |6 ++
 arch/arm/mm/idmap.c |   88 +++
 4 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/arch/arm/include/asm/idmap.h b/arch/arm/include/asm/idmap.h
index bf863ed..a1ab8d6 100644
--- a/arch/arm/include/asm/idmap.h
+++ b/arch/arm/include/asm/idmap.h
@@ -11,4 +11,11 @@ extern pgd_t *idmap_pgd;
 
 void setup_mm_for_reboot(void);
 
+#ifdef CONFIG_ARM_VIRT_EXT
+extern pgd_t *hyp_pgd;
+
+void hyp_idmap_teardown(void);
+void hyp_idmap_setup(void);
+#endif
+
 #endif /* __ASM_IDMAP_H */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index d795282..a2d404e 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -44,6 +44,7 @@
 #define PMD_SECT_XN         (_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE   (_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ    (_AT(pmdval_t, 0))
+#define PMD_SECT_AP1        (_AT(pmdval_t, 1) << 6)
 #define PMD_SECT_TEX(x)     (_AT(pmdval_t, 0))
 
 /*
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 43a31fb..33da40a 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -19,7 +19,11 @@
ALIGN_FUNCTION();   \
VMLINUX_SYMBOL(__idmap_text_start) = .; \
*(.idmap.text)  \
-   VMLINUX_SYMBOL(__idmap_text_end) = .;
+   VMLINUX_SYMBOL(__idmap_text_end) = .;   \
+   ALIGN_FUNCTION();   \
+   VMLINUX_SYMBOL(__hyp_idmap_text_start) = .; \
+   *(.hyp.idmap.text)  \
+   VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index ab88ed4..7a944af 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -1,4 +1,6 @@
+#include <linux/module.h>
 #include <linux/kernel.h>
+#include <linux/slab.h>
 
 #include <asm/cputype.h>
 #include <asm/idmap.h>
@@ -59,11 +61,20 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
} while (pud++, addr = next, addr != end);
 }
 
-static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+static void identity_mapping_add(pgd_t *pgd, const char *text_start,
+const char *text_end, unsigned long prot)
 {
-   unsigned long prot, next;
+   unsigned long addr, end;
+   unsigned long next;
+
+   addr = virt_to_phys(text_start);
+   end = virt_to_phys(text_end);
+
+   pr_info("Setting up static %sidentity map for 0x%llx - 0x%llx\n",
+   prot ? "HYP " : "",
+   (long long)addr, (long long)end);
+   prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 
-   prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
if (cpu_architecture() >= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
prot |= PMD_BIT4;
 
@@ -78,24 +89,77 @@ extern char  __idmap_text_start[], __idmap_text_end[];
 
 static int __init init_static_idmap(void)
 {
-   phys_addr_t idmap_start, idmap_end;
-
idmap_pgd = pgd_alloc(&init_mm);
if (!idmap_pgd)
return -ENOMEM;
 
-   /* Add an identity mapping for the physical address of the section. */
-   idmap_start = virt_to_phys((void *)__idmap_text_start);
-   idmap_end = virt_to_phys((void *)__idmap_text_end);
-
-   pr_info("Setting up static identity map for 0x%llx - 0x%llx\n",
-   (long long)idmap_start, (long long)idmap_end);
-   identity_mapping_add(idmap_pgd, idmap_start, idmap_end);
+   identity_mapping_add(idmap_pgd, __idmap_text_start,
+__idmap_text_end, 0);
 
return 0;
 }
 early_initcall(init_static_idmap);
 
+#ifdef CONFIG_ARM_VIRT_EXT
+pgd_t *hyp_pgd;
+EXPORT_SYMBOL_GPL(hyp_pgd);
+
+static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr)
+{
+   pud_t *pud;
+   pmd_t *pmd;
+
+   pud = pud_offset(pgd, addr);
+   pmd = pmd_offset(pud, addr);
+   pud_clear(pud);
+   clean_pmd_entry(pmd);
+   pmd_free(NULL,
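The section-based identity map built by identity_mapping_add() can be illustrated with a small userspace model: each entry maps a section-aligned physical address to the same virtual address, ORed with the protection bits. This is a sketch assuming LPAE-style 2 MiB sections; identity_sections() is a hypothetical helper, and pmd_addr_end() here re-implements the idea of the kernel macro of the same name for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: LPAE-style 2 MiB sections (PMD_SHIFT = 21). */
#define PMD_SHIFT 21
#define PMD_SIZE  (UINT64_C(1) << PMD_SHIFT)
#define PMD_MASK  (~(PMD_SIZE - 1))

/* Next section boundary after addr, capped at end; as in the kernel,
 * the "- 1" lets end == 0 mean the top of the address space. */
static uint64_t pmd_addr_end(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + PMD_SIZE) & PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Hypothetical helper: one section descriptor per section touched by
 * [addr, end), each mapping the section to itself (VA == PA).
 * Returns the number of sections; only the first max are stored. */
size_t identity_sections(uint64_t addr, uint64_t end, uint64_t prot,
                         uint64_t *entries, size_t max)
{
	size_t n = 0;

	do {
		uint64_t next = pmd_addr_end(addr, end);

		if (n < max)
			entries[n] = (addr & PMD_MASK) | prot;
		n++;
		addr = next;
	} while (addr != end);
	return n;
}
```

A range such as 0x80008000 to 0x80208000 straddles a 2 MiB boundary, so it needs two section entries even though it is only 2 MiB long.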

[PATCH v9 04/16] KVM: Move KVM_IRQ_LINE to arch-generic code

2012-07-03 Thread Christoffer Dall

Handle KVM_IRQ_LINE and KVM_IRQ_LINE_STATUS in the generic
kvm_vm_ioctl() function and call into kvm_vm_ioctl_irq_line().

Signed-off-by: Christoffer Dall <c.d...@virtualopensystems.com>
---
 arch/ia64/kvm/kvm-ia64.c |   33 ++---
 arch/x86/kvm/x86.c   |   33 ++---
 include/linux/kvm_host.h |1 +
 virt/kvm/kvm_main.c  |   19 +++
 4 files changed, 40 insertions(+), 46 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index bd77cb5..122a4b2 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -924,6 +924,16 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
return 0;
 }
 
+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event)
+{
+   if (!irqchip_in_kernel(kvm))
+   return -ENXIO;
+
+   irq_event->status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
+ irq_event->irq, irq_event->level);
+   return 0;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
 {
@@ -963,29 +973,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
goto out;
}
break;
-   case KVM_IRQ_LINE_STATUS:
-   case KVM_IRQ_LINE: {
-   struct kvm_irq_level irq_event;
-
-   r = -EFAULT;
-   if (copy_from_user(&irq_event, argp, sizeof irq_event))
-   goto out;
-   r = -ENXIO;
-   if (irqchip_in_kernel(kvm)) {
-   __s32 status;
-   status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
-   irq_event.irq, irq_event.level);
-   if (ioctl == KVM_IRQ_LINE_STATUS) {
-   r = -EFAULT;
-   irq_event.status = status;
-   if (copy_to_user(argp, &irq_event,
-   sizeof irq_event))
-   goto out;
-   }
-   r = 0;
-   }
-   break;
-   }
case KVM_GET_IRQCHIP: {
/* 0: PIC master, 1: PIC slave, 2: IOAPIC */
struct kvm_irqchip chip;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8eacb2e..03ce386 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3149,6 +3149,16 @@ out:
return r;
 }
 
+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event)
+{
+   if (!irqchip_in_kernel(kvm))
+   return -ENXIO;
+
+   irq_event->status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
+   irq_event->irq, irq_event->level);
+   return 0;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
@@ -3255,29 +3265,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
create_pit_unlock:
mutex_unlock(&kvm->slots_lock);
break;
-   case KVM_IRQ_LINE_STATUS:
-   case KVM_IRQ_LINE: {
-   struct kvm_irq_level irq_event;
-
-   r = -EFAULT;
-   if (copy_from_user(&irq_event, argp, sizeof irq_event))
-   goto out;
-   r = -ENXIO;
-   if (irqchip_in_kernel(kvm)) {
-   __s32 status;
-   status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
-   irq_event.irq, irq_event.level);
-   if (ioctl == KVM_IRQ_LINE_STATUS) {
-   r = -EFAULT;
-   irq_event.status = status;
-   if (copy_to_user(argp, &irq_event,
-   sizeof irq_event))
-   goto out;
-   }
-   r = 0;
-   }
-   break;
-   }
case KVM_GET_IRQCHIP: {
/* 0: PIC master, 1: PIC slave, 2: IOAPIC */
struct kvm_irqchip *chip;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c7f7787..b7b3f04 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -494,6 +494,7 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
   struct
   kvm_userspace_memory_region *mem,
   int user_alloc);
+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level);
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 636bd08..1d33877 100644
--- a/virt/kvm/kvm_main.c
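Since the kvm_main.c hunk is cut off above, here is a hedged userspace model of what routing KVM_IRQ_LINE and KVM_IRQ_LINE_STATUS through one generic handler could look like, based on the per-arch code this patch removes: copy the argument in, call the arch hook, and copy the result back only for the _STATUS variant. The IRQ_LINE* constants, struct irq_level, arch_irq_line() and generic_vm_ioctl() are hypothetical stand-ins; memcpy() replaces copy_from_user()/copy_to_user(), and the irq/status union mirrors the one in struct kvm_irq_level.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

enum { IRQ_LINE = 1, IRQ_LINE_STATUS = 2 };  /* stand-ins for the ioctl numbers */

/* Mirrors struct kvm_irq_level: status overlays irq on the way back out. */
struct irq_level {
	union {
		uint32_t irq;
		int32_t status;
	};
	uint32_t level;
};

static int irqchip_in_kernel = 1;            /* hypothetical per-VM state */

/* Stand-in for an arch's kvm_vm_ioctl_irq_line(). */
static int arch_irq_line(struct irq_level *ev)
{
	if (!irqchip_in_kernel)
		return -ENXIO;
	ev->status = 1;                          /* pretend kvm_set_irq() returned 1 */
	return 0;
}

/* Generic dispatcher: one copy-in, one arch call, copy-out only for _STATUS. */
long generic_vm_ioctl(unsigned int ioctl, void *argp)
{
	struct irq_level ev;
	int r;

	switch (ioctl) {
	case IRQ_LINE_STATUS:
	case IRQ_LINE:
		memcpy(&ev, argp, sizeof ev);        /* copy_from_user() */
		r = arch_irq_line(&ev);
		if (r)
			return r;
		if (ioctl == IRQ_LINE_STATUS)
			memcpy(argp, &ev, sizeof ev);    /* copy_to_user() */
		return 0;
	default:
		return -ENOTTY;
	}
}
```

Because irq and status share storage, the plain KVM_IRQ_LINE path must not copy back, or userspace would see its irq number overwritten.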

[PATCH v9 05/16] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER

2012-07-03 Thread Christoffer Dall

From: Marc Zyngier <marc.zyng...@arm.com>

In order to avoid compilation failure when KVM is not compiled in,
guard the mmu_notifier specific sections with both CONFIG_MMU_NOTIFIER
and KVM_ARCH_WANT_MMU_NOTIFIER, as is already done in the rest of
the KVM code.

Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
Signed-off-by: Christoffer Dall <c.d...@virtualopensystems.com>
---
 include/linux/kvm_host.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b7b3f04..96aa7fb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -306,7 +306,7 @@ struct kvm {
struct hlist_head irq_ack_notifier_list;
 #endif
 
-#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
struct mmu_notifier mmu_notifier;
unsigned long mmu_notifier_seq;
long mmu_notifier_count;
@@ -781,7 +781,7 @@ struct kvm_stats_debugfs_item {
 extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;
 
-#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
 static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
 {
if (unlikely(vcpu->kvm->mmu_notifier_count))


[PATCH v9 06/16] ARM: KVM: Initial skeleton to compile KVM support

2012-07-03 Thread Christoffer Dall

Targets KVM support for Cortex A-15 processors.

Contains no real functionality but all the framework components,
make files, header files and some tracing functionality.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

"Nothing to see here. Move along, move along..."

Signed-off-by: Christoffer Dall <c.d...@virtualopensystems.com>
---
 arch/arm/Kconfig   |2 
 arch/arm/Makefile  |1 
 arch/arm/include/asm/kvm.h |   80 ++
 arch/arm/include/asm/kvm_arm.h |   22 +++
 arch/arm/include/asm/kvm_asm.h |   30 
 arch/arm/include/asm/kvm_emulate.h |  108 ++
 arch/arm/include/asm/kvm_host.h|  140 ++
 arch/arm/include/asm/unified.h |   12 ++
 arch/arm/kvm/Kconfig   |   44 ++
 arch/arm/kvm/Makefile  |   17 ++
 arch/arm/kvm/arm.c |  286 
 arch/arm/kvm/emulate.c |  127 
 arch/arm/kvm/exports.c |   19 ++
 arch/arm/kvm/guest.c   |  165 +
 arch/arm/kvm/init.S|   19 ++
 arch/arm/kvm/interrupts.S  |   19 ++
 arch/arm/kvm/mmu.c |   17 ++
 arch/arm/kvm/reset.c   |   33 
 arch/arm/kvm/trace.h   |   52 +++
 19 files changed, 1193 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/exports.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/reset.c
 create mode 100644 arch/arm/kvm/trace.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b649c59..736244c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2273,3 +2273,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
+
+source "arch/arm/kvm/Kconfig"
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 0298b00..64f1e16 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -250,6 +250,7 @@ core-$(CONFIG_VFP)  += arch/arm/vfp/
 # If we have a machine-specific directory, then include it in the build.
 core-y += arch/arm/kernel/ arch/arm/mm/ arch/arm/common/
 core-y += arch/arm/net/
+core-y += arch/arm/kvm/
 core-y += $(machdirs) $(platdirs)
 
 drivers-$(CONFIG_OPROFILE)  += arch/arm/oprofile/
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
new file mode 100644
index 000..1d0d8f1
--- /dev/null
+++ b/arch/arm/include/asm/kvm.h
@@ -0,0 +1,80 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.d...@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_H__
+#define __ARM_KVM_H__
+
+#include <asm/types.h>
+
+#define __KVM_HAVE_GUEST_DEBUG
+
+/*
+ * Modes used for short-hand mode determination in the world-switch code and
+ * in emulation code.
+ *
+ * Note: These indices do NOT correspond to the value of the CPSR mode bits!
+ */
+enum vcpu_mode {
+   MODE_FIQ = 0,
+   MODE_IRQ,
+   MODE_SVC,
+   MODE_ABT,
+   MODE_UND,
+   MODE_USR,
+   MODE_SYS
+};
+
+struct kvm_regs {
+   __u32 regs0_7[8];   /* Unbanked regs. (r0 - r7)*/
+   __u32 fiq_regs8_12[5];  /* Banked fiq regs. (r8 - r12) */
+   __u32 usr_regs8_12[5];  /* Banked usr registers (r8 - r12) */
+   __u32 reg13[6]; /* Banked r13, indexed by MODE_*/
+   __u32 reg14[6]; /* Banked r14, indexed by MODE_* */
+   __u32 reg15;
+   __u32 cpsr;
+   __u32 spsr[5];  /* Banked SPSR,  indexed by MODE_  */
+   struct {
+   __u32 c0_midr;
+   __u32 c1_sys;
+   __u32 c2_base0;
+   __u32 c2_base1;
+   __u32

[PATCH v9 07/16] ARM: KVM: Support Cortex-A15 VCPUs reset

2012-07-03 Thread Christoffer Dall

Reset all core and cp15 registers to their architecturally defined reset
values at VCPU init time.
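
The reset scheme in this patch is table driven: each entry pairs a CP15 register index with its reset value, and the loop in kvm_reset_vcpu() refuses a table whose entries are out of order. The user-space sketch below mirrors that idea under stated assumptions: the five-register enum, `reset_table`, and `reset_cp15()` are hypothetical stand-ins (the reset values are copied from the patch's Cortex-A15 table), `fprintf` replaces `kvm_err`, and C11 `_Static_assert` replaces the patch's negative-array-size `CT_ASSERT` trick.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, reduced register index space; the patch uses nr_cp15_regs. */
enum cp15_reg { C0_MIDR, C1_SCTLR, C2_TTBR0, C10_PRRR, C10_NMRR, NR_CP15_REGS };

#define UNKNOWN 0xdecafbadU /* poison value for architecturally UNKNOWN resets */

/* {index, reset value} pairs; values copied from the patch's A15 table. */
static const uint32_t reset_table[][2] = {
	{ C0_MIDR,  0x412FC0F0U },
	{ C1_SCTLR, 0x00C50078U },
	{ C2_TTBR0, UNKNOWN },
	{ C10_PRRR, 0x00098AA4U },
	{ C10_NMRR, 0x44E048E0U },
};

/* Fail the build if the table does not cover every register (cf. CT_ASSERT). */
_Static_assert(sizeof(reset_table) / sizeof(reset_table[0]) == NR_CP15_REGS,
	       "reset table must have one entry per cp15 register");

/* Copy reset values into cp15[], rejecting a table whose entries are out of order. */
static int reset_cp15(uint32_t cp15[NR_CP15_REGS], const uint32_t (*table)[2])
{
	for (unsigned int i = 0; i < NR_CP15_REGS; i++) {
		if (table[i][0] != i) {
			fprintf(stderr, "CP15 field %u is %u, expected %u\n",
				i, (unsigned int)table[i][0], i);
			return -ENXIO;
		}
		cp15[i] = table[i][1];
	}
	return 0;
}
```

Keeping the index in each entry costs a word per register but turns a silent mis-ordering of the table into a hard error at VCPU init.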

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h |6 ++
 arch/arm/kvm/exports.c |2 +
 arch/arm/kvm/reset.c   |  100 
 3 files changed, 108 insertions(+)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index c5bbef0..2f9d28e 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -19,4 +19,10 @@
 #ifndef __ARM_KVM_ARM_H__
 #define __ARM_KVM_ARM_H__
 
+/* Supported Processor Types */
+#define CORTEX_A15 (0xC0F)
+
+/* Multiprocessor Affinity Register */
+#define MPIDR_CPUID    (0x3 << 0)
+
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index 01a2e41..3e38c95 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -17,3 +17,5 @@
  */
 
 #include <linux/module.h>
+
+EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
index c250024..78488be 100644
--- a/arch/arm/kvm/reset.c
+++ b/arch/arm/kvm/reset.c
@@ -15,6 +15,73 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/sched.h>
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+
+#include <asm/unified.h>
+#include <asm/ptrace.h>
+#include <asm/cputype.h>
+#include <asm/kvm_arm.h>
+
+#define CT_ASSERT(expr, name) extern char name[(expr) ? 1 : -1]
+#define CP15_REGS_ASSERT(_array, _name) \
+   CT_ASSERT((sizeof(_array) / sizeof(_array[0])) == nr_cp15_regs, _name)
+#define UNKNOWN 0xdecafbad
+
+/**
+ * Cortex-A15 Register Reset Values
+ */
+
+static const int a15_max_cpu_idx = 3;
+
+static struct kvm_vcpu_regs a15_regs_reset = {
+   .cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
+};
+
+static u32 a15_cp15_regs_reset[][2] = {
+   { c0_MIDR,  0x412FC0F0 },
+   { c0_MPIDR, 0x }, /* see kvm_arch_vcpu_init */
+   { c1_SCTLR, 0x00C50078 },
+   { c1_ACTLR, 0x },
+   { c1_CPACR, 0x },
+   { c2_TTBR0, UNKNOWN },
+   { c2_TTBR0_high,UNKNOWN },
+   { c2_TTBR1, UNKNOWN },
+   { c2_TTBR1_high,UNKNOWN },
+   { c2_TTBCR, 0x },
+   { c3_DACR,  UNKNOWN },
+   { c5_DFSR,  UNKNOWN },
+   { c5_IFSR,  UNKNOWN },
+   { c5_ADFSR, UNKNOWN },
+   { c5_AIFSR, UNKNOWN },
+   { c6_DFAR,  UNKNOWN },
+   { c6_IFAR,  UNKNOWN },
+   { c10_PRRR, 0x00098AA4 },
+   { c10_NMRR, 0x44E048E0 },
+   { c12_VBAR, 0x },
+   { c13_CID,  0x },
+   { c13_TID_URW,  UNKNOWN },
+   { c13_TID_URO,  UNKNOWN },
+   { c13_TID_PRIV, UNKNOWN },
+};
+CP15_REGS_ASSERT(a15_cp15_regs_reset, a15_cp15_regs_reset_init);
+
+static void a15_reset_vcpu(struct kvm_vcpu *vcpu)
+{
+   /*
+    * Compute guest MPIDR:
+    * (Even if we present only one VCPU to the guest on an SMP
+    * host we don't set the U bit in the MPIDR, or vice versa, as
+    * revealing the underlying hardware properties is likely to
+    * be the best choice).
+    */
+   vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~MPIDR_CPUID)
+   | (vcpu->vcpu_id & MPIDR_CPUID);
+}
+
 
 
/***
  * Exported reset function
@@ -29,5 +96,38 @@
  */
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 {
+   unsigned int i;
+   struct kvm_vcpu_regs *cpu_reset;
+   u32 (*cp15_reset)[2];
+   void (*cpu_reset_vcpu)(struct kvm_vcpu *vcpu);
+
+   switch (kvm_target_cpu()) {
+   case CORTEX_A15:
+   if (vcpu->vcpu_id > a15_max_cpu_idx)
+   return -EINVAL;
+   cpu_reset = &a15_regs_reset;
+   cp15_reset = a15_cp15_regs_reset;
+   cpu_reset_vcpu = a15_reset_vcpu;
+   break;
+   default:
+   return -ENODEV;
+   }
+
+   /* Reset core registers */
+   memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
+
+   /* Reset CP15 registers */
+   for (i = 0; i < nr_cp15_regs; i++) {
+   if (cp15_reset[i][0] != i) {
+   kvm_err("CP15 field %d is %d, expected %d\n",
+   i, cp15_reset[i][0], i);
+   return -ENXIO;
+   }
+   vcpu->arch.cp15[i] = cp15_reset[i][1];
+   }
+
+   /* Physical CPU specific runtime reset

[PATCH v9 08/16] ARM: KVM: Hypervisor initialization

2012-07-03 Thread Christoffer Dall

Sets up the required registers to run code in HYP-mode from the kernel.

By setting the HVBAR the kernel can execute code in Hyp-mode with
the MMU disabled. The HVBAR initially points to initialization code,
which initializes other Hyp-mode registers and enables the MMU
for Hyp-mode. Afterwards, the HVBAR is changed to point to KVM
Hyp vectors used to catch guest faults and to switch to Hyp mode
to perform a world-switch into a KVM guest.

Also provides memory mapping code to map required code pages and data
structures accessed in Hyp mode at the same virtual addresses as in the
host kernel, while conforming to the architectural requirements for
translations in Hyp mode. This interface is added in
arch/arm/kvm/mmu.c and consists of:
 - create_hyp_mappings(from, to);
 - free_hyp_pmds();
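
create_hyp_mappings(from, to) is later called with byte ranges such as `(kvm, kvm + 1)`, so whatever it maps presumably has to cover every page those bytes touch. A minimal sketch of that page-rounding arithmetic, assuming 4 KiB pages; `hyp_page_range()` is a hypothetical helper and not part of the patch, which additionally has to allocate and fill the Hyp page-table levels.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* assumption: 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/*
 * Hypothetical helper: compute the first page and the number of pages
 * covering the half-open byte range [from, to). A structure that straddles
 * a page boundary needs two mappings even if it is smaller than a page.
 */
static uint64_t hyp_page_range(uint64_t from, uint64_t to, uint64_t *first_page)
{
	uint64_t start = from & PAGE_MASK;                 /* round down */
	uint64_t end = (to + PAGE_SIZE - 1) & PAGE_MASK;   /* round up */

	*first_page = start;
	return (end - start) >> PAGE_SHIFT;
}
```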

Note: The initialization mechanism currently relies on an SMC #0 call
to the secure monitor, which was merely a fast way of getting to the
hypervisor. We are working on supporting Hyp mode boot of the kernel
and control of Hyp mode through a local kernel mechanism.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h  |   99 
 arch/arm/include/asm/kvm_asm.h  |   22 
 arch/arm/include/asm/kvm_mmu.h  |   35 ++
 arch/arm/include/asm/pgtable-3level-hwdef.h |4 +
 arch/arm/include/asm/pgtable-3level.h   |4 +
 arch/arm/include/asm/pgtable.h  |1 
 arch/arm/kvm/arm.c  |  163 +++
 arch/arm/kvm/exports.c  |   13 ++
 arch/arm/kvm/init.S |  102 +
 arch/arm/kvm/interrupts.S   |   47 
 arch/arm/kvm/mmu.c  |  151 +
 mm/memory.c |2 
 12 files changed, 642 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/include/asm/kvm_mmu.h

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 2f9d28e..56f5c85 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -19,10 +19,109 @@
 #ifndef __ARM_KVM_ARM_H__
 #define __ARM_KVM_ARM_H__
 
+#include <asm/types.h>
+
 /* Supported Processor Types */
 #define CORTEX_A15 (0xC0F)
 
 /* Multiprocessor Affinity Register */
 #define MPIDR_CPUID    (0x3 << 0)
 
+/* Hyp Configuration Register (HCR) bits */
+#define HCR_TGE    (1 << 27)
+#define HCR_TVM    (1 << 26)
+#define HCR_TTLB   (1 << 25)
+#define HCR_TPU    (1 << 24)
+#define HCR_TPC    (1 << 23)
+#define HCR_TSW    (1 << 22)
+#define HCR_TAC    (1 << 21)
+#define HCR_TIDCP  (1 << 20)
+#define HCR_TSC    (1 << 19)
+#define HCR_TID3   (1 << 18)
+#define HCR_TID2   (1 << 17)
+#define HCR_TID1   (1 << 16)
+#define HCR_TID0   (1 << 15)
+#define HCR_TWE    (1 << 14)
+#define HCR_TWI    (1 << 13)
+#define HCR_DC     (1 << 12)
+#define HCR_BSU    (3 << 10)
+#define HCR_BSU_IS (1 << 10)
+#define HCR_FB     (1 << 9)
+#define HCR_VA     (1 << 8)
+#define HCR_VI     (1 << 7)
+#define HCR_VF     (1 << 6)
+#define HCR_AMO    (1 << 5)
+#define HCR_IMO    (1 << 4)
+#define HCR_FMO    (1 << 3)
+#define HCR_PTW    (1 << 2)
+#define HCR_SWIO   (1 << 1)
+#define HCR_VM     1
+
+/*
+ * The bits we set in HCR:
+ * TAC:Trap ACTLR
+ * TSC:Trap SMC
+ * TSW:Trap cache operations by set/way
+ * TWI:Trap WFI
+ * BSU_IS: Upgrade barriers to the inner shareable domain
+ * FB: Force broadcast of all maintenance operations
+ * AMO:Override CPSR.A and enable signaling with VA
+ * IMO:Override CPSR.I and enable signaling with VI
+ * FMO:Override CPSR.F and enable signaling with VF
+ * SWIO:   Turn set/way invalidates into set/way clean+invalidate
+ */
+#define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
+   HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
+   HCR_SWIO)
+
+/* Hyp System Control Register (HSCTLR) bits */
+#define HSCTLR_TE   (1 << 30)
+#define HSCTLR_EE   (1 << 25)
+#define HSCTLR_FI   (1 << 21)
+#define HSCTLR_WXN  (1 << 19)
+#define HSCTLR_I    (1 << 12)
+#define HSCTLR_C    (1 << 2)
+#define HSCTLR_A    (1 << 1)
+#define HSCTLR_M    1
+#define HSCTLR_MASK (HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
+                     HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
+
+/* TTBCR and HTCR Registers bits */
+#define TTBCR_EAE   (1 << 31)
+#define TTBCR_IMP   (1 << 30)
+#define TTBCR_SH1   (3 << 28)
+#define TTBCR_ORGN1 (3 << 26)
+#define TTBCR_IRGN1 (3 << 24)
+#define TTBCR_EPD1  (1 << 23)
+#define TTBCR_A1    (1 << 22)
+#define TTBCR_T1SZ  (3 << 16)
+#define

[PATCH v9 10/16] ARM: KVM: Memory virtualization setup

2012-07-03 Thread Christoffer Dall

From: Christoffer Dall cd...@cs.columbia.edu

This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);

Further, each entry in the TLBs and caches is tagged with a VMID
identifier in addition to ASIDs. VMIDs are assigned consecutively
to VMs in the order in which the VMs are executed, and caches and TLBs
are invalidated when the VMID space has been used up, to allow for more
than 255 simultaneously running guests.
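
The VMID scheme amounts to a generation counter. A minimal, single-threaded sketch, assuming 8-bit VMIDs with VMID 0 left to the host (the Hyp-ABI in patch 12 checks that host HVC calls come from VMID==0) and generation 0 meaning "never assigned", as `kvm->arch.vmid_gen = 0` does in the patch. All names are hypothetical; locking and the actual TLB/cache invalidation are replaced by a counter.

```c
#include <assert.h>
#include <stdint.h>

#define VMID_BITS 8
#define VMID_MAX  ((1u << VMID_BITS) - 1)   /* 255; VMID 0 is left to the host */

struct vmid_state {
	uint64_t generation;   /* current global generation; 0 is never valid */
	uint32_t next_vmid;    /* next VMID to hand out in this generation */
	unsigned int flushes;  /* counts simulated TLB/cache invalidations */
};

struct vm {
	uint64_t vmid_gen;     /* 0 marks "no VMID assigned yet" */
	uint32_t vmid;
};

/* Ensure vm holds a VMID valid in the current generation (no locking shown). */
static void vm_get_vmid(struct vmid_state *s, struct vm *vm)
{
	if (vm->vmid_gen == s->generation)
		return;                          /* still valid: keep the old VMID */

	if (s->next_vmid > VMID_MAX) {
		/* Space exhausted: start a new generation and invalidate
		 * everything tagged with VMIDs from older generations. */
		s->generation++;
		s->next_vmid = 1;
		s->flushes++;
	}
	vm->vmid = s->next_vmid++;
	vm->vmid_gen = s->generation;
}
```

A VM whose generation is stale simply receives a fresh VMID the next time it is scheduled, which is why the rollover must invalidate all entries tagged with old VMIDs.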

The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
freed in kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_mmu.h |5 ++
 arch/arm/kvm/arm.c |   37 ++-
 arch/arm/kvm/mmu.c |  102 
 3 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 3a2a56c..dca7803 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -32,4 +32,9 @@
 int create_hyp_mappings(void *from, void *to);
 void free_hyp_pmds(void);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 63593ee..ce3d258 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -74,12 +74,34 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
+/**
+ * kvm_arch_init_vm - initializes a VM data structure
+ * @kvm:   pointer to the KVM struct
+ */
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+   int ret = 0;
+
if (type)
return -EINVAL;
 
-   return 0;
+   ret = kvm_alloc_stage2_pgd(kvm);
+   if (ret)
+   goto out_fail_alloc;
+   mutex_init(&kvm->arch.pgd_mutex);
+
+   ret = create_hyp_mappings(kvm, kvm + 1);
+   if (ret)
+   goto out_free_stage2_pgd;
+
+   /* Mark the initial VMID generation invalid */
+   kvm->arch.vmid_gen = 0;
+
+   return ret;
+out_free_stage2_pgd:
+   kvm_free_stage2_pgd(kvm);
+out_fail_alloc:
+   return ret;
 }
 
 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
@@ -97,10 +119,16 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
return 0;
 }
 
+/**
+ * kvm_arch_destroy_vm - destroy the VM data structure
+ * @kvm:   pointer to the KVM struct
+ */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
int i;
 
+   kvm_free_stage2_pgd(kvm);
+
 for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 if (kvm->vcpus[i]) {
 kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -176,7 +204,13 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   err = create_hyp_mappings(vcpu, vcpu + 1);
+   if (err)
+   goto vcpu_uninit;
+
return vcpu;
+vcpu_uninit:
+   kvm_vcpu_uninit(vcpu);
 free_vcpu:
kmem_cache_free(kvm_vcpu_cache, vcpu);
 out:
@@ -185,6 +219,7 @@ out:
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
+   kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 8142eb6..ddfb3df 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -162,6 +162,108 @@ out:
return err;
 }
 
+/**
+ * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
+ * @kvm:   The KVM struct pointer for the VM.
+ *
+ * Allocates the 1st level table only of size defined by PGD2_ORDER (can
+ * support either full 40-bit input addresses or limited to 32-bit input
+ * addresses). Clears the allocated pages.
+ */
+int kvm_alloc_stage2_pgd(struct kvm *kvm)
+{
+   pgd_t *pgd;
+
+   if (kvm->arch.pgd != NULL) {
+   kvm_err("kvm_arch already initialized?\n");
+   return -EINVAL;
+   }
+
+   pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
+   if (!pgd)
+   return -ENOMEM;
+
+   memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
+   kvm->arch.pgd = pgd;
+
+   return 0;
+}
+
+static void free_guest_pages(pte_t *pte, unsigned long addr)
+{
+   unsigned int i;
+   struct page *page;
+
+   for (i = 0; i < PTRS_PER_PTE; i++) {
+   if (pte_present(*pte)) {
+   page = pfn_to_page(pte_pfn(*pte));
+

[PATCH v9 11/16] ARM: KVM: Inject IRQs and FIQs from userspace

2012-07-03 Thread Christoffer Dall

From: Christoffer Dall cd...@cs.columbia.edu

Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
This ioctl is used since the semantics are in fact two lines that can be
either raised or lowered on the VCPU: the IRQ and FIQ lines.

KVM needs to know which VCPU it must operate on and whether the FIQ or
IRQ line is raised/lowered. Hence both pieces of information are packed
in the kvm_irq_level->irq field. The irq field value will be:
  IRQ: vcpu_index << 1
  FIQ: (vcpu_index << 1) | 1

This is documented in Documentation/virtual/kvm/api.txt.

The effect of the ioctl is simply to raise/lower the
corresponding irq_line field on the VCPU struct, which will cause the
world-switch code to raise/lower virtual interrupts when running the
guest on the next switch. The wait_for_interrupt flag is also cleared for
raised IRQs or FIQs, causing an idle VCPU to become active again. CPUs
in guest mode are kicked to make sure they refresh their interrupt status.
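
The irq field encoding is plain bit packing. This sketch shows both directions; in real userspace the encoded value would go into the `irq` member of `struct kvm_irq_level` (with `level` set to 1 or 0) passed to the KVM_IRQ_LINE ioctl on the VM file descriptor. `arm_irq_encode()`, `arm_irq_decode()`, and the local enum are hypothetical names mirroring `KVM_ARM_IRQ_LINE`/`KVM_ARM_FIQ_LINE`, and the range check stands in for the VCPU-index bounds check the kernel side performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Line types; values match KVM_ARM_IRQ_LINE / KVM_ARM_FIQ_LINE in the patch. */
enum { ARM_IRQ_LINE = 0, ARM_FIQ_LINE = 1 };

/* Pack a VCPU index and line type into the kvm_irq_level irq field. */
static uint32_t arm_irq_encode(uint32_t vcpu_idx, int line)
{
	return (vcpu_idx << 1) | (uint32_t)(line & 1);
}

/* Unpack the irq field; returns false if the VCPU index is out of range. */
static bool arm_irq_decode(uint32_t irq, uint32_t max_vcpus,
			   uint32_t *vcpu_idx, int *line)
{
	*vcpu_idx = irq >> 1;
	*line = (int)(irq & 1);
	return *vcpu_idx < max_vcpus;
}
```

For example, raising the FIQ line of VCPU 3 uses irq value 7, and the IRQ line of the same VCPU uses 6.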

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |   12 ++---
 arch/arm/include/asm/kvm.h|9 +++
 arch/arm/include/asm/kvm_arm.h|7 --
 arch/arm/kvm/arm.c|   47 +
 include/linux/kvm.h   |1 +
 5 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 310fe50..79c10fc 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -614,15 +614,19 @@ only go to the IOAPIC.  On ia64, a IOSAPIC is created.
 4.25 KVM_IRQ_LINE
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error
 
 Sets the level of a GSI input to the interrupt controller model in the kernel.
-Requires that an interrupt controller model has been previously created with
-KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
-to be set to 1 and then back to 0.
+On some architectures it is required that an interrupt controller model has
+been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
+interrupts require the level to be set to 1 and then back to 0.
+
+ARM uses two types of interrupt lines per CPU: IRQ and FIQ.  The value of the
+irq field should be (vcpu_index << 1) for IRQs and ((vcpu_index << 1) | 1) for
+FIQs. Level is used to raise/lower the line.
 
 struct kvm_irq_level {
union {
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index 1d0d8f1..54f301d 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -22,6 +22,15 @@
 #include <asm/types.h>
 
 #define __KVM_HAVE_GUEST_DEBUG
+#define __KVM_HAVE_IRQ_LINE
+
+/*
+ * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index.
+ */
+enum KVM_ARM_IRQ_LINE_TYPE {
+   KVM_ARM_IRQ_LINE = 0,
+   KVM_ARM_FIQ_LINE = 1,
+};
 
 /*
  * Modes used for short-hand mode determination in the world-switch code and
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 56f5c85..220f241 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -48,8 +48,10 @@
 #define HCR_BSU_IS (1 << 10)
 #define HCR_FB     (1 << 9)
 #define HCR_VA     (1 << 8)
-#define HCR_VI     (1 << 7)
-#define HCR_VF     (1 << 6)
+#define HCR_VI_BIT_NR  7
+#define HCR_VF_BIT_NR  6
+#define HCR_VI     (1 << HCR_VI_BIT_NR)
+#define HCR_VF     (1 << HCR_VF_BIT_NR)
 #define HCR_AMO    (1 << 5)
 #define HCR_IMO    (1 << 4)
 #define HCR_FMO    (1 << 3)
@@ -73,6 +75,7 @@
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
HCR_SWIO)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE  (1  30)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ce3d258..8b024ee 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -24,6 +24,7 @@
 #include <linux/fs.h>
 #include <linux/mman.h>
 #include <linux/sched.h>
+#include <linux/kvm.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -256,6 +257,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+   vcpu->cpu = cpu;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -296,6 +298,51 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
return -EINVAL;
 }
 
+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
+{
+   unsigned int vcpu_idx;
+   struct kvm_vcpu *vcpu;
+   unsigned long *ptr;
+   bool set;
+   int bit_nr;
+
+   vcpu_idx = irq_level->irq >> 1;
+   if (vcpu_idx >= KVM_MAX_VCPUS)
+   return -EINVAL;
+
+

[PATCH v9 12/16] ARM: KVM: World-switch implementation

2012-07-03 Thread Christoffer Dall

Provides a complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and store the information on
the VCPU and KVM structures.

The following Hyp-ABI is also documented in the code:

Hyp-ABI: Switching from host kernel to Hyp-mode:
   Switching to Hyp mode is done through a simple HVC instruction. The
   exception vector code will check that the HVC comes from VMID==0 and if
   so will store the necessary state on the Hyp stack, which will look like
   this (growing downwards, see the hyp_hvc handler):
 ...
 stack_page + 4: spsr (Host-SVC cpsr)
 stack_page: lr_usr
 --: stack bottom

Hyp-ABI: Switching from Hyp-mode to host kernel SVC mode:
   When returning from Hyp mode to SVC mode, another HVC instruction is
   executed from Hyp mode, which is taken in the hyp_svc handler. The
   bottom of the Hyp stack is derived from the Hyp stack pointer (only a single
   page aligned stack is used per CPU) and the initial SVC registers are
   used to restore the host state.

Otherwise, the world-switch is pretty straightforward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values are loaded onto the hardware. State that is not loaded, but is
theoretically modifiable by the guest, is protected through the
virtualization features, which generate a trap and cause software emulation.
Upon guest return, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp stack onto the
hardware.

One controversy may be the back-door call to __irq_svc (the host
kernel's own physical IRQ handler) which is called when a physical IRQ
exception is taken in Hyp mode while running in the guest.

SMP support, which computes the VMPIDR from the host MPIDR and overrides
the low bits with the KVM vcpu_id, was contributed by Marc Zyngier.
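
The VMPIDR computation is a single mask-and-merge, the same expression used in a15_reset_vcpu(). A stand-alone sketch using the patch's `MPIDR_CPUID` mask (`0x3 << 0`); `guest_mpidr()` is a hypothetical name and the host MPIDR values in the test are invented. Because kvm_reset_vcpu() rejects vcpu_id values above 3 on Cortex-A15, the mask should not alias two VCPUs onto the same CPU ID in practice.

```c
#include <assert.h>
#include <stdint.h>

#define MPIDR_CPUID (0x3u << 0)   /* low two bits: CPU number within the cluster */

/* Guest MPIDR: host MPIDR with its CPU-ID bits replaced by the vcpu_id. */
static uint32_t guest_mpidr(uint32_t host_mpidr, uint32_t vcpu_id)
{
	return (host_mpidr & ~MPIDR_CPUID) | (vcpu_id & MPIDR_CPUID);
}
```

Keeping the upper host bits means the guest sees the same cluster and multiprocessor-extension bits as the physical CPU, with only the CPU number virtualized.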

Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h |   37 ++
 arch/arm/kernel/armksyms.c |7 
 arch/arm/kernel/asm-offsets.c  |   43 +++
 arch/arm/kernel/entry-armv.S   |1 
 arch/arm/kvm/arm.c |  181 
 arch/arm/kvm/interrupts.S  |  599 
 6 files changed, 865 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 220f241..232117c 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -105,6 +105,17 @@
 #define TTBCR_T0SZ 3
 #define HTCR_MASK  (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
 
+/* Hyp System Trap Register */
+#define HSTR_T(x)      (1 << x)
+#define HSTR_TTEE      (1 << 16)
+#define HSTR_TJDBX     (1 << 17)
+
+/* Hyp Coprocessor Trap Register */
+#define HCPTR_TCP(x)   (1 << x)
+#define HCPTR_TCP_MASK (0x3fff)
+#define HCPTR_TASE     (1 << 15)
+#define HCPTR_TTA      (1 << 20)
+#define HCPTR_TCPAC    (1 << 31)
 
 /* Virtualization Translation Control Register (VTCR) bits */
 #define VTCR_SH0   (3 << 12)
@@ -126,5 +137,31 @@
 #define VTTBR_X    (5 - VTCR_GUEST_T0SZ)
 #endif
 
+/* Hyp Syndrome Register (HSR) bits */
+#define HSR_EC_SHIFT   (26)
+#define HSR_EC         (0x3fU << HSR_EC_SHIFT)
+#define HSR_IL         (1U << 25)
+#define HSR_ISS        (HSR_IL - 1)
+#define HSR_ISV_SHIFT  (24)
+#define HSR_ISV        (1U << HSR_ISV_SHIFT)
+
+#define HSR_EC_UNKNOWN (0x00)
+#define HSR_EC_WFI (0x01)
+#define HSR_EC_CP15_32 (0x03)
+#define HSR_EC_CP15_64 (0x04)
+#define HSR_EC_CP14_MR (0x05)
+#define HSR_EC_CP14_LS (0x06)
+#define HSR_EC_CP_0_13 (0x07)
+#define HSR_EC_CP10_ID (0x08)
+#define HSR_EC_JAZELLE (0x09)
+#define HSR_EC_BXJ (0x0A)
+#define HSR_EC_CP14_64 (0x0C)
+#define HSR_EC_SVC_HYP (0x11)
+#define HSR_EC_HVC (0x12)
+#define HSR_EC_SMC (0x13)
+#define HSR_EC_IABT     (0x20)
+#define HSR_EC_IABT_HYP (0x21)
+#define HSR_EC_DABT     (0x24)
+#define HSR_EC_DABT_HYP (0x25)
 
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index b57c75e..38d3a12 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -48,6 +48,13 @@ extern void __aeabi_ulcmp(void);
 
 extern void fpundefinstr(void);
 
+#ifdef CONFIG_KVM_ARM_HOST
+/* This is needed for KVM */
+extern void __irq_svc(void);
+
+EXPORT_SYMBOL_GPL(__irq_svc);
+#endif
+
/* platform dependent support */
 EXPORT_SYMBOL(__udelay);
 EXPORT_SYMBOL(__const_udelay);
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 1429d89..9c76b53 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -13,6 +13,7 @@

[PATCH v9 13/16] ARM: KVM: Emulation framework and CP15 emulation

2012-07-03 Thread Christoffer Dall

Adds an important new function to the main KVM/ARM code, handle_exit(),
which is called from kvm_arch_vcpu_ioctl_run() on returns
from guest execution. This function examines the Hyp-Syndrome-Register
(HSR), which contains information telling KVM what caused the exit from
the guest.

Some of the reasons for an exit are CP15 accesses, which are
not allowed from the guest. This commit handles these exits by
emulating the intended operation in software and skipping the guest
instruction.
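
A table-driven sketch of the dispatch that handle_exit() is described as doing: extract the exception class (EC) from HSR bits [31:26] and call a per-class handler. The field macros and EC values are copied from kvm_arm.h in this series; the `on_*` handlers, their return codes, and `handle_exit_sketch()` are hypothetical placeholders, and the real function may use a switch rather than a table.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout and exception classes as defined in kvm_arm.h in this series. */
#define HSR_EC_SHIFT   26
#define HSR_EC         (0x3fU << HSR_EC_SHIFT)
#define HSR_IL         (1U << 25)
#define HSR_ISS        (HSR_IL - 1)

#define HSR_EC_WFI     0x01U
#define HSR_EC_CP15_32 0x03U
#define HSR_EC_CP15_64 0x04U
#define HSR_EC_HVC     0x12U
#define HSR_EC_DABT    0x24U

typedef int (*exit_handler_fn)(uint32_t iss);

/* Hypothetical stand-in handlers; each just reports which path was taken. */
static int on_wfi(uint32_t iss)  { (void)iss; return 1; }
static int on_cp15(uint32_t iss) { (void)iss; return 2; }
static int on_hvc(uint32_t iss)  { (void)iss; return 3; }
static int on_dabt(uint32_t iss) { (void)iss; return 4; }

/* Table indexed by the 6-bit exception class; unlisted classes stay NULL. */
static const exit_handler_fn exit_handlers[64] = {
	[HSR_EC_WFI]     = on_wfi,
	[HSR_EC_CP15_32] = on_cp15,
	[HSR_EC_CP15_64] = on_cp15,
	[HSR_EC_HVC]     = on_hvc,
	[HSR_EC_DABT]    = on_dabt,
};

/* Decode the EC field and dispatch, passing the ISS bits; -1 means unhandled. */
static int handle_exit_sketch(uint32_t hsr)
{
	uint32_t ec = (hsr & HSR_EC) >> HSR_EC_SHIFT;
	exit_handler_fn fn = exit_handlers[ec];

	return fn ? fn(hsr & HSR_ISS) : -1;
}
```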

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h |5 
 arch/arm/include/asm/kvm_emulate.h |   10 +
 arch/arm/include/asm/kvm_host.h|3 
 arch/arm/kvm/arm.c |  114 +
 arch/arm/kvm/emulate.c |  455 
 arch/arm/kvm/trace.h   |   28 ++
 6 files changed, 614 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 232117c..0d1e895 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -77,6 +77,11 @@
HCR_SWIO)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
+/* System Control Register (SCTLR) bits */
+#define SCTLR_TE   (1 << 30)
+#define SCTLR_EE   (1 << 25)
+#define SCTLR_V    (1 << 13)
+
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE  (1  30)
 #define HSCTLR_EE  (1  25)
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 9e29335..f2e973c 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -51,6 +51,16 @@ static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
return mode;
 }
 
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_adjust_itstate(struct kvm_vcpu *vcpu);
+void kvm_inject_undefined(struct kvm_vcpu *vcpu);
+
 /*
  * Return the SPSR for the specified mode of the virtual CPU.
  */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index f6b4c02..c58865b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -112,6 +112,9 @@ struct kvm_vcpu_arch {
u64 pc_ipa2;/* same as above, but for non-aligned wide thumb
   instructions */
 
+   /* dcache set/way operation pending */
+   cpumask_t require_dcache_flush;
+
/* IO related fields */
bool mmio_sign_extend;  /* for byte/halfword loads */
u32 mmio_rd;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 4687690..5e6465b 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -37,6 +37,7 @@
 #include <asm/mman.h>
 #include <asm/idmap.h>
 #include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
 #include <asm/cputype.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
@@ -271,6 +272,15 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 vcpu->cpu = cpu;
+
+   /*
+    * Check whether this vcpu requires the cache to be flushed on
+    * this physical CPU. This is a consequence of doing dcache
+    * operations by set/way on this vcpu. We do it here in order
+    * to be in a non-preemptible section.
+    */
+   if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
+       flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -375,6 +385,69 @@ static void update_vttbr(struct kvm *kvm)
 spin_unlock(&kvm_vmid_lock);
 }
 
+static int handle_svc_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+   /* SVC called from Hyp mode should never get here */
+   kvm_debug("SVC called from Hyp mode shouldn't go here\n");
+   BUG();
+   return -EINVAL; /* Squash warning */
+}
+
+static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+   /*
+    * Guest called HVC instruction:
+    * Let it know we don't want that by injecting an undefined exception.
+    */
+   kvm_debug("hvc: %x (at %08x)", vcpu->arch.hsr & ((1 << 16) - 1),
+             vcpu->arch.regs.pc);
+   kvm_debug(" HSR: %8x", vcpu->arch.hsr);
+   kvm_inject_undefined(vcpu);
+   return 0;
+}
+
+static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+   /* We don't support SMC; don't do that. */
+   kvm_debug("smc: at %08x", vcpu->arch.regs.pc);
+   return -EINVAL;

[PATCH v9 14/16] ARM: KVM: Handle guest faults in KVM

2012-07-03 Thread Christoffer Dall

Handles the guest faults in KVM by mapping in corresponding user pages
in the 2nd stage page tables.

Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and
pgprot_guest variables used to map 2nd stage memory for KVM guests.

Leverages MMU notifiers on KVM/ARM by supporting the kvm_unmap_hva() and
kvm_set_spte_hva() operations. All other KVM MMU notifier hooks are NOPs.
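
The masks added to kvm_arm.h in this patch suggest how a stage-2 abort is classified: masking the HSR with `HSR_FSC_TYPE` (0x3c) keeps the fault status type while dropping the level bits [1:0], and `HSR_WNR` (bit 6) distinguishes writes. A sketch under those assumptions; `classify_fault()` and `enum fault_kind` are hypothetical names, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks and fault status codes as added to kvm_arm.h in this patch. */
#define HSR_FSC_TYPE 0x3cU
#define HSR_WNR      (1U << 6)
#define FSC_FAULT    0x04U   /* translation fault, any level */
#define FSC_PERM     0x0cU   /* permission fault, any level */

enum fault_kind { FAULT_TRANSLATION, FAULT_PERMISSION, FAULT_OTHER };

/* Classify a stage-2 abort from its HSR value, ignoring the level bits. */
static enum fault_kind classify_fault(uint32_t hsr, bool *is_write)
{
	uint32_t type = hsr & HSR_FSC_TYPE;

	*is_write = (hsr & HSR_WNR) != 0;
	if (type == FSC_FAULT)
		return FAULT_TRANSLATION;
	if (type == FSC_PERM)
		return FAULT_PERMISSION;
	return FAULT_OTHER;
}
```

A translation fault would then be resolved by mapping the backing user page into the stage-2 tables. Since PAGE_KVM_GUEST as defined below sets L_PTE2_READ but not L_PTE2_WRITE, a write to such a mapping would presumably surface as a permission fault with WnR set.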

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h|9 +
 arch/arm/include/asm/kvm_asm.h|3 
 arch/arm/include/asm/kvm_host.h   |   16 ++
 arch/arm/include/asm/pgtable-3level.h |9 +
 arch/arm/include/asm/pgtable.h|4 +
 arch/arm/kvm/Kconfig  |1 
 arch/arm/kvm/exports.c|1 
 arch/arm/kvm/interrupts.S |   37 ++
 arch/arm/kvm/mmu.c|  218 +
 arch/arm/mm/mmu.c |3 
 10 files changed, 300 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 0d1e895..7f6cad4 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -149,6 +149,15 @@
 #define HSR_ISS        (HSR_IL - 1)
 #define HSR_ISV_SHIFT  (24)
 #define HSR_ISV        (1U << HSR_ISV_SHIFT)
+#define HSR_FSC        (0x3f)
+#define HSR_FSC_TYPE   (0x3c)
+#define HSR_WNR        (1 << 6)
+
+#define FSC_FAULT  (0x04)
+#define FSC_PERM   (0x0c)
+
+/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
+#define HPFAR_MASK (~0xf)
 
 #define HSR_EC_UNKNOWN (0x00)
 #define HSR_EC_WFI (0x01)
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 58d51e3..e01dfab 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -34,6 +34,7 @@
 #define SMCHYP_HVBAR_W 0xfff0
 
 #ifndef __ASSEMBLY__
+struct kvm;
 struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
@@ -47,6 +48,8 @@ extern char __kvm_hyp_vector[];
 extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+
 extern void __kvm_flush_vm_context(void);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index c58865b..0c7e782 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -140,4 +140,20 @@ struct kvm_vcpu_stat {
u32 halt_wakeup;
 };
 
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+struct kvm;
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+
+/* We do not have shadow page tables, hence the empty hooks */
+static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+   return 0;
+}
+
+static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+   return 0;
+}
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 1169a8a..7351eee 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -102,6 +102,15 @@
  */
 #define L_PGD_SWAPPER  (_AT(pgdval_t, 1) << 55)  /* swapper_pg_dir entry */
 
+/*
+ * 2-nd stage PTE definitions for LPAE.
+ */
+#define L_PTE2_SHARED  L_PTE_SHARED
+#define L_PTE2_READ     (_AT(pteval_t, 1) << 6) /* HAP[0] */
+#define L_PTE2_WRITE    (_AT(pteval_t, 1) << 7) /* HAP[1] */
+#define L_PTE2_NORM_WB  (_AT(pteval_t, 3) << 4) /* MemAttr[3:2] */
+#define L_PTE2_INNER_WB (_AT(pteval_t, 3) << 2) /* MemAttr[1:0] */
+
 #ifndef __ASSEMBLY__
 
 #define pud_none(pud)  (!pud_val(pud))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index bc83540..a31d0e9 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -70,6 +70,7 @@ extern void __pgd_error(const char *file, int line, pgd_t);
 
 extern pgprot_t		pgprot_user;
 extern pgprot_t		pgprot_kernel;
+extern pgprot_t		pgprot_guest;
 
 #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
 
@@ -83,6 +84,9 @@ extern pgprot_t   pgprot_kernel;
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC   pgprot_kernel
 #define PAGE_HYP   _MOD_PROT(pgprot_kernel, L_PTE_USER)
+#define PAGE_KVM_GUEST _MOD_PROT(pgprot_guest, L_PTE2_READ | \
+ L_PTE2_NORM_WB | L_PTE2_INNER_WB | \
+ L_PTE2_SHARED)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED  __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 83abbe0..7fa50d3

[PATCH v9 15/16] ARM: KVM: Handle I/O aborts

2012-07-03 Thread Christoffer Dall

When the guest accesses I/O memory this will create data abort
exceptions and they are handled by decoding the HSR information
(physical address, read/write, length, register) and forwarding reads
and writes to QEMU which performs the device emulation.

Certain classes of load/store operations do not support the syndrome
information provided in the HSR and we therefore must be able to fetch
the offending instruction from guest memory and decode it manually.
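
For illustration, a minimal sketch of the syndrome decoding step, under stated assumptions: the ISV, SRT, SSE and WnR masks follow the definitions this patch adds to kvm_arm.h, while the access-size field (SAS, HSR bits [23:22]), struct mmio_access and decode_dabt_hsr() are hypothetical names introduced here, not code from the patch. The faulting address comes from a separate register and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks mirroring the kvm_arm.h additions in this patch. */
#define HSR_ISV_SHIFT	24
#define HSR_ISV		(1U << HSR_ISV_SHIFT)
#define HSR_SRT_SHIFT	16
#define HSR_SRT_MASK	(0xfU << HSR_SRT_SHIFT)
#define HSR_SSE		(1U << 21)
#define HSR_WNR		(1U << 6)

/* Assumption: access size (SAS) in bits [23:22]; not defined by the patch. */
#define HSR_SAS_SHIFT	22
#define HSR_SAS_MASK	(0x3U << HSR_SAS_SHIFT)

/* Hypothetical container for the decoded fields. */
struct mmio_access {
	bool is_write;		/* WnR: true for a store */
	bool sign_extend;	/* SSE: sign-extend the loaded value */
	unsigned int len;	/* access size in bytes: 1, 2 or 4 */
	unsigned int rt;	/* SRT: guest register being transferred */
};

/*
 * Returns false when the syndrome cannot be used: either ISV is clear
 * (the instruction must then be fetched and decoded by hand) or SAS
 * holds the encoding 0b11, which is reserved on ARMv7.
 */
static bool decode_dabt_hsr(uint32_t hsr, struct mmio_access *acc)
{
	uint32_t sas;

	if (!(hsr & HSR_ISV))
		return false;

	sas = (hsr & HSR_SAS_MASK) >> HSR_SAS_SHIFT;
	if (sas == 3)
		return false;

	acc->is_write = (hsr & HSR_WNR) != 0;
	acc->sign_extend = (hsr & HSR_SSE) != 0;
	acc->len = 1U << sas;
	acc->rt = (hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
	return true;
}
```

A false return corresponds to the case described above, where KVM has to fall back to reading and decoding the guest instruction itself.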

This requires changing the general flow somewhat since new calls to run
the VCPU must check if there's a pending MMIO load and perform the write
after userspace has made the data available.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h |3 
 arch/arm/include/asm/kvm_emulate.h |2 
 arch/arm/include/asm/kvm_mmu.h |1 
 arch/arm/kvm/arm.c |6 +
 arch/arm/kvm/emulate.c |  281 
 arch/arm/kvm/mmu.c |  162 -
 arch/arm/kvm/trace.h   |   21 +++
 7 files changed, 473 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 7f6cad4..1efa452 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -149,8 +149,11 @@
 #define HSR_ISS		(HSR_IL - 1)
 #define HSR_ISV_SHIFT	(24)
 #define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_SRT_SHIFT	(16)
+#define HSR_SRT_MASK	(0xf << HSR_SRT_SHIFT)
 #define HSR_FSC		(0x3f)
 #define HSR_FSC_TYPE	(0x3c)
+#define HSR_SSE		(1 << 21)
 #define HSR_WNR		(1 << 6)
 
 #define FSC_FAULT  (0x04)
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index f2e973c..c41537b 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -58,6 +58,8 @@ int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+   unsigned long instr);
 void kvm_adjust_itstate(struct kvm_vcpu *vcpu);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index dca7803..7ccd259 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -35,6 +35,7 @@ void free_hyp_pmds(void);
 int kvm_alloc_stage2_pgd(struct kvm *kvm);
 void kvm_free_stage2_pgd(struct kvm *kvm);
 
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 5e6465b..b18f68f 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -522,6 +522,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
int exit_reason;
sigset_t sigsaved;
 
+	if (run->exit_reason == KVM_EXIT_MMIO) {
+		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
+		if (ret)
+			return ret;
+	}
+
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index b430924..99432d8 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -133,8 +133,30 @@ u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 }
 
 /**
- * Co-processor emulation
+ * Utility functions common for all emulation code
+ */
+
+/*
+ * This one accepts a matrix where the first element is the
+ * bits as they must be, and the second element is the bitmask.
  */
+#define INSTR_NONE -1
+static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
+{
+   int i;
+   u32 mask;
+
+	for (i = 0; i < table_entries; i++) {
+		mask = table[i][1];
+		if ((table[i][0] & mask) == (instr & mask))
+   return i;
+   }
+   return INSTR_NONE;
+}
+
+/**
+ * Co-processor emulation
+ */
 
 struct coproc_params {
unsigned long CRn;
@@ -489,6 +511,263 @@ int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
return 0;
 }
 
+
+/**
+ * Load-Store instruction emulation
+
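
The matching rule behind kvm_instr_index() can be exercised on its own. The sketch below restates it as a standalone function and pairs it with a hypothetical two-entry table that splits ARM single data transfers (bits [27:26] = 01) on the L bit (bit 20); the table is an assumption for illustration, not the patch's decode table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INSTR_NONE -1

/*
 * Same rule as kvm_instr_index(): entry i matches when the instruction
 * agrees with table[i][0] on every bit that is set in table[i][1].
 * The first matching entry wins.
 */
static int instr_index(uint32_t instr, const uint32_t table[][2], size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t mask = table[i][1];

		if ((table[i][0] & mask) == (instr & mask))
			return (int)i;
	}
	return INSTR_NONE;
}

/* Hypothetical table: bits [27:26] = 01 selects single data transfers. */
static const uint32_t ls_table[][2] = {
	{ 0x04000000, 0x0c100000 },	/* store class: L (bit 20) clear */
	{ 0x04100000, 0x0c100000 },	/* load class: L (bit 20) set */
};
```

With this table, 0xe5910000 (ldr r0, [r1]) selects entry 1 and 0xe5810000 (str r0, [r1]) selects entry 0, while 0xe1a00001 (mov r0, r1) matches neither and yields INSTR_NONE.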

[PATCH v9 16/16] ARM: KVM: Guest wait-for-interrupts (WFI) support

2012-07-03 Thread Christoffer Dall

From: Christoffer Dall cd...@cs.columbia.edu

When the guest executes a WFI instruction the operation is trapped to
KVM, which emulates the instruction in software. There is no correlation
between a guest executing a WFI instruction and actually putting the
hardware into a low-power mode, since a KVM guest is essentially a
process and the WFI instruction can be seen as a 'sleep' call from this
process. Therefore, we flag the VCPU to be in wait_for_interrupts mode
and call the main KVM function kvm_vcpu_block(). This function
will put the thread on a wait-queue and call schedule().

When an interrupt comes in through KVM_IRQ_LINE (see previous patch) we
signal the VCPU thread and unflag the VCPU to no longer wait for
interrupts. All calls to kvm_arch_vcpu_ioctl_run() result in a call to
kvm_vcpu_block() as long as the VCPU is in wfi-mode.
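
The flag protocol described above can be modelled in a few lines. This is a single-threaded sketch, assuming a hypothetical struct vcpu_model that stands in for the two vcpu->arch fields the patch touches; tracking raised lines as one bit per line in irq_lines is likewise an assumption, since the patch does not show how that field is updated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vcpu->arch fields used by the patch. */
struct vcpu_model {
	unsigned long irq_lines;	/* assumed: one bit per raised line */
	bool wait_for_interrupts;
};

/* Mirrors kvm_handle_wfi(): only wait if nothing is pending yet. */
static void model_handle_wfi(struct vcpu_model *v)
{
	if (!v->irq_lines)
		v->wait_for_interrupts = true;
}

/* Mirrors kvm_vm_ioctl_irq_line(): raising a line ends the wait. */
static void model_irq_line(struct vcpu_model *v, unsigned int line, bool level)
{
	if (level) {
		v->irq_lines |= 1UL << line;
		v->wait_for_interrupts = false;
	} else {
		v->irq_lines &= ~(1UL << line);
	}
}

/* Mirrors kvm_arch_vcpu_runnable(). */
static bool model_runnable(const struct vcpu_model *v)
{
	return v->irq_lines || !v->wait_for_interrupts;
}
```

A WFI executed while an interrupt is already pending leaves wait_for_interrupts clear, which is why kvm_handle_wfi() checks irq_lines before setting the flag.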

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/kvm/arm.c |   15 ++-
 arch/arm/kvm/emulate.c |   12 
 arch/arm/kvm/trace.h   |   16 
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index b18f68f..f3b206a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -306,9 +306,17 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
return -EINVAL;
 }
 
+/**
+ * kvm_arch_vcpu_runnable - determine if the vcpu can be scheduled
+ * @v: The VCPU pointer
+ *
+ * If the guest CPU is not waiting for interrupts (or waiting and
+ * an interrupt is pending) then it is by definition runnable.
+ */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-   return 0;
+	return !!v->arch.irq_lines ||
+		!v->arch.wait_for_interrupts;
 }
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
@@ -538,6 +546,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 */
cond_resched();
 
+	if (vcpu->arch.wait_for_interrupts)
+		kvm_vcpu_block(vcpu);
+
 	update_vttbr(vcpu->kvm);
 
/*
@@ -635,6 +646,8 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
 * trigger a world-switch round on the running physical CPU to set the
 * virtual IRQ/FIQ fields in the HCR appropriately.
 */
+	if (irq_level->level)
+		vcpu->arch.wait_for_interrupts = 0;
kvm_vcpu_kick(vcpu);
 
return 0;
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 99432d8..564add2 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -505,9 +505,21 @@ int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
return emulate_cp15(vcpu, params);
 }
 
+/**
+ * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * @vcpu:  the vcpu pointer
+ * @run:   the kvm_run structure pointer
+ *
+ * Simply sets the wait_for_interrupts flag on the vcpu structure, which will
+ * halt execution of world-switches and schedule other host processes until
+ * there is an incoming IRQ or FIQ to the VM.
+ */
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+	trace_kvm_wfi(vcpu->arch.regs.pc);
 	vcpu->stat.wfi_exits++;
+	if (!vcpu->arch.irq_lines)
+		vcpu->arch.wait_for_interrupts = 1;
return 0;
 }
 
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 325106c..28ed1a1 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -90,6 +90,22 @@ TRACE_EVENT(kvm_emulate_cp15_imp,
 		  __entry->CRm, __entry->Op2)
 );
 
+TRACE_EVENT(kvm_wfi,
+   TP_PROTO(unsigned long vcpu_pc),
+   TP_ARGS(vcpu_pc),
+
+   TP_STRUCT__entry(
+   __field(unsigned long,  vcpu_pc )
+   ),
+
+   TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+   ),
+
+	TP_printk("guest executed wfi at: 0x%08lx", __entry->vcpu_pc)
+);
+
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)

2012-07-03 Thread Daniel P. Berrange

On Mon, Jul 02, 2012 at 04:54:03PM -0300, Eduardo Habkost wrote:
 On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
  On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
   Resending series, after fixing some coding style issues. Does anybody have
   any feedback about this proposal?
   
   Changes v1 -> v2:
- Coding style fixes
   
   Original cover letter:
   
   I was investigating whether there are any mechanisms that allow manual
   pinning of guest RAM to specific host NUMA nodes, in the case of
   multi-node KVM guests, and noticed that -mem-path could be used for
   that, except that it currently removes any files it creates (using
   mkstemp()) immediately, so numactl cannot be used on the backing files.
   These patches add a -keep-mem-path-files option to make QEMU create the
   files inside -mem-path with more predictable names, and not remove them
   after creation.
   
   Some previous discussions about the subject, for reference:
- Message-ID: 1281534738-8310-1-git-send-email-andre.przyw...@amd.com
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
- Message-ID: 4c7d7c2a.7000...@codemonkey.ws
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
   
   A more recent thread can be found at:
- Message-ID: 20111029184502.gh11...@in.ibm.com
  http://article.gmane.org/gmane.comp.emulators.qemu/123001
   
   Note that this is just a mechanism to facilitate manual static binding
   using numactl on hugetlbfs later, for optimization. This may be
   especially useful for single large multi-node guest use cases (and, of
   course, has to be used with care).
   
   I don't know if it is a good idea to use the memory range names as a
   publicly-visible interface. Another option may be to use a single file
   instead, and mmap different regions inside the same file for each
   memory region. I am open to comments and suggestions.
   
   Example (untested) usage to manually bind each half of the RAM of a
   guest to a different NUMA node:
   
$ qemu-system-x86_64 [...] -m 2048 -smp 4 \
  -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
  -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
$ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
$ numactl --offset=0  --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram
  
  I'd suggest that instead of making the memory file name into a
  public ABI QEMU needs to maintain, QEMU could expose the info
  via a monitor command, e.g.:
  
 $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
   -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
   -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
   -monitor stdio
 (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/3, offset=1G, length=1G
  
  This example takes advantage of the fact that with Linux, you can
  still access a deleted file via /proc/self/fd/NNN, which AFAICT,
  would avoid the need for a --keep-mem-path-files.
 
 I like the suggestion.
 
 But other processes still need to be able to open those files if we want
 to do anything useful with them. In this case, I guess it's better to
 let QEMU itself build a /proc/<getpid()>/fd/<fd> string instead of
 using /proc/self and forcing the client to find out what's the right
 PID?
 
 Anyway, even if we want to avoid file-descriptor and /proc tricks, we
 can still use the interface you suggest. Then we wouldn't need to have
 any filename assumptions: the filenames could be completely random, as
 they would be reported using the new monitor command.

Oops, yes of course. I did intend that client apps could use the
files, so I should have used /proc/$PID and not /proc/self.
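
A minimal Linux-only sketch of the trick Daniel describes: the backing file is created with mkstemp() and unlinked at once, as -mem-path does today, yet stays reachable through /proc/<pid>/fd/<fd>. The helper name and the file-name template are hypothetical, and another process can only open that path if it is permitted to access the QEMU process's fd directory (typically the same user or a privileged process).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a backing file in dir, unlink it immediately, and return the
 * still-open fd. path_out receives "/proc/<pid>/fd/<fd>", which keeps
 * referring to the deleted file for as long as this process holds fd.
 */
static int create_unlinked(const char *dir, char *path_out, size_t len)
{
	char tmpl[4096];
	int fd;

	snprintf(tmpl, sizeof(tmpl), "%s/example_mem.XXXXXX", dir);
	fd = mkstemp(tmpl);
	if (fd < 0)
		return -1;
	unlink(tmpl);	/* the name is gone; the inode lives on */
	snprintf(path_out, len, "/proc/%ld/fd/%d", (long)getpid(), fd);
	return fd;
}
```

Opening the returned path creates a new open file description for the same inode, so data written through the original fd is visible through it.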

 
  
  By returning info via a monitor command you also avoid hardcoding
  the use of 1 single file for all of memory. You also avoid hardcoding
  the fact that QEMU stores the nodes in contiguous order inside the
  node. For example, QEMU could easily return data like this:
  
  
 $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
   -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
   -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
   -monitor stdio
 (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/4, offset=0G, length=1G
  
  or more ingenious options
 
 Sounds good.
 
 -- 
 Eduardo

-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs

2012-07-03 Thread Yanfei Zhang

On 2012-06-29 10:58, Greg KH wrote:
 On Thu, Jun 28, 2012 at 04:37:38AM -0700, Greg KH wrote:
 On Thu, Jun 28, 2012 at 05:54:30PM +0800, Yanfei Zhang wrote:
 On 2012-06-28 03:22, Greg KH wrote:
 On Wed, Jun 27, 2012 at 04:54:54PM +0800, Yanfei Zhang wrote:
 This patch exports offsets of fields via /sys/devices/cpu/vmcs/.
 Individual offsets are contained in subfiles named by the field's
 encoding, e.g.: /sys/devices/cpu/vmcs/0800

 Signed-off-by: zhangyanfei zhangyan...@cn.fujitsu.com
 ---
  drivers/base/core.c |   13 +
  1 files changed, 13 insertions(+), 0 deletions(-)

 diff --git a/drivers/base/core.c b/drivers/base/core.c
 index 346be8b..dd05ee7 100644
 --- a/drivers/base/core.c
 +++ b/drivers/base/core.c
 @@ -26,6 +26,7 @@
  #include <linux/async.h>
  #include <linux/pm_runtime.h>
  #include <linux/netdevice.h>
 +#include <asm/vmcsinfo.h>

 Did you just break the build on all other arches?  Not nice.

 @@ -1038,6 +1039,11 @@ int device_add(struct device *dev)
   error = dpm_sysfs_add(dev);
   if (error)
   goto DPMError;
 +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
 + error = vmcs_sysfs_add(dev);
 + if (error)
 + goto VMCSError;
 +#endif

 Oh my no, that's no way to ever do this, you know better than that,
 please fix.

 greg k-h


 Sorry for my thoughtlessness. Here is the new patch.

 ---
  drivers/base/core.c |   13 +
  1 files changed, 13 insertions(+), 0 deletions(-)

 diff --git a/drivers/base/core.c b/drivers/base/core.c
 index 346be8b..7b5266a 100644
 --- a/drivers/base/core.c
 +++ b/drivers/base/core.c
 @@ -30,6 +30,13 @@
  #include "base.h"
  #include "power/power.h"
  
 +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
 +#include <asm/vmcsinfo.h>
 +#else
 +static inline int vmcs_sysfs_add(struct device *dev) { return 0; }
 +static inline void vmcs_sysfs_remove(struct device *dev) { }
 +#endif

 {sigh}  No, again, you know better, don't do this.
 
 Ok, as others have rightly pointed out, this wasn't the most helpful
 review comment, sorry about that.
 
 In Linux, we don't put ifdefs in .c files, we put them in .h files.  See
 many examples of this all over the place.  That has been my main complaint
 about the past two versions of this patch.
 
 But, for this, I would question why you even want / need to do this in
 the drivers/base/core.c file in the first place.  Shouldn't it be in some
 arch or cpu specific file instead that already handles the cpu files?
 
 thanks,
 
 greg k-h
 

Many thanks. I have moved the code to my vmcsinfo_intel module.
Thanks again for your helpful comment.

Thanks
Zhang Yanfei
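
Greg's general rule (conditionals belong in headers, not in .c files) usually takes the shape below. This is a sketch, not the vmcsinfo code: CONFIG_VMCSINFO_EXAMPLE and example_device_add() are hypothetical, and in this thread the code was ultimately moved into the vmcsinfo_intel module instead.

```c
#include <assert.h>
#include <stddef.h>

struct device;	/* only pointers are passed, so the type stays opaque */

/*
 * Header part (e.g. a hypothetical vmcsinfo.h): real prototypes when the
 * feature is configured, empty static inline stubs otherwise.
 */
#ifdef CONFIG_VMCSINFO_EXAMPLE
int vmcs_sysfs_add(struct device *dev);
void vmcs_sysfs_remove(struct device *dev);
#else
static inline int vmcs_sysfs_add(struct device *dev)
{
	(void)dev;
	return 0;
}

static inline void vmcs_sysfs_remove(struct device *dev)
{
	(void)dev;
}
#endif

/* .c file part: the caller needs no #ifdef at all. */
static int example_device_add(struct device *dev)
{
	int error = vmcs_sysfs_add(dev);

	if (error)
		return error;
	/* ... further setup would go here ... */
	return 0;
}
```

When the option is off, the stubs compile away and the caller is unchanged, which is what keeps the .c files free of conditionals.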

A question about how kvm switch context to guest

2012-07-03 Thread Zhengwang Ruan


Hi kashyapc & all,

I see a piece of code in vmx_vcpu_run as below. Is it used to switch
context to a guest? Doesn't KVM use vmlaunch or vmresume to launch or
resume a guest? Why does KVM need to manually switch context by filling
registers using stored register copies?


===

asm(
	/* Store host registers */
	"push %%"R"dx; push %%"R"bp;"
	"push %%"R"cx \n\t" /* placeholder for guest rcx */
	"push %%"R"cx \n\t"
	"cmp %%"R"sp, %c[host_rsp](%0) \n\t"
	"je 1f \n\t"
	"mov %%"R"sp, %c[host_rsp](%0) \n\t"
	__ex(ASM_VMX_VMWRITE_RSP_RDX) "\n\t"
	"1: \n\t"
	/* Reload cr2 if changed */
	"mov %c[cr2](%0), %%"R"ax \n\t"
	"mov %%cr2, %%"R"dx \n\t"
	"cmp %%"R"ax, %%"R"dx \n\t"
	"je 2f \n\t"
	"mov %%"R"ax, %%cr2 \n\t"
	"2: \n\t"
	/* Check if vmlaunch of vmresume is needed */
	"cmpl $0, %c[launched](%0) \n\t"
	/* Load guest registers.  Don't clobber flags. */
	"mov %c[rax](%0), %%"R"ax \n\t"
	"mov %c[rbx](%0), %%"R"bx \n\t"
	"mov %c[rdx](%0), %%"R"dx \n\t"
	"mov %c[rsi](%0), %%"R"si \n\t"
	"mov %c[rdi](%0), %%"R"di \n\t"
	"mov %c[rbp](%0), %%"R"bp \n\t"



Thanks,

Zhengwang


Re: [PATCH v9 12/16] ARM: KVM: World-switch implementation

2012-07-03 Thread Avi Kivity

On 07/03/2012 12:01 PM, Christoffer Dall wrote:
 Provides complete world-switch implementation to switch to other guests
 running in non-secure modes. Includes Hyp exception handlers that
 capture necessary exception information and stores the information on
 the VCPU and KVM structures.
 
 The following Hyp-ABI is also documented in the code:
 
 Hyp-ABI: Switching from host kernel to Hyp-mode:
   Switching to Hyp mode is done through a simple HVC instruction. The
exception vector code will check that the HVC comes from VMID==0 and if
so will store the necessary state on the Hyp stack, which will look like
this (growing downwards, see the hyp_hvc handler):
  ...
  stack_page + 4: spsr (Host-SVC cpsr)
  stack_page: lr_usr
  --: stack bottom
 
 Hyp-ABI: Switching from Hyp-mode to host kernel SVC mode:
When returning from Hyp mode to SVC mode, another HVC instruction is
executed from Hyp mode, which is taken in the hyp_svc handler. The
   bottom of the Hyp stack is derived from the Hyp stack pointer (only a single
page aligned stack is used per CPU) and the initial SVC registers are
used to restore the host state.
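
The stack-bottom derivation above reduces to masking off the low bits of the stack pointer once the stack is a single aligned page. A sketch, assuming 4 KiB pages and 32-bit addresses, and taking the diagram's offsets (lr_usr at the page base, spsr 4 bytes above it) literally; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, one page-aligned Hyp stack per CPU. */
#define HYP_PAGE_SIZE	4096u

/* Any SP value inside the stack page identifies the page base. */
static uint32_t hyp_stack_page(uint32_t sp)
{
	return sp & ~(HYP_PAGE_SIZE - 1);
}

/* Slots from the diagram, relative to stack_page. */
static uint32_t hyp_lr_usr_slot(uint32_t sp)
{
	return hyp_stack_page(sp);
}

static uint32_t hyp_spsr_slot(uint32_t sp)
{
	return hyp_stack_page(sp) + 4;
}
```

Because only one page is used per CPU, no separate per-CPU pointer is needed to find the saved host state.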
 
 Otherwise, the world-switch is pretty straightforward. All state that
 can be modified by the guest is first backed up on the Hyp stack and the
 VCPU values are loaded onto the hardware. State which is not loaded, but
 theoretically modifiable by the guest, is protected through the
 virtualization features to generate a trap and cause software emulation.
 When the guest returns, all state is restored from hardware onto the VCPU
 struct and the original state is restored from the Hyp stack onto the
 hardware.
 
 One controversy may be the back-door call to __irq_svc (the host
 kernel's own physical IRQ handler) which is called when a physical IRQ
 exception is taken in Hyp mode while running in the guest.
 
 SMP support using the VMPIDR calculated on the basis of the host MPIDR
 and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.

He should sign off on this patch then.

 
 Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
 a separate patch into the appropriate patches introducing the
 functionality. Note that the VMIDs are stored per VM as required by the ARM
 architecture reference manual.

Ditto.

 diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
 index 220f241..232117c 100644
 --- a/arch/arm/include/asm/kvm_arm.h
 +++ b/arch/arm/include/asm/kvm_arm.h
 @@ -105,6 +105,17 @@
  #define TTBCR_T0SZ   3
  #define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
  
 +/* Hyp System Trap Register */
 +#define HSTR_T(x)	(1 << x)
 +#define HSTR_TTEE	(1 << 16)
 +#define HSTR_TJDBX	(1 << 17)
 +
 +/* Hyp Coprocessor Trap Register */
 +#define HCPTR_TCP(x)	(1 << x)
 +#define HCPTR_TCP_MASK	(0x3fff)
 +#define HCPTR_TASE	(1 << 15)
 +#define HCPTR_TTA	(1 << 20)
 +#define HCPTR_TCPAC	(1 << 31)
  
  /* Virtualization Translation Control Register (VTCR) bits */
  #define VTCR_SH0	(3 << 12)
 @@ -126,5 +137,31 @@
  #define VTTBR_X  (5 - VTCR_GUEST_T0SZ)
  #endif
  
 +/* Hyp Syndrome Register (HSR) bits */
 +#define HSR_EC_SHIFT	(26)
 +#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
 +#define HSR_IL		(1U << 25)
 +#define HSR_ISS		(HSR_IL - 1)
 +#define HSR_ISV_SHIFT	(24)
 +#define HSR_ISV		(1U << HSR_ISV_SHIFT)
 +
 +#define HSR_EC_UNKNOWN   (0x00)
 +#define HSR_EC_WFI   (0x01)
 +#define HSR_EC_CP15_32   (0x03)
 +#define HSR_EC_CP15_64   (0x04)
 +#define HSR_EC_CP14_MR   (0x05)
 +#define HSR_EC_CP14_LS   (0x06)
 +#define HSR_EC_CP_0_13   (0x07)
 +#define HSR_EC_CP10_ID   (0x08)
 +#define HSR_EC_JAZELLE   (0x09)
 +#define HSR_EC_BXJ   (0x0A)
 +#define HSR_EC_CP14_64   (0x0C)
 +#define HSR_EC_SVC_HYP   (0x11)
 +#define HSR_EC_HVC   (0x12)
 +#define HSR_EC_SMC   (0x13)
 +#define HSR_EC_IABT  (0x20)
 +#define HSR_EC_IABT_HYP  (0x21)
 +#define HSR_EC_DABT  (0x24)
 +#define HSR_EC_DABT_HYP  (0x25)
  
  #endif /* __ARM_KVM_ARM_H__ */
 diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
 index b57c75e..38d3a12 100644
 --- a/arch/arm/kernel/armksyms.c
 +++ b/arch/arm/kernel/armksyms.c
 @@ -48,6 +48,13 @@ extern void __aeabi_ulcmp(void);
  
  extern void fpundefinstr(void);
  
 +#ifdef CONFIG_KVM_ARM_HOST
 +/* This is needed for KVM */
 +extern void __irq_svc(void);
 +
 +EXPORT_SYMBOL_GPL(__irq_svc);
 +#endif
 +
   /* platform dependent support */
  EXPORT_SYMBOL(__udelay);
  EXPORT_SYMBOL(__const_udelay);
 diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
 index 1429d89..9c76b53 100644
 --- a/arch/arm/kernel/asm-offsets.c
 +++ b/arch/arm/kernel/asm-offsets.c
 @@ -13,6 +13,7 @@
  #include <linux/sched.h>
  #include <linux/mm.h>
  #include <linux/dma-mapping.h>
 +#include <linux/kvm_host.h>
  #include <asm/cacheflush.h>
  #include <asm/glue-df.h>
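
As an illustration of how HSR definitions like the ones quoted above are typically consumed, the sketch below extracts the exception class (EC) and syndrome (ISS) and dispatches on a few classes. The enum exit_kind and the helper names are hypothetical; the EC values carry a U suffix here so that shifting them into bits [31:26] stays within unsigned arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout as in the quoted HSR definitions. */
#define HSR_EC_SHIFT	26
#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
#define HSR_IL		(1U << 25)
#define HSR_ISS		(HSR_IL - 1)

/* A few exception classes, with a U suffix for safe shifting. */
#define HSR_EC_WFI	0x01U
#define HSR_EC_CP15_32	0x03U
#define HSR_EC_HVC	0x12U
#define HSR_EC_DABT	0x24U

enum exit_kind { EXIT_WFI, EXIT_CP15_32, EXIT_HVC, EXIT_DABT, EXIT_UNHANDLED };

static uint32_t hsr_ec(uint32_t hsr)
{
	return (hsr & HSR_EC) >> HSR_EC_SHIFT;
}

static uint32_t hsr_iss(uint32_t hsr)
{
	return hsr & HSR_ISS;	/* bits [24:0]; IL (bit 25) is excluded */
}

/* Hypothetical dispatch on the exception class. */
static enum exit_kind classify_exit(uint32_t hsr)
{
	switch (hsr_ec(hsr)) {
	case HSR_EC_WFI:	return EXIT_WFI;
	case HSR_EC_CP15_32:	return EXIT_CP15_32;
	case HSR_EC_HVC:	return EXIT_HVC;
	case HSR_EC_DABT:	return EXIT_DABT;
	default:		return EXIT_UNHANDLED;
	}
}
```

Keeping the class extraction separate from the ISS mask lets each handler decode only the syndrome bits relevant to its class.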

KVM call minutes June 29

2012-07-03 Thread Juan Quintela


Isaku reminds me that I forgot to post these minutes:

2012-06-19
--

- migration
 * XBZRLE: ok
 * huge memory: needs migration-thread, but should be ok
 * postcopy:  ask for latency: maximum/avg/std deviation
  ask for vcpu utilization
  RDMA?
 * XBZRLE doesn't fix all the problems (vinod)
 * Freeze is August 1st
 * They asked for better numbers, at least:
   * vcpu utilization (in percentage, Avi)
   * latency of page faults over the network (Anthony)
 * I can integrate your refactorings (postfix) for 1.2 (I have no problems with
   them, the first 19 patches or so, but that is up to you and the others).


- multithreading vhost
  * vhost has a lot of challenges with lots of small packets.
  * use a couple of threads per device, one for reception and another for
    sending (Anthony)
  * two problems to address: NUMA locality and scalability (mst)

(Anthony has better notes on the last part).

Note that I am on holiday this week (July 2nd), so expect lags in my email
responses.


Later, Juan.

Re: A question about how kvm switch context to guest

2012-07-03 Thread Avi Kivity

On 07/03/2012 12:50 PM, Zhengwang Ruan wrote:
 Hi kashyapc & all,
 
 I see a piece of code in vmx_vcpu_run as below. Is it used to switch
 context to a guest? Doesn't KVM use vmlaunch or vmresume to launch or
 resume a guest?

You trimmed the bit that contains vmlaunch/vmresume.

 Why does KVM need to manually switch context by filling
 registers using stored register copies?

Those registers don't get automatically switched by the hardware.
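
To make Avi's point concrete: VM entry loads guest RSP, RIP, RFLAGS, CR3 and segment state from the VMCS, and VM exit restores host RSP, RIP, CR3 and segment state from it, but neither switches the remaining general-purpose registers or CR2, so vmx_vcpu_run() moves those by hand. The sketch below models only that software-switched set; the enum, struct, world_switch() and the fake_guest() stand-in are illustrative assumptions, not KVM code.

```c
#include <assert.h>

/*
 * Registers that VM entry/exit does not switch through the VMCS, so
 * software has to load and save them (R8-R15 exist on 64-bit only).
 */
enum sw_reg {
	SW_RAX, SW_RBX, SW_RCX, SW_RDX, SW_RSI, SW_RDI, SW_RBP,
	SW_R8, SW_R9, SW_R10, SW_R11, SW_R12, SW_R13, SW_R14, SW_R15,
	SW_CR2,
	SW_NR_REGS
};

struct sw_regs {
	unsigned long r[SW_NR_REGS];
};

/* Stand-in for guest execution: the "guest" increments RAX. */
static void fake_guest(struct sw_regs *cpu)
{
	cpu->r[SW_RAX] += 1;
}

/*
 * One round trip: keep the host values, load the guest copies into the
 * "CPU", run the guest until it exits, save the guest values back and
 * restore the host ones.
 */
static void world_switch(struct sw_regs *cpu, struct sw_regs *guest,
			 void (*run_guest)(struct sw_regs *))
{
	struct sw_regs host = *cpu;

	*cpu = *guest;		/* like the guest register loads in the asm */
	run_guest(cpu);		/* vmlaunch/vmresume until the VM exit */
	*guest = *cpu;		/* save guest values on exit */
	*cpu = host;		/* restore host values */
}
```

After one round trip the guest's RAX change is kept in the guest copy while the host value is back in place.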

 
 ===
 
 asm(
 	/* Store host registers */
 	"push %%"R"dx; push %%"R"bp;"
 	"push %%"R"cx \n\t" /* placeholder for guest rcx */
 	"push %%"R"cx \n\t"
 	"cmp %%"R"sp, %c[host_rsp](%0) \n\t"
 	"je 1f \n\t"
 	"mov %%"R"sp, %c[host_rsp](%0) \n\t"
 	__ex(ASM_VMX_VMWRITE_RSP_RDX) "\n\t"
 	"1: \n\t"
 	/* Reload cr2 if changed */
 	"mov %c[cr2](%0), %%"R"ax \n\t"
 	"mov %%cr2, %%"R"dx \n\t"
 	"cmp %%"R"ax, %%"R"dx \n\t"
 	"je 2f \n\t"
 	"mov %%"R"ax, %%cr2 \n\t"
 	"2: \n\t"
 	/* Check if vmlaunch of vmresume is needed */
 	"cmpl $0, %c[launched](%0) \n\t"
 	/* Load guest registers.  Don't clobber flags. */
 	"mov %c[rax](%0), %%"R"ax \n\t"
 	"mov %c[rbx](%0), %%"R"bx \n\t"
 	"mov %c[rdx](%0), %%"R"dx \n\t"
 	"mov %c[rsi](%0), %%"R"si \n\t"
 	"mov %c[rdi](%0), %%"R"di \n\t"
 	"mov %c[rbp](%0), %%"R"bp \n\t"
 


-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] KVM call agenda for Tuesday, July 3rd

2012-07-03 Thread Kevin Wolf

On 02.07.2012 19:33, Eric Blake wrote:
 On 07/02/2012 04:16 AM, Juan Quintela wrote:

 Hi

 Please send in any agenda items you are interested in covering.
 
 Can we discuss the future of 'getfd', the possibility of 'pass-fd', or
 even the enhancement of all existing monitor commands to take an
 optional 'nfds' JSON argument for atomic management of fd passing?
 Which commands need to reopen a file with different access, and do we
 bite the bullet to special case all of those commands to allow fd
 passing or can we make qemu_open() coupled with high-level fd passing
 generic enough to satisfy all of our reopen needs?

Sure we can, at least if Corey will attend the call. Otherwise I guess
it's better to keep the discussion on the mailing list.

Kevin
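
For background on the fd-passing discussion: QMP's getfd receives a descriptor sent as SCM_RIGHTS ancillary data on the monitor's Unix socket. A minimal sketch of both ends of that transfer, using a socketpair in place of a real monitor connection; send_fd() and recv_fd() are hypothetical helper names, and the one-byte payload is there because a SOCK_STREAM socket must carry at least one byte of ordinary data alongside the ancillary data.

```c
#define _GNU_SOURCE	/* for the CMSG_* macros on glibc */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected AF_UNIX socket as SCM_RIGHTS data. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the new descriptor or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file, which is what lets a management application hand QEMU files it could not open itself.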

Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-03 Thread Peter Lieven


Further output from my testing.

Working:
Linux 2.6.38 with included kvm module
Linux 3.0.0 with included kvm module

Not-Working:
Linux 3.2.0 with included kvm module
Linux 2.6.28 with kvm-kmod 3.4
Linux 3.0.0 with kvm-kmod 3.4
Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
It might be that the code was introduced somewhere between 3.0.0
and 3.2.0 in the kvm kernel module and that the flaw is not
in qemu-kvm.

Any hints?

Thanks,
Peter


On 02.07.2012 17:05, Avi Kivity wrote:

On 06/28/2012 12:38 PM, Peter Lieven wrote:

does anyone know what that is here in handle_mmio?

 /* hack: Red Hat 7.1 generates these weird accesses. */
 if ((addr > 0xa0000-4 && addr <= 0xa0000) && kvm_run->mmio.len == 3)
 return 0;


Just what it says.  There is a 4-byte access to address 0x9ffff.  The
first byte lies in RAM, the next three bytes are in mmio.  qemu is
geared to power-of-two accesses even though x86 can generate accesses to
any number of bytes between 1 and 8.

It appears that this has happened with your guest.  It's not impossible
that it's genuine.
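
The straddling access Avi describes can be made concrete. A sketch, assuming the RAM/MMIO boundary is 0xa0000 (the start of the legacy VGA window) and that the handle_mmio() check compares against that value; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed RAM/MMIO boundary: legacy VGA window starts at 0xa0000. */
#define VGA_MMIO_START	0xa0000u

/*
 * Split an access of len bytes at addr into the part below boundary
 * (RAM) and the part at or above it (MMIO).
 */
static void split_access(uint64_t addr, unsigned int len, uint64_t boundary,
			 unsigned int *ram_len, unsigned int *mmio_len)
{
	if (addr >= boundary)
		*ram_len = 0;
	else if (addr + len <= boundary)
		*ram_len = len;
	else
		*ram_len = (unsigned int)(boundary - addr);
	*mmio_len = len - *ram_len;
}

/* The handle_mmio() special case, as a predicate on the MMIO part. */
static bool is_rh71_weird_access(uint64_t addr, unsigned int len)
{
	return addr > VGA_MMIO_START - 4 && addr <= VGA_MMIO_START && len == 3;
}
```

For a 4-byte access at 0x9ffff, one byte lands in RAM and the remaining three reach userspace as a 3-byte MMIO access at 0xa0000, which is exactly the shape the hack silently drops.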




Re: [PATCH v9 16/16] ARM: KVM: Guest wait-for-interrupts (WFI) support

2012-07-03 Thread Avi Kivity

On 07/03/2012 12:02 PM, Christoffer Dall wrote:
 From: Christoffer Dall cd...@cs.columbia.edu
 
 When the guest executes a WFI instruction the operation is trapped to
 KVM, which emulates the instruction in software. There is no correlation
 between a guest executing a WFI instruction and actually putting the
 hardware into a low-power mode, since a KVM guest is essentially a
 process and the WFI instruction can be seen as a 'sleep' call from this
 process. Therefore, we flag the VCPU to be in wait_for_interrupts mode
 and call the main KVM function kvm_vcpu_block(). This function
 will put the thread on a wait-queue and call schedule().
 
 When an interrupt comes in through KVM_IRQ_LINE (see previous patch) we
 signal the VCPU thread and unflag the VCPU to no longer wait for
 interrupts. All calls to kvm_arch_vcpu_ioctl_run() result in a call to
 kvm_vcpu_block() as long as the VCPU is in wfi-mode.
 
  
  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
 @@ -538,6 +546,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
*/
   cond_resched();
  
 +	if (vcpu->arch.wait_for_interrupts)
 +		kvm_vcpu_block(vcpu);
 +
   	update_vttbr(vcpu->kvm);
  
   /*
 @@ -635,6 +646,8 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
* trigger a world-switch round on the running physical CPU to set the
* virtual IRQ/FIQ fields in the HCR appropriately.
*/
 +	if (irq_level->level)
 +		vcpu->arch.wait_for_interrupts = 0;

What, no memory barriers, etc?

Is it actually needed?  We can clear it instead after calling
kvm_vcpu_block() above, so the variable is only accessed from the vcpu
thread.  The savings in pain medication are measurable.

   kvm_vcpu_kick(vcpu);
  
   return 0;
 diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
 index 99432d8..564add2 100644
 +/**
 + * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
 + * @vcpu:the vcpu pointer
 + * @run: the kvm_run structure pointer
 + *
 + * Simply sets the wait_for_interrupts flag on the vcpu structure, which will
 + * halt execution of world-switches and schedule other host processes until
 + * there is an incoming IRQ or FIQ to the VM.
 + */
  int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
  {
 +	trace_kvm_wfi(vcpu->arch.regs.pc);
   	vcpu->stat.wfi_exits++;
 +	if (!vcpu->arch.irq_lines)
 +		vcpu->arch.wait_for_interrupts = 1;
  

Or you could just call kvm_vcpu_block() here without having the
variable.  But eventually you'll need it since you want to expose wfi
state to userspace for live migration.


Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-03 Thread Avi Kivity

On 07/03/2012 04:01 PM, Peter Lieven wrote:
 Further output from my testing.
 
 Working:
 Linux 2.6.38 with included kvm module
 Linux 3.0.0 with included kvm module
 
 Not-Working:
 Linux 3.2.0 with included kvm module
 Linux 2.6.28 with kvm-kmod 3.4
 Linux 3.0.0 with kvm-kmod 3.4
 Linux 3.2.0 with kvm-kmod 3.4
 
 I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
 It might be that the code was introduced somewhere between 3.0.0
 and 3.2.0 in the kvm kernel module and that the flaw is not
 in qemu-kvm.
 
 Any hints?
 

A bisect could tell us where the problem is.

To avoid bisecting all of linux, try

   git bisect start v3.2 v3.0 -- virt/kvm arch/x86/kvm




Re: [Android-virt] [PATCH v9 16/16] ARM: KVM: Guest wait-for-interrupts (WFI) support

2012-07-03 Thread Peter Maydell

On 3 July 2012 14:10, Avi Kivity a...@redhat.com wrote:
 Or you could just call kvm_vcpu_block() here without having the
 variable.  But eventually you'll need it since you want to expose wfi
 state to userspace for live migration.

You could just always wake the cpu when migrating: the
architecture allows WFI to return early for any reason
it likes including implementation convenience.
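With this approach nothing about a pending WFI has to be migrated: the source wakes a sleeping vcpu before saving its state, and the guest just observes an early return from WFI, which the architecture allows. A minimal sketch with hypothetical structures (the real KVM/ARM save path is not part of this thread); it assumes the saved PC already points past the trapped WFI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical live vcpu state on the source host. */
struct vcpu_live {
    uint32_t pc;                /* assumed to point past the trapped WFI */
    bool wait_for_interrupts;   /* host-internal: sleeping in WFI */
};

/* Hypothetical migration image: deliberately has no WFI field. */
struct vcpu_image {
    uint32_t pc;
};

/* Wake the vcpu, then save. To the guest this looks like WFI returning
 * early, which an architecturally correct guest must already handle. */
static struct vcpu_image save_vcpu(struct vcpu_live *v)
{
    struct vcpu_image img;

    v->wait_for_interrupts = false;   /* treat as a spurious wakeup */
    img.pc = v->pc;
    return img;
}
```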

-- PMM

Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-03 Thread Peter Lieven


On 03.07.2012 15:13, Avi Kivity wrote:

On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
Linux 2.6.38 with included kvm module
Linux 3.0.0 with included kvm module

Not-Working:
Linux 3.2.0 with included kvm module
Linux 2.6.28 with kvm-kmod 3.4
Linux 3.0.0 with kvm-kmod 3.4
Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
It might be that the code was introduced somewhere between 3.0.0
and 3.2.0 in the kvm kernel module and that the flaw is not
in qemu-kvm.

Any hints?


A bisect could tell us where the problem is.

To avoid bisecting all of linux, try

git bisect start v3.2 v3.0 -- virt/kvm arch/x86/kvm



would it also be ok to bisect kvm-kmod?

thanks,
peter


Re: [Qemu-devel] KVM call agenda for Tuesday, July 3rd

2012-07-03 Thread Corey Bryant




On 07/03/2012 08:33 AM, Kevin Wolf wrote:

Am 02.07.2012 19:33, schrieb Eric Blake:

On 07/02/2012 04:16 AM, Juan Quintela wrote:


Hi

Please send in any agenda items you are interested in covering.


Can we discuss the future of 'getfd', the possibility of 'pass-fd', or
even the enhancement of all existing monitor commands to take an
optional 'nfds' JSON argument for atomic management of fd passing?
Which commands need to reopen a file with different access, and do we
bite the bullet to special case all of those commands to allow fd
passing or can we make qemu_open() coupled with high-level fd passing
generic enough to satisfy all of our reopen needs?


Sure we can, at least if Corey will attend the call. Otherwise I guess
it's better to keep the discussion on the mailing list.

Kevin



I'll be on the call.  Thanks for getting this on the agenda.

--
Regards,
Corey



[PATCH 0/2] virtio-blk spec: document topology info, add WCE toggle

2012-07-03 Thread Paolo Bonzini

Hi Rusty, here are two improvements to the virtio-blk spec.

The first documents the status quo of an extension that has already
been supported in QEMU for several releases, but never made it to the
official spec.

The second adds support for toggling the cache mode between writethrough
and writeback.  Two mechanisms are introduced for this.  One is to not
negotiate VIRTIO_BLK_F_FLUSH; it can be done only at reset time and is
more of a refinement geared towards older or limited guests, in order to
make them safe with respect to power loss.  The second is via feature bits and a
new configuration field.

Paolo Bonzini (2):
  virtio-blk spec: document topology info
  virtio-blk spec: writeback cache enable improvements

 virtio-spec.lyx |  136 +--
 1 file changed, 132 insertions(+), 4 deletions(-)

-- 
1.7.10.2


[PATCH 1/2] virtio-blk spec: document topology info

2012-07-03 Thread Paolo Bonzini

Current QEMU and Linux drivers can export queue parameters via the
virtio-blk configuration space.  Document this, since the next patch
will have to add another configuration field after these.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 virtio-spec.lyx |   85 +++
 1 file changed, 80 insertions(+), 5 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index dd2d53b..859dbe7 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -5021,7 +5021,20 @@ VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
 \end_layout
 
 \begin_layout Description
-VIRTIO_BLK_F_FLUSH (9) Cache flush command support.Device
+VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
+\change_inserted 1531152142 1341305427
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1341301882
+VIRTIO_BLK_F_TOPOLOGY (10) Device exports information on optimal I/O alignment.
+\end_layout
+
+\end_deeper
+\begin_layout Description
+Device
 \begin_inset space ~
 \end_inset
 
@@ -5090,6 +5103,48 @@ struct virtio_blk_config {
 
 \begin_layout Plain Layout
 
+\change_inserted 1531152142 1341301807
+
+struct virtio_blk_topology {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341301810
+
+u8 physical_block_exp;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341301817
+
+u8 alignment_offset;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341301822
+
+u16 min_io_size;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341301827
+
+u32 opt_io_size;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341301911
+
+} topology;
+\end_layout
+
+\begin_layout Plain Layout
+
 };
 \end_layout
 
@@ -5098,7 +5153,6 @@ struct virtio_blk_config {
 
 \end_layout
 
-\end_deeper
 \begin_layout Section*
 Device Initialization
 \end_layout
@@ -5119,15 +5173,36 @@ capacity
 \begin_layout Enumerate
 If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, the blk_size field can
  be read to determine the optimal sector size for the driver to use.
- This does not effect the units used in the protocol (always 512 bytes),
- but awareness of the correct value can effect performance.
+ This does not 
+\change_deleted 1531152142 1341301967
+e
+\change_inserted 1531152142 1341301967
+a
+\change_unchanged
+ffect the units used in the protocol (always 512 bytes), but awareness of
+ the correct value can 
+\change_deleted 1531152142 1341301978
+e
+\change_inserted 1531152142 1341301978
+a
+\change_unchanged
+ffect performance.
 \end_layout
 
 \begin_layout Enumerate
 If the VIRTIO_BLK_F_RO feature is set by the device, any write requests
  will fail.
-\change_inserted 1531152142 1341301982
+\change_inserted 1531152142 1341301920
+
+\end_layout
 
+\begin_layout Enumerate
+
+\change_inserted 1531152142 1341301982
+If the VIRTIO_BLK_F_TOPOLOGY feature is negotiated, the fields in the topology
+ struct can be read to determine the physical block size and optimal I/O
+ lengths for the driver to use.
+ This also does not affect the units in the protocol, only performance.
 \end_layout
 
 \begin_layout Section*
-- 
1.7.10.2
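
The spec text above leaves the units of the topology fields implicit. A minimal sketch of how a driver might turn them into byte sizes, assuming (as the Linux virtio-blk driver does, to the best of my knowledge) that physical_block_exp is log2 of the number of logical blocks per physical block, and that alignment_offset, min_io_size and opt_io_size are counted in logical blocks of blk_size bytes. The struct and helper names are hypothetical.

```c
#include <stdint.h>

/* Mirrors the topology fields added to struct virtio_blk_config. */
struct virtio_blk_topology_sketch {
    uint8_t  physical_block_exp;  /* log2(logical blocks per physical block) */
    uint8_t  alignment_offset;    /* in logical blocks */
    uint16_t min_io_size;         /* in logical blocks */
    uint32_t opt_io_size;         /* in logical blocks */
};

struct topology_bytes {
    uint64_t physical_block_size;
    uint64_t alignment_offset;
    uint64_t min_io;
    uint64_t opt_io;
};

/* blk_size is the logical block size (VIRTIO_BLK_F_BLK_SIZE), or 512
 * if that feature was not negotiated. */
static struct topology_bytes
topology_to_bytes(uint32_t blk_size, const struct virtio_blk_topology_sketch *t)
{
    struct topology_bytes b;

    /* Guard against absurd exponents, which would overflow the shift. */
    b.physical_block_size = t->physical_block_exp < 32
        ? (uint64_t)blk_size << t->physical_block_exp
        : 0;
    b.alignment_offset = (uint64_t)blk_size * t->alignment_offset;
    b.min_io           = (uint64_t)blk_size * t->min_io_size;
    b.opt_io           = (uint64_t)blk_size * t->opt_io_size;
    return b;
}
```

For example, a disk with 512-byte logical blocks and 4096-byte physical sectors would report physical_block_exp = 3.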



[PATCH 2/2] virtio-blk spec: writeback cache enable improvements

2012-07-03 Thread Paolo Bonzini

This patch introduces two improvements to writeback cache handling
in the virtio-blk spec.

1) The VIRTIO_BLK_F_FLUSH feature is renamed to VIRTIO_BLK_F_WCE, and
QEMU's behavior is documented explicitly as part of the spec: the host
negotiates the feature only if its cache is writeback.  The obvious dual
requirement is imposed on the guest: it should negotiate the feature
only if it is able to send flushes.  And in order to protect against
data loss, the spec now mandates that the host operates in writethrough
mode if the guest does not negotiate VIRTIO_BLK_F_WCE (this behavior
was already _allowed_ by the spec so far).  This can change with every
reset of course; typically the BIOS will run as writethrough, while the
main OS will run in writeback mode.  This is a backwards-compatible
refinement geared towards old or limited guests, so there is no need
for a new feature bit.

2) A second feature is added, VIRTIO_BLK_F_CONFIG_WCE, that provides
the same information in the configuration.  This will enable the driver
to modify the write-cache setting at runtime (via sysfs for Linux, via
MODE SELECT for Windows).
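The two rules can be condensed into a pair of small functions. This is a sketch under stated assumptions rather than QEMU or Linux code: features are modelled as bit numbers in a 32-bit mask (9 for VIRTIO_BLK_F_WCE and 11 for VIRTIO_BLK_F_CONFIG_WCE, as in the patches), the configuration field is a single byte holding 0 for writethrough and 1 for writeback, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_F_WCE         9   /* renamed from VIRTIO_BLK_F_FLUSH */
#define VIRTIO_BLK_F_CONFIG_WCE 11   /* cache mode exposed in config space */

static bool has_feature(uint32_t negotiated, unsigned int bit)
{
    return (negotiated & (1u << bit)) != 0;
}

/* Host side, rule 1): once the guest has acknowledged its features,
 * the cache is writeback only if the guest negotiated VIRTIO_BLK_F_WCE,
 * i.e. promised to send VIRTIO_BLK_T_FLUSH; otherwise writethrough. */
static bool host_writeback_after_negotiation(uint32_t negotiated)
{
    return has_feature(negotiated, VIRTIO_BLK_F_WCE);
}

/* Driver side: prefer the config field when VIRTIO_BLK_F_CONFIG_WCE
 * was negotiated, else fall back to the result of rule 1). */
static bool driver_sees_writeback(uint32_t negotiated, uint8_t config_writeback)
{
    if (has_feature(negotiated, VIRTIO_BLK_F_CONFIG_WCE))
        return config_writeback != 0;
    return has_feature(negotiated, VIRTIO_BLK_F_WCE);
}
```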

Patches for QEMU and Linux will come soonish.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 virtio-spec.lyx |   57 +--
 1 file changed, 55 insertions(+), 2 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 859dbe7..fccbd28 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -5021,9 +5021,19 @@ VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
 \end_layout
 
 \begin_layout Description
-VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
+VIRTIO_BLK_F_
+\change_deleted 1531152142 1341302299
+FLUSH
+\change_inserted 1531152142 1341302304
+WCE
+\change_unchanged
+ (9) 
+\change_deleted 1531152142 1341302317
+Cache flush command support.
 \change_inserted 1531152142 1341305427
-
+Device cache starts in writeback mode after reset.
+ Guests should not negotiate this feature unless they are capable of sending
+ VIRTIO_BLK_T_FLUSH commands.
 \end_layout
 
 \begin_layout Description
@@ -5032,6 +5042,13 @@ VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
 VIRTIO_BLK_F_TOPOLOGY (10) Device exports information on optimal I/O alignment.
 \end_layout
 
+\begin_layout Description
+
+\change_inserted 1531152142 1341302349
+VIRTIO_BLK_F_CONFIG_WCE (11) Device can toggle its cache between writeback
+ and writethrough modes.
+\end_layout
+
 \end_deeper
 \begin_layout Description
 Device
@@ -5145,6 +5162,15 @@ struct virtio_blk_config {
 
 \begin_layout Plain Layout
 
+\change_inserted 1531152142 1341301918
+
+u8 writeback;
+\change_unchanged
+
+\end_layout
+
+\begin_layout Plain Layout
+
 };
 \end_layout
 
@@ -5205,6 +5231,33 @@ If the VIRTIO_BLK_F_TOPOLOGY feature is negotiated, the fields in the topology
  This also does not affect the units in the protocol, only performance.
 \end_layout
 
+\begin_layout Enumerate
+
+\change_inserted 1531152142 1341305949
+The cache mode should be read from the writeback field of the configuration
 if the VIRTIO_BLK_F_CONFIG_WCE feature is available; the driver can also
+ write to the field in order to toggle the cache between writethrough (0)
+ and writeback (1) mode.
+ If the feature is not available, the driver can instead look at the result
+ of negotiating VIRTIO_BLK_F_WCE: the cache will be in writeback mode after
+ reset if and only if VIRTIO_BLK_F_WCE is negotiated
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341306004
+Until version 1.1, QEMU remained in writeback mode even after a guest announced
+ lack of support for VIRTIO_BLK_F_FLUSH.
+\change_unchanged
+
+\end_layout
+
+\end_inset
+
+.
+\end_layout
+
 \begin_layout Section*
 Device Operation
 \end_layout
-- 
1.7.10.2


[PATCH] virtio-blk: allow toggling host cache between writeback and writethrough

2012-07-03 Thread Paolo Bonzini

This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature,
which exposes the cache mode in the configuration space and lets the
driver modify it.  The cache mode is exposed via sysfs.

Even if the host does not support the new feature, the cache mode is
visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable.
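From user space the new attribute is an ordinary sysfs file. A minimal sketch, assuming the disk is vda so that the attribute appears as /sys/block/vda/cache_type (the disk name is an assumption); the accepted strings are the two entries of virtblk_cache_types, and without VIRTIO_BLK_F_CONFIG_WCE the file is read-only, so a write fails. The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Write "write through" or "write back" to a cache_type file.
 * Returns 0 on success, -1 on failure (e.g. a read-only attribute). */
static int set_cache_type(const char *path, const char *mode)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    /* sysfs_streq() in the driver tolerates the trailing newline. */
    int ok = fprintf(f, "%s\n", mode) > 0;
    /* stdio buffers the data, so an error from the driver's store()
     * method typically surfaces when fclose() flushes it. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the current mode into buf, stripping the trailing newline. */
static int get_cache_type(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Usage would be along the lines of set_cache_type("/sys/block/vda/cache_type", "write through").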

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 drivers/block/virtio_blk.c |   90 ++-
 include/linux/virtio_blk.h |5 ++-
 2 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 693187d..5602505 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -397,6 +397,83 @@ static int virtblk_name_format(char *prefix, int index, char *buf, int buflen)
return 0;
 }
 
+static int virtblk_get_cache_mode(struct virtio_device *vdev)
+{
+   u8 writeback;
+   int err;
+
+   err = virtio_config_val(vdev, VIRTIO_BLK_F_CONFIG_WCE,
+   offsetof(struct virtio_blk_config, wce),
+   &writeback);
+   if (err)
+   writeback = virtio_has_feature(vdev, VIRTIO_BLK_F_WCE);
+
+   return writeback;
+}
+
+static void virtblk_update_cache_mode(struct virtio_device *vdev)
+{
+   u8 writeback = virtblk_get_cache_mode(vdev);
+   struct virtio_blk *vblk = vdev->priv;
+
+   if (writeback)
+   blk_queue_flush(vblk->disk->queue, REQ_FLUSH);
+   else
+   blk_queue_flush(vblk->disk->queue, 0);
+
+   revalidate_disk(vblk->disk);
+}
+
+static const char *virtblk_cache_types[] = {
+   "write through", "write back"
+};
+
+static ssize_t
+virtblk_cache_type_store(struct device *dev, struct device_attribute *attr,
+const char *buf, size_t count)
+{
+   struct gendisk *disk = dev_to_disk(dev);
+   struct virtio_blk *vblk = disk->private_data;
+   struct virtio_device *vdev = vblk->vdev;
+   int i;
+   u8 writeback;
+
+   BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
+   for (i = ARRAY_SIZE(virtblk_cache_types); --i >= 0; )
+   if (sysfs_streq(buf, virtblk_cache_types[i]))
+   break;
+
+   if (i < 0)
+   return -EINVAL;
+
+   writeback = i;
+   vdev->config->set(vdev,
+ offsetof(struct virtio_blk_config, wce),
+ &writeback, sizeof(writeback));
+
+   virtblk_update_cache_mode(vdev);
+   return count;
+}
+
+static ssize_t
+virtblk_cache_type_show(struct device *dev, struct device_attribute *attr,
+char *buf)
+{
+   struct gendisk *disk = dev_to_disk(dev);
+   struct virtio_blk *vblk = disk->private_data;
+   u8 writeback = virtblk_get_cache_mode(vblk->vdev);
+
+   BUG_ON(writeback >= ARRAY_SIZE(virtblk_cache_types));
+   return snprintf(buf, 40, "%s\n", virtblk_cache_types[writeback]);
+}
+
+static const struct device_attribute dev_attr_cache_type_ro =
+   __ATTR(cache_type, S_IRUGO,
+  virtblk_cache_type_show, NULL);
+static const struct device_attribute dev_attr_cache_type_rw =
+   __ATTR(cache_type, S_IRUGO|S_IWUSR,
+  virtblk_cache_type_show, virtblk_cache_type_store);
+
 static int __devinit virtblk_probe(struct virtio_device *vdev)
 {
struct virtio_blk *vblk;
@@ -474,8 +549,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
vblk->index = index;
 
/* configure queue flush support */
-   if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
-   blk_queue_flush(q, REQ_FLUSH);
+   virtblk_update_cache_mode(vdev);
 
/* If disk is read-only in the host, the guest should obey */
if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
@@ -553,6 +627,14 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
if (err)
goto out_del_disk;
 
+   if (virtio_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE))
+   err = device_create_file(disk_to_dev(vblk->disk),
+&dev_attr_cache_type_rw);
+   else
+   err = device_create_file(disk_to_dev(vblk->disk),
+&dev_attr_cache_type_ro);
+   if (err)
+   goto out_del_disk;
return 0;
 
 out_del_disk:
@@ -655,7 +737,7 @@ static const struct virtio_device_id id_table[] = {
 static unsigned int features[] = {
VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, VIRTIO_BLK_F_SCSI,
-   VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY
+   VIRTIO_BLK_F_WCE, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE
 };
 
 /*
diff --git a/include/linux/virtio_blk.h b/include/linux/virtio_blk.h
index e0edb40..18a1027 100644
--- a/include/linux/virtio_blk.h
+++ b/include/linux/virtio_blk.h
@@ -37,8 +37,9 @@
 #define

[QEMU PATCH 0/2] virtio-blk: writeback cache enable improvements

2012-07-03 Thread Paolo Bonzini

These patches let virtio-blk use the new support for toggling the cache
mode between writethrough and writeback.

The first patch introduces a new feature bit and configuration field to
do this.  The second patch disables writeback caching for guests that do
not negotiate VIRTIO_BLK_F_WCACHE (meaning that they cannot send flush
requests), so that limited or older guests are now safe with respect to
power loss.  VIRTIO_BLK_F_FLUSH was introduced in Linux 2.6.32 (in 2009)
and was backported to RHEL/CentOS 5.6 (in 2010).

The Windows drivers (which work by emulating SCSI on top of virtio-blk)
have bugs in this area, which I reported on the Red Hat Bugzilla as
bugs 837321 and 837324.  With these patches they will suffer a
performance hit but gain correctness.

Paolo Bonzini (2):
  virtio-blk: support VIRTIO_BLK_F_CONFIG_WCE
  virtio-blk: disable write cache if not negotiated

 hw/virtio-blk.c |   30 --
 hw/virtio-blk.h |4 +++-
 2 files changed, 31 insertions(+), 3 deletions(-)

-- 
1.7.10.2


[PATCH 2/2] virtio-blk: disable write cache if not negotiated

2012-07-03 Thread Paolo Bonzini

If the guest does not support flushes, we should run in writethrough mode.
The setting is temporary until the next reset, so that for example the
BIOS will run in writethrough mode while Linux will run with a writeback
cache.

VIRTIO_BLK_F_FLUSH was introduced in Linux 2.6.32 (in 2009) and
was backported to RHEL/CentOS 5.6 (in 2010).  The Windows drivers have
two bugs, which I reported on the Red Hat Bugzilla as bugs 837321 and
837324.  With these patches they will suffer a performance hit but
gain correctness.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/virtio-blk.c |   14 ++
 1 file changed, 14 insertions(+)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 280f96d..500e026 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -543,6 +543,19 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features)
 return features;
 }
 
+static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
+{
+VirtIOBlock *s = to_virtio_blk(vdev);
+uint32_t features;
+
+if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+return;
+}
+
+features = vdev->guest_features;
+bdrv_set_enable_write_cache(s->bs, !!(features & (1 << VIRTIO_BLK_F_WCE)));
+}
+
 static void virtio_blk_save(QEMUFile *f, void *opaque)
 {
 VirtIOBlock *s = opaque;
@@ -628,6 +641,7 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, VirtIOBlkConf *blk)
 s->vdev.get_config = virtio_blk_update_config;
 s->vdev.set_config = virtio_blk_set_config;
 s->vdev.get_features = virtio_blk_get_features;
+s->vdev.set_status = virtio_blk_set_status;
 s->vdev.reset = virtio_blk_reset;
 s->bs = blk->conf.bs;
 s->conf = &blk->conf;
-- 
1.7.10.2


[PATCH 1/2] virtio-blk: support VIRTIO_BLK_F_CONFIG_WCE

2012-07-03 Thread Paolo Bonzini

Introduce a new feature bit and configuration field that provide
support for toggling the cache mode between writethrough and writeback.

Also rename VIRTIO_BLK_F_WCACHE to VIRTIO_BLK_F_WCE for consistency with
the spec.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/virtio-blk.c |   16 ++--
 hw/virtio-blk.h |4 +++-
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index fe07746..280f96d 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -510,9 +510,19 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
 blkcfg.size_max = 0;
 blkcfg.physical_block_exp = get_physical_block_exp(s->conf);
 blkcfg.alignment_offset = 0;
+blkcfg.wce = bdrv_enable_write_cache(s->bs);
 memcpy(config, &blkcfg, sizeof(struct virtio_blk_config));
 }
 
+static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
+{
+VirtIOBlock *s = to_virtio_blk(vdev);
+struct virtio_blk_config blkcfg;
+
+memcpy(&blkcfg, config, sizeof(blkcfg));
+bdrv_set_enable_write_cache(s->bs, blkcfg.wce != 0);
+}
+
 static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features)
 {
 VirtIOBlock *s = to_virtio_blk(vdev);
@@ -523,9 +533,10 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features)
 features |= (1 << VIRTIO_BLK_F_BLK_SIZE);
 features |= (1 << VIRTIO_BLK_F_SCSI);
 
+features |= (1 << VIRTIO_BLK_F_CONFIG_WCE);
 if (bdrv_enable_write_cache(s->bs))
-features |= (1 << VIRTIO_BLK_F_WCACHE);
-
+features |= (1 << VIRTIO_BLK_F_WCE);
+
 if (bdrv_is_read_only(s->bs))
 features |= 1 << VIRTIO_BLK_F_RO;
 
@@ -615,6 +626,7 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, VirtIOBlkConf *blk)
   sizeof(VirtIOBlock));
 
 s->vdev.get_config = virtio_blk_update_config;
+s->vdev.set_config = virtio_blk_set_config;
 s->vdev.get_features = virtio_blk_get_features;
 s->vdev.reset = virtio_blk_reset;
 s->bs = blk->conf.bs;
diff --git a/hw/virtio-blk.h b/hw/virtio-blk.h
index d785001..afea114 100644
--- a/hw/virtio-blk.h
+++ b/hw/virtio-blk.h
@@ -31,8 +31,9 @@
 #define VIRTIO_BLK_F_BLK_SIZE   6   /* Block size of disk is available*/
 #define VIRTIO_BLK_F_SCSI   7   /* Supports scsi command passthru */
 /* #define VIRTIO_BLK_F_IDENTIFY   8   ATA IDENTIFY supported, DEPRECATED */
-#define VIRTIO_BLK_F_WCACHE 9   /* write cache enabled */
+#define VIRTIO_BLK_F_WCE        9   /* write cache enabled */
 #define VIRTIO_BLK_F_TOPOLOGY   10  /* Topology information is available */
+#define VIRTIO_BLK_F_CONFIG_WCE 11  /* write cache configurable */
 
 #define VIRTIO_BLK_ID_BYTES 20  /* ID string length */
 
@@ -49,6 +50,7 @@ struct virtio_blk_config
 uint8_t alignment_offset;
 uint16_t min_io_size;
 uint32_t opt_io_size;
+uint8_t wce;
 } QEMU_PACKED;
 
 /* These two define direction. */
-- 
1.7.10.2



Re: [Android-virt] [PATCH v9 16/16] ARM: KVM: Guest wait-for-interrupts (WFI) support

2012-07-03 Thread Avi Kivity

On 07/03/2012 04:14 PM, Peter Maydell wrote:
 On 3 July 2012 14:10, Avi Kivity a...@redhat.com wrote:
 Or you could just call kvm_vcpu_block() here without having the
 variable.  But eventually you'll need it since you want to expose wfi
 state to userspace for live migration.
 
 You could just always wake the cpu when migrating: the
 architecture allows WFI to return early for any reason
 it likes including implementation convenience.

Seems reasonable.

I imagine wfi works with interrupts disabled, unlike the x86 silliness?



Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-03 Thread Avi Kivity

On 07/03/2012 04:15 PM, Peter Lieven wrote:
 On 03.07.2012 15:13, Avi Kivity wrote:
 On 07/03/2012 04:01 PM, Peter Lieven wrote:
 Further output from my testing.

 Working:
 Linux 2.6.38 with included kvm module
 Linux 3.0.0 with included kvm module

 Not-Working:
 Linux 3.2.0 with included kvm module
 Linux 2.6.28 with kvm-kmod 3.4
 Linux 3.0.0 with kvm-kmod 3.4
 Linux 3.2.0 with kvm-kmod 3.4

 I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
 It might be that the code was introduced somewhere between 3.0.0
 and 3.2.0 in the kvm kernel module and that the flaw is not
 in qemu-kvm.

 Any hints?

 A bisect could tell us where the problem is.

 To avoid bisecting all of linux, try

 git bisect start v3.2 v3.0 -- virt/kvm arch/x86/kvm


 would it also be ok to bisect kvm-kmod?

Yes, but note that kvm-kmod is spread across two repositories which are
not often tested out of sync, so you may get build failures.



Re: UIP flag not cleared

2012-07-03 Thread Paolo Bonzini

Il 15/06/2012 17:38, nicolas.oc...@free.fr ha scritto:
 Hi list,
 
 
 I am having trouble porting my OS to a qemu/kvm environment. It's
 about the RTC (real-time clock).
 
 There is a flag (UIP flag) which is supposed to show when RTC can be
 read or not.
 
 We wait 10ms for that flag to be cleared, but sometimes it's not
 enough with qemu/kvm.

You need to wait more than 10ms then. :(  This will be fixed in QEMU
1.2, but you cannot rule out delays due to bad scheduling of the virtual
machine monitor (aka QEMU itself).

 Is it necessary to check this flag at all? Or can I always read the
 RTC regardless of the status of the flag? If the latter is true, why
 then is this flag not always clear?

Unlike real hardware, QEMU updates the time atomically; there are no
invalid states during the update of the RTC.  However, _reads_ of the
RTC are not atomic so you do need UIP.  UIP triggers 220 us *before* the
invalid state, so that if UIP=0 you have 220 us to read the RTC.  If
your reads take less than 220 us, they are guaranteed to be atomic.

If you need a workaround you can do the following (but it will break on
bare metal):

old_UIP = UIP
read RTC
if old_UIP = 1 and UIP = 0
read RTC again

i.e. accept the old read if both old_UIP and UIP are 1 (and of course if
old_UIP was 0).
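The workaround written out as C. A sketch only: CMOS access is abstracted behind a hypothetical cmos_read callback (on x86 it would be an index write to port 0x70 followed by a data read from port 0x71), only seconds, minutes and hours are read, and the fake RTC at the end exists purely to exercise the logic. As said above, this relies on QEMU updating the time atomically and can give torn reads on bare metal.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTC_SECONDS  0x00
#define RTC_MINUTES  0x02
#define RTC_HOURS    0x04
#define RTC_REG_A    0x0A
#define RTC_UIP      0x80   /* update-in-progress bit of register A */

/* Hypothetical CMOS accessor supplied by the OS. */
typedef uint8_t (*cmos_read_fn)(void *ctx, uint8_t reg);

struct rtc_time_sketch {
    uint8_t sec, min, hour;   /* raw register values (BCD or binary) */
};

static bool rtc_uip(cmos_read_fn rd, void *ctx)
{
    return (rd(ctx, RTC_REG_A) & RTC_UIP) != 0;
}

static void rtc_read_fields(cmos_read_fn rd, void *ctx, struct rtc_time_sketch *t)
{
    t->sec  = rd(ctx, RTC_SECONDS);
    t->min  = rd(ctx, RTC_MINUTES);
    t->hour = rd(ctx, RTC_HOURS);
}

/* QEMU-only workaround, no waiting for UIP to clear. Returns how many
 * times the time registers were read (1 or 2). */
static int rtc_read_qemu(cmos_read_fn rd, void *ctx, struct rtc_time_sketch *t)
{
    bool old_uip = rtc_uip(rd, ctx);

    rtc_read_fields(rd, ctx, t);
    if (old_uip && !rtc_uip(rd, ctx)) {
        /* The update happened around our read: read again. */
        rtc_read_fields(rd, ctx, t);
        return 2;
    }
    /* old_UIP was 0, or UIP is still 1: accept the first read. */
    return 1;
}

/* Fake RTC for testing: register A returns successive values from
 * uip_seq; the time registers come from regs[]. */
struct fake_rtc {
    uint8_t regs[16];
    const uint8_t *uip_seq;
    int uip_idx;
};

static uint8_t fake_cmos_read(void *ctx, uint8_t reg)
{
    struct fake_rtc *f = ctx;

    if (reg == RTC_REG_A)
        return f->uip_seq[f->uip_idx++];
    return f->regs[reg & 0x0F];
}
```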

Paolo

Re: [PATCH stable-1.1] qemu-kvm: Add missing default machine options

2012-07-03 Thread Marcelo Tosatti

On Mon, Jul 02, 2012 at 09:34:46AM +0200, Jan Kiszka wrote:
 qemu-kvm-specific machine defaults were missing for pc-0.15 and pc-1.0.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  hw/pc_piix.c |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

Applied both stable patches, thanks.


Re: [PATCH v9 00/16] KVM/ARM Implementation

2012-07-03 Thread Avi Kivity

On 07/03/2012 11:59 AM, Christoffer Dall wrote:
 The following series implements KVM support for ARM processors,
 specifically on the Cortex A-15 platform.  Work is done in
 collaboration between Columbia University, Virtual Open Systems and
 ARM/Linaro.
 
 The patch series applies to kvm/next, specifically commit:
 ae7a2a3fb6f8b784c2752863f4f1f20c656f76fb
 
 This is Version 9 of the patch series, but the first two versions
 were reviewed outside of the KVM mailing list. Changes can also be
 pulled from:
  git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v9
 
 A non-flattened edition of the patch series can be found at:
  git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v9-stage
 
 The implementation is broken up into a logical set of patches, the first
 five are preparatory patches:
  1. ARM: Add mem_type prot_pte accessor
  2. ARM: ARM_VIRT_EXT config option
  3. ARM: Section based HYP idmaps
  4. KVM: Move KVM_IRQ_LINE to arch-generic code
  5. KVM: Guard code with CONFIG_MMU_NOTIFIER (repost)
 
 KVM guys, please consider pulling the KVM generic patches as early as
 possible. Thanks.

Those seem fine to me.  Marcelo?

 
 Additionally a few major milestones are coming up shortly:
  - Support Thumb MMIO emulation and test MMIO emulation code (under way)
  - Merge Marc Zyngier's patch series for VGIC and timers (review in
progress)

Does it make sense to keep the pre-VGIC interrupt stuff in then?  It
would be better to support just one setup, but kernel-VGIC is of course
less flexible than user space.

  - Change from SMC based install to relying on booting the kernel in Hyp
mode. This requires some larger changes, but will allow a guest
kernel to boot with KVM configured.
 


Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-07-03 Thread Paolo Bonzini

Il 02/07/2012 08:41, Rusty Russell ha scritto:
 With the same workload in guest, the guest fires 200K requests to host 
 with merges enabled in guest (echo 0 > /sys/block/vdb/queue/nomerges), 
 while the guest fires 4K requests to host with merges disabled in 
 guest (echo 2 > /sys/block/vdb/queue/nomerges). This shows that the merge 
 in block layer reduces the total number of requests fired to host a lot 
 (4K / 200K = 20).
 

4 / 200 is 200, not 20. :)

Paolo


Re: [PATCH 1/2] virtio-blk: support VIRTIO_BLK_F_CONFIG_WCE

2012-07-03 Thread Kevin Wolf

Am 03.07.2012 15:20, schrieb Paolo Bonzini:
 Introduce a new feature bit and configuration field that provide
 support for toggling the cache mode between writethrough and writeback.
 
 Also rename VIRTIO_BLK_F_WCACHE to VIRTIO_BLK_F_WCE for consistency with
 the spec.

My spec (and my kernel as well) call it VIRTIO_BLK_F_FLUSH.

What's the status of the kernel and spec side of the change?

 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  hw/virtio-blk.c |   16 ++--
  hw/virtio-blk.h |4 +++-
  2 files changed, 17 insertions(+), 3 deletions(-)
 
 diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
 index fe07746..280f96d 100644
 --- a/hw/virtio-blk.c
 +++ b/hw/virtio-blk.c
 @@ -510,9 +510,19 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
  blkcfg.size_max = 0;
  blkcfg.physical_block_exp = get_physical_block_exp(s->conf);
  blkcfg.alignment_offset = 0;
 +blkcfg.wce = bdrv_enable_write_cache(s->bs);
  memcpy(config, &blkcfg, sizeof(struct virtio_blk_config));
  }
  
 +static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
 +{
 +VirtIOBlock *s = to_virtio_blk(vdev);
 +struct virtio_blk_config blkcfg;
 +
 +memcpy(&blkcfg, config, sizeof(blkcfg));
 +bdrv_set_enable_write_cache(s->bs, blkcfg.wce != 0);
 +}

We need to call bdrv_flush() here when turning WCE off. And it seems we
don't have a way to signal failure, or may we just leave the bit unchanged?
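One way to handle both points, sketched with hypothetical stand-ins rather than the real QEMU block layer: flush before turning the cache off, and if the flush fails, leave the write cache enabled so that the guest, which reads the field back through virtio_blk_update_config(), sees the bit unchanged. struct fake_bdrv, fake_flush() and set_wce() are invented for illustration; in QEMU the corresponding calls would be bdrv_flush() and bdrv_set_enable_write_cache().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a BlockDriverState. */
struct fake_bdrv {
    bool write_cache;   /* what bdrv_enable_write_cache() would report */
    int dirty;          /* writes sitting in the volatile cache */
    bool flush_fails;   /* simulate an I/O error on flush */
};

/* Stand-in for bdrv_flush(): 0 on success, negative errno on error. */
static int fake_flush(struct fake_bdrv *bs)
{
    if (bs->flush_fails)
        return -EIO;
    bs->dirty = 0;
    return 0;
}

/* Sketch of a set_config handler for the wce byte. */
static void set_wce(struct fake_bdrv *bs, unsigned char wce)
{
    if (!wce && bs->write_cache) {
        /* Writeback -> writethrough: data already acknowledged to the
         * guest must reach stable storage before the switch. */
        if (fake_flush(bs) < 0)
            return;   /* keep writeback; the guest sees the bit unchanged */
    }
    bs->write_cache = wce != 0;
}
```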

 @@ -49,6 +50,7 @@ struct virtio_blk_config
  uint8_t alignment_offset;
  uint16_t min_io_size;
  uint32_t opt_io_size;
 +uint8_t wce;
  } QEMU_PACKED;

If the spec isn't set in stone yet, we could make it a flags field
instead of using a whole byte for a single flag.
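If the field became a flags byte, the write-cache setting would be one bit among several. A sketch of what that might look like; VIRTIO_BLK_CFG_F_WCE and the helpers are hypothetical names, not part of any spec or patch in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_CFG_F_WCE  (1u << 0)   /* hypothetical: writeback cache on */
/* bits 1..7 would stay reserved for future per-device flags */

static bool cfg_wce(uint8_t flags)
{
    return (flags & VIRTIO_BLK_CFG_F_WCE) != 0;
}

/* Change only the WCE bit, preserving any other flags. */
static uint8_t cfg_set_wce(uint8_t flags, bool on)
{
    return on ? (uint8_t)(flags | VIRTIO_BLK_CFG_F_WCE)
              : (uint8_t)(flags & ~VIRTIO_BLK_CFG_F_WCE);
}
```

A driver writing the byte back would then need a read-modify-write, so that flags it does not know about are not clobbered.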

Kevin

Re: [Android-virt] [PATCH v9 16/16] ARM: KVM: Guest wait-for-interrupts (WFI) support

2012-07-03 Thread Peter Maydell

On 3 July 2012 14:24, Avi Kivity a...@redhat.com wrote:
 On 07/03/2012 04:14 PM, Peter Maydell wrote:
 You could just always wake the cpu when migrating: the
 architecture allows WFI to return early for any reason
 it likes including implementation convenience.

 Seems reasonable.

 I imagine wfi works with interrupts disabled, unlike the x86 silliness?

Not sure exactly which bit of x86 silliness you're referring
to, but WFI will wake up regardless of the interrupt mask
bits in the CPSR. (If you've disabled interrupts in the GIC
that's your own bad lookout I guess.)

-- PMM

Re: [PATCH 2/2] virtio-blk: disable write cache if not negotiated

2012-07-03 Thread Kevin Wolf

On 03.07.2012 15:20, Paolo Bonzini wrote:
 If the guest does not support flushes, we should run in writethrough mode.
 The setting is temporary until the next reset, so that for example the
 BIOS will run in writethrough mode while Linux will run with a writeback
 cache.
 
 VIRTIO_BLK_F_FLUSH was introduced in Linux 2.6.32 (in 2009) and
 was backported to RHEL/CentOS 5.6 (in 2010).  The Windows drivers have
 two bugs, which I reported on the Red Hat Bugzilla as bugs 837321 and
 837324.  With these patches they will suffer a performance hit but
 gain correctness.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

I generally like the idea for a default, but doesn't this override even
an explicit cache=writeback? Are we sure that we want this?

Kevin

Re: [PATCH 2/2] virtio-blk: disable write cache if not negotiated

2012-07-03 Thread Paolo Bonzini

On 03/07/2012 15:49, Kevin Wolf wrote:
 If the guest does not support flushes, we should run in writethrough mode.
  The setting is temporary until the next reset, so that for example the
  BIOS will run in writethrough mode while Linux will run with a writeback
  cache.
  
  VIRTIO_BLK_F_FLUSH was introduced in Linux 2.6.32 (in 2009) and
  was backported to RHEL/CentOS 5.6 (in 2010).  The Windows drivers have
  two bugs, which I reported on the Red Hat Bugzilla as bugs 837321 and
  837324.  With these patches they will suffer a performance hit but
  gain correctness.
  
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 I generally like the idea for a default, but doesn't this override even
 an explicit cache=writeback?

Yes.  It doesn't override cache=unsafe though.

 Are we sure that we want this?

The idea is that this change will overcome Anthony's objections to
switching the default to writeback...
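The policy discussed in this thread can be written as a small decision function. This is a sketch of the rule as stated here (writethrough unless the guest acks flush support, with cache=unsafe exempt), not the patch itself; enum cache_mode is a hypothetical subset of QEMU's cache= modes.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of host cache modes relevant to this discussion. */
enum cache_mode { CACHE_WRITETHROUGH, CACHE_WRITEBACK, CACHE_UNSAFE };

/*
 * Should the device run with a writeback cache after feature negotiation?
 * A guest that did not negotiate flush support gets writethrough even with
 * cache=writeback; cache=unsafe ignores flushes anyway, so it is left alone.
 * Recomputed at every reset, so firmware and the OS booted later can end
 * up with different settings.
 */
static bool effective_writeback(enum cache_mode mode, bool guest_acked_flush)
{
    if (mode == CACHE_UNSAFE)
        return true;
    if (mode == CACHE_WRITETHROUGH)
        return false;
    return guest_acked_flush;  /* CACHE_WRITEBACK */
}
```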

Paolo

Re: [Android-virt] [PATCH v9 00/16] KVM/ARM Implementation

2012-07-03 Thread Peter Maydell

On 3 July 2012 14:29, Avi Kivity a...@redhat.com wrote:
 On 07/03/2012 11:59 AM, Christoffer Dall wrote:
 Additionally a few major milestones are coming up shortly:
  - Support Thumb MMIO emulation and test MMIO emulation code (under way)
  - Merge Marc Zyngier's patch series for VGIC and timers (review in progress)

 Does it make sense to keep the pre-VGIC interrupt stuff in then?  It
 would be better to support just one setup, but kernel-VGIC is of course
 less flexible than user space.

We've gone back and forth on this one, I think. For an A15 guest
we want to say that userspace GIC isn't permitted (i.e. you must use
the in-kernel-irqchip). For an A9 guest you would have to use the
userspace GIC, but on the other hand we don't currently actually
support an A9 guest... It seems a little unfortunate to remove
code that is actually working but I could certainly see the
cleanliness argument for dropping it and letting anybody who
wanted an A9 guest put it back later.

-- PMM

Re: [PATCH 1/2] virtio-blk: support VIRTIO_BLK_F_CONFIG_WCE

2012-07-03 Thread Paolo Bonzini

On 03/07/2012 15:46, Kevin Wolf wrote:
  Introduce a new feature bit and configuration field that provide
  support for toggling the cache mode between writethrough and writeback.
  
  Also rename VIRTIO_BLK_F_WCACHE to VIRTIO_BLK_F_WCE for consistency with
  the spec.
 My spec (and my kernel as well) calls it VIRTIO_BLK_F_FLUSH.
 
 What's the status of the kernel and spec side of the change?

Both posted.  The spec patch that introduces VIRTIO_BLK_F_CONFIG_WCE
also renames VIRTIO_BLK_F_FLUSH to VIRTIO_BLK_F_WCE, since that's really what it does.
See this old comment in the kernel (not in the latest git anymore):

   /*
* If the FLUSH feature is supported we do have support for
* flushing a volatile write cache on the host.  [...]
* otherwise, we must assume that the host does not
* perform any kind of volatile write caching.
*/

Paolo

Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-07-03 Thread Asias He


On 07/03/2012 09:31 PM, Paolo Bonzini wrote:

On 02/07/2012 08:41, Rusty Russell wrote:

With the same workload in the guest, the guest fires 200K requests to the host
with merges enabled in the guest (echo 0 > /sys/block/vdb/queue/nomerges),
while it fires 40000K requests to the host with merges disabled in the
guest (echo 2 > /sys/block/vdb/queue/nomerges). This shows that merging
in the block layer greatly reduces the total number of requests fired to
the host (40000K / 200K = 20).


40000 / 200 is 200, not 20. :)


Crap, I wrote one zero too many here. Actually, it is 4000K. So the factor
is still 4000K/200K = 20 ;-)
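For reference, the merge factor here is the ratio of host requests without guest merging to host requests with it; the one-zero slip changes it by a factor of ten:

```latex
\frac{4000\,\mathrm{K}}{200\,\mathrm{K}} = 20,
\qquad
\frac{40000\,\mathrm{K}}{200\,\mathrm{K}} = 200
```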


--
Asias



Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-07-03 Thread Ronen Hod


On 06/18/2012 02:14 PM, Dor Laor wrote:

On 06/18/2012 01:05 PM, Rusty Russell wrote:

On Mon, 18 Jun 2012 16:03:23 +0800, Asias He as...@redhat.com wrote:

On 06/18/2012 03:46 PM, Rusty Russell wrote:

On Mon, 18 Jun 2012 14:53:10 +0800, Asias He as...@redhat.com wrote:

This patch introduces bio-based IO path for virtio-blk.


Why make it optional?


The request-based IO path is useful for users who do not want to bypass the
IO scheduler in the guest kernel, e.g. users with spinning disks. Users with
fast disk devices, e.g. SSDs, can use the bio-based IO path.


Users using a spinning disk still get IO scheduling in the host though.
What benefit is there in doing it in the guest as well?


The IO scheduler waits for requests to merge and thus batches IOs together. That
matters little for spinning disks, since the host can do it too, but it causes
far fewer vmexits, which is the key issue for VMs.
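To illustrate why merging cuts exits, the sketch below back-merges contiguous requests the way an elevator would before dispatch. It is a simplified model: struct io_req and merge_contiguous() are hypothetical, and it assumes roughly one host notification per dispatched request, which virtio notification suppression can reduce further.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request: a run of contiguous sectors. */
struct io_req {
    uint64_t sector;  /* first sector */
    uint32_t nr;      /* number of sectors */
};

/*
 * Back-merge adjacent requests in submission order. Works in place on
 * reqs[0..n) and returns how many requests remain, i.e. roughly how many
 * times the host would need to be notified.
 */
static size_t merge_contiguous(struct io_req *reqs, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (out > 0 &&
            reqs[out - 1].sector + reqs[out - 1].nr == reqs[i].sector) {
            reqs[out - 1].nr += reqs[i].nr;  /* extend the previous request */
        } else {
            reqs[out++] = reqs[i];           /* start a new request */
        }
    }
    return out;
}
```

Five 8-sector writes where the first four are back to back collapse into two requests.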


Does it make sense to use the guest's I/O scheduler at all?
- It is not aware of the physical (spinning) disk layout.
- It is not aware of all the host's disk pending requests.
It does have a good side-effect - batching of requests.

Ronen.





Cheers,
Rusty.
___
Virtualization mailing list
virtualizat...@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization





Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-07-03 Thread Dor Laor


On 07/03/2012 05:22 PM, Ronen Hod wrote:

On 06/18/2012 02:14 PM, Dor Laor wrote:

On 06/18/2012 01:05 PM, Rusty Russell wrote:

On Mon, 18 Jun 2012 16:03:23 +0800, Asias He as...@redhat.com wrote:

On 06/18/2012 03:46 PM, Rusty Russell wrote:

On Mon, 18 Jun 2012 14:53:10 +0800, Asias He as...@redhat.com wrote:

This patch introduces bio-based IO path for virtio-blk.


Why make it optional?


The request-based IO path is useful for users who do not want to bypass the
IO scheduler in the guest kernel, e.g. users with spinning disks. Users with
fast disk devices, e.g. SSDs, can use the bio-based IO path.


Users using a spinning disk still get IO scheduling in the host though.
What benefit is there in doing it in the guest as well?


The IO scheduler waits for requests to merge and thus batches IOs
together. That matters little for spinning disks, since the host can
do it too, but it causes far fewer vmexits, which is the key issue for VMs.


Does it make sense to use the guest's I/O scheduler at all?


That's the reason we have a noop io scheduler.
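Selecting the noop scheduler for a guest disk is normally done through sysfs; the helper below is the C equivalent of "echo noop > /sys/block/vdb/queue/scheduler" (vdb is the device from the earlier test setup). set_io_scheduler() is a hypothetical helper; on newer blk-mq kernels the equivalent choice is named "none" rather than "noop".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a scheduler name into a block queue's sysfs attribute, e.g.
 * /sys/block/vdb/queue/scheduler. Returns 0 on success, -1 on failure.
 */
static int set_io_scheduler(const char *attr_path, const char *name)
{
    FILE *f = fopen(attr_path, "w");

    if (!f)
        return -1;
    if (fputs(name, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* sysfs may only report an invalid name when the write is flushed */
    return fclose(f) == 0 ? 0 : -1;
}
```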


- It is not aware of the physical (spinning) disk layout.
- It is not aware of all the host's disk pending requests.
It does have a good side-effect - batching of requests.

Ronen.





Cheers,
Rusty.








Re: [ANNOUNCE] qemu-kvm-1.1.0

2012-07-03 Thread Marcelo Tosatti

On Tue, Jul 03, 2012 at 12:13:19PM +0400, Michael Tokarev wrote:
 On 03.07.2012 03:32, Marcelo Tosatti wrote:
  
  qemu-kvm-1.1.0 is now available. This release is based on the upstream
  qemu 1.1.0, plus kvm-specific enhancements. Please see the
  original QEMU 1.1.0 release announcement [1] for details.
 
 Why haven't the recent fixes from Jan been applied? I mean these:
 
 http://www.spinics.net/lists/kvm/msg75076.html
 http://www.spinics.net/lists/kvm/msg75074.html
 
 Thanks,
 
 /mjt

qemu-kvm-1.1.0 was tagged on Saturday. The fixes are in the stable-1.1 branch now.


RE: [PATCH v11 4/8] KVM: PPC: Add support for ePAPR idle hcall in host kernel

2012-07-03 Thread Yoder Stuart-B08248



 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf
 Sent: Monday, July 02, 2012 7:18 AM
 To: Yoder Stuart-B08248
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org
 Subject: Re: [PATCH v11 4/8] KVM: PPC: Add support for ePAPR idle hcall in host kernel
 
 
 On 22.06.2012, at 22:04, Stuart Yoder wrote:
 
  From: Liu Yu-B13201 yu@freescale.com
 
  And add a new flag definition in kvm_ppc_pvinfo to indicate
  whether the host supports the EV_IDLE hcall.
 
  Signed-off-by: Liu Yu yu@freescale.com
  [stuart.yo...@freescale.com: cleanup,fixes for conditions allowing idle]
  Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
  ---
  -v11:
-added PV info flag definition in api.txt
 
  Documentation/virtual/kvm/api.txt |7 +--
  arch/powerpc/include/asm/Kbuild   |1 +
  arch/powerpc/kvm/powerpc.c|   10 --
  include/linux/kvm.h   |2 ++
  4 files changed, 16 insertions(+), 4 deletions(-)
 
  diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
  index 310fe50..920c3c4 100644
  --- a/Documentation/virtual/kvm/api.txt
  +++ b/Documentation/virtual/kvm/api.txt
  @@ -1190,12 +1190,15 @@ struct kvm_ppc_pvinfo {
  This ioctl fetches PV specific information that need to be passed to the guest
  using the device tree or other means from vm context.
 
  -For now the only implemented piece of information distributed here is an array
  -of 4 instructions that make up a hypercall.
  +The hcall array defines 4 instructions that make up a hypercall.
 
  If any additional field gets added to this structure later on, a bit for that
  additional piece of information will be set in the flags bitmap.
 
  +The flags bitmap is defined as:
  +
  +   /* the host supports the ePAPR idle hcall
  +   #define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
 
  4.48 KVM_ASSIGN_PCI_DEVICE
 
  diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
  index 7e313f1..13d6b7b 100644
  --- a/arch/powerpc/include/asm/Kbuild
  +++ b/arch/powerpc/include/asm/Kbuild
  @@ -34,5 +34,6 @@ header-y += termios.h
  header-y += types.h
  header-y += ucontext.h
  header-y += unistd.h
  +header-y += epapr_hcalls.h
 
  generic-y += rwsem.h
  diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
  index 30cf01c..53d4d45 100644
  --- a/arch/powerpc/kvm/powerpc.c
  +++ b/arch/powerpc/kvm/powerpc.c
  @@ -38,8 +38,7 @@
 
  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
  {
  -   return !(v->arch.shared->msr & MSR_WE) ||
  -  !!(v->arch.pending_exceptions) ||
  +   return !!(v->arch.pending_exceptions) ||
 v->requests;
  }
 
  @@ -86,6 +85,11 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 
  /* Second return value is in r4 */
  break;
  +   case _EV_HCALL_TOKEN(EV_EPAPR_VENDOR_ID, EV_IDLE):
 
 include/asm/epapr_hcalls.h:#define EV_HCALL_TOKEN(hcall_num) _EV_HCALL_TOKEN(EV_EPAPR_VENDOR_ID, hcall_num)
 
 So we're better off using the non-_ version here, no? :)

Yes, will fix that.
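The suggestion rests on the two macros producing the same token for ePAPR-defined hcalls. In the sketch below, only the macro shapes come from this thread (the vendor ID 42 matches HC_VENDOR_KVM's (42 << 16) in the KVM hcall patch); the values EV_EPAPR_VENDOR_ID = 1 and EV_IDLE = 16 are assumptions to check against asm/epapr_hcalls.h.

```c
#include <assert.h>

/* Assumed numeric values; verify against the real header. */
#define EV_EPAPR_VENDOR_ID 1
#define EV_KVM_VENDOR_ID   42
#define EV_IDLE            16

/* Token layout: vendor ID in the upper 16 bits, hcall number in the lower. */
#define _EV_HCALL_TOKEN(id, num) (((id) << 16) | (num))
#define EV_HCALL_TOKEN(hcall_num) _EV_HCALL_TOKEN(EV_EPAPR_VENDOR_ID, hcall_num)
#define KVM_HCALL_TOKEN(num)      _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num)
```

Using the non-underscore form keeps the vendor ID in one place, so a case label such as EV_HCALL_TOKEN(EV_IDLE) cannot accidentally pair the ePAPR hcall number with another vendor's ID.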

Stuart


Re: [PATCH v7 00/10] KVM: MMU: fast page fault

2012-07-03 Thread Marcelo Tosatti

On Wed, Jun 20, 2012 at 03:56:29PM +0800, Xiao Guangrong wrote:
 Changelog:
 - always atomically update the spte if it can be updated outside of mmu-lock
 - rename spte_can_be_writable() to spte_is_locklessly_modifiable()
 - cleanup and comment spte_write_protect()
 
 Performance result:
 (The benchmark can be found at: 
 http://www.spinics.net/lists/kvm/msg73011.html)
 
                          before         after
 Run 10 times, Avg time:  538233957 ns.  249809853 ns. +53.6%

Looks fine to me. 
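The +53.6% figure is the reduction in average run time relative to the "before" number, which checks out; equivalently, the new code runs about 2.15 times as fast on this benchmark:

```latex
\frac{538233957 - 249809853}{538233957} = \frac{288424104}{538233957} \approx 0.536,
\qquad
\frac{538233957}{249809853} \approx 2.15
```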


RE: [PATCH v11 8/8] PPC: Don't use hardcoded opcode for ePAPR hcall invocation

2012-07-03 Thread Yoder Stuart-B08248

 -Original Message-
 From: Wood Scott-B07421
 Sent: Monday, July 02, 2012 12:25 PM
 To: Alexander Graf
 Cc: Yoder Stuart-B08248; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Tabi Timur-B04825
 Subject: Re: [PATCH v11 8/8] PPC: Don't use hardcoded opcode for ePAPR hcall invocation

 On 07/02/2012 12:17 PM, Alexander Graf wrote:

  On 02.07.2012, at 19:16, Scott Wood wrote:

  On 07/02/2012 12:13 PM, Alexander Graf wrote:

  On 02.07.2012, at 19:10, Scott Wood wrote:

  On 07/02/2012 07:30 AM, Alexander Graf wrote:

  On 22.06.2012, at 22:06, Stuart Yoder wrote:

  From: Liu Yu-B13201 yu@freescale.com

  Signed-off-by: Liu Yu yu@freescale.com
  Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
  ---
  -v11: no changes

  arch/powerpc/include/asm/epapr_hcalls.h |   22 +-
  arch/powerpc/include/asm/fsl_hcalls.h   |   36 +++---
  2 files changed, 29 insertions(+), 29 deletions(-)

  diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h
  index 833ce2c..b8d9445 100644
  --- a/arch/powerpc/include/asm/epapr_hcalls.h
  +++ b/arch/powerpc/include/asm/epapr_hcalls.h
  @@ -195,7 +195,7 @@ static inline unsigned int ev_int_set_config(unsigned int interrupt,
 r5  = priority;
 r6  = destination;

  -  __asm__ __volatile__ ("sc 1"
  +  asm volatile("bl epapr_hypercall_start"
 : "+r" (r11), "+r" (r3), "+r" (r4), "+r" (r5), "+r" (r6)
 : : EV_HCALL_CLOBBERS4

  Hrm. ePAPR hypercalls are allowed to clobber lr, right? But our
  hypercall entry code depends on lr staying alive:

  ePAPR 1.1 says LR is nonvolatile.

  Why is it in the clobber list then?

  Because the inline assembly code is clobbering it -- not the hv-provided
  hcall instructions.

  Only after the change, not before it.

 Hmm.  The comment says, "XER, CTR, and LR are currently listed as
 clobbers because it's uncertain whether they will be clobbered."  Maybe
 it dates back to when the ABI was still being discussed?  Timur, do you
 recall?

 In any case, LR is nonvolatile in the spec and in the implementations I
 know about (KVM and Topaz).

Based on this thread I am going to leave this patch as is.

Stuart

[PATCH v12 7/8] powerpc/fsl-soc: use CONFIG_EPAPR_PARAVIRT for hcalls

2012-07-03 Thread Stuart Yoder

From: Scott Wood scottw...@freescale.com

Signed-off-by: Scott Wood scottw...@freescale.com
Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
---
v12: no changes

 arch/powerpc/sysdev/fsl_msi.c |9 +++--
 arch/powerpc/sysdev/fsl_soc.c |2 ++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 6e097de..7e2b2f2 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -236,7 +236,6 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
u32 intr_index;
u32 have_shift = 0;
struct fsl_msi_cascade_data *cascade_data;
-   unsigned int ret;
 
cascade_data = irq_get_handler_data(irq);
msi_data = cascade_data->msi_data;
@@ -268,7 +267,9 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
case FSL_PIC_IP_IPIC:
msir_value = fsl_msi_read(msi_data->msi_regs, msir_index * 0x4);
break;
-   case FSL_PIC_IP_VMPIC:
+#ifdef CONFIG_EPAPR_PARAVIRT
+   case FSL_PIC_IP_VMPIC: {
+   unsigned int ret;
ret = fh_vmpic_get_msir(virq_to_hw(irq), &msir_value);
if (ret) {
pr_err("fsl-msi: fh_vmpic_get_msir() failed for "
@@ -277,6 +278,8 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
}
break;
}
+#endif
+   }
 
while (msir_value) {
intr_index = ffs(msir_value) - 1;
@@ -508,10 +511,12 @@ static const struct of_device_id fsl_of_msi_ids[] = {
.compatible = "fsl,ipic-msi",
.data = (void *)&ipic_msi_feature,
},
+#ifdef CONFIG_EPAPR_PARAVIRT
{
.compatible = "fsl,vmpic-msi",
.data = (void *)&vmpic_msi_feature,
},
+#endif
{}
 };
 
diff --git a/arch/powerpc/sysdev/fsl_soc.c b/arch/powerpc/sysdev/fsl_soc.c
index c449dbd..97118dc 100644
--- a/arch/powerpc/sysdev/fsl_soc.c
+++ b/arch/powerpc/sysdev/fsl_soc.c
@@ -253,6 +253,7 @@ struct platform_diu_data_ops diu_ops;
 EXPORT_SYMBOL(diu_ops);
 #endif
 
+#ifdef CONFIG_EPAPR_PARAVIRT
 /*
  * Restart the current partition
  *
@@ -278,3 +279,4 @@ void fsl_hv_halt(void)
pr_info("hv exit\n");
fh_partition_stop(-1);
 }
+#endif
-- 
1.7.3.4
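The fsl_msi.c change uses a common pattern: the hcall-only case label and its local variable sit together inside the #ifdef, with braces giving ret its own scope, so a build without the option does not keep an unused declaration (presumably to avoid an unused-variable warning). A self-contained sketch of the same shape; DEMO_EPAPR_PARAVIRT, enum pic_ip and demo_vmpic_get_msir() are hypothetical stand-ins for the kernel symbols.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for CONFIG_EPAPR_PARAVIRT; delete this line to drop the hcall path. */
#define DEMO_EPAPR_PARAVIRT

enum pic_ip { PIC_IP_MPIC, PIC_IP_IPIC, PIC_IP_VMPIC };

#ifdef DEMO_EPAPR_PARAVIRT
/* Hypothetical hcall wrapper: returns 0 on success and stores the MSIR value. */
static unsigned int demo_vmpic_get_msir(unsigned int hwirq, unsigned int *msir)
{
    *msir = hwirq & 0xff;
    return 0;
}
#endif

static unsigned int read_msir(enum pic_ip type, unsigned int hwirq)
{
    unsigned int msir_value = 0;

    switch (type) {
    case PIC_IP_MPIC:
    case PIC_IP_IPIC:
        msir_value = 0x1;  /* placeholder for the MMIO register read */
        break;
#ifdef DEMO_EPAPR_PARAVIRT
    case PIC_IP_VMPIC: {
        /* 'ret' exists only in this block, so it is compiled out with the case */
        unsigned int ret = demo_vmpic_get_msir(hwirq, &msir_value);

        if (ret) {
            fprintf(stderr, "get_msir failed for irq %u (ret=%u)\n", hwirq, ret);
            msir_value = 0;
        }
        break;
    }
#endif
    }
    return msir_value;
}
```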



[PATCH v12 1/8] PPC: epapr: create define for return code value of success

2012-07-03 Thread Stuart Yoder

From: Stuart Yoder stuart.yo...@freescale.com

Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
---
v12: no changes

 arch/powerpc/include/asm/epapr_hcalls.h |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h
index bf2c06c..c0c7adc 100644
--- a/arch/powerpc/include/asm/epapr_hcalls.h
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -88,7 +88,8 @@
 #define _EV_HCALL_TOKEN(id, num) (((id) << 16) | (num))
 #define EV_HCALL_TOKEN(hcall_num) _EV_HCALL_TOKEN(EV_EPAPR_VENDOR_ID, hcall_num)
 
-/* epapr error codes */
+/* epapr return codes */
+#define EV_SUCCESS 0
 #define EV_EPERM   1   /* Operation not permitted */
 #define EV_ENOENT  2   /*  Entry Not Found */
 #define EV_EIO 3   /* I/O error occured */
-- 
1.7.3.4



[PATCH v12 0/8] KVM: PPC: Add ePAPR idle hcall support

2012-07-03 Thread Stuart Yoder

From: Stuart Yoder stuart.yo...@freescale.com

v12 has a couple of updates to address feedback 
   -use new CURRENT_THREAD_INFO macro in epapr_hcalls.S
   -use EV_HCALL_TOKEN to create epapr hcall token

A prerequisite to applying this patch is the patch:
  PPC: use CURRENT_THREAD_INFO instead of open coded assembly

Liu Yu-B13201 (3):
  KVM: PPC: Add support for ePAPR idle hcall in host kernel
  KVM: PPC: ev_idle hcall support for e500 guests
  PPC: Don't use hardcoded opcode for ePAPR hcall invocation

Scott Wood (1):
  powerpc/fsl-soc: use CONFIG_EPAPR_PARAVIRT for hcalls

Stuart Yoder (4):
  PPC: epapr: create define for return code value of success
  KVM: PPC: use definitions in epapr header for hcalls
  KVM: PPC: add pvinfo for hcall opcodes on e500mc/e5500
  PPC: select EPAPR_PARAVIRT for all users of epapr hcalls

 Documentation/virtual/kvm/api.txt   |7 -
 arch/powerpc/include/asm/Kbuild |1 +
 arch/powerpc/include/asm/epapr_hcalls.h |   36 --
 arch/powerpc/include/asm/fsl_hcalls.h   |   36 +++---
 arch/powerpc/include/asm/kvm_para.h |   21 +
 arch/powerpc/kernel/epapr_hcalls.S  |   28 
 arch/powerpc/kernel/epapr_paravirt.c|   11 -
 arch/powerpc/kernel/kvm.c   |2 +-
 arch/powerpc/kvm/powerpc.c  |   30 +++---
 arch/powerpc/platforms/Kconfig  |1 +
 arch/powerpc/sysdev/fsl_msi.c   |9 ++-
 arch/powerpc/sysdev/fsl_soc.c   |2 +
 drivers/tty/Kconfig |1 +
 drivers/virt/Kconfig|1 +
 include/linux/kvm.h |2 +
 15 files changed, 129 insertions(+), 59 deletions(-)

-- 
1.7.3.4



[PATCH v12 8/8] PPC: Don't use hardcoded opcode for ePAPR hcall invocation

2012-07-03 Thread Stuart Yoder

From: Liu Yu-B13201 yu@freescale.com

Signed-off-by: Liu Yu yu@freescale.com
Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
---
v12: no changes

 arch/powerpc/include/asm/epapr_hcalls.h |   22 +-
 arch/powerpc/include/asm/fsl_hcalls.h   |   36 +++---
 2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h
index 833ce2c..b8d9445 100644
--- a/arch/powerpc/include/asm/epapr_hcalls.h
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -195,7 +195,7 @@ static inline unsigned int ev_int_set_config(unsigned int interrupt,
r5  = priority;
r6  = destination;
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "+r" (r3), "+r" (r4), "+r" (r5), "+r" (r6)
: : EV_HCALL_CLOBBERS4
);
@@ -224,7 +224,7 @@ static inline unsigned int ev_int_get_config(unsigned int interrupt,
r11 = EV_HCALL_TOKEN(EV_INT_GET_CONFIG);
r3 = interrupt;
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "+r" (r3), "=r" (r4), "=r" (r5), "=r" (r6)
: : EV_HCALL_CLOBBERS4
);
@@ -254,7 +254,7 @@ static inline unsigned int ev_int_set_mask(unsigned int interrupt,
r3 = interrupt;
r4 = mask;
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "+r" (r3), "+r" (r4)
: : EV_HCALL_CLOBBERS2
);
@@ -279,7 +279,7 @@ static inline unsigned int ev_int_get_mask(unsigned int interrupt,
r11 = EV_HCALL_TOKEN(EV_INT_GET_MASK);
r3 = interrupt;
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "+r" (r3), "=r" (r4)
: : EV_HCALL_CLOBBERS2
);
@@ -307,7 +307,7 @@ static inline unsigned int ev_int_eoi(unsigned int interrupt)
r11 = EV_HCALL_TOKEN(EV_INT_EOI);
r3 = interrupt;
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "+r" (r3)
: : EV_HCALL_CLOBBERS1
);
@@ -346,7 +346,7 @@ static inline unsigned int ev_byte_channel_send(unsigned int handle,
r7 = be32_to_cpu(p[2]);
r8 = be32_to_cpu(p[3]);
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "+r" (r3),
  "+r" (r4), "+r" (r5), "+r" (r6), "+r" (r7), "+r" (r8)
: : EV_HCALL_CLOBBERS6
@@ -385,7 +385,7 @@ static inline unsigned int ev_byte_channel_receive(unsigned int handle,
r3 = handle;
r4 = *count;
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "+r" (r3), "+r" (r4),
  "=r" (r5), "=r" (r6), "=r" (r7), "=r" (r8)
: : EV_HCALL_CLOBBERS6
@@ -423,7 +423,7 @@ static inline unsigned int ev_byte_channel_poll(unsigned int handle,
r11 = EV_HCALL_TOKEN(EV_BYTE_CHANNEL_POLL);
r3 = handle;
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "+r" (r3), "=r" (r4), "=r" (r5)
: : EV_HCALL_CLOBBERS3
);
@@ -456,7 +456,7 @@ static inline unsigned int ev_int_iack(unsigned int handle,
r11 = EV_HCALL_TOKEN(EV_INT_IACK);
r3 = handle;
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "+r" (r3), "=r" (r4)
: : EV_HCALL_CLOBBERS2
);
@@ -480,7 +480,7 @@ static inline unsigned int ev_doorbell_send(unsigned int handle)
r11 = EV_HCALL_TOKEN(EV_DOORBELL_SEND);
r3 = handle;
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "+r" (r3)
: : EV_HCALL_CLOBBERS1
);
@@ -500,7 +500,7 @@ static inline unsigned int ev_idle(void)
 
r11 = EV_HCALL_TOKEN(EV_IDLE);
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "=r" (r3)
: : EV_HCALL_CLOBBERS1
);
diff --git a/arch/powerpc/include/asm/fsl_hcalls.h b/arch/powerpc/include/asm/fsl_hcalls.h
index 922d9b5..3abb583 100644
--- a/arch/powerpc/include/asm/fsl_hcalls.h
+++ b/arch/powerpc/include/asm/fsl_hcalls.h
@@ -96,7 +96,7 @@ static inline unsigned int fh_send_nmi(unsigned int vcpu_mask)
r11 = FH_HCALL_TOKEN(FH_SEND_NMI);
r3 = vcpu_mask;
 
-   __asm__ __volatile__ ("sc 1"
+   asm volatile("bl epapr_hypercall_start"
: "+r" (r11), "+r" (r3)
: : EV_HCALL_CLOBBERS1
);
@@ -151,7 +151,7 @@ static inline unsigned int fh_partition_get_dtprop(int handle,
r9 = (uint32_t)propvalue_addr;
r10 =

[PATCH v12 5/8] KVM: PPC: ev_idle hcall support for e500 guests

2012-07-03 Thread Stuart Yoder

From: Liu Yu-B13201 yu@freescale.com

Signed-off-by: Liu Yu yu@freescale.com
[varun: 64-bit changes]
Signed-off-by: Varun Sethi varun.se...@freescale.com
Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
---
v12: use new CURRENT_THREAD_INFO macro

 arch/powerpc/include/asm/epapr_hcalls.h |   11 ++-
 arch/powerpc/kernel/epapr_hcalls.S  |   28 
 arch/powerpc/kernel/epapr_paravirt.c|   11 ++-
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h
index c0c7adc..833ce2c 100644
--- a/arch/powerpc/include/asm/epapr_hcalls.h
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -50,10 +50,6 @@
 #ifndef _EPAPR_HCALLS_H
 #define _EPAPR_HCALLS_H
 
-#include <linux/types.h>
-#include <linux/errno.h>
-#include <asm/byteorder.h>
-
 #define EV_BYTE_CHANNEL_SEND   1
 #define EV_BYTE_CHANNEL_RECEIVE 2
 #define EV_BYTE_CHANNEL_POLL   3
@@ -109,6 +105,11 @@
 #define EV_UNIMPLEMENTED   12  /* Unimplemented hypercall */
 #define EV_BUFFER_OVERFLOW 13  /* Caller-supplied buffer too small */
 
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <asm/byteorder.h>
+
 /*
  * Hypercall register clobber list
  *
@@ -506,5 +507,5 @@ static inline unsigned int ev_idle(void)
 
return r3;
 }
-
+#endif /* !__ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/kernel/epapr_hcalls.S b/arch/powerpc/kernel/epapr_hcalls.S
index 697b390..62c0dc2 100644
--- a/arch/powerpc/kernel/epapr_hcalls.S
+++ b/arch/powerpc/kernel/epapr_hcalls.S
@@ -8,13 +8,41 @@
  */
 
 #include <linux/threads.h>
+#include <asm/epapr_hcalls.h>
 #include <asm/reg.h>
 #include <asm/page.h>
 #include <asm/cputable.h>
 #include <asm/thread_info.h>
 #include <asm/ppc_asm.h>
+#include <asm/asm-compat.h>
 #include <asm/asm-offsets.h>
 
+/* epapr_ev_idle() was derived from e500_idle() */
+_GLOBAL(epapr_ev_idle)
+   CURRENT_THREAD_INFO(r3, r1)
+   PPC_LL  r4, TI_LOCAL_FLAGS(r3)  /* set napping bit */
+   ori r4, r4,_TLF_NAPPING /* so when we take an exception */
+   PPC_STL r4, TI_LOCAL_FLAGS(r3)  /* it will return to our caller */
+
+   wrteei  1
+
+idle_loop:
+   LOAD_REG_IMMEDIATE(r11, EV_HCALL_TOKEN(EV_IDLE))
+
+.global epapr_ev_idle_start
+epapr_ev_idle_start:
+   li  r3, -1
+   nop
+   nop
+   nop
+
+   /*
+* Guard against spurious wakeups from a hypervisor --
+* only interrupt will cause us to return to LR due to
+* _TLF_NAPPING.
+*/
+   b   idle_loop
+
 /* Hypercall entry point. Will be patched with device tree instructions. */
 .global epapr_hypercall_start
 epapr_hypercall_start:
diff --git a/arch/powerpc/kernel/epapr_paravirt.c b/arch/powerpc/kernel/epapr_paravirt.c
index 028aeae..f3eab85 100644
--- a/arch/powerpc/kernel/epapr_paravirt.c
+++ b/arch/powerpc/kernel/epapr_paravirt.c
@@ -21,6 +21,10 @@
 #include <asm/epapr_hcalls.h>
 #include <asm/cacheflush.h>
 #include <asm/code-patching.h>
+#include <asm/machdep.h>
+
+extern void epapr_ev_idle(void);
+extern u32 epapr_ev_idle_start[];
 
 bool epapr_paravirt_enabled;
 
@@ -41,8 +45,13 @@ static int __init epapr_paravirt_init(void)
if (len % 4 || len > (4 * 4))
return -ENODEV;
 
-   for (i = 0; i < (len / 4); i++)
+   for (i = 0; i < (len / 4); i++) {
patch_instruction(epapr_hypercall_start + i, insts[i]);
+   patch_instruction(epapr_ev_idle_start + i, insts[i]);
+   }
+
+   if (of_get_property(hyper_node, has-idle, NULL))
+   ppc_md.power_save = epapr_ev_idle;
 
epapr_paravirt_enabled = true;
 
-- 
1.7.3.4
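The epapr_paravirt.c hunk validates the hypervisor-provided instruction blob and then writes the same instructions into both patchable stubs. A userspace sketch of that logic under stated assumptions: the two static arrays are hypothetical stand-ins for epapr_hypercall_start and epapr_ev_idle_start, plain stores replace patch_instruction() (which in the kernel also handles icache maintenance), and the blob is assumed to come from the device tree's hcall-instructions property.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HCALL_INSNS 4  /* each stub reserves room for 4 instructions */

/* Hypothetical stand-ins for the two patchable stubs in epapr_hcalls.S. */
static uint32_t hypercall_stub[MAX_HCALL_INSNS];
static uint32_t ev_idle_stub[MAX_HCALL_INSNS];

/*
 * Copy the instruction sequence (len in bytes) into both stubs, with the
 * same bounds check as the patch: whole 32-bit words, at most 4 of them.
 */
static int patch_hcall_stubs(const uint32_t *insts, size_t len)
{
    if (len % 4 || len > 4 * MAX_HCALL_INSNS)
        return -1;  /* the kernel returns -ENODEV here */

    for (size_t i = 0; i < len / 4; i++) {
        hypercall_stub[i] = insts[i];
        ev_idle_stub[i] = insts[i];
    }
    return 0;
}
```

Patching the idle stub unconditionally is harmless: it is only reached once the "has-idle" property has installed epapr_ev_idle as ppc_md.power_save.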



[PATCH v12 2/8] KVM: PPC: use definitions in epapr header for hcalls

2012-07-03 Thread Stuart Yoder

From: Stuart Yoder stuart.yo...@freescale.com

Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
---
v12: no changes

 arch/powerpc/include/asm/kvm_para.h |   21 +++--
 arch/powerpc/kernel/kvm.c   |2 +-
 arch/powerpc/kvm/powerpc.c  |   10 +-
 3 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index c18916b..a168ce3 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -75,9 +75,10 @@ struct kvm_vcpu_arch_shared {
 };
 
 #define KVM_SC_MAGIC_R00x4b564d21 /* KVM! */
-#define HC_VENDOR_KVM  (42 << 16)
-#define HC_EV_SUCCESS  0
-#define HC_EV_UNIMPLEMENTED 12
+
+#define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num)
+
+#include <asm/epapr_hcalls.h>
 
 #define KVM_FEATURE_MAGIC_PAGE 1
 
@@ -121,7 +122,7 @@ static unsigned long kvm_hypercall(unsigned long *in,
   unsigned long *out,
   unsigned long nr)
 {
-   return HC_EV_UNIMPLEMENTED;
+   return EV_UNIMPLEMENTED;
 }
 
 #endif
@@ -132,7 +133,7 @@ static inline long kvm_hypercall0_1(unsigned int nr, unsigned long *r2)
unsigned long out[8];
unsigned long r;
 
-   r = kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   r = kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
*r2 = out[0];
 
return r;
@@ -143,7 +144,7 @@ static inline long kvm_hypercall0(unsigned int nr)
unsigned long in[8];
unsigned long out[8];
 
-   return kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   return kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
 }
 
 static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
@@ -152,7 +153,7 @@ static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
unsigned long out[8];
 
in[0] = p1;
-   return kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   return kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
 }
 
 static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
@@ -163,7 +164,7 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
 
in[0] = p1;
in[1] = p2;
-   return kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   return kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
 }
 
 static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
@@ -175,7 +176,7 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
in[0] = p1;
in[1] = p2;
in[2] = p3;
-   return kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   return kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
 }
 
 static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
@@ -189,7 +190,7 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
in[1] = p2;
in[2] = p3;
in[3] = p4;
-   return kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   return kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
 }
 
 
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 1c13307..5fcc537 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -419,7 +419,7 @@ static void kvm_map_magic_page(void *data)
in[0] = KVM_MAGIC_PAGE;
in[1] = KVM_MAGIC_PAGE;
 
-   kvm_hypercall(in, out, HC_VENDOR_KVM | KVM_HC_PPC_MAP_MAGIC_PAGE);
+   kvm_hypercall(in, out, KVM_HCALL_TOKEN(KVM_HC_PPC_MAP_MAGIC_PAGE));
 
*features = out[0];
 }
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 87f4dc8..a98f7e0 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -67,18 +67,18 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
}
 
switch (nr) {
-   case HC_VENDOR_KVM | KVM_HC_PPC_MAP_MAGIC_PAGE:
+   case KVM_HCALL_TOKEN(KVM_HC_PPC_MAP_MAGIC_PAGE):
{
vcpu->arch.magic_page_pa = param1;
vcpu->arch.magic_page_ea = param2;
 
r2 = KVM_MAGIC_FEAT_SR | KVM_MAGIC_FEAT_MAS0_TO_SPRG7;
 
-   r = HC_EV_SUCCESS;
+   r = EV_SUCCESS;
break;
}
-   case HC_VENDOR_KVM | KVM_HC_FEATURES:
-   r = HC_EV_SUCCESS;
+   case KVM_HCALL_TOKEN(KVM_HC_FEATURES):
+   r = EV_SUCCESS;
 #if defined(CONFIG_PPC_BOOK3S) || defined(CONFIG_KVM_E500V2)
/* XXX Missing magic page on 44x */
r2 |= (1 << KVM_FEATURE_MAGIC_PAGE);
@@ -87,7 +87,7 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
/* Second return value is in r4 */
break;
default:
-   r = HC_EV_UNIMPLEMENTED;
+   r = EV_UNIMPLEMENTED;
break;
}
 
-- 
1.7.3.4


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at
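The conversion in this patch is meant to keep every hypercall number unchanged: `KVM_HCALL_TOKEN(nr)` should equal the old `nr | HC_VENDOR_KVM`. The sketch below checks that equivalence in userspace. It assumes the header's `_EV_HCALL_TOKEN(id, num)` places the vendor ID in the upper 16 bits, as the removed `(42 << 16)` suggests; `EV_HCALL_TOKEN_FOR` is a hypothetical stand-in for that macro.

```c
#include <assert.h>

/* EV_KVM_VENDOR_ID = 42 matches the removed HC_VENDOR_KVM (42 << 16).
 * EV_HCALL_TOKEN_FOR is a hypothetical stand-in for the header's
 * _EV_HCALL_TOKEN, assumed to put the vendor ID in the upper 16 bits. */
#define EV_KVM_VENDOR_ID 42
#define EV_HCALL_TOKEN_FOR(id, num) (((id) << 16) | (num))
#define KVM_HCALL_TOKEN(num) EV_HCALL_TOKEN_FOR(EV_KVM_VENDOR_ID, num)

/* The definition the patch removes. */
#define HC_VENDOR_KVM (42 << 16)

/* Returns 1 if the new and old encodings agree for every 16-bit
 * hypercall number, 0 otherwise. */
static int tokens_match_old_encoding(void)
{
    for (unsigned long nr = 0; nr < 0x10000UL; nr++)
        if (KVM_HCALL_TOKEN(nr) != (nr | HC_VENDOR_KVM))
            return 0;
    return 1;
}
```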

[PATCH v12 4/8] KVM: PPC: Add support for ePAPR idle hcall in host kernel

2012-07-03 Thread Stuart Yoder

From: Liu Yu-B13201 yu@freescale.com

And add a new flag definition in kvm_ppc_pvinfo to indicate
whether the host supports the EV_IDLE hcall.

Signed-off-by: Liu Yu yu@freescale.com
[stuart.yo...@freescale.com: cleanup, fixes for conditions allowing idle]
Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
---
v12: use EV_HCALL_TOKEN macro

 Documentation/virtual/kvm/api.txt |7 +--
 arch/powerpc/include/asm/Kbuild   |1 +
 arch/powerpc/kvm/powerpc.c|   10 --
 include/linux/kvm.h   |2 ++
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 310fe50..920c3c4 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1190,12 +1190,15 @@ struct kvm_ppc_pvinfo {
 This ioctl fetches PV specific information that need to be passed to the guest
 using the device tree or other means from vm context.
 
-For now the only implemented piece of information distributed here is an array
-of 4 instructions that make up a hypercall.
+The hcall array defines 4 instructions that make up a hypercall.
 
 If any additional field gets added to this structure later on, a bit for that
 additional piece of information will be set in the flags bitmap.
 
+The flags bitmap is defined as:
+
+   /* the host supports the ePAPR idle hcall */
+   #define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
 
 4.48 KVM_ASSIGN_PCI_DEVICE
 
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 7e313f1..13d6b7b 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -34,5 +34,6 @@ header-y += termios.h
 header-y += types.h
 header-y += ucontext.h
 header-y += unistd.h
+header-y += epapr_hcalls.h
 
 generic-y += rwsem.h
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 30cf01c..1a4db32 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -38,8 +38,7 @@
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-   return !(v->arch.shared->msr & MSR_WE) ||
-  !!(v->arch.pending_exceptions) ||
+   return !!(v->arch.pending_exceptions) ||
  v->requests;
 }
 
@@ -86,6 +85,11 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 
/* Second return value is in r4 */
break;
+   case EV_HCALL_TOKEN(EV_EPAPR_VENDOR_ID, EV_IDLE):
+   r = EV_SUCCESS;
+   kvm_vcpu_block(vcpu);
+   clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
+   break;
default:
r = EV_UNIMPLEMENTED;
break;
@@ -767,6 +771,8 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
pvinfo->hcall[3] = inst_nop;
 #endif
 
+   pvinfo->flags = KVM_PPC_PVINFO_FLAGS_EV_IDLE;
+
return 0;
 }
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..c03e59e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -473,6 +473,8 @@ struct kvm_ppc_smmu_info {
struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
 };
 
+#define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
+
 #define KVMIO 0xAE
 
 /* machine type bits, to be used as argument to KVM_CREATE_VM */
-- 
1.7.3.4
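The new `flags` bit is meant for consumers of `KVM_PPC_GET_PVINFO`. A minimal userspace sketch of testing it follows. It assumes a hypothetical `struct pvinfo_view` that mirrors only the `flags` and `hcall` fields; a real caller would use `struct kvm_ppc_pvinfo` from `<linux/kvm.h>` together with the ioctl. How the result is passed on to the guest (for instance through the device tree) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Flag value as defined by this patch in include/linux/kvm.h. */
#define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1 << 0)

/* Hypothetical local mirror of the fields of struct kvm_ppc_pvinfo
 * used here; the real struct is filled by the KVM_PPC_GET_PVINFO ioctl. */
struct pvinfo_view {
    uint32_t flags;
    uint32_t hcall[4];
};

/* Advertise the ePAPR idle hcall to the guest only when the host
 * reports support for it in the flags bitmap. */
static int host_supports_ev_idle(const struct pvinfo_view *info)
{
    return (info->flags & KVM_PPC_PVINFO_FLAGS_EV_IDLE) != 0;
}
```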



[PATCH v12 3/8] KVM: PPC: add pvinfo for hcall opcodes on e500mc/e5500

2012-07-03 Thread Stuart Yoder

From: Stuart Yoder stuart.yo...@freescale.com

Signed-off-by: Liu Yu yu@freescale.com
[stuart: factored this out from idle hcall support in host patch]
Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
---
v12: no changes

 arch/powerpc/kvm/powerpc.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a98f7e0..30cf01c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -739,9 +739,16 @@ int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
 
 static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
 {
+   u32 inst_nop = 0x60000000;
+#ifdef CONFIG_KVM_BOOKE_HV
+   u32 inst_sc1 = 0x44000022;
+   pvinfo->hcall[0] = inst_sc1;
+   pvinfo->hcall[1] = inst_nop;
+   pvinfo->hcall[2] = inst_nop;
+   pvinfo->hcall[3] = inst_nop;
+#else
u32 inst_lis = 0x3c000000;
u32 inst_ori = 0x60000000;
-   u32 inst_nop = 0x60000000;
u32 inst_sc = 0x44000002;
u32 inst_imm_mask = 0xffff;
 
@@ -758,6 +765,7 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
pvinfo->hcall[1] = inst_ori | (KVM_SC_MAGIC_R0 & inst_imm_mask);
pvinfo->hcall[2] = inst_sc;
pvinfo->hcall[3] = inst_nop;
+#endif
 
return 0;
 }
-- 
1.7.3.4
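For reference, the two `hcall[]` layouts that `kvm_vm_ioctl_get_pvinfo()` produces can be computed in isolation. This sketch assumes `hcall[0]` is built as `inst_lis | ((KVM_SC_MAGIC_R0 >> 16) & inst_imm_mask)`, since that line falls outside the hunk shown, and it replaces the `CONFIG_KVM_BOOKE_HV` `#ifdef` with a runtime `booke_hv` argument purely so both paths can be exercised.

```c
#include <assert.h>
#include <stdint.h>

#define KVM_SC_MAGIC_R0 0x4b564d21u /* "KVM!" */

/* PowerPC instruction words used when filling pvinfo->hcall[]. */
#define INST_NOP      0x60000000u /* nop (ori r0,r0,0) */
#define INST_LIS      0x3c000000u /* lis r0,0 */
#define INST_ORI      0x60000000u /* ori r0,r0,0 */
#define INST_SC       0x44000002u /* sc */
#define INST_SC1      0x44000022u /* sc 1: hypervisor call on e500mc/e5500 */
#define INST_IMM_MASK 0xffffu     /* 16-bit immediate field */

/* booke_hv != 0: a single "sc 1" padded with nops.
 * Otherwise: load the KVM magic into r0 with lis/ori, then "sc". */
static void fill_hcall(uint32_t hcall[4], int booke_hv)
{
    if (booke_hv) {
        hcall[0] = INST_SC1;
        hcall[1] = INST_NOP;
        hcall[2] = INST_NOP;
        hcall[3] = INST_NOP;
    } else {
        hcall[0] = INST_LIS | ((KVM_SC_MAGIC_R0 >> 16) & INST_IMM_MASK);
        hcall[1] = INST_ORI | (KVM_SC_MAGIC_R0 & INST_IMM_MASK);
        hcall[2] = INST_SC;
        hcall[3] = INST_NOP;
    }
}
```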



[PATCH v12 6/8] PPC: select EPAPR_PARAVIRT for all users of epapr hcalls

2012-07-03 Thread Stuart Yoder

From: Stuart Yoder stuart.yo...@freescale.com

Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
---
v12: no changes

 arch/powerpc/platforms/Kconfig |1 +
 drivers/tty/Kconfig|1 +
 drivers/virt/Kconfig   |1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index e7a896a..48a920d 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -90,6 +90,7 @@ config MPIC
 config PPC_EPAPR_HV_PIC
bool
default n
+   select EPAPR_PARAVIRT
 
 config MPIC_WEIRD
bool
diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index 830cd62..aa99cd2 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -358,6 +358,7 @@ config TRACE_SINK
 config PPC_EPAPR_HV_BYTECHAN
tristate "ePAPR hypervisor byte channel driver"
depends on PPC
+   select EPAPR_PARAVIRT
help
  This driver creates /dev entries for each ePAPR hypervisor byte
  channel, thereby allowing applications to communicate with byte
diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 2dcdbc9..99ebdde 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -15,6 +15,7 @@ if VIRT_DRIVERS
 config FSL_HV_MANAGER
tristate "Freescale hypervisor management driver"
depends on FSL_SOC
+   select EPAPR_PARAVIRT
help
   The Freescale hypervisor management driver provides several services
  to drivers and applications related to the Freescale hypervisor:
-- 
1.7.3.4



Re: [Android-virt] [PATCH v9 16/16] ARM: KVM: Guest wait-for-interrupts (WFI) support

2012-07-03 Thread Avi Kivity

On 07/03/2012 04:49 PM, Peter Maydell wrote:
 On 3 July 2012 14:24, Avi Kivity a...@redhat.com wrote:
 On 07/03/2012 04:14 PM, Peter Maydell wrote:
 You could just always wake the cpu when migrating: the
 architecture allows WFI to return early for any reason
 it likes including implementation convenience.

 Seems reasonable.

 I imagine wfi works with interrupts disabled, unlike the x86 silliness?
 
 Not sure exactly which bit of x86 silliness you're referring
 to, but WFI will wake up regardless of the interrupt mask
 bits in the CPSR. (If you've disabled interrupts in the GIC
 that's your own bad lookout I guess.)


On x86, HLT respects the interrupt enable flag, so to avoid races you
have to use a feature of STI (the enable-interrupts instruction) that only
enables interrupts after the following instruction has executed.  So the
sequence STI; HLT atomically enables interrupts and waits for one (as
long as HLT didn't trigger an exception like a page fault).

The problem is that this interrupt shadow has to be tracked by
virtualization hardware, live migration, instruction emulation, etc.  It
interacts with non-maskable interrupts as well.  A horrible hack.
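Avi's description can be made concrete with a toy model. The sketch below is not KVM or QEMU code: it tracks only IF, the one-instruction STI shadow and a halted flag. In the model, STI; HLT cannot lose a wakeup, while STI; NOP; HLT can, because the interrupt is taken before HLT and the CPU then sleeps with nothing left pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the x86 STI interrupt shadow (not KVM/QEMU code);
 * NMIs, exceptions and all other CPU state are ignored. */
enum toy_insn { TOY_CLI, TOY_STI, TOY_NOP, TOY_HLT };

struct toy_cpu {
    bool if_flag;     /* EFLAGS.IF */
    bool sti_shadow;  /* interrupts blocked until the next insn completes */
    bool halted;
    bool irq_pending;
    int irqs_taken;
};

/* Deliver a pending interrupt at an instruction boundary, if allowed. */
static void toy_check_irq(struct toy_cpu *c)
{
    if (c->irq_pending && c->if_flag && !c->sti_shadow) {
        c->irq_pending = false;
        c->halted = false; /* an interrupt wakes a halted CPU */
        c->irqs_taken++;
    }
}

static void toy_step(struct toy_cpu *c, enum toy_insn insn)
{
    bool shadow_was_active = c->sti_shadow;

    switch (insn) {
    case TOY_CLI:
        c->if_flag = false;
        break;
    case TOY_STI:
        /* The shadow only starts if IF was previously clear. */
        if (!c->if_flag)
            c->sti_shadow = true;
        c->if_flag = true;
        break;
    case TOY_NOP:
        break;
    case TOY_HLT:
        c->halted = true;
        break;
    }
    /* The shadow covers exactly one instruction after STI. */
    if (shadow_was_active)
        c->sti_shadow = false;
    toy_check_irq(c);
}
```

The same shadow bit is what virtualization hardware, live migration and the instruction emulator must carry along, which is the burden Avi describes.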

-- 
error compiling committee.c: too many arguments to function



Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)

2012-07-03 Thread Rafael Aquini

On Tue, Jul 03, 2012 at 10:17:46AM +0930, Rusty Russell wrote:
 On Mon, 2 Jul 2012 13:08:19 -0300, Rafael Aquini aqu...@redhat.com wrote:
  As for 'locking in balloon', may I assume the approach I took for the
  compaction case is OK and aligned to address these concerns of yours?
  If not, do not hesitate to give me your thoughts, please. I'm
  respinning a V3 series to address a couple of extra nitpicks from the
  compaction standpoint, and I'd love to be able to address any extra
  concern you might have on the balloon side of that work.
 
 It's orthogonal, though looks like they clash textually :(
 
 I'll re-spin MST's patch on top of yours, and include both in my tree,
 otherwise linux-next will have to do the merge.  But I'll await your
 push before pushing to Linus in the next merge window.

Thanks, Rusty.

I'll post the V3 series soon.

Cheers!
Rafael
 
 Thanks,
 Rusty.

Re: race condition in qemu-kvm-1.0.1

2012-07-03 Thread Marcelo Tosatti

On Wed, Jun 27, 2012 at 12:35:22PM +0200, Peter Lieven wrote:
 Hi,
 
 We recently came across multiple VMs that hit a race and stop working.
 It seems to happen when the system is at 100% CPU.
 One way to reproduce this is:
 qemu-kvm-1.0.1 with vnc-thread enabled
 
 cmdline (or similar):
 /usr/bin/qemu-kvm-1.0.1 -net
 tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net
 nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive 
 format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native
 -m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor
 tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait
 -name 02-debug-race -boot order=dc,menu=off -cdrom
 /home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile
 /var/run/qemu/vm-221.pid -mem-prealloc -cpu
 host,+x2apic,model_id=Intel(R) Xeon(R) CPU   L5640  @
 2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga
 cirrus

Is it reproducible without vnc thread enabled?

 
 It is important that the attached virtio image contains only zeroes.
 If the system boots from CD, select boot from the first hard disk.
 The hypervisor then hangs at 100% CPU and neither the monitor nor QMP
 is responsive anymore.
 
 I have also seen customers reporting this when a VM is shut down.
 
 If this is connected to the threaded VNC server, it might be
 important to be connected at this time.
 
 Debug backtrace attached.
 
 Thanks,
 Peter
 
 --
 
 (gdb) file /usr/bin/qemu-kvm-1.0.1
 Reading symbols from /usr/bin/qemu-kvm-1.0.1...done.
 (gdb) attach 5145
 Attaching to program: /usr/bin/qemu-kvm-1.0.1, process 5145
 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
 symbols found)...done.
 Loaded symbols for /lib64/ld-linux-x86-64.so.2
 [Thread debugging using libthread_db enabled]
 [New Thread 0x7f54d08b9700 (LWP 5253)]
 [New Thread 0x7f5552757700 (LWP 5152)]
 [New Thread 0x7f5552f58700 (LWP 5151)]
 0x7f5553c6b5a3 in select () from /lib/libc.so.6
 (gdb) info threads
   4 Thread 0x7f5552f58700 (LWP 5151)  0x7f5553c6a747 in ioctl ()
 from /lib/libc.so.6
   3 Thread 0x7f5552757700 (LWP 5152)  0x7f5553c6a747 in ioctl ()
 from /lib/libc.so.6
   2 Thread 0x7f54d08b9700 (LWP 5253)  0x7f5553f1a85c in
 pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
 * 1 Thread 0x7f50d700 (LWP 5145)  0x7f5553c6b5a3 in select
 () from /lib/libc.so.6
 (gdb) thread apply all bt
 
 Thread 4 (Thread 0x7f5552f58700 (LWP 5151)):
 #0  0x7f5553c6a747 in ioctl () from /lib/libc.so.6
 #1  0x7f727830 in kvm_vcpu_ioctl (env=0x7f5557652f10,
 type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101
 #2  0x7f72728a in kvm_cpu_exec (env=0x7f5557652f10) at
 /usr/src/qemu-kvm-1.0.1/kvm-all.c:987
 #3  0x7f6f5c08 in qemu_kvm_cpu_thread_fn
 (arg=0x7f5557652f10) at /usr/src/qemu-kvm-1.0.1/cpus.c:740
 #4  0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
 #5  0x7f5553c72cdd in clone () from /lib/libc.so.6
 #6  0x in ?? ()
 
 Thread 3 (Thread 0x7f5552757700 (LWP 5152)):
 #0  0x7f5553c6a747 in ioctl () from /lib/libc.so.6
 #1  0x7f727830 in kvm_vcpu_ioctl (env=0x7f555766ae60,
 type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101
 #2  0x7f72728a in kvm_cpu_exec (env=0x7f555766ae60) at
 /usr/src/qemu-kvm-1.0.1/kvm-all.c:987
 #3  0x7f6f5c08 in qemu_kvm_cpu_thread_fn
 (arg=0x7f555766ae60) at /usr/src/qemu-kvm-1.0.1/cpus.c:740
 #4  0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
 #5  0x7f5553c72cdd in clone () from /lib/libc.so.6
 #6  0x in ?? ()
 
 Thread 2 (Thread 0x7f54d08b9700 (LWP 5253)):
 #0  0x7f5553f1a85c in pthread_cond_wait@@GLIBC_2.3.2 () from
 /lib/libpthread.so.0
 #1  0x7f679f5d in qemu_cond_wait (cond=0x7f5557ede1e0,
 mutex=0x7f5557ede210) at qemu-thread-posix.c:113
 #2  0x7f6b06a1 in vnc_worker_thread_loop
 (queue=0x7f5557ede1e0) at ui/vnc-jobs-async.c:222
 #3  0x7f6b0b7f in vnc_worker_thread (arg=0x7f5557ede1e0) at
 ui/vnc-jobs-async.c:318
 #4  0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
 #5  0x7f5553c72cdd in clone () from /lib/libc.so.6
 #6  0x in ?? ()
 
 Thread 1 (Thread 0x7f50d700 (LWP 5145)):
 #0  0x7f5553c6b5a3 in select () from /lib/libc.so.6
 #1  0x7f6516be in main_loop_wait (nonblocking=0) at main-loop.c:456
 #2  0x7f647ad0 in main_loop () at /usr/src/qemu-kvm-1.0.1/vl.c:1482
 #3  0x7f64c698 in main (argc=38, argv=0x79d894a8,
 envp=0x79d895e0) at /usr/src/qemu-kvm-1.0.1/vl.c:3523
 (gdb) thread apply all bt full
 
 Thread 4 (Thread 0x7f5552f58700 (LWP 5151)):
 #0  0x7f5553c6a747 in ioctl () from /lib/libc.so.6
 No symbol table info available.
 #1  0x7f727830 in kvm_vcpu_ioctl (env=0x7f5557652f10,
 type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101
 ret = 32597
 arg = 0x0
 ap =

Re: [PATCH v3 02/26] KVM: Split cpuid register access from computation

2012-07-03 Thread Marcelo Tosatti

On Wed, Jun 27, 2012 at 06:24:50PM +0300, Avi Kivity wrote:
 Introduce kvm_cpuid() to perform the leaf limit check and calculate
 register values, and let kvm_emulate_cpuid() just handle reading and
 writing the registers from/to the vcpu.  This allows us to reuse
 kvm_cpuid() in a context where directly reading and writing registers
 is not desired.
 
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  arch/x86/kvm/cpuid.c | 38 --
  arch/x86/kvm/cpuid.h |  1 +
  2 files changed, 25 insertions(+), 14 deletions(-)
 
 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
 index 7df1c6d..44476fb 100644
 --- a/arch/x86/kvm/cpuid.c
 +++ b/arch/x86/kvm/cpuid.c
 @@ -639,33 +639,43 @@ static struct kvm_cpuid_entry2* check_cpuid_limit(struct kvm_vcpu *vcpu,
   return kvm_find_cpuid_entry(vcpu, maxlevel->eax, index);
  }
  
 -void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 +void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
  {
 - u32 function, index;
 + u32 function = *eax, index = *ecx;
   struct kvm_cpuid_entry2 *best;
  
 - function = kvm_register_read(vcpu, VCPU_REGS_RAX);
 - index = kvm_register_read(vcpu, VCPU_REGS_RCX);
 - kvm_register_write(vcpu, VCPU_REGS_RAX, 0);
 - kvm_register_write(vcpu, VCPU_REGS_RBX, 0);
 - kvm_register_write(vcpu, VCPU_REGS_RCX, 0);
 - kvm_register_write(vcpu, VCPU_REGS_RDX, 0);
   best = kvm_find_cpuid_entry(vcpu, function, index);
  
   if (!best)
   best = check_cpuid_limit(vcpu, function, index);
  
   if (best) {
 - kvm_register_write(vcpu, VCPU_REGS_RAX, best->eax);
 - kvm_register_write(vcpu, VCPU_REGS_RBX, best->ebx);
 - kvm_register_write(vcpu, VCPU_REGS_RCX, best->ecx);
 - kvm_register_write(vcpu, VCPU_REGS_RDX, best->edx);
 - }
 - kvm_x86_ops->skip_emulated_instruction(vcpu);
 + *eax = best->eax;
 + *ebx = best->ebx;
 + *ecx = best->ecx;
 + *edx = best->edx;
 + } else
 + *eax = *ebx = *ecx = *edx = 0;
 +
   trace_kvm_cpuid(function,
   kvm_register_read(vcpu, VCPU_REGS_RAX),
   kvm_register_read(vcpu, VCPU_REGS_RBX),
   kvm_register_read(vcpu, VCPU_REGS_RCX),
   kvm_register_read(vcpu, VCPU_REGS_RDX));
  }
 +
 +void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 +{
 + u32 function, eax, ebx, ecx, edx;
 +
 + function = eax = kvm_register_read(vcpu, VCPU_REGS_RAX);
 + ecx = kvm_register_read(vcpu, VCPU_REGS_RCX);
 + kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx);
 + kvm_register_write(vcpu, VCPU_REGS_RAX, eax);
 + kvm_register_write(vcpu, VCPU_REGS_RBX, ebx);
 + kvm_register_write(vcpu, VCPU_REGS_RCX, ecx);
 + kvm_register_write(vcpu, VCPU_REGS_RDX, edx);
 + kvm_x86_ops->skip_emulated_instruction(vcpu);
 + trace_kvm_cpuid(function, eax, ebx, ecx, edx);
 +}

Tracing is duplicated. Is that intended?
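One way to resolve the duplicated tracing is to keep `kvm_cpuid()` free of side effects and trace only in the register-level wrapper. That would also appear to avoid the trace inside `kvm_cpuid()` reading vcpu registers before the wrapper has written the results back. A userspace sketch of that split follows; all types and names are hypothetical stand-ins for the KVM ones, and the `check_cpuid_limit()` fallback is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the KVM types; every name here is hypothetical. */
struct cpuid_entry {
    uint32_t function, index;
    uint32_t eax, ebx, ecx, edx;
};

enum { RAX, RBX, RCX, RDX };

struct toy_vcpu {
    uint32_t regs[4];                /* indexed by RAX..RDX */
    const struct cpuid_entry *table; /* guest CPUID table */
    int nent;
    int trace_count;                 /* number of trace events emitted */
    uint32_t last_traced_function;
};

/* Stand-in for trace_kvm_cpuid(): records that it fired. */
static void toy_trace_cpuid(struct toy_vcpu *v, uint32_t function)
{
    v->trace_count++;
    v->last_traced_function = function;
}

/* Pure computation, like kvm_cpuid() after the patch: function in *eax,
 * index in *ecx, results returned through all four pointers.
 * No register access and no tracing here. */
static void toy_cpuid(const struct toy_vcpu *v, uint32_t *eax, uint32_t *ebx,
                      uint32_t *ecx, uint32_t *edx)
{
    uint32_t function = *eax, index = *ecx;

    for (int i = 0; i < v->nent; i++) {
        const struct cpuid_entry *e = &v->table[i];
        if (e->function == function && e->index == index) {
            *eax = e->eax;
            *ebx = e->ebx;
            *ecx = e->ecx;
            *edx = e->edx;
            return;
        }
    }
    *eax = *ebx = *ecx = *edx = 0;
}

/* Register wrapper, like kvm_emulate_cpuid(): the single place that
 * touches registers and the single place that traces. */
static void toy_emulate_cpuid(struct toy_vcpu *v)
{
    uint32_t function, eax, ebx = 0, ecx, edx = 0;

    function = eax = v->regs[RAX];
    ecx = v->regs[RCX];
    toy_cpuid(v, &eax, &ebx, &ecx, &edx);
    v->regs[RAX] = eax;
    v->regs[RBX] = ebx;
    v->regs[RCX] = ecx;
    v->regs[RDX] = edx;
    toy_trace_cpuid(v, function);
}
```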


Re: [PATCH] apic: fix kvm build on UP without IOAPIC

2012-07-03 Thread Marcelo Tosatti

On Sun, Jul 01, 2012 at 06:05:06PM +0300, Michael S. Tsirkin wrote:
 On UP i386, when APIC is disabled
 # CONFIG_X86_UP_APIC is not set
 # CONFIG_PCI_IOAPIC is not set
 
 Code looking at apicdrivers never has any effect, but it
 still gets compiled in. In particular, this causes
 build failures with kvm, and in general it bloats the kernel
 unnecessarily.
 
 Fix by defining both __apicdrivers and __apicdrivers_end
 to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified
 that as a result any loop scanning __apicdrivers gets optimized out by
 the compiler.
 
 Warning: a .config with apic disabled doesn't seem to boot
 for me (even without this patch). Still investigating why;
 meanwhile this patch is compile-tested only.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Note: if this patch makes sense, can x86 maintainers
 please ACK applying it through the kvm tree, since that is
 where we see the issue that it addresses?
 Avi, Marcelo, maybe you can carry this in kvm/linux-next as a temporary
 measure so that linux-next builds?

Applied, thanks.
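The trick in this patch, making both table bounds NULL so that any scan of the table is empty, can be shown in a few lines of userspace C. The names below are hypothetical (the kernel's bounds are the linker-provided `__apicdrivers` and `__apicdrivers_end`), and the sketch compares with `!=` because equality between two null pointers is well defined in ISO C, while `<` on them is not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver type and table bounds for illustration. */
struct apic_driver {
    const char *name;
    int (*probe)(void);
};

#ifdef TOY_LOCAL_APIC
extern struct apic_driver *apic_drivers_start[], *apic_drivers_end[];
#else
/* Without local APIC support both bounds are NULL, so every loop over
 * the table runs zero times and the compiler can drop it. */
#define apic_drivers_start ((struct apic_driver **)NULL)
#define apic_drivers_end   ((struct apic_driver **)NULL)
#endif

/* Returns the first driver whose probe accepts, or NULL. */
static struct apic_driver *probe_apic_drivers(void)
{
    struct apic_driver **drv;

    for (drv = apic_drivers_start; drv != apic_drivers_end; drv++)
        if ((*drv)->probe && (*drv)->probe())
            return *drv;
    return NULL;
}
```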


Re: [Qemu-devel] plan for device assignment upstream

2012-07-03 Thread Blue Swirl

On Mon, Jul 2, 2012 at 9:43 AM, Avi Kivity a...@redhat.com wrote:
 On 07/02/2012 12:30 PM, Jan Kiszka wrote:
 On 2012-07-02 11:18, Michael S. Tsirkin wrote:
  I've been thinking hard about Jan's patches for device
  assignment. Basically, while I thought it makes sense
  to make all devices (assigned or not) behave the
  same and use the same APIs for injecting irqs, Anthony thinks there is huge
  value in making irq propagation hierarchical, and that device assignment
  should be special-cased.

  In the long term, we will need direct injection, i.e. caching, to allow
  making it lock-less. Stepping through all intermediate layers will cause
  trouble, at least performance-wise, when having to take and drop a lock
  at each stop.

 So we precalculate everything beforehand.  Instead of each qemu_irq
 triggering a callback, calculating the next hop and firing the next
 qemu_irq, configure each qemu_irq array with a function that describes
 how to take the next hop.  Whenever the configuration changes,
 recalculate all routes.

Yes, we had this discussion last year when I proposed the IRQ matrix:
http://lists.nongnu.org/archive/html/qemu-devel/2011-09/msg00474.html

One problem with the matrix is that it only works for enable/disable
level, not for more complex situations like boolean logic or
multiplexed outputs.

Perhaps the devices should describe the currently valid logic with a
packet-filter-type mechanism? I think that could scale arbitrarily, and
it could even be friendlier as a kernel interface.


 For device assignment or vhost, we can have a qemu_irq_irqfd() which
 converts a qemu_irq to an eventfd.  If the route calculations determine
 that it can be serviced via a real irqfd, they also configure it as an
 irqfd.  Otherwise qemu configures a poll on this eventfd and calls the
 callback when needed.
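A rough sketch of the precomputed routing described above: each (node, pin) records its next hop, `recompute_routes()` walks every chain once after a configuration change, and delivery becomes a single lookup. This is a hypothetical table-driven model, not QEMU's `qemu_irq` API. Like the IRQ matrix, it only covers plain level forwarding, not the boolean-logic or multiplexed cases mentioned in the reply.

```c
#include <assert.h>

/* Toy routing graph: each (node, pin) forwards to another (node, pin)
 * or terminates at a GSI. Hypothetical model, not QEMU code. */
#define NODES 4
#define PINS  4

struct hop {
    int next_node; /* -1: terminal hop */
    int next_pin;  /* next pin, or final GSI if terminal (-1: unrouted) */
};

struct router {
    struct hop hops[NODES][PINS];
    int route[NODES][PINS]; /* cached final GSI, -1 if unrouted/looping */
};

static void router_init(struct router *r)
{
    for (int n = 0; n < NODES; n++)
        for (int p = 0; p < PINS; p++) {
            r->hops[n][p].next_node = -1;
            r->hops[n][p].next_pin = -1;
            r->route[n][p] = -1;
        }
}

/* Slow path: follow the chain; more than NODES * PINS hops means a cycle. */
static int walk(const struct router *r, int node, int pin)
{
    for (int steps = 0; steps <= NODES * PINS; steps++) {
        struct hop h = r->hops[node][pin];
        if (h.next_node < 0)
            return h.next_pin;
        node = h.next_node;
        pin = h.next_pin;
    }
    return -1;
}

/* Run whenever the routing configuration changes. */
static void recompute_routes(struct router *r)
{
    for (int n = 0; n < NODES; n++)
        for (int p = 0; p < PINS; p++)
            r->route[n][p] = walk(r, n, p);
}

/* Fast path: one table lookup, no per-hop callbacks or locks. */
static int deliver_gsi(const struct router *r, int node, int pin)
{
    return r->route[node][pin];
}
```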


 --
 error compiling committee.c: too many arguments to function




Re: [PATCH 1/6] file_ram_alloc(): coding style fixes

2012-07-03 Thread Blue Swirl

On Mon, Jul 2, 2012 at 6:06 PM, Eduardo Habkost ehabk...@redhat.com wrote:
 Cc: Blue Swirl blauwir...@gmail.com
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com

Acked-by: Blue Swirl blauwir...@gmail.com

 ---
  exec.c |5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

 diff --git a/exec.c b/exec.c
 index 8244d54..c8bfd27 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -2392,7 +2392,7 @@ static void *file_ram_alloc(RAMBlock *block,
  unlink(filename);
  free(filename);

 -memory = (memory+hpagesize-1) & ~(hpagesize-1);
 +memory = (memory + hpagesize - 1) & ~(hpagesize - 1);

  /*
   * ftruncate is not supported by hugetlbfs in older
 @@ -2400,8 +2400,9 @@ static void *file_ram_alloc(RAMBlock *block,
   * If anything goes wrong with it under other filesystems,
   * mmap will fail.
   */
 -if (ftruncate(fd, memory))
 +if (ftruncate(fd, memory)) {
  perror("ftruncate");
 +}

  #ifdef MAP_POPULATE
  /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
 --
 1.7.10.4
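The rounding expression touched by this patch rounds `memory` up to a multiple of `hpagesize` by adding `hpagesize - 1` and then clearing the low bits. A standalone sketch follows, using `uint64_t` in place of `ram_addr_t`. The mask trick is only valid when the alignment is a power of two, as huge page sizes are.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to a multiple of align; align must be a power of two.
 * Same expression as in file_ram_alloc(). */
static uint64_t round_up_pow2(uint64_t size, uint64_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```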


Re: [PATCH 2/6] file_ram_alloc(): use g_strdup_printf() instead of asprintf()

2012-07-03 Thread Blue Swirl

On Mon, Jul 2, 2012 at 6:06 PM, Eduardo Habkost ehabk...@redhat.com wrote:
 Cc: Blue Swirl blauwir...@gmail.com
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com

Acked-by: Blue Swirl blauwir...@gmail.com

 ---
  exec.c |   14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

 diff --git a/exec.c b/exec.c
 index c8bfd27..d856325 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -24,6 +24,9 @@
  #include <sys/mman.h>
  #endif

 +#include <glib.h>
 +#include <glib/gprintf.h>
 +
  #include "qemu-common.h"
  #include "cpu.h"
  #include "tcg.h"
 @@ -2357,7 +2360,7 @@ static void *file_ram_alloc(RAMBlock *block,
  ram_addr_t memory,
  const char *path)
  {
 -char *filename;
 +gchar *filename;
  void *area;
  int fd;
  #ifdef MAP_POPULATE
 @@ -2379,18 +2382,15 @@ static void *file_ram_alloc(RAMBlock *block,
  return NULL;
  }

 -if (asprintf(&filename, "%s/qemu_back_mem.XX", path) == -1) {
 -return NULL;
 -}
 -
 +filename = g_strdup_printf("%s/qemu_back_mem.XX", path);
  fd = mkstemp(filename);
  if (fd < 0) {
  perror("unable to create backing store for hugepages");
 -free(filename);
 +g_free(filename);
  return NULL;
  }
  unlink(filename);
 -free(filename);
 +g_free(filename);

  memory = (memory + hpagesize - 1) & ~(hpagesize - 1);

 --
 1.7.10.4
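The create-then-unlink idiom used in `file_ram_alloc()` can be sketched in plain POSIX C. This version uses `snprintf()`/`malloc()` where the patch uses `g_strdup_printf()`/`g_free()`, so it builds without GLib; `create_unlinked_backing_file` is a hypothetical name. `mkstemp()` requires the template to end in six `X` characters.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a uniquely named file in dir, unlink its name immediately and
 * return the open fd (or -1 on error). */
static int create_unlinked_backing_file(const char *dir)
{
    const char *fmt = "%s/qemu_back_mem.XXXXXX";
    int n = snprintf(NULL, 0, fmt, dir);
    char *filename;
    int fd;

    if (n < 0)
        return -1;
    filename = malloc((size_t)n + 1);
    if (!filename)
        return -1;
    snprintf(filename, (size_t)n + 1, fmt, dir);

    fd = mkstemp(filename); /* replaces the XXXXXX and opens the file */
    if (fd < 0) {
        perror("unable to create backing store");
        free(filename);
        return -1;
    }
    unlink(filename); /* storage lives on only through fd and mappings */
    free(filename);
    return fd;
}
```

Because the name is removed at once, the storage is reclaimed automatically when the last descriptor and mapping go away.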


[PATCH v3 1/2] kvm: Extend irqfd to support level interrupts

2012-07-03 Thread Alex Williamson

In order to inject a level interrupt from an external source using an
irqfd, we need to allocate a new irq_source_id.  This allows us to
assert and (later) de-assert an interrupt line independently from
users of KVM_IRQ_LINE and avoid lost interrupts.

We also add what may appear to be a bit of excessive infrastructure
around an object for storing this irq_source_id.  However, notice
that we only provide a way to assert the interrupt here.  A follow-on
interface will make use of the same irq_source_id to allow de-assert.
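To see why a dedicated irq_source_id matters, consider a simplified model in which each interrupt line keeps one bit per source ID and stays asserted while any bit is set. If an irqfd shared its source ID with a `KVM_IRQ_LINE` user, one side's de-assert would clear the other's assertion; with separate IDs each source controls only its own bit. The names below are hypothetical and this is not KVM's actual irqchip code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified level-line model: one bit per irq source ID, line level
 * is the logical OR of all sources. Hypothetical names. */
#define MAX_SOURCES 32

struct level_line {
    unsigned long asserted_by; /* bit n set: source n asserts the line */
};

static void line_set(struct level_line *l, int source_id, bool level)
{
    assert(source_id >= 0 && source_id < MAX_SOURCES);
    if (level)
        l->asserted_by |= 1UL << source_id;
    else
        l->asserted_by &= ~(1UL << source_id);
}

static bool line_level(const struct level_line *l)
{
    return l->asserted_by != 0;
}
```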

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Documentation/virtual/kvm/api.txt |6 ++
 arch/x86/kvm/x86.c|1 
 include/linux/kvm.h   |3 +
 virt/kvm/eventfd.c|  103 -
 4 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 100acde..c7267d5 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The irqfd is removed using
 the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
 and kvm_irqfd.gsi.
 
+The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
+triggered interrupt.  In this case a new irqchip input is allocated
+which is logically OR'd with other inputs allowing multiple sources
+to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
+is only necessary on setup, teardown is identical to that above.
+KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
 
 5. The kvm_run structure
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a01a424..80bed07 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_GET_TSC_KHZ:
case KVM_CAP_PCI_2_3:
case KVM_CAP_KVMCLOCK_CTRL:
+   case KVM_CAP_IRQFD_LEVEL:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..b2e6e4f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#define KVM_CAP_IRQFD_LEVEL 81
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
 #endif
 
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+/* Available with KVM_CAP_IRQFD_LEVEL */
+#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
 
 struct kvm_irqfd {
__u32 fd;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 7d7e2aa..92aa5ba 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -36,6 +36,64 @@
 #include "iodev.h"
 
 /*
+ * An irq_source_id can be created from KVM_IRQFD for level interrupt
+ * injections and shared with other interfaces for EOI or de-assert.
+ * Create an object with reference counting to make it easy to use.
+ */
+struct _irq_source {
+   int id; /* the IRQ source ID */
+   struct kvm *kvm;
+   struct kref kref;
+};
+
+static void _irq_source_release(struct kref *kref)
+{
+   struct _irq_source *source;
+
+   source = container_of(kref, struct _irq_source, kref);
+
+   kvm_free_irq_source_id(source->kvm, source->id);
+   kfree(source);
+}
+
+static void _irq_source_put(struct _irq_source *source)
+{
+   if (source)
+   kref_put(&source->kref, _irq_source_release);
+}
+
+static struct _irq_source *__attribute__ ((used)) /* white lie for now */
+_irq_source_get(struct _irq_source *source)
+{
+   if (source)
+   kref_get(&source->kref);
+
+   return source;
+}
+
+static struct _irq_source *_irq_source_alloc(struct kvm *kvm)
+{
+   struct _irq_source *source;
+   int id;
+
+   source = kzalloc(sizeof(*source), GFP_KERNEL);
+   if (!source)
+   return ERR_PTR(-ENOMEM);
+
+   id = kvm_request_irq_source_id(kvm);
+   if (id < 0) {
+   kfree(source);
+   return ERR_PTR(id);
+   }
+
+   kref_init(&source->kref);
+   source->kvm = kvm;
+   source->id = id;
+
+   return source;
+}
+
+/*
  * 
  * irqfd: Allows an fd to be used to inject an interrupt to the guest
  *
@@ -52,6 +110,8 @@ struct _irqfd {
/* Used for level IRQ fast-path */
int gsi;
struct work_struct inject;
+   /* IRQ source ID for level triggered irqfds */
+   struct _irq_source *source;
/* Used for setup/shutdown */
struct eventfd_ctx *eventfd;
struct list_head list;
@@ -62,7 +122,7 @@ struct _irqfd {
 static struct workqueue_struct *irqfd_cleanup_wq;
 
 static void
-irqfd_inject(struct work_struct *work)
+irqfd_inject_edge(struct work_struct *work)
 {
struct _irqfd *irqfd = container_of(work,

[PATCH v3 2/2] kvm: KVM_EOIFD, an eventfd for EOIs

2012-07-03 Thread Alex Williamson

This new ioctl enables an eventfd to be triggered when an EOI is
written for a specified irqchip pin.  By default this is a simple
notification, but we can also tie the eoifd to a level irqfd, which
enables the irqchip pin to be automatically de-asserted on EOI.
This mode is particularly useful for device-assignment applications
where the unmask and notify triggers a hardware unmask.  The default
mode is most applicable to simple notify with no side-effects for
userspace usage, such as Qemu.
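A userspace caller would fill `struct kvm_eoifd` as proposed in this patch and pass it to the `KVM_EOIFD` vm ioctl. The sketch only builds the argument for the level-irqfd mode. It uses a local mirror of the struct from the diff (renamed `kvm_eoifd_proposed` to avoid clashing with any installed header), and the eventfd and irqfd numbers in the usage are placeholders.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the layout and flags proposed in this patch. */
#define KVM_EOIFD_FLAG_DEASSIGN    (1 << 0)
#define KVM_EOIFD_FLAG_LEVEL_IRQFD (1 << 1)

struct kvm_eoifd_proposed {
    uint32_t fd;    /* eventfd signalled on EOI */
    uint32_t gsi;   /* irqchip pin */
    uint32_t flags;
    uint32_t irqfd; /* level irqfd to de-assert, with LEVEL_IRQFD */
    uint8_t  pad[16];
};

/* Describe an eoifd that de-asserts the given level irqfd on EOI. */
static struct kvm_eoifd_proposed make_level_eoifd(int eoi_eventfd,
                                                  uint32_t gsi,
                                                  int level_irqfd)
{
    struct kvm_eoifd_proposed e;

    memset(&e, 0, sizeof(e)); /* zero the padding */
    e.fd = (uint32_t)eoi_eventfd;
    e.gsi = gsi;
    e.flags = KVM_EOIFD_FLAG_LEVEL_IRQFD;
    e.irqfd = (uint32_t)level_irqfd;
    return e;
}
```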

Here we make use of the reference counting of the _irq_source
object, allowing us to share it with an irqfd and clean up regardless
of the release order.
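The reference counting on `_irq_source` follows the usual kref pattern: whichever holder drops the last reference frees the object. A single-threaded userspace sketch with hypothetical names follows (the kernel's kref uses atomic operations, which this omits); it shows that the irqfd and eoifd can be released in either order.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of struct _irq_source. */
struct irq_source {
    int id;
    int refcount;
};

static int sources_released; /* counts releases, for checking */

static struct irq_source *irq_source_alloc(int id)
{
    struct irq_source *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    s->id = id;
    s->refcount = 1; /* like kref_init() */
    return s;
}

static struct irq_source *irq_source_get(struct irq_source *s)
{
    if (s)
        s->refcount++; /* like kref_get() */
    return s;
}

static void irq_source_put(struct irq_source *s)
{
    if (s && --s->refcount == 0) {
        /* In the patch, kvm_free_irq_source_id() runs at this point. */
        sources_released++;
        free(s);
    }
}
```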

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Documentation/virtual/kvm/api.txt |   21 
 arch/x86/kvm/x86.c|1 
 include/linux/kvm.h   |   14 ++
 include/linux/kvm_host.h  |   13 ++
 virt/kvm/eventfd.c|  208 +
 virt/kvm/kvm_main.c   |   11 ++
 6 files changed, 266 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index c7267d5..a38af14 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1988,6 +1988,27 @@ to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
 is only necessary on setup, teardown is identical to that above.
 KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
 
+4.77 KVM_EOIFD
+
+Capability: KVM_CAP_EOIFD
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_eoifd (in)
+Returns: 0 on success, -1 on error
+
+KVM_EOIFD allows userspace to receive interrupt EOI notification
+through an eventfd.  kvm_eoifd.fd specifies the eventfd used for
+notification and kvm_eoifd.gsi specifies the irqchip pin, similar to
+KVM_IRQFD.  The eoifd is removed using the KVM_EOIFD_FLAG_DEASSIGN
+flag, specifying both kvm_eoifd.fd and kvm_eoifd.gsi.
+
+The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the provided
+kvm_eoifd structure includes a valid kvm_eoifd.irqfd file descriptor
+for a level irqfd configured using the KVM_IRQFD_FLAG_LEVEL flag.
+In this mode the level interrupt is de-asserted prior to EOI eventfd
+notification.  The KVM_EOIFD_FLAG_LEVEL_IRQFD is only necessary on
+setup, teardown is identical to that above.
+
 5. The kvm_run structure
 
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 80bed07..62d6eca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2149,6 +2149,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_PCI_2_3:
case KVM_CAP_KVMCLOCK_CTRL:
case KVM_CAP_IRQFD_LEVEL:
+   case KVM_CAP_EOIFD:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index b2e6e4f..7567e7d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
 #define KVM_CAP_IRQFD_LEVEL 81
+#define KVM_CAP_EOIFD 82
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -694,6 +695,17 @@ struct kvm_irqfd {
__u8  pad[20];
 };
 
+#define KVM_EOIFD_FLAG_DEASSIGN (1 << 0)
+#define KVM_EOIFD_FLAG_LEVEL_IRQFD (1 << 1)
+
+struct kvm_eoifd {
+   __u32 fd;
+   __u32 gsi;
+   __u32 flags;
+   __u32 irqfd;
+   __u8 pad[16];
+};
+
 struct kvm_clock_data {
__u64 clock;
__u32 flags;
@@ -834,6 +846,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_SMMU_INFO     _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
 /* Available with KVM_CAP_PPC_ALLOC_HTAB */
 #define KVM_PPC_ALLOCATE_HTAB     _IOWR(KVMIO, 0xa7, __u32)
+/* Available with KVM_CAP_EOIFD */
+#define KVM_EOIFD _IOW(KVMIO,  0xa8, struct kvm_eoifd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae3b426..83472eb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -285,6 +285,10 @@ struct kvm {
struct list_head  items;
} irqfds;
struct list_head ioeventfds;
+   struct {
+   spinlock_t        lock;
+   struct list_head  items;
+   } eoifds;
 #endif
struct kvm_vm_stat stat;
struct kvm_arch arch;
@@ -828,6 +832,8 @@ int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
 void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args);
+void kvm_eoifd_release(struct kvm *kvm);
 
 #else
 
@@ -853,6 +859,13 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
return -ENOSYS;
 }
 
+static inline int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
+{
+   return -ENOSYS;
+}
+
+static inline void

Re: [PATCH v12 4/8] KVM: PPC: Add support for ePAPR idle hcall in host kernel

2012-07-03 Thread Alexander Graf


On 03.07.2012, at 17:48, Stuart Yoder wrote:

 From: Liu Yu-B13201 yu@freescale.com
 
 And add a new flag definition in kvm_ppc_pvinfo to indicate
 whether the host supports the EV_IDLE hcall.
 
 Signed-off-by: Liu Yu yu@freescale.com
 [stuart.yo...@freescale.com: cleanup,fixes for conditions allowing idle]
 Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
 ---
 v12: use EV_HCALL_TOKEN macro
 
 Documentation/virtual/kvm/api.txt |7 +--
 arch/powerpc/include/asm/Kbuild   |1 +
 arch/powerpc/kvm/powerpc.c|   10 --
 include/linux/kvm.h   |2 ++
 4 files changed, 16 insertions(+), 4 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
 index 310fe50..920c3c4 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -1190,12 +1190,15 @@ struct kvm_ppc_pvinfo {
 This ioctl fetches PV specific information that need to be passed to the guest
 using the device tree or other means from vm context.
 
 -For now the only implemented piece of information distributed here is an array
 -of 4 instructions that make up a hypercall.
 +The hcall array defines 4 instructions that make up a hypercall.
 
 If any additional field gets added to this structure later on, a bit for that
 additional piece of information will be set in the flags bitmap.
 
 +The flags bitmap is defined as:
 +
 +   /* the host supports the ePAPR idle hcall
 +   #define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
 
 4.48 KVM_ASSIGN_PCI_DEVICE
 
 diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
 index 7e313f1..13d6b7b 100644
 --- a/arch/powerpc/include/asm/Kbuild
 +++ b/arch/powerpc/include/asm/Kbuild
 @@ -34,5 +34,6 @@ header-y += termios.h
 header-y += types.h
 header-y += ucontext.h
 header-y += unistd.h
 +header-y += epapr_hcalls.h
 
 generic-y += rwsem.h
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index 30cf01c..1a4db32 100644
 --- a/arch/powerpc/kvm/powerpc.c
 +++ b/arch/powerpc/kvm/powerpc.c
 @@ -38,8 +38,7 @@
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
 - return !(v->arch.shared->msr & MSR_WE) ||
 -        !!(v->arch.pending_exceptions) ||
 + return !!(v->arch.pending_exceptions) ||
          v->requests;
 }
 
 @@ -86,6 +85,11 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 
   /* Second return value is in r4 */
   break;
 + case EV_HCALL_TOKEN(EV_EPAPR_VENDOR_ID, EV_IDLE):

Did you try to compile this? :)

Will fix it up locally.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
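
Per the flags description added by this patch, a host that implements EV_IDLE sets bit 0 of the pvinfo flags returned by the KVM_PPC_GET_PVINFO ioctl. A minimal check might look like the following, assuming a trimmed local mirror of the structure: struct pvinfo_view and host_supports_ev_idle() are hypothetical names, and the mirror omits the real struct's padding. A host without this patch never sets the bit, so a clear bit is treated as "not supported".

```c
#include <stdbool.h>
#include <stdint.h>

/* Value from this patch's flags definition. */
#define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1 << 0)

/* Hypothetical, trimmed mirror of the pvinfo fields discussed here. */
struct pvinfo_view {
	uint32_t flags;     /* feature bits set by the host */
	uint32_t hcall[4];  /* the 4 instructions that make up a hypercall */
};

/* True only if the host advertises the ePAPR idle hcall. */
static bool host_supports_ev_idle(const struct pvinfo_view *info)
{
	return (info->flags & KVM_PPC_PVINFO_FLAGS_EV_IDLE) != 0;
}
```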

RE: [PATCH v12 4/8] KVM: PPC: Add support for ePAPR idle hcall in host kernel

2012-07-03 Thread Yoder Stuart-B08248



 -----Original Message-----
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Tuesday, July 03, 2012 2:34 PM
 To: Yoder Stuart-B08248
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org
 Subject: Re: [PATCH v12 4/8] KVM: PPC: Add support for ePAPR idle hcall in host kernel
 
 
 Did you try to compile this? :)

Hmm...could have sworn I did. :(
 
 Will fix it up locally.

Thanks.

Stuart

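
The kvm_arch_vcpu_runnable() hunk in patch 4/8 drops the MSR[WE] test. The sketch below contrasts the old and new predicates on a simplified vCPU model; struct vcpu_state, runnable_old() and runnable_new() are hypothetical, and the MSR_WE value is only an illustrative assumption (the logic does not depend on it). The visible difference is that a vCPU with MSR[WE] clear and no pending exceptions or requests was runnable before the change and is not runnable after it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define MSR_WE (1ULL << 18)

/* Hypothetical, simplified view of the fields the predicate reads. */
struct vcpu_state {
	uint64_t msr;                      /* guest MSR */
	unsigned long pending_exceptions;  /* nonzero if anything is pending */
	unsigned long requests;            /* nonzero if a request is queued */
};

/* Before the patch: runnable unless MSR[WE] is set and nothing is pending. */
static bool runnable_old(const struct vcpu_state *v)
{
	return !(v->msr & MSR_WE) || !!v->pending_exceptions || v->requests;
}

/* After the patch: MSR[WE] is not consulted; only pending work counts. */
static bool runnable_new(const struct vcpu_state *v)
{
	return !!v->pending_exceptions || v->requests;
}
```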

Re: [PATCH v12 0/8] KVM: PPC: Add ePAPR idle hcall support

2012-07-03 Thread Alexander Graf


On 03.07.2012, at 17:48, Stuart Yoder wrote:

 From: Stuart Yoder stuart.yo...@freescale.com
 
 v12 has a couple of updates to address feedback 
   -use new CURRENT_THREAD_INFO macro in epapr_hcalls.S
   -use EV_HCALL_TOKEN to create epapr hcall token
 
 A prerequisite to applying this patch is the patch:
  PPC: use CURRENT_THREAD_INFO instead of open coded assembly
 
 Liu Yu-B13201 (3):
  KVM: PPC: Add support for ePAPR idle hcall in host kernel
  KVM: PPC: ev_idle hcall support for e500 guests
  PPC: Don't use hardcoded opcode for ePAPR hcall invocation
 
 Scott Wood (1):
  powerpc/fsl-soc: use CONFIG_EPAPR_PARAVIRT for hcalls
 
 Stuart Yoder (4):
  PPC: epapr: create define for return code value of success
  KVM: PPC: use definitions in epapr header for hcalls
  KVM: PPC: add pvinfo for hcall opcodes on e500mc/e5500
  PPC: select EPAPR_PARAVIRT for all users of epapr hcalls


Thanks, applied all (with the small fix to 4/8) to kvm-ppc-next.


Alex


Re: [PATCH v9 00/16] KVM/ARM Implementation

2012-07-03 Thread Marcelo Tosatti

On Tue, Jul 03, 2012 at 04:29:27PM +0300, Avi Kivity wrote:
 On 07/03/2012 11:59 AM, Christoffer Dall wrote:
  The following series implements KVM support for ARM processors,
  specifically on the Cortex A-15 platform.  Work is done in
  collaboration between Columbia University, Virtual Open Systems and
  ARM/Linaro.
  
  The patch series applies to kvm/next, specifically commit:
  ae7a2a3fb6f8b784c2752863f4f1f20c656f76fb
  
  This is Version 9 of the patch series, but the first two versions
  were reviewed outside of the KVM mailing list. Changes can also be
  pulled from:
   git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v9
  
  A non-flattened edition of the patch series can be found at:
   git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v9-stage
  
  The implementation is broken up into a logical set of patches, the first
  five are preparatory patches:
   1. ARM: Add mem_type prot_pte accessor
   2. ARM: ARM_VIRT_EXT config option
   3. ARM: Section based HYP idmaps
   4. KVM: Move KVM_IRQ_LINE to arch-generic code
   5. KVM: Guard code with CONFIG_MMU_NOTIFIER (repost)
  
  KVM guys, please consider pulling the KVM generic patches as early as
  possible. Thanks.
 
 Those seem fine to me.  Marcelo?

Patch 5 is already in, patch 4 has comments, and the rest are ARM-specific.


Re: [PATCH v9 04/16] KVM: Move KVM_IRQ_LINE to arch-generic code

2012-07-03 Thread Marcelo Tosatti

On Tue, Jul 03, 2012 at 04:59:54AM -0400, Christoffer Dall wrote:
 Handle KVM_IRQ_LINE and KVM_IRQ_LINE_STATUS in the generic
 kvm_vm_ioctl() function and call into kvm_vm_ioctl_irq_line().
 
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  arch/ia64/kvm/kvm-ia64.c |   33 ++---
  arch/x86/kvm/x86.c   |   33 ++---
  include/linux/kvm_host.h |1 +
  virt/kvm/kvm_main.c  |   19 +++
  4 files changed, 40 insertions(+), 46 deletions(-)
 
 diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
 index bd77cb5..122a4b2 100644
 --- a/arch/ia64/kvm/kvm-ia64.c
 +++ b/arch/ia64/kvm/kvm-ia64.c
 @@ -924,6 +924,16 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, 
 struct kvm_regs *regs)
   return 0;
  }
  
 +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event)
 +{
 + if (!irqchip_in_kernel(kvm))
 + return -ENXIO;
 +
 + irq_event->statusstatus = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
 +   irq_event->irq, irq_event->level);
 + return 0;
 +}

typo.

 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 636bd08..1d33877 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -2093,6 +2093,25 @@ static long kvm_vm_ioctl(struct file *filp,
   break;
   }
  #endif
 +#ifdef __KVM_HAVE_IRQ_LINE
 + case KVM_IRQ_LINE_STATUS:
 + case KVM_IRQ_LINE: {
 + struct kvm_irq_level irq_event;
 +
 + r = -EFAULT;
 + if (copy_from_user(&irq_event, argp, sizeof irq_event))
 + goto out;
 +
 + r = kvm_vm_ioctl_irq_line(kvm, &irq_event);

Add

if (r)
goto out;
r = -EFAULT;

 + if (ioctl == KVM_IRQ_LINE_STATUS) {
 + if (copy_to_user(argp, &irq_event, sizeof irq_event))
 + r = -EFAULT;

Replace r = -EFAULT with goto out

 + }

Add r = 0;

 +
 + break;
 + }
 +#endif
   default:
   r = kvm_arch_vm_ioctl(filp, ioctl, arg);
   if (r == -ENOTTY)
 
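
Taken together, Marcelo's three review comments give the generic handler this control flow: return the arch hook's error without copying anything back, return -EFAULT only when a user copy fails, and return 0 on success. The sketch below models that flow in user space. The copy helpers, sim_irq_line(), sim_vm_ioctl() and the SIM_* ioctl numbers are hypothetical stand-ins for copy_from_user(), copy_to_user(), kvm_vm_ioctl_irq_line() and the real ioctls, and, like the real struct kvm_irq_level, the simulated struct keeps irq and status in a union.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ioctl numbers for the model. */
#define SIM_IRQ_LINE        1u
#define SIM_IRQ_LINE_STATUS 2u

/* Same shape as struct kvm_irq_level: irq (in) and status (out) share storage. */
struct irq_level_sim {
	union {
		uint32_t irq;
		int32_t status;
	};
	uint32_t level;
};

static bool user_fault;             /* simulate a bad user pointer */
static bool irqchip_present = true; /* simulate an in-kernel irqchip */

/* Stand-ins for copy_{from,to}_user(): return the number of bytes NOT copied. */
static unsigned long sim_copy_from_user(void *to, const void *from, unsigned long n)
{
	if (user_fault)
		return n;
	memcpy(to, from, n);
	return 0;
}

static unsigned long sim_copy_to_user(void *to, const void *from, unsigned long n)
{
	if (user_fault)
		return n;
	memcpy(to, from, n);
	return 0;
}

/* Stand-in for the arch hook: -ENXIO without an irqchip, else status 1
 * (a made-up "delivered" value for the model). */
static int sim_irq_line(struct irq_level_sim *e)
{
	if (!irqchip_present)
		return -ENXIO;
	e->status = 1;
	return 0;
}

static long sim_vm_ioctl(unsigned int ioctl, void *argp)
{
	struct irq_level_sim irq_event;
	long r;

	r = -EFAULT;
	if (sim_copy_from_user(&irq_event, argp, sizeof irq_event))
		goto out;

	r = sim_irq_line(&irq_event);
	if (r)                  /* review: return the hook's error as-is */
		goto out;

	r = -EFAULT;
	if (ioctl == SIM_IRQ_LINE_STATUS) {
		if (sim_copy_to_user(argp, &irq_event, sizeof irq_event))
			goto out;   /* review: goto instead of r = -EFAULT */
	}

	r = 0;                  /* review: explicit success */
out:
	return r;
}
```

Because irq and status share storage, only the _STATUS variant writes the overwritten struct back; plain KVM_IRQ_LINE leaves the caller's irq field untouched.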

Re: [PATCHv2 kvm] kvm_pv_eoi: add flag support

2012-07-03 Thread Marcelo Tosatti

On Sun, Jul 01, 2012 at 06:08:30PM +0300, Michael S. Tsirkin wrote:
 Support the new PV EOI flag in kvm - it recently got merged
 into kvm.git. Set by default with -cpu kvm.
 Set for -cpu qemu by adding +kvm_pv_eoi.
 Clear by adding -kvm_pv_eoi to -cpu option.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Applied to uq/master, thanks.


Re: [PATCH 3/3] KVM: MMU: track the refcount when unmap the page

2012-07-03 Thread Marcelo Tosatti

On Tue, Jul 03, 2012 at 02:32:14PM +0800, Xiao Guangrong wrote:
 It will trigger a WARN_ON if the page has been freed but is still
 used by the MMU; this can help us detect mm bugs early.
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/mmu.c |8 
  1 files changed, 8 insertions(+), 0 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index cac3408..af7e076 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -527,6 +527,14 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
   return 0;
 
   pfn = spte_to_pfn(old_spte);
 +
 + /*
 +  * KVM does not hold the refcount of the page used by
 +  * kvm mmu, before reclaiming the page, we should
 +  * unmap it from mmu first.
 +  */
 + WARN_ON(!page_count(pfn_to_page(pfn)));
 +

Except for mmio pfns.

Applied patch 1 and 2 to master, thanks.

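
Marcelo's remark means the new WARN_ON should not fire for MMIO pfns, which may not be backed by a struct page with a meaningful refcount. A user-space model of the intended check follows; struct sim_frame, check_unmapped_frame() and the warnings counter are hypothetical, and in the kernel the gate would presumably be a helper along the lines of kvm_is_mmio_pfn() (an assumption, not something stated in this thread).

```c
#include <stdbool.h>

/* Hypothetical model of the frame behind an spte being cleared. */
struct sim_frame {
	bool is_mmio;     /* device memory: no meaningful page refcount */
	int page_count;   /* refcount of an ordinary RAM page */
};

static int warnings;  /* stands in for WARN_ON() output */

/* Returns true when the check would warn: an ordinary RAM page whose
 * refcount has already dropped to 0 while the MMU still mapped it.
 * MMIO frames are skipped entirely. */
static bool check_unmapped_frame(const struct sim_frame *f)
{
	if (f->is_mmio)
		return false;
	if (f->page_count == 0) {
		warnings++;
		return true;
	}
	return false;
}
```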
