Re: [question] incremental backup a running vm

2015-01-26 Thread Zhang Haoyu

On 2015-01-26 19:29:03, Paolo Bonzini wrote:

 On 26/01/2015 12:13, Zhang Haoyu wrote:
  Thanks, Paolo,
  but too many internal snapshots were saved by customers;
  switching to the external snapshot mechanism would have a significant
  impact on subsequent upgrades.
 
 In that case, patches are welcome. :)

  Another problem:
  drive_backup just implements a one-time backup,
  but I want a VMware VDP-like backup mechanism.
  The initial backup of a virtual machine takes comparatively more time,
  because all of the data for that virtual machine is being backed up.
  Subsequent backups of the same virtual machine take less time, because
  a changed-block-tracking (log-dirty) mechanism is used to back up only
  the dirty data.
  After the initial backup is done, even if the VM is shut down, subsequent
  backups still only copy the changed data.
 
 As mentioned before, patches for this are on the list.
 
I see, thanks, Paolo.
 Paolo

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] vhost-scsi: introduce an ioctl to get the minimum tpgt

2015-01-26 Thread arei.gonglei
From: Gonglei arei.gong...@huawei.com

In order to support assigning a boot order to a
vhost-scsi device, user level (such as QEMU) needs
a way to get the tpgt. At present, we only support
booting from the minimum tpgt.
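
As a rough sketch of how user level might consume this ioctl (the device
path, helper name, and error handling below are illustrative assumptions,
not part of this patch; it assumes a kernel with this patch applied):

    /* Sketch: ask vhost-scsi for the minimum tpgt behind a target WWPN. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    static int query_min_tpgt(const char *wwpn, unsigned short *tpgt)
    {
            struct vhost_scsi_target backend;
            int fd, ret;

            fd = open("/dev/vhost-scsi", O_RDWR);   /* assumed device node */
            if (fd < 0)
                    return -1;

            memset(&backend, 0, sizeof(backend));
            strncpy(backend.vhost_wwpn, wwpn, sizeof(backend.vhost_wwpn) - 1);

            ret = ioctl(fd, VHOST_SCSI_GET_TPGT, &backend);
            if (ret == 0)
                    *tpgt = backend.vhost_tpgt; /* minimum tpgt for this wwpn */

            close(fd);
            return ret;
    }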

Signed-off-by: Gonglei arei.gong...@huawei.com
Signed-off-by: Bo Su su...@huawei.com
---
 drivers/vhost/scsi.c   | 41 +
 include/uapi/linux/vhost.h |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index d695b16..12e79b9 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1522,6 +1522,38 @@ err_dev:
return ret;
 }
 
+static int vhost_scsi_get_first_tpgt(
+	struct vhost_scsi *vs,
+	struct vhost_scsi_target *t)
+{
+	struct tcm_vhost_tpg *tv_tpg;
+	struct tcm_vhost_tport *tv_tport;
+	int tpgt = -1;
+
+	mutex_lock(&tcm_vhost_mutex);
+	mutex_lock(&vs->dev.mutex);
+
+	list_for_each_entry(tv_tpg, &tcm_vhost_list, tv_tpg_list) {
+		tv_tport = tv_tpg->tport;
+
+		if (!strcmp(tv_tport->tport_name, t->vhost_wwpn)) {
+			if (tpgt < 0)
+				tpgt = tv_tpg->tport_tpgt;
+			else if (tpgt > tv_tpg->tport_tpgt)
+				tpgt = tv_tpg->tport_tpgt;
+		}
+	}
+
+	mutex_unlock(&vs->dev.mutex);
+	mutex_unlock(&tcm_vhost_mutex);
+
+	if (tpgt < 0)
+		return -ENXIO;
+
+	t->vhost_tpgt = tpgt;
+	return 0;
+}
+
 static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
 {
struct vhost_virtqueue *vq;
@@ -1657,6 +1689,15 @@ vhost_scsi_ioctl(struct file *f,
if (put_user(events_missed, eventsp))
return -EFAULT;
return 0;
+	case VHOST_SCSI_GET_TPGT:
+		if (copy_from_user(&backend, argp, sizeof(backend)))
+			return -EFAULT;
+		r = vhost_scsi_get_first_tpgt(vs, &backend);
+		if (r < 0)
+			return r;
+		if (copy_to_user(argp, &backend, sizeof(backend)))
+			return -EFAULT;
+		return 0;
case VHOST_GET_FEATURES:
features = VHOST_SCSI_FEATURES;
	if (copy_to_user(featurep, &features, sizeof features))
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index bb6a5b4..5d350f7 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -155,4 +155,6 @@ struct vhost_scsi_target {
 #define VHOST_SCSI_SET_EVENTS_MISSED _IOW(VHOST_VIRTIO, 0x43, __u32)
 #define VHOST_SCSI_GET_EVENTS_MISSED _IOW(VHOST_VIRTIO, 0x44, __u32)
 
+#define VHOST_SCSI_GET_TPGT _IOW(VHOST_VIRTIO, 0x45, struct vhost_scsi_target)
+
 #endif
-- 
1.7.12.4


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: the number of PCI pass-through devices limit?

2015-01-26 Thread Alex Williamson
On Mon, 2015-01-26 at 16:46 +0800, Xuekun Hu wrote:
 Hi, All
 
 Is there a limit for the number of PCI pass-through devices in KVM? For
 legacy PCI device assignment or the VFIO pass-through method?

There's no enforced limit, but the usable limit is related to the number
of KVM memory slots available.  Each PCI BAR uses a memory slot
(sometimes two).  If memory slots are exhausted, the VM aborts.
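
For illustration, a minimal sketch (assuming /dev/kvm is accessible) of
querying the slot limit with KVM_CHECK_EXTENSION before deciding how many
devices to assign:

    /* Sketch: print how many memory slots this kernel's KVM supports. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
            int kvm_fd = open("/dev/kvm", O_RDWR);
            int nr_slots;

            if (kvm_fd < 0)
                    return 1;
            /* KVM_CHECK_EXTENSION returns 0 if the capability is unknown. */
            nr_slots = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_MEMSLOTS);
            printf("KVM memory slots: %d\n", nr_slots);
            return 0;
    }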
Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes

2015-01-26 Thread Andrew Jones
On Wed, Jan 21, 2015 at 06:39:45PM +, Marc Zyngier wrote:
 This small series fixes a number of issues that Christoffer and I have
 been trying to nail down for a while, having to do with the host dying
 under load (swapping), and also with the way we deal with caches in
 general (and with set/way operation in particular):
 
 - The first one changes the way we handle cache ops by set/way,
   basically turning them into VA ops for the whole memory. This allows
   platforms with system caches to boot a 32bit zImage, for example.
 
 - The second one fixes a corner case that could happen if the guest
   used an uncached mapping (or had its caches off) while the host was
   swapping it out (and using a cache-coherent IO subsystem).
 
 - Finally, the last one fixes this stability issue when the host was
   swapping, by using a kernel mapping for cache maintenance instead of
   the userspace one.
 
 With these patches (and both the TLB invalidation and HCR fixes that
 are on their way to mainline), the APM platform seems much more robust
 than it previously was. Fingers crossed.
 
 The first round of review has generated a lot of traffic about
 ASID-tagged icache management for guests, but I've decided not to
 address this issue as part of this series. The code is broken already,
 and there isn't any virtualization-capable, ASID-tagged icache core in
 the wild, AFAIK. I'll try to revisit this in another series, once I
 have wrapped my head around it (or someone beats me to it).
 
 Based on 3.19-rc5, tested on Juno, X-Gene, TC-2 and Cubietruck.
 
 Also at git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git 
 kvm-arm64/mm-fixes-3.19
 
 * From v2: [2]
 - Reworked the algorithm that tracks the state of the guest's caches,
   as there were some cases I didn't anticipate. In the end, the
   algorithm is simpler.
 
 * From v1: [1]
 - Dropped Steve's patch after discussion with Andrea
 - Refactor set/way support to avoid code duplication, better comments
 - Much improved comments in patch #2, courtesy of Christoffer
 
 [1]: http://www.spinics.net/lists/kvm-arm/msg13008.html
 [2]: http://www.spinics.net/lists/kvm-arm/msg13161.html
 
 Marc Zyngier (3):
   arm/arm64: KVM: Use set/way op trapping to track the state of the
 caches
   arm/arm64: KVM: Invalidate data cache on unmap
   arm/arm64: KVM: Use kernel mapping to perform invalidation on page
 fault
 
  arch/arm/include/asm/kvm_emulate.h   |  10 +++
  arch/arm/include/asm/kvm_host.h  |   3 -
  arch/arm/include/asm/kvm_mmu.h   |  77 +---
  arch/arm/kvm/arm.c   |  10 ---
  arch/arm/kvm/coproc.c|  64 +++---
  arch/arm/kvm/coproc_a15.c|   2 +-
  arch/arm/kvm/coproc_a7.c |   2 +-
  arch/arm/kvm/mmu.c   | 164 
 ++-
  arch/arm/kvm/trace.h |  39 +
  arch/arm64/include/asm/kvm_emulate.h |  10 +++
  arch/arm64/include/asm/kvm_host.h|   3 -
  arch/arm64/include/asm/kvm_mmu.h |  34 ++--
  arch/arm64/kvm/sys_regs.c|  75 +++-
  13 files changed, 321 insertions(+), 172 deletions(-)
 
 -- 
 2.1.4
 
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Hi Marc,

checkpatch found some whitespace issues (not just the false alarms
that trace.h files generate). Also a loosing vs. losing typo in 2/3's
commit message.

Thanks,
Drew (trivial comments) Jones
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


vhost-scsi support for ANY_LAYOUT

2015-01-26 Thread Nicholas A. Bellinger
Hi MST & Paolo,

So I'm currently working on vhost-scsi support for ANY_LAYOUT, and
wanted to verify some assumptions based upon your earlier emails..

*) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that
virtio-scsi request + response headers will (always..?) be within a
single iovec.

*) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that
virtio-scsi request + response headers may be (but not always..?)
combined with data-out + data-in payloads into a single iovec.

*) When ANY_LAYOUT + T10_PI is negotiated by vhost-scsi, it's expected
that PI and data payloads for data-out + data-in may be (but not
always..?) within the same iovec.  Consequently, both headers + PI +
data-payloads may also be within a single iovec.

*) Is it still safe to use 'out' + 'in' values from vhost_get_vq_desc()
in order to determine the data_direction...?  If not, what's the
preferred way of determining this information for get_user_pages_fast()
permission bits and target_submit_cmd_map_sgls()..?
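
For what it's worth, a minimal sketch of the iovec handling that ANY_LAYOUT
seems to imply — copying a header that may be split across iovec boundaries
rather than assuming it sits alone in iov[0] (plain struct iovec, no vhost
helpers; the function name is illustrative):

    #include <string.h>
    #include <sys/uio.h>

    /* Copy 'len' bytes starting at byte 'offset' of an iovec array into
     * 'buf'.  Returns the number of bytes copied (short if the iovecs
     * run out), so callers must check for a short copy.
     */
    static size_t iov_copy_out(void *buf, const struct iovec *iov,
                               int iovcnt, size_t offset, size_t len)
    {
            size_t copied = 0;
            int i;

            for (i = 0; i < iovcnt && copied < len; i++) {
                    size_t n;

                    if (offset >= iov[i].iov_len) { /* skip whole iovec */
                            offset -= iov[i].iov_len;
                            continue;
                    }
                    n = iov[i].iov_len - offset;
                    if (n > len - copied)
                            n = len - copied;
                    memcpy((char *)buf + copied,
                           (char *)iov[i].iov_base + offset, n);
                    copied += n;
                    offset = 0;
            }
            return copied;
    }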

Also, what is required on the QEMU side in order to start generating
ANY_LAYOUT style iovecs to verify the WIP changes..?  I see
hw/scsi/virtio-scsi.c has been converted to accept any_layout=1, but
AFAICT the changes were only related to code not shared with
hw/scsi/vhost-scsi.c.

Thank you,

--nab

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2 2/2] KVM: PPC: BOOK3S: HV: Use unlock variant with memory barrier

2015-01-26 Thread Aneesh Kumar K.V
We switch to the unlock variant with memory barriers in the error path
and also in code paths where we had an implicit dependency on previous
functions calling lwsync/ptesync. In most of the cases we don't really
need an explicit barrier, but using the barrier variant makes sure we
don't make mistakes later with code movements. We also document why a
non-barrier variant is ok in the performance-critical path.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
Changes from V1:
* Rebase to latest upstream

 arch/powerpc/kvm/book3s_64_mmu_hv.c | 10 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 15 ++-
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 551dabb9551b..0fd91f54d1a7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -639,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	return ret;
 
  out_unlock:
-	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+	unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	preempt_enable();
 	goto out_put;
 }
@@ -767,8 +767,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			note_hpte_modification(kvm, &rev[i]);
 		}
 	}
+	unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	unlock_rmap(rmapp);
-	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
 	return 0;
 }
@@ -854,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 			ret = 1;
 		}
-		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+		unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -971,7 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 
 		/* Now check and modify the HPTE */
 		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-			__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+			unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 			continue;
 		}
 
@@ -994,7 +994,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		}
 		v &= ~HPTE_V_ABSENT;
 		v |= HPTE_V_VALID;
-		__unlock_hpte(hptep, v);
+		unlock_hpte(hptep, v);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9123132b3053..2e45bd57d4e8 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -268,6 +268,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 			pte = be64_to_cpu(hpte[0]);
 			if (!(pte & (HPTE_V_VALID | HPTE_V_ABSENT)))
 				break;
+			/*
+			 * Data dependency will avoid re-ordering
+			 */
 			__unlock_hpte(hpte, pte);
 			hpte += 2;
 		}
@@ -286,7 +289,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 				cpu_relax();
 			pte = be64_to_cpu(hpte[0]);
 			if (pte & (HPTE_V_VALID | HPTE_V_ABSENT)) {
-				__unlock_hpte(hpte, pte);
+				unlock_hpte(hpte, pte);
 				return H_PTEG_FULL;
 			}
 		}
@@ -406,7 +409,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
 	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
-		__unlock_hpte(hpte, pte);
+		unlock_hpte(hpte, pte);
 		return H_NOT_FOUND;
 	}
 
@@ -542,7 +545,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 				be64_to_cpu(hp[0]), be64_to_cpu(hp[1]));
 			rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
 			args[j] |= rcbits << (56 - 5);
-			__unlock_hpte(hp, 0);
+			unlock_hpte(hp, 0);
 		}
 	}
 
@@ -568,7 +571,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	pte = be64_to_cpu(hpte[0]);
 	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
-		__unlock_hpte(hpte, pte);
+		unlock_hpte(hpte, pte);
 		return H_NOT_FOUND;
 	}
 
@@ -748,7 +751,9 


Re: [RFC v4 1/2] x86/xen: add xen_is_preemptible_hypercall()

2015-01-26 Thread Luis R. Rodriguez
On Thu, Jan 22, 2015 at 05:40:45PM -0800, Andy Lutomirski wrote:
 On Thu, Jan 22, 2015 at 4:29 PM, Luis R. Rodriguez
 mcg...@do-not-panic.com wrote:
  From: Luis R. Rodriguez mcg...@suse.com
 
  On kernels with voluntary or no preemption we can run
  into situations where a hypercall issued through userspace
  will linger around as it addresses sub-operations in kernel
  context (multicalls). Such operations can trigger soft lockup
  detection.
 
 Looks reasonable.

I'll add a reviewed-by...

  Luis
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2 1/2] KVM: PPC: BOOK3S: HV: Add helpers for lock/unlock hpte

2015-01-26 Thread Aneesh Kumar K.V
This patch adds helper routines for locking and unlocking hptes and uses
them in the rest of the code. We don't change any locking rules in this
patch. In the next patch we switch some of the unlock usage to the API
with a barrier and also document the usage without barriers.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
Changes from V1:
* Rebase to latest upstream

 arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 25 ++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 25 +
 3 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e202bdcc..0789a0f50969 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -85,6 +85,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
 	return old == 0;
 }
 
+static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
+/* Without barrier */
+static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
 static inline int __hpte_actual_psize(unsigned int lp, int psize)
 {
 	int i, shift;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 534acb3c6c3d..551dabb9551b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -338,9 +338,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	gr = kvm->arch.revmap[index].guest_rpte;
 
-	/* Unlock the HPTE */
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(v);
+	unlock_hpte(hptep, v);
 	preempt_enable();
 
 	gpte->eaddr = eaddr;
@@ -469,8 +467,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	hpte[1] = be64_to_cpu(hptep[1]);
 	hpte[2] = r = rev->guest_rpte;
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(hpte[0]);
+	unlock_hpte(hptep, hpte[0]);
 	preempt_enable();
 
 	if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
@@ -621,7 +618,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	hptep[1] = cpu_to_be64(r);
 	eieio();
-	hptep[0] = cpu_to_be64(hpte[0]);
+	__unlock_hpte(hptep, hpte[0]);
 	asm volatile("ptesync" : : : "memory");
 	preempt_enable();
 	if (page && hpte_is_writable(r))
@@ -642,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	return ret;
 
  out_unlock:
-	hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	preempt_enable();
 	goto out_put;
 }
@@ -771,7 +768,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		}
 	}
 	unlock_rmap(rmapp);
-	hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
 	return 0;
 }
@@ -857,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 			ret = 1;
 		}
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -974,8 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 
 		/* Now check and modify the HPTE */
 		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-			/* unlock and continue */
-			hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+			__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 			continue;
 		}
 
@@ -996,9 +992,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+		v &= ~HPTE_V_ABSENT;
 		v |= HPTE_V_VALID;
-		hptep[0] = cpu_to_be64(v);
+		__unlock_hpte(hptep, v);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -1218,8 +1214,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 		r &= ~HPTE_GR_MODIFIED;
 		revp->guest_rpte = r;
 	}
-	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
-	hptp[0] =


[PATCH v5 2/2] x86/xen: allow privcmd hypercalls to be preempted on 64-bit

2015-01-26 Thread Luis R. Rodriguez
From: Luis R. Rodriguez mcg...@suse.com

Xen has support for splitting heavy work into a series
of hypercalls, called multicalls, and preempting them through
what Xen calls continuation [0]. Despite this, without
CONFIG_PREEMPT preemption won't happen, and without preemption
a system can become pretty useless under heavy-handed hypercalls.
Such is the case, for example, when creating a > 50 GiB HVM guest;
we can get softlockups [1] with:

kernel: [  802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]

The soft lockup triggers on the TASK_UNINTERRUPTIBLE hanger check
(default 120 seconds); on the Xen side, in this particular case,
this happens when the following Xen hypervisor code path is used:

xc_domain_set_pod_target() -->
  do_memory_op() -->
    arch_memory_op() -->
      p2m_pod_set_mem_target()
        -- long delay (real or emulated) --

This happens on arch_memory_op() on the XENMEM_set_pod_target memory
op even though arch_memory_op() can handle continuation via
hypercall_create_continuation() for example.

Machines with over 50 GiB of memory are in high demand and hard to come
by, so to help replicate this sort of issue, long delays on select
hypercalls have been emulated in order to be able to test this on
smaller machines [2].

On one hand this issue can be considered expected, given that
CONFIG_PREEMPT=n is used; however, we have precedent for forcing
voluntary preemption in the kernel even for CONFIG_PREEMPT=n, through
the usage of cond_resched() sprinkled in many places. To address
this issue with Xen hypercalls, though, we need a way to aid
the scheduler in the middle of a hypercall. We are motivated to
address this issue on CONFIG_PREEMPT=n as otherwise the system becomes
rather unresponsive for long periods of time; in the worst case (so far
seen only by emulating long delays on select I/O disk-bound
hypercalls) this can lead to filesystem corruption if the delay happens,
for example, on SCHEDOP_remote_shutdown (when we call 'xl domain shutdown').

We can address this problem by checking, on return from the xen timer
interrupt, whether we should reschedule in the middle of a hypercall.
We want to be careful not to always force voluntary preemption though,
so we only selectively enable preemption on very specific xen
hypercalls.

This enables hypercall preemption by selectively forcing checks for
voluntary preemption only on ioctl-initiated privcmd hypercalls,
where we know some folks have run into the reported issues [1].

[0] 
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
[1] https://bugzilla.novell.com/show_bug.cgi?id=861093
[2] http://ftp.suse.com/pub/people/mcgrof/xen/emulate-long-xen-hypercalls.patch

Based on original work by: David Vrabel david.vra...@citrix.com
Suggested-by: Andy Lutomirski l...@amacapital.net
Cc: Andy Lutomirski l...@amacapital.net
Cc: Borislav Petkov b...@suse.de
Cc: David Vrabel david.vra...@citrix.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: x...@kernel.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Masami Hiramatsu masami.hiramatsu...@hitachi.com
Cc: Jan Beulich jbeul...@suse.com
Cc: linux-ker...@vger.kernel.org
Reviewed-by: Andy Lutomirski l...@amacapital.net
Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
 arch/x86/kernel/entry_64.S   |  2 ++
 drivers/xen/events/events_base.c | 14 ++
 include/xen/events.h |  1 +
 3 files changed, 17 insertions(+)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 9ebaf63..ee28733 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1198,6 +1198,8 @@ ENTRY(xen_do_hypervisor_callback)   # 
do_hypervisor_callback(struct *pt_regs)
popq %rsp
CFI_DEF_CFA_REGISTER rsp
decl PER_CPU_VAR(irq_count)
+   movq %rsp, %rdi  /* pass pt_regs as first argument */
+   call xen_end_upcall
jmp  error_exit
CFI_ENDPROC
 END(xen_do_hypervisor_callback)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index b4bca2d..bf207f2 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -32,6 +32,8 @@
 #include <linux/slab.h>
 #include <linux/irqnr.h>
 #include <linux/pci.h>
+#include <linux/sched.h>
+#include <linux/kprobes.h>
 
 #ifdef CONFIG_X86
 #include asm/desc.h
@@ -1243,6 +1245,18 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
set_irq_regs(old_regs);
 }
 
+/*
+ * Some hypercalls issued by the toolstack can take many 10s of
+ * seconds. Allow tasks running hypercalls via the privcmd driver to be
+ * voluntarily preempted even if full kernel preemption is disabled.
+ */
+void xen_end_upcall(struct pt_regs *regs)
+{
+   if (xen_is_preemptible_hypercall(regs))
+   _cond_resched();
+}
+NOKPROBE_SYMBOL(xen_end_upcall);
+
 void 

[PATCH v5 0/2] x86/xen: add xen hypercall preemption

2015-01-26 Thread Luis R. Rodriguez
From: Luis R. Rodriguez mcg...@suse.com

This v5 nukes tracing, as David said it was useless. It also
only adds support for 64-bit, as that's the only thing I can test,
and slightly modifies the documentation in code as to why we
want this. The nokprobe thing is left in place as I haven't
heard confirmation it's kosher to remove it.

Luis R. Rodriguez (2):
  x86/xen: add xen_is_preemptible_hypercall()
  x86/xen: allow privcmd hypercalls to be preempted on 64-bit

 arch/arm/include/asm/xen/hypercall.h |  5 +
 arch/x86/include/asm/xen/hypercall.h | 20 
 arch/x86/kernel/entry_64.S   |  2 ++
 arch/x86/xen/enlighten.c |  7 +++
 arch/x86/xen/xen-head.S  | 18 +-
 drivers/xen/events/events_base.c | 14 ++
 include/xen/events.h |  1 +
 7 files changed, 66 insertions(+), 1 deletion(-)

-- 
2.1.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 1/2] xen: add xen_is_preemptible_hypercall()

2015-01-26 Thread Luis R. Rodriguez
From: Luis R. Rodriguez mcg...@suse.com

On kernels with voluntary or no preemption, we can run
into situations where a hypercall issued through userspace
will linger around as it addresses sub-operations in kernel
context (multicalls). Such operations can trigger soft lockup
detection.

We want a way to let the kernel voluntarily preempt
such calls even on non-preempt kernels. To do this, we first
need to distinguish which hypercalls fall under this category.
This implements xen_is_preemptible_hypercall(), which lets us do
just that by adding a secondary hypercall page; calls made via
the new page may be preempted.

This will only be used on x86 for now, on arm we just have a stub
to always return false for now.

Andrew had originally submitted a version of this work [0].

[0] http://lists.xen.org/archives/html/xen-devel/2014-02/msg01056.html

Based on original work by: Andrew Cooper andrew.coop...@citrix.com

Cc: Andy Lutomirski l...@amacapital.net
Cc: Borislav Petkov b...@suse.de
Cc: David Vrabel david.vra...@citrix.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: x...@kernel.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Masami Hiramatsu masami.hiramatsu...@hitachi.com
Cc: Jan Beulich jbeul...@suse.com
Cc: linux-ker...@vger.kernel.org
Reviewed-by: Andy Lutomirski l...@amacapital.net
Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
 arch/arm/include/asm/xen/hypercall.h |  5 +
 arch/x86/include/asm/xen/hypercall.h | 20 
 arch/x86/xen/enlighten.c |  7 +++
 arch/x86/xen/xen-head.S  | 18 +-
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/xen/hypercall.h 
b/arch/arm/include/asm/xen/hypercall.h
index 712b50e..4fc8395 100644
--- a/arch/arm/include/asm/xen/hypercall.h
+++ b/arch/arm/include/asm/xen/hypercall.h
@@ -74,4 +74,9 @@ MULTI_mmu_update(struct multicall_entry *mcl, struct 
mmu_update *req,
BUG();
 }
 
+static inline bool xen_is_preemptible_hypercall(struct pt_regs *regs)
+{
+   return false;
+}
+
 #endif /* _ASM_ARM_XEN_HYPERCALL_H */
diff --git a/arch/x86/include/asm/xen/hypercall.h 
b/arch/x86/include/asm/xen/hypercall.h
index ca08a27..221008e 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -84,6 +84,22 @@
 
 extern struct { char _entry[32]; } hypercall_page[];
 
+#ifndef CONFIG_PREEMPT
+extern struct { char _entry[32]; } preemptible_hypercall_page[];
+
+static inline bool xen_is_preemptible_hypercall(struct pt_regs *regs)
+{
+	return !user_mode_vm(regs) &&
+		regs->ip >= (unsigned long)preemptible_hypercall_page &&
+		regs->ip < (unsigned long)preemptible_hypercall_page + PAGE_SIZE;
+}
+#else
+static inline bool xen_is_preemptible_hypercall(struct pt_regs *regs)
+{
+   return false;
+}
+#endif
+
 #define __HYPERCALL	"call hypercall_page+%c[offset]"
 #define __HYPERCALL_ENTRY(x)	\
	[offset] "i" (__HYPERVISOR_##x * sizeof(hypercall_page[0]))
@@ -215,7 +231,11 @@ privcmd_call(unsigned call,
 
 	asm volatile("call *%[call]"
 		     : __HYPERCALL_5PARAM
+#ifndef CONFIG_PREEMPT
+		     : [call] "a" (&preemptible_hypercall_page[call])
+#else
 		     : [call] "a" (&hypercall_page[call])
+#endif
 		     : __HYPERCALL_CLOBBER5);
 
return (long)__res;
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 6bf3a13..9c01b48 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -84,6 +84,9 @@
 #include multicalls.h
 
 EXPORT_SYMBOL_GPL(hypercall_page);
+#ifndef CONFIG_PREEMPT
+EXPORT_SYMBOL_GPL(preemptible_hypercall_page);
+#endif
 
 /*
  * Pointer to the xen_vcpu_info structure or
@@ -1531,6 +1534,10 @@ asmlinkage __visible void __init xen_start_kernel(void)
 #endif
xen_setup_machphys_mapping();
 
+#ifndef CONFIG_PREEMPT
+   copy_page(preemptible_hypercall_page, hypercall_page);
+#endif
+
/* Install Xen paravirt ops */
pv_info = xen_info;
pv_init_ops = xen_init_ops;
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 674b2225..6e6a9517 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -85,9 +85,18 @@ ENTRY(xen_pvh_early_cpu_init)
 .pushsection .text
.balign PAGE_SIZE
 ENTRY(hypercall_page)
+
+#ifdef CONFIG_PREEMPT
+#  define PREEMPT_HYPERCALL_ENTRY(x)
+#else
+#  define PREEMPT_HYPERCALL_ENTRY(x) \
+   .global xen_hypercall_##x ## _p ASM_NL \
+   .set preemptible_xen_hypercall_##x, xen_hypercall_##x + PAGE_SIZE ASM_NL
+#endif
 #define NEXT_HYPERCALL(x) \
ENTRY(xen_hypercall_##x) \
-   .skip 32
+   .skip 32 ASM_NL \
+   PREEMPT_HYPERCALL_ENTRY(x)
 
 NEXT_HYPERCALL(set_trap_table)
 NEXT_HYPERCALL(mmu_update)
@@ -138,6 +147,13 @@ NEXT_HYPERCALL(arch_4)
 NEXT_HYPERCALL(arch_5)
 

Re: Submit your Google Summer of Code project ideas and volunteer to mentor

2015-01-26 Thread Fam Zheng
On Fri, 01/23 17:21, Stefan Hajnoczi wrote:
 Dear libvirt, KVM, and QEMU contributors,
 The Google Summer of Code season begins soon and it's time to collect
 our thoughts for mentoring students this summer working full-time on
 libvirt, KVM, and QEMU.
 
 What is GSoC?
 Google Summer of Code 2015 (GSoC) funds students to
 work on open source projects for 12 weeks over the summer.  Open
 source organizations apply to participate and those accepted receive
 funding for one or more students.
 
 
 We now need to collect a list of project ideas on our wiki.  We also
 need mentors to volunteer.
 
 http://qemu-project.org/Google_Summer_of_Code_2015
 
 Project ideas
 Please post project ideas on the wiki page below.  Project ideas
 should be suitable as a 12-week project that a student fluent in
 C/Python/etc can complete.  No prior knowledge of QEMU/KVM/libvirt
 internals can be assumed.
 
 http://qemu-project.org/Google_Summer_of_Code_2015
 
 Mentors
 Please add your name to project ideas you are willing to mentor.  In
 order to mentor you must be an established contributor (regularly
 contribute patches).  You must be willing to spend about 5 hours per
 week from May 25 to August 21.
 
 I have CCed the 8 most active committers since QEMU 2.1.0 as well as
 the previous libvirt and KVM mentors but everyone is invited.
 
 Official timeline:
 https://www.google-melange.com/gsoc/events/google/gsoc20145

s/20145/2015/

Thank you for organizing it!

Fam
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Supporting guest OS callchain (perf -g) on KVM

2015-01-26 Thread Stefan Hajnoczi
On Thu, Jan 22, 2015 at 03:29:10PM +0200, Elazar Leibovich wrote:
 When perf runs on a regular linux, it can collect the current
 stacktraces (kernel or user) for each sample. This is a very important
 feature, and is utilized by some visualization tools (see, e.g.,
 Brendan's post[0]).
 
 As far as I understand, it is not currently implemented in perf [1].
 
 While providing a cross platform, safe solution that works every time
 is a challenge, I think we can give a reasonable solution for Linux
 guests only.
 
 I think we can, in a portable way, across multiple Linux versions, do
 the following:
 
 1) Find out at which stack the guest kernel is.
 2) Find out Kernel's text address.
 3) Scan the stack up to its maximal size.
 4) Record all addresses found in the kernel text segment.
 
 This is more or less what the kernel does when recording its own stack
 traces[2].
 
 An alternative, more general design is recording all integers that
 look like addresses from the location of RIP to the start of the
 physical page. This would give you a slightly trimmed stack trace, but
 is pretty safe (you'll never get a segfault, as RIP must be in a valid
 page), and should work across many different guest OSes.
 
 I have little experience in the internals of perf or KVM, and would be
 happy to receive any feedback about implementing guest OS callchains for KVM.
 
 [0] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
 [1] 
 https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/perf_event.c#L1968
  /* TODO: We don't support guest os callchain now */
 [2] 
 https://github.com/torvalds/linux/blob/master/arch/x86/kernel/dumpstack_64.c#L188
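
For illustration, a crude sketch of the scanning step proposed above; it
assumes a snapshot of the guest stack and the guest kernel's text bounds
have already been obtained (all names are illustrative):

    #include <stddef.h>
    #include <stdint.h>

    /* Record every value in [sp, stack_end) that falls inside the guest
     * kernel's text segment; those are the candidate return addresses.
     */
    static size_t scan_guest_stack(const uint64_t *sp,
                                   const uint64_t *stack_end,
                                   uint64_t text_start, uint64_t text_end,
                                   uint64_t *out, size_t max)
    {
            size_t n = 0;

            for (; sp < stack_end && n < max; sp++) {
                    if (*sp >= text_start && *sp < text_end)
                            out[n++] = *sp; /* looks like a text address */
            }
            return n;
    }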

I'm not familiar with perf(1) internals but I guess a starting point is
the perf-record(1) userspace call stack code, which collects call stacks
for userspace processes.  KVM guests are similar.

i386 32-bit guest on x86_64 host is an interesting case.  The host must
be aware of the different calling conventions.

CCing people who have been involved in perf-kvm(1).

Stefan




Re: [question] incremental backup a running vm

2015-01-26 Thread Paolo Bonzini


On 26/01/2015 02:07, Zhang Haoyu wrote:
 Hi, Kashyap
 I've tried ‘drive_backup’ via QMP,
 but the snapshots were not backed up to the destination.
 I think the reason is that backup_run() only copies the
 guest data in the qcow2 image.

Yes, that's the case.

QEMU still cannot access internal snapshots while the file is open.
External snapshots are opened read-only, and can be copied with cp
while QEMU is running.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


the number of PCI pass-through devices limit?

2015-01-26 Thread Xuekun Hu
Hi, All

Is there a limit for the number of PCI pass-through devices in KVM? For
legacy PCI device assignment or the VFIO pass-through method?

Many thanks.
Thx, Xuekun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [RFC v4 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-26 Thread David Vrabel
On 23/01/15 18:58, Luis R. Rodriguez wrote:
 
 It's not just hypercalls though; this is all about the interactions
 with multicalls, no?

No.  This applies to any preemptible hypercall and the toolstack doesn't
use multicalls for most of its work.

David
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [question] incremental backup a running vm

2015-01-26 Thread Paolo Bonzini


On 26/01/2015 12:13, Zhang Haoyu wrote:
 Thanks, Paolo,
 but too many internal snapshots were saved by customers;
 switching to the external snapshot mechanism would have a significant
 impact on subsequent upgrades.

In that case, patches are welcome. :)

 Another problem:
 drive_backup just implements a one-time backup,
 but I want a VMware VDP-like backup mechanism.
 The initial backup of a virtual machine takes comparatively more time,
 because all of the data for that virtual machine is being backed up.
 Subsequent backups of the same virtual machine take less time, because
 a changed-block-tracking (log-dirty) mechanism is used to back up only
 the dirty data.
 After the initial backup is done, even if the VM is shut down, subsequent
 backups still only copy the changed data.

As mentioned before, patches for this are on the list.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/11] kvmtool: add command line parameter to instantiate a vGICv3

2015-01-26 Thread Andre Przywara
Hi Will,

On 26/01/15 11:30, Will Deacon wrote:
 On Fri, Jan 23, 2015 at 04:35:10PM +, Andre Przywara wrote:
 Add the command line parameter --gicv3 to request GICv3 emulation
 in the kernel. Connect that to the already existing GICv3 code.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  tools/kvm/arm/aarch64/arm-cpu.c|5 -
  .../kvm/arm/aarch64/include/kvm/kvm-config-arch.h  |4 +++-
  tools/kvm/arm/gic.c|   14 ++
  tools/kvm/arm/include/arm-common/kvm-config-arch.h |1 +
  tools/kvm/arm/kvm-cpu.c|2 +-
  tools/kvm/arm/kvm.c|3 ++-
  6 files changed, 25 insertions(+), 4 deletions(-)

 diff --git a/tools/kvm/arm/aarch64/arm-cpu.c 
 b/tools/kvm/arm/aarch64/arm-cpu.c
 index a70d6bb..46d6d6a 100644
 --- a/tools/kvm/arm/aarch64/arm-cpu.c
 +++ b/tools/kvm/arm/aarch64/arm-cpu.c
 @@ -12,7 +12,10 @@
  static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
  {
  int timer_interrupts[4] = {13, 14, 11, 10};
 -gic__generate_fdt_nodes(fdt, gic_phandle, KVM_DEV_TYPE_ARM_VGIC_V2);
 +gic__generate_fdt_nodes(fdt, gic_phandle,
 +kvm->cfg.arch.gicv3 ?
 +KVM_DEV_TYPE_ARM_VGIC_V3 :
 +KVM_DEV_TYPE_ARM_VGIC_V2);
  timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
  }
  
 diff --git a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h 
 b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
 index 89860ae..106e52f 100644
 --- a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
 +++ b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
 @@ -3,7 +3,9 @@
  
  #define ARM_OPT_ARCH_RUN(cfg)   
 \
  OPT_BOOLEAN('\0', "aarch32", &(cfg)->aarch32_guest, \
 -"Run AArch32 guest"),
 +"Run AArch32 guest"),   \
 +OPT_BOOLEAN('\0', "gicv3", &(cfg)->gicv3,   \
 +"Use a GICv3 interrupt controller in the guest"),
 
 On a GICv3-capable system, why would I *not* want to enable this option?
 In other words, could we make this the default behaviour on systems that
 support it, and if you need an override then it should be something like
 --force-gicv2.

Well, you could have a guest kernel < 3.17, which does not have GICv3
support. In general I consider GICv2 better tested, so I reckon that
people will only want to use GICv3 emulation if there is a need for it
(a non-compat GICv3 host or more than 8 VCPUs in the guest). That will
probably change over time, but for the time being I'd rather keep the
default at GICv2 emulation.

Having said that, there could be a fallback in case GICv2 emulation is
not available. Let me take a look at that.
Also thinking about the future (ITS emulation) I found that I'd like to
replace this option with something more generic like --irqchip=.

Cheers,
Andre.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [RFC v4 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-26 Thread Jan Beulich
>>> On 23.01.15 at 19:58, mcg...@suse.com wrote:
 On Fri, Jan 23, 2015 at 11:45:06AM +, David Vrabel wrote:
 On 23/01/15 00:29, Luis R. Rodriguez wrote:
  @@ -1243,6 +1247,25 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
 set_irq_regs(old_regs);
   }
   
  +/*
  + * CONFIG_PREEMPT=n kernels can end up triggering the softlock
  + * TASK_UNINTERRUPTIBLE hanger check (default 120 seconds)
  + * when certain multicalls are used [0] on large systems, in
  + * that case we need a way to voluntarily preempt. This is
  + * only an issue on CONFIG_PREEMPT=n kernels.
 
 Rewrite this comment as;
 
 * Some hypercalls issued by the toolstack can take many 10s of
 
  It's not just hypercalls though; this is all about the interactions
  with multicalls, no?

multicalls are just a special case of hypercalls.

Jan

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [question] incremental backup a running vm

2015-01-26 Thread Zhang Haoyu

On 2015-01-26 17:29:43, Paolo Bonzini wrote:

 On 26/01/2015 02:07, Zhang Haoyu wrote:
  Hi, Kashyap
   I've tried ‘drive_backup’ via QMP,
   but the snapshots were not backed up to the destination.
   I think the reason is that backup_run() only copies the
   guest data in the qcow2 image.
 
Yes, that's the case.
 
 QEMU still cannot access internal snapshots while the file is open.
 External snapshots are opened read-only, and can be copied with cp
 while QEMU is running.
Thanks, Paolo,
but too many internal snapshots were saved by customers;
switching to the external snapshot mechanism would have a significant
impact on subsequent upgrades.

Another problem:
drive_backup just implements a one-time backup,
but I want a VMware VDP-like backup mechanism.
The initial backup of a virtual machine takes comparatively more time,
because all of the data for that virtual machine is being backed up.
Subsequent backups of the same virtual machine take less time, because
a changed-block-tracking (log-dirty) mechanism is used to back up only
the dirty data.
After the initial backup is done, even if the VM is shut down, subsequent
backups still only copy the changed data.
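
For illustration, a back-of-the-envelope sketch of the desired incremental
pass, assuming a changed-block-tracking facility hands us one dirty bit per
block (none of these names exist in QEMU; this only shows the shape of the
algorithm):

    #include <stdint.h>
    #include <unistd.h>

    #define BLOCK_SIZE (64 * 1024)

    /* Copy only the blocks whose dirty bit is set from src_fd to dst_fd,
     * clearing each bit once the block has been copied.
     */
    static int incremental_pass(int src_fd, int dst_fd,
                                uint8_t *dirty, uint64_t nr_blocks)
    {
            static uint8_t buf[BLOCK_SIZE];
            uint64_t i;

            for (i = 0; i < nr_blocks; i++) {
                    off_t off = (off_t)(i * BLOCK_SIZE);

                    if (!(dirty[i / 8] & (1u << (i % 8))))
                            continue;       /* unchanged since last backup */
                    if (pread(src_fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE)
                            return -1;
                    if (pwrite(dst_fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE)
                            return -1;
                    dirty[i / 8] &= ~(1u << (i % 8));
            }
            return 0;
    }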

Thanks,
Zhang Haoyu

 Paolo

Re: [PATCH 0/7] KVM: x86: Emulator fixes

2015-01-26 Thread Paolo Bonzini


On 26/01/2015 08:32, Nadav Amit wrote:
 Sorry for sending patches at the last minute. There is nothing critical in 
 this
 patch-set.  Yet, if you may want to incorporate something in 3.20 -
 specifically 5 (small define mistakes) or 7 (which is somewhat affected by
 recent changes).
 
 Thanks for reviewing the patches.

I'll apply all of them for 3.20, since Linus expects no merge window for
2 weeks.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/11] kvmtool: AArch{32,64}: use KVM_CREATE_DEVICE & co to instantiate the GIC

2015-01-26 Thread Will Deacon
On Fri, Jan 23, 2015 at 04:35:02PM +, Andre Przywara wrote:
 From: Marc Zyngier marc.zyng...@arm.com
 
 As of 3.14, KVM/arm supports the creation/configuration of the GIC through
 a more generic device API, which is now the preferred way to do so.
 
 Plumb the new API in, and allow the old code to be used as a fallback.
 
 [Andre: Rename some functions on the way to differentiate between
 creation and initialisation more clearly.]
 
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  tools/kvm/arm/gic.c|   60 
 
  tools/kvm/arm/include/arm-common/gic.h |2 +-
  tools/kvm/arm/kvm.c|6 ++--
  3 files changed, 57 insertions(+), 11 deletions(-)
 
 diff --git a/tools/kvm/arm/gic.c b/tools/kvm/arm/gic.c
 index 5d8cbe6..ce5f7fa 100644
 --- a/tools/kvm/arm/gic.c
 +++ b/tools/kvm/arm/gic.c
 @@ -7,7 +7,41 @@
   #include <linux/byteorder.h>
   #include <linux/kvm.h>
  
 -int gic__init_irqchip(struct kvm *kvm)
 +static int gic_fd = -1;
 +
 +static int gic__create_device(struct kvm *kvm)
 +{
 + int err;
 + u64 cpu_if_addr = ARM_GIC_CPUI_BASE;
 + u64 dist_addr = ARM_GIC_DIST_BASE;
 + struct kvm_create_device gic_device = {
 + .type   = KVM_DEV_TYPE_ARM_VGIC_V2,
 + };
 + struct kvm_device_attr cpu_if_attr = {
 + .group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
 + .attr   = KVM_VGIC_V2_ADDR_TYPE_CPU,
 + .addr   = (u64)(unsigned long)&cpu_if_addr,
 + };
 + struct kvm_device_attr dist_attr = {
 + .group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
 + .attr   = KVM_VGIC_V2_ADDR_TYPE_DIST,
 + .addr   = (u64)(unsigned long)&dist_addr,
 + };
 +
 + err = ioctl(kvm->vm_fd, KVM_CREATE_DEVICE, &gic_device);
 + if (err)
 + return err;
 +
 + gic_fd = gic_device.fd;
 +
 + err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &cpu_if_attr);
 + if (err)
 + return err;
 +
 + return ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &dist_attr);
 +}
 +
 +static int gic__create_irqchip(struct kvm *kvm)
  {
   int err;
   struct kvm_arm_device_addr gic_addr[] = {
 @@ -23,12 +57,6 @@ int gic__init_irqchip(struct kvm *kvm)
   }
   };
  
 - if (kvm->nrcpus > GIC_MAX_CPUS) {
 - pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
 - kvm->nrcpus, GIC_MAX_CPUS);
 - kvm->nrcpus = GIC_MAX_CPUS;
 - }
 -
   err = ioctl(kvm->vm_fd, KVM_CREATE_IRQCHIP);
   if (err)
   return err;
 @@ -41,6 +69,24 @@ int gic__init_irqchip(struct kvm *kvm)
   return err;
  }
  
 +int gic__create(struct kvm *kvm)
 +{
 + int err;
 +
 + if (kvm->nrcpus > GIC_MAX_CPUS) {
 + pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
 + kvm->nrcpus, GIC_MAX_CPUS);
 + kvm->nrcpus = GIC_MAX_CPUS;
 + }
 +
 + /* Try the new way first, and fallback on legacy method otherwise */
 + err = gic__create_device(kvm);
 + if (err)
 + err = gic__create_irqchip(kvm);

This fallback doesn't look safe to me:

  - gic_fd might remain initialised
  - What does the kernel vgic driver do if you've already done
a successful KVM_CREATE_DEVICE and then try to use the legacy method?

Will
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/11] kvmtool: add command line parameter to instantiate a vGICv3

2015-01-26 Thread Will Deacon
On Fri, Jan 23, 2015 at 04:35:10PM +, Andre Przywara wrote:
 Add the command line parameter --gicv3 to request GICv3 emulation
 in the kernel. Connect that to the already existing GICv3 code.
 
 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  tools/kvm/arm/aarch64/arm-cpu.c|5 -
  .../kvm/arm/aarch64/include/kvm/kvm-config-arch.h  |4 +++-
  tools/kvm/arm/gic.c|   14 ++
  tools/kvm/arm/include/arm-common/kvm-config-arch.h |1 +
  tools/kvm/arm/kvm-cpu.c|2 +-
  tools/kvm/arm/kvm.c|3 ++-
  6 files changed, 25 insertions(+), 4 deletions(-)
 
 diff --git a/tools/kvm/arm/aarch64/arm-cpu.c b/tools/kvm/arm/aarch64/arm-cpu.c
 index a70d6bb..46d6d6a 100644
 --- a/tools/kvm/arm/aarch64/arm-cpu.c
 +++ b/tools/kvm/arm/aarch64/arm-cpu.c
 @@ -12,7 +12,10 @@
  static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
  {
   int timer_interrupts[4] = {13, 14, 11, 10};
 - gic__generate_fdt_nodes(fdt, gic_phandle, KVM_DEV_TYPE_ARM_VGIC_V2);
 + gic__generate_fdt_nodes(fdt, gic_phandle,
 + kvm->cfg.arch.gicv3 ?
 + KVM_DEV_TYPE_ARM_VGIC_V3 :
 + KVM_DEV_TYPE_ARM_VGIC_V2);
   timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
  }
  
 diff --git a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h 
 b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
 index 89860ae..106e52f 100644
 --- a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
 +++ b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
 @@ -3,7 +3,9 @@
  
  #define ARM_OPT_ARCH_RUN(cfg)
 \
  OPT_BOOLEAN('\0', "aarch32", &(cfg)->aarch32_guest, \
 - "Run AArch32 guest"),
 + "Run AArch32 guest"),   \
 + OPT_BOOLEAN('\0', "gicv3", &(cfg)->gicv3,   \
 + "Use a GICv3 interrupt controller in the guest"),

On a GICv3-capable system, why would I *not* want to enable this option?
In other words, could we make this the default behaviour on systems that
support it, and if you need an override then it should be something like
--force-gicv2.

Or am I missing a key piece of the puzzle?

Will
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/11] kvmtool: add command line parameter to instantiate a vGICv3

2015-01-26 Thread Marc Zyngier
On 26/01/15 11:43, Andre Przywara wrote:
 Hi Will,
 
 On 26/01/15 11:30, Will Deacon wrote:
 On Fri, Jan 23, 2015 at 04:35:10PM +, Andre Przywara wrote:
 Add the command line parameter --gicv3 to request GICv3 emulation
 in the kernel. Connect that to the already existing GICv3 code.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  tools/kvm/arm/aarch64/arm-cpu.c|5 -
  .../kvm/arm/aarch64/include/kvm/kvm-config-arch.h  |4 +++-
  tools/kvm/arm/gic.c|   14 ++
  tools/kvm/arm/include/arm-common/kvm-config-arch.h |1 +
  tools/kvm/arm/kvm-cpu.c|2 +-
  tools/kvm/arm/kvm.c|3 ++-
  6 files changed, 25 insertions(+), 4 deletions(-)

 diff --git a/tools/kvm/arm/aarch64/arm-cpu.c 
 b/tools/kvm/arm/aarch64/arm-cpu.c
 index a70d6bb..46d6d6a 100644
 --- a/tools/kvm/arm/aarch64/arm-cpu.c
 +++ b/tools/kvm/arm/aarch64/arm-cpu.c
 @@ -12,7 +12,10 @@
  static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
  {
 int timer_interrupts[4] = {13, 14, 11, 10};
 -   gic__generate_fdt_nodes(fdt, gic_phandle, KVM_DEV_TYPE_ARM_VGIC_V2);
 +   gic__generate_fdt_nodes(fdt, gic_phandle,
 +   kvm->cfg.arch.gicv3 ?
 +   KVM_DEV_TYPE_ARM_VGIC_V3 :
 +   KVM_DEV_TYPE_ARM_VGIC_V2);
 timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
  }
  
 diff --git a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
 index 89860ae..106e52f 100644
 --- a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
 +++ b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
 @@ -3,7 +3,9 @@
  
  #define ARM_OPT_ARCH_RUN(cfg)						\
 	OPT_BOOLEAN('\0', "aarch32", &(cfg)->aarch32_guest,		\
 -		    "Run AArch32 guest"),
 +		    "Run AArch32 guest"),				\
 +	OPT_BOOLEAN('\0', "gicv3", &(cfg)->gicv3,			\
 +		    "Use a GICv3 interrupt controller in the guest"),

 On a GICv3-capable system, why would I *not* want to enable this option?
 In other words, could we make this the default behaviour on systems that
 support it, and if you need an override then it should be something like
 --force-gicv2.
 
 Well, you could have a guest kernel older than 3.17, which does not have
 GICv3 support. In general I consider GICv2 better tested, so I reckon that
 people will only want to use GICv3 emulation if there is a need for it
 (a non-compat GICv3 host or more than 8 VCPUs in the guest). That will
 probably change over time, but for the time being I'd rather keep the
 default at GICv2 emulation.

I think there is slightly more to it. You want the same command-line
options to give you the same result on different platforms (provided that
the HW is available, see below). Changing the default depending on the
platform you're on is not very good for reproducibility.

 Having said that, there could be a fallback in case GICv2 emulation is
 not available. Let me take a look at that.

You could try and pick a GICv3 emulation if v2 is not available, and
probably print a warning in that case.
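
A minimal sketch of that fallback, assuming gic__create_device() grows a type parameter (the version posted here hardcodes KVM_DEV_TYPE_ARM_VGIC_V2, and a real GICv3 path would also have to program redistributor rather than CPU-interface addresses):

	err = gic__create_device(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
	if (err) {
		pr_warning("GICv2 emulation not available, trying GICv3");
		err = gic__create_device(kvm, KVM_DEV_TYPE_ARM_VGIC_V3);
	}
	return err;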

 Also, thinking about the future (ITS emulation), I'd like to replace
 this option with something more generic like --irqchip=.

That's an orthogonal issue, but yes, this is probably better.
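
For illustration, an --irqchip= option could be wired up with a string-to-enum parser; all names below are hypothetical, not code from the series:

enum irqchip_type {
	IRQCHIP_GICV2,
	IRQCHIP_GICV3,
};

static int irqchip_parser(const struct option *opt, const char *arg, int unset)
{
	enum irqchip_type *type = opt->value;

	if (!strcmp(arg, "gicv2"))
		*type = IRQCHIP_GICV2;
	else if (!strcmp(arg, "gicv3"))
		*type = IRQCHIP_GICV3;
	else
		return -1;	/* reject unknown irqchip names */

	return 0;
}

An ITS-capable variant would then only need a new enum value and a parser branch rather than yet another boolean flag.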

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 03/11] kvmtool: AArch{32,64}: use KVM_CREATE_DEVICE & co to instantiate the GIC

2015-01-26 Thread Andre Przywara
On 26/01/15 11:26, Will Deacon wrote:
On Fri, Jan 23, 2015 at 04:35:02PM +0000, Andre Przywara wrote:
 From: Marc Zyngier marc.zyng...@arm.com

 As of 3.14, KVM/arm supports the creation/configuration of the GIC through
 a more generic device API, which is now the preferred way to do so.

 Plumb the new API in, and allow the old code to be used as a fallback.

 [Andre: Rename some functions on the way to differentiate between
 creation and initialisation more clearly.]

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
   tools/kvm/arm/gic.c                    |   60
  tools/kvm/arm/include/arm-common/gic.h |2 +-
  tools/kvm/arm/kvm.c|6 ++--
  3 files changed, 57 insertions(+), 11 deletions(-)

 diff --git a/tools/kvm/arm/gic.c b/tools/kvm/arm/gic.c
 index 5d8cbe6..ce5f7fa 100644
 --- a/tools/kvm/arm/gic.c
 +++ b/tools/kvm/arm/gic.c
 @@ -7,7 +7,41 @@
   #include <linux/byteorder.h>
   #include <linux/kvm.h>
  
 -int gic__init_irqchip(struct kvm *kvm)
 +static int gic_fd = -1;
 +
 +static int gic__create_device(struct kvm *kvm)
 +{
 +int err;
 +u64 cpu_if_addr = ARM_GIC_CPUI_BASE;
 +u64 dist_addr = ARM_GIC_DIST_BASE;
 +struct kvm_create_device gic_device = {
 +.type   = KVM_DEV_TYPE_ARM_VGIC_V2,
 +};
 +struct kvm_device_attr cpu_if_attr = {
 +.group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
 +.attr   = KVM_VGIC_V2_ADDR_TYPE_CPU,
  +	.addr	= (u64)(unsigned long)&cpu_if_addr,
 +};
 +struct kvm_device_attr dist_attr = {
 +.group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
 +.attr   = KVM_VGIC_V2_ADDR_TYPE_DIST,
  +	.addr	= (u64)(unsigned long)&dist_addr,
 +};
 +
  +	err = ioctl(kvm->vm_fd, KVM_CREATE_DEVICE, &gic_device);
 +if (err)
 +return err;
 +
 +gic_fd = gic_device.fd;
 +
  +	err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &cpu_if_attr);
 +if (err)
 +return err;
 +
  +	return ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &dist_attr);
 +}
 +
 +static int gic__create_irqchip(struct kvm *kvm)
  {
  int err;
  struct kvm_arm_device_addr gic_addr[] = {
 @@ -23,12 +57,6 @@ int gic__init_irqchip(struct kvm *kvm)
  }
  };
  
  -	if (kvm->nrcpus > GIC_MAX_CPUS) {
  -		pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
  -			   kvm->nrcpus, GIC_MAX_CPUS);
  -		kvm->nrcpus = GIC_MAX_CPUS;
  -	}
 -
   	err = ioctl(kvm->vm_fd, KVM_CREATE_IRQCHIP);
  if (err)
  return err;
 @@ -41,6 +69,24 @@ int gic__init_irqchip(struct kvm *kvm)
  return err;
  }
  
 +int gic__create(struct kvm *kvm)
 +{
 +int err;
 +
  +	if (kvm->nrcpus > GIC_MAX_CPUS) {
  +		pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
  +			   kvm->nrcpus, GIC_MAX_CPUS);
  +		kvm->nrcpus = GIC_MAX_CPUS;
  +	}
 +
  +	/* Try the new way first, and fall back to the legacy method otherwise */
 +err = gic__create_device(kvm);
 +if (err)
 +err = gic__create_irqchip(kvm);
 
 This fallback doesn't look safe to me:
 
   - gic_fd might remain initialised
   - What does the kernel vgic driver do if you've already done
 a successful KVM_CREATE_DEVICE and then try to use the legacy method?

Good point. I think we need to clean up the device by closing the fd (and
resetting the variable to -1) in case any of the subsequent ioctls
return with an error (e.g. due to unaligned addresses).
I have to check what happens in the kernel in that case, though.

Cheers,
Andre.
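
A sketch of the cleanup described above, closing the device fd before trying the legacy path (whether close() alone fully tears down a half-configured device on the kernel side is exactly the open question):

	err = gic__create_device(kvm);
	if (err) {
		/* Undo a partially successful KVM_CREATE_DEVICE. */
		if (gic_fd >= 0) {
			close(gic_fd);
			gic_fd = -1;
		}
		err = gic__create_irqchip(kvm);
	}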


Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches

2015-01-26 Thread Christoffer Dall
On Wed, Jan 21, 2015 at 06:39:46PM +0000, Marc Zyngier wrote:
 Trying to emulate the behaviour of set/way cache ops is fairly
 pointless, as there are too many ways we can end up missing stuff.
 Also, there are system caches out there that simply ignore
 set/way operations.
 
 So instead of trying to implement them, let's convert it to VA ops,
 and use them as a way to re-enable the trapping of VM ops. That way,
 we can detect the point when the MMU/caches are turned off, and do
 a full VM flush (which is what the guest was trying to do anyway).
 
 This allows a 32bit zImage to boot on the APM thingy, and will
 probably help bootloaders in general.
 
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com

This had some conflicts with dirty page logging.  I fixed it up here,
and also removed some trailing whitespace and mixed spaces/tabs that
patch complained about:

http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes
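
For orientation before the full diff, the mechanism condenses to two small helpers; this is a paraphrased sketch, assuming vcpu_has_cache_enabled() tests the SCTLR C bit:

void kvm_set_way_flush(struct kvm_vcpu *vcpu)
{
	unsigned long hcr = vcpu_get_hcr(vcpu);

	/*
	 * First set/way op while VM trapping is off: flush the whole
	 * VM by VA now, and start trapping VM register writes so we
	 * can catch the MMU/caches being turned off later.
	 */
	if (!(hcr & HCR_TVM)) {
		stage2_flush_vm(vcpu->kvm);
		vcpu_set_hcr(vcpu, hcr | HCR_TVM);
	}
}

void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
{
	bool now_enabled = vcpu_has_cache_enabled(vcpu);

	/* Any cache on/off transition gets a full clean+invalidate. */
	if (now_enabled != was_enabled)
		stage2_flush_vm(vcpu->kvm);

	/* Caches back on: stop trapping VM ops until the next S/W op. */
	if (now_enabled)
		vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) & ~HCR_TVM);
}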

 ---
  arch/arm/include/asm/kvm_emulate.h   | 10 +
  arch/arm/include/asm/kvm_host.h  |  3 --
  arch/arm/include/asm/kvm_mmu.h   |  3 +-
  arch/arm/kvm/arm.c   | 10 -
  arch/arm/kvm/coproc.c| 64 ++
  arch/arm/kvm/coproc_a15.c|  2 +-
  arch/arm/kvm/coproc_a7.c |  2 +-
  arch/arm/kvm/mmu.c   | 70 -
  arch/arm/kvm/trace.h | 39 +++
  arch/arm64/include/asm/kvm_emulate.h | 10 +
  arch/arm64/include/asm/kvm_host.h|  3 --
  arch/arm64/include/asm/kvm_mmu.h |  3 +-
  arch/arm64/kvm/sys_regs.c            | 75 +---
  13 files changed, 155 insertions(+), 139 deletions(-)
 
 diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
 index 66ce176..7b01523 100644
 --- a/arch/arm/include/asm/kvm_emulate.h
 +++ b/arch/arm/include/asm/kvm_emulate.h
 @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
  	vcpu->arch.hcr = HCR_GUEST_MASK;
  }
  
 +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
 +{
  +	return vcpu->arch.hcr;
 +}
 +
 +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
 +{
  +	vcpu->arch.hcr = hcr;
 +}
 +
  static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
  {
   return 1;
 diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
 index 254e065..04b4ea0 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -125,9 +125,6 @@ struct kvm_vcpu_arch {
* Anything that is not used directly from assembly code goes
* here.
*/
 - /* dcache set/way operation pending */
 - int last_pcpu;
 - cpumask_t require_dcache_flush;
  
   /* Don't run the guest on this vcpu */
   bool pause;
 diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
 index 63e0ecc..286644c 100644
 --- a/arch/arm/include/asm/kvm_mmu.h
 +++ b/arch/arm/include/asm/kvm_mmu.h
  @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
  
  #define kvm_virt_to_phys(x)  virt_to_idmap((unsigned long)(x))
  
 -void stage2_flush_vm(struct kvm *kvm);
 +void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
  
  #endif   /* !__ASSEMBLY__ */
  
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 2d6d910..0b0d58a 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
  	vcpu->cpu = cpu;
  	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
  
 - /*
 -  * Check whether this vcpu requires the cache to be flushed on
 -  * this physical CPU. This is a consequence of doing dcache
 -  * operations by set/way on this vcpu. We do it here to be in
 -  * a non-preemptible section.
 -  */
  -	if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
  -		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
 -
   kvm_arm_set_running_vcpu(vcpu);
  }
  
  @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
   ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
  
  	vcpu->mode = OUTSIDE_GUEST_MODE;
  -	vcpu->arch.last_pcpu = smp_processor_id();
   kvm_guest_exit();
   trace_kvm_exit(*vcpu_pc(vcpu));
   /*
 diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
 index 7928dbd..0afcc00 100644
 --- a/arch/arm/kvm/coproc.c
 +++ b/arch/arm/kvm/coproc.c
 @@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu,
   return true;
  }
  
 -/* See note at ARM ARM B1.14.4 */
 +/*
 + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
 + */
  static bool access_dcsw(struct