[PATCH] kvm: Query KVM for available memory slots

2013-11-22 Thread Alex Williamson
KVM reports the number of available memory slots (KVM_CAP_NR_MEMSLOTS)
using the extension interface.  Both x86 and s390 implement this; ARM
and powerpc do not yet enable it.  Convert the static slots array to
be dynamically allocated, supporting more slots when available.
Default to 32 when KVM_CAP_NR_MEMSLOTS is not implemented.  The
motivation for this change is to support more assigned devices, where
memory mapped PCI MMIO BARs typically take one slot each.
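
As a quick hedged illustration of the capability query this relies on
(not part of the patch; a minimal standalone sketch using only the
documented KVM_CHECK_EXTENSION ioctl, mirroring the fallback logic in
kvm_init() below):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int nr_slots;
        int fd = open("/dev/kvm", O_RDWR);

        if (fd < 0) {
            perror("open /dev/kvm");
            return 1;
        }
        /* KVM_CHECK_EXTENSION returns 0 for an unknown capability */
        nr_slots = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_MEMSLOTS);
        if (nr_slots <= 0) {
            nr_slots = 32;  /* the previous static default */
        }
        printf("available memory slots: %d\n", nr_slots);
        return 0;
    }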

Signed-off-by: Alex Williamson 
---
 kvm-all.c |   30 +++++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 4478969..63c4e9b 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -72,7 +72,8 @@ typedef struct kvm_dirty_log KVMDirtyLog;

 struct KVMState
 {
-    KVMSlot slots[32];
+    KVMSlot *slots;
+    int nr_slots;
     int fd;
     int vmfd;
     int coalesced_mmio;
@@ -125,7 +126,7 @@ static KVMSlot *kvm_alloc_slot(KVMState *s)
 {
     int i;

-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+    for (i = 0; i < s->nr_slots; i++) {
         if (s->slots[i].memory_size == 0) {
             return &s->slots[i];
         }
@@ -141,7 +142,7 @@ static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
 {
     int i;

-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+    for (i = 0; i < s->nr_slots; i++) {
         KVMSlot *mem = &s->slots[i];

         if (start_addr == mem->start_addr &&
@@ -163,7 +164,7 @@ static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
     KVMSlot *found = NULL;
     int i;

-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+    for (i = 0; i < s->nr_slots; i++) {
         KVMSlot *mem = &s->slots[i];

         if (mem->memory_size == 0 ||
@@ -185,7 +186,7 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
 {
     int i;

-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+    for (i = 0; i < s->nr_slots; i++) {
         KVMSlot *mem = &s->slots[i];

         if (ram >= mem->ram && ram < mem->ram + mem->memory_size) {
@@ -357,7 +358,7 @@ static int kvm_set_migration_log(int enable)

     s->migration_log = enable;

-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+    for (i = 0; i < s->nr_slots; i++) {
         mem = &s->slots[i];

         if (!mem->memory_size) {
@@ -1383,9 +1384,6 @@ int kvm_init(void)
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     QTAILQ_INIT(&s->kvm_sw_breakpoints);
 #endif
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-        s->slots[i].slot = i;
-    }
     s->vmfd = -1;
     s->fd = qemu_open("/dev/kvm", O_RDWR);
     if (s->fd == -1) {
@@ -1409,6 +1407,19 @@ int kvm_init(void)
         goto err;
     }

+    s->nr_slots = kvm_check_extension(s, KVM_CAP_NR_MEMSLOTS);
+
+    /* If unspecified, use the previous default value */
+    if (!s->nr_slots) {
+        s->nr_slots = 32;
+    }
+
+    s->slots = g_malloc0(s->nr_slots * sizeof(KVMSlot));
+
+    for (i = 0; i < s->nr_slots; i++) {
+        s->slots[i].slot = i;
+    }
+
     /* check the vcpu limits */
     soft_vcpus_limit = kvm_recommended_vcpus(s);
     hard_vcpus_limit = kvm_max_vcpus(s);
@@ -1527,6 +1538,7 @@ err:
     if (s->fd != -1) {
         close(s->fd);
     }
+    g_free(s->slots);
     g_free(s);

     return ret;


Re: [PATCH v3 07/15] KVM: MMU: introduce nulls desc

2013-11-22 Thread Marcelo Tosatti
On Wed, Oct 23, 2013 at 09:29:25PM +0800, Xiao Guangrong wrote:
> It is like a nulls list, and we use the pte-list as the nulls value,
> which helps us detect whether the desc has been moved to another rmap;
> if that happens we re-walk the rmap.
> 
> kvm->slots_lock is held while we do the lockless walk, which prevents
> the rmap from being reused (freeing an rmap requires holding that
> lock), so we cannot see the same nulls value used on different rmaps.
> 
> Signed-off-by: Xiao Guangrong 

How about a simplified lockless walk on the slot while the rmapp entry
contains a single spte? (That should be the common case with
two-dimensional paging.)

That is, grab the lock when finding an rmap with more than one spte in
it (and then keep it locked until the end).
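
(A hypothetical sketch of that idea -- single_spte(), handle_spte_lockless(),
for_each_spte() and handle_spte() are made-up names, not KVM functions;
only kvm->mmu_lock is real:)

    if (single_spte(rmapp)) {
        handle_spte_lockless(rmapp);    /* common case with TDP */
    } else {
        spin_lock(&kvm->mmu_lock);
        for_each_spte(rmapp, sptep)     /* desc chain: walk under lock */
            handle_spte(sptep);
        spin_unlock(&kvm->mmu_lock);
    }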

For example, nothing prevents the lockless walker from moving into some
parent_ptes chain, right?

Also, there is no guarantee of termination (as long as sptes are
deleted with the correct timing).  BTW, I can't see any guarantee of
termination for rculist nulls either (a writer can race with a lockless
reader indefinitely, restarting the lockless walk every time).


[PATCH] powerpc/kvm/booke: Fix build break due to stack frame size warning

2013-11-22 Thread Scott Wood
Commit ce11e48b7fdd256ec68b932a89b397a790566031 ("KVM: PPC: E500: Add
userspace debug stub support") added "struct thread_struct" to the
stack of kvmppc_vcpu_run().  thread_struct is 1152 bytes on my build,
compared to 48 bytes for the recently-introduced "struct debug_reg".
Use the latter instead.

This fixes the following error:

cc1: warnings being treated as errors
arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run':
arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger than 1024 bytes
make[2]: *** [arch/powerpc/kvm/booke.o] Error 1
make[1]: *** [arch/powerpc/kvm] Error 2
make[1]: *** Waiting for unfinished jobs
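
(For what it's worth, the size gap quoted above can be sanity-checked
with a one-off debug line like the following -- illustrative only, and
the numbers vary by config:)

    pr_info("thread_struct: %zu bytes, debug_reg: %zu bytes\n",
            sizeof(struct thread_struct), sizeof(struct debug_reg));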

Signed-off-by: Scott Wood 
Cc: Bharat Bhushan 
---
Build tested only.  Bharat, please test.

 arch/powerpc/include/asm/switch_to.h |  2 +-
 arch/powerpc/kernel/process.c        | 32 ++++++++++++++++----------------
 arch/powerpc/kvm/booke.c             | 12 ++++++------
 3 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 9ee1261..aace905 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -35,7 +35,7 @@ extern void giveup_vsx(struct task_struct *);
 extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
-extern void switch_booke_debug_regs(struct thread_struct *new_thread);
+extern void switch_booke_debug_regs(struct debug_reg *new_debug);
 
 #ifndef CONFIG_SMP
 extern void discard_lazy_cpu_state(void);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 3386d8a..4a96556 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -339,7 +339,7 @@ static void set_debug_reg_defaults(struct thread_struct *thread)
 #endif
 }
 
-static void prime_debug_regs(struct thread_struct *thread)
+static void prime_debug_regs(struct debug_reg *debug)
 {
/*
 * We could have inherited MSR_DE from userspace, since
@@ -348,22 +348,22 @@ static void prime_debug_regs(struct thread_struct *thread)
 */
mtmsr(mfmsr() & ~MSR_DE);
 
-   mtspr(SPRN_IAC1, thread->debug.iac1);
-   mtspr(SPRN_IAC2, thread->debug.iac2);
+   mtspr(SPRN_IAC1, debug->iac1);
+   mtspr(SPRN_IAC2, debug->iac2);
 #if CONFIG_PPC_ADV_DEBUG_IACS > 2
-   mtspr(SPRN_IAC3, thread->debug.iac3);
-   mtspr(SPRN_IAC4, thread->debug.iac4);
+   mtspr(SPRN_IAC3, debug->iac3);
+   mtspr(SPRN_IAC4, debug->iac4);
 #endif
-   mtspr(SPRN_DAC1, thread->debug.dac1);
-   mtspr(SPRN_DAC2, thread->debug.dac2);
+   mtspr(SPRN_DAC1, debug->dac1);
+   mtspr(SPRN_DAC2, debug->dac2);
 #if CONFIG_PPC_ADV_DEBUG_DVCS > 0
-   mtspr(SPRN_DVC1, thread->debug.dvc1);
-   mtspr(SPRN_DVC2, thread->debug.dvc2);
+   mtspr(SPRN_DVC1, debug->dvc1);
+   mtspr(SPRN_DVC2, debug->dvc2);
 #endif
-   mtspr(SPRN_DBCR0, thread->debug.dbcr0);
-   mtspr(SPRN_DBCR1, thread->debug.dbcr1);
+   mtspr(SPRN_DBCR0, debug->dbcr0);
+   mtspr(SPRN_DBCR1, debug->dbcr1);
 #ifdef CONFIG_BOOKE
-   mtspr(SPRN_DBCR2, thread->debug.dbcr2);
+   mtspr(SPRN_DBCR2, debug->dbcr2);
 #endif
 }
 /*
@@ -371,11 +371,11 @@ static void prime_debug_regs(struct thread_struct *thread)
  * debug registers, set the debug registers from the values
  * stored in the new thread.
  */
-void switch_booke_debug_regs(struct thread_struct *new_thread)
+void switch_booke_debug_regs(struct debug_reg *new_debug)
 {
if ((current->thread.debug.dbcr0 & DBCR0_IDM)
-   || (new_thread->debug.dbcr0 & DBCR0_IDM))
-   prime_debug_regs(new_thread);
+   || (new_debug->dbcr0 & DBCR0_IDM))
+   prime_debug_regs(new_debug);
 }
 EXPORT_SYMBOL_GPL(switch_booke_debug_regs);
 #else  /* !CONFIG_PPC_ADV_DEBUG_REGS */
@@ -683,7 +683,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
-   switch_booke_debug_regs(&new->thread);
+   switch_booke_debug_regs(&new->thread.debug);
 #else
 /*
  * For PPC_BOOK3S_64, we use the hw-breakpoint interfaces that would
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 53e65a2..0591e05 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -681,7 +681,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
int ret, s;
-   struct thread_struct thread;
+   struct debug_reg debug;
 #ifdef CONFIG_PPC_FPU
struct thread_fp_state fp;
int fpexc_mode;
@@ -723,9 +723,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 #endif
 
/* Switch to guest debug context */
-   thread.debug = vcpu->arch.shadow_dbg_reg;
-   switch_booke_debug_regs(&thread);
-   thread.debug = current->thread.debug;
+   debug = vcpu->arch.shadow_dbg_reg;
+   switch_booke_debug_regs(&debug);
+   debug = current->thread.debug;

[PATCH 2/3] arm/arm64: KVM: vgic: Bugfix in vgic_dispatch_sgi

2013-11-22 Thread Christoffer Dall
When software writes to the GICD_SGIR with the TargetListFilter field
set to 0, we should use the target_cpus mask as the VCPU destination
mask for the SGI.  However, because we were falling through to the next
case due to a missing break, we would always send the SGI to all cores
other than ourselves.  This does not change anything on a dual-core
system (unless a core is IPI'ing itself), but would look quite bad on
systems with more cores.
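
(For reference, a sketch of the intended GICD_SGIR dispatch with the
break in place; the TargetListFilter/CPUTargetList layout is from the
GICv2 spec, and nrcpus/vcpu_id are assumed from the surrounding
function:)

    int target_filter = (reg >> 24) & 0x3;  /* TargetListFilter */
    u8 target_cpus = (reg >> 16) & 0xff;    /* CPUTargetList */

    switch (target_filter) {
    case 0: /* forward to the CPUs named in CPUTargetList */
        if (!target_cpus)
            return;
        break;  /* the break this patch adds */
    case 1: /* forward to all CPUs except the requester */
        target_cpus = ((1 << nrcpus) - 1) & ~(1 << vcpu_id) & 0xff;
        break;
    case 2: /* forward only to the requesting CPU */
        target_cpus = 1 << vcpu_id;
        break;
    }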

Cc: Haibin Wang 
Reported-by: Haibin Wang 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 6699ed9..ecee766 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -751,7 +751,7 @@ static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 
reg)
case 0:
if (!target_cpus)
return;
-
+   break;
case 1:
target_cpus = ((1 << nrcpus) - 1) & ~(1 << vcpu_id) & 0xff;
break;
-- 
1.8.4.3


[PATCH 1/3] arm/arm64: KVM: vgic: Bugfix in handle_mmio_cfg_reg

2013-11-22 Thread Christoffer Dall
We shift the offset right by 1 bit because we pretend the register
access is for a register packed with 1 bit per setting and not 2 bits
like the hardware.  However, after we expand the emulated register into
the layout of the real hardware register, we need to use the hardware
offset for accessing the register.  Adjust the code accordingly.
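
(Illustrative only -- a helper in the spirit of vgic_cfg_expand, under
the assumption that the upper bit of each 2-bit GICD_ICFGRn field holds
the edge/level setting:)

    static u32 cfg_expand(u16 packed)
    {
        u32 expanded = 0;
        int i;

        /* 1 bit per IRQ in, 2 bits per IRQ out (hardware layout) */
        for (i = 0; i < 16; i++) {
            if (packed & (1 << i))
                expanded |= 2u << (i * 2);  /* sets bit 2*i+1 */
        }
        return expanded;
    }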

Cc: Haibin Wang 
Reported-by: Haibin Wang 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 685fc72..6699ed9 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -553,7 +553,7 @@ static bool handle_mmio_cfg_reg(struct kvm_vcpu *vcpu,
val = *reg & 0x;
 
val = vgic_cfg_expand(val);
-   vgic_reg_access(mmio, &val, offset,
+   vgic_reg_access(mmio, &val, offset << 1,
ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
if (mmio->is_write) {
if (offset < 4) {
-- 
1.8.4.3


[PATCH 3/3] arm/arm64: KVM: vgic: Use non-atomic bitops

2013-11-22 Thread Christoffer Dall
Change the use of atomic bitops to the non-atomic versions.  All these
operations are protected by a spinlock, so using atomic operations is
simply a waste of cycles.

The test_and_clear_bit change saves us ~500 cycles per world switch
on TC2 on average.

Changing the remaining bitops to their non-atomic versions saves us a
further ~50 cycles, measured over 100 repetitions of the average
world-switch time across ~120,000 world switches.
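
(A minimal sketch of the pattern, assuming the caller already holds the
distributor lock as the vgic code does; `pending' stands in for any of
the bitmaps touched below:)

    spin_lock(&dist->lock);
    /* writers are serialized by the lock, so no locked RMW is needed */
    __set_bit(irq, pending);
    spin_unlock(&dist->lock);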

Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index ecee766..8f52d41 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -128,9 +128,9 @@ static void vgic_bitmap_set_irq_val(struct vgic_bitmap *x, 
int cpuid,
}
 
if (val)
-   set_bit(irq, reg);
+   __set_bit(irq, reg);
else
-   clear_bit(irq, reg);
+   __clear_bit(irq, reg);
 }
 
 static unsigned long *vgic_bitmap_get_cpu_map(struct vgic_bitmap *x, int cpuid)
@@ -219,19 +219,19 @@ static void vgic_dist_irq_clear(struct kvm_vcpu *vcpu, 
int irq)
 static void vgic_cpu_irq_set(struct kvm_vcpu *vcpu, int irq)
 {
if (irq < VGIC_NR_PRIVATE_IRQS)
-   set_bit(irq, vcpu->arch.vgic_cpu.pending_percpu);
+   __set_bit(irq, vcpu->arch.vgic_cpu.pending_percpu);
else
-   set_bit(irq - VGIC_NR_PRIVATE_IRQS,
-   vcpu->arch.vgic_cpu.pending_shared);
+   __set_bit(irq - VGIC_NR_PRIVATE_IRQS,
+ vcpu->arch.vgic_cpu.pending_shared);
 }
 
 static void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
 {
if (irq < VGIC_NR_PRIVATE_IRQS)
-   clear_bit(irq, vcpu->arch.vgic_cpu.pending_percpu);
+   __clear_bit(irq, vcpu->arch.vgic_cpu.pending_percpu);
else
-   clear_bit(irq - VGIC_NR_PRIVATE_IRQS,
- vcpu->arch.vgic_cpu.pending_shared);
+   __clear_bit(irq - VGIC_NR_PRIVATE_IRQS,
+   vcpu->arch.vgic_cpu.pending_shared);
 }
 
 static u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
@@ -466,9 +466,9 @@ static void vgic_set_target_reg(struct kvm *kvm, u32 val, 
int irq)
kvm_for_each_vcpu(c, vcpu, kvm) {
	bmap = vgic_bitmap_get_shared_map(&dist->irq_spi_target[c]);
if (c == target)
-   set_bit(irq + i, bmap);
+   __set_bit(irq + i, bmap);
else
-   clear_bit(irq + i, bmap);
+   __clear_bit(irq + i, bmap);
}
}
 }
@@ -812,14 +812,14 @@ static void vgic_update_state(struct kvm *kvm)
int c;
 
if (!dist->enabled) {
-   set_bit(0, &dist->irq_pending_on_cpu);
+   __set_bit(0, &dist->irq_pending_on_cpu);
return;
}
 
kvm_for_each_vcpu(c, vcpu, kvm) {
if (compute_pending_for_cpu(vcpu)) {
pr_debug("CPU%d has pending interrupts\n", c);
-   set_bit(c, &dist->irq_pending_on_cpu);
+   __set_bit(c, &dist->irq_pending_on_cpu);
}
}
 }
@@ -848,7 +848,7 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
 
if (!vgic_irq_is_enabled(vcpu, irq)) {
vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY;
-   clear_bit(lr, vgic_cpu->lr_used);
+   __clear_bit(lr, vgic_cpu->lr_used);
vgic_cpu->vgic_lr[lr] &= ~GICH_LR_STATE;
if (vgic_irq_is_active(vcpu, irq))
vgic_irq_clear_active(vcpu, irq);
@@ -893,7 +893,7 @@ static bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 
sgi_source_id, int irq)
kvm_debug("LR%d allocated for IRQ%d %x\n", lr, irq, sgi_source_id);
vgic_cpu->vgic_lr[lr] = MK_LR_PEND(sgi_source_id, irq);
vgic_cpu->vgic_irq_lr_map[irq] = lr;
-   set_bit(lr, vgic_cpu->lr_used);
+   __set_bit(lr, vgic_cpu->lr_used);
 
if (!vgic_irq_is_edge(vcpu, irq))
vgic_cpu->vgic_lr[lr] |= GICH_LR_EOI;
@@ -912,7 +912,7 @@ static bool vgic_queue_sgi(struct kvm_vcpu *vcpu, int irq)
 
for_each_set_bit(c, &sources, VGIC_MAX_CPUS) {
if (vgic_queue_irq(vcpu, c, irq))
-   clear_bit(c, &sources);
+   __clear_bit(c, &sources);
}
 
dist->irq_sgi_sources[vcpu_id][irq] = sources;
@@ -920,7 +920,7 @@ static bool vgic_queue_sgi(struct kvm_vcpu *vcpu, int irq)
/*
 * If the sources bitmap has been cleared it means that we
 * could queue all the SGIs onto link registers (see the
-* clear_bit above), and therefore we are done with them in
+* __clear_bit above), and therefore we are done with them in

[PATCH 0/3] arm/arm64: KVM: vgic: Various bugfixes and improvements

2013-11-22 Thread Christoffer Dall
This small series contains two initial bugfixes and a performance
optimization that reduces world-switch cost slightly in the vgic
handling code.

Applies to kvm-arm-next.

Christoffer Dall (3):
  arm/arm64: KVM: vgic: Bugfix in handle_mmio_cfg_reg
  arm/arm64: KVM: vgic: Bugfix in vgic_dispatch_sgi
  arm/arm64: KVM: vgic: Use non-atomic bitops

 virt/kvm/arm/vgic.c | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

-- 
1.8.4.3
