Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-02-24 Thread Avi Kivity

On 02/23/2011 11:46 PM, Alex Williamson wrote:


  But kvm_arch_flush_shadow() takes mmu_lock currently, so that needs
  fixing.

Hmm, I tried to follow the example in the !npages path just above this
that does:

rcu_assign_pointer()
synchronize_srcu_expedited()
kvm_arch_flush_shadow()

Do we have an existing issue there with mmu_lock?  Thanks,



It's the issue I pointed out - after rcu_assign_pointer and before 
kvm_arch_flush_shadow() we can still get shadow pages created, so we 
have a mix of direct and indirect bitmap.  Either my solution (recording 
whether the bitmap is direct or not in kvm_mmu_page) or Marcelo's 
(preventing shadow instantiation during this period) should work.
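
As a minimal standalone C sketch of the first option (field and helper
names are illustrative, not code from an actual patch):

    #include <stdbool.h>

    /* Each shadow page records which representation its slot bitmap uses,
     * so a page built before the slot count grew is never misread. */
    struct mmu_page_sketch {
            bool bitmap_is_direct;
            union {
                    unsigned long direct_bitmap;    /* enough for <= 64 slots */
                    unsigned long *indirect_bitmap; /* allocated separately */
            };
    };

    /* Readers go through one accessor instead of assuming a layout. */
    static unsigned long *slot_bitmap_of(struct mmu_page_sketch *sp)
    {
            return sp->bitmap_is_direct ? &sp->direct_bitmap
                                        : sp->indirect_bitmap;
    }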


Hmm, your patch dereferences kvm->memslots without rcu_dereference().  
That's a mortal (as in oops) sin.  Note that sticking an 
rcu_dereference() in blindly may or may not work - we might end up using 
information from two different generations of kvm->memslots, and using 
inconsistent data.




Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-02-24 Thread Avi Kivity

On 02/24/2011 02:34 PM, Avi Kivity wrote:


Hmm, your patch dereferences kvm->memslots without rcu_dereference().  
That's a mortal (as in oops) sin.


Sorry, that's wrong, all those places are under ->slots_lock.

  Note that sticking an rcu_dereference() in blindly may or may not work 
- we might end up using information from two different generations of 
kvm->memslots, and using inconsistent data.




This may still hold.



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-02-24 Thread Alex Williamson
On Thu, 2011-02-24 at 14:34 +0200, Avi Kivity wrote:
 On 02/23/2011 11:46 PM, Alex Williamson wrote:
  
But kvm_arch_flush_shadow() takes mmu_lock currently, so that needs
fixing.
 
  Hmm, I tried to follow the example in the !npages path just above this
  that does:
 
  rcu_assign_pointer()
  synchronize_srcu_expedited()
  kvm_arch_flush_shadow()
 
  Do we have an existing issue there with mmu_lock?  Thanks,
 
 
 It's the issue I pointed out - after rcu_assign_pointer and before 
 kvm_arch_flush_shadow() we can still get shadow pages created, so we 
 have a mix of direct and indirect bitmap.  Either my solution (recording 
 whether the bitmap is direct or not in kvm_mmu_page) or Marcelo's 
 (preventing shadow instantiation during this period) should work.

Thanks, I understand better now.

Alex



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-02-23 Thread Alex Williamson
On Mon, 2011-01-31 at 17:18 -0200, Marcelo Tosatti wrote:
 On Mon, Jan 24, 2011 at 10:37:56PM -0700, Alex Williamson wrote:
  On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
   I'll look at how we might be
   able to allocate slots on demand.  Thanks,
  
  Here's a first cut just to see if this looks agreeable.  This allows the
  slot array to grow on demand.  This works with current userspace, as
  well as userspace trivially modified to double KVMState.slots and
  hotplugging enough pci-assign devices to exceed the previous limit (w/ &
  w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
  the right path?  If so, I'll work on removing the fixed limit from
  userspace next.  Thanks,
  
  Alex
  
  
  kvm: Allow memory slot array to grow on demand
  
  Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
  to grow on demand.  Private slots are now allocated at the
  front instead of the end.  Only x86 seems to use private slots,
  so this is now zero for all other archs.  The memslots pointer
  is already updated using rcu, so changing the size of the
  array when it's replaced is straightforward.  x86 also keeps
  a bitmap of slots used by a kvm_mmu_page, which requires a
  shadow tlb flush whenever we increase the number of slots.
  This forces the pages to be rebuilt with the new bitmap size.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
  
   arch/ia64/include/asm/kvm_host.h|4 --
   arch/ia64/kvm/kvm-ia64.c|2 +
   arch/powerpc/include/asm/kvm_host.h |3 --
   arch/s390/include/asm/kvm_host.h|3 --
   arch/x86/include/asm/kvm_host.h |3 +-
   arch/x86/include/asm/vmx.h  |6 ++-
   arch/x86/kvm/mmu.c  |7 +++-
   arch/x86/kvm/x86.c  |6 ++-
   include/linux/kvm_host.h|7 +++-
   virt/kvm/kvm_main.c |   65 
  ---
   10 files changed, 63 insertions(+), 43 deletions(-)
[snip]
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index fd67bcd..32f023c 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
[snip]
  @@ -752,12 +753,19 @@ skip_lpage:
   
  if (!npages) {
  r = -ENOMEM;
  -   slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
  +
  +   nmemslots = (mem->slot >= kvm->memslots->nmemslots) ?
  +   mem->slot + 1 : kvm->memslots->nmemslots;
  +
  +   slots = kzalloc(sizeof(struct kvm_memslots) +
  +   nmemslots * sizeof(struct kvm_memory_slot),
  +   GFP_KERNEL);
  if (!slots)
  goto out_free;
  -   memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
  -   if (mem->slot >= slots->nmemslots)
  -   slots->nmemslots = mem->slot + 1;
  +   memcpy(slots, kvm->memslots,
  +  sizeof(struct kvm_memslots) + kvm->memslots->nmemslots *
  +  sizeof(struct kvm_memory_slot));
  +   slots->nmemslots = nmemslots;
 
 Other than the upper limit, this should disallow increasing the number of
 slots in case the new slot is being deleted (npages == 0), so none of
 these additions to the !npages case should be necessary.

The existing code currently allows adding a slot with !npages.  I'll
create a small separate patch to make this behavioral change.

 Also, must disallow shrinking the number of slots.

Right, we only grow, never shrink.

  slots->generation++;
  slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
   
  @@ -787,12 +795,21 @@ skip_lpage:
  }
   
  r = -ENOMEM;
  -   slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
  +
  +   if (mem->slot >= kvm->memslots->nmemslots) {
  +   nmemslots = mem->slot + 1;
  +   flush = true;
  +   } else
  +   nmemslots = kvm->memslots->nmemslots;
  +
  +   slots = kzalloc(sizeof(struct kvm_memslots) +
  +   nmemslots * sizeof(struct kvm_memory_slot),
  +   GFP_KERNEL);
  if (!slots)
  goto out_free;
  -   memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
  -   if (mem->slot >= slots->nmemslots)
  -   slots->nmemslots = mem->slot + 1;
  +   memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots) +
  +  kvm->memslots->nmemslots * sizeof(struct kvm_memory_slot));
  +   slots->nmemslots = nmemslots;
  slots->generation++;
   
  /* actual memory is freed via old in kvm_free_physmem_slot below */
  @@ -808,6 +825,9 @@ skip_lpage:
  rcu_assign_pointer(kvm->memslots, slots);
 
 It should be: 
 
 spin_lock(kvm->mmu_lock)
 rcu_assign_pointer()
 kvm_arch_flush_shadow()
 spin_unlock(kvm->mmu_lock)
 
 But kvm_arch_flush_shadow() takes mmu_lock currently, so that needs
 fixing. 

Hmm, I tried to follow the example in the !npages path just above this
that does:

rcu_assign_pointer()
synchronize_srcu_expedited()
kvm_arch_flush_shadow()

Do we have an existing issue there with mmu_lock?  Thanks,
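
As a hedged sketch, the ordering Marcelo suggests would look something
like this, assuming kvm_arch_flush_shadow() is first split so that an
inner variant can run with mmu_lock already held (__kvm_arch_flush_shadow
is an illustrative name, not an existing symbol):

    #include <linux/kvm_host.h>

    /* assumed split-out variant of kvm_arch_flush_shadow() that expects
     * mmu_lock to be held; illustrative, not an existing symbol */
    void __kvm_arch_flush_shadow(struct kvm *kvm);

    static void install_slots_and_flush(struct kvm *kvm,
                                        struct kvm_memslots *slots)
    {
            spin_lock(&kvm->mmu_lock);
            rcu_assign_pointer(kvm->memslots, slots);
            __kvm_arch_flush_shadow(kvm);   /* runs with mmu_lock held */
            spin_unlock(&kvm->mmu_lock);

            /* readers of the old memslots drain before the caller frees them */
            synchronize_srcu_expedited(&kvm->srcu);
    }

Holding mmu_lock across the pointer swap and the flush closes the window
in which new shadow pages could be instantiated against the old slot count.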

Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-31 Thread Marcelo Tosatti
On Mon, Jan 24, 2011 at 10:37:56PM -0700, Alex Williamson wrote:
 On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
  I'll look at how we might be
  able to allocate slots on demand.  Thanks,
 
 Here's a first cut just to see if this looks agreeable.  This allows the
 slot array to grow on demand.  This works with current userspace, as
 well as userspace trivially modified to double KVMState.slots and
 hotplugging enough pci-assign devices to exceed the previous limit (w/ &
 w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
 the right path?  If so, I'll work on removing the fixed limit from
 userspace next.  Thanks,
 
 Alex
 
 
 kvm: Allow memory slot array to grow on demand
 
 Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
 to grow on demand.  Private slots are now allocated at the
 front instead of the end.  Only x86 seems to use private slots,
 so this is now zero for all other archs.  The memslots pointer
 is already updated using rcu, so changing the size of the
 array when it's replaced is straightforward.  x86 also keeps
 a bitmap of slots used by a kvm_mmu_page, which requires a
 shadow tlb flush whenever we increase the number of slots.
 This forces the pages to be rebuilt with the new bitmap size.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
  arch/ia64/include/asm/kvm_host.h|4 --
  arch/ia64/kvm/kvm-ia64.c|2 +
  arch/powerpc/include/asm/kvm_host.h |3 --
  arch/s390/include/asm/kvm_host.h|3 --
  arch/x86/include/asm/kvm_host.h |3 +-
  arch/x86/include/asm/vmx.h  |6 ++-
  arch/x86/kvm/mmu.c  |7 +++-
  arch/x86/kvm/x86.c  |6 ++-
  include/linux/kvm_host.h|7 +++-
  virt/kvm/kvm_main.c |   65 
 ---
  10 files changed, 63 insertions(+), 43 deletions(-)
 
 
 diff --git a/arch/ia64/include/asm/kvm_host.h 
 b/arch/ia64/include/asm/kvm_host.h
 index 2689ee5..11d0ab2 100644
 --- a/arch/ia64/include/asm/kvm_host.h
 +++ b/arch/ia64/include/asm/kvm_host.h
 @@ -23,10 +23,6 @@
  #ifndef __ASM_KVM_HOST_H
  #define __ASM_KVM_HOST_H
  
 -#define KVM_MEMORY_SLOTS 32
 -/* memory slots that does not exposed to userspace */
 -#define KVM_PRIVATE_MEM_SLOTS 4
 -
  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
  
  /* define exit reasons from vmm to kvm*/
 diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
 index 70d224d..f1adda2 100644
 --- a/arch/ia64/kvm/kvm-ia64.c
 +++ b/arch/ia64/kvm/kvm-ia64.c
 @@ -1814,7 +1814,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
   mutex_lock(&kvm->slots_lock);
  
   r = -EINVAL;
 - if (log->slot >= KVM_MEMORY_SLOTS)
 + if (log->slot >= kvm->memslots->nmemslots)
   goto out;
  
   memslot = &kvm->memslots->memslots[log->slot];
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index bba3b9b..dc80057 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -29,9 +29,6 @@
  #include <asm/kvm_asm.h>
  
  #define KVM_MAX_VCPUS 1
 -#define KVM_MEMORY_SLOTS 32
 -/* memory slots that does not exposed to userspace */
 -#define KVM_PRIVATE_MEM_SLOTS 4
  
  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
  
 diff --git a/arch/s390/include/asm/kvm_host.h 
 b/arch/s390/include/asm/kvm_host.h
 index cef7dbf..92a964c 100644
 --- a/arch/s390/include/asm/kvm_host.h
 +++ b/arch/s390/include/asm/kvm_host.h
 @@ -20,9 +20,6 @@
  #include <asm/cpu.h>
  
  #define KVM_MAX_VCPUS 64
 -#define KVM_MEMORY_SLOTS 32
 -/* memory slots that does not exposed to userspace */
 -#define KVM_PRIVATE_MEM_SLOTS 4
  
  struct sca_entry {
   atomic_t scn;
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index ffd7f8d..df1382c 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -27,7 +27,6 @@
  #include <asm/msr-index.h>
  
  #define KVM_MAX_VCPUS 64
 -#define KVM_MEMORY_SLOTS 32
  /* memory slots that does not exposed to userspace */
  #define KVM_PRIVATE_MEM_SLOTS 4
  
 @@ -207,7 +206,7 @@ struct kvm_mmu_page {
* One bit set per slot which has memory
* in this shadow page.
*/
 - DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
 + unsigned long *slot_bitmap;
   bool multimapped; /* More than one parent_pte? */
   bool unsync;
   int root_count;  /* Currently serving as active root */
 diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
 index 84471b8..7fd8c89 100644
 --- a/arch/x86/include/asm/vmx.h
 +++ b/arch/x86/include/asm/vmx.h
 @@ -370,9 +370,9 @@ enum vmcs_field {
  
  #define AR_RESERVD_MASK 0xfffe0f00
  
 -#define TSS_PRIVATE_MEMSLOT  (KVM_MEMORY_SLOTS + 0)
 -#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (KVM_MEMORY_SLOTS + 1)
 -#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT   (KVM_MEMORY_SLOTS + 2)
 +#define 

Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-27 Thread Avi Kivity

On 01/26/2011 02:08 PM, Michael S. Tsirkin wrote:

  
  I just mean that once you fault you map sptes and then you can use them
  without exits.  mmio will cause exits each time. Right?

  The swapper scanning sptes, ksmd, khugepaged, and swapping can all
  cause a page to be unmapped.  Though it should certainly happen with
  a much lower frequency than mmio.

Right. That's why I say that sorting by size might not be optimal.
Maybe a cache ...


Why would it not be optimal?

If you have 16GB RAM in two slots and a few megabytes here and there 
scattered in some slots, you have three orders of magnitude to spare.




Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-27 Thread Michael S. Tsirkin
On Thu, Jan 27, 2011 at 11:21:47AM +0200, Avi Kivity wrote:
 On 01/26/2011 02:08 PM, Michael S. Tsirkin wrote:
   
   I just mean that once you fault you map sptes and then you can use them
   without exits.  mmio will cause exits each time. Right?
 
   The swapper scanning sptes, ksmd, khugepaged, and swapping can all
   cause a page to be unmapped.  Though it should certainly happen with
   a much lower frequency than mmio.
 
 Right. That's why I say that sorting by size might not be optimal.
 Maybe a cache ...
 
 Why would it not be optimal?
 
 If you have 16GB RAM in two slots and a few megabytes here and there
 scattered in some slots, you have three orders of magnitude to
 spare.

Yes but you might not be accessing all that RAM.
Maybe your workset is just tens of megabytes.



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-27 Thread Avi Kivity

On 01/27/2011 11:26 AM, Michael S. Tsirkin wrote:

  Right. That's why I say that sorting by size might not be optimal.
  Maybe a cache ...

  Why would it not be optimal?

  If you have 16GB RAM in two slots and a few megabytes here and there
  scattered in some slots, you have three orders of magnitude to
  spare.

Yes but you might not be accessing all that RAM.
Maybe your workset is just tens of megabytes.


Great, then you fault it in and never touch the slots array again.



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-27 Thread Michael S. Tsirkin
On Thu, Jan 27, 2011 at 11:26:19AM +0200, Michael S. Tsirkin wrote:
 On Thu, Jan 27, 2011 at 11:21:47AM +0200, Avi Kivity wrote:
  On 01/26/2011 02:08 PM, Michael S. Tsirkin wrote:

I just mean that once you fault you map sptes and then you can use them
without exits.  mmio will cause exits each time. Right?
  
The swapper scanning sptes, ksmd, khugepaged, and swapping can all
cause a page to be unmapped.  Though it should certainly happen with
a much lower frequency than mmio.
  
  Right. That's why I say that sorting by size might not be optimal.
  Maybe a cache ...
  
  Why would it not be optimal?
  
  If you have 16GB RAM in two slots and a few megabytes here and there
  scattered in some slots, you have three orders of magnitude to
  spare.
 
 Yes but you might not be accessing all that RAM.
 Maybe your workset is just tens of megabytes.

Anyway, what I am saying is that this is all just guesswork.
Some kind of cache sounds, to me, like a better
approach if we have a ton of slots.




Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-27 Thread Michael S. Tsirkin
On Thu, Jan 27, 2011 at 11:28:12AM +0200, Avi Kivity wrote:
 On 01/27/2011 11:26 AM, Michael S. Tsirkin wrote:
   Right. That's why I say that sorting by size might not be optimal.
   Maybe a cache ...
 
   Why would it not be optimal?
 
   If you have 16GB RAM in two slots and a few megabytes here and there
  scattered in some slots, you have three orders of magnitude to
   spare.
 
 Yes but you might not be accessing all that RAM.
 Maybe your workset is just tens of megabytes.
 
 Great, then you fault it in and never touch the slots array again.

Yep. Except for emulated mmio :)



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-27 Thread Avi Kivity

On 01/27/2011 11:29 AM, Michael S. Tsirkin wrote:

On Thu, Jan 27, 2011 at 11:28:12AM +0200, Avi Kivity wrote:
  On 01/27/2011 11:26 AM, Michael S. Tsirkin wrote:
 Right. That's why I say that sorting by size might not be optimal.
 Maybe a cache ...
  
 Why would it not be optimal?
  
 If you have 16GB RAM in two slots and a few megabytes here and there
 scattered in some slots, you have three orders of magnitude to
 spare.
  
  Yes but you might not be accessing all that RAM.
  Maybe your workset is just tens of megabytes.

  Great, then you fault it in and never touch the slots array again.

Yep. Except for emulated mmio :)



Which is why I want to cache the result of a negative lookup in the spte.
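
As a small self-contained sketch of that idea (the bit choice and the
helpers are assumptions, not KVM's actual spte encoding):

    #include <stdbool.h>
    #include <stdint.h>

    #define SPTE_PRESENT    (1ull << 0)
    #define SPTE_MMIO_HINT  (1ull << 62)  /* assumed software-available bit */

    /* Set once the slot scan has missed; later faults on this gfn skip
     * the scan and go straight to mmio emulation. */
    static inline uint64_t mark_mmio_spte(void)
    {
            return SPTE_MMIO_HINT;  /* note: not present */
    }

    static inline bool spte_is_mmio(uint64_t spte)
    {
            return !(spte & SPTE_PRESENT) && (spte & SPTE_MMIO_HINT);
    }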



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-26 Thread Jan Kiszka
On 2011-01-25 20:13, Alex Williamson wrote:
 On Tue, 2011-01-25 at 17:35 +0100, Jan Kiszka wrote:
 On 2011-01-25 15:53, Avi Kivity wrote:
 On 01/25/2011 04:41 PM, Alex Williamson wrote:
  
  
kvm: Allow memory slot array to grow on demand
  
Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
to grow on demand.  Private slots are now allocated at the
front instead of the end.  Only x86 seems to use private slots,

  Hmm, doesn't current user space expect slots 8..11 to be the private
  ones and wouldn't it cause troubles if slots 0..3 are suddenly
 reserved?

 The private slots aren't currently visible to userspace, they're
 actually slots 32..35.  The patch automatically increments user passed
 slot ids so userspace has its own zero-based view of the array.
 Frankly, I don't understand why userspace reserves slots 8..11, is this
 compatibility with older kernel implementations?

 I think so.  I believe these kernel versions are too old now to matter,
 but of course I can't be sure.

 Will check and enable those 4 additional slots for user space if we no
 longer need to exclude them.
 
 Appears that the history goes something like this...
 
   * Once upon a time we had 8 memory slots
 
   * v2.6.24-4949-ge0d62c7 added 4 reserved slots above those 8
 
   * v2.6.24-4950-gcbc9402 made use of slot 8 for tss
 
   * v2.6.24-4962-gf78e0e2 made use of slot 9 for tpr
 
   * v2.6.25-5341-gef2979b bumped the 8 slots to 32
 
   * v2.6.26-rc1-13-gb7ebfb0 made use of slot 10 for ept
 
   * v2.6.28-4952-g6fe6397 moved the previously hard coded slots
 8,9,10 up to KVM_MEMORY_SLOTS + {0,1,2}
 

Nice overview!

 So we haven't needed to reserve 8..11 for a while and we've never made
 use of all 4 private slots.  I should reduce that to 3 in my patch.
 Maybe there's a test we could do in userspace to figure out we don't
 have to skip those anymore?

We can simply remove the test, 2.6.29 is our entry barrier for both
upstream and qemu-kvm now.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-26 Thread Avi Kivity

On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:

On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
  On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
  On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
 For the other lookups, which we
 believe will succeed, we can assume the probability of a match is
 related to the slot size, and sort the slots by page count.
  
  Unlikely to be true for assigned device BARs.

  We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
  lots of small slots for BARs and such.

  The vast majority of faults will either touch one of the two largest
  slots, or will miss all slots.  Relatively few will hit the small
  slots.

Not if you are using one of the assigned devices and don't
do any faults on memory :)


It's impossible not to fault on memory.



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-26 Thread Michael S. Tsirkin
On Wed, Jan 26, 2011 at 11:17:11AM +0200, Avi Kivity wrote:
 On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:
 On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
   On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
   On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
  For the other lookups, which we
  believe will succeed, we can assume the probability of a match is
  related to the slot size, and sort the slots by page count.
   
   Unlikely to be true for assigned device BARs.
 
   We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
   lots of small slots for BARs and such.
 
   The vast majority of faults will either touch one of the two largest
   slots, or will miss all slots.  Relatively few will hit the small
   slots.
 
 Not if you are using one of the assigned devices and don't
 do any faults on memory :)
 
 It's impossible not to fault on memory.

No I mean the RAM.



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-26 Thread Avi Kivity

On 01/25/2011 07:43 PM, Alex Williamson wrote:

On Tue, 2011-01-25 at 19:11 +0200, Avi Kivity wrote:
  On 01/25/2011 04:57 PM, Alex Williamson wrote:
On Tue, 2011-01-25 at 12:23 +0200, Avi Kivity wrote:
   On 01/25/2011 07:37 AM, Alex Williamson wrote:
  On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
  I'll look at how we might be
  able to allocate slots on demand.  Thanks,
   
  Here's a first cut just to see if this looks agreeable.  This 
allows the
  slot array to grow on demand.  This works with current userspace, 
as
  well as userspace trivially modified to double KVMState.slots and
  hotplugging enough pci-assign devices to exceed the previous limit (w/ &
  w/o ept).  Hopefully I got the rcu bits correct.  Does this look 
like
  the right path?

   This can be trivially exhausted to pin all RAM.
  
What's a reasonable upper limit?  A PCI device can have at most 6 MMIO
BARs, each taking one slot.

  A BAR can take no slots, or several slots.  For example a BAR might have
  a framebuffer, followed by an off-screen framebuffer, followed by an
  mmio register area in one BAR.  You'd want the framebuffer to be dirty
  logged while the offscreen framebuffer is not tracked (so one slot for
  each) while the mmio range cannot be used as a slot.

  That only holds for emulated devices.

Sure, emulated devices can do lots of specialty mappings, but I also
expect that more typically, mmio access to emulated devices will get
bounced through qemu and not use any slots.


Right; the example I gave (a framebuffer) is the exception rather than 
the rule; and I believe modern framebuffers are usually accessed through 
dma rather than BARs.



  It might also support MSI-X in a way that
splits a BAR, so an absolute max of 7 slots per PCI device.  Assuming
only 1 PCI bus for the moment and also assuming assigned devices are
typically single function, 32 devices * 7 slots/device = 224 slots, so
maybe a 256 limit?  Maybe we even bump it up to a limit of 64 devices
with a slot limit of 512.  It would be easier in device assignment code
to keep track of a number of devices limit than trying to guess whether
slots will be available when we need them.

  Sounds reasonable.  But we'd need a faster search for 512 slots.

Ok, I'll add a limit.  I'm not convinced the typical use case is going
to increase slots at all, so I'm still a little resistant to optimizing
the search at this point.


I don't want there to be two kvm implementations out there, both 
supporting 512 slots, while one is slow and one is fast.  It means that 
you have no idea what performance to expect.



BTW, simply by the ordering of when the
platform registers memory vs when other devices are mapped, we seem to
end up placing the actual memory regions at the front of the list.  Not
sure if that's by design or luck.  Thanks,


It was done by design, though as qemu evolves, it becomes more and 
more a matter of luck that the order doesn't change.




Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-26 Thread Avi Kivity

On 01/26/2011 11:20 AM, Michael S. Tsirkin wrote:

On Wed, Jan 26, 2011 at 11:17:11AM +0200, Avi Kivity wrote:
  On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:
  On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
 On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
 On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
 For the other lookups, which we
 believe will succeed, we can assume the probability of a match is
 related to the slot size, and sort the slots by page count.
 
 Unlikely to be true for assigned device BARs.
  
 We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
 lots of small slots for BARs and such.
  
 The vast majority of faults will either touch one of the two largest
 slots, or will miss all slots.  Relatively few will hit the small
 slots.
  
  Not if you are using one of the assigned devices and don't
  do any faults on memory :)

  It's impossible not to fault on memory.

No I mean the RAM.


No idea what you mean.  It's impossible not to fault on RAM, either 
(unless you don't use it at all).




Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-26 Thread Avi Kivity

On 01/25/2011 08:00 PM, Michael S. Tsirkin wrote:

  
  I see now. Yes, a flag in spte would help.  Changes in slots would then
  have to update all these flags.

  That's easy, we drop all sptes.

Ah, right. Hmm, doesn't the cpu already have some flag to distinguish mmio sptes?


No.  Allocated-but-not-mapped and mmio sptes are identical.



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-26 Thread Michael S. Tsirkin
On Wed, Jan 26, 2011 at 11:23:21AM +0200, Avi Kivity wrote:
 On 01/26/2011 11:20 AM, Michael S. Tsirkin wrote:
 On Wed, Jan 26, 2011 at 11:17:11AM +0200, Avi Kivity wrote:
   On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:
   On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
  On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
  On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
  For the other lookups, which we
  believe will succeed, we can assume the probability of a match 
  is
  related to the slot size, and sort the slots by page count.
  
  Unlikely to be true for assigned device BARs.
   
  We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
  lots of small slots for BARs and such.
   
  The vast majority of faults will either touch one of the two largest
  slots, or will miss all slots.  Relatively few will hit the small
  slots.
   
   Not if you are using one of the assigned devices and don't
   do any faults on memory :)
 
   It's impossible not to fault on memory.
 
 No I mean the RAM.
 
 No idea what you mean.  It's impossible not to fault on RAM, either
 (unless you don't use it at all).

I just mean that once you fault you map sptes and then you can use them
without exits.  mmio will cause exits each time. Right?



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-26 Thread Avi Kivity

On 01/26/2011 11:39 AM, Michael S. Tsirkin wrote:

On Wed, Jan 26, 2011 at 11:23:21AM +0200, Avi Kivity wrote:
  On 01/26/2011 11:20 AM, Michael S. Tsirkin wrote:
  On Wed, Jan 26, 2011 at 11:17:11AM +0200, Avi Kivity wrote:
 On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:
 On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
 On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
 On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
  For the other lookups, which we
  believe will succeed, we can assume the probability of a 
match is
  related to the slot size, and sort the slots by page count.
 
 Unlikely to be true for assigned device BARs.
 
 We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
 lots of small slots for BARs and such.
 
 The vast majority of faults will either touch one of the two 
largest
 slots, or will miss all slots.  Relatively few will hit the small
 slots.
 
 Not if you are using one of the assigned devices and don't
 do any faults on memory :)
  
 It's impossible not to fault on memory.
  
  No I mean the RAM.

  No idea what you mean.  It's impossible not to fault on RAM, either
  (unless you don't use it at all).

I just mean that once you fault you map sptes and then you can use them
without exits.  mmio will cause exits each time. Right?


The swapper scanning sptes, ksmd, khugepaged, and swapping can all cause 
a page to be unmapped.  Though it should certainly happen with a much 
lower frequency than mmio.




Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-26 Thread Michael S. Tsirkin
On Wed, Jan 26, 2011 at 11:54:21AM +0200, Avi Kivity wrote:
 On 01/26/2011 11:39 AM, Michael S. Tsirkin wrote:
 On Wed, Jan 26, 2011 at 11:23:21AM +0200, Avi Kivity wrote:
   On 01/26/2011 11:20 AM, Michael S. Tsirkin wrote:
   On Wed, Jan 26, 2011 at 11:17:11AM +0200, Avi Kivity wrote:
  On 01/25/2011 07:58 PM, Michael S. Tsirkin wrote:
  On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
  On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
  On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
   For the other lookups, which we
   believe will succeed, we can assume the probability of a 
  match is
   related to the slot size, and sort the slots by page 
  count.
  
  Unlikely to be true for assigned device BARs.
  
  We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, 
  and
  lots of small slots for BARs and such.
  
  The vast majority of faults will either touch one of the two 
  largest
  slots, or will miss all slots.  Relatively few will hit the 
  small
  slots.
  
  Not if you are using one of the assigned devices and don't
  do any faults on memory :)
   
  It's impossible not to fault on memory.
   
   No I mean the RAM.
 
   No idea what you mean.  It's impossible not to fault on RAM, either
   (unless you don't use it at all).
 
 I just mean that once you fault you map sptes and then you can use them
 without exits.  mmio will cause exits each time. Right?
 
 The swapper scanning sptes, ksmd, khugepaged, and swapping can all
 cause a page to be unmapped.  Though it should certainly happen with
 a much lower frequency than mmio.

Right. That's why I say that sorting by size might not be optimal.
Maybe a cache ...



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Avi Kivity

On 01/24/2011 11:32 AM, Marcelo Tosatti wrote:

On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
  When doing device assignment, we use cpu_register_physical_memory() to
  directly map the qemu mmap of the device resource into the address
  space of the guest.  The unadvertised feature of the register physical
  memory code path on kvm, at least for this type of mapping, is that it
  needs to allocate an index from a small, fixed array of memory slots.
  Even better, if it can't get an index, the code aborts deep in the
  kvm specific bits, preventing the caller from having a chance to
  recover.

  It's really easy to hit this by hot adding too many assigned devices
  to a guest (pretty easy to hit with too many devices at instantiation
  time too, but the abort is slightly more bearable there).

  I'm assuming it's pretty difficult to make the memory slot array
  dynamically sized.  If that's not the case, please let me know as
  that would be a much better solution.

It's not difficult to either increase the maximum number (defined as
32 now in both qemu and kernel) of static slots, or support dynamic
increases, if it turns out to be a performance issue.



We can't make it unbounded in the kernel, since a malicious user could 
start creating an infinite amount of memory slots, pinning unbounded 
kernel memory.


If we make the limit much larger, we should start to think about 
efficiency.  Every mmio vmexit is currently a linear scan of the memory 
slot table, which is efficient at a small number of slots, but not at a 
large number.  We could conceivably encode the no slot information 
into a bit in the not-present spte.




Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Avi Kivity

On 01/25/2011 07:37 AM, Alex Williamson wrote:

On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
  I'll look at how we might be
  able to allocate slots on demand.  Thanks,

Here's a first cut just to see if this looks agreeable.  This allows the
slot array to grow on demand.  This works with current userspace, as
well as userspace trivially modified to double KVMState.slots and
hotplugging enough pci-assign devices to exceed the previous limit (w/ &
w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
the right path?


This can be trivially exhausted to pin all RAM.



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Alex Williamson
On Tue, 2011-01-25 at 08:36 +0100, Jan Kiszka wrote:
 On 2011-01-25 06:37, Alex Williamson wrote:
  On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
  I'll look at how we might be
  able to allocate slots on demand.  Thanks,
  
  Here's a first cut just to see if this looks agreeable.  This allows the
  slot array to grow on demand.  This works with current userspace, as
  well as userspace trivially modified to double KVMState.slots and
  hotplugging enough pci-assign devices to exceed the previous limit (w/ &
  w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
  the right path?  If so, I'll work on removing the fixed limit from
  userspace next.  Thanks,
  
  Alex
  
  
  kvm: Allow memory slot array to grow on demand
  
  Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
  to grow on demand.  Private slots are now allocated at the
  front instead of the end.  Only x86 seems to use private slots,
 
 Hmm, doesn't current user space expect slots 8..11 to be the private
 ones and wouldn't it cause troubles if slots 0..3 are suddenly reserved?

The private slots aren't currently visible to userspace, they're
actually slots 32..35.  The patch automatically increments user passed
slot ids so userspace has its own zero-based view of the array.
Frankly, I don't understand why userspace reserves slots 8..11, is this
compatibility with older kernel implementations?

  so this is now zero for all other archs.  The memslots pointer
  is already updated using rcu, so changing the size of the
  array when it's replaced is straightforward.  x86 also keeps
  a bitmap of slots used by a kvm_mmu_page, which requires a
  shadow tlb flush whenever we increase the number of slots.
  This forces the pages to be rebuilt with the new bitmap size.
 
 Is it possible for user space to increase the slot number to ridiculous
 amounts (at least as far as kmalloc allows) and then trigger a kernel
 walk through them in non-preemptible contexts? Just wondering, I haven't
 checked all contexts of functions like kvm_is_visible_gfn yet.
 
 If yes, we should already switch to rbtree or something like that.
 Otherwise that may wait a bit, but probably not too long.

Yeah, Avi has brought up the hole that userspace can exploit this
interface with these changes.  However, for 99+% of users, this change
leaves the slot array at about the same size, or makes it smaller.  Only
huge, scale-out guests would probably even see a doubling of slots (my
guest with 14 82576 VFs uses 48 slots).  On the kernel side, I think we
can safely save a tree implementation as a later optimization should we
determine it's necessary.  We'll have to see how the userspace side
matches to figure out what's best there.  Thanks,

Alex





Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Michael S. Tsirkin
On Tue, Jan 25, 2011 at 07:41:32AM -0700, Alex Williamson wrote:
 On Tue, 2011-01-25 at 08:36 +0100, Jan Kiszka wrote:
  On 2011-01-25 06:37, Alex Williamson wrote:
   On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
   I'll look at how we might be
   able to allocate slots on demand.  Thanks,
   
   Here's a first cut just to see if this looks agreeable.  This allows the
   slot array to grow on demand.  This works with current userspace, as
   well as userspace trivially modified to double KVMState.slots and
   hotplugging enough pci-assign devices to exceed the previous limit (w/ &
   w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
   the right path?  If so, I'll work on removing the fixed limit from
   userspace next.  Thanks,
   
   Alex
   
   
   kvm: Allow memory slot array to grow on demand
   
   Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
   to grow on demand.  Private slots are now allocated at the
   front instead of the end.  Only x86 seems to use private slots,
  
  Hmm, doesn't current user space expect slots 8..11 to be the private
  ones and wouldn't it cause troubles if slots 0..3 are suddenly reserved?
 
 The private slots aren't currently visible to userspace, they're
 actually slots 32..35.  The patch automatically increments user passed
 slot ids so userspace has its own zero-based view of the array.
 Frankly, I don't understand why userspace reserves slots 8..11, is this
 compatibility with older kernel implementations?
 
   so this is now zero for all other archs.  The memslots pointer
   is already updated using rcu, so changing the size of the
   array when it's replaced is straightforward.  x86 also keeps
   a bitmap of slots used by a kvm_mmu_page, which requires a
   shadow tlb flush whenever we increase the number of slots.
   This forces the pages to be rebuilt with the new bitmap size.
  
  Is it possible for user space to increase the slot number to ridiculous
  amounts (at least as far as kmalloc allows) and then trigger a kernel
  walk through them in non-preemptible contexts? Just wondering, I haven't
  checked all contexts of functions like kvm_is_visible_gfn yet.
  
  If yes, we should already switch to rbtree or something like that.
  Otherwise that may wait a bit, but probably not too long.
 
 Yeah, Avi has brought up the hole that userspace can exploit this
 interface with these changes.  However, for 99+% of users, this change
 leaves the slot array at about the same size, or makes it smaller.  Only
 huge, scale-out guests would probably even see a doubling of slots (my
 guest with 14 82576 VFs uses 48 slots).  On the kernel side, I think we
 can safely save a tree implementation as a later optimization should we
 determine it's necessary.  We'll have to see how the userspace side
 matches to figure out what's best there.  Thanks,
 
 Alex

Also, is there any time when we need to scan all slots
on the data path?



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Alex Williamson
On Tue, 2011-01-25 at 12:20 +0200, Avi Kivity wrote:
 On 01/24/2011 11:32 AM, Marcelo Tosatti wrote:
  On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
When doing device assignment, we use cpu_register_physical_memory() to
directly map the qemu mmap of the device resource into the address
space of the guest.  The unadvertised feature of the register physical
memory code path on kvm, at least for this type of mapping, is that it
needs to allocate an index from a small, fixed array of memory slots.
Even better, if it can't get an index, the code aborts deep in the
kvm specific bits, preventing the caller from having a chance to
recover.
  
It's really easy to hit this by hot adding too many assigned devices
to a guest (pretty easy to hit with too many devices at instantiation
time too, but the abort is slightly more bearable there).
  
I'm assuming it's pretty difficult to make the memory slot array
dynamically sized.  If that's not the case, please let me know as
that would be a much better solution.
 
  It's not difficult to either increase the maximum number (defined as
  32 now in both qemu and kernel) of static slots, or support dynamic
  increases, if it turns out to be a performance issue.
 
 
 We can't make it unbounded in the kernel, since a malicious user could 
 start creating an infinite amount of memory slots, pinning unbounded 
 kernel memory.
 
 If we make the limit much larger, we should start to think about 
 efficiency.  Every mmio vmexit is currently a linear scan of the memory 
 slot table, which is efficient at a small number of slots, but not at a 
 large number.  We could conceivably encode the no slot information 
 into a bit in the not-present spte.

On the plus side, very, very few users need more than the current 32
slot limit and the implementation presented likely results in fewer
slots for the majority of the users.  We can maybe save efficiency
issues until we start seeing problems there.  Thanks,

Alex



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Avi Kivity

On 01/25/2011 04:45 PM, Michael S. Tsirkin wrote:

Also, is there any time when we need to scan all slots
on the data path?


All guest mmio accesses.



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Michael S. Tsirkin
On Tue, Jan 25, 2011 at 12:20:03PM +0200, Avi Kivity wrote:
 On 01/24/2011 11:32 AM, Marcelo Tosatti wrote:
 On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
   When doing device assignment, we use cpu_register_physical_memory() to
   directly map the qemu mmap of the device resource into the address
   space of the guest.  The unadvertised feature of the register physical
   memory code path on kvm, at least for this type of mapping, is that it
   needs to allocate an index from a small, fixed array of memory slots.
   Even better, if it can't get an index, the code aborts deep in the
   kvm specific bits, preventing the caller from having a chance to
   recover.
 
   It's really easy to hit this by hot adding too many assigned devices
   to a guest (pretty easy to hit with too many devices at instantiation
   time too, but the abort is slightly more bearable there).
 
   I'm assuming it's pretty difficult to make the memory slot array
   dynamically sized.  If that's not the case, please let me know as
   that would be a much better solution.
 
 It's not difficult to either increase the maximum number (defined as
 32 now in both qemu and kernel) of static slots, or support dynamic
 increases, if it turns out to be a performance issue.
 
 
 We can't make it unbounded in the kernel, since a malicious user
 could start creating an infinite amount of memory slots, pinning
 unbounded kernel memory.

How about keeping the slots in userspace memory, access them with copy
from user?

 If we make the limit much larger, we should start to think about
 efficiency.  Every mmio vmexit is currently a linear scan of the
 memory slot table, which is efficient at a small number of slots,
 but not at a large number.  We could conceivably encode the no
 slot information into a bit in the not-present spte.

OK, but the slots that Alex here wants to use are presumably
mostly not resulting in a pagefault at all, right?
Can we split such guys out so they don't slow mmio lookup?
Maybe keep *these* in userspace memory.



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Avi Kivity

On 01/25/2011 04:46 PM, Alex Williamson wrote:

On Tue, 2011-01-25 at 12:20 +0200, Avi Kivity wrote:
  On 01/24/2011 11:32 AM, Marcelo Tosatti wrote:
On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
   When doing device assignment, we use cpu_register_physical_memory() to
   directly map the qemu mmap of the device resource into the address
   space of the guest.  The unadvertised feature of the register physical
   memory code path on kvm, at least for this type of mapping, is that it
   needs to allocate an index from a small, fixed array of memory slots.
   Even better, if it can't get an index, the code aborts deep in the
   kvm specific bits, preventing the caller from having a chance to
   recover.

   It's really easy to hit this by hot adding too many assigned devices
   to a guest (pretty easy to hit with too many devices at instantiation
   time too, but the abort is slightly more bearable there).

   I'm assuming it's pretty difficult to make the memory slot array
   dynamically sized.  If that's not the case, please let me know as
   that would be a much better solution.
  
  It's not difficult to either increase the maximum number (defined as
32 now in both qemu and kernel) of static slots, or support dynamic
increases, if it turns out to be a performance issue.
  

  We can't make it unbounded in the kernel, since a malicious user could
  start creating an infinite amount of memory slots, pinning unbounded
  kernel memory.

  If we make the limit much larger, we should start to think about
  efficiency.  Every mmio vmexit is currently a linear scan of the memory
  slot table, which is efficient at a small number of slots, but not at a
  large number.  We could conceivably encode the no slot information
  into a bit in the not-present spte.

On the plus side, very, very few users need more than the current 32
slot limit and the implementation presented likely results in fewer
slots for the majority of the users.  We can maybe save efficiency
issues until we start seeing problems there.  Thanks,



Well, we need a static cap, but certainly limiting the search to the 
number of populated slots is an improvement.


We might keep the array size static (but only use the populated part).



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Alex Williamson
On Tue, 2011-01-25 at 12:23 +0200, Avi Kivity wrote:
 On 01/25/2011 07:37 AM, Alex Williamson wrote:
  On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
I'll look at how we might be
able to allocate slots on demand.  Thanks,
 
  Here's a first cut just to see if this looks agreeable.  This allows the
  slot array to grow on demand.  This works with current userspace, as
  well as userspace trivially modified to double KVMState.slots and
  hotplugging enough pci-assign devices to exceed the previous limit (w/ &
  w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
  the right path?
 
 This can be trivially exhausted to pin all RAM.

What's a reasonable upper limit?  A PCI device can have at most 6 MMIO
BARs, each taking one slot.  It might also support MSI-X in a way that
splits a BAR, so an absolute max of 7 slots per PCI device.  Assuming
only 1 PCI bus for the moment and also assuming assigned devices are
typically single function, 32 devices * 7 slots/device = 224 slots, so
maybe a 256 limit?  Maybe we even bump it up to a limit of 64 devices
with a slot limit of 512.  It would be easier in device assignment code
to keep track of a number of devices limit than trying to guess whether
slots will be available when we need them.  Thanks,

Alex




Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Avi Kivity

On 01/25/2011 04:55 PM, Michael S. Tsirkin wrote:


  We can't make it unbounded in the kernel, since a malicious user
  could start creating an infinite amount of memory slots, pinning
  unbounded kernel memory.

How about keeping the slots in userspace memory, access them with copy
from user?


Some of the data is validated by the kernel, so it needs a kernel copy.  
Other fields are completely internal to the kernel.



  If we make the limit much larger, we should start to think about
  efficiency.  Every mmio vmexit is currently a linear scan of the
  memory slot table, which is efficient at a small number of slots,
  but not at a large number.  We could conceivably encode the no
  slot information into a bit in the not-present spte.

OK, but the slots that Alex here wants to use are presumably
mostly not resulting in a pagefault at all, right?
Can we split such guys out so they don't slow mmio lookup?
Maybe keep *these* in userspace memory.


The algorithm is:

  page fault:
    for each slot:
      if addr in slot:
        populate spte
        return
    # no slot matches
    mmio

so we have to try out all slots before we find out none of them are needed.
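
The same scan as a standalone C sketch, with the slot reduced to the two
fields the lookup needs (illustrative; not the real kvm_memory_slot
layout):

    #include <stddef.h>

    struct slot_sketch {
            unsigned long base_gfn;
            unsigned long npages;   /* 0 = unused entry */
    };

    /* Returns the matching slot, or NULL, in which case the fault is
     * treated as mmio.  Cost is linear in the number of slots, which is
     * why a much larger slot limit makes every emulated-mmio exit more
     * expensive. */
    static struct slot_sketch *find_slot(struct slot_sketch *slots,
                                         int nslots, unsigned long gfn)
    {
            for (int i = 0; i < nslots; i++) {
                    struct slot_sketch *s = &slots[i];

                    if (s->npages &&
                        gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
                            return s;
            }
            return NULL;    /* no slot matches: mmio */
    }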



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Michael S. Tsirkin
On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
 For the other lookups, which we
 believe will succeed, we can assume the probability of a match is
 related to the slot size, and sort the slots by page count.

Unlikely to be true for assigned device BARs.
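
For reference, a standalone sketch of the sort-by-size heuristic being
debated; it assumes faults land uniformly on mapped pages, which is
exactly the assumption questioned above (types and names are
illustrative):

    #include <stdlib.h>

    struct slot_sketch {
            unsigned long base_gfn;
            unsigned long npages;
    };

    /* If a fault is equally likely to hit any mapped page, the chance
     * that a given slot matches is proportional to its page count, so
     * scanning slots largest-first minimizes the expected number of
     * comparisons. */
    static int by_npages_desc(const void *a, const void *b)
    {
            const struct slot_sketch *sa = a, *sb = b;

            if (sa->npages != sb->npages)
                    return sa->npages < sb->npages ? 1 : -1;
            return 0;
    }

    /* usage: qsort(slots, nslots, sizeof(*slots), by_npages_desc); */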



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Michael S. Tsirkin
On Tue, Jan 25, 2011 at 04:58:41PM +0200, Avi Kivity wrote:
 On 01/25/2011 04:55 PM, Michael S. Tsirkin wrote:
 
   We can't make it unbounded in the kernel, since a malicious user
   could start creating an infinite amount of memory slots, pinning
   unbounded kernel memory.
 
 How about keeping the slots in userspace memory, access them with copy
 from user?
 
 Some of the data is validated by the kernel, so it needs a kernel
 copy.  Other fields are completely internal to the kernel.
 
   If we make the limit much larger, we should start to think about
   efficiency.  Every mmio vmexit is currently a linear scan of the
   memory slot table, which is efficient at a small number of slots,
   but not at a large number.  We could conceivably encode the no
   slot information into a bit in the not-present spte.
 
 OK, but the slots that Alex here wants to use are presumably
 mostly not resulting in a pagefault at all, right?
 Can we split such guys out so they don't slow mmio lookup?
 Maybe keep *these* in userspace memory.
 
 The algorithm is:
 
   page fault:
     for each slot:
       if addr in slot:
         populate spte
         return
     # no slot matches
     mmio
 
 so we have to try out all slots before we find out none of them are needed.

I see now. Yes, a flag in the spte would help.  Changes in slots would then
have to update all these flags.
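
Something along these lines, presumably (purely illustrative: the bit
and the helpers are invented for the sketch, nothing in the current
code does this):

#include <linux/types.h>

/* Hypothetical: mark a not-present spte as "known mmio" so the next
 * fault on it can skip the slot walk.  SPTE_MMIO_MASK is an assumed
 * software-available bit, not an existing kernel define. */
#define SPTE_MMIO_MASK	(1ULL << 62)

static void mark_mmio_spte(u64 *sptep)
{
	*sptep = SPTE_MMIO_MASK;	/* still not-present to hardware */
}

static bool is_mmio_spte(u64 spte)
{
	return spte & SPTE_MMIO_MASK;
}

/* Any slot change makes every such flag stale, so all of them would
 * have to be invalidated when the slot array is updated. */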

 -- 
 error compiling committee.c: too many arguments to function


Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Jan Kiszka
On 2011-01-25 15:53, Avi Kivity wrote:
 On 01/25/2011 04:41 PM, Alex Williamson wrote:
   
   
 kvm: Allow memory slot array to grow on demand
   
 Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
 to grow on demand.  Private slots are now allocated at the
 front instead of the end.  Only x86 seems to use private slots,
 
   Hmm, doesn't current user space expect slots 8..11 to be the private
   ones and wouldn't it cause troubles if slots 0..3 are suddenly
 reserved?

 The private slots aren't currently visible to userspace, they're
 actually slots 32..35.  The patch automatically increments user-passed
 slot ids so userspace has its own zero-based view of the array.
 Frankly, I don't understand why userspace reserves slots 8..11; is this
 compatibility with older kernel implementations?
 
 I think so.  I believe these kernel versions are too old now to matter,
 but of course I can't be sure.

Will check and enable those 4 additional slots for user space if we no
longer need to exclude them.

Jan





Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Avi Kivity

On 01/25/2011 04:57 PM, Alex Williamson wrote:

On Tue, 2011-01-25 at 12:23 +0200, Avi Kivity wrote:
  On 01/25/2011 07:37 AM, Alex Williamson wrote:
On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
   I'll look at how we might be
   able to allocate slots on demand.  Thanks,
  
Here's a first cut just to see if this looks agreeable.  This allows the
slot array to grow on demand.  This works with current userspace, as
well as userspace trivially modified to double KVMState.slots and
hotplugging enough pci-assign devices to exceed the previous limit (w/ &
w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
the right path?

  This can be trivially exhausted to pin all RAM.

What's a reasonable upper limit?  A PCI device can have at most 6 MMIO
BARs, each taking one slot.


A BAR can take no slots, or several slots.  For example, a single BAR 
might contain a framebuffer, followed by an off-screen framebuffer, 
followed by an mmio register area.  You'd want the framebuffer to be 
dirty-logged while the off-screen framebuffer is not tracked (so one 
slot for each), while the mmio range cannot be used as a slot at all.


That only holds for emulated devices.


  It might also support MSI-X in a way that
splits a BAR, so an absolute max of 7 slots per PCI device.  Assuming
only 1 PCI bus for the moment and also assuming assigned devices are
typically single function, 32 devices * 7 slots/device = 224 slots, so
maybe a 256 limit?  Maybe we even bump it up to a limit of 64 devices
with a slot limit of 512.  It would be easier in device assignment code
to keep track of a number of devices limit than trying to guess whether
slots will be available when we need them.


Sounds reasonable.  But we'd need a faster search for 512 slots.

--
error compiling committee.c: too many arguments to function



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Avi Kivity

On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:

On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
  For the other lookups, which we
  believe will succeed, we can assume the probablity of a match is
  related to the slot size, and sort the slots by page count.

Unlikely to be true for assigned device BARs.


We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and lots 
of small slots for BARs and such.


The vast majority of faults will either touch one of the two largest 
slots, or will miss all slots.  Relatively few will hit the small slots.
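
(A sketch of the sort-by-page-count idea from above.  Note that
memslots[] itself is indexed by slot id, so this assumes a separate
lookup table of pointers, which no posted patch actually has:)

#include <linux/sort.h>
#include <linux/kvm_host.h>

/* Descending comparison by npages: big RAM slots first, tiny BAR
 * slots last.  Runs only on (rare) slot updates. */
static int cmp_slot_size(const void *a, const void *b)
{
	const struct kvm_memory_slot *x = *(const struct kvm_memory_slot **)a;
	const struct kvm_memory_slot *y = *(const struct kvm_memory_slot **)b;

	return (y->npages > x->npages) - (y->npages < x->npages);
}

static void rebuild_lookup_order(struct kvm_memory_slot **by_size, int n)
{
	sort(by_size, n, sizeof(*by_size), cmp_slot_size, NULL);
}

The fault path would then walk by_size[] instead of memslots[]: the
same linear scan, but with a much better expected hit position.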


--
error compiling committee.c: too many arguments to function



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Avi Kivity

On 01/25/2011 05:23 PM, Michael S. Tsirkin wrote:

On Tue, Jan 25, 2011 at 04:58:41PM +0200, Avi Kivity wrote:
  On 01/25/2011 04:55 PM, Michael S. Tsirkin wrote:
  
 We can't make it unbounded in the kernel, since a malicious user
 could start creating an infinite amount of memory slots, pinning
 unbounded kernel memory.
  
  How about keeping the slots in userspace memory, access them with copy
  from user?

  Some of the data is validated by the kernel, so it needs a kernel
  copy.  Other fields are completely internal to the kernel.

 If we make the limit much larger, we should start to think about
 efficiency.  Every mmio vmexit is currently a linear scan of the
 memory slot table, which is efficient at a small number of slots,
 but not at a large number.  We could conceivably encode the no
 slot information into a bit in the not-present spte.
  
  OK, but the slots that Alex here wants to use are presumably
  mostly not resulting in a pagefault at all, right?
  Can we split such guys out so they don't slow mmio lookup?
  Maybe keep *these* in userspace memory.

  The algorithm is:

  page fault:
    for each slot:
      if addr in slot:
        populate spte
        return
    # no slot matches
    mmio

  so we have to try out all slots before we find out none of them are needed.

I see now. Yes, a flag in spte would help.  changes in slots would then
have to update all these flags.


That's easy, we drop all sptes.

--
error compiling committee.c: too many arguments to function



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Alex Williamson
On Tue, 2011-01-25 at 19:11 +0200, Avi Kivity wrote:
 On 01/25/2011 04:57 PM, Alex Williamson wrote:
  On Tue, 2011-01-25 at 12:23 +0200, Avi Kivity wrote:
On 01/25/2011 07:37 AM, Alex Williamson wrote:
  On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
 I'll look at how we might be
 able to allocate slots on demand.  Thanks,

  Here's a first cut just to see if this looks agreeable.  This allows the
  slot array to grow on demand.  This works with current userspace, as
  well as userspace trivially modified to double KVMState.slots and
  hotplugging enough pci-assign devices to exceed the previous limit (w/ &
  w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
  the right path?
  
This can be trivially exhausted to pin all RAM.
 
  What's a reasonable upper limit?  A PCI device can have at most 6 MMIO
  BARs, each taking one slot.
 
 A BAR can take no slots, or several slots.  For example, a single BAR 
 might contain a framebuffer, followed by an off-screen framebuffer, 
 followed by an mmio register area.  You'd want the framebuffer to be 
 dirty-logged while the off-screen framebuffer is not tracked (so one 
 slot for each), while the mmio range cannot be used as a slot at all.
 
 That only holds for emulated devices.

Sure, emulated devices can do lots of specialty mappings, but I also
expect that more typically, mmio access to emulated devices will get
bounced through qemu and not use any slots.

It might also support MSI-X in a way that
  splits a BAR, so an absolute max of 7 slots per PCI device.  Assuming
  only 1 PCI bus for the moment and also assuming assigned devices are
  typically single function, 32 devices * 7 slots/device = 224 slots, so
  maybe a 256 limit?  Maybe we even bump it up to a limit of 64 devices
  with a slot limit of 512.  It would be easier in device assignment code
  to keep track of a number of devices limit than trying to guess whether
  slots will be available when we need them.
 
 Sounds reasonable.  But we'd need a faster search for 512 slots.

Ok, I'll add a limit.  I'm not convinced the typical use case is going
to increase slots at all, so I'm still a little resistant to optimizing
the search at this point.  BTW, simply by the ordering of when the
platform registers memory vs when other devices are mapped, we seem to
end up placing the actual memory regions at the front of the list.  Not
sure if that's by design or luck.  Thanks,

Alex



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Michael S. Tsirkin
On Tue, Jan 25, 2011 at 07:33:40PM +0200, Avi Kivity wrote:
 On 01/25/2011 04:59 PM, Michael S. Tsirkin wrote:
 On Tue, Jan 25, 2011 at 04:53:44PM +0200, Avi Kivity wrote:
   For the other lookups, which we
   believe will succeed, we can assume the probablity of a match is
   related to the slot size, and sort the slots by page count.
 
 Unlikely to be true for assigned device BARs.
 
 We'll have a large slot at 4G+ - EOM, a medium slot at 1M-3G, and
 lots of small slots for BARs and such.
 
 The vast majority of faults will either touch one of the two largest
 slots, or will miss all slots.  Relatively few will hit the small
 slots.

Not if you are using one of the assigned devices and don't
do any faults on memory :)

 -- 
 error compiling committee.c: too many arguments to function


Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Michael S. Tsirkin
On Tue, Jan 25, 2011 at 07:34:18PM +0200, Avi Kivity wrote:
 On 01/25/2011 05:23 PM, Michael S. Tsirkin wrote:
 On Tue, Jan 25, 2011 at 04:58:41PM +0200, Avi Kivity wrote:
   On 01/25/2011 04:55 PM, Michael S. Tsirkin wrote:
   
  We can't make it unbounded in the kernel, since a malicious user
  could start creating an infinite amount of memory slots, pinning
  unbounded kernel memory.
   
   How about keeping the slots in userspace memory, access them with copy
   from user?
 
   Some of the data is validated by the kernel, so it needs a kernel
   copy.  Other fields are completely internal to the kernel.
 
  If we make the limit much larger, we should start to think about
  efficiency.  Every mmio vmexit is currently a linear scan of the
  memory slot table, which is efficient at a small number of slots,
  but not at a large number.  We could conceivably encode the no
  slot information into a bit in the not-present spte.
   
   OK, but the slots that Alex here wants to use are presumably
   mostly not resulting in a pagefault at all, right?
   Can we split such guys out so they don't slow mmio lookup?
   Maybe keep *these* in userspace memory.
 
   The algorithm is:
 
  page fault:
    for each slot:
      if addr in slot:
        populate spte
        return
    # no slot matches
    mmio
 
   so we have to try out all slots before we find out none of them are 
  needed.
 
 I see now. Yes, a flag in the spte would help.  Changes in slots would then
 have to update all these flags.
 
 That's easy, we drop all sptes.

Ah, right. Hmm, doesn't the cpu already have some flag to distinguish mmio sptes?

 -- 
 error compiling committee.c: too many arguments to function


Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-25 Thread Alex Williamson
On Tue, 2011-01-25 at 17:35 +0100, Jan Kiszka wrote:
 On 2011-01-25 15:53, Avi Kivity wrote:
  On 01/25/2011 04:41 PM, Alex Williamson wrote:


  kvm: Allow memory slot array to grow on demand

  Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
  to grow on demand.  Private slots are now allocated at the
  front instead of the end.  Only x86 seems to use private slots,
  
Hmm, doesn't current user space expect slots 8..11 to be the private
ones and wouldn't it cause troubles if slots 0..3 are suddenly
  reserved?
 
  The private slots aren't currently visible to userspace, they're
  actually slots 32..35.  The patch automatically increments user-passed
  slot ids so userspace has its own zero-based view of the array.
  Frankly, I don't understand why userspace reserves slots 8..11; is this
  compatibility with older kernel implementations?
  
  I think so.  I believe these kernel versions are too old now to matter,
  but of course I can't be sure.
 
 Will check and enable those 4 additional slots for user space if we no
 longer need to exclude them.

Appears that the history goes something like this...

  * Once upon a time we had 8 memory slots

  * v2.6.24-4949-ge0d62c7 added 4 reserved slots above those 8

  * v2.6.24-4950-gcbc9402 made use of slot 8 for tss

  * v2.6.24-4962-gf78e0e2 made use of slot 9 for tpr

  * v2.6.25-5341-gef2979b bumped the 8 slots to 32

  * v2.6.26-rc1-13-gb7ebfb0 made use of slot 10 for ept

  * v2.6.28-4952-g6fe6397 moved the previously hard coded slots
8,9,10 up to KVM_MEMORY_SLOTS + {0,1,2}

So we haven't needed to reserve 8..11 for a while and we've never made
use of all 4 private slots.  I should reduce that to 3 in my patch.
Maybe there's a test we could do in userspace to figure out that we
don't have to skip those anymore?
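
Something like this might do as that probe (untested sketch, error
handling stripped): KVM_SET_USER_MEMORY_REGION rejects slot ids the
kernel doesn't support, and a zero-size region is a delete/no-op, so
poking slot 8 on a scratch VM shouldn't disturb anything:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns nonzero if the kernel still refuses userspace slot 8,
 * i.e. qemu must keep skipping slots 8..11. */
static int must_skip_slots_8_to_11(void)
{
	struct kvm_userspace_memory_region r;
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);
	int skip;

	memset(&r, 0, sizeof(r));
	r.slot = 8;		/* memory_size stays 0: delete/no-op */
	skip = ioctl(vm, KVM_SET_USER_MEMORY_REGION, &r) < 0;
	close(vm);
	close(kvm);
	return skip;
}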

Alex



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-24 Thread Marcelo Tosatti
On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
 When doing device assignment, we use cpu_register_physical_memory() to
 directly map the qemu mmap of the device resource into the address
 space of the guest.  The unadvertised feature of the register physical
 memory code path on kvm, at least for this type of mapping, is that it
 needs to allocate an index from a small, fixed array of memory slots.
 Even better, if it can't get an index, the code aborts deep in the
 kvm specific bits, preventing the caller from having a chance to
 recover.
 
 It's really easy to hit this by hot adding too many assigned devices
 to a guest (pretty easy to hit with too many devices at instantiation
 time too, but the abort is slightly more bearable there).
 
 I'm assuming it's pretty difficult to make the memory slot array
 dynamically sized.  If that's not the case, please let me know as
 that would be a much better solution.

It's not difficult to either increase the maximum number of static
slots (currently defined as 32 in both qemu and the kernel) or to
support dynamic increases, if it turns out to be a performance issue.

But you'd probably want to fix the abort for currently supported kernels
anyway.

 I'm not terribly happy with the solution in this series, it doesn't
 provide any guarantees whether a cpu_register_physical_memory() will
 succeed, only slightly better educated guesses.
 
 Are there better ideas how we could solve this?  Thanks,

Why can't cpu_register_physical_memory() return an error so you can
fallback to slow mode or cancel device insertion?
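
(For reference, the counting approach this series takes would be
roughly this shape on the qemu side; a sketch against kvm-all.c's
slot array, with details assumed rather than taken from the actual
patches:)

/* Sketch: count KVMSlot entries not yet in use so device hotplug can
 * bail out before kvm_set_phys_mem() hits the abort.  KVMState and
 * KVMSlot are qemu's kvm-all.c types; memory_size == 0 marks a free
 * slot there. */
static int kvm_free_slots(KVMState *s)
{
    int i, count = 0;

    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
        if (!s->slots[i].memory_size) {
            count++;
        }
    }
    return count;
}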



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-24 Thread Alex Williamson
On Mon, 2011-01-24 at 15:16 +0100, Jan Kiszka wrote:
 On 2011-01-24 10:32, Marcelo Tosatti wrote:
  On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
  When doing device assignment, we use cpu_register_physical_memory() to
  directly map the qemu mmap of the device resource into the address
  space of the guest.  The unadvertised feature of the register physical
  memory code path on kvm, at least for this type of mapping, is that it
  needs to allocate an index from a small, fixed array of memory slots.
  Even better, if it can't get an index, the code aborts deep in the
  kvm specific bits, preventing the caller from having a chance to
  recover.
 
  It's really easy to hit this by hot adding too many assigned devices
  to a guest (pretty easy to hit with too many devices at instantiation
  time too, but the abort is slightly more bearable there).
 
  I'm assuming it's pretty difficult to make the memory slot array
  dynamically sized.  If that's not the case, please let me know as
  that would be a much better solution.
  
  It's not difficult to either increase the maximum number of static
  slots (currently defined as 32 in both qemu and the kernel) or to
  support dynamic increases, if it turns out to be a performance issue.
 
 Static limits are waiting to be hit again, just a bit later.

Yep, and I think static limits large enough that we can't hit them would
be performance prohibitive.

 I would start thinking about a tree search as well because iterating
 over all slots won't get faster over time.
 
  
  But you'd probably want to fix the abort for currently supported kernels
  anyway.
 
 Jep. Depending on how soon we have smarter solution in the kernel, this
 fix may vary in pragmatism.
 
 
  I'm not terribly happy with the solution in this series, it doesn't
  provide any guarantees whether a cpu_register_physical_memory() will
  succeed, only slightly better educated guesses.
 
  Are there better ideas how we could solve this?  Thanks,
  
  Why can't cpu_register_physical_memory() return an error so you can
  fallback to slow mode or cancel device insertion?

It appears that it'd be pretty intrusive to fix this since
cpu_register_physical_memory() returns void, and the kvm hook into
this is via a set_memory callback for the phys memory client.

 Doesn't that registration happens much later than the call to
 pci_register_bar? In any case, this will require significantly more
 invasive work (but it would be much nicer if possible, no question).

Right, we register BARs in the initfn, but we don't map them until the
guest writes the BARs, mapping them into MMIO space.  I don't think we
want to fall back to slow mapping at that point, so we either need to
fail in the initfn (like this series) or be able to dynamically allocate
more slots so the kvm callback can't fail.  I'll look at how we might be
able to allocate slots on demand.  Thanks,

Alex



Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-24 Thread Alex Williamson
On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
 I'll look at how we might be
 able to allocate slots on demand.  Thanks,

Here's a first cut just to see if this looks agreeable.  This allows the
slot array to grow on demand.  This works with current userspace, as
well as userspace trivially modified to double KVMState.slots and
hotplugging enough pci-assign devices to exceed the previous limit (w/ &
w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
the right path?  If so, I'll work on removing the fixed limit from
userspace next.  Thanks,

Alex


kvm: Allow memory slot array to grow on demand

Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
to grow on demand.  Private slots are now allocated at the
front instead of the end.  Only x86 seems to use private slots,
so this is now zero for all other archs.  The memslots pointer
is already updated using rcu, so changing the size of the
array when it's replaced is straightforward.  x86 also keeps
a bitmap of slots used by a kvm_mmu_page, which requires a
shadow tlb flush whenever we increase the number of slots.
This forces the pages to be rebuilt with the new bitmap size.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 arch/ia64/include/asm/kvm_host.h    |    4 --
 arch/ia64/kvm/kvm-ia64.c            |    2 +
 arch/powerpc/include/asm/kvm_host.h |    3 --
 arch/s390/include/asm/kvm_host.h    |    3 --
 arch/x86/include/asm/kvm_host.h     |    3 +-
 arch/x86/include/asm/vmx.h          |    6 ++-
 arch/x86/kvm/mmu.c                  |    7 +++-
 arch/x86/kvm/x86.c                  |    6 ++-
 include/linux/kvm_host.h            |    7 +++-
 virt/kvm/kvm_main.c                 |   65 ---
 10 files changed, 63 insertions(+), 43 deletions(-)


diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 2689ee5..11d0ab2 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -23,10 +23,6 @@
 #ifndef __ASM_KVM_HOST_H
 #define __ASM_KVM_HOST_H
 
-#define KVM_MEMORY_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
-
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
 /* define exit reasons from vmm to kvm*/
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 70d224d..f1adda2 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1814,7 +1814,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	mutex_lock(&kvm->slots_lock);
 
 	r = -EINVAL;
-	if (log->slot >= KVM_MEMORY_SLOTS)
+	if (log->slot >= kvm->memslots->nmemslots)
 		goto out;
 
 	memslot = &kvm->memslots->memslots[log->slot];
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index bba3b9b..dc80057 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -29,9 +29,6 @@
 #include <asm/kvm_asm.h>
 
 #define KVM_MAX_VCPUS 1
-#define KVM_MEMORY_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index cef7dbf..92a964c 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -20,9 +20,6 @@
 #include <asm/cpu.h>
 
 #define KVM_MAX_VCPUS 64
-#define KVM_MEMORY_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
 
 struct sca_entry {
atomic_t scn;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ffd7f8d..df1382c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -27,7 +27,6 @@
 #include <asm/msr-index.h>
 
 #define KVM_MAX_VCPUS 64
-#define KVM_MEMORY_SLOTS 32
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
 
@@ -207,7 +206,7 @@ struct kvm_mmu_page {
 * One bit set per slot which has memory
 * in this shadow page.
 */
-   DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
+   unsigned long *slot_bitmap;
bool multimapped; /* More than one parent_pte? */
bool unsync;
int root_count;  /* Currently serving as active root */
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 84471b8..7fd8c89 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -370,9 +370,9 @@ enum vmcs_field {
 
 #define AR_RESERVD_MASK 0xfffe0f00
 
-#define TSS_PRIVATE_MEMSLOT			(KVM_MEMORY_SLOTS + 0)
-#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT   (KVM_MEMORY_SLOTS + 1)
-#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_MEMORY_SLOTS + 2)
+#define TSS_PRIVATE_MEMSLOT			0
+#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT   1
+#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT 2
 
 #define VMX_NR_VPIDS				(1 << 16)
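
The diff is truncated above before the mmu.c and kvm_main.c hunks; the
core of the grow-on-demand update is presumably something of this
shape (a sketch of the approach, not the actual patch text):

#include <linux/kvm_host.h>
#include <linux/slab.h>

/* Sketch: grow the memslots array and publish it via rcu.  Caller
 * must hold kvm->slots_lock; assumes struct kvm_memslots now ends in
 * a flexible memslots[] array. */
static struct kvm_memslots *grow_memslots(struct kvm *kvm, int want)
{
	struct kvm_memslots *old = kvm->memslots;
	struct kvm_memslots *new;

	if (want <= old->nmemslots)
		return old;

	new = kzalloc(sizeof(*new) + want * sizeof(new->memslots[0]),
		      GFP_KERNEL);
	if (!new)
		return NULL;

	memcpy(new->memslots, old->memslots,
	       old->nmemslots * sizeof(new->memslots[0]));
	new->nmemslots = want;

	rcu_assign_pointer(kvm->memslots, new);
	synchronize_srcu_expedited(&kvm->srcu);	/* wait out old readers */
	kfree(old);
	/* x86 additionally flushes the shadow mmu here so each
	 * kvm_mmu_page slot_bitmap is rebuilt at the new size */
	return new;
}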

Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-24 Thread Jan Kiszka
On 2011-01-25 06:37, Alex Williamson wrote:
 On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
 I'll look at how we might be
 able to allocate slots on demand.  Thanks,
 
 Here's a first cut just to see if this looks agreeable.  This allows the
 slot array to grow on demand.  This works with current userspace, as
 well as userspace trivially modified to double KVMState.slots and
 hotplugging enough pci-assign devices to exceed the previous limit (w/ &
 w/o ept).  Hopefully I got the rcu bits correct.  Does this look like
 the right path?  If so, I'll work on removing the fixed limit from
 userspace next.  Thanks,
 
 Alex
 
 
 kvm: Allow memory slot array to grow on demand
 
 Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
 to grow on demand.  Private slots are now allocated at the
 front instead of the end.  Only x86 seems to use private slots,

Hmm, doesn't current user space expect slots 8..11 to be the private
ones and wouldn't it cause troubles if slots 0..3 are suddenly reserved?

 so this is now zero for all other archs.  The memslots pointer
 is already updated using rcu, so changing the size of the
 array when it's replaced is straightforward.  x86 also keeps
 a bitmap of slots used by a kvm_mmu_page, which requires a
 shadow tlb flush whenever we increase the number of slots.
 This forces the pages to be rebuilt with the new bitmap size.

Is it possible for user space to increase the slot number to ridiculous
amounts (at least as far as kmalloc allows) and then trigger a kernel
walk through them in non-preemptible contexts? Just wondering, I haven't
checked all contexts of functions like kvm_is_visible_gfn yet.

If yes, we should switch to an rbtree or something like that right away.
Otherwise that can wait a bit, but probably not too long.
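
(For comparison: even without a tree, a lookup table kept sorted by
base_gfn would already make the walk O(log n). A sketch, assuming
non-overlapping slots; not something any posted patch does:)

#include <linux/kvm_host.h>

/* Binary search over slots sorted by base_gfn; gaps between slots
 * fall out of the loop and report mmio. */
static struct kvm_memory_slot *
search_memslots(struct kvm_memslots *slots, gfn_t gfn)
{
	int lo = 0, hi = slots->nmemslots - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		struct kvm_memory_slot *s = &slots->memslots[mid];

		if (gfn < s->base_gfn)
			hi = mid - 1;
		else if (gfn >= s->base_gfn + s->npages)
			lo = mid + 1;
		else
			return s;
	}
	return NULL;	/* no slot covers gfn: mmio */
}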

Jan





Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

2011-01-22 Thread Michael S. Tsirkin
On Fri, Jan 21, 2011 at 04:48:02PM -0700, Alex Williamson wrote:
 When doing device assignment, we use cpu_register_physical_memory() to
 directly map the qemu mmap of the device resource into the address
 space of the guest.  The unadvertised feature of the register physical
 memory code path on kvm, at least for this type of mapping, is that it
 needs to allocate an index from a small, fixed array of memory slots.
 Even better, if it can't get an index, the code aborts deep in the
 kvm specific bits, preventing the caller from having a chance to
 recover.
 
 It's really easy to hit this by hot adding too many assigned devices
 to a guest (pretty easy to hit with too many devices at instantiation
 time too, but the abort is slightly more bearable there).
 
 I'm assuming it's pretty difficult to make the memory slot array
 dynamically sized.  If that's not the case, please let me know as
 that would be a much better solution.
 
 I'm not terribly happy with the solution in this series, it doesn't
 provide any guarantees whether a cpu_register_physical_memory() will
 succeed, only slightly better educated guesses.
 
 Are there better ideas how we could solve this?  Thanks,
 
 Alex

Put the table in qemu memory, make kvm access it with copy from/to user?
It can then be any size ...

 ---
 
 Alex Williamson (2):
   device-assignment: Count required kvm memory slots
   kvm: Allow querying free slots
 
 
 hw/device-assignment.c |   59 +++-
 hw/device-assignment.h |    3 ++
 kvm-all.c              |   16 +
 kvm.h                  |    2 ++
  4 files changed, 79 insertions(+), 1 deletions(-)