Re: [PATCH 04/14] KVM: PPC: e500: MMU API

2011-11-10 Thread Alexander Graf

On 11/01/2011 05:16 PM, Scott Wood wrote:

On 11/01/2011 03:58 AM, Avi Kivity wrote:

On 10/31/2011 10:12 PM, Scott Wood wrote:

+4.59 KVM_DIRTY_TLB
+
+Capability: KVM_CAP_SW_TLB
+Architectures: ppc
+Type: vcpu ioctl
+Parameters: struct kvm_dirty_tlb (in)
+Returns: 0 on success, -1 on error
+
+struct kvm_dirty_tlb {
+   __u64 bitmap;
+   __u32 num_dirty;
+};

This is not 32/64 bit safe.  e500 is 32-bit only, yes?

e5500 is 64-bit -- we don't support it with KVM yet, but it's planned.


but what if someone wants to emulate an e500 on a ppc64?  maybe it's better to add
padding here.

What is unsafe about it?  Are you picturing TLBs with more than 4
billion entries?

sizeof(struct kvm_dirty_tlb) == 12 for 32-bit userspace, but == 16 for
64-bit userspace and the kernel.  ABI structures must have the same
alignment and size for 32/64 bit userspace, or they need compat handling.

The size is 16 on 32-bit ppc -- the alignment of __u64 forces this.  It
looks like this is different in the 32-bit x86 ABI.

We can pad explicitly if you prefer.
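For illustration only, here is a minimal sketch of what explicit padding could look like. The struct names are hypothetical, and uint64_t/uint32_t stand in for the kernel's __u64/__u32. With the tail padding spelled out, the size no longer depends on whether the ABI aligns 64-bit integers to 4 or 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as posted in the patch (hypothetical name, fixed-width stand-ins). */
struct dirty_tlb_unpadded {
	uint64_t bitmap;
	uint32_t num_dirty;
};

/* Hypothetical variant with the trailing padding made explicit. */
struct dirty_tlb_padded {
	uint64_t bitmap;
	uint32_t num_dirty;
	uint32_t pad;	/* would be required to be zero */
};

/* These hold whether uint64_t is 4-byte or 8-byte aligned. */
_Static_assert(sizeof(struct dirty_tlb_padded) == 16,
	       "size must not depend on the ABI");
_Static_assert(offsetof(struct dirty_tlb_padded, num_dirty) == 8,
	       "num_dirty offset must not depend on the ABI");
_Static_assert(offsetof(struct dirty_tlb_padded, pad) == 12,
	       "pad offset must not depend on the ABI");
```

The unpadded layout is 16 bytes where 64-bit integers are 8-byte aligned and 12 bytes where they are 4-byte aligned, which is the discrepancy discussed above. Explicit padding fixes the size but not the alignment of the struct itself.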


I would prefer if we keep this stable :). There's no good reason to pad 
it - ppc64 creates the same struct definition.



There shouldn't be any alignment issues.


Another alternative is to drop the num_dirty field (and let the kernel
compute it instead, shouldn't take long?), and have the third argument
to ioctl() reference the bitmap directly.

The idea was to make it possible for the kernel to apply a threshold
above which it would be better to ignore the bitmap entirely and flush
everything:

http://www.spinics.net/lists/kvm/msg50079.html

Currently we always just flush everything, and QEMU always says
everything is dirty when it makes a change, but the API is there if needed.

Right, but you don't need num_dirty for it.  There are typically only a
few dozen entries, yes?  It should take a trivial amount of time to
calculate its weight.

There are over 500 entries currently, and QEMU could make it much larger
if it wants to decrease guest-visible faults on certain workloads.
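As a rough sketch of the alternative being discussed, the kernel could compute the weight of the bitmap itself and compare it against a threshold. The function names, the threshold parameter, and the bit layout (entry i lives in word i / 64, bit i % 64) are assumptions made for this example, not part of the proposed API.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits among the first nbits bits of a 64-bit-word bitmap. */
static unsigned int bitmap_weight64(const uint64_t *bitmap, size_t nbits)
{
	unsigned int weight = 0;
	size_t full_words = nbits / 64;
	size_t tail_bits = nbits % 64;

	for (size_t i = 0; i < full_words; i++) {
		uint64_t w = bitmap[i];

		while (w) {		/* clear the lowest set bit each pass */
			w &= w - 1;
			weight++;
		}
	}

	if (tail_bits) {
		/* Only the low tail_bits bits of the last word are valid. */
		uint64_t w = bitmap[full_words] &
			     ((UINT64_C(1) << tail_bits) - 1);

		while (w) {
			w &= w - 1;
			weight++;
		}
	}

	return weight;
}

/* Hypothetical policy: flush everything once too many entries are dirty. */
static int should_flush_all(const uint64_t *bitmap, size_t nbits,
			    unsigned int threshold)
{
	return bitmap_weight64(bitmap, nbits) > threshold;
}
```

For a few hundred entries this is a handful of word operations, which is the point made about the cost being trivial. Passing num_dirty from userspace simply lets the kernel skip this pass.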

It's not the most important feature, indeed we currently ignore the
bitmap entirely.  But it could be useful depending on how the API is
used in the future, and I don't think we gain much by dropping it at
this point.  Alex, any thoughts?


The kernel can always opt in to ignore the field if it chooses to, so I 
don't see the point in dropping it. There shouldn't be an alignment 
problem in the first place :).



Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/14] KVM: PPC: e500: MMU API

2011-11-10 Thread Avi Kivity
On 11/10/2011 04:20 PM, Alexander Graf wrote:
 The size is 16 on 32-bit ppc -- the alignment of __u64 forces this.  It
 looks like this is different in the 32-bit x86 ABI.

 We can pad explicitly if you prefer.

 I would prefer if we keep this stable :). There's no good reason to
 pad it - ppc64 creates the same struct definition.

 There are over 500 entries currently, and QEMU could make it much larger
 if it wants to decrease guest-visible faults on certain workloads.

 It's not the most important feature, indeed we currently ignore the
 bitmap entirely.  But it could be useful depending on how the API is
 used in the future, and I don't think we gain much by dropping it at
 this point.  Alex, any thoughts?

 The kernel can always opt in to ignore the field if it chooses to, so
 I don't see the point in dropping it. There shouldn't be an alignment
 problem in the first place :).

Ok.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 04/14] KVM: PPC: e500: MMU API

2011-11-02 Thread Avi Kivity
On 11/01/2011 06:16 PM, Scott Wood wrote:
  
  sizeof(struct kvm_dirty_tlb) == 12 for 32-bit userspace, but == 16 for
  64-bit userspace and the kernel.  ABI structures must have the same
  alignment and size for 32/64 bit userspace, or they need compat handling.

 The size is 16 on 32-bit ppc -- the alignment of __u64 forces this.  It
 looks like this is different in the 32-bit x86 ABI.

Right, __u64 alignment on i386 is 4.

 We can pad explicitly if you prefer.

No real need - unless it may be reused by another arch?  I think that's
unlikely.

  This API has been discussed extensively, and the code using it is
  already in mainline QEMU.  This aspect of it hasn't changed since the
  discussion back in February:
 
  http://www.spinics.net/lists/kvm/msg50102.html
 
  I'd prefer to avoid another round of major overhaul without a really
  good reason.
  
  Me too, but I also prefer not to make ABI choices by inertia.  ABI is
  practically the only thing I care about wrt non-x86 (other than
  whitespace, of course).  Please involve me in the discussions earlier in
  the future.

 You participated in that thread. :-)

Well, my memory isn't what it used to be, or at least what I seem to
remember it used to be.

 
  These are the assumptions needed to make such an interface well-defined.
  
  Just remarking on the complexity, don't take it personally.

 :-)

 Just wasn't sure whether the implication was that it was too complex.


It is too complex, but that's entirely the fault of the hardware.  All
we can do is complain and enjoy the guaranteed job security.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 04/14] KVM: PPC: e500: MMU API

2011-11-01 Thread Avi Kivity
On 10/31/2011 10:12 PM, Scott Wood wrote:
  +4.59 KVM_DIRTY_TLB
  +
  +Capability: KVM_CAP_SW_TLB
  +Architectures: ppc
  +Type: vcpu ioctl
  +Parameters: struct kvm_dirty_tlb (in)
  +Returns: 0 on success, -1 on error
  +
  +struct kvm_dirty_tlb {
  +  __u64 bitmap;
  +  __u32 num_dirty;
  +};
  
  This is not 32/64 bit safe.  e500 is 32-bit only, yes?

 e5500 is 64-bit -- we don't support it with KVM yet, but it's planned.

  but what if someone wants to emulate an e500 on a ppc64?  maybe it's better to add
  padding here.

 What is unsafe about it?  Are you picturing TLBs with more than 4
 billion entries?

sizeof(struct kvm_dirty_tlb) == 12 for 32-bit userspace, but == 16 for
64-bit userspace and the kernel.  ABI structures must have the same
alignment and size for 32/64 bit userspace, or they need compat handling.

 There shouldn't be any alignment issues.

  Another alternative is to drop the num_dirty field (and let the kernel
  compute it instead, shouldn't take long?), and have the third argument
  to ioctl() reference the bitmap directly.

 The idea was to make it possible for the kernel to apply a threshold
 above which it would be better to ignore the bitmap entirely and flush
 everything:

 http://www.spinics.net/lists/kvm/msg50079.html

 Currently we always just flush everything, and QEMU always says
 everything is dirty when it makes a change, but the API is there if needed.

Right, but you don't need num_dirty for it.  There are typically only a
few dozen entries, yes?  It should take a trivial amount of time to
calculate its weight.

  +Configures the virtual CPU's TLB array, establishing a shared memory area
  +between userspace and KVM.  The params and array fields are userspace
  +addresses of mmu-type-specific data structures.  The array_len field is a
  +safety mechanism, and should be set to the size in bytes of the memory that
  +userspace has reserved for the array.  It must be at least the size dictated
  +by mmu_type and params.
  +
  +While KVM_RUN is active, the shared region is under control of KVM.  Its
  +contents are undefined, and any modification by userspace results in
  +boundedly undefined behavior.
  +
  +On return from KVM_RUN, the shared region will reflect the current state of
  +the guest's TLB.  If userspace makes any changes, it must call KVM_DIRTY_TLB
  +to tell KVM which entries have been changed, prior to calling KVM_RUN again
  +on this vcpu.
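A minimal sketch of how userspace might track its own modifications between KVM_RUN calls, so that it has a bitmap and a count ready to report. Everything here is hypothetical: the tracker struct, the helper names, the entry count, and the bit layout (entry i in word i / 64, bit i % 64) are illustrative and not taken from the patch or from QEMU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TLB_ENTRIES	576	/* illustrative total entry count */
#define SKETCH_BITMAP_WORDS	((SKETCH_TLB_ENTRIES + 63) / 64)

/* Hypothetical userspace-side bookkeeping for modified TLB entries. */
struct tlb_dirty_tracker {
	uint64_t bitmap[SKETCH_BITMAP_WORDS];
	uint32_t num_dirty;
};

/* Reset the tracker, e.g. after the changes have been reported. */
static void tracker_clear(struct tlb_dirty_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/* Record that the entry at 'index' in the shared array was modified. */
static void tracker_mark(struct tlb_dirty_tracker *t, size_t index)
{
	uint64_t mask = UINT64_C(1) << (index % 64);

	if (index >= SKETCH_TLB_ENTRIES)
		return;			/* ignore out-of-range indices */

	if (!(t->bitmap[index / 64] & mask)) {
		t->bitmap[index / 64] |= mask;
		t->num_dirty++;		/* count each entry only once */
	}
}
```

Userspace would call tracker_mark() each time it writes an entry in the shared array, report the result before the next KVM_RUN, and then call tracker_clear().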
  
  We already have another mechanism for such shared memory,
  mmap(vcpu_fd).  x86 uses it for the coalesced mmio region as well as the
  traditional kvm_run area.  Please consider using it.

 What does it buy us, other than needing a separate codepath in QEMU to
 allocate the memory differently based on whether KVM (and this feature)

The ability to use get_free_pages() and ordinary kernel memory directly,
instead of indirection through a struct page ** array.

 are being used, since QEMU uses this for its own MMU representation?

 This API has been discussed extensively, and the code using it is
 already in mainline QEMU.  This aspect of it hasn't changed since the
 discussion back in February:

 http://www.spinics.net/lists/kvm/msg50102.html

 I'd prefer to avoid another round of major overhaul without a really
 good reason.

Me too, but I also prefer not to make ABI choices by inertia.  ABI is
practically the only thing I care about wrt non-x86 (other than
whitespace, of course).  Please involve me in the discussions earlier in
the future.

  +For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
  + - The params field is of type struct kvm_book3e_206_tlb_params.
  + - The array field points to an array of type struct
  +   kvm_book3e_206_tlb_entry.
  + - The array consists of all entries in the first TLB, followed by all
  +   entries in the second TLB.
  + - Within a TLB, entries are ordered first by increasing set number.  Within a
  +   set, entries are ordered by way (increasing ESEL).
  + - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
  +   where num_sets is the tlb_sizes[] value divided by the tlb_ways[] value.
  + - The tsize field of mas1 shall be set to 4K on TLB0, even though the
  +   hardware ignores this value for TLB0.
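The ordering rules above can be sketched as index calculations into the shared array. The tlb_geometry struct is a hypothetical stand-in for the tlb_sizes[] and tlb_ways[] values; the sketch assumes num_sets is a power of two and, as a further assumption, treats the second TLB as fully associative so that its index is simply ESEL.

```c
#include <stdint.h>

/* Hypothetical stand-in for the geometry carried in the params structure. */
struct tlb_geometry {
	uint32_t tlb_sizes[2];	/* number of entries in each TLB */
	uint32_t tlb_ways[2];	/* associativity of each TLB */
};

/* Array index of the TLB0 entry selected by MAS2 and a way number. */
static uint32_t tlb0_index(const struct tlb_geometry *g, uint64_t mas2,
			   uint32_t way)
{
	uint32_t num_sets = g->tlb_sizes[0] / g->tlb_ways[0];
	uint32_t set = (uint32_t)(mas2 >> 12) & (num_sets - 1);

	/* Entries are ordered by set first, then by way within the set. */
	return set * g->tlb_ways[0] + way;
}

/* Array index of a TLB1 entry: all TLB0 entries come first. */
static uint32_t tlb1_index(const struct tlb_geometry *g, uint32_t esel)
{
	return g->tlb_sizes[0] + esel;
}
```

With a 512-entry, 4-way TLB0 there are 128 sets, so MAS2 = 0x12345000 hashes to set 0x45 (69), and way 2 of that set sits at index 69 * 4 + 2 = 278.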
  
  Holy shit.

 You were the one that first suggested we use shared data:
 http://www.spinics.net/lists/kvm/msg49802.html

 These are the assumptions needed to make such an interface well-defined.

Just remarking on the complexity, don't take it personally.

  @@ -95,6 +90,9 @@ struct kvmppc_vcpu_e500 {
 u32 tlb1cfg;
 u64 mcar;
   
  +  struct page **shared_tlb_pages;
  +  int num_shared_tlb_pages;
  +
  
  I missed the requirement that things be page aligned.

 They don't need to be, we'll ignore the data before and after the shared
 area.
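To illustrate why page alignment is not required, here is a small sketch of how many pages an arbitrary userspace region spans. The 4096-byte page size and the helper name are assumptions for the example; the data before the region in the first page and after it in the last page is simply not touched.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this example */

/* Number of pages covered by [start, start + len), aligned or not. */
static size_t pages_spanned(uintptr_t start, size_t len)
{
	size_t offset = start & (SKETCH_PAGE_SIZE - 1);

	if (len == 0)
		return 0;

	/* Round the end of the region up to the next page boundary. */
	return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

A page-sized region that starts 8 bytes into a page therefore needs two entries in a page array rather than one.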

  If you use mmap(vcpu_fd) this becomes simpler; you can use
  get_free_pages() and have a single pointer.  You can also use vmap() on
  this array (but