Re: [PATCH 5/6] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-06 Thread David Gibson
On Tue, May 07, 2013 at 04:27:49PM +1000, Alexey Kardashevskiy wrote:
> On 05/07/2013 04:02 PM, David Gibson wrote:
> > On Tue, May 07, 2013 at 03:51:31PM +1000, Alexey Kardashevskiy wrote:
> >> On 05/07/2013 03:29 PM, David Gibson wrote:
> >>> On Mon, May 06, 2013 at 05:25:56PM +1000, Alexey Kardashevskiy wrote:
[snip]
>  +#define DRIVER_VERSION  "0.1"
>  +#define DRIVER_AUTHOR   "Paul Mackerras, IBM Corp. "
>  +#define DRIVER_DESC "POWERPC KVM driver"
> >>>
> >>> Really?
> >>
> >>
> >> What is wrong here?
> > 
> > Well, it seems entirely unrelated to the rest of the changes, 
> 
> 
> The patch adds a module parameter so I had to add those DRIVER_xxx.

Ah, ok.

> > and not obviously accurate.
> 
> Let's fix it then. How? Paul signed it...

Fair enough then.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: Digital signature


Re: [PATCH 5/6] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-06 Thread Alexey Kardashevskiy
On 05/07/2013 04:02 PM, David Gibson wrote:
> On Tue, May 07, 2013 at 03:51:31PM +1000, Alexey Kardashevskiy wrote:
>> On 05/07/2013 03:29 PM, David Gibson wrote:
>>> On Mon, May 06, 2013 at 05:25:56PM +1000, Alexey Kardashevskiy wrote:
 This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
 and H_STUFF_TCE requests without passing them to QEMU, which should
 save time on switching to QEMU and back.

 Both real and virtual modes are supported - whenever the kernel
 fails to handle TCE request, it passes it to the virtual mode.
 If it the virtual mode handlers fail, then the request is passed
 to the user mode, for example, to QEMU.

 This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
 a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
 in-kernel handling of IOMMU map/unmap.

 This adds a special case for huge pages (16MB).  The reference
 counting cannot be easily done for such pages in real mode (when
 MMU is off) so we added a list of huge pages.  It is populated in
 virtual mode and get_page is called just once per a huge page.
 Real mode handlers check if the requested page is huge and in the list,
 then no reference counting is done, otherwise an exit to virtual mode
 happens.  The list is released at KVM exit.  At the moment the fastest
 card available for tests uses up to 9 huge pages so walking through this
 list is not very expensive.  However this can change and we may want
 to optimize this.

 This also adds the virt_only parameter to the KVM module
 for debug and performance check purposes.

 Tests show that this patch increases transmission speed from 220MB/s
 to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).

 Cc: David Gibson 
 Signed-off-by: Alexey Kardashevskiy 
 Signed-off-by: Paul Mackerras 
 ---
  Documentation/virtual/kvm/api.txt   |   28 
  arch/powerpc/include/asm/kvm_host.h |2 +
  arch/powerpc/include/asm/kvm_ppc.h  |2 +
  arch/powerpc/include/uapi/asm/kvm.h |7 +
  arch/powerpc/kvm/book3s_64_vio.c|  242 
 ++-
  arch/powerpc/kvm/book3s_64_vio_hv.c |  192 +++
  arch/powerpc/kvm/powerpc.c  |   12 ++
  include/uapi/linux/kvm.h|2 +
  8 files changed, 485 insertions(+), 2 deletions(-)

 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index f621cd6..2039767 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2127,6 +2127,34 @@ written, then `n_invalid' invalid entries, 
 invalidating any previously
  valid entries found.
  
  
 +4.79 KVM_CREATE_SPAPR_TCE_IOMMU
 +
 +Capability: KVM_CAP_SPAPR_TCE_IOMMU
 +Architectures: powerpc
 +Type: vm ioctl
 +Parameters: struct kvm_create_spapr_tce_iommu (in)
 +Returns: 0 on success, -1 on error
 +
 +This creates a link between IOMMU group and a hardware TCE (translation
 +control entry) table. This link lets the host kernel know what IOMMU
 +group (i.e. TCE table) to use for the LIOBN number passed with
 +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
 +
 +/* for KVM_CAP_SPAPR_TCE_IOMMU */
 +struct kvm_create_spapr_tce_iommu {
 +  __u64 liobn;
 +  __u32 iommu_id;
>>>
>>> Wouldn't it be more in keeping 
>>
>>
>> pardon?
> 
> Sorry, I was going to suggest a change, but then realised it wasn't
> actually any better than what you have now.
> 
 +  __u32 flags;
 +};
 +
 +No flag is supported at the moment.
 +
 +When the guest issues TCE call on a liobn for which a TCE table has been
 +registered, the kernel will handle it in real mode, updating the hardware
 +TCE table. TCE table calls for other liobns will cause a vm exit and must
 +be handled by userspace.
 +
 +
  5. The kvm_run structure
  
  
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index 36ceb0d..2b70cbc 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -178,6 +178,8 @@ struct kvmppc_spapr_tce_table {
struct kvm *kvm;
u64 liobn;
u32 window_size;
 +  bool virtmode_only;
>>>
>>> I see this is now initialized from the global parameter, but I think
>>> it would be better to just check the global (debug) parameter
>>> directly, rather than duplicating it here.
>>
>>
>> The global parameter is in kvm.ko and the struct above is in the real mode
>> part which cannot go to the module.
> 
> Ah, ok.  I'm half inclined to just drop the virtmode_only thing
> entirely.
> 
 +  struct iommu_group *grp;/* used for IOMMU groups */
struct page *pages[0];
  };

Re: [PATCH 5/6] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-06 Thread David Gibson
On Tue, May 07, 2013 at 03:51:31PM +1000, Alexey Kardashevskiy wrote:
> On 05/07/2013 03:29 PM, David Gibson wrote:
> > On Mon, May 06, 2013 at 05:25:56PM +1000, Alexey Kardashevskiy wrote:
> >> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
> >> and H_STUFF_TCE requests without passing them to QEMU, which should
> >> save time on switching to QEMU and back.
> >>
> >> Both real and virtual modes are supported - whenever the kernel
> >> fails to handle TCE request, it passes it to the virtual mode.
> >> If it the virtual mode handlers fail, then the request is passed
> >> to the user mode, for example, to QEMU.
> >>
> >> This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
> >> a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
> >> in-kernel handling of IOMMU map/unmap.
> >>
> >> This adds a special case for huge pages (16MB).  The reference
> >> counting cannot be easily done for such pages in real mode (when
> >> MMU is off) so we added a list of huge pages.  It is populated in
> >> virtual mode and get_page is called just once per a huge page.
> >> Real mode handlers check if the requested page is huge and in the list,
> >> then no reference counting is done, otherwise an exit to virtual mode
> >> happens.  The list is released at KVM exit.  At the moment the fastest
> >> card available for tests uses up to 9 huge pages so walking through this
> >> list is not very expensive.  However this can change and we may want
> >> to optimize this.
> >>
> >> This also adds the virt_only parameter to the KVM module
> >> for debug and performance check purposes.
> >>
> >> Tests show that this patch increases transmission speed from 220MB/s
> >> to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
> >>
> >> Cc: David Gibson 
> >> Signed-off-by: Alexey Kardashevskiy 
> >> Signed-off-by: Paul Mackerras 
> >> ---
> >>  Documentation/virtual/kvm/api.txt   |   28 
> >>  arch/powerpc/include/asm/kvm_host.h |2 +
> >>  arch/powerpc/include/asm/kvm_ppc.h  |2 +
> >>  arch/powerpc/include/uapi/asm/kvm.h |7 +
> >>  arch/powerpc/kvm/book3s_64_vio.c|  242 
> >> ++-
> >>  arch/powerpc/kvm/book3s_64_vio_hv.c |  192 +++
> >>  arch/powerpc/kvm/powerpc.c  |   12 ++
> >>  include/uapi/linux/kvm.h|2 +
> >>  8 files changed, 485 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/virtual/kvm/api.txt 
> >> b/Documentation/virtual/kvm/api.txt
> >> index f621cd6..2039767 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -2127,6 +2127,34 @@ written, then `n_invalid' invalid entries, 
> >> invalidating any previously
> >>  valid entries found.
> >>  
> >>  
> >> +4.79 KVM_CREATE_SPAPR_TCE_IOMMU
> >> +
> >> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
> >> +Architectures: powerpc
> >> +Type: vm ioctl
> >> +Parameters: struct kvm_create_spapr_tce_iommu (in)
> >> +Returns: 0 on success, -1 on error
> >> +
> >> +This creates a link between IOMMU group and a hardware TCE (translation
> >> +control entry) table. This link lets the host kernel know what IOMMU
> >> +group (i.e. TCE table) to use for the LIOBN number passed with
> >> +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
> >> +
> >> +/* for KVM_CAP_SPAPR_TCE_IOMMU */
> >> +struct kvm_create_spapr_tce_iommu {
> >> +  __u64 liobn;
> >> +  __u32 iommu_id;
> > 
> > Wouldn't it be more in keeping 
> 
> 
> pardon?

Sorry, I was going to suggest a change, but then realised it wasn't
actually any better than what you have now.

> >> +  __u32 flags;
> >> +};
> >> +
> >> +No flag is supported at the moment.
> >> +
> >> +When the guest issues TCE call on a liobn for which a TCE table has been
> >> +registered, the kernel will handle it in real mode, updating the hardware
> >> +TCE table. TCE table calls for other liobns will cause a vm exit and must
> >> +be handled by userspace.
> >> +
> >> +
> >>  5. The kvm_run structure
> >>  
> >>  
> >> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> >> b/arch/powerpc/include/asm/kvm_host.h
> >> index 36ceb0d..2b70cbc 100644
> >> --- a/arch/powerpc/include/asm/kvm_host.h
> >> +++ b/arch/powerpc/include/asm/kvm_host.h
> >> @@ -178,6 +178,8 @@ struct kvmppc_spapr_tce_table {
> >>struct kvm *kvm;
> >>u64 liobn;
> >>u32 window_size;
> >> +  bool virtmode_only;
> > 
> > I see this is now initialized from the global parameter, but I think
> > it would be better to just check the global (debug) parameter
> > directly, rather than duplicating it here.
> 
> 
> The global parameter is in kvm.ko and the struct above is in the real mode
> part which cannot go to the module.

Ah, ok.  I'm half inclined to just drop the virtmode_only thing
entirely.

> >> +  struct iommu_group *grp;/* used for IOMMU groups */
> >>struct page *pages[0];
> >>  };
> >>  
> >> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> 

Re: [PATCH 5/6] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-06 Thread Alexey Kardashevskiy
On 05/07/2013 03:29 PM, David Gibson wrote:
> On Mon, May 06, 2013 at 05:25:56PM +1000, Alexey Kardashevskiy wrote:
>> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
>> and H_STUFF_TCE requests without passing them to QEMU, which should
>> save time on switching to QEMU and back.
>>
>> Both real and virtual modes are supported - whenever the kernel
>> fails to handle TCE request, it passes it to the virtual mode.
>> If it the virtual mode handlers fail, then the request is passed
>> to the user mode, for example, to QEMU.
>>
>> This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
>> a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
>> in-kernel handling of IOMMU map/unmap.
>>
>> This adds a special case for huge pages (16MB).  The reference
>> counting cannot be easily done for such pages in real mode (when
>> MMU is off) so we added a list of huge pages.  It is populated in
>> virtual mode and get_page is called just once per a huge page.
>> Real mode handlers check if the requested page is huge and in the list,
>> then no reference counting is done, otherwise an exit to virtual mode
>> happens.  The list is released at KVM exit.  At the moment the fastest
>> card available for tests uses up to 9 huge pages so walking through this
>> list is not very expensive.  However this can change and we may want
>> to optimize this.
>>
>> This also adds the virt_only parameter to the KVM module
>> for debug and performance check purposes.
>>
>> Tests show that this patch increases transmission speed from 220MB/s
>> to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
>>
>> Cc: David Gibson 
>> Signed-off-by: Alexey Kardashevskiy 
>> Signed-off-by: Paul Mackerras 
>> ---
>>  Documentation/virtual/kvm/api.txt   |   28 
>>  arch/powerpc/include/asm/kvm_host.h |2 +
>>  arch/powerpc/include/asm/kvm_ppc.h  |2 +
>>  arch/powerpc/include/uapi/asm/kvm.h |7 +
>>  arch/powerpc/kvm/book3s_64_vio.c|  242 
>> ++-
>>  arch/powerpc/kvm/book3s_64_vio_hv.c |  192 +++
>>  arch/powerpc/kvm/powerpc.c  |   12 ++
>>  include/uapi/linux/kvm.h|2 +
>>  8 files changed, 485 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index f621cd6..2039767 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2127,6 +2127,34 @@ written, then `n_invalid' invalid entries, 
>> invalidating any previously
>>  valid entries found.
>>  
>>  
>> +4.79 KVM_CREATE_SPAPR_TCE_IOMMU
>> +
>> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
>> +Architectures: powerpc
>> +Type: vm ioctl
>> +Parameters: struct kvm_create_spapr_tce_iommu (in)
>> +Returns: 0 on success, -1 on error
>> +
>> +This creates a link between IOMMU group and a hardware TCE (translation
>> +control entry) table. This link lets the host kernel know what IOMMU
>> +group (i.e. TCE table) to use for the LIOBN number passed with
>> +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
>> +
>> +/* for KVM_CAP_SPAPR_TCE_IOMMU */
>> +struct kvm_create_spapr_tce_iommu {
>> +__u64 liobn;
>> +__u32 iommu_id;
> 
> Wouldn't it be more in keeping 


pardon?



>> +__u32 flags;
>> +};
>> +
>> +No flag is supported at the moment.
>> +
>> +When the guest issues TCE call on a liobn for which a TCE table has been
>> +registered, the kernel will handle it in real mode, updating the hardware
>> +TCE table. TCE table calls for other liobns will cause a vm exit and must
>> +be handled by userspace.
>> +
>> +
>>  5. The kvm_run structure
>>  
>>  
>> diff --git a/arch/powerpc/include/asm/kvm_host.h 
>> b/arch/powerpc/include/asm/kvm_host.h
>> index 36ceb0d..2b70cbc 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -178,6 +178,8 @@ struct kvmppc_spapr_tce_table {
>>  struct kvm *kvm;
>>  u64 liobn;
>>  u32 window_size;
>> +bool virtmode_only;
> 
> I see this is now initialized from the global parameter, but I think
> it would be better to just check the global (debug) parameter
> directly, rather than duplicating it here.


The global parameter is in kvm.ko and the struct above is in the real mode
part which cannot go to the module.



>> +struct iommu_group *grp;/* used for IOMMU groups */
>>  struct page *pages[0];
>>  };
>>  
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
>> b/arch/powerpc/include/asm/kvm_ppc.h
>> index d501246..bdfa140 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -139,6 +139,8 @@ extern void kvmppc_xics_free(struct kvm *kvm);
>>  
>>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>  struct kvm_create_spapr_tce *args);
>> +extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
>> +st

Re: [PATCH 5/6] KVM: PPC: Add support for IOMMU in-kernel handling

2013-05-06 Thread David Gibson
On Mon, May 06, 2013 at 05:25:56PM +1000, Alexey Kardashevskiy wrote:
> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
> and H_STUFF_TCE requests without passing them to QEMU, which should
> save time on switching to QEMU and back.
> 
> Both real and virtual modes are supported - whenever the kernel
> fails to handle TCE request, it passes it to the virtual mode.
> If it the virtual mode handlers fail, then the request is passed
> to the user mode, for example, to QEMU.
> 
> This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
> a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
> in-kernel handling of IOMMU map/unmap.
> 
> This adds a special case for huge pages (16MB).  The reference
> counting cannot be easily done for such pages in real mode (when
> MMU is off) so we added a list of huge pages.  It is populated in
> virtual mode and get_page is called just once per a huge page.
> Real mode handlers check if the requested page is huge and in the list,
> then no reference counting is done, otherwise an exit to virtual mode
> happens.  The list is released at KVM exit.  At the moment the fastest
> card available for tests uses up to 9 huge pages so walking through this
> list is not very expensive.  However this can change and we may want
> to optimize this.
> 
> This also adds the virt_only parameter to the KVM module
> for debug and performance check purposes.
> 
> Tests show that this patch increases transmission speed from 220MB/s
> to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
> 
> Cc: David Gibson 
> Signed-off-by: Alexey Kardashevskiy 
> Signed-off-by: Paul Mackerras 
> ---
>  Documentation/virtual/kvm/api.txt   |   28 
>  arch/powerpc/include/asm/kvm_host.h |2 +
>  arch/powerpc/include/asm/kvm_ppc.h  |2 +
>  arch/powerpc/include/uapi/asm/kvm.h |7 +
>  arch/powerpc/kvm/book3s_64_vio.c|  242 
> ++-
>  arch/powerpc/kvm/book3s_64_vio_hv.c |  192 +++
>  arch/powerpc/kvm/powerpc.c  |   12 ++
>  include/uapi/linux/kvm.h|2 +
>  8 files changed, 485 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index f621cd6..2039767 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2127,6 +2127,34 @@ written, then `n_invalid' invalid entries, 
> invalidating any previously
>  valid entries found.
>  
>  
> +4.79 KVM_CREATE_SPAPR_TCE_IOMMU
> +
> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
> +Architectures: powerpc
> +Type: vm ioctl
> +Parameters: struct kvm_create_spapr_tce_iommu (in)
> +Returns: 0 on success, -1 on error
> +
> +This creates a link between IOMMU group and a hardware TCE (translation
> +control entry) table. This link lets the host kernel know what IOMMU
> +group (i.e. TCE table) to use for the LIOBN number passed with
> +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
> +
> +/* for KVM_CAP_SPAPR_TCE_IOMMU */
> +struct kvm_create_spapr_tce_iommu {
> + __u64 liobn;
> + __u32 iommu_id;

Wouldn't it be more in keeping 

> + __u32 flags;
> +};
> +
> +No flag is supported at the moment.
> +
> +When the guest issues TCE call on a liobn for which a TCE table has been
> +registered, the kernel will handle it in real mode, updating the hardware
> +TCE table. TCE table calls for other liobns will cause a vm exit and must
> +be handled by userspace.
> +
> +
>  5. The kvm_run structure
>  
>  
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 36ceb0d..2b70cbc 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -178,6 +178,8 @@ struct kvmppc_spapr_tce_table {
>   struct kvm *kvm;
>   u64 liobn;
>   u32 window_size;
> + bool virtmode_only;

I see this is now initialized from the global parameter, but I think
it would be better to just check the global (debug) parameter
directly, rather than duplicating it here.

> + struct iommu_group *grp;/* used for IOMMU groups */
>   struct page *pages[0];
>  };
>  
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index d501246..bdfa140 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -139,6 +139,8 @@ extern void kvmppc_xics_free(struct kvm *kvm);
>  
>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>   struct kvm_create_spapr_tce *args);
> +extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
> + struct kvm_create_spapr_tce_iommu *args);
>  extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
>   struct kvm_vcpu *vcpu, unsigned long liobn);
>  extern long kvmppc_emulated_h_put_tce(struct kvmppc_spapr_tce_table *stt,
> diff --git a/arch/powerpc/include/