Re: [PATCH 12/13] kvm/powerpc: Accelerate H_PUT_TCE by implementing it in real mode
On 17.05.2011, at 11:35, Benjamin Herrenschmidt wrote:

> On Tue, 2011-05-17 at 11:31 +0200, Alexander Graf wrote:
>> On 17.05.2011, at 11:11, Benjamin Herrenschmidt wrote:
>>
>>> On Tue, 2011-05-17 at 10:01 +0200, Alexander Graf wrote:
>>>> I'm not sure I fully understand how this is supposed to work. If the
>>>> tables are kept inside the kernel, how does userspace get to know
>>>> where to DMA to?
>>>
>>> The guest gets a dma range from the device-tree which is the range of
>>> device-side dma addresses it can use that correspond to the table.
>>>
>>> The guest kernel uses the normal linux iommu space allocator to allocate
>>> space in that region and uses H_PUT_TCE to populate the corresponding
>>> table entries.
>>>
>>> This is the same interface that is used for "real" iommu's with PCI
>>> devices btw.
>>
>> I'm still slightly puzzled here :). IIUC the main point of an IOMMU is for
>> the kernel to change where device accesses actually go to. So device DMAs
>> address A, goes through the IOMMU, in reality accesses address B.
>
> Right :-)
>
>> Now, how do we tell the devices implemented in qemu that they're supposed
>> to DMA to address B instead of A if the mapping table is kept in-kernel?
>
> Oh, bcs qemu mmaps the table :-)

That's the piece to the puzzle I was missing. Please document that
interface properly - it needs to be rock stable :)

Alex
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 12/13] kvm/powerpc: Accelerate H_PUT_TCE by implementing it in real mode
On Tue, 2011-05-17 at 11:31 +0200, Alexander Graf wrote:
> On 17.05.2011, at 11:11, Benjamin Herrenschmidt wrote:
>
> > On Tue, 2011-05-17 at 10:01 +0200, Alexander Graf wrote:
> >> I'm not sure I fully understand how this is supposed to work. If the
> >> tables are kept inside the kernel, how does userspace get to know
> >> where to DMA to?
> >
> > The guest gets a dma range from the device-tree which is the range of
> > device-side dma addresses it can use that correspond to the table.
> >
> > The guest kernel uses the normal linux iommu space allocator to allocate
> > space in that region and uses H_PUT_TCE to populate the corresponding
> > table entries.
> >
> > This is the same interface that is used for "real" iommu's with PCI
> > devices btw.
>
> I'm still slightly puzzled here :). IIUC the main point of an IOMMU is for
> the kernel to change where device accesses actually go to. So device DMAs
> address A, goes through the IOMMU, in reality accesses address B.

Right :-)

> Now, how do we tell the devices implemented in qemu that they're supposed
> to DMA to address B instead of A if the mapping table is kept in-kernel?

Oh, bcs qemu mmaps the table :-)

Cheers,
Ben.
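To make the "qemu mmaps the table" point concrete, here is a rough sketch of two mappings sharing one table through a file descriptor. It does not use the real KVM interface: a temporary file stands in for the fd that KVM_CREATE_SPAPR_TCE would return, and the names sketch_create_table_fd, sketch_map_table and TABLE_BYTES are made up for illustration only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical table size for the sketch: 512 entries of 8 bytes each. */
#define TABLE_BYTES 4096

/*
 * Stand-in for the fd the ioctl would hand back: an unlinked temporary
 * file sized to hold the table. Returns -1 on failure.
 */
static int sketch_create_table_fd(void)
{
	FILE *f = tmpfile();
	int fd;

	if (!f)
		return -1;
	fd = dup(fileno(f));	/* keep a plain fd after closing the FILE */
	fclose(f);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, TABLE_BYTES) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Map the table behind fd as a shared array of 64-bit entries.
 * Returns NULL on failure.
 */
static uint64_t *sketch_map_table(int fd)
{
	void *p;

	assert(fd >= 0);
	p = mmap(NULL, TABLE_BYTES, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	return p == MAP_FAILED ? NULL : (uint64_t *)p;
}
```

Because both mappings are MAP_SHARED on the same fd, an entry written through one view is visible through the other, which is the property the userspace device model relies on here.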
Re: [PATCH 12/13] kvm/powerpc: Accelerate H_PUT_TCE by implementing it in real mode
On 17.05.2011, at 11:11, Benjamin Herrenschmidt wrote:

> On Tue, 2011-05-17 at 10:01 +0200, Alexander Graf wrote:
>> I'm not sure I fully understand how this is supposed to work. If the
>> tables are kept inside the kernel, how does userspace get to know
>> where to DMA to?
>
> The guest gets a dma range from the device-tree which is the range of
> device-side dma addresses it can use that correspond to the table.
>
> The guest kernel uses the normal linux iommu space allocator to allocate
> space in that region and uses H_PUT_TCE to populate the corresponding
> table entries.
>
> This is the same interface that is used for "real" iommu's with PCI
> devices btw.

I'm still slightly puzzled here :). IIUC the main point of an IOMMU is for
the kernel to change where device accesses actually go to. So device DMAs
address A, goes through the IOMMU, in reality accesses address B.

Now, how do we tell the devices implemented in qemu that they're supposed
to DMA to address B instead of A if the mapping table is kept in-kernel?

Alex
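The "address A goes in, address B comes out" step can be sketched as a table lookup. This is only an illustration, not code from the patch: the 4 KiB page shift matches SPAPR_TCE_SHIFT from the patch, but the entry layout (page address in the upper bits, read/write permission in the two low bits) and the names sketch_translate, SK_TCE_READ and SK_TCE_WRITE are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TCE_SHIFT	12	/* 4 KiB I/O pages, as SPAPR_TCE_SHIFT in the patch */
#define TCE_PAGE_MASK	((1ull << TCE_SHIFT) - 1)

/* Assumed permission bits in the low bits of an entry (sketch only). */
#define SK_TCE_READ	0x1ull
#define SK_TCE_WRITE	0x2ull

/*
 * Translate device-side address "ioba" (address A) into the address the
 * access really goes to (address B). Returns false if the address is
 * outside the table or the entry does not permit the access.
 */
static bool sketch_translate(const uint64_t *tces, uint64_t num_tces,
			     uint64_t ioba, bool is_write, uint64_t *out)
{
	uint64_t idx = ioba >> TCE_SHIFT;
	uint64_t need = is_write ? SK_TCE_WRITE : SK_TCE_READ;
	uint64_t tce;

	assert(tces != NULL && out != NULL);
	if (idx >= num_tces)
		return false;

	tce = tces[idx];
	if ((tce & need) == 0)
		return false;

	/* Page address from the entry, offset within the page from ioba. */
	*out = (tce & ~TCE_PAGE_MASK) | (ioba & TCE_PAGE_MASK);
	return true;
}
```

With an entry of 0xABCD000 | SK_TCE_READ in slot 1, a read at device address 0x1234 would resolve to 0xABCD234, while a write to the same address would be refused.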
Re: [PATCH 12/13] kvm/powerpc: Accelerate H_PUT_TCE by implementing it in real mode
On Tue, 2011-05-17 at 10:01 +0200, Alexander Graf wrote:
> I'm not sure I fully understand how this is supposed to work. If the
> tables are kept inside the kernel, how does userspace get to know
> where to DMA to?

The guest gets a dma range from the device-tree which is the range of
device-side dma addresses it can use that correspond to the table.

The guest kernel uses the normal linux iommu space allocator to allocate
space in that region and uses H_PUT_TCE to populate the corresponding
table entries.

This is the same interface that is used for "real" iommu's with PCI
devices btw.

Cheers,
Ben.
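As a rough illustration of the "populate the corresponding table entries" step, the sketch below stores one entry per 4 KiB I/O page of a DMA window. It is not the handler from the patch: the struct only borrows the liobn and window_size fields, the 1 MiB window is arbitrary, and sketch_put_tce with its SK_OK / SK_BAD_PARAM return codes are hypothetical stand-ins for the real hcall and its status values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCE_SHIFT	12		/* 4 KiB I/O pages, as SPAPR_TCE_SHIFT */
#define WINDOW_SIZE	(1u << 20)	/* hypothetical 1 MiB DMA window */
#define NUM_TCES	(WINDOW_SIZE >> TCE_SHIFT)

/* Hypothetical status codes for the sketch, not the PAPR values. */
enum { SK_OK = 0, SK_BAD_PARAM = -1 };

/* Simplified table: one 64-bit entry per I/O page in the window. */
struct sketch_tce_table {
	uint64_t liobn;		/* logical I/O bus number naming this table */
	uint32_t window_size;	/* size of the DMA window in bytes */
	uint64_t tces[NUM_TCES];
};

/*
 * Store "tce" as the entry covering device-side address "ioba".
 * Rejects requests for another table or outside the DMA window.
 */
static long sketch_put_tce(struct sketch_tce_table *t, uint64_t liobn,
			   uint64_t ioba, uint64_t tce)
{
	assert(t != NULL);
	if (liobn != t->liobn)
		return SK_BAD_PARAM;
	if (ioba >= t->window_size)
		return SK_BAD_PARAM;

	t->tces[ioba >> TCE_SHIFT] = tce;
	return SK_OK;
}
```

The guest picks ioba from the range advertised in the device-tree, so the index computation is just the I/O page number within that window.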
Re: [PATCH 12/13] kvm/powerpc: Accelerate H_PUT_TCE by implementing it in real mode
On 11.05.2011, at 12:46, Paul Mackerras wrote:

> From: David Gibson
>
> This improves I/O performance for guests using the PAPR paravirtualization
> interface by making the H_PUT_TCE hcall faster, by implementing it in
> real mode. H_PUT_TCE is used for updating virtual IOMMU tables, and is
> used both for virtual I/O and for real I/O in the PAPR interface.
>
> Since this moves the IOMMU tables into the kernel, we define a new
> KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables.
> The ioctl returns a file descriptor which can be used to mmap the
> newly created table.
>
> Signed-off-by: Paul Mackerras
> ---
>  arch/powerpc/include/asm/kvm.h           |    9 +++
>  arch/powerpc/include/asm/kvm_book3s_64.h |    2 +
>  arch/powerpc/include/asm/kvm_host.h      |    9 +++
>  arch/powerpc/include/asm/kvm_ppc.h       |    2 +
>  arch/powerpc/kvm/Makefile                |    3 +-
>  arch/powerpc/kvm/book3s_64_vio_hv.c      |   73 +++
>  arch/powerpc/kvm/book3s_hv.c             |  116 +-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S  |    2 +-
>  arch/powerpc/kvm/powerpc.c               |   18 +
>  include/linux/kvm.h                      |    5 ++

This one definitely needs documentation :).

>  10 files changed, 236 insertions(+), 3 deletions(-)
>  create mode 100644 arch/powerpc/kvm/book3s_64_vio_hv.c
>
> diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
> index 18ea696..a9e641b 100644
> --- a/arch/powerpc/include/asm/kvm.h
> +++ b/arch/powerpc/include/asm/kvm.h
> @@ -22,6 +22,9 @@
>
>  #include
>
> +/* Select powerpc specific features in */
> +#define __KVM_HAVE_SPAPR_TCE
> +
>  struct kvm_regs {
>  	__u64 pc;
>  	__u64 cr;
> @@ -88,4 +91,10 @@ struct kvm_guest_debug_arch {
>  #define KVM_INTERRUPT_UNSET -2U
>  #define KVM_INTERRUPT_SET_LEVEL -3U
>
> +/* for KVM_CAP_SPAPR_TCE */
> +struct kvm_create_spapr_tce {
> +	__u64 liobn;
> +	__u32 window_size;
> +};
> +
>  #endif /* __LINUX_KVM_POWERPC_H */
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 4cadd61..e1a096b 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -25,4 +25,6 @@ static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu)
>  	return &get_paca()->shadow_vcpu;
>  }
>
> +#define SPAPR_TCE_SHIFT 12
> +
>  #endif /* __ASM_KVM_BOOK3S_64_H__ */
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index af6703e..cda183e 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -144,6 +144,14 @@ struct kvmppc_pginfo {
>  	atomic_t refcnt;
>  };
>
> +struct kvmppc_spapr_tce_table {
> +	struct list_head list;
> +	struct kvm *kvm;
> +	u64 liobn;
> +	u32 window_size;
> +	struct page *pages[0];
> +};
> +
>  struct kvm_arch {
>  	unsigned long hpt_virt;
>  	unsigned long ram_npages;
> @@ -157,6 +165,7 @@ struct kvm_arch {
>  	unsigned long host_sdr1;
>  	int tlbie_lock;
>  	unsigned short last_vcpu[NR_CPUS];
> +	struct list_head spapr_tce_tables;
>  };
>
>  struct kvmppc_pte {
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index b4ee11a..de683fa 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -117,6 +117,8 @@ extern long kvmppc_prepare_vrma(struct kvm *kvm,
>  extern void kvmppc_map_vrma(struct kvm *kvm,
>  			    struct kvm_userspace_memory_region *mem);
>  extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
> +extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> +				struct kvm_create_spapr_tce *args);
>  extern int kvmppc_core_init_vm(struct kvm *kvm);
>  extern void kvmppc_core_destroy_vm(struct kvm *kvm);
>  extern int kvmppc_core_prepare_memory_region(struct kvm *kvm,
> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
> index 37c1a60..8ba062f 100644
> --- a/arch/powerpc/kvm/Makefile
> +++ b/arch/powerpc/kvm/Makefile
> @@ -59,7 +59,8 @@ kvm-book3s_64_hv-objs := \
>  	book3s.o \
>  	book3s_hv.o \
>  	book3s_hv_interrupts.o \
> -	book3s_64_mmu_hv.o
> +	book3s_64_mmu_hv.o \
> +	book3s_64_vio_hv.o
>  kvm-objs-$(CONFIG_KVM_BOOK3S_64_HV) := $(kvm-book3s_64_hv-objs)
>
>  kvm-book3s_32-objs := \
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> new file mode 100644
> index 000..ea0f8c5
> --- /dev/null
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -0,0 +1,73 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but