On 06/06/2014 02:51 AM, Alexander Graf wrote:
>
> On 05.06.14 16:33, Alexey Kardashevskiy wrote:
>> On 06/05/2014 11:36 PM, Alexander Graf wrote:
>>> On 05.06.14 15:33, Alexey Kardashevskiy wrote:
>>>> On 06/05/2014 11:15 PM, Alexander Graf wrote:
>>>>> On 05.06.14 15:10, Alexey Kardashevskiy wrote:
>>>>>> On 06/05/2014 11:06 PM, Alexander Graf wrote:
>>>>>>> On 05.06.14 08:43, Alexey Kardashevskiy wrote:
>>>>>>>> On 06/05/2014 03:49 PM, Alexey Kardashevskiy wrote:
>>>>>>>>> POWER KVM supports a KVM_CAP_SPAPR_TCE capability which allows
>>>>>>>>> allocating TCE tables in the host kernel memory and handling
>>>>>>>>> H_PUT_TCE requests targeted to a specific LIOBN (logical bus
>>>>>>>>> number) right in the host without switching to QEMU. At the
>>>>>>>>> moment this is used for emulated devices only and the handler
>>>>>>>>> only puts a TCE to the table. If the in-kernel H_PUT_TCE handler
>>>>>>>>> finds a LIOBN and corresponding table, it will put a TCE to the
>>>>>>>>> table and complete hypercall execution. The user space will not
>>>>>>>>> be notified.
>>>>>>>>>
>>>>>>>>> Upcoming VFIO support is going to use the same sPAPRTCETable
>>>>>>>>> device class so KVM_CAP_SPAPR_TCE is going to be used as well.
>>>>>>>>> That means that TCE tables for VFIO are going to be allocated in
>>>>>>>>> the host as well. However VFIO operates with real IOMMU tables
>>>>>>>>> and simple copying of a TCE to the real hardware TCE table will
>>>>>>>>> not work as guest physical to host physical address translation
>>>>>>>>> is required.
>>>>>>>>>
>>>>>>>>> So until the host kernel gets VFIO support for H_PUT_TCE, we had
>>>>>>>>> better not register VFIO's TCE in the host.
>>>>>>>>>
>>>>>>>>> This adds a bool @kvm_accel flag to the sPAPRTCETable device
>>>>>>>>> telling that sPAPRTCETable should not try allocating TCE table
>>>>>>>>> in the host kernel. Instead, the table will be created in QEMU.
>>>>>>>>>
>>>>>>>>> This adds a kvm_accel parameter to spapr_tce_new_table() to let
>>>>>>>>> users choose whether to use acceleration or not. At the moment
>>>>>>>>> it is enabled for VIO and emulated PCI. Upcoming VFIO support
>>>>>>>>> will set it to false.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>>>>>>>> ---
>>>>>>>>>
>>>>>>>>> This is a workaround but it lets me have one IOMMU device for
>>>>>>>>> VIO, emulated PCI and VFIO which is a good thing.
>>>>>>>>>
>>>>>>>>> The other way around would be a new KVM_CAP_SPAPR_TCE_VFIO
>>>>>>>>> capability but this needs kernel update.
>>>>>>>> Never mind, I'll make it a capability. I'll post capability
>>>>>>>> reservation patch separately.
>>>>>>> Just rename the flag from "kvm_accel" to "vfio_accel", set it to
>>>>>>> true for vfio and false for emulated devices. Then the spapr_iommu
>>>>>>> file can check on the capability (and default to false for now,
>>>>>>> since it doesn't exist yet).
>>>>>> Is that ok if the flag does not have to do anything with VFIO per se? :)
>>>>> The flag means "use in-kernel acceleration if the vfio coupling
>>>>> capability is available", no?
>>>> It is a flag of sPAPRTCETable which is not supposed to know about VFIO at
>>>> all, it is just an IOMMU. But if you are ok with it, I have no reason
>>>> to be unhappy either :)
>>>>
>>>>>>> That way you don't have to reserve a CAP today.
>>>>>> Why exactly cannot we do that today?
>>>>> Because the CAP namespace isn't a garbage bin we can just throw IDs at.
>>>>> Maybe we realize during patch review that we need completely different
>>>>> CAPs.
>>>> That was my first plan - to wait for KVM_CAP_SPAPR_TCE_64 to be available
>>>> in the kernel.
>>> So all you need are 64bit TCEs with bus_offset?
>>
>> No. I need 64bit IOBAs a.k.a. PCI bus addresses. The default DMA window is
>> just 1 or 2GB and it is mapped at 0 on PCI bus.
>>
>> TCEs are 64 bit already.
>
> Ok, so the guest has to tell the PCI device to write to a specific window.
> That's a shame :).

No. The guest tells the device some address, that's it. The guest allocates
those addresses from some window which the host, the guest and the PHB know
about, but not the device. What is a shame here?

>
>>
>>> What about the missing
>>> in-kernel modification of the shadow TCEs on H_PUT_TCE? I thought that's
>>> what this is really about.
>> This I do not understand :(
>
> How does real mode H_PUT_TCE emulation know that it needs to notify user
> space to establish the map?

If it wants to pass control to the user space, it returns H_TOO_HARD. This
happens, for example, if the LIOBN was not registered in KVM.

--
Alexey
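
For readers following the thread, the H_TOO_HARD fallback described above can be sketched roughly as below. This is a simplified user-space model and not the kernel's actual handler: struct tce_table, find_table(), the table array and the 4K page shift are all assumptions made for illustration, and the numeric return codes are only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypercall return codes; the numeric values here are illustrative. */
#define H_SUCCESS    0L
#define H_PARAMETER  (-4L)
#define H_TOO_HARD   9999L  /* "not handled here, pass it to user space" */

#define TCE_PAGE_SHIFT 12   /* assumed 4K IOMMU pages */

/* Hypothetical stand-in for a TCE table registered for one LIOBN. */
struct tce_table {
    uint32_t liobn;
    uint64_t window_size;   /* DMA window size in bytes */
    uint64_t *tces;         /* one entry per IOMMU page */
};

/* Look up the table registered for this LIOBN, or NULL if there is none. */
static struct tce_table *find_table(struct tce_table *tables, size_t n,
                                    uint32_t liobn)
{
    for (size_t i = 0; i < n; i++) {
        if (tables[i].liobn == liobn) {
            return &tables[i];
        }
    }
    return NULL;
}

/* Model of the fast-path handler: store the TCE if the LIOBN is known,
 * otherwise ask the caller to fall back to user space. */
static long h_put_tce(struct tce_table *tables, size_t n,
                      uint32_t liobn, uint64_t ioba, uint64_t tce)
{
    assert(tables != NULL || n == 0);

    struct tce_table *t = find_table(tables, n, liobn);
    if (t == NULL) {
        return H_TOO_HARD;      /* LIOBN not registered: user space handles it */
    }
    if (ioba >= t->window_size) {
        return H_PARAMETER;     /* address outside the DMA window */
    }
    t->tces[ioba >> TCE_PAGE_SHIFT] = tce;
    return H_SUCCESS;           /* done; user space is never notified */
}
```

In this model, a table that was deliberately not registered (as the patch proposes for VFIO) always takes the H_TOO_HARD path, so the hypercall ends up being handled in user space.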