Re: [Qemu-devel] [PATCH v7 1/4] spapr_iommu: Make in-kernel TCE table optional

Alexey Kardashevskiy Thu, 05 Jun 2014 16:19:25 -0700

On 06/06/2014 02:51 AM, Alexander Graf wrote:
> 
> On 05.06.14 16:33, Alexey Kardashevskiy wrote:
>> On 06/05/2014 11:36 PM, Alexander Graf wrote:
>>> On 05.06.14 15:33, Alexey Kardashevskiy wrote:
>>>> On 06/05/2014 11:15 PM, Alexander Graf wrote:
>>>>> On 05.06.14 15:10, Alexey Kardashevskiy wrote:
>>>>>> On 06/05/2014 11:06 PM, Alexander Graf wrote:
>>>>>>> On 05.06.14 08:43, Alexey Kardashevskiy wrote:
>>>>>>>> On 06/05/2014 03:49 PM, Alexey Kardashevskiy wrote:
>>>>>>>>> POWER KVM supports a KVM_CAP_SPAPR_TCE capability which allows
>>>>>>>>> allocating
>>>>>>>>> TCE tables in the host kernel memory and handle H_PUT_TCE requests
>>>>>>>>> targeted to specific LIOBN (logical bus number) right in the host
>>>>>>>>> without
>>>>>>>>> switching to QEMU. At the moment this is used for emulated devices
>>>>>>>>> only
>>>>>>>>> and the handler only puts TCE to the table. If the in-kernel
>>>>>>>>> H_PUT_TCE
>>>>>>>>> handler finds a LIOBN and corresponding table, it will put a TCE to
>>>>>>>>> the table and complete hypercall execution. The user space will
>>>>>>>>> not be
>>>>>>>>> notified.
>>>>>>>>>
>>>>>>>>> Upcoming VFIO support is going to use the same sPAPRTCETable device
>>>>>>>>> class
>>>>>>>>> so KVM_CAP_SPAPR_TCE is going to be used as well. That means that TCE
>>>>>>>>> tables for VFIO are going to be allocated in the host as well.
>>>>>>>>> However VFIO operates with real IOMMU tables and simple copying of
>>>>>>>>> a TCE to the real hardware TCE table will not work as guest physical
>>>>>>>>> to host physical address translation is required.
>>>>>>>>>
>>>>>>>>> So until the host kernel gets VFIO support for H_PUT_TCE, we had
>>>>>>>>> better not register VFIO's TCE in the host.
>>>>>>>>>
>>>>>>>>> This adds a bool @kvm_accel flag to the sPAPRTCETable device telling
>>>>>>>>> that sPAPRTCETable should not try allocating TCE table in the host
>>>>>>>>> kernel.
>>>>>>>>> Instead, the table will be created in QEMU.
>>>>>>>>>
>>>>>>>>> This adds a kvm_accel parameter to spapr_tce_new_table() to let
>>>>>>>>> users
>>>>>>>>> choose whether to use acceleration or not. At the moment it is
>>>>>>>>> enabled
>>>>>>>>> for VIO and emulated PCI. Upcoming VFIO support will set it to false.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>>>>>>>> ---
>>>>>>>>>
>>>>>>>>> This is a workaround but it lets me have one IOMMU device for VIO,
>>>>>>>>> emulated
>>>>>>>>> PCI and VFIO which is a good thing.
>>>>>>>>>
>>>>>>>>> The other way around would be a new KVM_CAP_SPAPR_TCE_VFIO
>>>>>>>>> capability but
>>>>>>>>> this needs kernel update.
>>>>>>>> Never mind, I'll make it a capability. I'll post capability
>>>>>>>> reservation
>>>>>>>> patch separately.
>>>>>>> Just rename the flag from "kvm_accel" to "vfio_accel", set it to
>>>>>>> true for
>>>>>>> vfio and false for emulated devices. Then the spapr_iommu file can
>>>>>>> check on
>>>>>>> the capability (and default to false for now, since it doesn't exist
>>>>>>> yet).
>>>>>> Is that ok if the flag does not have anything to do with VFIO per se? :)
>>>>> The flag means "use in-kernel acceleration if the vfio coupling
>>>>> capability
>>>>> is available", no?
>>>> It is a flag of sPAPRTCETable which is not supposed to know about VFIO at
>>>> all, it is just an IOMMU. But if you are ok with it, I have no reason
>>>> to be
>>>> unhappy either :)
>>>>
>>>>
>>>>
>>>>>>> That way you don't have to reserve a CAP today.
>>>>>> Why exactly cannot we do that today?
>>>>> Because the CAP namespace isn't a garbage bin we can just throw IDs at.
>>>>> Maybe we realize during patch review that we need completely different
>>>>> CAPs.
>>>> That was my first plan - to wait for KVM_CAP_SPAPR_TCE_64 to be available in
>>>> the kernel.
>>> So all you need are 64bit TCEs with bus_offset?
>>
>> No. I need 64bit IOBAs a.k.a. PCI bus addresses. The default DMA window is
>> just 1 or 2GB and it is mapped at 0 on PCI bus.
>>
>> TCEs are 64 bit already.
> 
> Ok, so the guest has to tell the PCI device to write to a specific window.
> That's a shame :).


No. The guest tells the device some address, that's it. The guest allocates
those addresses from some window which the host, the guest and the PHB know
about but the device does not. What is a shame here?
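
To illustrate the point, here is a toy model (not QEMU or kernel code; every
toy_* name and the 4K page size are assumptions made up for this sketch). The
guest carves bus addresses out of a window it knows about, and the device only
ever receives the resulting plain address:

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ull   /* assumed IOMMU page size for the sketch */

/* Hypothetical description of a DMA window shared by host, guest and PHB. */
struct toy_dma_window {
    uint64_t start;  /* bus address where the window begins (0 for the default one) */
    uint64_t size;   /* window length in bytes */
    uint64_t next;   /* next free offset, simple page-granular bump allocator */
};

/* Guest side: take npages from the window; returns a bus address,
 * or UINT64_MAX when the window has no room left. */
uint64_t toy_alloc_ioba(struct toy_dma_window *w, uint64_t npages)
{
    if (npages > (w->size - w->next) / TOY_PAGE_SIZE)
        return UINT64_MAX;

    uint64_t ioba = w->start + w->next;
    w->next += npages * TOY_PAGE_SIZE;
    return ioba;
}

/* PHB side: the only check applied to an address coming from the device. */
int toy_in_window(const struct toy_dma_window *w, uint64_t ioba)
{
    return ioba >= w->start && ioba - w->start < w->size;
}
```

The device never sees struct toy_dma_window; it is just handed the value
returned by toy_alloc_ioba().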


> 
>>
>>> What about the missing
>>> in-kernel modification of the shadow TCEs on H_PUT_TCE? I thought that's
>>> what this is really about.
>> This I do not understand :(
> 
> How does real mode H_PUT_TCE emulation know that it needs to notify user
> space to establish the map?

If it wants to pass control to user space, it returns H_TOO_HARD. This
happens, for example, if the LIOBN was not registered in KVM.
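
A rough sketch of that fallback, as a self-contained toy model rather than the
actual kernel handler. The toy_* names, the fixed-size table list, the 4K page
shift and the bounds check are assumptions for illustration, and the numeric
return codes should be treated as placeholders:

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder return codes for the sketch. */
enum { H_SUCCESS = 0, H_PARAMETER = -4, H_TOO_HARD = 9999 };

#define TOY_PAGE_SHIFT 12
#define TOY_MAX_TABLES 4

/* Hypothetical in-kernel TCE table, keyed by LIOBN. */
struct toy_tce_table {
    uint32_t liobn;
    uint64_t window_size;  /* bytes of bus address space covered */
    uint64_t *tces;        /* one entry per IOMMU page */
};

static struct toy_tce_table *toy_tables[TOY_MAX_TABLES];

/* Register a table so the fast path can find it; returns 0, or -1 if full. */
int toy_register_table(struct toy_tce_table *t)
{
    for (size_t i = 0; i < TOY_MAX_TABLES; i++) {
        if (toy_tables[i] == NULL) {
            toy_tables[i] = t;
            return 0;
        }
    }
    return -1;
}

/* Fast-path H_PUT_TCE: handle the call if the LIOBN is known,
 * otherwise return H_TOO_HARD so the caller exits to user space. */
long toy_h_put_tce(uint32_t liobn, uint64_t ioba, uint64_t tce)
{
    for (size_t i = 0; i < TOY_MAX_TABLES; i++) {
        struct toy_tce_table *t = toy_tables[i];

        if (t == NULL || t->liobn != liobn)
            continue;
        if (ioba >= t->window_size)
            return H_PARAMETER;
        t->tces[ioba >> TOY_PAGE_SHIFT] = tce;
        return H_SUCCESS;  /* completed here, user space is not notified */
    }
    return H_TOO_HARD;     /* unknown LIOBN: let user space handle it */
}
```

With this shape, a table that was never registered in the kernel simply falls
through to the H_TOO_HARD return, and the hypercall is then handled in QEMU.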



-- 
Alexey
