Hi,

On 5/28/20 10:38 AM, Jean-Philippe Brucker wrote:
> [+ Shameer]
> 
> On Thu, May 28, 2020 at 09:43:46AM +0200, Auger Eric wrote:
>> Hi,
>>
>> On 5/28/20 9:23 AM, Jean-Philippe Brucker wrote:
>>> On Thu, May 28, 2020 at 10:45:14AM +0530, Srinath Mannam wrote:
>>>> On Wed, May 27, 2020 at 11:00 PM Robin Murphy <robin.mur...@arm.com> wrote:
>>>>>
>>>> Thanks Robin for your quick response.
>>>>> On 2020-05-27 17:03, Srinath Mannam wrote:
>>>>>> This patch gives the provision to change default value of MSI IOVA base
>>>>>> to platform's suitable IOVA using module parameter. The present
>>>>>> hardcoded MSI IOVA base may not be the accessible IOVA ranges of 
>>>>>> platform.
>>>>>
>>>>> That in itself doesn't seem entirely unreasonable; IIRC the current
>>>>> address is just an arbitrary choice to fit nicely into Qemu's memory
>>>>> map, and there was always the possibility that it wouldn't suit 
>>>>> everything.
>>>>>
>>>>>> Since commit aadad097cd46 ("iommu/dma: Reserve IOVA for PCIe inaccessible
>>>>>> DMA address"), inaccessible IOVA address ranges parsed from dma-ranges
>>>>>> property are reserved.
>>>
>>> I don't understand why we only reserve the PCIe windows for DMA domains.
>>> Shouldn't VFIO also prevent userspace from mapping them?
>>
>> VFIO prevents userspace from DMA mapping iovas within reserved regions:
>> 9b77e5c79840  vfio/type1: check dma map request is within a valid iova range
> 
> Right but I was asking specifically about the IOVA reservation introduced
> by commit aadad097cd46. They are not registered as reserved regions within
> the IOMMU core, they are only taken into account by dma-iommu.c when
> creating a DMA domain. As VFIO uses UNMANAGED domains, it isn't aware of
> those regions and they won't be seen by vfio_iommu_resv_exclude().
> 
> It looks like the PCIe regions used to be common until cd2c9fcf5c66
> ("iommu/dma: Move PCI window region reservation back into dma specific
> path.") But I couldn't find the justification for this commit.

Yes I noticed that as well when debugging the above mentioned case
before and after cd2c9fcf5c66. I do not remember about the rationale of
removing the DMA host brige windows from the resv regions. Did it break
a legacy case?
> 
> The thing is, if VFIO isn't aware of the reserved PCIe windows, then
> allowing VFIO or userspace to choose MSI_IOVA_BASE won't solve the problem
> reported by Srinath, because they could well choose an IOVA within the
> PCIe window...
I agree with you

Thanks

Eric
> 
> Thanks,
> Jean
> 
>> but it does not prevent the SW MSI region chosen by the kernel from
>> colliding with other reserved regions (esp. PCIe host bridge windows).
>>
>>   If they were
>>> part of the common reserved regions then we could have VFIO choose a
>>> SW_MSI region among the remaining free space.
>> As Robin said this was the initial chosen approach
>> [PATCH 10/10] vfio: allow the user to register reserved iova range for
>> MSI mapping
>> https://patchwork.kernel.org/patch/8121641/
>>
>> Some additional background about why the static SW MSI region chosen by
>> the kernel was later chosen:
>> Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM
>> PCIe/MSI passthrough on ARM/ARM64 (Alt II))
>> https://lists.linuxfoundation.org/pipermail/iommu/2016-November/019060.html
>>
>> Thanks
>>
>> Eric
>>
>>
>>  It would just need a
>>> different way of asking the IOMMU driver if a SW_MSI is needed, for
>>> example with a domain attribute.
>>>
>>> Thanks,
>>> Jean
>>>
>>>>>
>>>>> That, however, doesn't seem to fit here; iommu-dma maps MSI doorbells
>>>>> dynamically, so they aren't affected by reserved regions any more than
>>>>> regular DMA pages are. In fact, it explicitly ignores the software MSI
>>>>> region, since as the comment says, it *is* the software that manages 
>>>>> those.
>>>> Yes you are right, we don't see any issues with kernel drivers(PCI EP) 
>>>> because
>>>> MSI IOVA allocated dynamically by honouring reserved regions same as DMA 
>>>> pages.
>>>>>
>>>>> The MSI_IOVA_BASE region exists for VFIO, precisely because in that case
>>>>> the kernel *doesn't* control the address space, but still needs some way
>>>>> to steal a bit of it for MSIs that the guest doesn't necessarily know
>>>>> about, and give userspace a fighting chance of knowing what it's taken.
>>>>> I think at the time we discussed the idea of adding something to the
>>>>> VFIO uapi such that userspace could move this around if it wanted or
>>>>> needed to, but decided we could live without that initially. Perhaps now
>>>>> the time has come?
>>>> Yes, we see issues only with user-space drivers(DPDK) in which 
>>>> MSI_IOVA_BASE
>>>> region is considered to map MSI registers. This patch helps us to fix the 
>>>> issue.
>>>>
>>>> Thanks,
>>>> Srinath.
>>>>>
>>>>> Robin.
>>>>>
>>>>>> If any platform has the limitaion to access default MSI IOVA, then it can
>>>>>> be changed using "arm-smmu.msi_iova_base=0xa0000000" command line 
>>>>>> argument.
>>>>>>
>>>>>> Signed-off-by: Srinath Mannam <srinath.man...@broadcom.com>
>>>>>> ---
>>>>>>   drivers/iommu/arm-smmu.c | 5 ++++-
>>>>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>>>> index 4f1a350..5e59c9d 100644
>>>>>> --- a/drivers/iommu/arm-smmu.c
>>>>>> +++ b/drivers/iommu/arm-smmu.c
>>>>>> @@ -72,6 +72,9 @@ static bool disable_bypass =
>>>>>>   module_param(disable_bypass, bool, S_IRUGO);
>>>>>>   MODULE_PARM_DESC(disable_bypass,
>>>>>>       "Disable bypass streams such that incoming transactions from 
>>>>>> devices that are not attached to an iommu domain will report an abort 
>>>>>> back to the device and will not be allowed to pass through the SMMU.");
>>>>>> +static unsigned long msi_iova_base = MSI_IOVA_BASE;
>>>>>> +module_param(msi_iova_base, ulong, S_IRUGO);
>>>>>> +MODULE_PARM_DESC(msi_iova_base, "msi iova base address.");
>>>>>>
>>>>>>   struct arm_smmu_s2cr {
>>>>>>       struct iommu_group              *group;
>>>>>> @@ -1566,7 +1569,7 @@ static void arm_smmu_get_resv_regions(struct 
>>>>>> device *dev,
>>>>>>       struct iommu_resv_region *region;
>>>>>>       int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
>>>>>>
>>>>>> -     region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
>>>>>> +     region = iommu_alloc_resv_region(msi_iova_base, MSI_IOVA_LENGTH,
>>>>>>                                        prot, IOMMU_RESV_SW_MSI);
>>>>>>       if (!region)
>>>>>>               return;
>>>>>>
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-ker...@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>>
>>
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

Reply via email to