Hi, On 5/28/20 10:38 AM, Jean-Philippe Brucker wrote: > [+ Shameer] > > On Thu, May 28, 2020 at 09:43:46AM +0200, Auger Eric wrote: >> Hi, >> >> On 5/28/20 9:23 AM, Jean-Philippe Brucker wrote: >>> On Thu, May 28, 2020 at 10:45:14AM +0530, Srinath Mannam wrote: >>>> On Wed, May 27, 2020 at 11:00 PM Robin Murphy <robin.mur...@arm.com> wrote: >>>>> >>>> Thanks Robin for your quick response. >>>>> On 2020-05-27 17:03, Srinath Mannam wrote: >>>>>> This patch gives the provision to change default value of MSI IOVA base >>>>>> to platform's suitable IOVA using module parameter. The present >>>>>> hardcoded MSI IOVA base may not be the accessible IOVA ranges of >>>>>> platform. >>>>> >>>>> That in itself doesn't seem entirely unreasonable; IIRC the current >>>>> address is just an arbitrary choice to fit nicely into Qemu's memory >>>>> map, and there was always the possibility that it wouldn't suit >>>>> everything. >>>>> >>>>>> Since commit aadad097cd46 ("iommu/dma: Reserve IOVA for PCIe inaccessible >>>>>> DMA address"), inaccessible IOVA address ranges parsed from dma-ranges >>>>>> property are reserved. >>> >>> I don't understand why we only reserve the PCIe windows for DMA domains. >>> Shouldn't VFIO also prevent userspace from mapping them? >> >> VFIO prevents userspace from DMA mapping iovas within reserved regions: >> 9b77e5c79840 vfio/type1: check dma map request is within a valid iova range > > Right but I was asking specifically about the IOVA reservation introduced > by commit aadad097cd46. They are not registered as reserved regions within > the IOMMU core, they are only taken into account by dma-iommu.c when > creating a DMA domain. As VFIO uses UNMANAGED domains, it isn't aware of > those regions and they won't be seen by vfio_iommu_resv_exclude(). > > It looks like the PCIe regions used to be common until cd2c9fcf5c66 > ("iommu/dma: Move PCI window region reservation back into dma specific > path.") But I couldn't find the justification for this commit.
Yes I noticed that as well when debugging the above mentioned case before and after cd2c9fcf5c66. I do not remember about the rationale of removing the DMA host brige windows from the resv regions. Did it break a legacy case? > > The thing is, if VFIO isn't aware of the reserved PCIe windows, then > allowing VFIO or userspace to choose MSI_IOVA_BASE won't solve the problem > reported by Srinath, because they could well choose an IOVA within the > PCIe window... I agree with you Thanks Eric > > Thanks, > Jean > >> but it does not prevent the SW MSI region chosen by the kernel from >> colliding with other reserved regions (esp. PCIe host bridge windows). >> >> If they were >>> part of the common reserved regions then we could have VFIO choose a >>> SW_MSI region among the remaining free space. >> As Robin said this was the initial chosen approach >> [PATCH 10/10] vfio: allow the user to register reserved iova range for >> MSI mapping >> https://patchwork.kernel.org/patch/8121641/ >> >> Some additional background about why the static SW MSI region chosen by >> the kernel was later chosen: >> Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM >> PCIe/MSI passthrough on ARM/ARM64 (Alt II)) >> https://lists.linuxfoundation.org/pipermail/iommu/2016-November/019060.html >> >> Thanks >> >> Eric >> >> >> It would just need a >>> different way of asking the IOMMU driver if a SW_MSI is needed, for >>> example with a domain attribute. >>> >>> Thanks, >>> Jean >>> >>>>> >>>>> That, however, doesn't seem to fit here; iommu-dma maps MSI doorbells >>>>> dynamically, so they aren't affected by reserved regions any more than >>>>> regular DMA pages are. In fact, it explicitly ignores the software MSI >>>>> region, since as the comment says, it *is* the software that manages >>>>> those. >>>> Yes you are right, we don't see any issues with kernel drivers(PCI EP) >>>> because >>>> MSI IOVA allocated dynamically by honouring reserved regions same as DMA >>>> pages. >>>>> >>>>> The MSI_IOVA_BASE region exists for VFIO, precisely because in that case >>>>> the kernel *doesn't* control the address space, but still needs some way >>>>> to steal a bit of it for MSIs that the guest doesn't necessarily know >>>>> about, and give userspace a fighting chance of knowing what it's taken. >>>>> I think at the time we discussed the idea of adding something to the >>>>> VFIO uapi such that userspace could move this around if it wanted or >>>>> needed to, but decided we could live without that initially. Perhaps now >>>>> the time has come? >>>> Yes, we see issues only with user-space drivers(DPDK) in which >>>> MSI_IOVA_BASE >>>> region is considered to map MSI registers. This patch helps us to fix the >>>> issue. >>>> >>>> Thanks, >>>> Srinath. >>>>> >>>>> Robin. >>>>> >>>>>> If any platform has the limitaion to access default MSI IOVA, then it can >>>>>> be changed using "arm-smmu.msi_iova_base=0xa0000000" command line >>>>>> argument. >>>>>> >>>>>> Signed-off-by: Srinath Mannam <srinath.man...@broadcom.com> >>>>>> --- >>>>>> drivers/iommu/arm-smmu.c | 5 ++++- >>>>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c >>>>>> index 4f1a350..5e59c9d 100644 >>>>>> --- a/drivers/iommu/arm-smmu.c >>>>>> +++ b/drivers/iommu/arm-smmu.c >>>>>> @@ -72,6 +72,9 @@ static bool disable_bypass = >>>>>> module_param(disable_bypass, bool, S_IRUGO); >>>>>> MODULE_PARM_DESC(disable_bypass, >>>>>> "Disable bypass streams such that incoming transactions from >>>>>> devices that are not attached to an iommu domain will report an abort >>>>>> back to the device and will not be allowed to pass through the SMMU."); >>>>>> +static unsigned long msi_iova_base = MSI_IOVA_BASE; >>>>>> +module_param(msi_iova_base, ulong, S_IRUGO); >>>>>> +MODULE_PARM_DESC(msi_iova_base, "msi iova base address."); >>>>>> >>>>>> struct arm_smmu_s2cr { >>>>>> struct iommu_group *group; >>>>>> @@ -1566,7 +1569,7 @@ static void arm_smmu_get_resv_regions(struct >>>>>> device *dev, >>>>>> struct iommu_resv_region *region; >>>>>> int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; >>>>>> >>>>>> - region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH, >>>>>> + region = iommu_alloc_resv_region(msi_iova_base, MSI_IOVA_LENGTH, >>>>>> prot, IOMMU_RESV_SW_MSI); >>>>>> if (!region) >>>>>> return; >>>>>> >>> >>> _______________________________________________ >>> linux-arm-kernel mailing list >>> linux-arm-ker...@lists.infradead.org >>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >>> >> > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >