RE: [PATCH v2 06/12] iommu/vt-d: Add second level page table interface
> From: Raj, Ashok
> Sent: Saturday, September 8, 2018 1:43 AM
>
> On Fri, Sep 07, 2018 at 10:47:11AM +0800, Lu Baolu wrote:
> > >> +
> > >> +	intel_pasid_clear_entry(dev, pasid);
> > >> +
> > >> +	if (!ecap_coherent(iommu->ecap)) {
> > >> +		pte = intel_pasid_get_entry(dev, pasid);
> > >> +		clflush_cache_range(pte, sizeof(*pte));
> > >> +	}
> > >> +
> > >> +	pasid_based_pasid_cache_invalidation(iommu, did, pasid);
> > >> +	pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
> > >> +
> > >> +	/* Device IOTLB doesn't need to be flushed in caching mode. */
> > >> +	if (!cap_caching_mode(iommu->cap))
> > >> +		pasid_based_dev_iotlb_cache_invalidation(iommu, dev, pasid);
> > >
> > > Can you elaborate, or point to any spec reference?
> >
> > In the driver, the device IOTLB doesn't get flushed in caching mode. I just
> > followed what has been done there.
> >
> > It also makes sense to me since only the bare metal host needs to
> > consider whether and how to flush the device IOTLB.
>
> DavidW might remember; I think the idea was to help with the cost
> of virtualization: we can avoid taking two exits by handling it
> directly when we do the IOTLB flushing instead.

OK, performance-wise it makes sense, though strictly speaking it doesn't
follow the spec...

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
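The policy discussed above can be summed up in one predicate: skip the PASID-based device-IOTLB invalidation whenever the IOMMU reports caching mode (i.e. it is emulated by a hypervisor), because the host will take care of it when shadowing the guest's IOTLB flush, saving an extra VM exit. A minimal standalone sketch of that decision follows; the bit position and all names here are illustrative stand-ins, not the actual VT-d register layout or intel-iommu code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability-register bit for Caching Mode (CM); the real
 * bit position is defined by the VT-d spec, this one is illustrative. */
#define CAP_CM_BIT (1ull << 7)

static bool cap_caching_mode(uint64_t cap)
{
	return cap & CAP_CM_BIT;
}

/*
 * Decide whether a PASID-based device-IOTLB invalidation must be issued.
 * In caching mode the flush is skipped: only the bare-metal host needs
 * to flush the device IOTLB, and doing it in the host when it handles
 * the guest's IOTLB flush avoids a second VM exit.
 */
static bool need_dev_iotlb_flush(uint64_t cap)
{
	return !cap_caching_mode(cap);
}
```

A guest driver running on an emulated IOMMU (CM set) would thus issue only the PASID-cache and IOTLB invalidations, never the device-IOTLB one.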
Re: of_dma_request_slave_channel() failed ?
Hi Geert

Thank you for your feedback

> > -
> > commit ac6bbf0cdf4206c517ac9789814c23e372ebce4d
> > Author: Rob Herring
> > Date: Mon Jul 9 09:41:52 2018 -0600
> >
> >     iommu: Remove IOMMU_OF_DECLARE
> >
> >     Now that we use the driver core to stop deferred probe for missing
> >     drivers, IOMMU_OF_DECLARE can be removed.
> >
> >     This is slightly less optimal than having a list of built-in drivers in
> >     that we'll now defer probe twice before giving up. This shouldn't have a
> >     significant impact on boot times as past discussions about deferred
> >     probe have given no evidence of deferred probe having a substantial
> >     impact.
> > ...
> (snip)
> I assume you wrote commit 6c92d5a2744e2761 ("ASoC: rsnd: don't fallback
> to PIO mode when -EPROBE_DEFER") to work around this?

Yes, it is a workaround for it.

> While this got rid of the error messages, and postpones sound initialization
> until the audio DMAC is available, it does mean that the driver will _never_
> fall back to PIO.
>
> I.e. if CONFIG_RCAR_DMAC=n, audio will fail to probe instead of falling
> back to PIO.

Focusing only on sound here: the purpose of PIO mode is checking the basic
HW connection, clock settings, etc. Without DMAC support, many sound
features can't be used, and PIO mode supports "SSI" only. If the DT has
SRC/CTU/DVC settings for sound, it needs DMA support anyway.

	&rcar_sound {
		...
		ports {
			rsnd_port0: port@0 {
				rsnd_endpoint0: endpoint {
					...
		=>			playback = <&ssi0 &src0 &dvc0>;
				};
			};
		};
	};

Before the 6c92d5a2744e2761 patch, the driver would forcibly ignore non-SSI
modules and try to use PIO mode. I'm not sure whether that was "kind
support" or "overkill support". After this patch, DMA is needed; otherwise,
probe will fail. The DT shouldn't have non-SSI modules if you want to use
PIO mode:

		/* use PIO mode */
	-	playback = <&ssi0 &src0 &dvc0>;
	+	playback = <&ssi0>;
RE: [PATCH 0/7 v7] Support for fsl-mc bus and its devices in SMMU
> -----Original Message-----
> From: Will Deacon [mailto:will.dea...@arm.com]
> Sent: Wednesday, September 12, 2018 10:29 PM
> To: Nipun Gupta
> Cc: j...@8bytes.org; robin.mur...@arm.com; robh...@kernel.org;
> r...@kernel.org; mark.rutl...@arm.com; catalin.mari...@arm.com;
> gre...@linuxfoundation.org; Laurentiu Tudor; bhelg...@google.com;
> h...@lst.de; m.szyprow...@samsung.com; shawn...@kernel.org;
> frowand.l...@gmail.com; iommu@lists.linux-foundation.org;
> linux-ker...@vger.kernel.org; devicet...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-...@lists.ozlabs.org;
> linux-p...@vger.kernel.org; Bharat Bhushan; stuyo...@gmail.com; Leo Li
> Subject: Re: [PATCH 0/7 v7] Support for fsl-mc bus and its devices in SMMU
>
> Hi Nipun,
>
> On Mon, Sep 10, 2018 at 07:19:14PM +0530, Nipun Gupta wrote:
> > This patchset defines IOMMU DT binding for fsl-mc bus and adds
> > support in SMMU for fsl-mc bus.
> >
> > These patches
> > - Define property 'iommu-map' for fsl-mc bus (patch 1)
> > - Integrates the fsl-mc bus with the SMMU using this
> >   IOMMU binding (patch 2,3,4)
> > - Adds the dma configuration support for fsl-mc bus (patch 5, 6)
> > - Updates the fsl-mc device node with iommu/dma related changes (patch 7)
>
> It looks like you have all the Acks in place for this series now, so I
> assume it's going to go via Joerg directly. Is that right?

This is what I am expecting :)

> Will
Re: [PATCH v5 0/5] add non-strict mode support for arm-smmu-v3
On 2018/9/13 1:12, Robin Murphy wrote:
> On 12/09/18 17:57, Will Deacon wrote:
>> Hi all,
>>
>> On Wed, Aug 15, 2018 at 09:28:25AM +0800, Zhen Lei wrote:
>>> v4 -> v5:
>>> 1. change the type of global variable and struct member named "non_strict" from
>>>    "int" to "bool".
>>> 2. cancel the unnecessary parameter "strict" of __arm_lpae_unmap which was added
>>>    in v4.
>>> 3. change boot option "arm_iommu" to "iommu.non_strict".
>>> 4. convert __iommu_dma_unmap to use iommu_unmap_fast()/iommu_tlb_sync(), because
>>>    non-leaf unmaps still need to be synchronous.
>>>
>>> Thanks for Robin's review comments.
>>
>> Since this is 90% of the way there now, I suggest Robin picks up what's here
>> and incorporates his remaining review comments directly (especially since it
>> sounded like Zhen Lei hasn't got much free time lately). With that, I can
>> queue this lot via my smmu branch, which already has some stuff queued
>> for SMMUv3 and io-pgtable.
>>
>> Please shout if you have any objections, but I'm keen for this not to
>> languish on the lists given how close it is!
>
> Sure, having got this far I'd like to see it get done too. I'll make some
> time and give it a go.

Thank you very much. I've been too busy lately, and I won't have time this
week either; I also think this has been pending too long. Sorry, and thanks
again.

> Robin.

--
Thanks!
Best Regards
RE: [RFC PATCH v2 08/10] vfio/type1: Add domain at(de)taching group helpers
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Thursday, September 13, 2018 1:54 AM
>
> On 12/09/2018 06:02, Lu Baolu wrote:
> > Hi,
> >
> > On 09/11/2018 12:23 AM, Jean-Philippe Brucker wrote:
> >> On 30/08/2018 05:09, Lu Baolu wrote:
> >>> +static int vfio_detach_aux_domain(struct device *dev, void *data)
> >>> +{
> >>> +	struct iommu_domain *domain = data;
> >>> +
> >>> +	vfio_mdev_set_aux_domain(dev, NULL);
> >>> +	iommu_detach_device(domain, dev->parent);
> >>
> >> I think that's only going to work for vt-d, which doesn't use a
> >> default_domain. For other drivers, iommu.c ends up calling
> >> domain->ops->attach_dev(default_domain, dev) here, which isn't what
> >> we want.
> >
> > This doesn't break any functionality. It takes effect only if iommu
> > hardware supports finer granular translation and advertises it through
> > the API.
> >
> >> That was my concern with reusing attach/detach_dev callbacks for PASID
> >> management. The attach_dev code of IOMMU drivers already has to deal
> >> with toggling between default and unmanaged domain. Dealing with more
> >> state transitions in the same path is going to be difficult.
> >
> > Let's consider it from the API point of view.
> >
> > We have iommu_a(de)ttach_device() APIs to attach or detach a domain
> > to/from a device. We should avoid applying a limitation of "these are
> > only for single domain case, for multiple domains, use another API".
>
> That's an idealized view of the API, the actual semantics aren't as
> simple. For IOMMU drivers that implement IOMMU_DOMAIN_DMA in their
> domain_alloc operation (Arm SMMU, AMD IOMMU, ...), attach_dev attaches
> an unmanaged domain, detach_dev reattaches the default DMA domain, and
> the detach_dev IOMMU op is not called. We can't change that now, so it's
> difficult to add more functionality to attach_dev and detach_dev.

Now we have four possible usages of a(de)ttach_device:

1) The normal DMA API path, i.e. IOMMU_DOMAIN_DMA, for DMA operations in
   the host/parent device driver;

2) The VFIO passthrough path, when the physical device is assigned to user
   space or a guest driver;

3) mdev passthrough path 1, when an mdev is assigned to user space or a
   guest driver. Here the mdev is a wrapper around a random PCI device;

4) mdev passthrough path 2, when an mdev is assigned to user space or a
   guest driver. Here the mdev is at a smaller granularity (e.g. tagged by
   PASID), as supported by vt-d scalable mode.

1) and 2) are existing usages. What you described (unmanaged vs. default)
falls into this category. 3) is actually the same as 2) in the IOMMU layer,
sort of passing through the DMA capability to the guest. Though there is
still a parent driver, the parent driver itself should not do DMA, i.e. it
is unmanaged on the host side.

4) is a new code path introduced in the IOMMU layer, which is what
vfio_a(de)tach_aux_domain is trying to address. In this case the parent
device can still have its own DMA capability, using existing code path 1)
(thus the default domain still applies), and the parent device can have
multiple aux domains (and associated structures), using code path 4).

I didn't see how this new code path interferes with the default domain
concept in the existing paths. In the future, if ARM or other architectures
plan to support a similar scalable mode capability, only the new code path
should need to be tweaked.

Thanks
Kevin
RE: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device
> From: Jean-Philippe Brucker
> Sent: Thursday, September 13, 2018 1:54 AM
>
> On 12/09/2018 03:42, Lu Baolu wrote:
> > Hi,
> >
> > On 09/11/2018 12:22 AM, Jean-Philippe Brucker wrote:
> >> Hi,
> >>
> >> On 30/08/2018 05:09, Lu Baolu wrote:
> >>> Below APIs are introduced in the IOMMU glue for device drivers to use
> >>> the finer granularity translation.
> >>>
> >>> * iommu_capable(IOMMU_CAP_AUX_DOMAIN)
> >>>   - Represents the ability for supporting multiple domains per device
> >>>     (a.k.a. finer granularity translations) of the IOMMU hardware.
> >>
> >> iommu_capable() cannot represent hardware capabilities, we need
> >> something else for systems with multiple IOMMUs that have different
> >> caps. How about iommu_domain_get_attr on the device's domain instead?
> >
> > Domain is not a good choice for per iommu cap query. A domain might be
> > attached to devices belonging to different iommu's.
> >
> > How about an API with device structure as parameter? A device always
> > belongs to a specific iommu. This API is supposed to be used by the
> > device driver.
>
> Ah right, domain attributes won't work. Your suggestion seems more
> suitable, but maybe users can simply try to enable auxiliary domains
> first, and conclude that the IOMMU doesn't support it if it returns an
> error.
>
> >>> * iommu_en(dis)able_aux_domain(struct device *dev)
> >>>   - Enable/disable the multiple domains capability for a device
> >>>     referenced by @dev.
>
> It strikes me now that in the IOMMU driver,
> iommu_enable/disable_aux_domain() will do the same thing as
> iommu_sva_device_init/shutdown()
> (https://www.spinics.net/lists/arm-kernel/msg651896.html). Some IOMMU
> drivers want to enable PASID and allocate PASID tables only when
> requested by users, in the sva_init_device IOMMU op (see Joerg's comment
> last year https://patchwork.kernel.org/patch/9989307/#21025429). Maybe
> we could simply add a flag to iommu_sva_device_init?

We could combine, but definitely 'sva' should be removed :-)

> >>> * iommu_auxiliary_id(struct iommu_domain *domain)
> >>>   - Return the index value used for finer-granularity DMA translation.
> >>>     The specific device driver needs to feed the hardware with this
> >>>     value, so that hardware device could issue the DMA transaction with
> >>>     this value tagged.
> >>
> >> This could also reuse iommu_domain_get_attr.
> >>
> >> More generally I'm having trouble understanding how auxiliary domains
> >> will be used. So VFIO allocates PASIDs like this:
> >
> > As I wrote in the cover letter, "auxiliary domain" is just a name to
> > ease discussion. It actually has no special meaning (we think of a domain
> > as an isolation boundary which could be used by the IOMMU to isolate
> > the DMA transactions out of a PCI device or part of it).
> >
> > So drivers like vfio should see no difference when using an auxiliary
> > domain. The auxiliary domain is not visible outside the iommu driver.
>
> For an auxiliary domain, VFIO does need to retrieve the PASID and write
> it to hardware. But being able to reuse iommu_map/unmap/iova_to_phys/etc
> on the auxiliary domain is nice.
>
> >> * iommu_enable_aux_domain(parent_dev)
> >> * iommu_domain_alloc() -> dom1
> >> * iommu_domain_alloc() -> dom2
> >> * iommu_attach_device(dom1, parent_dev)
> >>   -> dom1 gets PASID #1
> >> * iommu_attach_device(dom2, parent_dev)
> >>   -> dom2 gets PASID #2
> >>
> >> Then I'm not sure about the next steps, when userspace does
> >> VFIO_IOMMU_MAP_DMA or VFIO_IOMMU_BIND on an mdev's container. Is the
> >> following use accurate?
> >>
> >> For the single translation level:
> >> * iommu_map(dom1, ...) updates first-level/second-level pgtables for
> >>   PASID #1
> >> * iommu_map(dom2, ...) updates first-level/second-level pgtables for
> >>   PASID #2
> >>
> >> Nested translation:
> >> * iommu_map(dom1, ...) updates second-level pgtables for PASID #1
> >> * iommu_bind_table(dom1, ...) binds first-level pgtables, provided by
> >>   the guest, for PASID #1
> >> * iommu_map(dom2, ...) updates second-level pgtables for PASID #2
> >> * iommu_bind_table(dom2, ...) binds first-level pgtables for PASID #2
> >>
> >> I'm trying to understand how to implement this with SMMU and other
> >
> > This is proposed for architectures which support finer granularity
> > second level translation with no impact on architectures which only
> > support Source ID or the similar granularity.
>
> Just to be clear, in this paragraph you're only referring to the
> nested/second-level translation for mdev, which is specific to vt-d
> rev3? Other architectures can still do first-level translation with
> PASID, to support some use-cases of IOMMU aware mediated device
> (assigning mdevs to userspace drivers, for example).

Yes. The aux domain concept applies only to vt-d rev3, which introduces
scalable mode. Care is taken to avoid breaking usages on existing
architectures.

One note: assigning mdevs to user space alone doesn't imply IOMMU aware.
Re: [RFC PATCH v2 08/10] vfio/type1: Add domain at(de)taching group helpers
On 12/09/2018 06:02, Lu Baolu wrote:
> Hi,
>
> On 09/11/2018 12:23 AM, Jean-Philippe Brucker wrote:
>> On 30/08/2018 05:09, Lu Baolu wrote:
>>> +static int vfio_detach_aux_domain(struct device *dev, void *data)
>>> +{
>>> +	struct iommu_domain *domain = data;
>>> +
>>> +	vfio_mdev_set_aux_domain(dev, NULL);
>>> +	iommu_detach_device(domain, dev->parent);
>>
>> I think that's only going to work for vt-d, which doesn't use a
>> default_domain. For other drivers, iommu.c ends up calling
>> domain->ops->attach_dev(default_domain, dev) here, which isn't what we want.
>
> This doesn't break any functionality. It takes effect only if iommu
> hardware supports finer granular translation and advertises it through
> the API.
>
>> That was my concern with reusing attach/detach_dev callbacks for PASID
>> management. The attach_dev code of IOMMU drivers already has to deal
>> with toggling between default and unmanaged domain. Dealing with more
>> state transitions in the same path is going to be difficult.
>
> Let's consider it from the API point of view.
>
> We have iommu_a(de)ttach_device() APIs to attach or detach a domain
> to/from a device. We should avoid applying a limitation of "these are
> only for single domain case, for multiple domains, use another API".

That's an idealized view of the API, the actual semantics aren't as
simple. For IOMMU drivers that implement IOMMU_DOMAIN_DMA in their
domain_alloc operation (Arm SMMU, AMD IOMMU, ...), attach_dev attaches
an unmanaged domain, detach_dev reattaches the default DMA domain, and
the detach_dev IOMMU op is not called. We can't change that now, so it's
difficult to add more functionality to attach_dev and detach_dev.

Thanks,
Jean
Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device
On 12/09/2018 03:42, Lu Baolu wrote:
> Hi,
>
> On 09/11/2018 12:22 AM, Jean-Philippe Brucker wrote:
>> Hi,
>>
>> On 30/08/2018 05:09, Lu Baolu wrote:
>>> Below APIs are introduced in the IOMMU glue for device drivers to use
>>> the finer granularity translation.
>>>
>>> * iommu_capable(IOMMU_CAP_AUX_DOMAIN)
>>>   - Represents the ability for supporting multiple domains per device
>>>     (a.k.a. finer granularity translations) of the IOMMU hardware.
>>
>> iommu_capable() cannot represent hardware capabilities, we need
>> something else for systems with multiple IOMMUs that have different
>> caps. How about iommu_domain_get_attr on the device's domain instead?
>
> Domain is not a good choice for per iommu cap query. A domain might be
> attached to devices belonging to different iommu's.
>
> How about an API with device structure as parameter? A device always
> belongs to a specific iommu. This API is supposed to be used by the
> device driver.

Ah right, domain attributes won't work. Your suggestion seems more
suitable, but maybe users can simply try to enable auxiliary domains
first, and conclude that the IOMMU doesn't support it if it returns an
error.

>>> * iommu_en(dis)able_aux_domain(struct device *dev)
>>>   - Enable/disable the multiple domains capability for a device
>>>     referenced by @dev.

It strikes me now that in the IOMMU driver,
iommu_enable/disable_aux_domain() will do the same thing as
iommu_sva_device_init/shutdown()
(https://www.spinics.net/lists/arm-kernel/msg651896.html). Some IOMMU
drivers want to enable PASID and allocate PASID tables only when
requested by users, in the sva_init_device IOMMU op (see Joerg's comment
last year https://patchwork.kernel.org/patch/9989307/#21025429). Maybe
we could simply add a flag to iommu_sva_device_init?

>>> * iommu_auxiliary_id(struct iommu_domain *domain)
>>>   - Return the index value used for finer-granularity DMA translation.
>>>     The specific device driver needs to feed the hardware with this
>>>     value, so that hardware device could issue the DMA transaction with
>>>     this value tagged.
>>
>> This could also reuse iommu_domain_get_attr.
>>
>> More generally I'm having trouble understanding how auxiliary domains
>> will be used. So VFIO allocates PASIDs like this:
>
> As I wrote in the cover letter, "auxiliary domain" is just a name to
> ease discussion. It actually has no special meaning (we think of a domain
> as an isolation boundary which could be used by the IOMMU to isolate
> the DMA transactions out of a PCI device or part of it).
>
> So drivers like vfio should see no difference when using an auxiliary
> domain. The auxiliary domain is not visible outside the iommu driver.

For an auxiliary domain, VFIO does need to retrieve the PASID and write
it to hardware. But being able to reuse iommu_map/unmap/iova_to_phys/etc
on the auxiliary domain is nice.

>> * iommu_enable_aux_domain(parent_dev)
>> * iommu_domain_alloc() -> dom1
>> * iommu_domain_alloc() -> dom2
>> * iommu_attach_device(dom1, parent_dev)
>>   -> dom1 gets PASID #1
>> * iommu_attach_device(dom2, parent_dev)
>>   -> dom2 gets PASID #2
>>
>> Then I'm not sure about the next steps, when userspace does
>> VFIO_IOMMU_MAP_DMA or VFIO_IOMMU_BIND on an mdev's container. Is the
>> following use accurate?
>>
>> For the single translation level:
>> * iommu_map(dom1, ...) updates first-level/second-level pgtables for
>>   PASID #1
>> * iommu_map(dom2, ...) updates first-level/second-level pgtables for
>>   PASID #2
>>
>> Nested translation:
>> * iommu_map(dom1, ...) updates second-level pgtables for PASID #1
>> * iommu_bind_table(dom1, ...) binds first-level pgtables, provided by
>>   the guest, for PASID #1
>> * iommu_map(dom2, ...) updates second-level pgtables for PASID #2
>> * iommu_bind_table(dom2, ...) binds first-level pgtables for PASID #2
>>
>> I'm trying to understand how to implement this with SMMU and other
>
> This is proposed for architectures which support finer granularity
> second level translation with no impact on architectures which only
> support Source ID or the similar granularity.

Just to be clear, in this paragraph you're only referring to the
nested/second-level translation for mdev, which is specific to vt-d
rev3? Other architectures can still do first-level translation with
PASID, to support some use-cases of IOMMU aware mediated device
(assigning mdevs to userspace drivers, for example).

>> IOMMUs. It's not a clean fit since we have a single domain to hold the
>> second-level pgtables.
>
> Do you mind explaining why a domain holds multiple second-level
> pgtables? Shouldn't that be multiple domains?

I didn't mean a single domain holding multiple second-level pgtables,
but a single domain holding a single set of second-level pgtables for
all mdevs. But let's ignore that; mdev and second-level isn't realistic
for arm SMMU.

>> Then again, the nested case probably doesn't
>> matter for us - we might as well assign the parent directly, since all
Re: [PATCH] vfio/pci: Some buggy virtual functions incorrectly report 1 for intx.
On Thu, Aug 09, 2018 at 01:44:17PM -0600, Alex Williamson wrote:
> On Thu, 9 Aug 2018 12:37:06 -0700
> Ashok Raj wrote:
>
> > PCI_INTERRUPT_PIN should always read 0 for SRIOV Virtual Functions.
> >
> > Some SRIOV devices have some bugs in RTL and VF's end up reading 1
> > instead of 0 for the PIN.
>
> Hi Ashok,
>
> One question, can we identify which VFs are known to have this issue so
> that users (and downstreams) can know how to prioritize this patch?

Hi Alex

Sorry it took some time to hunt this down.

The offending VF has a device ID:     0x270C
The corresponding PF has a device ID: 0x270B

> Thanks,
>
> Alex
>
> > Since this is a spec required value, rather than having a device specific
> > quirk, we could fix it permanently in vfio.
> >
> > Reworked suggestions from Alex https://lkml.org/lkml/2018/7/16/1052
> >
> > Reported-by: Gage Eads
> > Tested-by: Gage Eads
> > Signed-off-by: Ashok Raj
> > Cc: k...@vger.kernel.org
> > Cc: linux-ker...@vger.kernel.org
> > Cc: iommu@lists.linux-foundation.org
> > Cc: Joerg Roedel
> > Cc: Bjorn Helgaas
> > Cc: Gage Eads
> > ---
> >  drivers/vfio/pci/vfio_pci.c        | 12 +++++++++---
> >  drivers/vfio/pci/vfio_pci_config.c |  3 ++-
> >  2 files changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index b423a30..32943dd 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -433,10 +433,16 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
> >  {
> >  	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
> >  		u8 pin;
> > -		pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
> > -		if (IS_ENABLED(CONFIG_VFIO_PCI_INTX) && !vdev->nointx && pin)
> > -			return 1;
> > +		/*
> > +		 * INTx must be 0 for all VF's. Enforce that for all
> > +		 * VF's since this is a spec requirement.
> > +		 */
> > +		if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx ||
> > +		    vdev->pdev->is_virtfn)
> > +			return 0;
> >
> > +		pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
> > +		return (pin ? 1 : 0);
> >  	} else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
> >  		u8 pos;
> >  		u16 flags;
> > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> > index 115a36f..e36b7c3 100644
> > --- a/drivers/vfio/pci/vfio_pci_config.c
> > +++ b/drivers/vfio/pci/vfio_pci_config.c
> > @@ -1676,7 +1676,8 @@ int vfio_config_init(struct vfio_pci_device *vdev)
> >  		*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(pdev->device);
> >  	}
> >
> > -	if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx)
> > +	if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx ||
> > +	    pdev->is_virtfn)
> >  		vconfig[PCI_INTERRUPT_PIN] = 0;
> >
> >  	ret = vfio_cap_init(vdev);
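The control flow of the patched vfio_pci_get_irq_count() INTx branch can be isolated into a pure function for illustration: for an SR-IOV VF the count is forced to 0 (the spec requires PCI_INTERRUPT_PIN to read 0 for VFs, so a buggy pin value is ignored), and only for physical functions does the pin register decide. This is a standalone sketch with invented names, not the vfio code itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative re-implementation of the INTx-count decision from the
 * patch above. Parameters stand in for IS_ENABLED(CONFIG_VFIO_PCI_INTX),
 * vdev->nointx, pdev->is_virtfn and the PCI_INTERRUPT_PIN register value.
 */
static int vf_intx_count(bool intx_enabled, bool nointx, bool is_virtfn,
			 unsigned char pin)
{
	/* VFs must report no INTx regardless of what the (possibly buggy)
	 * pin register reads; same for nointx or INTx support compiled out. */
	if (!intx_enabled || nointx || is_virtfn)
		return 0;

	return pin ? 1 : 0;
}
```

Note the check happens before the pin register is even consulted, mirroring how the patch moves the early return ahead of pci_read_config_byte().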
Re: [PATCH v5 0/5] add non-strict mode support for arm-smmu-v3
On 12/09/18 17:57, Will Deacon wrote:
> Hi all,
>
> On Wed, Aug 15, 2018 at 09:28:25AM +0800, Zhen Lei wrote:
>> v4 -> v5:
>> 1. change the type of global variable and struct member named "non_strict" from
>>    "int" to "bool".
>> 2. cancel the unnecessary parameter "strict" of __arm_lpae_unmap which was added
>>    in v4.
>> 3. change boot option "arm_iommu" to "iommu.non_strict".
>> 4. convert __iommu_dma_unmap to use iommu_unmap_fast()/iommu_tlb_sync(), because
>>    non-leaf unmaps still need to be synchronous.
>>
>> Thanks for Robin's review comments.
>
> Since this is 90% of the way there now, I suggest Robin picks up what's here
> and incorporates his remaining review comments directly (especially since it
> sounded like Zhen Lei hasn't got much free time lately). With that, I can
> queue this lot via my smmu branch, which already has some stuff queued
> for SMMUv3 and io-pgtable.
>
> Please shout if you have any objections, but I'm keen for this not to
> languish on the lists given how close it is!

Sure, having got this far I'd like to see it get done too. I'll make some
time and give it a go.

Robin.
Re: [PATCH 0/7 v7] Support for fsl-mc bus and its devices in SMMU
Hi Nipun,

On Mon, Sep 10, 2018 at 07:19:14PM +0530, Nipun Gupta wrote:
> This patchset defines IOMMU DT binding for fsl-mc bus and adds
> support in SMMU for fsl-mc bus.
>
> These patches
> - Define property 'iommu-map' for fsl-mc bus (patch 1)
> - Integrates the fsl-mc bus with the SMMU using this
>   IOMMU binding (patch 2,3,4)
> - Adds the dma configuration support for fsl-mc bus (patch 5, 6)
> - Updates the fsl-mc device node with iommu/dma related changes (patch 7)

It looks like you have all the Acks in place for this series now, so I
assume it's going to go via Joerg directly. Is that right?

Will

> Changes in v2:
> - use iommu-map property for fsl-mc bus
> - rebase over patchset https://patchwork.kernel.org/patch/10317337/
>   and make corresponding changes for dma configuration of devices on
>   fsl-mc bus
>
> Changes in v3:
> - move of_map_rid in drivers/of/address.c
>
> Changes in v4:
> - move of_map_rid in drivers/of/base.c
>
> Changes in v5:
> - break patch 5 in two separate patches (now patch 5/7 and patch 6/7)
> - add changelog text in patch 3/7 and patch 5/7
> - typo fix
>
> Changes in v6:
> - Updated fsl_mc_device_group() API to be more rational
> - Added dma-coherent property in the LS2 smmu device node
> - Minor fixes in the device-tree documentation
>
> Changes in v7:
> - Rebased over linux 4.19
>
> Nipun Gupta (7):
>   Documentation: fsl-mc: add iommu-map device-tree binding for fsl-mc bus
>   iommu/of: make of_pci_map_rid() available for other devices too
>   iommu/of: support iommu configuration for fsl-mc devices
>   iommu/arm-smmu: Add support for the fsl-mc bus
>   bus: fsl-mc: support dma configure for devices on fsl-mc bus
>   bus: fsl-mc: set coherent dma mask for devices on fsl-mc bus
>   arm64: dts: ls208xa: comply with the iommu map binding for fsl_mc
>
>  .../devicetree/bindings/misc/fsl,qoriq-mc.txt  |  39 ++++++++
>  arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi |   7 +-
>  drivers/bus/fsl-mc/fsl-mc-bus.c                |  16 +++-
>  drivers/iommu/arm-smmu.c                       |   7 ++
>  drivers/iommu/iommu.c                          |  13 +++
>  drivers/iommu/of_iommu.c                       |  25 +----
>  drivers/of/base.c                              | 102 +++++++++++++++++
>  drivers/of/irq.c                               |   5 +-
>  drivers/pci/of.c                               | 101 ----------------
>  include/linux/fsl/mc.h                         |   8 ++
>  include/linux/iommu.h                          |   2 +
>  include/linux/of.h                             |  11 +++
>  include/linux/of_pci.h                         |  10 --
>  13 files changed, 224 insertions(+), 122 deletions(-)
>
> --
> 1.9.1
Re: [PATCH v5 0/5] add non-strict mode support for arm-smmu-v3
Hi all,

On Wed, Aug 15, 2018 at 09:28:25AM +0800, Zhen Lei wrote:
> v4 -> v5:
> 1. change the type of global variable and struct member named "non_strict" from
>    "int" to "bool".
> 2. cancel the unnecessary parameter "strict" of __arm_lpae_unmap which was added
>    in v4.
> 3. change boot option "arm_iommu" to "iommu.non_strict".
> 4. convert __iommu_dma_unmap to use iommu_unmap_fast()/iommu_tlb_sync(), because
>    non-leaf unmaps still need to be synchronous.
>
> Thanks for Robin's review comments.

Since this is 90% of the way there now, I suggest Robin picks up what's here
and incorporates his remaining review comments directly (especially since it
sounded like Zhen Lei hasn't got much free time lately). With that, I can
queue this lot via my smmu branch, which already has some stuff queued
for SMMUv3 and io-pgtable.

Please shout if you have any objections, but I'm keen for this not to
languish on the lists given how close it is!

Will
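Point 4 of the changelog above encodes a simple policy worth spelling out: non-strict mode may defer TLB invalidation for leaf-entry unmaps, but unmaps that free non-leaf (table) entries must still synchronize immediately, otherwise the hardware could walk freed page-table pages. A toy model of that decision, with purely illustrative names (not the io-pgtable code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the strict vs. non-strict unmap policy: strict mode always
 * syncs the TLB before returning; non-strict mode may defer the sync, but
 * only for leaf entries. Unmapping a non-leaf entry frees page-table
 * memory, so it must stay synchronous even in non-strict mode.
 */
static bool unmap_needs_sync(bool non_strict_mode, bool is_leaf)
{
	return !non_strict_mode || !is_leaf;
}
```

In the deferred case, the invalidations are batched and flushed later (e.g. from a timer or when the IOVA space is recycled), trading a small window of stale translations for much cheaper unmap operations.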
[PATCH] media: s5p-mfc: Fix memdev DMA configuration
Having of_reserved_mem_device_init() forcibly reconfigure DMA for all
callers, potentially overriding the work done by a bus-specific
.dma_configure method earlier, is at best a bad idea and at worst
actively harmful. If drivers really need virtual devices to own
dma-coherent memory, they should explicitly configure those devices
based on the appropriate firmware node as they create them.

It looks like the only driver not passing in a proper OF platform device
is s5p-mfc, so move the rogue of_dma_configure() call into that driver
where it logically belongs.

CC: Smitha T Murthy
CC: Marek Szyprowski
CC: Rob Herring
Signed-off-by: Robin Murphy
---
 drivers/media/platform/s5p-mfc/s5p_mfc.c | 7 +++++++
 drivers/of/of_reserved_mem.c             | 4 ----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c b/drivers/media/platform/s5p-mfc/s5p_mfc.c
index 927a1235408d..77eb4a4511c1 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
@@ -1094,6 +1094,13 @@ static struct device *s5p_mfc_alloc_memdev(struct device *dev,
 	child->dma_mask = dev->dma_mask;
 	child->release = s5p_mfc_memdev_release;
 
+	/*
+	 * The memdevs are not proper OF platform devices, so in order for them
+	 * to be treated as valid DMA masters we need a bit of a hack to force
+	 * them to inherit the MFC node's DMA configuration.
+	 */
+	of_dma_configure(child, dev->of_node, true);
+
 	if (device_add(child) == 0) {
 		ret = of_reserved_mem_device_init_by_idx(child, dev->of_node,
 							 idx);
diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 895c83e0c7b6..4ef6f4485335 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -350,10 +350,6 @@ int of_reserved_mem_device_init_by_idx(struct device *dev,
 		mutex_lock(&of_rmem_assigned_device_mutex);
 		list_add(&rd->list, &of_rmem_assigned_device_list);
 		mutex_unlock(&of_rmem_assigned_device_mutex);
-		/* ensure that dma_ops is set for virtual devices
-		 * using reserved memory
-		 */
-		of_dma_configure(dev, np, true);
 
 		dev_info(dev, "assigned reserved memory node %s\n", rmem->name);
 	} else {
-- 
2.19.0.dirty
Re: of_dma_request_slave_channel() failed ?
On Tue, Sep 11, 2018 at 11:43:47AM +0200, Geert Uytterhoeven wrote:
> So it seems the audio DMAC is deferred a second time, before the iommu
> driver probed.

Shouldn't there be at least one more round of deferred probe triggers
after the IOMMU probes?
[PATCH v2 1/3] iommu: Add fast hook for getting DMA domains
While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
API callers to retrieve the domain pointer, for DMA ops domains it
doesn't scale well for large systems and multi-queue devices, since the
momentary refcount adjustment will lead to exclusive cacheline
contention when multiple CPUs are operating in parallel on different
mappings for the same device.

In the case of DMA ops domains, however, this refcounting is actually
unnecessary, since they already imply that the group exists and is
managed by platform code and IOMMU internals (by virtue of
iommu_group_get_for_dev()) such that a reference will already be held
for the lifetime of the device. Thus we can avoid the bottleneck by
providing a fast lookup specifically for the DMA code to retrieve the
default domain it already knows it has set up - a simple read-only
dereference plays much nicer with cache-coherency protocols.

Signed-off-by: Robin Murphy
---
 drivers/iommu/iommu.c | 9 +++++++++
 include/linux/iommu.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c15c5980299..9d70344204fe 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1415,6 +1415,15 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
 
+/*
+ * For IOMMU_DOMAIN_DMA implementations which already provide their own
+ * guarantees that the group and its default domain are valid and correct.
+ */
+struct iommu_domain *iommu_get_dma_domain(struct device *dev)
+{
+	return dev->iommu_group->default_domain;
+}
+
 /*
  * IOMMU groups are really the natural working unit of the IOMMU, but
  * the IOMMU API works on domains and devices.  Bridge that gap by
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 87994c265bf5..c783648d4060 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -293,6 +293,7 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
+extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-- 
2.19.0.dirty
[PATCH v2 3/3] arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops
Whilst the symmetry of deferring to the existing sync callback in
__iommu_map_page() is nice, taking a round-trip through
iommu_iova_to_phys() is a pretty heavyweight way to get an address we
can trivially compute from the page we already have. Tweaking it to just
perform the cache maintenance directly when appropriate doesn't really
make the code any more complicated, and the runtime efficiency gain can
only be a benefit.

Furthermore, the sync operations themselves know they can only be
invoked on a managed DMA ops domain, so can use the fast specific domain
lookup to avoid excessive manipulation of the group refcount
(particularly in the scatterlist cases).

Acked-by: Will Deacon
Signed-off-by: Robin Murphy
---

v2: Don't be totally broken by forgetting the offset

 arch/arm64/mm/dma-mapping.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 072c51fb07d7..cf017c5bb5e7 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -712,7 +712,7 @@ static void __iommu_sync_single_for_cpu(struct device *dev,
 	if (is_device_dma_coherent(dev))
 		return;
 
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
 	__dma_unmap_area(phys_to_virt(phys), size, dir);
 }
 
@@ -725,7 +725,7 @@ static void __iommu_sync_single_for_device(struct device *dev,
 	if (is_device_dma_coherent(dev))
 		return;
 
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
 	__dma_map_area(phys_to_virt(phys), size, dir);
 }
 
@@ -738,9 +738,9 @@ static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
 	int prot = dma_info_to_prot(dir, coherent, attrs);
 	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
 
-	if (!iommu_dma_mapping_error(dev, dev_addr) &&
-	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
+	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+	    !iommu_dma_mapping_error(dev, dev_addr))
+		__dma_map_area(page_address(page) + offset, size, dir);
 
 	return dev_addr;
 }
-- 
2.19.0.dirty
[PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
Most parts of iommu-dma already assume they are operating on a default
domain set up by iommu_dma_init_domain(), and can be converted straight
over to avoid the refcounting bottleneck. MSI page mappings may be in an
unmanaged domain with an explicit MSI-only cookie, so retain the
non-specific lookup, but that's OK since they're far from a contended
fast path either way.

Signed-off-by: Robin Murphy
---
 drivers/iommu/dma-iommu.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 511ff9a1d6d9..320f9ea82f3f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -491,7 +491,7 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
 void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
 		dma_addr_t *handle)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size);
 	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
 	*handle = IOMMU_MAPPING_ERROR;
 }
@@ -518,7 +518,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 		unsigned long attrs, int prot, dma_addr_t *handle,
 		void (*flush_page)(struct device *, const void *, phys_addr_t))
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	struct page **pages;
@@ -606,9 +606,8 @@ int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-		size_t size, int prot)
+		size_t size, int prot, struct iommu_domain *domain)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	size_t iova_off = 0;
 	dma_addr_t iova;
@@ -632,13 +631,14 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 		unsigned long offset, size_t size, int prot)
 {
-	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
+	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
+			iommu_get_dma_domain(dev));
 }
 
 void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
 }
 
 /*
@@ -726,7 +726,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
 int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		int nents, int prot)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	struct scatterlist *s, *prev = NULL;
@@ -811,20 +811,21 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 		sg = tmp;
 	}
 	end = sg_dma_address(sg) + sg_dma_len(sg);
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), start, end - start);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), start, end - start);
 }
 
 dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	return __iommu_dma_map(dev, phys, size,
-			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
+			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
+			iommu_get_dma_domain(dev));
 }
 
 void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
 }
 
 int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
@@ -850,7 +851,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 	if (!msi_page)
 		return NULL;
 
-	iova = __iommu_dma_map(dev, msi_addr, size, prot);
+	iova = __iommu_dma_map(dev, msi_addr, size, prot, domain);
 	if (iommu_dma_mapping_error(dev, iova))
 		goto out_free_page;
-- 
2.19.0.dirty
[PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
John raised the issue[1] that we have some unnecessary refcount
contention in the DMA ops path which shows scalability problems now that
we have more real high-performance hardware using iommu-dma. The x86
IOMMU drivers are sidestepping this by stashing domain references in
archdata, but since that's not very nice for architecture-agnostic code,
I think it's time to look at a generic API-level solution.

These are a couple of quick patches based on the idea I had back when
first implementing iommu-dma, but didn't have any way to justify at the
time. However, the reports of 10-25% better networking performance on v1
suggest that it's very worthwhile (and far more significant than I ever
would have guessed).

As far as merging goes, I don't mind at all whether this goes via IOMMU,
or via dma-mapping provided Joerg's happy to ack it.

Robin.

[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html

Robin Murphy (3):
  iommu: Add fast hook for getting DMA domains
  iommu/dma: Use fast DMA domain lookup
  arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

 arch/arm64/mm/dma-mapping.c | 10 +-
 drivers/iommu/dma-iommu.c   | 23 ---
 drivers/iommu/iommu.c       |  9 +
 include/linux/iommu.h       |  1 +
 4 files changed, 27 insertions(+), 16 deletions(-)

-- 
2.19.0.dirty
Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
Am 12.09.2018 um 14:40 schrieb Jean-Philippe Brucker:
> On 08/09/2018 08:29, Christian König wrote:
>> Yes, exactly. I just need a PASID which is never used by the OS for a
>> process and we can easily give that back when the last FD reference is
>> closed.
>
> Alright, iommu-sva can get its PASID from this external allocator as
> well, as long as it has an interface similar to idr. Where would it go,
> drivers/base/, mm/, kernel/...?

Good question, my initial instinct was to put it under drivers/pci. But
AFAIKS now you are supporting SVA implementations which are not based on
PCI. So drivers/base sounds like a good place to me.

>>>>> The process dies, iommu-sva is notified and calls the mm_exit()
>>>>> function passed by the device driver to iommu_sva_device_init(). In
>>>>> mm_exit() the device driver needs to clear any reference to the
>>>>> PASID in hardware and in its own structures. When the device driver
>>>>> returns from mm_exit(), it effectively tells the core that it has
>>>>> finished using the PASID, and iommu-sva can reuse the PASID for
>>>>> another process. mm_exit() is allowed to block, so the device driver
>>>>> has time to clean up and flush the queues. If the device driver
>>>>> finishes using the PASID before the process exits, it just calls
>>>>> unbind().
>>>>
>>>> Exactly that's what Michal Hocko is probably going to not like at
>>>> all.
>>>>
>>>> Can we have a different approach where each driver is informed by
>>>> the mm_exit(), but needs to explicitly call unbind() before a PASID
>>>> is reused?
>
> It's awful from the IOMMU driver perspective. In addition to "enabled"
> and "disabled" PASID states, you add "disabled but DMA still running
> normally". Between that new state and "disabled", the IOMMU will be
> flooded by translation faults (non-recoverable ones), which it needs to
> ignore instead of reporting to the kernel. Not all IOMMUs can deal with
> this in hardware (SMMU and VT-d can quiesce translation faults
> per-PASID, but I don't think AMD IOMMU can.) Some drivers will have to
> filter fault events themselves, depending on the PASID state.

Puh, yeah that is probably true. Ok let us skip that for a moment, we
just need to invest more work in killing DMA operations quickly when the
process address space is torn down.

>>>> During that teardown transition it would be ideal if that PASID only
>>>> points to a dummy root page directory with only invalid entries.
>>>
>>> I guess this can be vendor specific. In VT-d I plan to mark the PASID
>>> entry not present and disable fault reporting while draining
>>> remaining activities.
>>
>> Sounds good to me.
>>
>> Point is at least in the case where the process was killed by the OOM
>> killer we should not block in mm_exit().
>>
>> Instead operations issued by the process to a device driver which uses
>> SVA need to be terminated as soon as possible to make sure that the
>> OOM killer can advance.
>
> I don't see how we're preventing the OOM killer from advancing, so I'm
> looking for a stronger argument that justifies adding this complexity
> to IOMMU drivers. Time limit of the release MMU notifier, locking
> requirement, a concrete example where things break, even a comment
> somewhere in mm/ would do... In my tests I can't manage to disturb the
> OOM killer, but I could be missing some special case. Even if the
> mm_exit() callback (unrealistically) sleeps for 60 seconds,

Well you are *COMPLETELY* underestimating this. A compute task with a
huge wave launch can take multiple minutes to tear down.

That's why I'm so concerned about that, but to be honest I think that
just the hardware needs to become better and we need to be able to block
dead tasks from spawning threads again.

Regards,
Christian.

> the OOM killer isn't affected because oom_reap_task_mm() wipes the
> victim's address space in another thread, either before or while the
> release notifier is running.
>
> Thanks,
> Jean
Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
On 08/09/2018 08:29, Christian König wrote:
> Yes, exactly. I just need a PASID which is never used by the OS for a
> process and we can easily give that back when the last FD reference is
> closed.

Alright, iommu-sva can get its PASID from this external allocator as
well, as long as it has an interface similar to idr. Where would it go,
drivers/base/, mm/, kernel/...?

>>>> The process dies, iommu-sva is notified and calls the mm_exit()
>>>> function passed by the device driver to iommu_sva_device_init(). In
>>>> mm_exit() the device driver needs to clear any reference to the
>>>> PASID in hardware and in its own structures. When the device driver
>>>> returns from mm_exit(), it effectively tells the core that it has
>>>> finished using the PASID, and iommu-sva can reuse the PASID for
>>>> another process. mm_exit() is allowed to block, so the device driver
>>>> has time to clean up and flush the queues. If the device driver
>>>> finishes using the PASID before the process exits, it just calls
>>>> unbind().
>>>
>>> Exactly that's what Michal Hocko is probably going to not like at all.
>>>
>>> Can we have a different approach where each driver is informed by the
>>> mm_exit(), but needs to explicitly call unbind() before a PASID is
>>> reused?

It's awful from the IOMMU driver perspective. In addition to "enabled"
and "disabled" PASID states, you add "disabled but DMA still running
normally". Between that new state and "disabled", the IOMMU will be
flooded by translation faults (non-recoverable ones), which it needs to
ignore instead of reporting to the kernel. Not all IOMMUs can deal with
this in hardware (SMMU and VT-d can quiesce translation faults
per-PASID, but I don't think AMD IOMMU can.) Some drivers will have to
filter fault events themselves, depending on the PASID state.

>>> During that teardown transition it would be ideal if that PASID only
>>> points to a dummy root page directory with only invalid entries.
>>
>> I guess this can be vendor specific. In VT-d I plan to mark the PASID
>> entry not present and disable fault reporting while draining remaining
>> activities.
>
> Sounds good to me.
>
> Point is at least in the case where the process was killed by the OOM
> killer we should not block in mm_exit().
>
> Instead operations issued by the process to a device driver which uses
> SVA need to be terminated as soon as possible to make sure that the OOM
> killer can advance.

I don't see how we're preventing the OOM killer from advancing, so I'm
looking for a stronger argument that justifies adding this complexity to
IOMMU drivers. Time limit of the release MMU notifier, locking
requirement, a concrete example where things break, even a comment
somewhere in mm/ would do... In my tests I can't manage to disturb the
OOM killer, but I could be missing some special case. Even if the
mm_exit() callback (unrealistically) sleeps for 60 seconds, the OOM
killer isn't affected because oom_reap_task_mm() wipes the victim's
address space in another thread, either before or while the release
notifier is running.

Thanks,
Jean