Re: [PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()
On Thu, 15 Oct 2020 at 00:03, Rob Herring wrote:
> On Wed, Oct 14, 2020 at 2:12 PM Nicolas Saenz Julienne wrote:
> >
> > Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
> > physical address addressable by all DMA masters in the system. It's
> > especially useful for setting memory zone sizes at early boot time.
> >
> > Signed-off-by: Nicolas Saenz Julienne
> >
> > ---
> >
> > Changes since v2:
> >  - Use PHYS_ADDR_MAX
> >  - return phys_addr_t
> >  - Rename function
> >  - Correct subject
> >  - Add support to start parsing from an arbitrary device node in order
> >    for the function to work with unit tests
> >
> >  drivers/of/address.c | 42 ++
> >  include/linux/of.h   |  7 +++
> >  2 files changed, 49 insertions(+)
> >
> > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > index eb9ab4f1e80b..b5a9695aaf82 100644
> > --- a/drivers/of/address.c
> > +++ b/drivers/of/address.c
> > @@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map)
> >  }
> >  #endif /* CONFIG_HAS_DMA */
> >
> > +/**
> > + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
> > + * @np: The node to start searching from or NULL to start from the root
> > + *
> > + * Gets the highest CPU physical address that is addressable by all DMA
> > + * masters in the system (or subtree when np is non-NULL). If no DMA
> > + * constrained device is found, it returns PHYS_ADDR_MAX.
> > + */
> > +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> > +{
> > +       phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
>
> One issue with using phys_addr_t is it may be 32-bit even though the
> DT is 64-bit addresses. LPAE capable system with LPAE disabled. Maybe
> the truncation is fine here? Maybe not.

PHYS_ADDR_MAX is the max addressable CPU address on the system, and so
it makes sense to use it for the return type, and for the preliminary
return value: this is actually what /prevents/ truncation, because we
will only overwrite max_cpu_addr if the new u64 value is lower.

> > +       struct of_range_parser parser;
> > +       phys_addr_t subtree_max_addr;
> > +       struct device_node *child;
> > +       phys_addr_t cpu_end = 0;
> > +       struct of_range range;
> > +       const __be32 *ranges;
> > +       int len;
> > +
> > +       if (!np)
> > +               np = of_root;
> > +
> > +       ranges = of_get_property(np, "dma-ranges", &len);
>
> I'm not really following why you changed the algorithm here. You're
> skipping disabled nodes which is good. Was there some other reason?
>
> > +       if (ranges && len) {
> > +               of_dma_range_parser_init(&parser, np);
> > +               for_each_of_range(&parser, &range)
> > +                       if (range.cpu_addr + range.size > cpu_end)
> > +                               cpu_end = range.cpu_addr + range.size;
> > +
> > +               if (max_cpu_addr > cpu_end)
> > +                       max_cpu_addr = cpu_end;
> > +       }
> > +
> > +       for_each_available_child_of_node(np, child) {
> > +               subtree_max_addr = of_dma_get_max_cpu_address(child);
> > +               if (max_cpu_addr > subtree_max_addr)
> > +                       max_cpu_addr = subtree_max_addr;
> > +       }
> > +
> > +       return max_cpu_addr;
> > +}
> > +
> >  /**
> >   * of_dma_is_coherent - Check if device is coherent
> >   * @np:        device node
> >
> > diff --git a/include/linux/of.h b/include/linux/of.h
> > index 481ec0467285..db8db8f2c967 100644
> > --- a/include/linux/of.h
> > +++ b/include/linux/of.h
> > @@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
> >                const char *map_name, const char *map_mask_name,
> >                struct device_node **target, u32 *id_out);
> >
> > +phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
> > +
> >  #else /* CONFIG_OF */
> >
> >  static inline void of_core_init(void)
> > @@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 id,
> >         return -EINVAL;
> >  }
> >
> > +static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np)
> > +{
> > +       return PHYS_ADDR_MAX;
> > +}
> > +
> >  #define of_match_ptr(_ptr)      NULL
> >  #define of_match_node(_matches, _node)  NULL
> >  #endif /* CONFIG_OF */
> > --
> > 2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/3] drm/msm: add DRM_MSM_GEM_SYNC_CACHE for non-coherent cache maintenance
On Tue, Oct 13, 2020 at 02:42:38PM +0100, Robin Murphy wrote:
> I still think this situation would be best handled with a variant of
> dma_ops_bypass that also guarantees to bypass SWIOTLB, and can be set
> automatically when attaching to an unmanaged IOMMU domain.

dma_ops_bypass should mostly do the right thing as-is. swiotlb bouncing
is triggered by two things:

 1) the dma_mask. This is under control of the driver, and obviously
    if it is too small for a legit reason we can't just proceed
 2) force_dma_unencrypted() - we'd need to do an opt-out here, either
    by a flag or by being smart and looking for an attached iommu on
    the device

> That way the device driver can make DMA API calls in the appropriate
> places that do the right thing either way, and only needs logic to
> decide whether to use the returned DMA addresses directly or ignore
> them if it knows they're overridden by its own IOMMU mapping.

I'd be happy to review patches for this.
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/14 上午11:08, Tian, Kevin wrote:
> > From: Jason Wang
> > Sent: Tuesday, October 13, 2020 2:22 PM
> >
> > On 2020/10/12 下午4:38, Tian, Kevin wrote:
> > > > From: Jason Wang
> > > > Sent: Monday, September 14, 2020 12:20 PM
> > > >
> > > [...]
> > > > If it's possible, I would suggest a generic uAPI instead of a VFIO
> > > > specific one.
> > > >
> > > > Jason suggest something like /dev/sva. There will be a lot of other
> > > > subsystems that could benefit from this (e.g vDPA).
> > > >
> > > > Have you ever considered this approach?
> > >
> > > Hi, Jason,
> > >
> > > We did some study on this approach and below is the output. It's a
> > > long writing but I didn't find a way to further abstract w/o losing
> > > necessary context. Sorry about that.
> > >
> > > Overall the real purpose of this series is to enable IOMMU nested
> > > translation capability with vSVA as one major usage, through
> > > below new uAPIs:
> > >         1) Report/enable IOMMU nested translation capability;
> > >         2) Allocate/free PASID;
> > >         3) Bind/unbind guest page table;
> > >         4) Invalidate IOMMU cache;
> > >         5) Handle IOMMU page request/response (not in this series);
> > > 1/3/4) is the minimal set for using IOMMU nested translation, with
> > > the other two optional. For example, the guest may enable vSVA on
> > > a device without using PASID. Or, it may bind its gIOVA page table
> > > which doesn't require page fault support. Finally, all operations can
> > > be applied to either physical device or subdevice.
> > >
> > > Then we evaluated each uAPI whether generalizing it is a good thing
> > > both in concept and regarding to complexity.
> > >
> > > First, unlike other uAPIs which are all backed by iommu_ops, PASID
> > > allocation/free is through the IOASID sub-system.
> >
> > A question here, is IOASID expected to be the single management
> > interface for PASID?
>
> yes
>
> > (I'm asking since there're already vendor specific IDA based PASID
> > allocator e.g amdgpu_pasid_alloc())
>
> That comes before IOASID core was introduced. I think it should be
> changed to use the new generic interface. Jacob/Jean can better
> comment if other reason exists for this exception.

If there's no exception it should be fixed.

> > > From this angle
> > > we feel generalizing PASID management does make some sense.
> > > First, PASID is just a number and not related to any device before
> > > it's bound to a page table and IOMMU domain. Second, PASID is a
> > > global resource (at least on Intel VT-d),
> >
> > I think we need a definition of "global" here. It looks to me for vt-d
> > the PASID table is per device.
>
> PASID table is per device, thus VT-d could support per-device PASIDs
> in concept.

I think that's the requirement of PCIE spec which said PASID + RID
identifies the process address space ID.

> However on Intel platform we require PASIDs to be managed
> in system-wide (cross host and guest) when combining vSVA, SIOV,
> SR-IOV and ENQCMD together.

Any reason for such requirement? (I'm not familiar with ENQCMD, but my
understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement
for system-wide PASID).

> Thus the host creates only one 'global' PASID
> namespace but do use per-device PASID table to assure isolation between
> devices on Intel platforms. But ARM does it differently as Jean explained.
> They have a global namespace for host processes on all host-owned
> devices (same as Intel), but then per-device namespace when a device
> (and its PASID table) is assigned to userspace.
>
> > Another question, is this possible to have two DMAR hardware unit(at
> > least I can see two even in my laptop). In this case, is PASID still a
> > global resource?
>
> yes
>
> > > while having separate VFIO/
> > > VDPA allocation interfaces may easily cause confusion in userspace,
> > > e.g. which interface to be used if both VFIO/VDPA devices exist.
> > > Moreover, an unified interface allows centralized control over how
> > > many PASIDs are allowed per process.
> >
> > Yes.
> >
> > > One unclear part with this generalization is about the permission.
> > > Do we open this interface to any process or only to those which
> > > have assigned devices? If the latter, what would be the mechanism
> > > to coordinate between this new interface and specific passthrough
> > > frameworks?
> >
> > I'm not sure, but if you just want a permission, you probably can
> > introduce new capability (CAP_XXX) for this.
> >
> > > A more tricky case, vSVA support on ARM (Eric/Jean
> > > please correct me) plans to do per-device PASID namespace which
> > > is built on a bind_pasid_table iommu callback to allow guest fully
> > > manage its PASIDs on a given passthrough device.
> >
> > I see, so I think the answer is to prepare for the namespace support
> > from the start. (btw, I don't see how namespace is handled in current
> > IOASID module?)
>
> The PASID table is based on GPA when nested translation is enabled
> on ARM SMMU. This design implies that the guest manages PASID
> table thus PASIDs instead of going through host-side API on assigned
> device. From this angle we don't need explicit namespace in the host
> API. Just need a way to control how many PASIDs a process is allowed
> to allocate in the global namespace. btw IOASID module already has
> 'set' concept per-process and PASIDs are managed per-set. Then the
> quota control can be
Re: [PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()
> +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> +{
> +       phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
> +       struct of_range_parser parser;
> +       phys_addr_t subtree_max_addr;
> +       struct device_node *child;
> +       phys_addr_t cpu_end = 0;
> +       struct of_range range;
> +       const __be32 *ranges;
> +       int len;
> +
> +       if (!np)
> +               np = of_root;

Requiring of_root to be passed explicitly would seem more natural to me
than the magic NULL argument. There doesn't seem to be any precedent for
that kind of calling convention either.
Re: [PATCH v3 8/8] mm: Update DMA zones description
On Wed, Oct 14, 2020 at 09:12:10PM +0200, Nicolas Saenz Julienne wrote:
> The default behavior for arm64 changed, so reflect that.
>
> Signed-off-by: Nicolas Saenz Julienne
> Acked-by: Catalin Marinas
> ---
>  include/linux/mmzone.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fb3bf696c05e..4ee2306351b9 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -363,8 +363,9 @@ enum zone_type {
>   *  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
>   *    the specific device.
>   *
> - *  - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
> - *    lower 4G.
> + *  - arm64 uses a single 4GB ZONE_DMA, except on the Raspberry Pi 4,
> + *    in which ZONE_DMA covers the first GB and ZONE_DMA32 the rest of
> + *    the lower 4GB.

Honestly I think this comment just needs to go away. We can't really
list every setup in a comment in common code.
Re: [PATCH v3 6/8] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
On Wed, Oct 14, 2020 at 09:12:08PM +0200, Nicolas Saenz Julienne wrote:
> +       zone_dma_bits = min(zone_dma_bits,
> +                           (unsigned int)ilog2(of_dma_get_max_cpu_address(NULL)));

Please avoid pointlessly long lines. Especially if it is completely
trivial by using either min_t or not overindenting like here.
Re: [PATCH v3 5/8] dma-direct: Turn zone_dma_bits default value into a define
On Wed, Oct 14, 2020 at 09:12:07PM +0200, Nicolas Saenz Julienne wrote:
> Set zone_dma_bits default value through a define so as for architectures
> to be able to override it with their default value.

Architectures can do that already by assigning a value to zone_dma_bits
at runtime. I really do not want to add the extra clutter.
Re: [PATCH v7 3/3] iommu/tegra-smmu: Add PCI support
On Wed, Oct 14, 2020 at 06:42:36PM +0100, Robin Murphy wrote:
> On 2020-10-09 17:19, Nicolin Chen wrote:
> > This patch simply adds support for PCI devices.
> >
> > Reviewed-by: Dmitry Osipenko
> > Tested-by: Dmitry Osipenko
> > Signed-off-by: Nicolin Chen
> > ---
> >
> > Changelog
> > v6->v7
> >  * Renamed goto labels, suggested by Thierry.
> > v5->v6
> >  * Added Dmitry's Reviewed-by and Tested-by.
> > v4->v5
> >  * Added Dmitry's Reviewed-by
> > v3->v4
> >  * Dropped !iommu_present() check
> >  * Added CONFIG_PCI check in the exit path
> > v2->v3
> >  * Replaced ternary conditional operator with if-else in .device_group()
> >  * Dropped change in tegra_smmu_remove()
> > v1->v2
> >  * Added error-out labels in tegra_smmu_probe()
> >  * Dropped pci_request_acs() since IOMMU core would call it.
> >
> >  drivers/iommu/tegra-smmu.c | 35 +--
> >  1 file changed, 25 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> > index be29f5977145..2941d6459076 100644
> > --- a/drivers/iommu/tegra-smmu.c
> > +++ b/drivers/iommu/tegra-smmu.c
> > @@ -10,6 +10,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -865,7 +866,11 @@ static struct iommu_group *tegra_smmu_device_group(struct device *dev)
> >         group->smmu = smmu;
> >         group->soc = soc;
> > -       group->group = iommu_group_alloc();
> > +       if (dev_is_pci(dev))
> > +               group->group = pci_device_group(dev);
>
> Just to check, is it OK to have two or more swgroups "owning" the same
> iommu_group if an existing one gets returned here? It looks like that
> might not play nice with the use of iommu_group_set_iommudata().

Do you mean by "gets returned here" the "IS_ERR" check below?

> Robin.

> > +       else
> > +               group->group = generic_device_group(dev);
> > +
> >         if (IS_ERR(group->group)) {
> >                 devm_kfree(smmu->dev, group);
> >                 mutex_unlock(&smmu->lock);
[PATCH] iommu/amd: Increase interrupt remapping table limit to 512 entries
Certain device drivers allocate IO queues on a per-cpu basis. On AMD EPYC
platforms, which can support up to 256 cpu threads, this can exceed the
current MAX_IRQS_PER_TABLE limit of 256, and result in the error message:

    AMD-Vi: Failed to allocate IRTE

This has been observed with certain NVME devices.

AMD IOMMU hardware can actually support up to 512 interrupt remapping
table entries. Therefore, update the driver to match the hardware limit.

Please note that this also increases the size of the interrupt remapping
table to 8KB per device when using the 128-bit IRTE format.

Signed-off-by: Suravee Suthikulpanit
---
 drivers/iommu/amd/amd_iommu_types.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 30a5d412255a..427484c45589 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -406,7 +406,11 @@ extern bool amd_iommu_np_cache;
 /* Only true if all IOMMUs support device IOTLBs */
 extern bool amd_iommu_iotlb_sup;
 
-#define MAX_IRQS_PER_TABLE     256
+/*
+ * AMD IOMMU hardware only supports 512 IRTEs despite
+ * the architectural limitation of 2048 entries.
+ */
+#define MAX_IRQS_PER_TABLE     512
 #define IRQ_TABLE_ALIGNMENT    128
 
 struct irq_remap_table {
-- 
2.17.1
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, 14 Oct 2020 03:08:31 + "Tian, Kevin" wrote: > > From: Jason Wang > > Sent: Tuesday, October 13, 2020 2:22 PM > > > > > > On 2020/10/12 下午4:38, Tian, Kevin wrote: > > >> From: Jason Wang > > >> Sent: Monday, September 14, 2020 12:20 PM > > >> > > > [...] > > > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > >> specific one. > > >> > > >> Jason suggest something like /dev/sva. There will be a lot of other > > >> subsystems that could benefit from this (e.g vDPA). > > >> > > >> Have you ever considered this approach? > > >> > > > Hi, Jason, > > > > > > We did some study on this approach and below is the output. It's a > > > long writing but I didn't find a way to further abstract w/o losing > > > necessary context. Sorry about that. > > > > > > Overall the real purpose of this series is to enable IOMMU nested > > > translation capability with vSVA as one major usage, through > > > below new uAPIs: > > > 1) Report/enable IOMMU nested translation capability; > > > 2) Allocate/free PASID; > > > 3) Bind/unbind guest page table; > > > 4) Invalidate IOMMU cache; > > > 5) Handle IOMMU page request/response (not in this series); > > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > > the other two optional. For example, the guest may enable vSVA on > > > a device without using PASID. Or, it may bind its gIOVA page table > > > which doesn't require page fault support. Finally, all operations can > > > be applied to either physical device or subdevice. > > > > > > Then we evaluated each uAPI whether generalizing it is a good thing > > > both in concept and regarding to complexity. > > > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > > allocation/free is through the IOASID sub-system. > > > > > > A question here, is IOASID expected to be the single management > > interface for PASID? 
> > yes > > > > > (I'm asking since there're already vendor specific IDA based PASID > > allocator e.g amdgpu_pasid_alloc()) > > That comes before IOASID core was introduced. I think it should be > changed to use the new generic interface. Jacob/Jean can better > comment if other reason exists for this exception. > > > > > > > > From this angle > > > we feel generalizing PASID management does make some sense. > > > First, PASID is just a number and not related to any device before > > > it's bound to a page table and IOMMU domain. Second, PASID is a > > > global resource (at least on Intel VT-d), > > > > > > I think we need a definition of "global" here. It looks to me for vt-d > > the PASID table is per device. > > PASID table is per device, thus VT-d could support per-device PASIDs > in concept. However on Intel platform we require PASIDs to be managed > in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > and ENQCMD together. Thus the host creates only one 'global' PASID > namespace but do use per-device PASID table to assure isolation between > devices on Intel platforms. But ARM does it differently as Jean explained. > They have a global namespace for host processes on all host-owned > devices (same as Intel), but then per-device namespace when a device > (and its PASID table) is assigned to userspace. > > > > > Another question, is this possible to have two DMAR hardware unit(at > > least I can see two even in my laptop). In this case, is PASID still a > > global resource? > > yes > > > > > > > > while having separate VFIO/ > > > VDPA allocation interfaces may easily cause confusion in userspace, > > > e.g. which interface to be used if both VFIO/VDPA devices exist. > > > Moreover, an unified interface allows centralized control over how > > > many PASIDs are allowed per process. > > > > > > Yes. > > > > > > > > > > One unclear part with this generalization is about the permission. 
> > > Do we open this interface to any process or only to those which > > > have assigned devices? If the latter, what would be the mechanism > > > to coordinate between this new interface and specific passthrough > > > frameworks? > > > > > > I'm not sure, but if you just want a permission, you probably can > > introduce new capability (CAP_XXX) for this. > > > > > > > A more tricky case, vSVA support on ARM (Eric/Jean > > > please correct me) plans to do per-device PASID namespace which > > > is built on a bind_pasid_table iommu callback to allow guest fully > > > manage its PASIDs on a given passthrough device. > > > > > > I see, so I think the answer is to prepare for the namespace support > > from the start. (btw, I don't see how namespace is handled in current > > IOASID module?) > > The PASID table is based on GPA when nested translation is enabled > on ARM SMMU. This design implies that the guest manages PASID > table thus PASIDs instead of going through host-side API on assigned > device. From this angle we don't need explicit namespace in the
Re: [PATCH v3 4/8] of: unittest: Add test for of_dma_get_max_cpu_address()
On Wed, Oct 14, 2020 at 2:12 PM Nicolas Saenz Julienne wrote:
>
> Introduce a test for of_dma_get_max_cpu_address(), it uses the same DT
> data as the rest of dma-ranges unit tests.
>
> Signed-off-by: Nicolas Saenz Julienne
> ---
>  drivers/of/unittest.c | 20
>  1 file changed, 20 insertions(+)
>
> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
> index 06cc988faf78..2cbf2a585c9f 100644
> --- a/drivers/of/unittest.c
> +++ b/drivers/of/unittest.c
> @@ -869,6 +869,25 @@ static void __init of_unittest_changeset(void)
>  #endif
>  }
>
> +static void __init of_unittest_dma_get_max_cpu_address(void)
> +{
> +#ifdef CONFIG_HAS_DMA

Can't the unittest run without this? I run the unittests under UML.

> +       struct device_node *np;
> +       phys_addr_t cpu_addr;
> +
> +       np = of_find_node_by_path("/testcase-data/address-tests");
> +       if (!np) {
> +               pr_err("missing testcase data\n");
> +               return;
> +       }
> +
> +       cpu_addr = of_dma_get_max_cpu_address(np);
> +       unittest(cpu_addr == 0x5000ULL,
> +                "of_dma_get_max_cpu_address: wrong CPU addr %pad (expecting %llx)\n",
> +                &cpu_addr, 0x5000ULL);
> +#endif
> +}
> +
>  static void __init of_unittest_dma_ranges_one(const char *path,
>                 u64 expect_dma_addr, u64 expect_paddr)
>  {
> @@ -3266,6 +3285,7 @@ static int __init of_unittest(void)
>         of_unittest_changeset();
>         of_unittest_parse_interrupts();
>         of_unittest_parse_interrupts_extended();
> +       of_unittest_dma_get_max_cpu_address();
>         of_unittest_parse_dma_ranges();
>         of_unittest_pci_dma_ranges();
>         of_unittest_match_node();
> --
> 2.28.0
Re: [PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()
On Wed, Oct 14, 2020 at 2:12 PM Nicolas Saenz Julienne wrote:
>
> Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
> physical address addressable by all DMA masters in the system. It's
> especially useful for setting memory zone sizes at early boot time.
>
> Signed-off-by: Nicolas Saenz Julienne
>
> ---
>
> Changes since v2:
>  - Use PHYS_ADDR_MAX
>  - return phys_addr_t
>  - Rename function
>  - Correct subject
>  - Add support to start parsing from an arbitrary device node in order
>    for the function to work with unit tests
>
>  drivers/of/address.c | 42 ++
>  include/linux/of.h   |  7 +++
>  2 files changed, 49 insertions(+)
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index eb9ab4f1e80b..b5a9695aaf82 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map)
>  }
>  #endif /* CONFIG_HAS_DMA */
>
> +/**
> + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
> + * @np: The node to start searching from or NULL to start from the root
> + *
> + * Gets the highest CPU physical address that is addressable by all DMA
> + * masters in the system (or subtree when np is non-NULL). If no DMA
> + * constrained device is found, it returns PHYS_ADDR_MAX.
> + */
> +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> +{
> +       phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;

One issue with using phys_addr_t is it may be 32-bit even though the
DT is 64-bit addresses. LPAE capable system with LPAE disabled. Maybe
the truncation is fine here? Maybe not.

> +       struct of_range_parser parser;
> +       phys_addr_t subtree_max_addr;
> +       struct device_node *child;
> +       phys_addr_t cpu_end = 0;
> +       struct of_range range;
> +       const __be32 *ranges;
> +       int len;
> +
> +       if (!np)
> +               np = of_root;
> +
> +       ranges = of_get_property(np, "dma-ranges", &len);

I'm not really following why you changed the algorithm here. You're
skipping disabled nodes which is good. Was there some other reason?

> +       if (ranges && len) {
> +               of_dma_range_parser_init(&parser, np);
> +               for_each_of_range(&parser, &range)
> +                       if (range.cpu_addr + range.size > cpu_end)
> +                               cpu_end = range.cpu_addr + range.size;
> +
> +               if (max_cpu_addr > cpu_end)
> +                       max_cpu_addr = cpu_end;
> +       }
> +
> +       for_each_available_child_of_node(np, child) {
> +               subtree_max_addr = of_dma_get_max_cpu_address(child);
> +               if (max_cpu_addr > subtree_max_addr)
> +                       max_cpu_addr = subtree_max_addr;
> +       }
> +
> +       return max_cpu_addr;
> +}
> +
>  /**
>   * of_dma_is_coherent - Check if device is coherent
>   * @np:        device node
>
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 481ec0467285..db8db8f2c967 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
>                const char *map_name, const char *map_mask_name,
>                struct device_node **target, u32 *id_out);
>
> +phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
> +
>  #else /* CONFIG_OF */
>
>  static inline void of_core_init(void)
> @@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 id,
>         return -EINVAL;
>  }
>
> +static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np)
> +{
> +       return PHYS_ADDR_MAX;
> +}
> +
>  #define of_match_ptr(_ptr)      NULL
>  #define of_match_node(_matches, _node)  NULL
>  #endif /* CONFIG_OF */
> --
> 2.28.0
Re: [git pull] IOMMU Updates for Linux v5.10
The pull request you sent on Tue, 13 Oct 2020 18:03:58 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-updates-v5.10

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/531d29b0b674036347a04c08c0898ff1aa522180

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
Re: [git pull] IOMMU Updates for Linux v5.10
On Tue, Oct 13, 2020 at 9:04 AM Joerg Roedel wrote:
>
> there is a minor conflict this time in include/linux/iommu.h which
> should be easy to resolve. I would attach my resolution, but somehow git
> [show|log] didn't show it to me.

So when a resolution takes one side over the other (as opposed to mixing
two sides), then "git show" doesn't show it as a conflict any more.

The reason is simple: at "git merge" time, git figures out the common
base, so it has all of "base", "side1" and "side2" to look at and does a
three-way diff (for the simplified normal case). So it can see it as a
conflict, because it sees both where you came from, and where you ended
up.

But once you have resolved the conflict and committed the end result,
"git show" (and "git log") doesn't do the whole expensive "what was the
base" thing any more. "git show" just sees the parents (so "side1" and
"side2") and the end result, but doesn't really see - or care - about
the fact that some time in the distant past you also had that "base"
state.

As a result, "git show" doesn't ever really understand the notion of a
"merge conflict", and all it shows is really "whee, this end result
looks like neither side" as a kind of pseudo-conflict diff.

Anyway, thanks for describing the conflict, it was indeed not
complicated, this email is just to explain your "but somehow git
[show|log] didn't show it to me" thing.

           Linus
[PATCH v3 4/8] of: unittest: Add test for of_dma_get_max_cpu_address()
Introduce a test for of_dma_get_max_cpu_address(), it uses the same DT
data as the rest of dma-ranges unit tests.

Signed-off-by: Nicolas Saenz Julienne
---
 drivers/of/unittest.c | 20
 1 file changed, 20 insertions(+)

diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 06cc988faf78..2cbf2a585c9f 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -869,6 +869,25 @@ static void __init of_unittest_changeset(void)
 #endif
 }
 
+static void __init of_unittest_dma_get_max_cpu_address(void)
+{
+#ifdef CONFIG_HAS_DMA
+       struct device_node *np;
+       phys_addr_t cpu_addr;
+
+       np = of_find_node_by_path("/testcase-data/address-tests");
+       if (!np) {
+               pr_err("missing testcase data\n");
+               return;
+       }
+
+       cpu_addr = of_dma_get_max_cpu_address(np);
+       unittest(cpu_addr == 0x5000ULL,
+                "of_dma_get_max_cpu_address: wrong CPU addr %pad (expecting %llx)\n",
+                &cpu_addr, 0x5000ULL);
+#endif
+}
+
 static void __init of_unittest_dma_ranges_one(const char *path,
                u64 expect_dma_addr, u64 expect_paddr)
 {
@@ -3266,6 +3285,7 @@ static int __init of_unittest(void)
        of_unittest_changeset();
        of_unittest_parse_interrupts();
        of_unittest_parse_interrupts_extended();
+       of_unittest_dma_get_max_cpu_address();
        of_unittest_parse_dma_ranges();
        of_unittest_pci_dma_ranges();
        of_unittest_match_node();
-- 
2.28.0
[PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan
From: Ard Biesheuvel

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host bridge
can only access the bottom 3 GB).

Instructing the DMA layer about these limitations is straightforward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.
This is why the 1 GB ZONE_DMA was introduced recently.

Unfortunately, it turns out that having a 1 GB ZONE_DMA as well as a
ZONE_DMA32 causes problems with kdump, and potentially in other places
where allocations cannot cross zone boundaries. Therefore, we should
avoid having two separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA if
we encounter any devices that need it. This puts the burden on the
firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.
Cc: Jeremy Linton Cc: Lorenzo Pieralisi Cc: Nicolas Saenz Julienne Cc: Rob Herring Cc: Christoph Hellwig Cc: Robin Murphy Cc: Hanjun Guo Cc: Sudeep Holla Cc: Anshuman Khandual Signed-off-by: Ard Biesheuvel [nsaenz: Rebased, removed documentation change, warnings and add declaration in acpi_iort.h] Signed-off-by: Nicolas Saenz Julienne --- arch/arm64/mm/init.c | 6 + drivers/acpi/arm64/iort.c | 51 +++ include/linux/acpi_iort.h | 4 +++ 3 files changed, 61 insertions(+) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 97b0d2768349..f321761eedb2 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -196,6 +197,11 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max) #ifdef CONFIG_ZONE_DMA zone_dma_bits = min(zone_dma_bits, (unsigned int)ilog2(of_dma_get_max_cpu_address(NULL))); + + if (IS_ENABLED(CONFIG_ACPI)) + zone_dma_bits = min(zone_dma_bits, + acpi_iort_get_zone_dma_size()); + arm64_dma_phys_limit = max_zone_phys(zone_dma_bits); max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit); #endif diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 9929ff50c0c0..8f530bf3c03b 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -1718,3 +1718,54 @@ void __init acpi_iort_init(void) iort_init_platform_devices(); } + +#ifdef CONFIG_ZONE_DMA +/* + * Check the IORT whether any devices exist whose DMA mask is < 32 bits. + * If so, return the smallest value encountered, or 32 otherwise. 
+ */ +unsigned int __init acpi_iort_get_zone_dma_size(void) +{ + struct acpi_table_iort *iort; + struct acpi_iort_node *node, *end; + acpi_status status; + u8 limit = 32; + int i; + + if (acpi_disabled) + return limit; + + status = acpi_get_table(ACPI_SIG_IORT, 0, + (struct acpi_table_header **)&iort); + if (ACPI_FAILURE(status)) + return limit; + + node = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->node_offset); + end = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->header.length); + + for (i = 0; i < iort->node_count; i++) { + if (node >= end) + break; + + switch (node->type) { + struct acpi_iort_named_component *ncomp; + struct acpi_iort_root_complex *rc; + + case ACPI_IORT_NODE_NAMED_COMPONENT: + ncomp = (struct acpi_iort_named_component *)node->node_data; + if (ncomp->memory_address_limit) + limit = min(limit, ncomp->memory_address_limit); + break; + + case ACPI_IORT_NODE_PCI_ROOT_COMPLEX: + rc = (struct acpi_iort_root_complex *)node->node_data; + if (rc->memory_address_limit) + limit = min(limit, rc->memory_address_limit); + break; + } + node = ACPI_ADD_PTR(struct acpi_iort_node, node, node->length); + } + acpi_put_t
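The min-over-nodes logic of the IORT scan above is easy to get wrong at the edges (a zero limit means "unspecified", and limits above 32 must not widen the result). Here is a quick userspace sketch of that core logic — plain C, not kernel code; `struct fake_iort_node` and `iort_min_dma_bits` are made-up names standing in for the real IORT node structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the IORT named-component / root-complex
 * nodes: only the one field the scan cares about. */
struct fake_iort_node {
	unsigned int memory_address_limit;	/* 0 means "not specified" */
};

/* Mirror of the patch's loop: default to 32 bits, and lower the limit
 * whenever a node declares a smaller nonzero one. Limits of 0 are
 * skipped, and limits >= 32 leave the default untouched. */
static unsigned int iort_min_dma_bits(const struct fake_iort_node *nodes,
				      size_t count)
{
	unsigned int limit = 32;
	size_t i;

	for (i = 0; i < count; i++) {
		if (nodes[i].memory_address_limit &&
		    nodes[i].memory_address_limit < limit)
			limit = nodes[i].memory_address_limit;
	}
	return limit;
}
```

For a Raspberry Pi 4-like table with limits {30, 0, 40} this yields 30 (the 0 and 40 entries are ignored), and an empty table keeps the 32-bit default — matching the "or 32 otherwise" contract in the comment above.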
[PATCH v3 2/8] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
zone_dma_bits's initialization happens earlier than it's actually needed, in arm64_memblock_init(). So move it into the more suitable zone_sizes_init(). Signed-off-by: Nicolas Saenz Julienne --- arch/arm64/mm/init.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 1d29f2ca81c7..ef0ef0087e2c 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -196,6 +196,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max) unsigned long max_zone_pfns[MAX_NR_ZONES] = {0}; #ifdef CONFIG_ZONE_DMA + zone_dma_bits = ARM64_ZONE_DMA_BITS; + arm64_dma_phys_limit = max_zone_phys(zone_dma_bits); max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit); #endif #ifdef CONFIG_ZONE_DMA32 @@ -386,11 +388,6 @@ void __init arm64_memblock_init(void) early_init_fdt_scan_reserved_mem(); - if (IS_ENABLED(CONFIG_ZONE_DMA)) { - zone_dma_bits = ARM64_ZONE_DMA_BITS; - arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS); - } - if (IS_ENABLED(CONFIG_ZONE_DMA32)) arm64_dma32_phys_limit = max_zone_phys(32); else -- 2.28.0
[PATCH v3 6/8] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
We recently introduced a 1 GB sized ZONE_DMA to cater for platforms incorporating masters that can address less than 32 bits of DMA, in particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has peripherals that can only address up to 1 GB (and its PCIe host bridge can only access the bottom 3 GB). The DMA layer also needs to be able to allocate memory that is guaranteed to meet those DMA constraints, for bounce buffering as well as allocating the backing for consistent mappings. This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately, it turns out that having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with kdump, and potentially in other places where allocations cannot cross zone boundaries. Therefore, we should avoid having two separate DMA zones when possible. So, with the help of of_dma_get_max_cpu_address(), get the topmost physical address accessible to all DMA masters in the system and use that information to fine-tune ZONE_DMA's size. In the absence of addressing-limited masters, ZONE_DMA will span the whole 32-bit address space; otherwise, in the case of the Raspberry Pi 4, it'll only span the 30-bit address space, with ZONE_DMA32 covering the rest of the 32-bit address space.
Signed-off-by: Nicolas Saenz Julienne --- Changes since v2: - Updated commit log by shamelessly copying Ard's ACPI counterpart commit log arch/arm64/include/asm/processor.h | 1 + arch/arm64/mm/init.c | 5 ++--- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index fce8cbecd6bc..c09d3f1a9a6b 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -97,6 +97,7 @@ extern phys_addr_t arm64_dma_phys_limit; #define ARCH_LOW_ADDRESS_LIMIT (arm64_dma_phys_limit - 1) +#define ZONE_DMA_BITS_DEFAULT 32 struct debug_info { #ifdef CONFIG_HAVE_HW_BREAKPOINT diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index ef0ef0087e2c..97b0d2768349 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -42,8 +42,6 @@ #include #include -#define ARM64_ZONE_DMA_BITS 30 - /* * We need to be able to catch inadvertent references to memstart_addr * that occur (potentially in generic code) before arm64_memblock_init() @@ -196,7 +194,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max) unsigned long max_zone_pfns[MAX_NR_ZONES] = {0}; #ifdef CONFIG_ZONE_DMA - zone_dma_bits = ARM64_ZONE_DMA_BITS; + zone_dma_bits = min(zone_dma_bits, + (unsigned int)ilog2(of_dma_get_max_cpu_address(NULL))); arm64_dma_phys_limit = max_zone_phys(zone_dma_bits); max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit); #endif -- 2.28.0
[PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()
Introduce of_dma_get_max_cpu_address(), which provides the highest CPU physical address addressable by all DMA masters in the system. It's especially useful for setting memory zone sizes at early boot time. Signed-off-by: Nicolas Saenz Julienne --- Changes since v2: - Use PHYS_ADDR_MAX - Return phys_addr_t - Rename function - Correct subject - Add support to start parsing from an arbitrary device node in order for the function to work with unit tests drivers/of/address.c | 42 ++ include/linux/of.h | 7 +++ 2 files changed, 49 insertions(+) diff --git a/drivers/of/address.c b/drivers/of/address.c index eb9ab4f1e80b..b5a9695aaf82 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map) } #endif /* CONFIG_HAS_DMA */ +/** + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA + * @np: The node to start searching from or NULL to start from the root + * + * Gets the highest CPU physical address that is addressable by all DMA masters + * in the system (or subtree when np is non-NULL). If no DMA constrained device + * is found, it returns PHYS_ADDR_MAX.
+ */ +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np) +{ + phys_addr_t max_cpu_addr = PHYS_ADDR_MAX; + struct of_range_parser parser; + phys_addr_t subtree_max_addr; + struct device_node *child; + phys_addr_t cpu_end = 0; + struct of_range range; + const __be32 *ranges; + int len; + + if (!np) + np = of_root; + + ranges = of_get_property(np, "dma-ranges", &len); + if (ranges && len) { + of_dma_range_parser_init(&parser, np); + for_each_of_range(&parser, &range) + if (range.cpu_addr + range.size > cpu_end) + cpu_end = range.cpu_addr + range.size; + + if (max_cpu_addr > cpu_end) + max_cpu_addr = cpu_end; + } + + for_each_available_child_of_node(np, child) { + subtree_max_addr = of_dma_get_max_cpu_address(child); + if (max_cpu_addr > subtree_max_addr) + max_cpu_addr = subtree_max_addr; + } + + return max_cpu_addr; +} + /** * of_dma_is_coherent - Check if device is coherent * @np:device node diff --git a/include/linux/of.h b/include/linux/of.h index 481ec0467285..db8db8f2c967 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id, const char *map_name, const char *map_mask_name, struct device_node **target, u32 *id_out); +phys_addr_t of_dma_get_max_cpu_address(struct device_node *np); + #else /* CONFIG_OF */ static inline void of_core_init(void) @@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 id, return -EINVAL; } +static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np) +{ + return PHYS_ADDR_MAX; +} + #define of_match_ptr(_ptr) NULL #define of_match_node(_matches, _node) NULL #endif /* CONFIG_OF */ -- 2.28.0
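To make the interaction with patch 6/8 concrete: the address returned by of_dma_get_max_cpu_address() is folded into zone_dma_bits via ilog2(). The arithmetic can be sketched in plain userspace C (illustrative only, not kernel code — `ilog2_u64` and `clamp_zone_dma_bits` are made-up names mimicking the kernel's ilog2()/min() usage):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal ilog2() analogue: index of the highest set bit (v must be nonzero). */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* zone_dma_bits = min(default, ilog2(max_cpu_addr)), as zone_sizes_init()
 * does in the arm64 patch. */
static unsigned int clamp_zone_dma_bits(uint64_t max_cpu_addr,
					unsigned int dflt)
{
	unsigned int bits = ilog2_u64(max_cpu_addr);

	return bits < dflt ? bits : dflt;
}
```

A Raspberry Pi 4-style 1 GB limit (max_cpu_addr = 0x40000000) gives ilog2() = 30 and so zone_dma_bits = 30, while the unconstrained case (PHYS_ADDR_MAX, ilog2() = 63 on 64-bit) leaves the 32-bit default untouched — which is exactly why returning PHYS_ADDR_MAX rather than a truncated value prevents the truncation issue raised in the review above.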
[PATCH v3 0/8] arm64: Default to 32-bit wide ZONE_DMA
Using two distinct DMA zones turned out to be problematic. Here's an attempt to go back to a saner default. I tested this on both a RPi4 and QEMU. --- Changes since v2: - Introduce Ard's patch - Improve OF dma-ranges parsing function - Add unit test for OF function - Address small changes - Move crashkernel reservation later in boot process Changes since v1: - Parse dma-ranges instead of using machine compatible string Ard Biesheuvel (1): arm64: mm: Set ZONE_DMA size based on early IORT scan Nicolas Saenz Julienne (7): arm64: mm: Move reserve_crashkernel() into mem_init() arm64: mm: Move zone_dma_bits initialization into zone_sizes_init() of/address: Introduce of_dma_get_max_cpu_address() of: unittest: Add test for of_dma_get_max_cpu_address() dma-direct: Turn zone_dma_bits default value into a define arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges mm: Update DMA zones description arch/arm64/include/asm/processor.h | 1 + arch/arm64/mm/init.c | 20 ++-- drivers/acpi/arm64/iort.c | 51 ++ drivers/of/address.c | 42 drivers/of/unittest.c | 20 include/linux/acpi_iort.h | 4 +++ include/linux/dma-direct.h | 3 ++ include/linux/mmzone.h | 5 +-- include/linux/of.h | 7 kernel/dma/direct.c | 2 +- 10 files changed, 143 insertions(+), 12 deletions(-) -- 2.28.0
[PATCH v3 5/8] dma-direct: Turn zone_dma_bits default value into a define
Set zone_dma_bits' default value through a define so that architectures can override it with their own default value. Signed-off-by: Nicolas Saenz Julienne --- include/linux/dma-direct.h | 3 +++ kernel/dma/direct.c | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 18aade195884..e433d90cbacf 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -12,6 +12,9 @@ #include #include +#ifndef ZONE_DMA_BITS_DEFAULT +#define ZONE_DMA_BITS_DEFAULT 24 +#endif extern unsigned int zone_dma_bits; /* diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 06c111544f61..c0d97f536e93 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -20,7 +20,7 @@ * it for entirely different regions. In that case the arch code needs to * override the variable below for dma-direct to work properly. */ -unsigned int zone_dma_bits __ro_after_init = 24; +unsigned int zone_dma_bits __ro_after_init = ZONE_DMA_BITS_DEFAULT; static inline dma_addr_t phys_to_dma_direct(struct device *dev, phys_addr_t phys) -- 2.28.0
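The #ifndef guard above is the usual weak-default pattern: whichever definition the preprocessor sees first wins. A minimal compile-time sketch of why the include ordering matters (the arch header's define must be visible before the generic fallback; the 32 here mirrors arm64's ZONE_DMA_BITS_DEFAULT from patch 6/8):

```c
#include <assert.h>

/* What an arch header (e.g. arm64's <asm/processor.h>) would provide
 * before <linux/dma-direct.h> is included. */
#define ZONE_DMA_BITS_DEFAULT 32

/* The generic <linux/dma-direct.h> fallback: only takes effect when no
 * architecture override was defined first. */
#ifndef ZONE_DMA_BITS_DEFAULT
#define ZONE_DMA_BITS_DEFAULT 24
#endif

/* Equivalent of the kernel/dma/direct.c initializer. */
static unsigned int zone_dma_bits = ZONE_DMA_BITS_DEFAULT;
```

With the arch define present, zone_dma_bits starts at 32; removing the first #define makes the fallback kick in and the value becomes 24 — the historic ISA-era default.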
[PATCH v3 1/8] arm64: mm: Move reserve_crashkernel() into mem_init()
crashkernel might reserve memory located in ZONE_DMA. We plan to delay ZONE_DMA's initialization until after unflattening the devicetree and ACPI's boot table initialization, so move the crashkernel reservation later in the boot process, specifically into mem_init(). This is the last place crashkernel will be able to reserve the memory before the page allocator kicks in, and there is no need to do it earlier. Signed-off-by: Nicolas Saenz Julienne --- arch/arm64/mm/init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index a53c1e0fb017..1d29f2ca81c7 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -396,8 +396,6 @@ void __init arm64_memblock_init(void) else arm64_dma32_phys_limit = PHYS_MASK + 1; - reserve_crashkernel(); - reserve_elfcorehdr(); high_memory = __va(memblock_end_of_DRAM() - 1) + 1; @@ -518,6 +516,8 @@ void __init mem_init(void) else swiotlb_force = SWIOTLB_NO_FORCE; + reserve_crashkernel(); + set_max_mapnr(max_pfn - PHYS_PFN_OFFSET); #ifndef CONFIG_SPARSEMEM_VMEMMAP -- 2.28.0
[PATCH v3 8/8] mm: Update DMA zones description
The default behavior for arm64 changed, so reflect that. Signed-off-by: Nicolas Saenz Julienne Acked-by: Catalin Marinas --- include/linux/mmzone.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index fb3bf696c05e..4ee2306351b9 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -363,8 +363,9 @@ enum zone_type { * - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on *the specific device. * -* - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the -*lower 4G. +* - arm64 uses a single 4GB ZONE_DMA, except on the Raspberry Pi 4, +*in which ZONE_DMA covers the first GB and ZONE_DMA32 the rest of +*the lower 4GB. * * - powerpc only uses ZONE_DMA, the size, up to 2G, may vary *depending on the specific device. -- 2.28.0
Re: [PATCH v7 3/3] iommu/tegra-smmu: Add PCI support
On 2020-10-09 17:19, Nicolin Chen wrote: This patch simply adds support for PCI devices. Reviewed-by: Dmitry Osipenko Tested-by: Dmitry Osipenko Signed-off-by: Nicolin Chen --- Changelog v6->v7 * Renamed goto labels, suggested by Thierry. v5->v6 * Added Dmitry's Reviewed-by and Tested-by. v4->v5 * Added Dmitry's Reviewed-by v3->v4 * Dropped !iommu_present() check * Added CONFIG_PCI check in the exit path v2->v3 * Replaced ternary conditional operator with if-else in .device_group() * Dropped change in tegra_smmu_remove() v1->v2 * Added error-out labels in tegra_smmu_probe() * Dropped pci_request_acs() since IOMMU core would call it. drivers/iommu/tegra-smmu.c | 35 +-- 1 file changed, 25 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index be29f5977145..2941d6459076 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -865,7 +866,11 @@ static struct iommu_group *tegra_smmu_device_group(struct device *dev) group->smmu = smmu; group->soc = soc; - group->group = iommu_group_alloc(); + if (dev_is_pci(dev)) + group->group = pci_device_group(dev); Just to check, is it OK to have two or more swgroups "owning" the same iommu_group if an existing one gets returned here? It looks like that might not play nice with the use of iommu_group_set_iommudata(). Robin. 
+ else + group->group = generic_device_group(dev); + if (IS_ERR(group->group)) { devm_kfree(smmu->dev, group); mutex_unlock(&smmu->lock); @@ -1075,22 +1080,32 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev, iommu_device_set_fwnode(&smmu->iommu, dev->fwnode); err = iommu_device_register(&smmu->iommu); - if (err) { - iommu_device_sysfs_remove(&smmu->iommu); - return ERR_PTR(err); - } + if (err) + goto remove_sysfs; err = bus_set_iommu(&platform_bus_type, &tegra_smmu_ops); - if (err < 0) { - iommu_device_unregister(&smmu->iommu); - iommu_device_sysfs_remove(&smmu->iommu); - return ERR_PTR(err); - } + if (err < 0) + goto unregister; + +#ifdef CONFIG_PCI + err = bus_set_iommu(&pci_bus_type, &tegra_smmu_ops); + if (err < 0) + goto unset_platform_bus; +#endif if (IS_ENABLED(CONFIG_DEBUG_FS)) tegra_smmu_debugfs_init(smmu); return smmu; + +unset_platform_bus: __maybe_unused; + bus_set_iommu(&platform_bus_type, NULL); +unregister: + iommu_device_unregister(&smmu->iommu); +remove_sysfs: + iommu_device_sysfs_remove(&smmu->iommu); + + return ERR_PTR(err); } void tegra_smmu_remove(struct tegra_smmu *smmu)
RE: [PATCH 8/8] WIP: add a dma_alloc_contiguous API
> On Wed, Sep 30, 2020 at 6:09 PM Christoph Hellwig wrote: > > > > Add a new API that returns a virtually non-contigous array of pages > > and dma address. This API is only implemented for dma-iommu and will > > not be implemented for non-iommu DMA API instances that have to allocate > > contiguous memory. It is up to the caller to check if the API is > > available. Isn't there already a flag that is only implemented for ARM systems with iommu that forces pages to actually be physically contiguous? So isn't this some kind of strange opposite? David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API
+CC Ricardo who will be looking into using this in the USB stack (UVC camera driver). On Wed, Sep 30, 2020 at 6:09 PM Christoph Hellwig wrote: > > Add a new API that returns a virtually non-contigous array of pages > and dma address. This API is only implemented for dma-iommu and will > not be implemented for non-iommu DMA API instances that have to allocate > contiguous memory. It is up to the caller to check if the API is > available. > > The intent is that media drivers can use this API if either: > > - no kernel mapping or only temporary kernel mappings are required. >That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING > - a kernel mapping is required for cached and DMA mapped pages, but >the driver also needs the pages to e.g. map them to userspace. >In that sense it is a replacement for some aspects of the recently >removed and never fully implemented DMA_ATTR_NON_CONSISTENT > > Signed-off-by: Christoph Hellwig > --- > drivers/iommu/dma-iommu.c | 73 + > include/linux/dma-mapping.h | 9 + > kernel/dma/mapping.c| 35 ++ > 3 files changed, 93 insertions(+), 24 deletions(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 7922f545cd5eef..158026a856622c 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -565,23 +565,12 @@ static struct page **__iommu_dma_alloc_pages(struct > device *dev, > return pages; > } > > -/** > - * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space > - * @dev: Device to allocate memory for. 
Must be a real device > - * attached to an iommu_dma_domain > - * @size: Size of buffer in bytes > - * @dma_handle: Out argument for allocated DMA handle > - * @gfp: Allocation flags > - * @prot: pgprot_t to use for the remapped mapping > - * @attrs: DMA attributes for this allocation > - * > - * If @size is less than PAGE_SIZE, then a full CPU page will be allocated, > +/* > + * If size is less than PAGE_SIZE, then a full CPU page will be allocated, > * but an IOMMU which supports smaller pages might not map the whole thing. > - * > - * Return: Mapped virtual address, or NULL on failure. > */ > -static void *iommu_dma_alloc_remap(struct device *dev, size_t size, > - dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot, > +static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev, > + size_t size, dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot, > unsigned long attrs) > { > struct iommu_domain *domain = iommu_get_dma_domain(dev); > @@ -593,7 +582,6 @@ static void *iommu_dma_alloc_remap(struct device *dev, > size_t size, > struct page **pages; > struct sg_table sgt; > dma_addr_t iova; > - void *vaddr; > > *dma_handle = DMA_MAPPING_ERROR; > > @@ -636,17 +624,10 @@ static void *iommu_dma_alloc_remap(struct device *dev, > size_t size, > < size) > goto out_free_sg; > > - vaddr = dma_common_pages_remap(pages, size, prot, > - __builtin_return_address(0)); > - if (!vaddr) > - goto out_unmap; > - > *dma_handle = iova; > sg_free_table(&sgt); > - return vaddr; > + return pages; > > -out_unmap: > - __iommu_dma_unmap(dev, iova, size); > out_free_sg: > sg_free_table(&sgt); > out_free_iova: > @@ -656,6 +637,46 @@ static void *iommu_dma_alloc_remap(struct device *dev, > size_t size, > return NULL; > } > > +static void *iommu_dma_alloc_remap(struct device *dev, size_t size, > + dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot, > + unsigned long attrs) > +{ > + struct page **pages; > + void *vaddr; > + > + pages = __iommu_dma_alloc_noncontiguous(dev, size, dma_handle, 
gfp, > + prot, attrs); > + if (!pages) > + return NULL; > + vaddr = dma_common_pages_remap(pages, size, prot, > + __builtin_return_address(0)); > + if (!vaddr) > + goto out_unmap; > + return vaddr; > + > +out_unmap: > + __iommu_dma_unmap(dev, *dma_handle, size); > + __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT); > + return NULL; > +} > + > +#ifdef CONFIG_DMA_REMAP > +static struct page **iommu_dma_alloc_noncontiguous(struct device *dev, > + size_t size, dma_addr_t *dma_handle, gfp_t gfp, > + unsigned long attrs) > +{ > + return __iommu_dma_alloc_noncontiguous(dev, size, dma_handle, gfp, > + PAGE_KERNEL, attrs); > +} > + > +static void iommu_dma_free_noncontiguous(struct device *dev, size_t size, > + struct page **pages, dma_addr_t dma_handle) > +{ > + __iommu_dma_unmap(dev, dma_handle, size); > + __iommu_dma
Re: [PATCH next] iommu: intel: don't dereference iommu_device if IOMMU_API is not built
On Wed, Oct 14, 2020 at 03:25:08PM +0800, Lu Baolu wrote: > I suppose Joerg will pick this up. I guess you don't need to resend it > unless Joerg asks you to do. Yes, will pick this up soon, no need to re-send. Thanks, Joerg
Re: [PATCH v2 2/5] of/address: Introduce of_dma_lower_bus_limit()
On Wed, Oct 14, 2020 at 6:52 AM Nicolas Saenz Julienne wrote: > > Hi Rob, > > On Mon, 2020-10-12 at 10:25 -0500, Rob Herring wrote: > > On Sat, Oct 10, 2020 at 10:12 AM Nicolas Saenz Julienne > > wrote: > > > The function provides the CPU physical address addressable by the most > > > constrained bus in the system. It might be useful in order to > > > dynamically set up memory zones during boot. > > > > > > Signed-off-by: Nicolas Saenz Julienne > > > --- > > > drivers/of/address.c | 34 ++ > > > include/linux/of.h | 7 +++ > > > 2 files changed, 41 insertions(+) > > > > > > diff --git a/drivers/of/address.c b/drivers/of/address.c > > > index eb9ab4f1e80b..755e97b65096 100644 > > > --- a/drivers/of/address.c > > > +++ b/drivers/of/address.c > > > @@ -1024,6 +1024,40 @@ int of_dma_get_range(struct device_node *np, const > > > struct bus_dma_region **map) > > > } > > > #endif /* CONFIG_HAS_DMA */ > > > > > > +/** > > > + * of_dma_safe_phys_limit - Get system wide DMA safe address space > > > + * > > > + * Gets the CPU physical address limit for safe DMA addressing system > > > wide by > > > + * searching for the most constraining dma-range. Otherwise it returns > > > ~0ULL. > > > + */ > > > +u64 __init of_dma_safe_phys_limit(void) > > > +{ > > > + struct device_node *np = NULL; > > > + struct of_range_parser parser; > > > + const __be32 *ranges = NULL; > > > + u64 phys_dma_limit = ~0ULL; > > > + struct of_range range; > > > + int len; > > > + > > > + for_each_of_allnodes(np) { > > > + dma_addr_t cpu_end = 0; > > > + > > > + ranges = of_get_property(np, "dma-ranges", &len); > > > + if (!ranges || !len) > > > + continue; > > > + > > > + of_dma_range_parser_init(&parser, np); > > > + for_each_of_range(&parser, &range) > > > + if (range.cpu_addr + range.size > cpu_end) > > > + cpu_end = range.cpu_addr + range.size; > > > > This doesn't work if you have more than one level of dma-ranges. The > > address has to be translated first. 
It should be okay to do that on > the start or end address (if not, your DT is broken). > > for_each_of_range() calls of_pci_range_parser_one() which ultimately populates > range.cpu_addr with of_translate_dma_address() results. Isn't that good > enough? Indeed. I guess I was remembering the cases not using for_each_of_range which forgot to do that... Rob
Re: [PATCH v2 2/5] of/address: Introduce of_dma_lower_bus_limit()
Hi Rob, On Mon, 2020-10-12 at 10:25 -0500, Rob Herring wrote: > On Sat, Oct 10, 2020 at 10:12 AM Nicolas Saenz Julienne > wrote: > > The function provides the CPU physical address addressable by the most > > constrained bus in the system. It might be useful in order to > > dynamically set up memory zones during boot. > > > > Signed-off-by: Nicolas Saenz Julienne > > --- > > drivers/of/address.c | 34 ++ > > include/linux/of.h | 7 +++ > > 2 files changed, 41 insertions(+) > > > > diff --git a/drivers/of/address.c b/drivers/of/address.c > > index eb9ab4f1e80b..755e97b65096 100644 > > --- a/drivers/of/address.c > > +++ b/drivers/of/address.c > > @@ -1024,6 +1024,40 @@ int of_dma_get_range(struct device_node *np, const > > struct bus_dma_region **map) > > } > > #endif /* CONFIG_HAS_DMA */ > > > > +/** > > + * of_dma_safe_phys_limit - Get system wide DMA safe address space > > + * > > + * Gets the CPU physical address limit for safe DMA addressing system wide > > by > > + * searching for the most constraining dma-range. Otherwise it returns > > ~0ULL. > > + */ > > +u64 __init of_dma_safe_phys_limit(void) > > +{ > > + struct device_node *np = NULL; > > + struct of_range_parser parser; > > + const __be32 *ranges = NULL; > > + u64 phys_dma_limit = ~0ULL; > > + struct of_range range; > > + int len; > > + > > + for_each_of_allnodes(np) { > > + dma_addr_t cpu_end = 0; > > + > > + ranges = of_get_property(np, "dma-ranges", &len); > > + if (!ranges || !len) > > + continue; > > + > > + of_dma_range_parser_init(&parser, np); > > + for_each_of_range(&parser, &range) > > + if (range.cpu_addr + range.size > cpu_end) > > + cpu_end = range.cpu_addr + range.size; > > This doesn't work if you have more than one level of dma-ranges. The > address has to be translated first. It should be okay to do that on > the start or end address (if not, your DT is broken). 
for_each_of_range() calls of_pci_range_parser_one() which ultimately populates range.cpu_addr with of_translate_dma_address() results. Isn't that good enough? > Please add/extend a unittest for this. Will do. Regards, Nicolas
Re: [PATCH v2 2/5] of/address: Introduce of_dma_lower_bus_limit()
On Sun, 2020-10-11 at 09:47 +0200, Ard Biesheuvel wrote: > Hi Nicolas, > > $SUBJECT is out of sync with the patch below. Also, for legibility, it > helps if the commit log is intelligible by itself, rather than relying > on $SUBJECT being the first line of the first paragraph. Noted, I'll update all commit logs. > On Sat, 10 Oct 2020 at 17:12, Nicolas Saenz Julienne > wrote: > > The function provides the CPU physical address addressable by the most > > constrained bus in the system. It might be useful in order to > > dynamically set up memory zones during boot. > > > > Signed-off-by: Nicolas Saenz Julienne > > --- > > drivers/of/address.c | 34 ++ > > include/linux/of.h | 7 +++ > > 2 files changed, 41 insertions(+) > > > > diff --git a/drivers/of/address.c b/drivers/of/address.c > > index eb9ab4f1e80b..755e97b65096 100644 > > --- a/drivers/of/address.c > > +++ b/drivers/of/address.c > > @@ -1024,6 +1024,40 @@ int of_dma_get_range(struct device_node *np, const > > struct bus_dma_region **map) > > } > > #endif /* CONFIG_HAS_DMA */ > > > > +/** > > + * of_dma_safe_phys_limit - Get system wide DMA safe address space > > + * > > + * Gets the CPU physical address limit for safe DMA addressing system wide > > by > > + * searching for the most constraining dma-range. Otherwise it returns > > ~0ULL. > > + */ > > +u64 __init of_dma_safe_phys_limit(void) > > I don't think 'safe' strikes the right tone here. You are looking for > the highest CPU address that is addressable by all DMA masters in the > system. > > Something like > > of_dma_get_max_cpu_address(void) > > perhaps? Also, since this is generic code, phys_addr_t is probably a > better type to return. Sounds good to me, I didn't like the name I used either. Will use it with phys_addr_t. Regards, Nicolas
Re: [PATCH next] iommu: intel: don't dereference iommu_device if IOMMU_API is not built
Hi Bartosz, On 10/14/20 3:18 PM, Bartosz Golaszewski wrote: On Wed, Oct 14, 2020 at 2:49 AM Lu Baolu wrote: On 10/13/20 3:30 PM, Bartosz Golaszewski wrote: From: Bartosz Golaszewski Since commit c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths") dmar.c needs struct iommu_device to be selected. We can drop this dependency by not dereferencing struct iommu_device if IOMMU_API is not selected and by reusing the information stored in iommu->drhd->ignored instead. This fixes the following build error when IOMMU_API is not selected: drivers/iommu/intel/dmar.c: In function ‘free_iommu’: drivers/iommu/intel/dmar.c:1139:41: error: ‘struct iommu_device’ has no member named ‘ops’ 1139 | if (intel_iommu_enabled && iommu->iommu.ops) { ^ Fixes: c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths") Signed-off-by: Bartosz Golaszewski With commit title adjusted to "iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not built", Acked-by: Lu Baolu Do you want me to resend it again with a changed title or can you fix it up when applying? Or should someone else pick it up? I suppose Joerg will pick this up. I guess you don't need to resend it unless Joerg asks you to do. Best regards, baolu
Re: [PATCH next] iommu: intel: don't dereference iommu_device if IOMMU_API is not built
On Wed, Oct 14, 2020 at 2:49 AM Lu Baolu wrote: > > On 10/13/20 3:30 PM, Bartosz Golaszewski wrote: > > From: Bartosz Golaszewski > > > > Since commit c40c1018 ("iommu/vt-d: Gracefully handle DMAR units > > with no supported address widths") dmar.c needs struct iommu_device to > > be selected. We can drop this dependency by not dereferencing struct > > iommu_device if IOMMU_API is not selected and by reusing the information > > stored in iommu->drhd->ignored instead. > > > > This fixes the following build error when IOMMU_API is not selected: > > > > drivers/iommu/intel/dmar.c: In function ‘free_iommu’: > > drivers/iommu/intel/dmar.c:1139:41: error: ‘struct iommu_device’ has no > > member named ‘ops’ > > 1139 | if (intel_iommu_enabled && iommu->iommu.ops) { > > ^ > > > > Fixes: c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no > > supported address widths") > > Signed-off-by: Bartosz Golaszewski > > With commit title adjusted to "iommu/vt-d: Don't dereference > iommu_device if IOMMU_API is not built", > > Acked-by: Lu Baolu > Do you want me to resend it again with a changed title or can you fix it up when applying? Or should someone else pick it up? Bartosz