Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711
On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > Sadly I just realised that the series is incomplete, we have RPi4 users that > want to boot unsing ACPI, and this series would break things for them. I'll > have a word with them to see what we can do for their use-case. Stupid question: why do these users insist on a totally unsuitable interface? And why would we as Linux developers care to support such a aims? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711
On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig wrote: > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > Sadly I just realised that the series is incomplete, we have RPi4 users that > > want to boot unsing ACPI, and this series would break things for them. I'll > > have a word with them to see what we can do for their use-case. > > Stupid question: why do these users insist on a totally unsuitable > interface? And why would we as Linux developers care to support such > a aims? > The point is really whether we want to revert changes in Linux that made both DT and ACPI boot work without quirks on RPi4. Having to check the RPi4 compatible string or OEM id in core init code is awful, regardless of whether you boot via ACPI or via DT. The problem with this hardware is that it uses a DMA mask which is narrower than 32, and the arm64 kernel is simply not set up to deal with that at all. On DT, we have DMA ranges properties and the likes to describe such limitations, on ACPI we have _DMA methods as well as DMA range attributes in the IORT, both of which are now handled correctly. So all the information is there, we just have to figure out how to consume it early on. Interestingly, this limitation always existed in the SoC, but it wasn't until they started shipping it with more than 1 GB of DRAM that it became a problem. This means issues like this could resurface in the future with existing SoCs when they get shipped with more memory, and so I would prefer fixing this in a generic way. Also, I assume papering over the issue like this does not fix the kdump issue fundamentally, it just works around it, and so we might run into this again in the future. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
On Thu, 2020-10-08 at 14:40 +0200, Thomas Gleixner wrote: > Subject: x86/iommu: Make interrupt remapping more robust > From: Thomas Gleixner > Date: Thu, 08 Oct 2020 14:09:44 +0200 > > Needs to be split into pieces and cover PCI proper. Right now PCI gets a > NULL pointer assigned which makes it explode at the wrong place > later. Also hyperv iommu wants some love. > > NOT-Signed-off-by: Thomas Gleixner > --- > arch/x86/kernel/apic/io_apic.c |4 +++- > arch/x86/kernel/apic/msi.c | 24 ++-- > drivers/iommu/amd/iommu.c |6 +++--- > drivers/iommu/intel/irq_remapping.c |4 ++-- > 4 files changed, 22 insertions(+), 16 deletions(-) > > --- a/arch/x86/kernel/apic/io_apic.c > +++ b/arch/x86/kernel/apic/io_apic.c > @@ -2300,7 +2300,9 @@ static int mp_irqdomain_create(int ioapi > info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT; > info.devid = mpc_ioapic_id(ioapic); > parent = irq_remapping_get_irq_domain(&info); > - if (!parent) > + if (IS_ERR(parent)) > + return PTR_ERR(parent); > + else if (!parent) > parent = x86_vector_domain; > else > name = "IO-APIC-IR"; > --- a/arch/x86/kernel/apic/msi.c > +++ b/arch/x86/kernel/apic/msi.c > @@ -415,9 +415,9 @@ static struct msi_domain_info hpet_msi_d > struct irq_domain *hpet_create_irq_domain(int hpet_id) > { > struct msi_domain_info *domain_info; > + struct fwnode_handle *fn = NULL; > struct irq_domain *parent, *d; > struct irq_alloc_info info; > - struct fwnode_handle *fn; > > if (x86_vector_domain == NULL) > return NULL; > @@ -432,25 +432,29 @@ struct irq_domain *hpet_create_irq_domai > init_irq_alloc_info(&info, NULL); > info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT; > info.devid = hpet_id; > + > parent = irq_remapping_get_irq_domain(&info); > - if (parent == NULL) > + if (IS_ERR(parent)) > + goto fail; > + else if (!parent) > parent = x86_vector_domain; > else > hpet_msi_controller.name = "IR-HPET-MSI"; Hrm... this is the '-ENODEV changes' that you wanted me to pick up, right? I confess I don't like it very much. I'd much prefer to be able to use a generic foo_get_parent_irq_domain() function which just returns an answer or an error. Definitely none of this "if it returns NULL, caller fills it in for themselves with some magic global because we're special". And I don't much like abusing the irq_alloc_info for this either. I didn't like the irq_alloc_info very much in its *original* use cases, for actually allocating IRQs. Abusing it for this is worse. I'm more than happy to start digging into this, but once I do I fear I'm not going to stop until HPET and IOAPIC don't know *anything* about whether they're behind IR or not. And yes, I know that IOAPIC has a different EOI method for IR but AFAICT that's *already* hosed for Hyper-V because it doesn't *really* do remapping and so the RTEs *can* change there to move interrupts around. There's more differences between ioapic_ack_level() and ioapic_ir_ack_level() than the erratum workaround and whether we use data->entry.vector... aren't there? For the specific case you were targeting with this patch, you already successfully persuaded me it should never happen, and there's a WARN_ON which would trigger if it ever does (with too many CPUs in the system). Let's come back and tackle this can of worms in a separate cleanup series. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711
On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote: > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig wrote: > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > > Sadly I just realised that the series is incomplete, we have RPi4 users > > > that > > > want to boot unsing ACPI, and this series would break things for them. > > > I'll > > > have a word with them to see what we can do for their use-case. > > > > Stupid question: why do these users insist on a totally unsuitable > > interface? And why would we as Linux developers care to support such > > a aims? > > The point is really whether we want to revert changes in Linux that > made both DT and ACPI boot work without quirks on RPi4. Well, and broke a big amount of devices that were otherwise fine. > Having to check the RPi4 compatible string or OEM id in core init code is > awful, regardless of whether you boot via ACPI or via DT. > > The problem with this hardware is that it uses a DMA mask which is > narrower than 32, and the arm64 kernel is simply not set up to deal > with that at all. On DT, we have DMA ranges properties and the likes > to describe such limitations, on ACPI we have _DMA methods as well as > DMA range attributes in the IORT, both of which are now handled > correctly. So all the information is there, we just have to figure out > how to consume it early on. Is it worth the effort just for a single board? I don't know about ACPI but parsing dma-ranges that early at boot time is not trivial. My intuition tells me that it'd be even harder for ACPI, being a more complex data structure. > Interestingly, this limitation always existed in the SoC, but it > wasn't until they started shipping it with more than 1 GB of DRAM that > it became a problem. This means issues like this could resurface in > the future with existing SoCs when they get shipped with more memory, > and so I would prefer fixing this in a generic way. Actually what I proposed here is pretty generic. Specially from arm64's perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on whatever it finds in DT. Both those operations are architecture independent. arm64 arch code doesn't care about the logic involved in ascertaining zone_dma_bits. I get that the last step isn't generic. But it's all setup so as to make it as such whenever it's worth the effort. > Also, I assume papering over the issue like this does not fix the > kdump issue fundamentally, it just works around it, and so we might > run into this again in the future. Any ideas? The way I understand it the kdump issue is just a shortcoming of the memory zones design. Regards, Nicolas signature.asc Description: This is a digitally signed message part ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711
Hi Jeremy. On Thu, 2020-10-08 at 22:59 -0500, Jeremy Linton wrote: > On 10/8/20 2:43 PM, Ard Biesheuvel wrote: > > (+ Lorenzo) > > > > On Thu, 8 Oct 2020 at 12:14, Catalin Marinas > > wrote: > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > > > On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote: > > > > > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne > > > > > wrote: > > > > > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote: > > > > > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote: > > > > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz > > > > > > > > Julienne wrote: > > > > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > > > > > > > > > index 4602e467ca8b..cd0d115ef329 100644 > > > > > > > > > --- a/drivers/of/fdt.c > > > > > > > > > +++ b/drivers/of/fdt.c > > > > > > > > > @@ -25,6 +25,7 @@ > > > > > > > > > #include > > > > > > > > > #include > > > > > > > > > #include > > > > > > > > > +#include /* for zone_dma_bits */ > > > > > > > > > > > > > > > > > > #include /* for COMMAND_LINE_SIZE */ > > > > > > > > > #include > > > > > > > > > @@ -1198,6 +1199,14 @@ void __init > > > > > > > > > early_init_dt_scan_nodes(void) > > > > > > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > > > > > > > > > } > > > > > > > > > > > > > > > > > > +void __init early_init_dt_update_zone_dma_bits(void) > > > > > > > > > +{ > > > > > > > > > + unsigned long dt_root = of_get_flat_dt_root(); > > > > > > > > > + > > > > > > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) > > > > > > > > > + zone_dma_bits = 30; > > > > > > > > > +} > > > > > > > > > > > > > > > > I think we could keep this entirely in the arm64 > > > > > > > > setup_machine_fdt() and > > > > > > > > not pollute the core code with RPi4-specific code. > > > > > > > > > > > > > > Actually, even better, could we not move the check to > > > > > > > arm64_memblock_init() when we initialise zone_dma_bits? > > > > > > > > > > > > I did it this way as I vaguely remembered Rob saying he wanted to > > > > > > centralise > > > > > > all early boot fdt code in one place. But I'll be happy to move it > > > > > > there. > > > > > > > > > > I can see Rob replied and I'm fine if that's his preference. However, > > > > > what I don't particularly like is that in the arm64 code, if > > > > > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched > > > > > by > > > > > the early_init_dt_update_zone_dma_bits(). What if at some point we'll > > > > > get a platform that actually needs 24 here (I truly hope not, but just > > > > > the principle of relying on magic values)? > > > > > > > > > > So rather than guessing, I'd prefer if the arch code can override > > > > > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no > > > > > need to explicitly touch the zone_dma_bits variable. > > > > > > > > Yes, sonds like the way to go. TBH I wasn't happy with that solution > > > > either, > > > > but couldn't think of a nicer alternative. > > > > > > > > Sadly I just realised that the series is incomplete, we have RPi4 users > > > > that > > > > want to boot unsing ACPI, and this series would break things for them. > > > > I'll > > > > have a word with them to see what we can do for their use-case. > > > > > > Is there a way to get some SoC information from ACPI? > > > > > > > This is unfortunate. We used ACPI _DMA methods as they were designed > > to communicate the DMA limit of the XHCI controller to the OS. > > > > It shouldn't be too hard to match the OEM id field in the DSDT, and > > switch to the smaller mask. But it sucks to have to add a quirk like > > that. > > It also requires delaying setting the arm64_dma_phy_limit a bit, but > that doesn't appear to be causing a problem. I've been boot/compiling > with a patch like: > > diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c > index cada0b816c8a..9dfe776c1c75 100644 > --- a/arch/arm64/kernel/acpi.c > +++ b/arch/arm64/kernel/acpi.c > @@ -14,6 +14,7 @@ > > #include > #include > +#include > #include > #include > #include > @@ -168,6 +169,11 @@ static int __init acpi_fadt_sanity_check(void) > ret = -EINVAL; > } > > + if (!strncmp(table->oem_id, "RPIFDN", ACPI_OEM_ID_SIZE) && > + !strncmp(table->oem_table_id, "RPI4", > ACPI_OEM_TABLE_ID_SIZE) && > + table->oem_revision <= 0x200) { > + zone_dma_bits = 30; > + } > out: > /* > * acpi_get_table() creates FADT table mapping that > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index cd5caca8a929..6c8aaf1570ce 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -195,6 +195,7 @@ static void __init zone_sizes_init(unsigned long > min, unsigned long max) > unsigned long max_zone_pfns[MAX
Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711
On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne wrote: > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote: > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig wrote: > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > > > Sadly I just realised that the series is incomplete, we have RPi4 users > > > > that > > > > want to boot unsing ACPI, and this series would break things for them. > > > > I'll > > > > have a word with them to see what we can do for their use-case. > > > > > > Stupid question: why do these users insist on a totally unsuitable > > > interface? And why would we as Linux developers care to support such > > > a aims? > > > > The point is really whether we want to revert changes in Linux that > > made both DT and ACPI boot work without quirks on RPi4. > > Well, and broke a big amount of devices that were otherwise fine. > Yeah that was unfortunate. > > Having to check the RPi4 compatible string or OEM id in core init code is > > awful, regardless of whether you boot via ACPI or via DT. > > > > The problem with this hardware is that it uses a DMA mask which is > > narrower than 32, and the arm64 kernel is simply not set up to deal > > with that at all. On DT, we have DMA ranges properties and the likes > > to describe such limitations, on ACPI we have _DMA methods as well as > > DMA range attributes in the IORT, both of which are now handled > > correctly. So all the information is there, we just have to figure out > > how to consume it early on. > > Is it worth the effort just for a single board? I don't know about ACPI but > parsing dma-ranges that early at boot time is not trivial. My intuition tells > me that it'd be even harder for ACPI, being a more complex data structure. > Yes, it will be harder, especially for the _DMA methods. > > Interestingly, this limitation always existed in the SoC, but it > > wasn't until they started shipping it with more than 1 GB of DRAM that > > it became a problem. This means issues like this could resurface in > > the future with existing SoCs when they get shipped with more memory, > > and so I would prefer fixing this in a generic way. > > Actually what I proposed here is pretty generic. Specially from arm64's > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based > on > whatever it finds in DT. Both those operations are architecture independent. > arm64 arch code doesn't care about the logic involved in ascertaining > zone_dma_bits. I get that the last step isn't generic. But it's all setup so > as > to make it as such whenever it's worth the effort. > The problem is that, while we are providing a full description of the SoC's capabilities, we short circuit this by inserting knowledge into the code (that is shared between all DT architectures) that "brcm,bcm2711" is special, and needs a DMA zone override. I think for ACPI boot, we might be able to work around this by cold plugging the memory above 1 GB, but I have to double check whether it won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that for me here from the top of their head) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 2/3] iommu/tegra-smmu: Rework tegra_smmu_probe_device()
On Thu, Oct 08, 2020 at 02:12:10PM -0700, Nicolin Chen wrote: > On Thu, Oct 08, 2020 at 11:53:43AM +0200, Thierry Reding wrote: > > On Mon, Oct 05, 2020 at 06:05:46PM -0700, Nicolin Chen wrote: > > > On Mon, Oct 05, 2020 at 11:57:54AM +0200, Thierry Reding wrote: > > > > On Fri, Oct 02, 2020 at 11:58:29AM -0700, Nicolin Chen wrote: > > > > > On Fri, Oct 02, 2020 at 06:02:18PM +0300, Dmitry Osipenko wrote: > > > > > > 02.10.2020 09:08, Nicolin Chen пишет: > > > > > > > static int tegra_smmu_of_xlate(struct device *dev, > > > > > > > struct of_phandle_args *args) > > > > > > > { > > > > > > > + struct platform_device *iommu_pdev = > > > > > > > of_find_device_by_node(args->np); > > > > > > > + struct tegra_mc *mc = platform_get_drvdata(iommu_pdev); > > > > > > > u32 id = args->args[0]; > > > > > > > > > > > > > > + of_node_put(args->np); > > > > > > > > > > > > of_find_device_by_node() takes device reference and not the np > > > > > > reference. This is a bug, please remove of_node_put(). > > > > > > > > > > Looks like so. Replacing it with put_device(&iommu_pdev->dev); > > > > > > > > Putting the put_device() here is wrong, though. You need to make sure > > > > you keep a reference to it as long as you keep accessing the data that > > > > is owned by it. > > > > > > I am confused. You said in the other reply (to Dmitry) that we do > > > need to put_device(mc->dev), where mc->dev should be the same as > > > iommu_pdev->dev. But here your comments sounds that we should not > > > put_device at all since ->probe_device/group_device/attach_dev() > > > will use it later. > > > > You need to call put_device() at some point to release the reference > > that you acquired by calling of_find_device_by_node(). If you don't > > release it, you're leaking the reference and the kernel isn't going to > > know when it's safe to delete the device. > > > > So what I'm saying is that we either release it here, which isn't quite > > right because we do reference data relating to the device later on. And > > I see. A small question here by the way: By looking at other IOMMU > drivers that are calling driver_find_device_by_fwnode() function, > I found that most of them put_device right after the function call, > and dev_get_drvdata() after putting the device.. > > Feels like they are doing it wrongly? Well, like I said this is somewhat academic because these are all referencing the IOMMU that by definition still needs to be around when this code is called, and there's locks in place to ensure these don't go away. So it's not like these drivers are doing it wrong, they're just not doing it pedantically right. > > > because it isn't quite right there should be a reason to justify it, > > which is that the SMMU parent device is the same as the MC, so the > > reference count isn't strictly necessary. But that's not quite obvious, > > so highlighting it in a comment makes sense. > > > > The other alternative is to not call put_device() here and keep on to > > the reference as long as you keep using "mc". This might be difficult to > > implement because it may not be obvious where to release it. I think > > this is the better alternative, but if it's too complicated to implement > > it might not be worth it. > > I feel so too. The dev is got at of_xlate() that does not have an > obvious counterpart function. So I'll just remove put_device() and > put a line of comments, as you suggested. I think you misunderstood. Not calling put_device() would be wrong because that leaks a reference to the SMMU that you can't get back. My suggestion was rather to keep put_device() here, but add a comment as to why it's okay to call the put_device() here, even though you keep using its private data later beyond this point, which typically would be wrong to do. Thierry signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711
On Fri, 2020-10-09 at 11:13 +0200, Ard Biesheuvel wrote: > On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne > wrote: > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote: > > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig wrote: > > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > > > > Sadly I just realised that the series is incomplete, we have RPi4 > > > > > users that > > > > > want to boot unsing ACPI, and this series would break things for > > > > > them. I'll > > > > > have a word with them to see what we can do for their use-case. > > > > > > > > Stupid question: why do these users insist on a totally unsuitable > > > > interface? And why would we as Linux developers care to support such > > > > a aims? > > > > > > The point is really whether we want to revert changes in Linux that > > > made both DT and ACPI boot work without quirks on RPi4. > > > > Well, and broke a big amount of devices that were otherwise fine. > > > > Yeah that was unfortunate. > > > > Having to check the RPi4 compatible string or OEM id in core init code is > > > awful, regardless of whether you boot via ACPI or via DT. > > > > > > The problem with this hardware is that it uses a DMA mask which is > > > narrower than 32, and the arm64 kernel is simply not set up to deal > > > with that at all. On DT, we have DMA ranges properties and the likes > > > to describe such limitations, on ACPI we have _DMA methods as well as > > > DMA range attributes in the IORT, both of which are now handled > > > correctly. So all the information is there, we just have to figure out > > > how to consume it early on. > > > > Is it worth the effort just for a single board? I don't know about ACPI but > > parsing dma-ranges that early at boot time is not trivial. My intuition > > tells > > me that it'd be even harder for ACPI, being a more complex data structure. > > > > Yes, it will be harder, especially for the _DMA methods. > > > > Interestingly, this limitation always existed in the SoC, but it > > > wasn't until they started shipping it with more than 1 GB of DRAM that > > > it became a problem. This means issues like this could resurface in > > > the future with existing SoCs when they get shipped with more memory, > > > and so I would prefer fixing this in a generic way. > > > > Actually what I proposed here is pretty generic. Specially from arm64's > > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits > > based on > > whatever it finds in DT. Both those operations are architecture independent. > > arm64 arch code doesn't care about the logic involved in ascertaining > > zone_dma_bits. I get that the last step isn't generic. But it's all setup > > so as > > to make it as such whenever it's worth the effort. > > > > The problem is that, while we are providing a full description of the > SoC's capabilities, we short circuit this by inserting knowledge into > the code (that is shared between all DT architectures) that > "brcm,bcm2711" is special, and needs a DMA zone override. Yes I understand this and I sympathize with it, not the most beautiful thing out there :). But that's only half the issue, as I said, implementing this early at boot time is a tangible amount of work and a burden to maintain just for one board. So this is the compromise we discussed with the DT maintainer (RobH). The series sets things up so as to be able to implement the right thing transparently to arm64's architecture when deemed worth the effort. Ultimately, if you're worried about inserting knowledge into the code, aren't we doing that, in a more extreme way, when imposing an extra unwarranted zone to the whole arm64 ecosystem? Note that I'm more that happy to work on alternative solutions, but let's first settle on what would be acceptable to everyone. > I think for ACPI boot, we might be able to work around this by cold > plugging the memory above 1 GB, but I have to double check whether it > won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that > for me here from the top of their head) Don't know much about what ACPI memory cold plugging involves, but we'll still need a proper ZONE_DMA32 (i.e. spanning the whole 32-bit address space) for RPi4. Regards, Nicolas signature.asc Description: This is a digitally signed message part ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: arm-smmu 5000000.iommu: Cannot accommodate DMA offset for IOMMU page tables
On Thu, 24 Sep 2020 at 15:26, Joerg Roedel wrote: > > On Thu, Sep 24, 2020 at 10:36:47AM +0100, Robin Murphy wrote: > > Yes, the issue was introduced by one of the changes in "dma-mapping: > > introduce DMA range map, supplanting dma_pfn_offset", so it only existed in > > the dma-mapping/for-next branch anyway. FYI, The reported problem still exists > > Okay, alright then. > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: arm-smmu 5000000.iommu: Cannot accommodate DMA offset for IOMMU page tables
On Fri, 9 Oct 2020 at 19:24, Naresh Kamboju wrote: > > > > On Thu, 24 Sep 2020 at 15:26, Joerg Roedel wrote: > > > > On Thu, Sep 24, 2020 at 10:36:47AM +0100, Robin Murphy wrote: > > > Yes, the issue was introduced by one of the changes in "dma-mapping: > > > introduce DMA range map, supplanting dma_pfn_offset", so it only existed > > > in > > > the dma-mapping/for-next branch anyway. > FYI, The reported problem still exists on 5.9.0-rc8-next-20201009. [1.843814] Driver must set ecc.strength when using hardware ECC [1.849847] WARNING: CPU: 4 PID: 1 at drivers/mtd/nand/raw/nand_base.c:5687 nand_scan_with_ids+0x1450/0x1470 [1.859676] Modules linked in: [1.862730] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc8-next-20201009 #1 [1.870125] Hardware name: Freescale Layerscape 2088A RDB Board (DT) [1.876478] pstate: 4005 (nZcv daif -PAN -UAO -TCO BTYPE=--) [1.882483] pc : nand_scan_with_ids+0x1450/0x1470 [1.887183] lr : nand_scan_with_ids+0x1450/0x1470 full test log, https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20201009/testrun/3284876/suite/linux-log-parser/test/check-kernel-warning-92014/log > > > > Okay, alright then. > > - Naresh ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711
On Fri, Oct 09, 2020 at 11:13:59AM +0200, Ard Biesheuvel wrote: > On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne > wrote: > > > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote: > > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig wrote: > > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > > > > Sadly I just realised that the series is incomplete, we have RPi4 > > > > > users that > > > > > want to boot unsing ACPI, and this series would break things for > > > > > them. I'll > > > > > have a word with them to see what we can do for their use-case. > > > > > > > > Stupid question: why do these users insist on a totally unsuitable > > > > interface? And why would we as Linux developers care to support such > > > > a aims? > > > > > > The point is really whether we want to revert changes in Linux that > > > made both DT and ACPI boot work without quirks on RPi4. > > > > Well, and broke a big amount of devices that were otherwise fine. > > > > Yeah that was unfortunate. > > > > Having to check the RPi4 compatible string or OEM id in core init code is > > > awful, regardless of whether you boot via ACPI or via DT. > > > > > > The problem with this hardware is that it uses a DMA mask which is > > > narrower than 32, and the arm64 kernel is simply not set up to deal > > > with that at all. On DT, we have DMA ranges properties and the likes > > > to describe such limitations, on ACPI we have _DMA methods as well as > > > DMA range attributes in the IORT, both of which are now handled > > > correctly. So all the information is there, we just have to figure out > > > how to consume it early on. > > > > Is it worth the effort just for a single board? I don't know about ACPI but > > parsing dma-ranges that early at boot time is not trivial. My intuition > > tells > > me that it'd be even harder for ACPI, being a more complex data structure. > > > > Yes, it will be harder, especially for the _DMA methods. > > > > Interestingly, this limitation always existed in the SoC, but it > > > wasn't until they started shipping it with more than 1 GB of DRAM that > > > it became a problem. This means issues like this could resurface in > > > the future with existing SoCs when they get shipped with more memory, > > > and so I would prefer fixing this in a generic way. > > > > Actually what I proposed here is pretty generic. Specially from arm64's > > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits > > based on > > whatever it finds in DT. Both those operations are architecture independent. > > arm64 arch code doesn't care about the logic involved in ascertaining > > zone_dma_bits. I get that the last step isn't generic. But it's all setup > > so as > > to make it as such whenever it's worth the effort. > > > > The problem is that, while we are providing a full description of the > SoC's capabilities, we short circuit this by inserting knowledge into > the code (that is shared between all DT architectures) that > "brcm,bcm2711" is special, and needs a DMA zone override. > > I think for ACPI boot, we might be able to work around this by cold > plugging the memory above 1 GB, but I have to double check whether it > won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that > for me here from the top of their head) Is this information that we can infer from IORT nodes and make it generic (ie make a DMA limit out of all IORT nodes allowed ranges) ? We can move this check to IORT code and call it from arm64 if it can be made to work. Thanks, Lorenzo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 2/3] iommu/tegra-smmu: Rework tegra_smmu_probe_device()
On Fri, Oct 09, 2020 at 02:25:56PM +0200, Thierry Reding wrote: > On Thu, Oct 08, 2020 at 02:12:10PM -0700, Nicolin Chen wrote: > > On Thu, Oct 08, 2020 at 11:53:43AM +0200, Thierry Reding wrote: > > > On Mon, Oct 05, 2020 at 06:05:46PM -0700, Nicolin Chen wrote: > > > > On Mon, Oct 05, 2020 at 11:57:54AM +0200, Thierry Reding wrote: > > > > > On Fri, Oct 02, 2020 at 11:58:29AM -0700, Nicolin Chen wrote: > > > > > > On Fri, Oct 02, 2020 at 06:02:18PM +0300, Dmitry Osipenko wrote: > > > > > > > 02.10.2020 09:08, Nicolin Chen пишет: > > > > > > > > static int tegra_smmu_of_xlate(struct device *dev, > > > > > > > >struct of_phandle_args *args) > > > > > > > > { > > > > > > > > + struct platform_device *iommu_pdev = > > > > > > > > of_find_device_by_node(args->np); > > > > > > > > + struct tegra_mc *mc = platform_get_drvdata(iommu_pdev); > > > > > > > > u32 id = args->args[0]; > > > > > > > > > > > > > > > > + of_node_put(args->np); > > > > > > > > > > > > > > of_find_device_by_node() takes device reference and not the np > > > > > > > reference. This is a bug, please remove of_node_put(). > > > > > > > > > > > > Looks like so. Replacing it with put_device(&iommu_pdev->dev); > > > > > > > > > > Putting the put_device() here is wrong, though. You need to make sure > > > > > you keep a reference to it as long as you keep accessing the data that > > > > > is owned by it. > > > > > > > > I am confused. You said in the other reply (to Dmitry) that we do > > > > need to put_device(mc->dev), where mc->dev should be the same as > > > > iommu_pdev->dev. But here your comments sounds that we should not > > > > put_device at all since ->probe_device/group_device/attach_dev() > > > > will use it later. > > > > > > You need to call put_device() at some point to release the reference > > > that you acquired by calling of_find_device_by_node(). If you don't > > > release it, you're leaking the reference and the kernel isn't going to > > > know when it's safe to delete the device. > > > > > > So what I'm saying is that we either release it here, which isn't quite > > > right because we do reference data relating to the device later on. And > > > > I see. A small question here by the way: By looking at other IOMMU > > drivers that are calling driver_find_device_by_fwnode() function, > > I found that most of them put_device right after the function call, > > and dev_get_drvdata() after putting the device.. > > > > Feels like they are doing it wrongly? > > Well, like I said this is somewhat academic because these are all > referencing the IOMMU that by definition still needs to be around > when this code is called, and there's locks in place to ensure > these don't go away. So it's not like these drivers are doing it > wrong, they're just not doing it pedantically right. > > > > > > because it isn't quite right there should be a reason to justify it, > > > which is that the SMMU parent device is the same as the MC, so the > > > reference count isn't strictly necessary. But that's not quite obvious, > > > so highlighting it in a comment makes sense. > > > > > > The other alternative is to not call put_device() here and keep on to > > > the reference as long as you keep using "mc". This might be difficult to > > > implement because it may not be obvious where to release it. I think > > > this is the better alternative, but if it's too complicated to implement > > > it might not be worth it. > > > > I feel so too. The dev is got at of_xlate() that does not have an > > obvious counterpart function. So I'll just remove put_device() and > > put a line of comments, as you suggested. > > I think you misunderstood. Not calling put_device() would be wrong > because that leaks a reference to the SMMU that you can't get back. My > suggestion was rather to keep put_device() here, but add a comment as to > why it's okay to call the put_device() here, even though you keep using > its private data later beyond this point, which typically would be wrong > to do. I see. Thanks for clarification! Will send v6 soon. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711
On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi wrote: > > On Fri, Oct 09, 2020 at 11:13:59AM +0200, Ard Biesheuvel wrote: > > On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne > > wrote: > > > > > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote: > > > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig wrote: > > > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne > > > > > wrote: > > > > > > Sadly I just realised that the series is incomplete, we have RPi4 > > > > > > users that > > > > > > want to boot unsing ACPI, and this series would break things for > > > > > > them. I'll > > > > > > have a word with them to see what we can do for their use-case. > > > > > > > > > > Stupid question: why do these users insist on a totally unsuitable > > > > > interface? And why would we as Linux developers care to support such > > > > > a aims? > > > > > > > > The point is really whether we want to revert changes in Linux that > > > > made both DT and ACPI boot work without quirks on RPi4. > > > > > > Well, and broke a big amount of devices that were otherwise fine. > > > > > > > Yeah that was unfortunate. > > > > > > Having to check the RPi4 compatible string or OEM id in core init code > > > > is > > > > awful, regardless of whether you boot via ACPI or via DT. > > > > > > > > The problem with this hardware is that it uses a DMA mask which is > > > > narrower than 32, and the arm64 kernel is simply not set up to deal > > > > with that at all. On DT, we have DMA ranges properties and the likes > > > > to describe such limitations, on ACPI we have _DMA methods as well as > > > > DMA range attributes in the IORT, both of which are now handled > > > > correctly. So all the information is there, we just have to figure out > > > > how to consume it early on. > > > > > > Is it worth the effort just for a single board? I don't know about ACPI > > > but > > > parsing dma-ranges that early at boot time is not trivial. My intuition > > > tells > > > me that it'd be even harder for ACPI, being a more complex data structure. > > > > > > > Yes, it will be harder, especially for the _DMA methods. > > > > > > Interestingly, this limitation always existed in the SoC, but it > > > > wasn't until they started shipping it with more than 1 GB of DRAM that > > > > it became a problem. This means issues like this could resurface in > > > > the future with existing SoCs when they get shipped with more memory, > > > > and so I would prefer fixing this in a generic way. > > > > > > Actually what I proposed here is pretty generic. Specially from arm64's > > > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits > > > based on > > > whatever it finds in DT. Both those operations are architecture > > > independent. > > > arm64 arch code doesn't care about the logic involved in ascertaining > > > zone_dma_bits. I get that the last step isn't generic. But it's all setup > > > so as > > > to make it as such whenever it's worth the effort. > > > > > > > The problem is that, while we are providing a full description of the > > SoC's capabilities, we short circuit this by inserting knowledge into > > the code (that is shared between all DT architectures) that > > "brcm,bcm2711" is special, and needs a DMA zone override. > > > > I think for ACPI boot, we might be able to work around this by cold > > plugging the memory above 1 GB, but I have to double check whether it > > won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that > > for me here from the top of their head) > > Is this information that we can infer from IORT nodes and make it > generic (ie make a DMA limit out of all IORT nodes allowed ranges) ? > Yes, that seems feasible. > We can move this check to IORT code and call it from arm64 if it > can be made to work. > Finding the smallest value in the IORT, and assigning it to zone_dma_bits if it is < 32 should be easy. But as I understand it, having these separate DMA and DMA32 zones is what breaks kdump, no? So how is this going to fix the underlying issue? Nicolas, were there any other reported regressions caused by the introduction of ZONE_DMA? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 3/3] iommu/tegra-smmu: Add PCI support
This patch simply adds support for PCI devices. Reviewed-by: Dmitry Osipenko Tested-by: Dmitry Osipenko Signed-off-by: Nicolin Chen --- Changelog v6->v7 * Renamed goto labels, suggested by Thierry. v5->v6 * Added Dmitry's Reviewed-by and Tested-by. v4->v5 * Added Dmitry's Reviewed-by v3->v4 * Dropped !iommu_present() check * Added CONFIG_PCI check in the exit path v2->v3 * Replaced ternary conditional operator with if-else in .device_group() * Dropped change in tegra_smmu_remove() v1->v2 * Added error-out labels in tegra_smmu_probe() * Dropped pci_request_acs() since IOMMU core would call it. drivers/iommu/tegra-smmu.c | 35 +-- 1 file changed, 25 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index be29f5977145..2941d6459076 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -865,7 +866,11 @@ static struct iommu_group *tegra_smmu_device_group(struct device *dev) group->smmu = smmu; group->soc = soc; - group->group = iommu_group_alloc(); + if (dev_is_pci(dev)) + group->group = pci_device_group(dev); + else + group->group = generic_device_group(dev); + if (IS_ERR(group->group)) { devm_kfree(smmu->dev, group); mutex_unlock(&smmu->lock); @@ -1075,22 +1080,32 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev, iommu_device_set_fwnode(&smmu->iommu, dev->fwnode); err = iommu_device_register(&smmu->iommu); - if (err) { - iommu_device_sysfs_remove(&smmu->iommu); - return ERR_PTR(err); - } + if (err) + goto remove_sysfs; err = bus_set_iommu(&platform_bus_type, &tegra_smmu_ops); - if (err < 0) { - iommu_device_unregister(&smmu->iommu); - iommu_device_sysfs_remove(&smmu->iommu); - return ERR_PTR(err); - } + if (err < 0) + goto unregister; + +#ifdef CONFIG_PCI + err = bus_set_iommu(&pci_bus_type, &tegra_smmu_ops); + if (err < 0) + goto unset_platform_bus; +#endif if (IS_ENABLED(CONFIG_DEBUG_FS)) tegra_smmu_debugfs_init(smmu); return smmu; + +unset_platform_bus: __maybe_unused; + bus_set_iommu(&platform_bus_type, NULL); +unregister: + iommu_device_unregister(&smmu->iommu); +remove_sysfs: + iommu_device_sysfs_remove(&smmu->iommu); + + return ERR_PTR(err); } void tegra_smmu_remove(struct tegra_smmu *smmu) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 1/3] iommu/tegra-smmu: Use fwspec in tegra_smmu_(de)attach_dev
In tegra_smmu_(de)attach_dev() functions, we poll DTB for each client's iommus property to get swgroup ID in order to prepare "as" and enable smmu. Actually tegra_smmu_configure() prepared an fwspec for each client, and added to the fwspec all swgroup IDs of client DT node in DTB. So this patch uses fwspec in tegra_smmu_(de)attach_dev() so as to replace the redundant DT polling code. Reviewed-by: Dmitry Osipenko Tested-by: Dmitry Osipenko Signed-off-by: Nicolin Chen --- Changelog v6->v7: * No change v5->v6: * Added Dmitry's Reviewed-by and Tested-by v4->v5: * Removed "index" and "err" assigning to 0 v3->v4: * Seperated the change, as a cleanup, from the rework patch v1->v3: * N/A drivers/iommu/tegra-smmu.c | 56 -- 1 file changed, 23 insertions(+), 33 deletions(-) diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index 6a3ecc334481..297d49f3f80e 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -484,60 +484,50 @@ static void tegra_smmu_as_unprepare(struct tegra_smmu *smmu, static int tegra_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) { + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct tegra_smmu *smmu = dev_iommu_priv_get(dev); struct tegra_smmu_as *as = to_smmu_as(domain); - struct device_node *np = dev->of_node; - struct of_phandle_args args; - unsigned int index = 0; - int err = 0; - - while (!of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index, - &args)) { - unsigned int swgroup = args.args[0]; - - if (args.np != smmu->dev->of_node) { - of_node_put(args.np); - continue; - } + unsigned int index; + int err; - of_node_put(args.np); + if (!fwspec) + return -ENOENT; + for (index = 0; index < fwspec->num_ids; index++) { err = tegra_smmu_as_prepare(smmu, as); - if (err < 0) - return err; + if (err) + goto disable; - tegra_smmu_enable(smmu, swgroup, as->id); - index++; + tegra_smmu_enable(smmu, fwspec->ids[index], as->id); } if (index == 0) return -ENODEV; return 0; + +disable: + while (index--) { + tegra_smmu_disable(smmu, fwspec->ids[index], as->id); + tegra_smmu_as_unprepare(smmu, as); + } + + return err; } static void tegra_smmu_detach_dev(struct iommu_domain *domain, struct device *dev) { + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct tegra_smmu_as *as = to_smmu_as(domain); - struct device_node *np = dev->of_node; struct tegra_smmu *smmu = as->smmu; - struct of_phandle_args args; - unsigned int index = 0; - - while (!of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index, - &args)) { - unsigned int swgroup = args.args[0]; + unsigned int index; - if (args.np != smmu->dev->of_node) { - of_node_put(args.np); - continue; - } - - of_node_put(args.np); + if (!fwspec) + return; - tegra_smmu_disable(smmu, swgroup, as->id); + for (index = 0; index < fwspec->num_ids; index++) { + tegra_smmu_disable(smmu, fwspec->ids[index], as->id); tegra_smmu_as_unprepare(smmu, as); - index++; } } -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 0/3] iommu/tegra-smmu: Add PCI support
This series is to add PCI support in tegra-smmu driver. Changelog (Detail in each patch) v6->v7 * Added comments for put_device in PATCH-2 * Renamed goto labels in PATCH-3 * Kept Dmitry's Reviewed-by and Tested-by as no function change v5->v6 * Dropped a NULL check, per Dmitry's comments * Added Dmitry's Reviewed-by and Tested-by v4->v5 * PATCH-1 Cleaned two variables inits * PATCH-2 Fixed put() in ->of_xlate() and Updated commit message * PATCH-3 Added Dmitry's Reviewed-by v3->v4 * Dropped helper function, per Thierry's comments * Found another way to get smmu pointer v2->v3 * Replaced with devm_tegra_get_memory_controller * Updated changes by following Dmitry's comments v1->v2 * Added PATCH-1 suggested by Dmitry * Reworked PATCH-2 to unify certain code Nicolin Chen (3): iommu/tegra-smmu: Use fwspec in tegra_smmu_(de)attach_dev iommu/tegra-smmu: Rework tegra_smmu_probe_device() iommu/tegra-smmu: Add PCI support drivers/iommu/tegra-smmu.c | 187 + 1 file changed, 63 insertions(+), 124 deletions(-) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 2/3] iommu/tegra-smmu: Rework tegra_smmu_probe_device()
The bus_set_iommu() in tegra_smmu_probe() enumerates all clients to call in tegra_smmu_probe_device() where each client searches its DT node for smmu pointer and swgroup ID, so as to configure an fwspec. But this requires a valid smmu pointer even before mc and smmu drivers are probed. So in tegra_smmu_probe() we added a line of code to fill mc->smmu, marking "a bit of a hack". This works for most of clients in the DTB, however, doesn't work for a client that doesn't exist in DTB, a PCI device for example. Actually, if we return ERR_PTR(-ENODEV) in ->probe_device() when it's called from bus_set_iommu(), iommu core will let everything carry on. Then when a client gets probed, of_iommu_configure() in iommu core will search DTB for swgroup ID and call ->of_xlate() to prepare an fwspec, similar to tegra_smmu_probe_device() and tegra_smmu_configure(). Then it'll call tegra_smmu_probe_device() again, and this time we shall return smmu->iommu pointer properly. So we can get rid of tegra_smmu_find() and tegra_smmu_configure() along with DT polling code by letting the iommu core handle every thing, except a problem that we search iommus property in DTB not only for swgroup ID but also for mc node to get mc->smmu pointer to call dev_iommu_priv_set() and return the smmu->iommu pointer. So we'll need to find another way to get smmu pointer. Referencing the implementation of sun50i-iommu driver, of_xlate() has client's dev pointer, mc node and swgroup ID. This means that we can call dev_iommu_priv_set() in of_xlate() instead, so we can simply get smmu pointer in ->probe_device(). This patch reworks tegra_smmu_probe_device() by: 1) Removing mc->smmu hack in tegra_smmu_probe() so as to return ERR_PTR(-ENODEV) in tegra_smmu_probe_device() during stage of tegra_smmu_probe/tegra_mc_probe(). 2) Moving dev_iommu_priv_set() to of_xlate() so we can get smmu pointer in tegra_smmu_probe_device() to replace DTB polling. 3) Removing tegra_smmu_configure() accordingly since iommu core takes care of it. This also fixes a problem that previously we could add clients to iommu groups before iommu core initializes its default domain: ubuntu@jetson:~$ dmesg | grep iommu platform 5000.host1x: Adding to iommu group 1 platform 5700.gpu: Adding to iommu group 2 iommu: Default domain type: Translated platform 5420.dc: Adding to iommu group 3 platform 5424.dc: Adding to iommu group 3 platform 5434.vic: Adding to iommu group 4 Though it works fine with IOMMU_DOMAIN_UNMANAGED, but will have warnings if switching to IOMMU_DOMAIN_DMA: iommu: Failed to allocate default IOMMU domain of type 0 for group (null) - Falling back to IOMMU_DOMAIN_DMA iommu: Failed to allocate default IOMMU domain of type 0 for group (null) - Falling back to IOMMU_DOMAIN_DMA Now, bypassing the first probe_device() call from bus_set_iommu() fixes the sequence: ubuntu@jetson:~$ dmesg | grep iommu iommu: Default domain type: Translated tegra-host1x 5000.host1x: Adding to iommu group 0 tegra-dc 5420.dc: Adding to iommu group 1 tegra-dc 5424.dc: Adding to iommu group 1 tegra-vic 5434.vic: Adding to iommu group 2 nouveau 5700.gpu: Adding to iommu group 3 Note that dmesg log above is testing with IOMMU_DOMAIN_UNMANAGED. Reviewed-by: Dmitry Osipenko Tested-by: Dmitry Osipenko Signed-off-by: Nicolin Chen --- Changelog v6->v7 * Added comments for put_device(), suggested by Thierry. v5->v6 * Dropped NULL mc/smmu pointer check, suggested by Dmitry. * Updated commit message with test results with vanilla DTB. * Added Dmitry's Reviewed-by and Tested-by. v4->v5 * Replaced of_node_put() with put_device() in of_xlate(). * Added test result in commit message. v3->v4 * Moved dev_iommu_priv_set() to of_xlate() so we don't need to poll DTB for smmu pointer. * Removed the hack in tegra_smmu_probe() by returning ERR_PTR( -ENODEV) in tegra_smmu_probe_device() to let iommu core call in again. * Removed tegra_smmu_find() and tegra_smmu_configure() as iommu core takes care of fwspec. v2->v3 * Used devm_tegra_get_memory_controller() to get mc pointer * Replaced IS_ERR_OR_NULL with IS_ERR in .probe_device() v1->v2 * Replaced in .probe_device() tegra_smmu_find/tegra_smmu_configure() with tegra_get_memory_controller call. * Dropped the hack in tegra_smmu_probe(). drivers/iommu/tegra-smmu.c | 96 ++ 1 file changed, 15 insertions(+), 81 deletions(-) diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index 297d49f3f80e..be29f5977145 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -797,75 +797,9 @@ static phys_addr_t tegra_smmu_iova_to_phys(struct iommu_domain *domain, return SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova); } -static struct tegra_smmu *tegra_smmu_find(struct device_node *np) -{ - struct platform_device *pdev; -
Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711
On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote: > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi > wrote: > > We can move this check to IORT code and call it from arm64 if it > > can be made to work. > > Finding the smallest value in the IORT, and assigning it to > zone_dma_bits if it is < 32 should be easy. But as I understand it, > having these separate DMA and DMA32 zones is what breaks kdump, no? So > how is this going to fix the underlying issue? If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32 allocations fall back to ZONE_DMA). kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly became 1GB. We could change kdump to allocate ZONE_DMA32 but this one may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel would need to be rebuilt without ZONE_DMA since it won't have any. IIRC (it's been a while since I looked), the kdump allocation couldn't span multiple zones. In a separate thread, we try to fix kdump to use allocations above 4G as a fallback but this only fixes platforms with enough RAM (and maybe it's only those platforms that care about kdump). -- Catalin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 02/24] dt-bindings: memory: mediatek: Convert SMI to DT schema
On Tue, 2020-10-06 at 09:15 +0200, Krzysztof Kozlowski wrote: > On Tue, 6 Oct 2020 at 06:27, Yong Wu wrote: > > > > On Fri, 2020-10-02 at 13:08 +0200, Krzysztof Kozlowski wrote: > > > On Wed, Sep 30, 2020 at 03:06:25PM +0800, Yong Wu wrote: > > > > Convert MediaTek SMI to DT schema. > > > > > > > > Signed-off-by: Yong Wu > > > > --- > > > > .../mediatek,smi-common.txt | 49 - > > > > .../mediatek,smi-common.yaml | 100 ++ > > > > .../memory-controllers/mediatek,smi-larb.txt | 49 - > > > > .../memory-controllers/mediatek,smi-larb.yaml | 91 > > > > 4 files changed, 191 insertions(+), 98 deletions(-) > > > > delete mode 100644 > > > > Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt > > > > create mode 100644 > > > > Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml > > > > delete mode 100644 > > > > Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.txt > > > > create mode 100644 > > > > Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml > > ... > > > > +properties: > > > > + compatible: > > > > +oneOf: > > > > + - enum: > > > > + - mediatek,mt2701-smi-common > > > > + - mediatek,mt2712-smi-common > > > > + - mediatek,mt6779-smi-common > > > > + - mediatek,mt8173-smi-common > > > > + - mediatek,mt8183-smi-common > > > > + > > > > + - description: for mt7623 > > > > +items: > > > > + - const: mediatek,mt7623-smi-common > > > > + - const: mediatek,mt2701-smi-common > > > > + > > > > + reg: > > > > +maxItems: 1 > > > > + > > > > + clocks: > > > > +description: | > > > > + apb and smi are mandatory. the async is only for generation 1 > > > > smi HW. > > > > + gals(global async local sync) also is optional, here is the list > > > > which > > > > + require gals: mt6779 and mt8183. > > > > +minItems: 2 > > > > +maxItems: 4 > > > > +items: > > > > + - description: apb is Advanced Peripheral Bus clock, It's the > > > > clock for > > > > + setting the register. > > > > + - description: smi is the clock for transfer data and command. > > > > + - description: async is asynchronous clock, it help transform > > > > the smi clock > > > > + into the emi clock domain. > > > > + - description: gals0 is the path0 clock of gals. > > > > + - description: gals1 is the path1 clock of gals. > > > > + > > > > + clock-names: > > > > +oneOf: > > > > + - items: > > > > + - const: apb > > > > + - const: smi > > > > + - items: > > > > + - const: apb > > > > + - const: smi > > > > + - const: async > > > > + - items: > > > > + - const: apb > > > > + - const: smi > > > > + - const: gals0 > > > > + - const: gals1 > > > > > > Similarly to my comment to other properties, this requirement per > > > compatible should be part of the schema within 'if-then'. > > > > I'm not so familiar with this format. Do this has "if-then-'else > > if'-then-else"? > > These are mutually exclusive conditions, so you can skip else: > - if-then > - if-then > - if-then > It will be more readable then stacking 'if' under 'else' Thanks. I will use something like this: anyOf: - if: #gen1 hw then: use apb/smi/async clocks - if: #gen2 hw that has gals. then: use apb/smi/gals0/gals1 clocks else: # gen2 hw that doesn't have gals. use apb/smi clocks. > > > > > I tried below instead of the clocks segment above: > > > > === > > if: > > properties: > > compatible: > > Missing contains. Just take an example from some existing schema. Like the example you gave below (Documentation/devicetree/bindings/clock/idt,versaclock5.yaml), It also doesn't have "contains" in "if". I guess it is unnecessary if there is only one compatible string. it may be necessary when it has backward compatible string. > > > enum: > > - mediatek,mt6779-smi-common > > - mediatek,mt8183-smi-common > > > > then: > > properties: > > clock: > > items: > > - description: apb is the clock for setting the register.. > > - description: smi is the clock for transfer data and command. > > - description: gals0 is the path0 clock of gals(global async > > local sync). > > - description: gals1 is the path1 clock of gals. > > clock-names: > > items: > > - const: apb > > - const: smi > > - const: gals0 > > - const: gals1 > > else: > > if: > > properties: > > compatible: > > contains: > > enum: > > - mediatek,mt2701-smi-common > > > > then: > > properties: > > clocks: > > items: > > - desc