Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Christoph Hellwig
On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> Sadly I just realised that the series is incomplete, we have RPi4 users that
> want to boot unsing ACPI, and this series would break things for them. I'll
> have a word with them to see what we can do for their use-case.

Stupid question:  why do these users insist on a totally unsuitable
interface?  And why would we as Linux developers care to support such
a aims?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Ard Biesheuvel
On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig  wrote:
>
> On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > want to boot unsing ACPI, and this series would break things for them. I'll
> > have a word with them to see what we can do for their use-case.
>
> Stupid question:  why do these users insist on a totally unsuitable
> interface?  And why would we as Linux developers care to support such
> a aims?
>

The point is really whether we want to revert changes in Linux that
made both DT and ACPI boot work without quirks on RPi4. Having to
check the RPi4 compatible string or OEM id in core init code is awful,
regardless of whether you boot via ACPI or via DT.

The problem with this hardware is that it uses a DMA mask which is
narrower than 32, and the arm64 kernel is simply not set up to deal
with that at all. On DT, we have DMA ranges properties and the likes
to describe such limitations, on ACPI we have _DMA methods as well as
DMA range attributes in the IORT, both of which are now handled
correctly. So all the information is there, we just have to figure out
how to consume it early on.

Interestingly, this limitation always existed in the SoC, but it
wasn't until they started shipping it with more than 1 GB of DRAM that
it became a problem. This means issues like this could resurface in
the future with existing SoCs when they get shipped with more memory,
and so I would prefer fixing this in a generic way.

Also, I assume papering over the issue like this does not fix the
kdump issue fundamentally, it just works around it, and so we might
run into this again in the future.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-09 Thread David Woodhouse
On Thu, 2020-10-08 at 14:40 +0200, Thomas Gleixner wrote:
> Subject: x86/iommu: Make interrupt remapping more robust
> From: Thomas Gleixner 
> Date: Thu, 08 Oct 2020 14:09:44 +0200
> 
> Needs to be split into pieces and cover PCI proper. Right now PCI gets a
> NULL pointer assigned which makes it explode at the wrong place
> later. Also hyperv iommu wants some love.
> 
> NOT-Signed-off-by: Thomas Gleixner 
> ---
>  arch/x86/kernel/apic/io_apic.c  |4 +++-
>  arch/x86/kernel/apic/msi.c  |   24 ++--
>  drivers/iommu/amd/iommu.c   |6 +++---
>  drivers/iommu/intel/irq_remapping.c |4 ++--
>  4 files changed, 22 insertions(+), 16 deletions(-)
> 
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2300,7 +2300,9 @@ static int mp_irqdomain_create(int ioapi
> info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
> info.devid = mpc_ioapic_id(ioapic);
> parent = irq_remapping_get_irq_domain(&info);
> -   if (!parent)
> +   if (IS_ERR(parent))
> +   return PTR_ERR(parent);
> +   else if (!parent)
> parent = x86_vector_domain;
> else
> name = "IO-APIC-IR";
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -415,9 +415,9 @@ static struct msi_domain_info hpet_msi_d
>  struct irq_domain *hpet_create_irq_domain(int hpet_id)
>  {
> struct msi_domain_info *domain_info;
> +   struct fwnode_handle *fn = NULL;
> struct irq_domain *parent, *d;
> struct irq_alloc_info info;
> -   struct fwnode_handle *fn;
>  
> if (x86_vector_domain == NULL)
> return NULL;
> @@ -432,25 +432,29 @@ struct irq_domain *hpet_create_irq_domai
> init_irq_alloc_info(&info, NULL);
> info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
> info.devid = hpet_id;
> +
> parent = irq_remapping_get_irq_domain(&info);
> -   if (parent == NULL)
> +   if (IS_ERR(parent))
> +   goto fail;
> +   else if (!parent)
> parent = x86_vector_domain;
> else
> hpet_msi_controller.name = "IR-HPET-MSI";

Hrm... this is the '-ENODEV changes' that you wanted me to pick up,
right?

I confess I don't like it very much.

I'd much prefer to be able to use a generic foo_get_parent_irq_domain()
function which just returns an answer or an error.

Definitely none of this "if it returns NULL, caller fills it in for
themselves with some magic global because we're special".

And I don't much like abusing the irq_alloc_info for this either. I
didn't like the irq_alloc_info very much in its *original* use cases,
for actually allocating IRQs. Abusing it for this is worse.

I'm more than happy to start digging into this, but once I do I fear
I'm not going to stop until HPET and IOAPIC don't know *anything* about
whether they're behind IR or not.

And yes, I know that IOAPIC has a different EOI method for IR but
AFAICT that's *already* hosed for Hyper-V because it doesn't *really*
do remapping and so the RTEs *can* change there to move interrupts
around. There's more differences between ioapic_ack_level() and
ioapic_ir_ack_level() than the erratum workaround and whether we use
data->entry.vector... aren't there?

For the specific case you were targeting with this patch, you already
successfully persuaded me it should never happen, and there's a WARN_ON
which would trigger if it ever does (with too many CPUs in the system).

Let's come back and tackle this can of worms in a separate cleanup
series.


smime.p7s
Description: S/MIME cryptographic signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Nicolas Saenz Julienne
On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig  wrote:
> > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > Sadly I just realised that the series is incomplete, we have RPi4 users 
> > > that
> > > want to boot unsing ACPI, and this series would break things for them. 
> > > I'll
> > > have a word with them to see what we can do for their use-case.
> > 
> > Stupid question:  why do these users insist on a totally unsuitable
> > interface? And why would we as Linux developers care to support such
> > a aims?
>
> The point is really whether we want to revert changes in Linux that
> made both DT and ACPI boot work without quirks on RPi4.

Well, and broke a big amount of devices that were otherwise fine.

> Having to check the RPi4 compatible string or OEM id in core init code is
> awful, regardless of whether you boot via ACPI or via DT.
>
> The problem with this hardware is that it uses a DMA mask which is
> narrower than 32, and the arm64 kernel is simply not set up to deal
> with that at all. On DT, we have DMA ranges properties and the likes
> to describe such limitations, on ACPI we have _DMA methods as well as
> DMA range attributes in the IORT, both of which are now handled
> correctly. So all the information is there, we just have to figure out
> how to consume it early on.

Is it worth the effort just for a single board? I don't know about ACPI but
parsing dma-ranges that early at boot time is not trivial. My intuition tells
me that it'd be even harder for ACPI, being a more complex data structure.

> Interestingly, this limitation always existed in the SoC, but it
> wasn't until they started shipping it with more than 1 GB of DRAM that
> it became a problem. This means issues like this could resurface in
> the future with existing SoCs when they get shipped with more memory,
> and so I would prefer fixing this in a generic way.

Actually what I proposed here is pretty generic. Specially from arm64's
perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
whatever it finds in DT. Both those operations are architecture independent.
arm64 arch code doesn't care about the logic involved in ascertaining
zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
to make it as such whenever it's worth the effort.

> Also, I assume papering over the issue like this does not fix the
> kdump issue fundamentally, it just works around it, and so we might
> run into this again in the future.

Any ideas? The way I understand it the kdump issue is just a shortcoming of
the memory zones design.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Nicolas Saenz Julienne
Hi Jeremy.

On Thu, 2020-10-08 at 22:59 -0500, Jeremy Linton wrote:
> On 10/8/20 2:43 PM, Ard Biesheuvel wrote:
> > (+ Lorenzo)
> > 
> > On Thu, 8 Oct 2020 at 12:14, Catalin Marinas  
> > wrote:
> > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> > > > > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne 
> > > > > wrote:
> > > > > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > > > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz 
> > > > > > > > Julienne wrote:
> > > > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > > > > > --- a/drivers/of/fdt.c
> > > > > > > > > +++ b/drivers/of/fdt.c
> > > > > > > > > @@ -25,6 +25,7 @@
> > > > > > > > >   #include 
> > > > > > > > >   #include 
> > > > > > > > >   #include 
> > > > > > > > > +#include   /* for zone_dma_bits */
> > > > > > > > > 
> > > > > > > > >   #include   /* for COMMAND_LINE_SIZE */
> > > > > > > > >   #include 
> > > > > > > > > @@ -1198,6 +1199,14 @@ void __init 
> > > > > > > > > early_init_dt_scan_nodes(void)
> > > > > > > > >  of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > > > > > >   }
> > > > > > > > > 
> > > > > > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > > > > > +{
> > > > > > > > > +   unsigned long dt_root = of_get_flat_dt_root();
> > > > > > > > > +
> > > > > > > > > +   if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > > > > > +   zone_dma_bits = 30;
> > > > > > > > > +}
> > > > > > > > 
> > > > > > > > I think we could keep this entirely in the arm64 
> > > > > > > > setup_machine_fdt() and
> > > > > > > > not pollute the core code with RPi4-specific code.
> > > > > > > 
> > > > > > > Actually, even better, could we not move the check to
> > > > > > > arm64_memblock_init() when we initialise zone_dma_bits?
> > > > > > 
> > > > > > I did it this way as I vaguely remembered Rob saying he wanted to 
> > > > > > centralise
> > > > > > all early boot fdt code in one place. But I'll be happy to move it 
> > > > > > there.
> > > > > 
> > > > > I can see Rob replied and I'm fine if that's his preference. However,
> > > > > what I don't particularly like is that in the arm64 code, if
> > > > > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched 
> > > > > by
> > > > > the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> > > > > get a platform that actually needs 24 here (I truly hope not, but just
> > > > > the principle of relying on magic values)?
> > > > > 
> > > > > So rather than guessing, I'd prefer if the arch code can override
> > > > > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> > > > > need to explicitly touch the zone_dma_bits variable.
> > > > 
> > > > Yes, sonds like the way to go. TBH I wasn't happy with that solution 
> > > > either,
> > > > but couldn't think of a nicer alternative.
> > > > 
> > > > Sadly I just realised that the series is incomplete, we have RPi4 users 
> > > > that
> > > > want to boot unsing ACPI, and this series would break things for them. 
> > > > I'll
> > > > have a word with them to see what we can do for their use-case.
> > > 
> > > Is there a way to get some SoC information from ACPI?
> > > 
> > 
> > This is unfortunate. We used ACPI _DMA methods as they were designed
> > to communicate the DMA limit of the XHCI controller to the OS.
> > 
> > It shouldn't be too hard to match the OEM id field in the DSDT, and
> > switch to the smaller mask. But it sucks to have to add a quirk like
> > that.
> > It also requires delaying setting the arm64_dma_phy_limit a bit, but 
> that doesn't appear to be causing a problem. I've been boot/compiling 
> with a patch like:
> 
> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
> index cada0b816c8a..9dfe776c1c75 100644
> --- a/arch/arm64/kernel/acpi.c
> +++ b/arch/arm64/kernel/acpi.c
> @@ -14,6 +14,7 @@
> 
>   #include 
>   #include 
> +#include 
>   #include 
>   #include 
>   #include 
> @@ -168,6 +169,11 @@ static int __init acpi_fadt_sanity_check(void)
>  ret = -EINVAL;
>  }
> 
> +   if (!strncmp(table->oem_id, "RPIFDN", ACPI_OEM_ID_SIZE) &&
> +   !strncmp(table->oem_table_id,  "RPI4", 
> ACPI_OEM_TABLE_ID_SIZE) &&
> +   table->oem_revision <= 0x200)  {
> +   zone_dma_bits = 30;
> +   }
>   out:
>  /*
>   * acpi_get_table() creates FADT table mapping that
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index cd5caca8a929..6c8aaf1570ce 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -195,6 +195,7 @@ static void __init zone_sizes_init(unsigned long 
> min, unsigned long max)
>  unsigned long max_zone_pfns[MAX

Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Ard Biesheuvel
On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
 wrote:
>
> On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig  wrote:
> > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > Sadly I just realised that the series is incomplete, we have RPi4 users 
> > > > that
> > > > want to boot unsing ACPI, and this series would break things for them. 
> > > > I'll
> > > > have a word with them to see what we can do for their use-case.
> > >
> > > Stupid question:  why do these users insist on a totally unsuitable
> > > interface? And why would we as Linux developers care to support such
> > > a aims?
> >
> > The point is really whether we want to revert changes in Linux that
> > made both DT and ACPI boot work without quirks on RPi4.
>
> Well, and broke a big amount of devices that were otherwise fine.
>

Yeah that was unfortunate.

> > Having to check the RPi4 compatible string or OEM id in core init code is
> > awful, regardless of whether you boot via ACPI or via DT.
> >
> > The problem with this hardware is that it uses a DMA mask which is
> > narrower than 32, and the arm64 kernel is simply not set up to deal
> > with that at all. On DT, we have DMA ranges properties and the likes
> > to describe such limitations, on ACPI we have _DMA methods as well as
> > DMA range attributes in the IORT, both of which are now handled
> > correctly. So all the information is there, we just have to figure out
> > how to consume it early on.
>
> Is it worth the effort just for a single board? I don't know about ACPI but
> parsing dma-ranges that early at boot time is not trivial. My intuition tells
> me that it'd be even harder for ACPI, being a more complex data structure.
>

Yes, it will be harder, especially for the _DMA methods.

> > Interestingly, this limitation always existed in the SoC, but it
> > wasn't until they started shipping it with more than 1 GB of DRAM that
> > it became a problem. This means issues like this could resurface in
> > the future with existing SoCs when they get shipped with more memory,
> > and so I would prefer fixing this in a generic way.
>
> Actually what I proposed here is pretty generic. Specially from arm64's
> perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based 
> on
> whatever it finds in DT. Both those operations are architecture independent.
> arm64 arch code doesn't care about the logic involved in ascertaining
> zone_dma_bits. I get that the last step isn't generic. But it's all setup so 
> as
> to make it as such whenever it's worth the effort.
>

The problem is that, while we are providing a full description of the
SoC's capabilities, we short circuit this by inserting knowledge into
the code (that is shared between all DT architectures) that
"brcm,bcm2711" is special, and needs a DMA zone override.

I think for ACPI boot, we might be able to work around this by cold
plugging the memory above 1 GB, but I have to double check whether it
won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
for me here from the top of their head)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/3] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2020-10-09 Thread Thierry Reding
On Thu, Oct 08, 2020 at 02:12:10PM -0700, Nicolin Chen wrote:
> On Thu, Oct 08, 2020 at 11:53:43AM +0200, Thierry Reding wrote:
> > On Mon, Oct 05, 2020 at 06:05:46PM -0700, Nicolin Chen wrote:
> > > On Mon, Oct 05, 2020 at 11:57:54AM +0200, Thierry Reding wrote:
> > > > On Fri, Oct 02, 2020 at 11:58:29AM -0700, Nicolin Chen wrote:
> > > > > On Fri, Oct 02, 2020 at 06:02:18PM +0300, Dmitry Osipenko wrote:
> > > > > > 02.10.2020 09:08, Nicolin Chen пишет:
> > > > > > >  static int tegra_smmu_of_xlate(struct device *dev,
> > > > > > >  struct of_phandle_args *args)
> > > > > > >  {
> > > > > > > + struct platform_device *iommu_pdev = 
> > > > > > > of_find_device_by_node(args->np);
> > > > > > > + struct tegra_mc *mc = platform_get_drvdata(iommu_pdev);
> > > > > > >   u32 id = args->args[0];
> > > > > > >  
> > > > > > > + of_node_put(args->np);
> > > > > > 
> > > > > > of_find_device_by_node() takes device reference and not the np
> > > > > > reference. This is a bug, please remove of_node_put().
> > > > > 
> > > > > Looks like so. Replacing it with put_device(&iommu_pdev->dev);
> > > > 
> > > > Putting the put_device() here is wrong, though. You need to make sure
> > > > you keep a reference to it as long as you keep accessing the data that
> > > > is owned by it.
> > > 
> > > I am confused. You said in the other reply (to Dmitry) that we do
> > > need to put_device(mc->dev), where mc->dev should be the same as
> > > iommu_pdev->dev. But here your comments sounds that we should not
> > > put_device at all since ->probe_device/group_device/attach_dev()
> > > will use it later.
> > 
> > You need to call put_device() at some point to release the reference
> > that you acquired by calling of_find_device_by_node(). If you don't
> > release it, you're leaking the reference and the kernel isn't going to
> > know when it's safe to delete the device.
> > 
> > So what I'm saying is that we either release it here, which isn't quite
> > right because we do reference data relating to the device later on. And
> 
> I see. A small question here by the way: By looking at other IOMMU
> drivers that are calling driver_find_device_by_fwnode() function,
> I found that most of them put_device right after the function call,
> and dev_get_drvdata() after putting the device..
> 
> Feels like they are doing it wrongly?

Well, like I said this is somewhat academic because these are all
referencing the IOMMU that by definition still needs to be around
when this code is called, and there's locks in place to ensure
these don't go away. So it's not like these drivers are doing it
wrong, they're just not doing it pedantically right.

> 
> > because it isn't quite right there should be a reason to justify it,
> > which is that the SMMU parent device is the same as the MC, so the
> > reference count isn't strictly necessary. But that's not quite obvious,
> > so highlighting it in a comment makes sense.
> > 
> > The other alternative is to not call put_device() here and keep on to
> > the reference as long as you keep using "mc". This might be difficult to
> > implement because it may not be obvious where to release it. I think
> > this is the better alternative, but if it's too complicated to implement
> > it might not be worth it.
> 
> I feel so too. The dev is got at of_xlate() that does not have an
> obvious counterpart function. So I'll just remove put_device() and
> put a line of comments, as you suggested.

I think you misunderstood. Not calling put_device() would be wrong
because that leaks a reference to the SMMU that you can't get back. My
suggestion was rather to keep put_device() here, but add a comment as to
why it's okay to call the put_device() here, even though you keep using
its private data later beyond this point, which typically would be wrong
to do.

Thierry


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Nicolas Saenz Julienne
On Fri, 2020-10-09 at 11:13 +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
>  wrote:
> > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig  wrote:
> > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > > Sadly I just realised that the series is incomplete, we have RPi4 
> > > > > users that
> > > > > want to boot unsing ACPI, and this series would break things for 
> > > > > them. I'll
> > > > > have a word with them to see what we can do for their use-case.
> > > > 
> > > > Stupid question:  why do these users insist on a totally unsuitable
> > > > interface? And why would we as Linux developers care to support such
> > > > a aims?
> > > 
> > > The point is really whether we want to revert changes in Linux that
> > > made both DT and ACPI boot work without quirks on RPi4.
> > 
> > Well, and broke a big amount of devices that were otherwise fine.
> > 
> 
> Yeah that was unfortunate.
> 
> > > Having to check the RPi4 compatible string or OEM id in core init code is
> > > awful, regardless of whether you boot via ACPI or via DT.
> > > 
> > > The problem with this hardware is that it uses a DMA mask which is
> > > narrower than 32, and the arm64 kernel is simply not set up to deal
> > > with that at all. On DT, we have DMA ranges properties and the likes
> > > to describe such limitations, on ACPI we have _DMA methods as well as
> > > DMA range attributes in the IORT, both of which are now handled
> > > correctly. So all the information is there, we just have to figure out
> > > how to consume it early on.
> > 
> > Is it worth the effort just for a single board? I don't know about ACPI but
> > parsing dma-ranges that early at boot time is not trivial. My intuition 
> > tells
> > me that it'd be even harder for ACPI, being a more complex data structure.
> > 
> 
> Yes, it will be harder, especially for the _DMA methods.
> 
> > > Interestingly, this limitation always existed in the SoC, but it
> > > wasn't until they started shipping it with more than 1 GB of DRAM that
> > > it became a problem. This means issues like this could resurface in
> > > the future with existing SoCs when they get shipped with more memory,
> > > and so I would prefer fixing this in a generic way.
> > 
> > Actually what I proposed here is pretty generic. Specially from arm64's
> > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits 
> > based on
> > whatever it finds in DT. Both those operations are architecture independent.
> > arm64 arch code doesn't care about the logic involved in ascertaining
> > zone_dma_bits. I get that the last step isn't generic. But it's all setup 
> > so as
> > to make it as such whenever it's worth the effort.
> > 
> 
> The problem is that, while we are providing a full description of the
> SoC's capabilities, we short circuit this by inserting knowledge into
> the code (that is shared between all DT architectures) that
> "brcm,bcm2711" is special, and needs a DMA zone override.

Yes I understand this and I sympathize with it, not the most beautiful thing
out there :). But that's only half the issue, as I said, implementing this
early at boot time is a tangible amount of work and a burden to maintain just
for one board. So this is the compromise we discussed with the DT maintainer
(RobH). The series sets things up so as to be able to implement the right
thing transparently to arm64's architecture when deemed worth the effort.

Ultimately, if you're worried about inserting knowledge into the code, aren't
we doing that, in a more extreme way, when imposing an extra unwarranted zone
to the whole arm64 ecosystem?

Note that I'm more that happy to work on alternative solutions, but let's first
settle on what would be acceptable to everyone.

> I think for ACPI boot, we might be able to work around this by cold
> plugging the memory above 1 GB, but I have to double check whether it
> won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
> for me here from the top of their head)

Don't know much about what ACPI memory cold plugging involves, but we'll still
need a proper ZONE_DMA32 (i.e. spanning the whole 32-bit address space) for
RPi4.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: arm-smmu 5000000.iommu: Cannot accommodate DMA offset for IOMMU page tables

2020-10-09 Thread Naresh Kamboju
On Thu, 24 Sep 2020 at 15:26, Joerg Roedel  wrote:
>
> On Thu, Sep 24, 2020 at 10:36:47AM +0100, Robin Murphy wrote:
> > Yes, the issue was introduced by one of the changes in "dma-mapping:
> > introduce DMA range map, supplanting dma_pfn_offset", so it only existed in
> > the dma-mapping/for-next branch anyway.

FYI,
The reported problem still exists
>
> Okay, alright then.
>
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: arm-smmu 5000000.iommu: Cannot accommodate DMA offset for IOMMU page tables

2020-10-09 Thread Naresh Kamboju
On Fri, 9 Oct 2020 at 19:24, Naresh Kamboju  wrote:
>
>
>
> On Thu, 24 Sep 2020 at 15:26, Joerg Roedel  wrote:
> >
> > On Thu, Sep 24, 2020 at 10:36:47AM +0100, Robin Murphy wrote:
> > > Yes, the issue was introduced by one of the changes in "dma-mapping:
> > > introduce DMA range map, supplanting dma_pfn_offset", so it only existed 
> > > in
> > > the dma-mapping/for-next branch anyway.
>

FYI,
The reported problem still exists on 5.9.0-rc8-next-20201009.

[1.843814] Driver must set ecc.strength when using hardware ECC
[1.849847] WARNING: CPU: 4 PID: 1 at
drivers/mtd/nand/raw/nand_base.c:5687 nand_scan_with_ids+0x1450/0x1470
[1.859676] Modules linked in:
[1.862730] CPU: 4 PID: 1 Comm: swapper/0 Not tainted
5.9.0-rc8-next-20201009 #1
[1.870125] Hardware name: Freescale Layerscape 2088A RDB Board (DT)
[1.876478] pstate: 4005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[1.882483] pc : nand_scan_with_ids+0x1450/0x1470
[1.887183] lr : nand_scan_with_ids+0x1450/0x1470

full test log,
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20201009/testrun/3284876/suite/linux-log-parser/test/check-kernel-warning-92014/log

> >
> > Okay, alright then.
> >

- Naresh
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Lorenzo Pieralisi
On Fri, Oct 09, 2020 at 11:13:59AM +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
>  wrote:
> >
> > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig  wrote:
> > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > > Sadly I just realised that the series is incomplete, we have RPi4 
> > > > > users that
> > > > > want to boot unsing ACPI, and this series would break things for 
> > > > > them. I'll
> > > > > have a word with them to see what we can do for their use-case.
> > > >
> > > > Stupid question:  why do these users insist on a totally unsuitable
> > > > interface? And why would we as Linux developers care to support such
> > > > a aims?
> > >
> > > The point is really whether we want to revert changes in Linux that
> > > made both DT and ACPI boot work without quirks on RPi4.
> >
> > Well, and broke a big amount of devices that were otherwise fine.
> >
> 
> Yeah that was unfortunate.
> 
> > > Having to check the RPi4 compatible string or OEM id in core init code is
> > > awful, regardless of whether you boot via ACPI or via DT.
> > >
> > > The problem with this hardware is that it uses a DMA mask which is
> > > narrower than 32, and the arm64 kernel is simply not set up to deal
> > > with that at all. On DT, we have DMA ranges properties and the likes
> > > to describe such limitations, on ACPI we have _DMA methods as well as
> > > DMA range attributes in the IORT, both of which are now handled
> > > correctly. So all the information is there, we just have to figure out
> > > how to consume it early on.
> >
> > Is it worth the effort just for a single board? I don't know about ACPI but
> > parsing dma-ranges that early at boot time is not trivial. My intuition 
> > tells
> > me that it'd be even harder for ACPI, being a more complex data structure.
> >
> 
> Yes, it will be harder, especially for the _DMA methods.
> 
> > > Interestingly, this limitation always existed in the SoC, but it
> > > wasn't until they started shipping it with more than 1 GB of DRAM that
> > > it became a problem. This means issues like this could resurface in
> > > the future with existing SoCs when they get shipped with more memory,
> > > and so I would prefer fixing this in a generic way.
> >
> > Actually what I proposed here is pretty generic. Specially from arm64's
> > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits 
> > based on
> > whatever it finds in DT. Both those operations are architecture independent.
> > arm64 arch code doesn't care about the logic involved in ascertaining
> > zone_dma_bits. I get that the last step isn't generic. But it's all setup 
> > so as
> > to make it as such whenever it's worth the effort.
> >
> 
> The problem is that, while we are providing a full description of the
> SoC's capabilities, we short circuit this by inserting knowledge into
> the code (that is shared between all DT architectures) that
> "brcm,bcm2711" is special, and needs a DMA zone override.
> 
> I think for ACPI boot, we might be able to work around this by cold
> plugging the memory above 1 GB, but I have to double check whether it
> won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
> for me here from the top of their head)

Is this information that we can infer from IORT nodes and make it
generic (ie make a DMA limit out of all IORT nodes allowed ranges) ?

We can move this check to IORT code and call it from arm64 if it
can be made to work.

Thanks,
Lorenzo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/3] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2020-10-09 Thread Nicolin Chen
On Fri, Oct 09, 2020 at 02:25:56PM +0200, Thierry Reding wrote:
> On Thu, Oct 08, 2020 at 02:12:10PM -0700, Nicolin Chen wrote:
> > On Thu, Oct 08, 2020 at 11:53:43AM +0200, Thierry Reding wrote:
> > > On Mon, Oct 05, 2020 at 06:05:46PM -0700, Nicolin Chen wrote:
> > > > On Mon, Oct 05, 2020 at 11:57:54AM +0200, Thierry Reding wrote:
> > > > > On Fri, Oct 02, 2020 at 11:58:29AM -0700, Nicolin Chen wrote:
> > > > > > On Fri, Oct 02, 2020 at 06:02:18PM +0300, Dmitry Osipenko wrote:
> > > > > > > 02.10.2020 09:08, Nicolin Chen пишет:
> > > > > > > >  static int tegra_smmu_of_xlate(struct device *dev,
> > > > > > > >struct of_phandle_args *args)
> > > > > > > >  {
> > > > > > > > +   struct platform_device *iommu_pdev = 
> > > > > > > > of_find_device_by_node(args->np);
> > > > > > > > +   struct tegra_mc *mc = platform_get_drvdata(iommu_pdev);
> > > > > > > > u32 id = args->args[0];
> > > > > > > >  
> > > > > > > > +   of_node_put(args->np);
> > > > > > > 
> > > > > > > of_find_device_by_node() takes device reference and not the np
> > > > > > > reference. This is a bug, please remove of_node_put().
> > > > > > 
> > > > > > Looks like so. Replacing it with put_device(&iommu_pdev->dev);
> > > > > 
> > > > > Putting the put_device() here is wrong, though. You need to make sure
> > > > > you keep a reference to it as long as you keep accessing the data that
> > > > > is owned by it.
> > > > 
> > > > I am confused. You said in the other reply (to Dmitry) that we do
> > > > need to put_device(mc->dev), where mc->dev should be the same as
> > > > iommu_pdev->dev. But here your comments sounds that we should not
> > > > put_device at all since ->probe_device/group_device/attach_dev()
> > > > will use it later.
> > > 
> > > You need to call put_device() at some point to release the reference
> > > that you acquired by calling of_find_device_by_node(). If you don't
> > > release it, you're leaking the reference and the kernel isn't going to
> > > know when it's safe to delete the device.
> > > 
> > > So what I'm saying is that we either release it here, which isn't quite
> > > right because we do reference data relating to the device later on. And
> > 
> > I see. A small question here by the way: By looking at other IOMMU
> > drivers that are calling driver_find_device_by_fwnode() function,
> > I found that most of them put_device right after the function call,
> > and dev_get_drvdata() after putting the device..
> > 
> > Feels like they are doing it wrongly?
> 
> Well, like I said this is somewhat academic because these are all
> referencing the IOMMU that by definition still needs to be around
> when this code is called, and there's locks in place to ensure
> these don't go away. So it's not like these drivers are doing it
> wrong, they're just not doing it pedantically right.
> 
> > 
> > > because it isn't quite right there should be a reason to justify it,
> > > which is that the SMMU parent device is the same as the MC, so the
> > > reference count isn't strictly necessary. But that's not quite obvious,
> > > so highlighting it in a comment makes sense.
> > > 
> > > The other alternative is to not call put_device() here and keep on to
> > > the reference as long as you keep using "mc". This might be difficult to
> > > implement because it may not be obvious where to release it. I think
> > > this is the better alternative, but if it's too complicated to implement
> > > it might not be worth it.
> > 
> > I feel so too. The dev is got at of_xlate() that does not have an
> > obvious counterpart function. So I'll just remove put_device() and
> > put a line of comments, as you suggested.
> 
> I think you misunderstood. Not calling put_device() would be wrong
> because that leaks a reference to the SMMU that you can't get back. My
> suggestion was rather to keep put_device() here, but add a comment as to
> why it's okay to call the put_device() here, even though you keep using
> its private data later beyond this point, which typically would be wrong
> to do.

I see. Thanks for clarification! Will send v6 soon.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Ard Biesheuvel
On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
 wrote:
>
> On Fri, Oct 09, 2020 at 11:13:59AM +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
> >  wrote:
> > >
> > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig  wrote:
> > > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne 
> > > > > wrote:
> > > > > > Sadly I just realised that the series is incomplete, we have RPi4 
> > > > > > users that
> > > > > > want to boot unsing ACPI, and this series would break things for 
> > > > > > them. I'll
> > > > > > have a word with them to see what we can do for their use-case.
> > > > >
> > > > > Stupid question:  why do these users insist on a totally unsuitable
> > > > > interface? And why would we as Linux developers care to support such
> > > > > a aims?
> > > >
> > > > The point is really whether we want to revert changes in Linux that
> > > > made both DT and ACPI boot work without quirks on RPi4.
> > >
> > > Well, and broke a big amount of devices that were otherwise fine.
> > >
> >
> > Yeah that was unfortunate.
> >
> > > > Having to check the RPi4 compatible string or OEM id in core init code 
> > > > is
> > > > awful, regardless of whether you boot via ACPI or via DT.
> > > >
> > > > The problem with this hardware is that it uses a DMA mask which is
> > > > narrower than 32, and the arm64 kernel is simply not set up to deal
> > > > with that at all. On DT, we have DMA ranges properties and the likes
> > > > to describe such limitations, on ACPI we have _DMA methods as well as
> > > > DMA range attributes in the IORT, both of which are now handled
> > > > correctly. So all the information is there, we just have to figure out
> > > > how to consume it early on.
> > >
> > > Is it worth the effort just for a single board? I don't know about ACPI 
> > > but
> > > parsing dma-ranges that early at boot time is not trivial. My intuition 
> > > tells
> > > me that it'd be even harder for ACPI, being a more complex data structure.
> > >
> >
> > Yes, it will be harder, especially for the _DMA methods.
> >
> > > > Interestingly, this limitation always existed in the SoC, but it
> > > > wasn't until they started shipping it with more than 1 GB of DRAM that
> > > > it became a problem. This means issues like this could resurface in
> > > > the future with existing SoCs when they get shipped with more memory,
> > > > and so I would prefer fixing this in a generic way.
> > >
> > > Actually what I proposed here is pretty generic. Specially from arm64's
> > > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits 
> > > based on
> > > whatever it finds in DT. Both those operations are architecture 
> > > independent.
> > > arm64 arch code doesn't care about the logic involved in ascertaining
> > > zone_dma_bits. I get that the last step isn't generic. But it's all setup 
> > > so as
> > > to make it as such whenever it's worth the effort.
> > >
> >
> > The problem is that, while we are providing a full description of the
> > SoC's capabilities, we short circuit this by inserting knowledge into
> > the code (that is shared between all DT architectures) that
> > "brcm,bcm2711" is special, and needs a DMA zone override.
> >
> > I think for ACPI boot, we might be able to work around this by cold
> > plugging the memory above 1 GB, but I have to double check whether it
> > won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
> > for me here from the top of their head)
>
> Is this information that we can infer from IORT nodes and make it
> generic (ie make a DMA limit out of all IORT nodes allowed ranges) ?
>

Yes, that seems feasible.

> We can move this check to IORT code and call it from arm64 if it
> can be made to work.
>

Finding the smallest value in the IORT, and assigning it to
zone_dma_bits if it is < 32 should be easy. But as I understand it,
having these separate DMA and DMA32 zones is what breaks kdump, no? So
how is this going to fix the underlying issue?

Nicolas, were there any other reported regressions caused by the
introduction of ZONE_DMA?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 3/3] iommu/tegra-smmu: Add PCI support

2020-10-09 Thread Nicolin Chen
This patch simply adds support for PCI devices.

Reviewed-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Signed-off-by: Nicolin Chen 
---

Changelog
v6->v7
 * Renamed goto labels, suggested by Thierry.
v5->v6
 * Added Dmitry's Reviewed-by and Tested-by.
v4->v5
 * Added Dmitry's Reviewed-by
v3->v4
 * Dropped !iommu_present() check
 * Added CONFIG_PCI check in the exit path
v2->v3
 * Replaced ternary conditional operator with if-else in .device_group()
 * Dropped change in tegra_smmu_remove()
v1->v2
 * Added error-out labels in tegra_smmu_probe()
 * Dropped pci_request_acs() since IOMMU core would call it.

 drivers/iommu/tegra-smmu.c | 35 +--
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index be29f5977145..2941d6459076 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -865,7 +866,11 @@ static struct iommu_group *tegra_smmu_device_group(struct 
device *dev)
group->smmu = smmu;
group->soc = soc;
 
-   group->group = iommu_group_alloc();
+   if (dev_is_pci(dev))
+   group->group = pci_device_group(dev);
+   else
+   group->group = generic_device_group(dev);
+
if (IS_ERR(group->group)) {
devm_kfree(smmu->dev, group);
mutex_unlock(&smmu->lock);
@@ -1075,22 +1080,32 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
iommu_device_set_fwnode(&smmu->iommu, dev->fwnode);
 
err = iommu_device_register(&smmu->iommu);
-   if (err) {
-   iommu_device_sysfs_remove(&smmu->iommu);
-   return ERR_PTR(err);
-   }
+   if (err)
+   goto remove_sysfs;
 
err = bus_set_iommu(&platform_bus_type, &tegra_smmu_ops);
-   if (err < 0) {
-   iommu_device_unregister(&smmu->iommu);
-   iommu_device_sysfs_remove(&smmu->iommu);
-   return ERR_PTR(err);
-   }
+   if (err < 0)
+   goto unregister;
+
+#ifdef CONFIG_PCI
+   err = bus_set_iommu(&pci_bus_type, &tegra_smmu_ops);
+   if (err < 0)
+   goto unset_platform_bus;
+#endif
 
if (IS_ENABLED(CONFIG_DEBUG_FS))
tegra_smmu_debugfs_init(smmu);
 
return smmu;
+
+unset_platform_bus: __maybe_unused;
+   bus_set_iommu(&platform_bus_type, NULL);
+unregister:
+   iommu_device_unregister(&smmu->iommu);
+remove_sysfs:
+   iommu_device_sysfs_remove(&smmu->iommu);
+
+   return ERR_PTR(err);
 }
 
 void tegra_smmu_remove(struct tegra_smmu *smmu)
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 1/3] iommu/tegra-smmu: Use fwspec in tegra_smmu_(de)attach_dev

2020-10-09 Thread Nicolin Chen
In tegra_smmu_(de)attach_dev() functions, we poll DTB for each
client's iommus property to get swgroup ID in order to prepare
"as" and enable smmu. Actually tegra_smmu_configure() prepared
an fwspec for each client, and added to the fwspec all swgroup
IDs of client DT node in DTB.

So this patch uses fwspec in tegra_smmu_(de)attach_dev() so as
to replace the redundant DT polling code.

Reviewed-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Signed-off-by: Nicolin Chen 
---

Changelog
v6->v7:
 * No change
v5->v6:
 * Added Dmitry's Reviewed-by and Tested-by
v4->v5:
 * Removed "index" and "err" assigning to 0
v3->v4:
 * Seperated the change, as a cleanup, from the rework patch
v1->v3:
 * N/A

 drivers/iommu/tegra-smmu.c | 56 --
 1 file changed, 23 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 6a3ecc334481..297d49f3f80e 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -484,60 +484,50 @@ static void tegra_smmu_as_unprepare(struct tegra_smmu 
*smmu,
 static int tegra_smmu_attach_dev(struct iommu_domain *domain,
 struct device *dev)
 {
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
struct tegra_smmu_as *as = to_smmu_as(domain);
-   struct device_node *np = dev->of_node;
-   struct of_phandle_args args;
-   unsigned int index = 0;
-   int err = 0;
-
-   while (!of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index,
-  &args)) {
-   unsigned int swgroup = args.args[0];
-
-   if (args.np != smmu->dev->of_node) {
-   of_node_put(args.np);
-   continue;
-   }
+   unsigned int index;
+   int err;
 
-   of_node_put(args.np);
+   if (!fwspec)
+   return -ENOENT;
 
+   for (index = 0; index < fwspec->num_ids; index++) {
err = tegra_smmu_as_prepare(smmu, as);
-   if (err < 0)
-   return err;
+   if (err)
+   goto disable;
 
-   tegra_smmu_enable(smmu, swgroup, as->id);
-   index++;
+   tegra_smmu_enable(smmu, fwspec->ids[index], as->id);
}
 
if (index == 0)
return -ENODEV;
 
return 0;
+
+disable:
+   while (index--) {
+   tegra_smmu_disable(smmu, fwspec->ids[index], as->id);
+   tegra_smmu_as_unprepare(smmu, as);
+   }
+
+   return err;
 }
 
 static void tegra_smmu_detach_dev(struct iommu_domain *domain, struct device 
*dev)
 {
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct tegra_smmu_as *as = to_smmu_as(domain);
-   struct device_node *np = dev->of_node;
struct tegra_smmu *smmu = as->smmu;
-   struct of_phandle_args args;
-   unsigned int index = 0;
-
-   while (!of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index,
-  &args)) {
-   unsigned int swgroup = args.args[0];
+   unsigned int index;
 
-   if (args.np != smmu->dev->of_node) {
-   of_node_put(args.np);
-   continue;
-   }
-
-   of_node_put(args.np);
+   if (!fwspec)
+   return;
 
-   tegra_smmu_disable(smmu, swgroup, as->id);
+   for (index = 0; index < fwspec->num_ids; index++) {
+   tegra_smmu_disable(smmu, fwspec->ids[index], as->id);
tegra_smmu_as_unprepare(smmu, as);
-   index++;
}
 }
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 0/3] iommu/tegra-smmu: Add PCI support

2020-10-09 Thread Nicolin Chen
This series is to add PCI support in tegra-smmu driver.

Changelog (Detail in each patch)
v6->v7
 * Added comments for put_device in PATCH-2
 * Renamed goto labels in PATCH-3
 * Kept Dmitry's Reviewed-by and Tested-by as no function change
v5->v6
 * Dropped a NULL check, per Dmitry's comments
 * Added Dmitry's Reviewed-by and Tested-by
v4->v5
 * PATCH-1 Cleaned two variables inits
 * PATCH-2 Fixed put() in ->of_xlate() and Updated commit message
 * PATCH-3 Added Dmitry's Reviewed-by
v3->v4
 * Dropped helper function, per Thierry's comments
 * Found another way to get smmu pointer
v2->v3
 * Replaced with devm_tegra_get_memory_controller
 * Updated changes by following Dmitry's comments
v1->v2
 * Added PATCH-1 suggested by Dmitry
 * Reworked PATCH-2 to unify certain code

Nicolin Chen (3):
  iommu/tegra-smmu: Use fwspec in tegra_smmu_(de)attach_dev
  iommu/tegra-smmu: Rework tegra_smmu_probe_device()
  iommu/tegra-smmu: Add PCI support

 drivers/iommu/tegra-smmu.c | 187 +
 1 file changed, 63 insertions(+), 124 deletions(-)

-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 2/3] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2020-10-09 Thread Nicolin Chen
The bus_set_iommu() in tegra_smmu_probe() enumerates all clients
to call in tegra_smmu_probe_device() where each client searches
its DT node for smmu pointer and swgroup ID, so as to configure
an fwspec. But this requires a valid smmu pointer even before mc
and smmu drivers are probed. So in tegra_smmu_probe() we added a
line of code to fill mc->smmu, marking "a bit of a hack".

This works for most of clients in the DTB, however, doesn't work
for a client that doesn't exist in DTB, a PCI device for example.

Actually, if we return ERR_PTR(-ENODEV) in ->probe_device() when
it's called from bus_set_iommu(), iommu core will let everything
carry on. Then when a client gets probed, of_iommu_configure() in
iommu core will search DTB for swgroup ID and call ->of_xlate()
to prepare an fwspec, similar to tegra_smmu_probe_device() and
tegra_smmu_configure(). Then it'll call tegra_smmu_probe_device()
again, and this time we shall return smmu->iommu pointer properly.

So we can get rid of tegra_smmu_find() and tegra_smmu_configure()
along with DT polling code by letting the iommu core handle every
thing, except a problem that we search iommus property in DTB not
only for swgroup ID but also for mc node to get mc->smmu pointer
to call dev_iommu_priv_set() and return the smmu->iommu pointer.
So we'll need to find another way to get smmu pointer.

Referencing the implementation of sun50i-iommu driver, of_xlate()
has client's dev pointer, mc node and swgroup ID. This means that
we can call dev_iommu_priv_set() in of_xlate() instead, so we can
simply get smmu pointer in ->probe_device().

This patch reworks tegra_smmu_probe_device() by:
1) Removing mc->smmu hack in tegra_smmu_probe() so as to return
   ERR_PTR(-ENODEV) in tegra_smmu_probe_device() during stage of
   tegra_smmu_probe/tegra_mc_probe().
2) Moving dev_iommu_priv_set() to of_xlate() so we can get smmu
   pointer in tegra_smmu_probe_device() to replace DTB polling.
3) Removing tegra_smmu_configure() accordingly since iommu core
   takes care of it.

This also fixes a problem that previously we could add clients to
iommu groups before iommu core initializes its default domain:
ubuntu@jetson:~$ dmesg | grep iommu
platform 5000.host1x: Adding to iommu group 1
platform 5700.gpu: Adding to iommu group 2
iommu: Default domain type: Translated
platform 5420.dc: Adding to iommu group 3
platform 5424.dc: Adding to iommu group 3
platform 5434.vic: Adding to iommu group 4

Though it works fine with IOMMU_DOMAIN_UNMANAGED, but will have
warnings if switching to IOMMU_DOMAIN_DMA:
iommu: Failed to allocate default IOMMU domain of type 0 for
   group (null) - Falling back to IOMMU_DOMAIN_DMA
iommu: Failed to allocate default IOMMU domain of type 0 for
   group (null) - Falling back to IOMMU_DOMAIN_DMA

Now, bypassing the first probe_device() call from bus_set_iommu()
fixes the sequence:
ubuntu@jetson:~$ dmesg | grep iommu
iommu: Default domain type: Translated
tegra-host1x 5000.host1x: Adding to iommu group 0
tegra-dc 5420.dc: Adding to iommu group 1
tegra-dc 5424.dc: Adding to iommu group 1
tegra-vic 5434.vic: Adding to iommu group 2
nouveau 5700.gpu: Adding to iommu group 3

Note that dmesg log above is testing with IOMMU_DOMAIN_UNMANAGED.

Reviewed-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Signed-off-by: Nicolin Chen 
---

Changelog
v6->v7
 * Added comments for put_device(), suggested by Thierry.
v5->v6
 * Dropped NULL mc/smmu pointer check, suggested by Dmitry.
 * Updated commit message with test results with vanilla DTB.
 * Added Dmitry's Reviewed-by and Tested-by.
v4->v5
 * Replaced of_node_put() with put_device() in of_xlate().
 * Added test result in commit message.
v3->v4
 * Moved dev_iommu_priv_set() to of_xlate() so we don't need
   to poll DTB for smmu pointer.
 * Removed the hack in tegra_smmu_probe() by returning ERR_PTR(
   -ENODEV) in tegra_smmu_probe_device() to let iommu core call
   in again.
 * Removed tegra_smmu_find() and tegra_smmu_configure() as iommu
   core takes care of fwspec.
v2->v3
 * Used devm_tegra_get_memory_controller() to get mc pointer
 * Replaced IS_ERR_OR_NULL with IS_ERR in .probe_device()
v1->v2
 * Replaced in .probe_device() tegra_smmu_find/tegra_smmu_configure()
   with tegra_get_memory_controller call.
 * Dropped the hack in tegra_smmu_probe().

 drivers/iommu/tegra-smmu.c | 96 ++
 1 file changed, 15 insertions(+), 81 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 297d49f3f80e..be29f5977145 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -797,75 +797,9 @@ static phys_addr_t tegra_smmu_iova_to_phys(struct 
iommu_domain *domain,
return SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova);
 }
 
-static struct tegra_smmu *tegra_smmu_find(struct device_node *np)
-{
-   struct platform_device *pdev;
- 

Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Catalin Marinas
On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
>  wrote:
> > We can move this check to IORT code and call it from arm64 if it
> > can be made to work.
> 
> Finding the smallest value in the IORT, and assigning it to
> zone_dma_bits if it is < 32 should be easy. But as I understand it,
> having these separate DMA and DMA32 zones is what breaks kdump, no? So
> how is this going to fix the underlying issue?

If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
allocations fall back to ZONE_DMA).

kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
(it's been a while since I looked), the kdump allocation couldn't span
multiple zones.

In a separate thread, we try to fix kdump to use allocations above 4G as
a fallback but this only fixes platforms with enough RAM (and maybe it's
only those platforms that care about kdump).

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 02/24] dt-bindings: memory: mediatek: Convert SMI to DT schema

2020-10-09 Thread Yong Wu
On Tue, 2020-10-06 at 09:15 +0200, Krzysztof Kozlowski wrote:
> On Tue, 6 Oct 2020 at 06:27, Yong Wu  wrote:
> >
> > On Fri, 2020-10-02 at 13:08 +0200, Krzysztof Kozlowski wrote:
> > > On Wed, Sep 30, 2020 at 03:06:25PM +0800, Yong Wu wrote:
> > > > Convert MediaTek SMI to DT schema.
> > > >
> > > > Signed-off-by: Yong Wu 
> > > > ---
> > > >  .../mediatek,smi-common.txt   |  49 -
> > > >  .../mediatek,smi-common.yaml  | 100 ++
> > > >  .../memory-controllers/mediatek,smi-larb.txt  |  49 -
> > > >  .../memory-controllers/mediatek,smi-larb.yaml |  91 
> > > >  4 files changed, 191 insertions(+), 98 deletions(-)
> > > >  delete mode 100644 
> > > > Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
> > > >  create mode 100644 
> > > > Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
> > > >  delete mode 100644 
> > > > Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.txt
> > > >  create mode 100644 
> > > > Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
> > ...
> > > > +properties:
> > > > +  compatible:
> > > > +oneOf:
> > > > +  - enum:
> > > > +  - mediatek,mt2701-smi-common
> > > > +  - mediatek,mt2712-smi-common
> > > > +  - mediatek,mt6779-smi-common
> > > > +  - mediatek,mt8173-smi-common
> > > > +  - mediatek,mt8183-smi-common
> > > > +
> > > > +  - description: for mt7623
> > > > +items:
> > > > +  - const: mediatek,mt7623-smi-common
> > > > +  - const: mediatek,mt2701-smi-common
> > > > +
> > > > +  reg:
> > > > +maxItems: 1
> > > > +
> > > > +  clocks:
> > > > +description: |
> > > > +  apb and smi are mandatory. the async is only for generation 1 
> > > > smi HW.
> > > > +  gals(global async local sync) also is optional, here is the list 
> > > > which
> > > > +  require gals: mt6779 and mt8183.
> > > > +minItems: 2
> > > > +maxItems: 4
> > > > +items:
> > > > +  - description: apb is Advanced Peripheral Bus clock, It's the 
> > > > clock for
> > > > +  setting the register.
> > > > +  - description: smi is the clock for transfer data and command.
> > > > +  - description: async is asynchronous clock, it help transform 
> > > > the smi clock
> > > > +  into the emi clock domain.
> > > > +  - description: gals0 is the path0 clock of gals.
> > > > +  - description: gals1 is the path1 clock of gals.
> > > > +
> > > > +  clock-names:
> > > > +oneOf:
> > > > +  - items:
> > > > +  - const: apb
> > > > +  - const: smi
> > > > +  - items:
> > > > +  - const: apb
> > > > +  - const: smi
> > > > +  - const: async
> > > > +  - items:
> > > > +  - const: apb
> > > > +  - const: smi
> > > > +  - const: gals0
> > > > +  - const: gals1
> > >
> > > Similarly to my comment to other properties, this requirement per
> > > compatible should be part of the schema within 'if-then'.
> >
> > I'm not so familiar with this format. Do this has "if-then-'else
> > if'-then-else"?
> 
> These are mutually exclusive conditions, so you can skip else:
>  - if-then
>  - if-then
>  - if-then
> It will be more readable then stacking 'if' under 'else'

Thanks. I will use something like this:

 anyOf:
   - if: #gen1 hw
 then:
   use apb/smi/async clocks

   - if: #gen2 hw that has gals.
 then:
   use apb/smi/gals0/gals1 clocks
 else: # gen2 hw that doesn't have gals.
   use apb/smi clocks.

> 
> >
> > I tried below instead of the clocks segment above:
> >
> > ===
> > if:
> >   properties:
> > compatible:
> 
> Missing contains. Just take an example from some existing schema.


Like the example you gave below
(Documentation/devicetree/bindings/clock/idt,versaclock5.yaml), It also
doesn't have "contains" in "if". I guess it is unnecessary if there is
only one compatible string. it may be necessary when it has backward
compatible string.

> 
> >   enum:
> > - mediatek,mt6779-smi-common
> > - mediatek,mt8183-smi-common
> >
> > then:
> >   properties:
> > clock:
> >   items:
> > - description: apb is the clock for setting the register..
> > - description: smi is the clock for transfer data and command.
> > - description: gals0 is the path0 clock of gals(global async
> > local sync).
> > - description: gals1 is the path1 clock of gals.
> > clock-names:
> >   items:
> > - const: apb
> > - const: smi
> > - const: gals0
> > - const: gals1
> > else:
> >   if:
> > properties:
> >   compatible:
> > contains:
> >   enum:
> > - mediatek,mt2701-smi-common
> >
> >   then:
> > properties:
> >   clocks:
> > items:
> >   - desc