Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-04 Thread Alexey Kardashevskiy




On 05/05/2021 04:15, Jason Gunthorpe wrote:

On Tue, May 04, 2021 at 01:54:55PM +1000, David Gibson wrote:

On Mon, May 03, 2021 at 01:05:30PM -0300, Jason Gunthorpe wrote:

On Thu, Apr 29, 2021 at 01:20:22PM +1000, David Gibson wrote:

There is a certain appeal to having some
'PPC_TCE_CREATE_SPECIAL_IOASID' entry point that has a wack of extra
information like windows that can be optionally called by the viommu
driver and it remains well defined and described.


Windows really aren't ppc specific.  They're absolutely there on x86
and everything else as well - it's just that people are used to having
a window at 0.. that you can often get away with
treating it sloppily.


My point is this detailed control seems to go on to more than just
windows. As you say the vIOMMU is emulating specific HW that needs to
have kernel interfaces to match it exactly.


It's really not that bad.  The case of emulating the PAPR vIOMMU on
something else is relatively easy, because all updates to the IO page
tables go through hypercalls.  So, as long as the backend IOMMU can
map all the IOVAs that the guest IOMMU can, then qemu's implementation
of those hypercalls just needs to put an equivalent mapping in the
backend, which it can do with a generic VFIO_DMA_MAP.


So you also want the PAPR vIOMMU driver to run on, say, an ARM IOMMU?



This is a good feature in general when let's say there is a linux 
supported device which has a proprietary device firmware update tool 
which only exists as an x86 binary and your hardware is not x86 - 
running qemu + vfio in full emulation would provide a way to run the 
tool to update a physical device.



--
Alexey
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-04 Thread Jason Gunthorpe
On Tue, May 04, 2021 at 03:11:54PM -0700, Jacob Pan wrote:

> > It is a weird way to use xarray to have a structure which
> > itself is just a wrapper around another RCU protected structure.
> > 
> > Make the caller supply the ioasid_data memory, embedded in its own
> > element, get rid of the void * and rely on XA_ZERO_ENTRY to hold
> > allocated but not active entries.
> > 
> Let me try to paraphrase to make sure I understand. Currently
> struct ioasid_data is private to the iasid core, its memory is allocated by
> the ioasid core.
> 
> You are suggesting the following:
> 1. make struct ioasid_data public
> 2. caller allocates memory for ioasid_data, initialize it then pass it to
> ioasid_alloc to store in the xarray
> 3. caller will be responsible for setting private data inside ioasid_data
> and do call_rcu after update if needed.

Basically, but you probably won't need a "private data" once the
caller has this struct as it can just embed it in whatever larger
struct makes sense for it and use container_of/etc

I didn't look too closely at the whole thing though. Honestly I'm a
bit puzzled why we need a pluggable global allocator framework.. The
whole framework went to some trouble to isolate everything into iommu
drivers then that whole design is disturbed by this global thing.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt

2021-05-04 Thread Thomas Gleixner
On Tue, May 04 2021 at 12:10, Ricardo Neri wrote:
> In x86 there is not an IRQF_NMI flag that can be used to indicate the

There exists no IRQF_NMI flag at all. No architecture provides that.

> delivery mode when requesting an interrupt (via request_irq()). Thus,
> there is no way for the interrupt remapping driver to know and set
> the delivery mode.

There is no support for this today. So what?

> Hence, when allocating an interrupt, check if such interrupt belongs to
> the HPET hardlockup detector and fixup the delivery mode accordingly.

What?

> + /*
> +  * If we find the HPET hardlockup detector irq, fixup the
> +  * delivery mode.
> +  */
> + if (is_hpet_irq_hardlockup_detector(info))
> + irq_cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;

Again. We are not sticking some random device checks into that
code. It's wrong and I explained it to you before.

  
https://lore.kernel.org/lkml/alpine.deb.2.21.1906161042080.1...@nanos.tec.linutronix.de/

But I'm happy to repeat it again:

  "No. This is horrible hackery violating all the layering which we carefully
   put into place to avoid exactly this kind of sprinkling conditionals into
   all code pathes.

   With some thought the existing irqdomain hierarchy can be used to achieve
   the same thing without tons of extra functions and conditionals."

So the outcome of thought and using the irqdomain hierarchy is:

   Replacing an hpet specific conditional in one place with an hpet
   specific conditional in a different place.

Impressive.

hpet_assign_irq(, bool nmi)
  init_info(info)
...
if (nmi)
info.flags |= X86_IRQ_ALLOC_AS_NMI;
  
   irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, )
 intel_irq_remapping_alloc(..., info)
   irq_domain_alloc_irq_parents(..., info)
 x86_vector_alloc_irqs(..., info)
 {   
   if (info->flags & X86_IRQ_ALLOC_AS_NMI && nr_irqs != 1)
  return -EINVAL;

   for (i = 0; i < nr_irqs; i++) {
 
 if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
 irq_cfg_setup_nmi(apicd);
 continue;
 }
 ...
 }

irq_cfg_setup_nmi() sets irq_cfg->delivery_mode and whatever is required
and everything else just works. Of course this needs a few other minor
tweaks but none of those introduces random hpet quirks all over the
place. Not convoluted enough, right?

But that solves none of other problems. Let me summarize again which
options or non-options we have:

1) Selective IPIs from NMI context cannot work

   As explained in the other thread.

2) Shorthand IPI allbutself from NMI

   This should work, but that obviously does not take the watchdog
   cpumask into account.

   Also this only works when IPI shorthand mode is enabled. See
   apic_smt_update() for details.

3) Sending the IPIs from irq_work

   This would solve the problem, but if the CPU which is the NMI
   target is really stuck in an interrupt disabled region then the
   IPIs won't be sent.

   OTOH, if that's the case then the CPU which was processing the
   NMI will continue to be stuck until the next NMI hits which
   will detect that the CPU is stuck which is a good enough
   reason to send a shorthand IPI to all CPUs ignoring the
   watchdog cpumask.

   Same limitation vs. shorthand mode as #2

4) Changing affinity of the HPET NMI from NMI

   As we established two years ago that cannot work with interrupt
   remapping

5) Changing affinity of the HPET NMI from irq_work

   Same issues as #3

Anything else than #2 is just causing more problems than it solves, but
surely the NOHZ_FULL/isolation people might have opinions on this.

OTOH, as this is opt-in, anything which wants a watchdog mask which is
not the full online set, has to accept that HPET has these restrictions.

And that's exactly what I suggested two years ago:

 
https://lore.kernel.org/lkml/alpine.deb.2.21.1906172343120.1...@nanos.tec.linutronix.de/

  "It definitely would be worthwhile to experiment with that. if we
   could use shorthands (also for regular IPIs) that would be a great
   improvement in general and would nicely solve that NMI issue. Beware
   of the dragons though."

As a consequence of this conversation I implemented shorthand IPIs...

But I haven't seen any mentioning that this has been tried, why the
approach was not chosen or any discussion about that matter.

Not that I'm surprised.

Thanks,

tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-04 Thread Jacob Pan
Hi Jason,

On Tue, 4 May 2021 15:00:50 -0300, Jason Gunthorpe  wrote:

> On Tue, May 04, 2021 at 08:41:48AM -0700, Jacob Pan wrote:
> > > > 
> > > > (also looking at ioasid.c, why do we need such a thin and odd
> > > > wrapper around xarray?)
> > > > 
> > > 
> > > I'll leave it to Jean and Jacob.  
> 
> > Could you elaborate?  
> 
> I mean stuff like this:
> 
> int ioasid_set_data(ioasid_t ioasid, void *data)
> {
> struct ioasid_data *ioasid_data;
> int ret = 0;
> 
> spin_lock(_allocator_lock);
> ioasid_data = xa_load(_allocator->xa, ioasid);
> if (ioasid_data)
> rcu_assign_pointer(ioasid_data->private, data);
> else
> ret = -ENOENT;
> spin_unlock(_allocator_lock);
> 
> /*
>  * Wait for readers to stop accessing the old private data, so the
>  * caller can free it.
>  */
> if (!ret)
> synchronize_rcu();
> 
> return ret;
> }
> EXPORT_SYMBOL_GPL(ioasid_set_data);
> 
> It is a weird way to use xarray to have a structure which
> itself is just a wrapper around another RCU protected structure.
> 
> Make the caller supply the ioasid_data memory, embedded in its own
> element, get rid of the void * and rely on XA_ZERO_ENTRY to hold
> allocated but not active entries.
> 
Let me try to paraphrase to make sure I understand. Currently
struct ioasid_data is private to the iasid core, its memory is allocated by
the ioasid core.

You are suggesting the following:
1. make struct ioasid_data public
2. caller allocates memory for ioasid_data, initialize it then pass it to
ioasid_alloc to store in the xarray
3. caller will be responsible for setting private data inside ioasid_data
and do call_rcu after update if needed.

Correct?

> Make the synchronize_rcu() the caller responsiblity, and callers
> should really be able to use call_rcu()
> 
> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v5 3/7] iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode

2021-05-04 Thread Ricardo Neri
A previous changeset introduced a new member to struct irq_cfg to specify
the delivery mode of an interrupt. Supporting the configuration of the
delivery mode would require adding a third argument to prepare_irte().
Instead, simply take a pointer to an irq_cfg data structure as the only
argument.

Always configure the delivery mode of the Interrupt Remapping Table
Entry using the values specified in the irq_cfg data structure.

This change does not change the existing behavior, as the delivery mode
of the APIC is used to configure the irq_cfg of each irq.

Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: David Woodhouse  (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" 
Cc: Ingo Molnar 
Cc: Jacob Pan 
Cc: Lu Baolu  (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x...@kernel.org
Reviewed-by: Ashok Raj 
Signed-off-by: Ricardo Neri 
---
Changes since v4:
 * None

Changes since v3:
 * None

Changes since v2:
 * None

Changes since v1:
 * Introduced this patch.
---
 drivers/iommu/intel/irq_remapping.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c 
b/drivers/iommu/intel/irq_remapping.c
index 611ef5243cb6..daa5df53db59 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1104,7 +1104,7 @@ void intel_irq_remap_add_device(struct 
dmar_pci_notify_info *info)
dev_set_msi_domain(>dev->dev, map_dev_to_ir(info->dev));
 }
 
-static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
+static void prepare_irte(struct irte *irte, struct irq_cfg *irq_cfg)
 {
memset(irte, 0, sizeof(*irte));
 
@@ -1118,9 +1118,9 @@ static void prepare_irte(struct irte *irte, int vector, 
unsigned int dest)
 * irq migration in the presence of interrupt-remapping.
*/
irte->trigger_mode = 0;
-   irte->dlvry_mode = apic->delivery_mode;
-   irte->vector = vector;
-   irte->dest_id = IRTE_DEST(dest);
+   irte->dlvry_mode = irq_cfg->delivery_mode;
+   irte->vector = irq_cfg->vector;
+   irte->dest_id = IRTE_DEST(irq_cfg->dest_apicid);
irte->redir_hint = 1;
 }
 
@@ -1261,8 +1261,7 @@ static void intel_irq_remapping_prepare_irte(struct 
intel_ir_data *data,
 {
struct irte *irte = >irte_entry;
 
-   prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
-
+   prepare_irte(irte, irq_cfg);
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC:
/* Set source-id of interrupt request */
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt

2021-05-04 Thread Ricardo Neri
The HPET hardlockup detector requires that the HPET timer delivers the
interrupt as NMI. When interrupt remapping is disabled, this can be
done by programming the HPET MSI registers directly. With interrupt
remapping, it is necessary to populate an entry in the interrupt
remapping table.

In x86 there is not an IRQF_NMI flag that can be used to indicate the
delivery mode when requesting an interrupt (via request_irq()). Thus,
there is no way for the interrupt remapping driver to know and set
the delivery mode.

Hence, when allocating an interrupt, check if such interrupt belongs to
the HPET hardlockup detector and fixup the delivery mode accordingly.

Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: David Woodhouse  (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" 
Cc: Ingo Molnar 
Cc: Jacob Pan 
Cc: Lu Baolu  (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x...@kernel.org
Reviewed-by: Ashok Raj 
Signed-off-by: Ricardo Neri 
---
Changes since v4:
 * Introduced this patch.

Changes since v3:
 * N/A

Changes since v2:
 * N/A

Changes since v1:
 * N/A
---
 drivers/iommu/intel/irq_remapping.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/intel/irq_remapping.c 
b/drivers/iommu/intel/irq_remapping.c
index daa5df53db59..b07c68ecac01 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1376,6 +1377,14 @@ static int intel_irq_remapping_alloc(struct irq_domain 
*domain,
irq_data->hwirq = (index << 16) + i;
irq_data->chip_data = ird;
irq_data->chip = _ir_chip;
+
+   /*
+* If we find the HPET hardlockup detector irq, fixup the
+* delivery mode.
+*/
+   if (is_hpet_irq_hardlockup_detector(info))
+   irq_cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;
+
intel_irq_remapping_prepare_irte(ird, irq_cfg, info, index, i);
irq_set_status_flags(virq + i, IRQ_MOVE_PCNTXT);
}
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v5 1/7] x86/apic: Add irq_cfg::delivery_mode

2021-05-04 Thread Ricardo Neri
Until now, the delivery mode of APIC interrupts is set to the default
mode set in the APIC driver. However, there are no restrictions in hardware
to configure each interrupt with a different delivery mode. Specifying the
delivery mode per interrupt is useful when one is interested in changing
the delivery mode of a particular interrupt. For instance, this can be used
to deliver an interrupt as non-maskable.

Add a new member, delivery_mode, to struct irq_cfg. This new member can
be used to update the configuration of the delivery mode in each interrupt
domain.

Currently, all interrupt domains set the delivery mode of interrupts using
the APIC setting. Interrupt domains use an irq_cfg data structure to
configure their own data structures and hardware resources. Thus, in order
to keep the current behavior, set the delivery mode of the irq
configuration that as the APIC setting. In this manner, irq domains can
obtain the delivery mode from the irq configuration data instead of the
APIC setting, if needed.

Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: David Woodhouse  (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" 
Cc: Ingo Molnar 
Cc: Jacob Pan 
Cc: Lu Baolu  (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x86@kernel.orgReviewed-by: Ashok Raj 
Signed-off-by: Ricardo Neri 
---
Changes since v4:
 * Rebased to use new enumeration apic_delivery_modes.

Changes since v3:
 * None

Changes since v2:
 * Reduced scope to only add the interrupt delivery mode in
   struct irq_alloc_info.

Changes since v1:
 * Introduced this patch.
---
 arch/x86/include/asm/hw_irq.h |  1 +
 arch/x86/kernel/apic/vector.c | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index d465ece58151..370f4db0372b 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -90,6 +90,7 @@ struct irq_alloc_info {
 struct irq_cfg {
unsigned intdest_apicid;
unsigned intvector;
+   enum apic_delivery_modesdelivery_mode;
 };
 
 extern struct irq_cfg *irq_cfg(unsigned int irq);
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6dbdc7c22bb7..d47ed07a56a4 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -567,6 +567,16 @@ static int x86_vector_alloc_irqs(struct irq_domain 
*domain, unsigned int virq,
irqd->chip_data = apicd;
irqd->hwirq = virq + i;
irqd_set_single_target(irqd);
+
+   /*
+* Initialize the delivery mode of this irq to match the
+* default delivery mode of the APIC. This is useful for
+* children irq domains which want to take the delivery
+* mode from the individual irq configuration rather
+* than from the APIC.
+*/
+apicd->hw_irq_cfg.delivery_mode = apic->delivery_mode;
+
/*
 * Prevent that any of these interrupts is invoked in
 * non interrupt context via e.g. generic_handle_irq()
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v5 6/7] iommu/amd: Fixup delivery mode of the HPET hardlockup interrupt

2021-05-04 Thread Ricardo Neri
The HPET hardlockup detector requires that the HPET timer delivers the
interrupt as NMI. When interrupt remapping is disabled, this can be
done by programming the HPET MSI registers directly. With interrupt
remapping, it is necessary to populate an entry in the interrupt
remapping table.

In x86 there is not an IRQF_NMI flag that can be used to indicate the
delivery mode when requesting an interrupt (via request_irq()). Thus,
there is no way for the interrupt remapping driver to know and set
the delivery mode.

Hence, when allocating an interrupt, check if such interrupt belongs to
the HPET hardlockup detector and fixup the delivery mode accordingly.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: David Woodhouse  (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" 
Cc: Ingo Molnar 
Cc: Jacob Pan 
Cc: Lu Baolu  (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
Changes since v4:
 * Introduced this patch.

Changes since v3:
 * N/A

Changes since v2:
 * N/A

Changes since v1:
 * N/A
---
 drivers/iommu/amd/iommu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e8d9fae0c766..758e08ba42e6 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3254,6 +3255,14 @@ static int irq_remapping_alloc(struct irq_domain 
*domain, unsigned int virq,
irq_data->hwirq = (devid << 16) + i;
irq_data->chip_data = data;
irq_data->chip = _ir_chip;
+
+   /*
+* If we find the HPET hardlockup detector irq, fixup the
+* delivery mode.
+*/
+   if (is_hpet_irq_hardlockup_detector(info))
+   cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;
+
irq_remapping_prepare_irte(data, cfg, info, devid, index, i);
irq_set_status_flags(virq + i, IRQ_MOVE_PCNTXT);
}
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v5 2/7] x86/hpet: Introduce function to identify HPET hardlockup detector irq

2021-05-04 Thread Ricardo Neri
The HPET hardlockup detector needs to deliver its interrupt as NMI.
In x86 there is not an IRQF_NMI flag that can be used in the irq plumbing
code to tell interrupt remapping drivers to set the interrupt delivery
mode accordingly. Hence, they must fixup the delivery mode internally.

Implement a method to determine if the interrupt being allocated belongs
to the HPET hardlockup detector.

Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: David Woodhouse  (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" 
Cc: Ingo Molnar 
Cc: Jacob Pan 
Cc: Lu Baolu  (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x...@kernel.org
Reviewed-by: Ashok Raj 
Signed-off-by: Ricardo Neri 
---
Changes since v4:
 * Introduced this patch. Previous versions had special functions to
   allocate and set the affinity of a remapped NMI interrupt.

Changes since v3:
 * N/A

Changes since v2:
 * N/A

Changes since v1:
 * N/A
---
 arch/x86/include/asm/hpet.h |  3 +++
 arch/x86/kernel/hpet.c  | 33 +
 2 files changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index df11c7d4af44..5bf675970d4b 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -149,6 +149,7 @@ extern void hardlockup_detector_hpet_stop(void);
 extern void hardlockup_detector_hpet_enable(unsigned int cpu);
 extern void hardlockup_detector_hpet_disable(unsigned int cpu);
 extern void hardlockup_detector_switch_to_perf(void);
+extern bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info);
 #else
 static inline int hardlockup_detector_hpet_init(void)
 { return -ENODEV; }
@@ -156,6 +157,8 @@ static inline void hardlockup_detector_hpet_stop(void) {}
 static inline void hardlockup_detector_hpet_enable(unsigned int cpu) {}
 static inline void hardlockup_detector_hpet_disable(unsigned int cpu) {}
 static inline void hardlockup_detector_switch_to_perf(void) {}
+static inline bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info)
+{ return false; }
 #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
 
 #else /* CONFIG_HPET_TIMER */
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 5012590dc1b8..3e43e0f348b8 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1479,6 +1479,39 @@ struct hpet_hld_data *hpet_hld_get_timer(void)
hld_data = NULL;
return NULL;
 }
+
+/**
+ * is_hpet_irq_hardlockup_detector() - Identify the HPET hld interrupt info
+ * @info:  Interrupt allocation info, with private HPET channel data
+ *
+ * The HPET hardlockup detector is special as it needs its interrupts delivered
+ * as NMI. However, for interrupt remapping we use the existing irq subsystem
+ * to configure and route the HPET interrupt. Unfortunately, there is not a
+ * IRQF_NMI flag for x86. Instead, identify whether the interrupt being
+ * allocated for the HPET channel belongs to the hardlockup detector.
+ *
+ * Returns: True if @info indicates that it belongs to the HPET hardlockup
+ * detector. False otherwise.
+ */
+bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info)
+{
+   struct hpet_channel *hc;
+
+   if (!info)
+   return false;
+
+   if (info->type != X86_IRQ_ALLOC_TYPE_HPET)
+   return false;
+
+   hc = info->data;
+   if (!hc)
+   return false;
+
+   if (hc->mode == HPET_MODE_NMI_WATCHDOG)
+   return true;
+
+   return false;
+}
 #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
 
 #endif
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v5 7/7] x86/watchdog/hardlockup/hpet: Support interrupt remapping

2021-05-04 Thread Ricardo Neri
When interrupt remapping is enabled in the system, the MSI interrupt
address and data fields must follow a special format that the IOMMU
defines.

However, the HPET hardlockup detector must rely on the interrupt
subsystem to have the interrupt remapping drivers allocate, activate,
and set the affinity of HPET timer interrupt. Hence, it must use
request_irq() to use such functionality.

In x86 there is not an IRQF_NMI flag to indicate to the interrupt
subsystem the delivery mode of the interrupt. A previous changset added
functionality to detect the interrupt of the HPET hardlockup detector
and fixup the delivery mode accordingly.

Also, since request_irq() is used, a non-NMI interrupt handler must be
defined. Even if it is not needed.

When Interrupt Remapping is enabled, use the new facility to ensure
interrupt is plumbed properly to work with interrupt remapping.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: David Woodhouse  (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" 
Cc: Ingo Molnar 
Cc: Jacob Pan 
Cc: Lu Baolu  (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
Changes since v4:
 * Use request_irq() to obtain an IRTE for the HPET hardlockup detector
   instead of the custom interfaces previously implemented in the
   interrupt remapping drivers.
 * Simplified detection of interrupt remapping by checking the parent
   of the HPET irq domain.
 * Stopped using the HPET magic fields of struct irq_alloc_info. They
   were removed in commit 2bf1e7bcedb8 ("x86/msi: Consolidate HPET
   allocation")
 * Rephrased commit message for clarity. (Ashok)
 * Clarified error message of non-NMI handler. (Ashok)

Changes since v3:
 * None

Changes since v2:
 * None

Changes since v1:
 * Introduced this patch. Added custom functions in the Intel IOMMU driver
   to allocate an IRTE for the HPET hardlockup detector.
---
 arch/x86/include/asm/hpet.h |  2 ++
 arch/x86/kernel/hpet.c  |  3 ++
 arch/x86/kernel/watchdog_hld_hpet.c | 48 +
 3 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 5bf675970d4b..d130285ddc96 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -109,6 +109,7 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler 
handler);
  * @tsc_ticks_per_group:   TSC ticks that must elapse for each group of
  * monitored CPUs.
  * @irq:   IRQ number assigned to the HPET channel
+ * @int_remap_enabled: True if interrupt remapping is enabled
  * @handling_cpu:  CPU handling the HPET interrupt
  * @pkgs_per_group:Number of physical packages in a group of CPUs
  * receiving an IPI
@@ -133,6 +134,7 @@ struct hpet_hld_data {
u64 tsc_next;
u64 tsc_ticks_per_group;
int irq;
+   boolintr_remap_enabled;
u32 handling_cpu;
u32 pkgs_per_group;
u32 nr_groups;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3e43e0f348b8..ff4abdef5e15 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1464,6 +1464,9 @@ struct hpet_hld_data *hpet_hld_get_timer(void)
if (!hpet_domain)
goto err;
 
+   if (hpet_domain->parent != x86_vector_domain)
+   hld_data->intr_remap_enabled = true;
+
hc->mode = HPET_MODE_NMI_WATCHDOG;
irq = hpet_assign_irq(hpet_domain, hc, hc->num);
if (irq <= 0)
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c 
b/arch/x86/kernel/watchdog_hld_hpet.c
index 3fd2405b31fa..265641d001ac 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -176,6 +176,14 @@ static int update_msi_destid(struct hpet_hld_data *hdata)
 {
u32 destid;
 
+   if (hdata->intr_remap_enabled) {
+   int ret;
+
+   ret = irq_set_affinity(hdata->irq,
+  cpumask_of(hdata->handling_cpu));
+   return ret;
+   }
+
destid = apic->calc_dest_apicid(hdata->handling_cpu);
/*
 * HPET only supports a 32-bit MSI address register. Thus, only
@@ -393,26 +401,52 @@ static int hardlockup_detector_nmi_handler(unsigned int 
type,
return NMI_DONE;
 }
 
+/*
+ * When interrupt remapping is enabled, we request the irq for the detector
+ * using request_irq() and then we fixup the delivery mode to NMI using
+ * is_hpet_irq_hardlockup_detector(). If the latter fails, we will see a non-
+ * NMI interrupt.
+ *
+ */
+static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
+{
+   pr_err_once("Received a non-NMI interrupt. The HLD detector always uses 
NMIs!\n");
+  

[RFC PATCH v5 4/7] iommu/amd: Set the IRTE delivery mode from irq_cfg

2021-05-04 Thread Ricardo Neri
There is not hardware requirement to have a different delivery mode for
each interrupt. Instead of using the delivery mode of the APIC driver, use
the delivery mode of each specific interrupt configuration.

This allows to accommodate interrupts which require a specific delivery
mode, such as the HPET hardlockup detector.

Outside of such case, there are not functional changes since the delivery
mode of an interrupt is initialized with the delivery mode of the APIC
driver.

Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: David Woodhouse  (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" 
Cc: Ingo Molnar 
Cc: Jacob Pan 
Cc: Lu Baolu  (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x...@kernel.org
Reviewed-by: Ashok Raj 
Signed-off-by: Ricardo Neri 
---
Changes since v4:
 * Introduced this patch.

Changes since v3:
 * N/A

Changes since v2:
 * N/A

Changes since v1:
 * N/A
---
 drivers/iommu/amd/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40..e8d9fae0c766 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3122,7 +3122,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data 
*data,
 
data->irq_2_irte.devid = devid;
data->irq_2_irte.index = index + sub_handle;
-   iommu->irte_ops->prepare(data->entry, apic->delivery_mode,
+   iommu->irte_ops->prepare(data->entry, irq_cfg->delivery_mode,
 apic->dest_mode_logical, irq_cfg->vector,
 irq_cfg->dest_apicid, devid);
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping

2021-05-04 Thread Ricardo Neri
Hi IOMMU experts,

I proposed a hardlockup detector driven by the HPET timer [1]. Such
detector is driven by a single timer. The hardlockup detector brings the
extra complexity of having to update the affinity of the interrupt
periodically and initiated from NMI context. The proposed design only
requires updating the affinity every watchdog_thresh (the interval is
between [1, 60] seconds). Also, the affinity update is offloaded to
an irq_work. Handling the HPET interrupt affinity is trivial with
!intremap since the detector composes the MSI message and writes it
directly to the HPET registers.

However, for intremap we must use the existing IOMMU drivers as well as
the kernel's irq plumbing. Thomas Gleixner has imposed two restrictions:
  1) Do not implement an IRQF_NMI flag for x86 as it is not possible to
 determine the source of an NMI [2].
  2) Use the irq subsystem to update the affinity of the HPET
 interrupt [3].

1) implies that the interrupt remapping drivers need to implement a quirk
to identify the HPET interrupt and update its delivery mode to NMI. 2)
means that the hardlockup detector must use request_irq() to allocate the
HPET interrupt.

This patch series attempts to meet the requirements above by
  a) Decoupling the delivery mode of an APIC interrupt from the delivery
 mode of the APIC driver (patch 1)
  b) Implement quirks in the Intel and AMD IOMMU drivers to identify the
 HPET timer and update the delivery mode accordingly (patches 2-5).
  c) Add support for interrupt remapping in the HPET hardlockup detector
 in [1]. This includes the unavoidable eyesore of using request_irq()
 and having a useless regular interrupt handler (patch 6).

I would like to get your feedback on whether the HPET NMI quirk looks
sane to you and whether offloading the affinity setup to an irq_work
could pose issues.

Thanks and BR,
Ricardo

[1]. 
https://lore.kernel.org/lkml/20210504190526.22347-1-ricardo.neri-calde...@linux.intel.com/T/#mf77988cc98f9ca6988831e17f68394577388959d
[2]. 
https://lore.kernel.org/lkml/alpine.deb.2.21.1808021137400.2...@nanos.tec.linutronix.de/
[3]. 
https://lore.kernel.org/lkml/alpine.deb.2.21.1906161042080.1...@nanos.tec.linutronix.de/

Changes since v4:
 * With !CONFIG_IRQ_REMAP [1] now disables the HPET channel before changing
   the MSI Destination ID field. This should avoid races between a pending
   interrupt and updating the detector's interrupt affinity. (Ashok)
 * Rebased to use new enumeration apic_delivery_modes.
 * Removed custom functions to allocate an interrupt for the detector
   and instead added support to identify the detector's interrupt and
   change the delivery mode.
 * With interrupt remapping enabled, use request_irq().
 * Added support for AMD IOMMU.

Changes since v3:
 * None

Changes since v2:
 * None

Changes since v1:
 * Introduced support for interrupt remapping

Ricardo Neri (7):
  x86/apic: Add irq_cfg::delivery_mode
  x86/hpet: Introduce function to identify HPET hardlockup detector irq
  iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode
  iommu/amd: Set the IRTE delivery mode from irq_cfg
  iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt
  iommu/amd: Fixup delivery mode of the HPET hardlockup interrupt
  x86/watchdog/hardlockup/hpet: Support interrupt remapping

 arch/x86/include/asm/hpet.h |  5 +++
 arch/x86/include/asm/hw_irq.h   |  1 +
 arch/x86/kernel/apic/vector.c   | 10 ++
 arch/x86/kernel/hpet.c  | 36 ++
 arch/x86/kernel/watchdog_hld_hpet.c | 48 +
 drivers/iommu/amd/iommu.c   | 11 ++-
 drivers/iommu/intel/irq_remapping.c | 20 
 7 files changed, 118 insertions(+), 13 deletions(-)

-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [GIT PULL] dma-mapping updates for Linux 5.13

2021-05-04 Thread pr-tracker-bot
The pull request you sent on Tue, 4 May 2021 09:58:23 +0200:

> git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.13

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/954b7207059cc4004f2e18f49c335304b1c6d64a

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-04 Thread Jason Gunthorpe
On Tue, May 04, 2021 at 01:54:55PM +1000, David Gibson wrote:
> On Mon, May 03, 2021 at 01:05:30PM -0300, Jason Gunthorpe wrote:
> > On Thu, Apr 29, 2021 at 01:20:22PM +1000, David Gibson wrote:
> > > > There is a certain appeal to having some
> > > > 'PPC_TCE_CREATE_SPECIAL_IOASID' entry point that has a wack of extra
> > > > information like windows that can be optionally called by the viommu
> > > > driver and it remains well defined and described.
> > > 
> > > Windows really aren't ppc specific.  They're absolutely there on x86
> > > and everything else as well - it's just that people are used to having
> > > a window at 0.. that you can often get away with
> > > treating it sloppily.
> > 
> > My point is this detailed control seems to go on to more than just
> > windows. As you say the vIOMMU is emulating specific HW that needs to
> > have kernel interfaces to match it exactly.
> 
> It's really not that bad.  The case of emulating the PAPR vIOMMU on
> something else is relatively easy, because all updates to the IO page
> tables go through hypercalls.  So, as long as the backend IOMMU can
> map all the IOVAs that the guest IOMMU can, then qemu's implementation
> of those hypercalls just needs to put an equivalent mapping in the
> backend, which it can do with a generic VFIO_DMA_MAP.

So you also want the PAPR vIOMMU driver to run on, say, an ARM IOMMU?

> vIOMMUs with page tables in guest memory are harder, but only really
> in the usual ways that a vIOMMU of that type is harder (needs cache
> mode or whatever).  At whatever point you need to shadow from the
> guest IO page tables to the host backend, you can again do that with
> generic maps, as long as the backend supports the necessary IOVAs, and
> has an IO page size that's equal to or a submultiple of the vIOMMU
> page size.

But this definitely all becomes HW specific.

For instance I want to have an ARM vIOMMU driver it needs to do some

 ret = ioctl(ioasid_fd, CREATE_NESTED_IOASID, [page table format is ARMvXXX])
 if (ret == -EOPNOTSUPP)
 ret = ioctl(ioasid_fd, CREATE_NORMAL_IOASID, ..)
 // and do completely different and more expensive emulation

I can get a little bit of generality, but at the end of the day the
IOMMU must create a specific HW layout of the nested page table, if it
can't, it can't.

> > I'm remarking that trying to unify every HW IOMMU implementation that
> > ever has/will exist into a generic API complete enough to allow the
> > vIOMMU to be created is likely to result in an API too complicated to
> > understand..
> 
> Maybe not every one, but I think we can get a pretty wide range with a
> reasonable interface.  

It sounds like a reasonable guideline is if the feature is actually
general to all IOMMUs and can be used by qemu as part of a vIOMMU
emulation when compatible vIOMMU HW is not available.

Having 'requested window' support that isn't actually implemented in
every IOMMU is going to mean the PAPR vIOMMU emulation won't work,
defeating the whole point of making things general?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-04 Thread Jason Gunthorpe
On Tue, May 04, 2021 at 08:41:48AM -0700, Jacob Pan wrote:
> > > 
> > > (also looking at ioasid.c, why do we need such a thin and odd wrapper
> > > around xarray?)
> > >   
> > 
> > I'll leave it to Jean and Jacob.

> Could you elaborate?

I mean stuff like this:

int ioasid_set_data(ioasid_t ioasid, void *data)
{
struct ioasid_data *ioasid_data;
int ret = 0;

spin_lock(_allocator_lock);
ioasid_data = xa_load(_allocator->xa, ioasid);
if (ioasid_data)
rcu_assign_pointer(ioasid_data->private, data);
else
ret = -ENOENT;
spin_unlock(_allocator_lock);

/*
 * Wait for readers to stop accessing the old private data, so the
 * caller can free it.
 */
if (!ret)
synchronize_rcu();

return ret;
}
EXPORT_SYMBOL_GPL(ioasid_set_data);

It is a weird way to use xarray to have a structure which
itself is just a wrapper around another RCU protected structure.

Make the caller supply the ioasid_data memory, embedded in its own
element, get rid of the void * and rely on XA_ZERO_ENTRY to hold
allocated but not active entries.

Make the synchronize_rcu() the caller responsiblity, and callers
should really be able to use call_rcu()

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-04 Thread Jason Gunthorpe
On Wed, Apr 28, 2021 at 06:58:19AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, April 28, 2021 1:12 AM
> > 
> [...] 
> > > As Alex says, if this line fails because of the group restrictions,
> > > that's not great because it's not very obvious what's gone wrong.
> > 
> > Okay, that is fair, but let's solve that problem directly. For
> > instance netlink has been going in the direction of adding a "extack"
> > from the kernel which is a descriptive error string. If the failing
> > ioctl returned the string:
> > 
> >   "cannot join this device to the IOASID because device XXX in the
> >same group #10 is in use"
> > 
> > Would you agree it is now obvious what has gone wrong? In fact would
> > you agree this is a lot better user experience than what applications
> > do today even though they have the group FD?
> > 
> 
> Currently all the discussions are around implicit vs. explicit uAPI semantics
> on the group restriction. However if we look beyond group the implicit 
> semantics might be inevitable when dealing with incompatible iommu
> domains. An existing example of iommu incompatibility is IOMMU_
> CACHE. 

I still think we need to get rid of these incompatibilities
somehow. Having multiple HW incompatible IOASID in the same platform
is just bad all around.

When modeling in userspace IOMMU_CACHE sounds like it is a property of
each individual IOASID, not an attribute that requires a new domain.

People that want to create cache bypass IOASID's should just ask for
that that directly.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

2021-05-04 Thread David Coe

Hi again!

On 04/05/2021 07:52, Suravee Suthikulpanit wrote:

On certain AMD platforms, when the IOMMU performance counter source
(csource) field is zero, power-gating for the counter is enabled, which
prevents write access and returns zero for read access.

This can cause invalid perf result especially when event multiplexing
is needed (i.e. more number of events than available counters) since
the current logic keeps track of the previously read counter value,
and subsequently re-program the counter to continue counting the event.
With power-gating enabled, we cannot gurantee successful re-programming
of the counter.

Workaround this issue by :

1. Modifying the ordering of setting/reading counters and enabing/
disabling csources to only access the counter when the csource
is set to non-zero.

2. Since AMD IOMMU PMU does not support interrupt mode, the logic
can be simplified to always start counting with value zero,
and accumulate the counter value when stopping without the need
to keep track and reprogram the counter with the previously read
counter value.



Results for Ryzen 4700U running Ubuntu 21.04 kernel 5.11.0-16 patched as 
above.


All amd_iommu events:

 Performance counter stats for 'system wide':

18   amd_iommu_0/cmd_processed/(33.29%)
 9   amd_iommu_0/cmd_processed_inv/(33.33%)
 0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/ (33.36%)
   308   amd_iommu_0/int_dte_hit/  (33.40%)
 5   amd_iommu_0/int_dte_mis/  (33.45%)
   346   amd_iommu_0/mem_dte_hit/  (33.46%)
 8,954   amd_iommu_0/mem_dte_mis/  (33.48%)
 0   amd_iommu_0/mem_iommu_tlb_pde_hit/(33.46%)
   771   amd_iommu_0/mem_iommu_tlb_pde_mis/(33.44%)
14   amd_iommu_0/mem_iommu_tlb_pte_hit/(33.40%)
   836   amd_iommu_0/mem_iommu_tlb_pte_mis/(33.36%)
 0   amd_iommu_0/mem_pass_excl/(33.32%)
 0   amd_iommu_0/mem_pass_pretrans/(33.28%)
 1,601   amd_iommu_0/mem_pass_untrans/ (33.27%)
 0   amd_iommu_0/mem_target_abort/ (33.27%)
 1,130   amd_iommu_0/mem_trans_total/  (33.27%)
 0   amd_iommu_0/page_tbl_read_gst/(33.27%)
   312   amd_iommu_0/page_tbl_read_nst/(33.28%)
   279   amd_iommu_0/page_tbl_read_tot/(33.27%)
 0   amd_iommu_0/smi_blk/  (33.29%)
 0   amd_iommu_0/smi_recv/ (33.27%)
 0   amd_iommu_0/tlb_inv/  (33.26%)
 0   amd_iommu_0/vapic_int_guest/  (33.25%)
   366   amd_iommu_0/vapic_int_non_guest/  (33.27%)

  10.001941666 seconds time elapsed


Groups of 8 amd_iommu events:

 Performance counter stats for 'system wide':

14   amd_iommu_0/cmd_processed/ 

 7   amd_iommu_0/cmd_processed_inv/ 

 0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/ 

   502   amd_iommu_0/int_dte_hit/ 

 6   amd_iommu_0/int_dte_mis/ 

   532   amd_iommu_0/mem_dte_hit/ 

13,622   amd_iommu_0/mem_dte_mis/ 

   159   amd_iommu_0/mem_iommu_tlb_pde_hit/ 



  10.002170562 seconds time elapsed


 Performance counter stats for 'system wide':

   762   amd_iommu_0/mem_iommu_tlb_pde_mis/ 

20   amd_iommu_0/mem_iommu_tlb_pte_hit/ 

   698   amd_iommu_0/mem_iommu_tlb_pte_mis/ 

 0   amd_iommu_0/mem_pass_excl/ 

 0   amd_iommu_0/mem_pass_pretrans/ 

15   amd_iommu_0/mem_pass_untrans/ 

 0   amd_iommu_0/mem_target_abort/ 

   718   amd_iommu_0/mem_trans_total/ 



  10.001683428 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/page_tbl_read_gst/ 

33   amd_iommu_0/page_tbl_read_nst/ 

33   amd_iommu_0/page_tbl_read_tot/ 

 0   amd_iommu_0/smi_blk/ 

 0   amd_iommu_0/smi_recv/ 

 0   amd_iommu_0/tlb_inv/ 

 0   amd_iommu_0/vapic_int_guest/ 

11,638   amd_iommu_0/vapic_int_non_guest/ 



  10.002205748 seconds time elapsed
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-04 Thread Jason Gunthorpe
On Tue, May 04, 2021 at 09:22:55AM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Wed, 28 Apr 2021 17:46:06 -0300, Jason Gunthorpe  wrote:
> 
> > > > I think the name IOASID is fine for the uAPI, the kernel version can
> > > > be called ioasid_id or something.  
> > > 
> > > ioasid is already an id and then ioasid_id just adds confusion. Another
> > > point is that ioasid is currently used to represent both PCI PASID and
> > > ARM substream ID in the kernel. It implies that if we want to separate
> > > ioasid and pasid in the uAPI the 'pasid' also needs to be replaced with
> > > another general term usable for substream ID. Are we making the
> > > terms too confusing here?  
> > 
> > This is why I also am not so sure about exposing the PASID in the API
> > because it is ultimately a HW specific item.
> > 
> > As I said to David, one avenue is to have some generic uAPI that is
> > very general and keep all this deeply detailed stuff, that really only
> > matters for qemu, as part of a more HW specific vIOMMU driver
> > interface.
> I think it is not just for QEMU. I am assuming you meant PASID is
> needed for guest driver to program assigned but not mediated devices.

Anything that directly operates the device and tries to instantiate
PASIDs for vfio-pci devices will need to understand the PASID.

> User space drivers may also need to get the real HW PASID to program it on
> to the HW. So this uAPI need to provide some lookup functionality. Perhaps
> the kernel generic version can be called ioasid_hw_id?
> 
> So we have the following per my understanding:
> - IOASID: a userspace logical number which identifies a page table, this can
> be a first level (GVA-GPA), or a second level (GPA->HPA) page table.
> - PASID: strictly defined in PCIe term
> - Substream ID: strictly defined in ARM SMMUv3 spec.
> - IOASID_HW_ID: a generic ID backed by PASID, Substream ID, or any other
>HW IDs used to tag DMA
> 
> Is that right?

It is reasonable. If a IOASID_HW_ID IOCTL can back with a enum that
qualified its exact nature it might be perfectly fine.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-04 Thread Jacob Pan
Hi Jason,

On Wed, 28 Apr 2021 17:46:06 -0300, Jason Gunthorpe  wrote:

> > > I think the name IOASID is fine for the uAPI, the kernel version can
> > > be called ioasid_id or something.  
> > 
> > ioasid is already an id and then ioasid_id just adds confusion. Another
> > point is that ioasid is currently used to represent both PCI PASID and
> > ARM substream ID in the kernel. It implies that if we want to separate
> > ioasid and pasid in the uAPI the 'pasid' also needs to be replaced with
> > another general term usable for substream ID. Are we making the
> > terms too confusing here?  
> 
> This is why I also am not so sure about exposing the PASID in the API
> because it is ultimately a HW specific item.
> 
> As I said to David, one avenue is to have some generic uAPI that is
> very general and keep all this deeply detailed stuff, that really only
> matters for qemu, as part of a more HW specific vIOMMU driver
> interface.
I think it is not just for QEMU. I am assuming you meant PASID is
needed for guest driver to program assigned but not mediated devices.

User space drivers may also need to get the real HW PASID to program it on
to the HW. So this uAPI need to provide some lookup functionality. Perhaps
the kernel generic version can be called ioasid_hw_id?

So we have the following per my understanding:
- IOASID: a userspace logical number which identifies a page table, this can
be a first level (GVA-GPA), or a second level (GPA->HPA) page table.
- PASID: strictly defined in PCIe term
- Substream ID: strictly defined in ARM SMMUv3 spec.
- IOASID_HW_ID: a generic ID backed by PASID, Substream ID, or any other
 HW IDs used to tag DMA

Is that right?

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-04 Thread Jacob Pan
Hi Kevin,

On Wed, 28 Apr 2021 06:34:11 +, "Tian, Kevin" 
wrote:

> > 
> > (also looking at ioasid.c, why do we need such a thin and odd wrapper
> > around xarray?)
> >   
> 
> I'll leave it to Jean and Jacob.
I am not sure whether you are referring to the current ioasid.c or the
changes proposed in this patchset. I added a per VM/ioasid_set
(also per /dev/ioasid fd) xarray to store guest-host PASID mapping.

The current code has a xarray for the allocators.

struct ioasid_allocator_data {
struct ioasid_allocator_ops *ops;
struct list_head list;
struct list_head slist;
#define IOASID_ALLOCATOR_CUSTOM BIT(0) /* Needs framework to track results */
unsigned long flags;
struct xarray xa;
struct rcu_head rcu;
};

Could you elaborate?

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/4] dt-bindings: iommu: rockchip: Add compatible for v2

2021-05-04 Thread Ezequiel Garcia
Hi Rob,
 
On Friday, April 30, 2021 22:34 -03, Rob Herring  wrote: 
 
> On Thu, Apr 22, 2021 at 04:16:00PM +0200, Benjamin Gaignard wrote:
> > Add compatible for the second version of IOMMU hardware block.
> > RK356x IOMMU can also be link to a power domain.
> > 
> > Signed-off-by: Benjamin Gaignard 
> > ---
> > version 2:
> >  - Add power-domains property
> > 
> >  .../devicetree/bindings/iommu/rockchip,iommu.yaml  | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml 
> > b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
> > index 0db208cf724a..e54353ccd1ec 100644
> > --- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
> > +++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
> > @@ -19,7 +19,9 @@ description: |+
> >  
> >  properties:
> >compatible:
> > -const: rockchip,iommu
> > +enum:
> > +  - rockchip,iommu
> > +  - rockchip,iommu-v2
> 
> This should be SoC specific.
> 

It seems iommu-v2 is really the name Rockchip gives to this IOMMU IP core.
Can we keep the "rockchip,iommu-v2" compatible, and add SoC-specific ones, as 
we normally do:

compatible = "rockchip,rk3568-iommu", "rockchip,iommu-v2";

Just like we'd do with any peripheral:

compatible = "st,stm32-dwmac", "snps,dwmac-3.50a";

Thanks!
Ezequiel

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

2021-05-04 Thread Peter Zijlstra
On Tue, May 04, 2021 at 06:58:29PM +0700, Suthikulpanit, Suravee wrote:
> Peter,
> 
> On 5/4/2021 4:39 PM, Peter Zijlstra wrote:
> > On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:
> > 
> > > 2. Since AMD IOMMU PMU does not support interrupt mode, the logic
> > > can be simplified to always start counting with value zero,
> > > and accumulate the counter value when stopping without the need
> > > to keep track and reprogram the counter with the previously read
> > > counter value.
> > 
> > This relies on the hardware counter being the full 64bit wide, is it?
> > 
> 
> The HW counter value is 48-bit. Not sure why it needs to be 64-bit?
> I might be missing some points here? Could you please describe?

How do you deal with the 48bit overflow if you don't use the interrupt?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

2021-05-04 Thread Suthikulpanit, Suravee

Peter,

On 5/4/2021 4:39 PM, Peter Zijlstra wrote:

On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:


2. Since AMD IOMMU PMU does not support interrupt mode, the logic
can be simplified to always start counting with value zero,
and accumulate the counter value when stopping without the need
to keep track and reprogram the counter with the previously read
counter value.


This relies on the hardware counter being the full 64bit wide, is it?



The HW counter value is 48-bit. Not sure why it needs to be 64-bit?
I might be missing some points here? Could you please describe?

Thanks,
Suravee


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

2021-05-04 Thread Peter Zijlstra
On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:

> 2. Since AMD IOMMU PMU does not support interrupt mode, the logic
>can be simplified to always start counting with value zero,
>and accumulate the counter value when stopping without the need
>to keep track and reprogram the counter with the previously read
>counter value.

This relies on the hardware counter being the full 64bit wide, is it?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RESEND PATCH v2] iommu/amd: Fix extended features logging

2021-05-04 Thread Alexander Monakov
print_iommu_info prints the EFR register and then the decoded list of
features on a separate line:

pci :00:00.2: AMD-Vi: Extended features (0x206d73ef22254ade):
 PPR X2APIC NX GT IA GA PC GA_vAPIC

The second line is emitted via 'pr_cont', which causes it to have a
different ('warn') loglevel compared to the previous line ('info').

Commit 9a295ff0ffc9 attempted to rectify this by removing the newline
from the pci_info format string, but this doesn't work, as pci_info
calls implicitly append a newline anyway.

Printing the decoded features on the same line would make it quite long.
Instead, change pci_info() to pr_info() to omit PCI bus location info,
which is also shown in the preceding message. This results in:

pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
AMD-Vi: Extended features (0x206d73ef22254ade): PPR X2APIC NX GT IA GA PC 
GA_vAPIC
AMD-Vi: Interrupt remapping enabled

Fixes: 9a295ff0ffc9 ("iommu/amd: Print extended features in one line to fix 
divergent log levels")
Link: 
https://lore.kernel.org/lkml/alpine.lnx.2.20.13.2104112326460.11...@monopod.intra.ispras.ru
Signed-off-by: Alexander Monakov 
Cc: Paul Menzel 
Cc: Joerg Roedel 
Cc: Suravee Suthikulpanit 
Cc: iommu@lists.linux-foundation.org
---
 drivers/iommu/amd/init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 429a4baa3aee..8f0eb865119a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1954,8 +1954,8 @@ static void print_iommu_info(void)
pci_info(pdev, "Found IOMMU cap 0x%x\n", iommu->cap_ptr);
 
if (iommu->cap & (1 << IOMMU_CAP_EFR)) {
-   pci_info(pdev, "Extended features (%#llx):",
-iommu->features);
+   pr_info("Extended features (%#llx):", iommu->features);
+
for (i = 0; i < ARRAY_SIZE(feat_str); ++i) {
if (iommu_feature(iommu, (1ULL << i)))
pr_cont(" %s", feat_str[i]);

base-commit: 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
-- 
2.30.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 4/4] iommu: rockchip: Add support iommu v2

2021-05-04 Thread Benjamin Gaignard
From: Simon Xue 

RK356x SoC got new IOMMU hardware block (version 2).
Add a compatible to distinguish it from the first version.

Signed-off-by: Simon Xue 
[Benjamin]
- port driver from kernel 4.19 to 5.12
- change functions prototype.
- squash all fixes in this commit.
Signed-off-by: Benjamin Gaignard 
---
version 3:
 - Change compatible name to use SoC name

 drivers/iommu/rockchip-iommu.c | 422 +++--
 1 file changed, 406 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index e5d86b7177de..32dcf0aaec18 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -81,6 +82,30 @@
   */
 #define RK_IOMMU_PGSIZE_BITMAP 0x007ff000
 
+#define DT_LO_MASK 0xf000
+#define DT_HI_MASK GENMASK_ULL(39, 32)
+#define DT_SHIFT   28
+
+#define DTE_BASE_HI_MASK GENMASK(11, 4)
+
+#define PAGE_DESC_LO_MASK   0xf000
+#define PAGE_DESC_HI1_LOWER 32
+#define PAGE_DESC_HI1_UPPER 35
+#define PAGE_DESC_HI2_LOWER 36
+#define PAGE_DESC_HI2_UPPER 39
+#define PAGE_DESC_HI_MASK1  GENMASK_ULL(PAGE_DESC_HI1_UPPER, 
PAGE_DESC_HI1_LOWER)
+#define PAGE_DESC_HI_MASK2  GENMASK_ULL(PAGE_DESC_HI2_UPPER, 
PAGE_DESC_HI2_LOWER)
+
+#define DTE_HI1_LOWER 8
+#define DTE_HI1_UPPER 11
+#define DTE_HI2_LOWER 4
+#define DTE_HI2_UPPER 7
+#define DTE_HI_MASK1  GENMASK(DTE_HI1_UPPER, DTE_HI1_LOWER)
+#define DTE_HI_MASK2  GENMASK(DTE_HI2_UPPER, DTE_HI2_LOWER)
+
+#define PAGE_DESC_HI_SHIFT1 (PAGE_DESC_HI1_LOWER - DTE_HI1_LOWER)
+#define PAGE_DESC_HI_SHIFT2 (PAGE_DESC_HI2_LOWER - DTE_HI2_LOWER)
+
 struct rk_iommu_domain {
struct list_head iommus;
u32 *dt; /* page directory table */
@@ -91,6 +116,10 @@ struct rk_iommu_domain {
struct iommu_domain domain;
 };
 
+struct rockchip_iommu_data {
+   u32 version;
+};
+
 /* list of clocks required by IOMMU */
 static const char * const rk_iommu_clocks[] = {
"aclk", "iface",
@@ -108,6 +137,7 @@ struct rk_iommu {
struct list_head node; /* entry in rk_iommu_domain.iommus */
struct iommu_domain *domain; /* domain to which iommu is attached */
struct iommu_group *group;
+   u32 version;
 };
 
 struct rk_iommudata {
@@ -174,11 +204,32 @@ static struct rk_iommu_domain *to_rk_domain(struct 
iommu_domain *dom)
 #define RK_DTE_PT_ADDRESS_MASK0xf000
 #define RK_DTE_PT_VALID   BIT(0)
 
+/*
+ * In v2:
+ * 31:12 - PT address bit 31:0
+ * 11: 8 - PT address bit 35:32
+ *  7: 4 - PT address bit 39:36
+ *  3: 1 - Reserved
+ * 0 - 1 if PT @ PT address is valid
+ */
+#define RK_DTE_PT_ADDRESS_MASK_V2 0xfff0
+
 static inline phys_addr_t rk_dte_pt_address(u32 dte)
 {
return (phys_addr_t)dte & RK_DTE_PT_ADDRESS_MASK;
 }
 
+static inline phys_addr_t rk_dte_pt_address_v2(u32 dte)
+{
+   u64 dte_v2 = dte;
+
+   dte_v2 = ((dte_v2 & DTE_HI_MASK2) << PAGE_DESC_HI_SHIFT2) |
+((dte_v2 & DTE_HI_MASK1) << PAGE_DESC_HI_SHIFT1) |
+(dte_v2 & PAGE_DESC_LO_MASK);
+
+   return (phys_addr_t)dte_v2;
+}
+
 static inline bool rk_dte_is_pt_valid(u32 dte)
 {
return dte & RK_DTE_PT_VALID;
@@ -189,6 +240,15 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
return (pt_dma & RK_DTE_PT_ADDRESS_MASK) | RK_DTE_PT_VALID;
 }
 
+static inline u32 rk_mk_dte_v2(dma_addr_t pt_dma)
+{
+   pt_dma = (pt_dma & PAGE_DESC_LO_MASK) |
+((pt_dma & PAGE_DESC_HI_MASK1) >> PAGE_DESC_HI_SHIFT1) |
+(pt_dma & PAGE_DESC_HI_MASK2) >> PAGE_DESC_HI_SHIFT2;
+
+   return (pt_dma & RK_DTE_PT_ADDRESS_MASK_V2) | RK_DTE_PT_VALID;
+}
+
 /*
  * Each PTE has a Page address, some flags and a valid bit:
  * +-+---+---+-+
@@ -215,11 +275,37 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
 #define RK_PTE_PAGE_READABLE  BIT(1)
 #define RK_PTE_PAGE_VALID BIT(0)
 
+/*
+ * In v2:
+ * 31:12 - Page address bit 31:0
+ *  11:9 - Page address bit 34:32
+ *   8:4 - Page address bit 39:35
+ * 3 - Security
+ * 2 - Readable
+ * 1 - Writable
+ * 0 - 1 if Page @ Page address is valid
+ */
+#define RK_PTE_PAGE_ADDRESS_MASK_V2  0xfff0
+#define RK_PTE_PAGE_FLAGS_MASK_V20x000e
+#define RK_PTE_PAGE_READABLE_V2  BIT(2)
+#define RK_PTE_PAGE_WRITABLE_V2  BIT(1)
+
 static inline phys_addr_t rk_pte_page_address(u32 pte)
 {
return (phys_addr_t)pte & RK_PTE_PAGE_ADDRESS_MASK;
 }
 
+static inline phys_addr_t rk_pte_page_address_v2(u32 pte)
+{
+   u64 pte_v2 = pte;
+
+   pte_v2 = ((pte_v2 & DTE_HI_MASK2) << PAGE_DESC_HI_SHIFT2) |
+((pte_v2 & DTE_HI_MASK1) << PAGE_DESC_HI_SHIFT1) |
+(pte_v2 & PAGE_DESC_LO_MASK);
+
+   return (phys_addr_t)pte_v2;
+}
+
 static inline bool rk_pte_is_page_valid(u32 pte)
 {
return pte & RK_PTE_PAGE_VALID;
@@ -229,12 +315,26 @@ static inline bool rk_pte_is_page_valid(u32 

[PATCH v3 2/4] dt-bindings: iommu: rockchip: Add compatible for v2

2021-05-04 Thread Benjamin Gaignard
Add compatible for the second version of IOMMU hardware block.
RK356x IOMMU can also be link to a power domain.

Signed-off-by: Benjamin Gaignard 
---
version 3:
 - Rename compatible with SoC name

version 2:
 - Add power-domains property

 .../devicetree/bindings/iommu/rockchip,iommu.yaml  | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
index 0db208cf724a..7f6bca185b5f 100644
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
@@ -19,7 +19,9 @@ description: |+
 
 properties:
   compatible:
-const: rockchip,iommu
+enum:
+  - rockchip,iommu
+  - rockchip,rk3568-iommu
 
   reg:
 minItems: 1
@@ -46,6 +48,9 @@ properties:
   "#iommu-cells":
 const: 0
 
+  power-domains:
+maxItems: 1
+
   rockchip,disable-mmu-reset:
 $ref: /schemas/types.yaml#/definitions/flag
 description: |
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/4] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema

2021-05-04 Thread Benjamin Gaignard
Convert Rockchip IOMMU to DT schema

Signed-off-by: Benjamin Gaignard 
---
version 2:
 - Change maintainer
 - Change reg maxItems
 - Change interrupt maxItems

 .../bindings/iommu/rockchip,iommu.txt | 38 -
 .../bindings/iommu/rockchip,iommu.yaml| 79 +++
 2 files changed, 79 insertions(+), 38 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
 create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
deleted file mode 100644
index 6ecefea1c6f9..
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-Rockchip IOMMU
-==
-
-A Rockchip DRM iommu translates io virtual addresses to physical addresses for
-its master device.  Each slave device is bound to a single master device, and
-shares its clocks, power domain and irq.
-
-Required properties:
-- compatible  : Should be "rockchip,iommu"
-- reg : Address space for the configuration registers
-- interrupts  : Interrupt specifier for the IOMMU instance
-- interrupt-names : Interrupt name for the IOMMU instance
-- #iommu-cells: Should be <0>.  This indicates the iommu is a
-"single-master" device, and needs no additional information
-to associate with its master device.  See:
-Documentation/devicetree/bindings/iommu/iommu.txt
-- clocks  : A list of clocks required for the IOMMU to be accessible by
-the host CPU.
-- clock-names : Should contain the following:
-   "iface" - Main peripheral bus clock (PCLK/HCL) (required)
-   "aclk"  - AXI bus clock (required)
-
-Optional properties:
-- rockchip,disable-mmu-reset : Don't use the mmu reset operation.
-  Some mmu instances may produce unexpected results
-  when the reset operation is used.
-
-Example:
-
-   vopl_mmu: iommu@ff940300 {
-   compatible = "rockchip,iommu";
-   reg = <0xff940300 0x100>;
-   interrupts = ;
-   interrupt-names = "vopl_mmu";
-   clocks = < ACLK_VOP1>, < HCLK_VOP1>;
-   clock-names = "aclk", "iface";
-   #iommu-cells = <0>;
-   };
diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
new file mode 100644
index ..0db208cf724a
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
@@ -0,0 +1,79 @@
+# SPDX-License-Identifier: GPL-2.0-only
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/rockchip,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Rockchip IOMMU
+
+maintainers:
+  - Heiko Stuebner 
+
+description: |+
+  A Rockchip DRM iommu translates io virtual addresses to physical addresses 
for
+  its master device. Each slave device is bound to a single master device and
+  shares its clocks, power domain and irq.
+
+  For information on assigning IOMMU controller to its peripheral devices,
+  see generic IOMMU bindings.
+
+properties:
+  compatible:
+const: rockchip,iommu
+
+  reg:
+minItems: 1
+maxItems: 2
+
+  interrupts:
+minItems: 1
+maxItems: 2
+
+  interrupt-names:
+minItems: 1
+maxItems: 2
+
+  clocks:
+items:
+  - description: Core clock
+  - description: Interface clock
+
+  clock-names:
+items:
+  - const: aclk
+  - const: iface
+
+  "#iommu-cells":
+const: 0
+
+  rockchip,disable-mmu-reset:
+$ref: /schemas/types.yaml#/definitions/flag
+description: |
+  Do not use the mmu reset operation.
+  Some mmu instances may produce unexpected results
+  when the reset operation is used.
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - clock-names
+  - "#iommu-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+#include 
+#include 
+
+vopl_mmu: iommu@ff940300 {
+  compatible = "rockchip,iommu";
+  reg = <0xff940300 0x100>;
+  interrupts = ;
+  interrupt-names = "vopl_mmu";
+  clocks = < ACLK_VOP1>, < HCLK_VOP1>;
+  clock-names = "aclk", "iface";
+  #iommu-cells = <0>;
+};
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 3/4] ARM: dts: rockchip: rk322x: Fix iommu-cells properties name

2021-05-04 Thread Benjamin Gaignard
Add '#" to iommu-cells properties

Signed-off-by: Benjamin Gaignard 
---
 arch/arm/boot/dts/rk322x.dtsi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/boot/dts/rk322x.dtsi b/arch/arm/boot/dts/rk322x.dtsi
index a4dd50aaf3fc..8af2e9288029 100644
--- a/arch/arm/boot/dts/rk322x.dtsi
+++ b/arch/arm/boot/dts/rk322x.dtsi
@@ -564,7 +564,7 @@ vpu_mmu: iommu@20020800 {
interrupt-names = "vpu_mmu";
clocks = < ACLK_VPU>, < HCLK_VPU>;
clock-names = "aclk", "iface";
-   iommu-cells = <0>;
+   #iommu-cells = <0>;
status = "disabled";
};
 
@@ -575,7 +575,7 @@ vdec_mmu: iommu@20030480 {
interrupt-names = "vdec_mmu";
clocks = < ACLK_RKVDEC>, < HCLK_RKVDEC>;
clock-names = "aclk", "iface";
-   iommu-cells = <0>;
+   #iommu-cells = <0>;
status = "disabled";
};
 
@@ -629,7 +629,7 @@ iep_mmu: iommu@20070800 {
interrupt-names = "iep_mmu";
clocks = < ACLK_IEP>, < HCLK_IEP>;
clock-names = "aclk", "iface";
-   iommu-cells = <0>;
+   #iommu-cells = <0>;
status = "disabled";
};
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/4] Add driver for rk356x

2021-05-04 Thread Benjamin Gaignard
This series adds the IOMMU driver for rk356x SoC.
Since a new compatible is needed to distinguish this second version of 
IOMMU hardware block from the first one, it is an opportunity to convert
the binding to DT schema.

version 3:
 - Rename compatible with soc prefix
 - Rebase on v5.12 tag

version 2:
 - Fix iommu-cells typo in rk322x.dtsi
 - Change maintainer
 - Change reg maxItems
 - Add power-domains property

Benjamin Gaignard (3):
  dt-bindings: iommu: rockchip: Convert IOMMU to DT schema
  dt-bindings: iommu: rockchip: Add compatible for v2
  ARM: dts: rockchip: rk322x: Fix iommu-cells properties name

Simon Xue (1):
  iommu: rockchip: Add support iommu v2

 .../bindings/iommu/rockchip,iommu.txt |  38 --
 .../bindings/iommu/rockchip,iommu.yaml|  84 
 arch/arm/boot/dts/rk322x.dtsi |   6 +-
 drivers/iommu/rockchip-iommu.c| 422 +-
 4 files changed, 493 insertions(+), 57 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
 create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[GIT PULL] dma-mapping updates for Linux 5.13

2021-05-04 Thread Christoph Hellwig
The following changes since commit 1e28eed17697bcf343c6743f0028cc3b5dd88bf0:

  Linux 5.12-rc3 (2021-03-14 14:41:02 -0700)

are available in the Git repository at:

  git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.13

for you to fetch changes up to a7f3d3d3600c8ed119eb0d2483de0062ce2e3707:

  dma-mapping: add unlikely hint to error path in dma_mapping_error (2021-04-02 
16:41:08 +0200)


dma-mapping updates for Linux 5.13:

 - add a new dma_alloc_noncontiguous API (me, Ricardo Ribalda)
 - fix a copyright noice (Hao Fang)
 - add an unlikely annotation to dma_mapping_error (Heiner Kallweit)
 - remove a pointless empty line (Wang Qing)
 - add support for multi-pages map/unmap bencharking (Xiang Chen)


Christoph Hellwig (5):
  dma-mapping: add a dma_mmap_pages helper
  dma-mapping: refactor dma_{alloc,free}_pages
  dma-mapping: add a dma_alloc_noncontiguous API
  dma-iommu: refactor iommu_dma_alloc_remap
  dma-iommu: implement ->alloc_noncontiguous

Hao Fang (1):
  dma-mapping: benchmark: use the correct HiSilicon copyright

Heiner Kallweit (1):
  dma-mapping: add unlikely hint to error path in dma_mapping_error

Ricardo Ribalda (1):
  media: uvcvideo: Use dma_alloc_noncontiguous API

Wang Qing (1):
  dma-mapping: remove a pointless empty line in dma_alloc_coherent

Xiang Chen (1):
  dma-mapping: benchmark: Add support for multi-pages map/unmap

 Documentation/core-api/dma-api.rst  |  88 ++
 drivers/iommu/dma-iommu.c   | 103 -
 drivers/media/usb/uvc/uvc_video.c   |  94 +++
 drivers/media/usb/uvc/uvcvideo.h|   5 +-
 include/linux/dma-map-ops.h |  19 +++
 include/linux/dma-mapping.h |  37 +-
 kernel/dma/map_benchmark.c  |  23 ++--
 kernel/dma/mapping.c| 148 ++--
 tools/testing/selftests/dma/dma_map_benchmark.c |  22 +++-
 9 files changed, 456 insertions(+), 83 deletions(-)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/4] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema

2021-05-04 Thread Benjamin Gaignard


Le 01/05/2021 à 00:14, Rob Herring a écrit :

On Thu, Apr 22, 2021 at 02:16:53PM -0300, Ezequiel Garcia wrote:

(Adding Kever)

Hi Benjamin,

Thanks a lot for working on this, it looks amazing. Together with the great work
that Rockchip is doing, it seems RK3566/RK3568 will have decent support very 
soon.

One comment here:

On Thu, 2021-04-22 at 16:15 +0200, Benjamin Gaignard wrote:

Convert Rockchip IOMMU to DT schema

Signed-off-by: Benjamin Gaignard 
---
version 2:
  - Change maintainer
  - Change reg maxItems
  - Change interrupt maxItems

  .../bindings/iommu/rockchip,iommu.txt | 38 -
  .../bindings/iommu/rockchip,iommu.yaml    | 79 +++
  2 files changed, 79 insertions(+), 38 deletions(-)
  delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
  create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
deleted file mode 100644
index 6ecefea1c6f9..
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-Rockchip IOMMU
-==
-
-A Rockchip DRM iommu translates io virtual addresses to physical addresses for
-its master device.  Each slave device is bound to a single master device, and
-shares its clocks, power domain and irq.
-
-Required properties:
-- compatible  : Should be "rockchip,iommu"
-- reg : Address space for the configuration registers
-- interrupts  : Interrupt specifier for the IOMMU instance
-- interrupt-names : Interrupt name for the IOMMU instance
-- #iommu-cells    : Should be <0>.  This indicates the iommu is a
-    "single-master" device, and needs no additional information
-    to associate with its master device.  See:
-    Documentation/devicetree/bindings/iommu/iommu.txt
-- clocks  : A list of clocks required for the IOMMU to be accessible by
-    the host CPU.
-- clock-names : Should contain the following:
-   "iface" - Main peripheral bus clock (PCLK/HCL) (required)
-   "aclk"  - AXI bus clock (required)
-
-Optional properties:
-- rockchip,disable-mmu-reset : Don't use the mmu reset operation.
-  Some mmu instances may produce unexpected results
-  when the reset operation is used.
-
-Example:
-
-   vopl_mmu: iommu@ff940300 {
-   compatible = "rockchip,iommu";
-   reg = <0xff940300 0x100>;
-   interrupts = ;
-   interrupt-names = "vopl_mmu";
-   clocks = < ACLK_VOP1>, < HCLK_VOP1>;
-   clock-names = "aclk", "iface";
-   #iommu-cells = <0>;
-   };
diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
new file mode 100644
index ..0db208cf724a
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
@@ -0,0 +1,79 @@
+# SPDX-License-Identifier: GPL-2.0-only
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/rockchip,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Rockchip IOMMU
+
+maintainers:
+  - Heiko Stuebner 
+
+description: |+
+  A Rockchip DRM iommu translates io virtual addresses to physical addresses 
for
+  its master device. Each slave device is bound to a single master device and
+  shares its clocks, power domain and irq.
+
+  For information on assigning IOMMU controller to its peripheral devices,
+  see generic IOMMU bindings.
+
+properties:
+  compatible:
+    const: rockchip,iommu
+
+  reg:
+    minItems: 1
+    maxItems: 2
+
+  interrupts:
+    minItems: 1
+    maxItems: 2
+
+  interrupt-names:
+    minItems: 1
+    maxItems: 2
+

AFAICS, the driver supports handling multiple MMUs, and there's one reg and
interrupt cell for each MMU. IOW, there's no requirement that maxItems is 2.

Is there any way we can describe that? Or maybe just allow a bigger maximum?

With #iommu-cells == 0, how would one distinguish which IOMMU is
associated with a device? IOW, is more that 1 really usable?

If you need more just pick a maxItems value that's either the most seen
or 'should be enough'TM. If the entries are just multiple instances of
the same thing, please note that here.


In current dts files it is up to two interruptions per IOMMU hardware blocks
so I will keep it to this value.

Benjamin



Rob


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

2021-05-04 Thread Suravee Suthikulpanit
On certain AMD platforms, when the IOMMU performance counter source
(csource) field is zero, power-gating for the counter is enabled, which
prevents write access and returns zero for read access.

This can cause invalid perf result especially when event multiplexing
is needed (i.e. more number of events than available counters) since
the current logic keeps track of the previously read counter value,
and subsequently re-program the counter to continue counting the event.
With power-gating enabled, we cannot gurantee successful re-programming
of the counter.

Workaround this issue by :

1. Modifying the ordering of setting/reading counters and enabing/
   disabling csources to only access the counter when the csource
   is set to non-zero.

2. Since AMD IOMMU PMU does not support interrupt mode, the logic
   can be simplified to always start counting with value zero,
   and accumulate the counter value when stopping without the need
   to keep track and reprogram the counter with the previously read
   counter value.

This has been tested on systems with and without power-gating.

Fixes: 994d6608efe4 ("iommu/amd: Remove performance counter pre-initialization 
test")
Suggested-by: Alexander Monakov 
Cc: David Coe 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/events/amd/iommu.c | 47 -
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 1c1a7e45dc64..913745f1419b 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -19,8 +19,6 @@
 #include "../perf_event.h"
 #include "iommu.h"
 
-#define COUNTER_SHIFT  16
-
 /* iommu pmu conf masks */
 #define GET_CSOURCE(x) ((x)->conf & 0xFFULL)
 #define GET_DEVID(x)   (((x)->conf >> 8)  & 0xULL)
@@ -286,22 +284,31 @@ static void perf_iommu_start(struct perf_event *event, 
int flags)
WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
hwc->state = 0;
 
+   /*
+* To account for power-gating, which prevents write to
+* the counter, we need to enable the counter
+* before setting up counter register.
+*/
+   perf_iommu_enable_event(event);
+
if (flags & PERF_EF_RELOAD) {
-   u64 prev_raw_count = local64_read(>prev_count);
+   u64 count = 0;
struct amd_iommu *iommu = perf_event_2_iommu(event);
 
+   /*
+* Since the IOMMU PMU only support counting mode,
+* the counter always start with value zero.
+*/
amd_iommu_pc_set_reg(iommu, hwc->iommu_bank, hwc->iommu_cntr,
-IOMMU_PC_COUNTER_REG, _raw_count);
+IOMMU_PC_COUNTER_REG, );
}
 
-   perf_iommu_enable_event(event);
perf_event_update_userpage(event);
-
 }
 
 static void perf_iommu_read(struct perf_event *event)
 {
-   u64 count, prev, delta;
+   u64 count;
struct hw_perf_event *hwc = >hw;
struct amd_iommu *iommu = perf_event_2_iommu(event);
 
@@ -312,14 +319,11 @@ static void perf_iommu_read(struct perf_event *event)
/* IOMMU pc counter register is only 48 bits */
count &= GENMASK_ULL(47, 0);
 
-   prev = local64_read(>prev_count);
-   if (local64_cmpxchg(>prev_count, prev, count) != prev)
-   return;
-
-   /* Handle 48-bit counter overflow */
-   delta = (count << COUNTER_SHIFT) - (prev << COUNTER_SHIFT);
-   delta >>= COUNTER_SHIFT;
-   local64_add(delta, >count);
+   /*
+* Since the counter always start with value zero,
+* simply just accumulate the count for the event.
+*/
+   local64_add(count, >count);
 }
 
 static void perf_iommu_stop(struct perf_event *event, int flags)
@@ -329,15 +333,16 @@ static void perf_iommu_stop(struct perf_event *event, int 
flags)
if (hwc->state & PERF_HES_UPTODATE)
return;
 
+   /*
+* To account for power-gating, in which reading the counter would
+* return zero, we need to read the register before disabling.
+*/
+   perf_iommu_read(event);
+   hwc->state |= PERF_HES_UPTODATE;
+
perf_iommu_disable_event(event);
WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
hwc->state |= PERF_HES_STOPPED;
-
-   if (hwc->state & PERF_HES_UPTODATE)
-   return;
-
-   perf_iommu_read(event);
-   hwc->state |= PERF_HES_UPTODATE;
 }
 
 static int perf_iommu_add(struct perf_event *event, int flags)
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu