Re: [PATCH v2] iommu/amd: Use report_iommu_fault()

2021-08-21 Thread Lennert Buytenhek
On Thu, Aug 05, 2021 at 11:26:25AM -0500, Suthikulpanit, Suravee wrote:

> Lennert,

Hi Suravee,


> > - EVENT_FLAG_I unset, but the request was a translation request
> >   (EVENT_FLAG_TR set) or the target page was not present
> >   (EVENT_FLAG_PR unset): call report_iommu_fault(), but the RW
> >   bit will be invalid, so don't try to map it to a
> >   IOMMU_FAULT_{READ,WRITE} code
> 
> So, why do we need to call report_iommu_fault() for this case?
> My understanding is we only have IOMMU_FAULT_[READ|WRITE].
> So, if we can't identify whether the DMA is read / write,
> we should not need to call report_iommu_fauilt(), is it?

I don't think that we should just altogether avoid logging the subset
of page faults for which we can't determine the read/write direction
on AMD platforms.

E.g. "access to an unmapped address" (which will have PR=0, and thus we
won't know if it was a read or a write access) is just as much of a page
fault as "write to a read-only page" (which will have PR=1, and thus the
RW bit will be accurate) is, and for RAS purposes, both events are
equally interesting, and important to know about.

It's true that we currently don't have a way of signaling to
report_iommu_fault() (and by extension, to the io_page_fault
tracepoint) that we're not sure whether the offending access was a read
or a write, but I think we can just add a bit to include/linux/iommu.h
to indicate that, something along the lines of:

 /* iommu fault flags */
 #define IOMMU_FAULT_READ0x0
 #define IOMMU_FAULT_WRITE   0x1
+#define IOMMU_FAULT_RW_UNKNOWN  0x2

(Cc'ing Ohad Ben-Cohen, who originally added this API.)

I don't think that it would be a good idea to just not signal the page
faults for which we don't know the read/write direction.


Thanks,
Lennert



> > - EVENT_FLAG_I unset, the request is a transaction request (EVENT_FLAG_TR
> >unset) and the target page was present (EVENT_FLAG_PR set): call
> >report_iommu_fault(), and use the RW bit to set IOMMU_FAULT_{READ,WRITE}
> > 
> > So I don't think we can merge the test for EVENT_FLAG_I with the
> > test for EVENT_FLAG_TR/EVENT_FLAG_PR.
> 
> The only condition that we would report_iommu_fault is
> I=0, TR=0, PR=1, isn't it. So we should be able to just check if PR=1.
> 
> 
> > We could do something like this, if you'd prefer:
> > 
> > #define IS_IOMMU_MEM_TRANSACTION(flags) \
> > (((flags) & EVENT_FLAG_I) == 0)
> > 
> > #define IS_RW_FLAG_VALID(flags) \
> > (((flags) & (EVENT_FLAG_TR | EVENT_FLAG_PR)) == EVENT_FLAG_PR)
> > 
> > #define IS_WRITE_REQUEST(flags) \
> > (IS_RW_FLAG_VALID(flags) && (flags & EVENT_FLAG_RW))
> > 
> > And then do something like:
> > 
> > if (dev_data && IS_IOMMU_MEM_TRANSACTION(flags)) {
> > if (!report_iommu_fault(_data->domain->domain, >dev,
> > address,
> > IS_WRITE_REQUEST(flags) ?
> > IOMMU_FAULT_WRITE : IOMMU_FAULT_READ))
> 
> Actually, IS_WRITE_REQUEST() == 0 could mean:
> - I=0, TR=0, PR=1 and RW=0: This is fine.
> - I=0, (TR=1 or PR=0), and we should not be calling report_iommu_fault() here
>   since we cannot specify READ/WRITE here.
> 
> Thanks,
> Suravee
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH,RFC] iommu/amd: Recover from event log overflow

2021-08-21 Thread Lennert Buytenhek
The AMD IOMMU logs I/O page faults and such to a ring buffer in
system memory, and this ring buffer can overflow.  The AMD IOMMU
spec has the following to say about the interrupt status bit that
signals this overflow condition:

EventOverflow: Event log overflow. RW1C. Reset 0b. 1 = IOMMU
event log overflow has occurred. This bit is set when a new
event is to be written to the event log and there is no usable
entry in the event log, causing the new event information to
be discarded. An interrupt is generated when EventOverflow = 1b
and MMIO Offset 0018h[EventIntEn] = 1b. No new event log
entries are written while this bit is set. Software Note: To
resume logging, clear EventOverflow (W1C), and write a 1 to
MMIO Offset 0018h[EventLogEn].

The AMD IOMMU driver doesn't currently implement this recovery
sequence, meaning that if a ring buffer overflow occurs, logging of
EVT/PPR/GA events will cease entirely.

This patch implements the spec-mandated reset sequence, with the minor
tweak that the hardware seems to want to have a 0 written to MMIO Offset
0018h[EventLogEn] first, before writing an 1 into this field, or the
IOMMU won't actually resume logging events.

Signed-off-by: Lennert Buytenhek 
---
N.B.: I have only tested this change against an older in-house custom
kernel version.  (Where it appears to survive stress testing.)  This
version of the patch was only compile-tested.

This patch makes iommu_feature_{disable,enable}() be called at non-init
time (via amd_iommu_restart_event_logging()), and those functions perform
unguarded read-modify-write sequences on the MMIO_CONTROL_OFFSET register,
and I haven't yet verified whether that is safe to do.

Reproducing this issue is fairly easy if you can easily cause I/O page
faults -- either stick an msleep() in the amd_iommu_int_thread() loop,
or configure a slow serial console (the latter being how this was
initially found).

 drivers/iommu/amd/amd_iommu.h   |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/init.c| 10 ++
 drivers/iommu/amd/iommu.c   | 10 --
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 416815a525d6..bb95edf74415 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -14,6 +14,7 @@
 extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
 extern void amd_iommu_apply_erratum_63(u16 devid);
+extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
 extern void amd_iommu_reset_cmd_buffer(struct amd_iommu *iommu);
 extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..34b90f8b2056 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -110,6 +110,7 @@
 #define PASID_MASK 0x
 
 /* MMIO status bits */
+#define MMIO_STATUS_EVT_OVERFLOW_INT_MASK  (1 << 0)
 #define MMIO_STATUS_EVT_INT_MASK   (1 << 1)
 #define MMIO_STATUS_COM_WAIT_INT_MASK  (1 << 2)
 #define MMIO_STATUS_PPR_INT_MASK   (1 << 6)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 46280e6e1535..44d1536474ed 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -639,6 +639,16 @@ static int __init alloc_command_buffer(struct amd_iommu 
*iommu)
return iommu->cmd_buf ? 0 : -ENOMEM;
 }
 
+/*
+ * This function restarts event logging in case the IOMMU experienced
+ * an event log buffer overflow.
+ */
+void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
+{
+   iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+   iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
+}
+
 /*
  * This function resets the command buffer if the IOMMU stopped fetching
  * commands from it.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..b04f0cd5ffc2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -736,7 +736,8 @@ amd_iommu_set_pci_msi_domain(struct device *dev, struct 
amd_iommu *iommu) { }
 #endif /* !CONFIG_IRQ_REMAP */
 
 #define AMD_IOMMU_INT_MASK \
-   (MMIO_STATUS_EVT_INT_MASK | \
+   (MMIO_STATUS_EVT_OVERFLOW_INT_MASK | \
+MMIO_STATUS_EVT_INT_MASK | \
 MMIO_STATUS_PPR_INT_MASK | \
 MMIO_STATUS_GALOG_INT_MASK)
 
@@ -746,7 +747,7 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
 
while (status & AMD_IOMMU_INT_MASK) {
-   /* Enable EVT and PPR and GA interrupts again */
+   /* Enable interrupt sources again */
writel(AMD_IOMMU_INT_MASK,
iommu->mmio_base + MMIO_STATUS_OFFSET);
 
@@ 

Re: [PATCH v4 00/24] iommu: Refactor DMA domain strictness

2021-08-21 Thread Sven Peter via iommu



On Wed, Aug 18, 2021, at 17:13, Robin Murphy wrote:
> Sven - I've prepared the follow-up patches already[1], so consider 
> yourself off the hook (I see no point in trying to fix the nominal DART 
> cookie bugs between now and then) :)
> 

Great, thanks for taking care of that! :)
Just tested your branch and everything works. Feel free to add 
Acked-by: Sven Peter 
Tested-by: Sven Peter 


Best,


Sven
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu