[PATCH v2] iommu/amd: Recover from event log overflow
The AMD IOMMU logs I/O page faults and such to a ring buffer in
system memory, and this ring buffer can overflow.  The AMD IOMMU
spec has the following to say about the interrupt status bit that
signals this overflow condition:

	EventOverflow: Event log overflow. RW1C. Reset 0b.
	1 = IOMMU event log overflow has occurred.  This bit is set
	when a new event is to be written to the event log and there
	is no usable entry in the event log, causing the new event
	information to be discarded.  An interrupt is generated when
	EventOverflow = 1b and MMIO Offset 0018h[EventIntEn] = 1b.
	No new event log entries are written while this bit is set.

	Software Note: To resume logging, clear EventOverflow (W1C),
	and write a 1 to MMIO Offset 0018h[EventLogEn].

The AMD IOMMU driver doesn't currently implement this recovery
sequence, meaning that if a ring buffer overflow occurs, logging
of EVT/PPR/GA events will cease entirely.

This patch implements the spec-mandated reset sequence, with the
minor tweak that the hardware seems to want to have a 0 written to
MMIO Offset 0018h[EventLogEn] first, before writing a 1 into this
field, or the IOMMU won't actually resume logging events.

Signed-off-by: Lennert Buytenhek
---
Tested on v5.15-rc4 on a Ryzen 3700X.

Changes for v2:
- Rate limit event log overflow messages.
 drivers/iommu/amd/amd_iommu.h       |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/init.c            | 10 ++++++++++
 drivers/iommu/amd/iommu.c           | 10 ++++++++--
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 416815a525d6..bb95edf74415 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -14,6 +14,7 @@ extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
 extern void amd_iommu_apply_erratum_63(u16 devid);
+extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
 extern void amd_iommu_reset_cmd_buffer(struct amd_iommu *iommu);
 extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 8dbe61e2b3c1..095076029f8d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -110,6 +110,7 @@
 #define PASID_MASK		0x

 /* MMIO status bits */
+#define MMIO_STATUS_EVT_OVERFLOW_INT_MASK	(1 << 0)
 #define MMIO_STATUS_EVT_INT_MASK	(1 << 1)
 #define MMIO_STATUS_COM_WAIT_INT_MASK	(1 << 2)
 #define MMIO_STATUS_PPR_INT_MASK	(1 << 6)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 2a822b229bd0..f1c5eb023bda 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -654,6 +654,16 @@ static int __init alloc_command_buffer(struct amd_iommu *iommu)
 	return iommu->cmd_buf ? 0 : -ENOMEM;
 }

+/*
+ * This function restarts event logging in case the IOMMU experienced
+ * an event log buffer overflow.
+ */
+void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
+{
+	iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+	iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
+}
+
 /*
  * This function resets the command buffer if the IOMMU stopped fetching
  * commands from it.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1722bb161841..f46eb7397021 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -742,7 +742,8 @@ amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu) { }
 #endif /* !CONFIG_IRQ_REMAP */

 #define AMD_IOMMU_INT_MASK	\
-	(MMIO_STATUS_EVT_INT_MASK | \
+	(MMIO_STATUS_EVT_OVERFLOW_INT_MASK | \
+	 MMIO_STATUS_EVT_INT_MASK | \
 	 MMIO_STATUS_PPR_INT_MASK | \
 	 MMIO_STATUS_GALOG_INT_MASK)

@@ -752,7 +753,7 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 	u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);

 	while (status & AMD_IOMMU_INT_MASK) {
-		/* Enable EVT and PPR and GA interrupts again */
+		/* Enable interrupt sources again */
 		writel(AMD_IOMMU_INT_MASK,
 			iommu->mmio_base + MMIO_STATUS_OFFSET);

@@ -773,6 +774,11 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 		}
 #endif

+		if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
+			pr_info_ratelimited("IOMMU event log overflow\n");
+			amd_iommu_restart_event_logging(iommu);
+		}
+
 		/*
 		 * Hardware bug: ERBT1312
 		 * When re-enabling interrupt (by writing 1
--
2.31.1
Re: [PATCH,RFC] iommu/amd: Recover from event log overflow
On Tue, Sep 28, 2021 at 10:45:30AM +0200, Joerg Roedel wrote:

> Hi Lennert,

Hi Joerg,

Thanks for your feedback!

> > +/*
> > + * This function restarts event logging in case the IOMMU experienced
> > + * an event log buffer overflow.
> > + */
> > +void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
> > +{
> > +	iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
> > +	iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
> > +}
>
> A few more things need to happen here.  First of all head and tail are
> likely equal when the event buffer overflows, so the events will not be
> polled and reported.

I applied the following change on top of my patch, to check the state
of the event log ring buffer when we enter the IRQ handler with an
overflow condition pending:

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -752,6 +752,18 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 	struct amd_iommu *iommu = (struct amd_iommu *) data;
 	u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);

+	if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
+		u32 events;
+
+		events = readl(iommu->mmio_base + MMIO_EVT_TAIL_OFFSET) -
+			 readl(iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
+		if (events & 0x8000)
+			events += EVT_BUFFER_SIZE;
+		events /= EVENT_ENTRY_SIZE;
+
+		pr_info("Overflow with %d events queued\n", events);
+	}
+
 	while (status & AMD_IOMMU_INT_MASK) {
 		/* Enable interrupt sources again */
 		writel(AMD_IOMMU_INT_MASK,

And that gives me:

[   33.304821] AMD-Vi: Overflow with 511 events queued
[   35.304812] AMD-Vi: Overflow with 511 events queued
[   39.304791] AMD-Vi: Overflow with 511 events queued
[   40.304792] AMD-Vi: Overflow with 511 events queued
[   41.304782] AMD-Vi: Overflow with 511 events queued
[   44.009920] AMD-Vi: Overflow with 511 events queued
[   45.304768] AMD-Vi: Overflow with 511 events queued
[   46.304782] AMD-Vi: Overflow with 511 events queued
[   46.385028] AMD-Vi: Overflow with 511 events queued
[   51.304733] AMD-Vi: Overflow with 511 events queued
[   53.318892] AMD-Vi: Overflow with 511 events queued
[   60.304681] AMD-Vi: Overflow with 511 events queued
[   63.304676] AMD-Vi: Overflow with 511 events queued

In other words, it seems that the hardware considers the event log to
be full when there's still one free entry in the ring buffer, and it
will not actually fill up the last free entry and make the head and
tail pointer equal by doing so.  This seems consistent across Ryzen
3700X, Ryzen 5950X, EPYC 3151, EPYC 3251, RX-421ND, RX-216TD.

The docs talk about "When the Event Log has overflowed", but they
don't define exactly when that happens (i.e. when tail becomes equal
to head or one entry before that), but I did find this phrase that
talks about the overflow condition:

	The host software must make space in the event log after an
	overflow by reading entries (by adjusting the head pointer)
	or by resizing the log.  Event logging may then be restarted.

This seems to suggest that the hardware will never consume the last
entry in the ring buffer and thereby advance tail to be equal to head,
because if it did, there would be no need for "reading entries (by
adjusting the head pointer)" to make space in the event log buffer --
the 'empty' and 'full' conditions would be indistinguishable in that
case.

If there _is_ some variation of the hardware out there that does
advance the tail pointer to coincide with the head pointer when there
are already N-1 entries in the log and it has one more entry to write,
then this patch would still work OK on such hardware -- we would just
lose a few more events in that case than we otherwise would.  The
point of the patch is to be able to recover after a burst of I/O page
faults, at which point we've very likely already lost some events, so
I think such hypothetical lossage would be acceptable, given that none
of the hardware I have access to seems to behave that way, and that
from the wording in the docs it is unlikely to behave that way.
In fact, thinking about it a bit more, by the time an event log
overflow condition is handled, it is actually possible for the event
log ring buffer to be empty rather than full.  Imagine this scenario:

- We enter the IRQ handler, the EVT status bit is set, and the ring
  buffer is full but it hasn't overflowed yet, so OVERFLOW is not set.

- We read the interrupt status and clear the EVT bit.

- Before we call iommu_poll_events(), another event comes in, which
  causes the OVERFLOW flag to be set, since we haven't advanced head
  yet.

- iommu_poll_events() is called, and it processes all events in the
  ring buffer, which is now empty (and will remain empty, since the
  overflow bit is set).

- The MMIO_STATUS_OFFSET re-reading at the end of the IRQ loop finds
  the OVERFLOW bit set and
[PATCH v5] iommu/amd: Use report_iommu_fault()
This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
   I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
   when an I/O page fault occurs.

The latter point is the main aim of this patch, as it allows
rasdaemon-like daemons to be notified of I/O page faults, and to
possibly initiate corrective action in response.

A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already trigger this
tracepoint -- but this isn't yet the case for AMD-Vi and Intel DMAR.

The AMD IOMMU specification suggests that the bit in an I/O page fault
event log entry that signals whether an I/O page fault was for a read
request or for a write request is only meaningful when the faulting
access was to a present page, but some testing on a Ryzen 3700X
suggests that this bit encodes the correct value even for I/O page
faults to non-present pages, and therefore, this patch passes the R/W
information up the stack even for I/O page faults to non-present pages.

Signed-off-by: Lennert Buytenhek
---
Tested on v5.15-rc3 on a Ryzen 3700X and on a Ryzen 5950X, where it
has the desired effect.

Changes for v5:
- Code formatting tweaking.  (Suggested by Joerg Roedel.)

Changes for v4:
- Unconditionally pass through RW bit, after testing that suggests
  that this bit is reliable even for I/O page faults to non-present
  pages.

Changes for v3:
- Test fault flags via macros.  (Suggested by Suravee Suthikulpanit.)

Changes for v2:
- Don't call report_iommu_fault() for IRQ remapping faults.
  (Suggested by Joerg Roedel.)
 drivers/iommu/amd/amd_iommu_types.h |  2 ++
 drivers/iommu/amd/iommu.c           | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 8dbe61e2b3c1..867535eb0ce9 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,8 @@
 #define EVENT_DOMID_MASK_HI	0xf
 #define EVENT_FLAGS_MASK	0xfff
 #define EVENT_FLAGS_SHIFT	0x10
+#define EVENT_FLAG_RW		0x020
+#define EVENT_FLAG_I		0x008

 /* feature control bits */
 #define CONTROL_IOMMU_EN	0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1722bb161841..beadcffcc223 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -473,6 +473,12 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
 	pci_dev_put(pdev);
 }

+#define IS_IOMMU_MEM_TRANSACTION(flags)		\
+	(((flags) & EVENT_FLAG_I) == 0)
+
+#define IS_WRITE_REQUEST(flags)			\
+	((flags) & EVENT_FLAG_RW)
+
 static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 					u64 address, int flags)
 {
@@ -485,6 +491,20 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 		dev_data = dev_iommu_priv_get(&pdev->dev);

 	if (dev_data) {
+		/*
+		 * If this is a DMA fault (for which the I(nterrupt)
+		 * bit will be unset), allow report_iommu_fault() to
+		 * prevent logging it.
+		 */
+		if (IS_IOMMU_MEM_TRANSACTION(flags)) {
+			if (!report_iommu_fault(&dev_data->domain->domain,
+						&pdev->dev, address,
+						IS_WRITE_REQUEST(flags) ?
+							IOMMU_FAULT_WRITE :
+							IOMMU_FAULT_READ))
+				goto out;
+		}
+
 		if (__ratelimit(&dev_data->rs)) {
 			pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
 				domain_id, address, flags);
@@ -495,6 +515,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 			domain_id, address, flags);
 	}

+out:
 	if (pdev)
 		pci_dev_put(pdev);
 }
--
2.31.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4] iommu/amd: Use report_iommu_fault()
This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
   I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
   when an I/O page fault occurs.

The latter point is the main aim of this patch, as it allows
rasdaemon-like daemons to be notified of I/O page faults, and to
possibly initiate corrective action in response.

A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already trigger this
tracepoint -- but this isn't yet the case for AMD-Vi and Intel DMAR.

The AMD IOMMU specification suggests that the bit in an I/O page fault
event log entry that signals whether an I/O page fault was for a read
request or for a write request is only meaningful when the faulting
access was to a present page, but some testing on a Ryzen 3700X
suggests that this bit encodes the correct value even for I/O page
faults to non-present pages, and therefore, this patch passes the R/W
information up the stack even for I/O page faults to non-present pages.

Signed-off-by: Lennert Buytenhek
---
Tested on v5.15-rc2 on a Ryzen 3700X, where it has the desired effect.

Changes for v4:
- Unconditionally pass through RW bit, after testing that suggests
  that this bit is reliable even for I/O page faults to non-present
  pages.

Changes for v3:
- Test fault flags via macros.  (Suggested by Suravee Suthikulpanit.)

Changes for v2:
- Don't call report_iommu_fault() for IRQ remapping faults.
  (Suggested by Joerg Roedel.)
 drivers/iommu/amd/amd_iommu_types.h |  2 ++
 drivers/iommu/amd/iommu.c           | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 8dbe61e2b3c1..867535eb0ce9 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,8 @@
 #define EVENT_DOMID_MASK_HI	0xf
 #define EVENT_FLAGS_MASK	0xfff
 #define EVENT_FLAGS_SHIFT	0x10
+#define EVENT_FLAG_RW		0x020
+#define EVENT_FLAG_I		0x008

 /* feature control bits */
 #define CONTROL_IOMMU_EN	0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1722bb161841..7b592aba06f5 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -473,6 +473,12 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
 	pci_dev_put(pdev);
 }

+#define IS_IOMMU_MEM_TRANSACTION(flags)		\
+	(((flags) & EVENT_FLAG_I) == 0)
+
+#define IS_WRITE_REQUEST(flags)			\
+	((flags) & EVENT_FLAG_RW)
+
 static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 					u64 address, int flags)
 {
@@ -484,6 +490,18 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);

+	/*
+	 * If this is a DMA fault (for which the I(nterrupt) bit will
+	 * be unset), allow report_iommu_fault() to prevent logging it.
+	 */
+	if (dev_data && IS_IOMMU_MEM_TRANSACTION(flags)) {
+		if (!report_iommu_fault(&dev_data->domain->domain,
+					&pdev->dev, address,
+					IS_WRITE_REQUEST(flags) ?
+					IOMMU_FAULT_WRITE : IOMMU_FAULT_READ))
+			goto out;
+	}
+
 	if (dev_data) {
 		if (__ratelimit(&dev_data->rs)) {
 			pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
 				domain_id, address, flags);
@@ -495,6 +513,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 			domain_id, address, flags);
 	}

+out:
 	if (pdev)
 		pci_dev_put(pdev);
 }
--
2.31.1
Re: [PATCH v2] iommu/amd: Use report_iommu_fault()
On Sat, Aug 21, 2021 at 06:44:39PM +0300, Lennert Buytenhek wrote:

> > > - EVENT_FLAG_I unset, but the request was a translation request
> > >   (EVENT_FLAG_TR set) or the target page was not present
> > >   (EVENT_FLAG_PR unset): call report_iommu_fault(), but the RW
> > >   bit will be invalid, so don't try to map it to a
> > >   IOMMU_FAULT_{READ,WRITE} code
> >
> > So, why do we need to call report_iommu_fault() for this case?
> > My understanding is we only have IOMMU_FAULT_[READ|WRITE].
> > So, if we can't identify whether the DMA is read / write,
> > we should not need to call report_iommu_fault(), is it?
>
> I don't think that we should just altogether avoid logging the subset
> of page faults for which we can't determine the read/write direction
> on AMD platforms.
>
> E.g. "access to an unmapped address" (which will have PR=0, and thus we
> won't know if it was a read or a write access) is just as much of a page
> fault as "write to a read-only page" (which will have PR=1, and thus the
> RW bit will be accurate) is, and for RAS purposes, both events are
> equally interesting, and important to know about.
>
> It's true that we currently don't have a way of signaling to
> report_iommu_fault() (and by extension, to the io_page_fault
> tracepoint) that we're not sure whether the offending access was a read
> or a write, but I think we can just add a bit to include/linux/iommu.h
> to indicate that, something along the lines of:
>
> 	/* iommu fault flags */
> 	#define IOMMU_FAULT_READ		0x0
> 	#define IOMMU_FAULT_WRITE		0x1
> +	#define IOMMU_FAULT_RW_UNKNOWN		0x2
>
> (Cc'ing Ohad Ben-Cohen, who originally added this API.)
>
> I don't think that it would be a good idea to just not signal the page
> faults for which we don't know the read/write direction.
I had another look at this, and from some testing, it seems that,
contrary to what the datasheet suggests, the RW (read/write direction)
bit in logged I/O page faults is actually accurate for both faulting
accesses to present pages and faulting accesses to non-present pages.

I made a few hacks to the ixgbe driver to intentionally cause several
different types of I/O page faults, thereby turning an ixgbe NIC into
a poor man's I/O page fault generator, and these are the resulting
logged I/O page faults, on a Ryzen 3700X system:

Read from non-present page:

  ixgbe :25:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0x1bbc8f040 flags=0x]
  => flags indicate PR(esent)=0, RW=0

Write to non-present page:

  ixgbe :25:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0x1bdcb70c0 flags=0x0020]
  => flags indicate PR(esent)=0, RW=1

Read from write-only page:

  ixgbe :25:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xbbcc1c40 flags=0x0050]
  => flags indicate PR(esent)=1, RW=0 (and PE(rmission violation)=1)

Write to read-only page:

  ixgbe :25:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xbdcb70c0 flags=0x0070]
  => flags indicate PR(esent)=1, RW=1 (and PE(rmission violation)=1)

In other words, it seems that the RW bit is reliable even for PR=0
type faults.

I assume that the restriction mentioned in the docs regarding RW and
PR ("RW is only meaningful when PR=1, TR=0, and I=0", from AMD I/O
Virtualization Technology (IOMMU) Specification, revision 3.00, Table
55: IO_PAGE_FAULT Event Log Buffer Entry Fields) is either confused or
refers to a restriction in older hardware that has since been lifted.

I'll resubmit the patch to unconditionally pass through the RW bit.
ixgbe hacks: to cause reads from non-present pages (in the TX path):

--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8232,7 +8232,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 		dma_unmap_len_set(tx_buffer, len, size);
 		dma_unmap_addr_set(tx_buffer, dma, dma);

-		tx_desc->read.buffer_addr = cpu_to_le64(dma);
+		tx_desc->read.buffer_addr = cpu_to_le64(dma + BIT(32));

 		while (unlikely(size > IXGBE_MAX_DATA_PER_TXD)) {
 			tx_desc->read.cmd_type_len =

to cause writes to non-present pages (in the RX path):

--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1604,7 +1604,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		 * Refresh the desc even if buffer_addrs didn't change
 		 * because each write-back erases this info.
 		 */
-		rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_of
Re: [PATCH v2] iommu/amd: Use report_iommu_fault()
On Thu, Aug 05, 2021 at 11:26:25AM -0500, Suthikulpanit, Suravee wrote:

> Lennert,

Hi Suravee,

> > - EVENT_FLAG_I unset, but the request was a translation request
> >   (EVENT_FLAG_TR set) or the target page was not present
> >   (EVENT_FLAG_PR unset): call report_iommu_fault(), but the RW
> >   bit will be invalid, so don't try to map it to a
> >   IOMMU_FAULT_{READ,WRITE} code
>
> So, why do we need to call report_iommu_fault() for this case?
> My understanding is we only have IOMMU_FAULT_[READ|WRITE].
> So, if we can't identify whether the DMA is read / write,
> we should not need to call report_iommu_fault(), is it?

I don't think that we should just altogether avoid logging the subset
of page faults for which we can't determine the read/write direction
on AMD platforms.

E.g. "access to an unmapped address" (which will have PR=0, and thus we
won't know if it was a read or a write access) is just as much of a page
fault as "write to a read-only page" (which will have PR=1, and thus the
RW bit will be accurate) is, and for RAS purposes, both events are
equally interesting, and important to know about.

It's true that we currently don't have a way of signaling to
report_iommu_fault() (and by extension, to the io_page_fault
tracepoint) that we're not sure whether the offending access was a read
or a write, but I think we can just add a bit to include/linux/iommu.h
to indicate that, something along the lines of:

	/* iommu fault flags */
	#define IOMMU_FAULT_READ		0x0
	#define IOMMU_FAULT_WRITE		0x1
+	#define IOMMU_FAULT_RW_UNKNOWN		0x2

(Cc'ing Ohad Ben-Cohen, who originally added this API.)

I don't think that it would be a good idea to just not signal the page
faults for which we don't know the read/write direction.
Thanks,
Lennert

> > - EVENT_FLAG_I unset, the request is a transaction request
> >   (EVENT_FLAG_TR unset) and the target page was present
> >   (EVENT_FLAG_PR set): call report_iommu_fault(), and use the RW
> >   bit to set IOMMU_FAULT_{READ,WRITE}
> >
> > So I don't think we can merge the test for EVENT_FLAG_I with the
> > test for EVENT_FLAG_TR/EVENT_FLAG_PR.
>
> The only condition that we would report_iommu_fault is
> I=0, TR=0, PR=1, isn't it.  So we should be able to just check if PR=1.
>
> > We could do something like this, if you'd prefer:
> >
> > #define IS_IOMMU_MEM_TRANSACTION(flags) \
> > 	(((flags) & EVENT_FLAG_I) == 0)
> >
> > #define IS_RW_FLAG_VALID(flags) \
> > 	(((flags) & (EVENT_FLAG_TR | EVENT_FLAG_PR)) == EVENT_FLAG_PR)
> >
> > #define IS_WRITE_REQUEST(flags) \
> > 	(IS_RW_FLAG_VALID(flags) && (flags & EVENT_FLAG_RW))
> >
> > And then do something like:
> >
> > 	if (dev_data && IS_IOMMU_MEM_TRANSACTION(flags)) {
> > 		if (!report_iommu_fault(&dev_data->domain->domain, &pdev->dev,
> > 					address,
> > 					IS_WRITE_REQUEST(flags) ?
> > 					IOMMU_FAULT_WRITE : IOMMU_FAULT_READ))
>
> Actually, IS_WRITE_REQUEST() == 0 could mean:
> - I=0, TR=0, PR=1 and RW=0: This is fine.
> - I=0, (TR=1 or PR=0), and we should not be calling report_iommu_fault() here
>   since we cannot specify READ/WRITE here.
>
> Thanks,
> Suravee
[PATCH,RFC] iommu/amd: Recover from event log overflow
The AMD IOMMU logs I/O page faults and such to a ring buffer in
system memory, and this ring buffer can overflow.  The AMD IOMMU
spec has the following to say about the interrupt status bit that
signals this overflow condition:

	EventOverflow: Event log overflow. RW1C. Reset 0b.
	1 = IOMMU event log overflow has occurred.  This bit is set
	when a new event is to be written to the event log and there
	is no usable entry in the event log, causing the new event
	information to be discarded.  An interrupt is generated when
	EventOverflow = 1b and MMIO Offset 0018h[EventIntEn] = 1b.
	No new event log entries are written while this bit is set.

	Software Note: To resume logging, clear EventOverflow (W1C),
	and write a 1 to MMIO Offset 0018h[EventLogEn].

The AMD IOMMU driver doesn't currently implement this recovery
sequence, meaning that if a ring buffer overflow occurs, logging
of EVT/PPR/GA events will cease entirely.

This patch implements the spec-mandated reset sequence, with the
minor tweak that the hardware seems to want to have a 0 written to
MMIO Offset 0018h[EventLogEn] first, before writing a 1 into this
field, or the IOMMU won't actually resume logging events.

Signed-off-by: Lennert Buytenhek
---
N.B.: I have only tested this change against an older in-house custom
kernel version.  (Where it appears to survive stress testing.)  This
version of the patch was only compile-tested.

This patch makes iommu_feature_{disable,enable}() be called at
non-init time (via amd_iommu_restart_event_logging()), and those
functions perform unguarded read-modify-write sequences on the
MMIO_CONTROL_OFFSET register, and I haven't yet verified whether
that is safe to do.

Reproducing this issue is fairly easy if you can easily cause I/O
page faults -- either stick an msleep() in the amd_iommu_int_thread()
loop, or configure a slow serial console (the latter being how this
was initially found).
 drivers/iommu/amd/amd_iommu.h       |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/init.c            | 10 ++++++++++
 drivers/iommu/amd/iommu.c           | 10 ++++++++--
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 416815a525d6..bb95edf74415 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -14,6 +14,7 @@ extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
 extern void amd_iommu_apply_erratum_63(u16 devid);
+extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
 extern void amd_iommu_reset_cmd_buffer(struct amd_iommu *iommu);
 extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..34b90f8b2056 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -110,6 +110,7 @@
 #define PASID_MASK		0x

 /* MMIO status bits */
+#define MMIO_STATUS_EVT_OVERFLOW_INT_MASK	(1 << 0)
 #define MMIO_STATUS_EVT_INT_MASK	(1 << 1)
 #define MMIO_STATUS_COM_WAIT_INT_MASK	(1 << 2)
 #define MMIO_STATUS_PPR_INT_MASK	(1 << 6)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 46280e6e1535..44d1536474ed 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -639,6 +639,16 @@ static int __init alloc_command_buffer(struct amd_iommu *iommu)
 	return iommu->cmd_buf ? 0 : -ENOMEM;
 }

+/*
+ * This function restarts event logging in case the IOMMU experienced
+ * an event log buffer overflow.
+ */
+void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
+{
+	iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+	iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
+}
+
 /*
  * This function resets the command buffer if the IOMMU stopped fetching
  * commands from it.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..b04f0cd5ffc2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -736,7 +736,8 @@ amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu) { }
 #endif /* !CONFIG_IRQ_REMAP */

 #define AMD_IOMMU_INT_MASK	\
-	(MMIO_STATUS_EVT_INT_MASK | \
+	(MMIO_STATUS_EVT_OVERFLOW_INT_MASK | \
+	 MMIO_STATUS_EVT_INT_MASK | \
 	 MMIO_STATUS_PPR_INT_MASK | \
 	 MMIO_STATUS_GALOG_INT_MASK)

@@ -746,7 +747,7 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 	u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);

 	while (status & AMD_IOMMU_INT_MASK) {
-		/* Enable EVT and PPR and GA interrupts again */
+		/* Enable interrupt sources again */
 		writel(AMD_IOMMU_INT_MASK,
Re: [PATCH v2] iommu/amd: Use report_iommu_fault()
On Fri, Jul 30, 2021 at 05:32:16AM +0300, Lennert Buytenhek wrote:

> > > This patch makes iommu/amd call report_iommu_fault() when an I/O page
> > > fault occurs, which has two effects:
> > >
> > > 1) It allows device drivers to register a callback to be notified of
> > >    I/O page faults, via the iommu_set_fault_handler() API.
> > >
> > > 2) It triggers the io_page_fault tracepoint in report_iommu_fault()
> > >    when an I/O page fault occurs.
> > >
> > > I'm mainly interested in (2).  We have a daemon with some rasdaemon-like
> > > functionality for handling platform errors, and being able to be notified
> > > of I/O page faults for initiating corrective action is very useful -- and
> > > receiving such events via event tracing is a lot nicer than having to
> > > scrape them from kmsg.
> > >
> > > A number of other IOMMU drivers already use report_iommu_fault(), and
> > > I/O page faults on those IOMMUs therefore already seem to trigger this
> > > tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.
> > >
> > > I copied the logic from the other callers of report_iommu_fault(), where
> > > if that function returns zero, the driver will have handled the fault,
> > > in which case we avoid logging information about the fault to the printk
> > > buffer from the IOMMU driver.
> > > With this patch I see io_page_fault event tracing entries as expected:
> > >
> > >   irq/24-AMD-Vi-48    [002]   978.554289: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482640 flags=0x
> > >   irq/24-AMD-Vi-48    [002]   978.554294: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482650 flags=0x
> > >   irq/24-AMD-Vi-48    [002]   978.554299: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482660 flags=0x
> > >   irq/24-AMD-Vi-48    [002]   978.554305: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482670 flags=0x
> > >   irq/24-AMD-Vi-48    [002]   978.554310: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482680 flags=0x
> > >   irq/24-AMD-Vi-48    [002]   978.554315: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x914826a0 flags=0x
> > >
> > > For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU
> > > spec, but I haven't tested that bit of the code, as the page faults I
> > > encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which
> > > case EVENT_FLAG_RW doesn't make sense.
> > >
> > > Signed-off-by: Lennert Buytenhek
> > > ---
> > > Changes since v1 RFC:
> > >
> > > - Don't call report_iommu_fault() for IRQ remapping faults.
> > >   (Suggested by Joerg Roedel.)
> > >
> > >  drivers/iommu/amd/amd_iommu_types.h |  4 ++++
> > >  drivers/iommu/amd/iommu.c           | 29 +++++++++++++++++++++++++++++
> > >  2 files changed, 33 insertions(+)
> > >
> > > diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> > > index 94c1a7a9876d..2f2c6630c24c 100644
> > > --- a/drivers/iommu/amd/amd_iommu_types.h
> > > +++ b/drivers/iommu/amd/amd_iommu_types.h
> > > @@ -138,6 +138,10 @@
> > >  #define EVENT_DOMID_MASK_HI	0xf
> > >  #define EVENT_FLAGS_MASK	0xfff
> > >  #define EVENT_FLAGS_SHIFT	0x10
> > > +#define EVENT_FLAG_TR	0x100
> > > +#define EVENT_FLAG_RW	0x020
> > > +#define EVENT_FLAG_PR	0x010
> > > +#define EVENT_FLAG_I	0x008
> > >
> > >  /* feature control bits */
> > >  #define CONTROL_IOMMU_EN	0x00ULL
> > > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > > index a7d6d78147b7..d9fb2c22d44a 100644
> > > --- a/drivers/iommu/amd/iommu.c
> > > +++ b/drivers/iommu/amd/iommu.c
> >
> > What if we introduce:
> >
> > +/*
> > + * AMD I/O Virtualization Technology (IOMMU) Specification,
> > + * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
> > + * that the RW ("read-write") bit is only valid if the I/O
> > + * page fault was caused by a memory transaction request
> > + * referencing a page that was marked present.
> > + */
> > +#define IO_PAGE_FAULT_MEM_MASK \
> >
[PATCH v3] iommu/amd: Use report_iommu_fault()
This patch makes iommu/amd call report_iommu_fault() when an I/O page fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
   I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
   when an I/O page fault occurs.

I'm mainly interested in (2). We have a daemon with some rasdaemon-like functionality for handling platform errors, and being able to be notified of I/O page faults for initiating corrective action is very useful -- and receiving such events via event tracing is a lot nicer than having to scrape them from kmsg.

A number of other IOMMU drivers already use report_iommu_fault(), and I/O page faults on those IOMMUs therefore already seem to trigger this tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.

I copied the logic from the other callers of report_iommu_fault(), where if that function returns zero, the driver will have handled the fault, in which case we avoid logging information about the fault to the printk buffer from the IOMMU driver.
With this patch I see io_page_fault event tracing entries as expected:

    irq/24-AMD-Vi-48  [002]   978.554289: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482640 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554294: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482650 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554299: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482660 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554305: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482670 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554310: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482680 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554315: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x914826a0 flags=0x

For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU spec, but I haven't tested that bit of the code, as the page faults I encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which case EVENT_FLAG_RW doesn't make sense.

Signed-off-by: Lennert Buytenhek
---
Changes for v3:

- Test fault flags via macros.
  (Suggested by Suravee Suthikulpanit.)

Changes for v2:

- Don't call report_iommu_fault() for IRQ remapping faults.
  (Suggested by Joerg Roedel.)
 drivers/iommu/amd/amd_iommu_types.h |  4
 drivers/iommu/amd/iommu.c           | 29 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..2f2c6630c24c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,10 @@
 #define EVENT_DOMID_MASK_HI	0xf0000
 #define EVENT_FLAGS_MASK	0xfff
 #define EVENT_FLAGS_SHIFT	0x10
+#define EVENT_FLAG_TR	0x100
+#define EVENT_FLAG_RW	0x020
+#define EVENT_FLAG_PR	0x010
+#define EVENT_FLAG_I	0x008
 
 /* feature control bits */
 #define CONTROL_IOMMU_EN	0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a7d6d78147b7..00975b08bd3f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -473,6 +473,22 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
 	pci_dev_put(pdev);
 }
 
+/*
+ * AMD I/O Virtualization Technology (IOMMU) Specification, revision
+ * 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says that the RW
+ * ("read-write") bit is only valid if the I/O page fault was caused
+ * by a memory transaction request referencing a page that was marked
+ * present.
+ */
+#define IS_IOMMU_MEM_TRANSACTION(flags)		\
+	(((flags) & EVENT_FLAG_I) == 0)
+
+#define IS_RW_FLAG_VALID(flags)			\
+	(((flags) & (EVENT_FLAG_TR | EVENT_FLAG_PR)) == EVENT_FLAG_PR)
+
+#define IS_WRITE_REQUEST(flags)			\
+	(IS_RW_FLAG_VALID(flags) && ((flags) & EVENT_FLAG_RW))
+
 static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 					u64 address, int flags)
 {
@@ -484,6 +500,18 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
+	/*
+	 * If this is a DMA fault (for which the I(nterrupt) bit will
+	 * be unset), allow report_iommu_fault() to prevent logging it.
+	 */
+	if (dev_data && IS_IOMMU_MEM_TRANSACTION(flags)) {
+		if (!report_iommu_fault(&dev_data->domain->domain,
+					&pdev->dev, address,
+					IS_WRITE_REQUEST(flags) ?
+						IOMMU_FA
Re: [PATCH v2] iommu/amd: Use report_iommu_fault()
On Wed, Jul 28, 2021 at 04:51:27PM -0500, Suthikulpanit, Suravee wrote: > Lennert, Hi Suravee, > > This patch makes iommu/amd call report_iommu_fault() when an I/O page > > fault occurs, which has two effects: > > > > 1) It allows device drivers to register a callback to be notified of > > I/O page faults, via the iommu_set_fault_handler() API. > > > > 2) It triggers the io_page_fault tracepoint in report_iommu_fault() > > when an I/O page fault occurs. > > > > I'm mainly interested in (2). We have a daemon with some rasdaemon-like > > functionality for handling platform errors, and being able to be notified > > of I/O page faults for initiating corrective action is very useful -- and > > receiving such events via event tracing is a lot nicer than having to > > scrape them from kmsg. > > > > A number of other IOMMU drivers already use report_iommu_fault(), and > > I/O page faults on those IOMMUs therefore already seem to trigger this > > tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR. > > > > I copied the logic from the other callers of report_iommu_fault(), where > > if that function returns zero, the driver will have handled the fault, > > in which case we avoid logging information about the fault to the printk > > buffer from the IOMMU driver. 
> > > > With this patch I see io_page_fault event tracing entries as expected: > > > > irq/24-AMD-Vi-48[002] 978.554289: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x91482640 flags=0x > > irq/24-AMD-Vi-48[002] 978.554294: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x91482650 flags=0x > > irq/24-AMD-Vi-48[002] 978.554299: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x91482660 flags=0x > > irq/24-AMD-Vi-48[002] 978.554305: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x91482670 flags=0x > > irq/24-AMD-Vi-48[002] 978.554310: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x91482680 flags=0x > > irq/24-AMD-Vi-48[002] 978.554315: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x914826a0 flags=0x > > > > For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU > > spec, but I haven't tested that bit of the code, as the page faults I > > encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which > > case EVENT_FLAG_RW doesn't make sense. > > > > Signed-off-by: Lennert Buytenhek > > --- > > Changes since v1 RFC: > > > > - Don't call report_iommu_fault() for IRQ remapping faults. > >(Suggested by Joerg Roedel.) 
> > > > drivers/iommu/amd/amd_iommu_types.h | 4 > > drivers/iommu/amd/iommu.c | 29 + > > 2 files changed, 33 insertions(+) > > > > diff --git a/drivers/iommu/amd/amd_iommu_types.h > > b/drivers/iommu/amd/amd_iommu_types.h > > index 94c1a7a9876d..2f2c6630c24c 100644 > > --- a/drivers/iommu/amd/amd_iommu_types.h > > +++ b/drivers/iommu/amd/amd_iommu_types.h > > @@ -138,6 +138,10 @@ > > #define EVENT_DOMID_MASK_HI 0xf > > #define EVENT_FLAGS_MASK 0xfff > > #define EVENT_FLAGS_SHIFT 0x10 > > +#define EVENT_FLAG_TR 0x100 > > +#define EVENT_FLAG_RW 0x020 > > +#define EVENT_FLAG_PR 0x010 > > +#define EVENT_FLAG_I 0x008 > > /* feature control bits */ > > #define CONTROL_IOMMU_EN0x00ULL > > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c > > index a7d6d78147b7..d9fb2c22d44a 100644 > > --- a/drivers/iommu/amd/iommu.c > > +++ b/drivers/iommu/amd/iommu.c > > What if we introduce: > > +/* > + * AMD I/O Virtualization Technology (IOMMU) Specification, > + * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says > + * that the RW ("read-write") bit is only valid if the I/O > + * page fault was caused by a memory transaction request > + * referencing a page that was marked present. > + */ > +#define IO_PAGE_FAULT_MEM_MASK \ > + (EVENT_FLAG_TR | EVENT_FLAG_PR | EVENT_FLAG_I) > +#define IS_IOMMU_MEM_TRANSACTION(x)\ > + ((x & IO_PAGE_FAULT_MEM_MASK) == EVENT_FLAG_PR) > > Note that this should have already checked w/ EVENT_FLAG_I == 0. > > > > @@ -484,6 +484,34 @@ static void amd_iommu_report_page_fault(u16 devid, u16 > > domain_id, > > if (pdev)
[PATCH v2] iommu/amd: Use report_iommu_fault()
This patch makes iommu/amd call report_iommu_fault() when an I/O page fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
   I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
   when an I/O page fault occurs.

I'm mainly interested in (2). We have a daemon with some rasdaemon-like functionality for handling platform errors, and being able to be notified of I/O page faults for initiating corrective action is very useful -- and receiving such events via event tracing is a lot nicer than having to scrape them from kmsg.

A number of other IOMMU drivers already use report_iommu_fault(), and I/O page faults on those IOMMUs therefore already seem to trigger this tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.

I copied the logic from the other callers of report_iommu_fault(), where if that function returns zero, the driver will have handled the fault, in which case we avoid logging information about the fault to the printk buffer from the IOMMU driver.
With this patch I see io_page_fault event tracing entries as expected:

    irq/24-AMD-Vi-48  [002]   978.554289: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482640 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554294: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482650 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554299: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482660 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554305: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482670 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554310: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482680 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554315: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x914826a0 flags=0x

For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU spec, but I haven't tested that bit of the code, as the page faults I encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which case EVENT_FLAG_RW doesn't make sense.

Signed-off-by: Lennert Buytenhek
---
Changes since v1 RFC:

- Don't call report_iommu_fault() for IRQ remapping faults.
  (Suggested by Joerg Roedel.)
 drivers/iommu/amd/amd_iommu_types.h |  4
 drivers/iommu/amd/iommu.c           | 29 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..2f2c6630c24c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,10 @@
 #define EVENT_DOMID_MASK_HI	0xf0000
 #define EVENT_FLAGS_MASK	0xfff
 #define EVENT_FLAGS_SHIFT	0x10
+#define EVENT_FLAG_TR	0x100
+#define EVENT_FLAG_RW	0x020
+#define EVENT_FLAG_PR	0x010
+#define EVENT_FLAG_I	0x008
 
 /* feature control bits */
 #define CONTROL_IOMMU_EN	0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a7d6d78147b7..d9fb2c22d44a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -484,6 +484,34 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
+	/*
+	 * If this is a DMA fault (for which the I(nterrupt) bit will
+	 * be unset), allow report_iommu_fault() to prevent logging it.
+	 */
+	if (dev_data && ((flags & EVENT_FLAG_I) == 0)) {
+		int report_flags;
+
+		/*
+		 * AMD I/O Virtualization Technology (IOMMU) Specification,
+		 * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+		 * that the RW ("read-write") bit is only valid if the I/O
+		 * page fault was caused by a memory transaction request
+		 * referencing a page that was marked present.
+		 */
+		report_flags = 0;
+		if ((flags & (EVENT_FLAG_TR | EVENT_FLAG_PR)) ==
+		    EVENT_FLAG_PR) {
+			if (flags & EVENT_FLAG_RW)
+				report_flags |= IOMMU_FAULT_WRITE;
+			else
+				report_flags |= IOMMU_FAULT_READ;
+		}
+
+		if (!report_iommu_fault(&dev_data->domain->domain,
+					&pdev->dev, address, report_flags))
+			goto out;
+	}
+
 	if (dev_data) {
 		if (__ratelimit(&dev_data->rs)) {
 			pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
@@ -495,6 +523
Re: [PATCH,RFC] iommu/amd: Use report_iommu_fault()
On Mon, Jul 26, 2021 at 01:54:49PM +0200, Joerg Roedel wrote:

> Hi Lennert,

Hi Joerg,

> On Mon, Jul 19, 2021 at 12:54:43PM +0300, Lennert Buytenhek wrote:
> > +	if (dev_data) {
> > +		int report_flags;
> > +
> > +		/*
> > +		 * AMD I/O Virtualization Technology (IOMMU) Specification,
> > +		 * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
> > +		 * that the RW ("read-write") bit is only valid if the I/O
> > +		 * page fault was caused by a memory transaction request
> > +		 * referencing a page that was marked present.
> > +		 */
> > +		report_flags = 0;
> > +		if ((flags & (EVENT_FLAG_TR | EVENT_FLAG_PR | EVENT_FLAG_I)) ==
> > +		    EVENT_FLAG_PR) {
> > +			if (flags & EVENT_FLAG_RW)
> > +				report_flags |= IOMMU_FAULT_WRITE;
> > +			else
> > +				report_flags |= IOMMU_FAULT_READ;
> > +		}
> > +
> > +		if (!report_iommu_fault(&dev_data->domain->domain,
> > +					&pdev->dev, address, report_flags))
> > +			goto out;
> > +	}
>
> I'd like to limit calling report_iommu_fault() to dma-faults and leave
> IRQ remapping faults unreported. The IOMMU layer does not really care a
> lot about IRQs and a potential domain handler will also not be prepared
> to handle IRQ specific faults (there is no generic way to detect them).

I'm sorry -- this appears to have been a stupid oversight on my part. New patch coming up.

Thanks,
Lennert

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH,RFC] iommu/amd: Use report_iommu_fault()
On Thu, Jul 22, 2021 at 02:26:08PM -0500, Suthikulpanit, Suravee wrote:

> Lennert,

Hi Suravee,

> > This patch makes iommu/amd call report_iommu_fault() when an I/O page
> > fault occurs, which has two effects:
> >
> > 1) It allows device drivers to register a callback to be notified of
> >    I/O page faults, via the iommu_set_fault_handler() API.
> >
> > 2) It triggers the io_page_fault tracepoint in report_iommu_fault()
> >    when an I/O page fault occurs.
> >
> > I'm mainly interested in (2). We have a daemon with some rasdaemon-like
> > functionality for handling platform errors, and being able to be notified
> > of I/O page faults for initiating corrective action is very useful -- and
> > receiving such events via event tracing is a lot nicer than having to
> > scrape them from kmsg.
>
> Interesting. Just curious what types of error handling are done here?

For example, this daemon annotates PCI errors with the symbolic name of the PCI device (including line card and ASIC number) that caused the fault, which is useful when there are dozens of identical ASICs in a system, and when hotplug makes it so that the offending PCI device might not be in the system anymore by the time someone gets around to looking at the fault, or a different line card may have been inserted in its place.

> > A number of other IOMMU drivers already use report_iommu_fault(), and
> > I/O page faults on those IOMMUs therefore already seem to trigger this
> > tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.
> >
> > I copied the logic from the other callers of report_iommu_fault(), where
> > if that function returns zero, the driver will have handled the fault,
> > in which case we avoid logging information about the fault to the printk
> > buffer from the IOMMU driver.
> > > > With this patch I see io_page_fault event tracing entries as expected: > > > > irq/24-AMD-Vi-48[002] 978.554289: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x91482640 flags=0x > > irq/24-AMD-Vi-48[002] 978.554294: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x91482650 flags=0x > > irq/24-AMD-Vi-48[002] 978.554299: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x91482660 flags=0x > > irq/24-AMD-Vi-48[002] 978.554305: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x91482670 flags=0x > > irq/24-AMD-Vi-48[002] 978.554310: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x91482680 flags=0x > > irq/24-AMD-Vi-48[002] 978.554315: io_page_fault: > > IOMMU:[drvname] :05:00.0 iova=0x914826a0 flags=0x > > > > For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU > > spec, but I haven't tested that bit of the code, as the page faults I > > encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which > > case EVENT_FLAG_RW doesn't make sense. > > Since, IO_PAGE_FAULT event is used to communicate various types of > fault events, why don't we just pass the flags as-is? This way, it can > be used to report/trace various types of IO_PAGE_FAULT events (e.g. > for I/O page table, interrupt remapping, and etc). > > Interested parties can register domain fault handler, and it can takes > care of parsing information of the flag as needed. 
> > > Signed-off-by: Lennert Buytenhek > > --- > > drivers/iommu/amd/amd_iommu_types.h |4 > > drivers/iommu/amd/iommu.c | 25 + > > 2 files changed, 29 insertions(+) > > > > diff --git a/drivers/iommu/amd/amd_iommu_types.h > > b/drivers/iommu/amd/amd_iommu_types.h > > index 94c1a7a9876d..2f2c6630c24c 100644 > > --- a/drivers/iommu/amd/amd_iommu_types.h > > +++ b/drivers/iommu/amd/amd_iommu_types.h > > @@ -138,6 +138,10 @@ > > #define EVENT_DOMID_MASK_HI 0xf > > #define EVENT_FLAGS_MASK 0xfff > > #define EVENT_FLAGS_SHIFT 0x10 > > +#define EVENT_FLAG_TR 0x100 > > +#define EVENT_FLAG_RW 0x020 > > +#define EVENT_FLAG_PR 0x010 > > +#define EVENT_FLAG_I 0x008 > > /* feature control bits */ > > #define CONTROL_IOMMU_EN0x00ULL > > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c > > index 811a49a95d04..a02ace7ee794 100644 > > --- a/drivers/iommu/amd/iommu.c > > +++ b/drivers/iommu/amd/iommu.c > > @@ -480,6 +480,30 @@ static void amd_iommu_report_page_fault(u16 devid, u16
Re: [PATCH] iommu/amd: Fix I/O page fault logging ratelimit test
On Tue, Jul 20, 2021 at 07:05:50PM -0500, Suthikulpanit, Suravee wrote:

> Hi Lennert,

Hi Suravee,

> > On an AMD system, I/O page faults are usually logged like this:
> >
> > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > index 811a49a95d04..7ae426b092f2 100644
> > --- a/drivers/iommu/amd/iommu.c
> > +++ b/drivers/iommu/amd/iommu.c
> > @@ -483,7 +483,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
> >  	if (dev_data && __ratelimit(&dev_data->rs)) {
> >  		pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
> >  			domain_id, address, flags);
> > -	} else if (printk_ratelimit()) {
> > +	} else if (!dev_data && printk_ratelimit()) {
>
> This seems a bit confusing. Also, according to the following comment in
> include/linux/printk.h:
>
> /*
>  * Please don't use printk_ratelimit(), because it shares ratelimiting state
>  * with all other unrelated printk_ratelimit() callsites. Instead use
>  * printk_ratelimited() or plain old __ratelimit().
>  */
>
> We probably should move away from using printk_ratelimit() here.
> What about the following change instead?
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 811a49a95d04..8eb5d3519743 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -480,11 +480,12 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
>  	if (pdev)
>  		dev_data = dev_iommu_priv_get(&pdev->dev);
>  
> -	if (dev_data && __ratelimit(&dev_data->rs)) {
> -		pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
> -			domain_id, address, flags);
> -	} else if (printk_ratelimit()) {
> -		pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
> +	if (dev_data) {
> +		if (__ratelimit(&dev_data->rs))
> +			pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
> +				domain_id, address, flags);
> +	} else {
> +		pr_err_ratelimited("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
>  			PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
>  			domain_id, address, flags);
>  	}

Looks good!

> Note also that there might be other places in this file that would need
> similar modification as well.

Indeed, there are two more sites like these. I've sent a new patch that incorporates your feedback.

Thank you!

Cheers,
Lennert
[PATCH] iommu/amd: Fix printing of IOMMU events when rate limiting kicks in
For the printing of RMP_HW_ERROR / RMP_PAGE_FAULT / IO_PAGE_FAULT events, the AMD IOMMU code uses such logic:

	if (pdev)
		dev_data = dev_iommu_priv_get(&pdev->dev);

	if (dev_data && __ratelimit(&dev_data->rs)) {
		pci_err(pdev, ...
	} else {
		printk_ratelimit() / pr_err{,_ratelimited}(...
	}

This means that if we receive an event for a PCI devid which actually does have a struct pci_dev and an attached struct iommu_dev_data, but rate limiting kicks in, we'll fall back to the non-PCI branch of the test, and print the event in a different format.

Fix this by changing the logic to:

	if (dev_data) {
		if (__ratelimit(&dev_data->rs)) {
			pci_err(pdev, ...
		}
	} else {
		pr_err_ratelimited(...
	}

Suggested-by: Suravee Suthikulpanit
Signed-off-by: Lennert Buytenhek
---
 drivers/iommu/amd/iommu.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..a7d6d78147b7 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -425,9 +425,11 @@ static void amd_iommu_report_rmp_hw_error(volatile u32 *event)
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
-	if (dev_data && __ratelimit(&dev_data->rs)) {
-		pci_err(pdev, "Event logged [RMP_HW_ERROR vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
-			vmg_tag, spa, flags);
+	if (dev_data) {
+		if (__ratelimit(&dev_data->rs)) {
+			pci_err(pdev, "Event logged [RMP_HW_ERROR vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
+				vmg_tag, spa, flags);
+		}
 	} else {
 		pr_err_ratelimited("Event logged [RMP_HW_ERROR device=%02x:%02x.%x, vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
 			PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
@@ -456,9 +458,11 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
-	if (dev_data && __ratelimit(&dev_data->rs)) {
-		pci_err(pdev, "Event logged [RMP_PAGE_FAULT vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, flags=0x%04x]\n",
-			vmg_tag, gpa, flags_rmp, flags);
+	if (dev_data) {
+		if (__ratelimit(&dev_data->rs)) {
+			pci_err(pdev, "Event logged [RMP_PAGE_FAULT vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, flags=0x%04x]\n",
+				vmg_tag, gpa, flags_rmp, flags);
+		}
 	} else {
 		pr_err_ratelimited("Event logged [RMP_PAGE_FAULT device=%02x:%02x.%x, vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, flags=0x%04x]\n",
 			PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
@@ -480,11 +484,13 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
-	if (dev_data && __ratelimit(&dev_data->rs)) {
-		pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
-			domain_id, address, flags);
-	} else if (printk_ratelimit()) {
-		pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
+	if (dev_data) {
+		if (__ratelimit(&dev_data->rs)) {
+			pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
+				domain_id, address, flags);
+		}
+	} else {
+		pr_err_ratelimited("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
 			PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
 			domain_id, address, flags);
 	}
-- 
2.31.1
[PATCH,RFC] iommu/amd: Use report_iommu_fault()
This patch makes iommu/amd call report_iommu_fault() when an I/O page fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
   I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
   when an I/O page fault occurs.

I'm mainly interested in (2). We have a daemon with some rasdaemon-like functionality for handling platform errors, and being able to be notified of I/O page faults for initiating corrective action is very useful -- and receiving such events via event tracing is a lot nicer than having to scrape them from kmsg.

A number of other IOMMU drivers already use report_iommu_fault(), and I/O page faults on those IOMMUs therefore already seem to trigger this tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.

I copied the logic from the other callers of report_iommu_fault(), where if that function returns zero, the driver will have handled the fault, in which case we avoid logging information about the fault to the printk buffer from the IOMMU driver.
With this patch I see io_page_fault event tracing entries as expected:

    irq/24-AMD-Vi-48  [002]   978.554289: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482640 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554294: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482650 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554299: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482660 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554305: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482670 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554310: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x91482680 flags=0x
    irq/24-AMD-Vi-48  [002]   978.554315: io_page_fault: IOMMU:[drvname] 0000:05:00.0 iova=0x914826a0 flags=0x

For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU spec, but I haven't tested that bit of the code, as the page faults I encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which case EVENT_FLAG_RW doesn't make sense.

Signed-off-by: Lennert Buytenhek
---
 drivers/iommu/amd/amd_iommu_types.h |  4
 drivers/iommu/amd/iommu.c           | 25 +
 2 files changed, 29 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..2f2c6630c24c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,10 @@
 #define EVENT_DOMID_MASK_HI	0xf0000
 #define EVENT_FLAGS_MASK	0xfff
 #define EVENT_FLAGS_SHIFT	0x10
+#define EVENT_FLAG_TR	0x100
+#define EVENT_FLAG_RW	0x020
+#define EVENT_FLAG_PR	0x010
+#define EVENT_FLAG_I	0x008
 
 /* feature control bits */
 #define CONTROL_IOMMU_EN	0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..a02ace7ee794 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -480,6 +480,30 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
+	if (dev_data) {
+		int report_flags;
+
+		/*
+		 * AMD I/O Virtualization Technology (IOMMU) Specification,
+		 * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+		 * that the RW ("read-write") bit is only valid if the I/O
+		 * page fault was caused by a memory transaction request
+		 * referencing a page that was marked present.
+		 */
+		report_flags = 0;
+		if ((flags & (EVENT_FLAG_TR | EVENT_FLAG_PR | EVENT_FLAG_I)) ==
+		    EVENT_FLAG_PR) {
+			if (flags & EVENT_FLAG_RW)
+				report_flags |= IOMMU_FAULT_WRITE;
+			else
+				report_flags |= IOMMU_FAULT_READ;
+		}
+
+		if (!report_iommu_fault(&dev_data->domain->domain,
+					&pdev->dev, address, report_flags))
+			goto out;
+	}
+
 	if (dev_data && __ratelimit(&dev_data->rs)) {
 		pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
 			domain_id, address, flags);
@@ -489,6 +513,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 			domain_id, address, flags);
 	}
 
+out:
 	if (pdev)
 		pci_dev_put(pdev);
 }
[PATCH] iommu/amd: Fix I/O page fault logging ratelimit test
On an AMD system, I/O page faults are usually logged like this:

    drvname 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x92050da0 flags=0x0020]

But sometimes they are logged like this instead, even for the exact same PCI device:

    AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x92050de0 flags=0x0020]

This discrepancy appears to be caused by this code:

	if (dev_data && __ratelimit(&dev_data->rs)) {
		pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
			domain_id, address, flags);
	} else if (printk_ratelimit()) {
		pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
			PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
			domain_id, address, flags);
	}

If an I/O page fault occurs for a PCI device with associated iommu_dev_data, but for which the __ratelimit(&dev_data->rs) check fails, we'll give it a second chance with printk_ratelimit(), and if that check succeeds, we will log the fault anyway, but in a different format.

Change this to only check printk_ratelimit() if !dev_data, which seems to be what had been originally intended.
Signed-off-by: Lennert Buytenhek
---
 drivers/iommu/amd/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..7ae426b092f2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -483,7 +483,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (dev_data && __ratelimit(&dev_data->rs)) {
 		pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
 			domain_id, address, flags);
-	} else if (printk_ratelimit()) {
+	} else if (!dev_data && printk_ratelimit()) {
 		pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
 			PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
 			domain_id, address, flags);