On Tue, Sep 28, 2021 at 10:45:30AM +0200, Joerg Roedel wrote:
> Hi Lennert,
Hi Joerg,

Thanks for your feedback!

> > +/*
> > + * This function restarts event logging in case the IOMMU experienced
> > + * an event log buffer overflow.
> > + */
> > +void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
> > +{
> > +	iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
> > +	iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
> > +}
>
> A few more things need to happen here. First of all head and tail are
> likely equal when the event buffer overflows, so the events will not be
> polled and reported.

I applied the following change on top of my patch, to check the state of
the event log ring buffer when we enter the IRQ handler with an overflow
condition pending:

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -752,6 +752,18 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 	struct amd_iommu *iommu = (struct amd_iommu *) data;
 	u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
 
+	if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
+		u32 events;
+
+		events = readl(iommu->mmio_base + MMIO_EVT_TAIL_OFFSET) -
+			 readl(iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
+		if (events & 0x80000000)
+			events += EVT_BUFFER_SIZE;
+		events /= EVENT_ENTRY_SIZE;
+
+		pr_info("Overflow with %d events queued\n", events);
+	}
+
 	while (status & AMD_IOMMU_INT_MASK) {
 		/* Enable interrupt sources again */
 		writel(AMD_IOMMU_INT_MASK,

And that gives me:

[   33.304821] AMD-Vi: Overflow with 511 events queued
[   35.304812] AMD-Vi: Overflow with 511 events queued
[   39.304791] AMD-Vi: Overflow with 511 events queued
[   40.304792] AMD-Vi: Overflow with 511 events queued
[   41.304782] AMD-Vi: Overflow with 511 events queued
[   44.009920] AMD-Vi: Overflow with 511 events queued
[   45.304768] AMD-Vi: Overflow with 511 events queued
[   46.304782] AMD-Vi: Overflow with 511 events queued
[   46.385028] AMD-Vi: Overflow with 511 events queued
[   51.304733] AMD-Vi: Overflow with 511 events queued
[   53.318892] AMD-Vi: Overflow with 511 events queued
[   60.304681] AMD-Vi: Overflow with 511 events queued
[   63.304676] AMD-Vi: Overflow with 511 events queued

In other words, it seems that the hardware considers the event log to be
full when there is still one free entry left in the ring buffer, and it
will not actually fill that last free entry and thereby make the head
and tail pointers equal.  This behavior is consistent across Ryzen
3700X, Ryzen 5950X, EPYC 3151, EPYC 3251, RX-421ND and RX-216TD.

The docs talk about "When the Event Log has overflowed", but they don't
define exactly when that happens (i.e. when tail becomes equal to head,
or one entry before that).  I did find this phrase that talks about the
overflow condition:

	The host software must make space in the event log after an
	overflow by reading entries (by adjusting the head pointer) or
	by resizing the log.  Event logging may then be restarted.

This seems to suggest that the hardware will never consume the last
entry in the ring buffer and thereby advance tail to be equal to head,
because if it did, there would be no need for "reading entries (by
adjusting the head pointer)" to make space in the event log buffer,
since the 'empty' and 'full' conditions would be indistinguishable in
that case.

If there _is_ some variation of the hardware out there that does advance
the tail pointer to coincide with the head pointer when there are
already N-1 entries in the log and it has one more entry to write, then
this patch would still work OK on such hardware, we would just lose a
few more events than we otherwise would.  The point of the patch is to
be able to recover after a burst of I/O page faults, at which point we
have very likely already lost some events anyway, so I think such
hypothetical lossage would be acceptable, given that none of the
hardware I have access to seems to behave that way, and given that from
the wording in the docs it is unlikely to behave that way.
In fact, thinking about it a bit more, by the time an event log overflow
condition is handled, it is actually possible for the event log ring
buffer to be empty rather than full.  Imagine this scenario:

- We enter the IRQ handler, the EVT status bit is set, and the ring
  buffer is full but hasn't overflowed yet, so OVERFLOW is not set.

- We read the interrupt status and clear the EVT bit.

- Before we call iommu_poll_events(), another event comes in, which
  causes the OVERFLOW flag to be set, since we haven't advanced head
  yet.

- iommu_poll_events() is called, and it processes all events in the
  ring buffer, which is now empty (and will remain empty, since the
  overflow bit is set).

- The MMIO_STATUS_OFFSET re-read at the end of the IRQ loop finds the
  OVERFLOW bit set and loops back to the beginning of the loop.

- We see status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK in the next loop
  iteration, but there are actually no events left in the ring buffer,
  since we cleared them all out in the previous loop.

In other words, even if there is hardware out there that uses up every
entry in the ring buffer (and will thus let tail become equal to head),
on such hardware we cannot be sure that every entry in the ring buffer
is valid by the time we observe the overflow condition.  It might also
be the case that the ring buffer is by then entirely empty rather than
full, so the overflow handling code cannot just go ahead and report
every entry in the ring buffer when an overflow happens.

(We _could_ deal with this by reading head and tail and then re-reading
the interrupt status bit after that to check for an overflow condition
once again, but I think that's probably more trouble than it's worth,
given that we're talking about hypothetical hardware here.  The way I
did it is probably the simplest way of handling overflows.)

> And next it is also a good idea to reset the head and tail pointers of
> the event log to 0 after the events have been polled.
What difference would this make?  It would cost two MMIO writes, and I
don't see what the advantage of doing this would be, as it's fine for
the IOMMU to resume writing events at any index in the ring buffer, and
we will handle that just fine.

Some more testing suggests that this would be a good change to make: ;-)

@@ -775,7 +775,7 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 #endif
 
 	if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
-		pr_info("AMD-Vi: IOMMU event log overflow\n");
+		pr_info_ratelimited("IOMMU event log overflow\n");
 		amd_iommu_restart_event_logging(iommu);
 	}

I can resubmit the patch with that change applied.

Thanks,
Lennert

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu