Hi Joerg, is there anything I may do about those two patches? In 2/2 I've limited loop cnt as discussed in v3. This one solves softlockup for us, might be useful.
On Sat, 2018-03-31 at 01:33 +0100, Dmitry Safonov wrote: > There is a ratelimit for printing, but it's incremented each time the > cpu recives dmar fault interrupt. While one interrupt may signal > about > *many* faults. > So, measuring the impact it turns out that reading/clearing one fault > takes < 1 usec, and printing info about the fault takes ~170 msec. > > Having in mind that maximum number of fault recording registers per > remapping hardware unit is 256.. IRQ handler may run for (170*256) > msec. > And as fault-serving loop runs without a time limit, during servicing > new faults may occur.. > > Ratelimit each fault printing rather than each irq printing. > > Fixes: commit c43fce4eebae ("iommu/vt-d: Ratelimit fault handler") > > BUG: spinlock lockup suspected on CPU#0, CliShell/9903 > lock: 0xffffffff81a47440, .magic: dead4ead, .owner: > kworker/u16:2/8915, .owner_cpu: 6 > CPU: 0 PID: 9903 Comm: CliShell > Call Trace:$\n' > [..] dump_stack+0x65/0x83$\n' > [..] spin_dump+0x8f/0x94$\n' > [..] do_raw_spin_lock+0x123/0x170$\n' > [..] _raw_spin_lock_irqsave+0x32/0x3a$\n' > [..] uart_chars_in_buffer+0x20/0x4d$\n' > [..] tty_chars_in_buffer+0x18/0x1d$\n' > [..] n_tty_poll+0x1cb/0x1f2$\n' > [..] tty_poll+0x5e/0x76$\n' > [..] do_select+0x363/0x629$\n' > [..] compat_core_sys_select+0x19e/0x239$\n' > [..] compat_SyS_select+0x98/0xc0$\n' > [..] sysenter_dispatch+0x7/0x25$\n' > [..] > NMI backtrace for cpu 6 > CPU: 6 PID: 8915 Comm: kworker/u16:2 > Workqueue: dmar_fault dmar_fault_work > Call Trace:$\n' > [..] wait_for_xmitr+0x26/0x8f$\n' > [..] serial8250_console_putchar+0x1c/0x2c$\n' > [..] uart_console_write+0x40/0x4b$\n' > [..] serial8250_console_write+0xe6/0x13f$\n' > [..] call_console_drivers.constprop.13+0xce/0x103$\n' > [..] console_unlock+0x1f8/0x39b$\n' > [..] vprintk_emit+0x39e/0x3e6$\n' > [..] printk+0x4d/0x4f$\n' > [..] dmar_fault+0x1a8/0x1fc$\n' > [..] dmar_fault_work+0x15/0x17$\n' > [..] process_one_work+0x1e8/0x3a9$\n' > [..] worker_thread+0x25d/0x345$\n' > [..] kthread+0xea/0xf2$\n' > [..] ret_from_fork+0x58/0x90$\n' > > Cc: Alex Williamson <alex.william...@redhat.com> > Cc: David Woodhouse <dw...@infradead.org> > Cc: Ingo Molnar <mi...@kernel.org> > Cc: Joerg Roedel <j...@8bytes.org> > Cc: Lu Baolu <baolu...@linux.intel.com> > Cc: io...@lists.linux-foundation.org > Signed-off-by: Dmitry Safonov <d...@arista.com> > --- > drivers/iommu/dmar.c | 8 +++----- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c > index accf58388bdb..6c4ea32ee6a9 100644 > --- a/drivers/iommu/dmar.c > +++ b/drivers/iommu/dmar.c > @@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id) > int reg, fault_index; > u32 fault_status; > unsigned long flag; > - bool ratelimited; > static DEFINE_RATELIMIT_STATE(rs, > DEFAULT_RATELIMIT_INTERVAL, > DEFAULT_RATELIMIT_BURST); > > - /* Disable printing, simply clear the fault when ratelimited > */ > - ratelimited = !__ratelimit(&rs); > - > raw_spin_lock_irqsave(&iommu->register_lock, flag); > fault_status = readl(iommu->reg + DMAR_FSTS_REG); > - if (fault_status && !ratelimited) > + if (fault_status && __ratelimit(&rs)) > pr_err("DRHD: handling fault status reg %x\n", > fault_status); > > /* TBD: ignore advanced fault log currently */ > @@ -1638,6 +1634,8 @@ irqreturn_t dmar_fault(int irq, void *dev_id) > fault_index = dma_fsts_fault_record_index(fault_status); > reg = cap_fault_reg_offset(iommu->cap); > while (1) { > + /* Disable printing, simply clear the fault when > ratelimited */ > + bool ratelimited = !__ratelimit(&rs); > u8 fault_reason; > u16 source_id; > u64 guest_addr;