Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
2018-03-29 9:50 GMT+01:00 Joerg Roedel:
> On Tue, Mar 20, 2018 at 08:50:13PM +, Dmitry Safonov wrote:
> > Hmm, but this fixes my softlockup issue, because it's about the time
> > spent in printk() inside an irq-disabled section, rather than about
> > exiting the dmar-clearing loop.
> > And on my hw it makes no difference whether we limit the loop or not,
> > because clearing a fault is much faster than the hw can generate a
> > new fault.
> > IOW, it fixes the softlockup for me, and the loop-related lockup can't
> > happen on the hw I have (so it's a different issue, [possibly?] on
> > other hw).
>
> It might solve your issue, but someone else might still run into it with
> a different setup. An upstream fix needs to solve it for everyone.

Ok, I'll resend v4 with an additional patch to limit the loop.

Thanks,
Dmitry
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Tue, Mar 20, 2018 at 08:50:13PM +, Dmitry Safonov wrote:
> Hmm, but this fixes my softlockup issue, because it's about the time
> spent in printk() inside an irq-disabled section, rather than about
> exiting the dmar-clearing loop.
> And on my hw it makes no difference whether we limit the loop or not,
> because clearing a fault is much faster than the hw can generate a new
> fault.
> IOW, it fixes the softlockup for me, and the loop-related lockup can't
> happen on the hw I have (so it's a different issue, [possibly?] on
> other hw).

It might solve your issue, but someone else might still run into it with
a different setup. An upstream fix needs to solve it for everyone.

Thanks,
Joerg
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, 2018-03-15 at 16:28 +0100, Joerg Roedel wrote:
> On Thu, Mar 15, 2018 at 02:42:00PM +, Dmitry Safonov wrote:
> > But even with a loop-limit we will need to ratelimit each printk()
> > *also*. Otherwise the loop-limit will be based on time spent
> > printing, not on anything else..
> > The patch makes sense even with a loop-limit, in my opinion.
>
> Looks like I mis-read your patch; somehow it looked to me as if you
> replace all 'ratelimited' usages with a call to __ratelimit(), but you
> just move 'ratelimited' into the loop, which actually makes sense.

So, is it worth applying the patch?

> But still, this alone is no proper fix for the soft-lockups you are
> seeing.

Hmm, but this fixes my softlockup issue, because it's about the time
spent in printk() inside an irq-disabled section, rather than about
exiting the dmar-clearing loop.
And on my hw it makes no difference whether we limit the loop or not,
because clearing a fault is much faster than the hw can generate a new
fault.
IOW, it fixes the softlockup for me, and the loop-related lockup can't
happen on the hw I have (so it's a different issue, [possibly?] on
other hw).

--
Thanks,
Dmitry
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, 2018-03-15 at 16:28 +0100, Joerg Roedel wrote:
> On Thu, Mar 15, 2018 at 02:42:00PM +, Dmitry Safonov wrote:
> > But even with a loop-limit we will need to ratelimit each printk()
> > *also*. Otherwise the loop-limit will be based on time spent
> > printing, not on anything else..
> > The patch makes sense even with a loop-limit, in my opinion.
>
> Looks like I mis-read your patch; somehow it looked to me as if you
> replace all 'ratelimited' usages with a call to __ratelimit(), but you
> just move 'ratelimited' into the loop, which actually makes sense.

Oh, ok.

> But still, this alone is no proper fix for the soft-lockups you are
> seeing.

Well, I can also limit the number of loops with, say,
cap_num_fault_regs(). I didn't do that because, by my measurements, the
time spent on clearing a fault is so small that I'm not sure it's
possible to get stuck in this loop.

--
Thanks,
Dmitry
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, Mar 15, 2018 at 02:42:00PM +, Dmitry Safonov wrote:
> But even with a loop-limit we will need to ratelimit each printk()
> *also*. Otherwise the loop-limit will be based on time spent printing,
> not on anything else..
> The patch makes sense even with a loop-limit, in my opinion.

Looks like I mis-read your patch; somehow it looked to me as if you
replace all 'ratelimited' usages with a call to __ratelimit(), but you
just move 'ratelimited' into the loop, which actually makes sense.

But still, this alone is no proper fix for the soft-lockups you are
seeing.

Joerg
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, 2018-03-15 at 14:34 +, Dmitry Safonov wrote:
> On Thu, 2018-03-15 at 15:22 +0100, Joerg Roedel wrote:
> > On Thu, Mar 15, 2018 at 02:13:03PM +, Dmitry Safonov wrote:
> > > So, you suggest removing the ratelimit altogether?
> > > Do we really need a printk flood for every fault that happens?
> > > Imagine you have hundreds of mappings and then the PCI link flapped..
> > > Wouldn't it be better to keep ratelimiting?
> > > I don't mind, it just looks a bit strange to me.
> >
> > I never said you should remove the ratelimiting; after all you are
> > trying to fix a soft-lockup, no?
> >
> > And that should not be fixed by changes to the ratelimiting, but with
> > proper irq handling.
>
> Uh, I'm a bit confused then.
> - Isn't it better to ratelimit each printk() instead of a bunch of
>   printks inside the irq handler?
> - I can limit the number of loops, but most of the time is spent in
>   the loop on printk() (on my machine ~170 msec per loop), while
>   everything else takes much less time (on my machine < 1 usec per
>   loop). So, if I limit the number of loops per irq, that cycle-limit
>   will be based on limiting the time spent on printk (e.g., how many
>   printks to do in atomic context so that the node will not lock up).
>   It smells like ratelimiting, no?
>
> I must be misunderstanding something, but why introduce another limit
> for the number of printk()s called when there is a ratelimit which may
> be tuned..

So I agree that it may be better to have another limit on the cycle
*also*, because if we clear faults at the same speed as they're
generated by hw, we may get stuck in the loop.. But by my measurements
clearing a fault is so fast (< 1 usec) that I'm not sure it can happen
with hw. For that reason I didn't introduce a loop-limit.

But even with a loop-limit we will need to ratelimit each printk()
*also*. Otherwise the loop-limit will be based on time spent printing,
not on anything else..
The patch makes sense even with a loop-limit, in my opinion.

--
Thanks,
Dmitry
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, 2018-03-15 at 15:22 +0100, Joerg Roedel wrote:
> On Thu, Mar 15, 2018 at 02:13:03PM +, Dmitry Safonov wrote:
> > So, you suggest removing the ratelimit altogether?
> > Do we really need a printk flood for every fault that happens?
> > Imagine you have hundreds of mappings and then the PCI link flapped..
> > Wouldn't it be better to keep ratelimiting?
> > I don't mind, it just looks a bit strange to me.
>
> I never said you should remove the ratelimiting, after all you are
> trying to fix a soft-lockup, no?
>
> And that should not be fixed by changes to the ratelimiting, but with
> proper irq handling.

Uh, I'm a bit confused then.
- Isn't it better to ratelimit each printk() instead of a bunch of
  printks inside the irq handler?
- I can limit the number of loops, but most of the time is spent in the
  loop on printk() (on my machine ~170 msec per loop), while everything
  else takes much less time (on my machine < 1 usec per loop). So, if I
  limit the number of loops per irq, that cycle-limit will be based on
  limiting the time spent on printk (e.g., how many printks to do in
  atomic context so that the node will not lock up). It smells like
  ratelimiting, no?

I must be misunderstanding something, but why introduce another limit
for the number of printk()s called when there is a ratelimit which may
be tuned..

--
Thanks,
Dmitry
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, Mar 15, 2018 at 02:13:03PM +, Dmitry Safonov wrote:
> So, you suggest removing the ratelimit altogether?
> Do we really need a printk flood for every fault that happens?
> Imagine you have hundreds of mappings and then the PCI link flapped..
> Wouldn't it be better to keep ratelimiting?
> I don't mind, it just looks a bit strange to me.

I never said you should remove the ratelimiting, after all you are
trying to fix a soft-lockup, no?

And that should not be fixed by changes to the ratelimiting, but with
proper irq handling.

Joerg
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, 2018-03-15 at 14:46 +0100, Joerg Roedel wrote:
> On Thu, Feb 15, 2018 at 07:17:29PM +, Dmitry Safonov wrote:
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index accf58388bdb..6c4ea32ee6a9 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
> >  	int reg, fault_index;
> >  	u32 fault_status;
> >  	unsigned long flag;
> > -	bool ratelimited;
> >  	static DEFINE_RATELIMIT_STATE(rs,
> >  				      DEFAULT_RATELIMIT_INTERVAL,
> >  				      DEFAULT_RATELIMIT_BURST);
> >
> > -	/* Disable printing, simply clear the fault when ratelimited */
> > -	ratelimited = !__ratelimit(&rs);
> > -
> >  	raw_spin_lock_irqsave(&iommu->register_lock, flag);
> >  	fault_status = readl(iommu->reg + DMAR_FSTS_REG);
> > -	if (fault_status && !ratelimited)
> > +	if (fault_status && __ratelimit(&rs))
> >  		pr_err("DRHD: handling fault status reg %x\n", fault_status);
>
> This looks awful. Have you tried to limit the number of loops in this
> function and returning? You can handle the next faults by the next
> interrupt. This ensures that the cpu visits a scheduling point from
> time to time so that you don't see soft-lockups.

So, you suggest removing the ratelimit altogether?
Do we really need a printk flood for every fault that happens?
Imagine you have hundreds of mappings and then the PCI link flapped..
Wouldn't it be better to keep ratelimiting?
I don't mind, it just looks a bit strange to me.

--
Thanks,
Dmitry
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, Feb 15, 2018 at 07:17:29PM +, Dmitry Safonov wrote:
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index accf58388bdb..6c4ea32ee6a9 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
>  	int reg, fault_index;
>  	u32 fault_status;
>  	unsigned long flag;
> -	bool ratelimited;
>  	static DEFINE_RATELIMIT_STATE(rs,
>  				      DEFAULT_RATELIMIT_INTERVAL,
>  				      DEFAULT_RATELIMIT_BURST);
>
> -	/* Disable printing, simply clear the fault when ratelimited */
> -	ratelimited = !__ratelimit(&rs);
> -
>  	raw_spin_lock_irqsave(&iommu->register_lock, flag);
>  	fault_status = readl(iommu->reg + DMAR_FSTS_REG);
> -	if (fault_status && !ratelimited)
> +	if (fault_status && __ratelimit(&rs))
>  		pr_err("DRHD: handling fault status reg %x\n", fault_status);

This looks awful. Have you tried to limit the number of loops in this
function and returning? You can handle the next faults by the next
interrupt. This ensures that the cpu visits a scheduling point from
time to time so that you don't see soft-lockups.

Joerg
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
Gentle ping?

On Mon, 2018-03-05 at 15:00 +, Dmitry Safonov wrote:
> Hi Joerg,
>
> What do you think about v3?
> It looks like I can solve my softlockups with just a bit more proper
> ratelimiting..
>
> On Thu, 2018-02-15 at 19:17 +, Dmitry Safonov wrote:
> > There is a ratelimit for printing, but it's incremented each time
> > the cpu receives a dmar fault interrupt, while one interrupt may
> > signal about *many* faults.
> > [...]
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
Hi Joerg,

What do you think about v3?
It looks like I can solve my softlockups with just a bit more proper
ratelimiting..

On Thu, 2018-02-15 at 19:17 +, Dmitry Safonov wrote:
> There is a ratelimit for printing, but it's incremented each time the
> cpu receives a dmar fault interrupt, while one interrupt may signal
> about *many* faults.
> So, measuring the impact, it turns out that reading/clearing one fault
> takes < 1 usec, and printing info about the fault takes ~170 msec.
> [...]
[PATCHv3] iommu/intel: Ratelimit each dmar fault printing
There is a ratelimit for printing, but it's incremented each time the
cpu receives a dmar fault interrupt, while one interrupt may signal
about *many* faults.
So, measuring the impact, it turns out that reading/clearing one fault
takes < 1 usec, and printing info about the fault takes ~170 msec.

Having in mind that the maximum number of fault recording registers per
remapping hardware unit is 256, the IRQ handler may run for (170*256)
msec. And as the fault-serving loop runs without a time limit, new
faults may occur during servicing..

Ratelimit each fault printing rather than each irq printing.

Fixes: c43fce4eebae ("iommu/vt-d: Ratelimit fault handler")

BUG: spinlock lockup suspected on CPU#0, CliShell/9903
lock: 0x81a47440, .magic: dead4ead, .owner: kworker/u16:2/8915, .owner_cpu: 6
CPU: 0 PID: 9903 Comm: CliShell
Call Trace:
[..] dump_stack+0x65/0x83
[..] spin_dump+0x8f/0x94
[..] do_raw_spin_lock+0x123/0x170
[..] _raw_spin_lock_irqsave+0x32/0x3a
[..] uart_chars_in_buffer+0x20/0x4d
[..] tty_chars_in_buffer+0x18/0x1d
[..] n_tty_poll+0x1cb/0x1f2
[..] tty_poll+0x5e/0x76
[..] do_select+0x363/0x629
[..] compat_core_sys_select+0x19e/0x239
[..] compat_SyS_select+0x98/0xc0
[..] sysenter_dispatch+0x7/0x25
[..]
NMI backtrace for cpu 6
CPU: 6 PID: 8915 Comm: kworker/u16:2
Workqueue: dmar_fault dmar_fault_work
Call Trace:
[..] wait_for_xmitr+0x26/0x8f
[..] serial8250_console_putchar+0x1c/0x2c
[..] uart_console_write+0x40/0x4b
[..] serial8250_console_write+0xe6/0x13f
[..] call_console_drivers.constprop.13+0xce/0x103
[..] console_unlock+0x1f8/0x39b
[..] vprintk_emit+0x39e/0x3e6
[..] printk+0x4d/0x4f
[..] dmar_fault+0x1a8/0x1fc
[..] dmar_fault_work+0x15/0x17
[..] process_one_work+0x1e8/0x3a9
[..] worker_thread+0x25d/0x345
[..] kthread+0xea/0xf2
[..] ret_from_fork+0x58/0x90

Cc: Alex Williamson
Cc: David Woodhouse
Cc: Ingo Molnar
Cc: Joerg Roedel
Cc: Lu Baolu
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Dmitry Safonov
---
Maybe it's worth limiting the while(1) cycle:
if the IOMMU generates faults at the same speed as the irq handler
clears them, it may turn into a long-irq-disabled region again.
Not sure if it can happen anyway.

 drivers/iommu/dmar.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index accf58388bdb..6c4ea32ee6a9 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 	int reg, fault_index;
 	u32 fault_status;
 	unsigned long flag;
-	bool ratelimited;
 	static DEFINE_RATELIMIT_STATE(rs,
 				      DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
 
-	/* Disable printing, simply clear the fault when ratelimited */
-	ratelimited = !__ratelimit(&rs);
-
 	raw_spin_lock_irqsave(&iommu->register_lock, flag);
 	fault_status = readl(iommu->reg + DMAR_FSTS_REG);
-	if (fault_status && !ratelimited)
+	if (fault_status && __ratelimit(&rs))
 		pr_err("DRHD: handling fault status reg %x\n", fault_status);
 
 	/* TBD: ignore advanced fault log currently */
@@ -1638,6 +1634,8 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 	fault_index = dma_fsts_fault_record_index(fault_status);
 	reg = cap_fault_reg_offset(iommu->cap);
 	while (1) {
+		/* Disable printing, simply clear the fault when ratelimited */
+		bool ratelimited = !__ratelimit(&rs);
 		u8 fault_reason;
 		u16 source_id;
 		u64 guest_addr;
-- 
2.13.6