Re: frequent lockups in 3.18rc4

Chris Mason Fri, 19 Dec 2014 12:33:07 -0800

On Fri, Dec 19, 2014 at 11:15:21AM -0800, Linus Torvalds wrote:
> Here's another pattern. In your latest thing, every single time that
> CPU1 is waiting for some other CPU to pick up the IPI, we have CPU0
> doing this:
> 
> [24998.060963] NMI backtrace for cpu 0
> [24998.061989] CPU: 0 PID: 2940 Comm: trinity-c150 Not tainted 3.18.0+ #108
> [24998.064073] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti:
> ffff880197e0c000
> [24998.065137] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>]
> read_hpet+0x16/0x20
> [24998.083577]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
> [24998.084450]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
> [24998.085315]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
> [24998.086173]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
> [24998.087025]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
> [24998.087877]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
> [24998.088732]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
> [24998.089583]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
> [24998.090435]  <EOI>
> [24998.091279]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
> [24998.092118]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
> [24998.092951]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
> [24998.093779]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
> 
> and I think that's the smoking gun. The reason CPU0 isn't picking up
> any IPI's is because it is in some endless loop around read_hpet().
> 
> There is even time information in the register dump:
> 
>  RAX: 0000000061fece8a RBX: 0000000000510792 RCX: 0000000000000000
>  RAX: 0000000079e588fc RBX: 0000000000511d6e RCX: 0000000000000000
>  RAX: 0000000091ca7f65 RBX: 0000000000513346 RCX: 0000000000000000
>  RAX: 00000000a9afbd0d RBX: 000000000051491e RCX: 0000000000000000
>  RAX: 00000000cbd1340c RBX: 000000000051684a RCX: 0000000000000000
>  RAX: 00000000fb9d303f RBX: 00000000005193fc RCX: 0000000000000000
>  RAX: 000000002b67efe4 RBX: 000000000051c224 RCX: 0000000000000004
> 
> That RAX value is the value we just read from the HPET, and RBX seems
> to be monotonically increasing too, so it's likely the sequence
> counter in ktime_get().
> 
> So it's not stuck *inside* read_hpet(), and it's almost certainly not
> the loop over the sequence counter in ktime_get() either (it's not
> increasing *that* quickly). But some basically infinite __run_hrtimer
> thing or something?


Really interesting.

So, we're calling __run_hrtimer in a loop:

                while ((node = timerqueue_getnext(&base->active))) {
                                ...
                                __run_hrtimer(timer, &basenow);
                                ...
                }

The easy question is how often does trinity call nanosleep?

Looking at __run_hrtimer(), it drops the lock and runs the function and then
takes the lock again, maybe enqueueing us again right away.

timer->state is supposed to protect us from other CPUs jumping in and doing
something else with the timer, but it feels racey wrt remove_hrtimer().
Something like this, but I'm not sure how often __hrtimer_start_range_ns gets
called

CPU 0                                           CPU 1
__run_hrtimer()
    timer->state = HRTIMER_STATE_CALLBACK
    removed from list
    unlock cpu_base->lock
    restrt = fn(timer)
                                                __hrtimer_start_range_ns()
                                                base = lock_hrtimer_base()
                                                ret = remove_hrtimer()
                                                    finds timer->state = 
HRTIMER_STATE_CALLBACK
                                                    does nothing
                                                new_base = switch_hrtimer_base()
                                                    now we're on a different 
base, different lock
    lock(cpu_base->lock)
    enqueue the timer
                                                enqueue the timer
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: frequent lockups in 3.18rc4

Reply via email to