On 27.3.2015 22:36, Sasha Levin wrote:
> On 03/27/2015 06:07 AM, Vlastimil Babka wrote:
>>> [ 3614.918852] trinity-c7  D ffff8802f4487b58 26976 16252 9410 0x10000000
>>> [ 3614.919580]  ffff8802f4487b58 ffff8802f6b98ca8 0000000000000000 0000000000000000
>>> [ 3614.920435]  ffff88017d3e0558 ffff88017d3e0530 ffff8802f6b98008 ffff88016bad0000
>>> [ 3614.921219]  ffff8802f6b98000 ffff8802f4487b38 ffff8802f4480000 ffffed005e890002
>>> [ 3614.922069] Call Trace:
>>> [ 3614.922346] schedule (./arch/x86/include/asm/bitops.h:311 (discriminator 1) kernel/sched/core.c:2827 (discriminator 1))
>>> [ 3614.923023] schedule_preempt_disabled (kernel/sched/core.c:2859)
>>> [ 3614.923707] mutex_lock_nested (kernel/locking/mutex.c:585 kernel/locking/mutex.c:623)
>>> [ 3614.924486] ? lru_add_drain_all (mm/swap.c:867)
>>> [ 3614.925211] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2580 kernel/locking/lockdep.c:2622)
>>> [ 3614.925970] ? lru_add_drain_all (mm/swap.c:867)
>>> [ 3614.926692] ? mutex_trylock (kernel/locking/mutex.c:621)
>>> [ 3614.927464] ? mpol_new (mm/mempolicy.c:285)
>>> [ 3614.928044] lru_add_drain_all (mm/swap.c:867)
>>> [ 3614.928608] migrate_prep (mm/migrate.c:64)
>>> [ 3614.929092] SYSC_mbind (mm/mempolicy.c:1188 mm/mempolicy.c:1319)
>>> [ 3614.929619] ? rcu_eqs_exit_common (kernel/rcu/tree.c:735 (discriminator 8))
>>> [ 3614.930318] ? __mpol_equal (mm/mempolicy.c:1304)
>>> [ 3614.930877] ? trace_hardirqs_on (kernel/locking/lockdep.c:2630)
>>> [ 3614.931485] ? syscall_trace_enter_phase2 (arch/x86/kernel/ptrace.c:1592)
>>> [ 3614.932184] SyS_mbind (mm/mempolicy.c:1301)
>>
>> That looks like trinity-c7 is waiting on it too, but later on (after
>> some more listings like this for trinity-c7, probably threads?) we have:
>
> It keeps changing constantly, even in this trace the process is blocking
> on the mutex
I think it's multiple threads of a process with the same name trinity-c7,
and thread 16935 of trinity-c7 does have the mutex locked and is waiting
on something else.

> rather than doing something useful, and in the next trace it's a
> different process.

And the next trace is from the same run, just later? I.e. it doesn't hang
completely, but makes progress so slowly that the 20-minute hang timer
catches it? I'm not sure here.

If it's just too slow, I can imagine it could be simply optimized: if one
thread manages to lock the mutex, it can tell all threads waiting *at that
moment* that they can just return when the first thread is done - it has
already done the necessary work for all of them. But I wonder if this
contention happens in practice. And that certainly doesn't explain any
regression that apparently occurred.
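To sketch what I mean (a userspace pthreads toy, not kernel code -
drain_all(), do_full_drain() and the generation counters are all made-up
names): a caller snapshots a generation counter before contending for the
mutex; if, by the time it gets the lock, a drain that started after its
snapshot has already finished, that drain covered its request and it can
return without repeating the work.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static pthread_mutex_t drain_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_ulong drain_started;    /* bumped when a drain begins */
static unsigned long drain_finished;  /* generation of the last completed
                                         drain, protected by drain_lock */

static void do_full_drain(void)
{
	/* stand-in for flushing the per-CPU work on behalf of everybody */
}

void drain_all(void)
{
	/* Snapshot the generation *before* contending for the lock. */
	unsigned long seen = atomic_load(&drain_started);
	unsigned long gen;

	pthread_mutex_lock(&drain_lock);
	if (drain_finished > seen) {
		/*
		 * A drain that started after our snapshot has already
		 * completed; its work covers our request, so just return.
		 */
		pthread_mutex_unlock(&drain_lock);
		return;
	}
	/* We must do the work ourselves, on behalf of later waiters too. */
	gen = atomic_fetch_add(&drain_started, 1) + 1;
	do_full_drain();
	drain_finished = gen;
	pthread_mutex_unlock(&drain_lock);
}

static void *worker(void *arg)
{
	(void)arg;
	drain_all();
	return NULL;
}

int main(void)
{
	pthread_t t[8];

	for (int i = 0; i < 8; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (int i = 0; i < 8; i++)
		pthread_join(t[i], NULL);
	printf("drains actually performed: %lu\n", drain_finished);
	return 0;
}

With N threads piling up behind one holder, only the first of them
actually drains; the rest fall through the drain_finished check and
return, so the worst case is two drains per burst instead of N.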