Hi Waiman,

For the first time in months I just turned off spinlock debugging on my
performance test machine, and got an unpleasant surprise on my standard
inode allocation and reclaim test. I've described this test to you before,
because it's found regressions in your previous lock scaling changes:

http://permalink.gmane.org/gmane.linux.kernel/1768786

This time it is the fsmark run that I use to populate the filesystem that
is demonstrating a locking regression. I've asked you before if you could
add this test to your lock scaling regression test suite; please do it
this time.

Now, the regression. With spinlock debugging turned on, my usual XFS inode
allocation benchmark using fsmark reports performance like this:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0     312594.0          9944159
     0      3200000            0     295668.6         10399679
     0      4800000            0     279026.1         11397617
.....

This has been pretty stable for several releases - it varies +/- a few
percent, but it's pretty much been like this since about 3.2 with
CONFIG_XFS_DEBUG=n, with or without basic spinlock debugging.

When I turned spinlock debugging off on 4.2 to get some perf numbers for a
request from Linus, I got this:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0     114143.9          9597599
     0      3200000            0      95486.9          9460413
     0      4800000            0      93918.2          9784699
....

All 16 CPUs were pegged at 100% cpu usage. I took a quick look at perf:

  67.32%  [kernel]  [k] queued_spin_lock_slowpath
   5.17%  [kernel]  [k] xfs_log_commit_cil
   2.47%  [kernel]  [k] _xfs_buf_find
   1.37%  [kernel]  [k] _raw_spin_lock
....

And then a quick call graph sample to find the lock:

    37.19%    37.19%  [kernel]  [k] queued_spin_lock_slowpath
   - queued_spin_lock_slowpath
      - 99.98% _raw_spin_lock
         - 89.16% xfs_log_commit_cil
            - __xfs_trans_commit
               - 98.48% xfs_trans_commit
                    xfs_create
                    xfs_generic_create
                    xfs_vn_mknod
                    xfs_vn_create
                    vfs_create
                    path_openat
                    do_filp_open
                    do_sys_open
                    sys_open
                    entry_SYSCALL_64_fastpath
                  + __GI___libc_open
               + 1.52% __xfs_trans_roll

This shows that we have catastrophic spinlock contention in the transaction
commit path. That has to be the cil->xc_cil_lock spinlock, as it's the only
spinlock in that path.
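For anyone who hasn't looked at that path: the reason the lock is so hot is
structural. Every transaction commit, on every CPU, funnels its dirty log
items onto a single global CIL list under that one spinlock, so the lock is
taken at the aggregate commit rate of the whole machine. A stripped-down
sketch of the pattern - illustrative only, not the actual fs/xfs/xfs_log_cil.c
code, and the structure/function names here are simplified - looks like this:

#include <linux/list.h>
#include <linux/spinlock.h>

struct log_item {
	struct list_head	li_cil;		/* link in the global CIL list */
	/* ... per-item state ... */
};

struct cil {
	spinlock_t		xc_cil_lock;	/* the lock showing up above */
	struct list_head	xc_cil;		/* one global list of dirty items */
};

/* called once per transaction commit, on every CPU */
static void cil_insert_items(struct cil *cil, struct list_head *tx_items)
{
	struct log_item *lip, *n;

	spin_lock(&cil->xc_cil_lock);
	list_for_each_entry_safe(lip, n, tx_items, li_cil)
		list_move_tail(&lip->li_cil, &cil->xc_cil);
	spin_unlock(&cil->xc_cil_lock);
}

IOWs, the lock acquisition rate scales directly with how fast fsmark can
drive transaction commits across all 16 CPUs.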
And while it's the hot lock in the commit path, turning spinlock debugging
back on (and no other changes) shows that it shouldn't be contended:

   8.92%  [kernel]  [k] _xfs_buf_find
   5.51%  [kernel]  [k] xfs_dir2_node_addname
   3.49%  [kernel]  [k] xfs_dir3_free_hdr_from_disk
   3.45%  [kernel]  [k] do_raw_spin_lock
   3.06%  [kernel]  [k] kmem_cache_alloc
   2.99%  [kernel]  [k] __memcpy
   2.97%  [kernel]  [k] xfs_log_commit_cil
.....

And the call graph:

     4.52%     0.24%  [kernel]  [k] _raw_spin_lock
   - _raw_spin_lock
      + 43.63% xfs_log_commit_cil
      +  9.76% inode_sb_list_add
      +  8.57% list_lru_add
      +  6.37% _xfs_buf_find
      +  6.08% d_alloc
      +  4.18% dput
      +  4.08% __d_instantiate
...

IOWs, about 2% of the cpu usage is lock traffic through cil->xc_cil_lock,
and that's at 3x the locking rate where the queued spinlock becomes
catastrophically contention bound.

To confirm that this is indeed caused by the queued spinlocks, I removed
the spinlock debugging and did this to arch/x86/Kconfig:

-	select ARCH_USE_QUEUED_SPINLOCKS

And the results are:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0     329310.4          9727415
     0      3200000            0     305421.4         10421358
     0      4800000            0     294593.3         11007112
     0      6400000            0     283863.7         12557190
....

So the problem is definitely the queued spinlock...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
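PS: for anyone reading along who hasn't dug into what "queued" actually
means here: contended waiters line up on per-waiter queue nodes and spin on
their own cacheline, with the lock handed off in FIFO order, instead of
everyone hammering the shared lock word. A minimal userspace sketch of that
basic MCS handoff idea - this is not the kernel's qspinlock implementation,
which packs all of this into a single 32-bit lock word, just the concept
it is built on:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* one queue node per waiting thread/CPU */
struct mcs_node {
	_Atomic(struct mcs_node *)	next;
	atomic_bool			locked;	/* set when the lock is handed to us */
};

struct mcs_lock {
	_Atomic(struct mcs_node *)	tail;
};

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
	struct mcs_node *prev;

	atomic_store(&me->next, NULL);
	atomic_store(&me->locked, false);

	/* join the tail of the waiter queue */
	prev = atomic_exchange(&lock->tail, me);
	if (!prev)
		return;				/* lock was free, we own it */

	/* spin on our own node, not on the shared lock word */
	atomic_store(&prev->next, me);
	while (!atomic_load(&me->locked))
		;				/* cpu_relax() in kernel code */
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *me)
{
	struct mcs_node *next = atomic_load(&me->next);

	if (!next) {
		/* no known successor: try to swing the tail back to empty */
		struct mcs_node *expected = me;

		if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
			return;
		/* a successor is mid-enqueue; wait for it to show up */
		while (!(next = atomic_load(&me->next)))
			;
	}
	/* hand the lock to the next waiter in FIFO order */
	atomic_store(&next->locked, true);
}

The FIFO handoff is what gives the queued lock its fairness and keeps the
lock word quiet, but once waiters arrive faster than the lock can be handed
off, every new arrival joins the queue - which is consistent with the 67%
of CPU time the first profile above shows in queued_spin_lock_slowpath.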