On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote: > Ingo Molnar wrote: > > If you'd like to profile this yourself then the lowest-cost way of > > profiling lock contention on -rt is to use the yum kernel and run the > > attached trace-it-lock-prof.c code on the box while your workload is > > in 'steady state' (and is showing those extended idle times): > > > > ./trace-it-lock-prof > trace.txt > > Thanks for the pointer. Will let you know of any relevant traces.
Tim, http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.lock_stat.patch You can also apply this patch to get more precise statistics down to the lock. For example: ... [50, 30, 279 :: 1, 0] {tty_ldisc_try, -, 0} [5, 5, 0 :: 19, 0] {alloc_super, fs/super.c, 76} [5, 5, 3 :: 1, 0] {__free_pages_ok, -, 0} [5728, 862, 156 :: 2, 0] {journal_init_common, fs/jbd/journal.c, 667} [594713, 79020, 4287 :: 60818, 0] {inode_init_once, fs/inode.c, 193} [602, 0, 0 :: 1, 0] {lru_cache_add_active, -, 0} [63, 5, 59 :: 1, 0] {lookup_mnt, -, 0} [6425, 378, 103 :: 24, 0] {initialize_tty_struct, drivers/char/tty_io.c, 3530} [6708, 1, 225 :: 1, 0] {file_move, -, 0} [67, 8, 15 :: 1, 0] {do_lookup, -, 0} [69, 0, 0 :: 1, 0] {exit_mmap, -, 0} [7, 0, 0 :: 1, 0] {uart_set_options, drivers/serial/serial_core.c, 1876} [76, 0, 0 :: 1, 0] {get_zone_pcp, -, 0} [7777, 5, 9 :: 1, 0] {as_work_handler, -, 0} [8689, 0, 0 :: 15, 0] {create_workqueue_thread, kernel/workqueue.c, 474} [89, 7, 6 :: 195, 0] {sighand_ctor, kernel/fork.c, 1474} @contention events = 1791177 @found = 21 Is the output from /proc/lock_stat/contention. First column is the number of contention that will results in a full block of the task, second is the number of times the mutex owner is active on a per cpu run queue the scheduler and third is the number of times Steve Rostedt's ownership handoff code averted a full block. Peter Zijlstra used it initially during his files_lock work. Overhead of the patch is very low since it is only recording stuff in the slow path of the rt-mutex implementation. Writing to that file clears all of the stats for a fresh run with a benchmark. This should give a precise point at which any contention would happen in -rt. In general, -rt should do about as well as the stock kernel minus the overhead of interrupt threads. Since the last release, I've added checks for whether the task is running as "current" on a run queue to see if adaptive spins would be useful in -rt. These new stats show that only a small percentage of events would benefit from the use of adaptive spins in front of a rt- mutex. Any implementation of it would have little impact on the system. It's not the mechanism but the raw MP work itself that contributes to the good MP performance of Linux. Apply and have fun. bill - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/