Re: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]
On Tue, Jan 02, 2007 at 03:12:34PM -0800, Bill Huey wrote:
> On Tue, Jan 02, 2007 at 02:51:05PM -0800, Chen, Tim C wrote:
> > Bill,
> >
> > I'm having some problems getting this patch to run stably. I'm
> > encountering errors like those in the trace that follows:
>
> It might be the case that the lock isn't known to the lock stats code yet.
> It has some technical overlap with lockdep in that a lock might not be
> known yet, which causes a crash.

The stack trace and code examination reveal that if that structure in the timer code is used before it's initialized by the CPU bringup, it'll cause problems like that crash. I'll look at it later tonight.

bill

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
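The use-before-initialization failure described above can be guarded against by checking a magic/initialized marker before recording any statistics. A minimal userspace sketch of that idea — the names (`lock_stat_note_contention`, the `magic` field) are illustrative here, not the actual patch code:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the per-lock statistics record. */
struct lock_stat {
	unsigned magic;           /* set only once the entry is initialized */
	unsigned long ncontended;
};

#define LOCK_STAT_MAGIC 0x10c57a7u

static void lock_stat_init(struct lock_stat *ls)
{
	ls->ncontended = 0;
	ls->magic = LOCK_STAT_MAGIC;
}

/*
 * Record a contention event, but refuse to touch an entry that was
 * never initialized (e.g. a lock used before CPU bringup set it up).
 * Returns 0 when the event was dropped.
 */
static int lock_stat_note_contention(struct lock_stat *ls)
{
	if (ls == NULL || ls->magic != LOCK_STAT_MAGIC)
		return 0;         /* unknown lock: skip instead of crashing */
	ls->ncontended++;
	return 1;
}
```

An event on an unknown lock is simply dropped rather than dereferenced, which is the difference between a missing data point and the NULL-pointer oops in the trace below.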
Re: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]
On Tue, Jan 02, 2007 at 02:51:05PM -0800, Chen, Tim C wrote:
> Bill,
>
> I'm having some problems getting this patch to run stably. I'm
> encountering errors like those in the trace that follows:
>
> Thanks.
> Tim
>
> Unable to handle kernel NULL pointer dereference at 0008

Yes, those are the reason why I have some aggressive asserts in the code to try to track down the problem. Try this:

http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.1.lock_stat.patch

It's got some cosmetic cleanup in it to make it more Linux-ish instead of me trying to reinvent some kind of Smalltalk system in the kernel. I'm trying to address all of Ingo's complaints about the code, so it's still a work in progress, namely the style issues (I'd like help/suggestions on that) and assert conventions.

It might be the case that the lock isn't known to the lock stats code yet. It has some technical overlap with lockdep in that a lock might not be known yet, which causes a crash.

Try that patch and report back to me what happens.

bill
RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]
Bill Huey (hui) wrote:
> On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote:
>> Ingo Molnar wrote:
>>> If you'd like to profile this yourself then the lowest-cost way of
>>> profiling lock contention on -rt is to use the yum kernel and run
>>> the attached trace-it-lock-prof.c code on the box while your
>>> workload is in 'steady state' (and is showing those extended idle
>>> times):
>>>
>>> ./trace-it-lock-prof > trace.txt
>>
>> Thanks for the pointer. Will let you know of any relevant traces.
>
> Tim,
>
> http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.lock_stat.patch
>
> You can also apply this patch to get more precise statistics down to
> the lock. For example:

Bill,

I'm having some problems getting this patch to run stably. I'm encountering errors like those in the trace that follows:

Thanks.
Tim

Unable to handle kernel NULL pointer dereference at 0008
RIP: [] lock_stat_note_contention+0x12d/0x1c3
PGD 0
Oops: [1] PREEMPT SMP
CPU 1
Modules linked in: autofs4 sunrpc dm_mirror dm_mod video sbs i2c_ec dock button battery ac uhci_hcd ehci_hcd i2d
Pid: 0, comm: swapper Not tainted 2.6.20-rc2-rt2 #4
RIP: 0010:[] [] lock_stat_note_contention+0x12d/0x1c3
RSP: 0018:81013fdb3d28 EFLAGS: 00010097
RAX: 81013fd68018 RBX: 81013fd68000 RCX: RDX: 8026762e
RSI: RDI: 8026762e RBP: 81013fdb3df8 R08: 8100092bab60
R09: 8100092aafc8 R10: 0001 R11: R12: 81013fd68030
R13: R14: 0046 R15: 002728d5ecd0
FS: () GS:81013fd078c0() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0008 CR3: 00201000 CR4: 06e0
Process swapper (pid: 0, threadinfo 81013fdb2000, task 81013fdb14e0)
Stack: 00020001 0100 000e 000e 8100092bc440
Call Trace:
 [] rt_mutex_slowtrylock+0x84/0x9b
 [] rt_mutex_trylock+0x2e/0x30
 [] rt_spin_trylock+0x9/0xb
 [] get_next_timer_interrupt+0x34/0x226
 [] hrtimer_stop_sched_tick+0xb6/0x138
 [] cpu_idle+0x1b/0xdd
 [] start_secondary+0x2ed/0x2f9
---
| preempt count: 0003 ]
| 3-level deep critical section nesting:
.. [] cpu_idle+0xd7/0xdd .[] .. ( <= start_secondary+0x2ed/0x2f9)
.. [] __spin_lock_irqsave+0x18/0x42 .[] .. ( <= rt_mutex_slowtrylock+0x19/0x9b)
.. [] __spin_trylock+0x14/0x4c .[] .. ( <= oops_begin+0x23/0x6f)
skipping trace printing on CPU#1 != -1
Code: 49 8b 45 08 8b 78 18 75 0d 49 8b 04 24 f0 ff 80 94 00 00 00
RIP [] lock_stat_note_contention+0x12d/0x1c3
RSP CR2: 0008
<3>BUG: sleeping function called from invalid context swapper(0) at kernel/rtmutex.c:1312
in_atomic():1 [0002], irqs_disabled():1
Call Trace:
 [] dump_trace+0xbe/0x3cd
 [] show_trace+0x3a/0x58
 [] dump_stack+0x15/0x17
 [] __might_sleep+0x103/0x10a
 [] rt_mutex_lock_with_ip+0x1e/0xac
 [] __rt_down_read+0x49/0x4d
 [] rt_down_read+0xb/0xd
 [] blocking_notifier_call_chain+0x19/0x3f
 [] profile_task_exit+0x15/0x17
 [] do_exit+0x25/0x8de
 [] do_page_fault+0x7d4/0x856
 [] error_exit+0x0/0x84
 [] lock_stat_note_contention+0x12d/0x1c3
 [] rt_mutex_slowtrylock+0x84/0x9b
 [] rt_mutex_trylock+0x2e/0x30
 [] rt_spin_trylock+0x9/0xb
 [] get_next_timer_interrupt+0x34/0x226
 [] hrtimer_stop_sched_tick+0xb6/0x138
 [] cpu_idle+0x1b/0xdd
 [] start_secondary+0x2ed/0x2f9
RE: 2.6.19-rt14 slowdown compared to 2.6.19
Ingo Molnar wrote:
>
> (could you send me the whole trace if you still have it? It would be
> interesting to see a broader snippet from the life of individual java
> threads.)
>
> Ingo

Sure, I'll send it to you separately due to the size of the complete trace.

Tim
Re: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]
On Sat, Dec 30, 2006 at 06:56:08AM -0800, Daniel Walker wrote:
> On Sat, 2006-12-30 at 12:19 +0100, Ingo Molnar wrote:
> >
> > - Documentation/CodingStyle compliance - the code is not ugly per se
> >   but still looks a bit 'alien' - please try to make it look Linuxish;
> >   if I apply this we'll probably stick with it forever. This is the
> >   major reason I haven't applied it yet.
>
> I did some cleanup while reviewing the patch, nothing very exciting but
> it's an attempt to bring it more into the "Linuxish" scope. I didn't
> compile it, so be warned.
>
> There's a lot of ifdef'd code under CONFIG_LOCK_STAT inside rtmutex.c. I
> suspect it would be a benefit to move that all into a header and ifdef
> only in the header.

Ingo and Daniel,

I'll try to make it more Linuxish. It's one of the reasons why I posted it, since I knew it would need some kind of help in that arena and I've been in need of feedback regarding it. Originally, I picked a style that made what I was doing extremely obvious and clear to facilitate development, which is the rationale behind it. I'll make those changes and we can progressively pass it back and forth to see if this passes.

bill
Re: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]
On Sat, 2006-12-30 at 12:19 +0100, Ingo Molnar wrote:
>
> - Documentation/CodingStyle compliance - the code is not ugly per se
>   but still looks a bit 'alien' - please try to make it look Linuxish;
>   if I apply this we'll probably stick with it forever. This is the
>   major reason I haven't applied it yet.

I did some cleanup while reviewing the patch, nothing very exciting but it's an attempt to bring it more into the "Linuxish" scope. I didn't compile it, so be warned.

There's a lot of ifdef'd code under CONFIG_LOCK_STAT inside rtmutex.c. I suspect it would be a benefit to move that all into a header and ifdef only in the header.

Daniel

Signed-Off-By: Daniel Walker <[EMAIL PROTECTED]>

---
 include/linux/lock_stat.h |   25 +++--
 kernel/lock_stat.c        |   34 ++
 2 files changed, 37 insertions(+), 22 deletions(-)

Index: linux-2.6.19/include/linux/lock_stat.h
===
--- linux-2.6.19.orig/include/linux/lock_stat.h
+++ linux-2.6.19/include/linux/lock_stat.h
@@ -26,9 +26,9 @@
 #include

 typedef struct lock_stat {
-	char	function[KSYM_NAME_LEN];
-	int	line;
-	char	*file;
+	char		function[KSYM_NAME_LEN];
+	int		line;
+	char		*file;

 	atomic_t	ncontended;
 	atomic_t	nhandoff;
@@ -52,7 +52,7 @@ struct task_struct;
 #define LOCK_STAT_INIT(field)
 #define LOCK_STAT_INITIALIZER(field) { \
 	__FILE__, __FUNCTION__, __LINE__, \
-	ATOMIC_INIT(0), LIST_HEAD_INIT(field)}
+	ATOMIC_INIT(0), LIST_HEAD_INIT(field) }

 #define LOCK_STAT_NOTE	__FILE__, __FUNCTION__, __LINE__
 #define LOCK_STAT_NOTE_VARS	_file, _function, _line
@@ -84,9 +84,12 @@ extern void lock_stat_sys_init(void);

 #define lock_stat_is_initialized(o) ((unsigned long) (*o)->file)

-extern void lock_stat_note_contention(lock_stat_ref_t *ls, struct task_struct *owner, unsigned long ip, int handoff);
+extern void lock_stat_note_contention(lock_stat_ref_t *ls,
+				      struct task_struct *owner,
+				      unsigned long ip, int handoff);
 extern void lock_stat_print(void);
-extern void lock_stat_scoped_attach(lock_stat_ref_t *_s, LOCK_STAT_NOTE_PARAM_DECL);
+extern void lock_stat_scoped_attach(lock_stat_ref_t *_s,
+				    LOCK_STAT_NOTE_PARAM_DECL);

 #define ksym_strcmp(a, b)	strncmp(a, b, KSYM_NAME_LEN)
 #define ksym_strcpy(a, b)	strncpy(a, b, KSYM_NAME_LEN)
@@ -102,10 +105,11 @@ static inline char * ksym_strdup(const c

 #define LS_INIT(name, h) { \
 	/*.function,*/ .file = h, .line = 1,\
-	.nhandoff = ATOMIC_INIT(0), .ntracked = 0, .ncontended = ATOMIC_INIT(0),\
+	.nhandoff = ATOMIC_INIT(0), .ntracked = 0, \
+	.ncontended = ATOMIC_INIT(0), \
 	.list_head = LIST_HEAD_INIT(name.list_head),\
-	.rb_node.rb_left = NULL, .rb_node.rb_left = NULL \
-	}
+	.rb_node.rb_left = NULL,\
+	.rb_node.rb_left = NULL } \

 #define DECLARE_LS_ENTRY(name) \
 	extern struct lock_stat _lock_stat_##name##_entry
@@ -114,7 +118,8 @@ static inline char * ksym_strdup(const c
  */

 #define DEFINE_LS_ENTRY(name) \
-	struct lock_stat _lock_stat_##name##_entry = LS_INIT(_lock_stat_##name##_entry, #name "_string")
+	struct lock_stat _lock_stat_##name##_entry =\
+		LS_INIT(_lock_stat_##name##_entry, #name "_string")

 DECLARE_LS_ENTRY(d_alloc);
 DECLARE_LS_ENTRY(eventpoll_init_file);

Index: linux-2.6.19/kernel/lock_stat.c
===
--- linux-2.6.19.orig/kernel/lock_stat.c
+++ linux-2.6.19/kernel/lock_stat.c
@@ -71,7 +71,9 @@ static char null_string[] = "";
 static char static_string[] = "-";
 static char special_static_string[] = "-";

-struct lock_stat _lock_stat_null_entry = LS_INIT(_lock_stat_null_entry, null_string);
+struct lock_stat _lock_stat_null_entry =
+	LS_INIT(_lock_stat_null_entry, null_string);
+
 EXPORT_SYMBOL(_lock_stat_null_entry);

 static DEFINE_LS_ENTRY(inline);	/* lock_stat_inline_entry */
@@ -111,10 +113,12 @@ static DEFINE_LS_ENTRY(tcp_init_1);
 static DEFINE_LS_ENTRY(tcp_init_2);
 */

-/* I should never have to create more entries that this since I audited the kernel
- * and found out that there are only ~1500 or so places in the kernel where these
- * rw/spinlocks are initialized. Use the initialization points as a hash value to
- * look up the backing objects */
+/*
+ * I should never have to create more entries that this since I
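The comment in the patch above describes keying lock-stat entries by their lock-initialization point. A toy illustration of that idea — hashing a (file, line) pair into a small chained table. All names here are made up for the sketch and are not taken from the lock_stat patch:

```c
#include <assert.h>
#include <string.h>

#define NBUCKETS 64

struct ls_entry {
	const char *file;
	int line;
	unsigned long ncontended;
	struct ls_entry *next;    /* chain for colliding init points */
};

static struct ls_entry *buckets[NBUCKETS];

/* Hash an initialization point (file string plus line number). */
static unsigned ls_hash(const char *file, int line)
{
	unsigned h = (unsigned)line;

	while (*file)
		h = h * 31 + (unsigned char)*file++;
	return h % NBUCKETS;
}

/*
 * Find the stats entry backing an init point; if none exists yet,
 * link in the caller-provided 'fresh' entry. Every lock initialized
 * at the same file:line shares one entry.
 */
static struct ls_entry *ls_lookup(struct ls_entry *fresh,
				  const char *file, int line)
{
	unsigned h = ls_hash(file, line);
	struct ls_entry *e;

	for (e = buckets[h]; e; e = e->next)
		if (e->line == line && strcmp(e->file, file) == 0)
			return e;

	fresh->file = file;
	fresh->line = line;
	fresh->ncontended = 0;
	fresh->next = buckets[h];
	buckets[h] = fresh;
	return fresh;
}
```

With roughly 1500 init points in the audited kernel, 64 buckets keeps the chains short; the real patch uses an rb_node per entry, suggesting a tree rather than chains.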
Re: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]
* Bill Huey <[EMAIL PROTECTED]> wrote:
> On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote:
> > Ingo Molnar wrote:
> > > If you'd like to profile this yourself then the lowest-cost way of
> > > profiling lock contention on -rt is to use the yum kernel and run the
> > > attached trace-it-lock-prof.c code on the box while your workload is
> > > in 'steady state' (and is showing those extended idle times):
> > >
> > > ./trace-it-lock-prof > trace.txt
> >
> > Thanks for the pointer. Will let you know of any relevant traces.
>
> Tim,
>
> http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.lock_stat.patch
>
> You can also apply this patch to get more precise statistics down to
> the lock. [...]

your patch looks pretty OK to me in principle. A couple of suggestions to make it more mergeable:

- instead of BUG_ON()s please use DEBUG_LOCKS_WARN_ON() and make sure
  the code is never entered again once an assertion has triggered.
  Pass down a return result of '0' to signal failure. See
  kernel/lockdep.c for how to do this. One thing we don't need is bugs
  in instrumentation bringing down a machine.

- remove dead (#if 0) code

- Documentation/CodingStyle compliance - the code is not ugly per se
  but still looks a bit 'alien' - please try to make it look Linuxish;
  if I apply this we'll probably stick with it forever. This is the
  major reason I haven't applied it yet.

- the xfs/wrap_lock change looks bogus - the lock is initialized
  already. What am I missing?

	Ingo
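The convention Ingo points at — warn once, then leave the instrumentation permanently disabled so a broken assert can't take the machine down — can be sketched as follows. This is a userspace approximation of the lockdep/debug_locks pattern; the names are simplified stand-ins, not the kernel's actual macros:

```c
#include <assert.h>
#include <stdio.h>

static int debug_locks = 1;	/* is the instrumentation still enabled? */

/* Turn instrumentation off; return 1 only on the first transition. */
static int debug_locks_off(void)
{
	if (debug_locks) {
		debug_locks = 0;
		return 1;
	}
	return 0;
}

/* Warn only for the first failing assertion, then stay silent. */
static int debug_warn_on(int cond, const char *expr)
{
	if (cond && debug_locks_off())
		fprintf(stderr, "lock-stat warning: %s\n", expr);
	return cond;
}

#define DEBUG_WARN_ON(c) debug_warn_on(!!(c), #c)

/*
 * An instrumentation hook in this style: a failed check disables all
 * future recording and signals failure with '0' instead of panicking
 * the box the way a BUG_ON() would.
 */
static int note_contention(int broken_state)
{
	if (!debug_locks)
		return 0;	/* disabled after an earlier failure */
	if (DEBUG_WARN_ON(broken_state))
		return 0;	/* signal failure to the caller */
	return 1;		/* event recorded */
}
```

The key property is that after the first failure every later call becomes a cheap no-op, so a single bad assertion costs one warning line rather than an oops.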
Re: 2.6.19-rt14 slowdown compared to 2.6.19
* Chen, Tim C <[EMAIL PROTECTED]> wrote:
> Ingo Molnar wrote:
> >
> > If you'd like to profile this yourself then the lowest-cost way of
> > profiling lock contention on -rt is to use the yum kernel and run the
> > attached trace-it-lock-prof.c code on the box while your workload is
> > in 'steady state' (and is showing those extended idle times):
> >
> > ./trace-it-lock-prof > trace.txt
> >
> > this captures up to 1 second worth of system activity, on the current
> > CPU. Then you can construct the histogram via:
> >
> > grep -A 1 ' __schedule()<-' trace.txt | cut -d: -f2- | sort |
> > uniq -c | sort -n > prof.txt
>
> I did lock profiling on Volanomark as suggested and obtained the
> profile that is listed below.

thanks - this is really useful!

>  246 __sched_text_start()<-schedule()<-rt_spin_lock_slowlock()<-__lock_text_start()
>  264 rt_mutex_slowunlock()<-rt_mutex_unlock()<-rt_up_read()<-(-1)()
>  334 __sched_text_start()<-schedule()<-posix_cpu_timers_thread()<-kthread()
>  437 __sched_text_start()<-schedule()<-do_futex()<-sys_futex()
>  467 (-1)()<-(0)()<-(0)()<-(0)()
>  495 __sched_text_start()<-preempt_schedule()<-__spin_unlock_irqrestore()<-rt_mutex_adjust_prio()
>  497 __netif_rx_schedule()<-netif_rx()<-loopback_xmit()<-(-1)()
>  499 __sched_text_start()<-schedule()<-schedule_timeout()<-sk_wait_data()
>  500 tcp_recvmsg()<-sock_common_recvmsg()<-sock_recvmsg()<-(-1)()
>  503 __rt_down_read()<-rt_down_read()<-do_futex()<-(-1)()
> 1160 __sched_text_start()<-schedule()<-ksoftirqd()<-kthread()
> 1433 __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
> 1497 child_rip()<-(-1)()<-(0)()<-(0)()
> 1936 __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
>
> Looks like the idle time I saw was due to lock contention during calls
> to futex_wake, which requires acquisition of current->mm->mmap_sem.
> Many of the java threads share an mm, resulting in concurrent access to
> a common mm. [...]

ah. This explains why I'm not seeing this bad contention in a comparable workload (hackbench.c): because hackbench uses processes, not threads.

> [...] Looks like under the rt case there is no special treatment of read
> locking, so the read lock accesses are contended under __rt_down_read.
> For the non-rt case, __down_read makes the distinction for read lock
> access and the read lockings do not contend.

yeah, makes sense. I'll do something about this.

> Things are made worse here as this delayed waking up of processes
> blocked on the futex. See also a snippet of the latency_trace below.
>
> -0 2D..2 5821us!: thread_return (150 20)
> -0 2DN.1 6278us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
> -0 2DN.1 6278us : (0)()<-(0)()<-(0)()<-(0)()
> java-6648 2D..2 6280us+: thread_return <-0> (20 -4)
> java-6648 2D..1 6296us : try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
> java-6648 2D..1 6296us : rt_mutex_unlock()<-rt_up_read()<-do_futex()<-(-1)()
> java-6648 2D..2 6297us : effective_prio <<...>-6673> (-4 -4)
> java-6648 2D..2 6297us : __activate_task <<...>-6673> (-4 1)
> java-6648 2 6297us < (-11)
> java-6648 2 6298us+> sys_futex (00afaf50 0001 0001)
> java-6648 2...1 6315us : __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
> java-6648 2...1 6315us : __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
> java-6648 2D..2 6316us+: deactivate_task (-4 1)
> -0 2D..2 6318us+: thread_return (-4 20)
> -0 2DN.1 6327us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
> -0 2DN.1 6328us+: (0)()<-(0)()<-(0)()<-(0)()
> java-6629 2D..2 6330us+: thread_return <-0> (20 -4)
> java-6629 2D..1 6347us : try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
> java-6629 2D..1 6347us : rt_mutex_unlock()<-rt_up_read()<-futex_wake()<-(-1)()
> java-6629 2D..2 6348us : effective_prio (-4 -4)
> java-6629 2D..2 6349us : __activate_task (-4 1)
> java-6629 2 6350us+< (0)
> java-6629 2 6352us+> sys_futex (00afc1dc 0001 0001)
> java-6629 2...1 6368us : __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
> java-6629 2...1 6368us : __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
> java-6629 2D..2 6369us+: deactivate_task (-4 1)
> -0 2D..2 6404us!: thread_return (-4 20)
> -0 2DN.1 6584us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()

indeed - basically the mm semaphore is a common resource here. I suspect you'll see somewhat better numbers by using idle=poll or idle=mwait (or are you using those options already?).

(could you send me the whole trace if you still have it? It would be interesting to see a broader snippet from the life of individual java threads.)

	Ingo
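The read-lock behavior discussed above — mainline's `__down_read` lets readers share the lock, while -rt's `__rt_down_read` funnels everything through a single rt_mutex — can be illustrated with a back-of-the-envelope model. This is purely conceptual arithmetic, not kernel code:

```c
#include <assert.h>

enum lock_model { RWSEM, RT_MUTEX };

/*
 * How many of ntasks simultaneous read-acquirers actually run
 * concurrently: a mainline rwsem admits all readers at once, while an
 * rt_mutex-backed down_read() admits only the single current owner,
 * so the other ntasks-1 readers block and reschedule.
 */
static int concurrent_readers(enum lock_model model, int ntasks)
{
	return model == RWSEM ? ntasks : 1;
}

/*
 * Elapsed time for ntasks readers that each hold the lock for 'hold'
 * time units: full overlap under a rwsem, full serialization under an
 * rt_mutex. The serialized case is the magnification effect Ingo
 * describes elsewhere in the thread - contention that is invisible on
 * mainline shows up as a long convoy (and idle time) on -rt.
 */
static int read_side_elapsed(enum lock_model model, int ntasks, int hold)
{
	return model == RWSEM ? hold : ntasks * hold;
}
```

For the Volanomark case — on the order of a hundred java threads all taking `mmap_sem` for read in `futex_wake` — the model says the read side degrades from one lock-hold time to roughly a lock-hold time per thread, which matches the `__rt_down_read` entries dominating the histogram.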
RE: 2.6.19-rt14 slowdown compared to 2.6.19
Ingo Molnar wrote:
>
> If you'd like to profile this yourself then the lowest-cost way of
> profiling lock contention on -rt is to use the yum kernel and run the
> attached trace-it-lock-prof.c code on the box while your workload is
> in 'steady state' (and is showing those extended idle times):
>
> ./trace-it-lock-prof > trace.txt
>
> this captures up to 1 second worth of system activity, on the current
> CPU. Then you can construct the histogram via:
>
> grep -A 1 ' __schedule()<-' trace.txt | cut -d: -f2- | sort |
> uniq -c | sort -n > prof.txt

I did lock profiling on Volanomark as suggested and obtained the profile that is listed below.

 246 __sched_text_start()<-schedule()<-rt_spin_lock_slowlock()<-__lock_text_start()
 264 rt_mutex_slowunlock()<-rt_mutex_unlock()<-rt_up_read()<-(-1)()
 334 __sched_text_start()<-schedule()<-posix_cpu_timers_thread()<-kthread()
 437 __sched_text_start()<-schedule()<-do_futex()<-sys_futex()
 467 (-1)()<-(0)()<-(0)()<-(0)()
 495 __sched_text_start()<-preempt_schedule()<-__spin_unlock_irqrestore()<-rt_mutex_adjust_prio()
 497 __netif_rx_schedule()<-netif_rx()<-loopback_xmit()<-(-1)()
 499 __sched_text_start()<-schedule()<-schedule_timeout()<-sk_wait_data()
 500 tcp_recvmsg()<-sock_common_recvmsg()<-sock_recvmsg()<-(-1)()
 503 __rt_down_read()<-rt_down_read()<-do_futex()<-(-1)()
1160 __sched_text_start()<-schedule()<-ksoftirqd()<-kthread()
1433 __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
1497 child_rip()<-(-1)()<-(0)()<-(0)()
1936 __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()

Looks like the idle time I saw was due to lock contention during calls to futex_wake, which requires acquisition of current->mm->mmap_sem. Many of the java threads share an mm, resulting in concurrent access to a common mm. Looks like under the rt case there is no special treatment of read locking, so the read lock accesses are contended under __rt_down_read. For the non-rt case, __down_read makes the distinction for read lock access and the read lockings do not contend.

Things are made worse here as this delayed waking up of processes blocked on the futex. See also a snippet of the latency_trace below.

-0 2D..2 5821us!: thread_return (150 20)
-0 2DN.1 6278us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
-0 2DN.1 6278us : (0)()<-(0)()<-(0)()<-(0)()
java-6648 2D..2 6280us+: thread_return <-0> (20 -4)
java-6648 2D..1 6296us : try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
java-6648 2D..1 6296us : rt_mutex_unlock()<-rt_up_read()<-do_futex()<-(-1)()
java-6648 2D..2 6297us : effective_prio <<...>-6673> (-4 -4)
java-6648 2D..2 6297us : __activate_task <<...>-6673> (-4 1)
java-6648 2 6297us < (-11)
java-6648 2 6298us+> sys_futex (00afaf50 0001 0001)
java-6648 2...1 6315us : __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
java-6648 2...1 6315us : __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
java-6648 2D..2 6316us+: deactivate_task (-4 1)
-0 2D..2 6318us+: thread_return (-4 20)
-0 2DN.1 6327us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
-0 2DN.1 6328us+: (0)()<-(0)()<-(0)()<-(0)()
java-6629 2D..2 6330us+: thread_return <-0> (20 -4)
java-6629 2D..1 6347us : try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
java-6629 2D..1 6347us : rt_mutex_unlock()<-rt_up_read()<-futex_wake()<-(-1)()
java-6629 2D..2 6348us : effective_prio (-4 -4)
java-6629 2D..2 6349us : __activate_task (-4 1)
java-6629 2 6350us+< (0)
java-6629 2 6352us+> sys_futex (00afc1dc 0001 0001)
java-6629 2...1 6368us : __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
java-6629 2...1 6368us : __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
java-6629 2D..2 6369us+: deactivate_task (-4 1)
-0 2D..2 6404us!: thread_return (-4 20)
-0 2DN.1 6584us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()

Thanks.

Tim
[PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]
On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote:
> Ingo Molnar wrote:
> > If you'd like to profile this yourself then the lowest-cost way of
> > profiling lock contention on -rt is to use the yum kernel and run the
> > attached trace-it-lock-prof.c code on the box while your workload is
> > in 'steady state' (and is showing those extended idle times):
> >
> > ./trace-it-lock-prof > trace.txt
>
> Thanks for the pointer. Will let you know of any relevant traces.

Tim,

http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.lock_stat.patch

You can also apply this patch to get more precise statistics down to the lock. For example:

...
[50, 30, 279 :: 1, 0]	{tty_ldisc_try, -, 0}
[5, 5, 0 :: 19, 0]	{alloc_super, fs/super.c, 76}
[5, 5, 3 :: 1, 0]	{__free_pages_ok, -, 0}
[5728, 862, 156 :: 2, 0]	{journal_init_common, fs/jbd/journal.c, 667}
[594713, 79020, 4287 :: 60818, 0]	{inode_init_once, fs/inode.c, 193}
[602, 0, 0 :: 1, 0]	{lru_cache_add_active, -, 0}
[63, 5, 59 :: 1, 0]	{lookup_mnt, -, 0}
[6425, 378, 103 :: 24, 0]	{initialize_tty_struct, drivers/char/tty_io.c, 3530}
[6708, 1, 225 :: 1, 0]	{file_move, -, 0}
[67, 8, 15 :: 1, 0]	{do_lookup, -, 0}
[69, 0, 0 :: 1, 0]	{exit_mmap, -, 0}
[7, 0, 0 :: 1, 0]	{uart_set_options, drivers/serial/serial_core.c, 1876}
[76, 0, 0 :: 1, 0]	{get_zone_pcp, -, 0}
[, 5, 9 :: 1, 0]	{as_work_handler, -, 0}
[8689, 0, 0 :: 15, 0]	{create_workqueue_thread, kernel/workqueue.c, 474}
[89, 7, 6 :: 195, 0]	{sighand_ctor, kernel/fork.c, 1474}

@contention events = 1791177
@found = 21

This is the output from /proc/lock_stat/contention. The first column is the number of contentions that resulted in a full block of the task, the second is the number of times the mutex owner was active on a per-CPU run queue according to the scheduler, and the third is the number of times Steve Rostedt's ownership handoff code averted a full block. Peter Zijlstra used it initially during his files_lock work.

Overhead of the patch is very low since it only records in the slow path of the rt-mutex implementation. Writing to that file clears all of the stats for a fresh run with a benchmark. This should give a precise point at which any contention happens in -rt. In general, -rt should do about as well as the stock kernel minus the overhead of interrupt threads.

Since the last release, I've added checks for whether the task is running as "current" on a run queue to see if adaptive spins would be useful in -rt. These new stats show that only a small percentage of events would benefit from the use of adaptive spins in front of an rt-mutex, so any implementation of it would have little impact on the system. It's not the mechanism but the raw MP work itself that contributes to the good MP performance of Linux.

Apply and have fun.

bill
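The `/proc/lock_stat/contention` lines shown above follow a regular shape — `[blocked, owner-running, handoffs :: n, m] {symbol, file, line}` — so they are easy to post-process. A small parser sketch, inferred purely from the sample output in this mail (the meaning of the two fields after `::` is not stated there, so they are read and discarded as an assumption):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct contention_rec {
	unsigned long blocked;   /* contentions that fully blocked the task */
	unsigned long owner_ran; /* owner was active on a per-CPU run queue */
	unsigned long handoffs;  /* full blocks averted by ownership handoff */
	char symbol[64];
	char file[64];
	int line;
};

/*
 * Parse one line such as:
 *   [5728, 862, 156 :: 2, 0] {journal_init_common, fs/jbd/journal.c, 667}
 * Returns 1 on success, 0 on a malformed line.
 */
static int parse_contention_line(const char *s, struct contention_rec *r)
{
	unsigned long n, m;	/* trailing bracket fields; meaning unknown */

	if (sscanf(s, "[%lu, %lu, %lu :: %lu, %lu] {%63[^,], %63[^,], %d}",
		   &r->blocked, &r->owner_ran, &r->handoffs,
		   &n, &m, r->symbol, r->file, &r->line) != 8)
		return 0;
	return 1;
}
```

Sorting parsed records by the first column reproduces the "most contended init points first" view that the mail's analysis is based on.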
RE: 2.6.19-rt14 slowdown compared to 2.6.19
Ingo Molnar wrote:
>
> cool - thanks for the feedback! Running the 64-bit kernel, right?

Yes, the 64-bit kernel was used.

> while some slowdown is to be expected, did idle time increase
> significantly in each case?

Volanomark and Re-Aim7 ran close to 0% idle time on the 2.6.19 kernel. Idle time increased significantly for Volanomark (to 60% idle) and Re-Aim7 (to 20% idle) with the rt kernel. For netperf, the system was 60% idle for both the 2.6.19 and rt kernels, and the change in idle time was not significant.

> If yes then this is the effect of lock contention. Lock contention
> effects are 'magnified' by PREEMPT_RT. For example, if you run a
> 128-thread workload where all threads use the same lock, then the -rt
> kernel can act as if it were a 128-way box (!). By running -rt you'll
> see scalability problems a lot sooner than on real hardware. In other
> words: PREEMPT_RT in essence simulates the scalability behavior of up
> to an infinite amount of CPUs. (with the exception of cachemiss
> emulation ;) [the effect is not this precise, but that's the rough
> trend]

Turning off PREEMPT_RT for the 2.6.20-rc2-rt0 kernel restored most of the performance of Volanomark and Re-Aim7; idle time is close to 0%. So the benchmarks with a large number of threads are affected more by PREEMPT_RT. For netperf TCP streaming, performance improved from 40% down to 20% down relative to the 2.6.20-rc2 kernel. There are only a server process and a client process for netperf, so the underlying reason for the change in performance is probably different.

> If you'd like to profile this yourself then the lowest-cost way of
> profiling lock contention on -rt is to use the yum kernel and run the
> attached trace-it-lock-prof.c code on the box while your workload is
> in 'steady state' (and is showing those extended idle times):
>
> ./trace-it-lock-prof > trace.txt

Thanks for the pointer. Will let you know of any relevant traces.

Thanks.

Tim
Re: 2.6.19-rt14 slowdown compared to 2.6.19
* Chen, Tim C <[EMAIL PROTECTED]> wrote:
> Ingo,
>
> We did some benchmarking on 2.6.19-rt14, compared it with the 2.6.19
> kernel and noticed several slowdowns. The test machine is a 2-socket
> Woodcrest machine with your default configuration.

cool - thanks for the feedback! Running the 64-bit kernel, right?

> Netperf TCP Streaming was slower by 40% (1 server and 1 client each
> bound to separate cpu cores on different sockets, network loopback
> mode was used).
>
> Volanomark was slower by 80% (Server and Clients communicate with
> network loopback mode. Idle time goes from 1% to 60%)
>
> Re-Aim7 was slower by 40% (idle time goes from 0% to 20%)
>
> Wonder if you have any suggestions on what could cause the slowdown.
> We've tried disabling CONFIG_NO_HZ and it didn't help much.

while some slowdown is to be expected, did idle time increase significantly in each case? If yes then this is the effect of lock contention. Lock contention effects are 'magnified' by PREEMPT_RT. For example, if you run a 128-thread workload where all threads use the same lock, then the -rt kernel can act as if it were a 128-way box (!). By running -rt you'll see scalability problems a lot sooner than on real hardware. In other words: PREEMPT_RT in essence simulates the scalability behavior of up to an infinite amount of CPUs. (with the exception of cachemiss emulation ;) [the effect is not this precise, but that's the rough trend]

the good news: any fix for such problems in -rt is useful for the upstream kernel too, because it improves scalability on real hardware as well. The bad news: this work hasn't been finished yet ;-) We've fixed a couple of contention points discovered by -rt this way; in particular Peter's excellent scalability patches at:

http://programming.kicks-ass.net/kernel-patches/

were done based on lock-contention profiling feedback. I've included the lockless pagecache and the s_files scalability patches, which were very useful already. The lru_cache_add one is the next one for me to try, plus on i686 -rt kernels highmem locking hurts, hence the highmem_rewrite.patch (not in -rt yet either). Plus there's the really exciting concurrent-pagecache patch-queue from Peter, which will be interesting to try too.

If you'd like to profile this yourself then the lowest-cost way of profiling lock contention on -rt is to use the yum kernel and run the attached trace-it-lock-prof.c code on the box while your workload is in 'steady state' (and is showing those extended idle times):

./trace-it-lock-prof > trace.txt

this captures up to 1 second worth of system activity, on the current CPU. Then you can construct the histogram via:

grep -A 1 ' __schedule()<-' trace.txt | cut -d: -f2- | sort | uniq -c | sort -n > prof.txt

I just did this for "hackbench 20", and it gave me:

  38 kfree()<-skb_release_data()<-kfree_skbmem()<-(-1)()
  40 kmem_cache_free()<-kfree_skbmem()<-__kfree_skb()<-(-1)()
  47 __schedule()<-preempt_schedule_irq()<-retint_kernel()<-__spin_unlock_irqrestore()
 173 rt_read_lock()<-sock_def_readable()<-unix_stream_sendmsg()<-(-1)()
 173 __schedule()<-schedule()<-rt_spin_lock_slowlock()<-__lock_text_start()
 274 __schedule()<-schedule()<-posix_cpu_timers_thread()<-kthread()
 277 __schedule()<-schedule()<-ksoftirqd()<-kthread()
 365 kmem_cache_alloc_node()<-__alloc_skb()<-sock_alloc_send_skb()<-(-1)()
 366 kmem_cache_alloc_node()<-__kmalloc_node()<-__alloc_skb()<-(-1)()
 814 __schedule()<-schedule()<-rt_spin_lock_slowlock()<-rt_spin_lock()
1819 __schedule()<-schedule()<-int_careful()<-()
2346 __schedule()<-schedule()<-schedule_timeout()<-unix_stream_recvmsg()
2346 sock_aio_read()<-do_sync_read()<-vfs_read()<-(-1)()

(note that the output mixes the two lines of symbolic stack backtrace entries generated by the tracer, so the last two lines, for example, are for the same contention/preemption point.)

this means that the biggest contention points (or natural rescheduling points) in this workload are sock_aio_read()->unix_stream_recvmsg() - not unexpected. But there's also ~10% of contention in the kmalloc code, meaning that for this workload the per-CPU slabs are probably set a bit too low. There's also ~5% of contention from sock_def_readable()'s use of an rwlock.

with more CPUs and more tasks you'll see these effects in a more drastic way.

	Ingo

/*
 * Copyright (C) 2005, Ingo Molnar <[EMAIL PROTECTED]>
 *
 * user-triggered tracing.
 *
 * The -rt kernel has a built-in kernel tracer, which will trace
 * all kernel function calls (and a couple of special events as well),
 * by using a build-time gcc feature that instruments all kernel
 * functions.
 *
 * The tracer is highly automated for a number of latency tracing purposes,
 * but it can also be switched into 'user-triggered' mode, which is a
 * half-automatic tracing mode where userspace apps start and stop the
 * t
Re: 2.6.19-rt14 slowdown compared to 2.6.19
Chen, Tim C wrote:
> Ingo,
>
> We did some benchmarking on 2.6.19-rt14, compared it with 2.6.19
> kernel and noticed several slowdowns. The test machine is a 2 socket
> woodcrest machine with your default configuration.
>
> Netperf TCP Streaming was slower by 40% (1 server and 1 client
> each bound to separate cpu cores on different socket, network
> loopback mode was used).
>
> Volanomark was slower by 80% (Server and Clients communicate with
> network loopback mode. Idle time goes from 1% to 60%)
>
> Re-Aim7 was slower by 40% (idle time goes from 0% to 20%)
>
> Wonder if you have any suggestions on what could cause the slowdown.
> We've tried disabling CONFIG_NO_HZ and it didn't help much.
>
> Thanks.
>
> Tim

Take a look at this. Not sure if this is the same problem, but it looks like a candidate. I can definitely say that some latencies I have seen in my recent testing have gone away in the last few versions, 2.6.20-rc1-rt{3,4}.

--
kr
Re: 2.6.19-rt14 slowdown compared to 2.6.19
Chen, Tim C wrote:
> Ingo,
>
> We did some benchmarking on 2.6.19-rt14, compared it with 2.6.19
> kernel and noticed several slowdowns. The test machine is a 2 socket
> woodcrest machine with your default configuration.
>
> Netperf TCP Streaming was slower by 40% (1 server and 1 client
> each bound to separate cpu cores on different socket, network
> loopback mode was used).
>
> Volanomark was slower by 80% (Server and Clients communicate with
> network loopback mode. Idle time goes from 1% to 60%)
>
> Re-Aim7 was slower by 40% (idle time goes from 0% to 20%)
>
> Wonder if you have any suggestions on what could cause the slowdown.
> We've tried disabling CONFIG_NO_HZ and it didn't help much.
>
> Thanks.
>
> Tim

Take a look at this. Not sure if this is the same problem, but it looks like a candidate. I can definitely say that some latencies I have seen in my recent testing have gone away in the last few versions, 2.6.20-rc1-rt{3,4}.

Sorry I missed the URL the first time around.

http://marc.theaimsgroup.com/?l=linux-kernel&m=116670388527349&w=2

--
kr
Re: 2.6.19-rt14 slowdown compared to 2.6.19
On Fri, 2006-12-22 at 13:39 -0800, Chen, Tim C wrote:
> Wonder if you have any suggestions on what could cause the slowdown.
> We've tried disabling CONFIG_NO_HZ and it didn't help much.

Did you compare PREEMPT_RT with vanilla PREEMPT? Or one version of PREEMPT_RT vs. another?

Daniel