Re: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]

2007-01-02 Thread hui
On Tue, Jan 02, 2007 at 03:12:34PM -0800, Bill Huey wrote:
> On Tue, Jan 02, 2007 at 02:51:05PM -0800, Chen, Tim C wrote:
> > Bill,
> > 
> > I'm having some problems getting this patch to run stably.  I'm
> > encountering errors like those in the trace that follows:
> 
> It might be the case that the lock isn't known to the lock stat code yet.
> It has some technical overlap with lockdep, in that a lock might not be
> known yet and that is what is causing the crash.

The stack trace and code examination reveal that if that structure in the
timer code is used before it's initialized by CPU bringup, it'll cause
problems like that crash. I'll look at it later tonight.

bill



Re: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]

2007-01-02 Thread hui
On Tue, Jan 02, 2007 at 02:51:05PM -0800, Chen, Tim C wrote:
> Bill,
> 
> I'm having some problems getting this patch to run stably.  I'm
> encountering errors like those in the trace that follows:
> 
> Thanks.
> Tim
> 
> Unable to handle kernel NULL pointer dereference at 0008

Yes, those are the reason why I have some aggressive asserts in the code
to try to track down the problem. Try this:

http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.1.lock_stat.patch

It's got some cosmetic cleanup in it to make it more Linux-ish, instead
of me trying to reinvent some kind of Smalltalk system in the kernel. I'm
trying to address all of Ingo's complaints about the code, so it's still a
work in progress, namely the style issues (I'd like help/suggestions on
those) and the assert conventions.

It might be the case that the lock isn't known to the lock stat code yet.
It has some technical overlap with lockdep, in that a lock might not be
known yet and that is what is causing the crash.
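
Roughly along these lines - an illustrative sketch only, reusing names from
the lock_stat.h interface in the patch (lock_stat_ref_t, ncontended,
nhandoff); the real fix may well look different:

/*
 * Illustrative sketch: bail out of the statistics path when the lock has
 * no initialized lock_stat backing object yet (e.g. a lock that is taken
 * before the lock_stat code has registered it), instead of dereferencing
 * a NULL or uninitialized reference.  Not the actual patch.
 */
#include <linux/lock_stat.h>	/* header added by the lock_stat patch */

void lock_stat_note_contention_sketch(lock_stat_ref_t *ls,
				      struct task_struct *owner,
				      unsigned long ip, int handoff)
{
	if (!ls || !*ls || !lock_stat_is_initialized(ls))
		return;			/* unknown lock: skip the stats */

	atomic_inc(&(*ls)->ncontended);
	if (handoff)
		atomic_inc(&(*ls)->nhandoff);
}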

Try that patch and report back to me what happens.

bill



RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]

2007-01-02 Thread Chen, Tim C
Bill Huey (hui) wrote:
> On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote:
>> Ingo Molnar wrote:
>>> If you'd like to profile this yourself then the lowest-cost way of
>>> profiling lock contention on -rt is to use the yum kernel and run
>>> the attached trace-it-lock-prof.c code on the box while your
>>> workload is in 'steady state' (and is showing those extended idle
>>> times): 
>>> 
>>>   ./trace-it-lock-prof > trace.txt
>> 
>> Thanks for the pointer.  Will let you know of any relevant traces.
> 
> Tim,
>
> http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.lock_stat.patch
> 
> You can also apply this patch to get more precise statistics down to
> the lock. For example:
> 
Bill,

I'm having some problems getting this patch to run stably.  I'm
encountering errors like those in the trace that follows:

Thanks.
Tim



Unable to handle kernel NULL pointer dereference at 0008
RIP:
 [] lock_stat_note_contention+0x12d/0x1c3
PGD 0
Oops:  [1] PREEMPT SMP
CPU 1
Modules linked in: autofs4 sunrpc dm_mirror dm_mod video sbs i2c_ec dock
button battery ac uhci_hcd ehci_hcd i2d
Pid: 0, comm: swapper Not tainted 2.6.20-rc2-rt2 #4

RIP: 0010:[]  []
lock_stat_note_contention+0x12d/0x1c3
RSP: 0018:81013fdb3d28  EFLAGS: 00010097
RAX: 81013fd68018 RBX: 81013fd68000 RCX: 
RDX: 8026762e RSI:  RDI: 8026762e
RBP: 81013fdb3df8 R08: 8100092bab60 R09: 8100092aafc8
R10: 0001 R11:  R12: 81013fd68030
R13:  R14: 0046 R15: 002728d5ecd0
FS:  () GS:81013fd078c0()
knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0008 CR3: 00201000 CR4: 06e0
Process swapper (pid: 0, threadinfo 81013fdb2000, task
81013fdb14e0)
Stack:   00020001 

   0100 000e
 000e   8100092bc440
Call Trace:
 [] rt_mutex_slowtrylock+0x84/0x9b
 [] rt_mutex_trylock+0x2e/0x30
 [] rt_spin_trylock+0x9/0xb
 [] get_next_timer_interrupt+0x34/0x226
 [] hrtimer_stop_sched_tick+0xb6/0x138
 [] cpu_idle+0x1b/0xdd
 [] start_secondary+0x2ed/0x2f9

---
| preempt count: 0003 ]
| 3-level deep critical section nesting:

.. []  cpu_idle+0xd7/0xdd
.[] ..   ( <= start_secondary+0x2ed/0x2f9)
.. []  __spin_lock_irqsave+0x18/0x42
.[] ..   ( <= rt_mutex_slowtrylock+0x19/0x9b)
.. []  __spin_trylock+0x14/0x4c
.[] ..   ( <= oops_begin+0x23/0x6f)

skipping trace printing on CPU#1 != -1

Code: 49 8b 45 08 8b 78 18 75 0d 49 8b 04 24 f0 ff 80 94 00 00 00
RIP  [] lock_stat_note_contention+0x12d/0x1c3
 RSP 
CR2: 0008
 <3>BUG: sleeping function called from invalid context swapper(0) at
kernel/rtmutex.c:1312
in_atomic():1 [0002], irqs_disabled():1

Call Trace:
 [] dump_trace+0xbe/0x3cd
 [] show_trace+0x3a/0x58
 [] dump_stack+0x15/0x17
 [] __might_sleep+0x103/0x10a
 [] rt_mutex_lock_with_ip+0x1e/0xac
 [] __rt_down_read+0x49/0x4d
 [] rt_down_read+0xb/0xd
 [] blocking_notifier_call_chain+0x19/0x3f
 [] profile_task_exit+0x15/0x17
 [] do_exit+0x25/0x8de
 [] do_page_fault+0x7d4/0x856
 [] error_exit+0x0/0x84
 [] lock_stat_note_contention+0x12d/0x1c3
 [] rt_mutex_slowtrylock+0x84/0x9b
 [] rt_mutex_trylock+0x2e/0x30
 [] rt_spin_trylock+0x9/0xb
 [] get_next_timer_interrupt+0x34/0x226
 [] hrtimer_stop_sched_tick+0xb6/0x138
 [] cpu_idle+0x1b/0xdd
 [] start_secondary+0x2ed/0x2f9



RE: 2.6.19-rt14 slowdown compared to 2.6.19

2007-01-02 Thread Chen, Tim C
Ingo Molnar wrote:
> 
> (could you send me the whole trace if you still have it? It would be
> interesting to see a broader snippet from the life of individual java
> threads.)
> 
>   Ingo

Sure, I'll send it to you separately due to the size of the complete
trace.

Tim


Re: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]

2006-12-30 Thread hui
On Sat, Dec 30, 2006 at 06:56:08AM -0800, Daniel Walker wrote:
> On Sat, 2006-12-30 at 12:19 +0100, Ingo Molnar wrote:
> 
> > 
> >  - Documentation/CodingStyle compliance - the code is not ugly per se
> >but still looks a bit 'alien' - please try to make it look Linuxish,
> >if I apply this we'll probably stick with it forever. This is the
> >major reason I haven't applied it yet.
> 
> I did some cleanup while reviewing the patch, nothing very exciting, but
> it's an attempt to bring it more into the "Linuxish" scope ... I didn't
> compile it, so be warned.
> 
> There's lots of ifdef'd code under CONFIG_LOCK_STAT inside rtmutex.c; I
> suspect it would be a benefit to move all that into a header and ifdef
> only in the header.

Ingo and Daniel,

I'll try to make it more Linuxish. That's one of the reasons I posted it:
I knew it would need some help in that area, and I've been in need of
feedback on it. Originally I picked a style that made what I was doing
extremely obvious and clear in order to facilitate development; that was
the rationale behind it.

I'll make those changes and we can pass it back and forth until it passes
muster.

bill



Re: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]

2006-12-30 Thread Daniel Walker
On Sat, 2006-12-30 at 12:19 +0100, Ingo Molnar wrote:

> 
>  - Documentation/CodingStyle compliance - the code is not ugly per se
>but still looks a bit 'alien' - please try to make it look Linuxish,
>if I apply this we'll probably stick with it forever. This is the
>major reason I haven't applied it yet.

I did some cleanup while reviewing the patch, nothing very exciting, but
it's an attempt to bring it more into the "Linuxish" scope ... I didn't
compile it, so be warned.

There's lots of ifdef'd code under CONFIG_LOCK_STAT inside rtmutex.c; I
suspect it would be a benefit to move all that into a header and ifdef
only in the header.

Daniel



Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

---
 include/linux/lock_stat.h |   25 +++--
 kernel/lock_stat.c|   34 ++
 2 files changed, 37 insertions(+), 22 deletions(-)

Index: linux-2.6.19/include/linux/lock_stat.h
===
--- linux-2.6.19.orig/include/linux/lock_stat.h
+++ linux-2.6.19/include/linux/lock_stat.h
@@ -26,9 +26,9 @@
 #include 
 
 typedef struct lock_stat {
-   charfunction[KSYM_NAME_LEN];
-   int line;
-   char*file;
+   charfunction[KSYM_NAME_LEN];
+   int line;
+   char*file;
 
atomic_tncontended;
atomic_tnhandoff;
@@ -52,7 +52,7 @@ struct task_struct;
 #define LOCK_STAT_INIT(field)
 #define LOCK_STAT_INITIALIZER(field) { \
__FILE__, __FUNCTION__, __LINE__,   \
-   ATOMIC_INIT(0), LIST_HEAD_INIT(field)}
+   ATOMIC_INIT(0), LIST_HEAD_INIT(field) }
 
 #define LOCK_STAT_NOTE __FILE__, __FUNCTION__, __LINE__
 #define LOCK_STAT_NOTE_VARS_file, _function, _line
@@ -84,9 +84,12 @@ extern void lock_stat_sys_init(void);
 
 #define lock_stat_is_initialized(o) ((unsigned long) (*o)->file)
 
-extern void lock_stat_note_contention(lock_stat_ref_t *ls, struct task_struct *owner, unsigned long ip, int handoff);
+extern void lock_stat_note_contention(lock_stat_ref_t *ls,
+ struct task_struct *owner,
+ unsigned long ip, int handoff);
 extern void lock_stat_print(void);
-extern void lock_stat_scoped_attach(lock_stat_ref_t *_s, LOCK_STAT_NOTE_PARAM_DECL);
+extern void lock_stat_scoped_attach(lock_stat_ref_t *_s,
+   LOCK_STAT_NOTE_PARAM_DECL);
 
 #define ksym_strcmp(a, b) strncmp(a, b, KSYM_NAME_LEN)
 #define ksym_strcpy(a, b) strncpy(a, b, KSYM_NAME_LEN)
@@ -102,10 +105,11 @@ static inline char * ksym_strdup(const c
 
 #define LS_INIT(name, h) { \
/*.function,*/ .file = h, .line = 1,\
-   .nhandoff = ATOMIC_INIT(0), .ntracked = 0, .ncontended = ATOMIC_INIT(0),\
+   .nhandoff = ATOMIC_INIT(0), .ntracked = 0,  \
+   .ncontended = ATOMIC_INIT(0),   \
.list_head = LIST_HEAD_INIT(name.list_head),\
-   .rb_node.rb_left = NULL, .rb_node.rb_left = NULL \
-   }
+   .rb_node.rb_left = NULL,\
+   .rb_node.rb_left = NULL }   \
 
 #define DECLARE_LS_ENTRY(name) \
extern struct lock_stat _lock_stat_##name##_entry
@@ -114,7 +118,8 @@ static inline char * ksym_strdup(const c
 */
 
 #define DEFINE_LS_ENTRY(name)  \
-   struct lock_stat _lock_stat_##name##_entry = LS_INIT(_lock_stat_##name##_entry, #name "_string")
+   struct lock_stat _lock_stat_##name##_entry =\
+   LS_INIT(_lock_stat_##name##_entry, #name "_string")
 
 DECLARE_LS_ENTRY(d_alloc);
 DECLARE_LS_ENTRY(eventpoll_init_file);
Index: linux-2.6.19/kernel/lock_stat.c
===
--- linux-2.6.19.orig/kernel/lock_stat.c
+++ linux-2.6.19/kernel/lock_stat.c
@@ -71,7 +71,9 @@ static char null_string[] = "";
 static char static_string[]= "-";
 static char special_static_string[]= "-";
 
-struct lock_stat _lock_stat_null_entry = LS_INIT(_lock_stat_null_entry,   null_string);
+struct lock_stat _lock_stat_null_entry =
+   LS_INIT(_lock_stat_null_entry, null_string);
+
 EXPORT_SYMBOL(_lock_stat_null_entry);
 
 static DEFINE_LS_ENTRY(inline);/* lock_stat_inline_entry */
@@ -111,10 +113,12 @@ static DEFINE_LS_ENTRY(tcp_init_1);
 static DEFINE_LS_ENTRY(tcp_init_2);
 */
 
-/* I should never have to create more entries that this since I audited the kernel
- * and found out that there are only ~1500 or so places in the kernel where these
- * rw/spinlocks are initialized. Use the initialization points as a hash value to
- * look up the backing objects */
+/*
+ * I should never have to create more entries that this since I

Re: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]

2006-12-30 Thread Ingo Molnar

* Bill Huey <[EMAIL PROTECTED]> wrote:

> On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote:
> > Ingo Molnar wrote:
> > > If you'd like to profile this yourself then the lowest-cost way of
> > > profiling lock contention on -rt is to use the yum kernel and run the
> > > attached trace-it-lock-prof.c code on the box while your workload is
> > > in 'steady state' (and is showing those extended idle times):
> > > 
> > >   ./trace-it-lock-prof > trace.txt
> >
> > Thanks for the pointer.  Will let you know of any relevant traces.
> 
> Tim,
>   
> http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.lock_stat.patch
> 
> You can also apply this patch to get more precise statistics down to
> the lock. [...]

your patch looks pretty ok to me in principle. A couple of suggestions 
to make it more mergeable:

 - instead of BUG_ON()s please use DEBUG_LOCKS_WARN_ON() and make sure 
   the code is never entered again once an assertion has been triggered.
   Pass down a return result of '0' to signal failure. See
   kernel/lockdep.c for how to do this (a minimal sketch of the pattern
   follows below). One thing we don't need is bugs in instrumentation
   bringing down a machine.

 - remove dead (#if 0) code

 - Documentation/CodingStyle compliance - the code is not ugly per se
   but still looks a bit 'alien' - please try to make it look Linuxish,
   if I apply this we'll probably stick with it forever. This is the
   major reason I haven't applied it yet.

 - the xfs/wrap_lock change looks bogus - the lock is initialized
   already. What am I missing?
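
A minimal sketch of that pattern, modeled loosely on kernel/lockdep.c's
debug_locks handling - the names are illustrative, not taken from the
lock_stat patch itself:

#include <linux/kernel.h>

/* one global "is the instrumentation still trustworthy?" flag */
static int lock_stat_ok = 1;

/*
 * Returns 1 while everything is consistent.  On the first failed check it
 * warns once, switches the instrumentation off and returns 0, so callers
 * can back out quietly instead of oopsing the machine.
 */
static int lock_stat_assert(int cond)
{
	if (likely(cond))
		return 1;
	if (!lock_stat_ok)
		return 0;
	lock_stat_ok = 0;
	WARN_ON(1);
	return 0;
}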

Ingo


Re: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-30 Thread Ingo Molnar

* Chen, Tim C <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> > 
> > If you'd like to profile this yourself then the lowest-cost way of
> > profiling lock contention on -rt is to use the yum kernel and run the
> > attached trace-it-lock-prof.c code on the box while your workload is
> > in 'steady state' (and is showing those extended idle times):
> > 
> >   ./trace-it-lock-prof > trace.txt
> > 
> > this captures up to 1 second worth of system activity, on the current
> > CPU. Then you can construct the histogram via:
> > 
> >   grep -A 1 ' __schedule()<-' trace.txt | cut -d: -f2- | sort |
> >   uniq -c | sort -n > prof.txt
> > 
> 
> I did lock profiling on Volanomark as suggested and obtained the 
> profile that is listed below.

thanks - this is really useful!

> 246  __sched_text_start()<-schedule()<-rt_spin_lock_slowlock()<-__lock_text_start()
> 264  rt_mutex_slowunlock()<-rt_mutex_unlock()<-rt_up_read()<-(-1)()
> 334  __sched_text_start()<-schedule()<-posix_cpu_timers_thread()<-kthread()
> 437  __sched_text_start()<-schedule()<-do_futex()<-sys_futex()
> 467  (-1)()<-(0)()<-(0)()<-(0)()
> 495  __sched_text_start()<-preempt_schedule()<-__spin_unlock_irqrestore()<-rt_mutex_adjust_prio()
> 497  __netif_rx_schedule()<-netif_rx()<-loopback_xmit()<-(-1)()
> 499  __sched_text_start()<-schedule()<-schedule_timeout()<-sk_wait_data()
> 500  tcp_recvmsg()<-sock_common_recvmsg()<-sock_recvmsg()<-(-1)()
> 503  __rt_down_read()<-rt_down_read()<-do_futex()<-(-1)()
> 1160  __sched_text_start()<-schedule()<-ksoftirqd()<-kthread()
> 1433  __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
> 1497  child_rip()<-(-1)()<-(0)()<-(0)()
> 1936  __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
> 
> Looks like the idle time I saw was due to lock contention during call 
> to futex_wake, which requires acquisition of current->mm->mmap_sem. 
> Many of the java threads share mm and result in concurrent access to 
> common mm. [...]

ah. This explains why I'm not seeing this bad contention in a comparable 
workload (hackbench.c): because hackbench uses processes, not threads.

> [...] Looks like under the rt case there is no special treatment for read 
> locking, so the read lock accesses are contended under __rt_down_read.  
> For the non-rt case, __down_read makes the distinction for read lock 
> access and the read locks do not contend.

yeah, makes sense. I'll do something about this.

> Things are made worse here as this delayed waking up processes locked 
> by the futex. See also a snippet of the latency_trace below.
> 
>   -0 2D..2 5821us!: thread_return  (150 20)
>   -0 2DN.1 6278us : 
> __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
>   -0 2DN.1 6278us : (0)()<-(0)()<-(0)()<-(0)()
> java-6648  2D..2 6280us+: thread_return <-0> (20 -4)
> java-6648  2D..1 6296us : 
> try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
> java-6648  2D..1 6296us : 
> rt_mutex_unlock()<-rt_up_read()<-do_futex()<-(-1)()
> java-6648  2D..2 6297us : effective_prio <<...>-6673> (-4 -4)
> java-6648  2D..2 6297us : __activate_task <<...>-6673> (-4 1)
> java-6648  2 6297us < (-11)
> java-6648  2 6298us+> sys_futex (00afaf50 0001 
> 0001)
> java-6648  2...1 6315us : 
> __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
> java-6648  2...1 6315us : 
> __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
> java-6648  2D..2 6316us+: deactivate_task  (-4 1)
>   -0 2D..2 6318us+: thread_return  (-4 20)
>   -0 2DN.1 6327us : 
> __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
>   -0 2DN.1 6328us+: (0)()<-(0)()<-(0)()<-(0)()
> java-6629  2D..2 6330us+: thread_return <-0> (20 -4)
> java-6629  2D..1 6347us : 
> try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
> java-6629  2D..1 6347us : 
> rt_mutex_unlock()<-rt_up_read()<-futex_wake()<-(-1)()
> java-6629  2D..2 6348us : effective_prio  (-4 -4)
> java-6629  2D..2 6349us : __activate_task  (-4 1)
> java-6629  2 6350us+< (0)
> java-6629  2 6352us+> sys_futex (00afc1dc 0001 
> 0001)
> java-6629  2...1 6368us : 
> __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
> java-6629  2...1 6368us : 
> __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
> java-6629  2D..2 6369us+: deactivate_task  (-4 1)
>   -0 2D..2 6404us!: thread_return  (-4 20)
>   -0 2DN.1 6584us : 
> __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()

indeed - basically the mm semaphore is a common resource here. I suspect 
you'll see somewhat better numbers by using idle=poll or idle=mwait (or 
are using those options already?).

(could you send me the whole trace if you still have it? It would be 
interesting to see a broader snippet from the life of individual java 
threads.)

RE: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-29 Thread Chen, Tim C
Ingo Molnar wrote:
> 
> If you'd like to profile this yourself then the lowest-cost way of
> profiling lock contention on -rt is to use the yum kernel and run the
> attached trace-it-lock-prof.c code on the box while your workload is
> in 'steady state' (and is showing those extended idle times):
> 
>   ./trace-it-lock-prof > trace.txt
> 
> this captures up to 1 second worth of system activity, on the current
> CPU. Then you can construct the histogram via:
> 
>   grep -A 1 ' __schedule()<-' trace.txt | cut -d: -f2- | sort |
>   uniq -c | sort -n > prof.txt
> 

I did lock profiling on Volanomark as suggested and obtained the 
profile that is listed below. 

    246  __sched_text_start()<-schedule()<-rt_spin_lock_slowlock()<-__lock_text_start()
    264  rt_mutex_slowunlock()<-rt_mutex_unlock()<-rt_up_read()<-(-1)()
    334  __sched_text_start()<-schedule()<-posix_cpu_timers_thread()<-kthread()
    437  __sched_text_start()<-schedule()<-do_futex()<-sys_futex()
    467  (-1)()<-(0)()<-(0)()<-(0)()
    495  __sched_text_start()<-preempt_schedule()<-__spin_unlock_irqrestore()<-rt_mutex_adjust_prio()
    497  __netif_rx_schedule()<-netif_rx()<-loopback_xmit()<-(-1)()
    499  __sched_text_start()<-schedule()<-schedule_timeout()<-sk_wait_data()
    500  tcp_recvmsg()<-sock_common_recvmsg()<-sock_recvmsg()<-(-1)()
    503  __rt_down_read()<-rt_down_read()<-do_futex()<-(-1)()
   1160  __sched_text_start()<-schedule()<-ksoftirqd()<-kthread()
   1433  __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
   1497  child_rip()<-(-1)()<-(0)()<-(0)()
   1936  __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()

Looks like the idle time I saw was due to lock contention during the call
to futex_wake, which requires acquisition of current->mm->mmap_sem.
Many of the java threads share an mm, which results in concurrent access
to the common mm.
Looks like under the rt case there is no special treatment for read locking,
so the read lock accesses are contended under __rt_down_read.  For the
non-rt case, __down_read makes the distinction for read lock access and
the read locks do not contend.
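
The difference being described is roughly the following - a simplified
sketch of how -rt maps rwsems onto an rt_mutex, not the actual
implementation:

#include <linux/rtmutex.h>

/*
 * Simplified sketch: under PREEMPT_RT an rw_semaphore is backed by a single
 * rt_mutex, so "readers" take the same sleeping lock as writers and
 * serialize against each other -- which is why futex_wake()'s read lock of
 * mmap_sem shows up above as __rt_down_read contention.  The stock
 * __down_read fast path, by contrast, admits any number of readers at once.
 */
struct rt_rw_sem_sketch {
	struct rt_mutex		lock;	/* one lock for readers and writers */
};

static inline void rt_down_read_sketch(struct rt_rw_sem_sketch *sem)
{
	rt_mutex_lock(&sem->lock);	/* every reader funnels through here */
}

static inline void rt_up_read_sketch(struct rt_rw_sem_sketch *sem)
{
	rt_mutex_unlock(&sem->lock);
}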

Things are made worse here as this delays the waking up of processes
blocked on the futex.
See also a snippet of the latency_trace below.

  -0 2D..2 5821us!: thread_return  (150 20)
  -0 2DN.1 6278us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
  -0 2DN.1 6278us : (0)()<-(0)()<-(0)()<-(0)()
java-6648  2D..2 6280us+: thread_return <-0> (20 -4)
java-6648  2D..1 6296us : try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
java-6648  2D..1 6296us : rt_mutex_unlock()<-rt_up_read()<-do_futex()<-(-1)()
java-6648  2D..2 6297us : effective_prio <<...>-6673> (-4 -4)
java-6648  2D..2 6297us : __activate_task <<...>-6673> (-4 1)
java-6648  2 6297us < (-11)
java-6648  2 6298us+> sys_futex (00afaf50 0001 0001)
java-6648  2...1 6315us : __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
java-6648  2...1 6315us : __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
java-6648  2D..2 6316us+: deactivate_task  (-4 1)
  -0 2D..2 6318us+: thread_return  (-4 20)
  -0 2DN.1 6327us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
  -0 2DN.1 6328us+: (0)()<-(0)()<-(0)()<-(0)()
java-6629  2D..2 6330us+: thread_return <-0> (20 -4)
java-6629  2D..1 6347us : try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
java-6629  2D..1 6347us : rt_mutex_unlock()<-rt_up_read()<-futex_wake()<-(-1)()
java-6629  2D..2 6348us : effective_prio  (-4 -4)
java-6629  2D..2 6349us : __activate_task  (-4 1)
java-6629  2 6350us+< (0)
java-6629  2 6352us+> sys_futex (00afc1dc 0001 0001)
java-6629  2...1 6368us : __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
java-6629  2...1 6368us : __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
java-6629  2D..2 6369us+: deactivate_task  (-4 1)
  -0 2D..2 6404us!: thread_return  (-4 20)
  -0 2DN.1 6584us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()

Thanks.

Tim


[PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]

2006-12-29 Thread hui
On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote:
> Ingo Molnar wrote:
> > If you'd like to profile this yourself then the lowest-cost way of
> > profiling lock contention on -rt is to use the yum kernel and run the
> > attached trace-it-lock-prof.c code on the box while your workload is
> > in 'steady state' (and is showing those extended idle times):
> > 
> >   ./trace-it-lock-prof > trace.txt
>
> Thanks for the pointer.  Will let you know of any relevant traces.

Tim,

http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.lock_stat.patch

You can also apply this patch to get more precise statistics down to
the lock. For example:

...

[50, 30, 279 :: 1, 0]   {tty_ldisc_try, -, 0}
[5, 5, 0 :: 19, 0]  {alloc_super, fs/super.c, 76}
[5, 5, 3 :: 1, 0]   {__free_pages_ok, -, 0}
[5728, 862, 156 :: 2, 0]{journal_init_common, fs/jbd/journal.c, 667}
[594713, 79020, 4287 :: 60818, 0]   {inode_init_once, fs/inode.c, 193}
[602, 0, 0 :: 1, 0] {lru_cache_add_active, -, 0}
[63, 5, 59 :: 1, 0] {lookup_mnt, -, 0}
[6425, 378, 103 :: 24, 0]   {initialize_tty_struct, drivers/char/tty_io.c, 3530}
[6708, 1, 225 :: 1, 0]  {file_move, -, 0}
[67, 8, 15 :: 1, 0] {do_lookup, -, 0}
[69, 0, 0 :: 1, 0]  {exit_mmap, -, 0}
[7, 0, 0 :: 1, 0]   {uart_set_options, drivers/serial/serial_core.c, 1876}
[76, 0, 0 :: 1, 0]  {get_zone_pcp, -, 0}
[, 5, 9 :: 1, 0]{as_work_handler, -, 0}
[8689, 0, 0 :: 15, 0]   {create_workqueue_thread, kernel/workqueue.c, 474}
[89, 7, 6 :: 195, 0]{sighand_ctor, kernel/fork.c, 1474}
@contention events = 1791177
@found = 21

This is the output from /proc/lock_stat/contention. The first column is the
number of contentions that resulted in a full block of the task, the second
is the number of times the mutex owner was found active on a per-CPU run
queue by the scheduler, and the third is the number of times Steve Rostedt's
ownership-handoff code averted a full block. Peter Zijlstra used it initially
during his files_lock work.

Overhead of the patch is very low since it is only recording stuff in the
slow path of the rt-mutex implementation.

Writing to that file clears all of the stats for a fresh run with a
benchmark. This should pinpoint precisely where any contention happens
in -rt. In general, -rt should do about as well as the stock kernel
minus the overhead of interrupt threads.
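
For anyone who wants to post-process the file, a small userspace reader is
sketched below. It assumes only the format shown in the sample above (the
fields past the "::" aren't described in this mail, so they are skipped),
and it is illustrative rather than part of the patch:

/*
 * Hedged sketch of a /proc/lock_stat/contention reader, based solely on the
 * sample output above: "[blocked, owner-running, handoffs :: ?, ?] {symbol,
 * file, line}".  Summary lines such as "@contention events = ..." simply
 * fail the sscanf() and are skipped.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/lock_stat/contention", "r");
	char line[512], sym[128];
	unsigned long blocked, running, handoff;

	if (!f) {
		perror("/proc/lock_stat/contention");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "[%lu, %lu, %lu :: %*[^]]] {%127[^,]",
			   &blocked, &running, &handoff, sym) == 4)
			printf("%-32s blocked=%lu owner-running=%lu handoff=%lu\n",
			       sym, blocked, running, handoff);
	}
	fclose(f);
	return 0;
}

Sorting that output numerically on the blocked column reproduces the kind of
ranking shown above; writing anything into the same /proc file resets the
counters for the next benchmark run, as noted.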

Since the last release, I've added checks for whether the task is running
as "current" on a run queue to see if adaptive spins would be useful in -rt.

These new stats show that only a small percentage of events would benefit
from the use of adaptive spins in front of an rt-mutex. Any implementation
of it would have little impact on the system. It's not the mechanism but
the raw MP work itself that contributes to the good MP performance of Linux.
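
The check being referred to is presumably something along these lines
(illustrative only, assuming the scheduler's task_curr() helper; not
necessarily how the patch implements it):

#include <linux/sched.h>

/*
 * Illustrative sketch: adaptive spinning on a contended rt-mutex only makes
 * sense while the current owner is actually executing on another CPU;
 * otherwise the contending task may as well block immediately.
 */
static inline int lock_owner_is_running(struct task_struct *owner)
{
	return owner && task_curr(owner);
}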

Apply and have fun.

bill



RE: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-26 Thread Chen, Tim C
Ingo Molnar wrote:
> 
> cool - thanks for the feedback! Running the 64-bit kernel, right?
> 

Yes, 64-bit kernel was used.

> 
> while some slowdown is to be expected, did in each case idle time
> increase significantly? 

Volanomark and Re-Aim7 ran at close to 0% idle time with the 2.6.19 kernel.
Idle time increased significantly for Volanomark (to 60% idle) and Re-Aim7
(to 20% idle) with the -rt kernel.  For netperf, the system was 60% idle
with both the 2.6.19 and -rt kernels, and the change in idle time was not
significant.

> If yes then this is the effect of lock
> contention. Lock contention effects are 'magnified' by PREEMPT_RT. For
> example if you run 128 threads workload that all use the same lock
> then 
> the -rt kernel can act as if it were a 128-way box (!). This way by
> running -rt you'll see scalability problems a lot sooner than on real
> hardware. In other words: PREEMPT_RT in essence simulates the
> scalability behavior of up to an infinite amount of CPUs. (with the
> exception of cachemiss emulation ;) [the effect is not this precise,
> but 
> that's the rough trend]

Turning off PREEMPT_RT for the 2.6.20-rc2-rt0 kernel restored most of the
performance of Volanomark and Re-Aim7.  Idle time is close to 0%.  So the
benchmarks with a large number of threads are affected more by PREEMPT_RT.

For netperf TCP streaming, the slowdown went from 40% to 20% relative to
the 2.6.20-rc2 kernel.  There is only a server and a client process for
netperf.  The underlying reason for the change in performance is probably
different.

> 
> If you'd like to profile this yourself then the lowest-cost way of
> profiling lock contention on -rt is to use the yum kernel and run the
> attached trace-it-lock-prof.c code on the box while your workload is
> in 'steady state' (and is showing those extended idle times):
> 
>   ./trace-it-lock-prof > trace.txt
> 

Thanks for the pointer.  Will let you know of any relevant traces.

Thanks.

Tim


Re: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-23 Thread Ingo Molnar

* Chen, Tim C <[EMAIL PROTECTED]> wrote:

> Ingo,
>  
> We did some benchmarking on 2.6.19-rt14, compared it with 2.6.19 
> kernel and noticed several slowdowns.  The test machine is a 2 socket 
> woodcrest machine with your default configuration.

cool - thanks for the feedback! Running the 64-bit kernel, right?

> Netperf TCP Streaming was slower by 40% ( 1 server and 1 client each 
> bound to separate cpu cores on different socket, network loopback mode 
> was used).
> 
> Volanomark was slower by 80% (Server and Clients communicate with 
> network loopback mode. Idle time goes from 1% to 60%)
> 
> Re-Aim7 was slower by 40% (idle time goes from 0% to 20%)
> 
> Wonder if you have any suggestions on what could cause the slowdown.  
> We've tried disabling CONFIG_NO_HZ and it didn't help much.

while some slowdown is to be expected, did idle time increase 
significantly in each case? If yes then this is the effect of lock 
contention. Lock contention effects are 'magnified' by PREEMPT_RT. For 
example, if you run a 128-thread workload in which all threads use the 
same lock, then the -rt kernel can act as if it were a 128-way box (!). 
This way, by running -rt you'll see scalability problems a lot sooner 
than on real hardware. In other words: PREEMPT_RT in essence simulates 
the scalability behavior of up to an infinite number of CPUs. (with the 
exception of cachemiss emulation ;) [the effect is not this precise, but 
that's the rough trend]

the good news: any fix for such problems in -rt is useful for the 
upstream kernel too, because it improves scalability on real hardware as 
well. The bad news: this work hasn't been finished yet ;-)

We've fixed a couple of contention points discovered by -rt this way; in 
particular, Peter's excellent scalability patches at:

  http://programming.kicks-ass.net/kernel-patches/

were done based on lock-contention profiling feedback. I've included the 
lockless pagecache and the s_files scalability patches, which were very 
useful already. The lru_cache_add one is the next one for me to try, 
plus on i686 -rt kernels highmem locking hurts, hence the 
highmem_rewrite.patch (not in -rt yet either). Plus there's the really 
exciting concurrent-pagecache patch queue from Peter, which will be 
interesting to try too.

If you'd like to profile this yourself then the lowest-cost way of 
profiling lock contention on -rt is to use the yum kernel and run the 
attached trace-it-lock-prof.c code on the box while your workload is in 
'steady state' (and is showing those extended idle times):

  ./trace-it-lock-prof > trace.txt

this captures up to 1 second worth of system activity, on the current 
CPU. Then you can construct the histogram via:

  grep -A 1 ' __schedule()<-' trace.txt | cut -d: -f2- | sort |
  uniq -c | sort -n > prof.txt

I just did this for "hackbench 20", and it gave me:

 38  kfree()<-skb_release_data()<-kfree_skbmem()<-(-1)()
 40  kmem_cache_free()<-kfree_skbmem()<-__kfree_skb()<-(-1)()
 47  __schedule()<-preempt_schedule_irq()<-retint_kernel()<-__spin_unlock_irqrestore()
173  rt_read_lock()<-sock_def_readable()<-unix_stream_sendmsg()<-(-1)()
173  __schedule()<-schedule()<-rt_spin_lock_slowlock()<-__lock_text_start()
274  __schedule()<-schedule()<-posix_cpu_timers_thread()<-kthread()
277  __schedule()<-schedule()<-ksoftirqd()<-kthread()
365  kmem_cache_alloc_node()<-__alloc_skb()<-sock_alloc_send_skb()<-(-1)()
366  kmem_cache_alloc_node()<-__kmalloc_node()<-__alloc_skb()<-(-1)()
814  __schedule()<-schedule()<-rt_spin_lock_slowlock()<-rt_spin_lock()
   1819  __schedule()<-schedule()<-int_careful()<-()
   2346  __schedule()<-schedule()<-schedule_timeout()<-unix_stream_recvmsg()
   2346  sock_aio_read()<-do_sync_read()<-vfs_read()<-(-1)()

(note the output mixes the two lines of symbolic stack backtrace entries 
that are generated by the tracer, so the last two lines for example are 
for the same contention/preemption point.)

this means that the biggest contention points (or natural rescheduling 
points) in this workload are sock_aio_read()->unix_stream_recvmsg() - 
not unexpected. But there's also ~10% of contention in the kmalloc code, 
meaning that for this workload the per-CPU slabs are probably set a bit 
too low. There's also ~5% of contention from sock_def_readable()'s use 
of an rwlock.

with more CPUs and more tasks you'll see these effects in a more drastic 
way.

Ingo

/*
 * Copyright (C) 2005, Ingo Molnar <[EMAIL PROTECTED]>
 *
 * user-triggered tracing.
 *
 * The -rt kernel has a built-in kernel tracer, which will trace
 * all kernel function calls (and a couple of special events as well),
 * by using a build-time gcc feature that instruments all kernel
 * functions.
 * 
 * The tracer is highly automated for a number of latency tracing purposes,
 * but it can also be switched into 'user-triggered' mode, which is a
 * half-automatic tracing mode where userspace apps start and stop the
 * t

Re: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-22 Thread K.R. Foley
Chen, Tim C wrote:
> Ingo,
>  
> We did some benchmarking on 2.6.19-rt14, compared it with 2.6.19 
> kernel and noticed several slowdowns.  The test machine is a 2 socket
> woodcrest machine with your default configuration.
>  
> Netperf TCP Streaming was slower by 40% ( 1 server and 1 client 
> each bound to separate cpu cores on different socket, network
> loopback mode was used).  
> 
> Volanomark was slower by 80% (Server and Clients communicate with 
> network loopback mode. Idle time goes from 1% to 60%)
> 
> Re-Aim7 was slower by 40% (idle time goes from 0% to 20%)
> 
> Wonder if you have any suggestions on what could cause the slowdown.  
> We've tried disabling CONFIG_NO_HZ and it didn't help much.
>
> Thanks.
> 
> Tim

Take a look at this. Not sure if this is the same problem but it looks
like a candidate. I can definitely say that some latencies I have seen
in my recent testing have gone away in the last few versions
2.6.20-rc1-rt{3,4}.

-- 
kr


Re: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-22 Thread K.R. Foley
Chen, Tim C wrote:
> Ingo,
>  
> We did some benchmarking on 2.6.19-rt14, compared it with 2.6.19 
> kernel and noticed several slowdowns.  The test machine is a 2 socket
> woodcrest machine with your default configuration.
>  
> Netperf TCP Streaming was slower by 40% ( 1 server and 1 client 
> each bound to separate cpu cores on different socket, network
> loopback mode was used).  
> 
> Volanomark was slower by 80% (Server and Clients communicate with 
> network loopback mode. Idle time goes from 1% to 60%)
> 
> Re-Aim7 was slower by 40% (idle time goes from 0% to 20%)
> 
> Wonder if you have any suggestions on what could cause the slowdown.  
> We've tried disabling CONFIG_NO_HZ and it didn't help much.
>
> Thanks.
> 
> Tim

Take a look at this. Not sure if this is the same problem but it looks
like a candidate. I can definitely say that some latencies I have seen
in my recent testing have gone away in the last few versions
2.6.20-rc1-rt{3,4}.

Sorry I missed the URL first time around.
http://marc.theaimsgroup.com/?l=linux-kernel&m=116670388527349&w=2

-- 
kr



Re: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-22 Thread Daniel Walker
On Fri, 2006-12-22 at 13:39 -0800, Chen, Tim C wrote:

> Wonder if you have any suggestions on what could cause the slowdown.  
> We've tried disabling CONFIG_NO_HZ and it didn't help much.

Did you compare PREEMPT_RT with vanilla PREEMPT?  Or one version of
PREEMPT_RT vs. another?

Daniel
