Currently, since kprobes expects to be used
with fewer than 100 probe points, its hash table
has just 64 entries. This is too small to handle
several thousand probes.
Enlarge it to 4096 entries, which consumes just
32KB (on 64-bit arch), for better scalability.
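
As a rough check of the 32KB figure, here is a minimal userspace
sketch. It assumes each table entry is a single pointer, as with the
kernel's struct hlist_head; hlist_head_sketch and kprobe_table_bytes()
are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct hlist_head: one pointer per bucket. */
struct hlist_head_sketch {
	void *first;
};

/* Bytes used by a table of (1 << hash_bits) bucket heads. */
size_t kprobe_table_bytes(unsigned int hash_bits)
{
	assert(hash_bits < 8 * sizeof(size_t));
	return ((size_t)1 << hash_bits) * sizeof(struct hlist_head_sketch);
}
```

With 8-byte pointers this gives 512 bytes for 6 bits and 32768 bytes
for 12 bits.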

Without this patch, enabling 17787 probes takes
more than 2 hours! (9428 sec, 1 min intervals for
every 2000 probes enabled)

  Enabling trace events: start at 1392782584
  0 1392782585 a2mp_chan_alloc_skb_cb_38556
  1 1392782585 a2mp_chan_close_cb_38555
  ....
  17785 1392792008 lookup_vport_34987
  17786 1392792010 loop_add_23485
  17787 1392792012 loop_attr_do_show_autoclear_23464

I profiled it and saw that more than 90% of
cycles are consumed in get_kprobe.

  Samples: 18K of event 'cycles', Event count (approx.): 37759714934
  +  95.90%  [k] get_kprobe
  +   0.76%  [k] ftrace_lookup_ip
  +   0.54%  [k] kprobe_trace_func

More than 60% of executed instructions
were in get_kprobe as well.

  Samples: 17K of event 'instructions', Event count (approx.): 1321391290
  +  65.48%  [k] get_kprobe
  +   4.07%  [k] kprobe_trace_func
  +   2.93%  [k] optimized_callback


Annotating get_kprobe also shows that the hlist
is too long and takes a long time to traverse.

       |            struct hlist_head *head;
       |            struct kprobe *p;
       |
       |            head = &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)];
       |            hlist_for_each_entry_rcu(p, head, hlist) {
 86.33 |      mov    (%rax),%rax
 11.24 |      test   %rax,%rax
       |      jne    60
       |                    if (p->addr == addr)
       |                            return p;
       |            }
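
To put the chain length in numbers, a small sketch follows. It assumes
that hash_ptr() spreads probe addresses evenly over the buckets, which
was not measured here; avg_chain_len() is a hypothetical helper, not
kernel code.

```c
#include <assert.h>

/*
 * Average number of kprobes per bucket, assuming (hypothetically)
 * a uniform distribution of addresses over the hash table.
 */
double avg_chain_len(unsigned long nr_probes, unsigned int hash_bits)
{
	assert(hash_bits < 8 * sizeof(unsigned long));
	return (double)nr_probes / (double)(1UL << hash_bits);
}
```

Under that assumption, 17787 probes with 6 hash bits average about 278
entries per bucket, while 20000 probes with 12 hash bits average
fewer than 5.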

With this fix, enabling 20,000 probes takes just
about 40 min (2303 sec, 1 min intervals for
every 2000 probes enabled)

  Enabling trace events: start at 1392794306
  0 1392794307 a2mp_chan_alloc_skb_cb_38556
  1 1392794307 a2mp_chan_close_cb_38555
  ....
  19997 1392796603 nfs4_negotiate_security_12119
  19998 1392796603 nfs4_open_confirm_done_11767
  19999 1392796603 nfs4_open_confirm_prepare_11779

It also reduced the cycles spent in get_kprobe (with 20,000 probes).

  Samples: 5K of event 'cycles', Event count (approx.): 4540269674
  +  68.77%  [k] get_kprobe
  +   8.56%  [k] ftrace_lookup_ip
  +   3.04%  [k] kprobe_trace_func

Signed-off-by: Masami Hiramatsu <masami.hiramatsu...@hitachi.com>
---
 kernel/kprobes.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index abdede5..302ff42 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -54,7 +54,7 @@
 #include <asm/errno.h>
 #include <asm/uaccess.h>
 
-#define KPROBE_HASH_BITS 6
+#define KPROBE_HASH_BITS 12
 #define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
 
 

