Oh,My mistake. I am not using -O2 (I copied the Makefile from lttng-ust-2.6.0/doc/examples/easy-ust/Makefile :) ). When compiled with -O2, I get a 0.454us/event(10M events total 4548921074 ns).>Did you define _LGPL_SOURCE at the top of both files ? (the code and the probe provider) Yes,I did define it all of the files(provider(.h/.c) and sample.c) >I see that in the numbers you provided in your earlier email, you talk about >"us" (microseconds), did you actually mean "ns" (nanoseconds) ? Yes,I use >"us", say 0.554us , 0.28us, 0.5us and the latest 0.454us . If use "ns",they >are 554ns, 280ns,500ns,454ns ,it seems no mistake here。My CPU is 2 x Xeon >E5-2420 1.90GHz , OS: RHEL6,2.6.32-220 Maybe the fact my ust event slower >than yours is due to the CPU ability,I am not sure about that. Does you perf >report show something like >native_write_msr_safe?Thankszhenyu.ren------------------------------------------------------------------发件人:Mathieu > Desnoyers <mathieu.desnoy...@efficios.com>发送时间:2015年4月29日(星期三) >10:25收件人:zhenyu.ren <zhenyu....@aliyun.com>抄 送:lttng-dev ><lttng-dev@lists.lttng.org>主 题:Re: 答复:[lttng-dev] UST performance Hi,Mathieu > Is your sched_getcpu() system call a vdso too I use VSYSCALL_ADDR(__NR_vgetcpu) instead sched_getcpu so there is no getcpu system calls. > make sure you #define _LGPL_SOURCE in your benchmark program I do it as you suggested but no difference.
> Then, you can try using "perf" to profile the runtime execution $~/renzhyu/perf record ./sample WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check /proc/sys/kernel/kptr_restrict. Samples in kernel functions may not be resolved if a suitable vmlinux file is not found in the buildid cache or in the vmlinux path. Samples in kernel modules won't be resolved at all. If some relocation was applied (e.g. kexec) symbols may be misresolved even with a suitable vmlinux or kallsyms file. using 5720094251 ns [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.221 MB perf.data (~9658 samples) ] $~/renzhyu/perf report Events: 5K cycles 16.82% sample liblttng-ust.so.0.0.0 [.] lttng_event_commit10.55% sample liblttng-ust.so.0.0.0 [.] lttng_event_strcpy10.45% sample [kernel.kallsyms] [k] 0xffffffff8103548a8.16% sample liblttng-ust.so.0.0.0 [.] lttng_event_reserve7.57% sample sample [.] __event_get_size__sample_component___message6.42% sample sample [.] __event_probe__sample_component___message6.24% sample liblttng-ust.so.0.0.0 [.] lttng_write_event_header5.40% sample sample [.] _rcu_read_lock_update5.35% sample sample [.] _rcu_read_unlock_bp5.17% sample liburcu-bp.so.2.0.0 [.] rcu_read_unlock_bp4.14% sample ld-2.5.so [.] __tls_get_addr3.38% sample [vdso] [.] 0x7fff8f72e6be3.35% sample sample [.] main1.88% sample sample [.] lib_ring_buffer_ctx_init1.79% sample sample [.] _rcu_read_lock_bp1.20% sample liburcu-bp.so.2.0.0 [.] rcu_read_lock_bp0.66% sample sample [.] lib_ring_buffer_align_ctx0.38% sample sample [.] cds_list_empty0.37% sample sample [.] lib_ring_buffer_align0.35% sample liburcu-bp.so.2.0.0 [.] __tls_get_addr@plt0.28% sample sample [.] __event_get_align__sample_component___message0.09% sample ld-2.5.so [.] do_lookup_Did you compile your program and the probe provider (tp.c) with -O2 ? Did you define _LGPL_SOURCE at the top of both files ? (the code and the probe provider) I tried your sample.c program here: CPU: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHzThis is within a virtual machine. And I get 302 ns/event (total: 3019987815ns exec time) I see that in the numbers you provided in your earlier email, you talk about "us" (microseconds), did you actually mean"ns" (nanoseconds) ? I enabled tracing this way: lttng create --snapshot lttng enable-event -u -a lttng start Thoughts ? Thanks, Mathieu The symbol 0xffffffff8103548a is in native_write_msr_safe,I don't know what the function is and why that kernel function can be TOP 3.ffffffff81035480 <native_write_msr_safe>: ffffffff81035480: 55 push %rbp ffffffff81035481: 89 f9 mov %edi,%ecx ffffffff81035483: 89 f0 mov %esi,%eax ffffffff81035485: 48 89 e5 mov %rsp,%rbp ffffffff81035488: 0f 30 wrmsr ffffffff8103548a: 31 c0 xor %eax,%eax ffffffff8103548c: c9 leaveq ffffffff8103548d: c3 retq ffffffff8103548e: 66 90 xchg %ax,%ax PS:my sample.c #include <time.h> #include <unistd.h> #include <stdio.h> #define _LGPL_SOURCE #define TRACEPOINT_DEFINE #include "sample_component_provider.h" #define NUM 10000000 int main(int argc, char **argv) { int i = 0; struct timespec bef,aft; long diff; clock_gettime(CLOCK_MONOTONIC,&bef); for (i = 0; i < NUM; i++) { tracepoint(sample_component, message, "HelloWorld"); } clock_gettime(CLOCK_MONOTONIC,&aft); diff = (aft.tv_sec-bef.tv_sec)*1000000000+(aft.tv_nsec-bef.tv_nsec); printf("using %ld ns\n", diff); return 0; } Thankszhenyu.ren------------------------------------------------------------------发件人:Mathieu Desnoyers <mathieu.desnoy...@efficios.com>发送时间:2015年4月28日(星期二) 23:47收件人:zhenyu.ren <zhenyu....@aliyun.com>抄 送:lttng-dev <lttng-dev@lists.lttng.org>主 题:Re: [lttng-dev] UST performance Hi, Is your sched_getcpu() system call a vdso too ? This may explain the difference. Moreover, make sure you #define _LGPL_SOURCE in your benchmark program before including any lttng ust header. It will inline many functions associated with tracepoints. Then, you can try using "perf" to profile the runtime execution of your program traced by lttng-ust ? This should give us the information we need to investigate the performance profile. Thanks, Mathieu Hi,dev About a month ago,I asked a question about how to do time recording for ust event when vdso_clock_gettime not availble on my platform(Glibc too old).Recently I implemented a lib to analyze vdso in memory and lookuped the symbol vdso_clock_gettime successfully.With this vdso_clock_gettime availble,I get a 16% improvement in time spending per ust event. I.E. I measured the time taked by a event with 10 bytes string : no vdso_clock_gettime : 0.642us with vdso_clock_gettime:0.554us However,Both the numbers are much higher than the following 0.28us(But closed to 0.5us --W/O opt). So what's the difference between UST(0.28us) and UST W/O OPT(0.5us)?? PS:My hardware2 x Xeon E5-2430 2.20GHz 100MHz FSB (24 cores) 94.6GB / 96GB 1333MHz DDR3 == 6 x 16GB, 6 x empty Thankszhenyu.ren _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
_______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev