Oh,My mistake. I am not using -O2 (I copied the Makefile from 
lttng-ust-2.6.0/doc/examples/easy-ust/Makefile :) ). When compiled with -O2, I 
get a 0.454us/event(10M events total 4548921074 ns).>Did you define 
_LGPL_SOURCE at the top of both files ? (the code and the probe provider)  
Yes,I did define it all of the files(provider(.h/.c) and sample.c)
>I see that in the numbers you provided in your earlier email, you talk about 
>"us" (microseconds), did you actually mean "ns" (nanoseconds) ?  Yes,I use 
>"us", say  0.554us , 0.28us, 0.5us and the latest 0.454us . If use "ns",they 
>are 554ns, 280ns,500ns,454ns ,it seems no mistake here。My CPU is 2 x Xeon 
>E5-2420  1.90GHz , OS: RHEL6,2.6.32-220 Maybe the fact my ust event slower 
>than yours is due to the CPU ability,I am not sure about that. Does you perf 
>report show something like  
>native_write_msr_safe?Thankszhenyu.ren------------------------------------------------------------------发件人:Mathieu
> Desnoyers <mathieu.desnoy...@efficios.com>发送时间:2015年4月29日(星期三) 
>10:25收件人:zhenyu.ren <zhenyu....@aliyun.com>抄 送:lttng-dev 
><lttng-dev@lists.lttng.org>主 题:Re: 答复:[lttng-dev] UST performance
Hi,Mathieu
> Is your sched_getcpu() system call a vdso too
   I use VSYSCALL_ADDR(__NR_vgetcpu) instead sched_getcpu so there is no getcpu 
system calls.
> make sure you #define _LGPL_SOURCE in your benchmark program
   I do it as you suggested but no difference.

> Then, you can try using "perf" to profile the runtime execution 
$~/renzhyu/perf record ./sample
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.


using 5720094251 ns
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.221 MB perf.data (~9658 samples) ]


$~/renzhyu/perf report
Events: 5K cycles
16.82% sample liblttng-ust.so.0.0.0 [.] lttng_event_commit10.55% sample 
liblttng-ust.so.0.0.0 [.] lttng_event_strcpy10.45% sample [kernel.kallsyms] [k] 
0xffffffff8103548a8.16% sample liblttng-ust.so.0.0.0 [.] 
lttng_event_reserve7.57% sample sample [.] 
__event_get_size__sample_component___message6.42% sample sample [.] 
__event_probe__sample_component___message6.24% sample liblttng-ust.so.0.0.0 [.] 
lttng_write_event_header5.40% sample sample [.] _rcu_read_lock_update5.35% 
sample sample [.] _rcu_read_unlock_bp5.17% sample liburcu-bp.so.2.0.0 [.] 
rcu_read_unlock_bp4.14% sample ld-2.5.so [.] __tls_get_addr3.38% sample [vdso] 
[.] 0x7fff8f72e6be3.35% sample sample [.] main1.88% sample sample [.] 
lib_ring_buffer_ctx_init1.79% sample sample [.] _rcu_read_lock_bp1.20% sample 
liburcu-bp.so.2.0.0 [.] rcu_read_lock_bp0.66% sample sample [.] 
lib_ring_buffer_align_ctx0.38% sample sample [.] cds_list_empty0.37% sample 
sample [.] lib_ring_buffer_align0.35% sample liburcu-bp.so.2.0.0 [.] 
__tls_get_addr@plt0.28% sample sample [.] 
__event_get_align__sample_component___message0.09% sample ld-2.5.so [.] 
do_lookup_Did you compile your program and the probe provider
(tp.c) with -O2 ?

Did you define _LGPL_SOURCE at the top of both files ?
(the code and the probe provider)

I tried your sample.c program here:

CPU:
Intel(R) Core(TM) i7-4600U CPU @ 2.10GHzThis is within a virtual machine.

And I get 302 ns/event (total: 3019987815ns exec time)

I see that in the numbers you provided in your earlier email,
you talk about "us" (microseconds), did you actually mean"ns" (nanoseconds) ?

I enabled tracing this way:

lttng create --snapshot
lttng enable-event -u -a
lttng start

Thoughts ?

Thanks,

Mathieu
The symbol  0xffffffff8103548a is in native_write_msr_safe,I don't know what 
the function is and why that kernel function can be TOP 3.ffffffff81035480 
<native_write_msr_safe>:
ffffffff81035480: 55 push %rbp
ffffffff81035481: 89 f9 mov %edi,%ecx
ffffffff81035483: 89 f0 mov %esi,%eax
ffffffff81035485: 48 89 e5 mov %rsp,%rbp
ffffffff81035488: 0f 30 wrmsr
ffffffff8103548a: 31 c0 xor %eax,%eax
ffffffff8103548c: c9 leaveq
ffffffff8103548d: c3 retq
ffffffff8103548e: 66 90 xchg %ax,%ax
PS:my sample.c
#include <time.h>
#include <unistd.h>
#include <stdio.h>
#define _LGPL_SOURCE
#define TRACEPOINT_DEFINE
#include "sample_component_provider.h"

#define NUM 10000000
int main(int argc, char **argv)
{
 int i = 0;
 struct timespec bef,aft;
 long diff;
 clock_gettime(CLOCK_MONOTONIC,&bef);
 for (i = 0; i < NUM; i++) {
tracepoint(sample_component, message, "HelloWorld");
 }
 clock_gettime(CLOCK_MONOTONIC,&aft);
 diff = (aft.tv_sec-bef.tv_sec)*1000000000+(aft.tv_nsec-bef.tv_nsec);
 printf("using %ld ns\n", diff);
 return 0;
}

Thankszhenyu.ren------------------------------------------------------------------发件人:Mathieu
 Desnoyers <mathieu.desnoy...@efficios.com>发送时间:2015年4月28日(星期二) 
23:47收件人:zhenyu.ren <zhenyu....@aliyun.com>抄 送:lttng-dev 
<lttng-dev@lists.lttng.org>主 题:Re: [lttng-dev] UST performance
Hi,
Is your sched_getcpu() system call a vdso too ? This may explain the difference.

Moreover, make sure you #define _LGPL_SOURCE in your benchmark program
before including any lttng ust header. It will inline many functions associated 
with
tracepoints.

Then, you can try using "perf" to profile the runtime execution of your program
traced by lttng-ust ? This should give us the information we need to 
investigate the
performance profile.

Thanks,

Mathieu


Hi,dev
    About a month ago,I asked a question about how to do time recording for ust 
event when vdso_clock_gettime not availble on my platform(Glibc too 
old).Recently I implemented a lib to analyze vdso in memory and lookuped the 
symbol vdso_clock_gettime successfully.With this vdso_clock_gettime availble,I 
get a 16% improvement in time spending per ust event. I.E. I measured the time 
taked by a event with 10 bytes string :
 no   vdso_clock_gettime : 0.642us with vdso_clock_gettime:0.554us
However,Both the numbers are much higher than the following 0.28us(But closed 
to 0.5us --W/O opt). So what's the difference between UST(0.28us) and UST W/O 
OPT(0.5us)?? 



PS:My hardware2 x Xeon E5-2430  2.20GHz 100MHz FSB (24 cores) 94.6GB / 96GB 
1333MHz DDR3 == 6 x 16GB, 6 x empty
Thankszhenyu.ren



_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev



-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com



-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

Reply via email to