Em Mon, Aug 08, 2016 at 07:34:54PM +0800, Wangnan (F) escreveu: > > > On 2016/8/5 22:12, Arnaldo Carvalho de Melo wrote: > > Em Fri, Aug 05, 2016 at 06:58:12PM +0800, Wangnan (F) escreveu: > > > perf top comsumes too much memory than it need. > > > > > > Using following commands: > > > > > > # yes '' > /dev/null > > > # perf top -e raw_syscalls:sys_enter > > > > > > System will quickly become unresponsive because of out of memory. > > > > > > Using GDB I find two call stacks of malloc. See below. > > > > > > [call stack 1] > > > > > > #0 __GI___libc_malloc (bytes=140723951319552) at malloc.c:2841 > > > #1 0x00000000004b7246 in memdup (src=0x7f515e04b054, len=68) at > > > ../lib/string.c:29 > > > #2 0x00000000004f9299 in hist_entry__init (sample_self=true, > > > template=0x7ffcd9213a80, he=0xa37da50) at util/hist.c:401 > > > #3 hist_entry__new (template=template@entry=0x7ffcd9213a80 > > > <mailto:%28template=template@entry=0x7ffcd9213a80>, > > > sample_self=sample_self@entry=true) > > > <mailto:sample_self=sample_self@entry=true%29> at util/hist.c:458 > > > #4 0x00000000004fa13f in hists__findnew_entry (hists=0x3549d30, > > > hists=0x3549d30, al=0x7ffcd9213ca0, sample_self=true, entry=0x > > > 7ffcd9213a80) at util/hist.c:544 > > > #5 __hists__add_entry (hists=0x3549d30, al=0x7ffcd9213ca0, > > > sym_parent=<optimized out>, bi=bi@entry=0x0 <mailto:bi=bi@entry=0x0>, > > > mi=mi@entry=0x0 <mailto:mi=mi@entry=0x0>, sampl > > > e=<optimized out>, sample_self=sample_self@entry=true > > > <mailto:sample_self=sample_self@entry=true>, ops=ops@entry=0x0) > > > <mailto:ops=ops@entry=0x0%29> at util/hist.c:600 > > > #6 0x00000000004fa3d6 in hists__add_entry (sample_self=true, > > > sample=<optimized out>, mi=0x0, bi=0x0, sym_parent=<optimized out > > > > , al=<optimized out>, hists=<optimized out>) at util/hist.c:611 > > > #7 iter_add_single_normal_entry (iter=0x7ffcd9213ce0, al=<optimized out>) > > > at > > > util/hist.c:823 > > > #8 0x00000000004fc3c5 in hist_entry_iter__add (iter=0x7ffcd9213ce0, > > > al=0x7ffcd9213ca0, max_stack_depth=<optimized out>, arg=0x > > > 7ffcd9214120) at util/hist.c:1030 > > > #9 0x00000000004536d7 in perf_event__process_sample (machine=<optimized > > > out>, sample=0x7ffcd9213d30, evsel=0x3549bb0, event=<o > > > ptimized out>, tool=0x7ffcd9214120) at builtin-top.c:800 > > > #10 perf_top__mmap_read_idx (top=top@entry=0x7ffcd9214120 > > > <mailto:%28top=top@entry=0x7ffcd9214120>, idx=idx@entry=0) > > > <mailto:idx=idx@entry=0%29> at builtin-top.c:866 > > > #11 0x00000000004559fd in perf_top__mmap_read (top=0x7ffcd9214120) at > > > builtin-top.c:883 > > > #12 __cmd_top (top=0x7ffcd9214120) at builtin-top.c:1018 > > > #13 cmd_top (argc=<optimized out>, argv=<optimized out>, prefix=<optimized > > > out>) at builtin-top.c:1344 > > > #14 0x0000000000493561 in run_builtin (p=p@entry=0x948000 > > > <mailto:%28p=p@entry=0x948000> <commands+288>, argc=argc@entry=4 > > > <mailto:argc=argc@entry=4>, argv=argv@entry=0x7ffcd92199b0) > > > <mailto:argv=argv@entry=0x7ffcd92199b0%29> at > > > perf.c:357 > > > #15 0x0000000000437157 in handle_internal_command (argv=0x7ffcd92199b0, > > > argc=4) at perf.c:419 > > > #16 run_argv (argv=0x7ffcd9219720, argcp=0x7ffcd921972c) at perf.c:465 > > > #17 main (argc=4, argv=0x7ffcd92199b0) at perf.c:609 > > > > > > [call stack 2] > > > #0 __GI___libc_malloc (bytes=140723951319456) at malloc.c:2841 > > > #1 0x00000000005766de in trace_seq_init (s=0x7ffcd92139b0) at > > > trace-seq.c:60 > > > #2 0x00000000004f7047 in get_trace_output (he=0x7ffcd9213a80) at > > > util/sort.c:584 > > Yeah, this allocates 4K, way too much, can you try with this patch, > > which amounts to a trace_seq_trim()? > > > > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c > > index 947d21f38398..37ddcf57500f 100644 > > --- a/tools/perf/util/sort.c > > +++ b/tools/perf/util/sort.c > > @@ -588,7 +588,7 @@ static char *get_trace_output(struct hist_entry *he) > > } else { > > pevent_event_info(&seq, evsel->tp_format, &rec); > > } > > - return seq.buffer; > > + return realloc(seq.buffer, seq.len); > > } > > static int64_t > > It works: from 1.8g/10s to 40m/10s, but unfortunately your > patch is wrong: please use realloc(seq.buffer, seq.len + 1) > because of the ending '\0' :)
Ouch, I thought those were accounted in seq.len :-\ Will check and fix, thanks for testing it. > In my experiment RSS continuously increases because I use > raw_syscalls, which can result in too much entries to record > in perf top because of the nature of argument list (garbage > arguments are also considered by perf). It becomes much better > if I use --fields to reduce the number of possible entries. > However, theoretically perf will eventually consume all > system memory, unless we carefully choose tracepoints and > fields to make sure the possible entries is limited. I need to continue work on reference counting more classes, so that we can trow away more stuff as we decay histogram entries... - Arnaldo