Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-10-31 Thread Namhyung Kim
On Tue, 30 Oct 2012 10:01:10 +0100, Ingo Molnar wrote:
> * Peter Zijlstra  wrote:
>
>> On Tue, 2012-10-30 at 15:59 +0900, Namhyung Kim wrote:
>
>> > Yes, the callchain part needs to be improved.  Peter's idea 
>> > indeed looks good to me too.
>> 
>> FWIW, I think this is exactly what sysprof does, except that 
>> tool isn't usable for other reasons.. You might want to look 
>> at it though.
>
> I always found the fundamental sysprof system-wide call graph 
> profiling output/view superior - and so do many Xorg developers 
> who are using SysProf that I talked to - so I'd strongly 
> encourage to use that ordering and grouping for the default perf 
> call-graph profiling output/view.

Okay, I'll look at sysprof.

Anyway, do you have any other comments on the general --cumulate
approach in this series (esp. with --branch-stack)?

Thanks,
Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-10-30 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Tue, 2012-10-30 at 15:59 +0900, Namhyung Kim wrote:

> > Yes, the callchain part needs to be improved.  Peter's idea 
> > indeed looks good to me too.
> 
> FWIW, I think this is exactly what sysprof does, except that 
> tool isn't usable for other reasons.. You might want to look 
> at it though.

I always found the fundamental sysprof system-wide call graph 
profiling output/view superior - and so do many Xorg developers 
who are using SysProf that I talked to - so I'd strongly 
encourage to use that ordering and grouping for the default perf 
call-graph profiling output/view.

Thanks,

Ingo


Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-10-30 Thread Peter Zijlstra
On Tue, 2012-10-30 at 15:59 +0900, Namhyung Kim wrote:
> Yes, the callchain part needs to be improved.  Peter's idea indeed looks
> good to me too. 

FWIW, I think this is exactly what sysprof does, except that tool isn't
usable for other reasons.. You might want to look at it though.


Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-10-29 Thread Namhyung Kim
Hi Arun and Peter,

On Mon, 29 Oct 2012 14:36:01 -0700, Arun Sharma wrote:
> On 10/29/12 12:08 PM, Peter Zijlstra wrote:
>
>> Right, so I tried this and I would expect the callchains to be inverted
>> too, so that when I expand say 'c' I would see that 'c' calls 'b' for
>> 100% which calls 'a' for 100%.
>>
>> Instead I get the regular callchains, expanding 'c' gives me main calls
>> it for 100%.
>>
>> Adding -G (invert callchains) doesn't make it better, in that case, when
>> I expand 'c' we start at '__libc_start_main' instead of 'c'.
>>
>> Is there anything I'm missing?
>>
>
> Sounds like a reasonable expectation.
>
> I tested mainly:
>
> perf report --cumulate  -g graph,100,callee
>
> to find the functions with a large amount of CPU time underneath. Then
> examined the callgraph without --cumulate. But yeah - it'd be nice to
> be able to do both in a single invocation.

Yes, the callchain part needs to be improved.  Peter's idea indeed looks
good to me too.

But before doing that, I'd like to get an agreement on how to
design/implement this feature.

Frederic (and Stephane), sorry to bother you about this again, but I
still don't understand exactly what you want.  IIUC you don't want a
separate --cumulate option, but want it implemented by sharing the
branch sampling code, right?

But IMHO the branch sampling output doesn't fit the --cumulate use case.
Could you give me some advice?

>
> Also, when callgraphs are displayed, the percentages are off (> 100%).
> Namhyung probably needs to use he->stat_acc->period in a few places as
> the denominator instead of he->period.

I will look into it later.

Thanks,
Namhyung


Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-10-29 Thread Arun Sharma

On 10/29/12 12:08 PM, Peter Zijlstra wrote:

> Right, so I tried this and I would expect the callchains to be inverted
> too, so that when I expand say 'c' I would see that 'c' calls 'b' for
> 100% which calls 'a' for 100%.
>
> Instead I get the regular callchains, expanding 'c' gives me main calls
> it for 100%.
>
> Adding -G (invert callchains) doesn't make it better, in that case, when
> I expand 'c' we start at '__libc_start_main' instead of 'c'.
>
> Is there anything I'm missing?

Sounds like a reasonable expectation.

I tested mainly:

perf report --cumulate  -g graph,100,callee

to find the functions with a large amount of CPU time underneath. Then 
examined the callgraph without --cumulate. But yeah - it'd be nice to be 
able to do both in a single invocation.


Also, when callgraphs are displayed, the percentages are off (> 100%). 
Namhyung probably needs to use he->stat_acc->period in a few places as 
the denominator instead of he->period.
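
A minimal sketch of that denominator suggestion, not code from the
patchset. The struct layouts below are simplified, hypothetical stand-ins
(only the names he_stat, stat_acc and period come from this thread), and
callchain_total() and callchain_percent() are made-up helper names. The
point is only that callchain percentages are computed against the
accumulated period when one is present:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts; the real structs carry more fields. */
struct he_stat {
	uint64_t period;
};

struct hist_entry {
	struct he_stat stat;		/* self period */
	struct he_stat *stat_acc;	/* accumulated period, NULL if unused */
};

/* Pick the denominator: accumulated period if present, else self period. */
uint64_t callchain_total(const struct hist_entry *he)
{
	assert(he != NULL);
	return he->stat_acc ? he->stat_acc->period : he->stat.period;
}

/* Percentage of a callchain node relative to its hist entry. */
double callchain_percent(const struct hist_entry *he, uint64_t node_hits)
{
	uint64_t total = callchain_total(he);

	/* Guard against a zero denominator instead of printing 'inf'. */
	return total ? 100.0 * (double)node_hits / (double)total : 0.0;
}
```

With a self period of 50 and an accumulated period of 200, a node with
200 hits comes out at 100% against the accumulated period but 400%
against the self period, which is the kind of "> 100%" output described
above.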


 -Arun


Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-10-29 Thread Peter Zijlstra
On Thu, 2012-09-13 at 16:19 +0900, Namhyung Kim wrote:
> When --cumulate option is given, it'll be shown like this:
>
>    $ perf report --cumulate
>    (...)
>    +  93.63%  abc  libc-2.15.so        [.] __libc_start_main
>    +  93.35%  abc  abc                 [.] main
>    +  93.35%  abc  abc                 [.] c
>    +  93.35%  abc  abc                 [.] b
>    +  93.35%  abc  abc                 [.] a
>    +   5.17%  abc  ld-2.15.so          [.] _dl_map_object
>    +   5.17%  abc  ld-2.15.so          [.] _dl_map_object_from_fd
>    +   1.13%  abc  ld-2.15.so          [.] _dl_start_user
>    +   1.13%  abc  ld-2.15.so          [.] _dl_start
>    +   0.29%  abc  perf                [.] main
>    +   0.29%  abc  perf                [.] run_builtin
>    +   0.29%  abc  perf                [.] cmd_record
>    +   0.29%  abc  libpthread-2.15.so  [.] __libc_close
>    +   0.07%  abc  ld-2.15.so          [.] _start
>    +   0.07%  abc  [kernel.kallsyms]   [k] page_fault
>
> (This output came from TUI since stdio bothered by callchains) 

Right, so I tried this and I would expect the callchains to be inverted
too, so that when I expand say 'c' I would see that 'c' calls 'b' for
100% which calls 'a' for 100%.

Instead I get the regular callchains, expanding 'c' gives me main calls
it for 100%.

Adding -G (invert callchains) doesn't make it better, in that case, when
I expand 'c' we start at '__libc_start_main' instead of 'c'.

Is there anything I'm missing?


Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-09-28 Thread Stephane Eranian
On Fri, Sep 28, 2012 at 5:14 PM, Frederic Weisbecker  wrote:
> On Fri, Sep 28, 2012 at 09:07:57AM +0200, Stephane Eranian wrote:
>> On Fri, Sep 28, 2012 at 7:49 AM, Namhyung Kim  wrote:
>> > Hi Frederic,
>> >
>> > On Fri, 28 Sep 2012 01:01:48 +0200, Frederic Weisbecker wrote:
>> >> When Arun was working on this, I asked him to explore if it could make
>> >> sense to reuse the "-b, --branch-stack" perf report option. Because
>> >> after all, this feature is doing about the same as "-b" except it's
>> >> using callchains instead of full branch tracing. But callchains are
>> >> branches. Just a limited subset of all branches taken on execution.
>> >> So you can probably reuse some interface and even ground code there.
>> >>
>> >> What do you think?
>> >
>> > Umm.. first of all, I'm not familiar with the branch stack thing.  It's
>> > intel-specific, right?
>> >
>> The kernel API is NOT specific to Intel. It is abstracted to be portable
>> across architecture. The implementation only exists on certain Intel
>> X86 processors.
>>
>> > Also I don't understand what exactly you want here.  What kind of
>> > interface did you say?  Can you elaborate it bit more?
>> >
>> Not clear to me either.
>>
>> > And AFAIK branch stack can collect much more branch information than
>> > just callstacks.  Can we differentiate which is which easily?  Is there
>> > any limitation on using it?  What if callstacks are not sync'ed with
>> > branch stacks - is it possible though?
>> >
>> First of all branch stack is not a branch tracing mechanism. This is a
>> branch sampling mechanism. Not all branches are captured. Only the
>> last N consecutive branches leading to a PMU interrupt are captured
>> in each sample.
>>
>> Yes, the branch stack mechanism as it exists on Intel processors
>> can capture more than call branches. It is HW based and provides
>> a branch type filter. Filtering capability is exposed at the API level
>> in a generic fashion. The hw filter is based on opcodes. Call branches
>> cover all call and syscall instructions. As such, the branch stack mechanism
>> cannot be used to capture callstacks to shared libraries, simply because
>> there is a non-call instruction in the trampoline. To obtain a better quality
>> callstack you have instead to sample return branches. So yes, callstacks
>> are not sync'ed with branch stack even if limited to call branches.
>>
>
> You're right. One doesn't simply sample callchains on top of branch tracing.
> Not easily at least.
> But that's not what we want here. We want the other way round: use
> callchains as branch sampling.
> And a callchain _is_ a branch sampling. Just a specialized one.
>
> PERF_SAMPLE_BRANCH_STACK either records only calls, only ret, or everything,
> or
> You can define the filter with "-j" option. Now callchains can be considered
> as the result of a specific "-j" filter option. It's just a high level
> filtering. ie: not just based on opcode types but on semantic
> post-processing. As if we applied a specific filter on a pure branch tracing
> that cancelled calls that had matching ret.
>
A callstack mode will be added to the PERF_SAMPLE_BRANCH_STACK generic
filter because this becomes available in HW starting with Haswell (see
Vol3b August 2012, section 17.8). This will still be a statistical
approach and not a complete callstack trace (only the last 16 calls).

So yes, you could piggyback your callstack on top of that. You could
return the full trace with the existing perf_branch_entry data structure.
You'd have to fill in the prediction flags as N/A.
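
A rough sketch of what such piggybacking could look like, under stated
assumptions. struct branch_entry below is a local, simplified mirror of
the perf_branch_entry layout (from, to, and the mispred/predicted bits)
and is declared here only so the example is self-contained;
callchain_to_branches() is a hypothetical helper, not an existing perf
function. Each adjacent pair of callchain addresses becomes one entry,
and both prediction bits are left at 0 to stand for "not available".
Note that callchain addresses are return addresses and sampled IPs, so
the from/to values are only an approximation of real branch endpoints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local, simplified mirror of a branch entry (assumption for the sketch). */
struct branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t mispred:1,	/* target mispredicted */
		 predicted:1,	/* target predicted */
		 reserved:62;
};

/*
 * ips[0] is the innermost address, ips[nr - 1] the outermost caller.
 * Emit one caller -> callee entry per adjacent pair, innermost first.
 * Returns the number of entries written (at most max).
 */
size_t callchain_to_branches(const uint64_t *ips, size_t nr,
			     struct branch_entry *out, size_t max)
{
	size_t n = 0;
	size_t i;

	assert(ips != NULL || nr == 0);

	for (i = 0; i + 1 < nr && n < max; i++) {
		out[n].from = ips[i + 1];	/* address in the caller */
		out[n].to = ips[i];		/* address in the callee */
		out[n].mispred = 0;		/* prediction info: N/A */
		out[n].predicted = 0;
		out[n].reserved = 0;
		n++;
	}
	return n;
}
```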

But now with Haswell, one would have to decide whether to use the 'SW
callstack' or the 'HW callstack'. It all depends on the quality of the
data returned by HW callstack.


> But in the end, what we have is just branches. Some branch layout that is
> biased, that already passed through a semantic wheel, still it's just
> _branches_.
>
> Note I'm not arguing about adding a "-j callchain" option, just trying to
> show you that callchains are not really different from other filtered
> source of branch sampling.
>
>
>> > But I think it'd be good if the branch stack can be changed to call
>> > stack in general.  Did you mean this?
>> >
>> That's not going to happen. The mechanism is much more generic than
>> that.
>>
>> Quite frankly, I don't understand Frederic's motivation here. The mechanisms
>> are not quite the same.
>
> So, considering that callchains are just "branches", why can't we use them as
> a branch source, just like PERF_SAMPLE_BRANCH_STACK data samples, that we
> can reuse in "perf report -b".
>
> Look at commit b50311dc2ac1c04ad19163c2359910b25e16caf6
> "perf report: Add support for taken branch sampling". It's doing (except
> for a few details like the period weight of branch samples) the same as
> Namhyung's patch, just with PERF_SAMPLE_BRANCH_STACK instead of callchains.
>
> I don't understand what justifies this duplication.

Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-09-28 Thread Frederic Weisbecker
On Fri, Sep 28, 2012 at 02:49:55PM +0900, Namhyung Kim wrote:
> Hi Frederic,
> 
> On Fri, 28 Sep 2012 01:01:48 +0200, Frederic Weisbecker wrote:
> > When Arun was working on this, I asked him to explore if it could make
> > sense to reuse the "-b, --branch-stack" perf report option. Because
> > after all, this feature is doing about the same as "-b" except it's
> > using callchains instead of full branch tracing. But callchains are
> > branches. Just a limited subset of all branches taken on execution.
> > So you can probably reuse some interface and even ground code there.
> >
> > What do you think?
> 
> Umm.. first of all, I'm not familiar with the branch stack thing.  It's
> intel-specific, right?
> 
> Also I don't understand what exactly you want here.  What kind of
> interface did you say?  Can you elaborate it bit more?

Look at commit b50311dc2ac1c04ad19163c2359910b25e16caf6
"perf report: Add support for taken branch sampling". It's doing almost
the same as what you do, just using PERF_SAMPLE_BRANCH_STACK instead of
callchains.

> And AFAIK branch stack can collect much more branch information than
> just callstacks.

That's not a problem. Callchains are just a high-level filtered source of
branch samples. You don't need full branches to use "-b". Just pick the
flavour of branch samples that gives your branch sampling the meaning you
want.

> Can we differentiate which is which easily?

Sure. If you have both sources in your perf.data (PERF_SAMPLE_BRANCH_STACK and
callchains), ask the user which one he wants. Otherwise default to what's
there.

> Is there
> any limitation on using it?  What if callstacks are not sync'ed with
> branch stacks - is it possible though?

It's better to make both sources mutually exclusive. Otherwise it's going
to be over-complicated.

> 
> But I think it'd be good if the branch stack can be changed to call
> stack in general.  Did you mean this?

That's a different thing. We might be able to post-process branch tracing and
build a callchain on top of it (following calls and rets). Maybe we will
one day. But they are different issues altogether.

Thanks.


Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-09-28 Thread Frederic Weisbecker
On Fri, Sep 28, 2012 at 09:07:57AM +0200, Stephane Eranian wrote:
> On Fri, Sep 28, 2012 at 7:49 AM, Namhyung Kim  wrote:
> > Hi Frederic,
> >
> > On Fri, 28 Sep 2012 01:01:48 +0200, Frederic Weisbecker wrote:
> >> When Arun was working on this, I asked him to explore if it could make
> >> sense to reuse the "-b, --branch-stack" perf report option. Because
> >> after all, this feature is doing about the same as "-b" except it's
> >> using callchains instead of full branch tracing. But callchains are
> >> branches. Just a limited subset of all branches taken on execution.
> >> So you can probably reuse some interface and even ground code there.
> >>
> >> What do you think?
> >
> > Umm.. first of all, I'm not familiar with the branch stack thing.  It's
> > intel-specific, right?
> >
> The kernel API is NOT specific to Intel. It is abstracted to be portable
> across architecture. The implementation only exists on certain Intel
> X86 processors.
> 
> > Also I don't understand what exactly you want here.  What kind of
> > interface did you say?  Can you elaborate it bit more?
> >
> Not clear to me either.
> 
> > And AFAIK branch stack can collect much more branch information than
> > just callstacks.  Can we differentiate which is which easily?  Is there
> > any limitation on using it?  What if callstacks are not sync'ed with
> > branch stacks - is it possible though?
> >
> First of all branch stack is not a branch tracing mechanism. This is a
> branch sampling mechanism. Not all branches are captured. Only the
> last N consecutive branches leading to a PMU interrupt are captured
> in each sample.
> 
> Yes, the branch stack mechanism as it exists on Intel processors
> can capture more than call branches. It is HW based and provides
> a branch type filter. Filtering capability is exposed at the API level
> in a generic fashion. The hw filter is based on opcodes. Call branches
> cover all call and syscall instructions. As such, the branch stack mechanism
> cannot be used to capture callstacks to shared libraries, simply because
> there is a non-call instruction in the trampoline. To obtain a better quality
> callstack you have instead to sample return branches. So yes, callstacks
> are not sync'ed with branch stack even if limited to call branches.
> 

You're right. One doesn't simply sample callchains on top of branch tracing.
Not easily at least.
But that's not what we want here. We want the other way round: use
callchains as branch sampling.
And a callchain _is_ a branch sampling. Just a specialized one.

PERF_SAMPLE_BRANCH_STACK either records only calls, only ret, or everything,
or
You can define the filter with "-j" option. Now callchains can be considered
as the result of a specific "-j" filter option. It's just a high level
filtering. ie: not just based on opcode types but on semantic
post-processing. As if we applied a specific filter on a pure branch tracing
that cancelled calls that had matching ret.
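
To make the "cancel calls that have a matching ret" idea concrete, here
is a small sketch under stated assumptions. The branch_kind enum, the
traced_branch struct and derive_callchain() are all hypothetical names
invented for this illustration; nothing here is perf code. It walks a
complete branch trace in order, pushes the target of every call, pops on
every ret, and ignores all other branches, so what remains on the stack
at the end is the active callchain (outermost first):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum branch_kind { BR_CALL, BR_RET, BR_OTHER };

/* One entry of a (hypothetical) full branch trace. */
struct traced_branch {
	enum branch_kind kind;
	uint64_t from;
	uint64_t to;
};

/*
 * Reduce a branch trace to the still-open calls.
 * chain[0] is the outermost call target; returns the number of valid
 * entries in chain (at most max).
 */
size_t derive_callchain(const struct traced_branch *trace, size_t n,
			uint64_t *chain, size_t max)
{
	size_t depth = 0;
	size_t i;

	assert(trace != NULL && chain != NULL);

	for (i = 0; i < n; i++) {
		switch (trace[i].kind) {
		case BR_CALL:
			/* Keep counting depth even when chain is full. */
			if (depth < max)
				chain[depth] = trace[i].to;
			depth++;
			break;
		case BR_RET:
			/* A ret cancels the most recent open call. */
			if (depth > 0)
				depth--;
			break;
		default:
			/* Jumps and conditional branches are filtered out. */
			break;
		}
	}
	return depth < max ? depth : max;
}
```

This only works on a complete trace. A sampled branch stack that holds
just the last few branches may have lost the calls that opened the
outer frames.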

But in the end, what we have is just branches. Some branch layout that is
biased, that already passed through a semantic wheel, still it's just
_branches_.

Note I'm not arguing about adding a "-j callchain" option, just trying to
show you that callchains are not really different from other filtered
source of branch sampling.


> > But I think it'd be good if the branch stack can be changed to call
> > stack in general.  Did you mean this?
> >
> That's not going to happen. The mechanism is much more generic than
> that.
> 
> Quite frankly, I don't understand Frederic's motivation here. The mechanisms
> are not quite the same.

So, considering that callchains are just "branches", why can't we use them as
a branch source, just like PERF_SAMPLE_BRANCH_STACK data samples, that we
can reuse in "perf report -b"?

Look at commit b50311dc2ac1c04ad19163c2359910b25e16caf6
"perf report: Add support for taken branch sampling". It's doing (except for
a few details like the period weight of branch samples) the same as
Namhyung's patch, just with PERF_SAMPLE_BRANCH_STACK instead of callchains.

I don't understand what justifies this duplication.


Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-09-28 Thread Stephane Eranian
On Fri, Sep 28, 2012 at 7:49 AM, Namhyung Kim  wrote:
> Hi Frederic,
>
> On Fri, 28 Sep 2012 01:01:48 +0200, Frederic Weisbecker wrote:
>> When Arun was working on this, I asked him to explore if it could make
>> sense to reuse the "-b, --branch-stack" perf report option. Because
>> after all, this feature is doing about the same as "-b" except it's
>> using callchains instead of full branch tracing. But callchains are
>> branches. Just a limited subset of all branches taken on execution.
>> So you can probably reuse some interface and even ground code there.
>>
>> What do you think?
>
> Umm.. first of all, I'm not familiar with the branch stack thing.  It's
> intel-specific, right?
>
The kernel API is NOT specific to Intel. It is abstracted to be portable
across architectures. The implementation only exists on certain Intel
x86 processors.

> Also I don't understand what exactly you want here.  What kind of
> interface did you say?  Can you elaborate it bit more?
>
Not clear to me either.

> And AFAIK branch stack can collect much more branch information than
> just callstacks.  Can we differentiate which is which easily?  Is there
> any limitation on using it?  What if callstacks are not sync'ed with
> branch stacks - is it possible though?
>
First of all branch stack is not a branch tracing mechanism. This is a
branch sampling mechanism. Not all branches are captured. Only the
last N consecutive branches leading to a PMU interrupt are captured
in each sample.
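
A small sketch of that "last N branches" behaviour, as an illustration
only. It models the idea in software with a ring buffer; the real
mechanism is in hardware, and every name below (NR_BRANCHES,
branch_ring, record_branch, snapshot) is hypothetical. The depth of 4 is
chosen just to keep the example short:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BRANCHES 4	/* illustrative depth, not a hardware value */

struct branch {
	uint64_t from;
	uint64_t to;
};

/* Ring buffer that remembers only the most recent NR_BRANCHES branches. */
struct branch_ring {
	struct branch slots[NR_BRANCHES];
	uint64_t count;		/* total branches recorded so far */
};

/* Record one taken branch, overwriting the oldest slot when full. */
void record_branch(struct branch_ring *r, uint64_t from, uint64_t to)
{
	r->slots[r->count % NR_BRANCHES] = (struct branch){ from, to };
	r->count++;
}

/*
 * What a sample would see at interrupt time: the retained branches,
 * newest first. Returns the number of entries copied into out.
 */
size_t snapshot(const struct branch_ring *r, struct branch *out, size_t max)
{
	size_t avail = r->count < NR_BRANCHES ? (size_t)r->count : NR_BRANCHES;
	size_t n = avail < max ? avail : max;
	size_t i;

	assert(out != NULL || n == 0);

	for (i = 0; i < n; i++)
		out[i] = r->slots[(r->count - 1 - i) % NR_BRANCHES];
	return n;
}
```

Anything older than the last NR_BRANCHES branches is gone by the time
the sample is taken, which is why this is sampling and not tracing.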

Yes, the branch stack mechanism as it exists on Intel processors
can capture more than call branches. It is HW based and provides
a branch type filter. Filtering capability is exposed at the API level
in a generic fashion. The hw filter is based on opcodes. Call branches
cover all call and syscall instructions. As such, the branch stack mechanism
cannot be used to capture callstacks to shared libraries, simply because
there is a non-call instruction in the trampoline. To obtain a better quality
callstack you have instead to sample return branches. So yes, callstacks
are not sync'ed with branch stack even if limited to call branches.



> But I think it'd be good if the branch stack can be changed to call
> stack in general.  Did you mean this?
>
That's not going to happen. The mechanism is much more generic than
that.

Quite frankly, I don't understand Frederic's motivation here. The mechanisms
are not quite the same.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-09-27 Thread Namhyung Kim
Hi Frederic,

On Fri, 28 Sep 2012 01:01:48 +0200, Frederic Weisbecker wrote:
> When Arun was working on this, I asked him to explore if it could make
> sense to reuse the "-b, --branch-stack" perf report option. Because
> after all, this feature is doing about the same as "-b" except it's
> using callchains instead of full branch tracing. But callchains are
> branches. Just a limited subset of all branches taken on execution.
> So you can probably reuse some interface and even ground code there.
>
> What do you think?

Umm.. first of all, I'm not familiar with the branch stack thing.  It's
intel-specific, right?

Also I don't understand what exactly you want here.  What kind of
interface do you mean?  Can you elaborate a bit more?

And AFAIK branch stack can collect much more branch information than
just callstacks.  Can we differentiate which is which easily?  Is there
any limitation on using it?  What if callstacks are not sync'ed with
branch stacks - is it possible though?

But I think it'd be good if the branch stack can be changed to call
stack in general.  Did you mean this?

Thanks,
Namhyung


Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-09-27 Thread Frederic Weisbecker
On Tue, Sep 25, 2012 at 01:57:26PM +0900, Namhyung Kim wrote:
> Ping.  Any comments for this?
> 
> Arun, thanks for testing!
> Namhyung

When Arun was working on this, I asked him to explore if it could make
sense to reuse the "-b, --branch-stack" perf report option. Because
after all, this feature is doing about the same as "-b" except it's
using callchains instead of full branch tracing. But callchains are
branches. Just a limited subset of all branches taken on execution.
So you can probably reuse some interface and even ground code there.

What do you think?


Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-09-24 Thread Namhyung Kim
Ping.  Any comments for this?

Arun, thanks for testing!
Namhyung


On Thu, 13 Sep 2012 16:19:56 +0900, Namhyung Kim wrote:
> Hi,
>
> This is my first attempt to implement cumulative hist period report.
> This work begins from Arun's SORT_INCLUSIVE patch [1] but I completely
> rewrote it from scratch.
>
> It basically adds period in a sample to every node in the callchain.
> A hist_entry now has an additional fields to keep the cumulative
> period if --cumulate option is given on perf report.
>
> Let me show you an example:
>
>   $ cat abc.c
>   #define barrier() asm volatile("" ::: "memory")
>
>   void a(void)
>   {
>           int i;
>
>           for (i = 0; i < 100; i++)
>                   barrier();
>   }
>
>   void b(void)
>   {
>           a();
>   }
>
>   void c(void)
>   {
>           b();
>   }
>
>   int main(void)
>   {
>           c();
>
>           return 0;
>   }
>   
> With this simple program I ran perf record and report:
>
>   $ perf record -g -e cycles:u ./abc
>   $ perf report -g none --stdio
>   [snip]
>   # Overhead  Command  Shared Object       Symbol
>   # ........  .......  ..................  ......................
>   #
>       93.35%  abc      abc                 [.] a
>        5.17%  abc      ld-2.15.so          [.] _dl_map_object_from_fd
>        1.13%  abc      ld-2.15.so          [.] _dl_start
>        0.29%  abc      libpthread-2.15.so  [.] __libc_close
>        0.07%  abc      [kernel.kallsyms]   [k] page_fault
>        0.00%  abc      ld-2.15.so          [.] _start
>   
> When --cumulate option is given, it'll be shown like this:
>
>    $ perf report --cumulate
>    (...)
>    +  93.63%  abc  libc-2.15.so        [.] __libc_start_main
>    +  93.35%  abc  abc                 [.] main
>    +  93.35%  abc  abc                 [.] c
>    +  93.35%  abc  abc                 [.] b
>    +  93.35%  abc  abc                 [.] a
>    +   5.17%  abc  ld-2.15.so          [.] _dl_map_object
>    +   5.17%  abc  ld-2.15.so          [.] _dl_map_object_from_fd
>    +   1.13%  abc  ld-2.15.so          [.] _dl_start_user
>    +   1.13%  abc  ld-2.15.so          [.] _dl_start
>    +   0.29%  abc  perf                [.] main
>    +   0.29%  abc  perf                [.] run_builtin
>    +   0.29%  abc  perf                [.] cmd_record
>    +   0.29%  abc  libpthread-2.15.so  [.] __libc_close
>    +   0.07%  abc  ld-2.15.so          [.] _start
>    +   0.07%  abc  [kernel.kallsyms]   [k] page_fault
>
> (This output came from TUI since stdio bothered by callchains)
>
> As you can see __libc_start_main -> main -> c -> b -> a callchain show
> up in the output.
>
> It might have some rough edges or even bugs, but I really want to
> release it and get reviews.  In fact I saw some very large percentage
> or 'inf' on some callchain nodes when expanding.
>
> It currently ignores samples don't have symbol info when accumulating
> periods along the callchain.  Otherwise it resulted in very strangely
> large output since every node in the callchain would be added into a
> single entry which has NULL dso/sym.  Simply ignoring them solved the
> problem and I couldn't come up with a better solution.
>
> This patchset is based on current acme/perf/core + my small fixes [2],[3].
> You can also get this series on my tree at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git perf/cumulate-v1
>
> Any comments are welcome, thanks.
> Namhyung
>
> [1] https://lkml.org/lkml/2012/3/31/6
> [2] https://lkml.org/lkml/2012/9/11/546
> [3] https://lkml.org/lkml/2012/9/12/51
>
>
> Namhyung Kim (15):
>   perf hists: Add missing period_* fields when collapsing a hist entry
>   perf hists: Introduce struct he_stat
>   perf hists: Move he->stat.nr_events initialization to a template
>   perf hists: Convert hist entry functions to use struct he_stat
>   perf hists: Add more helpers for hist entry stat
>   perf hists: Add support for accumulated stat of hist entry
>   perf hists: Check if accumulated when adding a hist entry
>   perf callchain: Add a couple of callchain helpers
>   perf hists: Let add_hist_entry to make a hist entry template
>   perf hists: Accumulate hist entry stat based on the callchain
>   perf hists: Sort hist entries by accumulated period
>   perf ui/hist: Add support to accumulated hist stat
>   perf ui/browser: Add support to accumulated hist stat
>   perf ui/gtk: Add support to accumulated hist stat
>   perf report: Add --cumulate option
>
>  tools/perf/builtin-report.c    |   8 ++
>  tools/perf/ui/browsers/hists.c |  12 +-
>  tools/perf/ui/gtk/browser.c    |   5 +-
>  tools/perf/ui/hist.c           |  74 ++---
>  tools/perf/ui/stdio/hist.c     |   2 +-
>  tools/perf/util/callchain.c    |  15 +++
>  tools/perf/util/callchain.h    |  17 +++
>  tools/perf/util/hist.c         | 242 +
>  tools/perf/util/sort.h         |  17 ++-
>

Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-09-20 Thread Arun Sharma

On 9/13/12 12:19 AM, Namhyung Kim wrote:

> Hi,
>
> This is my first attempt to implement cumulative hist period report.
> This work begins from Arun's SORT_INCLUSIVE patch [1] but I completely
> rewrote it from scratch.


Tested-by: Arun Sharma 

Our typical use case:

perf record -g fp ./foo
perf report --stdio --cumulate -g graph,100,callee

 -Arun


[RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-09-13 Thread Namhyung Kim
Hi,

This is my first attempt to implement cumulative hist period report.
This work begins from Arun's SORT_INCLUSIVE patch [1] but I completely
rewrote it from scratch.

It basically adds the period of a sample to every node in its callchain.
A hist_entry now has additional fields to keep the cumulative period if
the --cumulate option is given to perf report.
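
As a rough illustration of that accumulation (a minimal sketch, not the
code from this series): the entry and hists structs, find_or_add() and
add_sample() below are simplified, hypothetical stand-ins, and entries
are keyed by symbol name only. Each sample adds its period to the self
period of the sampled function and to the accumulated period of every
resolved node in its callchain:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 16	/* arbitrary bound for the sketch */

/* Hypothetical, simplified stand-in for a hist entry. */
struct entry {
	const char *sym;
	uint64_t period;	/* self period */
	uint64_t period_acc;	/* self period plus everything called below */
};

struct hists {
	struct entry entries[MAX_ENTRIES];
	size_t nr;
};

/* Look up the entry for sym, creating a zeroed one on first use. */
struct entry *find_or_add(struct hists *h, const char *sym)
{
	struct entry *e;
	size_t i;

	for (i = 0; i < h->nr; i++)
		if (strcmp(h->entries[i].sym, sym) == 0)
			return &h->entries[i];

	assert(h->nr < MAX_ENTRIES);
	e = &h->entries[h->nr++];
	e->sym = sym;
	e->period = 0;
	e->period_acc = 0;
	return e;
}

/*
 * chain[0] is the sampled function, chain[n - 1] the outermost caller.
 * Nodes without a symbol (NULL) are skipped.
 */
void add_sample(struct hists *h, const char **chain, size_t n,
		uint64_t period)
{
	size_t i;

	if (n > 0 && chain[0])
		find_or_add(h, chain[0])->period += period;

	for (i = 0; i < n; i++)
		if (chain[i])
			find_or_add(h, chain[i])->period_acc += period;
}
```

The sketch does not handle recursion: a symbol that appears twice in one
callchain would be credited twice for the same sample.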

Let me show you an example:

  $ cat abc.c
  #define barrier() asm volatile("" ::: "memory")

  void a(void)
  {
          int i;

          for (i = 0; i < 100; i++)
                  barrier();
  }

  void b(void)
  {
          a();
  }

  void c(void)
  {
          b();
  }

  int main(void)
  {
          c();

          return 0;
  }
  
With this simple program I ran perf record and report:

  $ perf record -g -e cycles:u ./abc
  $ perf report -g none --stdio
  [snip]
  # Overhead  Command  Shared Object       Symbol
  # ........  .......  ..................  ......................
  #
      93.35%  abc      abc                 [.] a
       5.17%  abc      ld-2.15.so          [.] _dl_map_object_from_fd
       1.13%  abc      ld-2.15.so          [.] _dl_start
       0.29%  abc      libpthread-2.15.so  [.] __libc_close
       0.07%  abc      [kernel.kallsyms]   [k] page_fault
       0.00%  abc      ld-2.15.so          [.] _start
  
When --cumulate option is given, it'll be shown like this:

   $ perf report --cumulate
   (...)
   +  93.63%  abc  libc-2.15.so        [.] __libc_start_main
   +  93.35%  abc  abc                 [.] main
   +  93.35%  abc  abc                 [.] c
   +  93.35%  abc  abc                 [.] b
   +  93.35%  abc  abc                 [.] a
   +   5.17%  abc  ld-2.15.so          [.] _dl_map_object
   +   5.17%  abc  ld-2.15.so          [.] _dl_map_object_from_fd
   +   1.13%  abc  ld-2.15.so          [.] _dl_start_user
   +   1.13%  abc  ld-2.15.so          [.] _dl_start
   +   0.29%  abc  perf                [.] main
   +   0.29%  abc  perf                [.] run_builtin
   +   0.29%  abc  perf                [.] cmd_record
   +   0.29%  abc  libpthread-2.15.so  [.] __libc_close
   +   0.07%  abc  ld-2.15.so          [.] _start
   +   0.07%  abc  [kernel.kallsyms]   [k] page_fault
   
(This output came from TUI since stdio bothered by callchains)

As you can see, the __libc_start_main -> main -> c -> b -> a callchain
shows up in the output.

It might have some rough edges or even bugs, but I really want to
release it and get reviews.  In fact I saw some very large percentage
or 'inf' on some callchain nodes when expanding.

It currently ignores samples that don't have symbol info when accumulating
periods along the callchain.  Otherwise the output became strangely large,
since every such node in the callchain would be added into a single entry
which has NULL dso/sym.  Simply ignoring them solved the problem and I
couldn't come up with a better solution.

This patchset is based on current acme/perf/core + my small fixes [2],[3].
You can also get this series on my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git perf/cumulate-v1

Any comments are welcome, thanks.
Namhyung

[1] https://lkml.org/lkml/2012/3/31/6
[2] https://lkml.org/lkml/2012/9/11/546
[3] https://lkml.org/lkml/2012/9/12/51


Namhyung Kim (15):
  perf hists: Add missing period_* fields when collapsing a hist entry
  perf hists: Introduce struct he_stat
  perf hists: Move he->stat.nr_events initialization to a template
  perf hists: Convert hist entry functions to use struct he_stat
  perf hists: Add more helpers for hist entry stat
  perf hists: Add support for accumulated stat of hist entry
  perf hists: Check if accumulated when adding a hist entry
  perf callchain: Add a couple of callchain helpers
  perf hists: Let add_hist_entry to make a hist entry template
  perf hists: Accumulate hist entry stat based on the callchain
  perf hists: Sort hist entries by accumulated period
  perf ui/hist: Add support to accumulated hist stat
  perf ui/browser: Add support to accumulated hist stat
  perf ui/gtk: Add support to accumulated hist stat
  perf report: Add --cumulate option

 tools/perf/builtin-report.c    |   8 ++
 tools/perf/ui/browsers/hists.c |  12 +-
 tools/perf/ui/gtk/browser.c    |   5 +-
 tools/perf/ui/hist.c           |  74 ++---
 tools/perf/ui/stdio/hist.c     |   2 +-
 tools/perf/util/callchain.c    |  15 +++
 tools/perf/util/callchain.h    |  17 +++
 tools/perf/util/hist.c         | 242 +
 tools/perf/util/sort.h         |  17 ++-
 tools/perf/util/symbol.h       |   1 +
 10 files changed, 318 insertions(+), 75 deletions(-)

-- 
1.7.11.4
