RE: AutoFDO profile toolchain is open-sourced

2015-05-12 Thread Aditya K
Recently we found an ICE while compiling a program with auto-fdo 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972).
The ICE was caused because SSA is not in a valid state when the early inliner 
is run. The fix was to update_ssa before running the early inliner 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972#c4).
However, we still need to find out which pass left the SSA in that
state; fixing the problem there may be more appropriate.


-Aditya



 Date: Sat, 9 May 2015 16:33:02 +0200
 From: hubi...@ucw.cz
 To: hiradi...@msn.com
 CC: de...@google.com; i.palac...@samsung.com; davi...@google.com; 
 hubi...@ucw.cz; gcc@gcc.gnu.org; v.bari...@samsung.com; dnovi...@google.com; 
 seb...@gmail.com
 Subject: Re: AutoFDO profile toolchain is open-sourced

 Yes, it will. But it's not well tuned at all. I will start tuning it
 if I have free cycles. It would be great if opensource community can
 also contribute to this tuning effort.

 If you could outline portions of code which needs tuning, rewriting, that 
 will help get started in this effort.

 Optimization passes in GCC are generally designed to work with any kind of 
 edge profile they get.
 There are only few cases where they do care about what profile is around.

 At the moment we consider two types of profiles - static (guessed) and FDO. 
 For
 static one we shut down use of profile info for some heuristics - for example
 we do not expect loop trip counts to be reliable in the profiles because they
 are not. You can look for code checking profile_status_for_fn.

 Auto-FDO does not have a special value for profile_status_for_fn and it goes
 through the same code paths as FDO. Dehao has some patches for Auto-FDO tuning,
 but my impression is that he mostly got around this by just making the
 optimizer a bit more robust against nonsensical profiles, which is always good,
 since even FDO profiles can get wrong. BTW, Dehao, do you think you can submit
 these changes for this stage1?

 I suppose in this case we have yet another kind of profile that is less 
 reliable than
 FDO and we need to start by simply benchmarking and looking for cases where 
 this profile
 gets worse and handle them one by one :)

 Honza
  

Re: AutoFDO profile toolchain is open-sourced

2015-05-09 Thread Jan Hubicka
  Yes, it will. But it's not well tuned at all. I will start tuning it
  if I have free cycles. It would be great if opensource community can
  also contribute to this tuning effort.
 
 If you could outline portions of code which needs tuning, rewriting, that 
 will help get started in this effort.

Optimization passes in GCC are generally designed to work with any kind of edge 
profile they get.
There are only few cases where they do care about what profile is around.

At the moment we consider two types of profiles - static (guessed) and FDO. For
static one we shut down use of profile info for some heuristics - for example
we do not expect loop trip counts to be reliable in the profiles because they
are not.  You can look for code checking profile_status_for_fn.
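
One way to locate those checks is a recursive grep over the GCC sources. The
following is a minimal sketch, not part of the original message: the function
name find_profile_status_checks and its directory argument are illustrative
assumptions.

```shell
# Print every source line under the given directory that mentions
# profile_status_for_fn, prefixed with the file name and line number.
# The function name is hypothetical; $1 is a path inside a GCC checkout.
find_profile_status_checks() {
    grep -rn 'profile_status_for_fn' "$1"
}

# Example (assumes a GCC checkout in ./gcc-src):
#   find_profile_status_checks gcc-src/gcc
```

Each hit is a place where a pass may behave differently depending on whether
the profile was guessed or read from a file.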

Auto-FDO does not have a special value for profile_status_for_fn and it goes
through the same code paths as FDO.  Dehao has some patches for Auto-FDO tuning,
but my impression is that he mostly got around this by just making the
optimizer a bit more robust against nonsensical profiles, which is always good,
since even FDO profiles can get wrong.  BTW, Dehao, do you think you can submit
these changes for this stage1?

I suppose in this case we have yet another kind of profile that is less
reliable than FDO and we need to start by simply benchmarking and looking for
cases where this profile gets worse and handle them one by one :)

Honza


Re: AutoFDO profile toolchain is open-sourced

2015-05-08 Thread Ilya Palachev

On 11.04.2015 01:49, Xinliang David Li wrote:

On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka hubi...@ucw.cz wrote:

LBR is used for both cfg edge profiling and indirect call Target value
profiling.

I see, that makes sense ;)  I guess if we want to support profile collection
on targets w/o this feature we could still use one of the algorithms that
try to guess edge profile from BB profile.

Our experience with sampling cycles or retired instructions to guess
BB profile has not been great -- the profile quality is significantly
worse than LBR (which can almost match instrumentation based profile).

Suppose that I have no opportunity to collect a profile on the x86
architecture with LBR support, and the only available architecture is
arm/aarch64 (since the application code differs significantly when
compiled for different architectures because of manual optimizations and
different function names and structure).


Honza has mentioned that it's possible to guess the edge profile from the BB
profile. Do you think this can help in the situation described above?
Yes, this will be much worse than LBR, but can it give any performance
benefit compared with no edge profile at all?


--
Ilya


Re: AutoFDO profile toolchain is open-sourced

2015-05-08 Thread Dehao Chen
On Fri, May 8, 2015 at 2:00 AM, Ilya Palachev i.palac...@samsung.com wrote:
 On 11.04.2015 01:49, Xinliang David Li wrote:

 On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka hubi...@ucw.cz wrote:

 LBR is used for both cfg edge profiling and indirect call Target value
 profiling.

 I see, that makes sense ;)  I guess if we want to support profile
 collection
 on targets w/o this feature we could still use one of the algorithms that
 try to guess edge profile from BB profile.

 Our experience with sampling cycles or retired instructions to guess
 BB profile has not been great -- the profile quality is significantly
 worse than LBR (which can almost match instrumentation based profile).

 Suppose that I have no opportunity to collect profile on x86 architecture
 with LBR support and the only available architecture is arm/aarch64 (since
 the application code is significantly different when compiled for different
 architectures because of manual optimizations and different function names
 and structure).

If it's already manually tuned towards architecture (or even
hand-written inlined-assembly), then I don't think FDO/AutoFDO can
help much.


 Honza has mentioned that it's possible to guess edge profile from BB
 profile. How do you think, can this help in the above described situation?
 Yes, this will be much worse than LBR, but can it give any performance
 benefit compared with no edge profile at all?

Yes, it will. But it's not well tuned at all. I will start tuning it
if I have free cycles. It would be great if opensource community can
also contribute to this tuning effort.

Cheers,
Dehao


 --
 Ilya


RE: AutoFDO profile toolchain is open-sourced

2015-05-08 Thread Aditya K



 Date: Fri, 8 May 2015 11:19:12 -0700
 Subject: Re: AutoFDO profile toolchain is open-sourced
 From: de...@google.com
 To: i.palac...@samsung.com
 CC: davi...@google.com; hubi...@ucw.cz; gcc@gcc.gnu.org; 
 v.bari...@samsung.com; dnovi...@google.com; seb...@gmail.com

 On Fri, May 8, 2015 at 2:00 AM, Ilya Palachev i.palac...@samsung.com wrote:
 On 11.04.2015 01:49, Xinliang David Li wrote:

 On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka hubi...@ucw.cz wrote:

 LBR is used for both cfg edge profiling and indirect call Target value
 profiling.

 I see, that makes sense ;) I guess if we want to support profile
 collection
 on targets w/o this feature we could still use one of the algorithms that
 try to guess edge profile from BB profile.

 Our experience with sampling cycles or retired instructions to guess
 BB profile has not been great -- the profile quality is significantly
 worse than LBR (which can almost match instrumentation based profile).

 Suppose that I have no opportunity to collect profile on x86 architecture
 with LBR support and the only available architecture is arm/aarch64 (since
 the application code is significantly different when compiled for different
 architectures because of manual optimizations and different function names
 and structure).

 If it's already manually tuned towards architecture (or even
 hand-written inlined-assembly), then I don't think FDO/AutoFDO can
 help much.


 Honza has mentioned that it's possible to guess edge profile from BB
 profile. How do you think, can this help in the above described situation?
 Yes, this will be much worse than LBR, but can it give any performance
 benefit compared with no edge profile at all?

 Yes, it will. But it's not well tuned at all. I will start tuning it
 if I have free cycles. It would be great if opensource community can
 also contribute to this tuning effort.

If you could outline portions of code which needs tuning, rewriting, that will 
help get started in this effort.

Thanks,
-Aditya



 Cheers,
 Dehao


 --
 Ilya
  

Re: AutoFDO profile toolchain is open-sourced

2015-04-27 Thread Ilya Palachev

Hi,

On 21.04.2015 20:25, Dehao Chen wrote:

OTOH, the most important patch (insn-level discriminator support) is
not in yet. Cary has just retired. Do you know if anyone would be
interested in porting insn-level discriminator support to trunk?


Do you mean r210338, r210397, r210523, r214745 ?
Can you explain why these patches are important for autofdo?
What work should be done to port them to current 5 branch?
Do you expect them to be applied to 6 branch?

--
Ilya


Re: AutoFDO profile toolchain is open-sourced

2015-04-27 Thread Dehao Chen
On Thu, Apr 23, 2015 at 10:31 PM, Jan Hubicka hubi...@ucw.cz wrote:

   It converts with the attached patches, but there's still some problem
   parsing the data:
  
   % ./create_gcov  -binary loop -gcov_version 1 -gcov loop.gcda -gcov_version 0x500e
   % gcc50 -O2 -fprofile-use loop.c
   loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ',
   expected version '500e'
   %
 
  You need to use -fauto-profile=loop.gcda instead of -fprofile-use,
  which is only for instrumentation based FDO.

 This is indeed not very intuitive. I wonder why it uses the same suffix 
 suggesting
 that sample based and FDO based files are the same?


AutoFDO profile does not need to have any specific suffix. I'll update
the toolchain to make the default output profile as fbdata.afdo
instead of fbdata.gcda.


 Would it be possible to at least have this well documented in invoke.texi and 
 perhaps
 we can fix the warning above to say something like loop.gcda is autofdo 
 profile, use
 -fauto-profile instead of -fprofile-use?


Sounds good to me. I will send a patch to update invoke.texi. Could
you help fix the warning for profile-use?

Thanks,
Dehao



 Honza
 
  Dehao
 
  
   -Andi
  


Re: AutoFDO profile toolchain is open-sourced

2015-04-27 Thread Dehao Chen
On Mon, Apr 27, 2015 at 7:37 AM, Ilya Palachev i.palac...@samsung.com wrote:
 Hi,

 On 21.04.2015 20:25, Dehao Chen wrote:

 OTOH, the most important patch (insn-level discriminator support) is
 not in yet. Cary has just retired. Do you know if anyone would be
 interested in porting insn-level discriminator support to trunk?


 Do you mean r210338, r210397, r210523, r214745 ?

Yes

 Can you explain why these patches are important for autofdo?

Instruction level discriminator support is important to autofdo
because basic block level discriminator is not enough when
instructions are moved to other basic blocks by code motion.
Additionally, gcc backend optimization does not maintain BB level
discriminator well. We need to encode discriminator as part of LOC so
that once the discriminator is assigned to an IR, it will go all the
way to the codegen without being modified.

 What work should be done to port them to current 5 branch?

I think we just need to have these patches in. Or even better,
reimplement this the same way as my lexical block patch
(https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=191494)

 Do you expect them to be applied to 6 branch?

This should go into trunk and be there for all later gcc branches.

Dehao


 --
 Ilya


Re: AutoFDO profile toolchain is open-sourced

2015-04-23 Thread Jan Hubicka
  It converts with the attached patches, but there's still some problem
  parsing the data:
 
  % ./create_gcov  -binary loop -gcov_version 1 -gcov loop.gcda -gcov_version 0x500e
  % gcc50 -O2 -fprofile-use loop.c
  loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ',
  expected version '500e'
  %
 
 You need to use -fauto-profile=loop.gcda instead of -fprofile-use,
 which is only for instrumentation based FDO.

This is indeed not very intuitive. I wonder why it uses the same suffix,
suggesting that sample based and FDO based files are the same?
Would it be possible to at least have this well documented in invoke.texi, and
perhaps we can fix the warning above to say something like "loop.gcda is an
autofdo profile, use -fauto-profile instead of -fprofile-use"?

Honza
 
 Dehao
 
 
  -Andi
 


Re: AutoFDO profile toolchain is open-sourced

2015-04-22 Thread Dehao Chen
Thanks, I'll forward the patches to quipper team.

On Tue, Apr 21, 2015 at 8:47 PM, Andi Kleen a...@firstfloor.org wrote:
 On Wed, Apr 22, 2015 at 05:15:47AM +0200, Andi Kleen wrote:
 On Tue, Apr 21, 2015 at 01:52:18PM -0700, Dehao Chen wrote:
  Andi,
 
  Thanks for the patches. Turns out that the first 3 patches are already
  in, the correct upstream quipper repository is:
 
  https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/
 
  The last 3 patches seem to be local hacks. Do you want any of them in?
 
  I just did a batch sync with quipper head. Please let me know if this
  solves the perf problem.

 Still outdated:

 F0421 20:13:16.221422 22297 perf_reader.cc:1614] Check failed: attr_size = sizeof(perf_event_attr) (104 vs. 96)

 It converts with the attached patches, but there's still some problem
 parsing the data:

 % ./create_gcov  -binary loop -gcov_version 1 -gcov loop.gcda -gcov_version 0x500e
 % gcc50 -O2 -fprofile-use loop.c
 loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ',
 expected version '500e'
 %

You need to use -fauto-profile=loop.gcda instead of -fprofile-use,
which is only for instrumentation based FDO.
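
The distinction between the two profile kinds can be captured in a small
helper. This is only a sketch: the function name profile_flag and its
"afdo"/"instr" keywords are made up for illustration; the two GCC options are
the ones named above.

```shell
# Map a profile kind to the GCC option that consumes it.
#   afdo  : sample-based profile produced by create_gcov
#   instr : instrumentation-based profile (from a -fprofile-generate build)
# $2 is the path of the profile file.
profile_flag() {
    case "$1" in
        afdo)  printf '%s\n' "-fauto-profile=$2" ;;
        instr) printf '%s\n' "-fprofile-use=$2" ;;
        *)     echo "unknown profile kind: $1" >&2; return 1 ;;
    esac
}

# Example:
#   gcc -O2 $(profile_flag afdo loop.gcda) loop.c
```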

Dehao


 -Andi



Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Andi Kleen
  BTW the biggest problem with autofdo currently is that it is
  quite bitrotten and supports only several years old perf.
  So all of this above will only work with old distributions,
  unless you compile an old perf utility first.
 
 Do you mean newer perf does not support LBR (-b) any more?

No.

perf extended its perf.data output format, and quipper cannot parse
any of the extensions, so it just bombs out with assertion
failures.

I have a patch to hack around some of this, but still
couldn't get it actually to work so far.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
In that case, we should get quipper fixed upstream to accommodate new
format. (Maybe they already fixed it, I will do a batch sync to make
quipper up-to-date).

Dehao

On Tue, Apr 21, 2015 at 10:24 AM, Andi Kleen a...@firstfloor.org wrote:
  BTW the biggest problem with autofdo currently is that it is
  quite bitrotten and supports only several years old perf.
  So all of this above will only work with old distributions,
  unless you compile an old perf utility first.

 Do you mean newer perf does not support LBR (-b) any more?

 No.

 perf extended its perf.data output format, and quipper cannot parse
 any of the extensions, so it just bombs out with assertion
 failures.

 I have a patch to hack around some of this, but still
 couldn't get it actually to work so far.

 -Andi
 --
 a...@linux.intel.com -- Speaking for myself only.


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Andi Kleen
On Tue, Apr 21, 2015 at 10:27:49AM -0700, Dehao Chen wrote:
 In that case, we should get quipper fixed upstream to accommodate new
 format. (Maybe they already fixed it, I will do a batch sync to make
 quipper up-to-date).

From a quick look at 

http://git.chromium.org/gitweb/?p=chromiumos/platform/chromiumos-wide-profiling.git;a=summary

(I assume that is what you mean with upstream)

it hasn't been updated. Is still stuck in 2013.

I'm attaching what patches I have so far.

-Andi


autofdo-newer-perf-0.tgz
Description: application/gtar-compressed


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
I'll get to it soon. When will stage1 close?

OTOH, the most important patch (insn-level discriminator support) is
not in yet. Cary has just retired. Do you know if anyone would be
interested in porting insn-level discriminator support to trunk?

Dehao

On Tue, Apr 21, 2015 at 8:59 AM, Jan Hubicka hubi...@ucw.cz wrote:
 You can use dump_gcov to show a text version of the profile dump and
 check if the profile data makes sense. If your program is just a very
 tight single loop, the current implementation in trunk may not yield
 good results because it does not have discriminator support. Try the
 google-4_9 branch instead.

 Can we possibly merge the remaining patches now when stage1 is open?

 Honza

 Dehao

 
 
  --
  Best regards,
  Ilya Palachev


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Andi Kleen
Ilya Palachev i.palac...@samsung.com writes:

 But why create_gcov does not inform about that (no branch events were
 found)? It creates empty gcov file and says nothing :(

 Moreover, in the mentioned README it is said that perf should also be
 executed with option -e BR_INST_RETIRED:TAKEN.

Standard perf doesn't have a full event list.
This assumes a perf patched with the libpfm patch.

Also I suspect it really wants to use PEBS events, so pp should be added.

Alternatively you can use ocperf (from
http://github.com/andikleen/pmu-tools) which is just a wrapper:

ocperf.py record -e br_inst_retired.near_taken:pp -b ... 

or specify the event manually (depending on your CPU, like)

perf record -e cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp -b ...
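
A raw event string like the one above can be assembled from its parts. This is
a minimal sketch, assuming the event code and umask from this message (they
depend on the CPU model); the function name raw_branch_event is hypothetical.

```shell
# Build a raw PMU event specification for "perf record -e".
# Arguments: event code, umask, event name, sampling period.
# The trailing "pp" modifier requests precise (PEBS) sampling.
raw_branch_event() {
    printf 'cpu/event=%s,umask=%s,name=%s,period=%s/pp\n' "$1" "$2" "$3" "$4"
}

# Example:
#   perf record -e "$(raw_branch_event 0xc4 0x20 br_inst_retired_near_taken 49)" -b ./command
```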

BTW the biggest problem with autofdo currently is that it is
quite bitrotten and supports only several years old perf.
So all of this above will only work with old distributions,
unless you compile an old perf utility first.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Ilya Palachev

On 21.04.2015 14:57, Diego Novillo wrote:

From the autofdo page: https://github.com/google/autofdo

[ ... ]
Inputs:

--profile: PERF_PROFILE collected using linux perf (with last branch record).
In order to collect this profile, you will need to have an Intel CPU that
has last branch record (LBR) support. You also need to have your linux
kernel configured with LBR support. To profile:
# perf record -c PERIOD -e EVENT -b -o perf.data -- ./command
EVENT is referring to BR_INST_RETIRED:TAKEN if available. For some
architectures, BR_INST_EXEC:TAKEN also works.
[ ... ]

The important one for autofdo is -b. It asks perf to use LBR registers
for branch tracking (assuming your architecture supports it).


Thanks! It worked. Now big programs produce big gcov files. Sorry for 
this confusing message.


But why does create_gcov not report that (no branch events were
found)? It creates an empty gcov file and says nothing :(


Moreover, in the mentioned README it is said that perf should also be 
executed with option -e BR_INST_RETIRED:TAKEN.

I tried to add it but perf said that

   invalid or unsupported event: 'BR_INST_RETIRED:TAKEN'
   Run 'perf list' for a list of valid events

For my architecture x86_64 the perf list contains

   $ sudo perf list | grep -i br
     branch-instructions OR branches                   [Hardware event]
     branch-misses                                     [Hardware event]
     branch-loads                                      [Hardware cache event]
     branch-load-misses                                [Hardware cache event]
     branch-instructions OR cpu/branch-instructions/   [Kernel PMU event]
     branch-misses OR cpu/branch-misses/               [Kernel PMU event]
     mem:addr[:access]                                 [Hardware breakpoint]
     syscalls:sys_enter_brk                            [Tracepoint event]
     syscalls:sys_exit_brk                             [Tracepoint event]

There is no BR_INST_RETIRED:TAKEN there. Do you use some specific 
configuration of perf for that?
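
Whether a given event name is known to the local perf can also be checked
mechanically. The sketch below is an illustration, not something from the
thread: has_perf_event is a hypothetical helper that scans the text of
"perf list" supplied on standard input.

```shell
# Succeed if the event name given as $1 occurs (case-insensitively, as a
# fixed string) in the "perf list" output read from standard input.
has_perf_event() {
    grep -qiF -- "$1"
}

# Example:
#   perf list | has_perf_event branch-instructions && echo available
```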


However, I tried to use option -e branch-instructions. Before that the 
following error was obtained:


   E0421 15:57:39.308374 11551 perf_parser.cc:210] Mapped 50% of samples, expected at least 95%

and now it disappeared (because of option -e branch-instructions).

However, performance decreases after adding the option
-fauto-profile=file.gcov or -fprofile-use=file.gcov to the list of
compiler options.

The program becomes 10% slower than before.
Can you explain that? Maybe I should configure perf so that it will be 
able to collect events BR_INST_RETIRED:TAKEN ? How can it be done?


--
Best regards,
Ilya Palachev


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Ilya Palachev

ping?

On 15.04.2015 10:41, Ilya Palachev wrote:

Hi,

One more question.

Does anybody know which options perf should be run with in order to
collect appropriate data for the autofdo converter?
I obtain the same data for different programs, and it seems to be empty
(1600 bytes).

They have the same md5sum for different programs:

   # Data for simple program with 30 lines of code:
   $ md5sum ytest.gcov
   d85481c9154aa606ce4893b64fe109e7  ytest.gcov

   # Data for program of 3D Delaunay triangulation construction of
   100 points.
   $ md5sum experimentCGAL_convexHullDynamic.gcov
   d85481c9154aa606ce4893b64fe109e7 experimentCGAL_convexHullDynamic.gcov
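
Comparing checksums by eye works, but the same check can be scripted. This is
a minimal sketch with a hypothetical helper name, same_profile; identical
output for unrelated programs suggests that the converter found no usable
samples.

```shell
# Succeed if the two profile files are byte-for-byte identical.
same_profile() {
    cmp -s "$1" "$2"
}

# Example:
#   same_profile ytest.gcov experimentCGAL_convexHullDynamic.gcov \
#       && echo "profiles are identical; check the perf options"
```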


We tried to collect perf data using option --call-graph fp but it does 
not help: the output gcov data is still the same.

Sometimes create_gcov reports the following error:
E0421 13:10:37.125629  8732 perf_parser.cc:209] Mapped 50% of samples, expected at least 95%


But it does not mean that there are not enough samples collected in the 
profile, because 99% of samples are mapped in the case of very simple 
program (with 1 function).

I have been trying to find a working case for more than a week but have not
succeeded.

Can anybody show me that create_gcov works at least for one case?

--
Best regards,
Ilya Palachev




Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
On Tue, Apr 21, 2015 at 6:42 AM, Ilya Palachev i.palac...@samsung.com wrote:
 On 21.04.2015 14:57, Diego Novillo wrote:

 From the autofdo page: https://github.com/google/autofdo

 [ ... ]
 Inputs:

 --profile: PERF_PROFILE collected using linux perf (with last branch
 record).
 In order to collect this profile, you will need to have an Intel CPU that
 has last branch record (LBR) support. You also need to have your linux
 kernel configured with LBR support. To profile:
 # perf record -c PERIOD -e EVENT -b -o perf.data -- ./command
 EVENT is referring to BR_INST_RETIRED:TAKEN if available. For some
 architectures, BR_INST_EXEC:TAKEN also works.
 [ ... ]

 The important one for autofdo is -b. It asks perf to use LBR registers
 for branch tracking (assuming your architecture supports it).


 Thanks! It worked. Now big programs produce big gcov files. Sorry for this
 confusing message.

 But why create_gcov does not inform about that (no branch events were
 found)? It creates empty gcov file and says nothing :(

 Moreover, in the mentioned README it is said that perf should also be
 executed with option -e BR_INST_RETIRED:TAKEN.
 I tried to add it but perf said that

invalid or unsupported event: 'BR_INST_RETIRED:TAKEN'
Run 'perf list' for a list of valid events

 For my architecture x86_64 the perf list contains

    $ sudo perf list | grep -i br
      branch-instructions OR branches                   [Hardware event]
      branch-misses                                     [Hardware event]
      branch-loads                                      [Hardware cache event]
      branch-load-misses                                [Hardware cache event]
      branch-instructions OR cpu/branch-instructions/   [Kernel PMU event]
      branch-misses OR cpu/branch-misses/               [Kernel PMU event]
      mem:addr[:access]                                 [Hardware breakpoint]
      syscalls:sys_enter_brk                            [Tracepoint event]
      syscalls:sys_exit_brk                             [Tracepoint event]

 There is no BR_INST_RETIRED:TAKEN there. Do you use some specific
 configuration of perf for that?

 However, I tried to use option -e branch-instructions. Before that the
 following error was obtained:

    E0421 15:57:39.308374 11551 perf_parser.cc:210] Mapped 50% of samples, expected at least 95%

 and now it disappeared (because of option -e branch-instructions).

 Though, the performance decreases after adding option
 -fauto-profile=file.gcov or -fprofile-use=file.gcov to the list of
 compiler options.
 The program becomes 10% slower than before.
 Can you explain that? Maybe I should configure perf so that it will be able
 to collect events BR_INST_RETIRED:TAKEN ? How can it be done?

You can use dump_gcov to show a text version of the profile dump and
check if the profile data makes sense. If your program is just a very
tight single loop, the current implementation in trunk may not yield
good results because it does not have discriminator support. Try the
google-4_9 branch instead.

Dehao



 --
 Best regards,
 Ilya Palachev


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Jan Hubicka
 You can use dump_gcov to show a text version of the profile dump and
 check if the profile data makes sense. If your program is just a very
 tight single loop, the current implementation in trunk may not yield
 good results because it does not have discriminator support. Try the
 google-4_9 branch instead.

Can we possibly merge the remaining patches now when stage1 is open?

Honza
 
 Dehao
 
 
 
  --
  Best regards,
  Ilya Palachev


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
On Tue, Apr 21, 2015 at 7:25 AM, Andi Kleen a...@firstfloor.org wrote:
 Ilya Palachev i.palac...@samsung.com writes:

 But why create_gcov does not inform about that (no branch events were
 found)? It creates empty gcov file and says nothing :(

 Moreover, in the mentioned README it is said that perf should also be
 executed with option -e BR_INST_RETIRED:TAKEN.

 Standard perf doesn't have a full event list
 This assumes a perf patched with the libpfm patch.

 Also I suspect it really wants to use PEBS events, so pp should be added.

 Alternatively you can use ocperf (from
 http://github.com/andikleen/pmu-tools) which is just a wrapper:

 ocperf.py record -e br_inst_retired.near_taken:pp -b ...

 or specify the event manually (depending on your CPU, like)

 perf record -e cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp -b ...

 BTW the biggest problem with autofdo currently is that it is
 quite bitrotten and supports only several years old perf.
 So all of this above will only work with old distributions,
 unless you compile an old perf utility first.

Do you mean newer perf does not support LBR (-b) any more?

Dehao


 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
Andi,

Thanks for the patches. Turns out that the first 3 patches are already
in, the correct upstream quipper repository is:

https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/

The last 3 patches seem to be local hacks. Do you want any of them in?

I just did a batch sync with quipper head. Please let me know if this
solves the perf problem.

Thanks,
Dehao

On Tue, Apr 21, 2015 at 10:36 AM, Andi Kleen a...@firstfloor.org wrote:
 On Tue, Apr 21, 2015 at 10:27:49AM -0700, Dehao Chen wrote:
 In that case, we should get quipper fixed upstream to accommodate new
 format. (Maybe they already fixed it, I will do a batch sync to make
 quipper up-to-date).

 From a quick look at

 http://git.chromium.org/gitweb/?p=chromiumos/platform/chromiumos-wide-profiling.git;a=summary

 (I assume that is what you mean with upstream)

 it hasn't been updated. Is still stuck in 2013.

 I'm attaching what patches I have so far.

 -Andi


RE: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Aditya K
After patching linux perf, this script collects a profile and creates a
coverage file (e.g., for linpack) which can be used for fdo.


gcov=linpack-x86.gcov
MAKE='make'
# SRC is not set in this script; it is assumed to point to the root of the
# benchmark source tree.


# x86
x86() {
CC=/usr/bin/gcc
CXX=/usr/bin/g++

# Quote the flags so that all three are exported as a single value.
export CFLAGS="-Ofast -g3 -static"
export CPPFLAGS="$CFLAGS"

$MAKE -C "$SRC/SingleSource/Benchmarks/Linpack" clean

$MAKE -C "$SRC/SingleSource/Benchmarks/Linpack" -k TEST=simple TARGET_LLVMGCC=$CC \
TARGET_CXX=$CXX LLI_OPTFLAGS= TARGET_CC=$CC TARGET_LLVMGXX=$CXX \
CC_UNDER_TEST_IS_GCC=1 TARGET_FLAGS= USE_REFERENCE_OUTPUT=1 \
CC_UNDER_TEST_TARGET_IS_AARCH64=1 OPTFLAGS= LLC_OPTFLAGS= ENABLE_OPTIMIZED=1 \
ARCH=x86_64 ENABLE_HASHED_PROGRAM_OUTPUT=1 DISABLE_JIT=1

perfdata=autofdo-linpack/perf-x86.data

# -b records branch stacks (LBR), which create_gcov relies on.
perf record -b -e branch-instructions -o "$perfdata" \
"$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple"

# Convert the perf data into a gcov-format profile.
autofdo/usr/bin/create_gcov \
--binary="$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple" \
--profile="$perfdata" --gcov="$gcov"

}


hth,
-Aditya


 From: a...@firstfloor.org
 To: i.palac...@samsung.com
 CC: dnovi...@google.com; gcc@gcc.gnu.org; davi...@google.com; hubi...@ucw.cz; 
 seb...@gmail.com; de...@google.com; v.bari...@samsung.com
 Subject: Re: AutoFDO profile toolchain is open-sourced
 Date: Tue, 21 Apr 2015 07:25:10 -0700

 Ilya Palachev i.palac...@samsung.com writes:

 But why create_gcov does not inform about that (no branch events were
 found)? It creates empty gcov file and says nothing :(

 Moreover, in the mentioned README it is said that perf should also be
 executed with option -e BR_INST_RETIRED:TAKEN.

 Standard perf doesn't have a full event list
 This assumes a perf patched with the libpfm patch.

 Also I suspect it really wants to use PEBS events, so pp should be added.

 Alternatively you can use ocperf (from
 http://github.com/andikleen/pmu-tools) which is just a wrapper:

 ocperf.py record -e br_inst_retired.near_taken:pp -b ...

 or specify the event manually (depending on your CPU, like)

 perf record -e cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp -b ...

 BTW the biggest problem with autofdo currently is that it is
 quite bitrotten and supports only several years old perf.
 So all of this above will only work with old distributions,
 unless you compile an old perf utility first.

 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only
  

Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Sebastian Pop
Ok, thanks for the tip of the flag.

You would also need to pass -use_lbr=false to create a gcov file for
a device that does not have LBR support.
We tried this on ARM collected profiles and we got the same speedup as
x86 collected profiles on linpack.
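
The flags mentioned in this thread can be combined into one create_gcov
invocation. The sketch below only prints the command line rather than running
it; the function name create_gcov_cmd and its argument order are assumptions,
while the flags themselves are the ones quoted in this thread.

```shell
# Print a create_gcov command line.
# Arguments: binary, perf data file, output gcov file, "lbr" or "nolbr".
# --gcov_version=0x1 is the value given in this thread for GCC trunk.
create_gcov_cmd() {
    cmd="create_gcov --binary=$1 --profile=$2 --gcov=$3 --gcov_version=0x1"
    if [ "$4" = "nolbr" ]; then
        # For profiles collected on hardware without LBR support.
        cmd="$cmd -use_lbr=false"
    fi
    printf '%s\n' "$cmd"
}

# Example:
#   create_gcov_cmd linpack-pc.simple perf-arm.data linpack-arm.gcov nolbr
```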

Sebastian


On Tue, Apr 21, 2015 at 3:53 PM, Dehao Chen de...@google.com wrote:
 That's correct. For trunk, gcov_version is 0x1. We defined this as a
 flag so that you can actually change it via --gcov_version=0x1 instead
 of changing the code.

 Dehao

 On Tue, Apr 21, 2015 at 1:47 PM, Sebastian Pop seb...@gmail.com wrote:
 We also needed to adjust the gcov_version in autofdo/gcov.cc to read
 0x1 for dev branches of gcc (instead of the current 0x3430372a for
 some released version of GCC):

 -DEFINE_uint64(gcov_version, 0x3430372a,
 +DEFINE_uint64(gcov_version, 0x1,

 Sebastian

 On Tue, Apr 21, 2015 at 3:33 PM, Aditya K hiradi...@msn.com wrote:
 After patching linux perf. This script collects creates a coverage file 
 (e.g., for linpack) which can be used for fdo.


 gcov=linpack-x86.gcov
 MAKE='make'


 # x86
 x86() {
 CC=/usr/bin/gcc
 CXX=/usr/bin/g++

 # Quote the flags so that all three end up in CFLAGS.
 export CFLAGS="-Ofast -g3 -static"
 export CPPFLAGS="$CFLAGS"

 $MAKE -C $SRC/SingleSource/Benchmarks/Linpack clean

 # Build the benchmark.
 $MAKE -C $SRC/SingleSource/Benchmarks/Linpack -k TEST=simple \
   TARGET_LLVMGCC=$CC TARGET_CXX=$CXX LLI_OPTFLAGS= TARGET_CC=$CC \
   TARGET_LLVMGXX=$CXX CC_UNDER_TEST_IS_GCC=1 TARGET_FLAGS= \
   USE_REFERENCE_OUTPUT=1 CC_UNDER_TEST_TARGET_IS_AARCH64=1 OPTFLAGS= \
   LLC_OPTFLAGS= ENABLE_OPTIMIZED=1 ARCH=x86_64 ENABLE_HASHED_PROGRAM_OUTPUT=1 \
   DISABLE_JIT=1

 perfdata=autofdo-linpack/perf-x86.data

 # Record branch stacks (-b) while running the benchmark.
 perf record -b -e branch-instructions -o $perfdata \
   $SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple

 # Convert the perf profile into a gcov file for GCC.
 autofdo/usr/bin/create_gcov \
   --binary=$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple \
   --profile=$perfdata --gcov=$gcov

 }


 hth,
 -Aditya

 
 From: a...@firstfloor.org
 To: i.palac...@samsung.com
 CC: dnovi...@google.com; gcc@gcc.gnu.org; davi...@google.com; 
 hubi...@ucw.cz; seb...@gmail.com; de...@google.com; v.bari...@samsung.com
 Subject: Re: AutoFDO profile toolchain is open-sourced
 Date: Tue, 21 Apr 2015 07:25:10 -0700

 Ilya Palachev i.palac...@samsung.com writes:

 But why does create_gcov not report that (no branch events were
 found)? It creates an empty gcov file and says nothing :(

 Moreover, in the mentioned README it is said that perf should also be
 executed with option -e BR_INST_RETIRED:TAKEN.

 Standard perf doesn't have a full event list.
 This assumes a perf patched with the libpfm patch.

 Also I suspect it really wants to use PEBS events, so pp should be added.

 Alternatively you can use ocperf (from
 http://github.com/andikleen/pmu-tools) which is just a wrapper:

 ocperf.py record -e br_inst_retired.near_taken:pp -b ...

 or specify the event manually (depending on your CPU), for example:

 perf record -e
 cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp
 -b ...

 BTW the biggest problem with autofdo currently is that it is
 quite bitrotten and supports only a perf that is several years old.
 So all of the above will only work with old distributions,
 unless you compile an old perf utility first.

 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only



Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Sebastian Pop
We also needed to adjust the gcov_version in autofdo/gcov.cc to read
0x1 for dev branches of gcc (instead of the current 0x3430372a for
some released version of GCC):

-DEFINE_uint64(gcov_version, 0x3430372a,
+DEFINE_uint64(gcov_version, 0x1,

Sebastian




Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Andi Kleen
On Tue, Apr 21, 2015 at 01:52:18PM -0700, Dehao Chen wrote:
 Andi,
 
 Thanks for the patches. Turns out that the first 3 patches are already
 in, the correct upstream quipper repository is:
 
 https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/
 
 The last 3 patches seem to be local hacks. Do you want any of them in?
 
 I just did a batch sync with quipper head. Please let me know if this
 solves the perf problem.

Still outdated:

F0421 20:13:16.221422 22297 perf_reader.cc:1614] Check failed: attr_size <= 
sizeof(perf_event_attr) (104 vs. 96) 

-Andi
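
The two numbers in the failed check match published sizes of struct
perf_event_attr in the Linux uapi header linux/perf_event.h. A small
sketch of that mapping follows; it assumes that the first number (104)
is the size stored in the perf.data file and the second (96) is the size
the converter was built against. The function name is hypothetical.

```shell
#!/bin/sh
# Map a perf_event_attr size in bytes to the PERF_ATTR_SIZE_VERn name
# defined in the Linux uapi header linux/perf_event.h.
perf_attr_size_version() {
    case $1 in
        64)  echo PERF_ATTR_SIZE_VER0 ;;
        72)  echo PERF_ATTR_SIZE_VER1 ;;
        80)  echo PERF_ATTR_SIZE_VER2 ;;
        96)  echo PERF_ATTR_SIZE_VER3 ;;
        104) echo PERF_ATTR_SIZE_VER4 ;;
        112) echo PERF_ATTR_SIZE_VER5 ;;
        *)   echo unknown; return 1 ;;
    esac
}

perf_attr_size_version 104   # assumed: size recorded in perf.data
perf_attr_size_version 96    # assumed: size the reader was compiled with
```

Under that reading, the perf that wrote the file is one attribute
revision newer than the headers the reader was built with.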


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Andi Kleen
On Wed, Apr 22, 2015 at 05:15:47AM +0200, Andi Kleen wrote:
 On Tue, Apr 21, 2015 at 01:52:18PM -0700, Dehao Chen wrote:
  Andi,
  
  Thanks for the patches. Turns out that the first 3 patches are already
  in, the correct upstream quipper repository is:
  
  https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/
  
  The last 3 patches seem to be local hacks. Do you want any of them in?
  
  I just did a batch sync with quipper head. Please let me know if this
  solves the perf problem.
 
 Still outdated:
 
 F0421 20:13:16.221422 22297 perf_reader.cc:1614] Check failed: attr_size <= 
 sizeof(perf_event_attr) (104 vs. 96) 

It converts with the attached patches, but there's still some problem
parsing the data:

% ./create_gcov  -binary loop -gcov_version 1 -gcov loop.gcda -gcov_version 
0x500e
% gcc50 -O2 -fprofile-use loop.c 
loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ',
expected version '500e'
% 

-Andi



autofdo-patches-2.tgz
Description: application/gtar-compressed
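
The version values in this thread can be read as four ASCII bytes: the
0x3430372a quoted for a released GCC spells "407*". The sketch below
converts in both directions. That the '500e' in the warning above
corresponds to 0x35303065 (rather than 0x500e) is an inference from that
byte layout, not something confirmed in the thread; the function names
are hypothetical.

```shell
#!/bin/sh
# Decode a 32-bit gcov version tag into its four ASCII characters.
gcov_version_decode() {
    gv_value=$(( $1 ))
    for gv_shift in 24 16 8 0; do
        gv_byte=$(( (gv_value >> gv_shift) & 255 ))
        # Emit the byte through an octal escape in the printf format.
        printf "\\$(printf '%03o' "$gv_byte")"
    done
    printf '\n'
}

# Encode a four-character version string as a hexadecimal tag.
gcov_version_encode() {
    gv_rest=$1
    gv_out=0x
    while [ -n "$gv_rest" ]; do
        gv_char=${gv_rest%"${gv_rest#?}"}   # first character
        gv_rest=${gv_rest#?}                # remainder
        # A leading quote makes printf use the character's numeric value.
        gv_out=$gv_out$(printf '%02x' "'$gv_char")
    done
    printf '%s\n' "$gv_out"
}

gcov_version_decode 0x3430372a   # prints 407*
gcov_version_encode 500e         # prints 0x35303065
```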


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Diego Novillo
On Tue, Apr 21, 2015 at 6:33 AM, Ilya Palachev i.palac...@samsung.com wrote:
 ping?

 On 15.04.2015 10:41, Ilya Palachev wrote:

 Hi,

 One more question.

 Does anybody know which options perf should be run with in order to
 collect appropriate data for the autofdo converter?

From the autofdo page: https://github.com/google/autofdo

[ ... ]
Inputs:

--profile: PERF_PROFILE collected using linux perf (with last branch record).
In order to collect this profile, you will need to have an Intel CPU that
has last branch record (LBR) support. You also need to have your linux
kernel configured with LBR support. To profile:
# perf record -c PERIOD -e EVENT -b -o perf.data -- ./command
EVENT refers to BR_INST_RETIRED:TAKEN if available. For some
architectures, BR_INST_EXEC:TAKEN also works.
[ ... ]

The important one for autofdo is -b. It asks perf to use LBR registers
for branch tracking (assuming your architecture supports it).

The binary you run under perf should also have line table information
(compiled with -gmlt) so that autofdo can map samples to source locations.


Diego.
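
The README template quoted above can be filled in as follows. This is
only a sketch: perf_record_cmd is a hypothetical helper that prints the
command instead of running it, and the period 100003 and the ./ytest
command are placeholders, not recommended values.

```shell
#!/bin/sh
# Hypothetical helper: print the perf record command from the README
# template.  usage: perf_record_cmd PERIOD EVENT OUTPUT COMMAND...
perf_record_cmd() {
    pr_period=$1
    pr_event=$2
    pr_output=$3
    shift 3
    # -b requests LBR branch stacks; "--" separates perf options from
    # the profiled command.
    printf 'perf record -c %s -e %s -b -o %s -- %s\n' \
        "$pr_period" "$pr_event" "$pr_output" "$*"
}

# Placeholder period and command, for illustration only.
perf_record_cmd 100003 BR_INST_RETIRED:TAKEN perf.data ./ytest
```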


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
That's correct. For trunk, gcov_version is 0x1. We defined this as a
flag so that you can actually change it via --gcov_version=0x1 instead
of changing the code.

Dehao

On Tue, Apr 21, 2015 at 1:47 PM, Sebastian Pop seb...@gmail.com wrote:
 We also needed to adjust the gcov_version in autofdo/gcov.cc to read
 0x1 for dev branches of gcc (instead of the current 0x3430372a for
 some released version of GCC):

 -DEFINE_uint64(gcov_version, 0x3430372a,
 +DEFINE_uint64(gcov_version, 0x1,

 Sebastian




Re: AutoFDO profile toolchain is open-sourced

2015-04-15 Thread Ilya Palachev

Hi,

One more question.

On 10.04.2015 23:39, Jan Hubicka wrote:

I must say I did not even try running AutoFDO myself (so I am happy to hear
it works).


I tried to use the create_gcov executable built from the AutoFDO
repository on GitHub.
The problem is that the data generated by this program is always 1600
bytes in size, regardless of the profile data given to it.

Steps to reproduce the issue:

1. Build AutoFDO under x86_64

2. Build, for example, the benchmark ytest.c (see attachment):

   g++ -O2 -o ytest ytest.c -g2

(I used g++ that was built just now from gcc-5-branch branch from 
git://gcc.gnu.org/git/gcc.git)


3. Run it under perf to collect the profile data:

   sudo perf record ./ytest


perf reports no error and says:

   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.125 MB perf.data (~5442 samples) ]


Perf generates perf.data.

4. Run create_gcov on the obtained data:

   create_gcov --binary ytest --profile perf.data --gcov ytest.gcov
   --debug_dump

It creates 2 files:
* ytest.gcov, which is 1600 bytes in size
* ytest.gcov.imports, which is empty

Also there is no debug output from the program.
If I run create_llvm_prof on the data

   create_llvm_prof --binary ytest --profile perf.data --out ytest.out
   --debug_dump

It reports the following log:

   Length of symbol map: 1
   Number of functions:  0

and creates an empty file ytest.out.

This is not true: all functions in the benchmark are marked with 
__attribute__((noinline)) and readelf says that they stay in the binary:


   readelf -s ytest | grep px_cycle
56: 00400640   111 FUNC    GLOBAL DEFAULT   12 _Z8px_cyclei
   readelf -s ytest | grep py_cycle
60: 004006b0    36 FUNC    GLOBAL DEFAULT   12 _Z8py_cyclev

The size of the resulting gcov data is the same (1600 bytes) for different 
levels of debug information (-g0, -g1, -g2) and for different input 
source files.


What am I doing wrong?

--
Best regards,
Ilya Palachev

#define DX (480*4)

#define DY (640*4)

int* src = new int[DX*DY];
int* dst = new int[DX*DY];
int pxm = DX;
int pym = DY;

void px_cycle(int py) __attribute__((noinline));
void px_cycle(int py) {
    int *p1 = dst + (py*pxm);
    int *p2 = src + (pym - py - 1);
    for (int px = 0; px < pxm; px++) {
        if (px < pym && py < pxm) {
            *p1 = *p2;
        }
        p1++;
        p2 += pym;
    }
}

void py_cycle() __attribute__((noinline));
void py_cycle() {
    for (int py = 0; py < pym; py++) {
        px_cycle(py);
    }
}

int main() {
    int i;
    for (i = 0; i < 100; i++) {
        py_cycle();
    }
    return 0;
}
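
In the reproduction steps above, perf record is run without -b, so the
profile may simply contain no branch stacks for create_gcov to use. A
sketch of one way to check a perf.data file follows. It assumes that
"perf evlist -v" prints each event's sample_type as a list of flag names
that includes BRANCH_STACK when -b was used; the function name
has_branch_stack is hypothetical.

```shell
#!/bin/sh
# Read "perf evlist -v" output on stdin; succeed if any event was
# sampled with branch stacks (BRANCH_STACK listed in sample_type).
has_branch_stack() {
    grep -q 'sample_type:.*BRANCH_STACK'
}

# Only runs when perf and a perf.data file are actually available.
if command -v perf >/dev/null 2>&1 && [ -f perf.data ]; then
    if perf evlist -v -i perf.data | has_branch_stack; then
        echo "perf.data has branch stacks"
    else
        echo "perf.data has no branch stacks; re-record with -b"
    fi
fi
```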


Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Jan Hubicka
 On Tue, Apr 7, 2015 at 9:45 AM, Ilya Palachev i.palac...@samsung.com wrote:
  In the mentioned README file it is said that "In order to collect this
  profile, you will need to have an Intel CPU that has last branch record
  (LBR) support." Is this information obsolete? Chrome Canary builds use
  AutoFDO for ARMv7l
  (https://code.google.com/p/chromium/issues/detail?id=434587)
 
 It does not mean that the profile was recorded on an ARM system: they
 can gather perf.data on x86 and then produce a coverage file that is
 then used in ARM compiles.  I tried it and it seems to work well.

I must say I did not even try running AutoFDO myself (so I am happy to hear
it works). My understanding is that you need LBR only to get indirect
call profiling working (i.e. you want to know from where the indirect
function is called).

Depending on your application this may not be the most important thing to
record (either you don't have indirect calls in hot paths or they are handled
reasonably by speculative devirtualization).

Some ARMs also have support for tracing jump pairs, right?
Honza
 
 Sebastian


Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Xinliang David Li
LBR is used for both CFG edge profiling and indirect call target value
profiling.

David



Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Jan Hubicka
 LBR is used for both cfg edge profiling and indirect call Target value
 profiling.
I see, that makes sense ;)  I guess if we want to support profile collection
on targets w/o this feature we could still use one of the algorithms that
try to guess edge profile from BB profile.

Honza
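
As an illustration of what guessing an edge profile from a BB profile
means in the simplest case, the sketch below applies flow conservation
to an if/else diamond: a block with a single incoming edge passes its
count to that edge, and the remaining flow takes the other edge. This is
only a toy instance with a hypothetical function name, not the algorithm
GCC or AutoFDO uses.

```shell
#!/bin/sh
# Given execution counts for the entry block and the "then" block of an
# if/else diamond, print the two outgoing edge counts of the entry block.
diamond_edge_counts() {
    de_entry=$1
    de_then=$2
    # The "then" block has a single incoming edge, so that edge carries
    # exactly its block count; the remaining flow goes to "else".
    de_else=$(( de_entry - de_then ))
    if [ "$de_else" -lt 0 ]; then
        # Sampled counts can be inconsistent; clamp instead of going negative.
        de_else=0
    fi
    printf 'entry->then=%s entry->else=%s\n' "$de_then" "$de_else"
}

diamond_edge_counts 1000 700   # prints entry->then=700 entry->else=300
```

Real CFGs need this propagated over many blocks, and sampled block
counts rarely balance exactly, which is where the actual algorithms do
their work.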


Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Xinliang David Li
On Tue, Apr 7, 2015 at 7:45 AM, Ilya Palachev i.palac...@samsung.com wrote:
 Hi,

 Here are some questions about AutoFDO.

 On 08.05.2014 02:55, Dehao Chen wrote:

 We have open-sourced AutoFDO profile toolchain in:

 https://github.com/google/autofdo

 For GCC developers, the most important tool is create_gcov, which
 converts sampling based profile to GCC-readable profile. Please refer
 to the readme file
 (https://raw.githubusercontent.com/google/autofdo/master/README) for
 more details.


 In the mentioned README file it is said that "In order to collect this
 profile, you will need to have an Intel CPU that has last branch record
 (LBR) support." Is this information obsolete? Chrome Canary builds use
 AutoFDO for ARMv7l
 (https://code.google.com/p/chromium/issues/detail?id=434587)

 What about AArch64? Is it supported?

As mentioned by Sebastian, the current solution is to collect profile
on Intel platform (with LBR support) and cross optimize arm/aarch64
target.

AutoFDO support with other PMU events (cycles, retired instructions
etc) still needs more tuning to match FDO performance.


 To use the profile, one needs to check out
 https://gcc.gnu.org/svn/gcc/branches/google/gcc-4_8. We are working on
 porting AutoFDO to trunk
 (http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00438.html).


 AutoFDO has now been merged into the gcc-5.0 (trunk) branch.
 Is it possible to backport it to the 4.9 branch? Can you estimate the
 required effort for that?

The google gcc49 branch has the autofdo support.

David


 We have limited doc inside the open-sourced package, and we are
 planning to add more content to the wiki page
 (https://github.com/google/autofdo/wiki). Feel free to send me emails
 or discuss on github if you have any questions.

 Cheers,
 Dehao


 --
 Best regards,
 Ilya


Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Xinliang David Li
On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka hubi...@ucw.cz wrote:
 LBR is used for both cfg edge profiling and indirect call Target value
 profiling.
 I see, that makes sense ;)  I guess if we want to support profile collection
 on targets w/o this feature we could still use one of the algorithms that
 try to guess edge profile from BB profile.

Our experience with sampling cycles or retired instructions to guess
BB profile has not been great -- the profile quality is significantly
worse than LBR (which can almost match instrumentation based profile).

David


 Honza


Re: AutoFDO profile toolchain is open-sourced

2015-04-07 Thread Ilya Palachev

Hi,

Here are some questions about AutoFDO.

On 08.05.2014 02:55, Dehao Chen wrote:

We have open-sourced AutoFDO profile toolchain in:

https://github.com/google/autofdo

For GCC developers, the most important tool is create_gcov, which
converts sampling based profile to GCC-readable profile. Please refer
to the readme file
(https://raw.githubusercontent.com/google/autofdo/master/README) for
more details.


In the mentioned README file it is said that "In order to collect this 
profile, you will need to have an Intel CPU that has last branch record 
(LBR) support." Is this information obsolete? Chrome Canary builds use 
AutoFDO for ARMv7l 
(https://code.google.com/p/chromium/issues/detail?id=434587)


What about AArch64? Is it supported?


To use the profile, one needs to check out
https://gcc.gnu.org/svn/gcc/branches/google/gcc-4_8. We are working on
porting AutoFDO to trunk
(http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00438.html).


AutoFDO has now been merged into the gcc-5.0 (trunk) branch.
Is it possible to backport it to the 4.9 branch? Can you estimate the 
required effort for that?




We have limited doc inside the open-sourced package, and we are
planning to add more content to the wiki page
(https://github.com/google/autofdo/wiki). Feel free to send me emails
or discuss on github if you have any questions.

Cheers,
Dehao


--
Best regards,
Ilya