Re: IPA ICF fallout: fix for two ipa-icf-*.C tests

2014-10-19 Thread Andreas Schwab
Martin Liška mli...@suse.cz writes:

 diff --git a/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C 
 b/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
 index 9d17889..9434289 100644
 --- a/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
 +++ b/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
 @@ -44,5 +44,5 @@ int main()
  }
  
  /* { dg-final { scan-ipa-dump Varpool alias has been created icf  } } */
 -/* { dg-final { scan-ipa-dump Equal symbols: 2 icf  } } */
 +/* { dg-final { scan-ipa-dump Equal symbols: 6 icf  } } */
  /* { dg-final { cleanup-ipa-dump icf } } */

On ia64:

FAIL: g++.dg/ipa/ipa-icf-4.C  -std=gnu++98  scan-ipa-dump icf Varpool alias 
has been created
FAIL: g++.dg/ipa/ipa-icf-4.C  -std=gnu++98  scan-ipa-dump icf Equal symbols: 6

$ grep -e Equal -e Varpool ipa-icf-4.C.045i.icf
Equal symbols: 5

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH 5/5] New tests introduction

2014-10-19 Thread Andreas Schwab
Martin Liška mli...@suse.cz writes:

 diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c 
 b/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c
 new file mode 100644
 index 000..7358e43
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c
 @@ -0,0 +1,27 @@
 +/* { dg-do compile } */
 +/* { dg-options -O2 -fdump-ipa-icf  } */
 +
 +#include xmmintrin.h
 +
 +__attribute__ ((noinline))
 +void foo()
 +{
 +  float x = 1.2345f;
 +  __m128 v =_mm_load1_ps(x);
 +}
 +
 +__attribute__ ((noinline))
 +void bar()
 +{
 +  float x = 1.2345f;
 +  __m128 v =_mm_load1_ps(x);
 +}
 +
 +int main()
 +{
 +  return 2;
 +}
 +
 +/* { dg-final { scan-ipa-dump Semantic equality hit:bar-foo icf  } } */
 +/* { dg-final { scan-ipa-dump Equal symbols: 1 icf  } } */
 +/* { dg-final { cleanup-ipa-dump icf } } */

FAIL: gcc.dg/ipa/ipa-icf-21.c (test for excess errors)
Excess errors:
/usr/local/gcc/gcc-20141019/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c:4:23: fatal e\
rror: xmmintrin.h: No such file or directory
compilation terminated.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
 
  I am surprised you hit the size limits with 4.9 only - for quite some time
  we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
  first
  that works harder to drop them early (because I hit the problem with LTO
  where they artifically bloat the size of LTO object files)
 
 We can dig it more to later understand why only 4.9 hits the problem.

This would be very interesting, because 4.9 ought to be better here
(removing more virtuals early) than previous compilers.
There was number of changes in 4.9 that may affect this - some fixes at
C++ side giving middle end more inline candidates and also this change
https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00834.html
So perhaps most of your virtual functions are !COMDAT!EXTERNAL, but that
does not seem to make much sense to me either :(
 
 My size results with -fno-devirtualize-speculatively is out. It
 shrinks size by 1.68% -- slightly more than -fdevirtualize can do in
 O2 compile.

Hmm, this is interesting, too. 1.68% is definitly a lot more than I would
expect or have I seen on other testcases.  You can take a look at summaries in
-fdump-ipa-devirt pass.

In opts.c I have:
case OPT_fprofile_use:
  if (!opts_set-x_flag_branch_probabilities)
...
  /* Indirect call profiling should do all useful transformations
 speculative devirtualization does.  */
  if (!opts_set-x_flag_devirtualize_speculatively
   opts-x_flag_value_profile_transformations)
opts-x_flag_devirtualize_speculatively = false;

so perhaps this hunk is somehow skipped with LIPO?

Speculative devirtualization is somehwat less useful (may have more falce
positives) without LTO depending on how your headers are constructed.
It would be interesting to see if it does a lot of mistakes on your codebase.
(this can be easily done by forcing it to run with profile feedback, too and
it will tell you when its speculation differs from speculation already there).
 
 By the way, you mentioned 'hacking the
 ipa.c:walk_polymorphic_call_targets to not make the possible targets
 as
 reachable' -- is that something worth doing in trunk?   With that, we
 can probably just turn off speculative devirtualization.

Well, the check is there to enable inlining. Disabling it for
-fprofile-generate will result in lost profile samples for virtual functions.
Disabling it by default will prevent inlining of devirtualized calls making
devirtualization not really useful.
Perhaps with LIPO situation is bit different because you bring in the other
module just to inline the call as you describe.

One thing I can imagine doing is to make inliner consider the reachable
(in post-inlining sense, that is after removing extern inlines and virtual
functions) calls with priority and account only those to unit growth model.
This would make it more consistent over -fdevirtualize and more realistic
about resulting code size.

I sort of considered this option but did not have any good data suggesting
I should implement it.

In general it would be nice to understand this problem.  Also I plan to do
some retunning for 5.0 so it would be nice to know if you have other issues
with 4.9? (I did not closely followed Google branch changes, so if you can
point out those that are relevant for IPA tuning, I would be very interested
to see what problems you hit).

Honza
 
 David
 
 
 
 
  Honza
 
  David
 
 
  
   Honza
  
   David
  
  
   On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
Disabling devirtualization reduces code size, both for 
instrumentation (because
many more virtual functions are kept longer and therefore 
instrumented) and for
normal optimization.
   
OK, with profile instrumentation (that you seem to try to minimize) i 
can see
how you get noticeably more counters because virtual functions are 
kept longer.
(note that 4.9 is a lot more agressive on removing unreacable virtual 
functions
than earlier compilers).
   
Instead of disabling -fdevirtualize completely (that will get you 
more indirect
calls and thus more topn profiling) you may consider just hacking
ipa.c:walk_polymorphic_call_targets to not make the possible targets 
as
reachable. (see the conditional on before_inlining_p).
   
Of course this will get you less devirtualization (but with LTO the 
difference
should not be big - perhaps I could make switch for that for 
mainline) and less
accurate profiles when you get speculative devirtualization via topn.
   
I would be very interested to see how much difference this makes.
   
Honza


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
 Hi Honza,
 
 As David says, we will do some more experiments with the change you
 suggest and speculative devirtualization, but we needed to make this
 change in part to get an internal release out. One of the issues was a
 recent change to cp/decl2.c to make virtual function decls needed
 under flag_devirtualize.

I think that one is not in current 4.9 branch (only in mainline). 

Honza
 
 Thanks!
 Teresa
 
 On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka hubi...@ucw.cz wrote:
 
  The virtual functions will be emitted in some modules, right? If they
  are hot, they will be included with the auxiliary modules. Note that
  LIPO indirect call profile will point to the comdat copy that is
  picked by the linker in the instrumentation build, so it guarantees to
  exist.
 
  If you have COMDAT virtual inline function that is used by module FOO and
  LIPO's indirect inlining will work out that it goes to comdat copy of 
  module BAR
  (that won the merging).  It will make C++ parser to also parse BAR to get
  the function, but callgarph builder will ignore it, too.
  (this is new logic in analyze_function that with -fno-devirtualize will just
  prevent all virtual functions not directly reachable to appear in symbol 
  table)
 
  I am surprised you hit the size limits with 4.9 only - for quite some time
  we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
  first
  that works harder to drop them early (because I hit the problem with LTO
  where they artifically bloat the size of LTO object files)
 
  Honza
 
  David
 
 
  
   Honza
  
   David
  
  
   On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
Disabling devirtualization reduces code size, both for 
instrumentation (because
many more virtual functions are kept longer and therefore 
instrumented) and for
normal optimization.
   
OK, with profile instrumentation (that you seem to try to minimize) i 
can see
how you get noticeably more counters because virtual functions are 
kept longer.
(note that 4.9 is a lot more agressive on removing unreacable virtual 
functions
than earlier compilers).
   
Instead of disabling -fdevirtualize completely (that will get you 
more indirect
calls and thus more topn profiling) you may consider just hacking
ipa.c:walk_polymorphic_call_targets to not make the possible targets 
as
reachable. (see the conditional on before_inlining_p).
   
Of course this will get you less devirtualization (but with LTO the 
difference
should not be big - perhaps I could make switch for that for 
mainline) and less
accurate profiles when you get speculative devirtualization via topn.
   
I would be very interested to see how much difference this makes.
   
Honza
 
 
 
 -- 
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] PR36312

2014-10-19 Thread Manuel López-Ibáñez
On 18 October 2014 14:43, Anthony Brandon anthony.bran...@gmail.com wrote:
 Never mind about functions.texi. I figured out how to do it.
 Here is the new diff and changelog.

 libiberty/ChangeLog:

 2014-10-18  Anthony Brandon  anthony.bran...@gmail.com

 * filename_cmp.c (filename_eq): No change.

Unfortunately mklog is not 100% perfect (actually, it is 'diff -p'
which is far from perfect). You need to revise that the ChageLog makes
sense (or make mklog smarter). Thus, drop (filename_eq)...

Also, the changelogs should say PR driver/36312
(https://gcc.gnu.org/codingconventions.html#ChangeLogs).

I think you need to explain the difference between using fatal_error()
and fatal_error(UNKNOWN_LOCATION). Yes, I should know because I wrote
this part of the patch. But to be honest, I don't remember why I did
this change.

+This function first normalizes the file names so that different file names
+pointing to the same underlying file are treated as being identical.

I would suggest: This function compares the canonical versions of the
filenames as returned by @code{lrealpath()}, so that ...

I cannot approve the patch, but it looks fine to me. If you don't get
a reply in a few days, you should ping the relevant maintainers:
https://gcc.gnu.org/wiki/Community#ping

Great first contribution! What are your plans next?

Cheers,

Manuel.


Re: [PATCH] AutoFDO patch for trunk

2014-10-19 Thread Jan Hubicka
  +/* Member functions for string_table.  */
  +
  +string_table *
  +string_table::create ()
 
  Why this is not a constructor?
 
 We use static initializer because it's not suggested to put too much
 logic in constructor.

Why not? :)
  +}
 
  The two hunks probably can be unified.
  Why get_index_by_decl does not already use dwarf name and why do you need 
  to consider both?
 
 get_index_by_decl is actually a wrapper of get_index. It tolerates
 error when assembler name is not emitted in the debug binary (mostly
 for functions that are fully inlined). In this case, we will first try
 to find the index of the assembler name, if not found, then bfd name.
 If still not found, then we try to find name from its abstract
 location.

Yep, I am just somewhat concerned that you need to try different variations of 
the name.
I would expect the assembler name (minus the random seed mangling) to be 
sufficient.
 
 
  +  if (DECL_ABSTRACT_ORIGIN (decl))
  +return get_function_instance_by_decl (lineno, DECL_ABSTRACT_ORIGIN 
  (decl));
 
  What really happens for ipa-cp function clones and for split functions?
 
 Their name suffix will be stripped before matching names.

I see and profile merged, that seems resonable.
  I am not sure it is a win to have the things represented as VPT histograms 
  when you shoot
  for quite special purpose problem.  But lets deal with these incrementally, 
  too.
  (in general I do not see why you try to share so much with the gcov based 
  VTP - it seems
  easier to do the speculation by hand)
 
 Basically there are 2 problem:
 
 * before annotation. I agree this is a special purpose problem
 * after annotation. This should be the same problem as VPT.

Yep, I think adding VPT histograms to get devirtualization happen 
inter-procedurally
with LTO is a good idea.
You will need (incrementally I guess) to translate your indexes into the 
profile-id used
or you will need to initialize profile-ids accordingly, so the IPA pass is able 
to identify
the target cross-module.
  +{
  +  if (gcov_open (auto_profile_file, 1) == 0)
  +error (Cannot open profile file %s., auto_profile_file);
  +
  +  if (gcov_read_unsigned () != GCOV_DATA_MAGIC)
  +error (AutoFDO profile magic number does not mathch.);
  +
  +  /* Skip the version number.  */
  +  gcov_read_unsigned ();
  +
  +  /* Skip the empty integer.  */
  +  gcov_read_unsigned ();
 
  Perhaps we can just check the values?
 
 What do you mean? Check the value against 0?

Yes, if you have version in there it seems to make sense to check it ;)
I guess it is there so you can add extra stuff into the file format that will 
make
it incompatible with current one, so probably we should reject all versions 
greater
than 0. (especially because the tool is off-tree)
 
  Does this have chance to work inter-module?
 
 Yes, it works fine with LIPO.

Important (for me) is to make it work with LTO, because LIPO is apparently not 
landing to 5.0.
 
 Yes, this could be done.
 
 After a second thought, I think we actually need to expose einliner
 interface to autofdo (instead of doing vpt in early inliner). This is
 because we need to mark icall as promoted, so that later vpt passes
 will not try to promote it again. Currently in AutoFDO, we use a
 self-contained way (use a set to mark all promoted stmts). If we do
 vpt in einline, then we will need some flag attached to gimple to
 indicate whether it's already promoted.

If we manage to remove the iteration loop that repeativly applies the inline 
plan, you only
need way to mark edge as promoted.  I guess you can just test the speculative 
flag on the edge
to skip ones you dealt with earlier.

Lets do that incrementally though.
 +
 +@item -fauto-profile
 +@itemx -fauto-profile=@var{path}
 +@opindex fauto-profile
 +Enable sampling based feedback directed optimizations, and optimizations
 +generally profitable only with profile feedback available.
 +
 +The following options are enabled: @code{-fbranch-probabilities}, 
 @code{-fvpt},
 +@code{-funroll-loops}, @code{-fpeel-loops}, @code{-ftracer}, 
 @code{-ftree-vectorize},
 +@code{ftree-loop-distribute-patterns}

I wonder about loop peeling - we do not enale it by default because we do not 
believe
our static branch predictor can well identify loops that have low expected trip 
count
(except ones that can be fully inlined).
I am not sure auto-FDO is much better here - how reliable the info about number 
of iteratoins
is?

But this is again something that can be handled separately.

Documentation seems out of date though - it seems to enable also
-finline, -fipa-cp and cloning, predictive commoning, unswitching, 
gcse-after-reload,

You do not want to enable reorder-functions because autoFDO does not include 
the time profiler.
Also I think speculative devirtualization should not be disabled.
 +
 +If @var{path} is specified, GCC looks at the @var{path} to find
 +the profile feedback data files.
 +
 +In order to collect AutoFDO profile, you need to 

Re: [Patch] Fix PR61889 for the w64-mingw32 case

2014-10-19 Thread Jan Hubicka
 Honza, not sure if this patch is idea, but this will unblock mingw
 build problems. Can this one get in?

Hmm, the patch is somewhat ugly and I do not know why MingW32 defines mkdir
macro and how. If Kai Tietz or other MingW32 maintainer is OK about it, the
patch is OK.

Honza
 
 thanks,
 
 David
 
 On Wed, Sep 24, 2014 at 8:22 AM, Rainer Emrich
 rai...@emrich-ebersheim.de wrote:
  The following patch fixes PR61889 for x86_64-w64-mingw32. Details can be 
  found
  on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61889
 
  The patch was bootstrapped on x86_64-w64-mingw32.
 
  If patch the patch is ok, Kai would you apply, please?
 
  Rainer
 
  2014-09-24  Rainer Emrich  rai...@emrich-ebersheim.de
 
  PR gcov-profile/61889
  * gcc/gcov-tool.c: Remove wrong #if !defined(_WIN32)
  * libgcc/libgcov-driver-system.c: undefine clashing macro for mkdir
 
 
  Index: gcc/gcov-tool.c
  ===
  --- gcc/gcov-tool.c (Revision 215554)
  +++ gcc/gcov-tool.c (Arbeitskopie)
  @@ -89,11 +89,7 @@ gcov_output_files (const char *out, stru
 /* Try to make directory if it doesn't already exist.  */
 if (access (out, F_OK) == -1)
   {
  -#if !defined(_WIN32)
 if (mkdir (out, S_IRWXU | S_IRWXG | S_IRWXO) == -1  errno != 
  EEXIST)
  -#else
  -  if (mkdir (out) == -1  errno != EEXIST)
  -#endif
   fatal_error (Cannot make directory %s, out);
   } else
 unlink_profile_dir (out);
  Index: libgcc/libgcov-driver-system.c
  ===
  --- libgcc/libgcov-driver-system.c  (Revision 215554)
  +++ libgcc/libgcov-driver-system.c  (Arbeitskopie)
  @@ -66,6 +66,9 @@ create_file_directory (char *filename)
   #ifdef TARGET_POSIX_IO
mkdir (filename, 0755) == -1
   #else
  +#ifdef mkdir
  +#undef mkdir
  +#endif
mkdir (filename) == -1
   #endif
   /* The directory might have been made by another process.  */


Re: -fuse-caller-save - Collect register usage information

2014-10-19 Thread Tom de Vries

On 17-10-14 21:24, Eric Botcazou wrote:

Let's look at the effect of the option (after the recent fix for PR61605) on
gcc.target/i386/fuse-calller-save.c:
...
   foo:
   .LFB1:
.cfi_startproc
-   pushq   %rbx
-   .cfi_def_cfa_offset 16
-   .cfi_offset 3, -16
-   movl%edi, %ebx
callbar
-   addl%ebx, %eax
-   popq%rbx
-   .cfi_def_cfa_offset 8
+   addl%edi, %eax
ret
.cfi_endproc
   .LFE1:
...
So, the effect is: instead of using a callee-save register, we use a
caller-save register to store a value that's live over a call, without
needing to add a caller-save, as would be normally the case.

If I see an option -foptimize-caller-saves, I'd expect the effect to be that
without, there are some caller-saves and with, there are less. This is not
the case in the diff above.


To me it is, movl %edi, %ebx/addl %ebx, %eax is a caller-save/restore.



I agree that it can look like that. But the insn 'movl %edi, %ebx' is generated 
by assign_parm_setup_reg at expand. AFAIU, the purpose is to decouple the value 
of the argument and its uses from the register it's passed in.


The definition of -fcaller-saves below explains why insn 'movl %edi, %ebx' is 
not a caller-save: because it's not generated before a call, but rather at the 
start of a function. This seems to be confirmed by the fact that the insn 'movl 
%edi, %ebx' is still generated with -fno-caller-saves.



I'm starting to lean towards -foptimize-call-clobbers or similar.


Yes, that's also a good name and was my initial preference.  But you pointed
out the existing -fcaller-saves:

`-fcaller-saves'
  Enable allocation of values to registers that are clobbered by
  function calls, by emitting extra instructions to save and restore
  the registers around such calls.  Such allocation is done only
  when it seems to result in better code.

so -foptimize-caller-saves can be understood as optimizing out the extra
instructions to save and restore  the registers around such calls  and, thus,
as having a direct relationship with -fcaller-saves.


Agree.

But, given the preference of a number of others for fipa-ra, could you live with 
that?


Thanks,
- Tom


Re: [committed] Remove hppa -mjump-in-delay option

2014-10-19 Thread Segher Boessenkool
On Sat, Oct 18, 2014 at 12:04:36PM -0400, John David Anglin wrote:
 The attached change removes support for the hppa -mjump-in-delay  
 option.

You can set it to Ignored in pa.opt now?  And remove the mask?


Segher


Re: [committed] Remove hppa -mjump-in-delay option

2014-10-19 Thread John David Anglin

On 19-Oct-14, at 10:13 AM, Segher Boessenkool wrote:


You can set it to Ignored in pa.opt now?  And remove the mask?


Yes, will update.

Dave
--
John David Anglin   dave.ang...@bell.net





[C PATCH] Another initialization fix (PR c/63567)

2014-10-19 Thread Marek Polacek
It turned out that there is another spot where we need to allow
initializing objects with static storage duration with compound
literals even in C99 -- when the compound literal is inside the
initializer.  Fixed in the same way as previously.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-10-18  Marek Polacek  pola...@redhat.com

PR c/63567
* c-typeck.c (output_init_element): Allow initializing objects with
static storage duration with compound literals even in C99 and add
pedwarn for it.

* gcc.dg/pr63567-3.c: New test.
* gcc.dg/pr63567-4.c: New test.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 0dd3366..ee874da 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -8251,11 +8251,14 @@ output_init_element (location_t loc, tree value, tree 
origtype,
 value = array_to_pointer_conversion (input_location, value);
 
   if (TREE_CODE (value) == COMPOUND_LITERAL_EXPR
-   require_constant_value  !flag_isoc99  pending)
+   require_constant_value  pending)
 {
   /* As an extension, allow initializing objects with static storage
 duration with compound literals (which are then treated just as
 the brace enclosed list they contain).  */
+  if (flag_isoc99)
+   pedwarn_init (loc, OPT_Wpedantic, initializer element is not 
+ constant);
   tree decl = COMPOUND_LITERAL_EXPR_DECL (value);
   value = DECL_INITIAL (decl);
 }
diff --git gcc/testsuite/gcc.dg/pr63567-3.c gcc/testsuite/gcc.dg/pr63567-3.c
index e69de29..d626406 100644
--- gcc/testsuite/gcc.dg/pr63567-3.c
+++ gcc/testsuite/gcc.dg/pr63567-3.c
@@ -0,0 +1,7 @@
+/* PR c/63567 */
+/* { dg-do compile } */
+/* { dg-options  } */
+
+struct T { int i; };
+struct S { struct T t; };
+struct S s = { .t = { (int) { 1 } } };
diff --git gcc/testsuite/gcc.dg/pr63567-4.c gcc/testsuite/gcc.dg/pr63567-4.c
index e69de29..0ca6c45 100644
--- gcc/testsuite/gcc.dg/pr63567-4.c
+++ gcc/testsuite/gcc.dg/pr63567-4.c
@@ -0,0 +1,7 @@
+/* PR c/63567 */
+/* { dg-do compile } */
+/* { dg-options -Wpedantic } */
+
+struct T { int i; };
+struct S { struct T t; };
+struct S s = { .t = { (int) { 1 } } }; /* { dg-warning initializer element is 
not constant } */

Marek


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Xinliang David Li
On Sun, Oct 19, 2014 at 1:19 AM, Jan Hubicka hubi...@ucw.cz wrote:
 
  I am surprised you hit the size limits with 4.9 only - for quite some time
  we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
  first
  that works harder to drop them early (because I hit the problem with LTO
  where they artifically bloat the size of LTO object files)

 We can dig it more to later understand why only 4.9 hits the problem.

 This would be very interesting, because 4.9 ought to be better here
 (removing more virtuals early) than previous compilers.
 There was number of changes in 4.9 that may affect this - some fixes at
 C++ side giving middle end more inline candidates and also this change
 https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00834.html
 So perhaps most of your virtual functions are !COMDAT!EXTERNAL, but that
 does not seem to make much sense to me either :(

 My size results with -fno-devirtualize-speculatively is out. It
 shrinks size by 1.68% -- slightly more than -fdevirtualize can do in
 O2 compile.

 Hmm, this is interesting, too. 1.68% is definitly a lot more than I would
 expect or have I seen on other testcases.  You can take a look at summaries in
 -fdump-ipa-devirt pass.

 In opts.c I have:
 case OPT_fprofile_use:
   if (!opts_set-x_flag_branch_probabilities)
 ...
   /* Indirect call profiling should do all useful transformations
  speculative devirtualization does.  */
   if (!opts_set-x_flag_devirtualize_speculatively
opts-x_flag_value_profile_transformations)
 opts-x_flag_devirtualize_speculatively = false;

 so perhaps this hunk is somehow skipped with LIPO?

The 1.68% size reduction I measured is with plain O2 compilation, not LIPO.


 Speculative devirtualization is somehwat less useful (may have more falce
 positives) without LTO depending on how your headers are constructed.
 It would be interesting to see if it does a lot of mistakes on your codebase.
 (this can be easily done by forcing it to run with profile feedback, too and
 it will tell you when its speculation differs from speculation already there).

 By the way, you mentioned 'hacking the
 ipa.c:walk_polymorphic_call_targets to not make the possible targets
 as
 reachable' -- is that something worth doing in trunk?   With that, we
 can probably just turn off speculative devirtualization.

 Well, the check is there to enable inlining. Disabling it for
 -fprofile-generate will result in lost profile samples for virtual functions.
 Disabling it by default will prevent inlining of devirtualized calls making
 devirtualization not really useful.
 Perhaps with LIPO situation is bit different because you bring in the other
 module just to inline the call as you describe.

What I meant is whether it is suitable for plain build ? I have not
looked at the details.


 One thing I can imagine doing is to make inliner consider the reachable
 (in post-inlining sense, that is after removing extern inlines and virtual
 functions) calls with priority and account only those to unit growth model.
 This would make it more consistent over -fdevirtualize and more realistic
 about resulting code size.

 I sort of considered this option but did not have any good data suggesting
 I should implement it.

 In general it would be nice to understand this problem.  Also I plan to do
 some retunning for 5.0 so it would be nice to know if you have other issues
 with 4.9? (I did not closely followed Google branch changes, so if you can
 point out those that are relevant for IPA tuning, I would be very interested
 to see what problems you hit).

Ok I will collect a list of inlining related changes and let you know latter.

thanks,

David


 Honza

 David



 
  Honza
 
  David
 
 
  
   Honza
  
   David
  
  
   On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
Disabling devirtualization reduces code size, both for 
instrumentation (because
many more virtual functions are kept longer and therefore 
instrumented) and for
normal optimization.
   
OK, with profile instrumentation (that you seem to try to minimize) 
i can see
how you get noticeably more counters because virtual functions are 
kept longer.
(note that 4.9 is a lot more agressive on removing unreacable 
virtual functions
than earlier compilers).
   
Instead of disabling -fdevirtualize completely (that will get you 
more indirect
calls and thus more topn profiling) you may consider just hacking
ipa.c:walk_polymorphic_call_targets to not make the possible targets 
as
reachable. (see the conditional on before_inlining_p).
   
Of course this will get you less devirtualization (but with LTO the 
difference
should not be big - perhaps I could make switch for that for 
mainline) and less
accurate profiles when you get speculative devirtualization via topn.
   
I would be very interested to see how much difference this makes.
   
   

Re: [PATCH doc] Explain options precedence and difference between -pedantic-errors and -Werror=pedantic

2014-10-19 Thread Joseph S. Myers
On Sat, 18 Oct 2014, Manuel López-Ibáñez wrote:

 What about this version?
 
 Give an error whenever the @dfn{base standard} (see @option{-Wpedantic})
 requires a diagnostic, in cases where there is undefined behavior at
 compile-time

Only in *some* such cases of compile-time undefined behavior.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Fix PR preprocessor/42014

2014-10-19 Thread Joseph S. Myers
On Sat, 18 Oct 2014, Krzesimir Nowak wrote:

 +  pp_verbatim (context-printer,
 +%s from %r%s:%d%R, prefix, locus,

 +   diagnostic_report_from (context, map, In file included);

We don't want to split up diagnostic text like that, because for 
translation it may be necessary to translate the whole In file included 
from text together rather than expecting two fragments to go together in 
the same way they do in English.  Now, right now this message isn't marked 
for translation anyway (an independent bug there's no need for you to 
fix), but still as a design principle things should be structured to avoid 
splitting up English fragments like that.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C PATCH] Another initialization fix (PR c/63567)

2014-10-19 Thread Joseph S. Myers
On Sun, 19 Oct 2014, Marek Polacek wrote:

 It turned out that there is another spot where we need to allow
 initializing objects with static storage duration with compound
 literals even in C99 -- when the compound literal is inside the
 initializer.  Fixed in the same way as previously.
 
 Bootstrapped/regtested on x86_64-linux, ok for trunk?
 
 2014-10-18  Marek Polacek  pola...@redhat.com
 
   PR c/63567
   * c-typeck.c (output_init_element): Allow initializing objects with
   static storage duration with compound literals even in C99 and add
   pedwarn for it.
 
   * gcc.dg/pr63567-3.c: New test.
   * gcc.dg/pr63567-4.c: New test.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
  In opts.c I have:
  case OPT_fprofile_use:
if (!opts_set-x_flag_branch_probabilities)
  ...
/* Indirect call profiling should do all useful transformations
   speculative devirtualization does.  */
if (!opts_set-x_flag_devirtualize_speculatively
 opts-x_flag_value_profile_transformations)
  opts-x_flag_devirtualize_speculatively = false;
 
  so perhaps this hunk is somehow skipped with LIPO?
 
 The 1.68% size reduction I measured is with plain O2 compilation, not LIPO.

OK, in that case I may look into trimming down the speculation with non-LTO 
compilation.
The code makes assumptions about completeness of type hiearchy that without LTO 
are
quite far reached. I tested it on firefox and there it seemed to do mostly
good job, so I decided to try to enable it for non-LTO, too.

It would be nice to understand what really happens in your case and if there
are lot of wrong speculations or if the speculated calls simply happens to not
be on hot paths in your benchmark.
 
 
  Speculative devirtualization is somehwat less useful (may have more falce
  positives) without LTO depending on how your headers are constructed.
  It would be interesting to see if it does a lot of mistakes on your 
  codebase.
  (this can be easily done by forcing it to run with profile feedback, too and
  it will tell you when its speculation differs from speculation already 
  there).
 
  By the way, you mentioned 'hacking the
  ipa.c:walk_polymorphic_call_targets to not make the possible targets
  as
  reachable' -- is that something worth doing in trunk?   With that, we
  can probably just turn off speculative devirtualization.
 
  Well, the check is there to enable inlining. Disabling it for
  -fprofile-generate will result in lost profile samples for virtual 
  functions.
  Disabling it by default will prevent inlining of devirtualized calls making
  devirtualization not really useful.
  Perhaps with LIPO situation is bit different because you bring in the other
  module just to inline the call as you describe.
 
 What I meant is whether it is suitable for plain build ? I have not
 looked at the details.

Yes, dropping the reachability walk should just work, only result in less
inlining.

Honza
 
 
  One thing I can imagine doing is to make inliner consider the reachable
  (in post-inlining sense, that is after removing extern inlines and virtual
  functions) calls with priority and account only those to unit growth model.
  This would make it more consistent over -fdevirtualize and more realistic
  about resulting code size.
 
  I sort of considered this option but did not have any good data suggesting
  I should implement it.
 
  In general it would be nice to understand this problem.  Also I plan to do
  some retunning for 5.0 so it would be nice to know if you have other issues
  with 4.9? (I did not closely followed Google branch changes, so if you can
  point out those that are relevant for IPA tuning, I would be very interested
  to see what problems you hit).
 
 Ok I will collect a list of inlining related changes and let you know latter.
 
 thanks,
 
 David
 
 
  Honza
 
  David
 
 
 
  
   Honza
  
   David
  
  
   
Honza
   
David
   
   
On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Disabling devirtualization reduces code size, both for 
 instrumentation (because
 many more virtual functions are kept longer and therefore 
 instrumented) and for
 normal optimization.

 OK, with profile instrumentation (that you seem to try to 
 minimize) i can see
 how you get noticeably more counters because virtual functions are 
 kept longer.
 (note that 4.9 is a lot more agressive on removing unreacable 
 virtual functions
 than earlier compilers).

 Instead of disabling -fdevirtualize completely (that will get you 
 more indirect
 calls and thus more topn profiling) you may consider just hacking
 ipa.c:walk_polymorphic_call_targets to not make the possible 
 targets as
 reachable. (see the conditional on before_inlining_p).

 Of course this will get you less devirtualization (but with LTO 
 the difference
 should not be big - perhaps I could make switch for that for 
 mainline) and less
 accurate profiles when you get speculative devirtualization via 
 topn.

 I would be very interested to see how much difference this makes.

 Honza


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Xinliang David Li
It was in upstream 4.9 branch, but got reverted in r215061 to fix
PR/61214 and PR/62224.

David

On Sun, Oct 19, 2014 at 1:21 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi Honza,

 As David says, we will do some more experiments with the change you
 suggest and speculative devirtualization, but we needed to make this
 change in part to get an internal release out. One of the issues was a
 recent change to cp/decl2.c to make virtual function decls needed
 under flag_devirtualize.

 I think that one is not in current 4.9 branch (only in mainline).

 Honza

 Thanks!
 Teresa

 On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka hubi...@ucw.cz wrote:
 
  The virtual functions will be emitted in some modules, right? If they
  are hot, they will be included with the auxiliary modules. Note that
  LIPO indirect call profile will point to the comdat copy that is
  picked by the linker in the instrumentation build, so it guarantees to
  exist.
 
  If you have COMDAT virtual inline function that is used by module FOO and
  LIPO's indirect inlining will work out that it goes to comdat copy of 
  module BAR
  (that won the merging).  It will make C++ parser to also parse BAR to get
  the function, but callgarph builder will ignore it, too.
  (this is new logic in analyze_function that with -fno-devirtualize will 
  just
  prevent all virtual functions not directly reachable to appear in symbol 
  table)
 
  I am surprised you hit the size limits with 4.9 only - for quite some time
  we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
  first
  that works harder to drop them early (because I hit the problem with LTO
  where they artifically bloat the size of LTO object files)
 
  Honza
 
  David
 
 
  
   Honza
  
   David
  
  
   On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
Disabling devirtualization reduces code size, both for 
instrumentation (because
many more virtual functions are kept longer and therefore 
instrumented) and for
normal optimization.
   
OK, with profile instrumentation (that you seem to try to minimize) 
i can see
how you get noticeably more counters because virtual functions are 
kept longer.
(note that 4.9 is a lot more agressive on removing unreacable 
virtual functions
than earlier compilers).
   
Instead of disabling -fdevirtualize completely (that will get you 
more indirect
calls and thus more topn profiling) you may consider just hacking
ipa.c:walk_polymorphic_call_targets to not make the possible targets 
as
reachable. (see the conditional on before_inlining_p).
   
Of course this will get you less devirtualization (but with LTO the 
difference
should not be big - perhaps I could make switch for that for 
mainline) and less
accurate profiles when you get speculative devirtualization via topn.
   
I would be very interested to see how much difference this makes.
   
Honza



 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: Is there an error in GCC Internals Manual, chapter 16.4?

2014-10-19 Thread Andreas Schwab
Panu-Kristian Poiksalo poiks...@gmail.com writes:

 Chapter 16.4, RTL Template in GCC Internals Manual, found at
 https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#RTL-Template
 has this to say about match_scratch:

 
 (match_scratch:m n constraint)
 This expression is also a placeholder for operand number n and
 indicates that operand must be a scratch or reg expression.

 When matching patterns, this is equivalent to

   (match_operand:m n scratch_operand pred)
 
 I'm wondering what is pred in this context?

I think it is supposed to be constraint.  This error is already
present in the first revision of the file.  I have installed this patch
in trunk as obvious.  Thanks for the report.

Andreas.

* doc/md.texi (RTL Template) [match_scratch]: Correct equivalent
match_operand expression.

Index: doc/md.texi
===
--- doc/md.texi (revision 216440)
+++ doc/md.texi (working copy)
@@ -297,7 +297,7 @@
 When matching patterns, this is equivalent to
 
 @smallexample
-(match_operand:@var{m} @var{n} scratch_operand @var{pred})
+(match_operand:@var{m} @var{n} scratch_operand @var{constraint})
 @end smallexample
 
 but, when generating RTL, it produces a (@code{scratch}:@var{m})
-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH][3/n] Merge from match-and-simplify, first patterns and questions

2014-10-19 Thread Marc Glisse

Hello,

looking though the patterns on the branch (not specifically the ones 
attached here), I am surprised to see so few calls to has_single_use. In 
RTL-land, we don't even valueize if there are several uses, so the 
question doesn't occur. In generic, we assume everything is single use 
(CSE could later disagree, but that's the user's fault for writing his 
code that way). In tree-ssa-forwprop.c, helpers like get_prop_source_stmt 
do test for single use. Since has_single_use is a bit painful to use in 
.pd files (separate test for generic and constants), it might deserve 
another helper function, or a special syntax.


--
Marc Glisse


[PATCH, rtl-optimization]: Remove const_alias_set

2014-10-19 Thread Uros Bizjak
Hello!

The fix that fixed scheduler issues with AND addresses (the fix
prevented early exit for MEM_READONLY_P addresses when AND alignment
addresses were involved) caused some fall-out for libgo testsuite.
These tests triggered an assert in mems_in_disjoint_alias_sets_p,
which checks for zero alias set when flag_strict_aliasing is false. We
have had some off-list discussion with Ian Lance Taylor about this
issue.

The problem was, that Go dynamically switches off flag_strict_aliasing
after compilation started and when unsafe package is imported
(similar to when__attribute__ ((optimize (-fno-strict-aliasing))) is
used in c). To mitigate this issue, the Go frontend called
varasm_init_once again to recalculated (= cleared) const_alias_set in
this case.

As observed in [1], the fix for canon_true_depence [2] that introduced
quick exit for a MEM_READONLY_P operands made const_alias_set
redundant, it is no longer user for anything.

The patch that fixed scheduling of AND operands removed early
MEM_READONLY_P exit for memory operands with AND realignment, so
operands could reach more complex code later in the function that was
able to determine dependence of memory operands. This code includes
the call to mems_in_disjoint_alias_sets_p, and the assert triggered
again for some MEM_READONLY_P operands that have had non-zero alias
set, set from the value, cached in const_alias_set from before
flag_strict_aliasing flag was cleared.

The proposed solution is to remove const_alias_set altogether. The
MEM_READONLY_P successfully supersedes const_alias_set functionality,
and this is also confirmed by the removal of the second
varasm_init_once call in the Go frontend. In an off-list discussion,
Ian agrees that attached patch should also fix the problem.

2014-10-19  Uros Bizjak  ubiz...@gmail.com

* varasm.c (const_alias_set): Remove.
(init_varasm_once): Remove initialization of const_alias_set.
(build_constant_desc): Do not set alias set to const_alias_set.

The patch was tested on alpha-linux-gnu [3], alphaev68-linux-gnu and
x86_64-linux-gnu {,-m32} for all default languages plus Go and
obj-c++.

The patch fixes all mentioned libgo failures on alpha.

OK for mainline?

[1] https://gcc.gnu.org/ml/gcc-patches/2013-07/msg01033.html
[2] https://gcc.gnu.org/ml/gcc-patches/2010-07/msg01758.html
[3] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg02041.html

Uros.
Index: varasm.c
===
--- varasm.c(revision 216362)
+++ varasm.c(working copy)
@@ -98,11 +98,6 @@ tree last_assemble_variable_decl;
 
 bool first_function_block_is_cold;
 
-/* We give all constants their own alias set.  Perhaps redundant with
-   MEM_READONLY_P, but pre-dates it.  */
-
-static alias_set_type const_alias_set;
-
 /* Whether we saw any functions with no_split_stack.  */
 
 static bool saw_no_split_stack;
@@ -3231,7 +3226,6 @@ build_constant_desc (tree exp)
   rtl = gen_const_mem (TYPE_MODE (TREE_TYPE (exp)), symbol);
   set_mem_attributes (rtl, exp, 1);
   set_mem_alias_set (rtl, 0);
-  set_mem_alias_set (rtl, const_alias_set);
 
   /* We cannot share RTX'es in pool entries.
  Mark this piece of RTL as required for unsharing.  */
@@ -5928,7 +5922,6 @@ init_varasm_once (void)
   object_block_htab = hash_tableobject_block_hasher::create_ggc (31);
   const_desc_htab = hash_tabletree_descriptor_hasher::create_ggc (1009);
 
-  const_alias_set = new_alias_set ();
   shared_constant_pool = create_constant_pool ();
 
 #ifdef TEXT_SECTION_ASM_OP


[PATCH, committed] Set SECTION_EXCLUDE flag for LTO sections.

2014-10-19 Thread Ilya Verbin
Hello,

This patch sets SECTION_EXCLUDE flag for LTO sections in lhd_begin_section and
fixes the corresponding gcc_GAS_CHECK_FEATURE.

Approved here: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01535.html
Bootstrapped and regtested on i686-linux and x86_64-linux, committed to trunk.

  -- Ilya


2014-10-19  Ilya Verbin  ilya.ver...@intel.com

gcc/
* configure: Regenerate.
* configure.ac: Move the test for section attribute specifier e in GAS
out to all i[34567]86-*-* | x86_64-*-* targets and add --fatal-warnings.
* langhooks.c (lhd_begin_section): Set SECTION_EXCLUDE flag.
* varasm.c (default_elf_asm_named_section): Guard SECTION_EXCLUDE with
ifdef HAVE_GAS_SECTION_EXCLUDE.

---

diff --git a/gcc/configure b/gcc/configure
index bd1215d..16f128f 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -24676,9 +24676,12 @@ $as_echo $as_me: WARNING: LTO for $target requires 
binutils = 2.20.1, but vers
  ;;
  esac
fi
-   # Test if the assembler supports the section flag 'e' for specifying
-   # an excluded section.
-   { $as_echo $as_me:${as_lineno-$LINENO}: checking assembler for 
.section with e 5
+   ;;
+esac
+
+# Test if the assembler supports the section flag 'e' for specifying
+# an excluded section.
+{ $as_echo $as_me:${as_lineno-$LINENO}: checking assembler for .section 
with e 5
 $as_echo_n checking assembler for .section with e...  6; }
 if test ${gcc_cv_as_section_has_e+set} = set; then :
   $as_echo_n (cached)  6
@@ -24691,7 +24694,7 @@ fi
   elif test x$gcc_cv_as != x; then
 $as_echo '.section foo1,e
 .byte 0,0,0,0'  conftest.s
-if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s 5'
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags --fatal-warnings -o conftest.o 
conftest.s 5'
   { { eval echo \\$as_me\:${as_lineno-$LINENO}: \$ac_try\; } 5
   (eval $ac_try) 25
   ac_status=$?
@@ -24714,8 +24717,6 @@ cat confdefs.h _ACEOF
 #define HAVE_GAS_SECTION_EXCLUDE `if test $gcc_cv_as_section_has_e = yes; then 
echo 1; else echo 0; fi`
 _ACEOF
 
-   ;;
-esac
 
 { $as_echo $as_me:${as_lineno-$LINENO}: checking assembler for filds and 
fists mnemonics 5
 $as_echo_n checking assembler for filds and fists mnemonics...  6; }
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 8f7c814..35ce9ee 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3799,18 +3799,19 @@ foo:nop
  ;;
  esac
fi
-   # Test if the assembler supports the section flag 'e' for specifying
-   # an excluded section.
-   gcc_GAS_CHECK_FEATURE([.section with e], gcc_cv_as_section_has_e,
- [2,22,51],,
-[.section foo1,e
-.byte 0,0,0,0])
-   AC_DEFINE_UNQUOTED(HAVE_GAS_SECTION_EXCLUDE,
- [`if test $gcc_cv_as_section_has_e = yes; then echo 1; else echo 0; 
fi`],
-  [Define if your assembler supports specifying the section flag e.])
;;
 esac
 
+# Test if the assembler supports the section flag 'e' for specifying
+# an excluded section.
+gcc_GAS_CHECK_FEATURE([.section with e], gcc_cv_as_section_has_e,
+  [2,22,51], [--fatal-warnings],
+[.section foo1,e
+.byte 0,0,0,0])
+AC_DEFINE_UNQUOTED(HAVE_GAS_SECTION_EXCLUDE,
+  [`if test $gcc_cv_as_section_has_e = yes; then echo 1; else echo 0; fi`],
+  [Define if your assembler supports specifying the section flag e.])
+
 gcc_GAS_CHECK_FEATURE([filds and fists mnemonics],
gcc_cv_as_ix86_filds,,,
[filds mem; fists mem],,
diff --git a/gcc/langhooks.c b/gcc/langhooks.c
index 7d4c294..4bdeaa0 100644
--- a/gcc/langhooks.c
+++ b/gcc/langhooks.c
@@ -660,7 +660,7 @@ lhd_begin_section (const char *name)
 saved_section = text_section;
 
   /* Create a new section and switch to it.  */
-  section = get_section (name, SECTION_DEBUG, NULL);
+  section = get_section (name, SECTION_DEBUG | SECTION_EXCLUDE, NULL);
   switch_to_section (section);
 }
 
diff --git a/gcc/varasm.c b/gcc/varasm.c
index abb743b..4ae9d58 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6141,8 +6141,10 @@ default_elf_asm_named_section (const char *name, 
unsigned int flags,
 
   if (!(flags  SECTION_DEBUG))
 *f++ = 'a';
+#if defined (HAVE_GAS_SECTION_EXCLUDE)  HAVE_GAS_SECTION_EXCLUDE == 1
   if (flags  SECTION_EXCLUDE)
 *f++ = 'e';
+#endif
   if (flags  SECTION_WRITE)
 *f++ = 'w';
   if (flags  SECTION_CODE)
-- 
1.7.1


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
 It was in upstream 4.9 branch, but got reverted in r215061 to fix
 PR/61214 and PR/62224.

Was your tests done with this change or without?

Honza


Re: [wwwdocs] Add recent C++ changes to gcc-5/changes.html

2014-10-19 Thread Gerald Pfeifer

On Friday 2014-10-17 13:34, Jonathan Wakely wrote:

Index: htdocs/gcc-5/changes.html
===
@@ -128,12 +164,13 @@
  ul
li Class codestd::experimental::any/code; /li
li Function template codestd::experimental::apply/code; /li
+li Variable templates for type traits; /li

The trailing semi-colon feels a bit odd, doesn't it?

(Also, when using a semi-colon, wouldn't the next word -- Function
and Variable here -- be lower-case?)

Gerald


Re: [AArch64] Add --enable-fix-cortex-a53-835769 configure-time option

2014-10-19 Thread Gerald Pfeifer

On Friday 2014-10-10 11:53, Kyrill Tkachov wrote:

This adds a new configure-time option --enable-fix-cortex-a53-835769 that
will enable the Cortex-A53 erratum fix by default
so you don't have to specify -mfix-cortex-a53-835769 every time.

Documentation in install.texi is added.


Thank you.  Can you please also update gcc-5/changes.html on the
web side of things?

Gerald


[PATCH] Fix for PR63583

2014-10-19 Thread Martin Liška
Hello.

I added missing gimple_asm_string comparison for a function with an asm 
statement.
Bootstrap and regression tests still running, ready for trunk after it finishes?

Thank you,
Martin
gcc/ChangeLog:

2014-10-19  Martin Liska  mli...@suse.cz

* ipa-icf-gimple.c (func_checker::compare_gimple_asm):
Gimple tempate string is compared.

gcc/testsuite/ChangeLog:

2014-10-19  Martin Liska  mli...@suse.cz

* gcc.dg/ipa/pr63595.c: New test.
diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index 792a3e4..1369b74 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -863,6 +863,9 @@ func_checker::compare_gimple_asm (gimple g1, gimple g2)
   if (gimple_asm_nclobbers (g1) != gimple_asm_nclobbers (g2))
 return false;
 
+  if (strcmp (gimple_asm_string (g1), gimple_asm_string (g2)) != 0)
+return return_false_with_msg (ASM strings are different);
+
   for (unsigned i = 0; i  gimple_asm_ninputs (g1); i++)
 {
   tree input1 = gimple_asm_input_op (g1, i);
diff --git a/gcc/testsuite/gcc.dg/ipa/pr63595.c b/gcc/testsuite/gcc.dg/ipa/pr63595.c
new file mode 100644
index 000..9c9f3bf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr63595.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-ipa-icf-details  } */
+
+static int f(int t) __attribute__((noinline));
+
+static int g(int t) __attribute__((noinline));
+static int g(int t)
+{
+asm(addl %0, 1: +r(t));  
+  return t;
+}
+static int f(int t)
+{
+asm(addq %0, -1: +r(t));
+  return t;
+}
+
+
+int h(int t)
+{
+return f(t) + g(t);
+}
+
+/* { dg-final { scan-ipa-dump ASM strings are different icf  } } */
+/* { dg-final { scan-ipa-dump Equal symbols: 0 icf  } } */
+/* { dg-final { cleanup-ipa-dump icf } } */


Re: [PATCH] Fix for PR63583

2014-10-19 Thread Jan Hubicka
 Hello.
 
 I added missing gimple_asm_string comparison for a function with an asm 
 statement.
 Bootstrap and regression tests still running, ready for trunk after it 
 finishes?

OK.
(I remember pointing this out at review :))

Honza
 
 Thank you,
 Martin

 gcc/ChangeLog:
 
 2014-10-19  Martin Liska  mli...@suse.cz
 
   * ipa-icf-gimple.c (func_checker::compare_gimple_asm):
   Gimple tempate string is compared.
 
 gcc/testsuite/ChangeLog:
 
 2014-10-19  Martin Liska  mli...@suse.cz
 
   * gcc.dg/ipa/pr63595.c: New test.

 diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
 index 792a3e4..1369b74 100644
 --- a/gcc/ipa-icf-gimple.c
 +++ b/gcc/ipa-icf-gimple.c
 @@ -863,6 +863,9 @@ func_checker::compare_gimple_asm (gimple g1, gimple g2)
if (gimple_asm_nclobbers (g1) != gimple_asm_nclobbers (g2))
  return false;
  
 +  if (strcmp (gimple_asm_string (g1), gimple_asm_string (g2)) != 0)
 +return return_false_with_msg (ASM strings are different);
 +
for (unsigned i = 0; i  gimple_asm_ninputs (g1); i++)
  {
tree input1 = gimple_asm_input_op (g1, i);
 diff --git a/gcc/testsuite/gcc.dg/ipa/pr63595.c 
 b/gcc/testsuite/gcc.dg/ipa/pr63595.c
 new file mode 100644
 index 000..9c9f3bf
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/ipa/pr63595.c
 @@ -0,0 +1,26 @@
 +/* { dg-do compile } */
 +/* { dg-options -O2 -fdump-ipa-icf-details  } */
 +
 +static int f(int t) __attribute__((noinline));
 +
 +static int g(int t) __attribute__((noinline));
 +static int g(int t)
 +{
 +asm(addl %0, 1: +r(t));  
 +  return t;
 +}
 +static int f(int t)
 +{
 +asm(addq %0, -1: +r(t));
 +  return t;
 +}
 +
 +
 +int h(int t)
 +{
 +return f(t) + g(t);
 +}
 +
 +/* { dg-final { scan-ipa-dump ASM strings are different icf  } } */
 +/* { dg-final { scan-ipa-dump Equal symbols: 0 icf  } } */
 +/* { dg-final { cleanup-ipa-dump icf } } */



Re: [fortran,patch] Handle infinities and NaNs in intrinsics code generation

2014-10-19 Thread FX
 Looks good to me. Thanks for taking care of F2003's IEEE support.

Committed as rev. 216443, thanks for the review.


 PS: You might want to browse through the current (F2008 + corrigenda
 + first F2015 additions) draft at 
 http://j3-fortran.org/doc/year/14/14-007r2.pdf
 
 See especially the list at the beginning under the item
 Changes to the intrinsic modules IEEE_ARITHMETIC, IEEE_EXCEPTIONS, and
 IEEE_FEATURES for conformance with ISO/IEC/IEEE 60559:2011: [...]
 and then later in that file.

Thanks for the link.
I’d rather wait until later in the process, and let the existing F2003 / F2008 
parts mature  be tested for now.

FX

Re: [GOOGLE] Increase max-early-inliner-iterations to 2 for profile-gen and use

2014-10-19 Thread Xinliang David Li
On Sat, Oct 18, 2014 at 4:19 PM, Xinliang David Li davi...@google.com wrote:
 On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
 The difference in instrumentation runtime is huge -- as topn profiler
 is pretty expensive to run.

 With FDO, it is probably better to make early inlining more aggressive
 in order to get more context sensitive profiling.

 I agree with that, I just would like to understand where increasing the 
 iterations
 helps and if we can handle it without iterating (because Richi originally 
 requested to
 drop the iteration for correcness issues)
 Do you have some examples?

 We can do FDO experiment by shutting down einline. (Note that
 increasing iteration to 2 did not actually improve performance with
 our benchmarks).

Early inlining itself has large performance impact for FDO (the
runtime of the profile-use build). With it disabled, the FDO
performance drops by 2% on average. The degradation is seen across
all benchmarks except for one.

David



 David

 Honza

 David

 On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Increasing the number of early inliner iterations from 1 to 2 enables 
  more
  indirect calls to be promoted/inlined before instrumentation. This in 
  turn
  reduces the instrumentation overhead, particularly for more expensive 
  indirect
  call topn profiling.
 
  How much difference you get here? One posibility would be also to run 
  specialized
  ipa-cp before profile instrumentation.
 
  Honza
 
  Passes internal testing and regression tests. Ok for google/4_9?
 
  2014-10-18  Teresa Johnson  tejohn...@google.com
 
  Google ref b/17934523
  * opts.c (finish_options): Increase max-early-inliner-iterations 
  to 2
  for profile-gen and profile-use builds.
 
  Index: opts.c
  ===
  --- opts.c  (revision 216286)
  +++ opts.c  (working copy)
  @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
   opts-x_param_values, opts_set-x_param_values);
   }
 
  +  if (opts-x_profile_arc_flag
  +  || opts-x_flag_branch_probabilities)
  +{
  +  maybe_set_param_value
  +   (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
  +opts-x_param_values, opts_set-x_param_values);
  +}
  +
 if (!(opts-x_flag_auto_profile
   || (opts-x_profile_arc_flag || 
  opts-x_flag_branch_probabilities)))
   {
 
 
  --
  Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Xinliang David Li
No, Google branch is not yet synced to 4.9 head.

David

On Sun, Oct 19, 2014 at 1:11 PM, Jan Hubicka hubi...@ucw.cz wrote:
 It was in upstream 4.9 branch, but got reverted in r215061 to fix
 PR/61214 and PR/62224.

 Was your tests done with this change or without?

 Honza


Re: ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-10-19 Thread Nathaniel Smith
Hi Jakub,

Thanks for your feedback! See below.

On Thu, Oct 16, 2014 at 4:52 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote:
 Got total silence the last 4 times I posted this, and users have been
 bugging me about it offline, so trying again.

 This patch fixes a showstopper problem preventing the transparent use
 of OpenMP in scientific libraries, esp. with Python. Specifically, it
 is currently not possible to use GNU OpenMP -- even in a limited,
 temporary manner -- in any program that uses (or might use) fork() for
 parallelism, even if the fork() and the use of OpenMP occur at totally
 different times. This limitation is unique to GNU OpenMP -- every
 competing OpenMP implementation already contains something like this
 patch. While technically not fully POSIX-compliant (because POSIX
 gives much much weaker guarantees around fork() than any real Unix),
 the approach used in this patch (a) performs only POSIX-compliant
 operations when the host program is itself fully POSIX-compliant, and
 (b) actually works perfectly reliably in practice on all commonly used
 platforms I'm aware of.

 1) gomp_we_are_forked in your patch will attempt to free the pool
of the thread that encounters it, which is racy; consider a program
after fork calling pthread_create several times, each thread
thusly created then ~ at the same time doing #pragma omp parallel
and the initial thread too.  You really should clean up the pool
data structure only in the initial thread and nowhere else;
for native TLS (non-emulated, IE model) the best would be to have a flag
in the gomp_thread_pool structure,
struct gomp_thread *thr = gomp_thread ();
if (thr  thr-thread_pool)
  thr-thread_pool-after_fork = true;
should in that case be safe in the atfork child handler.
For !HAVE_TLS or emulated TLS not sure if it is completely safe,
it would call pthread_getspecific.  Perhaps just don't register
atfork handler on those targets at all?

Good point. The updated patch below takes a slightly different
approach. I moved we_are_forked to the per-thread struct, and then I
moved the setting of it into the *parent* process's fork handlers --
the before-fork handler toggles it to true, then the child spawns off
and inherits this setting, and then the parent after-fork handler
toggles it back again. (Since it's per-thread, there's no race
condition here.) This lets us remove the child after-fork handler
entirely, and -- since the parent handlers aren't subject to any
restrictions on what they can call -- it works on all platforms
regardless of the TLS implementation.

 2) can you explain why are you removing the cleanups from
gomp_free_pool_helper ?

They aren't removed, but rather moved from the helper function (which
runs in the helper threads) into gomp_free_thread_pool (which runs in
the main thread) -- which makes it easier to run the appropriate
cleanups even in the case where the helper threads aren't running.
(But see below -- we might prefer to drop this part of the patch
entirely.)

 3) you can call pthread_atfork many times (once for each pthread
that creates a thread pool), that is undesirable, you want to do that
only if the initial thread creates thread pool

Good point. I've moved the pthread_atfork call to initialize_team,
which is an __attribute__((constructor)).

I am a little uncertain whether this is the best approach, though,
because of the comment in team_destructor about wanting to correctly
handle dlopen/dlclose. One of pthread_atfork's many (many) limitations
is that there's no way to unregister handlers, so if dlopen/dlclose is
important (is it?) then we can't call pthread_atfork from
initialize_team.

If this is a problem, then we could delay the pthread_atfork until
e.g. the first thread pool is spawned -- would this be preferred?

 4) the testcase is clearly not portable enough, should be probably limited
to *-*-linux* only, fork etc. will likely not work on many targets.

I think it should work on pretty much any target that has fork(); we
definitely care about having this functionality on e.g. OS X. I've
added some genericish target specifications.

 In any case, even with the patch, are you aware that you'll leak megabytes
 of thread stacks etc.?

Well, err, I wasn't, no :-). Thanks for pointing that out.

To me this does clinch the argument that a better approach would be
the one I suggested in
   https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00979.html
i.e., of tracking whether any threadprivate variables were present,
and if not then simply shutting down the thread pools before forking.
But this would be a much more invasive change to gomp (I wouldn't know
where to start).

In the mean time, the current patch is still worthwhile. The cost is
not that bad: I wouldn't think of it as leaking so much as overhead
of supporting OMP-fork-OMP. No-one forks a child which forks a
child which