Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Markus Trippelsdorf
On 2014.09.27 at 01:27 +0200, Jan Hubicka wrote:
  While a plain Firefox -flto build works fine, the LTO/PGO build fails with:
  
  lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
  0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
  ../../gcc/gcc/ipa-utils.c:540
  0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
  ../../gcc/gcc/ipa-icf.c:753
  0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
  ../../gcc/gcc/ipa-icf.c:2706
  0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
  ../../gcc/gcc/ipa-icf.c:2098
  0xf1d3f1 ipa_icf_driver
  ../../gcc/gcc/ipa-icf.c:2784
  0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
  ../../gcc/gcc/ipa-icf.c:2831
  
  
  The pass is also very memory hungry (from 3GB without ICF to 4GB during
  libxul link), while the code size savings are in the 1% range.
 
 Thanks for checking. I was just thinking about doing that myself.  Would
 you mind posting the -ftime-report output of the Firefox WPA stage?

(without ICF)
Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1412 kB ( 0%) ggc
 phase opt and generate  :  58.38 (63%) usr   2.00 (47%) sys  60.37 (40%) wall  403069 kB (12%) ggc
 phase stream in         :  30.24 (33%) usr   0.97 (23%) sys  33.90 (22%) wall 2944210 kB (88%) ggc
 phase stream out        :   4.29 ( 5%) usr   1.32 (31%) sys  57.32 (38%) wall       0 kB ( 0%) ggc
 phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
 garbage collection      :   3.68 ( 4%) usr   0.00 ( 0%) sys   3.68 ( 2%) wall       0 kB ( 0%) ggc
 callgraph optimization  :   0.50 ( 1%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall     166 kB ( 0%) ggc
 ipa dead code removal   :   6.91 ( 7%) usr   0.08 ( 2%) sys   7.25 ( 5%) wall       0 kB ( 0%) ggc
 ipa virtual call target :   7.08 ( 8%) usr   0.04 ( 1%) sys   6.93 ( 5%) wall       0 kB ( 0%) ggc
 ipa devirtualization    :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall   10365 kB ( 0%) ggc
 ipa cp                  :   1.81 ( 2%) usr   0.06 ( 1%) sys   3.40 ( 2%) wall  173701 kB ( 5%) ggc
 ipa inlining heuristics :  16.60 (18%) usr   0.27 ( 6%) sys  17.48 (12%) wall  532704 kB (16%) ggc
 ipa comdats             :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple out      :   0.21 ( 0%) usr   0.04 ( 1%) sys   0.97 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto decl in         :  18.29 (20%) usr   0.54 (13%) sys  18.96 (12%) wall 2226088 kB (66%) ggc
 ipa lto decl out        :   3.93 ( 4%) usr   0.13 ( 3%) sys   4.06 ( 3%) wall       0 kB ( 0%) ggc
 ipa lto constructors in :   0.24 ( 0%) usr   0.03 ( 1%) sys   0.59 ( 0%) wall   14226 kB ( 0%) ggc
 ipa lto constructors out:   0.08 ( 0%) usr   0.04 ( 1%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.89 ( 1%) usr   0.12 ( 3%) sys   1.02 ( 1%) wall  364151 kB (11%) ggc
 ipa lto decl merge      :   2.14 ( 2%) usr   0.01 ( 0%) sys   2.14 ( 1%) wall    8196 kB ( 0%) ggc
 ipa lto cgraph merge    :   1.59 ( 2%) usr   0.00 ( 0%) sys   1.60 ( 1%) wall   12716 kB ( 0%) ggc
 whopr wpa               :   1.54 ( 2%) usr   0.03 ( 1%) sys   1.55 ( 1%) wall       1 kB ( 0%) ggc
 whopr wpa I/O           :   0.04 ( 0%) usr   1.11 (26%) sys  52.10 (34%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   5.02 ( 5%) usr   0.01 ( 0%) sys   5.03 ( 3%) wall    4938 kB ( 0%) ggc
 ipa reference           :   2.04 ( 2%) usr   0.02 ( 0%) sys   2.08 ( 1%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   2.43 ( 3%) usr   0.02 ( 0%) sys   2.49 ( 2%) wall       0 kB ( 0%) ggc
 tree STMT verifier      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 callgraph verifier      :  16.31 (18%) usr   1.69 (39%) sys  17.96 (12%) wall       0 kB ( 0%) ggc
 dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 varconst                :   0.01 ( 0%) usr   0.03 ( 1%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.69 ( 1%) usr   0.00 ( 0%) sys   0.69 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                   :  92.91             4.29           151.73            3348693 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

(with ICF)
Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1412 kB ( 0%) ggc
 phase opt and generate  :  82.70 (70%) usr   3.31 (53%) sys  86.17 (45%) wall 1468975 kB (33%) ggc
 phase stream in         :  30.46 (26%) usr   1.02 (16%) sys  31.48 (16%) wall 2944210 kB (67%) ggc
 phase stream out        :   4.52 ( 4%) usr   1.90 (30%) sys  73.47 (38%) wall      12 kB ( 0%) ggc
 phase finalize          :   0.00 ( 0%) usr   0.00 ( 

Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Markus Trippelsdorf
On 2014.09.27 at 07:59 +0200, Markus Trippelsdorf wrote:
 
  It seems that in this case we reject too many equality candidates?
  I think the original numbers were about 4-5%, but later some equivalences were
  disabled because of devirt/aliasing issues. Did you compare it with gold's ICF
  enabled? There are quite a few obvious improvements to the analysis that can
  be done, but I guess we need to analyze the interesting cases one by one.

Forgot to post the binary size numbers (in bytes):

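              | gold's icf off | gold's icf on |
--------------+----------------+---------------+
gcc's icf off |       79793880 |      74881040 |
--------------+----------------+---------------+
gcc's icf on  |       78043608 |      73612800 |
--------------+----------------+---------------+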

-- 
Markus


Re: Avoid privatization of TLS variables

2014-09-27 Thread Andrew Haley
I may be guilty of missing a crucial point here, but: why do we care
about having a small limit of static TLS variables?

We surely could allocate, say, a megabyte of static TLS for each
thread.  We already allocate 64M for the thread-local malloc arena,
after all.  It doesn't cost anything beyond a little address space.

What am I missing?

Andrew.


Re: Avoid privatization of TLS variables

2014-09-27 Thread Andrew Haley
On 27/09/14 08:56, Andrew Haley wrote:
 I may be guilty of missing a crucial point here, but: why do we care
 about having a small limit of static TLS variables?
 
 We surely could allocate, say, a megabyte of static TLS for each
 thread.  We already allocate 64M for the thread-local malloc arena,

On 64-bit systems, I mean.

 after all.  It doesn't cost anything beyond a little address space.
 
 What am I missing?
 
 Andrew.
 



Re: RFA: one more version of the patch for PR61360

2014-09-27 Thread Richard Sandiford
Hi Vlad,

Vladimir Makarov <vmaka...@redhat.com> writes:
 I guess we achieved the consensus about the following patch to fix PR61360

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360

 The patch was successfully bootstrapped and tested (w/wo 
 -march=amdfam10) on x86/x86-64.

 Is it ok to commit to trunk?

 2014-09-26  Vladimir Makarov  <vmaka...@redhat.com>

  PR target/61360
  * lra.c (lra): Remove call of recog_init.
  * recog.c (constrain_operands): Permit reg for memory constraint
  when LRA is used.
  * config/i386/i386.md (*float<SWI48:mode><MODEF:mode>2_sse):
  Enable first alternative independently of the RA stage.

Many thanks for doing this!  I hadn't realised when seeing the bug
originally that LRA made the recog.c part possible.  Obviously I can't
approve it, but this approach seems much cleaner to me.

Richard


RE: [PATCH] Fix PR preprocessor/58893 access to uninitialized memory

2014-09-27 Thread Bernd Edlinger
Hmm, the original message bounced; resent without HTML.

 From: bernd.edlin...@hotmail.de 
 To: l...@redhat.com; gcc-patches@gcc.gnu.org 
 CC: jos...@codesourcery.com 
 Subject: RE: [PATCH] Fix PR preprocessor/58893 access to uninitialized memory 
 Date: Sat, 27 Sep 2014 11:42:29 +0200 
  
  
  
 On Fri, 26 Sep 2014 12:48:44, Jeff Law wrote: 
  
  On 09/26/14 06:21, Bernd Edlinger wrote: 
  
 Hi, 
  
 this patch fixes PR58893, which is an access to uninitialized  
 memory, which may or may not crash in 
 linemap_resolve_location, or just print error messages with bogus  
 location. 
  
 When the first -include file is processed, we have the case where
 pfile->cur_token == pfile->cur_run->base; this is called directly
 by the front end. However, in the case of the second -include file,
 this is called from _cpp_lex_token -> _cpp_get_fresh_line ->
 cpp_push_include, with pfile->cur_token != pfile->cur_run->base,
 and pfile->cur_token[-1].src_loc and the token not (yet) initialized.
 The problem is that, when the include file cannot be found, we need
 src_loc to be initialized to some safe value: 0 means UNKNOWN_LOCATION.
  
 Regarding the hunk in cpp_diagnostic, which is not directly involved 
 in this bug, but it is still obviously wrong: 
  
 The line src_loc = pfile->cur_run->prev->limit->src_loc
 is probably unreachable, but it will crash if it is ever executed.
  
 see: 
  
 _cpp_init_tokenrun (tokenrun *run, unsigned int count)
 {
   run->base = XNEWVEC (cpp_token, count);
   run->limit = run->base + count;
   run->next = NULL;
 }
  
 So limit points one past the end of the run, and dereferencing it is out of bounds.
  
  
 Bootstrapped and regression-tested on x86_64-linux-gnu.
 Ok for trunk? 
  
  
 Thanks 
 Bernd. 
  
  
  
  
  changelog-pr58893.txt 
  
  
  2014-09-26  Bernd Edlinger  <bernd.edlin...@hotmail.de>
  
  PR preprocessor/58893 
  * errors.c (cpp_diagnostic): Fix possible out of bounds access. 
  * files.c (_cpp_stack_include): Initialize src_loc for IT_CMDLINE. 
  
  
  patch-pr58893.diff 
  
  
  --- libcpp/errors.c 2014-01-02 23:24:45.0 +0100 
  +++ libcpp/errors.c 2014-09-24 10:30:33.708048505 +0200 
  @@ -48,10 +48,7 @@ cpp_diagnostic (cpp_reader * pfile, int 
  current run -- that is invalid. */ 
  else if (pfile->cur_token == pfile->cur_run->base)
  {
  - if (pfile->cur_run->prev != NULL)
  - src_loc = pfile->cur_run->prev->limit->src_loc;
  - else 
  - src_loc = 0; 
  + src_loc = 0; 
  } 
  else 
  { 
  --- libcpp/files.c 2014-05-21 20:54:12.0 +0200 
  +++ libcpp/files.c 2014-09-24 10:35:47.191117490 +0200 
  @@ -991,6 +991,9 @@ _cpp_stack_include (cpp_reader *pfile, c 
  _cpp_file *file; 
  bool stacked; 
  
  + if (type == IT_CMDLINE && pfile->cur_token != pfile->cur_run->base)
  + pfile->cur_token[-1].src_loc = 0;
  Comment before this change. Someone not familiar with this code is 
  going to have no idea why these two lines exist. 
  
  
 Ok, I added a comment now, do you like it? 
  
  Please try to include a testcase. If you're having trouble reproducing 
  on the trunk, you could use MALLOC_PERTURB per c#8 in the bug report. 
  If there's a way to set environment variables in our testing framework 
  that may be a reasonable way to test (if you need to do that, limit 
  testing to linux targets as we'll have a dependency on glibc features). 
  
  
 For whatever reason, the first -include file must end with a pragma,
 as in the PR, and MALLOC_PERTURB_ must be set to something.
 Only then do we get an ICE; otherwise we get an error message without a line number.
 I tried to turn this into a valid test case, but that may be less trivial than
 it looks at first sight.
  
 I tried to set MALLOC_PERTURB_=123 globally, like this: 
  
 MALLOC_PERTURB_=123 make -k check 
  
 but then this happened: 
  
  
 WARNING: program timed out.
 FAIL: gcc.c-torture/unsorted/dump-noaddr.c,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions   -dumpbase dump1/dump-noaddr.c -DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr
 FAIL: gcc.c-torture/unsorted/dump-noaddr.c.000i.cgraph,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  comparison
 FAIL: gcc.c-torture/unsorted/dump-noaddr.c.003t.original,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  comparison
 FAIL: gcc.c-torture/unsorted/dump-noaddr.c.032t.profile_estimate,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  comparison
 FAIL: gcc.c-torture/unsorted/dump-noaddr.c.253t.statistics,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  comparison
 WARNING: program timed out.
 FAIL: gcc.c-torture/unsorted/dump-noaddr.c,  -O3 -g   -dumpbase dump1/dump-noaddr.c -DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr
 FAIL: gcc.c-torture/unsorted/dump-noaddr.c.000i.cgraph,  -O3 -g  comparison
 FAIL: gcc.c-torture/unsorted/dump-noaddr.c.003t.original,  -O3 -g  comparison
 FAIL: 

Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Martin Liška
On 09/27/2014 01:27 AM, Jan Hubicka wrote:
 While a plain Firefox -flto build works fine, the LTO/PGO build fails with:

 lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
 ../../gcc/gcc/ipa-utils.c:540
 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
 ../../gcc/gcc/ipa-icf.c:753
 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
 ../../gcc/gcc/ipa-icf.c:2706
 0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
 ../../gcc/gcc/ipa-icf.c:2098
 0xf1d3f1 ipa_icf_driver
 ../../gcc/gcc/ipa-icf.c:2784
 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
 ../../gcc/gcc/ipa-icf.c:2831


 The pass is also very memory hungry (from 3GB without ICF to 4GB during
 libxul link), while the code size savings are in the 1% range.


Most of the problem comes from the groups of candidates that are built according
to a hash. The hash value is based on the number of arguments, the number of
basic blocks, the number of gimple statements and the types of these statements.
It groups functions into classes. In WPA (before the body of any function is
loaded) I get the following histogram:

Dump after WPA based types groups
Congruence classes: 97204 (unique hash values: 88725), with total: 191457 items
Class size histogram [num of members]: number of classe number of classess
[1]: 86453 classes
[2]: 5680 classes
[3]: 1541 classes
[4]: 915 classes
[5]: 446 classes
[6]: 346 classes
[7]: 200 classes
[8]: 181 classes
[9]: 154 classes
[10]: 109 classes
[11]: 87 classes
[12]: 87 classes
[13]: 68 classes
[14]: 58 classes
[15]: 58 classes
[16]: 41 classes
[17]: 25 classes
[18]: 33 classes
[19]: 28 classes
[20]: 25 classes
[21]: 19 classes
[22]: 30 classes
[23]: 24 classes
[24]: 33 classes
[25]: 17 classes
[26]: 15 classes
[27]: 10 classes
[28]: 13 classes
[29]: 18 classes
[30]: 10 classes

It means that each class with more than one member needs to be iterated and
its functions compared. And yes, that is the root of the problem: I have to
load the function body to do a deep function comparison. As you can see, we
have almost 200k functions, and more than half of them are in a group with
more than one member. So the extra 1GB of memory usage is caused by these bodies:

Init called for 105004 items (54.84%).

The memory footprint could be reduced significantly if each body could be
loaded, compared and then released so that its memory is freed. I asked Honza
about it, but it looks like the GGC mechanism cannot easily be forced to release it.

 
 Thanks for checking. I was just thinking about doing that myself.  Would
 you mind posting the -ftime-report output of the Firefox WPA stage?
 
 It seems that in this case we reject too many equality candidates?
 I think the original numbers were about 4-5%, but later some equivalences were
 disabled because of devirt/aliasing issues. Did you compare it with gold's ICF
 enabled? There are quite a few obvious improvements to the analysis that can
 be done, but I guess we need to analyze the interesting cases one by one.

You are right, the numbers were quite promising, but over time I had to
reduce the aggressiveness of the pass. As Honza said, it can be improved
step by step.

 
 One thing that Martin can try is to hook into lto-symtab and check
 that the COMDAT functions that are known to be the same pass the equality check.
 I suppose we will learn interesting things this way.
 
Good point, I will try it.

Martin


 I think the patch adds quite important infrastructure for gimple semantic
 equality checking and function merging. I went through the majority of the code
 and I think it is mostly ready for mainline (i.e. cleaner than what we have in
 tree-ssa-tailmerge), so I hope we can finish the review process next week.
 We will need a better cost/benefit ratio to enable it for -O2; that is
 something I would really like to see for 5.0, but it seems easier to
 handle this incrementally.

Thank you for the review,
Martin

 
 Honza
 



Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Martin Liška
On 09/27/2014 07:59 AM, Markus Trippelsdorf wrote:
 On 2014.09.27 at 01:27 +0200, Jan Hubicka wrote:
 While a plain Firefox -flto build works fine, the LTO/PGO build fails with:

 lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
 ../../gcc/gcc/ipa-utils.c:540
 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
 ../../gcc/gcc/ipa-icf.c:753
 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
 ../../gcc/gcc/ipa-icf.c:2706
 0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
 ../../gcc/gcc/ipa-icf.c:2098
 0xf1d3f1 ipa_icf_driver
 ../../gcc/gcc/ipa-icf.c:2784
 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
 ../../gcc/gcc/ipa-icf.c:2831


 The pass is also very memory hungry (from 3GB without ICF to 4GB during
 libxul link), while the code size savings are in the 1% range.

 Thanks for checking. I was just thinking about doing that myself.  Would
 you mind posting the -ftime-report output of the Firefox WPA stage?
 

Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Martin Liška
On 09/27/2014 09:47 AM, Markus Trippelsdorf wrote:
 On 2014.09.27 at 07:59 +0200, Markus Trippelsdorf wrote:

 It seems that in this case we reject too many equality candidates?
 I think the original numbers were about 4-5%, but later some equivalences were
 disabled because of devirt/aliasing issues. Did you compare it with gold's ICF
 enabled? There are quite a few obvious improvements to the analysis that can
 be done, but I guess we need to analyze the interesting cases one by one.
 
 Forgot to post the binary size numbers (in bytes):
 
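               | gold's icf off | gold's icf on |
 --------------+----------------+---------------+
 gcc's icf off |       79793880 |      74881040 |
 --------------+----------------+---------------+
 gcc's icf on  |       78043608 |      73612800 |
 --------------+----------------+---------------+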
 

Thanks once more!

Gold's ICF is quite strong; I will check which functions are not caught by
IPA ICF. These data show that IPA ICF can reduce the binary by 2.19%. I know
that is quite a small improvement, but keep in mind that the pass can only
reduce the size of .text (and a few closely related sections). Here are the
stats for libxul.so (please ignore the last 3 columns):

Section name          Start  Size in B       Size Portion  Disk read in B  Disk read Sec. portion
                          0          0     0.00 B   0.00%               0     0.00 B        0.00%
.note.gnu.build-i       512         36    36.00 B   0.00%               0     0.00 B        0.00%
.dynsym                 552      81192   79.29 KB   0.08%               0     0.00 B        0.00%
.dynstr               81744      90859   88.73 KB   0.09%               0     0.00 B        0.00%
.hash                172608      21752   21.24 KB   0.02%               0     0.00 B        0.00%
.gnu.version         194360       6766    6.61 KB   0.01%               0     0.00 B        0.00%
.gnu.version_d       201128         56    56.00 B   0.00%               0     0.00 B        0.00%
.gnu.version_r       201184       1216    1.19 KB   0.00%               0     0.00 B        0.00%
.rela.dyn            202400    8198208    7.82 MB   8.56%               0     0.00 B        0.00%
.rela.plt           8400608      70272   68.62 KB   0.07%               0     0.00 B        0.00%
.init               8470880         26    26.00 B   0.00%               0     0.00 B        0.00%
.plt                8470912      46864   45.77 KB   0.05%               0     0.00 B        0.00%
.text               8517776   39014333   37.21 MB  40.72%               0     0.00 B        0.00%
.fini              47532112          9     9.00 B   0.00%               0     0.00 B        0.00%
.rodata            47532288   15258560   14.55 MB  15.93%               0     0.00 B        0.00%
.eh_frame          62790848    6203564    5.92 MB   6.47%               0     0.00 B        0.00%
.eh_frame_hdr      68994412    1088012    1.04 MB   1.14%               0     0.00 B        0.00%
.tbss              70082560          4     4.00 B   0.00%               0     0.00 B        0.00%
.dynamic           70082560       1104    1.08 KB   0.00%               0     0.00 B        0.00%
.got               70083664       1384    1.35 KB   0.00%               0     0.00 B        0.00%
.got.plt           70085048      23448   22.90 KB   0.02%               0     0.00 B        0.00%
.data              70108544     811616  792.59 KB   0.85%               0     0.00 B        0.00%
.jcr               70920160          8     8.00 B   0.00%               0     0.00 B        0.00%
.tm_clone_table    70920168          0     0.00 B   0.00%               0     0.00 B        0.00%
.fini_array        70920168          8     8.00 B   0.00%               0     0.00 B        0.00%
.init_array        70920176         16    16.00 B   0.00%               0     0.00 B        0.00%
.data.rel.ro.loca  70920192    3938880    3.76 MB   4.11%               0     0.00 B        0.00%
.data.rel.ro       74859072     269216  262.91 KB   0.28%               0     0.00 B        0.00%
.bss               75128320    1844246    1.76 MB   1.92%               0     0.00 B        0.00%
.debug_line        75128288        517   517.00 B   0.00%               0     0.00 B        0.00%
.debug_info        75128805        817   817.00 B   0.00%               0     0.00 B        0.00%
.debug_abbrev      75129622        438   438.00 B   0.00%               0

Re: [Ping] Port of VTV for Cygwin and MinGW

2014-09-27 Thread Kai Tietz
Hi Patrick,

the mingw/cygwin part of your patch looks fine to me.  Nevertheless, I
have one question for you.  Do you already have FSF copyright assignment
papers on file for gcc?  I asked an overseer and he didn't find you on the list.

Regards,
Kai


Re: [Patch, Fortran] Add CO_BROADCAST

2014-09-27 Thread Dominique Dhumieres

The failures in gfortran.dg/coarray_collectives_9.f90 are fixed by the
following patch:

--- ../_clean/gcc/testsuite/gfortran.dg/coarray_collectives_9.f90	2014-09-25 12:14:05.0 +0200
+++ gcc/testsuite/gfortran.dg/coarray_collectives_9.f90	2014-09-27 13:03:41.0 +0200
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-fcoarray=single" }
+! { dg-options "-fcoarray=single -fmax-errors=40" }
 !
 !
 ! CO_BROADCAST/CO_REDUCE
@@ -29,7 +29,7 @@ program test
   call co_reduce(abc) ! { dg-error "Missing actual argument 'operator' in call to 'co_reduce'" }
   call co_broadcast(1, source_image=1) ! { dg-error "'a' argument of 'co_broadcast' intrinsic at .1. must be a variable" }
   call co_reduce(a=1, operator=red_f) ! { dg-error "'a' argument of 'co_reduce' intrinsic at .1. must be a variable" }
-  call co_reduce(a=val, operator=red_f2) ! { dg-error "OPERATOR argument at (1) must be a PURE function" }
+  call co_reduce(a=val, operator=red_f2) ! { dg-error "OPERATOR argument at \\(1\\) must be a PURE function" }
 
   call co_broadcast(val, source_image=[1,2]) ! { dg-error "must be a scalar" }
   call co_broadcast(val, source_image=1.0) ! { dg-error "must be INTEGER" }
@@ -49,14 +49,14 @@ program test
   call co_reduce(val, red_f, stat=[1,2]) ! { dg-error "must be a scalar" }
   call co_reduce(val, red_f, stat=1.0) ! { dg-error "must be INTEGER" }
   call co_reduce(val, red_f, stat=1) ! { dg-error "must be a variable" }
-  call co_reduce(val, red_f, stat=i, result_image=1) ! OK
-  call co_reduce(val, red_f, stat=i, errmsg=errmsg, result_image=1) ! OK
+  call co_reduce(val, red_f, stat=i, result_image=1) ! { dg-error "CO_REDUCE at \\(1\\) is not yet implemented" }
+  call co_reduce(val, red_f, stat=i, errmsg=errmsg, result_image=1) ! { dg-error "CO_REDUCE at \\(1\\) is not yet implemented" }
   call co_reduce(val, red_f, stat=i, errmsg=[errmsg], result_image=1) ! { dg-error "must be a scalar" }
   call co_reduce(val, red_f, stat=i, errmsg=5, result_image=1) ! { dg-error "must be CHARACTER" }
   call co_reduce(val, red_f, errmsg=abc) ! { dg-error "must be a variable" }
   call co_reduce(val, red_f, stat=i8) ! { dg-error "The stat= argument at .1. must be a kind=4 integer variable" }
   call co_reduce(val, red_f, errmsg=msg4) ! { dg-error "The errmsg= argument at .1. must be a default-kind character variable" }
 
-  call co_broadcasr(vec(idx), 1) ! { dg-error "Argument 'A' with INTENT\\(INOUT\\) at .1. of the intrinsic subroutine co_sum shall not have a vector subscript" }
-  call co_reduce(vec([1,3,2]), red_f) ! { dg-error "Argument 'A' with INTENT\\(INOUT\\) at .1. of the intrinsic subroutine co_min shall not have a vector subscript" }
+  call co_broadcasr(vec(idx), 1) ! OK?
+  call co_reduce(vec([1,3,2]), red_f) ! { dg-error "Argument 'A' with INTENT\\(INOUT\\) at .1. of the intrinsic subroutine co_reduce shall not have a vector subscript" }
 end program test

Am I missing something?

Cheers,

Dominique


Re: [PING] [PATCH] Add direct support for Linux kernel __fentry__ patching

2014-09-27 Thread Dominique Dhumieres
The new tests fail on darwin:

/opt/gcc/work/gcc/testsuite/gcc.target/i386/nop-mcount.c:1:0: error: 
-mnop-mcount is not implemented for -fPIC

and gcc.target/i386/record-mcount.c fails because mcount_loc is not found in 
the assembly.

TIA

Dominique


[PATCH] PR libitm/61164: use always_inline consistently

2014-09-27 Thread Gleb Fotengauer-Malinovskiy
2014-09-27  Gleb Fotengauer-Malinovskiy  <gle...@altlinux.org>

libitm/

PR libitm/61164
* local_atomic (__always_inline): Add inline.
(__calculate_memory_order): Remove inline.
(atomic_thread_fence): Likewise.
(atomic_signal_fence): Likewise.
(atomic_flag_test_and_set_explicit): Likewise.
(atomic_flag_clear_explicit): Likewise.
(atomic_flag_test_and_set): Likewise.
(atomic_flag_clear): Likewise.

---
 libitm/local_atomic | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/libitm/local_atomic b/libitm/local_atomic
index c3e079f..ae35ada 100644
--- a/libitm/local_atomic
+++ b/libitm/local_atomic
-  inline __always_inline void
+  __always_inline void
   atomic_signal_fence(memory_order __m) noexcept
   {
 __atomic_thread_fence (__m);
@@ -1544,38 +1544,38 @@ namespace std // _GLIBCXX_VISIBILITY(default)
 
 
   // Function definitions, atomic_flag operations.
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set_explicit(atomic_flag* __a,
memory_order __m) noexcept
  { return __a->test_and_set(__m); }
 
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set_explicit(volatile atomic_flag* __a,
memory_order __m) noexcept
  { return __a->test_and_set(__m); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear_explicit(atomic_flag* __a, memory_order __m) noexcept
  { __a->clear(__m); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear_explicit(volatile atomic_flag* __a,
 memory_order __m) noexcept
  { __a->clear(__m); }
 
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set(atomic_flag* __a) noexcept
   { return atomic_flag_test_and_set_explicit(__a, memory_order_seq_cst); }
 
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set(volatile atomic_flag* __a) noexcept
   { return atomic_flag_test_and_set_explicit(__a, memory_order_seq_cst); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear(atomic_flag* __a) noexcept
   { atomic_flag_clear_explicit(__a, memory_order_seq_cst); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear(volatile atomic_flag* __a) noexcept
   { atomic_flag_clear_explicit(__a, memory_order_seq_cst); }
 
-- 
glebfm


Re: [C++ PATCH] Fix -Wlogical-not-parentheses (PR c++/62199)

2014-09-27 Thread Marc Glisse

On Fri, 22 Aug 2014, Marc Glisse wrote:


On Fri, 22 Aug 2014, Jason Merrill wrote:


On 08/22/2014 03:24 PM, Marc Glisse wrote:

Note that there is a patch waiting for a review that makes us accept !v
for vector v:


Ah, indeed.  I still think we might as well treat vectors the same as other 
types here.


Ok, now that it is a conscious choice, it feels much safer :-)
Though depending on where exactly this is called, it would be funny if we 
warned for !v==-1 but not for !v==true, when the possible values for the 
elements of !v are actually {-1,0}. I guess I'll have to test after Marek 
commits.


Sadly, this is exactly what is happening. With my patch,

typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
void f (veci *a)
{
  *a = !*a == -1;
}

warning: logical not is only applied to the left hand side of comparison

I also get the warning if I replace -1 with 0 or with a vector, but not 
with true. This seems like the reverse of what is desirable.


I don't see how to change that: the warning is emitted very early (it warns
for templates that are not instantiated), and we may not know yet whether the
lhs is a vector.


I guess people using vectors in such a strange construct can just always 
add parentheses, and it should be rare that anyone writes !vec==true and 
thus misses a useful warning.


--
Marc Glisse


Re: [PATCH] microblaze: microblaze.md: Use VOID instead of SI to fix ((void (*)(void)) 0)() issue

2014-09-27 Thread Michael Eager

On 09/25/14 07:03, Chen Gang wrote:

We need to use VOID instead of SI; otherwise, when an operand really has
VOIDmode, it does not match SImode and the insn cannot be recognized. This
patch fixes the issue and passes the testsuite.

The related test code ('void' will cause CALL instead of SET):

   typedef void (*T)(void);
   f1 ()
   {
 ((T) 0)();
   }

The related error:

   [root@localhost gcc]# ./cc1 /tmp/calls.c -o /tmp/1.s
f1
   Analyzing compilation unit
   Performing interprocedural optimizations
*free_lang_data visibility early_local_cleanups free-inline-summary 
whole-program inlineAssembling functions:
f1
   /tmp/calls.c: In function 'f1':
   /tmp/calls.c:5:1: error: unrecognizable insn:
}
^
   (call_insn 5 2 8 2 (parallel [
   (call (mem:SI (const_int 0 [0]) [0 MEM[(void (*T29c) 
(void))0B] S4 A32])
   (const_int 24 [0x18]))
   (clobber (reg:SI 15 r15))
   ]) /tmp/calls.c:4 -1
(nil)
   (nil))
   /tmp/calls.c:5:1: internal compiler error: in extract_insn, at recog.c:2204
   0xb0e71b _fatal_insn(char const*, rtx_def const*, char const*, int, char 
const*)
../../gcc/gcc/rtl-error.c:109
   0xb0e75c _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../gcc/gcc/rtl-error.c:117
   0xac552b extract_insn(rtx_def*)
../../gcc/gcc/recog.c:2204
   0x8b919e instantiate_virtual_regs_in_insn
../../gcc/gcc/function.c:1614
   0x8ba347 instantiate_virtual_regs
../../gcc/gcc/function.c:1934
   0x8ba452 execute
../../gcc/gcc/function.c:1983
   Please submit a full bug report,
   with preprocessed source if appropriate.
   Please include the complete backtrace with any bug report.
   See http://gcc.gnu.org/bugs.html for instructions.


Is this test case (or a similar one) in the gcc test suite?

If not, can you please add it to the test suite.




2014-09-25  Chen Gang  gang.chen.5...@gmail.com

* config/microblaze/microblaze.md (call_internal1): Use VOID
instead of SI to fix ((void (*)(void)) 0)() issue

---
  gcc/config/microblaze/microblaze.md | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/microblaze/microblaze.md 
b/gcc/config/microblaze/microblaze.md
index b971737..3b4faf4 100644
--- a/gcc/config/microblaze/microblaze.md
+++ b/gcc/config/microblaze/microblaze.md
@@ -2062,7 +2062,7 @@
(set_attr "length" "4")])

  (define_insn "call_internal1"
-  [(call (mem (match_operand:SI 0 "call_insn_simple_operand" "ri"))
+  [(call (mem (match_operand:VOID 0 "call_insn_simple_operand" "ri"))
 (match_operand:SI 1 "" "i"))
(clobber (reg:SI R_SR))]



I've verified that your patch does not cause any test suite regressions.


--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


[PATCH 2/2] Make -Q --help print param defaults and min/max values

2014-09-27 Thread Andi Kleen
From: Andi Kleen a...@linux.intel.com

Make -Q --help print the --param default, min and max values, similar
to how it prints the defaults for other flags. This is useful
to let an option autotuner automatically query all needed information
about gcc params (previously it needed to access the .def file in
the source).

gcc/:

2014-09-26  Andi Kleen  a...@linux.intel.com

* opts.c (print_filtered_help): Print --param min/max/default
with -Q.
---
 gcc/opts.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/opts.c b/gcc/opts.c
index 0a49bc0..5cb5a39 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -953,6 +953,7 @@ print_filtered_help (unsigned int include_flags,
   const char *help;
   bool found = false;
   bool displayed = false;
+  char new_help[128];
 
   if (include_flags == CL_PARAMS)
 {
@@ -971,6 +972,15 @@ print_filtered_help (unsigned int include_flags,
  /* Get the translation.  */
  help = _(help);
 
+ if (!opts->x_quiet_flag)
+   {
+ snprintf (new_help, sizeof (new_help),
+   _("default %d minimum %d maximum %d"),
+   compiler_params[i].default_value,
+   compiler_params[i].min_value,
+   compiler_params[i].max_value);
+ help = new_help;
+   }
  wrap_help (help, param, strlen (param), columns);
}
   putchar ('\n');
@@ -985,7 +995,6 @@ print_filtered_help (unsigned int include_flags,
 
   for (i = 0; i < cl_options_count; i++)
 {
-  char new_help[128];
   const struct cl_option *option = cl_options + i;
   unsigned int len;
   const char *opt;
-- 
2.1.1
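The help string that the patch builds can be reproduced outside GCC. This is a minimal sketch that assumes a simplified stand-in for one compiler_params[] entry. The struct and function names are hypothetical, and the translation step through _() is left out:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one compiler_params[] entry; only the three
   fields the patch reads are modelled.  */
struct sketch_param
{
  const char *option;
  int default_value;
  int min_value;
  int max_value;
};

/* Build the -Q help text for one parameter into BUF, as the patch does
   with snprintf into new_help[128].  Returns snprintf's result, so a
   value >= SIZE means the text was truncated.  */
static int
sketch_param_help (char *buf, size_t size, const struct sketch_param *p)
{
  return snprintf (buf, size, "default %d minimum %d maximum %d",
                   p->default_value, p->min_value, p->max_value);
}
```

Each %d expands to at most 11 characters (for INT_MIN), so the untranslated text is at most 59 characters and fits easily in a 128-byte buffer. A translated format string could be longer; in that case snprintf truncates the text instead of overflowing the buffer.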



[PATCH 1/2] Remove -fshort-double

2014-09-27 Thread Andi Kleen
From: Andi Kleen a...@linux.intel.com

-fshort-double has crashed the compiler since 4.6 (see PR60410).
Since it's an obscure option that apparently nobody uses, the
best way to fix it seems to be to just remove it.

This prevents constant ICEs when running a gcc optimization flags
autotuner.

gcc/testsuite/:

2014-09-26  Andi Kleen  a...@linux.intel.com

PR target/60410
* gcc.dg/lto/pr55113_0.c: Remove.

gcc/:

2014-09-26  Andi Kleen  a...@linux.intel.com

PR target/60410
* doc/invoke.texi: Remove -fshort-double.
* lto-wrapper.c (merge_and_complain): Ditto.
(run_gcc): Ditto.

gcc/c-family/:

2014-09-26  Andi Kleen  a...@linux.intel.com

PR target/60410
* c-common.c (c_common_nodes_and_builtins): Remove
-fshort-double.
* c.opt: Ditto.

gcc/lto/:

2014-09-26  Andi Kleen  a...@linux.intel.com

PR target/60410
* lto-lang.c (lto_init): Remove -fshort-double.
---
 gcc/c-family/c-common.c  |  2 +-
 gcc/c-family/c.opt   |  4 
 gcc/doc/invoke.texi  | 10 +-
 gcc/lto-wrapper.c|  3 ---
 gcc/lto/lto-lang.c   |  2 +-
 gcc/testsuite/gcc.dg/lto/pr55113_0.c | 14 --
 6 files changed, 3 insertions(+), 32 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/lto/pr55113_0.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index a9e0191..7a529a2 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5325,7 +5325,7 @@ c_common_nodes_and_builtins (void)
   tree va_list_ref_type_node;
   tree va_list_arg_type_node;
 
-  build_common_tree_nodes (flag_signed_char, flag_short_double);
+  build_common_tree_nodes (flag_signed_char, false);
 
   /* Define `int' and `char' first so that dbx will output them first.  */
   record_builtin_type (RID_INT, NULL, integer_type_node);
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 72ac2ed..d6a9698 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1251,10 +1251,6 @@ frtti
 C++ ObjC++ Optimization Var(flag_rtti) Init(1)
 Generate run time type descriptor information
 
-fshort-double
-C ObjC C++ ObjC++ LTO Optimization Var(flag_short_double)
-Use the same size for double as for float
-
 fshort-enums
 C ObjC C++ ObjC++ LTO Optimization Var(flag_short_enums)
 Use the narrowest integer type possible for enumeration types
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0c3f4be..b2b667d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1094,7 +1094,7 @@ See S/390 and zSeries Options.
 -fno-jump-tables @gol
 -frecord-gcc-switches @gol
 -freg-struct-return  -fshort-enums @gol
--fshort-double  -fshort-wchar @gol
+-fshort-wchar @gol
 -fverbose-asm  -fpack-struct[=@var{n}]  -fstack-check @gol
 -fstack-limit-register=@var{reg}  -fstack-limit-symbol=@var{sym} @gol
 -fno-stack-limit -fsplit-stack @gol
@@ -22598,14 +22598,6 @@ is equivalent to the smallest integer type that has 
enough room.
 code that is not binary compatible with code generated without that switch.
 Use it to conform to a non-default application binary interface.
 
-@item -fshort-double
-@opindex fshort-double
-Use the same size for @code{double} as for @code{float}.
-
-@strong{Warning:} the @option{-fshort-double} switch causes GCC to generate
-code that is not binary compatible with code generated without that switch.
-Use it to conform to a non-default application binary interface.
-
 @item -fshort-wchar
 @opindex fshort-wchar
 Override the underlying type for @samp{wchar_t} to be @samp{short
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 08fd090..a2ce79c 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -275,7 +275,6 @@ merge_and_complain (struct cl_decoded_option 
**decoded_options,
 
case OPT_freg_struct_return:
case OPT_fpcc_struct_return:
-   case OPT_fshort_double:
  for (j = 0; j < *decoded_options_count; ++j)
if ((*decoded_options)[j].opt_index == foption->opt_index)
  break;
@@ -500,7 +499,6 @@ run_gcc (unsigned argc, char *argv[])
case OPT_fgnu_tm:
case OPT_freg_struct_return:
case OPT_fpcc_struct_return:
-   case OPT_fshort_double:
case OPT_ffp_contract_:
case OPT_fwrapv:
case OPT_ftrapv:
@@ -573,7 +571,6 @@ run_gcc (unsigned argc, char *argv[])
 
case OPT_freg_struct_return:
case OPT_fpcc_struct_return:
-   case OPT_fshort_double:
  /* Ignore these, they are determined by the input files.
 ???  We fail to diagnose a possible mismatch here.  */
  continue;
diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
index 9e8524a..57b4f71 100644
--- a/gcc/lto/lto-lang.c
+++ b/gcc/lto/lto-lang.c
@@ -1165,7 +1165,7 @@ lto_init (void)
   flag_generate_lto = (flag_wpa != NULL);
 
   /* Create the basic integer types.  */
-  build_common_tree_nodes (flag_signed_char, flag_short_double);
+  

Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition

2014-09-27 Thread Felix Yang
Thanks for the explanation.
I have changed the loop_depth into a short integer hoping that we can
save some memory :-)
Attached please find the updated patch. Bootstrapped and reg-tested on
x86_64-suse-linux.
Please do a final review once the assignment is ready.

As for the new list walking interface, I chose the function
no_equiv and tried the checked cast way.
The bad news is that GCC failed to bootstrap with the following change:

Index: ira.c
===
--- ira.c (revision 215536)
+++ ira.c (working copy)
@@ -3242,12 +3242,12 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE
   void *data ATTRIBUTE_UNUSED)
 {
   int regno;
-  rtx list;
+  rtx_insn_list *list;

   if (!REG_P (reg))
 return;
   regno = REGNO (reg);
-  list = reg_equiv[regno].init_insns;
+  list = as_a <rtx_insn_list *> (reg_equiv[regno].init_insns);
   if (list == const0_rtx)
 return;
   reg_equiv[regno].init_insns = const0_rtx;
@@ -3258,9 +3258,9 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE
 return;
   ira_reg_equiv[regno].defined_p = false;
   ira_reg_equiv[regno].init_insns = NULL;
-  for (; list; list =  XEXP (list, 1))
+  for (; list; list = list->next ())
 {
-  rtx insn = XEXP (list, 0);
+  rtx_insn *insn = list->insn ();
   remove_note (insn, find_reg_note (insn, REG_EQUIV, NULL_RTX));
 }
 }

Error message:
 ...
checking for suffix of object files... configure: error: in
`/home/yangfei/gcc-devel/build/x86_64-unknown-linux-gnu/libgcc':
configure: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details.
make[2]: *** [configure-stage1-target-libgcc] Error 1
make[2]: Leaving directory `/home/yangfei/gcc-devel/build'
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory `/home/yangfei/gcc-devel/build'
make: *** [all] Error 2

I think the code change is OK. Anything I missed?

Cheers,
Felix


On Sat, Sep 27, 2014 at 5:03 AM, Jeff Law l...@redhat.com wrote:
 On 09/26/14 07:57, Felix Yang wrote:

 Hi Jeff,

  Thanks for the suggestions. I updated the patch accordingly.

  1. Both my employer(Huawei) and I have signed the copyright
 assignments with FSF.
  These assignments were already sent by post two days ago and
 should hopefully reach the FSF within a week.
  Maybe it's OK to commit this patch now?

 Not really.  It needs to be accepted by the FSF before we can include the
 work.


   2. I am not turning member loop_depth of struct equivalence into
 short integer as GCC API such as bb_loop_depth
   returns a loop's depth as a 32-bit integer.

 There's already other places that assume loops don't nest that deep. Please
 go ahead and change it.  And no need to explicitly mark the unused bits.
 That's just a maintenance nightmare in the long term anyway :-)



   3. I find it's kind of difficult to use the new type and
 interfaces for list walking the init_insns list for this patch.
  The type of init_insns list is rtx, not rtx_insn_list *. Seems
 we need to change a lot in order to use the new interface.
  Not clear about the reason why it is not adjusted when we are
 transferring to the new interface.
  Anyway, I think it's better to have another patch fix that issue.
 OK?

 The right way to go is to add a checked cast when we have some code that is
 using the old interface and other code using the new interface.  It's
 actually a pretty easy change.

 The checked casts effectively mark the limits of where we've been able to
 push the RTL typesafety work.  Long term as we push the typesafety work
 further into the compiler many/most of the checked casts will go away.

 Unfortunately, that won't work in this case because other code wants to
 store a (const0_rtx) into the insn list.  (const0_rtx) isn't an INSN, so the
 checked cast fails and we get a nice abort/ICE.

 Conceptually we just need another marker that is an INSN and we might as
 well just convert the whole file to use the new interface at that point.

 Consider the request pulled.

 The const0_rtx problem may be why this wasn't converted in the first place.
 Or it may simply have been a time problem.  David's done more than 250 patches
 around RTL typesafety, but he also has other work to be doing ;-)


  4. This bug is only reproducible with my local customized GCC
 version, so I don't have a testcase.

 OK.

 I'll do a final review when I get notice about the copyright assignment from
 the FSF.

 jeff
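Jeff's point about const0_rtx can be illustrated with a tiny tagged-node model in C. This sketch rests on heavy assumptions: the types and names are hypothetical stand-ins, not GCC's rtx classes, and assert stands in for the ICE that a failing as_a <> produces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for RTL codes.  */
enum sketch_code { SKETCH_CONST_INT, SKETCH_INSN_LIST };

/* Every node starts with its code, like an rtx.  */
struct sketch_rtx
{
  enum sketch_code code;
};

/* A "subclass" whose first member is the base, loosely mirroring
   rtx_insn_list: a next pointer plus an insn (here just a number).  */
struct sketch_insn_list
{
  struct sketch_rtx base;
  struct sketch_insn_list *next;
  int insn_uid;
};

/* The is_a <> test: does this node really have the subclass's code?  */
static bool
is_insn_list (const struct sketch_rtx *x)
{
  return x->code == SKETCH_INSN_LIST;
}

/* The as_a <> checked cast: abort on a mismatch instead of silently
   reinterpreting the node.  */
static struct sketch_insn_list *
as_insn_list (struct sketch_rtx *x)
{
  assert (is_insn_list (x));
  return (struct sketch_insn_list *) x;
}

/* A const0_rtx-like marker is not an insn list, so as_insn_list on it
   would abort.  */
static struct sketch_rtx const0_marker = { SKETCH_CONST_INT };

/* A marker whose code matches what the checked cast expects, so the
   cast succeeds.  Hypothetical name.  */
static struct sketch_insn_list no_equiv_marker
  = { { SKETCH_INSN_LIST }, NULL, -1 };
```

Calling as_insn_list (&const0_marker) would trip the assertion, much as the checked cast broke the bootstrap above. A marker of the expected kind lets the typed walk over the list proceed.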

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 215658)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,14 @@
+2014-09-26  Felix Yang  felix.y...@huawei.com
+   Jeff Law  l...@redhat.com
+
+   * ira.c (struct equivalence): Change member is_arg_equivalence and 
replace
+   into boolean bitfields; turn member loop_depth into a short integer; 
add new
+   member no_equiv 

Re: [PING] [PATCH] Add direct support for Linux kernel __fentry__ patching

2014-09-27 Thread Andi Kleen
On Sat, Sep 27, 2014 at 01:21:29PM +0200, Dominique Dhumieres wrote:
 The new tests fail on darwin:
 
 /opt/gcc/work/gcc/testsuite/gcc.target/i386/nop-mcount.c:1:0: error: 
 -mnop-mcount is not implemented for -fPIC
 
 and gcc.target/i386/record-mcount.c fails because mcount_loc is not found in 
 the assembly.

Sorry. Here's a patch. I'll install it as obvious unless someone
complains. Also I hope it's the last mcount patch for now :-)

-Andi

diff --git a/gcc/testsuite/gcc.target/i386/nop-mcount.c 
b/gcc/testsuite/gcc.target/i386/nop-mcount.c
index 2592231..942cae0 100644
--- a/gcc/testsuite/gcc.target/i386/nop-mcount.c
+++ b/gcc/testsuite/gcc.target/i386/nop-mcount.c
@@ -1,5 +1,5 @@
 /* Test -mnop-mcount */
-/* { dg-do compile } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-pg -mfentry -mrecord-mcount -mnop-mcount" } */
 /* { dg-final { scan-assembler-not "__fentry__" } } */
 /* Origin: Andi Kleen */
diff --git a/gcc/testsuite/gcc.target/i386/record-mcount.c 
b/gcc/testsuite/gcc.target/i386/record-mcount.c
index dae413e..26b0dbc 100644
--- a/gcc/testsuite/gcc.target/i386/record-mcount.c
+++ b/gcc/testsuite/gcc.target/i386/record-mcount.c
@@ -1,5 +1,5 @@
 /* Test -mrecord-mcount */
-/* { dg-do compile } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-pg -mrecord-mcount" } */
 /* { dg-final { scan-assembler "mcount_loc" } } */
 /* Origin: Andi Kleen */


Re: [RFC/PATCH] Fix-it hints

2014-09-27 Thread Manuel López-Ibáñez
On 25 September 2014 11:34, Dodji Seketeli do...@redhat.com wrote:
 When the caret line is disabled with -fno-diagnostics-show-caret, the
 fix-it hint is printed as:

 gcc/testsuite/g++.dg/template/crash83.C:5:21: error: an explicit
 specialization must be preceded by 'template <>'
 gcc/testsuite/g++.dg/template/crash83.C:5:21: fixit: template <>

 The latter form may allow an IDE (such as emacs) to automatically
 apply the fix.

 Nice.  Is the fixit: prefix used by other compilers too?  Or are there
 variations from compiler to compiler?

Clang seems to do:

fix-it:"t.cpp":{7:25-7:29}:"Gamma"

http://clang.llvm.org/docs/UsersManual.html#cmdoption-fdiagnostics-parseable-fixits

where the location is a half-open range (insertions are marked as
{7:25-7:25}). However, this is not the standard GNU format for ranges,
and last time I tried to update that (to support multiple ranges) RMS
was not in favour of adding the Clang flavour. Quoting the filename
also does not seem to be what GNU mandates. No idea why they use {}
around the location range and not sure if they support just 7:25,
which seems nicer.
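Here is a minimal C sketch of a consumer that parses one line in the Clang format above. It assumes both fields are quoted and contain no escaped quotes. It is not code from GCC or Clang, and the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed fix-it line.  The range is half-open: [start, end).  */
struct fixit
{
  char file[256];
  int line1, col1, line2, col2;
  char text[256];
};

/* Parse a line such as  fix-it:"t.cpp":{7:25-7:29}:"Gamma".
   Returns true on success.  Escaped quotes are not handled.  */
static bool
parse_fixit (const char *s, struct fixit *f)
{
  int n;
  f->text[0] = '\0';
  n = sscanf (s, "fix-it:\"%255[^\"]\":{%d:%d-%d:%d}:\"%255[^\"]\"",
              f->file, &f->line1, &f->col1, &f->line2, &f->col2, f->text);
  /* An empty replacement ("") makes the last %[ conversion fail, which
     is still a valid deletion, so five conversions are enough.  */
  return n >= 5;
}

/* Insertions are encoded as empty ranges, e.g. {7:25-7:25}.  */
static bool
fixit_is_insertion (const struct fixit *f)
{
  return f->line1 == f->line2 && f->col1 == f->col2;
}

/* Deletions replace a non-empty range with an empty string.  */
static bool
fixit_is_deletion (const struct fixit *f)
{
  return !fixit_is_insertion (f) && f->text[0] == '\0';
}
```

An empty range marks an insertion. An empty string over a non-empty range marks a deletion. Anything else is a replacement.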

Also, I don't know which consumers of Clang's format GCC would be
interested in supporting.

No idea about other compilers that support fix-it hints.

Comments?



 Currently, fix-it hints are limited to insertions at one single
 location, whereas Clang allows insertions, deletions, and replacements
 at arbitrary location ranges.

 Do you have an example of each of these kinds of fix-it hints? (deletions,
 replacement at location ranges).  I think it'd be nice to have an idea
 of what needs to be done, even if we are not doing it in extenso right
 now.

I think in the case of deletions, they just underline the text to be deleted:

typename3.C:3:7: warning: duplicate 'const' declaration specifier
[-Wduplicate-decl-specifier]
const const long long long int x;
  ^~
fix-it:"typename3.C":{3:7-3:13}:""

In the case of replacements, there does not seem to be any visible
difference with an insertion:

typename3.C:4:33: note: change this ',' to a ';' to call 'f'
const const long long long int x,
^
;
fix-it:"typename3.C":{4:33-4:34}:";"

Cheers,

Manuel.


Re: [PING] [PATCH] Add direct support for Linux kernel __fentry__ patching

2014-09-27 Thread Dominique d'Humières

On 27 Sep 2014, at 18:45, Dominique d'Humières domi...@lps.ens.fr wrote:

 I think the patch for gcc.target/i386/nop-mcount.c should be
 
 --- ../_clean/gcc/testsuite/gcc.target/i386/nop-mcount.c  2014-09-26 
 23:29:45.0 +0200
 +++ gcc/testsuite/gcc.target/i386/nop-mcount.c2014-09-27 
 18:43:40.0 +0200
 @@ -1,5 +1,5 @@
  /* Test -mnop-mcount */
 -/* { dg-do compile } */
 +/* { dg-do compile { target { *-*-linux* && nonpic } } } */
  /* { dg-options "-pg -mfentry -mrecord-mcount -mnop-mcount" } */
  /* { dg-final { scan-assembler-not "__fentry__" } } */
  /* Origin: Andi Kleen */
 
 Dominique
 
 On 27 Sep 2014, at 17:35, Andi Kleen a...@firstfloor.org wrote:
 
 On Sat, Sep 27, 2014 at 01:21:29PM +0200, Dominique Dhumieres wrote:
 The new tests fail on darwin:
 
 /opt/gcc/work/gcc/testsuite/gcc.target/i386/nop-mcount.c:1:0: error: 
 -mnop-mcount is not implemented for -fPIC
 
 and gcc.target/i386/record-mcount.c fails because mcount_loc is not found 
 in the assembly.
 
 Sorry. Here's a patch. I'll install it as obvious unless someone
 complains. Also I hope it's the last mcount patch for now :-)
 
 -Andi
 
 diff --git a/gcc/testsuite/gcc.target/i386/nop-mcount.c 
 b/gcc/testsuite/gcc.target/i386/nop-mcount.c
 index 2592231..942cae0 100644
 --- a/gcc/testsuite/gcc.target/i386/nop-mcount.c
 +++ b/gcc/testsuite/gcc.target/i386/nop-mcount.c
 @@ -1,5 +1,5 @@
 /* Test -mnop-mcount */
 -/* { dg-do compile } */
 +/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-pg -mfentry -mrecord-mcount -mnop-mcount" } */
 /* { dg-final { scan-assembler-not "__fentry__" } } */
 /* Origin: Andi Kleen */
 diff --git a/gcc/testsuite/gcc.target/i386/record-mcount.c 
 b/gcc/testsuite/gcc.target/i386/record-mcount.c
 index dae413e..26b0dbc 100644
 --- a/gcc/testsuite/gcc.target/i386/record-mcount.c
 +++ b/gcc/testsuite/gcc.target/i386/record-mcount.c
 @@ -1,5 +1,5 @@
 /* Test -mrecord-mcount */
 -/* { dg-do compile } */
 +/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-pg -mrecord-mcount" } */
 /* { dg-final { scan-assembler "mcount_loc" } } */
 /* Origin: Andi Kleen */
 



Re: [PATCH v2] PR libitm/61164: use always_inline consistently

2014-09-27 Thread Gleb Fotengauer-Malinovskiy
2014-09-27 Gleb Fotengauer-Malinovskiy gle...@altlinux.org

libitm/

PR libitm/61164
* local_atomic (__always_inline): Add inline.
(__calculate_memory_order): Remove inline.
(atomic_thread_fence): Likewise.
(atomic_signal_fence): Likewise.
(atomic_flag_test_and_set_explicit): Likewise.
(atomic_flag_clear_explicit): Likewise.
(atomic_flag_test_and_set): Likewise.
(atomic_flag_clear): Likewise.

---
Sorry, the previous patch was incomplete.

 libitm/local_atomic | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/libitm/local_atomic b/libitm/local_atomic
index c3e079f..ae35ada 100644
--- a/libitm/local_atomic
+++ b/libitm/local_atomic
-  inline __always_inline void
+  __always_inline void
   atomic_signal_fence(memory_order __m) noexcept
   {
 __atomic_thread_fence (__m);
@@ -1544,38 +1544,38 @@ namespace std // _GLIBCXX_VISIBILITY(default)
 
 
   // Function definitions, atomic_flag operations.
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set_explicit(atomic_flag* __a,
memory_order __m) noexcept
   { return __a->test_and_set(__m); }
 
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set_explicit(volatile atomic_flag* __a,
memory_order __m) noexcept
   { return __a->test_and_set(__m); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear_explicit(atomic_flag* __a, memory_order __m) noexcept
   { __a->clear(__m); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear_explicit(volatile atomic_flag* __a,
 memory_order __m) noexcept
   { __a->clear(__m); }
 
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set(atomic_flag* __a) noexcept
   { return atomic_flag_test_and_set_explicit(__a, memory_order_seq_cst); }
 
-  inline __always_inline bool
+  __always_inline bool
   atomic_flag_test_and_set(volatile atomic_flag* __a) noexcept
   { return atomic_flag_test_and_set_explicit(__a, memory_order_seq_cst); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear(atomic_flag* __a) noexcept
   { atomic_flag_clear_explicit(__a, memory_order_seq_cst); }
 
-  inline __always_inline void
+  __always_inline void
   atomic_flag_clear(volatile atomic_flag* __a) noexcept
   { atomic_flag_clear_explicit(__a, memory_order_seq_cst); }
 
-- 
glebfm
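The shape of the change can be sketched in C: the keyword moves into the macro, so the declarations stop repeating it. The macro and function names are hypothetical, and the flag helpers are plain non-atomic placeholders, not std::atomic_flag:

```c
#include <assert.h>

/* Hypothetical macro in the spirit of the patch: "inline" lives inside
   the macro, so declarations must not repeat it.  */
#define LOCAL_ALWAYS_INLINE inline __attribute__ ((__always_inline__))

/* Declarations use only the macro, mirroring "inline __always_inline"
   becoming "__always_inline" in the diff.  "static" keeps this valid
   C99 without a separate external definition.  */
static LOCAL_ALWAYS_INLINE int
flag_test_and_set (int *flag)
{
  int old = *flag;   /* not atomic: illustration only */
  *flag = 1;
  return old;
}

static LOCAL_ALWAYS_INLINE void
flag_clear (int *flag)
{
  *flag = 0;
}
```

Keeping inline in exactly one place matters in C++, where g++ rejects a duplicated inline specifier. C tolerates the repetition.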




Re: [PING] [PATCH] Add direct support for Linux kernel __fentry__ patching

2014-09-27 Thread Andi Kleen
On Sat, Sep 27, 2014 at 06:45:21PM +0200, Dominique d'Humières wrote:
 I think the patch for gcc.target/i386/nop-mcount.c should be

True. Thanks.

-Andi


[PATCH 2/2] Remove x86 cmpstrnsi

2014-09-27 Thread Andi Kleen
From: Andi Kleen a...@linux.intel.com

In my tests the optimized out-of-line glibc strcmp is always faster than
using inline rep ; cmpsb, even for small strings. The Intel optimization manual
also recommends against using it. So remove the cmpstrnsi instruction.

Tested on Sandy Bridge and Westmere Intel CPUs.

gcc/:

2014-09-27  Andi Kleen  a...@linux.intel.com

* config/i386/i386.md (cmpstrnsi, cmpintqi): Remove expanders.
---
 gcc/config/i386/i386.md | 85 -
 1 file changed, 85 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 98df8e1..1d2f1a5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16097,91 +16097,6 @@
  (const_string "*")))
(set_attr "mode" "QI")])
 
-(define_expand "cmpstrnsi"
-  [(set (match_operand:SI 0 "register_operand")
-   (compare:SI (match_operand:BLK 1 "general_operand")
-   (match_operand:BLK 2 "general_operand")))
-   (use (match_operand 3 "general_operand"))
-   (use (match_operand 4 "immediate_operand"))]
-  ""
-{
-  rtx addr1, addr2, out, outlow, count, countreg, align;
-
-  if (optimize_insn_for_size_p () && !TARGET_INLINE_ALL_STRINGOPS)
-FAIL;
-
-  /* Can't use this if the user has appropriated ecx, esi or edi.  */
-  if (fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG])
-FAIL;
-
-  out = operands[0];
-  if (!REG_P (out))
-out = gen_reg_rtx (SImode);
-
-  addr1 = copy_addr_to_reg (XEXP (operands[1], 0));
-  addr2 = copy_addr_to_reg (XEXP (operands[2], 0));
-  if (addr1 != XEXP (operands[1], 0))
-operands[1] = replace_equiv_address_nv (operands[1], addr1);
-  if (addr2 != XEXP (operands[2], 0))
-operands[2] = replace_equiv_address_nv (operands[2], addr2);
-
-  count = operands[3];
-  countreg = ix86_zero_extend_to_Pmode (count);
-
-  /* %%% Iff we are testing strict equality, we can use known alignment
- to good advantage.  This may be possible with combine, particularly
- once cc0 is dead.  */
-  align = operands[4];
-
-  if (CONST_INT_P (count))
-{
-  if (INTVAL (count) == 0)
-   {
- emit_move_insn (operands[0], const0_rtx);
- DONE;
-   }
-  emit_insn (gen_cmpstrnqi_nz_1 (addr1, addr2, countreg, align,
-operands[1], operands[2]));
-}
-  else
-{
-  rtx (*gen_cmp) (rtx, rtx);
-
-  gen_cmp = (TARGET_64BIT
-? gen_cmpdi_1 : gen_cmpsi_1);
-
-  emit_insn (gen_cmp (countreg, countreg));
-  emit_insn (gen_cmpstrnqi_1 (addr1, addr2, countreg, align,
- operands[1], operands[2]));
-}
-
-  outlow = gen_lowpart (QImode, out);
-  emit_insn (gen_cmpintqi (outlow));
-  emit_move_insn (out, gen_rtx_SIGN_EXTEND (SImode, outlow));
-
-  if (operands[0] != out)
-emit_move_insn (operands[0], out);
-
-  DONE;
-})
-
-;; Produce a tri-state integer (-1, 0, 1) from condition codes.
-
-(define_expand "cmpintqi"
-  [(set (match_dup 1)
-   (gtu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (set (match_dup 2)
-   (ltu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (parallel [(set (match_operand:QI 0 "register_operand")
-  (minus:QI (match_dup 1)
-(match_dup 2)))
- (clobber (reg:CC FLAGS_REG))])]
-  ""
-{
-  operands[1] = gen_reg_rtx (QImode);
-  operands[2] = gen_reg_rtx (QImode);
-})
-
 ;; memcmp recognizers.  The `cmpsb' opcode does nothing if the count is
 ;; zero.  Emit extra code to make sure that a zero-length compare is EQ.
 
-- 
2.1.1
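The removed cmpintqi expander produced a tri-state from the unsigned flags (gtu minus ltu). That computation can be written in C. The sketch below also models the byte loop of repz cmpsb. Both function names are hypothetical, and the code illustrates the semantics, not the generated instructions:

```c
#include <assert.h>
#include <stddef.h>

/* Tri-state from an unsigned byte comparison, like cmpintqi's
   (gtu) - (ltu): -1, 0 or 1.  */
static int
tristate (unsigned char a, unsigned char b)
{
  return (a > b) - (a < b);
}

/* Compare at most COUNT bytes and stop at the first difference, as
   repz cmpsb does; a zero count compares equal.  Like cmpsb itself,
   this does not stop at a NUL byte.  */
static int
inline_cmp_sketch (const unsigned char *p, const unsigned char *q,
                   size_t count)
{
  size_t i;
  for (i = 0; i < count; i++)
    if (p[i] != q[i])
      return tristate (p[i], q[i]);
  return 0;
}
```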



[PATCH 1/2] Remove i386 cmpstrnsi peephole

2014-09-27 Thread Andi Kleen
From: Andi Kleen a...@linux.intel.com

The peephole that removes the code to compute a tristate for cmpstrnsi
when only a boolean jump is needed never triggers in my tests. Just
remove it.

gcc/:

2014-09-27  Andi Kleen  a...@linux.intel.com

* config/i386/i386.md: Remove peepholes for cmpstrn*.
---
 gcc/config/i386/i386.md | 77 -
 1 file changed, 77 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 004302d..98df8e1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16297,83 +16297,6 @@
  (const_string "0")
  (const_string "*")))
(set_attr "prefix_rep" "1")])
-
-;; Peephole optimizations to clean up after cmpstrn*.  This should be
-;; handled in combine, but it is not currently up to the task.
-;; When used for their truth value, the cmpstrn* expanders generate
-;; code like this:
-;;
-;;   repz cmpsb
-;;   seta  %al
-;;   setb  %dl
-;;   cmpb  %al, %dl
-;;   jcc   label
-;;
-;; The intermediate three instructions are unnecessary.
-
-;; This one handles cmpstrn*_nz_1...
-(define_peephole2
-  [(parallel[
- (set (reg:CC FLAGS_REG)
- (compare:CC (mem:BLK (match_operand 4 "register_operand"))
- (mem:BLK (match_operand 5 "register_operand"))))
- (use (match_operand 6 "register_operand"))
- (use (match_operand:SI 3 "immediate_operand"))
- (clobber (match_operand 0 "register_operand"))
- (clobber (match_operand 1 "register_operand"))
- (clobber (match_operand 2 "register_operand"))])
-   (set (match_operand:QI 7 "register_operand")
-   (gtu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (set (match_operand:QI 8 "register_operand")
-   (ltu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (set (reg FLAGS_REG)
-   (compare (match_dup 7) (match_dup 8)))
-  ]
-  "peep2_reg_dead_p (4, operands[7]) && peep2_reg_dead_p (4, operands[8])"
-  [(parallel[
- (set (reg:CC FLAGS_REG)
- (compare:CC (mem:BLK (match_dup 4))
- (mem:BLK (match_dup 5))))
- (use (match_dup 6))
- (use (match_dup 3))
- (clobber (match_dup 0))
- (clobber (match_dup 1))
- (clobber (match_dup 2))])])
-
-;; ...and this one handles cmpstrn*_1.
-(define_peephole2
-  [(parallel[
- (set (reg:CC FLAGS_REG)
- (if_then_else:CC (ne (match_operand 6 "register_operand")
-  (const_int 0))
-   (compare:CC (mem:BLK (match_operand 4 "register_operand"))
-   (mem:BLK (match_operand 5 "register_operand")))
-   (const_int 0)))
- (use (match_operand:SI 3 "immediate_operand"))
- (use (reg:CC FLAGS_REG))
- (clobber (match_operand 0 "register_operand"))
- (clobber (match_operand 1 "register_operand"))
- (clobber (match_operand 2 "register_operand"))])
-   (set (match_operand:QI 7 "register_operand")
-   (gtu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (set (match_operand:QI 8 "register_operand")
-   (ltu:QI (reg:CC FLAGS_REG) (const_int 0)))
-   (set (reg FLAGS_REG)
-   (compare (match_dup 7) (match_dup 8)))
-  ]
-  "peep2_reg_dead_p (4, operands[7]) && peep2_reg_dead_p (4, operands[8])"
-  [(parallel[
- (set (reg:CC FLAGS_REG)
- (if_then_else:CC (ne (match_dup 6)
-  (const_int 0))
-   (compare:CC (mem:BLK (match_dup 4))
-   (mem:BLK (match_dup 5)))
-   (const_int 0)))
- (use (match_dup 3))
- (use (reg:CC FLAGS_REG))
- (clobber (match_dup 0))
- (clobber (match_dup 1))
- (clobber (match_dup 2))])])
 
 ;; Conditional move instructions.
 
-- 
2.1.1
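The peephole relied on one premise: when the tri-state only feeds a jump, it carries no more information than the original comparison. A small C check of that premise follows. It models the tri-state as an int rather than as the real seta/setb/cmpb sequence, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* The tri-state the removed expanders materialised: -1, 0 or 1.  */
static int
tristate_from_compare (unsigned char a, unsigned char b)
{
  return (a > b) - (a < b);
}

/* If every condition a jump could test on the tri-state agrees with the
   same condition on the original comparison, the extra set/compare
   instructions add nothing when only a jump consumes the result.
   Checked exhaustively over all pairs of bytes.  */
static bool
tristate_is_redundant_for_jumps (void)
{
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      {
        int t = tristate_from_compare ((unsigned char) a, (unsigned char) b);
        if ((t == 0) != (a == b) || (t > 0) != (a > b) || (t < 0) != (a < b))
          return false;
      }
  return true;
}
```

The check shows why the peephole was valid. Andi's reason for removing it is a different one: it never triggered in his tests.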



[PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming

2014-09-27 Thread Ilya Verbin
Hello,

This patch enables the streaming of the LTO bytecode needed by the offload target,
using the existing LTO infrastructure.  It creates a new prefix for the section names
(.gnu.target_lto_) and streams out the functions and variables with the "omp declare
target" attribute, including the functions for outlined '#pragma omp target'
regions.  The offload compiler (under ifdef ACCEL_COMPILER) reads and compiles
these new sections.
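One way to picture the prefix switch is a single global prefix that is consulted whenever a section name is built. This is a minimal C sketch that assumes the section_name_prefix design from the ChangeLog. The macro names, the set_offload_mode helper and the ".decls" suffix in the example are hypothetical. ".gnu.lto_" is the usual LTO prefix, and ".gnu.target_lto_" is the new one named above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for the two prefixes.  */
#define SKETCH_LTO_PREFIX ".gnu.lto_"
#define SKETCH_OFFLOAD_PREFIX ".gnu.target_lto_"

/* One global prefix, in the spirit of the new section_name_prefix
   variable; everything that builds a section name consults it.  */
static const char *section_name_prefix = SKETCH_LTO_PREFIX;

/* Hypothetical switch between host LTO streaming and offload streaming.  */
static void
set_offload_mode (bool offload)
{
  section_name_prefix = offload ? SKETCH_OFFLOAD_PREFIX : SKETCH_LTO_PREFIX;
}

/* Build prefix + SUFFIX into BUF; returns snprintf's result.  */
static int
sketch_section_name (char *buf, size_t size, const char *suffix)
{
  return snprintf (buf, size, "%s%s", section_name_prefix, suffix);
}
```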

But I have doubts regarding the offload_lto_mode switch.  Here is why I added it:
The outlined target regions (say omp_fn0) contain references from the parent
functions.  That is correct when we stream out the host-side
version of omp_fn0.  But for the target version there are no parent functions,
so node->used_from_other_partition gets an incorrect value (always 1), and the
offload compiler crashes on streaming in.

Another solution is to leave referenced_from_other_partition_p and
reachable_from_other_partition_p unchanged; then used_from_other_partition will
have an incorrect value for target regions, but the offload compiler will just
ignore it.  Which approach is better?
Anyway, now it's bootstrapped and regtested on i686-linux and x86_64-linux.


2014-09-27  Ilya Verbin  ilya.ver...@intel.com
Ilya Tocar  ilya.to...@intel.com
Andrey Turetskiy  andrey.turets...@intel.com
Bernd Schmidt  ber...@codesourcery.com
gcc/
* cgraph.h (symtab_node): Add need_dump flag.
* cgraphunit.c: Include lto-section-names.h.
(initialize_offload): New function.
(ipa_passes): Initialize offload and call ipa_write_summaries if there
is something to write to OMP_SECTION_NAME_PREFIX sections.
(symbol_table::compile): Call lto_streamer_hooks_init under flag_openmp.
* ipa-inline-analysis.c (inline_generate_summary): Do not exit under
flag_openmp.
(inline_free_summary): Always remove hooks.
* lto-cgraph.c (lto_set_symtab_encoder_in_partition): Exit if there is
no need to encode the node.
(referenced_from_other_partition_p, reachable_from_other_partition_p):
Ignore references from non-target functions to target functions if we
are streaming out target-side bytecode (offload lto mode).
(select_what_to_dump): New function.
* lto-section-names.h (OMP_SECTION_NAME_PREFIX): Define.
(section_name_prefix): Declare.
* lto-streamer.c (offload_lto_mode): New variable.
(section_name_prefix): New variable.
(lto_get_section_name): Use section_name_prefix instead of
LTO_SECTION_NAME_PREFIX.
* lto-streamer.h (select_what_to_dump): Declare.
(offload_lto_mode): Declare.
* omp-low.c (is_targetreg_ctx): New function.
(create_omp_child_function, check_omp_nesting_restrictions): Use it.
(expand_omp_target): Set mark_force_output for the target functions.
(lower_omp_critical): Add target attribute for omp critical symbol.
* passes.c (ipa_write_summaries): Call select_what_to_dump.
gcc/lto/
* lto-object.c (lto_obj_add_section): Use section_name_prefix instead of
LTO_SECTION_NAME_PREFIX.
* lto-partition.c (add_symbol_to_partition_1): Always set
node->need_dump to true.
(lto_promote_cross_file_statics): Call select_what_to_dump.
* lto.c (lto_section_with_id): Use section_name_prefix instead of
LTO_SECTION_NAME_PREFIX.
(read_cgraph_and_symbols): Read OMP_SECTION_NAME_PREFIX sections, if
being built as an offload compiler.
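The section-prefix switch described in the lto-streamer.c, lto-object.c and lto.c entries above can be pictured with a minimal C sketch. It is not code from the patch. set_offload_lto_mode and make_section_name are hypothetical names, the OMP_SECTION_NAME_PREFIX string is an assumed value, and the real lto_get_section_name builds richer names than prefix plus base name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
/* Assumed value for illustration only; the real definition lives in
   lto-section-names.h and may differ.  */
#define OMP_SECTION_NAME_PREFIX ".gnu.target_lto_"

/* Plays the role of section_name_prefix from the ChangeLog: host LTO
   sections keep the default prefix.  */
static const char *section_name_prefix = LTO_SECTION_NAME_PREFIX;

/* Hypothetical helper: switch prefixes when streaming offload bytecode.  */
static void
set_offload_lto_mode (bool offload_lto_mode)
{
  section_name_prefix
    = offload_lto_mode ? OMP_SECTION_NAME_PREFIX : LTO_SECTION_NAME_PREFIX;
}

/* Simplified stand-in for lto_get_section_name: prefix + base name.
   Returns true if the whole name fit into BUF.  */
static bool
make_section_name (char *buf, size_t size, const char *base)
{
  assert (buf != NULL && base != NULL);
  int n = snprintf (buf, size, "%s%s", section_name_prefix, base);
  return n >= 0 && (size_t) n < size;
}
```

The design point is that every reader and writer goes through one variable, so the offload compiler only has to flip the prefix once instead of patching each use of LTO_SECTION_NAME_PREFIX.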

Thanks,
  -- Ilya

---

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 7481906..9ab970d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -444,6 +444,11 @@ public:
   /* Set when init priority is set.  */
   unsigned in_init_priority_hash : 1;
 
+  /* Set when symbol needs to be dumped into LTO bytecode for LTO,
+ or in pragma omp target case, for separate compilation targeting
+ a different architecture.  */
+  unsigned need_dump : 1;
+
 
   /* Ordering of all symtab entries.  */
   int order;
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index b854e4b..4ab4c57 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -211,6 +211,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-nested.h"
 #include "gimplify.h"
 #include "dbgcnt.h"
+#include "lto-section-names.h"
 
 /* Queue of cgraph nodes scheduled to be added into cgraph.  This is a
secondary queue used during optimization to accommodate passes that
@@ -1994,9 +1995,40 @@ output_in_order (bool no_reorder)
   free (nodes);
 }
 
+/* Check whether there is at least one function or global variable to offload.
+   */
+
+static bool
+initialize_offload (void)
+{
+  bool have_offload = false;
+  struct cgraph_node *node;
+  struct varpool_node *vnode;
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
+      {
+        have_offload = true;
+        break;
+      }
+
+  
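The diff is cut off here, but the function comment and the vnode declaration suggest initialize_offload goes on to scan global variables as well. A self-contained model of the check follows, using hypothetical struct symbol, has_attribute and any_offload_symbol stand-ins rather than GCC's cgraph/varpool API; the variable loop is an inference, not quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a defined function or variable: a name plus
   a NULL-terminated list of attribute names (not GCC's real types).  */
struct symbol
{
  const char *name;
  const char *const *attrs;
};

/* Rough analogue of lookup_attribute: linear search by name.  */
static bool
has_attribute (const struct symbol *s, const char *attr)
{
  assert (s != NULL && attr != NULL);
  for (const char *const *a = s->attrs; a != NULL && *a != NULL; a++)
    if (strcmp (*a, attr) == 0)
      return true;
  return false;
}

/* Return true as soon as any function or variable is marked
   "omp declare target", mirroring the early exit in initialize_offload.  */
static bool
any_offload_symbol (const struct symbol *funcs, size_t nfuncs,
                    const struct symbol *vars, size_t nvars)
{
  for (size_t i = 0; i < nfuncs; i++)
    if (has_attribute (&funcs[i], "omp declare target"))
      return true;
  /* Assumed: the truncated diff presumably scans variables too.  */
  for (size_t i = 0; i < nvars; i++)
    if (has_attribute (&vars[i], "omp declare target"))
      return true;
  return false;
}
```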

Re: [RFC/PATCH] More precise diagnostic locations: dynamic locations for columns vs explicit offset

2014-09-27 Thread Manuel López-Ibáñez
On 25 September 2014 13:39, Dodji Seketeli do...@redhat.com wrote:
 2) In the Fortran FE, which gives quite precise location information
 by tracking the characters that it wants to warn about instead of
 relying on the line-map machinery.

 So with this feature, the Fortran FE would then use the then more
 generic diagnostics machinery, right?

This is the plan. The benefits will be more shared code, deleting a
lot of duplicated stuff in the Fortran FE and supporting in Fortran
all the goodies of GCC (#pragmas, color, options printing, macro
unwinder). However, the Fortran FE still has a long way to go to make
use of all the features of the common diagnostics machinery. On the
bright side, the common diagnostics machinery already supports all
features of the Fortran FE except for offset locations, multiple
locations (and multiple carets), and buffered diagnostics (well, the
diagnostics machinery does buffer, but there is no API to clear the
buffer without printing it).

I think a bunch of FE diagnostic calls could already use the common
machinery (some are already using it). Unfortunately, most Fortran
diagnostics use offset locations because the FE does not track the
locations of tokens with line-maps (I think the locations are computed
but not stored or passed down to the diagnostic functions).

The work to be done is not even technically difficult. The lack of
progress on this is due, as always, to lack of time by the people
currently working on GCC and/or lack of new contributors.

Cheers,

Manuel.


Re: [PATCH 2/2] Remove x86 cmpstrnsi

2014-09-27 Thread Oleg Endo
On Sat, 2014-09-27 at 11:10 -0700, Andi Kleen wrote:
 From: Andi Kleen a...@linux.intel.com
 
 In my tests the optimized glibc out-of-line strcmp is always faster than
 using inline rep ; cmpsb, even for small strings. The Intel optimization
 manual also recommends against using it. So remove the cmpstrnsi instruction.
 
 Tested on Sandy Bridge, Westmere Intel CPUs.
 
 gcc/:
 
 2014-09-27  Andi Kleen  a...@linux.intel.com
 
   * config/i386/i386.md (cmpstrnsi, cmpintqi): Remove expanders.

This has been mentioned a while ago, e.g.
https://gcc.gnu.org/ml/gcc/2002-10/msg01616.html
https://gcc.gnu.org/ml/gcc/2003-04/msg00166.html

Instead of just completely removing it, how about disabling it for newer
CPU types if not optimizing for size?

Cheers,
Oleg



Re: [PATCH 2/2] Remove x86 cmpstrnsi

2014-09-27 Thread Andi Kleen
On Sat, Sep 27, 2014 at 08:45:18PM +0200, Oleg Endo wrote:
 Instead of just completely removing it, how about disabling it for newer
 CPU types if not optimizing for size?

I believe it was slow even on old CPUs. But back then glibc may have
been even slower. Not sure it is worth keeping it for -Os, especially
given that parts of it have already bitrotted.

-Andi


Re: [PATCH] microblaze: microblaze.md: Use VOID instead of SI to fix ((void (*)(void)) 0)() issue

2014-09-27 Thread Chen Gang
I guess it is not in our testsuite; otherwise, after this patch, the new
results should be a little better than the old ones (at present, they are
the same).

I shall try to add a related case to the testsuite within this month.

Thanks


Michael Eager ea...@eagerm.com wrote:

On 09/25/14 07:03, Chen Gang wrote:
 We need to use VOID instead of SI; otherwise, when a real VOIDmode operand
 comes along, it does not match SImode and causes this issue. This patch
 fixes the issue and passes the testsuite.

 The related test code ('void' will cause CALL instead of SET):

typedef void (*T)(void);
f1 ()
{
  ((T) 0)();
}

 The related error:

[root@localhost gcc]# ./cc1 /tmp/calls.c -o /tmp/1.s
 f1
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> <visibility> <early_local_cleanups> <free-inline-summary> <whole-program> <inline>Assembling functions:
 f1
/tmp/calls.c: In function 'f1':
/tmp/calls.c:5:1: error: unrecognizable insn:
 }
 ^
(call_insn 5 2 8 2 (parallel [
(call (mem:SI (const_int 0 [0]) [0 MEM[(void (*<T29c>) (void))0B] S4 A32])
(const_int 24 [0x18]))
(clobber (reg:SI 15 r15))
]) /tmp/calls.c:4 -1
 (nil)
(nil))
/tmp/calls.c:5:1: internal compiler error: in extract_insn, at recog.c:2204
0xb0e71b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
  ../../gcc/gcc/rtl-error.c:109
0xb0e75c _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
  ../../gcc/gcc/rtl-error.c:117
0xac552b extract_insn(rtx_def*)
  ../../gcc/gcc/recog.c:2204
0x8b919e instantiate_virtual_regs_in_insn
  ../../gcc/gcc/function.c:1614
0x8ba347 instantiate_virtual_regs
  ../../gcc/gcc/function.c:1934
0x8ba452 execute
  ../../gcc/gcc/function.c:1983
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.

Is this test case (or a similar one) in the gcc test suite?

If not, can you please add it to the test suite.



 2014-09-25  Chen Gang  gang.chen.5...@gmail.com

  * config/microblaze/microblaze.md (call_internal1): Use VOID
  instead of SI to fix ((void (*)(void)) 0)() issue.

 ---
   gcc/config/microblaze/microblaze.md | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/gcc/config/microblaze/microblaze.md b/gcc/config/microblaze/microblaze.md
 index b971737..3b4faf4 100644
 --- a/gcc/config/microblaze/microblaze.md
 +++ b/gcc/config/microblaze/microblaze.md
 @@ -2062,7 +2062,7 @@
 (set_attr "length" "4")])

   (define_insn "call_internal1"
 -  [(call (mem (match_operand:SI 0 "call_insn_simple_operand" "ri"))
 +  [(call (mem (match_operand:VOID 0 "call_insn_simple_operand" "ri"))
   (match_operand:SI 1 "" "i"))
 (clobber (reg:SI R_SR))]
 

I've verified that your patch does not cause any test suite regressions.


-- 
Michael Eager   ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Re: [PATCH][AArch64] LR register not used in leaf functions

2014-09-27 Thread Kugan


On 23/09/14 01:58, Jiong Wang wrote:
 On 22/09/14 16:43, Kugan wrote:
 
 AArch64 has the same issue ARM had, where the LR register was not used in
 leaf functions. This was reported in
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. On AArch64, the
 test case needs more live ranges before LR_REGNUM is needed, i.e. the
 test case in the PR needs additional loops, up to r31, to show this on
 AArch64.

 The same fix (from the thread
 https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went
 into ARM should apply to AArch64 as well. Regression tested on qemu for
 aarch64-none-linux-gnu with no new regressions. Is this OK for trunk?
 This would still be a partial fix. LR should be a caller-saved register
 that is free to use, provided it is saved properly across function calls.

Indeed. This should be improved in the generic code. Right now, if a
hard register is used in EPILOGUE_USES, it conflicts with all live
ranges until a call site kills it. I think we should have this patch
until the generic code can be improved.

Thanks,
Kugan

 
 I had a very similar patch to this sitting in my local tree, undergoing
 various benchmark analyses.
 
 -- Jiong

 Thanks,
 Kugan


 gcc/ChangeLog:

 2014-09-23  Kugan Vivekanandarajah  kug...@linaro.org

 * config/aarch64/aarch64.h (EPILOGUE_USES): Return true only after
 epilogue_completed is true.
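A minimal sketch of what this ChangeLog entry describes, modeled on the form used for ARM; it is not quoted from the AArch64 patch, and epilogue_completed here is a local stand-in for GCC's global flag. Until the epilogue exists, LR is not reported as used by it, so the register allocator may assign LR in leaf functions, as long as the prologue and epilogue then save and restore it.

```c
#include <assert.h>

/* AArch64's link register is x30.  */
#define LR_REGNUM 30

/* Stand-in for GCC's global flag, set once the prologue and epilogue
   have been emitted.  */
static int epilogue_completed;

/* Report LR as used by the epilogue only after the epilogue has been
   generated; before that, LR is an ordinary allocatable register.  */
#define EPILOGUE_USES(REGNO) \
  (epilogue_completed && (REGNO) == LR_REGNUM)
```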

 
 


Re: [PATCH] combine: Allow substituting the target reg of a clobber

2014-09-27 Thread Segher Boessenkool
On Mon, Sep 22, 2014 at 04:20:12PM -0600, Jeff Law wrote:
 Can you add a testcase which shows the 3-insn combination from PR62151 
 applying?

I've tried to make a stable future-proof testcase that does such a three-insn
combination.  Not easy at all.

But now it dawns on me: do you just want the actual testcase from the PR?
(Well, fixed so that it is valid C, I suppose).  With a test that combine
does its job, of course?  Not sure how to test that, but maybe I'll learn.

Or is a test showing the testcase working after the change good enough?


Segher


Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Jan Hubicka
 
 Hi.
 
 Thank you, Markus, for presenting the numbers; they correspond with what I
 measured. If I see correctly, the IPA ICF pass takes about 7 seconds, and
 the rest is distributed between the verifier (not interesting for a release
 version of the compiler) and 'phase opt and generate'. Any idea what can
 make the difference?

phase opt and generate just combines all the optimization times together, so
it is the same 7 seconds as in the ICF pass :)
1GB of function bodies just to eliminate 2-3% of code seems like quite a lot.
Do you have any idea how many of those turn out to be different?
It would be nice to be able to release the duplicate bodies from memory after
the equivalence is established.

Honza

 
 Martin
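The idea of releasing duplicate bodies as soon as equivalence is established can be sketched in C. Everything here is hypothetical: struct func and fold_identical are not GCC's sem_function/sem_item_optimizer API, and a plain byte comparison stands in for ICF's semantic comparison of GIMPLE bodies. The point is only the ordering: hash to find candidates, confirm with a full comparison, then free the duplicate immediately rather than keeping it until the pass ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a function symbol.  */
struct func
{
  const char *name;
  char *body;              /* Heap-allocated body; NULL once released.  */
  size_t len;
  unsigned hash;
  struct func *alias_of;   /* Set when folded into an equivalent function.  */
};

/* FNV-1a hash over the body bytes, used only to bucket candidates.  */
static unsigned
body_hash (const char *p, size_t n)
{
  unsigned h = 2166136261u;
  for (size_t i = 0; i < n; i++)
    {
      h ^= (unsigned char) p[i];
      h *= 16777619u;
    }
  return h;
}

/* Fold identical bodies and free each duplicate's body right away.
   Returns the number of functions folded.  O(n^2) for clarity.  */
static size_t
fold_identical (struct func *f, size_t n)
{
  size_t folded = 0;
  assert (f != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    f[i].hash = body_hash (f[i].body, f[i].len);
  for (size_t i = 0; i < n; i++)
    {
      if (f[i].alias_of)
        continue;
      for (size_t j = i + 1; j < n; j++)
        {
          if (f[j].alias_of || f[j].hash != f[i].hash || f[j].len != f[i].len)
            continue;
          if (memcmp (f[i].body, f[j].body, f[i].len) != 0)
            continue;  /* Hash collision: the bodies actually differ.  */
          f[j].alias_of = &f[i];
          free (f[j].body);  /* Release the duplicate body immediately.  */
          f[j].body = NULL;
          folded++;
        }
    }
  return folded;
}
```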


Re: [PATCH] microblaze: microblaze.md: Use VOID instead of SI to fix ((void (*)(void)) 0)() issue

2014-09-27 Thread Chen Gang

And excuse me, I am not very familiar with adding testcases to the
testsuite, so while I am trying, any related ideas, suggestions, or
completions are welcome.
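One possible shape for such a testcase is a DejaGnu compile-only test. Its location (for example under gcc/testsuite/gcc.target/microblaze/) and any file name are assumptions. The body is the reproducer from this thread, with an explicit void return type added. The test passes if the call compiles without the "unrecognizable insn" ICE.

```c
/* { dg-do compile } */

/* Calling through a null pointer to a void function: the call address
   is (const_int 0), which did not match call_internal1 before the
   SI -> VOID operand change and caused an "unrecognizable insn" ICE.  */

typedef void (*T) (void);

void
f1 (void)
{
  ((T) 0) ();
}
```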

Thanks.


-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed