Re: Turn DECL_SECTION_NAME into string

2014-06-17 Thread Thomas Schwinge
Hi!

On Thu, 12 Jun 2014 06:33:25 +0200, Jan Hubicka hubi...@ucw.cz wrote:
 this lengthy patch does the legwork to move section names out of the tree
 representation.
 Originally they were STRING_CST. I ended up implementing an on-the-side
 reference-counted string vocabulary that is done in a somewhat baroque way
 to be GGC and PCH safe (uff).

As reported in https://gcc.gnu.org/PR61508, this causes a build failure
with --enable-checking=fold:

/home/dimhen/src/gcc_current/gcc/fold-const.c: In function 'void 
fold_checksum_tree(const_tree, md5_ctx*, hash_table<pointer_hash<tree_node> >)':
/home/dimhen/src/gcc_current/gcc/fold-const.c:14863:55: error: cannot 
convert 'const char*' to 'const_tree {aka const tree_node*}' for argument '1' 
to 'void fold_checksum_tree(const_tree, md5_ctx*, 
hash_table<pointer_hash<tree_node> >)'
  fold_checksum_tree (DECL_SECTION_NAME (expr), ctx, ht);

From light testing the following seems to get around this -- is it the
appropriate fix?

diff --git gcc/fold-const.c gcc/fold-const.c
index 24daaa3..978b854 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -14859,8 +14859,6 @@ fold_checksum_tree (const_tree expr, struct md5_ctx 
*ctx,
  fold_checksum_tree (DECL_ABSTRACT_ORIGIN (expr), ctx, ht);
  fold_checksum_tree (DECL_ATTRIBUTES (expr), ctx, ht);
}
-  if (CODE_CONTAINS_STRUCT (TREE_CODE (expr), TS_DECL_WITH_VIS))
-   fold_checksum_tree (DECL_SECTION_NAME (expr), ctx, ht);
 
   if (CODE_CONTAINS_STRUCT (TREE_CODE (expr), TS_DECL_NON_COMMON))
{


Grüße,
 Thomas




[DOC Patch] Attribute 'naked'

2014-06-17 Thread David Wohlferd
I don't have permissions to commit this patch, but I do have a release 
on file with the FSF.


Problem description:
The docs for the function attribute 'naked' are confusing and 
self-contradictory.  Also, discussion on this thread 
https://gcc.gnu.org/ml/gcc/2014-05/msg00100.html has led to changing 
the text from the vague "avoid using" to the very clear "not supported" 
regarding the usage of Extended asm with 'naked'.  Lastly, this 
attribute should be mentioned when describing the differences between 
Basic and Extended asm.
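
For illustration, a minimal sketch of the usage the revised text is meant to
describe: a naked function whose body is a single Basic asm statement. The
function name answer and the fallback branch are assumptions made for this
example only; the instruction sequence assumes ARM state on ARMv4T or later
and the usual convention of returning an int in r0.

```c
#include <assert.h>

#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
/* Naked: the compiler emits no prologue/epilogue, so the body has to
   perform the return itself.  Only Basic asm (no operands) is used.  */
__attribute__((naked)) int answer(void)
{
  __asm__("mov r0, #42\n\t"
          "bx lr");
}
#else
/* Fallback so the sketch still builds on targets it does not cover.  */
int answer(void)
{
  return 42;
}
#endif
```

Mixing in C statements or Extended asm operands is exactly the usage the
proposed wording declares unsupported.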


ChangeLog:
2014-06-17  David Wohlferd d...@limegreensocks.com

* doc/extend.texi (Function Attributes): Update 'naked' 
attribute doc.


dw
Index: extend.texi
===
--- extend.texi	(revision 210624)
+++ extend.texi	(working copy)
@@ -3332,16 +3332,15 @@
 
 @item naked
 @cindex function without a prologue/epilogue code
-Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX and SPU
-ports to indicate that the specified function does not need prologue/epilogue
-sequences generated by the compiler.
-It is up to the programmer to provide these sequences. The
-only statements that can be safely included in naked functions are
-@code{asm} statements that do not have operands.  All other statements,
-including declarations of local variables, @code{if} statements, and so
-forth, should be avoided.  Naked functions should be used to implement the
-body of an assembly function, while allowing the compiler to construct
-the requisite function declaration for the assembler.
+This attribute is available on the ARM, AVR, MCORE, MSP430, NDS32,
+RL78, RX and SPU ports.  It allows the compiler to construct the
+requisite function declaration, while allowing the body of the
+function to be assembly code. The specified function will not have
+prologue/epilogue sequences generated by the compiler. Only Basic
+@code{asm} statements can safely be included in naked functions
+(@pxref{Basic Asm}). While using Extended @code{asm} or a mixture of
+Basic @code{asm} and ``C'' code may appear to work, they cannot be
+depended upon to work reliably and are not supported.
 
 @item near
 @cindex functions that do not handle memory bank switching on 68HC11/68HC12
@@ -6269,6 +6268,8 @@
 efficient code, and in most cases it is a better solution. When writing 
 inline assembly language outside of C functions, however, you must use Basic 
 @code{asm}. Extended @code{asm} statements have to be inside a C function.
+Functions declared with the @code{naked} attribute also require Basic 
+@code{asm} (@pxref{Function Attributes}).
 
 Under certain circumstances, GCC may duplicate (or remove duplicates of) your 
 assembly code when optimizing. This can lead to unexpected duplicate 
@@ -6388,6 +6389,8 @@
 
 Note that Extended @code{asm} statements must be inside a function. Only 
 Basic @code{asm} may be outside functions (@pxref{Basic Asm}).
+Functions declared with the @code{naked} attribute also require Basic 
+@code{asm} (@pxref{Function Attributes}).
 
 While the uses of @code{asm} are many and varied, it may help to think of an 
 @code{asm} statement as a series of low-level instructions that convert input 


Re: [PATCH] Fix PR61335

2014-06-17 Thread Uros Bizjak
On Fri, Jun 6, 2014 at 10:07 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Fri, Jun 6, 2014 at 9:47 AM, Uros Bizjak ubiz...@gmail.com wrote:

 2014-05-28  Richard Biener  rguent...@suse.de

 PR tree-optimization/61335
 * tree-vrp.c (vrp_visit_phi_node): If the compare of old and
 new range fails, drop to varying.

 * gfortran.dg/pr61335.f90: New testcase.

 This testcase triggers SIGFPE on alpha due to the use of denormal
 operand. Maybe uninitialized value is used in line 48?

 SIGFPE also triggers at the same place on x86_64 with unmasked FPE
 exceptions (compile with -O0).

Attached patch initializes problematic array to zero instead of
uninitialized value.

2014-06-17  Uros Bizjak  ubiz...@gmail.com

* gfortran.dg/pr61335.f90 (cp_unit_create): Initialize
unit_id and kind_id to zero.

Tested on alphaev68-linux-gnu and x86_64-linux-gnu.

OK for mainline?

Uros.

Index: gfortran.dg/pr61335.f90
===
--- gfortran.dg/pr61335.f90 (revision 211723)
+++ gfortran.dg/pr61335.f90 (working copy)
@@ -45,8 +45,8 @@
 LOGICAL  :: failure

 failure=.FALSE.
-unit_id=cp_units_none
-kind_id=cp_ukind_none
+unit_id=0
+kind_id=0
 power=0
 i_low=1
 i_high=1


[PATCH] PR54555: Use strict_low_part for loading a constant only if it is cheaper

2014-06-17 Thread Andreas Schwab
Postreload may transform (set (REGX) (CONST_INT A)) ... (set (REGX)
(CONST_INT B)) to (set (REGX) (CONST_INT A)) ... (set (STRICT_LOW_PART
(REGX)) (CONST_INT B)), but it should do that only if the latter is
cheaper.  On m68k, a full word load of a small constant with moveq is
cheaper than doing a byte load with move.b.

Tested on m68k-suse-linux and x86_64-suse-linux.  In both cases the size
of cc1* becomes smaller with this change.
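
As a rough model of the cost gate described above (not the postreload code
itself), the comparison can be sketched as follows. The struct and function
names are hypothetical stand-ins; the tie-breaking rule is an assumption
modelled on how a speed/size cost pair is typically compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a speed/size cost pair.  */
struct costs { int speed; int size; };

/* True if A is strictly cheaper than B.  The primary metric depends on
   whether we optimize for speed; the other metric breaks ties.  */
bool cheaper_p(const struct costs *a, const struct costs *b, bool speed)
{
  if (speed)
    return a->speed < b->speed
           || (a->speed == b->speed && a->size < b->size);
  return a->size < b->size
         || (a->size == b->size && a->speed < b->speed);
}

/* Replace the full-word load (old_cost) with the strict_low_part form
   (new_cost) only when the new form is strictly cheaper.  */
bool use_strict_low_part(struct costs old_cost, struct costs new_cost,
                         bool speed)
{
  return cheaper_p(&new_cost, &old_cost, speed);
}
```

With made-up numbers where the full-word load is the cheaper form (as with
moveq versus move.b), use_strict_low_part returns false and the insn is left
alone.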

Andreas.

PR rtl-optimization/54555
* postreload.c (move2add_use_add2_insn): Only substitute
STRICT_LOW_PART if it is cheaper.
---
 gcc/postreload.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/postreload.c b/gcc/postreload.c
index 9d71649..89f0c84 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -1805,10 +1805,14 @@ move2add_use_add2_insn (rtx reg, rtx sym, rtx off, rtx 
insn)
   gen_rtx_STRICT_LOW_PART (VOIDmode,
narrow_reg),
   narrow_src);
- changed = validate_change (insn, PATTERN (insn),
-new_set, 0);
- if (changed)
-   break;
+ get_full_set_rtx_cost (new_set, &newcst);
+ if (costs_lt_p (&newcst, &oldcst, speed))
+   {
+ changed = validate_change (insn, PATTERN (insn),
+new_set, 0);
+ if (changed)
+   break;
+   }
}
}
}
-- 
2.0.0

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


Re: Regimplification enhancements 1/3

2014-06-17 Thread Richard Biener
On Mon, Jun 16, 2014 at 11:52 PM, Mike Stump mikest...@comcast.net wrote:
 On Jun 16, 2014, at 10:49 AM, Bernd Schmidt ber...@codesourcery.com wrote:

 There are two reasons why I can't do this in the frontends - one, Joseph has 
 already rejected a C frontend patch,

 I’d like to think there is an acceptable way to get the right memory space on 
 things...

 and two, this needs to work with OpenACC offloading - i.e. code is initially 
 compiled by an x86 host compiler, then a ptx lto1 reads it in and needs to 
 make it valid for that target.

 Ah yes, that would do it, thanks.  I can see my port as an offload target…
 I’ll have to keep an eye on OpenACC and gcc.

But then IMHO using the gimplifier to do this fixup is wrong.  Please add
those required ADDR_SPACE_CONVERT_EXPRs in your pass manually.
After all you also have to adjust types of MEM_REFs and possibly
types of pointer variables (and pointer sizes?).

Richard.


Re: fix math wrt volatile-bitfields vs C++ model

2014-06-17 Thread Richard Biener
On Tue, Jun 17, 2014 at 4:08 AM, DJ Delorie d...@redhat.com wrote:

 Looks ok to me, but can you add a testcase please?

 I have a testcase, but if -flto the testcase doesn't include *any*
 definition of the test function, just all the LTO data.  Is this
 normal?

Without -ffat-lto-objects yes, this is normal.  If you are trying to
do a scan-assembler or so then this will be difficult with LTO.
If LTO is not necessary to trigger the bug and you just want to
use the torture I suggest to dg-skip-if -flto.
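
A sketch of what such a testcase header could look like; the function name
test_fn and the scan pattern are placeholders, not taken from the actual
testcase:

```c
/* { dg-do compile } */
/* { dg-skip-if "needs real assembly, not LTO bytecode" { *-*-* } { "-flto" } { "" } } */
/* { dg-final { scan-assembler "test_fn" } } */

int test_fn(int x)
{
  return x + 1;
}
```

The dg-skip-if line drops every torture variant whose options include -flto,
so the assembler scan only runs where the function body is actually emitted.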

 Also check if 4.9 is affected.

 It is...  same fix works, though.

Thanks,
Richard.


Re: [PATCH][genattrtab] Fix memory corruption, allocate enough memory for all bypassed reservations

2014-06-17 Thread Kyrill Tkachov


On 16/06/14 17:39, Jeff Law wrote:

On 06/16/14 04:12, Kyrill Tkachov wrote:


Doh, you're right. I did consider it but for some reason thought we
might want to iterate over all of the bypasses anyway. Breaking out
seems good.

How about this?
Tested on arm and aarch64 and confirmed with valgrind that no out of
bounds accesses occur.
I kicked off an x86_64 bootstrap but don't expect any problems.

Thanks,
Kyrill

genattrtab-bypasses.patch


commit 676b85f7a7cc1446482334dcaad457ac328875a8
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Fri Jun 13 11:09:57 2014 +0100

  [genattrtab] Fix memory corruption with bypasses

I'm an idiot.  n_bypassed is used to size the vector, so you do have to
walk the entire list.


AFAICS in the loop in process_bypasses we want to count all the 
reservations which have a bypass matching them. Once a reservation is 
matched with a bypass it should be safe to break out of the inner loop 
(over the bypasses), even if two bypasses match a reservation we only 
want to count the reservation once.
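
The counting rule can be sketched like this. It is a simplified model with
hypothetical names; exact string comparison stands in for whatever matching
genattrtab really performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the reservations that are matched by at least one bypass.
   A reservation is counted once, however many bypasses match it.  */
size_t count_bypassed(const char *const *reservs, size_t n_reservs,
                      const char *const *bypasses, size_t n_bypasses)
{
  size_t n = 0;
  for (size_t r = 0; r < n_reservs; r++)
    for (size_t b = 0; b < n_bypasses; b++)
      if (strcmp(reservs[r], bypasses[b]) == 0)
        {
          n++;
          break;  /* first match is enough; avoids double counting */
        }
  return n;
}
```

The outer loop still walks every reservation, so the resulting count is safe
to use for sizing the vector; only the inner loop is cut short.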


So I think the 2nd version of the patch is good

Thanks,
Kyrill


Jeff









Re: [PATCH, cprop] Check rtx_cost when propagating constant

2014-06-17 Thread Richard Biener
On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen
zhenqiang.c...@linaro.org wrote:
 Hi,

 For some large constants, ports like ARM need more than one instruction to
 operate on them, e.g.

 #define MASK 0xfe00ff
 void maskdata (int * data, int len)
 {
int i = len;
for (; i  0; i -= 2)
 {
   data[i] = MASK;
   data[i + 1] = MASK;
 }
 }

 Need two instructions for each AND operation:

 and r3, r3, #16711935
 bic r3, r3, #65536

 If we keep the MASK in a register, the loop2_invariant pass can hoist it
 out of the loop, and it can be shared by different references.

 So the patch skips constant propagation if it makes INSN's cost higher.
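
To make the arithmetic concrete: the and/bic pair quoted above computes the
same result as a single AND with MASK, because 16711935 is 0xff00ff and
clearing bit 16 (65536) turns that into 0xfe00ff. A small check, with function
names invented for this example:

```c
#include <assert.h>
#include <stdint.h>

#define MASK 0xfe00ffu

/* What the source asks for: one AND with the full constant.  */
uint32_t mask_one_step(uint32_t x)
{
  return x & MASK;
}

/* What the two-instruction sequence does.  */
uint32_t mask_two_steps(uint32_t x)
{
  x &= 16711935u;  /* and: 0x00ff00ff */
  x &= ~65536u;    /* bic: clear bit 16 (0x00010000) */
  return x;
}
```

Both forms give the same value for any input; the difference is only in how
many instructions sit inside the loop.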

So cprop undoes the work of invariant motion here?

Should we make sure we add a REG_EQUAL note when not propagating?

 Bootstrapped with no make check regressions on x86-64 and an ARM Chromebook.

 OK for trunk?

 Thanks!
 -Zhenqiang

 ChangeLog:
 2014-06-17  Zhenqiang Chen  zhenqiang.c...@linaro.org

 * cprop.c (try_replace_reg): Check cost for constants.

 diff --git a/gcc/cprop.c b/gcc/cprop.c
 index aef3ee8..c9cf02a 100644
 --- a/gcc/cprop.c
 +++ b/gcc/cprop.c
 @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn)
rtx src = 0;
int success = 0;
rtx set = single_set (insn);
 +  int old_cost = 0;
 +  bool copy_p = false;
 +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
 +
 +  if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
 +copy_p = true;
 +  else
 +old_cost = set_rtx_cost (set, speed);

Looks bogus for set == NULL?

Also what about register pressure?

I think this kind of change needs wider testing as RTX costs are
usually not fully implemented and you introduce a new use kind
(or is it already used elsewhere in this way to compute cost
difference of a set with s/reg/const?).

What kind of performance difference do you see?

Thanks,
Richard.

/* Usually we substitute easy stuff, so we won't copy everything.
   We however need to take care to not duplicate non-trivial CONST
 @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn)
to = copy_rtx (to);

validate_replace_src_group (from, to, insn);
 +
 +  /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
 + And it can be shared by different references.  So skip propagation if
 + it makes INSN's rtx cost higher.  */
 +  if (set && !copy_p && CONSTANT_P (to))
 +{
 +  int new_cost = set_rtx_cost (set, speed);
 +  if (new_cost > old_cost)
 +   {
 + cancel_changes (0);
 + return false;
 +   }
 +}
 +
if (num_changes_pending () && apply_change_group ())
  success = 1;


Re: Turn DECL_SECTION_NAME into string

2014-06-17 Thread Richard Biener
On Tue, Jun 17, 2014 at 8:40 AM, Thomas Schwinge
tho...@codesourcery.com wrote:
 Hi!

 On Thu, 12 Jun 2014 06:33:25 +0200, Jan Hubicka hubi...@ucw.cz wrote:
 this lengthy patch does the legwork to move section names out of the tree
 representation.
 Originally they were STRING_CST. I ended up implementing an on-the-side
 reference-counted string vocabulary that is done in a somewhat baroque way
 to be GGC and PCH safe (uff).

 As reported in https://gcc.gnu.org/PR61508, this causes a build failure
 with --enable-checking=fold:

 /home/dimhen/src/gcc_current/gcc/fold-const.c: In function 'void 
 fold_checksum_tree(const_tree, md5_ctx*, hash_table<pointer_hash<tree_node> >)':
 /home/dimhen/src/gcc_current/gcc/fold-const.c:14863:55: error: cannot 
 convert 'const char*' to 'const_tree {aka const tree_node*}' for argument '1' 
 to 'void fold_checksum_tree(const_tree, md5_ctx*, 
 hash_table<pointer_hash<tree_node> >)'
   fold_checksum_tree (DECL_SECTION_NAME (expr), ctx, ht);

 From light testing the following seems to get around this -- is it the
 appropriate fix?

Yes.  This is ok.

Thanks,
Richard.

 diff --git gcc/fold-const.c gcc/fold-const.c
 index 24daaa3..978b854 100644
 --- gcc/fold-const.c
 +++ gcc/fold-const.c
 @@ -14859,8 +14859,6 @@ fold_checksum_tree (const_tree expr, struct md5_ctx 
 *ctx,
   fold_checksum_tree (DECL_ABSTRACT_ORIGIN (expr), ctx, ht);
   fold_checksum_tree (DECL_ATTRIBUTES (expr), ctx, ht);
 }
 -  if (CODE_CONTAINS_STRUCT (TREE_CODE (expr), TS_DECL_WITH_VIS))
 -   fold_checksum_tree (DECL_SECTION_NAME (expr), ctx, ht);

if (CODE_CONTAINS_STRUCT (TREE_CODE (expr), TS_DECL_NON_COMMON))
 {


 Grüße,
  Thomas


[gomp4] Merge trunk r211693 (2014-06-16) into gomp-4_0-branch

2014-06-17 Thread Thomas Schwinge
Hi!

In r211726, I have committed a merge from trunk r211693 (2014-06-16) into
gomp-4_0-branch.


The LTO regression that appeared with an earlier merge,
http://news.gmane.org/find-root.php?message_id=%3C87wqf483pl.fsf%40schwinge.name%3E,
remains to be resolved:

 PASS: gcc.dg/lto/save-temps c_lto_save-temps_0.o assemble,  -O -flto 
-save-temps
-PASS: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o 
link,  -O -flto -save-temps
+FAIL: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o 
link,  -O -flto -save-temps
+UNRESOLVED: gcc.dg/lto/save-temps 
c_lto_save-temps_0.o-c_lto_save-temps_0.o execute  -O -flto -save-temps

Executing on host: [...]/build/gcc/xgcc -B[...]/build/gcc/  
-fno-diagnostics-show-caret -fdiagnostics-color=never   -O -flto -save-temps  
-c  -o c_lto_save-temps_0.o 
[...]/source/gcc/testsuite/gcc.dg/lto/save-temps_0.c(timeout = 300)
spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ -fno-diagnostics-show-caret 
-fdiagnostics-color=never -O -flto -save-temps -c -o c_lto_save-temps_0.o 
[...]/source/gcc/testsuite/gcc.dg/lto/save-temps_0.c
PASS: gcc.dg/lto/save-temps c_lto_save-temps_0.o assemble,  -O -flto 
-save-temps
Executing on host: [...]/build/gcc/xgcc -B[...]/build/gcc/ 
c_lto_save-temps_0.o  -fno-diagnostics-show-caret -fdiagnostics-color=never   
-O -flto -save-temps   -o gcc-dg-lto-save-temps-01.exe(timeout = 300)
spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ c_lto_save-temps_0.o 
-fno-diagnostics-show-caret -fdiagnostics-color=never -O -flto -save-temps -o 
gcc-dg-lto-save-temps-01.exe
[...]/build/gcc/xgcc @/tmp/ccjomvFW
[...]/build/gcc/xgcc @/tmp/ccAM0t6j
output is:
[...]/build/gcc/xgcc @/tmp/ccjomvFW
[...]/build/gcc/xgcc @/tmp/ccAM0t6j

FAIL: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o link, 
 -O -flto -save-temps
UNRESOLVED: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o 
execute  -O -flto -save-temps


Owing to the Fortran front end changes for OpenMP 4 User-Defined
Reductions, I have adapted the expected error messages for OpenACC as
follows.  While this is not critical, perhaps someone may want to improve
this later on; so noting this here for later reference.

--- gcc/testsuite/gfortran.dg/goacc/reduction.f95
+++ gcc/testsuite/gfortran.dg/goacc/reduction.f95
@@ -66,73 +66,73 @@ common /blk/ i1
 !$acc end parallel
 !$acc parallel reduction (*:ia1)   ! { dg-error "Assumed size" }
 !$acc end parallel
-!$acc parallel reduction (+:l1) ! { dg-error "must be of numeric type, got LOGICAL" }
+!$acc parallel reduction (+:l1) ! { dg-error "OMP DECLARE REDUCTION \\+ not found for type LOGICAL" }
 !$acc end parallel
-!$acc parallel reduction (*:la1)   ! { dg-error "must be of numeric type, got LOGICAL" }
+!$acc parallel reduction (*:la1)   ! { dg-error "OMP DECLARE REDUCTION \\* not found for type LOGICAL" }
 !$acc end parallel
-!$acc parallel reduction (-:a1) ! { dg-error "must be of numeric type, got CHARACTER" }
+!$acc parallel reduction (-:a1) ! { dg-error "OMP DECLARE REDUCTION - not found for type CHARACTER" }
 !$acc end parallel
-!$acc parallel reduction (+:t1) ! { dg-error "must be of numeric type, got TYPE" }
+!$acc parallel reduction (+:t1) ! { dg-error "OMP DECLARE REDUCTION \\+ not found for type TYPE" }
 !$acc end parallel
-!$acc parallel reduction (*:ta1)   ! { dg-error "must be of numeric type, got TYPE" }
+!$acc parallel reduction (*:ta1)   ! { dg-error "OMP DECLARE REDUCTION \\* not found for type TYPE" }
 !$acc end parallel
-!$acc parallel reduction (.and.:i3) ! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.and.:i3) ! { dg-error "OMP DECLARE REDUCTION \\.and\\. not found for type INTEGER" }
 !$acc end parallel
-!$acc parallel reduction (.or.:ia2) ! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.or.:ia2) ! { dg-error "OMP DECLARE REDUCTION \\.or\\. not found for type INTEGER" }
 !$acc end parallel
-!$acc parallel reduction (.eqv.:r1) ! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.eqv.:r1) ! { dg-error "OMP DECLARE REDUCTION \\.eqv\\. not found for type REAL" }
 !$acc end parallel
-!$acc parallel reduction (.neqv.:ra1)  ! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.neqv.:ra1)  ! { dg-error "OMP DECLARE REDUCTION \\.neqv\\. not found for type REAL" }
 !$acc end parallel
-!$acc parallel reduction (.and.:d1) ! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.and.:d1) ! { dg-error "OMP DECLARE REDUCTION \\.and\\. not found for type REAL" }
 !$acc end parallel
-!$acc parallel reduction (.or.:da1) ! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.or.:da1) ! { dg-error "OMP DECLARE REDUCTION \\.or\\. not found for type REAL" }
 !$acc end parallel
-!$acc parallel reduction (.eqv.:c1) ! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.eqv.:c1)! { 

[c++-concepts] Fix assertion failure with cp_maybe_constrained_type_specifier

2014-06-17 Thread Braden Obrzut
cp_maybe_constrained_type_specifier asserted that the decl passed in 
would be of type OVERLOAD, however a clean build of the compiler was 
broken since it could also be a BASELINK.  I'm not entirely sure when 
this is the case, except that it seems to happen with class member 
templates as it also caused a test case in my next patch to fail.  The 
solution is to check for a BASELINK and extract the functions from it.


The possibility of decl being a BASELINK is asserted near the call in 
cp_parser_template_id (cp_maybe_partial_concept_id just calls the 
function in question at this time).
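
The shape of the fix, shown on a toy node type rather than GCC's tree (all
names here are hypothetical): look through the wrapper first, then keep the
original assertion.

```c
#include <assert.h>
#include <stddef.h>

enum node_code { NODE_OVERLOAD, NODE_BASELINK };

struct node {
  enum node_code code;
  struct node *functions;  /* meaningful only for NODE_BASELINK */
};

/* Return the overload set, looking through a baselink wrapper if the
   caller handed us one.  */
struct node *overload_of(struct node *decl)
{
  if (decl->code == NODE_BASELINK)
    decl = decl->functions;

  assert(decl->code == NODE_OVERLOAD);
  return decl;
}
```

Callers that already pass an overload set are unaffected; only the wrapped
case takes the extra step.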


2014-06-17  Braden Obrzut  ad...@maniacsvault.net
* gcc/cp/parser.c (cp_maybe_constrained_type_specifier): Fix assertion
failure if baselink was passed in as decl.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1eaf863..40d1d63 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -15175,6 +15175,9 @@ cp_parser_allows_constrained_type_specifier (cp_parser *parser)
 static tree
 cp_maybe_constrained_type_specifier (cp_parser *parser, tree decl, tree args)
 {
+  if (BASELINK_P (decl))
+decl = BASELINK_FUNCTIONS (decl);
+
   gcc_assert (TREE_CODE (decl) == OVERLOAD);
   gcc_assert (args ? TREE_CODE (args) == TREE_VEC : true);
 


[PATCH] Testcase for PR61012

2014-06-17 Thread Richard Biener

From the new dup.

Committed to trunk and branch.

Richard.

2014-06-17  Richard Biener  rguent...@suse.de

PR lto/61012
* gcc.dg/lto/pr61526_0.c: New testcase.
* gcc.dg/lto/pr61526_1.c: Likewise.

Index: gcc/testsuite/gcc.dg/lto/pr61526_0.c
===
--- gcc/testsuite/gcc.dg/lto/pr61526_0.c(revision 0)
+++ gcc/testsuite/gcc.dg/lto/pr61526_0.c(working copy)
@@ -0,0 +1,6 @@
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -fPIC -flto -flto-partition=1to1 } } } */
+/* { dg-extra-ld-options { -shared } } */
+
+static void *master;
+void *foo () { return master; }
Index: gcc/testsuite/gcc.dg/lto/pr61526_1.c
===
--- gcc/testsuite/gcc.dg/lto/pr61526_1.c(revision 0)
+++ gcc/testsuite/gcc.dg/lto/pr61526_1.c(working copy)
@@ -0,0 +1,2 @@
+extern void *master;
+void *bar () { return master; }


[c++-concepts] Allow function parameters to be referenced in trailing requires clauses

2014-06-17 Thread Braden Obrzut
This patch allows function parameters to be referenced by trailing 
requires clauses.  Typically this is used to refer to the type of an 
implicitly generated template.  For example, the following should now be 
valid (where C is some previously defined concept):


auto f1 (auto x) requires C<decltype(x)> ();

Note that the test case trailing-requires-overload.C will fail to 
compile unless the previously submitted patch is applied first.


2014-06-17  Braden Obrzut  ad...@maniacsvault.net
* gcc/cp/parser.c (cp_parser_trailing_requirements): Handle requires
keyword manually so that we can push function parameters back into
scope.
* gcc/cp/decl.c (push_function_parms): New. Recovers and reopens
function parameter scope from declarator.
* gcc/testsuite/g++.dg/concepts/trailing-requires.C: New tests.
* gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C: New 
tests.
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 5d23bfa..aca3ce5 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5409,6 +5409,7 @@ extern bool defer_mark_used_calls;
 extern GTY(()) vec<tree, va_gc> *deferred_mark_used_calls;
 extern tree finish_case_label			(location_t, tree, tree);
 extern tree cxx_maybe_build_cleanup		(tree, tsubst_flags_t);
+extern void push_function_parms (cp_declarator *);
 
 /* in decl2.c */
 extern bool check_java_method			(tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 9791dba..5daccf8 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -13791,6 +13791,22 @@ store_parm_decls (tree current_function_parms)
 current_eh_spec_block = begin_eh_spec_block ();
 }
 
+/* Bring the parameters of a function declaration back into scope without
+   entering the function body. Declarator must be a function declarator.
+   Caller is responsible for calling finish_scope. */
+
+void
+push_function_parms (cp_declarator *declarator)
+{
+  begin_scope (sk_function_parms, NULL_TREE);
+
+  for (tree parms = declarator->u.function.parameters; parms != NULL_TREE
+       && !VOID_TYPE_P (TREE_VALUE (parms)); parms = TREE_CHAIN (parms))
+{
+  pushdecl (TREE_VALUE (parms));
+}
+}
+
 
 /* We have finished doing semantic analysis on DECL, but have not yet
generated RTL for its body.  Save away our current state, so that
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1eaf863..2d5862f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -16929,7 +16929,20 @@ cp_parser_trailing_requirements (cp_parser *parser, cp_declarator *decl)
 terse_reqs = get_shorthand_requirements (current_template_parms);
 
   // An optional requires clause can yield an additional constraint.
-  tree explicit_reqs = cp_parser_requires_clause_opt (parser);
+  tree explicit_reqs = NULL_TREE;
+  if (cp_lexer_next_token_is_keyword (parser->lexer, RID_REQUIRES))
+{
+  cp_lexer_consume_token (parser->lexer);
+
+  // Bring parms back into scope so requires clause can reference them.
+  ++cp_unevaluated_operand;
+  push_function_parms (decl);
+
+  explicit_reqs = cp_parser_requires_clause (parser);
+
+  finish_scope();
+  --cp_unevaluated_operand;
+}
 
   // If requirements were specified in either the implicit
   // template parameter list or an explicit requires clause,
diff --git a/gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C b/gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C
new file mode 100644
index 000..2fc6cdb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C
@@ -0,0 +1,115 @@
+// { dg-do run }
+// { dg-options "-std=c++1y" }
+
+#include <cassert>
+
+template<typename T>
+  concept bool C ()
+  {
+    return requires (T a, T b) { { a + b } -> T };
+  }
+
+template<typename T>
+  concept bool D ()
+  {
+    return requires (T a, T b) { { a - b } -> T };
+  }
+
+template<typename T>
+  concept bool M ()
+  {
+    return requires (T a, T b) { { a * b } -> T };
+  }
+
+template<typename T>
+  requires C<T> ()
+  struct Adds
+  {
+    Adds(T a) { v = a; }
+    T v;
+  };
+
+template<typename T>
+  Adds<T> operator+ (const Adds<T> a, const Adds<T> b)
+  {
+    return a.v + b.v;
+  }
+
+template<typename T>
+  requires D<T> ()
+  struct Subs
+  {
+    Subs(T a) { v = a; }
+    T v;
+  };
+
+template<typename T>
+  Subs<T> operator- (const Subs<T> a, const Subs<T> b)
+  {
+    return a.v - b.v;
+  }
+
+template<typename T>
+  requires M<T> ()
+  struct Mults
+  {
+    Mults(T a) { v = a; }
+    T v;
+  };
+
+template<typename T>
+  Mults<T> operator- (const Mults<T> a, const Mults<T> b)
+  {
+    return a.v * b.v;
+  }
+
+auto f1 (auto a, decltype(a) b) -> decltype(a) requires M<decltype(a)> ();
+auto f1 (auto a, decltype(a) b) -> decltype(a) requires D<decltype(a)> ();
+auto f1 (auto a, decltype(a) b) -> decltype(a) requires C<decltype(a)> ();
+
+struct S1
+{
+  auto f2 (auto a) -> decltype(a) requires C<decltype(a)> ();
+  auto f2 (auto a) -> decltype(a) requires D<decltype(a)> ();
+  auto f2 (auto a) -> 

Commit: MSP430: Add NOP after DINT in hardware multiply patterns

2014-06-17 Thread Nick Clifton
Hi Guys,

  I am checking in the patch below to update the hardware multiply
  patterns for the MSP430 so that there is a NOP instruction after
  disabling interrupts with the DINT instruction.  Timing issues mean
  that it is possible for the instruction following the DINT to be
  interrupted, so it has to be a NOP.  The change is going in to the
  mainline sources and the 4.9 branch.

Cheers
  Nick

gcc/ChangeLog
2014-06-17  Nick Clifton  ni...@redhat.com

* config/msp430/msp430.md (mulhisi3): Add a NOP after the DINT.
(umulhi3, mulsidi3, umulsidi3): Likewise.

Index: gcc/config/msp430/msp430.md
===
--- gcc/config/msp430/msp430.md (revision 211726)
+++ gcc/config/msp430/msp430.md (working copy)
@@ -1423,9 +1423,9 @@
   "optimize > 2 && msp430_hwmult_type != NONE"
   "*
     if (msp430_use_f5_series_hwmult ())
-      return \"PUSH.W sr { DINT { MOV.W %1, &0x04C2 { MOV.W %2, &0x04C8 { MOV.W &0x04CA, %L0 { MOV.W &0x04CC, %H0 { POP.W sr\";
+      return \"PUSH.W sr { DINT { NOP { MOV.W %1, &0x04C2 { MOV.W %2, &0x04C8 { MOV.W &0x04CA, %L0 { MOV.W &0x04CC, %H0 { POP.W sr\";
     else
-      return \"PUSH.W sr { DINT { MOV.W %1, &0x0132 { MOV.W %2, &0x0138 { MOV.W &0x013A, %L0 { MOV.W &0x013C, %H0 { POP.W sr\";
+      return \"PUSH.W sr { DINT { NOP { MOV.W %1, &0x0132 { MOV.W %2, &0x0138 { MOV.W &0x013A, %L0 { MOV.W &0x013C, %H0 { POP.W sr\";
   "
 )
 
@@ -1436,9 +1436,9 @@
   "optimize > 2 && msp430_hwmult_type != NONE"
   "*
     if (msp430_use_f5_series_hwmult ())
-      return \"PUSH.W sr { DINT { MOV.W %1, &0x04C0 { MOV.W %2, &0x04C8 { MOV.W &0x04CA, %L0 { MOV.W &0x04CC, %H0 { POP.W sr\";
+      return \"PUSH.W sr { DINT { NOP { MOV.W %1, &0x04C0 { MOV.W %2, &0x04C8 { MOV.W &0x04CA, %L0 { MOV.W &0x04CC, %H0 { POP.W sr\";
     else
-      return \"PUSH.W sr { DINT { MOV.W %1, &0x0130 { MOV.W %2, &0x0138 { MOV.W &0x013A, %L0 { MOV.W &0x013C, %H0 { POP.W sr\";
+      return \"PUSH.W sr { DINT { NOP { MOV.W %1, &0x0130 { MOV.W %2, &0x0138 { MOV.W &0x013A, %L0 { MOV.W &0x013C, %H0 { POP.W sr\";
   "
 )
 
@@ -1449,9 +1449,9 @@
   "optimize > 2 && msp430_hwmult_type != NONE"
   "*
     if (msp430_use_f5_series_hwmult ())
-      return \"PUSH.W sr { DINT { MOV.W %L1, &0x04D4 { MOV.W %H1, &0x04D6 { MOV.W %L2, &0x04E0 { MOV.W %H2, &0x04E2 { MOV.W &0x04E4, %A0 { MOV.W &0x04E6, %B0 { MOV.W &0x04E8, %C0 { MOV.W &0x04EA, %D0 { POP.W sr\";
+      return \"PUSH.W sr { DINT { NOP { MOV.W %L1, &0x04D4 { MOV.W %H1, &0x04D6 { MOV.W %L2, &0x04E0 { MOV.W %H2, &0x04E2 { MOV.W &0x04E4, %A0 { MOV.W &0x04E6, %B0 { MOV.W &0x04E8, %C0 { MOV.W &0x04EA, %D0 { POP.W sr\";
     else
-      return \"PUSH.W sr { DINT { MOV.W %L1, &0x0144 { MOV.W %H1, &0x0146 { MOV.W %L2, &0x0150 { MOV.W %H2, &0x0152 { MOV.W &0x0154, %A0 { MOV.W &0x0156, %B0 { MOV.W &0x0158, %C0 { MOV.W &0x015A, %D0 { POP.W sr\";
+      return \"PUSH.W sr { DINT { NOP { MOV.W %L1, &0x0144 { MOV.W %H1, &0x0146 { MOV.W %L2, &0x0150 { MOV.W %H2, &0x0152 { MOV.W &0x0154, %A0 { MOV.W &0x0156, %B0 { MOV.W &0x0158, %C0 { MOV.W &0x015A, %D0 { POP.W sr\";
   "
 )
 
@@ -1462,8 +1462,8 @@
   "optimize > 2 && msp430_hwmult_type != NONE"
   "*
     if (msp430_use_f5_series_hwmult ())
-      return \"PUSH.W sr { DINT { MOV.W %L1, &0x04D0 { MOV.W %H1, &0x04D2 { MOV.W %L2, &0x04E0 { MOV.W %H2, &0x04E2 { MOV.W &0x04E4, %A0 { MOV.W &0x04E6, %B0 { MOV.W &0x04E8, %C0 { MOV.W &0x04EA, %D0 { POP.W sr\";
+      return \"PUSH.W sr { DINT { NOP { MOV.W %L1, &0x04D0 { MOV.W %H1, &0x04D2 { MOV.W %L2, &0x04E0 { MOV.W %H2, &0x04E2 { MOV.W &0x04E4, %A0 { MOV.W &0x04E6, %B0 { MOV.W &0x04E8, %C0 { MOV.W &0x04EA, %D0 { POP.W sr\";
     else
-      return \"PUSH.W sr { DINT { MOV.W %L1, &0x0140 { MOV.W %H1, &0x0142 { MOV.W %L2, &0x0150 { MOV.W %H2, &0x0152 { MOV.W &0x0154, %A0 { MOV.W &0x0156, %B0 { MOV.W &0x0158, %C0 { MOV.W &0x015A, %D0 { POP.W sr\";
+      return \"PUSH.W sr { DINT { NOP { MOV.W %L1, &0x0140 { MOV.W %H1, &0x0142 { MOV.W %L2, &0x0150 { MOV.W %H2, &0x0152 { MOV.W &0x0154, %A0 { MOV.W &0x0156, %B0 { MOV.W &0x0158, %C0 { MOV.W &0x015A, %D0 { POP.W sr\";
   "
 )


[PATCH][match-and-simplify] Make gimple_fold_stmt_to_constant_1 dumping more useful

2014-06-17 Thread Richard Biener

Committed.

Richard.

2014-06-17  Richard Biener  rguent...@suse.de

* gimple-fold.c (gimple_fold_stmt_to_constant_1): Dump
simplified expression.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 211452)
+++ gcc/gimple-fold.c   (working copy)
@@ -2810,8 +2810,8 @@ gimple_fold_stmt_to_constant_1 (gimple s
{
  if (dump_file && dump_flags & TDF_DETAILS)
{
- fprintf (dump_file, "Match-and-simplified definition of ");
- print_generic_expr (dump_file, lhs, 0);
+ fprintf (dump_file, "Match-and-simplified ");
+ print_gimple_expr (dump_file, stmt, 0, TDF_SLIM);
  fprintf (dump_file, " to ");
  print_generic_expr (dump_file, res, 0);
  fprintf (dump_file, "\n");


Re: [PATCH, cprop] Check rtx_cost when propagating constant

2014-06-17 Thread Zhenqiang Chen
On 17 June 2014 16:15, Richard Biener richard.guent...@gmail.com wrote:
 On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen
 zhenqiang.c...@linaro.org wrote:
 Hi,

 For some large constants, ports like ARM need more than one instruction to
 operate on them, e.g.

 #define MASK 0xfe00ff
 void maskdata (int * data, int len)
 {
int i = len;
for (; i  0; i -= 2)
 {
   data[i] = MASK;
   data[i + 1] = MASK;
 }
 }

 Need two instructions for each AND operation:

 and r3, r3, #16711935
 bic r3, r3, #65536

 If we keep the MASK in a register, the loop2_invariant pass can hoist it
 out of the loop, and it can be shared by different references.

 So the patch skips constant propagation if it makes INSN's cost higher.

 So cprop undoes invariant motion's work here?

Yes. GLOBAL CONST-PROP will undo invariant motions.

 Should we make sure we add a REG_EQUAL note when not propagating?

The logs show there is already a REG_EQUAL note.

 Bootstrapped with no make check regressions on x86-64 and an ARM Chromebook.

 OK for trunk?

 Thanks!
 -Zhenqiang

 ChangeLog:
 2014-06-17  Zhenqiang Chen  zhenqiang.c...@linaro.org

 * cprop.c (try_replace_reg): Check cost for constants.

 diff --git a/gcc/cprop.c b/gcc/cprop.c
 index aef3ee8..c9cf02a 100644
 --- a/gcc/cprop.c
 +++ b/gcc/cprop.c
 @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn)
rtx src = 0;
int success = 0;
rtx set = single_set (insn);
 +  int old_cost = 0;
 +  bool copy_p = false;
 +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
 +
 +  if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
 +copy_p = true;
 +  else
 +old_cost = set_rtx_cost (set, speed);

 Looks bogus for set == NULL?

set_rtx_cost checks for that: if set is NULL, the function returns 0.

 Also what about register pressure?

Do you think it has big register pressure impact? I think it does not
increase register pressure.

 I think this kind of change needs wider testing as RTX costs are
 usually not fully implemented and you introduce a new use kind
 (or is it already used elsewhere in this way to compute cost
 difference of a set with s/reg/const?).

Passes like fwprop, cse and auto_inc_dec use RTX costs to make such
decisions. For example, the function attempt_change in auto-inc-dec.c has
code like:

  old_cost = (set_src_cost (mem, speed)
  + set_rtx_cost (PATTERN (inc_insn.insn), speed));
  new_cost = set_src_cost (mem_tmp, speed);
  ...
  if (old_cost < new_cost)
{
  ...
  return false;
}

The usage of RTX costs in this patch is similar.
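
The compare-and-cancel pattern discussed above can be sketched outside of GCC as a small toy model. Everything here is hypothetical and only for illustration: struct toy_insn, toy_cost and toy_try_replace are not GCC interfaces, and the cost rule (a constant wider than 8 bits costs one extra instruction) is an assumption standing in for a real target's rtx costs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy instruction with one source operand that is either a register
   or a constant.  Purely illustrative; not a GCC data structure.  */
struct toy_insn
{
  bool src_is_const;
  uint32_t value;   /* Register number or constant value.  */
};

/* Hypothetical cost model: a register operand, or a constant that fits
   in 8 bits, costs one instruction; any other constant costs two.  */
static int
toy_cost (const struct toy_insn *insn)
{
  if (!insn->src_is_const || insn->value <= 0xff)
    return 1;
  return 2;
}

/* Tentatively replace the register operand with CONSTANT and keep the
   change only if the instruction does not become more expensive.  */
static bool
toy_try_replace (struct toy_insn *insn, uint32_t constant)
{
  assert (insn);

  struct toy_insn saved = *insn;
  int old_cost = toy_cost (insn);

  insn->src_is_const = true;
  insn->value = constant;

  if (toy_cost (insn) > old_cost)
    {
      *insn = saved;   /* Plays the role of cancel_changes (0).  */
      return false;
    }
  return true;
}
```

Under this toy model, propagating 0xfe00ff into an instruction that currently reads a register is rejected and the instruction is left untouched, while propagating 0xff is accepted.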

I ran the x86-64 bootstrap and regression tests with
--enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java

And ARM bootstrap and regression tests with
--enable-languages=c,c++,fortran,lto,objc,obj-c++

I will run tests on i686. What other tests do you think I have to run?

 What kind of performance difference do you see?

I ran coremark, dhrystone and eembc on ARM Cortex-M4 (with some arm
backend changes). Coremark with some options shows a 10% performance
improvement. dhrystone is a little better. The eembc results fluctuate
somewhat, but the overall result is better.

I will run spec2000 on X86-64 and ARM, and get back to you about the
performance changes.

Thanks!
-Zhenqiang

 Thanks,
 Richard.

/* Usually we substitute easy stuff, so we won't copy everything.
   We however need to take care to not duplicate non-trivial CONST
 @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn)
to = copy_rtx (to);

validate_replace_src_group (from, to, insn);
 +
 +  /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
 + And it can be shared by different references.  So skip propagation if
 + it makes INSN's rtx cost higher.  */
 +  if (set && !copy_p && CONSTANT_P (to))
 +{
 +  int new_cost = set_rtx_cost (set, speed);
 +  if (new_cost > old_cost)
 +   {
 + cancel_changes (0);
 + return false;
 +   }
 +}
 +
if (num_changes_pending () && apply_change_group ())
  success = 1;


Re: [PATCH] Fix PR61335

2014-06-17 Thread Tobias Burnus
Uros Bizjak wrote:
 Attached patch initializes problematic array to zero instead of
 uninitialized value.

 2014-06-17  Uros Bizjak  ubiz...@gmail.com

 * gfortran.dg/pr61335.f90 (cp_unit_create): Initialize
 unit_id and kind_id to zero.

 Tested on alphaev68-linux-gnu and x86_64-linux-gnu.
 OK for mainline?

Looks good to me, is obvious and shouldn't affect the test case.

In particular, the variables in question aren't used in the
code after their initialization with an undefined, implicitly
declared variable, which is also otherwise unused.

Tobias


RE: [PATCH,MIPS] Remove unused code relating to reloading fcc

2014-06-17 Thread Matthew Fortune
Richard Sandiford rdsandif...@googlemail.com writes:
 Matthew Fortune matthew.fort...@imgtec.com writes:
  This is a small clean-up patch to remove code relating to reloading or moving
  mips fcc registers. At some point in the past these registers were allocated
  as part of register allocation but they are now statically allocated in the
  backend in a round robin fashion. The code for reloading them is therefore not
  necessary any more. The move costs are also irrelevant so are replaced with
  a comment instead (but the cases can just be deleted if that is preferred).
 
 I think removing the cases would be better.
 
 OK with that change.  Thanks for cleaning this up.

Re-posting as I missed removing the ST_REGS handling code from
mips_secondary_reload_class.

Is this still OK? Testsuite run on mips-unknown-linux-gnu shows no change
in pass/fail.

Regards,
Matthew

gcc/

* config/mips/mips-protos.h (mips_expand_fcc_reload): Remove.
* config/mips/mips.c (mips_expand_fcc_reload): Remove.
(mips_move_to_gpr_cost): Remove ST_REGS case.
(mips_move_from_gpr_cost): Likewise.
(mips_register_move_cost): Likewise.
(mips_secondary_reload_class): Likewise.

diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 0b8125a..0b32a70 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -232,7 +232,6 @@ extern bool mips_use_pic_fn_addr_reg_p (const_rtx);
 extern rtx mips_expand_call (enum mips_call_type, rtx, rtx, rtx, rtx, bool);
 extern void mips_split_call (rtx, rtx);
 extern bool mips_get_pic_call_symbol (rtx *, int);
-extern void mips_expand_fcc_reload (rtx, rtx, rtx);
 extern void mips_set_return_address (rtx, rtx);
 extern bool mips_move_by_pieces_p (unsigned HOST_WIDE_INT, unsigned int);
 extern bool mips_store_by_pieces_p (unsigned HOST_WIDE_INT, unsigned int);
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 585b755..cff1d38 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -7195,35 +7195,6 @@ mips_function_ok_for_sibcall (tree decl, tree exp ATTRIBUTE_UNUSED)
   return true;
 }
 

-/* Emit code to move general operand SRC into condition-code
-   register DEST given that SCRATCH is a scratch TFmode FPR.
-   The sequence is:
-
-   FP1 = SRC
-   FP2 = 0.0f
-   DEST = FP2 < FP1
-
-   where FP1 and FP2 are single-precision FPRs taken from SCRATCH.  */
-
-void
-mips_expand_fcc_reload (rtx dest, rtx src, rtx scratch)
-{
-  rtx fp1, fp2;
-
-  /* Change the source to SFmode.  */
-  if (MEM_P (src))
-src = adjust_address (src, SFmode, 0);
-  else if (REG_P (src) || GET_CODE (src) == SUBREG)
-src = gen_rtx_REG (SFmode, true_regnum (src));
-
-  fp1 = gen_rtx_REG (SFmode, REGNO (scratch));
-  fp2 = gen_rtx_REG (SFmode, REGNO (scratch) + MAX_FPRS_PER_FMT);
-
-  mips_emit_move (copy_rtx (fp1), src);
-  mips_emit_move (copy_rtx (fp2), CONST0_RTX (SFmode));
-  emit_insn (gen_slt_sf (dest, fp2, fp1));
-}
-

 /* Implement MOVE_BY_PIECES_P.  */
 
 bool
@@ -12044,10 +12015,6 @@ mips_move_to_gpr_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
   /* MFC1, etc.  */
   return 4;
 
-case ST_REGS:
-  /* LUI followed by MOVF.  */
-  return 4;
-
 case COP0_REGS:
 case COP2_REGS:
 case COP3_REGS:
@@ -12081,11 +12048,6 @@ mips_move_from_gpr_cost (enum machine_mode mode, reg_class_t to)
   /* MTC1, etc.  */
   return 4;
 
-case ST_REGS:
-  /* A secondary reload through an FPR scratch.  */
-  return (mips_register_move_cost (mode, GENERAL_REGS, FP_REGS)
- + mips_register_move_cost (mode, FP_REGS, ST_REGS));
-
 case COP0_REGS:
 case COP2_REGS:
 case COP3_REGS:
@@ -12117,9 +12079,6 @@ mips_register_move_cost (enum machine_mode mode,
   if (to == FP_REGS && mips_mode_ok_for_mov_fmt_p (mode))
/* MOV.FMT.  */
return 4;
-  if (to == ST_REGS)
-   /* The sequence generated by mips_expand_fcc_reload.  */
-   return 8;
 }
 
   /* Handle cases in which only one class deviates from the ideal.  */
@@ -12184,23 +12143,6 @@ mips_secondary_reload_class (enum reg_class rclass,
   if (ACC_REG_P (regno))
 return reg_class_subset_p (rclass, GR_REGS) ? NO_REGS : GR_REGS;
 
-  /* We can only copy a value to a condition code register from a
- floating-point register, and even then we require a scratch
- floating-point register.  We can only copy a value out of a
- condition-code register into a general register.  */
-  if (reg_class_subset_p (rclass, ST_REGS))
-{
-  if (in_p)
-   return FP_REGS;
-  return GP_REG_P (regno) ? NO_REGS : GR_REGS;
-}
-  if (ST_REG_P (regno))
-{
-  if (!in_p)
-   return FP_REGS;
-  return reg_class_subset_p (rclass, GR_REGS) ? NO_REGS : GR_REGS;
-}
-
   if (reg_class_subset_p (rclass, FP_REGS))
 {
   if (MEM_P (x)


Bug 61407 - Build errors on latest OS X 10.10 Yosemite with Xcode 6 on GCC 4.8.3

2014-06-17 Thread Илья Михальцов
Hello.

This patch fixes gcc build problems on the latest OS X 10.10 SDK beta (see 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61407)

fixincludes/ChangeLog
* inclhack.def (darwin14_has_feature): New fix
* fixincl.x: Regenerate
* tests/base/Availability.h: Added

gcc/ChangeLog
* config/darwin-c.c (version_as_macro): Added compatibility with
OS X 10.10 macro version macro and triplet
* config/darwin-driver.c (darwin_find_version_from_kernel): Bumped 
max kernel version

libsanitizer/ChangeLog
* sanitizer_common/sanitizer_platform_limits_posix.cc: Fixed
32-bit compatible dirent struct for OS X
* sanitizer_common/sanitizer_platform_limits_posix.h: Likewise

With regards, Ilya Mikhaltsou
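
For readers skimming the darwin-c.c hunk in this patch, the version mapping it implements ('10.4.2' becomes 1040, '10.10.0' becomes 101000) can be sketched as a standalone function. This is only a sketch that mirrors the logic of the posted hunk: toy_version_as_macro is a hypothetical name, it takes the version string and an output buffer as parameters instead of using darwin_macosx_version_min and a static buffer, and it returns an error code instead of printing a warning.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Convert a minimum-version string such as "10.4.2" into the digits
   used for the version macro.  OUT must hold at least 7 bytes.
   Returns 0 on success and -1 if the string cannot be parsed.  */
static int
toy_version_as_macro (const char *version_min, char out[7])
{
  size_t idx = 3;

  assert (version_min && out);
  if (strncmp (version_min, "10.", 3) != 0
      || !isdigit ((unsigned char) version_min[idx]))
    return -1;

  /* One-digit minor version: "10.N" maps to "10N0".  */
  strcpy (out, "1000");
  out[2] = version_min[idx++];
  if (isdigit ((unsigned char) version_min[idx]))
    {
      /* Two-digit minor version (10.10 and later): "10.NN" maps to
         "10NN00".  */
      out[3] = version_min[idx++];
      strcpy (out + 4, "00");
    }

  /* Only the end of the string or a patch-level suffix may follow.  */
  if (version_min[idx] != '\0' && version_min[idx] != '.')
    return -1;
  return 0;
}
```

Called with "10.4.2" the buffer holds "1040", and with "10.10.0" it holds "101000"; a string such as "11.0" is rejected.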


diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
index 6a1136c..b536080 100644
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -4751,4 +4751,33 @@ fix = {
 
 test_text = "extern char *\tsprintf();";
 };
+
+/*
+ * Fix stdio.h using C++ __has_feature built-in on OS X 10.10
+ */
+fix = {
+hackname  = darwin14_has_feature;
+files = Availability.h;
+mach  = *-*-darwin14.0*;
+
+c_fix = wrap;
+c_fix_arg = <<- _HasFeature_
+
+/*
+ * GCC doesn't support __has_feature built-in in C mode and
+ * using defined(__has_feature) && __has_feature in the same
+ * macro expression is not valid. So, easiest way is to define
+ * for this header __has_feature as a macro, returning 0, in case
+ * it is not defined internally
+ */
+#ifndef __has_feature
+#define __has_feature(x) 0
+#endif
+
+
+_HasFeature_;
+
+test_text = '';
+};
+
 /*EOF*/
diff --git a/fixincludes/tests/base/Availability.h b/fixincludes/tests/base/Availability.h
new file mode 100644
index 000..807c40d
--- /dev/null
+++ b/fixincludes/tests/base/Availability.h
@@ -0,0 +1,29 @@
+/*  DO NOT EDIT THIS FILE.
+
+It has been auto-edited by fixincludes from:
+
+   fixinc/tests/inc/Availability.h
+
+This had to be done to correct non-standard usages in the
+original, manufacturer supplied header file.  */
+
+#ifndef FIXINC_WRAP_AVAILABILITY_H_DARWIN14_HAS_FEATURE
+#define FIXINC_WRAP_AVAILABILITY_H_DARWIN14_HAS_FEATURE 1
+
+
+/* GCC doesn't support __has_feature built-in in C mode and
+ * using defined(__has_feature) && __has_feature in the same
+ * macro expression is not valid. So, easiest way is to define
+ * for this header __has_feature as a macro, returning 0, in case
+ * it is not defined internally
+ */
+#ifndef __has_feature
+#define __has_feature(x) 0
+#endif
+
+
+#if defined( DARWIN14_HAS_FEATURE_CHECK )
+
+#endif  /* DARWIN14_HAS_FEATURE_CHECK */
+
+#endif  /* FIXINC_WRAP_AVAILABILITY_H_DARWIN14_HAS_FEATURE */
diff --git a/gcc/config/darwin-c.c b/gcc/config/darwin-c.c
index 892ba35..39f795f 100644
--- a/gcc/config/darwin-c.c
+++ b/gcc/config/darwin-c.c
@@ -572,20 +572,31 @@ find_subframework_header (cpp_reader *pfile, const char *header, cpp_dir **dirp)
 
 /* Return the value of darwin_macosx_version_min suitable for the
__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ macro,
-   so '10.4.2' becomes 1040.  The lowest digit is always zero.
-   Print a warning if the version number can't be understood.  */
+   so '10.4.2' becomes 1040 and '10.10.0' becomes 101000.  The lowest
+   digit is always zero. Print a warning if the version number
+   can't be understood.  */
 static const char *
 version_as_macro (void)
 {
-  static char result[] = "1000";
+  static char result[7] = "1000";
+  int minorDigitIdx;
 
   if (strncmp (darwin_macosx_version_min, "10.", 3) != 0)
 goto fail;
   if (! ISDIGIT (darwin_macosx_version_min[3]))
 goto fail;
-  result[2] = darwin_macosx_version_min[3];
-  if (darwin_macosx_version_min[4] != '\0'
-   && darwin_macosx_version_min[4] != '.')
+
+  minorDigitIdx = 3;
+  result[2] = darwin_macosx_version_min[minorDigitIdx++];
+  if (ISDIGIT(darwin_macosx_version_min[minorDigitIdx])) {
+/* Starting with 10.10 numeration for mactro changed */
+result[3] = darwin_macosx_version_min[minorDigitIdx++];
+result[4] = '0';
+result[5] = '0';
+result[6] = '\0';
+  }
+  if (darwin_macosx_version_min[minorDigitIdx] != '\0'
+   && darwin_macosx_version_min[minorDigitIdx] != '.')
 goto fail;
 
   return result;
diff --git a/gcc/config/darwin-driver.c b/gcc/config/darwin-driver.c
index 8b6ae93..a115616 100644
--- a/gcc/config/darwin-driver.c
+++ b/gcc/config/darwin-driver.c
@@ -57,7 +57,7 @@ darwin_find_version_from_kernel (char *new_flag)
   version_p = osversion + 1;
   if (ISDIGIT (*version_p))
 major_vers = major_vers * 10 + (*version_p++ - '0');
-  if (major_vers > 4 + 9)
+  if (major_vers > 4 + 10)
 goto parse_failed;
   if (*version_p++ != '.')
 goto parse_failed;
diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc
index a93d38d..6783108 100644
--- 

Re: [PATCH,MIPS] Remove unused code relating to reloading fcc

2014-06-17 Thread Richard Sandiford
Matthew Fortune matthew.fort...@imgtec.com writes:
 Richard Sandiford rdsandif...@googlemail.com writes:
 Matthew Fortune matthew.fort...@imgtec.com writes:
  This is a small clean-up patch to remove code relating to reloading or moving
  mips fcc registers. At some point in the past these registers were allocated
  as part of register allocation but they are now statically allocated in the
  backend in a round robin fashion. The code for reloading them is therefore not
  necessary any more. The move costs are also irrelevant so are replaced with
  a comment instead (but the cases can just be deleted if that is preferred).
 
 I think removing the cases would be better.
 
 OK with that change.  Thanks for cleaning this up.

 Re-posting as I missed removing the ST_REGS handling code from
 mips_secondary_reload_class.

 Is this still OK? Testsuite run on mips-unknown-linux-gnu shows no change
 in pass/fail.

Yeah, looks good, thanks.

Richard


Re: fix math wrt volatile-bitfields vs C++ model

2014-06-17 Thread Bernd Edlinger
Hi,


On Tue, 17 Jun 2014 10:08:33, Richard Biener wrote:
 On Tue, Jun 17, 2014 at 4:08 AM, DJ Delorie d...@redhat.com wrote:

 Looks ok to me, but can you add a testcase please?

 I have a testcase, but if -flto the testcase doesn't include *any*
 definition of the test function, just all the LTO data.  Is this
 normal?
 
 Without -ffat-lto-objects yes, this is normal.  If you are trying to
 do a scan-assembler or so then this will be difficult with LTO.
 If LTO is not necessary to trigger the bug and you just want to
 use the torture I suggest to dg-skip-if -flto.
 
 Also check if 4.9 is affected.

 It is...  same fix works, though.
 
 Thanks,
 Richard.


If you have a test case where the generated code is actually different
with and without your patch, that would be interesting.

Please see gcc.dg/pr23623.c and gcc.dg/pr56997-4.c
for examples of how to automatically scan the intermediate code which is
generated by -fdump-rtl-final to check the expected access mode.
That should work for all targets, even if they have different assembler
syntax.


Thanks
Bernd.
  

[PATCH] Simplify collect_switch_conv_info

2014-06-17 Thread Richard Biener

This simplifies (and for me robustifies) finding of the final_bb.
The current code is somewhat odd in that it requires at least one
non-forwarder successor of a switch to transform.  The following
patch makes us simply pick the candidate from a random edge (I chose
the default edge) using either the successor or its successor if
the successor is a forwarder.
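
The selection rule described above can be sketched on a toy CFG. This is only an illustration of the guess made from the default edge: struct toy_bb and toy_guess_final_bb are hypothetical names, and the predecessor and successor counts stand in for GCC's single_pred_p and single_succ_p predicates.

```c
#include <assert.h>
#include <stddef.h>

/* Toy basic block.  Purely illustrative; not GCC's basic_block.  */
struct toy_bb
{
  int n_preds;
  int n_succs;
  struct toy_bb *succ;   /* Only meaningful when n_succs == 1.  */
};

/* Guess the common final block from the destination of the default
   edge: take the destination itself if it has several predecessors,
   or its single successor if the destination is a forwarder into a
   block with several predecessors.  Returns NULL if neither applies.  */
static struct toy_bb *
toy_guess_final_bb (struct toy_bb *default_dest)
{
  assert (default_dest != NULL);

  if (default_dest->n_preds != 1)
    return default_dest;
  if (default_dest->n_succs == 1 && default_dest->succ->n_preds != 1)
    return default_dest->succ;
  return NULL;
}
```

The guess is only a candidate: as in the patch, every switch destination still has to be checked against it afterwards.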

That fixes fallout of gcc.dg/tree-ssa/pr36881.c when removing
the early copyprop pass which happened to unconditionally run
a cfgcleanup.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-06-17  Richard Biener  rguent...@suse.de

* tree-switch-conversion.c (collect_switch_conv_info): Simplify
and allow all blocks to be forwarders.

Index: gcc/tree-switch-conversion.c
===
*** gcc/tree-switch-conversion.c(revision 211727)
--- gcc/tree-switch-conversion.c(working copy)
*** collect_switch_conv_info (gimple swtch,
*** 640,654 
info->other_count += e->count;
  
/* See if there is one common successor block for all branch
!  targets.  If it exists, record it in FINAL_BB.  */
!   FOR_EACH_EDGE (e, ei, info->switch_bb->succs)
! {
!   if (! single_pred_p (e->dest))
!   {
! info->final_bb = e->dest;
! break;
!   }
! }
if (info->final_bb)
  FOR_EACH_EDGE (e, ei, info->switch_bb->succs)
{
--- 640,655 
info->other_count += e->count;
  
/* See if there is one common successor block for all branch
!  targets.  If it exists, record it in FINAL_BB.
!  Start with the destination of the default case as guess
!  or its destination in case it is a forwarder block.  */
!   if (! single_pred_p (e_default->dest))
! info->final_bb = e_default->dest;
!   else if (single_succ_p (e_default->dest)
!   && ! single_pred_p (single_succ (e_default->dest)))
! info->final_bb = single_succ (e_default->dest);
!   /* Require that all switch destinations are either that common
!  FINAL_BB or a forwarder to it.  */
if (info->final_bb)
  FOR_EACH_EDGE (e, ei, info->switch_bb->succs)
{


[PATCH] Use vec::qsort where possible

2014-06-17 Thread Richard Biener

Just spotted these.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-06-17  Richard Biener  rguent...@suse.de

* genopinit.c (main): Use vec::qsort method.
* tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk):
Likewise.
* tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Likewise.

Index: gcc/genopinit.c
===
--- gcc/genopinit.c (revision 211698)
+++ gcc/genopinit.c (working copy)
@@ -357,8 +357,7 @@ main (int argc, char **argv)
 }
 
   /* Sort the collected patterns.  */
-  qsort (patterns.address (), patterns.length (),
-sizeof (pattern), pattern_cmp);
+  patterns.qsort (pattern_cmp);
 
   /* Now that we've handled the extra patterns, eliminate them from
  the optabs array.  That way they don't get in the way below.  */
Index: gcc/tree-ssa-loop-niter.c
===
--- gcc/tree-ssa-loop-niter.c   (revision 211698)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -3144,8 +3144,7 @@ discover_iteration_bound_by_body_walk (s
 fprintf (dump_file, " Trying to walk loop body to reduce the bound.\n");
 
   /* Sort the bounds in decreasing order.  */
-  qsort (bounds.address (), bounds.length (),
-sizeof (widest_int), wide_int_cmp);
+  bounds.qsort (wide_int_cmp);
 
   /* For every basic block record the lowest bound that is guaranteed to
  terminate the loop.  */
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 211698)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -2508,8 +2530,7 @@ vect_analyze_data_ref_accesses (loop_vec
  linear.  Don't modify the original vector's order, it is needed for
  determining what dependencies are reversed.  */
   vec<data_reference_p> datarefs_copy = datarefs.copy ();
-  qsort (datarefs_copy.address (), datarefs_copy.length (),
-sizeof (data_reference_p), dr_group_sort_cmp);
+  datarefs_copy.qsort (dr_group_sort_cmp);
 
   /* Build the interleaving chains.  */
   for (i = 0; i < datarefs_copy.length () - 1;)



Re: Make ipa-ref somewhat less stupid

2014-06-17 Thread Martin Liška


On 06/16/2014 10:01 AM, Jan Hubicka wrote:

On 06/10/2014 08:34 AM, Jan Hubicka wrote:

Hi,
ipa-reference is somewhat stupid and builds its data sets for all variables,
including addressable and public ones, just to prune them out after all bitmaps
are constructed. This used to make sense when the profile generation happened
at compile time, but since the ipa_ref data structure was introduced this is
nonsense.

Martin: It may be interesting to check if this solves the memory use issues with
chrome.  We also may be able to re-enable ipa-ref with profile-generate as
I think all the datastructures are considered to have address taken.

Hi,
there is a link to chromium stats: 
https://drive.google.com/file/d/0B0pisUJ80pO1VmNHeklCRWVkOUU/edit?usp=sharing

Both compilations were run with '-flto=6', where the upper graph adds
'-fprofile-generate'. Memory footprint is IMHO acceptable, but the compilation
process takes twice as long with profile generation. Yeah, chromium contains a
really big code base :)

Yep, I wonder why WPA takes so much longer. Do you think you can build lto1
with --enable-gather-detailed-mem-stats and relink with -fpre-ipa-mem-report
-fpost-ipa-mem-report -fmem-report -Q and send me the output?  It would be nice
to push Chromium under 4GB of WPA :)

Here is the report you requested:
https://drive.google.com/file/d/0B0pisUJ80pO1RlRRTVBxUG5vSlE/edit?usp=sharing ,
produced with -fno-profile-generate. With -fprofile-generate enabled, the WPA
stage does not fit into 24GB of memory when memory stats are enabled.

Martin



Thanks a lot!
Honza




Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Evgeny Stupachenko
Are i386 changes ok?
Patches with corresponding changes and new tests are attached.

Thanks,
Evgeny

On Thu, Jun 12, 2014 at 12:14 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Thu, Jun 12, 2014 at 6:04 AM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 Testing finished. No new regressions.
 Is the following patch ok?

 +  if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1 ||
 +  !vect_shift_permute_load_chain (dr_chain, size, stmt, gsi,
 result_chain))

 ||s and &&s go to the next line.

 I miss testcases that make sure the vectorizer/backend code-paths are
 both exercised.  Put them in gcc.target/i386 and provide an appropriate
 -march.

 The vectorizer changes are ok with the above fixed, I defer to backend
 maintainers for the i386 changes.

 Richard.

 2014-06-11  Evgeny Stupachenko  evstu...@gmail.com

 * config/i386/i386.c (ix86_reassociation_width): Add alternative for
 vector case.
 * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
 * config/i386/x86-tune.def (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
 * tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
 Introduces alternative way of loads group permutations.
 (vect_transform_grouped_load): Try alternative way of permutations.

 Thanks,
 Evgeny

 On Tue, Jun 10, 2014 at 4:43 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 ix86_reassociation_width checks INTEGRAL_MODE_P and FLOAT_MODE_P which
 include vector mode.
 I'll try to separate this into scalar and vector part, but it will
 require more testing (under the testing now).
 What about the rest of the patch?

 Thanks,
 Evgeny

 On Thu, Jun 5, 2014 at 3:54 PM, Ramana Radhakrishnan
 ramana.radhakrish...@arm.com wrote:
 On 06/05/14 12:43, Evgeny Stupachenko wrote:

 New hook is related to vector instructions only. Vector instructions
 could be sequential in pipeline, but scalar - parallel. For x86
 architectures TARGET_SCHED_REASSOC_WIDTH does not give required
 differentiation.
 General hooks could be potentially reused in other algorithms/by other
 architectures.


 It already takes a mode argument. Couldn't you use a vector mode to work
 this out ?

 If it is not enough then please be more specific about the documentation of
 this hook about where it is useful so that it's easy for people reading the
 documentation to understand at a glance what purpose it serves.


 Ramana



 Thanks,
 Evgeny

 On Thu, Jun 5, 2014 at 2:04 PM, Ramana Radhakrishnan
 ramana@googlemail.com wrote:

 On Wed, May 28, 2014 at 2:09 PM, Evgeny Stupachenko evstu...@gmail.com
 wrote:

 Hi,

 The patch introduces alternative way of permutations for load groups
 of size 2 and 3 which should be faster on architectures with low
 parallelism.
 The patch gives 2 times gain on Silvermont to the test from PR52252
 (in addition to already committed 3 times gain).

 Patch passes bootstrap on x86. Make check is in progress.


 Why do we need a new hook ? Can't you derive this information from
 something which is equally badly named TARGET_SCHED_REASSOC_WIDTH
 though used in the reassociation logic but also serves a similar
 purpose ?

 Also the documentation of this hook is incomplete at best and wrong at
 worst as this is not applied everywhere in the vectorizer but just for
 this special case for load store permuting. Implying this is useful
 everywhere in the vectorizer does not appear to be correct.

 regards
 Ramana





 ChangeLog:

 2014-05-28  Evgeny Stupachenko  evstu...@gmail.com

  * config/i386/i386.c (ix86_have_vector_parallel_execution):
 New.
  (TARGET_VECTORIZE_HAVE_VECTOR_PARALLEL_EXECUTION): New.
  * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
  * config/i386/x86-tune.def
 (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
  * target.def (have_vector_parallel_execution): New.
  * doc/tm.texi.in (have_vector_parallel_execution)): New.
  * doc/tm.texi: Regenerate.
  * targhooks.c (default_have_vector_parallel_execution): New.
  * tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
  Introduces alternative way of loads group permutations.
  (vect_transform_grouped_load): Try alternative way of
 permutations.

 Evgeny





vect_groups2.patch
Description: Binary data


i386tests.patch
Description: Binary data


Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Uros Bizjak
On Tue, Jun 17, 2014 at 2:33 PM, Evgeny Stupachenko evstu...@gmail.com wrote:

 Are i386 changes ok?
 Patches with corresponding changes and new tests are attached.

Please remove all target selectors from dg-options and dg-final
testcase directives, they are not needed inside gcc.dg/i386 directory.

The patch is OK with this change.

Thanks,
Uros.


[PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)

2014-06-17 Thread Richard Biener

First this adds a controlling option to the phiopt pass (-fssa-phiopt).
Second, this moves the first phiopt pass from the main optimization
pipeline into early opts (before merge-phi which confuses phiopt
but after dce which will help it).

ISTR that adding an early phiopt pass was wanted to perform CFG
cleanups on the weird CFG that the gimplifier produces from C++
code (but I fail to recollect the details nor remember a bug number).

Generally doing a phiopt before merge-phi gets the chance to screw
things up is good.  Also phiopt is a kind of cleanup that is
always beneficial as it decreases code-size.
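
For readers unfamiliar with the pass, the sketch below shows the kind of source-level shapes that produce a two-argument PHI of the sort phiopt looks at. The function names are hypothetical, and the comments describe the straight-line form each diamond is equivalent to; whether a given compiler version actually performs each replacement is an assumption, not something this message states.

```c
#include <assert.h>

/* if/else diamond selecting the smaller value; equivalent to a
   single minimum operation.  */
static int
toy_min_branchy (int a, int b)
{
  int x;
  if (a < b)
    x = a;
  else
    x = b;
  return x;
}

/* Diamond computing an absolute value; equivalent to a single
   absolute-value operation (undefined for INT_MIN, as with -a).  */
static int
toy_abs_branchy (int a)
{
  int x;
  if (a < 0)
    x = -a;
  else
    x = a;
  return x;
}

/* Diamond producing 0 or 1; equivalent to the comparison result
   itself, x = (a != 0).  */
static int
toy_nonzero_branchy (int a)
{
  int x;
  if (a != 0)
    x = 1;
  else
    x = 0;
  return x;
}

/* Small self-check that the diamonds compute what the comments say.  */
static void
toy_self_check (void)
{
  assert (toy_min_branchy (3, 5) == 3);
  assert (toy_abs_branchy (-4) == 4);
  assert (toy_nonzero_branchy (7) == (7 != 0));
}
```

Each function has one conditional branch and one join point, so removing the branch also removes two basic blocks, which is where the code-size benefit mentioned above comes from.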

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

I felt that -ftree-XXX is bad naming so I went for -fssa-XXX
even if that is now inconsistent.  Any opinion here?  For
RTL we simply have unsuffixed names so shall we instead go
for -fphiopt?  PHI implies SSA anyway and 'SSA' or 'RTL' is
an implementation detail that the user should not be interested
in (applies to tree- as well, of course).  Now, 'phiopt' is a
bad name when thinking of users (but they shouldn't play with
those options anyway).

So - comments on the pass move?  Comments on the flag naming?

Thanks,
Richard.

2014-06-17  Richard Biener  rguent...@suse.de

* passes.def (pass_all_early_optimizations): Add phi-opt
after dce and before merge-phi.
(pass_all_optimizations): Remove first phi-opt pass.
* common.opt (fssa-phiopt): New option.
* opts.c (default_options_table): Enable -fssa-phiopt with -O1+
but not with -Og.
* tree-ssa-phiopt.c (pass_phiopt): Add gate method.
* doc/invoke.texi (-fssa-phiopt): Document.

Index: gcc/passes.def
===
--- gcc/passes.def  (revision 211736)
+++ gcc/passes.def  (working copy)
@@ -73,8 +73,12 @@ along with GCC; see the file COPYING3.
 execute TODO_rebuild_alias at this point.  */
  NEXT_PASS (pass_build_ealias);
  NEXT_PASS (pass_fre);
- NEXT_PASS (pass_merge_phi);
  NEXT_PASS (pass_cd_dce);
+ NEXT_PASS (pass_phiopt);
+ /* Do this after phiopt runs as phiopt is confused by
+PHIs with more than two arguments.  Switch conversion
+looks for a single PHI block though.  */
+ NEXT_PASS (pass_merge_phi);
  NEXT_PASS (pass_early_ipa_sra);
  NEXT_PASS (pass_tail_recursion);
  NEXT_PASS (pass_convert_switch);
@@ -155,7 +159,6 @@ along with GCC; see the file COPYING3.
   NEXT_PASS (pass_cselim);
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_tree_ifcombine);
-  NEXT_PASS (pass_phiopt);
   NEXT_PASS (pass_tail_recursion);
   NEXT_PASS (pass_ch);
   NEXT_PASS (pass_stdarg);
Index: gcc/common.opt
===
--- gcc/common.opt  (revision 211736)
+++ gcc/common.opt  (working copy)
@@ -1950,6 +1950,10 @@ fsplit-wide-types
 Common Report Var(flag_split_wide_types) Optimization
 Split wide types into independent registers
 
+fssa-phiopt
+Common Report Var(flag_ssa_phiopt) Optimization
+Optimize conditional patterns using SSA PHI nodes
+
 fvariable-expansion-in-unroller
 Common Report Var(flag_variable_expansion_in_unroller) Optimization
 Apply variable expansion when loops are unrolled
Index: gcc/opts.c
===
--- gcc/opts.c  (revision 211736)
+++ gcc/opts.c  (working copy)
@@ -457,6 +457,7 @@ static const struct default_options defa
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fbranch_count_reg, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_invariants, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_pta, NULL, 1 },
+{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fssa_phiopt, NULL, 1 },
 
 /* -O2 optimizations.  */
 { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
Index: gcc/tree-ssa-phiopt.c
===
--- gcc/tree-ssa-phiopt.c   (revision 211736)
+++ gcc/tree-ssa-phiopt.c   (working copy)
@@ -2332,6 +2332,7 @@ public:
 
   /* opt_pass methods: */
   opt_pass * clone () { return new pass_phiopt (m_ctxt); }
+  virtual bool gate (function *) { return flag_ssa_phiopt; }
   virtual unsigned int execute (function *)
 {
   return tree_ssa_phiopt_worker (false, gate_hoist_loads ());
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 211736)
+++ gcc/doc/invoke.texi (working copy)
@@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -fselective-scheduling -fselective-scheduling2 @gol
 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
 -fshrink-wrap -fsignaling-nans -fsingle-precision-constant @gol
--fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
+-fsplit-ivs-in-unroller -fsplit-wide-types -fssa-phiopt 

Re: [PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)

2014-06-17 Thread pinskia


 On Jun 17, 2014, at 6:07 AM, Richard Biener rguent...@suse.de wrote:
 
 
 First this adds a controlling option to the phiopt pass (-fssa-phiopt).
 Second, this moves the first phiopt pass from the main optimization
 pipeline into early opts (before merge-phi which confuses phiopt
 but after dce which will help it).
 
 ISTR that adding an early phiopt pass was wanted to perform CFG
 cleanups on the weird CFG that the gimplifier produces from C++
 code (but I fail to recollect the details nor remember a bug number).
 
 Generally doing a phiopt before merge-phi gets the chance to screw
 things up is good.  Also phiopt is a kind of cleanup that is
 always beneficial as it decreases code-size.
 
 Bootstrap and regtest running on x86_64-unknown-linux-gnu.
 
 I felt that -ftree-XXX is bad naming so I went for -fssa-XXX
 even if that is now inconsistent.  Any opinion here?  For
 RTL we simply have unsuffixed names so shall we instead go
 for -fphiopt?  PHI implies SSA anyway and 'SSA' or 'RTL' is
 an implementation detail that the user should not be interested
 in (applies to tree- as well, of course).  Now, 'phiopt' is a
 bad name when thinking of users (but they shouldn't play with
 those options anyway).
 
 So - comments on the pass move?  Comments on the flag naming?
 
 Thanks,
 Richard.
 
 2014-06-17  Richard Biener  rguent...@suse.de
 
* passes.def (pass_all_early_optimizations): Add phi-opt
after dce and before merge-phi.
(pass_all_optimizations): Remove first phi-opt pass.
* common.opt (fssa-phiopt): New option.
* opts.c (default_options_table): Enable -fssa-phiopt with -O1+
but not with -Og.
* tree-ssa-phiopt.c (pass_phiopt): Add gate method.
* doc/invoke.texi (-fssa-phiopt): Document.
 
 Index: gcc/passes.def
 ===
 --- gcc/passes.def(revision 211736)
 +++ gcc/passes.def(working copy)
 @@ -73,8 +73,12 @@ along with GCC; see the file COPYING3.
 execute TODO_rebuild_alias at this point.  */
  NEXT_PASS (pass_build_ealias);
  NEXT_PASS (pass_fre);
 -  NEXT_PASS (pass_merge_phi);
  NEXT_PASS (pass_cd_dce);
 +  NEXT_PASS (pass_phiopt);
 +  /* Do this after phiopt runs as phiopt is confused by
 + PHIs with more than two arguments.  Switch conversion
 + looks for a single PHI block though.  */
 +  NEXT_PASS (pass_merge_phi);

I had made phiopt not be confused by more than two arguments. What has changed?
I think we should make phiopt handle more than two arguments well again.

Thanks,
Andrew


  NEXT_PASS (pass_early_ipa_sra);
  NEXT_PASS (pass_tail_recursion);
  NEXT_PASS (pass_convert_switch);
 @@ -155,7 +159,6 @@ along with GCC; see the file COPYING3.
   NEXT_PASS (pass_cselim);
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_tree_ifcombine);
 -  NEXT_PASS (pass_phiopt);
   NEXT_PASS (pass_tail_recursion);
   NEXT_PASS (pass_ch);
   NEXT_PASS (pass_stdarg);
 Index: gcc/common.opt
 ===
 --- gcc/common.opt(revision 211736)
 +++ gcc/common.opt(working copy)
 @@ -1950,6 +1950,10 @@ fsplit-wide-types
 Common Report Var(flag_split_wide_types) Optimization
 Split wide types into independent registers
 
 +fssa-phiopt
 +Common Report Var(flag_ssa_phiopt) Optimization
 +Optimize conditional patterns using SSA PHI nodes
 +
 fvariable-expansion-in-unroller
 Common Report Var(flag_variable_expansion_in_unroller) Optimization
 Apply variable expansion when loops are unrolled
 Index: gcc/opts.c
 ===
 --- gcc/opts.c(revision 211736)
 +++ gcc/opts.c(working copy)
 @@ -457,6 +457,7 @@ static const struct default_options defa
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fbranch_count_reg, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_invariants, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_pta, NULL, 1 },
 +{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fssa_phiopt, NULL, 1 },
 
 /* -O2 optimizations.  */
 { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
 Index: gcc/tree-ssa-phiopt.c
 ===
 --- gcc/tree-ssa-phiopt.c(revision 211736)
 +++ gcc/tree-ssa-phiopt.c(working copy)
 @@ -2332,6 +2332,7 @@ public:
 
   /* opt_pass methods: */
   opt_pass * clone () { return new pass_phiopt (m_ctxt); }
 +  virtual bool gate (function *) { return flag_ssa_phiopt; }
   virtual unsigned int execute (function *)
 {
   return tree_ssa_phiopt_worker (false, gate_hoist_loads ());
 Index: gcc/doc/invoke.texi
 ===
 --- gcc/doc/invoke.texi(revision 211736)
 +++ gcc/doc/invoke.texi(working copy)
 @@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -fselective-scheduling -fselective-scheduling2 @gol
 

Re: [PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)

2014-06-17 Thread Richard Biener
On Tue, 17 Jun 2014, pins...@gmail.com wrote:

 
 
  On Jun 17, 2014, at 6:07 AM, Richard Biener rguent...@suse.de wrote:
  
  
  First this adds a controlling option to the phiopt pass (-fssa-phiopt).
  Second, this moves the first phiopt pass from the main optimization
  pipeline into early opts (before merge-phi which confuses phiopt
  but after dce which will help it).
  
  ISTR that adding an early phiopt pass was wanted to perform CFG
  cleanups on the weird CFG that the gimplifier produces from C++
  code (but I fail to recollect the details nor remember a bug number).
  
  Generally doing a phiopt before merge-phi gets the chance to screw
  things up is good.  Also phiopt is a kind of cleanup that is
  always beneficial as it decreases code-size.
  
  Bootstrap and regtest running on x86_64-unknown-linux-gnu.
  
  I felt that -ftree-XXX is bad naming so I went for -fssa-XXX
  even if that is now inconsistent.  Any opinion here?  For
  RTL we simply have unsuffixed names so shall we instead go
  for -fphiopt?  PHI implies SSA anyway and 'SSA' or 'RTL' is
  an implementation detail that the user should not be interested
  in (applies to tree- as well, of course).  Now, 'phiopt' is a
  bad name when thinking of users (but they shouldn't play with
  those options anyway).
  
  So - comments on the pass move?  Comments on the flag naming?
  
  Thanks,
  Richard.
  
  2014-06-17  Richard Biener  rguent...@suse.de
  
 * passes.def (pass_all_early_optimizations): Add phi-opt
 after dce and before merge-phi.
 (pass_all_optimizations): Remove first phi-opt pass.
 * common.opt (fssa-phiopt): New option.
 * opts.c (default_options_table): Enable -fssa-phiopt with -O1+
 but not with -Og.
 * tree-ssa-phiopt.c (pass_phiopt): Add gate method.
 * doc/invoke.texi (-fssa-phiopt): Document.
  
  Index: gcc/passes.def
  ===
  --- gcc/passes.def(revision 211736)
  +++ gcc/passes.def(working copy)
  @@ -73,8 +73,12 @@ along with GCC; see the file COPYING3.
  execute TODO_rebuild_alias at this point.  */
   NEXT_PASS (pass_build_ealias);
   NEXT_PASS (pass_fre);
  -  NEXT_PASS (pass_merge_phi);
   NEXT_PASS (pass_cd_dce);
  +  NEXT_PASS (pass_phiopt);
  +  /* Do this after phiopt runs as phiopt is confused by
  + PHIs with more than two arguments.  Switch conversion
  + looks for a single PHI block though.  */
  +  NEXT_PASS (pass_merge_phi);
 
 I had made phiopt not be confused by more than two arguments. What has 
 changed?  I think we should make phiopt handle more than two arguments 
 well again.

I'm not sure - the above is just what I remember seeing, not currently
failing testcases.  I can certainly remove the comment - or are you
saying that phiopt may now benefit from merge_phi?  Then I can just as
well keep merge_phi where it is right now.

Richard.
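
To make the ordering question concrete, here is a minimal sketch of a gated
pass inside a fixed pipeline.  It is not GCC's pass manager: pass_sketch and
early_opts_sketch are hypothetical names, and the "enabled" field only models
what the proposed gate () method would return for -fssa-phiopt.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an optimization pass; not GCC's opt_pass class.
struct pass_sketch
{
  std::string name;
  bool enabled;   // models the value gate () would return for this pass
};

// Names of the passes that would execute, in pipeline order, for the
// early-opts ordering proposed in the patch (fre, cd_dce, phiopt, merge_phi).
std::vector<std::string> early_opts_sketch (bool flag_ssa_phiopt)
{
  const std::vector<pass_sketch> pipeline = {
    { "fre", true },
    { "cd_dce", true },
    { "phiopt", flag_ssa_phiopt },   // gated by the new option
    { "merge_phi", true },
  };

  std::vector<std::string> executed;
  for (const pass_sketch &p : pipeline)
    if (p.enabled)
      executed.push_back (p.name);
  return executed;
}

// Small usage check: phiopt runs before merge_phi only when the flag is set.
void check_gate_sketch ()
{
  assert (early_opts_sketch (true).size () == 4);
  assert (early_opts_sketch (true)[2] == "phiopt");
  assert (early_opts_sketch (false).size () == 3);
  assert (early_opts_sketch (false)[2] == "merge_phi");
}
```

With the flag off the rest of the pipeline is unchanged, which is all the gate
is meant to guarantee; whether merge_phi should stay ahead of phiopt is the
open question in the thread.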

 Thanks,
 Andrew
 
 
   NEXT_PASS (pass_early_ipa_sra);
   NEXT_PASS (pass_tail_recursion);
   NEXT_PASS (pass_convert_switch);
  @@ -155,7 +159,6 @@ along with GCC; see the file COPYING3.
NEXT_PASS (pass_cselim);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_tree_ifcombine);
  -  NEXT_PASS (pass_phiopt);
NEXT_PASS (pass_tail_recursion);
NEXT_PASS (pass_ch);
NEXT_PASS (pass_stdarg);
  Index: gcc/common.opt
  ===
  --- gcc/common.opt(revision 211736)
  +++ gcc/common.opt(working copy)
  @@ -1950,6 +1950,10 @@ fsplit-wide-types
  Common Report Var(flag_split_wide_types) Optimization
  Split wide types into independent registers
  
  +fssa-phiopt
  +Common Report Var(flag_ssa_phiopt) Optimization
  +Optimize conditional patterns using SSA PHI nodes
  +
  fvariable-expansion-in-unroller
  Common Report Var(flag_variable_expansion_in_unroller) Optimization
  Apply variable expansion when loops are unrolled
  Index: gcc/opts.c
  ===
  --- gcc/opts.c(revision 211736)
  +++ gcc/opts.c(working copy)
  @@ -457,6 +457,7 @@ static const struct default_options defa
  { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fbranch_count_reg, NULL, 1 },
  { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_invariants, NULL, 1 },
  { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_pta, NULL, 1 },
  +{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fssa_phiopt, NULL, 1 },
  
  /* -O2 optimizations.  */
  { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
  Index: gcc/tree-ssa-phiopt.c
  ===
  --- gcc/tree-ssa-phiopt.c(revision 211736)
  +++ gcc/tree-ssa-phiopt.c(working copy)
  @@ -2332,6 +2332,7 @@ public:
  
/* opt_pass methods: */
opt_pass * clone () { return new pass_phiopt (m_ctxt); }
  +  virtual bool gate (function *) { return flag_ssa_phiopt; }
virtual unsigned 

Re: [PATCH, Pointer Bounds Checker 28/x] IPA CP

2014-06-17 Thread Martin Jambor
Hi,

On Wed, Jun 11, 2014 at 05:47:36PM +0400, Ilya Enkovich wrote:

 Here is the fixed version.

I'm fine with the ipa-cp hunks but I cannot approve them, Honza is the
right person to ask.

Thanks,

Martin


 
 Thanks,
 Ilya
 --
 gcc/
 
 2014-06-11  Ilya Enkovich  ilya.enkov...@intel.com
 
   * cgraph.h (cgraph_local_p): New.
   * ipa-cp.c (initialize_node_lattices): Use cgraph_local_p
   to handle instrumentation clones properly.
   (propagate_constants_accross_call): Do not propagate
   through instrumentation thunks.
 
 
 diff --git a/gcc/cgraph.h b/gcc/cgraph.h
 index 5e702a7..b225ebe 100644
 --- a/gcc/cgraph.h
 +++ b/gcc/cgraph.h
 @@ -1556,4 +1556,17 @@ symtab_in_same_comdat_p (symtab_node *one, symtab_node 
 *two)
  {
return DECL_COMDAT_GROUP (one->decl) == DECL_COMDAT_GROUP (two->decl);
  }
 +
 +/* Return true if NODE is local.  Instrumentation clones are counted as local
 +   only when original function is local.  */
 +
 +static inline bool
 +cgraph_local_p (cgraph_node *node)
 +{
 +  if (!node->instrumentation_clone || !node->instrumented_version)
 +    return node->local.local;
 +
 +  return node->local.local && node->instrumented_version->local.local;
 +}
 +
  #endif  /* GCC_CGRAPH_H  */
 diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
 index 689378a..4318789 100644
 --- a/gcc/ipa-cp.c
 +++ b/gcc/ipa-cp.c
 @@ -699,7 +699,7 @@ initialize_node_lattices (struct cgraph_node *node)
int i;
  
gcc_checking_assert (cgraph_function_with_gimple_body_p (node));
 -  if (!node->local.local)
 +  if (!cgraph_local_p (node))
  {
/* When cloning is allowed, we can assume that externally visible
functions are not called.  We will compensate this by cloning
 @@ -1434,6 +1434,24 @@ propagate_constants_accross_call (struct cgraph_edge 
 *cs)
if (parms_count == 0)
  return false;
  
 +  /* No propagation through instrumentation thunks is available yet.
 + It should be possible with proper mapping of call args and
 + instrumented callee params in the propagation loop below.  But
 + this case mostly occurs when legacy code calls instrumented code
 + and it is not a primary target for optimizations.
 + We detect instrumentation thunks in aliases and thunks chain by
 + checking instrumentation_clone flag for chain source and target.
 + Going through instrumentation thunks we always have it changed
 + from 0 to 1 and all other nodes do not change it.  */
 +  if (!cs->callee->instrumentation_clone
 +      && callee->instrumentation_clone)
 +{
 +  for (i = 0; i < parms_count; i++)
 + ret |= set_all_contains_variable (ipa_get_parm_lattices (callee_info,
 +  i));
 +  return ret;
 +}
 +
/* If this call goes through a thunk we must not propagate to the first 
 (0th)
   parameter.  However, we might need to uncover a thunk from below a 
 series
   of aliases first.  */
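
The locality rule in the cgraph.h hunk can be sketched in isolation as
follows.  This is only an illustration: node_sketch and local_p_sketch are
hypothetical stand-ins for cgraph_node and cgraph_local_p, and the "&&" in the
final return is an assumption read from the comment ("counted as local only
when original function is local").

```cpp
#include <cassert>

// Hypothetical stand-in for the few cgraph_node fields the helper reads.
struct node_sketch
{
  bool local;                              // models node->local.local
  bool instrumentation_clone;              // node is an instrumentation clone
  const node_sketch *instrumented_version; // the paired node, or nullptr
};

// An ordinary node is local when its own flag says so.  An instrumentation
// clone is local only if both it and its paired node are local.
bool local_p_sketch (const node_sketch &node)
{
  if (!node.instrumentation_clone || !node.instrumented_version)
    return node.local;

  return node.local && node.instrumented_version->local;
}

// Small usage check covering the three interesting cases.
void check_local_p_sketch ()
{
  const node_sketch exported_orig = { false, false, nullptr };
  const node_sketch local_orig = { true, false, nullptr };
  const node_sketch clone_of_exported = { true, true, &exported_orig };
  const node_sketch clone_of_local = { true, true, &local_orig };

  assert (local_p_sketch (local_orig));
  assert (!local_p_sketch (clone_of_exported));
  assert (local_p_sketch (clone_of_local));
}
```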


Compile gcc.target/i386/fuse-caller-save.c with -fomit-frame-pointer (PR target/61533)

2014-06-17 Thread Rainer Orth
gcc.target/i386/fuse-caller-save.c currently FAILs on Solaris/x86 with
gas and -m64:

FAIL: gcc.target/i386/fuse-caller-save.c scan-assembler-not .cfi_def_cfa_offset
FAIL: gcc.target/i386/fuse-caller-save.c scan-assembler-not .cfi_offset

Fixed as follows as suggested and pre-approved by Uros in the PR.
Tested with the appropriate runtest invocations on i386-pc-solaris2.11
and x86_64-unknown-linux-gnu, installed on mainline.

Rainer


2014-06-17  Rainer Orth  r...@cebitec.uni-bielefeld.de

PR target/61533
* gcc.target/i386/fuse-caller-save.c: Add -fomit-frame-pointer to
dg-options.

diff --git a/gcc/testsuite/gcc.target/i386/fuse-caller-save.c b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
--- a/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
+++ b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fuse-caller-save" } */
+/* { dg-options "-O2 -fuse-caller-save -fomit-frame-pointer" } */
 /* { dg-additional-options "-mregparm=1" { target ia32 } } */
 
 /* Testing -fuse-caller-save optimization option.  */


-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Bug 61407 - Build errors on latest OS X 10.10 Yosemite with Xcode 6 on GCC 4.8.3

2014-06-17 Thread Bernhard Reutner-Fischer

On 17 June 2014 13:10:07 Илья Михальцов morph...@gmail.com wrote:


index 892ba35..39f795f 100644
--- a/gcc/config/darwin-c.c
+++ b/gcc/config/darwin-c.c
@@ -572,20 +572,31 @@ find_subframework_header (cpp_reader *pfile, const 
char *header, cpp_dir **dirp)


 /* Return the value of darwin_macosx_version_min suitable for the
__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ macro,
-   so '10.4.2' becomes 1040.  The lowest digit is always zero.
-   Print a warning if the version number can't be understood.  */
+   so '10.4.2' becomes 1040 and '10.10.0' becomes 101000.  The lowest
+   digit is always zero. Print a warning if the version number
+   can't be understood.  */
 static const char *
 version_as_macro (void)
 {
-  static char result[] = "1000";
+  static char result[7] = "1000";
+  int minorDigitIdx;

   if (strncmp (darwin_macosx_version_min, "10.", 3) != 0)
 goto fail;
   if (! ISDIGIT (darwin_macosx_version_min[3]))
 goto fail;
-  result[2] = darwin_macosx_version_min[3];
-  if (darwin_macosx_version_min[4] != '\0'
-      && darwin_macosx_version_min[4] != '.')
+
+  minorDigitIdx = 3;
+  result[2] = darwin_macosx_version_min[minorDigitIdx++];
+  if (ISDIGIT(darwin_macosx_version_min[minorDigitIdx])) {
+/* Starting with 10.10 numeration for mactro changed */


What does mactro mean? macro?
Thanks,
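
For reference, the mapping described in the quoted comment can be sketched as
below.  This is not the darwin-c.c code: version_as_macro_sketch is a
hypothetical stand-alone helper that reproduces only the two documented
examples ('10.4.2' becomes 1040, '10.10.0' becomes 101000).  How a non-zero
patch level is encoded under the new scheme is not visible in the quoted
hunk, so the sketch always emits zeros there.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not the GCC implementation.  Returns an empty string
// when the version is not understood.
std::string version_as_macro_sketch (const std::string &version)
{
  // Only "10.<minor>[.<anything>]" is handled, as in the quoted function.
  if (version.compare (0, 3, "10.") != 0)
    return "";

  // Collect the minor version digits (one or two of them).
  std::size_t i = 3;
  std::string minor;
  while (i < version.size ()
         && std::isdigit (static_cast<unsigned char> (version[i])))
    minor += version[i++];

  if (minor.empty () || minor.size () > 2)
    return "";
  if (i < version.size () && version[i] != '.')
    return "";

  // One-digit minor: "10" + minor + "0", e.g. 10.4.2 -> 1040.
  if (minor.size () == 1)
    return "10" + minor + "0";

  // Two-digit minor: "10" + minor + "00", e.g. 10.10.0 -> 101000.
  return "10" + minor + "00";
}

// Small usage check against the two examples from the comment.
void check_version_sketch ()
{
  assert (version_as_macro_sketch ("10.4.2") == "1040");
  assert (version_as_macro_sketch ("10.10.0") == "101000");
}
```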






Re: Another AIX Bootstrap failure

2014-06-17 Thread David Edelsohn
On Mon, Jun 16, 2014 at 11:44 PM, Jan Hubicka hubi...@ucw.cz wrote:

 The linker is not seeing the local definition of
 ._ZN14__gnu_parallel9_SettingsC1Ev.  libstdc++ is built with
 Linux-like semantics, so it allows symbols to be overridden. AIX calls
 everything through the PLT. But the real definition of the function is

 Even static functions?

 not being seen.

 I'm not exactly sure why inlining is changing this and what these extra
 levels of indirection are trying to accomplish. The visibility of the

 To avoid using PLT and GOT when the unit refers to the symbol and we know
 that interposition does not matter.

I am not certain if the linker is creating the PLT stub code because
it wants to allow interposition or because it cannot see a definition
of the function and wants to allow for some other shared library to
provide the definition at runtime.

 Why does a branch to a non-global (static) symbol
   b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
 lead to a PLT stub here, and why does branching to such symbols seem to
 work otherwise?

Branching to non-global (static) symbol, even an alias, is working
here. The weak function seems to be the problem.

 The failing branch is
 b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
 so the call to static construction seems to have happened correctly, but
 we cannot get right the call from the constructor to the static function
 (which is an alias of a global symbol).

The linker appears to not want to resolve the weak function. If I
change ._ZN14__gnu_parallel9_SettingsC1Ev to lglobl, it works. If I
change the static constructor to call the weak function directly,
avoiding the alias, it shows the same failure mode.

I don't know what code generation looked like before.  Was GCC
generating calls to weak functions within the same file?

Thanks, David


Re: Regimplification enhancements 3/3

2014-06-17 Thread Martin Jambor
On Mon, Jun 16, 2014 at 01:38:49PM +0200, Richard Biener wrote:
 On Mon, Jun 16, 2014 at 12:57 PM, Bernd Schmidt ber...@codesourcery.com 
 wrote:
  There's code in regimplification that makes us use an extra temporary
  when we encounter a call returning a non-BLKmode structure. This seems
  somewhat inefficient and unnecessary, and when used from the
  lower-addr-spaces pass I'm working on it leads to problems further
  down that look like tree-ssa bugs that I wasn't able to clearly
  disentangle.
 
  Here's what happens on compile/pr51761.c.  Regimplification has the
  following effect, creating an extra temporary _6:
 
  -  D.1378 = fooD.1373 (aD.1377);
  +  _6 = fooD.1373 (aD.1377);
  +  # .MEMD.1382 = VDEF .MEMD.1382
  +  D.1378 = _6;
 
  SRA turns this into:
 
_6 = fooD.1373 (aD.1377);
# VUSE .MEM_3
SR$2_7 = MEM[(struct S *)_6];
 
 clearly bogus - _6 is a register, you can't use a MEM on it.

Weird... does the following (untested) patch help?

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 0afa197..747b1b6 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3277,6 +3277,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator 
*gsi)
 
   if (modify_this_stmt
   || gimple_has_volatile_ops (*stmt)
+  || is_gimple_reg (lhs)
+  || is_gimple_reg (rhs)
   || contains_vce_or_bfcref_p (rhs)
   || contains_vce_or_bfcref_p (lhs)
   || stmt_ends_bb_p (*stmt))

It is just a quick thought though.  If it does not, could you post the
access trees dumped by -fdump-tree-esra-details or
-fdump-tree-sra-details (depending on whether this is early or late
SRA)?  Or is it simple to set it up locally?

Thanks,

Martin

 
  Somehow, the address of _6 doesn't count as a use, and the DCE pass decides
  it is unused:
 
Eliminating unnecessary statements:
Deleting LHS of call: _6 = foo (a);
 
  However, the statement
SR$2_7 = MEM[(struct S *)_6];
  is still present, and we have an SSA name without a definition, leading to a
  crash.
 
  Rather than figure all this out, I decided to try making the
  regimplification not generate the extra copy in the first place. The
  testsuite seems to agree with me that it's unnecessary. Bootstrapped and
  tested on x86_64-linux, ok?
 
 Ok.  The code looks bogus anyway in that it generates an SSA name
 for something that is not is_gimple_reg_type ().
 
 Thanks,
 Richard.
 
 
  Bernd


Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access

2014-06-17 Thread Charles Baylis
On 5 June 2014 07:27, Ramana Radhakrishnan ramana@googlemail.com wrote:
 On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
 charles.bay...@linaro.org wrote:
 This patch adds support for post-indexed addressing for NEON structure
 memory accesses.

 For example VLD1.8 {d0}, [r0], r1
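
 As an illustration of what the post-indexed form does, here is a scalar
 model of "VLD1.8 {d0}, [r0], r1": load 8 bytes from the address in r0, then
 add r1 to r0.  The helper name load8_post_index is hypothetical, and this is
 plain C++ rather than NEON intrinsics, so it only models the addressing
 behaviour and says nothing about code generation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar model of a post-indexed 8-byte load (hypothetical helper):
// read 8 bytes at ADDR, then advance ADDR by STEP bytes.
std::uint64_t load8_post_index (const std::uint8_t *&addr, std::ptrdiff_t step)
{
  std::uint64_t d;
  std::memcpy (&d, addr, sizeof d);  // the load itself
  addr += step;                      // the address writeback
  return d;
}

// Typical use: walk the first 8 bytes of each row of an image with a row
// stride, which is the access pattern post-indexed addressing targets.
std::uint64_t xor_row_heads (const std::uint8_t *base, std::ptrdiff_t stride,
                             std::size_t rows)
{
  std::uint64_t acc = 0;
  const std::uint8_t *p = base;
  for (std::size_t r = 0; r < rows; ++r)
    acc ^= load8_post_index (p, stride);
  return acc;
}

// Small usage check: the pointer moves by exactly one stride per load.
void check_post_index_sketch ()
{
  std::uint8_t buf[48];
  for (std::size_t i = 0; i < sizeof buf; ++i)
    buf[i] = static_cast<std::uint8_t> (i);

  const std::uint8_t *p = buf;
  const std::uint64_t d0 = load8_post_index (p, 16);
  assert (p == buf + 16);
  assert (std::memcmp (&d0, buf, 8) == 0);
}
```

 Folding the pointer update into the load is what removes the separate add
 from each loop iteration.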


 Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.

 Ok for trunk?

 This looks like a reasonable start but this work doesn't look complete
 to me yet.

 Can you also look at the impact on performance of a range of
 benchmarks especially a popular embedded one to see how this behaves
 unless you have already done so?

I ran a popular suite of embedded benchmarks, and there is no impact
at all on a Chromebook (including with the additional attached patch).

The patch was developed to address a performance issue with a new
version of libvpx which uses intrinsics instead of NEON assembler. The
patch results in a 3% improvement for VP8 decode.

 POST_INC, POST_MODIFY usually have a funny way of biting you with
 either ivopts or the way in which address costs work. I think there
 may be further tweaks needed but for a first step I'd like to know what
 the performance impact is.

 I would also suggest running this through clyon's neon intrinsics
 testsuite to see if that catches any issues especially with the large
 vector modes.

No issues found in clyon's tests.

Your mention of larger vector modes prompted me to check that the
patch has the desired result with them. In fact, the costs are
estimated incorrectly which means the post_modify pattern is not used.
The attached patch fixes that. (used in combination with my original
patch)


2014-06-15  Charles Baylis  charles.ba...@linaro.org

* config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
embedded side effects.


0002-Adjust-costs-for-mem-with-post_modify.patch
Description: application/download


Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Richard Henderson
On 06/17/2014 05:33 AM, Evgeny Stupachenko wrote:
 +   1st vec:   0  1  2  3  4  5  6  7
 +   2nd vec:   8  9 10 11 12 13 14 15
 +   3rd vec:  16 17 18 19 20 21 22 23
 +
 +   The output sequence should be:
 +
 +   1st vec:  0 3 6  9 12 15 18 21
 +   2nd vec:  1 4 7 10 13 16 19 22
 +   3rd vec:  2 5 8 11 14 17 20 23
 +
 +   We use 3 shuffle instructions and 3 * 3 - 1 shifts to create such output.

Why not 3 * 2 blends followed by 3 shuffles?  When length is prime, as here, we
know that no blend will ever overlap elements.  So:

1st step

  A1 = blend V1 V2 =  0  9  2  3 12  5  6 15
  A2 = blend V1 V2 =  8  1 10 11  4 13 14  7
  A3 = blend V1 V3 = 16 17  2 19 20  5 22 23

2nd step

  B1 = blend A1 V3 =  0  9 18  3 12 21  6 15
  B2 = blend A2 V3 = 16  1 10 19  4 13 22  7
  B3 = blend A3 V2 =  8 17  2 11 20  5 14 23

3rd step

  C1 = perm B1 =  0  3  6  9 12 15 18 21
  C2 = perm B2 =  1  4  7 10 13 16 19 22
  C3 = perm B3 =  2  5  8 11 14 17 20 23

The final permute here isn't trivial, crossing lanes for avx2 and all, but the
initial permute you use is similar.
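
A scalar model of the blend-then-permute scheme above, as a minimal sketch.
The names (blend, source_vector, deinterleave3) are hypothetical and this is
ordinary C++ on arrays, not vector code.  The blends are applied in a fixed
V1, V2, V3 order, so some intermediate values differ from A1..A3 above while
B1..B3 and C1..C3 come out the same.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 8;   // elements per vector
constexpr std::size_t kStride = 3;  // load-group size
using Vec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Lane-wise blend: lane p comes from b when take_b[p] is set, else from a.
Vec blend (const Vec &a, const Vec &b, const Mask &take_b)
{
  Vec r{};
  for (std::size_t p = 0; p < kLanes; ++p)
    r[p] = take_b[p] ? b[p] : a[p];
  return r;
}

// Which input vector holds, in lane p, an element whose overall index is
// congruent to k modulo kStride.  Exactly one does, because kLanes and
// kStride are coprime; that is why the blends never overlap.
std::size_t source_vector (std::size_t p, std::size_t k)
{
  for (std::size_t v = 0; v < kStride; ++v)
    if ((kLanes * v + p) % kStride == k)
      return v;
  return kStride;  // not reached while kLanes and kStride are coprime
}

std::array<Vec, kStride> deinterleave3 (const std::array<Vec, kStride> &in)
{
  std::array<Vec, kStride> out{};
  for (std::size_t k = 0; k < kStride; ++k)
    {
      // Steps 1 and 2: two blends gather all elements of class k.
      Vec acc = in[0];
      for (std::size_t v = 1; v < kStride; ++v)
        {
          Mask take{};
          for (std::size_t p = 0; p < kLanes; ++p)
            take[p] = source_vector (p, k) == v;
          acc = blend (acc, in[v], take);
        }

      // Step 3: one permute puts the gathered elements in order.
      for (std::size_t p = 0; p < kLanes; ++p)
        {
          const std::size_t e = kLanes * source_vector (p, k) + p;
          out[k][e / kStride] = acc[p];
        }
    }
  return out;
}

// Small usage check with the 0..23 input used in the example above.
void check_deinterleave3 ()
{
  std::array<Vec, kStride> in{};
  for (std::size_t e = 0; e < kLanes * kStride; ++e)
    in[e / kLanes][e % kLanes] = static_cast<int> (e);

  const std::array<Vec, kStride> out = deinterleave3 (in);
  for (std::size_t k = 0; k < kStride; ++k)
    for (std::size_t j = 0; j < kLanes; ++j)
      assert (out[k][j] == static_cast<int> (kStride * j + k));
}
```

This uses 3 * 2 blends and 3 permutes in total; in real vector code the
permute masks would be constants rather than computed per lane.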


r~


[PATCH GCC 2/2]Add 'force-dwarf-lexical-blocks' command line option - extend to C++

2014-06-17 Thread Herman, Andrei

Hi,

This is the third (and final) patch which extends the original change 
proposal, submitted on June 1, and titled "Add 
'force-dwarf-lexical-blocks' command line option".


This patch extends the proposed functionality to C++.

Attached are the proposed ChangeLog additions (for this patch only), 
named according to the directory each one belongs to.


All check-c and check-c++ tests have been run for unix target.
The testsuites showed identical results, with and without setting the 
proposed -fforce-dwarf-lexical-blocks command line option.


Please let me know if the proposed additions will be accepted.

Best regards,
Andrei Herman
Mentor Graphics Corporation
Israel branch
From 824e75eb563e82c04fe1621c64430d87cdb0f348 Mon Sep 17 00:00:00 2001
From: Andrei Herman andrei_her...@codesourcery.com
Date: Tue, 17 Jun 2014 17:59:07 +0300
Subject: [PATCH 3/3] Support flag_force_dwarf_blocks in C++.

* c-semantics.c (push_block_info): Allow BIND_EXPR for STATEMENT_LIST.

* cp-objcp-common.c (cxx_block_may_fallthru): Return false for break
or continue, when flag_force_dwarf_blocks.

* cp-tree.h (pop_scope_for_labels): New.

* name-lookup.c (keep_current_level): New.
(kept_level_p): When flag_force_dwarf_blocks, avoid creating duplicate
blocks.

* name-lookup.h (keep_current_level): New.

* parser.c (cp_parser_statement): Add last_label and pass it when
calling cp_parser_label_for_labeled_statement, to create a label scope
for the first label of a statement.  Close forced scopes at current
level, after labeled compound statements that don't fall through.
(cp_parser_force_block_for_label): New.
(pop_scope_for_labels): New.
(cp_parser_label_for_labeled_statement): Add parameter.  Create a label
scope for the first label of a statement.
(cp_parser_compound_statement): Force a block for compound statement.
(cp_parser_implicitly_scoped_statement): Likewise for if-then, if-else,
switch and do statements.
(cp_parser_already_scoped_statement): Likewise for for/while bodies.

* semantics.c (do_poplevel): Close any forced scopes in given level.
(build_data_member_initialization): Allow BIND_EXPR.

Signed-off-by: Andrei Herman andrei_her...@codesourcery.com
---
 gcc/c-family/c-semantics.c |   11 -
 gcc/cp/cp-objcp-common.c   |5 ++
 gcc/cp/cp-tree.h   |1 +
 gcc/cp/name-lookup.c   |   12 +-
 gcc/cp/name-lookup.h   |1 +
 gcc/cp/parser.c|  104 
 gcc/cp/semantics.c |5 ++
 7 files changed, 127 insertions(+), 12 deletions(-)

diff --git a/gcc/c-family/c-semantics.c b/gcc/c-family/c-semantics.c
index ec3045f..8c8497f 100644
--- a/gcc/c-family/c-semantics.c
+++ b/gcc/c-family/c-semantics.c
@@ -35,8 +35,15 @@ along with GCC; see the file COPYING3.  If not see
 void
 push_block_info (tree block, location_t loc, bool is_label)
 {
-  if (TREE_CODE(block) != STATEMENT_LIST)
+  switch (TREE_CODE (block)) {
+  case BIND_EXPR:
+block = BIND_EXPR_BODY (block);
+/* Fall through.  */
+  case STATEMENT_LIST:
+break;
+  default:
 return;
+  }
 
   block_loc tl;
   tl = (block_loc) ggc_internal_cleared_alloc (sizeof(struct block_loc_s));
@@ -70,7 +77,7 @@ check_pop_block_info(tree block, location_t loc)
   if (block == cur_block_info->block && loc == cur_block_info->loc
       && !cur_block_info->is_label)
 {
-  block_list_stack->pop();
+  block_list_stack->pop ();
 }
 }
 }
diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index 78dddef..fcfd959 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -238,6 +238,11 @@ cxx_block_may_fallthru (const_tree stmt)
   return false;
 
 default:
+  if (flag_force_dwarf_blocks) {
+if (TREE_CODE (stmt) == BREAK_STMT ||
+TREE_CODE (stmt) == CONTINUE_STMT)
+  return false;
+  }
   return true;
 }
 }
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7d29c2c..4953ad9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5501,6 +5501,7 @@ extern bool maybe_clone_body  (tree);
 extern tree cp_convert_range_for (tree, tree, tree, bool);
 extern bool parsing_nsdmi (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
+extern void pop_scope_for_labels (tree);
 
 /* in pt.c */
 extern bool check_template_shadow  (tree);
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 2baeeb7..5538c63 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -1745,7 +1745,8 @@ local_bindings_p (void)
 bool
 kept_level_p (void)
 {
-  return (current_binding_level->blocks != NULL_TREE
+  return ((!flag_force_dwarf_blocks
+           && current_binding_level->blocks != NULL_TREE)
          || current_binding_level->keep
          || current_binding_level->kind 

Re: Another AIX Bootstrap failure

2014-06-17 Thread Jan Hubicka
  To avoid using PLT and GOT when the unit refers to the symbol and we know
  that interposition does not matter.
 
 I am not certain if the linker is creating the PLT stub code because
 it wants to allow interposition or because it cannot see a definition
 of the function and wants to allow for some other shared library to
 provide the definition at runtime.

OK, but the definition appears in the same file..
 
  Why branch to a non-global (static) symbol
b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
  leads to PLT stub here and why branching to such symbols seems to work 
  otherwise?
 
 Branching to non-global (static) symbol, even an alias, is working
 here. The weak function seems to be the problem.
 
  The failing branch is
  b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
  so the call to static construction seems to have happened correctly but we 
  can
  not get right the call from the constructor to static function (that is an 
  alias
  of a global symbol)
 
 The linker appears to not want to resolve the weak function. If I
 change ._ZN14__gnu_parallel9_SettingsC1Ev to lglobl, it works. If I
 change the static constructor to call the weak function directly,
 avoiding the alias, it shows the same failure mode.
 
 I don't know what code generation looked like before.  Was GCC
 generating calls to weak functions within the same file?

Yes, this is how you implement COMDAT functions, right?  I looked at rs6000 call
expansion and it does not seem to care about visibility properties (just about
direct vs. indirect calls).

One problem I can think of is a scenario where the linker unifies COMDAT
functions between units, somehow forgetting about the aliases, but this
function does not seem to be shared.
Index: symtab.c
===
--- symtab.c(revision 211693)
+++ symtab.c(working copy)
@@ -1327,10 +1327,8 @@
   (void *)new_node, true);
   if (new_node)
 return new_node;
-#ifndef ASM_OUTPUT_DEF
   /* If aliases aren't supported by the assembler, fail.  */
   return NULL;
-#endif
 
   /* Otherwise create a new one.  */
   new_decl = copy_node (node-decl);

The patch above disables generation of the local aliases completely.  I do not
see much of a difference in the actual codegen with this...
I will check older GCC.

Honza
 
 Thanks, David


Re: [Patch] PR55189 enable -Wreturn-type by default

2014-06-17 Thread Sylvestre Ledru
On 05/06/2014 20:01, Joseph S. Myers wrote:

 Initially, I implemented -Wmissing-return to manage this case (
 https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00820.html ) but Jason
 suggested to remove that:
 https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01033.html
 (I don't have a strong opinion on the subject).
 I think splitting the option like that makes sense.  Compatibility 
 indicates that -Wreturn-type and -Wall should still enable 
 -Wmissing-return, but only the other pieces of -Wreturn-type should be 
 enabled by default, at least for C.  (Enabling -Wimplicit-int by default 
 might be a good starting point.)
OK.
Attached is a potential implementation. Is that what
you expect?

 Also, at least one testsuite change in your patch is wrong. 
OK. Thanks. I have probably made others (I updated 1300+ of them).

Thanks
Sylvestre
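
For illustration, here is the kind of code the proposed -Wmissing-return is
described as diagnosing ("control may reach end of non-void function"),
next to a version in which every path returns.  The function names are made
up for this example, and the problematic variant is kept in a comment so
that the snippet itself compiles cleanly.

```cpp
#include <cassert>

// Falls off the end when x == 0; this is the pattern the new option is
// meant to warn about:
//
//   int sign_missing_return (int x)
//   {
//     if (x > 0)
//       return 1;
//     if (x < 0)
//       return -1;
//   }

// Every path returns a value, so there is nothing to diagnose.
int sign (int x)
{
  if (x > 0)
    return 1;
  if (x < 0)
    return -1;
  return 0;
}

// Small usage check.
void check_sign ()
{
  assert (sign (5) == 1);
  assert (sign (-3) == -1);
  assert (sign (0) == 0);
}
```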

From 1b936c618c58dc0e899fa9f56013de48f7e4dcd6 Mon Sep 17 00:00:00 2001
From: Sylvestre Ledru sylves...@debian.org
Date: Tue, 17 Jun 2014 18:48:29 +0200
Subject: [PATCH 2/2] Enable Wimplicit by default

---
 gcc/c-family/c.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 050d400..9b9ede7 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -460,7 +460,7 @@ C ObjC Var(warn_implicit_function_declaration) Init(-1) Warning LangEnabledBy(C
 Warn about implicit function declarations
 
 Wimplicit-int
-C ObjC Var(warn_implicit_int) Warning LangEnabledBy(C ObjC,Wimplicit)
+C ObjC Var(warn_implicit_int) Warning
 Warn when a declaration does not specify a type
 
 Wimport
-- 
2.0.0

From 80cd3dff34f74058ab66b69e0e01a05eaf686338 Mon Sep 17 00:00:00 2001
From: Sylvestre Ledru sylves...@debian.org
Date: Tue, 17 Jun 2014 18:48:12 +0200
Subject: [PATCH 1/2] Introduce -Wmissing-return (Was part of -Wreturn-type
 which is now enabled by default)

---
 gcc/c-family/c.opt|  4 
 gcc/doc/invoke.texi   | 10 +-
 gcc/fortran/options.c |  4 
 gcc/tree-cfg.c|  4 ++--
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 91f8275..050d400 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -697,6 +697,10 @@ Wreturn-type
 C ObjC C++ ObjC++ Var(warn_return_type) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
 Warn whenever a function's return type defaults to \"int\" (C), or about inconsistent return types (C++)
 
+Wmissing-return
+C ObjC C++ ObjC++ Var(warn_missing_return) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
+Warn whenever control may reach end of non-void function
+
 Wselector
 ObjC ObjC++ Var(warn_selector) Warning
 Warn if a selector has multiple methods
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9a34f1c..9911e86 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -258,7 +258,7 @@ Objective-C and Objective-C++ Dialects}.
 -Winvalid-pch -Wlarger-than=@var{len}  -Wunsafe-loop-optimizations @gol
 -Wlogical-op -Wlogical-not-parentheses -Wlong-long @gol
 -Wmain -Wmaybe-uninitialized -Wmissing-braces  -Wmissing-field-initializers @gol
--Wmissing-include-dirs @gol
+-Wmissing-include-dirs -Wmissing-return @gol
 -Wno-multichar  -Wnonnull  -Wno-overflow -Wopenmp-simd @gol
 -Woverlength-strings  -Wpacked  -Wpacked-bitfield-compat  -Wpadded @gol
 -Wparentheses  -Wpedantic-ms-format -Wno-pedantic-ms-format @gol
@@ -3327,6 +3327,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect Options}.
 -Wmain @r{(only for C/ObjC and unless} @option{-ffreestanding}@r{)}  @gol
 -Wmaybe-uninitialized @gol
 -Wmissing-braces @r{(only for C/ObjC)} @gol
+-Wmissing-return @gol
 -Wnonnull  @gol
 -Wopenmp-simd @gol
 -Wparentheses  @gol
@@ -3657,6 +3658,13 @@ the following example, the initializer for @samp{a} is not fully
 bracketed, but that for @samp{b} is fully bracketed.  This warning is
 enabled by @option{-Wall} in C.
 
+@item -Wmissing-return
+@opindex Wmissing-return
+@opindex Wno-missing-return
+Warn whenever falling off the end of the function body (I.e. without
+any return).
+This warning is enabled by @option{-Wall} for C and C++.
+
 @smallexample
 int a[2][2] = @{ 0, 1, 2, 3 @};
 int b[2][2] = @{ @{ 0, 1 @}, @{ 2, 3 @} @};
diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index a2b91ca..fe71230 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -698,6 +698,10 @@ gfc_handle_option (size_t scode, const char *arg, int value,
   gfc_option.warn_line_truncation = value;
   break;
 
+case OPT_Wmissing_return:
+  warn_missing_return = value;
+  break;
+
 case OPT_Wrealloc_lhs:
   gfc_option.warn_realloc_lhs = value;
   break;
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index e824619..2fd342e 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -8265,7 +8265,7 @@ pass_warn_function_return::execute (function *fun)
 
   /* If we see return; in some basic block, then we do reach the end
  without returning a value.  */
-  else if (warn_return_type
+  else if 

Re: Another AIX Bootstrap failure

2014-06-17 Thread David Edelsohn
On Tue, Jun 17, 2014 at 12:50 PM, Jan Hubicka hubi...@ucw.cz wrote:
  To avoid using PLT and GOT when the unit refers to the symbol and we know
  that interposition does not matter.

 I am not certain if the linker is creating the PLT stub code because
 it wants to allow interposition or because it cannot see a definition
 of the function and wants to allow for some other shared library to
 provide the definition at runtime.

 OK, but the definition appears in the same file..

  Why branch to a non-global (static) symbol
b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
  leads to PLT stub here and why branching to such symbols seems to work 
  otherwise?

 Branching to non-global (static) symbol, even an alias, is working
 here. The weak function seems to be the problem.

The weak function is the problem, but I don't know why.  And I don't
understand how this is different than past uses of weak functions.  Or
is that new?

This is very confusing because the library, libstdc++, is being linked
statically. It provides a weak definition of the function. There
should be no glink code (PLT stub).  If the function is declared
.lglobl, it is called directly and no PLT stub is created.  I need to
call in the help of the AIX linker expert to figure out why it is
inserting PLT stub code, especially when linking statically.

Thanks, David


Re: [Patch] PR55189 enable -Wreturn-type by default

2014-06-17 Thread Joseph S. Myers
On Tue, 17 Jun 2014, Sylvestre Ledru wrote:

 On 05/06/2014 20:01, Joseph S. Myers wrote:
 
  Initially, I implemented -Wmissing-return to manage this case (
  https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00820.html ) but Jason
  suggested to remove that:
  https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01033.html
  (I don't have a strong opinion on the subject).
  I think splitting the option like that makes sense.  Compatibility 
  indicates that -Wreturn-type and -Wall should still enable 
  -Wmissing-return, but only the other pieces of -Wreturn-type should be 
  enabled by default, at least for C.  (Enabling -Wimplicit-int by default 
  might be a good starting point.)
 OK.
 As attachment, you will find a potential implementation. Is that what
 you expect?

It would help a lot if it included testcases for what various options / 
option combinations do / do not enable.  I expect that each option 
continues to enable the warnings it does at present (so if a user 
explicitly does -Wreturn-type it also enables the -Wmissing-return 
warnings, for example) - but some warnings would start to be enabled by 
default.  If someone does e.g. -Wno-implicit that would disable the 
default -Wimplicit-int; if they do -Wno-implicit -Wimplicit that would 
have the same effect as just -Wimplicit (so keeping the default warnings 
enabled, and possibly enabling others).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Evgeny Stupachenko
While developing I've tried the following scheme:

First step is 3 shuffles (as initially):

A1 = (0 3 6) (1 4 7) (2 5)
A2 = (8 11 14) (9 12 15) (10 13)
A3 = (16 19 22) (17 20 23) (18 21)

R1 = blend [ blend [A1 A2], A3] = (0 3 6) (9 12 15) (18 21)
  B2 = blend [A1, A2] = (0 3 6) (1 4 7) (10 13)
R2 = shift 3, B2 ... (1 4 7) (10 13) + A3 (16 19 22) ... = (1 4 7) (10 13) (16 19 22)
  B3 = blend [ A2, A3] = (8 11 14) (17 20 23) (18 21)
R3 = shift 6, A1 ... (2 5) + B3 (8 11 14) (17 20 23) ... = (2 5) (8 11 14) (17 20 23)

But it was slower than the scheme in the patch, as blend costs more than
shift (palign).
The scheme is not suitable for AVX2 either, as it has many more dependencies
than the current one (in vect_permute_load_chain).
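
For readers who want to check the lane bookkeeping of the scheme above, here is a minimal scalar sketch in C.  It only models which element ends up in which lane; it says nothing about instruction cost.  The helper names (shuffle8, blend8, shift8, deinterleave3_blend_shift) are made up for this illustration and are not functions from the patch, and shift8 is only an approximation of a palign-style shift over two concatenated vectors.

```c
#include <assert.h>

#define LANES 8

/* dst[i] = src[idx[i]]: a single-source lane shuffle.  */
void shuffle8 (int dst[LANES], const int src[LANES], const int idx[LANES])
{
  for (int i = 0; i < LANES; i++)
    dst[i] = src[idx[i]];
}

/* Lane-wise blend: take lane i from B where take_b[i] is set, else from A.  */
void blend8 (int dst[LANES], const int a[LANES], const int b[LANES],
             const int take_b[LANES])
{
  for (int i = 0; i < LANES; i++)
    dst[i] = take_b[i] ? b[i] : a[i];
}

/* dst = lanes n .. n+7 of the 16-lane concatenation lo:hi.  */
void shift8 (int dst[LANES], const int lo[LANES], const int hi[LANES], int n)
{
  assert (n >= 0 && n <= LANES);
  for (int i = 0; i < LANES; i++)
    dst[i] = (i + n < LANES) ? lo[i + n] : hi[i + n - LANES];
}

/* Hypothetical model of the shuffle + blend + shift variant.  */
void deinterleave3_blend_shift (int v[3][LANES], int r[3][LANES])
{
  /* Step 1: group lanes 0,3,6 | 1,4,7 | 2,5 inside every input vector.  */
  static const int group[LANES] = { 0, 3, 6, 1, 4, 7, 2, 5 };
  /* Blend masks: take lanes 3..7, respectively 6..7, from the second input.  */
  static const int from3[LANES] = { 0, 0, 0, 1, 1, 1, 1, 1 };
  static const int from6[LANES] = { 0, 0, 0, 0, 0, 0, 1, 1 };
  int a[3][LANES], t[LANES], b2[LANES], b3[LANES];

  for (int k = 0; k < 3; k++)
    shuffle8 (a[k], v[k], group);

  /* R1 = blend [blend [A1 A2], A3].  */
  blend8 (t, a[0], a[1], from3);
  blend8 (r[0], t, a[2], from6);

  /* B2 = blend [A1, A2]; R2 = shift 3 over B2:A3.  */
  blend8 (b2, a[0], a[1], from6);
  shift8 (r[1], b2, a[2], 3);

  /* B3 = blend [A2, A3]; R3 = shift 6 over A1:B3.  */
  blend8 (b3, a[1], a[2], from3);
  shift8 (r[2], a[0], b3, 6);
}
```

Feeding it three vectors holding 0..23 should reproduce the R1, R2 and R3 rows listed above.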

Evgeny

On Tue, Jun 17, 2014 at 7:41 PM, Richard Henderson r...@redhat.com wrote:
 On 06/17/2014 05:33 AM, Evgeny Stupachenko wrote:
 +   1st vec:   0  1  2  3  4  5  6  7
 +   2nd vec:   8  9 10 11 12 13 14 15
 +   3rd vec:  16 17 18 19 20 21 22 23
 +
 +   The output sequence should be:
 +
 +   1st vec:  0 3 6  9 12 15 18 21
 +   2nd vec:  1 4 7 10 13 16 19 22
 +   3rd vec:  2 5 8 11 14 17 20 23
 +
 +   We use 3 shuffle instructions and 3 * 3 - 1 shifts to create such output.

 Why not 3 * 2 blend followed by 3 shuffle?  When length is prime, as here, we
 know that no blend will ever overlap elements.  So:

 1st step

   A1 = blend V1 V2 =  0  9  2  3 12  5  6 15
   A2 = blend V1 V2 =  8  1 10 11  4 13 14  7
   A3 = blend V1 V3 = 16 17  2 19 20  5 22 23

 2nd step

   B1 = blend A1 V3 =  0  9 18  3 12 21  6 15
   B2 = blend A2 V3 = 16  1 10 19  4 13 22  7
   B3 = blend A3 V2 =  8 17  2 11 20  5 14 23

 3rd step

   C1 = perm B1 =  0  3  6  9 12 15 18 21
   C2 = perm B2 =  1  4  7 10 13 16 19 22
   C3 = perm B3 =  2  5  8 11 14 17 20 23

 The final permute here isn't trivial, crossing lanes for avx2 and all, but the
 initial permute you use is similar.


 r~
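
The blend-then-permute idea quoted above can be checked with a small scalar model in C.  This is a sketch under assumptions, not code from the patch: owner and deinterleave3_blend_perm are hypothetical names, and the two blends per output are paired as (V1,V2) then V3 for every output, which differs slightly from the pairing used for A3/B3 above but selects the same lanes.

```c
#include <assert.h>

#define LANES 8
#define GROUP 3

/* Index of the input vector whose lane P holds an element that belongs
   to output K, i.e. the W with (LANES * W + P) % GROUP == K.  Because
   LANES and GROUP are coprime, exactly one such W exists, which is why
   no blend ever has two candidates for the same lane.  */
int owner (int p, int k)
{
  for (int w = 0; w < GROUP; w++)
    if ((LANES * w + p) % GROUP == k)
      return w;
  assert (0 && "unreachable when LANES and GROUP are coprime");
  return -1;
}

/* Hypothetical model: two lane-wise blends, then one permute, per output.  */
void deinterleave3_blend_perm (int v[GROUP][LANES], int out[GROUP][LANES])
{
  for (int k = 0; k < GROUP; k++)
    {
      int t[LANES], b[LANES];

      /* Steps 1 and 2: blends only select a source, lanes stay in place.  */
      for (int p = 0; p < LANES; p++)
        t[p] = (owner (p, k) == 1) ? v[1][p] : v[0][p];
      for (int p = 0; p < LANES; p++)
        b[p] = (owner (p, k) == 2) ? v[2][p] : t[p];

      /* Step 3: element GROUP * j + k sits in lane (GROUP * j + k) % LANES.  */
      for (int j = 0; j < LANES; j++)
        out[k][j] = b[(GROUP * j + k) % LANES];
    }
}
```

With inputs 0..23 the intermediate b arrays match the B1, B2 and B3 rows above, and the outputs match C1, C2 and C3.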


Re: [Patch] PR55189 enable -Wreturn-type by default

2014-06-17 Thread Sylvestre Ledru
On 17/06/2014 19:15, Joseph S. Myers wrote:
 On Tue, 17 Jun 2014, Sylvestre Ledru wrote:

 On 05/06/2014 20:01, Joseph S. Myers wrote:
 Initially, I implemented -Wmissing-return to manage this case (
 https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00820.html ) but Jason
 suggested to remove that:
 https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01033.html
 (I don't have a strong opinion on the subject).
 I think splitting the option like that makes sense.  Compatibility 
 indicates that -Wreturn-type and -Wall should still enable 
 -Wmissing-return, but only the other pieces of -Wreturn-type should be 
 enabled by default, at least for C.  (Enabling -Wimplicit-int by default 
 might be a good starting point.)
 OK.
 As attachment, you will find a potential implementation. Is that what
 you expect?
 It would help a lot if it included testcases for what various options / 
 option combinations do / do not enable.  
OK. I will do that.
We should test the following:
* default = run just -Wreturn-type
* -Wreturn-type = Run both
* -Wreturn-type + -Wmissing-return = Run both
* -Wno-return-type + -Wmissing-return = Run just the second one
* -Wno-return-type + -Wno-missing-return = Run none
Do you see any other?
 I expect that each option 
 continues to enable the warnings it does at present (so if a user 
 explicitly does -Wreturn-type it also enables the -Wmissing-return 
 warnings, for example) - but some warnings would start to be enabled by 
 default.  If someone does e.g. -Wno-implicit that would disable the 
 default -Wimplicit-int; if they do -Wno-implicit -Wimplicit that would 
 have the same effect as just -Wimplicit (so keeping the default warnings 
 enabled, and possibly enabling others).

OK. I will try to implement that later (I don't think -Wimplicit-int is
necessary to enable -Wreturn-type by default).
Besides that, are you OK with my changes? (with the tests updated)

Thanks,
Sylvestre



Re: [Patch] PR55189 enable -Wreturn-type by default

2014-06-17 Thread Joseph S. Myers
On Tue, 17 Jun 2014, Sylvestre Ledru wrote:

 OK. I will do that.
 We should test the following:
 * default = run just -Wreturn-type
 * -Wreturn-type = Run both
 * -Wreturn-type + -Wmissing-return = Run both
 * -Wno-return-type + -Wmissing-return = Run just the second one
 * -Wno-return-type + -Wno-missing-return = Run none
 Do you see any other?

That looks like the right things to test, if there are no changes for 
anything other than those options.
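
To make the expected test matrix concrete, here is a small C model of the option interaction as I read this thread.  It is only a sketch: effective_warnings, the tri-state enum and the struct are invented for illustration and are not GCC's option machinery, and the handling of -Wall together with -Wno-return-type is my assumption, since the thread does not spell it out.

```c
#include <assert.h>
#include <stdbool.h>

/* A warning flag is either not mentioned, given as -Wfoo, or as -Wno-foo.  */
enum flag_state { FLAG_UNSET, FLAG_ON, FLAG_OFF };

struct warn_result
{
  bool return_type;     /* the remaining -Wreturn-type diagnostics */
  bool missing_return;  /* control may reach end of non-void function */
};

/* Hypothetical model of the proposed behaviour.  */
struct warn_result
effective_warnings (enum flag_state wreturn_type,
                    enum flag_state wmissing_return, bool wall)
{
  struct warn_result r;

  /* Proposed default: -Wreturn-type is on unless explicitly disabled.  */
  r.return_type = (wreturn_type != FLAG_OFF);

  /* An explicit -W[no-]missing-return wins; otherwise the warning is
     implied by an explicit -Wreturn-type or by -Wall (assumption).  */
  if (wmissing_return != FLAG_UNSET)
    r.missing_return = (wmissing_return == FLAG_ON);
  else
    r.missing_return = (wreturn_type == FLAG_ON) || wall;
  return r;
}

/* The five combinations listed in the thread.  */
void check_thread_matrix (void)
{
  struct warn_result r;

  r = effective_warnings (FLAG_UNSET, FLAG_UNSET, false);
  assert (r.return_type && !r.missing_return);
  r = effective_warnings (FLAG_ON, FLAG_UNSET, false);
  assert (r.return_type && r.missing_return);
  r = effective_warnings (FLAG_ON, FLAG_ON, false);
  assert (r.return_type && r.missing_return);
  r = effective_warnings (FLAG_OFF, FLAG_ON, false);
  assert (!r.return_type && r.missing_return);
  r = effective_warnings (FLAG_OFF, FLAG_OFF, false);
  assert (!r.return_type && !r.missing_return);
}
```

Real testcases would of course be dg tests that compile a function falling off its end under each option combination; the model only pins down which outcome each test should expect.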

 Besides that, are you OK with my changes? (with the tests updated)

The tests are key to reviewing whether the code changes actually do the 
right thing.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] C++ thunk section names

2014-06-17 Thread Sriraman Tallam
Ping.

On Mon, Jun 9, 2014 at 3:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 Ping.

 On Mon, May 19, 2014 at 11:25 AM, Sriraman Tallam tmsri...@google.com wrote:
 Ping.

 On Thu, Apr 17, 2014 at 10:41 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Ping.

 On Wed, Feb 5, 2014 at 4:31 PM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

   I would like this patch reviewed and considered for commit when
 Stage 1 is active again.

 Patch Description:

 A C++ thunk's section name is set to be the same as the section name of the
 original function for which the thunk was created, in order to place the two
 together.  This is done in cp/method.c in function use_thunk.
 However, with function reordering turned on, the original function's section
 name can change to something like .text.hot.original or
 .text.unlikely.original in function default_function_section in varasm.c,
 based on the node count of that function.  The thunk function's section name
 is not updated to match the original there, and doing so would not always be
 correct either, as the original function can be hotter than the thunk.

 I have created a patch that does not set the thunk function's section name to
 that of the original function when function reordering is enabled.

 Thanks
 Sri


Re: [PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)

2014-06-17 Thread Jeff Law

On 06/17/14 07:07, Richard Biener wrote:


I felt that -ftree-XXX is bad naming so I went for -fssa-XXX
even if that is now inconsistent.  Any opinion here?  For
RTL we simply have unsuffixed names so shall we instead go
for -fphiopt?  PHI implies SSA anyway and 'SSA' or 'RTL' is
an implementation detail that the user should not be interested
in (applies to tree- as well, of course).  Now, 'phiopt' is a
bad name when thinking of users (but they shouldn't play with
those options anyway).
Our flags are a mess.  If I put my user hat on, then I'd have to ask the 
question, why would I care about tree, ssa, or even phis.   The pass 
converts branchy code into straightline code.  So, arguably, the right 
name would reflect that it changes branchy code to straight line code.
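
As a user-level illustration of "branchy to straight-line", here is a small C sketch.  The function names are invented, and whether a given compiler version performs exactly these rewrites in this pass is an assumption; the point is only that each pair computes the same value, once with a conditional jump feeding a merge point and once without.

```c
#include <assert.h>
#include <stdlib.h>

/* Branchy source form: both arms only assign a value that is merged at
   the join point (a PHI node once the function is in SSA form).  */
int nonzero_branchy (int a)
{
  int r;
  if (a != 0)
    r = 1;
  else
    r = 0;
  return r;
}

/* Straight-line equivalent: the comparison result is used directly.  */
int nonzero_straight (int a)
{
  return a != 0;
}

/* Branchy absolute value (callers must not pass INT_MIN).  */
int abs_branchy (int a)
{
  int r;
  if (a < 0)
    r = -a;
  else
    r = a;
  return r;
}

/* Straight-line equivalent expressed through the library function.  */
int abs_straight (int a)
{
  return abs (a);
}

/* Both forms must agree on every input we try.  */
void check_equivalence (void)
{
  for (int a = -1000; a <= 1000; a++)
    {
      assert (nonzero_branchy (a) == nonzero_straight (a));
      assert (abs_branchy (a) == abs_straight (a));
    }
}
```

A testcase of the kind asked for further down could scan a dump of code like the branchy variants and check that no conditional jump is left.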


But I believe most of our flag names are poor in this regard (and I'm as 
much to blame as anyone).  So go with your best judgement IMHO.


It'd be nice to have some testcases here to show why we want this moved 
earlier so that a few years from now when someone else wants to move it 
back, we can say umm, see test frobit.c, make that work and you can 
move it back :-)


jeff



Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-06-17 Thread Ilya Verbin
Hello Bernd,

On 28 Feb 17:21, Bernd Schmidt wrote:
 For your use case, I'd imagine the offload compiler would be built
 relatively normally as a full build with
 --enable-as-accelerator-for=x86_64-linux, which would install it
 into locations where the host will eventually be able to find it.
 Then the host compiler would be built with another new configure
 option (as yet unimplemented in my patch set)
 --enable-offload-targets=mic,... which would tell the host
 compiler about the pre-built offload target compilers. On the ptx

I don't get this part of the plan.  Where will the host compiler look for
mkoffload?

E.g., first I configure/make/install the target gcc and corresponding mkoffload 
with the following options:
--enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-unknown-linux 
--prefix=/install_gcc/accel_intelmic

Next I configure/make/install the host gcc with:
--enable-accelerator=intelmic --prefix=/install_gcc/host

Now if I manually copy mkoffload from the target's install dir into one of the
dirs in the host's $COMPILER_PATH, then lto-wrapper finds it and everything
works fine.
E.g.:
mkdir -p /install_gcc/host/libexec/gcc/x86_64-unknown-linux-gnu/accel/intelmic/
cp /install_gcc/accel_intelmic/libexec/gcc/x86_64-unknown-linux/4.10.0/accel/x86_64-unknown-linux-gnu/mkoffload \
   /install_gcc/host/libexec/gcc/x86_64-unknown-linux-gnu/accel/intelmic/

But what was your idea of how to tell host gcc about the path to mkoffload?

Thanks,
  -- Ilya


Re: [PATCH] PR54555: Use strict_low_part for loading a constant only if it is cheaper

2014-06-17 Thread Jeff Law

On 06/17/14 01:47, Andreas Schwab wrote:

Postreload may transform (set (REGX) (CONST_INT A)) ... (set (REGX)
(CONST_INT B)) to (set (REGX) (CONST_INT A)) ... (set (STRICT_LOW_PART
(REGX)) (CONST_INT B)), but it should do that only if the latter is
cheaper.  On m68k, a full word load of a small constant with moveq is
cheaper than doing a byte load with move.b.
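
A scalar C sketch of the decision described above, under stated assumptions: the register is modelled as 32 bits wide with an 8-bit low part, the cost values are plain integers supplied by the caller, and all function names are hypothetical.  This is not the code in postreload.c.

```c
#include <assert.h>
#include <stdint.h>

/* Models a STRICT_LOW_PART store of a byte: the low 8 bits are replaced
   and the upper bits of the register are preserved.  */
uint32_t set_low_byte (uint32_t reg, uint8_t low)
{
  return (reg & ~UINT32_C (0xff)) | low;
}

/* Loading B after A can be done as a low-part store only if the two
   constants agree in every bit above the low part.  */
int low_byte_rewrite_valid (uint32_t a, uint32_t b)
{
  return (a & ~UINT32_C (0xff)) == (b & ~UINT32_C (0xff));
}

/* Use the low-part form only when it is valid and strictly cheaper than
   the full-width constant load.  Costs are caller-supplied here.  */
int use_strict_low_part (uint32_t a, uint32_t b,
                         int cost_full_load, int cost_low_part)
{
  assert (cost_full_load >= 0 && cost_low_part >= 0);
  return low_byte_rewrite_valid (a, b) && cost_low_part < cost_full_load;
}
```

For a target in the situation described for m68k, where the full-word load is the cheaper one, use_strict_low_part returns 0 and the original full load is kept.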

Tested on m68k-suse-linux and x86_64-suse-linux.  In both cases the size
of cc1* becomes smaller with this change.

Andreas.

PR rtl-optimization/54555
* postreload.c (move2add_use_add2_insn): Only substitute
STRICT_LOW_PART if it is cheaper.
Sadly, Kazu didn't add a testcase for the H8/300 cases which inspired 
his change, so we don't know if your patch hurts the H8/300 port or not.


Let's do better this time ;-)  Add a testcase for the m68k port which 
verifies we're getting the desired code.  I don't care if you test the 
assembly code or test the RTL dumps, just that we have a test for the 
case where STRICT_LOW_PART is not a win.


With a testcase, this is approved.

Thanks,

jeff



Re: [PATCH, Cilk+, PR57541] Additional fix for issues with array notations

2014-06-17 Thread Jeff Law

On 06/16/14 14:13, Zamyatin, Igor wrote:

Hi All!

The patch fixes ICE in array notation for the cases of incorrect arguments of 
Cilk+ builtins and undeclared initial index.

Is it ok for trunk and 4.9?

Thanks,
Igor

diff --git a/gcc/c/ChangeLog b/gcc/c/ChangeLog
index 54d0de7..56e1b0b 100644
--- a/gcc/c/ChangeLog
+++ b/gcc/c/ChangeLog
@@ -1,3 +1,12 @@
+2014-06-16  Igor Zamyatin  igor.zamya...@intel.com
+
+   PR middle-end/57541
+   * c-array-notation.c (fix_builtin_array_notation_fn):
+   Check for 0 arguments in builtin call. Check that bultin argument is
+   correct.
+   * c-parser.c (c_parser_array_notation): Check for incorrect initial
+   index.
Shouldn't this have been caught earlier?  ISTM we should be catching any
argument mix-ups during parsing.  Is there some reason we don't do that?


jeff



Re: [PATCH, PR 61211] Fix a bug in clone_of_p verification

2014-06-17 Thread Martin Jambor
Ping.

Thanks,

Martin


On Sat, May 31, 2014 at 12:46:03AM +0200, Martin Jambor wrote:
 Hi,
 
 after a clone is materialized, its clone_of field is cleared which in
 PR 61211 leads to a failure in the skipped_thunk path in clone_of_p in
 cgraph.c, which then leads to a false positive verification failure.
 
 Fixed by the following patch.  Bootstrapped and tested on x86_64-linux
 on both the trunk and the 4.9 branch.  OK for both?
 
 Thanks,
 
 Martin
 
 
 2014-05-30  Martin Jambor  mjam...@suse.cz
 
   PR ipa/61211
   * cgraph.c (clone_of_p): Allow skipped_branch to deal with
   expanded clones.
 
 diff --git a/gcc/cgraph.c b/gcc/cgraph.c
 index ff65b86..f18f977 100644
 --- a/gcc/cgraph.c
 +++ b/gcc/cgraph.c
 @@ -2566,11 +2566,16 @@ clone_of_p (struct cgraph_node *node, struct cgraph_node *node2)
        skipped_thunk = true;
      }
  
 -  if (skipped_thunk
 -      && (!node2->clone_of
 -          || !node2->clone.args_to_skip
 -          || !bitmap_bit_p (node2->clone.args_to_skip, 0)))
 -    return false;
 +  if (skipped_thunk)
 +    {
 +      if (!node2->clone.args_to_skip
 +          || !bitmap_bit_p (node2->clone.args_to_skip, 0))
 +        return false;
 +      if (node2->former_clone_of == node->decl)
 +        return true;
 +      else if (!node2->clone_of)
 +        return false;
 +    }
  
    while (node != node2 && node2)
      node2 = node2->clone_of;


Re: [patch libatomic]: Add basic support for mingw targets

2014-06-17 Thread Jeff Law

On 06/16/14 07:20, Kai Tietz wrote:

Hello,

this patch adds basic support for libatomic for mingw targets using
win32 and for mingw targets using posix threading model.

The win32 implementation might need a critical section for the
initialization of mutexes.  If an issue occurs we can still add that.  For now
all testcases are passing for native and posix-threading model mingw
(32-bit and 64-bit).

ChangeLog

2014-06-16  Kai Tietz  kti...@redhat.com

 * Makefile.am (libatomic_la_LDFLAGS): Add lt_host_flags.
Isn't this all target stuff, in which case lt_host_flags seems 
inappropriate.  Or is this just poorly named?


The rest seems reasonable.  So we just need to settle that nit and we 
can go forward.


jeff


Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-06-17 Thread Bernd Schmidt

On 06/17/2014 08:20 PM, Ilya Verbin wrote:

Hello Bernd,

On 28 Feb 17:21, Bernd Schmidt wrote:

For your use case, I'd imagine the offload compiler would be built
relatively normally as a full build with
--enable-as-accelerator-for=x86_64-linux, which would install it
into locations where the host will eventually be able to find it.
Then the host compiler would be built with another new configure
option (as yet unimplemented in my patch set)
--enable-offload-targets=mic,... which would tell the host
compiler about the pre-built offload target compilers. On the ptx


I don't get this part of the plan.  Where will the host compiler look for
mkoffload?

E.g., first I configure/make/install the target gcc and corresponding mkoffload 
with the following options:
--enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-unknown-linux 
--prefix=/install_gcc/accel_intelmic

Next I configure/make/install the host gcc with:
--enable-accelerator=intelmic --prefix=/install_gcc/host


Try using the same prefix for both.


Bernd


Re: [patch i386]: Combine memory and indirect jump

2014-06-17 Thread Jeff Law

On 06/13/14 10:59, Kai Tietz wrote:

2014-06-13 17:58 GMT+02:00 Jeff Law l...@redhat.com:

On 06/13/14 09:56, Richard Henderson wrote:


On 06/13/2014 08:36 AM, Jeff Law wrote:


So you may have answered this already, but why can't this be a combiner
pattern?



Until pass_duplicate_computed_gotos, we (intentionally) have a single
indirect
branch in the entire function.  This vastly reduces the size of the CFG.


Ah, the factoring bits.  Should have known.




Peep2 is currently running before d_c_g, so currently Kai can't solve this
problem in peep2.

I don't think peep2 should run after sched2, but I'll bet we can reorder
things
a bit so that d_c_g runs before peep2.


Yea, seems worth a try.

jeff



Well, I tested putting the second sched2 pass before the sched2 pass.
That works in general.  There are just some opportunities which weren't
caught then.  I attached a sample, which demonstrates that pretty
well.  I noticed that putting that pass behind reload blocks was
necessary for a better hit-rate of the peephole optimization.
So can you tell us why this sample code misses opportunities?  Otherwise 
we have to dig into it ourselves to tease out that information.


I think we're zeroing in on a path to move d_c_g before peep2, but I'd 
like to have a clearer understanding of why we'd still be missing 
opportunities.  If we can avoid running peep2 twice, that'd be good.


jeff




Re: [Patch, Fortran] Add coarray communication support to the trunk (coindex variables)

2014-06-17 Thread Paul Richard Thomas
Dear Tobias and Alessandro,

Well what can I say?  The patch is something of a tour de force!
Sandro, questo è assolutamente meraviglioso. Molte grazie da tutti
noi.

I have done nothing to check the functionality of the patch.  However,
I have checked the conformance with coding standards and that it is
well and truly insulated from the rest of gfortran by the coarray
option.

OK for trunk

Once again many thanks for the patch.

Paul


On 17 June 2014 08:28, Tobias Burnus bur...@net-b.de wrote:
 This patch adds the first coarray communication support to the trunk
 (ignoring the co_sum/co_min/co_max support, which was recently merged).
 [Note: In terms of the library this is still libcaf_single, but see below.]

 The patch is based on my work on the fortran-caf branch, but has a slightly
 modified ABI. The patch should support most communications, but it is not
 complete. I intend to submit a patch soon which irons out some wrinkles.

 In particular, this patch adds three library calls to handle coindexed
 communication: Assignment to a coindex variable (caf_send), a coindexed
 expression (caf_expression) and assigning a coindexed variable to a
 coindexed variable (caf_sendget). The coarray is identified by a token
 (opaque object provided by the coarray library), an offset to that base
 address, an image index and an array descriptor for the coarray, which is
 also used for scalars – and which has the value of the whole array for
 vector subscripts. Additionally, one passes a kind variable as extra
 argument as the current array descriptor cannot distinguish a len=1 kind=4
 from a len=4 kind=1 character string. And for vector subscripts, the
 subscripts are passed as additional argument.

 For assignments, the library is supposed to handle padding/trimming of
 strings and type conversion (e.g. cmplx_caf(:)[i] = int_array) as well as
 array = scalar assignments.
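
 As an illustration of the padding/trimming the library is expected to do for character assignments, here is a minimal C sketch of Fortran assignment semantics for default-kind strings: a shorter source is blank-padded, a longer one is truncated.  The function name assign_padded and its signature are made up for this example; it is not the libcaf interface, and wider character kinds and type conversion are left out.

```c
#include <assert.h>
#include <string.h>

/* Copy SRC (SRC_LEN characters, not NUL-terminated) into DST (DST_LEN
   characters).  Pads with blanks when SRC is shorter and truncates when
   it is longer.  memmove is used so overlapping buffers are tolerated.  */
void assign_padded (char *dst, size_t dst_len,
                    const char *src, size_t src_len)
{
  assert (dst != NULL && src != NULL);

  size_t n = src_len < dst_len ? src_len : dst_len;
  memmove (dst, src, n);
  if (dst_len > n)
    memset (dst + n, ' ', dst_len - n);
}
```

 For example, assigning the two characters "ab" to a length-5 destination yields "ab" followed by three blanks, while assigning "hello" to a length-3 destination yields "hel".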

 The following is left to be done as follow up:

 * Support of vector subscripts with assumed-size variables: To be tested;
 might need the new array descriptor or some similar work around – or just a
 test case.
 * The library libcaf_single supports padding/trimming of strings but still
 lacks the support for type conversion and vector subscripts.
 * Adding an ABI documentation
 * There are still some issues with regards to polymorphic coarrays, in
 particular with passing them as dummy arguments and in ASSOCIATE/SELECT
 TYPE, but presumably also with using them in coindexed expressions.

 And as bigger item: Allocatable components of coarrays are not supported –
 nor is the access to pointer or allocatable components (part refs);
 currently, there is no compile time diagnostic for it.


 Additionally, I have removed the vector subscript preparations from the
 co_sum/min/max as it does not make much sense for those. And I added a
 collective test case, which I found on my hard disk.

 Build and regtested. OK for the trunk?

 Tobias

 PS: Additional missing bits, not listed above: Locking and CRITICAL and
 atomics for Fortran 2008. And for TS18508 co_broadcast and co_reduce, the
 atomics extensions, teams, events and error recovery.



-- 
The knack of flying is learning how to throw yourself at the ground and miss.
   --Hitchhikers Guide to the Galaxy


Re: [patch libatomic]: Add basic support for mingw targets

2014-06-17 Thread Kai Tietz
2014-06-17 21:16 GMT+02:00 Jeff Law l...@redhat.com:
 On 06/16/14 07:20, Kai Tietz wrote:

 Hello,

 this patch adds basic support for libatomic for mingw targets using
 win32 and for mingw targets using posix threading model.

 The win32 implementation might need a critical section for the
 initialization of mutexes.  If an issue occurs we can still add that.  For now
 all testcases are passing for native and posix-threading model mingw
 (32-bit and 64-bit).

 ChangeLog

 2014-06-16  Kai Tietz  kti...@redhat.com

  * Makefile.am (libatomic_la_LDFLAGS): Add lt_host_flags.

 Isn't this all target stuff, in which case lt_host_flags seems
 inappropriate.  Or is this just poorly named?

Hmm, libatomic is built here for the new host (meaning it is a gcc-target
library).  So it might be poorly named.  Nevertheless, see
ACX_LT_HOST_FLAGS in config/lthostflags.m4 for details and for why it is
required to set -no-undefined and the proper bindir for cygwin/mingw.

 The rest seems reasonable.  So we just need to settle that nit and we can go
 forward.

 jeff

Kai


[PATCH, rs6000] Remove XFAIL from default_format_denormal_2.f90 for PowerPC on Linux

2014-06-17 Thread William J. Schmidt
Hi,

The testcase gfortran.dg/default_format_denormal_2.f90 has been
reporting XPASS since 4.8 on the powerpc*-unknown-linux-gnu platforms.
This patch removes the XFAIL for powerpc*-*-linux-* from the test.  I
believe this pattern doesn't match any other platforms, but please let
me know if I should replace it with a more specific pattern instead.

Verified on powerpc64-unknown-linux-gnu (-m32 and -m64) and
powerpc64le-unknown-linux-gnu (-m64).  Is this ok for trunk, 4.9, and
4.8?

Thanks,
Bill


2014-06-17  Bill Schmidt  wschm...@linux.vnet.ibm.com

* gfortran.dg/default_format_denormal_2.f90:  Remove xfail for
powerpc*-*-linux*.


Index: gcc/testsuite/gfortran.dg/default_format_denormal_2.f90
===
--- gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (revision 
211741)
+++ gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (working copy)
@@ -1,5 +1,5 @@
 ! { dg-require-effective-target fortran_large_real }
-! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } }
+! { dg-do run { xfail powerpc*-apple-darwin* } }
 ! Test XFAILed on these platforms because the system's printf() lacks
 ! proper support for denormalized long doubles. See PR24685
 !




[Fortran-dev] Merge from the trunk

2014-06-17 Thread Tobias Burnus

Dear all,

I have now updated the fortran-dev branch to trunk version Rev. 
211744. Committed as Rev. 211745.


Tobias


Re: [PATCH, PR 61160] Artificial thunks need combined_args_to_skip

2014-06-17 Thread Martin Jambor
Hi,

Ping.

Thanks,

Martin


On Sat, May 31, 2014 at 01:08:31AM +0200, Martin Jambor wrote:
 Hi,
 
 the second issue in PR 61160 is that because artificial thunks
 (produced by duplicate_thunk_for_node) do not have
 combined_args_to_skip, calls to them do not get actual arguments
 removed, while the actual functions do lose their formal parameters,
 leading to mismatches.
 
 Currently, the combined_args_to_skip is computed in
 cgraph_create_virtual_clone only after all the edge redirection and
 thunk duplication is done so it had to be moved to a spot before
 that.  Since we already pass args_to_skip to cgraph_clone_node, I
 moved the computation there (otherwise it would have to duplicate the
 old value and also pass the new one to the redirection routine).
 
 I have also noticed that the code producing combined_args_to_skip from
 an old value and new args_to_skip cannot work in LTO because we do not
 have DECL_ARGUMENTS available at WPA in LTO.  The wrong code is
 however never executed and so I replaced it with a simple bitmap_ior.
 This changes the semantics of args_to_skip for any user of
 cgraph_create_virtual_clone that would like to remove some parameters
 from something which is already a clone.  However, currently there are
 no such users and the new semantics is saner because WPA code will be
 happier using the old indices rather than remapping everything the
 whole time.
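
 The difference between the two semantics can be sketched with plain bit masks in C.  This is an illustration under assumptions, not the cgraph code: the sets are modelled as unsigned masks instead of bitmaps, at most 32 parameters are supported, and both function names are hypothetical.

```c
#include <assert.h>

/* New semantics: both sets use parameter indices of the original
   function, so combining them is a plain inclusive or.  */
unsigned combine_original_indices (unsigned old_skip, unsigned new_skip)
{
  return old_skip | new_skip;
}

/* Old semantics: NEW_SKIP indexes the parameters that remain after
   OLD_SKIP has been applied, so every bit has to be remapped by walking
   the original parameter list (which is what is unavailable at WPA).  */
unsigned combine_remapped_indices (unsigned old_skip, unsigned new_skip,
                                   int nparams)
{
  assert (nparams >= 0 && nparams <= 32);

  unsigned combined = old_skip;
  int newi = 0;  /* index into the already reduced parameter list */
  for (int i = 0; i < nparams; i++)
    {
      if (old_skip & (1u << i))
        continue;  /* already removed, has no index in the clone */
      if (new_skip & (1u << newi))
        combined |= 1u << i;
      newi++;
    }
  return combined;
}
```

 For a four-parameter function whose clone already dropped parameter 1, removing the clone's parameter 1 under the old semantics and removing original parameter 2 under the new semantics both give the combined set {1, 2}.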
 
 I am still in the process of bootstrapping and testing this patch on
 trunk, I will test it on the 4.9 branch too.  OK if it passes
 everywhere?
 
 Thanks,
 
 Martin
 
 
 
 2014-05-29  Martin Jambor  mjam...@suse.cz
 
   PR ipa/61160
   * cgraphclones.c (duplicate_thunk_for_node): Removed parameter
   args_to_skip, use those from node instead.  Copy args_to_skip and
   combined_args_to_skip from node to the new thunk.
   (redirect_edge_duplicating_thunks): Removed parameter args_to_skip.
   (cgraph_create_virtual_clone): Moved computation of
   combined_args_to_skip...
   (cgraph_clone_node): ...here, simplify it to bitmap_ior..
 
 testsuite/
   * g++.dg/ipa/pr61160-2.C: New test.
   * g++.dg/ipa/pr61160-3.C: Likewise.
 
 diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
 index 4387b99..91cc13c 100644
 --- a/gcc/cgraphclones.c
 +++ b/gcc/cgraphclones.c
 @@ -301,14 +301,13 @@ set_new_clone_decl_and_node_flags (cgraph_node *new_node)
     thunk is this_adjusting but we are removing this parameter.  */
  
  static cgraph_node *
 -duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
 -                          bitmap args_to_skip)
 +duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node)
  {
    cgraph_node *new_thunk, *thunk_of;
    thunk_of = cgraph_function_or_thunk_node (thunk->callees->callee);
  
    if (thunk_of->thunk.thunk_p)
 -    node = duplicate_thunk_for_node (thunk_of, node, args_to_skip);
 +    node = duplicate_thunk_for_node (thunk_of, node);
  
    struct cgraph_edge *cs;
    for (cs = node->callers; cs; cs = cs->next_caller)
 @@ -320,17 +319,18 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
        return cs->caller;
  
    tree new_decl;
 -  if (!args_to_skip)
 +  if (!node->clone.args_to_skip)
      new_decl = copy_node (thunk->decl);
    else
      {
        /* We do not need to duplicate this_adjusting thunks if we have removed
           this.  */
        if (thunk->thunk.this_adjusting
 -          && bitmap_bit_p (args_to_skip, 0))
 +          && bitmap_bit_p (node->clone.args_to_skip, 0))
          return node;
  
 -      new_decl = build_function_decl_skip_args (thunk->decl, args_to_skip,
 +      new_decl = build_function_decl_skip_args (thunk->decl,
 +                                                node->clone.args_to_skip,
                                                  false);
      }
    gcc_checking_assert (!DECL_STRUCT_FUNCTION (new_decl));
 @@ -348,6 +348,8 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
    new_thunk->thunk = thunk->thunk;
    new_thunk->unique_name = in_lto_p;
    new_thunk->former_clone_of = thunk->decl;
 +  new_thunk->clone.args_to_skip = node->clone.args_to_skip;
 +  new_thunk->clone.combined_args_to_skip = node->clone.combined_args_to_skip;
  
    struct cgraph_edge *e = cgraph_create_edge (new_thunk, node, NULL, 0,
                                                CGRAPH_FREQ_BASE);
 @@ -364,12 +366,11 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
     chain.  */
  
  void
 -redirect_edge_duplicating_thunks (struct cgraph_edge *e, struct cgraph_node *n,
 -                                  bitmap args_to_skip)
 +redirect_edge_duplicating_thunks (struct cgraph_edge *e, struct cgraph_node *n)
  {
    cgraph_node *orig_to = cgraph_function_or_thunk_node (e->callee);
    if (orig_to->thunk.thunk_p)
 -    n = duplicate_thunk_for_node (orig_to, n, args_to_skip);
 +    n = duplicate_thunk_for_node (orig_to, n);
  
    cgraph_redirect_edge_callee (e, n);
  }
 @@ -422,9 +423,21 @@ 

Re: [PATCH 1/5] New Identical Code Folding IPA pass

2014-06-17 Thread Jeff Law

On 06/13/14 04:24, mliska wrote:


You may ask why GCC needs such a new optimization. The
compiler, simply having better knowledge of a compiled source file,
is capable of reaching better results, especially if link-time
optimization is enabled. Apart from that, the GCC implementation adds
support for read-only variables like construction vtables (mentioned
in:
http://hubicka.blogspot.cz/2014/02/devirtualization-in-c-part-3-building.html).
Can you outline at a high level cases where GCC's knowledge allows it to 
reach a better result?  Is it because you're not requiring bit for bit 
identical code, but that the code merely be semantically equivalent?


The GCC driven ICF seems to pick up 2X more opportunities than the gold 
driven ICF.  But if I'm reading everything correctly, that includes ICF 
of both functions and variables.


Do you have any sense of how those improvements break down?  I.e., is it
mostly more functions you're finding as identical, and if so, what is it
about the GCC implementation that allows us to find more ICF
opportunities?  If it's mostly variables, that's fine too.  I'm just
trying to understand where the improvements are coming from.


Jeff


Re: [PATCH 4/5] Existing tests fix

2014-06-17 Thread Jeff Law

On 06/13/14 04:48, mliska wrote:

Hi,
   many tests rely on a precise number of scanned functions in a dump file. If 
IPA ICF decides to merge some function and(or) read-only variables, counts do 
not match.

Martin

Changelog:

2014-06-13  Martin Liska  mli...@suse.cz
Honza Hubicka  hubi...@ucw.cz

* c-c++-common/rotate-1.c: Text
* c-c++-common/rotate-2.c: New test.
* c-c++-common/rotate-3.c: Likewise.
* c-c++-common/rotate-4.c: Likewise.
* g++.dg/cpp0x/rv-return.C: Likewise.
* g++.dg/cpp0x/rv1n.C: Likewise.
* g++.dg/cpp0x/rv1p.C: Likewise.
* g++.dg/cpp0x/rv2n.C: Likewise.
* g++.dg/cpp0x/rv3n.C: Likewise.
* g++.dg/cpp0x/rv4n.C: Likewise.
* g++.dg/cpp0x/rv5n.C: Likewise.
* g++.dg/cpp0x/rv6n.C: Likewise.
* g++.dg/cpp0x/rv7n.C: Likewise.
* gcc.dg/ipa/ipacost-1.c: Likewise.
* gcc.dg/ipa/ipacost-2.c: Likewise.
* gcc.dg/ipa/ipcp-agg-6.c: Likewise.
* gcc.dg/ipa/remref-2a.c: Likewise.
* gcc.dg/ipa/remref-2b.c: Likewise.
* gcc.dg/pr46309-2.c: Likewise.
* gcc.dg/torture/ipa-pta-1.c: Likewise.
* gcc.dg/tree-ssa/andor-3.c: Likewise.
* gcc.dg/tree-ssa/andor-4.c: Likewise.
* gcc.dg/tree-ssa/andor-5.c: Likewise.
* gcc.dg/vect/no-vfa-pr29145.c: Likewise.
* gcc.dg/vect/vect-cond-10.c: Likewise.
* gcc.dg/vect/vect-cond-9.c: Likewise.
* gcc.dg/vect/vect-widen-mult-const-s16.c: Likewise.
* gcc.dg/vect/vect-widen-mult-const-u16.c: Likewise.
* gcc.dg/vect/vect-widen-mult-half-u8.c: Likewise.
* gcc.target/i386/bmi-1.c: Likewise.
* gcc.target/i386/bmi-2.c: Likewise.
* gcc.target/i386/pr56564-2.c: Likewise.
* g++.dg/opt/pr30965.C: Likewise.
* g++.dg/tree-ssa/pr19637.C: Likewise.
* gcc.dg/guality/csttest.c: Likewise.
* gcc.dg/ipa/iinline-4.c: Likewise.
* gcc.dg/ipa/iinline-7.c: Likewise.
* gcc.dg/ipa/ipa-pta-13.c: Likewise.
I know this is the least interesting part of your changes, but it's also 
simple and mechanical and thus trivial to review.  Approved, but 
obviously don't install until the rest of your patch has been approved.


Similar changes for recently added tests or cases where you might 
improve ICF requiring similar tweaks to existing tests are pre-approved 
as well.


jeff



Re: [PATCH 5/5] New tests introduction

2014-06-17 Thread Jeff Law

On 06/13/14 05:16, mliska wrote:

Hi,
this is a new collection of tests for IPA ICF pass.

Martin

Changelog:

2014-06-13  Martin Liska  mli...@suse.cz
Honza Hubicka  hubi...@ucw.cz

* gcc/testsuite/g++.dg/ipa/ipa-se-1.C: New test.
* gcc/testsuite/g++.dg/ipa/ipa-se-2.C: Likewise.
* gcc/testsuite/g++.dg/ipa/ipa-se-3.C: Likewise.
* gcc/testsuite/g++.dg/ipa/ipa-se-4.C: Likewise.
* gcc/testsuite/g++.dg/ipa/ipa-se-5.C: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-1.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-10.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-11.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-12.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-13.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-14.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-15.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-16.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-17.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-18.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-19.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-2.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-20.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-21.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-22.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-23.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-24.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-25.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-26.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-27.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-28.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-3.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-4.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-5.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-6.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-7.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-8.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-9.c: Likewise.

Also approved, but please don't install until the entire kit is approved.

I'd like to applaud you and Jan for including a nice baseline of tests.

jeff



Re: [PATCH 2/5] Existing call graph infrastructure enhancement

2014-06-17 Thread Jeff Law

On 06/13/14 04:26, mliska wrote:

Hi,
 this small patch prepares remaining needed infrastructure for the new pass.

Changelog:

2014-06-13  Martin Liska  mli...@suse.cz
Honza Hubicka  hubi...@ucw.cz

* ipa-utils.h (polymorphic_type_binfo_p): Function marked external
instead of static.
* ipa-devirt.c (polymorphic_type_binfo_p): Likewise.
* ipa-prop.h (count_formal_params): Likewise.
* ipa-prop.c (count_formal_params): Likewise.
* ipa-utils.c (ipa_merge_profiles): Be more tolerant if we merge
profiles for semantically equivalent functions.
* passes.c (do_per_function): If we load body of a function during WPA,
this condition should behave same.
* varpool.c (ctor_for_folding): More tolerant assert for variable
aliases created during WPA.
Presumably we don't have any useful way to merge the cases where we have 
profiles for both SRC and DST in ipa_merge_profiles, or even to guess which 
is more useful when presented with both?  Does it make sense to log this 
into a debugging file when we drop one?


I think this patch is fine.  If adding logging makes sense, then feel 
free to do so and consider that trivial change pre-approved.


Jeff



Re: [PATCH 1/5] New Identical Code Folding IPA pass

2014-06-17 Thread Paolo Carlini

Hi,

On 13/06/14 12:24, mliska wrote:

   The optimization is inspired by Microsoft /OPT:ICF optimization 
(http://msdn.microsoft.com/en-us/library/bxwfs976.aspx) that merges COMDAT 
sections with each function residing in a separate section.
In terms of C++ testcases, I'm wondering if you already double checked 
that the new pass already does well on the typical examples on which, I 
was told, the Microsoft optimization is known to do well, eg, code 
instantiating std::vector for different pointer types, or even long and 
long long on x86_64-linux, things like that.


Thanks,
Paolo.


Re: [patch libatomic]: Add basic support for mingw targets

2014-06-17 Thread Jeff Law

On 06/17/14 13:31, Kai Tietz wrote:

2014-06-17 21:16 GMT+02:00 Jeff Law l...@redhat.com:

On 06/16/14 07:20, Kai Tietz wrote:


Hello,

this patch adds basic support for libatomic for mingw targets using
win32 and for mingw targets using posix threading model.

The win32 implementation might need a critical section for the
initialization of mutexes.  If an issue occurs we can still add that.  For now
all testcases are passing for native and posix-threading model mingw
(32-bit and 64-bit).

ChangeLog

2014-06-16  Kai Tietz  kti...@redhat.com

  * Makefile.am (libatomic_la_LDFLAGS): Add lt_host_flags.


Isn't this all target stuff, in which case lt_host_flags seems
inappropriate.  Or is this just poorly named?


Hmm, libatomic is built here for a new host (means it is a gcc-target
library).  So it might be named poorly.  Nevertheless see for details
ACX_LT_HOST_FLAGS in config/lthostflags.m4 and why it is required to
set -no-undefined and the proper bindir for cygwin/mingw.
Right, I'm aware that libatomic is a target library.  What I'm worried 
about is confusion due to using ACX_LT_HOST_FLAGS and possible pollution 
from flags originally meant for the host being used for the target library 
build.


Given that several other libraries use similar constraints to get 
lt_host_flags into LDFLAGS, I guess pollution isn't (or better stated 
hasn't) been an issue.


Approved.

Jeff




Re: [PATCH 1/5] New Identical Code Folding IPA pass

2014-06-17 Thread David Malcolm
On Fri, 2014-06-13 at 12:24 +0200, mliska wrote:
[...snip...]
   Statistics about the pass:
   Inkscape: 11.95 MB -> 11.44 MB (-4.27%)
   Firefox: 70.12 MB -> 70.12 MB (-3.07%)

FWIW, you wrote 70.12 MB here for both before and after for Firefox, but
give a -3.07% change, which seems like a typo.

A 3.07% reduction from 70.12 MB would be 67.97 MB; was this what the
pass achieved?
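
The arithmetic behind that question can be checked with a small sketch. The helper names below are made up for illustration and are not part of the patch; the figures are the ones quoted from the statistics.

```c
/* Size after applying a percentage reduction, e.g. 3.07 for -3.07%.  */
static double
size_after_reduction (double before_mb, double percent)
{
  return before_mb * (1.0 - percent / 100.0);
}

/* Signed percentage change when going from BEFORE_MB to AFTER_MB.  */
static double
percent_change (double before_mb, double after_mb)
{
  return (after_mb - before_mb) / before_mb * 100.0;
}
```

With these, size_after_reduction (70.12, 3.07) comes out at roughly 67.97, and percent_change (11.95, 11.44) at roughly -4.27, which agrees with the Inkscape line but not with an unchanged 70.12 MB for Firefox.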

[...snip...]

Thanks (nice patch, btw)
Dave



Re: [PATCH, Pointer Bounds Checker 28/x] IPA CP

2014-06-17 Thread Jeff Law

On 06/17/14 07:41, Martin Jambor wrote:

Hi,

On Wed, Jun 11, 2014 at 05:47:36PM +0400, Ilya Enkovich wrote:


Here is the fixed version.


I'm fine with the ipa-cp hunks but I cannot approve them, Honza is the
right person to ask.
I'll step in and say these bits are fine :-)  Thanks for the reviews 
Martin.


Ilya, please hold off installing until all the patches are approved. 
We're obviously trying to keep up with them as they come in.



jeff



Re: [PATCH][genattrtab] Fix memory corruption, allocate enough memory for all bypassed reservations

2014-06-17 Thread Jeff Law

On 06/17/14 02:12, Kyrill Tkachov wrote:


On 16/06/14 17:39, Jeff Law wrote:

On 06/16/14 04:12, Kyrill Tkachov wrote:


Doh, you're right. I did consider it but for some reason thought we
might want to iterate over all of the bypasses anyway. Breaking out
seems good.

How about this?
Tested on arm and aarch64 and confirmed with valgrind that no out of
bounds accesses occur.
I kicked off an x86_64 bootstrap but don't expect any problems.

Thanks,
Kyrill

genattrtab-bypasses.patch


commit 676b85f7a7cc1446482334dcaad457ac328875a8
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Fri Jun 13 11:09:57 2014 +0100

  [genattrtab] Fix memory corruption with bypasses

I'm an idiot.  n_bypassed is used to size the vector, so you do have to
walk the entire list.


AFAICS in the loop in process_bypasses we want to count all the
reservations which have a bypass matching them. Once a reservation is
matched with a bypass it should be safe to break out of the inner loop
(over the bypasses), even if two bypasses match a reservation we only
want to count the reservation once.

So I think the 2nd version of the patch is good
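
A minimal sketch of the counting described above, to make the early exit concrete. The struct and field names are hypothetical stand-ins, not the generator's real data structures, and plain string equality stands in for whatever matching the real code performs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the generator's data structures.  */
struct reservation { const char *name; };
struct bypass { const char *in_pattern; };

/* Count the reservations matched by at least one bypass.  Each
   reservation is counted once, so the inner loop stops at the first
   matching bypass.  */
static size_t
count_bypassed (const struct reservation *res, size_t n_res,
                const struct bypass *byp, size_t n_byp)
{
  size_t n_bypassed = 0;

  for (size_t i = 0; i < n_res; i++)
    for (size_t j = 0; j < n_byp; j++)
      if (strcmp (res[i].name, byp[j].in_pattern) == 0)
        {
          n_bypassed++;
          break;   /* A second matching bypass must not count again.  */
        }

  return n_bypassed;
}
```

The result is what would size the vector, so every reservation still has to be visited; only the walk over the bypasses is cut short.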

OK.  Approved.

jeff



Re: [PING][PATCH, trunk, 4.9, 4.8] Fix PR57653, filename information discarded when using -imacros

2014-06-17 Thread Jeff Law

On 06/11/14 15:15, Peter Bergner wrote:

I'd like to ping the following patch that fixes PR57653.  This did
bootstrap and regtest with no regressions on powerpc64-linux.

 https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01571.html

Is this ok for trunk, 4.9 and 4.8?

Whee, fun.

So this led me to an interesting exchange between Per & DJ on some of 
Per's changes in this space.


Sadly, it doesn't look like Per checked in any tests for the problems DJ 
was running into.


I hate to ask Peter, but can you add some testcases?  These messages 
have the originals which led to the unsightly code we have now.


https://gcc.gnu.org/ml/gcc-patches/2003-10/msg02694.html
https://gcc.gnu.org/ml/gcc-patches/2003-11/msg00163.html

I know 57653's problem is specific to the stdc-predef that's included in 
glibc-2.17 and later, but that's becoming relatively common at this 
point.  I think c#2 has the testcase.


Approved with the tests added.

Thanks and sorry for the delay.

Jeff


Re: [patch i386]: Combine memory and indirect jump

2014-06-17 Thread Kai Tietz
2014-06-17 21:26 GMT+02:00 Jeff Law l...@redhat.com:
 On 06/13/14 10:59, Kai Tietz wrote:

 2014-06-13 17:58 GMT+02:00 Jeff Law l...@redhat.com:

 On 06/13/14 09:56, Richard Henderson wrote:


 On 06/13/2014 08:36 AM, Jeff Law wrote:


 So you may have answered this already, but why can't this be a combiner
 pattern?



 Until pass_duplicate_computed_gotos, we (intentionally) have a single
 indirect
 branch in the entire function.  This vastly reduces the size of the CFG.


 Ah, the factoring bits.  Should have known.



 Peep2 is currently running before d_c_g, so currently Kai can't solve
 this
 problem in peep2.

 I don't think peep2 should run after sched2, but I'll bet we can reorder
 things
 a bit so that d_c_g runs before peep2.


 Yea, seems worth a try.

 jeff


 Well, I tested putting the second peephole2 pass before the sched2 pass.
 That works in general.  There are just some opportunities which weren't
 caught then.  I attached a sample, which demonstrates that pretty
 well.  I noticed that putting that pass behind the reload blocks was
 necessary for a better hit-rate of the peephole optimization.

 So can you tell us why this sample code misses opportunities?  Otherwise we
 have to dig into it ourselves to tease out that information.

 I think we're zeroing in on a path to move d_c_g before peep2, but I'd like
 to have a clearer understanding of why we'd still be missing opportunities.
 If we can avoid running peep2 twice, that'd be good.

 jeff

Hi Jeff,

I just retested my testcase with recent source. I can't reproduce
this missed optimization before sched2 pass anymore.  I moved second
peephole2 pass just before split_before_sched2 and everything got
caught.

Removing the first peephole2 pass seems to cause weaker code for
impossible pushes, etc.

Nevertheless, might it make sense to make this new peephole a
define_split instead?  I admit that this operation isn't a split, but
we would avoid a second peephole pass.

Kai


Re: [patch] improve sloc assignment on bind_expr entry/exit code

2014-06-17 Thread Jeff Law

On 06/11/14 09:02, Olivier Hainque wrote:

Hello,

For blocks requiring it, the gimplifier generates stack pointer
save/restore operations on entry/exit, per:

  gimplify_bind_expr (...)

   if (gimplify_ctxp->save_stack)
 {
   gimple stack_restore;

   /* Save stack on entry and restore it on exit.  Add a try_finally
 block to achieve this.  */
   build_stack_save_restore (&stack_save, &stack_restore);

   gimplify_seq_add_stmt (&cleanup, stack_restore);
 }

   /* Add clobbers for all variables that go out of scope.  */
   ...

There is no specific location assigned to these entry/exit statements
so they eventually inherit slocs coming from preceding statements.

This is problematic for tools relying on debug info to infer
which statements were executed out of execution traces (allowing
coverage analysis without code instrumentation).

An example of problematic scenario is provided below.

The attached patch is a proposal to improve this by propagating
start and end of block locations from the block structure to the
few gimple statements we generate. It adds an end_locus to the
block structure for this purpose, which the Ada front-end knows
how to fill already.
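
To illustrate the idea outside of the compiler, here is a toy model. None of these types or functions are GCC's; they are made up to show how an unset location ends up inherited from the preceding statement and how stamping the entry/exit statements from the block's start and end locations avoids that.

```c
#include <stddef.h>

typedef unsigned int loc_t;
#define UNKNOWN_LOC 0u

/* Toy statement and block: only the source location matters here.  */
struct stmt { loc_t loc; };
struct block { loc_t start_locus; loc_t end_locus; };

/* Model of what a debug-info emitter sees: a statement without a
   location of its own inherits the one from the preceding statement.  */
static void
resolve_locs (struct stmt *seq, size_t n)
{
  loc_t prev = UNKNOWN_LOC;

  for (size_t i = 0; i < n; i++)
    {
      if (seq[i].loc == UNKNOWN_LOC)
        seq[i].loc = prev;
      prev = seq[i].loc;
    }
}

/* The proposal in miniature: give the generated entry/exit statements
   the block's start and end locations before anything is inherited.  */
static void
stamp_block_locs (struct stmt *entry, struct stmt *exit_stmt,
                  const struct block *b)
{
  entry->loc = b->start_locus;
  exit_stmt->loc = b->end_locus;
}
```

In this model a stack save placed after a statement at line 10 would be attributed to line 10 unless it is stamped first, which is the kind of misattribution a trace-based coverage tool would trip over.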

I verified that it does insert proper .loc directives before the
entry/exit code on the example. The patch also bootstraps and regtests
fine for languages=all,ada on x86_64-pc-linux-gnu.

OK to commit ?

Thanks in advance for your feedback,

With Kind Regards,

Olivier

--

2014-06-11  Olivier Hainque  hain...@adacore.com

* tree-core.h (tree_block): Add an end_locus field, allowing
memorization of the end of block source location.
* tree.h (BLOCK_SOURCE_END_LOCATION): New accessor.
* gimplify.c (gimplify_bind_expr): Propagate the block start and
end source location info we have on the block entry/exit code we
generate.
OK.  I assume y'all will add a suitable test to the Ada testsuite and 
propagate it into the GCC testsuite in due course?


jeff



Re: [PATCH, rs6000] Remove XFAIL from default_format_denormal_2.f90 for PowerPC on Linux

2014-06-17 Thread Rainer Orth
William J. Schmidt wschm...@linux.vnet.ibm.com writes:

 Index: gcc/testsuite/gfortran.dg/default_format_denormal_2.f90
 ===
 --- gcc/testsuite/gfortran.dg/default_format_denormal_2.f90   (revision 
 211741)
 +++ gcc/testsuite/gfortran.dg/default_format_denormal_2.f90   (working copy)
 @@ -1,5 +1,5 @@
  ! { dg-require-effective-target fortran_large_real }
 -! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } }
 +! { dg-do run { xfail powerpc*-apple-darwin* } }
  ! Test XFAILed on these platforms because the system's printf() lacks
  ! proper support for denormalized long doubles. See PR24685

You should also update the comment: `these platforms' no longer applies.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 4/5] Existing tests fix

2014-06-17 Thread Rainer Orth
Jeff Law l...@redhat.com writes:

 On 06/13/14 04:48, mliska wrote:
 Hi,
many tests rely on a precise number of scanned functions in a dump file. 
 If IPA ICF decides to merge some function and(or) read-only variables, 
 counts do not match.

 Martin

 Changelog:

 2014-06-13  Martin Liska  mli...@suse.cz
  Honza Hubicka  hubi...@ucw.cz

  * c-c++-common/rotate-1.c: Text

   ^ Huh?

  * c-c++-common/rotate-2.c: New test.
  * c-c++-common/rotate-3.c: Likewise.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, Pointer Bounds Checker 29/x] Debug info

2014-06-17 Thread Jeff Law

On 06/11/14 02:50, Ilya Enkovich wrote:

Hi,

This patch skips all bounds during debug info generation.

Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-06-11  Ilya Enkovich  ilya.enkov...@intel.com

* dbxout.c (dbxout_type): Ignore POINTER_BOUNDS_TYPE.
* dwarf2out.c (gen_subprogram_die): Ignore bound args.
(gen_type_die_with_usage): Skip pointer bounds.
(dwarf2out_global_decl): Likewise.
(is_base_type): Support POINTER_BOUNDS_TYPE.
(gen_formal_types_die): Skip pointer bounds.
(gen_decl_die): Likewise.
* var-tracking.c (vt_add_function_parameters): Skip
bounds parameters.
OK.  Note that sdbout might need updating as well.   It's used even less 
than dbxout, but if you can see how to skip bounds in there too, it'd be 
appreciated.


It looks like mingw/cygwin still use sdbout (?!?), so if you need 
something tested, you can ping Kai Tietz.


jeff



Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-06-17 Thread Jeff Law

On 06/12/14 08:34, Christian Bruel wrote:

On 06/11/2014 02:00 PM, Christian Bruel wrote:

On 06/11/2014 06:17 AM, Joern Rennecke wrote:

Joern, is this new target macro interface OK with you ?

Yes, this interface should allow me to do switches between rounding
and truncating
floating-point modes with an add/subtract immediate.

However, the implementation, as posted, doesn't work - it causes memory
corruption.

It appears to work with the attached amendment patch.


Indeed,  thanks for pointing out the bad reusing of the aux field
between multiple entities.

In fact rereading this part of the implementation, I find the allocation
of aux*n_entities awkward. A simpler setting in the entity loop to carry
the mode directly into eg->aux is possible without array allocation
(which also fixes a memory leak by the way).


Here is the revised version fixing the aforementioned issue found by
Joern on Epiphany. It also simplifies the allocation of the aux edges
field to carry the modes.

Now that everyone agrees on the interface, is this OK for trunk ?

bootstrapped/regtested for X86 and SH4a.

thanks,

Christian







toggle.patch


2014-06-12  Christian Bruel  christian.br...@st.com

* mode-switching.c (struct bb_info): Add mode_out, mode_in caches.
(make_preds_opaque): Delete.
(clear_mode_bit, mode_bit_p, set_mode_bit): New macros.
(commit_mode_sets): New function.
(optimize_mode_switching): Handle current_mode to mode_switching_emit.
Process all modes at once.
* basic-block.h (pre_edge_lcm_avs): Declare.
* lcm.c (pre_edge_lcm_avs): Renamed from pre_edge_lcm.
Call clear_aux_for_edges. Fix comments.
(pre_edge_lcm): New wrapper function to call pre_edge_lcm_avs.
(pre_edge_rev_lcm): Idem.
* config/epiphany/epiphany.c (emit_set_fp_mode): Add prev_mode 
parameter.
* config/epiphany/epiphany-protos.h (emit_set_fp_mode): Idem.
* config/epiphany/resolve-sw-modes.c (pass_resolve_sw_modes::execute): 
Idem.
* config/i386/i386.c (x96_emit_mode_set): Idem.
* config/sh/sh.c (sh_emit_mode_set): Likewise. Handle PR toggle.
* config/sh/sh.md (toggle_pr):  Defined if TARGET_FPU_SINGLE.
(fpscr_toggle) Disallow from delay slot.
* target.def (emit_mode_set): Add prev_mode parameter.
* doc/tm.texi: Regenerate.

2014-06-12  Christian Bruel  christian.br...@st.com

* gcc.target/sh/fpchg.c: New test.

This is fine for the trunk.

Thanks for your patience,
Jeff



Re: Bug 61407 - Build errors on latest OS X 10.10 Yosemite with Xcode 6 on GCC 4.8.3

2014-06-17 Thread Mike Stump
On Jun 17, 2014, at 4:09 AM, Илья Михальцов morph...@gmail.com wrote:
 This patch fixes gcc build problems on the latest OS X 10.10 SDK beta (see 
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61407)

fix include hack to add:

 +#ifndef __has_feature
 +#define __has_feature(x) 0
 +#endif

So, I’d like to bring this up in the larger context of autoconf, portable code, 
and what style we’d like people to write code in.

From a darwin .h file in /usr/include:

#if defined(__has_feature) && defined(__has_attribute)
#if __has_attribute(deprecated)
#define DEPRECATED_ATTRIBUTE        __attribute__((deprecated))
#if __has_feature(attribute_deprecated_with_message)
#define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated(s)))
#else
#define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated))
#endif
#else
#define DEPRECATED_ATTRIBUTE
#define DEPRECATED_MSG_ATTRIBUTE(s)
#endif
#elif defined(__GNUC__) && ((__GNUC__ >= 4) || ((__GNUC__ == 3) && 
(__GNUC_MINOR__ >= 1)))
#define DEPRECATED_ATTRIBUTE        __attribute__((deprecated))
#if (__GNUC__ >= 5) || ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 5))
#define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated(s)))
#else
#define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated))
#endif
#else

I think this serves as a great introduction to the feature: what it is, why 
it exists and what it attempts to do.  In short, it gives code writers the 
ability to smell a port via #if and/or if (), and to write portable code 
without using autoconf.  Yes, for some truly hard problems, this scheme breaks 
down, but if gcc and other vendor compilers follow the scheme and define these 
as appropriate, then users can make use of this scheme instead of autoconf.  
It was code like #if defined(__GNUC__) that caused clang to lie and say it is 
gnuc.  It does this because the code doesn’t use a fine-grained check for the 
feature, but rather a coarse-grained check on __GNUC__, which is wrong, as 
other compilers that are not gcc implement __attribute__ and 
__attribute__((deprecated)).

http://clang.llvm.org/docs/LanguageExtensions.html has the names for the things 
that clang defines.  In gcc, we could elect to use the same names and define 
them as appropriate for gcc.  I think if gcc did this, then the quoted fix 
isn’t necessary.  Also, if gcc doesn’t want to do this, it is reasonable for 
the darwin port to so define features, they tend to be large scale and slow 
moving and monotonic in nature, so the maintenance of them should be low in 
general.
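
As a minimal sketch of the style being discussed, assuming only the two checks already visible in the header excerpt: the fallback definitions make the checks valid on a compiler that lacks them, and the decision is then made per feature rather than per vendor. The MY_ and HAVE_ macro names and the two functions are made up for the example.

```c
/* Fallbacks, so the fine-grained checks below are valid everywhere.  */
#ifndef __has_feature
#define __has_feature(x) 0
#endif
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Ask about the attribute itself instead of testing __GNUC__.  */
#if __has_attribute(deprecated)
#define MY_DEPRECATED __attribute__((deprecated))
#define HAVE_DEPRECATED_ATTR 1
#else
#define MY_DEPRECATED
#define HAVE_DEPRECATED_ATTR 0
#endif

/* Marked deprecated only where the compiler says it understands the
   attribute; elsewhere the declaration is a plain one.  */
MY_DEPRECATED int old_increment (int x);

int
new_increment (int x)
{
  return x + 1;
}

int
old_increment (int x)
{
  return new_increment (x);
}
```

A compiler that defines neither check simply takes the empty definitions, so the code stays portable without a configure step.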

What do people think?

Re: [PATCH, loop2_invariant] Pre-check invariants

2014-06-17 Thread Jeff Law

On 06/11/14 03:35, Zhenqiang Chen wrote:


Thanks for the comments. df_live seems redundant.

With flag_ira_loop_pressure, the pass will call df_analyze () at the
beginning, which can make sure all the DF info are correct.

Can we guarantee all DF_... correct without df_analyze ()?

They should be fine in this context.



+/* Pre-check candidate DEST to skip the one which can not make a valid insn
+   during move_invariant_reg.  SIMPlE is to skip HARD_REGISTER.  */

s/SIMPlE/SIMPLE/



+ {
+   /* Multi definitions at this stage, most likely are due to
+  instruction constrain, which requires both read and write

s/constrain/constraints/

Though that doesn't make sense.  Constraints don't come into play until 
much later in the pipeline.   Certainly there's been code in the 
expanders and elsewhere to try and make the code we generate more 
acceptable to 2-address targets and that's probably what you're really 
running into.   I think the code is fine, but that you need to improve 
the comment.


ISTM that if your primary focus is to filter out read/write operands, 
then just say that and ignore the constraints or other mechanisms by 
which we got a read/write pseudo.


So I think with those two small comment changes, this patch is OK for 
the trunk.  Please post the final version for archival purposes before 
checking it in.


Thanks,
Jeff


Re: [PATCH, i386, Pointer Bounds Checker 17/x] Pointer bounds constants support

2014-06-17 Thread Jeff Law

On 06/06/14 03:11, Ilya Enkovich wrote:

2014-06-04 10:58 GMT+04:00 Jeff Law l...@redhat.com:

On 06/02/14 04:25, Ilya Enkovich wrote:


Hi,

This patch adds support for pointer bounds constants to be used as
DECL_INITIAL for constant bounds (like zero bounds).

Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-05-30  Ilya Enkovich  ilya.enkov...@intel.com

 * emit-rtl.c (immed_double_const): Support MODE_POINTER_BOUNDS.
 (init_emit_once): Build pointer bounds zero constants.
 * explow.c (trunc_int_for_mode): Likewise.
 * varpool.c (ctor_for_folding): Do not fold constant
 bounds vars.
 * varasm.c (output_constant_pool_2): Support MODE_POINTER_BOUNDS.
 * config/i386/i386.c (ix86_legitimate_constant_p): Mark
 bounds constant as not valid.



[ ... ]




@@ -5875,6 +5876,11 @@ init_emit_once (void)
 if (STORE_FLAG_VALUE == 1)
   const_tiny_rtx[1][(int) BImode] = const1_rtx;

+  for (mode = GET_CLASS_NARROWEST_MODE (MODE_POINTER_BOUNDS);
+   mode != VOIDmode;
+   mode = GET_MODE_WIDER_MODE (mode))
+const_tiny_rtx[0][mode] = immed_double_const (0, 0, mode);


I'm pretty sure GET_CLASS_NARROWEST_MODE should be taking a class, not a
mode as its argument.  So something is clearly wrong here...


MODE_POINTER_BOUNDS is a class. Modes in this class are BND32mode and BND64mode.
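
The shape of that loop can be shown with a toy model. The tables below are made up for the example and are not GCC's mode tables; only the names MODE_POINTER_BOUNDS, BND32mode, BND64mode and VOIDmode come from this thread. Each class has a narrowest mode, each mode has a next wider one, and VOIDmode ends the chain.

```c
/* Toy model of mode classes and modes; not GCC's real tables.  */
enum mode_class { MODE_INT, MODE_POINTER_BOUNDS, N_MODE_CLASSES };
enum machine_mode
{
  VOIDmode, QImode, HImode, SImode, BND32mode, BND64mode, N_MODES
};

/* Narrowest mode of each class, indexed by class.  */
static const enum machine_mode class_narrowest[N_MODE_CLASSES] =
  { QImode, BND32mode };

/* Next wider mode within the same class, indexed by mode; VOIDmode
   marks the end of the chain.  */
static const enum machine_mode mode_wider[N_MODES] =
  { VOIDmode, HImode, SImode, VOIDmode, BND64mode, VOIDmode };

/* Walk a class from its narrowest mode upwards, the same shape as the
   loop in the hunk above, and count the modes visited.  */
static int
count_modes_in_class (enum mode_class c)
{
  int n = 0;

  for (enum machine_mode m = class_narrowest[c];
       m != VOIDmode;
       m = mode_wider[m])
    n++;

  return n;
}
```

In this model the walk over MODE_POINTER_BOUNDS visits exactly the two bounds modes named above.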

Bah.  You're right.  Approved.

jeff




Re: [PATCH, loop2_invariant, 1/2] Check only one register class

2014-06-17 Thread Jeff Law

On 06/11/14 04:05, Zhenqiang Chen wrote:

On 10 June 2014 19:06, Steven Bosscher stevenb@gmail.com wrote:

On Tue, Jun 10, 2014 at 11:22 AM, Zhenqiang Chen wrote:

Hi,

For loop2-invariant pass, when flag_ira_loop_pressure is enabled,
function gain_for_invariant checks the pressures of all register
classes. This does not make sense since one invariant might impact
only one register class.

The patch enhances functions get_inv_cost and gain_for_invariant to
check only the register pressure of the invariant if possible.


This patch may work for targets with more-or-less orthogonal reg
classes, but not if there is a lot of overlap between reg classes.


Yes. I need to check the overlap between reg classes.

Patch is updated to check all overlap reg classes by reg_classes_intersect_p:

Just so I'm sure I know what you're trying to do.

You want to map the pseudo back to its likely class(es) then look at how 
those classes (and only those classes) would be impacted from a register 
pressure standpoint if the pseudo was hoisted as an invariant?


This is primarily achieved by returning the class of the invariant, then 
filtering out any non-intersecting classes in gain_for_invariant, right?
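
A toy sketch of that filtering, under stated assumptions: register classes are modelled as bitmasks of hard registers, classes_intersect_p is a made-up stand-in for the real intersection test, and the pressure numbers are invented. It is not the patch's code.

```c
#include <stdint.h>

enum { N_REG_CLASSES_TOY = 3 };

/* Hypothetical register classes as bitmasks of hard registers.  */
static const uint32_t class_regs[N_REG_CLASSES_TOY] =
{
  0x000000ffu,   /* class 0: r0-r7 */
  0x0000ff00u,   /* class 1: r8-r15 */
  0x0000000fu    /* class 2: r0-r3, a subset of class 0 */
};

/* Two classes intersect if they share at least one hard register.  */
static int
classes_intersect_p (int a, int b)
{
  return (class_regs[a] & class_regs[b]) != 0;
}

/* Count the classes that would run out of registers if an invariant of
   class INV_CLASS needing EXTRA registers were hoisted.  Classes that
   do not intersect INV_CLASS are skipped entirely.  */
static int
overflowing_classes (int inv_class,
                     const int pressure[N_REG_CLASSES_TOY],
                     const int available[N_REG_CLASSES_TOY],
                     int extra)
{
  int n = 0;

  for (int c = 0; c < N_REG_CLASSES_TOY; c++)
    {
      if (!classes_intersect_p (inv_class, c))
        continue;
      if (pressure[c] + extra > available[c])
        n++;
    }

  return n;
}
```

With class 1 already full, an invariant that lives in class 0 is not penalized for it, while one that lives in class 1 is.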


jeff




Re: [PATCH] Fortran OpenMP 4.0 target support

2014-06-17 Thread Tobias Burnus

Jakub Jelinek wrote:

This patch adds the target directives.
Tested both normally plus with target.c/splay-tree.c from
gomp-4_0-branch@203409 plus the attached patch against
target.c to implement the new to_pset map kind (5) and
allow handling of NULL.  That patch will need to be forward
ported to whatever gomp-4_0-branch now has after this is merged
from trunk to that branch.

Does this look reasonable to Fortran maintainers?


Thanks for the patch! I browsed through the patch, and it looked good to 
me. (However, given that the patch has 48 files changed, 3342 
insertions(+), 330 deletions(-), I didn't check every line.)


If I did the book keeping correctly, a patch for an alignment test case 
is still missing. As are the changes for some corner cases for which the 
OpenMP ARB has to provide some feedback. Any news from that side? 
Otherwise and aside of 4.9.1 backporting, it now looks pretty complete.


Tobias


Re: [PATCH] Fortran OpenMP 4.0 target support

2014-06-17 Thread Jakub Jelinek
On Tue, Jun 17, 2014 at 11:59:22PM +0200, Tobias Burnus wrote:
 This patch adds the target directives.
 Tested both normally plus with target.c/splay-tree.c from
 gomp-4_0-branch@203409 plus the attached patch against
 target.c to implement the new to_pset map kind (5) and
 allow handling of NULL.  That patch will need to be forward
 ported to whatever gomp-4_0-branch now has after this is merged
 from trunk to that branch.
 
 Does this look reasonable to Fortran maintainers?
 
 Thanks for the patch! I browsed through the patch, and it looked good to me.
 (However, given that the patch has 48 files changed, 3342 insertions(+),
 330 deletions(-), I didn't check every line.)
 
 If I did the book keeping correctly, a patch for an alignment test case is
 still missing. As are the changes for some corner cases for which the OpenMP
 ARB has to provide some feedback. Any news from that side? Otherwise and
 aside of 4.9.1 backporting, it now looks pretty complete.

I think some work is needed in tree-nested.c, ideally write a testcase
that tests all the new OpenMP 4.0 clauses in contained functions with and
without non-local decls (and with local decls used by contained functions).

One of the omp-lang answers shows some work is needed on the UDRs too, in
particular that the combiner/initializer should not be resolved as part of
the UDR directive, but only when used in a reduction clause where not only
the typespec, but also rank/shape, pointer/allocatable etc. are known.

Some further restriction checking is probably needed + backing that with
testcases.  And wait for further omp-lang/omp-f2003 feedback.

Jakub


Re: [Patch, microblaze]: Added load and store reverse patterns

2014-06-17 Thread Michael Eager

On 02/10/14 17:55, Michael Eager wrote:

On 11/25/13 23:54, David Holsgrove wrote:

Added the lwr/swr instructions pattern.
lwr and swr instructions will load/store the data with opposite endianness.

Changelog

2013-11-26  Nagaraju Mekala nagaraju.mek...@xilinx.com

  * gcc/config/microblaze/microblaze.md: Add movsi4_rev insn pattern.
  * gcc/config/microblaze/predicates.md: Add reg_or_mem_operand predicate.



GCC-head: Committed revision 207683.
GCC-4.8-branch: Committed revision 207684.


Reverted GCC-4.8-branch commit.
Committed revision 211750.

--
Michael Eager  ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Re: [PATCH, ARM] MI-thunk fix for TARGET_THUMB1_ONLY

2014-06-17 Thread Ramana Radhakrishnan
On Sun, Jun 8, 2014 at 12:27 PM, Chung-Lin Tang clt...@codesourcery.com wrote:
 Hi Richard, Ramana,

 Attached is a small fix for resolving a g++.old-deja/g++.jason/thunk2.C
 regression we found under a TARGET_THUMB1_ONLY multilib (-mthumb
 -march=armv6-m to be exact). Basically under those conditions, the thunk
 is in Thumb mode, so the subtraction should be 4 rather than 8.

Yep, this is OK with a minor change to the comment to make it more explicit.

+  /* Output .word .LTHUNKn-[37]-.LTHUNKPCn.  */

s/37/3,7/
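
For background on why the constant depends on the instruction set state: reading PC in ARM state yields the address of the current instruction plus 8, and in Thumb state plus 4. The sketch below models only that part; the helper names are made up, and it ignores whatever further adjustment is folded into the 3 and 7 of the comment, which differ by the same 4.

```c
/* Offset added to the instruction's own address when it reads PC.  */
static int
pc_read_bias (int thumb_state)
{
  return thumb_state ? 4 : 8;
}

/* Value to store in a literal word so that adding it to the PC, as read
   by the instruction at PC_INSN_ADDR, yields TARGET.  */
static long
pcrel_literal (unsigned long target, unsigned long pc_insn_addr,
               int thumb_state)
{
  return (long) (target - (pc_insn_addr + pc_read_bias (thumb_state)));
}
```

Using the ARM-state bias in a thunk that actually runs in Thumb state leaves the computed address off by 4, which is the kind of mismatch the fix addresses.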


Ok with that change and if no regressions.

OK for release branches unless the RMs object in 24 hours.

It would be nice to see if we could rewrite the mi thunk code like
other backends but that's the matter of a separate patch.

Ramana

 Original patch was by Julian, with trivial adaptations for trunk by me.
 We've been carrying this fix for a while by now. Okay for trunk? (and
 stable branches?)

 Thanks,
 Chung-Lin

 2014-06-08  Julian Brown  jul...@codesourcery.com
 Chung-Lin Tang  clt...@codesourcery.com

 * config/arm/arm.c (arm_output_mi_thunk): Fix offset for
 TARGET_THUMB1_ONLY. Add comments.


Re: [PATCH, PR61219]: Fix sNaN handling in ARM float to double conversion

2014-06-17 Thread Ramana Radhakrishnan
On Sun, May 18, 2014 at 10:23 PM, Aurelien Jarno aurel...@aurel32.net wrote:
 On ARM soft-float, the float to double conversion doesn't convert a sNaN
 to qNaN as the IEEE Std 754 standard mandates:

 Under default exception handling, any operation signaling an invalid
 operation exception and for which a floating-point result is to be
 delivered shall deliver a quiet NaN.

 Given the soft float ARM code ignores exceptions and always provides a
 result, a float to double conversion of a signaling NaN should return a
 quiet NaN. Fix this in extendsfdf2.


 2014-05-18  Aurelien Jarno  aurel...@aurel32.net

 PR target/61219
 * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.

OK if there are no regressions, but please add a testcase to catch this case
and fix the PR number.
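
A minimal sketch of the property such a testcase could check, assuming IEEE 754
binary32 and binary64 layouts. The helper extend_f32_bits is hypothetical and
written only for illustration; it is not the libgcc routine. It widens a float
bit pattern using integer operations and sets the quiet bit (bit 51 of the
double) for any NaN input:

```c
#include <stdint.h>

/* Hypothetical helper, not libgcc code: widen an IEEE binary32 bit
   pattern to a binary64 bit pattern using integer operations only.  */
static uint64_t
extend_f32_bits (uint32_t f)
{
  uint64_t sign = (uint64_t) (f >> 31) << 63;
  uint32_t exp = (f >> 23) & 0xff;
  uint64_t man = f & 0x7fffff;

  if (exp == 0xff)
    {
      /* INF keeps a zero mantissa; every NaN gets the quiet bit set.  */
      uint64_t out = sign | (0x7ffULL << 52) | (man << 29);
      if (man != 0)
        out |= 1ULL << 51;
      return out;
    }

  if (exp == 0)
    {
      int shift = 0;

      if (man == 0)
        return sign;            /* Signed zero.  */

      /* Denormal input: normalize the mantissa.  */
      while (!(man & 0x800000))
        {
          man <<= 1;
          shift++;
        }
      man &= 0x7fffff;
      return sign | ((uint64_t) (0x380 + 1 - shift) << 52) | (man << 29);
    }

  /* Normal number: rebias the exponent by 1023 - 127 = 0x380.  */
  return sign | ((uint64_t) (exp + 0x380) << 52) | (man << 29);
}
```

With this model, the signaling NaN pattern 0x7fa00000 widens to a double whose
quiet bit is set, while INF, zero and ordinary numbers are unchanged in value.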

Sorry about the slow review.

Ramana



 Index: libgcc/config/arm/ieee754-df.S
 ===
 --- libgcc/config/arm/ieee754-df.S  (revision 210588)
 +++ libgcc/config/arm/ieee754-df.S  (working copy)
 @@ -473,11 +473,15 @@
 eorne   xh, xh, #0x38000000 @ fixup exponent otherwise.
 RETc(ne)                    @ and return it.

 -   teq     r2, #0              @ if actually 0
 -   do_it   ne, e
 -   teqne   r3, #0xff000000     @ or INF or NAN
 +   bics    r2, r2, #0xff000000 @ isolate mantissa
 +   do_it   eq                  @ if 0, that is ZERO or INF,
 RETc(eq)                    @ we are done already.

 +   teq     r3, #0xff000000     @ check for NAN
 +   do_it   eq, t
 +   orreq   xh, xh, #0x00080000 @ change to quiet NAN
 +   RETc(eq)                    @ and return it.
 +
 @ value was denormalized.  We can normalize it now.
 do_push {r4, r5, lr}
 mov r4, #0x380  @ setup corresponding exponent

 --
 Aurelien Jarno  GPG: 4096R/1DDD8C9B
 aurel...@aurel32.net http://www.aurel32.net


Re: [RFC ARM] Error if overriding --with-tune by --with-cpu

2014-06-17 Thread Ramana Radhakrishnan
On Fri, May 30, 2014 at 5:34 PM, James Greenhalgh
james.greenha...@arm.com wrote:

 Hi,

 We error in the case where both --with-tune and --with-cpu are specified at
 configure time. In this case, we cannot distinguish this situation from the
 situation where --with-tune was specified at configure time and -mcpu was
 passed on the command line, so we give -mcpu precedence.

 This might be surprising if you expect the precedence rules we give
 to the command line options, but we can't change this precedence without
 breaking our definition of -mcpu.

 We also promote the warning which used to be thrown in the case of
 --with-arch and --with-cpu to an error.

Ok by me - Especially as Bin has just run into it as part of his
testing. Obviously no one watches these warnings and they don't
realize what's happening under their feet.


 I've marked this as an RFC as it isn't clear that configure should be
 catching something like this. Other blatant errors in configuration
 options like passing --with-languages=c,c++ pass without event.

Well, yeah, that looks OK.


 Tested with a few combinations of configure options with no issues and the
 expected behaviour.

 Any opinions, and if not, OK for trunk?


I am going to give this a week for anyone else to pitch in and object
- otherwise please apply it and document this change in behaviour in
the caveats section for the next release (changes.html).

Ramana



 Thanks,
 James

 ---
 gcc/

 2014-05-30  James Greenhalgh  james.greenha...@arm.com

 * config.gcc (supported_defaults): Error when passing either
 --with-tune or --with-arch in conjunction with --with-cpu for ARM.


[PATCH, rs6000] Fix PR61542 - V4SF vector extract for little endian

2014-06-17 Thread Bill Schmidt
Hi,

As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61542, a
new test case (gcc.dg/vect/vect-nop-move.c) was added in 4.9.  This
exposes a bug on PowerPC little endian for extracting an element from a
V4SF value that goes back to 4.8.  The following patch fixes the
problem.

Tested on powerpc64le-unknown-linux-gnu with no regressions.  Ok to
commit to trunk?  I would also like to commit to 4.8 and 4.9 as soon as
possible to be picked up by the distros.

I would also like to backport gcc.dg/vect/vect-nop-move.c to 4.8 to
provide regression coverage.

Thanks,
Bill


2014-06-17  Bill Schmidt  wschm...@linux.vnet.ibm.com

* config/rs6000/vsx.md (vsx_extract_v4sf): Fix bug with element
extraction other than index 3.


Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md    (revision 211741)
+++ gcc/config/rs6000/vsx.md    (working copy)
@@ -1667,7 +1667,7 @@
 {
   if (GET_CODE (op3) == SCRATCH)
op3 = gen_reg_rtx (V4SFmode);
-  emit_insn (gen_vsx_xxsldwi_v4sf (op3, op1, op1, op2));
+  emit_insn (gen_vsx_xxsldwi_v4sf (op3, op1, op1, GEN_INT (ele)));
   tmp = op3;
 }
   emit_insn (gen_vsx_xscvspdp_scalar2 (op0, tmp));




Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb

2014-06-17 Thread Ramana Radhakrishnan
On Fri, May 30, 2014 at 12:19 AM, Maciej W. Rozycki
ma...@codesourcery.com wrote:
 On Wed, 28 May 2014, Richard Earnshaw wrote:

 Ah, light dawns (maybe).

 I guess the problems stem from the attempts to combine Neon with ARMv5.
  Neon shouldn't be used with anything prior to ARMv7, since that's the
 earliest version of the architecture that can support it.

  Good to know, thanks for the hint.  Anyway it's the test case doing
 something silly or maybe just odd.  After all IIUC ARMv5 code will run
 just fine on ARMv7/NEON hardware so mixing up ARMv5 scalar code with NEON
 vector code is nothing wrong per se.

 I guess that what is happening is that we see we have Neon, so start to
 generate a Neon-based copy sequence, but then notice that we don't have
 misaligned access (something that must exist if we have Neon) and
 generate VLDR instructions in a mistaken attempt to work around the
 first inconsistency.

 Maybe we should tie -mfpu=neon to having at least ARMv7 (though ARMv6
 also has misaligned access support).

  So to move away from the odd mixture of instruction selection options
 just as a quick test I rebuilt the same file with `-march=armv7-a
 -mno-unaligned-access' and the result is the same, a pair of VLDR
 instructions accessing unaligned memory, i.e. the same problem.

  So based on observations made so far I think there are two sensible
 ways to move forward:

 1. Fix GCC so that a manual byte-wise copy is made whenever
   `-mno-unaligned-access' is in effect.

#1 is the preferable option.



 2. Revert the change being discussed here as its lone purpose was to
disable the use of VLD1.8, etc. where `-mno-unaligned-access' is in
effect, and it does no good.

Reverting this means pr58041 will fail on armv7-a / neon
configurations, which is what this patch was designed to fix. So
it's not an option, is it?
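
For reference, a minimal sketch of what the byte-wise copy in option 1 means at
the source level. Both helpers are hypothetical and only illustrate the access
pattern; they are not code from the ARM backend. Every memory access is a
single byte, so no alignment requirement arises:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy N bytes one at a time, so the source code
   never requests an access wider than a byte.  */
static void
copy_bytewise (unsigned char *dst, const unsigned char *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}

/* Hypothetical helper: read a 32-bit little-endian value from a
   possibly misaligned address using four byte loads.  */
static uint32_t
load_le32_bytewise (const unsigned char *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}
```

With `-mno-unaligned-access' in effect, the generated code for a copy like this
should not be merged into wider loads from an address that may be misaligned.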

Ramana


   Maciej


Re: [PATCH] [ARM] [RFC] Fix longstanding push_minipool_fix ICE (PR49423, lp1296601)

2014-06-17 Thread Ramana Radhakrishnan
On Wed, Apr 2, 2014 at 2:29 PM, Charles Baylis
charles.bay...@linaro.org wrote:
 Hi

 This patch fixes the push_minipool_fix ICE, which occurs when the ARM
 backend encounters a zero/sign extending load from a constant pool.

 I don't have a current test case for trunk, lp1296601 has a test case
 which affects the linaro-4.8 branch. As far as I know, there has been
 no fix for this on trunk.

 The approach taken in this patch is to extend each pattern where this
 can occur,  so that it triggers a define_split to synthesise a
 constant move instead. Some but not all extend patterns have
 previously added pool_range attributes to work-around this problem,
 this patch removes those, and also fixes the remaining patterns. Some
 patterns have slightly more complex workarounds, which I have not yet
 analysed, but it seems worth posting the patch at this stage to get
 feedback on the general approach.

 Tested on arm-unknown-linux-gnueabihf (qemu), bootstrap in progress.

 If this looks good, I'll clean it up for a more detailed review.

This is an interesting workaround, but can we investigate further how to fix
this at its source rather than working around it in the backend in this form?
It's still a kludge that we would carry in the backend.


Ramana


 Thanks
 Charles


C++ PATCH for c++/60605 (local function and default template arg)

2014-06-17 Thread Jason Merrill
The exception for local declarations in check_default_tmpl_args needs to 
handle DECL_LOCAL_FUNCTION_P, too.


Tested x86_64-pc-linux-gnu, applying to 4.8, 4.9, trunk.
commit 424c657e1213126dc5d2a7231abac05e16713286
Author: Jason Merrill ja...@redhat.com
Date:   Tue Jun 17 18:43:57 2014 +0200

	PR c++/60605
	* pt.c (check_default_tmpl_args): Check DECL_LOCAL_FUNCTION_P.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 85b46fe..a4e1a59 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -4308,7 +4308,8 @@ check_default_tmpl_args (tree decl, tree parms, bool is_primary,
  in the template-parameter-list of the definition of a member of a
  class template.  */
 
-  if (TREE_CODE (CP_DECL_CONTEXT (decl)) == FUNCTION_DECL)
+  if (TREE_CODE (CP_DECL_CONTEXT (decl)) == FUNCTION_DECL
+  || (TREE_CODE (decl) == FUNCTION_DECL && DECL_LOCAL_FUNCTION_P (decl)))
 /* You can't have a function template declaration in a local
scope, nor you can you define a member of a class template in a
local scope.  */
diff --git a/gcc/testsuite/g++.dg/template/local-fn1.C b/gcc/testsuite/g++.dg/template/local-fn1.C
new file mode 100644
index 0000000..88acd17
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/local-fn1.C
@@ -0,0 +1,8 @@
+// PR c++/60605
+
+template <typename T = int>
+struct Foo {
+void bar() {
+void bug();
+}
+};


[PATCH] PR61517: fix stmt replacement in bswap pass

2014-06-17 Thread Thomas Preud'homme
Hi everybody,

Thanks to a comment from Richard Biener, the bswap pass takes care not to
perform its optimization if memory is modified between the loads of the original
expression. However, when it replaces these statements by a single load, it
does so in the gimple statement that computes the final bitwise OR of the
original expression, and memory could be modified between the last load
statement and this bitwise OR statement. The result is then a read of memory
*after* it was changed instead of before.

This patch takes care to move the statement to be replaced close to one of the
original loads, thus avoiding this problem.

ChangeLog entries for this fix are:

*** gcc/ChangeLog ***

2014-06-16  Thomas Preud'homme  thomas.preudho...@arm.com

* tree-ssa-math-opts.c (find_bswap_or_nop_1): Adapt to return a stmt
whose rhs's first tree is the source expression instead of the
expression itself.
(find_bswap_or_nop): Likewise.
(bswap_replace): Rename stmt to cur_stmt. Pass gsi by value and src as a
gimple stmt whose rhs's first tree is the source. In the memory source
case, move the stmt to be replaced close to one of the original loads to
avoid the problem of a store between the load and the stmt's original
location.
(pass_optimize_bswap::execute): Adapt to change in bswap_replace's
signature.

*** gcc/testsuite/ChangeLog ***

2014-06-16  Thomas Preud'homme  thomas.preudho...@arm.com

* gcc.c-torture/execute/bswap-2.c (incorrect_read_le32): New.
(incorrect_read_be32): Likewise.
(main): Call incorrect_read_* to test stmt replacement is made by
bswap at the right place.
* gcc.c-torture/execute/pr61517.c: New test.

Patch also attached for convenience. Is it ok for trunk?

diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c 
b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
index a47e01a..88132fe 100644
--- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
+++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
@@ -66,6 +66,32 @@ fake_read_be32 (char *x, char *y)
   return c3 | c2 << 8 | c1 << 16 | c0 << 24;
 }
 
+__attribute__ ((noinline, noclone)) uint32_t
+incorrect_read_le32 (char *x, char *y)
+{
+  unsigned char c0, c1, c2, c3;
+
+  c0 = x[0];
+  c1 = x[1];
+  c2 = x[2];
+  c3 = x[3];
+  *y = 1;
+  return c0 | c1 << 8 | c2 << 16 | c3 << 24;
+}
+
+__attribute__ ((noinline, noclone)) uint32_t
+incorrect_read_be32 (char *x, char *y)
+{
+  unsigned char c0, c1, c2, c3;
+
+  c0 = x[0];
+  c1 = x[1];
+  c2 = x[2];
+  c3 = x[3];
+  *y = 1;
+  return c3 | c2 << 8 | c1 << 16 | c0 << 24;
+}
+
 int
 main ()
 {
@@ -92,8 +118,17 @@ main ()
   out = fake_read_le32 (cin, &cin[2]);
   if (out != 0x89018583)
     __builtin_abort ();
+  cin[2] = 0x87;
   out = fake_read_be32 (cin, &cin[2]);
   if (out != 0x83850189)
     __builtin_abort ();
+  cin[2] = 0x87;
+  out = incorrect_read_le32 (cin, &cin[2]);
+  if (out != 0x89878583)
+    __builtin_abort ();
+  cin[2] = 0x87;
+  out = incorrect_read_be32 (cin, &cin[2]);
+  if (out != 0x83858789)
+    __builtin_abort ();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61517.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61517.c
new file mode 100644
index 0000000..fc9bbe8
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61517.c
@@ -0,0 +1,19 @@
+int a, b, *c = &a;
+unsigned short d;
+
+int
+main ()
+{
+  unsigned int e = a;
+  *c = 1;
+  if (!b)
+{
+  d = e;
+  *c = d | e;
+}
+
+  if (a != 0)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index c868e92..1ee2ba8 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1804,28 +1804,28 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct 
symbolic_number *n)
 
 /* find_bswap_or_nop_1 invokes itself recursively with N and tries to perform
the operation given by the rhs of STMT on the result.  If the operation
-   could successfully be executed the function returns the tree expression of
-   the source operand and NULL otherwise.  */
+   could successfully be executed the function returns a gimple stmt whose
+   rhs's first tree is the expression of the source operand and NULL
+   otherwise.  */
 
-static tree
+static gimple
 find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit)
 {
   enum tree_code code;
   tree rhs1, rhs2 = NULL;
-  gimple rhs1_stmt, rhs2_stmt;
-  tree source_expr1;
+  gimple rhs1_stmt, rhs2_stmt, source_stmt1;
   enum gimple_rhs_class rhs_class;
 
   if (!limit || !is_gimple_assign (stmt))
-return NULL_TREE;
+return NULL;
 
   rhs1 = gimple_assign_rhs1 (stmt);
 
   if (find_bswap_or_nop_load (stmt, rhs1, n))
-return rhs1;
+return stmt;
 
   if (TREE_CODE (rhs1) != SSA_NAME)
-return NULL_TREE;
+return NULL;
 
   code = gimple_assign_rhs_code (stmt);
   rhs_class = gimple_assign_rhs_class (stmt);
@@ -1848,18 +1848,18 @@ find_bswap_or_nop_1 

Re: [PATCH, loop2_invariant, 1/2] Check only one register class

2014-06-17 Thread Zhenqiang Chen
On 18 June 2014 05:49, Jeff Law l...@redhat.com wrote:
 On 06/11/14 04:05, Zhenqiang Chen wrote:

 On 10 June 2014 19:06, Steven Bosscher stevenb@gmail.com wrote:

 On Tue, Jun 10, 2014 at 11:22 AM, Zhenqiang Chen wrote:

 Hi,

 For loop2-invariant pass, when flag_ira_loop_pressure is enabled,
 function gain_for_invariant checks the pressures of all register
 classes. This does not make sense since one invariant might impact
 only one register class.

 The patch enhances functions get_inv_cost and gain_for_invariant to
 check only the register pressure of the invariant if possible.


 This patch may work for targets with more-or-less orthogonal reg
 classes, but not if there is a lot of overlap between reg classes.


 Yes. I need to check the overlap between reg classes.

 Patch is updated to check all overlap reg classes by
 reg_classes_intersect_p:

 Just so I'm sure I know what you're trying to do.

 You want to map the pseudo back to its likely class(es) then look at how
 those classes (and only those classes) would be impacted from a register
 pressure standpoint if the pseudo was hoisted as an invariant?

Yes.

 This is primarily achieved by returning the class of the invariant, then
 filtering out any non-intersecting classes in gain_for_invariant, right?

Yes. This is what I want to do, since I found that some invariants whose
register class is NO_REGS (memory write) or SSE_REGS are blocked by the
register pressure of GENERAL_REGS.

Thanks!
-Zhenqiang
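
A toy model of the filtering discussed above, assuming register classes can be
represented as bitmasks of hard registers. All names here (toy_class,
toy_class_contents, toy_classes_intersect_p, toy_hoist_blocked_p) are
hypothetical and only mirror the idea behind reg_classes_intersect_p; this is
not the code in the patch. Only classes that intersect the invariant's class
are allowed to veto the hoist:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy register classes; names and contents are illustrative only.  */
enum toy_class
{
  TOY_NO_REGS,
  TOY_GENERAL_REGS,
  TOY_SSE_REGS,
  TOY_ALL_REGS,
  TOY_N_CLASSES
};

/* One bit per hard register contained in each class.  */
static const uint32_t toy_class_contents[TOY_N_CLASSES] = {
  0x0000, 0x00ff, 0xff00, 0xffff
};

/* True if classes A and B share at least one hard register.  */
static bool
toy_classes_intersect_p (enum toy_class a, enum toy_class b)
{
  return (toy_class_contents[a] & toy_class_contents[b]) != 0;
}

/* True if hoisting an invariant of class INV_CLASS that needs NEW_REGS
   registers would exceed the available registers of some class that
   intersects INV_CLASS.  Non-intersecting classes are ignored.  */
static bool
toy_hoist_blocked_p (enum toy_class inv_class, int new_regs,
                     const int pressure[TOY_N_CLASSES],
                     const int available[TOY_N_CLASSES])
{
  for (int c = 0; c < TOY_N_CLASSES; c++)
    {
      if (!toy_classes_intersect_p (inv_class, (enum toy_class) c))
        continue;
      if (pressure[c] + new_regs > available[c])
        return true;
    }
  return false;
}
```

In this model, saturated pressure in the general class blocks a general-class
invariant but leaves an SSE or NO_REGS invariant free to be hoisted.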


RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap

2014-06-17 Thread Thomas Preud'homme
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Wednesday, June 11, 2014 4:32 PM
 
 
  Is this OK for trunk? Does this bug qualify for a backport patch to
  4.8 and 4.9 branches?
 
 This is ok for trunk and also for backporting (after a short while to
 see if there is any fallout).

Below is the backported patch for 4.8/4.9. Is this ok for both 4.8 and
4.9? If yes, how much more should I wait before committing?

Tested on both 4.8 and 4.9 without regression in the testsuite after
a bootstrap.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1e35bbe..0559b7f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@
+2014-06-12  Thomas Preud'homme  thomas.preudho...@arm.com
+
+   PR tree-optimization/61306
+   * tree-ssa-math-opts.c (struct symbolic_number): Store type of
+   expression instead of its size.
+   (do_shift_rotate): Adapt to change in struct symbolic_number. Return
+   false to prevent optimization when the result is unpredictable due to
+   arithmetic right shift of signed type with highest byte is set.
+   (verify_symbolic_number_p): Adapt to change in struct symbolic_number.
+   (find_bswap_1): Likewise. Return NULL to prevent optimization when the
+   result is unpredictable due to sign extension.
+   (find_bswap): Adapt to change in struct symbolic_number.
+
 2014-06-12  Alan Modra  amo...@gmail.com
 
PR target/61300
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 757cb74..139f23c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2014-06-12  Thomas Preud'homme  thomas.preudho...@arm.com
+
+   * gcc.c-torture/execute/pr61306-1.c: New test.
+   * gcc.c-torture/execute/pr61306-2.c: Likewise.
+   * gcc.c-torture/execute/pr61306-3.c: Likewise.
+
 2014-06-11  Richard Biener  rguent...@suse.de
 
PR tree-optimization/61452
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c
new file mode 100644
index 0000000..ebc90a3
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c
@@ -0,0 +1,39 @@
+#ifdef __INT32_TYPE__
+typedef __INT32_TYPE__ int32_t;
+#else
+typedef int int32_t;
+#endif
+
+#ifdef __UINT32_TYPE__
+typedef __UINT32_TYPE__ uint32_t;
+#else
+typedef unsigned uint32_t;
+#endif
+
+#define __fake_const_swab32(x) ((uint32_t)(                  \
+   (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |        \
+   (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |        \
+   (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |        \
+   (( (int32_t)(x) &  (int32_t)0xff000000UL) >> 24)))
+
+/* Previous version of bswap optimization failed to consider sign extension
+   and as a result would replace an expression *not* doing a bswap by a
+   bswap.  */
+
+__attribute__ ((noinline, noclone)) uint32_t
+fake_bswap32 (uint32_t in)
+{
+  return __fake_const_swab32 (in);
+}
+
+int
+main(void)
+{
+  if (sizeof (int32_t) * __CHAR_BIT__ != 32)
+return 0;
+  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
+return 0;
+  if (fake_bswap32 (0x87654321) != 0xffffff87)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c
new file mode 100644
index 0000000..886ecfd
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c
@@ -0,0 +1,40 @@
+#ifdef __INT16_TYPE__
+typedef __INT16_TYPE__ int16_t;
+#else
+typedef short int16_t;
+#endif
+
+#ifdef __UINT32_TYPE__
+typedef __UINT32_TYPE__ uint32_t;
+#else
+typedef unsigned uint32_t;
+#endif
+
+#define __fake_const_swab32(x) ((uint32_t)(                           \
+   (((uint32_t)         (x) & (uint32_t)0x000000ffUL) << 24) |        \
+   (((uint32_t)(int16_t)(x) & (uint32_t)0x00ffff00UL) <<  8) |        \
+   (((uint32_t)         (x) & (uint32_t)0x00ff0000UL) >>  8) |        \
+   (((uint32_t)         (x) & (uint32_t)0xff000000UL) >> 24)))
+
+
+/* Previous version of bswap optimization failed to consider sign extension
+   and as a result would replace an expression *not* doing a bswap by a
+   bswap.  */
+
+__attribute__ ((noinline, noclone)) uint32_t
+fake_bswap32 (uint32_t in)
+{
+  return __fake_const_swab32 (in);
+}
+
+int
+main(void)
+{
+  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
+return 0;
+  if (sizeof (int16_t) * __CHAR_BIT__ != 16)
+return 0;
+  if (fake_bswap32 (0x81828384) != 0xff838281)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c
new file mode 100644
index 000..6086e27
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c
@@ -0,0 +1,13 @@
+short a = -1;
+int b;
+char c;
+
+int
+main ()
+{
+  c = a;
+  b = a | c;
+  if (b != -1)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 9ff857c..2b656ae 100644
--- a/gcc/tree-ssa-math-opts.c
+++ 

[PATCH, aarch64] Fix 61545

2014-06-17 Thread Richard Henderson
Trivial fix for missing clobber of the flags over the tlsdesc call.

Ok for all branches?


r~

* config/aarch64/aarch64.md (tlsdesc_small_PTR): Clobber CC_REGNUM.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a4d8887..1ee2cae 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3855,6 +3855,7 @@
 (unspec:PTR [(match_operand 0 aarch64_valid_symref S)]
   UNSPEC_TLSDESC))
(clobber (reg:DI LR_REGNUM))
+   (clobber (reg:CC CC_REGNUM))
(clobber (match_scratch:DI 1 =r))]
   TARGET_TLS_DESC
   adrp\\tx0, %A0\;ldr\\t%w1, [x0, #%L0]\;add\\tw0, w0, 
%L0\;.tlsdesccall\\t%0\;blr\\t%1


Re: [PATCH][PING] Fix for PR 61422

2014-06-17 Thread Yury Gribov
This has already been done in r211699. Does it work for you? Adding a test
would still be useful.


-Y