date:20140617

[DOC Patch] Attribute 'naked'

2014-06-17 Thread David Wohlferd

I don't have permissions to commit this patch, but I do have a release 
on file with the FSF.


Problem description:
The docs for the function attribute 'naked' are confusing and 
self-contradictory.  Also, discussion on this thread 
https://gcc.gnu.org/ml/gcc/2014-05/msg00100.html has led to changing 
the text from the vague "avoid using" to the very clear "not supported" 
regarding the usage of Extended asm with 'naked.'  Lastly, this 
attribute should be mentioned when describing the differences between 
Basic and Extended asm.
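
For illustration only, here is a minimal sketch of the supported usage: a
naked function whose body is a single Basic asm statement.  The function
name 'answer' and the instruction sequence are assumptions made for this
example and are not part of the patch.  The #else branch is only a plain C
stand-in so the file still builds on ports that lack the attribute.

```c
/* Sketch only: 'answer' is a hypothetical name, not from the patch.  */
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
__attribute__((naked)) int answer (void)
{
  /* Basic asm: no operands and no C statements.  The compiler emits no
     prologue/epilogue, so the return sequence is written by hand.  */
  __asm__ ("mov r0, #42\n\t"
           "bx lr");
}
#else
/* Stand-in for ports without 'naked' support.  */
int answer (void)
{
  return 42;
}
#endif
```

Using Extended asm here, or mixing the asm with C statements, is exactly
what the revised text describes as not supported.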


ChangeLog:
2014-06-17  David Wohlferd 

* doc/extend.texi (Function Attributes): Update 'naked' 
attribute doc.


dw
Index: extend.texi
===
--- extend.texi	(revision 210624)
+++ extend.texi	(working copy)
@@ -3332,16 +3332,15 @@
 
 @item naked
 @cindex function without a prologue/epilogue code
-Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX and SPU
-ports to indicate that the specified function does not need prologue/epilogue
-sequences generated by the compiler.
-It is up to the programmer to provide these sequences. The
-only statements that can be safely included in naked functions are
-@code{asm} statements that do not have operands.  All other statements,
-including declarations of local variables, @code{if} statements, and so
-forth, should be avoided.  Naked functions should be used to implement the
-body of an assembly function, while allowing the compiler to construct
-the requisite function declaration for the assembler.
+This attribute is available on the ARM, AVR, MCORE, MSP430, NDS32,
+RL78, RX and SPU ports.  It allows the compiler to construct the
+requisite function declaration, while allowing the body of the
+function to be assembly code. The specified function will not have
+prologue/epilogue sequences generated by the compiler. Only Basic
+@code{asm} statements can safely be included in naked functions
+(@pxref{Basic Asm}). While using Extended @code{asm} or a mixture of
+Basic @code{asm} and ``C'' code may appear to work, they cannot be
+depended upon to work reliably and are not supported.
 
 @item near
 @cindex functions that do not handle memory bank switching on 68HC11/68HC12
@@ -6269,6 +6268,8 @@
 efficient code, and in most cases it is a better solution. When writing 
 inline assembly language outside of C functions, however, you must use Basic 
 @code{asm}. Extended @code{asm} statements have to be inside a C function.
+Functions declared with the @code{naked} attribute also require Basic 
+@code{asm} (@pxref{Function Attributes}).
 
 Under certain circumstances, GCC may duplicate (or remove duplicates of) your 
 assembly code when optimizing. This can lead to unexpected duplicate 
@@ -6388,6 +6389,8 @@
 
 Note that Extended @code{asm} statements must be inside a function. Only 
 Basic @code{asm} may be outside functions (@pxref{Basic Asm}).
+Functions declared with the @code{naked} attribute also require Basic 
+@code{asm} (@pxref{Function Attributes}).
 
 While the uses of @code{asm} are many and varied, it may help to think of an 
 @code{asm} statement as a series of low-level instructions that convert input

Re: [PATCH] Fix PR61335

2014-06-17 Thread Uros Bizjak

On Fri, Jun 6, 2014 at 10:07 AM, Uros Bizjak  wrote:
> On Fri, Jun 6, 2014 at 9:47 AM, Uros Bizjak  wrote:
>
>>> 2014-05-28  Richard Biener  
>>>
>>> PR tree-optimization/61335
>>> * tree-vrp.c (vrp_visit_phi_node): If the compare of old and
>>> new range fails, drop to varying.
>>>
>>> * gfortran.dg/pr61335.f90: New testcase.
>>
>> This testcase triggers SIGFPE on alpha due to the use of denormal
>> operand. Maybe uninitialized value is used in line 48?
>
> SIGFPE also triggers at the same place on x86_64 with unmasked FPE
> exceptions (compile with -O0).

Attached patch initializes problematic array to zero instead of
uninitialized value.

2014-06-17  Uros Bizjak  

* gfortran.dg/pr61335.f90 (cp_unit_create): Initialize
unit_id and kind_id to zero.

Tested on alphaev68-linux-gnu and x86_64-linux-gnu.

OK for mainline?

Uros.

Index: gfortran.dg/pr61335.f90
===
--- gfortran.dg/pr61335.f90 (revision 211723)
+++ gfortran.dg/pr61335.f90 (working copy)
@@ -45,8 +45,8 @@
 LOGICAL  :: failure

 failure=.FALSE.
-unit_id=cp_units_none
-kind_id=cp_ukind_none
+unit_id=0
+kind_id=0
 power=0
 i_low=1
 i_high=1

[PATCH] PR54555: Use strict_low_part for loading a constant only if it is cheaper

2014-06-17 Thread Andreas Schwab

Postreload may transform (set (REGX) (CONST_INT A)) ... (set (REGX)
(CONST_INT B)) to (set (REGX) (CONST_INT A)) ... (set (STRICT_LOW_PART
(REGX)) (CONST_INT B)), but it should do that only if the latter is
cheaper.  On m68k, a full word load of a small constant with moveq is
cheaper than doing a byte load with move.b.
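
As a rough model of what is being compared, the sketch below treats a
STRICT_LOW_PART store as "replace the low byte, keep the rest" and only
picks it when it is both valid and cheaper.  The helper names and the
scalar costs are assumptions for illustration; the real code compares
full rtx costs with costs_lt_p.

```c
#include <stdint.h>

/* Model of (set (strict_low_part (reg:QI X)) (const_int b)):
   only the low 8 bits of the register change.  */
static uint32_t set_low_byte (uint32_t reg, uint8_t b)
{
  return (reg & ~UINT32_C (0xff)) | b;
}

/* The narrow store can stand in for a full load of B only if A and B
   agree outside the low byte.  */
static int narrow_store_possible (uint32_t a, uint32_t b)
{
  return (a & ~UINT32_C (0xff)) == (b & ~UINT32_C (0xff));
}

/* Hypothetical cost gate: use the narrow form only when it is valid
   and strictly cheaper than the full-word load.  */
static int use_narrow_store (uint32_t a, uint32_t b,
                             int narrow_cost, int full_cost)
{
  return narrow_store_possible (a, b) && narrow_cost < full_cost;
}
```

With costs in which the full-word load is the cheaper of the two, as
described above for moveq versus move.b, use_narrow_store returns 0 and
the full-word load is kept.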

Tested on m68k-suse-linux and x86_64-suse-linux.  In both cases the size
of cc1* becomes smaller with this change.

Andreas.

PR rtl-optimization/54555
* postreload.c (move2add_use_add2_insn): Only substitute
STRICT_LOW_PART if it is cheaper.
---
 gcc/postreload.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/postreload.c b/gcc/postreload.c
index 9d71649..89f0c84 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -1805,10 +1805,14 @@ move2add_use_add2_insn (rtx reg, rtx sym, rtx off, rtx 
insn)
   gen_rtx_STRICT_LOW_PART (VOIDmode,
narrow_reg),
   narrow_src);
- changed = validate_change (insn, &PATTERN (insn),
-new_set, 0);
- if (changed)
-   break;
+ get_full_set_rtx_cost (new_set, &newcst);
+ if (costs_lt_p (&newcst, &oldcst, speed))
+   {
+ changed = validate_change (insn, &PATTERN (insn),
+new_set, 0);
+ if (changed)
+   break;
+   }
}
}
}
-- 
2.0.0

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: Regimplification enhancements 1/3

2014-06-17 Thread Richard Biener

On Mon, Jun 16, 2014 at 11:52 PM, Mike Stump  wrote:
> On Jun 16, 2014, at 10:49 AM, Bernd Schmidt  wrote:
>>
>> There are two reasons why I can't do this in the frontends - one, Joseph has 
>> already rejected a C frontend patch,
>
> I’d like to think there is an acceptable way to get the right memory space on 
> things...
>
>> and two, this needs to work with OpenACC offloading - i.e. code is initially 
>> compiled by an x86 host compiler, then a ptx lto1 reads it in and needs to 
>> make it valid for that target.
>
> Ah yes, that would do it, thanks.  I can see my port as an offload target…  
> I’ll have to keep an eye on OpenACC and gcc.

But then IMHO using the gimplifier to do this fixup is wrong.  Please add
those required ADDR_SPACE_CONVERT_EXPRs in your pass manually.
After all you also have to adjust types of MEM_REFs and possibly
types of pointer variables (and pointer sizes?).

Richard.

Re: fix math wrt volatile-bitfields vs C++ model

2014-06-17 Thread Richard Biener

On Tue, Jun 17, 2014 at 4:08 AM, DJ Delorie  wrote:
>
>> Looks ok to me, but can you add a testcase please?
>
> I have a testcase, but if -flto the testcase doesn't include *any*
> definition of the test function, just all the LTO data.  Is this
> normal?

Without -ffat-lto-objects, yes, this is normal.  If you are trying to
do a scan-assembler or similar, this will be difficult with LTO.
If LTO is not necessary to trigger the bug and you just want to
use the torture harness, I suggest using dg-skip-if for -flto.
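
A sketch of what such a test header could look like follows.  The struct,
the function and the comment string are hypothetical and are not the
actual testcase; only the directive form is the point.

```c
/* Hypothetical torture test; the dg-skip-if directive below skips any
   run whose options include -flto.  */
/* { dg-do compile } */
/* { dg-skip-if "needs real code, not LTO bytecode" { *-*-* } { "-flto" } { "" } } */

struct s { volatile unsigned int f : 8; };   /* hypothetical example type */

unsigned int read_field (struct s *p)
{
  return p->f;
}
```

The third braced list names options that cause the skip; the last one
lists options that would exempt a run from it.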

>> Also check if 4.9 is affected.
>
> It is...  same fix works, though.

Thanks,
Richard.

Re: [PATCH][genattrtab] Fix memory corruption, allocate enough memory for all bypassed reservations

2014-06-17 Thread Kyrill Tkachov



On 16/06/14 17:39, Jeff Law wrote:
> On 06/16/14 04:12, Kyrill Tkachov wrote:
>> Doh, you're right. I did consider it but for some reason thought we
>> might want to iterate over all of the bypasses anyway. Breaking out
>> seems good.
>>
>> How about this?
>> Tested on arm and aarch64 and confirmed with valgrind that no out of
>> bounds accesses occur.
>> I kicked off an x86_64 bootstrap but don't expect any problems.
>>
>> Thanks,
>> Kyrill
>>
>> genattrtab-bypasses.patch
>>
>> commit 676b85f7a7cc1446482334dcaad457ac328875a8
>> Author: Kyrylo Tkachov
>> Date:   Fri Jun 13 11:09:57 2014 +0100
>>
>>   [genattrtab] Fix memory corruption with bypasses
>
> I'm an idiot.  n_bypassed is used to size the vector, so you do have to
> walk the entire list.


AFAICS in the loop in process_bypasses we want to count all the 
reservations which have a bypass matching them. Once a reservation is 
matched with a bypass it should be safe to break out of the inner loop 
(over the bypasses), even if two bypasses match a reservation we only 
want to count the reservation once.
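
The counting described above can be sketched as follows.  This is not the
genattrtab code: the function name is made up and plain strcmp stands in
for whatever matching the real loop performs.

```c
#include <stddef.h>
#include <string.h>

/* Count reservations that are matched by at least one bypass.
   Hypothetical helper; strcmp is a stand-in for the real matching.  */
static size_t count_bypassed (const char *const *reservs, size_t n_reservs,
                              const char *const *bypasses, size_t n_bypasses)
{
  size_t n_bypassed = 0;

  for (size_t i = 0; i < n_reservs; i++)
    for (size_t j = 0; j < n_bypasses; j++)
      if (strcmp (reservs[i], bypasses[j]) == 0)
        {
          /* Count the reservation once, then stop scanning bypasses.  */
          n_bypassed++;
          break;
        }

  return n_bypassed;
}
```

Every reservation is still visited, so the count used to size the vector
is complete; the break only prevents counting one reservation twice.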


So I think the 2nd version of the patch is good

Thanks,
Kyrill


> Jeff

Re: [PATCH, cprop] Check rtx_cost when propagating constant

2014-06-17 Thread Richard Biener

On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen
 wrote:
> Hi,
>
> For some large constants, ports like ARM need one more instruction to
> operate on them, e.g.:
>
> #define MASK 0xfe00ff
> void maskdata (int * data, int len)
> {
>int i = len;
>for (; i > 0; i -= 2)
> {
>   data[i] &= MASK;
>   data[i + 1] &= MASK;
> }
> }
>
> This needs two instructions for each AND operation:
>
> and r3, r3, #16711935
> bic r3, r3, #65536
>
> If we keep the MASK in a register, the loop2_invariant pass can hoist it
> out of the loop, and it can be shared by different references.
>
> So the patch skips constant propagation if it makes INSN's cost higher.

So cprop undoes invariant motion's work here?

Should we make sure we add a REG_EQUAL note when not propagating?

> Bootstrap and no make check regression on X86-64 and ARM Chrome book.
>
> OK for trunk?
>
> Thanks!
> -Zhenqiang
>
> ChangeLog:
> 2014-06-17  Zhenqiang Chen  
>
> * cprop.c (try_replace_reg): Check cost for constants.
>
> diff --git a/gcc/cprop.c b/gcc/cprop.c
> index aef3ee8..c9cf02a 100644
> --- a/gcc/cprop.c
> +++ b/gcc/cprop.c
> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>rtx src = 0;
>int success = 0;
>rtx set = single_set (insn);
> +  int old_cost = 0;
> +  bool copy_p = false;
> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
> +
> +  if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
> +copy_p = true;
> +  else
> +old_cost = set_rtx_cost (set, speed);

Looks bogus for set == NULL?

Also what about register pressure?

I think this kind of change needs wider testing as RTX costs are
usually not fully implemented and you introduce a new use kind
(or is it already used elsewhere in this way to compute cost
difference of a set with s/reg/const?).

What kind of performance difference do you see?

Thanks,
Richard.

>/* Usually we substitute easy stuff, so we won't copy everything.
>   We however need to take care to not duplicate non-trivial CONST
> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>to = copy_rtx (to);
>
>validate_replace_src_group (from, to, insn);
> +
> +  /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
> + And it can be shared by different references.  So skip propagation if
> + it makes INSN's rtx cost higher.  */
> +  if (set && !copy_p && CONSTANT_P (to))
> +{
> +  int new_cost = set_rtx_cost (set, speed);
> +  if (new_cost > old_cost)
> +   {
> + cancel_changes (0);
> + return false;
> +   }
> +}
> +
>if (num_changes_pending () && apply_change_group ())
>  success = 1;

Re: Turn DECL_SECTION_NAME into string

2014-06-17 Thread Richard Biener

On Tue, Jun 17, 2014 at 8:40 AM, Thomas Schwinge
 wrote:
> Hi!
>
> On Thu, 12 Jun 2014 06:33:25 +0200, Jan Hubicka  wrote:
>> this lengthy patch does the legwork to put section names out of the tree
>> representation.
>> Originally they were STRING_CST. I ended up implementing an on-side
>> reference-counted string vocabulary that is done in a bit baroque way to
>> be GGC and PCH safe (uff).
>
> As reported in , this causes a build failure
> with --enable-checking=fold:
>
> /home/dimhen/src/gcc_current/gcc/fold-const.c: In function 'void 
> fold_checksum_tree(const_tree, md5_ctx*, hash_table)':
> /home/dimhen/src/gcc_current/gcc/fold-const.c:14863:55: error: cannot 
> convert 'const char*' to 'const_tree {aka const tree_node*}' for argument '1' 
> to 'void fold_checksum_tree(const_tree, md5_ctx*, 
> hash_table >)'
>   fold_checksum_tree (DECL_SECTION_NAME (expr), ctx, ht);
>
> From light testing the following seems to get around this -- is it the
> appropriate fix?

Yes.  This is ok.

Thanks,
Richard.

> diff --git gcc/fold-const.c gcc/fold-const.c
> index 24daaa3..978b854 100644
> --- gcc/fold-const.c
> +++ gcc/fold-const.c
> @@ -14859,8 +14859,6 @@ fold_checksum_tree (const_tree expr, struct md5_ctx 
> *ctx,
>   fold_checksum_tree (DECL_ABSTRACT_ORIGIN (expr), ctx, ht);
>   fold_checksum_tree (DECL_ATTRIBUTES (expr), ctx, ht);
> }
> -  if (CODE_CONTAINS_STRUCT (TREE_CODE (expr), TS_DECL_WITH_VIS))
> -   fold_checksum_tree (DECL_SECTION_NAME (expr), ctx, ht);
>
>if (CODE_CONTAINS_STRUCT (TREE_CODE (expr), TS_DECL_NON_COMMON))
> {
>
>
> Grüße,
>  Thomas

[gomp4] Merge trunk r211693 (2014-06-16) into gomp-4_0-branch

2014-06-17 Thread Thomas Schwinge

Hi!

In r211726, I have committed a merge from trunk r211693 (2014-06-16) into
gomp-4_0-branch.


The LTO regression that appeared with an earlier merge,
,
remains to be resolved:

 PASS: gcc.dg/lto/save-temps c_lto_save-temps_0.o assemble,  -O -flto 
-save-temps
-PASS: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o 
link,  -O -flto -save-temps
+FAIL: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o 
link,  -O -flto -save-temps
+UNRESOLVED: gcc.dg/lto/save-temps 
c_lto_save-temps_0.o-c_lto_save-temps_0.o execute  -O -flto -save-temps

Executing on host: [...]/build/gcc/xgcc -B[...]/build/gcc/  
-fno-diagnostics-show-caret -fdiagnostics-color=never   -O -flto -save-temps  
-c  -o c_lto_save-temps_0.o 
[...]/source/gcc/testsuite/gcc.dg/lto/save-temps_0.c(timeout = 300)
spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ -fno-diagnostics-show-caret 
-fdiagnostics-color=never -O -flto -save-temps -c -o c_lto_save-temps_0.o 
[...]/source/gcc/testsuite/gcc.dg/lto/save-temps_0.c
PASS: gcc.dg/lto/save-temps c_lto_save-temps_0.o assemble,  -O -flto 
-save-temps
Executing on host: [...]/build/gcc/xgcc -B[...]/build/gcc/ 
c_lto_save-temps_0.o  -fno-diagnostics-show-caret -fdiagnostics-color=never   
-O -flto -save-temps   -o gcc-dg-lto-save-temps-01.exe(timeout = 300)
spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ c_lto_save-temps_0.o 
-fno-diagnostics-show-caret -fdiagnostics-color=never -O -flto -save-temps -o 
gcc-dg-lto-save-temps-01.exe
[...]/build/gcc/xgcc @/tmp/ccjomvFW
[...]/build/gcc/xgcc @/tmp/ccAM0t6j
output is:
[...]/build/gcc/xgcc @/tmp/ccjomvFW
[...]/build/gcc/xgcc @/tmp/ccAM0t6j

FAIL: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o link, 
 -O -flto -save-temps
UNRESOLVED: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o 
execute  -O -flto -save-temps


Owing to the Fortran front end changes for OpenMP 4 User-Defined
Reductions, I have adapted the expected error messages for OpenACC as
follows.  While this is not critical, perhaps someone may want to improve
this later on; so noting this here for later reference.

--- gcc/testsuite/gfortran.dg/goacc/reduction.f95
+++ gcc/testsuite/gfortran.dg/goacc/reduction.f95
@@ -66,73 +66,73 @@ common /blk/ i1
 !$acc end parallel
 !$acc parallel reduction (*:ia1)   ! { dg-error "Assumed size" }
 !$acc end parallel
-!$acc parallel reduction (+:l1)! { dg-error "must be of 
numeric type, got LOGICAL" }
+!$acc parallel reduction (+:l1)! { dg-error "OMP DECLARE 
REDUCTION \\+ not found for type LOGICAL" }
 !$acc end parallel
-!$acc parallel reduction (*:la1)   ! { dg-error "must be of numeric type, 
got LOGICAL" }
+!$acc parallel reduction (*:la1)   ! { dg-error "OMP DECLARE REDUCTION \\* 
not found for type LOGICAL" }
 !$acc end parallel
-!$acc parallel reduction (-:a1)! { dg-error "must be of 
numeric type, got CHARACTER" }
+!$acc parallel reduction (-:a1)! { dg-error "OMP DECLARE 
REDUCTION - not found for type CHARACTER" }
 !$acc end parallel
-!$acc parallel reduction (+:t1)! { dg-error "must be of 
numeric type, got TYPE" }
+!$acc parallel reduction (+:t1)! { dg-error "OMP DECLARE 
REDUCTION \\+ not found for type TYPE" }
 !$acc end parallel
-!$acc parallel reduction (*:ta1)   ! { dg-error "must be of numeric type, 
got TYPE" }
+!$acc parallel reduction (*:ta1)   ! { dg-error "OMP DECLARE REDUCTION \\* 
not found for type TYPE" }
 !$acc end parallel
-!$acc parallel reduction (.and.:i3)! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.and.:i3)! { dg-error "OMP DECLARE REDUCTION 
\\.and\\. not found for type INTEGER" }
 !$acc end parallel
-!$acc parallel reduction (.or.:ia2)! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.or.:ia2)! { dg-error "OMP DECLARE REDUCTION 
\\.or\\. not found for type INTEGER" }
 !$acc end parallel
-!$acc parallel reduction (.eqv.:r1)! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.eqv.:r1)! { dg-error "OMP DECLARE REDUCTION 
\\.eqv\\. not found for type REAL" }
 !$acc end parallel
-!$acc parallel reduction (.neqv.:ra1)  ! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.neqv.:ra1)  ! { dg-error "OMP DECLARE REDUCTION 
\\.neqv\\. not found for type REAL" }
 !$acc end parallel
-!$acc parallel reduction (.and.:d1)! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.and.:d1)! { dg-error "OMP DECLARE REDUCTION 
\\.and\\. not found for type REAL" }
 !$acc end parallel
-!$acc parallel reduction (.or.:da1)! { dg-error "must be LOGICAL" }
+!$acc parallel reduction (.or.:da1)! { dg-error "OMP DECLARE REDUCTION 
\\.or\\. not found for type REAL" }
 !$acc end parallel
-!$acc parallel reduction (.eqv.:c1)! { dg-error "must be LOGIC

[c++-concepts] Fix assertion failure with cp_maybe_constrained_type_specifier

2014-06-17 Thread Braden Obrzut

cp_maybe_constrained_type_specifier asserted that the decl passed in 
would be of type OVERLOAD, however a clean build of the compiler was 
broken since it could also be a BASELINK.  I'm not entirely sure when 
this is the case, except that it seems to happen with class member 
templates as it also caused a test case in my next patch to fail.  The 
solution is to check for a BASELINK and extract the functions from it.


The possibility of decl being a BASELINK is asserted near the call in 
cp_parser_template_id (cp_maybe_partial_concept_id just calls the 
function in question at this time).


2014-06-17  Braden Obrzut  
* gcc/cp/parser.c (cp_maybe_constrained_type_specifier): Fix assertion
failure if baselink was passed in as decl.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1eaf863..40d1d63 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -15175,6 +15175,9 @@ cp_parser_allows_constrained_type_specifier (cp_parser *parser)
 static tree
 cp_maybe_constrained_type_specifier (cp_parser *parser, tree decl, tree args)
 {
+  if (BASELINK_P (decl))
+decl = BASELINK_FUNCTIONS (decl);
+
   gcc_assert (TREE_CODE (decl) == OVERLOAD);
   gcc_assert (args ? TREE_CODE (args) == TREE_VEC : true);

[PATCH] Testcase for PR61012

2014-06-17 Thread Richard Biener


From the new dup.

Committed to trunk and branch.

Richard.

2014-06-17  Richard Biener  

PR lto/61012
* gcc.dg/lto/pr61526_0.c: New testcase.
* gcc.dg/lto/pr61526_1.c: Likewise.

Index: gcc/testsuite/gcc.dg/lto/pr61526_0.c
===
--- gcc/testsuite/gcc.dg/lto/pr61526_0.c(revision 0)
+++ gcc/testsuite/gcc.dg/lto/pr61526_0.c(working copy)
@@ -0,0 +1,6 @@
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -fPIC -flto -flto-partition=1to1 } } } */
+/* { dg-extra-ld-options { -shared } } */
+
+static void *master;
+void *foo () { return master; }
Index: gcc/testsuite/gcc.dg/lto/pr61526_1.c
===
--- gcc/testsuite/gcc.dg/lto/pr61526_1.c(revision 0)
+++ gcc/testsuite/gcc.dg/lto/pr61526_1.c(working copy)
@@ -0,0 +1,2 @@
+extern void *master;
+void *bar () { return master; }

[c++-concepts] Allow function parameters to be referenced in trailing requires clauses

2014-06-17 Thread Braden Obrzut

This patch allows function parameters to be referenced by trailing 
requires clauses.  Typically this is used to refer to the type of an 
implicitly generated template.  For example, the following should now be 
valid (where C is some previously defined concept):


auto f1 (auto x) requires C ();

Note that the test case trailing-requires-overload.C will fail to 
compile unless the previously submitted patch is applied first.


2014-06-17  Braden Obrzut  
* gcc/cp/parser.c (cp_parser_trailing_requirements): Handle requires
keyword manually so that we can push function parameters back into
scope.
* gcc/cp/decl.c (push_function_parms): New. Recovers and reopens
function parameter scope from declarator.
* gcc/testsuite/g++.dg/concepts/trailing-requires.C: New tests.
* gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C: New 
tests.
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 5d23bfa..aca3ce5 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5409,6 +5409,7 @@ extern bool defer_mark_used_calls;
 extern GTY(()) vec *deferred_mark_used_calls;
 extern tree finish_case_label			(location_t, tree, tree);
 extern tree cxx_maybe_build_cleanup		(tree, tsubst_flags_t);
+extern void push_function_parms (cp_declarator *);
 
 /* in decl2.c */
 extern bool check_java_method			(tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 9791dba..5daccf8 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -13791,6 +13791,22 @@ store_parm_decls (tree current_function_parms)
 current_eh_spec_block = begin_eh_spec_block ();
 }
 
+/* Bring the parameters of a function declaration back into scope without
+   entering the function body. Declarator must be a function declarator.
+   Caller is responsible for calling finish_scope. */
+
+void
+push_function_parms (cp_declarator *declarator)
+{
+  begin_scope (sk_function_parms, NULL_TREE);
+
+  for (tree parms = declarator->u.function.parameters; parms != NULL_TREE
+   && !VOID_TYPE_P (TREE_VALUE (parms)); parms = TREE_CHAIN (parms))
+{
+  pushdecl (TREE_VALUE (parms));
+}
+}
+
 
 /* We have finished doing semantic analysis on DECL, but have not yet
generated RTL for its body.  Save away our current state, so that
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1eaf863..2d5862f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -16929,7 +16929,20 @@ cp_parser_trailing_requirements (cp_parser *parser, cp_declarator *decl)
 terse_reqs = get_shorthand_requirements (current_template_parms);
 
   // An optional requires clause can yield an additional constraint.
-  tree explicit_reqs = cp_parser_requires_clause_opt (parser);
+  tree explicit_reqs = NULL_TREE;
+  if (cp_lexer_next_token_is_keyword (parser->lexer, RID_REQUIRES))
+{
+  cp_lexer_consume_token (parser->lexer);
+
+  // Bring parms back into scope so requires clause can reference them.
+  ++cp_unevaluated_operand;
+  push_function_parms (decl);
+
+  explicit_reqs = cp_parser_requires_clause (parser);
+
+  finish_scope();
+  --cp_unevaluated_operand;
+}
 
   // If requirements were specified in either the implicit
   // template parameter list or an explicit requires clause,
diff --git a/gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C b/gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C
new file mode 100644
index 000..2fc6cdb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C
@@ -0,0 +1,115 @@
+// { dg-do run }
+// { dg-options "-std=c++1y" }
+
+#include 
+
+template
+  concept bool C ()
+  {
+return requires (T a, T b) { { a + b } -> T };
+  }
+
+template
+  concept bool D ()
+  {
+return requires (T a, T b) { { a - b } -> T };
+  }
+
+template
+  concept bool M ()
+  {
+return requires (T a, T b) { { a * b } -> T };
+  }
+
+template
+  requires C ()
+  struct Adds
+  {
+Adds(T a) { v = a; }
+T v;
+  };
+
+template
+  Adds operator+ (const Adds &a, const Adds &b)
+  {
+return a.v + b.v;
+  }
+
+template
+  requires D ()
+  struct Subs
+  {
+Subs(T a) { v = a; }
+T v;
+  };
+
+template
+  Subs operator- (const Subs &a, const Subs &b)
+  {
+return a.v - b.v;
+  }
+
+template
+  requires M ()
+  struct Mults
+  {
+Mults(T a) { v = a; }
+T v;
+  };
+
+template
+  Mults operator- (const Mults &a, const Mults &b)
+  {
+return a.v * b.v;
+  }
+
+auto f1 (auto a, decltype(a) b) -> decltype(a) requires M ();
+auto f1 (auto a, decltype(a) b) -> decltype(a) requires D ();
+auto f1 (auto a, decltype(a) b) -> decltype(a) requires C ();
+
+struct S1
+{
+  auto f2 (auto a) -> decltype(a) requires C ();
+  auto f2 (auto a) -> decltype(a) requires D ();
+  auto f2 (auto a) -> decltype(a) requires M ();
+};
+
+auto S1::f2 (auto a) -> decltype(a) requires M ()
+{
+  return f1 (a, a);
+}
+auto S1::f2 (auto a) -> decltype(a) requires C ()
+{
+  return f1 (a, a);
+}
+auto S1

Commit: MSP430: Add NOP after DINT in hardware multiply patterns

2014-06-17 Thread Nick Clifton

Hi Guys,

  I am checking in the patch below to update the hardware multiply
  patterns for the MSP430 so that there is a NOP instruction after
  disabling interrupts with the DINT instruction.  Timing issues mean
  that it is possible for the instruction following the DINT to be
  interrupted, so it has to be a NOP.  The change is going in to the
  mainline sources and the 4.9 branch.

Cheers
  Nick

gcc/ChangeLog
2014-06-17  Nick Clifton  

* config/msp430/msp430.md (mulhisi3): Add a NOP after the DINT.
(umulhi3, mulsidi3, umulsidi3): Likewise.

Index: gcc/config/msp430/msp430.md
===
--- gcc/config/msp430/msp430.md (revision 211726)
+++ gcc/config/msp430/msp430.md (working copy)
@@ -1423,9 +1423,9 @@
   "optimize > 2 && msp430_hwmult_type != NONE"
   "*
 if (msp430_use_f5_series_hwmult ())
-  return \"PUSH.W sr { DINT { MOV.W %1, &0x04C2 { MOV.W %2, &0x04C8 { 
MOV.W &0x04CA, %L0 { MOV.W &0x04CC, %H0 { POP.W sr\";
+  return \"PUSH.W sr { DINT { NOP { MOV.W %1, &0x04C2 { MOV.W %2, &0x04C8 
{ MOV.W &0x04CA, %L0 { MOV.W &0x04CC, %H0 { POP.W sr\";
 else
-  return \"PUSH.W sr { DINT { MOV.W %1, &0x0132 { MOV.W %2, &0x0138 { 
MOV.W &0x013A, %L0 { MOV.W &0x013C, %H0 { POP.W sr\";
+  return \"PUSH.W sr { DINT { NOP { MOV.W %1, &0x0132 { MOV.W %2, &0x0138 
{ MOV.W &0x013A, %L0 { MOV.W &0x013C, %H0 { POP.W sr\";
   "
 )
 
@@ -1436,9 +1436,9 @@
   "optimize > 2 && msp430_hwmult_type != NONE"
   "*
 if (msp430_use_f5_series_hwmult ())
-  return \"PUSH.W sr { DINT { MOV.W %1, &0x04C0 { MOV.W %2, &0x04C8 { 
MOV.W &0x04CA, %L0 { MOV.W &0x04CC, %H0 { POP.W sr\";
+  return \"PUSH.W sr { DINT { NOP { MOV.W %1, &0x04C0 { MOV.W %2, &0x04C8 
{ MOV.W &0x04CA, %L0 { MOV.W &0x04CC, %H0 { POP.W sr\";
 else
-  return \"PUSH.W sr { DINT { MOV.W %1, &0x0130 { MOV.W %2, &0x0138 { 
MOV.W &0x013A, %L0 { MOV.W &0x013C, %H0 { POP.W sr\";
+  return \"PUSH.W sr { DINT { NOP { MOV.W %1, &0x0130 { MOV.W %2, &0x0138 
{ MOV.W &0x013A, %L0 { MOV.W &0x013C, %H0 { POP.W sr\";
   "
 )
 
@@ -1449,9 +1449,9 @@
   "optimize > 2 && msp430_hwmult_type != NONE"
   "*
 if (msp430_use_f5_series_hwmult ())
-  return \"PUSH.W sr { DINT { MOV.W %L1, &0x04D4 { MOV.W %H1, &0x04D6 { 
MOV.W %L2, &0x04E0 { MOV.W %H2, &0x04E2 { MOV.W &0x04E4, %A0 { MOV.W &0x04E6, 
%B0 { MOV.W &0x04E8, %C0 { MOV.W &0x04EA, %D0 { POP.W sr\";
+  return \"PUSH.W sr { DINT { NOP { MOV.W %L1, &0x04D4 { MOV.W %H1, 
&0x04D6 { MOV.W %L2, &0x04E0 { MOV.W %H2, &0x04E2 { MOV.W &0x04E4, %A0 { MOV.W 
&0x04E6, %B0 { MOV.W &0x04E8, %C0 { MOV.W &0x04EA, %D0 { POP.W sr\";
 else
-  return \"PUSH.W sr { DINT { MOV.W %L1, &0x0144 { MOV.W %H1, &0x0146 { 
MOV.W %L2, &0x0150 { MOV.W %H2, &0x0152 { MOV.W &0x0154, %A0 { MOV.W &0x0156, 
%B0 { MOV.W &0x0158, %C0 { MOV.W &0x015A, %D0 { POP.W sr\";
+  return \"PUSH.W sr { DINT { NOP { MOV.W %L1, &0x0144 { MOV.W %H1, 
&0x0146 { MOV.W %L2, &0x0150 { MOV.W %H2, &0x0152 { MOV.W &0x0154, %A0 { MOV.W 
&0x0156, %B0 { MOV.W &0x0158, %C0 { MOV.W &0x015A, %D0 { POP.W sr\";
   "
 )
 
@@ -1462,8 +1462,8 @@
   "optimize > 2 && msp430_hwmult_type != NONE"
   "*
 if (msp430_use_f5_series_hwmult ())
-  return \"PUSH.W sr { DINT { MOV.W %L1, &0x04D0 { MOV.W %H1, &0x04D2 { 
MOV.W %L2, &0x04E0 { MOV.W %H2, &0x04E2 { MOV.W &0x04E4, %A0 { MOV.W &0x04E6, 
%B0 { MOV.W &0x04E8, %C0 { MOV.W &0x04EA, %D0 { POP.W sr\";
+  return \"PUSH.W sr { DINT { NOP { MOV.W %L1, &0x04D0 { MOV.W %H1, 
&0x04D2 { MOV.W %L2, &0x04E0 { MOV.W %H2, &0x04E2 { MOV.W &0x04E4, %A0 { MOV.W 
&0x04E6, %B0 { MOV.W &0x04E8, %C0 { MOV.W &0x04EA, %D0 { POP.W sr\";
 else
-  return \"PUSH.W sr { DINT { MOV.W %L1, &0x0140 { MOV.W %H1, &0x0142 { 
MOV.W %L2, &0x0150 { MOV.W %H2, &0x0152 { MOV.W &0x0154, %A0 { MOV.W &0x0156, 
%B0 { MOV.W &0x0158, %C0 { MOV.W &0x015A, %D0 { POP.W sr\";
+  return \"PUSH.W sr { DINT { NOP { MOV.W %L1, &0x0140 { MOV.W %H1, 
&0x0142 { MOV.W %L2, &0x0150 { MOV.W %H2, &0x0152 { MOV.W &0x0154, %A0 { MOV.W 
&0x0156, %B0 { MOV.W &0x0158, %C0 { MOV.W &0x015A, %D0 { POP.W sr\";
   "
 )

[PATCH][match-and-simplify] Make gimple_fold_stmt_to_constant_1 dumping more useful

2014-06-17 Thread Richard Biener


Committed.

Richard.

2014-06-17  Richard Biener  

* gimple-fold.c (gimple_fold_stmt_to_constant_1): Dump
simplified expression.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 211452)
+++ gcc/gimple-fold.c   (working copy)
@@ -2810,8 +2810,8 @@ gimple_fold_stmt_to_constant_1 (gimple s
{
  if (dump_file && dump_flags & TDF_DETAILS)
{
- fprintf (dump_file, "Match-and-simplified definition of ");
- print_generic_expr (dump_file, lhs, 0);
+ fprintf (dump_file, "Match-and-simplified ");
+ print_gimple_expr (dump_file, stmt, 0, TDF_SLIM);
  fprintf (dump_file, " to ");
  print_generic_expr (dump_file, res, 0);
  fprintf (dump_file, "\n");

Re: [PATCH, cprop] Check rtx_cost when propagating constant

2014-06-17 Thread Zhenqiang Chen

On 17 June 2014 16:15, Richard Biener  wrote:
> On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen
>  wrote:
>> Hi,
>>
>> For some large constants, ports like ARM need one more instruction to
>> operate on them, e.g.:
>>
>> #define MASK 0xfe00ff
>> void maskdata (int * data, int len)
>> {
>>int i = len;
>>for (; i > 0; i -= 2)
>> {
>>   data[i] &= MASK;
>>   data[i + 1] &= MASK;
>> }
>> }
>>
>> This needs two instructions for each AND operation:
>>
>> and r3, r3, #16711935
>> bic r3, r3, #65536
>>
>> If we keep the MASK in a register, the loop2_invariant pass can hoist it
>> out of the loop, and it can be shared by different references.
>>
>> So the patch skips constant propagation if it makes INSN's cost higher.
>
> So cprop undoes invariant motion's work here?

Yes. GLOBAL CONST-PROP will undo invariant motions.

> Should we make sure we add a REG_EQUAL note when not propagating?

The logs show there is already a REG_EQUAL note.

>> Bootstrap and no make check regression on X86-64 and ARM Chrome book.
>>
>> OK for trunk?
>>
>> Thanks!
>> -Zhenqiang
>>
>> ChangeLog:
>> 2014-06-17  Zhenqiang Chen  
>>
>> * cprop.c (try_replace_reg): Check cost for constants.
>>
>> diff --git a/gcc/cprop.c b/gcc/cprop.c
>> index aef3ee8..c9cf02a 100644
>> --- a/gcc/cprop.c
>> +++ b/gcc/cprop.c
>> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>>rtx src = 0;
>>int success = 0;
>>rtx set = single_set (insn);
>> +  int old_cost = 0;
>> +  bool copy_p = false;
>> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
>> +
>> +  if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
>> +copy_p = true;
>> +  else
>> +old_cost = set_rtx_cost (set, speed);
>
> Looks bogus for set == NULL?

set_rtx_cost checks for that: if set is NULL, the function returns 0.

> Also what about register pressure?

Do you think it has a big impact on register pressure? I think it does
not increase register pressure.

> I think this kind of change needs wider testing as RTX costs are
> usually not fully implemented and you introduce a new use kind
> (or is it already used elsewhere in this way to compute cost
> difference of a set with s/reg/const?).

Passes like fwprop, cse and auto_inc_dec use RTX costs to make such
decisions. For example, function attempt_change in auto-inc-dec.c has
code segments like:

  old_cost = (set_src_cost (mem, speed)
  + set_rtx_cost (PATTERN (inc_insn.insn), speed));
  new_cost = set_src_cost (mem_tmp, speed);
  ...
  if (old_cost < new_cost)
{
  ...
  return false;
}

The usage of RTX costs in this patch is similar.
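
The guard can be illustrated outside GCC with a toy cost model. The sketch
below is not GCC code: thumb2_imm_ok, and_cost and should_propagate are
hypothetical names, the immediate check is a simplified model of Thumb-2
modified immediates, and the costs (1 or 2 instructions) are assumptions
chosen to match the MASK example quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model (an assumption, not GCC code): can V be used as the
   immediate of a single Thumb-2 data-processing instruction?  */
static bool
thumb2_imm_ok (uint32_t v)
{
  uint32_t lo = v & 0xffu;
  uint32_t hi = (v >> 8) & 0xffu;

  if (v <= 0xffu)                       /* 0x000000XY */
    return true;
  if (v == (lo | (lo << 16)))           /* 0x00XY00XY */
    return true;
  if (v == ((hi << 8) | (hi << 24)))    /* 0xXY00XY00 */
    return true;
  if (v == lo * 0x01010101u)            /* 0xXYXYXYXY */
    return true;

  /* Otherwise V must fit in an 8-bit window shifted left.  V is nonzero
     here, so the loop terminates.  */
  assert (v != 0);
  while ((v & 1u) == 0)
    v >>= 1;
  return v <= 0xffu;
}

/* Toy cost, in instructions, of "reg &= operand".  */
static int
and_cost (bool operand_is_const, uint32_t value)
{
  if (!operand_is_const || thumb2_imm_ok (value))
    return 1;
  return 2;   /* e.g. an AND followed by a BIC */
}

/* Same shape as the check in the patch: reject the replacement when it
   makes the instruction more expensive than the register form.  */
static bool
should_propagate (uint32_t value)
{
  int old_cost = and_cost (false, 0);
  int new_cost = and_cost (true, value);
  return new_cost <= old_cost;
}
```

Under this model, 0xfe00ff is rejected (it needs two instructions), while
0xff00ff and 0x10000 are each accepted as single-instruction immediates.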

I ran the X86-64 bootstrap and regression tests with
--enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java

And ARM bootstrap and regression tests with
--enable-languages=c,c++,fortran,lto,objc,obj-c++

I will run tests on i686. What other tests do you think I have to run?

> What kind of performance difference do you see?

I ran coremark, dhrystone and eembc on an ARM Cortex-M4 (with some ARM
backend changes). Coremark with some options shows >10% performance
improvement. dhrystone is a little better. Results vary across eembc, but
the overall result is better.

I will run spec2000 on X86-64 and ARM, and get back to you about the
performance changes.

Thanks!
-Zhenqiang

> Thanks,
> Richard.
>
>>/* Usually we substitute easy stuff, so we won't copy everything.
>>   We however need to take care to not duplicate non-trivial CONST
>> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>>to = copy_rtx (to);
>>
>>validate_replace_src_group (from, to, insn);
>> +
>> +  /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
>> + And it can be shared by different references.  So skip propagation if
>> + it makes INSN's rtx cost higher.  */
>> +  if (set && !copy_p && CONSTANT_P (to))
>> +{
>> +  int new_cost = set_rtx_cost (set, speed);
>> +  if (new_cost > old_cost)
>> +   {
>> + cancel_changes (0);
>> + return false;
>> +   }
>> +}
>> +
>>if (num_changes_pending () && apply_change_group ())
>>  success = 1;

Re: [PATCH] Fix PR61335

2014-06-17 Thread Tobias Burnus

Uros Bizjak wrote:
> Attached patch initializes problematic array to zero instead of
> uninitialized value.
>
> 2014-06-17  Uros Bizjak  
>
> * gfortran.dg/pr61335.f90 (cp_unit_create): Initialize
> unit_id and kind_id to zero.
>
> Tested on alphaev68-linux-gnu and x86_64-linux-gnu.
> OK for mainline?

Looks good to me, is obvious and shouldn't affect the test case.

In particular, the variables in question aren't used in the
code after their initialization with an undefined implicitly
declared variable, which is also otherwise unused.

Tobias

RE: [PATCH,MIPS] Remove unused code relating to reloading fcc

2014-06-17 Thread Matthew Fortune

Richard Sandiford  writes:
> Matthew Fortune  writes:
> > This is a small clean-up patch to remove code relating to reloading or
> moving
> > mips fcc registers. At some point in the past these registers were
> allocated
> > as part of register allocation but they are now statically allocated in
> the
> > backend in a round robin fashion. The code for reloading them is therefore
> not
> > necessary any more. The move costs are also irrelevant so are replaced
> with
> > a comment instead (but the cases can just be deleted if that is
> preferred).
> 
> I think removing the cases would be better.
> 
> OK with that change.  Thanks for cleaning this up.

Re-posting as I missed removing the ST_REGS handling code from
mips_secondary_reload_class.

Is this still OK? Testsuite run on mips-unknown-linux-gnu shows no change
in pass/fail.

Regards,
Matthew

gcc/

* config/mips/mips-protos.h (mips_expand_fcc_reload): Remove.
* config/mips/mips.c (mips_expand_fcc_reload): Remove.
(mips_move_to_gpr_cost): Remove ST_REGS case.
(mips_move_from_gpr_cost): Likewise.
(mips_register_move_cost): Likewise.
(mips_secondary_reload_class): Likewise.

diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 0b8125a..0b32a70 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -232,7 +232,6 @@ extern bool mips_use_pic_fn_addr_reg_p (const_rtx);
 extern rtx mips_expand_call (enum mips_call_type, rtx, rtx, rtx, rtx, bool);
 extern void mips_split_call (rtx, rtx);
 extern bool mips_get_pic_call_symbol (rtx *, int);
-extern void mips_expand_fcc_reload (rtx, rtx, rtx);
 extern void mips_set_return_address (rtx, rtx);
 extern bool mips_move_by_pieces_p (unsigned HOST_WIDE_INT, unsigned int);
 extern bool mips_store_by_pieces_p (unsigned HOST_WIDE_INT, unsigned int);
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 585b755..cff1d38 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -7195,35 +7195,6 @@ mips_function_ok_for_sibcall (tree decl, tree exp 
ATTRIBUTE_UNUSED)
   return true;
 }
 

-/* Emit code to move general operand SRC into condition-code
-   register DEST given that SCRATCH is a scratch TFmode FPR.
-   The sequence is:
-
-   FP1 = SRC
-   FP2 = 0.0f
-   DEST = FP2 < FP1
-
-   where FP1 and FP2 are single-precision FPRs taken from SCRATCH.  */
-
-void
-mips_expand_fcc_reload (rtx dest, rtx src, rtx scratch)
-{
-  rtx fp1, fp2;
-
-  /* Change the source to SFmode.  */
-  if (MEM_P (src))
-src = adjust_address (src, SFmode, 0);
-  else if (REG_P (src) || GET_CODE (src) == SUBREG)
-src = gen_rtx_REG (SFmode, true_regnum (src));
-
-  fp1 = gen_rtx_REG (SFmode, REGNO (scratch));
-  fp2 = gen_rtx_REG (SFmode, REGNO (scratch) + MAX_FPRS_PER_FMT);
-
-  mips_emit_move (copy_rtx (fp1), src);
-  mips_emit_move (copy_rtx (fp2), CONST0_RTX (SFmode));
-  emit_insn (gen_slt_sf (dest, fp2, fp1));
-}
-

 /* Implement MOVE_BY_PIECES_P.  */
 
 bool
@@ -12044,10 +12015,6 @@ mips_move_to_gpr_cost (enum machine_mode mode 
ATTRIBUTE_UNUSED,
   /* MFC1, etc.  */
   return 4;
 
-case ST_REGS:
-  /* LUI followed by MOVF.  */
-  return 4;
-
 case COP0_REGS:
 case COP2_REGS:
 case COP3_REGS:
@@ -12081,11 +12048,6 @@ mips_move_from_gpr_cost (enum machine_mode mode, 
reg_class_t to)
   /* MTC1, etc.  */
   return 4;
 
-case ST_REGS:
-  /* A secondary reload through an FPR scratch.  */
-  return (mips_register_move_cost (mode, GENERAL_REGS, FP_REGS)
- + mips_register_move_cost (mode, FP_REGS, ST_REGS));
-
 case COP0_REGS:
 case COP2_REGS:
 case COP3_REGS:
@@ -12117,9 +12079,6 @@ mips_register_move_cost (enum machine_mode mode,
   if (to == FP_REGS && mips_mode_ok_for_mov_fmt_p (mode))
/* MOV.FMT.  */
return 4;
-  if (to == ST_REGS)
-   /* The sequence generated by mips_expand_fcc_reload.  */
-   return 8;
 }
 
   /* Handle cases in which only one class deviates from the ideal.  */
@@ -12184,23 +12143,6 @@ mips_secondary_reload_class (enum reg_class rclass,
   if (ACC_REG_P (regno))
 return reg_class_subset_p (rclass, GR_REGS) ? NO_REGS : GR_REGS;
 
-  /* We can only copy a value to a condition code register from a
- floating-point register, and even then we require a scratch
- floating-point register.  We can only copy a value out of a
- condition-code register into a general register.  */
-  if (reg_class_subset_p (rclass, ST_REGS))
-{
-  if (in_p)
-   return FP_REGS;
-  return GP_REG_P (regno) ? NO_REGS : GR_REGS;
-}
-  if (ST_REG_P (regno))
-{
-  if (!in_p)
-   return FP_REGS;
-  return reg_class_subset_p (rclass, GR_REGS) ? NO_REGS : GR_REGS;
-}
-
   if (reg_class_subset_p (rclass, FP_REGS))
 {
   if (MEM_P (x)

Bug 61407 - Build errors on latest OS X 10.10 Yosemite with Xcode 6 on GCC 4.8.3

2014-06-17 Thread Илья Михальцов

Hello.

This patch fixes gcc build problems on the latest OS X 10.10 SDK beta (see 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61407)

fixincludes/ChangeLog
* inclhack.def (darwin14_has_feature): New fix
* fixincl.x: Regenerate
* tests/base/Availability.h: Added

gcc/ChangeLog
* config/darwin-c.c (version_as_macro): Added compatibility with
the OS X 10.10 version macro and triplet
* config/darwin-driver.c (darwin_find_version_from_kernel): Bumped 
max kernel version

libsanitizer/ChangeLog
* sanitizer_common/sanitizer_platform_limits_posix.cc: Fixed
32-bit compatible dirent struct for OS X
* sanitizer_common/sanitizer_platform_limits_posix.h: Likewise

With regards, Ilya Mikhaltsou
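
The idea behind the darwin14_has_feature fix listed above can be shown in
isolation. This is a minimal sketch, not the fixincludes output:
built_with_asan is a hypothetical helper, and address_sanitizer is used
only as an example feature name.

```c
/* Fallback for compilers that do not provide __has_feature.  It has to be
   a separate #ifndef: "#if defined(__has_feature) && __has_feature(x)"
   does not work where the built-in is missing, because the preprocessor
   still has to parse "__has_feature(x)" as part of the expression.  */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Hypothetical helper: 1 if the compiler reports the feature through
   __has_feature, 0 otherwise (including when the fallback is in use).  */
static int
built_with_asan (void)
{
#if __has_feature(address_sanitizer)
  return 1;
#else
  return 0;
#endif
}
```

With the fallback in place, every later "#if __has_feature(...)" in the
header evaluates to false instead of being a preprocessing error.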


diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
index 6a1136c..b536080 100644
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -4751,4 +4751,33 @@ fix = {
 
 test_text = "extern char *\tsprintf();";
 };
+
+/*
+ * Fix stdio.h using C++ __has_feature built-in on OS X 10.10
+ */
+fix = {
+hackname  = darwin14_has_feature;
+files = Availability.h;
+mach  = "*-*-darwin14.0*";
+
+c_fix = wrap;
+c_fix_arg = <<- _HasFeature_
+
+/*
+ * GCC doesn't support __has_feature built-in in C mode and
+ * using defined(__has_feature) && __has_feature in the same
+ * macro expression is not valid. So, easiest way is to define
+ * for this header __has_feature as a macro, returning 0, in case
+ * it is not defined internally
+ */
+#ifndef __has_feature
+#define __has_feature(x) 0
+#endif
+
+
+_HasFeature_;
+
+test_text = '';
+};
+
 /*EOF*/
diff --git a/fixincludes/tests/base/Availability.h 
b/fixincludes/tests/base/Availability.h
new file mode 100644
index 000..807c40d
--- /dev/null
+++ b/fixincludes/tests/base/Availability.h
@@ -0,0 +1,29 @@
+/*  DO NOT EDIT THIS FILE.
+
+It has been auto-edited by fixincludes from:
+
+   "fixinc/tests/inc/Availability.h"
+
+This had to be done to correct non-standard usages in the
+original, manufacturer supplied header file.  */
+
+#ifndef FIXINC_WRAP_AVAILABILITY_H_DARWIN14_HAS_FEATURE
+#define FIXINC_WRAP_AVAILABILITY_H_DARWIN14_HAS_FEATURE 1
+
+
+/* GCC doesn't support __has_feature built-in in C mode and
+ * using defined(__has_feature) && __has_feature in the same
+ * macro expression is not valid. So, easiest way is to define
+ * for this header __has_feature as a macro, returning 0, in case
+ * it is not defined internally
+ */
+#ifndef __has_feature
+#define __has_feature(x) 0
+#endif
+
+
+#if defined( DARWIN14_HAS_FEATURE_CHECK )
+
+#endif  /* DARWIN14_HAS_FEATURE_CHECK */
+
+#endif  /* FIXINC_WRAP_AVAILABILITY_H_DARWIN14_HAS_FEATURE */
diff --git a/gcc/config/darwin-c.c b/gcc/config/darwin-c.c
index 892ba35..39f795f 100644
--- a/gcc/config/darwin-c.c
+++ b/gcc/config/darwin-c.c
@@ -572,20 +572,31 @@ find_subframework_header (cpp_reader *pfile, const char 
*header, cpp_dir **dirp)
 
 /* Return the value of darwin_macosx_version_min suitable for the
__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ macro,
-   so '10.4.2' becomes 1040.  The lowest digit is always zero.
-   Print a warning if the version number can't be understood.  */
+   so '10.4.2' becomes 1040 and '10.10.0' becomes 101000.  The lowest
+   digit is always zero. Print a warning if the version number
+   can't be understood.  */
 static const char *
 version_as_macro (void)
 {
-  static char result[] = "1000";
+  static char result[7] = "1000";
+  int minorDigitIdx;
 
   if (strncmp (darwin_macosx_version_min, "10.", 3) != 0)
 goto fail;
   if (! ISDIGIT (darwin_macosx_version_min[3]))
 goto fail;
-  result[2] = darwin_macosx_version_min[3];
-  if (darwin_macosx_version_min[4] != '\0'
-  && darwin_macosx_version_min[4] != '.')
+
+  minorDigitIdx = 3;
+  result[2] = darwin_macosx_version_min[minorDigitIdx++];
+  if (ISDIGIT(darwin_macosx_version_min[minorDigitIdx])) {
+/* Starting with 10.10, the numbering used for the macro changed.  */
+result[3] = darwin_macosx_version_min[minorDigitIdx++];
+result[4] = '0';
+result[5] = '0';
+result[6] = '\0';
+  }
+  if (darwin_macosx_version_min[minorDigitIdx] != '\0'
+  && darwin_macosx_version_min[minorDigitIdx] != '.')
 goto fail;
 
   return result;
diff --git a/gcc/config/darwin-driver.c b/gcc/config/darwin-driver.c
index 8b6ae93..a115616 100644
--- a/gcc/config/darwin-driver.c
+++ b/gcc/config/darwin-driver.c
@@ -57,7 +57,7 @@ darwin_find_version_from_kernel (char *new_flag)
   version_p = osversion + 1;
   if (ISDIGIT (*version_p))
 major_vers = major_vers * 10 + (*version_p++ - '0');
-  if (major_vers > 4 + 9)
+  if (major_vers > 4 + 10)
 goto parse_failed;
   if (*version_p++ != '.')
 goto parse_failed;
diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc 
b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc
index a93d38d..6783108 10
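
The version mapping described in the darwin-c.c hunk above ('10.4.2'
becomes 1040, '10.10.0' becomes 101000) can be sketched as a standalone
function. This is an illustration of the same rule rather than the patched
code itself; version_to_macro and its return convention are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-alone version of the mapping.  OUT must have room
   for 7 bytes.  Returns 0 on success, -1 if VERSION is not understood.  */
static int
version_to_macro (const char *version, char out[7])
{
  size_t i = 3;

  assert (version != NULL && out != NULL);

  if (strncmp (version, "10.", 3) != 0
      || !isdigit ((unsigned char) version[3]))
    return -1;

  /* One-digit minor version: "10.N" maps to "10N0".  */
  strcpy (out, "1000");
  out[2] = version[i++];

  /* Two-digit minor version: "10.NN" maps to "10NN00".  */
  if (isdigit ((unsigned char) version[i]))
    {
      out[3] = version[i++];
      strcpy (out + 4, "00");
    }

  /* Anything after the minor version must be the end of the string or
     the start of a patch level, which is ignored.  */
  if (version[i] != '\0' && version[i] != '.')
    return -1;
  return 0;
}
```

As in the patch, the patch level is dropped, so '10.10.3' and '10.10' map
to the same value.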

Re: [PATCH,MIPS] Remove unused code relating to reloading fcc

2014-06-17 Thread Richard Sandiford

Matthew Fortune  writes:
> Richard Sandiford  writes:
>> Matthew Fortune  writes:
>> > This is a small clean-up patch to remove code relating to reloading or
>> moving
>> > mips fcc registers. At some point in the past these registers were
>> allocated
>> > as part of register allocation but they are now statically allocated in
>> the
>> > backend in a round robin fashion. The code for reloading them is therefore
>> not
>> > necessary any more. The move costs are also irrelevant so are replaced
>> with
>> > a comment instead (but the cases can just be deleted if that is
>> preferred).
>> 
>> I think removing the cases would be better.
>> 
>> OK with that change.  Thanks for cleaning this up.
>
> Re-posting as I missed removing the ST_REGS handling code from
> mips_secondary_reload_class.
>
> Is this still OK? Testsuite run on mips-unknown-linux-gnu shows no change
> in pass/fail.

Yeah, looks good, thanks.

Richard

Re: fix math wrt volatile-bitfields vs C++ model

2014-06-17 Thread Bernd Edlinger

Hi,

On Tue, 17 Jun 2014 10:08:33, Richard Biener wrote:
> On Tue, Jun 17, 2014 at 4:08 AM, DJ Delorie  wrote:
>>
>>> Looks ok to me, but can you add a testcase please?
>>
>> I have a testcase, but if -flto the testcase doesn't include *any*
>> definition of the test function, just all the LTO data.  Is this
>> normal?
> 
> Without -ffat-lto-objects yes, this is normal.  If you are trying to
> do a scan-assembler or so then this will be difficult with LTO.
> If LTO is not necessary to trigger the bug and you just want to
> use the torture I suggest to dg-skip-if -flto.
> 
>>> Also check if 4.9 is affected.
>>
>> It is...  same fix works, though.
> 
> Thanks,
> Richard.

If you have a test case where the generated code is actually different
with and without your patch, that would be interesting.

Please see gcc.dg/pr23623.c and gcc.dg/pr56997-4.c
for examples how to automatically scan the intermediate code which is
generated by -fdump-rtl-final to check the expected access mode.
That should work for all targets, even if they have different assembler
syntax.

Thanks
Bernd.

[PATCH] Simplify collect_switch_conv_info

2014-06-17 Thread Richard Biener


This simplifies (and for me robustifies) finding of the final_bb.
The current code is somewhat odd in that it requires at least one
non-forwarder successor of a switch to transform.  The following
patch makes us simply pick the candidate from a random edge (I chose
the default edge) using either the successor or its successor if
the successor is a forwarder.

That fixes fallout of gcc.dg/tree-ssa/pr36881.c when removing
the early copyprop pass which happened to unconditionally run
a cfgcleanup.
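
The selection rule described above can be sketched on a toy CFG. This is
an illustration only: struct block and guess_final_bb are hypothetical and
merely stand in for GCC's basic_block, single_pred_p, single_succ_p and
single_succ.

```c
#include <assert.h>
#include <stddef.h>

/* Toy basic block: only the counts and the single successor matter.  */
struct block
{
  int n_preds;
  int n_succs;
  const struct block *succ;   /* valid only when n_succs == 1 */
};

/* Guess the common final block from the destination of the default
   edge: take the destination itself if it has several predecessors,
   or its successor if the destination is a forwarder to such a block.
   Returns NULL when neither applies.  */
static const struct block *
guess_final_bb (const struct block *default_dest)
{
  assert (default_dest != NULL);

  if (default_dest->n_preds != 1)
    return default_dest;
  if (default_dest->n_succs == 1 && default_dest->succ->n_preds != 1)
    return default_dest->succ;
  return NULL;
}
```

The later loop over all switch edges then only has to verify that every
destination is either this block or a forwarder to it.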

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-06-17  Richard Biener  

* tree-switch-conversion.c (collect_switch_conv_info): Simplify
and allow all blocks to be forwarders.

Index: gcc/tree-switch-conversion.c
===
*** gcc/tree-switch-conversion.c(revision 211727)
--- gcc/tree-switch-conversion.c(working copy)
*** collect_switch_conv_info (gimple swtch,
*** 640,654 
info->other_count += e->count;
  
/* See if there is one common successor block for all branch
!  targets.  If it exists, record it in FINAL_BB.  */
!   FOR_EACH_EDGE (e, ei, info->switch_bb->succs)
! {
!   if (! single_pred_p (e->dest))
!   {
! info->final_bb = e->dest;
! break;
!   }
! }
if (info->final_bb)
  FOR_EACH_EDGE (e, ei, info->switch_bb->succs)
{
--- 640,655 
info->other_count += e->count;
  
/* See if there is one common successor block for all branch
!  targets.  If it exists, record it in FINAL_BB.
!  Start with the destination of the default case as guess
!  or its destination in case it is a forwarder block.  */
!   if (! single_pred_p (e_default->dest))
! info->final_bb = e_default->dest;
!   else if (single_succ_p (e_default->dest)
!  && ! single_pred_p (single_succ (e_default->dest)))
! info->final_bb = single_succ (e_default->dest);
!   /* Require that all switch destinations are either that common
!  FINAL_BB or a forwarder to it.  */
if (info->final_bb)
  FOR_EACH_EDGE (e, ei, info->switch_bb->succs)
{

[PATCH] Use vec<>::qsort where possible

2014-06-17 Thread Richard Biener


Just spotted these.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-06-17  Richard Biener  

* genopinit.c (main): Use vec<>::qsort method.
* tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk):
Likewise.
* tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Likewise.

Index: gcc/genopinit.c
===
--- gcc/genopinit.c (revision 211698)
+++ gcc/genopinit.c (working copy)
@@ -357,8 +357,7 @@ main (int argc, char **argv)
 }
 
   /* Sort the collected patterns.  */
-  qsort (patterns.address (), patterns.length (),
-sizeof (pattern), pattern_cmp);
+  patterns.qsort (pattern_cmp);
 
   /* Now that we've handled the "extra" patterns, eliminate them from
  the optabs array.  That way they don't get in the way below.  */
Index: gcc/tree-ssa-loop-niter.c
===
--- gcc/tree-ssa-loop-niter.c   (revision 211698)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -3144,8 +3144,7 @@ discover_iteration_bound_by_body_walk (s
 fprintf (dump_file, " Trying to walk loop body to reduce the bound.\n");
 
   /* Sort the bounds in decreasing order.  */
-  qsort (bounds.address (), bounds.length (),
-sizeof (widest_int), wide_int_cmp);
+  bounds.qsort (wide_int_cmp);
 
   /* For every basic block record the lowest bound that is guaranteed to
  terminate the loop.  */
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 211698)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -2508,8 +2530,7 @@ vect_analyze_data_ref_accesses (loop_vec
  linear.  Don't modify the original vector's order, it is needed for
  determining what dependencies are reversed.  */
   vec datarefs_copy = datarefs.copy ();
-  qsort (datarefs_copy.address (), datarefs_copy.length (),
-sizeof (data_reference_p), dr_group_sort_cmp);
+  datarefs_copy.qsort (dr_group_sort_cmp);
 
   /* Build the interleaving chains.  */
   for (i = 0; i < datarefs_copy.length () - 1;)

Re: Make ipa-ref somewhat less stupid

2014-06-17 Thread Martin Liška



On 06/16/2014 10:01 AM, Jan Hubicka wrote:

On 06/10/2014 08:34 AM, Jan Hubicka wrote:

Hi,
ipa-reference is somewhat stupid and builds its data sets for all variables,
including addressable and public ones, just to prune them out after all bitmaps
are constructed.  This used to make sense when the profile generation happened
at compile time, but since the ipa_ref data structure was introduced this is
nonsense.

Martin: It may be interesting to check if this solves the memory use issues with
chrome.  We also may be able to re-enable ipa-ref with profile-generate as
I think all the datastructures are considered to have address taken.

Hi,
there is a link to chromium stats: 
https://drive.google.com/file/d/0B0pisUJ80pO1VmNHeklCRWVkOUU/edit?usp=sharing

Both compilations were run with '-flto=6', where the upper graph adds
'-fprofile-generate'. Memory footprint is IMHO acceptable, but the compilation
process takes twice as long with profile generation. Yeah, chromium contains a
really big code base :)

Yep, I wonder why WPA takes so much longer. Do you think you can build lto1
with --enable-gather-detailed-mem-stats and relink with -fpre-ipa-mem-report
-fpost-ipa-mem-report -fmem-report -Q and send me the output?  It would be nice
to push Chromium under 4GB of WPA :)

Here's the report you requested:
https://drive.google.com/file/d/0B0pisUJ80pO1RlRRTVBxUG5vSlE/edit?usp=sharing ,
produced by -fno-profile-generate. With -fprofile-generate enabled, the WPA
stage does not fit in 24GB of memory when memory stats are enabled.

Martin



Thanks a lot!
Honza

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Evgeny Stupachenko

Are i386 changes ok?
Patches with corresponding changes and new tests are attached.

Thanks,
Evgeny

On Thu, Jun 12, 2014 at 12:14 PM, Richard Biener
 wrote:
> On Thu, Jun 12, 2014 at 6:04 AM, Evgeny Stupachenko  
> wrote:
>> Testing finished. No new regressions.
>> Is the following patch ok?
>
> +  if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1 ||
> +  !vect_shift_permute_load_chain (dr_chain, size, stmt, gsi,
> &result_chain))
>
> ||s and &&s go to the next line.
>
> I miss testcases that make sure the vectorizer/backend code-paths are
> both exercised.  Put them in gcc.target/i386 and provide an appropriate
> -march.
>
> The vectorizer changes are ok with the above fixed, I defer to backend
> maintainers for the i386 changes.
>
> Richard.
>
>> 2014-06-11  Evgeny Stupachenko  
>>
>> * config/i386/i386.c (ix86_reassociation_width): Add alternative for
>> vector case.
>> * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
>> * config/i386/x86-tune.def (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
>> * tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
>> Introduces alternative way of loads group permutations.
>> (vect_transform_grouped_load): Try alternative way of permutations.
>>
>> Thanks,
>> Evgeny
>>
>> On Tue, Jun 10, 2014 at 4:43 PM, Evgeny Stupachenko  
>> wrote:
>>> ix86_reassociation_width checks INTEGRAL_MODE_P and FLOAT_MODE_P which
>>> include vector mode.
>>> I'll try to separate this into scalar and vector part, but it will
>>> require more testing (under the testing now).
>>> What about the rest of the patch?
>>>
>>> Thanks,
>>> Evgeny
>>>
>>> On Thu, Jun 5, 2014 at 3:54 PM, Ramana Radhakrishnan
>>>  wrote:
 On 06/05/14 12:43, Evgeny Stupachenko wrote:
>
> New hook is related to vector instructions only. Vector instructions
> could be sequential in pipeline, but scalar - parallel. For x86
> architectures TARGET_SCHED_REASSOC_WIDTH does not give required
> differentiation.
> General hooks could be potentially reused in other algorithms/by other
> architectures.


 It already takes a "mode" argument. Couldn't you use a vector mode to work
 this out ?

 If it is not enough then please be more specific about the documentation of
 this hook about where it is useful so that it's easy for people reading the
 documentation to understand at a glance what purpose it serves.


 Ramana


>
> Thanks,
> Evgeny
>
> On Thu, Jun 5, 2014 at 2:04 PM, Ramana Radhakrishnan
>  wrote:
>>
>> On Wed, May 28, 2014 at 2:09 PM, Evgeny Stupachenko 
>> wrote:
>>>
>>> Hi,
>>>
>>> The patch introduces alternative way of permutations for load groups
>>> of size 2 and 3 which should be faster on architectures with low
>>> parallelism.
>>> The patch gives 2 times gain on Silvermont to the test from PR52252
>>> (in addition to already committed 3 times gain).
>>>
>>> Patch passes bootstrap on x86. Make check is in progress.
>>
>>
>> Why do we need a new hook ? Can't you derive this information from
>> something which is equally badly named TARGET_SCHED_REASSOC_WIDTH
>> though used in the reassociation logic but also serves a similar
>> purpose ?
>>
>> Also the documentation of this hook is incomplete at best and wrong at
>> worst as this is not applied everywhere in the vectorizer but just for
>> this special case for load store permuting. Implying this is useful
>> everywhere in the vectorizer does not appear to be correct.
>>
>> regards
>> Ramana
>>
>>
>>
>>
>>>
>>> ChangeLog:
>>>
>>> 2014-05-28  Evgeny Stupachenko  
>>>
>>>  * config/i386/i386.c (ix86_have_vector_parallel_execution):
>>> New.
>>>  (TARGET_VECTORIZE_HAVE_VECTOR_PARALLEL_EXECUTION): New.
>>>  * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
>>>  * config/i386/x86-tune.def
>>> (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
>>>  * target.def (have_vector_parallel_execution): New.
>>>  * doc/tm.texi.in (have_vector_parallel_execution)): New.
>>>  * doc/tm.texi: Regenerate.
>>>  * targhooks.c (default_have_vector_parallel_execution): New.
>>>  * tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
>>>  Introduces alternative way of loads group permutations.
>>>  (vect_transform_grouped_load): Try alternative way of
>>> permutations.
>>>
>>> Evgeny
>
>



vect_groups2.patch
Description: Binary data


i386tests.patch
Description: Binary data

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Uros Bizjak

On Tue, Jun 17, 2014 at 2:33 PM, Evgeny Stupachenko  wrote:

> Are i386 changes ok?
> Patches with corresponding changes and new tests are attached.

Please remove all target selectors from dg-options and dg-final
testcase directives, they are not needed inside gcc.dg/i386 directory.

The patch is OK with this change.

Thanks,
Uros.

[PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)

2014-06-17 Thread Richard Biener


First this adds a controlling option to the phiopt pass (-fssa-phiopt).
Second, this moves the first phiopt pass from the main optimization
pipeline into early opts (before merge-phi which confuses phiopt
but after dce which will help it).

ISTR that adding an early phiopt pass was wanted to perform CFG
cleanups on the weird CFG that the gimplifier produces from C++
code (but I fail to recollect the details nor remember a bug number).

Generally doing a phiopt before merge-phi gets the chance to screw
things up is good.  Also phiopt is a kind of cleanup that is
always beneficial as it decreases code-size.
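
As a source-level illustration of the kind of conditional pattern meant
here: each branchy function below selects the value of a single variable
(one PHI node in SSA form), and the straight-line function next to it
computes the same result. The function names are hypothetical, and
whether a particular GCC version rewrites exactly these shapes at a given
optimization level is not claimed.

```c
#include <assert.h>

/* A two-way branch whose only effect is to pick 0 or 1.  */
static int
is_nonzero_branchy (int a)
{
  int x;
  if (a != 0)
    x = 1;
  else
    x = 0;
  return x;
}

/* The same value without a branch.  */
static int
is_nonzero_straight (int a)
{
  return a != 0;
}

/* A two-way branch that picks the larger of two values.  */
static int
max_branchy (int a, int b)
{
  int m;
  if (a > b)
    m = a;
  else
    m = b;
  return m;
}

/* Closest C spelling of a single "max" operation.  */
static int
max_straight (int a, int b)
{
  return a > b ? a : b;
}

/* Check that both spellings agree on a small range of inputs.  */
static void
check_equivalence (void)
{
  for (int a = -2; a <= 2; a++)
    {
      assert (is_nonzero_branchy (a) == is_nonzero_straight (a));
      for (int b = -2; b <= 2; b++)
        assert (max_branchy (a, b) == max_straight (a, b));
    }
}
```

Replacing the branch with one statement removes two basic blocks and a
PHI node, which is where the code-size benefit comes from.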

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

I felt that -ftree-XXX is bad naming so I went for -fssa-XXX
even if that is now inconsistent.  Any opinion here?  For
RTL we simply have unsuffixed names so shall we instead go
for -fphiopt?  PHI implies SSA anyway and 'SSA' or 'RTL' is
an implementation detail that the user should not be interested
in (applies to tree- as well, of course).  Now, 'phiopt' is a
bad name when thinking of users (but they shouldn't play with
those options anyway).

So - comments on the pass move?  Comments on the flag naming?

Thanks,
Richard.

2014-06-17  Richard Biener  

* passes.def (pass_all_early_optimizations): Add phi-opt
after dce and before merge-phi.
(pass_all_optimizations): Remove first phi-opt pass.
* common.opt (fssa-phiopt): New option.
* opts.c (default_options_table): Enable -fssa-phiopt with -O1+
but not with -Og.
* tree-ssa-phiopt.c (pass_phiopt): Add gate method.
* doc/invoke.texi (-fssa-phiopt): Document.

Index: gcc/passes.def
===
--- gcc/passes.def  (revision 211736)
+++ gcc/passes.def  (working copy)
@@ -73,8 +73,12 @@ along with GCC; see the file COPYING3.
 execute TODO_rebuild_alias at this point.  */
  NEXT_PASS (pass_build_ealias);
  NEXT_PASS (pass_fre);
- NEXT_PASS (pass_merge_phi);
  NEXT_PASS (pass_cd_dce);
+ NEXT_PASS (pass_phiopt);
+ /* Do this after phiopt runs as phiopt is confused by
+PHIs with more than two arguments.  Switch conversion
+looks for a single PHI block though.  */
+ NEXT_PASS (pass_merge_phi);
  NEXT_PASS (pass_early_ipa_sra);
  NEXT_PASS (pass_tail_recursion);
  NEXT_PASS (pass_convert_switch);
@@ -155,7 +159,6 @@ along with GCC; see the file COPYING3.
   NEXT_PASS (pass_cselim);
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_tree_ifcombine);
-  NEXT_PASS (pass_phiopt);
   NEXT_PASS (pass_tail_recursion);
   NEXT_PASS (pass_ch);
   NEXT_PASS (pass_stdarg);
Index: gcc/common.opt
===
--- gcc/common.opt  (revision 211736)
+++ gcc/common.opt  (working copy)
@@ -1950,6 +1950,10 @@ fsplit-wide-types
 Common Report Var(flag_split_wide_types) Optimization
 Split wide types into independent registers
 
+fssa-phiopt
+Common Report Var(flag_ssa_phiopt) Optimization
+Optimize conditional patterns using SSA PHI nodes
+
 fvariable-expansion-in-unroller
 Common Report Var(flag_variable_expansion_in_unroller) Optimization
 Apply variable expansion when loops are unrolled
Index: gcc/opts.c
===
--- gcc/opts.c  (revision 211736)
+++ gcc/opts.c  (working copy)
@@ -457,6 +457,7 @@ static const struct default_options defa
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fbranch_count_reg, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_invariants, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_pta, NULL, 1 },
+{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fssa_phiopt, NULL, 1 },
 
 /* -O2 optimizations.  */
 { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
Index: gcc/tree-ssa-phiopt.c
===
--- gcc/tree-ssa-phiopt.c   (revision 211736)
+++ gcc/tree-ssa-phiopt.c   (working copy)
@@ -2332,6 +2332,7 @@ public:
 
   /* opt_pass methods: */
   opt_pass * clone () { return new pass_phiopt (m_ctxt); }
+  virtual bool gate (function *) { return flag_ssa_phiopt; }
   virtual unsigned int execute (function *)
 {
   return tree_ssa_phiopt_worker (false, gate_hoist_loads ());
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 211736)
+++ gcc/doc/invoke.texi (working copy)
@@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -fselective-scheduling -fselective-scheduling2 @gol
 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
 -fshrink-wrap -fsignaling-nans -fsingle-precision-constant @gol
--fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
+-fsplit-ivs-in-unroller -fsplit-wide-types -fssa-phiopt -fstack-protector @g

Re: [PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)

2014-06-17 Thread pinskia



> On Jun 17, 2014, at 6:07 AM, Richard Biener  wrote:
> 
> 
> First this adds a controlling option to the phiopt pass (-fssa-phiopt).
> Second, this moves the first phiopt pass from the main optimization
> pipeline into early opts (before merge-phi which confuses phiopt
> but after dce which will help it).
> 
> ISTR that adding an early phiopt pass was wanted to perform CFG
> cleanups on the weird CFG that the gimplifier produces from C++
> code (but I fail to recollect the details nor remember a bug number).
> 
> Generally doing a phiopt before merge-phi gets the chance to screw
> things up is good.  Also phiopt is a kind of cleanup that is
> always beneficial as it decreases code-size.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> I felt that -ftree-XXX is bad naming so I went for -fssa-XXX
> even if that is now inconsistent.  Any opinion here?  For
> RTL we simply have unsuffixed names so shall we instead go
> for -fphiopt?  PHI implies SSA anyway and 'SSA' or 'RTL' is
> an implementation detail that the user should not be interested
> in (applies to tree- as well, of course).  Now, 'phiopt' is a
> bad name when thinking of users (but they shouldn't play with
> those options anyway).
> 
> So - comments on the pass move?  Comments on the flag naming?
> 
> Thanks,
> Richard.
> 
> 2014-06-17  Richard Biener  
> 
>* passes.def (pass_all_early_optimizations): Add phi-opt
>after dce and before merge-phi.
>(pass_all_optimizations): Remove first phi-opt pass.
>* common.opt (fssa-phiopt): New option.
>* opts.c (default_options_table): Enable -fssa-phiopt with -O1+
>but not with -Og.
>* tree-ssa-phiopt.c (pass_phiopt): Add gate method.
>* doc/invoke.texi (-fssa-phiopt): Document.
> 
> Index: gcc/passes.def
> ===
> --- gcc/passes.def(revision 211736)
> +++ gcc/passes.def(working copy)
> @@ -73,8 +73,12 @@ along with GCC; see the file COPYING3.
> execute TODO_rebuild_alias at this point.  */
>  NEXT_PASS (pass_build_ealias);
>  NEXT_PASS (pass_fre);
> -  NEXT_PASS (pass_merge_phi);
>  NEXT_PASS (pass_cd_dce);
> +  NEXT_PASS (pass_phiopt);
> +  /* Do this after phiopt runs as phiopt is confused by
> + PHIs with more than two arguments.  Switch conversion
> + looks for a single PHI block though.  */
> +  NEXT_PASS (pass_merge_phi);

I had made phiopt not be confused by more than two arguments. What has changed?
I think we should make phiopt handle more than two arguments well again.

Thanks,
Andrew


>  NEXT_PASS (pass_early_ipa_sra);
>  NEXT_PASS (pass_tail_recursion);
>  NEXT_PASS (pass_convert_switch);
> @@ -155,7 +159,6 @@ along with GCC; see the file COPYING3.
>   NEXT_PASS (pass_cselim);
>   NEXT_PASS (pass_copy_prop);
>   NEXT_PASS (pass_tree_ifcombine);
> -  NEXT_PASS (pass_phiopt);
>   NEXT_PASS (pass_tail_recursion);
>   NEXT_PASS (pass_ch);
>   NEXT_PASS (pass_stdarg);
> Index: gcc/common.opt
> ===
> --- gcc/common.opt(revision 211736)
> +++ gcc/common.opt(working copy)
> @@ -1950,6 +1950,10 @@ fsplit-wide-types
> Common Report Var(flag_split_wide_types) Optimization
> Split wide types into independent registers
> 
> +fssa-phiopt
> +Common Report Var(flag_ssa_phiopt) Optimization
> +Optimize conditional patterns using SSA PHI nodes
> +
> fvariable-expansion-in-unroller
> Common Report Var(flag_variable_expansion_in_unroller) Optimization
> Apply variable expansion when loops are unrolled
> Index: gcc/opts.c
> ===
> --- gcc/opts.c(revision 211736)
> +++ gcc/opts.c(working copy)
> @@ -457,6 +457,7 @@ static const struct default_options defa
> { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fbranch_count_reg, NULL, 1 },
> { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_invariants, NULL, 1 },
> { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_pta, NULL, 1 },
> +{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fssa_phiopt, NULL, 1 },
> 
> /* -O2 optimizations.  */
> { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
> Index: gcc/tree-ssa-phiopt.c
> ===
> --- gcc/tree-ssa-phiopt.c(revision 211736)
> +++ gcc/tree-ssa-phiopt.c(working copy)
> @@ -2332,6 +2332,7 @@ public:
> 
>   /* opt_pass methods: */
>   opt_pass * clone () { return new pass_phiopt (m_ctxt); }
> +  virtual bool gate (function *) { return flag_ssa_phiopt; }
>   virtual unsigned int execute (function *)
> {
>   return tree_ssa_phiopt_worker (false, gate_hoist_loads ());
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi(revision 211736)
> +++ gcc/doc/invoke.texi(working copy)
> @@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialect

Re: [PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)

2014-06-17 Thread Richard Biener

On Tue, 17 Jun 2014, pins...@gmail.com wrote:

> 
> 
> > On Jun 17, 2014, at 6:07 AM, Richard Biener  wrote:
> > 
> > 
> > First this adds a controlling option to the phiopt pass (-fssa-phiopt).
> > Second, this moves the first phiopt pass from the main optimization
> > pipeline into early opts (before merge-phi which confuses phiopt
> > but after dce which will help it).
> > 
> > ISTR that adding an early phiopt pass was wanted to perform CFG
> > cleanups on the weird CFG that the gimplifier produces from C++
> > code (but I fail to recollect the details nor remember a bug number).
> > 
> > Generally doing a phiopt before merge-phi gets the chance to screw
> > things up is good.  Also phiopt is a kind of cleanup that is
> > always beneficial as it decreases code-size.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > I felt that -ftree-XXX is bad naming so I went for -fssa-XXX
> > even if that is now inconsistent.  Any opinion here?  For
> > RTL we simply have unsuffixed names so shall we instead go
> > for -fphiopt?  PHI implies SSA anyway and 'SSA' or 'RTL' is
> > an implementation detail that the user should not be interested
> > in (applies to tree- as well, of course).  Now, 'phiopt' is a
> > bad name when thinking of users (but they shouldn't play with
> > those options anyway).
> > 
> > So - comments on the pass move?  Comments on the flag naming?
> > 
> > Thanks,
> > Richard.
> > 
> > 2014-06-17  Richard Biener  
> > 
> >* passes.def (pass_all_early_optimizations): Add phi-opt
> >after dce and before merge-phi.
> >(pass_all_optimizations): Remove first phi-opt pass.
> >* common.opt (fssa-phiopt): New option.
> >* opts.c (default_options_table): Enable -fssa-phiopt with -O1+
> >but not with -Og.
> >* tree-ssa-phiopt.c (pass_phiopt): Add gate method.
> >* doc/invoke.texi (-fssa-phiopt): Document.
> > 
> > Index: gcc/passes.def
> > ===
> > --- gcc/passes.def(revision 211736)
> > +++ gcc/passes.def(working copy)
> > @@ -73,8 +73,12 @@ along with GCC; see the file COPYING3.
> > execute TODO_rebuild_alias at this point.  */
> >  NEXT_PASS (pass_build_ealias);
> >  NEXT_PASS (pass_fre);
> > -  NEXT_PASS (pass_merge_phi);
> >  NEXT_PASS (pass_cd_dce);
> > +  NEXT_PASS (pass_phiopt);
> > +  /* Do this after phiopt runs as phiopt is confused by
> > + PHIs with more than two arguments.  Switch conversion
> > + looks for a single PHI block though.  */
> > +  NEXT_PASS (pass_merge_phi);
> 
> I had made phiopt not be confused by more than two arguments.  What has
> changed?  I think we should make phiopt handle more than two arguments
> well again.

I'm not sure - the above is just what I remember seeing, not currently
failing testcases.  I can certainly remove the comment - or are you
saying phiopt now possibly benefits from merge_phi?  Then I can as
well keep merge_phi where it is right now.

Richard.

> Thanks,
> Andrew
> 
> 
> >  NEXT_PASS (pass_early_ipa_sra);
> >  NEXT_PASS (pass_tail_recursion);
> >  NEXT_PASS (pass_convert_switch);
> > @@ -155,7 +159,6 @@ along with GCC; see the file COPYING3.
> >   NEXT_PASS (pass_cselim);
> >   NEXT_PASS (pass_copy_prop);
> >   NEXT_PASS (pass_tree_ifcombine);
> > -  NEXT_PASS (pass_phiopt);
> >   NEXT_PASS (pass_tail_recursion);
> >   NEXT_PASS (pass_ch);
> >   NEXT_PASS (pass_stdarg);
> > Index: gcc/common.opt
> > ===
> > --- gcc/common.opt(revision 211736)
> > +++ gcc/common.opt(working copy)
> > @@ -1950,6 +1950,10 @@ fsplit-wide-types
> > Common Report Var(flag_split_wide_types) Optimization
> > Split wide types into independent registers
> > 
> > +fssa-phiopt
> > +Common Report Var(flag_ssa_phiopt) Optimization
> > +Optimize conditional patterns using SSA PHI nodes
> > +
> > fvariable-expansion-in-unroller
> > Common Report Var(flag_variable_expansion_in_unroller) Optimization
> > Apply variable expansion when loops are unrolled
> > Index: gcc/opts.c
> > ===
> > --- gcc/opts.c(revision 211736)
> > +++ gcc/opts.c(working copy)
> > @@ -457,6 +457,7 @@ static const struct default_options defa
> > { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fbranch_count_reg, NULL, 1 },
> > { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_invariants, NULL, 1 },
> > { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_pta, NULL, 1 },
> > +{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fssa_phiopt, NULL, 1 },
> > 
> > /* -O2 optimizations.  */
> > { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
> > Index: gcc/tree-ssa-phiopt.c
> > ===
> > --- gcc/tree-ssa-phiopt.c(revision 211736)
> > +++ gcc/tree-ssa-phiopt.c(working copy)
> > @@ -2332,6 +2332,7 @@ public:

Re: [PATCH, Pointer Bounds Checker 28/x] IPA CP

2014-06-17 Thread Martin Jambor

Hi,

On Wed, Jun 11, 2014 at 05:47:36PM +0400, Ilya Enkovich wrote:
>
> Here is the fixed version.

I'm fine with the ipa-cp hunks but I cannot approve them; Honza is the
right person to ask.

Thanks,

Martin


> 
> Thanks,
> Ilya
> --
> gcc/
> 
> 2014-06-11  Ilya Enkovich  
> 
>   * cgraph.h (cgraph_local_p): New.
>   * ipa-cp.c (initialize_node_lattices): Use cgraph_local_p
>   to handle instrumentation clones properly.
>   (propagate_constants_accross_call): Do not propagate
>   through instrumentation thunks.
> 
> 
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 5e702a7..b225ebe 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -1556,4 +1556,17 @@ symtab_in_same_comdat_p (symtab_node *one, symtab_node 
> *two)
>  {
>return DECL_COMDAT_GROUP (one->decl) == DECL_COMDAT_GROUP (two->decl);
>  }
> +
> +/* Return true if NODE is local.  Instrumentation clones are counted as local
> +   only when the original function is local.  */
> +
> +static inline bool
> +cgraph_local_p (cgraph_node *node)
> +{
> +  if (!node->instrumentation_clone || !node->instrumented_version)
> +return node->local.local;
> +
> +  return node->local.local && node->instrumented_version->local.local;
> +}
> +
>  #endif  /* GCC_CGRAPH_H  */
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 689378a..4318789 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -699,7 +699,7 @@ initialize_node_lattices (struct cgraph_node *node)
>int i;
>  
>gcc_checking_assert (cgraph_function_with_gimple_body_p (node));
> -  if (!node->local.local)
> +  if (!cgraph_local_p (node))
>  {
>/* When cloning is allowed, we can assume that externally visible
>functions are not called.  We will compensate this by cloning
> @@ -1434,6 +1434,24 @@ propagate_constants_accross_call (struct cgraph_edge 
> *cs)
>if (parms_count == 0)
>  return false;
>  
> +  /* No propagation through instrumentation thunks is available yet.
> + It should be possible with proper mapping of call args and
> + instrumented callee params in the propagation loop below.  But
> + this case mostly occurs when legacy code calls instrumented code
> + and it is not a primary target for optimizations.
> + We detect instrumentation thunks in aliases and thunks chain by
> + checking instrumentation_clone flag for chain source and target.
> + Going through instrumentation thunks we always have it changed
> + from 0 to 1 and all other nodes do not change it.  */
> +  if (!cs->callee->instrumentation_clone
> +  && callee->instrumentation_clone)
> +{
> +  for (i = 0; i < parms_count; i++)
> + ret |= set_all_contains_variable (ipa_get_parm_lattices (callee_info,
> +  i));
> +  return ret;
> +}
> +
>/* If this call goes through a thunk we must not propagate to the first 
> (0th)
>   parameter.  However, we might need to uncover a thunk from below a 
> series
>   of aliases first.  */

Compile gcc.target/i386/fuse-caller-save.c with -fomit-frame-pointer (PR target/61533)

2014-06-17 Thread Rainer Orth

gcc.target/i386/fuse-caller-save.c currently FAILs on Solaris/x86 with
gas and -m64:

FAIL: gcc.target/i386/fuse-caller-save.c scan-assembler-not .cfi_def_cfa_offset
FAIL: gcc.target/i386/fuse-caller-save.c scan-assembler-not .cfi_offset

Fixed as follows as suggested and pre-approved by Uros in the PR.
Tested with the appropriate runtest invocations on i386-pc-solaris2.11
and x86_64-unknown-linux-gnu, installed on mainline.

Rainer


2014-06-17  Rainer Orth  

PR target/61533
* gcc.target/i386/fuse-caller-save.c: Add -fomit-frame-pointer to
dg-options.

diff --git a/gcc/testsuite/gcc.target/i386/fuse-caller-save.c b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
--- a/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
+++ b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fuse-caller-save" } */
+/* { dg-options "-O2 -fuse-caller-save -fomit-frame-pointer" } */
 /* { dg-additional-options "-mregparm=1" { target ia32 } } */
 
 /* Testing -fuse-caller-save optimization option.  */


-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: Bug 61407 - Build errors on latest OS X 10.10 Yosemite with Xcode 6 on GCC 4.8.3

2014-06-17 Thread Bernhard Reutner-Fischer


On 17 June 2014 13:10:07 Илья Михальцов  wrote:


index 892ba35..39f795f 100644
--- a/gcc/config/darwin-c.c
+++ b/gcc/config/darwin-c.c
@@ -572,20 +572,31 @@ find_subframework_header (cpp_reader *pfile, const 
char *header, cpp_dir **dirp)


 /* Return the value of darwin_macosx_version_min suitable for the
__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ macro,
-   so '10.4.2' becomes 1040.  The lowest digit is always zero.
-   Print a warning if the version number can't be understood.  */
+   so '10.4.2' becomes 1040 and '10.10.0' becomes 101000.  The lowest
+   digit is always zero. Print a warning if the version number
+   can't be understood.  */
 static const char *
 version_as_macro (void)
 {
-  static char result[] = "1000";
+  static char result[7] = "1000";
+  int minorDigitIdx;

   if (strncmp (darwin_macosx_version_min, "10.", 3) != 0)
 goto fail;
   if (! ISDIGIT (darwin_macosx_version_min[3]))
 goto fail;
-  result[2] = darwin_macosx_version_min[3];
-  if (darwin_macosx_version_min[4] != '\0'
-  && darwin_macosx_version_min[4] != '.')
+
+  minorDigitIdx = 3;
+  result[2] = darwin_macosx_version_min[minorDigitIdx++];
+  if (ISDIGIT(darwin_macosx_version_min[minorDigitIdx])) {
+/* Starting with 10.10 numeration for mactro changed */


What does "mactro" mean? macro?
Thanks,



Re: Another AIX Bootstrap failure

2014-06-17 Thread David Edelsohn

On Mon, Jun 16, 2014 at 11:44 PM, Jan Hubicka  wrote:

>> The linker is not seeing the local definition of
>> ._ZN14__gnu_parallel9_SettingsC1Ev.  libstdc++ is built with
>> Linux-like semantics, so it allows symbols to be overridden. AIX calls
>> everything through the PLT. But the real definition of the function is
>
> Even static functions?
>
>> not being seen.
>>
>> I'm not exactly sure why inlining is changing this and what these extra
>> levels of indirections are trying to accomplish. The visibility of the
>
> To avoid using PLT and GOT when the unit refers to the symbol and we know
> that interposition does not matter.

I am not certain if the linker is creating the PLT stub code because
it wants to allow interposition or because it cannot see a definition
of the function and wants to allow for some other shared library to
provide the definition at runtime.

> Why branch to a non-global (static) symbol
>   b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
> leads to PLT stub here and why branching to such symbols seems to work 
> otherwise?

Branching to non-global (static) symbol, even an alias, is working
here. The weak function seems to be the problem.

> The failing branch is
>> b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
> so the call to static construction seems to have happened correctly but we can
> not get right the call from the constructor to static function (that is an 
> alias
> of a global symbol)

The linker appears to not want to resolve the weak function. If I
change ._ZN14__gnu_parallel9_SettingsC1Ev to lglobl, it works. If I
change the static constructor to call the weak function directly,
avoiding the alias, it shows the same failure mode.

I don't know what code generation looked like before.  Was GCC
generating calls to weak functions within the same file?

Thanks, David

Re: Regimplification enhancements 3/3

2014-06-17 Thread Martin Jambor

On Mon, Jun 16, 2014 at 01:38:49PM +0200, Richard Biener wrote:
> On Mon, Jun 16, 2014 at 12:57 PM, Bernd Schmidt  
> wrote:
> > There's code in regimplification that makes us use an extra temporary
> > when we encounter a call returning a non-BLKmode structure. This seems
> > somewhat inefficient and unnecessary, and when used from the
> > lower-addr-spaces pass I'm working on it leads to problems further
> > down that look like tree-ssa bugs that I wasn't able to clearly
> > disentangle.
> >
> > Here's what happens on compile/pr51761.c.  Regimplification has the
> > following effect, creating an extra temporary _6:
> >
> > -  D.1378 = fooD.1373 (aD.1377);
> > +  _6 = fooD.1373 (aD.1377);
> > +  # .MEMD.1382 = VDEF <.MEMD.1382>
> > +  D.1378 = _6;
> >
> > SRA turns this into:
> >
> >   _6 = fooD.1373 (aD.1377);
> >   # VUSE <.MEM_3>
> >   SR$2_7 = MEM[(struct S *)&_6];
> 
> clearly bogus - _6 is a register, you can't use a MEM on it.

Weird... does the following (untested) patch help?

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 0afa197..747b1b6 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3277,6 +3277,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator 
*gsi)
 
   if (modify_this_stmt
   || gimple_has_volatile_ops (*stmt)
+  || is_gimple_reg (lhs)
+  || is_gimple_reg (rhs)
   || contains_vce_or_bfcref_p (rhs)
   || contains_vce_or_bfcref_p (lhs)
   || stmt_ends_bb_p (*stmt))

It is just a quick thought though.  If it does not, could you post the
access trees dumped by -fdump-tree-esra-details or
-fdump-tree-sra-details (depending on whether this is early or late
SRA)?  Or is it simple to set it up locally?

Thanks,

Martin

> 
> > Somehow, the address of &_6 doesn't count as a use, and the DCE pass decides
> > it is unused:
> >
> >   Eliminating unnecessary statements:
> >   Deleting LHS of call: _6 = foo (a);
> >
> > However, the statement
> >   SR$2_7 = MEM[(struct S *)&_6];
> > is still present, and we have an SSA name without a definition, leading to a
> > crash.
> >
> > Rather than figure all this out, I decided to try making the
> > regimplification not generate the extra copy in the first place. The
> > testsuite seems to agree with me that it's unnecessary. Bootstrapped and
> > tested on x86_64-linux, ok?
> 
> OK.  The code looks bogus anyway in that it generates an SSA name
> for something that is not is_gimple_reg_type ().
> 
> Thanks,
> Richard.
> 
> >
> > Bernd

Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access

2014-06-17 Thread Charles Baylis

On 5 June 2014 07:27, Ramana Radhakrishnan  wrote:
> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
>  wrote:
>> This patch adds support for post-indexed addressing for NEON structure
>> memory accesses.
>>
>> For example VLD1.8 {d0}, [r0], r1
>>
>>
>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>
>> Ok for trunk?
>
> This looks like a reasonable start but this work doesn't look complete
> to me yet.
>
> Can you also look at the impact on performance of a range of
> benchmarks especially a popular embedded one to see how this behaves
> unless you have already done so ?

I ran a popular suite of embedded benchmarks, and there is no impact
at all on a Chromebook (including with the additional attached patch).

The patch was developed to address a performance issue with a new
version of libvpx which uses intrinsics instead of NEON assembler. The
patch results in a 3% improvement for VP8 decode.

> POST_INC, POST_MODIFY usually have a funny way of biting you with
> either ivopts or the way in which address costs work. I think there
> may be further tweaks needed but for a first step I'd like to know what
> the performance impact is.

> I would also suggest running this through clyon's neon intrinsics
> testsuite to see if that catches any issues especially with the large
> vector modes.

No issues found in clyon's tests.

Your mention of larger vector modes prompted me to check that the
patch has the desired result with them. In fact, the costs are
estimated incorrectly which means the post_modify pattern is not used.
The attached patch fixes that (when used in combination with my
original patch).

2014-06-15  Charles Baylis  

* config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
embedded side effects.

0002-Adjust-costs-for-mem-with-post_modify.patch
Description: application/download

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Richard Henderson

On 06/17/2014 05:33 AM, Evgeny Stupachenko wrote:
> +   1st vec:   0  1  2  3  4  5  6  7
> +   2nd vec:   8  9 10 11 12 13 14 15
> +   3rd vec:  16 17 18 19 20 21 22 23
> +
> +   The output sequence should be:
> +
> +   1st vec:  0 3 6  9 12 15 18 21
> +   2nd vec:  1 4 7 10 13 16 19 22
> +   3rd vec:  2 5 8 11 14 17 20 23
> +
> +   We use 3 shuffle instructions and 3 * 3 - 1 shifts to create such output.

Why not 3 * 2 blends followed by 3 shuffles?  When length is prime, as here, we
know that no blend will ever overlap elements.  So:

1st step

  A1 = blend V1 V2 =  0  9  2  3 12  5  6 15
  A2 = blend V1 V2 =  8  1 10 11  4 13 14  7
  A3 = blend V1 V3 = 16 17  2 19 20  5 22 23

2nd step

  B1 = blend A1 V3 =  0  9 18  3 12 21  6 15
  B2 = blend A2 V3 = 16  1 10 19  4 13 22  7
  B3 = blend A3 V2 =  8 17  2 11 20  5 14 23

3rd step

  C1 = perm B1 =  0  3  6  9 12 15 18 21
  C2 = perm B2 =  1  4  7 10 13 16 19 22
  C3 = perm B3 =  2  5  8 11 14 17 20 23

The final permute here isn't trivial, crossing lanes for avx2 and all, but the
initial permute you use is similar.


r~

[PATCH GCC 2/2]Add 'force-dwarf-lexical-blocks' command line option - extend to C++

2014-06-17 Thread Herman, Andrei


Hi,

This is the third (and final) patch which extends the original change 
proposal, submitted on June 1, and titled "Add 
'force-dwarf-lexical-blocks' command line option".


This patch extends the proposed functionality to C++.

Attached are the proposed ChangeLog additions (for this patch only), 
named according to the directory each one belongs to.


All check-c and check-c++ tests have been run for the unix target.
The testsuites showed identical results, with and without setting the 
proposed -fforce-dwarf-lexical-blocks command line option.


Please let me know if the proposed additions will be accepted.

Best regards,
Andrei Herman
Mentor Graphics Corporation
Israel branch
From 824e75eb563e82c04fe1621c64430d87cdb0f348 Mon Sep 17 00:00:00 2001
From: Andrei Herman 
Date: Tue, 17 Jun 2014 17:59:07 +0300
Subject: [PATCH 3/3] Support flag_force_dwarf_blocks in C++.

* c-semantics.c (push_block_info): Allow BIND_EXPR for STATEMENT_LIST.

* cp-objcp-common.c (cxx_block_may_fallthru): Return false for break
or continue, when flag_force_dwarf_blocks.

* cp-tree.h (pop_scope_for_labels): New.

* name-lookup.c (keep_current_level): New.
(kept_level_p): When flag_force_dwarf_blocks, avoid creating duplicate
blocks.

* name-lookup.h (keep_current_level): New.

* parser.c (cp_parser_statement): Add last_label and pass it when
calling cp_parser_label_for_labeled_statement, to create a label scope
for the first label of a statement.  Close forced scopes at current
level, after labeled compound statements that don't fall through.
(cp_parser_force_block_for_label): New.
(pop_scope_for_labels): New.
(cp_parser_label_for_labeled_statement): Add parameter.  Create a label
scope for the first label of a statement.
(cp_parser_compound_statement): Force a block for compound statement.
(cp_parser_implicitly_scoped_statement): Likewise for if-then, if-else,
switch and do statements.
(cp_parser_already_scoped_statement): Likewise for for/while bodies.

* semantics.c (do_poplevel): Close any forced scopes in given level.
(build_data_member_initialization): Allow BIND_EXPR.

Signed-off-by: Andrei Herman 
---
 gcc/c-family/c-semantics.c |   11 -
 gcc/cp/cp-objcp-common.c   |5 ++
 gcc/cp/cp-tree.h   |1 +
 gcc/cp/name-lookup.c   |   12 +-
 gcc/cp/name-lookup.h   |1 +
 gcc/cp/parser.c|  104 
 gcc/cp/semantics.c |5 ++
 7 files changed, 127 insertions(+), 12 deletions(-)

diff --git a/gcc/c-family/c-semantics.c b/gcc/c-family/c-semantics.c
index ec3045f..8c8497f 100644
--- a/gcc/c-family/c-semantics.c
+++ b/gcc/c-family/c-semantics.c
@@ -35,8 +35,15 @@ along with GCC; see the file COPYING3.  If not see
 void
 push_block_info (tree block, location_t loc, bool is_label)
 {
-  if (TREE_CODE(block) != STATEMENT_LIST)
+  switch (TREE_CODE (block)) {
+  case BIND_EXPR:
+block = BIND_EXPR_BODY (block);
+/* Fall through.  */
+  case STATEMENT_LIST:
+break;
+  default:
 return;
+  }
 
   block_loc tl;
   tl = (block_loc) ggc_internal_cleared_alloc (sizeof(struct block_loc_s));
@@ -70,7 +77,7 @@ check_pop_block_info(tree block, location_t loc)
   if (block == cur_block_info->block && loc == cur_block_info->loc
   && !cur_block_info->is_label)
 {
-  block_list_stack->pop();
+  block_list_stack->pop ();
 }
 }
 }
diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index 78dddef..fcfd959 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -238,6 +238,11 @@ cxx_block_may_fallthru (const_tree stmt)
   return false;
 
 default:
+  if (flag_force_dwarf_blocks) {
+if (TREE_CODE (stmt) == BREAK_STMT ||
+TREE_CODE (stmt) == CONTINUE_STMT)
+  return false;
+  }
   return true;
 }
 }
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7d29c2c..4953ad9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5501,6 +5501,7 @@ extern bool maybe_clone_body  (tree);
 extern tree cp_convert_range_for (tree, tree, tree, bool);
 extern bool parsing_nsdmi (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
+extern void pop_scope_for_labels (tree);
 
 /* in pt.c */
 extern bool check_template_shadow  (tree);
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 2baeeb7..5538c63 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -1745,7 +1745,8 @@ local_bindings_p (void)
 bool
 kept_level_p (void)
 {
-  return (current_binding_level->blocks != NULL_TREE
+  return ((!flag_force_dwarf_blocks
+   && current_binding_level->blocks != NULL_TREE)
  || current_binding_level->keep
  || current_binding_level->kind == sk_cleanup
  || current_binding

Re: Another AIX Bootstrap failure

2014-06-17 Thread Jan Hubicka

> > To avoid using PLT and GOT when the unit refers to the symbol and we know
> > that interposition does not matter.
> 
> I am not certain if the linker is creating the PLT stub code because
> it wants to allow interposition or because it cannot see a definition
> of the function and wants to allow for some other shared library to
> provide the definition at runtime.

OK, but the definition appears in the same file..
> 
> > Why branch to a non-global (static) symbol
> >   b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
> > leads to PLT stub here and why branching to such symbols seems to work 
> > otherwise?
> 
> Branching to non-global (static) symbol, even an alias, is working
> here. The weak function seems to be the problem.
> 
> > The failing branch is
> >> b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
> > so the call to static construction seems to have happened correctly but we 
> > can
> > not get right the call from the constructor to static function (that is an 
> > alias
> > of a global symbol)
> 
> The linker appears to not want to resolve the weak function. If I
> change ._ZN14__gnu_parallel9_SettingsC1Ev to lglobl, it works. If I
> change the static constructor to call the weak function directly,
> avoiding the alias, it shows the same failure mode.
> 
> I don't know what code generation looked like before.  Was GCC
> generating calls to weak functions within the same file?

Yes, this is how you implement COMDAT functions, right?  I looked at rs6000 call
expansion and it does not seem to care about visibility properties (just about
direct vs. indirect calls).

One problem I can think of is a scenario where the linker unifies calls to comdat
functions between units, somehow forgetting about the aliases, but this function
does not seem to be shared.
Index: symtab.c
===
--- symtab.c(revision 211693)
+++ symtab.c(working copy)
@@ -1327,10 +1327,8 @@
   (void *)&new_node, true);
   if (new_node)
 return new_node;
-#ifndef ASM_OUTPUT_DEF
   /* If aliases aren't supported by the assembler, fail.  */
   return NULL;
-#endif
 
   /* Otherwise create a new one.  */
   new_decl = copy_node (node->decl);

The patch above disables generation of the local aliases completely.  I do not
see much difference in the actual codegen with it.
I will check older GCC.

Honza
> 
> Thanks, David

Re: [Patch] PR55189 enable -Wreturn-type by default

2014-06-17 Thread Sylvestre Ledru

On 05/06/2014 20:01, Joseph S. Myers wrote:
>
>> Initially, I implemented -Wmissing-return to manage this case (
>> https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00820.html ) but Jason
>> suggested to remove that:
>> https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01033.html
>> (I don't have a strong opinion on the subject).
> I think splitting the option like that makes sense.  Compatibility 
> indicates that -Wreturn-type and -Wall should still enable 
> -Wmissing-return, but only the other pieces of -Wreturn-type should be 
> enabled by default, at least for C.  (Enabling -Wimplicit-int by default 
> might be a good starting point.)
OK.
Attached is a potential implementation. Is that what
you expect?

> Also, at least one testsuite change in your patch is wrong. 
OK. Thanks. I've probably made others (I updated 1300+ of them).

Thanks
Sylvestre

From 1b936c618c58dc0e899fa9f56013de48f7e4dcd6 Mon Sep 17 00:00:00 2001
From: Sylvestre Ledru 
Date: Tue, 17 Jun 2014 18:48:29 +0200
Subject: [PATCH 2/2] Enable Wimplicit by default

---
 gcc/c-family/c.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 050d400..9b9ede7 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -460,7 +460,7 @@ C ObjC Var(warn_implicit_function_declaration) Init(-1) Warning LangEnabledBy(C
 Warn about implicit function declarations
 
 Wimplicit-int
-C ObjC Var(warn_implicit_int) Warning LangEnabledBy(C ObjC,Wimplicit)
+C ObjC Var(warn_implicit_int) Warning
 Warn when a declaration does not specify a type
 
 Wimport
-- 
2.0.0

From 80cd3dff34f74058ab66b69e0e01a05eaf686338 Mon Sep 17 00:00:00 2001
From: Sylvestre Ledru 
Date: Tue, 17 Jun 2014 18:48:12 +0200
Subject: [PATCH 1/2] Introduce -Wmissing-return (Was part of -Wreturn-type
 which is now enabled by default)

---
 gcc/c-family/c.opt|  4 
 gcc/doc/invoke.texi   | 10 +-
 gcc/fortran/options.c |  4 
 gcc/tree-cfg.c|  4 ++--
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 91f8275..050d400 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -697,6 +697,10 @@ Wreturn-type
 C ObjC C++ ObjC++ Var(warn_return_type) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
 Warn whenever a function's return type defaults to \"int\" (C), or about inconsistent return types (C++)
 
+Wmissing-return
+C ObjC C++ ObjC++ Var(warn_missing_return) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
+Warn whenever control may reach end of non-void function
+
 Wselector
 ObjC ObjC++ Var(warn_selector) Warning
 Warn if a selector has multiple methods
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9a34f1c..9911e86 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -258,7 +258,7 @@ Objective-C and Objective-C++ Dialects}.
 -Winvalid-pch -Wlarger-than=@var{len}  -Wunsafe-loop-optimizations @gol
 -Wlogical-op -Wlogical-not-parentheses -Wlong-long @gol
 -Wmain -Wmaybe-uninitialized -Wmissing-braces  -Wmissing-field-initializers @gol
--Wmissing-include-dirs @gol
+-Wmissing-include-dirs -Wmissing-return @gol
 -Wno-multichar  -Wnonnull  -Wno-overflow -Wopenmp-simd @gol
 -Woverlength-strings  -Wpacked  -Wpacked-bitfield-compat  -Wpadded @gol
 -Wparentheses  -Wpedantic-ms-format -Wno-pedantic-ms-format @gol
@@ -3327,6 +3327,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect Options}.
 -Wmain @r{(only for C/ObjC and unless} @option{-ffreestanding}@r{)}  @gol
 -Wmaybe-uninitialized @gol
 -Wmissing-braces @r{(only for C/ObjC)} @gol
+-Wmissing-return @gol
 -Wnonnull  @gol
 -Wopenmp-simd @gol
 -Wparentheses  @gol
@@ -3657,6 +3658,13 @@ the following example, the initializer for @samp{a} is not fully
 bracketed, but that for @samp{b} is fully bracketed.  This warning is
 enabled by @option{-Wall} in C.
 
+@item -Wmissing-return
+@opindex Wmissing-return
+@opindex Wno-missing-return
+Warn whenever control falls off the end of the function body (i.e., without
+any return).
+This warning is enabled by @option{-Wall} for C and C++.
+
 @smallexample
 int a[2][2] = @{ 0, 1, 2, 3 @};
 int b[2][2] = @{ @{ 0, 1 @}, @{ 2, 3 @} @};
diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index a2b91ca..fe71230 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -698,6 +698,10 @@ gfc_handle_option (size_t scode, const char *arg, int value,
   gfc_option.warn_line_truncation = value;
   break;
 
+case OPT_Wmissing_return:
+  warn_missing_return = value;
+  break;
+
 case OPT_Wrealloc_lhs:
   gfc_option.warn_realloc_lhs = value;
   break;
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index e824619..2fd342e 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -8265,7 +8265,7 @@ pass_warn_function_return::execute (function *fun)
 
   /* If we see "return;" in some basic block, then we do reach the end
  without returning a value.  */
-  else if (warn_return_type
+  else if (warn_missing_retur

Re: Another AIX Bootstrap failure

2014-06-17 Thread David Edelsohn

On Tue, Jun 17, 2014 at 12:50 PM, Jan Hubicka  wrote:
>> > To avoid using PLT and GOT when the unit refers to the symbol and we know
>> > that interposition does not matter.
>>
>> I am not certain if the linker is creating the PLT stub code because
>> it wants to allow interposition or because it cannot see a definition
>> of the function and wants to allow for some other shared library to
>> provide the definition at runtime.
>
> OK, but the definition appears in the same file..
>>
>> > Why branch to a non-global (static) symbol
>> >   b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
>> > leads to PLT stub here and why branching to such symbols seems to work 
>> > otherwise?
>>
>> Branching to non-global (static) symbol, even an alias, is working
>> here. The weak function seems to be the problem.

The weak function is the problem, but I don't know why.  And I don't
understand how this is different than past uses of weak functions.  Or
is that new?

This is very confusing because the library, libstdc++, is being linked
statically. It provides a weak definition of the function. There
should be no glink code (PLT stub).  If the function is declared
.lglobl, it is called directly and no PLT stub is created.  I need to
call in the help of the AIX linker expert to figure out why it is
inserting PLT stub code, especially when linking statically.

Thanks, David

Re: [Patch] PR55189 enable -Wreturn-type by default

2014-06-17 Thread Joseph S. Myers

On Tue, 17 Jun 2014, Sylvestre Ledru wrote:

> On 05/06/2014 20:01, Joseph S. Myers wrote:
> >
> >> Initially, I implemented -Wmissing-return to manage this case (
> >> https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00820.html ) but Jason
> >> suggested to remove that:
> >> https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01033.html
> >> (I don't have a strong opinion on the subject).
> > I think splitting the option like that makes sense.  Compatibility 
> > indicates that -Wreturn-type and -Wall should still enable 
> > -Wmissing-return, but only the other pieces of -Wreturn-type should be 
> > enabled by default, at least for C.  (Enabling -Wimplicit-int by default 
> > might be a good starting point.)
> OK.
> Attached you will find a potential implementation. Is that what
> you expect?

It would help a lot if it included testcases for what various options / 
option combinations do / do not enable.  I expect that each option 
continues to enable the warnings it does at present (so if a user 
explicitly does -Wreturn-type it also enables the -Wmissing-return 
warnings, for example) - but some warnings would start to be enabled by 
default.  If someone does e.g. -Wno-implicit that would disable the 
default -Wimplicit-int; if they do -Wno-implicit -Wimplicit that would 
have the same effect as just -Wimplicit (so keeping the default warnings 
enabled, and possibly enabling others).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Evgeny Stupachenko

While developing I've tried the following scheme:

First step is 3 shuffles (as initially):

A1 = (0 3 6) (1 4 7) (2 5)
A2 = (8 11 14) (9 12 15) (10 13)
A3 = (16 19 22) (17 20 23) (18 21)

R1 = blend [ blend [A1 A2], A3] =  (0 3 6) (9 12 15) (18 21)
  B2 = blend [A1, A2] = (0 3 6) (1 4 7) (10 13)
R2 = shift 3, B2 ... (1 4 7) (10 13) + A3 (16 19 22) ... = (1 4 7) (10 13) (16 19 22)
  B3 = blend [ A2, A3] = (8 11 14) (17 20 23) (18 21)
R3 = shift 6, A1 ... (2 5) + B3 (8 11 14) (17 20 23) ... = (2 5) (8 11 14) (17 20 23)

But it was slower than the scheme in the patch, as blend costs more than
shift (palign).
For AVX2 this scheme is not OK, as it has many more dependencies than the
current one (in vect_permute_load_chain).

Evgeny

On Tue, Jun 17, 2014 at 7:41 PM, Richard Henderson  wrote:
> On 06/17/2014 05:33 AM, Evgeny Stupachenko wrote:
>> +   1st vec:   0  1  2  3  4  5  6  7
>> +   2nd vec:   8  9 10 11 12 13 14 15
>> +   3rd vec:  16 17 18 19 20 21 22 23
>> +
>> +   The output sequence should be:
>> +
>> +   1st vec:  0 3 6  9 12 15 18 21
>> +   2nd vec:  1 4 7 10 13 16 19 22
>> +   3rd vec:  2 5 8 11 14 17 20 23
>> +
>> +   We use 3 shuffle instructions and 3 * 3 - 1 shifts to create such output.
>
> Why not 3 * 2 blend followed by 3 shuffle?  When length is prime, as here, we
> know that no blend will ever overlap elements.  So:
>
> 1st step
>
>   A1 = blend V1 V2 =  0  9  2  3 12  5  6 15
>   A2 = blend V1 V2 =  8  1 10 11  4 13 14  7
>   A3 = blend V1 V3 = 16 17  2 19 20  5 22 23
>
> 2nd step
>
>   B1 = blend A1 V3 =  0  9 18  3 12 21  6 15
>   B2 = blend A2 V3 = 16  1 10 19  4 13 22  7
>   B3 = blend A3 V2 =  8 17  2 11 20  5 14 23
>
> 3rd step
>
>   C1 = perm B1 =  0  3  6  9 12 15 18 21
>   C2 = perm B2 =  1  4  7 10 13 16 19 22
>   C3 = perm B3 =  2  5  8 11 14 17 20 23
>
> The final permute here isn't trivial, crossing lanes for avx2 and all, but the
> initial permute you use is similar.
>
>
> r~
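
The blend-then-permute proposal quoted above can be checked with a small scalar model. This is only a sketch under stated assumptions: `blend`, `perm`, `deinterleave3` and `deinterleave3_matches_example` are hypothetical helper names, not vectorizer functions, and the lanes are modeled as plain `int` arrays rather than real vector instructions. For simplicity the sketch always blends the first two inputs and then the third, whereas the quoted scheme blends V1 with V3 first for the third output; the selected lanes are the same either way.

```c
#define N 8  /* lanes per vector, as in the quoted example */

/* Per-lane blend: dst[i] = take_y[i] ? y[i] : x[i].  */
static void
blend (const int *x, const int *y, const int *take_y, int *dst)
{
  for (int i = 0; i < N; i++)
    dst[i] = take_y[i] ? y[i] : x[i];
}

/* Single-source permute: dst[i] = src[idx[i]].  */
static void
perm (const int *src, const int *idx, int *dst)
{
  for (int i = 0; i < N; i++)
    dst[i] = src[idx[i]];
}

/* Split 24 consecutive elements held in v0, v1, v2 into three stride-3
   vectors: out[r][j] receives element 3*j + r of the concatenation.  */
void
deinterleave3 (const int *v0, const int *v1, const int *v2, int out[3][N])
{
  for (int r = 0; r < 3; r++)
    {
      int take_v1[N], take_v2[N], idx[N], a[N], b[N];

      for (int i = 0; i < N; i++)
        {
          /* Lane i of input k holds element N*k + i, which belongs to
             output r exactly when k == (i - r) mod 3.  No two inputs
             compete for the same lane, so two blends suffice.  */
          int k = (i - r + 3) % 3;
          take_v1[i] = (k == 1);
          take_v2[i] = (k == 2);
          /* Element 3*j + r sits in lane (3*j + r) mod N after blending.  */
          idx[i] = (3 * i + r) % N;
        }

      blend (v0, v1, take_v1, a);   /* 1st step */
      blend (a, v2, take_v2, b);    /* 2nd step */
      perm (b, idx, out[r]);        /* 3rd step */
    }
}

/* Usage example: returns 1 if the inputs 0..23 from the thread produce
   the three stride-3 output vectors listed there, 0 otherwise.  */
int
deinterleave3_matches_example (void)
{
  int v[3][N], out[3][N];

  for (int k = 0; k < 3; k++)
    for (int i = 0; i < N; i++)
      v[k][i] = N * k + i;

  deinterleave3 (v[0], v[1], v[2], out);

  for (int r = 0; r < 3; r++)
    for (int j = 0; j < N; j++)
      if (out[r][j] != 3 * j + r)
        return 0;
  return 1;
}
```

The model says nothing about instruction cost, which is the actual point of disagreement in the thread (blend versus palign latency, and cross-lane permutes on AVX2).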

Re: [Patch] PR55189 enable -Wreturn-type by default

2014-06-17 Thread Sylvestre Ledru

On 17/06/2014 19:15, Joseph S. Myers wrote:
> On Tue, 17 Jun 2014, Sylvestre Ledru wrote:
>
>> On 05/06/2014 20:01, Joseph S. Myers wrote:
>>>> Initially, I implemented -Wmissing-return to manage this case (
>>>> https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00820.html ) but Jason
>>>> suggested to remove that:
>>>> https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01033.html
>>>> (I don't have a strong opinion on the subject).
>>> I think splitting the option like that makes sense.  Compatibility 
>>> indicates that -Wreturn-type and -Wall should still enable 
>>> -Wmissing-return, but only the other pieces of -Wreturn-type should be 
>>> enabled by default, at least for C.  (Enabling -Wimplicit-int by default 
>>> might be a good starting point.)
>> OK.
>> Attached you will find a potential implementation. Is that what
>> you expect?
> It would help a lot if it included testcases for what various options / 
> option combinations do / do not enable.  
OK. I will do that.
We should test the following:
* default => run just -Wreturn-type
* -Wreturn-type => Run both
* -Wreturn-type + -Wmissing-return => Run both
* -Wno-return-type + -Wmissing-return => Run just the second one
* -Wno-return-type + -Wno-missing-return => Run none
Do you see any other?
> I expect that each option 
> continues to enable the warnings it does at present (so if a user 
> explicitly does -Wreturn-type it also enables the -Wmissing-return 
> warnings, for example) - but some warnings would start to be enabled by 
> default.  If someone does e.g. -Wno-implicit that would disable the 
> default -Wimplicit-int; if they do -Wno-implicit -Wimplicit that would 
> have the same effect as just -Wimplicit (so keeping the default warnings 
> enabled, and possibly enabling others).
>
OK. I will try to implement that later (I don't think -Wimplicit-int is
necessary to enable -Wreturn-type by default).
Besides that, are you OK with my changes? (with the tests updated)

Thanks,
Sylvestre

Re: [Patch] PR55189 enable -Wreturn-type by default

2014-06-17 Thread Joseph S. Myers

On Tue, 17 Jun 2014, Sylvestre Ledru wrote:

> OK. I will do that.
> We should test the following:
> * default => run just -Wreturn-type
> * -Wreturn-type => Run both
> * -Wreturn-type + -Wmissing-return => Run both
> * -Wno-return-type + -Wmissing-return => Run just the second one
> * -Wno-return-type + -Wno-missing-return => Run none
> Do you see any other?

That looks like the right things to test, if there are no changes for 
anything other than those options.
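
The test matrix above can be written down as a small truth-table model. This is a sketch only: `resolve_return_warnings`, `enum flag_state` and `struct warn_result` are hypothetical names and do not correspond to GCC's option machinery. It also ignores the order of options on the command line, and its treatment of -Wall combined with -Wno-return-type is an assumption that the thread does not settle.

```c
#include <stdbool.h>

/* Tri-state for a warning flag: not given, -Wfoo, or -Wno-foo.  */
enum flag_state { FLAG_UNSET, FLAG_ON, FLAG_OFF };

struct warn_result
{
  bool return_type;     /* the remaining pieces of -Wreturn-type */
  bool missing_return;  /* the proposed -Wmissing-return piece */
};

/* Hypothetical model of the proposed behaviour.  */
struct warn_result
resolve_return_warnings (enum flag_state wreturn_type,
                         enum flag_state wmissing_return,
                         bool wall)
{
  struct warn_result r;

  /* -Wreturn-type is on by default; only -Wno-return-type disables it.  */
  r.return_type = (wreturn_type != FLAG_OFF);

  /* An explicit -W[no-]missing-return wins.  Otherwise the warning is
     enabled by an explicit -Wreturn-type or by -Wall (assumption: -Wall
     still enables it even when -Wno-return-type is given).  */
  if (wmissing_return != FLAG_UNSET)
    r.missing_return = (wmissing_return == FLAG_ON);
  else
    r.missing_return = (wreturn_type == FLAG_ON) || wall;

  return r;
}
```

Each row of the list maps to one call, for example `resolve_return_warnings (FLAG_OFF, FLAG_ON, false)` for "-Wno-return-type + -Wmissing-return", which leaves only the second warning enabled.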

> Besides that, are you OK with my changes? (with the tests updated)

The tests are key to reviewing whether the code changes actually do the 
right thing.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] C++ thunk section names

2014-06-17 Thread Sriraman Tallam

Ping.

On Mon, Jun 9, 2014 at 3:54 PM, Sriraman Tallam  wrote:
> Ping.
>
> On Mon, May 19, 2014 at 11:25 AM, Sriraman Tallam  wrote:
>> Ping.
>>
>> On Thu, Apr 17, 2014 at 10:41 AM, Sriraman Tallam  wrote:
>>> Ping.
>>>
>>> On Wed, Feb 5, 2014 at 4:31 PM, Sriraman Tallam  wrote:
>>>> Hi,
>>>>
>>>>   I would like this patch reviewed and considered for commit when
>>>> Stage 1 is active again.
>>>>
>>>> Patch Description:
>>>>
>>>> A C++ thunk's section name is set to be the same as the original function's
>>>> section name for which the thunk was created in order to place the two
>>>> together.  This is done in cp/method.c in function use_thunk.
>>>> However, with function reordering turned on, the original function's
>>>> section name can change to something like ".text.hot." or
>>>> ".text.unlikely." in function default_function_section in varasm.c
>>>> based on the node count of that function.  The thunk function's section
>>>> name is not updated to match the original here, and doing so would not
>>>> always be correct anyway, as the original function can be hotter than
>>>> the thunk.
>>>>
>>>> I have created a patch to not name the thunk function's section to be the
>>>> same as the original function when function reordering is enabled.
>>>>
>>>> Thanks
>>>> Sri

Re: [PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)

2014-06-17 Thread Jeff Law


On 06/17/14 07:07, Richard Biener wrote:


> I felt that -ftree-XXX is bad naming so I went for -fssa-XXX
> even if that is now inconsistent.  Any opinion here?  For
> RTL we simply have unsuffixed names so shall we instead go
> for -fphiopt?  PHI implies SSA anyway and 'SSA' or 'RTL' is
> an implementation detail that the user should not be interested
> in (applies to tree- as well, of course).  Now, 'phiopt' is a
> bad name when thinking of users (but they shouldn't play with
> those options anyway).

Our flags are a mess.  If I put my user hat on, then I'd have to ask the 
question: why would I care about tree, ssa, or even phis?  The pass 
converts branchy code into straight-line code.  So, arguably, the right 
name would reflect that it changes branchy code to straight-line code.


But I believe most of our flag names are poor in this regard (and I'm as 
much to blame as anyone).  So go with your best judgement IMHO.


It'd be nice to have some testcases here to show why we want this moved 
earlier so that a few years from now when someone else wants to move it 
back, we can say "umm, see test frobit.c, make that work and you can 
move it back" :-)
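
As an illustration of the branchy-to-straight-line conversion described above, the following sketch shows a source-level before/after pair. It is an assumption that phiopt handles exactly this pattern at this point in the pipeline, and the function names are made up for the example. It is not a proposed testcase, since it carries no dump scanning directives.

```c
/* Branchy form: the result is selected by a conditional jump and the
   two arms meet in a PHI node.  */
int
is_nonzero_branchy (int a)
{
  int x;
  if (a != 0)
    x = 1;
  else
    x = 0;
  return x;
}

/* Straight-line form: the same value computed without any control flow.  */
int
is_nonzero_straight (int a)
{
  return a != 0;
}
```

Both functions return the same value for every input; only the control flow differs, which is what a testcase for the earlier pass placement would need to observe in the dumps.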


jeff

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-06-17 Thread Ilya Verbin

Hello Bernd,

On 28 Feb 17:21, Bernd Schmidt wrote:
> For your use case, I'd imagine the offload compiler would be built
> relatively normally as a full build with
> "--enable-as-accelerator-for=x86_64-linux", which would install it
> into locations where the host will eventually be able to find it.
> Then the host compiler would be built with another new configure
> option (as yet unimplemented in my patch set)
> "--enable-offload-targets=mic,..." which would tell the host
> compiler about the pre-built offload target compilers. On the ptx

I don't get this part of the plan.  Where will the host compiler look for 
mkoffloads?

E.g., first I configure/make/install the target gcc and corresponding mkoffload 
with the following options:
--enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-unknown-linux 
--prefix=/install_gcc/accel_intelmic

Next I configure/make/install the host gcc with:
--enable-accelerator=intelmic --prefix=/install_gcc/host

Now if I manually copy mkoffload from target's install dir into one of the dirs 
in host's $COMPILER_PATH,
then lto-wrapper finds it and everything works fine.
E.g.:
mkdir -p /install_gcc/host/libexec/gcc/x86_64-unknown-linux-gnu/accel/intelmic/ && \
cp /install_gcc/accel_intelmic/libexec/gcc/x86_64-unknown-linux/4.10.0/accel/x86_64-unknown-linux-gnu/mkoffload \
  /install_gcc/host/libexec/gcc/x86_64-unknown-linux-gnu/accel/intelmic/

But what was your idea of how to tell host gcc about the path to mkoffload?

Thanks,
  -- Ilya

Re: [PATCH] PR54555: Use strict_low_part for loading a constant only if it is cheaper

2014-06-17 Thread Jeff Law


On 06/17/14 01:47, Andreas Schwab wrote:

> Postreload may transform (set (REGX) (CONST_INT A)) ... (set (REGX)
> (CONST_INT B)) to (set (REGX) (CONST_INT A)) ... (set (STRICT_LOW_PART
> (REGX)) (CONST_INT B)), but it should do that only if the latter is
> cheaper.  On m68k, a full word load of a small constant with moveq is
> cheaper than doing a byte load with move.b.
>
> Tested on m68k-suse-linux and x86_64-suse-linux.  In both cases the size
> of cc1* becomes smaller with this change.
>
> Andreas.
>
> PR rtl-optimization/54555
> * postreload.c (move2add_use_add2_insn): Only substitute
> STRICT_LOW_PART if it is cheaper.

Sadly, Kazu didn't add a testcase for the H8/300 cases which inspired 
his change, so we don't know if your patch hurts the H8/300 port or not.


Let's do better this time ;-)  Add a testcase for the m68k port which 
verifies we're getting the desired code.  I don't care if you test the 
assembly code or test the RTL dumps, just that we have a test for the 
case where STRICT_LOW_PART is not a win.


With a testcase, this is approved.

Thanks,

jeff

Re: [PATCH, Cilk+, PR57541] Additional fix for issues with array notations

2014-06-17 Thread Jeff Law


On 06/16/14 14:13, Zamyatin, Igor wrote:

Hi All!

The patch fixes ICE in array notation for the cases of incorrect arguments of 
Cilk+ builtins and undeclared initial index.

Is it ok for trunk and 4.9?

Thanks,
Igor

diff --git a/gcc/c/ChangeLog b/gcc/c/ChangeLog
index 54d0de7..56e1b0b 100644
--- a/gcc/c/ChangeLog
+++ b/gcc/c/ChangeLog
@@ -1,3 +1,12 @@
+2014-06-16  Igor Zamyatin  
+
+   PR middle-end/57541
+   * c-array-notation.c (fix_builtin_array_notation_fn):
+   Check for 0 arguments in builtin call. Check that bultin argument is
+   correct.
+   * c-parser.c (c_parser_array_notation): Check for incorrect initial
+   index.
Shouldn't this have been caught earlier?  ISTM we should be catching any 
argument mix-ups during parsing.  Is there some reason we don't do that?


jeff

Re: [PATCH, PR 61211] Fix a bug in clone_of_p verification

2014-06-17 Thread Martin Jambor

Ping.

Thanks,

Martin


On Sat, May 31, 2014 at 12:46:03AM +0200, Martin Jambor wrote:
> Hi,
> 
> after a clone is materialized, its clone_of field is cleared which in
> PR 61211 leads to a failure in the skipped_thunk path in clone_of_p in
> cgraph.c, which then leads to a false positive verification failure.
> 
> Fixed by the following patch.  Bootstrapped and tested on x86_64-linux
> on both the trunk and the 4.9 branch.  OK for both?
> 
> Thanks,
> 
> Martin
> 
> 
> 2014-05-30  Martin Jambor  
> 
>   PR ipa/61211
>   * cgraph.c (clone_of_p): Allow skipped_branch to deal with
>   expanded clones.
> 
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index ff65b86..f18f977 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2566,11 +2566,16 @@ clone_of_p (struct cgraph_node *node, struct cgraph_node *node2)
>skipped_thunk = true;
>  }
>  
> -  if (skipped_thunk
> -  && (!node2->clone_of
> -   || !node2->clone.args_to_skip
> -   || !bitmap_bit_p (node2->clone.args_to_skip, 0)))
> -return false;
> +  if (skipped_thunk)
> +{
> +  if (!node2->clone.args_to_skip
> +   || !bitmap_bit_p (node2->clone.args_to_skip, 0))
> + return false;
> +  if (node2->former_clone_of == node->decl)
> + return true;
> +  else if (!node2->clone_of)
> + return false;
> +}
>  
>while (node != node2 && node2)
>  node2 = node2->clone_of;

Re: [patch libatomic]: Add basic support for mingw targets

2014-06-17 Thread Jeff Law


On 06/16/14 07:20, Kai Tietz wrote:

> Hello,
>
> this patch adds basic support for libatomic for mingw targets using
> win32 and for mingw targets using posix threading model.
>
> The win32 implementation might need a critical section for the
> initialization of mutexes.  If an issue occurs we can still add that.
> For now all testcases are passing for native and posix-threading model
> mingw (32-bit and 64-bit).
>
> ChangeLog
>
> 2014-06-16  Kai Tietz  
>
>  * Makefile.am (libatomic_la_LDFLAGS): Add lt_host_flags.

Isn't this all target stuff, in which case lt_host_flags seems 
inappropriate?  Or is this just poorly named?


The rest seems reasonable.  So we just need to settle that nit and we 
can go forward.


jeff

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-06-17 Thread Bernd Schmidt


On 06/17/2014 08:20 PM, Ilya Verbin wrote:

Hello Bernd,

On 28 Feb 17:21, Bernd Schmidt wrote:

For your use case, I'd imagine the offload compiler would be built
relatively normally as a full build with
"--enable-as-accelerator-for=x86_64-linux", which would install it
into locations where the host will eventually be able to find it.
Then the host compiler would be built with another new configure
option (as yet unimplemented in my patch set)
"--enable-offload-targets=mic,..." which would tell the host
compiler about the pre-built offload target compilers. On the ptx


I don't get this part of the plan.  Where a host compiler will look for 
mkoffloads?

E.g., first I configure/make/install the target gcc and corresponding mkoffload 
with the following options:
--enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-unknown-linux 
--prefix=/install_gcc/accel_intelmic

Next I configure/make/install the host gcc with:
--enable-accelerator=intelmic --prefix=/install_gcc/host


Try using the same prefix for both.


Bernd

Re: [patch i386]: Combine memory and indirect jump

2014-06-17 Thread Jeff Law


On 06/13/14 10:59, Kai Tietz wrote:

2014-06-13 17:58 GMT+02:00 Jeff Law :

On 06/13/14 09:56, Richard Henderson wrote:


On 06/13/2014 08:36 AM, Jeff Law wrote:


So you may have answered this already, but why can't this be a combiner
pattern?



Until pass_duplicate_computed_gotos, we (intentionally) have a single indirect
branch in the entire function.  This vastly reduces the size of the CFG.


Ah, the factoring bits.  Should have known.




Peep2 is currently running before d_c_g, so currently Kai can't solve this
problem in peep2.

I don't think peep2 should run after sched2, but I'll bet we can reorder things
a bit so that d_c_g runs before peep2.


Yea, seems worth a try.

jeff



Well, I tested to put the second sched2 pass before the sched2 pass.
That works in general.  There are just some opportunities which weren't
caught then.  I attached a sample, which demonstrates that pretty
well.  I noticed that putting that pass behind reload blocks was
necessary for a better hit-rate of the peephole optimization.
So can you tell us why this sample code misses opportunities?  Otherwise 
we have to dig into it ourselves to tease out that information.


I think we're zeroing in on a path to move d_c_g before peep2, but I'd 
like to have a clearer understanding of why we'd still be missing 
opportunities.  If we can avoid running peep2 twice, that'd be good.


jeff

Re: [Patch, Fortran] Add coarray communication support to the trunk (coindex variables)

2014-06-17 Thread Paul Richard Thomas

Dear Tobias and Alessandro,

Well, what can I say?  The patch is something of a tour de force!
Sandro, this is absolutely wonderful. Many thanks from all of
us.

I have done nothing to check the functionality of the patch.  However,
I have checked the conformance with coding standards and that it is
well and truly insulated from the rest of gfortran by the coarray
option.

OK for trunk

Once again many thanks for the patch.

Paul


On 17 June 2014 08:28, Tobias Burnus  wrote:
> This patch adds the first coarray communication support to the trunk
> (ignoring the co_sum/co_min/co_max support, which was recently merged).
> [Note: In terms of the library this is still libcaf_single, but see below.]
>
> The patch is based on my work on the fortran-caf branch, but has a slightly
> modified ABI. The patch should support most communications, but it is not
> complete. I intend to soon submit a patch which irons out some wrinkles.
>
> In particular, this patch adds three library calls to handle coindexed
> communication: Assignment to a coindex variable (caf_send), a coindexed
> expression (caf_expression) and assigning a coindexed variable to a
> coindexed variable (caf_sendget). The coarray is identified by a token
> (opaque object provided by the coarray library), an offset to that base
> address, an image index and an array descriptor for the coarray, which is
> also used for scalars – and which has the value of the whole array for
> vector subscripts. Additionally, one passes a "kind" variable as extra
> argument as the current array descriptor cannot distinguish a len=1 kind=4
> from a len=4 kind=1 character string. And for vector subscripts, the
> subscripts are passed as additional argument.
>
> For assignments, the library is supposed to handle padding/trimming of
> strings and type conversion (e.g. "cmplx_caf(:)[i] = int_array") as well as
> "array = scalar" assignments.
>
> The following is left to be done as follow up:
>
> * Support of vector subscripts with assumed-size variables: To be tested;
> might need the new array descriptor or some similar work around – or just a
> test case.
> * The library libcaf_single supports padding/trimming of strings but still
> lacks the support for type conversion and vector subscripts.
> * Adding an ABI documentation
> * There are still some issues with regards to polymorphic coarrays, in
> particular with passing them as dummy arguments and in ASSOCIATE/SELECT
> TYPE, but presumably also with using them in coindexed expressions.
>
> And as a bigger item: Allocatable components of coarrays are not supported,
> nor is the access to pointer or allocatable components (part refs);
> currently, there is no compile time diagnostic for it.
>
>
> Additionally, I have removed the vector subscript preparations from the
> co_sum/min/max as it does not make much sense for those. And I added a
> collective test case, which I found on my hard disk.
>
> Built and regtested. OK for the trunk?
>
> Tobias
>
> PS: Additional missing bits, not listed above: Locking and CRITICAL and
> atomics for Fortran 2008. And for TS18508 co_broadcast and co_reduce, the
> atomics extensions, teams, events and error recovery.



-- 
The knack of flying is learning how to throw yourself at the ground and miss.
   --Hitchhikers Guide to the Galaxy

Re: [patch libatomic]: Add basic support for mingw targets

2014-06-17 Thread Kai Tietz

2014-06-17 21:16 GMT+02:00 Jeff Law :
> On 06/16/14 07:20, Kai Tietz wrote:
>>
>> Hello,
>>
>> this patch adds basic support for libatomic for mingw targets using
>> win32 and for mingw targets using posix threading model.
>>
>> The win32 implementation might need a critical section for the
>> initialization of mutexes.  If an issue occurs we can still add that.
>> For now all testcases are passing for native and posix-threading model
>> mingw (32-bit and 64-bit).
>>
>> ChangeLog
>>
>> 2014-06-16  Kai Tietz  
>>
>>  * Makefile.am (libatomic_la_LDFLAGS): Add lt_host_flags.
>
> Isn't this all target stuff, in which case lt_host_flags seems
> inappropriate.  Or is this just poorly named?

Hmm, libatomic is built here for the new host (meaning it is a gcc target
library).  So it might be poorly named.  Nevertheless, for details see
ACX_LT_HOST_FLAGS in config/lthostflags.m4, and why it is required to
set -no-undefined and the proper bindir for cygwin/mingw.

> The rest seems reasonable.  So we just need to settle that nit and we can go
> forward.
>
> jeff

Kai

[PATCH, rs6000] Remove XFAIL from default_format_denormal_2.f90 for PowerPC on Linux

2014-06-17 Thread William J. Schmidt

Hi,

The testcase gfortran.dg/default_format_denormal_2.f90 has been
reporting XPASS since 4.8 on the powerpc*-unknown-linux-gnu platforms.
This patch removes the XFAIL for powerpc*-*-linux-* from the test.  I
believe this pattern doesn't match any other platforms, but please let
me know if I should replace it with a more specific pattern instead.

Verified on powerpc64-unknown-linux-gnu (-m32 and -m64) and
powerpc64le-unknown-linux-gnu (-m64).  Is this ok for trunk, 4.9, and
4.8?

Thanks,
Bill


2014-06-17  Bill Schmidt  

* gfortran.dg/default_format_denormal_2.f90:  Remove xfail for
powerpc*-*-linux*.


Index: gcc/testsuite/gfortran.dg/default_format_denormal_2.f90
===
--- gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (revision 211741)
+++ gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (working copy)
@@ -1,5 +1,5 @@
 ! { dg-require-effective-target fortran_large_real }
-! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } }
+! { dg-do run { xfail powerpc*-apple-darwin* } }
 ! Test XFAILed on these platforms because the system's printf() lacks
 ! proper support for denormalized long doubles. See PR24685
 !

[Fortran-dev] Merge from the trunk

2014-06-17 Thread Tobias Burnus


Dear all,

I have now updated the "fortran-dev" branch to trunk version Rev. 
211744. Committed as Rev. 211745.


Tobias

Re: [PATCH, PR 61160] Artificial thunks need combined_args_to_skip

2014-06-17 Thread Martin Jambor

Hi,

Ping.

Thanks,

Martin


On Sat, May 31, 2014 at 01:08:31AM +0200, Martin Jambor wrote:
> Hi,
> 
> the second issue in PR 61160 is that because artificial thunks
> (produced by duplicate_thunk_for_node) do not have
> combined_args_to_skip, calls to them do not get actual arguments
> removed, while the actual functions do lose their formal parameters,
> leading to mismatches.
> 
> Currently, the combined_args_to_skip is computed in
> cgraph_create_virtual_clone only after all the edge redirection and
> thunk duplication is done so it had to be moved to a spot before
> that.  Since we already pass args_to_skip to cgraph_clone_node, I
> moved the computation there (otherwise it would have to duplicate the
> old value and also pass the new one to the redirection routine).
> 
> I have also noticed that the code producing combined_args_to_skip from
> an old value and new args_to_skip cannot work in LTO because we do not
> have DECL_ARGUMENTS available at WPA in LTO.  The wrong code is
> however never executed and so I replaced it with a simple bitmap_ior.
> This changes the semantics of args_to_skip for any user of
> cgraph_create_virtual_clone that would like to remove some parameters
> from something which is already a clone.  However, currently there are
> no such users and the new semantics is saner because WPA code will be
> happier using the old indices rather than remapping everything the
> whole time.
> 
> I am still in the process of bootstrapping and testing this patch on
> trunk, I will test it on the 4.9 branch too.  OK if it passes
> everywhere?
> 
> Thanks,
> 
> Martin
> 
> 
> 
> 2014-05-29  Martin Jambor  
> 
>   PR ipa/61160
>   * cgraphclones.c (duplicate_thunk_for_node): Removed parameter
>   args_to_skip, use those from node instead.  Copy args_to_skip and
>   combined_args_to_skip from node to the new thunk.
>   (redirect_edge_duplicating_thunks): Removed parameter args_to_skip.
>   (cgraph_create_virtual_clone): Moved computation of
>   combined_args_to_skip...
>   (cgraph_clone_node): ...here, simplify it to bitmap_ior..
> 
> testsuite/
>   * g++.dg/ipa/pr61160-2.C: New test.
>   * g++.dg/ipa/pr61160-3.C: Likewise.
> 
> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
> index 4387b99..91cc13c 100644
> --- a/gcc/cgraphclones.c
> +++ b/gcc/cgraphclones.c
> @@ -301,14 +301,13 @@ set_new_clone_decl_and_node_flags (cgraph_node *new_node)
> thunk is this_adjusting but we are removing this parameter.  */
>  
>  static cgraph_node *
> -duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
> -   bitmap args_to_skip)
> +duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node)
>  {
>cgraph_node *new_thunk, *thunk_of;
>thunk_of = cgraph_function_or_thunk_node (thunk->callees->callee);
>  
>if (thunk_of->thunk.thunk_p)
> -node = duplicate_thunk_for_node (thunk_of, node, args_to_skip);
> +node = duplicate_thunk_for_node (thunk_of, node);
>  
>struct cgraph_edge *cs;
>for (cs = node->callers; cs; cs = cs->next_caller)
> @@ -320,17 +319,18 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
>return cs->caller;
>  
>tree new_decl;
> -  if (!args_to_skip)
> +  if (!node->clone.args_to_skip)
>  new_decl = copy_node (thunk->decl);
>else
>  {
>/* We do not need to duplicate this_adjusting thunks if we have removed
>this.  */
>if (thunk->thunk.this_adjusting
> -   && bitmap_bit_p (args_to_skip, 0))
> +   && bitmap_bit_p (node->clone.args_to_skip, 0))
>   return node;
>  
> -  new_decl = build_function_decl_skip_args (thunk->decl, args_to_skip,
> +  new_decl = build_function_decl_skip_args (thunk->decl,
> + node->clone.args_to_skip,
>   false);
>  }
>gcc_checking_assert (!DECL_STRUCT_FUNCTION (new_decl));
> @@ -348,6 +348,8 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
>new_thunk->thunk = thunk->thunk;
>new_thunk->unique_name = in_lto_p;
>new_thunk->former_clone_of = thunk->decl;
> +  new_thunk->clone.args_to_skip = node->clone.args_to_skip;
> +  new_thunk->clone.combined_args_to_skip = node->clone.combined_args_to_skip;
>  
>struct cgraph_edge *e = cgraph_create_edge (new_thunk, node, NULL, 0,
> CGRAPH_FREQ_BASE);
> @@ -364,12 +366,11 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
> chain.  */
>  
>  void
> -redirect_edge_duplicating_thunks (struct cgraph_edge *e, struct cgraph_node *n,
> -   bitmap args_to_skip)
> +redirect_edge_duplicating_thunks (struct cgraph_edge *e, struct cgraph_node *n)
>  {
>cgraph_node *orig_to = cgraph_function_or_thunk_node (e->callee);
>if (orig_to->thunk.thunk_p)
> -n = duplicate_thunk_for_node (orig_to, n, args_to_skip);

Re: [PATCH 1/5] New Identical Code Folding IPA pass

2014-06-17 Thread Jeff Law


On 06/13/14 04:24, mliska wrote:


> You may ask why GCC needs such a new optimization.  The
> compiler, simply having better knowledge of the compiled source file,
> is capable of reaching better results, especially if link-time
> optimization is enabled.  Apart from that, the GCC implementation adds
> support for read-only variables like construction vtables (mentioned
> in:
> http://hubicka.blogspot.cz/2014/02/devirtualization-in-c-part-3-building.html).

Can you outline at a high level cases where GCC's knowledge allows it to 
reach a better result?  Is it because you're not requiring bit for bit 
identical code, but that the code merely be semantically equivalent?


The GCC driven ICF seems to pick up 2X more opportunities than the gold 
driven ICF.  But if I'm reading everything correctly, that includes ICF 
of both functions and variables.


Do you have any sense of how those improvements break down?  I.e., is it 
mostly more functions you're finding as identical, and if so, what is it 
about the GCC implementation that allows us to find more ICF 
opportunities?  If it's mostly variables, that's fine too.  I'm just 
trying to understand where the improvements are coming from.
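
To make the bit-for-bit versus semantic-equivalence question concrete, here is a minimal sketch of the kind of candidates involved. The function names are invented for illustration. Whether the proposed pass merges any of them, and in particular whether it treats the third as equivalent to the first two, is exactly what is being asked above and is not answered by this example.

```c
/* Same body under two names: the classic identical-code-folding case.  */
int
scale_width (int w)
{
  return w * 2 + 1;
}

int
scale_height (int h)
{
  return h * 2 + 1;
}

/* Same result, but the source goes through a temporary.  A purely
   byte-level comparison of unoptimized object code might see a
   difference here, while a comparison after optimization might not.  */
int
scale_depth (int d)
{
  int t = d * 2;
  return t + 1;
}
```

All three return the same value for the same argument, so replacing one with an alias of another would be invisible to callers that only use the result.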


Jeff

Re: [PATCH 4/5] Existing tests fix

2014-06-17 Thread Jeff Law


On 06/13/14 04:48, mliska wrote:

Hi,
   many tests rely on a precise number of scanned functions in a dump file.  If 
IPA ICF decides to merge some functions and/or read-only variables, the counts do 
not match.

Martin

Changelog:

2014-06-13  Martin Liska  
Honza Hubicka  

* c-c++-common/rotate-1.c: Text
* c-c++-common/rotate-2.c: New test.
* c-c++-common/rotate-3.c: Likewise.
* c-c++-common/rotate-4.c: Likewise.
* g++.dg/cpp0x/rv-return.C: Likewise.
* g++.dg/cpp0x/rv1n.C: Likewise.
* g++.dg/cpp0x/rv1p.C: Likewise.
* g++.dg/cpp0x/rv2n.C: Likewise.
* g++.dg/cpp0x/rv3n.C: Likewise.
* g++.dg/cpp0x/rv4n.C: Likewise.
* g++.dg/cpp0x/rv5n.C: Likewise.
* g++.dg/cpp0x/rv6n.C: Likewise.
* g++.dg/cpp0x/rv7n.C: Likewise.
* gcc.dg/ipa/ipacost-1.c: Likewise.
* gcc.dg/ipa/ipacost-2.c: Likewise.
* gcc.dg/ipa/ipcp-agg-6.c: Likewise.
* gcc.dg/ipa/remref-2a.c: Likewise.
* gcc.dg/ipa/remref-2b.c: Likewise.
* gcc.dg/pr46309-2.c: Likewise.
* gcc.dg/torture/ipa-pta-1.c: Likewise.
* gcc.dg/tree-ssa/andor-3.c: Likewise.
* gcc.dg/tree-ssa/andor-4.c: Likewise.
* gcc.dg/tree-ssa/andor-5.c: Likewise.
* gcc.dg/vect/no-vfa-pr29145.c: Likewise.
* gcc.dg/vect/vect-cond-10.c: Likewise.
* gcc.dg/vect/vect-cond-9.c: Likewise.
* gcc.dg/vect/vect-widen-mult-const-s16.c: Likewise.
* gcc.dg/vect/vect-widen-mult-const-u16.c: Likewise.
* gcc.dg/vect/vect-widen-mult-half-u8.c: Likewise.
* gcc.target/i386/bmi-1.c: Likewise.
* gcc.target/i386/bmi-2.c: Likewise.
* gcc.target/i386/pr56564-2.c: Likewise.
* g++.dg/opt/pr30965.C: Likewise.
* g++.dg/tree-ssa/pr19637.C: Likewise.
* gcc.dg/guality/csttest.c: Likewise.
* gcc.dg/ipa/iinline-4.c: Likewise.
* gcc.dg/ipa/iinline-7.c: Likewise.
* gcc.dg/ipa/ipa-pta-13.c: Likewise.
I know this is the least interesting part of your changes, but it's also 
simple and mechanical and thus trivial to review.  Approved, but 
obviously don't install until the rest of your patch has been approved.


Similar changes for recently added tests or cases where you might 
improve ICF requiring similar tweaks to existing tests are pre-approved 
as well.


jeff

Re: [PATCH 5/5] New tests introduction

2014-06-17 Thread Jeff Law


On 06/13/14 05:16, mliska wrote:

Hi,
this is a new collection of tests for IPA ICF pass.

Martin

Changelog:

2014-06-13  Martin Liska  
Honza Hubicka  

* gcc/testsuite/g++.dg/ipa/ipa-se-1.C: New test.
* gcc/testsuite/g++.dg/ipa/ipa-se-2.C: Likewise.
* gcc/testsuite/g++.dg/ipa/ipa-se-3.C: Likewise.
* gcc/testsuite/g++.dg/ipa/ipa-se-4.C: Likewise.
* gcc/testsuite/g++.dg/ipa/ipa-se-5.C: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-1.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-10.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-11.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-12.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-13.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-14.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-15.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-16.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-17.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-18.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-19.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-2.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-20.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-21.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-22.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-23.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-24.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-25.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-26.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-27.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-28.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-3.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-4.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-5.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-6.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-7.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-8.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-9.c: Likewise.

Also approved, but please don't install until the entire kit is approved.

I'd like to applaud you and Jan for including a nice baseline of tests.

jeff

Re: [PATCH 2/5] Existing call graph infrastructure enhancement

2014-06-17 Thread Jeff Law


On 06/13/14 04:26, mliska wrote:

Hi,
 this small patch prepares remaining needed infrastructure for the new pass.

Changelog:

2014-06-13  Martin Liska  
Honza Hubicka  

* ipa-utils.h (polymorphic_type_binfo_p): Function marked external
instead of static.
* ipa-devirt.c (polymorphic_type_binfo_p): Likewise.
* ipa-prop.h (count_formal_params): Likewise.
* ipa-prop.c (count_formal_params): Likewise.
* ipa-utils.c (ipa_merge_profiles): Be more tolerant if we merge
profiles for semantically equivalent functions.
* passes.c (do_per_function): If we load body of a function during WPA,
this condition should behave same.
* varpool.c (ctor_for_folding): More tolerant assert for variable
aliases created during WPA.
Presumably we don't have any useful way to merge the cases where we have 
profiles for SRC & DST in ipa_merge_profiles or even to guess which is 
more useful when presented with both?  Does it make sense to log this 
into a debugging file when we drop one?


I think this patch is fine.  If adding logging makes sense, then feel 
free to do so and consider that trivial change pre-approved.


Jeff

Re: [PATCH 1/5] New Identical Code Folding IPA pass

2014-06-17 Thread Paolo Carlini


Hi,

On 13/06/14 12:24, mliska wrote:

   The optimization is inspired by Microsoft /OPT:ICF optimization 
(http://msdn.microsoft.com/en-us/library/bxwfs976.aspx) that merges COMDAT 
sections, with each function residing in a separate section.
In terms of C++ testcases, I'm wondering if you already double checked 
that the new pass already does well on the typical examples on which, I 
was told, the Microsoft optimization is known to do well, eg, code 
instantiating std::vector for different pointer types, or even long and 
long long on x86_64-linux, things like that.
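
A rough C analogue of the long / long long case mentioned above may help
make the question concrete.  This is only a sketch: the function names
are made up for illustration, and nothing here claims that any given ICF
implementation actually folds the two.  On an LP64 target such as
x86_64-linux both types are 64 bits wide, so the two bodies differ only
in their nominal types.

```c
#include <stddef.h>

/* Sums an array of long.  */
long sum_long(const long *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Same loop over long long.  On LP64 targets long and long long have
   the same size and representation, so this is the kind of pair a
   folding pass would be expected to look at.  */
long long sum_long_long(const long long *v, size_t n)
{
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Comparing the symbol addresses or sizes of the two functions in the
final binary would be one way to see whether they were merged.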


Thanks,
Paolo.

Re: [patch libatomic]: Add basic support for mingw targets

2014-06-17 Thread Jeff Law


On 06/17/14 13:31, Kai Tietz wrote:

2014-06-17 21:16 GMT+02:00 Jeff Law :

On 06/16/14 07:20, Kai Tietz wrote:


Hello,

this patch adds basic support for libatomic for mingw targets using
win32 and for mingw targets using posix threading model.

The win32 implementation might need a critical section for initialization
of mutexes.  If an issue occurs we can still add that.  For now
all testcases are passing for native and posix-threading model mingw
(32-bit and 64-bit).

ChangeLog

2014-06-16  Kai Tietz  

  * Makefile.am (libatomic_la_LDFLAGS): Add lt_host_flags.


Isn't this all target stuff, in which case lt_host_flags seems
inappropriate.  Or is this just poorly named?


Hmm, libatomic is built here for the new host (meaning it is a gcc-target
library).  So it might be named poorly.  Nevertheless see for details
ACX_LT_HOST_FLAGS in config/lthostflags.m4 and why it is required to
set -no-undefined and the proper bindir for cygwin/mingw.
Right, I'm aware that libatomic is a target library.  What I'm worried 
about is confusion due to using ACX_LT_HOST_FLAGS and possible pollution 
of flags originally meant for the host being used for the target library build.


Given that several other libraries use similar constraints to get 
lt_host_flags into LDFLAGS, I guess pollution isn't (or better stated 
hasn't) been an issue.


Approved.

Jeff

Re: [PATCH 1/5] New Identical Code Folding IPA pass

2014-06-17 Thread David Malcolm

On Fri, 2014-06-13 at 12:24 +0200, mliska wrote:
[...snip...]
>   Statistics about the pass:
>   Inkscape: 11.95 MB -> 11.44 MB (-4.27%)
>   Firefox: 70.12 MB -> 70.12 MB (-3.07%)

FWIW, you wrote 70.12 MB here for both before and after for Firefox, but
give a -3.07% change, which seems like a typo.

A 3.07% reduction from 70.12 MB would be 67.97 MB; was this what the
pass achieved?
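
For reference, the arithmetic behind that figure can be checked with a
couple of one-line helpers.  This is just a sketch; the function names
are invented here and are not part of the patch.

```c
/* Size left after shrinking `before` by `percent` percent.  */
double size_after_reduction(double before, double percent)
{
    return before * (1.0 - percent / 100.0);
}

/* Relative change from `before` to `after`, in percent
   (negative means the size went down).  */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With these, size_after_reduction (70.12, 3.07) comes out at roughly
67.97, and percent_change (11.95, 11.44) at roughly -4.27, which agrees
with the Inkscape line.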

[...snip...]

Thanks (nice patch, btw)
Dave

Re: [PATCH, Pointer Bounds Checker 28/x] IPA CP

2014-06-17 Thread Jeff Law


On 06/17/14 07:41, Martin Jambor wrote:

Hi,

On Wed, Jun 11, 2014 at 05:47:36PM +0400, Ilya Enkovich wrote:


Here is the fixed version.


I'm fine with the ipa-cp hunks but I cannot approve them, Honza is the
right person to ask.
I'll step in and say these bits are fine :-)  Thanks for the reviews 
Martin.


Ilya, please hold off installing until all the patches are approved. 
We're obviously trying to keep up with them as they come in.



jeff

Re: [PATCH][genattrtab] Fix memory corruption, allocate enough memory for all bypassed reservations

2014-06-17 Thread Jeff Law


On 06/17/14 02:12, Kyrill Tkachov wrote:


On 16/06/14 17:39, Jeff Law wrote:

On 06/16/14 04:12, Kyrill Tkachov wrote:


Doh, you're right. I did consider it but for some reason thought we
might want to iterate over all of the bypasses anyway. Breaking out
seems good.

How about this?
Tested on arm and aarch64 and confirmed with valgrind that no out of
bounds accesses occur.
I kicked off an x86_64 bootstrap but don't expect any problems.

Thanks,
Kyrill

genattrtab-bypasses.patch


commit 676b85f7a7cc1446482334dcaad457ac328875a8
Author: Kyrylo Tkachov
Date:   Fri Jun 13 11:09:57 2014 +0100

  [genattrtab] Fix memory corruption with bypasses

I'm an idiot.  n_bypassed is used to size the vector, so you do have to
walk the entire list.


AFAICS in the loop in process_bypasses we want to count all the
reservations which have a bypass matching them. Once a reservation is
matched with a bypass it should be safe to break out of the inner loop
(over the bypasses), even if two bypasses match a reservation we only
want to count the reservation once.

So I think the 2nd version of the patch is good
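
The counting argument above can be sketched in plain C.  The structs and
the strcmp-based match below are hypothetical stand-ins, not the real
genattrtab data structures or matching logic; the point is only that
breaking out of the inner loop counts each reservation at most once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the generator's data.  */
struct reservation { const char *name; };
struct bypass { const char *pattern; };

/* Count the reservations that are matched by at least one bypass.  */
size_t count_bypassed(const struct reservation *res, size_t n_res,
                      const struct bypass *byp, size_t n_byp)
{
    size_t n_bypassed = 0;

    for (size_t r = 0; r < n_res; r++)
        for (size_t b = 0; b < n_byp; b++)
            if (strcmp(res[r].name, byp[b].pattern) == 0) {
                n_bypassed++;
                /* One match is enough: leaving the inner loop here
                   keeps a reservation from being counted twice when
                   several bypasses match it.  */
                break;
            }

    return n_bypassed;
}
```

A count obtained this way is safe to use for sizing a vector with one
slot per bypassed reservation.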

OK.  Approved.

jeff

Re: [PING][PATCH, trunk, 4.9, 4.8] Fix PR57653, filename information discarded when using -imacros

2014-06-17 Thread Jeff Law


On 06/11/14 15:15, Peter Bergner wrote:

I'd like to ping the following patch that fixes PR57653.  This did
bootstrap and regtest with no regressions on powerpc64-linux.

 https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01571.html

Is this ok for trunk, 4.9 and 4.8?

Whee, fun.

So this led me to an interesting exchange between Per & DJ on some of 
Per's changes in this space.


Sadly, it doesn't look like Per checked in any tests for the problems DJ 
was running into.


I hate to ask, Peter, but can you add some testcases?  These messages 
have the originals which led to the unsightly code we have now.


https://gcc.gnu.org/ml/gcc-patches/2003-10/msg02694.html
https://gcc.gnu.org/ml/gcc-patches/2003-11/msg00163.html

I know 57653's problem is specific to the stdc-predef that's included in 
glibc-2.17 and later, but that's becoming relatively common at this 
point.  I think c#2 has the testcase.


Approved with the tests added.

Thanks and sorry for the delay.

Jeff

Re: [patch i386]: Combine memory and indirect jump

2014-06-17 Thread Kai Tietz

2014-06-17 21:26 GMT+02:00 Jeff Law :
> On 06/13/14 10:59, Kai Tietz wrote:
>>
>> 2014-06-13 17:58 GMT+02:00 Jeff Law :
>>>
>>> On 06/13/14 09:56, Richard Henderson wrote:


 On 06/13/2014 08:36 AM, Jeff Law wrote:
>
>
> So you may have answered this already, but why can't this be a combiner
> pattern?



 Until pass_duplicate_computed_gotos, we (intentionally) have a single
 indirect
 branch in the entire function.  This vastly reduces the size of the CFG.
>>>
>>>
>>> Ah, the factoring bits.  Should have known.
>>>
>>>

 Peep2 is currently running before d_c_g, so currently Kai can't solve
 this
 problem in peep2.

 I don't think peep2 should run after sched2, but I'll bet we can reorder
 things
 a bit so that d_c_g runs before peep2.
>>>
>>>
>>> Yea, seems worth a try.
>>>
>>> jeff
>>>
>>
>> Well, I tested putting the second peephole2 pass before the sched2 pass.
>> That works in general.  There are just some opportunities which weren't
>> caught then.  I attached a sample, which demonstrates that pretty
>> well.  I noticed that putting that pass behind reload blocks was
>> necessary for a better hit rate of the peephole optimization.
>
> So can you tell us why this sample code misses opportunities?  Otherwise we
> have to dig into it ourselves to tease out that information.
>
> I think we're zeroing in on a path to move d_c_g before peep2, but I'd like
> to have a clearer understanding of why we'd still be missing opportunities.
> If we can avoid running peep2 twice, that'd be good.
>
> jeff

Hi Jeff,

I just retested my testcase with recent source.  I can't reproduce
this missed optimization before the sched2 pass anymore.  I moved the
second peephole2 pass just before split_before_sched2 and everything got
caught.

Removing the first peephole2 pass seems to cause weaker code for
impossible pushes, etc.

Nevertheless, might it make sense to make this new peephole a
define_split instead?  I admit that this operation isn't a split, but
we would avoid a second peephole pass.

Kai

Re: [patch] improve sloc assignment on bind_expr entry/exit code

2014-06-17 Thread Jeff Law


On 06/11/14 09:02, Olivier Hainque wrote:

Hello,

For blocks requiring it, the gimplifier generates stack pointer
save/restore operations on entry/exit, per:

  gimplify_bind_expr (...)

   if (gimplify_ctxp->save_stack)
 {
   gimple stack_restore;

   /* Save stack on entry and restore it on exit.  Add a try_finally
 block to achieve this.  */
   build_stack_save_restore (&stack_save, &stack_restore);

   gimplify_seq_add_stmt (&cleanup, stack_restore);
 }

   /* Add clobbers for all variables that go out of scope.  */
   ...

There is no specific location assigned to these entry/exit statements
so they eventually inherit slocs coming from preceding statements.

This is problematic for tools relying on debug info to infer
which statements were executed out of execution traces (allowing
coverage analysis without code instrumentation).

An example of problematic scenario is provided below.

The attached patch is a proposal to improve this by propagating
start and end of block locations from the block structure to the
few gimple statements we generate. It adds an "end_locus" to the
block structure for this purpose, which the Ada front-end knows
how to fill already.

I verified that it does insert proper .loc directives before the
entry/exit code on the example. The patch also bootstraps and regtests
fine for languages=all,ada on x86_64-pc-linux-gnu.

OK to commit ?

Thanks in advance for your feedback,

With Kind Regards,

Olivier

--

2014-06-11  Olivier Hainque  

* tree-core.h (tree_block): Add an "end_locus" field, allowing
memorization of the end of block source location.
* tree.h (BLOCK_SOURCE_END_LOCATION): New accessor.
* gimplify.c (gimplify_bind_expr): Propagate the block start and
end source location info we have on the block entry/exit code we
generate.
OK.  I assume y'all will add a suitable test to the Ada testsuite and 
propagate it into the GCC testsuite in due course?


jeff

Re: [PATCH, rs6000] Remove XFAIL from default_format_denormal_2.f90 for PowerPC on Linux

2014-06-17 Thread Rainer Orth

"William J. Schmidt"  writes:

> Index: gcc/testsuite/gfortran.dg/default_format_denormal_2.f90
> ===
> --- gcc/testsuite/gfortran.dg/default_format_denormal_2.f90   (revision 
> 211741)
> +++ gcc/testsuite/gfortran.dg/default_format_denormal_2.f90   (working copy)
> @@ -1,5 +1,5 @@
>  ! { dg-require-effective-target fortran_large_real }
> -! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } }
> +! { dg-do run { xfail powerpc*-apple-darwin* } }
>  ! Test XFAILed on these platforms because the system's printf() lacks
>  ! proper support for denormalized long doubles. See PR24685

You should also update the comment: `these platforms' no longer applies.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH 4/5] Existing tests fix

2014-06-17 Thread Rainer Orth

Jeff Law  writes:

> On 06/13/14 04:48, mliska wrote:
>> Hi,
>>many tests rely on a precise number of scanned functions in a dump file. 
>> If IPA ICF decides to merge some functions and/or read-only variables, 
>> counts do not match.
>>
>> Martin
>>
>> Changelog:
>>
>> 2014-06-13  Martin Liska  
>>  Honza Hubicka  
>>
>>  * c-c++-common/rotate-1.c: Text

   ^ Huh?

>>  * c-c++-common/rotate-2.c: New test.
>>  * c-c++-common/rotate-3.c: Likewise.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH, Pointer Bounds Checker 29/x] Debug info

2014-06-17 Thread Jeff Law


On 06/11/14 02:50, Ilya Enkovich wrote:

Hi,

This patch skips all bounds during debug info generation.

Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-06-11  Ilya Enkovich  

* dbxout.c (dbxout_type): Ignore POINTER_BOUNDS_TYPE.
* dwarf2out.c (gen_subprogram_die): Ignore bound args.
(gen_type_die_with_usage): Skip pointer bounds.
(dwarf2out_global_decl): Likewise.
(is_base_type): Support POINTER_BOUNDS_TYPE.
(gen_formal_types_die): Skip pointer bounds.
(gen_decl_die): Likewise.
* var-tracking.c (vt_add_function_parameters): Skip
bounds parameters.
OK.  Note that sdbout might need updating as well.   It's used even less 
than dbxout, but if you can see how to skip bounds in there too, it'd be 
appreciated.


It looks like mingw/cygwin still use sdbout (?!?), so if you need 
something tested, you can ping Kai Tietz.


jeff

Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-06-17 Thread Jeff Law

On 06/12/14 08:34, Christian Bruel wrote:

On 06/11/2014 02:00 PM, Christian Bruel wrote:

>On 06/11/2014 06:17 AM, Joern Rennecke wrote:

Joern, is this new target macro interface OK with you ?

>>Yes, this interface should allow me to do switches between rounding
>>and truncating
>>floating-point modes with an add/subtract immediate.
>>
>>However, the implementation, as posted, doesn't work - it causes memory
>>corruption.
>>
>>It appears to work with the attached amendment patch.
>>

>Indeed,  thanks for pointing out the bad reusing of the aux field
>between multiple entities.
>
>In fact rereading this part of the implementation, I find the allocation
>of aux*n_entities awkward. A simpler setting in the entity loop to carry
>the mode directly into eg->aux is possible without array allocation
>(which also fixes a memory leak by the way).
>

Here is the revised version fixing the aforementioned issue found by
Joern on Epiphany. It also simplifies the allocation of the aux edges
field to carry the modes.

Now that everyone agrees on the interface, is this OK for trunk ?

bootstrapped/regtested for X86 and SH4a.

thanks,

Christian

toggle.patch

2014-06-12  Christian Bruel

* mode-switching.c (struct bb_info): Add mode_out, mode_in caches.
(make_preds_opaque): Delete.
(clear_mode_bit, mode_bit_p, set_mode_bit): New macros.
(commit_mode_sets): New function.
(optimize_mode_switching): Handle current_mode to mode_switching_emit.
Process all modes at once.
* basic-block.h (pre_edge_lcm_avs): Declare.
* lcm.c (pre_edge_lcm_avs): Renamed from pre_edge_lcm.
Call clear_aux_for_edges. Fix comments.
(pre_edge_lcm): New wrapper function to call pre_edge_lcm_avs.
(pre_edge_rev_lcm): Idem.
* config/epiphany/epiphany.c (emit_set_fp_mode): Add prev_mode 
parameter.
* config/epiphany/epiphany-protos.h (emit_set_fp_mode): Idem.
* config/epiphany/resolve-sw-modes.c (pass_resolve_sw_modes::execute): 
Idem.
* config/i386/i386.c (x96_emit_mode_set): Idem.
* config/sh/sh.c (sh_emit_mode_set): Likewise. Handle PR toggle.
* config/sh/sh.md (toggle_pr):  Defined if TARGET_FPU_SINGLE.
(fpscr_toggle) Disallow from delay slot.
* target.def (emit_mode_set): Add prev_mode parameter.
* doc/tm.texi: Regenerate.

2014-06-12  Christian Bruel

* gcc.target/sh/fpchg.c: New test.

This is fine for the trunk.

Thanks for your patience,
Jeff

Re: Bug 61407 - Build errors on latest OS X 10.10 Yosemite with Xcode 6 on GCC 4.8.3

2014-06-17 Thread Mike Stump

On Jun 17, 2014, at 4:09 AM, Илья Михальцов  wrote:
> This patch fixes gcc build problems on the latest OS X 10.10 SDK beta (see 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61407)

fix include hack to add:

> +#ifndef __has_feature
> +#define __has_feature(x) 0
> +#endif

So, I’d like to bring this up in the larger context of autoconf, portable code, 
and what style we’d like people to write code in.

From a darwin .h file in /usr/include:

#if defined(__has_feature) && defined(__has_attribute)
#if __has_attribute(deprecated)
#define DEPRECATED_ATTRIBUTE        __attribute__((deprecated))
#if __has_feature(attribute_deprecated_with_message)
#define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated(s)))
#else
#define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated))
#endif
#else
#define DEPRECATED_ATTRIBUTE
#define DEPRECATED_MSG_ATTRIBUTE(s)
#endif
#elif defined(__GNUC__) && ((__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 1)))
#define DEPRECATED_ATTRIBUTE        __attribute__((deprecated))
#if (__GNUC__ >= 5) || ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 5))
#define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated(s)))
#else
#define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated))
#endif
#else

I think this serves as a great introduction to the feature and what it is, why 
it exists and what it attempts to do.  In short, it gives code writers an ability 
to smell a port via #if and/or if (), and write portable code without using 
autoconf.  Yes, for some truly hard problems, this scheme breaks down, but if 
gcc and other vendor compilers follow the scheme and define these as 
appropriate, then users can make use of this scheme instead of autoconf.  It 
was code like #if defined(__GNUC__) that caused clang to lie and say it is 
gnuc.  It does this because the code doesn’t use a fine-grained check for the 
feature, but rather a coarse-grained check on __GNUC__, which is wrong, as other 
compilers that are not gcc implement __attribute__ and 
__attribute__((deprecated)).

http://clang.llvm.org/docs/LanguageExtensions.html has the names for the things 
that clang defines.  In gcc, we could elect to use the same names and define 
them as appropriate for gcc.  I think if gcc did this, then the quoted fix 
isn’t necessary.  Also, if gcc doesn’t want to do this, it is reasonable for 
the darwin port to so define features, they tend to be large scale and slow 
moving and monotonic in nature, so the maintenance of them should be low in 
general.
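
A minimal sketch of what the fine-grained style looks like from the
user's side, assuming only the probe macros discussed above.
MY_DEPRECATED, old_entry_point and the feature name being probed are
made up for illustration.  The fallback definitions mirror the quoted
include hack, so that compilers without the probes simply see 0.

```c
/* Fallbacks for compilers that do not provide the probe macros.  */
#ifndef __has_feature
#define __has_feature(x) 0
#endif
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Ask about the attribute itself rather than about __GNUC__.
   MY_DEPRECATED is a hypothetical name used only in this sketch.  */
#if __has_attribute(deprecated)
#define MY_DEPRECATED __attribute__((deprecated))
#else
#define MY_DEPRECATED
#endif

MY_DEPRECATED int old_entry_point(void);

int old_entry_point(void)
{
    return 42;
}

/* Probe for a feature name that no compiler is expected to define;
   with or without the fallback this evaluates to 0.  */
int has_made_up_feature(void)
{
    return __has_feature(made_up_feature_for_this_sketch);
}
```

The same header then works unchanged on a compiler that knows the
probes and on one that only gets the fallbacks, without a configure
step.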

What do people think?

Re: [PATCH, loop2_invariant] Pre-check invariants

2014-06-17 Thread Jeff Law


On 06/11/14 03:35, Zhenqiang Chen wrote:


Thanks for the comments. df_live seems redundant.

With flag_ira_loop_pressure, the pass will call df_analyze () at the
beginning, which can make sure all the DF info are correct.

Can we guarantee all DF_... correct without df_analyze ()?

They should be fine in this context.



+/* Pre-check candidate DEST to skip the one which can not make a valid insn
+   during move_invariant_reg.  SIMPlE is to skip HARD_REGISTER.  */

s/SIMPlE/SIMPLE/



+ {
+   /* Multi definitions at this stage, most likely are due to
+  instruction constrain, which requires both read and write

s/constrain/constraints/

Though that doesn't make sense.  Constraints don't come into play until 
much later in the pipeline.   Certainly there's been code in the 
expanders and elsewhere to try and make the code we generate more 
acceptable to 2-address targets and that's probably what you're really 
running into.   I think the code is fine, but that you need to improve 
the comment.


ISTM that if your primary focus is to filter out read/write operands, 
then just say that and ignore the constraints or other mechanisms by 
which we got a read/write pseudo.


So I think with those two small comment changes, this patch is OK for 
the trunk.  Please post the final version for archival purposes before 
checking it in.


Thanks,
Jeff

Re: [PATCH, i386, Pointer Bounds Checker 17/x] Pointer bounds constants support

2014-06-17 Thread Jeff Law


On 06/06/14 03:11, Ilya Enkovich wrote:

2014-06-04 10:58 GMT+04:00 Jeff Law :

On 06/02/14 04:25, Ilya Enkovich wrote:


Hi,

This patch adds support for pointer bounds constants to be used as
DECL_INITIAL for constant bounds (like zero bounds).

Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-05-30  Ilya Enkovich  

 * emit-rtl.c (immed_double_const): Support MODE_POINTER_BOUNDS.
 (init_emit_once): Build pointer bounds zero constants.
 * explow.c (trunc_int_for_mode): Likewise.
 * varpool.c (ctor_for_folding): Do not fold constant
 bounds vars.
 * varasm.c (output_constant_pool_2): Support MODE_POINTER_BOUNDS.
 * config/i386/i386.c (ix86_legitimate_constant_p): Mark
 bounds constant as not valid.



[ ... ]




@@ -5875,6 +5876,11 @@ init_emit_once (void)
 if (STORE_FLAG_VALUE == 1)
   const_tiny_rtx[1][(int) BImode] = const1_rtx;

+  for (mode = GET_CLASS_NARROWEST_MODE (MODE_POINTER_BOUNDS);
+   mode != VOIDmode;
+   mode = GET_MODE_WIDER_MODE (mode))
+const_tiny_rtx[0][mode] = immed_double_const (0, 0, mode);


I'm pretty sure GET_CLASS_NARROWEST_MODE should be taking a class, not a
mode as its argument.  So something is clearly wrong here...


MODE_POINTER_BOUNDS is a class. Modes in this class are BND32mode and BND64mode.

Bah.  You're right.  Approved.

jeff

Re: [PATCH, loop2_invariant, 1/2] Check only one register class

2014-06-17 Thread Jeff Law


On 06/11/14 04:05, Zhenqiang Chen wrote:

On 10 June 2014 19:06, Steven Bosscher  wrote:

On Tue, Jun 10, 2014 at 11:22 AM, Zhenqiang Chen wrote:

Hi,

For loop2-invariant pass, when flag_ira_loop_pressure is enabled,
function gain_for_invariant checks the pressures of all register
classes. This does not make sense since one invariant might impact
only one register class.

The patch enhances functions get_inv_cost and gain_for_invariant to
check only the register pressure of the invariant if possible.


This patch may work for targets with more-or-less orthogonal reg
classes, but not if there is a lot of overlap between reg classes.


Yes. I need to check the overlap between reg classes.

Patch is updated to check all overlapping reg classes by reg_classes_intersect_p:

Just so I'm sure I know what you're trying to do.

You want to map the pseudo back to its likely class(es) then look at how 
those classes (and only those classes) would be impacted from a register 
pressure standpoint if the pseudo was hoisted as an invariant?


This is primarily achieved by returning the class of the invariant, then 
filtering out any non-intersecting classes in gain_for_invariant, right?
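
To make sure we are talking about the same thing, here is a toy model
of that filtering.  It is a sketch under loud assumptions: register
classes are modelled as plain bitmasks of hard registers, and
classes_intersect and hoisting_overflows are invented names, not the
functions in the patch or the real reg_classes_intersect_p.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: a register class is a bitmask of hard registers.  */
typedef uint32_t reg_class_mask;

static int classes_intersect(reg_class_mask a, reg_class_mask b)
{
    return (a & b) != 0;
}

/* Return nonzero if hoisting an invariant of class INV_CLASS, needing
   EXTRA_REGS registers, would push any class that intersects it past
   its limit.  Classes that do not intersect INV_CLASS are skipped.  */
int hoisting_overflows(reg_class_mask inv_class, int extra_regs,
                       const reg_class_mask *classes,
                       const int *pressure, const int *available,
                       size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!classes_intersect(inv_class, classes[i]))
            continue;   /* unrelated class: its pressure is ignored */
        if (pressure[i] + extra_regs > available[i])
            return 1;
    }
    return 0;
}
```

In this model a saturated floating-point class no longer vetoes
hoisting an integer invariant, which is how I read the intent.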


jeff

Re: [PATCH] Fortran OpenMP 4.0 target support

2014-06-17 Thread Tobias Burnus


Jakub Jelinek wrote:

This patch adds the target directives.
Tested both normally plus with target.c/splay-tree.c from
gomp-4_0-branch@203409 plus the attached patch against
target.c to implement the new to_pset map kind (5) and
allow handling of NULL.  That patch will need to be forward
ported to whatever gomp-4_0-branch now has after this is merged
from trunk to that branch.

Does this look reasonable to Fortran maintainers?


Thanks for the patch! I browsed through the patch, and it looked good to 
me. (However, given that the patch has "48 files changed, 3342 
insertions(+), 330 deletions(-)", I didn't check every line.)


If I did the bookkeeping correctly, a patch for an alignment test case 
is still missing. As are the changes for some corner cases for which the 
OpenMP ARB has to provide some feedback. Any news from that side? 
Otherwise and aside from 4.9.1 backporting, it now looks pretty complete.


Tobias

Re: [PATCH] Fortran OpenMP 4.0 target support

2014-06-17 Thread Jakub Jelinek

On Tue, Jun 17, 2014 at 11:59:22PM +0200, Tobias Burnus wrote:
> >This patch adds the target directives.
> >Tested both normally plus with target.c/splay-tree.c from
> >gomp-4_0-branch@203409 plus the attached patch against
> >target.c to implement the new to_pset map kind (5) and
> >allow handling of NULL.  That patch will need to be forward
> >ported to whatever gomp-4_0-branch now has after this is merged
> >from trunk to that branch.
> >
> >Does this look reasonable to Fortran maintainers?
> 
> Thanks for the patch! I browsed through the patch, and it looked good to me.
> (However, given that the patch has "48 files changed, 3342 insertions(+),
> 330 deletions(-)", I didn't check every line.)
> 
> If I did the bookkeeping correctly, a patch for an alignment test case is
> still missing. As are the changes for some corner cases for which the OpenMP
> ARB has to provide some feedback. Any news from that side? Otherwise and
> aside from 4.9.1 backporting, it now looks pretty complete.

I think some work is needed in tree-nested.c, ideally write a testcase
that tests all the new OpenMP 4.0 clauses in contained functions with and
without non-local decls (and with local decls used by contained functions).

One of the omp-lang answers shows some work is needed on the UDRs too, in
particular that the combiner/initializer should not be resolved as part of
the UDR directive, but only when used in a reduction clause where not only
the typespec, but also rank/shape, pointer/allocatable etc. are known.

Some further restriction checking is probably needed + backing that with
testcases.  And wait for further omp-lang/omp-f2003 feedback.

Jakub

Re: [Patch, microblaze]: Added load and store reverse patterns

2014-06-17 Thread Michael Eager


On 02/10/14 17:55, Michael Eager wrote:

On 11/25/13 23:54, David Holsgrove wrote:

Added the lwr/swr instructions pattern.
lwr and swr instructions will load/store the data with opposite endianness.

Changelog

2013-11-26  Nagaraju Mekala 

  * gcc/config/microblaze/microblaze.md: Add movsi4_rev insn pattern.
  * gcc/config/microblaze/predicates.md: Add reg_or_mem_operand predicate.



GCC-head: Committed revision 207683.
GCC-4.8-branch: Committed revision 207684.


Reverted GCC-4.8-branch commit.
Committed revision 211750.

--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

Re: [PATCH, ARM] MI-thunk fix for TARGET_THUMB1_ONLY

2014-06-17 Thread Ramana Radhakrishnan

On Sun, Jun 8, 2014 at 12:27 PM, Chung-Lin Tang  wrote:
> Hi Richard, Ramana,
>
> Attached is a small fix for resolving a g++.old-deja/g++.jason/thunk2.C
> regression we found under a TARGET_THUMB1_ONLY multilib (-mthumb
> -march=armv6-m to be exact). Basically under those conditions, the thunk
> is in Thumb mode, so the subtraction should be 4 rather than 8.

Yep, this is OK with a minor change to the comment to make it more explicit.

>+  /* Output ".word .LTHUNKn-[37]-.LTHUNKPCn".  */

s/37/3,7/


Ok with that change and if no regressions.

OK for release branches unless the RMs object within 24 hours.

It would be nice to see if we could rewrite the mi thunk code like
other backends but that's the matter of a separate patch.

Ramana
>
> Original patch was by Julian, with trivial adaptations for trunk by me.
> We've been carrying this fix for a while by now. Okay for trunk? (and
> stable branches?)
>
> Thanks,
> Chung-Lin
>
> 2014-06-08  Julian Brown  
> Chung-Lin Tang  
>
> * config/arm/arm.c (arm_output_mi_thunk): Fix offset for
> TARGET_THUMB1_ONLY. Add comments.

Re: [PATCH, PR61219]: Fix sNaN handling in ARM float to double conversion

2014-06-17 Thread Ramana Radhakrishnan

On Sun, May 18, 2014 at 10:23 PM, Aurelien Jarno  wrote:
> On ARM soft-float, the float to double conversion doesn't convert a sNaN
> to qNaN as the IEEE Std 754 standard mandates:
>
> "Under default exception handling, any operation signaling an invalid
> operation exception and for which a floating-point result is to be
> delivered shall deliver a quiet NaN."
>
> Given the soft float ARM code ignores exceptions and always provides a
> result, a float to double conversion of a signaling NaN should return a
> quiet NaN. Fix this in extendsfdf2.
>
>
> 2014-05-18  Aurelien Jarno  
>
> PR target/61219
> * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.

Ok if no regressions along with a testcase to catch this case please
and fixing the PR number

Sorry about the slow review.

Ramana

>
>
> Index: libgcc/config/arm/ieee754-df.S
> ===
> --- libgcc/config/arm/ieee754-df.S  (revision 210588)
> +++ libgcc/config/arm/ieee754-df.S  (working copy)
> @@ -473,11 +473,15 @@
> eorne   xh, xh, #0x3800 @ fixup exponent otherwise.
> RETc(ne)@ and return it.
>
> -   teq r2, #0  @ if actually 0
> -   do_it   ne, e
> -   teqne   r3, #0xff00 @ or INF or NAN
> +   bics    r2, r2, #0xff00 @ isolate mantissa
> +   do_it   eq  @ if 0, that is ZERO or INF,
> RETc(eq)@ we are done already.
>
> +   teq r3, #0xff00 @ check for NAN
> +   do_it   eq, t
> +   orreq   xh, xh, #0x0008 @ change to quiet NAN
> +   RETc(eq)@ and return it.
> +
> @ value was denormalized.  We can normalize it now.
> do_push {r4, r5, lr}
> mov r4, #0x380  @ setup corresponding exponent
>
> --
> Aurelien Jarno  GPG: 4096R/1DDD8C9B
> aurel...@aurel32.net http://www.aurel32.net
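
For readers who do not speak ARM assembly, the sNaN-to-qNaN rule from
the description above can be sketched at the bit level in C.  This is
not the libgcc code; is_snan32 and extend_nan_bits are names invented
here, and the sketch assumes the usual binary32/binary64 encoding in
which the most significant mantissa bit marks a quiet NaN.

```c
#include <stdint.h>

/* Nonzero if the binary32 bit pattern F is a signaling NaN: exponent
   all ones, non-zero mantissa, quiet bit (mantissa bit 22) clear.  */
int is_snan32(uint32_t f)
{
    uint32_t exp = (f >> 23) & 0xFFu;
    uint32_t man = f & 0x7FFFFFu;
    return exp == 0xFFu && man != 0 && (man & 0x400000u) == 0;
}

/* Widen a binary32 NaN bit pattern to binary64, keeping the sign and
   payload and forcing the quiet bit (bit 51).  Only meaningful when F
   is a NaN; other inputs are not handled in this sketch.  */
uint64_t extend_nan_bits(uint32_t f)
{
    uint64_t sign = (uint64_t)(f >> 31) << 63;
    uint64_t man  = (uint64_t)(f & 0x7FFFFFu) << 29;
    return sign | (UINT64_C(0x7FF) << 52) | man | (UINT64_C(1) << 51);
}
```

A testcase along these lines would feed a signaling pattern such as
0x7FA00000 through the conversion and check that bit 51 of the result
is set.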

Re: [RFC ARM] Error if overriding --with-tune by --with-cpu

2014-06-17 Thread Ramana Radhakrishnan

On Fri, May 30, 2014 at 5:34 PM, James Greenhalgh
 wrote:
>
> Hi,
>
> We error in the case where both --with-tune and --with-cpu are specified at
> configure time. In this case, we cannot distinguish this situation from the
> situation where --with-tune was specified at configure time and -mcpu was
> passed on the command line, so we give -mcpu precedence.
>
> This might be surprising if you expect the precedence rules we give
> to the command line options, but we can't change this precedence without
> breaking our definition of -mcpu.
>
> We also promote the warning which used to be thrown in the case of
> --with-arch and --with-cpu to an error.

Ok by me - Especially as Bin has just run into it as part of his
testing. Obviously no one watches these warnings and they don't
realize what's happening under their feet.

>
> I've marked this as an RFC as it isn't clear that configure should be
> catching something like this. Other blatant errors in configuration
> options like passing "--with-languages=c,c++" pass without event.

Well, yes, that looks OK.

>
> Tested with a few combinations of configure options with no issues and the
> expected behaviour.
>
> Any opinions, and if not, OK for trunk?


I am going to give this a week for anyone else to pitch in and object
- otherwise please apply it and document this change in behaviour in
the caveats section for the next release (changes.html).

Ramana


>
> Thanks,
> James
>
> ---
> gcc/
>
> 2014-05-30  James Greenhalgh  
>
> * config.gcc (supported_defaults): Error when passing either
> --with-tune or --with-arch in conjunction with --with-cpu for ARM.

[PATCH, rs6000] Fix PR61542 - V4SF vector extract for little endian

2014-06-17 Thread Bill Schmidt

Hi,

As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61542, a
new test case (gcc.dg/vect/vect-nop-move.c) was added in 4.9.  This
exposes a bug on PowerPC little endian for extracting an element from a
V4SF value that goes back to 4.8.  The following patch fixes the
problem.

Tested on powerpc64le-unknown-linux-gnu with no regressions.  Ok to
commit to trunk?  I would also like to commit to 4.8 and 4.9 as soon as
possible to be picked up by the distros.

I would also like to backport gcc.dg/vect/vect-nop-move.c to 4.8 to
provide regression coverage.

Thanks,
Bill


2014-06-17  Bill Schmidt  

* config/rs6000/vsx.md (vsx_extract_v4sf): Fix bug with element
extraction other than index 3.


Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 211741)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -1667,7 +1667,7 @@
 {
   if (GET_CODE (op3) == SCRATCH)
op3 = gen_reg_rtx (V4SFmode);
-  emit_insn (gen_vsx_xxsldwi_v4sf (op3, op1, op1, op2));
+  emit_insn (gen_vsx_xxsldwi_v4sf (op3, op1, op1, GEN_INT (ele)));
   tmp = op3;
 }
   emit_insn (gen_vsx_xscvspdp_scalar2 (op0, tmp));

Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb

2014-06-17 Thread Ramana Radhakrishnan

On Fri, May 30, 2014 at 12:19 AM, Maciej W. Rozycki
 wrote:
> On Wed, 28 May 2014, Richard Earnshaw wrote:
>
>> Ah, light dawns (maybe).
>>
>> I guess the problems stem from the attempts to combine Neon with ARMv5.
>>  Neon shouldn't be used with anything prior to ARMv7, since that's the
>> earliest version of the architecture that can support it.
>
>  Good to know, thanks for the hint.  Anyway it's the test case doing
> something silly or maybe just odd.  After all IIUC ARMv5 code will run
> just fine on ARMv7/NEON hardware so mixing up ARMv5 scalar code with NEON
> vector code is nothing wrong per se.
>
>> I guess that what is happening is that we see we have Neon, so start to
>> generate a Neon-based copy sequence, but then notice that we don't have
>> misaligned access (something that must exist if we have Neon) and
>> generate VLDR instructions in a mistaken attempt to work around the
>> first inconsistency.
>>
>> Maybe we should tie -mfpu=neon to having at least ARMv7 (though ARMv6
>> also has misaligned access support).
>
>  So to move away from the odd mixture of instruction selection options
> just as a quick test I rebuilt the same file with `-march=armv7-a
> -mno-unaligned-access' and the result is the same, a pair of VLDR
> instructions accessing unaligned memory, i.e. the same problem.
>
>  So based on observations made so far I think there are two sensible
> ways to move forward:
>
> 1. Fix GCC so that a manual byte-wise copy is made whenever
>   `-mno-unaligned-access' is in effect.

#1 is the preferable option.


>
> 2. Revert the change being discussed here as its lone purpose was to
>disable the use of VLD1.8, etc. where `-mno-unaligned-access' is in
>effect, and it does no good.

Reverting this means pr58041 will fail on armv7-a / neon
configurations, which is what this patch was designed to "fix". So
it's not an option, is it?

Ramana

>
>   Maciej
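
For illustration only, option 1 above corresponds to what the following C
does by hand: every access is a single byte, so no load or store can be
misaligned regardless of where the operands sit. The function name is
hypothetical and this is not code from GCC or from the testcase.

```c
#include <stddef.h>

/* Copy N bytes one at a time; no access is wider than a byte, so the
   alignment of DST and SRC does not matter.  */
static void
copy_bytewise (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  while (n--)
    *d++ = *s++;
}
```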

Re: [PATCH] [ARM] [RFC] Fix longstanding push_minipool_fix ICE (PR49423, lp1296601)

2014-06-17 Thread Ramana Radhakrishnan

On Wed, Apr 2, 2014 at 2:29 PM, Charles Baylis
 wrote:
> Hi
>
> This patch fixes the push_minipool_fix ICE, which occurs when the ARM
> backend encounters a zero/sign extending load from a constant pool.
>
> I don't have a current test case for trunk, lp1296601 has a test case
> which affects the linaro-4.8 branch. As far as I know, there has been
> no fix for this on trunk.
>
> The approach taken in this patch is to extend each pattern where this
> can occur,  so that it triggers a define_split to synthesise a
> constant move instead. Some but not all extend patterns have
> previously added pool_range attributes to work-around this problem,
> this patch removes those, and also fixes the remaining patterns. Some
> patterns have slightly more complex workarounds, which I have not yet
> analysed, but it seems worth posting the patch at this stage to get
> feedback on the general approach.
>
> Tested on arm-unknown-linux-gnueabihf (qemu), bootstrap in progress.
>
> If this looks good, I'll clean it up for a more detailed review.

Interesting workaround, but can we investigate further how to fix this
at the source rather than working around it in the backend in this form?
It's still a kludge that we would carry in the backend instead of fixing
the problem at its source.


Ramana

>
> Thanks
> Charles

C++ PATCH for c++/60605 (local function and default template arg)

2014-06-17 Thread Jason Merrill

The exception for local declarations in check_default_tmpl_args needs to 
handle DECL_LOCAL_FUNCTION_P, too.


Tested x86_64-pc-linux-gnu, applying to 4.8, 4.9, trunk.
commit 424c657e1213126dc5d2a7231abac05e16713286
Author: Jason Merrill 
Date:   Tue Jun 17 18:43:57 2014 +0200

	PR c++/60605
	* pt.c (check_default_tmpl_args): Check DECL_LOCAL_FUNCTION_P.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 85b46fe..a4e1a59 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -4308,7 +4308,8 @@ check_default_tmpl_args (tree decl, tree parms, bool is_primary,
  in the template-parameter-list of the definition of a member of a
  class template.  */
 
-  if (TREE_CODE (CP_DECL_CONTEXT (decl)) == FUNCTION_DECL)
+  if (TREE_CODE (CP_DECL_CONTEXT (decl)) == FUNCTION_DECL
+  || (TREE_CODE (decl) == FUNCTION_DECL && DECL_LOCAL_FUNCTION_P (decl)))
 /* You can't have a function template declaration in a local
scope, nor you can you define a member of a class template in a
local scope.  */
diff --git a/gcc/testsuite/g++.dg/template/local-fn1.C b/gcc/testsuite/g++.dg/template/local-fn1.C
new file mode 100644
index 000..88acd17
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/local-fn1.C
@@ -0,0 +1,8 @@
+// PR c++/60605
+
+template 
+struct Foo {
+void bar() {
+void bug();
+}
+};

[PATCH] PR61517: fix stmt replacement in bswap pass

2014-06-17 Thread Thomas Preud'homme

Hi everybody,

Thanks to a comment from Richard Biener, the bswap pass takes care not to 
perform its optimization if memory is modified between the loads of the original 
expression. However, when it replaces these statements by a single load, it 
does so in the gimple statement that computes the final bitwise OR of the 
original expression, and memory could be modified between the last load 
statement and this bitwise OR statement. The result is then a read of memory 
*after* it was changed instead of before.

This patch takes care to move the statement to be replaced close to one of the 
original loads, thus avoiding this problem.
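
As a rough illustration of the problem (a model written for this message,
not code from the pass), the two functions below differ only in where the
32-bit read happens relative to the store. The function names are
hypothetical; the byte values match those used in bswap-2.c.

```c
#include <stdint.h>

/* Assemble a little-endian 32-bit value from four bytes.  */
static uint32_t
load_le32 (const unsigned char *x)
{
  return (uint32_t) x[0] | (uint32_t) x[1] << 8
         | (uint32_t) x[2] << 16 | (uint32_t) x[3] << 24;
}

/* Source semantics: all four bytes are read before *y is written.  */
static uint32_t
read_then_store (unsigned char *x, unsigned char *y)
{
  uint32_t v = load_le32 (x);
  *y = 1;
  return v;
}

/* Model of the buggy replacement: the single load is emitted at the
   position of the final OR, i.e. after the store.  */
static uint32_t
store_then_read (unsigned char *x, unsigned char *y)
{
  *y = 1;
  return load_le32 (x);
}
```

When y points into the four bytes at x, the first function returns the
original value and the second returns the clobbered one.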

ChangeLog entries for this fix are:

*** gcc/ChangeLog ***

2014-06-16  Thomas Preud'homme  

* tree-ssa-math-opts.c (find_bswap_or_nop_1): Adapt to return a stmt
whose rhs's first tree is the source expression instead of the
expression itself.
(find_bswap_or_nop): Likewise.
(bswap_replace): Rename stmt to cur_stmt. Pass gsi by value and src as a
gimple stmt whose rhs's first tree is the source. In the memory source
case, move the stmt to be replaced close to one of the original loads to
avoid the problem of a store between the load and the stmt's original
location.
(pass_optimize_bswap::execute): Adapt to change in bswap_replace's
signature.

*** gcc/testsuite/ChangeLog ***

2014-06-16  Thomas Preud'homme  

* gcc.c-torture/execute/bswap-2.c (incorrect_read_le32): New.
(incorrect_read_be32): Likewise.
(main): Call incorrect_read_* to test stmt replacement is made by
bswap at the right place.
* gcc.c-torture/execute/pr61517.c: New test.

Patch also attached for convenience. Is it ok for trunk?

diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c 
b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
index a47e01a..88132fe 100644
--- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
+++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
@@ -66,6 +66,32 @@ fake_read_be32 (char *x, char *y)
   return c3 | c2 << 8 | c1 << 16 | c0 << 24;
 }
 
+__attribute__ ((noinline, noclone)) uint32_t
+incorrect_read_le32 (char *x, char *y)
+{
+  unsigned char c0, c1, c2, c3;
+
+  c0 = x[0];
+  c1 = x[1];
+  c2 = x[2];
+  c3 = x[3];
+  *y = 1;
+  return c0 | c1 << 8 | c2 << 16 | c3 << 24;
+}
+
+__attribute__ ((noinline, noclone)) uint32_t
+incorrect_read_be32 (char *x, char *y)
+{
+  unsigned char c0, c1, c2, c3;
+
+  c0 = x[0];
+  c1 = x[1];
+  c2 = x[2];
+  c3 = x[3];
+  *y = 1;
+  return c3 | c2 << 8 | c1 << 16 | c0 << 24;
+}
+
 int
 main ()
 {
@@ -92,8 +118,17 @@ main ()
   out = fake_read_le32 (cin, &cin[2]);
   if (out != 0x89018583)
 __builtin_abort ();
+  cin[2] = 0x87;
   out = fake_read_be32 (cin, &cin[2]);
   if (out != 0x83850189)
 __builtin_abort ();
+  cin[2] = 0x87;
+  out = incorrect_read_le32 (cin, &cin[2]);
+  if (out != 0x89878583)
+__builtin_abort ();
+  cin[2] = 0x87;
+  out = incorrect_read_be32 (cin, &cin[2]);
+  if (out != 0x83858789)
+__builtin_abort ();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61517.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61517.c
new file mode 100644
index 000..fc9bbe8
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61517.c
@@ -0,0 +1,19 @@
+int a, b, *c = &a;
+unsigned short d;
+
+int
+main ()
+{
+  unsigned int e = a;
+  *c = 1;
+  if (!b)
+{
+  d = e;
+  *c = d | e;
+}
+
+  if (a != 0)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index c868e92..1ee2ba8 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1804,28 +1804,28 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct 
symbolic_number *n)
 
 /* find_bswap_or_nop_1 invokes itself recursively with N and tries to perform
the operation given by the rhs of STMT on the result.  If the operation
-   could successfully be executed the function returns the tree expression of
-   the source operand and NULL otherwise.  */
+   could successfully be executed the function returns a gimple stmt whose
+   rhs's first tree is the expression of the source operand and NULL
+   otherwise.  */
 
-static tree
+static gimple
 find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit)
 {
   enum tree_code code;
   tree rhs1, rhs2 = NULL;
-  gimple rhs1_stmt, rhs2_stmt;
-  tree source_expr1;
+  gimple rhs1_stmt, rhs2_stmt, source_stmt1;
   enum gimple_rhs_class rhs_class;
 
   if (!limit || !is_gimple_assign (stmt))
-return NULL_TREE;
+return NULL;
 
   rhs1 = gimple_assign_rhs1 (stmt);
 
   if (find_bswap_or_nop_load (stmt, rhs1, n))
-return rhs1;
+return stmt;
 
   if (TREE_CODE (rhs1) != SSA_NAME)
-return NULL_TREE;
+return NULL;
 
   code = gimple_assign_rhs_code (stmt);
   rhs_class = gimple_assign_rhs_class (stmt);
@@ -1848,18 +1848,18 @@ find_bswap_or_nop_1 (gimple stmt, struct 
symbolic

Re: [PATCH, loop2_invariant, 1/2] Check only one register class

2014-06-17 Thread Zhenqiang Chen

On 18 June 2014 05:49, Jeff Law  wrote:
> On 06/11/14 04:05, Zhenqiang Chen wrote:
>>
>> On 10 June 2014 19:06, Steven Bosscher  wrote:
>>>
>>> On Tue, Jun 10, 2014 at 11:22 AM, Zhenqiang Chen wrote:

>>>> Hi,
>>>>
>>>> For loop2-invariant pass, when flag_ira_loop_pressure is enabled,
>>>> function gain_for_invariant checks the pressures of all register
>>>> classes. This does not make sense since one invariant might impact
>>>> only one register class.
>>>>
>>>> The patch enhances functions get_inv_cost and gain_for_invariant to
>>>> check only the register pressure of the invariant if possible.
>>>
>>>
>>> This patch may work for targets with more-or-less orthogonal reg
>>> classes, but not if there is a lot of overlap between reg classes.
>>
>>
>> Yes. I need to check the overlap between reg classes.
>>
>> Patch is updated to check all overlapping reg classes by
>> reg_classes_intersect_p:
>
> Just so I'm sure I know what you're trying to do.
>
> You want to map the pseudo back to its likely class(es) then look at how
> those classes (and only those classes) would be impacted from a register
> pressure standpoint if the pseudo was hoisted as an invariant?

Yes.

> This is primarily achieved by returning the class of the invariant, then
> filtering out any non-intersecting classes in gain_for_invariant, right?

Yes. This is what I want to do, since I found that some invariants whose
register class is NO_REGS (memory write) or SSE_REGS are blocked by
GENERAL_REGS' register pressure.

Thanks!
-Zhenqiang
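
A toy model of the filtering described above, as a sketch only. Register
classes are represented as bitmasks of hard registers, and every name
(struct toy_class, toy_classes_intersect, toy_hoist_exceeds_pressure) is
hypothetical; the real code uses reg_classes_intersect_p and the IRA
pressure data.

```c
#include <stdbool.h>

/* Toy register class: a bitmask of hard registers plus the current
   pressure and the number of registers available in the class.  */
struct toy_class
{
  unsigned regs;
  int pressure;
  int available;
};

static bool
toy_classes_intersect (unsigned a, unsigned b)
{
  return (a & b) != 0;
}

/* Return true if hoisting an invariant that needs NEEDED registers of
   class INV_REGS would exceed the registers available in any class that
   overlaps INV_REGS.  Classes that do not overlap are skipped.  */
static bool
toy_hoist_exceeds_pressure (const struct toy_class *classes, int n,
                            unsigned inv_regs, int needed)
{
  for (int i = 0; i < n; i++)
    {
      if (!toy_classes_intersect (classes[i].regs, inv_regs))
        continue;
      if (classes[i].pressure + needed > classes[i].available)
        return true;
    }
  return false;
}
```

With a saturated "general" class and a mostly free "SSE" class, an
invariant that lives only in the SSE class (or in no register class at
all, mask 0) is no longer rejected because of the general class.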

RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap

2014-06-17 Thread Thomas Preud'homme

> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Wednesday, June 11, 2014 4:32 PM
> >
> >
> > Is this OK for trunk? Does this bug qualify for a backport patch to
> > 4.8 and 4.9 branches?
> 
> This is ok for trunk and also for backporting (after a short while to
> see if there is any fallout).

Below is the backported patch for 4.8/4.9. Is this ok for both 4.8 and
4.9? If yes, how much more should I wait before committing?

Tested on both 4.8 and 4.9 without regression in the testsuite after
a bootstrap.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1e35bbe..0559b7f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@
+2014-06-12  Thomas Preud'homme  
+
+   PR tree-optimization/61306
+   * tree-ssa-math-opts.c (struct symbolic_number): Store type of
+   expression instead of its size.
+   (do_shift_rotate): Adapt to change in struct symbolic_number. Return
+   false to prevent optimization when the result is unpredictable due to
+   arithmetic right shift of signed type with highest byte is set.
+   (verify_symbolic_number_p): Adapt to change in struct symbolic_number.
+   (find_bswap_1): Likewise. Return NULL to prevent optimization when the
+   result is unpredictable due to sign extension.
+   (find_bswap): Adapt to change in struct symbolic_number.
+
 2014-06-12  Alan Modra  
 
PR target/61300
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 757cb74..139f23c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2014-06-12  Thomas Preud'homme  
+
+   * gcc.c-torture/execute/pr61306-1.c: New test.
+   * gcc.c-torture/execute/pr61306-2.c: Likewise.
+   * gcc.c-torture/execute/pr61306-3.c: Likewise.
+
 2014-06-11  Richard Biener  
 
PR tree-optimization/61452
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c
new file mode 100644
index 000..ebc90a3
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c
@@ -0,0 +1,39 @@
+#ifdef __INT32_TYPE__
+typedef __INT32_TYPE__ int32_t;
+#else
+typedef int int32_t;
+#endif
+
+#ifdef __UINT32_TYPE__
+typedef __UINT32_TYPE__ uint32_t;
+#else
+typedef unsigned uint32_t;
+#endif
+
+#define __fake_const_swab32(x) ((uint32_t)(                  \
+   (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |        \
+   (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |        \
+   (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |        \
+   (( (int32_t)(x) &  (int32_t)0xff000000UL) >> 24)))
+
+/* Previous version of bswap optimization failed to consider sign extension
+   and as a result would replace an expression *not* doing a bswap by a
+   bswap.  */
+
+__attribute__ ((noinline, noclone)) uint32_t
+fake_bswap32 (uint32_t in)
+{
+  return __fake_const_swab32 (in);
+}
+
+int
+main(void)
+{
+  if (sizeof (int32_t) * __CHAR_BIT__ != 32)
+    return 0;
+  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
+    return 0;
+  if (fake_bswap32 (0x87654321) != 0xffffff87)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c
new file mode 100644
index 000..886ecfd
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c
@@ -0,0 +1,40 @@
+#ifdef __INT16_TYPE__
+typedef __INT16_TYPE__ int16_t;
+#else
+typedef short int16_t;
+#endif
+
+#ifdef __UINT32_TYPE__
+typedef __UINT32_TYPE__ uint32_t;
+#else
+typedef unsigned uint32_t;
+#endif
+
+#define __fake_const_swab32(x) ((uint32_t)(                          \
+   (((uint32_t)         (x) & (uint32_t)0x000000ffUL) << 24) |       \
+   (((uint32_t)(int16_t)(x) & (uint32_t)0x00ffff00UL) <<  8) |       \
+   (((uint32_t)         (x) & (uint32_t)0x00ff0000UL) >>  8) |       \
+   (((uint32_t)         (x) & (uint32_t)0xff000000UL) >> 24)))
+
+
+/* Previous version of bswap optimization failed to consider sign extension
+   and as a result would replace an expression *not* doing a bswap by a
+   bswap.  */
+
+__attribute__ ((noinline, noclone)) uint32_t
+fake_bswap32 (uint32_t in)
+{
+  return __fake_const_swab32 (in);
+}
+
+int
+main(void)
+{
+  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
+    return 0;
+  if (sizeof (int16_t) * __CHAR_BIT__ != 16)
+    return 0;
+  if (fake_bswap32 (0x81828384) != 0xff838281)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c
new file mode 100644
index 000..6086e27
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c
@@ -0,0 +1,13 @@
+short a = -1;
+int b;
+char c;
+
+int
+main ()
+{
+  c = a;
+  b = a | c;
+  if (b != -1)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 9ff857c..2b656ae 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1620,7 +1620,7 @@ make_pass_cse_s

[PATCH, aarch64] Fix 61545

2014-06-17 Thread Richard Henderson

Trivial fix for missing clobber of the flags over the tlsdesc call.

Ok for all branches?


r~

* config/aarch64/aarch64.md (tlsdesc_small_): Clobber CC_REGNUM.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a4d8887..1ee2cae 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3855,6 +3855,7 @@
 (unspec:PTR [(match_operand 0 "aarch64_valid_symref" "S")]
   UNSPEC_TLSDESC))
(clobber (reg:DI LR_REGNUM))
+   (clobber (reg:CC CC_REGNUM))
(clobber (match_scratch:DI 1 "=r"))]
   "TARGET_TLS_DESC"
   "adrp\\tx0, %A0\;ldr\\t%1, [x0, #%L0]\;add\\t0, 0, 
%L0\;.tlsdesccall\\t%0\;blr\\t%1"

Re: [PATCH][PING] Fix for PR 61422

2014-06-17 Thread Yury Gribov

This has already been done in r211699. Does it work for you? Adding a test 
would still be useful.


-Y

[Patch, Fortran, committed] PR61126 – fix wextra_1.f regression

2014-06-17 Thread Tobias Burnus

Committed as Rev. 211766. See PR comments 10, 23 and 24 for the patch 
and the background.


Thanks to Manuel and Dominique for the patch!

Tobias
2014-06-18  Manuel López-Ibáñez  

	PR fortran/61126
	* options.c (gfc_handle_option): Remove call to
	handle_generated_option.

2014-06-18  Dominique d'Humieres 

	PR fortran/61126
	* gfortran.dg/wextra_1.f: Add -Wall to dg-options.

diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index a2b91ca..e4931f0 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -674,12 +674,7 @@ gfc_handle_option (size_t scode, const char *arg, int value,
   break;
 
 case OPT_Wextra:
-  handle_generated_option (&global_options, &global_options_set,
-			   OPT_Wunused_parameter, NULL, value,
-			   gfc_option_lang_mask (), kind, loc,
-			   handlers, global_dc);
   set_Wextra (value);
-
   break;
 
 case OPT_Wfunction_elimination:
diff --git a/gcc/testsuite/gfortran.dg/wextra_1.f b/gcc/testsuite/gfortran.dg/wextra_1.f
index 94c8edd..0eb28e1 100644
--- a/gcc/testsuite/gfortran.dg/wextra_1.f
+++ b/gcc/testsuite/gfortran.dg/wextra_1.f
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-Wextra" }
+! { dg-options "-Wall -Wextra" }
   program main
   integer, parameter :: x=3 ! { dg-warning "Unused parameter" }
   real :: a

Re: [Patch, Fortran, committed] PR61126 – fix wextra_1.f regression

2014-06-17 Thread Tobias Burnus


Tobias Burnus wrote:
> Committed as Rev. 211766. See PR comments 10, 23 and 24 for the patch 
> and the background. Thanks to Manuel and Dominique for the patch!


As a follow-up, I have committed the attached documentation patch. I 
think it is sufficient, even though it does not explicitly state that 
-Wall only works because -Wall implies -Wunused.


Committed as Rev. 211767.

Tobias
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 211766)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,9 @@
+2014-06-18  Tobias Burnus  
+
+	PR fortran/61126
+	* invoke.texi (-Wunused-parameter): Make clearer when
+	-Wextra implies this option.
+
 2014-06-18  Manuel López-Ibáñez  
 
 	PR fortran/61126
Index: gcc/fortran/invoke.texi
===
--- gcc/fortran/invoke.texi	(Revision 211766)
+++ gcc/fortran/invoke.texi	(Arbeitskopie)
@@ -911,7 +911,8 @@ Contrary to @command{gcc}'s meaning of @option{-Wu
 @command{gfortran}'s implementation of this option does not warn
 about unused dummy arguments (see @option{-Wunused-dummy-argument}),
 but about unused @code{PARAMETER} values. @option{-Wunused-parameter}
-is not included in @option{-Wall} but is implied by @option{-Wall -Wextra}.
+is implied by @option{-Wextra} if also @option{-Wunused} or
+@option{-Wall} is used.
 
 @item -Walign-commons
 @opindex @code{Walign-commons}

Re: [PATCH, loop2_invariant, 2/2] Change heuristics for identical invariants

2014-06-17 Thread Zhenqiang Chen

On 10 June 2014 19:16, Steven Bosscher  wrote:
> On Tue, Jun 10, 2014 at 11:23 AM, Zhenqiang Chen wrote:
>> * loop-invariant.c (struct invariant): Add a new member: eqno;
>> (find_identical_invariants): Update eqno;
>> (create_new_invariant): Init eqno;
>> (get_inv_cost): Compute comp_cost with eqno;
>> (gain_for_invariant): Take spill cost into account.
>
> Looks OK except ...
>
>> @@ -1243,7 +1256,13 @@ gain_for_invariant (struct invariant *inv,
>> unsigned *regs_needed,
>>  + IRA_LOOP_RESERVED_REGS
>>  - ira_class_hard_regs_num[cl];
>>if (size_cost > 0)
>> -   return -1;
>> +   {
>> + int spill_cost = target_spill_cost [speed] * (int) regs_needed[cl];
>> + if (comp_cost <= spill_cost)
>> +   return -1;
>> +
>> + return 2;
>> +   }
>>else
>> size_cost = 0;
>>  }
>
> ... why "return 2", instead of just falling through to "return
> comp_cost - size_cost;"?

Thanks for the comments. Updated.

Following your comments on the previous patch, I should also check the
overlap between reg classes, so I changed the logic to check the spill
cost.
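
Roughly, the decision can be pictured as follows. This is a toy sketch
with hypothetical names, not the patch itself: it scales the computation
cost by eqno and compares it with a spill cost, as in the hunks below and
in the version quoted above. What is returned when the motion is kept is
an assumption, since the real function also accounts for size_cost.

```c
/* Toy version of the heuristic.  INV_COST is the cost of computing the
   invariant once, EQNO the number of identical invariants it stands
   for, BASE_SPILL_COST the cost of spilling one register.  */
static int
toy_gain (int inv_cost, unsigned eqno, int base_spill_cost,
          unsigned regs_needed, int pressure_exceeded)
{
  int comp_cost = inv_cost * (int) eqno;

  if (pressure_exceeded)
    {
      int spill_cost = base_spill_cost * (int) regs_needed;

      /* Hoisting saves no more than the spill it causes: reject.  */
      if (comp_cost <= spill_cost)
        return -1;
    }

  /* Assumption: report the computation cost saved as the gain.  */
  return comp_cost;
}
```

With these toy numbers, a single invariant of cost 4 is rejected against
a spill cost of 8, while three identical ones (eqno = 3) are kept.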

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 6e43b49..af0c95b 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -104,6 +104,9 @@ struct invariant
   /* The number of the invariant with the same value.  */
   unsigned eqto;

+  /* The number of invariants which eqto this.  */
+  unsigned eqno;
+
   /* If we moved the invariant out of the loop, the register that contains its
  value.  */
   rtx reg;
@@ -498,6 +501,7 @@ find_identical_invariants (invariant_htab_type eq,
struct invariant *inv)
   struct invariant *dep;
   rtx expr, set;
   enum machine_mode mode;
+  struct invariant *tmp;

   if (inv->eqto != ~0u)
 return;
@@ -513,7 +517,12 @@ find_identical_invariants (invariant_htab_type
eq, struct invariant *inv)
   mode = GET_MODE (expr);
   if (mode == VOIDmode)
 mode = GET_MODE (SET_DEST (set));
-  inv->eqto = find_or_insert_inv (eq, expr, mode, inv)->invno;
+
+  tmp = find_or_insert_inv (eq, expr, mode, inv);
+  inv->eqto = tmp->invno;
+
+  if (tmp->invno != inv->invno && inv->always_executed)
+tmp->eqno++;

   if (dump_file && inv->eqto != inv->invno)
 fprintf (dump_file,
@@ -725,6 +734,10 @@ create_new_invariant (struct def *def, rtx insn,
bitmap depends_on,

   inv->invno = invariants.length ();
   inv->eqto = ~0u;
+
+  /* Itself.  */
+  inv->eqno = 1;
+
   if (def)
 def->invno = inv->invno;
   invariants.safe_push (inv);
@@ -1141,7 +1154,7 @@ get_inv_cost (struct invariant *inv, int
*comp_cost, unsigned *regs_needed,

   if (!inv->cheap_address
   || inv->def->n_addr_uses < inv->def->n_uses)
-(*comp_cost) += inv->cost;
+(*comp_cost) += inv->cost * inv->eqno;

 #ifdef STACK_REGS
   {
@@ -1249,7 +1262,7 @@ gain_for_invariant (struct invariant *inv,
unsigned *regs_needed,
unsigned *new_regs, unsigned regs_used,
bool speed, bool call_p)
 {
-  int comp_cost, size_cost;
+  int comp_cost, size_cost = 0;
   enum reg_class cl;
   int ret;

@@ -1273,6 +1286,8 @@ gain_for_invariant (struct invariant *inv,
unsigned *regs_needed,
 {
   int i;
   enum reg_class pressure_class;
+  int spill_cost = 0;
+  int base_cost = target_spill_cost [speed];

   for (i = 0; i < ira_pressure_classes_num; i++)
{
@@ -1286,30 +1301,13 @@ gain_for_invariant (struct invariant *inv,
unsigned *regs_needed,
  + LOOP_DATA (curr_loop)->max_reg_pressure[pressure_class]
  + IRA_LOOP_RESERVED_REGS
  > ira_class_hard_regs_num[pressure_class])
-   break;
+   {
+ spill_cost += base_cost * (int) regs_needed[pressure_class];
+ size_cost = -1;
+   }
}
-  if (i < ira_pressure_classes_num)
-   /* There will be register pressure excess and we want not to
-  make this loop invariant motion.  All loop invariants with
-  non-positive gains will be rejected in function
-  find_invariants_to_move.  Therefore we return the negative
-  number here.
-
-  One could think that this rejects also expensive loop
-  invariant motions and this will hurt code performance.
-  However numerous experiments with different heuristics
-  taking invariant cost into account did not confirm this
-  assumption.  There are possible explanations for this
-  result:
-   o probably all expensive invariants were already moved out
- of the loop by PRE and gimple invariant motion pass.
-   o expensive invariant execution will be hidden by insn
- scheduling or OOO processor hardware because usually such
- invariants have a lot of freedom to be executed
- out-of-order.
-  Another reason for ignoring invariant cost vs spilling cost
-
