Re: update acc routines in fortran

2015-11-20 Thread Jakub Jelinek
On Thu, Nov 19, 2015 at 08:26:45AM -0800, Cesar Philippidis wrote:
>   (gfc_oacc_routine_name): New struct;

Full stop instead of semicolon.

> diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
> index 1f6311c..e321072 100644
> --- a/gcc/tree-nested.c
> +++ b/gcc/tree-nested.c
> @@ -1106,6 +1106,9 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
>   case OMP_CLAUSE_NUM_TASKS:
>   case OMP_CLAUSE_HINT:
>   case OMP_CLAUSE__CILK_FOR_COUNT_:
> + case OMP_CLAUSE_NUM_GANGS:
> + case OMP_CLAUSE_NUM_WORKERS:
> + case OMP_CLAUSE_VECTOR_LENGTH:
> wi->val_only = true;
> wi->is_lhs = false;
> convert_nonlocal_reference_op (&OMP_CLAUSE_OPERAND (clause, 0),
> @@ -1173,6 +1176,10 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
>   case OMP_CLAUSE_THREADS:
>   case OMP_CLAUSE_SIMD:
>   case OMP_CLAUSE_DEFAULTMAP:
> + case OMP_CLAUSE_GANG:
> + case OMP_CLAUSE_WORKER:
> + case OMP_CLAUSE_VECTOR:

This looks wrong.  OMP_CLAUSE_GANG has 2 arguments, and OMP_CLAUSE_WORKER and
OMP_CLAUSE_VECTOR have one argument each.  If those operands use a non-local
decl, or a local decl that is referenced by a nested routine, it won't be
handled properly.

> @@ -1830,6 +1840,10 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
>   case OMP_CLAUSE_THREADS:
>   case OMP_CLAUSE_SIMD:
>   case OMP_CLAUSE_DEFAULTMAP:
> + case OMP_CLAUSE_GANG:
> + case OMP_CLAUSE_WORKER:
> + case OMP_CLAUSE_VECTOR:
> + case OMP_CLAUSE_SEQ:

Ditto.

Otherwise LGTM.

Jakub


Re: basic asm and memory clobbers

2015-11-20 Thread Andrew Haley
On 20/11/15 01:23, David Wohlferd wrote:
> I tried to picture the most basic case I can think of that uses 
> something clobber-able:
> 
> for (int x=0; x < 1000; x++)
>    asm("#stuff");
> 
> This generates very simple and highly performant code:
> 
>  movl    $1000, %eax
> .L2:
>  #stuff
>  subl    $1, %eax
>  jne     .L2
> 
> Using extended asm to simulate the clobberall gives:
> 
>  movl    $1000, 44(%rsp)
> .L2:
>  #stuff
>  subl    $1, 44(%rsp)
>  jne     .L2
> 
> It allocates an extra 4 bytes, and changed everything to memory accesses 
> instead of using a register.

Can you show us your code?  I get

xx:
        movl    $1000, %eax
.L2:
        #stuff
        subl    $1, %eax
        jne     .L2
        rep; ret

for

void xx() {
  for (int x=0; x < 1000; x++)
    asm volatile("#stuff" : : : "memory");
}

What you're describing looks like a bug: x doesn't have its address
taken.

Andrew.
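
A minimal sketch of the case discussed above, assuming GCC-style extended
asm is available.  The loop counter never has its address taken, so a
"memory" clobber by itself should not be a reason to keep it in memory.
The function name count_iterations is only an illustration, and an empty
asm template stands in for "#stuff" so the example does not depend on the
assembler's comment syntax.

```cpp
// Runs an empty asm statement with a "memory" clobber n times and
// returns how many iterations were executed.
int count_iterations(int n) {
    int done = 0;
    for (int x = 0; x < n; x++) {
        // The clobber tells the compiler that memory may have changed;
        // it says nothing about locals whose address is never taken.
        asm volatile("" : : : "memory");
        done++;
    }
    return done;
}
```

Compiling this with -O2 -S and inspecting the loop is one way to compare
against the two listings quoted in this thread.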


[Bug tree-optimization/68373] autopar fails on loop exit phi with argument defined outside loop

2015-11-20 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68373

--- Comment #4 from vries at gcc dot gnu.org ---
Author: vries
Date: Fri Nov 20 10:25:26 2015
New Revision: 230650

URL: https://gcc.gnu.org/viewcvs?rev=230650&root=gcc&view=rev
Log:
Do final value replacement in try_create_reduction_list

2015-11-20  Tom de Vries  

PR tree-optimization/68373
* tree-scalar-evolution.c (final_value_replacement_loop): Factor out of
...
(scev_const_prop): ... here.
* tree-scalar-evolution.h (final_value_replacement_loop): Declare.
* tree-parloops.c (try_create_reduction_list): Call
final_value_replacement_loop.

* gcc.dg/autopar/pr68373.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/autopar/pr68373.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-parloops.c
trunk/gcc/tree-scalar-evolution.c
trunk/gcc/tree-scalar-evolution.h

Re: [PATCH] Add clang-format config to contrib folder

2015-11-20 Thread Pedro Alves
On 11/20/2015 11:33 AM, Martin Liška wrote:

> Hi Pedro.

Hi Martin.

> Fully agree with you, there's suggested patch.
> Hope I can install the patch for trunk?

I'd call it obvious.  :-)

Thanks,
Pedro Alves



[Bug c++/68409] Garbage added to a map instead of object

2015-11-20 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68409

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from Jonathan Wakely  ---
As Adrian says, your operator< is not valid. See
https://www.sgi.com/tech/stl/StrictWeakOrdering.html for the requirements that
must be met by the comparison function used by std::map.
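
For illustration, one common way to get a comparison that meets those
requirements is to compare the members lexicographically with std::tie.
This is only a sketch: the Key type, its members and distinct_keys are
hypothetical and are not taken from the bug report.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical key type with two members.
struct Key {
    int id;
    std::string name;
};

// Lexicographic comparison: irreflexive, asymmetric and transitive,
// which is what std::map expects from its ordering.
bool operator<(const Key& a, const Key& b) {
    return std::tie(a.id, a.name) < std::tie(b.id, b.name);
}

// Inserts three entries, two of which share a key, and returns the size.
std::size_t distinct_keys() {
    std::map<Key, int> m;
    m[Key{1, "a"}] = 10;
    m[Key{1, "b"}] = 20;
    m[Key{1, "a"}] = 30;  // equivalent to the first key, so it overwrites
    return m.size();
}
```

An operator< that compares only some members, or that can return true for
both a < b and b < a, breaks these guarantees and leads to the kind of
lookup failures described in the report.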

Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.

2015-11-20 Thread Kyrill Tkachov

Hi Kirill,

On 18/11/15 14:11, Kirill Yukhin wrote:

Hello Andreas, Devid.

On 18 Nov 10:45, Andreas Schwab wrote:

Kirill Yukhin  writes:


diff --git a/gcc/testsuite/c-c++-common/attr-simd.c 
b/gcc/testsuite/c-c++-common/attr-simd.c
new file mode 100644
index 000..b4eda34
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attr-simd.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-optimized" } */
+
+__attribute__((__simd__))
+extern
+int simd_attr (void)
+{
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" 
} } */

On ia64:

FAIL: c-c++-common/attr-simd.c  -Wc++-compat   scan-tree-dump optimized "simd_attr[ 
\\t]simdclone|vector"
FAIL: c-c++-common/attr-simd.c  -Wc++-compat   scan-tree-dump optimized "simd_attr2[ 
\\t]simdclone|vector"

$ grep simd_attr attr-simd.c.194t.optimized
;; Function simd_attr (simd_attr, funcdef_no=0, decl_uid=1389, cgraph_uid=0, 
symbol_order=0)
simd_attr ()
;; Function simd_attr2 (simd_attr2, funcdef_no=1, decl_uid=1392, cgraph_uid=1, 
symbol_order=1)
simd_attr2 ()

As far as vABI is supported on x86_64/i?86 only, I am going to enable mentioned 
`scan-tree-dump' only
for these targets. This should cure both IA64 and Power.

Concerning attr-simd-3.c. It is known issue: PR68158.
And I believe this test should work everywhere as far as PR is resolved.
I'll put xfail into the test.
Which will lead to (in g++.log):
gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute 
does not apply to types\
  [-Wattributes]^M
output is:
gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute 
does not apply to types\
  [-Wattributes]^M

XFAIL: c-c++-common/attr-simd-3.c  -std=gnu++98 PR68158 (test for errors, line 
5)
FAIL: c-c++-common/attr-simd-3.c  -std=gnu++98 (test for excess errors)
Excess errors:
gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute 
does not apply to types\
  [-Wattributes]

Patch in the bottom.

gcc/testsuite/
* c-c++-common/attr-simd-3.c: Put xfail (PR68158) on dg-error.


This test fails on bare-metal targets that don't support -fcilkplus or -pthread.
Would you consider moving them to the cilkplus testing directory or adding an 
appropriate
effective target check?

Thanks,
Kyrill


* c-c++-common/attr-simd.c: Limit scan of dump to x86_64/i?86.


Andreas.

--
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c 
b/gcc/testsuite/c-c++-common/attr-simd-3.c
index 2bbdf04..35dd4c0 100644
--- a/gcc/testsuite/c-c++-common/attr-simd-3.c
+++ b/gcc/testsuite/c-c++-common/attr-simd-3.c
@@ -2,4 +2,4 @@
  /* { dg-options "-fcilkplus" } */
  /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was not 
declared in this scope" } */

-void f () __attribute__((__simd__, __vector__)); /* { dg-error "in the same 
function marked as a Cilk Plus" } */
+void f () __attribute__((__simd__, __vector__)); /* { dg-error "in the same function marked 
as a Cilk Plus" "PR68158" { xfail c++ } } */
diff --git a/gcc/testsuite/c-c++-common/attr-simd.c 
b/gcc/testsuite/c-c++-common/attr-simd.c
index 61974e3..7674588 100644
--- a/gcc/testsuite/c-c++-common/attr-simd.c
+++ b/gcc/testsuite/c-c++-common/attr-simd.c
@@ -11,7 +11,7 @@ int simd_attr (void)
return 0;
  }

-/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" 
} } */
+/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" 
{ target { i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVbN4_simd_attr:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVbM4_simd_attr:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVcN4_simd_attr:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */
@@ -29,7 +29,7 @@ int simd_attr2 (void)
return 0;
  }

-/* { dg-final { scan-tree-dump "simd_attr2\[ \\t\]simdclone|vector" 
"optimized" } } */
+/* { dg-final { scan-tree-dump "simd_attr2\[ \\t\]simdclone|vector" 
"optimized" { target { i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVbN4_simd_attr2:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVbM4_simd_attr2:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVcN4_simd_attr2:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */





[Bug middle-end/68339] g++.dg/vect/simd-clone-2.cc ICEs with aggressive GC settings and OpenMP

2015-11-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68339

--- Comment #1 from Jakub Jelinek  ---
Created attachment 36781
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36781&action=edit
gcc6-pr68339.patch

Untested fix.

Go patch committed: add receiver type to specific type function name

2015-11-20 Thread Ian Lance Taylor
This patch to the Go frontend fixes the case where two different
methods on different types with the same method name both define a
type internally with the same name where the type requires a specific
type hash or equality function.  Before this patch those functions
would get the same name, causing a compilation error.  This patch gives
them different names using the same approach as is done for the type
descriptor.  I sent out a test case to the master Go testsuite in
https://golang.org/cl/17081.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline and GCC 5 branch.

Ian
Index: gcc/go/gofrontend/MERGE
===================================================================
--- gcc/go/gofrontend/MERGE (revision 230463)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-e3aef41ce0c5be81e2589e60d9cb0db1516e9e2d
+dfa74d975884f363c74d6a66a37b1703093fdba6
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===================================================================
--- gcc/go/gofrontend/types.cc  (revision 230463)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -1769,7 +1769,16 @@ Type::specific_type_functions(Gogo* gogo
   const Named_object* in_function = name->in_function();
   if (in_function != NULL)
{
- base_name += '$' + Gogo::unpack_hidden_name(in_function->name());
+ base_name.append(1, '$');
+ const Typed_identifier* rcvr =
+   in_function->func_value()->type()->receiver();
+ if (rcvr != NULL)
+   {
+ Named_type* rcvr_type = rcvr->type()->deref()->named_type();
+ base_name.append(Gogo::unpack_hidden_name(rcvr_type->name()));
+ base_name.append(1, '$');
+   }
+ base_name.append(Gogo::unpack_hidden_name(in_function->name()));
  if (index > 0)
{
  char buf[30];
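
The naming scheme in the hunk above can be sketched in isolation.  This is
a simplified illustration and not the gofrontend code: the helper
type_function_suffix and its plain string parameters are hypothetical, and
an empty receiver_type stands for a function that is not a method.

```cpp
#include <string>

// Builds the part of the name that identifies the enclosing function:
// "$" + receiver type + "$" + function name for methods, and
// "$" + function name for ordinary functions.
std::string type_function_suffix(const std::string& receiver_type,
                                 const std::string& function_name) {
    std::string s(1, '$');
    if (!receiver_type.empty()) {
        s.append(receiver_type);
        s.append(1, '$');
    }
    s.append(function_name);
    return s;
}
```

With the receiver type included, a method M on type T1 and a method M on
type T2 no longer produce the same suffix.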


[Bug target/68456] UINT32_TYPE is long unsigned for 32bit targets

2015-11-20 Thread vaalfreja at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68456

--- Comment #3 from Yulia Koval  ---
I agree that the %u usage is not a bug, but it still looks strange.

For gcc --target=i586-elf the macro is:
#define __UINT32_TYPE__ long unsigned int

For non-target gcc or gcc --target=i586-unknown-linux it is:
#define __UINT32_TYPE__ unsigned int

This behaviour is changed by a header called "newlib-stdint".
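
As a side note for code that has to build on both kinds of target, a
minimal sketch of printing a uint32_t without assuming which underlying
type was chosen is to use the PRIu32 macro from <cinttypes>.  The helper
name format_u32 is hypothetical.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a uint32_t portably.  PRIu32 expands to the conversion
// specifier that matches the target's choice of underlying type, so the
// same source works whether uint32_t is unsigned int or long unsigned int.
std::string format_u32(std::uint32_t v) {
    char buf[16];  // 10 digits at most, plus the terminating NUL
    std::snprintf(buf, sizeof buf, "%" PRIu32, v);
    return std::string(buf);
}
```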

[Bug fortran/52846] [F2008] Support submodules

2015-11-20 Thread pault at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52846
Bug 52846 depends on bug 66762, which changed state.

Bug 66762 Summary: ICE when compiling gfortran.dg/submodule_[16].f90 with -flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66762

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

[Bug fortran/66762] ICE when compiling gfortran.dg/submodule_[16].f90 with -flto

2015-11-20 Thread pault at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66762

Paul Thomas  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Paul Thomas  ---
Fixed. Please note the wrong attributions between Steve and myself.

Thanks for the report and for testing the patch.

Paul

Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

2015-11-20 Thread Tom de Vries

On 20/11/15 11:37, Richard Biener wrote:

   I'd rather make loop_optimizer_init do nothing
if requested flags are already set and no fixup is needed



Thus sth like

Index: gcc/loop-init.c
===================================================================
--- gcc/loop-init.c (revision 230649)
+++ gcc/loop-init.c (working copy)
@@ -103,7 +103,11 @@ loop_optimizer_init (unsigned flags)
calculate_dominance_info (CDI_DOMINATORS);

if (!needs_fixup)
-   checking_verify_loop_structure ();
+   {
+ checking_verify_loop_structure ();
+ if (loops_state_satisfies_p (flags))
+   goto out;


What about flags that are present in the loops state, but not requested 
in flags? Should we try to clear those flags?


Thanks,
- Tom


+   }

/* Clear all flags.  */
if (recorded_exits)
@@ -122,11 +126,12 @@ loop_optimizer_init (unsigned flags)
/* Apply flags to loops.  */
apply_loop_flags (flags);

+  checking_verify_loop_structure ();
+
+out:
/* Dump loops.  */
flow_loops_dump (dump_file, NULL, 1);

-  checking_verify_loop_structure ();
-
timevar_pop (TV_LOOP_INIT);
  }
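
The question about flags that are present but not requested can be made
concrete with a small sketch.  It assumes the state check is a plain
bitmask test; the flag names and values below are hypothetical stand-ins
and not GCC's own definitions.

```cpp
// Hypothetical loop-state bits, for illustration only.
enum : unsigned {
    HAVE_PREHEADERS = 1u << 0,
    HAVE_SIMPLE_LATCHES = 1u << 1,
    HAVE_RECORDED_EXITS = 1u << 2
};

// True when every requested flag is already set in the current state.
bool state_satisfies(unsigned state, unsigned requested) {
    return (state & requested) == requested;
}

// Flags that are set in the state but were not requested.  An early
// exit based only on state_satisfies would leave these in place.
unsigned extra_flags(unsigned state, unsigned requested) {
    return state & ~requested;
}
```

Under this reading, a state with preheaders and recorded exits satisfies a
request for preheaders alone, and the recorded-exits bit is what remains
unless it is cleared explicitly.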




Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.

2015-11-20 Thread Kirill Yukhin
Hello Kyrill,
On 20 Nov 12:15, Kyrill Tkachov wrote:
> >gcc/testsuite/
> > * c-c++-common/attr-simd-3.c: Put xfail (PR68158) on dg-error.
> 
> This test fails on bare-metal targets that don't support -fcilkplus or 
> -pthread.
> Would you consider moving them to the cilkplus testing directory or adding an 
> appropriate
> effective target check?
I think so. I'll commit change in the bottom.

gcc/testsuite/
* c-c++-common/attr-simd-3.c: Require Cilk Plus in effective target.

> 
> Thanks,
> Kyrill

--
Thanks, K

diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c 
b/gcc/testsuite/c-c++-common/attr-simd-3.c
index 35dd4c0..c7533f0 100644
--- a/gcc/testsuite/c-c++-common/attr-simd-3.c
+++ b/gcc/testsuite/c-c++-common/attr-simd-3.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target cilkplus } */
 /* { dg-do compile } */
 /* { dg-options "-fcilkplus" } */
 /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was 
not declared in this scope" } */


Re: Remove noce_mem_write_may_trap_or_fault_p in ifcvt

2015-11-20 Thread Bernd Schmidt

On 11/19/2015 12:49 AM, Jeff Law wrote:

On 11/18/2015 12:16 PM, Bernd Schmidt wrote:

I don't think so, actually. One safe option would be to rip it out and
just stop transforming this case, but let's start by looking at the code
just a bit further down, calling noce_can_store_speculate. This was
added later than the code we're discussing, and it tries to verify that
the same memory location will unconditionally be written to at a point
later than the one we're trying to convert

And if we dig into that thread, Ian suggests this isn't terribly
important anyway.

However, if we go even further upthread, we find an assertion from
Michael that this is critical for 456.hmmer and references BZ 27313.


Cc'd in case he has additional input.


https://gcc.gnu.org/ml/gcc-patches/2007-08/msg01978.html

Sadly, no testcase was included.


BZ27313 is marked as fixed by the introduction of the tree cselim pass, 
thus the problem won't even be seen at RTL level.
I'm undecided on whether cs-elim is safe wrt the store speculation vs 
locks concerns raised in the thread discussing Ian's 
noce_can_store_speculate_p, but that's not something we have to consider 
to solve the problem at hand.



So if it weren't for the assertion that it's critical for hmmr, I'd be
convinced that just ripping out was the right thing to do.

Can you look at 27313 and hmmr and see if there's an impact.  Just maybe
the critical stuff for those is handled by the tree if converter and we
can just rip out the clearly incorrect RTL bits without regressing
anything performance-wise.  If there is an impact, then I think we have
to look at either improving the tree bits (so we can remove the rtl
bits) or we have to do real dataflow analysis in the rtl bits.


So I made this change:

   if (!set_b && MEM_P (orig_x))
-{
-  /* Disallow the "if (...) x = a;" form (implicit "else x = x;")
-for optimizations if writing to x may trap or fault,
-i.e. it's a memory other than a static var or a stack slot,
-is misaligned on strict aligned machines or is read-only.  If
-x is a read-only memory, then the program is valid only if we
-avoid the store into it.  If there are stores on both the
-THEN and ELSE arms, then we can go ahead with the conversion;
-either the program is broken, or the condition is always
-false such that the other memory is selected.  */
-  if (noce_mem_write_may_trap_or_fault_p (orig_x))
-   return FALSE;
-
-  /* Avoid store speculation: given "if (...) x = a" where x is a
-MEM, we only want to do the store if x is always set
-somewhere in the function.  This avoids cases like
-  if (pthread_mutex_trylock(mutex))
-++global_variable;
-where we only want global_variable to be changed if the mutex
-is held.  FIXME: This should ideally be expressed directly in
-RTL somehow.  */
-  if (!noce_can_store_speculate_p (test_bb, orig_x))
-   return FALSE;
-}
+return FALSE;

As far as I can tell hmmer and the 27313 testcase are unaffected at -O2 
(if anything, hmmer was very slightly faster afterwards). The run wasn't 
super-scientific, but I wouldn't have expected anything else given the 
existence of cs-elim.


Ok to do the above, removing all the bits made unnecessary (including 
memory_must_be_modified_in_insn_p in alias.c)?



Bernd
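
For readers following the store-speculation concern, here is a source-level
sketch of the transformation being discussed.  It is an analogy written in
C++ and not the RTL that ifcvt operates on; both function names are
hypothetical.

```cpp
// Guarded form: memory is written only when the condition holds.
void guarded_store(bool cond, int* x, int a) {
    if (cond)
        *x = a;
}

// If-converted form: memory is always written, possibly with its old
// value.  Single-threaded results are identical, but the unconditional
// store can race with another thread that owns *x when cond is false,
// and it can fault if *x is read-only.
void speculated_store(bool cond, int* x, int a) {
    *x = cond ? a : *x;
}
```

The two forms agree whenever nothing else observes or protects the memory,
which is why the conversion is only safe under conditions like the ones
the removed checks tried to establish.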


[Bug fortran/68237] ICE on invalid with submodules

2015-11-20 Thread pault at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68237

--- Comment #11 from Paul Thomas  ---
Author: pault
Date: Fri Nov 20 14:50:35 2015
New Revision: 230661

URL: https://gcc.gnu.org/viewcvs?rev=230661&root=gcc&view=rev
Log:
2015-11-20  Paul Thomas  

PR fortran/68237
* decl.c (gfc_match_submod_proc): Test the interface symbol
before accessing its attributes.

2015-11-20  Steven G. Kargl  

PR fortran/66762
(gfc_get_symbol_decl): Test for attr.used_in_submodule as well
as attr.use_assoc (twice).
(gfc_create_module_variable): Ditto.

2015-11-20  Paul Thomas  

PR fortran/68237
* gfortran.dg/submodule_12.f90: New test

PR fortran/66762
* gfortran.dg/submodule_6.f90: Add compile option -flto.

Added:
trunk/gcc/testsuite/gfortran.dg/submodule_12.f08
Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/decl.c
trunk/gcc/fortran/trans-decl.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gfortran.dg/submodule_6.f08

Re: [PATCH][ARM] Do not expand movmisalign pattern if not in 32-bit mode

2015-11-20 Thread Ramana Radhakrishnan
On 11/11/15 16:10, Kyrill Tkachov wrote:
> Hi all,
> 
> The attached testcase ICEs when compiled with -march=armv6k -mthumb -Os or 
> any march
> for which -mthumb gives Thumb1:
>  error: unrecognizable insn:
>  }
>  ^
> (insn 13 12 14 5 (set (reg:SI 116 [ x ])
> (unspec:SI [
> (mem:SI (reg/v/f:SI 112 [ s ]) [0 MEM[(unsigned char 
> *)s_1(D)]+0 S4 A8])
> ] UNSPEC_UNALIGNED_LOAD)) besttry.c:9 -1
>  (nil))
> 
> The problem is that the expands a movmisalign pattern but the resulting 
> unaligned loads don't
> match any define_insn because they are gated on unaligned_access && 
> TARGET_32BIT.
> The unaligned_access expander is gated only on unaligned_access.
> 
> This small patch fixes the issue by turning off unaligned_access if 
> TARGET_32BIT is not true.
> We can then remove TARGET_32BIT from the unaligned load/store patterns 
> conditions as a cleanup.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2015-11-11  Kyrylo Tkachov  
> 
> * config/arm/arm.c (arm_option_override): Require TARGET_32BIT
> for unaligned_access.
> * config/arm/arm.md (unaligned_loadsi): Remove redundant TARGET_32BIT
> from matching condition.
> (unaligned_loadhis): Likewise.
> (unaligned_loadhiu): Likewise.
> (unaligned_storesi): Likewise.
> (unaligned_storehi): Likewise.
> 
> 2015-11-11  Kyrylo Tkachov  
> 
> * gcc.target/arm/armv6-unaligned-load-ice.c: New test.


This means we don't have unaligned access for some cores in Thumb1 for armv6.
I'd rather avoid the ICE than try to find testing coverage on such
cores now.

OK.


regards
Ramana
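
As background, the insn in the report is a 4-byte load from memory that is
only known to be byte aligned.  A minimal source-level sketch of such an
access is shown below; it is not the attached testcase, the helper name
load_u32_unaligned is hypothetical, and no claim is made that it
reproduces the ICE.

```cpp
#include <cstdint>
#include <cstring>

// Reads a 32-bit value from an address with no alignment guarantee.
// Going through memcpy keeps the access well defined; the compiler is
// free to emit an unaligned load where the target supports one.
std::uint32_t load_u32_unaligned(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```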


Re: Broken Link

2015-11-20 Thread Jonathan Wakely
On 20 November 2015 at 13:12,   wrote:
> Hey,
>
> I wanted to reach out and let you know about this link which isn’t working -
> http://developer.apple.com/documentation/Cocoa/Conceptual/ObjectiveC/, I
> found it on this page -
> http://gd.tuwien.ac.at/.vhost/www.gnu.org/software/gcc/readings.html. You’re
> link includes this text - "Objective-C Language Description", if that helps
> you find it.

That page is not hosted by the GCC project.

The official version of the page has a working link:
http://www.gnu.org/software/gcc/readings.html


[Bug fortran/68458] internal compiler error: Segmentation fault

2015-11-20 Thread rgaveiga at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68458

rgaveiga at gmail dot com  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rgaveiga at gmail dot com
           Severity|normal                      |blocker

[Patch] sync top level configure with binutils-gdb

2015-11-20 Thread Tristan Gingold
This patch was pushed to the binutils-gdb repo, so I have also committed it to gcc.

Tristan.


2015-11-20  Tristan Gingold  

Sync with binutils-gdb:
2015-11-20  Tristan Gingold  

* configure.ac: Add aarch64-*-darwin* and arm-*-darwin*.
* configure: Regenerate.

Index: configure
===================================================================
--- configure   (revision 230657)
+++ configure   (working copy)
@@ -3686,6 +3686,14 @@
 case "${target}" in
   *-*-chorusos)
 ;;
+  aarch64-*-darwin*)
+noconfigdirs="$noconfigdirs ld gas gdb gprof"
+noconfigdirs="$noconfigdirs sim target-rda"
+;;
+  arm-*-darwin*)
+noconfigdirs="$noconfigdirs ld gas gdb gprof"
+noconfigdirs="$noconfigdirs sim target-rda"
+;;
   powerpc-*-darwin*)
 noconfigdirs="$noconfigdirs ld gas gdb gprof"
 noconfigdirs="$noconfigdirs sim target-rda"
Index: configure.ac
===================================================================
--- configure.ac        (revision 230657)
+++ configure.ac        (working copy)
@@ -1023,6 +1023,14 @@
 case "${target}" in
   *-*-chorusos)
 ;;
+  aarch64-*-darwin*)
+noconfigdirs="$noconfigdirs ld gas gdb gprof"
+noconfigdirs="$noconfigdirs sim target-rda"
+;;
+  arm-*-darwin*)
+noconfigdirs="$noconfigdirs ld gas gdb gprof"
+noconfigdirs="$noconfigdirs sim target-rda"
+;;
   powerpc-*-darwin*)
 noconfigdirs="$noconfigdirs ld gas gdb gprof"
 noconfigdirs="$noconfigdirs sim target-rda"



[Bug target/68149] [6 Regression][ARM] ICE when splitting unaligned DImode load

2015-11-20 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68149

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Author: ktkachov
Date: Fri Nov 20 15:15:31 2015
New Revision: 230663

URL: https://gcc.gnu.org/viewcvs?rev=230663&root=gcc&view=rev
Log:
[ARM] PR 68149 Fix ICE in unaligned_loaddi split

PR target/68149
* config/arm/arm.md (unaligned_loaddi): Delete.
(unaligned_storedi): Likewise.
* config/arm/arm.c (gen_movmem_ldrd_strd): Don't generate
unaligned DImode memory ops.  Instead perform two back-to-back
unaligned SImode ops.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm.c
trunk/gcc/config/arm/arm.md

Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage

2015-11-20 Thread Richard Biener
On Fri, Nov 20, 2015 at 1:33 PM, Alan Hayward  wrote:
>
>
> On 20/11/2015 11:00, "Richard Biener"  wrote:
>
>>On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward 
>>wrote:
>>> When vectorising a integer induction condition reduction,
>>> is_nonwrapping_integer_induction ends up with different values for base
>>> during the analysis and build phases. In the first it is an INTEGER_CST,
>>> in the second the loop has been vectorised out and the base is now a
>>> variable.
>>>
>>> This results in the analysis and build stage detecting the
>>> STMT_VINFO_VEC_REDUCTION_TYPE as different types.
>>>
>>> The easiest way to fix this is to only check for integer induction
>>> conditions on the analysis stage.
>>
>>I don't like this.  For the evolution part we have added
>>STMT_VINFO_LOOP_PHI_EVOLUTION_PART.  If you now need
>>the original initial value as well then just save it.
>>
>>Or if you really want to go with the hack then please do not call
>>is_nonwrapping_integer_induction with vec_stmt != NULL but
>>initialize cond_expr_is_nonwrapping_integer_induction from
>>STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
>>
>>The hack also lacks a comment.
>>
>
> Ok. I've gone for a combination of both:
>
> I now cache the base in STMT_VINFO_LOOP_PHI_EVOLUTION_BASE.
>
> I've removed the vec_stmt != NULL checks.
>
> I've moved the call to is_nonwrapping_integer_induction until after the
> vect_is_simple_reduction check. I never liked that I had
> is_nonwrapping_integer_induction early in the function, and think this
> looks better.

It looks better but the comment for loop_phi_evolution_base is wrong.
The value is _not_ "correct" after prologue peeling (unless you
update it there to a non-constant expr).  It is conservatively "correct"
for is_nonwrapping_integer_induction though.  Which is why I'd
probably rename it to STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED
to reflect this (and similar to STMT_VINFO_NITERS_UNCHANGED).

Ok with that change.

Thanks,
Richard.

>
> Alan.
>
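
For context, a loop of roughly the following shape is the kind of integer
induction condition reduction this thread is about: the value being
reduced is the loop's own induction variable, assigned under a condition.
This is a sketch under that assumption, not the PR's testcase, and the
function name last_index_below is hypothetical.

```cpp
// Returns the last index i with a[i] < limit, or -1 when none matches.
int last_index_below(const int* a, int n, int limit) {
    int last = -1;
    for (int i = 0; i < n; i++)
        if (a[i] < limit)
            last = i;  // the reduction value is the induction variable
    return last;
}
```

Here the induction starts at the constant 0 in the scalar loop, which
corresponds to the INTEGER_CST base seen during analysis.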


[Bug target/68459] New: ICE when compiling for alpha with -O3

2015-11-20 Thread dhowells at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68459

Bug ID: 68459
   Summary: ICE when compiling for alpha with -O3
   Product: gcc
   Version: 5.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dhowells at redhat dot com
  Target Milestone: ---

Created attachment 36782
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36782&action=edit
Reduced test case

When compiling the attached test case for alpha-linux-gnu with -O3, the
compiler segfaults and produces the following:

alpha-linux-gnu-gcc -c -O3 /tmp/bz1265791-testcase.c 
/tmp/bz1265791-testcase.c: In function ‘synchronize_rcu_tasks’:
/tmp/bz1265791-testcase.c:8:6: internal compiler error: Segmentation fault
 void synchronize_rcu_tasks(void)
  ^

/usr/libexec/gcc/alpha-linux-gnu/5.2.1/cc1 -quiet -v /tmp/bz1265791-testcase.c
-quiet -dumpbase bz1265791-testcase.c -auxbase bz1265791-testcase -O3 -version
-o /tmp/ccf0UqVc.s

I caught it in gdb:

Program received signal SIGSEGV, Segmentation fault.
fold_builtin_alloca_with_align (stmt=0x719436c0)
at ../../gcc-5.2.1-20150716/gcc/tree-ssa-ccp.c:2067
2067&& TREE_CODE (BLOCK_SUPERCONTEXT (block)) == FUNCTION_DECL))


and I got the following backtrace:

#0  fold_builtin_alloca_with_align (stmt=0x719436c0)
at ../../gcc-5.2.1-20150716/gcc/tree-ssa-ccp.c:2067
#1  ccp_fold_stmt (gsi=0x7fffd8a0)
at ../../gcc-5.2.1-20150716/gcc/tree-ssa-ccp.c:2172
#2  0x00a07c17 in substitute_and_fold_dom_walker::before_dom_children (
this=0x7fffd960, bb=0x7191faf8)
at ../../gcc-5.2.1-20150716/gcc/tree-ssa-propagate.c:1177
#3  0x00b8dfe8 in dom_walker::walk (this=0x7fffd960, 
bb=0x7191faf8) at ../../gcc-5.2.1-20150716/gcc/domwalk.c:188
#4  0x00a0762a in substitute_and_fold (
get_value_fn=get_value_fn@entry=0x9942f0 , 
fold_fn=fold_fn@entry=0x99bfa0 , 
do_dce=do_dce@entry=true)
at ../../gcc-5.2.1-20150716/gcc/tree-ssa-propagate.c:1272
#5  0x00993b41 in ccp_finalize ()
at ../../gcc-5.2.1-20150716/gcc/tree-ssa-ccp.c:941
#6  do_ssa_ccp () at ../../gcc-5.2.1-20150716/gcc/tree-ssa-ccp.c:2382
#7  (anonymous namespace)::pass_ccp::execute (this=)
at ../../gcc-5.2.1-20150716/gcc/tree-ssa-ccp.c:2414
#8  0x0083aa9e in execute_one_pass (pass=pass@entry=0x12221d0)
at ../../gcc-5.2.1-20150716/gcc/passes.c:2330
#9  0x0083aeb6 in execute_pass_list_1 (pass=0x12221d0)
at ../../gcc-5.2.1-20150716/gcc/passes.c:2382
---Type <return> to continue, or q <return> to quit---
#10 0x0083aec8 in execute_pass_list_1 (pass=0x1222050, 
pass@entry=0x1221f90) at ../../gcc-5.2.1-20150716/gcc/passes.c:2383
#11 0x0083af09 in execute_pass_list (fn=0x7193cf18, pass=0x1221f90)
at ../../gcc-5.2.1-20150716/gcc/passes.c:2393
#12 0x005fa268 in cgraph_node::expand (this=this@entry=0x7183a300)
at ../../gcc-5.2.1-20150716/gcc/cgraphunit.c:1895
#13 0x005fb4ac in expand_all_functions ()
at ../../gcc-5.2.1-20150716/gcc/cgraphunit.c:2031
#14 symbol_table::compile (this=0x7183c000)
at ../../gcc-5.2.1-20150716/gcc/cgraphunit.c:2384
#15 0x005fc8b8 in symbol_table::finalize_compilation_unit (
this=0x7183c000) at ../../gcc-5.2.1-20150716/gcc/cgraphunit.c:2461
#16 0x005114cb in c_write_global_declarations ()
at ../../gcc-5.2.1-20150716/gcc/c/c-decl.c:10798
#17 0x008d8d55 in compile_file ()
at ../../gcc-5.2.1-20150716/gcc/toplev.c:613
#18 0x004fde1f in do_compile ()
at ../../gcc-5.2.1-20150716/gcc/toplev.c:2067
#19 toplev::main (this=this@entry=0x7fffdce0, argc=argc@entry=13, 
argv=argv@entry=0x7fffdde8)
at ../../gcc-5.2.1-20150716/gcc/toplev.c:2165
#20 0x004fe7aa in main (argc=13, argv=0x7fffdde8)
at ../../gcc-5.2.1-20150716/gcc/main.c:39

[Bug tree-optimization/67055] [5 Regression] Segmentation fault in fold_builtin_alloca_with_align in tree-ssa-ccp.c

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67055

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dhowells at redhat dot com

--- Comment #19 from Richard Biener  ---
*** Bug 68459 has been marked as a duplicate of this bug. ***

[Bug target/68459] ICE when compiling for alpha with -O3

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68459

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #2 from Richard Biener  ---
duplicate (already fixed)

*** This bug has been marked as a duplicate of bug 67055 ***

Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage

2015-11-20 Thread Alan Hayward


On 20/11/2015 13:47, "Richard Biener"  wrote:

>On Fri, Nov 20, 2015 at 1:33 PM, Alan Hayward 
>wrote:
>>
>>
>>On 20/11/2015 11:00, "Richard Biener"  wrote:
>>
>>>On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward 
>>>wrote:
>>>>When vectorising a integer induction condition reduction,
>>>>is_nonwrapping_integer_induction ends up with different values for base
>>>>during the analysis and build phases. In the first it is an
>>>>INTEGER_CST,
>>>>in the second the loop has been vectorised out and the base is now a
>>>>variable.
>>>>
>>>>This results in the analysis and build stage detecting the
>>>>STMT_VINFO_VEC_REDUCTION_TYPE as different types.
>>>>
>>>>The easiest way to fix this is to only check for integer induction
>>>>conditions on the analysis stage.
>>>
>>>I don't like this.  For the evolution part we have added
>>>STMT_VINFO_LOOP_PHI_EVOLUTION_PART.  If you now need
>>>the original initial value as well then just save it.
>>>
>>>Or if you really want to go with the hack then please do not call
>>>is_nonwrapping_integer_induction with vec_stmt != NULL but
>>>initialize cond_expr_is_nonwrapping_integer_induction from
>>>STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
>>>
>>>The hack also lacks a comment.
>>>
>>
>>Ok. I've gone for a combination of both:
>>
>>I now cache the base in STMT_VINFO_LOOP_PHI_EVOLUTION_BASE.
>>
>>I've removed the vec_stmt != NULL checks.
>>
>>I've moved the call to is_nonwrapping_integer_induction until after the
>>vect_is_simple_reduction check. I never liked that I had
>>is_nonwrapping_integer_induction early in the function, and think this
>>looks better.
>
>It looks better but the comment for loop_phi_evolution_base is wrong.
>The value is _not_ "correct" after prologue peeling (unless you
>update it there to a non-constant expr).  It is conservatively "correct"
>for is_nonwrapping_integer_induction though.  Which is why I'd
>probably rename it to STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED
>to reflect this (and similar to STMT_VINFO_NITERS_UNCHANGED).
>
>Ok with that change.

Updated as requested and submitted.

Alan.



analysisonlycondcheck3.patch
Description: Binary data


[Bug tree-optimization/68413] [6 Regression] internal compiler error: in vect_transform_stmt

2015-11-20 Thread alahay01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68413

--- Comment #9 from alahay01 at gcc dot gnu.org ---
Author: alahay01
Date: Fri Nov 20 14:20:24 2015
New Revision: 230658

URL: https://gcc.gnu.org/viewcvs?rev=230658&root=gcc&view=rev
Log:
2015-11-20  Alan Hayward 

PR tree-optimization/68413
* tree-vect-loop.c (vect_analyze_scalar_cycles_1): Cache
evolution base
(vectorizable_reduction): Use cached base


Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-loop.c
trunk/gcc/tree-vectorizer.h

[Bug fortran/66762] ICE when compiling gfortran.dg/submodule_[16].f90 with -flto

2015-11-20 Thread pault at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66762

--- Comment #12 from Paul Thomas  ---
Author: pault
Date: Fri Nov 20 14:50:35 2015
New Revision: 230661

URL: https://gcc.gnu.org/viewcvs?rev=230661&root=gcc&view=rev
Log:
2015-11-20  Paul Thomas  

PR fortran/68237
* decl.c (gfc_match_submod_proc): Test the interface symbol
before accessing its attributes.

2015-11-20  Steven G. Kargl  

PR fortran/66762
(gfc_get_symbol_decl): Test for attr.used_in_submodule as well
as attr.use_assoc (twice).
(gfc_create_module_variable): Ditto.

2015-11-20  Paul Thomas  

PR fortran/68237
* gfortran.dg/submodule_12.f90: New test

PR fortran/66762
* gfortran.dg/submodule_6.f90: Add compile option -flto.

Added:
trunk/gcc/testsuite/gfortran.dg/submodule_12.f08
Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/decl.c
trunk/gcc/fortran/trans-decl.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gfortran.dg/submodule_6.f08

[Patch, fortran] PRs 68237 & 66762 - submodule problems

2015-11-20 Thread Paul Richard Thomas
Dear All,

I have committed as 'obvious'  revision 230661 to fix 2/3 submodule
problems. In the case of the third, PR68243, I believe gfortran is
behaving correctly and I am awaiting confirmation from the reporter.

Thanks to Dominique for regtesting the part of the patch that fixes PR66762.

I have bootstrapped and regtested both parts on FC21/x86_64.

Cheers

Paul

PS I have just noticed that I attributed the wrong part of the patch
to Steve. I will correct the ChangeLog in just a moment.


2015-11-20  Paul Thomas  

PR fortran/68237
* decl.c (gfc_match_submod_proc): Test the interface symbol
before accessing its attributes.

2015-11-20  Steven G. Kargl  

PR fortran/66762
(gfc_get_symbol_decl): Test for attr.used_in_submodule as well
as attr.use_assoc (twice).
(gfc_create_module_variable): Ditto.

2015-11-20  Paul Thomas  

PR fortran/68237
* gfortran.dg/submodule_12.f90: New test

PR fortran/66762
* gfortran.dg/submodule_6.f90: Add compile option -flto.


[Bug target/68456] UINT32_TYPE is long unsigned for 32bit targets

2015-11-20 Thread dmitry.polukhin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68456

Dmitry Polukhin  changed:

   What|Removed |Added

 CC||dmitry.polukhin at gmail dot com

--- Comment #4 from Dmitry Polukhin  ---
What is the advantage of using 'long' instead of 'int' for uint32_t on a
platform where both types can be used (i.e. actually they have the same size)?

GLibC uses 'int', which better matches user expectations. If using 'long'
brings no other advantages, it only creates compatibility issues without any
benefit. So I am just curious about the rationale behind using 'long' instead
of 'int'.
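
To make the compatibility concern concrete, here is a minimal sketch,
assuming a C11 compiler. It reports which underlying type the target chose
for uint32_t. The helper name uint32_underlying_type is made up for
illustration. On a target where the answer is 'unsigned long', code that
passes a uint32_t * where an unsigned int * is expected has a type mismatch
even though both types are 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical helper: name the type the implementation chose for
   uint32_t.  _Generic selects on the type, not on the size, so it can
   tell 'unsigned int' from 'unsigned long' even when both are 32 bits
   wide.  */
static const char *
uint32_underlying_type (void)
{
  return _Generic ((uint32_t) 0,
                   unsigned int: "unsigned int",
                   unsigned long: "unsigned long",
                   default: "other");
}
```

The same distinction shows up in printf format checking and in C++
overload resolution and name mangling, which is where the user-visible
incompatibilities tend to appear.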

[Bug c++/68312] [6 Regression] Memory leaks in cilkplus

2015-11-20 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68312

--- Comment #3 from Martin Liška  ---
Created attachment 36783
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36783&action=edit
Suggested patch1

[Bug fortran/68243] QOI: no warning about unused entities in submodules

2015-11-20 Thread pault at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68243

Paul Thomas  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #6 from Paul Thomas  ---
Dear Martin,

I believe that the testcase in my previous comment shows that this PR is
invalid. If you do not agree, please reopen it and tell me why.

Thanks for the report anyway!

Cheers

Paul

[Bug c++/68312] [6 Regression] Memory leaks in cilkplus

2015-11-20 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68312

--- Comment #4 from Martin Liška  ---
Created attachment 36784
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36784&action=edit
Suggested patch2

Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

2015-11-20 Thread Richard Biener
On Fri, 20 Nov 2015, Tom de Vries wrote:

> On 20/11/15 11:37, Richard Biener wrote:
> >I'd rather make loop_optimizer_init do nothing
> > if requested flags are already set and no fixup is needed
> 
> > Thus sth like
> > 
> > Index: gcc/loop-init.c
> > ===
> > --- gcc/loop-init.c (revision 230649)
> > +++ gcc/loop-init.c (working copy)
> > @@ -103,7 +103,11 @@ loop_optimizer_init (unsigned flags)
> > calculate_dominance_info (CDI_DOMINATORS);
> > 
> > if (!needs_fixup)
> > -   checking_verify_loop_structure ();
> > +   {
> > + checking_verify_loop_structure ();
> > + if (loops_state_satisfies_p (flags))
> > +   goto out;
> 
> What about flags that are present in the loops state, but not requested in
> flags? Should we try to clear those flags?

No, I don't think so, that would break in-loop-pipeline LIM, dropping
loop-closed SSA for example.

I agree it's somewhat of an odd behavior but all passes should
either be placed in a sub-pipeline with an outer 
loop_optimizer_init()/finalize () call or call both themselves.

Richard.

> Thanks,
> - Tom
> 
> > +   }
> > 
> > /* Clear all flags.  */
> > if (recorded_exits)
> > @@ -122,11 +126,12 @@ loop_optimizer_init (unsigned flags)
> > /* Apply flags to loops.  */
> > apply_loop_flags (flags);
> > 
> > +  checking_verify_loop_structure ();
> > +
> > +out:
> > /* Dump loops.  */
> > flow_loops_dump (dump_file, NULL, 1);
> > 
> > -  checking_verify_loop_structure ();
> > -
> > timevar_pop (TV_LOOP_INIT);
> >   }
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
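
The behaviour proposed above (do nothing when the requested flags are
already in effect, and never clear extra flags on that fast path) can be
sketched as follows. This is a simplified model, not the GCC
implementation: the flag names, the sketch_ functions and the work counter
are all made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for loop state flags.  */
enum
{
  SK_HAVE_PREHEADERS = 1,
  SK_HAVE_RECORDED_EXITS = 2,
  SK_LOOP_CLOSED_SSA = 4
};

static unsigned sketch_loops_state;   /* flags currently in effect */
static int sketch_init_work_count;    /* times the slow path ran */

/* True if every requested flag is already set.  */
static bool
sketch_state_satisfies_p (unsigned flags)
{
  return (sketch_loops_state & flags) == flags;
}

static void
sketch_loop_optimizer_init (unsigned flags, bool needs_fixup)
{
  /* Fast path: nothing to fix and all requested flags present.
     Extra flags (e.g. loop-closed SSA) are deliberately left alone.  */
  if (!needs_fixup && sketch_state_satisfies_p (flags))
    return;

  /* Slow path: clear all flags, then apply the requested ones.  */
  sketch_init_work_count++;
  sketch_loops_state = flags;
}
```

A nested call with a subset of the outer flags then becomes a no-op, which
is the property an in-pipeline pass relies on.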


[Bug tree-optimization/68413] [6 Regression] internal compiler error: in vect_transform_stmt

2015-11-20 Thread alahay01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68413

alahay01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from alahay01 at gcc dot gnu.org ---
Instead of only checking on the analysis stage, we now cache the base
value (like we already did with the step).
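
A minimal sketch of why caching helps, under simplifying assumptions: the
struct, field and function names below are hypothetical, and the
wrap-around test is a plain unsigned range check rather than what
tree-vect-loop.c actually computes. The point is only that the analysis
and transform stages consult the same recorded base and step, so they
cannot disagree even after the loop has been rewritten.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-statement record filled in during analysis.  */
struct sketch_stmt_info
{
  uint64_t base_unchanged;   /* initial value seen before any rewriting */
  uint64_t step;             /* evolution step */
};

/* Return true if base + step * niters fits in an unsigned type with
   PRECISION bits.  Only the cached values are consulted.  */
static bool
sketch_is_nonwrapping (const struct sketch_stmt_info *info,
                       uint64_t niters, unsigned precision)
{
  uint64_t max = precision >= 64
                 ? UINT64_MAX
                 : (UINT64_C (1) << precision) - 1;

  if (info->base_unchanged > max)
    return false;
  if (info->step == 0)
    return true;
  /* Division avoids overflowing the intermediate product.  */
  return niters <= (max - info->base_unchanged) / info->step;
}
```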

[Bug fortran/68458] New: internal compiler error: Segmentation fault

2015-11-20 Thread rgaveiga at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68458

Bug ID: 68458
   Summary: internal compiler error: Segmentation fault
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rgaveiga at gmail dot com
  Target Milestone: ---

If I try to compile even a simple code like the following

program test
print*,'Buh'
end program test

with gfortran, I get

: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

The version is 5.2 and the platform is Cygwin-64.

[Bug target/68459] ICE when compiling for alpha with -O3

2015-11-20 Thread dhowells at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68459

--- Comment #1 from dhowells at redhat dot com  ---
The backtrace was obtained from a compiler built from unpatched gcc sources
produced from a gcc SVN branch with the following parameters:

SVNREV 225895
DATE 20150716
gcc_version 5.2.1

The compiler was configured as follows:

CC=gcc \
CXX=g++ \
CFLAGS='-O2 -g -Wall -Wformat-security -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -mtune=generic' \
CXXFLAGS=' -O2 -g -Wformat-security -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -mtune=generic ' \
CFLAGS_FOR_TARGET='-g -O2 -Wall -fexceptions' \
AR_FOR_TARGET=/usr/bin/alpha-linux-gnu-ar \
AS_FOR_TARGET=/usr/bin/alpha-linux-gnu-as \
DLLTOOL_FOR_TARGET=/usr/bin/alpha-linux-gnu-dlltool \
LD_FOR_TARGET=/usr/bin/alpha-linux-gnu-ld \
NM_FOR_TARGET=/usr/bin/alpha-linux-gnu-nm \
OBJDUMP_FOR_TARGET=/usr/bin/alpha-linux-gnu-objdump \
RANLIB_FOR_TARGET=/usr/bin/alpha-linux-gnu-ranlib \
READELF_FOR_TARGET=/usr/bin/alpha-linux-gnu-readelf \
STRIP_FOR_TARGET=/usr/bin/alpha-linux-gnu-strip \
WINDRES_FOR_TARGET=/usr/bin/alpha-linux-gnu-windres \
WINDMC_FOR_TARGET=/usr/bin/alpha-linux-gnu-windmc \
LDFLAGS='-Wl,-z,relro ' \
../gcc-5.2.1-20150716/configure --bindir=/usr/bin
--build=x86_64-redhat-linux-gnu --datadir=/usr/share --disable-decimal-float
--disable-dependency-tracking --disable-gold --disable-libgcj --disable-libgomp
--disable-libmudflap --disable-libquadmath --disable-libssp
--disable-libunwind-exceptions --disable-nls --disable-plugin --disable-shared
--disable-silent-rules --disable-sjlj-exceptions --disable-threads
--with-ld=/usr/bin/alpha-linux-gnu-ld --enable-__cxa_atexit
--enable-checking=release --enable-gnu-indirect-function
--enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++
--enable-linker-build-id --enable-nls --enable-obsolete --enable-plugin
--enable-targets=all --exec-prefix=/usr --host=x86_64-redhat-linux-gnu
--includedir=/usr/include --infodir=/usr/share/info --libexecdir=/usr/libexec
--localstatedir=/var --mandir=/usr/share/man --prefix=/usr
--program-prefix=alpha-linux-gnu- --sbindir=/usr/sbin --sharedstatedir=/var/lib
--sysconfdir=/etc --target=alpha-linux-gnu
--with-bugurl=http://bugzilla.redhat.com/bugzilla/ --with-isl
--with-linker-hash-style=gnu --with-newlib
--with-sysroot=/usr/alpha-linux-gnu/sys-root --with-system-libunwind
--with-system-zlib --without-headers

The binutils was a 2.25.1 cross hosted on x86_64 and targeted at
alpha-linux-gnu.

Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument

2015-11-20 Thread Richard Biener
On Fri, Nov 20, 2015 at 2:08 PM, Ilya Enkovich  wrote:
> On 19 Nov 18:19, Richard Biener wrote:
>> On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt 
>>  wrote:
>> >On 11/19/2015 05:31 PM, Ilya Enkovich wrote:
>> >> Currently we fold all memcpy/memmove calls with a known data size.
>> >> It causes two problems when used with Pointer Bounds Checker.
>> >> The first problem is that we may copy pointers as integer data
>> >> and thus lose bounds.  The second problem is that if we inline
>> >> memcpy, we also have to inline bounds copy and this may result
>> >> in a huge amount of code and significant compilation time growth.
>> >> This patch disables folding for functions we want to instrument.
>> >>
>> >> Does it look reasonable for trunk and GCC5 branch?  Bootstrapped
>> >> and regtested on x86_64-unknown-linux-gnu.
>> >
>> >Can't see anything wrong with it. Ok.
>>
>> But for small sizes this can have a huge impact on optimization.  Which is 
>> why we have the code in the first place.  I'd make the check less broad, for 
>> example inlining copies of size less than a pointer shouldn't be affected.
>
> Right.  We also may inline in case we know no pointers are copied.  Below is 
> a version with extended condition and a couple more tests.  Bootstrapped and 
> regtested on x86_64-unknown-linux-gnu.  Is it OK for trunk and gcc-5-branch?
>
>>
>> Richard.
>>
>> >
>> >Bernd
>>
>>
>
> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-20  Ilya Enkovich  
>
> * gimple-fold.c (gimple_fold_builtin_memory_op): Don't
> fold call if we are going to instrument it and it may
> copy pointers.
>
> gcc/testsuite/
>
> 2015-11-20  Ilya Enkovich  
>
> * gcc.target/i386/mpx/pr68337-1.c: New test.
> * gcc.target/i386/mpx/pr68337-2.c: New test.
> * gcc.target/i386/mpx/pr68337-3.c: New test.
>
>
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 1ab20d1..dd9f80b 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -53,6 +53,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gomp-constants.h"
>  #include "optabs-query.h"
>  #include "omp-low.h"
> +#include "tree-chkp.h"
> +#include "ipa-chkp.h"
>
>
>  /* Return true when DECL can be referenced from current unit.
> @@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
>unsigned int src_align, dest_align;
>tree off0;
>
> +  /* Inlining of memcpy/memmove may cause bounds lost (if we copy
> +pointers as wide integer) and also may result in huge function
> +size because of inlined bounds copy.  Thus don't inline for
> +functions we want to instrument in case pointers are copied.  */
> +  if (flag_check_pointer_bounds
> + && chkp_instrumentable_p (cfun->decl)
> + /* Even if data may contain pointers we can inline if copy
> +less than a pointer size.  */
> + && (!tree_fits_uhwi_p (len)
> + || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0)

|| tree_to_uhwi (len) >= POINTER_SIZE_UNITS

> + /* Check data type for pointers.  */
> + && (!TREE_TYPE (src)
> + || !TREE_TYPE (TREE_TYPE (src))
> + || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src)))
> + || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src)

I don't think you can in any way rely on the pointer type of the src argument
as all pointer conversions are useless and memcpy and friends take void *
anyway.

Note that you also disable memmove to memcpy simplification with this
early check.

Where is pointer transfer handled for MPX?  I suppose it's not done
transparently for all memory move instructions but explicitly by
instrumented block copy routines in libmpx?  In which case how does that
identify pointers vs. non-pointers?

Richard.

> +   return false;
> +
>/* Build accesses at offset zero with a ref-all character type.  */
>off0 = build_int_cst (build_pointer_type_for_mode (char_type_node,
>  ptr_mode, true), 0);
> diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c 
> b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c
> new file mode 100644
> index 000..3f8d79d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c
> @@ -0,0 +1,32 @@
> +/* { dg-do run } */
> +/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
> +
> +#include "mpx-check.h"
> +
> +#define N 2
> +
> +extern void abort ();
> +
> +static int
> +mpx_test (int argc, const char **argv)
> +{
> +  char ** src = (char **)malloc (sizeof (char *) * N);
> +  char ** dst = (char **)malloc (sizeof (char *) * N);
> +  int i;
> +
> +  for (i = 0; i < N; i++)
> +src[i] = __bnd_set_ptr_bounds (argv[0] + i, i + 1);
> +
> +  __builtin_memcpy(dst, src, sizeof (char *) * N);
> +
> +  for (i = 0; i < N; i++)
> +{
> +  char *p = dst[i];
> 

Re: basic asm and memory clobbers

2015-11-20 Thread Segher Boessenkool
On Fri, Nov 20, 2015 at 02:45:05AM -0800, David Wohlferd wrote:
> On 11/19/2015 7:14 PM, Segher Boessenkool wrote:
> >On Thu, Nov 19, 2015 at 05:23:55PM -0800, David Wohlferd wrote:
> >>For that reason, I'd like to propose adding 2 new clobbers to extended
> >>asm as part of this work:
> >>
> >>"clobberall" - This gives extended the same semantics as whatever the
> >>new basic asm will be using.
> >>"clobbernone" - This gives the same semantics as the current basic asm.
> >I don't think this is necessary or useful.  They are also awful names:
> >"clobberall" cannot clobber everything (think of the stack pointer),
> 
> I'm not emotionally attached to the names.

Names should be succinct, clear, and give a good indication of what the
thing named does.  If it is hard to make a good name it is likely that
the interface isn't so well designed.

> But providing the same 
> capability to extended that we are proposing for basic doesn't seem so 
> odd.  Shouldn't extended be able to do (at least) everything basic does?

But that would be logical!  Can't have that.  Heh.

> As you say, clobbering the stack pointer presents special challenges 
> (although gcc has a specific way of dealing with stack register 
> clobbers, see 52813).

Yeah.  Actually, basic asm is handled specially in many places, too.

> >and "clobbernone" does clobber some (those clobbered by any asm),
> 
> Seems like a quibble.  Those other things (I assume you mean things like 
> pipelining?) most users aren't even aware of (or they wouldn't be so 
> eager to use inline asm in the first place).  Would it be more palatable 
> if we called it "v5BasicAsmMode"?  "ClobberMin"?

I meant things like x86 "cc".

> >>Clobbernone may seem redundant, since not specifying any clobbers should
> >>do the same thing.  But actually it doesn't, at least on i386.  At
> >>present, there is no way for extended asm to not clobber "cc".  I don't
> >>know if other platforms have similar issues.
> >Some do.  The purpose is to stay compatible with asm written for older
> >versions of the compiler.
> 
> Backward compatibility is important.  I understand that due to the cc0 
> change in x86, existing code may have broken without always clobbering 
> cc.  This was seen as the safest way to ensure that didn't happen.  
> However no solution was/is available for people who correctly knew 
> whether their asm clobbers the flags.
> 
> Mostly I'm ok with that.  All the ways that I can think of to try to 
> re-allow people to start using the cc clobber are just not worth it.  I 
> simply can't believe there are many cases where there's going to be a 
> benefit.

Exactly.  The asm still can be moved "over" other uses of CC, it does
not limit transformations much at all.

> But as I said: backward compatibility is important.  Providing a way for 
> people who need/want the old basic asm semantics seems useful. And I 
> don't believe we can (quite) do that without clobbernone.
> 
> >>When basic asm changes, I expect that having a way to "just do what it
> >>used to do" is going to be useful for some people.
> >24414 says the documented behaviour hasn't been true for at least
> >fourteen years.  It isn't likely anyone is relying on that behaviour.
> 
> ?

24414 says these things haven't worked since at least 2.95.3, which is
fourteen years old now.

> >It isn't necessary for users to know what registers the compiler
> >considers to be clobbered by an asm, unless they actually clobber
> >something in the assembler code themselves.
> 
> I'm not sure I follow.

If the assembler code does not clobber some register, but GCC treats it
as if it does, things will work correctly.


Segher
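
For readers following the clobber discussion, a small sketch of the two
forms being compared, assuming GCC or a compatible compiler. The function
names are made up. The first function uses basic asm (no operands, no
clobber list). The second uses extended asm with explicit "memory" and
"cc" clobbers, which tells the compiler that memory and the condition
codes may have changed across the statement. The asm bodies are empty so
the example does not depend on any particular instruction set.

```c
static int sketch_shared_value;

/* Basic asm inside a loop: no operands and no clobber list.  */
static int
sketch_loop_basic (int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    {
      __asm__ ("");
      count++;
    }
  return count;
}

/* Extended asm with explicit clobbers.  The "memory" clobber keeps the
   compiler from carrying sketch_shared_value in a register across the
   statement; "cc" names the condition codes where the target has
   them.  */
static int
sketch_reload_after_barrier (void)
{
  sketch_shared_value = 1;
  __asm__ volatile ("" : : : "memory", "cc");
  return sketch_shared_value;
}
```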


[Bug fortran/68237] ICE on invalid with submodules

2015-11-20 Thread pault at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68237

Paul Thomas  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from Paul Thomas  ---
Fixed on trunk. Please note that I got Steve's and my names the wrong way round.
This has been corrected in the ChangeLog.

Thanks for the report

Paul

Re: [PATCH][ARM] PR 68149 Fix ICE in unaligned_loaddi split

2015-11-20 Thread Ramana Radhakrishnan
On 10/11/15 17:32, Kyrill Tkachov wrote:
> Hi all,
> 
> This ICE in this PR occurs when we're trying to split unaligned_loaddi into 
> two SImode unaligned loads.
> The problem is in the addressing mode.  When reload was picking the 
> addressing mode we accepted an offset of
> -256 because the mode in the pattern is advertised as DImode and that was 
> accepted by the legitimate address
> hooks because they thought it was a NEON load (DImode is in 
> VALID_NEON_DREG_MODE). However, the splitter wants
> to generate two normal SImode unaligned loads using that address, for which 
> -256 is not valid, so we ICE
> in gen_lowpart.
> 
> The only way unaligned_loaddi could be generated was through the 
> gen_movmem_ldrd_strd expansion that implements
> a memmove using LDRD and STRD sequences. If the memmove source is not aligned 
> we can't use LDRDs so the code
> generates unaligned_loaddi patterns and expects them to be split into two 
> normal loads after reload. Similarly
> for unaligned store destinations.
> 
> This patch just explicitly generates the two unaligned SImode loads or stores 
> when appropriate inside
> gen_movmem_ldrd_strd.  This makes the unaligned_loaddi and unaligned_storedi 
> patterns unused, so we can remove them.
> 
> This patch fixes the ICE in gcc.target/aarch64/advsimd-intrinsics/vldX.c seen 
> with
> -mthumb -mcpu=cortex-a15 -mfpu=neon-vfpv4 -mfloat-abi=hard -mfp16-format=ieee
> so no new testcase is added.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2015-11-10  Kyrylo Tkachov  
> 
> PR target/68149
> * config/arm/arm.md (unaligned_loaddi): Delete.
> (unaligned_storedi): Likewise.
> * config/arm/arm.c (gen_movmem_ldrd_strd): Don't generate
> unaligned DImode memory ops.  Instead perform two back-to-back
> unalgined SImode ops.


s/unalgined/unaligned.


Ok.


Ramana
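
As a source-level analogy for what the patch does in gen_movmem_ldrd_strd
(two back-to-back unaligned SImode loads instead of one unaligned DImode
load), here is a minimal sketch. The function names are made up, and this
is ordinary C rather than anything from the ARM backend.

```c
#include <stdint.h>
#include <string.h>

/* Load 32 bits from a possibly unaligned address.  memcpy lets the
   compiler pick a safe instruction sequence for the target.  */
static uint32_t
sketch_load_u32_unaligned (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Load 64 bits as two back-to-back 32-bit unaligned loads, then
   reassemble the halves in memory order so the result does not depend
   on the target's endianness.  */
static uint64_t
sketch_load_u64_as_two_u32 (const unsigned char *p)
{
  uint32_t halves[2];
  uint64_t v;

  halves[0] = sketch_load_u32_unaligned (p);
  halves[1] = sketch_load_u32_unaligned (p + 4);
  memcpy (&v, halves, sizeof v);
  return v;
}
```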


[Bug target/67822] OpenMP offloading to nvptx fails

2015-11-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67822

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
So fixed?

[Bug rtl-optimization/68128] A huge regression in Parboil v2.5 OpenMP CUTCP test (2.5 times lower performance)

2015-11-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68128

--- Comment #3 from Jakub Jelinek  ---
Can't reproduce, at least not on i7-5960X (thus OMP_NUM_THREADS=16).
gcc -Ofast -fopenmp built cutcp is roughly the same performance in all of 4.6,
4.8, 5.1 and 6, the only thing that reliably helps (but only something like
3-4%) is defining __INTEL_COMPILER, as the benchmark uses different code for
ICC and for other compilers, where other compilers use atomics that aren't used
for ICC.

Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument

2015-11-20 Thread Ilya Enkovich
On 20 Nov 14:54, Richard Biener wrote:
> On Fri, Nov 20, 2015 at 2:08 PM, Ilya Enkovich  wrote:
> > On 19 Nov 18:19, Richard Biener wrote:
> >> On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt 
> >>  wrote:
> >> >On 11/19/2015 05:31 PM, Ilya Enkovich wrote:
> >> >> Currently we fold all memcpy/memmove calls with a known data size.
> >> >> It causes two problems when used with Pointer Bounds Checker.
> >> >> The first problem is that we may copy pointers as integer data
> >> >> and thus lose bounds.  The second problem is that if we inline
> >> >> memcpy, we also have to inline bounds copy and this may result
> >> >> in a huge amount of code and significant compilation time growth.
> >> >> This patch disables folding for functions we want to instrument.
> >> >>
> >> >> Does it look reasonable for trunk and GCC5 branch?  Bootstrapped
> >> >> and regtested on x86_64-unknown-linux-gnu.
> >> >
> >> >Can't see anything wrong with it. Ok.
> >>
> >> But for small sizes this can have a huge impact on optimization.  Which is 
> >> why we have the code in the first place.  I'd make the check less broad, 
> >> for example inlining copies of size less than a pointer shouldn't be 
> >> affected.
> >
> > Right.  We also may inline in case we know no pointers are copied.  Below 
> > is a version with extended condition and a couple more tests.  Bootstrapped 
> > and regtested on x86_64-unknown-linux-gnu.  Is it OK for trunk and 
> > gcc-5-branch?
> >
> >>
> >> Richard.
> >>
> >> >
> >> >Bernd
> >>
> >>
> >
> > Thanks,
> > Ilya
> > --
> > gcc/
> >
> > 2015-11-20  Ilya Enkovich  
> >
> > * gimple-fold.c (gimple_fold_builtin_memory_op): Don't
> > fold call if we are going to instrument it and it may
> > copy pointers.
> >
> > gcc/testsuite/
> >
> > 2015-11-20  Ilya Enkovich  
> >
> > * gcc.target/i386/mpx/pr68337-1.c: New test.
> > * gcc.target/i386/mpx/pr68337-2.c: New test.
> > * gcc.target/i386/mpx/pr68337-3.c: New test.
> >
> >
> > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> > index 1ab20d1..dd9f80b 100644
> > --- a/gcc/gimple-fold.c
> > +++ b/gcc/gimple-fold.c
> > @@ -53,6 +53,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "gomp-constants.h"
> >  #include "optabs-query.h"
> >  #include "omp-low.h"
> > +#include "tree-chkp.h"
> > +#include "ipa-chkp.h"
> >
> >
> >  /* Return true when DECL can be referenced from current unit.
> > @@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator 
> > *gsi,
> >unsigned int src_align, dest_align;
> >tree off0;
> >
> > +  /* Inlining of memcpy/memmove may cause bounds lost (if we copy
> > +pointers as wide integer) and also may result in huge function
> > +size because of inlined bounds copy.  Thus don't inline for
> > +functions we want to instrument in case pointers are copied.  */
> > +  if (flag_check_pointer_bounds
> > + && chkp_instrumentable_p (cfun->decl)
> > + /* Even if data may contain pointers we can inline if copy
> > +less than a pointer size.  */
> > + && (!tree_fits_uhwi_p (len)
> > + || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0)
> 
> || tree_to_uhwi (len) >= POINTER_SIZE_UNITS
> 
> > + /* Check data type for pointers.  */
> > + && (!TREE_TYPE (src)
> > + || !TREE_TYPE (TREE_TYPE (src))
> > + || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src)))
> > + || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src)
> 
> I don't think you can in any way rely on the pointer type of the src argument
> as all pointer conversions are useless and memcpy and friends take void *
> anyway.

This check is looking for cases when we have type information indicating
no pointers are copied.  In case of 'void *' we have to assume pointers
are copied and inlining is undesired.  Test pr68337-2.c checks that a
pointer-free pointee type allows inlining.  It looks like this check is
missing || !COMPLETE_TYPE_P(TREE_TYPE (TREE_TYPE (src)))?

> 
> Note that you also disable memmove to memcpy simplification with this
> early check.

Doesn't matter for MPX which uses the same implementation for both cases.

> 
> Where is pointer transfer handled for MPX?  I suppose it's not done
> transparently for all memory move instructions but explicitly by
> instrumented block copy routines in libmpx?  In which case how does that
> identify pointers vs. non-pointers?

It is handled by the instrumentation pass.  The compiler checks the type of
stored data to find pointer stores.  Each pointer store is instrumented with
a bndstx call.

MPX versions of memcpy, memmove etc. don't make any assumptions about the
type of copied data and just copy the whole chunk of bounds metadata
corresponding to the copied block.

Thanks,
Ilya

> 
> Richard.
> 
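
The condition in the posted patch can be restated as a small predicate.
This is a sketch only: the function and parameter names are made up, and
it mirrors the patch as posted, including the type-based test whose
reliability Richard questions above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: may a memcpy/memmove call be folded inline?
   INSTRUMENTED            the enclosing function gets bounds checks
   LEN_KNOWN, LEN          the copy length, if it is a known constant
   TYPE_KNOWN_POINTER_FREE type information shows no pointers copied  */
static bool
sketch_may_fold_copy (bool instrumented, bool len_known, size_t len,
                      bool type_known_pointer_free)
{
  /* Without instrumentation there are no bounds to lose.  */
  if (!instrumented)
    return true;

  /* A copy shorter than one pointer cannot carry a whole pointer.  */
  if (len_known && len < sizeof (void *))
    return true;

  /* Type information indicates that no pointers are copied.  */
  if (type_known_pointer_free)
    return true;

  /* Otherwise keep the call so the instrumented copy routine can
     transfer the bounds metadata along with the data.  */
  return false;
}
```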


[ptx] override anchor hook

2015-11-20 Thread Nathan Sidwell
Jim discovered that he needed to override the anchoring hook when using a PPC 
host-side compiler, but didn't figure out why this was needed.  Digging into it, 
I discovered that flag_section_anchors is cleared in toplev.c by the command 
line option machinery, if there are no anchor target hooks.  However, that's 
done in the host compiler and the LTO machinery simply copies the value over 
into the LTO compiler.   Usually that's fine, as the LTO compiler's for the same 
target architecture.  Except when it's an offload compiler.  The flags are not 
resanitized (perhaps that should be done?)


Anyway, that led to the PTX accelerator compiler trying to do section anchory 
things.  Fixed by overriding TARGET_USE_ANCHORS_FOR_SYMBOL_P to say 'no'.


I guess at the next instance of an offload compiler seeing a 'surprising' 
combination of optimization flags, we'll need a resanitize hook?  Jakub?


nathan
2015-11-20  Nathan Sidwell  
	James Norris  

	* config/nvptx/nvptx.c (nvptx_use_anchors_for_symbol_p): New.
	(TARGET_USE_ANCHORS_FOR_SYMBOL_P): Override.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 230657)
+++ config/nvptx/nvptx.c	(working copy)
@@ -3895,6 +3895,19 @@ nvptx_cannot_copy_insn_p (rtx_insn *insn
   return false;
 }
 }
+
+/* Section anchors do not work.  Initialization for flag_section_anchor
+   probes the existence of the anchoring target hooks and prevents
+   anchoring if they don't exist.  However, we may be being used with
+   a host-side compiler that does support anchoring, and hence see
+   the anchor flag set (as it's not recalculated).  So provide an
+   implementation denying anchoring.  */
+
+static bool
+nvptx_use_anchors_for_symbol_p (const_rtx ARG_UNUSED (a))
+{
+  return false;
+}
 
 /* Record a symbol for mkoffload to enter into the mapping table.  */
 
@@ -4914,6 +4927,9 @@ nvptx_goacc_reduction (gcall *call)
 #undef TARGET_CANNOT_COPY_INSN_P
 #define TARGET_CANNOT_COPY_INSN_P nvptx_cannot_copy_insn_p
 
+#undef TARGET_USE_ANCHORS_FOR_SYMBOL_P
+#define TARGET_USE_ANCHORS_FOR_SYMBOL_P nvptx_use_anchors_for_symbol_p
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS nvptx_init_builtins
 #undef TARGET_EXPAND_BUILTIN
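
A simplified model of the situation described above, with made-up names
throughout: a flag computed by the host compiler arrives already set in
the offload compiler, so the only place left to refuse anchoring is the
per-symbol target hook. The default hook here simply returns true, which
is a simplification of what GCC's default does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook table with a single entry.  */
struct sketch_target_hooks
{
  bool (*use_anchors_for_symbol_p) (const void *symbol);
};

/* Simplified default: allow anchoring for every symbol.  */
static bool
sketch_default_use_anchors (const void *symbol)
{
  (void) symbol;
  return true;
}

/* Offload-target override: never anchor.  */
static bool
sketch_deny_anchors (const void *symbol)
{
  (void) symbol;
  return false;
}

/* FLAG_FROM_HOST models flag_section_anchors as streamed over from the
   host compiler without being recomputed for the offload target.  */
static bool
sketch_should_anchor (const struct sketch_target_hooks *hooks,
                      bool flag_from_host, const void *symbol)
{
  return flag_from_host && hooks->use_anchors_for_symbol_p (symbol);
}
```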


Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

2015-11-20 Thread Kyrill Tkachov


On 20/11/15 12:27, James Greenhalgh wrote:

On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote:

On 11/12/2015 09:39 AM, Evandro Menezes wrote:
2015-11-12  Evandro Menezes 

[AArch64] Add attribute for compatibility with ARM pipeline models

gcc/

* config/aarch64/aarch64.md (predicated): Copy attribute from
"arm.md".
* config/arm/arm.md (predicated): Added description.


The arm part is ok too. It's just a comment.
In the ChangeLog entry for arm.md I'd say "Add description."

Thanks,
Kyrill


Please, commit if it's alright.

The AArch64 part of this is OK.


 From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:11:16 -0600
Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
  pipeline models

gcc/
* config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md".
* config/arm/arm.md (predicated): Added description.
---
  gcc/config/aarch64/aarch64.md | 4 
  gcc/config/arm/arm.md | 3 +++
  2 files changed, 7 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1586256..d46f837 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -195,6 +195,10 @@
  ;; 1 :=: yes
  (define_attr "far_branch" "" (const_int 0))
  
+;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has

+;; no predicated insns.
+(define_attr "predicated" "yes,no" (const_string "no"))
+
  ;; ---
  ;; Pipeline descriptions and scheduling
  ;; ---
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 73c3088..6bda491 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -105,6 +105,9 @@
  (define_attr "fpu" "none,vfp"
(const (symbol_ref "arm_fpu_attr")))
  
+; Predicated means that the insn form is conditionally executed based on a

+; predicate.  We default to 'no' because no Thumb patterns match this rule
+; and not all ARM insns do.

s/is conditionally executed/can be conditionally executed/ in the first
sentence. Otherwise, this looks OK to me but I can't approve the ARM part,
so you'll need to wait for a review from someone who can.

Thanks,
James


  (define_attr "predicated" "yes,no" (const_string "no"))
  
  ; LENGTH of an instruction (in bytes)

--
2.1.0.243.g30d45f7





Re: [Committed] S/390: Add bswaphi2 pattern

2015-11-20 Thread Andreas Krebbel
On 11/20/2015 01:23 PM, Richard Henderson wrote:
> On 11/20/2015 12:52 PM, Andreas Krebbel wrote:
>> +(define_insn "bswaphi2"
>> +  [(set (match_operand:HI 0   "register_operand" "=d")
>> +(bswap:HI (match_operand:HI 1 "memory_operand"   "RT")))]
>> +  "TARGET_CPU_ZARCH"
>> +  "lrvh\t%0,%1"
>> +  [(set_attr "type" "load")
>> +   (set_attr "op_type" "RXY")
>> +   (set_attr "z10prop" "z10_super")])
> 
> Surely it's better to arrange so that you can use STRVH as well.
> And providing a fallback for the reg-reg case (e.g. LRVR+SRL).
> 
> Although I suppose I don't see support for STRV in bswap32/64 either...

Right, I totally forgot about the stores. I'll have a look.

We even have a mem-mem variant (mvcin). But I found it rather difficult to use,
since the pattern would have to make sure that source and destination do not
overlap.

-Andreas-
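
As a point of reference for the load and store forms discussed above, here is a
small C sketch of what a byte-reversed halfword load and the matching store
compute.  The helper names are made up for illustration and are not part of the
patch; the comments about LRVH/STRVH describe a model of the semantics, not the
s390 backend.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value by hand.  */
static uint16_t
bswap16_manual (uint16_t x)
{
  return (uint16_t) ((x << 8) | (x >> 8));
}

/* Model of a byte-reversed halfword load on a big-endian machine:
   the byte at the lower address ends up in the low half of the result.
   Hypothetical helper, written so that it is host-endian independent.  */
static uint16_t
load_reversed16 (const unsigned char *p)
{
  return (uint16_t) (p[0] | (p[1] << 8));
}

/* Model of the matching byte-reversed store: the low half of X goes to
   the lower address.  Also a hypothetical helper.  */
static void
store_reversed16 (unsigned char *p, uint16_t x)
{
  p[0] = (unsigned char) (x & 0xff);
  p[1] = (unsigned char) (x >> 8);
}
```

With GCC, __builtin_bswap16 should give the same result as bswap16_manual; a
reg-reg fallback of the kind Richard mentions would compute that value without
touching memory.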



Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when copmputing VF

2015-11-20 Thread Ilya Enkovich
On 20 Nov 14:31, Ilya Enkovich wrote:
> 2015-11-20 14:28 GMT+03:00 Richard Biener :
> > On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovich  
> > wrote:
> >> 2015-11-18 16:44 GMT+03:00 Richard Biener :
> >>> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich  
> >>> wrote:
> >>>> Hi,
> >>>>
> >>>> When we compute vectypes we skip non-relevant phi nodes.  But we process
> >>>> non-relevant live statements and thus may need vectype of non-relevant
> >>>> live phi node to compute mask vectype.  This patch enables vectype
> >>>> computation for live phi nodes.  Bootstrapped and regtested on
> >>>> x86_64-unknown-linux-gnu.  OK for trunk?
> >>>
> >>> Hmm.  What breaks if you instead skip all !relevant stmts and not
> >>> compute vectype for live but not relevant ones?  We won't ever
> >>> "vectorize" !relevant ones, that is, we don't need their vector type.
> >>
> >> I tried it and got regression in SLP.  It expected non-null vectype
> >> for non-relevant but live statement. Regression was in
> >> gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90
> >
> > Because somebody put a vector type check before
> >
> >   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> > return false;
> >
> > @@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g
> >tree mask_type;
> >tree mask;
> >
> > +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> > +return false;
> > +
> >if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> >  return false;
> >
> > @@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g
> >  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
> >
> >gcc_assert (ncopies >= 1);
> > -  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> > -return false;
> >
> >if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
> >&& !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
> >
> > fixes this particular fallout for me.
> 
> I'll try it.

With this fix it works fine, thanks!  Bootstrapped and regtested on 
x86_64-unknown-linux-gnu.  OK for trunk?

Ilya
--
gcc/

2015-11-20  Ilya Enkovich  
Richard Biener  

* tree-vect-loop.c (vect_determine_vectorization_factor): Don't
compute vectype for non-relevant mask producers.
* tree-vect-stmts.c (vectorizable_comparison): Check stmt
relevance earlier.

gcc/testsuite/

2015-11-20  Ilya Enkovich  

* gcc.dg/pr68327.c: New test.


diff --git a/gcc/testsuite/gcc.dg/pr68327.c b/gcc/testsuite/gcc.dg/pr68327.c
new file mode 100644
index 000..c3e6a94
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68327.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int a, d;
+char b, c;
+
+void
+fn1 ()
+{
+  int i = 0;
+  for (; i < 1; i++)
+d = 1;
+  for (; b; b++)
+a = 1 && (d & b);
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 80937ec..592372d 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -439,7 +439,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 compute a factor.  */
  if (TREE_CODE (scalar_type) == BOOLEAN_TYPE)
{
- mask_producers.safe_push (stmt_info);
+ if (STMT_VINFO_RELEVANT_P (stmt_info))
+   mask_producers.safe_push (stmt_info);
  bool_result = true;
 
  if (gimple_code (stmt) == GIMPLE_ASSIGN
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 0f64aaf..3723b26 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7546,6 +7546,9 @@ vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi,
   tree mask_type;
   tree mask;
 
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+return false;
+
   if (!VECTOR_BOOLEAN_TYPE_P (vectype))
 return false;
 
@@ -7558,9 +7561,6 @@ vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi,
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
 
   gcc_assert (ncopies >= 1);
-  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
-return false;
-
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
   && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
   && reduc_def))


[Bug tree-optimization/68455] [6 Regression] ICE: tree check: expected integer_cst, have plus_expr in decompose, at tree.h:5123

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68455

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-11-20
                 CC|                            |mpolacek at gcc dot gnu.org
   Target Milestone|---                         |6.0
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
2092  val[0] = vrp_int_const_binop (code, vr0->min, vr1->min);
2093  if (val[0] == NULL_TREE)
2094sop = true;
2095
2096  if (vr1->max == vr1->min)
(gdb) p debug_generic_expr (vr0->min)
1
$1 = void
(gdb) p debug_generic_expr (vr1->min)
_13 + 1
$2 = void

Confirmed.  Probably caused by the recent division patch.

Re: [PATCH] fix vectorizer performance problem on cygwin hosted cross compiler

2015-11-20 Thread Richard Biener
On Fri, Nov 20, 2015 at 8:21 AM, Jim Wilson  wrote:
> A cygwin hosted cross compiler to aarch64-linux, compiling a C version
> of linpack with -Ofast, produces code that runs 17% slower than a
> linux hosted compiler.  The problem shows up in the vect dump, where
> some different vectorization optimization decisions were made by the
> cygwin compiler than the linux compiler.  That happened because
> tree-vect-data-refs.c calls qsort in vect_analyze_data_ref_accesses,
> and the newlib and glibc qsort routines sort the list differently.  I
> can reproduce the same problem on linux by adding the newlib qsort
> sources to a gcc build.  For an x86_64 target, I see about a 30%
> performance loss using the newlib qsort.
>
> The qsort trouble turns out to be a problem in the qsort comparison
> function, dr_group_sort_cmp.  It does this
>   if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0))
> {
>   cmp = compare_tree (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb));
>   if (cmp != 0)
> return cmp;
> }
> operand_equal_p calls STRIP_NOPS, so it will consider two trees to be
> the same even if they have NOP_EXPR.  However, compare_tree is not
> calling STRIP_NOPS, so it handles trees with NOP_EXPRs differently
> than trees without.  The result is that depending on which array entry
> gets used as the qsort pivot point, you can get very different sorts.
> The newlib qsort happens to be accidentally choosing a bad pivot for
> this testcase.  The glibc qsort happens to be accidentally choosing a
> good pivot for this testcase.  This then triggers a cascading problem
> in vect_analyze_data_ref_accesses which assumes that array entries
> that pass the operand_equal_p test for the base address will end up
> adjacent, and will only vectorize in that case.
>
> For a contrived example, suppose we have four entries to sort: (plus Y
> 8), (mult A 4), (pointer_plus Z 16), and (nop (mult A 4)).  Suppose we
> choose the mult as the pivot point. The plus sorts before because
> tree_code plus is less than mult. The pointer_plus sorts after for the
> same reason. The nop sorts equal. So we end up with plus, mult, nop,
> pointer_plus. The mult and nop are then combined into the same
> vectorization group.  Now suppose we choose the pointer_plus as the
> pivot point. The plus and mult sort before. The nop sorts after. The
> final result is plus, mult, pointer_plus, nop. And we fail to
> vectorize as the mult and nop are not adjacent as they should be.
>
> When I modify compare_tree to call STRIP_NOPS, this problem goes away.
> I get the same sort from both the newlib and glibc qsort functions,
> and I get the same linpack performance from a cygwin hosted compiler
> and a linux hosted compiler.
>
> This patch was tested with an x86_64 bootstrap and make check.  There
> were no regressions.  I've also done a SPEC CPU2000 run with and
> without the patch on aarch64-linux, there is no performance change.
> And I've verified it by building linpack for aarch64-linux with cygwin
> hosted cross compiler, x86_64 hosted cross compiler, and an aarch64
> native compiler.

Ok.

Thanks,
Richard.

> Jim
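
The sorting problem described above can be reproduced outside the compiler.
The following is a minimal sketch, not GCC code: the toy_* names, the tree
representation and the code ordering are all assumptions made for the example.
It sorts the four contrived entries from the message with a comparison function
that strips the wrapper before comparing, which is the behaviour the patch
gives compare_tree.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy tree codes, ordered as in the contrived example.  */
enum toy_code { TOY_PLUS = 1, TOY_MULT, TOY_POINTER_PLUS, TOY_NOP };

struct toy_tree
{
  enum toy_code code;
  const struct toy_tree *op;	/* Operand wrapped by a TOY_NOP, else NULL.  */
  int value;			/* Stands in for the remaining operands.  */
};

/* Look through any number of TOY_NOP wrappers.  */
static const struct toy_tree *
toy_strip_nops (const struct toy_tree *t)
{
  while (t->code == TOY_NOP)
    t = t->op;
  return t;
}

/* qsort callback that strips wrappers on both sides first, so that two
   entries considered equal by an equality test that strips wrappers
   also compare equal here.  */
static int
toy_cmp_stripped (const void *pa, const void *pb)
{
  const struct toy_tree *a
    = toy_strip_nops (*(const struct toy_tree *const *) pa);
  const struct toy_tree *b
    = toy_strip_nops (*(const struct toy_tree *const *) pb);

  if (a->code != b->code)
    return a->code < b->code ? -1 : 1;
  return (a->value > b->value) - (a->value < b->value);
}

/* Sort plus, mult, pointer_plus and nop (mult); return 1 if the mult
   and the wrapped mult end up next to each other.  */
static int
toy_mult_and_nop_adjacent (void)
{
  static const struct toy_tree plus = { TOY_PLUS, NULL, 8 };
  static const struct toy_tree mult = { TOY_MULT, NULL, 4 };
  static const struct toy_tree pplus = { TOY_POINTER_PLUS, NULL, 16 };
  static const struct toy_tree nop = { TOY_NOP, &mult, 0 };
  const struct toy_tree *v[4] = { &plus, &mult, &pplus, &nop };

  qsort (v, 4, sizeof v[0], toy_cmp_stripped);
  return toy_strip_nops (v[1]) == &mult && toy_strip_nops (v[2]) == &mult;
}
```

Because the comparison is now a consistent ordering, the result no longer
depends on which element a particular qsort implementation picks as pivot.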


Re: Add uaddv4_optab, usubv4_optab

2015-11-20 Thread Eric Botcazou
> Eric has just submitted a documentation patch that documented the
> {add,sub,mul,umul}v4 and negv3 patterns, so this should be
> applied on top of that.

OK, I'm going to apply it, thanks.  Note that the comment at the beginning 
of expand_addsub_overflow describing the overall strategy ought to be adjusted 
if new patterns for the jump on carry are added.

-- 
Eric Botcazou


[Bug ipa/65701] [5 Regression] r221530 makes 187.facerec drop with -Ofast -flto on bdver2

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65701

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|5.3                         |6.0

--- Comment #20 from Richard Biener  ---
Fixed on trunk, no backport planned.

Re: basic asm and memory clobbers

2015-11-20 Thread David Wohlferd

On 11/20/2015 3:14 AM, Andrew Haley wrote:
> On 20/11/15 10:37, David Wohlferd wrote:
>> The intent for 24414 is to change basic asm such that it will become
>> (quoting Jeff) "an opaque blob that read/write/clobber any register or
>> memory location."  Such being the case, "memory" is not sufficient:
>>
>> #define CLOBBERALL "eax", "ebx", "ecx", "edx", "r8", "r9", "r10", "r11", \
>>   "r12", "r13", "r14", "r15", "edi", "esi", "ebp", "cc", "memory"
>
> Hmm.  I would not be at all surprised to see this cause reload
> failures.  You certainly shouldn't clobber the frame pointer on
> any machine which needs one.

If I don't clobber ebp, gcc just uses it:

        movl    $1000, %ebp
.L2:
        #
        subl    $1, %ebp
        jne     .L2

The original purpose of this code was to attempt to show that this kind 
of "clobbering everything" behavior (the proposed new behavior for basic 
asm) could have non-trivial impact on existing routines. While I've been 
told that changing the existing "clobber nothing" approach to this kind 
of "clobber everything" is "less intrusive than you might think," I'm 
struggling to believe it.  It seems to me that one asm("nop") thrown 
into a driver routine to fix a timing problem could end up making a real 
mess.


But actually we're kind of past that.  When Jeff, Segher, (other) Andrew 
and Richard all say "this is how it's going to work," it's time for me 
to set aside my reservations and move on.


So now I'm just trying my best to make sure that if it *is* an issue, 
people have a viable solution readily available.  And to make sure it's 
all correctly doc'ed (which is what started this whole mess).


dw
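
For readers less familiar with the distinction being argued over, here is a
minimal sketch of an extended asm with an explicit "memory" clobber, as opposed
to a basic asm with no clobber list.  The function and variable names are
invented for the example, and the asm template is left empty so that the
snippet does not depend on any particular target.

```c
/* Counter kept in memory so that the clobber has something to affect.  */
static int counter;

/* Increment COUNTER N times with a compiler-level memory barrier in
   each iteration.  Hypothetical helper for illustration only.  */
static int
count_with_memory_clobber (int n)
{
  counter = 0;
  for (int i = 0; i < n; i++)
    {
      counter += 1;
      /* Extended asm with an empty template: no instruction is emitted,
	 but the "memory" clobber makes the compiler assume that memory
	 may have been read and written here, so COUNTER cannot simply
	 be kept in a register across this statement.  */
      __asm__ __volatile__ ("" : : : "memory");
    }
  return counter;
}
```

A register clobber list such as the CLOBBERALL macro quoted above goes in the
same position as "memory" and is necessarily target-specific.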


[committed, trivial] Fix typo and trailing whitespace in dump-file strings in parloops

2015-11-20 Thread Tom de Vries

[ was: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def ]

On 18/11/15 17:22, Bernhard Reutner-Fischer wrote:
> Bonus points for fixing the dump_file to parse in:
>
>> Parloops will fail because:
>> ...
>> phi is n_2 = PHI 
>> arg of phi to exit: value n_4(D) used outside loop
>> checking if it a part of reduction pattern:
>
> s/it a/it is/



This patch fixes a typo and trailing whitespace in dump-file strings in 
parloops.


Built for C and Fortran; tested the -fdump-tree-parloops testcases.

Committed to trunk as trivial.

Thanks,
- Tom
Fix typo and trailing whitespace in dump-file strings in parloops

2015-11-19  Tom de Vries  

	* tree-parloops.c (build_new_reduction): Fix trailing whitespace in
	dump-file string.
	(try_create_reduction_list): Same.  Fix typo in dump-file string.

---
 gcc/tree-parloops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 8d7912d..aca2370 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2383,7 +2383,7 @@ build_new_reduction (reduction_info_table_type *reduction_list,
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file,
-	   "Detected reduction. reduction stmt is: \n");
+	   "Detected reduction. reduction stmt is:\n");
   print_gimple_stmt (dump_file, reduc_stmt, 0, 0);
   fprintf (dump_file, "\n");
 }
@@ -2564,7 +2564,7 @@ try_create_reduction_list (loop_p loop,
 	  print_generic_expr (dump_file, val, 0);
 	  fprintf (dump_file, " used outside loop\n");
 	  fprintf (dump_file,
-		   "  checking if it a part of reduction pattern:  \n");
+		   "  checking if it is part of reduction pattern:\n");
 	}
 	  if (reduction_list->elements () == 0)
 	{


GCC 5.3 Status Report (2015-11-20)

2015-11-20 Thread Richard Biener

Status
======

We plan to do a GCC 5.3 release candidate at the end of next week
followed by the actual release a week after that.

So now is the time to look at your regression bugs in bugzilla and
do some backporting for things already fixed on trunk.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                0
P2              121   +  30
P3               20   -   8
P4               87   +   2
P5               32   +   2
--------        ---   -----------------------
Total P1-P3     141   +  22
Total           260   +  24


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2015-07/msg00197.html



[Bug objc/68438] [6 Regression] Conditional jump or move depends on uninitialised value in location_adhoc_data_eq (line-map.c:89)

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68438

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmalcolm at gcc dot gnu.org
   Target Milestone|---                         |6.0

Add uaddv4_optab, usubv4_optab

2015-11-20 Thread Richard Henderson

Toward fixing PR68385.  I'm just starting a full round of testing, but

extern void underflow(void) __attribute__((noreturn));
unsigned sub1(unsigned a, unsigned b)
{
unsigned r = a - b;
if (r > a) underflow();
return r;
}

unsigned sub2(unsigned a, unsigned b)
{
unsigned r;
if (__builtin_sub_overflow(a, b, &r)) underflow();
return r;
}


sub1:
        movl    %edi, %eax
        subl    %esi, %eax
        cmpl    %eax, %edi
        jb      .L7
        rep ret
...
sub2:
        movl    %edi, %eax
        subl    %esi, %eax
        jb      .L16
        rep ret
...


r~
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4c5e22a..10a004e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6156,6 +6156,22 @@
  (const_string "4")]
  (const_string "")))])
 
+(define_expand "uaddv4"
+  [(parallel [(set (reg:CCC FLAGS_REG)
+  (compare:CCC
+(plus:SWI (match_dup 1) (match_dup 2))
+(match_dup 1)))
+ (set (match_dup 0)
+  (plus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+  (ne (reg:CCC FLAGS_REG) (const_int 0))
+  (label_ref (match_operand 3))
+  (pc)))]
+  ""
+{
+  ix86_fixup_binary_operands_no_copy (PLUS, mode, operands);
+})
+
 ;; The lea patterns for modes less than 32 bits need to be matched by
 ;; several insns converted to real lea by splitters.
 
@@ -6461,6 +6477,20 @@
  (const_string "4")]
  (const_string "")))])
 
+(define_expand "usubv4"
+  [(parallel [(set (reg:CC FLAGS_REG)
+  (compare:CC (match_dup 1) (match_dup 2)))
+ (set (match_dup 0)
+  (minus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+  (ltu (reg:CC FLAGS_REG) (const_int 0))
+  (label_ref (match_operand 3))
+  (pc)))]
+  ""
+{
+  ix86_fixup_binary_operands_no_copy (MINUS, mode, operands);
+})
+
 (define_insn "*sub_3"
   [(set (reg FLAGS_REG)
(compare (match_operand:SWI 1 "nonimmediate_operand" "0,0")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8b2deaa..9386e62 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4912,6 +4912,25 @@ address calculations.  @code{add@var{m}3} is used if
 @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3}
 Similar, for other arithmetic operations.
 
+@cindex @code{addv@var{m}4} instruction pattern
+@item @samp{addv@var{m}4}
+Add operand 2 and operand 1, storing the result in operand 0.  If signed
+overflow occurs during the addition, jump to the label in operand 3.
+
+@cindex @code{subv@var{m}4} instruction pattern
+@cindex @code{mulv@var{m}4} instruction pattern
+@item @samp{subv@var{m}4}, @samp{mulv@var{m}4}
+Similar, for other signed arithmetic operations.
+
+@cindex @code{uaddv@var{m}4} instruction pattern
+@item @samp{uaddv@var{m}4}
+Like @code{addv@var{m}4}, except jump on unsigned overflow.
+
+@cindex @code{usubv@var{m}4} instruction pattern
+@cindex @code{umulv@var{m}4} instruction pattern
+@item @samp{usubv@var{m}4}, @samp{umulv@var{m}4}
+Similar, for other unsigned arithmetic operations.
+
 @cindex @code{fma@var{m}4} instruction pattern
 @item @samp{fma@var{m}4}
 Multiply operand 2 and operand 1, then add operand 3, storing the
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index bc77bdc..b15657f 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -546,6 +546,33 @@ expand_addsub_overflow (location_t loc, tree_code code, tree lhs,
   /* u1 +- u2 -> ur  */
   if (uns0_p && uns1_p && unsr_p)
 {
+  insn_code icode = optab_handler (code == PLUS_EXPR ? uaddv4_optab
+   : usubv4_optab, mode);
+  if (icode != CODE_FOR_nothing)
+   {
+ struct expand_operand ops[4];
+ rtx_insn *last = get_last_insn ();
+
+ res = gen_reg_rtx (mode);
+ create_output_operand (&ops[0], res, mode);
+ create_input_operand (&ops[1], op0, mode);
+ create_input_operand (&ops[2], op1, mode);
+ create_fixed_operand (&ops[3], do_error);
+ if (maybe_expand_insn (icode, 4, ops))
+   {
+ last = get_last_insn ();
+ if (profile_status_for_fn (cfun) != PROFILE_ABSENT
+ && JUMP_P (last)
+ && any_condjump_p (last)
+ && !find_reg_note (last, REG_BR_PROB, 0))
+   add_int_reg_note (last, REG_BR_PROB, PROB_VERY_UNLIKELY);
+ emit_jump (done_label);
+ goto do_error_label;
+   }
+
+ delete_insns_since (last);
+   }
+
   /* Compute the operation.  On RTL level, the addition is always
 unsigned.  */
   res = expand_binop (mode, code == PLUS_EXPR ? add_optab : sub_optab,
@@ -737,92 +764,88 @@ expand_addsub_overflow (location_t loc, tree_code code, tree lhs,
   gcc_assert (!uns0_p && !uns1_p && !unsr_p);
 
   /* s1 +- s2 -> 

Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

2015-11-20 Thread Richard Biener
On Thu, 19 Nov 2015, Tom de Vries wrote:

> On 17/11/15 15:53, Tom de Vries wrote:
> > > And the above LIM example
> > > is none for why you need two LIM passes...
> > 
> > Indeed. I'm planning a separate reply to explain in more detail the need
> > for the two pass_lims.
> 
> I.
> 
> I managed to get rid of the two pass_lims for the motivating example that I
> used until now (goacc/kernels-double-reduction.c). I found that by adding a
> pass_dominator instance after pass_ch, I could get rid of the second pass_lim
> (and pass_copyprop as well).
> 
> But... then I wrote a counter example (goacc/kernels-double-reduction-n.c),
> and I'm back at two pass_lims (and two pass_dominators).
> Also I've split the pass group into a bit before and after pass_fre.
> 
> So, the current pass group looks like:
> ...
> NEXT_PASS (pass_build_ealias);
> 
> /* Pass group that runs when the function is an offloaded function
>containing oacc kernels loops.  Part 1.  */
> NEXT_PASS (pass_oacc_kernels);
> PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
> /* We need pass_ch here, because pass_lim has no effect on
>exit-first loops (PR65442).  Ideally we want to remove both
>this pass instantiation, and the reverse transformation
>transform_to_exit_first_loop_alt, which is done in
>pass_parallelize_loops_oacc_kernels. */
> NEXT_PASS (pass_ch);
> POP_INSERT_PASSES ()
> 
> NEXT_PASS (pass_fre);
> 
> /* Pass group that runs when the function is an offloaded function
>containing oacc kernels loops.  Part 2.  */
> NEXT_PASS (pass_oacc_kernels2);
> PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels2)
> /* We use pass_lim to rewrite in-memory iteration and reduction
>variable accesses in loops into local variables accesses.  */
> NEXT_PASS (pass_lim);
> NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> NEXT_PASS (pass_lim);
> NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> NEXT_PASS (pass_dce);
> NEXT_PASS (pass_parallelize_loops_oacc_kernels);
> NEXT_PASS (pass_expand_omp_ssa);
> POP_INSERT_PASSES ()
> NEXT_PASS (pass_merge_phi);
> ...
> 
> 
> II.
> 
> The motivating test-case kernels-double-reduction-n.c:
> ...
> #include 
> 
> #define N 500
> 
> unsigned int a[N][N];
> 
> void  __attribute__((noinline,noclone))
> foo (unsigned int n)
> {
>   int i, j;
>   unsigned int sum = 1;
> 
> #pragma acc kernels copyin (a[0:n]) copy (sum)
>   {
> for (i = 0; i < n; ++i)
>   for (j = 0; j < n; ++j)
> sum += a[i][j];
>   }
> 
>   if (sum != 5001)
> abort ();
> }
> ...
> 
> 
> III.
> 
> Before first pass_lim. Note no phis on inner or outer loop header for
> iteration variables or reduction variable:
> ...
>   :
>   _5 = *.omp_data_i_4(D).i;
>   *_5 = 0;
>   _44 = *.omp_data_i_4(D).n;
>   _45 = *_44;
>   if (_45 != 0)
> goto ;
>   else
> goto ;
> 
>   : outer loop header
>   _12 = *.omp_data_i_4(D).j;
>   *_12 = 0;
>   if (_45 != 0)
> goto ;
>   else
> goto ;
> 
>   : inner loop header, latch
>   _19 = *.omp_data_i_4(D).a;
>   _21 = *_5;
>   _23 = *_12;
>   _24 = *_19[_21][_23];
>   _25 = *.omp_data_i_4(D).sum;
>   sum.0_26 = *_25;
>   sum.1_27 = _24 + sum.0_26;
>   *_25 = sum.1_27;
>   _33 = _23 + 1;
>   *_12 = _33;
>   j.2_16 = (unsigned int) _33;
>   if (j.2_16 < _45)
> goto ;
>   else
> goto ;
> 
>   : outer loop latch
>   _36 = *_5;
>   _38 = _36 + 1;
>   *_5 = _38;
>   i.3_9 = (unsigned int) _38;
>   if (i.3_9 < _45)
> goto ;
>   else
> goto ;
> 
>   :
>   return;
> ...
> 
> 
> IV.
> 
> After first pass_lim/pass_dom pair. Note there are phis on the inner loop
> header for the reduction and the iteration variable, but not on the outer loop
> header:
> ...
>   :
>   _5 = *.omp_data_i_4(D).i;
>   *_5 = 0;
>   _44 = *.omp_data_i_4(D).n;
>   _45 = *_44;
>   if (_45 != 0)
> goto ;
>   else
> goto ;
> 
>   :
>   _12 = *.omp_data_i_4(D).j;
>   _19 = *.omp_data_i_4(D).a;
>   D__lsm.10_50 = *_12;
>   D__lsm.11_51 = 0;
>   _25 = *.omp_data_i_4(D).sum;
> 
>   : outer loop header
>   D__lsm.10_20 = 0;
>   D__lsm.11_22 = 1;
>   _21 = *_5;
>   D__lsm.12_28 = *_25;
>   D__lsm.13_30 = 0;
>   goto ;
> 
>   : inner loop header, latch
>   # D__lsm.10_47 = PHI <0(5), _33(7)>
>   # D__lsm.12_49 = PHI 
>   _23 = D__lsm.10_47;
>   _24 = *_19[_21][D__lsm.10_47];
>   sum.0_26 = D__lsm.12_49;
>   sum.1_27 = _24 + D__lsm.12_49;
>   D__lsm.12_31 = sum.1_27;
>   D__lsm.13_32 = 1;
>   _33 = D__lsm.10_47 + 1;
>   D__lsm.10_14 = _33;
>   D__lsm.11_15 = 1;
>   j.2_16 = (unsigned int) _33;
>   if (j.2_16 < _45)
> goto ;
>   else
> goto ;
> 
>   : outer loop latch
>   # D__lsm.10_35 = PHI <_33(7)>
>   # D__lsm.11_37 = PHI <1(7)>
>   # D__lsm.12_7 = PHI 
>   # D__lsm.13_8 = PHI <1(7)>
>   *_25 = sum.1_27;
>   _36 = *_5;
>   _38 = _36 + 1;
>   *_5 = _38;
>   i.3_9 = (unsigned int) _38;
>   if (i.3_9 < _45)
> goto ;
>   else
> goto ;
> 
>   :
>   # 
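
The effect that the quoted pass comment attributes to pass_lim (rewriting
in-memory iteration and reduction variable accesses into local variable
accesses) can be pictured at the source level.  The sketch below is a
hand-written before/after pair, not compiler output; the function names and
TOY_N are assumptions made for the example, and it assumes n <= TOY_N and that
i, j and sum do not alias the array.

```c
#define TOY_N 4

/* "Before": iteration variables and the reduction variable live in
   memory and are loaded and stored on every iteration, as in the dump
   taken before the first pass_lim.  */
static void
reduce_in_memory (unsigned (*a)[TOY_N], unsigned *i, unsigned *j,
		  unsigned *sum, unsigned n)
{
  for (*i = 0; *i < n; ++*i)
    for (*j = 0; *j < n; ++*j)
      *sum += a[*i][*j];
}

/* "After": the same computation with the memory accesses replaced by
   locals; memory is read once before the loops and written once
   after them.  */
static void
reduce_in_locals (unsigned (*a)[TOY_N], unsigned *i, unsigned *j,
		  unsigned *sum, unsigned n)
{
  unsigned li, lj = *j, s = *sum;

  for (li = 0; li < n; ++li)
    for (lj = 0; lj < n; ++lj)
      s += a[li][lj];

  *i = li;
  *j = lj;
  *sum = s;
}

/* Run both variants on the same data; return 1 if they agree.  */
static int
toy_reductions_agree (void)
{
  unsigned a[TOY_N][TOY_N];
  unsigned i1, j1 = 0, s1 = 1;
  unsigned i2, j2 = 0, s2 = 1;

  for (unsigned r = 0; r < TOY_N; r++)
    for (unsigned c = 0; c < TOY_N; c++)
      a[r][c] = r * TOY_N + c;

  reduce_in_memory (a, &i1, &j1, &s1, TOY_N);
  reduce_in_locals (a, &i2, &j2, &s2, TOY_N);
  return s1 == s2 && i1 == i2 && j1 == j2 && s1 == 121;
}
```

In the second form the loop-carried values are plain scalars, which is what
lets them appear as phis on the loop headers in the later dumps.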

[Bug tree-optimization/68373] autopar fails on loop exit phi with argument defined outside loop

2015-11-20 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68373

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from vries at gcc dot gnu.org ---
patch and test-case committed, marking resolved-fixed.

Re: Add uaddv4_optab, usubv4_optab

2015-11-20 Thread Jakub Jelinek
On Fri, Nov 20, 2015 at 11:27:48AM +0100, Richard Henderson wrote:
> Toward fixing PR68385.  I'm just starting a full round of testing, but
> 
> extern void underflow(void) __attribute__((noreturn));
> unsigned sub1(unsigned a, unsigned b)
> {
> unsigned r = a - b;
> if (r > a) underflow();
> return r;
> }
> 
> unsigned sub2(unsigned a, unsigned b)
> {
> unsigned r;
> if (__builtin_sub_overflow(a, b, &r)) underflow();
> return r;
> }
> 
> 
> sub1:
>         movl    %edi, %eax
>         subl    %esi, %eax
>         cmpl    %eax, %edi
>         jb      .L7
>         rep ret
> ...
> sub2:
>         movl    %edi, %eax
>         subl    %esi, %eax
>         jb      .L16
>         rep ret
> ...

That looks good.

> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -6156,6 +6156,22 @@
> (const_string "4")]
> (const_string "")))])
>  
> +(define_expand "uaddv4"
> +  [(parallel [(set (reg:CCC FLAGS_REG)
> +(compare:CCC
> +  (plus:SWI (match_dup 1) (match_dup 2))
> +  (match_dup 1)))
> +   (set (match_dup 0)
> +(plus:SWI (match_dup 1) (match_dup 2)))])
> +   (set (pc) (if_then_else
> +(ne (reg:CCC FLAGS_REG) (const_int 0))
> +(label_ref (match_operand 3))
> +(pc)))]
> +  ""
> +{
> +  ix86_fixup_binary_operands_no_copy (PLUS, mode, operands);
> +})

Do we need this one on i?86?  I'm not against adding it to optabs, so that
other targets have a way to improve that, but doesn't combine handle this
case on i?86 already well?
I've been thinking of only transforming the above sub1 code (in forwprop, as
richi suggested) to sub2 internal call + REALPART/IMAGPART extraction if
the corresponding optab exists.

> +
>  ;; The lea patterns for modes less than 32 bits need to be matched by
>  ;; several insns converted to real lea by splitters.
>  
> @@ -6461,6 +6477,20 @@
> (const_string "4")]
> (const_string "")))])
>  
> +(define_expand "usubv4"
> +  [(parallel [(set (reg:CC FLAGS_REG)
> +(compare:CC (match_dup 1) (match_dup 2)))
> +   (set (match_dup 0)
> +(minus:SWI (match_dup 1) (match_dup 2)))])
> +   (set (pc) (if_then_else
> +(ltu (reg:CC FLAGS_REG) (const_int 0))
> +(label_ref (match_operand 3))
> +(pc)))]

If this works, it will be nice, I thought we'll need a new CC*mode.

> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4912,6 +4912,25 @@ address calculations.  @code{add@var{m}3} is used if
>  @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3}
>  Similar, for other arithmetic operations.
>  
> +@cindex @code{addv@var{m}4} instruction pattern
> +@item @samp{addv@var{m}4}
> +Add operand 2 and operand 1, storing the result in operand 0.  If signed
> +overflow occurs during the addition, jump to the label in operand 3.
> +
> +@cindex @code{subv@var{m}4} instruction pattern
> +@cindex @code{mulv@var{m}4} instruction pattern
> +@item @samp{subv@var{m}4}, @samp{mulv@var{m}4}
> +Similar, for other signed arithmetic operations.
> +
> +@cindex @code{uaddv@var{m}4} instruction pattern
> +@item @samp{uaddv@var{m}4}
> +Like @code{addv@var{m}4}, except jump on unsigned overflow.
> +
> +@cindex @code{usubv@var{m}4} instruction pattern
> +@cindex @code{umulv@var{m}4} instruction pattern
> +@item @samp{usubv@var{m}4}, @samp{umulv@var{m}4}
> +Similar, for other unsigned arithmetic operations.

Eric has just submitted a documentation patch that documented the
{add,sub,mul,umul}v4 and negv3 patterns, so this should be
applied on top of that.

Jakub
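
The equivalence that such a sub1-to-sub2 transformation would rely on can be
checked directly in C.  This is a small sketch using GCC's
__builtin_sub_overflow and __builtin_add_overflow; the wrapper names are
invented for the example and do not come from the patch.

```c
#include <stdbool.h>

/* Open-coded unsigned underflow check, as in sub1.  */
static bool
sub_wraps_compare (unsigned a, unsigned b, unsigned *r)
{
  *r = a - b;
  return *r > a;
}

/* The same check via the builtin, as in sub2.  */
static bool
sub_wraps_builtin (unsigned a, unsigned b, unsigned *r)
{
  return __builtin_sub_overflow (a, b, r);
}

/* Open-coded unsigned overflow check for addition.  */
static bool
add_wraps_compare (unsigned a, unsigned b, unsigned *r)
{
  *r = a + b;
  return *r < a;
}

/* Addition via the builtin.  */
static bool
add_wraps_builtin (unsigned a, unsigned b, unsigned *r)
{
  return __builtin_add_overflow (a, b, r);
}

/* Return true if both forms produce the same result and the same
   overflow flag for A and B, for subtraction and for addition.  */
static bool
checks_agree (unsigned a, unsigned b)
{
  unsigned r1, r2, r3, r4;
  bool f1 = sub_wraps_compare (a, b, &r1);
  bool f2 = sub_wraps_builtin (a, b, &r2);
  bool f3 = add_wraps_compare (a, b, &r3);
  bool f4 = add_wraps_builtin (a, b, &r4);

  return f1 == f2 && r1 == r2 && f3 == f4 && r3 == r4;
}
```

Both forms compute the wrapped result; they differ only in how the carry or
borrow is recovered, which is what the shorter sub2 code sequence reflects.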


[Bug rtl-optimization/68376] [4.9 Regression] wrong code at -O1 and above on x86_64-linux-gnu

2015-11-20 Thread zsojka at seznam dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68376

Zdenek Sojka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zsojka at seznam dot cz

--- Comment #8 from Zdenek Sojka  ---
The testcase is failing in trunk r230609 with -ftracer (x86_64):
$ gcc -O -ftracer pr68376-2.c
$ ./a.out
Aborted

5-branch is fine; 4_9-branch and 4_8-branch are failing (they are unfixed for
the original issue).

Dumps up to .optimized look fine, it still seems to be an RTL optimizer bug, so
I didn't create a separate PR.

Simplified testcase:

$ cat testcase.c
__attribute__((noinline, noclone)) int
f3 (int x)
{
  return x <= 0 ? ~x : x;
}

int
main ()
{
  if (f3 (0) != -1)
__builtin_abort ();
  return 0;
}


asm output:
f3:
        movl    %edi, %eax
        sarl    $31, %eax
        xorl    %edi, %eax
        ret

this would be valid for "x < 0 ? ~x : x".
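
The difference between the two conditions can be spelled out in C.  In this
sketch f3_source is the testcase function and f3_as_emitted is a hand-written
model of the sarl/xorl sequence above; both names are chosen for the example.
The mask is written as a conditional rather than a shift to avoid relying on
implementation-defined right shifts of negative values.

```c
/* The function as written in the testcase.  */
static int
f3_source (int x)
{
  return x <= 0 ? ~x : x;
}

/* Model of the emitted code: sarl $31 yields an all-ones mask for
   negative X and zero otherwise, and xorl with that mask computes
   x < 0 ? ~x : x.  */
static int
f3_as_emitted (int x)
{
  int mask = x < 0 ? -1 : 0;
  return x ^ mask;
}
```

The two agree everywhere except at x == 0, where the source requires ~0 == -1
and the emitted sequence returns 0; that is exactly the input the testcase
checks.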

Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when copmputing VF

2015-11-20 Thread Ilya Enkovich
2015-11-20 14:28 GMT+03:00 Richard Biener :
> On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovich  wrote:
>> 2015-11-18 16:44 GMT+03:00 Richard Biener :
>>> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich  
>>> wrote:
>>>> Hi,
>>>>
>>>> When we compute vectypes we skip non-relevant phi nodes.  But we process
>>>> non-relevant live statements and thus may need vectype of non-relevant
>>>> live phi node to compute mask vectype.  This patch enables vectype
>>>> computation for live phi nodes.  Bootstrapped and regtested on
>>>> x86_64-unknown-linux-gnu.  OK for trunk?
>>>
>>> Hmm.  What breaks if you instead skip all !relevant stmts and not
>>> compute vectype for live but not relevant ones?  We won't ever
>>> "vectorize" !relevant ones, that is, we don't need their vector type.
>>
>> I tried it and got regression in SLP.  It expected non-null vectype
>> for non-relevant but live statement. Regression was in
>> gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90
>
> Because somebody put a vector type check before
>
>   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> return false;
>
> @@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g
>tree mask_type;
>tree mask;
>
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> +return false;
> +
>if (!VECTOR_BOOLEAN_TYPE_P (vectype))
>  return false;
>
> @@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g
>  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
>
>gcc_assert (ncopies >= 1);
> -  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> -return false;
>
>if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
>&& !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
>
> fixes this particular fallout for me.

I'll try it.

Thanks,
Ilya

>
> Richard.
>
>> Ilya
>>
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Ilya


Re: [PATCH 02/15] Add selftests to bitmap.c

2015-11-20 Thread Richard Biener
On Thu, Nov 19, 2015 at 6:04 PM, David Malcolm  wrote:
> Jeff pre-approved the plugin version of this (as a new
> file unittests/test-bitmap.c):
>   https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03284.html
> with:
>> OK if/when prereqs are approved.  Minor twiddling if we end up moving it
>> elsewhere or standardizing/reducing header files is pre-approved.
>
> This version moves them to bitmap.c
>
> One issue: how to express:
>   TEST (bitmap_test, gc_alloc)
> in a ChangeLog entry.
>
> I've chosen to write it as (bitmap_test, gc_alloc) since that
> has the greatest chance of being found via grep.

I think overloading CHECKING_P for this is bogus.  Adding a new
UNIT_TEST_FILE (like GENERATOR_FILE) would be better.  Also
selftest.h should be only conditionally included (before the tests themselves?)

Richard.
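
For anyone checking the boundary assertions in the quoted selftests, the
(start, count) convention of bitmap_set_range can be modelled with a toy
fixed-size bitmap.  This is only a sketch under stated assumptions: the toy_*
functions and TOY_BITS are invented here and share nothing with bitmap.c
beyond the argument convention.

```c
#include <stdint.h>
#include <string.h>

#define TOY_BITS 256

struct toy_bitmap
{
  uint64_t words[TOY_BITS / 64];
};

/* Clear every bit.  */
static void
toy_init (struct toy_bitmap *b)
{
  memset (b, 0, sizeof *b);
}

/* Return nonzero if BIT is set.  */
static int
toy_bit_p (const struct toy_bitmap *b, unsigned bit)
{
  return bit < TOY_BITS && ((b->words[bit / 64] >> (bit % 64)) & 1) != 0;
}

/* Set COUNT bits starting at START, i.e. START .. START + COUNT - 1.  */
static void
toy_set_range (struct toy_bitmap *b, unsigned start, unsigned count)
{
  for (unsigned bit = start; bit < start + count && bit < TOY_BITS; bit++)
    b->words[bit / 64] |= (uint64_t) 1 << (bit % 64);
}

/* Clear BIT; return nonzero if it was set before.  */
static int
toy_clear_bit (struct toy_bitmap *b, unsigned bit)
{
  int was_set = toy_bit_p (b, bit);

  if (was_set)
    b->words[bit / 64] &= ~((uint64_t) 1 << (bit % 64));
  return was_set;
}

/* Count the set bits.  */
static unsigned
toy_count_bits (const struct toy_bitmap *b)
{
  unsigned n = 0;

  for (unsigned bit = 0; bit < TOY_BITS; bit++)
    n += toy_bit_p (b, bit) ? 1 : 0;
  return n;
}
```

Under this convention a range given as (7, 5) covers bits 7 through 11, and
(100, 100) covers bits 100 through 199.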

> gcc/ChangeLog:
> * bitmap.c: Include "selftest.h".
> (bitmap_test, gc_alloc): New selftest.
> (bitmap_test, set_range): New selftest.
> (bitmap_test, clear_bit_in_middle): New selftest.
> (bitmap_test, copying): New selftest.
> (bitmap_test, bitmap_single_bit_set_p): New selftest.
> ---
>  gcc/bitmap.c | 92 
> 
>  1 file changed, 92 insertions(+)
>
> diff --git a/gcc/bitmap.c b/gcc/bitmap.c
> index f04b8f9..e6f772e 100644
> --- a/gcc/bitmap.c
> +++ b/gcc/bitmap.c
> @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "system.h"
>  #include "coretypes.h"
>  #include "bitmap.h"
> +#include "selftest.h"
>
>  /* Memory allocation statistics purpose instance.  */
>  mem_alloc_description bitmap_mem_desc;
> @@ -2094,5 +2095,96 @@ debug (const bitmap_head *ptr)
>  fprintf (stderr, "\n");
>  }
>
> +#if CHECKING_P
> +namespace {
> +
> +/* Freshly-created bitmaps ought to be empty.  */
> +
> +TEST (bitmap_test, gc_alloc)
> +{
> +  bitmap b = bitmap_gc_alloc ();
> +  EXPECT_TRUE (bitmap_empty_p (b));
> +}
> +
> +/* Verify bitmap_set_range.  */
> +
> +TEST (bitmap_test, set_range)
> +{
> +  bitmap b = bitmap_gc_alloc ();
> +  EXPECT_TRUE (bitmap_empty_p (b));
> +
> +  bitmap_set_range (b, 7, 5);
> +  EXPECT_FALSE (bitmap_empty_p (b));
> +  EXPECT_EQ (5, bitmap_count_bits (b));
> +
> +  /* Verify bitmap_bit_p at the boundaries.  */
> +  EXPECT_FALSE (bitmap_bit_p (b, 6));
> +  EXPECT_TRUE (bitmap_bit_p (b, 7));
> +  EXPECT_TRUE (bitmap_bit_p (b, 11));
> +  EXPECT_FALSE (bitmap_bit_p (b, 12));
> +}
> +
> +/* Verify splitting a range into two pieces using bitmap_clear_bit.  */
> +
> +TEST (bitmap_test, clear_bit_in_middle)
> +{
> +  bitmap b = bitmap_gc_alloc ();
> +
> +  /* Set b to [100..199].  */
> +  bitmap_set_range (b, 100, 100);
> +  EXPECT_EQ (100, bitmap_count_bits (b));
> +
> +  /* Clear a bit in the middle.  */
> +  bool changed = bitmap_clear_bit (b, 150);
> +  EXPECT_TRUE (changed);
> +  EXPECT_EQ (99, bitmap_count_bits (b));
> +  EXPECT_TRUE (bitmap_bit_p (b, 149));
> +  EXPECT_FALSE (bitmap_bit_p (b, 150));
> +  EXPECT_TRUE (bitmap_bit_p (b, 151));
> +}
> +
> +/* Verify bitmap_copy.  */
> +
> +TEST (bitmap_test, copying)
> +{
> +  bitmap src = bitmap_gc_alloc ();
> +  bitmap_set_range (src, 40, 10);
> +
> +  bitmap dst = bitmap_gc_alloc ();
> +  EXPECT_FALSE (bitmap_equal_p (src, dst));
> +  bitmap_copy (dst, src);
> +  EXPECT_TRUE (bitmap_equal_p (src, dst));
> +
> +  /* Verify that we can make them unequal again...  */
> +  bitmap_set_range (src, 70, 5);
> +  EXPECT_FALSE (bitmap_equal_p (src, dst));
> +
> +  /* ...and that changing src after the copy didn't affect
> + the other: */
> +  EXPECT_FALSE (bitmap_bit_p (dst, 70));
> +}
> +
> +/* Verify bitmap_single_bit_set_p.  */
> +TEST (bitmap_test, bitmap_single_bit_set_p)
> +{
> +  bitmap b = bitmap_gc_alloc ();
> +
> +  EXPECT_FALSE (bitmap_single_bit_set_p (b));
> +
> +  bitmap_set_range (b, 42, 1);
> +  EXPECT_TRUE (bitmap_single_bit_set_p (b));
> +  EXPECT_EQ (42, bitmap_first_set_bit (b));
> +
> +  bitmap_set_range (b, 1066, 1);
> +  EXPECT_FALSE (bitmap_single_bit_set_p (b));
> +  EXPECT_EQ (42, bitmap_first_set_bit (b));
> +
> +  bitmap_clear_range (b, 0, 100);
> +  EXPECT_TRUE (bitmap_single_bit_set_p (b));
> +  EXPECT_EQ (1066, bitmap_first_set_bit (b));
> +}
> +
> +}  // anon namespace
> +#endif /* CHECKING_P */
>
>  #include "gt-bitmap.h"
> --
> 1.8.5.3
>
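
For readers without a GCC tree at hand, the range semantics these selftests rely on can be sketched with a toy stand-in. This is only an illustration: `toy_bitmap` and its members are hypothetical names, and the assumption (taken from the EXPECT_* checks in the patch) is that set_range (start, count) sets the bits start .. start+count-1.

```cpp
// Toy stand-in for GCC's bitmap; not the real implementation.
#include <cassert>
#include <cstddef>
#include <set>

struct toy_bitmap
{
  std::set<unsigned> bits;

  // Assumed semantics: set COUNT consecutive bits beginning at START.
  void set_range (unsigned start, unsigned count)
  {
    for (unsigned i = 0; i < count; ++i)
      bits.insert (start + i);
  }

  // Clear BIT; return true if it was previously set.
  bool clear_bit (unsigned bit) { return bits.erase (bit) != 0; }

  bool bit_p (unsigned bit) const { return bits.count (bit) != 0; }
  std::size_t count_bits () const { return bits.size (); }
};

// Mirrors the set_range and clear_bit_in_middle checks from the patch.
void check_range_semantics ()
{
  toy_bitmap b;
  b.set_range (7, 5);
  assert (b.count_bits () == 5);
  assert (!b.bit_p (6) && b.bit_p (7) && b.bit_p (11) && !b.bit_p (12));

  toy_bitmap c;
  c.set_range (100, 100);   // bits 100..199 under the assumption above
  assert (c.bit_p (199) && !c.bit_p (200));
  assert (c.clear_bit (150));
  assert (c.count_bits () == 99);
}
```

Under that reading, a range of 100 bits starting at 100 ends at bit 199, which is consistent with the count of 100 the selftest expects.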


Re: basic asm and memory clobbers

2015-11-20 Thread Andrew Haley
On 20/11/15 10:37, David Wohlferd wrote:
> The intent for 24414 is to change basic asm such that it will become 
> (quoting jeff) "an opaque blob that read/write/clobber any register or 
> memory location."  Such being the case, "memory" is not sufficient:
> 
> #define CLOBBERALL "eax", "ebx", "ecx", "edx", "r8", "r9", "r10", "r11", 
> "r12", "r13", "r14", "r15", "edi", "esi", "ebp", "cc", "memory"

Hmm.  I would not be at all surprised to see this cause reload
failures.  You certainly shouldn't clobber the frame pointer on
any machine which needs one.

Andrew.



Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

2015-11-20 Thread James Greenhalgh
On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:
> On 11/05/2015 02:51 PM, Evandro Menezes wrote:
> >2015-11-05  Evandro Menezes 
> >
> >   gcc/
> >
> >   * config/aarch64/aarch64.c (aarch64_override_options_internal):
> >   Increase loop peeling limit.
> >
> >This patch increases the limit for the number of peeled insns.
> >With this change, I noticed no major regression in either
> >Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
> >ones, improved significantly.
> >
> >I tested this tuning on Exynos M1 and on A57.  ThunderX seems to
> >benefit from this tuning too.  However, I'd appreciate comments
> >from other stakeholders.
> 
> Ping.

I'd like to leave this for a call from the port maintainers. I can see why
this leads to more opportunities for vectorization, but I'm concerned about
the wider impact on code size. Certainly I wouldn't expect this to be our
default at -O2 and below.

My gut feeling is that this doesn't really belong in the back-end (there are
presumably good reasons why the default for this parameter across GCC has
fluctuated from 400 to 100 to 200 over recent years), but as I say, I'd
like Marcus or Richard to make the call as to whether or not we take this
patch.

For now, I'd drop it from the series (it stands alone anyway).

Thanks,
James



[Bug c++/68409] Garbage added to a map instead of object

2015-11-20 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68409

--- Comment #4 from Jonathan Wakely  ---
(In reply to Adrian Wielgosik from comment #2)
> Your operator< doesn't seem to satisfy strict weak ordering. Once I rewrote
> it to a basic but safer version:
> 
> bool operator< (const chave& lhs, const chave& rhs) {
> if(lhs.numeros_ord != rhs.numeros_ord)
> return lhs.numeros_ord < rhs.numeros_ord;
> return lhs.estrelas_ord < rhs.estrelas_ord;
> }

This is a correct implementation.


Since C++11 is being used it can also be done like this:

bool operator< (const chave& lhs, const chave& rhs) {
    return std::tie(lhs.numeros_ord, lhs.estrelas_ord)
         < std::tie(rhs.numeros_ord, rhs.estrelas_ord);
}
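
A self-contained sketch of the std::tie approach follows. The real `chave` type is not shown in the report, so the `int` member types are an assumption, and `check_ordering` and `count_distinct_keys` are illustrative helpers only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>   // std::tie

// Assumed layout: the report does not show the member types.
struct chave
{
  int numeros_ord;
  int estrelas_ord;
};

bool operator< (const chave& lhs, const chave& rhs)
{
  // Lexicographic comparison of the two fields: a strict weak ordering.
  return std::tie (lhs.numeros_ord, lhs.estrelas_ord)
         < std::tie (rhs.numeros_ord, rhs.estrelas_ord);
}

// Basic properties a std::map comparator needs.
void check_ordering ()
{
  assert (!(chave{1, 2} < chave{1, 2}));   // irreflexive
  assert ((chave{1, 2} < chave{1, 3}));    // second field breaks the tie
  assert (!(chave{1, 3} < chave{1, 2}));   // asymmetric
}

// Keys that are equivalent under operator< must map to one entry.
std::size_t count_distinct_keys ()
{
  std::map<chave, std::string> m;
  m[chave{1, 2}] = "a";
  m[chave{1, 3}] = "b";
  m[chave{1, 2}] = "c";   // same key as the first insert
  return m.size ();
}
```

Note that std::tie needs the `<tuple>` header.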

[Bug target/65660] [5 Regression] 252.eon regression on bdver2 with -Ofast

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65660

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|5.3                         |6.0
      Known to fail|                            |5.3.0

--- Comment #25 from Richard Biener  ---
Not going to backport this.

[Bug ipa/65908] [5 Regression] ICE: in expand_thunk, at cgraphunit.c:1700

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65908

--- Comment #14 from Richard Biener  ---
Martin or Honza, can you work on a backport please?

Broken Link

2015-11-20 Thread melissa . holmes

Hey,

I wanted to reach out and let you know about this link which isn’t 
working - 
http://developer.apple.com/documentation/Cocoa/Conceptual/ObjectiveC/, I 
found it on this page - 
http://gd.tuwien.ac.at/.vhost/www.gnu.org/software/gcc/readings.html. 
Your link includes this text - "Objective-C Language Description", 
if that helps you find it.


We’ve put together a guide to Objective C, which includes a brief 
history, some useful online resources and books, and an introduction to 
Swift, the successor to Objective C > http://wiht.link/objectivecguide. 
I thought that it might make a suitable alternative to point your site 
visitors to.


All the best,
Melissa


Re: GCC 5.3 Status Report (2015-11-20)

2015-11-20 Thread David Edelsohn
On Fri, Nov 20, 2015 at 7:53 AM, Richard Biener  wrote:
>
> Status
> ==
>
> We plan to do a GCC 5.3 release candidate at the end of next week
> followed by the actual release a week after that.
>
> So now is the time to look at your regression bugs in bugzilla and
> do some backporting for things already fixed on trunk.

I'm still waiting for approval of the libtool change to support AIX
TLS symbols (PR 68192).  There has been no response on Libtool patches
mailing list.

I again request permission to apply the patches to GCC trunk and 5-branch.

Thanks, David


[Bug rtl-optimization/68435] [6 Regression] Missed if-conversion optimization

2015-11-20 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68435

--- Comment #9 from ktkachov at gcc dot gnu.org ---
Since the provided testcase is affected by the path splitting patch, here's an
alternative testcase that exhibits the ifcvt issue even with the latest trunk.
I'll be working on this:

typedef struct cpp_reader cpp_reader;
enum cpp_ttype
{
  CPP_EQ =
0, CPP_NOT, CPP_GREATER, CPP_LESS, CPP_PLUS, CPP_MINUS, CPP_MULT, CPP_DIV,
CPP_MOD, CPP_AND, CPP_OR, CPP_XOR, CPP_RSHIFT, CPP_LSHIFT, CPP_MIN,
CPP_MAX, CPP_COMPL, CPP_AND_AND, CPP_OR_OR, CPP_QUERY, CPP_COLON,
CPP_COMMA, CPP_OPEN_PAREN, CPP_CLOSE_PAREN, CPP_EQ_EQ, CPP_NOT_EQ,
CPP_GREATER_EQ, CPP_LESS_EQ, CPP_PLUS_EQ, CPP_MINUS_EQ, CPP_MULT_EQ,
CPP_DIV_EQ, CPP_MOD_EQ, CPP_AND_EQ, CPP_OR_EQ, CPP_XOR_EQ, CPP_RSHIFT_EQ,
CPP_LSHIFT_EQ, CPP_MIN_EQ, CPP_MAX_EQ, CPP_HASH, CPP_PASTE,
CPP_OPEN_SQUARE, CPP_CLOSE_SQUARE, CPP_OPEN_BRACE, CPP_CLOSE_BRACE,
CPP_SEMICOLON, CPP_ELLIPSIS, CPP_PLUS_PLUS, CPP_MINUS_MINUS, CPP_DEREF,
CPP_DOT, CPP_SCOPE, CPP_DEREF_STAR, CPP_DOT_STAR, CPP_ATSIGN, CPP_NAME,
CPP_NUMBER, CPP_CHAR, CPP_WCHAR, CPP_OTHER, CPP_STRING, CPP_WSTRING,
CPP_HEADER_NAME, CPP_COMMENT, CPP_MACRO_ARG, CPP_PADDING, CPP_EOF,
};
static struct op lex (cpp_reader *, int);
struct op
{
  enum cpp_ttype op;
  long value;
};
int
_cpp_parse_expr (pfile)
{
  struct op init_stack[20];
  struct op *stack = init_stack;
  struct op *top = stack + 1;
  int skip_evaluation = 0;
  for (;;)
{
  struct op op;
  op = lex (pfile, skip_evaluation);
  switch (op.op)
{
case CPP_OR_OR:
  if (top->value)
skip_evaluation++;
  else
skip_evaluation--;
}
}
}
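
At source level, the transformation the bug title refers to can be pictured roughly as below. This is a sketch only: the RTL if-conversion pass works on instructions rather than C++ source, the function names are made up for illustration, and the branch-free form is just one possible result.

```cpp
#include <cassert>

// Branchy form, as in the switch case of the testcase.
int update_branchy (int skip_evaluation, long value)
{
  if (value)
    skip_evaluation++;
  else
    skip_evaluation--;
  return skip_evaluation;
}

// Branch-free form: both candidates are computed, then one is selected.
// Inputs at INT_MIN/INT_MAX are out of scope for this source-level sketch,
// because both candidates are evaluated unconditionally here.
int update_selected (int skip_evaluation, long value)
{
  int incremented = skip_evaluation + 1;
  int decremented = skip_evaluation - 1;
  return value ? incremented : decremented;
}

// The two forms agree for ordinary inputs.
void check_equivalence ()
{
  for (int skip = -3; skip <= 3; ++skip)
    for (long value = -1; value <= 1; ++value)
      assert (update_branchy (skip, value) == update_selected (skip, value));
}
```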

[Bug tree-optimization/68445] ICE: internal compiler error: in operator[], at vec.h

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68445

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2015-11-20
           Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
I will have a look.

[Bug tree-optimization/68453] [6 Regression] graphite ICE: segfault

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68453

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |tree-optimization
   Target Milestone|---                         |6.0

Re: [PATCH, PR68373 ] Call scev_const_prop in pass_parallelize_loops::execute

2015-11-20 Thread Richard Biener
On Thu, 19 Nov 2015, Tom de Vries wrote:

> On 17/11/15 23:20, Tom de Vries wrote:
> > [ was: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def ]
> > 
> > Hi,
> > 
> > Consider test-case test.c, with a use of the final value of the
> > iteration variable (return i):
> > ...
> > unsigned int
> > foo (int *a, unsigned int n)
> > {
> >unsigned int i;
> >for (i = 0; i < n; ++i)
> >  a[i] = 1;
> > 
> >return i;
> > }
> > ...
> > 
> > Compiled with:
> > ...
> > $ gcc -S -O2 test.c -ftree-parallelize-loops=2 -fdump-tree-all-details
> > ...
> > 
> > Before parloops, we have:
> > ...
> >   :
> ># i_12 = PHI <0(3), i_10(5)>
> >_5 = (long unsigned int) i_12;
> >_6 = _5 * 4;
> >_8 = a_7(D) + _6;
> >*_8 = 1;
> >i_10 = i_12 + 1;
> >if (n_4(D) > i_10)
> >  goto ;
> >else
> >  goto ;
> > 
> >:
> >goto ;
> > 
> >:
> ># i_14 = PHI 
> > ...
> > 
> > Parloops will fail because:
> > ...
> > phi is n_2 = PHI 
> > arg of phi to exit:   value n_4(D) used outside loop
> >checking if it a part of reduction pattern:
> >FAILED: it is not a part of reduction
> > ...
> > [ note that the phi looks slightly different. In
> > gather_scalar_reductions -> vect_analyze_loop_form ->
> > vect_analyze_loop_form_1 -> split_loop_exit_edge we split the edge from
> > bb4 to bb6. ]
> > 
> > This patch uses scev_const_prop at the start of parloops.
> > scev_const_prop first also splits the exit edge, and then replaces the
> > phi with an assignment:
> > ...
> >   final value replacement:
> >n_2 = PHI 
> >with
> >n_2 = n_4(D);
> > ...
> > 
> > This allows parloops to succeed.
> > 
> > And there's a similar story when we compile with -fno-tree-scev-cprop in
> > addition.
> > 
> > Bootstrapped and reg-tested on x86_64.
> > 
> > OK for stage3/stage1?
> 
> The patch has been updated to do the final value replacement only for the loop
> that parloops is processing, as suggested in review comment at
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02166.html .
> 
> That means the patch is now also required for the kernels patch series.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for stage 3 trunk?

Ok.  Please mention tree-optimization/68373 in the changelog.

Thanks,
Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[Bug tree-optimization/68455] [6 Regression] ICE: tree check: expected integer_cst, have plus_expr in decompose, at tree.h:5123

2015-11-20 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68455

Marek Polacek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #2 from Marek Polacek  ---
Nope, started with r228614 - Richard S.'s tree_expr_nonnegative_warnv_p
changes.

Though I can look at this anyway, but only next week.

Re: [patch] Document new overflow arithmetics patterns

2015-11-20 Thread Jakub Jelinek
On Fri, Nov 20, 2015 at 11:12:06AM +0100, Eric Botcazou wrote:
> Hi,
> 
> this documents the new overflow arithmetic patterns added recently (addv4, 
> subv4, mulv4, umulv4, negv3) and only those; the old ones remain undocumented.
> This also fixes the description of the cbranch and jump patterns, which were 
> referring to a label_ref instead of a code_label.
> 
> Tested with 'make doc' on x86-64/Linux, OK for mainline and 5 branch?
> 
> 
> 2015-11-20  Eric Botcazou  
> 
>   * doc/md.texi (Standard Names): Move entry for addptr3 around,
>   add entries for addv4, subv4, mulv4, umulv4 and negv3, fixes
>   glitch in entries for cbranch4 and jump.

Ok, thanks.

> +@cindex @code{negv@var{m}3} instruction pattern
> +@item @samp{negv@var{m}3}
> +Like @code{neg@var{m}2}  but takes a @code{code_label} as operand 2 and
> +emits code to jump to it if signed overflow occurs during the negation.

Spurious space before "but".

Jakub
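
From the user's side, "jump to a label if signed overflow occurs" corresponds to the overflow-checking builtins. The sketch below assumes, without the patch saying so, that these patterns are what such builtins can expand to on targets that provide them; `checked_add`, `checked_mul` and `checked_neg` are illustrative wrappers, and the builtins are GCC (5 and later) and Clang extensions rather than standard C++.

```cpp
#include <cassert>
#include <climits>

// Return true and store a + b in *SUM if the addition does not overflow;
// return false on signed overflow.
bool checked_add (int a, int b, int *sum)
{
  return !__builtin_add_overflow (a, b, sum);
}

bool checked_mul (int a, int b, int *product)
{
  return !__builtin_mul_overflow (a, b, product);
}

// Negation overflows only for INT_MIN; expressed here as 0 - a.
bool checked_neg (int a, int *result)
{
  return !__builtin_sub_overflow (0, a, result);
}

void check_overflow_detection ()
{
  int r = 0;
  assert (checked_add (1, 2, &r) && r == 3);
  assert (!checked_add (INT_MAX, 1, &r));
  assert (!checked_mul (INT_MAX, 2, &r));
  assert (checked_neg (5, &r) && r == -5);
  assert (!checked_neg (INT_MIN, &r));
}
```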


[Bug target/68456] New: UINT32_TYPE is long unsigned for 32bit targets

2015-11-20 Thread julia.koval at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68456

Bug ID: 68456
   Summary: UINT32_TYPE is long unsigned for 32bit targets
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: julia.koval at intel dot com
  Target Milestone: ---

This problem exists for gcc at least for --target=i586-elf, --target=i386-elf.

It is caused by the file gcc/config/newlib-stdint.h. It has:

#ifndef STDINT_LONG32
#define STDINT_LONG32 (LONG_TYPE_SIZE == 32)
#endif

#define UINT32_TYPE (STDINT_LONG32 ? "long unsigned int" : INT_TYPE_SIZE == 32
? "unsigned int" : SHORT_TYPE_SIZE == 32 ? "short unsigned int" :
CHAR_TYPE_SIZE == 32 ? "unsigned char" : 0)

I found a discussion for this, that it is a feature for newlib and wontfix:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01321.html

However, this problem remains, if I build the compiler without --with-newlib
flag.

Reproduce:
-bash-4.2$ cat test.c
int main(){
  uint32_t val = 5;
}
-bash-4.2$ ./target_compiler/install/bin/i586-elf-gcc test.c -dM -E | grep
INT32_TYPE
#define __INT32_TYPE__ long int
#define __UINT32_TYPE__ long unsigned int

This problem causes strange warnings, for example in printf("%u...", if this
type is not overridden somewhere in the library:
": format '%u' expects argument of type 'unsigned int', but argument 2 has type
'uint32_t {aka long unsigned int}' "

I also tried to build a newlib toolchain without this feature and it worked ok
for me. This problem was found as a difference between gcc and llvm behavior.
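
The selection order in the quoted macro can be replayed on any host with a small function. This is a sketch: `uint32_type_name` is a hypothetical helper, and the target's type sizes are passed in as plain parameters instead of coming from the target macros.

```cpp
#include <cassert>
#include <cstring>

// Mirrors the selection order of the UINT32_TYPE macro quoted above.
// Returns nullptr where the macro would yield 0.
const char *
uint32_type_name (int long_size, int int_size, int short_size, int char_size)
{
  bool stdint_long32 = (long_size == 32);   // default STDINT_LONG32
  return stdint_long32 ? "long unsigned int"
         : int_size == 32 ? "unsigned int"
         : short_size == 32 ? "short unsigned int"
         : char_size == 32 ? "unsigned char"
         : nullptr;
}

void check_selection_order ()
{
  // 32-bit long and 32-bit int: the long variant is chosen first.
  assert (std::strcmp (uint32_type_name (32, 32, 16, 8),
                       "long unsigned int") == 0);
  // 64-bit long, 32-bit int: falls through to unsigned int.
  assert (std::strcmp (uint32_type_name (64, 32, 16, 8),
                       "unsigned int") == 0);
}
```

On a target where both long and int are 32 bits wide, the first arm wins, which matches the `__UINT32_TYPE__` output in the report.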

Re: basic asm and memory clobbers

2015-11-20 Thread David Wohlferd

On 11/19/2015 7:14 PM, Segher Boessenkool wrote:

On Thu, Nov 19, 2015 at 05:23:55PM -0800, David Wohlferd wrote:

For that reason, I'd like to propose adding 2 new clobbers to extended
asm as part of this work:

"clobberall" - This gives extended the same semantics as whatever the
new basic asm will be using.
"clobbernone" - This gives the same semantics as the current basic asm.

I don't think this is necessary or useful.  They are also awful names:
"clobberall" cannot clobber everything (think of the stack pointer),


I'm not emotionally attached to the names.  But providing the same 
capability to extended that we are proposing for basic doesn't seem so 
odd.  Shouldn't extended be able to do (at least) everything basic does?


My first thought is that it allows people to incrementally start 
migrating from (new) basic to extended (something I think we should 
encourage).  Or use it as a debug tool to see if the failure you are 
experiencing from your asm is due to a missing clobber.  Since the 
capability will already be implemented for basic, providing a way to 
access it from extended seems trivial (if we can agree on a name).


As you say, clobbering the stack pointer presents special challenges 
(although gcc has a specific way of dealing with stack register 
clobbers, see 52813).  This is why I described the feature as having 
"the same semantics as whatever the new basic asm will be using."



and "clobbernone" does clobber some (those clobbered by any asm),


Seems like a quibble.  Those other things (I assume you mean things like 
pipelining?) most users aren't even aware of (or they wouldn't be so 
eager to use inline asm in the first place).  Would it be more palatable 
if we called it "v5BasicAsmMode"?  "ClobberMin"?



Clobbernone may seem redundant, since not specifying any clobbers should
do the same thing.  But actually it doesn't, at least on i386.  At
present, there is no way for extended asm to not clobber "cc".  I don't
know if other platforms have similar issues.

Some do.  The purpose is to stay compatible with asm written for older
versions of the compiler.


Backward compatibility is important.  I understand that due to the cc0 
change in x86, existing code may have broken without always clobbering 
cc.  This was seen as the safest way to ensure that didn't happen.  
However no solution was/is available for people who correctly knew 
whether their asm clobbers the flags.


Mostly I'm ok with that.  All the ways that I can think of to try to 
re-allow people to start using the cc clobber are just not worth it.  I 
simply can't believe there are many cases where there's going to be a 
benefit.


But as I said: backward compatibility is important.  Providing a way for 
people who need/want the old basic asm semantics seems useful. And I 
don't believe we can (quite) do that without clobbernone.



When basic asm changes, I expect that having a way to "just do what it
used to do" is going to be useful for some people.

24414 says the documented behaviour hasn't been true for at least
fourteen years.  It isn't likely anyone is relying on that behaviour.


?

To my knowledge, there was no documentation of any sort about what basic 
asm clobbered until I added it.  But what people are (presumably) 
relying on is that whatever it did in the last version, it's going to 
continue to do that in the next.  And albeit with good intentions, we 
are planning on changing that.



but perhaps the solution here is to just say that it doesn't
clobber flags (currently the most common case?), and update the docs if
and when people complain?  Yes, that's bad, but saying nothing at all
isn't any better.  And we know it's true for at least 2 platforms.

Saying nothing at all at least is *correct*.


We don't know that saying "it doesn't clobber flags" is wrong either.  
All we know is that jeff said "I suspect this isn't consistent across 
targets."


But that's neither here nor there.  The real question is, if we can't 
say that, what can we say?


- If 24414 is going in v6, then we can doc that it does the clobber and 
be vague about the old behavior.
- If 24414 isn't going in v6, then what?  I suppose we can say that it 
can vary by platform.  We could even provide your sample code as a means 
for people to discover their platform's behavior.



It isn't necessary for users to know what registers the compiler
considers to be clobbered by an asm, unless they actually clobber
something in the assembler code themselves.


I'm not sure I follow.

If someone has code that uses a register, currently they must restore 
the value before exiting the asm or risk disaster.  So they might write 
asm("push eax ; DoSomethingWith eax ; pop eax"). However if you know 
that the compiler is going to clobber eax, then the push/pop is just a 
waste of cycles and memory.


To write efficient code, it seems like you do need to know what the 
compiler clobbers.



They can write extended asm in that case.


I agree that 

Re: Add uaddv4_optab, usubv4_optab

2015-11-20 Thread Eric Botcazou
> Toward fixing PR68385.  I'm just starting a full round of testing, but

Do you mind if I install my doc patch?  It's slightly more thorough.

-- 
Eric Botcazou



Re: [PATCH 2/5] [AARCH64] Change IMP and PART over to integers from strings.

2015-11-20 Thread Kyrill Tkachov

Hi Andrew,

On 17/11/15 22:10, Andrew Pinski wrote:

Because the implementer and part numbers are really integers rather than
strings, this patch changes the comparisons to integer comparisons.  Keeping
integers around is also easier than doing string comparisons.  This allows for
the next change.

The way I store big.LITTLE is (big<<12)|little, as each part number is only
12 bits long.  So it would be nice if someone could test -mcpu=native on a
big.LITTLE system to make sure it still works.
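
The (big<<12)|little encoding described above can be sketched as follows. The helper names are hypothetical; the patch's own AARCH64_BIG_LITTLE macro is not quoted here, so this only follows the description in the message.

```cpp
#include <cassert>

// Each part number is assumed to fit in 12 bits, per the message.
const unsigned PART_BITS = 12;
const unsigned PART_MASK = (1u << PART_BITS) - 1;   // 0xfff

// Pack a big/little pair into one integer: (big << 12) | little.
unsigned pack_big_little (unsigned big, unsigned little)
{
  assert (big <= PART_MASK && little <= PART_MASK);
  return (big << PART_BITS) | little;
}

// Recover the two halves of a packed value.
unsigned big_part (unsigned packed) { return packed >> PART_BITS; }
unsigned little_part (unsigned packed) { return packed & PART_MASK; }
```

For the cortex-a57.cortex-a53 entry this packs 0xd07 and 0xd03 into 0xd07d03, and both halves can be recovered with a shift and a mask.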

OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Thanks,
Andrew Pinski



* config/aarch64/aarch64-cores.def: Rewrite so IMP and PART are integer 
constants.
* config/aarch64/driver-aarch64.c (struct aarch64_core_data): Change 
implementer_id to unsigned char.
Change part_no to unsigned int.
(AARCH64_BIG_LITTLE): New define.
(INVALID_IMP): New define.
(INVALID_CORE): New define.
(cpu_data): Change the last element's implementer_id and part_no to integers.
(valid_bL_string_p): Rewrite to ..
(valid_bL_core_p): this for integers instead of strings.
(parse_field): New function.
(contains_string_p): Rewrite to ...
(contains_core_p): this for integers and only for the part_no.
(host_detect_local_cpu): Rewrite handling of implementation and part num to be 
integers;
simplifying the code.
---
  gcc/config/aarch64/aarch64-cores.def | 25 +-
  gcc/config/aarch64/driver-aarch64.c  | 90 
  2 files changed, 62 insertions(+), 53 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 0b456f7..798f3e3 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -33,25 +33,26 @@
 This need not include flags implied by the architecture.
 COSTS is the name of the rtx_costs routine to use.
 IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it can
-   be found in /proc/cpuinfo.
+   be found in /proc/cpuinfo.  There is a list in the ARM ARM.
 PART is the part number of the CPU.  On a GNU/Linux system it can be found
-   in /proc/cpuinfo.  For big.LITTLE systems this should have the form at of
-   ".".  */
+   in /proc/cpuinfo.  For big.LITTLE systems this should use the macro 
AARCH64_BIG_LITTLE
+   where the big part number comes as the first argument to the macro and 
little is the
+   second.  */
  
  /* V8 Architecture Processors.  */
  
-AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")

-AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, 
cortexa57, "0x41", "0xd07")
-AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, 
cortexa72, "0x41", "0xd08")
-AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | 
AARCH64_FL_CRYPTO, cortexa72, "0x53", "0x001")
-AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | 
AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800")
-AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | 
AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
-AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, xgene1, 
"0x50", "0x000")
+AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa53, 0x41, 0xd03)
+AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa57, 0x41, 0xd07)
+AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa72, 0x41, 0xd08)
+AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa72, 0x53, 0x001)
+AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, 0x51, 0x800)
+AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a1)
+AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, 0x50, 0x000)
  
  /* V8 big.LITTLE implementations.  */
  
-AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03")

-AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa72, "0x41", "0xd08.0xd03")
+AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE(0xd07, 0xd03))
+AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE(0xd08, 0xd03))
  
  
  #undef AARCH64_CORE

diff --git a/gcc/config/aarch64/driver-aarch64.c 
b/gcc/config/aarch64/driver-aarch64.c

Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when computing VF

2015-11-20 Thread Richard Biener
On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovich  wrote:
> 2015-11-18 16:44 GMT+03:00 Richard Biener :
>> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich  
>> wrote:
>>> Hi,
>>>
>>> When we compute vectypes we skip non-relevant phi nodes.  But we process 
>>> non-relevant alive statements and thus may need vectype of non-relevant 
>>> live phi node to compute mask vectype.  This patch enables vectype 
>>> computation for live phi nodes.  Bootstrapped and regtested on 
>>> x86_64-unknown-linux-gnu.  OK for trunk?
>>
>> Hmm.  What breaks if you instead skip all !relevant stmts and not
>> compute vectype for live but not relevant ones?  We won't ever
>> "vectorize" !relevant ones, that is, we don't need their vector type.
>
> I tried it and got a regression in SLP.  It expected a non-null vectype
> for a non-relevant but live statement. The regression was in
> gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90

Because somebody put a vector type check before

  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
return false;

@@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g
   tree mask_type;
   tree mask;

+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+return false;
+
   if (!VECTOR_BOOLEAN_TYPE_P (vectype))
 return false;

@@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;

   gcc_assert (ncopies >= 1);
-  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
-return false;

   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
   && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle

fixes this particular fallout for me.

Richard.

> Ilya
>
>>
>> Richard.
>>
>>> Thanks,
>>> Ilya


[Bug target/68456] UINT32_TYPE is long unsigned for 32bit targets

2015-11-20 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68456

--- Comment #1 from Jonathan Wakely  ---
(In reply to Yulia Koval from comment #0)
> This problem causes strange warnings, for example in printf("%u...", if this
> type is not overridden somewhere in the library:
> ": format '%u' expects argument of type 'unsigned int', but argument 2 has
> type 'uint32_t {aka long unsigned int}' "

Why are you using %u with a uint32_t ? That is not valid C.

> I also tried to build a newlib toolchain without this feature and it worked
> ok for me. This problem was found as a difference between gcc and llvm
> behavior.

That doesn't make it a bug. It is unspecified what type uint32_t is a synonym
for, the only requirement is that it's 32-bits.
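
A portable way to print a uint32_t, whichever underlying type it is, uses the format macros from `<cinttypes>`. A minimal sketch, with `format_u32` as an illustrative helper:

```cpp
#include <cassert>
#include <cinttypes>   // PRIu32
#include <cstdint>
#include <cstdio>
#include <string>

// Format VAL with the conversion specifier that matches uint32_t on the
// current target, whatever underlying type that is.
std::string format_u32 (std::uint32_t val)
{
  char buf[16];   // 10 digits plus the terminator fit comfortably
  int n = std::snprintf (buf, sizeof buf, "%" PRIu32, val);
  assert (n > 0 && n < (int) sizeof buf);
  return buf;
}
```

With PRIu32 the format string follows the target's choice of type, so the -Wformat warning quoted above does not arise.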

[Bug c++/68290] g++.dg/concepts/auto1.C FAILs

2015-11-20 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68290

--- Comment #3 from Eric Botcazou  ---
This issue is host-dependent, it doesn't reproduce with a cross to 64-bit
SPARC.

The problematic types are:

(gdb) p debug_tree(t1)
 >

(gdb) p debug_tree(t2)
 >

and they compare equal according to structural_comptypes but have distinct
TYPE_CANONICAL (themselves actually).

I think that the dependence on the host comes from:

inline hashval_t
auto_hash::hash (tree t)
{
  if (tree c = PLACEHOLDER_TYPE_CONSTRAINTS (t))
/* Matching constrained-type-specifiers denote the same template
   parameter, so hash the constraint.  */
return hash_placeholder_constraint (c);
  else
/* But unconstrained autos are all separate, so just hash the pointer.  */
return iterative_hash_object (t, 0);
}

and that we have a hash collision on the SPARC machine.

The problem seems to come from comp_template_parms_position:

  /* In C++14 we can end up comparing 'auto' to a normal template
 parameter.  Don't confuse them.  */
  if (cxx_dialect >= cxx14 && (is_auto (t1) || is_auto (t2)))
return TYPE_IDENTIFIER (t1) == TYPE_IDENTIFIER (t2);

IIUC we should compare t1 and t2 directly here if both are 'auto's.

[Bug middle-end/68339] g++.dg/vect/simd-clone-2.cc ICEs with aggressive GC settings and OpenMP

2015-11-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68339

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2015-11-20
                 CC|jakub at gcc dot gnu.org    |
           Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
   Target Milestone|---                         |5.3
     Ever confirmed|0                           |1

Re: basic asm and memory clobbers

2015-11-20 Thread Richard Henderson

On 11/20/2015 01:38 PM, David Wohlferd wrote:

On 11/20/2015 3:14 AM, Andrew Haley wrote:

On 20/11/15 10:37, David Wohlferd wrote:

The intent for 24414 is to change basic asm such that it will become
(quoting jeff) "an opaque blob that read/write/clobber any register or
memory location."  Such being the case, "memory" is not sufficient:

#define CLOBBERALL "eax", "ebx", "ecx", "edx", "r8", "r9", "r10", "r11",
"r12", "r13", "r14", "r15", "edi", "esi", "ebp", "cc", "memory"

Hmm.  I would not be at all surprised to see this cause reload
failures.  You certainly shouldn't clobber the frame pointer on
any machine which needs one.


If I don't clobber ebp, gcc just uses it:

 movl$1000, %ebp
.L2:
 #
 subl$1, %ebp
 jne .L2


I believe you'd have to have magic in there to conditionally clobber the 
register if it isn't being used as a frame pointer.


That said...


The original purpose of this code was to attempt to show that this kind of
"clobbering everything" behavior (the proposed new behavior for basic asm)
could have non-trivial impact on existing routines. While I've been told that
changing the existing "clobber nothing" approach to this kind of "clobber
everything" is "less intrusive than you might think," I'm struggling to believe
it.  It seems to me that one asm("nop") thrown into a driver routine to fix a
timing problem could end up making a real mess.

But actually we're kind of past that.  When Jeff, Segher, (other) Andrew and
Richard all say "this is how it's going to work," it's time for me to set aside
my reservations and move on.

So now I'm just trying my best to make sure that if it *is* an issue, people
have a viable solution readily available.  And to make sure it's all correctly
doc'ed (which is what started this whole mess).


I'd be perfectly happy to deprecate and later completely remove basic asm 
within functions.


Because IMO it's essentially useless.  It has no inputs, no outputs, and no way 
to tell the compiler what machine state has been changed.  We can say that "it 
clobbers everything", but that's not actually useful, and quite difficult as 
you're finding out.


It seems to me that it would be better to remove the feature, forcing what must 
be an extremely small number of users to audit and update to extended asm.
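As a rough illustration of what such an audit could produce, the sketch below states the effects of an asm explicitly using extended asm. The helper names are invented for this example and do not come from the thread or from GCC; the asm templates are deliberately empty so that the code builds on any target GCC supports.

```c
/* Hypothetical helpers, not taken from any GCC source or from the thread.
   The asm templates are empty so this builds on any target GCC supports.  */

/* Extended asm with an explicit "memory" clobber: the compiler must assume
   that memory may have been read or written, but it remains free to keep
   unrelated values in registers.  */
static inline void
compiler_barrier (void)
{
  __asm__ __volatile__ ("" : : : "memory");
}

/* Extended asm with an in/out operand: "+r" says the asm reads and writes
   VALUE in a register, so the compiler cannot assume anything about its
   value afterwards.  */
static inline int
launder_int (int value)
{
  __asm__ __volatile__ ("" : "+r" (value));
  return value;
}

/* The loop from the thread, with the effects of each asm spelled out.  */
static int
count_iterations (void)
{
  int count = 0;
  for (int x = 0; x < 1000; x++)
    {
      compiler_barrier ();
      count = launder_int (count) + 1;
    }
  return count;
}
```

Because each asm names exactly what it touches, the compiler does not have to fall back to treating it as clobbering everything.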




r~


[Bug tree-optimization/68417] [6 Regression] Missed vectorization opportunity when setting struct field

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68417

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-11-20
   Target Milestone|---                         |6.0
     Ever confirmed|0                           |1

--- Comment #3 from Richard Biener  ---
Confirmed.

[Bug rtl-optimization/68173] gcc takes a long time and a lot of memory with -O0 on source file with very large expression

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68173

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stevenb.gcc at gmail dot com

--- Comment #12 from Richard Biener  ---
callgrind points at bitmap_set_bit called via process_bb_lives ->
mark_regno_dead.
Maybe some code in that (the DCE code?) can be keyed on if (optimize).

in mark_regno_dead callgrind points to

  bitmap_set_bit (bb_killed_pseudos, regno);

being the expensive one.  I suppose the issue here is that the bit access
pattern to that bitmap is random which exposes our O(n) complexity of
bitmap_find_bit...  Indeed for bitmap_find_bit the linear search for
if (head->indx < indx) is the "hot" part.

I wonder if we should (finally) use a RB tree for bitmap.  I even remember
some patches posted to improve this (from Steven?) this or last year?
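To make the cost concrete, here is a minimal sketch of a sparse bitmap kept as a sorted linked list of fixed-size elements. It is not GCC's bitmap implementation (which, among other things, caches the last element used); the type and function names, the 64-bit element size and the `steps` counter are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define ELT_BITS 64u            /* hypothetical element size for the sketch */

struct elt
{
  struct elt *next;
  unsigned indx;                /* which 64-bit block this element covers */
  unsigned long long bits;
};

struct list_bitmap
{
  struct elt *first;
  unsigned long steps;          /* list nodes walked so far, for illustration */
};

/* Set BIT in B.  Return 1 if the bit was newly set, 0 if it was already set.  */
static int
list_bitmap_set_bit (struct list_bitmap *b, unsigned bit)
{
  unsigned indx = bit / ELT_BITS;
  unsigned long long mask = 1ULL << (bit % ELT_BITS);
  struct elt **link = &b->first;

  /* Linear search for the element: this walk is the O(n) part.  */
  while (*link && (*link)->indx < indx)
    {
      link = &(*link)->next;
      b->steps++;
    }

  /* Insert a new element if none covers INDX yet.  */
  if (!*link || (*link)->indx != indx)
    {
      struct elt *e = calloc (1, sizeof *e);
      assert (e != NULL);
      e->indx = indx;
      e->next = *link;
      *link = e;
    }

  if ((*link)->bits & mask)
    return 0;
  (*link)->bits |= mask;
  return 1;
}
```

With a random access pattern each call walks, on average, a sizeable part of the list, which is the behaviour a balanced tree would avoid.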

[patch] Document new overflow arithmetics patterns

2015-11-20 Thread Eric Botcazou
Hi,

this documents the new overflow arithmetic patterns added recently (addv4, 
subv4, mulv4, umulv4, negv3) and only them, i.e. the old ones are still not
documented.
This also fixes the description of the cbranch and jump patterns, which were 
referring to a label_ref instead of a code_label.

Tested with 'make doc' on x86-64/Linux, OK for mainline and 5 branch?


2015-11-20  Eric Botcazou  

* doc/md.texi (Standard Names): Move entry for addptr3 around,
add entries for addv4, subv4, mulv4, umulv4 and negv3, fixes
glitch in entries for cbranch4 and jump.

-- 
Eric Botcazou
Index: doc/md.texi
===
--- doc/md.texi	(revision 230589)
+++ doc/md.texi	(working copy)
@@ -4872,17 +4872,6 @@ Add operand 2 and operand 1, storing the
 must have mode @var{m}.  This can be used even on two-address machines, by
 means of constraints requiring operands 1 and 0 to be the same location.
 
-@cindex @code{addptr@var{m}3} instruction pattern
-@item @samp{addptr@var{m}3}
-Like @code{add@var{m}3} but is guaranteed to only be used for address
-calculations.  The expanded code is not allowed to clobber the
-condition code.  It only needs to be defined if @code{add@var{m}3}
-sets the condition code.  If adds used for address calculations and
-normal adds are not compatible it is required to expand a distinct
-pattern (e.g. using an unspec).  The pattern is used by LRA to emit
-address calculations.  @code{add@var{m}3} is used if
-@code{addptr@var{m}3} is not defined.
-
 @cindex @code{ssadd@var{m}3} instruction pattern
 @cindex @code{usadd@var{m}3} instruction pattern
 @cindex @code{sub@var{m}3} instruction pattern
@@ -4912,6 +4901,35 @@ address calculations.  @code{add@var{m}3
 @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3}
 Similar, for other arithmetic operations.
 
+@cindex @code{addv@var{m}4} instruction pattern
+@item @samp{addv@var{m}4}
+Like @code{add@var{m}3} but takes a @code{code_label} as operand 3 and
+emits code to jump to it if signed overflow occurs during the addition.
+This pattern is used to implement the built-in functions performing
+signed integer addition with overflow checking.
+
+@cindex @code{subv@var{m}4} instruction pattern
+@cindex @code{mulv@var{m}4} instruction pattern
+@item @samp{subv@var{m}4}, @samp{mulv@var{m}4}
+Similar, for other signed arithmetic operations.
+
+@cindex @code{umulv@var{m}4} instruction pattern
+@item @samp{umulv@var{m}4}
+Like @code{mulv@var{m}4} but for unsigned multiplication.  That is to
+say, the operation is the same as signed multiplication but the jump
+is taken only on unsigned overflow.
+
+@cindex @code{addptr@var{m}3} instruction pattern
+@item @samp{addptr@var{m}3}
+Like @code{add@var{m}3} but is guaranteed to only be used for address
+calculations.  The expanded code is not allowed to clobber the
+condition code.  It only needs to be defined if @code{add@var{m}3}
+sets the condition code.  If adds used for address calculations and
+normal adds are not compatible it is required to expand a distinct
+pattern (e.g. using an unspec).  The pattern is used by LRA to emit
+address calculations.  @code{add@var{m}3} is used if
+@code{addptr@var{m}3} is not defined.
+
 @cindex @code{fma@var{m}4} instruction pattern
 @item @samp{fma@var{m}4}
 Multiply operand 2 and operand 1, then add operand 3, storing the
@@ -5277,6 +5295,11 @@ Reverse the order of bytes of operand 1
 @item @samp{neg@var{m}2}, @samp{ssneg@var{m}2}, @samp{usneg@var{m}2}
 Negate operand 1 and store the result in operand 0.
 
+@cindex @code{negv@var{m}3} instruction pattern
+@item @samp{negv@var{m}3}
+Like @code{neg@var{m}2}  but takes a @code{code_label} as operand 2 and
+emits code to jump to it if signed overflow occurs during the negation.
+
 @cindex @code{abs@var{m}2} instruction pattern
 @item @samp{abs@var{m}2}
 Store the absolute value of operand 1 into operand 0.
@@ -5926,13 +5949,13 @@ from the machine description.
 Conditional branch instruction combined with a compare instruction.
 Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
 first and second operands of the comparison, respectively.  Operand 3
-is a @code{label_ref} that refers to the label to jump to.
+is the @code{code_label} to jump to.
 
 @cindex @code{jump} instruction pattern
 @item @samp{jump}
 A jump inside a function; an unconditional branch.  Operand 0 is the
-@code{label_ref} of the label to jump to.  This pattern name is mandatory
-on all machines.
+@code{code_label} to jump to.  This pattern name is mandatory on all
+machines.
 
 @cindex @code{call} instruction pattern
 @item @samp{call}
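For readers who have not used them, the built-in functions mentioned in the addv4 entry look like this at the source level. This is only a sketch: the wrapper names are invented here, and whether a particular target expands these calls through addv4, umulv4 or negv3 (negation is written below as a subtraction from zero) is an assumption that depends on the target.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical wrappers around GCC's overflow-checking built-ins.
   Each returns true and stores the result when no overflow occurs,
   and returns false when the operation overflows.  */

static bool
checked_add (int a, int b, int *sum)
{
  return !__builtin_add_overflow (a, b, sum);
}

static bool
checked_umul (unsigned a, unsigned b, unsigned *product)
{
  return !__builtin_mul_overflow (a, b, product);
}

/* Signed negation, expressed as 0 - a so that negating INT_MIN is caught.  */
static bool
checked_neg (int a, int *result)
{
  return !__builtin_sub_overflow (0, a, result);
}
```

The jump to the code_label described in the pattern corresponds to the "overflow" branch that the caller takes when one of these wrappers returns false.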


[Bug jit/68446] jit testsuite failures seen inside dwarf2out.c:gen_producer_string

2015-11-20 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68446

--- Comment #2 from Martin Liška  ---
Hi David.

I've just verified that the invalid free has been present since the
introduction of driver::finalize in r227188.

This is the valgrind report from r230263 (one revision before the
suspected one):

$ /home/marxin/Programming/gcc2/objdir/gcc/xgcc
-B/home/marxin/Programming/gcc2/objdir/gcc/
/home/marxin/Programming/gcc2/gcc/testsuite/jit.dg/test-combination.c
-fno-diagnostics-show-caret -fdiagnostics-color=never
-I/home/marxin/Programming/gcc2/gcc/testsuite/../jit -lgccjit -g -Wall -Werror
-Wl,--export-dynamic -fgnu89-inline -lm -o test-combination.c.exe
$ LD_LIBRARY_PATH=gcc valgrind --leak-check=yes ./test-volatile.c.exe

==20414== Invalid free() / delete / delete[] / realloc()
==20414==    at 0x4C2A7FB: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==20414==    by 0x504CE2D: driver::finalize() (gcc.c:9814)
==20414==    by 0x5031F3B: gcc::jit::playback::context::invoke_embedded_driver(vec const*) (jit-playback.c:2441)
==20414==    by 0x50346D6: gcc::jit::playback::context::invoke_driver(char const*, char const*, char const*, timevar_id_t, bool, bool) (jit-playback.c:2427)
==20414==    by 0x50350C9: gcc::jit::playback::context::convert_to_dso(char const*) (jit-playback.c:2357)
==20414==    by 0x5035165: gcc::jit::playback::compile_to_memory::postprocess(char const*) (jit-playback.c:1845)
==20414==    by 0x5033D65: gcc::jit::playback::context::compile() (jit-playback.c:1818)
==20414==    by 0x502A83E: gcc::jit::recording::context::compile() (jit-recording.c:1241)
==20414==    by 0x5020095: gcc_jit_context_compile (libgccjit.c:2677)
==20414==    by 0x402087: test_jit (harness.h:371)
==20414==    by 0x40217D: main (harness.h:419)
==20414==  Address 0x63f7b80 is 0 bytes inside data symbol "_ZL12static_specs"

Martin

Re: OpenACC declare directive updates

2015-11-20 Thread Jakub Jelinek
On Thu, Nov 19, 2015 at 10:22:16AM -0600, James Norris wrote:
> 2015-XX-XX  James Norris  
>   Cesar Philippidis  
> 
>   gcc/fortran/
>   * dump-parse-tree.c (show_namespace): Handle declares.
>   * gfortran.h (struct symbol_attribute): New fields.
>   (enum gfc_omp_map_map): Add OMP_MAP_DEVICE_RESIDENT and OMP_MAP_LINK.
>   (OMP_LIST_LINK): New enum.
>   (struct gfc_oacc_declare): New structure.
>   (gfc_get_oacc_declare): New definition.
>   (struct gfc_namespace): Change type.
>   (enum gfc_exec_op): Add EXEC_OACC_DECLARE.
>   (struct gfc_code): New field.
>   * module.c (enum ab_attribute): Add AB_OACC_DECLARE_CREATE,
>   AB_OACC_DECLARE_COPYIN, AB_OACC_DECLARE_DEVICEPTR,
>   AB_OACC_DECLARE_DEVICE_RESIDENT, AB_OACC_DECLARE_LINK.
>   (attr_bits): Add new initializers.
>   (mio_symbol_attribute): Handle new attributes.
>   * openmp.c (gfc_free_oacc_declare_clauses): New function.
>   (gfc_match_oacc_clause_link): Likewise.
>   (OMP_CLAUSE_LINK): New definition.
>   (gfc_match_omp_clauses): Handle OMP_CLAUSE_LINK.
>   (OACC_DECLARE_CLAUSES): Add OMP_CLAUSE_LINK.
>   (gfc_match_oacc_declare): Add checking and module handling.
>   (gfc_resolve_oacc_declare): Reimplement.
>   * parse.c (case_decl): Add ST_OACC_DECLARE.
>   (parse_spec): Remove handling.
>   (parse_progunit): Remove handling.
>   * parse.h (struct gfc_state_data): Change type.
>   * resolve.c (gfc_resolve_blocks): Handle EXEC_OACC_DECLARE.
>   * st.c (gfc_free_statement): Handle EXEC_OACC_DECLARE.
>   * symbol.c (check_conflict): Add conflict checks.
>   (gfc_add_oacc_declare_create, gfc_add_oacc_declare_copyin, 
>   gfc_add_oacc_declare_deviceptr, gfc_add_oacc_declare_device_resident):
>   New functions.
>   (gfc_copy_attr): Handle new symbols.
>   * trans-decl.c (add_clause, find_module_oacc_declare_clauses,
>   finish_oacc_declare): New functions.
>   (gfc_generate_function_code): Replace with call.
>   * trans-openmp.c (gfc_trans_oacc_declare): Reimplement.
>   (gfc_trans_oacc_directive): Handle EXEC_OACC_DECLARE.
>   * trans-stmt.c (gfc_trans_block_construct): Replace with call.
>   * trans-stmt.h (gfc_trans_oacc_declare): Remove argument.
>   * trans.c (trans_code): Handle EXEC_OACC_DECLARE.
> 
>   gcc/testsuite
>   * gfortran.dg/goacc/declare-1.f95: Update test.
>   * gfortran.dg/goacc/declare-2.f95: New test.
> 
>   libgomp/
>   * testsuite/libgomp.oacc-fortran/declare-1.f90: New test.
>   * testsuite/libgomp.oacc-fortran/declare-2.f90: Likewise.
>   * testsuite/libgomp.oacc-fortran/declare-3.f90: Likewise.
>   * testsuite/libgomp.oacc-fortran/declare-4.f90: Likewise.
>   * testsuite/libgomp.oacc-fortran/declare-5.f90: Likewise.
>   * testsuite/libgomp.oacc-fortran/declare-5.f90: Likewise.

Ok.

Jakub


Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage

2015-11-20 Thread Richard Biener
On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward  wrote:
> When vectorising an integer induction condition reduction,
> is_nonwrapping_integer_induction ends up with different values for base
> during the analysis and build phases. In the first it is an INTEGER_CST,
> in the second the loop has been vectorised out and the base is now a
> variable.
>
> This results in the analysis and build stage detecting the
> STMT_VINFO_VEC_REDUCTION_TYPE as different types.
>
> The easiest way to fix this is to only check for integer induction
> conditions on the analysis stage.

I don't like this.  For the evolution part we have added
STMT_VINFO_LOOP_PHI_EVOLUTION_PART.  If you now need
the original initial value as well then just save it.

Or if you really want to go with the hack then please do not call
is_nonwrapping_integer_induction with vec_stmt != NULL but
initialize cond_expr_is_nonwrapping_integer_induction from
STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info).

The hack also lacks a comment.

Richard.

> gcc/
> PR tree-optimization/68413
> * tree-vect-loop.c (vectorizable_reduction): Only check for
> integer cond reduction on analysis stage.
>
>
>
>
> Thanks,
> Alan.
>


[Bug testsuite/68457] New: make check RUNTESTFLAGS="--outdir=$OUT_PATH" does not work with -j

2015-11-20 Thread vaalfreja at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68457

Bug ID: 68457
   Summary: make check RUNTESTFLAGS="--outdir=$OUT_PATH" does not
work with -j
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vaalfreja at gmail dot com
  Target Milestone: ---

Only gcc.sum from the first thread is copied into $OUT_PATH if make check is
used with -j $COUNT.

Here is the log:
# of expected passes            1492
# of unexpected failures        16
# of expected failures          6
# of unresolved testcases       2
# of unsupported tests          36
/gcc_trunk_64bit/gcc/xgcc  version 6.0.0 20151022 (experimental) (GCC)

make[3]: Leaving directory `/gcc_trunk_64bit/gcc'
mv: cannot stat ‘testsuite/gcc/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc1/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc1/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc2/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc2/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc3/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc3/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc4/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc4/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc5/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc5/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc6/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc6/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc7/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc7/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc8/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc8/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc9/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc9/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc10/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc10/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc11/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc11/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc12/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc12/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc13/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc13/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc14/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc14/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc15/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc15/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc16/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc16/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc17/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc17/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc18/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc18/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc19/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc19/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc20/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc20/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc21/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc21/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc22/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc22/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc23/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc23/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc24/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc24/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc25/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc25/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc26/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc26/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc27/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc27/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc28/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc28/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc29/gcc.sum’: No such file or directory
mv: cannot stat ‘testsuite/gcc29/gcc.log’: No such file or directory
mv: cannot stat ‘testsuite/gcc30/gcc.sum’: No such file or 

Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument

2015-11-20 Thread Ilya Enkovich
On 19 Nov 18:19, Richard Biener wrote:
> On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt 
>  wrote:
> >On 11/19/2015 05:31 PM, Ilya Enkovich wrote:
> >> Currently we fold all memcpy/memmove calls with a known data size.
> >> It causes two problems when used with Pointer Bounds Checker.
> >> The first problem is that we may copy pointers as integer data
> >> and thus lose bounds.  The second problem is that if we inline
> >> memcpy, we also have to inline bounds copy and this may result
> >> in a huge amount of code and significant compilation time growth.
> >> This patch disables folding for functions we want to instrument.
> >>
> >> Does it look reasonable for trunk and GCC5 branch?  Bootstrapped
> >> and regtested on x86_64-unknown-linux-gnu.
> >
> >Can't see anything wrong with it. Ok.
> 
> But for small sizes this can have a huge impact on optimization.  Which is 
> why we have the code in the first place.  I'd make the check less broad, for 
> example inlining copies of size less than a pointer shouldn't be affected.

Right.  We may also inline when we know no pointers are copied.  Below is a 
version with an extended condition and a couple more tests.  Bootstrapped and 
regtested on x86_64-unknown-linux-gnu.  Is it OK for trunk and gcc-5-branch?
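The extended condition can be modelled outside the compiler roughly as follows. This is a standalone sketch only: the structure, its fields and the function name are invented for illustration and do not exist in GCC; the real check operates on trees inside gimple_fold_builtin_memory_op.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one memcpy/memmove call; none of these
   names exist in GCC.  */
struct copy_info
{
  bool instrumented;      /* the function is going to be bounds-instrumented */
  bool len_known;         /* the copy size is a compile-time constant */
  size_t len;             /* meaningful only when len_known is true */
  bool type_known;        /* the pointed-to source type is known and not void */
  bool type_has_pointer;  /* meaningful only when type_known is true */
};

/* Return true when the copy may still be folded (inlined).  */
static bool
may_fold_copy (const struct copy_info *c)
{
  /* Without instrumentation there are no bounds to lose.  */
  if (!c->instrumented)
    return true;

  /* A copy smaller than a pointer cannot carry a complete pointer.  */
  if (c->len_known && c->len < sizeof (void *))
    return true;

  /* A known source type that contains no pointers is safe to inline.  */
  if (c->type_known && !c->type_has_pointer)
    return true;

  /* Otherwise pointers may be copied: keep the call.  */
  return false;
}
```

The cases mirror the three functions in the new compile-only test: a char copy, a void copy of half a pointer, and a copy of a structure that holds only integers.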

> 
> Richard.
> 
> >
> >Bernd
> 
> 

Thanks,
Ilya
--
gcc/

2015-11-20  Ilya Enkovich  

* gimple-fold.c (gimple_fold_builtin_memory_op): Don't
fold call if we are going to instrument it and it may
copy pointers.

gcc/testsuite/

2015-11-20  Ilya Enkovich  

* gcc.target/i386/mpx/pr68337-1.c: New test.
* gcc.target/i386/mpx/pr68337-2.c: New test.
* gcc.target/i386/mpx/pr68337-3.c: New test.


diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 1ab20d1..dd9f80b 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -53,6 +53,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gomp-constants.h"
 #include "optabs-query.h"
 #include "omp-low.h"
+#include "tree-chkp.h"
+#include "ipa-chkp.h"
 
 
 /* Return true when DECL can be referenced from current unit.
@@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
   unsigned int src_align, dest_align;
   tree off0;
 
+  /* Inlining of memcpy/memmove may cause bounds lost (if we copy
+pointers as wide integer) and also may result in huge function
+size because of inlined bounds copy.  Thus don't inline for
+functions we want to instrument in case pointers are copied.  */
+  if (flag_check_pointer_bounds
+ && chkp_instrumentable_p (cfun->decl)
+ /* Even if data may contain pointers we can inline if copy
+less than a pointer size.  */
+ && (!tree_fits_uhwi_p (len)
+ || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0)
+ /* Check data type for pointers.  */
+ && (!TREE_TYPE (src)
+ || !TREE_TYPE (TREE_TYPE (src))
+ || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src)))
+ || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src)))))
+   return false;
+
   /* Build accesses at offset zero with a ref-all character type.  */
   off0 = build_int_cst (build_pointer_type_for_mode (char_type_node,
 ptr_mode, true), 0);
diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c 
b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c
new file mode 100644
index 000..3f8d79d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+
+#include "mpx-check.h"
+
+#define N 2
+
+extern void abort ();
+
+static int
+mpx_test (int argc, const char **argv)
+{
+  char ** src = (char **)malloc (sizeof (char *) * N);
+  char ** dst = (char **)malloc (sizeof (char *) * N);
+  int i;
+
+  for (i = 0; i < N; i++)
+src[i] = __bnd_set_ptr_bounds (argv[0] + i, i + 1);
+
+  __builtin_memcpy(dst, src, sizeof (char *) * N);
+
+  for (i = 0; i < N; i++)
+{
+  char *p = dst[i];
+  if (p != argv[0] + i
+ || __bnd_get_ptr_lbound (p) != p
+ || __bnd_get_ptr_ubound (p) != p + i)
+   abort ();
+}
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c 
b/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c
new file mode 100644
index 000..16736b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+/* { dg-final { scan-assembler-not "memcpy" } } */
+
+void
+test1 (char *dst, char *src)
+{
+  __builtin_memcpy (dst, src, sizeof (char *) * 2);
+}
+
+void
+test2 (void *dst, void *src)
+{
+  __builtin_memcpy (dst, src, sizeof (char *) / 2);
+}
+
+struct s
+{
+  int a;
+  int b;
+};
+
+void
+test3 (struct s *dst, struct s *src)
+{
+  

[Bug tree-optimization/68317] [6 regression] ice in set_value_range, at tree-vrp.c:380

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68317

--- Comment #10 from Richard Biener  ---
(In reply to Jiong Wang from comment #9)
> (In reply to Richard Biener from comment #7)
> > (In reply to Jiong Wang from comment #6)
> > > Created attachment 36741 [details]
> > > prototype-fix
> > > 
> > > diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
> > > index b614412..55a6334 100644
> > > --- a/gcc/tree-ssa-loop-manip.c
> > > +++ b/gcc/tree-ssa-loop-manip.c
> > > @@ -136,6 +136,11 @@ create_iv (tree base, tree step, tree var, struct 
> > > loop
> > > *loop,
> > >  gsi_insert_seq_on_edge_immediate (pe, stmts);
> > >  
> > >phi = create_phi_node (vb, loop->header);
> > > +  if (TREE_OVERFLOW (initial)
> > > +  && TREE_CODE (initial) == INTEGER_CST
> > > +  && int_fits_type_p (initial, TREE_TYPE (vb)))
> > > +initial = drop_tree_overflow (initial);
> > > +
> > >add_phi_arg (phi, initial, loop_preheader_edge (loop), 
> > > UNKNOWN_LOCATION);
> > >add_phi_arg (phi, va, loop_latch_edge (loop), UNKNOWN_LOCATION);
> > >  }
> > 
> > I think it's better to track down where the constant is generated.  I
> > see initial is created by
> > 
> > >   initial = force_gimple_operand (base, &stmts, true, var);
> > 
> > thus likely base is already the same constant (passed from the caller).
> > 
> > I usually set a breakpoint on the return statement of ggc_internal_alloc
> > conditional on the return value being the tree with the overflow.
> > 
> > Once the overflow value is returned from fold_* () it should be stripped
> > off its overflow flag.  Unconditionally so with just
> > 
> >   if (TREE_OVERFLOW_P (..))
> >.. = drop_tree_overflow (..);
> 
> Richard,
> 
>  After further investigation into where the overflow flag comes
>  from, I found there are too many possibilities.
> 
>  For example, for the testcase reported in PR68326, it originates in
>  fully_constant_expression in tree-ssa-pre.c: when handling tcc_unary,
>  fold_unary sets the overflow flag.
> 
>  For the testcase in this PR, there are quite a few OVF variables.
>  For the one that caused the ICE, the OVF is inherited from another OVF
>  variable, and the earliest point I can track it down to is in
>  tree-ssa-ccp.c: the tree variable "simplified" is simplified by the
>  gimple-fold infrastructure and concluded to have overflowed, which is
>  correct (the C source code is print(..."0x%08x...", (0xff4 + i) *
>  0x10...; the multiply is assumed to generate a signed int, and thus
>  overflows).  My understanding is that the flag is only used to generate
>  a warning, so I tried calling drop_tree_overflow, but later passes then
>  re-calculate the variable and re-set the overflow flag, for example in
>  chrec_fold*.
> 
>  I don't understand the related code base, and feel it would be dangerous
>  to just call drop_tree_overflow in those places.

Well, the GIMPLE IL should have _no_ constants with TREE_OVERFLOW set.
I even had checking code for that (but it tripped, obviously as you noticed ;))

>  On second thought, this ICE is caused by adjust_range_with_scev
>  getting range with overflowed constants min or max. So given there are
>  too many places to generate OVF, can we just do a check in
>  adjust_range_with_scev, if the constant min or max in the range info
>  can fit into the variable type, then naturally we should treat those
>  OVF as false alarm and drop them? something like the following, which I
>  think can fix the OVF side-effect caused by r230150.
> 
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index e2393e4..56440b1 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -4331,6 +4331,16 @@ adjust_range_with_scev (value_range *vr, struct loop
> *loop,
>   && is_positive_overflow_infinity (max)))
>  return;
> 
> +  if (TREE_CODE (min) == INTEGER_CST
> +  && TREE_OVERFLOW (min)
> +  && int_fits_type_p (min, type))
> +min = drop_tree_overflow (min);
> +
> +  if (TREE_CODE (max) == INTEGER_CST
> +  && TREE_OVERFLOW (max)
> +  && int_fits_type_p (max, type))
> +max = drop_tree_overflow (max);
> +
>set_value_range (vr, VR_RANGE, min, max, vr->equiv);
>  }

The constant will always be in-range so it doesn't make much sense in
this form.
Note also that positive/negative_overflow_infinities are to be preserved,
only other overflows need to be dropped here.

Yes, a workaround here might be ok in the end but in reality all those
other places you identified should be fixed.  So the above code should be

  if (TREE_OVERFLOW_P (min)
  && ! is_negative_overflow_infinity (min))
min = drop_tree_overflow (min);
  if (TREE_OVERFLOW_P (max)
  && ! is_positive_overflow_infinity (max))
max = drop_tree_overflow (max);

[Bug middle-end/68436] [5 Regression] wrong code on x86_64-linux-gnu

2015-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68436

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
   Target Milestone|---                         |5.3

--- Comment #1 from Richard Biener  ---
Well, certainly malloc () is not supposed to read from x->sm.sm_buffer if
that is what it does (not sure, the malloc implementation is not visible).

So I suppose what happens is that alloc_object () allocates from GC, that
GC knows that sm.sm_buffer is a pointer to other GC memory and the call
to malloc () triggers a GC run?

In that case yes, malloc () is not supposed to do GC.  If it does you need
to use -fno-builtin-malloc/free.

I consider this bug invalid.

Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

2015-11-20 Thread Richard Biener
On Thu, 19 Nov 2015, Tom de Vries wrote:

> On 16/11/15 13:45, Richard Biener wrote:
> > > I've eliminated all the uses for pass_tree_loop_init/pass_tree_loop_done
> > > in
> > > >the pass group. Instead, I've added conditional loop optimizer setup in:
> > > >-  pass_lim and pass_scev_cprop (added in this patch), and
> 
> Reposting the "Add pass_oacc_kernels pass group in passes.def" patch.
> 
> pass_scev_cprop is no longer part of the pass group.
> 
> And I've dropped the scev_initialize in pass_lim.
> 
> Pass_lim is part of the pass_tree_loop pass group, where AFAIU scev info is
> initialized at the start of the pass group and updated or reset by passes in
> the pass group if necessary, such that it's always available, or can be
> recalculated on the spot.
> 
> First, pass_lim doesn't invalidate scev info. And second, AFAIU pass_lim
> doesn't use scev info. So there doesn't seem to be a need to do anything about
> scev info for using pass_lim outside pass_tree_loop.
> 
> > > >- pass_parallelize_loops_oacc_kernels (added in patch "Add
> > > >   pass_parallelize_loops_oacc_kernels").
> > You miss calling scev_finalize ().
> 
> I've added the scev_finalize () in patch "Add
> pass_parallelize_loops_oacc_kernels".

 pass_lim::execute (function *fun)
 {
+  if (!loops_state_satisfies_p (LOOPS_NORMAL
+   | LOOPS_HAVE_RECORDED_EXITS))
+loop_optimizer_init (LOOPS_NORMAL
+| LOOPS_HAVE_RECORDED_EXITS);
+

note that this will, when not in the loop pipeline, not properly
fixup loops if LOOPS_NEED_FIXUP is set (that doesn't clear other
loop flags).  I'd rather make loop_optimizer_init do nothing
if requested flags are already set and no fixup is needed and
call the above unconditionally.  Thus sth like

Index: gcc/loop-init.c
===
--- gcc/loop-init.c (revision 230649)
+++ gcc/loop-init.c (working copy)
@@ -103,7 +103,11 @@ loop_optimizer_init (unsigned flags)
   calculate_dominance_info (CDI_DOMINATORS);
 
   if (!needs_fixup)
-   checking_verify_loop_structure ();
+   {
+ checking_verify_loop_structure ();
+ if (loops_state_satisfies_p (flags))
+   goto out;
+   }
 
   /* Clear all flags.  */
   if (recorded_exits)
@@ -122,11 +126,12 @@ loop_optimizer_init (unsigned flags)
   /* Apply flags to loops.  */
   apply_loop_flags (flags);
 
+  checking_verify_loop_structure ();
+
+out:
   /* Dump loops.  */
   flow_loops_dump (dump_file, NULL, 1);
 
-  checking_verify_loop_structure ();
-
   timevar_pop (TV_LOOP_INIT);
 }
 



   if (number_of_loops (fun) <= 1)
 return 0;
 
+  if (!loops_state_satisfies_p (LOOP_CLOSED_SSA))
+rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
+
   return tree_ssa_lim ();
 }

that looks bogus.  The into-loop-closed SSA rewrite should be
only done if the state _satisfies_ it.  I understand LIM doesn't
require loop-closed SSA.  But it also doesn't destroy it obviously.
So just remove that.



> Thanks,
> - Tom
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: basic asm and memory clobbers

2015-11-20 Thread David Wohlferd

On 11/20/2015 2:17 AM, Andrew Haley wrote:

On 20/11/15 01:23, David Wohlferd wrote:

I tried to picture the most basic case I can think of that uses
something clobber-able:

 for (int x=0; x < 1000; x++)
asm("#stuff");

This generates very simple and highly performant code:

  movl    $1000, %eax
.L2:
  #stuff
  subl    $1, %eax
  jne     .L2

Using extended asm to simulate the clobberall gives:

  movl    $1000, 44(%rsp)
.L2:
  #stuff
  subl    $1, 44(%rsp)
  jne     .L2

It allocates an extra 4 bytes, and changed everything to memory accesses
instead of using a register.

Can you show us your code?  I get

xx:
        movl    $1000, %eax
.L2:
        #stuff
        subl    $1, %eax
        jne     .L2
        rep; ret

for

void xx() {
   for (int x=0; x < 1000; x++)
 asm volatile("#stuff" : : : "memory");
}

What you're describing looks like a bug: x doesn't have its address
taken.


The intent for 24414 is to change basic asm such that it will become 
(quoting jeff) "an opaque blob that read/write/clobber any register or 
memory location."  Such being the case, "memory" is not sufficient:


#define CLOBBERALL "eax", "ebx", "ecx", "edx", "r8", "r9", "r10", "r11", \
"r12", "r13", "r14", "r15", "edi", "esi", "ebp", "cc", "memory"


int main()
{
   for (int x=0; x < 1000; x++)
  asm("#":::CLOBBERALL);
}
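
To make the two cases in this exchange easier to compare side by side, here is a minimal sketch, assuming GCC or Clang extended asm syntax. The function names are hypothetical, and the register list in the second function is only an illustrative x86-64 subset, not the full CLOBBERALL list above. The asm templates are empty, so the sketch shows only what each clobber list tells the compiler, not what any real asm would do.

```c
#include <assert.h>

/* Hypothetical helper: run n iterations, each containing an empty asm
   statement with only a "memory" clobber, and return the iteration count. */
static int count_with_memory_clobber(int n)
{
    int count = 0;

    assert(n >= 0);
    for (int x = 0; x < n; x++) {
        /* "memory" says the asm may read or write any memory, but it says
           nothing about registers, so x and count may stay in registers. */
        __asm__ __volatile__("" : : : "memory");
        count++;
    }
    return count;
}

/* Hypothetical helper: same loop, but the asm also declares a handful of
   registers as clobbered.  Register names are target-specific, so the
   list is only used on x86-64; other targets fall back to "memory". */
static int count_with_register_clobbers(int n)
{
    int count = 0;

    assert(n >= 0);
    for (int x = 0; x < n; x++) {
#if defined(__x86_64__)
        /* The compiler must assume these registers hold garbage after the
           asm, which shrinks the set available for x and count. */
        __asm__ __volatile__("" : : : "rax", "rcx", "rdx", "r8", "r9",
                             "r10", "r11", "cc", "memory");
#else
        __asm__ __volatile__("" : : : "memory");
#endif
        count++;
    }
    return count;
}
```

Both functions return the same value; any difference would show up only in the generated assembly (for example with -O2 -S), where a longer clobber list leaves fewer registers free for the loop counter.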

dw


Re: Add uaddv4_optab, usubv4_optab

2015-11-20 Thread Richard Henderson

On 11/20/2015 11:56 AM, Eric Botcazou wrote:

Eric has just submitted a documentation patch that documented the
{add,sub,mul,umul}v4 and negv3 patterns, so this should be
applied on top of that.


OK, I'm going to apply it, thanks.


Thanks.


Note that the comment at the beginning
of expand_addsub_overflow describing the overall strategy ought to be adjusted
if new patterns for the jump on carry are added.


Ok, I'll do that.


r~



Re: Add uaddv4_optab, usubv4_optab

2015-11-20 Thread Richard Henderson

On 11/20/2015 11:43 AM, Jakub Jelinek wrote:

+(define_expand "uaddv4"
+  [(parallel [(set (reg:CCC FLAGS_REG)
+  (compare:CCC
+(plus:SWI (match_dup 1) (match_dup 2))
+(match_dup 1)))
+ (set (match_dup 0)
+  (plus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+  (ne (reg:CCC FLAGS_REG) (const_int 0))
+  (label_ref (match_operand 3))
+  (pc)))]
+  ""
+{
+  ix86_fixup_binary_operands_no_copy (PLUS, mode, operands);
+})


Do we need this one on i?86?  I'm not against adding it to optabs, so that
other targets have a way to improve that, but doesn't combine already handle
this case well on i?86?


Perhaps combine can do the job, but in my opinion it's better to have the optab
than not.  Especially when it's so easy to add in this case.



+(define_expand "usubv4"
+  [(parallel [(set (reg:CC FLAGS_REG)
+  (compare:CC (match_dup 1) (match_dup 2)))
+ (set (match_dup 0)
+  (minus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+  (ltu (reg:CC FLAGS_REG) (const_int 0))
+  (label_ref (match_operand 3))
+  (pc)))]


If this works, it will be nice; I thought we would need a new CC*mode.


No, we just need to re-use the existing insn that performs the low half of a 
double-word subtraction operation.
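
For readers less familiar with the RTL, the conditions the two expanders above branch on can be sketched in portable C. This is only an illustration under stated assumptions: the helper names are hypothetical, and it shows the arithmetic facts the patterns rely on (the wrapped sum compared against the first operand, and an unsigned less-than on the subtraction operands), not what GCC emits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: unsigned add that reports carry-out.
   The sum wraps modulo 2^N, and a carry occurred exactly when the
   wrapped sum is smaller than the first operand. */
static bool uadd_carry(unsigned a, unsigned b, unsigned *sum)
{
    *sum = a + b;
    return *sum < a;
}

/* Hypothetical helper: unsigned subtract that reports borrow.
   A borrow occurred exactly when a < b as unsigned values. */
static bool usub_borrow(unsigned a, unsigned b, unsigned *diff)
{
    *diff = a - b;
    return a < b;
}

/* Small self-check of the two helpers. */
static void overflow_sketch_selfcheck(void)
{
    unsigned r;

    assert(uadd_carry(UINT_MAX, 1u, &r) && r == 0u);
    assert(!uadd_carry(1u, 2u, &r) && r == 3u);
    assert(usub_borrow(0u, 1u, &r) && r == UINT_MAX);
    assert(!usub_borrow(5u, 3u, &r) && r == 2u);
}
```

The usub case needs only an ordinary comparison of the two operands, which is consistent with the reply above that no new CC mode is required.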



Eric has just submitted a documentation patch that documented the
{add,sub,mul,umul}v4 and negv3 patterns, so this should be
applied on top of that.


Ok, I'll look out for that.


r~


[Bug c++/68409] Garbage added to a map instead of object

2015-11-20 Thread adrian.wielgosik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68409

Adrian Wielgosik changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adrian.wielgosik at gmail dot com

--- Comment #2 from Adrian Wielgosik  ---
Your operator< doesn't seem to satisfy strict weak ordering. Once I rewrote it
to a basic but safer version:

bool operator< (const chave& lhs, const chave& rhs) {
    if (lhs.numeros_ord != rhs.numeros_ord)
        return lhs.numeros_ord < rhs.numeros_ord;
    return lhs.estrelas_ord < rhs.estrelas_ord;
}

It seems to work fine to me.

(Also, when testing with clang/GCC, try using AddressSanitizer or valgrind,
you'll have a better chance of catching illegal memory accesses.)
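
As a rough way to spot-check a comparison like the one above, the sketch below restates it as a plain C function and tests two of the strict weak ordering requirements (irreflexivity and asymmetry) over a small sample. The struct, its int fields, and the function names are assumptions made for illustration; the reporter's real key type is not shown in this comment and may use different field types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reporter's key type; the real field
   types are not known from the comment. */
struct chave_sketch {
    int numeros_ord;
    int estrelas_ord;
};

/* Lexicographic less-than: compare the first field, and fall back to
   the second only when the first fields are equal. */
static bool chave_less(const struct chave_sketch *lhs,
                       const struct chave_sketch *rhs)
{
    if (lhs->numeros_ord != rhs->numeros_ord)
        return lhs->numeros_ord < rhs->numeros_ord;
    return lhs->estrelas_ord < rhs->estrelas_ord;
}

/* Spot-check over a sample: no key is less than itself, and no pair is
   less than each other in both directions.  Passing is necessary but
   not sufficient for a strict weak ordering. */
static bool looks_like_strict_ordering(const struct chave_sketch *keys, int n)
{
    assert(keys && n >= 0);
    for (int i = 0; i < n; i++) {
        if (chave_less(&keys[i], &keys[i]))
            return false;
        for (int j = 0; j < n; j++)
            if (chave_less(&keys[i], &keys[j]) &&
                chave_less(&keys[j], &keys[i]))
                return false;
    }
    return true;
}
```

A comparison that fails either check can corrupt an ordered container, which would be consistent with the garbage entries described in the report.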

Re: [PATCH][GCC][ARM] Disable neon testing for armv7-m

2015-11-20 Thread Kyrill Tkachov

Hi Andre,

On 18/11/15 09:44, Andre Vieira wrote:

On 17/11/15 10:10, James Greenhalgh wrote:

On Mon, Nov 16, 2015 at 01:15:32PM +, Andre Vieira wrote:

On 16/11/15 12:07, James Greenhalgh wrote:

On Mon, Nov 16, 2015 at 10:49:11AM +, Andre Vieira wrote:

Hi,

   This patch changes the target support mechanism to make it
recognize any ARM 'M' profile as a non-neon supporting target. The
current check only tests for armv6 architectures and earlier, and
does not account for armv7-m.

   This is correct because there is no 'M' profile that supports neon
and the current test is not sufficient to exclude armv7-m.

   Tested by running regressions for this testcase for various ARM targets.

   Is this OK to commit?

   Thanks,
   Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira 

 * gcc/testsuite/lib/target-supports.exp
   (check_effective_target_arm_neon_ok_nocache): Added check
   for M profile.



 From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Fri, 13 Nov 2015 11:16:34 +
Subject: [PATCH] Disable neon testing for armv7-m

---
  gcc/testsuite/lib/target-supports.exp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2854,7 +2854,7 @@ proc check_effective_target_arm_neon_ok_nocache { } {
  int dummy;
  /* Avoid the case where a test adds -mfpu=neon, but the toolchain is
 configured for -mcpu=arm926ej-s, for example.  */
-#if __ARM_ARCH < 7
+#if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M'
  #error Architecture too old for NEON.


Could you fix this #error message while you're here?

Why can't we change this test to look for the __ARM_NEON macro from ACLE?

#if __ARM_NEON < 1
   #error NEON is not enabled
#endif

Thanks,
James



There is a check for this already:
'check_effective_target_arm_neon'. I think the idea behind
arm_neon_ok is to check whether the hardware would support neon,
whereas arm_neon is to check whether neon was enabled, i.e.
-mfpu=neon was used or a mcpu was passed that has neon enabled by
default.

The comments for 'check_effective_target_arm_neon_ok_nocache'
highlight this, though maybe the comments for
check_effective_target_arm_neon could be better.

# Return 1 if this is an ARM target supporting -mfpu=neon
# -mfloat-abi=softfp or equivalent options.  Some multilibs may be
# incompatible with these options.  Also set et_arm_neon_flags to the
# best options to add.

proc check_effective_target_arm_neon_ok_nocache
...
/* Avoid the case where a test adds -mfpu=neon, but the toolchain is
configured for -mcpu=arm926ej-s, for example.  */
...


and

# Return 1 if this is a ARM target with NEON enabled.

proc check_effective_target_arm_neon
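
The distinction drawn here between "the target could support NEON" and "NEON is enabled" can be sketched as two compile-time checks in C. This is a hedged illustration, not the contents of target-supports.exp: the function names are hypothetical, and the first check simply mirrors the preprocessor condition from the patch, with an extra guard so it also compiles on non-ARM hosts.

```c
#include <assert.h>

/* Hypothetical helper mirroring the patched condition: returns 1 when
   the compile-time target could plausibly support NEON, else 0. */
static int target_may_support_neon(void)
{
#if !defined(__ARM_ARCH)
    return 0;   /* not an ARM target at all */
#elif __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M'
    return 0;   /* pre-v7 architecture, or an M profile without NEON */
#else
    return 1;
#endif
}

/* Hypothetical helper for the separate question: is NEON enabled for
   this compilation?  Both spellings of the macro are checked. */
static int neon_is_enabled(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Sanity relation: NEON should not be reported as enabled on a target
   that the first check rules out. */
static void check_neon_macros_consistent(void)
{
    assert(!neon_is_enabled() || target_may_support_neon());
}
```

On a non-ARM host both helpers return 0; the interesting cases only appear when cross-compiling with different -march, -mcpu, and -mfpu settings.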


OK, got it - sorry for my mistake, I had the two procs confused.

I'd still like to see the error message fixed; "Architecture too old for NEON."
is not an accurate description of the problem.

Thanks,
James



This OK?



This is OK.
I've committed it for you with a slightly tweaked ChangeLog entry:
2015-11-20  Andre Vieira  

* lib/target-supports.exp
(check_effective_target_arm_neon_ok_nocache): Add check
for M profile.

as r230653.

Thanks,
Kyrill



Cheers,
Andre



