Re: [dwarf, RFC] Emitting per-function dwarf info

2015-05-19 Thread Senthil Kumar Selvaraj
Ping!

Regards
Senthil

On Fri, Apr 10, 2015 at 12:19:36PM +0530, Senthil Kumar Selvaraj wrote:
> Hi,
> 
>  This (rather big) patch is an attempt to generate per function DWARF 
>  information for functions that go into their own sections (through 
>  -ffunction-sections or otherwise). This is so that the GNU linker's 
>  garbage collection mechanism can then remove all DWARF information when it 
>  removes an unmarked function section.
> 
>  Most of the code for splitting off the debug information was adapted from 
>  Jason's comdat-debug branch, with a few changes to move to the C++ 
>  collection types.
> 
>  We started out with section groups, aiming to generate one section group 
>  per function containing DWARF info AND code for the function, so that when 
>  the linker gc's a function, it removes the whole section group.
>  But it turned out to be difficult to split all DWARF information - data that 
>  goes into debug_ranges, for example, is maintained in a flat list, and we 
>  found it hard to split it per function.
> 
>  If not all debug information is in the section group, references from the 
>  "global" DWARF info into the section group result in the section group 
>  itself not getting garbage collected - effectively breaking gc-sections.
> 
>  We then figured that the GNU binutils linker removes sections with the 
>  SEC_DEBUGGING flag that are named with the function's text section as the 
>  suffix (see https://www.sourceware.org/ml/binutils/2015-03/msg00326.html),
>  so we currently generate sections named to match that, and then let the
>  linker work its magic.
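
A minimal sketch of the naming convention described above, for illustration only. It is not code from the patch or from binutils, and both function names are made up for the example; it only shows how a per-function debug section name pairs with its text section by suffix.

```cpp
#include <string>

// Hypothetical helper: build the per-function debug section name by
// appending the function's text section name to the base debug section.
std::string per_function_debug_section(const std::string &base,
                                       const std::string &text_section)
{
  return base + text_section;   // e.g. ".debug_info" + ".text.foo"
}

// Hypothetical check mirroring the suffix rule: a debug section is paired
// with a text section if its name ends with that section's name and some
// base name precedes the suffix.
bool debug_section_matches(const std::string &debug_section,
                           const std::string &text_section)
{
  if (debug_section.size() <= text_section.size())
    return false;
  return debug_section.compare(debug_section.size() - text_section.size(),
                               text_section.size(), text_section) == 0;
}
```

With these helpers, `.debug_aranges.text.foo` pairs with `.text.foo`, while the plain `.debug_info` section pairs with nothing and so stays in the link.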
> 
>  At a high level, this is what we do
> 
>  1. In gen_subprogram_die, we see if the fde should go into a separate CU and
>     if yes, record it in a hash_table.
>  2. In dwarf2_out_finish, we check if each subprogram die is in the table, and
>     if yes, create a new CU for the die, along with a DW_TAG_imported_unit
>     die referring to the main CU. The newly created CU die is also
>     recorded in the hash table.
>  3. In output_comp_unit, if the CU is in the hash table, we switch to
>     .debug_info instead of debug_info_section.
>  4. We do the same thing when generating aranges, pubnames and debug_loc.
> 
>  Bootstrapping x86_64-linux for c and c++ works, but there are quite a
>  few regression test failures. We'll analyze and follow-up on those. 
> 
>  The formatting style is inconsistent; we'll fix that as well.
> 
>  We wanted to put it out before going ahead further. How does this look? 
>  Are there any major problems with this approach?
> 
>  For code like this,
> 
> volatile int x;
> void foo() { x--; }
> int main() { return x; }
> 
>  $ ~/native/install/bin/gcc -ffunction-sections -g3 -Wa,-gdwarf-sections -fdwarf-sections test.c -c
>  $ ~/native/install/bin/objdump -h test.o
> 
>   5 .text.foo                0016      0050  2**0
>   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>   6 .text.main               001a      0066  2**0
>   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>   7 .debug_info.text.main    0044      0080  2**0
>   CONTENTS, RELOC, READONLY, DEBUGGING
>   8 .debug_info.text.foo     0040      00b1  2**0
>   CONTENTS, RELOC, READONLY, DEBUGGING
>   9 .debug_info              004d      00e5  2**0
>   CONTENTS, RELOC, READONLY, DEBUGGING
>  10 .debug_abbrev            007f      011e  2**0
>   CONTENTS, READONLY, DEBUGGING
>  11 .debug_aranges           0020      0194  2**0
>   CONTENTS, RELOC, READONLY, DEBUGGING
>  12 .debug_aranges.text.main 0030      01b3  2**0
>   CONTENTS, RELOC, READONLY, DEBUGGING
>  13 .debug_aranges.text.foo  0030      01d6  2**0
>   CONTENTS, RELOC, READONLY, DEBUGGING
>  14 .debug_ranges            0070      01f9  2**0
>   CONTENTS, RELOC, READONLY, DEBUGGING
> 
>  Linking and dumping debug info shows that there is no debug info for 
>  foo (which would have been gc'ed by the linker)
> 
>  $ ~/native/install/bin/gcc -Wl,--gc-sections test.o -Wl,-Map=test.map
>  $ cat test.map
> 
> Discarded input sections
> 
> 
>  .text.foo                0x   0x16 test.o
>  .debug_info.text.foo     0x   0x40 test.o
>  .debug_aranges.text.foo  0x   0x30 test.o
>  .note.GNU-stack          0x   0x0 test.o
>  .debug_line.text.foo     0x   0x12 test.o
> 
> 
>  $ ~/native/install/bin/objdump -Wi a.out
> Contents of the .debug_info sectio

Cleanup and improve canonical type construction in LTO

2015-05-19 Thread Jan Hubicka
Richard,
this is my attempt to make sense of TYPE_CANONICAL at LTO.  My understanding is
that gimple_canonical_types_compatible_p needs to return true for all pairs of
types that are considered compatible across compilation units for any of the
languages we support (and in a sane way for cross language, too) and moreover
it needs to form an equivalence so it can be used to do canonical type merging.

Now the C definition of type compatibility ignores type names and boils down
to a structural compare (which we get wrong for unions, but I will look into that
incrementally; C also explicitly requires field names to match, which we don't)
and it of course says that an incomplete type can match a complete one.

This is a bit generous on structures and unions, because every incomplete
RECORD_TYPE is compatible with every RECORD_TYPE in the program and similarly
every incomplete UNION_TYPE is compatible with every UNION_TYPE in the program.

Now from the fact that gimple_canonical_types_compatible_p must be an equivalence
(and thus transitive) we immediately get that there is no way to distinguish
between two RECORD_TYPEs (or UNION_TYPEs) at all: there may always
be an incomplete type that forces them to be equivalent.

This is not how the code works. gimple_canonical_types_compatible_p will not
match a complete type with an incomplete one, and this is not a problem only because
TYPE_CANONICAL matters for complete types only. The TBAA machinery never needs
alias sets of an incomplete type (modulo bugs).

More precisely we have two equivalences:
 1) full structural equivalence matching fields, array sizes and function
parameters, where pointer types are however recursively matched only with 2)
 2) structural equivalence ignoring any info from complete types:
here all RECORD_TYPEs are equal, so are UNION_TYPEs, for functions we
can only match return value (because of existence of non-prototypes),
for arrays only TREE_TYPE.
In this equivalence we also can't match TYPE_MODE of aggregates/arrays
because it may not be set for incomplete ones.

Now our implementation computes only 1), and 2) is approximated by
matching TREE_CODE of the pointed-to type.  This is unnecessarily pessimistic.
Pointer to pointer to int does not need to match pointer to pointer to
structure. 
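
To make the two equivalences concrete, here is a toy model, offered as a sketch rather than as what the patch does. The `Type` struct, the `Kind` enum and the `match_incomplete` flag are all invented for the example and are far simpler than GCC's trees; the only point is that a pointer drops the comparison into the weaker equivalence 2), which still looks through nested pointers.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-ins for GCC tree codes; none of this is GCC's real data structure.
enum class Kind { Integer, Pointer, Record };

struct Type {
  Kind kind;
  unsigned precision;                // used by Integer
  const Type *pointee;               // used by Pointer
  std::vector<const Type *> fields;  // used by a complete Record
};

// match_incomplete == false: equivalence 1), full structural comparison.
// match_incomplete == true:  equivalence 2), ignoring everything that only a
// complete type would provide, so all records compare equal.
bool compatible(const Type *a, const Type *b, bool match_incomplete)
{
  if (a->kind != b->kind)
    return false;
  switch (a->kind) {
  case Kind::Integer:
    return a->precision == b->precision;
  case Kind::Pointer:
    // Below a pointer we always use equivalence 2).
    return compatible(a->pointee, b->pointee, true);
  case Kind::Record:
    if (match_incomplete)
      return true;
    if (a->fields.size() != b->fields.size())
      return false;
    for (std::size_t i = 0; i < a->fields.size(); ++i)
      if (!compatible(a->fields[i], b->fields[i], false))
        return false;
    return true;
  }
  return false;
}
```

In this model two pointers to different structures are compatible, but a pointer to pointer to int is not compatible with a pointer to pointer to a structure, which is the distinction a plain TREE_CODE check on the pointed-to type cannot make.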

The patch below changes it in the following way:

 a) it adds MATCH_INCOMPLETE_TYPES parameter to
gimple_canonical_types_compatible_p and gimple_canonical_type_hash
to determine whether we compute equivalence 1) or 2).

The way we handle pointers is updated so we set MATCH_INCOMPLETE_TYPES
when recursing down to a pointer type.  This makes it possible for a
complete structure referring to an incomplete pointer type to be equivalent
to a complete structure referring to a complete pointer type.

I believe that with this definition we get the best possible equivalence
satisfying the rules above, and we do not need to care about SCCs - the
only way a type can refer to itself is via a pointer, and that will make us
drop to MATCH_INCOMPLETE_TYPES.

 b) it disables TYPE_CANONICAL calculation for incomplete types and function
types. It makes it clear that TYPE_CANONICAL is always 1) which is not
defined on these.

This seems to reduce the number of canonical types computed to 1/3.
We get a bit more recursion in gimple_canonical_types_compatible_p
and gimple_canonical_type_hash but only in MATCH_INCOMPLETE_TYPES mode,
which converges quite quickly.

I know that this is not how other FEs work, but that is because they
do have a type equivalence notion that includes TYPE_NAME, so it is possible
to determine TYPE_CANONICAL uniquely before the type is completed.

 c) adds sanity checking

- I can check that canonical_type_hash is not used for incomplete types
  (modulo ARRAY_TYPE that may appear as a field of complete structure)
  so the definition of cases we flip from complete to incomplete is 
  catching everything
- I can check that alias is never used on types we do not define
  TYPE_CANONICAL for. This actually catches a case in ipa-icf-gimple that
  tries to compare alias classes of void types.
- I added a check that we do not have prototypes of method types since
  I distinguish between methods and functions (this is useful so
  we also match the BASETYPE argument)

 d) drops TYPE_METHOD_BASETYPE hashing since it is not tested by
gimple_canonical_types_compatible_p.
It also drops matching of function type attributes, which seems wrong as discussed
earlier.
  

On Firefox the canonical table changes from:
[WPA] GIMPLE canonical type table: size 262139, 110992 elements, 2153232 searches, 808238 collisions (ratio: 0.375360)
[WPA] GIMPLE canonical type pointer-map: 110992 elements, 4723435 searches
to:
[WPA] GIMPLE canonical type table: size 65521, 32676 elements, 1634147 searches, 474609 collisions (ratio: 0.290432)
[WPA] GIMPLE canonical type pointer-map: 32676 elements, 2302719 searches

I also checked that

[PATCH] Fix PR target/65730

2015-05-19 Thread Max Filippov
2015-05-20  Max Filippov  
gcc/
* config/xtensa/xtensa.c (init_alignment_context): Replace MULT
by BITS_PER_UNIT with ASHIFT by exact_log2 (BITS_PER_UNIT).
---
 gcc/config/xtensa/xtensa.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index eb039ba..7296e36 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -1461,8 +1461,9 @@ init_alignment_context (struct alignment_context *ac, rtx mem)
   if (ac->shift != NULL_RTX)
 {
   /* Shift is the byte count, but we need the bitcount.  */
-  ac->shift = expand_simple_binop (SImode, MULT, ac->shift,
-  GEN_INT (BITS_PER_UNIT),
+  gcc_assert (exact_log2 (BITS_PER_UNIT) >= 0);
+  ac->shift = expand_simple_binop (SImode, ASHIFT, ac->shift,
+  GEN_INT (exact_log2 (BITS_PER_UNIT)),
   NULL_RTX, 1, OPTAB_DIRECT);
   ac->modemask = expand_simple_binop (SImode, ASHIFT,
  GEN_INT (GET_MODE_MASK (mode)),
-- 
1.8.1.4
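
The idea behind the change can be sketched outside of RTL: when the multiplier is a power of two, multiplying by it equals shifting left by its base-2 logarithm. The code below is only an illustration; `exact_log2_sketch` is a stand-in written for this example (it mimics the documented behaviour of GCC's `exact_log2`, returning -1 for non-powers of two), and `CHAR_BIT` plays the role of BITS_PER_UNIT.

```cpp
#include <climits>

// Stand-in for GCC's exact_log2: log2 of x when x is a power of two, else -1.
constexpr int exact_log2_sketch(unsigned long x)
{
  return (x == 0 || (x & (x - 1)) != 0)
             ? -1
             : (x == 1 ? 0 : 1 + exact_log2_sketch(x >> 1));
}

// The shift is only valid if the unit size really is a power of two,
// which is what the gcc_assert in the patch checks at run time.
static_assert(exact_log2_sketch(CHAR_BIT) >= 0,
              "CHAR_BIT must be a power of two");

// Convert a byte count to a bit count with a shift instead of a multiply.
constexpr unsigned long bytes_to_bits(unsigned long bytes)
{
  return bytes << exact_log2_sketch(CHAR_BIT);
}
```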



Re: [AArch64][PR65375] Fix RTX cost for vector SET

2015-05-19 Thread Kugan


On 07/05/15 17:24, James Greenhalgh wrote:
> On Wed, May 06, 2015 at 03:12:33AM +0100, Kugan wrote:
 gcc/ChangeLog:

 2015-04-24  Kugan Vivekanandarajah  
Jim Wilson  

* config/arm/aarch-common-protos.h (struct mem_cost_table): Added
new  fields loadv and storev.
* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
Initialize loadv and storev.
* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
(cortexa53_extra_costs): Likewise.
(cortexa57_extra_costs): Likewise.
(xgene1_extra_costs): Likewise.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
rtx_costs.
>>
>> Thanks James for the review. Attached patch changes this. Is this OK ?
> 
> Hi Kugan,
> 
> Thanks for sticking with it through a long review, sorry that the replies
> have been patchy, I'm still travelling.
> 
> This patch is OK for trunk, with an updated ChangeLog and assuming no
> regressions after a test run (And a quick check with some popular
> benchmarks if possible)

Committed as r223432 after fresh bootstrap and spec2k benchmarking.

Thanks,
Kugan


Re: [patch, gcc 5 regression] re-enable biarch for powerpc-linux-gnu

2015-05-19 Thread David Edelsohn
On Tue, May 19, 2015 at 8:03 PM, Sandra Loosemore
 wrote:
> On 05/19/2015 05:34 PM, Alan Modra wrote:
>>
>> On Tue, May 19, 2015 at 02:18:23PM -0400, David Edelsohn wrote:
>>>
>>> This seems reasonable to me.
>>>
>>> Alan, any thoughts from you?
>>
>>
>> Looks good.
>
>
> Thanks.  I've checked the patch into mainline (r223418).  I assume this is
> also a candidate for backporting to the same branches where the PR
> target/65286 patch went?

Okay.

Thanks, David


Fix minor memory leak in jump threader

2015-05-19 Thread Jeff Law


Honza pointed out we were leaking some memory in the jump threading 
code.  There are at least two leaks; this fixes the first.


Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
Installed on the trunk.
commit f4517a9d2f17b0890d9fe1159a3981985bed5672
Author: Jeff Law 
Date:   Tue May 19 20:20:57 2015 -0600

   * tree-ssa-threadupdate.c (thread_single_edge): Use delete_jump_thread
   instead of open-coded version.  Also delete the jump thread created
   within this function.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 95326a3..95114fa 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2015-05-20  Jeff Law  
+
+   * tree-ssa-threadupdate.c (thread_single_edge): Use delete_jump_thread
+   instead of open-coded version.  Also delete the jump thread created
+   within this function.
+
 2015-05-20  Alan Modra  
 
* config/rs6000/rs6000.c (rs6000_emit_allocate_stack): Return
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 0d61c18..c5b78a4 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -1647,9 +1647,7 @@ thread_single_edge (edge e)
   vec<jump_thread_edge *> *path = THREAD_PATH (e);
   edge eto = (*path)[1]->e;
 
-  for (unsigned int i = 0; i < path->length (); i++)
-delete (*path)[i];
-  delete path;
+  delete_jump_thread_path (path);
   e->aux = NULL;
 
   thread_stats.num_threaded_edges++;
@@ -1693,6 +1691,7 @@ thread_single_edge (edge e)
   redirect_edge_and_branch (e, rd.dup_blocks[0]);
   flush_pending_stmts (e);
 
+  delete_jump_thread_path (npath);
   return rd.dup_blocks[0];
 }
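
For readers unfamiliar with the pattern, the cleanup being centralised here looks roughly like the sketch below. It uses `std::vector` and a made-up `path_edge` type with an instance counter instead of GCC's `vec` and `jump_thread_edge`, so it is an analogy under those assumptions and not the actual helper from tree-ssa-threadupdate.c.

```cpp
#include <vector>

// Hypothetical stand-in for an edge on a jump thread path; the counter
// exists only so the example can show that nothing is leaked.
struct path_edge {
  static int live;          // number of objects currently allocated
  path_edge() { ++live; }
  ~path_edge() { --live; }
};
int path_edge::live = 0;

// One helper owns the cleanup: free every element, then the vector itself.
// Calling it at every exit avoids open-coded loops that are easy to forget.
void delete_path(std::vector<path_edge *> *path)
{
  for (path_edge *e : *path)
    delete e;
  delete path;
}
```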
 


fix class enum lookup

2015-05-19 Thread Nathan Sidwell
I've committed this obvious patch for 65954.  We failed to issue a diagnostic 
for a failed class enum lookup, and returned NULL, rather than error_mark_node, 
leading to a seg fault.


booted & tested on x86_64-linux.

nathan
2015-05-19  Nathan Sidwell  

	cp/
	PR c++/65954
	* typeck.c (finish_class_member_access_expr): Diagnose failed
	lookup of enum class member.

	testsuite/
	* g++.dg/cpp0x/pr65954.C: New.

Index: cp/typeck.c
===================================================================
--- cp/typeck.c	(revision 223414)
+++ cp/typeck.c	(working copy)
@@ -2731,6 +2731,14 @@ finish_class_member_access_expr (tree ob
 		  return error_mark_node;
 		}
 	  tree val = lookup_enumerator (scope, name);
+	  if (!val)
+		{
+		  if (complain & tf_error)
+		error ("%qD is not a member of %qD",
+			   name, scope);
+		  return error_mark_node;
+		}
+	  
 	  if (TREE_SIDE_EFFECTS (object))
 		val = build2 (COMPOUND_EXPR, TREE_TYPE (val), object, val);
 	  return val;
Index: testsuite/g++.dg/cpp0x/pr65954.C
===================================================================
--- testsuite/g++.dg/cpp0x/pr65954.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/pr65954.C	(working copy)
@@ -0,0 +1,12 @@
+// { dg-do compile { target c++11 } }
+
+struct Shape {
+  enum class Type
+  { Circle, Square };
+};
+
+
+void Foo (Shape &shape)
+{
+  +shape.Type::NOPE; // { dg-error "is not a member of" }
+}


[PING] [C++17] Implement N3928 - Extending static_assert

2015-05-19 Thread Ed Smith-Rowland

On 05/02/2015 04:16 PM, Ed Smith-Rowland wrote:

This extends static_assert to not require a message string.
I elected to make this work also for C++11 and C++14 and warn only 
with -pedantic.

I think many people just write
  static_assert(thing, "");
.

I took the path of building an empty string in the parser in this case.
I wasn't sure if setting message to NULL_TREE would cause sadness 
later on or not.


I also, perhaps in a fit of overzealousness, made finish_static_assert 
not print the extra ": " and an empty message in this case.
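
As a small illustration of the two spellings (a sketch only; the constant and the helper function are invented for the example, and the message-less form assumes a compiler that accepts N3928, which became standard in C++17):

```cpp
#include <cstddef>

constexpr std::size_t kMinIntBytes = 2;   // illustrative constant

// Classic form: a message string is required, and is often left empty.
static_assert(sizeof(int) >= kMinIntBytes, "");

// Message-less form proposed in N3928.
static_assert(sizeof(int) >= kMinIntBytes);

// Same condition as a function, so it can also be checked at run time.
constexpr bool int_is_wide_enough()
{
  return sizeof(int) >= kMinIntBytes;
}
```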


I didn't modify _Static_assert for C.

Built and tested on x86_64-linux.

OK



Ping? Comments?




Re: [PATCH 4/4] Split-stack arg pointer init refinement

2015-05-19 Thread Alan Modra
Thanks for reviewing!  Somehow I missed seeing your OK in yesterday's
email.  I have far too many emails in my inbox.  Patch series
committed as revisions 223424, 223425, 223426, and 223427.  I made a
change to the comment for rs6000_supports_split_stack, instead of

/* -fsplit-stack uses a field in the TCB, available with glibc-2.18.  */

writing

/* -fsplit-stack uses a field in the TCB, available with glibc-2.19.
   We also allow 2.18 because alignment padding guarantees that the
   space is available there too.  */

because it's a lie to say the field was there in 2.18.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 1/4] rs6000_stack_info changes for -fsplit-stack

2015-05-19 Thread Alan Modra
On Mon, May 18, 2015 at 02:05:59PM -0400, David Edelsohn wrote:
> On Sun, May 17, 2015 at 10:54 PM, Alan Modra  wrote:
> > This patch changes rs6000_stack_info to keep save area offsets even
> > when not used.  I need lr_save_offset valid for split-stack, and it
> > seemed reasonable to treat the other offsets the same.  Not zeroing
> > the offsets requires just one change in code that uses them, the
> > use_backchain_to_restore_sp expression in rs6000_emit_epilogue, not
> > counting the debug_stack_info changes.
> >
> > * config/rs6000/rs6000.c (rs6000_stack_info): Don't zero offsets
> > when not saving registers.
> > (debug_stack_info): Adjust to omit printing unused offsets,
> > as before.
> > (rs6000_emit_epilogue): Adjust use_backchain_to_restore_sp
> > expression.
> 
> I think that the vrsave_save_offset change may break saving of
> callee-saved VRs.  See PR 55276.

I checked.  It doesn't break that testcase.  PR 55276 was really
caused by using vrsave_mask for two purposes, firstly to track which
altivec registers have been saved, and secondly to control use of the
vrsave stack slot and whether mfvrsave/mtvrsave insns are generated.
Patch 2/4 removes this conflation.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 05:10:11PM -0700, H.J. Lu wrote:
> On Tue, May 19, 2015 at 1:54 PM, Rich Felker  wrote:
> > On Tue, May 19, 2015 at 01:27:06PM -0700, H.J. Lu wrote:
> >> On Tue, May 19, 2015 at 1:15 PM, Rich Felker  wrote:
> >> > On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
> >> >> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson wrote:
> >> >> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
> >> >> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson wrote:
> >> >> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
> >> >> >>>> I'm still mildly worried that concerns for supporting
> >> >> >>>> relaxation might lead to decisions not to optimize code in ways
> >> >> >>>> that would be difficult to relax (e.g. certain types of address
> >> >> >>>> load reordering or hoisting) but I don't understand GCC internals
> >> >> >>>> sufficiently to know if this concern is warranted or not.
> >> >> >>>
> >> >> >>> It is.  The relaxation that HJ is working on requires that the
> >> >> >>> reads from the got not be hoisted.  I'm not especially convinced
> >> >> >>> that what he's working on is a win.
> >> >> >>>
> >> >> >>> With LTO, the compiler can do the same job that he's attempting
> >> >> >>> in the linker, without an extra nop.  Without LTO, leaving it to
> >> >> >>> the linker means that you can't hoist the load and hide the
> >> >> >>> memory latency.
> >> >> >>>
> >> >> >>
> >> >> >> My relax approach won't take away any optimization done by compiler.
> >> >> >> It simply turns indirect branch into direct branch with a nop
> >> >> >> prefix at link-time.  I am having a hard time to understand why
> >> >> >> we shouldn't do it.
> >> >> >
> >> >> > I well understand what you're doing.
> >> >> >
> >> >> > But my point is that the only time the compiler should present you
> >> >> > with the form of indirect branch you're looking for is when there's
> >> >> > no place to hoist the load.
> >> >> >
> >> >> > At which point, is it really worth adding a new relocation to the
> >> >> > ABI?  Is it really worth adding new code to the linker that won't
> >> >> > be exercised often?
> >> >>
> >> >> I believe there are plenty of indirect branches via GOT when compiling
> >> >> PIE/PIC with -fno-plt:
> >> >>
> >> >> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> >> >> extern void foo (void);
> >> >>
> >> >> void
> >> >> bar (void)
> >> >> {
> >> >>   foo ();
> >> >> }
> >> >> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
> >> >> [hjl@gnu-6 gcc]$ cat x.s
> >> >>         .file "x.c"
> >> >>         .section .text.unlikely,"ax",@progbits
> >> >> .LCOLDB0:
> >> >>         .text
> >> >> .LHOTB0:
> >> >>         .p2align 4,,15
> >> >>         .globl bar
> >> >>         .type bar, @function
> >> >> bar:
> >> >> .LFB0:
> >> >>         .cfi_startproc
> >> >>         jmp *foo@GOTPCREL(%rip)
> >> >>         .cfi_endproc
> >> >> .LFE0:
> >> >>         .size bar, .-bar
> >> >
> >> > I agree these exist. What I question is whether the savings from the
> >> > linker being able to relax this to a direct call in the case where the
> >> > programmer failed to let the compiler make it a direct call to begin
> >> > with (by using hidden or protected visibility) are worth the cost of
> >> > not being able to hoist the load out of loops or schedule it earlier
> >> > in cases where relaxation is not possible because the call target is
> >> > not defined in the same DSO.
> >>
> >> Just for fun.  I compiled binutils as PIE with -fno-plt -flto:
> >>
> >> [hjl@gnu-mic-2 gas]$ file as-new
> >> as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
> >> dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not
> >> stripped
> >> [hjl@gnu-mic-2 gas]$
> >>
> >> There are 43:
> >>
> >> ff 25 21 93 2d 00 jmpq   *0x2d9321(%rip)  # 3d5f58 <_DYNAMIC+0x1e8>
> >>
> >> and 1983
> >>
> >> ff 15 eb f4 38 00 callq  *0x38f4eb(%rip)  # 3d60e0 <_DYNAMIC+0x370>
> >
> > How many of those would be relaxed? I suspect it depends a lot on
> > whether libbfd is static or shared.
> 
> When shared libraries are enabled, there are 177 indirect branches
> to locally defined functions.  Call to any locally defined functions,
> which aren't compiled with LTO, is indirect.

And are the above indirect calls/jumps (1983+43) candidates for
scheduling/hoisting the address load (that's not being done yet), or
are they the ones the compiler opted not to schedule/hoist? The win
from relaxation seems small here, but as long as you're not going to
block optimizations that would preclude relaxing, I don't see any
disadvantages to doing it.
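
For reference, the visibility route mentioned earlier in the thread can be sketched as follows. This assumes GCC or Clang on an ELF target (the attribute is a compiler extension), and the function names are invented for the example. A hidden symbol cannot be preempted by another DSO, so the compiler is free to emit a direct call without going through the GOT or PLT.

```cpp
// GCC/Clang-specific attribute; assumes an ELF target.
__attribute__((visibility("hidden"))) int local_helper(int x);

int local_helper(int x)
{
  return x + 1;
}

// With -fPIC this call can bind locally because local_helper is hidden,
// so no link-time relaxation is needed to turn it into a direct call.
int caller(int x)
{
  return local_helper(x);
}
```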

Rich


[PATCH] fixup hash table descriptor in winnt.c

2015-05-19 Thread tbsaunde+gcc
From: Trevor Saunders 

Hi,

This is a straightforward fixup of the hash table descriptor in winnt.c
causing the PR.


Tested that a cross to i686-cygwin now builds; committing to trunk.

Trev

gcc/ChangeLog:

2015-05-19  Trevor Saunders  

PR c++/65835
* config/i386/winnt.c (struct wrapped_symbol_hasher): Change
value_type to const char *.
---
 gcc/ChangeLog   | 6 ++
 gcc/config/i386/winnt.c | 6 +++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7df764b..4131b90 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2015-05-19  Trevor Saunders  
+
+   PR c++/65835
+   * config/i386/winnt.c (struct wrapped_symbol_hasher): Change
+   value_type to const char *.
+
 2015-05-19  Sandra Loosemore  
 
* config.gcc [powerpc*-*-linux*]: Allow --enable-targets=all
diff --git a/gcc/config/i386/winnt.c b/gcc/config/i386/winnt.c
index e698cd5..da67f5f 100644
--- a/gcc/config/i386/winnt.c
+++ b/gcc/config/i386/winnt.c
@@ -738,11 +738,11 @@ i386_pe_record_stub (const char *name)
 
 struct wrapped_symbol_hasher : typed_noop_remove 
 {
-  typedef char *value_type;
-  typedef char *compare_type;
+  typedef const char *value_type;
+  typedef const char *compare_type;
   static inline hashval_t hash (const char *);
   static inline bool equal (const char *, const char *);
-  static inline void remove (char *);
+  static inline void remove (const char *);
 };
 
 inline hashval_t
-- 
2.4.0.78.g7c6ecbf



Re: [PATCH 3/4] split-stack for powerpc64

2015-05-19 Thread Alan Modra
On Tue, May 19, 2015 at 07:40:15AM -0500, Lynn A. Boger wrote:
> Questions on the use of the options for split stack:
> 
> - The way this is implemented, split stack is generated if the
> target platform supports split stack, on ppc64/ppc64le as well
> as on x86, and the use of -fno-split-stack doesn't seem to affect it
> for any of these.  Is that the way it should work?  I would expect
> -fno-split-stack to disable it completely.

Can you give a testcase to show what you mean?  Picking one of the go
testsuite programs at random, I see
$ gcc/xgcc -Bgcc/ -S -I powerpc64le-linux/libgo /src/gcc-virgin/gcc/testsuite/go.test/test/args.go
$ grep morestack args.s
bl __morestack
bl __morestack
$ gcc/xgcc -Bgcc/ -fno-split-stack -S -I powerpc64le-linux/libgo /src/gcc-virgin/gcc/testsuite/go.test/test/args.go
$ grep morestack args.s
$
That shows -fno-split-stack being honoured.

> - The comments say that the gold linker is used for some
> situations but I don't see any reference in the code to enabling
> the gold linker for ppc64le, ppc64, or x86.  Is the user expected
>  to add the option for the gold linker if needed?

At the moment I believe this is true.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 1:54 PM, Rich Felker  wrote:
> On Tue, May 19, 2015 at 01:27:06PM -0700, H.J. Lu wrote:
>> On Tue, May 19, 2015 at 1:15 PM, Rich Felker  wrote:
>> > On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
>> >> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson wrote:
>> >> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
>> >> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson wrote:
>> >> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
>> >> >>>> I'm still mildly worried that concerns for supporting
>> >> >>>> relaxation might lead to decisions not to optimize code in ways that
>> >> >>>> would be difficult to relax (e.g. certain types of address load
>> >> >>>> reordering or hoisting) but I don't understand GCC internals
>> >> >>>> sufficiently to know if this concern is warranted or not.
>> >> >>>
>> >> >>> It is.  The relaxation that HJ is working on requires that the reads
>> >> >>> from the got not be hoisted.  I'm not especially convinced that what
>> >> >>> he's working on is a win.
>> >> >>>
>> >> >>> With LTO, the compiler can do the same job that he's attempting in
>> >> >>> the linker, without an extra nop.  Without LTO, leaving it to the
>> >> >>> linker means that you can't hoist the load and hide the memory
>> >> >>> latency.
>> >> >>>
>> >> >>
>> >> >> My relax approach won't take away any optimization done by compiler.
>> >> >> It simply turns indirect branch into direct branch with a nop prefix at
>> >> >> link-time.  I am having a hard time to understand why we shouldn't do
>> >> >> it.
>> >> >
>> >> > I well understand what you're doing.
>> >> >
>> >> > But my point is that the only time the compiler should present you with
>> >> > the form of indirect branch you're looking for is when there's no place
>> >> > to hoist the load.
>> >> >
>> >> > At which point, is it really worth adding a new relocation to the ABI?
>> >> > Is it really worth adding new code to the linker that won't be
>> >> > exercised often?
>> >>
>> >> I believe there are plenty of indirect branches via GOT when compiling
>> >> PIE/PIC with -fno-plt:
>> >>
>> >> [hjl@gnu-6 gcc]$ cat /tmp/x.c
>> >> extern void foo (void);
>> >>
>> >> void
>> >> bar (void)
>> >> {
>> >>   foo ();
>> >> }
>> >> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
>> >> [hjl@gnu-6 gcc]$ cat x.s
>> >>         .file "x.c"
>> >>         .section .text.unlikely,"ax",@progbits
>> >> .LCOLDB0:
>> >>         .text
>> >> .LHOTB0:
>> >>         .p2align 4,,15
>> >>         .globl bar
>> >>         .type bar, @function
>> >> bar:
>> >> .LFB0:
>> >>         .cfi_startproc
>> >>         jmp *foo@GOTPCREL(%rip)
>> >>         .cfi_endproc
>> >> .LFE0:
>> >>         .size bar, .-bar
>> >
>> > I agree these exist. What I question is whether the savings from the
>> > linker being able to relax this to a direct call in the case where the
>> > programmer failed to let the compiler make it a direct call to begin
>> > with (by using hidden or protected visibility) are worth the cost of
>> > not being able to hoist the load out of loops or schedule it earlier
>> > in cases where relaxation is not possible because the call target is
>> > not defined in the same DSO.
>>
>> Just for fun.  I compiled binutils as PIE with -fno-plt -flto:
>>
>> [hjl@gnu-mic-2 gas]$ file as-new
>> as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
>> dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not
>> stripped
>> [hjl@gnu-mic-2 gas]$
>>
>> There are 43:
>>
>> ff 25 21 93 2d 00 jmpq   *0x2d9321(%rip)  # 3d5f58 <_DYNAMIC+0x1e8>
>>
>> and 1983
>>
>> ff 15 eb f4 38 00 callq  *0x38f4eb(%rip)  # 3d60e0 <_DYNAMIC+0x370>
>
> How many of those would be relaxed? I suspect it depends a lot on
> whether libbfd is static or shared.

When shared libraries are enabled, there are 177 indirect branches
to locally defined functions.  Call to any locally defined functions,
which aren't compiled with LTO, is indirect.

-- 
H.J.


Re: [patch, gcc 5 regression] re-enable biarch for powerpc-linux-gnu

2015-05-19 Thread Sandra Loosemore

On 05/19/2015 05:34 PM, Alan Modra wrote:
> On Tue, May 19, 2015 at 02:18:23PM -0400, David Edelsohn wrote:
>> This seems reasonable to me.
>>
>> Alan, any thoughts from you?
>
> Looks good.


Thanks.  I've checked the patch into mainline (r223418).  I assume this 
is also a candidate for backporting to the same branches where the PR 
target/65286 patch went?


-Sandra



Re: [patch, gcc 5 regression] re-enable biarch for powerpc-linux-gnu

2015-05-19 Thread Alan Modra
On Tue, May 19, 2015 at 02:18:23PM -0400, David Edelsohn wrote:
> This seems reasonable to me.
> 
> Alan, any thoughts from you?

Looks good.

-- 
Alan Modra
Australia Development Lab, IBM


[SH][committed] Fix pr64366.c test case

2015-05-19 Thread Oleg Endo
Hi,

The SH testcase pr64366 would fail on SH2A because -m4 -ml is specified
in dg-options of the test case.  Target options such as CPU and
endianness are usually not specified in the test cases directly, but
when invoking the test suite.  Fixed with the attached patch.  Committed
as r223417.

Cheers,
Oleg

testsuite/ChangeLog:
* gcc.target/sh/pr64366.c: Remove -m4 -ml from dg-options.
Index: gcc/testsuite/gcc.target/sh/pr64366.c
===================================================================
--- gcc/testsuite/gcc.target/sh/pr64366.c	(revision 223416)
+++ gcc/testsuite/gcc.target/sh/pr64366.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -m4 -ml -mlra" } */
+/* { dg-options "-O2 -mlra" } */
 
 typedef int int8_t __attribute__ ((__mode__ (__QI__)));
 typedef int int16_t __attribute__ ((__mode__ (__HI__)));


Re: [PATCH] add self-tuning to x86 hardware fast path in libitm

2015-05-19 Thread Torvald Riegel
On Mon, 2015-05-18 at 23:27 -0400, Nuno Diegues wrote:
> On Mon, May 18, 2015 at 5:29 PM, Torvald Riegel  wrote:
> >
> > Are there better options for the utility function, or can we tune it to
> > be less affected by varying txn length and likelihood of txnal vs.
> > nontxnal code?  What are the things we want to avoid with the tuning?  I
> > can think of:
> > * Not needing to wait for serial txns, or being aborted by a serial txn.
> > * Not retrying using HTM too much so that the retry time is larger than
> > the scalability we actually gain by having other txns commit
> > concurrently.
> 
> 
> Yes, those are the key points we want to make sure that do not happen.
> 
> >
> >
> > Anything else?  Did you consider other utility functions during your
> > research?
> 
> 
> The txnal vs nontxnal is indeed a completely different story. To account for
> this we would need extra book-keeping to count only cycles spent inside
> txnal code. So this would require every thread (or a sample of threads) to
> perform a rdtsc (or equivalent) on every begin/end call rather than the
> current approach of a single rdtsc per optimization round.
> 
> With this type of online optimization we found that the algorithm had to be
> very simple and cheap to execute. RDTSC was a good finding to fit this, and
> it enabled us to obtain gains. Other time sources failed to do so.
> 
> I do not have, out of the box, a good alternative to offer. I suppose it would
> take some iterations of thinking/experimenting with, just like with any
> research problem :)

So let's iterate on this in parallel with the other changes that we need
to get in place.  I'd prefer to have some more confidence that measuring
txn throughput in epochs is the best way forward.
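
To make the discussion concrete, here is a minimal sketch of what "measuring txn throughput in epochs" could look like, assuming one timestamp pair per optimization round as described above. The struct, the function names and the "commits per million cycles" unit are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-epoch sample: one timestamp pair per optimization
   round (e.g. from rdtsc), plus the commits observed in between.  */
struct epoch_sample
{
  uint64_t start_cycles;
  uint64_t end_cycles;
  uint64_t commits;
};

/* Commits per million cycles; an empty or inverted interval (for
   example after a thread migration) yields 0.  */
static uint64_t
epoch_throughput (const struct epoch_sample *s)
{
  assert (s != NULL);
  if (s->end_cycles <= s->start_cycles)
    return 0;
  return (s->commits * 1000000u) / (s->end_cycles - s->start_cycles);
}

/* Keep the last tuning step only if throughput did not drop.  */
static int
keep_tuning_step (const struct epoch_sample *prev,
                  const struct epoch_sample *cur)
{
  return epoch_throughput (cur) >= epoch_throughput (prev);
}
```

Note that this utility function counts every cycle in the epoch, txnal or not, which is exactly the weakness discussed above.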

Here are a few thoughts:

Why do we actually need to measure succeeded transactions?  If a HW txn
is just getting committed without interfering with anything else, is
this really different from, say, two HW txns getting committed?  Aren't
we really just interested in wasted work, or delayed txns?  That may
help taking care of the nontxnal vs. txnal problem.

Measuring committed txns during a time that might otherwise be spent by
a serial txn could be useful to figure out how much other useful work a
serial txn prevents.  But we'd see that as well if we'd just go serial
during the auto-tuning because then concurrent txns will have to wait;
and in this case we could measure it in the slow path of those
concurrent txns (ie, during waiting, where measurement overhead wouldn't
matter as much).

If a serial txn is running, concurrent txns (that wait) are free to sync
and tell the serial how much cost it creates for the concurrent txns.
There, txn length could matter, but we won't find out for real until
after the concurrent ones have run (they could be pretty short, so we
can't simply assume that the concurrent ones are as long as the serial
one, so that simply the number of concurrent ones can be used to
calculate delayed work).
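
A rough sketch of the "waiters tell the serial txn what it costs" idea, assuming C11 atomics and a timestamp taken only on the slow path. All names are made up for illustration. It accounts only for time spent waiting, not for the length of the delayed txns, which is the open question above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared record that waiting txns update while a serial
   txn is running.  */
struct serial_cost
{
  atomic_uint_fast64_t waited_cycles;
  atomic_uint_fast64_t waiters;
};

static void
serial_cost_reset (struct serial_cost *c)
{
  atomic_init (&c->waited_cycles, 0);
  atomic_init (&c->waiters, 0);
}

/* Called on the slow path of a concurrent txn once its wait is over;
   measurement overhead matters less here than on the fast path.  */
static void
report_wait (struct serial_cost *c, uint64_t wait_begin, uint64_t wait_end)
{
  assert (c != 0);
  if (wait_end > wait_begin)
    atomic_fetch_add_explicit (&c->waited_cycles, wait_end - wait_begin,
                               memory_order_relaxed);
  atomic_fetch_add_explicit (&c->waiters, 1, memory_order_relaxed);
}

/* Total cycles other txns spent waiting for serial execution.  */
static uint64_t
delayed_cycles (struct serial_cost *c)
{
  return atomic_load_explicit (&c->waited_cycles, memory_order_relaxed);
}
```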

> 
> > Also, note that the mitigation strategy for rdtsc
> > short-comings that you mention in the paper is not applicable in
> > general, specifically not in libitm.
> 
> 
> I suppose you mean the preemption of threads inflating the cycles measured?

Yes, preemption and migration of threads (in case there's no global
sync'ed TSC or similar) -- you mentioned in the paper that you pin
threads to cores...

> This would be similarly a problem to any time source that tries to measure the
> amount of work performed; not sure how we can avoid it in general. Any thoughts?

Not really as long as we keep depending on measuring time in a
light-weight way.  Measuring smaller time intervals could make it less
likely that preemption happens during such an interval, though.

> 
> > Another issue is that we need to implement the tuning in a portable way.
> > You currently make it depend on whether the assembler knows about RTM,
> > whereas the rest of the code makes this more portable and defers to
> > arch-specific files.  I'd prefer if we could use the tuning on other
> > archs too.  But for that, we need to cleanly separate generic from
> > arch-specific parts.  That affects magic numbers as well as things like
> > rdtsc().
> 
> 
> Yes, I refrained from adding new calls to the arch-specific files, to contain
> the changes mainly. But that is possible and that's part of the feedback I was
> hoping to get.

OK.  Let me know if you want further input regarding this.

> > I'm wondering about whether it really makes sense to treat XABORT like
> > conflicts and other abort reasons, instead of like capacity aborts.
> > Perhaps we need to differentiate between the explicit abort codes glibc
> > uses, so that we can distinguish between cases where an abort is
> > supposed to signal incompatibility with txnal execution and cases where
> > it's just used for lock elision (see sysdeps/unix/sysv/linux/x86/hle.h
> > in current glibc):
> > #define _ABORT_LOCK_BUSY 0xff
> > #defin

Re: ODR merging and implicit typedefs

2015-05-19 Thread Jan Hubicka
> On 05/19/2015 01:33 PM, Jan Hubicka wrote:
> >I tracked down that those are implicit typedefs created by
> >create_implicit_typedef.
> >My patch made them no longer anonymous, which in turn triggers the bogus
> >diagnostics.
> >I do not think it is fully correct though - those types are not anonymous.
> 
> Hmm?  The types are anonymous:
> 
> static struct
> {
>   int moves_inserted;
>   int copies_inserted;
>   int insns_deleted;
> } stats;
> 
> Here there is a variable named 'stats', but its type has no name.

Ah, sorry. I misread the declaration and thought it produced a type named stats.
I suppose this cost me an afternoon yesterday :)

Indeed this is an anonymous type. I see it is anonymous even though it is not in
any namespace, so it makes sense that I needed to make an exception to my hack
looking for explicit namespace in the DECL_CONTEXT.
> 
> >(I also wonder whether we need to introduce a type name "._134") and pass it
> >all the way down to LTO.
> 
> Anonymous types do need to have some name, so that we can mangle
> them. But I don't know if they need to remain past free_lang_data.

I think they can be killed there, as a minor optimization.  I will look into it.
Thanks for the explanation.

Honza
> 
> Jason


Re: ODR merging and implicit typedefs

2015-05-19 Thread Jason Merrill

On 05/19/2015 01:33 PM, Jan Hubicka wrote:

> I tracked down that those are implicit typedefs created by
> create_implicit_typedef.
> My patch made them no longer anonymous, which in turn triggers the bogus
> diagnostics.
> I do not think it is fully correct though - those types are not anonymous.


Hmm?  The types are anonymous:

static struct
{
  int moves_inserted;
  int copies_inserted;
  int insns_deleted;
} stats;

Here there is a variable named 'stats', but its type has no name.


> (I also wonder whether we need to introduce a type name "._134") and pass it
> all the way down to LTO.


Anonymous types do need to have some name, so that we can mangle them. 
But I don't know if they need to remain past free_lang_data.


Jason



Re: [match-and-simplify] fix incorrect code-gen in 'for' pattern

2015-05-19 Thread Prathamesh Kulkarni
On 19 May 2015 at 14:34, Richard Biener  wrote:
> On Tue, 19 May 2015, Prathamesh Kulkarni wrote:
>
>> On 18 May 2015 at 20:17, Prathamesh Kulkarni
>>  wrote:
>> > On 18 May 2015 at 14:12, Richard Biener  wrote:
>> >> On Sat, 16 May 2015, Prathamesh Kulkarni wrote:
>> >>
>> >>> Hi,
>> >>> genmatch generates incorrect code for following (artificial) pattern:
>> >>>
>> >>> (for op (plus)
>> >>>   op2 (op)
>> >>>   (simplify
>> >>> (op @x @y)
>> >>> (op2 @x @y)
>> >>>
>> >>> generated gimple code: http://pastebin.com/h1uau9qB
>> >>> 'op' is not replaced in the generated code on line 33:
>> >>> *res_code = op;
>> >>>
>> >>> I think it would be a better idea to make op2 iterate over same set
>> >>> of operators (op2->substitutes = op->substitutes).
>> >>> I have attached patch for the same.
>> >>> Bootstrap + testing in progress on x86_64-unknown-linux-gnu.
>> >>> OK for trunk after bootstrap+testing completes ?
>> >>
>> >> Hmm, but then the example could as well just use 'op'.  I think we
>> >> should instead reject this.
>> >>
>> >> Consider
>> >>
>> >>   (for op (plus minus)
>> >> (for op2 (op)
>> >>   (simplify ...
>> >>
>> >> where it is not clear what would be desired.  Simple replacement
>> >> of 'op's value would again just mean you could have used 'op' in
>> >> the first place.  Doing what you propose would get you
>> >>
>> >>   (for op (plus minus)
>> >> (for op2 (plus minus)
>> >>   (simplify ...
>> >>
>> >> thus a different iteration.
>> >>
>> >>> I wonder if we really need is_oper_list flag in user_id ?
>> >>> We can determine if user_id is an operator list
>> >>> if user_id::substitutes is not empty ?
>> >>
>> >> After your change yes.
>> >>
>> >>> That will lose the ability to distinguish between user-defined operator
>> >>> list and list-iterator in for like op/op2, but I suppose we (so far) don't
>> >>> need to distinguish between them ?
>> >>
>> >> Well, your change would simply make each list-iterator a (temporary)
>> >> user-defined operator list as well as the current iterator element
>> >> (dependent on context - see the nested for example).  I think that
>> >> adds to confusion.
>> AFAIU, the way it's implemented in lower_for, the iterator is handled
>> the same as a user-defined operator
>> list. I was wondering if we should get rid of 'for' altogether and
>> have it replaced
>> by operator-list ?
>>
>> IMHO having two different things - iterator and operator-list is
>> unnecessary and we could
>> brand iterator as a "local" operator-list. We could extend syntax of 'simplify'
>> to accommodate "local" operator-lists.
>>
>> So we can say, using an operator-list within 'match' replaces it by
>> corresponding operators in that list.
>> Operator-lists can be "global" (visible to all patterns), or local to
>> a particular pattern.
>>
>> eg:
>> a) single for
>> (for op (...)
>>   (simplify
>> (op ...)))
>>
>> can be written as:
>> (simplify
>>   op (...)  // define "local" operator-list op.
>>   (op ...)) // proceed here the same way as for lowering "global" operator list.
>
> it's not shorter and it's harder to parse.  And you can't share the
> operator list with multiple simplifies like
>
>  (for op (...)
>(simplify
>  ...)
>(simplify
>  ...))
>
> which is already done I think.
I missed that -;)
Well we can have a "workaround syntax" for that if desired.
>
>> b) multiple iterators:
>> (for op1 (...)
>>   op2 (...)
>>   (simplify
>> (op1 (op2 ...
>>
>> can be written as:
>> (simplify
>>   op1 (...)
>>   op2 (...)
>>   (op1 (op2 ...)))
>>
>> c) nested for
>> (for op1 (...)
>> (for op2 (...)
>>   (simplify
>> (op1 (op2 ...
>>
>> can be written as:
>>
>> (simplify
>>   op1 (...)
>>   (simplify
>> op2 (...)
>> (op1 (op2 ...
>>
>> My rationale behind removing 'for' is we don't need to distinguish
>> between an "operator-list" and "iterator",
>> and only have an operator-list -;)
>> Also we can reuse parser::parse_operator_list (in parser::parse_for
>> parsing oper-list is duplicated)
>> and get rid of 'parser::parse_for'.
>> We don't need to change lowering, since operator-lists are handled
>> the same way as 'for' (we can keep lowering of simplify::for_vec as it is).
>>
>> Does it sound reasonable ?
>
> I don't think the proposed syntax is simpler or more powerful.
Hmm I tend to agree. My motivation to remove 'for' was that it is
not more powerful than operator-list and we can re-write 'for' with equivalent
operator-list with some syntax changes (like putting operator-list in
simplify etc.)
So there's only one way of doing the same thing.
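
To illustrate why the nested form is "a different iteration", here is a small stand-alone model in C. It assumes that iterators declared in a single 'for' advance in lock-step while nested 'for's form a cross product. The function names and the string output are purely illustrative and are not genmatch code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PAT_LEN 48

/* (for op1 (...) op2 (...) ...): both lists advance together, so N
   operators produce N lowered patterns.  */
static size_t
lower_lockstep (const char *const *outer, const char *const *inner,
                size_t n, char out[][PAT_LEN])
{
  for (size_t i = 0; i < n; i++)
    {
      assert (strlen (outer[i]) + strlen (inner[i]) + 9 < PAT_LEN);
      snprintf (out[i], PAT_LEN, "(%s (%s ...))", outer[i], inner[i]);
    }
  return n;
}

/* (for op1 (...) (for op2 (...) ...)): every outer operator is paired
   with every inner one, so N1 and N2 operators produce N1 * N2
   patterns.  */
static size_t
lower_nested (const char *const *outer, size_t n1,
              const char *const *inner, size_t n2, char out[][PAT_LEN])
{
  size_t k = 0;
  for (size_t i = 0; i < n1; i++)
    for (size_t j = 0; j < n2; j++)
      {
        assert (strlen (outer[i]) + strlen (inner[j]) + 9 < PAT_LEN);
        snprintf (out[k++], PAT_LEN, "(%s (%s ...))", outer[i], inner[j]);
      }
  return k;
}
```

With (plus minus) for both lists, the lock-step form yields two patterns and the nested form yields four, which is the ambiguity Richard pointed out for 'op2 (op)'.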

>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >>
>> >> So - can you instead reject this use?
I have attached patch for rejecting this use of iterator.
Ok for trunk after bootstrap+testing ?

Thanks,
Prathamesh
>> > Well my intention was to have support for walking operator list in reverse.
>> > For eg:
>> > (for bitop (bit_and bit_ior)
>> >   rbitop (bit_ior

Re: Fwd: Re: [PATCH, RFC]: Next stage1, refactoring: propagating rtx subclasses

2015-05-19 Thread Jeff Law

On 05/18/2015 11:07 PM, Mikhail Maltsev wrote:

It seems rather weird to me: I have definitely sent this message, but
now I realized that it did not get to the mailing list archive (and it
was not bounced either). Thus, resending.

I got the original, not sure why it didn't show up in the archives.




On 12.05.2015 23:10, Jeff Law wrote:

There's a makefile fragment in contrib which will build a large number
of targets that you might find helpful.  Of course without some baseline
to compare against, it's of less value.


Thanks for a hint. I tested the build using this set of targets.
Currently there are some targets that fail to build from trunk (due to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65835 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55035). Other targets are
OK, and the patch does not introduce any failures.

Thanks.  Yea, don't worry about those targets :-)




The attached archive contains the latest version of the patch (no
significant differences since previous one; I just combined everything
in one file and rebased) and updated changelog. Bootstrapped + regtested
on x86_64-unknown-linux-gnu{,-m32} and built config-list.mk. OK for trunk?

Yes, this is OK for the trunk.  Please commit.

jeff


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 01:27:06PM -0700, H.J. Lu wrote:
> On Tue, May 19, 2015 at 1:15 PM, Rich Felker  wrote:
> > On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
> >> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson  
> >> wrote:
> >> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
> >> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  
> >> >> wrote:
> >> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
> >> >>>> I'm still mildly worried that concerns for supporting
> >> >>>> relaxation might lead to decisions not to optimize code in ways that
> >> >>>> would be difficult to relax (e.g. certain types of address load
> >> >>>> reordering or hoisting) but I don't understand GCC internals
> >> >>>> sufficiently to know if this concern is warranted or not.
> >> >>>
> >> >>> It is.  The relaxation that HJ is working on requires that the reads 
> >> >>> from the
> >> >>> got not be hoisted.  I'm not especially convinced that what he's 
> >> >>> working on is
> >> >>> a win.
> >> >>>
> >> >>> With LTO, the compiler can do the same job that he's attempting in the 
> >> >>> linker,
> >> >>> without an extra nop.  Without LTO, leaving it to the linker means 
> >> >>> that you
> >> >>> can't hoist the load and hide the memory latency.
> >> >>>
> >> >>
> >> >> My relax approach won't take away any optimization done by compiler.
> >> >> It simply turns indirect branch into direct branch with a nop prefix at
> >> >> link-time.  I am having a hard time to understand why we shouldn't do 
> >> >> it.
> >> >
> >> > I well understand what you're doing.
> >> >
> >> > But my point is that the only time the compiler should present you with 
> >> > the
> >> > form of indirect branch you're looking for is when there's no place to 
> >> > hoist
> >> > the load.
> >> >
> >> > At which point, is it really worth adding a new relocation to the ABI?  
> >> > Is it
> >> > really worth adding new code to the linker that won't be exercised often?
> >>
> >> I believe there are plenty of indirect branches via GOT when compiling
> >> PIE/PIC with -fno-plt:
> >>
> >> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> >> extern void foo (void);
> >>
> >> void
> >> bar (void)
> >> {
> >>   foo ();
> >> }
> >> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
> >> [hjl@gnu-6 gcc]$ cat x.s
> >> .file "x.c"
> >> .section .text.unlikely,"ax",@progbits
> >> .LCOLDB0:
> >> .text
> >> .LHOTB0:
> >> .p2align 4,,15
> >> .globl bar
> >> .type bar, @function
> >> bar:
> >> .LFB0:
> >> .cfi_startproc
> >> jmp *foo@GOTPCREL(%rip)
> >> .cfi_endproc
> >> .LFE0:
> >> .size bar, .-bar
> >
> > I agree these exist. What I question is whether the savings from the
> > linker being able to relax this to a direct call in the case where the
> > programmer failed to let the compiler make it a direct call to begin
> > with (by using hidden or protected visibility) are worth the cost of
> > not being able to hoist the load out of loops or schedule it earlier
> > in cases where relaxation is not possible because the call target is
> > not defined in the same DSO.
> 
> Just for fun.  I compiled binutils as PIE with -fno-plt -flto:
> 
> [hjl@gnu-mic-2 gas]$ file as-new
> as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
> dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not
> stripped
> [hjl@gnu-mic-2 gas]$
> 
> There are 43:
> 
> ff 25 21 93 2d 00 jmpq   *0x2d9321(%rip)# 3d5f58 <_DYNAMIC+0x1e8>
> 
> and 1983
> 
> ff 15 eb f4 38 00 callq  *0x38f4eb(%rip)# 3d60e0 <_DYNAMIC+0x370>

How many of those would be relaxed? I suspect it depends a lot on
whether libbfd is static or shared.

Rich


Re: Demangle symbols in debug assertion messages

2015-05-19 Thread François Dumont

Hello

Is it ok ?

François

On 04/05/2015 22:31, François Dumont wrote:

Hi

Here is the patch to demangle symbols in debug messages. I have
also simplified the code in formatter.h.


Here is an example of assertion message:

/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/debug/functions.h:213: 


error: function requires a valid iterator range [__first, __last).

Objects involved in the operation:
iterator "__first" @ 0x0x7fff165d68b0 {
  type = 
__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iteratorstd::__cxx1998::vector > >, 
std::__debug::vector > > (mutable iterator);

  state = dereferenceable;
  references sequence with type `std::__debug::vectorstd::allocator >' @ 0x0x7fff165d69d0

}
iterator "__last" @ 0x0x7fff165d68e0 {
  type = 
__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iteratorstd::__cxx1998::vector > >, 
std::__debug::vector > > (mutable iterator);

  state = dereferenceable;
  references sequence with type `std::__debug::vectorstd::allocator >' @ 0x0x7fff165d69d0

}


* include/debug/formatter.h (_GLIBCXX_TYPEID): New macro to simplify
usage of typeid.
(_Error_formatter::_M_print_type): New.
* src/c++11/debug.cc
(_Error_formatter::_Parameter::_M_print_field): Use latter.
(_Error_formatter::_M_print_type): Implement latter using
__cxxabiv1::__cxa_demangle to print demangled type name.

I just hope that __cxa_demangle is portable.

Ok to commit ?

François





Re: miter_base simplification

2015-05-19 Thread François Dumont

On 03/05/2015 22:19, François Dumont wrote:

On 30/04/2015 13:18, Jonathan Wakely wrote:

On 30/04/15 10:40 +0200, François Dumont wrote:

On 27/04/2015 13:55, Jonathan Wakely wrote:

(Alternatively, could the same simplification be made for
__miter_base? Do we need _Miter_base<> or just two overloads of
__miter_base()?)


Definitely, I already have a patch for that.


Great :-)


And here is the patch for this part.

I have implemented it in such a way that it will also remove several 
layers of move_iterator.


2015-05-04  François Dumont  

* include/bits/cpp_type_traits.h
(std::move_iterator): Delete declaration.
(std::__is_move_iterator): Move partial 
specialization...

* include/bits/stl_iterator.h: ... here.
(std::__miter_base): Overloads for std::reverse_iterator and
std::move_iterator.
* include/bits/stl_algobase.h (std::__miter_base): Provide default
implementation.

Tested under Linux x86_64.

Ok to commit ?


Is it ok ?

François



Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 1:15 PM, Rich Felker  wrote:
> On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
>> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson  wrote:
>> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
>> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  
>> >> wrote:
>> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
>> >>>> I'm still mildly worried that concerns for supporting
>> >>>> relaxation might lead to decisions not to optimize code in ways that
>> >>>> would be difficult to relax (e.g. certain types of address load
>> >>>> reordering or hoisting) but I don't understand GCC internals
>> >>>> sufficiently to know if this concern is warranted or not.
>> >>>
>> >>> It is.  The relaxation that HJ is working on requires that the reads 
>> >>> from the
>> >>> got not be hoisted.  I'm not especially convinced that what he's working 
>> >>> on is
>> >>> a win.
>> >>>
>> >>> With LTO, the compiler can do the same job that he's attempting in the 
>> >>> linker,
>> >>> without an extra nop.  Without LTO, leaving it to the linker means that 
>> >>> you
>> >>> can't hoist the load and hide the memory latency.
>> >>>
>> >>
>> >> My relax approach won't take away any optimization done by compiler.
>> >> It simply turns indirect branch into direct branch with a nop prefix at
>> >> link-time.  I am having a hard time to understand why we shouldn't do it.
>> >
>> > I well understand what you're doing.
>> >
>> > But my point is that the only time the compiler should present you with the
>> > form of indirect branch you're looking for is when there's no place to 
>> > hoist
>> > the load.
>> >
>> > At which point, is it really worth adding a new relocation to the ABI?  Is 
>> > it
>> > really worth adding new code to the linker that won't be exercised often?
>>
>> I believe there are plenty of indirect branches via GOT when compiling
>> PIE/PIC with -fno-plt:
>>
>> [hjl@gnu-6 gcc]$ cat /tmp/x.c
>> extern void foo (void);
>>
>> void
>> bar (void)
>> {
>>   foo ();
>> }
>> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
>> [hjl@gnu-6 gcc]$ cat x.s
>> .file "x.c"
>> .section .text.unlikely,"ax",@progbits
>> .LCOLDB0:
>> .text
>> .LHOTB0:
>> .p2align 4,,15
>> .globl bar
>> .type bar, @function
>> bar:
>> .LFB0:
>> .cfi_startproc
>> jmp *foo@GOTPCREL(%rip)
>> .cfi_endproc
>> .LFE0:
>> .size bar, .-bar
>
> I agree these exist. What I question is whether the savings from the
> linker being able to relax this to a direct call in the case where the
> programmer failed to let the compiler make it a direct call to begin
> with (by using hidden or protected visibility) are worth the cost of
> not being able to hoist the load out of loops or schedule it earlier
> in cases where relaxation is not possible because the call target is
> not defined in the same DSO.

Just for fun.  I compiled binutils as PIE with -fno-plt -flto:

[hjl@gnu-mic-2 gas]$ file as-new
as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not
stripped
[hjl@gnu-mic-2 gas]$

There are 43:

ff 25 21 93 2d 00 jmpq   *0x2d9321(%rip)# 3d5f58 <_DYNAMIC+0x1e8>

and 1983

ff 15 eb f4 38 00 callq  *0x38f4eb(%rip)# 3d60e0 <_DYNAMIC+0x370>

-- 
H.J.


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson  wrote:
> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  
> >> wrote:
> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
> >>>> I'm still mildly worried that concerns for supporting
> >>>> relaxation might lead to decisions not to optimize code in ways that
> >>>> would be difficult to relax (e.g. certain types of address load
> >>>> reordering or hoisting) but I don't understand GCC internals
> >>>> sufficiently to know if this concern is warranted or not.
> >>>
> >>> It is.  The relaxation that HJ is working on requires that the reads from 
> >>> the
> >>> got not be hoisted.  I'm not especially convinced that what he's working 
> >>> on is
> >>> a win.
> >>>
> >>> With LTO, the compiler can do the same job that he's attempting in the 
> >>> linker,
> >>> without an extra nop.  Without LTO, leaving it to the linker means that 
> >>> you
> >>> can't hoist the load and hide the memory latency.
> >>>
> >>
> >> My relax approach won't take away any optimization done by compiler.
> >> It simply turns indirect branch into direct branch with a nop prefix at
> >> link-time.  I am having a hard time to understand why we shouldn't do it.
> >
> > I well understand what you're doing.
> >
> > But my point is that the only time the compiler should present you with the
> > form of indirect branch you're looking for is when there's no place to hoist
> > the load.
> >
> > At which point, is it really worth adding a new relocation to the ABI?  Is 
> > it
> > really worth adding new code to the linker that won't be exercised often?
> 
> I believe there are plenty of indirect branches via GOT when compiling
> PIE/PIC with -fno-plt:
> 
> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> extern void foo (void);
> 
> void
> bar (void)
> {
>   foo ();
> }
> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
> [hjl@gnu-6 gcc]$ cat x.s
> .file "x.c"
> .section .text.unlikely,"ax",@progbits
> .LCOLDB0:
> .text
> .LHOTB0:
> .p2align 4,,15
> .globl bar
> .type bar, @function
> bar:
> .LFB0:
> .cfi_startproc
> jmp *foo@GOTPCREL(%rip)
> .cfi_endproc
> .LFE0:
> .size bar, .-bar

I agree these exist. What I question is whether the savings from the
linker being able to relax this to a direct call in the case where the
programmer failed to let the compiler make it a direct call to begin
with (by using hidden or protected visibility) are worth the cost of
not being able to hoist the load out of loops or schedule it earlier
in cases where relaxation is not possible because the call target is
not defined in the same DSO.
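
For reference, a minimal sketch of the annotation referred to here, assuming GCC or Clang on an ELF target. The function names are hypothetical. A hidden symbol cannot be preempted from outside the DSO, so the compiler is free to emit a direct call without going through the GOT or PLT.

```c
#include <assert.h>

/* Hypothetical DSO-internal helper: hidden visibility tells the
   compiler the definition seen here is the one that will be called.  */
__attribute__ ((visibility ("hidden"))) int
helper (int x)
{
  return x + 1;
}

/* Default visibility: in PIC code this symbol may be interposed, so
   calls to it from other functions may have to go through the GOT or
   PLT.  */
int
exported (int x)
{
  assert (x >= 0);
  return x * 2;
}

int
caller (int x)
{
  return helper (x) + exported (x);
}
```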

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Richard Henderson
On 05/19/2015 12:35 PM, Rich Felker wrote:
> Why would you recompute it (this requires a fairly expensive call that
> reads or pops its own return address) rather than simply spilling the
> already-computed value and reloading it from the stack?
> 
> The only example I can think of where it might make sense is when you
> don't want to load the address unconditionally because there are
> shrink-wrappable code paths that don't need it, but multiple code paths
> that do, in which case they would each load different values. Is this
> the concern you have in mind?

That too.  I was thinking of exception landing pads, i.e. catches and cleanups,
where in the past we've had to re-compute the GOT address.  Though now that I
think on that more, it wasn't x86 that had that particular landing pad trouble.


r~


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Richard Henderson
On 05/19/2015 12:17 PM, H.J. Lu wrote:
>> But my point is that the only time the compiler should present you with the
>> form of indirect branch you're looking for is when there's no place to hoist
>> the load.
>>
>> At which point, is it really worth adding a new relocation to the ABI?  Is it
>> really worth adding new code to the linker that won't be exercised often?
> 
> I believe there are plenty of indirect branches via GOT when compiling
> PIE/PIC with -fno-plt:
> 
> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> extern void foo (void);
> 
> void
> bar (void)
> {
>   foo ();
> }

Sure, as I said, when there's no place to hoist the load.

Try anything more complicated,

void bar (void)
{
  int i;
  for (i = 0; i < 10; ++i)
foo ();
}

void baz (void)
{
  foo ();
  foo ();
}

and you'll not see the call *foo@GOTPCREL(%rip) form.
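
At the source level, the hoisted form corresponds roughly to the sketch below: the address is loaded once into a local and the calls go through that register. This is only a model of the generated code; foo is defined locally as a stand-in so the example is self-contained, and bar_hoisted is a made-up name.

```c
#include <assert.h>

static int foo_calls;

/* Stand-in for the external foo from the example above.  */
void
foo (void)
{
  foo_calls++;
}

/* Model of the hoisted load: one address load before the loop
   (analogous to mov foo@GOTPCREL(%rip), %reg), then ten indirect
   calls through the register.  */
void
bar_hoisted (void)
{
  void (*fn) (void) = foo;
  int i;

  assert (fn != 0);
  for (i = 0; i < 10; ++i)
    fn ();
}
```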

Of course there's also plenty of times where combine recreates exactly that
form when perhaps the scheduler might have preferred otherwise.  Those are
optimization choices to be addressed under separate cover.

My point that we can already do what you want via LTO, without adding new
relocations, is still relevant.


r~


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 11:59:00AM -0700, Richard Henderson wrote:
> On 05/19/2015 11:06 AM, Rich Felker wrote:
> > I'm still mildly worried that concerns for supporting
> > relaxation might lead to decisions not to optimize code in ways that
> > would be difficult to relax (e.g. certain types of address load
> > reordering or hoisting) but I don't understand GCC internals
> > sufficiently to know if this concern is warranted or not.
> 
> It is.  The relaxation that HJ is working on requires that the reads from the
> got not be hoisted.  I'm not especially convinced that what he's working on is
> a win.

Well as long as -fno-plt actually generates a load from the GOT like
what would be done for data access, and does not go out of its way to
produce something compatible with relaxation, my hope is that it would
not be affected by the pessimization. I'm not sure if that's the case
though.

> With LTO, the compiler can do the same job that he's attempting in the linker,
> without an extra nop.  Without LTO, leaving it to the linker means that you
> can't hoist the load and hide the memory latency.

Yes, this is my feeling too. Alexander Monakov and I have been discussing
it on #musl a bit and I think the conclusion we reached is that
relaxation is possibly a significant real-world win for non-PIC main
executables, where it's very likely that addresses will be resolved at
ld-time and that the programmer has not specifically annotated this with
protected visibility. In such a case, you get either a direct call or
a direct address load and indirect call, rather than hitting an extra
cache line in the PLT thunk to do the address load and indirect call.
Note that, being non-PIC, there is no GOT register involved here.

> > I would still like to see the @GOTPCREL stuff added and used instead
> > of @GOT, as I mentioned earlier in the thread, but I agree that's
> > independent of relaxation support and shouldn't block it.
> 
> I don't think that @GOTPCREL for 32-bit is a good idea.  This is the scheme
> that Darwin uses, so we do have some experience with it.
> 
> In order for it to work you've got to have a pointer to a random address in the
> function.  It means that you can only "easily" compute the address once.  If
> you need the value again you wind up with the same "extra" addl insn that we
> have with the current GOT pointer.

Why would you recompute it (this requires a fairly expensive call that
reads or pops its own return address) rather than simply spilling the
already-computed value and reloading it from the stack?

The only example I can think of where it might make sense is when you
don't want to load the address unconditionally because there are
shrink-wrappable code paths that don't need it, but multiple code paths
that do, in which case they would each load different values. Is this
the concern you have in mind?

> We've just started to do inter-function register allocation.  The next step
> along those lines is to share the computation of GOT between multiple
> functions.  At which point it really helps to have one global base address to
> talk about.

I see -- that would be another case where it simplifies things.

Rich


[C PATCH] Use AGGREGATE_TYPE_P

2015-05-19 Thread Marek Polacek
Bootstrapped/regtested on x86_64-linux, applying to trunk.

2015-05-19  Marek Polacek  

* c-typeck.c (start_init): Use AGGREGATE_TYPE_P.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 7f54490..cf5322f 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -7126,10 +7126,7 @@ start_init (tree decl, tree asmspec_tree ATTRIBUTE_UNUSED, int top_level)
= ((TREE_STATIC (decl) || (pedantic && !flag_isoc99))
   /* For a scalar, you can always use any value to initialize,
  even within braces.  */
-  && (TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE
-  || TREE_CODE (TREE_TYPE (decl)) == RECORD_TYPE
-  || TREE_CODE (TREE_TYPE (decl)) == UNION_TYPE
-  || TREE_CODE (TREE_TYPE (decl)) == QUAL_UNION_TYPE));
+  && AGGREGATE_TYPE_P (TREE_TYPE (decl)));
   locus = identifier_to_locale (IDENTIFIER_POINTER (DECL_NAME (decl)));
 }
   else
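
As a sanity check on the equivalence, here is a stand-alone model of the predicate. The enum and the *_SKETCH macros are stand-ins invented for this example. They mirror the shape of AGGREGATE_TYPE_P in tree.h (array type, or record/union/qualified-union type) but are not the real GCC definitions.

```c
#include <assert.h>

/* Toy stand-ins for the tree codes involved.  */
enum type_code_sketch
{
  INTEGER_TYPE_S,
  POINTER_TYPE_S,
  ARRAY_TYPE_S,
  RECORD_TYPE_S,
  UNION_TYPE_S,
  QUAL_UNION_TYPE_S,
  NUM_TYPE_CODES_S
};

#define RECORD_OR_UNION_P_SKETCH(C) \
  ((C) == RECORD_TYPE_S || (C) == UNION_TYPE_S || (C) == QUAL_UNION_TYPE_S)

#define AGGREGATE_P_SKETCH(C) \
  ((C) == ARRAY_TYPE_S || RECORD_OR_UNION_P_SKETCH (C))

/* The open-coded test that the patch removes.  */
static int
old_check (enum type_code_sketch c)
{
  assert (c < NUM_TYPE_CODES_S);
  return (c == ARRAY_TYPE_S
          || c == RECORD_TYPE_S
          || c == UNION_TYPE_S
          || c == QUAL_UNION_TYPE_S);
}
```

In this model both forms agree for every code, which is what the patch relies on.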

Marek


Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 8:33 AM, Joseph Myers  wrote:
> On Tue, 19 May 2015, H.J. Lu wrote:
>
>> > I think the whole thing should be posted as one patch, with both the
>> > target-independent changes and the target-specific changes for all
>> > targets.
>> >
>>
>> That is what makes me concerned.  I have some simple target-specified
>> patches which weren't reviewed for years. What will happen if no one
>
> For any unreviewed patch, keep pinging weekly.
>
>> reviews some simple target-specified changes due to
>>
>> 1. Reviewers don't have access to those targets.
>> 2. Target maintainers aren't reviewing them.
>> 3. There are no clear maintainers for those targets.
>
> I've already said in
>  that, given
> target maintainers CC:ed, I might be inclined to approve the patch on the
> basis of allowing them a week to test their target changes.
>

Here is the complete patch.  Tested on Linux/x86-64.  It is also
available on hjl/pie/master branch in git mirror.

-- 
H.J.

Add --enable-default-pie option to configure GCC to generate PIE by
default.

gcc/

2015-05-19  Magnus Granberg  
   H.J. Lu  

* Makefile.in (COMPILER): Add @NO_PIE_CFLAGS@.
(BUILD_CFLAGS): Likewise.
(BUILD_CXXFLAGS): Likewise.
(LINKER): Add @NO_PIE_FLAG@.
(BUILD_LDFLAGS): Likewise.
(libgcc.mvars): Set NO_PIE_CFLAGS to -fno-PIE for
--enable-default-pie.
* common.opt (fPIE): Initialize to -1.
(fpie): Likewise.
(no-pie): New option.
(pie): Replace "Negative(shared)" with "Negative(no-pie)".
* configure.ac: Add --enable-default-pie.
(NO_PIE_CFLAGS): New.  Check if -fno-PIE works.  AC_SUBST.
(NO_PIE_FLAG): New.  Check if -no-pie works.  AC_SUBST.
* defaults.h (DEFAULT_FLAG_PIE): New.  Default PIE to -fPIE.
* gcc.c (NO_PIE_SPEC): New.
(PIE_SPEC): Likewise.
(NO_FPIE1_SPEC): Likewise.
(FPIE1_SPEC): Likewise.
(NO_FPIE2_SPEC): Likewise.
(FPIE2_SPEC): Likewise.
(NO_FPIE2_SPEC): Likewise.
(FPIE_SPEC): Likewise.
(NO_FPIE_SPEC): Likewise.
(NO_FPIC1_SPEC): Likewise.
(FPIC1_SPEC): Likewise.
(NO_FPIC2_SPEC): Likewise.
(FPIC2_SPEC): Likewise.
(NO_FPIC2_SPEC): Likewise.
(FPIC_SPEC): Likewise.
(NO_FPIC_SPEC): Likewise.
(NO_FPIE1_AND_FPIC1_SPEC): Likewise.
(FPIE1_OR_FPIC1_SPEC): Likewise.
(NO_FPIE2_AND_FPIC2_SPEC): Likewise.
(FPIE2_OR_FPIC2_SPEC): Likewise.
(NO_FPIE_AND_FPIC_SPEC): Likewise.
(FPIE_OR_FPIC_SPEC): Likewise.
(LD_PIE_SPEC): Likewise.
(LINK_PIE_SPEC): Handle -no-pie.  Use PIE_SPEC and LD_PIE_SPEC.
* opts.c (DEFAULT_FLAG_PIE): New.  Set to 0 if ENABLE_DEFAULT_PIE
is undefined.
(finish_options): Update opts->x_flag_pie if it is -1.
* config/darwin.h (PIE_SPEC): Renamed to ...
(DARWIN_PIE_SPEC): This.
(LINK_SPEC): Replace PIE_SPEC with DARWIN_PIE_SPEC.
* config/darwin9.h (PIE_SPEC): Renamed to ...
(DARWIN_PIE_SPEC): This.
* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Use
PIE_SPEC and NO_PIE_SPEC if HAVE_LD_PIE is defined.
* config/openbsd.h (ASM_SPEC): Use FPIE1_OR_FPIC1_SPEC and
FPIE2_OR_FPIC2_SPEC.
* config/m68k/netbsd-elf.h (ASM_SPEC): Likewise.
* config/m68k/openbsd.h (ASM_SPEC): Likewise.
* gcc/config/sol2.h (ASM_PIC_SPEC): Likewise.
* config/arm/freebsd.h (SUBTARGET_EXTRA_ASM_SPEC): Likewise.
* config/arm/netbsd-elf.h (SUBTARGET_EXTRA_ASM_SPEC): Likewise.
* config/arm/semi.h (SUBTARGET_EXTRA_ASM_SPEC): Likewise.
* config/cris/linux.h (CRIS_ASM_SUBTARGET_SPEC): Likewise.
* config/m32r/m32r.h (ASM_SPEC): Likewise.
* config/m68k/uclinux.h (DRIVER_SELF_SPECS): Likewise.
* config/rs6000/linux64.h (ASM_SPEC32): Likewise.
* config/rs6000/sysv4.h (ASM_SPEC): Likewise.
* config/sparc/freebsd.h (ASM_SPEC): Likewise.
* config/sparc/linux.h (ASM_SPEC): Likewise.
* config/sparc/linux64.h (ASM_SPEC): Likewise.
* config/sparc/netbsd-elf.h (ASM_SPEC): Likewise.
* config/sparc/openbsd64.h (ASM_SPEC): Likewise.
* config/sparc/sp-elf.h (ASM_SPEC): Likewise.
* config/sparc/sp64-elf.h (ASM_SPEC): Likewise.
* config/sparc/sparc.h (ASM_SPEC): Likewise.
* config/sparc/sysv4.h (ASM_SPEC): Likewise.
* config/sparc/vxworks.h (ASM_SPEC): Likewise.
* config/c6x/elf-common.h (ASM_SPEC): Use NO_FPIC2_SPEC,
FPIC2_SPEC, FPIC1_SPEC and FPIC2_SPEC.
* config/c6x/uclinux-elf.h (LINK_SPEC): Use FPIE_SPEC.
* config/frv/frv.h (DRIVER_SELF_SPECS): Use FPIC_SPEC,
NO_FPIC_SPEC and NO_FPIE1_AND_FPIC1_SPEC.
(ASM_SPEC): Use FPIE1_OR_FPIC1_SPEC and FPIE2_OR_FPIC2_SPEC.
* config/m68k/m68k.h (ASM_PCREL_SPEC): Use FPIC_SPEC.
* config/mips/gnu-user.h (NO_SHARED_SPECS): Use
NO_FPIE_AND_FPIC_SPEC.
* config/mips/vxworks.h (SUBTARGET_ASM_SPEC): Use FPIC_SPEC.
* config/rs6000/freebsd64.h (ASM_SPEC32): Likewise.
* config/rs6000/vxworks.h (ASM_SPEC): Likewise.
* config/vax/linux.h (ASM_SPEC): Likewise.
* doc/install.texi: Document --enable-default-pie.
* doc/invoke.texi: Document -no-pie.
* config.in: Regenerated.
* configure: Likewise.

gcc/ada/

2015-05-19  H.J. Lu  

* gcc-interface/Makefile.in (TOOLS_LIBS): Add @NO_PIE_FLAG@.

libgcc/

2015-05-19  H.J. Lu  

* Makefile.in (CRTSTUFF_CFLAGS): Add $(NO_PIE_CFLAGS).
From 15d2cee44ff03e1892
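
The ChangeLog entries for common.opt, defaults.h and opts.c describe a
tri-state default.  A simplified sketch of that logic follows, under the
assumption that the flag uses 1 for -fpie and 2 for -fPIE; the function
name and the enum are made up for illustration, and the real
finish_options also has to consider -fPIC, -shared and -no-pie, which this
model ignores.

```c
#include <assert.h>

/* Assumed encoding: -1 = not given on the command line, 0 = off,
   1 = -fpie, 2 = -fPIE.  */
enum { PIE_UNSET = -1, PIE_OFF = 0, PIE_SMALL = 1, PIE_LARGE = 2 };

/* Hypothetical model of "update x_flag_pie if it is -1".
   ENABLE_DEFAULT_PIE stands for whether GCC was configured with
   --enable-default-pie.  */
int
resolve_flag_pie (int flag_pie, int enable_default_pie)
{
  /* DEFAULT_FLAG_PIE: -fPIE when enabled at configure time, else 0.  */
  int default_flag_pie = enable_default_pie ? PIE_LARGE : PIE_OFF;

  assert (flag_pie >= PIE_UNSET && flag_pie <= PIE_LARGE);
  /* An explicit user choice always wins over the configured default.  */
  return flag_pie == PIE_UNSET ? default_flag_pie : flag_pie;
}
```

The point of initializing the option to -1 is that an explicit -fno-pie
(value 0) can be told apart from "nothing specified".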

Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.

2015-05-19 Thread Torvald Riegel
On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:
> Hello,
> 
> On 15/05/15 17:22, Torvald Riegel wrote:
> > This patch improves the documentation of the built-ins for atomic
> > operations.
> 
> The "memory model" to "memory order" change does improve things but I think
> that the patch has some problems. As it is now, it makes some of the
> descriptions quite difficult to understand and seems to assume more
> familiarity with details of the C++11 specification than might be expected.

I'd say that's a side effect of the C++11 memory model being the
reference specification of the built-ins.

> Generally, the memory order descriptions seem to be targeted towards language
> designers but don't provide for anybody trying to understand how to implement
> or to use the built-ins.

I agree that the current descriptions aren't a tutorial on the C++11
memory model.  However, given that the model is not GCC-specific, we
aren't really in a need to provide a tutorial, in the same way that we
don't provide a C++ tutorial.  Users can pick the C++11 memory model
educational material of their choice, and we need to document what's
missing to apply the C++11 knowledge to the built-ins we provide.

There are several resources for implementers, for example the mappings
maintained by the Cambridge research group.  I guess it would be
sufficient to have such material on the wiki.  Is there something
specific that you'd like to see documented for implementers?

> Adding a less formal, programmers view to some of the
> descriptions would help.

Yes, probably.  However, I'm not aware of a good C++11 memory model
tutorial that we could link to (OTOH, I haven't really looked for one).
I don't plan to write one, and doing so would certainly take some time
and require *much* more text than what we have now.

> That implies the descriptions would be more than just
> illustrative, but I'd suggest that would be appropriate for the GCC manual.

The danger I see there is that if we claim to define semantics instead
of just illustrating them, then we need to do a really good job of
defining/explaining them.  I worry that we may end up replicating what's
in the standard or Batty et al.'s formalization of the memory model, and
having to give a tutorial too.  I'm not sure anyone is volunteering to
do that amount of work.

> I'm also not sure about the use of C++11 terms in some of the
> descriptions. In particular, using happens-before seems wrong because
> happens-before isn't described anywhere in the GCC manual and because it has a
> specific meaning in the C++11 specification that doesn't apply to the GCC
> built-ins (which C++11 doesn't know about).

I agree it's not described in the manual, but we're implementing C++11.
However, I don't see why happens-before semantics wouldn't apply to
GCC's implementation of the built-ins; there may be cases where we
guarantee more, but if one uses the built-ins in a way allowed by the C++11
model, one certainly gets behavior and happens-before relationships as
specified by C++11.
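
To make the happens-before point concrete, here is a minimal sketch of the
usual message-passing pattern written with the built-ins.  The names
payload, ready, publish and try_consume are invented for this example; the
sketch shows only the two sides of the pattern and does not start any
threads itself.

```c
#include <assert.h>

static int payload;   /* plain data, published through the flag */
static int ready;     /* only ever accessed via __atomic built-ins */

/* Producer: write the data, then set the flag with release order.  */
void
publish (int value)
{
  payload = value;
  __atomic_store_n (&ready, 1, __ATOMIC_RELEASE);
}

/* Consumer: if the acquire load observes the flag set by publish, the
   write to payload happens-before the read below, so the read is safe.
   Returns 1 and fills *OUT on success, 0 if nothing is published yet.  */
int
try_consume (int *out)
{
  assert (out != 0);
  if (!__atomic_load_n (&ready, __ATOMIC_ACQUIRE))
    return 0;
  *out = payload;
  return 1;
}
```

If publish and try_consume run in different threads, the release store and
the acquire load that reads from it are what order the two accesses to
payload; with __ATOMIC_RELAXED on both sides that guarantee would be gone.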


> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 6004681..5b2ded8 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the barrier.
> 
>   [...]  If the data type size maps to one
> -of the integral sizes that may have lock free support, the generic
> -version uses the lock free built-in function.  Otherwise an
> +of the integral sizes that may support lock-freedom, the generic
> +version uses the lock-free built-in function.  Otherwise an
>   external call is left to be resolved at run time.
> 
> =
> This is a slightly awkward sentence. Maybe it could be replaced with something
> on the lines of "The generic function uses the lock-free built-in function
> when the data-type size makes that possible, otherwise an external call is
> left to be resolved at run-time."
> =

Changed to:
"It uses the lock-free built-in function if the specific data type size
makes that possible; otherwise, an external call is left to be resolved
at run time."

> 
> -The memory models integrate both barriers to code motion as well as
> -synchronization requirements with other threads.  They are listed here
> -in approximately ascending order of strength.
> +An atomic operation can both constrain code motion by the compiler and
> +be mapped to a hardware instruction for synchronization between threads
> +(e.g., a fence).  [...]
> 
> =
> This is a little unclear (and inaccurate, aarch64 can use two instructions
> for fences). I also thought that atomic operations constrain code motion by
> the hardware. Maybe break the link with the compiler and hardware: "An atomic
> operation can both constrain code motion and act as a synchronization point
> between threads".
> =

I removed "by the compiler" and used "hardware instruction_s_".

> 
>   @table  @code
>   @item __ATOMIC_RELAXED
> -No barriers or synchro

Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 12:11 PM, Richard Henderson  wrote:
> On 05/19/2015 12:06 PM, H.J. Lu wrote:
>> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  wrote:
>>> On 05/19/2015 11:06 AM, Rich Felker wrote:
>>>> I'm still mildly worried that concerns for supporting
>>>> relaxation might lead to decisions not to optimize code in ways that
>>>> would be difficult to relax (e.g. certain types of address load
>>>> reordering or hoisting) but I don't understand GCC internals
>>>> sufficiently to know if this concern is warranted or not.
>>>
>>> It is.  The relaxation that HJ is working on requires that the reads from
>>> the got not be hoisted.  I'm not especially convinced that what he's
>>> working on is a win.
>>>
>>> With LTO, the compiler can do the same job that he's attempting in the
>>> linker, without an extra nop.  Without LTO, leaving it to the linker means
>>> that you can't hoist the load and hide the memory latency.
>>>
>>>
>>
>> My relax approach won't take away any optimization done by compiler.
>> It simply turns indirect branch into direct branch with a nop prefix at
>> link-time.  I am having a hard time to understand why we shouldn't do it.
>
> I well understand what you're doing.
>
> But my point is that the only time the compiler should present you with the
> form of indirect branch you're looking for is when there's no place to hoist
> the load.
>
> At which point, is it really worth adding a new relocation to the ABI?  Is it
> really worth adding new code to the linker that won't be exercised often?

I believe there are plenty of indirect branches via GOT when compiling
PIE/PIC with -fno-plt:

[hjl@gnu-6 gcc]$ cat /tmp/x.c
extern void foo (void);

void
bar (void)
{
  foo ();
}
[hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
[hjl@gnu-6 gcc]$ cat x.s
	.file	"x.c"
	.section	.text.unlikely,"ax",@progbits
.LCOLDB0:
	.text
.LHOTB0:
	.p2align 4,,15
	.globl	bar
	.type	bar, @function
bar:
.LFB0:
	.cfi_startproc
	jmp	*foo@GOTPCREL(%rip)
	.cfi_endproc
.LFE0:
	.size	bar, .-bar

-- 
H.J.


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Richard Henderson
On 05/19/2015 12:06 PM, H.J. Lu wrote:
> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  wrote:
>> On 05/19/2015 11:06 AM, Rich Felker wrote:
>>> I'm still mildly worried that concerns for supporting
>>> relaxation might lead to decisions not to optimize code in ways that
>>> would be difficult to relax (e.g. certain types of address load
>>> reordering or hoisting) but I don't understand GCC internals
>>> sufficiently to know if this concern is warranted or not.
>>
>> It is.  The relaxation that HJ is working on requires that the reads from the
>> got not be hoisted.  I'm not especially convinced that what he's working on
>> is a win.
>>
>> With LTO, the compiler can do the same job that he's attempting in the
>> linker, without an extra nop.  Without LTO, leaving it to the linker means
>> that you can't hoist the load and hide the memory latency.
>>
> 
> My relax approach won't take away any optimization done by compiler.
> It simply turns indirect branch into direct branch with a nop prefix at
> link-time.  I am having a hard time to understand why we shouldn't do it.

I well understand what you're doing.

But my point is that the only time the compiler should present you with the
form of indirect branch you're looking for is when there's no place to hoist
the load.

At which point, is it really worth adding a new relocation to the ABI?  Is it
really worth adding new code to the linker that won't be exercised often?


r~
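
As a source-level analogy for the hoisting argument, the sketch below
models the GOT slot as a volatile function pointer, so that every read is a
real load.  This is only an illustration of the trade-off, not what GCC
emits; foo_slot, call_reloading and call_hoisted are hypothetical names.

```c
#include <assert.h>

static int calls;
static void foo (void) { calls++; }

/* Stand-in for foo's GOT entry; volatile forces a load on each read.  */
static void (*volatile foo_slot) (void) = foo;

/* One load per call, the shape a linker could relax to a direct branch
   because the load sits right next to the indirect call.  */
void
call_reloading (int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    foo_slot ();
}

/* Load hoisted out of the loop: the latency is paid once, but the load
   and the call are no longer adjacent, so there is nothing to relax.  */
void
call_hoisted (int n)
{
  assert (n >= 0);
  void (*fn) (void) = foo_slot;
  for (int i = 0; i < n; i++)
    fn ();
}

int call_count (void) { return calls; }
```

Both functions make the same calls; they differ only in where the address
load happens, which is exactly what the relaxation depends on.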



Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 06:01:07PM +0200, Michael Matz wrote:
> Hi,
> 
> On Tue, 19 May 2015, Jeff Law wrote:
> 
> > > > Forget lazy binding. It's dead anyway because serious distros want 
> > > > PIE+relro+bindnow+...
> > > 
> > > You keep saying this, but I can't help the feeling it's mostly because 
> > > musl doesn't support it ;-)
> > 
> > FWIW, Red Hat is pushing PIE & partial RELRO deeper and deeper into the 
> > distribution.
> 
> Yeah, us as well, though I don't necessarily see the point for most 
> packages; feels a bit like a checkmark item :)

These days it's fairly rare to have software which does not interact
at all with untrusted data. Consider how much user-facing application
software that was not previously considered security-critical is
making network connections using complex protocols (e.g. anything with
TLS, IM protocols, ...), opening image files from random sources
(attachments, files that happen to be in a browsed-to directory, on
USB sticks, etc.), and so on. I think it's smart to be hardening
everything, at least for distros providing all sorts of random
unvetted software.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  wrote:
> On 05/19/2015 11:06 AM, Rich Felker wrote:
>> I'm still mildly worried that concerns for supporting
>> relaxation might lead to decisions not to optimize code in ways that
>> would be difficult to relax (e.g. certain types of address load
>> reordering or hoisting) but I don't understand GCC internals
>> sufficiently to know if this concern is warranted or not.
>
> It is.  The relaxation that HJ is working on requires that the reads from the
> got not be hoisted.  I'm not especially convinced that what he's working on is
> a win.
>
> With LTO, the compiler can do the same job that he's attempting in the linker,
> without an extra nop.  Without LTO, leaving it to the linker means that you
> can't hoist the load and hide the memory latency.
>

My relax approach won't take away any optimization done by compiler.
It simply turns indirect branch into direct branch with a nop prefix at
link-time.  I am having a hard time to understand why we shouldn't do it.


-- 
H.J.


Move dependency for shared libgcc

2015-05-19 Thread Eric Botcazou
Probably a misapplied patch: the dependency of the shared libgcc on the shared 
libunwind is in the wrong place in the Makefile.  The patch also removes a
useless endif/ifneq pair.

Tested on x86_64-suse-linux and ia64-suse-linux, applied as obvious.


2015-05-19  Eric Botcazou  

* Makefile.in (LIBUNWIND): Move dependency for shared libgcc.
Remove useless endif/ifneq ($(enable_shared),yes) pair.


-- 
Eric Botcazou

Index: Makefile.in
===
--- Makefile.in	(revision 223349)
+++ Makefile.in	(working copy)
@@ -910,17 +910,14 @@ all: libgcc.a libgcov.a
 
 ifneq ($(LIBUNWIND),)
 all: libunwind.a
-libgcc_s$(SHLIB_EXT): libunwind$(SHLIB_EXT)
 endif
 
 ifeq ($(enable_shared),yes)
 all: libgcc_eh.a libgcc_s$(SHLIB_EXT)
 ifneq ($(LIBUNWIND),)
 all: libunwind$(SHLIB_EXT)
+libgcc_s$(SHLIB_EXT): libunwind$(SHLIB_EXT)
 endif
-endif
-
-ifeq ($(enable_shared),yes)
 
 # Map-file generation.
 ifneq ($(SHLIB_MKMAP),)


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Richard Henderson
On 05/19/2015 11:06 AM, Rich Felker wrote:
> I'm still mildly worried that concerns for supporting
> relaxation might lead to decisions not to optimize code in ways that
> would be difficult to relax (e.g. certain types of address load
> reordering or hoisting) but I don't understand GCC internals
> sufficiently to know if this concern is warranted or not.

It is.  The relaxation that HJ is working on requires that the reads from the
got not be hoisted.  I'm not especially convinced that what he's working on is
a win.

With LTO, the compiler can do the same job that he's attempting in the linker,
without an extra nop.  Without LTO, leaving it to the linker means that you
can't hoist the load and hide the memory latency.

> I would still like to see the @GOTPCREL stuff added and used instead
> of @GOT, as I mentioned earlier in the thread, but I agree that's
> independent of relaxation support and shouldn't block it.

I don't think that @GOTPCREL for 32-bit is a good idea.  This is the scheme
that Darwin uses, so we do have some experience with it.

In order for it to work you've got to have a pointer to a random address in the
function.  It means that you can only "easily" compute the address once.  If
you need the value again you wind up with the same "extra" addl insn that we
have with the current GOT pointer.

We've just started to do inter-function register allocation.  The next step
along those lines is to share the computation of GOT between multiple
functions.  At which point it really helps to have one global base address to
talk about.


r~


RE: [PATCH, MIPS]: Fix internal compiler error: in check_bool_attrs, at recog.c:2218 for micromips attribute

2015-05-19 Thread Robert Suchanek
Hi,

The original patch was missing a declaration of micromips_globals in mips.h,
which appears to be the cause of the segmentation faults when building
mips-mti-linux-gnu.
I didn't get any failures just before the submission on either
mips-img-linux-gnu or mips64el-linux-gnu, and the test case is too trivial to
trigger the ICE.

Below is the missing line. With this change mips-mti-linux-gnu builds fine.
The trunk is unstable and needed another patch from PR66181 to build
Glibc. Ok to commit?

>  We could add -mflip-micromips complementing -mflip-mips16 and use that
> for testing too.  Chances are it'd reveal further issues.  Looking at how
> -mflip-mips16 has been implemented it does not appear to me adding
> -mflip-micromips would be a lot of effort.

I'm in favour of adding such a switch since the testsuite doesn't cover 
a mixture of MIPS and microMIPS code.

Regards,
Robert

gcc/
* config/mips/mips.h (micromips_globals): Declare.
---
 gcc/config/mips/mips.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 0ea4e6d..85c8a97 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3108,6 +3108,7 @@ extern const struct mips_cpu_info *mips_arch_info;
 extern const struct mips_cpu_info *mips_tune_info;
 extern unsigned int mips_base_compression_flags;
 extern GTY(()) struct target_globals *mips16_globals;
+extern GTY(()) struct target_globals *micromips_globals;
 #endif
 
 /* Enable querying of DFA units.  */
-- 
2.2.2


Re: [patch, gcc 5 regression] re-enable biarch for powerpc-linux-gnu

2015-05-19 Thread David Edelsohn
This seems reasonable to me.

Alan, any thoughts from you?

Thanks, David


On Mon, May 18, 2015 at 12:22 PM, Sandra Loosemore
 wrote:
> We've found that configuring a powerpc-linux-gnu cross toolchain with
> --enable-targets=all no longer enables -m64 support in GCC 5, due to the
> patch for PR target/65286.  (We want to build with a -m64 multilib, in
> particular.)
>
> The attached patch seems to fix the breakage, but I'm not sure that it might
> not break some other configuration.  If this isn't the right fix, can one of
> the target experts suggest a better one?
>
> Here's a link to the discussion of the patch that caused the breakage:
>
> https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00321.html
>
> -Sandra
>


[PING] [RFC 12/13] S/390 Vector ABI GNU Attribute.

2015-05-19 Thread Andreas Krebbel
On 05/11/2015 03:23 PM, Andreas Krebbel wrote:
> With this patch .gnu_attribute is used to mark binaries with a vector
> ABI tag.  This is required since the z13 vector support breaks the ABI
> of existing vector_size attribute generated vector types:
> 
> 1. vector_size(16) and bigger vectors are aligned to 8 byte
> boundaries (formerly vectors were always naturally aligned)
> 
> 2. vector_size(16) or smaller vectors are passed via VR if available
> or by value on the stack (formerly vectors were passed on the stack by
> reference).
> 
> The .gnu_attribute will be used by ld to emit a warning if binaries
> with incompatible ABIs are being linked together:
> https://sourceware.org/ml/binutils/2015-04/msg00316.html
> 
> And it will be used by GDB to perform inferior function calls using a
> vector ABI which fits to the binary being debugged:
> https://sourceware.org/ml/gdb-patches/2015-04/msg00833.html
> 
> The current implementation tries to only set the attribute if the
> vector types are really used in ABI relevant contexts in order to
> avoid false positives during linking.
> 
> However, this unfortunately has some limitations like in the following
> case where an ABI relevant context cannot be detected properly:
> 
> typedef int __attribute__((vector_size(16))) v4si;
> struct A
> {
>   char x;
>   v4si y;
> };
> char a[sizeof(struct A)];
> 
> The number of elements in a depends on the ABI (24 with -mvx and 32
> with -mno-vx).  However, the implementation is not able to detect this
> since the struct type is not used anywhere else and consequently does
> not survive until the checking code is able to see it.
> 
> Ideas about how to improve the implementation without creating too
> many false positives are welcome.
> 
> In particular we do not want to set the attribute for local uses of
> vector types as they would be natural for ifunc optimizations.

Any ideas how this could be improved? That's the only patch of the IBM z13
series I have not applied yet.

-Andreas-
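
The 24 versus 32 figure quoted above follows from ordinary struct layout
rules.  The sketch below recomputes it for an assumed alignment of the
vector type; size_of_A_with_align is a helper invented for this
illustration and simply rounds offsets up to a power-of-two alignment.

```c
#include <assert.h>
#include <stddef.h>

typedef int __attribute__((vector_size(16))) v4si;
struct A
{
  char x;
  v4si y;
};

/* Size struct A would have if v4si were aligned to ALIGN bytes:
   x occupies byte 0, y starts at the next multiple of ALIGN and is
   16 bytes long, and the total is rounded up to ALIGN again.  */
size_t
size_of_A_with_align (size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  size_t off_y = (1 + align - 1) & ~(align - 1);
  size_t end = off_y + 16;
  return (end + align - 1) & ~(align - 1);
}
```

With 8-byte alignment y lands at offset 8 and the struct ends at 24; with
natural 16-byte alignment y lands at offset 16 and the struct ends at 32,
so sizeof(struct A), and with it the length of the array a, changes with
the ABI even though no function ever takes or returns the type.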



Re: [RFC] COMDAT Safe Module Level Multi versioning

2015-05-19 Thread Sriraman Tallam
On Tue, May 19, 2015 at 10:22 AM, Yury Gribov  wrote:
> On 05/19/2015 09:16 AM, Sriraman Tallam wrote:
>>
>> We have the following problem with selectively compiling modules with
>> -m options and I have provided a solution to solve this.  I would
>> like to hear what you think.
>>
>> Multi versioning at module granularity is done by compiling a subset
>> of modules with advanced ISA instructions, supported on later
>> generations of the target architecture, via -m options and
>> invoking the functions defined in these modules with explicit checks
>> for the ISA support via builtin functions,  __builtin_cpu_supports.
>> This mechanism has the unfortunate side-effect that generated COMDAT
>> candidates from these modules can contain these advanced instructions
>> and potentially “violate” ODR assumptions.  Choosing such a COMDAT
>> candidate over a generic one from a different module can cause SIGILL
>> on platforms where the advanced ISA is not supported.
>>
>> Here is a slightly contrived  example to illustrate:
>>
>>
>> matrixdouble.h
>> 
>> // Template (Comdat) function definition in a header:
>>
>> template <typename T>
>> __attribute__((noinline))
>> void matrixDouble (T *a) {
>>for (int i = 0 ; i < 16; ++i)  //Vectorizable Loop
>>  a[i] = a[i] * 2;
>> }
>>
>> avx.cc  (Compile with -mavx -O2)
>> -
>>
>> #include "matrixdouble.h"
>> void getDoubleAVX(int *a) {
>>   matrixDouble(a);  // Instantiated with vectorized AVX instructions
>> }
>>
>>
>> non_avx.cc (Compile with -mno-avx -O2)
>> ---
>>
>> #include "matrixdouble.h"
>> void
>> getDouble(int *a) {
>>   matrixDouble(a); // Instantiated with non-AVX instructions
>> }
>>
>>
>> main.cc
>> ---
>>
>> void getDoubleAVX(int *a);
>> void getDouble(int *a);
>>
>> int a[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
>> int main () {
>>   // The AVX call is appropriately guarded.
>>   if (__builtin_cpu_supports("avx"))
>>     getDoubleAVX(a);
>>   else
>>     getDouble(a);
>>   return a[0];
>> }
>>
>>
>> In the above code, function “getDoubleAVX” is only called when the
>> run-time CPU supports AVX instructions.  This code looks clean but
>> suffers from the COMDAT ODR violation.  Two copies of COMDAT function
>> “matrixDouble” are generated.  One copy is generated in object file
>> “avx.o” with AVX instructions and another copy exists in “non_avx.o”
>> without AVX instruction.  At link time, in a link order where object
>> file avx.o is seen ahead of  non_avx.o,  the COMDAT copy of function
>> “matrixDouble” that contains AVX instructions is kept leading to
>> SIGILL on unsupported platforms.  To reproduce the SIGILL,
>>
>>
>> $  g++ -c -O2 -mavx avx.cc
>> $ g++ -c -O2 -mno-avx non_avx.cc
>> $  g++ main.cc avx.o non_avx.o
>> $ ./a.out   # on a non-AVX machine
>> Illegal Instruction
>>
>>
>> To solve this, I propose introducing a new compiler option, say
>> -fodr-unsafe-comdats, to let the user tag objects that use specialized
>> options and let the linker choose the comdat candidate to be linked
>> wisely.  The root cause of the above problem is that comdat functions
>> in common headers may not be properly guarded and the linker picks the
>> first candidate it sees.  A link order where the object with the
>> specialized comdat functions appear first causes these comdats to be
>> picked leading to SIGILL on unsupported arches.  With the objects
>> tagged, the linker can be made to pick other comdat candidates when
>> possible.
>>
>> More details:
>>
>> This option is user specified when using arch specific options like
>> -m.  It is an indicator to the compiler that any comdat bodies
>> generated are potentially unsafe for execution.  Note that the COMDAT
>> bodies however have to be generated as there are no guarantees that
>> other modules will do so.  The compiler then emits a specially named
>> section, like “.gnu.odr.unsafe”, in the object file.  When the linker
>> tries to pick a COMDAT candidate from several choices, it must avoid
>> COMDAT copies from objects with sections named “.gnu.odr.unsafe” when
>> presented with a choice to pick a candidate from an object that does
>> not have the “.gnu.odr.unsafe” section.  Note that it may not be
>> possible to do that in which case the linker must pick the unsafe
>> copy, it could explicitly warn when this happens.
>>
>> Alternately,  the compiler can bind locally any emitted comdat version
>> from a specialized module, which could also be guarded by an option.
>> This will solve the problem but this may not be always possible
>> especially when addresses of any such comdat version is taken.
>
>
> Can IFUNC relocations be used to properly select optimal version of code at
> runtime?

Yes, we do want a solution like this but all the dispatching code for
IFUNC needs to be generated at link-time.   Here is an example header
file with target-specific functionalities :

https://bitbucket.org/eigen/eigen/src/6ed647a644b8e3924800f0916a4ce4addf9e7739/Eigen/Co
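
The linker-side rule proposed earlier in this message can be modeled as
follows.  This is a toy sketch of the selection policy only, assuming each
candidate carries a flag saying whether its object file has the
".gnu.odr.unsafe" section; struct comdat_candidate and pick_comdat are
hypothetical and do not correspond to any existing linker interface.

```c
#include <assert.h>
#include <stddef.h>

struct comdat_candidate
{
  const char *object;  /* e.g. "avx.o" */
  int odr_unsafe;      /* nonzero if the object has .gnu.odr.unsafe */
};

/* Return the index of the copy to keep: the first untagged candidate in
   link order if there is one, otherwise the first candidate.  *WARN is
   set to 1 when an unsafe copy had to be chosen.  */
size_t
pick_comdat (const struct comdat_candidate *c, size_t n, int *warn)
{
  assert (c != NULL && n > 0 && warn != NULL);
  *warn = 0;
  for (size_t i = 0; i < n; i++)
    if (!c[i].odr_unsafe)
      return i;
  *warn = 1;
  return 0;
}
```

For the link line "avx.o non_avx.o" from the example, this policy keeps the
copy from non_avx.o even though avx.o comes first.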

Re: [PATCH/libiberty] fix build of gdb/binutils with clang.

2015-05-19 Thread Ian Lance Taylor
On Tue, May 19, 2015 at 11:08 AM, Yunlian Jiang  wrote:
>
> I could do that, and it makes the compilation of libiberty pass.
> However, I have some other problems when using clang to build gdb
> because of libiberty.
>
> Some C files from other components may include 'libiberty.h', which contains
> the following
>
> #if !HAVE_DECL_ASPRINTF
> /* Like sprintf but provides a pointer to malloc'd storage, which must
>be freed by the caller.  */
>
> extern int asprintf (char **, const char *, ...) ATTRIBUTE_PRINTF_2;
> #endif
>
> HAVE_DECL_ASPRINTF is defined in config.h under the libiberty directory.
> If the other C file only includes libiberty.h and does not include
> libiberty/config.h, and at the same time _GNU_SOURCE is defined, the same
> error happens.

Probably if HAVE_DECL_ASPRINTF is not defined at all, we should not
declare asprintf in libiberty.h.

Ian
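
A small sketch of the difference such a guard would make.  It relies only
on the preprocessor rule that an undefined identifier evaluates to 0 inside
#if; the "possible" guard shown is one way to express the suggestion, not
necessarily the form a patch would take, and the *_GUARD_DECLARES macros
exist only for this demonstration.

```c
#include <assert.h>

/* Simulate a file that includes libiberty.h without any config.h.  */
#undef HAVE_DECL_ASPRINTF

/* Current guard: the undefined macro reads as 0, so this is taken and
   asprintf would be redeclared.  */
#if !HAVE_DECL_ASPRINTF
# define OLD_GUARD_DECLARES 1
#else
# define OLD_GUARD_DECLARES 0
#endif

/* Possible guard: declare only if configure checked and found nothing.  */
#if defined (HAVE_DECL_ASPRINTF) && !HAVE_DECL_ASPRINTF
# define NEW_GUARD_DECLARES 1
#else
# define NEW_GUARD_DECLARES 0
#endif

int old_guard_declares (void) { return OLD_GUARD_DECLARES; }
int new_guard_declares (void) { return NEW_GUARD_DECLARES; }
```

With the macro undefined, the current guard fires and the stricter one does
not, which is the case that clashes with the system declaration under
_GNU_SOURCE.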


Re: [PATCH/libiberty] fix build of gdb/binutils with clang.

2015-05-19 Thread DJ Delorie

> If the other c file only includes libiberty.h and does not include the
> libiberty/config.h and

In general, such "other c files" should have their own config.h that
does the same test and has its own HAVE_DECL_ASPRINTF.

That way, the config.h matches the compiler options being used, and
not the compiler options libiberty's build used.


Re: [PATCH/libiberty] fix build of gdb/binutils with clang.

2015-05-19 Thread Yunlian Jiang
I could do that, and it makes the compilation of libiberty pass.
However, I have some other problems when using clang to build gdb
because of libiberty.

Some C files from other components may include 'libiberty.h', which contains
the following

#if !HAVE_DECL_ASPRINTF
/* Like sprintf but provides a pointer to malloc'd storage, which must
   be freed by the caller.  */

extern int asprintf (char **, const char *, ...) ATTRIBUTE_PRINTF_2;
#endif

HAVE_DECL_ASPRINTF is defined in config.h under the libiberty directory.
If the other C file only includes libiberty.h and does not include
libiberty/config.h, and at the same time _GNU_SOURCE is defined, the same
error happens.


On Mon, May 18, 2015 at 4:52 PM, Ian Lance Taylor  wrote:
> On Mon, May 18, 2015 at 4:26 PM, Yunlian Jiang  wrote:
>>
>> Yes, the problem is  libiberty is compiling some files
>> with _GNU_SOURCE defined and some not. So the configure
>> file does not include "#define _GNU_SOURCE".
>
> As far as I can see it should be fine to define _GNU_SOURCE when
> compiling all libiberty files.
>
> Ian


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 04:43:53PM +0200, Michael Matz wrote:
> Hi,
> 
> On Fri, 15 May 2015, Rich Felker wrote:
> 
> > Forget lazy binding. It's dead anyway because serious distros want
> > PIE+relro+bindnow+...
> 
> You keep saying this, but I can't help the feeling it's mostly because 
> musl doesn't support it ;-)

Well the reasons musl doesn't support it are partly the above, and
partly that it's been a continuous source of subtle bugs in glibc --
things like clobbering new vector registers, missing synchronization,
failures to be async-signal-safe, etc. So it's not that I think lazy
binding is bad because musl doesn't support it, but rather that musl
doesn't support lazy binding because I think it's bad. :-)

> No, you don't have to use bindnow to get the effects of relro.  Sure 
> there's more parts of the GOT protected with it, but if that's really that 
> much more hardened is up for debate.

Normally it's function addresses that you care about protecting --
they're the easy vector for arbitrary code execution -- and they're
unprotected without bindnow. Addresses of global data could also be an
attack vector, but a more difficult one to exploit.

> > If people really want lazy binding, they can use options which support 
> > it, but I don't want to keep suffering the codegen cost of lazy binding 
> > despite never using it.
> 
> > There should be an option to generate optimal code equivalent to what 
> > you get with Alexander Monakov's patches for those of us who aren't 
> > trying to support this legacy feature that precludes good performance 
> > and precludes hardening.
> 
> H.J.'s branch is for _improving_ code on top of the no-plt code, it's not 
> replacing it or an alternative for it.

Thanks for the clarification -- this was the part I was failing to
understand. I'm still mildly worried that concerns for supporting
relaxation might lead to decisions not to optimize code in ways that
would be difficult to relax (e.g. certain types of address load
reordering or hoisting) but I don't understand GCC internals
sufficiently to know if this concern is warranted or not. As long as
his work isn't interfering with the ability of -fno-plt to generate
optimal code, I agree it's both inappropriate and counter-productive
for me to be objecting to part or all of it.

I would still like to see the @GOTPCREL stuff added and used instead
of @GOT, as I mentioned earlier in the thread, but I agree that's
independent of relaxation support and shouldn't block it.

Rich


Use a couple of macros in stor-layout.c

2015-05-19 Thread Eric Botcazou
Self-explanatory, tested on x86_64-suse-linux, applied as obvious.


2015-05-19  Eric Botcazou  

* stor-layout.c (finalize_type_size): Use AGGREGATE_TYPE_P.
(layout_type): Use RECORD_OR_UNION_TYPE_P.


-- 
Eric Botcazou
Index: stor-layout.c
===
--- stor-layout.c	(revision 223349)
+++ stor-layout.c	(working copy)
@@ -1757,12 +1757,9 @@ finalize_type_size (tree type)
  However, where strict alignment is not required, avoid
  over-aligning structures, since most compilers do not do this
  alignment.  */
-
-  if (TYPE_MODE (type) != BLKmode && TYPE_MODE (type) != VOIDmode
-  && (STRICT_ALIGNMENT
-	  || (TREE_CODE (type) != RECORD_TYPE && TREE_CODE (type) != UNION_TYPE
-	  && TREE_CODE (type) != QUAL_UNION_TYPE
-	  && TREE_CODE (type) != ARRAY_TYPE)))
+  if (TYPE_MODE (type) != BLKmode
+  && TYPE_MODE (type) != VOIDmode
+  && (STRICT_ALIGNMENT || !AGGREGATE_TYPE_P (type)))
 {
   unsigned mode_align = GET_MODE_ALIGNMENT (TYPE_MODE (type));
 
@@ -2431,9 +2428,7 @@ layout_type (tree type)
   /* Compute the final TYPE_SIZE, TYPE_ALIGN, etc. for TYPE.  For
  records and unions, finish_record_layout already called this
  function.  */
-  if (TREE_CODE (type) != RECORD_TYPE
-  && TREE_CODE (type) != UNION_TYPE
-  && TREE_CODE (type) != QUAL_UNION_TYPE)
+  if (!RECORD_OR_UNION_TYPE_P (type))
 finalize_type_size (type);
 
   /* We should never see alias sets on incomplete aggregates.  And we
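
The two replacements in the patch above are meant to be behaviour-preserving. A minimal self-contained sketch that checks this on a mock follows; the enum and `struct mock_type` are illustrative stand-ins (not GCC's real tree representation), and the two predicate macros are written in the same shape as the ones in GCC's tree.h, which is an assumption of this sketch rather than a copy of the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock subset of tree codes; hypothetical, for illustration only.  */
enum tree_code { INTEGER_TYPE, REAL_TYPE, POINTER_TYPE, ARRAY_TYPE,
                 RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE, NUM_CODES };

/* Mock type node: only the code matters for these predicates.  */
struct mock_type { enum tree_code code; };
#define TREE_CODE(T) ((T)->code)

/* Assumed to mirror the shape of the predicates in tree.h.  */
#define RECORD_OR_UNION_TYPE_P(T) \
  (TREE_CODE (T) == RECORD_TYPE || TREE_CODE (T) == UNION_TYPE \
   || TREE_CODE (T) == QUAL_UNION_TYPE)
#define AGGREGATE_TYPE_P(T) \
  (TREE_CODE (T) == ARRAY_TYPE || RECORD_OR_UNION_TYPE_P (T))

/* Sub-condition removed from finalize_type_size.  */
static bool old_finalize_cond (const struct mock_type *t)
{
  assert (t != NULL);
  return TREE_CODE (t) != RECORD_TYPE && TREE_CODE (t) != UNION_TYPE
         && TREE_CODE (t) != QUAL_UNION_TYPE && TREE_CODE (t) != ARRAY_TYPE;
}

/* Its replacement.  */
static bool new_finalize_cond (const struct mock_type *t)
{
  return !AGGREGATE_TYPE_P (t);
}

/* Condition removed from layout_type.  */
static bool old_layout_cond (const struct mock_type *t)
{
  return TREE_CODE (t) != RECORD_TYPE && TREE_CODE (t) != UNION_TYPE
         && TREE_CODE (t) != QUAL_UNION_TYPE;
}

/* Its replacement.  */
static bool new_layout_cond (const struct mock_type *t)
{
  return !RECORD_OR_UNION_TYPE_P (t);
}

/* True if old and new conditions agree for every mock tree code.  */
static bool conditions_agree (void)
{
  for (int c = 0; c < NUM_CODES; c++)
    {
      struct mock_type t = { (enum tree_code) c };
      if (old_finalize_cond (&t) != new_finalize_cond (&t))
        return false;
      if (old_layout_cond (&t) != new_layout_cond (&t))
        return false;
    }
  return true;
}
```

Calling `conditions_agree ()` walks every mock code and compares the hand-written tests against the macro forms.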


Re: ODR merging and implicit typedefs

2015-05-19 Thread Markus Trippelsdorf
On 2015.05.19 at 19:33 +0200, Jan Hubicka wrote:
> 
> Jason,
> I just noticed that there are bogus ODR violation warnings during 
> LTO-bootstrap
> (that breaks -Werror builds).  It was caused by my work-around for 
> type_in_anonymous_namespace
> for the issue discussed in:
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01245.html
> (i.e. the TYPE_STUB_DECL discussion).

There are also many bogus ODR warnings when building LLVM with LTO:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66180

-- 
Markus


Re: [Patch][loop-invariant.c] Fix a couple of bugs regarding loop invariant motion discovered by spec2k6 on aarch64

2015-05-19 Thread Jeff Law

On 05/18/2015 02:16 AM, David Sherwood wrote:

Hi Jeff,

Thanks for the suggestion. I did a bootstrap x86_64 build before and after my
patch and looked for differences in the last stage object files and there were
plenty of them. I chose a nice simple function (check_callers) from
ipa-inline-analysis.c and reduced it to a small test case. Hope this is ok.

Testing done:

  *  aarch64 built, "make check" no regressions
  *  aarch64_be built, "make check" no regressions
  *  x86_64 built, "make check" no regressions

ChangeLog:
 2015-05-15  David Sherwood  

 * loop-invariant.c (create_new_invariant): Don't calculate address cost
 if mode is not a scalar integer.
  (get_inv_cost): Increase computational cost for unused invariants.
 * testsuite/gcc.dg/loop-invariant.c: New testcase.
Thanks.  Installed on the trunk after fixing a couple trivial whitespace 
nits.


Jeff



ODR merging and implicit typedefs

2015-05-19 Thread Jan Hubicka

Jason,
I just noticed that there are bogus ODR violation warnings during LTO-bootstrap
(that breaks -Werror builds).  It was caused by my work-around for 
type_in_anonymous_namespace
for the issue discussed in:
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01245.html
(i.e. the TYPE_STUB_DECL discussion).

I simply added a loop: for a type that looks anonymous according to
if (TYPE_STUB_DECL (t) && !TREE_PUBLIC (TYPE_STUB_DECL (t)))
it walks up the context and looks for an actual anonymous NAMESPACE_DECL.
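
A minimal sketch of that context walk, on a mock data structure: `struct mock_ctx`, its fields and `in_anonymous_namespace_p` are hypothetical names for illustration and do not correspond to GCC's tree accessors. An unnamed namespace is modelled as a namespace context whose name is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the chain of enclosing contexts.  */
enum mock_kind { MOCK_NAMESPACE, MOCK_RECORD, MOCK_FUNCTION };

struct mock_ctx
{
  enum mock_kind kind;
  const char *name;              /* NULL models an anonymous entity.  */
  const struct mock_ctx *parent; /* NULL at the outermost context.  */
};

/* Walk outwards from CTX; return 1 if some enclosing context is an
   anonymous namespace, 0 otherwise.  */
static int in_anonymous_namespace_p (const struct mock_ctx *ctx)
{
  for (; ctx != NULL; ctx = ctx->parent)
    {
      /* Sanity check: a context must not be its own parent.  */
      assert (ctx->parent != ctx);
      if (ctx->kind == MOCK_NAMESPACE && ctx->name == NULL)
        return 1;
    }
  return 0;
}
```

The point of the walk is that a non-public stub decl alone is not treated as proof; only an enclosing unnamed namespace is.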

Now I however run into type merging issues as follows:

for following type:
../../gcc/postreload-gcse.c:107:1: error: type 'struct ' violates one 
definition rule [-Werror=odr]
 {
  ^
../../gcc/postreload.c:721:3: note: a different type is defined in another 
translation unit
   {
^
../../gcc/postreload-gcse.c:108:7: note: the first difference of corresponding 
definitions is field 'moves_inserted'
   int moves_inserted;
^
../../gcc/postreload.c:722:51: note: a field with different name is defined in 
another translation unit
 struct reg_use reg_use[RELOAD_COMBINE_MAX_USES];
^
(gdb) p debug_tree (t1->type_common.name->decl_with_vis.assembler_name)
 

So the problem is that at compile time we think the type is an ODR type but it 
does not really have a name:

  constant 96>
unit size  constant 12>
align 32 symtab 0 alias set -1 canonical type 0x746d2bd0
fields 
unit size 
align 32 symtab 0 alias set -1 canonical type 0x76adb690 
precision 32 min  max 
pointer_to_this  reference_to_this 
>
nonlocal SI file ../../gcc/postreload-gcse.c line 108 col 7 size 
 unit size 
align 32 offset_align 128
offset 
bit offset  context 
chain 
nonlocal SI file ../../gcc/postreload-gcse.c line 109 col 7 size 
 unit size 
align 32 offset_align 128 offset  bit 
offset  context  
chain >> context 

chain >

 
unit size 
align 32 symtab 0 alias set -1 canonical type 0x746d2bd0
fields 
nonlocal SI file ../../gcc/postreload-gcse.c line 108 col 7
size 
unit size 
align 32 offset_align 128
offset 
bit offset  context 
 chain > 
context 
pointer_to_this  chain >
VOID file ../../gcc/postreload-gcse.c line 107 col 1
align 8 context >

I tracked down that those are implicit typedefs created by 
create_implicit_typedef.
My patch made them no longer anonymous, which in turn triggers the bogus 
diagnostics.
I do not think it is fully correct though - those types are not anonymous.
(I also wonder why we need to introduce a type name "._134" and pass it all the 
way down
to LTO.)

I tried to make type_with_linkage_p return false on those, but that causes 
problems
with the fact that polymorphic call analysis expects all class types to have 
linkage
and working ODR equivalency on these. 

Is there a way to associate them with the real named type they correspond to and
arrange for them to have the same name mangling? (And perhaps have the mangler 
ICE on an attempt
to use a local name like this?)

I bootstrapped/regtested on x86_64-linux the patch below. If it works for 
Firefox
and Chrome I will go ahead with it at least temporarily.

Honza

* ipa-devirt.c (type_in_anonymous_namespace_p): Return true
for implicit declarations.
(odr_type_p): Check that TYPE_NAME is TYPE_DECL before looking
into it.
(get_odr_type): Check type has linkage before adding bases.
(register_odr_type): Check that type has linkage before adding it.
(type_known_to_have_no_deriavations_p): Rename to ..
(type_known_to_have_no_derivations_p): This one.
* ipa-utils.h (type_known_to_have_no_deriavations_p): Rename to ..
(type_known_to_have_no_derivations_p): This one.
* ipa-polymorphic-call.c
(ipa_polymorphic_call_context::restrict_to_inner_type): Check that
type has linkage.
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 223390)
+++ ipa-devirt.c(working copy)
@@ -269,6 +269,8 @@ type_in_anonymous_namespace_p (const_tre
 
   if (TYPE_STUB_DECL (t) && !TREE_PUBLIC (TYPE_STUB_DECL (t)))
 {
+  if (DECL_ARTIFICIAL (TYPE_NAME (t)))
+   return true;
   tree ctx = DECL_CONTEXT (TYPE_NAME (t));
   while (ctx)
{
@@ -296,7 +298,7 @@ odr_type_p (const_tree t)
  to care, since it is used only for type merging.  */
   gcc_checking_assert (in_lto_p || flag_lto);
 
-  return (TYPE_NAME (t)
+  return (TYPE_NAME (t) && TREE_CODE (TYPE_NAME (t)) == TYPE_DECL
   && (DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (t))));
 }
 
@@ -2124,6 +2126,7 @@ get_odr_type (tree type, bool insert)
 }
 
   if (build_bases && TREE_CODE (type) == RECORD_TYPE && TYPE_BINFO (type)
+  && type_with_linkage_p (type)

Re: [PATCH] Add SPECIAL_FLOAT_MODE to enable adding IEEE 128-bit floating point to PowerPC

2015-05-19 Thread Michael Meissner
On Fri, May 08, 2015 at 01:05:59PM -0600, Jeff Law wrote:
> On 05/06/2015 11:29 AM, Michael Meissner wrote:
> >On Wed, May 06, 2015 at 04:03:00PM +0100, Richard Sandiford wrote:
> >>Jeff Law  writes:
> >>>So my worry here is that folks writing these loops to iterate over modes
> >>>are going to easily miss the != VOIDmode terminator, or not know when to
> >>>use GET_MODE_WIDER_SPECIAL.
> >>>
> >>>We can certainly go with the patch as-is since you've done the work to
> >>>sort out which GET_MODE_WIDER to use and added the appropriate
> >>>termination checks.   But do we want to try to future proof this a
> >>>little and define two iterators for folks to use rather than write the
> >>>loops by hand every time and probably getting it wrong -- and wrong in
> >>>such a way that it only breaks on PPC, forcing someone to regularly be
> >>>fixing this stuff
> >
> >If they miss the != VOIDmode, the program will hang since it will never exit
> >the loop (VOIDmode is the wider type for VOIDmode).
> OK.  And presumably if someone is adding a new loop over the modes
> they'll actually be testing that code on whatever target they're
> using.  So it's unlikely they'll introduce an infinite loop on other
> targets by accident.

In looking over the patch, I learned a lot more about the internal mode
handling.  I discovered that I can use FRACTIONAL_FLOAT_MODE to add the new
types (IFmode for IBM double-double with 106 fractional bits, and KFmode for
IEEE 128-bit with 113 bits, and TFmode will become either of these depending on
the defaults and switches).  What I was missing when I previously tried to use
FRACTIONAL_FLOAT_MODE is when I create these new modes, in the case where they
are not active, I have to make sure that HARD_REGNO_MODE_OK never returns true
for any register (including GPRs), and the mode will be skipped.  I don't need
to add the special tests against VOIDmode during the widening rules.

So, I don't need any special machine independent support for adding IEEE
128-bit floating point to the compiler.  Thanks for pushing me to
re-investigate the changes and clean them up.

During my investigation, I did come up with a FOR_EACH iterator for emit-rtl.c
and a few other places that make the iteration over all modes easier, and I
will clean that up and submit it as a separate patch.
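
To illustrate the hazard discussed above (VOIDmode being its own wider mode, so a hand-written loop without the terminator never exits), here is a minimal sketch on a mock mode table. The enum, the `mock_wider` table and the `MOCK_FOR_EACH_MODE_FROM` macro are hypothetical names for this example; this is not the iterator from the forthcoming patch.

```c
#include <assert.h>

/* Hypothetical miniature mode set; VOIDmode is the chain terminator.  */
enum mock_mode { VOIDmode, SFmode, DFmode, TFmode, NUM_MODES };

/* Next wider mode.  VOIDmode maps to itself, which is why a loop
   that forgets the "!= VOIDmode" test would spin forever.  */
static const enum mock_mode mock_wider[NUM_MODES] =
  { VOIDmode, DFmode, TFmode, VOIDmode };

#define MOCK_GET_MODE_WIDER(M) (mock_wider[(M)])

/* Iterator macro that bakes the terminator check in, so callers
   cannot leave it out by accident.  */
#define MOCK_FOR_EACH_MODE_FROM(M, START) \
  for ((M) = (START); (M) != VOIDmode; (M) = MOCK_GET_MODE_WIDER (M))

/* Count the modes visited when widening from START.  */
static int count_modes_from (enum mock_mode start)
{
  enum mock_mode m;
  int n = 0;

  assert (start < NUM_MODES);
  MOCK_FOR_EACH_MODE_FROM (m, start)
    n++;
  return n;
}
```

Keeping the termination test inside the macro is the design choice being sketched: the loop body never has to mention VOIDmode.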

Some of the other mode widening places in the machine independent parts of the
compiler could also be cleaned up (lto-streamer in particular); I may or
may not submit patches for those.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [RFC]: Remove Mem/address type assumption in combiner

2015-05-19 Thread Jeff Law

On 05/16/2015 07:55 PM, Hans-Peter Nilsson wrote:

On Sat, 16 May 2015, Segher Boessenkool wrote:

On Sat, May 16, 2015 at 12:36:38PM -0400, Hans-Peter Nilsson wrote:

On Sat, 16 May 2015, Segher Boessenkool wrote:

On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:

I confess the test-case-"guarded" addi pattern should have been
expressed with a shift in addition to the multiplication.


But they wouldn't ever match so they might very well have bitrotted
by now :-(


It seems you're saying that the canonicalization to "ashift"
didn't work *at all*, when starting with an expression from an
address?  I knew it failed in part, but always thought it was
just a partial failure.


With a plus or minus combine would always write it as a mult.
I don't think any other pass would create this combination.  I
haven't tested it though.


Ports probably also generate that internally at various RTL
passes, something that takes a bit more than an at-a-glance code
inspection.
Correct.  The PA port for example has a ton of this kind of RTL 
rewriting to exploit the shift-add insns and scaled indexed addressing 
modes (and correct for some oddities in the PA chip where the scaled 
modes don't exist in every context where you'd think they should).


And you still have to worry about things like combine taking a (mem 
(plus (mult))), selecting the (plus (mult)) as a split point and failing 
to canonicalize it into the ashift form.
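
As a rough illustration of the canonicalization at issue, the sketch below rewrites a multiplication by a power of two into the equivalent left shift on a mock expression node. `struct mock_binop`, `mock_exact_log2` and `mult_to_ashift` are hypothetical names used only here; real RTL and combine's own logic are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Return log2 of X if X is a power of two, otherwise -1.  */
static int mock_exact_log2 (unsigned long x)
{
  int n = 0;

  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while (x > 1)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Hypothetical binary operation with a constant second operand.  */
enum mock_code { MOCK_MULT, MOCK_ASHIFT };
struct mock_binop { enum mock_code code; unsigned long op1; };

/* Rewrite (mult x 2^n) as (ashift x n).  Return 1 if E was changed,
   0 if it was left alone (not a mult, or not a power of two).  */
static int mult_to_ashift (struct mock_binop *e)
{
  int shift;

  assert (e != NULL);
  if (e->code != MOCK_MULT)
    return 0;
  shift = mock_exact_log2 (e->op1);
  if (shift < 0)
    return 0;
  e->code = MOCK_ASHIFT;
  e->op1 = (unsigned long) shift;
  return 1;
}

/* Evaluate the mock operation on X, to show both forms agree.  */
static unsigned long eval_binop (const struct mock_binop *e, unsigned long x)
{
  return e->code == MOCK_MULT ? x * e->op1 : x << e->op1;
}
```

The failure mode described above corresponds to a (plus (mult ...)) pulled out of a mem without a step like `mult_to_ashift` being applied, so patterns written with ashift never match.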


I ran into that while fixing up the PA for these changes.  The good news 
is with two trivial combine changes and the expected changes to the PA 
backend, I can get code generation back to where it was before across my 
sample testfiles.



Jeff


Re: [RFC] COMDAT Safe Module Level Multi versioning

2015-05-19 Thread Yury Gribov

On 05/19/2015 09:16 AM, Sriraman Tallam wrote:

We have the following problem with selectively compiling modules with
-m options and I have provided a solution to solve this.  I would
like to hear what you think.

Multi versioning at module granularity is done by compiling a subset
of modules with advanced ISA instructions, supported on later
generations of the target architecture, via -m options and
invoking the functions defined in these modules with explicit checks
for the ISA support via builtin functions,  __builtin_cpu_supports.
This mechanism has the unfortunate side-effect that generated COMDAT
candidates from these modules can contain these advanced instructions
and potentially “violate” ODR assumptions.  Choosing such a COMDAT
candidate over a generic one from a different module can cause SIGILL
on platforms where the advanced ISA is not supported.

Here is a slightly contrived  example to illustrate:


matrixdouble.h

// Template (Comdat) function definition in a header:

template <typename T>
__attribute__((noinline))
void matrixDouble (T *a) {
  for (int i = 0 ; i < 16; ++i)  // Vectorizable loop
    a[i] = a[i] * 2;
}

avx.cc  (Compile with -mavx -O2)
-

#include "matrixdouble.h"
void getDoubleAVX(int *a) {
  matrixDouble(a);  // Instantiated with vectorized AVX instructions
}


non_avx.cc (Compile with -mno-avx -O2)
---

#include "matrixdouble.h"
void
getDouble(int *a) {
  matrixDouble(a); // Instantiated with non-AVX instructions
}


main.cc
---

void getDoubleAVX(int *a);
void getDouble(int *a);

int a[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
int main () {
  // The AVX call is appropriately guarded.
  if (__builtin_cpu_supports("avx"))
    getDoubleAVX(a);
  else
    getDouble(a);
  return a[0];
}


In the above code, function “getDoubleAVX” is only called when the
run-time CPU supports AVX instructions.  This code looks clean but
suffers from the COMDAT ODR violation.  Two copies of COMDAT function
“matrixDouble” are generated.  One copy is generated in object file
“avx.o” with AVX instructions and another copy exists in “non_avx.o”
without AVX instruction.  At link time, in a link order where object
file avx.o is seen ahead of  non_avx.o,  the COMDAT copy of function
“matrixDouble” that contains AVX instructions is kept leading to
SIGILL on unsupported platforms.  To reproduce the SIGILL,


$  g++ -c -O2 -mavx avx.cc
$ g++ -c -O2 -mno-avx non_avx.cc
$  g++ main.cc avx.o non_avx.o
$ ./a.out   # on a non-AVX machine
Illegal Instruction


To solve this, I propose introducing a new compiler option, say
-fodr-unsafe-comdats, to let the user tag objects that use specialized
options and let the linker choose the comdat candidate to be linked
wisely.  The root cause of the above problem is that comdat functions
in common headers may not be properly guarded and the linker picks the
first candidate it sees.  A link order where the object with the
specialized comdat functions appear first causes these comdats to be
picked leading to SIGILL on unsupported arches.  With the objects
tagged, the linker can be made to pick other comdat candidates when
possible.

More details:

This option is user specified when using arch specific options like
-m.  It is an indicator to the compiler that any comdat bodies
generated are potentially unsafe for execution.  Note that the COMDAT
bodies however have to be generated as there are no guarantees that
other modules will do so.  The compiler then emits a specially named
section, like “.gnu.odr.unsafe”, in the object file.  When the linker
tries to pick a COMDAT candidate from several choices, it must avoid
COMDAT copies from objects with sections named “.gnu.odr.unsafe” when
presented with a choice to pick a candidate from an object that does
not have the “.gnu.odr.unsafe” section.  Note that it may not be
possible to do that in which case the linker must pick the unsafe
copy, it could explicitly warn when this happens.
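
The selection rule proposed above can be sketched as follows. This is only an illustration under stated assumptions: `struct mock_candidate`, the `odr_unsafe` flag (standing for "the defining object has a .gnu.odr.unsafe section") and `pick_comdat` are hypothetical, and no existing linker interface is implied.

```c
#include <assert.h>
#include <stddef.h>

/* One COMDAT copy, as seen in link order.  */
struct mock_candidate
{
  const char *object_name; /* object file providing the copy */
  int odr_unsafe;          /* nonzero if that object is tagged unsafe */
};

/* Return the index of the copy to keep, or -1 if there is none.
   Prefer the first copy from an untagged object; if every copy is
   tagged, fall back to the first one and set *WARN so the caller can
   emit a diagnostic.  */
static int pick_comdat (const struct mock_candidate *c, size_t n, int *warn)
{
  size_t i;

  assert (warn != NULL);
  *warn = 0;
  if (n == 0)
    return -1;
  for (i = 0; i < n; i++)
    if (!c[i].odr_unsafe)
      return (int) i;
  *warn = 1;
  return 0;
}
```

With the link order from the example (avx.o before non_avx.o), this rule keeps the copy from non_avx.o instead of the first one seen.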

Alternately,  the compiler can bind locally any emitted comdat version
from a specialized module, which could also be guarded by an option.
This will solve the problem but this may not be always possible
especially when addresses of any such comdat version is taken.


Can IFUNC relocations be used to properly select optimal version of code 
at runtime?


-Y



Re: [RFC] COMDAT Safe Module Level Multi versioning

2015-05-19 Thread Sriraman Tallam
On Tue, May 19, 2015 at 2:39 AM, Richard Biener
 wrote:
> On Tue, May 19, 2015 at 8:16 AM, Sriraman Tallam  wrote:
>> We have the following problem with selectively compiling modules with
>> -m options and I have provided a solution to solve this.  I would
>> like to hear what you think.
>>
>> Multi versioning at module granularity is done by compiling a subset
>> of modules with advanced ISA instructions, supported on later
>> generations of the target architecture, via -m options and
>> invoking the functions defined in these modules with explicit checks
>> for the ISA support via builtin functions,  __builtin_cpu_supports.
>> This mechanism has the unfortunate side-effect that generated COMDAT
>> candidates from these modules can contain these advanced instructions
>> and potentially “violate” ODR assumptions.  Choosing such a COMDAT
>> candidate over a generic one from a different module can cause SIGILL
>> on platforms where the advanced ISA is not supported.
>>
>> Here is a slightly contrived  example to illustrate:
>>
>>
>> matrixdouble.h
>> 
>> // Template (Comdat) function definition in a header:
>>
>> template <typename T>
>> __attribute__((noinline))
>> void matrixDouble (T *a) {
>>   for (int i = 0 ; i < 16; ++i)  // Vectorizable loop
>>     a[i] = a[i] * 2;
>> }
>>
>> avx.cc  (Compile with -mavx -O2)
>> -
>>
>> #include "matrixdouble.h"
>> void getDoubleAVX(int *a) {
>>  matrixDouble(a);  // Instantiated with vectorized AVX instructions
>> }
>>
>>
>> non_avx.cc (Compile with -mno-avx -O2)
>> ---
>>
>> #include "matrixdouble.h"
>> void
>> getDouble(int *a) {
>>  matrixDouble(a); // Instantiated with non-AVX instructions
>> }
>>
>>
>> main.cc
>> ---
>>
>> void getDoubleAVX(int *a);
>> void getDouble(int *a);
>>
>> int a[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
>> int main () {
>>  // The AVX call is appropriately guarded.
>>  if (__builtin_cpu_supports("avx"))
>>getDoubleAVX(a);
>>  else
>>getDouble(a);
>>  return a[0];
>> }
>>
>>
>> In the above code, function “getDoubleAVX” is only called when the
>> run-time CPU supports AVX instructions.  This code looks clean but
>> suffers from the COMDAT ODR violation.  Two copies of COMDAT function
>> “matrixDouble” are generated.  One copy is generated in object file
>> “avx.o” with AVX instructions and another copy exists in “non_avx.o”
>> without AVX instruction.  At link time, in a link order where object
>> file avx.o is seen ahead of  non_avx.o,  the COMDAT copy of function
>> “matrixDouble” that contains AVX instructions is kept leading to
>> SIGILL on unsupported platforms.  To reproduce the SIGILL,
>>
>>
>> $  g++ -c -O2 -mavx avx.cc
>> $ g++ -c -O2 -mno-avx non_avx.cc
>> $  g++ main.cc avx.o non_avx.o
>> $ ./a.out   # on a non-AVX machine
>> Illegal Instruction
>>
>>
>> To solve this, I propose introducing a new compiler option, say
>> -fodr-unsafe-comdats, to let the user tag objects that use specialized
>> options and let the linker choose the comdat candidate to be linked
>> wisely.  The root cause of the above problem is that comdat functions
>> in common headers may not be properly guarded and the linker picks the
>> first candidate it sees.  A link order where the object with the
>> specialized comdat functions appear first causes these comdats to be
>> picked leading to SIGILL on unsupported arches.  With the objects
>> tagged, the linker can be made to pick other comdat candidates when
>> possible.
>>
>> More details:
>>
>> This option is user specified when using arch specific options like
>> -m.  It is an indicator to the compiler that any comdat bodies
>> generated are potentially unsafe for execution.  Note that the COMDAT
>> bodies however have to be generated as there are no guarantees that
>> other modules will do so.  The compiler then emits a specially named
>> section, like “.gnu.odr.unsafe”, in the object file.  When the linker
>> tries to pick a COMDAT candidate from several choices, it must avoid
>> COMDAT copies from objects with sections named “.gnu.odr.unsafe” when
>> presented with a choice to pick a candidate from an object that does
>> not have the “.gnu.odr.unsafe” section.  Note that it may not be
>> possible to do that in which case the linker must pick the unsafe
>> copy, it could explicitly warn when this happens.
>>
>> Alternately,  the compiler can bind locally any emitted comdat version
>> from a specialized module, which could also be guarded by an option.
>> This will solve the problem but this may not be always possible
>> especially when addresses of any such comdat version is taken.
>
> Hm.  But which options are unsafe?  Also wouldn't it be better to simp

In general, any option that affects code gen and is
only *applied to a subset of modules* is potentially unsafe, as the
comdat copies generated from those modules are not identical to the
copies from other modules.  Tagging such modules with
-fodr-unsafe-comdats, even conserva

[committed] linear/lastprivate clause fixes

2015-05-19 Thread Jakub Jelinek
Hi!

When working on taskloop, I've noticed various issues in the OpenMP 4.0
handling of the linear/lastprivate (explicit as well as implicit) clauses.
Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
plan to backport to 5/4.9 after a while.

2015-05-19  Jakub Jelinek  

PR middle-end/66199
* tree.h (OMP_TEAMS_COMBINED): Define.
* gimplify.c (enum gimplify_omp_var_data): Add
GOVD_LINEAR_LASTPRIVATE_NO_OUTER.
(enum omp_region_type): Add ORT_COMBINED_TEAMS.
(omp_notice_variable): Accept both ORT_TEAMS
and ORT_COMBINED_TEAMS.  Don't recurse if
GOVD_LINEAR_LASTPRIVATE_NO_OUTER is set and either
GOVD_LINEAR is set, or GOVD_LASTPRIVATE without
GOVD_FIRSTPRIVATE.
(omp_no_lastprivate): New function.
(gimplify_scan_omp_clauses): For OMP_CLAUSE_LASTPRIVATE
and OMP_CLAUSE_LINEAR, if omp_no_lastprivate, don't
notice_outer and set appropriate bits, otherwise make
sure default(none) combined constructs won't complain.
(gimplify_adjust_omp_clauses): Remove OMP_CLAUSE_LINEAR
outer special casing, for OMP_CLAUSE_LASTPRIVATE if
omp_no_lastprivate either remove the clause or turn it
into OMP_CLAUSE_PRIVATE.
(gimplify_omp_for): Fix up handling of implicit
lastprivate or linear iterators.
(gimplify_omp_workshare): For OMP_TEAMS_COMBINED use
ORT_COMBINED_TEAMS.
* omp-low.c (lower_omp_for_lastprivate): For combined
for simd use fd.loop.n2 from the for rather than simd.
gcc/c/
* c-parser.c (c_parser_omp_for_loop): Don't add
OMP_CLAUSE_SHARED to OMP_PARALLEL_CLAUSES when moving
OMP_CLAUSE_LASTPRIVATE clause to OMP_FOR_CLAUSES.
(c_parser_omp_teams): Set OMP_TEAMS_COMBINED for combined
constructs.
gcc/cp/
* parser.c (cp_parser_omp_for_loop): Don't add
OMP_CLAUSE_SHARED to OMP_PARALLEL_CLAUSES when moving
OMP_CLAUSE_LASTPRIVATE clause to OMP_FOR_CLAUSES.
(cp_parser_omp_teams): Set OMP_TEAMS_COMBINED for combined
constructs.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_teams): Set OMP_TEAMS_COMBINED for
combined constructs.
(gfc_trans_omp_target): Make sure BIND_EXPR has non-NULL
BIND_EXPR_BLOCK.
libgomp/
* testsuite/libgomp.c/pr66199-1.c: New test.
* testsuite/libgomp.c/pr66199-2.c: New test.
* testsuite/libgomp.c++/pr66199-1.C: New test.
* testsuite/libgomp.c++/pr66199-2.C: New test.
* testsuite/libgomp.fortran/pr66199-1.f90: New test.
* testsuite/libgomp.fortran/pr66199-2.f90: New test.

--- gcc/tree.h.jj   2015-05-18 09:46:37.0 +0200
+++ gcc/tree.h  2015-05-18 15:07:16.029386340 +0200
@@ -1326,6 +1326,11 @@ extern void protected_set_expr_location
 #define OMP_PARALLEL_COMBINED(NODE) \
   (OMP_PARALLEL_CHECK (NODE)->base.private_flag)
 
+/* True on an OMP_TEAMS statement if it represents an explicit
+   combined teams distribute constructs.  */
+#define OMP_TEAMS_COMBINED(NODE) \
+  (OMP_TEAMS_CHECK (NODE)->base.private_flag)
+
 /* True if OMP_ATOMIC* is supposed to be sequentially consistent
as opposed to relaxed.  */
 #define OMP_ATOMIC_SEQ_CST(NODE) \
--- gcc/gimplify.c.jj   2015-05-13 18:57:44.0 +0200
+++ gcc/gimplify.c  2015-05-19 13:48:14.019466801 +0200
@@ -111,6 +111,9 @@ enum gimplify_omp_var_data
   /* Flag for GOVD_MAP: don't copy back.  */
   GOVD_MAP_TO_ONLY = 8192,
 
+  /* Flag for GOVD_LINEAR or GOVD_LASTPRIVATE: no outer reference.  */
+  GOVD_LINEAR_LASTPRIVATE_NO_OUTER = 16384,
+
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
   | GOVD_LOCAL)
@@ -126,6 +129,7 @@ enum omp_region_type
   ORT_TASK = 4,
   ORT_UNTIED_TASK = 5,
   ORT_TEAMS = 8,
+  ORT_COMBINED_TEAMS = 9,
   /* Data region.  */
   ORT_TARGET_DATA = 16,
   /* Data region with offloading.  */
@@ -5870,7 +5874,7 @@ omp_notice_variable (struct gimplify_omp
 DECL_NAME (lang_hooks.decls.omp_report_decl (decl)));
  error_at (ctx->location, "enclosing task");
}
- else if (ctx->region_type == ORT_TEAMS)
+ else if (ctx->region_type & ORT_TEAMS)
{
  error ("%qE not specified in enclosing teams construct",
 DECL_NAME (lang_hooks.decls.omp_report_decl (decl)));
@@ -5963,6 +5967,13 @@ omp_notice_variable (struct gimplify_omp
  need to propagate anything to an outer context.  */
   if ((flags & GOVD_PRIVATE) && !(flags & GOVD_PRIVATE_OUTER_REF))
 return ret;
+  if ((flags & (GOVD_LINEAR | GOVD_LINEAR_LASTPRIVATE_NO_OUTER))
+  == (GOVD_LINEAR | GOVD_LINEAR_LASTPRIVATE_NO_OUTER))
+return ret;
+  if ((flags & (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE
+   | GOVD_LINEAR_LASTPRIVATE_NO_OUTER))
+  == (GOVD_LASTP

[ping**2] Handle MULTILIB_REUSE in auto-generated SYSROOT_SUFFIX_SPEC macro

2015-05-19 Thread Sandra Loosemore

Re-pinging a patch from last year that never got reviewed:

https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00511.html

This problem still exists in GCC 5.1 and the above patch still fixes it. 
 I haven't tried mainline head yet, but it doesn't look like anything 
else has touched this since we branched GCC 5.


-Sandra



RE: Refactor gimple_expr_type

2015-05-19 Thread Aditya K



> Date: Tue, 19 May 2015 11:33:16 +0200
> Subject: Re: Refactor gimple_expr_type
> From: richard.guent...@gmail.com
> To: hiradi...@msn.com
> CC: tbsau...@tbsaunde.org; gcc-patches@gcc.gnu.org
>
> On Tue, May 19, 2015 at 12:04 AM, Aditya K  wrote:
>>
>>
>> 
>>> Date: Mon, 18 May 2015 12:08:58 +0200
>>> Subject: Re: Refactor gimple_expr_type
>>> From: richard.guent...@gmail.com
>>> To: hiradi...@msn.com
>>> CC: tbsau...@tbsaunde.org; gcc-patches@gcc.gnu.org
>>>
>>> On Sun, May 17, 2015 at 5:31 PM, Aditya K  wrote:


 
> Date: Sat, 16 May 2015 11:53:57 -0400
> From: tbsau...@tbsaunde.org
> To: hiradi...@msn.com
> CC: gcc-patches@gcc.gnu.org
> Subject: Re: Refactor gimple_expr_type
>
> On Fri, May 15, 2015 at 07:13:35AM +, Aditya K wrote:
>> Hi,
>> I have tried to refactor gimple_expr_type to make it more readable. 
>> Removed the switch block and redundant if.
>>
>> Please review this patch.
>
> for some reason your mail client seems to be inserting non breaking
> spaces all over the place. Please either configure it to not do that,
> or use git send-email for patches.

 Please see the updated patch.
>>>
>>> Ok if this passed bootstrap and regtest. (I wish if gimple_expr_type
>>> didn't exist btw...)
>>
>> Thanks for the review. Do you have any suggestions on how to remove 
>> gimple_expr_type. Are there any alternatives to it?
>> I can look into refactoring more (if it is not too complicated) since I'm 
>> already doing this.
>
> Look at each caller - usually they should be fine with using TREE_TYPE
> (gimple_get_lhs ()) (or a more specific one
> dependent on what stmts are expected at the place). You might want to
> first refactor the code
>
> else if (code == GIMPLE_COND)
> gcc_unreachable ();
>
> and deal with the fallout in callers (similar for the void_type_node return).

Thanks for the suggestions. I looked at the use cases; there are 47 usages in 
different files. That would be a lot of changes, I assume, and would take some 
time.
This patch passes bootstrap and make check (although I'm not very confident 
that my way of running make check ran all the regtests).

If this patch is okay to merge please do that. I'll continue working on 
removing gimple_expr_type.

Thanks,
-Aditya


>
> Richard.
>
>
>> -Aditya
>>
>>>
>>> Thanks,
>>> Richard.
>>>
 gcc/ChangeLog:

 2015-05-15 hiraditya 

 * gimple.h (gimple_expr_type): Refactor to make it concise. Remove 
 redundant if.

 diff --git a/gcc/gimple.h b/gcc/gimple.h
 index 95e4fc8..3a83e8f 100644
 --- a/gcc/gimple.h
 +++ b/gcc/gimple.h
 @@ -5717,36 +5717,26 @@ static inline tree
 gimple_expr_type (const_gimple stmt)
 {
 enum gimple_code code = gimple_code (stmt);
 -
 - if (code == GIMPLE_ASSIGN || code == GIMPLE_CALL)
 + /* In general we want to pass out a type that can be substituted
 + for both the RHS and the LHS types if there is a possibly
 + useless conversion involved. That means returning the
 + original RHS type as far as we can reconstruct it. */
 + if (code == GIMPLE_CALL)
 {
 - tree type;
 - /* In general we want to pass out a type that can be substituted
 - for both the RHS and the LHS types if there is a possibly
 - useless conversion involved. That means returning the
 - original RHS type as far as we can reconstruct it. */
 - if (code == GIMPLE_CALL)
 - {
 - const gcall *call_stmt = as_a  (stmt);
 - if (gimple_call_internal_p (call_stmt)
 - && gimple_call_internal_fn (call_stmt) == IFN_MASK_STORE)
 - type = TREE_TYPE (gimple_call_arg (call_stmt, 3));
 - else
 - type = gimple_call_return_type (call_stmt);
 - }
 + const gcall *call_stmt = as_a  (stmt);
 + if (gimple_call_internal_p (call_stmt)
 + && gimple_call_internal_fn (call_stmt) == IFN_MASK_STORE)
 + return TREE_TYPE (gimple_call_arg (call_stmt, 3));
 + else
 + return gimple_call_return_type (call_stmt);
 + }
 + else if (code == GIMPLE_ASSIGN)
 + {
 + if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
 + return TREE_TYPE (gimple_assign_rhs1 (stmt));
 else
 - switch (gimple_assign_rhs_code (stmt))
 - {
 - case POINTER_PLUS_EXPR:
 - type = TREE_TYPE (gimple_assign_rhs1 (stmt));
 - break;
 -
 - default:
 - /* As fallback use the type of the LHS. */
 - type = TREE_TYPE (gimple_get_lhs (stmt));
 - break;
 - }
 - return type;
 + /* As fallback use the type of the LHS. */
 + return TREE_TYPE (gimple_get_lhs (stmt));
 }
 else if (code == GIMPLE_COND)
 return boolean_type_node;


 Thanks,
 -Aditya





>
>>
>> Thanks,
>> -Aditya
>>
>>
>> gcc/ChangeLog:
>>
>> 2015-05-1

Fix PR48052: loop not vectorized if index is "unsigned int"

2015-05-19 Thread Aditya K
W.r.t. PR48052, here is a patch which determines whether a scev would wrap.
The patch symbolically evaluates whether valid_niter >= loop->nb_iterations is 
true. In that case the scev would not wrap (??).
Currently, we only look for two special 'patterns', which are sufficient to 
analyze the simple test cases.

valid_niter = ~s (= UINT_MAX - s)
We have to prove that valid_niter>= loop->nb_iterations

Pattern1 loop->nb_iterations: s>= e ? s - e : 0
Pattern2 loop->nb_iterations: (e - s) -1

In the first case we prove that valid_niter>= loop->nb_iterations in both the 
cases i.e., when s>=e and when not.
In the second case we prove valid_niter>= loop->nb_iterations, by simple 
analysis that  UINT_MAX>= e is true in all cases.

I haven't tested this patch completely. I'm looking for feedback and any scope 
for improvement.
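
The Pattern1 argument can be checked by brute force on a narrow unsigned type. The sketch below is only an illustration, not part of the patch: it assumes the form given in the comment above fold_binary_cond_p (nb_iterations = s <= e ? e - s : 0), uses 8-bit values so every pair can be enumerated, and the function name is made up for this example.

```c
#include <stdint.h>

/* Illustrative only: returns 1 if ~s >= (s <= e ? e - s : 0) holds for
   every pair of 8-bit unsigned values, and 0 as soon as a pair fails.  */
int pattern1_holds_for_all_u8(void)
{
    for (unsigned s = 0; s <= UINT8_MAX; s++)
        for (unsigned e = 0; e <= UINT8_MAX; e++) {
            /* valid_niter = ~s, i.e. UINT8_MAX - s in 8 bits.  */
            uint8_t valid_niter = (uint8_t)~(uint8_t)s;
            /* nb_iterations as in Pattern1.  */
            uint8_t nb_iterations = (s <= e) ? (uint8_t)(e - s) : 0;
            if (valid_niter < nb_iterations)
                return 0;
        }
    return 1;
}
```

The check succeeds for the same reason as the symbolic proof: when s <= e, UINT8_MAX - s >= e - s reduces to UINT8_MAX >= e, and when s > e the right-hand side is 0.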


hth,
-Aditya



Vectorize loops which have a typecast.

2015-05-19  hiraditya  

    * gcc.dg/vect/pr48052.c: New test.

gcc/ChangeLog:

2015-05-19  hiraditya  

    * tree-ssa-loop-niter.c (fold_binary_cond_p): Fold a conditional
    operation when additional constraints are available.
    (fold_binary_minus_p): Fold subtraction operations of the form
    (A - B - 1) when additional constraints are available.
    (scev_probably_wraps_p): Use the above two functions to find whether
    valid_niter >= loop->nb_iterations.


diff --git a/gcc/testsuite/gcc.dg/vect/pr48052.c b/gcc/testsuite/gcc.dg/vect/pr48052.c
new file mode 100644
index 000..8e406d7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr48052.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
+int foo(int* A, int* B,  unsigned start, unsigned BS)
+{
+  int s;
+  for (unsigned k = start;  k < start + BS; k++)
+    {
+  s += A[k] * B[k];
+    }
+
+  return s;
+}
+
+int bar(int* A, int* B, unsigned BS)
+{
+  int s;
+  for (unsigned k = 0;  k < BS; k++)
+    {
+  s += A[k] * B[k];
+    }
+
+  return s;
+}
+
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 042f8df..ddc00cc 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -3773,6 +3773,117 @@ nowrap_type_p (tree type)
   return false;
 }
 
+/* Return true when op0>= op1.
+   For example:
+   Where, op0 = ~start_3(D);
+   op1 = start_3(D) <= stop_6(D) ? stop_6(D) - start_3(D) : 0;
+   In this case op0 = UINT_MAX - start_3(D);
+   So, op0>= op1 in all cases because UINT_MAX>= stop_6(D),
+   when TREE_TYPE(stop_6(D)) == unsigned int;  */
+bool
+fold_binary_cond_p (enum tree_code code, tree type, tree op0, tree op1)
+{
+  gcc_assert (type == boolean_type_node);
+
+  if (TREE_TYPE (op0) != TREE_TYPE (op1))
+    return false;
+
+  /* TODO: Handle other operations.  */
+  if (code != GE_EXPR)
+    return false;
+  // The type of op0 and op1 should be unsigned.
+  if (!TYPE_UNSIGNED (TREE_TYPE(op0)))
+    return false;
+  if ((TREE_CODE (op0) != BIT_NOT_EXPR) || (TREE_CODE (op1) != COND_EXPR))
+    return false;
+
+  /* We have to show that in both the cases,
+ (when cond is true and when cond is false) op (op0, op1) is true.  */
+   tree neg_op0 = TREE_OPERAND (op0, 0);
+   tree cond_op1 = TREE_OPERAND (op1, 0);
+   tree true_op1 = TREE_OPERAND (op1, 1);
+   tree false_op1 = TREE_OPERAND (op1, 2);
+   gcc_assert(neg_op0 && cond_op1 && true_op1 && false_op1);
+
+  /* When cond is false. Evaluate op (op0, false_op1).  */
+  tree running_exp = fold_binary (code, boolean_type_node, op0, false_op1);
+  if (running_exp == NULL || integer_zerop (running_exp))
+    /* TODO: Handle more cases here. */
+    return false;
+
+  /* When cond is true. Evaluate op (op0, true_op1).  */
+  running_exp = fold_binary (code, boolean_type_node, op0, true_op1);
+  if (running_exp != NULL && integer_nonzerop (running_exp))
+    return true;
+
+  tree smaller, bigger;
+  if (TREE_CODE (cond_op1) == LE_EXPR)
+    {
+  smaller = TREE_OPERAND (cond_op1, 0);
+  bigger = TREE_OPERAND (cond_op1, 1);
+    } else return false;
+
+  if (TREE_CODE (true_op1) == MINUS_EXPR)
+    {
+  tree minuend = TREE_OPERAND (true_op1, 0);
+  tree subtrahend = TREE_OPERAND (true_op1, 1);
+  if (subtrahend == neg_op0 && subtrahend == smaller && minuend == bigger)
+    {
+  tree extreme = upper_bound_in_type (TREE_TYPE (neg_op0),
+  TREE_TYPE (neg_op0));
+  running_exp = fold_binary (code, boolean_type_node, extreme, minuend);
+  return running_exp != NULL && integer_nonzerop(running_exp);
+    } else return false;
+    } else return false;
+}
+
+/* Return true when op0>= op1 and
+   op0 is ~start3(D) or, UINT_MAX - start3(D)
+   op1 is (_21 - start_3(D)) - 1; */
+bool
+fold_binary_minus_p (enum tree_code code, tree type, tree op0, tree op1)
+{
+  gcc_assert (type == boolean_type_node);
+
+  if (TREE_TYPE (op0

Re: [RFC] COMDAT Safe Module Level Multi versioning

2015-05-19 Thread Xinliang David Li
>
> Hm.  But which options are unsafe?  Also wouldn't it be better to simply
> _not_ have unsafe options produce comdats but always make local clones
> for them (thus emit the comdat with "unsafe" flags dropped)?

Always localizing comdat functions may lead to a text size increase. It
also does not work if, for instance, the comdat function is a virtual
function.

David


>
> Richard.
>
>>
>> Thanks
>> Sri


Re: [Patch ARM-AArch64/testsuite 00/13] Neon intrinsics executable tests

2015-05-19 Thread Christophe Lyon
On 19 May 2015 at 15:32, James Greenhalgh  wrote:
> On Tue, May 12, 2015 at 09:30:48PM +0100, Christophe Lyon wrote:
>> This patch series is a follow-up to the tests I already contributed,
>> converted from my original testsuite.
>>
>> This series consists in 13 new files, which can be committed
>> independently.
>>
>> Another series (hopefully final) will follow.
>>
>> Tested with qemu on arm*linux, aarch64-linux. I couldn't test on 
>> aarch64_be-none-elf because my build is currently broken (see PR 66018).
>>
>> 2015-05-12  Christophe Lyon  
>>
>>   * gcc.target/aarch64/neon-intrinsics/vqmovn.c: New file.
>>   * gcc.target/aarch64/neon-intrinsics/vqmovun.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh_lane.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrshl.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrshrn_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrshrun_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqshl.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqshl_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqshlu_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqshrn_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqshrun_n.c: Likewise.
>
> Hi Christophe,
>
> This patch set looks good to me. The patch set is OK, please apply it.
>
> One small nit, could you run through and check the alignment of the
> trailing \ in some of the macro definitions? It might be the mail
> clients, but I see (for example):
>
> +  /* Basic test: v2=vqshlu_n(v1,v), then store the result.  */
> +#define TEST_VQSHLU_N2(INSN, Q, T1, T2, T3, T4, W, N, V, 
> EXPECTED_CUMULATIVE_SAT, CMT) \
> +  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T3, W, N));  \
> +  VECT_VAR(vector_res, T3, W, N) = \
> +INSN##Q##_n_##T2##W(VECT_VAR(vector, T1, W, N),\
> +   V); \
> +  vst1##Q##_##T4##W(VECT_VAR(result, T3, W, N),\
> +   VECT_VAR(vector_res, T3, W, N));\
> +  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
> +
>
> in patch 11/13.

Done, the trailing \ is now further to the right in some of the patches.

>
> Also, if you could look out for aarch64_be fallout once the build
> starts going, that would be great.
>
Sure, my automatic validations will take care of that.

Thanks

Christophe.

> Thanks,
> James
>
>>
>> Christophe Lyon (13):
>>   Add vqmovn tests.
>>   Add vqmovun tests.
>>   Add vqrdmulh tests.
>>   Add vqrdmulh_lane tests.
>>   Add vqrdmulh_n tests.
>>   Add vqrshl tests.
>>   Add vqrshrn_n tests.
>>   Add vqrshrun_n tests.
>>   Add vqshl tests.
>>   Add vqshl_n tests.
>>   Add vqshlu_n tests.
>>   Add vqshrn_n tests.
>>   Add vqshrun_n tests.
>>
>>  .../gcc.target/aarch64/advsimd-intrinsics/vqmovn.c |  134 +++
>>  .../aarch64/advsimd-intrinsics/vqmovun.c   |   93 ++
>>  .../aarch64/advsimd-intrinsics/vqrdmulh.c  |  161 +++
>>  .../aarch64/advsimd-intrinsics/vqrdmulh_lane.c |  169 +++
>>  .../aarch64/advsimd-intrinsics/vqrdmulh_n.c|  155 +++
>>  .../gcc.target/aarch64/advsimd-intrinsics/vqrshl.c | 1090 
>> 
>>  .../aarch64/advsimd-intrinsics/vqrshrn_n.c |  174 
>>  .../aarch64/advsimd-intrinsics/vqrshrun_n.c|  189 
>>  .../gcc.target/aarch64/advsimd-intrinsics/vqshl.c  |  829 +++
>>  .../aarch64/advsimd-intrinsics/vqshl_n.c   |  234 +
>>  .../aarch64/advsimd-intrinsics/vqshlu_n.c  |  263 +
>>  .../aarch64/advsimd-intrinsics/vqshrn_n.c  |  177 
>>  .../aarch64/advsimd-intrinsics/vqshrun_n.c |  133 +++
>>  13 files changed, 3801 insertions(+)
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqmovn.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqmovun.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh_lane.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh_n.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshl.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrn_n.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrun_n.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshl.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshl_n.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshlu_n.c
>>  create mode 100644 
>> gcc/testsuite/g

Re: [Patch AArch64] Add cpu_defines.h for AArch64.

2015-05-19 Thread Ramana Radhakrishnan
On Tue, May 19, 2015 at 4:54 PM,   wrote:
>
>
>
>
>> On May 19, 2015, at 5:54 AM, Ramana Radhakrishnan 
>>  wrote:
>>
>> Hi,
>>
>> Like the ARM port, the AArch64 ports needs to set glibc_integral_traps to 
>> false as integer divide instructions do not trap.
>>
>> Bootstrapped and regression tested on aarch64-none-linux-gnu
>>
>> Ok to apply ?
>
> Not really questioning your patch but questioning libstdc++'s defaults.
>  I wonder if this should be the default as most targets don't trap, only a 
> few that does. And it should be safer default to say they don't trap too?


How about we #error out if targets do *not* define some of these
defaults in libstdc++? There are far more ports with weak memory
models, and the defaults for _GLIBCXX_READ/WRITE_BARRIER also appear
unsafe. I was toying with a patch like that to force targets to
define this sort of thing, but I need to read more of configure.host
before I make up my mind.
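
A minimal sketch of what such a check could look like, written in C for illustration. `__glibcxx_integral_traps` is the macro libstdc++ consults, but where the check would live, the wording of the error, and the accessor function are assumptions made for this example, not existing libstdc++ code.

```c
#include <stdbool.h>

/* What a target's cpu_defines.h would provide (illustrative value; a
   real header defines this reserved name on the implementation's behalf).  */
#define __glibcxx_integral_traps false

/* Hypothetical generic check: refuse to build rather than silently fall
   back to a default that may be wrong for the target.  */
#ifndef __glibcxx_integral_traps
# error "target must define __glibcxx_integral_traps in cpu_defines.h"
#endif

/* Hypothetical accessor so the selected value can be inspected.  */
bool target_integral_traps(void)
{
    return __glibcxx_integral_traps;
}
```

With a check of this kind, a port that forgets the definition fails at build time instead of inheriting a default.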



regards
Ramana


>
> Thanks,
> Andrew
>
>
>>
>> regards
>> Ramana
>>
>>
>> 2015-05-17  Ramana Radhakrishnan  
>>
>>* configure.host: Define cpu_defines_dir for AArch64
>>* config/cpu/aarch64/cpu_defines.h: New file.
>> <0002-Do-the-same-for-AArch64.patch>


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Michael Matz
Hi,

On Tue, 19 May 2015, Jeff Law wrote:

> > > Forget lazy binding. It's dead anyway because serious distros want 
> > > PIE+relro+bindnow+...
> > 
> > You keep saying this, but I can't help the feeling it's mostly because 
> > musl doesn't support it ;-)
> 
> FWIW, Red Hat is pushing PIE & partial RELRO deeper and deeper into the 
> distribution.

Yeah, us as well, though I don't necessarily see the point for most 
packages; feels a bit like a checkmark item :)


Ciao,
Michael.


Re: [Patch AArch64] Add cpu_defines.h for AArch64.

2015-05-19 Thread pinskia




> On May 19, 2015, at 5:54 AM, Ramana Radhakrishnan 
>  wrote:
> 
> Hi,
> 
> Like the ARM port, the AArch64 ports needs to set glibc_integral_traps to 
> false as integer divide instructions do not trap.
> 
> Bootstrapped and regression tested on aarch64-none-linux-gnu
> 
> Ok to apply ?

Not really questioning your patch but questioning libstdc++'s defaults. I 
wonder if this should be the default, as most targets don't trap and only a 
few do. And wouldn't it be a safer default to say they don't trap?

Thanks,
Andrew


> 
> regards
> Ramana
> 
> 
> 2015-05-17  Ramana Radhakrishnan  
> 
>* configure.host: Define cpu_defines_dir for AArch64
>* config/cpu/aarch64/cpu_defines.h: New file.
> <0002-Do-the-same-for-AArch64.patch>


Re: breakage with series "[0/9] Record number of hard registers in a REG"

2015-05-19 Thread Richard Sandiford
Hans-Peter Nilsson  writes:
> g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions
> -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings
> -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic
> -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common
> -DHAVE_CONFIG_H -I. -I. -I/tmp/hpautotest-gcc1/gcc/gcc
> -I/tmp/hpautotest-gcc1/gcc/gcc/. -I/tmp/hpautotest-gcc1/gcc/gcc/../include
> -I/tmp/hpautotest-gcc1/gcc/gcc/../libcpp/include
> -I/tmp/hpautotest-gcc1/cris-elf/gccobj/./gmp
> -I/tmp/hpautotest-gcc1/gcc/gmp
> -I/tmp/hpautotest-gcc1/cris-elf/gccobj/./mpfr
> -I/tmp/hpautotest-gcc1/gcc/mpfr -I/tmp/hpautotest-gcc1/gcc/mpc/src
> -I/tmp/hpautotest-gcc1/gcc/gcc/../libdecnumber
> -I/tmp/hpautotest-gcc1/gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> -I/tmp/hpautotest-gcc1/gcc/gcc/../libbacktrace -o cris.o -MT cris.o -MMD
> -MP -MF ./.deps/cris.TPo /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c
> In file included from /tmp/hpautotest-gcc1/gcc/gcc/rtl.h:25,
>  from /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:25:
> /tmp/hpautotest-gcc1/gcc/gcc/input.h:37: warning: comparison between
> signed and unsigned integer expressions
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'void
> cris_expand_prologue()':
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3141: error:
> 'gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3165: error:
> 'gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3263: error:
> 'gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'void
> cris_expand_epilogue()':
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3429: error:
> 'gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3515: error:
> 'gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3548: error:
> 'gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3573: error:
> 'gen_rtx_raw_REG' was not declared in this scope
> make[2]: *** [cris.o] Error 1

Installed as obvious after testing that cris-elf, microblaze-elf
and sparc-linux-gnu now build.  Obviously I mustn't have used
the usual recursive grep.

Thanks,
Richard


gcc/
* config/cris/cris.c (cris_expand_prologue): Use gen_raw_REG
instead of gen_rtx_raw_REG.
(cris_expand_epilogue): Likewise.
* config/microblaze/microblaze.c (microblaze_classify_address):
Likewise.
* config/sparc/sparc.md: Likewise.

Index: gcc/config/cris/cris.c
===
--- gcc/config/cris/cris.c  2015-05-19 16:40:34.734003511 +0100
+++ gcc/config/cris/cris.c  2015-05-19 16:40:34.890001679 +0100
@@ -3138,7 +3138,7 @@ cris_expand_prologue (void)
 
  mem = gen_rtx_MEM (SImode, stack_pointer_rtx);
  set_mem_alias_set (mem, get_varargs_alias_set ());
- insn = emit_move_insn (mem, gen_rtx_raw_REG (SImode, regno));
+ insn = emit_move_insn (mem, gen_raw_REG (SImode, regno));
 
  /* Note the absence of RTX_FRAME_RELATED_P on the above insn:
 the value isn't restored, so we don't want to tell dwarf2
@@ -3162,7 +3162,7 @@ cris_expand_prologue (void)
 
   mem = gen_rtx_MEM (SImode, stack_pointer_rtx);
   set_mem_alias_set (mem, get_frame_alias_set ());
-  insn = emit_move_insn (mem, gen_rtx_raw_REG (SImode, CRIS_SRP_REGNUM));
+  insn = emit_move_insn (mem, gen_raw_REG (SImode, CRIS_SRP_REGNUM));
   RTX_FRAME_RELATED_P (insn) = 1;
   framesize += 4;
 }
@@ -3260,7 +3260,7 @@ cris_expand_prologue (void)
 
  mem = gen_rtx_MEM (SImode, stack_pointer_rtx);
  set_mem_alias_set (mem, get_frame_alias_set ());
- insn = emit_move_insn (mem, gen_rtx_raw_REG (SImode, regno));
+ insn = emit_move_insn (mem, gen_raw_REG (SImode, regno));
  RTX_FRAME_RELATED_P (insn) = 1;
 
  framesize += 4 + size;
@@ -3426,7 +3426,7 @@ cris_expand_epilogue (void)
mem = gen_rtx_MEM (SImode, gen_rtx_POST_INC (SImode,
 stack_pointer_rtx));
set_mem_alias_set (mem, get_frame_alias_set ());
-   insn = emit_move_insn (gen_rtx_raw_REG (SImode, regno), mem);
+   insn = emit_move_insn (gen_raw_REG (SImode, regno), mem);
 
/* Whenever we emit insns with post-incremented addresses
   ourselves, we must add a post-inc note manually.  */
@@ -3512,7 +3512,7 @@ cris_expand_epilogue (void)
{
  rtx mem;
  rtx insn;
- rtx srpreg = gen_rtx_raw_REG (SImode, CRIS_SRP_REGNUM);
+ rtx srpreg = gen_raw_REG (SImode, CRIS_SRP_REGNUM);
  mem = gen_rtx_MEM (SImode,
 gen_rtx_POST_INC (SImode,

[Patch] [AArch64] PR target 66049: fix add/extend gcc test suite failures

2015-05-19 Thread Kumar, Venkataramanan
Hi Maintainers, 

Please find attached a patch that fixes the add/extend GCC test suite
failures on the AArch64 target.
Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049

These tests started to fail after we prevented the combiner from converting
shift RTX to mult RTX when the RTX is not inside a memory operation (r222874).
I have now added new add/extend patterns, based on shift operations, to fix
these cases.
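
For illustration, the kind of source that exercises a flag-setting add with a shifted operand looks roughly like the sketch below. It is written in the spirit of the adds/subs tests listed further down but is not copied from them, and the function name is made up. Whether a given compiler emits a single `adds ..., lsl 3` for it depends on the GCC version and options used.

```c
/* Illustrative only: the sum a + (b << 3) is both used as a value and
   compared against zero, which is the shape a combined "adds ... lsl 3"
   pattern is meant to match.  Unsigned operands keep the shift well
   defined for all inputs.  */
unsigned add_shifted_and_test(unsigned a, unsigned b, unsigned c)
{
    unsigned d = a + (b << 3);
    if (d == 0)
        return a + c;
    return b + d + c;
}
```

For example, `add_shifted_and_test(1, 2, 3)` computes d = 1 + 16 = 17 and returns 2 + 17 + 3 = 22.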

Testing status with the patch.

(1) GCC bootstrap on AArch64 successful. 
(2)  SPEC2006 INT runs did not show any degradation.
(3) gcc regression testing passed.

(-Snip-)
# Comparing 3 common sum files
## /bin/sh ./gcc-fsf-trunk/contrib/compare_tests  /tmp/gxx-sum1.24998 
/tmp/gxx-sum2.24998
Tests that now work, but didn't before:

gcc.target/aarch64/adds1.c scan-assembler adds\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
gcc.target/aarch64/adds1.c scan-assembler adds\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
gcc.target/aarch64/adds3.c scan-assembler-times adds\tx[0-9]+, x[0-9]+, 
x[0-9]+, sxtw 2
gcc.target/aarch64/extend.c scan-assembler add\tw[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler add\tx[0-9]+,.*uxtw #?3
gcc.target/aarch64/extend.c scan-assembler sub\tw[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxtw #?3
gcc.target/aarch64/subs1.c scan-assembler subs\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
gcc.target/aarch64/subs1.c scan-assembler subs\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
gcc.target/aarch64/subs3.c scan-assembler-times subs\tx[0-9]+, x[0-9]+, 
x[0-9]+, sxtw 2

# No differences found in 3 common sum files  
(-Snip-)

The patterns fix the regressing tests, so I have not added any new tests.
I am planning to remove the old patterns based on "mults" as separate work.

Is this OK for trunk ? 

gcc/ChangeLog 

2015-05-19  Venkataramanan Kumar  

* config/aarch64/aarch64.md
(*adds_shift_imm_):  New pattern.
(*subs_shift_imm_):  Likewise.
(*adds__shift_):  Likewise.
(*subs__shift_): Likewise.
(*add_uxt_shift2): Likewise.
(*add_uxtsi_shift2_uxtw): Likewise.
   (*sub_uxt_shift2): Likewise.
   (*sub_uxtsi_shift2_uxtw): Likewise.


Regards,
Venkat.
  


 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1dbadc0..d0d6a6a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1539,6 +1539,38 @@
   [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
 )
 
+(define_insn "*adds_shift_imm_"
+  [(set (reg:CC_NZ CC_REGNUM)
+   (compare:CC_NZ
+(plus:GPI (ASHIFT:GPI 
+   (match_operand:GPI 1 "register_operand" "r")
+   (match_operand:QI 2 "aarch64_shift_imm_" "n"))
+  (match_operand:GPI 3 "register_operand" "r"))
+(const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=r")
+   (plus:GPI (ASHIFT:GPI (match_dup 1) (match_dup 2))
+ (match_dup 3)))]
+  ""
+  "adds\\t%0, %3, %1,  %2"
+  [(set_attr "type" "alus_shift_imm")]
+)
+
+(define_insn "*subs_shift_imm_"
+  [(set (reg:CC_NZ CC_REGNUM)
+   (compare:CC_NZ
+(minus:GPI (match_operand:GPI 1 "register_operand" "r")
+   (ASHIFT:GPI
+(match_operand:GPI 2 "register_operand" "r")
+(match_operand:QI 3 "aarch64_shift_imm_" "n")))
+(const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=r")
+   (minus:GPI (match_dup 1)
+  (ASHIFT:GPI (match_dup 2) (match_dup 3))))]
+  ""
+  "subs\\t%0, %1, %2,  %3"
+  [(set_attr "type" "alus_shift_imm")]
+)
+
 (define_insn "*adds_mul_imm_"
   [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ
@@ -1599,6 +1631,42 @@
   [(set_attr "type" "alus_ext")]
 )
 
+(define_insn "*adds__shift_"
+  [(set (reg:CC_NZ CC_REGNUM)
+   (compare:CC_NZ
+(plus:GPI (ashift:GPI 
+   (ANY_EXTEND:GPI 
+(match_operand:ALLX 1 "register_operand" "r"))
+   (match_operand 2 "aarch64_imm3" "Ui3"))
+  (match_operand:GPI 3 "register_operand" "r"))
+(const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=rk")
+   (plus:GPI (ashift:GPI (ANY_EXTEND:GPI (match_dup 1))
+ (match_dup 2))
+ (match_dup 3)))]
+  ""
+  "adds\\t%0, %3, %1, xt %2"
+  [(set_attr "type" "alus_ext")]
+)
+
+(define_insn "*subs__shift_"
+  [(set (reg:CC_NZ CC_REGNUM)
+   (compare:CC_NZ
+(minus:GPI (match_operand:GPI 1 "register_operand" "r")
+   (ashift:GPI 
+(ANY_EXTEND:GPI
+ (match_operand:ALLX 2 "register_operand" "r"))
+(match_operand 3 "aarch64_imm3" "Ui3")))
+(const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=rk")
+   (minus:GPI (match_dup 1)
+  (ashift:GPI 

Re: [Patch] [AArch64] PR target 66049: fix add/extend gcc test suite failures

2015-05-19 Thread Kyrill Tkachov

Hi Venkat,

On 19/05/15 16:37, Kumar, Venkataramanan wrote:

Hi Maintainers,

Please find the attached patch, that fixes add/extend gcc test suite failures 
in Aarch64 target.
Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049

These tests started to fail after we prevented combiner from converting shift 
RTX to mult RTX, when the RTX is not inside a memory operation (r222874) .
Now I have added new add/extend patterns which are based on shift operations,  
to fix these cases.

Testing status with the patch.

(1) GCC bootstrap on AArch64 successful.
(2)  SPEC2006 INT runs did not show any degradation.


Does that mean there was no performance regression? Or no codegen difference?
What I'd expect from this patch is that the codegen would be the same as
before the combine patch (r222874). A performance difference can sometimes be
hard to measure even with worse code quality.
Can you please confirm that on SPEC2006 INT the adds and shifts are now back
to being combined into a single instruction?

Thanks,
Kyrill


(3) gcc regression testing passed.

(-Snip-)
# Comparing 3 common sum files
## /bin/sh ./gcc-fsf-trunk/contrib/compare_tests  /tmp/gxx-sum1.24998 
/tmp/gxx-sum2.24998
Tests that now work, but didn't before:

gcc.target/aarch64/adds1.c scan-assembler adds\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
gcc.target/aarch64/adds1.c scan-assembler adds\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
gcc.target/aarch64/adds3.c scan-assembler-times adds\tx[0-9]+, x[0-9]+, 
x[0-9]+, sxtw 2
gcc.target/aarch64/extend.c scan-assembler add\tw[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler add\tx[0-9]+,.*uxtw #?3
gcc.target/aarch64/extend.c scan-assembler sub\tw[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxtw #?3
gcc.target/aarch64/subs1.c scan-assembler subs\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
gcc.target/aarch64/subs1.c scan-assembler subs\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
gcc.target/aarch64/subs3.c scan-assembler-times subs\tx[0-9]+, x[0-9]+, 
x[0-9]+, sxtw 2

# No differences found in 3 common sum files
(-Snip-)

The patterns are fixing the regressing tests, so I have not added any new tests.
Regarding  removal of the old patterns based on "mults",  I am planning to do 
it as a separate work.

Is this OK for trunk ?

gcc/ChangeLog

2015-05-19  Venkataramanan Kumar  

 * config/aarch64/aarch64.md
 (*adds_shift_imm_):  New pattern.
 (*subs_shift_imm_):  Likewise.
 (*adds__shift_):  Likewise.
 (*subs__shift_): Likewise.
 (*add_uxt_shift2): Likewise.
 (*add_uxtsi_shift2_uxtw): Likewise.
(*sub_uxt_shift2): Likewise.
(*sub_uxtsi_shift2_uxtw): Likewise.


Regards,
Venkat.
   



  




Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 8:33 AM,   wrote:
>
> 
> From: gcc-patches-ow...@gcc.gnu.org [gcc-patches-ow...@gcc.gnu.org] on behalf 
> of H.J. Lu [hjl.to...@gmail.com]
> Sent: Tuesday, May 19, 2015 11:27 AM
> To: Joseph Myers
> Cc: Magnus Granberg; GCC Patches
> Subject: Re: PING^3: [PATCH]: New configure options that make the compiler 
> use -fPIE and -pie as default option
>
> On Tue, May 19, 2015 at 8:21 AM, Joseph Myers  wrote:
>> ...
>> I think the whole thing should be posted as one patch, with both the
>> target-independent changes and the target-specific changes for all
>> targets.
>>
>
> That is what makes me concerned.  I have some simple target-specified
> patches which weren't reviewed for years. What will happen if no one
> reviews some simple target-specified changes due to
>
> 1. Reviewers don't have access to those targets.
> 2. Target maintainers aren't review them.
> 3. There are no clear maintainers for those targets.
>
> As the result, my patch may go nowhere.
> ---
>
> But that hasn't stopped others from posting patches like that, or getting 
> them approved.  And we also have global maintainers who can approve things.  
> It feels a bit like a hypothetical issue is being used as a reason to do part 
> of the job.

It is not hypothetical.  See:

https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01558.html

It happened before.  I don't want to make it harder for global maintainers
to review a patch which has zero-impact on a target if --enable-default-pie
isn't used.
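
As an aside, one way to see whether a given compiler build produces PIE code by default is to inspect the predefined macro `__PIE__`, which GCC documents as 1 for -fpie and 2 for -fPIE. The sketch below is only an illustration; the function name is made up, and it reports how this translation unit was compiled, not how the final executable was linked.

```c
/* Illustrative only: returns 0 when not compiled as PIE, otherwise the
   value of __PIE__ (1 for -fpie, 2 for -fPIE per GCC's documentation).  */
int pie_level(void)
{
#ifdef __PIE__
    return __PIE__;
#else
    return 0;
#endif
}
```

Compiling this with no PIE-related flags and getting a nonzero result would suggest the compiler enables PIE code generation by default.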

-- 
H.J.


RE: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread Paul_Koning


From: gcc-patches-ow...@gcc.gnu.org [gcc-patches-ow...@gcc.gnu.org] on behalf 
of H.J. Lu [hjl.to...@gmail.com]
Sent: Tuesday, May 19, 2015 11:27 AM
To: Joseph Myers
Cc: Magnus Granberg; GCC Patches
Subject: Re: PING^3: [PATCH]: New configure options that make the compiler use 
-fPIE and -pie as default option

On Tue, May 19, 2015 at 8:21 AM, Joseph Myers  wrote:
> ...
> I think the whole thing should be posted as one patch, with both the
> target-independent changes and the target-specific changes for all
> targets.
>

That is what makes me concerned.  I have some simple target-specified
patches which weren't reviewed for years. What will happen if no one
reviews some simple target-specified changes due to

1. Reviewers don't have access to those targets.
2. Target maintainers aren't review them.
3. There are no clear maintainers for those targets.

As the result, my patch may go nowhere.
---

But that hasn't stopped others from posting patches like that, or getting them 
approved.  And we also have global maintainers who can approve things.  It 
feels a bit like a hypothetical issue is being used as a reason to do part of 
the job.


paul


Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread Joseph Myers
On Tue, 19 May 2015, H.J. Lu wrote:

> > I think the whole thing should be posted as one patch, with both the
> > target-independent changes and the target-specific changes for all
> > targets.
> >
> 
> That is what makes me concerned.  I have some simple target-specified
> patches which weren't reviewed for years. What will happen if no one

For any unreviewed patch, keep pinging weekly.

> reviews some simple target-specified changes due to
> 
> 1. Reviewers don't have access to those targets.
> 2. Target maintainers aren't review them.
> 3. There are no clear maintainers for those targets.

I've already said in 
 that, given 
target maintainers CC:ed, I might be inclined to approve the patch on the 
basis of allowing them a week to test their target changes.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 8:21 AM, Joseph Myers  wrote:
> On Mon, 18 May 2015, H.J. Lu wrote:
>
>> > Have updates for all affected specs for all targets been posted?  I just
>> > saw a small and apparently arbitrary subset of targets with patches, and
>> > no explanation of how those targets were identified or why the other
>> > targets with specs mentioning the options in question did not need
>> > updates.
>> >
>>
>> I only posted patches for an  arbitrary subset of targets because
>>
>> 1. Not everyone is interested in --enable-default-pie.
>> 2. I can't tests all targets myself.
>>
>> If patches for all targets is the only blocker before the patch
>> will be approved or target maintainers will help me test the patch,
>> I will post patches for each target affected.
>
> I think the whole thing should be posted as one patch, with both the
> target-independent changes and the target-specific changes for all
> targets.
>

That is what makes me concerned.  I have some simple target-specific
patches which weren't reviewed for years. What will happen if no one
reviews some simple target-specific changes because

1. Reviewers don't have access to those targets.
2. Target maintainers aren't reviewing them.
3. There are no clear maintainers for those targets.

As a result, my patch may go nowhere.

-- 
H.J.


Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread Joseph Myers
On Mon, 18 May 2015, H.J. Lu wrote:

> > Have updates for all affected specs for all targets been posted?  I just
> > saw a small and apparently arbitrary subset of targets with patches, and
> > no explanation of how those targets were identified or why the other
> > targets with specs mentioning the options in question did not need
> > updates.
> >
> 
> I only posted patches for an  arbitrary subset of targets because
> 
> 1. Not everyone is interested in --enable-default-pie.
> 2. I can't tests all targets myself.
> 
> If patches for all targets is the only blocker before the patch
> will be approved or target maintainers will help me test the patch,
> I will post patches for each target affected.

I think the whole thing should be posted as one patch, with both the 
target-independent changes and the target-specific changes for all 
targets.

-- 
Joseph S. Myers
jos...@codesourcery.com


breakage with series "[0/9] Record number of hard registers in a REG"

2015-05-19 Thread Hans-Peter Nilsson
> From: Richard Sandiford 
> Date: Mon, 18 May 2015 20:09:19 +0200

> While looking at a profile of gcc, I noticed one thing fairly high
> up the list was a loop iterating over all the registers in a REG,
> apparently due to the delay in computing the index for hard_regno_nregs
> and then loading the value (which would often be an L1 cache miss).

> Each patch in the series was individually bootstrapped & regression-tested
> on x86_64-linux-gnu.
> 
> Thanks,
> Richard
> 

Please also make use of config-list.mk or a subset affecting
targets.  Build succeeded for cris-elf last at r223334.  Build
failed at r223355, r223364, r223366, it seems from a commit in
this patch series:

...
g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual 
-Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. 
-I. -I/tmp/hpautotest-gcc1/gcc/gcc -I/tmp/hpautotest-gcc1/gcc/gcc/. 
-I/tmp/hpautotest-gcc1/gcc/gcc/../include 
-I/tmp/hpautotest-gcc1/gcc/gcc/../libcpp/include 
-I/tmp/hpautotest-gcc1/cris-elf/gccobj/./gmp -I/tmp/hpautotest-gcc1/gcc/gmp 
-I/tmp/hpautotest-gcc1/cris-elf/gccobj/./mpfr -I/tmp/hpautotest-gcc1/gcc/mpfr 
-I/tmp/hpautotest-gcc1/gcc/mpc/src  
-I/tmp/hpautotest-gcc1/gcc/gcc/../libdecnumber 
-I/tmp/hpautotest-gcc1/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/tmp/hpautotest-gcc1/gcc/gcc/../libbacktrace   -o cris.o -MT cris.o -MMD -MP 
-MF ./.deps/cris.TPo /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c
In file included from /tmp/hpautotest-gcc1/gcc/gcc/rtl.h:25,
 from /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:25:
/tmp/hpautotest-gcc1/gcc/gcc/input.h:37: warning: comparison between signed and 
unsigned integer expressions
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'void 
cris_expand_prologue()':
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3141: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3165: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3263: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'void 
cris_expand_epilogue()':
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3429: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3515: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3548: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3573: error: 'gen_rtx_raw_REG' 
was not declared in this scope
make[2]: *** [cris.o] Error 1

brgds, H-P


Re: [PATCH, alpha]: Some cleanups

2015-05-19 Thread Uros Bizjak
On Tue, May 19, 2015 at 5:10 PM, Uros Bizjak  wrote:
> No functional changes.
>
> 2015-05-18  Uros Bizjak  
>
> * config/alpha/alpha.c (alpha_legitimize_reload_address)
> (alpha_preferred_reload_class, alpha_legitimate_constant_p): Use
> CONST_INT_P, CONST_SCALAR_INT_P and CONST_DOUBLE_P predicates.
> (alpha_split_reload_pair) :
> Use CASE_CONST_SCALAR_INT.
> (print_operand) : Use mode_width_operand to check the
> value of the constant.
> * config/alpha/alpha.md (movti): Use CONST_SCALAR_INT_P predicate.
> * config/alpha/predicates.md (input_operand): Use general_operand
> instead of match_code as operand check.
> (symbolic_operand): Use match_code with subexpression digits.
> * config/alpha/constraints.md (Q): Ditto.
>
> Tested on alpha-linux-gnu and committed to mainline SVN.

... now with a patch.

Uros.
Index: config/alpha/alpha.c
===
--- config/alpha/alpha.c(revision 223268)
+++ config/alpha/alpha.c(working copy)
@@ -1352,7 +1352,7 @@ alpha_legitimize_reload_address (rtx x,
   && REG_P (XEXP (x, 0))
   && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER
   && REGNO_OK_FOR_BASE_P (REGNO (XEXP (x, 0)))
-  && GET_CODE (XEXP (x, 1)) == CONST_INT)
+  && CONST_INT_P (XEXP (x, 1)))
 {
   HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
   HOST_WIDE_INT low = ((val & 0xffff) ^ 0x8000) - 0x8000;
@@ -1644,9 +1644,8 @@ alpha_preferred_reload_class(rtx x, enum reg_class
 return rclass;
 
   /* These sorts of constants we can easily drop to memory.  */
-  if (CONST_INT_P (x)
-  || GET_CODE (x) == CONST_WIDE_INT
-  || GET_CODE (x) == CONST_DOUBLE
+  if (CONST_SCALAR_INT_P (x)
+  || CONST_DOUBLE_P (x)
   || GET_CODE (x) == CONST_VECTOR)
 {
   if (rclass == FLOAT_REGS)
@@ -2133,7 +2132,7 @@ alpha_legitimate_constant_p (machine_mode mode, rt
 
 case CONST:
   if (GET_CODE (XEXP (x, 0)) == PLUS
- && GET_CODE (XEXP (XEXP (x, 0), 1)) == CONST_INT)
+ && CONST_INT_P (XEXP (XEXP (x, 0), 1)))
x = XEXP (XEXP (x, 0), 0);
   else
return true;
@@ -3283,8 +3282,7 @@ alpha_split_tmode_pair (rtx operands[4], machine_m
   operands[2] = adjust_address (operands[1], DImode, 0);
   break;
 
-case CONST_INT:
-case CONST_WIDE_INT:
+CASE_CONST_SCALAR_INT:
 case CONST_DOUBLE:
   gcc_assert (operands[1] == CONST0_RTX (mode));
   operands[2] = operands[3] = const0_rtx;
@@ -5257,9 +5255,7 @@ print_operand (FILE *file, rtx x, int code)
 
 case 'M':
   /* 'b', 'w', 'l', or 'q' as the value of the constant.  */
-  if (!CONST_INT_P (x)
- || (INTVAL (x) != 8 && INTVAL (x) != 16
- && INTVAL (x) != 32 && INTVAL (x) != 64))
+  if (!mode_width_operand (x, VOIDmode))
output_operand_lossage ("invalid %%M value");
 
   fprintf (file, "%s",
Index: config/alpha/alpha.md
===
--- config/alpha/alpha.md   (revision 223268)
+++ config/alpha/alpha.md   (working copy)
@@ -4153,8 +4153,7 @@
   /* We must put 64-bit constants in memory.  We could keep the
  32-bit constants in TImode and rely on the splitter, but
  this doesn't seem to be worth the pain.  */
-  else if (CONST_INT_P (operands[1])
-  || GET_CODE (operands[1]) == CONST_WIDE_INT)
+  else if (CONST_SCALAR_INT_P (operands[1]))
 {
   rtx in[2], out[2], target;
 
Index: config/alpha/constraints.md
===
--- config/alpha/constraints.md (revision 223298)
+++ config/alpha/constraints.md (working copy)
@@ -97,7 +97,7 @@
 (define_memory_constraint "Q"
   "@internal A normal_memory_operand"
   (and (match_code "mem")
-   (not (match_test "GET_CODE (XEXP (op, 0)) == AND"))))
+   (not (match_code "and" "0"))))
 
 (define_constraint "R"
   "@internal A direct_call_operand"
Index: config/alpha/predicates.md
===
--- config/alpha/predicates.md  (revision 223298)
+++ config/alpha/predicates.md  (working copy)
@@ -72,7 +72,7 @@
 ;; Return 1 if the operand is a non-symbolic, nonzero constant operand.
 (define_predicate "non_zero_const_operand"
   (and (match_code "const_int,const_wide_int,const_double,const_vector")
-   (match_test "op != CONST0_RTX (mode)")))
+   (not (match_test "op == CONST0_RTX (mode)"))))
 
 ;; Return 1 if OP is the constant 4 or 8.
 (define_predicate "const48_operand"
@@ -150,8 +150,7 @@
 
 ;; Return 1 if OP is a valid operand for the source of a move insn.
 (define_predicate "input_operand"
-  (match_code "label_ref,symbol_ref,const,high,reg,subreg,mem,
-  const_double,const_vector,const_int,const_wide_int")
+  (match_operand 0 "general_operand")
 {
   switch (GET_CODE (op))
 {
@@ -273,8 +272,8 @@
 (define_predicate "call_operand"
   (ior (match_code "sy

[PATCH, alpha]: Some cleanups

2015-05-19 Thread Uros Bizjak
No functional changes.

2015-05-18  Uros Bizjak  

* config/alpha/alpha.c (alpha_legitimize_reload_address)
(alpha_preferred_reload_class, alpha_legitimate_constant_p): Use
CONST_INT_P, CONST_SCALAR_INT_P and CONST_DOUBLE_P predicates.
(alpha_split_reload_pair) :
Use CASE_CONST_SCALAR_INT.
(print_operand) : Use mode_width_operand to check the
value of the constant.
* config/alpha/alpha.md (movti): Use CONST_SCALAR_INT_P predicate.
* config/alpha/predicates.md (input_operand): Use general_operand
instead of match_code as operand check.
(symbolic_operand): Use match_code with subexpression digits.
* config/alpha/constraints.md (Q): Ditto.

Tested on alpha-linux-gnu and committed to mainline SVN.

Uros.


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Jeff Law

On 05/19/2015 08:43 AM, Michael Matz wrote:
> Hi,
>
> On Fri, 15 May 2015, Rich Felker wrote:
>
>> Forget lazy binding. It's dead anyway because serious distros want
>> PIE+relro+bindnow+...
>
> You keep saying this, but I can't help the feeling it's mostly because
> musl doesn't support it ;-)

FWIW, Red Hat is pushing PIE & partial RELRO deeper and deeper into the 
distribution.  It's not clear yet how far bindnow will go though.


jeff



Re: [PATCH 02/13] optabs: Fix vec_perm -> V16QI middle end lowering.

2015-05-19 Thread Richard Henderson
On 05/19/2015 01:41 AM, Andreas Krebbel wrote:
> On 05/18/2015 07:35 PM, Richard Henderson wrote:
>> On 05/11/2015 06:23 AM, Andreas Krebbel wrote:
>>> @@ -6784,14 +6784,18 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, 
>>> rtx sel, rtx target)
>>>  {
>>>/* Multiply each element by its byte size.  */
>>>machine_mode selmode = GET_MODE (sel);
>>> +  /* We cannot re-use SEL as a temp operand since it might be in
>>> +read-only storage.  */
>>> +  rtx sel_reg = gen_reg_rtx (selmode);
>>> +
>>>if (u == 2)
>>> -   sel = expand_simple_binop (selmode, PLUS, sel, sel,
>>> -  sel, 0, OPTAB_DIRECT);
>>> +   sel_reg = expand_simple_binop (selmode, PLUS, sel, sel,
>>> +  sel_reg, 0, OPTAB_DIRECT);
>>>else
>>
>> You needn't allocate sel_reg explicitly; expand_simple_binop will do that for
>> you if the TARGET parameter is NULL.
>>
>> Thus this patch should be an 8 character change on those two calls.
> 
> Right. Thanks!
> 
> Ok to apply with that change?

Yes, thanks.


r~



Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Michael Matz
Hi,

On Fri, 15 May 2015, Rich Felker wrote:

> Forget lazy binding. It's dead anyway because serious distros want
> PIE+relro+bindnow+...

You keep saying this, but I can't help the feeling it's mostly because 
musl doesn't support it ;-)

No, you don't have to use bindnow to get the effects of relro.  Sure, 
there are more parts of the GOT protected with it, but whether that's really 
that much more hardened is up for debate.

> If people really want lazy binding, they can use options which support 
> it, but I don't want to keep suffering the codegen cost of lazy binding 
> despite never using it.

> There should be an option to generate optimal code equivalent to what 
> you get with Alexander Monakov's patches for those of us who aren't 
> trying to support this legacy feature that precludes good performance 
> and precludes hardening.

H.J.'s branch is for _improving_ code on top of the no-plt code, it's not 
replacing it or an alternative for it.


Ciao,
Michael.


Re: [PATCH 4/4] Split-stack arg pointer init refinement

2015-05-19 Thread David Edelsohn
This small refinement to the -fsplit-stack prologue arg pointer
initialization improves code generation.  Compare the -O2
gcc/testsuite/gcc.dg/split-3.c code for down() below.

before                  after
mflr 0                  mflr 0
std 31,-8(1)            std 31,-8(1)
std 0,16(1)             mr 12,1
stdu 1,-10144(1)        std 0,16(1)
addi 12,1,10144         stdu 1,-10144(1)
bge 7,.L7               bge 7,.L7
mr 12,29                mr 12,29
.L7:                    .L7:

* config/rs6000/rs6000.c (rs6000_emit_allocate_stack): Return
stack adjusting insn.  Formatting.
(rs6000_emit_prologue): Track stack adjusting insn, and use of
r12.  If possible, emit first -fsplit-stack arg pointer insn
before stack adjust.  Don't use r12 to save cr if split-stack.

This patch is okay.  Nice improvement.

Thanks, David


Re: [PATCH 2/4] prologue and epilogue tidy and -mno-vrsave bug fix

2015-05-19 Thread David Edelsohn
On Sun, May 17, 2015 at 10:54 PM, Alan Modra  wrote:
> This patch tidies the prologue and epilogue altivec code a little.
> A number of places using info->altivec_size unnecessarily also test
> TARGET_ALTIVEC_ABI, when rs6000_stack_info() guarantees that
> info->altivec_size is zero if !TARGET_ALTIVEC_ABI.
>
> Similarly by inspection of rs6000_stack_info() code,
> TARGET_ALTIVEC_VRSAVE && info->vrsave_mask != 0, used when deciding to
> save or restore vrsave, can be replaced with info->vrsave_size.  I
> also removed the TARGET_ALTIVEC test used with save/restore of vrsave.
> I believe it is redundant because compute_vrsave_mask() will return 0
> when no altivec registers are used (and of course you can't use them
> without TARGET_ALTIVEC), except for Darwin where TARGET_ALTIVEC is
> forced.  The vrsave changes make the code actually doing the save or
> restore visually consistent with code that sets up a frame register
> for vrsave.
>
> Finally, I've changed two places that use info->vrsave_mask to test
> whether vrsave is saved or restored, to use info->vrsave_size.  This
> is a bug fix for -mno-vrsave.
>
> * config/rs6000/rs6000.c (struct rs6000_stack): Correct comments.
> (rs6000_stack_info): Don't zero offsets when not saving registers.
> (debug_stack_info): Adjust to omit printing unused offsets,
> as before.
> (direct_return): Test vrsave_size rather than vrsave_mask.
> (rs6000_emit_prologue): Likewise.  Remove redundant altivec tests.
> (rs6000_emit_epilogue): Likewise.

This patch is okay.

My only concern is Patch 1 causing a regression for the PR that I mentioned.

Thanks, David


Re: [patch] libstdc++/66055 add missing constructors to unordered containers

2015-05-19 Thread Jonathan Wakely

On 17/05/15 22:21 +0200, François Dumont wrote:
> Ok, I just committed, fixing some other line lengths except those with a
> long hyperlink; I didn't want to break those.


Yep, thanks. I think we should backport Nathan's patch and your one to
the gcc-5-branch too.

I'll make a note to do that before the 5.2 release.


[PATCH] Fix duplicated warning with __attribute__((format)) (PR c/64223)

2015-05-19 Thread Marek Polacek
This PR points out that we output the same -Wformat warning twice when using
__attribute__ ((format)).  The problem was that attribute_value_equal
(called when processing merge_attributes) got two lists:
"format printf, 1, 2" and "__format__ __printf__, 1, 2", which should be
equal.  But since attribute_value_equal uses simple_cst_list_equal when
it sees TREE_LISTs, it doesn't consider "__printf__" and "printf" as
the same, so it said that the two lists aren't the same.  That means that the
type then contains two identical format attributes and we warn twice.
Fixed by handling the format attribute specially.  (The patch doesn't
consider the printf and the gnu_printf archetypes as the same, so we still
might get duplicate warnings when combining printf and gnu_printf.)
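
For readers who want the comparison rule in isolation, here is a minimal standalone sketch of the "text" versus "__text__" matching described above. It works on plain strings rather than IDENTIFIER_NODEs, and the names same_attr_name and is_wrapped_form are made up for this illustration; they are not GCC functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not GCC's API): true if `wrapped` is exactly
// `plain` surrounded by a leading and a trailing "__".
static bool is_wrapped_form(const std::string& wrapped,
                            const std::string& plain)
{
  return wrapped.size() == plain.size() + 4
         && wrapped.compare(0, 2, "__") == 0
         && wrapped.compare(wrapped.size() - 2, 2, "__") == 0
         && wrapped.compare(2, plain.size(), plain) == 0;
}

// Two attribute names match if they are identical, or if one is the
// "__name__" spelling of the other.
bool same_attr_name(const std::string& a, const std::string& b)
{
  return a == b || is_wrapped_form(a, b) || is_wrapped_form(b, a);
}
```

With this rule "printf" and "__printf__" compare equal, while "printf" and "gnu_printf" do not, which matches the limitation noted in the parenthesis above.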

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-05-19  Marek Polacek  

PR c/64223
* tree.c (attribute_value_equal): Handle attribute format.
(cmp_attrib_identifiers): Factor out of lookup_ident_attribute.

* gcc.dg/pr64223-1.c: New test.
* gcc.dg/pr64223-2.c: New test.

diff --git gcc/testsuite/gcc.dg/pr64223-1.c gcc/testsuite/gcc.dg/pr64223-1.c
index e69de29..015bfd8 100644
--- gcc/testsuite/gcc.dg/pr64223-1.c
+++ gcc/testsuite/gcc.dg/pr64223-1.c
@@ -0,0 +1,12 @@
+/* PR c/64223: Test for duplicated warnings.  */
+/* { dg-do compile } */
+/* { dg-options "-Wformat" } */
+
+int printf (const char *, ...) __attribute__ ((__format__ (__printf__, 1, 2)));
+
+void
+foo (void)
+{
+  printf ("%d\n", 0UL); /* { dg-bogus "expects argument of type.*expects 
argument of type" } */
+ /* { dg-warning "expects argument of type" "" { target *-*-* } 10 } */
+}
diff --git gcc/testsuite/gcc.dg/pr64223-2.c gcc/testsuite/gcc.dg/pr64223-2.c
index e69de29..2a1627e 100644
--- gcc/testsuite/gcc.dg/pr64223-2.c
+++ gcc/testsuite/gcc.dg/pr64223-2.c
@@ -0,0 +1,13 @@
+/* PR c/64223: Test for duplicated warnings.  */
+/* { dg-do compile } */
+/* { dg-options "-Wformat" } */
+
+int myprintf (const char *, ...) __attribute__ ((__format__ (printf, 1, 2)));
+int myprintf (const char *, ...) __attribute__ ((__format__ (__printf__, 1, 
2)));
+
+void
+foo (void)
+{
+  myprintf ("%d\n", 0UL); /* { dg-bogus "expects argument of type.*expects 
argument of type" } */
+ /* { dg-warning "expects argument of type" "" { target *-*-* } 11 } */
+}
diff --git gcc/tree.c gcc/tree.c
index 6297f04..a58ad7b 100644
--- gcc/tree.c
+++ gcc/tree.c
@@ -4871,9 +4871,53 @@ simple_cst_list_equal (const_tree l1, const_tree l2)
   return l1 == l2;
 }
 
+/* Compare two identifier nodes representing attributes.  Either one may
+   be in prefixed __ATTR__ form.  Return true if they are the same, false
+   otherwise.  */
+
+static bool
+cmp_attrib_identifiers (const_tree attr1, const_tree attr2)
+{
+  /* Make sure we're dealing with IDENTIFIER_NODEs.  */
+  gcc_checking_assert (TREE_CODE (attr1) == IDENTIFIER_NODE
+  && TREE_CODE (attr2) == IDENTIFIER_NODE);
+
+  /* Identifiers can be compared directly for equality.  */
+  if (attr1 == attr2)
+return true;
+
+  /* If they are not equal, they may still be one in the form
+ 'text' while the other one is in the form '__text__'.  TODO:
+ If we were storing attributes in normalized 'text' form, then
+ this could all go away and we could take full advantage of
+ the fact that we're comparing identifiers. :-)  */
+  const size_t attr1_len = IDENTIFIER_LENGTH (attr1);
+  const size_t attr2_len = IDENTIFIER_LENGTH (attr2);
+
+  if (attr2_len == attr1_len + 4)
+{
+  const char *p = IDENTIFIER_POINTER (attr2);
+  const char *q = IDENTIFIER_POINTER (attr1);
+  if (p[0] == '_' && p[1] == '_'
+ && p[attr2_len - 2] == '_' && p[attr2_len - 1] == '_'
+ && strncmp (q, p + 2, attr1_len) == 0)
+   return true;
+}
+  else if (attr2_len + 4 == attr1_len)
+{
+  const char *p = IDENTIFIER_POINTER (attr2);
+  const char *q = IDENTIFIER_POINTER (attr1);
+  if (q[0] == '_' && q[1] == '_'
+ && q[attr1_len - 2] == '_' && q[attr1_len - 1] == '_'
+ && strncmp (q + 2, p, attr2_len) == 0)
+   return true;
+}
+
+  return false;
+}
+
 /* Compare two attributes for their value identity.  Return true if the
-   attribute values are known to be equal; otherwise return false.
-*/
+   attribute values are known to be equal; otherwise return false.  */
 
 bool
 attribute_value_equal (const_tree attr1, const_tree attr2)
@@ -4883,10 +4927,25 @@ attribute_value_equal (const_tree attr1, const_tree 
attr2)
 
   if (TREE_VALUE (attr1) != NULL_TREE
   && TREE_CODE (TREE_VALUE (attr1)) == TREE_LIST
-  && TREE_VALUE (attr2) != NULL
+  && TREE_VALUE (attr2) != NULL_TREE
   && TREE_CODE (TREE_VALUE (attr2)) == TREE_LIST)
-return (simple_cst_list_equal (TREE_VALUE (attr1),
-  TREE_VALUE (attr2)) == 1);
+{
+  /* Handle attribute format.  */
+  if (is_attribute_p ("format", TREE_PURPOSE 

[patch] Optimize std::list when using new ABI

2015-05-19 Thread Jonathan Wakely

This fixes some missed optimizations I should have made when adding
the new std::__cxx11::list, making use of the O(1) list::size() when
it saves work.

In the equality comparisons two lists can't be equal if their sizes
differ.
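
A rough sketch of that shortcut, written against the public std::list interface rather than the library internals (lists_equal is a name invented for this example):

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// Compare sizes first; only walk the elements when the sizes agree.
// The early exit is only a win when size() is O(1), as it is for the
// new-ABI list.
template <typename T>
bool lists_equal(const std::list<T>& x, const std::list<T>& y)
{
  if (x.size() != y.size())
    return false;
  return std::equal(x.begin(), x.end(), y.begin());
}
```

Because the sizes are known to match before std::equal runs, the three-iterator form cannot read past the end of the second list.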

When resizing a list we don't need to walk the list to find whether
we're growing or shrinking, and when shrinking by less than 50% it is
faster to find the first element to erase by moving backwards from the
end rather than starting at the beginning.
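
To illustrate the shrinking case only, here is a small sketch using the public interface. The helper name first_to_erase is hypothetical, and it assumes the caller already knows the list is shrinking (new_size <= size()) and that size() is O(1).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Return the first element to erase when shrinking `l` to `new_size`
// elements, walking from whichever end of the list is closer.
template <typename T>
typename std::list<T>::const_iterator
first_to_erase(const std::list<T>& l, std::size_t new_size)
{
  const std::size_t len = l.size();
  assert(new_size <= len);
  if (new_size <= len / 2)
    // Keeping at most half: the position is closer to the beginning.
    return std::next(l.cbegin(), static_cast<std::ptrdiff_t>(new_size));
  // Erasing fewer than half: step backwards from the end instead.
  return std::prev(l.cend(), static_cast<std::ptrdiff_t>(len - new_size));
}
```

For a ten-element list shrunk to eight, this takes two steps back from end() instead of eight steps forward from begin().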

Tested powerpc64-linux.

I plan to commit this to trunk and the gcc-5-branch.

commit 9d6539fec972296694c30b073dd068cfbcdae8a5
Author: Jonathan Wakely 
Date:   Tue May 19 13:28:16 2015 +0100

	* include/bits/stl_list.h (_M_resize_pos(size_type&)): Declare.
	(operator==(const list&, const list&)): If size() is O(1) compare
	sizes before comparing each element.
	* include/bits/list.tcc (list::_M_resize_pos(size_type&)): Define.
	(list::resize): Use _M_resize_pos.

diff --git a/libstdc++-v3/include/bits/list.tcc b/libstdc++-v3/include/bits/list.tcc
index a9c8a55..c5d2ab4 100644
--- a/libstdc++-v3/include/bits/list.tcc
+++ b/libstdc++-v3/include/bits/list.tcc
@@ -157,6 +157,52 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   return __ret;
 }
 
+  // Return a const_iterator indicating the position to start inserting or
+  // erasing elements (depending whether the list is growing or shrinking),
+  // and set __new_size to the number of new elements that must be appended.
+  // Equivalent to the following, but performed optimally:
+  // if (__new_size < size()) {
+  //   __new_size = 0;
+  //   return std::next(begin(), __new_size);
+  // } else {
+  //   __new_size -= size();
+  //   return end();
+  // }
+  template<typename _Tp, typename _Alloc>
+typename list<_Tp, _Alloc>::const_iterator
+list<_Tp, _Alloc>::
+_M_resize_pos(size_type& __new_size) const
+{
+  const_iterator __i;
+#if _GLIBCXX_USE_CXX11_ABI
+  const size_type __len = size();
+  if (__new_size < __len)
+	{
+	  if (__new_size <= __len / 2)
+	{
+	  __i = begin();
+	  std::advance(__i, __new_size);
+	}
+	  else
+	{
+	  __i = end();
+	  ptrdiff_t __num_erase = __len - __new_size;
+	  std::advance(__i, -__num_erase);
+	}
+	  __new_size = 0;
+	  return __i;
+	}
+  else
+	__i = end();
+#else
+  size_type __len = 0;
+  for (__i = begin(); __i != end() && __len < __new_size; ++__i, ++__len)
+;
+#endif
+  __new_size -= __len;
+  return __i;
+}
+
 #if __cplusplus >= 201103L
   template<typename _Tp, typename _Alloc>
 void
@@ -182,14 +228,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 list<_Tp, _Alloc>::
 resize(size_type __new_size)
 {
-  iterator __i = begin();
-  size_type __len = 0;
-  for (; __i != end() && __len < __new_size; ++__i, ++__len)
-;
-  if (__len == __new_size)
+  const_iterator __i = _M_resize_pos(__new_size);
+  if (__new_size)
+	_M_default_append(__new_size);
+  else
 erase(__i, end());
-  else  // __i == end()
-	_M_default_append(__new_size - __len);
 }
 
   template<typename _Tp, typename _Alloc>
@@ -197,14 +240,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 list<_Tp, _Alloc>::
 resize(size_type __new_size, const value_type& __x)
 {
-  iterator __i = begin();
-  size_type __len = 0;
-  for (; __i != end() && __len < __new_size; ++__i, ++__len)
-;
-  if (__len == __new_size)
+  const_iterator __i = _M_resize_pos(__new_size);
+  if (__new_size)
+insert(end(), __new_size, __x);
+  else
 erase(__i, end());
-  else  // __i == end()
-insert(end(), __new_size - __len, __x);
 }
 #else
   template<typename _Tp, typename _Alloc>
@@ -212,14 +252,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 list<_Tp, _Alloc>::
 resize(size_type __new_size, value_type __x)
 {
-  iterator __i = begin();
-  size_type __len = 0;
-  for (; __i != end() && __len < __new_size; ++__i, ++__len)
-;
-  if (__len == __new_size)
-erase(__i, end());
-  else  // __i == end()
-insert(end(), __new_size - __len, __x);
+  const_iterator __i = _M_resize_pos(__new_size);
+  if (__new_size)
+insert(end(), __new_size, __x);
+  else
+erase(__i._M_const_cast(), end());
 }
 #endif
 
diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h
index 3401e5b..a26859e 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -1789,6 +1789,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 	_S_do_it(_M_get_Node_allocator(), __x._M_get_Node_allocator()))
 	  __builtin_abort();
   }
+
+  // Used to implement resize.
+  const_iterator
+  _M_resize_pos(size_type& __new_size) const;
 };
 _GLIBCXX_END_NAMESPACE_CXX11
 
@@ -1806,6 +1810,11 @@ _GLIBCXX_END_NAMESPACE_CXX11
 inline bool
 operator==(const list<_Tp, _Alloc>& __x, const list<_Tp, _Alloc>& __y)
 {

Re: [Patch, fortran, pr65548, 2nd take, v5] [5/6 Regression] gfc_conv_procedure_call

2015-05-19 Thread Mikael Morin
On 19/05/2015 10:50, Andre Vehreschild wrote:
> Hi all,
> 
> find attached latest version to fix 65548.
> 
> Bootstraps and regtests ok on x86_64-linux-gnu/f21.
> 
OK. Thanks.

Mikael


Re: [Patch, Fortran, PR58586, v4] ICE with derived type with allocatable component passed by value

2015-05-19 Thread Andre Vehreschild
Hi,

attached is the most recent version of the patch for 58586. It adapts to
recent trunk and addresses the caveats so far, i.e. the testcases in the
comments now compile and run again w/o errors.

Bootstraps and regtests fine on x86_64-linux-gnu/f21.

Comments?

- Andre

On Fri, 8 May 2015 16:11:11 +0200
Andre Vehreschild  wrote:

> Hi,
> 
> so attached is a quick and dirty solution for the allocatable return value
> problem. I personally don't like it. It makes a special case of
> assigning a function result to a variable. Maybe you have a better idea how
> to do this in gfortran style.
> 
> - Andre
> 
> 
> On Fri, 8 May 2015 15:31:46 +0200
> Andre Vehreschild  wrote:
> 
> > Hi Mikael,
> > 
> > > > ?? I don't get you there? What do you mean? Do you think the
> > > > alloc_comp_class_3/4.* are not correctly testing the issue? Any idea of
> > > > how to test this better? I mean the pr is about this artificial
> > > > constructs. I merely struck it in search of a pr about allocatable
> > > > components. 
> > > 
> > > I was talking about the bug you found with t_init above.  :-)
> > > the compiler is not ready to accept that function in a testcase.
> > > The alloc_comp_class_3/4 are fine.
> > 
> > Oh, sorry, I misunderstood you there. Now let's see, where that one is
> > hiding.
> > 
> > - Andre
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/testsuite/ChangeLog:

2015-05-19  Andre Vehreschild  

* gfortran.dg/alloc_comp_class_3.f03: New test.
* gfortran.dg/alloc_comp_class_4.f03: New test.


gcc/fortran/ChangeLog:

2015-05-19  Andre Vehreschild  

PR fortran/58586
* resolve.c (resolve_symbol): Non-private functions in modules
with allocatable or pointer components are marked referenced
now. Furthermore, the default init for those components is now
done in gfc_conv_procedure_call, preventing duplicate
code.
* trans-decl.c (gfc_generate_function_code): Generate a fake
result decl for functions returning an object with allocatable
components and initialize them.
* trans-expr.c (gfc_conv_procedure_call): For value typed trees
use the tree without indirect ref. And for non-decl trees
add a temporary variable to prevent evaluating the tree
multiple times (prevent multiple function evaluations).
* trans.h: Made gfc_trans_structure_assign () prototype
available, which is now needed by trans-decl.c:gfc_generate_
function_code(), too.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index fc11d23..e1b5762 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -14094,10 +14094,15 @@ resolve_symbol (gfc_symbol *sym)
 
   if ((!a->save && !a->dummy && !a->pointer
 	   && !a->in_common && !a->use_assoc
-	   && (a->referenced || a->result)
-	   && !(a->function && sym != sym->result))
+	   && !a->result && !a->function)
 	  || (a->dummy && a->intent == INTENT_OUT && !a->pointer))
 	apply_default_init (sym);
+  else if (a->function && sym->result && a->access != ACCESS_PRIVATE
+	   && (sym->ts.u.derived->attr.alloc_comp
+		   || sym->ts.u.derived->attr.pointer_comp))
+	/* Mark the result symbol to be referenced, when it has allocatable
+	   components.  */
+	sym->result->attr.referenced = 1;
 }
 
   if (sym->ts.type == BT_CLASS && sym->ns == gfc_current_ns
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 4c18920..f9a91c6 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -5896,9 +5896,33 @@ gfc_generate_function_code (gfc_namespace * ns)
   tmp = gfc_trans_code (ns->code);
   gfc_add_expr_to_block (&body, tmp);
 
-  if (TREE_TYPE (DECL_RESULT (fndecl)) != void_type_node)
+  if (TREE_TYPE (DECL_RESULT (fndecl)) != void_type_node
+  || (sym->result && sym->result != sym
+	  && sym->result->ts.type == BT_DERIVED
+	  && sym->result->ts.u.derived->attr.alloc_comp))
 {
+  bool artificial_result_decl = false;
   tree result = get_proc_result (sym);
+  gfc_symbol *rsym = sym == sym->result ? sym : sym->result;
+
+  /* Make sure that a function returning an object with
+	 alloc/pointer_components always has a result, where at least
+	 the allocatable/pointer components are set to zero.  */
+  if (result == NULL_TREE && sym->attr.function
+	  && ((sym->result->ts.type == BT_DERIVED
+	   && (sym->attr.allocatable
+		   || sym->attr.pointer
+		   || sym->result->ts.u.derived->attr.alloc_comp
+		   || sym->result->ts.u.derived->attr.pointer_comp))
+	  || (sym->result->ts.type == BT_CLASS
+		  && (CLASS_DATA (sym)->attr.allocatable
+		  || CLASS_DATA (sym)->attr.class_pointer
+		  || CLASS_DATA (sym->result)->attr.alloc_comp
+		  || CLASS_DATA (sym->result)->attr.pointer_comp))))
+	{
+	  artificial_result_decl = true;
+	  result = gfc_get_fake_result_decl (sym, 0);
+	}
 
   if (result != NULL_TREE && sym->attr.function && !sym->attr.

Re: [Patch ARM-AArch64/testsuite 00/13] Neon intrinsics executable tests

2015-05-19 Thread James Greenhalgh
On Tue, May 12, 2015 at 09:30:48PM +0100, Christophe Lyon wrote:
> This patch series is a follow-up to the tests I already contributed,
> converted from my original testsuite.
> 
> This series consists of 13 new files, which can be committed
> independently.
> 
> Another series (hopefully final) will follow.
> 
> Tested with qemu on arm*linux, aarch64-linux. I couldn't test on 
> aarch64_be-none-elf because my build is currently broken (see PR 66018).
> 
> 2015-05-12  Christophe Lyon  
> 
>   * gcc.target/aarch64/neon-intrinsics/vqmovn.c: New file.
>   * gcc.target/aarch64/neon-intrinsics/vqmovun.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh_lane.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrshl.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrshrn_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrshrun_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqshl.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqshl_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqshlu_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqshrn_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqshrun_n.c: Likewise.

Hi Christophe,

This patch set looks good to me. The patch set is OK, please apply it.

One small nit, could you run through and check the alignment of the
trailing \ in some of the macro definitions? It might be the mail
clients, but I see (for example):

+  /* Basic test: v2=vqshlu_n(v1,v), then store the result.  */
+#define TEST_VQSHLU_N2(INSN, Q, T1, T2, T3, T4, W, N, V, 
EXPECTED_CUMULATIVE_SAT, CMT) \
+  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T3, W, N));  \
+  VECT_VAR(vector_res, T3, W, N) = \
+INSN##Q##_n_##T2##W(VECT_VAR(vector, T1, W, N),\
+   V); \
+  vst1##Q##_##T4##W(VECT_VAR(result, T3, W, N),\
+   VECT_VAR(vector_res, T3, W, N));\
+  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+

in patch 11/13.

Also, if you could look out for aarch64_be fallout once the build
starts going, that would be great.

Thanks,
James

> 
> Christophe Lyon (13):
>   Add vqmovn tests.
>   Add vqmovun tests.
>   Add vqrdmulh tests.
>   Add vqrdmulh_lane tests.
>   Add vqrdmulh_n tests.
>   Add vqrshl tests.
>   Add vqrshrn_n tests.
>   Add vqrshrun_n tests.
>   Add vqshl tests.
>   Add vqshl_n tests.
>   Add vqshlu_n tests.
>   Add vqshrn_n tests.
>   Add vqshrun_n tests.
> 
>  .../gcc.target/aarch64/advsimd-intrinsics/vqmovn.c |  134 +++
>  .../aarch64/advsimd-intrinsics/vqmovun.c   |   93 ++
>  .../aarch64/advsimd-intrinsics/vqrdmulh.c  |  161 +++
>  .../aarch64/advsimd-intrinsics/vqrdmulh_lane.c |  169 +++
>  .../aarch64/advsimd-intrinsics/vqrdmulh_n.c|  155 +++
>  .../gcc.target/aarch64/advsimd-intrinsics/vqrshl.c | 1090 
> 
>  .../aarch64/advsimd-intrinsics/vqrshrn_n.c |  174 
>  .../aarch64/advsimd-intrinsics/vqrshrun_n.c|  189 
>  .../gcc.target/aarch64/advsimd-intrinsics/vqshl.c  |  829 +++
>  .../aarch64/advsimd-intrinsics/vqshl_n.c   |  234 +
>  .../aarch64/advsimd-intrinsics/vqshlu_n.c  |  263 +
>  .../aarch64/advsimd-intrinsics/vqshrn_n.c  |  177 
>  .../aarch64/advsimd-intrinsics/vqshrun_n.c |  133 +++
>  13 files changed, 3801 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqmovn.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqmovun.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh_lane.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshl.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrn_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrun_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshl.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshl_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshlu_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrn_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrun_n.c
> 
> -- 
> 2.1.4
> 


[patch, avr] Restore base register if not marked dead/unused

2015-05-19 Thread Sivanupandi, Pitchumani
Test gcc.c-torture/execute/memcpy-bi.c (-O2) fails for the attiny40 device.
The cause seems to be in the "load from memory" code, which does not restore
the base register after the generated load instructions.

Function avr_out_load_psi_reg_no_disp_tiny in avr.c:
It returns just after emitting instructions to load from memory to 
registers. It is important to restore the base register if it is 
not marked dead/unused after that insn.

The code to restore the base register is already present. The patch below
lets the function do the restore before returning.

If OK, could someone commit? I do not have commit access.

diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index 4e83de8..b653858 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -4365,9 +4365,9 @@ avr_out_load_psi_reg_no_disp_tiny (rtx insn, rtx *op, int *plen)
 }
   else
 {
-  return avr_asm_len ("ld %A0,%1+"  CR_TAB
-  "ld %B0,%1+"  CR_TAB
-  "ld %C0,%1", op, plen, -3);
+  avr_asm_len ("ld %A0,%1+"  CR_TAB
+   "ld %B0,%1+"  CR_TAB
+   "ld %C0,%1", op, plen, -3);
 
   if (reg_dest != reg_base - 2 &&
   !reg_unused_after (insn, base))

Regards,
Pitchumani

gcc/ChangeLog
2015-05-19  Pitchumani Sivanupandi  

* config/avr/avr.c (avr_out_load_psi_reg_no_disp_tiny): Restore base
register if not marked dead/unused, before return.
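
To make the failure mode concrete, here is a minimal sketch of the idea in
plain C. It is a toy model and not code from avr.c: struct model, load_psi
and the base_unused_after flag are hypothetical names, and the real
condition in the patch (reg_dest != reg_base - 2 && !reg_unused_after
(insn, base)) is reduced to a single flag. The point is only that two
post-increment loads leave the pointer advanced by 2, so a function that
returns right after the loads never reaches the restore.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a pointer register and simulated data memory.
   None of these names come from avr.c.  */
struct model
{
  uint16_t base;          /* pointer register used as the load address */
  const uint8_t *mem;     /* simulated data memory */
};

/* Load three bytes through m->base.  The first two loads post-increment
   the pointer, as "ld %A0,%1+" and "ld %B0,%1+" do; the third does not.
   If the pointer is still live afterwards, it is moved back by 2.  */
uint32_t
load_psi (struct model *m, int base_unused_after)
{
  uint32_t value = 0;

  value |= (uint32_t) m->mem[m->base++];
  value |= (uint32_t) m->mem[m->base++] << 8;
  value |= (uint32_t) m->mem[m->base] << 16;

  /* This is the step that an early return after the loads would skip.  */
  if (!base_unused_after)
    m->base -= 2;

  return value;
}
```

With the flag clear, the caller sees the pointer unchanged after the load;
with the flag set, the pointer is left two bytes further on, which is
harmless only because nothing reads it again.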



[Patch AArch64] Add cpu_defines.h for AArch64.

2015-05-19 Thread Ramana Radhakrishnan

Hi,

Like the ARM port, the AArch64 port needs to set __glibcxx_integral_traps 
to false, as integer divide instructions do not trap.


Bootstrapped and regression tested on aarch64-none-linux-gnu

Ok to apply ?

regards
Ramana


2015-05-17  Ramana Radhakrishnan  

* configure.host: Define cpu_defines_dir for AArch64
* config/cpu/aarch64/cpu_defines.h: New file.
From 1e38b2a848a313e5b98494094b198b7a1e34c59c Mon Sep 17 00:00:00 2001
From: Ramana Radhakrishnan 
Date: Mon, 18 May 2015 15:45:49 +0100
Subject: [PATCH 2/2] Do the same for AArch64.

---
 libstdc++-v3/config/cpu/aarch64/cpu_defines.h | 36 +++
 libstdc++-v3/configure.host   |  3 +++
 2 files changed, 39 insertions(+)
 create mode 100644 libstdc++-v3/config/cpu/aarch64/cpu_defines.h

diff --git a/libstdc++-v3/config/cpu/aarch64/cpu_defines.h b/libstdc++-v3/config/cpu/aarch64/cpu_defines.h
new file mode 100644
index 000..d5a6fd0
--- /dev/null
+++ b/libstdc++-v3/config/cpu/aarch64/cpu_defines.h
@@ -0,0 +1,36 @@
+// Specific definitions for generic platforms  -*- C++ -*-
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file bits/cpu_defines.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{iosfwd}
+ */
+
+#ifndef _GLIBCXX_CPU_DEFINES
+#define _GLIBCXX_CPU_DEFINES 1
+
+// Integer divide instructions don't trap on AArch64.
+#define __glibcxx_integral_traps false
+
+#endif
diff --git a/libstdc++-v3/configure.host b/libstdc++-v3/configure.host
index 465a40a..b1ca7b7 100644
--- a/libstdc++-v3/configure.host
+++ b/libstdc++-v3/configure.host
@@ -143,6 +143,9 @@ cpu_include_dir=cpu/${try_cpu}
 # Set specific CPU overrides for cpu_defines_dir. Most can just use generic.
 # THIS TABLE IS SORTED.  KEEP IT THAT WAY.
 case "${host_cpu}" in
+  aarch64*)
+cpu_defines_dir=cpu/aarch64
+;;
   arm*)
 cpu_defines_dir=cpu/arm
 ;;
-- 
1.9.1



[Patch ARM] Add cpu_defines.h for ARM

2015-05-19 Thread Ramana Radhakrishnan
Hardware Integer divide instructions do not trap. Define this to be so 
for the ARM port.


Applied to trunk after a build and test across architecture ranges and a 
bootstrap and regression run on a Cortex-A15 - a v7ve core that has 
hardware divide instructions.


A patch for AArch64 follows.

regards
Ramana

2015-05-17  Ramana Radhakrishnan  

* configure.host: Define cpu_defines_dir for ARM.
* config/cpu/arm/cpu_defines.h: New file.

Index: ChangeLog
===
--- ChangeLog   (revision 223359)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2015-05-17  Ramana Radhakrishnan  
+
+   * configure.host: Define cpu_defines_dir for ARM.
+   * config/cpu/arm/cpu_defines.h: New file.
+
 2015-05-17  François Dumont  
 
* include/bits/unordered_map.h (unordered_map, unordered_multimap): Add
Index: config/cpu/arm/cpu_defines.h
===
--- config/cpu/arm/cpu_defines.h(revision 0)
+++ config/cpu/arm/cpu_defines.h(working copy)
@@ -0,0 +1,40 @@
+// Specific definitions for generic platforms  -*- C++ -*-
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file bits/cpu_defines.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{iosfwd}
+ */
+
+#ifndef _GLIBCXX_CPU_DEFINES
+#define _GLIBCXX_CPU_DEFINES 1
+
+// Integer divide instructions don't trap on ARM.
+#ifdef __ARM_ARCH_EXT_IDIV__
+#define __glibcxx_integral_traps false
+#else
+#define __glibcxx_integral_traps true
+#endif
+
+#endif
Index: configure.host
===
--- configure.host  (revision 223359)
+++ configure.host  (working copy)
@@ -143,6 +143,9 @@
 # Set specific CPU overrides for cpu_defines_dir. Most can just use generic.
 # THIS TABLE IS SORTED.  KEEP IT THAT WAY.
 case "${host_cpu}" in
+  arm*)
+cpu_defines_dir=cpu/arm
+;;
   powerpc* | rs6000)
 cpu_defines_dir=cpu/powerpc
 ;;


Re: [PATCH] plugin event for C/C++ function definitions

2015-05-19 Thread Andres Tiraboschi
2015-05-18 16:51 GMT-03:00  :
> Hi, this patch adds two new plugin events PLUGIN_START_PARSE_FUNCTION and 
> PLUGIN_FINISH_PARSE_FUNCTION. These events are invoked at start_function and 
> finish_function in gcc/c/c-decl.c and gcc/cp/decl.c respectively in the C and 
> C++ frontends.
> PLUGIN_START_PARSE_FUNCTION is called before parsing a function body.
> PLUGIN_FINISH_PARSE_FUNCTION is called after parsing a function definition.
> This patch has been implemented in gcc 5.1.0.
>
> changelog:
>
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index e28a294..fcc849d 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -8235,6 +8235,7 @@ start_function (struct c_declspecs *declspecs, struct 
> c_declarator *declarator,
>
>decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, true, NULL,
>   &attributes, NULL, NULL, DEPRECATED_NORMAL);
> +  invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1);
>
>/* If the declarator is not suitable for a function definition,
>   cause a syntax error.  */
> @@ -9050,6 +9051,7 @@ finish_function (void)
>   It's still in DECL_STRUCT_FUNCTION, and we'll restore it in
>   tree_rest_of_compilation.  */
>set_cfun (NULL);
> +  invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, 
> current_function_decl);
>current_function_decl = NULL;
>  }
>
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index c4731ae..bde92cc 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -13727,6 +13727,7 @@ start_function (cp_decl_specifier_seq *declspecs,
>tree decl1;
>
>decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, 1, &attrs);
> +  invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1);
>if (decl1 == error_mark_node)
>  return false;
>/* If the declarator is not suitable for a function definition,
> @@ -14365,6 +14366,7 @@ finish_function (int flags)
>vec_free (deferred_mark_used_calls);
>  }
>
> +  invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, fndecl);
>return fndecl;
>  }
>
> diff --git a/gcc/doc/plugins.texi b/gcc/doc/plugins.texi
> index c6caa19..d50f25c 100644
> --- a/gcc/doc/plugins.texi
> +++ b/gcc/doc/plugins.texi
> @@ -174,6 +174,8 @@ Callbacks can be invoked at the following pre-determined 
> events:
>  @smallexample
>  enum plugin_event
>  @{
> +  PLUGIN_START_PARSE_FUNCTION,  /* Called before parsing the body of a 
> function. */
> +  PLUGIN_FINISH_PARSE_FUNCTION, /* After finishing parsing a function. */
>PLUGIN_PASS_MANAGER_SETUP,/* To hook into pass manager.  */
>PLUGIN_FINISH_TYPE,   /* After finishing parsing a type.  */
>PLUGIN_FINISH_DECL,   /* After finishing parsing a declaration. */
> diff --git a/gcc/plugin.c b/gcc/plugin.c
> index d924438..628833f 100644
> --- a/gcc/plugin.c
> +++ b/gcc/plugin.c
> @@ -441,6 +441,8 @@ register_callback (const char *plugin_name,
> return;
>   }
>/* Fall through.  */
> +  case PLUGIN_START_PARSE_FUNCTION:
> +  case PLUGIN_FINISH_PARSE_FUNCTION:
>case PLUGIN_FINISH_TYPE:
>case PLUGIN_FINISH_DECL:
>case PLUGIN_START_UNIT:
> @@ -519,6 +521,8 @@ invoke_plugin_callbacks_full (int event, void *gcc_data)
> gcc_assert (event >= PLUGIN_EVENT_FIRST_DYNAMIC);
> gcc_assert (event < event_last);
>/* Fall through.  */
> +  case PLUGIN_START_PARSE_FUNCTION:
> +  case PLUGIN_FINISH_PARSE_FUNCTION:
>case PLUGIN_FINISH_TYPE:
>case PLUGIN_FINISH_DECL:
>case PLUGIN_START_UNIT:
> diff --git a/gcc/plugin.def b/gcc/plugin.def
> index 98c988a..2a7e4c2 100644
> --- a/gcc/plugin.def
> +++ b/gcc/plugin.def
> @@ -17,6 +17,11 @@ You should have received a copy of the GNU General Public 
> License
>  along with GCC; see the file COPYING3.  If not see
>  <http://www.gnu.org/licenses/>.  */
>
> +/* Called before parsing the body of a function.  */
> +DEFEVENT (PLUGIN_START_PARSE_FUNCTION)
> +
> +/* After finishing parsing a function. */
> +DEFEVENT (PLUGIN_FINISH_PARSE_FUNCTION)
>
>  /* To hook into pass manager.  */
>  DEFEVENT (PLUGIN_PASS_MANAGER_SETUP)
> diff --git a/gcc/testsuite/g++.dg/plugin/def-plugin-test.C 
> b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C
> new file mode 100644
> index 000..b7f2d3d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C
> @@ -0,0 +1,13 @@
> +int global = 12;
> +
> +int function1(void);
> +
> +int function2(int a) // { dg-warning "Start fndef function2" }
> +{
> +  return function1() + a;
> +} //  { dg-warning "Finish fndef function2" }
> +
> +int function1(void) // { dg-warning "Start fndef function1" }
> +{
> +  return global + 1;
> +} //  { dg-warning "Finish fndef function1" }
> diff --git a/gcc/testsuite/g++.dg/plugin/def_plugin.c 
> b/gcc/testsuite/g++.dg/plugin/def_plugin.c
> new file mode 100644
> index 000..63983c5
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/plugin/def_plugin.c
> @@ -0,0 +1,45 @@
> +/* A plugin example that shows w

Re: [Patch, fortran, 64674, v2] [OOP] ICE in ASSOCIATE with class array

2015-05-19 Thread Andre Vehreschild
Hi,

so here is the update on pr 64674. Besides adapting to current trunk nothing
has changed from the previous version. The links for getting the patches this
one depends on are:

PR65548 v5: https://gcc.gnu.org/ml/fortran/2015-05/msg00123.html and
PR44672 v6: https://gcc.gnu.org/ml/fortran/2015-05/msg00124.html

Bootstraps and regtests fine on x86_64-linux-gnu/f21.

Any comments?

- Andre

On Mon, 4 May 2015 16:53:15 +0200
Andre Vehreschild  wrote:

> Hi all,
> 
> I would like to present a first patch for using class arrays in associate. Up
> to now gfortran crashed when a class array section/element was selected in an
> associate. This patch fixes this for class array sections as well as for
> single elements.
> 
> The story of the patch is told quite shortly: 
> 
> - parse.c::parse_associate() needs to gather more information about what the
>   target is like. Previously the target's rank and array_spec were not
>   computed, which disallowed the use of further array refs in the associate
>   body:
>     associate (vec => class_matrix(2:3, 2))
>       vec(1) = ... ! <- Unclassifiable statement, because no array_spec was
>                    !    attached to vec.
>   This is fixed by the second hunk of the patch.
> 
> - The third hunk in primary.c prevents setting the dimension attribute on a
>   class object's symbol.
> 
> - The hunks in resolve.c take care of adding dummy full array_refs and, in
>   resolve_assoc_var, correct the class type when the target expression's rank
>   is 0. Previously the symbol would have an array valued type when the
>   target's base type was array valued. But for a scalar target this needed
>   some polishing.
> 
> - Additionally a test was added.
> 
> Bootstraps and regtests ok on x86_64-linux-gnu/f21.
> 
> Ok for trunk?
> 
> Note, this patch was diffed from a trunk with my older patches for
> 
> PR65548, v3 https://gcc.gnu.org/ml/fortran/2015-04/msg00123.html and
> PR44672, v5 https://gcc.gnu.org/ml/fortran/2015-04/msg00124.html
> 
> applied.
> 
> Regards,
>   Andre


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


pr64787_v2.clog
Description: Binary data
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 786876c..455aa69 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -234,6 +234,9 @@ gfc_add_component_ref (gfc_expr *e, const char *name)
 }
   if (*tail != NULL && strcmp (name, "_data") == 0)
 next = *tail;
+  else
+/* Avoid losing memory.  */
+gfc_free_ref_list (*tail);
   (*tail) = gfc_get_ref();
   (*tail)->next = next;
   (*tail)->type = REF_COMPONENT;
@@ -2562,13 +2565,19 @@ find_intrinsic_vtab (gfc_typespec *ts)
 	  c->attr.access = ACCESS_PRIVATE;
 
 	  /* Build a minimal expression to make use of
-		 target-memory.c/gfc_element_size for 'size'.  */
+		 target-memory.c/gfc_element_size for 'size'.  Special handling
+		 for character arrays, that are not constant sized: to support
+		 len(str)*kind, only the kind information is stored in the
+		 vtab.  */
 	  e = gfc_get_expr ();
 	  e->ts = *ts;
 	  e->expr_type = EXPR_VARIABLE;
 	  c->initializer = gfc_get_int_expr (gfc_default_integer_kind,
 		 NULL,
-		 (int)gfc_element_size (e));
+		 ts->type == BT_CHARACTER
+		 && charlen == 0 ?
+		   ts->kind :
+		   (int)gfc_element_size (e));
 	  gfc_free_expr (e);
 
 	  /* Add component _extends.  */
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index f55c691..f4fa9c8 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3168,6 +3168,7 @@ void gfc_add_component_ref (gfc_expr *, const char *);
 void gfc_add_class_array_ref (gfc_expr *);
 #define gfc_add_data_component(e) gfc_add_component_ref(e,"_data")
 #define gfc_add_vptr_component(e) gfc_add_component_ref(e,"_vptr")
+#define gfc_add_len_component(e)  gfc_add_component_ref(e,"_len")
 #define gfc_add_hash_component(e) gfc_add_component_ref(e,"_hash")
 #define gfc_add_size_component(e) gfc_add_component_ref(e,"_size")
 #define gfc_add_def_init_component(e) gfc_add_component_ref(e,"_def_init")
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 54f8f4a..697a17a 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -4975,8 +4975,7 @@ static tree
 gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
 		 gfc_expr ** lower, gfc_expr ** upper, stmtblock_t * pblock,
 		 stmtblock_t * descriptor_block, tree * overflow,
-		 tree expr3_elem_size, tree *nelems, gfc_expr *expr3,
-		 gfc_typespec *ts)
+		 tree expr3_elem_size, tree *nelems, gfc_expr *expr3)
 {
   tree type;
   tree tmp;
@@ -5002,7 +5001,7 @@ gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
 
   /* Set the dtype.  */
   tmp = gfc_conv_descriptor_dtype (descriptor);
-  gfc_add_modify (descriptor_block, tmp, gfc_get_dtype (TREE_TYPE (descriptor)));
+  gfc_add_modify (descriptor_block, tmp, gfc_get_dtype (type));
 
   or_expr = boolean_false

Commit: MSP430: Enhance the zero_extendhisi2 pattern

2015-05-19 Thread Nick Clifton
Hi Guys,

  I am applying the patch below to enhance the zero_extendhisi2 pattern
  in the MSP430 backend so that it can cope with separate source and
  destination registers.  This makes zero extending into another
  register more efficient and it also helps to work around a reload bug
  reported in PR 66156.
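
  For readers less familiar with the pattern, the two alternatives can be
  modelled in plain C as below. This is only an illustrative sketch under
  the assumption that an SImode value lives in a pair of 16-bit words;
  struct si_pair and the function names are hypothetical and do not appear
  in the backend. The first function mirrors the existing "0" alternative
  (source already in the low word, so only the high word is cleared), the
  second mirrors the new "r" alternative (copy the source, then clear).

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit value held as two 16-bit words, the way an SImode value
   occupies a register pair on a 16-bit target.  Hypothetical type.  */
struct si_pair
{
  uint16_t lo;   /* corresponds to %L0 */
  uint16_t hi;   /* corresponds to %H0 */
};

/* Alternative 1: the source already sits in the low word of the
   destination, so only the high word needs clearing.  */
void
zero_extend_in_place (struct si_pair *dst)
{
  dst->hi = 0;
}

/* Alternative 2: the source is somewhere else, so copy it into the
   low word and then clear the high word.  */
void
zero_extend_from (struct si_pair *dst, uint16_t src)
{
  dst->lo = src;
  dst->hi = 0;
}

/* Reassemble the pair into a single 32-bit value for checking.  */
uint32_t
si_value (const struct si_pair *p)
{
  return ((uint32_t) p->hi << 16) | p->lo;
}
```

  Without the second alternative, the source first has to be moved into
  the destination by a separate instruction before the in-place form can
  be used.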

Cheers
  Nick

gcc/ChangeLog
2015-05-19  Nick Clifton  

PR target/66156
* config/msp430/msp430.md (zero_extendhisi2): Add support for
separate source and destination registers.

Index: gcc/config/msp430/msp430.md
===
--- gcc/config/msp430/msp430.md (revision 223348)
+++ gcc/config/msp430/msp430.md (working copy)
@@ -588,10 +588,12 @@
 ;; patterns.  Doing these manually allows for alternate optimization
 ;; paths.
 (define_insn "zero_extendhisi2"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm")
-   (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,r")
+   (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0,r")))]
   "msp430x"
-  "MOV.W\t#0,%H0"
+  "@
+  MOV.W\t#0,%H0
+  MOV.W\t%1,%L0 { MOV.W\t#0,%H0"
 )
 
 (define_insn "zero_extendhisipsi2"


Re: [PATCH] Handle multiple vector sizes in BB vectorization

2015-05-19 Thread Richard Biener
On Tue, 19 May 2015, Rainer Orth wrote:

> Richard Biener  writes:
> 
> > Well, not really - but at least don't fail vectorization because of that
> > but allow it to proceed the "build up from scalar pieces" path.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> 
> The testcase FAILs on Solaris/SPARC:
> 
> FAIL: gcc.dg/vect/bb-slp-35.c -flto -ffat-lto-objects  scan-tree-dump slp2 
> "basic block vectorized"
> FAIL: gcc.dg/vect/bb-slp-35.c scan-tree-dump slp2 "basic block vectorized"
> 
> The dump
> 
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-35.c:6:11: note: 
> not vectorized: unsupported unaligned store.*p_6(D)
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-35.c:6:11: note: 
> not vectorized: unsupported alignment in basic block.
> 
> suggests that the following adjustment is needed.  Tested on
> sparc-sun-solaris2.11 and x86_64-unknown-linux-gnu.
> 
> Ok for mainline?

Ok.

Thanks,
Richard.

>   Rainer
> 
> 
> 2015-05-19  Rainer Orth  
> 
>   * gcc.dg/vect/bb-slp-35.c: Adjust.


Re: [PATCH] Fix PR66168: ICE due to incorrect invariant register info

2015-05-19 Thread Steven Bosscher
On Tue, May 19, 2015 at 12:17 PM, Thomas Preud'homme wrote:
> 2015-05-18  Thomas Preud'homme
>
> PR rtl-optimization/66168
> * loop-invariant.c (move_invariant_reg): Set inv->reg to destination
> of inv->insn when moving an invariant without introducing a temporary
> register.

Not OK.
This will break in move_invariants() when it looks at REGNO (inv->reg).

Ciao!
Steven


Re: [PATCH][AArch64] PR target/65491: Classify V1TF vectors as AAPCS64 short vectors rather than composite types

2015-05-19 Thread James Greenhalgh
On Mon, Apr 20, 2015 at 11:16:02AM +0100, Kyrylo Tkachov wrote:
> Hi all,
> 
> The ICE in the PR happens when we pass a 1x(128-bit float) vector as an
> argument.
> The aarch64 backend erroneously classifies it as a composite type when in
> fact it
> is a short vector according to AAPCS64
> (section 4.1.2 from
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf).

Agreed.

> 
> The solution in this patch is to check aarch64_composite_type_p for a short
> vector with
> aarch64_short_vector_p rather than the other way around (check for
> aarch64_short_vector_p
> in aarch64_composite_type_p).

I think I understand what you are saying, but your patch does the
opposite (ADDS a check for aarch64_short_vector_p in
aarch64_composite_type_p, REMOVES a check for aarch64_composite_type_p,
in aarch64_short_vector_p)...

This logic is pretty hairy, and I'm struggling to convince myself that
your change only hits the bug you described above. I think I've worked
it through and it does, but if you can find any additional ABI tests
which stress the Vector/Floating-Point passing rules that would help
settle my nerves.

The patch is OK. I wouldn't think we would want to backport it to
release branches as there is no regression to fix.

Thanks,
James

> 2015-04-20  Kyrylo Tkachov  
> 
> PR target/65491
> * config/aarch64/aarch64.c (aarch64_short_vector_p): Move above
> aarch64_composite_type_p.  Remove check for aarch64_composite_type_p.
> (aarch64_composite_type_p): Return false if given type and mode are
> for a short vector.
> 
> 2015-04-20  Kyrylo Tkachov  
> 
> PR target/65491
> * gcc.target/aarch64/pr65491_1.c: New test.
> * gcc.target/aarch64/aapcs64/type-def.h (vlf1_t): New typedef.
> * gcc.target/aarch64/aapcs64/func-ret-1.c: Add test for vlf1_t.



Re: [PATCH] Handle multiple vector sizes in BB vectorization

2015-05-19 Thread Rainer Orth
Richard Biener  writes:

> Well, not really - but at least don't fail vectorization because of that
> but allow it to proceed the "build up from scalar pieces" path.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

The testcase FAILs on Solaris/SPARC:

FAIL: gcc.dg/vect/bb-slp-35.c -flto -ffat-lto-objects  scan-tree-dump slp2 
"basic block vectorized"
FAIL: gcc.dg/vect/bb-slp-35.c scan-tree-dump slp2 "basic block vectorized"

The dump

/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-35.c:6:11: note: 
not vectorized: unsupported unaligned store.*p_6(D)
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-35.c:6:11: note: 
not vectorized: unsupported alignment in basic block.

suggests that the following adjustment is needed.  Tested on
sparc-sun-solaris2.11 and x86_64-unknown-linux-gnu.

Ok for mainline?

Rainer


2015-05-19  Rainer Orth  

* gcc.dg/vect/bb-slp-35.c: Adjust.

# HG changeset patch
# Parent 7e4562f46f5c81f1894e9efc36a5f6bd409b5a41
Fix gcc.dg/vect/bb-slp-35.c on SPARC

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-35.c b/gcc/testsuite/gcc.dg/vect/bb-slp-35.c
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-35.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-35.c
@@ -9,5 +9,5 @@ void foo (int * __restrict__ p, short * 
   p[3] = q[3] + 1;
 }
 
-/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" } } */
+/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target vect_hw_misalign } } } */
 /* { dg-final { cleanup-tree-dump "slp2" } } */

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Fix PR48052: loop not vectorized if index is "unsigned int"

2015-05-19 Thread Bin.Cheng
On Wed, May 6, 2015 at 7:02 PM, Richard Biener
 wrote:
> On Mon, May 4, 2015 at 9:47 PM, Abderrazek Zaafrani
>  wrote:
>> This is an old thread and we are still running into similar issues:
>> Code is not being vectorized on 64-bit target due to scev not being
>> able to optimally analyze overflow condition.
>>
>> While the original test case shown here seems to work now, it does not
>> work if the start value is not a constant and the loop index variable
>> is of unsigned type: Ex
>>
>> void loop2( double const * __restrict__ x_in, double * __restrict__
>> x_out, double const * __restrict__ c, unsigned int N, unsigned int
>> start) {
>>  for(unsigned int i=start; i!=N; ++i)
>>x_out[i] = c[i]*x_in[i];
>> }
>>
>> Here is our unit test:
>>
>> int foo(int* A, int* B, unsigned start, unsigned B)
>> {
>>   int s;
>>   for (unsigned k = start; k > s += A[k] * B[k];
>>   return s;
>> }
>>
>> Our unit test case is extracted from a matrix multiply of a
>> two-dimensional array and all loops are blocked by hand by a factor of
>> B. Even though a bit modified, above loop corresponds to the innermost
>> loop of the blocked matrix multiply.
>>
>> We worked on patch to solve the problem (see attachment.)
>> The attached patch passed bootstrap and make check on x86_64-linux.
>> Ok for trunk?
>
> Apart from coding style / API issues the case you handle is very special
> (IVs with step 1 only?!) I believe it is also wrong - the assumption that
> if there is a symbolic or constant expression for the number of iterations
> a BIV will not wrap is not true.  niter analysis can very well compute
> the number of iterations for a loop with wrapping IVs.  For your unit test
> this only works because of the special-casing of step 1 IVs.
I happen to be looking into a similar issue right now.  scev_probably_wraps_p
and thus chrec_convert_1 should be improved using niter information.
Actually, all the information (including the wrap behavior) has already been
computed in tree-ssa-loop-niter.c.  We just need to find a way to use
it.
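
As a small illustration of the point quoted above, that a trip count can be
computed exactly even when the IV wraps, here is a sketch in plain C. It is
not GCC code; trip_count, index_wraps and trip_count_by_running are
hypothetical helper names, and the loop shape "for (i = start; i != n; ++i)"
with a 32-bit unsigned index is taken from the loop2 example in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Trip count of "for (i = start; i != n; ++i)" with a 32-bit unsigned i.
   Unsigned subtraction is modulo 2^32, so the result is exact even when
   i wraps past UINT32_MAX on the way to n.  */
uint32_t
trip_count (uint32_t start, uint32_t n)
{
  return n - start;
}

/* Returns 1 if i wraps somewhere during that loop, i.e. if the widened
   index (uint64_t) i is not simply start, start + 1, start + 2, ...  */
int
index_wraps (uint32_t start, uint32_t n)
{
  return n < start;
}

/* Reference implementation: actually run the loop and count.  */
uint32_t
trip_count_by_running (uint32_t start, uint32_t n)
{
  uint32_t count = 0;
  for (uint32_t i = start; i != n; ++i)
    count++;
  return count;
}
```

So knowing the number of iterations alone does not show that the index is
free of wrapping; that has to be established separately, for example from
the relation between the start and end values.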

>
> Technically it might be more interesting to compute wrapping of IVs
> during niter analysis in some more generic way (we have iv->no_overflow
> computed by simple_iv, but that is rather not useful here).

As for iv->no_overflow, it is computed in simple_iv as below:
  tmp = analyze_scalar_evolution (use_loop, ev);
  ev = resolve_mixers (use_loop, tmp);

  if (folded_casts && tmp != ev)
*folded_casts = true;

It's inaccurate because calling resolve_mixers doesn't mean the resulting
scev will wrap.  resolve_mixers could have just done exactly the same
transformation as instantiate_parameters.  Also,
chrec_convert_aggressive is incomplete and needs to be revised too.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> Abderrazek Zaafrani


Re: [gomp4] Lack of OpenACC NVPTX devices is not an error during scanning

2015-05-19 Thread Jakub Jelinek
On Tue, May 19, 2015 at 11:36:58AM +0100, Julian Brown wrote:
> This patch fixes an oversight whereby if the CUDA libraries are
> available for some reason on a system that doesn't actually contain an
> nVidia card, an OpenACC program will raise an error if the NVPTX
> backend is picked as a default instead of falling back to some other
> device instead.
> 
> OK for gomp4 branch? For trunk?
> 
> Thanks,
> 
> Julian
> 
> ChangeLog
> 
> libgomp/
> * plugin/plugin-nvptx.c (nvptx_get_num_devices): Return zero
> on cuInit failure.

LGTM.

Jakub


[gomp4] Lack of OpenACC NVPTX devices is not an error during scanning

2015-05-19 Thread Julian Brown
Hi,

This patch fixes an oversight whereby if the CUDA libraries are
available for some reason on a system that doesn't actually contain an
nVidia card, an OpenACC program will raise an error if the NVPTX
backend is picked as a default instead of falling back to some other
device instead.
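
The intended behaviour can be sketched with a mock driver as below. This is
a hedged illustration only: enum drv_status, struct fake_driver and
scan_devices are hypothetical stand-ins and not the CUDA driver API or the
libgomp plugin interface. It shows the scan treating a failed
initialisation as "no devices of this kind" instead of as an error.

```c
#include <assert.h>

/* Stand-ins for a driver API; hypothetical, not the CUDA interface.  */
enum drv_status { DRV_OK = 0, DRV_NO_DEVICE = 1 };

struct fake_driver
{
  enum drv_status init_result;  /* what initialisation reports */
  int device_count;             /* devices visible after a good init */
};

/* Device scan: a failed initialisation means "zero devices here",
   so the caller is free to fall back to another device type.  */
int
scan_devices (const struct fake_driver *drv)
{
  if (drv->init_result != DRV_OK)
    return 0;
  return drv->device_count;
}
```

A caller that sees zero from the scan can then try the next device type
instead of aborting the program.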

OK for gomp4 branch? For trunk?

Thanks,

Julian

ChangeLog

libgomp/
* plugin/plugin-nvptx.c (nvptx_get_num_devices): Return zero
on cuInit failure.

commit 696a0d7e22bb8217ff581886cdf0979bfc2e85bb
Author: Julian Brown 
Date:   Fri May 15 03:22:56 2015 -0700

Lack of PTX devices is not an error during scanning.

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b36691a..d09a91c 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -781,7 +781,13 @@ nvptx_get_num_devices (void)
  until cuInit has been called.  Just call it now (but don't yet do any
  further initialization).  */
   if (instantiated_devices == 0)
-cuInit (0);
+{
+  r = cuInit (0);
+  /* This is not an error: e.g. we may have CUDA libraries installed but
+ no devices available.  */
+  if (r != CUDA_SUCCESS)
+return 0;
+}
 
   r = cuDeviceGetCount (&n);
   if (r!= CUDA_SUCCESS)


[gomp4] Add OpenACC vector-single/vector-partitioned tests

2015-05-19 Thread Julian Brown
Hi,

This patch adds several tests of vector-single/vector-partitioned mode,
as part of work implementing the OpenACC execution model.

Pre-approved for gomp4 branch. I will apply there shortly.

Thanks,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c-c++-common/vec-single-{1,2,3,4,5,6}.c:
New tests.
* testsuite/libgomp.oacc-c-c++-common/vec-partn-{1,2,3,4,5,6}.c:
New tests.
commit b2bb572cef2b6b0984d65995e070dc424b03a525
Author: jbrown 
Date:   Mon May 11 16:04:48 2015 +

Add vector-single/vector-partitioned tests.

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c
new file mode 100644
index 000..b21e588
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c
@@ -0,0 +1,30 @@
+#include <assert.h>
+
+/* Test basic vector-partitioned mode transitions.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n = 0, arr[32], i;
+
+  for (i = 0; i < 32; i++)
+arr[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(1) num_workers(1) \
+		   vector_length(32)
+  {
+int j;
+n++;
+#pragma acc loop vector
+for (j = 0; j < 32; j++)
+  arr[j]++;
+n++;
+  }
+
+  assert (n == 2);
+
+  for (i = 0; i < 32; i++)
+assert (arr[i] == 1);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c
new file mode 100644
index 000..1ff222d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c
@@ -0,0 +1,43 @@
+#include <assert.h>
+
+/* Test vector-partitioned, gang-partitioned mode.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n[32], arr[1024], i;
+  
+  for (i = 0; i < 1024; i++)
+arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
+		   vector_length(32)
+  {
+int j, k;
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+
+#pragma acc loop gang
+for (j = 0; j < 32; j++)
+  #pragma acc loop vector
+  for (k = 0; k < 32; k++)
+	arr[j * 32 + k]++;
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+assert (arr[i] == 1);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c
new file mode 100644
index 000..7908d4c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c
@@ -0,0 +1,54 @@
+#include <assert.h>
+
+/* Test conditional vector-partitioned loops.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
+		   vector_length(32)
+  {
+int j, k;
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+
+#pragma acc loop gang
+for (j = 0; j < 32; j++)
+  {
+	if ((j % 2) == 0)
+	  {
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  arr[j * 32 + k]++;
+	  }
+	else
+	  {
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  arr[j * 32 + k]--;
+	  }
+  }
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+assert (arr[i] == ((i % 64) < 32 ? 1 : -1));
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c
new file mode 100644
index 000..4ea3bf2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c
@@ -0,0 +1,46 @@
+#include <assert.h>
+
+/* Test conditions inside vector-partitioned loops.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+arr[i] = i;
+
+  for (i = 0; i < 32; i++)
+n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
+		   vector_length(32)
+  {
+int j, k;
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+
+#pragma acc loop gang
+for (j = 0; j < 32; j++)
+  {
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  if ((arr[j * 32 + k] % 2) != 0)
+	arr[j * 32 + k] *= 2;
+  }
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+assert (arr[i] == ((i % 2) == 0 ? i : i * 2));
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-5.c
new file mode 100644
index 000..86b742a
--- /dev/null
+++ b/libgomp/testsuite/li
