Re: [patch,avr] minor tweaks for 8-bit operations

2016-07-13 Thread Senthil Kumar Selvaraj

Georg-Johann Lay writes:

> This patch contains some unrelated tweaks
>
> - Supplying no-ldregs variant for andqi3, iorqi3 where a const_int mask 
> affects 
> only 1 bit
>
> - Some patterns that match situations with zero_extend that can be performed 
> with less instructions / register pressure.

>From my (admittedly limited) attempts at code size benchmarking, this
one will help a lot I think, considering ints are used everywhere and
they end up always taking 2 regs, even when one would do. Do you have any 
numbers on the code size improvement this provides?

Regards
Senthil
>
> - comparing HI against -1
>
> Ok for trunk?
>
> Johann
>
>
> gcc/
>   Minor tweaks for QImode.
>
>   * config/avr/predicates.md (const_m255_to_m1_operand): New.
>   * config/avr/constraints.md (Cn8, Ca1, Co1, Yx2): New constraints.
>   * config/avr/avr.md (add3) : Make "r,0,r" more
>   expensive.
>   (*cmphi.zero-extend.0, *cmphi.zero-extend.1)
>   (*usum_widenqihi3, *udiff_widenqihi3)
>   (*addhi3_zero_extend.const): New combiner insns.
>   (andqi3, iorqi3): Provide "l" (NO_LD_REGS) alternative if
>   just 1 bit is affected.
>   * config/avr/avr.c (avr_out_bitop) : Don't access xop[3].
>   (avr_out_compare) [EQ,NE]: Tweak comparing d-regs against -1.



Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-13 Thread Alan Modra
On Thu, Jul 14, 2016 at 12:14:10PM +0930, Alan Modra wrote:
> (Which is a bug, since the advent of multi-char constraints, and
> potentially affects us with our use of constraint strings like "?*wb".)

On looking into this, I see it is just a documentation bug.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-13 Thread Alan Modra
On Wed, Jul 13, 2016 at 06:46:22AM -0500, Segher Boessenkool wrote:
> On Wed, Jul 13, 2016 at 02:57:01PM +0930, Alan Modra wrote:
> > This is what I've bootstrapped and regression tested on
> > powerpc64le-linux.  I'm using Peter's testcases from this thread
> > rather than the one in the original patch submission, because that one
> > relies on -O0 not reducing the function down to a nop.  OK to apply?
> 
> This is okay.  Does it need a backport?  Okay for 6 as well, then.

Yes, gcc-6 needs the patch too.

> We all agreed the question marks would be a good idea, but you say it
> causes some testcases to fail.  Could you investigate please?

After much head-scratching (danger of splinters) I noticed that I'd
written '*?r' in the patch rather than '?*r'.  That explained failures
like gcc.target/powerpc/ppc-vector-memset.c using gprs rather than vrs
for the memset expansion.  According to md.text, '*' says the
following char should be ignored when choosing register preferences.
(Which is a bug, since the advent of multi-char constraints, and
potentially affects us with our use of constraint strings like "?*wb".)
Anyway, what was ignored for reg allocation was the '?', not 'r'.

Also, '*Y' is a bit pointless since 'Y' isn't a register constraint.
The '*' belongs on the corresponding operand 1 'r'.

I also saw ICEs in rs6000_split_multireg_move on a number of
gcc.dg/vmx testcases, but the ICEs disappeared with the constraints
fixed.  I can't give you the exact rtl involved now since my debug
session had a connection timeout, but it was this assert:
  gcc_assert (GET_CODE (XEXP (dst, 0)) == PLUS
  && REG_P (basereg)
  && REG_P (offsetreg)
  && REGNO (basereg) != REGNO (offsetreg));
and we had an altivec MEM with an AND address for the destination of a
SET with gprs for the source.  Probably quite easily fixed by
stripping off the AND.

The following has now been bootstrapped and regression tested on
powerpc64le-linux.  OK for mainline?

* gcc/config/rs6000/altivec.md (altivec_mov): Disparage
gpr alternatives.  Correct '*' placement on Y,r alternative.
Add '*' on operand 1 of r,r alternative.

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index aa01ac9..9193f07 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -222,8 +222,8 @@
 
 ;; Vector move instructions.
 (define_insn "*altivec_mov"
-  [(set (match_operand:VM2 0 "nonimmediate_operand" "=Z,v,v,*Y,*r,*r,v,v,*r")
-   (match_operand:VM2 1 "input_operand" "v,Z,v,r,Y,r,j,W,W"))]
+  [(set (match_operand:VM2 0 "nonimmediate_operand" 
"=Z,v,v,?Y,?*r,?*r,v,v,?*r")
+   (match_operand:VM2 1 "input_operand" "v,Z,v,*r,Y,*r,j,W,W"))]
   "VECTOR_MEM_ALTIVEC_P (mode)
&& (register_operand (operands[0], mode) 
|| register_operand (operands[1], mode))"

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-07-13 Thread Manuel López-Ibáñez

On 01/07/16 19:15, Martin Sebor wrote:
+ /* Differentiate between an exact and inexact buffer overflow
+or truncation.  */
+ const char *fmtstr;
+ if (res->number_chars < 0)
+   fmtstr = info->bounded
+ ? "output may be truncated at or before format character "
+   "%qc at offset %qlu past the end of a region of size %qlu"
+ : "writing format character %qc at offset %qlu "
+   "in a region of size %qlu";
+ else
+   fmtstr = info->bounded
+ ? "output truncated at format character %qc at offset %qlu "
+   "just past the end of a region of size %qlu"
+ : "writing format character %qc at offset %qlu "
+   "just past the end of a region of size %qlu";
+ warning_at (loc, OPT_Wformat_length_, fmtstr,
+ format_chars [-1], off - 1,
+ (unsigned long)info->objsize);
+   }


I'm not sure gettext can parse the text of format strings given like this. It 
may be smarter enough if the conditional expression is directly the argument to 
warning_at. GCC's -Wformat has the same limitations. Of course, the fool-proof 
way is to use multiple calls to warning_at.


Cheers,
Manuel.





Re: [v3 PATCH] Implement P0032R3, Homogeneous interface for variant, any and optional, for the parts concerning any and optional.

2016-07-13 Thread Ville Voutilainen
On 14 July 2016 at 02:23, Ville Voutilainen  wrote:
> On 14 July 2016 at 02:14, Ville Voutilainen  
> wrote:
>> * include/std/any (_Storage()): Make constexpr and have it
>> initialize _M_ptr.
>
>
> This bit had a stale left-over comment in it. Removed in the updated patch.

Also, the in_place(__in_place*) needs to be inline. Another updated
patch attached.


P0032R3_3.diff.gz
Description: GNU Zip compressed data


Re: [PATCH] Add missing OBJCOPY variable to Makefile.in

2016-07-13 Thread JonY
On 7/14/2016 06:22, Jeff Law wrote:
> On 07/03/2016 05:56 AM, JonY wrote:
>> This patch allows OBJCOPY to be set by configure. It was missing in
>> Makefile.in.
>>
>> Patch OK?
> With a ChangeLog and verification that some host/target combination
> still builds this is OK.
> 
> jeff
> 
> 


Hi Jeff,

I just realized a similar change is required in Makefile.tpl, how do you
regenerate Makefile.in?




signature.asc
Description: OpenPGP digital signature


Re: [v3 PATCH] Implement P0032R3, Homogeneous interface for variant, any and optional, for the parts concerning any and optional.

2016-07-13 Thread Ville Voutilainen
On 14 July 2016 at 02:14, Ville Voutilainen  wrote:
> * include/std/any (_Storage()): Make constexpr and have it
> initialize _M_ptr.


This bit had a stale left-over comment in it. Removed in the updated patch.


P0032R3_2.diff.gz
Description: GNU Zip compressed data


[v3 PATCH] Implement P0032R3, Homogeneous interface for variant, any and optional, for the parts concerning any and optional.

2016-07-13 Thread Ville Voutilainen
Tested on Linux-x64.

2016-07-14  Ville Voutilainen  

Implement P0032R3, Homogeneous interface for variant, any and optional,
for the parts concerning any and optional.
* include/std/any (_Storage()): Make constexpr and have it
initialize _M_ptr.
(any()): Make constexpr.
(any(const any&)): Adjust.
(any(any&&)): Likewise.
(__any_constructible_t): New.
(any(_ValueType&&)): Constrain.
(any(in_place_type_t<_Tp>, _Args&&...)): New.
(any(in_place_type_t<_Tp>, initializer_list<_Up>, _Args&&...)):
Likewise.
(~any()): Adjust.
(operator=(const any&)): Likewise.
(operator=(any&&)): Likewise.
(operator=(_ValueType&&)): Constrain.
(emplace(_Args&&...)): New.
(emplace(initializer_list<_Up>, _Args&&...)): Likewise.
(clear()): Remove.
(reset()): New.
(swap(any&)): Adjust.
(empty()): Remove.
(has_value()): New.
(type()): Adjust.
(_Manager_internal::_S_create(_Storage&, _Args&&...)): New.
(_Manager_external::_S_create(_Storage&, _Args&&...)): Likewise.
(make_any(_Args&&...)): Likewise.
(make_any(initializer_list<_Up>, _Args&&...)): Likewise.
* include/std/optional (in_place_t, in_place): Remove.
(bad_optional_access): Add a comment referring to LEWG 72.
(emplace(_Args&&...)): Constrain.
(has_value()): New.
(reset()): Likewise.
(make_optional(_Args&&...)): Likewise.
(make_optional(initializer_list<_Up>, _Args&&...)): Likewise.
* include/std/utility (in_place_tag): New.
(__in_place, __in_place_type, __in_place_index): Likewise.
(in_place_t, in_place_type_t, in_place_index_t): Likewise.
(in_place(__in_place*)): Likewise.
(in_place(__in_place_type<_Tp>*)): Likewise.
(in_place(__in_place_index<_Idx>*)): Likewise.
* testsuite/20_util/any/assign/1.cc: Adjust.
* testsuite/20_util/any/assign/emplace.cc: New.
* testsuite/20_util/any/assign/self.cc: Adjust.
* testsuite/20_util/any/cons/1.cc: Likewise.
* testsuite/20_util/any/cons/in_place.cc: New.
* testsuite/20_util/any/make_any.cc: Likewise.
* testsuite/20_util/any/misc/any_cast_neg.cc: Adjust.
* testsuite/20_util/any/misc/swap.cc: Likewise.
* testsuite/20_util/any/modifiers/1.cc: Likewise.
* testsuite/20_util/any/requirements.cc: New.
* testsuite/20_util/in_place/requirements.cc: Likewise.
* testsuite/20_util/optional/constexpr/in_place.cc: Adjust.
* testsuite/20_util/optional/in_place.cc: Likewise.
* testsuite/20_util/optional/make_optional.cc: Add tests for
the new overloads of make_optional.


P0032R3.diff.gz
Description: GNU Zip compressed data


Re: Implement -Wswitch-fallthrough: rs6000

2016-07-13 Thread Marek Polacek
On Wed, Jul 13, 2016 at 03:35:10PM -0700, Bruce Korb wrote:
> Actually, it occurs to me:
> 
> On Wed, Jul 13, 2016 at 11:23 AM, Marek Polacek  wrote:
> > My current implementation warns here, but the warning can be suppressed
> > by adding /* FALLTHRU */ or [...]
> 
> that the traditional "lint-ean" spelling is "/* FALLTHROUGH */", so
> why would the abbrev be accepted and the full spelling not?  I think
> "FALLTHRU" was added a mere 20 years ago and I go back farther than
> that.  (40+, actually).

We plan to support wide range of such "falls through" comments, see
the list here .
We're hoping this will be enough.

Thanks,

Marek


Re: Implement -Wswitch-fallthrough: rs6000

2016-07-13 Thread Bruce Korb
Actually, it occurs to me:

On Wed, Jul 13, 2016 at 11:23 AM, Marek Polacek  wrote:
> My current implementation warns here, but the warning can be suppressed
> by adding /* FALLTHRU */ or [...]

that the traditional "lint-ean" spelling is "/* FALLTHROUGH */", so
why would the abbrev be accepted and the full spelling not?  I think
"FALLTHRU" was added a mere 20 years ago and I go back farther than
that.  (40+, actually).


Re: [PATCH] correct atomic_compare_exchange_n return type (c++/71675)

2016-07-13 Thread Jeff Law

On 06/28/2016 09:56 AM, Martin Sebor wrote:

c++/71675 - __atomic_compare_exchange_n returns wrong type for
typed enum

__atomic_compare_exchange_n is documented to return bool but when
its operands are of one of the character types or derived from
it (such as a class enum in C++) it converts the bool result to
that type.  The attached patch corrects that to prevent this
conversion, analogously to what's already being done for
__sync_bool_compare_and_swap.

Martin

gcc-71675.diff


PR c++/71675 - __atomic_compare_exchange_n returns wrong type for typed enum

gcc/c-family/ChangeLog:
2016-06-28  Martin Sebor  

PR c++/71675
* c-common.c (resolve_overloaded_builtin): Avoid converting
__atomic_compare_exchange_n return type to that of what its
first argument points to.

gcc/testsuite/ChangeLog:
2016-06-28  Martin Sebor  

PR c++/71675
* g++.dg/ext/atomic-3.C: New test.
* gcc.dg/atomic/pr71675.c: New test.

OK.  Thanks.
jeff


Re: [PATCH] Add missing OBJCOPY variable to Makefile.in

2016-07-13 Thread Jeff Law

On 07/03/2016 05:56 AM, JonY wrote:

This patch allows OBJCOPY to be set by configure. It was missing in
Makefile.in.

Patch OK?
With a ChangeLog and verification that some host/target combination 
still builds this is OK.


jeff



Re: [PATCH 2/2] gcc/genrecog: Don't warn for missing mode on special predicates

2016-07-13 Thread Jeff Law

On 07/06/2016 01:42 PM, Andrew Burgess wrote:

* Richard Sandiford  [2016-07-04 09:47:20 +0100]:

Thanks for removing the duplicated error check for unknown predicates.
I think that error gets reported later though, so we should check for
null here:

  return pred && pred->special;

OK with that change, thanks.


Richard,

Thanks for the continued reviews.  I don't have GCC write access, so I
wonder if you would be willing to commit this patch for me please.

There's an updated version below that includes the latest change you
suggested.

Many thanks,
Andrew

---

[PATCH] gcc/genrecog: Don't warn for missing mode on special predicates

In md.texi it says:

  Predicates written with @code{define_special_predicate} do not get any
  automatic mode checks, and are treated as having special mode handling
  by @command{genrecog}.

In genrecog, when validating a SET pattern, there is already a special
case for 'address_operand' which is a special predicate, however,
other special predicates fall through to the code which checks for
incorrect use of VOIDmode.

This commit adds a new function for detecting special predicates, and
then generalises the check in validate_pattern so that mode checking
is skipped for all special predicates.

gcc/ChangeLog:

* genrecog.c (special_predicate_operand_p): New function.
(predicate_name): Move function.
(validate_pattern): Don't warn about missing mode for all
define_special_predicate predicates.

Committed.  THanks,

jeff



Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue

2016-07-13 Thread Jeff Law

On 07/11/2016 05:44 AM, Dominik Vogt wrote:

On Thu, Jul 07, 2016 at 12:57:16PM +0100, Dominik Vogt wrote:

On Wed, Jul 06, 2016 at 02:01:06PM +0200, Bernd Schmidt wrote:

There's one thing I don't quite understand and which seems to have
changed since v1:

On 07/04/2016 02:19 PM, Dominik Vogt wrote:

@@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct 
stack_vars_data *data)

  /* If there were any, allocate space.  */
  if (large_size > 0)
-   large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-  large_align, true);
+   {
+ large_allocsize = GEN_INT (large_size);
+ get_dynamic_stack_size (_allocsize, 0, large_align, NULL);
+   }
}

  for (si = 0; si < n; ++si)
@@ -1186,6 +1190,19 @@ expand_stack_vars (bool (*pred) (size_t), struct 
stack_vars_data *data)
  /* Large alignment is only processed in the last pass.  */
  if (pred)
continue;
+
+ if (large_allocsize && ! large_allocation_done)
+   {
+ /* Allocate space the virtual stack vars area in the
+prologue.  */
+ HOST_WIDE_INT loffset;
+
+ loffset = alloc_stack_frame_space
+   (INTVAL (large_allocsize),
+PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
+ large_base = get_dynamic_stack_base (loffset, large_align);
+ large_allocation_done = true;
+   }
  gcc_assert (large_base != NULL);



Why is this code split between the two places here?


This is a bugfix from v1 to v2.
I think I had to move this code to the second loop so that the
space for dynamically aligned variables is allocated at the "end"
of the space for stack variables.  I cannot remember what the bug
was, but maybe it was that the variables with fixed and static
alignment ended up at the same address.

Anyway, now that the new allocation strategy is used
unconditionally, it should be possible to move the
get_dynamic_stack_size call down to the second loop and simplify
the code somewhat.  I'll look into that.


Version 5 with some code moved from the first loop to the second.

-ENOPATCH
jeff



Re: [PATCH 2/2] C++ FE: handle misspelled identifiers and typenames

2016-07-13 Thread Jeff Law

On 06/30/2016 12:53 PM, David Malcolm wrote:

This is a port of the C frontend's r237714 [1] to the C++ frontend:
  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01052.html
offering spelling suggestions for misspelled identifiers, macro names,
and some keywords (e.g. "singed" vs "signed" aka PR c/70339).

Unlike the C frontend, there doesn't seem to be an easy way to
distinguish between cases where we're expecting a typename vs
a variable name, so some of the logic is a little different.

Examples of suggestions can be seen in the test case.

Successfully bootstrapped on x86_64-pc-linux-gnu
(in combination with the prior patch); adds 240 PASS results to
g++.sum.

OK for trunk?

Dave

[1] aka 8469aece13814deddf2cd80538d33c2d0a8d60d9 in the git mirror

gcc/c/ChangeLog:
PR c/70339
* c-decl.c (struct edit_distance_traits): Move to
spellcheck-tree.h
(best_macro_match): Likewise, converting from a typedef to a
subclass.
(find_closest_macro_cpp_cb): Move to spellcheck-tree.c.
(lookup_name_fuzzy): Update for change of best_macro_match to a
subclass with a ctor that calls cpp_forall_identifiers.

These are fine.



gcc/cp/ChangeLog:
PR c/70339
* name-lookup.c: Include gcc-rich-location.h, spellcheck-tree.h,
and parser.h.
(suggest_alternatives_for): If no candidates are found, try
lookup_name_fuzzy and report if if finds a suggestion.
(consider_binding_level): New function.
(consider_binding_levels): New function.
(lookup_name_fuzzy) New function.
* parser.c: Include gcc-rich-location.h.
(cp_lexer_next_token_is_decl_specifier_keyword): Move most of
logic into...
(cp_keyword_starts_decl_specifier_p): ...this new function.
(cp_parser_diagnose_invalid_type_name): When issuing
"does not name a type" errors, attempt to make a suggestion using
lookup_name_fuzzy.
* parser.h (cp_keyword_starts_decl_specifier_p): New prototype.
* search.c (lookup_field_fuzzy_info::fuzzy_lookup_field): Don't
attempt to access TYPE_FIELDS within a TYPE_PACK_EXPANSION.

Going to let Jason on this part.



gcc/ChangeLog:
PR c/70339
* diagnostic-show-locus.c (diagnostic_show_locus): If this is the
same location as last time, don't skip if we have fix-it hints.
Clarify the skipping logic by converting it from one "if" clause
to repeated "if" clauses.
* spellcheck-tree.c: Include "cpplib.h".
(find_closest_macro_cpp_cb): Move here from c/c-decl.c.
(best_macro_match::best_macro_match): New constructor.
* spellcheck-tree.h (struct edit_distance_traits):
Move here from c/c-decl.c.
(class best_macro_match): Move here from c/c-decl.c, converting
from a typedef to a subclass, gaining a ctor.

OK.



gcc/testsuite/ChangeLog:
PR c/70339
* g++.dg/spellcheck-identifiers.C: New test case, based on
gcc.dg/spellcheck-identifiers.c.
* g++.dg/spellcheck-typenames.C: New test case, based on
gcc.dg/spellcheck-typenames.c

OK


jeff



Re: [PATCH 1/2] C: convert return type of lookup_name_fuzzy from tree to const char *

2016-07-13 Thread Jeff Law

On 06/30/2016 12:53 PM, David Malcolm wrote:

The followup patch implements lookup_name_fuzzy for the C++ frontend.
It's cleaner for that implementation to return a const char *, so this
patch updates the implementation in the C frontend to have the same
return type.

Successfully bootstrapped on x86_64-pc-linux-gnu
(in combination with the followup patch).

OK for trunk?

gcc/c-family/ChangeLog:
* c-common.h (lookup_name_fuzzy): Convert return type from tree to
const char *.

gcc/c/ChangeLog:
* c-decl.c (implicit_decl_warning): Update for conversion of
return type of lookup_name_fuzzy to const char *.
(undeclared_variable): Likewise.
(lookup_name_fuzzy): Convert return type from tree to
const char *.
* c-parser.c (c_parser_declaration_or_fndef): Update for
conversion of return type of lookup_name_fuzzy to const char *.
(c_parser_parameter_declaration): Likewise.

gcc/ChangeLog:
* gcc-rich-location.c
(gcc_rich_location::add_fixit_misspelled_id): New overload, taking
a const char *.
* gcc-rich-location.h
(gcc_rich_location::add_fixit_misspelled_id): Likewise.

OK.
jeff



Re: Problem in cxx_fundamental_alignment_p?

2016-07-13 Thread Jeff Law

On 07/01/2016 03:56 AM, Bernd Schmidt wrote:

On 07/01/2016 10:34 AM, Dodji Seketeli wrote:


The patch below was bootstrapped and tested on x86_64-linux, without
issues,


The patch looks good to me, thanks.


Alright. I think we need a C frontend maintainer or maybe Jason to
approve it. Attaching again.


but I'm not convinced this code is covered by any testcase.


Hmhm, looking at the test cases, accompanying the initial patch that
introduced that code, I agree.  I guess we should probably add a test
case that
uses [[gnu::aligned (val)]], with 'val' being an alignment that is
greater than MAX (TYPE_ALIGN_UNIT (long_long_integer_type_node),
  TYPE_ALIGN_UNIT (long_double_type_node))
in the g++.dg/cpp0x/gen-attrs-*.C series of tests?


I really don't know what this code is even doing, so I'll leave that up
to you...


Bernd


fundamental-align.diff


* c-common.c (cxx_fundamental_alignment_p): Use TYPE_ALIGN_UNIT,
not TYPE_ALIGN.

FWIW, this patch is OK.  It'd be even better if we had a testcase.

A patch making cxx_fundamental_alignment_p static, removing its 
protototype from c-common.h is pre-approved as AFAICT 
cxx_fundamental_alignment_p is not used outside c-common.c


jeff



Re: [PATCH 3/6] Add dg-final-scan-autofdo and dg-final-scan-not-autofdo

2016-07-13 Thread Andi Kleen
Jeff Law  writes:

> On 06/26/2016 07:50 PM, Andi Kleen wrote:
>> From: Andi Kleen 
>>
>> Autofdo outputs to different dump files and doesn't support some
>> transformation that normal profiling. Add dg-final-scan-autofdo
>> and dg-final-scan-not-autofdo statements to the test suite
>> so that the test cases can hande those cases separately.
> Which seems to match my assertion that value-prof does things that
> auto-prof does not.

This is right -- autofdo does not do any value profiling
(although it could in theory, and there is at least one paper about it)

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


RE: [PATCH] Fix FFI return type for proxy classes

2016-07-13 Thread Matthew Fortune
Tom Tromey  writes:
> > "Matthew" == Matthew Fortune  writes:
> 
> Matthew> Tested on: x86_64-pc-linux-gnu (default and -m32), mips-linux-gnu
> Matthew> mipsel-linux-gnuabi64 with no regressions. The new test only failed
> Matthew> on mips-linux-gnu prior to patching libjava.
> 
> Matthew> libjava/
> Matthew>  * java/lang/reflect/natVMProxy.cc (unbox): Use ffi_arg for
> Matthew>  integer return types smaller than a word.
> Matthew>  * testsuite/libjava.jar/ReturnInvocationHandler.java: New file.
> Matthew>  * testsuite/libjava.jar/ReturnProxyTest.jar: Likewise.
> Matthew>  * testsuite/libjava.jar/ReturnProxyTest.java: Likewise.
> Matthew>  * testsuite/libjava.jar/ReturnProxyTest.out: Likewise.
> Matthew>  * testsuite/libjava.jar/ReturnProxyTest.xfail: Likewise.
> Matthew>  * testsuite/libjava.jar/ReturnTypes.java: Likewise.
> Matthew>  * testsuite/libjava.jar/ReturnTypesImpl.java: Likewise.
> 
> Thanks for writing this.
> This is ok.

Thanks for the review.

Committed as r238312. This also needs a backport to GCC 5 and 6 but I will
give it a week or so before proposing.

Thanks,
Matthew


RE: [PATCH] Fix FFI return type for closures in the java interpreter

2016-07-13 Thread Matthew Fortune
Tom Tromey  writes:
> > "Matthew" == Matthew Fortune  writes:
> 
> Matthew> Sorry for the long delay...
> 
> No problem.
> 
> >> This is ok.
> >> Could you check?  I think a -m32 build ought to show it.  Maybe your
> >> x86-64 build already did this?
> 
> Matthew> Still OK to commit?
> 
> Yes, thanks.

Committed as r238311. I'll wait a short while before proposing backport
to GCC 5 and 6 branches.

Matthew


Re: [PATCH, contrib] download_prerequisites: check for existing symlinks before making new ones

2016-07-13 Thread Jeff Law

On 06/27/2016 08:10 PM, Eric Gallager wrote:

The last time I ran ./contrib/download_prerequisites, I already had
previous symlinks set up from a previous run of the script, so `ln`
followed the existing symlinks and created the new ones in the
directories to which the symlinks pointed. This patch should fix that
by removing the old symlinks before creating new ones. (For some
reason the `-f` flag to `ln` that was already there wasn't enough for
me.) Tested by running the script and ensuring that the new isl
symlink pointed to the correct directory, and that there were no bad
symlinks in the old isl directory. Could someone commit this trivial
patch for me, or something like it? I don't have write access.
I'd really rather know why the "-f" flag didn't work for you.  The whole 
point of -f is to remove the destination file first.


Jeff



Re: [PATCH 6/6] Some fixes for profile test cases for autofdo

2016-07-13 Thread Jeff Law

On 06/26/2016 07:50 PM, Andi Kleen wrote:

From: Andi Kleen 

This fixes some basic issues with the profile test cases with autofdo.

- Disable checking for value transformations that autofdo does not
  support.
- Disable checking for fixed hit counts which autofdo does not support
- Enable dumping of afdo log file and check right log file.
- Increase run time of test cases to 1M iterations because autofdo needs
  a few samples to make sense of a program. The test case don't run
  noticeable slower with that.

There are still failures unfortunately, especially the indirect call
transformations do not trigger because autofdo thinks they are not hot.
This can be addressed later.

gcc/testsuite/:

2016-06-26  Andi Kleen  

* g++.dg/tree-prof/indir-call-prof.C: Basic fixes for autofdo.
* g++.dg/tree-prof/morefunc.C: Dito.
* g++.dg/tree-prof/pr35545.C: Dito.
* g++.dg/tree-prof/reorder.C: Dito.
* gcc.dg/tree-prof/20050826-2.c: Dito.
* gcc.dg/tree-prof/cmpsf-1.c: Dito.
* gcc.dg/tree-prof/cold_partition_label.c: Dito.
* gcc.dg/tree-prof/ic-misattribution-1.c: Dito.
* gcc.dg/tree-prof/indir-call-prof.c: Dito.
* gcc.dg/tree-prof/inliner-1.c: Dito.
* gcc.dg/tree-prof/merge_block.c: Dito.
* gcc.dg/tree-prof/stringop-1.c: Dito.
* gcc.dg/tree-prof/stringop-2.c: Dito.
* gcc.dg/tree-prof/switch-case-1.c: Dito.
* gcc.dg/tree-prof/switch-case-2.c: Dito.
* gcc.dg/tree-prof/time-profiler-1.c: Dito.
* gcc.dg/tree-prof/time-profiler-2.c: Dito.
* gcc.dg/tree-prof/update-loopch.c: Dito.
* gcc.dg/tree-prof/val-prof-1.c: Dito.
* gcc.dg/tree-prof/val-prof-2.c: Dito.
* gcc.dg/tree-prof/val-prof-3.c: Dito.
* gcc.dg/tree-prof/val-prof-4.c: Dito.
* gcc.dg/tree-prof/val-prof-5.c: Dito.
* gcc.dg/tree-prof/val-prof-6.c: Dito.
* gcc.dg/tree-prof/val-prof-7.c: Dito.

OK.
jeff



Re: [PATCH 1/6] Print indirect call changes in afdo dump file

2016-07-13 Thread Jeff Law

On 06/26/2016 07:50 PM, Andi Kleen wrote:

From: Andi Kleen 

Print some information about indirect call promotions in the afdo dump
file. Do it in the same format as the instrumented profiler so that
the test suite can match on it.

gcc/:

2016-06-26  Andi Kleen  

* auto-profile.c (update_inlined_ind_target,
afdo_indirect_call): Print information to dump file.

OK.
jeff



Re: [PATCH 5/6] Clean up imports files in test suite

2016-07-13 Thread Jeff Law

On 06/26/2016 07:50 PM, Andi Kleen wrote:

From: Andi Kleen 

autofdo create_gcov creates an extra .imports file. Always remove that
too when running an autofdo test case.

gcc/testsuite/:

* 2016-06-26  Andi Kleen  

* lib/profopt.exp (profopt-execute): Remove .imports files.

OK.
jeff



Re: [PATCH 4/6] Always print gcc-auto-profile line in dump file

2016-07-13 Thread Jeff Law

On 06/26/2016 07:50 PM, Andi Kleen wrote:

From: Andi Kleen 

not just when verbose. This ensures all command lines needed to
reproduce the test case are always logged

gcc/testsuite/:

* 2016-06-26  Andi Kleen  

* lib/profopt.exp (profopt-execute): Always log profiler
command line.

OK.
jeff



Re: [PATCH 3/6] Add dg-final-scan-autofdo and dg-final-scan-not-autofdo

2016-07-13 Thread Jeff Law

On 06/26/2016 07:50 PM, Andi Kleen wrote:

From: Andi Kleen 

Autofdo outputs to different dump files and doesn't support some
transformation that normal profiling. Add dg-final-scan-autofdo
and dg-final-scan-not-autofdo statements to the test suite
so that the test cases can hande those cases separately.
Which seems to match my assertion that value-prof does things that 
auto-prof does not.





gcc/testsuite/:

2016-06-26  Andi Kleen  

* lib/profopt.exp (dg-final-scan-autofdo,
dg-final-scan-not-autofdo): New functions.

Regardless, this is OK.
jeff



Re: [PATCH 2/6] Don't run instrumented value profiler changes with afdo

2016-07-13 Thread Jeff Law

On 06/26/2016 07:50 PM, Andi Kleen wrote:

From: Andi Kleen 

The pass to transform gimple based on value profiling runs with autofdo
on, but currently every transformation fails. For indirect calls autofdo
does it on its own, and it doesn't suppport other value profiling. So don't
run this pass when autofdo is active. This also avoids bogus
dump file entries.

gcc/:

* 2016-06-26  Andi Kleen  

* value-prof.c (gimple_value_profile_transformations): Don't run
when auto_profile is on.
I don't think that comment is really correct.   There's all kinds of 
things going on with div/mod operations, turning string builtins into 
straightline code, etc that I see in value-prof that aren't done by 
autofdo AFAICT.


I can live with the change if the comment is corrected.

jeff



Re: [RFC: Patch 6/6 v2] Remove second cost model from noce_try_store_flag_mask

2016-07-13 Thread Jeff Law

On 06/21/2016 09:50 AM, James Greenhalgh wrote:


Hi,

This transformation tries two cost models, one estimating the number
of insns to use, one estimating the RTX cost of the transformed sequence.
This is inconsistent with the other cost models used in ifcvt.c and
unneccesary - eliminate the second cost model.

Thanks,
James

---
2016-06-21  James Greenhalgh  

* ifcvt.c (noce_try_store_flag_mask): Delete redundant cost model.

OK.

At this point I think the series is fully ack'd.  Given folks have had 
nearly a month to object to the overall direction, I think you should go 
ahead and commit.  We can deal with any fallout at the target maintainer 
level.


jeff


Re: [RFC: Patch 5/6 v2] Improve the cost model for multiple-sets

2016-07-13 Thread Jeff Law

On 06/21/2016 09:50 AM, James Greenhalgh wrote:


Hi,

This patch is rewrites the cost model for bb_ok_for_noce_multiple_sets
to use the max_seq_cost heuristic added in earlier patch revisions.

As with the previous patch, I've used the new parameters to ensure that
the testsuite is still testing the functionality rather than relying on
the target setting the costs appropriately.

Thanks,
James

---
gcc/

2016-06-21  James Greenhalgh  

* ifcvt.c (noce_convert_multiple sets): Move cost model to here,
check the sequence cost after constructing the converted sequence.
(bb_of_for_noce_convert_multiple_sets): Move cost model.

gcc/testsuite/

2016-06-21  James Greenhalgh  

* gcc.dg/ifcvt-4.c: Use parameter to guide if-conversion heuristics.
* gcc.dg/ifcvt-5.c: Use parameter to guide if-conversion heuristics.


OK.
jeff


Re: [RFC: Patch 4/6 v2] Modify cost model for noce_cmove_arith

2016-07-13 Thread Jeff Law

On 06/21/2016 09:50 AM, James Greenhalgh wrote:


Hi,

This patch clears up the cost model for noce_try_cmove_arith. We lose
the "??? FIXME: Magic number 5" comment, and gain a more realistic cost
model for if-converting memory accesses.

This is the patch that has the chance to cause the largest behavioural
changes for most targets - the current heuristic does not take in to
consideration the cost of a conditional move - once we add that the cost
of the converted sequence often looks higher than we allowed before.

I think that missing the cost of the conditional move from these sequences
is not a good idea, and that the cost model should rely on the target giving
back good information. A target that finds tests failing after this patch
should consider either reducing the cost of a conditional move sequence, or
increasing TARGET_MAX_NOCE_IFCVT_SEQ_COST.

As this ups the cost of if-convert dramatically, I've used the new
parameters to ensure that the tests in the testsuite continue to pass on
all targets.

Bootstrapped in series on aarch64 and x86-64.

OK?

Thanks,
James

---
gcc/

2016-06-21  James Greenhalgh  

* ifcvt.c (noce_try_cmove_arith): Check costs after constructing
new sequence.

gcc/testsuite/

2016-06-21  James Greenhalgh  

* gcc.dg/ifcvt-2.c: Use parameter to guide if-conversion heuristics.
* gcc.dg/ifcvt-3.c: Use parameter to guide if-conversion heuristics.
* gcc.dg/pr68435.c: Use parameter to guide if-conversion heuristics.

LGTM as well.  And yes, the cost of the cmove needs to be accounted for. 
 Thanks for doing something sensible on those tests.  I would support 
similar tweaks if we find others after installing this series.


jeff


Re: [RFC: Patch 2/6 v2] Factor out the comparisons against magic numbers in ifcvt

2016-07-13 Thread Jeff Law

On 06/21/2016 09:50 AM, James Greenhalgh wrote:


Hi,

This patch pulls the comparisons between if_info->branch_cost and a magic
number representing an instruction count to a common function. While I'm
doing it, I've documented the instructions that the magic numbers relate
to, and updated them where they were inconsistent.

If our measure of the cost of a branch is now in rtx costs units, we can
get to an estimate for the cost of an expression from the number of
instructions by multiplying through by COSTS_N_INSNS (1).

Alternatively, we could actually construct the cheap sequences and
check the sequence. But in these cases we're expecting to if-convert on
almost all targets, the transforms in this patch are almost universally
a good idea, even for targets with a very powerful branch predictor,
eliminating the branch eliminates a basic block boundary so might be
helpful for scheduling, combine, and other RTL optimizers.

Bootstrapped on x86-64 and aarch64 as part of the full sequence.

OK?

Thanks,
James

---

2016-06-21  James Greenhalgh  

* ifcvt.c (noce_if_info): New field: max_seq_cost.
(noce_estimate_conversion_profitable_p): New.
(noce_try_store_flag_constants): Use it.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_cmove_arith): Likewise.
(noce_find_if_block): Record targetm.max_noce_ifcvt_seq_cost.


LGTM.

jeff


Re: [RFC: Patch 3/6 v2] Remove if_info->branch_cost

2016-07-13 Thread Jeff Law

On 06/21/2016 09:50 AM, James Greenhalgh wrote:


Hi,

This patch removes what is left of branch_cost uses, moving them to use
the new hook and tagging each left over spot with a TODO to revisit them.
All these uses are in rtx costs units, so we don't have more work to do at
this point.

Bootstrapped as part of the patch series on aarch64 and x86-64.

OK?

Thanks,
James

---
2016-06-21  James Greenhalgh  

* ifcvt.c (noce_if_info): Remove branch_cost.
(noce_try_store_flag_mask): Use max_seq_cost rather than
branch_cost, tag as a TODO..
(noce_try_cmove_arith): Likewise.
(noce_convert_multiple_sets): Likewise.
(bb_ok_for_noce_convert_multiple_sets): Likewise.
(noce_find_if_block): Remove set of branch_cost.


LGTM

Jeff


Re: [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost

2016-07-13 Thread Jeff Law

On 06/21/2016 09:50 AM, James Greenhalgh wrote:


On Fri, Jun 03, 2016 at 12:39:42PM +0200, Richard Biener wrote:

On Thu, Jun 2, 2016 at 6:53 PM, James Greenhalgh
 wrote:


Hi,

This patch introduces a new target hook, to be used like BRANCH_COST but
with a guaranteed unit of measurement. We want this to break away from
the current ambiguous uses of BRANCH_COST.

BRANCH_COST is used in ifcvt.c in two types of comparisons. One against
instruction counts - where it is used as the limit on the number of new
instructions we are permitted to generate. The other (after multiplying
by COSTS_N_INSNS (1)) directly against RTX costs.

Of these, a comparison against RTX costs is the more easily understood
metric across the compiler, and the one I've pulled out to the new hook.
To keep things consistent for targets which don't migrate, this new hook
has a default value of BRANCH_COST * COSTS_N_INSNS (1).

OK?


How does the caller compute "predictable"?  There are some archs where
an information on whether this is a forward or backward jump is more
useful I guess.  Also at least for !speed_p the distance of the branch is
important given not all targets support arbitrary branch offsets.


Just through a call to predictable_edge_p. It isn't perfect. My worry
with adding more details of the branch is that you end up with a nonsense
target implementation that tries way too hard to be clever. But, I don't
mind passing the edge through to the target hook, that way a target has
it if they want it. In this patch revision, I pass the edge through.
There are so many things that can factor into this decision.  But I 
suspect we get most of the benefit from a small amount of work (ie, 
using the prediction information we've already generated).  If the 
target wants to override based on other factors, there's a mechanism for 
that, but I don't think it's likely to be heavily, if at all, used.




In this patch revision, I started by removing the idea that this costs
a branch at all. It doesn't, the use of this hook is really a target
trying to limit if-convert to not end up pulling too much on to the
unconditional path. It seems better to expose that limit directly by
explicitly asking for the maximum cost of an unconditional sequence we
would create, and comparing against seq_cost of the new RTL. This saves
a target trying to figure out what is meant by a cost of a branch.
Seems sensible.  Essentially you're just asking a different but related 
(and more direct) question that side-steps the need to be able to 
compare branch cost with costs of other insns.


A target maintainer may have a sense of how branch cost relates to 
normal insns on their target and they may choose to define the hooks in 
those terms (though I would discourage that as it re-introduces the 
precise problem we're trying to get around).




Having done that, I think I can see a clearer path to getting the
default hook implementation in shape. I've introduced two new params,
which give maximum costs for the generated sequence (one for a "predictable"
branch, one for "unpredictable") in the speed_p cases. I'm not expecting it
to be useful to give the user control in the case we are compiling for
size - whether this is a size win or not is independent of whether the
branch is predictable.

Maybe not for MIPS :-)  But I think that should be tabled.




For the default implementation, if the parameters are not set, I just
multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and
COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short
of ideas on how best to form the default implementation. This means we're
still potentially going to introduce performance regressions for targets
that don't provide an implementation of the new hook, or a default value
for the new parameters. It does mean we can keep the testsuite clean by
setting parameter values suitably high for all targets that have
conditional move instructions.
But I think that's an OK place to be -- I don't think it's sensible for 
you to have to figure that out for every target.  It's something the 
target maintainers ought to be able to guesstimate more accurately.  My 
only objection is conceptual based on mixing BRANCH_COST & 
COSTS_N_INSNS.  But there may simply not be another way to set the default.




The new default causes some changes in generated conditional move sequences
for x86_64. Whether these changes are for the better or not I can't say.

You might consider passing those along to Uros to get his opinion.



This first patch introduces the two new parameters, and uses them in the
default implementation of the target hook.

Bootstrapped on x86_64 and aarch64 with no issues.

OK?

Thanks,
James

---
2016-06-21  James Greenhalgh  

* target.def (max_noce_ifcvt_seq_cost): New.
* doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
* doc/tm.texi: Regenerate.
* targhooks.h 

Re: [PATCH PR68030/PR69710][RFC]Introduce a simple local CSE interface and use it in vectorizer

2016-07-13 Thread Jeff Law

On 05/25/2016 05:22 AM, Bin Cheng wrote:

Hi, As analyzed in PR68303 and PR69710, vectorizer generates
duplicated computations in loop's pre-header basic block when
creating base address for vector reference to the same memory object.
Not a huge surprise.  Loop optimizations generally have a tendency to 
create and/or expose CSE opportunities.   Unrolling is a common culprit, 
there's certainly the possibility for header duplication, code motions 
and IV rewriting to also expose/create redundant code.


We haven't really had great success squashing these redundancies within 
the loop optimizer itself.  This has lead to running CSE like passes 
after the loop optimizers or mini-CSE passes like we see here.




Because the duplicated code is out of loop, IVOPT fails to track base
object for these vector references, resulting in missed strength
reduction. It's agreed that vectorizer should be improved to generate
optimal (IVOPT-friendly) code, the difficult part is we want a
generic infrastructure.  After investigation, I tried to introduce a
generic/simple local CSE interface by reusing existing
algorithm/data-structure from tree-ssa-dom (tree-ssa-scopedtables).
I believe you're the only person that's ever tried to re-use that 
infrastructure -- were there any API/usage patterns that would have 
worked better for your needs?


ISTM that conceptually if we broke out the basic hashing bits that's 
what you're most interested in re-using since you're really just 
building a local CSE.





The interface runs local CSE for each basic block in a bitmap,
customers of this interface only need to record basic blocks in the
bitmap when necessary.  Note we don't need scopedtables' unwinding
facility since the interface runs only for single basic block, this
should be good in terms of compilation time.

Right.




I checked its impact on various test cases. With this patch,
PR68030's generated assembly is reduced from ~750 lines to ~580 lines
on x86_64, with both pre-header and loop body simplified.

That's a good result :-)



 But, 1) It

doesn't fix all the problem on x86_64.  Root cause is computation for
base address of the first reference is somehow moved outside of
loop's pre-header, local CSE can't help in this case.
That's a bid odd -- have you investigated why this is outside the loop 
header?


 Though

split_constant_offset can back track ssa def chain, it causes
possible redundant when there is no CSE opportunities in pre-header.
Is the problem that when you split you don't know if doing so will be 
profitable until you you're actually CSE-ing the pre-header?



2) It causes regression for PR68030 on AArch64.  I think the
regression is caused by IVOPT issues which are exposed by this patch.
Checks on offset validity in get_address_cost is wrong/inaccurate
now.  It considers an offset as valid if it's within the maximum
offset range that backend supports.  This is not true, for example,
AArch64 requires aligned offset additionally.  For example, LDR [base
+ 2060] is invalid for V4SFmode, although "2060" is within the
maximum offset range.  Another issue is also in get_address_cost.
Inaccurate cost is computed for "base + offset + INDEX" address
expression.  When register pressure is low, "base+offset" can be
hoisted out and we can use [base + INDEX] addressing mode, whichhis
is current behavior.



> Bootstrap and test on x86_64 and AArch64.  Any comments appreciated.
>
> Thanks,
> bin
>
> 2016-05-17 Bin Cheng  
>
> PR tree-optimization/68030
> PR tree-optimization/69710
> * tree-ssa-dom.c (cse_bbs): New function.
> * tree-ssa-dom.h (cse_bbs): New declaration.
> * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
> Re-associate address by splitting constant offset.
> (vect_create_data_ref_ptr, vect_setup_realignment): Record 
changed

> basic block.
> * tree-vect-loop-manip.c (vect_gen_niters_for_prolog_loop): 
Record

> changed basic block.
> * tree-vectorizer.c (tree-ssa-dom.h): Include header file.
> (changed_bbs): New variable.
> (vectorize_loops): Allocate and free CHANGED_BBS.  Call cse_bbs.
> * tree-vectorizer.h (changed_bbs): New declaration.
>
>
> pr69710-20160523.txt
>
>
> diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
> index 8bf5b3c..b2a0b0c 100644
> --- a/gcc/tree-ssa-dom.c
> +++ b/gcc/tree-ssa-dom.c
> @@ -2090,3 +2090,50 @@ lookup_avail_expr (gimple *stmt, bool insert,
>
>return lhs;
>  }
> +
> +/* A local CSE interface which runs CSE for basic blocks recorded in
> +   CHANGED_BBS.  */
> +
> +void
> +cse_bbs (bitmap changed_bbs)
> +{
> +  unsigned index;
> +  bitmap_iterator bi;
> +  gimple_stmt_iterator gsi;
> +
> +  hash_table *avail_exprs;
> +  class avail_exprs_stack *avail_exprs_stack;
> +  class const_and_copies *const_and_copies;
> +
> +  avail_exprs = new hash_table (1024);
> +  avail_exprs_stack = new class avail_exprs_stack 

Re: [PATCH] Fix PR31096

2016-07-13 Thread Jeff Law

On 04/14/2016 12:45 AM, Hurugalawadi, Naveen wrote:

Hi,


>> I think we should handle at least INTEGER_CST and SSA_NAME
>> with VRP, and it seems natural to add a VRP check

The check should be added in the tree_single_nonzero_warnv_p
for SSA_NAME case for tree_expr_nonzero_p.
However, for tree_expr_nonnegative_p, its been handled in a
different way. Should we combine this check with the existing one?

+   (if (!tree_expr_nonnegative_p (@1))
+(cmp @2 @0))


>> Ideally, you would call tree_expr_nonpositive_p, except that that
>> function doesn't exist yet. So for now, I guess we

Would the tree_expr_nonpositive_p function be helpful for other cases
as well, I would try to add it if its useful.

Please find attached the modified patch as per the suggestions and
let me know if its fine?

Thanks,
Naveen


pr31096-4.patch


diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 0f4bf7e..5922dbd 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9177,7 +9177,7 @@ tree_expr_nonzero_warnv_p (tree t, bool 
*strict_overflow_p)
 /* Return true when T is an address and is known to be nonzero.
Handle warnings about undefined signed overflow.  */

-static bool
+bool
 tree_expr_nonzero_p (tree t)
 {
   bool ret, strict_overflow_p;
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 02f4270..8579622 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -167,6 +167,7 @@ extern tree size_diffop_loc (location_t, tree, tree);
 #define non_lvalue(T) non_lvalue_loc (UNKNOWN_LOCATION, T)
 extern tree non_lvalue_loc (location_t, tree);

+extern bool tree_expr_nonzero_p (tree);
 extern bool tree_expr_nonnegative_p (tree);
 extern bool tree_expr_nonnegative_warnv_p (tree, bool *, int = 0);
 extern tree make_range (tree, int *, tree *, tree *, bool *);
diff --git a/gcc/match.pd b/gcc/match.pd
index 75aa601..6655a3c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
zerop
CONSTANT_CLASS_P
tree_expr_nonnegative_p
+   tree_expr_nonzero_p
integer_valued_real_p
integer_pow2p
HONOR_NANS)
@@ -894,7 +895,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (convert (bit_and (bit_not @1) @0

+/* Fold A * 10 == B * 10 into A == B.  */
+(for cmp (eq ne)
+ (simplify
+  (cmp (mult:c @0 @1) (mult:c @2 @1))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+   && tree_expr_nonzero_p (@1))
+   (cmp @0 @2

Rather than refer to an explicit constant (10), I'd write the comment as

/* For integral types with undefined overflow and C != 0 fold
   x * C EQ/NE y * C into x EQ/NE y.  */

We commonly use "C" to refer to an arbitrary constant in comments 
throughout GCC.   I think my version is significantly clearer.






+/* Fold A * 10 < B * 10 into A < B.  */
I think we want to do a similar kind of fix to the comment here.  Except 
you want to lay out the different transformations based on the value of 
the constant.   So something like;


/* For integral types with undefined overflow and C != 0 fold
   x * C RELOP y * C into:

   x RELOP y for nonnegative C
   y RELOP x for negative C  */






 /* ((X inner_op C0) outer_op C1)
With X being a tree where value_range has reasoned certain bits to always be
diff --git a/gcc/testsuite/gcc.dg/pr31096.c b/gcc/testsuite/gcc.dg/pr31096.c
new file mode 100644
index 000..72446bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr31096.c
@@ -0,0 +1,41 @@
+/* PR middle-end/31096 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" }  */
+
+int
+f (int a, int b)
+{
+  return a * 67 == b * 67;
+}
+
+int
+f1 (int a, int b)
+{
+  return a * -42 !=  b * -42;
+}
+
+int
+f2 (int a, int b)
+{
+  return a * 10 >= b * 10;
+}
+
+int
+f3 (int a, int b)
+{
+  return a * -4 <  b * -4;
+}
+
+int
+f4 (unsigned int a, unsigned int b)
+{
+  return a * 10 == b * 10;
+}
+
+int
+f5 (unsigned int a, unsigned int b)
+{
+  return a * -42 <  b * -42;
+}
+
+/* { dg-final { scan-tree-dump-times "\\(D\\) \\*" 4 "optimized" } } */
So the problem I see here is it's not obvious what your scanning for. 
Often just a comment can really help here.


I would suggest tests when C is zero and verify this transformation 
doesn't fire on that case.


I would suggest verifying that the operand orders change appropriately 
when dealing with a negative constant.


You might want to verify nothing happens with floating point or vector 
types.


If you wanted to be extra thorough you could iterate over the operators. 
 ie, testing == and !=, then <, <=, >, >=


It sounds a bit like overkill, but we've often found subtle cases where 
we wouldn't optimize one case when we expected it to be optimized.


So overall, I think the transformations are fine and just need updated 
comments.  The tests need a bit more work.   Can you please update and 
resubmit -- I think this is pretty close to ready.


Thanks for your patience,
jeff



Re: [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP

2016-07-13 Thread Evandro Menezes

On 07/13/16 11:14, Kyrill Tkachov wrote:

Hi all,

The most common way to load and store TImode value in aarch64 is to 
perform an LDP/STP of two X-registers.

This is the *movti_aarch64 pattern in aarch64.md.
There is a bug in the logic in aarch64_classify_address where it 
validates the offset in the address used
to load a TImode value. It passes down TImode to the 
aarch64_offset_7bit_signed_scaled_p check which rejects
offsets that are not a multiple of the mode size of TImode (16). 
However, this is too conservative as X-reg LDP/STP

instructions accept immediate offsets that are a multiple of 8.

Also, considering that the definition of 
aarch64_offset_7bit_signed_scaled_p is:

  return (offset >= -64 * GET_MODE_SIZE (mode)
  && offset < 64 * GET_MODE_SIZE (mode)
  && offset % GET_MODE_SIZE (mode) == 0);

I think the range check may even be wrong for TImode as this will 
accept offsets in the range [-1024, 1024)

(as long as they are a multiple of 16)
whereas X-reg LDP/STP instructions only accept offsets in the range 
[-512, 512).
So since the check is for an X-reg LDP/STP address we should be 
passing down DImode.


This patch does that and enables more aggressive generation of REG+IMM 
addressing modes for 64-bit aligned

TImode values, eliminating many address calculation instructions.
For the testcase in the patch we currently generate:
bar:
add x1, x1, 8
add x0, x0, 8
ldp x2, x3, [x1]
stp x2, x3, [x0]
ret

whereas with this patch we generate:
bar:
ldp x2, x3, [x1, 8]
stp x2, x3, [x0, 8]
ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?


LGTM

--
Evandro Menezes



Re: [JAVA PATCH] Enable more array bounds check elimination

2016-07-13 Thread Jeff Law

On 02/22/2016 11:10 AM, ro...@nextmovesoftware.com wrote:


It has been a while since my last contribution.  The following patch allows 
GCC's optimizers
to more aggressively eliminate and optimize java array bounds checks.  The 
results are
quite impressive, for example producing a 26% performance improvement on the 
sieve.java
benchmark given at http://keithlea.com/javabench/ on x86_64-pc-linux-gnu, 
reducing the
runtime for 1 million iterations from 31.5 seconds on trunk, to 25.0s with this 
patch, in fact
eliminating all array bounds checks.  This is close to the 22.3s of an 
equivalent C/C++
implementation, and significantly closes the gap to Java HotSpot(TM) JIT at 
23.0 seconds.

The approach is to provide sufficient information in the gimple generated by 
the gcj front-end
to allow the optimizers to do their thing.  For array allocations of constant 
length, I propose
generating an additional (cheap) write to the array length field returned from 
_Jv_NewPrimArray,
which is then sufficient to allow this value to propagate throughout the 
optimizers.

This is probably best explained by a simple example.  Consider the array 
initializer below:

Thanks.  The example helped a lot.

At a very high level, you should be aware of a general belief that GCJ's 
life is limited.  There's been various calls to deprecate it.  So 
spending a lot of time optimizing GCJ's output may not be the best use 
of your skills :-)




With the patch below, we now generate the much more informative .004t.gimple 
for this:

D.926 = _Jv_NewPrimArray (&_Jv_intClass, 3);
D.926->length = 3;
Essentially you're just storing back into the result the length that 
we'd passed to the allocator.  Cute.  Good to see that all the work 
we've done to propagate the RHS of that kind of statement into uses has 
paid off.


Presumably there's no reasonable way this could fail (like you're 
getting objects from a readonly part of memory), or the result gets used 
in some other thread which changes its size prior to the re-storing of 
the initial size?






Achieving this result required two minor tweaks.   The first is to allow the 
array length constant
to reach the newarray call, by allowing constants to remain on the quickstack.  
This allows the
call to _Jv_NewPrimArray to have a constant integer argument instead of the 
opaque #slot#0#0.
Then in the code that constructs the call to _Jv_NewPrimArray we wrap it in a 
COMPOUND_EXPR
allowing us to insert the superfluous, but helpful, write to the length field.

Whilst working on this improvement I also noticed that the array bounds checks 
we were
initially generating could also be improved.  Currently, an array bound check 
in 004t.gimple
looks like:

D.925 = MEM[(struct int[] *)_ref_1_4.6].length;
D.926 = (unsigned int) D.925;
if (_slot_2_5.9 >= D.926) goto ; else goto ;
:
_Jv_ThrowBadArrayIndex (_slot_2_5.8);
if (0 != 0) goto ; else goto ;
:
iftmp.7 = 1;
goto ;
:
iftmp.7 = 0;
:

Notice the unnecessary "0 != 0" and the dead assignments to iftmp.7 (which is 
unused).
FWIW, we're generally moving away from optimization in the language 
front-ends -- in particular folding, which you introduce in this patch 
on the array index.  Given the trajectory of GCJ I'm not going to worry 
about it though.





With the patch below, we now not only avoid this conditional but also use 
__builtin_expect
to inform the compiler that throwing an BadArrayIndex exception is typically 
unlikely.  i.e.

Sounds like a good thing as well.



D.930 = MEM[(struct int[] *)_ref_1_4.4].length;
D.931 = D.930 <= 1;
D.932 = __builtin_expect (D.931, 0);
if (D.932 != 0) goto ; else goto ;
:
_Jv_ThrowBadArrayIndex (0);
:


The following patch has been tested on x86_64-pc-linux-gnu with a full make 
bootstrap
and make check, with no new failures/regressions.

Please let me know what you think (for stage 1 once it reopens)?

Roger
--
Roger Sayle, Ph.D.
CEO and founder
NextMove Software Limited
Registered in England No. 07588305
Registered Office: Innovation Centre (Unit 23), Cambridge Science Park, 
Cambridge CB4 0EY

2016-02-21  Roger Sayle  

* expr.c (push_value): Only call flush_quick_stack for non-constant
arguments.
(build_java_throw_out_of_bounds_exception): No longer wrap calls
to _Jv_ThowBadArrayIndex in a COMPOUND_EXPR as no longer needed.
(java_check_reference): Annotate COND_EXPR with __builtin_expect
to indicate that calling _Jv_ThrowNullPointerException is unlikely.
(build_java_arrayaccess): Construct an unlikely COND_EXPR instead
of a TRUTH_ANDIF_EXPR in a COMPOUND_EXPR.  Only generate array
index MULT_EXPR when size_exp is not unity.
(build_array_length_annotation): When optimizing, generate a write
to the allocated array's length field to expose constant lengths
to GCC's optimizers.

Re: [PATCH] Fix Bug 17896: The expression (a>0 & b>0) should give clearer warning message (-Wparentheses)

2016-07-13 Thread Jeff Law

On 01/30/2016 06:35 AM, Prasad Ghangal wrote:

Hi!

This is my first proposed patch for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17896. I was willing to
do it using "APPEARS_TO_BE_BOOLEAN_EXPR_P(CODE, ARG)" to check
booleans but gcc doesn't allow (bootstraping fails). Hence I am using
"TREE_CODE_CLASS (CODE) == tcc_comparison"
I would encourage you to look more closely at why GCC failed to 
bootstrap.  It could be the case that APPEARS_TO_BE_BOOLEAN_EXPR_P is 
the wrong test to use, or it could be a bug in GCC itself.






-- Thanks and Regards, Prasad Ghangal


bug17896.patch


Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 232768)
+++ gcc/c-family/c-common.c (working copy)
@@ -11596,6 +11596,11 @@
   || code_right == PLUS_EXPR || code_right == MINUS_EXPR)
warning_at (EXPR_LOC_OR_LOC (arg_right, loc), OPT_Wparentheses,
 "suggest parentheses around arithmetic in operand of %<|%>");
+  /* Check cases like (x instead of %<|%> when joining booleans");

Space between the "if" and open paren.

Both of those nits are repeated in the second block of code.

I note in the patch inside the BZ that the original author verified 
neither argument had side effects.  Any reason you dropped that test?


Similarly in that patch there's a test for the testsuite.  AFAICT your 
patch doesn't even pass that test.  For a change like yours we 
definitely want to include a reasonable test -- you could start with the 
one included in the BZ or construct your own.


As of right now I don't think your patch needs a copyright assignment, 
but that might change if the patch grows too much.



Jeff



Re: [PATCH] simplify-rtx.c: start adding selftests (v2)

2016-07-13 Thread Jeff Law

On 07/06/2016 01:30 PM, David Malcolm wrote:


This might be a bit confusing when more tests are added, since
pointer equality is only useful in certain specific cases (e.g.
when you know you're dealing with CONST_INTs or pseudo registers).
How about making ASSERT_RTX_EQ check for rtx_equal_p equality and
have something like ASSERT_RTX_PTR_EQ for cases where pointer
equality really is needed?



Also, how about using LAST_VIRTUAL_REGISTER + 1 as the base for
register numbers?  DImode might not be valid for register 0 on all
targets.


Thanks.  Here's an updated version which adds both ASSERT_RTX_EQ and
ASSERT_RTX_PTR_EQ.  The simplify-rtx.c tests can use the stricter
pointer equality test, so I updated them to use ASSERT_RTX_PTR_EQ
condition.
Richard S. is definitely right here WRT using pointer equality vs deeper 
inspection.   The RTL structure sharing rules within GCC are something 
you have to be cognizant of here.  Thankfully the RTL structure sharing 
is reasonably well documented.


I added a selftest::make_test_reg to allocate pseudo regs, starting
at LAST_VIRTUAL_REGISTER + 1.
Also the right thing to do.  There's hard registers, then virtuals, then 
pseudos.


There's obviously much more we could do with the tests, but this is a 
reasonable start.


I note you iterate over all the modes -- which would include things like 
FP modes, BImode, and CC special modes and such.  I don't think we can 
necessarily be sure that the transformations you're testing are valid 
across all modes.


For example, does it even make sense to test (A & B) | A -> A for FP modes?

It almost seems like the iteration space has to be dependent on what 
you're testing.  Ie, some tests you want to iterate over the standard 
integer modes.  Other tests you might reasonably include FP modes.  CC 
modes I think should be forbidden for these tests.  THere may be others 
to ponder as well.



jeff



Re: Implement -Wswitch-fallthrough

2016-07-13 Thread Marc Glisse

On Wed, 13 Jul 2016, Marek Polacek wrote:


On Wed, Jul 13, 2016 at 08:39:28PM +0200, Marc Glisse wrote:

On Wed, 13 Jul 2016, Marek Polacek wrote:


Does "__attribute__((fallthrough));" have any advantages over
"__builtin_fallthrough()"?


Not a strong argument, but compilers that don't know the construct will give
an error on the builtin, and just a warning on the attribute (and ignore the
comment, which makes both other versions less relevant).


Hmm, I'd expect that people add something like

#if GCC_VERSION >= 7000
# define gcc_fallthrough() __builtin_fallthrough ()
#else
# define gcc_fallthrough()
#endif

and not use the builtin directly.


I expect people will mostly use the comment, or maybe the C++17 attribute 
in a few years in C++ code. But accidents happen. I did say it was a weak 
argument ;-)



Unrelated question: are there cases where __builtin_fallthrough() has any
impact on code generation?


It should not -- my implementation gets rid of all __builtin_fallthrough()
during gimple-low, before most of the optimizations kick in, so it shouldn't
make any real difference.


"should", "most", "real"... This is a bit like compare-debug, as a user I 
expect it to only affect warnings, and if it can affect code generation, 
even in small ways, I would like the doc to tell me.


--
Marc Glisse


Re: Implement -Wswitch-fallthrough: rs6000

2016-07-13 Thread Bruce Korb
On Wed, Jul 13, 2016 at 11:39 AM, Marek Polacek  wrote:
> Most likely what you saw was in cxx_pretty_printer::declaration_specifiers.

I only saw it once and, of course, it was once too often. ;)
If you fix it, it would sooth my sensibilities as the fixincludes maintainer,
making mine a small voice in the GCC world.

Still, for what it's worth, I'd back you :)

Cheers - Bruce


Re: Implement -Wswitch-fallthrough

2016-07-13 Thread Marek Polacek
On Wed, Jul 13, 2016 at 08:39:28PM +0200, Marc Glisse wrote:
> On Wed, 13 Jul 2016, Marek Polacek wrote:
> 
> > Does "__attribute__((fallthrough));" have any advantages over
> > "__builtin_fallthrough()"?
> 
> Not a strong argument, but compilers that don't know the construct will give
> an error on the builtin, and just a warning on the attribute (and ignore the
> comment, which makes both other versions less relevant).
 
Hmm, I'd expect that people add something like 

#if GCC_VERSION >= 7000
# define gcc_fallthrough() __builtin_fallthrough ()
#else
# define gcc_fallthrough()
#endif

and not use the builtin directly.

> Unrelated question: are there cases where __builtin_fallthrough() has any
> impact on code generation?

It should not -- my implementation gets rid of all __builtin_fallthrough()
during gimple-low, before most of the optimizations kick in, so it shouldn't
make any real difference.

Marek


Re: Implement -Wswitch-fallthrough: rs6000

2016-07-13 Thread Marek Polacek
On Wed, Jul 13, 2016 at 11:30:34AM -0700, Bruce Korb wrote:
> > On Mon, Jul 11, 2016 at 01:36:02PM -0700, Bruce Korb wrote:
> > [[putrid code deleted]]
> >> Does this patch mean that the above got fixed?  I mean, if you're
> >> going to fret over linguistic tags to make falling through explicit,
> >> it would seem the above code is pretty sore-thumby, yes?
> >
> > My current implementation warns here, but the warning can be suppressed
> > by adding /* FALLTHRU */ or __builtin_fallthrough(); before the
> > do_something_else() line.
> >
> > Does that answer your question?
> 
> Not in a satisfying way. :)  I understand that syntactically, you can

Ah, sorry.

> drop a "case" in wherever you can drop a statement label.  That
> doesn't mean you should.  Since I wasn't fixing GCC code, I just
> rolled my eyes and moved on.  However, if the code is going to be
> changed, then contortions like that ought to be addressed.  Both for
> aesthetics and for code clarity.

Most likely what you saw was in cxx_pretty_printer::declaration_specifiers.
I will have to do something about this, either add /* FALLTHRU */ or rewrite
the code slightly, as you suggest.  But that will involve duplicating the
c_pretty_printer::declaration_specifiers (t);
line, so some people might be against it.

Marek


Re: Implement -Wswitch-fallthrough

2016-07-13 Thread Marc Glisse

On Wed, 13 Jul 2016, Marek Polacek wrote:

Does "__attribute__((fallthrough));" have any advantages over 
"__builtin_fallthrough()"?


Not a strong argument, but compilers that don't know the construct will 
give an error on the builtin, and just a warning on the attribute (and 
ignore the comment, which makes both other versions less relevant).


Unrelated question: are there cases where __builtin_fallthrough() has any 
impact on code generation?


--
Marc Glisse


Re: [v3 PATCH] Implement P0307R2, Making Optional Greater Equal Again.

2016-07-13 Thread Ville Voutilainen
On 13 July 2016 at 21:25, Daniel Krügler  wrote:
> How would you feel about the introduction of an internal trait
> __is_boolean_testable, that would test both is_convertible bool> and is_constructible for now, so that we could
> reuse that at places like these and others pointed out by LWG 2114?
>
> If you like that idea, I would work on a contribution.


Sounds like an excellent idea to me.


Re: Implement -Wswitch-fallthrough: rs6000

2016-07-13 Thread Bruce Korb
> On Mon, Jul 11, 2016 at 01:36:02PM -0700, Bruce Korb wrote:
> [[putrid code deleted]]
>> Does this patch mean that the above got fixed?  I mean, if you're
>> going to fret over linguistic tags to make falling through explicit,
>> it would seem the above code is pretty sore-thumby, yes?
>
> My current implementation warns here, but the warning can be suppressed
> by adding /* FALLTHRU */ or __builtin_fallthrough(); before the
> do_something_else() line.
>
> Does that answer your question?

Not in a satisfying way. :)  I understand that syntactically, you can
drop a "case" in wherever you can drop a statement label.  That
doesn't mean you should.  Since I wasn't fixing GCC code, I just
rolled my eyes and moved on.  However, if the code is going to be
changed, then contortions like that ought to be addressed.  Both for
aesthetics and for code clarity.


Re: [v3 PATCH] Implement P0307R2, Making Optional Greater Equal Again.

2016-07-13 Thread Daniel Krügler
2016-07-13 12:05 GMT+02:00 Ville Voutilainen :
> On 13 July 2016 at 01:31, Jonathan Wakely  wrote:
>> On 11/07/16 23:41 +0300, Ville Voutilainen wrote:
>>>
>>> @@ -785,41 +785,60 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>> }
>>> };
>>>
>>> +  template
>>> +using __optional_relop_t =
>>> +enable_if_t::value, bool>;
>>
>>
>> Should this be is_convertible<_Tp, bool> instead?
>
> Yeah.. it would be more reasonable to return a type explicitly
> convertible to bool from a relop
> if a non-bool is returned, but since built-in operators are not
> contextually-convertible-to-bool,
> that "protection" wouldn't buy much. And since the implementation
> doesn't really do bool-wrappings
> everywhere, let's go with is_convertible and change that if someone complains.

How would you feel about the introduction of an internal trait
__is_boolean_testable, that would test both is_convertible and is_constructible for now, so that we could
reuse that at places like these and others pointed out by LWG 2114?

If you like that idea, I would work on a contribution.

Thanks,

- Daniel


Re: Implement -Wswitch-fallthrough: rs6000

2016-07-13 Thread Marek Polacek
On Mon, Jul 11, 2016 at 01:36:02PM -0700, Bruce Korb wrote:
> I'm curious about this.  In the process of developing a code analysis
> tool, I found some GCC code that was, basically:
> 
> switch (v) {
> case 0:
>   if (e) {
> do_something();
>   } else {
> case 1:
> do_something_else();
>   }
> }
 
Yeah, I saw this too, somewhere in the C++ FE.

> Does this patch mean that the above got fixed?  I mean, if you're
> going to fret over linguistic tags to make falling through explicit,
> it would seem the above code is pretty sore-thumby, yes?

My current implementation warns here, but the warning can be suppressed
by adding /* FALLTHRU */ or __builtin_fallthrough(); before the
do_something_else() line.

Does that answer your question?

Marek


Re: Implement -Wswitch-fallthrough

2016-07-13 Thread Marek Polacek
On Mon, Jul 11, 2016 at 04:28:32PM -0400, NightStrike wrote:
> On Mon, Jul 11, 2016 at 3:43 PM, Marek Polacek  wrote:
> > But then the [[fallthrough]] attribute was
> > approved for C++17 [1], and that's what has got me to do all this.
> >  ...
> > I added a new builtin,
> > __builtin_fallthrough, that prevents the warning from occurring.  It can 
> > only
> > be used in a switch; the compiler will issue an error otherwise.  The new 
> > C++
> > attribute can then be implemented in terms of this builtin.
> 
> This is a stupid question I'm sure, but if C++ can have the attribute
> version, why can't C have __attribute__((fallthrough)) ?

We could, but why?  I already added __builtin_fallthrough() and
__attribute__((fallthrough)) would just be turned into that.  C++17 adds the
fallthrough attribute for a null statement, so we need to implement that, but
again I expect that I just turn that to __builtin_fallthrough().  We plan to
add __Fallthrough to C, but that is a long way off.  Does
"__attribute__((fallthrough));" have any advantages over 
"__builtin_fallthrough()"?

Marek


Re: Debug algorithms

2016-07-13 Thread Jonathan Wakely

On 22/06/16 22:05 +0200, François Dumont wrote:

Hi

   Here is eventually the so long promized patch to introduce Debug 
algos similarly to Debug containers.


I'm trying to decide how much benefit this really gives us, and
whether the obfuscation to the code (even more namespaces involved,
and having to use __std_a:: instead of std::) is worth it. It also
means more code to maintain of course, with extra overloads.


   Why such an evolution:
- More flexibility, debug algos can be used explicitely without 
activating Debug mode.


Although nice in theory, I doubt this will get much usage in practice.

- Performance: Debug algos can get rid of Debug layer on top of 
container iterators to invoke normal algos. Operations on normal 
iterators are faster and we also benefit from the same algos 
specialization that sometimes exist on some container iterators (like 
std::deque ones). Also normal algos are now using other normal algos, 
Debug check won't be done several times.
- It will be easier to implement new Debug checks without the 
limitation to do so through some Debug macro


   To do so I introduced a new namespace __cxx1998_a used for normal 
algos when Debug mode is active. I couldn't reuse __cxx1998 cause with 
current implementation of Debug containers __cxx1998 is exposed and 
because of ADL we could then have ambiguity between Debug and normal 
versions of the same algos. I also introduced a __std_a namespace 
which control the kind of algos used within the library mostly for 
containers implementation details.



I think I need to apply the patch locally and spend some time looking
at the new structure, to see what ends up calling what. I'm finding it
difficult to follow that just from reading the patch.



Re: Fix PR44281 (bad RA with global regs)

2016-07-13 Thread Bernd Schmidt

On 07/13/2016 05:29 PM, Dominik Vogt wrote:


Unfortunately this patch (or whatever got actually committed) has
broken the gcc.target/s390/pr679443.c test case, which is a bit
fishy (see code snippet below).  I assign most registers to global
variables and then use some complicated arithmetics with the goal
that the pointer stored in the first argument gets saved on the
stack and reloaded to a different register.  Before this patch the
test case just needed three registers to do its work (r2, r3, r4).
With the patch it currently causes an error in the reload pass

  error: unable to find a register to spill


Might be useful to see the dump_reload output.


If a fourth register is available, the ICE goes away, but the
pointer remains in r2, rendering the test case useless.


I don't think I quite understand what you're trying to do here, but it 
does look dodgy. Whatever this is testing would probably be better as a 
part of a unit test.



Bernd



[PATCH] libstdc++/71856 Define _GLIBCXX_PARALLEL_ASSERTIONS

2016-07-13 Thread Jonathan Wakely

This fixes a conflict between how Parallel Mode has always used the
_GLIBCXX_ASSERTIONS macro and the new meaning we gave it for GCC 6
(enabling the lightweight debug checks).

It doesn't make sense for Parallel Mode to own that macro, and it
might be useful to enable Parallel Mode assertions without the other
checks, so I've changed all the Parallel Mode headers to check
_GLIBCXX_PARALLEL_ASSERTIONS instead. If that's not defined then it
defaults to the value of _GLIBCXX_ASSERTIONS, to preserve the old
behaviour.

PR libstdc++/71856
* include/bits/c++config (_GLIBCXX_ASSERTIONS): Define to 1 not empty.
* include/parallel/compiletime_settings.h (_GLIBCXX_ASSERTIONS):
Rename to _GLIBCXX_PARALLEL_ASSERTIONS and make default value depend
on _GLIBCXX_ASSERTIONS.
* include/parallel/balanced_quicksort.h: Rename _GLIBCXX_ASSERTIONS.
Include  for sleep.
* include/parallel/losertree.h: Rename _GLIBCXX_ASSERTIONS.
* include/parallel/merge.h: Likewise.
* include/parallel/multiway_merge.h: Likewise.
* include/parallel/partition.h: Likewise.
* include/parallel/queue.h: Likewise.
* include/parallel/sort.h: Likewise.
* testsuite/25_algorithms/headers/algorithm/
parallel_algorithm_assert.cc: New.

commit 530690e778b4257f6e52ef1199d536dbd392e4ba
Author: redi 
Date:   Wed Jul 13 17:22:57 2016 +

libstdc++/71856 Define _GLIBCXX_PARALLEL_ASSERTIONS

PR libstdc++/71856
* include/bits/c++config (_GLIBCXX_ASSERTIONS): Define to 1 not empty.
* include/parallel/compiletime_settings.h (_GLIBCXX_ASSERTIONS):
Rename to _GLIBCXX_PARALLEL_ASSERTIONS and make default value depend
on _GLIBCXX_ASSERTIONS.
* include/parallel/balanced_quicksort.h: Rename _GLIBCXX_ASSERTIONS.
Include  for sleep.
* include/parallel/losertree.h: Rename _GLIBCXX_ASSERTIONS.
* include/parallel/merge.h: Likewise.
* include/parallel/multiway_merge.h: Likewise.
* include/parallel/partition.h: Likewise.
* include/parallel/queue.h: Likewise.
* include/parallel/sort.h: Likewise.
* testsuite/25_algorithms/headers/algorithm/
parallel_algorithm_assert.cc: New.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@238307 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index 57024e4..4625607 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -414,7 +414,7 @@ namespace std
 
 // Debug Mode implies checking assertions.
 #ifdef _GLIBCXX_DEBUG
-# define _GLIBCXX_ASSERTIONS
+# define _GLIBCXX_ASSERTIONS 1
 #endif
 
 // Disable std::string explicit instantiation declarations in order to assert.
diff --git a/libstdc++-v3/include/parallel/balanced_quicksort.h 
b/libstdc++-v3/include/parallel/balanced_quicksort.h
index 65dec30..9d09ed2 100644
--- a/libstdc++-v3/include/parallel/balanced_quicksort.h
+++ b/libstdc++-v3/include/parallel/balanced_quicksort.h
@@ -51,8 +51,11 @@
 #include 
 #include 
 
-#if _GLIBCXX_ASSERTIONS
+#if _GLIBCXX_PARALLEL_ASSERTIONS
 #include 
+#ifdef _GLIBCXX_HAVE_UNISTD_H
+#include 
+#endif
 #endif
 
 namespace __gnu_parallel
@@ -110,7 +113,7 @@ namespace __gnu_parallel
__median_of_three_iterators(__begin, __begin + (__end - __begin) / 2,
__end  - 1, __comp);
 
-#if defined(_GLIBCXX_ASSERTIONS)
+#if defined(_GLIBCXX_PARALLEL_ASSERTIONS)
   // Must be in between somewhere.
   _DifferenceType __n = __end - __begin;
 
@@ -147,7 +150,7 @@ namespace __gnu_parallel
   std::iter_swap(__begin + __split_pos, __pivot_pos);
   __pivot_pos = __begin + __split_pos;
 
-#if _GLIBCXX_ASSERTIONS
+#if _GLIBCXX_PARALLEL_ASSERTIONS
   _RAIter __r;
   for (__r = __begin; __r != __pivot_pos; ++__r)
_GLIBCXX_PARALLEL_ASSERT(__comp(*__r, *__pivot_pos));
@@ -194,7 +197,7 @@ namespace __gnu_parallel
   _DifferenceType __split_pos =
__qsb_divide(__begin, __end, __comp, __num_threads);
 
-#if _GLIBCXX_ASSERTIONS
+#if _GLIBCXX_PARALLEL_ASSERTIONS
   _GLIBCXX_PARALLEL_ASSERT(0 <= __split_pos &&
__split_pos < (__end - __begin));
 #endif
@@ -267,7 +270,7 @@ namespace __gnu_parallel
   _Piece __current = __tl._M_initial;
 
   _DifferenceType __elements_done = 0;
-#if _GLIBCXX_ASSERTIONS
+#if _GLIBCXX_PARALLEL_ASSERTIONS
   _DifferenceType __total_elements_done = 0;
 #endif
 
@@ -297,7 +300,7 @@ namespace __gnu_parallel
 __pred);
 
   // Left side: < __pivot_pos; __right side: >= __pivot_pos.
-#if _GLIBCXX_ASSERTIONS
+#if _GLIBCXX_PARALLEL_ASSERTIONS
   _GLIBCXX_PARALLEL_ASSERT(__begin <= __split_pos1
&& __split_pos1 < __end);

[patch] Make basic_string::replace forward to different overload

2016-07-13 Thread Jonathan Wakely

Currently the overload of basic_string::replace taking an
initializer_list forwards to the overload taking two const _Char*
arguments, which does some debug checks, then forwards to the overload
taking a const _Char* and a size_type, which repeats the debug checks.

The initializer_list overload can skip the first step and forward
straight to the second one.

Tested x86_64-linux, committed to trunk.

commit 85653bbebe9545a320ac38a664bec15f1a407150
Author: Jonathan Wakely 
Date:   Wed Jul 13 17:36:57 2016 +0100

Make basic_string::replace forward to different overload

* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI]
(replace(__const_iterator, __const_iterator, initializer_list)):
Forward to different overload.

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 374c985..60e1dbf 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -1912,7 +1912,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   */
   basic_string& replace(const_iterator __i1, const_iterator __i2,
initializer_list<_CharT> __l)
-  { return this->replace(__i1, __i2, __l.begin(), __l.end()); }
+  { return this->replace(__i1, __i2, __l.begin(), __l.size()); }
 #endif // C++11
 
 private:


[patch] Make __allocated_ptr::_S_raw_ptr static

2016-07-13 Thread Jonathan Wakely
These functions should have been static members all along.

tested x86_64-linux, committed to trunk.

commit 421ccb84e555efcd7e3e0123dc85dd40155f1e82
Author: Jonathan Wakely 
Date:   Wed Jul 13 17:34:04 2016 +0100

Make __allocated_ptr::_S_raw_ptr static

* include/bits/allocated_ptr.h (__allocated_ptr::_S_raw_ptr): Make
static.

diff --git a/libstdc++-v3/include/bits/allocated_ptr.h 
b/libstdc++-v3/include/bits/allocated_ptr.h
index 2bd9d82..97e2666 100644
--- a/libstdc++-v3/include/bits/allocated_ptr.h
+++ b/libstdc++-v3/include/bits/allocated_ptr.h
@@ -85,10 +85,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   value_type* get() { return _S_raw_ptr(_M_ptr); }
 
 private:
-  value_type* _S_raw_ptr(value_type* __ptr) { return __ptr; }
+  static value_type* _S_raw_ptr(value_type* __ptr) { return __ptr; }
 
   template
-   auto _S_raw_ptr(_Ptr __ptr) -> decltype(_S_raw_ptr(__ptr.operator->()))
+   static auto
+   _S_raw_ptr(_Ptr __ptr) -> decltype(_S_raw_ptr(__ptr.operator->()))
{ return _S_raw_ptr(__ptr.operator->()); }
 
   _Alloc* _M_alloc;


Re: [PATCH] Amend dump expectation in slsr-8.c (PR, tree-optimization/71490)

2016-07-13 Thread Jeff Law

On 07/13/2016 08:47 AM, Martin Liška wrote:

Hello.

As mentioned in [1], one slsr transformation is gone, thus we need to change 
expected number
of multiplications.

Ready to be installed?
Thanks,
Martin

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71490#c5
Isn't that a code quality regression?  So instead shouldn't we be 
keeping the same expectation, but xfailing the test?


jeff


Re: [PATCH, DOC] Enhance documentation of -fipa-ra option.

2016-07-13 Thread Alexander Monakov
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -7260,7 +7260,9 @@ any called function.  In that case it is not necessary 
> to save and restore
>  them around calls.  This is only possible if called functions are part of
>  same compilation unit as current function and they are compiled before it.
>  
> -Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.
> +Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}, however the 
> option
> +is disabled if profiler is active (@option{-p}, @option{-pg} or

I think this should say "if generated code will be instrumented for profiling"
(or "is instrumented") instead of "if profiler is active".  Internal comments
can be fuzzy, but user-facing documentation should be more rigorous.

> +@option{-fprofile})

Right now option -fprofile is not documented, so it's probably not ok to
mention it here (I realize it won't be so if you document it as an alias).

> or a port does not emit prologue and epilogue as RTL.

May I suggest "or if callee's register usage cannot be known exactly (this
happens on targets that do not expose prologues and epilogues in RTL)"?

(well, this is still not 100% helpful to the user because they can't easily know
which targets do, but still a bit of an improvement)

Thanks for bringing this forward!  The bit about profiling is especially not
obvious and nice to have documented.

Alexander


[PATCH, DOC] Enhance documentation of -fipa-ra option.

2016-07-13 Thread Martin Liška
Hello.

I accidentally learned that -fipa-ra option is disabled if:

 /* Do not use IPA optimizations for register allocation if profiler is active
or port does not emit prologue and epilogue as RTL.  */
  if (profile_flag || !targetm.have_prologue () || !targetm.have_epilogue ())
flag_ipa_ra = 0;

So I basically copied the comment to invoke.texi.

Apart from that, I've noticed that '-fprofile' (which is alias of '-p') is not 
documented.
Shouldn't it be documented as an alias?

Thanks,
Martin
>From e43eb357b986234e0454e6f6898ef5ae46a7a596 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 13 Jul 2016 18:25:09 +0200
Subject: [PATCH] Enhance documentation of -fipa-ra option.

gcc/ChangeLog:

2016-07-13  Martin Liska  

	* doc/invoke.texi (-fipa-ra): Document when the option is
	disabled. Fix a typo.
---
 gcc/doc/invoke.texi | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9a4db38..647caae 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7260,7 +7260,9 @@ any called function.  In that case it is not necessary to save and restore
 them around calls.  This is only possible if called functions are part of
 same compilation unit as current function and they are compiled before it.
 
-Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.
+Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}, however the option
+is disabled if profiler is active (@option{-p}, @option{-pg} or
+@option{-fprofile}) or a port does not emit prologue and epilogue as RTL.
 
 @item -fconserve-stack
 @opindex fconserve-stack
@@ -7280,7 +7282,7 @@ Perform code hoisting.  Code hoisting tries to move the
 evaluation of expressions executed on all paths to the function exit
 as early as possible.  This is especially useful as a code size
 optimization, but it often helps for code speed as well.
-This flag is enabled by defailt at @option{-O2} and higher.
+This flag is enabled by default at @option{-O2} and higher.
 
 @item -ftree-pre
 @opindex ftree-pre
-- 
2.8.4



[PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP

2016-07-13 Thread Kyrill Tkachov

Hi all,

The most common way to load and store TImode value in aarch64 is to perform an 
LDP/STP of two X-registers.
This is the *movti_aarch64 pattern in aarch64.md.
There is a bug in the logic in aarch64_classify_address where it validates the 
offset in the address used
to load a TImode value. It passes down TImode to the 
aarch64_offset_7bit_signed_scaled_p check which rejects
offsets that are not a multiple of the mode size of TImode (16). However, this 
is too conservative as X-reg LDP/STP
instructions accept immediate offsets that are a multiple of 8.

Also, considering that the definition of aarch64_offset_7bit_signed_scaled_p is:
  return (offset >= -64 * GET_MODE_SIZE (mode)
  && offset < 64 * GET_MODE_SIZE (mode)
  && offset % GET_MODE_SIZE (mode) == 0);

I think the range check may even be wrong for TImode as this will accept 
offsets in the range [-1024, 1024)
(as long as they are a multiple of 16)
whereas X-reg LDP/STP instructions only accept offsets in the range [-512, 512).
So since the check is for an X-reg LDP/STP address we should be passing down 
DImode.

This patch does that and enables more aggressive generation of REG+IMM 
addressing modes for 64-bit aligned
TImode values, eliminating many address calculation instructions.
For the testcase in the patch we currently generate:
bar:
add x1, x1, 8
add x0, x0, 8
ldp x2, x3, [x1]
stp x2, x3, [x0]
ret

whereas with this patch we generate:
bar:
ldp x2, x3, [x1, 8]
stp x2, x3, [x0, 8]
ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-07-13  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_classify_address): Use DImode when
performing aarch64_offset_7bit_signed_scaled_p check for TImode LDP/STP
addresses.

2016-07-13  Kyrylo Tkachov  

* gcc.target/aarch64/ldp_stp_unaligned_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bea67f88b900be39b6f1ae002353b44c5a4a9f7d..8fd93a54c54ab86c6e600afba48fa441101b57c7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4033,9 +4033,11 @@ aarch64_classify_address (struct aarch64_address_info *info,
 	 X,X: 7-bit signed scaled offset
 	 Q:   9-bit signed offset
 	 We conservatively require an offset representable in either mode.
-	   */
+	 When performing the check for pairs of X registers i.e.  LDP/STP
+	 pass down DImode since that is the natural size of the LDP/STP
+	 instruction memory accesses.  */
 	  if (mode == TImode || mode == TFmode)
-	return (aarch64_offset_7bit_signed_scaled_p (mode, offset)
+	return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
 		&& offset_9bit_signed_unscaled_p (mode, offset));
 
 	  /* A 7bit offset check because OImode will emit a ldp/stp
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
new file mode 100644
index ..a70f92100fb91bcfdcfd4af1cab6f58915038568
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
@@ -0,0 +1,20 @@
+/* { dg-options "-O2" } */
+
+/* Check that we can use a REG + IMM addressing mode when moving an unaligned
+   TImode value to and from memory.  */
+
+struct foo
+{
+  long long b;
+  __int128 a;
+} __attribute__ ((packed));
+
+void
+bar (struct foo *p, struct foo *q)
+{
+  p->a = q->a;
+}
+
+/* { dg-final { scan-assembler-not "add\tx\[0-9\]+, x\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
+/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */


Re: [PATCH GCC]Improve loop-niter to handle possible infinite loop.

2016-07-13 Thread Bin.Cheng
On Fri, Jul 1, 2016 at 11:33 AM, Richard Biener
 wrote:
> On Tue, Jun 28, 2016 at 8:18 AM, Bin Cheng  wrote:
>> Hi,
>> At the moment, loop niter analyzer depends on simple_iv to understand 
>> control induction variable in order to do further niter analysis.  For cases 
>> reported in PR57558 (comment #4), the control variable is not an SCEV 
>> because it's converted from an smaller type and that type could 
>> overflow/wrap.  As a result, niter information for such loops won't be 
>> analyzed because number_of_iterations_exit exits immediately once simple_iv 
>> fails.  As a matter of fact, niter analyzer can be improved by computing an 
>> additional assumption under which the loop won't be infinite (or the 
>> corresponding iv doesn't overflow).  This patch does so by changing both 
>> scev and niter analyzer.  It introduces a new interface 
>> simple_iv_with_niters which is used in niter analyzer.  The new interface 
>> returns an IV as well as a possible niter expression, the expression 
>> indicates the maximum number the IV can iterate before the returned 
>> simple_iv becoming invalid.  Given below example:
>>
>>   short unsigned int i;
>>   int _1;
>>
>>   :
>>   goto ;
>>
>>   :
>>   arr[_1] = -1;
>>   i_6 = i_2 + 1;
>>
>>   :
>>   # i_2 = PHI <1(2), i_6(3)>
>>   _1 = (int) i_2;
>>   if (_1 != 199)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>   return;
>>
>> When analyzing variable "_1", the old interface simple_iv returns false, 
>> while the new interface returns <{1, 1}_loop, 65534>.  It means "_1" is a 
>> valid SCEV as long as it doesn't iterate more than 65534 times.
>> This patch also introduces a new interface in niter analyzer 
>> number_of_iterations_exit_assumptions.  The new interface further derives 
>> assumption from the result of simple_iv_with_niters, and the assumption can 
>> be used by other optimizers.  As for this loop, it's an unexpected gain 
>> because assumption (198 < 65534) is naturally TRUE.
>> For loops that could be infinite, the assumption will be an expression.  
>> This improvement is still good, for example, the next patch to will 
>> vectorize such loops with help of this patch.
>>
>> This patch also changes how assumptions is handled in niter analyzer.  At 
>> the moment, (non-true) assumptions are not supported unless 
>> -funsafe-loop-optimizations are specified, after this patch, the new 
>> interface exposes assumptions to niter user and let them make their own 
>> decision.  In effect, this patch discards -funsafe-loop-optimizations on 
>> GIMPLE level, which we think is not a very useful option anyway.  Next patch 
>> will xfails tests for this option.  Well, the option is not totally 
>> discarded because it requires RTL changes too.  I will follow up that after 
>> gimple part change.
>>
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> Please make simple_iv_with_niters accept NULL as iv_niters and pass
> NULL from simple_iv to avoid useless work.
>
> You have chosen to make the flags per loop instead of say, flags on
> the global loop state.  The patch itself doesn't set the flag
> on any loop thus it doesn't really belong into this patch IMHO, so
> maybe you can split it out.  We do already have a plethora
> of "flags" (badly packed) in struct loop and while I see the interface
> is sth very specific adding another 4 bytes doesn't look
> too nice here (but maybe we don't care, struct loop isn't allocated
> that much hopefully).  I'd like to see a better comment
> before the flags part in cfgloop.h though noting that those are not
> flags computed by the compiler but flags that are set
> by the consumer that affect semantics of certain (document which)
> APIs.  And that consumers are expected to re-set
> flags back!  (failing to do that might be a source of hard to track down 
> bugs?)
>
> Anyway, the _assumtions part of the patch is ok with the suggested change.
>
Hi Richard,
According to your suggestion, I split the original patch into two, and
this is the first part.  It improves scalar evolution and loop niter
analyzer so that (possible) infinite loop can be handled.  As a bonus
point, it also picks up normal loops which were missed before, for
example, loop in the added test.

Bootstrap and test on x86_64 ongoing, Is it OK?

Thanks,
bin

2016-07-13  Bin Cheng  

* tree-scalar-evolution.c (simple_iv_with_niters): New funcion.
(derive_simple_iv_with_niters): New function.
(simple_iv): Rewrite using simple_iv_with_niters.
* tree-scalar-evolution.h (simple_iv_with_niters): New decl.
* tree-ssa-loop-niter.c (number_of_iterations_exit_assumptions): New
function.
(number_of_iterations_exit): Rewrite using above function.
* tree-ssa-loop-niter.h (number_of_iterations_exit_assumptions): New
Decl.

gcc/testsuite/ChangeLog
2016-07-13  Bin Cheng  

* gcc.dg/tree-ssa/loop-41.c: New test.
diff --git 

[PATCH test]Refine test string for scev-8.c

2016-07-13 Thread Bin Cheng
Hi,
Dump information of IVOPT has been improved, while scanning string in 
gcc.dg/tree-ssa/scev-8.c was left over because it appears as PASS.  This patch 
changes test string to what should be tested.
Test result checked.  This is an obvious change.

Thanks,
bin

gcc/testsuite/ChangeLog
2016-07-12  Bin Cheng  

* gcc.dg/tree-ssa/scev-8.c: Update test string.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-8.c 
b/gcc/testsuite/gcc.dg/tree-ssa/scev-8.c
index 766f674..bb2ee7a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-8.c
@@ -59,4 +59,4 @@ foo4 (unsigned char s, unsigned char l)
 }
 
 /* Address of array references are not scevs.  */
-/* { dg-final { scan-tree-dump-not "use \[0-9\]\n  address" "ivopts" } } */
+/* { dg-final { scan-tree-dump-not "  Type:\\tADDRESS\n  Use \[0-9\].\[0-9\]:" 
"ivopts" } } */


Re: [PATCH] Add non-const overload of std::string::data()

2016-07-13 Thread Jonathan Wakely

On 13/07/16 15:59 +0100, Jonathan Wakely wrote:

On 13/07/16 12:12 +0100, Jonathan Wakely wrote:

Also fix confusion between pointer and _CharT*, so that allocators with
fancy pointers work correctly.

The _M_data() function returns pointer, but we were using it where
_CharT* was required. This introduces a new _M_c_str() function which
returns a _CharT* instead, and uses that for c_str() and both
overloads of data().

Most places using _M_data() can just use data() instead, as they only
need a const _CharT*. A few need to use _M_c_str() to get _CharT* so
they can write to it (they could use the new data() overload, except
that it's not defined until C++17).

Tested x86_64-linux, committed to trunk.

* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI] (_M_c_str):
New function.


I forgot to add the new exports to the commit, new patch coming
shortly...


I'm reverting that change for now. We need to regenerate baseline
symbols files before we can bump the shared lib version and add new
symbols.




Re: [PATCH, ARM 5/7] Add support for MOVT/MOVW to ARMv8-M Baseline

2016-07-13 Thread Thomas Preudhomme
On Wednesday 13 July 2016 17:14:52 Christophe Lyon wrote:
> Hi Thomas,

Hi Christophe,

> 
> I'm seeing:
> gcc.target/arm/pr42574.c: syntax error in target selector
> "arm_thumb1_ok && { ! arm_thumb1_movt_ok }" for " dg-do 1 compile {
> arm_thumb1_ok && { ! arm_thumb1_movt_ok } } "

Oops. I remember the trial and error to find the right amount of curly braces 
yet I can indeed reproduce the error now. The target keyword is missing. I'll 
submit a patch asap.

Best regards,

Thomas


Re: Fix PR44281 (bad RA with global regs)

2016-07-13 Thread Dominik Vogt
On Fri, Feb 19, 2016 at 11:03:02PM +0100, Bernd Schmidt wrote:
> In this PR, we generate unnecessarily bad code for code that
> declares a global register var. Since global regs get added to
> fixed_regs, IRA never considers them as candidates. However, we do
> seem to have proper data flow information for them. In the testcase,
> the global reg dies, some operations are done on temporary results,
> and the final result stored back in the global reg. We can achieve
> the desired code generation by reusing the global reg for those
> temporaries.
> 
> Bootstrapped and tested on x86_64-linux. Ok? An argument could be
> made not to use this for gcc-6 since global register vars are both
> not very important and not very well represented in the testsuite.

>   PR rtl-optimization/44281
>   * hard-reg-set.h (struct target_hard_regs): New field
>   x_fixed_nonglobal_reg_set.
>   (fixed_nonglobal_reg_set): New macro.
>   * reginfo.c (init_reg_sets_1): Initialize it.
>   * ira.c (setup_alloc_regs): Use fixed_nonglobal_reg_set instead
>   of fixed_reg_set.
> 
>   PR rtl-optimization/44281
>   * gcc.target/i386/pr44281.c: New test.

Unfortunately this patch (or whatever got actually committed) has
broken the gcc.target/s390/pr679443.c test case, which is a bit
fishy (see code snippet below).  I assign most registers to global
variables and then use some complicated arithmetics with the goal
that the pointer stored in the first argument gets saved on the
stack and reloaded to a different register.  Before this patch the
test case just needed three registers to do its work (r2, r3, r4).
With the patch it currently causes an error in the reload pass

  error: unable to find a register to spill

-- snip --
/* Block all registers except the first three argument registers.  */
register long r0 asm ("r0");
register long r1 asm ("r1");
register long r5 asm ("r5");
register long r6 asm ("r6");
register long r7 asm ("r7");
register long r8 asm ("r8");
register long r9 asm ("r9");
register long r10 asm ("r10");
register long r11 asm ("r11");

...

void foo (struct s_t *ps, int c, int i)
{
  /* Uses r2 as address register.  */
  ps->f1 = c;
  /* The calculation of the value is so expensive that it's cheaper to spill ps
 to the stack and reload it later (into a different register).
 ==> Uses r4 as address register.*/
  ps->f2 = i + i % 3;
  /* If dead store elimination fails to detect that the address in r2 during
 the first assignment is an alias of the address in r4 during the second
 assignment, it eliminates the first assignment and the f1 field is not
 written (bug).  */
}
-- snip --

If a fourth register is available, the ICE goes away, but the
pointer remains in r2, rendering the test case useless.

So, what should be done about this?

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH] Fix PR24574

2016-07-13 Thread Martin Liška
On 07/13/2016 04:15 PM, Richard Biener wrote:
> so not exactly a 50/50 prediction.  It is predicted by early return
> predictor it seems.  If I make the predictor not apply we predict
> it as 50/50 chance.  Hmm.  Honza?

Yes, following test-case eliminates the early return predictor:

int foo(int x, int *v)
{
  if (x != 0)
x = x / 10;

   if (v)
 *v += 123;

   return x;
}

;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
;;prev block 0, next block 3, flags: (NEW, REACHABLE)
;;pred:   ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  if (x_5(D) != 0)
goto ;
  else
goto ;
;;succ:   3 [50.0%]  (TRUE_VALUE,EXECUTABLE)
;;4 [50.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, count 0, freq 5000, maybe hot
;;prev block 2, next block 4, flags: (NEW, REACHABLE)
;;pred:   2 [50.0%]  (TRUE_VALUE,EXECUTABLE)
  x_6 = x_5(D) / 10;

and there's not predictor:

Predictions for bb 2
  no prediction heuristics: 50.0%
  combined heuristics: 50.0%

Martin


Re: [PATCH, ARM 5/7] Add support for MOVT/MOVW to ARMv8-M Baseline

2016-07-13 Thread Christophe Lyon
Hi Thomas,


On 13 July 2016 at 12:01, Kyrill Tkachov  wrote:
> Hi Thomas,
>
>
> On 07/07/16 15:32, Thomas Preudhomme wrote:
>>
>> Hi Kyrill,
>>
>> Please find an updated version in attachment. Please note I made quite a
>> few
>> other changes as well. The most important one was to add new ARMv8-M
>> Baseline
>> only alternatives to the two movt insns in order to have non predicable
>> output
>> template for ARMv8-M Baseline. I also inverted the logic for how to detect
>> ARMv8-M Baseline in the testsuite (arm_thumb1_movt_ok rather than *_ko)
>> and
>> fixed a couple of typos. I didn't add MOVT test because it is difficult to
>> generate. Using -mslow-flash-data should help but it is not available for
>> ARMv8-M Baseline as only config/arm/thumb2.md was modified to support this
>> option.
>>
>> Update ChangeLog entries are as follow:
>>
>>
>> *** gcc/ChangeLog ***
>>
>> 2016-06-20  Thomas Preud'homme  
>>
>>  * config/arm/arm.h (TARGET_HAVE_MOVT): Include ARMv8-M as having
>> MOVT.
>>  * config/arm/arm.c (arm_arch_name): (const_ok_for_op): Check
>> MOVT/MOVW
>>  availability with TARGET_HAVE_MOVT.
>>  (thumb_legitimate_constant_p): Strip the high part of a
>> label_ref.
>>  (thumb1_rtx_costs): Also return 0 if setting a half word constant
>> and
>>  MOVW is available and replace (unsigned HOST_WIDE_INT) INTVAL by
>>  UINTVAL.
>>  (thumb1_size_rtx_costs): Make set of half word constant also cost
>> 1
>>  extra instruction if MOVW is available.  Use a cost variable
>>  incremented by COSTS_N_INSNS (1) when the condition match rather
>> than
>>  returning an arithmetic expression based on COSTS_N_INSNS.  Make
>>  constant with bottom half word zero cost 2 instruction if MOVW is
>>  available.
>>  * config/arm/arm.md (define_attr "arch"): Add v8mb.
>>  (define_attr "arch_enabled"): Set to yes if arch value is v8mb
>> and
>>  target is ARMv8-M Baseline.
>>  (arm_movt): New unpredicable alternative for ARMv8-M Baseline.
>>  (arm_movtas_ze): Likewise.
>>  * config/arm/thumb1.md (thumb1_movdi_insn): Add ARMv8-M Baseline
>> only
>>  alternative for constants satisfying j constraint.
>>  (thumb1_movsi_insn): Likewise.
>>  (movsi splitter for K alternative): Tighten condition to not
>> trigger
>>  if movt is available and j constraint is satisfied.
>>  (Pe immediate splitter): Likewise.
>>  (thumb1_movhi_insn): Add ARMv8-M Baseline only alternative for
>>  constant fitting in an halfword to use MOVW.
>>  * doc/sourcebuild.texi (arm_thumb1_movt_ok): Document new ARM
>>  effective target.
>>
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2016-05-23  Thomas Preud'homme  
>>
>>  * lib/target-supports.exp
>> (check_effective_target_arm_thumb1_movt_ok):
>>  Define effective target.
>>  * gcc.target/arm/pr42574.c: Require arm_thumb1_ok and
>>  !arm_thumb1_movt_ok to exclude ARMv8-M Baseline.
>>  * gcc.target/arm/movhi_movw.c: New test.
>>  * gcc.target/arm/movsi_movw.c: Likewise.
>>  * gcc.target/arm/movdi_movw.c: Likewise.
>
>
> Ok.
> Thanks,
> Kyrill
>

I'm seeing:
gcc.target/arm/pr42574.c: syntax error in target selector
"arm_thumb1_ok && { ! arm_thumb1_movt_ok }" for " dg-do 1 compile {
arm_thumb1_ok && { ! arm_thumb1_movt_ok } } "

Can you fix it?

Thanks,

Christophe

>
>>
>> Best regards,
>>
>> Thomas
>>
>> On Friday 20 May 2016 12:14:44 Kyrill Tkachov wrote:
>>>
>>> Hi Thomas,
>>>
>>> On 19/05/16 17:11, Thomas Preudhomme wrote:

 On Wednesday 18 May 2016 12:30:41 Kyrill Tkachov wrote:
>
> Hi Thomas,
>
> This looks mostly good with a few nits inline.
> Please repost with the comments addressed.

 Updated ChangeLog entries:

 *** gcc/ChangeLog ***

 2016-05-18  Thomas Preud'homme  

   * config/arm/arm.h (TARGET_HAVE_MOVT): Include ARMv8-M as
 having
   MOVT.
   * config/arm/arm.c (arm_arch_name): (const_ok_for_op): Check
   MOVT/MOVW
   availability with TARGET_HAVE_MOVT.
   (thumb_legitimate_constant_p): Strip the high part of a
   label_ref.
   (thumb1_rtx_costs): Also return 0 if setting a half word
 constant
   and
   MOVW is available and replace (unsigned HOST_WIDE_INT) INTVAL
 by
   UINTVAL.
   (thumb1_size_rtx_costs): Make set of half word constant also
 cost
   1
   extra instruction if MOVW is available.  Use a cost variable
   incremented by COSTS_N_INSNS (1) when the condition match
 rather
   than
   returning an arithmetic expression based on 

Re: [PATCH GCC]Resolve compilation time known alias checks in vectorizer

2016-07-13 Thread Bin.Cheng
On Fri, Jul 1, 2016 at 1:06 PM, Richard Biener
 wrote:
> On Tue, Jun 28, 2016 at 8:18 AM, Bin.Cheng  wrote:
>> On Tue, Jun 14, 2016 at 1:57 PM, Richard Biener
>>  wrote:
>>> On Mon, Jun 13, 2016 at 12:01 PM, Bin Cheng  wrote:
 Hi,
 GCC vectorizer generates many unnecessary runtime alias checks known at 
 compilation time.  For some data-reference pairs, alias relation can be 
 resolved at compilation time, in this case, we can simply skip the alias 
 check.  For some other data-reference pairs, alias relation can be 
 realized at compilation time, in this case, we should return false and 
 simply skip vectorizing the loop.  For the second case, the corresponding 
 loop is vectorized for nothing because the guard alias condition is bound 
 to false anyway.  Vectorizing it not only wastes compilation time, but 
 also slows down generated code because GCC fails to resolve these "false" 
 alias check after vectorization.  Even in cases it can be resolved (by 
 VRP), GCC fails to cleanup all the mess generated in loop versioning.
 This looks like a common issue in spec2k6.  For example, in 
 434.zeusmp/ggen.f, there are three loops vectorized but never executed; in 
 464.h264ref, there are loops in which all runtime alias checks are 
 resolved at compilation time thus loop versioning is proven unnecessary.  
 Statistic data also shows that about >100 loops are falsely vectorized 
 currently in my build of spec2k6.

 This patch is based on 
 https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00399.html, bootstrap and 
 test on x86_64 and AArch64 (ongoing), is it OK?
>>>
>>> This is PR71060 and PR65206?
>>>
>>> Rather than patching up vect_prune_runtime_alias_test_list to handle
>>> runtime known _not_ aliasing (does that happen?  for one of the
>>> testcases?) I'd patch vect_analyze_data_ref_dependence for that case.
>>> Yes, we can have cases where the runtime check generated
>>> is imprecise enough that it is proved to always alias, thus handling
>>> these cases seems fine (in which case handling the other is
>>> trivial).
>>>
>>> So I'm generally fine with the patch but please check the above PRs
>>> and eventually add testcases from them.
>> Hi,
>> The two PRs you mentioned is the case which is proved to always alias.
>> Before this patch, the loop is vectorized, alias check is generated
>> and then simplified into false, at last, the versioned loop gets
>> deleted.  With this patch, it proves always alias earlier and we won't
>> do the useless versioning any more.  And I checked spec, there are
>> quite a lot compile time no-alias cases.
>>
>> Here is the updated patch wrto your comments.  Is it OK?
>
> Ok.

Patch re-tested/applied on trunk as r238301.  As I mentioned
previously, gcc.dg/vect/vect-mask-store-move-1.c fails now because of
PR65206.

Thanks,
bin


Re: [PATCH] Add non-const overload of std::string::data()

2016-07-13 Thread Jonathan Wakely

On 13/07/16 12:12 +0100, Jonathan Wakely wrote:

Also fix confusion between pointer and _CharT*, so that allocators with
fancy pointers work correctly.

The _M_data() function returns pointer, but we were using it where
_CharT* was required. This introduces a new _M_c_str() function which
returns a _CharT* instead, and uses that for c_str() and both
overloads of data().

Most places using _M_data() can just use data() instead, as they only
need a const _CharT*. A few need to use _M_c_str() to get _CharT* so
they can write to it (they could use the new data() overload, except
that it's not defined until C++17).

Tested x86_64-linux, committed to trunk.

* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI] (_M_c_str):
New function.


I forgot to add the new exports to the commit, new patch coming
shortly...


[PATCH] Amend dump expectation in slsr-8.c (PR, tree-optimization/71490)

2016-07-13 Thread Martin Liška
Hello.

As mentioned in [1], one slsr transformation is gone, thus we need to change 
expected number
of multiplications.

Ready to be installed?
Thanks,
Martin

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71490#c5
>From ac130165f6c8166c28227fec2a6fa3afbccadb27 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 13 Jul 2016 16:39:28 +0200
Subject: [PATCH] Amend dump expectation in slsr-8.c (PR
 tree-optimization/71490)

gcc/testsuite/ChangeLog:

2016-07-13  Martin Liska  

	* gcc.dg/tree-ssa/slsr-8.c: Amend dump expectation.
---
 gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c b/gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
index 2bd60aa..47b644b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
@@ -18,6 +18,6 @@ f (int s, int *c)
 }
 
 /* There are 4 ' * ' instances in the decls (since "int * iftmp.0;" is
-   added), 1 parm, 2 in the code.  The second one in the code can be
+   added), 1 parm, 4 in the code.  The second one in the code can be
a widening mult.  */
-/* { dg-final { scan-tree-dump-times " w?\\* " 7 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " w?\\* " 9 "optimized" } } */
-- 
2.8.4



Re: [PATCH] disable IPA-cp cloning for functions with target_clones attribute

2016-07-13 Thread Jeff Law

On 06/24/2016 02:41 PM, Evgeny Stupachenko wrote:

Hi,

Fix ICE when IPA-cp and target_clones are applied to the same function.
Is the patch ok for trunk?

Thanks,
Evgeny

2016-06-24  Evgeny Stupachenko  

gcc/
* ipa-cp.c (determine_versionability): Do not create constprop clones,
when target_clones attribute is set.

OK.  As Martin suggested, please backport to the gcc-6 branch.

Thanks,
Jeff



Re: [PATCH] disable IPA-cp cloning for functions with target_clones attribute

2016-07-13 Thread Jeff Law

On 07/12/2016 03:31 AM, Martin Jambor wrote:

Hi,

On Fri, Jun 24, 2016 at 01:41:05PM -0700, Evgeny Stupachenko wrote:

Hi,

Fix ICE when IPA-cp and target_clones are applied to the same function.
Is the patch ok for trunk?


I can't approve anything but since I wrote most of IPA-CP, it may
count that I am fine with the patch.

Martin, we can probably change that ;-)  Interested?

Jeff


Rework C/C++ OpenACC routine parsing (was: C/C++: Simplify handling of location information for OpenACC routine directives)

2016-07-13 Thread Thomas Schwinge
Hi!

On Wed, 13 Jul 2016 11:25:46 +0200, I wrote:
> Working on something else regarding the C/C++ OpenACC routine directive,
> I couldn't but untangle [...]

> (Another C/C++ OpenACC routine
> cleanup patch is emerging, depending on this one.)

Here it is; likewise, OK for trunk?  (Further cleanup especially of C++
OpenACC routine handling seems to be possible, but I want to synchronize
my work at this point.)

commit 0bd30acaf4dd634499b1c695ddee555e7675aa18
Author: Thomas Schwinge 
Date:   Thu Jun 23 13:28:09 2016 +0200

Rework C/C++ OpenACC routine parsing

gcc/c/
* c-parser.c (struct oacc_routine_data): Add error_seen and
fndecl_seen members.
(c_finish_oacc_routine): Use these.
(c_parser_declaration_or_fndef): Adjust.
(c_parser_oacc_routine): Likewise.  Support more C language
constructs, and improve diagnostics.  Move pragma context
checking...
(c_parser_pragma): ... here.
gcc/cp/
* parser.c (cp_ensure_no_oacc_routine): Improve diagnostics.
(cp_parser_late_parsing_cilk_simd_fn_info): Fix diagnostics.
(cp_parser_late_parsing_oacc_routine, cp_finalize_oacc_routine):
Simplify code, and improve diagnostics.
(cp_parser_oacc_routine): Likewise.  Move pragma context
checking...
(cp_parser_pragma): ... here.
gcc/testsuite/
* c-c++-common/goacc/routine-5.c: Update.
---
 gcc/c/c-parser.c | 161 +++---
 gcc/cp/parser.c  | 182 +++-
 gcc/testsuite/c-c++-common/goacc/routine-5.c | 199 +++
 3 files changed, 369 insertions(+), 173 deletions(-)

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 7f84ce9..809118a 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -1273,6 +1273,8 @@ enum c_parser_prec {
 
 /* Helper data structure for parsing #pragma acc routine.  */
 struct oacc_routine_data {
+  bool error_seen; /* Set if error has been reported.  */
+  bool fndecl_seen; /* Set if one fn decl/definition has been seen already.  */
   tree clauses;
   location_t loc;
 };
@@ -1565,8 +1567,7 @@ c_parser_external_declaration (c_parser *parser)
 }
 
 static void c_finish_omp_declare_simd (c_parser *, tree, tree, vec);
-static void c_finish_oacc_routine (struct oacc_routine_data *, tree, bool,
-  bool, bool);
+static void c_finish_oacc_routine (struct oacc_routine_data *, tree, bool);
 
 /* Parse a declaration or function definition (C90 6.5, 6.7.1, C99
6.7, 6.9.1).  If FNDEF_OK is true, a function definition is
@@ -1751,8 +1752,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
}
   c_parser_consume_token (parser);
   if (oacc_routine_data)
-   c_finish_oacc_routine (oacc_routine_data, NULL_TREE, false, true,
-  false);
+   c_finish_oacc_routine (oacc_routine_data, NULL_TREE, false);
   return;
 }
 
@@ -1850,7 +1850,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
   prefix_attrs = specs->attrs;
   all_prefix_attrs = prefix_attrs;
   specs->attrs = NULL_TREE;
-  for (bool first = true;; first = false)
+  while (true)
 {
   struct c_declarator *declarator;
   bool dummy = false;
@@ -1870,8 +1870,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
c_finish_omp_declare_simd (parser, NULL_TREE, NULL_TREE,
   omp_declare_simd_clauses);
  if (oacc_routine_data)
-   c_finish_oacc_routine (oacc_routine_data, NULL_TREE,
-  false, first, false);
+   c_finish_oacc_routine (oacc_routine_data, NULL_TREE, false);
  c_parser_skip_to_end_of_block_or_statement (parser);
  return;
}
@@ -1987,8 +1986,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
  finish_init ();
}
  if (oacc_routine_data)
-   c_finish_oacc_routine (oacc_routine_data, d,
-  false, first, false);
+   c_finish_oacc_routine (oacc_routine_data, d, false);
  if (d != error_mark_node)
{
  maybe_warn_string_init (init_loc, TREE_TYPE (d), init);
@@ -2033,8 +2031,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
temp_pop_parm_decls ();
}
  if (oacc_routine_data)
-   c_finish_oacc_routine (oacc_routine_data, d,
-  false, first, false);
+   c_finish_oacc_routine (oacc_routine_data, d, false);
  if (d)
finish_decl (d, UNKNOWN_LOCATION, NULL_TREE,
 NULL_TREE, asm_name);
@@ -2146,8 +2143,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
 

Re: [PATCH] Fix PR24574

2016-07-13 Thread Richard Biener
On Wed, 13 Jul 2016, Alexander Monakov wrote:

> Hi,
> 
> On Wed, 13 Jul 2016, Richard Biener wrote:
> > The following adds the ability to transform
> > 
> >  if (x != 0)
> >x = x / 10;
> > 
> > to
> > 
> >  x = x / 10;
> > 
> > as requested by PR.  Plus it adds some more ops where such transform
> > is possible.
> 
> In the bugzilla, you said,
> 
> > Only for -Os, it's better to avoid the expensive division otherwise.
> 
> But the patch seems to apply the transform irrespective of -Os; is that
> deliberate?  (fwiw I agree with your bugzilla comment above)

I said that but I decided I was wrong as the conditional block will
be predicted as more likely executed.  Also we were doing the transform
for other ops already.  The most simple testcase has

;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
;;prev block 0, next block 3, flags: (NEW, REACHABLE)
;;pred:   ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  if (xD.1750_2(D) != 0)
goto ;
  else
goto ;
;;succ:   3 [54.0%]  (TRUE_VALUE,EXECUTABLE)
;;4 [46.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, count 0, freq 5400, maybe hot
;;prev block 2, next block 4, flags: (NEW, REACHABLE)
;;pred:   2 [54.0%]  (TRUE_VALUE,EXECUTABLE)
  xD.1750_3 = xD.1750_2(D) / 10;

so not exactly a 50/50 prediction.  It is predicted by early return
predictor it seems.  If I make the predictor not apply we predict
it as 50/50 chance.  Hmm.  Honza?

Anyway, the phiopt transform is guarded with

  /* Size-wise, this is always profitable.  */
  if (optimize_bb_for_speed_p (cond_bb)
  /* The special case is useless if it has a low probability.  */
  && profile_status_for_fn (cfun) != PROFILE_ABSENT
  && EDGE_PRED (middle_bb, 0)->probability < PROB_EVEN
  /* If assign is cheap, there is no point avoiding it.  */
  && estimate_num_insns (assign, _time_weights)
 >= 3 * estimate_num_insns (cond, _time_weights))
return 0;

and thus it should be handled correctly already (division and modulo
have higher time weights).

Richard.


[PATCH] Fix PR33315, simple store sinking/commoning

2016-07-13 Thread Richard Biener

The following adds a simple ad-hoc (matching other code) way to
perform sinking and merging of common stores to a CFG merger
to the SSA code sinking pass.

On cc1-files (from the GCC 4.7 branch head) this performs
930 such merges out of which 366 are stores with the same value
(and thus require no PHI).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

The motivating testcase is that of PR33315 which we now manage to
optimize to three straight-line stores from a previously twisted
CFG maze.

There are a lot of enhancement possibilities but as for all the
rest of the sinking code in this pass a proper dataflow driven
approach is missing (like SSU-PRE ontop of which sinking can
be implemented similar to hoisting ontop of GVN-PRE).

Comments?

I'll give it overnight SPEC testing (but don't expect any surprises).

Thanks,
Richard.

2016-07-13  Richard Biener  

PR tree-optimization/33315
* tree-ssa-sink.c: Include tree-eh.h.
(sink_code_in_bb): Return TODO_cleanup_cfg if we commonized
and sunk stores.  Implement store commoning by sinking to
the successor.

* gcc.dg/tree-ssa/ssa-sink-13.c: New testcase.
* gcc.dg/tree-ssa/ssa-sink-14.c: Likewise.

Index: gcc/tree-ssa-sink.c
===
*** gcc/tree-ssa-sink.c (revision 238287)
--- gcc/tree-ssa-sink.c (working copy)
*** along with GCC; see the file COPYING3.
*** 35,40 
--- 35,41 
  #include "tree-cfg.h"
  #include "cfgloop.h"
  #include "params.h"
+ #include "tree-eh.h"
  
  /* TODO:
 1. Sinking store only using scalar promotion (IE without moving the RHS):
*** statement_sink_location (gimple *stmt, b
*** 460,466 
  
  /* Perform code sinking on BB */
  
! static void
  sink_code_in_bb (basic_block bb)
  {
basic_block son;
--- 461,467 
  
  /* Perform code sinking on BB */
  
! static unsigned
  sink_code_in_bb (basic_block bb)
  {
basic_block son;
*** sink_code_in_bb (basic_block bb)
*** 468,473 
--- 469,628 
edge_iterator ei;
edge e;
bool last = true;
+   unsigned todo = 0;
+ 
+   /* Very simplistic code to sink common stores from the predecessor through
+  our virtual PHI.  We do this before sinking stmts from BB as it might
+  expose sinking opportunities of the merged stores.
+  Once we have partial dead code elimination through sth like SSU-PRE this
+  should be moved there.  */
+   gphi *phi;
+   if (EDGE_COUNT (bb->preds) > 1
+   && (phi = get_virtual_phi (bb)))
+ {
+   /* Repeat until no more common stores are found.  */
+   while (1)
+   {
+ gimple *first_store = NULL;
+ auto_vec  vdefs;
+ 
+ /* Search for common stores defined by all virtual PHI args.
+???  Common stores not present in all predecessors could
+be handled by inserting a forwarder to sink to.  Generally
+this involves deciding which stores to do this for if
+multiple common stores are present for different sets of
+predecessors.  See PR11832 for an interesting case.  */
+ for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+   {
+ tree arg = gimple_phi_arg_def (phi, i);
+ gimple *def = SSA_NAME_DEF_STMT (arg);
+ if (! is_gimple_assign (def)
+ || stmt_can_throw_internal (def))
+   {
+ /* ???  We could handle some cascading with the def being
+another PHI.  We'd have to insert multiple PHIs for
+the rhs then though (if they are not all equal).  */
+ first_store = NULL;
+ break;
+   }
+ /* ???  Do not try to do anything fancy with aliasing, thus
+do not sink across non-aliased loads (or even stores,
+so different store order will make the sinking fail).  */
+ bool all_uses_on_phi = true;
+ imm_use_iterator iter;
+ use_operand_p use_p;
+ FOR_EACH_IMM_USE_FAST (use_p, iter, arg)
+   if (USE_STMT (use_p) != phi)
+ all_uses_on_phi = false;
+ if (! all_uses_on_phi)
+   {
+ first_store = NULL;
+ break;
+   }
+ if (! first_store)
+   first_store = def;
+ /* ??? We could handle differing SSA uses in the LHS by inserting
+PHIs for them.  */
+ else if (! operand_equal_p (gimple_assign_lhs (first_store),
+ gimple_assign_lhs (def), 0))
+   {
+ first_store = NULL;
+ break;
+   }
+ TREE_VISITED (arg) = 1;
+ vdefs.safe_push (arg);
+   }
+ if (! first_store)
+   break;
+ 
+ 

Re: [PATCH] Fix PR24574

2016-07-13 Thread Alexander Monakov
Hi,

On Wed, 13 Jul 2016, Richard Biener wrote:
> The following adds the ability to transform
> 
>  if (x != 0)
>x = x / 10;
> 
> to
> 
>  x = x / 10;
> 
> as requested by PR.  Plus it adds some more ops where such transform
> is possible.

In the bugzilla, you said,

> Only for -Os, it's better to avoid the expensive division otherwise.

But the patch seems to apply the transform irrespective of -Os; is that
deliberate?  (fwiw I agree with your bugzilla comment above)

Thanks.
Alexander


Re: [PATCH] Fix Fortran DO loop fallback

2016-07-13 Thread Martin Liška
On 07/13/2016 12:32 PM, Richard Biener wrote:
> Changelog has a swapped entry.
> 
> Ok with fixing that.
> Richard.

Fixed and installed as r238300.
M.


[PATCH] Fix PR24574

2016-07-13 Thread Richard Biener

The following adds the ability to transform

 if (x != 0)
   x = x / 10;

to

 x = x / 10;

as requested by PR.  Plus it adds some more ops where such transform
is possible.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-07-13  Richard Biener  

PR tree-optimization/24574
* tree-ssa-phiopt.c (absorbing_element_p): Pass in argument
position and add shift, rotate, divison and modulo support
for left zero.
(value_replacement): Pass in argument position to absorbing_element_p.

* gcc.dg/pr24574.c: New testcase.

Index: gcc/tree-ssa-phiopt.c
===
--- gcc/tree-ssa-phiopt.c   (revision 238242)
+++ gcc/tree-ssa-phiopt.c   (working copy)
@@ -812,7 +812,7 @@ neutral_element_p (tree_code code, tree
 /* Returns true if ARG is an absorbing element for operation CODE.  */
 
 static bool
-absorbing_element_p (tree_code code, tree arg)
+absorbing_element_p (tree_code code, tree arg, bool right)
 {
   switch (code)
 {
@@ -823,6 +823,21 @@ absorbing_element_p (tree_code code, tre
 case BIT_AND_EXPR:
   return integer_zerop (arg);
 
+case LSHIFT_EXPR:
+case RSHIFT_EXPR:
+case LROTATE_EXPR:
+case RROTATE_EXPR:
+case TRUNC_DIV_EXPR:
+case CEIL_DIV_EXPR:
+case FLOOR_DIV_EXPR:
+case ROUND_DIV_EXPR:
+case EXACT_DIV_EXPR:
+case TRUNC_MOD_EXPR:
+case CEIL_MOD_EXPR:
+case FLOOR_MOD_EXPR:
+case ROUND_MOD_EXPR:
+  return !right && integer_zerop (arg);
+
 default:
   return false;
 }
@@ -994,9 +1009,10 @@ value_replacement (basic_block cond_bb,
  && operand_equal_for_phi_arg_p (rhs1, cond_lhs)
  && neutral_element_p (code_def, cond_rhs, false))
  || (operand_equal_for_phi_arg_p (arg1, cond_rhs)
- && (operand_equal_for_phi_arg_p (rhs2, cond_lhs)
- || operand_equal_for_phi_arg_p (rhs1, cond_lhs))
- && absorbing_element_p (code_def, cond_rhs
+ && ((operand_equal_for_phi_arg_p (rhs2, cond_lhs)
+  && absorbing_element_p (code_def, cond_rhs, true))
+ || (operand_equal_for_phi_arg_p (rhs1, cond_lhs)
+ && absorbing_element_p (code_def, cond_rhs, false))
 {
   gsi = gsi_for_stmt (cond);
   if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
Index: gcc/testsuite/gcc.dg/pr24574.c
===
--- gcc/testsuite/gcc.dg/pr24574.c  (revision 0)
+++ gcc/testsuite/gcc.dg/pr24574.c  (working copy)
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-phiopt1" } */
+
+int f0(int i)
+{
+  if (i == 0) return 0;
+  return i/10;
+}
+int f1(int i)
+{
+  return i?i/10:0;
+}
+
+int f2(int i)
+{
+  if (i == 0) return 0;
+  return i%10;
+}
+int f3(int i)
+{
+  return i?i%10:0;
+}
+
+int f4(int i)
+{
+  if (i == 0) return 0;
+  return i<<10;
+}
+int f5(int i)
+{
+  return i?i<<10:0;
+}
+
+/* We should if-convert all functions to carry out the operation
+   unconditionally.  */
+/* { dg-final { scan-tree-dump-not "= PHI" "phiopt1" } } */


Re: [gomp4] backport fixes for PR71704

2016-07-13 Thread Thomas Schwinge
Hi!

On Tue, 12 Jul 2016 13:03:43 -0700, Cesar Philippidis  
wrote:
> This patch contains both Jakub's OpenMP and my OpenACC fixes for
> PR71704. For reference, the discussion for the original patches can be
> found here .
> 
> I'll apply this patch to gomp-4_0-branch shortly.

> 2016-07-12  Cesar Philippidis  
> 
>   Backport from trunk:
>   2016-07-08  Cesar Philippidis  
> [...]
>   2016-06-01  Jakub Jelinek  
> 
>   gcc/fortran/
>   * parse.c (case_decl): Move ST_OMP_* to ...
>   (case_omp_decl): ... here, new macro.
>   (verify_st_order): For case_omp_decl, complain about
>   p->state >= ORDER_EXEC, but don't change p->state otherwise.
> 
>   gcc/testsuite
>   * gfortran.dg/gomp/order-1.f90: New test.
>   * gfortran.dg/gomp/order-2.f90: New test.

;-) Incorrect ChangeLog updates; fixed in gomp-4_0-branch r238296:

commit fbdd5b5332e59166c990e10e41649756e9d3a897
Author: tschwinge 
Date:   Wed Jul 13 13:28:46 2016 +

Fix r238262 ChangeLog updates

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@238296 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/fortran/ChangeLog.gomp   | 17 -
 gcc/testsuite/ChangeLog.gomp |  5 ++---
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git gcc/fortran/ChangeLog.gomp gcc/fortran/ChangeLog.gomp
index 4c51452..2532abc 100644
--- gcc/fortran/ChangeLog.gomp
+++ gcc/fortran/ChangeLog.gomp
@@ -9,12 +9,19 @@
ST_GET_FCN_CHARACTERISTICS if a non-declarative directive could be
matched.
 
-   2016-06-01  Jakub Jelinek  
+   2016-06-30  Jakub Jelinek  
 
-   * parse.c (case_decl): Move ST_OMP_* to ...
-   (case_omp_decl): ... here, new macro.
-   (verify_st_order): For case_omp_decl, complain about
-   p->state >= ORDER_EXEC, but don't change p->state otherwise.
+   * parse.c (matchs, matcho): Move right before decode_omp_directive.
+   If spec_only, only gfc_match the keyword and if successful, goto
+   do_spec_only.
+   (matchds, matchdo): Define.
+   (decode_omp_directive): Add spec_only local var and set it.
+   Use matchds or matchdo macros instead of matchs or matcho
+   for declare target, declare simd, declare reduction and threadprivate
+   directives.  Return ST_GET_FCN_CHARACTERISTICS if a non-declarative
+   directive could be matched.
+   (next_statement): For ST_GET_FCN_CHARACTERISTICS restore
+   gfc_current_locus from old_locus even if there is no label.
 
 2016-07-01  Cesar Philippidis  
 
diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index d731dc6..d31ff07 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -5,10 +5,9 @@
 
* gfortran.dg/goacc/pr71704.f90: New test.
 
-   2016-06-01  Jakub Jelinek  
+   2016-06-30  Jakub Jelinek  
 
-   * gfortran.dg/gomp/order-1.f90: New test.
-   * gfortran.dg/gomp/order-2.f90: New test.
+   * gfortran.dg/gomp/pr71704.f90: New test.
 
 2016-06-10  Thomas Schwinge  
 


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: Fix loop_only_exit_p

2016-07-13 Thread Jan Hubicka
> Ah, ok.  I dove into the code and that indeed seems to be the case.
> 
> Why is that function ignoring volatiles but not volatile asms or
> old-style asms?  I think the only control-flow transfer asms are
> allowed to make are asm gotos?

You mean
  if (gasm *asm_stmt = dyn_cast  (t))
if (gimple_asm_volatile_p (asm_stmt) || gimple_asm_input_p (asm_stmt))
  return true;
I was also bit puzzled by this conditional but I think it assumes that volatile
ASM can, for example, perform infinite loop and wait for asynchronous interrupt
to transfer control flow away.

It is of course undocumented but one may probably also try to throw exception by
hand (i.e. call throw())

The second conditional seems to be there because
asm("blah")
is old style asm that has implicit volatile on it.
> 
> Anyway, that's a pre-existing issue so the patch is ok.

Thanks,
Honza


Re: [patch] Fix type merging deficiency during WPA

2016-07-13 Thread Eric Botcazou
> As tree merging really replaces trees it has to error on the side of not
> merging while canonical type merging has to error on the side of "merging"
> to make types alias.

OK, then the former won't be sufficient for Ada, there are known cases where 
producers and clients of a package cannot see the exact same tree for a type, 
so we definitely need optimistic merging here.

> Btw, for the LTO_SET_PREVAIL can you introduce a LTO_SET_PREVAIL_EXPR
> and use that for the fields that possibly can be expressions plus simply use
> walk_tree for those (in case they are EXPR_P)?  Please split that part out
> as well.

Yes, I can do that.

-- 
Eric Botcazou


[Patch, testsuite] Fix some bogus testsuite failures for avr

2016-07-13 Thread Senthil Kumar Selvaraj
Hi,

  This patch requires int32plus and ptr32plus for a couple of tests,
  tweaks Wduplicated-cond-3.c to use a smaller constant that fits in
  16 bits, and marks one test as too big
  for avr.

  Committed to trunk.

Regards
Senthil

2016-07-13  Senthil Kumar Selvaraj  

* c-c++-common/Wduplicated-cond-3.c (fn10): Use smaller 
const literal.
* c-c++-common/builtin-arith-overflow-2.c: Skip for avr.
* c-c++-common/pr68833-1.c: Require int32plus.
* gcc.dg/ipa/pr63551.c: Likewise.
* gcc.dg/ipa/pr63595.c: Require ptr32plus.
* gcc.dg/ipa/pr64041.c: Require int32plus.
 
diff --git gcc/testsuite/c-c++-common/Wduplicated-cond-3.c 
gcc/testsuite/c-c++-common/Wduplicated-cond-3.c
index e3b5ac0..f928357 100644
--- gcc/testsuite/c-c++-common/Wduplicated-cond-3.c
+++ gcc/testsuite/c-c++-common/Wduplicated-cond-3.c
@@ -187,7 +187,7 @@ int
 fn10 (void)
 {
   if (foo ())
-return 1732984;
+return 17329;
   else if (foo ())
 return 18409;
   return 0;
diff --git gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c 
gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c
index 4cbceff..7dd0e50 100644
--- gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c
+++ gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c
@@ -1,6 +1,7 @@
 /* PR c/68120 - can't easily deal with integer overflow at compile time */
 /* { dg-do run } */
 /* { dg-additional-options "-Wno-long-long" } */
+/* { dg-skip-if "Program too big" { "avr-*-*" } } */
 
 #define SCHAR_MAX__SCHAR_MAX__
 #define SHRT_MAX __SHRT_MAX__
diff --git gcc/testsuite/c-c++-common/pr68833-1.c 
gcc/testsuite/c-c++-common/pr68833-1.c
index e0601b3..c88f67e 100644
--- gcc/testsuite/c-c++-common/pr68833-1.c
+++ gcc/testsuite/c-c++-common/pr68833-1.c
@@ -1,6 +1,7 @@
 /* PR c/68833 */
 /* { dg-do compile } */
 /* { dg-options "-Werror=larger-than-65536 -Werror=format 
-Werror=missing-noreturn" } */
+/* { dg-require-effective-target int32plus } */
 
 int a[131072]; /* { dg-error "size of 'a' is \[1-9]\[0-9]* bytes" } */
 int b[1024];   /* { dg-bogus "size of 'b' is \[1-9]\[0-9]* bytes" } */
diff --git gcc/testsuite/gcc.dg/ipa/pr63551.c gcc/testsuite/gcc.dg/ipa/pr63551.c
index 48b020a..225e323 100644
--- gcc/testsuite/gcc.dg/ipa/pr63551.c
+++ gcc/testsuite/gcc.dg/ipa/pr63551.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-Os" } */
+/* { dg-require-effective-target int32plus } */
 
 union U
 {
diff --git gcc/testsuite/gcc.dg/ipa/pr63595.c gcc/testsuite/gcc.dg/ipa/pr63595.c
index d656de5..ee48934 100644
--- gcc/testsuite/gcc.dg/ipa/pr63595.c
+++ gcc/testsuite/gcc.dg/ipa/pr63595.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-ipa-icf-details"  } */
+/* { dg-require-effective-target ptr32plus } */
 
 typedef int size_t;
 
diff --git gcc/testsuite/gcc.dg/ipa/pr64041.c gcc/testsuite/gcc.dg/ipa/pr64041.c
index 4877b4b..18e0168 100644
--- gcc/testsuite/gcc.dg/ipa/pr64041.c
+++ gcc/testsuite/gcc.dg/ipa/pr64041.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
+/* { dg-require-effective-target int32plus } */
 
 int printf (const char *, ...);
 
-- 
2.7.4



Re: [PATCH, Fortran, OpenACC] Fix PR70598, Fortran host_data ICE (ping x3)

2016-07-13 Thread Chung-Lin Tang

Ping x3

On 06/21/2016 02:18 PM, Chung-Lin Tang wrote:

Ping x2

On 2016/6/7 08:03 PM, Chung-Lin Tang wrote:

Ping.

On 2016/5/11 02:57 AM, Bernhard Reutner-Fischer wrote:

On May 9, 2016 4:26:50 PM GMT+02:00, Chung-Lin Tang  
wrote:

Hi, this patch resolves an ICE for Fortran when using the OpenACC
host_data directive.  Actually, rather than say resolve, it's more like
adjusting the front-end to same middle-end restrictions as C/C++,
namely that we only support pointers or arrays for host_data right now.

This patch contains a little bit of adjustments in
fortran/openmp.c:resolve_omp_clauses(),
and some testcase adjustments. This has been tested without regressions
for Fortran.

Is this okay for trunk?

Thanks,
Chung-Lin

2015-05-09  Chung-Lin Tang  

gcc/
* fortran/openmp.c (resolve_omp_clauses): Adjust use_device clause
handling to only allow pointers and arrays.


Fortran has it's own ChangeLog. The patch itself looks somewhat plausible to 
me, fwiw, but Jakub or a FE maintainer has the say.







Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-13 Thread Segher Boessenkool
On Wed, Jul 13, 2016 at 02:57:01PM +0930, Alan Modra wrote:
> This is what I've bootstrapped and regression tested on
> powerpc64le-linux.  I'm using Peter's testcases from this thread
> rather than the one in the original patch submission, because that one
> relies on -O0 not reducing the function down to a nop.  OK to apply?

This is okay.  Does it need a backport?  Okay for 6 as well, then.

We all agreed the question marks would be a good idea, but you say it
causes some testcases to fail.  Could you investigate please?

Thanks,


Segher


>   PR target/71733
> gcc/
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Deal
>   with p9_vector override before power9-dform override.
> gcc/testsuite/
>   * gcc.target/powerpc/p9-novsx.c: New.


Re: Fix loop_only_exit_p

2016-07-13 Thread Richard Biener
On Wed, 13 Jul 2016, Jan Hubicka wrote:

> > 
> > I'd rather not expose/change need_fake_edge_p as that has a very
> > specific purpose.
> > 
> > Why don't you simply add a call to is_ctrl_altering_stmt on the
> > last stmt of the block in loop_only_exit_p?  It's a waste of
> > time doing stuff on every stmt that can only make a difference
> > on the last one of a BB.
> 
> I am not sure I understand your suggestion. is_ctrl_altering_stmt checks
> whether stmt terminates BB (i.e. it does control flow we represent 
> explicitly).
> Here we look for cases where BB execution can terminate in the middle of BB
> in ways not explicitly represented in CFG.
> 
> For example:
> 
>  ...stmt...
>  *ptr=0
>  ...stmt...
> 
> the control flow can terminate by the memory access that may throw non-call
> exception. 
> 
> or
> 
>  ...stmt...
>  volatile asm statement eventually leading to longjmp
>  ...stmt...
> 
> I worte RTL version of need_fake_edge_p many years ago to check particularly
> these cases. This is what loop code checks - it wants to know that the loop
> will eventually terminate by the exit edge on hand, not by other ways.

Ah, ok.  I dove into the code and that indeed seems to be the case.

Why is that function ignoring volatiles but not volatile asms or
old-style asms?  I think the only control-flow transfer asms are
allowed to make are asm gotos?

Anyway, that's a pre-existing issue so the patch is ok.

Thanks,
Richard.

> Honza
> > 
> > Richard.
> > 
> > > Honza
> > > 
> > >   * gimple.h (stmt_can_terminate_bb_p): New function.
> > >   * tree-cfg.c (need_fake_edge_p): Rename to ...
> > >   (stmt_can_terminate_bb_p): ... this; return true if stmt can
> > >   throw external; handle const and pure calls.
> > >   * tree-ssa-loop-niter.c (loop_only_exit_p): Use it.
> > > Index: gimple.h
> > > ===
> > > --- gimple.h  (revision 238191)
> > > +++ gimple.h  (working copy)
> > > @@ -1526,6 +1526,7 @@ extern void gimple_seq_set_location (gim
> > >  extern void gimple_seq_discard (gimple_seq);
> > >  extern void maybe_remove_unused_call_args (struct function *, gimple *);
> > >  extern bool gimple_inexpensive_call_p (gcall *);
> > > +extern bool stmt_can_terminate_bb_p (gimple *);
> > >  
> > >  /* Formal (expression) temporary table handling: multiple occurrences of
> > > the same scalar expression are evaluated into the same temporary.  */
> > > Index: tree-cfg.c
> > > ===
> > > --- tree-cfg.c(revision 238191)
> > > +++ tree-cfg.c(working copy)
> > > @@ -7919,15 +7919,20 @@ gimple_block_ends_with_condjump_p (const
> > >  }
> > >  
> > >  
> > > -/* Return true if we need to add fake edge to exit at statement T.
> > > -   Helper function for gimple_flow_call_edges_add.  */
> > > +/* Return true if statement T may terminate execution of BB in ways not
> > > +   explicitly represtented in the CFG.  */
> > >  
> > > -static bool
> > > -need_fake_edge_p (gimple *t)
> > > +bool
> > > +stmt_can_terminate_bb_p (gimple *t)
> > >  {
> > >tree fndecl = NULL_TREE;
> > >int call_flags = 0;
> > >  
> > > +  /* Eh exception not handled internally terminates execution of the 
> > > whole
> > > + function.  */
> > > +  if (stmt_can_throw_external (t))
> > > +return true;
> > > +
> > >/* NORETURN and LONGJMP calls already have an edge to exit.
> > >   CONST and PURE calls do not need one.
> > >   We don't currently check for CONST and PURE here, although
> > > @@ -7960,6 +7965,13 @@ need_fake_edge_p (gimple *t)
> > >edge e;
> > >basic_block bb;
> > >  
> > > +  if (call_flags & (ECF_PURE | ECF_CONST)
> > > +   && !(call_flags & ECF_LOOPING_CONST_OR_PURE))
> > > + return false;
> > > +
> > > +  /* Function call may do longjmp, terminate program or do other 
> > > things.
> > > +  Special case noreturn that have non-abnormal edges out as in this case
> > > +  the fact is sufficiently represented by lack of edges out of T.  */
> > >if (!(call_flags & ECF_NORETURN))
> > >   return true;
> > >  
> > > @@ -8024,7 +8036,7 @@ gimple_flow_call_edges_add (sbitmap bloc
> > >if (!gsi_end_p (gsi))
> > >   t = gsi_stmt (gsi);
> > >  
> > > -  if (t && need_fake_edge_p (t))
> > > +  if (t && stmt_can_terminate_bb_p (t))
> > >   {
> > > edge e;
> > >  
> > > @@ -8059,7 +8071,7 @@ gimple_flow_call_edges_add (sbitmap bloc
> > > do
> > >   {
> > > stmt = gsi_stmt (gsi);
> > > -   if (need_fake_edge_p (stmt))
> > > +   if (stmt_can_terminate_bb_p (stmt))
> > >   {
> > > edge e;
> > >  
> > > Index: tree-ssa-loop-niter.c
> > > ===
> > > --- tree-ssa-loop-niter.c (revision 238191)
> > > +++ tree-ssa-loop-niter.c (working copy)
> > > @@ -2159,7 +2159,6 @@ loop_only_exit_p (const struct loop *loo
> > >basic_block *body;

[PATCH] Add non-const overload of std::string::data()

2016-07-13 Thread Jonathan Wakely

Also fix confusion between pointer and _CharT*, so that allocators with
fancy pointers work correctly.

The _M_data() function returns pointer, but we were using it where
_CharT* was required. This introduces a new _M_c_str() function which
returns a _CharT* instead, and uses that for c_str() and both
overloads of data().

Most places using _M_data() can just use data() instead, as they only
need a const _CharT*. A few need to use _M_c_str() to get _CharT* so
they can write to it (they could use the new data() overload, except
that it's not defined until C++17).

Tested x86_64-linux, committed to trunk.

* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI] (_M_c_str):
New function.
(_M_disjunct, basic_string(const basic_string&, size_t)): Use data()
instead of _M_data().
(basic_string(const basic_string&, size_t, size_t, const _Alloc&)):
Likewise.
(append(const basic_string&)): Likewise.
(append(const basic_string&, size_type, size_type)): Likewise.
(assign(const basic_string&, size_type, size_type)): Likewise.
(insert(size_type, const basic_string&)): Likewise.
(insert(size_type, const basic_string&, size_type, size_type)):
Likewise.
(replace(size_type, size_type, const basic_string&, size_type,
size_type)): Likewise.
(replace(__const_iterator, __const_iterator, const basic_string&)):
Likewise.
(c_str(), data()): Use c_str() instead of _M_data().
(data()): Add non-const overload as per LWG 2391 and P0272R1.
(compare(const basic_string&)): Use data() instead of _M_data().
[!_GLIBCXX_USE_CXX11_ABI] (data()): Add non-const overload.
* include/bits/basic_string.tcc [_GLIBCXX_USE_CXX11_ABI] (_M_mutate):
Pass raw pointers to _S_copy.
(_M_erase, _M_replace_aux): Pass raw pointers to _S_move and
_S_assign.
(find(const _CharT*, size_type, size_type)): Use data instead of
_M_data().
* testsuite/21_strings/basic_string/allocator/char/ext_ptr.cc: New.
* testsuite/21_strings/basic_string/operations/data/char/2.cc: New.
* testsuite/21_strings/basic_string/operations/data/wchar_t/2.cc: New.

commit 4300aa2087d90cd2d55852fa38f082a89bc7e72d
Author: redi 
Date:   Wed Jul 13 11:08:37 2016 +

Add non-const overload of std::string::data()

Also fix confusion between pointer and _CharT*, so that allocators with
fancy pointers work correctly.

* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI] (_M_c_str):
New function.
(_M_disjunct, basic_string(const basic_string&, size_t)): Use data()
instead of _M_data().
(basic_string(const basic_string&, size_t, size_t, const _Alloc&)):
Likewise.
(append(const basic_string&)): Likewise.
(append(const basic_string&, size_type, size_type)): Likewise.
(assign(const basic_string&, size_type, size_type)): Likewise.
(insert(size_type, const basic_string&)): Likewise.
(insert(size_type, const basic_string&, size_type, size_type)):
Likewise.
(replace(size_type, size_type, const basic_string&, size_type,
size_type)): Likewise.
(replace(__const_iterator, __const_iterator, const basic_string&)):
Likewise.
(c_str(), data()): Use c_str() instead of _M_data().
(data()): Add non-const overload as per LWG 2391 and P0272R1.
(compare(const basic_string&)): Use data() instead of _M_data().
[!_GLIBCXX_USE_CXX11_ABI] (data()): Add non-const overload.
* include/bits/basic_string.tcc [_GLIBCXX_USE_CXX11_ABI] (_M_mutate):
Pass raw pointers to _S_copy.
(_M_erase, _M_replace_aux): Pass raw pointers to _S_move and
_S_assign.
(find(const _CharT*, size_type, size_type)): Use data instead of
_M_data().
* testsuite/21_strings/basic_string/allocator/char/ext_ptr.cc: New.
* testsuite/21_strings/basic_string/operations/data/char/2.cc: New.
* testsuite/21_strings/basic_string/operations/data/wchar_t/2.cc: New.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@238291 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 374c985..f60d9e0 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -155,6 +155,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 #endif
   }
 
+  // Get a raw pointer (rather than _Alloc::pointer).
+  _CharT*
+  _M_c_str() const
+  { return std::__addressof(*_M_dataplus._M_p); }
+
   void
   _M_capacity(size_type __capacity)
   { _M_allocated_capacity = __capacity; }
@@ -285,8 +290,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   bool
   _M_disjunct(const _CharT* __s) const _GLIBCXX_NOEXCEPT
   {
-   return 

[patch,avr] minor tweaks for 8-bit operations

2016-07-13 Thread Georg-Johann Lay

This patch contains some unrelated tweaks

- Supplying no-ldregs variant for andqi3, iorqi3 where a const_int mask affects 
only 1 bit


- Some patterns that match situations with zero_extend that can be performed 
with less instructions / register pressure.


- comparing HI against -1

Ok for trunk?

Johann


gcc/
Minor tweaks for QImode.

* config/avr/predicates.md (const_m255_to_m1_operand): New.
* config/avr/constraints.md (Cn8, Ca1, Co1, Yx2): New constraints.
* config/avr/avr.md (add3) : Make "r,0,r" more
expensive.
(*cmphi.zero-extend.0, *cmphi.zero-extend.1)
(*usum_widenqihi3, *udiff_widenqihi3)
(*addhi3_zero_extend.const): New combiner insns.
(andqi3, iorqi3): Provide "l" (NO_LD_REGS) alternative if
just 1 bit is affected.
* config/avr/avr.c (avr_out_bitop) : Don't access xop[3].
(avr_out_compare) [EQ,NE]: Tweak comparing d-regs against -1.
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 237643)
+++ config/avr/avr.c	(working copy)
@@ -5346,6 +5346,34 @@ avr_out_compare (rtx_insn *insn, rtx *xo
 }
 }
 
+  /* Comparisons == -1 and != -1 of a d-register that's used after the
+ comparison.  (If it's unused after we use CPI / SBCI or ADIW sequence
+ from below.)  Instead of  CPI Rlo,-1 / LDI Rx,-1 / CPC Rhi,Rx  we can
+ use  CPI Rlo,-1 / CPC Rhi,Rlo  which is 1 instruction shorter:
+ If CPI is true then Rlo contains -1 and we can use Rlo instead of Rx
+ when CPC'ing the high part.  If CPI is false then CPC cannot render
+ the result to true.  This also works for the more generic case where
+ the constant is of the form 0xabab.  */
+
+  if (n_bytes == 2
+  && xval != 0
+  && test_hard_reg_class (LD_REGS, xreg)
+  && compare_eq_p (insn)
+  && !reg_unused_after (insn, xreg))
+{
+  rtx xlo8 = simplify_gen_subreg (QImode, xval, mode, 0);
+  rtx xhi8 = simplify_gen_subreg (QImode, xval, mode, 1);
+
+  if (INTVAL (xlo8) == INTVAL (xhi8))
+{
+  xop[0] = xreg;
+  xop[1] = xlo8;
+
+  return avr_asm_len ("cpi %A0,%1"  CR_TAB
+  "cpc %B0,%A0", xop, plen, 2);
+}
+}
+
   for (i = 0; i < n_bytes; i++)
 {
   /* We compare byte-wise.  */
@@ -7687,11 +7715,11 @@ avr_out_bitop (rtx insn, rtx *xop, int *
 
   /* op[0]: 8-bit destination register
  op[1]: 8-bit const int
- op[2]: 8-bit clobber register or SCRATCH
+ op[2]: 8-bit clobber register, SCRATCH or NULL_RTX.
  op[3]: 8-bit register containing 0xff or NULL_RTX  */
   rtx op[4];
 
-  op[2] = xop[3];
+  op[2] = QImode == mode ? NULL_RTX : xop[3];
   op[3] = NULL_RTX;
 
   if (plen)
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 237643)
+++ config/avr/avr.md	(working copy)
@@ -1240,6 +1240,33 @@ (define_insn "*addhi3.sign_extend1"
   [(set_attr "length" "5")
(set_attr "cc" "clobber")])
 
+(define_insn "*addhi3_zero_extend.const"
+  [(set (match_operand:HI 0 "register_operand" "=d")
+(plus:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "0"))
+ (match_operand:HI 2 "const_m255_to_m1_operand" "Cn8")))]
+  ""
+  "subi %A0,%n2\;sbc %B0,%B0"
+  [(set_attr "length" "2")
+   (set_attr "cc" "set_czn")])
+
+(define_insn "*usum_widenqihi3"
+  [(set (match_operand:HI 0 "register_operand"  "=r")
+(plus:HI (zero_extend:HI (match_operand:QI 1 "register_operand"  "0"))
+ (zero_extend:HI (match_operand:QI 2 "register_operand"  "r"]
+  ""
+  "add %A0,%2\;clr %B0\;rol %B0"
+  [(set_attr "length" "3")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*udiff_widenqihi3"
+  [(set (match_operand:HI 0 "register_operand"   "=r")
+(minus:HI (zero_extend:HI (match_operand:QI 1 "register_operand"  "0"))
+  (zero_extend:HI (match_operand:QI 2 "register_operand"  "r"]
+  ""
+  "sub %A0,%2\;sbc %B0,%B0"
+  [(set_attr "length" "2")
+   (set_attr "cc" "set_czn")])
+
 (define_insn "*addhi3_sp"
   [(set (match_operand:HI 1 "stack_register_operand"   "=q")
 (plus:HI (match_operand:HI 2 "stack_register_operand"   "q")
@@ -3102,15 +3129,16 @@ (define_insn "*udivmodsi4_call"
 ; and
 
 (define_insn "andqi3"
-  [(set (match_operand:QI 0 "register_operand"   "=??r,d")
-(and:QI (match_operand:QI 1 "register_operand" "%0,0")
-(match_operand:QI 2 "nonmemory_operand" "r,i")))]
+  [(set (match_operand:QI 0 "register_operand"   "=??r,d,*l")
+(and:QI (match_operand:QI 1 "register_operand" "%0,0,0")
+(match_operand:QI 2 "nonmemory_operand" "r,i,Ca1")))]
   ""
   "@
 	and %0,%2
-	andi %0,lo8(%2)"
-  [(set_attr "length" "1,1")
-   (set_attr "cc" "set_zn,set_zn")])

Re: [PATCH, ARM 6/7, ping1] Add support for CB(N)Z and (U|S)DIV to ARMv8-M Baseline

2016-07-13 Thread Thomas Preudhomme
On Tuesday 12 July 2016 15:57:41 Kyrill Tkachov wrote:
> On 12/07/16 11:26, Thomas Preudhomme wrote:
> > Hi Kyrill,
> 
> Hi Thomas,
> 
> > On Friday 20 May 2016 14:22:48 Kyrill Tkachov wrote:
> >> Hi Thomas,
> >> 
> >> On 17/05/16 11:14, Thomas Preudhomme wrote:
> >>> Ping?
> >>> 
> >>> *** gcc/ChangeLog ***
> >>> 
> >>> 2015-11-13  Thomas Preud'homme  
> >>> 
> >>>   * config/arm/arm.c (arm_print_operand_punct_valid_p): Make %?
> >>>   valid
> >>>   for Thumb-1.
> >>>   * config/arm/arm.h (TARGET_HAVE_CBZ): Define.
> >>>   (TARGET_IDIV): Set for all Thumb targets provided they have
> >>>   hardware
> >>>   divide feature.
> >>>   * config/arm/thumb1.md (thumb1_cbz): New insn.
> >>> 
> >>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> >>> index
> >>> f42e996e5a7ce979fe406b8261d50fb2ba005f6b..347b5b0a5cc0bc1e3b5020c8124d96
> >>> 8e
> >>> 76ce48a4 100644
> >>> --- a/gcc/config/arm/arm.h
> >>> +++ b/gcc/config/arm/arm.h
> >>> @@ -271,9 +271,12 @@ extern void
> >>> (*arm_lang_output_object_attributes_hook)
> >>> (void);
> >>> 
> >>>/* Nonzero if this chip provides the movw and movt instructions.  */
> >>>#define TARGET_HAVE_MOVT   (arm_arch_thumb2 || arm_arch8)
> >>> 
> >>> +/* Nonzero if this chip provides the cb{n}z instruction.  */
> >>> +#define TARGET_HAVE_CBZ  (arm_arch_thumb2 || arm_arch8)
> >>> +
> >>> 
> >>>/* Nonzero if integer division instructions supported.  */
> >>>#define TARGET_IDIV((TARGET_ARM && arm_arch_arm_hwdiv) \
> >>> 
> >>> -  || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
> >>> +  || (TARGET_THUMB && arm_arch_thumb_hwdiv))
> >>> 
> >>>/* Nonzero if disallow volatile memory access in IT block.  */
> >>>#define TARGET_NO_VOLATILE_CE  (arm_arch_no_volatile_ce)
> >>> 
> >>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> >>> index
> >>> 13b4b71ac8f9c1da8ef1945f7ff6985ca59f6832..445972ce0e3fd27d4411840ff69e9e
> >>> db
> >>> b23994fc 100644
> >>> --- a/gcc/config/arm/arm.c
> >>> +++ b/gcc/config/arm/arm.c
> >>> @@ -22684,7 +22684,7 @@ arm_print_operand_punct_valid_p (unsigned char
> >>> code)>
> >>> 
> >>>{
> >>>
> >>>  return (code == '@' || code == '|' || code == '.'
> >>>  
> >>> || code == '(' || code == ')' || code == '#'
> >>> 
> >>> -   || (TARGET_32BIT && (code == '?'))
> >>> +   || code == '?'
> >>> 
> >>> || (TARGET_THUMB2 && (code == '!'))
> >>> || (TARGET_THUMB && (code == '_')));
> >>>
> >>>}
> >> 
> >> Hmm, I'm not a fan of this change. arm_print_operand_punct_valid_p is an
> >> implementation of a target hook that is used to validate user-provided
> >> inline asm as well and is therefore the right place to reject such
> >> invalid
> >> constructs.
> >> 
> >> This is just working around the fact that the output template for the
> >> [u]divsi3 patterns has a '%?' in it that is illegal in Thumb1 and will
> >> not
> >> be used for ARMv8-M Baseline anyway. I'd prefer it if you add a second
> >> alternative to those patterns and emit the sdiv/udiv mnemonic without the
> >> '%?' and enable that for the v8mb arch attribute (and mark the existing
> >> alternative as requiring the "32" arch attribute).
> >> 
> >>> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> >>> index
> >>> 4572456b8bc98503061846cad94bc642943db3a2..1b01ef6ce731fe3ff37c3d8c048fb9
> >>> d5
> >>> e7829b35 100644
> >>> --- a/gcc/config/arm/thumb1.md
> >>> +++ b/gcc/config/arm/thumb1.md
> >>> @@ -973,6 +973,92 @@
> >>> 
> >>>  DONE;
> >>>
> >>>})
> >>> 
> >>> +;; A pattern for the cb(n)z instruction added in ARMv8-M baseline
> >>> profile,
> >>> +;; adapted from cbranchsi4_insn.  Modifying cbranchsi4_insn instead
> >>> leads
> >>> to +;; code generation difference for ARMv6-M because the minimum length
> >>> of the +;; instruction becomes 2 even for it due to a limitation in
> >>> genattrtab's +;; handling of pc in the length condition.
> >>> +(define_insn "thumb1_cbz"
> >>> +  [(set (pc) (if_then_else
> >>> +   (match_operator 0 "equality_operator"
> >>> +[(match_operand:SI 1 "s_register_operand" "l")
> >>> + (const_int 0)])
> >>> +   (label_ref (match_operand 2 "" ""))
> >>> +   (pc)))]
> >>> +  "TARGET_THUMB1 && TARGET_HAVE_MOVT"
> >>> +{
> >> 
> >> s/TARGET_HAVE_MOVT/TARGET_HAVE_CBZ/
> >> 
> >>> +  if (get_attr_length (insn) == 2)
> >>> +{
> >>> +  if (GET_CODE (operands[0]) == EQ)
> >>> + return "cbz\t%1, %l2";
> >>> +  else
> >>> + return "cbnz\t%1, %l2";
> >>> +}
> >>> +  else
> >>> +{
> >>> +  rtx t = cfun->machine->thumb1_cc_insn;
> >>> +  if (t != NULL_RTX)
> >>> + {
> >>> +   if (!rtx_equal_p (cfun->machine->thumb1_cc_op0, operands[1])
> >>> +   || !rtx_equal_p (cfun->machine->thumb1_cc_op1, operands[2]))
> >>> + t = NULL_RTX;
> >>> +   if 

Re: [v3 PATCH] Implement P0307R2, Making Optional Greater Equal Again.

2016-07-13 Thread Jonathan Wakely

On 13/07/16 13:05 +0300, Ville Voutilainen wrote:

Ha, that was indeed in just one place.


See below.


I made the above changes and also made converting assignment operators
SFINAE. That SFINAE
seems consistent with how constructors and relops work. And yes, there
are still some members like
emplace that static_assert rather than SFINAE, but I think that's ok for now.
   operators. Also test that assignment sfinaes.


OK.


diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index e9a86a4..45929c7 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -132,7 +132,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*
* Practically speaking this detects the presence of such an operator when
* called on a const-qualified lvalue (i.e.
-* declval<_Tp * const&>().operator&()).
+* declval().operator&()).
*/
  template
struct _Has_addressof


That comment was wrong anyway, it says _Tp* const& when it should have
been _Tp const&.

declval<_Tp * const&>().operator&() doesn't make any sense. Not sure
why I've never spotted that until now.

Please change it to const _Tp& and change "i.e." to "e.g." (because
since my change last year it detects both members and non-members).

OK for trunk with that tweak, thanks.

I'll make the same change to the comment in .



libgomp: In OpenACC testing, cycle though $offload_targets, and by default only build for the offload target that we're actually going to test

2016-07-13 Thread Thomas Schwinge
Hi!

As discussed before, "offloading compilation is slow; I suppose because
of having to invoke several tools (LTO streaming -> mkoffload -> offload
compilers, assemblers, linkers -> combine the resulting images; but I
have not done a detailed analysis on that)".  For this reason it is
beneficial (that is, it is measurable in libgomp testing wall time) to
limit offload compilation to the one (in the OpenACC case) offload target
that we're actually going to test (that is, execute).  Another reason is
that -foffload=-fdump-tree-[...] produces clashes (that is,
unpredicatable outcome) in the file names of offload compilations' dump
files' names.  Here is a patch to implement that, to specify
-foffload=[...] during libgomp OpenACC testing.  As that has been
challenged before:

| [...] there actually is a difference between offload_plugins and
| offload_targets (for example, "intelmic"
| vs. "x86_64-intelmicemul-linux-gnu"), and I'm using both variables --
| to avoid having to translate the more specific
| "x86_64-intelmicemul-linux-gnu" (which we required in the test harness)
| into the less specific "intelmic" (for plugin loading) in
| libgomp/target.c.  I can do that, so that we can continue to use just a
| single offload_targets variable, but I consider that a less elegant
| solution.

OK for trunk?

commit 5fdb515826769ebb36bc5c49a3ffac4d17a8a589
Author: Thomas Schwinge 
Date:   Wed Jul 13 11:37:16 2016 +0200

libgomp: In OpenACC testing, cycle though $offload_targets, and by default 
only build for the offload target that we're actually going to test

libgomp/
* plugin/configfrag.ac: Enumerate both offload plugins and offload
targets.
(OFFLOAD_PLUGINS): Renamed from OFFLOAD_TARGETS.
* target.c (gomp_target_init): Adjust to that.
* testsuite/lib/libgomp.exp: Likewise.
(offload_targets_s, offload_targets_s_openacc): Remove variables.
(offload_target_to_openacc_device_type): New proc.
(check_effective_target_openacc_nvidia_accel_selected)
(check_effective_target_openacc_host_selected): Examine
$openacc_device_type instead of $offload_target_openacc.
* Makefile.in: Regenerate.
* config.h.in: Likewise.
* configure: Likewise.
* testsuite/Makefile.in: Likewise.
* testsuite/libgomp.oacc-c++/c++.exp: Cycle through
$offload_targets (plus "disable") instead of
$offload_targets_s_openacc, and add "-foffload=$offload_target" to
tagopt.
* testsuite/libgomp.oacc-c/c.exp: Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
---
 libgomp/Makefile.in|  1 +
 libgomp/config.h.in|  4 +-
 libgomp/configure  | 44 +++--
 libgomp/plugin/configfrag.ac   | 39 +++-
 libgomp/target.c   |  8 +--
 libgomp/testsuite/Makefile.in  |  1 +
 libgomp/testsuite/lib/libgomp.exp  | 72 ++
 libgomp/testsuite/libgomp.oacc-c++/c++.exp | 30 +
 libgomp/testsuite/libgomp.oacc-c/c.exp | 30 +
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp | 22 ---
 10 files changed, 142 insertions(+), 109 deletions(-)

diff --git libgomp/Makefile.in libgomp/Makefile.in
index 88c8517..33be8c7 100644
--- libgomp/Makefile.in
+++ libgomp/Makefile.in
@@ -380,6 +380,7 @@ mkdir_p = @mkdir_p@
 multi_basedir = @multi_basedir@
 offload_additional_lib_paths = @offload_additional_lib_paths@
 offload_additional_options = @offload_additional_options@
+offload_plugins = @offload_plugins@
 offload_targets = @offload_targets@
 oldincludedir = @oldincludedir@
 pdfdir = @pdfdir@
diff --git libgomp/config.h.in libgomp/config.h.in
index 226ac53..28f7b2d 100644
--- libgomp/config.h.in
+++ libgomp/config.h.in
@@ -98,8 +98,8 @@
*/
 #undef LT_OBJDIR
 
-/* Define to offload targets, separated by commas. */
-#undef OFFLOAD_TARGETS
+/* Define to offload plugins, separated by commas. */
+#undef OFFLOAD_PLUGINS
 
 /* Name of package */
 #undef PACKAGE
diff --git libgomp/configure libgomp/configure
index 8d03eb6..4baab20 100755
--- libgomp/configure
+++ libgomp/configure
@@ -633,6 +633,8 @@ PLUGIN_NVPTX_FALSE
 PLUGIN_NVPTX_TRUE
 offload_additional_lib_paths
 offload_additional_options
+offload_targets
+offload_plugins
 PLUGIN_HSA_LIBS
 PLUGIN_HSA_LDFLAGS
 PLUGIN_HSA_CPPFLAGS
@@ -646,7 +648,6 @@ PLUGIN_NVPTX_CPPFLAGS
 PLUGIN_NVPTX
 CUDA_DRIVER_LIB
 CUDA_DRIVER_INCLUDE
-offload_targets
 libtool_VERSION
 ac_ct_FC
 FCFLAGS
@@ -11145,7 +11146,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11148 "configure"
+#line 11149 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11251,7 +11252,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; 

Re: [PATCH] Fix Fortran DO loop fallback

2016-07-13 Thread Richard Biener
On Tue, Jul 12, 2016 at 4:54 PM, Martin Liška  wrote:
> On 07/12/2016 03:15 PM, Richard Biener wrote:
>> The scan for 1 *n_ after FRE looks still useful.  Btw, the testcase
>> doesn't fail for me,
>> we _do_ hoist the division in PRE, just not with -m32 anymore.  Can
>> you confirm this?
>
> Yeah, it works for both -m32 and -m64.
> This is final version of the patch, may I install it?

Changelog has a swapped entry.

Ok with fixing that.
Richard.

> Thanks,
> Martin


Re: [v3 PATCH] Implement P0307R2, Making Optional Greater Equal Again.

2016-07-13 Thread Ville Voutilainen
On 13 July 2016 at 13:05, Ville Voutilainen  wrote:
>> Dunno why this has _Tp const& rather than const _Tp&, could you fix it
>> while you're in the file anyway? It's a bit confusing to have one
>> place using a different style.
>
> Ha, that was indeed in just one place.


Well, technically two, the other was in a comment. The patch changes that, too.


Re: [v3 PATCH] Implement P0307R2, Making Optional Greater Equal Again.

2016-07-13 Thread Ville Voutilainen
On 13 July 2016 at 01:31, Jonathan Wakely  wrote:
> On 11/07/16 23:41 +0300, Ville Voutilainen wrote:
>>
>> @@ -785,41 +785,60 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> }
>> };
>>
>> +  template
>> +using __optional_relop_t =
>> +enable_if_t::value, bool>;
>
>
> Should this be is_convertible<_Tp, bool> instead?

Yeah.. it would be more reasonable to return a type explicitly
convertible to bool from a relop
if a non-bool is returned, but since built-in operators are not
contextually-convertible-to-bool,
that "protection" wouldn't buy much. And since the implementation
doesn't really do bool-wrappings
everywhere, let's go with is_convertible and change that if someone complains.

>
>>   template
>> -constexpr bool
>> +constexpr auto
>> operator!=(const optional<_Tp>& __lhs, _Tp const& __rhs)
>
>
> Dunno why this has _Tp const& rather than const _Tp&, could you fix it
> while you're in the file anyway? It's a bit confusing to have one
> place using a different style.

Ha, that was indeed in just one place.

I made the above changes and also made converting assignment operators
SFINAE. That SFINAE
seems consistent with how constructors and relops work. And yes, there
are still some members like
emplace that static_assert rather than SFINAE, but I think that's ok for now.

2016-07-13  Ville Voutilainen  

Implement P0307R2, Making Optional Greater Equal Again.
* include/std/optional (operator=(_Up&&)): Constrain.
(operator=(const optional<_Up>&)): Likewise.
(operator=(optional<_Up>&&)): Likewise.
(__optional_relop_t): New.
(operator==(const optional<_Tp>&, const optional<_Tp>&)): Constrain.
(operator!=(const optional<_Tp>&, const optional<_Tp>&)):
Constrain and make transparent.
(operator<(const optional<_Tp>&, const optional<_Tp>&)): Constrain.
(operator>(const optional<_Tp>&, const optional<_Tp>&)):
Constrain and make transparent.
(operator<=(const optional<_Tp>&, const optional<_Tp>&)): Likewise.
(operator>=(const optional<_Tp>&, const optional<_Tp>&)): Likewise.
(operator==(const optional<_Tp>&, const _Tp&): Constrain.
(operator==(const _Tp&, const optional<_Tp>&)): Likewise.
(operator!=(const optional<_Tp>&, _Tp const&)):
Constrain and make transparent.
(operator!=(const _Tp&, const optional<_Tp>&)): Likewise.
(operator<(const optional<_Tp>&, const _Tp&)): Constrain.
(operator<(const _Tp&, const optional<_Tp>&)): Likewise.
(operator>(const optional<_Tp>&, const _Tp&)):
Constrain and make transparent.
(operator>(const _Tp&, const optional<_Tp>&)): Likewise.
(operator<=(const optional<_Tp>&, const _Tp&)): Likewise.
(operator<=(const _Tp&, const optional<_Tp>&)): Likewise.
(operator>=(const optional<_Tp>&, const _Tp&)): Likewise.
(operator>=(const _Tp&, const optional<_Tp>&)): Likewise.
* testsuite/20_util/optional/constexpr/relops/2.cc: Adjust.
* testsuite/20_util/optional/constexpr/relops/4.cc: Likewise.
* testsuite/20_util/optional/relops/1.cc: Likewise.
* testsuite/20_util/optional/relops/2.cc: Likewise.
* testsuite/20_util/optional/relops/3.cc: Likewise.
* testsuite/20_util/optional/relops/4.cc: Likewise.
* testsuite/20_util/optional/requirements.cc: Add tests to verify
that optional's relops are transparent and don't synthesize
operators. Also test that assignment sfinaes.
diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index e9a86a4..45929c7 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -132,7 +132,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 *
 * Practically speaking this detects the presence of such an operator when
 * called on a const-qualified lvalue (i.e.
-* declval<_Tp * const&>().operator&()).
+* declval().operator&()).
 */
   template
 struct _Has_addressof
@@ -577,16 +577,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template,
+  is_assignable<_Tp&, _Up>,
   __not_>,
   __not_<__is_optional<_Up>>>::value,
 bool> = true>
 optional&
 operator=(_Up&& __u)
 {
-  static_assert(__and_,
-  is_assignable<_Tp&, _Up>>(),
-"Cannot assign to value type from argument");
-
   if (this->_M_is_engaged())
 this->_M_get() = std::forward<_Up>(__u);
   else
@@ -597,15 +595,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template>>::value,
-  bool> = true>
+  is_constructible<_Tp, _Up>,
+  is_assignable<_Tp&, _Up>,
+  __not_>>::value,
+bool> = true>
 optional&

Re: [PATCH, ARM 5/7] Add support for MOVT/MOVW to ARMv8-M Baseline

2016-07-13 Thread Kyrill Tkachov

Hi Thomas,

On 07/07/16 15:32, Thomas Preudhomme wrote:

Hi Kyrill,

Please find an updated version in attachment. Please note I made quite a few
other changes as well. The most important one was to add new ARMv8-M Baseline
only alternatives to the two movt insns in order to have non predicable output
template for ARMv8-M Baseline. I also inverted the logic for how to detect
ARMv8-M Baseline in the testsuite (arm_thumb1_movt_ok rather than *_ko) and
fixed a couple of typos. I didn't add MOVT test because it is difficult to
generate. Using -mslow-flash-data should help but it is not available for
ARMv8-M Baseline as only config/arm/thumb2.md was modified to support this
option.

Update ChangeLog entries are as follow:


*** gcc/ChangeLog ***

2016-06-20  Thomas Preud'homme  

 * config/arm/arm.h (TARGET_HAVE_MOVT): Include ARMv8-M as having MOVT.
 * config/arm/arm.c (arm_arch_name): (const_ok_for_op): Check MOVT/MOVW
 availability with TARGET_HAVE_MOVT.
 (thumb_legitimate_constant_p): Strip the high part of a label_ref.
 (thumb1_rtx_costs): Also return 0 if setting a half word constant and
 MOVW is available and replace (unsigned HOST_WIDE_INT) INTVAL by
 UINTVAL.
 (thumb1_size_rtx_costs): Make set of half word constant also cost 1
 extra instruction if MOVW is available.  Use a cost variable
 incremented by COSTS_N_INSNS (1) when the condition match rather than
 returning an arithmetic expression based on COSTS_N_INSNS.  Make
 constant with bottom half word zero cost 2 instruction if MOVW is
 available.
 * config/arm/arm.md (define_attr "arch"): Add v8mb.
 (define_attr "arch_enabled"): Set to yes if arch value is v8mb and
 target is ARMv8-M Baseline.
 (arm_movt): New unpredicable alternative for ARMv8-M Baseline.
 (arm_movtas_ze): Likewise.
 * config/arm/thumb1.md (thumb1_movdi_insn): Add ARMv8-M Baseline only
 alternative for constants satisfying j constraint.
 (thumb1_movsi_insn): Likewise.
 (movsi splitter for K alternative): Tighten condition to not trigger
 if movt is available and j constraint is satisfied.
 (Pe immediate splitter): Likewise.
 (thumb1_movhi_insn): Add ARMv8-M Baseline only alternative for
 constant fitting in an halfword to use MOVW.
 * doc/sourcebuild.texi (arm_thumb1_movt_ok): Document new ARM
 effective target.


*** gcc/testsuite/ChangeLog ***

2016-05-23  Thomas Preud'homme  

 * lib/target-supports.exp (check_effective_target_arm_thumb1_movt_ok):
 Define effective target.
 * gcc.target/arm/pr42574.c: Require arm_thumb1_ok and
 !arm_thumb1_movt_ok to exclude ARMv8-M Baseline.
 * gcc.target/arm/movhi_movw.c: New test.
 * gcc.target/arm/movsi_movw.c: Likewise.
 * gcc.target/arm/movdi_movw.c: Likewise.


Ok.
Thanks,
Kyrill



Best regards,

Thomas

On Friday 20 May 2016 12:14:44 Kyrill Tkachov wrote:

Hi Thomas,

On 19/05/16 17:11, Thomas Preudhomme wrote:

On Wednesday 18 May 2016 12:30:41 Kyrill Tkachov wrote:

Hi Thomas,

This looks mostly good with a few nits inline.
Please repost with the comments addressed.

Updated ChangeLog entries:

*** gcc/ChangeLog ***

2016-05-18  Thomas Preud'homme  

  * config/arm/arm.h (TARGET_HAVE_MOVT): Include ARMv8-M as having
  MOVT.
  * config/arm/arm.c (arm_arch_name): (const_ok_for_op): Check
  MOVT/MOVW
  availability with TARGET_HAVE_MOVT.
  (thumb_legitimate_constant_p): Strip the high part of a
  label_ref.
  (thumb1_rtx_costs): Also return 0 if setting a half word constant
  and
  MOVW is available and replace (unsigned HOST_WIDE_INT) INTVAL by
  UINTVAL.
  (thumb1_size_rtx_costs): Make set of half word constant also cost
  1
  extra instruction if MOVW is available.  Use a cost variable
  incremented by COSTS_N_INSNS (1) when the condition match rather
  than
  returning an arithmetic expression based on COSTS_N_INSNS.  Make
  constant with bottom half word zero cost 2 instruction if MOVW is
  available.
  * config/arm/arm.md (define_attr "arch"): Add v8mb.
  (define_attr "arch_enabled"): Set to yes if arch value is v8mb
  and
  target is ARMv8-M Baseline.
  * config/arm/thumb1.md (thumb1_movdi_insn): Add ARMv8-M Baseline
  only
  alternative for constants satisfying j constraint.
  (thumb1_movsi_insn): Likewise.
  (movsi splitter for K alternative): Tighten condition to not
  trigger
  if movt is available and j constraint is satisfied.
  (Pe immediate splitter): Likewise.
  (thumb1_movhi_insn): Add ARMv8-M Baseline only 

Re: [PATCH] disable IPA-cp cloning for functions with target_clones attribute

2016-07-13 Thread Jan Hubicka
> Hi,
> 
> On Fri, Jun 24, 2016 at 01:41:05PM -0700, Evgeny Stupachenko wrote:
> > Hi,
> > 
> > Fix ICE when IPA-cp and target_clones are applied to the same function.
> > Is the patch ok for trunk?
> 
> I can't approve anything but since I wrote most of IPA-CP, it may
> count that I am fine with the patch.

Yep, the patch is OK

Honza
> 
> However, it should also be backported to the 6 branch.
> 
> In the future, it is useful to file a bugzilla PR for issues like
> this, even if you also provide a fix.  It helps with tracking
> backports and with a PR, you would have probably caught my attention
> sooner.
> 
> In any event, thanks for addressing this.
> 
> Martin
> 
> > 
> > Thanks,
> > Evgeny
> > 
> > 2016-06-24  Evgeny Stupachenko  
> > 
> > gcc/
> > * ipa-cp.c (determine_versionability): Do not create constprop 
> > clones,
> > when target_clones attribute is set.
> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> > index 2710494..4b642ba 100644
> > --- a/gcc/ipa-cp.c
> > +++ b/gcc/ipa-cp.c
> > @@ -533,6 +533,13 @@ determine_versionability (struct cgraph_node *node,
> >  coexist, but that may not be worth the effort.  */
> >reason = "function has SIMD clones";
> >  }
> > +  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES 
> > (node->decl)))
> > +{
> > +  /* Ideally we should clone the target clones themselves and create
> > +copies of them, so IPA-cp and target clones can happily
> > +coexist, but that may not be worth the effort.  */
> > +  reason = "function target_clones attribute";
> > +}
> >/* Don't clone decls local to a comdat group; it breaks and for C++
> >  decloned constructors, inlining is always better anyway.  */
> >else if (node->comdat_local_p ())
> > diff --git a/gcc/testsuite/gcc.target/i386/mvc8.c
> > b/gcc/testsuite/gcc.target/i386/mvc8.c
> > new file mode 100644
> > index 000..e9ab9e1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/mvc8.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-ifunc "" } */
> > +/* { dg-options "-O3 -fno-inline" } */
> > +/* { dg-final { scan-assembler-not "constprop" } } */
> > +__attribute__((target_clones("arch=core-avx2","arch=slm","default")))
> > +void foo (float *a, int b) {
> > +*a = (float)b;
> > +}
> > +float a;
> > +int main() {
> > +  int i;
> > +  for (i = 0; i < 1024; i++)
> > +foo (, 5);
> > +}


Re: Fix loop_only_exit_p

2016-07-13 Thread Jan Hubicka
> 
> I'd rather not expose/change need_fake_edge_p as that has a very
> specific purpose.
> 
> Why don't you simply add a call to is_ctrl_altering_stmt on the
> last stmt of the block in loop_only_exit_p?  It's a waste of
> time doing stuff on every stmt that can only make a difference
> on the last one of a BB.

I am not sure I understand your suggestion. is_ctrl_altering_stmt checks
whether stmt terminates BB (i.e. it does control flow we represent explicitly).
Here we look for cases where BB execution can terminate in the middle of BB
in ways not explicitly represented in CFG.

For example:

 ...stmt...
 *ptr=0
 ...stmt...

the control flow can terminate by the memory access that may throw non-call
exception. 

or

 ...stmt...
 volatile asm statement eventually leading to longjmp
 ...stmt...

I worte RTL version of need_fake_edge_p many years ago to check particularly
these cases. This is what loop code checks - it wants to know that the loop
will eventually terminate by the exit edge on hand, not by other ways.

Honza
> 
> Richard.
> 
> > Honza
> > 
> > * gimple.h (stmt_can_terminate_bb_p): New function.
> > * tree-cfg.c (need_fake_edge_p): Rename to ...
> > (stmt_can_terminate_bb_p): ... this; return true if stmt can
> > throw external; handle const and pure calls.
> > * tree-ssa-loop-niter.c (loop_only_exit_p): Use it.
> > Index: gimple.h
> > ===
> > --- gimple.h(revision 238191)
> > +++ gimple.h(working copy)
> > @@ -1526,6 +1526,7 @@ extern void gimple_seq_set_location (gim
> >  extern void gimple_seq_discard (gimple_seq);
> >  extern void maybe_remove_unused_call_args (struct function *, gimple *);
> >  extern bool gimple_inexpensive_call_p (gcall *);
> > +extern bool stmt_can_terminate_bb_p (gimple *);
> >  
> >  /* Formal (expression) temporary table handling: multiple occurrences of
> > the same scalar expression are evaluated into the same temporary.  */
> > Index: tree-cfg.c
> > ===
> > --- tree-cfg.c  (revision 238191)
> > +++ tree-cfg.c  (working copy)
> > @@ -7919,15 +7919,20 @@ gimple_block_ends_with_condjump_p (const
> >  }
> >  
> >  
> > -/* Return true if we need to add fake edge to exit at statement T.
> > -   Helper function for gimple_flow_call_edges_add.  */
> > +/* Return true if statement T may terminate execution of BB in ways not
> > +   explicitly represtented in the CFG.  */
> >  
> > -static bool
> > -need_fake_edge_p (gimple *t)
> > +bool
> > +stmt_can_terminate_bb_p (gimple *t)
> >  {
> >tree fndecl = NULL_TREE;
> >int call_flags = 0;
> >  
> > +  /* Eh exception not handled internally terminates execution of the whole
> > + function.  */
> > +  if (stmt_can_throw_external (t))
> > +return true;
> > +
> >/* NORETURN and LONGJMP calls already have an edge to exit.
> >   CONST and PURE calls do not need one.
> >   We don't currently check for CONST and PURE here, although
> > @@ -7960,6 +7965,13 @@ need_fake_edge_p (gimple *t)
> >edge e;
> >basic_block bb;
> >  
> > +  if (call_flags & (ECF_PURE | ECF_CONST)
> > + && !(call_flags & ECF_LOOPING_CONST_OR_PURE))
> > +   return false;
> > +
> > +  /* Function call may do longjmp, terminate program or do other 
> > things.
> > +Special case noreturn that have non-abnormal edges out as in this case
> > +the fact is sufficiently represented by lack of edges out of T.  */
> >if (!(call_flags & ECF_NORETURN))
> > return true;
> >  
> > @@ -8024,7 +8036,7 @@ gimple_flow_call_edges_add (sbitmap bloc
> >if (!gsi_end_p (gsi))
> > t = gsi_stmt (gsi);
> >  
> > -  if (t && need_fake_edge_p (t))
> > +  if (t && stmt_can_terminate_bb_p (t))
> > {
> >   edge e;
> >  
> > @@ -8059,7 +8071,7 @@ gimple_flow_call_edges_add (sbitmap bloc
> >   do
> > {
> >   stmt = gsi_stmt (gsi);
> > - if (need_fake_edge_p (stmt))
> > + if (stmt_can_terminate_bb_p (stmt))
> > {
> >   edge e;
> >  
> > Index: tree-ssa-loop-niter.c
> > ===
> > --- tree-ssa-loop-niter.c   (revision 238191)
> > +++ tree-ssa-loop-niter.c   (working copy)
> > @@ -2159,7 +2159,6 @@ loop_only_exit_p (const struct loop *loo
> >basic_block *body;
> >gimple_stmt_iterator bsi;
> >unsigned i;
> > -  gimple *call;
> >  
> >if (exit != single_exit (loop))
> >  return false;
> > @@ -2168,17 +2167,8 @@ loop_only_exit_p (const struct loop *loo
> >for (i = 0; i < loop->num_nodes; i++)
> >  {
> >for (bsi = gsi_start_bb (body[i]); !gsi_end_p (bsi); gsi_next ())
> > -   {
> > - call = gsi_stmt (bsi);
> > - if (gimple_code (call) != GIMPLE_CALL)
> > -   continue;
> > -
> > - if (gimple_has_side_effects (call))
> > -   {
> > - free (body);
> 

C/C++: Simplify handling of location information for OpenACC routine directives

2016-07-13 Thread Thomas Schwinge
Hi!

Working on something else regarding the C/C++ OpenACC routine directive,
I couldn't but untangle that arcane location_t handling, currently using
a dummy OMP_CLAUSE_SEQ.  Along the way, I also updated some comments, and
simplified some code.  OK for trunk?  (Another C/C++ OpenACC routine
cleanup patch is emerging, depending on this one.)

commit 9ae5f6d868db42b585de8a1d5ec3c2746619
Author: Thomas Schwinge 
Date:   Fri Jul 8 18:30:45 2016 +0200

C/C++: Simplify handling of location information for OpenACC routine 
directives

gcc/c/
* c-parser.c (struct oacc_routine_data): New.
(c_parser_declaration_or_fndef, c_parser_oacc_routine): Use it.
Simplify code.
(c_finish_oacc_routine): Likewise.  Don't attach clauses to "omp
declare target" attribute.
gcc/cp/
* parser.h (struct cp_omp_declare_simd_data): New.
(struct cp_parser): Use it for oacc_routine member.
* parser.c (cp_ensure_no_oacc_routine, cp_parser_oacc_routine)
(cp_parser_late_parsing_oacc_routine, cp_finalize_oacc_routine):
Use it.  Simplify code.
(cp_parser_new): Initialize all members pointing to special
parsing data structures.
(cp_parser_cilk_simd_fn_vector_attrs): Initialize
parser->cilk_simd_fn_info->clauses.
(cp_parser_omp_declare_simd): Initialize
parser->omp_declare_simd->clauses.
(cp_parser_late_parsing_omp_declare_simd): Simplify code.
---
 gcc/c/c-parser.c |  86 ++--
 gcc/cp/parser.c  | 108 ---
 gcc/cp/parser.h  |  21 ++-
 3 files changed, 103 insertions(+), 112 deletions(-)

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 1a50dea..cc9ee59 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -1271,11 +1271,17 @@ enum c_parser_prec {
   NUM_PRECS
 };
 
+/* Helper data structure for parsing #pragma acc routine.  */
+struct oacc_routine_data {
+  tree clauses;
+  location_t loc;
+};
+
 static void c_parser_external_declaration (c_parser *);
 static void c_parser_asm_definition (c_parser *);
 static void c_parser_declaration_or_fndef (c_parser *, bool, bool, bool,
   bool, bool, tree *, vec,
-  tree = NULL_TREE);
+  struct oacc_routine_data * = NULL);
 static void c_parser_static_assert_declaration_no_semi (c_parser *);
 static void c_parser_static_assert_declaration (c_parser *);
 static void c_parser_declspecs (c_parser *, struct c_declspecs *, bool, bool,
@@ -1367,7 +1373,7 @@ static bool c_parser_omp_target (c_parser *, enum 
pragma_context, bool *);
 static void c_parser_omp_end_declare_target (c_parser *);
 static void c_parser_omp_declare (c_parser *, enum pragma_context);
 static bool c_parser_omp_ordered (c_parser *, enum pragma_context, bool *);
-static void c_parser_oacc_routine (c_parser *parser, enum pragma_context);
+static void c_parser_oacc_routine (c_parser *, enum pragma_context);
 
 /* These Objective-C parser functions are only ever called when
compiling Objective-C.  */
@@ -1559,7 +1565,8 @@ c_parser_external_declaration (c_parser *parser)
 }
 
 static void c_finish_omp_declare_simd (c_parser *, tree, tree, vec);
-static void c_finish_oacc_routine (c_parser *, tree, tree, bool, bool, bool);
+static void c_finish_oacc_routine (struct oacc_routine_data *, tree, bool,
+  bool, bool);
 
 /* Parse a declaration or function definition (C90 6.5, 6.7.1, C99
6.7, 6.9.1).  If FNDEF_OK is true, a function definition is
@@ -1638,7 +1645,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
   bool nested, bool start_attr_ok,
   tree *objc_foreach_object_declaration,
   vec omp_declare_simd_clauses,
-  tree oacc_routine_clauses)
+  struct oacc_routine_data *oacc_routine_data)
 {
   struct c_declspecs *specs;
   tree prefix_attrs;
@@ -1743,9 +1750,9 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
  pedwarn (here, 0, "empty declaration");
}
   c_parser_consume_token (parser);
-  if (oacc_routine_clauses)
-   c_finish_oacc_routine (parser, NULL_TREE,
-  oacc_routine_clauses, false, true, false);
+  if (oacc_routine_data)
+   c_finish_oacc_routine (oacc_routine_data, NULL_TREE, false, true,
+  false);
   return;
 }
 
@@ -1862,9 +1869,8 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
  || !vec_safe_is_empty (parser->cilk_simd_fn_tokens))
c_finish_omp_declare_simd (parser, NULL_TREE, NULL_TREE,
   omp_declare_simd_clauses);
- if 

Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-07-13 Thread David Malcolm
On Tue, 2016-07-12 at 16:35 -0600, Martin Sebor wrote:
> On 07/12/2016 10:12 AM, Manuel López-Ibáñez wrote:
> > On 12/07/16 16:59, Martin Sebor wrote:
> > > You're probably right.  I suspect I have a tendency to overuse
> > > the quotes (e.g, the -Wplacement-new warning also quotes the
> > > sizes).  If there aren't yet (I vague recall coming across
> > > something on the GCC Wiki but can't find it now), it would be
> > > helpful to put in place some diagnostic style conventions like
> > > there are for formatting code to guide us in cases like this.
> > > I'm willing to help put the document together or add this to
> > > it if one already exists.
> > 
> > https://gcc.gnu.org/wiki/DiagnosticsGuidelines
> 
> That's it!  Thanks!  Looks like there are two places that talk
> about GCC diagnostics: one on the Wiki and one in the Coding
> Conventions (plus the GNU Coding Standard).  But, AFAICS, none
> of these gives guidance for what to quote.
> 
> Based on the gcc.pot file it does look like quoted numbers are
> far less common than unquoted ones (just 10 messages where they
> are quoted vs 528 unquoted).
> 
> I've added this as a guideline to the Wiki and assuming no one
> suggests otherwise I'll remove the quotes from this patch and
> from the other changes I already committed.

For reference, Martin's wiki change was:
https://gcc.gnu.org/wiki/DiagnosticsGuidelines?action=diff=8=7

(looks good to me, fwiw)


Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-07-13 Thread Marek Polacek
On Tue, Jul 12, 2016 at 04:35:50PM -0600, Martin Sebor wrote:
> Based on the gcc.pot file it does look like quoted numbers are
> far less common than unquoted ones (just 10 messages where they
> are quoted vs 528 unquoted).
> 
> I've added this as a guideline to the Wiki and assuming no one
> suggests otherwise I'll remove the quotes from this patch and
> from the other changes I already committed.

I will support the change as I don't see any need for quoting numbers,
as opposed to e.g. identifiers.

Marek


Re: [PATCH] Fix PR71104 - call gimplification

2016-07-13 Thread Richard Biener
On Wed, May 18, 2016 at 7:43 PM, Jeff Law  wrote:
> On 05/17/2016 06:28 AM, Richard Biener wrote:
>>
>>
>> The following patch addresses PR71104 which shows verify-SSA ICEs
>> after gimplify-into-SSA.  The issue is that for returns-twice calls
>> we gimplify register uses in the LHS before the actual call which leads to
>>
>>   p.0_1 = p;
>>   _2 = vfork ();
>>   *p.0_1 = _2;
>>
>> when gimplifying *p = vfork ().  That of course does not work -
>> fortunately the C standard allows to evaluate operands in the LHS
>> in unspecified order of the RHS.  That also makes this order aligned
>> with that scary C++ proposal of defined evaluation order.  It also
>> improves code-generation, avoiding spilling of the pointer load
>> around the call.
>>
>> Exchanging the gimplify calls doesn't fix the issue fully as for
>> aggregate returns we don't gimplify the call result into a
>> temporary.  So we need to make sure to not emit an SSA when
>> gimplifying the LHS of a returns-twice call (this path only applies
>> to non-register returns).
>>
>> A bootstrap with just the gimplification order exchange is building
>> target libs right now, I'll re-bootstrap and test the whole thing
>> again if that succeeds.
>>
>> Is this ok?  I think it makes sense code-generation-wise.  Code
>> changes from GCC 6
>>
>> bar:
>> .LFB0:
>> .cfi_startproc
>> subq$24, %rsp
>> .cfi_def_cfa_offset 32
>> callfoo
>> movqp(%rip), %rax
>> movq%rax, 8(%rsp)
>> callvfork
>> movq8(%rsp), %rdx
>> movl%eax, (%rdx)
>> addq$24, %rsp
>> .cfi_def_cfa_offset 8
>> ret
>>
>> to
>>
>> bar:
>> .LFB0:
>> .cfi_startproc
>> subq$8, %rsp
>> .cfi_def_cfa_offset 16
>> callfoo
>> callvfork
>> movqp(%rip), %rdx
>> movl%eax, (%rdx)
>> addq$8, %rsp
>> .cfi_def_cfa_offset 8
>> ret
>>
>> Thanks,
>> Richard.
>>
>> 2016-05-17  Richard Biener  
>>
>> PR middle-end/71104
>> * gimplify.c (gimplify_modify_expr): Gimplify the RHS before
>> gimplifying the LHS.  Make sure to gimplify a returning twice
>> call LHS without using SSA names.
>>
>> * gcc.dg/pr71104-1.c: New testcase.
>> * gcc.dg/pr71104-2.c: Likewise.
>
> LGTM.

Now after Jason applied a better solution for gimplifying the RHS
before the LHS I have
applied the following after bootstrap / regtest on x86_64-unknown-linux-gnu.

Richard.

> jeff
>


fix-pr71104
Description: Binary data


Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)

2016-07-13 Thread Richard Biener
On Wed, 13 Jul 2016, Uros Bizjak wrote:

> On Sun, Jul 10, 2016 at 10:12 AM, Uros Bizjak  wrote:
> > On Wed, Jul 6, 2016 at 3:18 PM, Richard Biener  wrote:
> >
> >>> > 2016-07-04  Richard Biener  
> >>> >
> >>> > PR rtl-optimization/68961
> >>> > * fwprop.c (propagate_rtx): Allow SUBREGs of VEC_CONCAT and CONCAT
> >>> > to simplify to a non-constant.
> >>> >
> >>> > * gcc.target/i386/pr68961.c: New testcase.
> >>>
> >>> Thanks, LGTM.
> >>
> >> Bootstrapped and tested on x86_64-unknown-linux-gnu, it causes
> >>
> >> FAIL: gcc.target/i386/sse2-load-multi.c scan-assembler-times movup 2
> >>
> >> as the peephole created for that testcase no longer applies as fwprop
> >> does
> >>
> >> In insn 10, replacing
> >>  (vec_concat:V2DF (vec_select:DF (reg:V2DF 91)
> >> (parallel [
> >> (const_int 0 [0])
> >> ]))
> >> (mem:DF (reg/f:DI 95) [0  S8 A128]))
> >>  with (vec_concat:V2DF (reg:DF 93 [ MEM[(const double *) + 8B] ])
> >> (mem:DF (reg/f:DI 95) [0  S8 A128]))
> >> Changed insn 10
> >>
> >> resulting in
> >>
> >> movsd   a+8(%rip), %xmm0
> >> movhpd  a+16(%rip), %xmm0
> >>
> >> again rather than movupd.
> >>
> >> Uros, there is probably a missing peephole for the new form - can you
> >> fix this as a followup or should I hold on this patch for a bit longer?
> >
> > No, please proceed with the patch, I'll fix this fallout with a
> > followup patch in a couple of days.
> 
> Fixed with attached patch.
> 
> 2016-07-13  Uros Bizjak  
> 
> PR rtl-optimization/68961
> * config/i386/sse.md (movsd/movhpd to movupd peephole2s): Add new
> peephole variant.  Use sse_reg_operand predicates.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> Committed to mainline SVN.

Thanks Uros.

Richard.