Re: [PATCH] Change default optimization level to -Og

2017-10-26 Thread Andrew Pinski
On Thu, Oct 26, 2017 at 10:12 AM, Wilco Dijkstra  wrote:
> GCC's default optimization level is -O0.  Unfortunately unlike other
> compilers, GCC generates extremely inefficient code with -O0.  It is
> almost unusable for low-level debugging or manual inspection of
> generated code.  So a -O option is always required for compilation.
> -Og not only allows for fast compilation, but also produces code that
> is efficient, readable as well as debuggable.  Therefore -Og makes for
> a much better default setting.
>
> Any comments?


I think this goes against what most folks are used to.  I know you are
saying most folks are used to a compiler defaulting to optimizations
on but I don't think that is true.  In fact GCC has been this way
since day one.

Plus you also missed changing the following part of the documentation:
If you are not using some other optimization option, consider using
-Og (see Optimize Options) with -g. With no -O option at all, some
compiler passes that collect information useful for debugging do not
run at all, so that -Og may result in a better debugging experience.

Thanks,
Andrew Pinski

>
> 2017-10-26  Wilco Dijkstra  
>
> * opts.c (default_options_optimization): Set default to -Og.
>
> doc/
> * invoke.texi (-O0) Remove default mention.
> (-Og): Add mention of default setting.
>
> --
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 3328a3b5fafa6a98007eff52d2a26af520de9128..74c33ea35b9f320b419a3417e6007d2391536f1b 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -7343,7 +7343,7 @@ by @option{-O2} and also turns on the following optimization flags:
>  @item -O0
>  @opindex O0
>  Reduce compilation time and make debugging produce the expected
> -results.  This is the default.
> +results.
>
>  @item -Os
>  @opindex Os
> @@ -7371,7 +7371,7 @@ Optimize debugging experience.  @option{-Og} enables optimizations
>  that do not interfere with debugging. It should be the optimization
>  level of choice for the standard edit-compile-debug cycle, offering
>  a reasonable level of optimization while maintaining fast compilation
> -and a good debugging experience.
> +and a good debugging experience.  This is the default.
>  @end table
>
>  If you use multiple @option{-O} options, with or without level numbers,
> diff --git a/gcc/opts.c b/gcc/opts.c
> index dfad955e220870a3250198640f3790c804b191e0..74511215309f11445685db4894be2ab6881695d3 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -565,6 +565,12 @@ default_options_optimization (struct gcc_options *opts,
>int opt2;
>bool openacc_mode = false;
>
> +  /* Set the default optimization to -Og.  */
> +  opts->x_optimize_size = 0;
> +  opts->x_optimize = 1;
> +  opts->x_optimize_fast = 0;
> +  opts->x_optimize_debug = 1;
> +
>/* Scan to see what optimization level has been specified.  That will
>   determine the default value of many flags.  */
>for (i = 1; i < decoded_options_count; i++)
>


Re: Merge from trunk to gccgo branch

2017-10-26 Thread Ian Lance Taylor
On Tue, Oct 24, 2017 at 3:10 PM, Ian Lance Taylor  wrote:
> I merged trunk revision 254059 to the gccgo branch.

I've now merged trunk revision 254126.

Ian


Re: [Patch] Edit contrib/ files to download gfortran prerequisites

2017-10-26 Thread Jerry DeLisle
On 10/21/2017 11:17 AM, Bernhard Reutner-Fischer wrote:
> On 21 October 2017 at 02:26, Damian Rouson  
> wrote:
>>
>> Hi Richard,
>>
>> Attached is a revised patch that makes the downloading of Fortran 
>> prerequisites optional via a new --no-fortran flag that can be passed to 
>> contrib/download_prerequisites as requested in your reply below.
>>
>> As Jerry mentioned in his response, he has been working on edits to the 
>> top-level build machinery, but we need additional guidance to complete his 
>> work.  Given that there were no responses to his request for guidance and 
>> it’s not clear when that work will complete, I’m hoping this minor change 
>> can be approved independently so that this patch doesn’t suffer bit rot in 
>> the interim.
>>
>> Ok for trunk?
> 
> + die "Invoking wget and curl in the 'download_prerequisites' script failed."
> 
> I suggest "Neither wget nor curl found, cannot download tarballs."
> 
> As an open-mpi user i question why this hardcodes MPICH, fwiw.
> 
> thanks,
> 

Hi Bernhard,

Well, open-mpi does not yet support failed images so we are focusing on mpich.

Maybe what we should do is default to mpich if not otherwise specified and
provide a -mpi=mpich or -mpi=open-mpi, or similar. Then later we can add any
other implementations that show up.

Cheers,

Jerry


Re: [PATCH, version 5a], Add support for _Float and _FloatX sqrt, fma, fmin, fmax built-in functions

2017-10-26 Thread Segher Boessenkool
On Wed, Oct 25, 2017 at 07:57:09PM -0400, Michael Meissner wrote:
> On Wed, Oct 25, 2017 at 11:41:24PM +, Joseph Myers wrote:
> > My only comment on this version is that the documentation in cpp.texi of 
> > __FP_FAST_* should be updated to mention __FP_FAST_FMAF@var{n} and 
> > __FP_FAST_FMAF@var{n}X.
> 
> Here is the documentation to be added to the previous patch.  With this
> change, can I check it into the trunk (once Segher approves of the minor
> bit to take out the PowerPC __builtin_{sqrtf128,fmaf128} builtins)?

That bit is fine, of course :-)


Segher


Re: [PATCH, rs6000] add testcase coverage for vec_neg built-ins.

2017-10-26 Thread Segher Boessenkool
Hi,

On Wed, Oct 25, 2017 at 10:26:46AM -0500, Will Schmidt wrote:

> 2017-10-25  Will Schmidt  
> 
>   * gcc.target/powerpc/fold-vec-neg-char.c: New.
>   * gcc.target/powerpc/fold-vec-neg-floatdouble.c: New.
>   * gcc.target/powerpc/fold-vec-neg-int.c: New.
>   * gcc.target/powerpc/fold-vec-neg-longlong.c: New.
>   * gcc.target/powerpc/fold-vec-neg-short.c: New.

This looks fine.  Okay for trunk.  Thanks!


Segher


[PATCH] Protect more algorithms from overloaded comma operators

2017-10-26 Thread Jonathan Wakely

LWG 2133 makes it a requirement for some of the algorithms to protect
against overloaded comma operators. This does so, and adapts some
tests to verify it, using our iterator wrappers. Those wrappers have a
member comma operator, but that only handles the case where the iterator
is the left operand, so I added extra deleted (non-member) functions to
handle iterators as the right operand.
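
To make the failure mode concrete, here is a minimal self-contained
sketch (all names are illustrative, not taken from libstdc++) of how an
overloaded comma operator can hijack a loop step such as
"--__n, ++__first", and how casting one operand to void restores the
built-in comma:

#include <cstddef>

struct Evil
{
  int *p;
  Evil &operator++ () { ++p; return *this; }
};

// Hostile non-member comma overload; selecting it is a compile error.
int operator, (std::size_t, const Evil &) = delete;

template<typename It>
It
advance_n (It it, std::size_t n)
{
  // Without the (void) cast, "--n, ++it" would select the deleted
  // comma operator above and fail to compile.
  for (; n > 0; --n, (void) ++it)
    ;
  return it;
}

int
main ()
{
  int buf[4] = {0, 1, 2, 3};
  Evil e{buf};
  return *advance_n (e, 3).p;  // 3
}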

* include/bits/stl_algo.h (__find_if_not_n, generate_n): Cast to void
to ensure overloaded comma not used.
* include/bits/stl_algobase.h (__fill_n_a, equal): Likewise.
* include/bits/stl_uninitialized.h (__uninitialized_fill_n)
(__uninitialized_fill_n_a, __uninitialized_default_n_1)
(__uninitialized_default_n_a, __uninitialized_copy_n)
(__uninitialized_copy_n_pair): Likewise
* testsuite/20_util/specialized_algorithms/memory_management_tools/1.cc:
Use test iterator wrappers with overloaded comma operator.
* testsuite/25_algorithms/fill_n/1.cc: Likewise.
* testsuite/25_algorithms/generate_n/1.cc: New test.
* testsuite/25_algorithms/stable_partition/1.cc: New test.
* testsuite/util/testsuite_iterators.h (operator,): Add deleted
non-member comma operator with iterator wrappers as right operand.

Tested powerpc64le-linux, committed to trunk.

commit f358d44717b3fa3f204a88fff098d03e4b5890c1
Author: Jonathan Wakely 
Date:   Thu Oct 26 18:41:41 2017 +0100

Protect more algorithms from overloaded comma operators

* include/bits/stl_algo.h (__find_if_not_n, generate_n): Cast to 
void
to ensure overloaded comma not used.
* include/bits/stl_algobase.h (__fill_n_a, equal): Likewise.
* include/bits/stl_uninitialized.h (__uninitialized_fill_n)
(__uninitialized_fill_n_a, __uninitialized_default_n_1)
(__uninitialized_default_n_a, __uninitialized_copy_n)
(__uninitialized_copy_n_pair): Likewise
* 
testsuite/20_util/specialized_algorithms/memory_management_tools/1.cc:
Use test iterator wrappers with overloaded comma operator.
* testsuite/25_algorithms/fill_n/1.cc: Likewise.
* testsuite/25_algorithms/generate_n/1.cc: New test.
* testsuite/25_algorithms/stable_partition/1.cc: New test.
* testsuite/util/testsuite_iterators.h (operator,): Add deleted
non-member comma operator with iterator wrappers as right operand.

diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index ea413b1b155..b5facef55dd 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -180,7 +180,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _InputIterator
    __find_if_not_n(_InputIterator __first, _Distance& __len, _Predicate __pred)
 {
-  for (; __len; --__len, ++__first)
+  for (; __len; --__len,  (void) ++__first)
if (!__pred(__first))
  break;
   return __first;
@@ -4462,7 +4462,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
__typeof__(__gen())>)
 
   for (__decltype(__n + 0) __niter = __n;
-  __niter > 0; --__niter, ++__first)
+  __niter > 0; --__niter, (void) ++__first)
*__first = __gen();
   return __first;
 }
diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index a80934c4faa..bfdfc7ded5e 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -738,7 +738,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __fill_n_a(_OutputIterator __first, _Size __n, const _Tp& __value)
 {
   for (__decltype(__n + 0) __niter = __n;
-  __niter > 0; --__niter, ++__first)
+  __niter > 0; --__niter, (void) ++__first)
*__first = __value;
   return __first;
 }
@@ -750,7 +750,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   const _Tp __tmp = __value;
   for (__decltype(__n + 0) __niter = __n;
-  __niter > 0; --__niter, ++__first)
+  __niter > 0; --__niter, (void) ++__first)
*__first = __tmp;
   return __first;
 }
@@ -796,7 +796,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
static bool
equal(_II1 __first1, _II1 __last1, _II2 __first2)
{
- for (; __first1 != __last1; ++__first1, (void)++__first2)
+ for (; __first1 != __last1; ++__first1, (void) ++__first2)
if (!(*__first1 == *__first2))
  return false;
  return true;
diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h b/libstdc++-v3/include/bits/stl_uninitialized.h
index c2ba863ed98..bac3fd01f7a 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -206,7 +206,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _ForwardIterator __cur = __first;
  __try
{
- for (; __n > 0; --__n, 

Re: [006/nnn] poly_int: tree constants

2017-10-26 Thread Martin Sebor

On 10/26/2017 11:52 AM, Richard Sandiford wrote:

Martin Sebor  writes:

 /* The tree and const_tree overload templates.   */
 namespace wi
 {
+  class unextended_tree
+  {
+  private:
+const_tree m_t;
+
+  public:
+unextended_tree () {}


Defining no-op ctors is quite dangerous and error-prone.  I suggest
to instead default initialize the member(s):

   unextended_tree (): m_t () {}

Ditto everywhere else, such as in:


This is really performance-sensitive code though, so I don't think
we want to add any unnecessary initialisation.  Primitive types are
uninitialised by default too, and the point of this class is to
provide an integer-like interface.


I understand the performance concern (more on that below), but
to clarify the usability issues,  I don't think the analogy with
primitive types is quite fitting here: int() evaluates to zero,
as do the values of i and a[0] and a[1] after an object of type
S is constructed using its default ctor, i.e., S ():

   struct S {
 int i;
 int a[2];

 S (): i (), a () { }
   };


Sure, I realise that.  I meant that:

  int x;

doesn't initialise x to zero.  So it's a question of which case is the
most motivating one: using "x ()" to initialise x to 0 in a constructor
or "int x;" to declare a variable of type x, uninitialised.  I think the
latter use case is much more common (at least in GCC).  Rearranging
things, I said later:


I agree that the latter use case is more common in GCC, but I don't
see it as a good thing.  GCC was written in C and most code still
uses now outdated C practices such as declaring variables at the top
of a (often long) function, and usually without initializing them.
It's been established that it's far better to declare variables with
the smallest scope, and to initialize them on declaration.  Compilers
are smart enough these days to eliminate redundant initialization or
assignments.


In your other message you used the example of explicit default
initialisation, such as:

class foo
{
  foo () : x () {}
  unextended_tree x;
};

But I think we should strongly discourage that kind of thing.
If someone wants to initialise x to a particular value, like
integer_zero_node, then it would be better to do it explicitly.
If they don't care what the initial value is, then for these
integer-mimicing classes, uninitialised is as good as anything
else. :-)


What I meant was: if you want to initialise "i" to 1 in your example,
you'd have to write "i (1)".  Being able to write "i ()" instead of
"i (0)" saves one character but I don't think it adds much clarity.
Explicitly initialising something only seems worthwhile if you say
what you're initialising it to.


My comment is not motivated by convenience.  What I'm concerned
about is that defining a default ctor to be a no-op defeats the
zero-initialization semantics most users expect of T().

This is particularly concerning for a class designed to behave
like an [improved] basic integer type.  Such a class should act
as closely as possible to the type it emulates and in the least
surprising ways.  Any sort of a deviation that replaces well-
defined behavior with undefined is a gotcha and a bug waiting
to happen.

It's also a concern in generic (template) contexts where T() is
expected to zero-initialize.  A template designed to work with
a fundamental integer type should also work with a user-defined
type designed to behave like an integer.
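
To illustrate that concern, here is a minimal sketch (the function and
its name are hypothetical, purely for illustration) of a generic
routine that silently relies on "T ()" producing a zero value:

#include <cstddef>

// Assumes value-initialization yields a usable "zero"; with a class
// whose default constructor is a deliberate no-op, "total" starts out
// indeterminate and the result is garbage.
template<typename T>
T
sum (const T *first, std::size_t n)
{
  T total = T ();
  for (std::size_t i = 0; i < n; ++i)
    total += first[i];
  return total;
}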


But that kind of situation is one where using "T (0)" over "T ()"
is useful.  It means that template substitution will succeed for
T that are sufficiently integer-like to have a single well-defined
zero but not for T that aren't (such as wide_int).


That strikes me as a little too subtle.  But it also doesn't
sound like wide_int is as close to an integer as its name
suggests.  After all, it doesn't support relational operators
either, or even assignment from other integer types.  It's
really a different beast.  But that still doesn't in my mind
justify the no-op initialization semantics.


For offset_int the default precision is 128-bits.  Making that
the default also for wide_int should be unsurprising.


I think it'd be surprising.  offset_int should always be used in
preference to wide_int if the precision is known to be 128 bits
in advance, and there doesn't seem any reason to prefer the
precision of offset_int over widest_int, HOST_WIDE_INT or int.

We would end up with:

  wide_int
  f (const wide_int &y)
  {
    wide_int x;
    x += y;
    return x;
  }

being valid if y happens to have 128 bits as well, and a runtime error
otherwise.


Surely that would be far better than the undefined behavior we
have today.



Also, I think it'd be inconsistent to allow the specific case of 0
to be assigned by default construction, but not also allow:

  wide_int x (0);

  wide_int x;
  x = 0;

  wide_int x;
  x = 1;

etc.  And wide_int wasn't intended for that use case.


Then perhaps I don't fully understand wide_int.  I would expect

Re: [PATCH, rs6000] (v2) Gimple folding for vec_madd()

2017-10-26 Thread Segher Boessenkool
Hi Will,

On Thu, Oct 26, 2017 at 05:13:38PM -0500, Will Schmidt wrote:
>   * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add support for
> gimple folding of vec_madd() intrinsics.

The colon goes after the closing parenthesis, and continuation lines
should not have extra indent.  I.e. like:

* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
gimple folding of vec_madd() intrinsics.

>   * config/rs6000/altivec.md: Add define_expand fmav8hi4

* config/rs6000/altivec.md (fmav8hi4): New define_expand.

> +(define_expand "fmav8hi4"
> +  [(use (match_operand:V8HI 0 "register_operand" ""))
> +   (use (match_operand:V8HI 1 "register_operand" ""))
> +   (use (match_operand:V8HI 2 "register_operand" ""))
> +   (use (match_operand:V8HI 3 "register_operand" ""))]
> +   "TARGET_ALTIVEC"
> +{
> +  emit_insn (gen_altivec_vmladduhm (operands[0], operands[1],
> + operands[2], operands[3]));
> +  DONE;
> +})

Just leave out the default, empty constraint strings please.

Maybe the altivec_vmladduhm pattern should just be renamed?  Or this
expander moved to there, at least.

> +/* vec_madd (Float) */

The "Float" part here is misleading now.

> +case ALTIVEC_BUILTIN_VMADDFP:
> +case VSX_BUILTIN_XVMADDDP:
> +case ALTIVEC_BUILTIN_VMLADDUHM:
> +  {
> +   arg0 = gimple_call_arg (stmt, 0);
> +   arg1 = gimple_call_arg (stmt, 1);
> +   tree arg2 = gimple_call_arg (stmt, 2);
> +   lhs = gimple_call_lhs (stmt);
> +   gimple *g = gimple_build_assign (lhs, FMA_EXPR , arg0, arg1, arg2);
> +   gimple_set_location (g, gimple_location (stmt));
> +   gsi_replace (gsi, g, true);
> +   return true;
> +  }

Okay for trunk with that fixed.  Thanks!


Segher


Re: [006/nnn] poly_int: tree constants

2017-10-26 Thread Martin Sebor

On 10/26/2017 01:17 PM, Pedro Alves wrote:

On 10/26/2017 07:54 PM, Martin Sebor wrote:


(...) in the general case, it is preferable to
declare each variable at the point where its initial value is known
(or can be computed) and initialize it on its declaration.


With that I fully agree, except it's not always possible or
natural.  The issue at hand usually turns up with
conditional initialization, like:

void foo ()
{
  int t;
  if (something)
t = 1;
  else if (something_else)
t = 2;
  if (t == 1)
bar ();
}

That's a simple example of course, but more complicated
conditionals aren't so easy to grok and spot the bug.

In the case above, I'd much prefer if the compiler tells me
I missed initializing 't' than initializing it to 0 "just
in case".


Sure.  A similar observation could be made about std::string
or std::vector vs a plain C-style array, or for that matter,
about almost any other class.  But it would be absurd to use
such examples as arguments that it's better to define classes
with a no-op default constructor.   It's safer to initialize
objects to some value (whatever that might be) than not to
initialize them at all.  That's true for fundamental types
and even more so for user-defined classes with constructors.

IMO, a good rule of thumb to follow in class design is to have
every class with any user-defined ctor either define a default
ctor that puts the object into a determinate state, or make
the default ctor inaccessible (or deleted in new C++ versions).
If there is a use case for leaving newly constructed objects
of a class in an uninitialized state that's an exception to
the rule that can be accommodated by providing a special API
(or in C++ 11, a defaulted ctor).
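
A minimal sketch of the two options (the class names are purely
illustrative):

// Option 1: the default ctor establishes a determinate state.
class length
{
public:
  length () : m_value (0) {}
  explicit length (int v) : m_value (v) {}
private:
  int m_value;
};

// Option 2: no usable default ctor; callers must supply a value.
class offset
{
public:
  explicit offset (int v) : m_value (v) {}
  offset () = delete;
private:
  int m_value;
};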

Martin


Go patch committed: Explicitly convert between type aliases

2017-10-26 Thread Ian Lance Taylor
This patch adds an explicit conversion between type aliases.
Otherwise we can get a crash in the backend when GCC sees a
fold_convert_loc between two record types that appear to the GCC
backend to be different.  Adding the explicit conversion will insert a
VIEW_CONVERT_EXPR where needed.  The test case is
https://golang.org/cl/73790.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 254090)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-a409ac2c78899e638a014c97891925bec93cb3ad
+64d570c590a76921cbdca4efb22e4675e19cc809
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc	(revision 254090)
+++ gcc/go/gofrontend/expressions.cc	(working copy)
@@ -144,8 +144,8 @@ Expression::convert_for_assignment(Gogo*
   || rhs->is_error_expression())
 return Expression::make_error(location);
 
-  if (lhs_type->forwarded() != rhs_type->forwarded()
-  && lhs_type->interface_type() != NULL)
+  bool are_identical = Type::are_identical(lhs_type, rhs_type, false, NULL);
+  if (!are_identical && lhs_type->interface_type() != NULL)
 {
   if (rhs_type->interface_type() == NULL)
 return Expression::convert_type_to_interface(lhs_type, rhs, location);
@@ -153,8 +153,7 @@ Expression::convert_for_assignment(Gogo*
 return Expression::convert_interface_to_interface(lhs_type, rhs, false,
   location);
 }
-  else if (lhs_type->forwarded() != rhs_type->forwarded()
-  && rhs_type->interface_type() != NULL)
+  else if (!are_identical && rhs_type->interface_type() != NULL)
 return Expression::convert_interface_to_type(lhs_type, rhs, location);
   else if (lhs_type->is_slice_type() && rhs_type->is_nil_type())
 {
@@ -165,8 +164,15 @@ Expression::convert_for_assignment(Gogo*
 }
   else if (rhs_type->is_nil_type())
 return Expression::make_nil(location);
-  else if (Type::are_identical(lhs_type, rhs_type, false, NULL))
+  else if (are_identical)
 {
+  if (lhs_type->forwarded() != rhs_type->forwarded())
+   {
+ // Different but identical types require an explicit
+ // conversion.  This happens with type aliases.
+ return Expression::make_cast(lhs_type, rhs, location);
+   }
+
   // No conversion is needed.
   return rhs;
 }


Re: [PATCH, rs6000] (v2) Gimple folding for vec_madd()

2017-10-26 Thread Will Schmidt
Hi,

[PATCH, rs6000] (v2) Gimple folding for vec_madd()

Add support for gimple folding of the vec_madd() (vector multiply-add)
intrinsics.   Per earlier feedback and education, this now includes
the addition of a "define_expand fmav8hi4" in altivec.md.

Testcase coverage is provided by the existing tests as
 gcc.target/powerpc/fold-vec-madd-*.c

Sniff-tests passed. Regtests will be kicked off shortly. OK for trunk?
(pending successful test results, of course:-) )

Thanks,
-Will

[gcc]

2017-10-26  Will Schmidt 

* config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add support for
  gimple folding of vec_madd() intrinsics.
* config/rs6000/altivec.md: Add define_expand fmav8hi4

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6ea529b..36e6ddd 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -943,10 +943,22 @@
   (match_operand:V8HI 3 "register_operand" "v")))]
   "TARGET_ALTIVEC"
   "vmladduhm %0,%1,%2,%3"
   [(set_attr "type" "veccomplex")])
 
+(define_expand "fmav8hi4"
+  [(use (match_operand:V8HI 0 "register_operand" ""))
+   (use (match_operand:V8HI 1 "register_operand" ""))
+   (use (match_operand:V8HI 2 "register_operand" ""))
+   (use (match_operand:V8HI 3 "register_operand" ""))]
+   "TARGET_ALTIVEC"
+{
+  emit_insn (gen_altivec_vmladduhm (operands[0], operands[1],
+   operands[2], operands[3]));
+  DONE;
+})
+
 (define_expand "altivec_vmrghb"
   [(use (match_operand:V16QI 0 "register_operand" ""))
(use (match_operand:V16QI 1 "register_operand" ""))
(use (match_operand:V16QI 2 "register_operand" ""))]
   "TARGET_ALTIVEC"
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4837e14..1cd4278 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16606,10 +16606,25 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   build_int_cst (arg2_type, 0)), arg0);
 gimple_set_location (g, loc);
 gsi_replace (gsi, g, true);
 return true;
   }
+/* vec_madd (Float) */
+case ALTIVEC_BUILTIN_VMADDFP:
+case VSX_BUILTIN_XVMADDDP:
+case ALTIVEC_BUILTIN_VMLADDUHM:
+  {
+   arg0 = gimple_call_arg (stmt, 0);
+   arg1 = gimple_call_arg (stmt, 1);
+   tree arg2 = gimple_call_arg (stmt, 2);
+   lhs = gimple_call_lhs (stmt);
+   gimple *g = gimple_build_assign (lhs, FMA_EXPR , arg0, arg1, arg2);
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi, g, true);
+   return true;
+  }
+
 default:
if (TARGET_DEBUG_BUILTIN)
   fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
fn_code, fn_name1, fn_name2);
   break;




Re: [PATCH 06/13] remove sdb and -gcoff from non-target files

2017-10-26 Thread Jim Wilson
On Thu, 2017-10-26 at 11:38 +0200, Richard Biener wrote:
> You can eventually keep the option, marking it as Ignore (like we do
> for options we remove but "keep" for backward compatibility).  The
> diagnostic (as warning, given the option will be just ignored) could
> be emitted from option processing in opts.c then.

I seriously doubt that anyone will miss the -gcoff option.  The last
bug report I can find is 
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9963
which was fixed in 2005.  There is also a bug report from 2004
    https://gcc.gnu.org/ml/gcc/2004-06/msg00708.html
which suggests it should just be removed instead of fixed.

I see Kai Tietz fixing some bugs in sdbout in 2014, but that is only
because he was doing cygwin maintenance, and these problems turned up
during testsuite debug torture testing.  So it wasn't an end user
problem.  Also, in this thread, there are questions about why we don't
just delete it instead.

If we ignore the option, we can't have code in opts.c to emit a warning
for it, but we can put a warning in the common.opt file.  I tried this
and ran into a minor problem which is that the code to check the debug
level only works for options that exist.  So I get

palantir:2277$ ./xgcc -B./ -O -S -gcoff tmp.c
xgcc: warning: switch ‘-gcoff’ no longer supported
palantir:2278$ ./xgcc -B./ -O -S -gcoff3 tmp.c
xgcc: warning: switch ‘-gcoff3’ no longer supported
palantir:2279$ ./xgcc -B./ -O -S -gcofffoo tmp.c
xgcc: warning: switch ‘-gcofffoo’ no longer supported
palantir:2280$ 

The last one has never been a valid option.  If we don't care about
this, then the attached patch works.

Otherwise I think I have to add 4 stanzas for the four valid options,
-gcoff, -gcoff1, -gcoff2, and -gcoff3.  I'd rather not do that.  Or
leave -gcoff in as a supported option and ignore it in opts.c, which I
would also rather not do. I just want it gone.  I can live with the
ignored option.

OK?

Jim
2017-10-26  Jim Wilson  

gcc/
* common.opt (gcoff): Re-add as ignored option.

diff --git a/gcc/common.opt b/gcc/common.opt
index 25e86ec..c248d95 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2868,6 +2868,10 @@ g
 Common Driver RejectNegative JoinedOrMissing
 Generate debug information in default format.
 
+gcoff
+Common Driver JoinedOrMissing Ignore Warn(switch %qs no longer supported)
+Does nothing.  Preserved for backward compatibility.
+
 gcolumn-info
 Common Driver Var(debug_column_info,1) Init(1)
 Record DW_AT_decl_column and DW_AT_call_column in DWARF.


Re: [PATCH, rs6000 V3] Add Power 8 support to vec_revb

2017-10-26 Thread Segher Boessenkool
Hi Carl,

On Mon, Oct 23, 2017 at 08:36:03AM -0700, Carl Love wrote:
> The mode attribute wd will not work
> for the define expand as the V16QI maps to "b" not "q".  So I do need to
> have VSX_XXBR.

Why is that a problem?  Why do you want to swap the order of the whole
vector instead of changing the order of the bytes in each element (which
is a no-op in this case)?

>   * config/rs6000/vsx.md (revb_): Add define_expand to generate
>   power 8 instructions for the vec_revb builtin.

(You have a stray tab here (before "to").)

> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index f9be5d3..0cfafe5 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -142,6 +142,8 @@ extern int rs6000_emit_vector_cond_expr (rtx, rtx, rtx, rtx, rtx, rtx);
>  extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx);
>  extern void rs6000_split_signbit (rtx, rtx);
>  extern void rs6000_expand_atomic_compare_and_swap (rtx op[]);
> +extern rtx swap_endianess_selector_for_mode (machine_mode mode);

The word is endianness (two n's).

> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index b47eeac..0258360 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -73,6 +73,14 @@
>(TF"FLOAT128_VECTOR_P (TFmode)")
>TI])
>  
> +(define_mode_attr VSX_XXBR  [(V16QI "q")
> +(V8HI  "h")
> +(V4SI  "w")
> +(V4SF  "w")
> +(V2DF  "d")
> +(V2DI  "d")
> +(V1TI  "q")])

So I think this is wrong for V16QI.  You can use <wd> fine, but you need
to avoid generating "xxbrb" insns; instead, do a register move?  xxbrq
isn't the insn you want, as far as I see.

> +;; Swap all bytes in each element of vector
> +(define_expand "revb_"
> +  [(set (match_operand:VEC_A 0 "vsx_register_operand")
> + (bswap:VEC_A (match_operand:VEC_A 1 "vsx_register_operand")))]
> +  "TARGET_P9_VECTOR"
> +{
> +  rtx sel;

So a special case here:

  if (<MODE>mode == V16QImode)
    {
      emit_move_insn (operands[0], operands[1]);
      DONE;
    }

or something like it.

> +;; Swap all bytes in 128-bit vector
> +(define_expand "revb_v1ti"

I don't see what is different about this one?

> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> + (bswap:V1TI (match_operand:V1TI 1 "vsx_register_operand")))]
> +  "TARGET_P9_VECTOR"

I don't think you want this condition, btw...  it makes the first
following if() trivially true:

> +{
> +  rtx sel;
> +
> +  if (TARGET_P9_VECTOR)
> +emit_insn (gen_p9_xxbrq_v1ti (operands[0], operands[1]));


Segher


[PATCH] Fix x86-64 zero_extended_scalar_load_operand caused wrong-code (PR target/82703)

2017-10-26 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled because, due to STV, we end up
with a load from a V2DFmode MEM for which get_pool_constant returns
a V1TImode vector.  maybe_get_pool_constant doesn't have code
to handle mode differences between what get_pool_constant returns
and the requested mode.  While it wouldn't be that hard to add,
this is all done properly in avoid_constant_pool_reference already,
which does everything maybe_get_pool_constant needs to do.

So, this patch just removes the maybe_get_pool_constant function and
uses avoid_constant_pool_reference instead.  I've looked into history:
the latter was introduced in 2001 and the former in 2002, but at that
point the latter didn't do any delegitimization (though it already had
the mode conversions), while the former had hand-written
delegitimization.  Since then, delegitimization has been added to
avoid_constant_pool_reference, and the custom delegitimization in
maybe_get_pool_constant was changed into the full-blown one.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/7.x?

2017-10-26  Jakub Jelinek  

PR target/82703
* config/i386/i386-protos.h (maybe_get_pool_constant): Removed.
* config/i386/i386.c (maybe_get_pool_constant): Removed.
(ix86_split_to_parts): Use avoid_constant_pool_reference instead of
maybe_get_pool_constant.
* config/i386/predicates.md (zero_extended_scalar_load_operand):
Likewise.

* gcc.dg/pr82703.c: New test.

--- gcc/config/i386/i386-protos.h.jj	2017-04-20 12:19:54.0 +0200
+++ gcc/config/i386/i386-protos.h   2017-10-26 13:14:05.172571175 +0200
@@ -282,8 +282,6 @@ extern bool i386_pe_type_dllexport_p (tr
 
 extern int i386_pe_reloc_rw_mask (void);
 
-extern rtx maybe_get_pool_constant (rtx);
-
 extern char internal_label_prefix[16];
 extern int internal_label_prefix_len;
 
--- gcc/config/i386/i386.c.jj   2017-09-30 10:26:43.0 +0200
+++ gcc/config/i386/i386.c  2017-10-26 13:15:03.917891804 +0200
@@ -19830,20 +19830,6 @@ ix86_expand_clear (rtx dest)
   emit_insn (tmp);
 }
 
-/* X is an unchanging MEM.  If it is a constant pool reference, return
-   the constant pool rtx, else NULL.  */
-
-rtx
-maybe_get_pool_constant (rtx x)
-{
-  x = ix86_delegitimize_address (XEXP (x, 0));
-
-  if (GET_CODE (x) == SYMBOL_REF && CONSTANT_POOL_ADDRESS_P (x))
-return get_pool_constant (x);
-
-  return NULL_RTX;
-}
-
 void
 ix86_expand_move (machine_mode mode, rtx operands[])
 {
@@ -25371,11 +25357,7 @@ ix86_split_to_parts (rtx operand, rtx *p
   /* Optimize constant pool reference to immediates.  This is used by fp
  moves, that force all constants to memory to allow combining.  */
   if (MEM_P (operand) && MEM_READONLY_P (operand))
-{
-  rtx tmp = maybe_get_pool_constant (operand);
-  if (tmp)
-   operand = tmp;
-}
+operand = avoid_constant_pool_reference (operand);
 
   if (MEM_P (operand) && !offsettable_memref_p (operand))
 {
--- gcc/config/i386/predicates.md.jj	2017-04-20 12:19:54.0 +0200
+++ gcc/config/i386/predicates.md   2017-10-26 13:16:27.399926361 +0200
@@ -975,9 +975,9 @@ (define_predicate "zero_extended_scalar_
   (match_code "mem")
 {
   unsigned n_elts;
-  op = maybe_get_pool_constant (op);
+  op = avoid_constant_pool_reference (op);
 
-  if (!(op && GET_CODE (op) == CONST_VECTOR))
+  if (GET_CODE (op) != CONST_VECTOR)
 return false;
 
   n_elts = CONST_VECTOR_NUNITS (op);
--- gcc/testsuite/gcc.dg/pr82703.c.jj   2017-10-26 13:19:23.879885830 +0200
+++ gcc/testsuite/gcc.dg/pr82703.c  2017-10-26 13:18:42.0 +0200
@@ -0,0 +1,28 @@
+/* PR target/82703 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-tree-sra -ftree-vectorize" } */
+
+__attribute__((noinline, noclone)) void
+compare (const double *p, const double *q)
+{
+  for (int i = 0; i < 3; ++i)
+if (p[i] != q[i])
+  __builtin_abort ();
+}
+
+double vr[3] = { 4, 4, 4 };
+
+int
+main ()
+{
+  double v1[3] = { 1, 2, 3 };
+  double v2[3] = { 3, 2, 1 };
+  double v3[3];
+  __builtin_memcpy (v3, v1, sizeof (v1));
+  for (int i = 0; i < 3; ++i)
+v3[i] += v2[i];
+  for (int i = 0; i < 3; ++i)
+v1[i] += v2[i];
+  compare (v3, vr);
+  return 0;
+}

Jakub


Re: [PATCH] expand -fdebug-prefix-map documentation

2017-10-26 Thread Jim Wilson
On Thu, 2017-10-26 at 12:35 -0600, Sandra Loosemore wrote:
> On 10/26/2017 11:47 AM, Gerald Pfeifer wrote:
> > Thanks you for fleshing this out, Jim!
> > 
> > This looks fine to me (modula Sandra's note).  Just a question:
> > would
> > we refer to GDB instead of gdb here?  It feels a little in between
> > to
> > me, whether we are referring to the tool or the actual binary.  I'm
> > sure Sandra will have guidance for us. ;-)
> Hmmm, yes.  "GDB" is the preferred spelling elsewhere in the manual.

Checked in with the two corrections.

Jim



[patch, nios2] add -mr0rel-sec= option

2017-10-26 Thread Sandra Loosemore
I've committed this patch, intended for bare-metal nios2-elf targets 
with custom linker scripts.  In conjunction with section attributes, it 
provides a way to tell GCC that an object is assigned to the low or high 
32K memory regions so that it can be addressed via a signed 16-bit 
offset from r0, the zero register.  From GCC's perspective it's quite 
similar to GP-relative addressing except using a different base 
register.  The output code uses the existing %lo relocation so no 
assembler or linker changes are necessary to support this.
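
For illustration, a rough sketch of the intended usage; the section
name, symbol, and linker-script placement below are assumptions made
for this example, not part of the patch:

/* The linker script is assumed to map .r0data into the low 32K of the
   address space; -mr0rel-sec= must be given a regex matching it.  */
int status_word __attribute__ ((section (".r0data")));

int
get_status (void)
{
  return status_word;  /* accessed via a signed 16-bit offset from r0 */
}

/* Compile with something like:
     nios2-elf-gcc -O2 -mr0rel-sec='\.r0data' -c status.c  */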


-Sandra

2017-10-26  Sandra Loosemore  

	gcc/
	* config/nios2/constraints.md ("S"): Match r0rel_constant_p too.
	* config/nios2/nios2-protos.h (r0rel_constant_p): Declare.
	* config/nios2/nios2.c (nios2_r0rel_sec_regex): New.
	(nios2_option_override): Initialize it.  Don't allow R0-relative
	addressing with PIC.
	(nios2_rtx_costs): Handle r0rel_constant_p like gprel_constant_p.
	(nios2_symbolic_constant_p): Likewise.
	(nios2_legitimate_address_p): Likewise.
	(nios2_r0rel_section_name_p): New.
	(nios2_symbol_ref_in_r0rel_data_p): New.
	(nios2_emit_move_sequence): Handle r0rel_constant_p.
	(r0rel_constant_p): New.
	(nios2_print_operand_address): Handle r0rel_constant_p.
	(nios2_cdx_narrow_form_p): Likewise.
	* config/nios2/nios2.opt (mr0rel-sec=): New option.
	* doc/invoke.texi (Option Summary): Add -mr0rel-sec.
	(Nios II Options): Document -mr0rel-sec.

	gcc/testsuite/
	* gcc.target/nios2/gpopt-r0rel-sec.c: New.
diff --git a/gcc/config/nios2/constraints.md b/gcc/config/nios2/constraints.md
index c6c53926..51f71cf 100644
--- a/gcc/config/nios2/constraints.md
+++ b/gcc/config/nios2/constraints.md
@@ -95,8 +95,8 @@
(match_test "TARGET_ARCH_R2 && ANDCLEAR_INT (ival)")))
 
 (define_constraint "S"
-  "An immediate stored in small data, accessible by GP."
-  (match_test "gprel_constant_p (op)"))
+  "An immediate stored in small data, accessible by GP, or by offset from r0."
+  (match_test "gprel_constant_p (op) || r0rel_constant_p (op)"))
 
 (define_constraint "T"
   "A constant unspec offset representing a relocation."
diff --git a/gcc/config/nios2/nios2-protos.h b/gcc/config/nios2/nios2-protos.h
index 6df65bb..84d450b 100644
--- a/gcc/config/nios2/nios2-protos.h
+++ b/gcc/config/nios2/nios2-protos.h
@@ -52,6 +52,7 @@ extern const char * nios2_add_insn_asm (rtx_insn *, rtx *);
 
 extern bool nios2_legitimate_pic_operand_p (rtx);
 extern bool gprel_constant_p (rtx);
+extern bool r0rel_constant_p (rtx);
 extern bool nios2_regno_ok_for_base_p (int, bool);
 extern bool nios2_unspec_reloc_p (rtx);
 
diff --git a/gcc/config/nios2/nios2.c b/gcc/config/nios2/nios2.c
index 3aade7b..cdd5e9a 100644
--- a/gcc/config/nios2/nios2.c
+++ b/gcc/config/nios2/nios2.c
@@ -106,6 +106,7 @@ static bool custom_code_conflict = false;
 
 /* State for command-line options.  */
 regex_t nios2_gprel_sec_regex;
+regex_t nios2_r0rel_sec_regex;
 
 
 /* Definition of builtin function types for nios2.  */
@@ -1375,22 +1376,30 @@ nios2_option_override (void)
 	nios2_gpopt_option = gpopt_local;
 }
 
-  /* GP-relative addressing doesn't make sense for PIC.  */
+  /* GP-relative and r0-relative addressing don't make sense for PIC.  */
   if (flag_pic)
-{ 
+{
   if (nios2_gpopt_option != gpopt_none)
-error ("-mgpopt not supported with PIC.");
+	error ("-mgpopt not supported with PIC.");
   if (nios2_gprel_sec)
-error ("-mgprel-sec= not supported with PIC.");
+	error ("-mgprel-sec= not supported with PIC.");
+  if (nios2_r0rel_sec)
+	error ("-mr0rel-sec= not supported with PIC.");
 }
 
-  /* Process -mgprel-sec=.  */
+  /* Process -mgprel-sec= and -mr0rel-sec=.  */
   if (nios2_gprel_sec)
 {
   if (regcomp (&nios2_gprel_sec_regex, nios2_gprel_sec,
 		   REG_EXTENDED | REG_NOSUB))
 	error ("-mgprel-sec= argument is not a valid regular expression.");
 }
+  if (nios2_r0rel_sec)
+{
+  if (regcomp (&nios2_r0rel_sec_regex, nios2_r0rel_sec,
+		   REG_EXTENDED | REG_NOSUB))
+	error ("-mr0rel-sec= argument is not a valid regular expression.");
+}
 
   /* If we don't have mul, we don't have mulx either!  */
   if (!TARGET_HAS_MUL && TARGET_HAS_MULX)
@@ -1478,7 +1487,7 @@ nios2_rtx_costs (rtx x, machine_mode mode,
   case SYMBOL_REF:
   case CONST:
   case CONST_DOUBLE:
-	if (gprel_constant_p (x))
+	if (gprel_constant_p (x) || r0rel_constant_p (x))
   {
 *total = COSTS_N_INSNS (1);
 return true;
@@ -2028,6 +2037,7 @@ nios2_symbolic_constant_p (rtx x)
   return (SYMBOL_REF_P (base)
 		&& !SYMBOL_REF_TLS_MODEL (base)
 		&& !gprel_constant_p (base)
+		&& !r0rel_constant_p (base)
 		&& SMALL_INT (INTVAL (offset)));
 }
   return false;
@@ -2129,7 +2139,7 @@ nios2_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
 
   /* Else, fall through.  */
 case CONST:
-  if (gprel_constant_p (operand))
+  if (gprel_constant_p (operand) || r0rel_constant_p (operand))
 	return 

[patch, nios2] add -mgprel-sec= option

2017-10-26 Thread Sandra Loosemore
I've committed this patch, which allows users to specify additional data 
sections (beyond the normal small-data sections) where GP-relative 
addressing can be used.  It's intended for use on bare-metal nios2-elf 
targets in conjunction with section attributes and a custom linker script.


-Sandra

2017-10-26  Sandra Loosemore  

	gcc/
	* config/nios2/nios2.c: Include xregex.h.
	(nios2_gprel_sec_regex): New.
	(nios2_option_override): Initialize it.  Don't allow GP-relative
	addressing with PIC.
	(nios2_small_section_name_p): Check for regex match.
	* config/nios2/nios2.opt (mgprel-sec=): New option.
	* doc/invoke.texi (Option Summary): Add -mgprel-sec.
	(Nios II Options): Document -mgprel-sec.

	gcc/testsuite/
	* gcc.target/nios2/gpopt-gprel-sec.c: New.
diff --git a/gcc/config/nios2/nios2.c b/gcc/config/nios2/nios2.c
index f5963d4..3aade7b 100644
--- a/gcc/config/nios2/nios2.c
+++ b/gcc/config/nios2/nios2.c
@@ -49,6 +49,7 @@
 #include "stor-layout.h"
 #include "builtins.h"
 #include "tree-pass.h"
+#include "xregex.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -103,6 +104,9 @@ static int custom_code_index[256];
 /* Set to true if any conflicts (re-use of a code between 0-255) are found.  */
 static bool custom_code_conflict = false;
 
+/* State for command-line options.  */
+regex_t nios2_gprel_sec_regex;
+
 
 /* Definition of builtin function types for nios2.  */
 
@@ -1371,6 +1375,23 @@ nios2_option_override (void)
 	nios2_gpopt_option = gpopt_local;
 }
 
+  /* GP-relative addressing doesn't make sense for PIC.  */
+  if (flag_pic)
+{ 
+  if (nios2_gpopt_option != gpopt_none)
+error ("-mgpopt not supported with PIC.");
+  if (nios2_gprel_sec)
+error ("-mgprel-sec= not supported with PIC.");
+}
+
+  /* Process -mgprel-sec=.  */
+  if (nios2_gprel_sec)
+{
+  if (regcomp (&nios2_gprel_sec_regex, nios2_gprel_sec,
+		   REG_EXTENDED | REG_NOSUB))
+	error ("-mgprel-sec= argument is not a valid regular expression.");
+}
+
   /* If we don't have mul, we don't have mulx either!  */
   if (!TARGET_HAS_MUL && TARGET_HAS_MULX)
 target_flags &= ~MASK_HAS_MULX;
@@ -2268,7 +2289,9 @@ nios2_small_section_name_p (const char *section)
   return (strcmp (section, ".sbss") == 0
 	  || strncmp (section, ".sbss.", 6) == 0
 	  || strcmp (section, ".sdata") == 0
-	  || strncmp (section, ".sdata.", 7) == 0);
+	  || strncmp (section, ".sdata.", 7) == 0
+	  || (nios2_gprel_sec 
+	  && regexec (&nios2_gprel_sec_regex, section, 0, NULL, 0) == 0));
 }
 
 /* Return true if EXP should be placed in the small data section.  */
diff --git a/gcc/config/nios2/nios2.opt b/gcc/config/nios2/nios2.opt
index 08cb935..d08405e 100644
--- a/gcc/config/nios2/nios2.opt
+++ b/gcc/config/nios2/nios2.opt
@@ -586,3 +586,7 @@ Enable generation of R2 BMX instructions.
 mcdx
 Target Report Mask(HAS_CDX)
 Enable generation of R2 CDX instructions.
+
+mgprel-sec=
+Target RejectNegative Joined Var(nios2_gprel_sec) Init(NULL)
+Regular expression matching additional GP-addressible small-data section names.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3a87956..0dafdb6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -948,6 +948,7 @@ Objective-C and Objective-C++ Dialects}.
 
 @emph{Nios II Options}
 @gccoptlist{-G @var{num}  -mgpopt=@var{option}  -mgpopt  -mno-gpopt @gol
+-mgprel-sec=@var{regexp} @gol
 -mel  -meb @gol
 -mno-bypass-cache  -mbypass-cache @gol
 -mno-cache-volatile  -mcache-volatile @gol
@@ -21165,6 +21166,18 @@ GOT data sections.  In this case, the 16-bit offset for GP-relative
 addressing may not be large enough to allow access to the entire 
 small data section.
 
+@item -mgprel-sec=@var{regexp}
+@opindex mgprel-sec
+This option specifies additional section names that can be accessed via
+GP-relative addressing.  It is most useful in conjunction with 
+@code{section} attributes on variable declarations 
+(@pxref{Common Variable Attributes}) and a custom linker script.  
+The @var{regexp} is a POSIX Extended Regular Expression.
+
+This option does not affect the behavior of the @option{-G} option, and
+the specified sections are in addition to the standard @code{.sdata}
+and @code{.sbss} small-data sections that are recognized by @option{-mgpopt}.
+
 @item -mel
 @itemx -meb
 @opindex mel
diff --git a/gcc/testsuite/gcc.target/nios2/gpopt-gprel-sec.c b/gcc/testsuite/gcc.target/nios2/gpopt-gprel-sec.c
new file mode 100644
index 000..1083fe6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nios2/gpopt-gprel-sec.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mgpopt=local -mgprel-sec=\\.frog.+" } */
+
+extern int a __attribute__ ((section (".frog1")));
+static volatile int b __attribute__ ((section (".frog2"))) = 1;
+extern int c __attribute__ ((section (".data")));
+static volatile int d __attribute__ ((section (".data"))) = 2;
+
+extern int e;
+static volatile int f = 3;
+
+volatile int g 

Fix basic block reordering glitches

2017-10-26 Thread Eric Botcazou
The attached Ada testcase triggers an ICE at -O3 when compiled for x86:

eric@polaris:~/build/gcc/native> gcc/xgcc -Bprev-gcc -S opt68.adb -O3 -m32
opt68.adb: In function 'Opt68.Copy':
opt68.adb:51:6: error: multiple hot/cold transitions found (bb 34)
opt68.adb:51:6: error: multiple hot/cold transitions found (bb 64)
+===GNAT BUG DETECTED==+
| 8.0.0 20171026 (experimental) [trunk revision 254096] (x86_64-suse-linux) GCC error:|
| verify_flow_info failed  |

It's the RTL basic block reordering pass confusing itself, more precisely:

  /* If the best destination has multiple predecessors, and can be
 duplicated cheaper than a jump, don't allow it to be added
 to a trace.  We'll duplicate it when connecting traces.  */
  if (best_edge && EDGE_COUNT (best_edge->dest->preds) >= 2
  && copy_bb_p (best_edge->dest, 0))
best_edge = NULL;

Now, if you're in the first round (reordering of the hot partition) and the
basic block has 2 predecessors, a regular one (corresponding to best_edge) and 
another coming from the cold partition (i.e. EDGE_CROSSING), the duplication 
will redirect best_edge and leave the block with only one predecessor coming 
from the cold partition, which means that the block will eventually become 
cold at the end of the pass if fixup_partitions is invoked.  But that's too 
late since

  /* Signal that rtl_verify_flow_info_1 can now verify that there
 is at most one switch between hot/cold sections.  */
  crtl->bb_reorder_complete = true;

So we would need to re-run the pass in order to move the block from the hot 
partition to the cold partition (re-running only a subpass wouldn't work since 
the above decision is made by find_traces but the duplication is triggered by 
connect_traces), which seems overkill.

Therefore the attached patch detects this problematic case and doesn't reset 
best_edge upon finding it.  This means that the destination will be added to 
the current trace in the hot partition, which looks OK since all its other 
predecessors are in the cold partition.  It also contains a fix for an off-by-
one bug in the dump file and removes a redundant check from better_edge_p (the 
same check on the BB_PARTITION of the blocks is done in find_traces_1_round).

Bootstrapped/regtested on x86_64-suse-linux.  Any objection?


2017-10-26  Eric Botcazou  <ebotca...@adacore.com>

* bb-reorder.c (find_traces_1_round): Fix off-by-one index.
Move around comment.  Do not reset best_edge for a copiable
destination if the copy would cause a partition change.
(better_edge_p): Remove redundant check.


2017-10-26  Eric Botcazou  <ebotca...@adacore.com>

* gnat.dg/opt68.ad[sb]: New test.

-- 
Eric Botcazou

Index: bb-reorder.c
===
--- bb-reorder.c	(revision 254096)
+++ bb-reorder.c	(working copy)
@@ -529,7 +529,7 @@ find_traces_1_round (int branch_th, int
 
 	  if (dump_file)
 	fprintf (dump_file, "Basic block %d was visited in trace %d\n",
-		 bb->index, *n_traces - 1);
+		 bb->index, *n_traces);
 
 	  ends_in_call = block_ends_with_call_p (bb);
 
@@ -545,6 +545,8 @@ find_traces_1_round (int branch_th, int
 		  && bb_visited_trace (e->dest) != *n_traces)
 		continue;
 
+	  /* If partitioning hot/cold basic blocks, don't consider edges
+		 that cross section boundaries.  */
 	  if (BB_PARTITION (e->dest) != BB_PARTITION (bb))
 		continue;
 
@@ -574,9 +576,6 @@ find_traces_1_round (int branch_th, int
 		  || e->count () < count_th) && (!for_size)))
 		continue;
 
-	  /* If partitioning hot/cold basic blocks, don't consider edges
-		 that cross section boundaries.  */
-
 	  if (better_edge_p (bb, e, prob, freq, best_prob, best_freq,
  best_edge))
 		{
@@ -586,12 +585,28 @@ find_traces_1_round (int branch_th, int
 		}
 	}
 
-	  /* If the best destination has multiple predecessors, and can be
-	 duplicated cheaper than a jump, don't allow it to be added
-	 to a trace.  We'll duplicate it when connecting traces.  */
-	  if (best_edge && EDGE_COUNT (best_edge->dest->preds) >= 2
+	  /* If the best destination has multiple predecessors and can be
+	 duplicated cheaper than a jump, don't allow it to be added to
+	 a trace; we'll duplicate it when connecting the traces later.
+	 However, we need to check that this duplication wouldn't leave
+	 the best destination with only crossing predecessors, because
+	 this would change its effective partition from hot to cold.  */
+	  if (best_edge
+	  && EDGE_COUNT (best_edge->dest->preds) >= 2
 	  && copy_bb_p (best_edge->dest, 0))
-	best_edge = NULL;
+	{
+	  bool only_cross

Re: Fix pretty printers for versioned namespace

2017-10-26 Thread Jonathan Wakely

On 26/10/17 21:37 +0100, Jonathan Wakely wrote:

On 26/10/17 21:30 +0100, Jonathan Wakely wrote:

On 26/10/17 22:19 +0200, François Dumont wrote:

@@ -1232,7 +1232,7 @@ class Printer(object):
  # Add a name using _GLIBCXX_BEGIN_NAMESPACE_CONTAINER.
  def add_container(self, base, name, function):
  self.add_version(base, name, function)
-self.add_version(base + '__cxx1998::', name, function)
+self.add_version(base, '__cxx1998::' + name, function)


I don't like this change.

Previously the arguments were the namespace(s) and the type. That's
nice and simple.

Now it's the first namespace, and then all the other namespaces and
the type. That's not very clean.

There must be a way to keep the add_version and add_container calls
the same, and have it transparently handle the case where the
namespace is 'std::__8::foo' not 'std::foo'.


e.g. in add_version instead of:

  if _versioned_namespace:
  self.add(base + _versioned_namespace + name, function)

We should replace "std::" in base with "std::" + _versioned_namespace

That means we only have to change that function, not every call to it.


Something like this:

--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -1227,7 +1227,8 @@ class Printer(object):
def add_version(self, base, name, function):
self.add(base + name, function)
if _versioned_namespace:
-self.add(base + _versioned_namespace + name, function)
+vbase = re.sub('^std::', 'std::%s::' % _versioned_namespace, base)
+self.add(vbase + name, function)

# Add a name using _GLIBCXX_BEGIN_NAMESPACE_CONTAINER.
def add_container(self, base, name, function):




Re: Fix pretty printers for versioned namespace

2017-10-26 Thread Jonathan Wakely

On 26/10/17 21:30 +0100, Jonathan Wakely wrote:

On 26/10/17 22:19 +0200, François Dumont wrote:

@@ -1232,7 +1232,7 @@ class Printer(object):
   # Add a name using _GLIBCXX_BEGIN_NAMESPACE_CONTAINER.
   def add_container(self, base, name, function):
   self.add_version(base, name, function)
-self.add_version(base + '__cxx1998::', name, function)
+self.add_version(base, '__cxx1998::' + name, function)


I don't like this change.

Previously the arguments were the namespace(s) and the type. That's
nice and simple.

Now it's the first namespace, and then all the other namespaces and
the type. That's not very clean.

There must be a way to keep the add_version and add_container calls
the same, and have it transparently handle the case where the
namespace is 'std::__8::foo' not 'std::foo'.


e.g. in add_version instead of:

   if _versioned_namespace:
   self.add(base + _versioned_namespace + name, function)

We should replace "std::" in base with "std::" + _versioned_namespace

That means we only have to change that function, not every call to it.





Re: Fix pretty printers for versioned namespace

2017-10-26 Thread Jonathan Wakely

On 26/10/17 22:19 +0200, François Dumont wrote:

@@ -1232,7 +1232,7 @@ class Printer(object):
# Add a name using _GLIBCXX_BEGIN_NAMESPACE_CONTAINER.
def add_container(self, base, name, function):
self.add_version(base, name, function)
-self.add_version(base + '__cxx1998::', name, function)
+self.add_version(base, '__cxx1998::' + name, function)


I don't like this change.

Previously the arguments were the namespace(s) and the type. That's
nice and simple.

Now it's the first namespace, and then all the other namespaces and
the type. That's not very clean.

There must be a way to keep the add_version and add_container calls
the same, and have it transparently handle the case where the
namespace is 'std::__8::foo' not 'std::foo'.


Re: Rename cxx1998 into normal

2017-10-26 Thread Jonathan Wakely

On 26/10/17 21:45 +0200, François Dumont wrote:

On 26/10/2017 19:09, Jonathan Wakely wrote:

On 26/10/17 18:55 +0200, Daniel Krügler wrote:

2017-10-26 7:51 GMT+02:00 François Dumont :

Hi

    We once talked about renaming the __cxx1998 namespace into
something less C++98-biased. Here is the patch to do so.

    Ok with the new name?


IMO the name should somehow still contain "cxx" somewhere, otherwise
this could easily cause a semantic ambiguity in situations, where the
term "normal" has a specific meaning.

What about "__cxxdef[ault]" ?


I'm not sure we need to change it at all. The name is anachronistic,
but harmless.



As part of this message:

https://gcc.gnu.org/ml/libstdc++/2016-09/msg00076.html

I thought that you also wanted to rename it. But I simply 
misunderstood, it is fine with me to keep it this way.


I was objecting to creating a new namespace called __cxx1998_a,
because for a new one we can choose any name. For the existing one I
don't really _like_ the name, but I don't think we need to change it.




Fix pretty printers for versioned namespace

2017-10-26 Thread François Dumont

Hi

    When restoring the versioned namespace feature I forgot to check the
pretty printer tests, which are now broken.


    Here is the patch to fix those. Tested under Linux x86_64 in normal
and versioned namespace modes, OK to commit?


François

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index b7b2a0f..0a168fe 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -973,8 +973,8 @@ class StdExpAnyPrinter(SingleObjContainerPrinter):
 "Print a std::any or std::experimental::any"
 
 def __init__ (self, typename, val):
-self.typename = re.sub('^std::experimental::fundamentals_v\d::', 'std::experimental::', typename, 1)
-self.typename = strip_versioned_namespace(self.typename)
+self.typename = strip_versioned_namespace(typename)
+self.typename = re.sub('^std::experimental::fundamentals_v\d::', 'std::experimental::', self.typename, 1)
 self.val = val
 self.contained_type = None
 contained_value = None
@@ -1021,8 +1021,8 @@ class StdExpOptionalPrinter(SingleObjContainerPrinter):
 
 def __init__ (self, typename, val):
 valtype = self._recognize (val.type.template_argument(0))
-self.typename = re.sub('^std::(experimental::|)(fundamentals_v\d::|)(.*)', r'std::\1\3<%s>' % valtype, typename, 1)
-self.typename = strip_versioned_namespace(self.typename)
+self.typename = strip_versioned_namespace(typename)
+self.typename = re.sub('^std::(experimental::|)(fundamentals_v\d::|)(.*)', r'std::\1\3<%s>' % valtype, self.typename, 1)
 if not self.typename.startswith('std::experimental'):
 val = val['_M_payload']
 self.val = val
@@ -1043,8 +1043,8 @@ class StdVariantPrinter(SingleObjContainerPrinter):
 
 def __init__(self, typename, val):
 alternatives = self._template_args(val)
-self.typename = "%s<%s>" % (typename, ', '.join([self._recognize(alt) for alt in alternatives]))
-self.typename = strip_versioned_namespace(self.typename)
+self.typename = strip_versioned_namespace(typename)
+self.typename = "%s<%s>" % (self.typename, ', '.join([self._recognize(alt) for alt in alternatives]))
 self.index = val['_M_index']
 if self.index >= len(alternatives):
 self.contained_type = None
@@ -1232,7 +1232,7 @@ class Printer(object):
 # Add a name using _GLIBCXX_BEGIN_NAMESPACE_CONTAINER.
 def add_container(self, base, name, function):
 self.add_version(base, name, function)
-self.add_version(base + '__cxx1998::', name, function)
+self.add_version(base, '__cxx1998::' + name, function)
 
 @staticmethod
 def get_basic_type(type):
@@ -1511,7 +1511,7 @@ def build_libstdcxx_dictionary ():
 libstdcxx_printer.add_container('std::', 'bitset', StdBitsetPrinter)
 libstdcxx_printer.add_container('std::', 'deque', StdDequePrinter)
 libstdcxx_printer.add_container('std::', 'list', StdListPrinter)
-libstdcxx_printer.add_container('std::__cxx11::', 'list', StdListPrinter)
+libstdcxx_printer.add_container('std::', '__cxx11::list', StdListPrinter)
 libstdcxx_printer.add_container('std::', 'map', StdMapPrinter)
 libstdcxx_printer.add_container('std::', 'multimap', StdMapPrinter)
 libstdcxx_printer.add_container('std::', 'multiset', StdSetPrinter)
@@ -1581,33 +1581,33 @@ def build_libstdcxx_dictionary ():
   StdForwardListPrinter)
 
 # Library Fundamentals TS components
-libstdcxx_printer.add_version('std::experimental::fundamentals_v1::',
-  'any', StdExpAnyPrinter)
-libstdcxx_printer.add_version('std::experimental::fundamentals_v1::',
-  'optional', StdExpOptionalPrinter)
-libstdcxx_printer.add_version('std::experimental::fundamentals_v1::',
-  'basic_string_view', StdExpStringViewPrinter)
+libstdcxx_printer.add_version('std::', 'experimental::fundamentals_v1::any',
+  StdExpAnyPrinter)
+libstdcxx_printer.add_version('std::', 'experimental::fundamentals_v1::optional',
+  StdExpOptionalPrinter)
+libstdcxx_printer.add_version('std::', 'experimental::fundamentals_v1::basic_string_view',
+  StdExpStringViewPrinter)
 # Filesystem TS components
-libstdcxx_printer.add_version('std::experimental::filesystem::v1::',
-  'path', StdExpPathPrinter)
-libstdcxx_printer.add_version('std::experimental::filesystem::v1::__cxx11::',
-  'path', StdExpPathPrinter)
-libstdcxx_printer.add_version('std::filesystem::',
-  'path', StdExpPathPrinter)
-libstdcxx_printer.add_version('std::filesystem::__cxx11::',
- 

Re: [PATCH v2] Add asan and ubsan support on NetBSD/amd64

2017-10-26 Thread Kamil Rytarowski
On 26.10.2017 21:43, Kamil Rytarowski wrote:
> Tested on:
> 
> $ uname -rms
> NetBSD 8.99.4 amd64
> 

I forgot to amend everything in this patch, will fix in -v3.






[PATCH v3] Add asan and ubsan support on NetBSD/amd64

2017-10-26 Thread Kamil Rytarowski
Tested on:

$ uname -rms
NetBSD 8.99.4 amd64

The NetBSD support has been developed directly in the compiler-rt upstream
source.

Testing against the LLVM+Clang+compiler-rt (2017-10-25 snapshot from HEAD):

$ make check-asan
Failing Tests (2):
AddressSanitizer-Unit :: 
./Asan-x86_64-calls-Test/AddressSanitizerInterface.ManyThreadsWithStatsStressTest
AddressSanitizer-Unit :: 
./Asan-x86_64-inline-Test/AddressSanitizerInterface.ManyThreadsWithStatsStressTest

  Expected Passes: 436
  Unsupported Tests  : 822
  Unexpected Failures: 2

$ make check-asan-dynamic
Failing Tests (2):
AddressSanitizer-Unit :: 
./Asan-x86_64-calls-Dynamic-Test/AddressSanitizerInterface.ManyThreadsWithStatsStressTest
AddressSanitizer-Unit :: 
./Asan-x86_64-inline-Dynamic-Test/AddressSanitizerInterface.ManyThreadsWithStatsStressTest

  Expected Passes: 372
  Unsupported Tests  : 822
  Unexpected Failures: 2

$ make check-ubsan
Failing Tests (44):
UBSan-ASan-i386 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
UBSan-ASan-x86_64 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
UBSan-MSan-x86_64 :: TestCases/Float/cast-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/add-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/div-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/div-zero.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/incdec-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/mul-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/negate-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/no-recover.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/shift.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/sub-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/suppressions.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/uadd-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/uincdec-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/umul-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/usub-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/bool.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/bounds.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/builtins.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/coverage-levels.cc
UBSan-MSan-x86_64 :: TestCases/Misc/deduplication.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/enum.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/log-path_test.cc
UBSan-MSan-x86_64 :: TestCases/Misc/missing_return.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/nonnull-arg.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/nonnull.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/nullability.c
UBSan-MSan-x86_64 :: TestCases/Misc/unreachable.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/vla.c
UBSan-MSan-x86_64 :: TestCases/Pointer/index-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Pointer/unsigned-index-expression.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/Function/function.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/PR33221.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/misaligned.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/null.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/vptr-corrupted-vtable-itanium.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/vptr-virtual-base-construction.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/vptr-virtual-base.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/vptr.cpp
UBSan-Standalone-i386 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
UBSan-Standalone-x86_64 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
UBSan-TSan-x86_64 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp

  Expected Passes: 187
  Expected Failures  : 1
  Unsupported Tests  : 26
  Unexpected Failures: 44

The UBSan test run includes ASan-, TSan- and MSan-based configurations, and perhaps more of them.
TSan/NetBSD is work-in-progress.
MSan/NetBSD is in early development on NetBSD.
The vptr-non-unique-typeinfo.cpp failures are caused by an environment issue in the
test-suite, as the NetBSD loader does not reliably support $ORIGIN.

The GCC patch has been tested in the context of pkgsrc in
pkgsrc-wip/gcc8snapshot (upstream snapshot 20171022).

$ uname -rms
NetBSD 8.99.4 amd64
$ cat u.c
int main(int argc, char **argv) {
  int k = 0x7fff;
  k += argc;
  return 0;
}
$ /usr/pkg/gcc8snapshot/bin/gcc -fsanitize=undefined u.c -o u
$ ./u
u.c:3:5: runtime error: signed integer overflow: 2147483647 + 1 cannot be 
represented in type 'int'
chieftec$ cat a.c
int main(int argc, char **argv)
{
char a[10];
argv[argc + 10] = 0;
return 0;
}
$ /usr/pkg/gcc8snapshot/bin/gcc -fsanitize=address a.c -o a
$ ./a
AddressSanitizer:DEADLYSIGNAL
=
==20441==ERROR: AddressSanitizer: SEGV on unknown address 0x0ff07fff7cf0 (pc 
0x00400a8f bp 0x7f7fe7f0 sp 0x7f7fe770 T0)
==20441==The signal is caused by a WRITE memory access.
#0 0x400a8e in main (/tmp/a+0x400a8e)
#1 0x40090a 

[PATCH v2] Add asan and ubsan support on NetBSD/amd64

2017-10-26 Thread Kamil Rytarowski
Tested on:

$ uname -rms
NetBSD 8.99.4 amd64

The NetBSD support has been developed directly in the compiler-rt upstream
source.

Testing against the LLVM+Clang+compiler-rt (2017-10-25 snapshot from HEAD):

$ make check-asan
Failing Tests (2):
AddressSanitizer-Unit :: 
./Asan-x86_64-calls-Test/AddressSanitizerInterface.ManyThreadsWithStatsStressTest
AddressSanitizer-Unit :: 
./Asan-x86_64-inline-Test/AddressSanitizerInterface.ManyThreadsWithStatsStressTest

  Expected Passes: 436
  Unsupported Tests  : 822
  Unexpected Failures: 2

$ make check-asan-dynamic
Failing Tests (2):
AddressSanitizer-Unit :: 
./Asan-x86_64-calls-Dynamic-Test/AddressSanitizerInterface.ManyThreadsWithStatsStressTest
AddressSanitizer-Unit :: 
./Asan-x86_64-inline-Dynamic-Test/AddressSanitizerInterface.ManyThreadsWithStatsStressTest

  Expected Passes: 372
  Unsupported Tests  : 822
  Unexpected Failures: 2

$ make check-ubsan
Failing Tests (44):
UBSan-ASan-i386 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
UBSan-ASan-x86_64 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
UBSan-MSan-x86_64 :: TestCases/Float/cast-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/add-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/div-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/div-zero.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/incdec-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/mul-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/negate-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/no-recover.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/shift.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/sub-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/suppressions.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/uadd-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/uincdec-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/umul-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Integer/usub-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/bool.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/bounds.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/builtins.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/coverage-levels.cc
UBSan-MSan-x86_64 :: TestCases/Misc/deduplication.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/enum.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/log-path_test.cc
UBSan-MSan-x86_64 :: TestCases/Misc/missing_return.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/nonnull-arg.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/nonnull.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/nullability.c
UBSan-MSan-x86_64 :: TestCases/Misc/unreachable.cpp
UBSan-MSan-x86_64 :: TestCases/Misc/vla.c
UBSan-MSan-x86_64 :: TestCases/Pointer/index-overflow.cpp
UBSan-MSan-x86_64 :: TestCases/Pointer/unsigned-index-expression.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/Function/function.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/PR33221.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/misaligned.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/null.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/vptr-corrupted-vtable-itanium.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/vptr-virtual-base-construction.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/vptr-virtual-base.cpp
UBSan-MSan-x86_64 :: TestCases/TypeCheck/vptr.cpp
UBSan-Standalone-i386 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
UBSan-Standalone-x86_64 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
UBSan-TSan-x86_64 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp

  Expected Passes: 187
  Expected Failures  : 1
  Unsupported Tests  : 26
  Unexpected Failures: 44

The UBSan test run includes ASan-, TSan- and MSan-based configurations, and perhaps more of them.
TSan/NetBSD is work-in-progress.
MSan/NetBSD is in early development on NetBSD.
The vptr-non-unique-typeinfo.cpp failures are caused by an environment issue in the
test-suite, as the NetBSD loader does not reliably support $ORIGIN.

The GCC patch has been tested in the context of pkgsrc in
pkgsrc-wip/gcc8snapshot (upstream snapshot 20171022).

$ uname -rms
NetBSD 8.99.4 amd64
$ cat u.c
int main(int argc, char **argv) {
  int k = 0x7fff;
  k += argc;
  return 0;
}
$ /usr/pkg/gcc8snapshot/bin/gcc -fsanitize=undefined u.c -o u
$ ./u
u.c:3:5: runtime error: signed integer overflow: 2147483647 + 1 cannot be 
represented in type 'int'
chieftec$ cat a.c
int main(int argc, char **argv)
{
char a[10];
argv[argc + 10] = 0;
return 0;
}
$ /usr/pkg/gcc8snapshot/bin/gcc -fsanitize=address a.c -o a
$ ./a
AddressSanitizer:DEADLYSIGNAL
=
==20441==ERROR: AddressSanitizer: SEGV on unknown address 0x0ff07fff7cf0 (pc 
0x00400a8f bp 0x7f7fe7f0 sp 0x7f7fe770 T0)
==20441==The signal is caused by a WRITE memory access.
#0 0x400a8e in main (/tmp/a+0x400a8e)
#1 0x40090a 

Re: [PATCH] Change default optimization level to -Og

2017-10-26 Thread Jakub Jelinek
On Thu, Oct 26, 2017 at 05:12:40PM +, Wilco Dijkstra wrote:
> GCC's default optimization level is -O0.  Unfortunately unlike other 
> compilers,
> GCC generates extremely inefficient code with -O0.  It is almost unusable for
> low-level debugging or manual inspection of generated code.  So a -O option is
> always required for compilation.  -Og not only allows for fast compilation, 
> but
> also produces code that is efficient, readable as well as debuggable.
> Therefore -Og makes for a much better default setting.
> 
> Any comments?
> 
> 2017-10-26  Wilco Dijkstra  
> 
>   * opts.c (default_options_optimization): Set default to -Og.
> 
> doc/
>   * invoke.texi (-O0) Remove default mention.
>   (-Og): Add mention of default setting.

This would only severely confuse users.  -Og has lots of unresolved issues
for debugging experience, and changing the default this way is IMHO
extremely undesirable.

Jakub


Re: [PATCH] Change default optimization level to -Og

2017-10-26 Thread Eric Botcazou
> GCC's default optimization level is -O0.  Unfortunately unlike other
> compilers, GCC generates extremely inefficient code with -O0.

Agreed (but this can probably be worked on).

> It is almost unusable for low-level debugging or manual inspection of
> generated code. 

Likewise.

> So a -O option is always required for compilation.

For compiler hackers only though.  For normal users, -O0 is still the only 
level where you're guaranteed to be able to do full source debugging.

-- 
Eric Botcazou


Re: Rename cxx1998 into normal

2017-10-26 Thread François Dumont

On 26/10/2017 19:09, Jonathan Wakely wrote:

On 26/10/17 18:55 +0200, Daniel Krügler wrote:

2017-10-26 7:51 GMT+02:00 François Dumont :

Hi

    We once talked about renaming the __cxx1998 namespace into something
less C++98-biased. Here is the patch to do so.

    Ok with the new name ?


IMO the name should somehow still contain "cxx" somewhere, otherwise
this could easily cause a semantic ambiguity in situations where the
term "normal" has a specific meaning.

What about "__cxxdef[ault]" ?


I'm not sure we need to change it at all. The name is anachronistic,
but harmless.



As part of this message:

https://gcc.gnu.org/ml/libstdc++/2016-09/msg00076.html

I thought that you also wanted to rename it. But I simply misunderstood; 
it is fine with me to keep it this way.


François




Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-26 Thread Jakub Jelinek
On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
>  wrote:
> > Richard Biener  writes:
> >> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
> >>  wrote:
> >>> This patch adds a POD version of fixed_size_mode.  The only current use
> >>> is for storing the __builtin_apply and __builtin_result register modes,
> >>> which were made fixed_size_modes by the previous patch.
> >>
> >> Bah - can we update our host compiler to C++11/14 please ...?
> >> (maybe requiring that build with GCC 4.8 as host compiler works,
> >> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
> >
> > That'd be great :-)  It would avoid all the poly_int_pod stuff too,
> > and allow some clean-up of wide-int.h.
> 
> Can you figure what oldest GCC release supports the C++11/14 POD handling
> that would be required?

I think it is too early for that; we aren't LLVM or Rust, which don't really
care about what build requirements they impose on users.

Jakub


Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-26 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>>>  wrote:
 This patch adds a POD version of fixed_size_mode.  The only current use
 is for storing the __builtin_apply and __builtin_result register modes,
 which were made fixed_size_modes by the previous patch.
>>>
>>> Bah - can we update our host compiler to C++11/14 please ...?
>>> (maybe requiring that build with GCC 4.8 as host compiler works,
>>> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>>
>> That'd be great :-)  It would avoid all the poly_int_pod stuff too,
>> and allow some clean-up of wide-int.h.
>
> Can you figure what oldest GCC release supports the C++11/14 POD handling
> that would be required?

Looks like GCC 4.7, which was also the first to support -std=c++11
as an option.  I could bootstrap with that (after s/std=gnu..98/std=c++11/
in configure) without all the POD types.  It also supports "= default"
and template using, which would get rid of some wide-int.h ugliness.

Being able to construct poly_int::coeffs directly might also allow
some optimisations, and should help avoid the POLY_SET_COEFF bug that
Martin found, but I haven't looked at that yet.
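
(As a concrete reminder of the two C++11 features mentioned, a small sketch
with made-up names rather than the real wide-int.h ones:)

  // "= default" keeps a class trivially default-constructible even when
  // other constructors exist:
  struct coeff_pod
  {
    coeff_pod () = default;
    coeff_pod (long v) : val (v) {}
    long val;
  };

  // An alias template ("template using") replaces the C++98 trick of a
  // nested typedef inside a helper struct:
  template <typename T, unsigned int N>
  struct storage { T coeffs[N]; };

  template <unsigned int N>
  using int_storage = storage<long, N>;

  int_storage<2> s;   // instead of something like storage_helper<2>::type s;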

Thanks,
Richard


Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-26 Thread Eric Botcazou
> Can you figure what oldest GCC release supports the C++11/14 POD handling
> that would be required?

GCC needs to be buildable by other compilers than itself though.

-- 
Eric Botcazou


Re: [PATCH] Include missing source file for sanitizers on NetBSD

2017-10-26 Thread Jakub Jelinek
On Thu, Oct 26, 2017 at 09:24:58PM +0200, Kamil Rytarowski wrote:
> On 26.10.2017 21:19, Jakub Jelinek wrote:
> > On Thu, Oct 26, 2017 at 09:02:25PM +0200, Kamil Rytarowski wrote:
> >> 2017-10-26  Kamil Rytarowski  
> >>
> >>* sanitizer_common/Makefile.am (sanitizer_common_files): Add
> >>sanitizer_platform_limits_netbsd.cc.
> >>* sanitizer_common/Makefile.in: Regenerated.
> > 
> > Why is this needed?
> > libsanitizer/configure.tgt will set UNSUPPORTED=1 for NetBSD.
> > So, the rest of the libsanitizer port to NetBSD in GCC is missing too
> > and should be submitted together with this change.
> > 
> 
> There are missing symbols without sanitizer_platform_limits_netbsd.cc.
> 
> I've planned to submit configure.tgt change separately. Should I squash
> both changes into one commit?

Yes.  The above change makes no sense without the configure.tgt one.
Also, the gcc-patches submission should say where and how it has been
tested.

Jakub


Re: [PATCH] Include missing source file for sanitizers on NetBSD

2017-10-26 Thread Kamil Rytarowski
On 26.10.2017 21:19, Jakub Jelinek wrote:
> On Thu, Oct 26, 2017 at 09:02:25PM +0200, Kamil Rytarowski wrote:
>> 2017-10-26  Kamil Rytarowski  
>>
>>  * sanitizer_common/Makefile.am (sanitizer_common_files): Add
>>  sanitizer_platform_limits_netbsd.cc.
>>  * sanitizer_common/Makefile.in: Regenerated.
> 
> Why is this needed?
> libsanitizer/configure.tgt will set UNSUPPORTED=1 for NetBSD.
> So, the rest of the libsanitizer port to NetBSD in GCC is missing too
> and should be submitted together with this change.
> 
>   Jakub
> 

There are missing symbols without sanitizer_platform_limits_netbsd.cc.

I've planned to submit configure.tgt change separately. Should I squash
both changes into one commit?





Re: [Patch, fortran] PR81758 - [7/8 Regression] [OOP] Broken vtab

2017-10-26 Thread Andre Vehreschild
Hi Paul,

Without having tested the patch, it looks reasonable to me. So ok from my side.

- Andre

Am 26. Oktober 2017 21:12:45 MESZ schrieb Paul Richard Thomas 
:
>Dear All,
>
>Thanks to Dimitry Liakh for both reporting the problem and doing a lot
>of the diagnostic work. Once the offending line in a very complicated
>code was located, the fix was trivial. Generating a reduced testcase
>took rather longer :-)
>
>The comment in the testcase tells the story. The fix is a one-liner
>that follows immediately from the explanation.
>
>Bootstrapped and regtested on FC23/x86_64 - OK for trunk and 7-branch.
>
>Cheers
>
>Paul
>
>2017-10-26  Paul Thomas  
>
>PR fortran/81758
>* trans-expr.c (trans_class_vptr_len_assignment): 'vptr_expr'
>must only be set if the right hand side expression is of class
>type.
>
>2017-10-26  Paul Thomas  
>
>PR fortran/81758
>* gfortran.dg/class_63.f90: New test.

-- 
Andre Vehreschild * Kreuzherrenstr. 8 * 52062 Aachen
Tel.: +49 241 929 10 18 * ve...@gmx.de


Re: [PATCH] Include missing source file for sanitizers on NetBSD

2017-10-26 Thread Jakub Jelinek
On Thu, Oct 26, 2017 at 09:02:25PM +0200, Kamil Rytarowski wrote:
> 2017-10-26  Kamil Rytarowski  
> 
>   * sanitizer_common/Makefile.am (sanitizer_common_files): Add
>   sanitizer_platform_limits_netbsd.cc.
>   * sanitizer_common/Makefile.in: Regenerated.

Why is this needed?
libsanitizer/configure.tgt will set UNSUPPORTED=1 for NetBSD.
So, the rest of the libsanitizer port to NetBSD in GCC is missing too
and should be submitted together with this change.

Jakub


Re: [006/nnn] poly_int: tree constants

2017-10-26 Thread Pedro Alves
On 10/26/2017 07:54 PM, Martin Sebor wrote:

> (...) in the general case, it is preferable to
> declare each variable at the point where its initial value is known
> (or can be computed) and initialize it on its declaration.

With that I fully agree, except it's not always possible or
natural.  The issue at hand usually turns up with
conditional initialization, like:

void foo ()
{
  int t;
  if (something)
t = 1;
  else if (something_else)
t = 2;
  if (t == 1)
bar (); 
}

That's a simple example of course, but more complicated
conditionals aren't so easy to grok and spot the bug.

In the case above, I'd much prefer if the compiler tells me
I missed initializing 't' than initializing it to 0 "just
in case".

Thanks,
Pedro Alves



[PATCH] Include missing source file for sanitizers on NetBSD

2017-10-26 Thread Kamil Rytarowski
2017-10-26  Kamil Rytarowski  

* sanitizer_common/Makefile.am (sanitizer_common_files): Add
sanitizer_platform_limits_netbsd.cc.
* sanitizer_common/Makefile.in: Regenerated.
---
 libsanitizer/ChangeLog| 6 ++
 libsanitizer/sanitizer_common/Makefile.am | 2 +-
 libsanitizer/sanitizer_common/Makefile.in | 5 -
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/libsanitizer/ChangeLog b/libsanitizer/ChangeLog
index 63e71317cbf..c9863c281d1 100644
--- a/libsanitizer/ChangeLog
+++ b/libsanitizer/ChangeLog
@@ -1,3 +1,9 @@
+2017-10-26  Kamil Rytarowski  
+
+   * sanitizer_common/Makefile.am (sanitizer_common_files): Add
+   sanitizer_platform_limits_netbsd.cc.
+   * sanitizer_common/Makefile.in: Regenerated.
+
 2017-10-20  Jakub Jelinek  
 
PR sanitizer/82595
diff --git a/libsanitizer/sanitizer_common/Makefile.am 
b/libsanitizer/sanitizer_common/Makefile.am
index adaab4cee54..70563eded36 100644
--- a/libsanitizer/sanitizer_common/Makefile.am
+++ b/libsanitizer/sanitizer_common/Makefile.am
@@ -41,6 +41,7 @@ sanitizer_common_files = \
sanitizer_persistent_allocator.cc \
sanitizer_platform_limits_linux.cc \
sanitizer_platform_limits_posix.cc \
+   sanitizer_platform_limits_netbsd.cc \
sanitizer_posix.cc \
sanitizer_posix_libcdep.cc \
sanitizer_printf.cc \
@@ -114,4 +115,3 @@ AM_MAKEFLAGS = \
 MAKEOVERRIDES=
 
 ## 
-
diff --git a/libsanitizer/sanitizer_common/Makefile.in 
b/libsanitizer/sanitizer_common/Makefile.in
index b2acc5caf56..f05c181f0b1 100644
--- a/libsanitizer/sanitizer_common/Makefile.in
+++ b/libsanitizer/sanitizer_common/Makefile.in
@@ -91,7 +91,8 @@ am__objects_1 = sancov_flags.lo sanitizer_allocator.lo \
sanitizer_mac.lo sanitizer_mac_libcdep.lo \
sanitizer_persistent_allocator.lo \
sanitizer_platform_limits_linux.lo \
-   sanitizer_platform_limits_posix.lo sanitizer_posix.lo \
+   sanitizer_platform_limits_posix.lo \
+   sanitizer_platform_limits_netbsd.lo sanitizer_posix.lo \
sanitizer_posix_libcdep.lo sanitizer_printf.lo \
sanitizer_procmaps_common.lo sanitizer_procmaps_freebsd.lo \
sanitizer_procmaps_linux.lo sanitizer_procmaps_mac.lo \
@@ -323,6 +324,7 @@ sanitizer_common_files = \
sanitizer_persistent_allocator.cc \
sanitizer_platform_limits_linux.cc \
sanitizer_platform_limits_posix.cc \
+   sanitizer_platform_limits_netbsd.cc \
sanitizer_posix.cc \
sanitizer_posix_libcdep.cc \
sanitizer_printf.cc \
@@ -468,6 +470,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/sanitizer_mac_libcdep.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/sanitizer_persistent_allocator.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/sanitizer_platform_limits_linux.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/sanitizer_platform_limits_netbsd.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/sanitizer_platform_limits_posix.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sanitizer_posix.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/sanitizer_posix_libcdep.Plo@am__quote@
-- 
2.14.2



[Patch, fortran] PR81758 - [7/8 Regression] [OOP] Broken vtab

2017-10-26 Thread Paul Richard Thomas
Dear All,

Thanks to Dimitry Liakh for both reporting the problem and doing a lot
of the diagnostic work. Once the offending line in a very complicated
code was located, the fix was trivial. Generating a reduced testcase
took rather longer :-)

The comment in the testcase tells the story. The fix is a one-liner
that follows immediately from the explanation.

Bootstrapped and regtested on FC23/x86_64 - OK for trunk and 7-branch.

Cheers

Paul

2017-10-26  Paul Thomas  

PR fortran/81758
* trans-expr.c (trans_class_vptr_len_assignment): 'vptr_expr'
must only be set if the right hand side expression is of class
type.

2017-10-26  Paul Thomas  

PR fortran/81758
* gfortran.dg/class_63.f90: New test.
Index: gcc/fortran/trans-expr.c
===
*** gcc/fortran/trans-expr.c(revision 253976)
--- gcc/fortran/trans-expr.c(working copy)
*** trans_class_vptr_len_assignment (stmtblo
*** 8051,8057 
  {
/* Get the vptr from the rhs expression only, when it is variable.
 Functions are expected to be assigned to a temporary beforehand.  */
!   vptr_expr = re->expr_type == EXPR_VARIABLE
  ? gfc_find_and_cut_at_last_class_ref (re)
  : NULL;
if (vptr_expr != NULL && vptr_expr->ts.type == BT_CLASS)
--- 8051,8057 
  {
/* Get the vptr from the rhs expression only, when it is variable.
 Functions are expected to be assigned to a temporary beforehand.  */
!   vptr_expr = (re->expr_type == EXPR_VARIABLE && re->ts.type == BT_CLASS)
  ? gfc_find_and_cut_at_last_class_ref (re)
  : NULL;
if (vptr_expr != NULL && vptr_expr->ts.type == BT_CLASS)
Index: gcc/testsuite/gfortran.dg/class_63.f90
===
*** gcc/testsuite/gfortran.dg/class_63.f90  (nonexistent)
--- gcc/testsuite/gfortran.dg/class_63.f90  (working copy)
***
*** 0 
--- 1,80 
+ ! { dg-do run }
+ !
+ ! Tests the fix for PR81758, in which the vpointer for 'ptr' in
+ ! function 'pointer_value' would be set to the vtable of the component
+ ! 'container' rather than that of the component 'vec_elem'. In this test
+ ! case it is ensured that there is a single typebound procedure for both
+ ! types, so that different values are returned. In the original problem
+ ! completely different procedures were involved so that a segfault resulted.
+ !
+ ! Reduced from the original code of Dimitry Liakh   by
+ !   Paul Thomas  
+ !
+ module types
+   type, public:: gfc_container_t
+   contains
+ procedure, public:: get_value => ContTypeGetValue
+   end type gfc_container_t
+ 
+   !Element of a container:
+   type, public:: gfc_cont_elem_t
+ integer :: value_p
+   contains
+ procedure, public:: get_value => ContElemGetValue
+   end type gfc_cont_elem_t
+ 
+   !Vector element:
+   type, extends(gfc_cont_elem_t), public:: vector_elem_t
+   end type vector_elem_t
+ 
+   !Vector:
+   type, extends(gfc_container_t), public:: vector_t
+ type(vector_elem_t), allocatable, private :: vec_elem
+   end type vector_t
+ 
+   type, public :: vector_iter_t
+ class(vector_t), pointer, private :: container => NULL()
+   contains
+ procedure, public:: get_vector_value => vector_Value
+ procedure, public:: get_pointer_value => pointer_value
+   end type
+ 
+ contains
+   integer function ContElemGetValue (this)
+ class(gfc_cont_elem_t) :: this
+ ContElemGetValue = this%value_p
+   end function
+ 
+   integer function ContTypeGetValue (this)
+ class(gfc_container_t) :: this
+ ContTypeGetValue = 0
+   end function
+ 
+   integer function vector_Value (this)
+ class(vector_iter_t) :: this
+ vector_value = this%container%vec_elem%get_value()
+   end function
+ 
+   integer function pointer_value (this)
+ class(vector_iter_t), target :: this
+ class(gfc_cont_elem_t), pointer :: ptr
+ ptr => this%container%vec_elem
+ pointer_value = ptr%get_value()
+   end function
+ 
+   subroutine factory (arg)
+ class (vector_iter_t), pointer :: arg
+ allocate (vector_iter_t :: arg)
+ allocate (vector_t :: arg%container)
+ allocate (arg%container%vec_elem)
+ arg%container%vec_elem%value_p = 99
+   end subroutine
+ end module
+ 
+   use types
+   class (vector_iter_t), pointer :: x
+ 
+   call factory (x)
+   if (x%get_vector_value() .ne. 99) call abort
+   if (x%get_pointer_value() .ne. 99) call abort
+ end


Re: [006/nnn] poly_int: tree constants

2017-10-26 Thread Martin Sebor

On 10/26/2017 12:05 PM, Pedro Alves wrote:

On 10/26/2017 05:37 PM, Martin Sebor wrote:


I agree that the latter use case is more common in GCC, but I don't
see it as a good thing.  GCC was written in C and most code still
uses now outdated C practices such as declaring variables at the top
of a (often long) function, and usually without initializing them.
It's been established that it's far better to declare variables with
the smallest scope, and to initialize them on declaration.  Compilers
are smart enough these days to eliminate redundant initialization or
assignments.


I don't agree that that's established.  FWIW, I'm in the
"prefer the -Wuninitialized warnings" camp.  I've been looking
forward to all the VRP and threader improvements, hoping that that
warning (and -Wmaybe-uninitialized...) will improve along with them.


You're by far not alone, but it's a shrinking camp as
the majority have come to appreciate the benefits of the practice.
You can see it reflected in most popular coding standards, including
the CERT C++ Secure Coding Standard, Google C++ Style Guide, Sutter
and Alexandrescu's C++ Coding Standards, Scott Meyer's books, and
so on.  Just like with every rule, there are exceptions, but there
should be no doubt that in the general case, it is preferable to
declare each variable at the point where its initial value is known
(or can be computed) and initialize it on its declaration.


My comment is not motivated by convenience.  What I'm concerned
about is that defining a default ctor to be a no-op defeats the
zero-initialization semantics most users expect of T().


This sounds like it's a problem because GCC is written in C++98.

You can get the semantics you want in C++11 by defining
the constructor with "= default;" :

 struct T
 {
   T(int); // some other constructor forcing me to
   // add a default constructor.

   T() = default; // give me default construction using
  // default initialization.
   int i;
 };

And now 'T t;' leaves T::i default initialized, i.e.,
uninitialized, while T() value-initializes T::i, i.e.,
initializes it to zero.

So if that's a concern, maybe you could use "= default"
conditionally depending on #if __cplusplus >= C++11, so that
you'd get it for stages after stage1.

Or just start requiring C++11 already. :-)


That would make sense to me and, from the sound of some of his
comments, it is also Richard's preference.

Martin


Re: [PATCH] expand -fdebug-prefix-map documentation

2017-10-26 Thread Sandra Loosemore

On 10/26/2017 11:47 AM, Gerald Pfeifer wrote:

On Wed, 25 Oct 2017, Jim Wilson wrote:

The current documentation doesn't explain what the option is for, or
how one might use it.  The attached patch expands the documentation a
bit to try to explain this.



OK?


Thank you for fleshing this out, Jim!

This looks fine to me (modulo Sandra's note).  Just a question: would
we refer to GDB instead of gdb here?  It feels a little in between to
me, whether we are referring to the tool or the actual binary.  I'm
sure Sandra will have guidance for us. ;-)


Hmmm, yes.  "GDB" is the preferred spelling elsewhere in the manual.

-Sandra



Re: [Diagnostic Patch] don't print column zero

2017-10-26 Thread Nathan Sidwell

On 10/26/2017 02:12 PM, Eric Gallager wrote:

On 10/26/17, Nathan Sidwell  wrote:

On 10/26/2017 10:34 AM, David Malcolm wrote:



Possibly a silly question, but is it OK to have a formatted string
call in which some of the arguments aren't consumed? (here "col" is only
consumed for the true case, which consumes 2 arguments; it's not consumed
for the false case).


Yes.


I think I remember clang disagreeing; I remember it printing warnings
from -Wformat-extra-args in a similar situation in gnulib's
error_at_line module


C++ 21.10.1 defers to C.  C99 7.15.1 has no words saying va_arg must be 
applied to exactly all arguments captured by the va_list object (and I'm 
pretty sure scanf can bail early).


Now, it might be sensible to warn about:
  printf ("", 5);
because printf's semantics are known.  But that's not ill-formed, just 
inefficient.  And in this case we're doing the equivalent of:

  printf (not-compile-time-constant, 5);
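
For completeness, a minimal self-contained sketch of the pattern being
discussed (values are made up; the trailing argument is simply ignored when
the shorter format string is selected, which is well-defined):

  #include <stdio.h>

  int
  main (void)
  {
    char result[32];
    int line = 42, col = 0;
    /* Only "line" is consumed when col == 0; passing "col" anyway is
       harmless, just mildly wasteful.  */
    int len = snprintf (result, sizeof (result),
                        col ? ":%d:%d" : ":%d", line, col);
    return len < 0;
  }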

nathan

--
Nathan Sidwell


Re: [patch][i386, AVX] Adding missing CMP* intrinsics

2017-10-26 Thread Kirill Yukhin
Hello Olga, Sebastian,
On 20 Oct 08:36, Peryt, Sebastian wrote:
> Hi,
> 
> This patch, written by Olga Makhotina, adds the missing intrinsics listed below:
> _mm512_[mask_]cmpeq_[pd|ps]_mask
> _mm512_[mask_]cmple_[pd|ps]_mask
> _mm512_[mask_]cmplt_[pd|ps]_mask
> _mm512_[mask_]cmpneq_[pd|ps]_mask
> _mm512_[mask_]cmpnle_[pd|ps]_mask
> _mm512_[mask_]cmpnlt_[pd|ps]_mask
> _mm512_[mask_]cmpord_[pd|ps]_mask
> _mm512_[mask_]cmpunord_[pd|ps]_mask
> 
> Is it ok for trunk?
Your patch is OK for trunk. I've checked it in.

--
Thanks, K

> Thanks,
> Sebastian
> 
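
For reference, a minimal usage sketch of one of the newly added intrinsics
(not part of the patch itself; assumes an AVX-512F target and compilation
with -mavx512f):

  #include <immintrin.h>

  /* Each set bit in the returned mask marks a lane where a[i] == b[i].  */
  __mmask8
  equal_lanes (__m512d a, __m512d b)
  {
    return _mm512_cmpeq_pd_mask (a, b);
  }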




Re: [PATCH] Change default optimization level to -Og

2017-10-26 Thread Eric Gallager
On 10/26/17, Wilco Dijkstra  wrote:
> GCC's default optimization level is -O0.  Unfortunately unlike other
> compilers,
> GCC generates extremely inefficient code with -O0.  It is almost unusable
> for
> low-level debugging or manual inspection of generated code.  So a -O option
> is
> always required for compilation.  -Og not only allows for fast compilation,
> but
> also produces code that is efficient, readable as well as debuggable.
> Therefore -Og makes for a much better default setting.
>
> Any comments?

There are a number of bugs with -Og that I'd want to see fixed before
making it the default; I'll follow this message up once I find them
all.

>
> 2017-10-26  Wilco Dijkstra  
>
>   * opts.c (default_options_optimization): Set default to -Og.
>
> doc/
>   * invoke.texi (-O0) Remove default mention.
>   (-Og): Add mention of default setting.
>
> --
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index
> 3328a3b5fafa6a98007eff52d2a26af520de9128..74c33ea35b9f320b419a3417e6007d2391536f1b
> 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -7343,7 +7343,7 @@ by @option{-O2} and also turns on the following
> optimization flags:
>  @item -O0
>  @opindex O0
>  Reduce compilation time and make debugging produce the expected
> -results.  This is the default.
> +results.
>
>  @item -Os
>  @opindex Os
> @@ -7371,7 +7371,7 @@ Optimize debugging experience.  @option{-Og} enables
> optimizations
>  that do not interfere with debugging. It should be the optimization
>  level of choice for the standard edit-compile-debug cycle, offering
>  a reasonable level of optimization while maintaining fast compilation
> -and a good debugging experience.
> +and a good debugging experience.  This is the default.
>  @end table
>
>  If you use multiple @option{-O} options, with or without level numbers,
> diff --git a/gcc/opts.c b/gcc/opts.c
> index
> dfad955e220870a3250198640f3790c804b191e0..74511215309f11445685db4894be2ab6881695d3
> 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -565,6 +565,12 @@ default_options_optimization (struct gcc_options
> *opts,
>int opt2;
>bool openacc_mode = false;
>
> +  /* Set the default optimization to -Og.  */
> +  opts->x_optimize_size = 0;
> +  opts->x_optimize = 1;
> +  opts->x_optimize_fast = 0;
> +  opts->x_optimize_debug = 1;
> +
>/* Scan to see what optimization level has been specified.  That will
>   determine the default value of many flags.  */
>for (i = 1; i < decoded_options_count; i++)
>
>


Re: [Diagnostic Patch] don't print column zero

2017-10-26 Thread Eric Gallager
On 10/26/17, Nathan Sidwell  wrote:
> On 10/26/2017 10:34 AM, David Malcolm wrote:
>> [CCing Rainer and Mike for the gcc-dg.exp part]
>
>> Alternate idea: could show_column become a tri-state:
>>* default: show non-zero columns
>>* never: never show columns
>>* always: always show a column, printing 0 for the no-column case
>> and then use "always" in our testsuite
>
> One of the things this patch shows up is the number of places where
> we're default accepting a zero column.  IMHO it is best to explicitly
> mark such tests.
>
>>> +  size_t l = sprintf (result, col ? ":%d:%d" : ":%d", line, col);
>>
>> Possibly a silly question, but is it OK to have a formatted string
>> call in which some of the arguments aren't consumed? (here "col" is only
>> consumed for the true case, which consumes 2 arguments; it's not consumed
>> for the false case).
>
> Yes.

I think I remember clang disagreeing; I remember it printing warnings
from -Wformat-extra-args in a similar situation in gnulib's
error_at_line module

>
>>> +  gcc_checking_assert (l + 1 < sizeof (result));
>>
>> Would snprintf be safer?
>
> I guess, but the assert's still needed.
>
>> Please create a selftest for the function, covering these cases:
>>
>> * line == 0
>> * line > 0 and col == 0
>> * line > 0 and col > 0 (checking output for these cases)
>> * line == INT_MAX and col == INT_MAX (without checking output, just to
>> tickle the assert)
>> * line == INT_MIN and col == INT_MIN (likewise)
>
> Ok, I'll investigate this newfangled self-testing framework :)
>
>> There are some testcases where we deliberately don't have a *line*
>> number; what happens to these?
>
> Those don't change.  The dg-harness already does NOT expect a column
> when lineno=0.
>
>> My Tcl skills aren't great, so hopefully someone else can review this;
>> CCing Rainer and Mike.
>>
>> Also, is the proposed syntax for "no columns" OK?  (note the tristate
>> idea above)
>
> I'm not wedded to '-:', but as mentioned above, I think the tests should
> be explicit about whether a column is expected or not (and the default
> needs to be 'expect column', because of history)
>
> thanks for your comments.
>
> nathan
>
> --
> Nathan Sidwell
>


Re: [RFA][PATCH] Provide a class interface into substitute_and_fold.

2017-10-26 Thread Richard Biener
On October 26, 2017 6:50:15 PM GMT+02:00, Jeff Law  wrote:
>On 10/26/2017 03:24 AM, Richard Biener wrote:
>> On Tue, Oct 24, 2017 at 8:44 PM, Jeff Law  wrote:
>>> This is similar to the introduction of the ssa_propagate_engine, but
>for
>>> the substitution/replacements bits.
>>>
>>> In a couple places the pass specific virtual functions are just
>wrappers
>>> around existing functions.  A good example of this is
>>> ccp_folder::get_value.  Many other routines in tree-ssa-ccp.c want
>to
>>> use get_constant_value.  Some may be convertable to use the class
>>> instance, but I haven't looked closely.
>>>
>>> Another example is vrp_folder::get_value.  In this case we're
>wrapping
>>> op_with_constant_singleton_value.  In a later patch that moves into
>the
>>> to-be-introduced vr_values class so we'll delegate to that class
>rather
>>> than wrap.
>>>
>>> FWIW I did look at having a single class for the propagation engine
>and
>>> the substitution engine.  That turned out to be a bit problematical
>due
>>> to the calls into the substitution engine from the evrp bits which
>don't
>>> use the propagation engine at all.  Given propagation and
>substitution
>>> are distinct concepts I ultimately decided the cleanest path forward
>was
>>> to keep the two classes separate.
>>>
>>> Bootstrapped and regression tested on x86_64.  OK for the trunk?
>> 
>> So what I don't understand in this 2 part series is why you put
>> substitute-and-fold into a different class.
>Good question.  They're in different classes because they can and are
>used independently.
>
>For example, tree-complex uses the propagation engine, but not the
>substitution engine.   EVRP uses the substitution engine, but not the
>propagation engine.  The standard VRP algorithm uses both engines, but
>other than shared data (vr_values), they are independent.  CCP and
>copy-prop are similar to VRP.  Essentially one is a producer, the other
>a consumer.
>
>It might be possible to smash them together, but I'm not sure if that's
>wise or not.  I do suspect that smashing them together would be easier
>once all the other work is done if we were to make that choice.  But
>composition, multiple inheritance or just passing around the class
>instance may be better.  I think that's a TBD.
>
>
>> 
>> This makes it difficult for users to inherit and put the lattice in
>> the deriving class as we have the visit routines which will update
>> the lattice and the get_value hook which queries it.
>Yes.  The key issue is the propagation step produces vr_values and the
>substitution step consumes vr_values.
>
>For VRP the way I solve this is to have a vr_values class in the
>derived
>propagation engine class as well as the derived substitution engine
>class.  When we're done with propagation we move the class instance
>from
>the propagation engine to the substitution engine.
>
>EVRP works similarly except the vr_values starts in the evrp_dom_walker
>class, then moves to its substitution engine.
>
>There's a bit of cleanup to do there in terms of implementation.  But
>that's the basic model that I'm using right now.  It should be fairly
>easy to move to a unioned class or multiple inheritance if we so
>desired.  It shouldn't affect most of what I'm doing now around
>encapsulating vr_values.
>
>> 
>> So from maintaining the state for the users using a single
>> class whould be more appropriate.  Of course it seems like
>> substitute-and-fold can be used without using the SSA
>> propagator itself and the SSA propagator can be used
>> without the substitute and fold engine.
>Right.  THey can and are used independently which is what led to having
>independent classes.
>
>
>> 
>> IIRC we decided against using multiple inheritance?  Which
>> means a user would put the lattice in the SSA propagation
>> engine derived class and do the inheriting via composition
>> as member in the substitute_and_fold engine?
>Right, we have decided against using multiple inheritance.   So rather
>than using  multiple inheritance I pass the vr_values object.  So in my
>development tree I have this:
>
>
>class vrp_prop : public ssa_propagation_engine
>{
> public:
>enum ssa_prop_result visit_stmt (gimple *, edge *, tree *) FINAL
>OVERRIDE;
>  enum ssa_prop_result visit_phi (gphi *) FINAL OVERRIDE;
>
>  /* XXX Drop the indirection through the pointer, not needed.  */
>  class vr_values *vr_values;
>};
>
>
>class vrp_folder : public substitute_and_fold_engine
>{
> public:
>  tree get_value (tree) FINAL OVERRIDE;
>  bool fold_stmt (gimple_stmt_iterator *) FINAL OVERRIDE;
>  class vr_values *vr_values;
>};
>
>In vrp_finalize:
>  class vrp_folder vrp_folder;
>  vrp_folder.vr_values = vr_values;
>  vrp_folder.substitute_and_fold ();
>
>
>I'm in the process of cleaning this up -- in particular there'll be a
>ctor in vrp_folder which will require passing in a vr_values and we'll
>be dropping some indirections as well.
>
>I just went through his exact cleanup yesterday with the 

Re: [006/nnn] poly_int: tree constants

2017-10-26 Thread Pedro Alves
On 10/26/2017 05:37 PM, Martin Sebor wrote:

> I agree that the latter use case is more common in GCC, but I don't
> see it as a good thing.  GCC was written in C and most code still
> uses now outdated C practices such as declaring variables at the top
> of a (often long) function, and usually without initializing them.
> It's been established that it's far better to declare variables with
> the smallest scope, and to initialize them on declaration.  Compilers
> are smart enough these days to eliminate redundant initialization or
> assignments.

I don't agree that that's established.  FWIW, I'm in the
"prefer the -Wuninitialized warnings" camp.  I've been looking
forward to all the VRP and threader improvements, hoping that that
warning (and -Wmaybe-uninitialized...) will improve along with them.

> My comment is not motivated by convenience.  What I'm concerned
> about is that defining a default ctor to be a no-op defeats the
> zero-initialization semantics most users expect of T().

This sounds like it's a problem because GCC is written in C++98.

You can get the semantics you want in C++11 by defining
the constructor with "= default;" :

 struct T
 {
   T(int); // some other constructor forcing me to 
   // add a default constructor.

   T() = default; // give me default construction using
  // default initialization.
   int i;
 };

And now 'T t;' leaves T::i default initialized, i.e.,
uninitialized, while T() value-initializes T::i, i.e.,
initializes it to zero.

So if that's a concern, maybe you could use "= default" 
conditionally depending on #if __cplusplus >= C++11, so that
you'd get it for stages after stage1.

Or just start requiring C++11 already. :-)

Thanks,
Pedro Alves



Re: [006/nnn] poly_int: tree constants

2017-10-26 Thread Richard Sandiford
Martin Sebor  writes:
>>  /* The tree and const_tree overload templates.   */
>>  namespace wi
>>  {
>> +  class unextended_tree
>> +  {
>> +  private:
>> +const_tree m_t;
>> +
>> +  public:
>> +unextended_tree () {}
>
> Defining no-op ctors is quite dangerous and error-prone.  I suggest
> to instead default initialize the member(s):
>
>unextended_tree (): m_t () {}
>
> Ditto everywhere else, such as in:

 This is really performance-sensitive code though, so I don't think
 we want to add any unnecessary initialisation.  Primitive types are
 uninitialised by default too, and the point of this class is to
 provide an integer-like interface.
>>>
>>> I understand the performance concern (more on that below), but
>>> to clarify the usability issues,  I don't think the analogy with
>>> primitive types is quite fitting here: int() evaluates to zero,
>>> as do the values of i and a[0] and a[1] after an object of type
>>> S is constructed using its default ctor, i.e., S ():
>>>
>>>struct S {
>>>  int i;
>>>  int a[2];
>>>
>>>  S (): i (), a () { }
>>>};
>>
>> Sure, I realise that.  I meant that:
>>
>>   int x;
>>
>> doesn't initialise x to zero.  So it's a question of which case is the
>> most motivating one: using "x ()" to initialise x to 0 in a constructor
>> or "int x;" to declare a variable of type x, uninitialised.  I think the
>> latter use case is much more common (at least in GCC).  Rearranging
>> things, I said later:
>
> I agree that the latter use case is more common in GCC, but I don't
> see it as a good thing.  GCC was written in C and most code still
> uses now outdated C practices such as declaring variables at the top
> of a (often long) function, and usually without initializing them.
> It's been established that it's far better to declare variables with
> the smallest scope, and to initialize them on declaration.  Compilers
> are smart enough these days to eliminate redundant initialization or
> assignments.
>
 In your other message you used the example of explicit default
 initialisation, such as:

 class foo
 {
   foo () : x () {}
   unextended_tree x;
 };

 But I think we should strongly discourage that kind of thing.
 If someone wants to initialise x to a particular value, like
 integer_zero_node, then it would be better to do it explicitly.
 If they don't care what the initial value is, then for these
 integer-mimicking classes, uninitialised is as good as anything
 else. :-)
>>
>> What I meant was: if you want to initialise "i" to 1 in your example,
>> you'd have to write "i (1)".  Being able to write "i ()" instead of
>> "i (0)" saves one character but I don't think it adds much clarity.
>> Explicitly initialising something only seems worthwhile if you say
>> what you're initialising it to.
>
> My comment is not motivated by convenience.  What I'm concerned
> about is that defining a default ctor to be a no-op defeats the
> zero-initialization semantics most users expect of T().
>
> This is particularly concerning for a class designed to behave
> like an [improved] basic integer type.  Such a class should act
> as closely as possible to the type it emulates and in the least
> surprising ways.  Any sort of a deviation that replaces well-
> defined behavior with undefined is a gotcha and a bug waiting
> to happen.
>
> It's also a concern in generic (template) contexts where T() is
> expected to zero-initialize.  A template designed to work with
> a fundamental integer type should also work with a user-defined
> type designed to behave like an integer.

But that kind of situation is one where using "T (0)" over "T ()"
is useful.  It means that template substitution will succeed for
T that are sufficiently integer-like to have a single well-defined
zero but not for T that aren't (such as wide_int).
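
For instance, in a made-up template like the one below (not from the series),
writing the zero explicitly turns the requirement into a compile-time check:

  template <typename T>
  T sum (const T *p, int n)
  {
    T acc (0);   // substitution fails for types with no single zero
    for (int i = 0; i < n; ++i)
      acc += p[i];
    return acc;
  }

  // sum works for int or poly_int64; for wide_int it is rejected at
  // compile time instead of silently using an unclear notion of zero.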

>>> With the new (and some existing) classes that's not so, and it
>>> makes them harder and more error-prone to use (I just recently
>>> learned this the hard way about offset_int and the debugging
>>> experience is still fresh in my memory).
>>
>> Sorry about the bad experience.  But that kind of thing cuts
>> both ways.  If I write:
>>
>> poly_int64
>> foo (void)
>> {
>>   poly_int64 x;
>>   x += 2;
>>   return x;
>> }
>>
>> then I get a warning about x being used uninitialised, without
>> having had to run anything.  If we add default initialisation
>> then this becomes something that has to be debugged against
>> a particular test case, i.e. we've stopped the compiler from
>> giving us useful static analysis.
>
> With default initialization the code above becomes valid and has
> the expected effect of adding 2 to zero.  It's just more robust
> than the same code that uses a basic type instead.  This
> seems no more unexpected and no less desirable than the well-
> defined semantics of something 

Re: [PATCH] expand -fdebug-prefix-map documentation

2017-10-26 Thread Gerald Pfeifer
On Wed, 25 Oct 2017, Jim Wilson wrote:
> The current documentation doesn't explain what the option is for, or
> how one might use it.  The attached patch expands the documentation a
> bit to try to explain this.

> OK?

Thank you for fleshing this out, Jim!

This looks fine to me (modulo Sandra's note).  Just a question: would
we refer to GDB instead of gdb here?  It feels a little in between to
me, whether we are referring to the tool or the actual binary.  I'm
sure Sandra will have guidance for us. ;-)

+@file{.} for @var{new}.  This can give more reproducible builds, which are
+location independent, but may require an extra command to tell gdb where to
+find the source files.
 
Gerald
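
For readers wanting to see the option in action, a minimal made-up example
(paths are placeholders, and the GDB command shown is just one way to point
the debugger back at the sources):

  $ gcc -g -fdebug-prefix-map=$PWD=. -o prog prog.c
  $ gdb ./prog
  (gdb) directory /path/to/the/original/sources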

Re: [PATCH], Enable IBM/IEEE long double format to overriden more easily

2017-10-26 Thread Michael Meissner
On Wed, Oct 25, 2017 at 07:11:07PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Sat, Oct 21, 2017 at 09:09:58AM -0400, Michael Meissner wrote:
> > As Segher and I were discussing off-line, I have some problems with the 
> > current
> > -mabi={ieee,ibm}longdouble switches as we start to planning to modify GCC 9 
> > and
> > GLIBC 2.27/2.28 to support __float128 as the default long double format for
> > power server systems.
> > 
> > My gripes are:
> > 
> > 1)  Using Warn() in rs6000.opt means that you get two warning 
> > messages when
> > you use the switches (one from the driver, and one from cc1).
> > 
> > 2)  I feel you should not get a warning if you select the option 
> > that
> > reflects the current default behavior (i.e. -mabi=ibmlongdouble
> > currently for power server systems).
> > 
> > 3)  There is no way to silence the warning (i.e. -w doesn't work on
> > warnings in the .opt file).  Both GLIBC and LIBGCC will need the
> > ability to build support modules with an explicit long double format.
> > 
> > 4)  In the future we will need a little more flexibility in how the 
> > default
> > is set.
> > 
> > 5)  There is a mis-match between the documentation and rs6000.opt, 
> > as these
> > switches are documented, but use Undocumented in the rs6000.opt.
> 
> Agreed on all.
> 
> > These patches fix these issues.  If you use -Wno-psabi, it will silence the
> > warning.  I have built these patches on a little endian power8 system, and
> > there were no regressions.  Can I check these patches into the trunk?
> > 
> > 2017-10-21  Michael Meissner  
> > 
> > * config/rs6000/aix.h (TARGET_IEEEQUAD_DEFAULT): Set long double
> > default to IBM.
> > * config/rs6000/darwin.h (TARGET_IEEEQUAD_DEFAULT): Likewise.
> > * config/rs6000/rs6000.opt (-mabi=ieeelongdouble): Move the
> > warning to rs6000.c.  Remove the Undocumented flag, since it has
> > been documented.
> > (-mabi=ibmlongdouble): Likewise.
> 
> And more importantly, we _want_ it to be documented (right)?

I would have preferred to not document it until GCC 9 when we start the switch
to long double == _Float128, but since it was already documented (albeit only
for 32 bit), I kept it documented.  If you want me to change it to
undocumented, I can do that.

> > --- gcc/config/rs6000/rs6000.opt(revision 253961)
> > +++ gcc/config/rs6000/rs6000.opt(working copy)
> > @@ -381,10 +381,10 @@ mabi=d32
> >  Target RejectNegative Undocumented Warn(using old darwin ABI) 
> > Var(rs6000_darwin64_abi, 0)
> >  
> >  mabi=ieeelongdouble
> > -Target RejectNegative Undocumented Warn(using IEEE extended precision long 
> > double) Var(rs6000_ieeequad) Save
> > +Target RejectNegative Var(rs6000_ieeequad) Save
> >  
> >  mabi=ibmlongdouble
> > -Target RejectNegative Undocumented Warn(using IBM extended precision long 
> > double) Var(rs6000_ieeequad, 0)
> > +Target RejectNegative Var(rs6000_ieeequad, 0)
> 
> Does this need "Save" as well?

No; for variables, you want Save on the first instance only.

> > +  if (!warned_change_long_double && warn_psabi)
> > +   {
> > + warned_change_long_double = true;
> > + if (TARGET_IEEEQUAD)
> > +   warning (0, "Using IEEE extended precision long double");
> > + else
> > +   warning (0, "Using IBM extended precision long double");
> > +   }
> 
> You can put OPT_Wpsabi in place of that 0, it's what that arg is for :-)

Ah, ok.  Thanks.

> Okay with that changed.  Thanks!
> 
> 
> Segher
> 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



PING Re: [patch] configure option to override TARGET_LIBC_PROVIDES_SSP

2017-10-26 Thread Sandra Loosemore

This one.

https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00537.html

There was discussion about documenting this, but the actual configure 
change hasn't been reviewed yet.


-Sandra



[PATCH] Change default optimization level to -Og

2017-10-26 Thread Wilco Dijkstra
GCC's default optimization level is -O0.  Unfortunately unlike other compilers,
GCC generates extremely inefficient code with -O0.  It is almost unusable for
low-level debugging or manual inspection of generated code.  So a -O option is
always required for compilation.  -Og not only allows for fast compilation, but
also produces code that is efficient, readable as well as debuggable.
Therefore -Og makes for a much better default setting.

Any comments?

2017-10-26  Wilco Dijkstra  

* opts.c (default_options_optimization): Set default to -Og.

doc/
* invoke.texi (-O0) Remove default mention.
(-Og): Add mention of default setting.

--
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
3328a3b5fafa6a98007eff52d2a26af520de9128..74c33ea35b9f320b419a3417e6007d2391536f1b
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7343,7 +7343,7 @@ by @option{-O2} and also turns on the following 
optimization flags:
 @item -O0
 @opindex O0
 Reduce compilation time and make debugging produce the expected
-results.  This is the default.
+results.
 
 @item -Os
 @opindex Os
@@ -7371,7 +7371,7 @@ Optimize debugging experience.  @option{-Og} enables 
optimizations
 that do not interfere with debugging. It should be the optimization
 level of choice for the standard edit-compile-debug cycle, offering
 a reasonable level of optimization while maintaining fast compilation
-and a good debugging experience.
+and a good debugging experience.  This is the default.
 @end table
 
 If you use multiple @option{-O} options, with or without level numbers,
diff --git a/gcc/opts.c b/gcc/opts.c
index 
dfad955e220870a3250198640f3790c804b191e0..74511215309f11445685db4894be2ab6881695d3
 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -565,6 +565,12 @@ default_options_optimization (struct gcc_options *opts,
   int opt2;
   bool openacc_mode = false;
 
+  /* Set the default optimization to -Og.  */
+  opts->x_optimize_size = 0;
+  opts->x_optimize = 1;
+  opts->x_optimize_fast = 0;
+  opts->x_optimize_debug = 1;
+
   /* Scan to see what optimization level has been specified.  That will
  determine the default value of many flags.  */
   for (i = 1; i < decoded_options_count; i++)



Re: Rename cxx1998 into normal

2017-10-26 Thread Jonathan Wakely

On 26/10/17 18:55 +0200, Daniel Krügler wrote:

2017-10-26 7:51 GMT+02:00 François Dumont :

Hi

We once talked about renaming the __cxx1998 namespace into something less
C++98-biased. Here is the patch to do so.

Ok with the new name ?


IMO the name should somehow still contain "cxx" somewhere, otherwise
this could easily cause a semantic ambiguity in situations where the
term "normal" has a specific meaning.

What about "__cxxdef[ault]" ?


I'm not sure we need to change it at all. The name is anachronistic,
but harmless.
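
For context, a rough sketch of the arrangement being discussed (heavily
simplified, illustration only; this is not the actual header text, and the
real headers spell it out via macros with the wrapper details omitted):

// Simplified illustration only.
namespace std
{
  // In debug mode the plain containers live in this nested namespace...
  namespace __cxx1998
  {
    template<typename T> class vector { /* unchecked implementation */ };
  }

  // ...and std::__debug provides the checking wrappers built on top of them.
  namespace __debug
  {
    template<typename T> class vector { /* adds iterator/bounds checking */ };
  }
}

The rename under discussion only affects the __cxx1998 name above.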



Re: [PATCH] expand -fdebug-prefix-map documentation

2017-10-26 Thread Sandra Loosemore

On 10/25/2017 06:26 PM, Jim Wilson wrote:


Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 254023)
+++ gcc/doc/invoke.texi (working copy)
@@ -6981,7 +6981,12 @@ link processing time.  Merging is enabled by defau
 @item -fdebug-prefix-map=@var{old}=@var{new}
 @opindex fdebug-prefix-map
 When compiling files in directory @file{@var{old}}, record debugging
-information describing them as in @file{@var{new}} instead.
+information describing them as in @file{@var{new}} instead.  This can be
+used to replace a build time path with an install time path in the debug info.


build-time path, install-time path


+It can also be used to change an absolute path to a relative path by using
+@file{.} for @var{new}.  This can give more reproducible builds, which are
+location independent, but may require an extra command to tell gdb where to
+find the source files.

 @item -fvar-tracking
 @opindex fvar-tracking


OK with that tweak to the hyphenation.

-Sandra



Re: Rename cxx1998 into normal

2017-10-26 Thread Daniel Krügler
2017-10-26 7:51 GMT+02:00 François Dumont :
> Hi
>
> We once talked about renaming the __cxx1998 namespace into something less
> C++98-biased. Here is the patch to do so.
>
> Ok with the new name?

IMO the name should somehow still contain "cxx" somewhere, otherwise
this could easily cause a semantic ambiguity in situations where the
term "normal" has a specific meaning.

What about "__cxxdef[ault]" ?

- Daniel


Re: [RFA][PATCH] Provide a class interface into substitute_and_fold.

2017-10-26 Thread Jeff Law
On 10/26/2017 03:24 AM, Richard Biener wrote:
> On Tue, Oct 24, 2017 at 8:44 PM, Jeff Law  wrote:
>> This is similar to the introduction of the ssa_propagate_engine, but for
>> the substitution/replacements bits.
>>
>> In a couple places the pass specific virtual functions are just wrappers
>> around existing functions.  A good example of this is
>> ccp_folder::get_value.  Many other routines in tree-ssa-ccp.c want to
>> use get_constant_value.  Some may be convertable to use the class
>> instance, but I haven't looked closely.
>>
>> Another example is vrp_folder::get_value.  In this case we're wrapping
>> op_with_constant_singleton_value.  In a later patch that moves into the
>> to-be-introduced vr_values class so we'll delegate to that class rather
>> than wrap.
>>
>> FWIW I did look at having a single class for the propagation engine and
>> the substitution engine.  That turned out to be a bit problematical due
>> to the calls into the substitution engine from the evrp bits which don't
>> use the propagation engine at all.  Given propagation and substitution
>> are distinct concepts I ultimately decided the cleanest path forward was
>> to keep the two classes separate.
>>
>> Bootstrapped and regression tested on x86_64.  OK for the trunk?
> 
> So what I don't understand in this 2 part series is why you put
> substitute-and-fold into a different class.
Good question.  They're in different classes because they can and are
used independently.

For example, tree-complex uses the propagation engine, but not the
substitution engine.   EVRP uses the substitution engine, but not the
propagation engine.  The standard VRP algorithm uses both engines, but
other than shared data (vr_values), they are independent.  CCP and
copy-prop are similar to VRP.  Essentially one is a producer, the other
a consumer.

It might be possible to smash them together, but I'm not sure if that's
wise or not.  I do suspect that smashing them together would be easier
once all the other work is done if we were to make that choice.  But
composition, multiple inheritance or just passing around the class
instance may be better.  I think that's a TBD.


> 
> This makes it difficult for users to inherit and put the lattice in
> the deriving class as we have the visit routines which will update
> the lattice and the get_value hook which queries it.
Yes.  The key issue is the propagation step produces vr_values and the
substitution step consumes vr_values.

For VRP the way I solve this is to have a vr_values class in the derived
propagation engine class as well as the derived substitution engine
class.  When we're done with propagation we move the class instance from
the propagation engine to the substitution engine.

EVRP works similarly except the vr_values starts in the evrp_dom_walker
class, then moves to its substitution engine.

There's a bit of cleanup to do there in terms of implementation.  But
that's the basic model that I'm using right now.  It should be fairly
easy to move to a unioned class or multiple inheritance if we so
desired.  It shouldn't affect most of what I'm doing now around
encapsulating vr_values.

> 
> So from the standpoint of maintaining the state for the users, a single
> class would be more appropriate.  Of course it seems like
> substitute-and-fold can be used without using the SSA
> propagator itself and the SSA propagator can be used
> without the substitute and fold engine.
Right.  THey can and are used independently which is what led to having
independent classes.


> 
> IIRC we decided against using multiple inheritance?  Which
> means a user would put the lattice in the SSA propagation
> engine derived class and do the inheriting via composition
> as member in the substitute_and_fold engine?
Right, we have decided against using multiple inheritance.   So rather
than using  multiple inheritance I pass the vr_values object.  So in my
development tree I have this:


class vrp_prop : public ssa_propagation_engine
{
 public:
  enum ssa_prop_result visit_stmt (gimple *, edge *, tree *) FINAL OVERRIDE;
  enum ssa_prop_result visit_phi (gphi *) FINAL OVERRIDE;

  /* XXX Drop the indirection through the pointer, not needed.  */
  class vr_values *vr_values;
};


class vrp_folder : public substitute_and_fold_engine
{
 public:
  tree get_value (tree) FINAL OVERRIDE;
  bool fold_stmt (gimple_stmt_iterator *) FINAL OVERRIDE;
  class vr_values *vr_values;
};

In vrp_finalize:
  class vrp_folder vrp_folder;
  vrp_folder.vr_values = vr_values;
  vrp_folder.substitute_and_fold ();


I'm in the process of cleaning this up -- in particular there'll be a
ctor in vrp_folder which will require passing in a vr_values and we'll
be dropping some indirections as well.

I just went through this exact cleanup yesterday with the separated evrp
style range analyzer and evrp itself.


> Your patches keep things simple (aka the lattice and most
> functions are globals), but is composition what you had
> in mind when doing this 

[PATCH] RISC-V: Correct and improve the "-mabi" documentation

2017-10-26 Thread Palmer Dabbelt
The documentation for the "-mabi" argument on RISC-V was incorrect.  We
chose to treat this as a documentation bug rather than a code bug, and
to make the documentation match what GCC currently does.  In the
process, I also improved the documentation a bit.

Thanks to Alex Bradbury for finding the bug!

PR target/82717: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82717

gcc/ChangeLog

2017-10-26  Palmer Dabbelt  

PR target/82717
* doc/invoke.texi (RISC-V) <-mabi>: Correct and improve.
---
 gcc/doc/invoke.texi | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 71b2445f70fd..d184e1d7b7d4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21669,9 +21669,26 @@ When generating PIC code, allow the use of PLTs. 
Ignored for non-PIC.
 
 @item -mabi=@var{ABI-string}
 @opindex mabi
-Specify integer and floating-point calling convention.  This defaults to the
-natural calling convention: e.g.@ LP64 for RV64I, ILP32 for RV32I, LP64D for
-RV64G.
+@item -mabi=@var{ABI-string}
+@opindex mabi
+Specify integer and floating-point calling convention.  @var{ABI-string}
+contains two parts: the size of integer types and the registers used for
+floating-point types.  For example @samp{-march=rv64ifd -mabi=lp64d} means that
+@samp{long} and pointers are 64-bit (implicitly defining @samp{int} to be
+32-bit), and that floating-point values up to 64 bits wide are passed in F
+registers.  Contrast this with @samp{-march=rv64ifd -mabi=lp64f}, which still
+allows the compiler to generate code that uses the F and D extensions but only
+allows floating-point values up to 32 bits long to be passed in registers; or
+@samp{-march=rv64ifd -mabi=lp64}, in which no floating-point arguments will be
+passed in registers.
+
+The default for this argument is system-dependent; users who want a specific
+calling convention should specify one explicitly.  The valid calling
+conventions are: @samp{ilp32}, @samp{ilp32f}, @samp{ilp32d}, @samp{lp64},
+@samp{lp64f}, and @samp{lp64d}.  Some calling conventions are impossible to
+implement on some ISAs: for example, @samp{-march=rv32if -mabi=ilp32d} is
+invalid because the ABI requires 64-bit values be passed in F registers, but F
+registers are only 32 bits wide.
 
 @item -mfdiv
 @itemx -mno-fdiv
-- 
2.13.6



Re: [006/nnn] poly_int: tree constants

2017-10-26 Thread Martin Sebor

 /* The tree and const_tree overload templates.   */
 namespace wi
 {
+  class unextended_tree
+  {
+  private:
+const_tree m_t;
+
+  public:
+unextended_tree () {}


Defining no-op ctors is quite dangerous and error-prone.  I suggest
to instead default initialize the member(s):

   unextended_tree (): m_t () {}

Ditto everywhere else, such as in:


This is really performance-sensitive code though, so I don't think
we want to add any unnecessary initialisation.  Primitive types are
uninitialised by default too, and the point of this class is to
provide an integer-like interface.


I understand the performance concern (more on that below), but
to clarify the usability issues,  I don't think the analogy with
primitive types is quite fitting here: int() evaluates to zero,
as do the values of i and a[0] and a[1] after an object of type
S is constructed using its default ctor, i.e., S ():

   struct S {
 int i;
 int a[2];

 S (): i (), a () { }
   };


Sure, I realise that.  I meant that:

  int x;

doesn't initialise x to zero.  So it's a question of which case is the
most motivating one: using "x ()" to initialise x to 0 in a constructor
or "int x;" to declare a variable of type x, uninitialised.  I think the
latter use case is much more common (at least in GCC).  Rearranging
things, I said later:


I agree that the latter use case is more common in GCC, but I don't
see it as a good thing.  GCC was written in C and most code still
uses now outdated C practices such as declaring variables at the top
of a (often long) function, and usually without initializing them.
It's been established that it's far better to declare variables with
the smallest scope, and to initialize them on declaration.  Compilers
are smart enough these days to eliminate redundant initialization or
assignments.


In your other message you used the example of explicit default
initialisation, such as:

class foo
{
  foo () : x () {}
  unextended_tree x;
};

But I think we should strongly discourage that kind of thing.
If someone wants to initialise x to a particular value, like
integer_zero_node, then it would be better to do it explicitly.
If they don't care what the initial value is, then for these
integer-mimicing classes, uninitialised is as good as anything
else. :-)


What I meant was: if you want to initialise "i" to 1 in your example,
you'd have to write "i (1)".  Being able to write "i ()" instead of
"i (0)" saves one character but I don't think it adds much clarity.
Explicitly initialising something only seems worthwhile if you say
what you're initialising it to.


My comment is not motivated by convenience.  What I'm concerned
about is that defining a default ctor to be a no-op defeats the
zero-initialization semantics most users expect of T().

This is particularly concerning for a class designed to behave
like an [improved] basic integer type.  Such a class should act
as closely as possible to the type it emulates and in the least
surprising ways.  Any sort of a deviation that replaces well-
defined behavior with undefined is a gotcha and a bug waiting
to happen.

It's also a concern in generic (template) contexts where T() is
expected to zero-initialize.  A template designed to work with
a fundamental integer type should also work with a user-defined
type designed to behave like an integer.
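
To make the distinction concrete, here is a minimal standalone sketch
(wrapped_int and example are made-up stand-ins, not GCC code):

struct wrapped_int
{
  wrapped_int () {}                 // user-provided no-op ctor, as above
  int m_val;
};

void
example ()
{
  int i1;                           // indeterminate
  int i2 = int ();                  // value-initialized: zero
  wrapped_int w1;                   // m_val indeterminate
  wrapped_int w2 = wrapped_int ();  // m_val still indeterminate: the no-op
                                    // ctor runs and leaves the member alone
}

With a defaulted ctor, or one that initializes m_val as suggested above,
w2.m_val would be zero, which is what generic code tends to expect from T().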


With the new (and some existing) classes that's not so, and it
makes them harder and more error-prone to use (I just recently
learned this the hard way about offset_int and the debugging
experience is still fresh in my memory).


Sorry about the bad experience.  But that kind of thing cuts
both ways.  If I write:

poly_int64
foo (void)
{
  poly_int64 x;
  x += 2;
  return x;
}

then I get a warning about x being used uninitialised, without
having had to run anything.  If we add default initialisation
then this becomes something that has to be debugged against
a particular test case, i.e. we've stopped the compiler from
giving us useful static analysis.


With default initialization the code above becomes valid and has
the expected effect of adding 2 to zero.  It's just more robust
than the same code with that uses a basic type instead.  This
seems no more unexpected and no less desirable than the well-
defined semantics of something like:

  std::string x;
  x += "2";
  return x;

or using any other C++ standard library type in a similar way.

(Incidentally, although I haven't tried with poly_int, I get no
warnings for the code above with offset_int or wide_int.)


When the ctor is inline and the initialization unnecessary then
GCC will in most instances eliminate it, so I also don't think
the suggested change would have a significant impact on
the efficiency of optimized code, but...

...if it is thought essential to provide a no-op ctor, I would
suggest to consider making its property explicit, e.g., like so:

   struct unextended_tree {

 struct Uninit { };

 // ...
 unextended_tree (Uninit) { /* no 

Re: [PATCH] Document --coverage and fork-like functions (PR gcov-profile/82457).

2017-10-26 Thread Sandra Loosemore

On 10/26/2017 01:21 AM, Martin Liška wrote:

On 10/20/2017 06:03 AM, Sandra Loosemore wrote:

On 10/19/2017 12:26 PM, Eric Gallager wrote:

On 10/19/17, Martin Liška  wrote:

Hi.

As discussed in the PR, we should be more precise in our documentation.
The patch does that.

Ready for trunk?
Martin

gcc/ChangeLog:

2017-10-19  Martin Liska  

 PR gcov-profile/82457
 * doc/invoke.texi: Document that one needs a non-strict ISO mode
 for fork-like functions to be properly instrumented.
---
   gcc/doc/invoke.texi | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)





The wording is kinda unclear because the modes in the parentheses are
all strict ISO modes, but the part before the parentheses says
NON-strict... I think you either need an additional "not" inside the
parentheses, or to change all the instances of -std=c* to -std=gnu*.


The wording in the patch doesn't make sense to me, either.  If I understand the 
issue correctly, the intent is probably to say something like

Unless a strict ISO C dialect option is in effect,
@code{fork} calls are detected and correctly handled without double counting.

??


Hi Sandra.

Thank you for the feedback; I'm sending the version you suggested. Hope it's fine
to install the patch?


Ummm, no.  Sorry to have been unclear; the wording I suggested above was 
intended to replace the existing sentence about fork behavior, not to be 
appended to it.


-Sandra



Re: [v3 PATCH] Deduction guides for associative containers, debug mode deduction guide fixes.

2017-10-26 Thread Jonathan Wakely

On 26/10/17 19:23 +0300, Ville Voutilainen wrote:

On 26 October 2017 at 18:04, Jonathan Wakely  wrote:

Also, please put the deduction guides for a class immediately after
the definition of that class, rather than grouping all the guides for
unordered_map and unordered_multimap together.



Alright.

2017-10-26  Ville Voutilainen  

   Deduction guides for associative containers, debug mode deduction
guide fixes.
   * include/bits/stl_iterator.h (__iter_key_t)
   (__iter_val_t, __iter_to_alloc_t): New.
   * include/bits/stl_map.h: Add deduction guides.
   * include/bits/stl_multimap.h: Likewise.
   * include/bits/stl_multiset.h: Likewise.
   * include/bits/stl_set.h: Likewise.
   * include/bits/unordered_map.h: Likewise.
   * include/bits/unordered_set.h: Likewise.
   * include/debug/deque: Likewise.
   * include/debug/forward_list: Likewise.
   * include/debug/list: Likewise.
   * include/debug/map.h: Likewise.
   * include/debug/multimap.h: Likewise.
   * include/debug/multiset.h: Likewise.
   * include/debug/set.h: Likewise.
   * include/debug/unordered_map: Likewise.
   * include/debug/unordered_set: Likewise.
   * include/debug/vector: Likewise.
   * testsuite/23_containers/map/cons/deduction.cc: New.
   * testsuite/23_containers/multimap/cons/deduction.cc: Likewise.
   * testsuite/23_containers/multiset/cons/deduction.cc: Likewise.
   * testsuite/23_containers/set/cons/deduction.cc: Likewise.
   * testsuite/23_containers/unordered_map/cons/deduction.cc: Likewise.
   * testsuite/23_containers/unordered_multimap/cons/deduction.cc:
   Likewise.
   * testsuite/23_containers/unordered_multiset/cons/deduction.cc:
   Likewise.
   * testsuite/23_containers/unordered_set/cons/deduction.cc: Likewise.


OK for trunk - thanks.
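
For reference, a minimal usage sketch of the deduction these guides enable
(assumes a C++17 compiler with this patch applied; the variable names are
made up):

#include <map>
#include <set>
#include <utility>
#include <vector>

int
main ()
{
  std::vector<std::pair<int, double>> v{{1, 1.5}, {2, 2.5}};
  std::map m(v.begin(), v.end());   // deduces std::map<int, double>
  std::set s{1, 2, 3};              // deduces std::set<int>
}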




Re: [v3 PATCH] Deduction guides for associative containers, debug mode deduction guide fixes.

2017-10-26 Thread Ville Voutilainen
On 26 October 2017 at 18:04, Jonathan Wakely  wrote:
> Also, please put the deduction guides for a class immediately after
> the definition of that class, rather than grouping all the guides for
> unordered_map and unordered_multimap together.


Alright.

2017-10-26  Ville Voutilainen  

Deduction guides for associative containers, debug mode deduction
guide fixes.
* include/bits/stl_iterator.h (__iter_key_t)
(__iter_val_t, __iter_to_alloc_t): New.
* include/bits/stl_map.h: Add deduction guides.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/unordered_map.h: Likewise.
* include/bits/unordered_set.h: Likewise.
* include/debug/deque: Likewise.
* include/debug/forward_list: Likewise.
* include/debug/list: Likewise.
* include/debug/map.h: Likewise.
* include/debug/multimap.h: Likewise.
* include/debug/multiset.h: Likewise.
* include/debug/set.h: Likewise.
* include/debug/unordered_map: Likewise.
* include/debug/unordered_set: Likewise.
* include/debug/vector: Likewise.
* testsuite/23_containers/map/cons/deduction.cc: New.
* testsuite/23_containers/multimap/cons/deduction.cc: Likewise.
* testsuite/23_containers/multiset/cons/deduction.cc: Likewise.
* testsuite/23_containers/set/cons/deduction.cc: Likewise.
* testsuite/23_containers/unordered_map/cons/deduction.cc: Likewise.
* testsuite/23_containers/unordered_multimap/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_set/cons/deduction.cc: Likewise.


deduction_guidos_3.diff.bz2
Description: BZip2 compressed data


Re: [PATCH][AArch64] Introduce emit_frame_chain

2017-10-26 Thread James Greenhalgh
On Fri, Aug 04, 2017 at 01:26:15PM +0100, Wilco Dijkstra wrote:
> The current frame code combines the separate concepts of a frame chain
> (saving old FP,LR in a record and pointing new FP to it) and a frame
> pointer used to access locals.  Add emit_frame_chain to the aarch64_frame
> descriptor and use it in the prolog and epilog code.  For now just
> initialize it as before, so generated code is identical.
> 
> Also correctly set EXIT_IGNORE_STACK.  The current AArch64 epilog code 
> restores SP from FP if alloca is used.  If a frame pointer is used but
> there is no alloca, SP must remain valid for the epilog to work correctly.

OK.

Reviewed by: James Greenhalgh 

Thanks,
James

> 
> ChangeLog:
> 2017-08-03  Wilco Dijkstra  
> 
> gcc/
>   * config/aarch64/aarch64.h (EXIT_IGNORE_STACK): Set if alloca is used.
>   (aarch64_frame): Add emit_frame_chain boolean.
>   * config/aarch64/aarch64.c (aarch64_frame_pointer_required)
>   Move eh_return case to aarch64_layout_frame.
>   (aarch64_layout_frame): Initialize emit_frame_chain.
>   (aarch64_expand_prologue): Use emit_frame_chain.
> 


Re: [RFA][PATCH] Convert sprintf warning code to use a dominator walk

2017-10-26 Thread Jeff Law
On 10/26/2017 03:09 AM, Richard Biener wrote:
> On Wed, Oct 25, 2017 at 5:44 PM, Jeff Law  wrote:
>> On 10/24/2017 11:35 AM, Martin Sebor wrote:
>>> On 10/23/2017 05:14 PM, Jeff Law wrote:

 Martin,

 I'd like your thoughts on this patch.

 One of the things I'm working on is changes that would allow passes that
 use dominator walks to trivially perform context sensitive range
 analysis as a part of their dominator walk.

 As I outlined earlier this would allow us to easily fix the false
 positive sprintf warning reported a week or two ago.

 This patch converts the sprintf warning code to perform a dominator walk
 rather than just walking the blocks in whatever order they appear in the
 basic block array.

 From an implementation standpoint we derive a new class sprintf_dom_walk
 from the dom_walker class.  Like other dom walkers we walk statements
 from within the before_dom_children member function.  Very standard
 stuff.

 I moved handle_gimple_call and various dependencies into the
 sprintf_dom_walker class to facilitate calling handle_gimple_call from
 within the before_dom_children member function.  There's light fallout
 in various places where the call_info structure was explicitly expected
 to be found in the pass_sprintf_length class, but is now found in the
 sprintf_dom_walker class.

 This has been bootstrapped and regression tested on x86_64-linux-gnu.
 I've also layered my embedded VRP analysis on top of this work and
 verified that it does indeed fix the reported false positive.

 Thoughts?
>>>
>>> If it lets us improve the quality of the range information I can't
>>> think of a downside.
>> It's potentially slower simply because the domwalk interface is more
>> expensive than just iterating over the blocks with FOR_EACH_BB.  But
>> that's about it.  I think the ability to get more accurate range
>> information will make the compile-time hit worth it.
>>
>>>
>>> Besides the sprintf pass, a number of other areas depend on ranges,
>>> most of all the -Wstringop-overflow and truncation warnings and
>>> now -Wrestrict (once my enhancement is approved).  It would be nice
>>> to be able to get the same improvements there.  Does it mean that
>>> those warnings will need to be moved into a standalone pass?  (I'm
>>> not opposed to it, just wondering what to expect if this is
>>> the route we want to go.)
>> They don't necessarily have to be a standalone pass -- they just have to
>> be implementable as part of a dominator walk to get the cheap context
>> sensitive range data.
>>
>> So IIRC you've got some code to add additional warnings within the
>> strlen pass.  That pass is already a dominator walk.  In theory you'll
>> just add a member to the strlen_dom_walker class, then a call in
>> before_dom_children and after_dom_children virtuals and you should be
>> able to query the context sensitive range information.
>>
>> For warnings that occur in code that is not easily structured as a
>> dominator walk, Andrew's work will definitely be a better choice.
>>
>> Andrew's work will almost certainly also generate even finer grained
>> ranges because it can work on an arbitrary path through the CFG rather
>> than relying on dominance relationships.  Consider
>>
>>     A
>>    / \
>>   B   C
>>    \ /
>>     D
>>
>> Range information implied by the edge A->B is usable within B because
>> the edge A->B dominates B.  Similarly for range information implied by
>> A->C being available in C.  But range information implied by A->B is not
>> available in D because A->B does not dominate D.  SImilarly range
>> information implied by A->C is not available in D.
>>
>> I touched on this in a private message recently.  Namely that exploiting
>> range data in non-dominated blocks feels a *lot* like jump threading and
>> should likely be structured as a backwards walk query (and thus is more
>> suitable for Andrew's infrastructure).
> 
> On the contrary - with a backward walk you don't know which way to go.
> From D to B or to C?  With a forward walk there's no such ambiguity
> (unless you start from A).
> 
> Note I have patches for EVRP merging ranges from B and C to make
> the info available for D but usually there's nothing to recover here
> that isn't also valid in A.  Just ranges derived from non-conditional
> stmts (by means of exploiting undefined behavior) can help here.
My point is the range information that is specific to the A->B edge is
not usable to optimize D because it does not hold on all paths to D.
But it can be used to improve preciseness of warnings for code within D.
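
As a concrete (made-up) C example of the control flow being discussed:

extern void use (int);

void
example (int n)
{
  if (n < 10)   /* edge A->B */
    use (n);    /* B: n < 10 can be assumed here */
  else
    use (n);    /* C: n >= 10 can be assumed here */
  use (n);      /* D: neither edge range holds on its own; only what
                   was known in A (or a merge of the two edge ranges)
                   is valid here */
}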


It's certainly possible to merge the range information for the A->B edge
and the information for the A->C edge as we enter D (in the original
graph).  For range data implied by the edge traversal I suspect what you
end up with is rarely, if ever, better than what we had in A.  But if there

Re: [PATCH, rs6000] 2/2 Add x86 SSE2 <emmintrin,h> intrinsics to GCC PPC64LE target

2017-10-26 Thread Steven Munroe
On Wed, 2017-10-25 at 18:37 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Oct 17, 2017 at 01:27:16PM -0500, Steven Munroe wrote:
> > This it part 2/2 for contributing PPC64LE support for X86 SSE2
> > instrisics. This patch includes testsuite/gcc.target tests for the
> > intrinsics included by emmintrin.h. 
> 
> > --- gcc/testsuite/gcc.target/powerpc/sse2-mmx.c (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/sse2-mmx.c (revision 0)
> > @@ -0,0 +1,83 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O3 -mdirect-move" } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-require-effective-target p8vector_hw } */
> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> > "-mcpu=power8" } } */
> 
> Why this dg-skip-if?  Also, why -mdirect-move?
> 
This is a weird test because it exercises what are effectively MMX-style
operations that were added to IA under the SSE2 technology.

Normally mmintrin.h compare operations require a transfer to/from vector
with direct move for efficient execution on power.

The one exception to that is _mm_cmpeq_pi8 which can be implemented
directly in GPRs using cmpb.

The cmpb instruction is from power6 but I do not want to use
-mcpu=power6 here. -mdirect-move is a compromise.

I suspect that the dg-skip-if is an artifact of the early struggles to
make this stuff work across various --with-cpu= settings.

I think the key is dg-require-effective-target p8vector_hw which should
allow dropping both the -mdirect-move and the whole dg-skip-if clause.

Will need to try this change and retest.
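
Something along these lines, i.e. an untested sketch of the simplified
test header:

/* { dg-do run } */
/* { dg-options "-O3" } */
/* { dg-require-effective-target lp64 } */
/* { dg-require-effective-target p8vector_hw } */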

> 
> Okay for trunk with that taken care of.  Sorry it took a while.
> 
> Have you tested this on big endian btw?
> 
Yes.

I have tested on P8 BE using --with-cpu=[power6 | power7 | power8]

> 
> Segher
> 




Re: [PATCH][AArch64] Improve aarch64_legitimate_constant_p

2017-10-26 Thread James Greenhalgh
On Fri, Jul 07, 2017 at 12:28:11PM +0100, Wilco Dijkstra wrote:
> This patch further improves aarch64_legitimate_constant_p.  Allow all
> integer, floating point and vector constants.  Allow label references
> and non-anchor symbols with an immediate offset.  This allows such
> constants to be rematerialized, resulting in smaller code and fewer stack
> spills.
> 
> SPEC2006 codesize reduces by 0.08%, SPEC2017 by 0.13%.
> 
> Bootstrap OK, OK for commit?

This is mostly OK, but I think you lose one case we previously permitted,
buried in aarch64_classify_address (the CONST case).

OK with that case handled too (assuming that passes a bootstrap and test).

Reviewed by: James Greenhalgh 

Thanks,
James

> 
> ChangeLog:
> 2017-07-07  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (aarch64_legitimate_constant_p):
>   Return true for more constants, symbols and label references.
>   (aarch64_valid_floating_const): Remove unused function.
> 


[PATCH] Fix PR81659

2017-10-26 Thread Richard Biener

The following fixes lower_eh_dispatch destroying dominator info
that was still live from previous passes.  This clears it from the
obvious place (when we think we might have created unreachable blocks).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2017-10-26  Richard Biener  

PR middle-end/81659
* tree-eh.c (pass_lower_eh_dispatch::execute): Free dominator
info when we redirected EH.

* g++.dg/torture/pr81659.C: New testcase.

Index: gcc/tree-eh.c
===
--- gcc/tree-eh.c   (revision 254099)
+++ gcc/tree-eh.c   (working copy)
@@ -3779,7 +3779,10 @@ pass_lower_eh_dispatch::execute (functio
 }
 
   if (redirected)
-delete_unreachable_blocks ();
+{
+  free_dominance_info (CDI_DOMINATORS);
+  delete_unreachable_blocks ();
+}
   return flags;
 }
 
Index: gcc/testsuite/g++.dg/torture/pr81659.C
===
--- gcc/testsuite/g++.dg/torture/pr81659.C  (nonexistent)
+++ gcc/testsuite/g++.dg/torture/pr81659.C  (working copy)
@@ -0,0 +1,19 @@
+// { dg-do compile }
+
+void
+a (int b)
+{
+  if (b)
+throw;
+  try
+{
+  a (3);
+}
+  catch (int)
+{
+}
+  catch (int)
+{
+}
+}
+


Re: [PATCH, rs6000] Gimple folding for vec_madd()

2017-10-26 Thread Will Schmidt
On Thu, 2017-10-26 at 17:18 +0200, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 5:13 PM, Richard Biener
>  wrote:
> > On Thu, Oct 26, 2017 at 4:30 PM, Will Schmidt  
> > wrote:
> >> On Thu, 2017-10-26 at 11:05 +0200, Richard Biener wrote:
> >>> On Wed, Oct 25, 2017 at 4:38 PM, Will Schmidt  
> >>> wrote:
> >>> > Hi,
> >>> >
> >>> > Add support for gimple folding of the vec_madd() (vector multiply-add)
> >>> > intrinsics.
> >>> > Testcase coverage is provided by the existing tests
> >>> >  gcc.target/powerpc/fold-vec-madd-*.c
> >>> >
> >>> > Sniff-tests appear clean.  A full regtest is currently running across 
> >>> > assorted Power systems. (P6-P9).
> >>> > OK for trunk (pending clean run results)?
> >>>
> >>> You can use FMA_EXPR on integer operands as well.  Otherwise you risk
> >>> the FMA be not matched by combine later when part of the operation is
> >>> CSEd.
> >>
> >> I had tried that initially, without success,..   I'll probably need
> >> another hint.  :-)
> >> Looking a bit closer, I think I see why the assert fired, but I'm not
> >> sure what the proper fix would be.
> >>
>> So attempting to use FMA_EXPR on the integer operands (vector shorts in
>> this case), I end up triggering this error:
> >>
> >> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/testsuite/gcc.target/powerpc/fold-vec-madd-short.c:14:10:
> >>  internal compiler error: in expand_expr_real_2, at expr.c:8712
> >> 0x10813303 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
> >> expand_modifier)
> >> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8712
> >> 0x1081822f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> >> expand_modifier, rtx_def**, bool)
> >> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:9787
> >> 0x1080f7bb expand_expr_real(tree_node*, rtx_def*, machine_mode, 
> >> expand_modifier, rtx_def**, bool)
> >> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8084
> >> ...
> >>
> >>
> >> which when followed back, I tripped an assert here:  (gcc/expr.c:
> >> expand_expr_real_2() ~ line 8710)
> >>
> >> case FMA_EXPR:
> >>   {
> >> optab opt = fma_optab;
> >> gimple *def0, *def2;
> >> if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
> >>   {
> >> tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
> >> tree call_expr;
> >>
> >> gcc_assert (fn != NULL_TREE);
> >>
> >> where gcc/builtins.c
> >> mathfn_built_in()->mathfn_built_in_1->mathfn_built_in_2 looks to have
> >> returned END_BUILTINS/NULL_TREE, due to falling through the if/else
> >> tree:
> >>
> >>   if (TYPE_MAIN_VARIANT (type) == double_type_node)
> >> return fcode;
> >>   else if (TYPE_MAIN_VARIANT (type) == float_type_node)
> >> return fcodef;
> >>   else if (TYPE_MAIN_VARIANT (type) == long_double_type_node)
> >> return fcodel;
> >>   else
> >> return END_BUILTINS;
> >>
> >> Looks like that is all double/float/long double contents.  First blush
> >> attempt would be to add V8HI_type_node/integer_type_node to that if/else
> >> tree, but that doesn't look like it would be near enough.
> >
> > Well - we of course expect to have an optab for the fma with vector
> > short.  I thought
> > you had one given you have the intrinsic.  If you don't have an optab
> > you of course
> > have to open-code it.
> >
> > Just thought you expected an actual machine instruction doing the integer 
> > FMA.
> 
> So you have
> 
> (define_insn "altivec_vmladduhm"
>   [(set (match_operand:V8HI 0 "register_operand" "=v")
> (plus:V8HI (mult:V8HI (match_operand:V8HI 1 "register_operand" "v")
>   (match_operand:V8HI 2 "register_operand" "v"))
>(match_operand:V8HI 3 "register_operand" "v")))]
>   "TARGET_ALTIVEC"
>   "vmladduhm %0,%1,%2,%3"
>   [(set_attr "type" "veccomplex")])
> 
> but not
> 
> (define_expand "fmav8hi4"
> ...
> 
> or define_insn in case that's also a way to register an optab.
> 
> Richard.

Ok.  Thanks for the guidance.  :-) 

-Will


> 
> 
> 
> > Richard.
> >
> >> Thanks
> >> -Will
> >>
> >>>
> >>> Richard.
> >>>
> >>> > Thanks,
> >>> > -Will
> >>> >
> >>> > [gcc]
> >>> >
> >>> > 2017-10-25  Will Schmidt 
> >>> >
> >>> > * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add 
> >>> > support for
> >>> >   gimple folding of vec_madd() intrinsics.
> >>> >
> >>> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >>> > index 4837e14..04c2b15 100644
> >>> > --- a/gcc/config/rs6000/rs6000.c
> >>> > +++ b/gcc/config/rs6000/rs6000.c
> >>> > @@ -16606,10 +16606,43 @@ rs6000_gimple_fold_builtin 
> >>> > (gimple_stmt_iterator *gsi)
> >>> >build_int_cst (arg2_type, 
> >>> > 0)), arg0);
> >>> >  gimple_set_location (g, loc);
> >>> >  

Re: [PATCH] Improve alloca alignment

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 4:55 PM, Jeff Law  wrote:
> On 10/17/2017 06:04 AM, Wilco Dijkstra wrote:
>> Wilco Dijkstra wrote:
>>>
>>> Yes STACK_BOUNDARY applies to virtual_stack_dynamic_rtx and all other
>>> virtual frame registers. It appears its main purpose is to enable alignment
>>> optimizations since PREFERRED_STACK_BOUNDARY is used to align
>>> local and outgoing argument area etc. So if you don't want the alignment
>>> optimizations it is feasible to set STACK_BOUNDARY to a lower value
>>> without changing the stack layout.
>>>
>>> There is also STACK_DYNAMIC_OFFSET which computes the total offset
>>> from the stack. It's not obvious whether the default version should align 
>>> (since
>>> outgoing arguments are already aligned there is no easy way to record the
>>> extra padding), but we could assert if the offset isn't aligned.
>>
>> Also there is something odd in the sparc backend:
>>
>> /* Given the stack bias, the stack pointer isn't actually aligned.  */
>> #define INIT_EXPANDERS   \
>>   do {   \
>> if (crtl->emit.regno_pointer_align && SPARC_STACK_BIAS)  \
>>   {  \
>> REGNO_POINTER_ALIGN (STACK_POINTER_REGNUM) = BITS_PER_UNIT;  \
>> REGNO_POINTER_ALIGN (HARD_FRAME_POINTER_REGNUM) = BITS_PER_UNIT; \
>>   }  \
>>   } while (0)
>>
>> That lowers the alignment for the stack and frame pointer. So assuming that 
>> works
>> and blocks alignment optimizations, why isn't this done for the dynamic 
>> offset as well?
> No clue, but ISTM that it should.  Eric, can you try that and see if it
> addresses these problems?  I'd really like to get this wrapped up, but I
> don't have access to any sparc systems to test it myself.

Or maybe adjust all non-hardreg stack pointers by the bias so they
_are_ aligned.  And of course
make sure we always use the aligned pointers when allocating.

Weird ABI ...

Richard.

> Jeff


Re: [PATCH][AArch64] Simplify frame layout for stack probing

2017-10-26 Thread James Greenhalgh
On Tue, Jul 25, 2017 at 02:58:04PM +0100, Wilco Dijkstra wrote:
> This patch makes some changes to the frame layout in order to simplify
> stack probing.  We want to use the save of LR as a probe in any non-leaf
> function.  With shrinkwrapping we may only save LR before a call, so it
> is useful to define a fixed location in the callee-saves. So force LR at
> the bottom of the callee-saves even with -fomit-frame-pointer.
> 
> Also remove a rarely used frame layout that saves the callee-saves first
> with -fomit-frame-pointer.
> 
> OK for commit (and backport to GCC7)?

OK. Leave it a week before backporting.

Reviewed by: James Greenhalgh 

Thanks,
James

> 
> ChangeLog:
> 2017-07-25  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (aarch64_layout_frame):
>   Ensure LR is always stored at the bottom of the callee-saves.
>   Remove frame option which saves callee-saves at top of frame.
> 


Re: [PATCH, rs6000] Gimple folding for vec_madd()

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 5:13 PM, Richard Biener
 wrote:
> On Thu, Oct 26, 2017 at 4:30 PM, Will Schmidt  
> wrote:
>> On Thu, 2017-10-26 at 11:05 +0200, Richard Biener wrote:
>>> On Wed, Oct 25, 2017 at 4:38 PM, Will Schmidt  
>>> wrote:
>>> > Hi,
>>> >
>>> > Add support for gimple folding of the vec_madd() (vector multiply-add)
>>> > intrinsics.
>>> > Testcase coverage is provided by the existing tests
>>> >  gcc.target/powerpc/fold-vec-madd-*.c
>>> >
>>> > Sniff-tests appear clean.  A full regtest is currently running across 
>>> > assorted Power systems. (P6-P9).
>>> > OK for trunk (pending clean run results)?
>>>
>>> You can use FMA_EXPR on integer operands as well.  Otherwise you risk
>>> the FMA be not matched by combine later when part of the operation is
>>> CSEd.
>>
>> I had tried that initially, without success,..   I'll probably need
>> another hint.  :-)
>> Looking a bit closer, I think I see why the assert fired, but I'm not
>> sure what the proper fix would be.
>>
>> So attempting to use FMA_EXPR on the integer operands (vector shorts in
>> this case), I end up triggering this error:
>>
>> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/testsuite/gcc.target/powerpc/fold-vec-madd-short.c:14:10:
>>  internal compiler error: in expand_expr_real_2, at expr.c:8712
>> 0x10813303 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
>> expand_modifier)
>> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8712
>> 0x1081822f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
>> expand_modifier, rtx_def**, bool)
>> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:9787
>> 0x1080f7bb expand_expr_real(tree_node*, rtx_def*, machine_mode, 
>> expand_modifier, rtx_def**, bool)
>> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8084
>> ...
>>
>>
>> which when followed back, I tripped an assert here:  (gcc/expr.c:
>> expand_expr_real_2() ~ line 8710)
>>
>> case FMA_EXPR:
>>   {
>> optab opt = fma_optab;
>> gimple *def0, *def2;
>> if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
>>   {
>> tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
>> tree call_expr;
>>
>> gcc_assert (fn != NULL_TREE);
>>
>> where gcc/builtins.c
>> mathfn_built_in()->mathfn_built_in_1->mathfn_built_in_2 looks to have
>> returned END_BUILTINS/NULL_TREE, due to falling through the if/else
>> tree:
>>
>>   if (TYPE_MAIN_VARIANT (type) == double_type_node)
>> return fcode;
>>   else if (TYPE_MAIN_VARIANT (type) == float_type_node)
>> return fcodef;
>>   else if (TYPE_MAIN_VARIANT (type) == long_double_type_node)
>> return fcodel;
>>   else
>> return END_BUILTINS;
>>
>> Looks like that is all double/float/long double contents.  First blush
>> attempt would be to add V8HI_type_node/integer_type_node to that if/else
>> tree, but that doesn't look like it would be near enough.
>
> Well - we of course expect to have an optab for the fma with vector
> short.  I thought
> you had one given you have the intrinsic.  If you don't have an optab
> you of course
> have to open-code it.
>
> Just thought you expected an actual machine instruction doing the integer FMA.

So you have

(define_insn "altivec_vmladduhm"
  [(set (match_operand:V8HI 0 "register_operand" "=v")
(plus:V8HI (mult:V8HI (match_operand:V8HI 1 "register_operand" "v")
  (match_operand:V8HI 2 "register_operand" "v"))
   (match_operand:V8HI 3 "register_operand" "v")))]
  "TARGET_ALTIVEC"
  "vmladduhm %0,%1,%2,%3"
  [(set_attr "type" "veccomplex")])

but not

(define_expand "fmav8hi4"
...

or define_insn in case that's also a way to register an optab.

Richard.



> Richard.
>
>> Thanks
>> -Will
>>
>>>
>>> Richard.
>>>
>>> > Thanks,
>>> > -Will
>>> >
>>> > [gcc]
>>> >
>>> > 2017-10-25  Will Schmidt 
>>> >
>>> > * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add 
>>> > support for
>>> >   gimple folding of vec_madd() intrinsics.
>>> >
>>> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>> > index 4837e14..04c2b15 100644
>>> > --- a/gcc/config/rs6000/rs6000.c
>>> > +++ b/gcc/config/rs6000/rs6000.c
>>> > @@ -16606,10 +16606,43 @@ rs6000_gimple_fold_builtin 
>>> > (gimple_stmt_iterator *gsi)
>>> >build_int_cst (arg2_type, 0)), 
>>> > arg0);
>>> >  gimple_set_location (g, loc);
>>> >  gsi_replace (gsi, g, true);
>>> >  return true;
>>> >}
>>> > +
>>> > +/* vec_madd (Float) */
>>> > +case ALTIVEC_BUILTIN_VMADDFP:
>>> > +case VSX_BUILTIN_XVMADDDP:
>>> > +  {
>>> > +   arg0 = gimple_call_arg (stmt, 0);
>>> > +   arg1 = gimple_call_arg (stmt, 1);
>>> > +   tree arg2 = gimple_call_arg (stmt, 2);
>>> > +   lhs = 

Re: [PATCH, rs6000] Gimple folding for vec_madd()

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 4:30 PM, Will Schmidt  wrote:
> On Thu, 2017-10-26 at 11:05 +0200, Richard Biener wrote:
>> On Wed, Oct 25, 2017 at 4:38 PM, Will Schmidt  
>> wrote:
>> > Hi,
>> >
>> > Add support for gimple folding of the vec_madd() (vector multiply-add)
>> > intrinsics.
>> > Testcase coverage is provided by the existing tests
>> >  gcc.target/powerpc/fold-vec-madd-*.c
>> >
>> > Sniff-tests appear clean.  A full regtest is currently running across 
>> > assorted Power systems. (P6-P9).
>> > OK for trunk (pending clean run results)?
>>
>> You can use FMA_EXPR on integer operands as well.  Otherwise you risk
>> the FMA be not matched by combine later when part of the operation is
>> CSEd.
>
> I had tried that initially, without success,..   I'll probably need
> another hint.  :-)
> Looking a bit closer, I think I see why the assert fired, but I'm not
> sure what the proper fix would be.
>
> So attempting to use FMA_EXPR on the integer operands (vector shorts in
> this case), I end up triggering this error:
>
> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/testsuite/gcc.target/powerpc/fold-vec-madd-short.c:14:10:
>  internal compiler error: in expand_expr_real_2, at expr.c:8712
> 0x10813303 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
> expand_modifier)
> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8712
> 0x1081822f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:9787
> 0x1080f7bb expand_expr_real(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
> /home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8084
> ...
>
>
> which when followed back, I tripped an assert here:  (gcc/expr.c:
> expand_expr_real_2() ~ line 8710)
>
> case FMA_EXPR:
>   {
> optab opt = fma_optab;
> gimple *def0, *def2;
> if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
>   {
> tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
> tree call_expr;
>
> gcc_assert (fn != NULL_TREE);
>
> where gcc/builtins.c
> mathfn_built_in()->mathfn_built_in_1->mathfn_built_in_2 looks to have
> returned END_BUILTINS/NULL_TREE, due to falling through the if/else
> tree:
>
>   if (TYPE_MAIN_VARIANT (type) == double_type_node)
> return fcode;
>   else if (TYPE_MAIN_VARIANT (type) == float_type_node)
> return fcodef;
>   else if (TYPE_MAIN_VARIANT (type) == long_double_type_node)
> return fcodel;
>   else
> return END_BUILTINS;
>
> Looks like that is all double/float/long double contents.  First blush
> attempt would be to add V8HI_type_node/integer_type_node to that if/else
> tree, but that doesn't look like it would be near enough.

Well - we of course expect to have an optab for the fma with vector
short.  I thought
you had one given you have the intrinsic.  If you don't have an optab
you of course
have to open-code it.

Just thought you expected an actual machine instruction doing the integer FMA.

Richard.

> Thanks
> -Will
>
>>
>> Richard.
>>
>> > Thanks,
>> > -Will
>> >
>> > [gcc]
>> >
>> > 2017-10-25  Will Schmidt 
>> >
>> > * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add support 
>> > for
>> >   gimple folding of vec_madd() intrinsics.
>> >
>> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> > index 4837e14..04c2b15 100644
>> > --- a/gcc/config/rs6000/rs6000.c
>> > +++ b/gcc/config/rs6000/rs6000.c
>> > @@ -16606,10 +16606,43 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
>> > *gsi)
>> >build_int_cst (arg2_type, 0)), 
>> > arg0);
>> >  gimple_set_location (g, loc);
>> >  gsi_replace (gsi, g, true);
>> >  return true;
>> >}
>> > +
>> > +/* vec_madd (Float) */
>> > +case ALTIVEC_BUILTIN_VMADDFP:
>> > +case VSX_BUILTIN_XVMADDDP:
>> > +  {
>> > +   arg0 = gimple_call_arg (stmt, 0);
>> > +   arg1 = gimple_call_arg (stmt, 1);
>> > +   tree arg2 = gimple_call_arg (stmt, 2);
>> > +   lhs = gimple_call_lhs (stmt);
>> > +   gimple *g = gimple_build_assign (lhs, FMA_EXPR , arg0, arg1, arg2);
>> > +   gimple_set_location (g, gimple_location (stmt));
>> > +   gsi_replace (gsi, g, true);
>> > +   return true;
>> > +  }
>> > +/* vec_madd (Integral) */
>> > +case ALTIVEC_BUILTIN_VMLADDUHM:
>> > +  {
>> > +   arg0 = gimple_call_arg (stmt, 0);
>> > +   arg1 = gimple_call_arg (stmt, 1);
>> > +   tree arg2 = gimple_call_arg (stmt, 2);
>> > +   lhs = gimple_call_lhs (stmt);
>> > +   tree lhs_type = TREE_TYPE (lhs);
>> > +   location_t loc = gimple_location (stmt);
>> > +   gimple_seq stmts = NULL;
>> > +   tree mult_result = gimple_build (&stmts, loc, MULT_EXPR,
>> > +   

Re: [PATCH][AArch64] Improve addressing of TI/TFmode

2017-10-26 Thread James Greenhalgh
On Thu, Jul 20, 2017 at 01:49:03PM +0100, Wilco Dijkstra wrote:
> In https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01125.html Jiong
> pointed out some addressing inefficiencies due to a recent change in
> regcprop (https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00775.html).
> 
> This patch improves aarch64_legitimize_address_displacement to split
> unaligned offsets of TImode and TFmode accesses.  The resulting code
> is better and no longer relies on the original regcprop optimization.
> 
> For the test we now produce:
> 
>   add x1, sp, 4
>   stp xzr, xzr, [x1, 24]
> 
> rather than:
> 
> mov x1, sp
> add x1, x1, 28
> stp xzr, xzr, [x1]
> 
> OK for commit?

OK.

Reviewed by: James Greenhalgh 

Thanks,
James

> 
> ChangeLog:
> 2017-06-20  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (aarch64_legitimize_address_displacement):
>   Improve unaligned TImode/TFmode base/offset split.
> 
> testsuite
>   * gcc.target/aarch64/ldp_stp_unaligned_2.c: New file.
> 
> --


Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 4:38 PM, Richard Biener
 wrote:
> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka  wrote:
>>> I think the limit should be on the number of generated copies and not
>>> the overall size of the structure...  If the struct were composed of
>>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>>>
>>> I wonder how rep; movb; interacts with store to load forwarding?  Is
>>> that maybe optimized well on some archs?  movb should always
>>> forward and wasn't the setup cost for small N reasonable on modern
>>> CPUs?
>>
>> rep mov is a win over a loop for blocks over 128 bytes on Core, and for blocks
>> in the range 24-128 on Zen.  This is w/o store/load forwarding, but I doubt
>> those provide a cheap way around it.
>>
>>>
>>> It probably depends on the width of the entries in the store buffer,
>>> if they appear in-order and the alignment of the stores (if they are larger 
>>> than
>>> 8 bytes they are surely aligned).  IIRC CPUs had smaller store buffer
>>> entries than cache line size.
>>>
>>> Given that load bandwith is usually higher than store bandwith it
>>> might make sense to do the store combining in our copying sequence,
>>> like for the 8 byte entry case use sth like
>>>
>>>   movq 0(%eax), %xmm0
>>>   movhps 8(%eax), %xmm0 // or vpinsert
>>>   mov[au]ps %xmm0, 0%(ebx)
>>> ...
>>>
>>> thus do two loads per store and perform the stores in wider
>>> mode?
>>
>> This may be somewhat faster indeed.  I am not sure if store to load
>> forwarding will work for the latter half when read again by halves.
>> It would not happen on older CPUs :)
>
> Yes, forwarding larger stores to smaller loads generally works fine
> since forever with the usual restrictions of alignment/size being
> power of two "halves".
>
> The question is of course what to do for 4 byte or smaller elements or
> mixed size elements.  We can do zero-extending loads
> (do we have them for QI, HI mode loads as well?) and
> do shift and or's.  I'm quite sure the CPUs wouldn't like to
> see vpinsert's of different vector mode destinations.  So it
> would be 8 byte stores from GPRs and values built up via
> shift & or.

Like we generate

foo:
.LFB0:
.cfi_startproc
movl4(%rdi), %eax
movzwl  2(%rdi), %edx
salq$16, %rax
orq %rdx, %rax
movzbl  1(%rdi), %edx
salq$8, %rax
orq %rdx, %rax
movzbl  (%rdi), %edx
salq$8, %rax
orq %rdx, %rax
movq%rax, (%rsi)
ret

for

struct x { char e; char f; short c; int i; } a;

void foo (struct x *p, long *q)
{
 *q = (((((((unsigned long)(unsigned int)p->i) << 16)
        | ((unsigned long)(unsigned short)p->c)) << 8)
       | ((unsigned long)(unsigned char)p->f)) << 8)
      | ((unsigned long)(unsigned char)p->e);
}

if you disable the bswap pass.  Doing 4 byte stores in this
case would save some prefixes at least.  I expected the
ORs and shifts to have smaller encodings...

With 4 byte stores we end up with the same size as with
individual loads & stores.

> As said, the important part is that IIRC CPUs can usually
> have more loads in flight than stores.  Esp. Bulldozer
> with the split core was store buffer size limited (but it
> could do merging of store buffer entries IIRC).

Also if we do the stores in smaller chunks we are more
likely to hit the same store-to-load-forwarding issue
elsewhere, for example when the destination is memcpy'ed
away.

So the proposed change isn't necessarily a win; it may
itself introduce a regression similar to the one it tries to fix.

Whole-program analysis of accesses might allow
marking affected objects.

Richard.

> Richard.
>
>> Honza
>>>
>>> As said, a general concern was that you're not copying padding.  If you
>>> put this into an even more common place you surely will break
>>> stuff, no?
>>>
>>> Richard.
>>>
>>> >
>>> > Martin
>>> >
>>> >
>>> >>
>>> >> Richard.
>>> >>
>>> >> > Martin
>>> >> >
>>> >> >
>>> >> > 2017-10-12  Martin Jambor  
>>> >> >
>>> >> > PR target/80689
>>> >> > * tree-sra.h: New file.
>>> >> > * ipa-prop.h: Moved declaration of build_ref_for_offset to
>>> >> > tree-sra.h.
>>> >> > * expr.c: Include params.h and tree-sra.h.
>>> >> > (emit_move_elementwise): New function.
>>> >> > (store_expr_with_bounds): Optionally use it.
>>> >> > * ipa-cp.c: Include tree-sra.h.
>>> >> > * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
>>> >> > * config/i386/i386.c (ix86_option_override_internal): Set
>>> >> > PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
>>> >> > * tree-sra.c: Include tree-sra.h.
>>> >> > (scalarizable_type_p): Renamed to
>>> >> > simple_mix_of_records_and_arrays_p, made public, renamed the
>>> >> > second parameter to allow_char_arrays.
>>> >> > (extract_min_max_idx_from_array): New function.
>>> >> > (completely_scalarize): Moved 

Re: [v3 PATCH] Deduction guides for associative containers, debug mode deduction guide fixes.

2017-10-26 Thread Jonathan Wakely

On 26/10/17 15:36 +0100, Jonathan Wakely wrote:

On 17/10/17 22:48 +0300, Ville Voutilainen wrote:

Tested on Linux-PPC64. The debug mode fixes have been tested manually
and individually on Linux-x64.

2017-10-17  Ville Voutilainen  

  Deduction guides for associative containers, debug mode deduction
guide fixes.
  * include/bits/stl_algobase.h (__iter_key_t)
  (__iter_val_t, __iter_to_alloc_t): New.
  * include/bits/stl_map.h: Add deduction guides.
  * include/bits/stl_multimap.h: Likewise.
  * include/bits/stl_multiset.h: Likewise.
  * include/bits/stl_set.h: Likewise.
  * include/bits/unordered_map.h: Likewise.
  * include/bits/unordered_set.h: Likewise.


Also, please put the deduction guides for a class immediately after
the definition of that class, rather than grouping all the guides for
unordered_map and unordered_multimap together.

Thanks.



Re: [PATCH] Fix nrv-1.c false failure on aarch64.

2017-10-26 Thread Jeff Law
On 10/18/2017 10:59 AM, Egeyar Bagcioglu wrote:
> Hello,
> 
> Test case "guality.exp=nrv-1.c" fails on aarch64. Optimizations reorder
> the instructions and cause the value of a variable to be checked before
> its first assignment. The following patch is moving the
> break point to the end of the function. Therefore, it ensures that the
> break point is reached after the assignment instruction is executed.
> 
> Please review the patch and apply if legitimate.
This seems wrong.

If I understand the test correctly, we want to break on the line with
the assignment to a2.i[4] = 7 and verify that before that line executes
that a2.i[0] == 42.

Moving the test point to the end of the function seems to defeat the
purpose of the test.  A breakpoint at the end of the function to test
state is pointless as it doesn't reflect what a user is likely to want
to do.

I'm guessing based on your description that optimization has sunk the
assignment to a2.i[0] down past the assignment to a2.i[4]?  What
optimization did this and what do the dwarf records look like?


Jeff


Re: [PATCH] Improve alloca alignment

2017-10-26 Thread Jeff Law
On 10/17/2017 06:04 AM, Wilco Dijkstra wrote:
> Wilco Dijkstra wrote:
>>
>> Yes STACK_BOUNDARY applies to virtual_stack_dynamic_rtx and all other
>> virtual frame registers. It appears its main purpose is to enable alignment
>> optimizations since PREFERRED_STACK_BOUNDARY is used to align
>> local and outgoing argument area etc. So if you don't want the alignment
>> optimizations it is feasible to set STACK_BOUNDARY to a lower value
>> without changing the stack layout.
>>
>> There is also STACK_DYNAMIC_OFFSET which computes the total offset
>> from the stack. It's not obvious whether the default version should align 
>> (since
>> outgoing arguments are already aligned there is no easy way to record the
>> extra padding), but we could assert if the offset isn't aligned.
> 
> Also there is something odd in the sparc backend:
> 
> /* Given the stack bias, the stack pointer isn't actually aligned.  */
> #define INIT_EXPANDERS   \
>   do {   \
> if (crtl->emit.regno_pointer_align && SPARC_STACK_BIAS)  \
>   {  \
> REGNO_POINTER_ALIGN (STACK_POINTER_REGNUM) = BITS_PER_UNIT;  \
> REGNO_POINTER_ALIGN (HARD_FRAME_POINTER_REGNUM) = BITS_PER_UNIT; \
>   }  \
>   } while (0)
> 
> That lowers the alignment for the stack and frame pointer. So assuming that 
> works
> and blocks alignment optimizations, why isn't this done for the dynamic 
> offset as well?
No clue, but ISTM that it should.  Eric, can you try that and see if it
addresses these problems?  I'd really like to get this wrapped up, but I
don't have access to any sparc systems to test it myself.

Jeff


Re: [PATCH] Improve alloca alignment

2017-10-26 Thread Jeff Law
On 10/05/2017 03:16 AM, Richard Biener wrote:
>  On Thu, Oct 5, 2017 at 1:07 AM, Jeff Law  wrote:
>> On 10/04/2017 08:53 AM, Eric Botcazou wrote:
 This seems like a SPARC target problem to me -- essentially it's
 claiming a higher STACK_BOUNDARY than it really has.
>>>
>>> No, it is not, I can guarantee you that the stack pointer is always aligned 
>>> to
>>> 64-bit boundaries on SPARC, otherwise all hell would break loose...
>> Then something is inconsistent somehwere.  Either the stack is aligned
>> prior to that code or it is not.  If it is aligned, then Wilco's patch
>> ought to keep it aligned.  If is not properly aligned, then well, that's
>> the problem ISTM.
>>
>> Am I missing something here?
> 
> What I got from the discussion and the PR is that the stack hardregister
> is properly aligned but what GCC maps to it (virtual or frame or whatever)
> might not be at all points.
Ah!  But I'd probably claim that having the virtual unaligned is erroneous.

> 
> allocate_dynamic_stack_space uses virtual_stack_dynamic_rtx and I'm not
> sure STACK_BOUNDARY applies to it?
> 
> Not that I know anything about this here ;)
My first thought is that sure it should apply.  It just seems wrong that
STACK_BOUNDARY wouldn't apply to the virtual.  But I doubt we've ever
documented that as a requirement/assumption.

Jeff



Re: [Diagnostic Patch] don't print column zero

2017-10-26 Thread Nathan Sidwell

On 10/26/2017 10:34 AM, David Malcolm wrote:

[CCing Rainer and Mike for the gcc-dg.exp part]



Alternate idea: could show_column become a tri-state:
   * default: show non-zero columns
   * never: never show columns
   * always: always show a column, printing 0 for the no-column case
and then use "always" in our testsuite


One of the things this patch shows up is the number of places where 
we're accepting a zero column by default.  IMHO it is best to explicitly 
mark such tests.



+  size_t l = sprintf (result, col ? ":%d:%d" : ":%d", line, col);


Possibly a silly question, but is it OK to have a formatted string
call in which some of the arguments aren't consumed? (here "col" is only
consumed for the true case, which consumes 2 arguments; it's not consumed
for the false case).


Yes.


+  gcc_checking_assert (l + 1 < sizeof (result));


Would snprintf be safer?


I guess, but the assert's still needed.
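
For what it's worth, a snprintf-based shape that keeps the assert might
look like this (sketch only, not a committed form):

/* Sketch: same contract as the patch, but let snprintf bound the write
   and keep the checking assert to catch truncation.  */
static const char *
maybe_line_and_column (int line, int col)
{
  static char result[32];

  if (line)
    {
      size_t l = snprintf (result, sizeof (result),
			   col ? ":%d:%d" : ":%d", line, col);
      gcc_checking_assert (l < sizeof (result));
    }
  else
    result[0] = 0;
  return result;
}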


Please create a selftest for the function, covering these cases:

* line == 0
* line > 0 and col == 0
* line > 0 and col > 0 (checking output for these cases)
* line == INT_MAX and col == INT_MAX (without checking output, just to tickle 
the assert)
* line == INT_MIN and col == INT_MIN (likewise)


Ok, I'll investigate this new fangled self-testing framework :)
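
Something like the following might be a starting point, covering the
cases listed above (sketch only; assumes the function stays visible to
the diagnostic selftests):

#if CHECKING_P
namespace selftest {

/* Sketch of coverage for maybe_line_and_column.  */
static void
test_maybe_line_and_column ()
{
  ASSERT_STREQ ("", maybe_line_and_column (0, 0));
  ASSERT_STREQ (":42", maybe_line_and_column (42, 0));
  ASSERT_STREQ (":42:10", maybe_line_and_column (42, 10));
  /* Extreme values: just verify the buffer-size assert holds.  */
  maybe_line_and_column (INT_MAX, INT_MAX);
  maybe_line_and_column (INT_MIN, INT_MIN);
}

} // namespace selftest
#endif /* CHECKING_P */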


There are some testcases where we deliberately don't have a *line*
number; what happens to these?


Those don't change.  The dg-harness already does NOT expect a column 
when lineno=0.



My Tcl skills aren't great, so hopefully someone else can review this;
CCing Rainer and Mike.

Also, is the proposed syntax for "no columns" OK?  (note the tristate
idea above)


I'm not wedded to '-:', but as mentioned above, I think the tests should 
be explicit about whether a column is expected or not (and the default 
needs to be 'expect column', because of history)


thanks for your comments.

nathan

--
Nathan Sidwell


Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka  wrote:
>> I think the limit should be on the number of generated copies and not
>> the overall size of the structure...  If the struct were composed of
>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>>
>> I wonder how rep; movb; interacts with store to load forwarding?  Is
>> that maybe optimized well on some archs?  movb should always
>> forward and wasn't the setup cost for small N reasonable on modern
>> CPUs?
>
> rep mov is a win over a loop for blocks over 128 bytes on Core, and for blocks in the range
> 24-128 on Zen.  This is w/o store/load forwarding, but I doubt those provide
> a cheap way around.
>
>>
>> It probably depends on the width of the entries in the store buffer,
>> if they appear in-order and the alignment of the stores (if they are larger 
>> than
>> 8 bytes they are surely aligned).  IIRC CPUs had smaller store buffer
>> entries than cache line size.
>>
>> Given that load bandwith is usually higher than store bandwith it
>> might make sense to do the store combining in our copying sequence,
>> like for the 8 byte entry case use sth like
>>
>>   movq 0(%eax), %xmm0
>>   movhps 8(%eax), %xmm0 // or vpinsert
>>   mov[au]ps %xmm0, 0%(ebx)
>> ...
>>
>> thus do two loads per store and perform the stores in wider
>> mode?
>
> This may be somewhat faster indeed.  I am not sure if store to load
> forwarding will work for the latter half when read again by halves.
> It would not happen on older CPUs :)

Yes, forwarding larger stores to smaller loads generally works fine
since forever with the usual restrictions of alignment/size being
power of two "halves".

The question is of course what to do for 4 byte or smaller elements or
mixed size elements.  We can do zero-extending loads
(do we have them for QI, HI mode loads as well?) and
do shift and or's.  I'm quite sure the CPUs wouldn't like to
see vpinsert's of different vector mode destinations.  So it
would be 8 byte stores from GPRs and values built up via
shift & or.

As said, the important part is that IIRC CPUs can usually
have more loads in flight than stores.  Esp. Bulldozer
with the split core was store buffer size limited (but it
could do merging of store buffer entries IIRC).

Richard.
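
For the 8-byte-element case quoted above, a rough intrinsics rendering
of the two-loads-per-store idea (illustration only; assumes x86 SSE2 and
a plain 16-byte block):

#include <emmintrin.h>
#include <stdint.h>

/* Copy 16 bytes as two 8-byte loads feeding one 16-byte store, so each
   load can still be forwarded from an earlier 8-byte store while the
   store side uses the wider mode.  */
static inline void
copy16_two_loads_one_store (void *dst, const void *src)
{
  const int64_t *s = (const int64_t *) src;
  __m128i v = _mm_set_epi64x (s[1], s[0]);	/* two 8-byte loads */
  _mm_storeu_si128 ((__m128i *) dst, v);	/* one 16-byte store */
}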

> Honza
>>
>> As said a general concern was you not copying padding.  If you
>> put this into an even more common place you surely will break
>> stuff, no?
>>
>> Richard.
>>
>> >
>> > Martin
>> >
>> >
>> >>
>> >> Richard.
>> >>
>> >> > Martin
>> >> >
>> >> >
>> >> > 2017-10-12  Martin Jambor  
>> >> >
>> >> > PR target/80689
>> >> > * tree-sra.h: New file.
>> >> > * ipa-prop.h: Moved declaration of build_ref_for_offset to
>> >> > tree-sra.h.
>> >> > * expr.c: Include params.h and tree-sra.h.
>> >> > (emit_move_elementwise): New function.
>> >> > (store_expr_with_bounds): Optionally use it.
>> >> > * ipa-cp.c: Include tree-sra.h.
>> >> > * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
>> >> > * config/i386/i386.c (ix86_option_override_internal): Set
>> >> > PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
>> >> > * tree-sra.c: Include tree-sra.h.
>> >> > (scalarizable_type_p): Renamed to
>> >> > simple_mix_of_records_and_arrays_p, made public, renamed the
>> >> > second parameter to allow_char_arrays.
>> >> > (extract_min_max_idx_from_array): New function.
>> >> > (completely_scalarize): Moved bits of the function to
>> >> > extract_min_max_idx_from_array.
>> >> >
>> >> > testsuite/
>> >> > * gcc.target/i386/pr80689-1.c: New test.
>> >> > ---
>> >> >  gcc/config/i386/i386.c|   4 ++
>> >> >  gcc/expr.c| 103 
>> >> > --
>> >> >  gcc/ipa-cp.c  |   1 +
>> >> >  gcc/ipa-prop.h|   4 --
>> >> >  gcc/params.def|   6 ++
>> >> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++
>> >> >  gcc/tree-sra.c|  86 
>> >> > +++--
>> >> >  gcc/tree-sra.h|  33 ++
>> >> >  8 files changed, 233 insertions(+), 42 deletions(-)
>> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
>> >> >  create mode 100644 gcc/tree-sra.h
>> >> >
>> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> >> > index 1ee8351c21f..87f602e7ead 100644
>> >> > --- a/gcc/config/i386/i386.c
>> >> > +++ b/gcc/config/i386/i386.c
>> >> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
>> >> >  ix86_tune_cost->l2_cache_size,
>> >> >  opts->x_param_values,
>> >> >  opts_set->x_param_values);
>> >> > +  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
>> >> > +

Re: [v3 PATCH] Deduction guides for associative containers, debug mode deduction guide fixes.

2017-10-26 Thread Jonathan Wakely

On 17/10/17 22:48 +0300, Ville Voutilainen wrote:

Tested on Linux-PPC64. The debug mode fixes have been tested manually
and individually on Linux-x64.

2017-10-17  Ville Voutilainen  

   Deduction guides for associative containers, debug mode deduction
guide fixes.
   * include/bits/stl_algobase.h (__iter_key_t)
   (__iter_val_t, __iter_to_alloc_t): New.
   * include/bits/stl_map.h: Add deduction guides.
   * include/bits/stl_multimap.h: Likewise.
   * include/bits/stl_multiset.h: Likewise.
   * include/bits/stl_set.h: Likewise.
   * include/bits/unordered_map.h: Likewise.
   * include/bits/unordered_set.h: Likewise.
   * include/debug/deque: Likewise.
   * include/debug/forward_list: Likewise.
   * include/debug/list: Likewise.
   * include/debug/map.h: Likewise.
   * include/debug/multimap.h: Likewise.
   * include/debug/multiset.h: Likewise.
   * include/debug/set.h: Likewise.
   * include/debug/unordered_map: Likewise.
   * include/debug/unordered_set: Likewise.
   * include/debug/vector: Likewise.
   * testsuite/23_containers/map/cons/deduction.cc: New.
   * testsuite/23_containers/multimap/cons/deduction.cc: Likewise.
   * testsuite/23_containers/multiset/cons/deduction.cc: Likewise.
   * testsuite/23_containers/set/cons/deduction.cc: Likewise.
   * testsuite/23_containers/unordered_map/cons/deduction.cc: Likewise.
   * testsuite/23_containers/unordered_multimap/cons/deduction.cc:
   Likewise.
   * testsuite/23_containers/unordered_multiset/cons/deduction.cc:
   Likewise.
   * testsuite/23_containers/unordered_set/cons/deduction.cc: Likewise.





--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h


This doesn't seem like the right place for these.

Maybe stl_iterator.h with forward declarations of pair and allocator
if needed?



@@ -1429,6 +1429,25 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
#endif

_GLIBCXX_END_NAMESPACE_ALGO
+
+#if __cplusplus > 201402L
+
+  template<typename _InputIterator>
+  using __iter_key_t = remove_const_t<
+typename iterator_traits<_InputIterator>::value_type::first_type>;
+
+ template<typename _InputIterator>
+   using __iter_val_t =
+   typename iterator_traits<_InputIterator>::value_type::second_type;
+
+ template<typename _InputIterator>
+   using __iter_to_alloc_t =
+   pair<add_const_t<typename iterator_traits<_InputIterator>::value_type::first_type>,
+   typename iterator_traits<_InputIterator>::value_type::second_type>;



Inconsistent indentation for these three. Please use:

 template<...>
   using ...
 ...

Would the third one be simpler as:

  template<typename _InputIterator>
    using __iter_to_alloc_t =
      pair<add_const_t<__iter_key_t<_InputIterator>>,
           __iter_val_t<_InputIterator>>



--- a/libstdc++-v3/include/bits/stl_map.h
+++ b/libstdc++-v3/include/bits/stl_map.h
@@ -1366,6 +1366,40 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 const map<_K1, _T1, _C1, _A1>&);
};

+
+#if __cpp_deduction_guides >= 201606
+
+ template

The additions to this file seem to be indented by only one space not
two.


+set(_InputIterator, _InputIterator,
+   _Compare = _Compare(), _Allocator = _Allocator())
+   -> set<typename iterator_traits<_InputIterator>::value_type,
+ _Compare, _Allocator>;


The first line seems to be indented wrong here too.

A stray newline has crept into include/debug/unordered_set:


@@ -1031,6 +1159,7 @@ namespace __debug
  const unordered_multiset<_Value, _Hash, _Pred, _Alloc>& __y)
{ return !(__x == __y); }

+
} // namespace __debug
} // namespace std


I notice there are no copyright headers in the new test. imokwiththis.jpg
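
For context, the kind of code these guides enable (an illustrative
snippet, compiled with -std=c++17; not part of the patch or its tests):

#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

int main()
{
  std::vector<std::pair<int, std::string>> v{{1, "one"}, {2, "two"}};
  // Key/mapped types are deduced via __iter_key_t / __iter_val_t.
  std::map m(v.begin(), v.end());
  static_assert(std::is_same_v<decltype(m), std::map<int, std::string>>);
}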




Re: [Diagnostic Patch] don't print column zero

2017-10-26 Thread David Malcolm
[CCing Rainer and Mike for the gcc-dg.exp part]

On Thu, 2017-10-26 at 07:33 -0400, Nathan Sidwell wrote:
> On the modules branch, I'm starting to add location
> information.  Line 
> numbers don't really make sense when reporting errors reading a
> binary 
> file, so I wanted to change the diagnostics such that line number
> zero 
> (which is not a line) is not printed -- one just gets the file
> name.  I 
> then noticed that we don't elide column zero (also, not a column
> outside 
> of emacsland).
> 
> This patch changes the diagnostics, such that line-zero prints
> neither 
> line nor column and column-zero doesn't print the column.
> 
> The testsuite presumes that all diagnostics have a column (which may
> or 
> may not be specified in the test pattern).  This patch augments it
> such 
> that a prefix of '-:' indicates 'no column'.  We still default to 
> expecting a column
> 
> The vast bulk is annotating C & C++ tests that do not have a column. 
> Some of those were explicitly checking for column-zero, but many
> just 
> expected some arbitrary column number, which happened to be
> zero.  Of 
> course many (most?) of these diagnostics could be improved to provide
> a 
> column.  Most are from the preprocessor.
> 
> While this is a change in the compiler's output, it's effectively 
> returning to a pre-column formatting for the cases where the column 
> number is not known.  I'd expect (hope?) error message parsers to be 
> robust in that case. (I've found it confusing when column-zero is 
> printed, as I think columns might be zero-based after all.)
> 
> bootstrapped on all languages.
> 
> ok?
> 
> nathan

Indeed, gcc uses 1-based columns, with 0 meaning "the whole line",
whereas Emacs uses 0-based columns (see the comment in line-map.h). 
Probably best to not print them, to avoid confusing the user.

Alternate idea: could show_column become a tri-state:
  * default: show non-zero columns
  * never: never show columns
  * always: always show a column, printing 0 for the no-column case
and then use "always" in our testsuite
?
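
A minimal sketch of what that tri-state might look like (names invented
here, not from any posted patch):

/* Hypothetical replacement for the boolean context->show_column.  */
enum diagnostic_column_policy
{
  DIAGNOSTIC_COLUMN_DEFAULT,	/* Print only non-zero columns.  */
  DIAGNOSTIC_COLUMN_NEVER,	/* Never print a column.  */
  DIAGNOSTIC_COLUMN_ALWAYS	/* Always print one, 0 when unknown.  */
};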

-fno-show-column would presumably then be a legacy alias for the
"never" value.

> Index: gcc/diagnostic.c
> ===
> --- gcc/diagnostic.c  (revision 254060)
> +++ gcc/diagnostic.c  (working copy)
> @@ -293,6 +293,24 @@ diagnostic_get_color_for_kind (diagnosti
>return diagnostic_kind_color[kind];
>  }
>  
> +/* Return a formatted line and column ':%line:%column'.  Elided if
> +   zero.  The result is a statically allocated buffer.  */

> +static const char *
> +maybe_line_and_column (int line, int col)
> +{
> +  static char result[32];
> +
> +  if (line)
> +{
> +  size_t l = sprintf (result, col ? ":%d:%d" : ":%d", line, col);

Possibly a silly question, but is it OK to have a formatted string
call in which some of the arguments aren't consumed? (here "col" is only
consumed for the true case, which consumes 2 arguments; it's not consumed
for the false case).

> +  gcc_checking_assert (l + 1 < sizeof (result));

Would snprintf be safer?

Please create a selftest for the function, covering these cases:

* line == 0
* line > 0 and col == 0
* line > 0 and col > 0 (checking output for these cases)
* line == INT_MAX and col == INT_MAX (without checking output, just to tickle 
the assert)
* line == INT_MIN and col == INT_MIN (likewise)

Alternatively, please create a selftest for diagnostic_get_location_text,
testing the cases of:
* context->show_column true and false
* N_("")
* the above line/col value combos


> +}
> +  else
> +result[0] = 0;
> +  return result;
> +}
> +
>  /* Return a malloc'd string describing a location e.g. "foo.c:42:10".
> The caller is responsible for freeing the memory.  */
>  
> @@ -303,19 +321,13 @@ diagnostic_get_location_text (diagnostic
>pretty_printer *pp = context->printer;
>const char *locus_cs = colorize_start (pp_show_color (pp), "locus");
>const char *locus_ce = colorize_stop (pp_show_color (pp));
> -
> -  if (s.file == NULL)
> -return build_message_string ("%s%s:%s", locus_cs, progname, locus_ce);
> -
> -  if (!strcmp (s.file, N_("")))
> -return build_message_string ("%s%s:%s", locus_cs, s.file, locus_ce);
> -
> -  if (context->show_column)
> -return build_message_string ("%s%s:%d:%d:%s", locus_cs, s.file, s.line,
> -  s.column, locus_ce);
> -  else
> -return build_message_string ("%s%s:%d:%s", locus_cs, s.file, s.line,
> -  locus_ce);
> +  const char *file = s.file ? s.file : progname;
> +  int line = strcmp (file, N_("")) ? s.line : 0;
> +  int col = context->show_column ? s.column : 0;
> +
> +  const char *line_col = maybe_line_and_column (line, col);
> +  return build_message_string ("%s%s%s:%s", locus_cs, file,
> +line_col, locus_ce);
>  }
>  
>  /* Return a malloc'd string describing a location and the severity of the
> @@ -577,21 +589,20 @@ 

Re: [PATCH, rs6000] Gimple folding for vec_madd()

2017-10-26 Thread Will Schmidt
On Thu, 2017-10-26 at 11:05 +0200, Richard Biener wrote:
> On Wed, Oct 25, 2017 at 4:38 PM, Will Schmidt  
> wrote:
> > Hi,
> >
> > Add support for gimple folding of the vec_madd() (vector multiply-add)
> > intrinsics.
> > Testcase coverage is provided by the existing tests
> >  gcc.target/powerpc/fold-vec-madd-*.c
> >
> > Sniff-tests appear clean.  A full regtest is currently running across 
> > assorted Power systems. (P6-P9).
> > OK for trunk (pending clean run results)?
> 
> You can use FMA_EXPR on integer operands as well.  Otherwise you risk
> the FMA be not matched by combine later when part of the operation is
> CSEd.

I had tried that initially, without success...  I'll probably need
another hint.  :-) 
Looking a bit closer, I think I see why the assert fired, but I'm not
sure what the proper fix would be.

So attempting to use FMA_EXPR on the integer operands (vector shorts in
this case), I end up triggering this error:

/home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/testsuite/gcc.target/powerpc/fold-vec-madd-short.c:14:10:
 internal compiler error: in expand_expr_real_2, at expr.c:8712
0x10813303 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
expand_modifier)
/home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8712
0x1081822f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:9787
0x1080f7bb expand_expr_real(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/home/willschm/gcc/gcc-mainline-vec_fold_misc/gcc/expr.c:8084
...


which when followed back, I tripped an assert here:  (gcc/expr.c:
expand_expr_real_2() ~ line 8710)

case FMA_EXPR:
  {
optab opt = fma_optab;
gimple *def0, *def2;
if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
  {
tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
tree call_expr;

gcc_assert (fn != NULL_TREE);

where gcc/builtins.c
mathfn_built_in()->mathfn_built_in_1->mathfn_built_in_2 looks to have
returned END_BUILTINS/NULL_TREE, due to falling through the if/else
tree:

  if (TYPE_MAIN_VARIANT (type) == double_type_node)
return fcode;
  else if (TYPE_MAIN_VARIANT (type) == float_type_node)
return fcodef;
  else if (TYPE_MAIN_VARIANT (type) == long_double_type_node)
return fcodel;
  else
return END_BUILTINS;

Looks like that is all double/float/long double contents.  A first-blush
attempt would be to add V8HI_type_node/integer_type_node to that if/else
tree, but that doesn't look like it would be nearly enough.

Thanks
-Will
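
One possible shape for such a guard, using the same optab query that
expand_expr_real_2 performs (untested sketch; assumes the optab headers
are already visible in rs6000.c):

/* Only fold to FMA_EXPR when the target has an fma pattern for this
   vector mode; otherwise keep the explicit MULT_EXPR + PLUS_EXPR
   sequence from the patch above.  */
if (optab_handler (fma_optab, TYPE_MODE (lhs_type)) != CODE_FOR_nothing)
  {
    gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2);
    gimple_set_location (g, gimple_location (stmt));
    gsi_replace (gsi, g, true);
    return true;
  }
/* ... else fall through to the MULT_EXPR/PLUS_EXPR expansion ... */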

> 
> Richard.
> 
> > Thanks,
> > -Will
> >
> > [gcc]
> >
> > 2017-10-25  Will Schmidt 
> >
> > * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add support 
> > for
> >   gimple folding of vec_madd() intrinsics.
> >
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 4837e14..04c2b15 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -16606,10 +16606,43 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> > *gsi)
> >build_int_cst (arg2_type, 0)), 
> > arg0);
> >  gimple_set_location (g, loc);
> >  gsi_replace (gsi, g, true);
> >  return true;
> >}
> > +
> > +/* vec_madd (Float) */
> > +case ALTIVEC_BUILTIN_VMADDFP:
> > +case VSX_BUILTIN_XVMADDDP:
> > +  {
> > +   arg0 = gimple_call_arg (stmt, 0);
> > +   arg1 = gimple_call_arg (stmt, 1);
> > +   tree arg2 = gimple_call_arg (stmt, 2);
> > +   lhs = gimple_call_lhs (stmt);
> > +   gimple *g = gimple_build_assign (lhs, FMA_EXPR , arg0, arg1, arg2);
> > +   gimple_set_location (g, gimple_location (stmt));
> > +   gsi_replace (gsi, g, true);
> > +   return true;
> > +  }
> > +/* vec_madd (Integral) */
> > +case ALTIVEC_BUILTIN_VMLADDUHM:
> > +  {
> > +   arg0 = gimple_call_arg (stmt, 0);
> > +   arg1 = gimple_call_arg (stmt, 1);
> > +   tree arg2 = gimple_call_arg (stmt, 2);
> > +   lhs = gimple_call_lhs (stmt);
> > +   tree lhs_type = TREE_TYPE (lhs);
> > +   location_t loc = gimple_location (stmt);
> > +   gimple_seq stmts = NULL;
> > +   tree mult_result = gimple_build (, loc, MULT_EXPR,
> > +  lhs_type, arg0, arg1);
> > +   tree plus_result = gimple_build (, loc, PLUS_EXPR,
> > +  lhs_type, mult_result, arg2);
> > +   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> > +   update_call_from_tree (gsi, plus_result);
> > +   return true;
> > +  }
> > +
> >  default:
> > if (TARGET_DEBUG_BUILTIN)
> >fprintf (stderr, "gimple builtin intrinsic not matched:%d %s 
> > %s\n",
> > fn_code, 

[Patch obvious][arm testsuite] Fixup expected location in require-pic-register-loc.c

2017-10-26 Thread James Greenhalgh

Hi,

After r254010 we now add -gcolumn-info by default, that means the tests
in gcc.target/arm/require-pic-register-loc.c need adjusting to not expect
to see column zero.

That's the obvious fix, and just extends what Jakub did in r254010 so
I've applied it as r254106.

Thanks,
James

---
2017-10-25  James Greenhalgh  

* gcc.target/arm/require-pic-register-loc.c: Use wider regex for
column information.

diff --git a/gcc/testsuite/gcc.target/arm/require-pic-register-loc.c b/gcc/testsuite/gcc.target/arm/require-pic-register-loc.c
index bd85e86..268e9e4 100644
--- a/gcc/testsuite/gcc.target/arm/require-pic-register-loc.c
+++ b/gcc/testsuite/gcc.target/arm/require-pic-register-loc.c
@@ -18,12 +18,12 @@ main (int argc)/* line 9.  */
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\.loc 1 7 0" } } */
-/* { dg-final { scan-assembler-not "\.loc 1 8 0" } } */
-/* { dg-final { scan-assembler-not "\.loc 1 9 0" } } */
+/* { dg-final { scan-assembler-not "\.loc 1 7 \[0-9\]\+" } } */
+/* { dg-final { scan-assembler-not "\.loc 1 8 \[0-9\]\+" } } */
+/* { dg-final { scan-assembler-not "\.loc 1 9 \[0-9\]\+" } } */
 
 /* The loc at the start of the prologue.  */
-/* { dg-final { scan-assembler-times "\.loc 1 10 0" 1 } } */
+/* { dg-final { scan-assembler-times "\.loc 1 10 \[0-9\]\+" 1 } } */
 
 /* The loc at the end of the prologue, with the first user line.  */
-/* { dg-final { scan-assembler-times "\.loc 1 11 0" 1 } } */
+/* { dg-final { scan-assembler-times "\.loc 1 11 \[0-9\]\+" 1 } } */


Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Michael Matz
Hi,

On Thu, 26 Oct 2017, Martin Jambor wrote:

> > 35 bytes seems to be much - what is the code-size impact?
> 
> I will find out and report on that.  I need at least 32 bytes (four
> long ints) to fix imagemagick, where the problematic structure is:

Surely the final heuristic should look at both the size and the number of 
elements of the struct in question, not only at the size.


Ciao,
Michael.
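
Something along these lines, perhaps (the function name and the cap on
the number of copies are invented for illustration; the param is the one
from Martin's patch):

/* Bound both the total size and the number of copies that element-wise
   expansion would emit.  */
static bool
elementwise_copy_profitable_p (unsigned HOST_WIDE_INT size,
			       unsigned HOST_WIDE_INT n_elements)
{
  return (size <= (unsigned HOST_WIDE_INT)
		    PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
	  && n_elements <= 8);
}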


Re: [PATCH 00/13] Removal of SDB debug info support

2017-10-26 Thread Jeff Law
On 10/26/2017 03:33 AM, Richard Biener wrote:
> On Wed, Oct 25, 2017 at 11:24 PM, Jim Wilson  wrote:
>> We have no targets that emit SDB debug info by default.  We dropped all
>> of the SVR3 Unix and embedded COFF targets a while ago.  The only
>> targets that are still able to emit SDB debug info are cygwin, mingw,
>> and msdosdjgpp.
>>
>> I tried a cygwin build with sources modified to emit SDB by default, to
>> see if the support was still usable.  I ran into multiple problems.
>>  There is no SDB support for IMPORTED_DECL which was added in 2008.  -
>> freorder-functions and -freorder-blocks-and-partition did not work and
>> had to be disabled.  I hit a cgraph assert because sdbout.c uses
>> assemble_name on types, which fails if there is a function and type
>> with the same name.  This also causes types to be added to the debug
>> info with prepended underscores which is wrong.  I then ran into a
>> problem with the i386_pe_declare_function_type call from
>> i386_pe_file_end and gave up because I didn't see an easy workaround.
>>
>> It seems clear that the SDB support is no longer usable, and probably
>> hasn't been for a while.  This support should just be removed.
>>
>> SDB is both a debug info format and an old Unix debugger.  There were
>> some references to the debugger that I left in, changing to past tense,
>> as the comments are useful history to explain why the code was written
>> the was it was.  Otherwise, I tried to eliminate all references to sdb
>> as a debug info format.
>>
>> This patch series was tested with a C only cross compiler build for all
>> modified embedded targets, a default languages build for power aix,
>> i686 cygwin, and x86_64 linux.  I also did gdb testsuite runs for
>> cygwin and linux.  There were no regressions.
>>
>> As a debug info maintainer, I can self-approve some of this stuff, but
>> it would be good to get a review from one of the other global
>> reviewers, and/or target maintainers.
> 
> You have my approval for this.  Can you add a blurb to gcc-8/changes.html,
> like "support for emitting SDB debug info has been removed" in the caveats
> section?
I didn't see anything I would consider controversial in the series.  I'd
echo Richi's comment about potentially keeping the flag as ignored.

jeff



Re: [PING][PATCH][Aarch64] Improve int<->FP conversions

2017-10-26 Thread James Greenhalgh
On Tue, Oct 24, 2017 at 10:47:32PM +0100, Michael Collison wrote:
> James,
> 
> The patch was test as required. However when I tested with the latest trunk 
> there were two test failures that need to be updated because of my patch. 
> 
> The files gcc.target/aarch64/vect-vcvt.c was failing because the 
> scan-assembler directives assumed the destination register was assumed to be 
> an integer register. With my patch the destination can be an integer or fp 
> register.
> 
> I fixed the failures and bootstrapped and tested on aarch64-linux-gnu. Okay 
> for trunk?

OK.

> --- a/gcc/testsuite/gcc.target/aarch64/vect-vcvt.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-vcvt.c
> @@ -56,13 +56,13 @@ TEST (SUFFIX, q, 32, 4, u,u,s)\
>  TEST (SUFFIX, q, 64, 2, u,u,d)   \
>  
>  BUILD_VARIANTS ( )
> -/* { dg-final { scan-assembler "fcvtzs\\tw\[0-9\]+, s\[0-9\]+" } } */
> -/* { dg-final { scan-assembler "fcvtzs\\tx\[0-9\]+, d\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "fcvtzs\\t(w|s)\[0-9\]+, s\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "fcvtzs\\t(x|d)\[0-9\]+, d\[0-9\]+" } } */
>  /* { dg-final { scan-assembler "fcvtzs\\tv\[0-9\]+\.2s, v\[0-9\]+\.2s" } } */
>  /* { dg-final { scan-assembler "fcvtzs\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s" } } */
>  /* { dg-final { scan-assembler "fcvtzs\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d" } } */
> -/* { dg-final { scan-assembler "fcvtzu\\tw\[0-9\]+, s\[0-9\]+" } } */
> -/* { dg-final { scan-assembler "fcvtzu\\tx\[0-9\]+, d\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "fcvtzu\\t(w|s)\[0-9\]+, s\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "fcvtzu\\t(x|d)\[0-9\]+, d\[0-9\]+" } } */
>  /* { dg-final { scan-assembler "fcvtzu\\tv\[0-9\]+\.2s, v\[0-9\]+\.2s" } } */
>  /* { dg-final { scan-assembler "fcvtzu\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s" } } */
>  /* { dg-final { scan-assembler "fcvtzu\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d" } } */

Personally I'd have used \[ws\] but this works too.

Reviewed by: James Greenhalgh 

James



Re: [PATCH PR79868 ][aarch64] Fix error calls in aarch64 code so they can be translated (version 2)

2017-10-26 Thread Richard Earnshaw (lists)
On 26/09/17 00:25, Steve Ellcey wrote:
> This is a new version of my patch to fix PR target/79868, where some
> error messages are impossible to translate correctly due to how the
> strings are dynamically constructed.  It also includes some format
> changes in the error messags to make the messages more consistent with
> each other and with other GCC errors.  This was worked out with help
> from Martin Sebor.  I also had to fix some tests to match the new error
> string formats.
> 
> Tested on Aarch64 with no regressions, OK to checkin?

I can't help feeling that all this logic is somewhat excessive and
changing the wording of each message to include "pragma or attribute"
would solve it equally well.  With the new context highlighting it's
trivial to tell which subcase of usage is being referred to.

R.
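
Concretely, that would collapse each pair of strings into one, along
these lines (illustrative wording only):

case AARCH64_PARSE_MISSING_ARG:
  error ("missing architecture name in %<arch%> target pragma or attribute");
  break;
case AARCH64_PARSE_INVALID_ARG:
  error ("unknown value %qs for %<arch%> target pragma or attribute", str);
  break;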

> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2017-09-25  Steve Ellcey  
> 
>   PR target/79868
>   * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse):
>   Change argument type on aarch64_process_target_attr call.
>   * config/aarch64/aarch64-protos.h (aarch64_process_target_attr):
>   Change argument type.
>   * config/aarch64/aarch64.c (aarch64_attribute_info): Change
>   field type.
>   (aarch64_handle_attr_arch): Change argument type, use boolean
>   argument to use different strings in error calls.
>   (aarch64_handle_attr_cpu): Ditto.
>   (aarch64_handle_attr_tune): Ditto.
>   (aarch64_handle_attr_isa_flags): Ditto.
>   (aarch64_process_one_target_attr): Ditto.
>   (aarch64_process_target_attr): Ditto.
>   (aarch64_option_valid_attribute_p): Change argument type on
>   aarch64_process_target_attr call.
> 
> 
> 2017-09-25  Steve Ellcey  
> 
>   PR target/79868
>   * gcc.target/aarch64/spellcheck_1.c: Update dg-error string to match
>   new format.
>   * gcc.target/aarch64/spellcheck_2.c: Ditto.
>   * gcc.target/aarch64/spellcheck_3.c: Ditto.
>   * gcc.target/aarch64/target_attr_11.c: Ditto.
>   * gcc.target/aarch64/target_attr_12.c: Ditto.
>   * gcc.target/aarch64/target_attr_17.c: Ditto.
> 
> 
> pr79868.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
> index 177e638..c9945db 100644
> --- a/gcc/config/aarch64/aarch64-c.c
> +++ b/gcc/config/aarch64/aarch64-c.c
> @@ -165,7 +165,7 @@ aarch64_pragma_target_parse (tree args, tree pop_target)
>   information that it specifies.  */
>if (args)
>  {
> -  if (!aarch64_process_target_attr (args, "pragma"))
> +  if (!aarch64_process_target_attr (args, true))
>   return false;
>  
>aarch64_override_options_internal (_options);
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index e67c2ed..4323e9e 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -445,7 +445,7 @@ bool aarch64_gen_adjusted_ldpstp (rtx *, bool, 
> scalar_mode, RTX_CODE);
>  
>  void aarch64_init_builtins (void);
>  
> -bool aarch64_process_target_attr (tree, const char*);
> +bool aarch64_process_target_attr (tree, bool);
>  void aarch64_override_options_internal (struct gcc_options *);
>  
>  rtx aarch64_expand_builtin (tree exp,
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1c14008..122ed5e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -67,6 +67,7 @@
>  #include "common/common-target.h"
>  #include "selftest.h"
>  #include "selftest-rtl.h"
> +#include "intl.h"
>  
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -9554,15 +9555,15 @@ struct aarch64_attribute_info
>const char *name;
>enum aarch64_attr_opt_type attr_type;
>bool allow_neg;
> -  bool (*handler) (const char *, const char *);
> +  bool (*handler) (const char *, bool);
>enum opt_code opt_num;
>  };
>  
>  /* Handle the ARCH_STR argument to the arch= target attribute.
> -   PRAGMA_OR_ATTR is used in potential error messages.  */
> +   IS_PRAGMA is used in potential error messages.  */
>  
>  static bool
> -aarch64_handle_attr_arch (const char *str, const char *pragma_or_attr)
> +aarch64_handle_attr_arch (const char *str, bool is_pragma)
>  {
>const struct processor *tmp_arch = NULL;
>enum aarch64_parse_opt_result parse_res
> @@ -9579,15 +9580,22 @@ aarch64_handle_attr_arch (const char *str, const char 
> *pragma_or_attr)
>switch (parse_res)
>  {
>case AARCH64_PARSE_MISSING_ARG:
> - error ("missing architecture name in 'arch' target %s", pragma_or_attr);
> + error (is_pragma
> +? G_("missing name in % pragma")
> +: G_("missing name in % attribute"));
>   break;
>case AARCH64_PARSE_INVALID_ARG:
> - error ("unknown value %qs for 'arch' target %s", str, pragma_or_attr);
> + error (is_pragma
> +

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Jan Hubicka
> I think the limit should be on the number of generated copies and not
> the overall size of the structure...  If the struct were composed of
> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
> 
> I wonder how rep; movb; interacts with store to load forwarding?  Is
> that maybe optimized well on some archs?  movb should always
> forward and wasn't the setup cost for small N reasonable on modern
> CPUs?

rep mov is a win over a loop for blocks over 128 bytes on Core, and for blocks in the range
24-128 on Zen.  This is w/o store/load forwarding, but I doubt those provide
a cheap way around.

> 
> It probably depends on the width of the entries in the store buffer,
> if they appear in-order and the alignment of the stores (if they are larger 
> than
> 8 bytes they are surely aligned).  IIRC CPUs had smaller store buffer
> entries than cache line size.
> 
> Given that load bandwith is usually higher than store bandwith it
> might make sense to do the store combining in our copying sequence,
> like for the 8 byte entry case use sth like
> 
>   movq 0(%eax), %xmm0
>   movhps 8(%eax), %xmm0 // or vpinsert
>   mov[au]ps %xmm0, 0%(ebx)
> ...
> 
> thus do two loads per store and perform the stores in wider
> mode?

This may be somewhat faster indeed.  I am not sure if store to load
forwarding will work for the latter half when read again by halves.
It would not happen on older CPUs :)

Honza
> 
> As said a general concern was you not copying padding.  If you
> put this into an even more common place you surely will break
> stuff, no?
> 
> Richard.
> 
> >
> > Martin
> >
> >
> >>
> >> Richard.
> >>
> >> > Martin
> >> >
> >> >
> >> > 2017-10-12  Martin Jambor  
> >> >
> >> > PR target/80689
> >> > * tree-sra.h: New file.
> >> > * ipa-prop.h: Moved declaration of build_ref_for_offset to
> >> > tree-sra.h.
> >> > * expr.c: Include params.h and tree-sra.h.
> >> > (emit_move_elementwise): New function.
> >> > (store_expr_with_bounds): Optionally use it.
> >> > * ipa-cp.c: Include tree-sra.h.
> >> > * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
> >> > * config/i386/i386.c (ix86_option_override_internal): Set
> >> > PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
> >> > * tree-sra.c: Include tree-sra.h.
> >> > (scalarizable_type_p): Renamed to
> >> > simple_mix_of_records_and_arrays_p, made public, renamed the
> >> > second parameter to allow_char_arrays.
> >> > (extract_min_max_idx_from_array): New function.
> >> > (completely_scalarize): Moved bits of the function to
> >> > extract_min_max_idx_from_array.
> >> >
> >> > testsuite/
> >> > * gcc.target/i386/pr80689-1.c: New test.
> >> > ---
> >> >  gcc/config/i386/i386.c|   4 ++
> >> >  gcc/expr.c| 103 
> >> > --
> >> >  gcc/ipa-cp.c  |   1 +
> >> >  gcc/ipa-prop.h|   4 --
> >> >  gcc/params.def|   6 ++
> >> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++
> >> >  gcc/tree-sra.c|  86 
> >> > +++--
> >> >  gcc/tree-sra.h|  33 ++
> >> >  8 files changed, 233 insertions(+), 42 deletions(-)
> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
> >> >  create mode 100644 gcc/tree-sra.h
> >> >
> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> >> > index 1ee8351c21f..87f602e7ead 100644
> >> > --- a/gcc/config/i386/i386.c
> >> > +++ b/gcc/config/i386/i386.c
> >> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
> >> >  ix86_tune_cost->l2_cache_size,
> >> >  opts->x_param_values,
> >> >  opts_set->x_param_values);
> >> > +  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> >> > +35,
> >> > +opts->x_param_values,
> >> > +opts_set->x_param_values);
> >> >
> >> >/* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. 
> >> >  */
> >> >if (opts->x_flag_prefetch_loop_arrays < 0
> >> > diff --git a/gcc/expr.c b/gcc/expr.c
> >> > index 134ee731c29..dff24e7f166 100644
> >> > --- a/gcc/expr.c
> >> > +++ b/gcc/expr.c
> >> > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3.  If not see
> >> >  #include "tree-chkp.h"
> >> >  #include "rtl-chkp.h"
> >> >  #include "ccmp.h"
> >> > -
> >> > +#include "params.h"
> >> > +#include "tree-sra.h"
> >> >
> >> >  /* If this is nonzero, we do not bother generating VOLATILE
> >> > around volatile memory references, and we are willing to
> >> > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
> >> >return maybe_expand_insn (code, 2, ops);
> >> >  }
> >> >

Re: [C++ PATCH] Kill IDENTIFIER_LABEL_VALUE

2017-10-26 Thread Nathan Sidwell

On 10/25/2017 05:36 PM, Nathan Sidwell wrote:
This patch removes 'label_value' from lang_identifier, shrinking it from 
72 to 64 bytes (on a 64-bit machine).   We replace this by augmenting the 
already used per-function named_labels hash table.  This is a major win, 
because labels are extremely rare and there are many identifiers.  We 
also shrink the binding structure by a pointer, as the shadowed_labels 
list goes away.


I was a little overzealous killing code, but perhaps now this is being 
a little paranoid.  It restores the UID sorting of labels when inserting 
them into the BLOCK chain.  The original comment was confusing, as it 
mentioned code generation and then debug information.  I think this just 
affects the order of debug records, but ICBW.  For any given function, 
the iteration of the hash table should be stable across versions, unless 
the hash table implementation or the IDENTIFIER_HASH_VALUE changes.  But 
may as well be safe.


I also add the N_ translation markup I forgot about yesterday when 
taking strings out of 'inform' calls.


nathan

--
Nathan Sidwell
2017-10-26  Nathan Sidwell  

	* decl.c (sort_labels): Restore function.
	(pop_labels): Sort labels
	(identify_goto): Add translation markup.

Index: decl.c
===
--- decl.c	(revision 254087)
+++ decl.c	(working copy)
@@ -372,6 +372,18 @@ check_label_used (tree label)
 }
 }
 
+/* Helper function to sort named label entries in a vector by DECL_UID.  */
+
+static int
+sort_labels (const void *a, const void *b)
+{
+  tree label1 = *(tree const *) a;
+  tree label2 = *(tree const *) b;
+
+  /* DECL_UIDs can never be equal.  */
+  return DECL_UID (label1) > DECL_UID (label2) ? -1 : +1;
+}
+
 /* At the end of a function, all labels declared within the function
go out of scope.  BLOCK is the top-level block for the
function.  */
@@ -382,6 +394,12 @@ pop_labels (tree block)
   if (!named_labels)
 return;
 
+  /* We need to add the labels to the block chain, so debug
+ information is emitted.  But, we want the order to be stable so
+ need to sort them first.  Otherwise the debug output could be
+ randomly ordered.  I guess it's mostly stable, unless the hash
+ table implementation changes.  */
+  auto_vec labels (named_labels->elements ());
   hash_table::iterator end (named_labels->end ());
   for (hash_table::iterator iter
 	 (named_labels->begin ()); iter != end; ++iter)
@@ -390,18 +408,21 @@ pop_labels (tree block)
 
   gcc_checking_assert (!ent->outer);
   if (ent->label_decl)
-	{
-	  check_label_used (ent->label_decl);
-
-	  /* Put the labels into the "variables" of the top-level block,
-	 so debugger can see them.  */
-	  DECL_CHAIN (ent->label_decl) = BLOCK_VARS (block);
-	  BLOCK_VARS (block) = ent->label_decl;
-	}
+	labels.quick_push (ent->label_decl);
   ggc_free (ent);
 }
-
   named_labels = NULL;
+  labels.qsort (sort_labels);
+
+  while (labels.length ())
+{
+  tree label = labels.pop ();
+
+  DECL_CHAIN (label) = BLOCK_VARS (block);
+  BLOCK_VARS (block) = label;
+
+  check_label_used (label);
+}
 }
 
 /* At the end of a block with local labels, restore the outer definition.  */
@@ -3066,8 +3087,8 @@ identify_goto (tree decl, location_t loc
 {
   bool complained
 = emit_diagnostic (diag_kind, loc, 0,
-		   decl ? "jump to label %qD" : "jump to case label",
-		   decl);
+		   decl ? N_("jump to label %qD")
+		   : N_("jump to case label"), decl);
   if (complained && locus)
 inform (*locus, "  from here");
   return complained;
@@ -3136,32 +3157,32 @@ check_previous_goto_1 (tree decl, cp_bin
 	{
 	case sk_try:
 	  if (!saw_eh)
-	inf = "enters try block";
+	inf = N_("enters try block");
 	  saw_eh = true;
 	  break;
 
 	case sk_catch:
 	  if (!saw_eh)
-	inf = "enters catch block";
+	inf = N_("enters catch block");
 	  saw_eh = true;
 	  break;
 
 	case sk_omp:
 	  if (!saw_omp)
-	inf = "enters OpenMP structured block";
+	inf = N_("enters OpenMP structured block");
 	  saw_omp = true;
 	  break;
 
 	case sk_transaction:
 	  if (!saw_tm)
-	inf = "enters synchronized or atomic statement";
+	inf = N_("enters synchronized or atomic statement");
 	  saw_tm = true;
 	  break;
 
 	case sk_block:
 	  if (!saw_cxif && level_for_constexpr_if (b->level_chain))
 	{
-	  inf = "enters constexpr if statement";
+	  inf = N_("enters constexpr if statement");
 	  loc = EXPR_LOCATION (b->level_chain->this_entity);
 	  saw_cxif = true;
 	}


Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>>  wrote:
>>> This patch adds a POD version of fixed_size_mode.  The only current use
>>> is for storing the __builtin_apply and __builtin_result register modes,
>>> which were made fixed_size_modes by the previous patch.
>>
>> Bah - can we update our host compiler to C++11/14 please ...?
>> (maybe requiring that build with GCC 4.8 as host compiler works,
>> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>
> That'd be great :-)  It would avoid all the poly_int_pod stuff too,
> and allow some clean-up of wide-int.h.

Can you figure out which is the oldest GCC release that supports the C++11/14 POD handling
that would be required?

Richard.

> Thanks for the reviews,
> Richard
>
>
>>
>> Ok.
>>
>> Thanks,
>> Richard.
>>
>>>
>>> 2017-10-23  Richard Sandiford  
>>> Alan Hayward  
>>> David Sherwood  
>>>
>>> gcc/
>>> * coretypes.h (fixed_size_mode): Declare.
>>> (fixed_size_mode_pod): New typedef.
>>> * builtins.h (target_builtins::x_apply_args_mode)
>>> (target_builtins::x_apply_result_mode): Change type to
>>> fixed_size_mode_pod.
>>> * builtins.c (apply_args_size, apply_result_size, result_vector)
>>> (expand_builtin_apply_args_1, expand_builtin_apply)
>>> (expand_builtin_return): Update accordingly.
>>>
>>> Index: gcc/coretypes.h
>>> ===
>>> --- gcc/coretypes.h 2017-09-11 17:10:58.656085547 +0100
>>> +++ gcc/coretypes.h 2017-10-23 11:42:57.592545063 +0100
>>> @@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
>>>  class scalar_int_mode;
>>>  class scalar_float_mode;
>>>  class complex_mode;
>>> +class fixed_size_mode;
>>>  template class opt_mode;
>>>  typedef opt_mode opt_scalar_mode;
>>>  typedef opt_mode opt_scalar_int_mode;
>>> @@ -66,6 +67,7 @@ typedef opt_mode opt_
>>>  template class pod_mode;
>>>  typedef pod_mode scalar_mode_pod;
>>>  typedef pod_mode scalar_int_mode_pod;
>>> +typedef pod_mode fixed_size_mode_pod;
>>>
>>>  /* Subclasses of rtx_def, using indentation to show the class
>>> hierarchy, along with the relevant invariant.
>>> Index: gcc/builtins.h
>>> ===
>>> --- gcc/builtins.h  2017-08-30 12:18:46.602740973 +0100
>>> +++ gcc/builtins.h  2017-10-23 11:42:57.592545063 +0100
>>> @@ -29,14 +29,14 @@ struct target_builtins {
>>>   the register is not used for calling a function.  If the machine
>>>   has register windows, this gives only the outbound registers.
>>>   INCOMING_REGNO gives the corresponding inbound register.  */
>>> -  machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>> +  fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>>
>>>/* For each register that may be used for returning values, this gives
>>>   a mode used to copy the register's value.  VOIDmode indicates the
>>>   register is not used for returning values.  If the machine has
>>>   register windows, this gives only the outbound registers.
>>>   INCOMING_REGNO gives the corresponding inbound register.  */
>>> -  machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>>> +  fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>>>  };
>>>
>>>  extern struct target_builtins default_target_builtins;
>>> Index: gcc/builtins.c
>>> ===
>>> --- gcc/builtins.c  2017-10-23 11:41:23.140260335 +0100
>>> +++ gcc/builtins.c  2017-10-23 11:42:57.592545063 +0100
>>> @@ -1358,7 +1358,6 @@ apply_args_size (void)
>>>static int size = -1;
>>>int align;
>>>unsigned int regno;
>>> -  machine_mode mode;
>>>
>>>/* The values computed by this function never change.  */
>>>if (size < 0)
>>> @@ -1374,7 +1373,7 @@ apply_args_size (void)
>>>for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> if (FUNCTION_ARG_REGNO_P (regno))
>>>   {
>>> -   mode = targetm.calls.get_raw_arg_mode (regno);
>>> +   fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
>>>
>>> gcc_assert (mode != VOIDmode);
>>>
>>> @@ -1386,7 +1385,7 @@ apply_args_size (void)
>>>   }
>>> else
>>>   {
>>> -   apply_args_mode[regno] = VOIDmode;
>>> +   apply_args_mode[regno] = as_a  (VOIDmode);
>>>   }
>>>  }
>>>return size;
>>> @@ -1400,7 +1399,6 @@ apply_result_size (void)
>>>  {
>>>static int size = -1;
>>>int align, regno;
>>> -  machine_mode mode;
>>>
>>>/* The values computed by this function never change.  */
>>>if (size < 0)
>>> @@ -1410,7 +1408,7 

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor  wrote:
> Hi,
>
> On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote:
>> On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor  wrote:
>> > Hi,
>> >
>> > I'd like to request comments to the patch below which aims to fix PR
>> > 80689, which is an instance of a store-to-load forwarding stall on
>> > x86_64 CPUs in the Image Magick benchmark, which is responsible for a
>> > slow down of up to 9% compared to gcc 6, depending on options and HW
>> > used.  (Actually, I have just seen 24% in one specific combination but
>> > for various reasons can no longer verify it today.)
>> >
>> > The revision causing the regression is 237074, which increased the
>> > size of the mode for copying aggregates "by pieces" to 128 bits,
>> > incurring big stalls when the values being copied are also still being
>> > stored in a smaller data type or if the copied values are loaded in a
>> > smaller types shortly afterwards.  Such situations happen in Image
>> > Magick even across calls, which means that any non-IPA flow-sensitive
>> > approach would not detect them.  Therefore, the patch simply changes
>> > the way we copy small BLKmode data that are simple combinations of
>> > records and arrays (meaning no unions, bit-fields but also character
>> > arrays are disallowed) and simply copies them one field and/or element
>> > at a time.
>> >
>> > "Small" in this RFC patch means up to 35 bytes on x86_64 and i386 CPUs
>> > (the structure in the benchmark has 32 bytes) but is subject to change
>> > after more benchmarking and is actually zero - meaning element copying
>> > never happens - on other architectures.  I believe that any
>> > architecture with a store buffer can benefit but it's probably better
>> > to leave it to their maintainers to find a different default value.  I
>> > am not sure this is how such HW-dependant decisions should be done and
>> > is the primary reason, why I am sending this RFC first.
>> >
>> > I have decided to implement this change at the expansion level because
>> > at that point the type information is still readily available and at
>> > the same time we can also handle various implicit copies, for example
>> > those passing a parameter.  I found I could re-use some bits and
>> > pieces of tree-SRA and so I did, creating tree-sra.h header file in
>> > the process.
>> >
>> > I am fully aware that in the final patch the new parameter, or indeed
>> > any new parameters, need to be documented.  I have skipped that
>> > intentionally now and will write the documentation if feedback here is
>> > generally good.
>> >
>> > I have bootstrapped and tested this patch on x86_64-linux, with
>> > different values of the parameter and only found problems with
>> > unreasonably high values leading to OOM.  I have done the same with a
>> > previous version of the patch which was equivalent to the limit being
>> > 64 bytes on aarch64-linux, ppc64le-linux and ia64-linux and only ran
>> > into failures of tests which assumed that structure padding was copied
>> > in aggregate copies (mostly gcc.target/aarch64/aapcs64/ stuff but also
>> > for example gcc.dg/vmx/varargs-4.c).
>> >
>> > The patch decreases the SPEC 2017 "rate" run-time of imagick by 9% and
>> > 8% at -O2 and -Ofast compilation levels respectively on one particular
>> > new AMD CPU and by 6% and 3% on one particular old Intel machine.
>> >
>> > Thanks in advance for any comments,
>>
>> I wonder if you can at the place you choose to hook this in elide any
>> copying of padding between fields.
>>
>> I'd rather have hooked such "high level" optimization in
>> expand_assignment where you can be reasonably sure you're seeing an
>> actual source-level construct.
>
> I have discussed this with Honza and we eventually decided to make the
> element-wise copy an alternative to emit_block_move (which uses the
> larger mode for moving since GCC 7) exactly so that we handle not only
> source-level assignments but also passing parameters by value and
> other situations.
>
>>
>> 35 bytes seems to be much - what is the code-size impact?
>
> I will find out and report on that.  I need at least 32 bytes (four
> long ints) to fix imagemagick, where the problematic structure is:
>
>   typedef struct _RectangleInfo
>   {
> size_t
>   width,
>   height;
>
> ssize_t
>   x,
>   y;
>   } RectangleInfo;
>
> ...so 8 longs, no padding.  Since any aggregate having between 33-35
> bytes needs to consist of smaller fields/elements, it seemed
> reasonable to also copy them element-wise.
>
> Nevertheless, I still intend to experiment with the limit, I sent out
> this RFC exactly so that I don't spend a lot of time benchmarking
> something that is eventually not deemed acceptable on principle.

I think the limit should be on the number of generated copies and not
the overall size of the structure...  If the struct were composed of
32 individual chars we wouldn't want to emit 32 

Re: [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:28 PM, Richard Sandiford
 wrote:
> This patch avoids some calculations of the form:
>
>   GET_MODE_SIZE (vector_mode) / GET_MODE_SIZE (element_mode)
>
> in simplify-rtx.c.  If we're dealing with CONST_VECTORs, it's better
> to use CONST_VECTOR_NUNITS, since that remains constant even after the
> SVE patches.  In other cases we can get the number from GET_MODE_NUNITS.

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * simplify-rtx.c (simplify_const_unary_operation): Use GET_MODE_NUNITS
> and CONST_VECTOR_NUNITS instead of computing the number of units from
> the byte sizes of the vector and element.
> (simplify_binary_operation_1): Likewise.
> (simplify_const_binary_operation): Likewise.
> (simplify_ternary_operation): Likewise.
>
> Index: gcc/simplify-rtx.c
> ===
> --- gcc/simplify-rtx.c  2017-10-23 11:47:11.277288162 +0100
> +++ gcc/simplify-rtx.c  2017-10-23 11:47:32.868935554 +0100
> @@ -1752,18 +1752,12 @@ simplify_const_unary_operation (enum rtx
> return gen_const_vec_duplicate (mode, op);
>if (GET_CODE (op) == CONST_VECTOR)
> {
> - int elt_size = GET_MODE_UNIT_SIZE (mode);
> -  unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
> - rtvec v = rtvec_alloc (n_elts);
> - unsigned int i;
> -
> - machine_mode inmode = GET_MODE (op);
> - int in_elt_size = GET_MODE_UNIT_SIZE (inmode);
> - unsigned in_n_elts = (GET_MODE_SIZE (inmode) / in_elt_size);
> -
> + unsigned int n_elts = GET_MODE_NUNITS (mode);
> + unsigned int in_n_elts = CONST_VECTOR_NUNITS (op);
>   gcc_assert (in_n_elts < n_elts);
>   gcc_assert ((n_elts % in_n_elts) == 0);
> - for (i = 0; i < n_elts; i++)
> + rtvec v = rtvec_alloc (n_elts);
> + for (unsigned i = 0; i < n_elts; i++)
> RTVEC_ELT (v, i) = CONST_VECTOR_ELT (op, i % in_n_elts);
>   return gen_rtx_CONST_VECTOR (mode, v);
> }
> @@ -3608,9 +3602,7 @@ simplify_binary_operation_1 (enum rtx_co
>   rtx op0 = XEXP (trueop0, 0);
>   rtx op1 = XEXP (trueop0, 1);
>
> - machine_mode opmode = GET_MODE (op0);
> - int elt_size = GET_MODE_UNIT_SIZE (opmode);
> - int n_elts = GET_MODE_SIZE (opmode) / elt_size;
> + int n_elts = GET_MODE_NUNITS (GET_MODE (op0));
>
>   int i = INTVAL (XVECEXP (trueop1, 0, 0));
>   int elem;
> @@ -3637,21 +3629,8 @@ simplify_binary_operation_1 (enum rtx_co
>   mode01 = GET_MODE (op01);
>
>   /* Find out number of elements of each operand.  */
> - if (VECTOR_MODE_P (mode00))
> -   {
> - elt_size = GET_MODE_UNIT_SIZE (mode00);
> - n_elts00 = GET_MODE_SIZE (mode00) / elt_size;
> -   }
> - else
> -   n_elts00 = 1;
> -
> - if (VECTOR_MODE_P (mode01))
> -   {
> - elt_size = GET_MODE_UNIT_SIZE (mode01);
> - n_elts01 = GET_MODE_SIZE (mode01) / elt_size;
> -   }
> - else
> -   n_elts01 = 1;
> + n_elts00 = GET_MODE_NUNITS (mode00);
> + n_elts01 = GET_MODE_NUNITS (mode01);
>
>   gcc_assert (n_elts == n_elts00 + n_elts01);
>
> @@ -3771,9 +3750,8 @@ simplify_binary_operation_1 (enum rtx_co
>   rtx subop1 = XEXP (trueop0, 1);
>   machine_mode mode0 = GET_MODE (subop0);
>   machine_mode mode1 = GET_MODE (subop1);
> - int li = GET_MODE_UNIT_SIZE (mode0);
> - int l0 = GET_MODE_SIZE (mode0) / li;
> - int l1 = GET_MODE_SIZE (mode1) / li;
> + int l0 = GET_MODE_NUNITS (mode0);
> + int l1 = GET_MODE_NUNITS (mode1);
>   int i0 = INTVAL (XVECEXP (trueop1, 0, 0));
>   if (i0 == 0 && !side_effects_p (op1) && mode == mode0)
> {
> @@ -3931,14 +3909,10 @@ simplify_binary_operation_1 (enum rtx_co
> || CONST_SCALAR_INT_P (trueop1)
> || CONST_DOUBLE_AS_FLOAT_P (trueop1)))
>   {
> -   int elt_size = GET_MODE_UNIT_SIZE (mode);
> -   unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
> +   unsigned n_elts = GET_MODE_NUNITS (mode);
> +   unsigned in_n_elts = GET_MODE_NUNITS (op0_mode);
> rtvec v = rtvec_alloc (n_elts);
> unsigned int i;
> -   unsigned in_n_elts = 1;
> -
> -   if (VECTOR_MODE_P (op0_mode))
> - in_n_elts = (GET_MODE_SIZE (op0_mode) / elt_size);
>   

Re: [14/nn] Add helpers for shift count modes

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
 wrote:
> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>  wrote:
>> This patch adds a stub helper routine to provide the mode
>> of a scalar shift amount, given the mode of the values
>> being shifted.
>>
>> One long-standing problem has been to decide what this mode
>> should be for arbitrary rtxes (as opposed to those directly
>> tied to a target pattern).  Is it the mode of the shifted
>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>> the corresponding target pattern says?  (In which case what
>> should the mode be when the target doesn't have a pattern?)
>>
>> For now the patch picks word_mode, which should be safe on
>> all targets but could perhaps become suboptimal if the helper
>> routine is used more often than it is in this patch.  As it
>> stands the patch does not change the generated code.
>>
>> The patch also adds a helper function that constructs rtxes
>> for constant shift amounts, again given the mode of the value
>> being shifted.  As well as helping with the SVE patches, this
>> is one step towards allowing CONST_INTs to have a real mode.
>
> I think gen_shift_amount_mode is flawed and while encapsulating
> constant shift amount RTX generation into a gen_int_shift_amount
> looks good to me I'd rather have that ??? in this function (and
> I'd use the mode of the RTX shifted, not word_mode...).
>
> In the end it's up to insn recognizing to convert the op to the
> expected mode and for generic RTL it's us that should decide
> on the mode -- on GENERIC the shift amount has to be an
> integer so why not simply use a mode that is large enough to
> make the constant fit?
>
> Just throwing in some comments here, RTL isn't my primary
> expertise.

To add a little bit - shift amounts is maybe the only(?) place
where a modeless CONST_INT makes sense!  So "fixing"
that first sounds backwards.

Richard.

> Richard.
>
>>
>> 2017-10-23  Richard Sandiford  
>> Alan Hayward  
>> David Sherwood  
>>
>> gcc/
>> * target.h (get_shift_amount_mode): New function.
>> * emit-rtl.h (gen_int_shift_amount): Declare.
>> * emit-rtl.c (gen_int_shift_amount): New function.
>> * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
>> instead of GEN_INT.
>> * calls.c (shift_return_value): Likewise.
>> * cse.c (fold_rtx): Likewise.
>> * dse.c (find_shift_sequence): Likewise.
>> * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
>> (expand_shift, expand_smod_pow2): Likewise.
>> * lower-subreg.c (shift_cost): Likewise.
>> * simplify-rtx.c (simplify_unary_operation_1): Likewise.
>> (simplify_binary_operation_1): Likewise.
>> * combine.c (try_combine, find_split_point, force_int_to_mode)
>> (simplify_shift_const_1, simplify_shift_const): Likewise.
>> (change_zero_ext): Likewise.  Use simplify_gen_binary.
>> * optabs.c (expand_superword_shift, expand_doubleword_mult)
>> (expand_unop): Use gen_int_shift_amount instead of GEN_INT.
>> (expand_binop): Likewise.  Use get_shift_amount_mode instead
>> of word_mode as the mode of a CONST_INT shift amount.
>> (shift_amt_for_vec_perm_mask): Add a machine_mode argument.
>> Use gen_int_shift_amount instead of GEN_INT.
>> (expand_vec_perm): Update caller accordingly.  Use
>> gen_int_shift_amount instead of GEN_INT.
>>
>> Index: gcc/target.h
>> ===
>> --- gcc/target.h2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/target.h2017-10-23 11:47:11.277288162 +0100
>> @@ -209,6 +209,17 @@ #define HOOKSTRUCT(FRAGMENT) FRAGMENT
>>
>>  extern struct gcc_target targetm;
>>
>> +/* Return the mode that should be used to hold a scalar shift amount
>> +   when shifting values of the given mode.  */
>> +/* ??? This could in principle be generated automatically from the .md
>> +   shift patterns, but for now word_mode should be universally OK.  */
>> +
>> +inline scalar_int_mode
>> +get_shift_amount_mode (machine_mode)
>> +{
>> +  return word_mode;
>> +}
>> +
>>  #ifdef GCC_TM_H
>>
>>  #ifndef CUMULATIVE_ARGS_MAGIC
>> Index: gcc/emit-rtl.h
>> ===
>> --- gcc/emit-rtl.h  2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/emit-rtl.h  2017-10-23 11:47:11.274393237 +0100
>> @@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
>>  extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
>>  extern void adjust_reg_mode (rtx, machine_mode);
>>  extern int mem_expr_equal_p (const_tree, const_tree);
>> +extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
>>
>>  extern bool need_atomic_barrier_p (enum memmodel, 

Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:23 PM, Richard Biener
 wrote:
> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
>  wrote:
>> Similarly to the VEC_DUPLICATE_{CST,EXPR}, this patch adds two
>> tree code equivalents of the VEC_SERIES rtx code.  VEC_SERIES_EXPR
>> is for non-constant inputs and is a normal tcc_binary.  VEC_SERIES_CST
>> is a tcc_constant.
>>
>> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
>> vectors.  This avoids the need to handle combinations of VECTOR_CST
>> and VEC_SERIES_CST.
>
> Similar to the other patch can you document and verify that VEC_SERIES_CST
> is only used on variable length vectors?
>
> Ok with that change.

Btw, did you think of merging VEC_DUPLICATE_CST with VEC_SERIES_CST
via setting step == 0?  I think you can do {1, 1, 1, 1, ...} + {1, 2, 3, 4, 5}
constant folding but you don't implement that.  Propagation can also turn
VEC_SERIES_EXPR into VEC_SERIES_CST and VEC_DUPLICATE_EXPR
into VEC_DUPLICATE_CST (didn't see the former, don't remember the latter).

Richard.
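
A hedged illustration of the folding mentioned above (this is not part of
the posted patch; build_vec_series's argument order is assumed to be
type, base, step):

/* Element i of a series is BASE + i * STEP, so a duplicate of C is just
   a series with step 0, and adding it to a series only shifts the base:
     {c, c, c, ...} + {b, b+s, b+2*s, ...} == {b+c, b+c+s, b+c+2*s, ...}
   In const_binop terms, roughly:  */
tree elt = VEC_DUPLICATE_CST_ELT (arg1);
tree base = const_binop (PLUS_EXPR, VEC_SERIES_CST_BASE (arg2), elt);
if (base)
  return build_vec_series (type, base, VEC_SERIES_CST_STEP (arg2));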

> Thanks,
> Richard.
>
>>
>> 2017-10-23  Richard Sandiford  
>> Alan Hayward  
>> David Sherwood  
>>
>> gcc/
>> * doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
>> * doc/md.texi (vec_series@var{m}): Document.
>> * tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
>> * tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
>> codes.
>> (VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
>> (build_vec_series_cst, build_vec_series): Declare.
>> * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
>> (add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
>> (build_vec_series_cst, build_vec_series): New functions.
>> * cfgexpand.c (expand_debug_expr): Handle the new codes.
>> * tree-pretty-print.c (dump_generic_node): Likewise.
>> * dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
>> * gimple-expr.h (is_gimple_constant): Likewise.
>> * gimplify.c (gimplify_expr): Likewise.
>> * graphite-scop-detection.c (scan_tree_for_params): Likewise.
>> * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
>> (func_checker::compare_operand): Likewise.
>> * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
>> * print-tree.c (print_node): Likewise.
>> * tree-ssa-loop.c (for_each_index): Likewise.
>> * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
>> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
>> (ao_ref_init_from_vn_reference): Likewise.
>> * varasm.c (const_hash_1, compare_constant): Likewise.
>> * fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
>> (fold_checksum_tree): Likewise.
>> (vec_series_equivalent_p): New function.
>> (const_binop): Use it.  Fold VEC_SERIES_EXPRs of constants.
>> * expmed.c (make_tree): Handle VEC_SERIES.
>> * gimple-pretty-print.c (dump_binary_rhs): Likewise.
>> * tree-inline.c (estimate_operator_cost): Likewise.
>> * expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
>> (expand_expr_real_2): Handle VEC_SERIES_EXPR.
>> (expand_expr_real_1): Handle VEC_SERIES_CST.
>> * optabs.def (vec_series_optab): New optab.
>> * optabs.h (expand_vec_series_expr): Declare.
>> * optabs.c (expand_vec_series_expr): New function.
>> * optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
>> * tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
>> (verify_gimple_assign_single): Handle VEC_SERIES_CST.
>> * tree-vect-generic.c (expand_vector_operations_1): Check that
>> the operands also have vector type.
>>
>> Index: gcc/doc/generic.texi
>> ===
>> --- gcc/doc/generic.texi2017-10-23 11:41:51.760448406 +0100
>> +++ gcc/doc/generic.texi2017-10-23 11:42:34.910720660 +0100
>> @@ -1037,6 +1037,7 @@ As this example indicates, the operands
>>  @tindex COMPLEX_CST
>>  @tindex VECTOR_CST
>>  @tindex VEC_DUPLICATE_CST
>> +@tindex VEC_SERIES_CST
>>  @tindex STRING_CST
>>  @findex TREE_STRING_LENGTH
>>  @findex TREE_STRING_POINTER
>> @@ -1098,6 +1099,16 @@ instead.  The scalar element value is gi
>>  @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
>>  element of a @code{VECTOR_CST}.
>>
>> +@item VEC_SERIES_CST
>> +These nodes represent a vector constant in which element @var{i}
>> +has the value @samp{@var{base} + @var{i} * @var{step}}, for some
>> +constant @var{base} and @var{step}.  The value of @var{base} is
>> +given by 

Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
 wrote:
> Similarly to the VEC_DUPLICATE_{CST,EXPR}, this patch adds two
> tree code equivalents of the VEC_SERIES rtx code.  VEC_SERIES_EXPR
> is for non-constant inputs and is a normal tcc_binary.  VEC_SERIES_CST
> is a tcc_constant.
>
> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
> vectors.  This avoids the need to handle combinations of VECTOR_CST
> and VEC_SERIES_CST.

Similar to the other patch can you document and verify that VEC_SERIES_CST
is only used on variable length vectors?

Ok with that change.

Thanks,
Richard.

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
> * doc/md.texi (vec_series@var{m}): Document.
> * tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
> * tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
> codes.
> (VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
> (build_vec_series_cst, build_vec_series): Declare.
> * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
> (add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
> (build_vec_series_cst, build_vec_series): New functions.
> * cfgexpand.c (expand_debug_expr): Handle the new codes.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
> * gimple-expr.h (is_gimple_constant): Likewise.
> * gimplify.c (gimplify_expr): Likewise.
> * graphite-scop-detection.c (scan_tree_for_params): Likewise.
> * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
> (func_checker::compare_operand): Likewise.
> * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
> * print-tree.c (print_node): Likewise.
> * tree-ssa-loop.c (for_each_index): Likewise.
> * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
> (ao_ref_init_from_vn_reference): Likewise.
> * varasm.c (const_hash_1, compare_constant): Likewise.
> * fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
> (fold_checksum_tree): Likewise.
> (vec_series_equivalent_p): New function.
> (const_binop): Use it.  Fold VEC_SERIES_EXPRs of constants.
> * expmed.c (make_tree): Handle VEC_SERIES.
> * gimple-pretty-print.c (dump_binary_rhs): Likewise.
> * tree-inline.c (estimate_operator_cost): Likewise.
> * expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
> (expand_expr_real_2): Handle VEC_SERIES_EXPR.
> (expand_expr_real_1): Handle VEC_SERIES_CST.
> * optabs.def (vec_series_optab): New optab.
> * optabs.h (expand_vec_series_expr): Declare.
> * optabs.c (expand_vec_series_expr): New function.
> * optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
> * tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
> (verify_gimple_assign_single): Handle VEC_SERIES_CST.
> * tree-vect-generic.c (expand_vector_operations_1): Check that
> the operands also have vector type.
>
> Index: gcc/doc/generic.texi
> ===
> --- gcc/doc/generic.texi2017-10-23 11:41:51.760448406 +0100
> +++ gcc/doc/generic.texi2017-10-23 11:42:34.910720660 +0100
> @@ -1037,6 +1037,7 @@ As this example indicates, the operands
>  @tindex COMPLEX_CST
>  @tindex VECTOR_CST
>  @tindex VEC_DUPLICATE_CST
> +@tindex VEC_SERIES_CST
>  @tindex STRING_CST
>  @findex TREE_STRING_LENGTH
>  @findex TREE_STRING_POINTER
> @@ -1098,6 +1099,16 @@ instead.  The scalar element value is gi
>  @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
>  element of a @code{VECTOR_CST}.
>
> +@item VEC_SERIES_CST
> +These nodes represent a vector constant in which element @var{i}
> +has the value @samp{@var{base} + @var{i} * @var{step}}, for some
> +constant @var{base} and @var{step}.  The value of @var{base} is
> +given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
> +given by @code{VEC_SERIES_CST_STEP}.
> +
> +These nodes are restricted to integral types, in order to avoid
> +specifying the rounding behavior for floating-point types.
> +
>  @item STRING_CST
>  These nodes represent string-constants.  The @code{TREE_STRING_LENGTH}
>  returns the length of the string, as an @code{int}.  The
> @@ -1702,6 +1713,7 @@ a value from @code{enum annot_expr_kind}
>  @node Vectors
>  @subsection Vectors
>  @tindex VEC_DUPLICATE_EXPR
> +@tindex VEC_SERIES_EXPR
>  @tindex 

Re: [PATCH] Fix test-suite fallout of default -Wreturn-type.

2017-10-26 Thread Martin Liška
On 10/24/2017 04:39 PM, Jason Merrill wrote:
> On 10/18/2017 08:48 AM, Martin Liška wrote:
>> This is the second patch that addresses test-suite fallout. All these tests fail 
>> because -Wreturn-type is
>> now on by default.
> 
>> +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C
>> -constexpr T g(T t) { return f(t); } // { dg-error "f.int" }
>> +constexpr T g(T t) { return f(t); } // { dg-error "f.int" "" { target c++14_only } }
> 
>> +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-neg3.C
>> -  constexpr int bar() { return a.foo(); } // { dg-error "foo" }
>> +  constexpr int bar() { return a.foo(); } // { dg-error "foo" "" { target c++14_only } }
> 
> Why are these changes needed?  They aren't "Return a value for functions with 
> non-void return type, or change type to void, or add -Wno-return-type for 
> test."
> 
> The rest of the patch is OK.
> 
> Jason

Hi.

Sorry, I forgot to describe this change. With -std=c++11 we do:

#0  massage_constexpr_body (fun=0x76955500, body=0x76813eb8) at 
../../gcc/cp/constexpr.c:708
#1  0x0087700b in explain_invalid_constexpr_fn (fun=0x76955500) at 
../../gcc/cp/constexpr.c:896
#2  0x008799dc in cxx_eval_call_expression (ctx=0x7fffd150, 
t=0x76820118, lval=false, non_constant_p=0x7fffd1cf, 
overflow_p=0x7fffd1ce) at ../../gcc/cp/constexpr.c:1558
#3  0x008843fe in cxx_eval_constant_expression (ctx=0x7fffd150, 
t=0x76820118, lval=false, non_constant_p=0x7fffd1cf, 
overflow_p=0x7fffd1ce, jump_target=0x0) at ../../gcc/cp/constexpr.c:4069

static tree
massage_constexpr_body (tree fun, tree body)
{
  if (DECL_CONSTRUCTOR_P (fun))
body = build_constexpr_constructor_member_initializers
  (DECL_CONTEXT (fun), body);
  else if (cxx_dialect < cxx14)
{
  if (TREE_CODE (body) == EH_SPEC_BLOCK)
body = EH_SPEC_STMTS (body);
  if (TREE_CODE (body) == MUST_NOT_THROW_EXPR)
body = TREE_OPERAND (body, 0);
  body = constexpr_fn_retval (body);
}
  return body;
}

and we end up with error_mark_node, so potential_constant_expression_1 bails out.
That's why we don't print the later error with -std=c++11.

What should we do with that?
Thanks,
Martin


Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-26 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>  wrote:
>> This patch adds a POD version of fixed_size_mode.  The only current use
>> is for storing the __builtin_apply and __builtin_result register modes,
>> which were made fixed_size_modes by the previous patch.
>
> Bah - can we update our host compiler to C++11/14 please ...?
> (maybe requiring that build with GCC 4.8 as host compiler works,
> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).

That'd be great :-)  It would avoid all the poly_int_pod stuff too,
and allow some clean-up of wide-int.h.

Thanks for the reviews,
Richard


>
> Ok.
>
> Thanks,
> Richard.
>
>>
>> 2017-10-23  Richard Sandiford  
>> Alan Hayward  
>> David Sherwood  
>>
>> gcc/
>> * coretypes.h (fixed_size_mode): Declare.
>> (fixed_size_mode_pod): New typedef.
>> * builtins.h (target_builtins::x_apply_args_mode)
>> (target_builtins::x_apply_result_mode): Change type to
>> fixed_size_mode_pod.
>> * builtins.c (apply_args_size, apply_result_size, result_vector)
>> (expand_builtin_apply_args_1, expand_builtin_apply)
>> (expand_builtin_return): Update accordingly.
>>
>> Index: gcc/coretypes.h
>> ===
>> --- gcc/coretypes.h 2017-09-11 17:10:58.656085547 +0100
>> +++ gcc/coretypes.h 2017-10-23 11:42:57.592545063 +0100
>> @@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
>>  class scalar_int_mode;
>>  class scalar_float_mode;
>>  class complex_mode;
>> +class fixed_size_mode;
>>  template<typename T> class opt_mode;
>>  typedef opt_mode<scalar_mode> opt_scalar_mode;
>>  typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
>> @@ -66,6 +67,7 @@ typedef opt_mode<scalar_float_mode> opt_
>>  template<typename T> class pod_mode;
>>  typedef pod_mode<scalar_mode> scalar_mode_pod;
>>  typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
>> +typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
>>
>>  /* Subclasses of rtx_def, using indentation to show the class
>> hierarchy, along with the relevant invariant.
>> Index: gcc/builtins.h
>> ===
>> --- gcc/builtins.h  2017-08-30 12:18:46.602740973 +0100
>> +++ gcc/builtins.h  2017-10-23 11:42:57.592545063 +0100
>> @@ -29,14 +29,14 @@ struct target_builtins {
>>   the register is not used for calling a function.  If the machine
>>   has register windows, this gives only the outbound registers.
>>   INCOMING_REGNO gives the corresponding inbound register.  */
>> -  machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>> +  fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>
>>/* For each register that may be used for returning values, this gives
>>   a mode used to copy the register's value.  VOIDmode indicates the
>>   register is not used for returning values.  If the machine has
>>   register windows, this gives only the outbound registers.
>>   INCOMING_REGNO gives the corresponding inbound register.  */
>> -  machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>> +  fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>>  };
>>
>>  extern struct target_builtins default_target_builtins;
>> Index: gcc/builtins.c
>> ===
>> --- gcc/builtins.c  2017-10-23 11:41:23.140260335 +0100
>> +++ gcc/builtins.c  2017-10-23 11:42:57.592545063 +0100
>> @@ -1358,7 +1358,6 @@ apply_args_size (void)
>>static int size = -1;
>>int align;
>>unsigned int regno;
>> -  machine_mode mode;
>>
>>/* The values computed by this function never change.  */
>>if (size < 0)
>> @@ -1374,7 +1373,7 @@ apply_args_size (void)
>>for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> if (FUNCTION_ARG_REGNO_P (regno))
>>   {
>> -   mode = targetm.calls.get_raw_arg_mode (regno);
>> +   fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
>>
>> gcc_assert (mode != VOIDmode);
>>
>> @@ -1386,7 +1385,7 @@ apply_args_size (void)
>>   }
>> else
>>   {
>> -   apply_args_mode[regno] = VOIDmode;
>> +   apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>>   }
>>  }
>>return size;
>> @@ -1400,7 +1399,6 @@ apply_result_size (void)
>>  {
>>static int size = -1;
>>int align, regno;
>> -  machine_mode mode;
>>
>>/* The values computed by this function never change.  */
>>if (size < 0)
>> @@ -1410,7 +1408,7 @@ apply_result_size (void)
>>for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> if (targetm.calls.function_value_regno_p (regno))
>>   {
>> -   mode = targetm.calls.get_raw_result_mode (regno);
>> +   fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
>>
>>  

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Martin Jambor
Hi,

On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote:
> On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor  wrote:
> > Hi,
> >
> > I'd like to request comments to the patch below which aims to fix PR
> > 80689, which is an instance of a store-to-load forwarding stall on
> > x86_64 CPUs in the Image Magick benchmark, which is responsible for a
> > slow down of up to 9% compared to gcc 6, depending on options and HW
> > used.  (Actually, I have just seen 24% in one specific combination but
> > for various reasons can no longer verify it today.)
> >
> > The revision causing the regression is 237074, which increased the
> > size of the mode for copying aggregates "by pieces" to 128 bits,
> > incurring big stalls when the values being copied are also still being
> > stored in a smaller data type or if the copied values are loaded in a
> > smaller types shortly afterwards.  Such situations happen in Image
> > Magick even across calls, which means that any non-IPA flow-sensitive
> > approach would not detect them.  Therefore, the patch simply changes
> > the way we copy small BLKmode data that are simple combinations of
> > records and arrays (meaning no unions, bit-fields but also character
> > arrays are disallowed) and simply copies them one field and/or element
> > at a time.
> >
> > "Small" in this RFC patch means up to 35 bytes on x86_64 and i386 CPUs
> > (the structure in the benchmark has 32 bytes) but is subject to change
> > after more benchmarking and is actually zero - meaning element copying
> > never happens - on other architectures.  I believe that any
> > architecture with a store buffer can benefit but it's probably better
> > to leave it to their maintainers to find a different default value.  I
> > am not sure this is how such HW-dependant decisions should be done and
> > is the primary reason, why I am sending this RFC first.
> >
> > I have decided to implement this change at the expansion level because
> > at that point the type information is still readily available and at
> > the same time we can also handle various implicit copies, for example
> > those passing a parameter.  I found I could re-use some bits and
> > pieces of tree-SRA and so I did, creating tree-sra.h header file in
> > the process.
> >
> > I am fully aware that in the final patch the new parameter, or indeed
> > any new parameters, need to be documented.  I have skipped that
> > intentionally now and will write the documentation if feedback here is
> > generally good.
> >
> > I have bootstrapped and tested this patch on x86_64-linux, with
> > different values of the parameter and only found problems with
> > unreasonably high values leading to OOM.  I have done the same with a
> > previous version of the patch which was equivalent to the limit being
> > 64 bytes on aarch64-linux, ppc64le-linux and ia64-linux and only ran
> > into failures of tests which assumed that structure padding was copied
> > in aggregate copies (mostly gcc.target/aarch64/aapcs64/ stuff but also
> > for example gcc.dg/vmx/varargs-4.c).
> >
> > The patch decreases the SPEC 2017 "rate" run-time of imagick by 9% and
> > 8% at -O2 and -Ofast compilation levels respectively on one particular
> > new AMD CPU and by 6% and 3% on one particular old Intel machine.
> >
> > Thanks in advance for any comments,
> 
> I wonder if you can at the place you choose to hook this in elide any
> copying of padding between fields.
> 
> I'd rather have hooked such "high level" optimization in
> expand_assignment where you can be reasonably sure you're seeing an
> actual source-level construct.

I have discussed this with Honza and we eventually decided to make the
element-wise copy an alternative to emit_block_move (which uses the
larger mode for moving since GCC 7) exactly so that we handle not only
source-level assignments but also passing parameters by value and
other situations.

> 
> 35 bytes seems to be much - what is the code-size impact?

I will find out and report on that.  I need at least 32 bytes (four
long ints) to fix imagemagick, where the problematic structure is:

  typedef struct _RectangleInfo
  {
size_t
  width,
  height;
  
ssize_t
  x,
  y;
  } RectangleInfo;

...so four 8-byte members (32 bytes in total), no padding.  Since any aggregate having between 33-35
bytes needs to consist of smaller fields/elements, it seemed
reasonable to also copy them element-wise.
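
A hedged illustration (not taken from the patch) of the kind of pattern
behind the stall, reusing the RectangleInfo definition above: the callee
fills the struct with four separate 8-byte stores, and a 32-byte aggregate
copy done with 16-byte moves then issues loads that each span two stores
which may still be sitting in the store buffer, so store-to-load
forwarding fails.

RectangleInfo
make_rect (size_t w, size_t h, ssize_t x, ssize_t y)
{
  RectangleInfo r;
  r.width = w;      /* four separate 8-byte stores ...  */
  r.height = h;
  r.x = x;
  r.y = y;
  return r;         /* ... while the caller copies the result 16 bytes at
                       a time with the current emit_block_move default.  */
}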

Nevertheless, I still intend to experiment with the limit; I sent out
this RFC exactly so that I don't spend a lot of time benchmarking
something that is eventually not deemed acceptable on principle.

> 
> IIRC the reason this may be slow isn't loading in smaller types than stored
> before by the copy - the store buffers can handle this reasonably well.  It's
> solely when previous smaller stores are
> 
>   a1) not mergeable in the store buffer
>   a2) not merged because earlier stores are already committed
> 
> and
> 
>   b) loaded afterwards as a type that would access 

Re: [22/nn] Make dse.c use offset/width instead of start/end

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:30 PM, Richard Sandiford
 wrote:
> store_info and read_info_type in dse.c represented the ranges as
> start/end, but a lot of the internal code used offset/width instead.
> Using offset/width throughout fits better with the poly_int.h
> range-checking functions.

Ok.

Richard.
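
A hedged aside on the poly_int.h remark above (known_subrange_p comes
from the same patch series; the variable names here are made up): an
offset/width pair can be handed straight to the range helpers, e.g.

  /* Is every byte of the read covered by this store?  */
  if (known_subrange_p (read_offset, read_width,
                        store_offset, store_width))
    read_is_covered = true;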

>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * dse.c (store_info, read_info_type): Replace begin and end with
> offset and width.
> (print_range): New function.
> (set_all_positions_unneeded, any_positions_needed_p)
> (check_mem_read_rtx, scan_stores, scan_reads, dse_step5): Update
> accordingly.
> (record_store): Likewise.  Optimize the case in which all positions
> are unneeded.
> (get_stored_val): Replace read_begin and read_end with read_offset
> and read_width.
> (replace_read): Update call accordingly.
>
> Index: gcc/dse.c
> ===
> --- gcc/dse.c   2017-10-23 11:47:11.273428262 +0100
> +++ gcc/dse.c   2017-10-23 11:47:48.294155952 +0100
> @@ -243,9 +243,12 @@ struct store_info
>/* Canonized MEM address for use by canon_true_dependence.  */
>rtx mem_addr;
>
> -  /* The offset of the first and byte before the last byte associated
> - with the operation.  */
> -  HOST_WIDE_INT begin, end;
> +  /* The offset of the first byte associated with the operation.  */
> +  HOST_WIDE_INT offset;
> +
> +  /* The number of bytes covered by the operation.  This is always exact
> + and known (rather than -1).  */
> +  HOST_WIDE_INT width;
>
>union
>  {
> @@ -261,7 +264,7 @@ struct store_info
>   bitmap bmap;
>
>   /* Number of set bits (i.e. unneeded bytes) in BITMAP.  If it is
> -equal to END - BEGIN, the whole store is unused.  */
> +equal to WIDTH, the whole store is unused.  */
>   int count;
> } large;
>  } positions_needed;
> @@ -304,10 +307,11 @@ struct read_info_type
>/* The id of the mem group of the base address.  */
>int group_id;
>
> -  /* The offset of the first and byte after the last byte associated
> - with the operation.  If begin == end == 0, the read did not have
> - a constant offset.  */
> -  int begin, end;
> +  /* The offset of the first byte associated with the operation.  */
> +  HOST_WIDE_INT offset;
> +
> +  /* The number of bytes covered by the operation, or -1 if not known.  */
> +  HOST_WIDE_INT width;
>
>/* The mem being read.  */
>rtx mem;
> @@ -586,6 +590,18 @@ static deferred_change *deferred_change_
>
>  /* The number of bits used in the global bitmaps.  */
>  static unsigned int current_position;
> +
> +/* Print offset range [OFFSET, OFFSET + WIDTH) to FILE.  */
> +
> +static void
> +print_range (FILE *file, poly_int64 offset, poly_int64 width)
> +{
> +  fprintf (file, "[");
> +  print_dec (offset, file, SIGNED);
> +  fprintf (file, "..");
> +  print_dec (offset + width, file, SIGNED);
> +  fprintf (file, ")");
> +}
>
>  
> /*
> Zeroth step.
> @@ -1212,10 +1228,9 @@ set_all_positions_unneeded (store_info *
>  {
>if (__builtin_expect (s_info->is_large, false))
>  {
> -  int pos, end = s_info->end - s_info->begin;
> -  for (pos = 0; pos < end; pos++)
> -   bitmap_set_bit (s_info->positions_needed.large.bmap, pos);
> -  s_info->positions_needed.large.count = end;
> +  bitmap_set_range (s_info->positions_needed.large.bmap,
> +   0, s_info->width);
> +  s_info->positions_needed.large.count = s_info->width;
>  }
>else
>  s_info->positions_needed.small_bitmask = HOST_WIDE_INT_0U;
> @@ -1227,8 +1242,7 @@ set_all_positions_unneeded (store_info *
>  any_positions_needed_p (store_info *s_info)
>  {
>if (__builtin_expect (s_info->is_large, false))
> -return (s_info->positions_needed.large.count
> -   < s_info->end - s_info->begin);
> +return s_info->positions_needed.large.count < s_info->width;
>else
>  return (s_info->positions_needed.small_bitmask != HOST_WIDE_INT_0U);
>  }
> @@ -1355,8 +1369,12 @@ record_store (rtx body, bb_info_t bb_inf
>set_usage_bits (group, offset, width, expr);
>
>if (dump_file && (dump_flags & TDF_DETAILS))
> -   fprintf (dump_file, " processing const base store gid=%d[%d..%d)\n",
> -group_id, (int)offset, (int)(offset+width));
> +   {
> + fprintf (dump_file, " processing const base store gid=%d",
> +  group_id);
> + print_range (dump_file, offset, width);
> + fprintf (dump_file, "\n");
> +   }
>  }
>else
>  {
> @@ -1368,8 +1386,11 @@ record_store (rtx body, bb_info_t bb_inf
>

Re: [21/nn] Minor vn_reference_lookup_3 tweak

2017-10-26 Thread Richard Biener
On Mon, Oct 23, 2017 at 1:30 PM, Richard Sandiford
 wrote:
> The repeated checks for MEM_REF made this code hard to convert to
> poly_ints as-is.  Hopefully the new structure also makes it clearer
> at a glance what the two cases are.
>
>
> 2017-10-23  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * tree-ssa-sccvn.c (vn_reference_lookup_3): Avoid repeated
> checks for MEM_REF.
>
> Index: gcc/tree-ssa-sccvn.c
> ===
> --- gcc/tree-ssa-sccvn.c2017-10-23 11:47:03.852769480 +0100
> +++ gcc/tree-ssa-sccvn.c2017-10-23 11:47:44.596155858 +0100
> @@ -2234,6 +2234,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>   || offset % BITS_PER_UNIT != 0
>   || ref->size % BITS_PER_UNIT != 0)
> return (void *)-1;
> +  at = offset / BITS_PER_UNIT;

can you move this just

>/* Extract a pointer base and an offset for the destination.  */
>lhs = gimple_call_arg (def_stmt, 0);
> @@ -2301,19 +2302,18 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>copy_size = tree_to_uhwi (gimple_call_arg (def_stmt, 2));
>
>/* The bases of the destination and the references have to agree.  */

here? Ok with that change.

Richard.
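
For reference, this is roughly how the code reads with the assignment
moved down as requested (condensed from the quoted diff, surrounding
lines omitted):

  copy_size = tree_to_uhwi (gimple_call_arg (def_stmt, 2));
  at = offset / BITS_PER_UNIT;

  /* The bases of the destination and the references have to agree.  */
  if (TREE_CODE (base) == MEM_REF)
    {
      if (TREE_OPERAND (base, 0) != lhs
          || !tree_fits_uhwi_p (TREE_OPERAND (base, 1)))
        return (void *) -1;
      at += tree_to_uhwi (TREE_OPERAND (base, 1));
    }
  else if (!DECL_P (base)
           || TREE_CODE (lhs) != ADDR_EXPR
           || TREE_OPERAND (lhs, 0) != base)
    return (void *) -1;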

> -  if ((TREE_CODE (base) != MEM_REF
> -  && !DECL_P (base))
> - || (TREE_CODE (base) == MEM_REF
> - && (TREE_OPERAND (base, 0) != lhs
> - || !tree_fits_uhwi_p (TREE_OPERAND (base, 1
> - || (DECL_P (base)
> - && (TREE_CODE (lhs) != ADDR_EXPR
> - || TREE_OPERAND (lhs, 0) != base)))
> +  if (TREE_CODE (base) == MEM_REF)
> +   {
> + if (TREE_OPERAND (base, 0) != lhs
> + || !tree_fits_uhwi_p (TREE_OPERAND (base, 1)))
> +   return (void *) -1;
> + at += tree_to_uhwi (TREE_OPERAND (base, 1));
> +   }
> +  else if (!DECL_P (base)
> +  || TREE_CODE (lhs) != ADDR_EXPR
> +  || TREE_OPERAND (lhs, 0) != base)
> return (void *)-1;
>
> -  at = offset / BITS_PER_UNIT;
> -  if (TREE_CODE (base) == MEM_REF)
> -   at += tree_to_uhwi (TREE_OPERAND (base, 1));
>/* If the access is completely outside of the memcpy destination
>  area there is no aliasing.  */
>if (lhs_offset >= at + maxsize / BITS_PER_UNIT

