Re: [PATCH 0/2] Final cleanup in move to ISL

2015-08-26 Thread Tobias Grosser

On 08/27/2015 12:14 AM, Sebastian Pop wrote:

Hi,

Richi suggested at the Cauldron that it would be good to make graphite more
automatic, with fewer flags.  The first patch removes the -floop-unroll-and-jam
pass, which does not seem very stable or useful for now.  The second patch
removes the other -floop-* flags that were part of the old graphite middle-end;
these were the first transforms implemented on the polyhedral representation
(matrices, etc.) back when we had no ISL scheduler.  The transition to ISL,
which removed GCC's dependence on PPL and CLooG, did not remove all of
graphite's middle-end loop transforms.  We can now remove that code, as it is
replaced by ISL's scheduler.

The patches pass "make check" and bootstrap (in progress) with 
-fgraphite-identity.
Ok to commit?


From the graphite side, this is right.  One thing I am not sure about is
whether we need to keep these flags as 'do-nothing' flags to meet certain
backward-compatibility guarantees in GCC.

Best,
Tobias


RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-08-26 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Thursday, August 20, 2015 9:19 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 08/20/2015 09:38 AM, Ajit Kumar Agarwal wrote:

>>
>> Bootstrapping with i386 and Microblaze target works fine. No 
>> regression is seen in Deja GNU tests for Microblaze. There are lesser 
>> failures. Mibench/EEMBC benchmarks were run for Microblaze target and 
>> the gain of 9.3% is seen in rgbcmy_lite the EEMBC benchmarks.
>>> What do you mean by there are "lesser failures"?  Are you saying there are
>>> cases where path splitting generates incorrect code, or cases where path
>>> splitting produces code that is less efficient, or something else?
>
> I meant that more DejaGnu test cases pass with the path splitting
> changes.
>>Ah, in that case, that's definitely good news!

Thanks.  Consider the following test case, testsuite/gcc.dg/tree-ssa/ifc-5.c:

void
dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
{
  int i, level, qmul, qadd;

  qadd = (qscale - 1) | 1;
  qmul = qscale << 1;

  for (i = 0; i <= nCoeffs; i++)
    {
      level = block[i];
      if (level < 0)
        level = level * qmul - qadd;
      else
        level = level * qmul + qadd;
      block[i] = level;
    }
}

The above loop is a candidate for path splitting: the IF block merges at the
latch of the loop, and path splitting duplicates the latch statement,
block[i] = level, into its THEN and ELSE predecessors.

Because of this path splitting, if-conversion no longer triggers: the above
IF-THEN-ELSE is not if-converted and the test case fails.
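For illustration, here is a hedged sketch (hand-written source, not GCC's actual dump; the `_split` suffix is made up for this example) of what the loop looks like after the latch is duplicated into both arms:

```c
/* Sketch of the loop after path splitting: the latch statement
   block[i] = level is duplicated into the THEN and ELSE arms, so each
   arm branches straight back to the loop header.  */
void
dct_unquantize_h263_inter_split (short *block, int n, int qscale, int nCoeffs)
{
  int i, level, qmul, qadd;

  qadd = (qscale - 1) | 1;
  qmul = qscale << 1;

  for (i = 0; i <= nCoeffs; i++)
    {
      level = block[i];
      if (level < 0)
        {
          level = level * qmul - qadd;
          block[i] = level;   /* duplicated latch */
        }
      else
        {
          level = level * qmul + qadd;
          block[i] = level;   /* duplicated latch */
        }
    }
}
```

The duplicated store is exactly what defeats the if-converter: the IF-THEN-ELSE no longer joins into the single latch block the pass expects to see.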

The patch drew the following review comments:

> +/* This function performs the feasibility tests for path splitting
> +   to perform. Return false if the feasibility for path splitting
> +   is not done and returns true if the feasibility for path splitting
> +   is done. Following feasibility tests are performed.
> +
> +   1. Return false if the join block has rhs casting for assign
> +  gimple statements.

Comments from Jeff:

>>These seem totally arbitrary.  What's the reason behind each of these 
>>restrictions?  None should be a correctness requirement AFAICT.  

In the patch as posted, the check from point 1 is applied to the loop latch:
path splitting is disabled, if-conversion happens, and the test case passes.

When I incorporate the review comment and drop the feasibility check from
point 1, the test case above goes through path splitting; because of that,
if-conversion does not happen, and the test case fails (it expects the
pattern "Applying if conversion" to be present).  With the posted patch, the
feasibility check for a cast assignment in the loop latch (point 1) disables
path splitting, if-conversion happens, and the test case passes.

Please let me know whether to keep the feasibility check from point 1, or
suggest better changes for this path-splitting vs. if-conversion scenario.

Thanks & Regards
Ajit


jeff



[PATCH] fix --with-cpu for sh targets

2015-08-26 Thread Rich Felker
A missing * in the pattern for sh targets prevents the --with-cpu
configure option from being accepted for certain targets (e.g. ones
with explicit endianness, like sh2eb).
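To see why the extra `*` matters, compare the two case patterns against a target triplet with explicit endianness (a sketch using the same shell globbing that config.gcc relies on; the variable names are illustrative only):

```shell
# config.gcc selects per-target settings with shell 'case' globs.
# Without the '*' after the character class, sh2eb never matches:
# the class consumes only one character ('2'), leaving 'eb' before the '-'.
target=sh2eb-unknown-linux-gnu

case "$target" in
  sh[123456ble]-*-* | sh-*-*) old=match ;;
  *) old=nomatch ;;
esac

case "$target" in
  sh[123456ble]*-*-* | sh-*-*) new=match ;;
  *) new=nomatch ;;
esac

echo "old pattern: $old"   # nomatch
echo "new pattern: $new"   # match
```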

The latest config.sub should also be pulled from upstream since it has
a fix for related issues.

Rich
--- gcc-5.2.0.orig/gcc/config.gcc
+++ gcc-5.2.0/gcc/config.gcc
@@ -4096,7 +4099,7 @@
esac
;;
 
-   sh[123456ble]-*-* | sh-*-*)
+   sh[123456ble]*-*-* | sh-*-*)
supported_defaults="cpu"
case "`echo $with_cpu | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ_ 
abcdefghijklmnopqrstuvwxyz- | sed s/sh/m/`" in
"" | m1 | m2 | m2e | m3 | m3e | m4 | m4-single | m4-single-only 
| m4-nofpu )


Re: [PATCH] Fix and simplify (Re: Fix libbacktrace -fPIC breakage from "Use libbacktrace in libgfortran")

2015-08-26 Thread Ian Lance Taylor
"Ulrich Weigand"  writes:

> I've verified that this works on x86_64: the resulting
> libgfortran.so uses the -fPIC version of the libbacktrace
> object, while libgfortran.a uses the non-PIC versions.
>
> On SPU, libtool will now automatically only generate the
> non-PIC versions since the target does not support shared
> library.  So everything works as expected.
>
> OK for mainline?

Can you verify that libgo works as expected?

Ian


[c++-delayed-folding] fold_simple

2015-08-26 Thread Jason Merrill
Why does fold_simple fold so many patterns?  I thought we wanted 
something that would just fold conversions and negations of constant values.


Jason



Re: C++ delayed folding branch review

2015-08-26 Thread Jason Merrill

On 08/24/2015 03:15 AM, Kai Tietz wrote:

2015-08-03 17:39 GMT+02:00 Jason Merrill :

On 08/03/2015 05:42 AM, Kai Tietz wrote:

2015-08-03 5:49 GMT+02:00 Jason Merrill :

On 07/31/2015 05:54 PM, Kai Tietz wrote:


I could remove the STRIP_NOPS requirement in
'reduced_constant_expression_p', except for one case in constexpr.  Without
folding we don't do type-sinking/raising.


Right.


So binary/unary operations might contain casts, which were unexpected in the
past.


Why aren't the casts folded away?


On such cast constructs, as for this vector-sample, we can't fold away


Which testcase is this?


It is the g++.dg/ext/vector20.C testcase.  IIRC I mentioned this testcase
earlier as a reference, but I might be wrong here.


I don't see any casts in that testcase.  So the compiler is introducing
conversions back and forth between const and non-const, then?  I suppose it
doesn't so much matter where they come from; they should be folded away
regardless.



the cast chain.  The difference here from the non-delayed-folding branch is
that the cast isn't moved out of the plus-expr.  What we see now is
(plus ((vec) (const vector ...) {  }), ...).  Before we had (vec)
(plus (const vector ...) { ... }).


How could a PLUS_EXPR be considered a reduced constant, regardless of where
the cast is?


Of course it is only possible to sink a cast out of a PLUS_EXPR in very few
circumstances (e.g. on constants, if both types differ only in the const
attribute and the conversion is not a view-convert).


I don't understand how this is an answer to my question.


In verify_constant we use reduced_constant_expression_p to check whether a
value is a constant.  We don't handle the fact that NOP_EXPRs are something
we want to look through here, as they don't change whether the value is a
constant or not.


NOPs around constants should have been folded away by the time we get
there.


Not in this case, as we actually have a switch from const to non-const here.
So there is an attribute change, which we can't ignore in general.


I wasn't suggesting we ignore it; we should be able to change the type of
the vector_cst.


Well, we can change the type of the vector_cst, but this wouldn't help
AFAICS, as there is still one cast surviving within the PLUS_EXPR for the
other operand.


Isn't the other operand also constant?  In constexpr evaluation, either 
we're dealing with a bunch of constants, in which case we should be 
folding things fully, including conversions between const and non-const, 
or we don't care.



So the way to solve it would be to move such conversions out of the
expression.  For integer scalars we do this, and for some floating-point
types too.  So it might be something we don't handle for operations with
vector type.


We don't need to worry about that in constexpr evaluation, since we only 
care about constant operands.



But I agree that for constexprs we could special-case casts from const to
non-const (as required in expressions like const vec v = v + 1).


Right.  But really this should happen in convert.c, it shouldn't be specific
to C++.


Hmm, maybe.  But isn't one of our goals to move such implicit code
modification to match.pd instead?


Folding const into a constant is hardly code modification.  But perhaps 
it should go into fold_unary_loc:VIEW_CONVERT_EXPR rather than into 
convert.c.
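The expression shape under discussion can be reproduced with GCC's vector extension (a plain-C sketch, not the constexpr testcase itself; the function name is made up for illustration):

```c
typedef int v4 __attribute__ ((vector_size (4 * sizeof (int))));

/* 'v' is const-qualified, so using it in the addition introduces a
   (vec)(const v4){...} conversion of the kind discussed above -- a cast
   the folder is expected to strip, since it only drops a qualifier.  */
v4
add_one (void)
{
  const v4 v = { 1, 2, 3, 4 };
  return v + 1;
}
```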


Jason



[Patch, libstdc++/67362] Fix non-special character for POSIX basic syntax in regex

2015-08-26 Thread Tim Shen
Bootstrapped and tested on x86_64-pc-linux-gnu.

Thanks!


-- 
Regards,
Tim Shen
commit e134e1a835ad15900686351cade36774593b91ea
Author: Tim Shen 
Date:   Wed Aug 26 17:51:29 2015 -0700

PR libstdc++/67362
* include/bits/regex_scanner.tcc (_Scanner<>::_M_scan_normal):
Always return an ordinary char token if the char isn't
considered a special char.
* testsuite/28_regex/regression.cc: New test file for collecting
regression testcases from, typically, bugzilla.
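As a hedged illustration of the intended behavior (assuming a libstdc++ that includes this fix; the helper name below is made up): in POSIX basic syntax, grouping is written \( \), so bare parentheses are ordinary characters and a pattern like "((.)" must be accepted literally rather than rejected.

```cpp
#include <regex>
#include <string>

// In BRE, '(' and ')' are not special unless escaped, so "((.)" is a
// valid literal pattern: '(', '(', any character, ')'.  Before the fix
// this constructor threw std::regex_error.
bool matches_literal_parens(const std::string& s)
{
  static const std::regex re("((.)", std::regex_constants::basic);
  return std::regex_match(s, re);
}
```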

diff --git a/libstdc++-v3/include/bits/regex_scanner.tcc 
b/libstdc++-v3/include/bits/regex_scanner.tcc
index 3bcbd0f..1555669 100644
--- a/libstdc++-v3/include/bits/regex_scanner.tcc
+++ b/libstdc++-v3/include/bits/regex_scanner.tcc
@@ -99,6 +99,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   auto __c = *_M_current++;
   const char* __pos;
 
+  if (std::strchr(_M_spec_char, _M_ctype.narrow(__c, '\0')) == nullptr)
+   {
+ _M_token = _S_token_ord_char;
+ _M_value.assign(1, __c);
+ return;
+   }
   if (__c == '\\')
{
  if (_M_current == _M_end)
diff --git a/libstdc++-v3/testsuite/28_regex/regression.cc 
b/libstdc++-v3/testsuite/28_regex/regression.cc
new file mode 100644
index 000..71d82d5
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/regression.cc
@@ -0,0 +1,42 @@
+// { dg-options "-std=gnu++11" }
+
+//
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+
+using namespace __gnu_test;
+using namespace std;
+
+// PR libstdc++/67362
+void
+test01()
+{
+  bool test __attribute__((unused)) = true;
+
+  regex re("((.)", regex_constants::basic);
+}
+
+int
+main()
+{
+  test01();
+  return 0;
+}
+


Re: [Patch, libstdc++/67362] Fix non-special character for POSIX basic syntax in regex

2015-08-26 Thread Tim Shen
On Wed, Aug 26, 2015 at 6:41 PM, Tim Shen  wrote:
> Bootstrapped and tested on x86_64-pc-linux-gnu.

Also plan to backport to 4.9 and 5.


-- 
Regards,
Tim Shen


[gomp4, wip] remove references to ganglocal shared memory inside gcc

2015-08-26 Thread Cesar Philippidis
This patch strips out all of the references to ganglocal memory in GCC.
Unfortunately, the runtime API still takes a shared-memory parameter, so
I haven't made any changes there yet. Perhaps we could still keep the
shared-memory argument to GOACC_parallel but remove all of the support
for ganglocal mappings. Then again, maybe we still need to support
ganglocal mappings for legacy purposes.

With the ganglocal mapping aside, I'm in favor of leaving the shared
memory argument to GOACC_parallel, just in case we find another use for
shared memory in the future.

Nathan, what do you want to do here?

Cesar
2015-08-26  Cesar Philippidis  

	gcc/
	* builtins.c (expand_oacc_ganglocal_ptr): Delete.
	(expand_builtin): Remove stale GOACC_GET_GANGLOCAL_PTR builtin.
	* config/nvptx/nvptx.md (ganglocal_ptr): Delete.
	* gimple.h (struct gimple_statement_omp_parallel_layout): Remove
	ganglocal_size member.
	(gimple_omp_target_ganglocal_size): Delete.
	(gimple_omp_target_set_ganglocal_size): Delete.
	* omp-builtins.def (BUILT_IN_GOACC_GET_GANGLOCAL_PTR): Delete.
	* omp-low.c (struct omp_context): Remove ganglocal_init, ganglocal_ptr,
	ganglocal_size, ganglocal_size_host, worker_var, worker_count and
	worker_sync_elt.
	(alloc_var_ganglocal): Delete.
	(install_var_ganglocal): Delete.
	(new_omp_context): Don't use ganglocal memory.
	(expand_omp_target): Likewise.
	(lower_omp_taskreg): Likewise.
	(lower_omp_target): Likewise.
	* tree-parloops.c (create_parallel_loop): Likewise.
	* tree-pretty-print.c (dump_omp_clause): Remove support for
	GOMP_MAP_FORCE_TO_GANGLOCAL

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 7c3ead1..f465716 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5913,25 +5913,6 @@ expand_builtin_acc_on_device (tree exp, rtx target)
   return target;
 }
 
-static rtx
-expand_oacc_ganglocal_ptr (rtx target ATTRIBUTE_UNUSED)
-{
-#ifdef HAVE_ganglocal_ptr
-  enum insn_code icode;
-  icode = CODE_FOR_ganglocal_ptr;
-  rtx tmp = target;
-  if (!REG_P (tmp) || GET_MODE (tmp) != Pmode)
-tmp = gen_reg_rtx (Pmode);
-  rtx insn = GEN_FCN (icode) (tmp);
-  if (insn != NULL_RTX)
-{
-  emit_insn (insn);
-  return tmp;
-}
-#endif
-  return NULL_RTX;
-}
-
 /* Expand an expression EXP that calls a built-in function,
with result going to TARGET if that's convenient
(and in mode MODE if that's convenient).
@@ -7074,12 +7055,6 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 	return target;
   break;
 
-case BUILT_IN_GOACC_GET_GANGLOCAL_PTR:
-  target = expand_oacc_ganglocal_ptr (target);
-  if (target)
-	return target;
-  break;
-
 default:	/* just do library call, if unknown builtin */
   break;
 }
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 3d734a8..d0d6564 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1485,23 +1485,6 @@
   ""
   "%.\\tst.shared%u1\\t%1,%0;")
 
-(define_insn "ganglocal_ptr"
-  [(set (match_operand:P 0 "nvptx_register_operand" "")
-	(unspec:P [(const_int 0)] UNSPEC_SHARED_DATA))]
-  ""
-  "%.\\tcvta.shared%t0\\t%0, sdata;")
-
-(define_expand "ganglocal_ptr"
-  [(match_operand 0 "nvptx_register_operand" "")]
-  ""
-{
-  if (Pmode == DImode)
-emit_insn (gen_ganglocal_ptrdi (operands[0]));
-  else
-emit_insn (gen_ganglocal_ptrsi (operands[0]));
-  DONE;
-})
-
 ;; Atomic insns.
 
 (define_expand "atomic_compare_and_swap"
diff --git a/gcc/gimple.h b/gcc/gimple.h
index d8d8742..278b49f 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -580,10 +580,6 @@ struct GTY((tag("GSS_OMP_PARALLEL_LAYOUT")))
   /* [ WORD 10 ]
  Shared data argument.  */
   tree data_arg;
-
-  /* [ WORD 11 ]
- Size of the gang-local memory to allocate.  */
-  tree ganglocal_size;
 };
 
 /* GIMPLE_OMP_PARALLEL or GIMPLE_TASK */
@@ -5232,25 +5228,6 @@ gimple_omp_target_set_data_arg (gomp_target *omp_target_stmt,
 }
 
 
-/* Return the size of gang-local data associated with OMP_TARGET GS.  */
-
-static inline tree
-gimple_omp_target_ganglocal_size (const gomp_target *omp_target_stmt)
-{
-  return omp_target_stmt->ganglocal_size;
-}
-
-
-/* Set SIZE to be the size of gang-local memory associated with OMP_TARGET
-   GS.  */
-
-static inline void
-gimple_omp_target_set_ganglocal_size (gomp_target *omp_target_stmt, tree size)
-{
-  omp_target_stmt->ganglocal_size = size;
-}
-
-
 /* Return the clauses associated with OMP_TEAMS GS.  */
 
 static inline tree
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 0d9f386..615c4e0 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -58,8 +58,6 @@ DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_UPDATE, "GOACC_update",
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, "GOACC_wait",
 		   BT_FN_VOID_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_GANGLOCAL_PTR, "GOACC_get_ganglocal_ptr",
-		   BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DEVICEPTR, "GOACC_deviceptr",
 		   BT_FN_PTR_PTR, ATTR_

[gomp4] initialize worker reduction locks

2015-08-26 Thread Cesar Philippidis
This patch teaches omplower to emit calls to IFN_GOACC_LOCK_INIT so that
the worker mutex has a proper initial value. On nvptx targets, shared
memory isn't initialized (and that's where the lock is located for OpenACC
workers), so this makes the initialization explicit. Nathan added the
internal function used in the patch a couple of days ago.
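The underlying issue is general: a lock placed in storage that carries no static initializer must be given a known value by an explicit call before first use. A hedged host-side analogy (C11 atomics, not the nvptx shared-memory code itself; names are illustrative):

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Analogy only: like nvptx shared memory, heap storage has indeterminate
   contents, so the lock must be initialized by an explicit call -- the
   role IFN_GOACC_LOCK_INIT plays in the patch.  */
typedef struct { atomic_flag taken; } lock_t;

lock_t *
make_lock (void)
{
  lock_t *l = malloc (sizeof *l);
  if (l)
    atomic_flag_clear (&l->taken);   /* explicit init: lock starts free */
  return l;
}

int
try_acquire (lock_t *l)
{
  return !atomic_flag_test_and_set (&l->taken);
}

void
release (lock_t *l)
{
  atomic_flag_clear (&l->taken);
}
```

Skipping the `atomic_flag_clear` in `make_lock` would leave the flag's state indeterminate, which is exactly the situation the patch avoids for the worker mutex.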

I've applied this patch to gomp-4_0-branch.

Cesar
2015-08-26  Cesar Philippidis  

	gcc/
	* omp-low.c (lower_oacc_reductions): Call GOACC_LOCK_INIT
	to initialize the gang and worker mutex.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 955a098..ee92141 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4795,10 +4795,20 @@ lower_oacc_reductions (enum internal_fn ifn, int loop_dim, tree clauses,
   if (ctx->reductions == 0)
 return;
 
+  dim = build_int_cst (integer_type_node, loop_dim);
+
+  /* Call GOACC_LOCK_INIT.  */
+  if (ifn == IFN_GOACC_REDUCTION_SETUP)
+{
+  call = build_call_expr_internal_loc (UNKNOWN_LOCATION,
+	   IFN_GOACC_LOCK_INIT,
+	   void_type_node, 2, dim, lid);
+  gimplify_and_add (call, ilist);
+}
+
   /* Call GOACC_LOCK.  */
   if (ifn == IFN_GOACC_REDUCTION_FINI && write_back)
 {
-  dim = build_int_cst (integer_type_node, loop_dim);
   call = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOACC_LOCK,
 	   void_type_node, 2, dim, lid);
   gimplify_and_add (call, ilist);


Re: [gomp4] lowering OpenACC reductions

2015-08-26 Thread Cesar Philippidis
On 08/21/2015 02:00 PM, Cesar Philippidis wrote:

> This patch teaches omplower how to utilize the new OpenACC reduction
> framework described in Nathan's document, which was posted here
> . Here is the
> infrastructure patch
> , and here's
> the nvptx backend changes
> . The updated
> reduction tests have been posted here
> .

All of these patches have been committed to gomp-4_0-branch.

Cesar


Go patch committed: Don't crash on invalid builtin calls

2015-08-26 Thread Ian Lance Taylor
This patch by Chris Manghane fixes the Go compiler to not crash when
it sees invalid builtin calls.  This fixes
https://golang.org/issue/11544 .  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 227227)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-cd5362c7bb0b207f484a8dfb8db229fd2bffef09
+5ee78e7d52a4cad0b23f5bc62e5b452489243c70
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 227227)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -6588,7 +6588,11 @@ Builtin_call_expression::Builtin_call_ex
 recover_arg_is_set_(false)
 {
   Func_expression* fnexp = this->fn()->func_expression();
-  go_assert(fnexp != NULL);
+  if (fnexp == NULL)
+{
+  this->code_ = BUILTIN_INVALID;
+  return;
+}
   const std::string& name(fnexp->named_object()->name());
   if (name == "append")
 this->code_ = BUILTIN_APPEND;
@@ -6661,7 +6665,7 @@ Expression*
 Builtin_call_expression::do_lower(Gogo* gogo, Named_object* function,
  Statement_inserter* inserter, int)
 {
-  if (this->classification() == EXPRESSION_ERROR)
+  if (this->is_error_expression())
 return this;
 
   Location loc = this->location();
@@ -7500,11 +7504,13 @@ Builtin_call_expression::do_discarding_v
 Type*
 Builtin_call_expression::do_type()
 {
+  if (this->is_error_expression())
+return Type::make_error_type();
   switch (this->code_)
 {
 case BUILTIN_INVALID:
 default:
-  go_unreachable();
+  return Type::make_error_type();
 
 case BUILTIN_NEW:
 case BUILTIN_MAKE:


[PATCH 2/2] remove -floop-* flags

2015-08-26 Thread Sebastian Pop
---
 gcc/Makefile.in|2 -
 gcc/common.opt |   16 +-
 gcc/doc/invoke.texi|  108 +-
 gcc/graphite-blocking.c|  270 -
 gcc/graphite-interchange.c |  656 
 gcc/graphite-optimize-isl.c|   14 +-
 gcc/graphite-poly.c|  489 +
 gcc/graphite-poly.h| 1082 
 gcc/graphite-sese-to-poly.c|   22 +-
 gcc/graphite.c |   10 +-
 gcc/testsuite/g++.dg/graphite/graphite.exp |   10 +-
 gcc/testsuite/gcc.dg/graphite/block-0.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-1.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-3.c|4 +-
 gcc/testsuite/gcc.dg/graphite/block-4.c|4 +-
 gcc/testsuite/gcc.dg/graphite/block-5.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-6.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-7.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-8.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-pr47654.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/graphite.exp |   14 +-
 gcc/testsuite/gcc.dg/graphite/interchange-0.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-1.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-10.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-11.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-12.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-13.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-14.c |3 +-
 gcc/testsuite/gcc.dg/graphite/interchange-15.c |4 +-
 gcc/testsuite/gcc.dg/graphite/interchange-3.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-4.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-5.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-6.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-7.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-8.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-9.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-mvt.c|4 +-
 gcc/testsuite/gcc.dg/graphite/pr37485.c|5 +-
 gcc/testsuite/gcc.dg/graphite/uns-block-1.c|2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c |2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c |3 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c |4 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c  |2 +-
 .../gcc.dg/graphite/uns-interchange-mvt.c  |4 +-
 gcc/testsuite/gfortran.dg/graphite/graphite.exp|   10 +-
 45 files changed, 98 insertions(+), 2686 deletions(-)
 delete mode 100644 gcc/graphite-blocking.c
 delete mode 100644 gcc/graphite-interchange.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e298ecc..3d1c1e5 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1277,10 +1277,8 @@ OBJS = \
graph.o \
graphds.o \
graphite.o \
-   graphite-blocking.o \
graphite-isl-ast-to-gimple.o \
graphite-dependences.o \
-   graphite-interchange.o \
graphite-optimize-isl.o \
graphite-poly.o \
graphite-scop-detection.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 0964ae4..94d1d88 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1341,16 +1341,16 @@ Common Report Var(flag_loop_parallelize_all) 
Optimization
 Mark all loops as parallel
 
 floop-strip-mine
-Common Report Var(flag_loop_strip_mine) Optimization
-Enable Loop Strip Mining transformation
+Common Alias(floop-nest-optimize)
+Enable loop nest transforms.  Same as -floop-nest-optimize
 
 floop-interchange
-Common Report Var(flag_loop_interchange) Optimization
-Enable Loop Interchange transformation
+Common Alias(floop-nest-optimize)
+Enable loop nest transforms.  Same as -floop-nest-optimize
 
 floop-block
-Common Report Var(flag_loop_block) Optimization
-Enable Loop Blocking transformation
+Common Alias(floop-nest-optimize)
+Enable loop nest transforms.  Same as -floop-nest-optimize
 
 floop-unroll-and-jam
 Common Alias(floop-nest-optimize)
@@ -2315,8 +2315,8 @@ Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
 
 ftree-loop-linear
-Common Alias(floop-interchange)
-Enable loop interchange transforms.  Same as -floop-interchange
+Common Alias(floop-nest-optimize)
+Enable loop nest transforms.  Same as -floop-nest-optimize
 
 ftree-loop-ivcanon
 Common Report Var(flag_tree_loop_ivcanon) Init(1) Optimization
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c33cc27..8710ff8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8733,102 +8733,19 @@ Perform loop optimizations on trees.  This flag is 
enabled by default
 at @option{-O} and hi

[PATCH 1/2] remove -floop-unroll-and-jam

2015-08-26 Thread Sebastian Pop
---
 gcc/common.opt   |   4 +-
 gcc/doc/invoke.texi  |   8 +-
 gcc/graphite-isl-ast-to-gimple.c | 102 +-
 gcc/graphite-optimize-isl.c  | 179 ---
 gcc/graphite-poly.c  |   3 +-
 gcc/graphite-poly.h  |   3 -
 gcc/graphite.c   |   3 +-
 gcc/params.def   |  15 
 gcc/toplev.c |   3 +-
 9 files changed, 29 insertions(+), 291 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4dcd518..0964ae4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1353,8 +1353,8 @@ Common Report Var(flag_loop_block) Optimization
 Enable Loop Blocking transformation
 
 floop-unroll-and-jam
-Common Report Var(flag_loop_unroll_jam) Optimization
-Enable Loop Unroll Jam transformation
+Common Alias(floop-nest-optimize)
+Enable loop nest transforms.  Same as -floop-nest-optimize
  
 fgnu-tm
 Common Report Var(flag_tm)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27be317..c33cc27 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8848,10 +8848,10 @@ is experimental.
 
 @item -floop-unroll-and-jam
 @opindex floop-unroll-and-jam
-Enable unroll and jam for the ISL based loop nest optimizer.  The unroll 
-factor can be changed using the @option{loop-unroll-jam-size} parameter.
-The unrolled dimension (counting from the most inner one) can be changed 
-using the @option{loop-unroll-jam-depth} parameter. .
+Perform loop nest transformations.  Same as
+@option{-floop-nest-optimize}.  To use this code transformation, GCC has
+to be configured with @option{--with-isl} to enable the Graphite loop
+transformation infrastructure.
 
 @item -floop-parallelize-all
 @opindex floop-parallelize-all
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index dfb012f..5434bfd 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -968,92 +968,6 @@ extend_schedule (__isl_take isl_map *schedule, int 
nb_schedule_dims)
   return schedule;
 }
 
-/* Set the separation_class option for unroll and jam. */
-
-static __isl_give isl_union_map *
-generate_luj_sepclass_opt (scop_p scop, __isl_take isl_union_set *domain, 
-   int dim, int cl)
-{
-  isl_map  *map;
-  isl_space *space, *space_sep;
-  isl_ctx *ctx;
-  isl_union_map *mapu;
-  int nsched = get_max_schedule_dimensions (scop);
- 
-  ctx = scop->ctx;
-  space_sep = isl_space_alloc (ctx, 0, 1, 1);
-  space_sep = isl_space_wrap (space_sep);
-  space_sep = isl_space_set_tuple_name (space_sep, isl_dim_set,
-   "separation_class");
-  space = isl_set_get_space (scop->context);
-  space_sep = isl_space_align_params (space_sep, isl_space_copy(space));
-  space = isl_space_map_from_domain_and_range (space, space_sep);
-  space = isl_space_add_dims (space,isl_dim_in, nsched);
-  map = isl_map_universe (space);
-  isl_map_fix_si (map,isl_dim_out,0,dim);
-  isl_map_fix_si (map,isl_dim_out,1,cl);
-
-  mapu = isl_union_map_intersect_domain (isl_union_map_from_map (map), 
-domain);
-  return (mapu);
-}
-
-/* Compute the separation class for loop unroll and jam.  */
-
-static __isl_give isl_union_set *
-generate_luj_sepclass (scop_p scop)
-{
-  int i;
-  poly_bb_p pbb;
-  isl_union_set *domain_isl;
-
-  domain_isl = isl_union_set_empty (isl_set_get_space (scop->context));
-
-  FOR_EACH_VEC_ELT (SCOP_BBS (scop), i, pbb)
-{
-  isl_set *bb_domain;
-  isl_set *bb_domain_s;
-
-  if (pbb->map_sepclass == NULL)
-   continue;
-
-  if (isl_set_is_empty (pbb->domain))
-   continue;
-
-  bb_domain = isl_set_copy (pbb->domain);
-  bb_domain_s = isl_set_apply (bb_domain, pbb->map_sepclass);
-  pbb->map_sepclass = NULL;
-
-  domain_isl =
-   isl_union_set_union (domain_isl, isl_union_set_from_set (bb_domain_s));
-}
-
-  return domain_isl;
-}
-
-/* Set the AST built options for loop unroll and jam. */
- 
-static __isl_give isl_union_map *
-generate_luj_options (scop_p scop)
-{
-  isl_union_set *domain_isl;
-  isl_union_map *options_isl_ss;
-  isl_union_map *options_isl =
-isl_union_map_empty (isl_set_get_space (scop->context));
-  int dim = get_max_schedule_dimensions (scop) - 1;
-  int dim1 = dim - PARAM_VALUE (PARAM_LOOP_UNROLL_JAM_DEPTH);
-
-  if (!flag_loop_unroll_jam)
-return options_isl;
-
-  domain_isl = generate_luj_sepclass (scop);
-
-  options_isl_ss = generate_luj_sepclass_opt (scop, domain_isl, dim1, 0);
-  options_isl = isl_union_map_union (options_isl, options_isl_ss);
-
-  return options_isl;
-}
-
 /* Generates a schedule, which specifies an order used to
visit elements in a domain.  */
 
@@ -1102,13 +1016,11 @@ ast_build_before_for (__isl_keep isl_ast_build *build, 
void *user)
 }
 
 /* Set the separate option for all dimensions.
-   This helps to reduce control overhead.
-   Set the optio

[PATCH 0/2] Final cleanup in move to ISL

2015-08-26 Thread Sebastian Pop
Hi,

Richi suggested at the Cauldron that it would be good to make graphite more
automatic, with fewer flags.  The first patch removes the -floop-unroll-and-jam
pass, which does not seem very stable or useful for now.  The second patch
removes the other -floop-* flags that were part of the old graphite middle-end;
these were the first transforms implemented on the polyhedral representation
(matrices, etc.) back when we had no ISL scheduler.  The transition to ISL,
which removed GCC's dependence on PPL and CLooG, did not remove all of
graphite's middle-end loop transforms.  We can now remove that code, as it is
replaced by ISL's scheduler.
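For context, the kind of transform these flags used to request individually, e.g. loop interchange, can be sketched by hand (illustrative C, not graphite output; function names are made up):

```c
#define N 64

/* Column-major traversal: the inner loop walks a[i][j] with stride N,
   the access pattern -floop-interchange was meant to repair.  */
void
scale_before (double a[N][N])
{
  for (int j = 0; j < N; j++)
    for (int i = 0; i < N; i++)
      a[i][j] *= 2.0;
}

/* After interchange: the inner loop is stride-1 and cache friendly.
   The ISL scheduler now derives such orderings itself under
   -floop-nest-optimize instead of via a dedicated flag.  */
void
scale_after (double a[N][N])
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      a[i][j] *= 2.0;
}
```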

The patches pass "make check" and bootstrap (in progress) with 
-fgraphite-identity.
Ok to commit?

Thanks,
Sebastian


Sebastian Pop (2):
  remove -floop-unroll-and-jam
  remove -floop-* flags

 gcc/Makefile.in|2 -
 gcc/common.opt |   20 +-
 gcc/doc/invoke.texi|  108 +-
 gcc/graphite-blocking.c|  270 -
 gcc/graphite-interchange.c |  656 
 gcc/graphite-isl-ast-to-gimple.c   |  102 +-
 gcc/graphite-optimize-isl.c|  193 +---
 gcc/graphite-poly.c|  492 +
 gcc/graphite-poly.h| 1085 
 gcc/graphite-sese-to-poly.c|   22 +-
 gcc/graphite.c |   13 +-
 gcc/params.def |   15 -
 gcc/testsuite/g++.dg/graphite/graphite.exp |   10 +-
 gcc/testsuite/gcc.dg/graphite/block-0.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-1.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-3.c|4 +-
 gcc/testsuite/gcc.dg/graphite/block-4.c|4 +-
 gcc/testsuite/gcc.dg/graphite/block-5.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-6.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-7.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-8.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-pr47654.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/graphite.exp |   14 +-
 gcc/testsuite/gcc.dg/graphite/interchange-0.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-1.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-10.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-11.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-12.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-13.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-14.c |3 +-
 gcc/testsuite/gcc.dg/graphite/interchange-15.c |4 +-
 gcc/testsuite/gcc.dg/graphite/interchange-3.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-4.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-5.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-6.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-7.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-8.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-9.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-mvt.c|4 +-
 gcc/testsuite/gcc.dg/graphite/pr37485.c|5 +-
 gcc/testsuite/gcc.dg/graphite/uns-block-1.c|2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c |2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c |3 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c |4 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c  |2 +-
 .../gcc.dg/graphite/uns-interchange-mvt.c  |4 +-
 gcc/testsuite/gfortran.dg/graphite/graphite.exp|   10 +-
 gcc/toplev.c   |3 +-
 48 files changed, 123 insertions(+), 2973 deletions(-)
 delete mode 100644 gcc/graphite-blocking.c
 delete mode 100644 gcc/graphite-interchange.c

-- 
2.1.0.243.g30d45f7



Re: [PATCH] [AVX512F] Add scatter support for vectorizer

2015-08-26 Thread Uros Bizjak
On Wed, Aug 26, 2015 at 7:39 PM, Petr Murzin  wrote:
> On Wed, Aug 26, 2015 at 10:41 AM, Richard Biener
>  wrote:
>> @@ -3763,32 +3776,46 @@ again:
>>if (vf > *min_vf)
>> *min_vf = vf;
>>
>> -  if (gather)
>> +  if (gatherscatter != SG_NONE)
>> {
>>   tree off;
>> + if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off,
>> NULL, true) != 0)
>> +   gatherscatter = GATHER;
>> + else if (vect_check_gather_scatter (stmt, loop_vinfo, NULL,
>> &off, NULL, false)
>> + != 0)
>> +   gatherscatter = SCATTER;
>> + else
>> +   gatherscatter = SG_NONE;
>>
>> as I said vect_check_gather_scatter already knows whether the DR is a read or
>> a write and thus whether it needs to check for gather or scatter.  Remove
>> the new argument.  And simply do
>>
>>if (!vect_check_gather_scatter (stmt))
>>  gatherscatter = SG_NONE;
>>
>> - STMT_VINFO_GATHER_P (stmt_info) = true;
>> + if (gatherscatter == GATHER)
>> +   STMT_VINFO_GATHER_P (stmt_info) = true;
>> + else
>> +   STMT_VINFO_SCATTER_P (stmt_info) = true;
>> }
>>
>> and as suggested merge STMT_VINFO_GATHER_P and STMT_VINFO_SCATTER_P
>> using the enum so you can simply do
>>
>>  STMT_VINFO_SCATTER_GATHER_P (smt_info) = gatherscatter;
>> Otherwise the patch looks ok to me.
>
> Fixed.
> Uros, could you please have a look at target part of patch?
>
> 2015-08-26  Andrey Turetskiy  
> Petr Murzin  
>
> gcc/
>
> * config/i386/i386-builtin-types.def
> (VOID_PFLOAT_HI_V8DI_V16SF_INT): New.
> (VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto.
> (VOID_PINT_HI_V8DI_V16SI_INT): Ditto.
> (VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto.
> * config/i386/i386.c
> (ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF,
> IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
> IX86_BUILTIN_SCATTERALTDIV16SI.
> (ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df,
> __builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di,
> __builtin_ia32_scatteraltdiv8si.
> (ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF,
> IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
> IX86_BUILTIN_SCATTERALTDIV16SI.
> (ix86_vectorize_builtin_scatter): New.
> (TARGET_VECTORIZE_BUILTIN_SCATTER): Define as
> ix86_vectorize_builtin_scatter.
> * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New.
> * doc/tm.texi: Regenerate.
> * target.def: Add scatter builtin.
> * tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it
> for loads/stores in case of gather/scatter accordingly.
> (STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S).
> (vect_check_gather): Rename to ...
> (vect_check_gather_scatter): this.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Use
> STMT_VINFO_GATHER_SCATTER_P instead of STMT_VINFO_SCATTER_P.
> (vect_check_gather_scatter): Use it instead of vect_check_gather.
> (vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable
> and new checkings for it accordingly.
> * tree-vect-stmts.c
> (STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S).
> (vect_check_gather_scatter): Use it instead of vect_check_gather.
> (vectorizable_store): Add checkings for STMT_VINFO_GATHER_SCATTER_P.
>
> gcc/testsuite/
>
> * gcc.target/i386/avx512f-scatter-1.c: New.
> * gcc.target/i386/avx512f-scatter-2.c: Ditto.
> * gcc.target/i386/avx512f-scatter-3.c: Ditto.

x86 target part and testsuite are OK with the following change to the testcases:

> +/* { dg-do run } */
> +/* { dg-require-effective-target avx512f } */
> +/* { dg-options "-O3 -mavx512f -DAVX512F" } */
> +
> +#include "avx512f-check.h"
> +
> +#define N 1024

We don't want -D in the options; please move these to the source:

/* { dg-do run } */
/* { dg-require-effective-target avx512f } */
/* { dg-options "-O3 -mavx512f" } */

#define AVX512F

#include "avx512f-check.h"

#define N 1024

Thanks,
Uros.


[v3 patch] try_emplace and insert_or_assign for Debug Mode.

2015-08-26 Thread Jonathan Wakely

These new members need to be defined in Debug Mode, because the
iterators passed in as hints and returned as results need to be safe
iterators.

No new tests, because we already have tests for these members, and
they're failing in debug mode.

Tested powerpc64le-linux, committed to trunk.


commit ae899df9056ff8a58d658ef42125935856503f96
Author: Jonathan Wakely 
Date:   Wed Aug 26 21:24:30 2015 +0100

try_emplace and insert_or_assign for Debug Mode.

	* include/debug/map.h (map::try_emplace, map::insert_or_assign):
	Define.
	* include/debug/unordered_map (unordered_map::try_emplace,
	unordered_map::insert_or_assign): Define.

diff --git a/libstdc++-v3/include/debug/map.h b/libstdc++-v3/include/debug/map.h
index d45cf79..914d721 100644
--- a/libstdc++-v3/include/debug/map.h
+++ b/libstdc++-v3/include/debug/map.h
@@ -317,6 +317,89 @@ namespace __debug
 	_Base::insert(__first, __last);
 	}
 
+
+#if __cplusplus > 201402L
+  template <typename... _Args>
+pair<iterator, bool>
+try_emplace(const key_type& __k, _Args&&... __args)
+{
+	  auto __res = _Base::try_emplace(__k,
+	  std::forward<_Args>(__args)...);
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+  template <typename... _Args>
+pair<iterator, bool>
+try_emplace(key_type&& __k, _Args&&... __args)
+{
+	  auto __res = _Base::try_emplace(std::move(__k),
+	  std::forward<_Args>(__args)...);
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+  template <typename... _Args>
+iterator
+try_emplace(const_iterator __hint, const key_type& __k,
+_Args&&... __args)
+{
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::try_emplace(__hint.base(), __k,
+	 std::forward<_Args>(__args)...),
+			  this);
+	}
+
+  template <typename... _Args>
+iterator
+try_emplace(const_iterator __hint, key_type&& __k, _Args&&... __args)
+{
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::try_emplace(__hint.base(), std::move(__k),
+	 std::forward<_Args>(__args)...),
+			  this);
+	}
+
+  template <typename _Obj>
+std::pair<iterator, bool>
+insert_or_assign(const key_type& __k, _Obj&& __obj)
+	{
+	  auto __res = _Base::insert_or_assign(__k,
+	   std::forward<_Obj>(__obj));
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+  template <typename _Obj>
+std::pair<iterator, bool>
+insert_or_assign(key_type&& __k, _Obj&& __obj)
+	{
+	  auto __res = _Base::insert_or_assign(std::move(__k),
+	   std::forward<_Obj>(__obj));
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+  template <typename _Obj>
+iterator
+insert_or_assign(const_iterator __hint,
+ const key_type& __k, _Obj&& __obj)
+	{
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::insert_or_assign(__hint.base(), __k,
+		  std::forward<_Obj>(__obj)),
+			  this);
+	}
+
+  template <typename _Obj>
+iterator
+insert_or_assign(const_iterator __hint, key_type&& __k, _Obj&& __obj)
+{
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::insert_or_assign(__hint.base(),
+		  std::move(__k),
+		  std::forward<_Obj>(__obj)),
+			  this);
+	}
+#endif
+
+
 #if __cplusplus >= 201103L
   iterator
   erase(const_iterator __position)
diff --git a/libstdc++-v3/include/debug/unordered_map b/libstdc++-v3/include/debug/unordered_map
index cc3bc3f..1bbdb61 100644
--- a/libstdc++-v3/include/debug/unordered_map
+++ b/libstdc++-v3/include/debug/unordered_map
@@ -377,6 +377,88 @@ namespace __debug
 	  _M_check_rehashed(__bucket_count);
 	}
 
+#if __cplusplus > 201402L
+  template <typename... _Args>
+pair<iterator, bool>
+try_emplace(const key_type& __k, _Args&&... __args)
+{
+	  auto __res = _Base::try_emplace(__k,
+	  std::forward<_Args>(__args)...);
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+  template <typename... _Args>
+pair<iterator, bool>
+try_emplace(key_type&& __k, _Args&&... __args)
+{
+	  auto __res = _Base::try_emplace(std::move(__k),
+	  std::forward<_Args>(__args)...);
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+  template <typename... _Args>
+iterator
+try_emplace(const_iterator __hint, const key_type& __k,
+_Args&&... __args)
+{
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::try_emplace(__hint.base(), __k,
+	 std::forward<_Args>(__args)...),
+			  this);
+	}
+
+  template <typename... _Args>
+iterator
+try_emplace(const_iterator __hint, key_type&& __k, _Args&&... __args)
+{
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::try_emplace(__hint.base(), std::move(__k),
+	 std::forward<_Args>(__args)...),
+			  this);
+	}
+
+  template <typename _Obj>
+pair<iterator, bool>
+insert_or_assign(const key_type& __k, _Obj&& __obj)
+{
+	  auto __res = _Base::insert_or_assign(__k,
+	   std::forward<_Obj>(__obj));
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+  template <typename _Obj>
+pai

Re: [libvtv] Fix formatting errors

2015-08-26 Thread Jeff Law

On 08/26/2015 01:50 PM, Caroline Tice wrote:

As far as I know vtv is working just fine...is there something I don't
know about?
I'm not aware of anything that isn't working, but I'm also not aware of 
vtv in widespread use, typical performance hit experienced, etc.


jeff



[patch] libstdc++/66902 Make _S_debug_messages static.

2015-08-26 Thread Jonathan Wakely

This patch removes a public symbol from the .so, which is generally a
bad thing, but there should be no users of this anywhere (it's never
declared in any public header).

For targets using symbol versioning this isn't exported at all, as it
isn't in the linker script, so this really just makes other targets
consistent with the ones using versioned symbols.

Tested powerpc64le-linux and dragonfly-4.2, committed to trunk
commit d35fbf8937930554af62a7320806abecf7381175
Author: Jonathan Wakely 
Date:   Fri Jul 17 10:15:03 2015 +0100

libstdc++/66902 Make _S_debug_messages static.

	PR libstdc++/66902
	* src/c++11/debug.cc (_S_debug_messages): Give internal linkage.

diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index 997c0f3..c435de7 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -103,7 +103,7 @@ namespace
 
 namespace __gnu_debug
 {
-  const char* _S_debug_messages[] =
+  static const char* _S_debug_messages[] =
   {
 // General Checks
 "function requires a valid iterator range [%1.name;, %2.name;)",


[Patch, fortran] F2008 - implement pointer function assignment

2015-08-26 Thread Paul Richard Thomas
Dear All,

The attached patch more or less implements the assignment of
expressions to the result of a pointer function. To wit:

my_ptr_fcn (arg1, arg2...) = expr

arg1 would usually be the target, pointed to by the function. The
patch parses these statements and resolves them into:

temp_ptr => my_ptr_fcn (arg1, arg2...)
temp_ptr = expr

I say more or less implemented because I have ducked one of the
headaches here. At the end of the specification block, there is an
ambiguity between statement functions and pointer function
assignments. I do not even try to resolve this ambiguity and require
that there be at least one other type of executable statement before
these beasts. This can undoubtedly be fixed but the effort seems to me
to be unwarranted at the present time.

I had a stupid amount of trouble with the test fmt_tab_1.f90. I have
no idea why but the gfc_warning no longer showed the offending line,
although the line number in the error message was OK. Changing to
gfc_warning_now fixed the problem. Also, I can see no reason why this
should be dg-run and so changed to dg-compile. Finally, I set
-std=legacy to stop the generic error associated with tabs.

Bootstraps and regtests on x86_64/FC21 - OK for trunk?

Now back to trying to get my head round parameterized derived types!

Cheers

Paul

2015-08-26  Paul Thomas  

* decl.c (get_proc_name): Return if statement function is
found.
* io.c (next_char_not_space): Change tab warning to warning now
to prevent locus being lost.
* match.c (gfc_match_ptr_fcn_assign): New function.
* match.h : Add prototype for gfc_match_ptr_fcn_assign.
* parse.c : Add static flag 'in_specification_block'.
(decode_statement): If in specification block match a statement
function, otherwise if standard embraces F2008 try to match a
pointer function assignment.
(parse_interface): Set 'in_specification_block' on exiting from
parse_spec.
(parse_spec): Set and then reset 'in_specification_block'.
(gfc_parse_file): Set 'in_specification_block'.
* resolve.c (get_temp_from_expr): Extend to include functions
and array constructors as rvalues.
(resolve_ptr_fcn_assign): New function.
(gfc_resolve_code): Call it on finding a pointer function as an
lvalue.
* symbol.c (gfc_add_procedure): Add a sentence to the error to
flag up the ambiguity between a statement function and pointer
function assignment at the end of the specification block.

2015-08-26  Paul Thomas  

* gfortran.dg/fmt_tab_1.f90: Change from run to compile and set
standard as legacy.
* gfortran.dg/ptr_func_assign_1.f08: New test.
Index: gcc/fortran/decl.c
===
*** gcc/fortran/decl.c  (revision 227118)
--- gcc/fortran/decl.c  (working copy)
*** get_proc_name (const char *name, gfc_sym
*** 901,906 
--- 901,908 
  return rc;
  
sym = *result;
+   if (sym->attr.proc == PROC_ST_FUNCTION)
+ return rc;
  
if (sym->attr.module_procedure
&& sym->attr.if_source == IFSRC_IFBODY)
Index: gcc/fortran/io.c
===
*** gcc/fortran/io.c(revision 227118)
--- gcc/fortran/io.c(working copy)
*** next_char_not_space (bool *error)
*** 200,206 
if (c == '\t')
{
  if (gfc_option.allow_std & GFC_STD_GNU)
!   gfc_warning (0, "Extension: Tab character in format at %C");
  else
{
  gfc_error ("Extension: Tab character in format at %C");
--- 200,206 
if (c == '\t')
{
  if (gfc_option.allow_std & GFC_STD_GNU)
!   gfc_warning_now (0, "Extension: Tab character in format at %C");
  else
{
  gfc_error ("Extension: Tab character in format at %C");
Index: gcc/fortran/match.c
===
*** gcc/fortran/match.c (revision 227118)
--- gcc/fortran/match.c (working copy)
*** match
*** 4886,4892 
  gfc_match_st_function (void)
  {
gfc_error_buffer old_error;
- 
gfc_symbol *sym;
gfc_expr *expr;
match m;
--- 4886,4891 
*** gfc_match_st_function (void)
*** 4926,4931 
--- 4925,5000 
return MATCH_YES;
  
  undo_error:
+   gfc_pop_error (&old_error);
+   return MATCH_NO;
+ }
+ 
+ 
+ /* Match an assignment to a pointer function (F2008). This could, in
+general be ambiguous with a statement function. In this implementation
+it remains so if it is the first statement after the specification
+block.  */
+ 
+ match
+ gfc_match_ptr_fcn_assign (void)
+ {
+   gfc_error_buffer old_error;
+   locus old_loc;
+   gfc_symbol *sym;
+   gfc_expr *expr;
+   match m;
+   char name[GFC_MAX_SYMBOL_LEN + 1];
+ 
+   old_loc = gfc_current_locus;
+   m = gfc_match_name (name);
+   if (m != MATCH_YES)
+ return m;
+ 
+   gfc_find_symbol (name, NULL, 1, 

Fwd: [libvtv] Fix formatting errors

2015-08-26 Thread Caroline Tice
-- Forwarded message --
From: Caroline Tice 
Date: Wed, Aug 26, 2015 at 12:50 PM
Subject: Re: [libvtv] Fix formatting errors
To: Jeff Law 
Cc: Rainer Orth , GCC Patches



As far as I know vtv is working just fine...is there something I don't
know about?

-- Caroline
cmt...@google.com

On Wed, Aug 26, 2015 at 12:47 PM, Jeff Law  wrote:
>
> On 08/26/2015 07:30 AM, Rainer Orth wrote:
>>
>> While looking at libvtv for the Solaris port, I noticed all sorts of GNU
>> Coding Standard violations:
>>
>> * ChangeLog entries attributed to the committer instead of the author
>>and with misformatted PR references, entries only giving a vague
>>rationale instead of what changed
>>
>> * overlong lines
>>
>> * tons of whitespace errors (though I may be wrong in some cases: C++
>>code might have other rules)
>>
>> * code formatting that seems to have been done to be visually pleasing,
>>completely different from what Emacs does
>>
>> * commented code fragments (#if 0 equivalent)
>>
>> * configure.tgt target list in no recognizable order
>>
>> * the Cygwin/MingW port is done in the worst possible way: tons of
>>target-specific ifdefs instead of feature-specific conditionals or an
>>interface that can wrap both Cygwin and Linux variants of the code
>>
>> The following patch (as yet not even compiled) fixes some of the most
>> glaring errors.  The Solaris port will fix a few of the latter ones.
>>
>> Do you think this is the right direction or did I get something wrong?
>>
>> Thanks.
>>  Rainer
>>
>>
>> 2015-08-26  Rainer Orth  
>>
>> Fix formatting errors.
>
> I'm more interested in the current state of vtv as I keep getting dragged 
> into discussions about what we can/should be doing in the compiler world to 
> close more security stuff.
>
> Vtables are an obvious candidate given we've got vtv.
>
> Jeff


Re: [libvtv] Fix formatting errors

2015-08-26 Thread Jeff Law

On 08/26/2015 07:30 AM, Rainer Orth wrote:

While looking at libvtv for the Solaris port, I noticed all sorts of GNU
Coding Standard violations:

* ChangeLog entries attributed to the committer instead of the author
   and with misformatted PR references, entries only giving a vague
   rationale instead of what changed

* overlong lines

* tons of whitespace errors (though I may be wrong in some cases: C++
   code might have other rules)

* code formatting that seems to have been done to be visually pleasing,
   completely different from what Emacs does

* commented code fragments (#if 0 equivalent)

* configure.tgt target list in no recognizable order

* the Cygwin/MingW port is done in the worst possible way: tons of
   target-specific ifdefs instead of feature-specific conditionals or an
   interface that can wrap both Cygwin and Linux variants of the code

The following patch (as yet not even compiled) fixes some of the most
glaring errors.  The Solaris port will fix a few of the latter ones.

Do you think this is the right direction or did I get something wrong?

Thanks.
 Rainer


2015-08-26  Rainer Orth  

Fix formatting errors.
I'm more interested in the current state of vtv as I keep getting 
dragged into discussions about what we can/should be doing in the 
compiler world to close more security stuff.


Vtables are an obvious candidate given we've got vtv.

Jeff


Go patch committed: don't crash on invalid numeric type

2015-08-26 Thread Ian Lance Taylor
This patch by Chris Manghane fixes a compiler crash on an invalid
program when the compiler tries to set a numeric constant to an
invalid type.  This fixes https://golang.org/issue/11537.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 227201)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-d5e6af4e6dd456075a1ec1c03d0dc41cbea5eb36
+cd5362c7bb0b207f484a8dfb8db229fd2bffef09
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 227201)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -15150,7 +15150,11 @@ Numeric_constant::set_type(Type* type, b
   else if (type->complex_type() != NULL)
 ret = this->check_complex_type(type->complex_type(), issue_error, loc);
   else
-go_unreachable();
+{
+  ret = false;
+  if (issue_error)
+go_assert(saw_errors());
+}
   if (ret)
 this->type_ = type;
   return ret;


Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-26 Thread Richard Biener
On August 26, 2015 6:08:55 PM GMT+02:00, Alan Lawrence  
wrote:
>Richard Biener wrote:
>
>>>> One extra question is does the way we limit total scalarization work
>>>> well for arrays?  I suppose we have either sth like the maximum size of
>>>> an aggregate we scalarize or the maximum number of component accesses
>>>> we create?
>>>
>>> Only the former and that would be kept intact.  It is in fact visible
>>> in the context of the last hunk of the patch.
>>
>> OK.  IIRC the gimplification code also has the latter and also
>> considers zeroing the whole aggregate before initializing non-zero
>> fields.  IMHO it makes sense to reuse some of the analysis and
>> classification routines it has.
>
> Do you mean gimplify_init_constructor? Yes, there's quite a lot of
> logic there ;). That feels like a separate patch - and belonging to the
> constant-handling subseries of this series

Yes.

> - as gimplify_init_constructor already deals with both record and array
> types, and I don't see anything there that's specifically good for
> total-scalarization of arrays?
>
> IOW, do you mean that to block this patch, or can it be separate (I can
> address Martin + Jeff's comments fairly quickly and independently) ?

No, but I'd like this being explores with the init sub series.  We don't want 
two places doing total scalarization of initualizers , gimplification and SRA 
and with different/conflicting heuristics.  IMHO the gimplification total 
scalarization happens too early.

Richard.

>Cheers, Alan




Re: [PATCH] [AVX512F] Add scatter support for vectorizer

2015-08-26 Thread Petr Murzin
On Wed, Aug 26, 2015 at 10:41 AM, Richard Biener
 wrote:
> @@ -3763,32 +3776,46 @@ again:
>if (vf > *min_vf)
> *min_vf = vf;
>
> -  if (gather)
> +  if (gatherscatter != SG_NONE)
> {
>   tree off;
> + if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off,
> NULL, true) != 0)
> +   gatherscatter = GATHER;
> + else if (vect_check_gather_scatter (stmt, loop_vinfo, NULL,
> &off, NULL, false)
> + != 0)
> +   gatherscatter = SCATTER;
> + else
> +   gatherscatter = SG_NONE;
>
> as I said vect_check_gather_scatter already knows whether the DR is a read or
> a write and thus whether it needs to check for gather or scatter.  Remove
> the new argument.  And simply do
>
>if (!vect_check_gather_scatter (stmt))
>  gatherscatter = SG_NONE;
>
> - STMT_VINFO_GATHER_P (stmt_info) = true;
> + if (gatherscatter == GATHER)
> +   STMT_VINFO_GATHER_P (stmt_info) = true;
> + else
> +   STMT_VINFO_SCATTER_P (stmt_info) = true;
> }
>
> and as suggested merge STMT_VINFO_GATHER_P and STMT_VINFO_SCATTER_P
> using the enum so you can simply do
>
>  STMT_VINFO_SCATTER_GATHER_P (smt_info) = gatherscatter;
> Otherwise the patch looks ok to me.

Fixed.
Uros, could you please have a look at target part of patch?

Thanks,
Petr

2015-08-26  Andrey Turetskiy  
Petr Murzin  

gcc/

* config/i386/i386-builtin-types.def
(VOID_PFLOAT_HI_V8DI_V16SF_INT): New.
(VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto.
(VOID_PINT_HI_V8DI_V16SI_INT): Ditto.
(VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto.
* config/i386/i386.c
(ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF,
IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
IX86_BUILTIN_SCATTERALTDIV16SI.
(ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df,
__builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di,
__builtin_ia32_scatteraltdiv8si.
(ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF,
IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
IX86_BUILTIN_SCATTERALTDIV16SI.
(ix86_vectorize_builtin_scatter): New.
(TARGET_VECTORIZE_BUILTIN_SCATTER): Define as
ix86_vectorize_builtin_scatter.
* doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New.
* doc/tm.texi: Regenerate.
* target.def: Add scatter builtin.
* tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it
for loads/stores in case of gather/scatter accordingly.
(STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S).
(vect_check_gather): Rename to ...
(vect_check_gather_scatter): this.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Use
STMT_VINFO_GATHER_SCATTER_P instead of STMT_VINFO_SCATTER_P.
(vect_check_gather_scatter): Use it instead of vect_check_gather.
(vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable
and new checkings for it accordingly.
* tree-vect-stmts.c
(STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S).
(vect_check_gather_scatter): Use it instead of vect_check_gather.
(vectorizable_store): Add checkings for STMT_VINFO_GATHER_SCATTER_P.

gcc/testsuite/

* gcc.target/i386/avx512f-scatter-1.c: New.
* gcc.target/i386/avx512f-scatter-2.c: Ditto.
* gcc.target/i386/avx512f-scatter-3.c: Ditto.


scatter
Description: Binary data


tests
Description: Binary data


Re: [PATCH] 2015-07-31 Benedikt Huber Philipp Tomsich

2015-08-26 Thread Benedikt Huber
ping

[PATCH v4][aarch64] Implemented reciprocal square root (rsqrt) estimation in 
-ffast-math

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02698.html

> On 31 Jul 2015, at 19:05, Benedikt Huber 
>  wrote:
> 
>   * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and
>   rsqrtf.
>   * config/aarch64/aarch64-opts.h: -mrecip has a default value
>   depending on the core.
>   * config/aarch64/aarch64-protos.h: Declare.
>   * config/aarch64/aarch64-simd.md: Matching expressions for
>   frsqrte and frsqrts.
>   * config/aarch64/aarch64-tuning-flags.def: Added
>   MRECIP_DEFAULT_ENABLED.
>   * config/aarch64/aarch64.c: New functions. Emit rsqrt
>   estimation code in fast math mode.
>   * config/aarch64/aarch64.md: Added enum entries.
>   * config/aarch64/aarch64.opt: Added options -mrecip and
>   -mlow-precision-recip-sqrt.
>   * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
>   for frsqrte and frsqrts
>   * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt.
> 
> Signed-off-by: Philipp Tomsich 
> ---
> gcc/ChangeLog  |  21 
> gcc/config/aarch64/aarch64-builtins.c  | 104 
> gcc/config/aarch64/aarch64-opts.h  |   7 ++
> gcc/config/aarch64/aarch64-protos.h|   2 +
> gcc/config/aarch64/aarch64-simd.md |  27 ++
> gcc/config/aarch64/aarch64-tuning-flags.def|   1 +
> gcc/config/aarch64/aarch64.c   | 106 +++-
> gcc/config/aarch64/aarch64.md  |   3 +
> gcc/config/aarch64/aarch64.opt |   8 ++
> gcc/doc/invoke.texi|  19 
> gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c |  63 
> gcc/testsuite/gcc.target/aarch64/rsqrt.c   | 107 +
> 12 files changed, 463 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3432adb..3bf3098 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,24 @@
> +2015-07-31  Benedikt Huber  
> + Philipp Tomsich  
> +
> + * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and
> + rsqrtf.
> + * config/aarch64/aarch64-opts.h: -mrecip has a default value
> + depending on the core.
> + * config/aarch64/aarch64-protos.h: Declare.
> + * config/aarch64/aarch64-simd.md: Matching expressions for
> + frsqrte and frsqrts.
> + * config/aarch64/aarch64-tuning-flags.def: Added
> + MRECIP_DEFAULT_ENABLED.
> + * config/aarch64/aarch64.c: New functions. Emit rsqrt
> + estimation code in fast math mode.
> + * config/aarch64/aarch64.md: Added enum entries.
> + * config/aarch64/aarch64.opt: Added options -mrecip and
> + -mlow-precision-recip-sqrt.
> + * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
> + for frsqrte and frsqrts
> + * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt.
> +
> 2015-07-08  Jiong Wang  
> 
>   * config/aarch64/aarch64.c (aarch64_unspec_may_trap_p): New function.
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index b6c89b9..b4f443c 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -335,6 +335,11 @@ enum aarch64_builtins
>   AARCH64_BUILTIN_GET_FPSR,
>   AARCH64_BUILTIN_SET_FPSR,
> 
> +  AARCH64_BUILTIN_RSQRT_DF,
> +  AARCH64_BUILTIN_RSQRT_SF,
> +  AARCH64_BUILTIN_RSQRT_V2DF,
> +  AARCH64_BUILTIN_RSQRT_V2SF,
> +  AARCH64_BUILTIN_RSQRT_V4SF,
>   AARCH64_SIMD_BUILTIN_BASE,
>   AARCH64_SIMD_BUILTIN_LANE_CHECK,
> #include "aarch64-simd-builtins.def"
> @@ -824,6 +829,42 @@ aarch64_init_crc32_builtins ()
> }
> 
> void
> +aarch64_add_builtin_rsqrt (void)
> +{
> +  tree fndecl = NULL;
> +  tree ftype = NULL;
> +
> +  tree V2SF_type_node = build_vector_type (float_type_node, 2);
> +  tree V2DF_type_node = build_vector_type (double_type_node, 2);
> +  tree V4SF_type_node = build_vector_type (float_type_node, 4);
> +
> +  ftype = build_function_type_list (double_type_node, double_type_node, 
> NULL_TREE);
> +  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_df",
> +ftype, AARCH64_BUILTIN_RSQRT_DF, BUILT_IN_MD, NULL, NULL_TREE);
> +  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_DF] = fndecl;
> +
> +  ftype = build_function_type_list (float_type_node, float_type_node, 
> NULL_TREE);
> +  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_sf",
> +ftype, AARCH64_BUILTIN_RSQRT_SF, BUILT_IN_MD, NULL, NULL_TREE);
> +  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_SF] = fndecl;
> +
> +  ftype = build_function_type_list (V2DF_type_node, V2DF_type_node, 
> NULL_TREE);
> +  fndecl = add_builtin_function ("__builtin_aarch64_rs

Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Jeff Law

On 08/26/2015 05:13 AM, Ilya Enkovich wrote:

2015-08-26 0:42 GMT+03:00 Jeff Law :

On 08/21/2015 04:49 AM, Ilya Enkovich wrote:



I want work with bitmasks to be expressed in a natural way using
regular integer operations. Currently all mask manipulations are
emulated via vector statements (mostly using a bunch of vec_cond). For
complex predicates it may be nontrivial to transform them back to scalar
masks and get efficient code. Also the same vector may be used as
both a mask and an integer vector. Things become more complex if you
additionally have broadcasts and vector pack/unpack code. It also
should be transformed into scalar mask manipulations somehow.


Or why not model the conversion at the gimple level using a CONVERT_EXPR?
In fact, the more I think about it, that seems to make more sense to me.

We pick a canonical form for the mask, whatever it may be.  We use that
canonical form and model conversions between it and the other form via
CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
If it's not up to the task, we should really look into why and resolve.

Yes, that does mean we have two forms which I'm not terribly happy about and
it means some target dependencies on what the masked vector operation looks
like (ie, does it accept a simple integer or vector mask), but I'm starting
to wonder if, as distasteful as I find it, it's the right thing to do.


If we have some special representation for masks in GIMPLE then we
might not need any conversions. We could ask a target to define a MODE
for this type and use it directly everywhere: directly compare into
it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
that type is reserved for mask usage then your previous suggestion to
transform masks into a target-specific form at the GIMPLE->RTL phase should
work fine. This would allow supporting only a single mask
representation in GIMPLE.
Possibly, but you mentioned that you may need to use the masks in both 
forms depending on the exact context.  If so, then I think we need to 
model a conversion between the two forms.



Jeff



Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-26 Thread Alan Lawrence

Richard Biener wrote:


One extra question is: does the way we limit total scalarization work well
for arrays?  I suppose we have either sth like the maximum size of an
aggregate we scalarize or the maximum number of component accesses
we create?


Only the former and that would be kept intact.  It is in fact visible
in the context of the last hunk of the patch.


OK.  IIRC the gimplification code also has the latter and also considers 
zeroing the whole aggregate before initializing non-zero fields.  IMHO it makes 
sense to reuse some of the analysis and classification routines it has.


Do you mean gimplify_init_constructor? Yes, there's quite a lot of logic there 
;). That feels like a separate patch - and belonging to the constant-handling 
subseries of this series - as gimplify_init_constructor already deals with both 
record and array types, and I don't see anything there that's specifically good 
for total-scalarization of arrays?


IOW, do you mean that to block this patch, or can it be separate (I can address 
Martin + Jeff's comments fairly quickly and independently) ?


Cheers, Alan



Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Ilya Enkovich
2015-08-26 17:56 GMT+03:00 Richard Biener :
> On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich  wrote:
>> 2015-08-26 16:02 GMT+03:00 Richard Biener :
>>> On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich  
>>> wrote:
 2015-08-21 14:00 GMT+03:00 Richard Biener :
>
> Hmm, I don't see how vector masks are more difficult to operate with.

 There are just no instructions for that but you have to pretend you
 have to get code vectorized.
>>>
>>> Huh?  Bitwise ops should be readily available.
>>
>> Right bitwise ops are available, but there is no comparison into a
>> vector and no masked loads and stores using vector masks (when we
>> speak about 512-bit vectors).
>>
>>>
>
>> Also according to vector ABI integer mask should be used for mask
>> operand in case of masked vector call.
>
> What ABI?  The function signature of the intrinsics?  How would that
> come into play here?

 Not intrinsics. I mean OpenMP vector functions which require integer
 arg for a mask in case of 512-bit vector.
>>>
>>> How do you declare those?
>>
>> Something like this:
>>
>> #pragma omp declare simd inbranch
>> int foo(int*);
>
> The 'inbranch' is the thing that matters?  And all of foo is then
> implicitly predicated?

That's right. And a vector version of foo gets a mask as an additional arg.

>
>>>
>>> Well, you are missing the case of
>>>
>>>bool b = a < b;
>>>int x = (int)b;
>>
>> This case seems to require no changes and just be transformed into vec_cond.
>
> Ok, the example was too simple but I meant that a bool has a non-conditional
> use.

Right. In such cases I think it's reasonable to replace it with a
select similar to what we now have but without whole bool tree
transformed.

>
> Ok, so I still believe we don't want two ways to express things on GIMPLE if
> possible.  Yes, the vectorizer already creates only vector stmts that
> are supported
> by the hardware.  So it's a matter of deciding on the GIMPLE representation
> for the "mask".  I'd rather use vector (and the target assigning
> an integer
> mode to it) than an 'int' in GIMPLE statements.  Because that makes the
> type constraints on GIMPLE very weak and exposes those 'ints' to all kind
> of optimization passes.
>
> Thus if we change the result type requirement of vector comparisons from
> signed integer vectors to bool vectors the vectorizer can still go for
> promoting that bool vector to a vector of ints via a VEC_COND_EXPR
> and the expander can special-case that if the target has a vector comparison
> producing a vector mask.
>
> So, can you give that vector<bool> some thought?

Yes, I want to try it. But getting rid of bool patterns would mean
support for all targets currently supporting vec_cond. Would it be OK
to have vector mask co-exist with bool patterns for some time?
Thus first step would be to require vector<bool> for MASK_LOAD and
MASK_STORE and support it for i386 (the only user of MASK_LOAD and
MASK_STORE).

>Note that to assign
> sth else than a vector mode to it needs adjustments in stor-layout.c.
> I'm pretty sure we don't want vector BImodes.

I can directly build a vector type with specified mode to avoid it. Smth. like:

mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
mask_type = make_vector_type (bool_type_node, nunits, mask_mode);

Thanks,
Ilya

>
> Richard.
>


Re: [RFC 4/5] Handle constant-pool entries

2015-08-26 Thread Alan Lawrence

Jeff Law wrote:


The question I have is why this differs from the effects of patch #5. 
That would seem to indicate that there's things we're not getting into 
the candidate tables with this approach?!?


I'll answer this first, as I think (Richard and) Martin have identified enough 
other issues with this patch that will take longer to address but if you 
look at the context to the hunk in patch 5, it is iterating through the 
candidates (from patch 4), and then filtering out any candidates bigger than 
max-scalarization-size, which filtering patch 5 removes.


--Alan



[PATCH] Fix and simplify (Re: Fix libbacktrace -fPIC breakage from "Use libbacktrace in libgfortran")

2015-08-26 Thread Ulrich Weigand
Hans-Peter Nilsson wrote:

> I don't feel very confused, but I understand you've investigated
> things down to a point where we can conclude that libtool can't
> do what SPU needs without also at least fiddling with
> compilation options.

Well, looks like I was confused after all.  I missed one extra
feature of libtool that does indeed just make everything work
automatically: if a library is set up using the "noinst" flag,
libtool considers it a "convenience library" and will never
create a shared library in any case; but it will create two
sets of object files, one suitable for linking into a static
library and one suitable for linking into a shared library,
and will automatically use the correct set when linking any
other library against the "convenience library".

This is exactly what we want to happen for libbacktrace.  And
in fact, it is *already* set up as convenience library:
noinst_LTLIBRARIES = libbacktrace.la

This means the only thing we need to do is simply remove all
the special code: no more "disable-shared" and no more fiddling
with -fPIC (except for the --enable-host-shared case, which
remains special just like it does in all other libraries).

I've verified that this works on x86_64: the resulting
libgfortran.so uses the -fPIC version of the libbacktrace
object, while libgfortran.a uses the non-PIC versions.
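For reference, the convenience-library mechanism described above can be sketched as a minimal automake fragment (file and source names here are illustrative, not taken from the patch):

```makefile
# libbacktrace/Makefile.am: "noinst" makes this a libtool convenience
# library; libtool builds both PIC and non-PIC object sets for it and
# never installs a shared libbacktrace.
noinst_LTLIBRARIES = libbacktrace.la
libbacktrace_la_SOURCES = backtrace.c

# consumer's Makefile.am: linking against the .la file lets libtool
# pick the PIC objects for the shared library build and the non-PIC
# objects for the static archive automatically.
libgfortran_la_LIBADD = ../libbacktrace/libbacktrace.la
```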

On SPU, libtool will now automatically only generate the
non-PIC versions since the target does not support shared
libraries.  So everything works as expected.

OK for mainline?

Bye,
Ulrich

Index: libbacktrace/configure.ac
===================================================================
--- libbacktrace/configure.ac   (revision 227217)
+++ libbacktrace/configure.ac   (working copy)
@@ -79,7 +79,7 @@ case "$AWK" in
 "") AC_MSG_ERROR([can't build without awk]) ;;
 esac
 
-LT_INIT([disable-shared])
+LT_INIT
 AM_PROG_LIBTOOL
 
 backtrace_supported=yes
@@ -161,22 +161,11 @@ else
   fi
 fi
 
-# When building as a target library, shared libraries may want to link
-# this in.  We don't want to provide another shared library to
-# complicate dependencies.  Instead, we just compile with -fPIC, if
-# the target supports compiling with that option.
-PIC_FLAG=
-if test -n "${with_target_subdir}"; then
-  ac_save_CFLAGS="$CFLAGS"
-  CFLAGS="$CFLAGS -fPIC"
-  AC_TRY_COMPILE([], [], [PIC_FLAG=-fPIC])
-  CFLAGS="$ac_save_CFLAGS"
-fi
-# Similarly, use -fPIC with --enable-host-shared:
+# Enable --enable-host-shared.
 AC_ARG_ENABLE(host-shared,
 [AS_HELP_STRING([--enable-host-shared],
[build host code as shared libraries])],
-[PIC_FLAG=-fPIC], [])
+[PIC_FLAG=-fPIC], [PIC_FLAG=])
 AC_SUBST(PIC_FLAG)
 
 # Test for __sync support.

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Jakub Jelinek
On Wed, Aug 26, 2015 at 04:56:23PM +0200, Richard Biener wrote:
> >> How do you declare those?
> >
> > Something like this:
> >
> > #pragma omp declare simd inbranch
> > int foo(int*);
> 
> The 'inbranch' is the thing that matters?  And all of foo is then
> implicitly predicated?

If it is
#pragma omp declare simd notinbranch,
then only the non-predicated version is emitted and thus it is usable only
in vectorized loops inside of non-conditional contexts.
If it is
#pragma omp declare simd inbranch,
then only the predicated version is emitted, there is an extra argument
(either V*QI if I remember well, or for AVX-512 short/int/long bitmask),
if the caller wants to use it in non-conditional contexts, it just passes
all ones mask.  For
#pragma omp declare simd
(neither inbranch nor notinbranch), two versions are emitted, one predicated
and one non-predicated.

Jakub


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Richard Biener
On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich  wrote:
> 2015-08-26 16:02 GMT+03:00 Richard Biener :
>> On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich  
>> wrote:
>>> 2015-08-21 14:00 GMT+03:00 Richard Biener :

 Hmm, I don't see how vector masks are more difficult to operate with.
>>>
>>> There are just no instructions for that but you have to pretend you
>>> have to get code vectorized.
>>
>> Huh?  Bitwise ops should be readily available.
>
> Right bitwise ops are available, but there is no comparison into a
> vector and no masked loads and stores using vector masks (when we
> speak about 512-bit vectors).
>
>>

> Also according to vector ABI integer mask should be used for mask
> operand in case of masked vector call.

 What ABI?  The function signature of the intrinsics?  How would that
 come into play here?
>>>
>>> Not intrinsics. I mean OpenMP vector functions which require integer
>>> arg for a mask in case of 512-bit vector.
>>
>> How do you declare those?
>
> Something like this:
>
> #pragma omp declare simd inbranch
> int foo(int*);

The 'inbranch' is the thing that matters?  And all of foo is then
implicitly predicated?

>>

> Current implementation of masked loads, masked stores and bool
> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
> really call it a canonical representation for all targets?

 No idea - we'll revisit when another targets adds a similar capability.
>>>
>>> AVX-512 is such target. Current representation forces multiple scalar
>>> mask -> vector mask and back transformations which are artificially
>>> introduced by current bool patterns and are hard to optimize out.
>>
>> I dislike the bool patterns anyway and we should try to remove those
>> and make the vectorizer handle them in other ways (they have single-use
>> issues anyway).  I don't remember exactly what caused us to add them
>> but one reason was there wasn't a vector type for 'bool' (but I don't see how
>> it should be necessary to ask "get me a vector type for 'bool'").
>>

> Using scalar masks everywhere should probably cause the same conversion
> problem for SSE I listed above though.
>
> Talking about a canonical representation, shouldn't we use some
> special masks representation and not mixing it with integer and vector
> of integers then? Only in this case target would be able to
> efficiently expand it into a corresponding rtl.

 That was my idea of vector<bool> ... but I didn't explore it and see where
 it will cause issues.

 Fact is GCC already copes with vector masks generated by vector compares
 just fine everywhere and I'd rather leave it as that.
>>>
>>> Nope. Currently vector mask is obtained from a vec_cond <cmp, {0 ..
>>> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
>>> additional vec_cond. I don't think vectorizer ever generates vector
>>> comparison.
>>
>> Ok, well that's an implementation detail then.  Are you sure about AND and 
>> IOR?
>> The comment above vect_recog_bool_pattern says
>>
>> Assuming size of TYPE is the same as size of all comparisons
>> (otherwise some casts would be added where needed), the above
>> sequence we create related pattern stmts:
>> S1'  a_T = x1 CMP1 y1 ? 1 : 0;
>> S3'  c_T = x2 CMP2 y2 ? a_T : 0;
>> S4'  d_T = x3 CMP3 y3 ? 1 : 0;
>> S5'  e_T = c_T | d_T;
>> S6'  f_T = e_T;
>>
>> thus has vector mask |
>
> I think in practice it would look like:
>
> S4'  d_T = x3 CMP3 y3 ? 1 : c_T;
>
> Thus everything is usually hidden in vec_cond. But my concern is
> mostly about types used for that.
>
>>
>>> And I wouldn't say it's fine 'everywhere' because there is a single
>>> target utilizing them. Masked loads and stored for AVX-512 just don't
>>> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
>>> 512-bit vector then we get an ugly inefficient code. The question is
>>> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
>>> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
>>> int conversions applied everywhere even if target doesn't need it.
>>>
>>> If we don't want to support both types of masks in GIMPLE then it's
>>> more reasonable to make bool -> int conversion in expand for targets
>>> requiring it, rather than do it for everyone and then leave it to
>>> target to transform it back and try to get rid of all those redundant
>>> transformations. I'd give vector<bool> a chance to become a canonical
>>> mask representation for that.
>>
>> Well, you are missing the case of
>>
>>bool b = a < b;
>>int x = (int)b;
>
> This case seems to require no changes and just be transformed into vec_cond.

Ok, the example was too simple but I meant that a bool has a non-conditional
use.

Ok, so I still believe we don't want two ways to express things on GIMPLE if
possible.  Yes, the vectorizer already creates only vector stmts that
are supported
by the hardware.  So it's a matter of deciding on the GIMPLE representation
for the "mask".  I'd rather use vector<bool> (and the target assigning
an integer mode to it) than an 'int' in GIMPLE statements.  Because that
makes the type constraints on GIMPLE very weak and exposes those 'ints'
to all kind of optimization passes.

Thus if we change the result type requirement of vector comparisons from
signed integer vectors to bool vectors the vectorizer can still go for
promoting that bool vector to a vector of ints via a VEC_COND_EXPR
and the expander can special-case that if the target has a vector comparison
producing a vector mask.

So, can you give that vector<bool> some thought?  Note that to assign
sth else than a vector mode to it needs adjustments in stor-layout.c.
I'm pretty sure we don't want vector BImodes.

Richard.

[PATCH] s390: Add emit_barrier() after trap.

2015-08-26 Thread Dominik Vogt
This patch fixes an ICE on S390 when a trap is generated because
the given -mstack-size is too small.  A barrier was missing after
the trap, so on higher optimization levels a NULL pointer from an
uninitialized basic block was used.  The patch also contains a
test case.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog
 
* config/s390/s390.c (s390_emit_prologue): Add emit_barrier() after
trap to fix ICE.

gcc/testsuite/ChangeLog
 
* gcc.target/s390/20150826-1.c: New test.
From ec6b88cd51234d138bd559271def086156fcae07 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 26 Aug 2015 14:37:00 +0100
Subject: [PATCH] s390: Add emit_barrier() after trap.

---
 gcc/config/s390/s390.c |  1 +
 gcc/testsuite/gcc.target/s390/20150826-1.c | 11 +++
 2 files changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/20150826-1.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6366691..5951598 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -10491,6 +10491,7 @@ s390_emit_prologue (void)
 		   current_function_name(), cfun_frame_layout.frame_size,
 		   s390_stack_size);
 	  emit_insn (gen_trap ());
+	  emit_barrier ();
 	}
 	  else
 	{
diff --git a/gcc/testsuite/gcc.target/s390/20150826-1.c b/gcc/testsuite/gcc.target/s390/20150826-1.c
new file mode 100644
index 000..830772f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/20150826-1.c
@@ -0,0 +1,11 @@
+/* Check that -mstack-size=32 does not cause an ICE.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -mstack-size=32 -Wno-pointer-to-int-cast" } */
+
+extern char* bar(char *);
+int foo(void)
+{
+  char b[100];
+  return (int)bar(b);
+} /* { dg-warning "An unconditional trap is added" } */
-- 
2.3.0



[patch] libstdc++/64351 Ensure std::generate_canonical doesn't return 1.

2015-08-26 Thread Jonathan Wakely

Ed posted this patch to https://gcc.gnu.org/PR64351 in January, I've
tested it and am committing it to trunk with a test.

commit 45f154a5f9172a17f6226b99b41cb9c0bd8d15ec
Author: Jonathan Wakely 
Date:   Wed Aug 26 12:53:08 2015 +0100

Ensure std::generate_canonical doesn't return 1.

2015-08-26  Edward Smith-Rowland  <3dw...@verizon.net>
	Jonathan Wakely  

	PR libstdc++/64351
	PR libstdc++/63176
	* include/bits/random.tcc (generate_canonical): Loop until we get a
	result less than one.
	* testsuite/26_numerics/random/uniform_real_distribution/operators/
	64351.cc: New.

diff --git a/libstdc++-v3/include/bits/random.tcc b/libstdc++-v3/include/bits/random.tcc
index 4fdbcfc..a6d966b 100644
--- a/libstdc++-v3/include/bits/random.tcc
+++ b/libstdc++-v3/include/bits/random.tcc
@@ -3472,15 +3472,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const long double __r = static_cast<long double>(__urng.max())
			- static_cast<long double>(__urng.min()) + 1.0L;
   const size_t __log2r = std::log(__r) / std::log(2.0L);
-  size_t __k = std::max(1UL, (__b + __log2r - 1UL) / __log2r);
-  _RealType __sum = _RealType(0);
-  _RealType __tmp = _RealType(1);
-  for (; __k != 0; --__k)
+  const size_t __m = std::max(1UL,
+	  (__b + __log2r - 1UL) / __log2r);
+  _RealType __ret;
+  do
 	{
-	  __sum += _RealType(__urng() - __urng.min()) * __tmp;
-	  __tmp *= __r;
+	  _RealType __sum = _RealType(0);
+	  _RealType __tmp = _RealType(1);
+	  for (size_t __k = __m; __k != 0; --__k)
+	{
+	  __sum += _RealType(__urng() - __urng.min()) * __tmp;
+	  __tmp *= __r;
+	}
+	  __ret = __sum / __tmp;
 	}
-  return __sum / __tmp;
+  while (__builtin_expect(__ret >= _RealType(1), 0));
+  return __ret;
 }
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/testsuite/26_numerics/random/uniform_real_distribution/operators/64351.cc b/libstdc++-v3/testsuite/26_numerics/random/uniform_real_distribution/operators/64351.cc
new file mode 100644
index 000..3de4412
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/random/uniform_real_distribution/operators/64351.cc
@@ -0,0 +1,57 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+// { dg-do run { target { ! simulator } } }
+
+#include <random>
+#include <testsuite_hooks.h>
+
+// libstdc++/64351
+void
+test01()
+{
+  std::mt19937 rng(8890);
+  std::uniform_real_distribution<float> dist;
+
+  rng.discard(30e6);
+  for (long i = 0; i < 10e6; ++i)
+{
+  auto n = dist(rng);
+  VERIFY( n != 1.f );
+}
+}
+
+// libstdc++/63176
+void
+test02()
+{
+  std::mt19937 rng(8890);
+  std::seed_seq sequence{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
+  rng.seed(sequence);
+  rng.discard(12 * 629143 + 6);
+  float n =
+    std::generate_canonical<float, std::numeric_limits<float>::digits>(rng);
+  VERIFY( n != 1.f );
+}
+
+int
+main()
+{
+  test01();
+  test02();
+}


Re: [PATCH][4/N] Introduce new inline functions for GET_MODE_UNIT_SIZE and GET_MODE_UNIT_PRECISION

2015-08-26 Thread Oleg Endo

On 26 Aug 2015, at 23:27, Oleg Endo  wrote:

> 
> On 19 Aug 2015, at 22:35, Jeff Law  wrote:
> 
>> On 08/19/2015 06:29 AM, David Sherwood wrote:
 I asked Richard S. to give this a once-over which he did.  However, he
 technically can't approve due to the way his maintainership position was
 worded.
 
 The one request would be a function comment for emit_mode_unit_size and
 emit_mode_unit_precision.  OK with that change.
>>> Thanks. Here's a new patch with the comments added.
>>> 
>>> Good to go?
>>> David.
>>> 
>>> ChangeLog:
>>> 
>>> 2015-08-19  David Sherwood  
>>> 
>>> gcc/
>>> * genmodes.c (emit_mode_unit_size_inline): New function.
>>> (emit_mode_unit_precision_inline): New function.
>>> (emit_insn_modes_h): Emit new #define.  Emit new functions.
>>> (emit_mode_unit_size): New function.
>>> (emit_mode_unit_precision): New function.
>>> (emit_mode_adjustments): Add mode_unit_size adjustments.
>>> (emit_insn_modes_c): Emit new arrays.
>>> * machmode.h (GET_MODE_UNIT_SIZE, GET_MODE_UNIT_PRECISION): Update to
>>> use new inline methods.
>> 
>> Thanks, this is OK for the trunk.
> 
> It seems this broke sh-elf, at least when compiling on OSX with its native 
> clang.
> 
> ../../gcc-trunk/gcc/machmode.h:228:43: error: redefinition of 
> 'mode_unit_size' with a different type:
>  'const unsigned char [56]' vs 'unsigned char [56]'
> extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
>  ^
> ./insn-modes.h:417:24: note: previous definition is here
>  extern unsigned char mode_unit_size[NUM_MACHINE_MODES];
>   ^

This following fixes the problem for me:

Index: gcc/genmodes.c
===================================================================
--- gcc/genmodes.c  (revision 227221)
+++ gcc/genmodes.c  (working copy)
@@ -1063,7 +1063,7 @@
 unsigned char\n\
 mode_unit_size_inline (machine_mode mode)\n\
 {\n\
-  extern unsigned char mode_unit_size[NUM_MACHINE_MODES];\n\
+  extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];\n\
   switch (mode)\n\
 {");


Cheers,
Oleg

[v3 patch] Only set std::enable_shared_from_this member once.

2015-08-26 Thread Jonathan Wakely

This adds a check to weak_ptr::_M_assign() so that calling
__enable_shared_from_this_helper twice with the same pointer won't
change which shared_ptr object the weak_ptr shares ownership with.

On the lib reflector Peter Dimov convinced me that the
boost::enable_shared_from_this behaviour is preferable to what we do
now. I'm writing a proposal to specify this in the standard, but am
changing it now in our implementation.

Tested powerpc64le-linux, committing to trunk.
commit a1cd60820fb1af7f3396ff4b28e0e1d3449bfacb
Author: Jonathan Wakely 
Date:   Tue Aug 25 17:10:36 2015 +0100

Only set std::enable_shared_from_this member once.

	* include/bits/shared_ptr.h (__enable_shared_from_this_helper): Use
	nullptr.
	* include/bits/shared_ptr_base.h (weak_ptr::_M_assign): Don't assign
	if ownership is already shared with a shared_ptr object.
	(__enable_shared_from_this_helper): Use nullptr.
	* testsuite/20_util/enable_shared_from_this/members/const.cc: New.
	* testsuite/20_util/enable_shared_from_this/members/reinit.cc: New.
	* testsuite/20_util/enable_shared_from_this/requirements/
	explicit_instantiation.cc: Instantiate with const and incomplete types.

diff --git a/libstdc++-v3/include/bits/shared_ptr.h b/libstdc++-v3/include/bits/shared_ptr.h
index f96c078..2413b1b 100644
--- a/libstdc++-v3/include/bits/shared_ptr.h
+++ b/libstdc++-v3/include/bits/shared_ptr.h
@@ -588,7 +588,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
	 const enable_shared_from_this<_Tp1>* __pe,
 	 const _Tp1* __px) noexcept
 	{
-	  if (__pe != 0)
+	  if (__pe != nullptr)
 	__pe->_M_weak_assign(const_cast<_Tp1*>(__px), __pn);
 	}
 
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index aec10fe..820edcb 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -1468,8 +1468,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   _M_assign(_Tp* __ptr, const __shared_count<_Lp>& __refcount) noexcept
   {
-	_M_ptr = __ptr;
-	_M_refcount = __refcount;
+	if (use_count() == 0)
+	  {
+	_M_ptr = __ptr;
+	_M_refcount = __refcount;
+	  }
   }
 
   template friend class __shared_ptr;
@@ -1549,7 +1552,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
	 const __enable_shared_from_this<_Tp1>* __pe,
 	 const _Tp1* __px) noexcept
 	{
-	  if (__pe != 0)
+	  if (__pe != nullptr)
 	__pe->_M_weak_assign(const_cast<_Tp1*>(__px), __pn);
 	}
 
diff --git a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/const.cc b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/const.cc
new file mode 100644
index 000..fdf39c7
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/const.cc
@@ -0,0 +1,60 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+
+#include <memory>
+#include <testsuite_hooks.h>
+
+template<typename T, typename U>
+  bool
+  share_ownership(const std::shared_ptr<T>& p1, const std::shared_ptr<U>& p2)
+  {
+return !p1.owner_before(p2) && !p2.owner_before(p1);
+  }
+
+void
+test01()
+{
+  struct X : public std::enable_shared_from_this<X> { };
+  using CX = const X;
+  std::shared_ptr<CX> p(new X);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+  p.reset(new CX);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+  auto p2 = std::const_pointer_cast<X>(p)->shared_from_this();
+  VERIFY( share_ownership(p2, p) );
+}
+
+void
+test02()
+{
+  struct X;
+  using CX = const X;
+  struct X : public std::enable_shared_from_this<CX> { };
+  std::shared_ptr<CX> p(new X);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+  p.reset(new CX);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+}
+
+int
+main()
+{
+  test01();
+  test02();
+}
diff --git a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/reinit.cc b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/reinit.cc
new file mode 100644
index 000..3740db8
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/reinit.cc
@@ -0,0 +1,49 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// te

Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Ilya Enkovich
2015-08-26 16:02 GMT+03:00 Richard Biener :
> On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich  wrote:
>> 2015-08-21 14:00 GMT+03:00 Richard Biener :
>>>
>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>
>> There are just no instructions for that but you have to pretend you
>> have to get code vectorized.
>
> Huh?  Bitwise ops should be readily available.

Right bitwise ops are available, but there is no comparison into a
vector and no masked loads and stores using vector masks (when we
speak about 512-bit vectors).

>
>>>
 Also according to vector ABI integer mask should be used for mask
 operand in case of masked vector call.
>>>
>>> What ABI?  The function signature of the intrinsics?  How would that
>>> come into play here?
>>
>> Not intrinsics. I mean OpenMP vector functions which require integer
>> arg for a mask in case of 512-bit vector.
>
> How do you declare those?

Something like this:

#pragma omp declare simd inbranch
int foo(int*);

>
>>>
 Current implementation of masked loads, masked stores and bool
 patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
 really call it a canonical representation for all targets?
>>>
>>> No idea - we'll revisit when another targets adds a similar capability.
>>
>> AVX-512 is such target. Current representation forces multiple scalar
>> mask -> vector mask and back transformations which are artificially
>> introduced by current bool patterns and are hard to optimize out.
>
> I dislike the bool patterns anyway and we should try to remove those
> and make the vectorizer handle them in other ways (they have single-use
> issues anyway).  I don't remember exactly what caused us to add them
> but one reason was there wasn't a vector type for 'bool' (but I don't see how
> it should be necessary to ask "get me a vector type for 'bool'").
>
>>>
 Using scalar masks everywhere should probably cause the same conversion
 problem for SSE I listed above though.

 Talking about a canonical representation, shouldn't we use some
 special masks representation and not mixing it with integer and vector
 of integers then? Only in this case target would be able to
 efficiently expand it into a corresponding rtl.
>>>
>>> That was my idea of vector<bool> ... but I didn't explore it and see where
>>> it will cause issues.
>>>
>>> Fact is GCC already copes with vector masks generated by vector compares
>>> just fine everywhere and I'd rather leave it as that.
>>
>> Nope. Currently vector mask is obtained from a vec_cond <cmp, {0 ..
>> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
>> additional vec_cond. I don't think vectorizer ever generates vector
>> comparison.
>
> Ok, well that's an implementation detail then.  Are you sure about AND and 
> IOR?
> The comment above vect_recog_bool_pattern says
>
> Assuming size of TYPE is the same as size of all comparisons
> (otherwise some casts would be added where needed), the above
> sequence we create related pattern stmts:
> S1'  a_T = x1 CMP1 y1 ? 1 : 0;
> S3'  c_T = x2 CMP2 y2 ? a_T : 0;
> S4'  d_T = x3 CMP3 y3 ? 1 : 0;
> S5'  e_T = c_T | d_T;
> S6'  f_T = e_T;
>
> thus has vector mask |

I think in practice it would look like:

S4'  d_T = x3 CMP3 y3 ? 1 : c_T;

Thus everything is usually hidden in vec_cond. But my concern is
mostly about types used for that.

>
>> And I wouldn't say it's fine 'everywhere' because there is a single
>> target utilizing them. Masked loads and stored for AVX-512 just don't
>> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
>> 512-bit vector then we get an ugly inefficient code. The question is
>> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
>> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
>> int conversions applied everywhere even if target doesn't need it.
>>
>> If we don't want to support both types of masks in GIMPLE then it's
>> more reasonable to make bool -> int conversion in expand for targets
>> requiring it, rather than do it for everyone and then leave it to
>> target to transform it back and try to get rid of all those redundant
>> transformations. I'd give vector<bool> a chance to become a canonical
>> mask representation for that.
>
> Well, you are missing the case of
>
>bool b = a < b;
>int x = (int)b;

This case seems to require no changes and just be transformed into vec_cond.

Thanks,
Ilya

>
> where the bool is used as integer (and thus an integer mask would have to be
> "expanded").  When the bool is a mask in itself the integer use is either free
> or a matter of a widening/shortening operation.
>
> Richard.
>


Re: [libvtv] Update copyrights

2015-08-26 Thread Joseph Myers
On Wed, 26 Aug 2015, Rainer Orth wrote:

> While working on the Solaris libvtv port, I noticed that many of the
> libvtv copyright years hadn't been updated, were misformtted, or both.
> It turns out that libvtv isn't listed in contrib/update-copyright.py at
> all.  This patch fixes this and includes the result of running
> update-copyright.py --this-year libvtv.
> 
> I've neither added libvtv to self.default_dirs in the script nor added
> copyrights to the numerous files in libvtv that currently lack one.
> 
> Ok for mainline once it has survived regtesting?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Richard Biener
On Wed, Aug 26, 2015 at 3:35 PM, Jakub Jelinek  wrote:
> On Wed, Aug 26, 2015 at 03:21:52PM +0200, Richard Biener wrote:
>> On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek  wrote:
>> > On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
>> >> > AVX-512 is such target. Current representation forces multiple scalar
>> >> > mask -> vector mask and back transformations which are artificially
>> >> > introduced by current bool patterns and are hard to optimize out.
>> >>
>> >> I dislike the bool patterns anyway and we should try to remove those
>> >> and make the vectorizer handle them in other ways (they have single-use
>> >> issues anyway).  I don't remember exactly what caused us to add them
>> >> but one reason was there wasn't a vector type for 'bool' (but I don't see 
>> >> how
>> >> it should be necessary to ask "get me a vector type for 'bool'").
>> >
>> > That was just one of the reasons.  The other reason is that even if we 
>> > would
>> > choose some vector of integer type as vector of bool, the question is what
>> > type.  E.g. if you use vector of chars, you almost always get terrible
>> > vectorized code, except for the AVX-512 you really want an integral type
>> > that has the size of the types you are comparing.
>>
>> Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
>> first compute the vector type for the comparison itself (which is "fixed") 
>> and
>> thus we can compute the vector type of any bitwise op on it as well.
>
> Sure, but if you then immediately vector narrow it to a V*QI vector because
> it is stored originally into a bool/_Bool variable, and then again when it
> is used in say a COND_EXPR widen it again, you get really poor code.
> So, what the bool pattern code does is kind of poor man's type
> promotion/demotion pass for bool only, at least for the common cases.

Yeah, I just looked at the code but in the end everything should be fixable
in the place we compute STMT_VINFO_VECTYPE.  The code just
looks at the LHS type plus at the narrowest type (for vectorization factor).
It should get re-structured to get the vector types from the operands
(much like code-generation will eventually fall back to).

> PR50596 has been the primary reason to introduce the bool patterns.
> If there is a better type promotion/demotion pass on a copy of the loop,
> sure, we can get rid of it (but figure out also what to do for SLP).

Yeah, of course.  Basic-block SLP just asks for the vectype during SLP
analysis AFAIK.

I suppose we want sth like get_result_vectype (gimple) which can look
at operands as well and can be used from both places.

After all we do want to fix the non-single-use issue somehow and getting
rid of the patterns sounds good to me anyway...

Not sure if I can get to the above for GCC 6, but at least putting it on my
TODO...

Richard.

> Jakub


[PATCH][AArch64 array_mode 8/8] Add d-registers to TARGET_ARRAY_MODE_SUPPORTED_P

2015-08-26 Thread Alan Lawrence
This adds an AARCH64_VALID_SIMD_DREG_MODE exactly paralleling the existing
...QREG... macro, and as a drive-by fixes mode->(MODE) in the latter.

The new test now compiles (at -O3) to:

test_1:
add v1.2s, v1.2s, v5.2s
add v2.2s, v2.2s, v6.2s
add v3.2s, v3.2s, v7.2s
add v0.2s, v0.2s, v4.2s
ret

Whereas prior to this patch we got:

test_1:
add v0.2s, v0.2s, v4.2s
sub sp, sp, #160
add v1.2s, v1.2s, v5.2s
add v2.2s, v2.2s, v6.2s
add v3.2s, v3.2s, v7.2s
str d0, [sp, 96]
str d1, [sp, 104]
str d2, [sp, 112]
str d3, [sp, 120]
ldp x2, x3, [sp, 96]
stp x2, x3, [sp, 128]
ldp x0, x1, [sp, 112]
stp x0, x1, [sp, 144]
ldr d1, [sp, 136]
ldr d0, [sp, 128]
ldr d2, [sp, 144]
ldr d3, [sp, 152]
add sp, sp, 160
ret

I've tried to look for (the absence of) this extra code in a number of ways:
all three scan-*-not tests previously failed (i.e. the regexes matched) but now
pass.

bootstrapped and check-gcc on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64.h (AARCH64_VALID_SIMD_DREG_MODE): New.
(AARCH64_VALID_SIMD_QREG_MODE): Correct mode->MODE.

* config/aarch64/aarch64.c (aarch64_array_mode_supported_p): Add
AARCH64_VALID_SIMD_DREG_MODE.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-int32x2x4_1.c: New.
---
 gcc/config/aarch64/aarch64.c   |  3 ++-
 gcc/config/aarch64/aarch64.h   |  7 ++-
 .../gcc.target/aarch64/vect-int32x2x4_1.c  | 22 ++
 3 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a923b55..d2ea7f6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -650,7 +650,8 @@ aarch64_array_mode_supported_p (machine_mode mode,
unsigned HOST_WIDE_INT nelems)
 {
   if (TARGET_SIMD
-  && AARCH64_VALID_SIMD_QREG_MODE (mode)
+  && (AARCH64_VALID_SIMD_QREG_MODE (mode)
+ || AARCH64_VALID_SIMD_DREG_MODE (mode))
   && (nelems >= 2 && nelems <= 4))
 return true;
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 3851564..d1ba00b 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -915,10 +915,15 @@ extern enum aarch64_code_model aarch64_cmodel;
   (aarch64_cmodel == AARCH64_CMODEL_TINY   \
|| aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
 
+/* Modes valid for AdvSIMD D registers, i.e. that fit in half a Q register.  */
+#define AARCH64_VALID_SIMD_DREG_MODE(MODE) \
+  ((MODE) == V2SImode || (MODE) == V4HImode || (MODE) == V8QImode \
+   || (MODE) == V2SFmode || (MODE) == DImode || (MODE) == DFmode)
+
 /* Modes valid for AdvSIMD Q registers.  */
 #define AARCH64_VALID_SIMD_QREG_MODE(MODE) \
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
-   || (MODE) == V4SFmode || (MODE) == V2DImode || mode == V2DFmode)
+   || (MODE) == V4SFmode || (MODE) == V2DImode || (MODE) == V2DFmode)
 
 #define ENDIAN_LANE_N(mode, n)  \
   (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c 
b/gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c
new file mode 100644
index 000..734cfd6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-expand" } */
+
+#include 
+
+uint32x2x4_t
+test_1 (uint32x2x4_t a, uint32x2x4_t b)
+{
+   uint32x2x4_t result;
+
+   for (unsigned index = 0; index < 4; ++index)
+ result.val[index] = a.val[index] + b.val[index];
+
+   return result;
+}
+
+/* Should not use the stack in expand.  */
+/* { dg-final { scan-rtl-dump-not "virtual-stack-vars" "expand" } } */
+/* Should not have to modify the stack pointer.  */
+/* { dg-final { scan-assembler-not "\t(add|sub).*sp" } } */
+/* Should not have to store or load anything.  */
+/* { dg-final { scan-assembler-not "\t(ld|st)\[rp\]" } } */
-- 
1.8.3



Re: [PATCH][4/N] Introduce new inline functions for GET_MODE_UNIT_SIZE and GET_MODE_UNIT_PRECISION

2015-08-26 Thread Oleg Endo

On 19 Aug 2015, at 22:35, Jeff Law  wrote:

> On 08/19/2015 06:29 AM, David Sherwood wrote:
>>> I asked Richard S. to give this a once-over which he did.  However, he
>>> technically can't approve due to the way his maintainership position was
>>> worded.
>>> 
>>> The one request would be a function comment for emit_mode_unit_size and
>>> emit_mode_unit_precision.  OK with that change.
>> Thanks. Here's a new patch with the comments added.
>> 
>> Good to go?
>> David.
>> 
>> ChangeLog:
>> 
>> 2015-08-19  David Sherwood  
>> 
>>  gcc/
>>  * genmodes.c (emit_mode_unit_size_inline): New function.
>>  (emit_mode_unit_precision_inline): New function.
>>  (emit_insn_modes_h): Emit new #define.  Emit new functions.
>>  (emit_mode_unit_size): New function.
>>  (emit_mode_unit_precision): New function.
>>  (emit_mode_adjustments): Add mode_unit_size adjustments.
>>  (emit_insn_modes_c): Emit new arrays.
>>  * machmode.h (GET_MODE_UNIT_SIZE, GET_MODE_UNIT_PRECISION): Update to
>>  use new inline methods.
> 
> Thanks, this is OK for the trunk.

It seems this broke sh-elf, at least when compiling on OSX with its native 
clang.

../../gcc-trunk/gcc/machmode.h:228:43: error: redefinition of 'mode_unit_size' 
with a different type:
  'const unsigned char [56]' vs 'unsigned char [56]'
extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
  ^
./insn-modes.h:417:24: note: previous definition is here
  extern unsigned char mode_unit_size[NUM_MACHINE_MODES];
   ^
Cheers,
Oleg

Re: [gomp4] teach the tracer pass to ignore more blocks for OpenACC

2015-08-26 Thread Nathan Sidwell

On 08/26/15 09:57, Cesar Philippidis wrote:

I hit a problem in one of my reduction test cases where the
GOACC_JOIN was getting cloned. Nvptx requires FORK and JOIN to be
single-entry, single-exit regions, or some form of thread divergence may
occur. When that happens, we cannot use the shfl instruction for
reductions or broadcasting (if the warp is divergent), and it may cause
problems with synchronization in general.

Nathan ran into a similar problem in one of the ssa passes when he added
support for predication in the nvptx backend. Part of his solution was
to add a gimple_call_internal_unique_p function to determine if internal
functions are safe to be cloned. This patch teaches the tracer to scan
each basic block for internal function calls using
gimple_call_internal_unique_p, and mark the blocks that contain certain
OpenACC internal function calls as ignored. It is a shame that
gimple_statement_iterators do not play nicely with const_basic_block.

Is this patch ok for gomp-4_0-branch?


ok by me.  (I idly wonder if tracer should be using the routine that 
jump-threading has for scanning a block to determine duplicability.)


nathan

--
Nathan Sidwell - Director, Sourcery Services - Mentor Embedded


Re: [RFC 4/5] Handle constant-pool entries

2015-08-26 Thread Martin Jambor
Hi,

On Tue, Aug 25, 2015 at 12:06:16PM +0100, Alan Lawrence wrote:
> This makes SRA replace loads of records/arrays from constant pool entries,
> with elementwise assignments of the constant values, hence, overcoming the
> fundamental problem in PR/63679.
> 
> As a first pass, the approach I took was to look for constant-pool loads as
> we scanned through other accesses, and add them as candidates there; to build 
> a
> constant replacement_decl for any such accesses in completely_scalarize; and 
> to
> use any existing replacement_decl rather than creating a variable in
> create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, 
> but
> that does not allow addresses of labels, which can still end up in the 
> constant
> pool.)
> 
> Feedback as to the approach or how it might be better structured / fitted into
> SRA, is solicited ;).

I'm not familiar with constant pools very much, but I'll try:

> 
> Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
> arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly 
> increases the number of testcases in which this code is exercised!
> 
> Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using 
> a stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, 
> avr, and sh.
> 
> gcc/ChangeLog:
> 
>   * tree-sra.c (create_access): Scan for uses of constant pool and add
>   to candidates.
>   (subst_initial): New.
>   (scalarize_elem): Build replacement_decl using subst_initial.
>   (create_access_replacement): Use replacement_decl if set.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
>   sra-max-scalarization-size-Ospeed.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
>  gcc/tree-sra.c| 56 
> +--
>  2 files changed, 55 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> index 9eccdc9..b13d583 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
> +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized 
> --param sra-max-scalarization-size-Ospeed=32" } */
>  
>  int
>  foo ()
> @@ -17,7 +17,4 @@ foo ()
>  /* After late unrolling the above loop completely DOM should be
> able to optimize this to return 28.  */
>  
> -/* See PR63679 and PR64159, if the target forces the initializer to memory 
> then
> -   DOM is not able to perform this optimization.  */
> -
> -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* 
> alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
> +/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index af35fcc..a3ff2df 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
>else
>  ptr = false;
>  
> +  /* FORNOW: scan for uses of constant pool as we go along.  */
> +  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
> +  && !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
> +{
> +  gcc_assert (!write);
> +  bitmap_set_bit (candidate_bitmap, DECL_UID (base));
> +  tree_node **slot = candidates->find_slot_with_hash (base, DECL_UID 
> (base),
> +   INSERT);
> +  *slot = base;
> +}
> +

I believe you only want to do this if (sra_mode ==
SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA).

The idea of candidates is that we gather them in find_var_candidates
and ten we only eliminate them, this has the benefit of not worrying
about disqualifying a candidate and then erroneously re-adding it
later.  So if you could find a way to structure your code this way, I'd
much happier.  If it is impossible without traversing the whole
function just for that purpose, we may need some mechanism to prevent
us from making a disqualified decl a candidate again.  Or, if we come
to the conclusion that constant pool decls do not ever get
disqualified, a gcc_assert making sure it actually does not happen in
disqualify_candidate.

And of course at find_var_candidates time we check that all candidates
pass simple checks in maybe_add_sra_candidate.  I suppose many of them
do not make sense for constant pool decls but at least please have a
look whether that is the case for all of them or not.

>if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
>  return NULL;
>  
> @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, 
> HOST_WIDE_INT offset, tree ref)
>  }
>  }
>  
> +static tree
> +subst_initial (tree

[AArch64/testsuite] Add more TLS local executable testcases

2015-08-26 Thread Jiong Wang

This patch covers tlsle tiny-model tests; TLS size truncation tests for the
tiny & small models are included as well.

All testcases pass native test.

OK for trunk?

2015-08-26  Jiong Wang  

gcc/testsuite/
  * gcc.target/aarch64/tlsle12_tiny_1.c: New testcase for tiny model.
  * gcc.target/aarch64/tlsle24_tiny_1.c: Likewise.
  * gcc.target/aarch64/tlsle_sizeadj_tiny_1.c: TLS size truncation test
  for tiny model.
  * gcc.target/aarch64/tlsle_sizeadj_small_1.c: TLS size truncation test
  for small model.
  
-- 
Regards,
Jiong

Index: gcc/testsuite/gcc.target/aarch64/tlsle12_tiny_1.c
===
--- gcc/testsuite/gcc.target/aarch64/tlsle12_tiny_1.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/tlsle12_tiny_1.c	(working copy)
@@ -0,0 +1,8 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options "-O2 -fpic -ftls-model=local-exec -mtls-size=12 -mcmodel=tiny --save-temps" } */
+
+#include "tls_1.x"
+
+/* { dg-final { scan-assembler-times "#:tprel_lo12" 2 } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/tlsle24_tiny_1.c
===
--- gcc/testsuite/gcc.target/aarch64/tlsle24_tiny_1.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/tlsle24_tiny_1.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options "-O2 -fpic -ftls-model=local-exec -mtls-size=24 -mcmodel=tiny --save-temps" } */
+
+#include "tls_1.x"
+
+/* { dg-final { scan-assembler-times "#:tprel_lo12_nc" 2 } } */
+/* { dg-final { scan-assembler-times "#:tprel_hi12" 2 } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_small_1.c
===
--- gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_small_1.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_small_1.c	(working copy)
@@ -0,0 +1,10 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-require-effective-target aarch64_tlsle32 } */
+/* { dg-options "-O2 -fpic -ftls-model=local-exec -mtls-size=48 --save-temps" } */
+
+#include "tls_1.x"
+
+/* { dg-final { scan-assembler-times "#:tprel_g1" 2 } } */
+/* { dg-final { scan-assembler-times "#:tprel_g0_nc" 2 } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_tiny_1.c
===
--- gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_tiny_1.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_tiny_1.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options "-O2 -fpic -ftls-model=local-exec -mtls-size=32 -mcmodel=tiny --save-temps" } */
+
+#include "tls_1.x"
+
+/* { dg-final { scan-assembler-times "#:tprel_lo12_nc" 2 } } */
+/* { dg-final { scan-assembler-times "#:tprel_hi12" 2 } } */
+/* { dg-final { cleanup-saved-temps } } */
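
The shared tls_1.x file is not shown in this patch. As a hedged sketch only (variable and function names hypothetical, not from the actual testsuite file), the kind of local-exec thread-local access these tests exercise looks like:

```c
/* Minimal local-exec-style TLS access: with -ftls-model=local-exec the
   compiler addresses `counter' directly off the thread pointer, using
   the tprel_* relocations the scan-assembler directives look for.  */
__thread int counter;

int
bump (int n)
{
  counter += n;
  return counter;
}
```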


Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack

2015-08-26 Thread Lynn A. Boger

I am working on a new patch to address some of the previous concerns
and plan to post it soon after some final testing.

On 08/25/2015 05:51 PM, Ian Lance Taylor wrote:

On Tue, Aug 18, 2015 at 1:36 PM, Lynn A. Boger
 wrote:

libgo/
 PR target/66870
 configure.ac:  When gccgo for building libgo uses the gold version
 containing split stack support on ppc64, ppc64le, define
 LINKER_SUPPORTS_SPLIT_STACK.
 configure:  Regenerate.

Your version test for gold isn't robust: if the major version >= 3,
then presumably split stack is supported.  And since you have numbers,
I suggest not trying to use switch, but instead writing something like
 if expr "$gold_minor" == 25; then
 ...
 elif expr "$gold_minor" > 25; then
 ...
 fi

If that is fixed, I'm fine with the libgo part of this patch.

Ian
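
The shape Ian suggests can be sketched as a small shell fragment. This is a hedged sketch under stated assumptions (variable names are hypothetical, and the example hardcodes a version string where real configure code would query ld.gold): split-stack support for ppc64 appeared in gold 1.25, so accept minor >= 25 within the 1.x series and assume any later major version also has it.

```shell
# Example value; real configure code would extract this from the
# gold --version output.
gold_version="1.25"
gold_major=`echo "$gold_version" | sed -e 's/\..*$//'`
gold_minor=`echo "$gold_version" | sed -e 's/^.*\.//'`

# Note: expr's comparison operators must be quoted/escaped in shell.
if expr "$gold_major" \> 1 > /dev/null; then
  supports_split_stack=yes
elif expr "$gold_minor" \>= 25 > /dev/null; then
  supports_split_stack=yes
else
  supports_split_stack=no
fi
```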






[gomp4] teach the tracer pass to ignore more blocks for OpenACC

2015-08-26 Thread Cesar Philippidis
I hit a problem in one of my reduction test cases where the
GOACC_JOIN was getting cloned. Nvptx requires FORK and JOIN to be
single-entry, single-exit regions, or some form of thread divergence may
occur. When that happens, we cannot use the shfl instruction for
reductions or broadcasting (if the warp is divergent), and it may cause
problems with synchronization in general.

Nathan ran into a similar problem in one of the ssa passes when he added
support for predication in the nvptx backend. Part of his solution was
to add a gimple_call_internal_unique_p function to determine if internal
functions are safe to be cloned. This patch teaches the tracer to scan
each basic block for internal function calls using
gimple_call_internal_unique_p, and mark the blocks that contain certain
OpenACC internal function calls as ignored. It is a shame that
gimple_statement_iterators do not play nicely with const_basic_block.

Is this patch ok for gomp-4_0-branch?

Cesar
2015-08-25  Cesar Philippidis  

	gcc/
	* tracer.c (ignore_bb_p): Change bb argument from const_basic_block
	to basic_block.  Check for non-clonable calls to internal functions.


diff --git a/gcc/tracer.c b/gcc/tracer.c
index cad7ab1..f20c158 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -58,7 +58,7 @@
 #include "fibonacci_heap.h"
 
 static int count_insns (basic_block);
-static bool ignore_bb_p (const_basic_block);
+static bool ignore_bb_p (basic_block);
 static bool better_p (const_edge, const_edge);
 static edge find_best_successor (basic_block);
 static edge find_best_predecessor (basic_block);
@@ -91,8 +91,9 @@ bb_seen_p (basic_block bb)
 
 /* Return true if we should ignore the basic block for purposes of tracing.  */
 static bool
-ignore_bb_p (const_basic_block bb)
+ignore_bb_p (basic_block bb)
 {
+  gimple_stmt_iterator gsi;
   gimple g;
 
   if (bb->index < NUM_FIXED_BLOCKS)
@@ -106,6 +107,16 @@ ignore_bb_p (const_basic_block bb)
   if (g && gimple_code (g) == GIMPLE_TRANSACTION)
 return true;
 
+  /* Ignore blocks containing non-clonable function calls.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  g = gsi_stmt (gsi);
+
+  if (is_gimple_call (g) && gimple_call_internal_p (g)
+	  && gimple_call_internal_unique_p (g))
+	return true;
+}
+
   return false;
 }
 


Re: [PATCH], PowerPC IEEE 128-bit patch #6

2015-08-26 Thread David Edelsohn
On Fri, Aug 14, 2015 at 11:47 AM, Michael Meissner
 wrote:

> This is patch #6:
>
> 2015-08-13  Michael Meissner  
>
> * config/rs6000/rs6000-protos.h (rs6000_expand_float128_convert):
> Add declaration.
>
> * config/rs6000/rs6000.c (rs6000_emit_le_vsx_store): Fix a
> comment.
> (rs6000_cannot_change_mode_class): Add support for IEEE 128-bit
> floating point in VSX registers.
> (rs6000_output_move_128bit): Always print out the set insn if we
> can't generate an appropriate 128-bit move.
> (rs6000_generate_compare): Add support for IEEE 128-bit floating
> point in VSX registers comparisons.
> (rs6000_expand_float128_convert): Likewise.
>
> * config/rs6000/rs6000.md (extenddftf2): Add support for IEEE
> 128-bit floating point in VSX registers.
> (extenddftf2_internal): Likewise.
> (trunctfdf2): Likewise.
> (trunctfdf2_internal2): Likewise.
> (fix_trunc_helper): Likewise.
> (fix_trunctfdi2"): Likewise.
> (floatditf2): Likewise.
> (floatunstf2): Likewise.
> (extend2): Likewise.
> (trunc2): Likewise.
> (fix_trunc2): Likewise.
> (fixuns_trunc2): Likewise.
> (float2): Likewise.
> (floatuns2): Likewise.

This patch is okay.

Thanks, David


[PATCH][AArch64 array_mode 7/8] Combine the expanders using VSTRUCT:nregs

2015-08-26 Thread Alan Lawrence
The previous patches leave ld[234]_lane, st[234]_lane, and ld[234]r expanders 
all nearly identical, so we can easily parameterize across the number of lanes 
and combine them.

For the ld_lane pattern, I switched from the VCONQ attribute to 
just using the MODE attribute; this is identical for all the Q-register modes 
over which we iterate.

bootstrapped and check-gcc on aarch64-none-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_ld2r,
aarch64_ld3r, aarch64_ld4r): Combine together, making...
(aarch64_simd_ldr): ...this.

(aarch64_ld2_lane, aarch64_ld3_lane,
aarch64_ld4_lane): Combine together, making...
(aarch64_ld_lane): ...this.

(aarch64_st2_lane, aarch64_st3_lane,
aarch64_st4_lane): Combine together, making...
(aarch64_st_lane): ...this.
---
 gcc/config/aarch64/aarch64-simd.md | 144 ++---
 1 file changed, 21 insertions(+), 123 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index f938754..38c4210 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4349,42 +4349,18 @@
 FAIL;
 })
 
-(define_expand "aarch64_ld2r"
-  [(match_operand:OI 0 "register_operand" "=w")
+(define_expand "aarch64_ldr"
+  [(match_operand:VSTRUCT 0 "register_operand" "=w")
(match_operand:DI 1 "register_operand" "w")
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_SIMD"
 {
   rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
-  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 2);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode))
+* );
 
-  emit_insn (gen_aarch64_simd_ld2r (operands[0], mem));
-  DONE;
-})
-
-(define_expand "aarch64_ld3r"
-  [(match_operand:CI 0 "register_operand" "=w")
-   (match_operand:DI 1 "register_operand" "w")
-   (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_SIMD"
-{
-  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
-  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 3);
-
-  emit_insn (gen_aarch64_simd_ld3r (operands[0], mem));
-  DONE;
-})
-
-(define_expand "aarch64_ld4r"
-  [(match_operand:XI 0 "register_operand" "=w")
-   (match_operand:DI 1 "register_operand" "w")
-   (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_SIMD"
-{
-  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
-  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 4);
-
-  emit_insn (gen_aarch64_simd_ld4r (operands[0],mem));
+  emit_insn (gen_aarch64_simd_ldr (operands[0],
+   mem));
   DONE;
 })
 
@@ -4561,67 +4537,25 @@
   DONE;
 })
 
-(define_expand "aarch64_ld2_lane"
-  [(match_operand:OI 0 "register_operand" "=w")
+(define_expand "aarch64_ld_lane"
+  [(match_operand:VSTRUCT 0 "register_operand" "=w")
(match_operand:DI 1 "register_operand" "w")
-   (match_operand:OI 2 "register_operand" "0")
+   (match_operand:VSTRUCT 2 "register_operand" "0")
(match_operand:SI 3 "immediate_operand" "i")
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_SIMD"
 {
   rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
-  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 2);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode))
+* );
 
-  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode),
+  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode),
NULL);
-  emit_insn (gen_aarch64_vec_load_lanesoi_lane (operands[0],
- mem,
- operands[2],
- operands[3]));
+  emit_insn (gen_aarch64_vec_load_lanes_lane (
+   operands[0], mem, operands[2], operands[3]));
   DONE;
 })
 
-(define_expand "aarch64_ld3_lane"
-  [(match_operand:CI 0 "register_operand" "=w")
-   (match_operand:DI 1 "register_operand" "w")
-   (match_operand:CI 2 "register_operand" "0")
-   (match_operand:SI 3 "immediate_operand" "i")
-   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_SIMD"
-{
-  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
-  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 3);
-
-  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode),
-   NULL);
-  emit_insn (gen_aarch64_vec_load_lanesci_lane (operands[0],
- mem,
- operands[2],
- operands[3]));
-  DONE;
-})
-
-(define_expand "aarch64_ld4_lane"
-  [(match_operand:XI 0 "register_operand" "=w")
-   (match_operand:DI 1 "register_operand" "w")
-   (match_operand:XI 2 "register_operand" "0")
-   (match_operand:SI 3 "immediate_operand" "i

[PATCH][AArch64 array_mode 5/8] Remove V_FOUR_ELEM, again using BLKmode + set_mem_size.

2015-08-26 Thread Alan Lawrence
This removes V_FOUR_ELEM in the same way that patch 3 removed V_THREE_ELEM,
again using BLKmode + set_mem_size. (This makes the four-lane expanders very
similar to the three-lane expanders, and they will be combined in patch 7.)

bootstrapped and check-gcc on aarch64-none-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_simd_ld4r,
aarch64_vec_load_lanesxi_lane,
aarch64_vec_store_lanesxi_lane to BLK.
(aarch64_ld4r, aarch64_ld4_lane,
aarch64_st4_lane): Generate MEM rtx with BLKmode, call
set_mem_size.

* config/aarch64/iterators.md (V_FOUR_ELEM): Remove.
---
 gcc/config/aarch64/aarch64-simd.md | 25 +
 gcc/config/aarch64/iterators.md|  9 -
 2 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 156fc4f..68182d6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4096,7 +4096,7 @@
 
 (define_insn "aarch64_simd_ld4r"
   [(set (match_operand:XI 0 "register_operand" "=w")
-   (unspec:XI [(match_operand: 1 
"aarch64_simd_struct_operand" "Utv")
+   (unspec:XI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
   UNSPEC_LD4_DUP))]
   "TARGET_SIMD"
@@ -4106,7 +4106,7 @@
 
 (define_insn "aarch64_vec_load_lanesxi_lane"
   [(set (match_operand:XI 0 "register_operand" "=w")
-   (unspec:XI [(match_operand: 1 
"aarch64_simd_struct_operand" "Utv")
+   (unspec:XI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
(match_operand:XI 2 "register_operand" "0")
(match_operand:SI 3 "immediate_operand" "i")
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
@@ -4147,10 +4147,10 @@
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
 (define_insn "aarch64_vec_store_lanesxi_lane"
-  [(set (match_operand: 0 "aarch64_simd_struct_operand" "=Utv")
-   (unspec: [(match_operand:XI 1 "register_operand" "w")
-(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
-   (match_operand:SI 2 "immediate_operand" "i")]
+  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
+   (unspec:BLK [(match_operand:XI 1 "register_operand" "w")
+(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
+(match_operand:SI 2 "immediate_operand" "i")]
UNSPEC_ST4_LANE))]
   "TARGET_SIMD"
   {
@@ -4381,8 +4381,8 @@
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_SIMD"
 {
-  machine_mode mode = mode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 4);
 
   emit_insn (gen_aarch64_simd_ld4r (operands[0],mem));
   DONE;
@@ -4609,8 +4609,8 @@
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_SIMD"
 {
-  machine_mode mode = mode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 4);
 
   aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode),
NULL);
@@ -4892,8 +4892,9 @@
   (match_operand:SI 2 "immediate_operand")]
   "TARGET_SIMD"
 {
-  machine_mode mode = mode;
-  rtx mem = gen_rtx_MEM (mode, operands[0]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[0]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 4);
+
   operands[2] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[2])));
 
   emit_insn (gen_aarch64_vec_store_lanesxi_lane (mem,
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index ae0be0b..9535b7f 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -568,15 +568,6 @@
   (V2SF "V2SF") (V4SF "V2SF")
   (DF "V2DI")   (V2DF "V2DI")])
 
-;; Similar, for four elements.
-(define_mode_attr V_FOUR_ELEM [(V8QI "SI")   (V16QI "SI")
-   (V4HI "V4HI") (V8HI "V4HI")
-   (V2SI "V4SI") (V4SI "V4SI")
-   (DI "OI") (V2DI "OI")
-   (V2SF "V4SF") (V4SF "V4SF")
-   (DF "OI") (V2DF "OI")])
-
-
 ;; Mode for atomic operation suffixes
 (define_mode_attr atomic_sfx
   [(QI "b") (HI "h") (SI "") (DI "")])
-- 
1.8.3



[PATCH][AArch64 array_mode 3/8] Stop using EImode in aarch64-simd.md and iterators.md

2015-08-26 Thread Alan Lawrence
The V_THREE_ELEM attribute used BLKmode for most sizes, but occasionally
EImode. This patch changes to BLKmode in all cases, explicitly setting
memory size (thus, preserving size for the cases that were EImode, and
setting size for the first time for cases that were already BLKmode).

The patterns affected are only for intrinsics: the aarch64_ld3r
expanders and aarch64_simd_ld3r insns, and the
aarch64_vec_{load,store}_lanesci_lane insns used by the
aarch64_{ld,st}3_lane expanders.

bootstrapped and check-gcc on aarch64-none-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_simd_ld3r,
aarch64_vec_load_lanesci_lane,
aarch64_vec_store_lanesci_lane): Change operand mode
from  to BLK.

(aarch64_ld3r, aarch64_ld3_lane,
aarch64_st3_lane): Generate MEM rtx with BLKmode, call
set_mem_size.

* config/aarch64/iterators.md (V_THREE_ELEM): Remove.
---
 gcc/config/aarch64/aarch64-simd.md | 27 ++-
 gcc/config/aarch64/iterators.md|  8 
 2 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 7b7a1b8..156fc4f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4001,7 +4001,7 @@
 
 (define_insn "aarch64_simd_ld3r"
   [(set (match_operand:CI 0 "register_operand" "=w")
-   (unspec:CI [(match_operand: 1 
"aarch64_simd_struct_operand" "Utv")
+   (unspec:CI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
   UNSPEC_LD3_DUP))]
   "TARGET_SIMD"
@@ -4011,7 +4011,7 @@
 
 (define_insn "aarch64_vec_load_lanesci_lane"
   [(set (match_operand:CI 0 "register_operand" "=w")
-   (unspec:CI [(match_operand: 1 
"aarch64_simd_struct_operand" "Utv")
+   (unspec:CI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
(match_operand:CI 2 "register_operand" "0")
(match_operand:SI 3 "immediate_operand" "i")
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
@@ -4052,11 +4052,11 @@
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
 (define_insn "aarch64_vec_store_lanesci_lane"
-  [(set (match_operand: 0 "aarch64_simd_struct_operand" "=Utv")
-   (unspec: [(match_operand:CI 1 "register_operand" "w")
-(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
-   (match_operand:SI 2 "immediate_operand" "i")]
-   UNSPEC_ST3_LANE))]
+  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
+   (unspec:BLK [(match_operand:CI 1 "register_operand" "w")
+(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
+(match_operand:SI 2 "immediate_operand" "i")]
+   UNSPEC_ST3_LANE))]
   "TARGET_SIMD"
   {
 operands[2] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[2])));
@@ -4368,8 +4368,8 @@
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_SIMD"
 {
-  machine_mode mode = mode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 3);
 
   emit_insn (gen_aarch64_simd_ld3r (operands[0], mem));
   DONE;
@@ -4589,8 +4589,8 @@
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_SIMD"
 {
-  machine_mode mode = mode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 3);
 
   aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode),
NULL);
@@ -4874,8 +4874,9 @@
   (match_operand:SI 2 "immediate_operand")]
   "TARGET_SIMD"
 {
-  machine_mode mode = mode;
-  rtx mem = gen_rtx_MEM (mode, operands[0]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[0]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 3);
+
   operands[2] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[2])));
 
   emit_insn (gen_aarch64_vec_store_lanesci_lane (mem,
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 98b6714..ae0be0b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -568,14 +568,6 @@
   (V2SF "V2SF") (V4SF "V2SF")
   (DF "V2DI")   (V2DF "V2DI")])
 
-;; Similar, for three elements.
-(define_mode_attr V_THREE_ELEM [(V8QI "BLK") (V16QI "BLK")
-(V4HI "BLK") (V8HI "BLK")
-(V2SI "BLK") (V4SI "BLK")
-(DI "EI")(V2DI "EI")
-(V2SF "BLK") (V4SF "BLK")
-(DF "EI")(V2DF "EI")])
-
 ;; Similar, for four elements.
 (define_mode_attr V_FOUR_ELEM [(V8QI "SI")   (V16QI "SI")

[PATCH][AArch64 array_mode 6/8] Remove V_TWO_ELEM, again using BLKmode + set_mem_size.

2015-08-26 Thread Alan Lawrence
Same logic as previous; this makes the 2-, 3-, and 4-lane expanders all follow 
the same pattern.

bootstrapped and check-gcc on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_simd_ld2r,
aarch64_vec_load_lanesoi_lane,
aarch64_vec_store_lanesoi_lane): Change operand mode to BLK.

(aarch64_ld2r, aarch64_ld2_lane,
aarch64_st2_lane): Generate MEM rtx with BLKmode, call
set_mem_size.

* config/aarch64/iterators.md (V_TWO_ELEM): Remove.
---
 gcc/config/aarch64/aarch64-simd.md | 21 +++--
 gcc/config/aarch64/iterators.md|  9 -
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 68182d6..f938754 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3906,7 +3906,7 @@
 
 (define_insn "aarch64_simd_ld2r"
   [(set (match_operand:OI 0 "register_operand" "=w")
-   (unspec:OI [(match_operand: 1 "aarch64_simd_struct_operand" 
"Utv")
+   (unspec:OI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
   UNSPEC_LD2_DUP))]
   "TARGET_SIMD"
@@ -3916,7 +3916,7 @@
 
 (define_insn "aarch64_vec_load_lanesoi_lane"
   [(set (match_operand:OI 0 "register_operand" "=w")
-   (unspec:OI [(match_operand: 1 "aarch64_simd_struct_operand" 
"Utv")
+   (unspec:OI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
(match_operand:OI 2 "register_operand" "0")
(match_operand:SI 3 "immediate_operand" "i")
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
@@ -3957,8 +3957,8 @@
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
 (define_insn "aarch64_vec_store_lanesoi_lane"
-  [(set (match_operand: 0 "aarch64_simd_struct_operand" "=Utv")
-   (unspec: [(match_operand:OI 1 "register_operand" "w")
+  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
+   (unspec:BLK [(match_operand:OI 1 "register_operand" "w")
 (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
(match_operand:SI 2 "immediate_operand" "i")]
UNSPEC_ST2_LANE))]
@@ -4355,8 +4355,8 @@
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_SIMD"
 {
-  machine_mode mode = mode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 2);
 
   emit_insn (gen_aarch64_simd_ld2r (operands[0], mem));
   DONE;
@@ -4569,8 +4569,8 @@
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_SIMD"
 {
-  machine_mode mode = mode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 2);
 
   aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode),
NULL);
@@ -4857,8 +4857,9 @@
   (match_operand:SI 2 "immediate_operand")]
   "TARGET_SIMD"
 {
-  machine_mode mode = mode;
-  rtx mem = gen_rtx_MEM (mode, operands[0]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[0]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * 2);
+
   operands[2] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[2])));
 
   emit_insn (gen_aarch64_vec_store_lanesoi_lane (mem,
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 9535b7f..2a99e10 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -559,15 +559,6 @@
(V4SI "V16SI")  (V4SF "V16SF")
(V2DI "V8DI")  (V2DF "V8DF")])
 
-;; Mode of pair of elements for each vector mode, to define transfer
-;; size for structure lane/dup loads and stores.
-(define_mode_attr V_TWO_ELEM [(V8QI "HI")   (V16QI "HI")
-  (V4HI "SI")   (V8HI "SI")
-  (V2SI "V2SI") (V4SI "V2SI")
-  (DI "V2DI")   (V2DI "V2DI")
-  (V2SF "V2SF") (V4SF "V2SF")
-  (DF "V2DI")   (V2DF "V2DI")])
-
 ;; Mode for atomic operation suffixes
 (define_mode_attr atomic_sfx
   [(QI "b") (HI "h") (SI "") (DI "")])
-- 
1.8.3



[PATCH][AArch64 array_mode 2/8] Remove VSTRUCT_DREG, use BLKmode for d-reg aarch64_st/ld expands

2015-08-26 Thread Alan Lawrence
This changes the aarch64_st and aarch64_ld expanders, which map onto 12 insns
aarch64_{ld,st}{2,3,4}_dreg (for VD and DX modes), using the
VSTRUCT_DREG iterator over TI/EI/OI modes to represent the block of memory
transferred. Instead, use BLKmode for all memory transfers, explicitly setting
mem_size.

Bootstrapped and check-gcc on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(aarch64_ld2_dreg VD & DX, aarch64_st2_dreg VD & DX ):
Change all TImode operands to BLKmode.
(aarch64_ld3_dreg VD & DX, aarch64_st3_dreg VD & DX):
Change all EImode operands to BLKmode.
(aarch64_ld4_dreg VD & DX, aarch64_st4_dreg VD & DX):
Change all OImode operands to BLKmode.

(aarch64_ld,
aarch64_st): Generate MEM rtx with BLKmode
and call set_mem_size.

* config/aarch64/iterators.md (VSTRUCT_DREG): Remove.
---
 gcc/config/aarch64/aarch64-simd.md | 44 +++---
 gcc/config/aarch64/iterators.md|  2 --
 2 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 3796386..7b7a1b8 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4393,7 +4393,7 @@
(subreg:OI
  (vec_concat:
(vec_concat:
-(unspec:VD [(match_operand:TI 1 "aarch64_simd_struct_operand" 
"Utv")]
+(unspec:VD [(match_operand:BLK 1 "aarch64_simd_struct_operand" 
"Utv")]
UNSPEC_LD2)
 (vec_duplicate:VD (const_int 0)))
(vec_concat:
@@ -4410,7 +4410,7 @@
(subreg:OI
  (vec_concat:
(vec_concat:
-(unspec:DX [(match_operand:TI 1 "aarch64_simd_struct_operand" 
"Utv")]
+(unspec:DX [(match_operand:BLK 1 "aarch64_simd_struct_operand" 
"Utv")]
UNSPEC_LD2)
 (const_int 0))
(vec_concat:
@@ -4428,7 +4428,7 @@
 (vec_concat:
  (vec_concat:
(vec_concat:
-(unspec:VD [(match_operand:EI 1 "aarch64_simd_struct_operand" 
"Utv")]
+(unspec:VD [(match_operand:BLK 1 "aarch64_simd_struct_operand" 
"Utv")]
UNSPEC_LD3)
 (vec_duplicate:VD (const_int 0)))
(vec_concat:
@@ -4450,7 +4450,7 @@
 (vec_concat:
  (vec_concat:
(vec_concat:
-(unspec:DX [(match_operand:EI 1 "aarch64_simd_struct_operand" 
"Utv")]
+(unspec:DX [(match_operand:BLK 1 "aarch64_simd_struct_operand" 
"Utv")]
UNSPEC_LD3)
 (const_int 0))
(vec_concat:
@@ -4472,7 +4472,7 @@
 (vec_concat:
   (vec_concat:
 (vec_concat:
-  (unspec:VD [(match_operand:OI 1 "aarch64_simd_struct_operand" 
"Utv")]
+  (unspec:VD [(match_operand:BLK 1 "aarch64_simd_struct_operand" 
"Utv")]
  UNSPEC_LD4)
   (vec_duplicate:VD (const_int 0)))
  (vec_concat:
@@ -4499,7 +4499,7 @@
 (vec_concat:
   (vec_concat:
 (vec_concat:
-  (unspec:DX [(match_operand:OI 1 "aarch64_simd_struct_operand" 
"Utv")]
+  (unspec:DX [(match_operand:BLK 1 "aarch64_simd_struct_operand" 
"Utv")]
  UNSPEC_LD4)
   (const_int 0))
  (vec_concat:
@@ -4526,8 +4526,8 @@
   (unspec:VDC [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_SIMD"
 {
-  machine_mode mode = mode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem,  * 8);
 
   emit_insn (gen_aarch64_ld_dreg (operands[0], mem));
   DONE;
@@ -4765,8 +4765,8 @@
 )
 
 (define_insn "aarch64_st2_dreg"
-  [(set (match_operand:TI 0 "aarch64_simd_struct_operand" "=Utv")
-   (unspec:TI [(match_operand:OI 1 "register_operand" "w")
+  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
+   (unspec:BLK [(match_operand:OI 1 "register_operand" "w")
 (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
UNSPEC_ST2))]
   "TARGET_SIMD"
@@ -4775,8 +4775,8 @@
 )
 
 (define_insn "aarch64_st2_dreg"
-  [(set (match_operand:TI 0 "aarch64_simd_struct_operand" "=Utv")
-   (unspec:TI [(match_operand:OI 1 "register_operand" "w")
+  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
+   (unspec:BLK [(match_operand:OI 1 "register_operand" "w")
 (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
UNSPEC_ST2))]
   "TARGET_SIMD"
@@ -4785,8 +4785,8 @@
 )
 
 (define_insn "aarch64_st3_dreg"
-  [(set (match_operand:EI 0 "aarch64_simd_struct_operand" "=Utv")
-   (unspec:EI [(match_operand:CI 1 "register_operand" "w")
+  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
+   (unspec:BLK [(match_operand:CI 1 "register_operand" "w")
 (unspec:

[PATCH][AArch64 array_mode 4/8] Remove EImode

2015-08-26 Thread Alan Lawrence
This removes EImode from the (AArch64) compiler, and all mention of or support
for it.

bootstrapped and check-gcc on aarch64-none-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist): Update
comment.

* config/aarch64/aarch64-builtins.c (ei_UP,
aarch64_simd_intEI_type_node): Remove.
(aarch64_simd_builtin_std_type): Remove EImode case.
(aarch64_init_simd_builtin_types): Don't create/add intEI_type_node.

* config/aarch64/aarch64-modes.def: Remove EImode.
---
 gcc/config/aarch64/aarch64-builtins.c | 8 
 gcc/config/aarch64/aarch64-modes.def  | 5 ++---
 gcc/config/aarch64/aarch64.c  | 2 +-
 3 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 294bf9d..9c8ca3b 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -73,7 +73,6 @@
 #define v2di_UP  V2DImode
 #define v2df_UP  V2DFmode
 #define ti_UP   TImode
-#define ei_UP   EImode
 #define oi_UP   OImode
 #define ci_UP   CImode
 #define xi_UP   XImode
@@ -435,7 +434,6 @@ static struct aarch64_simd_type_info aarch64_simd_types [] 
= {
 #undef ENTRY
 
 static tree aarch64_simd_intOI_type_node = NULL_TREE;
-static tree aarch64_simd_intEI_type_node = NULL_TREE;
 static tree aarch64_simd_intCI_type_node = NULL_TREE;
 static tree aarch64_simd_intXI_type_node = NULL_TREE;
 
@@ -509,8 +507,6 @@ aarch64_simd_builtin_std_type (enum machine_mode mode,
   return QUAL_TYPE (TI);
 case OImode:
   return aarch64_simd_intOI_type_node;
-case EImode:
-  return aarch64_simd_intEI_type_node;
 case CImode:
   return aarch64_simd_intCI_type_node;
 case XImode:
@@ -623,15 +619,11 @@ aarch64_init_simd_builtin_types (void)
 #define AARCH64_BUILD_SIGNED_TYPE(mode)  \
   make_signed_type (GET_MODE_PRECISION (mode));
   aarch64_simd_intOI_type_node = AARCH64_BUILD_SIGNED_TYPE (OImode);
-  aarch64_simd_intEI_type_node = AARCH64_BUILD_SIGNED_TYPE (EImode);
   aarch64_simd_intCI_type_node = AARCH64_BUILD_SIGNED_TYPE (CImode);
   aarch64_simd_intXI_type_node = AARCH64_BUILD_SIGNED_TYPE (XImode);
 #undef AARCH64_BUILD_SIGNED_TYPE
 
   tdecl = add_builtin_type
-   ("__builtin_aarch64_simd_ei" , aarch64_simd_intEI_type_node);
-  TYPE_NAME (aarch64_simd_intEI_type_node) = tdecl;
-  tdecl = add_builtin_type
("__builtin_aarch64_simd_oi" , aarch64_simd_intOI_type_node);
   TYPE_NAME (aarch64_simd_intOI_type_node) = tdecl;
   tdecl = add_builtin_type
diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index b17b90d..653bd00 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -46,9 +46,8 @@ VECTOR_MODE (FLOAT, DF, 1);   /* V1DF.  */
 /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments.  */
 INT_MODE (OI, 32);
 
-/* Opaque integer modes for 3, 6 or 8 Neon double registers (2 is
-   TImode).  */
-INT_MODE (EI, 24);
+/* Opaque integer modes for 3 or 4 Neon q-registers / 6 or 8 Neon d-registers
+   (2 d-regs = 1 q-reg = TImode).  */
 INT_MODE (CI, 48);
 INT_MODE (XI, 64);
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 020f63c..a923b55 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9305,7 +9305,7 @@ aarch64_simd_attr_length_move (rtx_insn *insn)
 }
 
 /* Compute and return the length of aarch64_simd_reglist, where  is
-   one of VSTRUCT modes: OI, CI, EI, or XI.  */
+   one of VSTRUCT modes: OI, CI, or XI.  */
 int
 aarch64_simd_attr_length_rglist (enum machine_mode mode)
 {
-- 
1.8.3



[PATCH][AArch64 array_mode 1/8] Rename vec_store_lanes_lane to aarch64_vec_store_lanes_lane

2015-08-26 Thread Alan Lawrence
vec_store_lanes{oi,ci,xi}_lane are not standard pattern names, so using them in 
aarch64-simd.md is misleading. This adds an aarch64_ prefix to those pattern 
names, paralleling aarch64_vec_load_lanes_lane.

bootstrapped and check-gcc on aarch64-none-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (vec_store_lanesoi_lane): Rename
to...
(aarch64_vec_store_lanesoi_lane): ...this.

(vec_store_lanesci_lane): Rename to...
(aarch64_vec_store_lanesci_lane): ...this.

(vec_store_lanesxi_lane): Rename to...
(aarch64_vec_store_lanesxi_lane): ...this.

(aarch64_st2_lane, aarch64_st3_lane,
aarch64_st4_lane): Follow renaming.
---
 gcc/config/aarch64/aarch64-simd.md | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index b90f938..3796386 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3956,7 +3956,7 @@
 )
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
-(define_insn "vec_store_lanesoi_lane"
+(define_insn "aarch64_vec_store_lanesoi_lane"
   [(set (match_operand: 0 "aarch64_simd_struct_operand" "=Utv")
(unspec: [(match_operand:OI 1 "register_operand" "w")
 (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
@@ -4051,7 +4051,7 @@
 )
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
-(define_insn "vec_store_lanesci_lane"
+(define_insn "aarch64_vec_store_lanesci_lane"
   [(set (match_operand: 0 "aarch64_simd_struct_operand" "=Utv")
(unspec: [(match_operand:CI 1 "register_operand" "w")
 (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
@@ -4146,7 +4146,7 @@
 )
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
-(define_insn "vec_store_lanesxi_lane"
+(define_insn "aarch64_vec_store_lanesxi_lane"
   [(set (match_operand: 0 "aarch64_simd_struct_operand" "=Utv")
(unspec: [(match_operand:XI 1 "register_operand" "w")
 (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
@@ -4861,9 +4861,9 @@
   rtx mem = gen_rtx_MEM (mode, operands[0]);
   operands[2] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[2])));
 
-  emit_insn (gen_vec_store_lanesoi_lane (mem,
- operands[1],
- operands[2]));
+  emit_insn (gen_aarch64_vec_store_lanesoi_lane (mem,
+ operands[1],
+ operands[2]));
   DONE;
 })
 
@@ -4878,9 +4878,9 @@
   rtx mem = gen_rtx_MEM (mode, operands[0]);
   operands[2] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[2])));
 
-  emit_insn (gen_vec_store_lanesci_lane (mem,
- operands[1],
- operands[2]));
+  emit_insn (gen_aarch64_vec_store_lanesci_lane (mem,
+ operands[1],
+ operands[2]));
   DONE;
 })
 
@@ -4895,9 +4895,9 @@
   rtx mem = gen_rtx_MEM (mode, operands[0]);
   operands[2] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[2])));
 
-  emit_insn (gen_vec_store_lanesxi_lane (mem,
- operands[1],
- operands[2]));
+  emit_insn (gen_aarch64_vec_store_lanesxi_lane (mem,
+ operands[1],
+ operands[2]));
   DONE;
 })
 
-- 
1.8.3



[PATCH][AArch64 0/8] Add D-registers to TARGET_ARRAY_MODE_SUPPORTED_P

2015-08-26 Thread Alan Lawrence
The end goal of this series of patches is to enable 64bit vector modes for
TARGET_ARRAY_MODE_SUPPORTED_P, achieved in the last patch. At present, doing so
causes ICEs with illegal subregs (e.g. returning the middle bits from a large
int mode covering 3 vectors); the patchset avoids these by first removing EImode
(192 bits = 24 bytes = 1.5 vector registers), which is currently used for
24-byte quantities transferred to/from memory by some {ld,st}3_lane intrinsics.
There is no real need to use EImode here; its only real purpose is that it has
size 24 bytes, so we can use BLKmode instead as long as we explicitly set the
size.

Patches 5-6 extend the same BLKmode treatment to {ld,st}{2,4}, allowing all the
expander patterns to be combined in patch 7; these are not essential to the end
goal but it seemed good to be consistent. Patch 1 is a driveby, and stands in
its own right.



Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Jakub Jelinek
On Wed, Aug 26, 2015 at 03:21:52PM +0200, Richard Biener wrote:
> On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek  wrote:
> > On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
> >> > AVX-512 is such target. Current representation forces multiple scalar
> >> > mask -> vector mask and back transformations which are artificially
> >> > introduced by current bool patterns and are hard to optimize out.
> >>
> >> I dislike the bool patterns anyway and we should try to remove those
> >> and make the vectorizer handle them in other ways (they have single-use
> >> issues anyway).  I don't remember exactly what caused us to add them
> >> but one reason was there wasn't a vector type for 'bool' (but I don't see 
> >> how
> >> it should be necessary to ask "get me a vector type for 'bool'").
> >
> > That was just one of the reasons.  The other reason is that even if we would
> > choose some vector of integer type as vector of bool, the question is what
> > type.  E.g. if you use vector of chars, you almost always get terrible
> > vectorized code, except for the AVX-512 you really want an integral type
> > that has the size of the types you are comparing.
> 
> Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
> first compute the vector type for the comparison itself (which is "fixed") and
> thus we can compute the vector type of any bitwise op on it as well.

Sure, but if you then immediately vector narrow it to a V*QI vector because
it is stored originally into a bool/_Bool variable, and then widen it again
when it is used in, say, a COND_EXPR, you get really poor code.
So, what the bool pattern code does is kind of poor man's type
promotion/demotion pass for bool only, at least for the common cases.

PR50596 has been the primary reason to introduce the bool patterns.
If there is a better type promotion/demotion pass on a copy of the loop,
sure, we can get rid of it (but figure out also what to do for SLP).

Jakub


[libvtv] Fix formatting errors

2015-08-26 Thread Rainer Orth
While looking at libvtv for the Solaris port, I noticed all sorts of GNU
Coding Standards violations:

* ChangeLog entries attributed to the committer instead of the author
  and with misformatted PR references, entries only giving a vague
  rationale instead of what changed

* overlong lines

* tons of whitespace errors (though I may be wrong in some cases: C++
  code might have other rules)

* code formatting that seems to have been done to be visually pleasing,
  completely different from what Emacs does

* commented code fragments (#if 0 equivalent)

* configure.tgt target list in no recognizable order

* the Cygwin/MingW port is done in the worst possible way: tons of
  target-specific ifdefs instead of feature-specific conditionals or an
  interface that can wrap both Cygwin and Linux variants of the code

The following patch (as yet not even compiled) fixes some of the most
glaring errors.  The Solaris port will fix a few of the latter ones.

Do you think this is the right direction or did I get something wrong?

Thanks.
Rainer


2015-08-26  Rainer Orth  

Fix formatting errors.

# HG changeset patch
# Parent 6459822b8e6fa7647ad0d12ffb6f3da7bd0c5db2
Fix formatting errors

diff --git a/libvtv/ChangeLog b/libvtv/ChangeLog
--- a/libvtv/ChangeLog
+++ b/libvtv/ChangeLog
@@ -1,6 +1,6 @@
-2015-08-01  Caroline Tice  
+2015-08-01  Eric Gallager  
 
-	PR 66521
+	PR bootstrap/66521
 	* Makefile.am:  Update to match latest tree.
 	* Makefile.in: Regenerate.
 	* testsuite/lib/libvtv: Brought up to date.
@@ -24,15 +24,13 @@ 2015-02-09  Thomas Schwinge  
+2015-01-29  Patrick Wollgast  
 
-	Committing VTV Cywin/Ming patch for Patrick Wollgast
 	* libvtv/Makefile.in : Regenerate.
 	* libvtv/configure : Regenerate.
 
-2015-01-28  Caroline Tice  
+2015-01-28  Patrick Wollgast  
 
-	Committing VTV Cywin/Ming patch for Patrick Wollgast
 	* libvtv/Makefile.am : Add libvtv.la to toolexeclib_LTLIBRARIES, if
 	VTV_CYGMIN is set. Define libvtv_la_LIBADD, libvtv_la_LDFLAGS,
 	libvtv_stubs_la_LDFLAGS and libvtv_stubs_la_SOURCES if VTV_CYGMIN is
diff --git a/libvtv/vtv_fail.cc b/libvtv/vtv_fail.cc
--- a/libvtv/vtv_fail.cc
+++ b/libvtv/vtv_fail.cc
@@ -38,9 +38,7 @@
desired.  This may be the case if the programmer has to deal wtih
unverified third party software, for example.  __vtv_really_fail is
available for the programmer to call from his version of
-   __vtv_verify_fail, if he decides the failure is real.
-
-*/
+   __vtv_verify_fail, if he decides the failure is real.  */
 
 #include 
 #include 
@@ -80,8 +78,8 @@ const unsigned long SET_HANDLE_HANDLE_BI
 
 /* Instantiate the template classes (in vtv_set.h) for our particular
hash table needs.  */
-typedef void * vtv_set_handle;
-typedef vtv_set_handle * vtv_set_handle_handle; 
+typedef void *vtv_set_handle;
+typedef vtv_set_handle *vtv_set_handle_handle; 
 
 static int vtv_failures_log_fd = -1;
 
@@ -121,17 +119,16 @@ log_error_message (const char *log_msg, 
variable.  */
 
 static inline bool
-is_set_handle_handle (void * ptr)
+is_set_handle_handle (void *ptr)
 {
-  return ((unsigned long) ptr & SET_HANDLE_HANDLE_BIT)
-  == SET_HANDLE_HANDLE_BIT;
+  return ((unsigned long) ptr & SET_HANDLE_HANDLE_BIT) == SET_HANDLE_HANDLE_BIT;
 }
 
 /* Returns the actual pointer value of a vtable map variable, PTR (see
comments for is_set_handle_handle for more details).  */
 
 static inline vtv_set_handle * 
-ptr_from_set_handle_handle (void * ptr)
+ptr_from_set_handle_handle (void *ptr)
 {
   return (vtv_set_handle *) ((unsigned long) ptr & ~SET_HANDLE_HANDLE_BIT);
 }
@@ -141,7 +138,7 @@ ptr_from_set_handle_handle (void * ptr)
variable.  */
 
 static inline vtv_set_handle_handle
-set_handle_handle (vtv_set_handle * ptr)
+set_handle_handle (vtv_set_handle *ptr)
 {
   return (vtv_set_handle_handle) ((unsigned long) ptr | SET_HANDLE_HANDLE_BIT);
 }
@@ -151,7 +148,7 @@ set_handle_handle (vtv_set_handle * ptr)
file, then calls __vtv_verify_fail.  SET_HANDLE_PTR is the pointer
to the set of valid vtable pointers, VTBL_PTR is the pointer that
was not found in the set, and DEBUG_MSG is the message to be
-   written to the log file before failing. n */
+   written to the log file before failing.  */
 
 void
 __vtv_verify_fail_debug (void **set_handle_ptr, const void *vtbl_ptr, 
@@ -197,9 +194,9 @@ vtv_fail (const char *msg, void **data_s
  "*** Unable to verify vtable pointer (%p) in set (%p) *** \n";
 
   snprintf (buffer, sizeof (buffer), format_str, vtbl_ptr,
-is_set_handle_handle(*data_set_ptr) ?
-  ptr_from_set_handle_handle (*data_set_ptr) :
-	  *data_set_ptr);
+is_set_handle_handle(*data_set_ptr)
+	? ptr_from_set_handle_handle (*data_set_ptr)
+	: *data_set_ptr);
   buf_len = strlen (buffer);
   /*  Send this to to stderr.  */
   write (2, buffer, buf_len);
@@ -221,9 +218,9 @@ void
   char log_msg[256];
   snprintf (log_msg,

Re: [gomp4.1] comment some stuff

2015-08-26 Thread Jakub Jelinek
On Tue, Aug 25, 2015 at 10:35:37AM -0700, Aldy Hernandez wrote:
> diff --git a/libgomp/env.c b/libgomp/env.c
> index 65a6851..0569521 100644
> --- a/libgomp/env.c
> +++ b/libgomp/env.c
> @@ -69,7 +69,7 @@ struct gomp_task_icv gomp_global_icv = {
>  
>  unsigned long gomp_max_active_levels_var = INT_MAX;
>  bool gomp_cancel_var = false;
> -int gomp_max_task_priority_var = 0;
> +static int gomp_max_task_priority_var = 0;
>  #ifndef HAVE_SYNC_BUILTINS
>  gomp_mutex_t gomp_managed_threads_lock;
>  #endif

Please remove this hunk.  The variable is meant to be used in task.c,
where
  (void) priority;
is present right now (like:
  if (priority > gomp_max_task_priority_var)
    priority = gomp_max_task_priority_var;
or so.

> @@ -110,7 +112,12 @@ static void gomp_task_maybe_wait_for_dependencies (void 
> **depend);
>  
>  /* Called when encountering an explicit task directive.  If IF_CLAUSE is
> false, then we must not delay in executing the task.  If UNTIED is true,
> -   then the task may be executed by any member of the team.  */
> +   then the task may be executed by any member of the team.
> +
> +   DEPEND is an array containing:
> + depend[0]: number of depend elements.
> + depend[1]: number of depend elements of type "out".
> + depend[N+2]: address of [0..N]th depend element.  */

Either [1..N]th, or [0..N-1]th.  And depend[N+2] should better be
depend[2..N+1].

Otherwise LGTM.

Jakub


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Richard Biener
On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek  wrote:
> On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
>> > AVX-512 is such target. Current representation forces multiple scalar
>> > mask -> vector mask and back transformations which are artificially
>> > introduced by current bool patterns and are hard to optimize out.
>>
>> I dislike the bool patterns anyway and we should try to remove those
>> and make the vectorizer handle them in other ways (they have single-use
>> issues anyway).  I don't remember exactly what caused us to add them
>> but one reason was there wasn't a vector type for 'bool' (but I don't see how
>> it should be necessary to ask "get me a vector type for 'bool'").
>
> That was just one of the reasons.  The other reason is that even if we would
> choose some vector of integer type as vector of bool, the question is what
> type.  E.g. if you use vector of chars, you almost always get terrible
> vectorized code, except for the AVX-512 you really want an integral type
> that has the size of the types you are comparing.

Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
first compute the vector type for the comparison itself (which is "fixed") and
thus we can compute the vector type of any bitwise op on it as well.

> And I'd say this is very much related to the need to do some type promotions
> or demotions on the scalar code meant to be vectorized (but only the copy
> for vectorizations), so that we have as few different scalar type sizes in
> the loop as possible, because widening / narrowing vector conversions aren't
> exactly cheap and a single char operation in a loop otherwise full of long
> long operations might unnecessarily turn a vf=2 (or 4 or 8) loop into
> vf=16 (or 32 or 64), increasing it a lot.

That's true but unrelated.  With conditions this gets to optimizing where
the promotion/demotion happens (which depends on how the result is used).

The current pattern approach has the issue that it doesn't work for multiple
uses in the condition bitops which is bad as well.

But it couldn't have been _only_ the vector type computation that made us
invent the patterns, no?  Do you remember anything else?

Thanks,
Richard.


>
> Jakub


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Jakub Jelinek
On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
> > AVX-512 is such target. Current representation forces multiple scalar
> > mask -> vector mask and back transformations which are artificially
> > introduced by current bool patterns and are hard to optimize out.
> 
> I dislike the bool patterns anyway and we should try to remove those
> and make the vectorizer handle them in other ways (they have single-use
> issues anyway).  I don't remember exactly what caused us to add them
> but one reason was there wasn't a vector type for 'bool' (but I don't see how
> it should be necessary to ask "get me a vector type for 'bool'").

That was just one of the reasons.  The other reason is that even if we would
choose some vector of integer type as vector of bool, the question is what
type.  E.g. if you use vector of chars, you almost always get terrible
vectorized code, except for the AVX-512 you really want an integral type
that has the size of the types you are comparing.
And I'd say this is very much related to the need to do some type promotions
or demotions on the scalar code meant to be vectorized (but only the copy
for vectorizations), so that we have as few different scalar type sizes in
the loop as possible, because widening / narrowing vector conversions aren't
exactly cheap and a single char operation in a loop otherwise full of long
long operations might unnecessarily turn a vf=2 (or 4 or 8) loop into
vf=16 (or 32 or 64), increasing it a lot.

Jakub


Re: [PATCH], PowerPC IEEE 128-bit patch #5

2015-08-26 Thread David Edelsohn
On Tue, Aug 25, 2015 at 7:20 PM, Michael Meissner
 wrote:

> Here is the revised patch. Is it ok to install?
>
> 2015-08-25  Michael Meissner  
>
> * config/rs6000/predicates.md (int_reg_operand_not_pseudo): New
> predicate for only GPR hard registers.
>
> * config/rs6000/rs6000.md (FP): Add IEEE 128-bit floating point
> modes to iterators. Add new iterators for moving 128-bit values in
> scalar FPR registers and VSX registers.
> (FMOVE128): Likewise.
> (FMOVE128_FPR): Likewise.
> (FMOVE128_GPR): Likewise.
> (FMOVE128_VSX): Likewise.
> (FLOAT128_SFDFTF): New iterators for IEEE 128-bit floating point
> in VSX registers.
> (IFKF): Likewise.
> (IBM128): Likewise.
> (TFIFKF): Likewise.
> (RELOAD): Add IEEE 128-bit floating point modes.
> (signbittf2): Convert TF insns to add support for new IEEE 128-bit
> floating point in VSX registers modes.
> (signbit2, IBM128 iterator): Likewise.
> (mov_64bit_dm, FMOVE128_FPR iterator): Likewise.
> (mov_32bit, FMOVE128_FPR iterator): Likewise.
> (negtf2): Likewise.
> (neg2, TFIFKF iterator): Likewise.
> (negtf2_internal): Likewise.
> (abstf2): Likewise.
> (abs2, TFIFKF iterator): Likewise.
> (ieee_128bit_negative_zero): New IEEE 128-bit floating point in
> VSX insn support for negate, absolute value, and negative absolute
> value.
> (ieee_128bit_vsx_neg2): Likewise.
> (ieee_128bit_vsx_neg2_internal): Likewise.
> (ieee_128bit_vsx_abs2): Likewise.
> (ieee_128bit_vsx_abs2_internal): Likewise.
> (ieee_128bit_vsx_nabs2): Likewise.
> (ieee_128bit_vsx_nabs2_internal): Likewise.
> (FP128_64): Update pack/unpack 128-bit insns for IEEE 128-bit
> floating point in VSX registers.
> (unpack_dm): Likewise.
> (unpack_nodm): Likewise.
> (pack): Likewise.
> (unpackv1ti): Likewise.
> (unpack, FMOVE128_VSX iterator): Likewise.
> (packv1ti): Likewise.
> (pack, FMOVE128_VSX iterator): Likewise.

The revised patch is okay.

Thanks, David


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Richard Biener
On Wed, Aug 26, 2015 at 1:13 PM, Ilya Enkovich  wrote:
> 2015-08-26 0:42 GMT+03:00 Jeff Law :
>> On 08/21/2015 04:49 AM, Ilya Enkovich wrote:
>>>
>>>
>>> I want a work with bitmasks to be expressed in a natural way using
>>> regular integer operations. Currently all masks manipulations are
>>> emulated via vector statements (mostly using a bunch of vec_cond). For
>>> complex predicates it may be nontrivial to transform it back to scalar
>>> masks and get an efficient code. Also the same vector may be used as
>>> both a mask and an integer vector. Things become more complex if you
>>> additionally have broadcasts and vector pack/unpack code. It also
>>> should be transformed into a scalar masks manipulations somehow.
>>
>> Or why not model the conversion at the gimple level using a CONVERT_EXPR?
>> In fact, the more I think about it, that seems to make more sense to me.
>>
>> We pick a canonical form for the mask, whatever it may be.  We use that
>> canonical form and model conversions between it and the other form via
>> CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
>> If it's not up to the task, we should really look into why and resolve.
>>
>> Yes, that does mean we have two forms which I'm not terribly happy about and
>> it means some target dependencies on what the masked vector operation looks
>> like (ie, does it accept a simple integer or vector mask), but I'm starting
>> to wonder if, as distasteful as I find it, it's the right thing to do.
>
> If we have some special representation for masks in GIMPLE then we
> might not need any conversions. We could ask a target to define a MODE
> for this type and use it directly everywhere: directly compare into
> it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
> that type is reserved for masks usage then you previous suggestion to
> transform masks into target specific form at GIMPLE->RTL phase should
> work fine. This would allow to support only a single masks
> representation in GIMPLE.

But we can already do all this with the integer vector masks we have.
If you think that the vectorizer generated

  mask = VEC_COND <v1 < v2 ? {-1,...} : {0,...}>

is ugly then we can remove that implementation detail and use

  mask = v1 < v2;

directly.  Note that the VEC_COND form was invented to avoid
the need to touch RTL expansion for vector compares (IIRC).
Or it pre-dated specifying what compares generate on GIMPLE.

Richard.

> Thanks,
> Ilya
>
>>

 But I don't like changing our IL so much as to allow 'integer' masks
 everywhere.
>>
>> I'm warming up to that idea...
>>
>> jeff
>>


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Richard Biener
On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich  wrote:
> 2015-08-21 14:00 GMT+03:00 Richard Biener :
>> On Fri, Aug 21, 2015 at 12:49 PM, Ilya Enkovich wrote:
>>> 2015-08-21 11:15 GMT+03:00 Richard Biener :
 On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law  wrote:
> On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch introduces a new vectorizer hook use_scalar_mask_p which
>> affects code generated by if-conversion pass (and affects patterns
>> in later patches).
>>
>> Thanks,
>> Ilya
>> --
>> 2015-08-17  Ilya Enkovich  
>>
>> * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
>> * doc/tm.texi.in: Regenerated.
>> * target.def (use_scalar_mask_p): New.
>> * tree-if-conv.c: Include target.h.
>> (predicate_mem_writes): Don't convert boolean predicates into
>> integer when scalar masks are used.
>
> Presumably this is how you prevent the generation of scalar masks rather
> than boolean masks on targets which don't have the former?
>
> I hate to ask, but how painful would it be to go from a boolean to integer
> masks later such as during expansion?  Or vice-versa.
>
> Without a deep knowledge of the entire patchkit, it feels like we're
> introducing target stuff in a place where we don't want it and that
> we'd be better served with a canonical representation through gimple,
> then dropping into something more target specific during gimple->rtl
> expansion.
>>>
>>> I want a work with bitmasks to be expressed in a natural way using
>>> regular integer operations. Currently all masks manipulations are
>>> emulated via vector statements (mostly using a bunch of vec_cond). For
>>> complex predicates it may be nontrivial to transform it back to scalar
>>> masks and get an efficient code. Also the same vector may be used as
>>> both a mask and an integer vector. Things become more complex if you
>>> additionally have broadcasts and vector pack/unpack code. It also
>>> should be transformed into a scalar masks manipulations somehow.
>>
>> Hmm, I don't see how vector masks are more difficult to operate with.
>
> There are just no instructions for that but you have to pretend you
> have to get code vectorized.

Huh?  Bitwise ops should be readily available.

>>
>>> Also according to vector ABI integer mask should be used for mask
>>> operand in case of masked vector call.
>>
>> What ABI?  The function signature of the intrinsics?  How would that
>> come into play here?
>
> Not intrinsics. I mean OpenMP vector functions which require integer
> arg for a mask in case of 512-bit vector.

How do you declare those?

>>
>>> Current implementation of masked loads, masked stores and bool
>>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>>> really call it a canonical representation for all targets?
>>
>> No idea - we'll revisit when another targets adds a similar capability.
>
> AVX-512 is such target. Current representation forces multiple scalar
> mask -> vector mask and back transformations which are artificially
> introduced by current bool patterns and are hard to optimize out.

I dislike the bool patterns anyway and we should try to remove those
and make the vectorizer handle them in other ways (they have single-use
issues anyway).  I don't remember exactly what caused us to add them
but one reason was there wasn't a vector type for 'bool' (but I don't see how
it should be necessary to ask "get me a vector type for 'bool'").

>>
>>> Using scalar masks everywhere should probably cause the same conversion
>>> problem for SSE I listed above though.
>>>
>>> Talking about a canonical representation, shouldn't we use some
>>> special masks representation and not mixing it with integer and vector
>>> of integers then? Only in this case target would be able to
>>> efficiently expand it into a corresponding rtl.
>>
>> That was my idea of vector ... but I didn't explore it and see where
>> it will cause issues.
>>
>> Fact is GCC already copes with vector masks generated by vector compares
>> just fine everywhere and I'd rather leave it as that.
>
> Nope. Currently vector mask is obtained from a vec_cond <cmp, {0 ...
> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
> additional vec_cond. I don't think vectorizer ever generates vector
> comparison.

Ok, well that's an implementation detail then.  Are you sure about AND and IOR?
The comment above vect_recog_bool_pattern says

Assuming size of TYPE is the same as size of all comparisons
(otherwise some casts would be added where needed), the above
sequence we create related pattern stmts:
S1'  a_T = x1 CMP1 y1 ? 1 : 0;
S3'  c_T = x2 CMP2 y2 ? a_T : 0;
S4'  d_T = x3 CMP3 y3 ? 1 : 0;
S5'  e_T = c_T | d_T;
S6'  f_T = e_T;

thus has vector mask |

> And I wouldn't say it's fine 'everywhere'

Re: [PATCH][AARCH64]Fix for branch offsets over 1 MiB

2015-08-26 Thread Marcus Shawcroft
On 25 August 2015 at 14:12, Andre Vieira  wrote:

> gcc/ChangeLog:
> 2015-08-07  Ramana Radhakrishnan  
> Andre Vieira  
>
> * config/aarch64/aarch64.md (*condjump): Handle functions > 1 Mib.
> (*cb1): Likewise.
> (*tb1): Likewise.
> (*cb1): Likewise.
> * config/aarch64/iterators.md (inv_cb): New code attribute.
> (inv_tb): Likewise.
> * config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
> * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.
>
> gcc/testsuite/ChangeLog:
> 2015-08-07  Andre Vieira  
>
> * gcc.target/aarch64/long_branch_1.c: New test.

OK /Marcus


[libgfortran,committed] Fix SHAPE intrinsic with KIND values 1 and 2

2015-08-26 Thread FX
Attached patch fixes the SHAPE intrinsic with optional argument KIND values of 1 
and 2. While we already accept and emit code for SHAPE with KIND values, the 
runtime versions with integer kinds 1 and 2 are missing (while values of 4, 8 
and 16 are present).

The patch adds the necessary generated files, and symbols into gfortran.map, as 
well as a testcase.

I also took the opportunity to fix an error in the type of the SHAPE argument, 
which is a generic array (array_t) and not a specifically-typed version. This 
changes nothing for the generated code, because only the shape of the array 
descriptor is accessed. But it’s cleaner that way.

Committed as revision 227210, after bootstrapping and regtesting on 
x86_64-apple-darwin15.

FX




shape.ChangeLog
Description: Binary data


shape.diff
Description: Binary data


Re: [AArch64][TLSLE][1/3] Add the option "-mtls-size" for AArch64

2015-08-26 Thread Marcus Shawcroft
On 25 August 2015 at 15:15, Jiong Wang  wrote:

> 2015-08-25  Jiong Wang  
>
> gcc/
>   * config/aarch64/aarch64.opt (mtls-size): New entry.
>   * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function.
>   (aarch64_override_options_internal): Call initialize_aarch64_tls_size.
>   * doc/invoke.texi (AArch64 Options): Document -mtls-size.
>

OK Thanks /Marcus


Re: Fix libbacktrace -fPIC breakage from "Use libbacktrace in libgfortran"

2015-08-26 Thread Hans-Peter Nilsson
> From: Ulrich Weigand 
> Date: Wed, 26 Aug 2015 13:45:35 +0200

> Hans-Peter Nilsson wrote:
> > > From: Ulrich Weigand 
> > > Date: Tue, 25 Aug 2015 19:45:06 +0200
> > 
> > > However, neither works for the SPU, because in both cases libtool
> > > will only do the test whether the target supports the -fPIC option.
> > > It will not test whether the target supports dynamic libraries.
> > > 
> > > [ It will do that test; and default to --disable-shared on SPU.
> > > That is a no-op for libbacktrace however, since it calls LT_INIT
> > > with the disable-shared option anyway.
> > 
> > Maybe it shouldn't?
> 
> Huh?  We do want libbacktrace solely as static library, that's the
> whole point ...

I meant that as a *suggestion for a possible workaround* to stop
libtool from refusing to compile with PIC, but then I take it
you don't need hints to try another angle than adjusting
compilation flags.

> > >  When adding back the -fPIC
> > > flag due to either the pic-only LT_INIT option or the -prefer-pic
> > > libtool command line option, it does not check for that again.  ]
> > 
> > Sounds like a bug somewhere, in libtool or its current use:
> > there *should* be a way to specify "I'd prefer PIC code in these
> > static libraries".
> 
> But that's what the option *does*.
> 
> Let me try again, maybe we can reduce confusion a bit :-)

I don't feel very confused, but I understand you've investigated
things down to a point where we can conclude that libtool can't
do what SPU needs without also at least fiddling with
compilation options.

> I guess we can always fall back to just hard-coding SPU once
> more; that's certainly the simplest solution right now.

Maybe.

brgds, H-P


[gomp4] loop partition optimization

2015-08-26 Thread Nathan Sidwell
I've committed this patch, which implements a simple partitioned execution 
optimization.  A loop over both worker and vector dimensions emits separate 
FORK and JOIN markers for the two dimensions -- there may be reduction pieces 
between them, as Cesar will shortly be committing.


However, if there aren't reductions, then we end up with one partitioned region 
nested entirely inside another region.  This is inefficient, as it causes us to 
add separate worker and vector partitioning startup.


This optimization looks for regions of this form and, if found, swallows the 
inner region into the outer region.  Then we only emit a single setup block of 
code.


nathan
2015-08-26  Nathan Sidwell  

	* config/nvptx/nvptx.opt (moptimize): New flag.
	* config/nvptx/nvptx.c (nvptx_option_override): Default
	nvptx_optimize.
	(nvptx_optimize_inner): New.
	(nvptx_process_pars): Call it.
	* doc/invoke.texi (Nvptx options): Document moptimize.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 227180)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -178,6 +178,9 @@ nvptx_option_override (void)
   write_symbols = NO_DEBUG;
   debug_info_level = DINFO_LEVEL_NONE;
 
+  if (nvptx_optimize < 0)
+nvptx_optimize = optimize > 0;
+
   declared_fndecls_htab = hash_table::create_ggc (17);
   needed_fndecls_htab = hash_table::create_ggc (17);
   declared_libfuncs_htab
@@ -3005,6 +3008,64 @@ nvptx_skip_par (unsigned mask, parallel
   nvptx_single (mask, par->forked_block, pre_tail);
 }
 
+/* If PAR has a single inner parallel and PAR itself only contains
+   empty entry and exit blocks, swallow the inner PAR.  */
+
+static void
+nvptx_optimize_inner (parallel *par)
+{
+  parallel *inner = par->inner;
+
+  /* We mustn't be the outer dummy par.  */
+  if (!par->mask)
+return;
+
+  /* We must have a single inner par.  */
+  if (!inner || inner->next)
+return;
+
+  /* We must only contain 2 blocks ourselves -- the head and tail of
+ the inner par.  */
+  if (par->blocks.length () != 2)
+return;
+
+  /* We must be disjoint partitioning.  As we only have vector and
+ worker partitioning, this is sufficient to guarantee the pars
+ have adjacent partitioning.  */
+  if ((par->mask & inner->mask) & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1))
+/* This indicates malformed code generation.  */
+return;
+
+  /* The outer forked insn should be the only one in its block.  */
+  rtx_insn *probe;
+  rtx_insn *forked = par->forked_insn;
+  for (probe = BB_END (par->forked_block);
+   probe != forked; probe = PREV_INSN (probe))
+if (INSN_P (probe))
+  return;
+
+  /* The outer joining insn, if any, must be in the same block as the inner
+ joined instruction, which must otherwise be empty of insns.  */
+  rtx_insn *joining = par->joining_insn;
+  rtx_insn *join = inner->join_insn;
+  for (probe = BB_END (inner->join_block);
+   probe != join; probe = PREV_INSN (probe))
+if (probe != joining && INSN_P (probe))
+  return;
+
+  /* Preconditions met.  Swallow the inner par.  */
+  par->mask |= inner->mask & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1);
+
+  par->blocks.reserve (inner->blocks.length ());
+  while (inner->blocks.length ())
+par->blocks.quick_push (inner->blocks.pop ());
+
+  par->inner = inner->inner;
+  inner->inner = NULL;
+
+  delete inner;
+}
+
 /* Process the parallel PAR and all its contained
parallels.  We do everything but the neutering.  Return mask of
partitioned modes used within this parallel.  */
@@ -3012,8 +3073,11 @@ nvptx_skip_par (unsigned mask, parallel
 static unsigned
 nvptx_process_pars (parallel *par)
 {
-  unsigned inner_mask = par->mask;
+  if (nvptx_optimize)
+nvptx_optimize_inner (par);
   
+  unsigned inner_mask = par->mask;
+
   /* Do the inner parallels first.  */
   if (par->inner)
 {
Index: gcc/config/nvptx/nvptx.opt
===
--- gcc/config/nvptx/nvptx.opt	(revision 227180)
+++ gcc/config/nvptx/nvptx.opt	(working copy)
@@ -29,6 +29,10 @@ mmainkernel
 Target Report RejectNegative
 Link in code for a __main kernel.
 
+moptimize
+Target Report Var(nvptx_optimize) Init(-1)
+Optimize partition neutering
+
 Enum
 Name(ptx_isa) Type(int)
 Known PTX ISA versions (for use with the -misa= option):
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 227180)
+++ gcc/doc/invoke.texi	(working copy)
@@ -18814,6 +18814,11 @@ Generate code for 32-bit or 64-bit ABI.
 Link in code for a __main kernel.  This is for stand-alone instead of
 offloading execution.
 
+@item -moptimize
+@opindex moptimize
+Apply partitioned execution optimizations.  This is the default when any
+level of optimization is selected.
+
 @end table
 
 @node PDP-11 Options


[libvtv] Update copyrights

2015-08-26 Thread Rainer Orth
While working on the Solaris libvtv port, I noticed that many of the
libvtv copyright years hadn't been updated, were misformatted, or both.
It turns out that libvtv isn't listed in contrib/update-copyright.py at
all.  This patch fixes this and includes the result of running
update-copyright.py --this-year libvtv.

I've neither added libvtv to self.default_dirs in the script nor added
copyrights to the numerous files in libvtv that currently lack one.

Ok for mainline once it has survived regtesting?

Thanks.
Rainer


2015-08-26  Rainer Orth  

libvtv:
Update copyrights.

contrib:
* update-copyright.py (GCCCmdLine): Add libvtv.

# HG changeset patch
# Parent 322129613b3dfc80c06f5f87dae9f2fa962a3496
Update copyrights

diff --git a/contrib/update-copyright.py b/contrib/update-copyright.py
--- a/contrib/update-copyright.py
+++ b/contrib/update-copyright.py
@@ -745,6 +745,7 @@ class GCCCmdLine (CmdLine):
 # libsanitiser is imported from upstream.
 self.add_dir ('libssp')
 self.add_dir ('libstdc++-v3', LibStdCxxFilter())
+self.add_dir ('libvtv')
 self.add_dir ('lto-plugin')
 # zlib is imported from upstream.
 
diff --git a/libvtv/Makefile.am b/libvtv/Makefile.am
--- a/libvtv/Makefile.am
+++ b/libvtv/Makefile.am
@@ -1,6 +1,6 @@
 ## Makefile for the VTV library.
 ##
-## Copyright (C) 2013 Free Software Foundation, Inc.
+## Copyright (C) 2013-2015 Free Software Foundation, Inc.
 ##
 ## Process this file with automake to produce Makefile.in.
 ##
diff --git a/libvtv/configure.tgt b/libvtv/configure.tgt
--- a/libvtv/configure.tgt
+++ b/libvtv/configure.tgt
@@ -1,5 +1,5 @@
 # -*- shell-script -*-
-#   Copyright (C) 2013 Free Software Foundation, Inc.
+#   Copyright (C) 2013-2015 Free Software Foundation, Inc.
 
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
diff --git a/libvtv/testsuite/config/default.exp b/libvtv/testsuite/config/default.exp
--- a/libvtv/testsuite/config/default.exp
+++ b/libvtv/testsuite/config/default.exp
@@ -1,4 +1,4 @@
-#   Copyright (C) 2013 Free Software Foundation, Inc.
+#   Copyright (C) 2013-2015 Free Software Foundation, Inc.
 
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
diff --git a/libvtv/testsuite/libvtv.cc/virtfunc-test.cc b/libvtv/testsuite/libvtv.cc/virtfunc-test.cc
--- a/libvtv/testsuite/libvtv.cc/virtfunc-test.cc
+++ b/libvtv/testsuite/libvtv.cc/virtfunc-test.cc
@@ -2,8 +2,7 @@
 
 /* This test script is part of GDB, the GNU debugger.
 
-   Copyright 1993, 1994, 1997, 1998, 1999, 2003, 2004,
-   Free Software Foundation, Inc.
+   Copyright (C) 1993-2015 Free Software Foundation, Inc.
 
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
diff --git a/libvtv/testsuite/other-tests/Makefile.am b/libvtv/testsuite/other-tests/Makefile.am
--- a/libvtv/testsuite/other-tests/Makefile.am
+++ b/libvtv/testsuite/other-tests/Makefile.am
@@ -1,6 +1,6 @@
 ## Makefile for the testsuite subdirectory of the VTV library.
 ##
-## Copyright (C) 2013 Free Software Foundation, Inc.
+## Copyright (C) 2013-2015 Free Software Foundation, Inc.
 ##
 ## Process this file with automake to produce Makefile.in.
 ##
diff --git a/libvtv/vtv_fail.cc b/libvtv/vtv_fail.cc
--- a/libvtv/vtv_fail.cc
+++ b/libvtv/vtv_fail.cc
@@ -1,5 +1,4 @@
-/* Copyright (C) 2012-2013
- Free Software Foundation
+/* Copyright (C) 2012-2015 Free Software Foundation, Inc.
 
  This file is part of GCC.
 
diff --git a/libvtv/vtv_fail.h b/libvtv/vtv_fail.h
--- a/libvtv/vtv_fail.h
+++ b/libvtv/vtv_fail.h
@@ -1,5 +1,4 @@
-// Copyright (C) 2012-2013
-// Free Software Foundation
+// Copyright (C) 2012-2015 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
diff --git a/libvtv/vtv_malloc.cc b/libvtv/vtv_malloc.cc
--- a/libvtv/vtv_malloc.cc
+++ b/libvtv/vtv_malloc.cc
@@ -1,5 +1,4 @@
-/* Copyright (C) 2012-2013
-   Free Software Foundation
+/* Copyright (C) 2012-2015 Free Software Foundation, Inc.
 
This file is part of GCC.
 
diff --git a/libvtv/vtv_malloc.h b/libvtv/vtv_malloc.h
--- a/libvtv/vtv_malloc.h
+++ b/libvtv/vtv_malloc.h
@@ -1,5 +1,4 @@
-// Copyright (C) 2012-2013
-// Free Software Foundation
+// Copyright (C) 2012-2015 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
diff --git a/libvtv/vtv_map.h b/libvtv/vtv_map.h
--- a/libvtv/vtv_map.h
+++ b/libvtv/vtv_map.h
@@ -1,5 +1,4 @@
-/* Copyright (C) 2012-2013
-   Free Software Foundation
+/* Copyright (C) 2012-2015 Free Software Foundation, Inc.
 
This file is part of GCC.
 
diff --git a/libvtv/vtv_rts.cc b/libvtv/vtv_rts.cc
--- a/libvtv/vtv_rts.cc
+++ b/libvtv/vtv_rts.cc
@@ -1,5 +1,4 @@
-/* Copyright (C) 2012-2013
- Free Software Foundation
+/* Copyright (C) 2012-2015 Free Softwar

Re: Fix libbacktrace -fPIC breakage from "Use libbacktrace in libgfortran"

2015-08-26 Thread Ulrich Weigand
Hans-Peter Nilsson wrote:
> > From: Ulrich Weigand 
> > Date: Tue, 25 Aug 2015 19:45:06 +0200
> 
> > However, neither works for the SPU, because in both cases libtool
> > will only do the test whether the target supports the -fPIC option.
> > It will not test whether the target supports dynamic libraries.
> > 
> > [ It will do that test; and default to --disable-shared on SPU.
> > That is a no-op for libbacktrace however, since it calls LT_INIT
> > with the disable-shared option anyway.
> 
> Maybe it shouldn't?

Huh?  We do want libbacktrace solely as static library, that's the
whole point ...

> >  When adding back the -fPIC
> > flag due to either the pic-only LT_INIT option or the -prefer-pic
> > libtool command line option, it does not check for that again.  ]
> 
> Sounds like a bug somewhere, in libtool or its current use:
> there *should* be a way to specify "I'd prefer PIC code in these
> static libraries".

But that's what the option *does*.

Let me try again, maybe we can reduce confusion a bit :-)

We've been discussing three potential sets of options to use with
the LT_INIT call here.   Those are:

A) LT_INIT# no options
   Build both a static and a shared library.  If the target does not
   support shared libraries, build the static library only.  The code
   landing in the static library is built without -fPIC; code for the
   shared library is built with -fPIC (or the appropriate target flag).

B) LT_INIT([disable-shared])
   Build *solely* a static library.  Code is compiled without -fPIC.

C) LT_INIT([disable-shared,pic-only])
   Build solely a static library, but compile code with -fPIC or the
   appropriate target flag (may be none if the target does not support
   -fPIC).

[Note that in all cases, behaviour can be overridden via configure
options like --enable/disable-shared and --enable/disable-static.]

As I understand it, we deliberately do not use option A.  As the comment
in the libbacktrace configure.ac says:
 # When building as a target library, shared libraries may want to link
 # this in.  We don't want to provide another shared library to
 # complicate dependencies.  Instead, we just compile with -fPIC.

That's why libbacktrace currently uses option B and manually adds a
-fPIC flag.  Now, after the latest check-in, the behaviour is mostly
equivalent to using option C (and not manually changing PIC flags).

However, none of the options do exactly what would be right for
the SPU, which would be:

  Build solely a static library, using code that is compiled so that
  it can be linked as part of a second library (static or shared).

This is equivalent to:

  Build solely a static library, but compile code with -fPIC or the
  appropriate target flag *if the target supports shared libraries*.

This again is *mostly* equivalent to option C, *except* on targets
that support -fPIC but do not support shared libraries.

I'm not sure if it is worthwhile to try and change libtool to
support targets with that property (e.g. adding a new LT_INIT
option), if this in practice only affects SPU.

> But, I'll have to leave solving this PIC-failing-at-linkage
> problem to you; I committed the current approved fix for
> PIC-failing-at-compilation.

I guess we can always fall back to just hard-coding SPU once
more; that's certainly the simplest solution right now.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Remove reference to undefined documentation node.

2015-08-26 Thread Dominik Vogt
On Wed, Aug 26, 2015 at 11:05:09AM +0100, Dominik Vogt wrote:
> This patch removes a menu entry that points to an undefined node
> in the documentation.  The faulty entry has been introduced with
> git commit id 3aabc45f2, subversion id 138bc75d-0d04-0410-96.  It
> looks like the entry is a remnant of an earlier version of the
> documentation introduced with that change.

Sorry, this patch is not good.  Please ignore; I'll look for a
different way to fix the warning.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[PATCH][3/n] dwarf2out refactoring for early (LTO) debug

2015-08-26 Thread Richard Biener

The following fixes a GC issue I ran into when doing 
prune_unused_types_prune early.  The issue is that the DIE struct
has a chain_circular marked field (die_sib) which cannot tolerate
spurious extra entries from old removed entries into the circular
chain.  Otherwise we fail to properly mark parts of the chain.
Those stray entries are kept live, referenced from TYPE_SYMTAB_DIE.

So the following patch makes sure to clear ->die_sib for
nodes we remove.  (These DIEs remaining in TYPE_SYMTAB_DIE also
mean we may end up re-using them, which is probably not what we
want ... in the original LTO experiment I had a ->removed flag
in the DIE struct and removed DIEs from the cache at cache lookup
time if I hit a removed DIE.)

Bootstrapped and tested on x86_64-unknown-linux-gnu, gdb tested there
as well.

Ok for trunk?

Thanks,
Richard.

2015-08-26  Richard Biener  

* dwarf2out.c (remove_child_with_prev): Clear child->die_sib.
(replace_child): Likewise.
(remove_child_TAG): Adjust.
(move_marked_base_types): Likewise.
(prune_unused_types_prune): Clear die_sib of removed children.

Index: trunk/gcc/dwarf2out.c
===
--- trunk.orig/gcc/dwarf2out.c  2015-08-26 09:30:54.679185817 +0200
+++ trunk/gcc/dwarf2out.c   2015-08-25 16:54:09.150506037 +0200
@@ -4827,6 +4827,7 @@ remove_child_with_prev (dw_die_ref child
 prev->die_sib = child->die_sib;
   if (child->die_parent->die_child == child)
 child->die_parent->die_child = prev;
+  child->die_sib = NULL;
 }
 
 /* Replace OLD_CHILD with NEW_CHILD.  PREV must have the property that
@@ -4853,6 +4854,7 @@ replace_child (dw_die_ref old_child, dw_
 }
   if (old_child->die_parent->die_child == old_child)
 old_child->die_parent->die_child = new_child;
+  old_child->die_sib = NULL;
 }
 
 /* Move all children from OLD_PARENT to NEW_PARENT.  */
@@ -4883,9 +4885,9 @@ remove_child_TAG (dw_die_ref die, enum d
remove_child_with_prev (c, prev);
c->die_parent = NULL;
/* Might have removed every child.  */
-   if (c == c->die_sib)
+   if (die->die_child == NULL)
  return;
-   c = c->die_sib;
+   c = prev->die_sib;
   }
   } while (c != die->die_child);
 }
@@ -24565,8 +24590,8 @@ prune_unused_types_prune (dw_die_ref die
 
   c = die->die_child;
   do {
-dw_die_ref prev = c;
-for (c = c->die_sib; ! c->die_mark; c = c->die_sib)
+dw_die_ref prev = c, next;
+for (c = c->die_sib; ! c->die_mark; c = next)
   if (c == die->die_child)
{
  /* No marked children between 'prev' and the end of the list.  */
@@ -24578,8 +24603,14 @@ prune_unused_types_prune (dw_die_ref die
  prev->die_sib = c->die_sib;
  die->die_child = prev;
}
+ c->die_sib = NULL;
  return;
}
+  else
+   {
+ next = c->die_sib;
+ c->die_sib = NULL;
+   }
 
 if (c != prev->die_sib)
   prev->die_sib = c;
@@ -24824,8 +24855,8 @@ move_marked_base_types (void)
  remove_child_with_prev (c, prev);
  /* As base types got marked, there must be at least
 one node other than DW_TAG_base_type.  */
- gcc_assert (c != c->die_sib);
- c = c->die_sib;
+ gcc_assert (die->die_child != NULL);
+ c = prev->die_sib;
}
 }
   while (c != die->die_child);


[build] Use __cxa_atexit on Solaris 12+

2015-08-26 Thread Rainer Orth
Solaris 12 introduced __cxa_atexit in libc.  The following patch makes
use of it, and also removes the strange failures seen with gld reported
in PR c++/51923.

Bootstrapped without regressions on i386-pc-solaris2.1[12] and
sparc-sun-solaris2.1[12], will installl on mainline.  Will backport to
the gcc 5 branch after some soak time.

Rainer


2015-02-10  Rainer Orth  

* config.gcc (*-*-solaris2*): Enable default_use_cxa_atexit on
Solaris 12+.

Use __cxa_atexit on Solaris 12+

diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -820,6 +820,12 @@ case ${target} in
   sol2_tm_file_head="dbxelf.h elfos.h ${cpu_type}/sysv4.h"
   sol2_tm_file_tail="${cpu_type}/sol2.h sol2.h"
   sol2_tm_file="${sol2_tm_file_head} ${sol2_tm_file_tail}"
+  case ${target} in
+*-*-solaris2.1[2-9]*)
+  # __cxa_atexit was introduced in Solaris 12.
+  default_use_cxa_atexit=yes
+  ;;
+  esac
   use_gcc_stdint=wrap
   if test x$gnu_ld = xyes; then
 tm_file="usegld.h ${tm_file}"

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[boehm-gc] Avoid unstructured procfs on Solaris

2015-08-26 Thread Rainer Orth
boehm-gc doesn't currently build on Solaris 12 since that release
finally removed the old unstructured /proc, thus the PIOCOPENPD ioctl.
This is already mentioned in the Solaris 11 EOF list:


http://www.oracle.com/technetwork/systems/end-of-notices/eonsolaris11-392732.html

Since the replacement (using /proc/<pid>/pagedata directly) has been
available since Solaris 2.6 in 1997, there's no need to retain the old
code, especially given that mainline only supports Solaris 10 and up.

Bootstrapped without regressions on i386-pc-solaris2.1[12] and
sparc-sun-solaris2.1[12], will install on mainline.  Will backport to
the gcc 5 branch after some soak time.

Rainer


2015-02-10  Rainer Orth  

* os_dep.c [GC_SOLARIS_THREADS] (GC_dirty_init): Use
/proc/<pid>/pagedata instead of PIOCOPENPD.

# HG changeset patch
# Parent 819be80e1b9c7e840fe5d232d64cf106869a933d
Avoid unstructured procfs on Solaris 12+

diff --git a/boehm-gc/os_dep.c b/boehm-gc/os_dep.c
--- a/boehm-gc/os_dep.c
+++ b/boehm-gc/os_dep.c
@@ -3184,13 +3184,11 @@ void GC_dirty_init()
 		  		(GC_words_allocd + GC_words_allocd_before_gc));
 #	endif   
 }
-sprintf(buf, "/proc/%d", getpid());
-fd = open(buf, O_RDONLY);
-if (fd < 0) {
+sprintf(buf, "/proc/%d/pagedata", getpid());
+GC_proc_fd = open(buf, O_RDONLY);
+if (GC_proc_fd < 0) {
 	ABORT("/proc open failed");
 }
-GC_proc_fd = syscall(SYS_ioctl, fd, PIOCOPENPD, 0);
-close(fd);
 syscall(SYS_fcntl, GC_proc_fd, F_SETFD, FD_CLOEXEC);
 if (GC_proc_fd < 0) {
 	ABORT("/proc ioctl failed");


-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[libgo] Use stat_atim.go on Solaris 12+

2015-08-26 Thread Rainer Orth
Solaris 12 changes the stat_[amc]tim members of struct stat from
timestruc_t to timespec_t for XPG7 compatibility, thus breaking the libgo
build.  The following patch checks for this change and uses the common
stat_atim.go if appropriate.

Btw., I noticed that go/os/stat_atim.go and stat_dragonfly.go are identical;
no idea why that would be useful.

Bootstrapped without regressions on i386-pc-solaris2.1[12] and
sparc-sun-solaris2.1[12].

I had to regenerate aclocal.m4 since for some reason it had been built
with automake 1.11.1 instead of the common 1.11.6, thus inhibiting
Makefile.in regeneration.

Ok for mainline now and the gcc 5 branch after some soak time?

Rainer


2015-02-10  Rainer Orth  

* configure.ac (have_stat_timespec): Check for timespec_t st_atim
in <sys/stat.h>.
(HAVE_STAT_TIMESPEC): New conditional.
* configure: Regenerate.
* Makefile.am [LIBGO_IS_SOLARIS && HAVE_STAT_TIMESPEC]
(go_os_stat_file): Use go/os/stat_atim.go.
* aclocal.m4: Regenerate.
* Makefile.in: Regenerate.

# HG changeset patch
# Parent b83d7b91430fc3d2c2f34df34aaf648b178d2cad
Use stat_atim.go on Solaris 12+

diff --git a/libgo/Makefile.am b/libgo/Makefile.am
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -880,7 +880,11 @@ endif
 endif
 
 if LIBGO_IS_SOLARIS
+if HAVE_STAT_TIMESPEC
+go_os_stat_file = go/os/stat_atim.go
+else
 go_os_stat_file = go/os/stat_solaris.go
+endif
 else
 if LIBGO_IS_LINUX
 go_os_stat_file = go/os/stat_atim.go
diff --git a/libgo/configure.ac b/libgo/configure.ac
--- a/libgo/configure.ac
+++ b/libgo/configure.ac
@@ -654,6 +654,12 @@ AC_CACHE_CHECK([epoll_event data.fd offs
 STRUCT_EPOLL_EVENT_FD_OFFSET=${libgo_cv_c_epoll_event_fd_offset}
 AC_SUBST(STRUCT_EPOLL_EVENT_FD_OFFSET)
 
+dnl Check if <sys/stat.h> uses timespec_t for st_?tim members.  Introduced
+dnl in Solaris 12 for XPG7 compatibility.
+AC_EGREP_HEADER([timespec_t.*st_atim], [sys/stat.h],
+		[have_stat_timespec=yes], [have_stat_timespec=no])
+AM_CONDITIONAL(HAVE_STAT_TIMESPEC, test $have_stat_timespec = yes)
+
 dnl See if struct exception is defined in <math.h>.
 AC_CHECK_TYPE([struct exception],
 [libgo_has_struct_exception=yes],

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Ilya Enkovich
2015-08-26 0:42 GMT+03:00 Jeff Law :
> On 08/21/2015 04:49 AM, Ilya Enkovich wrote:
>>
>>
>> I want a work with bitmasks to be expressed in a natural way using
>> regular integer operations. Currently all masks manipulations are
>> emulated via vector statements (mostly using a bunch of vec_cond). For
>> complex predicates it may be nontrivial to transform it back to scalar
>> masks and get an efficient code. Also the same vector may be used as
>> both a mask and an integer vector. Things become more complex if you
>> additionally have broadcasts and vector pack/unpack code. It also
>> should be transformed into a scalar masks manipulations somehow.
>
> Or why not model the conversion at the gimple level using a CONVERT_EXPR?
> In fact, the more I think about it, that seems to make more sense to me.
>
> We pick a canonical form for the mask, whatever it may be.  We use that
> canonical form and model conversions between it and the other form via
> CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
> If it's not up to the task, we should really look into why and resolve.
>
> Yes, that does mean we have two forms which I'm not terribly happy about and
> it means some target dependencies on what the masked vector operation looks
> like (ie, does it accept a simple integer or vector mask), but I'm starting
> to wonder if, as distasteful as I find it, it's the right thing to do.

If we have some special representation for masks in GIMPLE then we
might not need any conversions. We could ask a target to define a MODE
for this type and use it directly everywhere: directly compare into
it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
that type is reserved for masks usage then you previous suggestion to
transform masks into target specific form at GIMPLE->RTL phase should
work fine. This would allow to support only a single masks
representation in GIMPLE.

Thanks,
Ilya

>
>>>
>>> But I don't like changing our IL so much as to allow 'integer' masks
>>> everywhere.
>
> I'm warming up to that idea...
>
> jeff
>


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Ilya Enkovich
2015-08-26 0:26 GMT+03:00 Jeff Law :
> On 08/21/2015 06:17 AM, Ilya Enkovich wrote:
>>>
>>>
>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>
>>
>> There are just no instructions for that, but you have to pretend you
>> have them to get code vectorized.
>>
>>>
 Also according to vector ABI integer mask should be used for mask
 operand in case of masked vector call.
>>>
>>>
>>> What ABI?  The function signature of the intrinsics?  How would that
>>> come into play here?
>>
>>
>> Not intrinsics. I mean OpenMP vector functions which require integer
>> arg for a mask in case of 512-bit vector.
>
> That's what I assumed -- you can pass in a mask as an argument and it's
> supposed to be a simple integer, right?

Depending on the target, the ABI requires either a vector mask or a simple integer value.

>
>
>>
>>>
 Current implementation of masked loads, masked stores and bool
 patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
 really call it a canonical representation for all targets?
>>>
>>>
>>> No idea - we'll revisit when another targets adds a similar capability.
>>
>>
>> AVX-512 is such target. Current representation forces multiple scalar
>> mask -> vector mask and back transformations which are artificially
>> introduced by current bool patterns and are hard to optimize out.
>
> I'm a bit surprised they're so prevalent and hard to optimize away. ISTM PRE
> ought to handle this kind of thing with relative ease.

Most vector comparisons are UNSPECs. And I doubt PRE can actually
help much even if we get rid of the UNSPECs somehow. Is there really a
redundancy in:

if ((v1 cmp v2) && (v3 cmp v4))
  load

v1 cmp v2 -> mask1
select mask1 vec_cst_-1 vec_cst_0 -> vec_mask1
v3 cmp v4 -> mask2
select mask2 vec_mask1 vec_cst_0 -> vec_mask2
vec_mask2 NE vec_cst_0 -> mask3
load by mask3

It looks to me more like an i386-specific instruction selection problem.

Ilya

>
>
>>> Fact is GCC already copes with vector masks generated by vector compares
>>> just fine everywhere and I'd rather leave it as that.
>>
>>
>> Nope. Currently a vector mask is obtained from a vec_cond selecting
>> between {0 ... 0} and {-1 .. -1} vectors. AND and IOR on bools are
>> also expressed via additional vec_cond. I don't think the vectorizer
>> ever generates vector comparisons.
>>
>> And I wouldn't say it's fine 'everywhere' because there is a single
>> target utilizing them. Masked loads and stored for AVX-512 just don't
>> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
>> 512-bit vector then we get an ugly inefficient code. The question is
>> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
>> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
>> int conversions applied everywhere even if target doesn't need it.
>
> You should expect pushback anytime target dependencies are added to gimple,
> even if it's stuff in the vectorizer, which is infested with target
> dependencies.
>
>>
>> If we don't want to support both types of masks in GIMPLE then it's
>> more reasonable to make bool -> int conversion in expand for targets
>> requiring it, rather than do it for everyone and then leave it to
>> target to transform it back and try to get rid of all those redundant
>> transformations. I'd give vector<bool> a chance to become a canonical
>> mask representation for that.
>
> Might be worth some experimentation.
>
> Jeff


Re: [PATCH][1/n] dwarf2out refactoring for early (LTO) debug

2015-08-26 Thread Richard Biener
On Wed, 19 Aug 2015, Richard Biener wrote:

> On Tue, 18 Aug 2015, Aldy Hernandez wrote:
> 
> > On 08/18/2015 07:20 AM, Richard Biener wrote:
> > > 
> > > This starts a series of patches (still in development) to refactor
> > > dwarf2out.c to better cope with early debug (and LTO debug).
> > 
> > Awesome!  Thanks.
> > 
> > > Aldyh, what other testing did you usually do for changes?  Run
> > > the gdb testsuite against the new compiler?  Anything else?
> > 
> > gdb testsuite, and make sure you test GCC with 
> > --enable-languages=all,go,ada,
> > though the latter is mostly useful while you iron out bugs initially.  I 
> > found
> > that ultimately, the best test was C++.
> 
> I see.
> 
> > Pre merge I also bootstrapped the compiler and compared .debug* section 
> > sizes
> > in object files to make sure things were within reason.
> > 
> > > +
> > > +static void
> > > +vmsdbgout_early_finish (const char *filename ATTRIBUTE_UNUSED)
> > > +{
> > > +  if (write_symbols == VMS_AND_DWARF2_DEBUG)
> > > +(*dwarf2_debug_hooks.early_finish) (filename);
> > > +}
> > 
> > You can get rid of ATTRIBUTE_UNUSED now.
> 
> Done.  I've also refrained from moving
> 
>   gen_scheduled_generic_parms_dies ();
>   gen_remaining_tmpl_value_param_die_attribute ();
> 
> for now as that causes regressions I have to investigate.
> 
> The patch below has passed bootstrap & regtest on x86_64-unknown-linux-gnu
> as well as gdb testing.  Twice unpatched, twice patched - results seem
> to be somewhat unstable!?  I even refrained from using any -j with
> make check-gdb...  maybe it's just contrib/test_summary not coping well
> with gdb?  any hints?  Difference between unpatched run 1 & 2 is
> for example
> 
> --- results.unpatched   2015-08-19 15:08:36.152899926 +0200
> +++ results.unpatched2  2015-08-19 15:29:46.902060797 +0200
> @@ -209,7 +209,6 @@
>  WARNING: remote_expect statement without a default case?!
>  WARNING: remote_expect statement without a default case?!
>  WARNING: remote_expect statement without a default case?!
> -FAIL: gdb.base/varargs.exp: print find_max_float_real(4, fc1, fc2, fc3, 
> fc4)
>  FAIL: gdb.cp/inherit.exp: print g_vD
>  FAIL: gdb.cp/inherit.exp: print g_vE
>  FAIL: gdb.cp/no-dmgl-verbose.exp: setting breakpoint at 'f(std::string)'
> @@ -238,6 +237,7 @@
>  UNRESOLVED: gdb.fortran/types.exp: set print sevenbit-strings
>  FAIL: gdb.fortran/whatis_type.exp: run to MAIN__
>  WARNING: remote_expect statement without a default case?!
> +FAIL: gdb.gdb/complaints.exp: print symfile_complaints->root->fmt
>  WARNING: remote_expect statement without a default case?!
>  WARNING: remote_expect statement without a default case?!
>  WARNING: remote_expect statement without a default case?!
> @@ -362,12 +362,12 @@
> === gdb Summary ===
>  
> -# of expected passes   30881
> +# of expected passes   30884
>  # of unexpected failures   284
>  # of unexpected successes  2
> -# of expected failures 85
> +# of expected failures 83
>  # of unknown successes 2
> -# of known failures60
> +# of known failures59
>  # of unresolved testcases  6
>  # of untested testcases32
>  # of unsupported tests 165
> 
> the sames changes randomly appear/disappear in the patched case.  
> Otherwise patched/unpatched agree.
> 
> Ok?

Jason, are you willing to review these refactoring patches or can
I invoke my middle-end maintainer powers to remove some of this noise
from the LTO parts?

Thanks,
Richard.

> Thanks,
> Richard.
> 
> 2015-08-18  Richard Biener  
> 
>   * debug.h (gcc_debug_hooks::early_finish): Add filename argument.
>   * dbxout.c (dbx_debug_hooks): Adjust.
>   * debug.c (do_nothing_hooks): Likewise.
>   * sdbout.c (sdb_debug_hooks): Likewise.
>   * vmsdbgout.c (vmsdbgout_early_finish): New function dispatching
>   to dwarf2out variant if needed.
>   (vmsdbg_debug_hooks): Adjust.
>   * dwarf2out.c (dwarf2_line_hooks): Adjust.
>   (flush_limbo_die_list): New function.
>   (dwarf2out_finish): Call flush_limbo_die_list instead of
>   dwarf2out_early_finish.  Assert there are no deferred asm-names.
>   Move early stuff ...
>   (dwarf2out_early_finish): ... here.
>   * cgraphunit.c (symbol_table::finalize_compilation_unit):
>   Call early_finish with main_input_filename argument.
> 
> 
> Index: gcc/cgraphunit.c
> ===
> --- gcc/cgraphunit.c  (revision 226966)
> +++ gcc/cgraphunit.c  (working copy)
> @@ -2490,7 +2490,7 @@ symbol_table::finalize_compilation_unit
>  
>/* Clean up anything that needs cleaning up after initial debug
>   generation.  */
> -  (*debug_hooks->early_finish) ();
> +  (*debug_hooks->early_finish) (main_input_filename);
>  
>/* Finally drive the pass manager.  */
>compile ();
> Index: gcc/dbxout.c
> ===

[PATCH] Remove reference to undefined documentation node.

2015-08-26 Thread Dominik Vogt
This patch removes a menu entry that points to an undefined node
in the documentation.  The faulty entry has been introduced with
git commit id 3aabc45f2, subversion id 138bc75d-0d04-0410-96.  It
looks like the entry is a remnant of an earlier version of the
documentation introduced with that change.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* doc/extend.texi: Remove reference to undefined node.
From 55b9c29f73d8da1881ce5a3f65d0c7f40623e161 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 26 Aug 2015 10:59:29 +0100
Subject: [PATCH] Remove reference to undefined documentation node.

---
 gcc/doc/extend.texi | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 018b5d8..f5f90e6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7245,7 +7245,6 @@ for a C symbol, or to place a C variable in a specific register.
 @menu
 * Basic Asm::  Inline assembler without operands.
 * Extended Asm::   Inline assembler with operands.
-* Constraints::Constraints for @code{asm} operands
 * Asm Labels:: Specifying the assembler name to use for a C symbol.
 * Explicit Reg Vars::  Defining variables residing in specified registers.
 * Size of an asm:: How GCC calculates the size of an @code{asm} block.
-- 
2.3.0



Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-26 Thread Richard Biener
On August 26, 2015 11:30:26 AM GMT+02:00, Martin Jambor  wrote:
>Hi,
>
>On Wed, Aug 26, 2015 at 09:07:33AM +0200, Richard Biener wrote:
>> On Tue, Aug 25, 2015 at 11:44 PM, Jeff Law  wrote:
>> > On 08/25/2015 03:42 PM, Martin Jambor wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote:
>> >>>
>> >>> This changes the completely_scalarize_record path to also work on
>arrays
>> >>> (thus
>> >>> allowing records containing arrays, etc.). This just required
>extending
>> >>> the
>> >>> existing type_consists_of_records_p and
>completely_scalarize_record
>> >>> methods
>> >>> to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I
>renamed
>> >>> both
>> >>> methods so as not to mention 'record'.
>> >>
>> >>
>> >> thanks for working on this.  I see Jeff has already approved the
>> >> patch, but I have two comments nevertheless.  First, I would be
>much
>> >> happier if you added a proper comment to scalarize_elem function
>which
>> >> you forgot completely.  The name is not very descriptive and it
>has
>> >> quite few parameters too.
>> >
>> > Right.  I mentioned that I missed the lack of function comments
>when looking
>> > at #3 and asked Alan to go back and fix them in #1 and #2.
>> >
>> >>
>> >> Second, this patch should also fix PR 67283.  It would be great if
>you
>> >> could verify that and add it to the changelog when committing if
>that
>> >> is indeed the case.
>> >
>> > Excellent.  Yes, definitely mention the BZ.
>> 
>> One extra question is does the way we limit total scalarization work
>well
>> for arrays?  I suppose we have either sth like the maximum size of an
>> aggregate we scalarize or the maximum number of component accesses
>> we create?
>> 
>
>Only the former and that would be kept intact.  It is in fact visible
>in the context of the last hunk of the patch.

OK.  IIRC the gimplification code also has the latter and also considers 
zeroing the whole aggregate before initializing non-zero fields.  IMHO it makes 
sense to reuse some of the analysis and classification routines it has.

Richard.

>Martin




Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-26 Thread Martin Jambor
Hi,

On Wed, Aug 26, 2015 at 09:07:33AM +0200, Richard Biener wrote:
> On Tue, Aug 25, 2015 at 11:44 PM, Jeff Law  wrote:
> > On 08/25/2015 03:42 PM, Martin Jambor wrote:
> >>
> >> Hi,
> >>
> >> On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote:
> >>>
> >>> This changes the completely_scalarize_record path to also work on arrays
> >>> (thus
> >>> allowing records containing arrays, etc.). This just required extending
> >>> the
> >>> existing type_consists_of_records_p and completely_scalarize_record
> >>> methods
> >>> to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed
> >>> both
> >>> methods so as not to mention 'record'.
> >>
> >>
> >> thanks for working on this.  I see Jeff has already approved the
> >> patch, but I have two comments nevertheless.  First, I would be much
> >> happier if you added a proper comment to scalarize_elem function which
> >> you forgot completely.  The name is not very descriptive and it has
> >> quite few parameters too.
> >
> > Right.  I mentioned that I missed the lack of function comments when looking
> > at #3 and asked Alan to go back and fix them in #1 and #2.
> >
> >>
> >> Second, this patch should also fix PR 67283.  It would be great if you
> >> could verify that and add it to the changelog when committing if that
> >> is indeed the case.
> >
> > Excellent.  Yes, definitely mention the BZ.
> 
> One extra question is does the way we limit total scalarization work well
> for arrays?  I suppose we have either sth like the maximum size of an
> aggregate we scalarize or the maximum number of component accesses
> we create?
> 

Only the former and that would be kept intact.  It is in fact visible
in the context of the last hunk of the patch.

Martin


RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.

2015-08-26 Thread Matthew Fortune
Moore, Catherine  writes:
> > The recent changes to the MIPS GCC Linux sysroot
> > (https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01014.html) have meant
> > that the include directory is now not global and is provided only for
> > each multi-lib configuration.  This means that for any test in the
> > MIPS GCC Testsuite that requires standard library support we need to
> > check if there is a multi-lib support for the test options, otherwise
> it might fail to compile.
> >
> > This patch adds this support to the testsuite and mips.exp files.
> > Firstly any test that requires standard library support has the
> > implicit option "(REQUIRES_STDLIB)" added to its dg-options.  Secondly
> > in mips.exp a pre- processor check is performed to ensure that when
> > expanding a testcase containing a "#include " using the
> > current set of test options we do not get file not found errors.  If
> > this happens we mark the testcase as unsupported.
> >
> > The patch has been tested on the mti/img elf/linux-gnu toolchains, and
> > there have been no new regressions.
> >
> > The patch and ChangeLog are below.
> >
> > Ok to commit?
> >
> >
> Yes.  This looks good.

I had some comments on this that I hadn't got round to posting. The fix in
this patch is not general enough as the missing header problem comes in
two (related) forms:

1) Using the new MTI and IMG sysroot layout we can end up with GCC looking
   for headers in a sysroot that simply does not exist. The current patch
   handles this.
2) Using any sysroot layout (i.e. a simple mips-linux-gnu) it is possible
   for the stdlib.h header to be found but the ABI dependent gnu-stubs
   header may not be installed depending on soft/hard nan1985/nan2008.

The test for stdlib.h needs to therefore verify that preprocessing succeeds
rather than just testing for an error relating to stdlib.h. This could be
done by adding a further option to mips_preprocess to indicate the preprocessed
output should go to a file and that the caller wants the messages emitted
by the compiler instead.

A second issue is that you have added (REQUIRES_STDLIB) to too many tests.
You only need to add it to tests that request a compiler option (via
dg-options) that could potentially lead to forcing soft/hard nan1985/nan2008
directly or indirectly. So -mips32r6 implies nan2008, so you need it;
-mips32r5 implies nan1985, so you need it. There are at least two tests which don't
need the option but you need to check them all so we don't run the check
needlessly.

Thanks,
Matthew


Re: [PATCH][ARM]Tighten the conditions for arm_movw, arm_movt

2015-08-26 Thread Ramana Radhakrishnan
>
> I have tested that, arm-none-linux-gnueabi bootstraps Okay on trunk code.

 JFTR, this is ok to backport to gcc-5 in case there are no regressions.

regards
Ramana



>
>>
>> Thanks,
>> Kyrill
>>
>>
>


Re: [PING^2][PATCH, PR46193] Handle mix/max pointer reductions in parloops

2015-08-26 Thread Richard Biener
On Mon, Aug 24, 2015 at 5:10 PM, Tom de Vries  wrote:
> On 22-07-15 20:15, Tom de Vries wrote:
>>
>> On 13/07/15 13:02, Tom de Vries wrote:
>>>
>>> Hi,
>>>
>>> this patch fixes PR46193.
>>>
>>> It handles min and max reductions of pointer type in parloops.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for trunk?
>>>
>>
>
> Ping^2.
>
> Original submission at
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01018.html .

Please don't use lower_bound_in_type with two identical types.
Instead use wi::max_value and wide_int_to_tree.

Ok with that change.

Thanks,
Richard.

>
> Thanks,
> - Tom
>
>>
>>> 0001-Handle-mix-max-pointer-reductions-in-parloops.patch
>>>
>>>
>>> Handle mix/max pointer reductions in parloops
>>>
>>> 2015-07-13  Tom de Vries
>>>
>>> PR tree-optimization/46193
>>> * omp-low.c (omp_reduction_init): Handle pointer type for min or max
>>> clause.
>>>
>>> * gcc.dg/autopar/pr46193.c: New test.
>>>
>>> * testsuite/libgomp.c/pr46193.c: New test.
>>> ---
>>>   gcc/omp-low.c  |  4 ++
>>>   gcc/testsuite/gcc.dg/autopar/pr46193.c | 38 +++
>>>   libgomp/testsuite/libgomp.c/pr46193.c  | 67
>>> ++
>>>   3 files changed, 109 insertions(+)
>>>   create mode 100644 gcc/testsuite/gcc.dg/autopar/pr46193.c
>>>   create mode 100644 libgomp/testsuite/libgomp.c/pr46193.c
>>>
>>> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
>>> index 2e2070a..20d0010 100644
>>> --- a/gcc/omp-low.c
>>> +++ b/gcc/omp-low.c
>>> @@ -3423,6 +3423,8 @@ omp_reduction_init (tree clause, tree type)
>>>   real_maxval (&min, 1, TYPE_MODE (type));
>>> return build_real (type, min);
>>>   }
>>> +  else if (POINTER_TYPE_P (type))
>>> +return lower_bound_in_type (type, type);
>>> else
>>>   {
>>> gcc_assert (INTEGRAL_TYPE_P (type));
>>> @@ -3439,6 +3441,8 @@ omp_reduction_init (tree clause, tree type)
>>>   real_maxval (&max, 0, TYPE_MODE (type));
>>> return build_real (type, max);
>>>   }
>>> +  else if (POINTER_TYPE_P (type))
>>> +return upper_bound_in_type (type, type);
>>> else
>>>   {
>>> gcc_assert (INTEGRAL_TYPE_P (type));
>>> diff --git a/gcc/testsuite/gcc.dg/autopar/pr46193.c
>>> b/gcc/testsuite/gcc.dg/autopar/pr46193.c
>>> new file mode 100644
>>> index 000..544a5da
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/autopar/pr46193.c
>>> @@ -0,0 +1,38 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -ftree-parallelize-loops=2
>>> -fdump-tree-parloops-details"
>>> } */
>>> +
>>> +extern void abort (void);
>>> +
>>> +char *
>>> +foo (int count, char **list)
>>> +{
>>> +  char *minaddr = list[0];
>>> +  int i;
>>> +
>>> +  for (i = 0; i < count; i++)
>>> +{
>>> +  char *addr = list[i];
>>> +  if (addr < minaddr)
>>> +minaddr = addr;
>>> +}
>>> +
>>> +  return minaddr;
>>> +}
>>> +
>>> +char *
>>> +foo2 (int count, char **list)
>>> +{
>>> +  char *maxaddr = list[0];
>>> +  int i;
>>> +
>>> +  for (i = 0; i < count; i++)
>>> +{
>>> +  char *addr = list[i];
>>> +  if (addr > maxaddr)
>>> +maxaddr = addr;
>>> +}
>>> +
>>> +  return maxaddr;
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 2
>>> "parloops"
>>> } } */
>>> diff --git a/libgomp/testsuite/libgomp.c/pr46193.c
>>> b/libgomp/testsuite/libgomp.c/pr46193.c
>>> new file mode 100644
>>> index 000..1e27faf
>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.c/pr46193.c
>>> @@ -0,0 +1,67 @@
>>> +/* { dg-do run } */
>>> +/* { dg-additional-options "-ftree-parallelize-loops=2" } */
>>> +
>>> +extern void abort (void);
>>> +
>>> +char *
>>> +foo (int count, char **list)
>>> +{
>>> +  char *minaddr = list[0];
>>> +  int i;
>>> +
>>> +  for (i = 0; i < count; i++)
>>> +{
>>> +  char *addr = list[i];
>>> +  if (addr < minaddr)
>>> +minaddr = addr;
>>> +}
>>> +
>>> +  return minaddr;
>>> +}
>>> +
>>> +char *
>>> +foo2 (int count, char **list)
>>> +{
>>> +  char *maxaddr = list[0];
>>> +  int i;
>>> +
>>> +  for (i = 0; i < count; i++)
>>> +{
>>> +  char *addr = list[i];
>>> +  if (addr > maxaddr)
>>> +maxaddr = addr;
>>> +}
>>> +
>>> +  return maxaddr;
>>> +}
>>> +
>>> +#define N 5
>>> +
>>> +static void
>>> +init (char **list)
>>> +{
>>> +  int i;
>>> +  for (i = 0; i < N; ++i)
>>> +list[i] = (char *)&list[i];
>>> +}
>>> +
>>> +int
>>> +main (void)
>>> +{
>>> +  char *list[N];
>>> +  char * res;
>>> +
>>> +  init (list);
>>> +
>>> +  res = foo (N, list);
>>> +
>>> +  if (res != (char *)&list[0])
>>> +abort ();
>>> +
>>> +  res = foo2 (N, list);
>>> +
>>> +  if (res != (char *)&list[N-1])
>>> +abort ();
>>> +
>>> +  return 0;
>>> +}
>>> --
>>> 1.9.1
>>>
>>
>


Re: [PATCH] [AVX512F] Add scatter support for vectorizer

2015-08-26 Thread Richard Biener
On Fri, Aug 21, 2015 at 2:18 PM, Petr Murzin  wrote:
> Hello,
> Please have a look at updated patch.
>
> On Tue, Aug 4, 2015 at 3:15 PM, Richard Biener  wrote:
>> On Fri, 31 Jul 2015, Petr Murzin wrote:
>> @@ -5586,8 +5770,6 @@ vectorizable_store (gimple stmt,
>> gimple_stmt_iterator *gsi, gimple *vec_stmt,
>>prev_stmt_info = NULL;
>>for (j = 0; j < ncopies; j++)
>>  {
>> -  gimple new_stmt;
>> -
>>if (j == 0)
>> {
>>if (slp)
>>
>> spurious change?
>
> I have increased the scope of this variable to use it in checking for
> STMT_VINFO_SCATTER_P (stmt_info).

@@ -3763,32 +3776,46 @@ again:
   if (vf > *min_vf)
*min_vf = vf;

-  if (gather)
+  if (gatherscatter != SG_NONE)
{
  tree off;
+ if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off,
NULL, true) != 0)
+   gatherscatter = GATHER;
+ else if (vect_check_gather_scatter (stmt, loop_vinfo, NULL,
&off, NULL, false)
+ != 0)
+   gatherscatter = SCATTER;
+ else
+   gatherscatter = SG_NONE;

as I said vect_check_gather_scatter already knows whether the DR is a read or
a write and thus whether it needs to check for gather or scatter.  Remove
the new argument.  And simply do

   if (!vect_check_gather_scatter (stmt))
 gatherscatter = SG_NONE;

- STMT_VINFO_GATHER_P (stmt_info) = true;
+ if (gatherscatter == GATHER)
+   STMT_VINFO_GATHER_P (stmt_info) = true;
+ else
+   STMT_VINFO_SCATTER_P (stmt_info) = true;
}

and as suggested merge STMT_VINFO_GATHER_P and STMT_VINFO_SCATTER_P
using the enum so you can simply do

 STMT_VINFO_SCATTER_GATHER_P (smt_info) = gatherscatter;

I miss a few testcases that exercise scatter vectorization.  And as Uros
said, the i386 specific parts should be split out.

Otherwise the patch looks ok to me.

Thanks,
Richard.


> Thanks,
> Petr
>
> 2015-08-21  Andrey Turetskiy  
> Petr Murzin  
>
> gcc/
>
> * config/i386/i386-builtin-types.def
> (VOID_PFLOAT_HI_V8DI_V16SF_INT): New.
> (VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto.
> (VOID_PINT_HI_V8DI_V16SI_INT): Ditto.
> (VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto.
> * config/i386/i386.c
> (ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF,
> IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
> IX86_BUILTIN_SCATTERALTDIV16SI.
> (ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df,
> __builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di,
> __builtin_ia32_scatteraltdiv8si.
> (ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF,
> IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
> IX86_BUILTIN_SCATTERALTDIV16SI.
> (ix86_vectorize_builtin_scatter): New.
> (TARGET_VECTORIZE_BUILTIN_SCATTER): Define as
> ix86_vectorize_builtin_scatter.
> * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New.
> * doc/tm.texi: Regenerate.
> * target.def: Add scatter builtin.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Add new
> checkings for STMT_VINFO_SCATTER_P.
> (vect_check_gather): Rename to ...
> (vect_check_gather_scatter): this and enhance number of arguments.
> (vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable
> and new checkings for it accordingly.
> * tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it
> for loads/stores
>  in case of gather/scatter accordingly.
> (STMT_VINFO_SCATTER_P(S)): Define.
> (vect_check_gather): Rename to ...
> (vect_check_gather_scatter): this.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Ditto.
> (vectorizable_store): Add checkings for STMT_VINFO_SCATTER_P.
> (vect_mark_stmts_to_be_vectorized): Ditto.


Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.

2015-08-26 Thread Bin.Cheng
On Wed, Aug 26, 2015 at 3:29 PM, Richard Biener  wrote:
> On Wed, 26 Aug 2015, Bin.Cheng wrote:
>
>> On Wed, Aug 26, 2015 at 3:50 AM, Jeff Law  wrote:
>> > On 08/25/2015 05:06 AM, Alan Lawrence wrote:
>> >>
>> >> When SRA completely scalarizes an array, this patch changes the
>> >> generated accesses from e.g.
>> >>
>> >> MEM[(int[8] *)&a + 4B] = 1;
>> >>
>> >> to
>> >>
>> >> a[1] = 1;
>> >>
>> >> This overcomes a limitation in dom2, that accesses to equivalent
>> >> chunks of e.g. MEM[(int[8] *)&a + 4B] are not hashable_expr_equal_p
>> >> with accesses to e.g. a[1]. This is necessary for constant
>> >> propagation in the ssa-dom-cse-2.c testcase (after the next patch
>> >> that makes SRA handle constant-pool loads).
>> >>
>> >> I tried to work around this by making dom2's hashable_expr_equal_p
>> >> less conservative, but found that on platforms without AArch64's
>> >> vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC,
>> >> mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8]
>> >> *)&a] equivalent to a[0], etc.; a complete overhaul of
>> >> hashable_expr_equal_p seems like a larger task than this patch
>> >> series.
>> >>
>> >> I can't see how to write a testcase for this in C though as direct
>> >> assignment to an array is not possible; such assignments occur only
>> >> with constant pool data, which is dealt with in the next patch.
>> >
>> > It's a general issue that if there's > 1 common way to represent an
>> > expression, then DOM will often miss discovery of the CSE opportunity
>> > because of the way it hashes expressions.
>> >
>> > Ideally we'd be moving to a canonical form, but I also realize that in
>> > the case of memory references like this, that may not be feasible.
>> IIRC, there were talks about lowering all memory references on GIMPLE?
>> Which is the reverse approach.  Since SRA is in quite early
>> compilation stage, don't know if lowered memory reference has impact
>> on other optimizers.
>
> Yeah, I'd only do the lowering after loop opts.  Which also may make
> the DOM issue moot as the array refs would be lowered as well and thus
> DOM would see a consistent set of references again.  The lowering should
> also simplify SLSR and expose address computation redundancies to DOM.
>
> I'd place such lowering before the late reassoc (any takers?  I suppose
> you can pick up one of the bitfield lowering passes posted in the
> previous years as this should also handle bitfield accesses correctly).
I ran into several issues related to lowered memory references (some
of them concerning SLSR), and want to have a look at this.  But only
after finishing the major issues in IVO...

As for SLSR, I think the problem is more that we need to prove
equality of expressions by diving into the definition chains of SSA
variables, just as tree_to_affine_expand does.  I think this has
already been discussed too.  Anyway, lowering memory references
provides a canonical form and should benefit the other optimizers.

Thanks,
bin
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> >
>> > It does make me wonder how many CSEs we're really missing due to the two
>> > ways to represent array accesses.
>> >
>> >
>> >> Bootstrap + check-gcc on x86-none-linux-gnu,
>> >> arm-none-linux-gnueabihf, aarch64-none-linux-gnu.
>> >>
>> >> gcc/ChangeLog:
>> >>
>> >> * tree-sra.c (completely_scalarize): Move some code into:
>> >> (get_elem_size): New.
>> >> (build_ref_for_offset): Build ARRAY_REF if base is aligned array.
>> >> ---
>> >>  gcc/tree-sra.c | 110 -
>> >>  1 file changed, 69 insertions(+), 41 deletions(-)
>> >>
>> >> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> >> index 08fa8dc..af35fcc 100644
>> >> --- a/gcc/tree-sra.c
>> >> +++ b/gcc/tree-sra.c
>> >> @@ -957,6 +957,20 @@ scalarizable_type_p (tree type)
>> >> }
>> >> }
>> >>
>> >> +static bool
>> >> +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out)
>> >
>> > Function comment needed.
>> >
>> > I may have missed it in the earlier patches, but can you please make
>> > sure any new functions you created have comments in those as well.  Such
>> > patches are pre-approved.
>> >
>> > With the added function comment, this patch is fine.
>> >
>> > jeff
>> >
>> >
>>
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nuernberg)
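The kind of testcase discussed above, where DOM/CSE should fold a fully
scalarized constant array away, can be sketched in C roughly as follows
(modeled loosely on gcc.dg/tree-ssa/ssa-dom-cse-2.c; the exact body of
that test is not reproduced here, so treat the details as an assumption):

```c
/* After SRA fully scalarizes "a" and DOM/CSE propagates the constant
   element values, the whole function should fold to "return 28".  */
int
foo (void)
{
  int a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  int i, sum = 0;

  for (i = 0; i < 8; i++)
    sum += a[i];

  return sum;   /* 0 + 1 + ... + 7 = 28 */
}
```

Whether the fold actually happens can be inspected with
-fdump-tree-optimized and scanning the dump for "return 28".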


Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.

2015-08-26 Thread Richard Biener
On Wed, 26 Aug 2015, Bin.Cheng wrote:

> On Wed, Aug 26, 2015 at 3:50 AM, Jeff Law  wrote:
> > On 08/25/2015 05:06 AM, Alan Lawrence wrote:
> >>
> >> When SRA completely scalarizes an array, this patch changes the
> >> generated accesses from e.g.
> >>
> >> MEM[(int[8] *)&a + 4B] = 1;
> >>
> >> to
> >>
> >> a[1] = 1;
> >>
> >> This overcomes a limitation in dom2, that accesses to equivalent
> >> chunks of e.g. MEM[(int[8] *)&a] are not hashable_expr_equal_p with
> >> accesses to e.g. MEM[(int[8] *)&a]. This is necessary for constant
> >> propagation in the ssa-dom-cse-2.c testcase (after the next patch
> >> that makes SRA handle constant-pool loads).
> >>
> >> I tried to work around this by making dom2's hashable_expr_equal_p
> >> less conservative, but found that on platforms without AArch64's
> >> vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC,
> >> mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8]
> >> *)&a] equivalent to a[0], etc.; a complete overhaul of
> >> hashable_expr_equal_p seems like a larger task than this patch
> >> series.
> >>
> >> I can't see how to write a testcase for this in C though as direct
> >> assignment to an array is not possible; such assignments occur only
> >> with constant pool data, which is dealt with in the next patch.
> >
> > It's a general issue that if there's > 1 common way to represent an
> > expression, then DOM will often miss discovery of the CSE opportunity
> > because of the way it hashes expressions.
> >
> > Ideally we'd be moving to a canonical form, but I also realize that in
> > the case of memory references like this, that may not be feasible.
> IIRC, there were talks about lowering all memory references on GIMPLE?
> That is the reverse approach.  Since SRA runs at quite an early
> compilation stage, I don't know whether lowered memory references have
> an impact on other optimizers.

Yeah, I'd only do the lowering after loop opts.  Which also may make
the DOM issue moot as the array refs would be lowered as well and thus
DOM would see a consistent set of references again.  The lowering should
also simplify SLSR and expose address computation redundancies to DOM.

I'd place such lowering before the late reassoc (any takers?  I suppose
you can pick up one of the bitfield lowering passes posted in the
previous years as this should also handle bitfield accesses correctly).

Thanks,
Richard.

> Thanks,
> bin
> >
> > It does make me wonder how many CSEs we're really missing due to the two
> > ways to represent array accesses.
> >
> >
> >> Bootstrap + check-gcc on x86-none-linux-gnu,
> >> arm-none-linux-gnueabihf, aarch64-none-linux-gnu.
> >>
> >> gcc/ChangeLog:
> >>
> >> * tree-sra.c (completely_scalarize): Move some code into:
> >> (get_elem_size): New.
> >> (build_ref_for_offset): Build ARRAY_REF if base is aligned array.
> >> ---
> >>  gcc/tree-sra.c | 110 -
> >>  1 file changed, 69 insertions(+), 41 deletions(-)
> >>
> >> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> >> index 08fa8dc..af35fcc 100644
> >> --- a/gcc/tree-sra.c
> >> +++ b/gcc/tree-sra.c
> >> @@ -957,6 +957,20 @@ scalarizable_type_p (tree type)
> >> }
> >> }
> >>
> >> +static bool
> >> +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out)
> >
> > Function comment needed.
> >
> > I may have missed it in the earlier patches, but can you please make
> > sure any new functions you created have comments in those as well.  Such
> > patches are pre-approved.
> >
> > With the added function comment, this patch is fine.
> >
> > jeff
> >
> >
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)


Re: [RFC 5/5] Always completely replace constant pool entries

2015-08-26 Thread Richard Biener
On Tue, Aug 25, 2015 at 9:54 PM, Jeff Law  wrote:
> On 08/25/2015 05:06 AM, Alan Lawrence wrote:
>>
>> I used this as a means of better-testing the previous changes, as it
>> exercises the constant replacement code a whole lot more. Indeed, quite
>> a few tests are now optimized away to nothing on AArch64...
>>
>> Always pulling in constants is almost certainly not what we want, but
>> we may nonetheless want something more aggressive than the usual
>> --param, e.g. for the ssa-dom-cse-2.c test. Thoughts welcomed?
>
> I'm of the opinion that we have too many knobs already.  So I'd perhaps ask
> whether or not this option is likely to be useful to end users?
>
> As for the patch itself, any thoughts on reasonable heuristics for when to
> pull in the constants?  Clearly we don't want the patch as-is, but are there
> cases we can identify when we want to be more aggressive?

Well - I still think that we need to enhance those followup passes to directly
handle the constant pool entry.  Expanding the assignment piecewise for
arbitrarily large initializers is certainly a no-go.  IIRC I enhanced FRE to
do this at some point.  For DOM it's much harder due to the way it is
structured, and I'd like to keep DOM simple.

Note that we still want SRA to partly scalarize the initializer if only a few
elements remain accessed (so we can optimize the initializer away).  Of
course that requires catching most followup optimization opportunities
before the second SRA run.

Richard.

> jeff
>
>
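The partial-scalarization case mentioned above, where only a few elements
of a large constant-pool initializer are ever read, corresponds to code
like the following hypothetical sketch (not taken from the patch series):

```c
/* "big" gets a constant-pool entry plus a block copy at
   gimplification.  Only two of its 256 elements are ever accessed,
   so ideally SRA replaces the two loads with the constants 42 and 7
   and the large initializer copy is optimized away entirely.  */
int
use_two (void)
{
  const int big[256] = { [0] = 42, [255] = 7 };  /* rest zero (C99) */
  return big[0] + big[255];
}
```

Comparing the -fdump-tree-esra and -fdump-tree-optimized dumps for such a
function shows whether the initializer survives to expansion.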


Re: [RFC 4/5] Handle constant-pool entries

2015-08-26 Thread Richard Biener
On Tue, Aug 25, 2015 at 10:13 PM, Jeff Law  wrote:
> On 08/25/2015 05:06 AM, Alan Lawrence wrote:
>>
>> This makes SRA replace loads of records/arrays from constant pool
>> entries with elementwise assignments of the constant values, hence
>> overcoming the fundamental problem in PR/63679.
>>
>> As a first pass, the approach I took was to look for constant-pool loads
>> as we scanned through other accesses, and add them as candidates there;
>> to build a constant replacement_decl for any such accesses in
>> completely_scalarize; and to use any existing replacement_decl rather
>> than creating a variable in create_access_replacement. (I did try using
>> CONSTANT_CLASS_P in the latter, but that does not allow addresses of
>> labels, which can still end up in the constant pool.)
>>
>> Feedback as to the approach, or how it might be better structured /
>> fitted into SRA, is solicited ;).
>>
>> Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
>> arm-none-linux-gnueabihf, including with the next patch (rfc), which
>> greatly increases the number of testcases in which this code is exercised!
>>
>> Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes
>> (using a stage 1 compiler only, without execution) on alpha, hppa, powerpc,
>> sparc, avr, and sh.
>>
>> gcc/ChangeLog:
>>
>> * tree-sra.c (create_access): Scan for uses of constant pool and
>> add
>> to candidates.
>> (subst_initial): New.
>> (scalarize_elem): Build replacement_decl using subst_initial.
>> (create_access_replacement): Use replacement_decl if set.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
>> sra-max-scalarization-size-Ospeed.
>> ---
>>   gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
>>   gcc/tree-sra.c| 56 +--
>>   2 files changed, 55 insertions(+), 8 deletions(-)
>>
>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> index af35fcc..a3ff2df 100644
>> --- a/gcc/tree-sra.c
>> +++ b/gcc/tree-sra.c
>> @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
>> else
>>   ptr = false;
>>
>> +  /* FORNOW: scan for uses of constant pool as we go along.  */
>
> I'm not sure why you have this marked as FORNOW.  If I'm reading all this
> code correctly, you're lazily adding items from the constant pool into the
> candidates table when you find they're used.  That seems better than walking
> the entire constant pool adding them all to the candidates.
>
> I don't see this as fundamentally wrong or unclean.
>
> The question I have is why this differs from the effects of patch #5. That
> would seem to indicate that there's things we're not getting into the
> candidate tables with this approach?!?
>
>
>
>> @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type,
>> HOST_WIDE_INT offset, tree ref)
>>   }
>>   }
>>
>> +static tree
>> +subst_initial (tree expr, tree var)
>
> Function comment.
>
> I think this patch is fine with the function comment added and removing the
> FORNOW part of the comment in create_access.  It may be worth noting in
> create_access's comment that it can add new items to the candidates tables
> for constant pool entries.

I'm happy seeing this code in SRA as I never liked that we already decide
at gimplification time which initializers to expand and which to init from
a constant pool entry.  So ... can we now "remove" gimplify_init_constructor
by _always_ emitting a constant pool entry and an assignment from it
(obviously only if the constructor can be put into the constant pool)?  Deferring
the expansion decision to SRA makes it possible to better estimate whether
the code is hot/cold or whether the initialized variable can be replaced by
the constant pool entry completely (variable ends up readonly).

Oh, and we'd no longer create the awful split code at -O0 ...

So can you explore that a bit once this series is settled?  This is probably
also related to 5/5 as this makes all the target dependent decisions in SRA
now and thus the initial IL from gimplification should be the same for all
targets (that's always a nice thing to have IMHO).

Thanks,
Richard.

>
> Jeff
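Alan's remark that addresses of labels can also end up in the constant
pool refers to GNU C's labels-as-values extension; a hypothetical example
of such a table (not from the patch series) looks like this:

```c
/* GNU C extension: a static table of label addresses.  The table is
   an address constant that the compiler may place in the constant
   pool, which is why a plain CONSTANT_CLASS_P check does not cover
   this case.  */
int
dispatch (int op)
{
  static void *table[] = { &&add_one, &&double_it };
  int v = 10;

  goto *table[op & 1];   /* computed goto through the table */

add_one:
  return v + 1;          /* op == 0 */
double_it:
  return v * 2;          /* op == 1 */
}
```

This only compiles with compilers supporting the GNU computed-goto
extension (GCC, Clang); it is not ISO C.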


Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.

2015-08-26 Thread Richard Biener
On Tue, Aug 25, 2015 at 9:50 PM, Jeff Law  wrote:
> On 08/25/2015 05:06 AM, Alan Lawrence wrote:
>>
>> When SRA completely scalarizes an array, this patch changes the
>> generated accesses from e.g.
>>
>> MEM[(int[8] *)&a + 4B] = 1;
>>
>> to
>>
>> a[1] = 1;
>>
>> This overcomes a limitation in dom2, that accesses to equivalent
>> chunks of e.g. MEM[(int[8] *)&a] are not hashable_expr_equal_p with
>> accesses to e.g. MEM[(int[8] *)&a]. This is necessary for constant
>> propagation in the ssa-dom-cse-2.c testcase (after the next patch
>> that makes SRA handle constant-pool loads).
>>
>> I tried to work around this by making dom2's hashable_expr_equal_p
>> less conservative, but found that on platforms without AArch64's
>> vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC,
>> mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8]
>> *)&a] equivalent to a[0], etc.; a complete overhaul of
>> hashable_expr_equal_p seems like a larger task than this patch
>> series.
>>
>> I can't see how to write a testcase for this in C though as direct
>> assignment to an array is not possible; such assignments occur only
>> with constant pool data, which is dealt with in the next patch.
>
> It's a general issue that if there's > 1 common way to represent an
> expression, then DOM will often miss discovery of the CSE opportunity
> because of the way it hashes expressions.
>
> Ideally we'd be moving to a canonical form, but I also realize that in
> the case of memory references like this, that may not be feasible.
>
> It does make me wonder how many CSEs we're really missing due to the two
> ways to represent array accesses.
>
>
>> Bootstrap + check-gcc on x86-none-linux-gnu,
>> arm-none-linux-gnueabihf, aarch64-none-linux-gnu.
>>
>> gcc/ChangeLog:
>>
>> * tree-sra.c (completely_scalarize): Move some code into:
>> (get_elem_size): New.
>> (build_ref_for_offset): Build ARRAY_REF if base is aligned array.
>> ---
>>  gcc/tree-sra.c | 110 -
>>  1 file changed, 69 insertions(+), 41 deletions(-)
>>
>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> index 08fa8dc..af35fcc 100644
>> --- a/gcc/tree-sra.c
>> +++ b/gcc/tree-sra.c
>> @@ -957,6 +957,20 @@ scalarizable_type_p (tree type)
>> }
>> }
>>
>> +static bool
>> +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out)
>
> Function comment needed.
>
> I may have missed it in the earlier patches, but can you please make
> sure any new functions you created have comments in those as well.  Such
> patches are pre-approved.
>
> With the added function comment, this patch is fine.

Err ... you generally _cannot_ create ARRAY_REFs out of thin air because
of correctness issues with data-ref and data dependence analysis.  You can
of course keep ARRAY_REFs if the original access was an ARRAY_REF.

But I'm not convinced this is what the pass does.

We went to great lengths removing all the code from gimplification
and folding that tried to be clever in producing array refs from accesses to
something with an ARRAY_TYPE - this all eventually led to wrong-code issues
later.

So I'd rather _not_ have this patch.  (as always I'm too slow responding
and Jeff is too fast ;))

Thanks,
Richard.

> jeff
>
>


Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-26 Thread Richard Biener
On Tue, Aug 25, 2015 at 11:44 PM, Jeff Law  wrote:
> On 08/25/2015 03:42 PM, Martin Jambor wrote:
>>
>> Hi,
>>
>> On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote:
>>>
>>> This changes the completely_scalarize_record path to also work on arrays
>>> (thus allowing records containing arrays, etc.). This just required
>>> extending the existing type_consists_of_records_p and
>>> completely_scalarize_record methods to handle things of ARRAY_TYPE as
>>> well as RECORD_TYPE. Hence, I renamed both methods so as not to mention
>>> 'record'.
>>
>>
>> thanks for working on this.  I see Jeff has already approved the
>> patch, but I have two comments nevertheless.  First, I would be much
>> happier if you added a proper comment to the scalarize_elem function,
>> which you forgot completely.  The name is not very descriptive and it
>> has quite a few parameters too.
>
> Right.  I mentioned that I missed the lack of function comments when looking
> at #3 and asked Alan to go back and fix them in #1 and #2.
>
>>
>> Second, this patch should also fix PR 67283.  It would be great if you
>> could verify that and add it to the changelog when committing if that
>> is indeed the case.
>
> Excellent.  Yes, definitely mention the BZ.

One extra question: does the way we limit total scalarization work well
for arrays?  I suppose we have either something like a maximum size of
aggregate we scalarize, or a maximum number of component accesses we
create?

Thanks,
Richard.

> jeff
>
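The case patch 2/5 enables, complete scalarization of records that
contain arrays, can be illustrated with a small hypothetical sketch (not
taken from the testsuite):

```c
struct s
{
  int a[4];
  int x;
};

/* With patch 2/5, SRA can completely scalarize "v" even though the
   record contains an array member, so after constant propagation this
   should fold to "return 8".  */
int
bar (void)
{
  struct s v = { { 1, 2, 3, 4 }, 5 };
  return v.a[2] + v.x;   /* 3 + 5 */
}
```

Richard's question about limits applies directly here: the number of
scalar replacements grows with the array length, so some cap on total
scalarization is needed for large arrays.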