Re: [PATCH] Add PowerPC ISA 3.0 word splat and byte immediate splat support

2016-05-13 Thread Segher Boessenkool
On Fri, May 13, 2016 at 07:25:43PM -0400, Michael Meissner wrote:
> This patch adds support for the 32-bit word splat instructions, the byte
> immediate splat instructions, and the vector sign extend instructions to GCC
> 7.0.
> 
> In addition to the various splat instructions, since I was modifying the 
> vector
> moves, I took the opportunity to reorganize the vector move instructions with
> several changes I've wanted to do:

It is much easier to review, and for regression searches later, if one
patch does one thing.  No need to change this patch, but please keep
it in mind for later patches.

> I replaced the single move that handled all vector types with separate 32-bit
> and 64-bit moves.  I also combined the movti_ moves (when -mvsx-timode is
> in effect) into the main vector moves.
> 
> I eliminated separate moves for  , where the preferred register 
> class
> () is listed first, and the secondary register class () is listed
> second with a '?' to discourage use.
> 
> Prefer loading 0/-1 in any VSX register for ISA 3.0, and Altivec registers for
> ISA 2.06/2.07 (PR target/70915) so that if the register was involved in a slow
> operation, the clear/set operation does not wait for the slow operation to
> finish.
> 
> I adjusted the length attributes for 32-bit mode.
> 
> I changed the 32-bit move to use rs6000_output_move_128bit and drop the use of
> the string instructions for 32-bit movti when -mvsx-timode is in effect.
> 
> I used spacing so that the alternatives and attributes don't generate long
> lines, and put things in columns, so that it is easier to match up the 
> operands
> and attributes with the insn alternatives.
> 
> I did a spec 2006 run comparing these changes to the trunk, and there were no
> significant differences in terms of performance.
> 
> Are these patches ok to apply to the GCC 7.0 trunk?

Changelog is missing.

> +;; Generate the XXORC instruction which was added in ISA 2.07
> +(define_constraint "wM"
> +  "vector constant with all 1's, ISA 2.07 and above"
> +  (and (match_test "TARGET_P8_VECTOR")
> +   (match_operand 0 "all_ones_constant")))

That's not a very good comment; a constraint doesn't generate anything.
"Used for", maybe?

> +(define_predicate "xxspltib_constant_split"
> +  (match_code "const_vector,vec_duplicate,const_int")
> +{
> +  int value  = 256;
> +  int num_insns  = -1;

No tabs please, just a single space will do.

> +   rtx cvt  = ((TARGET_XSCVDPSPN)
> +   ? gen_vsx_xscvdpspn_scalar (freg, sreg)
> +   : gen_vsx_xscvdpsp_scalar (freg, sreg));

No parentheses around TARGET_XSCVDPSPN.

> +(define_insn "xxspltib_v16qi"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> + (vec_duplicate:V16QI (match_operand:SI 1 "s8bit_cint_operand" "i")))]
> +  "TARGET_P9_VECTOR"
>  {
> -  return rs6000_output_move_128bit (operands);
> +  operands[2] = GEN_INT (INTVAL (operands[1]) & 0xff);
> +  return "xxspltib %x0,%2";
>  }

Please use "n" instead of "i"?  It shouldn't matter here, but at least it's
good documentation.

> +;;  VSX store   VSX load
> +;;   VSX movedirect move
> +;;   direct move GPR store
> +;;   GPR loadGPR move
> +;;   P9 const.   AVX const.
> +;;  P9 const.   0
> +;;   -1  GPR const.
> +;;  AVX store   AVX load

"AVX"?  Heh.  AltiVec or VMX.

> +(define_insn "*vsx_mov_64bit"
> +  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=wOZ,,
> +   ,   *r,
> +   *wo, $Y,
> +   ??r, ??r,
> +   wo,  v,
> +   v,   ?,
> +   ?wh, ??r,
> +   wZ,  v")

This is more readable, excellent.  Would four or so columns work even
better?

> +/* { dg-do compile { target { powerpc64le-*-* } } } */

Does the test not work on BE?

Okay for trunk, with a changelog and the nits fixed.  Thanks,


Segher


[PATCH] Add PowerPC ISA 3.0 word splat and byte immediate splat support

2016-05-13 Thread Michael Meissner
This patch adds support for the 32-bit word splat instructions, the byte
immediate splat instructions, and the vector sign extend instructions to GCC
7.0.

In addition to the various splat instructions, since I was modifying the vector
moves, I took the opportunity to reorganize the vector move instructions with
several changes I've wanted to do:

I replaced the single move that handled all vector types with separate 32-bit
and 64-bit moves.  I also combined the movti_ moves (when -mvsx-timode is
in effect) into the main vector moves.

I eliminated separate moves for  , where the preferred register class
() is listed first, and the secondary register class () is listed
second with a '?' to discourage use.

Prefer loading 0/-1 in any VSX register for ISA 3.0, and Altivec registers for
ISA 2.06/2.07 (PR target/70915) so that if the register was involved in a slow
operation, the clear/set operation does not wait for the slow operation to
finish.

I adjusted the length attributes for 32-bit mode.

I changed the 32-bit move to use rs6000_output_move_128bit and drop the use of
the string instructions for 32-bit movti when -mvsx-timode is in effect.

I used spacing so that the alternatives and attributes don't generate long
lines, and put things in columns, so that it is easier to match up the operands
and attributes with the insn alternatives.

I did a spec 2006 run comparing these changes to the trunk, and there were no
significant differences in terms of performance.

Are these patches ok to apply to the GCC 7.0 trunk?

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 236136)
+++ gcc/config/rs6000/constraints.md(.../gcc/config/rs6000) (working copy)
@@ -140,6 +140,10 @@ (define_constraint "wD"
   (and (match_code "const_int")
(match_test "TARGET_VSX && (ival == VECTOR_ELEMENT_SCALAR_64BIT)")))
 
+(define_constraint "wE"
+  "Vector constant that can be loaded with the XXSPLTIB instruction."
+  (match_test "xxspltib_constant_nosplit (op, mode)"))
+
 ;; Extended fusion store
 (define_memory_constraint "wF"
   "Memory operand suitable for power9 fusion load/stores"
@@ -156,6 +160,12 @@ (define_constraint "wL"
(and (match_test "TARGET_DIRECT_MOVE_128")
(match_test "(ival == VECTOR_ELEMENT_MFVSRLD_64BIT)"
 
+;; Generate the XXORC instruction which was added in ISA 2.07
+(define_constraint "wM"
+  "vector constant with all 1's, ISA 2.07 and above"
+  (and (match_test "TARGET_P8_VECTOR")
+   (match_operand 0 "all_ones_constant")))
+
 ;; ISA 3.0 vector d-form addresses
 (define_memory_constraint "wO"
   "Memory operand suitable for the ISA 3.0 vector d-form instructions."
@@ -166,6 +176,10 @@ (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
   (match_operand 0 "quad_memory_operand"))
 
+(define_constraint "wS"
+  "Vector constant that can be loaded with XXSPLTIB & sign extension."
+  (match_test "xxspltib_constant_split (op, mode)"))
+
 ;; Altivec style load/store that ignores the bottom bits of the address
 (define_memory_constraint "wZ"
   "Indexed or indirect memory operand, ignoring the bottom 4 bits"
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md 
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 236136)
+++ gcc/config/rs6000/predicates.md (.../gcc/config/rs6000) (working copy)
@@ -565,6 +565,38 @@ (define_predicate "easy_fp_constant"
 }
 })
 
+;; Return 1 if the operand is a CONST_VECTOR or VEC_DUPLICATE of a constant
+;; that can be loaded with a XXSPLTIB instruction and then a VUPKHSB, VECSB2W or
+;; VECSB2D instruction.
+
+(define_predicate "xxspltib_constant_split"
+  (match_code "const_vector,vec_duplicate,const_int")
+{
+  int value = 256;
+  int num_insns = -1;
+
+  if (!xxspltib_constant_p (op, mode, &num_insns, &value))
+    return false;
+
+  return num_insns > 1;
+})
+
+
+;; Return 1 if the operand is a CONST_VECTOR that can be loaded directly with a
+;; XXSPLTIB instruction.
+
+(define_predicate "xxspltib_constant_nosplit"
+  (match_code "const_vector,vec_duplicate,const_int")
+{
+  int value = 256;
+  int num_insns = -1;
+
+  if (!xxspltib_constant_p (op, mode, &num_insns, &value))
+    return false;
+
+  return num_insns == 1;
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -583,7 +615,14 @@ (define_predicate "easy_vector_constant"
 
   if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode))
 {
-  if (zero_constant (op, mode))
+  int value = 

[PATCH] combine: Don't call extract_left_shift with count < 0 (PR67483)

2016-05-13 Thread Segher Boessenkool
If the compiled program does a shift by a negative amount, combine will
happily work with that, but it shouldn't then do an undefined operation
in GCC itself.  This patch fixes the first case mentioned in the bug
report (I haven't been able to reproduce the second case, on trunk at
least).

Bootstrapped and tested on powerpc64-linux, committing to trunk.


Segher


2016-05-13  Segher Boessenkool  

PR rtl-optimization/67483
* combine.c (make_compound_operation): Don't call extract_left_shift
with negative shift amounts.

---
 gcc/combine.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index 02aadc4..b819415 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -8071,6 +8071,7 @@ make_compound_operation (rtx x, enum rtx_code in_code)
  && ! (GET_CODE (lhs) == SUBREG
&& (OBJECT_P (SUBREG_REG (lhs
  && CONST_INT_P (rhs)
+ && INTVAL (rhs) >= 0
  && INTVAL (rhs) < HOST_BITS_PER_WIDE_INT
  && INTVAL (rhs) < mode_width
  && (new_rtx = extract_left_shift (lhs, INTVAL (rhs))) != 0)
-- 
1.9.3



Implement C11 DR#423 resolution (ignore function return type qualifiers)

2016-05-13 Thread Joseph Myers
The resolution of C11 DR#423, apart from doing things with the types
of expressions cast to qualified types which are only in standard
terms observable with _Generic and which agree with how GCC has
implemented _Generic all along, also specifies that qualifiers are
discarded from function return types: "derived-declarator-type-list
function returning T" becomes "derived-declarator-type-list function
returning the unqualified version of T" in the rules giving types for
function declarators.  This means that declarations of a function with
both qualified and unqualified return types are now compatible,
similar to how different declarations can vary in whether a function
argument is declared with a qualifier or unqualified type.

This patch implements this resolution.  Since the motivation for the
change was _Generic, the resolution is restricted to C11 mode; there's
no reason to consider there to be a defect in this regard in older
standard versions.  Some less-obvious issues are handled as follows:

* As usual, and as with function arguments, _Atomic is not considered
  a qualifier for this purpose; that is, function declarations must
  agree regarding whether the return type is atomic.

* By 6.9.1#2, a function definition cannot return qualified void.  But
  with this change, specifying "const void" in the declaration
  produces the type "function returning void", which is perfectly
  valid, so "const void f (void) {}" is no longer an error.

* The application to restrict is less clear.  The way I am
  interpreting it in this patch is that "unqualified version of T" is
  not valid if T is not valid, as in the case where T is a
  restrict-qualified version of a type that cannot be restrict
  qualified (non-pointer, or pointer-to-function).  But it's possible
  to argue the other way from the wording.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/c:
2016-05-13  Joseph Myers  

* c-decl.c (grokdeclarator): For C11, discard qualifiers on
function return type.

gcc/testsuite:
2016-05-13  Joseph Myers  

* gcc.dg/qual-return-5.c, gcc.dg/qual-return-6.c: New tests.
* gcc.dg/call-diag-2.c, gcc.dg/qual-return-2.c ,
gcc.dg/qual-return-3.c, gcc.dg/qual-return-4.c: Use -std=gnu99.

Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c  (revision 236213)
+++ gcc/c/c-decl.c  (working copy)
@@ -6106,13 +6106,19 @@ grokdeclarator (const struct c_declarator *declara
   qualify the return type, not the function type.  */
if (type_quals)
  {
+   int quals_used = type_quals;
/* Type qualifiers on a function return type are
   normally permitted by the standard but have no
   effect, so give a warning at -Wreturn-type.
   Qualifiers on a void return type are banned on
   function definitions in ISO C; GCC used to used
-  them for noreturn functions.  */
-   if (VOID_TYPE_P (type) && really_funcdef)
+  them for noreturn functions.  The resolution of C11
+  DR#423 means qualifiers (other than _Atomic) are
+  actually removed from the return type when
+  determining the function type.  */
+   if (flag_isoc11)
+ quals_used &= TYPE_QUAL_ATOMIC;
+   if (quals_used && VOID_TYPE_P (type) && really_funcdef)
  pedwarn (loc, 0,
   "function definition has qualified void return 
type");
else
@@ -6119,7 +6125,16 @@ grokdeclarator (const struct c_declarator *declara
  warning_at (loc, OPT_Wignored_qualifiers,
   "type qualifiers ignored on function return type");
 
-   type = c_build_qualified_type (type, type_quals);
+   /* Ensure an error for restrict on invalid types; the
+  DR#423 resolution is not entirely clear about
+  this.  */
+   if (flag_isoc11
+   && (type_quals & TYPE_QUAL_RESTRICT)
+   && (!POINTER_TYPE_P (type)
+   || !C_TYPE_OBJECT_OR_INCOMPLETE_P (TREE_TYPE (type
+ error_at (loc, "invalid use of %");
+   if (quals_used)
+ type = c_build_qualified_type (type, quals_used);
  }
type_quals = TYPE_UNQUALIFIED;
 
Index: gcc/testsuite/gcc.dg/call-diag-2.c
===
--- gcc/testsuite/gcc.dg/call-diag-2.c  (revision 236213)
+++ gcc/testsuite/gcc.dg/call-diag-2.c  (working copy)
@@ -1,7 +1,7 @@
 /* Test diagnostics for calling function returning qualified void or
other incomplete type other than void.  PR 35210.  */
 /* { dg-do 

Re: [PATCH v3, rs6000] Add built-in support for new Power9 darn (deliver a random number) instruction

2016-05-13 Thread Segher Boessenkool
On Wed, May 11, 2016 at 04:05:46PM -0600, Kelvin Nilsen wrote:
> I have bootstrapped and tested this patch against the trunk and against
> the gcc-6-branch on both powerpc64le-unknown-linux-gnu and
> powerpc64-unknown-linux-gnu with no regressions.  Is this ok for trunk
> and for backporting to GCC 6 after a few days of burn-in time on the
> trunk?

> 2016-05-11  Kelvin Nilsen  
> 
>   * gcc.target/powerpc/darn-0.c: New test.
>   * gcc.target/powerpc/darn-1.c: New test.
>   * gcc.target/powerpc/darn-2.c: New test.

> 2016-05-11  Kelvin Nilsen  
> 
>   * config/rs6000/altivec.md (UNSPEC_DARN): New unspec constant.
>   (UNSPEC_DARN_32): New unspec constant.
>   (UNSPEC_DARN_RAW): New unspec constant.
>   (darn_32): New instruction.
>   (darn_raw): New instruction.
>   (darn): New instruction.
>   * config/rs6000/rs6000-builtin.def (RS6000_BUILTIN_0): Add
>   support and documentation for this macro.
>   (BU_P9_MISC_1): New macro definition.
>   (BU_P9_64BIT_MISC_0): New macro definition.
>   (BU_P9_MISC_0): New macro definition.
>   (darn_32): New builtin definition.
>   (darn_raw): New builtin definition.
>   (darn): New builtin definition.
>   * config/rs6000/rs6000.c: Add #define RS6000_BUILTIN_0 and #undef
>   RS6000_BUILTIN_0 directives to surround each occurrence of
>   #include "rs6000-builtin.def".
>   (rs6000_builtin_mask_calculate): Add in the RS6000_BTM_MODULO and
>   RS6000_BTM_64BIT flags to the returned mask, depending on
>   configuration.
>   (def_builtin): Correct an error in the assignments made to the
>   debugging variable attr_string.
>   (rs6000_expand_builtin): Add support for no-operand built-in
>   functions.
>   (builtin_function_type): Remove fatal_error assertion that is no
>   longer valid.
>   (rs6000_common_init_builtins): Add support for no-operand built-in
>   functions.
>   * config/rs6000/rs6000.h (RS6000_BTM_MODULO): New macro
>   definition.
>   (RS6000_BTM_PURE): Enhance comment to clarify intent of this flag
>   definition.
>   (RS6000_BTM_64BIT): New macro definition.
>   * doc/extend.texi: Document __builtin_darn (void),
>   __builtin_darn_raw (void), and __builtin_darn_32 (void) built-in
>   functions.

> +(define_insn "darn_32"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(unspec:SI [(const_int 0)] UNSPEC_DARN_32))]
> +  "TARGET_MODULO"
> +  {
> + return "darn %0,0";
> +  }
> +  [(set_attr "type" "integer")])

Don't indent the {}; but in simple cases like this, you don't need a C
block, just the string:

+(define_insn "darn_32"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(unspec:SI [(const_int 0)] UNSPEC_DARN_32))]
+  "TARGET_MODULO"
+  "darn %0,0"
+  [(set_attr "type" "integer")])

> +  if (rs6000_overloaded_builtin_p (d->code))
> + {
> +   if (! (type = opaque_ftype_opaque)) 
> + {
> +   opaque_ftype_opaque
> + = build_function_type_list (opaque_V4SI_type_node, NULL_TREE);
> +   type = opaque_ftype_opaque;
> + }

No space after !; no space at end of line.  It's easier to read if you put
the assignment as a separate statement before the if.  Or write it as

+ if (!opaque_ftype_opaque)
+   opaque_ftype_opaque
+ = build_function_type_list (opaque_V4SI_type_node, NULL_TREE);
+ type = opaque_ftype_opaque;

Okay with those changes, thanks!


Segher


Re: [PATCH][CilkPlus] Merge libcilkrts from upstream

2016-05-13 Thread Jeff Law

On 05/11/2016 09:59 AM, Ilya Verbin wrote:

On Wed, May 11, 2016 at 10:47:49 +0100, Ramana Radhakrishnan wrote:



I've looked at the generated code in more detail, and for armv6 this generates
mcr p15, 0, r0, c7, c10, 5
which is not what __cilkrts_fence uses currently (CP15DSB vs CP15DMB)


Wow I hadn't noticed that it was a DSB -  DSB is way too heavy weight. Userland 
shouldn't need to use this by default IMNSHO. It's needed if you are working on 
non-cacheable memory or performing cache maintenance operations but I can't 
imagine cilkplus wanting to do that !

http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf

It's almost like the default definitions need to be in terms of the atomic 
extensions rather than having these written in this form. Folks usually get 
this wrong !


Looking at arm/sync.md it seems that there is no way to generate CP15DSB.


No - there is no way of generating DSB,  DMB's should be sufficient for this 
purpose. Would anyone know what the semantics of __cilkrts_fence are that 
require this to be a DSB ?


The __cilkrts_fence semantics are identical to __sync_synchronize, so DMB looks OK.

Maybe we should just define:
  #define __cilkrts_fence() __sync_synchronize()
Certainly seems like indirecting through a compiler builtin rather than 
using an ASM would be advisable.


jeff



Go patch committed: Implement escape analysis discovery phase

2016-05-13 Thread Ian Lance Taylor
This patch by Chris Manghane implements the escape analysis discovery
phase in the Go frontend.  Escape analysis is still not turned on.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 235988)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-7f5a9fde801eb755a5252fd4ff588b0a47475bd3
+a87af72757d9a2e4479062a459a41d4540398005
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 235988)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -304,14 +304,193 @@ Gogo::analyze_escape()
 }
 }
 
+// Traverse the program, discovering the functions that are roots of strongly
+// connected components.  The goal of this phase is to produce a set of functions
+// that must be analyzed in order.
+
+class Escape_analysis_discover : public Traverse
+{
+ public:
+  Escape_analysis_discover(Gogo* gogo)
+: Traverse(traverse_functions),
+  gogo_(gogo), component_ids_()
+  { }
+
+  int
+  function(Named_object*);
+
+  int
+  visit(Named_object*);
+
+  int
+  visit_code(Named_object*, int);
+
+ private:
+  // A counter used to generate the ID for the function node in the graph.
+  static int id;
+
+  // Type used to map functions to an ID in a graph of connected components.
+  typedef Unordered_map(Named_object*, int) Component_ids;
+
+  // The Go IR.
+  Gogo* gogo_;
+  // The list of functions encountered during connected component discovery.
+  Component_ids component_ids_;
+  // The stack of functions that this component consists of.
+  std::stack<Named_object*> stack_;
+};
+
+int Escape_analysis_discover::id = 0;
+
+// Visit each function.
+
+int
+Escape_analysis_discover::function(Named_object* fn)
+{
+  this->visit(fn);
+  return TRAVERSE_CONTINUE;
+}
+
+// Visit a function FN, adding it to the current stack of functions
+// in this connected component.  If this is the root of the component,
+// create a set of functions to be analyzed later.
+//
+// Finding these sets is finding strongly connected components
+// in the static call graph.  The algorithm for doing that is taken
+// from Sedgewick, Algorithms, Second Edition, p. 482, with two
+// adaptations.
+//
+// First, a closure (fn->func_value()->enclosing() == NULL) cannot be the
+// root of a connected component.  Refusing to use it as a root
+// forces it into the component of the function in which it appears.
+// This is more convenient for escape analysis.
+//
+// Second, each function becomes two virtual nodes in the graph,
+// with numbers n and n+1. We record the function's node number as n
+// but search from node n+1. If the search tells us that the component
+// number (min) is n+1, we know that this is a trivial component: one function
+// plus its closures. If the search tells us that the component number is
+// n, then there was a path from node n+1 back to node n, meaning that
+// the function set is mutually recursive. The escape analysis can be
+// more precise when analyzing a single non-recursive function than
+// when analyzing a set of mutually recursive functions.
+
+int
+Escape_analysis_discover::visit(Named_object* fn)
+{
+  Component_ids::const_iterator p = this->component_ids_.find(fn);
+  if (p != this->component_ids_.end())
+// Already visited.
+return p->second;
+
+  this->id++;
+  int id = this->id;
+  this->component_ids_[fn] = id;
+  this->id++;
+  int min = this->id;
+
+  this->stack_.push(fn);
+  min = this->visit_code(fn, min);
+  if ((min == id || min == id + 1)
+  && fn->is_function()
+  && fn->func_value()->enclosing() == NULL)
+{
+  bool recursive = min == id;
+  std::vector<Named_object*> group;
+
+  for (; !this->stack_.empty(); this->stack_.pop())
+   {
+ Named_object* n = this->stack_.top();
+ if (n == fn)
+   {
+ this->stack_.pop();
+ break;
+   }
+
+ group.push_back(n);
+ this->component_ids_[n] = std::numeric_limits<int>::max();
+   }
+  group.push_back(fn);
+  this->component_ids_[fn] = std::numeric_limits<int>::max();
+
+  std::reverse(group.begin(), group.end());
+  this->gogo_->add_analysis_set(group, recursive);
+}
+
+  return min;
+}
+
+// Helper class for discovery step.  Traverse expressions looking for
+// function calls and closures to visit during the discovery step.
+
+class Escape_discover_expr : public Traverse
+{
+ public:
+  Escape_discover_expr(Escape_analysis_discover* ead, int min)
+: Traverse(traverse_expressions),
+  ead_(ead), min_(min)
+  { }
+
+  int
+  min()
+  { return this->min_; }
+
+  int
+  expression(Expression** pexpr);
+
+ private:
+  // The original discovery 

Re: Thoughts on memcmp expansion (PR43052)

2016-05-13 Thread Joseph Myers
On Fri, 13 May 2016, Bernd Schmidt wrote:

> Thanks. So, this would seem to suggest that BITS_PER_UNIT in memcmp/memcpy
> expansion is more accurate than a plain 8, although pedantically we might want
> to use CHAR_TYPE_SIZE? Should I adjust my patch to use the latter or leave
> these parts as-is?

I'd say use CHAR_TYPE_SIZE.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/2] jit: use FINAL and OVERRIDE throughout

2016-05-13 Thread David Malcolm
On Fri, 2016-05-06 at 12:40 -0400, David Malcolm wrote:
> Mark most virtual functions in gcc/jit as being FINAL OVERRIDE.
> gcc::jit::recording::lvalue::access_as_rvalue is the sole OVERRIDE
> that isn't a FINAL.
> 
> Successfully bootstrapped on x86_64-pc-linux-gnu.
> 
> I can self-approve this, but as asked in patch 1,
> does "final" imply "override"?  Is "final override" a tautology?

http://stackoverflow.com/questions/29412412/does-final-imply-override
says that "final override" is *not* tautologous.

I've committed this jit patch to trunk as r236223.

Dave


Re: [PATCH] Import config.sub and config.guess from upstream.

2016-05-13 Thread Mike Stump
On May 13, 2016, at 11:50 AM, Jakub Sejdak  wrote:
> 
> OK I understand. So am I right, that in such a case there is no way to
> introduce new OS targets to branch 4.9 and 5?

No.  You just hand edit in the bits you need for your port, and seek approval 
for that.

In general, all new work goes into trunk, first.  Once it is there, then you 
can back port to the first oldest release branch.  Once there, then you can 
back port to the next oldest and so on.  If it can't make it into a branch, it 
can't go onto the next oldest branch.

So, to be concrete, get 5 in first, then ask for 4.9.  Get 6 in first, then ask
for 5.  Get trunk in first, then ask for 6.

> What about 6 branch and trunk?

6 would have the same answer as 4.9 and 5.  For trunk, import from upstream is 
the usual way to pick up that work.

> On the other hand: this is my first patch and I'm not quite familiar
> with the procedure of applying patches to upstream. Who should upload
> my patch, when it gets accepted?

Anyone with write access.

Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-13 Thread Andrew Pinski
On Fri, May 13, 2016 at 12:58 PM, Richard Biener
 wrote:
> On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis 
>  wrote:
>>The cse_sincos pass tries to optimize sequences such as
>>
>>  sin (x);
>>  cos (x);
>>
>>into a single call to sincos, or cexpi, when available. However, the
>>nvptx target has sin and cos instructions, albeit with some loss of
>>precision (so it's only enabled with -ffast-math). This patch teaches
>>cse_sincos pass to ignore sin, cos and cexpi instructions when the
>>target can expand those calls. This yields a 6x speedup in 314.omriq
>>from spec accel when running on Nvidia accelerators.
>>
>>Is this OK for trunk?
>
> Isn't there an optab for sincos?

This is exactly what I was going to suggest.  This transformation
should be done in the back-end back to sin/cos instructions.

Thanks,
Andrew

> ISTR x87 handles this pass just fine and also can do sin and cos.
>
> Richard.
>
>>Cesar
>
>


Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-13 Thread Richard Biener
On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis 
 wrote:
>The cse_sincos pass tries to optimize sequences such as
>
>  sin (x);
>  cos (x);
>
>into a single call to sincos, or cexpi, when available. However, the
>nvptx target has sin and cos instructions, albeit with some loss of
>precision (so it's only enabled with -ffast-math). This patch teaches
>cse_sincos pass to ignore sin, cos and cexpi instructions when the
>target can expand those calls. This yields a 6x speedup in 314.omriq
>from spec accel when running on Nvidia accelerators.
>
>Is this OK for trunk?

Isn't there an optab for sincos?  ISTR x87 handles this pass just fine and also 
can do sin and cos.

Richard.

>Cesar




Re: [PATCH] Fix --enable-checking=fold bootstrap (PR bootstrap/71071)

2016-05-13 Thread Richard Biener
On May 13, 2016 7:22:37 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>Since a recent change, TYPE_ALIAS_SET of types can change during
>folding.
>This patch arranges for it not to be checksummed.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux (normal
>bootstrap)
>and built (nonbootstrap) + regtested the
>--enable-checking=yes,extra,fold,rtl version (previously it wouldn't
>build
>even libgcc).
>
>Ok for trunk?

OK.
Richard.

>2016-05-13  Jakub Jelinek  
>
>   PR bootstrap/71071
>   * fold-const.c (fold_checksum_tree): Allow modification
>   of TYPE_ALIAS_SET during folding.
>
>   * gcc.dg/pr71071.c: New test.
>
>--- gcc/fold-const.c.jj2016-05-13 13:58:12.381020172 +0200
>+++ gcc/fold-const.c   2016-05-13 14:22:51.724912335 +0200
>@@ -12130,7 +12130,8 @@ fold_checksum_tree (const_tree expr, str
>  || TYPE_REFERENCE_TO (expr)
>  || TYPE_CACHED_VALUES_P (expr)
>  || TYPE_CONTAINS_PLACEHOLDER_INTERNAL (expr)
>- || TYPE_NEXT_VARIANT (expr)))
>+ || TYPE_NEXT_VARIANT (expr)
>+ || TYPE_ALIAS_SET_KNOWN_P (expr)))
> {
>   /* Allow these fields to be modified.  */
>   tree tmp;
>@@ -12140,6 +12141,7 @@ fold_checksum_tree (const_tree expr, str
>   TYPE_POINTER_TO (tmp) = NULL;
>   TYPE_REFERENCE_TO (tmp) = NULL;
>   TYPE_NEXT_VARIANT (tmp) = NULL;
>+  TYPE_ALIAS_SET (tmp) = -1;
>   if (TYPE_CACHED_VALUES_P (tmp))
>   {
> TYPE_CACHED_VALUES_P (tmp) = 0;
>--- gcc/testsuite/gcc.dg/pr71071.c.jj  2016-05-13 14:23:39.528278177
>+0200
>+++ gcc/testsuite/gcc.dg/pr71071.c 2016-05-13 14:22:08.0 +0200
>@@ -0,0 +1,12 @@
>+/* PR bootstrap/71071 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2" } */
>+
>+struct S { unsigned b : 1; } a;
>+
>+void
>+foo ()
>+{
>+  if (a.b)
>+;
>+}
>
>   Jakub




inhibit the sincos optimization when the target has sin and cos instructions

2016-05-13 Thread Cesar Philippidis
The cse_sincos pass tries to optimize sequences such as

  sin (x);
  cos (x);

into a single call to sincos, or cexpi, when available. However, the
nvptx target has sin and cos instructions, albeit with some loss of
precision (so it's only enabled with -ffast-math). This patch teaches
cse_sincos pass to ignore sin, cos and cexpi instructions when the
target can expand those calls. This yields a 6x speedup in 314.omriq
from spec accel when running on Nvidia accelerators.

Is this OK for trunk?

Cesar
2016-05-13  Cesar Philippidis  

	gcc/
	* tree-ssa-math-opts.c (pass_cse_sincos::execute): Don't optimize
	sin and cos calls when the target has instructions for them.
	
	gcc/testsuite/
	* gcc.target/nvptx/sincos.c: New test.


diff --git a/gcc/testsuite/gcc.target/nvptx/sincos.c b/gcc/testsuite/gcc.target/nvptx/sincos.c
new file mode 100644
index 000..921ec41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sincos.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+
+extern float sinf (float);
+extern float cosf (float);
+
+float
+sincos_add (float x)
+{
+  float s = sinf (x);
+  float c = cosf (x);
+
+  return s + c;
+}
+
+/* { dg-final { scan-assembler-times "sin.approx.f32" 1 } } */
+/* { dg-final { scan-assembler-times "cos.approx.f32" 1 } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 81688cd..38051e1 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1806,6 +1806,11 @@ pass_cse_sincos::execute (function *fun)
 		CASE_CFN_COS:
 		CASE_CFN_SIN:
 		CASE_CFN_CEXPI:
+		  /* Don't modify these calls if they can be translated
+		 directly into hardware instructions.  */
+		  if (replacement_internal_fn (as_a <gcall *> (stmt))
+		  != IFN_LAST)
+		break;
 		  /* Make sure we have either sincos or cexp.  */
 		  if (!targetm.libc_has_function (function_c99_math_complex)
 		  && !targetm.libc_has_function (function_sincos))


Re: New C++ PATCH for c++/10200 et al

2016-05-13 Thread Jason Merrill

On 02/16/2016 07:49 PM, Jason Merrill wrote:

Clearly the DR 141 change is requiring much larger adjustments in the
rest of the compiler than I'm comfortable making at this point in the
GCC 6 schedule, so I'm backing out my earlier changes for 10200 and
69753 and replacing them with a more modest fix for 10200: Now we will
still find member function templates by unqualified lookup; we just
won't find namespace-scope function templates.  The earlier approach
will return in GCC 7 stage 1.


As promised.  The prerequisite for the DR 141 change was fixing the 
C++11 handling of type-dependence of member access expressions, 
including calls.  14.6.2.2 says,


A class member access expression (5.2.5) is type-dependent if the 
expression refers to a member of the current instantiation and the type 
of the referenced member is dependent, or the class member access 
expression refers to a member of an unknown specialization. [ Note: In 
an expression of the form x.y or xp->y the type of the expression is 
usually the type of the member y of the class of x (or the class pointed 
to by xp). However, if x or xp refers to a dependent type that is not 
the current instantiation, the type of y is always dependent. If x or xp 
refers to a non-dependent type or refers to the current instantiation, 
the type of y is the type of the class member access expression. —end note ]


Previously we had been treating such expressions as type-dependent if 
the object-expression is type-dependent, even if its type is the current 
instantiation.  Fixing this required a few changes in other areas that 
now have to deal with non-dependent member function calls within a template.
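
A minimal illustration of the 14.6.2.2 rule, with hypothetical types (not code from the patch): inside a template, a member access through `this` refers to the current instantiation, so when the member's type is non-dependent the expression is not type-dependent either.

```cpp
#include <cassert>

// Inside Outer<T>, `this` has the type of the current instantiation,
// and member() has a non-dependent type, so the member access in
// call() is not type-dependent even though Outer is a template.
template <typename T>
struct Outer
{
  int member() { return 42; }
  int call() { return this->member() + 1; }
};

inline int use_outer() { return Outer<int>().call(); }
```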


Tested x86_64-pc-linux-gnu, applying to trunk.

commit 92bfc5a7586b8d862951d2e1a94398a3ab19ef47
Author: Jason Merrill 
Date:   Fri May 13 11:36:35 2016 -0400

	Fix type-dependence and the current instantiation.

	PR c++/10200
	PR c++/69753
	* pt.c (tsubst_decl): Use uses_template_parms.
	(instantiate_template_1): Handle non-dependent calls in templates.
	(value_dependent_expression_p): Handle BASELINK, FUNCTION_DECL.
	(type_dependent_expression_p): Only consider innermost template	args.
	(dependent_template_arg_p): Check enclosing class of a template here.
	(dependent_template_p): Not here.
	(type_dependent_object_expression_p): New.
	* typeck.c (finish_class_member_access_expr): Use it.
	* parser.c (cp_parser_postfix_expression): Use it.
	(cp_parser_postfix_dot_deref_expression): Use it.  Use comptypes
	to detect the current instantiation.
	(cp_parser_lookup_name): Really implement DR 141.
	* search.c (lookup_field_r): Prefer a dependent using-declaration.
	(any_dependent_bases_p): Split out from...
	* name-lookup.c (do_class_using_decl): ...here.
	* call.c (build_new_method_call_1): Use it.
	* semantics.c (finish_call_expr): 'this' doesn't make a call dependent.
	* tree.c (non_static_member_function_p): Remove.
	* typeck2.c (build_x_arrow): Use dependent_scope_p.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index a49bbb5..0b59c40 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8407,6 +8407,9 @@ build_new_method_call_1 (tree instance, tree fns, vec **args,
 		 we know we really need it.  */
 		  cand->first_arg = instance;
 		}
+	  else if (any_dependent_bases_p ())
+		/* We can't tell until instantiation time whether we can use
+		   *this as the implicit object argument.  */;
 	  else
 		{
 		  if (complain & tf_error)
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 556c256..ad21cdf 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6125,6 +6125,7 @@ extern bool any_dependent_template_arguments_p  (const_tree);
 extern bool dependent_template_p		(tree);
 extern bool dependent_template_id_p		(tree, tree);
 extern bool type_dependent_expression_p		(tree);
+extern bool type_dependent_object_expression_p	(tree);
 extern bool any_type_dependent_arguments_p  (const vec *);
 extern bool any_type_dependent_elements_p   (const_tree);
 extern bool type_dependent_expression_p_push	(tree);
@@ -6233,6 +6234,7 @@ extern tree adjust_result_of_qualified_name_lookup
 extern tree copied_binfo			(tree, tree);
 extern tree original_binfo			(tree, tree);
 extern int shared_member_p			(tree);
+extern bool any_dependent_bases_p (tree = current_nonlambda_class_type ());
 
 /* The representation of a deferred access check.  */
 
@@ -6525,7 +6527,6 @@ extern tree get_first_fn			(tree);
 extern tree ovl_cons(tree, tree);
 extern tree build_overload			(tree, tree);
 extern tree ovl_scope(tree);
-extern bool non_static_member_function_p(tree);
 extern const char *cxx_printable_name		(tree, int);
 extern const char *cxx_printable_name_translate	(tree, int);
 extern tree build_exception_variant		(tree, tree);
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 86d260c..d32a153 100644
--- 

match.pd: ~X & Y to X ^ Y in some cases

2016-05-13 Thread Marc Glisse

Hello,

maybe this would fit better in VRP, but it is easier (and not completely 
useless) to put it in match.pd.


Since the transformation is restricted to GIMPLE, I think I don't need to 
check that @0 is an SSA_NAME. I didn't test if @0 has pointer type before 
calling get_range_info because we are doing bit_not on it, but it looks 
like I should because we can do bitops on pointers?
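
The identity behind the new rule is easy to sanity-check in plain code (function names are mine): when every set bit of X is also set in the mask Y, i.e. (X & ~Y) == 0, ~X & Y and X ^ Y select exactly the same bits.

```cpp
#include <cassert>

// Precondition for the match.pd rule: (x & ~m) == 0, i.e. the nonzero
// bits of x are a subset of m.  Under that precondition the two forms
// below are equal.
inline unsigned andnot_form(unsigned x, unsigned m) { return ~x & m; }
inline unsigned xor_form(unsigned x, unsigned m) { return x ^ m; }
```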


Adjustment for pr69270.c is exactly the same as in the previous patch from 
today :-)


Bootstrap+regtest on powerpc64le-unknown-linux-gnu.


2016-05-16  Marc Glisse  

gcc/
* match.pd (~X & Y): New transformation.

gcc/testsuite/
* gcc.dg/tree-ssa/pr69270.c: Adjust.
* gcc.dg/tree-ssa/andnot-1.c: New testcase.


--
Marc Glisse

Index: gcc/match.pd
===
--- gcc/match.pd(revision 236194)
+++ gcc/match.pd(working copy)
@@ -496,20 +496,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (minus @1 (bit_xor @0 @1)))
 
 /* Simplify (X & ~Y) | (~X & Y) -> X ^ Y.  */
 (simplify
  (bit_ior (bit_and:c @0 (bit_not @1)) (bit_and:c (bit_not @0) @1))
   (bit_xor @0 @1))
 (simplify
  (bit_ior:c (bit_and @0 INTEGER_CST@2) (bit_and (bit_not @0) INTEGER_CST@1))
  (if (wi::bit_not (@2) == @1)
   (bit_xor @0 @1)))
+/* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0.  */
+#if GIMPLE
+(simplify
+ (bit_and (bit_not @0) INTEGER_CST@1)
+ (if ((get_nonzero_bits (@0) & wi::bit_not (@1)) == 0)
+  (bit_xor @0 @1)))
+#endif
 
 /* X % Y is smaller than Y.  */
 (for cmp (lt ge)
  (simplify
   (cmp (trunc_mod @0 @1) @1)
   (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
{ constant_boolean_node (cmp == LT_EXPR, type); })))
 (for cmp (gt le)
  (simplify
   (cmp @1 (trunc_mod @0 @1))
Index: gcc/testsuite/gcc.dg/tree-ssa/andnot-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/andnot-1.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/andnot-1.c(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized-raw" } */
+
+unsigned f(unsigned i){
+  i >>= __SIZEOF_INT__ * __CHAR_BIT__ - 3;
+  i = ~i;
+  return i & 7;
+}
+
+/* { dg-final { scan-tree-dump "bit_xor_expr" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "bit_not_expr" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "bit_and_expr" "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr69270.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr69270.c (revision 236194)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr69270.c (working copy)
@@ -1,21 +1,19 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fsplit-paths -fdump-tree-dom3-details" } */
 
 /* There should be two references to bufferstep that turn into
constants.  */
 /* { dg-final { scan-tree-dump-times "Replaced .bufferstep_\[0-9\]+. with 
constant .0." 1 "dom3"} } */
 /* { dg-final { scan-tree-dump-times "Replaced .bufferstep_\[0-9\]+. with 
constant .1." 1 "dom3"} } */
 
 /* And some assignments ought to fold down to constants.  */
-/* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = -1;" 1 "dom3"} } 
*/
-/* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = -2;" 1 "dom3"} } 
*/
 /* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = 1;" 1 "dom3"} } */
 /* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = 0;" 1 "dom3"} } */
 
 /* The XOR operations should have been optimized to constants.  */
 /* { dg-final { scan-tree-dump-not "bit_xor" "dom3"} } */
 
 
 extern int *stepsizeTable;
 
 void


Re: [PATCH] Import config.sub and config.guess from upstream.

2016-05-13 Thread Jakub Sejdak
OK, I understand. So am I right that, in such a case, there is no way to
introduce new OS targets on the 4.9 and 5 branches?
What about the 6 branch and trunk?

On the other hand, this is my first patch and I'm not quite familiar
with the procedure for applying patches upstream. Who should commit
my patch once it gets accepted?

Thank you,
Jakub

2016-05-13 20:03 GMT+02:00 Joseph Myers :
> On Fri, 13 May 2016, Jakub Sejdak wrote:
>
>> Is it OK for trunk, gcc-4.9, gcc-5 and gcc-6 branches?
>
> It's not appropriate to update these scripts from upstream on release
> branches.  For example, config.guess changed a while back to output
> x86_64-pc-linux-gnu in place of x86_64-unknown-linux-gnu, and clearly we
> don't want to make such a change on 4.9 and 5 branches, which have
> versions predating that change.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com


VRP: range info of new variables

2016-05-13 Thread Marc Glisse

Hello,

when VRP does some transforms, it may create new SSA_NAMEs, but doesn't 
give them range information. This can prevent cascading transformations in 
a single VRP pass. With this patch, I assign range information to the 
variable introduced by one transformation, and in another transformation, 
I get range information through get_range_info instead of get_value_range 
in order to have access to the new information.


Some notes:
- get_range_info only applies to integers, not pointers. I hope we are not 
losing much by restricting this transformation. I could also call 
get_value_range and only fall back to get_range_info if that failed (and 
we don't have a pointer), but it doesn't seem worth it.
- Now that I think of it, maybe I should check that the variable is not a 
pointer before calling set_range_info? Having range [0, 1] makes it 
unlikely, but who knows...

- wide_int is much more complicated to use than I expected :-(
- the foldings that disappear in pr69270.c are for dead variables that are 
now eliminated earlier.
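
The A != B to A ^ B rewrite that benefits from the new [0, 1] range info can be modelled in scalar code (names are mine, not from the patch): for operands known to lie in [0, 1], inequality and XOR coincide, and recording that the XOR result also lies in [0, 1] is what lets later transformations in the same pass keep folding.

```cpp
#include <cassert>

// For a, b known (via range info) to be in [0, 1], a != b == a ^ b.
inline unsigned ne_as_xor(unsigned a, unsigned b) { return a ^ b; }
```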


Bootstrap+regtest on powerpc64le-unknown-linux-gnu, I also checked 
manually that an earlier version of the patch fixes vrp47.c on 
x86_64-pc-linux-gnu.


2016-05-16  Marc Glisse  

gcc/
* tree-vrp.c (simplify_truth_ops_using_ranges): Set range
information for new SSA_NAME.
(simplify_switch_using_ranges): Get range through get_range_info
instead of get_value_range.

gcc/testsuite/
* gcc.dg/tree-ssa/pr69270.c: Adjust.
* gcc.dg/tree-ssa/vrp99.c: New testcase.

--
Marc Glisse

Index: gcc/testsuite/gcc.dg/tree-ssa/pr69270.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr69270.c (revision 236194)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr69270.c (working copy)
@@ -1,21 +1,19 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fsplit-paths -fdump-tree-dom3-details" } */
 
 /* There should be two references to bufferstep that turn into
constants.  */
 /* { dg-final { scan-tree-dump-times "Replaced .bufferstep_\[0-9\]+. with 
constant .0." 1 "dom3"} } */
 /* { dg-final { scan-tree-dump-times "Replaced .bufferstep_\[0-9\]+. with 
constant .1." 1 "dom3"} } */
 
 /* And some assignments ought to fold down to constants.  */
-/* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = -1;" 1 "dom3"} } 
*/
-/* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = -2;" 1 "dom3"} } 
*/
 /* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = 1;" 1 "dom3"} } */
 /* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = 0;" 1 "dom3"} } */
 
 /* The XOR operations should have been optimized to constants.  */
 /* { dg-final { scan-tree-dump-not "bit_xor" "dom3"} } */
 
 
 extern int *stepsizeTable;
 
 void
Index: gcc/testsuite/gcc.dg/tree-ssa/vrp99.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/vrp99.c   (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/vrp99.c   (working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+unsigned f(unsigned i){
+  i >>= __SIZEOF_INT__ * __CHAR_BIT__ - 1;
+  return i == 0;
+}
+
+/* { dg-final { scan-tree-dump-not "\\(unsigned int\\)" "vrp1" } } */
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 236194)
+++ gcc/tree-vrp.c  (working copy)
@@ -8933,20 +8933,24 @@ simplify_truth_ops_using_ranges (gimple_
 gimple_assign_set_rhs_with_ops (gsi,
need_conversion
? NOP_EXPR : TREE_CODE (op0), op0);
   /* For A != B we substitute A ^ B.  Either with conversion.  */
   else if (need_conversion)
 {
   tree tem = make_ssa_name (TREE_TYPE (op0));
   gassign *newop
= gimple_build_assign (tem, BIT_XOR_EXPR, op0, op1);
   gsi_insert_before (gsi, newop, GSI_SAME_STMT);
+  if (TYPE_PRECISION (TREE_TYPE (tem)) > 1)
+   set_range_info (tem, VR_RANGE,
+   wi::zero (TYPE_PRECISION (TREE_TYPE (tem))),
+   wi::one (TYPE_PRECISION (TREE_TYPE (tem;
   gimple_assign_set_rhs_with_ops (gsi, NOP_EXPR, tem);
 }
   /* Or without.  */
   else
 gimple_assign_set_rhs_with_ops (gsi, BIT_XOR_EXPR, op0, op1);
   update_stmt (gsi_stmt (*gsi));
 
   return true;
 }
 
@@ -9641,50 +9645,48 @@ simplify_switch_using_ranges (gswitch *s
   return false;
 }
 
 /* Simplify an integral conversion from an SSA name in STMT.  */
 
 static bool
 simplify_conversion_using_ranges (gimple *stmt)
 {
   tree innerop, middleop, finaltype;
   gimple *def_stmt;
-  value_range *innervr;
   signop inner_sgn, middle_sgn, final_sgn;
   unsigned inner_prec, middle_prec, final_prec;
   widest_int innermin, innermed, innermax, middlemin, middlemed, middlemax;
 
   finaltype = TREE_TYPE (gimple_assign_lhs (stmt));
   if (!INTEGRAL_TYPE_P 

Re: [PATCH] Import config.sub and config.guess from upstream.

2016-05-13 Thread Joseph Myers
On Fri, 13 May 2016, Jakub Sejdak wrote:

> Is it OK for trunk, gcc-4.9, gcc-5 and gcc-6 branches?

It's not appropriate to update these scripts from upstream on release 
branches.  For example, config.guess changed a while back to output 
x86_64-pc-linux-gnu in place of x86_64-unknown-linux-gnu, and clearly we 
don't want to make such a change on 4.9 and 5 branches, which have 
versions predating that change.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, PR71084] Further improve of CFG change tracking in CSE

2016-05-13 Thread Bernhard Reutner-Fischer
On May 13, 2016 11:50:33 AM GMT+02:00, Richard Biener 
 wrote:
>On Fri, May 13, 2016 at 11:03 AM, Ilya Enkovich
> wrote:
>> Hi,
>>
>> This patch improves cse_cfg_altered computation by taking into
>account
>> cleanup_cfg returned values.  This resolves another case of
>invalidated
>> dominance info.
>>
>> Bootstrapped and regtested on x86_64-unknown-linux-gnu.  OK for
>turnk?
>
>Ok.

What about making cleanup_cfg __wur (i.e. warn_unused_result) at least when 
checking?

thanks,



[PATCH, testsuite]: Handle AVX in gcc.dg/vect/tree-vect.h

2016-05-13 Thread Uros Bizjak
Also remove unneeded XOP handling.

2016-05-13  Uros Bizjak  

* gcc.dg/vect/tree-vect.h (check_vect): Handle AVX2,
remove XOP handling.

Tested on x86_64-linux-gnu AVX target and committed to mainline SVN.

Patch will be backported to other release branches.

Uros.
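
The selection logic the patch installs can be sketched outside the testsuite harness. The bit positions below are assumptions based on the usual <cpuid.h> definitions (SSE2: leaf 1 EDX bit 26, AVX: leaf 1 ECX bit 28, AVX2: leaf 7 EBX bit 5); verify against your toolchain's cpuid.h before relying on them.

```cpp
#include <cassert>

// Assumed CPUID feature-bit positions (see lead-in).
const unsigned bit_SSE2_d = 1u << 26;
const unsigned bit_AVX_c  = 1u << 28;
const unsigned bit_AVX2_b = 1u << 5;

// Mirrors the patched check_vect test: pass when any wanted bit is
// present in the corresponding cpuid result word.
inline bool vect_ok(unsigned b, unsigned c, unsigned d,
                    unsigned want_b, unsigned want_c, unsigned want_d)
{
  return ((b & want_b) | (c & want_c) | (d & want_d)) != 0;
}
```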
Index: gcc.dg/vect/tree-vect.h
===
--- gcc.dg/vect/tree-vect.h (revision 236210)
+++ gcc.dg/vect/tree-vect.h (working copy)
@@ -32,25 +32,26 @@ check_vect (void)
   asm volatile (".long 0x1484");
 #elif defined(__i386__) || defined(__x86_64__)
   {
-unsigned int a, b, c, d, want_level, want_c, want_d;
+unsigned int a, b, c, d,
+  want_level, want_b = 0, want_c = 0, want_d = 0;
 
 /* Determine what instruction set we've been compiled for, and detect
that we're running with it.  This allows us to at least do a compile
check for, e.g. SSE4.1 when the machine only supports SSE2.  */
-# ifdef __XOP__
-want_level = 0x8001, want_c = bit_XOP, want_d = 0;
+#if defined(__AVX2__)
+want_level = 7, want_b = bit_AVX2;
 # elif defined(__AVX__)
-want_level = 1, want_c = bit_AVX, want_d = 0;
+want_level = 1, want_c = bit_AVX;
 # elif defined(__SSE4_1__)
-want_level = 1, want_c = bit_SSE4_1, want_d = 0;
+want_level = 1, want_c = bit_SSE4_1;
 # elif defined(__SSSE3__)
-want_level = 1, want_c = bit_SSSE3, want_d = 0;
+want_level = 1, want_c = bit_SSSE3;
 # else
-want_level = 1, want_c = 0, want_d = bit_SSE2;
+want_level = 1, want_d = bit_SSE2;
 # endif
 
 if (!__get_cpuid (want_level, , , , )
-   || ((c & want_c) | (d & want_d)) == 0)
+   || ((b & want_b) | (c & want_c) | (d & want_d)) == 0)
   exit (0);
   }
 #elif defined(__sparc__)


Re: [PATCH] Use HOST_WIDE_INT_C some more in the i386 backend

2016-05-13 Thread Uros Bizjak
On Fri, May 13, 2016 at 7:11 PM, Jakub Jelinek  wrote:
> Hi!
>
> I found a couple of spots where we can use the HOST_WIDE_INT_C macro.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-13  Jakub Jelinek  
>
> * config/i386/i386.c (ix86_compute_frame_layout, ix86_expand_prologue,
> ix86_expand_split_stack_prologue): Use HOST_WIDE_INT_C macro.
> (ix86_split_to_parts): Likewise.  Fix up formatting.

OK.

Thanks,
Uros.
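
Why the macro form is preferable can be shown with a simplified stand-in (MY_WIDE_INT_C is my name, not GCC's; it mimics a macro that pastes a width suffix onto the literal). A suffixed literal has the full width from the start, so shifts like `MY_WIDE_INT_C(1) << 40` are computed at full width instead of overflowing a narrower default type before any cast applies.

```cpp
#include <cassert>

// Simplified stand-in for a suffix-pasting wide-int macro, assuming
// the wide type is long long.
#define MY_WIDE_INT_C(x) x##LL
```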

> --- gcc/config/i386/i386.c.jj   2016-05-12 09:46:40.0 +0200
> +++ gcc/config/i386/i386.c  2016-05-12 10:46:35.862566575 +0200
> @@ -11957,7 +11957,7 @@ ix86_compute_frame_layout (struct ix86_f
>to_allocate = offset - frame->sse_reg_save_offset;
>
>if ((!to_allocate && frame->nregs <= 1)
> -  || (TARGET_64BIT && to_allocate >= (HOST_WIDE_INT) 0x8000))
> +  || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000)))
>  frame->save_regs_using_mov = false;
>
>if (ix86_using_red_zone ()
> @@ -13379,7 +13379,7 @@ ix86_expand_prologue (void)
> {
>   HOST_WIDE_INT size = allocate;
>
> - if (TARGET_64BIT && size >= (HOST_WIDE_INT) 0x8000)
> + if (TARGET_64BIT && size >= HOST_WIDE_INT_C (0x8000))
> size = 0x8000 - STACK_CHECK_PROTECT - 1;
>
>   if (TARGET_STACK_PROBE)
> @@ -14320,7 +14320,7 @@ ix86_expand_split_stack_prologue (void)
>  different function: __morestack_large.  We pass the
>  argument size in the upper 32 bits of r10 and pass the
>  frame size in the lower 32 bits.  */
> - gcc_assert ((allocate & (HOST_WIDE_INT) 0x) == allocate);
> + gcc_assert ((allocate & HOST_WIDE_INT_C (0x)) == allocate);
>   gcc_assert ((args_size & 0x) == args_size);
>
>   if (split_stack_fn_large == NULL_RTX)
> @@ -24554,20 +24554,17 @@ ix86_split_to_parts (rtx operand, rtx *p
>   real_to_target (l, CONST_DOUBLE_REAL_VALUE (operand), mode);
>
>   /* real_to_target puts 32-bit pieces in each long.  */
> - parts[0] =
> -   gen_int_mode
> - ((l[0] & (HOST_WIDE_INT) 0x)
> -  | ((l[1] & (HOST_WIDE_INT) 0x) << 32),
> -  DImode);
> + parts[0] = gen_int_mode ((l[0] & HOST_WIDE_INT_C (0x))
> +  | ((l[1] & HOST_WIDE_INT_C 
> (0x))
> + << 32), DImode);
>
>   if (upper_mode == SImode)
> parts[1] = gen_int_mode (l[2], SImode);
>   else
> -   parts[1] =
> - gen_int_mode
> -   ((l[2] & (HOST_WIDE_INT) 0x)
> -| ((l[3] & (HOST_WIDE_INT) 0x) << 32),
> -DImode);
> +   parts[1]
> + = gen_int_mode ((l[2] & HOST_WIDE_INT_C (0x))
> + | ((l[3] & HOST_WIDE_INT_C (0x))
> +<< 32), DImode);
> }
>   else
> gcc_unreachable ();
>
> Jakub


Re: [PATCH, i386]: Additional fix for PR61599 with -mcmodel=medium -fpic

2016-05-13 Thread Uros Bizjak
On Fri, May 13, 2016 at 7:20 PM, H.J. Lu  wrote:
> On Fri, May 13, 2016 at 9:51 AM, Uros Bizjak  wrote:
>> On Fri, May 13, 2016 at 9:07 AM, Uros Bizjak  wrote:
>>> On Fri, May 13, 2016 at 1:20 AM, H.J. Lu  wrote:
>>>
>> testsuite/gcc.target/i386/pr61599-{1,2}.c testcases expose a failure
>> with -mcmodel=medium -fpic, where:
>>
>> /tmp/ccfpoxHY.o: In function `bar':
>> pr61599-2.c:(.text+0xe): relocation truncated to fit: R_X86_64_PC32
>> against symbol `a' defined in LARGE_COMMON section in /tmp/ccKTKST2.o
>> collect2: error: ld returned 1 exit status
>> compiler exited with status 1
>>
>> CM_MEDIUM_PIC code model assumes that code+got/plt fits in a 31 bit
>> region, data is unlimited. Based on these assumptions, code should be
>> accessed via R_X86_64_GOT64.
>>
>> Attached patch uses UNSPEC_GOT instead of UNSPEC_GOTPCREL also for
>> CM_MEDIUM_PIC.
>>
>> 2016-05-12  Uros Bizjak  
>>
>> PR target/61599
>> * config/i386/i386.c (legitimize_pic_address): Do not use
>> UNSPEC_GOTPCREL for CM_MEDIUM_PIC code model.
>>
>> Patch was bootstrapped on x86_64-linux-gnu and regression tested with
>> -mcmodel=medium -fpic.
>>
>> Jakub, H.J., do you have any comments on the patch?
>
>
> I prefer this patch.
>>
>>> Yes, your patch is more precise.
>>
>> OTOH, are we sure there is no linker bug here? We have:
>>
>> $ objdump -dr pr61599-1a.o
>>
>> pr61599-1a.o: file format elf64-x86-64
>>
>>
>> Disassembly of section .text:
>>
>>  :
>>0:   55  push   %rbp
>>1:   48 89 e5mov%rsp,%rbp
>>4:   48 8d 05 00 00 00 00lea0x0(%rip),%rax# b 
>> 7: R_X86_64_GOTPC32 _GLOBAL_OFFSET_TABLE_-0x4
>>b:   b8 00 00 00 00  mov$0x0,%eax
>>   10:   e8 00 00 00 00  callq  15 
>> 11: R_X86_64_PLT32  bar-0x4
>>   15:   89 c2   mov%eax,%edx
>>   17:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 1e 
>> 1a: R_X86_64_GOTPCREL   c-0x4
>>   1e:   0f b6 80 e1 00 00 00movzbl 0xe1(%rax),%eax
>>   25:   0f be c0movsbl %al,%eax
>>   28:   01 d0   add%edx,%eax
>>   2a:   5d  pop%rbp
>>   2b:   c3  retq
>>
>> $ objdump -dr pr61599-2a.o
>>
>> pr61599-2a.o: file format elf64-x86-64
>>
>>
>> Disassembly of section .text:
>>
>>  :
>>0:   55  push   %rbp
>>1:   48 89 e5mov%rsp,%rbp
>>4:   48 8d 05 00 00 00 00lea0x0(%rip),%rax# b 
>> 7: R_X86_64_GOTPC32 _GLOBAL_OFFSET_TABLE_-0x4
>>b:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 12 
>> e: R_X86_64_GOTPCRELa-0x4
>>   12:   0f b6 40 02 movzbl 0x2(%rax),%eax
>>   16:   0f be d0movsbl %al,%edx
>>   19:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 20 
>> 1c: R_X86_64_GOTPCREL   b-0x4
>>   20:   0f b6 40 10 movzbl 0x10(%rax),%eax
>>   24:   0f be c0movsbl %al,%eax
>>   27:   01 c2   add%eax,%edx
>>   29:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 30 
>> 2c: R_X86_64_GOTPCREL   c-0x4
>>   30:   0f b6 80 00 01 00 00movzbl 0x100(%rax),%eax
>>   37:   0f be c0movsbl %al,%eax
>>   3a:   01 d0   add%edx,%eax
>>   3c:   5d  pop%rbp
>>   3d:   c3  retq
>>
>> $ gcc pr61599-2a.o pr61599-1a.o
>> pr61599-2a.o: In function `bar':
>> pr61599-2a.c:(.text+0xe): relocation truncated to fit: R_X86_64_PC32
>> against symbol `a' defined in LARGE_COMMON section in pr61599-1a.o
>> collect2: error: ld returned 1 exit status
>>
>> There is no R_X86_64_PC32 in pr61599-2a.o, only X86_64_GOTPCREL, which
>> according to assumptions, would reach all entries in the GOT table.
>
> I opened:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=20093

I  also reported

https://sourceware.org/bugzilla/show_bug.cgi?id=20092

Thanks,
Uros.


[PATCH] Fix --enable-checking=fold bootstrap (PR bootstrap/71071)

2016-05-13 Thread Jakub Jelinek
Hi!

Since a recent change, TYPE_ALIAS_SET of types can change during folding.
This patch arranges for it not to be checksummed.

Bootstrapped/regtested on x86_64-linux and i686-linux (normal bootstrap)
and built (nonbootstrap) + regtested the
--enable-checking=yes,extra,fold,rtl version (previously it wouldn't build
even libgcc).
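
The checksum technique can be sketched with simplified stand-in types (nothing here is GCC's real tree representation): hash only the fields folding must not change, and skip lazily computed caches such as the alias set, so that filling such a cache during folding doesn't trip the checker.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a tree type node with one lazily computed cache field.
struct TypeNode
{
  int code;
  int precision;
  int alias_set;  // cache; may legitimately change during folding
};

inline std::size_t checksum(const TypeNode& t)
{
  // alias_set deliberately excluded, mirroring the TYPE_ALIAS_SET fix.
  return static_cast<std::size_t>(t.code) * 31u + t.precision;
}

inline bool alias_set_change_invisible()
{
  TypeNode t{1, 32, -1};
  std::size_t before = checksum(t);
  t.alias_set = 5;  // computed during "folding"
  return checksum(t) == before;
}
```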

Ok for trunk?

2016-05-13  Jakub Jelinek  

PR bootstrap/71071
* fold-const.c (fold_checksum_tree): Allow modification
of TYPE_ALIAS_SET during folding.

* gcc.dg/pr71071.c: New test.

--- gcc/fold-const.c.jj 2016-05-13 13:58:12.381020172 +0200
+++ gcc/fold-const.c2016-05-13 14:22:51.724912335 +0200
@@ -12130,7 +12130,8 @@ fold_checksum_tree (const_tree expr, str
   || TYPE_REFERENCE_TO (expr)
   || TYPE_CACHED_VALUES_P (expr)
   || TYPE_CONTAINS_PLACEHOLDER_INTERNAL (expr)
-  || TYPE_NEXT_VARIANT (expr)))
+  || TYPE_NEXT_VARIANT (expr)
+  || TYPE_ALIAS_SET_KNOWN_P (expr)))
 {
   /* Allow these fields to be modified.  */
   tree tmp;
@@ -12140,6 +12141,7 @@ fold_checksum_tree (const_tree expr, str
   TYPE_POINTER_TO (tmp) = NULL;
   TYPE_REFERENCE_TO (tmp) = NULL;
   TYPE_NEXT_VARIANT (tmp) = NULL;
+  TYPE_ALIAS_SET (tmp) = -1;
   if (TYPE_CACHED_VALUES_P (tmp))
{
  TYPE_CACHED_VALUES_P (tmp) = 0;
--- gcc/testsuite/gcc.dg/pr71071.c.jj   2016-05-13 14:23:39.528278177 +0200
+++ gcc/testsuite/gcc.dg/pr71071.c  2016-05-13 14:22:08.0 +0200
@@ -0,0 +1,12 @@
+/* PR bootstrap/71071 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct S { unsigned b : 1; } a;
+
+void
+foo ()
+{
+  if (a.b)
+;
+}

Jakub


Re: [PATCH, i386]: Additional fix for PR61599 with -mcmodel=medium -fpic

2016-05-13 Thread H.J. Lu
On Fri, May 13, 2016 at 9:51 AM, Uros Bizjak  wrote:
> On Fri, May 13, 2016 at 9:07 AM, Uros Bizjak  wrote:
>> On Fri, May 13, 2016 at 1:20 AM, H.J. Lu  wrote:
>>
> testsuite/gcc.target/i386/pr61599-{1,2}.c testcases expose a failure
> with -mcmodel=medium -fpic, where:
>
> /tmp/ccfpoxHY.o: In function `bar':
> pr61599-2.c:(.text+0xe): relocation truncated to fit: R_X86_64_PC32
> against symbol `a' defined in LARGE_COMMON section in /tmp/ccKTKST2.o
> collect2: error: ld returned 1 exit status
> compiler exited with status 1
>
> CM_MEDIUM_PIC code model assumes that code+got/plt fits in a 31 bit
> region, data is unlimited. Based on these assumptions, code should be
> accessed via R_X86_64_GOT64.
>
> Attached patch uses UNSPEC_GOT instead of UNSPEC_GOTPCREL also for
> CM_MEDIUM_PIC.
>
> 2016-05-12  Uros Bizjak  
>
> PR target/61599
> * config/i386/i386.c (legitimize_pic_address): Do not use
> UNSPEC_GOTPCREL for CM_MEDIUM_PIC code model.
>
> Patch was bootstrapped on x86_64-linux-gnu and regression tested with
> -mcmodel=medium -fpic.
>
> Jakub, H.J., do you have any comments on the patch?


 I prefer this patch.
>
>> Yes, your patch is more precise.
>
> OTOH, are we sure there is no linker bug here? We have:
>
> $ objdump -dr pr61599-1a.o
>
> pr61599-1a.o: file format elf64-x86-64
>
>
> Disassembly of section .text:
>
>  :
>0:   55  push   %rbp
>1:   48 89 e5mov%rsp,%rbp
>4:   48 8d 05 00 00 00 00lea0x0(%rip),%rax# b 
> 7: R_X86_64_GOTPC32 _GLOBAL_OFFSET_TABLE_-0x4
>b:   b8 00 00 00 00  mov$0x0,%eax
>   10:   e8 00 00 00 00  callq  15 
> 11: R_X86_64_PLT32  bar-0x4
>   15:   89 c2   mov%eax,%edx
>   17:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 1e 
> 1a: R_X86_64_GOTPCREL   c-0x4
>   1e:   0f b6 80 e1 00 00 00movzbl 0xe1(%rax),%eax
>   25:   0f be c0movsbl %al,%eax
>   28:   01 d0   add%edx,%eax
>   2a:   5d  pop%rbp
>   2b:   c3  retq
>
> $ objdump -dr pr61599-2a.o
>
> pr61599-2a.o: file format elf64-x86-64
>
>
> Disassembly of section .text:
>
>  :
>0:   55  push   %rbp
>1:   48 89 e5mov%rsp,%rbp
>4:   48 8d 05 00 00 00 00lea0x0(%rip),%rax# b 
> 7: R_X86_64_GOTPC32 _GLOBAL_OFFSET_TABLE_-0x4
>b:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 12 
> e: R_X86_64_GOTPCRELa-0x4
>   12:   0f b6 40 02 movzbl 0x2(%rax),%eax
>   16:   0f be d0movsbl %al,%edx
>   19:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 20 
> 1c: R_X86_64_GOTPCREL   b-0x4
>   20:   0f b6 40 10 movzbl 0x10(%rax),%eax
>   24:   0f be c0movsbl %al,%eax
>   27:   01 c2   add%eax,%edx
>   29:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 30 
> 2c: R_X86_64_GOTPCREL   c-0x4
>   30:   0f b6 80 00 01 00 00movzbl 0x100(%rax),%eax
>   37:   0f be c0movsbl %al,%eax
>   3a:   01 d0   add%edx,%eax
>   3c:   5d  pop%rbp
>   3d:   c3  retq
>
> $ gcc pr61599-2a.o pr61599-1a.o
> pr61599-2a.o: In function `bar':
> pr61599-2a.c:(.text+0xe): relocation truncated to fit: R_X86_64_PC32
> against symbol `a' defined in LARGE_COMMON section in pr61599-1a.o
> collect2: error: ld returned 1 exit status
>
> There is no R_X86_64_PC32 in pr61599-2a.o, only X86_64_GOTPCREL, which
> according to assumptions, would reach all entries in the GOT table.

I opened:

https://sourceware.org/bugzilla/show_bug.cgi?id=20093

-- 
H.J.


[PATCH] Allow XMM16-XMM31 in vpbroadcast*

2016-05-13 Thread Jakub Jelinek
Hi!

These insns are either AVX512VL or AVX512VL & AVX512BW; this patch allows
using XMM16+ where possible.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-13  Jakub Jelinek  

* config/i386/sse.md (pbroadcast_evex_isa): New mode attr.
(avx2_pbroadcast): Add another alternative with v instead
of x constraints in it, using  isa.
(avx2_pbroadcast_1): Similarly, add two such alternatives.

* gcc.target/i386/avx512bw-vpbroadcast-1.c: New test.
* gcc.target/i386/avx512bw-vpbroadcast-2.c: New test.
* gcc.target/i386/avx512bw-vpbroadcast-3.c: New test.
* gcc.target/i386/avx512vl-vpbroadcast-1.c: New test.
* gcc.target/i386/avx512vl-vpbroadcast-2.c: New test.
* gcc.target/i386/avx512vl-vpbroadcast-3.c: New test.
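
The semantics of the broadcast insns being extended can be modelled in scalar code (a sketch of the operation, not the backend's implementation): element 0 of the source is replicated across every lane of the destination.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of vpbroadcastd: replicate element 0 across all lanes.
template <std::size_t N>
std::array<uint32_t, N> pbroadcast(const std::array<uint32_t, N>& src)
{
  std::array<uint32_t, N> dst{};
  for (std::size_t i = 0; i < N; ++i)
    dst[i] = src[0];
  return dst;
}
```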

--- gcc/config/i386/sse.md.jj   2016-05-13 16:12:24.631965207 +0200
+++ gcc/config/i386/sse.md  2016-05-13 17:33:32.429909899 +0200
@@ -16725,30 +16725,40 @@ (define_insn "avx_vzeroupper"
(set_attr "btver2_decode" "vector")
(set_attr "mode" "OI")])
 
+(define_mode_attr pbroadcast_evex_isa
+  [(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw")
+   (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw")
+   (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f")
+   (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")])
+
 (define_insn "avx2_pbroadcast"
-  [(set (match_operand:VI 0 "register_operand" "=x")
+  [(set (match_operand:VI 0 "register_operand" "=x,v")
(vec_duplicate:VI
  (vec_select:
-   (match_operand: 1 "nonimmediate_operand" "xm")
+   (match_operand: 1 "nonimmediate_operand" "xm,vm")
(parallel [(const_int 0)]]
   "TARGET_AVX2"
   "vpbroadcast\t{%1, %0|%0, %1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "*,")
+   (set_attr "type" "ssemov")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "vex,evex")
(set_attr "mode" "")])
 
 (define_insn "avx2_pbroadcast_1"
-  [(set (match_operand:VI_256 0 "register_operand" "=x,x")
+  [(set (match_operand:VI_256 0 "register_operand" "=x,x,v,v")
(vec_duplicate:VI_256
  (vec_select:
-   (match_operand:VI_256 1 "nonimmediate_operand" "m,x")
+   (match_operand:VI_256 1 "nonimmediate_operand" "m,x,m,v")
(parallel [(const_int 0)]]
   "TARGET_AVX2"
   "@
vpbroadcast\t{%1, %0|%0, %1}
+   vpbroadcast\t{%x1, %0|%0, %x1}
+   vpbroadcast\t{%1, %0|%0, %1}
vpbroadcast\t{%x1, %0|%0, %x1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "*,*,,")
+   (set_attr "type" "ssemov")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "vex")
(set_attr "mode" "")])
--- gcc/testsuite/gcc.target/i386/avx512bw-vpbroadcast-1.c.jj   2016-05-13 
16:58:07.491988435 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-vpbroadcast-1.c  2016-05-13 
17:31:29.830534782 +0200
@@ -0,0 +1,104 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mavx512bw" } */
+
+#include 
+
+void
+f1 (__m128i x)
+{
+  register __m128i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm_broadcastb_epi8 (a);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler "vpbroadcastb\[^\n\r]*xmm16\[^\n\r]*xmm16" } } 
*/
+
+void
+f2 (__m128i x)
+{
+  register __m128i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm_broadcastw_epi16 (a);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler "vpbroadcastw\[^\n\r]*xmm16\[^\n\r]*xmm16" } } 
*/
+
+void
+f3 (__m128i x)
+{
+  register __m128i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm_broadcastd_epi32 (a);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler "vpbroadcastd\[^\n\r]*xmm16\[^\n\r]*xmm16" } } 
*/
+
+void
+f4 (__m128i x)
+{
+  register __m128i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm_broadcastq_epi64 (a);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler "vpbroadcastq\[^\n\r]*xmm16\[^\n\r]*xmm16" } } 
*/
+
+void
+f5 (__m128i x)
+{
+  register __m128i a __asm ("xmm16");
+  register __m256i b __asm ("xmm17");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  b = _mm256_broadcastb_epi8 (a);
+  asm volatile ("" : "+v" (b));
+}
+
+/* { dg-final { scan-assembler 
"vpbroadcastb\[^\n\r]*(xmm1\[67]\[^\n\r]*ymm1\[67]|ymm1\[67]\[^\n\r]*xmm1\[67])"
 } } */
+
+void
+f6 (__m128i x)
+{
+  register __m128i a __asm ("xmm16");
+  register __m256i b __asm ("xmm17");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  b = _mm256_broadcastw_epi16 (a);
+  asm volatile ("" : "+v" (b));
+}
+
+/* { dg-final { scan-assembler 
"vpbroadcastw\[^\n\r]*(xmm1\[67]\[^\n\r]*ymm1\[67]|ymm1\[67]\[^\n\r]*xmm1\[67])"
 } } */
+
+void
+f7 (__m128i x)
+{
+  register __m128i a __asm ("xmm16");
+  register __m256i b __asm ("xmm17");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  b = _mm256_broadcastd_epi32 (a);
+  asm volatile 

[PATCH] Fix up vpalignr for -mavx512vl -mno-avx512bw

2016-05-13 Thread Jakub Jelinek
Hi!

vpalignr is AVX512BW & VL, so we shouldn't enable it just for VL.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-13  Jakub Jelinek  

* config/i386/sse.md (_palignr): Use
constraint x instead of v in second alternative, add avx512bw
alternative.

* gcc.target/i386/avx512vl-vpalignr-3.c: New test.
* gcc.target/i386/avx512bw-vpalignr-3.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-13 15:47:50.978978445 +0200
+++ gcc/config/i386/sse.md  2016-05-13 16:12:24.631965207 +0200
@@ -14289,11 +14289,11 @@ (define_insn "_palignr
(set_attr "mode" "")])
 
 (define_insn "_palignr"
-  [(set (match_operand:SSESCALARMODE 0 "register_operand" "=x,v")
+  [(set (match_operand:SSESCALARMODE 0 "register_operand" "=x,x,v")
(unspec:SSESCALARMODE
- [(match_operand:SSESCALARMODE 1 "register_operand" "0,v")
-  (match_operand:SSESCALARMODE 2 "vector_operand" "xBm,vm")
-  (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n,n")]
+ [(match_operand:SSESCALARMODE 1 "register_operand" "0,x,v")
+  (match_operand:SSESCALARMODE 2 "vector_operand" "xBm,xm,vm")
+  (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n,n,n")]
  UNSPEC_PALIGNR))]
   "TARGET_SSSE3"
 {
@@ -14304,18 +14304,19 @@ (define_insn "_palignr
 case 0:
   return "palignr\t{%3, %2, %0|%0, %2, %3}";
 case 1:
+case 2:
   return "vpalignr\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 default:
   gcc_unreachable ();
 }
 }
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx512bw")
(set_attr "type" "sseishft")
(set_attr "atom_unit" "sishuf")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,*,*")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,vex,evex")
(set_attr "mode" "")])
 
 (define_insn "ssse3_palignrdi"
--- gcc/testsuite/gcc.target/i386/avx512vl-vpalignr-3.c.jj  2016-05-13 
16:10:56.071176218 +0200
+++ gcc/testsuite/gcc.target/i386/avx512vl-vpalignr-3.c 2016-05-13 
16:11:30.292708261 +0200
@@ -0,0 +1,30 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mno-avx512bw" } */
+
+#include 
+
+void
+f1 (__m128i x, __m128i y)
+{
+  register __m128i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm_alignr_epi8 (a, b, 3);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler-not "vpalignr\[^\n\r]*xmm1\[67]" } } */
+
+void
+f2 (__m256i x, __m256i y)
+{
+  register __m256i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm256_alignr_epi8 (a, b, 3);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler-not "vpalignr\[^\n\r]*ymm1\[67]" } } */
--- gcc/testsuite/gcc.target/i386/avx512bw-vpalignr-3.c.jj  2016-05-13 
16:10:35.332459807 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-vpalignr-3.c 2016-05-13 
16:09:23.0 +0200
@@ -0,0 +1,30 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mavx512bw" } */
+
+#include 
+
+void
+f1 (__m128i x, __m128i y)
+{
+  register __m128i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm_alignr_epi8 (a, b, 3);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler 
"vpalignr\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" } } */
+
+void
+f2 (__m256i x, __m256i y)
+{
+  register __m256i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm256_alignr_epi8 (a, b, 3);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler 
"vpalignr\[^\n\r]*ymm1\[67]\[^\n\r]*ymm1\[67]\[^\n\r]*ymm1\[67]" } } */

Jakub


[PATCH] Fix up vpshufb for -mavx512vl -mno-avx512bw

2016-05-13 Thread Jakub Jelinek
Hi!

vpshufb is an AVX512BW & AVX512VL insn, so we shouldn't allow it for
AVX512VL only.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-13  Jakub Jelinek  

* config/i386/sse.md (_pshufb3): Use
constraint x instead of v in second alternative, add avx512bw
alternative.

* gcc.target/i386/avx512vl-vpshufb-3.c: New test.
* gcc.target/i386/avx512bw-vpshufb-3.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-13 15:18:24.791195754 +0200
+++ gcc/config/i386/sse.md  2016-05-13 15:47:50.978978445 +0200
@@ -14206,21 +14206,22 @@ (define_insn "*ssse3_pmulhrswv4hi3"
(set_attr "mode" "DI")])
 
 (define_insn "_pshufb3"
-  [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,v")
+  [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x,v")
(unspec:VI1_AVX512
- [(match_operand:VI1_AVX512 1 "register_operand" "0,v")
-  (match_operand:VI1_AVX512 2 "vector_operand" "xBm,vm")]
+ [(match_operand:VI1_AVX512 1 "register_operand" "0,x,v")
+  (match_operand:VI1_AVX512 2 "vector_operand" "xBm,xm,vm")]
  UNSPEC_PSHUFB))]
   "TARGET_SSSE3 &&  && "
   "@
pshufb\t{%2, %0|%0, %2}
+   vpshufb\t{%2, %1, %0|%0, %1, %2}
vpshufb\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx512bw")
(set_attr "type" "sselog1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,*,*")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,maybe_evex")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "prefix" "orig,maybe_evex,evex")
+   (set_attr "btver2_decode" "vector")
(set_attr "mode" "")])
 
 (define_insn "ssse3_pshufbv8qi3"
--- gcc/testsuite/gcc.target/i386/avx512vl-vpshufb-3.c.jj   2016-05-13 
15:42:41.261220799 +0200
+++ gcc/testsuite/gcc.target/i386/avx512vl-vpshufb-3.c  2016-05-13 
15:43:09.052841972 +0200
@@ -0,0 +1,30 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mno-avx512bw" } */
+
+#include 
+
+void
+f1 (__m128i x, __m128i y)
+{
+  register __m128i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm_shuffle_epi8 (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler-not "vpshufb\[^\n\r]*xmm1\[67]" } } */
+
+void
+f2 (__m256i x, __m256i y)
+{
+  register __m256i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm256_shuffle_epi8 (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler-not "vpshufb\[^\n\r]*ymm1\[67]" } } */
--- gcc/testsuite/gcc.target/i386/avx512bw-vpshufb-3.c.jj   2016-05-13 
15:42:23.788458969 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-vpshufb-3.c  2016-05-13 
15:41:15.0 +0200
@@ -0,0 +1,30 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mavx512bw" } */
+
+#include 
+
+void
+f1 (__m128i x, __m128i y)
+{
+  register __m128i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm_shuffle_epi8 (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler 
"vpshufb\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" } } */
+
+void
+f2 (__m256i x, __m256i y)
+{
+  register __m256i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm256_shuffle_epi8 (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler 
"vpshufb\[^\n\r]*ymm1\[67]\[^\n\r]*ymm1\[67]\[^\n\r]*ymm1\[67]" } } */

Jakub


[PATCH] Fix up vpmulhrsw for -mavx512vl -mno-avx512bw

2016-05-13 Thread Jakub Jelinek
Hi!

vpmulhrsw is an AVX512BW & AVX512VL insn, so we shouldn't enable it just
when AVX512VL is on.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-05-13  Jakub Jelinek  

* config/i386/sse.md (*_pmulhrsw3): Use
constraint x instead of v in second alternative, add avx512bw
alternative.

* gcc.target/i386/avx512vl-vpmulhrsw-3.c: New test.
* gcc.target/i386/avx512bw-vpmulhrsw-3.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-13 14:46:03.563465879 +0200
+++ gcc/config/i386/sse.md  2016-05-13 15:18:24.791195754 +0200
@@ -14158,16 +14158,16 @@ (define_expand "_pmulhrsw_pmulhrsw3"
-  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,v")
+  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,x,v")
(truncate:VI2_AVX2
  (lshiftrt:
(plus:
  (lshiftrt:
(mult:
  (sign_extend:
-   (match_operand:VI2_AVX2 1 "vector_operand" "%0,v"))
+   (match_operand:VI2_AVX2 1 "vector_operand" "%0,x,v"))
  (sign_extend:
-   (match_operand:VI2_AVX2 2 "vector_operand" "xBm,vm")))
+   (match_operand:VI2_AVX2 2 "vector_operand" "xBm,xm,vm")))
(const_int 14))
  (match_operand:VI2_AVX2 3 "const1_operand"))
(const_int 1]
@@ -14175,12 +14175,13 @@ (define_insn "*_pmulhrswmode, operands)"
   "@
pmulhrsw\t{%2, %0|%0, %2}
+   vpmulhrsw\t{%2, %1, %0|%0, %1, %2}
vpmulhrsw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx512bw")
(set_attr "type" "sseimul")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,*,*")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,maybe_evex")
+   (set_attr "prefix" "orig,maybe_evex,evex")
(set_attr "mode" "")])
 
 (define_insn "*ssse3_pmulhrswv4hi3"
--- gcc/testsuite/gcc.target/i386/avx512vl-vpmulhrsw-3.c.jj 2016-05-13 
15:21:31.422684540 +0200
+++ gcc/testsuite/gcc.target/i386/avx512vl-vpmulhrsw-3.c2016-05-13 
15:22:26.664898687 +0200
@@ -0,0 +1,30 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mno-avx512bw" } */
+
+#include 
+
+void
+f1 (__m128i x, __m128i y)
+{
+  register __m128i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm_mulhrs_epi16 (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler-not "vpmulhrsw\[^\n\r]*xmm1\[67]" } } */
+
+void
+f2 (__m256i x, __m256i y)
+{
+  register __m256i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm256_mulhrs_epi16 (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler-not "vpmulhrsw\[^\n\r]*ymm1\[67]" } } */
--- gcc/testsuite/gcc.target/i386/avx512bw-vpmulhrsw-3.c.jj 2016-05-13 
15:20:42.658349830 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-vpmulhrsw-3.c2016-05-13 
15:15:17.0 +0200
@@ -0,0 +1,30 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mavx512bw" } */
+
+#include 
+
+void
+f1 (__m128i x, __m128i y)
+{
+  register __m128i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm_mulhrs_epi16 (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler 
"vpmulhrsw\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" } } */
+
+void
+f2 (__m256i x, __m256i y)
+{
+  register __m256i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm256_mulhrs_epi16 (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler 
"vpmulhrsw\[^\n\r]*ymm1\[67]\[^\n\r]*ymm1\[67]\[^\n\r]*ymm1\[67]" } } */

Jakub


[PATCH] Allow XMM16-XMM31 in vpmaddubsw

2016-05-13 Thread Jakub Jelinek
Hi!

This is either an AVX2 instruction or, with EVEX encoding, an AVX512BW
(& AVX512VL) instruction, so the patch adds the latter as a separate
alternative guarded with the avx512bw isa attribute.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-13  Jakub Jelinek  

* config/i386/sse.md (avx2_pmaddubsw256, ssse3_pmaddubsw128): Add
avx512bw alternative.

* gcc.target/i386/avx512bw-vpmaddubsw-3.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-13 13:58:12.384020131 +0200
+++ gcc/config/i386/sse.md  2016-05-13 14:46:03.563465879 +0200
@@ -13933,12 +13933,12 @@ (define_insn "ssse3_ph
+
+void
+f1 (__m128i x, __m128i y)
+{
+  register __m128i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm_maddubs_epi16 (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler 
"vpmaddubsw\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" } } */
+
+void
+f2 (__m256i x, __m256i y)
+{
+  register __m256i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm256_maddubs_epi16 (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler 
"vpmaddubsw\[^\n\r]*ymm1\[67]\[^\n\r]*ymm1\[67]\[^\n\r]*ymm1\[67]" } } */

Jakub


[PATCH] Use HOST_WIDE_INT_C some more in the i386 backend

2016-05-13 Thread Jakub Jelinek
Hi!

I found a couple of spots where we can use the HOST_WIDE_INT_C macro.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-13  Jakub Jelinek  

* config/i386/i386.c (ix86_compute_frame_layout, ix86_expand_prologue,
ix86_expand_split_stack_prologue): Use HOST_WIDE_INT_C macro.
(ix86_split_to_parts): Likewise.  Fix up formatting.

--- gcc/config/i386/i386.c.jj   2016-05-12 09:46:40.0 +0200
+++ gcc/config/i386/i386.c  2016-05-12 10:46:35.862566575 +0200
@@ -11957,7 +11957,7 @@ ix86_compute_frame_layout (struct ix86_f
   to_allocate = offset - frame->sse_reg_save_offset;
 
   if ((!to_allocate && frame->nregs <= 1)
-  || (TARGET_64BIT && to_allocate >= (HOST_WIDE_INT) 0x8000))
+  || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000)))
 frame->save_regs_using_mov = false;
 
   if (ix86_using_red_zone ()
@@ -13379,7 +13379,7 @@ ix86_expand_prologue (void)
{
  HOST_WIDE_INT size = allocate;
 
- if (TARGET_64BIT && size >= (HOST_WIDE_INT) 0x8000)
+ if (TARGET_64BIT && size >= HOST_WIDE_INT_C (0x8000))
size = 0x8000 - STACK_CHECK_PROTECT - 1;
 
  if (TARGET_STACK_PROBE)
@@ -14320,7 +14320,7 @@ ix86_expand_split_stack_prologue (void)
 different function: __morestack_large.  We pass the
 argument size in the upper 32 bits of r10 and pass the
 frame size in the lower 32 bits.  */
- gcc_assert ((allocate & (HOST_WIDE_INT) 0x) == allocate);
+ gcc_assert ((allocate & HOST_WIDE_INT_C (0x)) == allocate);
  gcc_assert ((args_size & 0x) == args_size);
 
  if (split_stack_fn_large == NULL_RTX)
@@ -24554,20 +24554,17 @@ ix86_split_to_parts (rtx operand, rtx *p
  real_to_target (l, CONST_DOUBLE_REAL_VALUE (operand), mode);
 
  /* real_to_target puts 32-bit pieces in each long.  */
- parts[0] =
-   gen_int_mode
- ((l[0] & (HOST_WIDE_INT) 0x)
-  | ((l[1] & (HOST_WIDE_INT) 0x) << 32),
-  DImode);
+ parts[0] = gen_int_mode ((l[0] & HOST_WIDE_INT_C (0x))
+  | ((l[1] & HOST_WIDE_INT_C (0x))
+ << 32), DImode);
 
  if (upper_mode == SImode)
parts[1] = gen_int_mode (l[2], SImode);
  else
-   parts[1] =
- gen_int_mode
-   ((l[2] & (HOST_WIDE_INT) 0x)
-| ((l[3] & (HOST_WIDE_INT) 0x) << 32),
-DImode);
+   parts[1]
+ = gen_int_mode ((l[2] & HOST_WIDE_INT_C (0x))
+ | ((l[3] & HOST_WIDE_INT_C (0x))
+<< 32), DImode);
}
  else
gcc_unreachable ();

Jakub


Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-05-13 Thread Dhole
On 16-05-12 11:16:57, Bernd Schmidt wrote:
> On 05/12/2016 02:36 AM, Dhole wrote:
> >+  error_at (input_location, "environment variable SOURCE_DATE_EPOCH 
> >must "
> >+"expand to a non-negative integer less than or equal to %wd",
> >+MAX_SOURCE_DATE_EPOCH);
> 
> >+/* The value (as a unix timestamp) corresponds to date
> >+   "Dec 31  23:59:59 UTC", which is the latest date that __DATE__ and
> >+   __TIME__ can store.  */
> >+#define MAX_SOURCE_DATE_EPOCH 253402300799
> 
> This should use HOST_WIDE_INT_C to make sure we match %wd in the error
> output, and to make sure we don't get any too large for an integer warnings.

Done.

> >+  struct tm *tb = NULL;
> [...]
> >+  snprintf (source_date_epoch, 21, "%llu", (unsigned long long) tb);
> 
> That seems like the wrong thing to print.

Sorry about that.  I actually feel bad about it.  Fixed.

> >diff --git a/gcc/testsuite/gcc.dg/cpp/source_date_epoch-2.c 
> >b/gcc/testsuite/gcc.dg/cpp/source_date_epoch-2.c
> >new file mode 100644
> >index 000..4211552
> >--- /dev/null
> >+++ b/gcc/testsuite/gcc.dg/cpp/source_date_epoch-2.c
> >@@ -0,0 +1,10 @@
> >+/* { dg-do compile } */
> >+/* { dg-set-compiler-env-var SOURCE_DATE_EPOCH "AAA" } */
> >+
> >+int
> >+main(void)
> >+{
> >+  __builtin_printf ("%s %s\n", __DATE__, __TIME__); /* { dg-error 
> >"environment variable SOURCE_DATE_EPOCH must expand to a non-negative 
> >integer less than or equal to 253402300799" "Invalid SOURCE_DATE_EPOCH not 
> >reported" } */
> 
> You can shorten the string you look for, like just "SOURCE_DATE_EPOCH must
> expand". People generally also skip the second arg to dg-error.

Done.

> >+  __builtin_printf ("%s %s\n", __DATE__, __TIME__); /* { dg-bogus 
> >"environment variable SOURCE_DATE_EPOCH must expand to a non-negative 
> >integer less than or equal to 253402300799" "Invalid SOURCE_DATE_EPOCH 
> >reported twice" }  */
> 
> I would have expected no dg- directive at all on this line. Without one, any
> message should be reported as an excess error by the framework.

I wasn't sure about that, thanks for clarifying.  Done.

> >@@ -874,6 +906,10 @@ if { [info procs saved-dg-test] == [list] } {
> > if [info exists set_target_env_var] {
> > unset set_target_env_var
> > }
> >+if [info exists set_compiler_env_var] {
> >+restore-compiler-env-var
> >+unset set_compiler_env_var
> >+}
> 
> Shouldn't we also clear saved_compiler_env_var to keep that from growing?

You're right, done.

> >@@ -389,9 +390,8 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned 
> >char *cpp_flags,
> >   enum cpp_ttype type;
> >   unsigned char add_flags = 0;
> >   enum overflow_type overflow = OT_NONE;
> >-  time_t source_date_epoch = get_source_date_epoch ();
> >
> >-  cpp_init_source_date_epoch (parse_in, source_date_epoch);
> >+  cpp_init_source_date_epoch (parse_in);
> >
> >   timevar_push (TV_CPP);
> >  retry:
> 
> I just spotted this - why is this initialization here and not in say
> init_c_lex? Or skip the call into libcpp and just put it in
> cpp_create_reader.

That makes more sense.  Moved the initialization to cpp_create_reader,
without the need to add a new function.

> >diff --git a/libcpp/macro.c b/libcpp/macro.c
> >index c2a8376..55e53bf 100644
> >--- a/libcpp/macro.c
> >+++ b/libcpp/macro.c
> >@@ -358,9 +358,13 @@ _cpp_builtin_macro_text (cpp_reader *pfile, 
> >cpp_hashnode *node,
> >   struct tm *tb = NULL;
> >
> >   /* Set a reproducible timestamp for __DATE__ and __TIME__ macro
> >- usage if SOURCE_DATE_EPOCH is defined.  */
> >-  if (pfile->source_date_epoch != (time_t) -1)
> >- tb = gmtime (&pfile->source_date_epoch);
> >+ if SOURCE_DATE_EPOCH is defined.  */
> >+  if (pfile->source_date_epoch == (time_t) -2
> >+  && pfile->cb.get_source_date_epoch != NULL)
> >+  pfile->source_date_epoch = pfile->cb.get_source_date_epoch(pfile);
> 
> Formatting.

Done.

Cheers,
-- 
Dhole
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 665448c..0a35086 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -12794,8 +12794,9 @@ valid_array_size_p (location_t loc, tree type, tree 
name)
 /* Read SOURCE_DATE_EPOCH from environment to have a deterministic
timestamp to replace embedded current dates to get reproducible
results.  Returns -1 if SOURCE_DATE_EPOCH is not defined.  */
+
 time_t
-get_source_date_epoch ()
+cb_get_source_date_epoch (cpp_reader *pfile ATTRIBUTE_UNUSED)
 {
   char *source_date_epoch;
   long long epoch;
@@ -12807,19 +12808,14 @@ get_source_date_epoch ()
 
   errno = 0;
  epoch = strtoll (source_date_epoch, &endptr, 10);
-  if ((errno == ERANGE && (epoch == LLONG_MAX || epoch == LLONG_MIN))
-  || (errno != 0 && epoch == 0))
-fatal_error (UNKNOWN_LOCATION, "environment variable $SOURCE_DATE_EPOCH: "
-"strtoll: %s\n", xstrerror(errno));
-  if (endptr == source_date_epoch)
-fatal_error (UNKNOWN_LOCATION, 

Re: [PATCH, i386]: Additional fix for PR62599 with -mcmodel=medium -fpic

2016-05-13 Thread Uros Bizjak
On Fri, May 13, 2016 at 6:51 PM, Uros Bizjak  wrote:
> On Fri, May 13, 2016 at 9:07 AM, Uros Bizjak  wrote:
>> On Fri, May 13, 2016 at 1:20 AM, H.J. Lu  wrote:
>>
> testsuite/gcc.target/i386/pr61599-{1,2}.c testcases expose a failure
> with -mcmodel -fpic, where:
>
> /tmp/ccfpoxHY.o: In function `bar':
> pr61599-2.c:(.text+0xe): relocation truncated to fit: R_X86_64_PC32
> against symbol `a' defined in LARGE_COMMON section in /tmp/ccKTKST2.o
> collect2: error: ld returned 1 exit status
> compiler exited with status 1
>
> CM_MEDIUM_PIC code model assumes that code+got/plt fits in a 31 bit
> region, data is unlimited. Based on these assumptions, code should be
> accessed via R_X86_64_GOT64.
>
> Attached patch uses UNSPEC_GOT instead of UNSPEC_GOTPCREL also for
> CM_MEDIUM_PIC.
>
> 2016-05-12  Uros Bizjak  
>
> PR target/61599
> * config/i386/i386.c (legitimize_pic_address): Do not use
> UNSPEC_GOTPCREL for CM_MEDIUM_PIC code model.
>
> Patch was bootstrapped on x86_64-linux-gnu and regression tested with
> -mcmodel=medium -fpic.
>
> Jakub, H.J., do you have any comments on the patch?


 I prefer this patch.
>
>> Yes, your patch is more precise.
>
> OTOH, are we sure there is no linker bug here? We have:
>
> $ objdump -dr pr61599-1a.o
>
> pr61599-1a.o: file format elf64-x86-64
>
>
> Disassembly of section .text:
>
>  :
>0:   55  push   %rbp
>1:   48 89 e5mov%rsp,%rbp
>4:   48 8d 05 00 00 00 00lea0x0(%rip),%rax# b 
> 7: R_X86_64_GOTPC32 _GLOBAL_OFFSET_TABLE_-0x4
>b:   b8 00 00 00 00  mov$0x0,%eax
>   10:   e8 00 00 00 00  callq  15 
> 11: R_X86_64_PLT32  bar-0x4
>   15:   89 c2   mov%eax,%edx
>   17:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 1e 
> 1a: R_X86_64_GOTPCREL   c-0x4
>   1e:   0f b6 80 e1 00 00 00movzbl 0xe1(%rax),%eax
>   25:   0f be c0movsbl %al,%eax
>   28:   01 d0   add%edx,%eax
>   2a:   5d  pop%rbp
>   2b:   c3  retq
>
> $ objdump -dr pr61599-2a.o
>
> pr61599-2a.o: file format elf64-x86-64
>
>
> Disassembly of section .text:
>
>  :
>0:   55  push   %rbp
>1:   48 89 e5mov%rsp,%rbp
>4:   48 8d 05 00 00 00 00lea0x0(%rip),%rax# b 
> 7: R_X86_64_GOTPC32 _GLOBAL_OFFSET_TABLE_-0x4
>b:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 12 
> e: R_X86_64_GOTPCRELa-0x4
>   12:   0f b6 40 02 movzbl 0x2(%rax),%eax
>   16:   0f be d0movsbl %al,%edx
>   19:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 20 
> 1c: R_X86_64_GOTPCREL   b-0x4
>   20:   0f b6 40 10 movzbl 0x10(%rax),%eax
>   24:   0f be c0movsbl %al,%eax
>   27:   01 c2   add%eax,%edx
>   29:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 30 
> 2c: R_X86_64_GOTPCREL   c-0x4
>   30:   0f b6 80 00 01 00 00movzbl 0x100(%rax),%eax
>   37:   0f be c0movsbl %al,%eax
>   3a:   01 d0   add%edx,%eax
>   3c:   5d  pop%rbp
>   3d:   c3  retq
>
> $ gcc pr61599-2a.o pr61599-1a.o
> pr61599-2a.o: In function `bar':
> pr61599-2a.c:(.text+0xe): relocation truncated to fit: R_X86_64_PC32
> against symbol `a' defined in LARGE_COMMON section in pr61599-1a.o
> collect2: error: ld returned 1 exit status
>
> There is no R_X86_64_PC32 in pr61599-2a.o, only X86_64_GOTPCREL, which
> according to assumptions, would reach all entries in the GOT table.

It is a bug in GNU linker:

$ gcc -fuse-ld=gold pr61599-2a.o pr61599-1a.o
$ ./a.out

Uros.


Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports

2016-05-13 Thread Richard Biener
On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng  wrote:
>Hi,
>As PR69848 reported, GCC vectorizer now generates comparison outside of
>VEC_COND_EXPR for COND_REDUCTION case, as below:
>
>  _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>
>This results in inefficient expanding.  With IR like:
>
>  vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>  _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>
>We can do:
>1) Expanding time optimization, for example, reverting comparison
>operator by switching VEC_COND_EXPR operands.  This is useful when
>backend only supports some comparison operators.
>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>instruction which introduced by expand_vec_cond_expr.
>
>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>if it's used multiple times.  For now, GCC does single_use_only
>propagation.  Ideally, we may duplicate the comparison before each use
>statement just before expanding, so that TER can successfully backtrack
>it from each VEC_COND_EXPR.  Unfortunately I didn't find a good pass to
>do this.  Tree-vect-generic.c looks like a good candidate, but it's so
>early that following CSE could undo the transform.  Another possible
>fix is to generate comparison inside VEC_COND_EXPR directly in function
>vectorizable_reduction.

I prefer this for now.

Richard.

>As for possible comparison CSE opportunities, I checked that it's
>simple enough to be handled by RTL CSE.
>
>Bootstrap and test on x86_64 and AArch64.  Any comments?
>
>Thanks,
>bin
>
>2016-05-12  Bin Cheng  
>
>   PR tree-optimization/69848
>   * optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
>   (expand_vec_cmp_expr_p): Call above functions.
>   * optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
>   * tree-ssa-forwprop.c (optabs-tree.h): Include header file.
>   (forward_propagate_into_cond): Propagate multiple uses for
>   VEC_COND_EXPR.




Re: [PATCH, i386]: Additional fix for PR62599 with -mcmodel=medium -fpic

2016-05-13 Thread Uros Bizjak
On Fri, May 13, 2016 at 9:07 AM, Uros Bizjak  wrote:
> On Fri, May 13, 2016 at 1:20 AM, H.J. Lu  wrote:
>
 testsuite/gcc.target/i386/pr61599-{1,2}.c testcases expose a failure
 with -mcmodel -fpic, where:

 /tmp/ccfpoxHY.o: In function `bar':
 pr61599-2.c:(.text+0xe): relocation truncated to fit: R_X86_64_PC32
 against symbol `a' defined in LARGE_COMMON section in /tmp/ccKTKST2.o
 collect2: error: ld returned 1 exit status
 compiler exited with status 1

 CM_MEDIUM_PIC code model assumes that code+got/plt fits in a 31 bit
 region, data is unlimited. Based on these assumptions, code should be
 accessed via R_X86_64_GOT64.

 Attached patch uses UNSPEC_GOT instead of UNSPEC_GOTPCREL also for
 CM_MEDIUM_PIC.

 2016-05-12  Uros Bizjak  

 PR target/61599
 * config/i386/i386.c (legitimize_pic_address): Do not use
 UNSPEC_GOTPCREL for CM_MEDIUM_PIC code model.

 Patch was bootstrapped on x86_64-linux-gnu and regression tested with
 -mcmodel=medium -fpic.

 Jakub, H.J., do you have any comments on the patch?
>>>
>>>
>>> I prefer this patch.

> Yes, your patch is more precise.

OTOH, are we sure there is no linker bug here? We have:

$ objdump -dr pr61599-1a.o

pr61599-1a.o: file format elf64-x86-64


Disassembly of section .text:

 :
   0:   55  push   %rbp
   1:   48 89 e5mov%rsp,%rbp
   4:   48 8d 05 00 00 00 00lea0x0(%rip),%rax# b 
7: R_X86_64_GOTPC32 _GLOBAL_OFFSET_TABLE_-0x4
   b:   b8 00 00 00 00  mov$0x0,%eax
  10:   e8 00 00 00 00  callq  15 
11: R_X86_64_PLT32  bar-0x4
  15:   89 c2   mov%eax,%edx
  17:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 1e 
1a: R_X86_64_GOTPCREL   c-0x4
  1e:   0f b6 80 e1 00 00 00movzbl 0xe1(%rax),%eax
  25:   0f be c0movsbl %al,%eax
  28:   01 d0   add%edx,%eax
  2a:   5d  pop%rbp
  2b:   c3  retq

$ objdump -dr pr61599-2a.o

pr61599-2a.o: file format elf64-x86-64


Disassembly of section .text:

 :
   0:   55  push   %rbp
   1:   48 89 e5mov%rsp,%rbp
   4:   48 8d 05 00 00 00 00lea0x0(%rip),%rax# b 
7: R_X86_64_GOTPC32 _GLOBAL_OFFSET_TABLE_-0x4
   b:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 12 
e: R_X86_64_GOTPCRELa-0x4
  12:   0f b6 40 02 movzbl 0x2(%rax),%eax
  16:   0f be d0movsbl %al,%edx
  19:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 20 
1c: R_X86_64_GOTPCREL   b-0x4
  20:   0f b6 40 10 movzbl 0x10(%rax),%eax
  24:   0f be c0movsbl %al,%eax
  27:   01 c2   add%eax,%edx
  29:   48 8b 05 00 00 00 00mov0x0(%rip),%rax# 30 
2c: R_X86_64_GOTPCREL   c-0x4
  30:   0f b6 80 00 01 00 00movzbl 0x100(%rax),%eax
  37:   0f be c0movsbl %al,%eax
  3a:   01 d0   add%edx,%eax
  3c:   5d  pop%rbp
  3d:   c3  retq

$ gcc pr61599-2a.o pr61599-1a.o
pr61599-2a.o: In function `bar':
pr61599-2a.c:(.text+0xe): relocation truncated to fit: R_X86_64_PC32
against symbol `a' defined in LARGE_COMMON section in pr61599-1a.o
collect2: error: ld returned 1 exit status

There is no R_X86_64_PC32 in pr61599-2a.o, only X86_64_GOTPCREL, which
according to assumptions, would reach all entries in the GOT table.

Uros.


Re: [PTX] assembler name mangling

2016-05-13 Thread Alexander Monakov
Hello,

On Fri, 13 May 2016, Nathan Sidwell wrote:
> I've committed this cleanup to use TARGET_MANGLE_DECL_ASSEMBLER_NAME rather
> than the ad-hoc solution the ptx backend currently has.   I didn't address it
> during the earlier set of cleanups as I felt there were more important things
> to address then.

This regresses offloading compilation: the new hook isn't applied during LTO
stream-in, so target functions named 'call' won't be remapped.

I'd solve it by avoiding the hook and performing remapping in a simple IPA
pass registered from the backend.

Alexander


Re: [Patch, testsuite] PR70227, skip g++.dg/lto/pr69589_0.C on targets without -rdynamic support

2016-05-13 Thread Mike Stump
On May 13, 2016, at 6:53 AM, Jiong Wang  wrote:
> 
> This patch skips g++.dg/lto/pr69589_0.C on typical arm & aarch64
> bare-metal targets as they don't support "-rdynamic".
> 
> OK for trunk?

Ok.


[PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports

2016-05-13 Thread Bin Cheng
Hi,
As PR69848 reported, GCC vectorizer now generates comparison outside of 
VEC_COND_EXPR for COND_REDUCTION case, as below:

  _20 = vect__1.6_8 != { 0, 0, 0, 0 };
  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;

This results in inefficient expanding.  With IR like:

  vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0, 0, 0 }, vect_c_2.7_13>;
  _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;

We can do:
1) Expanding time optimization, for example, reverting comparison operator by 
switching VEC_COND_EXPR operands.  This is useful when backend only supports 
some comparison operators.
2) For backend not supporting vcond_mask patterns, saving one LT_EXPR 
instruction which introduced by expand_vec_cond_expr.

This patch fixes this by propagating comparison into VEC_COND_EXPR even if it's 
used multiple times.  For now, GCC does single_use_only propagation.  Ideally, 
we may duplicate the comparison before each use statement just before 
expanding, so that TER can successfully backtrack it from each VEC_COND_EXPR.  
Unfortunately I didn't find a good pass to do this.  Tree-vect-generic.c looks 
like a good candidate, but it's so early that following CSE could undo the 
transform.  Another possible fix is to generate comparison inside VEC_COND_EXPR 
directly in function vectorizable_reduction.

As for possible comparison CSE opportunities, I checked that it's simple enough 
to be handled by RTL CSE.

Bootstrap and test on x86_64 and AArch64.  Any comments?

Thanks,
bin

2016-05-12  Bin Cheng  

PR tree-optimization/69848
* optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
(expand_vec_cmp_expr_p): Call above functions.
* optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
* tree-ssa-forwprop.c (optabs-tree.h): Include header file.
(forward_propagate_into_cond): Propagate multiple uses for
VEC_COND_EXPR.
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index faac087..0ccdbdb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -314,25 +314,50 @@ expand_vec_cmp_expr_p (tree value_type, tree mask_type)
 }
 
 /* Return TRUE iff, appropriate vector insns are available
-   for vector cond expr with vector type VALUE_TYPE and a comparison
+   for VCOND_MASK pattern with vector type VALUE_TYPE and a comparison
with operand vector types in CMP_OP_TYPE.  */
 
 bool
-expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
+expand_vcond_mask_p (tree value_type, tree cmp_op_type)
 {
-  machine_mode value_mode = TYPE_MODE (value_type);
-  machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
   if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type)
   && get_vcond_mask_icode (TYPE_MODE (value_type),
   TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
 return true;
 
-  if (GET_MODE_SIZE (value_mode) != GET_MODE_SIZE (cmp_op_mode)
-  || GET_MODE_NUNITS (value_mode) != GET_MODE_NUNITS (cmp_op_mode)
-  || get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
- TYPE_UNSIGNED (cmp_op_type)) == CODE_FOR_nothing)
-return false;
-  return true;
+  return false;
+}
+
+/* Return TRUE iff, appropriate vector insns are available
+   for VCOND pattern with vector type VALUE_TYPE and a comparison
+   with operand vector types in CMP_OP_TYPE.  */
+
+bool
+expand_vcond_p (tree value_type, tree cmp_op_type)
+{
+  machine_mode value_mode = TYPE_MODE (value_type);
+  machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
+  if (GET_MODE_SIZE (value_mode) == GET_MODE_SIZE (cmp_op_mode)
+  && GET_MODE_NUNITS (value_mode) == GET_MODE_NUNITS (cmp_op_mode)
+  && get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
+ TYPE_UNSIGNED (cmp_op_type)) != CODE_FOR_nothing)
+return true;
+
+  return false;
+}
+
+/* Return TRUE iff, appropriate vector insns are available
+   for vector cond expr with vector type VALUE_TYPE and a comparison
+   with operand vector types in CMP_OP_TYPE.  */
+
+bool
+expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
+{
+  if (expand_vcond_mask_p (value_type, cmp_op_type)
+  || expand_vcond_p (value_type, cmp_op_type))
+return true;
+
+  return false;
 }
 
 /* Use the current target and options to initialize
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3b9280..feab40f 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -39,6 +39,8 @@ optab optab_for_tree_code (enum tree_code, const_tree, enum 
optab_subtype);
 bool supportable_convert_operation (enum tree_code, tree, tree, tree *,
enum tree_code *);
 bool expand_vec_cmp_expr_p (tree, tree);
+bool expand_vcond_mask_p (tree, tree);
+bool expand_vcond_p (tree, tree);
 bool expand_vec_cond_expr_p (tree, tree);
 void init_tree_optimization_optabs (tree);
 
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index c40f9e2..40f023f 100644
--- a/gcc/tree-ssa-forwprop.c
+++ 

[PTX] assembler name mangling

2016-05-13 Thread Nathan Sidwell
I've committed this cleanup to use TARGET_MANGLE_DECL_ASSEMBLER_NAME rather than 
the ad-hoc solution the ptx backend currently has.   I didn't address it during 
the earlier set of cleanups as I felt there were more important things to 
address then.


I did discover an issue in langhooks, where the target name mangling hook wasn't 
being called, which led to __builtin_realloc and friends not being mangled as 
needed.  I've applied this change to langhooks as obvious.  The only other use 
of this target hook appears to be i386-mingw, where the additional call appears 
harmless.


nathan
2016-05-13  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_mangle_decl_assembler_name): New.
	(nvptx_name_replacement): Delete.
	(write_fn_proto, write_fn_proto_from_insn,
	nvptx_output_call_insn): Remove nvptx_name_replacement call.
	(TARGET_MANGLE_DECL_ASSEMBLER_NAME): Override.
	* langhooks.c (add_builtin_function_common): Call
	targetm.mangle_decl_assembler_name.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 236211)
+++ config/nvptx/nvptx.c	(working copy)
@@ -211,6 +211,31 @@ nvptx_ptx_type_from_mode (machine_mode m
 }
 }
 
+/* Return an identifier node for DECL.  Usually the default mangled
+   name ID is usable.  Some names cannot be used directly, so prefix
+   them with __nvptx_.  */
+
+static tree
+nvptx_mangle_decl_assembler_name (tree ARG_UNUSED (decl), tree id)
+{
+  static const char *const bad_names[] =
+{"call", "malloc", "free", "realloc", 0};
+  int ix;
+  const char *name = IDENTIFIER_POINTER (id);
+
+  for (ix = 0; bad_names[ix]; ix++)
+if (!strcmp (bad_names[ix], name))
+  {
+	char *new_name = XALLOCAVEC (char,
+ strlen (name) + sizeof ("__nvptx_"));
+	sprintf (new_name, "__nvptx_%s", name);
+	id = get_identifier (new_name);
+	break;
+  }
+
+  return id;
+}
+
 /* Encode the PTX data area that DECL (which might not actually be a
_DECL) should reside in.  */
 
@@ -256,24 +281,6 @@ section_for_decl (const_tree decl)
   return section_for_sym (XEXP (DECL_RTL (CONST_CAST (tree, decl)), 0));
 }
 
-/* Check NAME for special function names and redirect them by returning a
-   replacement.  This applies to malloc, free and realloc, for which we
-   want to use libgcc wrappers, and call, which triggers a bug in ptxas.  */
-
-static const char *
-nvptx_name_replacement (const char *name)
-{
-  if (strcmp (name, "call") == 0)
-return "__nvptx_call";
-  if (strcmp (name, "malloc") == 0)
-return "__nvptx_malloc";
-  if (strcmp (name, "free") == 0)
-return "__nvptx_free";
-  if (strcmp (name, "realloc") == 0)
-return "__nvptx_realloc";
-  return name;
-}
-
 /* If MODE should be treated as two registers of an inner mode, return
that inner mode.  Otherwise return VOIDmode.  */
 
@@ -731,13 +738,8 @@ write_fn_proto (std::stringstream , bo
   if (is_defn)
 /* Emit a declaration. The PTX assembler gets upset without it.   */
 name = write_fn_proto (s, false, name, decl);
-  else
-{
-  /* Avoid repeating the name replacement.  */
-  name = nvptx_name_replacement (name);
-  if (name[0] == '*')
-	name++;
-}
+  else if (name[0] == '*')
+name++;
 
   write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
 
@@ -841,7 +843,6 @@ write_fn_proto_from_insn (std::stringstr
 }
   else
 {
-  name = nvptx_name_replacement (name);
   write_fn_marker (s, false, true, name);
   s << "\t.extern .func ";
 }
@@ -1859,7 +1860,6 @@ nvptx_output_call_insn (rtx_insn *insn,
   if (decl)
 {
   const char *name = get_fnname_from_decl (decl);
-  name = nvptx_name_replacement (name);
   assemble_name (asm_out_file, name);
 }
   else
@@ -4887,6 +4887,9 @@ nvptx_goacc_reduction (gcall *call)
 #undef TARGET_NO_REGISTER_ALLOCATION
 #define TARGET_NO_REGISTER_ALLOCATION true
 
+#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
+#define TARGET_MANGLE_DECL_ASSEMBLER_NAME nvptx_mangle_decl_assembler_name
+
 #undef TARGET_ENCODE_SECTION_INFO
 #define TARGET_ENCODE_SECTION_INFO nvptx_encode_section_info
 #undef TARGET_RECORD_OFFLOAD_SYMBOL
Index: langhooks.c
===
--- langhooks.c	(revision 236208)
+++ langhooks.c	(working copy)
@@ -561,6 +561,8 @@ add_builtin_function_common (const char
   if (library_name)
 {
   tree libname = get_identifier (library_name);
+
+  libname = targetm.mangle_decl_assembler_name (decl, libname);
   SET_DECL_ASSEMBLER_NAME (decl, libname);
 }
 


Re: [Patch ARM/AArch64 00/11][testsuite] AdvSIMD intrinsics update

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:23:50PM +0200, Christophe Lyon wrote:
> Hi,
> 
> A few months ago, we decided it was time to remove neon-testgen.ml
> and its generated tests. I did it, just to realize too late that
> some intrinsics were not covered anymore, so I reverted the removal.
> 
> This patch series performs a little bit of cleanup and adds the
> missing tests to cover all what is defined in arm_neon.h for AArch32.
> 
> Globally, this consists in adding tests for:
> - missing poly8 and poly16 for vreinterpret and vtst
> - fp16 tests for vget_lane, vstX_lane and vreinterpret
> - armv8 vrnd{,a,m,n,p,x}
> - tests for poly64 and poly128 intrinsics
> 
> Some intrinsics are not covered in aarch64/advsimd-intrinsics, but in
> arm/crypto: vldrq, vstrq, vaes, vsha1, vsha256, vmull_p64,
> vmull_high_p64.
> 
> Patches 1-4 are cleanup.
> Patch 5 adds the missing poly8 and poly16 tests for vreinterpret.
> Patch 6 adds the missing tests for vtst_p8 and vtstq_p8.
> Patches 7,8, 11 add the missing fp16 tests
> Patch 9 adds armv8 vrnd{,a,m,n,p,x} tests
> Patch 10 adds tests for poly64 and poly128 operations
> 
> I've checked the coverage by building the list of intrinsics tested
> via neon-testgen.ml, the list of intrinsics defined in arm_neon.h, and
> running the advsimd-intrinsics.exp tests with -save-temps to gather
> the list of actually tested intrinsics.
> 
> This series partly addresses PR 70369 which I created to keep track
> of these missing intrinsics tests: several AArch64 AdvSIMD intrinsics
> are still missing tests.
> 
> Tested with QEMU on arm* and aarch64*, with no regression, and
> several new PASSes.
> 
> OK for trunk?

D'oh, I wasn't thinking when I started OKing these, but obviously I only
have the privilege to OK them from an AArch64 perspective.

If it isn't too late, please wait for someone from the ARM port to give
their blessing before committing them.

Sorry to the ARM maintainers for forgetting to be explicit about that in
my earlier emails.

Thanks,
James



RE: [PATCH] MIPS: Ensure that lo_sums do not contain an unaligned symbol

2016-05-13 Thread Andrew Bennett
Hi Matthew,

Many thanks for reviewing this patch.  Upon closer validation of the patch I
have found that this approach will not work in all cases, so unfortunately I am
going to have to abandon it.  I will shortly be posting a new patch to fix this
issue in the middle-end.

Regards,


Andrew

> -Original Message-
> From: Matthew Fortune
> Sent: 05 May 2016 10:44
> To: Andrew Bennett; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] MIPS: Ensure that lo_sums do not contain an unaligned
> symbol
> 
> Hi Andrew,
> 
> Thanks for working on this it is a painful area.  There's a bit more to do
> but this is cleaning up some sneaky bugs.  Can you create a GCC bugzilla
> entry if you haven't already as we should record where these bugs exist and
> when they are fixed?
> 
> See my comments but I think that you are fixing more variants of this bug
> than your summary states so we need to capture the detail on what code is
> affected by these issues.
> 
> Andrew Bennett  writes:
> > different offsets.  Lets show this with an example C program.
> >
> > struct
> > {
> >   short s;
> >   unsigned long long l;
> > } h;
> >
> > void foo (void)
> > {
> >   h.l = 0;
> > }
> >
> > When this is compiled for MIPS it produces the following assembly:
> >
> > lui $2,%hi(h+8)
> > sw  $0,%lo(h+8)($2)
> > jr  $31
> > sw  $0,%lo(h+12)($2)
> 
> This looks like a stale example h+8 implies 8 bytes of data preceding 'l'.
> 
> 
> >diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> >index 6cdda3b..f07e433 100644
> >--- a/gcc/config/mips/mips.c
> >+++ b/gcc/config/mips/mips.c
> >@@ -2354,6 +2354,38 @@ mips_valid_lo_sum_p (enum mips_symbol_type
> symbol_type, machine_mode mode)
> >   return true;
> > }
> >
> >+/* Return true if X in LO_SUM (REG, X) is a valid.  */
> 
> is a valid...
> 
> I would however merge this code into mips_valid_lo_sum_p as the new function
> name is fairly confusing.  Both call sites for this function have the
> symbol_type available which mips_valid_lo_sum_p requires and also have the
> mode, reg, x so just add those to mips_valid_lo_sum_p.
> 
> >+
> >+bool
> >+mips_valid_lo_sum_lo_part_p (machine_mode mode, rtx reg, rtx x)
> >+{
> >+   rtx symbol = NULL;
> 
> three space indent.
> 
> >+
> >+   if (mips_abi != ABI_32)
> >+ return true;
> 
> I don't think this is limited to o32.  I was thinking this was just about
> splitting a multi-word unaligned access but actually the test cases in this
> patch show that it is also about accessing unaligned elements in a structure
> using the same 'hi' part for multiple 'lo' parts with differing offsets.
> 
> In the end I think there is no word size or abi specific issues here; it is
> quite general.
> 
> >+   if (mode == BLKmode)
> >+ return true;
> 
> Why is this special? Does the core GCC code ensure that lo_sum on a BLKmode
> cannot have a constant offset greater than alignment?
> 
> >+   if (reg && REG_P (reg) && REGNO (reg) == GLOBAL_POINTER_REGNUM)
> >+ return true;
> 
> I don't think reg need be an optional argument it is available at both
> call sites.  A comment to say why offsets from the global pointer are
> not affected would also be useful.
> 
> >+
> >+   if (GET_CODE (x) == CONST
> >+   && GET_CODE (XEXP (x, 0)) == PLUS
> >+   && GET_CODE (XEXP (XEXP (x, 0), 0)) == SYMBOL_REF)
> >+ symbol = XEXP (XEXP (x, 0), 0);
> >+   else if (GET_CODE (x) == SYMBOL_REF)
> >+ symbol = x;
> >+
> >+   if (symbol
> >+   && SYMBOL_REF_DECL (symbol)
> >+   && (GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT) >
> >+   (DECL_ALIGN_UNIT (SYMBOL_REF_DECL (symbol
> 
> This needs another bracket to cover the multiline '>' condition.
> 
> >+ return false;
> >+
> >+   return true;
> >+}
> >+
> > /* Return true if X is a valid address for machine mode MODE.  If it is,
> >fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
> >effect.  */
> >@@ -2394,7 +2426,8 @@ mips_classify_address (struct mips_address_info *info,
> rtx x,
> >   info->symbol_type
> > = mips_classify_symbolic_expression (info->offset, SYMBOL_CONTEXT_MEM);
> >   return (mips_valid_base_register_p (info->reg, mode, strict_p)
> >-  && mips_valid_lo_sum_p (info->symbol_type, mode));
> >+  && mips_valid_lo_sum_p (info->symbol_type, mode)
> >+  && mips_valid_lo_sum_lo_part_p (mode, info->reg, info->offset));
> 
> As above this can become:
> 
> && mips_valid_lo_sum_p (info->symbol_type, mode, info-reg,
> info->offset)
> 
> >
> > case CONST_INT:
> >   /* Small-integer addresses don't occur very often, but they
> >@@ -3143,6 +3176,8 @@ mips_split_symbol (rtx temp, rtx addr, machine_mode
> mode, rtx *low_out)
> > high = gen_rtx_HIGH (Pmode, copy_rtx (addr));
> > high = mips_force_temporary (temp, high);
> > *low_out = gen_rtx_LO_SUM (Pmode, high, 

Re: [patch] Fix PR tree-optimization/70884

2016-05-13 Thread Martin Jambor
Hi,

On Fri, May 13, 2016 at 01:01:50PM +0200, Eric Botcazou wrote:
> > Hmm, the patch looks obvious if it was the intent to allow constant
> > pool replacements
> > _not_ only when the whole constant pool entry may go away.  But I
> > think the intent was
> > to not do this otherwise it will generate worse code by forcing all
> > loads from the constant pool to appear at
> > function start.
> 
> Do you mean when the whole constant pool entry is scalarized as opposed to 
> partially scalarized?
> 
> > So - the "real" issue may be a missing
> > should_scalarize_away_bitmap/cannot_scalarize_away_bitmap
> > check somewhere.
> 
> This seems to work:
> 
> Index: tree-sra.c
> ===
> --- tree-sra.c  (revision 236195)
> +++ tree-sra.c  (working copy)
> @@ -2680,6 +2680,10 @@ analyze_all_variable_accesses (void)
>EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
>  {
>tree var = candidate (i);
> +  if (constant_decl_p (var)
> + && (!bitmap_bit_p (should_scalarize_away_bitmap, i)
> + || bitmap_bit_p (cannot_scalarize_away_bitmap, i)))
> +   continue;
>struct access *access;
>  
>access = sort_and_splice_var_accesses (var);
> 
> but I have no idea whether this is correct or not.

This would skip creation of access trees for the variables without
disqualifying them and while it may "work," it is certainly ugly,
causing lookups to traverse uninitialized structures.

> 
> Martin, are we sure to disable scalarization of constant_decl_p variables not 
> covered by initialize_constant_pool_replacements that way?

Overall, I think that removal of bitmap tests in
initialize_constant_pool_replacements is certainly correct, if SRA
decides to create replacements for a constant pool decl, it has to
initialize them.  Whether we want to perform such scalarization is
another matter.

SRA creates replacements for all parts of an aggregate that is loaded
as a scalar multiple times.  If moving (multiple) scalar loads from
the constant pool to the beginning of the function is a bad idea, then
I would propose the fix below, which disables that heuristic for them.
Effectively, the patch prevents late-SRA from doing anything for both
testcases (PR 70884 and PR 70919).  I have started a bootstrap and
testing on x86_64 and i686 only a few moments ago but it would be
great if someone also tried on an architecture for which the
constant-pool SRA enhancement was intended, just to be sure.

Thanks,

Martin


2016-05-13  Martin Jambor  

PR tree-optimization/70884
* tree-sra.c (initialize_constant_pool_replacements): Do not check
should_scalarize_away_bitmap and cannot_scalarize_away_bitmap bits.
(sort_and_splice_var_accesses): Do not consider multiple scalar reads
of constant pool data as a reason for scalarization.

testsuite/
* gcc.dg/tree-ssa/pr70919.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr70919.c | 46 
 gcc/tree-sra.c  | 54 -
 2 files changed, 73 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr70919.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr70919.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr70919.c
new file mode 100644
index 000..bed0ab3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr70919.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+#pragma pack(1)
+struct S0
+{
+  int f0:24;
+};
+
+struct S1
+{
+  int f1;
+} a;
+
+int b, c;
+
+char
+fn1 (struct S1 p1)
+{
+  return 0;
+}
+
+int
+main ()
+{
+  c = fn1 (a);
+  if (b)
+{
+  struct S0 f[3][9] =
+   { { { 0 }, { 0 }, { 1 }, { 1 }, { 0 }, { 0 }, { 0 }, { 1 }, { 1 } },
+ { { 0 }, { 0 }, { 1 }, { 1 }, { 0 }, { 0 }, { 0 }, { 1 }, { 1 } },
+ { { 0 }, { 0 }, { 1 }, { 1 }, { 0 }, { 0 }, { 0 }, { 1 }, { 1 } }
+   };
+  b = f[1][8].f0;
+}
+  struct S0 g[3][9] =
+   { { { 0 }, { 0 }, { 1 }, { 1 }, { 0 }, { 0 }, { 0 }, { 1 }, { 1 } },
+ { { 0 }, { 0 }, { 1 }, { 1 }, { 0 }, { 0 }, { 0 }, { 1 }, { 1 } },
+ { { 0 }, { 0 }, { 1 }, { 1 }, { 0 }, { 0 }, { 0 }, { 1 }, { 1 } }
+   };
+
+  if (g[1][8].f0 != 1)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 936d3a6..7c0e90d 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2074,7 +2074,8 @@ sort_and_splice_var_accesses (tree var)
   access->grp_scalar_write = grp_scalar_write;
   access->grp_assignment_read = grp_assignment_read;
   access->grp_assignment_write = grp_assignment_write;
-  access->grp_hint = multiple_scalar_reads || total_scalarization;
+  access->grp_hint = total_scalarization
+   || (multiple_scalar_reads && !constant_decl_p (var));
   access->grp_total_scalarization = total_scalarization;
   access->grp_partial_lhs = grp_partial_lhs;
   

Re: [Patch ARM/AArch64 10/11] Add missing tests for intrinsics operating on poly64 and poly128 types.

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:24:00PM +0200, Christophe Lyon wrote:
> 2016-05-02  Christophe Lyon  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (result):
>   Add poly64x1_t and poly64x2_t cases if supported.
>   * gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
>   (buffer, buffer_pad, buffer_dup, buffer_dup_pad): Likewise.
>   * gcc.target/aarch64/advsimd-intrinsics/p64_p128.c: New file.
>   * gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c: New file.
>   * gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c: New file.
> 

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
> @@ -0,0 +1,665 @@
> +/* This file contains tests for all the *p64 intrinsics, except for
> +   vreinterpret which have their own testcase.  */
> +
> +/* { dg-require-effective-target arm_crypto_ok } */
> +/* { dg-add-options arm_crypto } */
> +
> +#include 
> +#include "arm-neon-ref.h"
> +#include "compute-ref-data.h"
> +
> +/* Expected results: vbsl.  */
> +VECT_VAR_DECL(vbsl_expected,poly,64,1) [] = { 0xfff1 };
> +VECT_VAR_DECL(vbsl_expected,poly,64,2) [] = { 0xfff1,
> +   0xfff1 };
> +
> +/* Expected results: vceq.  */
> +VECT_VAR_DECL(vceq_expected,uint,64,1) [] = { 0x0 };

vceqq_p64
vceqz_p64
vceqzq_p64
vtst_p64
vtstq_p64

are missing, but will not be trivial to add. Could you raise a bug report
(or fix it if you like :-) )?

This is OK without a fix for those intrinsics with a suitable bug report
opened.

Thanks,
James



Re: [Patch ARM/AArch64 11/11] Add missing tests for vreinterpret, operating of fp16 type.

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:24:01PM +0200, Christophe Lyon wrote:
> 2016-05-04  Christophe Lyon  
> 
> * gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c: Add fp16 
> tests.
> * gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c: Likewise.

OK.

Thanks,
James



[Patch, avr] Include INCOMING_FRAME_SP_OFFSET when printing stack usage

2016-05-13 Thread Senthil Kumar Selvaraj
Hi,

  This trivial patch adds INCOMING_FRAME_SP_OFFSET to
  current_function_static_stack_size, thus fixing the 2 (or 3, for
  3 byte PC devices) byte difference between reported and actual
  values when using -fstack-usage.

  The patch came about because of this discussion
  (https://gcc.gnu.org/ml/gcc/2016-05/msg00107.html). For AVRs, the
  return address gets pushed into the stack as part of the call
  instruction, and the number of bytes pushed varies by PC width.
  This is already taken care of when defining INCOMING_FRAME_SP_OFFSET,
  so I just add it to the previously computed value when setting
  current_function_static_stack_size. 

  If this is ok, could someone commit please? I don't have commit
  access.

Regards
Senthil

gcc/ChangeLog

2016-05-13  Senthil Kumar Selvaraj  

* config/avr/avr.c (avr_expand_prologue): Add INCOMING_FRAME_SP_OFFSET
  to computed stack_usage.


diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index 8de39e0..ba5cd91 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -1484,7 +1484,7 @@ avr_expand_prologue (void)
   avr_prologue_setup_frame (size, set);
 
   if (flag_stack_usage_info)
-current_function_static_stack_size = cfun->machine->stack_usage;
+current_function_static_stack_size = cfun->machine->stack_usage + 
INCOMING_FRAME_SP_OFFSET;
 }
 
 


Re: PATCH: PR target/70738: Add -mgeneral-regs-only option

2016-05-13 Thread H.J. Lu
On Thu, May 12, 2016 at 10:54 AM, H.J. Lu  wrote:
>>> Here is a patch to add
>>> -mgeneral-regs-only option to x86 backend.   We can update
>>> spec for interrupt handle to recommend compiling interrupt handler
>>> with -mgeneral-regs-only option and add a note for compiler
>>> implementers.
>>>
>>> OK for trunk if there is no regression?
>>
>>
>> I can't comment on the code patch, but for the documentation part:
>>
>>> @@ -24242,6 +24242,12 @@ opcodes, to mitigate against certain forms of
>>> attack. At the moment,
>>>  this option is limited in what it can do and should not be relied
>>>  on to provide serious protection.
>>>
>>> +@item -mgeneral-regs-only
>>> +@opindex mgeneral-regs-only
>>> +Generate code which uses only the general-purpose registers.  This will
>>
>>
>> s/which/that/
>>
>>> +prevent the compiler from using floating-point, vector, mask and bound
>>
>>
>> s/will prevent/prevents/
>>
>>> +registers, but will not impose any restrictions on the assembler.
>>
>>
>> Maybe you mean to say "does not restrict use of those registers in inline
>> assembly code"?  In any case, please get rid of the future tense here, too.
>
> I changed it to
>
> ---
> @item -mgeneral-regs-only
> @opindex mgeneral-regs-only
> Generate code that uses only the general-purpose registers.  This
> prevents the compiler from using floating-point, vector, mask and bound
> registers.
> ---
>

Here is the updated patch.  Tested on x86-64.  OK for trunk?

Thanks.


-- 
H.J.
From 1432847f8d2e00fe8ff85e9cdb61abbed0202cb4 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 11 May 2016 09:49:33 -0700
Subject: [PATCH] Add -mgeneral-regs-only option

The x86 Linux kernel is compiled only with integer instructions.  Currently,

-mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -mno-80387
-mno-fp-ret-in-387  -mskip-rax-setup

is used to compile the kernel.  If we add another non-integer feature, it
has to be turned off as well.  We can add a -mgeneral-regs-only option,
similar to AArch64, to disable all non-integer features so that the kernel
doesn't need a long list and the same option will work for future compilers.
It can also be used to compile interrupt handlers.

gcc/

	PR target/70738
	* common/config/i386/i386-common.c
	(OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET): New.
	(ix86_handle_option): Disable MPX, MMX, SSE and x87 instructions
	for -mgeneral-regs-only.
	* config/i386/i386.c (ix86_option_override_internal): Don't
	enable x87 instructions if only the general registers are
	allowed.
	* config/i386/i386.opt: Add -mgeneral-regs-only.
	* doc/invoke.texi: Document -mgeneral-regs-only.

gcc/testsuite/

	PR target/70738
	* gcc.target/i386/pr70738-1.c: Likewise.
	* gcc.target/i386/pr70738-2.c: Likewise.
	* gcc.target/i386/pr70738-3.c: Likewise.
	* gcc.target/i386/pr70738-4.c: Likewise.
	* gcc.target/i386/pr70738-5.c: Likewise.
	* gcc.target/i386/pr70738-6.c: Likewise.
	* gcc.target/i386/pr70738-7.c: Likewise.
	* gcc.target/i386/pr70738-8.c: Likewise.
	* gcc.target/i386/pr70738-9.c: Likewise.
---
 gcc/common/config/i386/i386-common.c  | 20 
 gcc/config/i386/i386.c|  5 -
 gcc/config/i386/i386.opt  |  4 
 gcc/doc/invoke.texi   |  8 +++-
 gcc/testsuite/gcc.target/i386/pr70738-1.c |  9 +
 gcc/testsuite/gcc.target/i386/pr70738-2.c | 10 ++
 gcc/testsuite/gcc.target/i386/pr70738-3.c | 11 +++
 gcc/testsuite/gcc.target/i386/pr70738-4.c | 10 ++
 gcc/testsuite/gcc.target/i386/pr70738-5.c | 16 
 gcc/testsuite/gcc.target/i386/pr70738-6.c | 10 ++
 gcc/testsuite/gcc.target/i386/pr70738-7.c | 13 +
 gcc/testsuite/gcc.target/i386/pr70738-8.c | 30 ++
 gcc/testsuite/gcc.target/i386/pr70738-9.c | 23 +++
 13 files changed, 167 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr70738-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr70738-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr70738-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr70738-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr70738-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr70738-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr70738-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr70738-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr70738-9.c

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index cc65c8c..b150c9e 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -223,6 +223,11 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_RDRND_UNSET OPTION_MASK_ISA_RDRND
 #define OPTION_MASK_ISA_F16C_UNSET OPTION_MASK_ISA_F16C
 
+#define OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET \
+  (OPTION_MASK_ISA_MMX_UNSET \
+   | OPTION_MASK_ISA_SSE_UNSET \
+   | 

Re: [Patch ARM/AArch64 09/11] Add missing vrnd{,a,m,n,p,x} tests.

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:23:59PM +0200, Christophe Lyon wrote:
> 2016-05-02  Christophe Lyon  
> 
>   * gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c: New.
>   * gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc: New.
>   * gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c: New.
>   * gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c: New.
>   * gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c: New.
>   * gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c: New.
>   * gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c: New.

This is OK, in line with how we test the other intrinsics in this
directory (we haven't really tried to hit corner cases elsewhere).

Thanks,
James

> 
> Change-Id: Iab5f98dc4b15f9a2f61b622a9f62b207872f1737
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
> new file mode 100644
> index 000..5f492d4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
> @@ -0,0 +1,16 @@
> +/* { dg-require-effective-target arm_v8_neon_ok } */
> +/* { dg-add-options arm_v8_neon } */
> +
> +#include 
> +#include "arm-neon-ref.h"
> +#include "compute-ref-data.h"
> +
> +/* Expected results.  */
> +VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc180, 0xc170 };
> +VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc180, 0xc170,
> +0xc160, 0xc150 };
> +
> +#define INSN vrnd
> +#define TEST_MSG "VRND"
> +
> +#include "vrndX.inc"
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
> new file mode 100644
> index 000..629240d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
> @@ -0,0 +1,43 @@
> +#define FNNAME1(NAME) exec_ ## NAME
> +#define FNNAME(NAME) FNNAME1 (NAME)
> +
> +void FNNAME (INSN) (void)
> +{
> +  /* vector_res = vrndX (vector), then store the result.  */
> +#define TEST_VRND2(INSN, Q, T1, T2, W, N)\
> +  VECT_VAR (vector_res, T1, W, N) =  \
> +INSN##Q##_##T2##W (VECT_VAR (vector, T1, W, N)); \
> +vst1##Q##_##T2##W (VECT_VAR (result, T1, W, N),  \
> +VECT_VAR (vector_res, T1, W, N))
> +
> +  /* Two auxiliary macros are necessary to expand INSN.  */
> +#define TEST_VRND1(INSN, Q, T1, T2, W, N)\
> +  TEST_VRND2 (INSN, Q, T1, T2, W, N)
> +
> +#define TEST_VRND(Q, T1, T2, W, N)   \
> +  TEST_VRND1 (INSN, Q, T1, T2, W, N)
> +
> +  DECL_VARIABLE (vector, float, 32, 2);
> +  DECL_VARIABLE (vector, float, 32, 4);
> +
> +  DECL_VARIABLE (vector_res, float, 32, 2);
> +  DECL_VARIABLE (vector_res, float, 32, 4);
> +
> +  clean_results ();
> +
> +  VLOAD (vector, buffer, , float, f, 32, 2);
> +  VLOAD (vector, buffer, q, float, f, 32, 4);
> +
> +  TEST_VRND ( , float, f, 32, 2);
> +  TEST_VRND (q, float, f, 32, 4);
> +
> +  CHECK_FP (TEST_MSG, float, 32, 2, PRIx32, expected, "");
> +  CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
> +}
> +
> +int
> +main (void)
> +{
> +  FNNAME (INSN) ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
> new file mode 100644
> index 000..816fd28d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
> @@ -0,0 +1,16 @@
> +/* { dg-require-effective-target arm_v8_neon_ok } */
> +/* { dg-add-options arm_v8_neon } */
> +
> +#include 
> +#include "arm-neon-ref.h"
> +#include "compute-ref-data.h"
> +
> +/* Expected results.  */
> +VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc180, 0xc170 };
> +VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc180, 0xc170,
> +0xc160, 0xc150 };
> +
> +#define INSN vrnda
> +#define TEST_MSG "VRNDA"
> +
> +#include "vrndX.inc"
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
> new file mode 100644
> index 000..029880c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
> @@ -0,0 +1,16 @@
> +/* { dg-require-effective-target arm_v8_neon_ok } */
> +/* { dg-add-options arm_v8_neon } */
> +
> +#include 
> +#include "arm-neon-ref.h"
> +#include "compute-ref-data.h"
> +
> +/* Expected results.  */
> +VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc180, 0xc170 };
> +VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc180, 0xc170,
> +0xc160, 0xc150 };
> +
> +#define INSN vrndm
> +#define TEST_MSG "VRNDM"
> +
> +#include "vrndX.inc"
> diff --git 

Re: [Patch ARM/AArch64 08/11] Add missing vstX_lane fp16 tests.

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:23:58PM +0200, Christophe Lyon wrote:
> 2016-05-02  Christophe Lyon  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c: Add fp16 tests.

OK.

Thanks,
James




[PATCH][MIPS] Correct latency of loads in M5100

2016-05-13 Thread Robert Suchanek
Hi,

A small patch to correct the latency for M5100.

Ok to commit?

Regards,
Robert

2016-05-13  Matthew Fortune  

* config/mips/m5100.md (m51_int_load): Update the latency to 2.

---
 gcc/config/mips/m5100.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/mips/m5100.md b/gcc/config/mips/m5100.md
index f69fc7f..8d87b70 100644
--- a/gcc/config/mips/m5100.md
+++ b/gcc/config/mips/m5100.md
@@ -65,7 +65,7 @@ (define_insn_reservation "m51_int_jump" 1
 
 ;; loads: lb, lbu, lh, lhu, ll, lw, lwl, lwr, lwpc, lwxs
 ;; prefetch: prefetch, prefetchx
-(define_insn_reservation "m51_int_load" 3
+(define_insn_reservation "m51_int_load" 2
   (and (eq_attr "cpu" "m5100")
(eq_attr "type" "load,prefetch,prefetchx"))
   "m51_alu")
-- 
2.8.2.396.g5fe494c


[PATCH][MIPS] Enable LSA/DLSA for MSA

2016-05-13 Thread Robert Suchanek
Hi,

The below enables LSA/DLSA instructions for -mmsa.

Ok to commit?

Regards,
Robert

* config/mips/mips.h (ISA_HAS_LSA): Enable for -mmsa.
(ISA_HAS_DLSA): Ditto.
---
 gcc/config/mips/mips.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 1efa61a..e8897d1 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -209,10 +209,12 @@ struct mips_cpu_info {
 #endif
 
 /* ISA has LSA available.  */
-#define ISA_HAS_LSA   (mips_isa_rev >= 6)
+#define ISA_HAS_LSA   (mips_isa_rev >= 6 || ISA_HAS_MSA)
 
 /* ISA has DLSA available.  */
-#define ISA_HAS_DLSA   (TARGET_64BIT && mips_isa_rev >= 6)
+#define ISA_HAS_DLSA   (TARGET_64BIT \
+			&& (mips_isa_rev >= 6 \
+			    || ISA_HAS_MSA))
 
 /* The ISA compression flags that are currently in effect.  */
 #define TARGET_COMPRESSION (target_flags & (MASK_MIPS16 | MASK_MICROMIPS))
-- 
2.8.2.396.g5fe494c


Re: [Patch ARM/AArch64 06/11] Add missing vtst_p8 and vtstq_p8 tests.

2016-05-13 Thread James Greenhalgh
On Fri, May 13, 2016 at 04:41:33PM +0200, Christophe Lyon wrote:
> On 13 May 2016 at 16:37, James Greenhalgh  wrote:
> > On Wed, May 11, 2016 at 03:23:56PM +0200, Christophe Lyon wrote:
> >> 2016-05-02  Christophe Lyon  
> >>
> >>   * gcc.target/aarch64/advsimd-intrinsics/vtst.c: Add tests
> >>   for vtst_p8 and vtstq_p8.
> >
> > And vtst_p16 and vtstq_p16 too please.
> >
> > vtst_s64
> > vtstq_s64
> > vtst_u64
> > vtstq_u64 are also missing (AArch64 only).
> >
> vtst_p16/vtstq_p16 are AArch64 only too, right?

Not in my copy of:

  
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

I see it is missing from config/arm/arm_neon.h, so that's a bug in the GCC
implementation. It should be easy to resolve: map it to the same place
as vtst_u16 and vtst_s16 - this is just a bit operation which takes no
semantics from the data-type.

Would you mind spinning the fix for that and committing it before this
patch?

> My introduction message was not clear enough: this series
> only attempts to fully cover AArch32 intrinsics.

Understood, sorry for the extra noise.

Thanks,
James




[PATCH] libstdc++/71073 add system_header pragma to Debug Mode headers

2016-05-13 Thread Jonathan Wakely

Self-explanatory. Tested x86_64-linux, committed to trunk.

PR libstdc++/71073
* include/debug/bitset: Add #pragma GCC system_header.
* include/debug/deque: Likewise.
* include/debug/list: Likewise.
* include/debug/map: Likewise.
* include/debug/set: Likewise.
* include/debug/string: Likewise.
* include/debug/unordered_map: Likewise.
* include/debug/unordered_set: Likewise.
* include/debug/vector: Likewise.
* include/debug/functions.h: Adjust whitespace.
commit ba57b566ad3e556edb1d9519bdbc65cbf2f97ba7
Author: Jonathan Wakely 
Date:   Fri May 13 10:28:10 2016 +0100

libstdc++/71073 add system_header pragma to Debug Mode headers

PR libstdc++/71073
* include/debug/bitset: Add #pragma GCC system_header.
* include/debug/deque: Likewise.
* include/debug/list: Likewise.
* include/debug/map: Likewise.
* include/debug/set: Likewise.
* include/debug/string: Likewise.
* include/debug/unordered_map: Likewise.
* include/debug/unordered_set: Likewise.
* include/debug/vector: Likewise.
* include/debug/functions.h: Adjust whitespace.

diff --git a/libstdc++-v3/include/debug/bitset 
b/libstdc++-v3/include/debug/bitset
index 706a7b7..1353aa3 100644
--- a/libstdc++-v3/include/debug/bitset
+++ b/libstdc++-v3/include/debug/bitset
@@ -29,6 +29,8 @@
 #ifndef _GLIBCXX_DEBUG_BITSET
 #define _GLIBCXX_DEBUG_BITSET
 
+#pragma GCC system_header
+
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/include/debug/deque b/libstdc++-v3/include/debug/deque
index 72b6536..f15faad 100644
--- a/libstdc++-v3/include/debug/deque
+++ b/libstdc++-v3/include/debug/deque
@@ -29,6 +29,8 @@
 #ifndef _GLIBCXX_DEBUG_DEQUE
 #define _GLIBCXX_DEBUG_DEQUE 1
 
+#pragma GCC system_header
+
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/include/debug/functions.h 
b/libstdc++-v3/include/debug/functions.h
index 547ec5c..35e7ae8 100644
--- a/libstdc++-v3/include/debug/functions.h
+++ b/libstdc++-v3/include/debug/functions.h
@@ -29,11 +29,10 @@
 #ifndef _GLIBCXX_DEBUG_FUNCTIONS_H
 #define _GLIBCXX_DEBUG_FUNCTIONS_H 1
 
-#include  // for __addressof
-#include  // for less
+#include  // for __addressof
+#include  // for less
 #if __cplusplus >= 201103L
-# include // for 
is_lvalue_reference and
-   // conditional.
+# include // for is_lvalue_reference and 
conditional.
 #endif
 
 #include 
diff --git a/libstdc++-v3/include/debug/list b/libstdc++-v3/include/debug/list
index f1bfe35..09df483 100644
--- a/libstdc++-v3/include/debug/list
+++ b/libstdc++-v3/include/debug/list
@@ -29,6 +29,8 @@
 #ifndef _GLIBCXX_DEBUG_LIST
 #define _GLIBCXX_DEBUG_LIST 1
 
+#pragma GCC system_header
+
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/include/debug/map b/libstdc++-v3/include/debug/map
index 3fa961d..2cce7c0 100644
--- a/libstdc++-v3/include/debug/map
+++ b/libstdc++-v3/include/debug/map
@@ -29,6 +29,8 @@
 #ifndef _GLIBCXX_DEBUG_MAP
 #define _GLIBCXX_DEBUG_MAP 1
 
+#pragma GCC system_header
+
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/include/debug/set b/libstdc++-v3/include/debug/set
index bfe1d36d..82e3900 100644
--- a/libstdc++-v3/include/debug/set
+++ b/libstdc++-v3/include/debug/set
@@ -29,6 +29,8 @@
 #ifndef _GLIBCXX_DEBUG_SET
 #define _GLIBCXX_DEBUG_SET 1
 
+#pragma GCC system_header
+
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/include/debug/string 
b/libstdc++-v3/include/debug/string
index 7edc665..137974d 100644
--- a/libstdc++-v3/include/debug/string
+++ b/libstdc++-v3/include/debug/string
@@ -29,6 +29,8 @@
 #ifndef _GLIBCXX_DEBUG_STRING
 #define _GLIBCXX_DEBUG_STRING 1
 
+#pragma GCC system_header
+
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/include/debug/unordered_map 
b/libstdc++-v3/include/debug/unordered_map
index cf6c8d4..873f36a 100644
--- a/libstdc++-v3/include/debug/unordered_map
+++ b/libstdc++-v3/include/debug/unordered_map
@@ -29,6 +29,8 @@
 #ifndef _GLIBCXX_DEBUG_UNORDERED_MAP
 #define _GLIBCXX_DEBUG_UNORDERED_MAP 1
 
+#pragma GCC system_header
+
 #if __cplusplus < 201103L
 # include 
 #else
diff --git a/libstdc++-v3/include/debug/unordered_set 
b/libstdc++-v3/include/debug/unordered_set
index 203900a..6a4dba6 100644
--- a/libstdc++-v3/include/debug/unordered_set
+++ b/libstdc++-v3/include/debug/unordered_set
@@ -29,6 +29,8 @@
 #ifndef _GLIBCXX_DEBUG_UNORDERED_SET
 #define _GLIBCXX_DEBUG_UNORDERED_SET 1
 
+#pragma GCC system_header
+
 #if __cplusplus < 201103L
 # include 
 #else
diff --git a/libstdc++-v3/include/debug/vector 
b/libstdc++-v3/include/debug/vector
index d2cd74f..9bcda73 100644
--- a/libstdc++-v3/include/debug/vector
+++ b/libstdc++-v3/include/debug/vector
@@ -29,6 +29,8 @@
 #ifndef _GLIBCXX_DEBUG_VECTOR
 #define 

Re: [PATCH 3/3] Enhance dumps of IVOPTS

2016-05-13 Thread H.J. Lu
On Fri, May 13, 2016 at 7:17 AM, H.J. Lu  wrote:
> On Fri, May 13, 2016 at 5:51 AM, Martin Liška  wrote:
>> On 05/13/2016 02:46 PM, Richard Biener wrote:
>>> Use them for HOST_WIDE_INT printing, for [u]int64_t use the PRI stuff.
>>>
>>> Richard.
>>
>> Thank you both, installed as r236208.
>>
>
> It isn't fixed:
>
> /export/gnu/import/git/sources/gcc/gcc/tree-ssa-loop-ivopts.c:7052:41:
> error: format ‘%llu’ expects argument of type ‘long long unsigned
> int’, but argument 3 has type ‘size_t {aka unsigned int}’
> [-Werror=format=]
>  set->used_inv_exprs->elements ());
>

I am going to check in this as an obvious fix.


-- 
H.J.
---
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 2b2115f..e8953a0 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -7048,8 +7048,8 @@ create_new_ivs (struct ivopts_data *data, struct
iv_ca *set)
  LOCATION_LINE (data->loop_loc));
   fprintf (dump_file, ", " HOST_WIDE_INT_PRINT_DEC " avg niters",
avg_loop_niter (data->current_loop));
-  fprintf (dump_file, ", %" PRIu64 " expressions",
-   set->used_inv_exprs->elements ());
+  fprintf (dump_file, ", " HOST_WIDE_INT_PRINT_UNSIGNED " expressions",
+   (unsigned HOST_WIDE_INT) set->used_inv_exprs->elements ());
   fprintf (dump_file, ", %lu IVs:\n", bitmap_count_bits (set->cands));
   EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
  {


Re: [Patch ARM/AArch64 06/11] Add missing vtst_p8 and vtstq_p8 tests.

2016-05-13 Thread Christophe Lyon
On 13 May 2016 at 16:37, James Greenhalgh  wrote:
> On Wed, May 11, 2016 at 03:23:56PM +0200, Christophe Lyon wrote:
>> 2016-05-02  Christophe Lyon  
>>
>>   * gcc.target/aarch64/advsimd-intrinsics/vtst.c: Add tests
>>   for vtst_p8 and vtstq_p8.
>
> And vtst_p16 and vtstq_p16 too please.
>
> vtst_s64
> vtstq_s64
> vtst_u64
> vtstq_u64 are also missing (AArch64 only).
>
vtst_p16/vtstq_p16 are AArch64 only too, right?

My introduction message was not clear enough: this series
only attempts to fully cover AArch32 intrinsics.

There are many more missing for AArch64.


> Thanks,
> James
>
>>
>> Change-Id: Id555a9b3214945506a106e2465b42d38bf76a3a7
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c 
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
>> index 9e74ffb..4c7ee79 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
>> @@ -32,6 +32,14 @@ VECT_VAR_DECL(expected_unsigned,uint,16,8) [] = { 0x0, 
>> 0x,
>>  VECT_VAR_DECL(expected_unsigned,uint,32,4) [] = { 0x0, 0x,
>> 0x0, 0x };
>>
>> +/* Expected results with poly input.  */
>> +VECT_VAR_DECL(expected_poly,uint,8,8) [] = { 0x0, 0xff, 0xff, 0xff,
>> +  0xff, 0xff, 0xff, 0xff };
>> +VECT_VAR_DECL(expected_poly,uint,8,16) [] = { 0x0, 0xff, 0xff, 0xff,
>> +   0xff, 0xff, 0xff, 0xff,
>> +   0xff, 0xff, 0xff, 0xff,
>> +   0xff, 0xff, 0xff, 0xff };
>> +
>>  #define INSN_NAME vtst
>>  #define TEST_MSG "VTST/VTSTQ"
>>
>> @@ -71,12 +79,14 @@ FNNAME (INSN_NAME)
>>VDUP(vector2, , uint, u, 8, 8, 15);
>>VDUP(vector2, , uint, u, 16, 4, 5);
>>VDUP(vector2, , uint, u, 32, 2, 1);
>> +  VDUP(vector2, , poly, p, 8, 8, 15);
>>VDUP(vector2, q, int, s, 8, 16, 15);
>>VDUP(vector2, q, int, s, 16, 8, 5);
>>VDUP(vector2, q, int, s, 32, 4, 1);
>>VDUP(vector2, q, uint, u, 8, 16, 15);
>>VDUP(vector2, q, uint, u, 16, 8, 5);
>>VDUP(vector2, q, uint, u, 32, 4, 1);
>> +  VDUP(vector2, q, poly, p, 8, 16, 15);
>>
>>  #define TEST_MACRO_NO64BIT_VARIANT_1_5(MACRO, VAR, T1, T2)   \
>>MACRO(VAR, , T1, T2, 8, 8);\
>> @@ -109,6 +119,14 @@ FNNAME (INSN_NAME)
>>CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_unsigned, CMT);
>>CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_unsigned, CMT);
>>CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_unsigned, CMT);
>> +
>> +  /* Now, test the variants with poly8 as input.  */
>> +#undef CMT
>> +#define CMT " (poly input)"
>> +  TEST_BINARY_OP(INSN_NAME, , poly, p, 8, 8);
>> +  TEST_BINARY_OP(INSN_NAME, q, poly, p, 8, 16);
>> +  CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_poly, CMT);
>> +  CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_poly, CMT);
>>  }
>>
>>  int main (void)
>> --
>> 1.9.1
>>
>


Re: [Patch ARM/AArch64 07/11] Add missing vget_lane fp16 tests.

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:23:57PM +0200, Christophe Lyon wrote:
> 2016-05-02  Christophe Lyon  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/vget_lane.c: Add fp16 tests.

OK.

Thanks,
James



Re: [Patch ARM/AArch64 06/11] Add missing vtst_p8 and vtstq_p8 tests.

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:23:56PM +0200, Christophe Lyon wrote:
> 2016-05-02  Christophe Lyon  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/vtst.c: Add tests
>   for vtst_p8 and vtstq_p8.

And vtst_p16 and vtstq_p16 too please.

vtst_s64
vtstq_s64
vtst_u64
vtstq_u64 are also missing (AArch64 only).

Thanks,
James

> 
> Change-Id: Id555a9b3214945506a106e2465b42d38bf76a3a7
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
> index 9e74ffb..4c7ee79 100644
> --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
> @@ -32,6 +32,14 @@ VECT_VAR_DECL(expected_unsigned,uint,16,8) [] = { 0x0, 
> 0x,
>  VECT_VAR_DECL(expected_unsigned,uint,32,4) [] = { 0x0, 0x,
> 0x0, 0x };
>  
> +/* Expected results with poly input.  */
> +VECT_VAR_DECL(expected_poly,uint,8,8) [] = { 0x0, 0xff, 0xff, 0xff,
> +  0xff, 0xff, 0xff, 0xff };
> +VECT_VAR_DECL(expected_poly,uint,8,16) [] = { 0x0, 0xff, 0xff, 0xff,
> +   0xff, 0xff, 0xff, 0xff,
> +   0xff, 0xff, 0xff, 0xff,
> +   0xff, 0xff, 0xff, 0xff };
> +
>  #define INSN_NAME vtst
>  #define TEST_MSG "VTST/VTSTQ"
>  
> @@ -71,12 +79,14 @@ FNNAME (INSN_NAME)
>VDUP(vector2, , uint, u, 8, 8, 15);
>VDUP(vector2, , uint, u, 16, 4, 5);
>VDUP(vector2, , uint, u, 32, 2, 1);
> +  VDUP(vector2, , poly, p, 8, 8, 15);
>VDUP(vector2, q, int, s, 8, 16, 15);
>VDUP(vector2, q, int, s, 16, 8, 5);
>VDUP(vector2, q, int, s, 32, 4, 1);
>VDUP(vector2, q, uint, u, 8, 16, 15);
>VDUP(vector2, q, uint, u, 16, 8, 5);
>VDUP(vector2, q, uint, u, 32, 4, 1);
> +  VDUP(vector2, q, poly, p, 8, 16, 15);
>  
>  #define TEST_MACRO_NO64BIT_VARIANT_1_5(MACRO, VAR, T1, T2)   \
>MACRO(VAR, , T1, T2, 8, 8);\
> @@ -109,6 +119,14 @@ FNNAME (INSN_NAME)
>CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_unsigned, CMT);
>CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_unsigned, CMT);
>CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_unsigned, CMT);
> +
> +  /* Now, test the variants with poly8 as input.  */
> +#undef CMT
> +#define CMT " (poly input)"
> +  TEST_BINARY_OP(INSN_NAME, , poly, p, 8, 8);
> +  TEST_BINARY_OP(INSN_NAME, q, poly, p, 8, 16);
> +  CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_poly, CMT);
> +  CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_poly, CMT);
>  }
>  
>  int main (void)
> -- 
> 1.9.1
> 



Re: [Patch ARM/AArch64 04/11] Add forgotten vsliq_n_u64 test.

2016-05-13 Thread Christophe Lyon
On 13 May 2016 at 16:08, James Greenhalgh  wrote:
> On Wed, May 11, 2016 at 03:23:54PM +0200, Christophe Lyon wrote:
>> 2016-05-02  Christophe Lyon  
>>
>>   * gcc.target/aarch64/advsimd-intrinsics/vsli_n.c: Add check for 
>> vsliq_n_u64.
>>
>
> And vsliq_n_s64 ?
>
Damn! You are right, I missed that one.

> OK with that change.
OK thanks

> Thanks,
> James
>
>> Change-Id: I90bb2b225ffd7bfd54a0827a0264ac20271f54f2
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c 
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
>> index 0285083..e5f78d0 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
>> @@ -169,6 +169,7 @@ void vsli_extra(void)
>>CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_max_shift, COMMENT);
>>CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_max_shift, COMMENT);
>>CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_max_shift, COMMENT);
>> +  CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected_max_shift, COMMENT);
>>CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_max_shift, COMMENT);
>>CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_max_shift, COMMENT);
>>  }
>> --
>> 1.9.1
>>
>


Re: [PATCH 3/3] Enhance dumps of IVOPTS

2016-05-13 Thread H.J. Lu
On Fri, May 13, 2016 at 5:51 AM, Martin Liška  wrote:
> On 05/13/2016 02:46 PM, Richard Biener wrote:
>> Use them for HOST_WIDE_INT printing, for [u]int64_t use the PRI stuff.
>>
>> Richard.
>
> Thank you both, installed as r236208.
>

It isn't fixed:

/export/gnu/import/git/sources/gcc/gcc/tree-ssa-loop-ivopts.c:7052:41:
error: format ‘%llu’ expects argument of type ‘long long unsigned
int’, but argument 3 has type ‘size_t {aka unsigned int}’
[-Werror=format=]
 set->used_inv_exprs->elements ());


-- 
H.J.


Re: [Patch ARM/AArch64 05/11] Add missing vreinterpretq_p{8,16} tests.

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:23:55PM +0200, Christophe Lyon wrote:
> 2016-05-02  Christophe Lyon  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c: Add
>   missing tests for vreinterpretq_p{8,16}.

OK.

Thanks,
James



Re: [Patch ARM/AArch64 03/11] AdvSIMD tests: be more verbose.

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:23:53PM +0200, Christophe Lyon wrote:
> It is useful to have more detailed information in the logs when checking
> validation results: instead of repeating the intrinsic name, we now print
> its return type too.
> 
> 2016-05-02  Christophe Lyon  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (CHECK,
>   CHECK_FP, CHECK_CUMULATIVE_SAT): Print which type was checked.

OK.

Thanks,
James
 



Re: Thoughts on memcmp expansion (PR43052)

2016-05-13 Thread Joseph Myers
On Fri, 13 May 2016, Bernd Schmidt wrote:

> On 05/13/2016 03:07 PM, Richard Biener wrote:
> > On Fri, May 13, 2016 at 3:05 PM, Bernd Schmidt  wrote:
> > > Huh? Can you elaborate?
> > 
> > When you have a builtin taking a size in bytes then a byte is 8 bits,
> > not BITS_PER_UNIT bits.
> 
> That makes no sense to me. I think the definition of a byte depends on the
> machine (hence the term "octet" was coined to be unambiguous). Also, such a
> definition would seem to imply that machines with 10-bit bytes cannot
> implement memcpy or memcmp.
> 
> Joseph, can you clarify the standard's meaning here?

* In C: a byte is the minimal addressable unit; an N-byte object is made 
up of N "unsigned char" objects, with successive addresses in terms of 
incrementing an "unsigned char *" pointer.  A byte is at least 8 bits.

* In GCC, at the level of GNU C APIs on the target, which generally 
includes built-in functions: a byte (on the target) is made of 
CHAR_TYPE_SIZE bits.  In theory this could be more than BITS_PER_UNIT, or 
that could be more than 8, though support for either of those cases would 
be very bit-rotten (and I'm not sure there ever have been targets with 
CHAR_TYPE_SIZE > BITS_PER_UNIT).  Sizes passed to memcpy and memcmp are 
sizes in units of CHAR_TYPE_SIZE bits.

* In GCC, at the RTL level: a byte (on the target) is a QImode object, 
which is made of BITS_PER_UNIT bits.  (HImode is always two bytes, SImode 
four, etc., if those modes exist.)  Support for BITS_PER_UNIT being more 
than 8 is very bit-rotten.

* In GCC, on the host: GCC only supports hosts (and $build) where bytes 
are 8-bit (though writing it as CHAR_BIT makes it clear that this 8 means 
the number of bits in a host byte).

Internal interfaces e.g. representing the contents of strings or other 
memory on the target may not currently be well-defined except when 
BITS_PER_UNIT is 8.  Cf. e.g. 
.  But the above should 
at least give guidance as to whether BITS_PER_UNIT, CHAR_TYPE_SIZE (or 
TYPE_PRECISION (char_type_node), preferred where possible to minimize 
usage of target macros) or CHAR_BIT is logically right in a particular 
place.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Patch ARM/AArch64 04/11] Add forgotten vsliq_n_u64 test.

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:23:54PM +0200, Christophe Lyon wrote:
> 2016-05-02  Christophe Lyon  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/vsli_n.c: Add check for 
> vsliq_n_u64.
> 

And vsliq_n_s64 ?

OK with that change.

Thanks,
James

> Change-Id: I90bb2b225ffd7bfd54a0827a0264ac20271f54f2
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
> index 0285083..e5f78d0 100644
> --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
> @@ -169,6 +169,7 @@ void vsli_extra(void)
>CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_max_shift, COMMENT);
>CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_max_shift, COMMENT);
>CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_max_shift, COMMENT);
> +  CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected_max_shift, COMMENT);
>CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_max_shift, COMMENT);
>CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_max_shift, COMMENT);
>  }
> -- 
> 1.9.1
> 



[Patch, ARM] PR71061, length pop* pattern in epilogue correctly

2016-05-13 Thread Jiong Wang

For thumb mode, this causes a wrong size calculation and may affect
some RTL passes, for example bb-reorder, where copy_bb_p needs accurate
insn length info.

This is eventually part of the reason for
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00639.html, where bb-reorder
failed to do the bb copy.

For the fix, I think we should extend arm_attr_length_push_multi to the
pop* patterns.

OK for trunk?

2016-05-13  Jiong Wang 

gcc/
  PR target/71061
  * config/arm/arm-protos.h (arm_attr_length_push_multi): Rename to
  "arm_attr_length_pp_multi".  Add one parameter "first_index".
  * config/arm/arm.md (*push_multi): Use new function.
  (*load_multiple_with_writeback): Set "length" attribute.
  (*pop_multiple_with_writeback_and_return): Likewise.
  (*pop_multiple_with_return): Likewise.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d8179c4..d9a09c0 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -162,7 +162,7 @@ extern const char *arm_output_shift(rtx *, int);
 extern const char *arm_output_iwmmxt_shift_immediate (const char *, rtx *, bool);
 extern const char *arm_output_iwmmxt_tinsr (rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
-extern int arm_attr_length_push_multi(rtx, rtx);
+extern int arm_attr_length_pp_multi(rtx, rtx, int);
 extern void arm_expand_compare_and_swap (rtx op[]);
 extern void arm_split_compare_and_swap (rtx op[]);
 extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 71b5143..0ba98e1 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27729,14 +27729,15 @@ arm_preferred_rename_class (reg_class_t rclass)
 return NO_REGS;
 }
 
-/* Compute the atrribute "length" of insn "*push_multi".
-   So this function MUST be kept in sync with that insn pattern.  */
+/* Compute the attribute "length" of an insn which performs a push/pop on
+   multiple registers.  So this function MUST be kept in sync with that insn
+   pattern.  PARALLEL_OP is the toplevel PARALLEL rtx, FIRST_OP is the first
+   push/pop register.  FIRST_INDEX is the element index inside PARALLEL_OP for
+   the first register push/pop rtx.  */
+
 int
-arm_attr_length_push_multi(rtx parallel_op, rtx first_op)
+arm_attr_length_pp_multi(rtx parallel_op, rtx first_op, int first_index)
 {
-  int i, regno, hi_reg;
-  int num_saves = XVECLEN (parallel_op, 0);
-
   /* ARM mode.  */
   if (TARGET_ARM)
 return 4;
@@ -27744,18 +27745,31 @@ arm_attr_length_push_multi(rtx parallel_op, rtx first_op)
   if (TARGET_THUMB1)
 return 2;
 
-  /* Thumb2 mode.  */
-  regno = REGNO (first_op);
-  hi_reg = (REGNO_REG_CLASS (regno) == HI_REGS) && (regno != LR_REGNUM);
-  for (i = 1; i < num_saves && !hi_reg; i++)
+  /* Thumb2 mode.
+ For the pattern "*push_multi", the register for the first push is kept in
+ the first UNSPEC rtx inside parallel, all other registers are kept in the
+ later USE rtxes.  For the pop* patterns, each register pop is cleanly
+ represented by a (set (reg) (mem)).
+
+ So we can't always use REGNO (XEXP (input, 0)) to fetch the first register,
+ thus it's passed as argument.  Then we iterate the register list from the
+ last to the first, as the high register is usually at the end so we can
+ return earlier.  */
+
+  unsigned int regno = REGNO (first_op);
+  if ((REGNO_REG_CLASS (regno) == HI_REGS) && (regno != LR_REGNUM))
+return 4;
+
+  int i = XVECLEN (parallel_op, 0) - 1;
+  gcc_assert (first_index >= 0 && first_index <= i);
+  for (; i > first_index; i--)
 {
   regno = REGNO (XEXP (XVECEXP (parallel_op, 0, i), 0));
-  hi_reg |= (REGNO_REG_CLASS (regno) == HI_REGS) && (regno != LR_REGNUM);
+  if (REGNO_REG_CLASS (regno) == HI_REGS && (regno != LR_REGNUM))
+	return 4;
 }
 
-  if (!hi_reg)
-return 2;
-  return 4;
+  return 2;
 }
 
 /* Compute the number of instructions emitted by output_move_double.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7cf87ef..1e175f6 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -10488,7 +10488,7 @@
 ;; expressions.  For simplicity, the first register is also in the unspec
 ;; part.
 ;; To avoid the usage of GNU extension, the length attribute is computed
-;; in a C function arm_attr_length_push_multi.
+;; in a C function arm_attr_length_pp_multi.
 (define_insn "*push_multi"
   [(match_parallel 2 "multi_register_push"
 [(set (match_operand:BLK 0 "push_mult_memory_operand" "")
@@ -10530,7 +10530,7 @@
   }"
   [(set_attr "type" "store4")
(set (attr "length")
-	(symbol_ref "arm_attr_length_push_multi (operands[2], operands[1])"))]
+	(symbol_ref "arm_attr_length_pp_multi (operands[2], operands[1], 0)"))]
 )
 
 (define_insn "stack_tie"
@@ -10565,7 +10565,9 @@
   }
   "
   [(set_attr "type" "load4")
-   (set_attr "predicable" "yes")]
+   (set_attr "predicable" "yes")
+   (set (attr "length")
+	

Re: [testuite,AArch64] Make scan for 'br' more robust

2016-05-13 Thread James Greenhalgh
On Wed, May 04, 2016 at 11:55:42AM +0200, Christophe Lyon wrote:
> On 4 May 2016 at 10:43, Kyrill Tkachov  wrote:
> >
> > Hi Christophe,
> >
> >
> > On 02/05/16 12:50, Christophe Lyon wrote:
> >>
> >> Hi,
> >>
> >> I've noticed a "regression" of AArch64's noplt_3.c in the gcc-6-branch
> >> because my validation script adds the branch name to gcc/REVISION.
> >>
> >> As a result scan-assembler-times "br" also matched "gcc-6-branch",
> >> hence the failure.
> >>
> >> The small attached patch replaces "br" by "br\t" to fix the problem.
> >>
> >> I've also made a similar change to tail_indirect_call_1 although the
> >> problem did not happen for this test because it uses scan-assembler
> >> instead of scan-assembler-times. I think it's better to make it more
> >> robust too.
> >>
> >> OK?
> >>
> >> Christophe
> >
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/noplt_3.c
> > b/gcc/testsuite/gcc.target/aarch64/noplt_3.c
> > index ef6e65d..a382618 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/noplt_3.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/noplt_3.c
> > @@ -16,5 +16,5 @@ cal_novalue (int a)
> >dec (a);
> >  }
> >  -/* { dg-final { scan-assembler-times "br" 2 } } */
> > +/* { dg-final { scan-assembler-times "br\t" 2 } } */
> >  /* { dg-final { scan-assembler-not "b\t" } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
> > b/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
> > index 4759d20..e863323 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
> > @@ -3,7 +3,7 @@
> >   typedef void FP (int);
> >  -/* { dg-final { scan-assembler "br" } } */
> > +/* { dg-final { scan-assembler "br\t" } } */
> >
> > Did you mean to make this scan-assembler-times as well?
> >
> 
> I kept the changes minimal, but you are right, it would be more robust
> as attached.
> 
> OK for trunk and gcc-6 branch?

OK.

If you want completeness on this, the
gcc.target/aarch64/tail_indirect_call_1.c change should go back to the
gcc-5 branch too.

Cheers,
James



Re: Thoughts on memcmp expansion (PR43052)

2016-05-13 Thread Bernd Schmidt

On 05/13/2016 03:53 PM, Joseph Myers wrote:

> * In C: a byte is the minimal addressable unit; an N-byte object is made
> up of N "unsigned char" objects, with successive addresses in terms of
> incrementing an "unsigned char *" pointer.  A byte is at least 8 bits.
>
> * In GCC, at the level of GNU C APIs on the target, which generally
> includes built-in functions: a byte (on the target) is made of
> CHAR_TYPE_SIZE bits.  In theory this could be more than BITS_PER_UNIT, or
> that could be more than 8, though support for either of those cases would
> be very bit-rotten (and I'm not sure there ever have been targets with
> CHAR_TYPE_SIZE > BITS_PER_UNIT).  Sizes passed to memcpy and memcmp are
> sizes in units of CHAR_TYPE_SIZE bits.
>
> * In GCC, at the RTL level: a byte (on the target) is a QImode object,
> which is made of BITS_PER_UNIT bits.  (HImode is always two bytes, SImode
> four, etc., if those modes exist.)  Support for BITS_PER_UNIT being more
> than 8 is very bit-rotten.
>
> * In GCC, on the host: GCC only supports hosts (and $build) where bytes
> are 8-bit (though writing it as CHAR_BIT makes it clear that this 8 means
> the number of bits in a host byte).
>
> Internal interfaces e.g. representing the contents of strings or other
> memory on the target may not currently be well-defined except when
> BITS_PER_UNIT is 8.  Cf. e.g.
> .  But the above should
> at least give guidance as to whether BITS_PER_UNIT, CHAR_TYPE_SIZE (or
> TYPE_PRECISION (char_type_node), preferred where possible to minimize
> usage of target macros) or CHAR_BIT is logically right in a particular
> place.


Thanks. So, this would seem to suggest that BITS_PER_UNIT in 
memcmp/memcpy expansion is more accurate than a plain 8, although 
pedantically we might want to use CHAR_TYPE_SIZE? Should I adjust my 
patch to use the latter or leave these parts as-is?



Bernd



Re: [Patch ARM/AArch64 02/11] We can remove useless #ifdefs from these tests: vmul, vshl and vtst.

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:23:52PM +0200, Christophe Lyon wrote:
> 2016-05-02  Christophe Lyon  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/vmul.c: Remove useless #ifdef.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl.c: Likewise.
>   * gcc.target/aarch64/advsimd-intrinsics/vtst.c: Likewise.

OK.

Thanks,
James



[Patch, testsuite] PR70227, skip g++.dg/lto/pr69589_0.C on targets without -rdynamic support

2016-05-13 Thread Jiong Wang

This patch skips g++.dg/lto/pr69589_0.C on typical arm & aarch64
bare-metal targets as they don't support "-rdynamic".

spu-unknown-elf is known to be failing also, but I'd leave it to
those who are more familiar with all relevant spu targets.

dg-skip-if is used instead of dg-xfail-if because the latter is not
supported inside lto.exp.

OK for trunk?

2016-05-13  Jiong Wang  

gcc/testsuite/
  PR testsuite/70227
  * g++.dg/lto/pr69589_0.C: Skip arm and aarch64 bare-metal targets.
diff --git a/gcc/testsuite/g++.dg/lto/pr69589_0.C b/gcc/testsuite/g++.dg/lto/pr69589_0.C
index bbdcb73..1457d2e 100644
--- a/gcc/testsuite/g++.dg/lto/pr69589_0.C
+++ b/gcc/testsuite/g++.dg/lto/pr69589_0.C
@@ -1,6 +1,8 @@
 // { dg-lto-do link }
-// { dg-lto-options "-O2 -rdynamic" } 
+// { dg-lto-options "-O2 -rdynamic" }
 // { dg-extra-ld-options "-r -nostdlib" }
+// { dg-skip-if "Skip targets without -rdynamic support" { arm*-none-eabi aarch64*-*-elf } { "*" } { "" } }
+
 #pragma GCC visibility push(hidden)
 struct A { int [] (long); };
 template  struct B;


Re: [Patch ARM/AArch64 01/11] Fix typo in vreinterpret.c test comment.

2016-05-13 Thread James Greenhalgh
On Wed, May 11, 2016 at 03:23:51PM +0200, Christophe Lyon wrote:
> 2016-05-02  Christophe Lyon  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c: Fix typo in 
> comment.

This one would have been OK to commit as obvious.

OK for trunk.

Thanks,
James



[patch, avr] Fix unrecognizable insn ICE for avr (PR71103)

2016-05-13 Thread Pitchumani Sivanupandi
avr-gcc crashes for the following test as it couldn't recognize the
instruction pattern.

struct st {
  unsigned char uc1;
  unsigned int *ui1;
};

unsigned int ui1;
struct st foo () {
  struct st ret;
  ret.uc1 = 6;
  ret.ui1 = 
  return ret;
}

$ avr-gcc -mmcu=atmega328p -O1 test.c
(-- snip --) 
test.c: In function 'foo':
test.c:12:1: error: unrecognizable insn:
 }
 ^
(insn 6 5 7 2 (set (subreg:QI (reg:PSI 42 [ D.1499 ]) 1)
(subreg:QI (symbol_ref:HI ("ui1") ) 0)) test.c:11 -1
 (nil))
test.c:12:1: internal compiler error: in extract_insn, at recog.c:2287
0xd51195 _fatal_insn(char const*, rtx_def const*, char const*, int,
char const*)
/home/rudran/code/gcc/gcc/rtl-error.c:108
(-- snip --) 

There is no valid pattern in avr to match "subreg:QI
(symbol_ref:HI)". The attached patch forces the symbol_ref of the subreg
operand into a register so that it becomes a register operand and the
movqi pattern can recognize it.

Ran gcc regression test with internal simulators. No new regressions
found.

If ok, could someone commit please?

Regards,
Pitchumani

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index c988446..927bc69 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -638,6 +638,13 @@
 rtx dest = operands[0];
 rtx src  = avr_eval_addr_attrib (operands[1]);
 
+if (SUBREG_P(src) && (GET_CODE(XEXP(src,0)) == SYMBOL_REF) &&
+can_create_pseudo_p())
+  {
+rtx symbol_ref = XEXP(src, 0);
+XEXP (src, 0) = copy_to_mode_reg (GET_MODE(symbol_ref), symbol_ref);
+  }
+
 if (avr_mem_flash_p (dest))
   DONE;
 
diff --git a/gcc/testsuite/gcc.target/avr/pr71103.c b/gcc/testsuite/gcc.target/avr/pr71103.c
new file mode 100644
index 000..43244d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr71103.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+struct ResponseStruct{
+unsigned char responseLength;
+char *response;
+};
+
+static char response[5];
+struct ResponseStruct something(){
+struct ResponseStruct returnValue;
+returnValue.responseLength = 5;
+returnValue.response = response;
+return returnValue;
+}
+


Re: [AArch64] Remove an unused reload hook.

2016-05-13 Thread James Greenhalgh
On Thu, Apr 28, 2016 at 03:11:59PM +0100, Matthew Wahab wrote:
> Hello,
> 
> Yvan Roux pointed out that the patch at
> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01713.html was never
> committed.
> 
> From the original submission:
> 
>   The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since
>   the Aarch64 backend no longer supports reload, this macro is not
>   needed and this patch removes it.
> 
> This is a rebased and retested version of that patch.
> 
> Tested aarch64-none-linux-gnu with native bootstrap and make check.
> 
> Ok for trunk?

Yes, OK for trunk.

Thanks,
James

> gcc/
> 2016-04-26  Matthew Wahab  
> 
> * config/aarch64/aarch64.h (LEGITIMIZE_RELOAD_ADDRESS): Remove.
> * config/aarch64/aarch64-protos.h
> (aarch64_legitimize_reload_address): Remove.
> * config/aarch64/aarch64.c (aarch64_legitimize_reload_address):
> Remove.




Re: [ARM] Enable __fp16 as a function parameter and return type.

2016-05-13 Thread Ramana Radhakrishnan
On Thu, Apr 28, 2016 at 10:20 AM, Matthew Wahab
 wrote:
> Hello,
>
> The ARM target supports the half-precision floating point type __fp16
> but does not allow its use as a function return or parameter type. This
> patch removes that restriction and defines the ACLE macro
> __ARM_FP16_ARGS to indicate this. The code generated for passing __fp16
> values into and out of functions depends on the level of hardware
> support but conforms to the AAPCS (see
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf).
>
> This patch enables data movement for HF-mode values using VFP registers,
> when they are available, to support passing arguments and return values
> through the registers.
>
> This patch also fixes the definition of TARGET_NEON_FP16 which used to
> require both neon and fp16 features to be enabled. This was
> inadvertently weakened, when the macro definition was changed to use
> ARM_FPU_FSET_HAS, to only require one of neon or fp16 to be
> enabled. This patch returns to the original
> requirements. TARGET_NEON_FP16 is only used in instruction selection for
> HF-mode data moves.
>
> Tested for arm-none-eabi with cross-compiled check-gcc and for
> arm-none-linux-gnueabihf with native bootstrap and make check.
>
> Ok for trunk?
> Matthew

This is OK - thanks.

We have to deal with Joseph's points around the issue with double
rounding but I think that's the subject of a separate patch.

regards
Ramana

>
> gcc/
> 2016-04-27  Matthew Wahab  
> Ramana Radhakrishnan  
> Jiong Wang  
>
> * config/arm/arm-c.c (arm_cpu_builtins): Use def_or_undef_macro
> for __ARM_FP16_FORMAT_IEEE and __ARM_FP16_FORMAT_ALTERNATIVE.
> Define __ARM_FP16_ARGS when appropriate.
> * config/arm/arm.c (arm_invalid_parameter_type): Remove
> declaration.
> (arm_invalid_return_type): Likewise.
> (TARGET_INVALID_PARAMETER_TYPE): Remove.
> (TARGET_INVALID_RETURN_TYPE): Remove.
> (aapcs_vfp_sub_candidate): Allow HFmode.
> (aapcs_vfp_allocate): Add comment.  Support HFmode.
> (aapcs_vfp_allocate_return_reg): Likewise.
> (struct aapcs_cp_arg_layout): Slightly reword comments for
> is_return_candidate and allocate_return_reg.
> (output_mov_vfp): Update assert.
> (arm_hard_regno_mode_ok): Remove comment, update HF-mode
> condition.
> (arm_invalid_parameter_type): Remove.
> (arm_invalid_return_type): Remove.
> * config/arm/arm.h (TARGET_NEON_FP16): Fix definition.
> * config/arm/arm.md (*arm32_movhf): Disable for TARGET_VFP.
> * config/arm/vfp.md (*movhf_vfp): Enable for TARGET_VFP.
>
> gcc/testsuite/
> 2016-04-27  Matthew Wahab  
>
> * g++.dg/ext/arm-fp16/fp16-param-1.c: Update expected output.  Add
> test for __ARM_FP16_ARGS.
> * g++.dg/ext/arm-fp16/fp16-return-1.c: Update expected output.
> * gcc.target/arm/aapcs/neon-vect10.c: New.
> * gcc.target/arm/aapcs/neon-vect9.c: New.
> * gcc.target/arm/aapcs/vfp18.c: New.
> * gcc.target/arm/aapcs/vfp19.c: New.
> * gcc.target/arm/aapcs/vfp20.c: New.
> * gcc.target/arm/aapcs/vfp21.c: New.
> * gcc.target/arm/fp16-aapcs-1.c: New.
> * g++.target/arm/fp16-param-1.c: Update expected output.  Add
> test for __ARM_FP16_ARGS.
> * g++.target/arm/fp16-return-1.c: Update expected output.


Re: [PATCH, GCC] PR middle-end/55299, fold bitnot through ASR and rotates

2016-05-13 Thread Marc Glisse

On Fri, 13 May 2016, Mikhail Maltsev wrote:


I don't know if we might want some :c / single_use restrictions, maybe on the
outer convert and the rshift/rotate.


I don't think :c can be used here.


Oops, typo for :s.


As for :s, I added it, as you suggested.


:s will be ignored when there is no conversion, but I think that's good 
enough for now.



Also, I tried to add some more test cases for rotate with conversions, but
unfortunately GCC does not recognize rotate pattern, when narrowing conversions
are present.


It is usually easier to split your expression into several assignments. 
Untested:


int f(long long a, unsigned long n){
  long long b = ~a;
  unsigned long c = b;
  unsigned long d = ROLL (c, n);
  int e = d;
  return ~e;
}

this way the rotate pattern is detected early (generic) with no extra 
operations to confuse the compiler, while your new transformation will 
happen in gimple (most likely the first forwprop pass).
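The identity the patch folds, that bitwise-not commutes with rotates and with arithmetic shifts right, is easy to spot-check numerically. A quick illustrative sketch (not GCC code, helper names are made up):

```python
# Check that ~rol(x, n) == rol(~x, n) on w-bit values, and that
# ~(x >> n) == (~x) >> n for arithmetic shifts (Python's >> on signed
# ints is arithmetic).
def rol(x, n, w=32):
    m = (1 << w) - 1
    x &= m
    n %= w
    return ((x << n) | (x >> (w - n))) & m

def not_commutes_with_rol(x, n, w=32):
    m = (1 << w) - 1
    return (~rol(x, n, w)) & m == rol((~x) & m, n, w)

print(all(not_commutes_with_rol(x, n)
          for x in (0, 1, 0x12345678, 0xFFFFFFFF) for n in range(32)))  # True
print(all((~(x >> n)) == ((~x) >> n)
          for x in (-7, 0, 5, 123456) for n in range(8)))               # True
```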


The patch looks good to me, now wait for Richard's comments.

--
Marc Glisse


Re: [PATCH][combine] PR middle-end/71074 Check that const_op is >= 0 before potentially shifting in simplify_comparison

2016-05-13 Thread Bernd Schmidt

On 05/13/2016 03:22 PM, Kyrill Tkachov wrote:

/* We only want to handle integral modes.  This catches VOIDmode,
   CCmode, and the floating-point modes.  An exception is that we
@@ -11649,7 +11649,8 @@ simplify_comparison (enum rtx_code code,
/* Try to simplify the compare to constant, possibly changing the
   comparison op, and/or changing op1 to zero.  */
code = simplify_compare_const (code, mode, op0, );
-  const_op = INTVAL (op1);
+  HOST_WIDE_INT const_op = INTVAL (op1);
+  unsigned HOST_WIDE_INT uconst_op = (unsigned HOST_WIDE_INT)
const_op;

Can this be just "unsigned HOST_WIDE_INT uconst_op = UINTVAL (op1);" ?


Either should work.


+  unsigned HOST_WIDE_INT low_mask
+= (((unsigned HOST_WIDE_INT) 1 << INTVAL (amount)) - 1);
unsigned HOST_WIDE_INT low_bits
-= (nonzero_bits (XEXP (op0, 0), mode)
-   & (((unsigned HOST_WIDE_INT) 1
-   << INTVAL (XEXP (op0, 1))) - 1));
+= (nonzero_bits (XEXP (op0, 0), mode) & low_mask);
if (low_bits == 0 || !equality_comparison_p)
  {

(unsigned HOST_WIDE_INT) 1 can be replaced with HOST_WIDE_INT_1U.


Ah, I suspected there was something like this, but none of the 
surrounding code was using it. Newly changed code should probably use 
that; we could probably improve things further by using it more 
consistently in this function, but let's do that in another patch.



Bernd


Re: [PATCH][combine] PR middle-end/71074 Check that const_op is >= 0 before potentially shifting in simplify_comparison

2016-05-13 Thread Kyrill Tkachov


On 13/05/16 14:01, Bernd Schmidt wrote:

On 05/13/2016 02:21 PM, Kyrill Tkachov wrote:

Hi all,

In this PR we may end up shifting a negative value left in
simplify_comparison.
The statement is:
const_op <<= INTVAL (XEXP (op0, 1));

This patch guards the block of that statement by const_op >= 0.
I _think_ that's a correct thing to do for that transformation:
"If we have (compare (xshiftrt FOO N) (const_int C)) and
  the low order N bits of FOO are known to be zero, we can do this
  by comparing FOO with C shifted left N bits so long as no
  overflow occurs."


Isn't the condition testing for overflow though? In a somewhat nonobvious way.

So I think you should just use an unsigned version of const_op. While we're 
here we might as well make the code here a little more readable. How about the 
patch below? Can you test whether this works for you?



Looks reasonable to me barring some comments below.
I'll test a version of the patch with the comments fixed.

Thanks,
Kyrill



Bernd


Index: combine.c
===
--- combine.c   (revision 236113)
+++ combine.c   (working copy)
@@ -11628,13 +11628,13 @@ simplify_comparison (enum rtx_code code,
 
   while (CONST_INT_P (op1))

 {
+  HOST_WIDE_INT amount;


This has to be an rtx rather than HOST_WIDE_INT from the way you use it later 
on.

 
   /* We only want to handle integral modes.  This catches VOIDmode,

 CCmode, and the floating-point modes.  An exception is that we
@@ -11649,7 +11649,8 @@ simplify_comparison (enum rtx_code code,
   /* Try to simplify the compare to constant, possibly changing the
 comparison op, and/or changing op1 to zero.  */
   code = simplify_compare_const (code, mode, op0, );
-  const_op = INTVAL (op1);
+  HOST_WIDE_INT const_op = INTVAL (op1);
+  unsigned HOST_WIDE_INT uconst_op = (unsigned HOST_WIDE_INT) const_op;
 
Can this be just "unsigned HOST_WIDE_INT uconst_op = UINTVAL (op1);" ?


...

+ unsigned HOST_WIDE_INT low_mask
+   = (((unsigned HOST_WIDE_INT) 1 << INTVAL (amount)) - 1);
  unsigned HOST_WIDE_INT low_bits
-   = (nonzero_bits (XEXP (op0, 0), mode)
-  & (((unsigned HOST_WIDE_INT) 1
-  << INTVAL (XEXP (op0, 1))) - 1));
+   = (nonzero_bits (XEXP (op0, 0), mode) & low_mask);
  if (low_bits == 0 || !equality_comparison_p)
{

(unsigned HOST_WIDE_INT) 1 can be replaced with HOST_WIDE_INT_1U.



Re: Thoughts on memcmp expansion (PR43052)

2016-05-13 Thread Bernd Schmidt

On 05/13/2016 03:07 PM, Richard Biener wrote:

On Fri, May 13, 2016 at 3:05 PM, Bernd Schmidt  wrote:

Huh? Can you elaborate?


When you have a builtin taking a size in bytes then a byte is 8 bits,
not BITS_PER_UNIT bits.


That makes no sense to me. I think the definition of a byte depends on 
the machine (hence the term "octet" was coined to be unambiguous). Also, 
such a definition would seem to imply that machines with 10-bit bytes 
cannot implement memcpy or memcmp.


Joseph, can you clarify the standard's meaning here?


Bernd


Re: Thoughts on memcmp expansion (PR43052)

2016-05-13 Thread Richard Biener
On Fri, May 13, 2016 at 3:05 PM, Bernd Schmidt  wrote:
> On 05/13/2016 12:20 PM, Richard Biener wrote:
>>
>> I'm not much of a fan of C++-ification (in this case it makes review
>> harder) but well ...
>
>
> I felt it was a pretty natural way to structure the code to avoid
> duplicating the same logic across more functions, and we might as well use
> the language for such purposes given that we've bothered to switch.
>
>> +  if (tree_fits_uhwi_p (len)
>> +  && (leni = tree_to_uhwi (len)) <= GET_MODE_SIZE (word_mode)
>> +  && exact_log2 (leni) != -1
>> +  && (align1 = get_pointer_alignment (arg1)) >= leni * BITS_PER_UNIT
>> +  && (align2 = get_pointer_alignment (arg2)) >= leni * BITS_PER_UNIT)
>>
>> I think * BITS_PER_UNIT has to be * 8 here as the C standard defines
>> it that way.
>
>
> Huh? Can you elaborate?

When you have a builtin taking a size in bytes then a byte is 8 bits,
not BITS_PER_UNIT bits.

Richard.

> [...]
>>
>> Ok with those changes.
>
>
> Thanks. I won't be reading email for the next two weeks, so I'll be checking
> it in afterwards.
>
>
> Bernd


Re: Thoughts on memcmp expansion (PR43052)

2016-05-13 Thread Bernd Schmidt

On 05/13/2016 12:20 PM, Richard Biener wrote:

I'm not much of a fan of C++-ification (in this case it makes review
harder) but well ...


I felt it was a pretty natural way to structure the code to avoid 
duplicating the same logic across more functions, and we might as well 
use the language for such purposes given that we've bothered to switch.



+  if (tree_fits_uhwi_p (len)
+  && (leni = tree_to_uhwi (len)) <= GET_MODE_SIZE (word_mode)
+  && exact_log2 (leni) != -1
+  && (align1 = get_pointer_alignment (arg1)) >= leni * BITS_PER_UNIT
+  && (align2 = get_pointer_alignment (arg2)) >= leni * BITS_PER_UNIT)

I think * BITS_PER_UNIT has to be * 8 here as the C standard defines
it that way.


Huh? Can you elaborate?

[...]

Ok with those changes.


Thanks. I won't be reading email for the next two weeks, so I'll be 
checking it in afterwards.



Bernd
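As a side note, the expansion guard quoted above admits a compact restatement. A sketch of the same test (hypothetical helper name, not GCC code):

```python
def can_expand_as_word_compare(length, align1_bits, align2_bits,
                               word_size_bytes=8, bits_per_unit=8):
    """Mirrors the quoted condition: a constant length that fits in one
    word, is an exact power of two, and both pointers aligned to the
    full access width."""
    return (0 < length <= word_size_bytes
            and (length & (length - 1)) == 0        # exact_log2 != -1
            and align1_bits >= length * bits_per_unit
            and align2_bits >= length * bits_per_unit)

print(can_expand_as_word_compare(4, 32, 32))   # True
print(can_expand_as_word_compare(3, 32, 32))   # False: not a power of two
print(can_expand_as_word_compare(8, 32, 64))   # False: arg1 under-aligned
```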


RUN_UNDER_VALGRIND statistics for GCC 7

2016-05-13 Thread Martin Liška
Hello.

I've tried to apply the same patch for the current trunk and tried to separate
reported errors to a different categories by a simple script ([1]).

There are number (complete report: [2]):

SECTION: gfortran 
  error types: 3534, total errors: 113282
  error types: 90.15%, total errors: 27.11%
SECTION: c++ 
  error types: 161, total errors: 7260
  error types: 4.11%, total errors: 1.74%
SECTION: c 
  error types: 90, total errors: 6320
  error types: 2.30%, total errors: 1.51%
SECTION: c-common 
  error types: 39, total errors: 205033
  error types: 0.99%, total errors: 49.06%
SECTION: Other 
  error types: 96, total errors: 86037
  error types: 2.45%, total errors: 20.59%

A type in the dump means a back trace, while 'total errors' represents the
total # of errors.
The second line shows a percentage ratio compared to all errors seen.

As seen, Fortran has a very big variety of back traces. Well, it shows that we
have multiple PRs reported;
they are referenced in the following PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68800

The C++ FE looks quite good, same as C. c-common contains a very interesting group:

are possibly lost: 204032 occurrences
  malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
  xmalloc (xmalloc.c:148)
  new_buff (lex.c:3158)
  _cpp_get_buff (lex.c:3191)
  cpp_create_reader(c_lang, ht*, line_maps*) (init.c:251)
  c_common_init_options(unsigned int, cl_decoded_option*) (c-opts.c:219)
  toplev::main(int, char**) (toplev.c:2070)
  main (main.c:39)

(which is in fact the majority of errors in the section). That's something I'm
going to look at.

The last category (called Other) is dominated by

are definitely lost: 20740 occurrences
  calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
  xcalloc (xmalloc.c:163)
  main (collect2.c:975)

which is probably option handling allocation stuff. Apart from that, there are
just ~90 types of different
back traces. It's doable to remove the majority of these in this stage1. OTOH,
as I'm not familiar with the FEs (mainly Fortran),
fixing the Fortran FE would not be trivial.

Note: The dump comes from one week old build.

Martin

[1] https://github.com/marxin/script-misc/blob/master/valgrind-grep.py
[2] https://drive.google.com/open?id=0B0pisUJ80pO1M1VOd2pObUVXSEU
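The grouping the numbers above come from can be pictured as bucketing valgrind records by their backtrace, so that "error types" counts distinct traces and "total errors" sums their occurrences. A hypothetical sketch (the real implementation is valgrind-grep.py, linked above):

```python
# Bucket valgrind error records by backtrace; identical backtraces are
# one "error type" however often they occur.
from collections import Counter

def group_by_backtrace(records):
    """records: iterable of (backtrace_frames, occurrence_count)."""
    types = Counter()
    for frames, count in records:
        types[tuple(frames)] += count   # the backtrace identifies the type
    return types

recs = [(["xmalloc", "new_buff", "_cpp_get_buff"], 10),
        (["xmalloc", "new_buff", "_cpp_get_buff"], 5),
        (["xcalloc", "main"], 7)]
groups = group_by_backtrace(recs)
print(len(groups), sum(groups.values()))   # 2 22
```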


Re: [PATCH][combine] PR middle-end/71074 Check that const_op is >= 0 before potentially shifting in simplify_comparison

2016-05-13 Thread Bernd Schmidt

On 05/13/2016 02:21 PM, Kyrill Tkachov wrote:

Hi all,

In this PR we may end up shifting a negative value left in
simplify_comparison.
The statement is:
const_op <<= INTVAL (XEXP (op0, 1));

This patch guards the block of that statement by const_op >= 0.
I _think_ that's a correct thing to do for that transformation:
"If we have (compare (xshiftrt FOO N) (const_int C)) and
  the low order N bits of FOO are known to be zero, we can do this
  by comparing FOO with C shifted left N bits so long as no
  overflow occurs."
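The "no overflow" caveat in the quoted transformation can be checked numerically. An illustrative sketch (not GCC code): for (FOO >> N) == C with the low N bits of FOO known zero, comparing FOO == (C << N) is only valid when C << N does not overflow the mode.

```python
def equiv_holds(foo, c, n, width=32):
    mask = (1 << width) - 1
    assert foo & ((1 << n) - 1) == 0      # precondition: low N bits zero
    lhs = (foo >> n) == c                 # original comparison
    rhs = foo == ((c << n) & mask)        # transformed comparison
    return lhs == rhs

print(equiv_holds(12, 3, 2))              # True: 3 << 2 fits in the mode
print(equiv_holds(0, 8, 2, width=4))      # False: 8 << 2 wraps to 0
```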


Isn't the condition testing for overflow though? In a somewhat 
nonobvious way.


So I think you should just use an unsigned version of const_op. While 
we're here we might as well make the code here a little more readable. 
How about the patch below? Can you test whether this works for you?



Bernd
Index: combine.c
===
--- combine.c	(revision 236113)
+++ combine.c	(working copy)
@@ -11628,13 +11628,13 @@ simplify_comparison (enum rtx_code code,
 
   while (CONST_INT_P (op1))
 {
+  HOST_WIDE_INT amount;
   machine_mode mode = GET_MODE (op0);
   unsigned int mode_width = GET_MODE_PRECISION (mode);
   unsigned HOST_WIDE_INT mask = GET_MODE_MASK (mode);
   int equality_comparison_p;
   int sign_bit_comparison_p;
   int unsigned_comparison_p;
-  HOST_WIDE_INT const_op;
 
   /* We only want to handle integral modes.  This catches VOIDmode,
 	 CCmode, and the floating-point modes.  An exception is that we
@@ -11649,7 +11649,8 @@ simplify_comparison (enum rtx_code code,
   /* Try to simplify the compare to constant, possibly changing the
 	 comparison op, and/or changing op1 to zero.  */
   code = simplify_compare_const (code, mode, op0, );
-  const_op = INTVAL (op1);
+  HOST_WIDE_INT const_op = INTVAL (op1);
+  unsigned HOST_WIDE_INT uconst_op = (unsigned HOST_WIDE_INT) const_op;
 
   /* Compute some predicates to simplify code below.  */
 
@@ -11899,7 +11900,7 @@ simplify_comparison (enum rtx_code code,
 	  if (GET_MODE_CLASS (mode) == MODE_INT
 	  && (unsigned_comparison_p || equality_comparison_p)
 	  && HWI_COMPUTABLE_MODE_P (mode)
-	  && (unsigned HOST_WIDE_INT) const_op <= GET_MODE_MASK (mode)
+	  && uconst_op <= GET_MODE_MASK (mode)
 	  && const_op >= 0
 	  && have_insn_for (COMPARE, mode))
 	{
@@ -12198,28 +12199,29 @@ simplify_comparison (enum rtx_code code,
 	  break;
 
 	case ASHIFT:
+	  amount = XEXP (op0, 1);
 	  /* If we have (compare (ashift FOO N) (const_int C)) and
 	 the high order N bits of FOO (N+1 if an inequality comparison)
 	 are known to be zero, we can do this by comparing FOO with C
 	 shifted right N bits so long as the low-order N bits of C are
 	 zero.  */
-	  if (CONST_INT_P (XEXP (op0, 1))
-	  && INTVAL (XEXP (op0, 1)) >= 0
-	  && ((INTVAL (XEXP (op0, 1)) + ! equality_comparison_p)
+	  if (CONST_INT_P (amount)
+	  && INTVAL (amount) >= 0
+	  && ((INTVAL (amount) + ! equality_comparison_p)
 		  < HOST_BITS_PER_WIDE_INT)
-	  && (((unsigned HOST_WIDE_INT) const_op
-		   & (((unsigned HOST_WIDE_INT) 1 << INTVAL (XEXP (op0, 1)))
+	  && ((uconst_op
+		   & (((unsigned HOST_WIDE_INT) 1 << INTVAL (amount))
 		  - 1)) == 0)
 	  && mode_width <= HOST_BITS_PER_WIDE_INT
 	  && (nonzero_bits (XEXP (op0, 0), mode)
-		  & ~(mask >> (INTVAL (XEXP (op0, 1))
+		  & ~(mask >> (INTVAL (amount)
 			   + ! equality_comparison_p))) == 0)
 	{
 	  /* We must perform a logical shift, not an arithmetic one,
 		 as we want the top N bits of C to be zero.  */
 	  unsigned HOST_WIDE_INT temp = const_op & GET_MODE_MASK (mode);
 
-	  temp >>= INTVAL (XEXP (op0, 1));
+	  temp >>= INTVAL (amount);
 	  op1 = gen_int_mode (temp, mode);
 	  op0 = XEXP (op0, 0);
 	  continue;
@@ -12227,13 +12229,13 @@ simplify_comparison (enum rtx_code code,
 
 	  /* If we are doing a sign bit comparison, it means we are testing
 	 a particular bit.  Convert it to the appropriate AND.  */
-	  if (sign_bit_comparison_p && CONST_INT_P (XEXP (op0, 1))
+	  if (sign_bit_comparison_p && CONST_INT_P (amount)
 	  && mode_width <= HOST_BITS_PER_WIDE_INT)
 	{
 	  op0 = simplify_and_const_int (NULL_RTX, mode, XEXP (op0, 0),
 	((unsigned HOST_WIDE_INT) 1
 	 << (mode_width - 1
-		 - INTVAL (XEXP (op0, 1);
+		 - INTVAL (amount;
 	  code = (code == LT ? NE : EQ);
 	  continue;
 	}
@@ -12242,8 +12244,8 @@ simplify_comparison (enum rtx_code code,
 	 the low bit to the sign bit, we can convert this to an AND of the
 	 low-order bit.  */
 	  if (const_op == 0 && equality_comparison_p
-	  && CONST_INT_P (XEXP (op0, 1))
-	  && UINTVAL (XEXP (op0, 1)) == mode_width - 1)
+	  && CONST_INT_P (amount)
+	  && UINTVAL (amount) == mode_width - 1)
 	{
 	  op0 = simplify_and_const_int (NULL_RTX, mode, XEXP (op0, 0), 

[PTX] atomic_compare_exchange_$n

2016-05-13 Thread Nathan Sidwell
The atomic_compare_exchange_$n builtins have an exciting calling convention 
where the 4th ('weak') parm is dropped in a real call.  This caused 
gcc.dg/atomic-noinline.c to fail on PTX as the prototype didn't match the use.


Fixed thusly when emitting the ptx prototype.  Also fixed is the subsequently
uncovered type mismatch in the testcase -- atomic_is_lock_free's first parm is
size_t, not int.  Usually this mismatch is harmless even when one is 64 bits and
the other 32, but again, PTX requires a match.


Applied to trunk.

nathan
2016-05-13  Nathan Sidwell  

	gcc/
	* config/nvptx/nvptx.c (write_fn_proto): Handle
	BUILT_IN_ATOMIC_COMPARE_EXCHANGE_n oddity.

	gcc/testsuite/
	* gcc.dg/atomic-noinline-aux.c: Include stddef.h. Fix
	__atomic_is_lock_free declaration.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 236208)
+++ config/nvptx/nvptx.c	(working copy)
@@ -751,6 +751,26 @@ write_fn_proto (std::stringstream , bo
   tree fntype = TREE_TYPE (decl);
   tree result_type = TREE_TYPE (fntype);
 
+  /* atomic_compare_exchange_$n builtins have an exceptional calling
+ convention.  */
+  int not_atomic_weak_arg = -1;
+  if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
+switch (DECL_FUNCTION_CODE (decl))
+  {
+  case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_1:
+  case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_2:
+  case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_4:
+  case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_8:
+  case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_16:
+	/* These atomics skip the 'weak' parm in an actual library
+	   call.  We must skip it in the prototype too.  */
+	not_atomic_weak_arg = 3;
+	break;
+
+  default:
+	break;
+  }
+
   /* Declare the result.  */
   bool return_in_mem = write_return_type (s, true, result_type);
 
@@ -775,11 +795,14 @@ write_fn_proto (std::stringstream , bo
   prototyped = false;
 }
 
-  for (; args; args = TREE_CHAIN (args))
+  for (; args; args = TREE_CHAIN (args), not_atomic_weak_arg--)
 {
   tree type = prototyped ? TREE_VALUE (args) : TREE_TYPE (args);
-
-  argno = write_arg_type (s, -1, argno, type, prototyped);
+  
+  if (not_atomic_weak_arg)
+	argno = write_arg_type (s, -1, argno, type, prototyped);
+  else
+	gcc_assert (type == boolean_type_node);
 }
 
   if (stdarg_p (fntype))
Index: testsuite/gcc.dg/alias-15.c
===
--- testsuite/gcc.dg/alias-15.c	(revision 236208)
+++ testsuite/gcc.dg/alias-15.c	(nonexistent)
@@ -1,15 +0,0 @@
-/* { dg-do compile } */
-/* { dg-additional-options  "-O2 -fdump-ipa-cgraph" } */
-
-/* RTL-level CSE shouldn't introduce LCO (for the string) into varpool */
-char *p;
-
-void foo ()
-{
-  p = "abc\n";
-
-  while (*p != '\n')
-p++;
-}
-
-/* { dg-final { scan-ipa-dump-not "LC0" "cgraph" } } */
Index: testsuite/gcc.dg/atomic-noinline-aux.c
===
--- testsuite/gcc.dg/atomic-noinline-aux.c	(revision 236208)
+++ testsuite/gcc.dg/atomic-noinline-aux.c	(working copy)
@@ -7,6 +7,7 @@
the exact entry points the test file will require.  All these routines
simply set the first parameter to 1, and the caller will test for that.  */
 
+#include 
 #include 
 #include 
 #include 
@@ -64,7 +65,7 @@ __atomic_fetch_nand_1 (unsigned char *p,
   return ret;
 }
 
-bool __atomic_is_lock_free (int i, void *p)
+bool __atomic_is_lock_free (size_t i, void *p)
 {
   *(short *)p = 1;
   return true;


Re: [PATCH 3/3] Enhance dumps of IVOPTS

2016-05-13 Thread Martin Liška
On 05/13/2016 02:46 PM, Richard Biener wrote:
> Use them for HOST_WIDE_INT printing, for [u]int64_t use the PRI stuff.
> 
> Richard.

Thanks you both, installed as r236208.

Martin


Re: [PATCH 3/3] Enhance dumps of IVOPTS

2016-05-13 Thread Richard Biener
On Fri, May 13, 2016 at 2:43 PM, Kyrill Tkachov
 wrote:
> Hi Martin,
>
>
> On 13/05/16 13:39, Martin Liška wrote:
>>
>> On 05/13/2016 02:11 PM, H.J. Lu wrote:
>>>
>>> On Fri, May 13, 2016 at 3:44 AM, Martin Liška  wrote:

 On 05/13/2016 11:43 AM, Bin.Cheng wrote:
>
> On Thu, May 12, 2016 at 5:41 PM, Martin Liška  wrote:
>>
>> On 05/12/2016 03:51 PM, Bin.Cheng wrote:
>>>
>>> On Thu, May 12, 2016 at 1:13 PM, Martin Liška  wrote:

 On 05/10/2016 03:16 PM, Bin.Cheng wrote:
>
> Another way is to remove the use of id for struct iv_inv_expr_ent
> once
> for all.  We can change iv_ca.used_inv_expr and
> cost_pair.inv_expr_id
> to pointers, and rename iv_inv_expr_ent.id to count and use this to
> record reference number in iv_ca.  This if-statement on dump_file
> can
> be saved.  Also I think it simplifies current code a bit.  For now,
> there are id <-> struct maps for different structures in IVOPT
> which
> make it not straightforward.

 Hi.

 I'm sending second version of the patch. I tried to follow your
 advices, but
 because an iv_inv_expr_ent can simultaneously belong to multiple
 iv_cas,
 putting a counter into iv_inv_expr_ent does not work. Instead of that,
 I've
 decided to replace used_inv_expr with a hash_map that contains used
 inv_exps
 and where value of the map is # of usages.

 Further questions:
 + iv_inv_expr_ent::id can be now removed as it's used just for
 purpose of dumps
 Group 0:
cand  costscaled  freqcompl.  depends on
5 2   2.001.000
6 4   4.001.001inv_expr:0
7 4   4.001.001inv_expr:1
8 4   4.001.001inv_expr:2

 That can be replaced with print_generic_expr, but I think using ids
 makes the dump
 output more clear.
>>>
>>> I am okay with keeping id.  Could you please dump all inv_exprs in a
>>> single section like
>>> :
>>> inv_expr 0: print_generic_expr
>>> inv_expr 1: ...
>>>
>>> Then only dump the id afterwards?
>>>
>> Sure, it would be definitely better:
>>
>> The new dump format looks:
>>
>> :
>> inv_expr 0: sudoku_351(D) + (sizetype) S.833_774 * 4
>> inv_expr 1: sudoku_351(D) + ((sizetype) S.833_774 * 4 +
>> 18446744073709551580)
>> inv_expr 2: sudoku_351(D) + ((sizetype) S.833_774 + 72) * 4
>> inv_expr 3: sudoku_351(D) + ((sizetype) S.833_774 + 81) * 4
>> inv_expr 4:  + (sizetype) _377 * 4
>> inv_expr 5:  + ((sizetype) _377 * 4 + 18446744073709551612)
>> inv_expr 6:  + ((sizetype) _377 + 8) * 4
>> inv_expr 7:  + ((sizetype) _377 + 9) * 4
>>
>> :
>> Group 0:
>>cand  costscaled  freqcompl.  depends on
>>
>> ...
>>
>> Improved to:
>>cost: 27 (complexity 2)
>>cand_cost: 11
>>cand_group_cost: 10 (complexity 2)
>>candidates: 3, 5
>> group:0 --> iv_cand:5, cost=(2,0)
>> group:1 --> iv_cand:5, cost=(4,1)
>> group:2 --> iv_cand:5, cost=(4,1)
>> group:3 --> iv_cand:3, cost=(0,0)
>> group:4 --> iv_cand:3, cost=(0,0)
>>invariants 1, 6
>>invariant expressions 6, 3
>>
>> The only question here is that as used_inv_exprs are stored in a
>> hash_map,
>> order of dumped invariants would not be stable. Is it a problem?
>
> It is okay.
>
> Only nitpicking on this version.
>
 + As check_GNU_style.sh reported multiple 8 spaces issues in hunks
 I've touched, I decided
 to fix all 8 spaces issues. Hope it's fine.

 I'm going to test the patch.
 Thoughts?
>>>
>>> Some comments on the patch embedded.
>>>
 +/* Forward declaration.  */
>>>
>>> Not necessary.

 +struct iv_inv_expr_ent;
 +
>>
>> I think it's needed because struct cost_pair uses a pointer to
>> iv_inv_expr_ent.
>
> I mean the comment, clearly the declaration is self-documented.

 Hi.

 Yeah, removed.

>> @@ -6000,11 +6045,12 @@ iv_ca_set_no_cp (struct ivopts_data *data,
>> struct iv_ca *ivs,
>>
>> iv_ca_set_remove_invariants (ivs, cp->depends_on);
>>
>> -  if (cp->inv_expr_id != -1)
>> +  if (cp->inv_expr != NULL)
>>   {
>> -  ivs->used_inv_expr[cp->inv_expr_id]--;
>> -  if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
>> -ivs->num_used_inv_expr--;
>> +  unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
>> +  --(*slot);
>> +

Re: [PATCH 3/3] Enhance dumps of IVOPTS

2016-05-13 Thread Kyrill Tkachov

Hi Martin,

On 13/05/16 13:39, Martin Liška wrote:

On 05/13/2016 02:11 PM, H.J. Lu wrote:

On Fri, May 13, 2016 at 3:44 AM, Martin Liška  wrote:

On 05/13/2016 11:43 AM, Bin.Cheng wrote:

On Thu, May 12, 2016 at 5:41 PM, Martin Liška  wrote:

On 05/12/2016 03:51 PM, Bin.Cheng wrote:

On Thu, May 12, 2016 at 1:13 PM, Martin Liška  wrote:

On 05/10/2016 03:16 PM, Bin.Cheng wrote:

Another way is to remove the use of id for struct iv_inv_expr_ent once
for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
to pointers, and rename iv_inv_expr_ent.id to count and use this to
record reference number in iv_ca.  This if-statement on dump_file can
be saved.  Also I think it simplifies current code a bit.  For now,
there are id <-> struct maps for different structures in IVOPT which
make it not straightforward.

Hi.

I'm sending second version of the patch. I tried to follow your advices, but
because an iv_inv_expr_ent can simultaneously belong to multiple iv_cas,
putting a counter into iv_inv_expr_ent does not work. Instead of that, I've
decided to replace used_inv_expr with a hash_map that contains used inv_exps
and where value of the map is # of usages.

Further questions:
+ iv_inv_expr_ent::id can be now removed as it's used just for purpose of dumps
Group 0:
   cand  costscaled  freqcompl.  depends on
   5 2   2.001.000
   6 4   4.001.001inv_expr:0
   7 4   4.001.001inv_expr:1
   8 4   4.001.001inv_expr:2

That can be replaced with print_generic_expr, but I think using ids makes the 
dump
output more clear.

I am okay with keeping id.  Could you please dump all inv_exprs in a
single section like
:
inv_expr 0: print_generic_expr
inv_expr 1: ...

Then only dump the id afterwards?


Sure, it would be definitely better:

The new dump format looks:

:
inv_expr 0: sudoku_351(D) + (sizetype) S.833_774 * 4
inv_expr 1: sudoku_351(D) + ((sizetype) S.833_774 * 4 + 
18446744073709551580)
inv_expr 2: sudoku_351(D) + ((sizetype) S.833_774 + 72) * 4
inv_expr 3: sudoku_351(D) + ((sizetype) S.833_774 + 81) * 4
inv_expr 4:  + (sizetype) _377 * 4
inv_expr 5:  + ((sizetype) _377 * 4 + 18446744073709551612)
inv_expr 6:  + ((sizetype) _377 + 8) * 4
inv_expr 7:  + ((sizetype) _377 + 9) * 4

:
Group 0:
   cand  costscaled  freqcompl.  depends on

...

Improved to:
   cost: 27 (complexity 2)
   cand_cost: 11
   cand_group_cost: 10 (complexity 2)
   candidates: 3, 5
group:0 --> iv_cand:5, cost=(2,0)
group:1 --> iv_cand:5, cost=(4,1)
group:2 --> iv_cand:5, cost=(4,1)
group:3 --> iv_cand:3, cost=(0,0)
group:4 --> iv_cand:3, cost=(0,0)
   invariants 1, 6
   invariant expressions 6, 3

The only question here is that as used_inv_exprs are stored in a hash_map,
order of dumped invariants would not be stable. Is it a problem?

It is okay.

Only nitpicking on this version.


+ As check_GNU_style.sh reported multiple 8 spaces issues in hunks I've 
touched, I decided
to fix all 8 spaces issues. Hope it's fine.

I'm going to test the patch.
Thoughts?

Some comments on the patch embedded.


+/* Forward declaration.  */

Not necessary.

+struct iv_inv_expr_ent;
+

I think it's needed because struct cost_pair uses a pointer to iv_inv_expr_ent.

I mean the comment; clearly the declaration is self-documenting.

Hi.

Yeah, removed.


@@ -6000,11 +6045,12 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca 
*ivs,

iv_ca_set_remove_invariants (ivs, cp->depends_on);

-  if (cp->inv_expr_id != -1)
+  if (cp->inv_expr != NULL)
  {
-  ivs->used_inv_expr[cp->inv_expr_id]--;
-  if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
-ivs->num_used_inv_expr--;
+  unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
+  --(*slot);
+  if (*slot == 0)
+ivs->used_inv_exprs->remove (cp->inv_expr);

I suppose insertion/removal of hash_map are not expensive?  Because
the algorithm causes a lot of these operations.

I think it should be roughly a constant-time operation.
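The refcounting pattern under discussion can be sketched outside GCC with an ordinary hash map; `expr_t` and the helper names below are illustrative stand-ins, not GCC's `hash_map` API:

```cpp
#include <cassert>
#include <unordered_map>

// Stand-in for iv_inv_expr_ent *; any pointer key behaves the same way.
typedef const void *expr_t;

// Record one more use of EXPR, inserting it with count 0 on first sight.
static void
add_use (std::unordered_map<expr_t, unsigned> &uses, expr_t expr)
{
  ++uses[expr]; // operator[] value-initializes the count to 0
}

// Drop one use of EXPR and erase the entry when the count hits zero,
// mirroring the decrement-and-remove hunk quoted above.
static void
remove_use (std::unordered_map<expr_t, unsigned> &uses, expr_t expr)
{
  std::unordered_map<expr_t, unsigned>::iterator slot = uses.find (expr);
  assert (slot != uses.end ());
  if (--slot->second == 0)
    uses.erase (slot); // entry disappears, so iteration order can change
}
```

Both operations are amortized O(1) on a hash map, which matches the "roughly constant" expectation; the erase at refcount zero is also why the iteration order of the surviving entries is not stable.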


@@ -6324,12 +6368,26 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, 
struct iv_ca *ivs)
  fprintf (file, "   group:%d --> ??\n", group->id);
  }

+  bool any_invariant = false;
for (i = 1; i <= data->max_inv_id; i++)
  if (ivs->n_invariant_uses[i])
{
+const char *pref = any_invariant ? ", " : "  invariants ";
+any_invariant = true;
  fprintf (file, "%s%d", pref, i);
-pref = ", ";
}
+
+  if (any_invariant)
+fprintf (file, "\n");
+

To make the dump easier to read, we can simply dump invariant
variables/expressions unconditionally.  Also keep invariant variables
and expressions in the same form.

Sure, that's a good idea!

Sample output:


Initial set of candidates:
   cost: 17 (complexity 0)
   cand_cost: 11
   cand_group_cost: 2 (complexity 0)
   candidates: 1, 5
group:0 --> iv_cand:5, cost=(2,0)
group:1 --> iv_cand:1, 

Re: [PATCH 3/3] Enhance dumps of IVOPTS

2016-05-13 Thread Martin Liška
On 05/13/2016 02:11 PM, H.J. Lu wrote:
> On Fri, May 13, 2016 at 3:44 AM, Martin Liška  wrote:
>> On 05/13/2016 11:43 AM, Bin.Cheng wrote:
>>> On Thu, May 12, 2016 at 5:41 PM, Martin Liška  wrote:
 On 05/12/2016 03:51 PM, Bin.Cheng wrote:
> On Thu, May 12, 2016 at 1:13 PM, Martin Liška  wrote:
>> On 05/10/2016 03:16 PM, Bin.Cheng wrote:
>>> Another way is to remove the use of id for struct iv_inv_expr_ent once
>>> for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
>>> to pointers, and rename iv_inv_expr_ent.id to count and use this to
>>> record reference number in iv_ca.  This if-statement on dump_file can
>>> be saved.  Also I think it simplifies current code a bit.  For now,
>>> there are id <-> struct maps for different structures in IVOPT which
>>> make it not straightforward.
>>
>> Hi.
>>
>> I'm sending the second version of the patch. I tried to follow your advice,
>> but because an iv_inv_expr_ent can simultaneously belong to multiple iv_cas,
>> putting a counter in iv_inv_expr_ent does not work. Instead of that, I've
>> decided to replace used_inv_expr with a hash_map that contains the used
>> inv_exprs, where the value of the map is the number of usages.
>>
>> Further questions:
>> + iv_inv_expr_ent::id can be now removed as it's used just for purpose 
>> of dumps
>> Group 0:
>>   cand  cost  scaled  freq  compl.  depends on
>>   5     2     2.00    1.00  0
>>   6     4     4.00    1.00  1       inv_expr:0
>>   7     4     4.00    1.00  1       inv_expr:1
>>   8     4     4.00    1.00  1       inv_expr:2
>>
>> That can be replaced with print_generic_expr, but I think using ids 
>> makes the dump
>> output more clear.
> I am okay with keeping id.  Could you please dump all inv_exprs in a
> single section like
> :
> inv_expr 0: print_generic_expr
> inv_expr 1: ...
>
> Then only dump the id afterwards?
>

 Sure, it would be definitely better:

 The new dump format looks:

 :
 inv_expr 0: sudoku_351(D) + (sizetype) S.833_774 * 4
 inv_expr 1: sudoku_351(D) + ((sizetype) S.833_774 * 4 + 
 18446744073709551580)
 inv_expr 2: sudoku_351(D) + ((sizetype) S.833_774 + 72) * 4
 inv_expr 3: sudoku_351(D) + ((sizetype) S.833_774 + 81) * 4
 inv_expr 4:  + (sizetype) _377 * 4
 inv_expr 5:  + ((sizetype) _377 * 4 + 18446744073709551612)
 inv_expr 6:  + ((sizetype) _377 + 8) * 4
 inv_expr 7:  + ((sizetype) _377 + 9) * 4

 :
 Group 0:
   cand  cost  scaled  freq  compl.  depends on

 ...

 Improved to:
   cost: 27 (complexity 2)
   cand_cost: 11
   cand_group_cost: 10 (complexity 2)
   candidates: 3, 5
group:0 --> iv_cand:5, cost=(2,0)
group:1 --> iv_cand:5, cost=(4,1)
group:2 --> iv_cand:5, cost=(4,1)
group:3 --> iv_cand:3, cost=(0,0)
group:4 --> iv_cand:3, cost=(0,0)
   invariants 1, 6
   invariant expressions 6, 3

 The only question here is that as used_inv_exprs are stored in a hash_map,
 order of dumped invariants would not be stable. Is that a problem?
>>> It is okay.
>>>
>>> Only nitpicking on this version.
>>>

>>
>> + As check_GNU_style.sh reported multiple 8 spaces issues in hunks I've 
>> touched, I decided
>> to fix all 8 spaces issues. Hope it's fine.
>>
>> I'm going to test the patch.
>> Thoughts?
>
> Some comments on the patch embedded.
>
>>
>> +/* Forward declaration.  */
> Not necessary.
>> +struct iv_inv_expr_ent;
>> +

 I think it's needed because struct cost_pair uses a pointer to 
 iv_inv_expr_ent.
>>> I mean the comment, clearly the declaration is self-documented.
>>
>> Hi.
>>
>> Yeah, removed.
>>
>>>
 @@ -6000,11 +6045,12 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct 
 iv_ca *ivs,

iv_ca_set_remove_invariants (ivs, cp->depends_on);

 -  if (cp->inv_expr_id != -1)
 +  if (cp->inv_expr != NULL)
  {
 -  ivs->used_inv_expr[cp->inv_expr_id]--;
 -  if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
 -ivs->num_used_inv_expr--;
 +  unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
 +  --(*slot);
 +  if (*slot == 0)
 +ivs->used_inv_exprs->remove (cp->inv_expr);
>>> I suppose insertion/removal of hash_map are not expensive?  Because
>>> the algorithm causes a lot of these operations.
>>
>> I think it should be ~ a constant operation.
>>
>>>
 @@ -6324,12 +6368,26 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, 
 struct iv_ca *ivs)
  fprintf (file, "   group:%d --> ??\n", group->id);
  }

 +  bool any_invariant = false;
for (i = 1; i <= data->max_inv_id; 

[PATCH] Enable libgloss support for ARC in top-level configure.ac

2016-05-13 Thread Anton Kolesov
ARC support has been added to libgloss in this patch to newlib repository:
https://sourceware.org/git/?p=newlib-cygwin.git;a=commit;h=acdfcb0a0af54715bc37ed1c767bfe901b679357

2016-05-13  Anton Kolesov  

* configure.ac: Add ARC support to libgloss.
* configure: Regenerate
---
 configure| 3 ---
 configure.ac | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/configure b/configure
index 24e5157..ea63784 100755
--- a/configure
+++ b/configure
@@ -3756,9 +3756,6 @@ case "${target}" in
   sh*-*-pe|mips*-*-pe|*arm-wince-pe)
 noconfigdirs="$noconfigdirs tcl tk itcl libgui sim"
 ;;
-  arc-*-*|arceb-*-*)
-noconfigdirs="$noconfigdirs target-libgloss"
-;;
   arm-*-pe*)
 noconfigdirs="$noconfigdirs target-libgloss"
 ;;
diff --git a/configure.ac b/configure.ac
index 20dfbb4..54558df 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1092,9 +1092,6 @@ case "${target}" in
   sh*-*-pe|mips*-*-pe|*arm-wince-pe)
 noconfigdirs="$noconfigdirs tcl tk itcl libgui sim"
 ;;
-  arc-*-*|arceb-*-*)
-noconfigdirs="$noconfigdirs target-libgloss"
-;;
   arm-*-pe*)
 noconfigdirs="$noconfigdirs target-libgloss"
 ;;
-- 
2.8.1



[PATCH][obvious] Typo fix in tree-ssa-loop-ivcanon.c

2016-05-13 Thread Kyrill Tkachov

Committing as obvious.

Thanks,
Kyrill

2016-05-12  Kyrylo Tkachov  

* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Typo fix in
comment.
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 9d92276dbbbfe3a768b9e8b0c90ee60e05c885fb..e8f67953231f54ce2517a55e1587ccf646b8f74c 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -807,7 +807,7 @@ try_unroll_loop_completely (struct loop *loop,
 		 loop->num);
 	  return false;
 	}
-  /* Complette unrolling is major win when control flow is removed and
+  /* Complete unrolling is a major win when control flow is removed and
 	 one big basic block is created.  If the loop contains control flow
 	 the optimization may still be a win because of eliminating the loop
 	 overhead but it also may blow the branch predictor tables.


Re: [PATCH 7/7] SMS remove dependence on doloop: To identify read/write register as loop induction variable

2016-05-13 Thread Bernd Schmidt

On 05/05/2016 08:03 AM, Shiva Chen wrote:


-  /* We do not handle setting only part of the register.  */
-  if (DF_REF_FLAGS (adef) & DF_REF_READ_WRITE)
-return GRD_INVALID;
-


This isn't right, at least not without other changes. This prevents 
using references where the register is set as part of a ZERO_EXTRACT or 
STRICT_LOW_PART, and you seem not to handle such cases in the new code.



+/* Return rtx expression in INSN which calculate REG result.  */
+
+static rtx
+get_reg_calculate_expr (rtx_insn *insn, rtx reg)


Hmm, that comment seems to be not quite accurate, though. The function
may return either the source of a SET that sets the register, or a MEM
(where it's not indicated whether the MEM is the source or destination
of the set, or whether the reg is set in the insn or is part of the address).


I think this interface is pretty bad. I think what you should do is 
check in the caller whether the def is of READ_WRITE type. If so, do a 
FOR_EACH_SUBRTX thing to look for a memory reference containing an 
autoinc expression, otherwise this new function (without the MEM parts) 
could be used. That'll also solve the problem above with the loss of 
DF_REF_READ_WRITE checking.


The walk to find the appropriate MEM would be in a separate function, 
used in the several places that



+  /* Find REG increment/decrement expr in following pattern
+
+ (parallel
+   (CC = compare (REG - 1, 0))
+   (REG = REG - 1))
+   */


Comment formatting, "*/" doesn't go on its own line. I think it's best 
not to quote a pattern here since you're not actually looking for it.  A 
better comment would be "For identifying ivs, we can actually ignore any 
other SETs occurring in parallel, so look for one with the correct 
SET_DEST."  I'm not actually sure, however, whether this is valid. It 
would depend on how this information is used later on, and what 
transformation other passes would want to do on the ivs.  Come to think 
of it, this is also true for autoinc ivs, but I guess these would only 
appear when run late as analysis for modulo-sched, so I guess that would 
be OK?



@@ -2354,6 +2489,20 @@ iv_number_of_iterations (struct loop *loop, rtx_insn 
*insn, rtx condition,
  goto fail;

op0 = XEXP (condition, 0);
+
+  /* We would use loop induction variable analysis
+ to get loop iteration count in SMS pass
+ which should work with/without doloop pass
+ active or not.
+
+ With doloop pass enabled, doloop pass would
+ generate pattern as (ne (REG - 1), 0) and
+ we recognize it by following code.  */
+  if (GET_CODE (op0) == PLUS
+  && CONST_INT_P (XEXP (op0, 1))
+  && REG_P (XEXP (op0, 0)))
+op0 = XEXP (op0, 0);


That really looks somewhat suspicious, and I can't really tell whether 
it's right. My instinct says no; how can you just drop a constant on the 
floor?



Bernd



[PATCH][obvious] Fix param name in dump file

2016-05-13 Thread Kyrill Tkachov

Hi all,

The name of the parameter is max-completely-peel-times.
Applying to trunk as obvious.

Thanks,
Kyrill

2016-05-13  Kyrylo Tkachov  

* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely):
Change --param max-completely-peeled-times to
--param max-completely-peel-times in dump file printing.
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 9b59b4466c3203311cdd4346ef77f5af596890b6..9d92276dbbbfe3a768b9e8b0c90ee60e05c885fb 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -712,7 +712,7 @@ try_unroll_loop_completely (struct loop *loop,
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "Not unrolling loop %d "
-		 "(--param max-completely-peeled-times limit reached).\n",
+		 "(--param max-completely-peel-times limit reached).\n",
 		 loop->num);
   return false;
 }


[PATCH] Fix part of PR42587

2016-05-13 Thread Richard Biener

The recent subreg-CSE allows a part of PR42587 to be fixed with
minimal surgery to the bswap pass - namely adding support for
BIT_FIELD_REF.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-05-13  Richard Biener  

PR tree-optimization/42587
* tree-ssa-math-opts.c (perform_symbolic_merge): Handle BIT_FIELD_REF.
(find_bswap_or_nop_1): Likewise.
(bswap_replace): Likewise.

* gcc.dg/optimize-bswapsi-4.c: New testcase.

Index: gcc/tree-ssa-math-opts.c
===
*** gcc/tree-ssa-math-opts.c(revision 236159)
--- gcc/tree-ssa-math-opts.c(working copy)
*** perform_symbolic_merge (gimple *source_s
*** 2160,2168 
gimple *source_stmt;
struct symbolic_number *n_start;
  
/* Sources are different, cancel bswap if they are not memory location with
   the same base (array, structure, ...).  */
!   if (gimple_assign_rhs1 (source_stmt1) != gimple_assign_rhs1 (source_stmt2))
  {
uint64_t inc;
HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
--- 2160,2175 
gimple *source_stmt;
struct symbolic_number *n_start;
  
+   tree rhs1 = gimple_assign_rhs1 (source_stmt1);
+   if (TREE_CODE (rhs1) == BIT_FIELD_REF)
+ rhs1 = TREE_OPERAND (rhs1, 0);
+   tree rhs2 = gimple_assign_rhs1 (source_stmt2);
+   if (TREE_CODE (rhs2) == BIT_FIELD_REF)
+ rhs2 = TREE_OPERAND (rhs2, 0);
+ 
/* Sources are different, cancel bswap if they are not memory location with
   the same base (array, structure, ...).  */
!   if (rhs1 != rhs2)
  {
uint64_t inc;
HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
*** find_bswap_or_nop_1 (gimple *stmt, struc
*** 2285,2290 
--- 2292,2330 
if (find_bswap_or_nop_load (stmt, rhs1, n))
  return stmt;
  
+   /* Handle BIT_FIELD_REF.  */
+   if (TREE_CODE (rhs1) == BIT_FIELD_REF
+   && TREE_CODE (TREE_OPERAND (rhs1, 0)) == SSA_NAME)
+ {
+   unsigned HOST_WIDE_INT bitsize = tree_to_uhwi (TREE_OPERAND (rhs1, 1));
+   unsigned HOST_WIDE_INT bitpos = tree_to_uhwi (TREE_OPERAND (rhs1, 2));
+   if (bitpos % BITS_PER_UNIT == 0
+ && bitsize % BITS_PER_UNIT == 0
+ && init_symbolic_number (n, TREE_OPERAND (rhs1, 0)))
+   {
+ /* Shift.  */
+ if (!do_shift_rotate (RSHIFT_EXPR, n, bitpos))
+   return NULL;
+ 
+ /* Mask.  */
+ uint64_t mask = 0;
+ uint64_t tmp = (1 << BITS_PER_UNIT) - 1;
+ for (unsigned i = 0; i < bitsize / BITS_PER_UNIT;
+  i++, tmp <<= BITS_PER_UNIT)
+   mask |= (uint64_t) MARKER_MASK << (i * BITS_PER_MARKER);
+ n->n &= mask;
+ 
+ /* Convert.  */
+ n->type = TREE_TYPE (rhs1);
+ if (!n->base_addr)
+   n->range = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
+ 
+ return verify_symbolic_number_p (n, stmt) ? stmt : NULL;
+   }
+ 
+   return NULL;
+ }
+ 
if (TREE_CODE (rhs1) != SSA_NAME)
  return NULL;
  
*** bswap_replace (gimple *cur_stmt, gimple
*** 2683,2688 
--- 2723,2730 
}
src = val_tmp;
  }
+   else if (TREE_CODE (src) == BIT_FIELD_REF)
+ src = TREE_OPERAND (src, 0);
  
if (n->range == 16)
  bswap_stats.found_16bit++;
Index: gcc/testsuite/gcc.dg/optimize-bswapsi-4.c
===
*** gcc/testsuite/gcc.dg/optimize-bswapsi-4.c   (revision 0)
--- gcc/testsuite/gcc.dg/optimize-bswapsi-4.c   (working copy)
***
*** 0 
--- 1,28 
+ /* { dg-do compile } */
+ /* { dg-require-effective-target bswap32 } */
+ /* { dg-options "-O2 -fdump-tree-bswap" } */
+ /* { dg-additional-options "-march=z900" { target s390-*-* } } */
+ 
+ typedef unsigned char u8;
+ typedef unsigned int u32;
+ union __anonunion
+ {
+   u32 value;
+   u8 bytes[4];
+ };
+ 
+ u32
+ acpi_ut_dword_byte_swap (u32 value)
+ {
+   union __anonunion in;
+   in.value = value;
+ #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+   return ((in.bytes[0] << 24) | (in.bytes[1] << 16)
+ | (in.bytes[2] << 8) | in.bytes[3]);
+ #else
+   return ((in.bytes[3] << 24) | (in.bytes[2] << 16)
+ | (in.bytes[1] << 8) | in.bytes[0]);
+ #endif
+ }
+ 
+ /* { dg-final { scan-tree-dump "32 bit bswap implementation found at" "bswap" 
} } */


[PATCH][combine] PR middle-end/71074 Check that const_op is >= 0 before potentially shifting in simplify_comparison

2016-05-13 Thread Kyrill Tkachov

Hi all,

In this PR we may end up shifting a negative value left in simplify_comparison.
The statement is:
const_op <<= INTVAL (XEXP (op0, 1));

This patch guards the block of that statement by const_op >= 0.
I _think_ that's a correct thing to do for that transformation:
"If we have (compare (xshiftrt FOO N) (const_int C)) and
 the low order N bits of FOO are known to be zero, we can do this
 by comparing FOO with C shifted left N bits so long as no
 overflow occurs."

The constant C here is const_op.
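A minimal numeric sketch of why the guard is needed (the helper below is invented for illustration, not combine.c code): the rewrite compares FOO against C shifted left N bits, and in C that left shift is undefined when C is negative, so a nonnegative check must come first:

```cpp
#include <cassert>
#include <climits>

// Rewrite (x >> n) == c as x == (c << n).  Valid only when the low n
// bits of x are known to be zero and c << n cannot overflow; c must be
// nonnegative, since left-shifting a negative value is undefined
// behavior -- the case the new const_op >= 0 guard rules out.
static bool
cmp_via_shifted_const (int x, int c, int n)
{
  assert (c >= 0);                    // the guard added by the patch
  assert ((x & ((1 << n) - 1)) == 0); // low n bits of x are zero
  assert (c <= (INT_MAX >> n));       // shifting c cannot overflow
  return x == (c << n);
}
```

For any x whose low n bits are zero, the rewritten comparison agrees with the original `(x >> n) == c` under those preconditions.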

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-05-13  Kyrylo Tkachov  

PR middle-end/71074
* combine.c (simplify_comparison): Avoid left shift of negative
const_op in LSHIFTRT case.

2016-05-13  Kyrylo Tkachov  

PR middle-end/71074
* gcc.c-torture/compile/pr71074.c: New test.
diff --git a/gcc/combine.c b/gcc/combine.c
index 2a7a9e6e2b597246392ede22552af1bdd7e1a794..7a21d593777ef267942e0ee80e024b147907652f 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -12321,6 +12321,7 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 	 optimization and for > or <= by setting all the low
 	 order N bits in the comparison constant.  */
 	  if (CONST_INT_P (XEXP (op0, 1))
+	  && const_op >= 0
 	  && INTVAL (XEXP (op0, 1)) > 0
 	  && INTVAL (XEXP (op0, 1)) < HOST_BITS_PER_WIDE_INT
 	  && mode_width <= HOST_BITS_PER_WIDE_INT
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr71074.c b/gcc/testsuite/gcc.c-torture/compile/pr71074.c
new file mode 100644
index 0000000000000000000000000000000000000000..9ad6cbe7c231069c86d3ade22784f338f331b657
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr71074.c
@@ -0,0 +1,13 @@
+int bar (void);
+
+void
+foo (unsigned long long a, int b)
+{
+  int i;
+
+for (a = -12; a >= 10; a = bar ())
+  break;
+
+if (i == bar () || bar () >= a)
+  bar ();
+}


Re: [PATCH 3/3] Enhance dumps of IVOPTS

2016-05-13 Thread H.J. Lu
On Fri, May 13, 2016 at 3:44 AM, Martin Liška  wrote:
> On 05/13/2016 11:43 AM, Bin.Cheng wrote:
>> On Thu, May 12, 2016 at 5:41 PM, Martin Liška  wrote:
>>> On 05/12/2016 03:51 PM, Bin.Cheng wrote:
 On Thu, May 12, 2016 at 1:13 PM, Martin Liška  wrote:
> On 05/10/2016 03:16 PM, Bin.Cheng wrote:
>> Another way is to remove the use of id for struct iv_inv_expr_ent once
>> for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
>> to pointers, and rename iv_inv_expr_ent.id to count and use this to
>> record reference number in iv_ca.  This if-statement on dump_file can
>> be saved.  Also I think it simplifies current code a bit.  For now,
>> there are id <-> struct maps for different structures in IVOPT which
>> make it not straightforward.
>
> Hi.
>
> I'm sending the second version of the patch. I tried to follow your advice,
> but because an iv_inv_expr_ent can simultaneously belong to multiple iv_cas,
> putting a counter in iv_inv_expr_ent does not work. Instead of that, I've
> decided to replace used_inv_expr with a hash_map that contains the used
> inv_exprs, where the value of the map is the number of usages.
>
> Further questions:
> + iv_inv_expr_ent::id can be now removed as it's used just for purpose of 
> dumps
> Group 0:
>   cand  cost  scaled  freq  compl.  depends on
>   5     2     2.00    1.00  0
>   6     4     4.00    1.00  1       inv_expr:0
>   7     4     4.00    1.00  1       inv_expr:1
>   8     4     4.00    1.00  1       inv_expr:2
>
> That can be replaced with print_generic_expr, but I think using ids makes 
> the dump
> output more clear.
 I am okay with keeping id.  Could you please dump all inv_exprs in a
 single section like
 :
 inv_expr 0: print_generic_expr
 inv_expr 1: ...

 Then only dump the id afterwards?

>>>
>>> Sure, it would be definitely better:
>>>
>>> The new dump format looks:
>>>
>>> :
>>> inv_expr 0: sudoku_351(D) + (sizetype) S.833_774 * 4
>>> inv_expr 1: sudoku_351(D) + ((sizetype) S.833_774 * 4 + 
>>> 18446744073709551580)
>>> inv_expr 2: sudoku_351(D) + ((sizetype) S.833_774 + 72) * 4
>>> inv_expr 3: sudoku_351(D) + ((sizetype) S.833_774 + 81) * 4
>>> inv_expr 4:  + (sizetype) _377 * 4
>>> inv_expr 5:  + ((sizetype) _377 * 4 + 18446744073709551612)
>>> inv_expr 6:  + ((sizetype) _377 + 8) * 4
>>> inv_expr 7:  + ((sizetype) _377 + 9) * 4
>>>
>>> :
>>> Group 0:
>>>   cand  cost  scaled  freq  compl.  depends on
>>>
>>> ...
>>>
>>> Improved to:
>>>   cost: 27 (complexity 2)
>>>   cand_cost: 11
>>>   cand_group_cost: 10 (complexity 2)
>>>   candidates: 3, 5
>>>group:0 --> iv_cand:5, cost=(2,0)
>>>group:1 --> iv_cand:5, cost=(4,1)
>>>group:2 --> iv_cand:5, cost=(4,1)
>>>group:3 --> iv_cand:3, cost=(0,0)
>>>group:4 --> iv_cand:3, cost=(0,0)
>>>   invariants 1, 6
>>>   invariant expressions 6, 3
>>>
>>> The only question here is that as used_inv_exprs are stored in a hash_map,
>>> order of dumped invariants would not be stable. Is that a problem?
>> It is okay.
>>
>> Only nitpicking on this version.
>>
>>>
>
> + As check_GNU_style.sh reported multiple 8 spaces issues in hunks I've 
> touched, I decided
> to fix all 8 spaces issues. Hope it's fine.
>
> I'm going to test the patch.
> Thoughts?

 Some comments on the patch embedded.

>
> +/* Forward declaration.  */
 Not necessary.
> +struct iv_inv_expr_ent;
> +
>>>
>>> I think it's needed because struct cost_pair uses a pointer to 
>>> iv_inv_expr_ent.
>> I mean the comment, clearly the declaration is self-documented.
>
> Hi.
>
> Yeah, removed.
>
>>
>>> @@ -6000,11 +6045,12 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct 
>>> iv_ca *ivs,
>>>
>>>iv_ca_set_remove_invariants (ivs, cp->depends_on);
>>>
>>> -  if (cp->inv_expr_id != -1)
>>> +  if (cp->inv_expr != NULL)
>>>  {
>>> -  ivs->used_inv_expr[cp->inv_expr_id]--;
>>> -  if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
>>> -ivs->num_used_inv_expr--;
>>> +  unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
>>> +  --(*slot);
>>> +  if (*slot == 0)
>>> +ivs->used_inv_exprs->remove (cp->inv_expr);
>> I suppose insertion/removal of hash_map are not expensive?  Because
>> the algorithm causes a lot of these operations.
>
> I think it should be ~ a constant operation.
>
>>
>>> @@ -6324,12 +6368,26 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, 
>>> struct iv_ca *ivs)
>>>  fprintf (file, "   group:%d --> ??\n", group->id);
>>>  }
>>>
>>> +  bool any_invariant = false;
>>>for (i = 1; i <= data->max_inv_id; i++)
>>>  if (ivs->n_invariant_uses[i])
>>>{
>>> +const char *pref = any_invariant ? ", " : "  invariants ";
>>> +any_invariant = true;
>>>  fprintf (file, "%s%d", 

Re: Re: Re: [ARM] Enable __fp16 as a function parameter and return type.

2016-05-13 Thread Joseph Myers
On Fri, 13 May 2016, Tejas Belagod wrote:

> > It's not a change between the two versions of ACLE.  It's a change
> > relative to the early (pre-ACLE) __fp16 specification (or, at least, a
> > clarification thereto in email on 12 Aug 2008) that was used as a basis
> > for the original implementation of __fp16 in GCC (and that thus is what's
> > currently implemented by GCC and tested for in the testsuite).
> > 
> 
> Hi Joseph,
> 
> I can't seem to find that email on gcc-patches circa August 2008 - which list
> was it sent to?

That was an internal discussion between CodeSourcery and ARM.  Email from 
Alasdair Grant, Tue, 12 Aug 2008 08:41:15 +0100.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, GCC] PR middle-end/55299, fold bitnot through ASR and rotates

2016-05-13 Thread Mikhail Maltsev
On 05/11/2016 10:52 AM, Marc Glisse wrote:
> +/* ~((~X) >> Y) -> X >> Y (for arithmetic shift).  */
> +(simplify
> + (bit_not (convert? (rshift (bit_not @0) @1)))
> +  (if (!TYPE_UNSIGNED (TREE_TYPE (@0))
> +   && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)))
> +   (convert (rshift @0 @1
> 
> Is there a particular reason to split the converting / non-converting
> cases? For rotate, you managed to merge them nicely.
Fixed (i.e., merged two shift simplifications into one).
> 
> +
> +(simplify
> + (bit_not (convert? (rshift (convert@0 (bit_not @1)) @2)))
> +  (if (!TYPE_UNSIGNED (TREE_TYPE (@0))
> +   && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (TREE_TYPE (@1))
> +   && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)))
> +   (with
> +{ tree shift_type = TREE_TYPE (@0); }
> + (convert (rshift:shift_type (convert @1) @2)
> +
> +/* Same as above, but for rotates.  */
> +(for rotate (lrotate rrotate)
> + (simplify
> +  (bit_not (convert1?@0 (rotate (convert2?@1 (bit_not @2)) @3)))
> +   (if (TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (TREE_TYPE (@2))
> +&& TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (TREE_TYPE 
> (@1)))
> +(with
> + { tree operand_type = TREE_TYPE (@2); }
> +  (convert (rotate:operand_type @2 @3))
> 
> Is that really safe when the conversion from @2 to @1 is narrowing? I
> would expect something closer to
> (convert (rotate (convert:type_of_1 @2) @3))
> so the rotation is done in a type of the same precision as the original.
> 
> Or
> (convert (rotate:type_of_1 (convert @2) @3))
> if you prefer specifying the type there (I don't), and note that you
> need the 'convert' inside or specifying the type on rotate doesn't work.
Fixed.

> 
> I have a slight preference for element_precision over TYPE_PRECISION (which 
> for
> vectors is the number of elements), but I don't think it can currently cause
> issues for these particular transformations.
Fixed.
> 
> I don't know if we might want some :c / single_use restrictions, maybe on the
> outer convert and the rshift/rotate.
> 
I don't think :c can be used here. As for :s, I added it, as you suggested.

Also, I tried to add some more test cases for rotate with conversions, but
unfortunately GCC does not recognize the rotate pattern when narrowing
conversions are present.
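The identities being added are easy to sanity-check outside the compiler. This is an illustrative sketch (the function names are invented here), relying on GCC's arithmetic behavior for signed right shift:

```cpp
#include <cassert>
#include <climits>

#define INT_BITS (sizeof (int) * CHAR_BIT)
// Rotate left; y must be in (0, INT_BITS) to avoid an undefined shift.
#define ROL(x, y) ((x) << (y) | (x) >> (INT_BITS - (y)))

// ~(~x >> y) == x >> y for an arithmetic (sign-extending) right shift:
// NOT commutes with the shift because the bits shifted in are the
// complement of the original sign bit.
static int
asr_id (int x, int y)
{
  return ~(~x >> y);
}

// ~ROL (~x, y) == ROL (x, y): a rotation neither discards nor invents
// bits, so bitwise NOT commutes with it unconditionally.
static unsigned
rol_id (unsigned x, unsigned y)
{
  return ~ROL (~x, y);
}
```

These mirror the simplifications in the match.pd hunk: the pass should fold each body down to a plain shift or rotate with no `~` left.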

-- 
Regards,
Mikhail Maltsev
diff --git a/gcc/match.pd b/gcc/match.pd
index 55dd23c..bb4bba6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1453,6 +1453,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
  (bit_op (shift (convert @0) @1) { mask; }))
 
+/* ~(~X >> Y) -> X >> Y (for arithmetic shift).  */
+(simplify
+ (bit_not (convert1?:s (rshift:s (convert2?@0 (bit_not @1)) @2)))
+  (if (!TYPE_UNSIGNED (TREE_TYPE (@0))
+   && element_precision (TREE_TYPE (@0))
+  <= element_precision (TREE_TYPE (@1))
+   && element_precision (type) <= element_precision (TREE_TYPE (@0)))
+   (with
+{ tree shift_type = TREE_TYPE (@0); }
+ (convert (rshift (convert:shift_type @1) @2)
+
+/* ~(~X >>r Y) -> X >>r Y
+   ~(~X <<r Y) -> X <<r Y (for rotates).  */
...
+#define INT_BITS  (sizeof (int) * __CHAR_BIT__)
+#define ROL(x, y) ((x) << (y) | (x) >> (INT_BITS - (y)))
+#define ROR(x, y) ((x) >> (y) | (x) << (INT_BITS - (y)))
+
+unsigned
+rol (unsigned a, unsigned b)
+{
+  return ~ROL (~a, b);
+}
+
+unsigned int
+ror (unsigned a, unsigned b)
+{
+  return ~ROR (~a, b);
+}
+
+int
+rol_conv1 (int a, unsigned b)
+{
+  return ~(int)ROL((unsigned)~a, b);
+}
+
+int
+rol_conv2 (int a, unsigned b)
+{
+  return ~ROL((unsigned)~a, b);
+}
+
+int
+rol_conv3 (unsigned a, unsigned b)
+{
+  return ~(int)ROL(~a, b);
+}
+
+#define LONG_BITS  (sizeof (long) * __CHAR_BIT__)
+#define ROLL(x, y) ((x) << (y) | (x) >> (LONG_BITS - (y)))
+#define RORL(x, y) ((x) >> (y) | (x) << (LONG_BITS - (y)))
+
+unsigned long
+roll (unsigned long a, unsigned long b)
+{
+  return ~ROLL (~a, b);
+}
+
+unsigned long
+rorl (unsigned long a, unsigned long b)
+{
+  return ~RORL (~a, b);
+}
+
+/* { dg-final { scan-tree-dump-not "~" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/fold-notshift-1.c b/gcc/testsuite/gcc.dg/fold-notshift-1.c
new file mode 100644
index 000..2de236f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-notshift-1.c
@@ -0,0 +1,77 @@
+/* PR tree-optimization/54579
+   PR middle-end/55299 */
+
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-cddce1" } */
+
+int
+asr1 (int a, int b)
+{
+  return ~((~a) >> b);
+}
+
+long
+asr1l (long a, long b)
+{
+  return ~((~a) >> b);
+}
+
+int
+asr_conv (unsigned a, unsigned b)
+{
+  return ~((int)~a >> b);
+}
+
+unsigned
+asr_conv2 (unsigned a, unsigned b)
+{
+  return ~(unsigned)((int)~a >> b);
+}
+
+unsigned
+asr_conv3 (int a, int b)
+{
+  return ~(unsigned)(~a >> b);
+}
+
+typedef __INT32_TYPE__ int32_t;
+typedef __INT64_TYPE__ int64_t;
+
+int32_t
+asr_conv4 (int64_t a, int b)
+{
+  return ~((int32_t)~a >> b);
+}
+
+int32_t
+asr_conv5 (int64_t a, int b)
+{
+  return ~(int32_t)(~a >> b);
+}
+
+int

Re: [patch] Fix PR tree-optimization/70884

2016-05-13 Thread Richard Biener
On Fri, May 13, 2016 at 1:01 PM, Eric Botcazou  wrote:
>> Hmm, the patch looks obvious if it was the intent to allow constant
>> pool replacements
>> _not_ only when the whole constant pool entry may go away.  But I
>> think the intent was
>> to not do this otherwise it will generate worse code by forcing all
>> loads from the constant pool to appear at
>> function start.
>
> Do you mean when the whole constant pool entry is scalarized as opposed to
> partially scalarized?

When we scalarized all constant pool accesses (even if it is not fully
accessed).
The whole point was to be able to remove the constant pool entry later.

At least if I remember correctly ... (it should in the end do what we now do
at gimplification time but with better analysis on the cost/benefit).

>> So - the "real" issue may be a missing
>> should_scalarize_away_bitmap/cannot_scalarize_away_bitmap
>> check somewhere.
>
> This seems to work:
>
> Index: tree-sra.c
> ===
> --- tree-sra.c  (revision 236195)
> +++ tree-sra.c  (working copy)
> @@ -2680,6 +2680,10 @@ analyze_all_variable_accesses (void)
>EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
>  {
>tree var = candidate (i);
> +  if (constant_decl_p (var)
> + && (!bitmap_bit_p (should_scalarize_away_bitmap, i)
> + || bitmap_bit_p (cannot_scalarize_away_bitmap, i)))
> +   continue;
>struct access *access;
>
>access = sort_and_splice_var_accesses (var);
>
> but I have no idea whether this is correct or not.
>
>
> Martin, are we sure to disable scalarization of constant_decl_p variables not
> covered by initialize_constant_pool_replacements that way?

Does the above "work"?  Aka, not cause testsuite regressions?  I remember
the original patch was mostly for ARM (where we don't scalarize sth at
gimplification time
for some "cost" reason).

Thanks,
Richard.

> --
> Eric Botcazou


Re: [PATCH] Improve other 13 define_insns

2016-05-13 Thread Jakub Jelinek
On Fri, May 13, 2016 at 02:46:36PM +0300, Kirill Yukhin wrote:
> NP. Below is the patch which fixes both issues.
> It also revealed, that for "*and" pattern 32 byte long
> internal buffer is not enough.
> I've extended bunch of such buffers to 128 bytes.
> 
> Probably we might want to re-factor all static char arrays w/
> std::string or at least check how many bytes snprintf actually prints
> and ICE if overflow.

I'd prefer not to use std::string for that, but snprintf and asserts
on the result look reasonable thing to do (though, the snprintf shouldn't
be inside of the assert) obviously.
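The suggested pattern -- call snprintf into the fixed buffer, then assert on its return value rather than burying the call inside the assert -- can be sketched like this (`checked_format` and its format string are made up for illustration; GCC itself would use gcc_assert):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Format an insn template into BUF, asserting instead of silently
// truncating when the buffer would overflow.
static const char *
checked_format (char *buf, std::size_t len,
                const char *suffix, const char *ops)
{
  int n = snprintf (buf, len, "andn%s\t{%s}", suffix, ops);
  // snprintf returns the length it *wanted* to write; n >= len means
  // the output was truncated, which here would be a compiler bug.
  assert (n >= 0 && (std::size_t) n < len);
  return buf;
}
```

Keeping the snprintf call outside the assert matters because asserts can compile away (NDEBUG), which would drop the formatting itself.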

Jakub


Re: [PATCH] Improve other 13 define_insns

2016-05-13 Thread Kirill Yukhin
Hello,
On 12 May 20:42, Jakub Jelinek wrote:
> On Thu, May 12, 2016 at 05:20:02PM +0300, Kirill Yukhin wrote:
> > > 2016-05-04  Jakub Jelinek  
> > > 
> > >   * config/i386/sse.md (sse_shufps_, sse_storehps, sse_loadhps,
> > >   sse_storelps, sse_movss, avx2_vec_dup, avx2_vec_dupv8sf_1,
> > >   sse2_shufpd_, sse2_storehpd, sse2_storelpd, sse2_loadhpd,
> > >   sse2_loadlpd, sse2_movsd): Use v instead of x in vex or maybe_vex
> > >   alternatives, use maybe_evex instead of vex in prefix.
> > > 
> > >  ;; Avoid combining registers from different units in a single 
> > > alternative,
> > >  ;; see comment above inline_secondary_memory_needed function in i386.c
> > >  (define_insn "sse2_storehpd"
> > > -  [(set (match_operand:DF 0 "nonimmediate_operand" "=m,x,x,x,*f,r")
> > > +  [(set (match_operand:DF 0 "nonimmediate_operand" "=m,x,v,x,*f,r")
> > >   (vec_select:DF
> > > -   (match_operand:V2DF 1 "nonimmediate_operand" " x,0,x,o,o,o")
> > > +   (match_operand:V2DF 1 "nonimmediate_operand" " v,0,v,o,o,o")
> > Same (as [1]) here.
> > Testing this fix:
> > @@ -8426,7 +8426,7 @@
> >  ;; Avoid combining registers from different units in a single alternative,
> >  ;; see comment above inline_secondary_memory_needed function in i386.c
> >  (define_insn "sse2_storehpd"
> > -  [(set (match_operand:DF 0 "nonimmediate_operand" "=m,x,v,x,*f,r")
> > +  [(set (match_operand:DF 0 "nonimmediate_operand" "=m,x,Yv,x,*f,r")
> > (vec_select:DF
> >   (match_operand:V2DF 1 "nonimmediate_operand" " v,0,v,o,o,o")
> > 
> > 
> 
> Sorry for that, yes, this is needed.
NP. Below is the patch which fixes both issues.
It also revealed that for the "*and" pattern a 32-byte
internal buffer is not enough.
I've extended a bunch of such buffers to 128 bytes.

Probably we might want to refactor all the static char arrays to use
std::string, or at least check how many bytes snprintf actually prints
and ICE on overflow.

Bootstrapped & regtested on x86, 32/64b. Will check into main trunk.

gcc/
* config/i386/sse.md (define_insn "*andnot3"): Extend static
array to 128 chars.
(define_insn "*andnottf3"): Ditto.
(define_insn "*3"/any_logic): Ditto.
(define_insn "*tf3"/any_logic): Ditto.
(define_insn "*vec_concatv2sf_sse4_1"): Use Yv constraint for scalar
operand to block AVX-512VL insn variant emit when it is not enabled.
(define_insn "sse2_storehpd"): Ditto.

--
Thanks, K

> 
>   Jakub


commit fa17bc4026602e7bc79fd818ab97c991864c2f7c
Author: Kirill Yukhin 
Date:   Fri May 13 12:38:50 2016 +0300

AVX-512. Fix constraints for scalar patterns. Extend 32b static buffers.

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d77227a..1ece86f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3027,7 +3027,7 @@
(match_operand:MODEF 2 "register_operand" "x,x,v,v")))]
   "SSE_FLOAT_MODE_P (mode)"
 {
-  static char buf[32];
+  static char buf[128];
   const char *ops;
   const char *suffix
 = (get_attr_mode (insn) == MODE_V4SF) ? "ps" : "";
@@ -3094,7 +3094,7 @@
  (match_operand:TF 2 "vector_operand" "xBm,xm,vm,v")))]
   "TARGET_SSE"
 {
-  static char buf[32];
+  static char buf[128];
   const char *ops;
   const char *tmp
 = (which_alternative >= 2 ? "pandnq"
@@ -3150,7 +3150,7 @@
  (match_operand:MODEF 2 "register_operand" "x,x,v,v")))]
   "SSE_FLOAT_MODE_P (mode)"
 {
-  static char buf[32];
+  static char buf[128];
   const char *ops;
   const char *suffix
 = (get_attr_mode (insn) == MODE_V4SF) ? "ps" : "";
@@ -3225,7 +3225,7 @@
   "TARGET_SSE
&& ix86_binary_operator_ok (, TFmode, operands)"
 {
-  static char buf[32];
+  static char buf[128];
   const char *ops;
   const char *tmp
 = (which_alternative >= 2 ? "pq"
@@ -6546,12 +6546,12 @@
 ;; unpcklps with register source since it is shorter.
 (define_insn "*vec_concatv2sf_sse4_1"
   [(set (match_operand:V2SF 0 "register_operand"
- "=Yr,*x,v,Yr,*x,v,v,*y ,*y")
+ "=Yr,*x, v,Yr,*x,v,v,*y ,*y")
(vec_concat:V2SF
  (match_operand:SF 1 "nonimmediate_operand"
- "  0, 0,v, 0,0, v,m, 0 , m")
+ "  0, 0,Yv, 0,0, v,m, 0 , m")
  (match_operand:SF 2 "vector_move_operand"
- " Yr,*x,v, m,m, m,C,*ym, C")))]
+ " Yr,*x,Yv, m,m, m,C,*ym, C")))]
   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
unpcklps\t{%2, %0|%0, %2}
@@ -8426,9 +8426,9 @@
 ;; Avoid combining registers from different units in a single alternative,
 ;; see comment above inline_secondary_memory_needed function in i386.c
 (define_insn "sse2_storehpd"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=m,x,v,x,*f,r")
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=m,x,Yv,x,*f,r")
(vec_select:DF
- (match_operand:V2DF 1 "nonimmediate_operand" " v,0,v,o,o,o")
+ (match_operand:V2DF 1 "nonimmediate_operand" " v,0, 

Re: [PATCH, DOC] Document ASAN_OPTIONS="halt_on_error" env variable.

2016-05-13 Thread Martin Liška
On 05/13/2016 01:03 PM, Jakub Jelinek wrote:
> On Fri, May 13, 2016 at 12:26:57PM +0200, Martin Liška wrote:
>> On 05/12/2016 02:44 PM, Jakub Jelinek wrote:
>>> I think it isn't obvious that one needs to put halt_on_error=0 or
>>> halt_on_error=1 into those options and what to do if you need multiple
>>> options in there.
>>
>> What about changing the last sentence to:
>>
>> This can be overridden through a corresponding environment variable.
> 
> I think it is better to be explicit, so something like:
> This can be overridden through setting the halt_on_error flag in
> the corresponding environment variable.
> 
>   Jakub
> 

Good, I've just installed following patch as r236202.

Martin
>From f5ed058b7bacb1f75407765d2511fe1507901214 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 13 May 2016 12:27:07 +0200
Subject: [PATCH] Enhance explanation of halt_on_error.

gcc/ChangeLog:

2016-05-13  Martin Liska  

	* doc/invoke.texi: Enhance explanation of error recovery
	of sanitizers.
---
 gcc/doc/invoke.texi | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c126644..97e5060 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10002,9 +10002,8 @@ Even if a recovery mode is turned on the compiler side, it needs to be also
 enabled on the runtime library side, otherwise the failures are still fatal.
 The runtime library defaults to @code{halt_on_error=0} for
 ThreadSanitizer and UndefinedBehaviorSanitizer, while default value for
-AddressSanitizer is @code{halt_on_error=1}. This can overridden through
-the following environment variables: @env{ASAN_OPTIONS}, @env{TSAN_OPTIONS},
-@env{UBSAN_OPTIONS}.
+AddressSanitizer is @code{halt_on_error=1}. This can be overridden through
+setting the @code{halt_on_error} flag in the corresponding environment variable.
 
 Syntax without explicit @var{opts} parameter is deprecated.  It is equivalent to
 @smallexample
-- 
2.8.2
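To make the documented behaviour concrete, a small hypothetical demo
(not part of the patch): build a driver that calls `demo (1)` with
`gcc -fsanitize=address -fsanitize-recover=address`, then compare a run
with the AddressSanitizer default `halt_on_error=1` (aborts at the first
report) against `ASAN_OPTIONS=halt_on_error=0` (reports both bad reads
and keeps going).

```c
#include <stdio.h>
#include <stdlib.h>

/* Performs two out-of-bounds heap reads when TRIGGER is nonzero,
   and is well defined when TRIGGER is zero.  */
int
demo (int trigger)
{
  int *p = malloc (4 * sizeof (int));
  int sum = 0;
  if (trigger)
    {
      sum += p[4];   /* first heap-buffer-overflow report        */
      sum += p[5];   /* only reached in recovery mode            */
    }
  free (p);
  return sum;
}
```

Note that recovery needs both sides, exactly as @file{invoke.texi} now
says: `-fsanitize-recover=address` at compile time and
`halt_on_error=0` in the environment at run time.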



[gomp4.5] Further progress on OpenMP 4.5 parsing

2016-05-13 Thread Jakub Jelinek
Hi!

This patch adds parsing of new OpenMP 4.5 constructs (though, collapse(n)
is still not handled), but we don't do anything about those during resolve
and later.

Committed to gomp-4_5-branch.

2016-05-13  Jakub Jelinek  

* parse.c (decode_omp_directive): Use gfc_match_omp_end_critical
instead of gfc_match_omp_critical for !$omp end critical.
Handle new OpenMP 4.5 constructs.
(case_executable): Add ST_OMP_TARGET_ENTER_DATA and
ST_OMP_TARGET_EXIT_DATA cases.
(case_exec_markers): Add ST_OMP_TARGET_PARALLEL,
ST_OMP_TARGET_PARALLEL_DO, ST_OMP_TARGET_PARALLEL_DO_SIMD,
ST_OMP_TARGET_SIMD, ST_OMP_TASKLOOP and ST_OMP_TASKLOOP_SIMD cases.
(gfc_ascii_statement): Handle new OpenMP 4.5 constructs.
(parse_omp_do): Handle ST_OMP_TARGET_PARALLEL_DO,
ST_OMP_TARGET_PARALLEL_DO_SIMD, ST_OMP_TASKLOOP and
ST_OMP_TASKLOOP_SIMD.
(parse_omp_structured_block): Handle EXEC_OMP_END_CRITICAL instead
of EXEC_OMP_CRITICAL, adjust for EXEC_OMP_CRITICAL having omp clauses
now.
(parse_executable): Handle ST_OMP_TARGET_PARALLEL,
ST_OMP_TARGET_PARALLEL_DO, ST_OMP_TARGET_PARALLEL_DO_SIMD,
ST_OMP_TASKLOOP and ST_OMP_TASKLOOP_SIMD.
* st.c (gfc_free_statement): Handle EXEC_OMP_END_CRITICAL like
EXEC_OMP_CRITICAL before, free clauses for EXEC_OMP_CRITICAL
and new OpenMP 4.5 constructs.
* dump-parse-tree.c (show_omp_node): Formatting fixes.  Adjust
handling of EXEC_OMP_CRITICAL, handle new OpenMP 4.5 constructs
and some forgotten OpenMP 4.0 constructs.
(show_code_node): Handle new OpenMP 4.5 constructs and some forgotten
OpenMP 4.0 constructs.
* trans-openmp.c (gfc_trans_omp_critical): Adjust EXEC_OMP_CRITICAL
handling.
* gfortran.h (enum gfc_statement): Add ST_OMP_TARGET_PARALLEL,
ST_OMP_END_TARGET_PARALLEL, ST_OMP_TARGET_PARALLEL_DO,
ST_OMP_END_TARGET_PARALLEL_DO, ST_OMP_TARGET_PARALLEL_DO_SIMD,
ST_OMP_END_TARGET_PARALLEL_DO_SIMD, ST_OMP_TARGET_ENTER_DATA,
ST_OMP_TARGET_EXIT_DATA, ST_OMP_TARGET_SIMD, ST_OMP_END_TARGET_SIMD,
ST_OMP_TASKLOOP, ST_OMP_END_TASKLOOP, ST_OMP_TASKLOOP_SIMD and
ST_OMP_END_TASKLOOP_SIMD.
(struct gfc_omp_clauses): Add critical_name field.
(enum gfc_exec_op): Add EXEC_OMP_END_CRITICAL,
EXEC_OMP_TARGET_ENTER_DATA, EXEC_OMP_TARGET_EXIT_DATA,
EXEC_OMP_TARGET_PARALLEL, EXEC_OMP_TARGET_PARALLEL_DO,
EXEC_OMP_TARGET_PARALLEL_DO_SIMD, EXEC_OMP_TARGET_SIMD,
EXEC_OMP_TASKLOOP, EXEC_OMP_TASKLOOP_SIMD.
* frontend-passes.c (gfc_code_walker): Handle EXEC_OMP_CRITICAL,
EXEC_OMP_TASKLOOP, EXEC_OMP_TASKLOOP_SIMD, EXEC_OMP_TARGET_ENTER_DATA,
EXEC_OMP_TARGET_EXIT_DATA, EXEC_OMP_TARGET_PARALLEL,
EXEC_OMP_TARGET_PARALLEL_DO, EXEC_OMP_TARGET_PARALLEL_DO_SIMD and
EXEC_OMP_TARGET_SIMD.
* openmp.c (gfc_free_omp_clauses): Free critical_name field.
(OMP_DO_CLAUSES): Add OMP_CLAUSE_LINEAR.
(OMP_SIMD_CLAUSES): Add OMP_CLAUSE_SIMDLEN.
(OMP_TASKLOOP_CLAUSES, OMP_TARGET_ENTER_DATA_CLAUSES,
OMP_TARGET_EXIT_DATA_CLAUSES): Define.
(gfc_match_omp_critical): Parse optional clauses and use omp_clauses
union member instead of omp_name.
(gfc_match_omp_end_critical): New function.
(gfc_match_omp_distribute_parallel_do): Remove ordered and linear
clauses from the mask.
(gfc_match_omp_do_simd): Don't remove ordered clause from the mask.
(gfc_match_omp_parallel_do_simd): Likewise.
(gfc_match_omp_task, gfc_match_omp_taskwait, gfc_match_omp_taskyield):
Move around to where they belong alphabetically.
(gfc_match_omp_target_enter_data, gfc_match_omp_target_exit_data,
gfc_match_omp_target_parallel, gfc_match_omp_target_parallel_do,
gfc_match_omp_target_parallel_do_simd, gfc_match_omp_target_simd): New
functions.
(gfc_match_omp_target_teams_distribute_parallel_do): Remove ordered
and linear clauses from the mask.
(gfc_match_omp_taskloop, gfc_match_omp_taskloop_simd): New functions.
(gfc_match_omp_teams_distribute_parallel_do): Remove ordered and
linear clauses from the mask.
* match.h (gfc_match_omp_target_enter_data,
gfc_match_omp_target_exit_data, gfc_match_omp_target_parallel,
gfc_match_omp_target_parallel_do,
gfc_match_omp_target_parallel_do_simd, gfc_match_omp_target_simd,
gfc_match_omp_taskloop, gfc_match_omp_taskloop_simd,
gfc_match_omp_end_critical): New prototypes.

* gfortran.dg/gomp/target1.f90: Remove ordered clause where it is
no longer allowed and corresponding ordered construct.
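This patch only adds the Fortran parsing; purely as an illustration of
what one of the new constructs means, here is a hypothetical example (in
the C spelling, which is not what the patch touches) of `taskloop`:

```c
/* OpenMP 4.5 taskloop: `single' makes one thread generate the tasks,
   `taskloop' partitions the loop iterations among them.  Built without
   -fopenmp the pragmas are ignored and the loop runs serially, with
   the same result.  */
int
squares_sum (int n, int *a)
{
  int sum = 0;

  #pragma omp parallel
  #pragma omp single
  #pragma omp taskloop
  for (int i = 0; i < n; i++)
    a[i] = i * i;

  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}
```

The Fortran spelling added here is `!$omp taskloop` / `!$omp end
taskloop`, with the clause handling still to be wired up in resolve.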

--- gcc/fortran/parse.c.jj  2016-05-04 18:37:35.0 +0200
+++ gcc/fortran/parse.c 2016-05-13 11:49:47.887238121 +0200
@@ -765,7 +765,7 @@ 

Re: [PATCH][ARM] PR target/70830: Avoid POP-{reglist}^ when returning from interrupt handlers

2016-05-13 Thread Kyrill Tkachov

Hi Christophe,

On 12/05/16 20:57, Christophe Lyon wrote:

On 12 May 2016 at 11:48, Ramana Radhakrishnan  wrote:

On Thu, May 5, 2016 at 12:50 PM, Kyrill Tkachov
 wrote:

Hi all,

In this PR we deal with some fallout from the conversion to unified
assembly.
We now end up emitting instructions like:
   pop {r0,r1,r2,r3,pc}^
which is not legal. We have to use an LDM form.

There are bugs in two arm.c functions: output_return_instruction and
arm_output_multireg_pop.

In output_return_instruction the buggy hunk from the conversion was:
   else
-   if (TARGET_UNIFIED_ASM)
   sprintf (instr, "pop%s\t{", conditional);
-   else
- sprintf (instr, "ldm%sfd\t%%|sp!, {", conditional);

The code was already very obscurely structured and arguably the bug was
latent.  It emitted POP only when TARGET_UNIFIED_ASM was on, and since
TARGET_UNIFIED_ASM was on only for Thumb, we never went down this path
for interrupt handling code, since the interrupt attribute is only
available for ARM code.  After the removal of TARGET_UNIFIED_ASM we
ended up using POP unconditionally.  So this patch adds a check for
IS_INTERRUPT and outputs the appropriate LDM form.
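A hedged C sketch of the selection described above (hypothetical
helper, not the actual arm.c code): interrupt returns must open an
LDMFD template with SP writeback rather than POP, since `pop {...}^`
is not a legal form.

```c
#include <stdio.h>

/* Start the assembly template for a multi-register return; the caller
   appends the register list, "}" or "}^".  */
static void
start_return_insn (char *instr, size_t len,
                   const char *conditional, int is_interrupt)
{
  if (is_interrupt)
    /* Exception return: LDM form with SP update.  */
    snprintf (instr, len, "ldm%sfd\t%%|sp!, {", conditional);
  else
    snprintf (instr, len, "pop%s\t{", conditional);
}
```

The `%%|` doubles the `%` so that the assembler-dialect `%|` escape
survives into the template, matching the style of the surrounding
sprintf calls quoted above.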

In arm_output_multireg_pop the buggy hunk was:
-  if ((regno_base == SP_REGNUM) && TARGET_THUMB)
+  if ((regno_base == SP_REGNUM) && update)
  {
-  /* Output pop (not stmfd) because it has a shorter encoding.  */
-  gcc_assert (update);
sprintf (pattern, "pop%s\t{", conditional);
  }

Again, the POP was guarded on TARGET_THUMB and so would never be taken on
interrupt handling
routines. This patch guards that with the appropriate check on interrupt
return.

Also, there are a couple of bugs in the 'else' branch of that 'if':
* The "ldmfd%s" was output without a '\t' at the end which meant that the
base register
name would be concatenated with the 'ldmfd', creating invalid assembly.

* The logic:

   if (regno_base == SP_REGNUM)
   /* update is never true here, hence there is no need to handle
  pop here.  */
 sprintf (pattern, "ldmfd%s", conditional);

   if (update)
 sprintf (pattern, "ldmia%s\t", conditional);
   else
 sprintf (pattern, "ldm%s\t", conditional);

This meant that for "regno == SP_REGNUM && !update" we'd end up printing
"ldmfd%sldm%s\t" to pattern.  I didn't manage to reproduce that
condition though, so maybe it can't ever occur.  This patch fixes both
these issues nevertheless.

I've added the testcase from the PR to catch the fix in
output_return_instruction.
The testcase doesn't catch the bugs in arm_output_multireg_pop, but the
existing tests
gcc.target/arm/interrupt-1.c and gcc.target/arm/interrupt-2.c would have
caught them
if only they were assemble tests rather than just compile. So this patch
makes them
assembly tests (and reverts the scan-assembler checks for the correct LDM
pattern).

Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk and GCC 6?


Hi Kyrill,

Did you test --with-mode=thumb?
When using arm mode, I see regressions:

   gcc.target/arm/neon-nested-apcs.c (test for excess errors)
   gcc.target/arm/nested-apcs.c (test for excess errors)


It's because I have a local patch in my binutils that makes gas warn on the
deprecated sequences that these two tests generate (they use the deprecated
-mapcs option), so these tests were already showing the (test for excess
errors) FAIL for me, and they didn't appear in my tests diff for this patch. :(

I've reproduced the failure with a clean tree.
Where before we generated:
ldm     sp, {fp, sp, pc}
now we generate:
pop     {fp, sp, pc}

which are not equivalent (pop performs a write-back) and gas warns:
Warning: writeback of base register when in register list is UNPREDICTABLE

I'm testing a patch to fix this.
Sorry for the regression.
Kyrill


Christophe


Thanks,
Kyrill

2016-05-05  Kyrylo Tkachov  

 PR target/70830
 * config/arm/arm.c (arm_output_multireg_pop): Avoid POP instruction
 when popping the PC and within an interrupt handler routine.
 Add missing tab to output of "ldmfd".
 (output_return_instruction): Output LDMFD with SP update rather
 than POP when returning from interrupt handler.

2016-05-05  Kyrylo Tkachov  

 PR target/70830
 * gcc.target/arm/interrupt-1.c: Change dg-compile to dg-assemble.
 Add -save-temps to dg-options.
 Scan for ldmfd rather than pop instruction.
 * gcc.target/arm/interrupt-2.c: Likewise.
 * gcc.target/arm/pr70830.c: New test.


OK for affected branches and trunk  - thanks for fixing this and sorry
about the breakage.

Ramana



