Deque iterators lexicographical_compare overloads

2019-07-25 Thread François Dumont

And here are the lexicographical_compare overloads.

I am submitting this seperately from the others cause some might argue 
that it is a lot of code for a scope limited to the moment to unsigned 
byte types.


I had to overload __lexicographical_compare_aux here cause the 
std::lexicographical_compare returning a bool can't be implemented in 
chunk unless you double the comparisons to find out when 2 ranges are 
equivalent.



    * include/bits/stl_deque.h
    (std::__lexicographical_compare_aux): New overloads for deque 
iterators.

    * include/bits/deque.tcc
    (std::__detail::__lc_from_dit): New.
    (std::__detail::__lc_to_dit): New.
    (std::__lexicographical_compare_aux): New overloads definitions, use
    latters.
    * include/debug/deque
    (std::__lexicographical_compare_aux): New overloads for deque debug
    iterators.
    * include/bits_stl_algobase.h:
    (__lexicographical_compare_impl): Return int.
    (__lexicographical_compare<>::__lc): Likewise.
    (_Deque_iterator<>): New type declaration.
    (__lexicographical_compare_aux): Add overload declarations for deque
    iterators normal and debug modes.
    (__lexicographical_compare_aux): Return int.
    (std::lexicographical_compare): Adapt.
    * testsuite/25_algorithms/lexicographical_compare/1.cc (test6): New.
    (test7): New.
    * testsuite/25_algorithms/lexicographical_compare/deque_iterators/1.cc:
    New.

Tested uner Linux x86_64 normal and debug modes.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/deque.tcc b/libstdc++-v3/include/bits/deque.tcc
index 9db869fb666..87514992f50 100644
--- a/libstdc++-v3/include/bits/deque.tcc
+++ b/libstdc++-v3/include/bits/deque.tcc
@@ -1167,6 +1167,80 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 	return true;
   }
 
+template
+  int
+  __lc_from_dit(
+	const _GLIBCXX_STD_C::_Deque_iterator<_Tp, const _Tp&,
+	  const _Tp*>& __first1,
+	const _GLIBCXX_STD_C::_Deque_iterator<_Tp, const _Tp&,
+	  const _Tp*>& __last1,
+	_II __first2, _II __last2)
+  {
+	typedef typename _GLIBCXX_STD_C::_Deque_iterator<_Tp, const _Tp&,
+			 const _Tp*>::_Self
+	  _Self;
+	typedef typename _Self::difference_type difference_type;
+
+	if (__first1._M_node != __last1._M_node)
+	  {
+	difference_type __len = __last2 - __first2;
+	difference_type __flen
+	  = std::min(__len, __first1._M_last() - __first1._M_cur);
+	if (int __ret = std::__lexicographical_compare_aux(
+		__first1._M_cur, __first1._M_last(), __first2, __first2 + __flen))
+	  return __ret;
+
+	__first2 += __flen;
+	__len -= __flen;
+	__flen = std::min(__len, _Self::_S_buffer_size());
+	for (typename _Self::_Map_pointer __node = __first1._M_node + 1;
+		 __node != __last1._M_node;
+		 __first2 += __flen, __len -= __flen,
+		   __flen = std::min(__len, _Self::_S_buffer_size()), ++__node)
+	  if (int __ret = std::__lexicographical_compare_aux(
+	*__node, *__node + _Self::_S_buffer_size(),
+	__first2, __first2 + __flen))
+		return __ret;
+
+	return std::__lexicographical_compare_aux(
+		__last1._M_first, __last1._M_cur, __first2, __last2);
+	  }
+
+	return std::__lexicographical_compare_aux(
+		__first1._M_cur, __last1._M_cur, __first2, __last2);
+  }
+
+template
+  int
+  __lc_to_dit(_II __first1, _II __last1,
+	_GLIBCXX_STD_C::_Deque_iterator<_Tp, const _Tp&, const _Tp*> __first2,
+	_GLIBCXX_STD_C::_Deque_iterator<_Tp, const _Tp&, const _Tp*> __last2)
+  {
+	typedef typename _GLIBCXX_STD_C::_Deque_iterator<_Tp, const _Tp&,
+			 const _Tp*>::_Self
+	  _Self;
+	typedef typename _Self::difference_type difference_type;
+
+	difference_type __len = __last1 - __first1;
+	while (__len > 0)
+	  {
+	const difference_type __flen = __first2._M_node == __last2._M_node
+	  ? __last2._M_cur - __first2._M_cur
+	  : __first2._M_last() - __first2._M_cur;
+	const difference_type __clen
+	  = std::min(__len, __flen);
+	if (int __ret = std::__lexicographical_compare_aux(
+		__first1, __first1 + __clen, __first2._M_cur, __first2._M_cur + __flen))
+	  return __ret;
+
+	__first1 += __clen;
+	__len -= __clen;
+	__first2 += __clen;
+	  }
+
+	return __first1 == __last1 ? (__first2 != __last2 ? -1 : 0) : 1;
+  }
+
 #if __cplusplus >= 201103L
 template
   _OI
@@ -1394,6 +1468,36 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   return __detail::__equal_to_dit(__first1, __last1, __first2);
 }
 
+  template
+typename __gnu_cxx::__enable_if<
+  __are_same::iterator_category,
+		 std::random_access_iterator_tag>::__value,
+  int>::__type
+__lexicographical_compare_aux(
+  _GLIBCXX_STD_C::_Deque_iterator<_Tp, const _Tp&, const _Tp*> __first1,
+  _GLIBCXX_STD_C::_Deque_iterator<_Tp, const _Tp&, const _Tp*> __last1,
+  _II __first2, _II __last2)
+{ return __detail::__lc_from_dit(__first1, __last1, __first2, __last2); }
+
+  template
+int
+__lexicographical_compare_aux(
+  

[PATCH] Fix bad comment copy/paste

2019-07-25 Thread François Dumont

Committed as trivial.

    * testsuite/util/testsuite_iterators.h
    (bidirectional_iterator_wrapper): Fix type comment.
    (random_access_iterator_wrapper): Likewise.

François

diff --git a/libstdc++-v3/testsuite/util/testsuite_iterators.h b/libstdc++-v3/testsuite/util/testsuite_iterators.h
index ac646a59cb8..42e42740df9 100644
--- a/libstdc++-v3/testsuite/util/testsuite_iterators.h
+++ b/libstdc++-v3/testsuite/util/testsuite_iterators.h
@@ -344,7 +344,7 @@ namespace __gnu_test
* @brief bidirectional_iterator wrapper for pointer
*
* This class takes a pointer and wraps it to provide exactly
-   * the requirements of a forward_iterator. It should not be
+   * the requirements of a bidirectional_iterator. It should not be
* instantiated directly, but generated from a test_container
*/
   template
@@ -408,7 +408,7 @@ namespace __gnu_test
* @brief random_access_iterator wrapper for pointer
*
* This class takes a pointer and wraps it to provide exactly
-   * the requirements of a forward_iterator. It should not be
+   * the requirements of a random_access_iterator. It should not be
* instantiated directly, but generated from a test_container
*/
   template



Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-25 Thread Jeff Law
On 7/24/19 12:07 PM, Martin Sebor wrote:

> 
> I don't know what Jakub had in mind but the mapping I envision is
> one like hash_map that would make it possible to set
> a bit for each distinct warning for every tree node.  It would let
> us set a bit for -Wuninitialized while leaving the bit for
> -Warray-bounds (and all other warnings) clear.
Ah, yes.  I like that.  I'm still worried about the linkage between the
map and the GC system, but a  has a lot of potential.

> 
>>
>> If the bit were on an SSA_NAME, or a _DECL node, then the flag bit is
>> shared and would be a much larger concern.
> 
> For shared objects the mapping would have to be more involved but
> I haven't thought about it in any detail to have an idea what it
> might look like.
I suspect shared objects are just going to be painful.  A solution which
worked on EXPR nodes would still be a significant step forward.


> 
> Anyway, if/when someone does come up with a solution for this we
> will have to go through all the places where the no-warning bit
> is set and replace them with whatever replacement we come up with.
> One instance more or less won't make a difference.  I just wanted
> to point out that setting the bit is not a robust solution.
Yea, but at least they're easy to find via the TREE_NO_WARNING flag.
Hopefully we've got tests for most of the issues we're working around
with that bit.

jeff


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-25 Thread Jeff Law
On 7/24/19 12:33 PM, Richard Biener wrote:
> On July 24, 2019 8:18:57 PM GMT+02:00, Jeff Law 
> wrote:
>> On 7/24/19 11:00 AM, Richard Biener wrote: [ Big snip, ignore
>> missing reply attributions... ]
>> 
 it. But I'd claim that if callers are required not to change
 these ranges, then the callers are fundamentally broken.  I'm
 not sure what the "sanitization" is really buying you here.
 Can you point to something specific?
 
> 
> But you lose the sanitizing that nobody can change it and the
>  changed info leaks to other SSA vars.
> 
> As said, fix all callers to deal with NULL.
> 
> But I argue the current code is exactly optimal and safe.
 ANd I'd argue that it's just plain broken and that the 
 sanitization you're referring to points to something broken 
 elsewhere,  higher up in the callers.
>>> 
>>> Another option is to make get_value_range return by value and
>>> the only way to change the lattice to call an appropriate set
>>> function. I think we already do the latter in all cases (but we
>>> use get_value_range in the setter) and returning by reference is
>>> just eliding the copy.
>> OK, so what I think you're getting at (and please correct me if
>> I'm wrong) is that once the lattice values are set, you don't want 
>> something changing the recorded ranges underneath?
>> 
>> ISTM the way to enforce that is to embed the concept in the class
>> and enforce it by not allowing direct manipulation of range by the
>> clients. So a client that wants this behavior somehow tells the
>> class that ranges are "set in stone" and from that point the
>> setters don't allow changing the underlying ranges.
> 
> Yes. You'll see that nearly all callers do
> 
> Value_range vr = *get_value_range (name);
> 
> Modify
> 
> Update_value_range (name, ) ;
> 
> And returning by reference was mostly an optimization. We _did_ have
> callers Changing the range in place and the const varying catched
> those.
Well, that's the kind of thing we want to avoid at the API level.  One
way was to simply prohibit changes with a by-value return and forcing
changes through a setter.  Another would be returning everything as a
const, which is what I think your patch from today did.   In both cases
you have to use the appropriate APIs to make changes.  I prefer the
former because someone could cast away the const property, but if the
former isn't really feasible, then always returning a const object is a
reasonable compromise I guess.

jeff


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-25 Thread Jeff Law
On 7/24/19 12:33 PM, Richard Biener wrote:
> On July 24, 2019 8:18:57 PM GMT+02:00, Jeff Law 
> wrote:
>> On 7/24/19 11:00 AM, Richard Biener wrote: [ Big snip, ignore
>> missing reply attributions... ]
>> 
 it. But I'd claim that if callers are required not to change
 these ranges, then the callers are fundamentally broken.  I'm
 not sure what the "sanitization" is really buying you here.
 Can you point to something specific?
 
> 
> But you lose the sanitizing that nobody can change it and the
>  changed info leaks to other SSA vars.
> 
> As said, fix all callers to deal with NULL.
> 
> But I argue the current code is exactly optimal and safe.
 ANd I'd argue that it's just plain broken and that the 
 sanitization you're referring to points to something broken 
 elsewhere,  higher up in the callers.
>>> 
>>> Another option is to make get_value_range return by value and
>>> the only way to change the lattice to call an appropriate set
>>> function. I think we already do the latter in all cases (but we
>>> use get_value_range in the setter) and returning by reference is
>>> just eliding the copy.
>> OK, so what I think you're getting at (and please correct me if
>> I'm wrong) is that once the lattice values are set, you don't want 
>> something changing the recorded ranges underneath?
>> 
>> ISTM the way to enforce that is to embed the concept in the class
>> and enforce it by not allowing direct manipulation of range by the
>> clients. So a client that wants this behavior somehow tells the
>> class that ranges are "set in stone" and from that point the
>> setters don't allow changing the underlying ranges.
> 
> Yes. You'll see that nearly all callers do
> 
> Value_range vr = *get_value_range (name);
> 
> Modify
> 
> Update_value_range (name, ) ;
> 
> And returning by reference was mostly an optimization. We _did_ have
> callers Changing the range in place and the const varying catched
> those.
> 
> When returning by value we can return individual VARYINGs not in the
> lattice if we decide that's what we want.
> 
>> I just want to make sure we're on the same page WRT why you think
>> the constant varying range object is useful.
> 
> As said it's an optimization. We do not want to reallocate the
> lattice. And we want lattice updating to happen in a controlled
> manner, so returning a pointer into the lattice is bad design at this
> point.
But I would claim that the current state is dreadful.  Consider that
when gimple-fold asks for a new SSA_NAME, it could get a recycled one,
in which case we get a real range.  Or if it doens't get a recycled
name, then we get the const varying node.  The inconsistently is silly
when we can just reallocate the underlying object.

Between recycling of SSA_NAMEs and allocating a bit of additional space
(say rounding up to some reasonable boundary) I'd bet you'd never be
able to measure the reallocation in practice.

jeff


[PATCH V4, rs6000] Support vrotr3 for int vector types

2019-07-25 Thread Kewen.Lin
Hi Segher,

You can just ignore the previous version and review the current
with one comment change and "=" line move.

Sorry for the inconvenience.


Thanks,
Kewen
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index b6a22d9010c..217c7afdf17 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1666,6 +1666,56 @@
   "vrl %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Here these trunc_vrl are for vrotr3 expansion.
+;; since SHIFT_COUNT_TRUNCATED is set as zero, to append one explicit
+;; AND to indicate truncation but emit vrl insn.
+(define_insn "trunc_vrlv2di"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+   (rotate:V2DI (match_operand:V2DI 1 "register_operand" "v")
+ (and:V2DI (match_operand:V2DI 2 "register_operand" "v")
+   (const_vector:V2DI [(const_int 63) (const_int 63)]]
+  "VECTOR_UNIT_P8_VECTOR_P (V2DImode)"
+  "vrld %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+   (rotate:V4SI (match_operand:V4SI 1 "register_operand" "v")
+ (and:V4SI (match_operand:V4SI 2 "register_operand" "v")
+   (const_vector:V4SI [(const_int 31) (const_int 31)
+ (const_int 31) (const_int 31)]]
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
+  "vrlw %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv8hi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+   (rotate:V8HI (match_operand:V8HI 1 "register_operand" "v")
+ (and:V8HI (match_operand:V8HI 2 "register_operand" "v")
+   (const_vector:V8HI [(const_int 15) (const_int 15)
+ (const_int 15) (const_int 15)
+ (const_int 15) (const_int 15)
+ (const_int 15) (const_int 15)]]
+  "VECTOR_UNIT_ALTIVEC_P (V8HImode)"
+  "vrlh %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+   (rotate:V16QI (match_operand:V16QI 1 "register_operand" "v")
+ (and:V16QI (match_operand:V16QI 2 "register_operand" "v")
+   (const_vector:V16QI [(const_int 7) (const_int 7)
+   (const_int 7) (const_int 7)
+   (const_int 7) (const_int 7)
+   (const_int 7) (const_int 7)
+   (const_int 7) (const_int 7)
+   (const_int 7) (const_int 7)
+   (const_int 7) (const_int 7)
+   (const_int 7) (const_int 7)]]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
+  "vrlb %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrlmi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
 (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8ca98299950..c4c74630d26 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -163,6 +163,17 @@
   return VINT_REGNO_P (REGNO (op));
 })
 
+;; Return 1 if op is a vector register that operates on integer vectors
+;; or if op is a const vector with integer vector modes.
+(define_predicate "vint_reg_or_const_vector"
+  (match_code "reg,subreg,const_vector")
+{
+  if (GET_CODE (op) == CONST_VECTOR && GET_MODE_CLASS (mode) == 
MODE_VECTOR_INT)
+return 1;
+
+  return vint_operand (op, mode);
+})
+
 ;; Return 1 if op is a vector register to do logical operations on (and, or,
 ;; xor, etc.)
 (define_predicate "vlogical_operand"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 70bcfe02e22..a7e6c6c256f 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1260,6 +1260,35 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
   "")
 
+;; Expanders for rotatert to make use of vrotl
+(define_expand "vrotr3"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+   (rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
+   (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
+{
+  rtx rot_count = gen_reg_rtx (mode);
+  if (GET_CODE (operands[2]) == CONST_VECTOR)
+{
+  machine_mode inner_mode = GET_MODE_INNER (mode);
+  unsigned int bits = GET_MODE_PRECISION (inner_mode);
+  rtx mask_vec = gen_const_vec_duplicate (mode, GEN_INT (bits - 1));
+  rtx imm_vec =
+   simplify_const_unary_operation (NEG, mode, operands[2],
+ GET_MODE (operands[2]));
+  imm_vec =
+   simplify_const_binary_operation (AND, mode, imm_vec, mask_vec);
+  rot_count = force_reg (mode, imm_vec);
+  emit_insn (gen_vrotl3 (operands[0], operands[1], rot_count));
+}
+  else
+{
+  emit_insn (gen_neg2 (rot_count, operands[2]));
+  emit_insn (gen_trunc_vrl (operands[0], operands[1], 

[PATCH V3, rs6000] Support vrotr3 for int vector types

2019-07-25 Thread Kewen.Lin
Hi Segher,

on 2019/7/25 下午9:49, Segher Boessenkool wrote:
> Hi Kewen,
> 
> On Tue, Jul 23, 2019 at 02:28:28PM +0800, Kewen.Lin wrote:
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -1666,6 +1666,60 @@
>>"vrl %0,%1,%2"
>>[(set_attr "type" "vecsimple")])
>>  
>> +;; Here these vrl_and are for vrotr3 expansion.
>> +;; since SHIFT_COUNT_TRUNCATED is set as zero, to append one explicit
>> +;; AND to indicate truncation but emit vrl insn.
>> +(define_insn "vrlv2di_and"
>> +  [(set (match_operand:V2DI 0 "register_operand" "=v")
>> +(and:V2DI
>> +  (rotate:V2DI (match_operand:V2DI 1 "register_operand" "v")
>> + (match_operand:V2DI 2 "register_operand" "v"))
>> +  (const_vector:V2DI [(const_int 63) (const_int 63)])))]
>> +  "VECTOR_UNIT_P8_VECTOR_P (V2DImode)"
>> +  "vrld %0,%1,%2"
>> +  [(set_attr "type" "vecsimple")])
> 
> "vrlv2di_and" is an a bit unhappy name, we have a "vrlv" intruction.
> Just something like "rotatev2di_something", maybe?
> 
> Do we have something similar for non-rotate vector shifts, already?  We
> probably should, so please keep that in mind for naming things.
> 
> "vrlv2di_and" sounds like you first do the rotate, and then on what
> that results in you do the and.  And that is what the pattern does,
> too.  But this is wrong: it should mask off all but the lower bits
> of operand 2, instead.
> 

Thanks for reviewing!

You are right, the name matches the pattern but not what we want.
How about the name trunc_vrl, first do the truncation on the
operand 2 then do the vector rotation.  I didn't find any existing
shifts with the similar pattern.

I've updated the name and associated pattern in the new patch.

>> +(define_insn "vrlv16qi_and"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +(and:V16QI
>> +(rotate:V16QI (match_operand:V16QI 1 "register_operand" "v")
>> +   (match_operand:V16QI 2 "register_operand" "v"))
>> +(const_vector:V16QI [(const_int 7) (const_int 7)
>> +(const_int 7) (const_int 7)
>> +(const_int 7) (const_int 7)
>> +(const_int 7) (const_int 7)
>> +(const_int 7) (const_int 7)
>> +(const_int 7) (const_int 7)
>> +(const_int 7) (const_int 7)
>> +(const_int 7) (const_int 7)])))]
>> +  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
>> +  "vrlb %0,%1,%2"
>> +  [(set_attr "type" "vecsimple")])
> 
> All the patterns can be merged into one (using some code_iterator).  That
> can be a later improvement.
> 

I guess you mean mode_attr?
I did try to merge them since they look tedious.  But the mode_attr can't
contain either "[" or "(" inside, it seems can't be used for different const 
vector mappings.  Really appreciate that if you can show me some examples.

>> +;; Return 1 if op is a vector register that operates on integer vectors
>> +;; or if op is a const vector with integer vector modes.
>> +(define_predicate "vint_reg_or_const_vector"
>> +  (match_code "reg,subreg,const_vector")

> Hrm, I don't like this name very much.  Why is just vint_operand not
> enough for what you use this for?
> 

vint_operand isn't enough since the expander legalizes the const vector into
vector register, I'm unable to get the feeder (const vector) of the input 
register operand.


>> +  rtx imm_vec
>> += simplify_const_unary_operation (NEG, mode, operands[2],
> 
> (The "=" goes on the previous line).

OK, thanks.

>> +  emit_insn (gen_vrl_and (operands[0], operands[1], rot_count));
>> +}
>> +  DONE;
>> +})
> 
> Why do you have to emit as the "and" form here?  Emitting the "bare"
> rotate should work just as well here?

Yes, the emitted insn is exactly the same.

It follows Jakub's suggestion via
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01159.html

Append one explicit AND to indicate the truncation for the case 
!SHIFT_COUNT_TRUNCATED.  (sorry if the previous pattern misled.)

> 
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
>> @@ -0,0 +1,46 @@
>> +/* { dg-options "-O3" } */
>> +/* { dg-require-effective-target powerpc_vsx_ok } */
> 
>> +/* { dg-final { scan-assembler {\mvrld\M} } } */
>> +/* { dg-final { scan-assembler {\mvrlw\M} } } */
>> +/* { dg-final { scan-assembler {\mvrlh\M} } } */
>> +/* { dg-final { scan-assembler {\mvrlb\M} } } */
> 
> You need to generate code for whatever cpu introduced those insns,
> if you expect those to be generated ;-)
> 
> vsx_ok isn't needed.
> 

Thanks for catching, update it with altivec_ok in new patch.
I think we can still have this guard?  since those instructions 
origin from isa 2.03. 

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index b6a22d9010c..2b0682ad2ba 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1666,6 +1666,56 @@
   "vrl %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Here these vrl_and 

Re: [PATCH] handle multibyte stores larger than char in strlen (PR 91183, 86888)

2019-07-25 Thread Jeff Law
On 7/25/19 1:42 PM, Martin Sebor wrote:
> On 7/24/19 8:39 PM, Jeff Law wrote:
>> On 7/24/19 8:17 PM, Martin Sebor wrote:
 Committed in r273783 after retesting and including a test for
 the warning that I had left out of the patch I posted here.

 Martin

 PS I suspect some of the tests I added might need tweaking on
 big-endian systems.  I'll deal with them tomorrow.
>>>
>>> And maybe also strictly aligned targets.  A sparc-solaris2.11 cross
>>> shows these failures.  It looks like it doesn't like something about
>>> some of the 64 bit stores in the tests.
>>>
>>> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
>>> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized
>>> "_not_eliminated_" 0
>>> FAIL: gcc.dg/strlenopt-71.c (test for excess errors)
>>> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
>>> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
>>> "_not_eliminated_" 0
>>
>>
>> msp430-elf failures:
>>
>>
>>> New tests that FAIL (5 tests):
>>>
>>> msp430-sim: gcc.dg/strlenopt-70.c (test for excess errors)
>>> msp430-sim: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized
>>> "_not_eliminated_" 0
> 
> This one is due to bugs in the endian macros the test defines
> to try to handle both big and little-endian.  I've fixed those/
> 
>>> msp430-sim: gcc.dg/strlenopt-71.c (test for excess errors)
> 
> Same here, though this is a runtime test so it will also fail
> to link outside of an emulator.  (Changing the test harness to
> report these tests as UNSUPPORTED instead would avoid this sort
> of ambiguity.)
Well, the "test for excess errors" should be an indicator that something
went wrong before trying to execute.  Typically these are compile-time
warnings/errors or occasionally a link time error due to an undefined
symbol.

I don't have access to the tester, so I can't really look at it until I
return though.

> 
>>> msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
>>> "_not_eliminated_" 0
>>> msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
>>> "strlen" 0
> 
> This failure is due to to a combination of the absence of early
> store merging in pr83821.  The former prevents the stores below
> 
>   char a[4];
>   a[0] = 'X';
>   a[1] = 0;
>   a[2] = a[3] = 0;
> 
> from turning into:
> 
>   MEM  [(char * {ref-all})] = 49;
> 
> pr83821 causes the strlen pass to invalidate strlen information
> when it sees the assignment to a[2] (or beyond).  I need to dust
> off and resubmit my patch for that from last year.
> 
> Until that patch is approved I have disabled the test on strictly
> aligned targets.
> 
> The failure in gcc.dg/Wstringop-overflow-14.c on visium-elf was
> because of a difference between strictly aligned targets and
> the rest, which triggers different warnings between the two sets.
> I disabled the unexpected warning and the test passes.
> 
> I've just committed r273812 and r273814 with these changes.
No problem.  When I'm back from PTO I'll review state in the tester and
pass along anything else related.

Jeff


Re: cp: implementation of p1301 for C++

2019-07-25 Thread JeanHeyd Meneide
The HTML formatting was off (again), so I used git send-email as someone
recommended to me in the IRC. Patch is here:
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01670.html

I... think it's good, there? Apologies for all the noise; it's a bit hard
getting used to these tools.

Sincerely,
JeanHeyd


[PATCH 1/2] Implement p1301 - [[nodiscard("should have a reason")]]

2019-07-25 Thread phdofthehouse
From: ThePhD 

---
 gcc/c-family/c-lex.c  |  38 ++--
 gcc/cp/cvt.c  | 204 +++---
 gcc/cp/parser.c   |  23 +-
 gcc/cp/tree.c |  56 +++--
 gcc/escaped_string.h  |  43 
 .../g++.dg/cpp2a/nodiscard-bad-clause.C   |  12 ++
 .../g++.dg/cpp2a/nodiscard-once-clause.C  |  12 ++
 gcc/testsuite/g++.dg/cpp2a/nodiscard-once.C   |  12 ++
 gcc/testsuite/g++.dg/cpp2a/nodiscard-reason.C | 203 +
 gcc/tree.c|  17 +-
 10 files changed, 487 insertions(+), 133 deletions(-)
 create mode 100644 gcc/escaped_string.h
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nodiscard-bad-clause.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nodiscard-once-clause.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nodiscard-once.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nodiscard-reason.C

diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 851fd704e5d..f5870d095f2 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -345,24 +345,26 @@ c_common_has_attribute (cpp_reader *pfile)
  attr_name = NULL_TREE;
}
}
- else
-   {
- /* Some standard attributes need special handling.  */
- if (is_attribute_p ("noreturn", attr_name))
-   result = 200809;
- else if (is_attribute_p ("deprecated", attr_name))
-   result = 201309;
- else if (is_attribute_p ("maybe_unused", attr_name)
-  || is_attribute_p ("nodiscard", attr_name)
-  || is_attribute_p ("fallthrough", attr_name))
-   result = 201603;
- else if (is_attribute_p ("no_unique_address", attr_name)
-  || is_attribute_p ("likely", attr_name)
-  || is_attribute_p ("unlikely", attr_name))
-   result = 201803;
- if (result)
-   attr_name = NULL_TREE;
-   }
+   else
+   {
+   /* Some standard attributes need special 
handling.  */
+   if (is_attribute_p ("noreturn", attr_name))
+   result = 200809;
+   else if (is_attribute_p ("deprecated", 
attr_name))
+   result = 201309;
+   else if (is_attribute_p ("maybe_unused", 
attr_name)
+|| is_attribute_p ("fallthrough", 
attr_name))
+   result = 201603;
+   else if (is_attribute_p ("no_unique_address", 
attr_name)
+|| is_attribute_p ("likely", attr_name)
+|| is_attribute_p ("unlikely", 
attr_name))
+   result = 201803;
+   else if (is_attribute_p ("nodiscard", 
attr_name))
+   result = 201907; /* placeholder until 
C++20 Post-Cologne Working Draft. */
+ 
+   if (result)
+   attr_name = NULL_TREE;
+   }
}
   if (attr_name)
{
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 23d2aabc483..4a5128c76a1 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "convert.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "escaped_string.h"
 
 static tree convert_to_pointer_force (tree, tree, tsubst_flags_t);
 static tree build_type_conversion (tree, tree);
@@ -1026,49 +1027,99 @@ maybe_warn_nodiscard (tree expr, impl_conv_void 
implicit)
 
   tree rettype = TREE_TYPE (type);
   tree fn = cp_get_fndecl_from_callee (callee);
-  if (implicit != ICV_CAST && fn
-  && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn)))
-{
-  auto_diagnostic_group d;
-  if (warning_at (loc, OPT_Wunused_result,
- "ignoring return value of %qD, "
- "declared with attribute nodiscard", fn))
-   inform (DECL_SOURCE_LOCATION (fn), "declared here");
-}
-  else if (implicit != ICV_CAST
-  && lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (rettype)))
-{
-  auto_diagnostic_group d;
-  if (warning_at (loc, OPT_Wunused_result,
- "ignoring returned value of type %qT, "
- "declared with attribute nodiscard", rettype))
-   {
- if (fn)
-   inform (DECL_SOURCE_LOCATION (fn),
-   "in call to %qD, declared here", fn);
- inform (DECL_SOURCE_LOCATION (TYPE_NAME (rettype)),
- "%qT declared here", rettype);
-   }
-}
-  else if (TREE_CODE 

[PATCH], PowerPC #20, Add placeholder for 'future' scheduling

2019-07-25 Thread Michael Meissner
This is patch #20.  It adds a placeholder for scheduling on the 'future'
machine.  This code currently uses the power9.md with all of the names changed
for 'future', and targetting off of 'tune=future'.

This placeholder hopefully will be filled in later when we can add more
accurate scheduling information.

I have a job queued up to do a bootstrap and make check run on a little endian
power8 system.  Assuming this run has no regressions, can I check this into the
FSF trunk?

This is a series of separate patches to add functionality to the PowerPC
backend to support future processors.  Here is a high level summary of the
patches:

 * Patches 1-8, have already been applied
 * Patch 9 has been rewritten in patches 12-13
 * Patch 10 is withdrawn for now
 * Patch 11 adds DS offset mode to rs6000.c's reg_addr
 * Patch 12 adds a new enumeration for instruction format
 * Patch 13 adds support for matching prefixed insns
 * Patch 14 adds pc-relative support to load up addresses
 * Patch 15 renamed some functions to be smaller
 * Patch 16 updated a comment and moved a predicate
 * Patch 17 adds the prefixed RTL attribute & emitting 'p' before prefixed
 * Patch 18 adds prefixed support for scalar types
 * Patch 19 uses a separate 'future' cost structure
 * Patch 20 clones power9.md for initial scheduling on future.md.

The following patches have not yet been written, but I expect them to be:

 * Patch 21 finish prefixed insn support for vectors & 128-bit int/floats
 * Patch 22 enable pc-relative by default
 * Patch 23 add pcrel linker optimization
 * Patch 24 new tests

2019-07-25  Michael Meissner  

* config/rs6000/future.md: New file, clone from power9.md and
change cpu=future.  Place holder until future scheduling is done.
* config/rs6000/rs6000.md (toplevel): Include future.md.
* config/rs6000/t-rs6000 (MD_INCLUDES): Add future.md.

Index: gcc/config/rs6000/future.md
===
--- gcc/config/rs6000/future.md (revision 0)
+++ gcc/config/rs6000/future.md (working copy)
@@ -0,0 +1,519 @@
+;; Scheduling description for IBM POWER9 processor.
+;; Copyright (C) 2016-2019 Free Software Foundation, Inc.
+;;
+;; Contributed by Pat Haugen (pthau...@us.ibm.com).
+
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+;;
+;; This file was cloned from power9.md.  In the future, we will have future
+;; specific optimizations here.
+
+(define_automaton "futuredsp,futurelsu,futurevsu,futurefpdiv,futuremisc")
+
+(define_cpu_unit "lsu0_future,lsu1_future,lsu2_future,lsu3_future" "futurelsu")
+(define_cpu_unit "vsu0_future,vsu1_future,vsu2_future,vsu3_future" "futurevsu")
+; Two vector permute units, part of vsu
+(define_cpu_unit "prm0_future,prm1_future" "futurevsu")
+; Two fixed point divide units, not pipelined
+(define_cpu_unit "fx_div0_future,fx_div1_future" "futuremisc")
+(define_cpu_unit "bru_future,cryptu_future,dfu_future" "futuremisc")
+; Create a false unit for use by non-pipelined FP div/sqrt
+(define_cpu_unit "fp_div0_future,fp_div1_future,fp_div2_future,fp_div3_future"
+"futurefpdiv")
+
+
+(define_cpu_unit "x0_future,x1_future,xa0_future,xa1_future,
+ x2_future,x3_future,xb0_future,xb1_future,
+ br0_future,br1_future" "futuredsp")
+
+
+; Dispatch port reservations
+;
+; Future can dispatch a maximum of 6 iops per cycle with the following
+; general restrictions (other restrictions also apply):
+;   1) At most 2 iops per execution slice
+;   2) At most 2 iops to the branch unit
+; Note that insn position in a dispatch group of 6 insns does not infer which
+; execution slice the insn is routed to.  The units are used to infer the
+; conflicts that exist (i.e. an 'even' requirement will preclude dispatch
+; with 2 insns with 'superslice' requirement).
+
+; The xa0/xa1 units really represent the 3rd dispatch port for a superslice but
+; are listed as separate units to allow those insns that preclude its use to
+; still be scheduled two to a superslice while reserving the 3rd slot.  The
+; same applies for xb0/xb1.
+(define_reservation "DU_xa_future" "xa0_future+xa1_future")
+(define_reservation "DU_xb_future" "xb0_future+xb1_future")
+
+; Any execution slice dispatch
+(define_reservation "DU_any_future"
+   

[PATCH], PowerPC #19, Add 'future' cost structure

2019-07-25 Thread Michael Meissner
This is patch #19.  It is a fairly simple patch to clone the power9 costs and
use that for costs on the 'future' machine.

As I write this message, I am waiting for the full bootstrap and make of the
patch on a little endian power8 system.  Assuming there are no regressions, can
I check this into the FSF trunk?

iThis is a series of separate patches to add functionality to the PowerPC
backend to support future processors.  Here is a high level summary of the
patches:

 * Patches 1-8, have already been applied
 * Patch 9 has been rewritten in patches 12-13
 * Patch 10 is withdrawn for now
 * Patch 11 adds DS offset mode to rs6000.c's reg_addr
 * Patch 12 adds a new enumeration for instruction format
 * Patch 13 adds support for matching prefixed insns
 * Patch 14 adds pc-relative support to load up addresses
 * Patch 15 renamed some functions to be smaller
 * Patch 16 updated a comment and moved a predicate
 * Patch 17 adds the prefixed RTL attribute & emitting 'p' before prefixed
 * Patch 18 adds prefixed support for scalar types
 * Patch 19 uses a separate 'future' cost structure
 * Patch 20 clones power9.md for initial scheduling on future.md.

The following patches have not yet been written, but I expect them to be:

 * Patch 21 finish prefixed insn support for vectors & 128-bit int/floats
 * Patch 22 enable pc-relative by default
 * Patch 23 add pcrel linker optimization
 * Patch 24 new tests

2019-07-25  Michael Meissner  

* config/rs6000/rs6000.c (future_cost): New cost structure.
(rs6000_option_override_internal): Use future cost structure.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 273815)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1094,6 +1094,26 @@ struct processor_costs power9_cost = {
   COSTS_N_INSNS (3),   /* SF->DF convert */
 };
 
+/* Instruction costs on FUTURE processors.  */
+static const
+struct processor_costs future_cost = {
+  COSTS_N_INSNS (3),   /* mulsi */
+  COSTS_N_INSNS (3),   /* mulsi_const */
+  COSTS_N_INSNS (3),   /* mulsi_const9 */
+  COSTS_N_INSNS (3),   /* muldi */
+  COSTS_N_INSNS (8),   /* divsi */
+  COSTS_N_INSNS (12),  /* divdi */
+  COSTS_N_INSNS (3),   /* fp */
+  COSTS_N_INSNS (3),   /* dmul */
+  COSTS_N_INSNS (13),  /* sdiv */
+  COSTS_N_INSNS (18),  /* ddiv */
+  128, /* cache line size */
+  32,  /* l1 cache */
+  512, /* l2 cache */
+  8,   /* prefetch streams */
+  COSTS_N_INSNS (3),   /* SF->DF convert */
+};
+
 /* Instruction costs on POWER A2 processors.  */
 static const
 struct processor_costs ppca2_cost = {
@@ -4622,10 +4642,13 @@ rs6000_option_override_internal (bool gl
break;
 
   case PROCESSOR_POWER9:
-  case PROCESSOR_FUTURE:
rs6000_cost = _cost;
break;
 
+  case PROCESSOR_FUTURE:
+   rs6000_cost = _cost;
+   break;
+
   case PROCESSOR_PPCA2:
rs6000_cost = _cost;
break;

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH], PowerPC #18, Add prefixed support for scalars

2019-07-25 Thread Michael Meissner
This is patch #18, it finaly adds prefixed instruction support for the scalar
types.  There are some things I need to check out for for vector types and
128-bit floating point, so those are not enabled for now.

The earlier patch that changed the length of many of the load/stores to '*'
means that this patch can be somewhat smaller.

The three main insns patched are the movsi, movdi, and add insns to add
an alternative to load up or add 34 bit integer constants.  These changes come
from my private internal branch that has been stable for some time, adapted to
the current trunk.

As I post this patch, it has not been go through the full bootstrap procedure,
but I have checked it without doing the bootstrap before committing the patch.
Assuming the bootstrap and make check run without regressions, can I check this
into the FSF trunk?

This is a series of separate patches to add functionality to the PowerPC
backend to support future processors.  Here is a high level summary of the
patches:

 * Patches 1-8, have already been applied
 * Patch 9 has been rewritten in patches 12-13
 * Patch 10 is withdrawn for now
 * Patch 11 adds DS offset mode to rs6000.c's reg_addr
 * Patch 12 adds a new enumeration for instruction format
 * Patch 13 adds support for matching prefixed insns
 * Patch 14 adds pc-relative support to load up addresses
 * Patch 15 renamed some functions to be smaller
 * Patch 16 updated a comment and moved a predicate
 * Patch 17 adds the prefixed RTL attribute & emitting 'p' before prefixed
 * Patch 18 adds prefixed support for scalar types
 * Patch 19 uses a separate 'future' cost structure
 * Patch 20 clones power9.md for initial scheduling on future.md.

The following patches have not yet been written, but I expect them to be:

 * Patch 21 finish prefixed insn support for vectors & 128-bit int/floats
 * Patch 22 enable pc-relative by default
 * Patch 23 add pcrel linker optimization
 * Patch 24 new tests

2019-07-25  Michael Meissner  

* config/rs6000/predicates.md (add_operand): Add support for
PADDI.
(non_add_cint_operand): Add support for PADDI.
* config/rs6000/rs6000.c (struct rs6000_reg_addr): Add field to
say whether a mode supports using prefixed memory instructions.
(mode_supports_prefixed_address_p): Make static inline.  Use the
prefixed_memory_p field in reg_addr to say whether a give mode
supports prefixed memory instructions.
(rs6000_hard_regno_mode_ok_uncached): Allow for larger vector
types in the future.
(rs6000_debug_print_mode): If -mdebug=reg, print which modes
support prefixed addressing.
(rs6000_setup_reg_addr_masks): Mark all scalar types that support
offset addressing to be able support prefixed addressing.
(num_insns_constant_gpr): Use SIGNED_16BIT_OFFSET_P macro.  If we
support prefixed memory instructions, know that we can load up
34-bit constants directly in one instruction.
(quad_address_p): Allow for larger vector types in the future.
(mem_operand_gpr): Use SIGNED_16BIT_FFSET_EXTRA_P macro.
(mem_operand_ds_form): Use SIGNED_16BIT_FFSET_EXTRA_P macro.
(rs6000_legitimate_offset_address_p): Add support for larger
offsets if we have prefixed addressing.
(rs6000_legitimate_address_p): Add support for prefixed
addresses.  Do not allow update forms of addressing if the offset
is larger than 16 bits, since prefixed memory instructions don't
have update addressing formats.
(rs6000_mode_dependent_address): Add support for 34 bit offsets if
we have prefixed addressing.
(rs6000_rtx_costs): Allow adds with 34-bit values which show up if
we have prefixed addressing.
(rs6000_num_insns): New function.
(rs6000_insn_cost): When using the instruction length, count
prefixed instructions as one instruction instead of three.
* config/rs6000/rs6000.md (add, GPR iterator): Add support
for the PADDI instruction if we have prefixed addressing.
(movsi_internal1): Add support to load up 16 bit constants
directly if have prefixed addressing.
(movsi splitter): Don't split if we have prefixed addressing.
(movdi_internal64): Add support to load up 34 bit constants if we
have prefixed addressing.
(movdi splitter): Add comment.
(stack_protect_setdi): Add support for either address being a
prefixed address.
(stack_protect_testdi): Add support for either address being a
prefixed address.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 273813)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -839,7 +839,8 @@ (define_special_predicate "indexed_addre
 (define_predicate "add_operand"
   (if_then_else (match_code "const_int")
 

[PATCH], PowerPC #17, adds the prefixed RTL attribute & automatically emit a leading 'p'

2019-07-25 Thread Michael Meissner
This is patch #17 that adds support for automatically detecting prefixed
instructions and changing the length to 12 bytes, and emitting a 'p' before the
instruction.  At the moment, only loading up pc-relative addresses will have
the prefixed RTL set, since I have enabled the prefixed instruction support for
other modes (that will be in the next patch).

It is somewhat different than the previous scheme in terms of the details (but
similar in spirit) that we talked about in private mail.

With this scheme, there are 3 attributes:

The main RTL attribute is "prefixed".  It looks at the "type" RTL attribute, to
see if it is a load of some sort, a store of some sort, or something that may
use PADDI to load up a constant, address, or add a large constant.  Because
there are functions that set the type, but don't have standard operands
(mov_update1, etc.) the three functions that validate whether the
instruction is prefixed, only look at the INSN and not at the OPERANDS.

The "length" RTL attribute looks the "prefixed" attribute and sets the length
from either the "prefixed_length" or "nonprefixed_length" attributes.  These
attributes default to 12 bytes and 4 bytes respectively.  The idea is this can
simplify the attributes for insn patterns that can generate multiple
instructions, so that they don't have to have if_then_else's to set the types.

I have this patch queued up to do bootstrap and make check.  Assuming there are
no regressions, can I check this into the FSF trunk?

2019-07-25  Michael Meissner  

* config/rs6000/rs6000-prefixed.c: New file.
* config/rs6000/rs6000-protos.h (prefixed_addr_reg_p): Add
declaration.
(prefixed_load_p): Add declaration.
(prefixed_store_p): Add declaration.
(prefixed_paddi_p): Add declaration.
(rs6000_asm_output_opcode): Add declaration.
(rs6000_final_prescan_insn): Update calling signature.
* config/rs6000/rs6000.c (prefixed_addr_reg_p): New function.
* config/rs6000/rs6000.h (FINAL_PRESCAN_INSN): New target hook.
(ASM_OUTPUT_OPCODE): New target hook.
* config/rs6000/rs6000.md (prefixed attribute): New RTL attribute.
(prefixed_length attribute): New RTL attribute.
(non_prefixed_length attribute): New RTL attribute.
(length attribute): Set length based on prefixed, prefixed_length,
and non_prefixed_length attributes.
(pcrel_addr): Update to use prefixed RTL attribute.
(pcrel_ext_addr): Update to use prefixed RTL attribute.
* config/rs6000/t-rs6000: Add rs6000-prefixed.o build rules.
* config.gcc (powerpc*-*-*): Add rs6000-prefixed.c.
(rs6000*-*-*): Add rs6000-prefixed.c.

Index: gcc/config/rs6000/rs6000-prefixed.c
===
--- gcc/config/rs6000/rs6000-prefixed.c (revision 0)
+++ gcc/config/rs6000/rs6000-prefixed.c (working copy)
@@ -0,0 +1,190 @@
+/* Subroutines used to support prefixed addressing on the PowerPC.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "tree-pass.h"
+#include "rtx-vector-builder.h"
+#include "print-rtl.h"
+#include "insn-attr.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "tm-constrs.h"
+
+/* Whether the next instruction needs a 'p' prefix issued before the
+   instruction is printed out.  */
+static bool next_insn_prefixed_p;
+
+/* Define FINAL_PRESCAN_INSN if some processing needs to be done before
+   outputting the assembler code.  On the PowerPC, we remember if the current
+   insn is a prefixed insn where we need to emit a 'p' before the insn.  */
+
+void
+rs6000_final_prescan_insn (rtx_insn *insn)
+{
+  next_insn_prefixed_p = (get_attr_prefixed (insn) != PREFIXED_NO);
+  return;
+}
+
+/* Define ASM_OUTPUT_OPCODE to do anything special before emitting an opcode.
+   We use it to emit a 'p' for prefixed insns that is set in
+   FINAL_PRESCAN_INSN.  */
+
+void
+rs6000_asm_output_opcode (FILE 

[PATCH], PowerPC #16, Update a comment and move a predicate

2019-07-25 Thread Michael Meissner
This is patch #16 that fixes some things when I noticed when I was creating
patch #15.  I separated it out just to simplify things.  It updates a comment
in talking about pc-relative external addresses that was wrong.  In also moves
a predicate so that all of the pcrel_* predicates could be together in the
predicates.md file.

I have a build queued up with these changes.  Assuming that there are no
regressions, can I check these changes into the FSF tree.

This is a series of separate patches to add functionality to the PowerPC
backend to support future processors.  Here is a high level summary of the
patches:

 * Patches 1-8, have already been applied
 * Patch 9 has been rewritten in patches 12-13
 * Patch 10 is withdrawn for now
 * Patch 11 adds DS offset mode to rs6000.c's reg_addr
 * Patch 12 adds a new enumeration for instruction format
 * Patch 13 adds support for matching prefixed insns
 * Patch 14 adds pc-relative support to load up addresses
 * Patch 15 renamed some functions to be smaller
 * Patch 16 updated a comment and moved a predicate
 * Patch 17 adds the prefixed RTL attribute & emitting 'p' before prefixed
 * Patch 18 adds prefixed support for scalar types
 * Patch 19 uses a separate 'future' cost structure
 * Patch 20 clones power9.md for initial scheduling on future.md.

The following patches have not yet been written, but I expect them to be:

 * Patch 21 finish prefixed insn support for vectors & 128-bit int/floats
 * Patch 22 enable pc-relative by default
 * Patch 23 add pcrel linker optimization
 * Patch 24 new tests

2019-07-25  Michael Meissner  

* config/rs6000/predicates.md (pcrel_external_address): Update
comment.
(prefixed_mem_operand): Move lower in the file so the pcrel_*
predicates are all grouped together.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 273798)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1647,11 +1647,11 @@
 
 ;; Return true if the operand is an external symbol whose address can be loaded
 ;; into a register using:
-;; PLA reg,label@pcrel@got
+;; PLD reg,label@pcrel@got
 ;;
 ;; The linker will either optimize this to either a PADDI if the label is
 ;; defined locally in another module or a PLD of the address if the label is
-;; defined in another module.
+;; defined in a shared library.
 
 (define_predicate "pcrel_external_address"
   (match_code "symbol_ref,const")
@@ -1662,13 +1662,6 @@
   return addr_validate_p (op, INSN_FORM_PREFIXED, ADDR_VALIDATE_PCREL_EXT);
 })
 
-;; Return 1 if op is a prefixed memory operand.
-(define_predicate "prefixed_mem_operand"
-  (match_code "mem")
-{
-  return prefixed_addr_mode_p (XEXP (op, 0), GET_MODE (op));
-})
-
 ;; Return 1 if op is a memory operand to an external variable when we
 ;; support pc-relative addressing and the PCREL_OPT relocation to
 ;; optimize references to it.
@@ -1682,6 +1675,13 @@
   return addr_validate_p (addr, INSN_FORM_PREFIXED, ADDR_VALIDATE_PCREL_EXT);
 })
 
+;; Return 1 if op is a prefixed memory operand.
+(define_predicate "prefixed_mem_operand"
+  (match_code "mem")
+{
+  return prefixed_addr_mode_p (XEXP (op, 0), GET_MODE (op));
+})
+
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH], PowerPC #15, Rename some prefixed addressing functions

2019-07-25 Thread Michael Meissner
This is patch #15, and it shortens some of the previous names that I had
created.  The motavation by making the names smaller is make the ChangeLog
files either to write with the 79 character limit.

I have done a bootstrap on a little endian power8 system, and there were no
regressions in the test suite.  Can I check this into the FSF trunk?

2019-07-25  Michael Meissner  

* config/rs6000/predicates.md (prefixed_mem_operand): Rename
rs6000_prefixed_address_mode_p to prefixed_addr_mode_p.
* config/rs6000/rs6000-protos.h (prefixed_addr_mode_p): Rename
from rs6000_prefixed_address_mode_p.
* config/rs6000/rs6000.c (prefixed_addr_mode_p): Rename from
rs6000_prefixed_address_mode_p.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 273786)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1666,7 +1666,7 @@
 (define_predicate "prefixed_mem_operand"
   (match_code "mem")
 {
-  return rs6000_prefixed_address_mode_p (XEXP (op, 0), GET_MODE (op));
+  return prefixed_addr_mode_p (XEXP (op, 0), GET_MODE (op));
 })
 
 ;; Return 1 if op is a memory operand to an external variable when we
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 273786)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -173,7 +173,7 @@
 extern void rs6000_split_logical (rtx [], enum rtx_code, bool, bool, bool);
 extern bool rs6000_pcrel_p (struct function *);
 extern bool rs6000_fndecl_pcrel_p (const_tree);
-extern bool rs6000_prefixed_address_mode_p (rtx, machine_mode);
+extern bool prefixed_addr_mode_p (rtx, machine_mode);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 273786)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -13851,7 +13851,7 @@
mode MODE.  */
 
 bool
-rs6000_prefixed_address_mode_p (rtx addr, machine_mode mode)
+prefixed_addr_mode_p (rtx addr, machine_mode mode)
 {
   const unsigned addr_flags = (ADDR_VALIDATE_REG_34BIT
   | ADDR_VALIDATE_PCREL_LOCAL);

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH], PowerPC #14, Add pc-relative support to load up addresses

2019-07-25 Thread Michael Meissner
This is patch #14, and it adds support to load up addresses with pc-relative
addressing.  Note, a later patch will add support for actually using the
pc-relative support directly.  In theory with patches 11-14 applied, you could
use -mpcrel.  However, for now, -mpcrel will not be enabled by default.

I have checked this by doing a boostrap on a little endian power8 system, and
there were no regressions in the test suite.  Can I check this into the FSF?

This is a series of separate patches to add functionality to the PowerPC
backend to support future processors.  Here is a high level summary of the
patches:

 * Patches 1-8, have already been applied
 * Patch 9 has been rewritten in patches 12-13
 * Patch 10 is withdrawn for now
 * Patch 11 adds DS offset mode to rs6000.c's reg_addr
 * Patch 12 adds a new enumeration for instruction format
 * Patch 13 adds support for matching prefixed insns
 * Patch 14 adds pc-relative support to load up addresses
 * Patch 15 renamed some functions to be smaller
 * Patch 16 updated a comment and moved a predicate
 * Patch 17 adds the prefixed RTL attribute & emitting 'p' before prefixed
 * Patch 18 adds prefixed support for scalar types
 * Patch 19 uses a separate 'future' cost structure
 * Patch 20 clones power9.md for initial scheduling on future.md.

The following patches have not yet been written, but I expect them to be:

 * Patch 21 finish prefixed insn support for vectors & 128-bit int/floats
 * Patch 22 enable pc-relative by default
 * Patch 23 add pcrel linker optimization
 * Patch 24 new tests

2019-07-24  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_emit_move): Add support to load
up addresses if we have pc-relative support.
* config/rs6000/rs6000.md (pcrel_addr): New insn.
(pcrel_ext_addr): New insn.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 273778)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -9779,6 +9779,20 @@ rs6000_emit_move (rtx dest, rtx source,
  return;
}
 
+  /* Handle pc-relative addresses, either external symbols or internal
+within the function.  */
+  if (TARGET_PCREL)
+   {
+ const unsigned addr_flags = (ADDR_VALIDATE_PCREL_LOCAL
+  | ADDR_VALIDATE_PCREL_EXT);
+
+ if (addr_validate_p (operands[1], INSN_FORM_PREFIXED, addr_flags))
+   {
+ emit_insn (gen_rtx_SET (operands[0], operands[1]));
+ return;
+   }
+   }
+
   if (DEFAULT_ABI == ABI_V4
  && mode == Pmode && mode == SImode
  && flag_pic == 1 && got_operand (operands[1], mode))
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 273777)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -9885,6 +9885,26 @@ (define_expand "restore_stack_nonlocal"
   operands[6] = gen_rtx_PARALLEL (VOIDmode, p);
 })
 
+;; Load up a pc-relative address.  The length is 12 because the assembler may
+;; need to add a NOP to align the PADDI so it doesn't cross a 32-btye boundary.
+(define_insn "*pcrel_addr"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
+   (match_operand:DI 1 "pcrel_address"))]
+  "TARGET_PCREL"
+  "pla %0,%a1"
+  [(set_attr "length" "12")])
+
+;; Load up a pc-relative address to an external symbol.  If the symbol and the
+;; program are both defined in the main program, the linker will optimize this
+;; to a PADDI.  Otherwise, it will create a GOT address that is relocated by
+;; the dynamic linker and loaded up.
+(define_insn "*pcrel_ext_addr"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
+   (match_operand:DI 1 "pcrel_external_address"))]
+  "TARGET_PCREL"
+  "pld %0,%a1"
+  [(set_attr "length" "12")])
+
 ;; TOC register handling.
 
 ;; Code to initialize the TOC register...


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH], PowerPC #13, Add support for matching prefixed insns

2019-07-25 Thread Michael Meissner
This is patch #13 which adds new support for matching insn addresses,
particularly prefixed insns.  In particular the main feature of this patch is a
function called 'addr_validate_p' that given an address, a mask of address
types you want to match, and an instruction format enumeration, it will return
true if the instruction matches the address.  It will optionally return the
information broken out into a base address, numeric offset, and a mask of which
type of address matches.

At the moment, I only added support for the address formats I need for prefixed
addressing.  I could imagine that it can be extended for other types of
addresses, such as TOC addresses.  But I wanted to keep this particular patch
focused on what is needed for prefixed addressing.

The address types it can match include:

   1)   An address consisting of a single register;
   2)   A traditional instruction with a 16/14/12-bit offset;
   3)   A prefixed instruction with a numeric 34-bit offset;
   4)   A pc-relative address pointing to a local symbol or label; (or)
   5)   A pc-relative address to an external symbol.

You can OR the address types together, so you could look for an address that is
either a 34-bit numeric prefixed instruction or a pc-relative address to a
local symbol.

I have bootstrapped these changes on a power8 little endian system and there
were no regressions in make check.  Can I check this in the FSF trunk?

This is a series of separate patches to add functionality to the PowerPC
backend to support future processors.  Here is a high level summary of the
patches:

 * Patches 1-8, have already been applied
 * Patch 9 has been rewritten in patches 12-13
 * Patch 10 is withdrawn for now
 * Patch 11 adds DS offset mode to rs6000.c's reg_addr
 * Patch 12 adds a new enumeration for instruction format
 * Patch 13 adds support for matching prefixed insns
 * Patch 14 adds pc-relative support to load up addresses
 * Patch 15 renamed some functions to be smaller
 * Patch 16 updated a comment and moved a predicate
 * Patch 17 adds the prefixed RTL attribute & emitting 'p' before prefixed
 * Patch 18 adds prefixed support for scalar types
 * Patch 19 uses a separate 'future' cost structure
 * Patch 20 clones power9.md for initial scheduling on future.md.

The following patches have not yet been written, but I expect them to be:

 * Patch 21 finish prefixed insn support for vectors & 128-bit int/floats
 * Patch 22 enable pc-relative by default
 * Patch 23 add pcrel linker optimization
 * Patch 24 new tests

2019-07-24  Michael Meissner  

* config/rs6000/predicates.md (lwa_operand): If we have prefixed
addresses, allow sign extend SImode to have odd offsets.
(pcrel_address): Rewrite to use addr_validate_p.
(pcrel_external_address): Rewrite to use addr_validate_p.
(pcrel_external_mem_operand): Rewrite to use addr_validate_p.
* config/rs6000/rs6000-protos.h (addr_validate_info): New
structure for addr_validate_p to return information.
(ADDR_VALIDATE_REG_SINGLE): New mask for addr_validate_p.
(ADDR_VALIDATE_REG_16BIT): New mask for addr_validate_p.
(ADDR_VALIDATE_REG_34BIT): New mask for addr_validate_p.
(ADDR_VALIDATE_REG_PCREL_LOCAL): New mask for addr_validate_p.
(ADDR_VALIDATE_REG_PCREL_EXT): New mask for addr_validate_p.
(addr_validate_p): New declaration.
* config/rs6000/rs6000.c (mode_supports_prefixed_address_p): Move
higher in the file.
(quad_address_p): Add checks to see if we can use prefixed
addresses without worrying about DQ alignment constraints.
(mem_operand_gpr): Add checks to see if we can use prefixed
addresses without worrying about DS alignment constraints.
(mem_operand_ds_form): Add checks to see if we can use prefixed
addresses without worrying about DS alignment constraints.
(print_operand_address): Use addr_validate_p to detect pc-relative
addresses, and to crack them into the symbol and offset.
(addr_validate_p): New function to validate addresses and
optionally return the cracked information.
(rs6000_prefixed_address_mode_p): Rewrite to use addr_validate_p.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 273771)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -921,6 +921,8 @@ (define_predicate "vsx_scalar_64bit"
 (define_predicate "lwa_operand"
   (match_code "reg,subreg,mem")
 {
+  const unsigned addr_flags = (ADDR_VALIDATE_REG_34BIT
+  | ADDR_VALIDATE_PCREL_LOCAL);
   rtx inner, addr, offset;
 
   inner = op;
@@ -933,6 +935,13 @@ (define_predicate "lwa_operand"
 return false;
 
   addr = XEXP (inner, 0);
+
+  /* The LWA instruction uses the DS-form format where the bottom two bits of
+ the offset must be 0.  The prefixed PLWA does not have 

Re: Subject: [PATCH] [PR 89330] Remove non-useful speculations from new_edges

2019-07-25 Thread Jan Hubicka
> Hi,
> 
> the following patch prevents the call speculation machinery from
> deallocating call graph edges from under the indirect inlining machinery
> and it also fixes a potential issue in speculation which could in some
> cases undo an earlier inlining decision, something that the inliner is
> not built to expect.
> 
> That latter change requires disabling speculation on a testcase through
> adding -fno-profile-values, otherwise a devirtualziation happens before
> it is dump-scanned for.
> 
> Bootstrapped and tested on an x86_64-linux, OK for trunk?

OK,
thanks!
Honza
> 
> Thanks,
> 
> Martin
> 
> 
> 2019-07-25  Martin Jambor  
> 
>   PR ipa/89330
>   * ipa-inline-transform.c (check_speculations_1): New function.
>   (push_all_edges_in_set_to_vec): Likewise.
>   (check_speculations): Use check_speculations_1, new parameter
>   new_edges.
>   (inline_call): Pass new_edges to check_speculations.
>   * ipa-inline.c (add_new_edges_to_heap): Assert edge_callee is not
>   NULL.
>   (speculation_useful_p): Early return true if edge is inlined, remove
>   later checks for inline_failed.
> 
>   testsuite/
>   * g++.dg/lto/pr89330_[01].C: New test.
>   * g++.dg/tree-prof/devirt.C: Added -fno-profile-values to dg-options.
> ---
>  gcc/ipa-inline-transform.c  | 42 +++--
>  gcc/ipa-inline.c| 12 --
>  gcc/testsuite/g++.dg/lto/pr89330_0.C| 50 +
>  gcc/testsuite/g++.dg/lto/pr89330_1.C| 36 ++
>  gcc/testsuite/g++.dg/tree-prof/devirt.C |  2 +-
>  5 files changed, 133 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/lto/pr89330_0.C
>  create mode 100644 gcc/testsuite/g++.dg/lto/pr89330_1.C
> 
> diff --git a/gcc/ipa-inline-transform.c b/gcc/ipa-inline-transform.c
> index 4a3a193bc9c..897c563f19a 100644
> --- a/gcc/ipa-inline-transform.c
> +++ b/gcc/ipa-inline-transform.c
> @@ -237,10 +237,13 @@ clone_inlined_nodes (struct cgraph_edge *e, bool 
> duplicate,
>  }
>  }
>  
> -/* Check all speculations in N and resolve them if they seems useless. */
> +/* Check all speculations in N and if any seem useless, resolve them.  When a
> +   first edge is resolved, pop all edges from NEW_EDGES and insert them to
> +   EDGE_SET.  Then remove each resolved edge from EDGE_SET, if it is there.  
> */
>  
>  static bool
> -check_speculations (cgraph_node *n)
> +check_speculations_1 (cgraph_node *n, vec *new_edges,
> +   hash_set  *edge_set)
>  {
>bool speculation_removed = false;
>cgraph_edge *next;
> @@ -250,15 +253,46 @@ check_speculations (cgraph_node *n)
>next = e->next_callee;
>if (e->speculative && !speculation_useful_p (e, true))
>   {
> +   while (new_edges && !new_edges->is_empty ())
> + edge_set->add (new_edges->pop ());
> +   edge_set->remove (e);
> +
> e->resolve_speculation (NULL);
> speculation_removed = true;
>   }
>else if (!e->inline_failed)
> - speculation_removed |= check_speculations (e->callee);
> + speculation_removed |= check_speculations_1 (e->callee, new_edges,
> +  edge_set);
>  }
>return speculation_removed;
>  }
>  
> +/* Push E to NEW_EDGES.  Called from hash_set traverse method, which
> +   unfortunately means this function has to have external linkage, otherwise
> +   the code will not compile with gcc 4.8.  */
> +
> +bool
> +push_all_edges_in_set_to_vec (cgraph_edge * const ,
> +   vec *new_edges)
> +{
> +  new_edges->safe_push (e);
> +  return true;
> +}
> +
> +/* Check all speculations in N and if any seem useless, resolve them and 
> remove
> +   them from NEW_EDGES.  */
> +
> +static bool
> +check_speculations (cgraph_node *n, vec *new_edges)
> +{
> +  hash_set  edge_set;
> +  bool res = check_speculations_1 (n, new_edges, _set);
> +  if (!edge_set.is_empty ())
> +edge_set.traverse  *,
> +push_all_edges_in_set_to_vec> (new_edges);
> +  return res;
> +}
> +
>  /* Mark all call graph edges coming out of NODE and all nodes that have been
> inlined to it as in_polymorphic_cdtor.  */
>  
> @@ -450,7 +484,7 @@ inline_call (struct cgraph_edge *e, bool update_original,
>  mark_all_inlined_calls_cdtor (e->callee);
>if (opt_for_fn (e->caller->decl, optimize))
>  new_edges_found = ipa_propagate_indirect_call_infos (curr, new_edges);
> -  check_speculations (e->callee);
> +  check_speculations (e->callee, new_edges);
>if (update_overall_summary)
>  ipa_update_overall_fn_summary (to);
>else
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 5862d8d..0f1699a182f 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -1626,6 +1626,7 @@ add_new_edges_to_heap (edge_heap_t *heap, 
> vec new_edges)
>struct cgraph_edge *edge = new_edges.pop ();
>  
>gcc_assert (!edge->aux);
> +  

[PATCH], PowerPC #12, Add instruction format enumeration

2019-07-25 Thread Michael Meissner
This is patch #12 which adds a new enumeration for instruction format.

As we discussed in patch #9, offset_format wasn't the best name.  This patch
adds the INSN_FORM enumeration that lists the 3 traditional offset instruction
formats (D/DS/DQ), indexed only instruction formats, and prefixed instruction
formats.  Its primary use is for deciding what instruction format to use for
offset instructions.

I have done a bootstrap on a little endian power8 system and there were no
regressions in the build or running make check.  Can I check this into the
trunk once patch #11 is checked in?

This is a series of separate patches to add functionality to the PowerPC
backend to support future processors.  Here is a high level summary of the
patches:

 * Patches 1-8, have already been applied
 * Patch 9 has been rewritten in patches 12-13
 * Patch 10 is withdrawn for now
 * Patch 11 adds DS offset mode to rs6000.c's reg_addr
 * Patch 12 adds a new enumeration for instruction format
 * Patch 13 adds support for matching prefixed insns
 * Patch 14 adds pc-relative support to load up addresses
 * Patch 15 renamed some functions to be smaller
 * Patch 16 updated a comment and moved a predicate
 * Patch 17 adds the prefixed RTL attribute & emitting 'p' before prefixed
 * Patch 18 adds prefixed support for scalar types
 * Patch 19 uses a separate 'future' cost structure
 * Patch 20 clones power9.md for initial scheduling on future.md.

The following patches have not yet been written, but I expect them to be:

 * Patch 21 finish prefixed insn support for vectors & 128-bit int/floats
 * Patch 22 enable pc-relative by default
 * Patch 23 add pcrel linker optimization
 * Patch 24 new tests

2019-07-24  Michael Meissner  

* config/rs6000/rs6000.c (struct rs6000_reg_addr): Add default
insn format field.
(rs6000_debug_print_mode): If -mdebug=reg, print the default
instruction format field.
(addr_mask_to_insn_form): New function.
(rs6000_setup_reg_addr_masks): Set up the default instruction
format field.
* config/rs6000/rs6000.md (INSN_FORM): New enumeration.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 273776)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -370,6 +370,7 @@ struct rs6000_reg_addr {
   enum insn_code reload_fpr_gpr;   /* INSN to move from FPR to GPR.  */
   enum insn_code reload_gpr_vsx;   /* INSN to move from GPR to VSX.  */
   enum insn_code reload_vsx_gpr;   /* INSN to move from VSX to GPR.  */
+  enum INSN_FORM default_insn_form;/* Default format for offsets.  */
   addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */
   bool scalar_in_vmx_p;/* Scalar value can go in VMX.  
*/
 };
@@ -2118,6 +2119,38 @@ rs6000_debug_print_mode (ssize_t m)
 fprintf (stderr, " %s: %s", reload_reg_map[rc].name,
 rs6000_debug_addr_mask (reg_addr[m].addr_mask[rc], true));
 
+  switch (reg_addr[m].default_insn_form)
+{
+default:
+case INSN_FORM_UNKNOWN:
+  fputs ("  form=?", stderr);
+  spaces++;
+  break;
+
+case INSN_FORM_D:
+  fputs ("  form=d", stderr);
+  spaces++;
+  break;
+
+case INSN_FORM_DS:
+  fputs ("  form=ds", stderr);
+  break;
+
+case INSN_FORM_DQ:
+  fputs ("  form=dq", stderr);
+  break;
+
+case INSN_FORM_X:
+  fputs ("  form=x", stderr);
+  spaces++;
+  break;
+
+case INSN_FORM_PREFIXED:
+  fputs ("  form=p", stderr);
+  spaces++;
+  break;
+}
+
   if ((reg_addr[m].reload_store != CODE_FOR_nothing)
   || (reg_addr[m].reload_load != CODE_FOR_nothing))
 {
@@ -2528,6 +2561,33 @@ rs6000_debug_reg_global (void)
 }
 
 
+/* Map an addr_mask into an instruction format.  The main use for INSN_FORM is
+   to determine if offsets are valid.  */
+
+static enum INSN_FORM
+addr_mask_to_insn_form (addr_mask_type mask)
+{
+  if ((mask & RELOAD_REG_VALID) == 0)
+return INSN_FORM_UNKNOWN;
+
+  else if ((mask & RELOAD_REG_OFFSET) != 0)
+{
+  if ((mask & RELOAD_REG_QUAD_OFFSET) != 0)
+   return INSN_FORM_DQ;
+
+  if ((mask & RELOAD_REG_DS_OFFSET) != 0)
+   return INSN_FORM_DS;
+
+  else
+   return INSN_FORM_D;
+}
+
+  else if ((mask & RELOAD_REG_INDEXED) != 0)
+return INSN_FORM_X;
+
+  return INSN_FORM_UNKNOWN;
+}
+
 /* Update the addr mask bits in reg_addr to help secondary reload and go if
legitimate address support to figure out the appropriate addressing to
use.  */
@@ -2536,7 +2596,7 @@ static void
 rs6000_setup_reg_addr_masks (void)
 {
   ssize_t rc, reg, m, nregs;
-  addr_mask_type any_addr_mask, addr_mask;
+  addr_mask_type any_addr_mask, addr_mask, default_mask;
 
   for (m = 0; m < NUM_MACHINE_MODES; ++m)
 {
@@ -2686,6 +2746,26 @@ rs6000_setup_reg_addr_masks (void)
}
 
   reg_addr[m].addr_mask[RELOAD_REG_ANY] = 

[PATCH], PowerPC #11, Add DS offset mode to rs6000.c's reg_addr

2019-07-25 Thread Michael Meissner
This patch adds a new address mask field to the reg_addr structure in rs6000.c.
The new address mask says that a particular register type (gpr, fpr, altivec)
needs to use DS style offset addressing for a given mode (i.e. the bottom 2
bits must be 0).  In working on this, I discovered that the instructions that
take up more than one register (i.e. __ibm128) did not set the offset mask, and
I have fixed those modes.

I have done the usual bootstrap and compare make check on a little endian
power8 system and there were no regressions.

In addition, I used the -mdebug=reg option that prints out the mask
definitions, and verified each of the types have the appropriate d/ds/dq flags
set.  I have done it with both the normal options and with -mabi=ieeelongdoulbe
and I verified the TFmode/TCmode is appropriate for a vector type instead of a
pair of registes.  Patch #12 (and ultimately patch #17) will use this
information.

Can I check this patch into the FSF trunk?

This is a series of separate patches to add functionality to the PowerPC
backend to support future processors.  Here is a high level summary of the
patches:

 * Patches 1-8, have already been applied
 * Patch 9 has been rewritten in patches 12-13
 * Patch 10 is withdrawn for now
 * Patch 11 adds DS offset mode to rs6000.c's reg_addr
 * Patch 12 adds a new enumeration for instruction format
 * Patch 13 adds support for matching prefixed insns
 * Patch 14 adds pc-relative support to load up addresses
 * Patch 15 renamed some functions to be smaller
 * Patch 16 updated a comment and moved a predicate
 * Patch 17 adds the prefixed RTL attribute & emitting 'p' before prefixed
 * Patch 18 adds prefixed support for scalar types
 * Patch 19 uses a separate 'future' cost structure
 * Patch 20 clones power9.md for initial scheduling on future.md.

The following patches have not yet been written, but I expect them to be:

 * Patch 21 finish prefixed insn support for vectors & 128-bit int/floats
 * Patch 22 enable pc-relative by default
 * Patch 23 add pcrel linker optimization
 * Patch 24 new tests

2019-07-24  Michael Meissner  

* config/rs6000/rs6000.c (addr_mask_type): Make the type unsigned
short to accomidate RELOAD_REG_DS_OFFSET.
(RELOAD_REG_DS_OFFSET): New mask to identify instructions using
the DS instruction format.
(rs6000_setup_reg_addr_masks): Set the RELOAD_REG_DS_OFFSET mask
if the instruction would need to use a DS instruction format for a
register class.  Split up processing for vectors between vector
registers and GPRs.  Treat floating point modes that take two
registers like DF/SF mode.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 273771)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -351,16 +351,17 @@ static const struct reload_reg_map_type
 /* Mask bits for each register class, indexed per mode.  Historically the
compiler has been more restrictive which types can do PRE_MODIFY instead of
PRE_INC and PRE_DEC, so keep track of sepaate bits for these two.  */
-typedef unsigned char addr_mask_type;
+typedef unsigned short addr_mask_type;
 
-#define RELOAD_REG_VALID   0x01/* Mode valid in register..  */
-#define RELOAD_REG_MULTIPLE0x02/* Mode takes multiple registers.  */
-#define RELOAD_REG_INDEXED 0x04/* Reg+reg addressing.  */
-#define RELOAD_REG_OFFSET  0x08/* Reg+offset addressing. */
-#define RELOAD_REG_PRE_INCDEC  0x10/* PRE_INC/PRE_DEC valid.  */
-#define RELOAD_REG_PRE_MODIFY  0x20/* PRE_MODIFY valid.  */
-#define RELOAD_REG_AND_M16 0x40/* AND -16 addressing.  */
-#define RELOAD_REG_QUAD_OFFSET 0x80/* quad offset is limited.  */
+#define RELOAD_REG_VALID   0x001   /* Mode valid in register.  */
+#define RELOAD_REG_MULTIPLE0x002   /* Mode takes multiple registers.  */
+#define RELOAD_REG_INDEXED 0x004   /* Reg+reg addressing.  */
+#define RELOAD_REG_OFFSET  0x008   /* Reg+offset addressing. */
+#define RELOAD_REG_PRE_INCDEC  0x010   /* PRE_INC/PRE_DEC valid.  */
+#define RELOAD_REG_PRE_MODIFY  0x020   /* PRE_MODIFY valid.  */
+#define RELOAD_REG_AND_M16 0x040   /* AND -16 addressing.  */
+#define RELOAD_REG_QUAD_OFFSET 0x080   /* quad offset is limited.  */
+#define RELOAD_REG_DS_OFFSET   0x100   /* quad offset is limited.  */
 
 /* Register type masks based on the type, of valid addressing modes.  */
 struct rs6000_reg_addr {
@@ -2078,6 +2079,8 @@ rs6000_debug_addr_mask (addr_mask_type m
 
   if ((mask & RELOAD_REG_QUAD_OFFSET) != 0)
 *p++ = 'O';
+  else if ((mask & RELOAD_REG_DS_OFFSET) != 0)
+*p++ = 'd';
   else if ((mask & RELOAD_REG_OFFSET) != 0)
 *p++ = 'o';
   else if (keep_spaces)
@@ -2539,8 +2542,6 @@ rs6000_setup_reg_addr_masks (void)
 {
   machine_mode m2 = (machine_mode) m;
   bool complex_p = false;
-  bool small_int_p = (m2 == QImode || m2 == 

Re: [PATCH] handle multibyte stores larger than char in strlen (PR 91183, 86888)

2019-07-25 Thread Martin Sebor

On 7/24/19 8:39 PM, Jeff Law wrote:

On 7/24/19 8:17 PM, Martin Sebor wrote:

Committed in r273783 after retesting and including a test for
the warning that I had left out of the patch I posted here.

Martin

PS I suspect some of the tests I added might need tweaking on
big-endian systems.  I'll deal with them tomorrow.


And maybe also strictly aligned targets.  A sparc-solaris2.11 cross
shows these failures.  It looks like it doesn't like something about
some of the 64 bit stores in the tests.

FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized
"_not_eliminated_" 0
FAIL: gcc.dg/strlenopt-71.c (test for excess errors)
FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
"_not_eliminated_" 0



msp430-elf failures:



New tests that FAIL (5 tests):

msp430-sim: gcc.dg/strlenopt-70.c (test for excess errors)
msp430-sim: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized 
"_not_eliminated_" 0


This one is due to bugs in the endian macros the test defines
to try to handle both big and little-endian.  I've fixed those/


msp430-sim: gcc.dg/strlenopt-71.c (test for excess errors)


Same here, though this is a runtime test so it will also fail
to link outside of an emulator.  (Changing the test harness to
report these tests as UNSUPPORTED instead would avoid this sort
of ambiguity.)


msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized 
"_not_eliminated_" 0
msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0


This failure is due to to a combination of the absence of early
store merging in pr83821.  The former prevents the stores below

  char a[4];
  a[0] = 'X';
  a[1] = 0;
  a[2] = a[3] = 0;

from turning into:

  MEM  [(char * {ref-all})] = 49;

pr83821 causes the strlen pass to invalidate strlen information
when it sees the assignment to a[2] (or beyond).  I need to dust
off and resubmit my patch for that from last year.

Until that patch is approved I have disabled the test on strictly
aligned targets.

The failure in gcc.dg/Wstringop-overflow-14.c on visium-elf was
because of a difference between strictly aligned targets and
the rest, which triggers different warnings between the two sets.
I disabled the unexpected warning and the test passes.

I've just committed r273812 and r273814 with these changes.

Martin


Re: [Darwin, testsuite, committed] Address PR91087 - XFAIL parts of pr16855.C.

2019-07-25 Thread Iain Sandoe
Hi Rainer,

> On 25 Jul 2019, at 19:40, Rainer Orth  wrote:
> 

>> The testcase is failing to instrument part of the source because of a 
>> platform
>> bug in the ordering of static DTORs.  It seems unlikely that this is 
>> generically
>> fixable in the toolchain (and given that it's likely to be a dynamic loader
>> change would not be expected to be applied retrospectively to OS versions
>> that are out of support).  To avoid the testsuite noise, xfail the count 
>> lines
>> that don't match (we can adjust the xfails as/when the upstream bug is 
>> fixed).
> 
> have a look at PR c++/81337 where the same issue occurs on Solaris and
> FreeBSD.  This has now been closed as dup of PR c++/52477 which *is* a
> generic issue.

The SysV ABI specifies the relative ordering of atext and .fini etc.

I asked the question of the Darwin owners whether there is such a specification
for Darwin, with no specific answer yet - but the observation that the current
ordering DTOR  does look like a bug (possibly a dyld issue).

The manifestation in the fails of pr16855 is that the gcov write out sits on a 
prioritized static ctor but:

1) Darwin doesn’t support CTOR priorities - so the write-out happens before the
[test] static dtor runs. [hence my observation that it might be better to stick 
with the 
demanded priority, at least that would work in one TU].

2) The atext registered DTORs are running *after* the static “constructor” ones
(opposite order from the ordering of the CTORs and different from SysV ABI)

===

1 - we might address in some way by adding the gcov dtors to the atext list 
(this
is what llvm does on Darwin)

2 - as noted, seems to be a platform bug, we could only hack around it in the
toolchain and we will have to wait for some response from the owers…

so, this might compound the effects of  PR c++/52477, but there does seem to
be an additional problem too?

thanks
Iain

 

Re: [patch] More precise message with -Winline

2019-07-25 Thread Eric Botcazou
> That seems misleading, unless the exception is really never thrown.

See my earlier answer to Richard.

-- 
Eric Botcazou


Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-25 Thread Martin Sebor

On 7/24/19 11:08 PM, JiangNing OS wrote:

-Original Message-
From: Martin Sebor 
Sent: Thursday, July 25, 2019 2:08 AM
To: Jeff Law ; JiangNing OS
; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for
conditional store optimization

On 7/24/19 11:12 AM, Jeff Law wrote:

On 7/24/19 10:09 AM, Martin Sebor wrote:

On 7/24/19 9:25 AM, Jeff Law wrote:

On 7/23/19 10:20 AM, Martin Sebor wrote:

On 7/22/19 10:26 PM, JiangNing OS wrote:

This patch is to fix PR91195. Is it OK for trunk?

diff --git a/gcc/ChangeLog b/gcc/ChangeLog index
711a31ea597..4db36644160 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2019-07-22  Jiangning Liu  
+
+    PR middle-end/91195
+    * tree-ssa-phiopt.c (cond_store_replacement): Work around
+    -Wmaybe-uninitialized warning.
+
     2019-07-22  Stafford Horne  
       * config/or1k/or1k.c (or1k_expand_compare): Check for
int before diff --git a/gcc/tree-ssa-phiopt.c
b/gcc/tree-ssa-phiopt.c index b64bde695f4..7a86007d087 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -2240,6 +2240,14 @@ cond_store_replacement (basic_block
middle_bb, basic_block join_bb,
   tree base = get_base_address (lhs);
   if (!auto_var_p (base) || TREE_ADDRESSABLE (base))
     return false;
+
+  /* The transformation below will inherently introduce a
+memory
load,
+ for which LHS may not be initialized yet if it is not in
+NOTRAP,
+ so a -Wmaybe-uninitialized warning message could be triggered.
+ Since it's a bit hard to have a very accurate
+uninitialization
+ analysis to memory reference, we disable the warning here to
avoid
+ confusion.  */
+  TREE_NO_WARNING (lhs) = 1;


The no-warning bit is sometimes (typically?) set by the middle-end
after a warning has been issued, to avoid triggering other warnings
down the line for an already diagnosed expression.  Unless it's
done just for the purposes of a single pass and the bit is cleared
afterwards, using it to avoid potential false positives doesn't
seem like a robust solution.  It will mask warnings for constructs
that have been determined to be invalid.

FWIW, the middle-end is susceptible to quite a few false positives
that would nice to avoid.  We have discussed various approaches to
the problem but setting the no-warning bit seems like too blunt of
an instrument.

All true.

But in the case JiangNing is working with the transformation
inherently can introduce an uninitialized read.  It seems reasonable
to mark those loads he generates that don't have a dominating read.

His code takes something like

     if (x)
   *p = 

And turns it into

     t1 = *p;
     t2 = x ?  : t1;
     *p = t2;

In the past we required there be a dominating read from *p (among
other restrictions).  That requirement was meant to ensure *p isn't
going to fault.  Jiang's work relaxes that requirement somewhat for
objects that we can prove aren't going to fault via other means.

Can setting TREE_NO_WARNING on the inserted loads inhibit warnings?
Certainly.  However, I believe we use it in other places where we
know the code we're emitting is safe, but can cause a warning.  I
think Jiang's work falls into that category.

I do think the bit should only be set if we don't have a dominating
load to minimize cases where we might inhibit a valid warning.


I was thinking of a few cases where setting the no-warning bit might
interfere with detecting bugs unrelated to uninitialized reads:

    1) -Warray-bounds in gimple-ssa-warn-restrict and tree-vrp
    2) -Wstringop-overflow in tree-ssa-strlen (other than for calls
   to built-ins)

I couldn't come up with a test case that shows how it might happen
with this patch but I didn't spend too much time on it.

I bet we could find one and it's more likely to show up on aarch64
than
x86 due to costing issues.  Either way we're making a bit of a
judgment call -- an extra false positive here due to a load the
compiler inserted on a path that didn't have one before, or
potentially missing a warning on that load because of the

TREE_NO_WARNING.


I believe the false positive here is worse than the potential missed
warning.




There are a number of existing instances of setting TREE_NO_WARNING
to suppress -Wuninitialized, and some are the cause of known problems.
Bugs 51545, 58950, 74762, 74765 or 89697 are examples.  They all boil
down to the fact that there's just one bit for all warnings.  Jakub
mentioned adding a hash-map for this.  That seems like a simple and
good solution.

I'm not sure how that really helps here.  We marking the MEM on the
LHS and that's not a shared object.  I don't see how it's going to be
significantly different using a hash map vs the bit in this circumstance.


I don't know what Jakub had in mind but the mapping I envision is one like
hash_map that would make it possible to set a bit for each
distinct warning for every tree node.  It would let us set a bit for -
Wuninitialized 

Re: [Darwin, testsuite, committed] Address PR91087 - XFAIL parts of pr16855.C.

2019-07-25 Thread Rainer Orth
Hi Iain,

> The testcase is failing to instrument part of the source because of a platform
> bug in the ordering of static DTORs.  It seems unlikely that this is 
> generically
> fixable in the toolchain (and given that it's likely to be a dynamic loader
> change would not be expected to be applied retrospectively to OS versions
> that are out of support).  To avoid the testsuite noise, xfail the count lines
> that don't match (we can adjust the xfails as/when the upstream bug is fixed).

have a look at PR c++/81337 where the same issue occurs on Solaris and
FreeBSD.  This has now been closed as dup of PR c++/52477 which *is* a
generic issue.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


patch to fix PR91223

2019-07-25 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91223

The patch was successfully bootstrapped and tested on x86-64.

Committed as rev. 273810.

Index: ChangeLog
===
--- ChangeLog	(revision 273809)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2019-07-25  Vladimir Makarov  
+
+	PR rtl-optimization/91223
+	* lra-constraints.c (process_alt_operands): Fail for unsuccessful
+	matching with INOUT operand.
+
 2019-07-25  Eric Botcazou  
 
 	* stmt.c (expand_case): Try to narrow the index type if it's larger
Index: lra-constraints.c
===
--- lra-constraints.c	(revision 273809)
+++ lra-constraints.c	(working copy)
@@ -2171,6 +2171,14 @@ process_alt_operands (int only_alternati
 		  }
 		else
 		  {
+			/* If the operands do not match and one
+			   operand is INOUT, we can not match them.
+			   Try other possibilities, e.g. other
+			   alternatives or commutative operand
+			   exchange.  */
+			if (curr_static_id->operand[nop].type == OP_INOUT
+			|| curr_static_id->operand[m].type == OP_INOUT)
+			  break;
 			/* Operands don't match.  If the operands are
 			   different user defined explicit hard
 			   registers, then we cannot make them match
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 273809)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2019-07-25  Vladimir Makarov  
+
+	PR rtl-optimization/91223
+	* gcc.target/i386/pr91223.c: New test.
+
 2019-07-25  Iain Sandoe  
 
 	PR gcov-profile/91087
Index: testsuite/gcc.target/i386/pr91223.c
===
--- testsuite/gcc.target/i386/pr91223.c	(nonexistent)
+++ testsuite/gcc.target/i386/pr91223.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -fno-tree-fre" } */
+int a;
+void fn2(short, short);
+
+void fn1(void) {
+  short b[8];
+  b[0] |= a & 3;
+  b[1] = a;
+  fn2(b[0], b[1]);
+}


[Darwin, testsuite, committed] Address PR91087 - XFAIL parts of pr16855.C.

2019-07-25 Thread Iain Sandoe
The testcase is failing to instrument part of the source because of a platform
bug in the ordering of static DTORs.  It seems unlikely that this is generically
fixable in the toolchain (and given that it's likely to be a dynamic loader
change would not be expected to be applied retrospectively to OS versions
that are out of support).  To avoid the testsuite noise, xfail the count lines
that don't match (we can adjust the xfails as/when the upstream bug is fixed).

dejagnu xfails do not seem to work when embedded in a line like:
  ~Test (void) {  /* count(1) { xfail ... } */ }
the closing brace seems to confuse the parser.  The solution is to exapnd the
text onto three lines.

as an aside, *not part of the applied patch*:
FWIW, I think we should consider that if the correct operation of the coverage
depends on CTOR priority there’s no point in setting it to the default when the
target doesn’t support CTOR priorities - we might as well leave it as desired
and at least that will DTRT within a single TU or under LTO.

tested on x86-64-darwin16, x86_64-pc-linuc-gnu,
applied to mainline
thanks
Iain

2019-07-25  Iain Sandoe  

PR gcov-profile/91087
* g++.dg/gcov/pr16855.C: Xfail the count lines for the DTORs and the
"final" line for the failure summaries.  Adjust source layout so that
dejagnu xfail expressions work.

diff --git a/gcc/testsuite/g++.dg/gcov/pr16855.C 
b/gcc/testsuite/g++.dg/gcov/pr16855.C
index d7aa8a4f72..a68b05cb57 100644
--- a/gcc/testsuite/g++.dg/gcov/pr16855.C
+++ b/gcc/testsuite/g++.dg/gcov/pr16855.C
@@ -1,6 +1,8 @@
 /* { dg-options "-fprofile-arcs -ftest-coverage" } */
 /* { dg-do run { target native } } */
 
+/* See PR91087 for information on Darwin xfails. */
+
 #include 
 #include 
 
@@ -18,7 +20,9 @@ class Test
 {
 public:
   Test (void) { fprintf (stderr, "In Test::Test\n"); /* count(1) */ }
-  ~Test (void) { fprintf (stderr, "In Test::~Test\n"); /* count(1) */ }
+  ~Test (void) {
+   fprintf (stderr, "In Test::~Test\n"); /* count(1) { xfail *-*-darwin* } */
+  }
 } T1;
 
 void
@@ -42,7 +46,7 @@ static void __attribute__ ((constructor)) ctor_default ()
 
 static void __attribute__ ((destructor)) dtor_default ()
 {
-  fprintf (stderr, "in destructor(())\n"); /* count(1) */
+  fprintf (stderr, "in destructor(())\n"); /* count(1) { xfail *-*-darwin* } */
 }
 
-/* { dg-final { run-gcov branches { -b pr16855.C } } } */
+/* { dg-final { run-gcov branches { -b pr16855.C } { xfail *-*-darwin* } } } */



Re: [patch] More precise message with -Winline

2019-07-25 Thread Andi Kleen
Eric Botcazou  writes:

> Hi,
>
> for EH cleanups, the branch probability heuristics can consider that edges 
> are 
> never taken, i.e. their profile count is set to zero.  In this case, no 
> matter 
> how you tweak the inlining parameters, you cannot get function calls inlined, 
> but -Winline nevertheless prints the standard message about unlikely calls as 
> in the cases when tweaking the inlining parameters can work.
>
> Therefore the attached patch adds a new CIF code with the warning message:
>
> "call is never executed and code size would grow [-Winline]"

That seems misleading, unless the exception is really never thrown.

Maybe s/never/rarely/


-Andi


Re: [PATCH, rs6000] Fix PR91050 by adding a DRIVER_SELF_SPECS spec

2019-07-25 Thread Peter Bergner
Uros and Jan,

I should have CC'd you for the i386 portion of the patch.  Are you ok with
the i386.h change if Iain's x86 Darwin testing comes out clean?

Peter


On 7/25/19 9:41 AM, Peter Bergner wrote:
> On 7/25/19 2:50 AM, Iain Sandoe wrote:
>> This will break Darwin which has used DRIVER_SELF_SPECS in config/darwin.h
>> since they were introduced (and the equivalent before).
>>
>> This is because Darwin has driver self-specs common to the PPC and X86 ports.
>>
>> If this change is seen the way to go, then I guess one solution would be to 
>> rename the
>> driver specs in config/darwin.h => SUBTARGET_DRIVER_SELF_SPECS*** and then
>> put a dummy DRIVER_SELF_SPECS in i386/i386.h 
>> and append SUBTARGET_DRIVER_SELF_SPECS  to both the i386.h and rs6000.h 
>> cases.  (or something along those lines)
> 
> Hi Iain,
> 
> The patch below uses your suggestion.  Does this work for you on x86 and ppc?
> 
> 
> 
>>> Segher, I tried your suggestion of writing the spec more generically
>>> (ie, %{mdejagnu-*: ... -m*}), which worked in that it did the correct
>>> replacement.  However, the %>> option(s) would need to be written like %>> at all.
>>
>> not sure what the objective is here - if you want to remove all pre-existing 
>> -mcpu from
>> the command line won’t %> -mcpunewthing
>> if it gets invented) ..
> 
> We only want to remove all pre-existing -mcpu= options IF the user also used
> -mdejagnu-cpu=.  You do not want to remove -mcpu= options if the user used
> -mdejagnu-tune=.  That's why I tried:
> 
> %{mdejagnu-*: % 
> so -mdejagnu-cpu= would only remove -mcpu options and -mdejagnu-tune= would
> only remove -mtune= options.  The problem is that it doesn't look like you
> can you use %* with %<.
> 
> Peter
> 
> 
> gcc/
>   PR target/91050
>   * config/rs6000/rs6000.opt (mdejagnu-cpu=): Delete option.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
>   use of deleted rs6000_dejagnu_cpu_index variable.
>   * config/rs6000/rs6000.h (DRIVER_SELF_SPECS): Define.
>   (SUBTARGET_DRIVER_SELF_SPECS): Likewise.
>   * config/darwin.h (DRIVER_SELF_SPECS): Rename from this ...
>   (SUBTARGET_DRIVER_SELF_SPECS): ...to this.
>   * config/i386/i386.h (DRIVER_SELF_SPECS): Define.
>   (SUBTARGET_DRIVER_SELF_SPECS): Likewise.
> 
> Index: gcc/config/rs6000/rs6000.opt
> ===
> --- gcc/config/rs6000/rs6000.opt  (revision 273707)
> +++ gcc/config/rs6000/rs6000.opt  (working copy)
> @@ -388,13 +388,6 @@ mtune=
>  Target RejectNegative Joined Var(rs6000_tune_index) Init(-1) 
> Enum(rs6000_cpu_opt_value) Save
>  -mtune=  Schedule code for given CPU.
>  
> -; Only for use in the testsuite.  This simply overrides -mcpu=.  With older
> -; versions of Dejagnu the command line arguments you set in RUNTESTFLAGS
> -; override those set in the testcases; with this option, the testcase will
> -; always win.
> -mdejagnu-cpu=
> -Target Undocumented RejectNegative Joined Var(rs6000_dejagnu_cpu_index) 
> Init(-1) Enum(rs6000_cpu_opt_value) Save
> -
>  mtraceback=
>  Target RejectNegative Joined Enum(rs6000_traceback_type) 
> Var(rs6000_traceback)
>  -mtraceback=[full,part,no]   Select type of traceback table.
> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 273707)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -3489,9 +3489,6 @@ rs6000_option_override_internal (bool gl
>/* Don't override by the processor default if given explicitly.  */
>set_masks &= ~rs6000_isa_flags_explicit;
>  
> -  if (global_init_p && rs6000_dejagnu_cpu_index >= 0)
> -rs6000_cpu_index = rs6000_dejagnu_cpu_index;
> -
>/* Process the -mcpu= and -mtune= argument.  If the user changed
>   the cpu in a target attribute or pragma, but did not specify a tuning
>   option, use the cpu for the tuning option rather than the option 
> specified
> Index: gcc/config/rs6000/rs6000.h
> ===
> --- gcc/config/rs6000/rs6000.h(revision 273707)
> +++ gcc/config/rs6000/rs6000.h(working copy)
> @@ -77,6 +77,20 @@
>  #define PPC405_ERRATUM77 0
>  #endif
>  
> +/* Only for use in the testsuite: -mdejagnu-cpu= simply overrides -mcpu=.
> +   With older versions of Dejagnu the command line arguments you set in
> +   RUNTESTFLAGS override those set in the testcases; with this option,
> +   the testcase will always win.  Ditto for -mdejagnu-tune=.  */
> +#define DRIVER_SELF_SPECS \
> +  "%{mdejagnu-cpu=*: % +   %{mdejagnu-tune=*: % +   %{mdejagnu-*: % +   SUBTARGET_DRIVER_SELF_SPECS
> +
> +#ifndef SUBTARGET_DRIVER_SELF_SPECS
> +# define SUBTARGET_DRIVER_SELF_SPECS
> +#endif
> +
>  #if CHECKING_P
>  #define ASM_OPT_ANY ""
>  #else
> Index: gcc/config/darwin.h
> ===
> --- 

Subject: [PATCH] [PR 89330] Remove non-useful speculations from new_edges

2019-07-25 Thread Martin Jambor
Hi,

the following patch prevents the call speculation machinery from
deallocating call graph edges from under the indirect inlining machinery
and it also fixes a potential issue in speculation which could in some
cases undo an earlier inlining decision, something that the inliner is
not built to expect.

That latter change requires disabling speculation on a testcase through
adding -fno-profile-values, otherwise a devirtualziation happens before
it is dump-scanned for.

Bootstrapped and tested on an x86_64-linux, OK for trunk?

Thanks,

Martin


2019-07-25  Martin Jambor  

PR ipa/89330
* ipa-inline-transform.c (check_speculations_1): New function.
(push_all_edges_in_set_to_vec): Likewise.
(check_speculations): Use check_speculations_1, new parameter
new_edges.
(inline_call): Pass new_edges to check_speculations.
* ipa-inline.c (add_new_edges_to_heap): Assert edge_callee is not
NULL.
(speculation_useful_p): Early return true if edge is inlined, remove
later checks for inline_failed.

testsuite/
* g++.dg/lto/pr89330_[01].C: New test.
* g++.dg/tree-prof/devirt.C: Added -fno-profile-values to dg-options.
---
 gcc/ipa-inline-transform.c  | 42 +++--
 gcc/ipa-inline.c| 12 --
 gcc/testsuite/g++.dg/lto/pr89330_0.C| 50 +
 gcc/testsuite/g++.dg/lto/pr89330_1.C| 36 ++
 gcc/testsuite/g++.dg/tree-prof/devirt.C |  2 +-
 5 files changed, 133 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lto/pr89330_0.C
 create mode 100644 gcc/testsuite/g++.dg/lto/pr89330_1.C

diff --git a/gcc/ipa-inline-transform.c b/gcc/ipa-inline-transform.c
index 4a3a193bc9c..897c563f19a 100644
--- a/gcc/ipa-inline-transform.c
+++ b/gcc/ipa-inline-transform.c
@@ -237,10 +237,13 @@ clone_inlined_nodes (struct cgraph_edge *e, bool 
duplicate,
 }
 }
 
-/* Check all speculations in N and resolve them if they seems useless. */
+/* Check all speculations in N and if any seem useless, resolve them.  When a
+   first edge is resolved, pop all edges from NEW_EDGES and insert them to
+   EDGE_SET.  Then remove each resolved edge from EDGE_SET, if it is there.  */
 
 static bool
-check_speculations (cgraph_node *n)
+check_speculations_1 (cgraph_node *n, vec *new_edges,
+ hash_set  *edge_set)
 {
   bool speculation_removed = false;
   cgraph_edge *next;
@@ -250,15 +253,46 @@ check_speculations (cgraph_node *n)
   next = e->next_callee;
   if (e->speculative && !speculation_useful_p (e, true))
{
+ while (new_edges && !new_edges->is_empty ())
+   edge_set->add (new_edges->pop ());
+ edge_set->remove (e);
+
  e->resolve_speculation (NULL);
  speculation_removed = true;
}
   else if (!e->inline_failed)
-   speculation_removed |= check_speculations (e->callee);
+   speculation_removed |= check_speculations_1 (e->callee, new_edges,
+edge_set);
 }
   return speculation_removed;
 }
 
+/* Push E to NEW_EDGES.  Called from hash_set traverse method, which
+   unfortunately means this function has to have external linkage, otherwise
+   the code will not compile with gcc 4.8.  */
+
+bool
+push_all_edges_in_set_to_vec (cgraph_edge * const ,
+ vec *new_edges)
+{
+  new_edges->safe_push (e);
+  return true;
+}
+
+/* Check all speculations in N and if any seem useless, resolve them and remove
+   them from NEW_EDGES.  */
+
+static bool
+check_speculations (cgraph_node *n, vec *new_edges)
+{
+  hash_set  edge_set;
+  bool res = check_speculations_1 (n, new_edges, _set);
+  if (!edge_set.is_empty ())
+edge_set.traverse  *,
+  push_all_edges_in_set_to_vec> (new_edges);
+  return res;
+}
+
 /* Mark all call graph edges coming out of NODE and all nodes that have been
inlined to it as in_polymorphic_cdtor.  */
 
@@ -450,7 +484,7 @@ inline_call (struct cgraph_edge *e, bool update_original,
 mark_all_inlined_calls_cdtor (e->callee);
   if (opt_for_fn (e->caller->decl, optimize))
 new_edges_found = ipa_propagate_indirect_call_infos (curr, new_edges);
-  check_speculations (e->callee);
+  check_speculations (e->callee, new_edges);
   if (update_overall_summary)
 ipa_update_overall_fn_summary (to);
   else
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 5862d8d..0f1699a182f 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1626,6 +1626,7 @@ add_new_edges_to_heap (edge_heap_t *heap, vec new_edges)
   struct cgraph_edge *edge = new_edges.pop ();
 
   gcc_assert (!edge->aux);
+  gcc_assert (edge->callee);
   if (edge->inline_failed
  && can_inline_edge_p (edge, true)
  && want_inline_small_function_p (edge, true)
@@ -1653,6 +1654,10 @@ heap_edge_removal_hook (struct cgraph_edge *e, void 
*data)

Re: [patch, fortran] Improve dependency checking

2019-07-25 Thread Thomas Koenig

Hi Steve,


Ah, I don't speak C++, and didn't know one could corrupt a
C prototype in this manner.  A quick glance of gfortran.h
indeed shows a few more occurences of "bool xxx = false".
I suppose the patch is then OK.


I don't use many C++ features, but I use this one because I feel
it should really be in C :-)


PS: watch for long lines.


Corrected and committed as r273807.

Thanks for the review!

Regards

Thomas


Re: [patch, fortran] Improve dependency checking

2019-07-25 Thread Steve Kargl
On Thu, Jul 25, 2019 at 04:42:44PM +0200, Thomas Koenig wrote:
> Hi Steve,
> 
> >> -int gfc_dep_resolver(gfc_ref *, gfc_ref *, gfc_reverse *);
> >> +int gfc_dep_resolver(gfc_ref *, gfc_ref *, gfc_reverse *, bool identical 
> >> = false);
> > This is changing the prototype.  I would expect to see
> > 
> > 
> > int gfc_dep_resolver(gfc_ref *, gfc_ref *, gfc_reverse *, bool);
> 
> Usig C++'s optional arguments is actually quite useful, it's used
> already used in a few places in the front end.
> 
> The idea is that you don't need to touch the other callers, just the
> ones where the new argument matters.
> 
> However, in this particular case, it would also be possible to ajust
> all other callers (exactly one), if you prefer.
> 

Ah, I don't speak C++, and didn't know one could corrupt a
C prototype in this manner.  A quick glance of gfortran.h
indeed shows a few more occurences of "bool xxx = false".
I suppose the patch is then OK.

PS: watch for long lines.

-- 
Steve


tips for reviewing code

2019-07-25 Thread Martin Sebor

Here's a short article that I think is worth a read whether you
mainly review code or usually find yourself on the receiving end.

https://developers.redhat.com/blog/2019/07/08/10-tips-for-reviewing-code-you-dont-like

Martin


Re: [PATCH 1/2] Come up with function_decl_type and use it in tree_function_decl.

2019-07-25 Thread Martin Liška
On 7/25/19 3:30 PM, Martin Liška wrote:
> On 7/25/19 2:18 PM, Marc Glisse wrote:
>> On Thu, 25 Jul 2019, Martin Liška wrote:
>>
 DCE has special code to avoid removing the LHS of malloc when it is 
 unused. Is there something different about operator new that makes it not 
 need the same handling?
>>
>> If you take gcc.dg/torture/pr51692.c and replace __builtin_calloc (1, sizeof 
>> (double)) with new double(), we get an ICE with -O or higher...
>>
> 
> I can see, I'm working on that.
> 
> Martin
> 

There's a patch candidate I've been testing right now.

Ready to be installed after tests?
Thanks,
Martin
>From 4a7d515eda8591277dad01e1cd43a33883f2dcef Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 25 Jul 2019 17:24:51 +0200
Subject: [PATCH] Fix ICE seen in tree-ssa-dce.c for new/delete pair.

gcc/ChangeLog:

2019-07-25  Martin Liska  

	* tree-ssa-dce.c (eliminate_unnecessary_stmts): Do not
	remove LHS of operator new call.  It's handled latter.

gcc/testsuite/ChangeLog:

2019-07-25  Martin Liska  

	* g++.dg/cpp1y/new1.C (test_unused): Add new case that causes
	ICE.
---
 gcc/testsuite/g++.dg/cpp1y/new1.C |  8 
 gcc/tree-ssa-dce.c| 13 +++--
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/g++.dg/cpp1y/new1.C b/gcc/testsuite/g++.dg/cpp1y/new1.C
index a95dd4d1ee3..5e4f1bf6b0b 100644
--- a/gcc/testsuite/g++.dg/cpp1y/new1.C
+++ b/gcc/testsuite/g++.dg/cpp1y/new1.C
@@ -61,5 +61,13 @@ new_array_load() {
   delete [] x;
 }
 
+void
+test_unused() {
+  volatile double d = 0.0;
+  double *p = new double ();
+  d += 1.0;
+  delete p;
+}
+
 /* { dg-final { scan-tree-dump-times "Deleting : operator delete" 5 "cddce1"} } */
 /* { dg-final { scan-tree-dump-times "Deleting : _\\d+ = operator new" 7 "cddce1"} } */
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 90b3f4d7c45..cf507fa0453 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -1341,12 +1341,13 @@ eliminate_unnecessary_stmts (void)
 		 did not mark as necessary, it will confuse the
 		 special logic we apply to malloc/free pair removal.  */
 		  && (!(call = gimple_call_fndecl (stmt))
-		  || DECL_BUILT_IN_CLASS (call) != BUILT_IN_NORMAL
-		  || (DECL_FUNCTION_CODE (call) != BUILT_IN_ALIGNED_ALLOC
-			  && DECL_FUNCTION_CODE (call) != BUILT_IN_MALLOC
-			  && DECL_FUNCTION_CODE (call) != BUILT_IN_CALLOC
-			  && !ALLOCA_FUNCTION_CODE_P
-			  (DECL_FUNCTION_CODE (call)
+		  || ((DECL_BUILT_IN_CLASS (call) != BUILT_IN_NORMAL
+			   || (DECL_FUNCTION_CODE (call) != BUILT_IN_ALIGNED_ALLOC
+			   && DECL_FUNCTION_CODE (call) != BUILT_IN_MALLOC
+			   && DECL_FUNCTION_CODE (call) != BUILT_IN_CALLOC
+			   && !ALLOCA_FUNCTION_CODE_P
+			   (DECL_FUNCTION_CODE (call
+			  && !DECL_IS_REPLACEABLE_OPERATOR_NEW_P (call
 		{
 		  something_changed = true;
 		  if (dump_file && (dump_flags & TDF_DETAILS))
-- 
2.22.0



Re: [PATCH, rs6000] Fix PR91050 by adding a DRIVER_SELF_SPECS spec

2019-07-25 Thread Segher Boessenkool
Hi Peter,

On Thu, Jul 25, 2019 at 09:41:10AM -0500, Peter Bergner wrote:
> On 7/25/19 2:50 AM, Iain Sandoe wrote:
> > This will break Darwin which has used DRIVER_SELF_SPECS in config/darwin.h
> > since they were introduced (and the equivalent before).
> > 
> > This is because Darwin has driver self-specs common to the PPC and X86 
> > ports.
> > 
> > If this change is seen the way to go, then I guess one solution would be to 
> > rename the
> > driver specs in config/darwin.h => SUBTARGET_DRIVER_SELF_SPECS*** and then
> > put a dummy DRIVER_SELF_SPECS in i386/i386.h 
> > and append SUBTARGET_DRIVER_SELF_SPECS  to both the i386.h and rs6000.h 
> > cases.  (or something along those lines)
> 
> Hi Iain,
> 
> The patch below uses your suggestion.  Does this work for you on x86 and ppc?

It all looks fine to me; if it works for Iain, it is approved for trunk,
and thank you!


Segher


Re: [PATCH, rs6000] Fix PR91050 by adding a DRIVER_SELF_SPECS spec

2019-07-25 Thread Iain Sandoe
Hi Peter,

> On 25 Jul 2019, at 15:41, Peter Bergner  wrote:
> 
> On 7/25/19 2:50 AM, Iain Sandoe wrote:
>> This will break Darwin which has used DRIVER_SELF_SPECS in config/darwin.h
>> since they were introduced (and the equivalent before).
>> 
>> This is because Darwin has driver self-specs common to the PPC and X86 ports.
>> 
>> If this change is seen the way to go, then I guess one solution would be to 
>> rename the
>> driver specs in config/darwin.h => SUBTARGET_DRIVER_SELF_SPECS*** and then
>> put a dummy DRIVER_SELF_SPECS in i386/i386.h 
>> and append SUBTARGET_DRIVER_SELF_SPECS  to both the i386.h and rs6000.h 
>> cases.  (or something along those lines)

> The patch below uses your suggestion.  Does this work for you on x86 and ppc?

It’s in the queue for testing tonight - my ppc h/w is somewhat slow, 

thanks for handling this,
Iain

> 
>>> Segher, I tried your suggestion of writing the spec more generically
>>> (ie, %{mdejagnu-*: ... -m*}), which worked in that it did the correct
>>> replacement.  However, the %>> option(s) would need to be written like %>> at all.
>> 
>> not sure what the objective is here - if you want to remove all pre-existing 
>> -mcpu from
>> the command line won’t %> -mcpunewthing
>> if it gets invented) ..
> 
> We only want to remove all pre-existing -mcpu= options IF the user also used
> -mdejagnu-cpu=.  You do not want to remove -mcpu= options if the user used
> -mdejagnu-tune=.  That's why I tried:
> 
>%{mdejagnu-*: % 
> so -mdejagnu-cpu= would only remove -mcpu options and -mdejagnu-tune= would
> only remove -mtune= options.  The problem is that it doesn't look like you
> can you use %* with %<.
> 
> Peter
> 
> 
> gcc/
>   PR target/91050
>   * config/rs6000/rs6000.opt (mdejagnu-cpu=): Delete option.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
>   use of deleted rs6000_dejagnu_cpu_index variable.
>   * config/rs6000/rs6000.h (DRIVER_SELF_SPECS): Define.
>   (SUBTARGET_DRIVER_SELF_SPECS): Likewise.
>   * config/darwin.h (DRIVER_SELF_SPECS): Rename from this ...
>   (SUBTARGET_DRIVER_SELF_SPECS): ...to this.
>   * config/i386/i386.h (DRIVER_SELF_SPECS): Define.
>   (SUBTARGET_DRIVER_SELF_SPECS): Likewise.
> 
> Index: gcc/config/rs6000/rs6000.opt
> ===
> --- gcc/config/rs6000/rs6000.opt  (revision 273707)
> +++ gcc/config/rs6000/rs6000.opt  (working copy)
> @@ -388,13 +388,6 @@ mtune=
> Target RejectNegative Joined Var(rs6000_tune_index) Init(-1) 
> Enum(rs6000_cpu_opt_value) Save
> -mtune=   Schedule code for given CPU.
> 
> -; Only for use in the testsuite.  This simply overrides -mcpu=.  With older
> -; versions of Dejagnu the command line arguments you set in RUNTESTFLAGS
> -; override those set in the testcases; with this option, the testcase will
> -; always win.
> -mdejagnu-cpu=
> -Target Undocumented RejectNegative Joined Var(rs6000_dejagnu_cpu_index) 
> Init(-1) Enum(rs6000_cpu_opt_value) Save
> -
> mtraceback=
> Target RejectNegative Joined Enum(rs6000_traceback_type) Var(rs6000_traceback)
> -mtraceback=[full,part,no]Select type of traceback table.
> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 273707)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -3489,9 +3489,6 @@ rs6000_option_override_internal (bool gl
>   /* Don't override by the processor default if given explicitly.  */
>   set_masks &= ~rs6000_isa_flags_explicit;
> 
> -  if (global_init_p && rs6000_dejagnu_cpu_index >= 0)
> -rs6000_cpu_index = rs6000_dejagnu_cpu_index;
> -
>   /* Process the -mcpu= and -mtune= argument.  If the user changed
>  the cpu in a target attribute or pragma, but did not specify a tuning
>  option, use the cpu for the tuning option rather than the option 
> specified
> Index: gcc/config/rs6000/rs6000.h
> ===
> --- gcc/config/rs6000/rs6000.h(revision 273707)
> +++ gcc/config/rs6000/rs6000.h(working copy)
> @@ -77,6 +77,20 @@
> #define PPC405_ERRATUM77 0
> #endif
> 
> +/* Only for use in the testsuite: -mdejagnu-cpu= simply overrides -mcpu=.
> +   With older versions of Dejagnu the command line arguments you set in
> +   RUNTESTFLAGS override those set in the testcases; with this option,
> +   the testcase will always win.  Ditto for -mdejagnu-tune=.  */
> +#define DRIVER_SELF_SPECS \
> +  "%{mdejagnu-cpu=*: % +   %{mdejagnu-tune=*: % +   %{mdejagnu-*: % +   SUBTARGET_DRIVER_SELF_SPECS
> +
> +#ifndef SUBTARGET_DRIVER_SELF_SPECS
> +# define SUBTARGET_DRIVER_SELF_SPECS
> +#endif
> +
> #if CHECKING_P
> #define ASM_OPT_ANY ""
> #else
> Index: gcc/config/darwin.h
> ===
> --- gcc/config/darwin.h   (revision 273707)
> +++ gcc/config/darwin.h   (working 

Re: [patch, fortran] Improve dependency checking

2019-07-25 Thread Thomas Koenig

Hi Steve,


-int gfc_dep_resolver(gfc_ref *, gfc_ref *, gfc_reverse *);
+int gfc_dep_resolver(gfc_ref *, gfc_ref *, gfc_reverse *, bool identical = 
false);

This is changing the prototype.  I would expect to see


int gfc_dep_resolver(gfc_ref *, gfc_ref *, gfc_reverse *, bool);


Usig C++'s optional arguments is actually quite useful, it's used
already used in a few places in the front end.

The idea is that you don't need to touch the other callers, just the
ones where the new argument matters.

However, in this particular case, it would also be possible to ajust
all other callers (exactly one), if you prefer.

Regards

Thomas


Re: [PATCH, rs6000] Fix PR91050 by adding a DRIVER_SELF_SPECS spec

2019-07-25 Thread Peter Bergner
On 7/25/19 2:50 AM, Iain Sandoe wrote:
> This will break Darwin which has used DRIVER_SELF_SPECS in config/darwin.h
> since they were introduced (and the equivalent before).
> 
> This is because Darwin has driver self-specs common to the PPC and X86 ports.
> 
> If this change is seen the way to go, then I guess one solution would be to 
> rename the
> driver specs in config/darwin.h => SUBTARGET_DRIVER_SELF_SPECS*** and then
> put a dummy DRIVER_SELF_SPECS in i386/i386.h 
> and append SUBTARGET_DRIVER_SELF_SPECS  to both the i386.h and rs6000.h 
> cases.  (or something along those lines)

Hi Iain,

The patch below uses your suggestion.  Does this work for you on x86 and ppc?



>> Segher, I tried your suggestion of writing the spec more generically
>> (ie, %{mdejagnu-*: ... -m*}), which worked in that it did the correct
>> replacement.  However, the %> option(s) would need to be written like %> at all.
> 
> not sure what the objective is here - if you want to remove all pre-existing 
> -mcpu from
> the command line won’t % -mcpunewthing
> if it gets invented) ..

We only want to remove all pre-existing -mcpu= options IF the user also used
-mdejagnu-cpu=.  You do not want to remove -mcpu= options if the user used
-mdejagnu-tune=.  That's why I tried:

%{mdejagnu-*: %= 0)
-rs6000_cpu_index = rs6000_dejagnu_cpu_index;
-
   /* Process the -mcpu= and -mtune= argument.  If the user changed
  the cpu in a target attribute or pragma, but did not specify a tuning
  option, use the cpu for the tuning option rather than the option specified
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 273707)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -77,6 +77,20 @@
 #define PPC405_ERRATUM77 0
 #endif
 
+/* Only for use in the testsuite: -mdejagnu-cpu= simply overrides -mcpu=.
+   With older versions of Dejagnu the command line arguments you set in
+   RUNTESTFLAGS override those set in the testcases; with this option,
+   the testcase will always win.  Ditto for -mdejagnu-tune=.  */
+#define DRIVER_SELF_SPECS \
+  "%{mdejagnu-cpu=*: %

Re: [PATCH][ARM] Fix low reg issue in Thumb-2 movsi patterns

2019-07-25 Thread Wilco Dijkstra
Hi Richard,
 
> I think this should be "lk*r", not "l*rk".  SP is only going to crop up 
> in rare circumstances, but we are always going to need this pattern if 
> it does and hiding this from register preferencing is pointless.  It's 
> not like the compiler is going to start allocating SP in the more 
> general case.
 
'*' only applies to the next constraint, so these are equivalent. I've committed
it with them swapped around, however using 'k' at all here seems redundant
anyway given GCC no longer uses physical registers in most instructions.

> I'd also like to see all these movsi matching patterns merged into a 
> single pattern that just enables the appropriate variants.  Having 
> separate implementations for Arm, thumb2, vfp, iwmmx is just making 
> maintenance of this stuff a nightmare.

Yes if we're sure one is a strict superset of the other, we could just merge
them and disable VFP specific variants. However the patterns are quite
complex so it's hard to be 100% sure. Also I think there should be no need
at all for such huge patterns given a load or immediate can never become a
register move and visa versa. So it should be feasible to split into register
moves, loads, stores and immediates.

Wilco

Re: [patch, fortran] Improve dependency checking

2019-07-25 Thread Steve Kargl
On Thu, Jul 25, 2019 at 02:57:35PM +0200, Thomas Koenig wrote:
> Index: dependency.h
> ===
> --- dependency.h  (Revision 273733)
> +++ dependency.h  (Arbeitskopie)
> @@ -37,7 +37,7 @@ int gfc_check_fncall_dependency (gfc_expr *, sym_i
>  int gfc_check_dependency (gfc_expr *, gfc_expr *, bool);
>  int gfc_expr_is_one (gfc_expr *, int);
>  
> -int gfc_dep_resolver(gfc_ref *, gfc_ref *, gfc_reverse *);
> +int gfc_dep_resolver(gfc_ref *, gfc_ref *, gfc_reverse *, bool identical = 
> false);

This is changing the prototype.  I would expect to see


int gfc_dep_resolver(gfc_ref *, gfc_ref *, gfc_reverse *, bool);


-- 
Steve


Re: [PATCH V2, rs6000] Support vrotr3 for int vector types

2019-07-25 Thread Segher Boessenkool
Hi Kewen,

On Tue, Jul 23, 2019 at 02:28:28PM +0800, Kewen.Lin wrote:
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1666,6 +1666,60 @@
>"vrl %0,%1,%2"
>[(set_attr "type" "vecsimple")])
>  
> +;; Here these vrl_and are for vrotr3 expansion.
> +;; since SHIFT_COUNT_TRUNCATED is set as zero, to append one explicit
> +;; AND to indicate truncation but emit vrl insn.
> +(define_insn "vrlv2di_and"
> +  [(set (match_operand:V2DI 0 "register_operand" "=v")
> + (and:V2DI
> +   (rotate:V2DI (match_operand:V2DI 1 "register_operand" "v")
> +  (match_operand:V2DI 2 "register_operand" "v"))
> +   (const_vector:V2DI [(const_int 63) (const_int 63)])))]
> +  "VECTOR_UNIT_P8_VECTOR_P (V2DImode)"
> +  "vrld %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

"vrlv2di_and" is an a bit unhappy name, we have a "vrlv" intruction.
Just something like "rotatev2di_something", maybe?

Do we have something similar for non-rotate vector shifts, already?  We
probably should, so please keep that in mind for naming things.

"vrlv2di_and" sounds like you first do the rotate, and then on what
that results in you do the and.  And that is what the pattern does,
too.  But this is wrong: it should mask off all but the lower bits
of operand 2, instead.

> +(define_insn "vrlv4si_and"
> +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> + (and:V4SI
> +   (rotate:V4SI (match_operand:V4SI 1 "register_operand" "v")
> +  (match_operand:V4SI 2 "register_operand" "v"))
> +   (const_vector:V4SI [(const_int 31) (const_int 31)
> +   (const_int 31) (const_int 31)])))]
> +  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
> +  "vrlw %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vrlv8hi_and"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> + (and:V8HI
> +   (rotate:V8HI (match_operand:V8HI 1 "register_operand" "v")
> +  (match_operand:V8HI 2 "register_operand" "v"))
> +   (const_vector:V8HI [(const_int 15) (const_int 15)
> +   (const_int 15) (const_int 15)
> +   (const_int 15) (const_int 15)
> +   (const_int 15) (const_int 15)])))]
> +  "VECTOR_UNIT_ALTIVEC_P (V8HImode)"
> +  "vrlh %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vrlv16qi_and"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> + (and:V16QI
> + (rotate:V16QI (match_operand:V16QI 1 "register_operand" "v")
> +(match_operand:V16QI 2 "register_operand" "v"))
> + (const_vector:V16QI [(const_int 7) (const_int 7)
> + (const_int 7) (const_int 7)
> + (const_int 7) (const_int 7)
> + (const_int 7) (const_int 7)
> + (const_int 7) (const_int 7)
> + (const_int 7) (const_int 7)
> + (const_int 7) (const_int 7)
> + (const_int 7) (const_int 7)])))]
> +  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
> +  "vrlb %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

All the patterns can be merged into one (using some code_iterator).  That
can be a later improvement.

> +;; Return 1 if op is a vector register that operates on integer vectors
> +;; or if op is a const vector with integer vector modes.
> +(define_predicate "vint_reg_or_const_vector"
> +  (match_code "reg,subreg,const_vector")
> +{
> +  if (GET_CODE (op) == CONST_VECTOR && GET_MODE_CLASS (mode) == 
> MODE_VECTOR_INT)
> +return 1;
> +
> +  return vint_operand (op, mode);
> +})

Hrm, I don't like this name very much.  Why is just vint_operand not
enough for what you use this for?

> +;; Expanders for rotatert to make use of vrotl
> +(define_expand "vrotr3"
> +  [(set (match_operand:VEC_I 0 "vint_operand")
> + (rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
> + (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
> +{
> +  rtx rot_count = gen_reg_rtx (mode);
> +  if (GET_CODE (operands[2]) == CONST_VECTOR)
> +{
> +  machine_mode inner_mode = GET_MODE_INNER (mode);
> +  unsigned int bits = GET_MODE_PRECISION (inner_mode);
> +  rtx mask_vec = gen_const_vec_duplicate (mode, GEN_INT (bits - 
> 1));
> +  rtx imm_vec
> + = simplify_const_unary_operation (NEG, mode, operands[2],

(The "=" goes on the previous line).

> +   GET_MODE (operands[2]));
> +  imm_vec
> + = simplify_const_binary_operation (AND, mode, imm_vec, mask_vec);
> +  rot_count = force_reg (mode, imm_vec);
> +  emit_insn (gen_vrotl3 (operands[0], operands[1], rot_count));
> +}
> +  else
> +{
> +  emit_insn (gen_neg2 (rot_count, operands[2]));
> +  emit_insn (gen_vrl_and (operands[0], operands[1], rot_count));
> +}
> +  DONE;
> +})

Why do you have to emit as the "and" form here?  Emitting the "bare"
rotate should 

Re: [PATCH 1/2] Come up with function_decl_type and use it in tree_function_decl.

2019-07-25 Thread Martin Liška
On 7/25/19 2:18 PM, Marc Glisse wrote:
> On Thu, 25 Jul 2019, Martin Liška wrote:
> 
>>> DCE has special code to avoid removing the LHS of malloc when it is unused. 
>>> Is there something different about operator new that makes it not need the 
>>> same handling?
> 
> If you take gcc.dg/torture/pr51692.c and replace __builtin_calloc (1, sizeof 
> (double)) with new double(), we get an ICE with -O or higher...
> 

I can see, I'm working on that.

Martin


Re: [PATCH], Patch #10, move PowerPC data structures & helper functions from rs6000.c to rs6000-internal.h

2019-07-25 Thread Segher Boessenkool
Hi Mike,

On Mon, Jul 22, 2019 at 06:37:35PM -0400, Michael Meissner wrote:
> On Mon, Jul 22, 2019 at 03:56:26PM -0500, Segher Boessenkool wrote:
> > That still needs an explanation: why is this a good thing, why do you
> > want that change?  Sometimes that is obvious of course, but here it is
> > not.  It would be a lot more obvious if there was more context.
> 
> The trouble is to get that much context really relies on about several
> additional patches to get to the functions in particular that should be split
> out.

In the normal workflow, you send a series of patches when it is ready.
And if a series is not ready, you do not send it.

You can also give context just in the emails, or send a work-in-progress
patch just as FYI (please mark it clearly as such, then).  But without
context, how can I see if what a patch does is correct and useful?

> As I implement stuff, I find myself neeting/wanting to access the stuff in
> reg_addr in other files (predicates.md, and the new file rs6000-prefix.c).

Then you first need to question if things are split up in a useful way
the way things are, and then how it could be done better.

> But it would be nice to have that information available to the other .c files
> as well as the .md files.

If many other modules need the data, then either:
a) The module boundaries are not set nicely; or
b) This is a central data structure, and should be treated as one.


Segher


[patch, fortran] Improve dependency checking

2019-07-25 Thread Thomas Koenig

Hello world,

the attached pach does some more work in gfc_check_dependency for
the case where an identity between arguments would also lead
to problems.

It does not lead to removal of the warning with -Warray-temporaries,
because that is still generated by the call to library function.
Instead, I checked for the names of the variables which used to
be introduced by the matmul inlining code.

Regression-tested. OK for trunk?

Regards

Thomas

2019-07-25  Thomas Koenig  

PR fortran/65819
* dependency.h (gfc_dep_resovler): Add optional argument identical.
* dependency.c (gfc_check_dependency): Do not alway return 1 if
the symbol is the same. Pass on identical to gfc_dep_resolver.
(gfc_check_element_vs_element): Whitespace fix.
(gfc_dep_resolver): Adjust comment for function.  If identical is
true, return 1 if any overlap has been found.

2019-07-25  Thomas Koenig  

PR fortran/65819
* gfortran.dg/dependency_54.f90: New test.
Index: dependency.c
===
--- dependency.c	(Revision 273733)
+++ dependency.c	(Arbeitskopie)
@@ -1351,13 +1351,10 @@ gfc_check_dependency (gfc_expr *expr1, gfc_expr *e
 	  return 0;
 	}
 
-  if (identical)
-	return 1;
-
   /* Identical and disjoint ranges return 0,
 	 overlapping ranges return 1.  */
   if (expr1->ref && expr2->ref)
-	return gfc_dep_resolver (expr1->ref, expr2->ref, NULL);
+	return gfc_dep_resolver (expr1->ref, expr2->ref, NULL, identical);
 
   return 1;
 
@@ -1884,6 +1881,7 @@ gfc_check_element_vs_element (gfc_ref *lref, gfc_r
 
   if (i > -2)
 return GFC_DEP_NODEP;
+ 
   return GFC_DEP_EQUAL;
 }
 
@@ -2084,13 +2082,15 @@ ref_same_as_full_array (gfc_ref *full_ref, gfc_ref
 
 /* Finds if two array references are overlapping or not.
Return value
-   	2 : array references are overlapping but reversal of one or
+	2 : array references are overlapping but reversal of one or
 	more dimensions will clear the dependency.
-   	1 : array references are overlapping.
-   	0 : array references are identical or not overlapping.  */
+	1 : array references are overlapping, or identical is true and
+	there is some kind of overlap.
+	0 : array references are identical or not overlapping.  */
 
 int
-gfc_dep_resolver (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse)
+gfc_dep_resolver (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse,
+		  bool identical)
 {
   int n;
   int m;
@@ -2124,11 +2124,15 @@ int
 
 	case REF_ARRAY:
 
+	  /* For now, treat all coarrays as dangerous.  */
+	  if (lref->u.ar.codimen || rref->u.ar.codimen)
+	return 1;
+
 	  if (ref_same_as_full_array (lref, rref))
-	return 0;
+	return identical;
 
 	  if (ref_same_as_full_array (rref, lref))
-	return 0;
+	return identical;
 
 	  if (lref->u.ar.dimen != rref->u.ar.dimen)
 	{
@@ -2180,6 +2184,8 @@ int
 		  gcc_assert (rref->u.ar.dimen_type[n] == DIMEN_ELEMENT
 			  && lref->u.ar.dimen_type[n] == DIMEN_ELEMENT);
 		  this_dep = gfc_check_element_vs_element (rref, lref, n);
+		  if (identical && this_dep == GFC_DEP_EQUAL)
+		this_dep = GFC_DEP_OVERLAP;
 		}
 
 	  /* If any dimension doesn't overlap, we have no dependency.  */
@@ -2240,6 +2246,9 @@ int
 		 know the worst one.*/
 
 	update_fin_dep:
+	  if (identical && this_dep == GFC_DEP_EQUAL)
+		this_dep = GFC_DEP_OVERLAP;
+
 	  if (this_dep > fin_dep)
 		fin_dep = this_dep;
 	}
@@ -2253,7 +2262,7 @@ int
 
 	  /* Exactly matching and forward overlapping ranges don't cause a
 	 dependency.  */
-	  if (fin_dep < GFC_DEP_BACKWARD)
+	  if (fin_dep < GFC_DEP_BACKWARD && !identical)
 	return 0;
 
 	  /* Keep checking.  We only have a dependency if
@@ -2267,11 +2276,14 @@ int
   rref = rref->next;
 }
 
+  /* Assume the worst if we nest to different depths.  */
+  if (lref || rref)
+return 1;
+
   /* If we haven't seen any array refs then something went wrong.  */
   gcc_assert (fin_dep != GFC_DEP_ERROR);
 
-  /* Assume the worst if we nest to different depths.  */
-  if (lref || rref)
+  if (identical && fin_dep != GFC_DEP_NODEP)
 return 1;
 
   return fin_dep == GFC_DEP_OVERLAP;
Index: dependency.h
===
--- dependency.h	(Revision 273733)
+++ dependency.h	(Arbeitskopie)
@@ -37,7 +37,7 @@ int gfc_check_fncall_dependency (gfc_expr *, sym_i
 int gfc_check_dependency (gfc_expr *, gfc_expr *, bool);
 int gfc_expr_is_one (gfc_expr *, int);
 
-int gfc_dep_resolver(gfc_ref *, gfc_ref *, gfc_reverse *);
+int gfc_dep_resolver(gfc_ref *, gfc_ref *, gfc_reverse *, bool identical = false);
 int gfc_are_equivalenced_arrays (gfc_expr *, gfc_expr *);
 
 gfc_expr * gfc_discard_nops (gfc_expr *);
! { dg-do  run }
! { dg-additional-options "-fdump-tree-original -ffrontend-optimize" }
! PR 65819 - this used to cause a temporary in matmul inlining.
! Check that these are absent by 

Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-07-25 Thread Richard Biener
On Thu, 25 Jul 2019, Martin Jambor wrote:

> Hello,
> 
> On Tue, Jul 23 2019, Richard Biener wrote:
> > The following fixes the runtime regression of 456.hmmer caused
> > by matching ICC in code generation and using cmov more aggressively
> > (through GIMPLE level MAX_EXPR usage).  Appearantly (discovered
> > by manual assembler editing) using the SSE unit for performing
> > SImode loads, adds and then two singed max operations plus stores
> > is quite a bit faster than cmovs - even faster than the original
> > single cmov plus branchy second max.  Even more so for AMD CPUs
> > than Intel CPUs.
> >
> > Instead of hacking up some pattern recognition pass to transform
> > integer mode memory-to-memory computation chains involving
> > conditional moves to "vector" code (similar to what STV does
> > for TImode ops on x86_64) the following simply allows SImode
> > into SSE registers (support for this is already there in some
> > important places like move patterns!).  For the particular
> > case of 456.hmmer the required support is loads/stores
> > (already implemented), SImode adds and SImode smax.
> >
> > So the patch adds a smax pattern for SImode (we don't have any
> > for scalar modes but currently expand via a conditional move sequence)
> > emitting as SSE vector max or cmp/cmov depending on the alternative.
> >
> > And it amends the *add_1 pattern with SSE alternatives
> > (which have to come before the memory alternative as IRA otherwise
> > doesn't consider reloading a memory operand to a register).
> >
> > With this in place the runtime of 456.hmmer improves by 10%
> > on Haswell which is back to before regression speed but not
> > to same levels as seen with manually editing just the single
> > important loop.
> >
> > I'm currently benchmarking all SPEC CPU 2006 on Haswell.  More
> > interesting is probably Zen where moves crossing the
> > integer - vector domain are excessively expensive (they get
> > done via the stack).
> 
> There was a znver2 CPU machine not doing anything useful overnight here
> so I benchmarked your patch using SPEC 2006 and SPEC CPUrate 2017 on top
> of trunk r273663 (I forgot to pull, so before Honza's znver2 tuning
> patches, I am afraid).  All benchmarks were run only once with options
> -Ofast -march=native -mtune=native.
> 
> By far the biggest change was indeed 456.hmmer which improved by
> incredible 35%.  There was no other change bigger than +- 1.5% in SPEC
> 2006 so the SPECint score grew by almost 3.4%.
> 
> I understand this patch fixes a regression in that benchmark but even
> so, 456.hmmer built with the Monday trunk was 23% slower than with gcc 9
> and with the patch is 20% faster than gcc 9.
> 
> In SPEC 2017, there were two changes worth mentioning although they
> probably need to be confirmed and re-measured on top of the new tuning
> changes.  525.x264_r regressed by 3.37% and 511.povray_r improved by
> 3.04%.

Thanks for checking.  Meanwhile I figured how to restore the
effects of the patch without disabling the GPR alternative in
smaxsi3.  The additional trick I need is avoid register class
preferencing from moves so *movsi_internal gets a few more *s
(in the end we'd need to split the r = g alternative because
for r = C we _do_ want to prefer general regs - unless it's
a special constant that can be loaded into a SSE reg?  Or maybe
that's not needed and reload costs will take care of that).

I've also needed to pessimize the GPR alternative in smaxsi3
because that instruction is supposed to drive all the decision
as it is cheaper when done on SSE regs.

Tunings still wreck things, like using -march=bdver2 will
give you one vpmaxsd and the rest in integer regs, including
an inter-unit move via the stack.

Still this is the best I can get to with my limited .md / LRA
skills.

Is avoiding register-class preferencing from moves good?  I think
it makes sense at least.

How would one write smaxsi3 as a splitter to be split after
reload in the case LRA assigned the GPR alternative?  Is it
even worth doing?  Even the SSE reg alternative can be split
to remove the not needed CC clobber.

Finally I'm unsure about the add where I needed to place
the SSE alternative before the 2nd op memory one since it
otherwise gets the same cost and wins.

So - how to go forward with this?

Thanks,
Richard.

Index: gcc/config/i386/i386.md
===
--- gcc/config/i386/i386.md (revision 273792)
+++ gcc/config/i386/i386.md (working copy)
@@ -1881,6 +1881,33 @@ (define_expand "mov"
   ""
   "ix86_expand_move (mode, operands); DONE;")
 
+(define_insn "smaxsi3"
+ [(set (match_operand:SI 0 "register_operand" "=?r,v,x")
+   (smax:SI (match_operand:SI 1 "register_operand" "%0,v,0")
+(match_operand:SI 2 "register_operand" "r,v,x")))
+  (clobber (reg:CC FLAGS_REG))]
+  ""
+{
+  switch (get_attr_type (insn))
+{
+case TYPE_SSEADD:
+  if (which_alternative == 1)
+return "vpmaxsd\t{%2, %1, 

Re: [PATCH 1/2] Come up with function_decl_type and use it in tree_function_decl.

2019-07-25 Thread Marc Glisse

On Thu, 25 Jul 2019, Martin Liška wrote:


DCE has special code to avoid removing the LHS of malloc when it is unused. Is 
there something different about operator new that makes it not need the same 
handling?


If you take gcc.dg/torture/pr51692.c and replace __builtin_calloc (1, 
sizeof (double)) with new double(), we get an ICE with -O or higher...


--
Marc Glisse


[PATCH] Adjust extract_range_from_multiplicative_op

2019-07-25 Thread Richard Biener


To not extract type from op0.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

2019-07-25  Richard Biener  

* tree-vrp.c (extract_range_from_multiplicative_op): Add
type parameter and use it instead of guessing expression
type from the first operand.
(extract_range_from_binary_expr): Pass expr_type down.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 273792)
+++ gcc/tree-vrp.c  (working copy)
@@ -1238,7 +1238,7 @@ extract_range_into_wide_ints (const valu
 
 static void
 extract_range_from_multiplicative_op (value_range_base *vr,
- enum tree_code code,
+ enum tree_code code, tree type,
  const value_range_base *vr0,
  const value_range_base *vr1)
 {
@@ -1253,7 +1253,6 @@ extract_range_from_multiplicative_op (va
   gcc_assert (vr0->kind () == VR_RANGE
  && vr0->kind () == vr1->kind ());
 
-  tree type = vr0->type ();
   wide_int res_lb, res_ub;
   wide_int vr0_lb = wi::to_wide (vr0->min ());
   wide_int vr0_ub = wi::to_wide (vr0->max ());
@@ -1785,7 +1784,7 @@ extract_range_from_binary_expr (value_ra
  vr->set_varying ();
  return;
}
-  extract_range_from_multiplicative_op (vr, code, , );
+  extract_range_from_multiplicative_op (vr, code, expr_type, , );
   return;
 }
   else if (code == RSHIFT_EXPR
@@ -1806,7 +1805,8 @@ extract_range_from_binary_expr (value_ra
  if (vr0.kind () != VR_RANGE || vr0.symbolic_p ())
vr0.set (VR_RANGE, vrp_val_min (expr_type),
 vrp_val_max (expr_type));
- extract_range_from_multiplicative_op (vr, code, , );
+ extract_range_from_multiplicative_op (vr, code, expr_type,
+   , );
  return;
}
  else if (code == LSHIFT_EXPR


Re: cp: implementation of p1301 for C++

2019-07-25 Thread JeanHeyd Meneide
 I think I got the tabs right...? You would not believe how unbelievably
hard it is, just to mail a diff!

-

07-24-2019ThePhD

gcc/

* escaped_string.h: New. Refactored out of tree.c to make more
broadly available (e.g. to parser.c, cvt.c).
* tree.c: remove escaped_string class

gcc/c-family

* c-lex.c - update attribute value

gcc/cp/

* tree.c: Implement p1301 - nodiscard("should have a reason"))
Added C++2a nodiscard string message handling.
Increase nodiscard argument handling max_length from 0
to 1. (error C++2a gated)
* parser.c: add requirement that nodiscard only be seen
once in attribute-list (C++2a gated)
* cvt.c: add nodiscard message to output, if applicable

-

diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 851fd704e5d..f5870d095f2 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -345,24 +345,26 @@ c_common_has_attribute (cpp_reader *pfile)
attr_name = NULL_TREE;
   }
  }
-  else
-  {
-  /* Some standard attributes need special handling. */
-  if (is_attribute_p ("noreturn", attr_name))
-   result = 200809;
-  else if (is_attribute_p ("deprecated", attr_name))
-   result = 201309;
-  else if (is_attribute_p ("maybe_unused", attr_name)
-|| is_attribute_p ("nodiscard", attr_name)
-|| is_attribute_p ("fallthrough", attr_name))
-   result = 201603;
-  else if (is_attribute_p ("no_unique_address", attr_name)
-|| is_attribute_p ("likely", attr_name)
-|| is_attribute_p ("unlikely", attr_name))
-   result = 201803;
-  if (result)
-   attr_name = NULL_TREE;
-  }
+   else
+ {
+   /* Some standard attributes need special handling. */
+   if (is_attribute_p ("noreturn", attr_name))
+ result = 200809;
+   else if (is_attribute_p ("deprecated", attr_name))
+ result = 201309;
+   else if (is_attribute_p ("maybe_unused", attr_name)
+|| is_attribute_p ("fallthrough", attr_name))
+ result = 201603;
+   else if (is_attribute_p ("no_unique_address", attr_name)
+|| is_attribute_p ("likely", attr_name)
+|| is_attribute_p ("unlikely", attr_name))
+ result = 201803;
+   else if (is_attribute_p ("nodiscard", attr_name))
+ result = 201907; /* placeholder until C++20 Post-Cologne Working
Draft. */
+
+   if (result)
+ attr_name = NULL_TREE;
+ }
 }
if (attr_name)
 {
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 23d2aabc483..4a5128c76a1 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see
#include "convert.h"
#include "stringpool.h"
#include "attribs.h"
+#include "escaped_string.h"
static tree convert_to_pointer_force (tree, tree, tsubst_flags_t);
static tree build_type_conversion (tree, tree);
@@ -1026,49 +1027,99 @@ maybe_warn_nodiscard (tree expr, impl_conv_void
implicit)
tree rettype = TREE_TYPE (type);
tree fn = cp_get_fndecl_from_callee (callee);
- if (implicit != ICV_CAST && fn
- && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn)))
- {
- auto_diagnostic_group d;
- if (warning_at (loc, OPT_Wunused_result,
-"ignoring return value of %qD, "
-"declared with attribute nodiscard", fn))
- inform (DECL_SOURCE_LOCATION (fn), "declared here");
- }
- else if (implicit != ICV_CAST
-  && lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (rettype)))
- {
- auto_diagnostic_group d;
- if (warning_at (loc, OPT_Wunused_result,
-"ignoring returned value of type %qT, "
-"declared with attribute nodiscard", rettype))
- {
-  if (fn)
-  inform (DECL_SOURCE_LOCATION (fn),
-"in call to %qD, declared here", fn);
-  inform (DECL_SOURCE_LOCATION (TYPE_NAME (rettype)),
-"%qT declared here", rettype);
- }
- }
- else if (TREE_CODE (expr) == TARGET_EXPR
-  && lookup_attribute ("warn_unused_result", TYPE_ATTRIBUTES (type)))
- {
- /* The TARGET_EXPR confuses do_warn_unused_result into thinking that the
-  result is used, so handle that case here. */
- if (fn)
- {
-  auto_diagnostic_group d;
-  if (warning_at (loc, OPT_Wunused_result,
-  "ignoring return value of %qD, "
-  "declared with attribute %",
-  fn))
-  inform (DECL_SOURCE_LOCATION (fn), "declared here");
- }
- else
- warning_at (loc, OPT_Wunused_result,
-"ignoring return value of function "
-"declared with attribute %");
- }
+ if (implicit != ICV_CAST && fn
+   && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn)))
+   {
+ tree attr = DECL_ATTRIBUTES (fn);
+ escaped_string msg;
+ if (attr)
+   msg.escape (TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr;
+ bool has_msg = static_cast(msg);
+ const char* format = (has_msg ?
+   "ignoring return value of %qD, "
+   "declared with attribute %: %<%s%>" :
+   "ignoring return value of %qD, "
+   "declared with attribute %%s");
+ const char* raw_msg = (has_msg ? (const char*)msg : "");
+ auto_diagnostic_group d;
+ if (warning_at (loc, OPT_Wunused_result,
+  format, fn, raw_msg))
+   {
+ inform 

[PATCH][arm][committed] Clean up code iterator usage in satsi* patterns

2019-07-25 Thread Kyrill Tkachov

Hi all,

GCC 10 now supports having RTL codes being code attributes (thanks 
Richard) allowing us to map smax to smin and vice versa.
This means we can clean up their use in the saturation patterns that do 
the cross product of [smin, smax] and use the pattern

predicate to cancel out the nonsense ones.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill

2019-07-25  Kyrylo Tkachov  

    * config/arm/arm.md (SATrev): Change to code attribute.
    (*satsi_): Adjust for the above.
    (*satsi__shift): Likewise.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7b90065adba2a19536aeaccfd8f5cf8a501eb266..f11745ba855957da4acec197f2df9c931708062a 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3981,16 +3981,16 @@
 )
 
 (define_code_iterator SAT [smin smax])
-(define_code_iterator SATrev [smin smax])
+(define_code_attr SATrev [(smin "smax") (smax "smin")])
 (define_code_attr SATlo [(smin "1") (smax "2")])
 (define_code_attr SAThi [(smin "2") (smax "1")])
 
 (define_insn "*satsi_"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
-(SAT:SI (SATrev:SI (match_operand:SI 3 "s_register_operand" "r")
+(SAT:SI (:SI (match_operand:SI 3 "s_register_operand" "r")
(match_operand:SI 1 "const_int_operand" "i"))
 (match_operand:SI 2 "const_int_operand" "i")))]
-  "TARGET_32BIT && arm_arch6 &&  != 
+  "TARGET_32BIT && arm_arch6
&& arm_sat_operator_match (operands[], operands[], NULL, NULL)"
 {
   int mask;
@@ -4011,12 +4011,12 @@
 
 (define_insn "*satsi__shift"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
-(SAT:SI (SATrev:SI (match_operator:SI 3 "sat_shift_operator"
+(SAT:SI (:SI (match_operator:SI 3 "sat_shift_operator"
  [(match_operand:SI 4 "s_register_operand" "r")
   (match_operand:SI 5 "const_int_operand" "i")])
(match_operand:SI 1 "const_int_operand" "i"))
 (match_operand:SI 2 "const_int_operand" "i")))]
-  "TARGET_32BIT && arm_arch6 &&  != 
+  "TARGET_32BIT && arm_arch6
&& arm_sat_operator_match (operands[], operands[], NULL, NULL)"
 {
   int mask;


[PATCH] value-range lattice const correctness

2019-07-25 Thread Richard Biener


The following makes vr_values::get_value_range return a
const value_range * since the ranges are not to be modified
in place but modifications have to go through the
lattice update functions.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-07-25  Richard Biener  

* gimple-loop-versioning.cc (loop_versioning::prune_loop_conditions):
Make value_range * temporary const.
* gimple-ssa-evrp-analyze.c (evrp_range_analyzer::try_find_new_range):
Likewise.
(evrp_range_analyzer::record_ranges_from_): Likewise.
(evrp_range_analyzer::pop_value_range): Return a const value_range *,
deal with having recorded a const one.
* gimple-ssa-evrp-analyze.h (evrp_range_analyzer::get_value_range):
Return a const value_range *.
(evrp_range_analyzer::pop_value_range): Likewise.
(evrp_range_analyzer::stack): Record const value_range *s.
* gimple-ssa-evrp.c (evrp_dom_walker::before_dom_children):
Adjust.
* gimple-ssa-sprintf.c (get_int_range): Likewise.
(format_integer): Likewise.
(sprintf_dom_walker::handle_gimple_call): Likewise.
* tree-ssa-dom.c (simplify_stmt_for_jump_threading): Likewise.
* tree-vrp.c (vrp_prop::set_def_to_varying): Add.
(vrp_prop::get_value_range): Adjust.
(vrp_prop::vrp_initialize): Use set_def_to_varying instead of
modifying the lattice in-place.
(vrp_prop::visit_stmt): Likewise.
* vr-values.c (vr_values::get_lattice_entry): New private method.
(vr_values::get_value_range): Wrap it and return a const
value_range *.
(vr_values::set_def_to_varying): New.
(vr_values::set_defs_to_varying): Use it.
(vr_values::update_value_range): Likewise.
(vr_values::vrp_stmt_computes_nonzero): Adjust.
(values::op_with_constant_singleton_va): Likewise.
(vr_values::extract_range_for_var_from_co): Likewise.
(vr_values::extract_range_from_ssa_name): Likewise.
(vr_values::extract_range_from_cond_expr): Likewise.
(vr_values::extract_range_basic): Likewise.
(compare_ranges): Take const value_range *, adjust.
(compare_range_with_value): Likewise.
(vrp_valueize): Adjust.
(vrp_valueize_1): Likewise.
(vr_values::get_vr_for_comparison): Return a const value_range *.
(vr_values::compare_name_with_value): Adjust.
(vr_values::compare_names): Likewise.
(vr_values::vrp_evaluate_conditional_warnv_with_ops_using_ranges):
Likewise.
(vr_values::vrp_evaluate_conditional): Likewise.
(find_case_label_ranges): Take a const value_range *.
(vr_values::vrp_visit_switch_stmt): Adjust.
(vr_values::extract_range_from_phi_node): Likewise.
(vr_values::simplify_div_or_mod_using_ran): Likewise.
(vr_values::simplify_abs_using_ranges): Likewise.
(test_for_singularity): Take a const value_range *.
(range_fits_type_p): Likewise.
(vr_values::simplify_cond_using_ranges_1): Adjust.
(vr_values::simplify_cond_using_ranges_2): Likewise.
(vr_values::simplify_switch_using_ranges): Likewise.
(vr_values::simplify_float_conversion_usi): Likewise.
(vr_values::two_valued_val_range_p): Likewise.
* vr-values.h (vr_values::get_value_range): Return a const
value_range *.
(vr_values::set_def_to_varying): New.
(vr_values::get_lattice_entry): New private method.
(vr_values::get_vr_for_comparison): Return a const value_range *.

Index: gcc/gimple-loop-versioning.cc
===
--- gcc/gimple-loop-versioning.cc   (revision 273788)
+++ gcc/gimple-loop-versioning.cc   (working copy)
@@ -1489,7 +1489,7 @@ loop_versioning::prune_loop_conditions (
   EXECUTE_IF_SET_IN_BITMAP (_names, 0, i, bi)
 {
   tree name = ssa_name (i);
-  value_range *vr = vrs->get_value_range (name);
+  const value_range *vr = vrs->get_value_range (name);
   if (vr && !vr->may_contain_p (build_one_cst (TREE_TYPE (name
{
  if (dump_enabled_p ())
Index: gcc/gimple-ssa-evrp-analyze.c
===
--- gcc/gimple-ssa-evrp-analyze.c   (revision 273788)
+++ gcc/gimple-ssa-evrp-analyze.c   (working copy)
@@ -84,7 +84,7 @@ evrp_range_analyzer::try_find_new_range
tree op, tree_code code, tree limit)
 {
   value_range vr;
-  value_range *old_vr = get_value_range (name);
+  const value_range *old_vr = get_value_range (name);
 
   /* Discover VR when condition is true.  */
   vr_values->extract_range_for_var_from_comparison_expr (name, code, op,
@@ -209,7 +209,7 @@ evrp_range_analyzer::record_ranges_from_
  /* But make sure we do not weaken ranges like when
 getting first [64, +INF] and then 

Re: [PATCH 1/2] Come up with function_decl_type and use it in tree_function_decl.

2019-07-25 Thread Richard Biener
On Wed, Jul 24, 2019 at 9:02 PM Jeff Law  wrote:
>
> On 7/11/19 12:45 AM, Martin Liška wrote:
> > On 7/9/19 11:00 PM, Jason Merrill wrote:
>
> > Ok, I hopefully addressed the suggested improvements to the patch.
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> > Ready to be installed?
> > Thanks,
> > Martin
> >
> > 0001-Extend-DCE-to-remove-unnecessary-new-delete-pairs-PR.patch
> >
> > From 771d9128144745fe530577912521fc8228ca7424 Mon Sep 17 00:00:00 2001
> > From: Martin Liska 
> > Date: Tue, 2 Jul 2019 09:08:27 +0200
> > Subject: [PATCH] Extend DCE to remove unnecessary new/delete-pairs (PR
> >  c++/23383).
> >
> > gcc/ChangeLog:
> >
> > 2019-07-02  Martin Liska  
> >   Dominik Infuhr  
> >
> >   PR c++/23383
> >   * common.opt: Add -fallocation-dce
> >   * gimple.c (gimple_call_operator_delete_p): New.
> >   * gimple.h (gimple_call_operator_delete_p): Likewise.
> >   * tree-core.h (enum function_decl_type): Add OPERATOR_DELETE.
> >   * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Handle
> >   DECL_IS_OPERATOR_DELETE_P.
> >   (mark_all_reaching_defs_necessary_1): Likewise.
> >   (propagate_necessity): Likewise.
> >   (eliminate_unnecessary_stmts): Handle
> >   gimple_call_operator_delete_p.
> >   * tree-streamer-in.c (unpack_ts_function_decl_value_fields):
> >   Add packing of OPERATOR_DELETE.
> >   * tree-streamer-out.c (pack_ts_function_decl_value_fields):
> >   Similarly here.
> >   * tree.h (DECL_IS_OPERATOR_DELETE_P): New.
> >   (DECL_SET_IS_OPERATOR_DELETE): New.
> >   (DECL_IS_REPLACEABLE_OPERATOR_NEW_P): Likewise.
> >
> > gcc/c/ChangeLog:
> >
> > 2019-07-02  Martin Liska  
> >   Dominik Infuhr  
> >
> >   PR c++/23383
> >   * c-decl.c (merge_decls): Merge OPERATOR_DELETE flag.
> >
> > gcc/cp/ChangeLog:
> >
> > 2019-07-02  Martin Liska  
> >   Dominik Infuhr  
> >
> >   PR c++/23383
> >   * decl.c (cxx_init_decl_processing): Mark delete operators
> >   with DECL_SET_IS_OPERATOR_DELETE.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2019-07-02  Martin Liska   >   Dominik Infuhr  
> >
> >   PR c++/23383
> >   * g++.dg/cpp1y/new1.C: New test.
> >
> > libstdc++-v3/ChangeLog:
> >
> > 2019-07-02  Martin Liska  
> >   Dominik Infuhr  
> >
> >   PR c++/23383
> >   * testsuite/ext/bitmap_allocator/check_delete.cc: Add
> >   -fno-allocation-dce.
> >   * testsuite/ext/bitmap_allocator/check_new.cc: Likewise.
> >   * testsuite/ext/new_allocator/check_delete.cc: Likewise.
> >   * testsuite/ext/new_allocator/check_new.cc: Likewise.
> OK.
>
> I don't see the 2/2 from this series in my queue.  Has it already been
> dealt with?

You approved 2/2.

1/2 is OK as well.

Thanks,
Richard.

>
> jeff


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-07-25 Thread Martin Jambor
Hello,

On Tue, Jul 23 2019, Richard Biener wrote:
> The following fixes the runtime regression of 456.hmmer caused
> by matching ICC in code generation and using cmov more aggressively
> (through GIMPLE level MAX_EXPR usage).  Appearantly (discovered
> by manual assembler editing) using the SSE unit for performing
> SImode loads, adds and then two singed max operations plus stores
> is quite a bit faster than cmovs - even faster than the original
> single cmov plus branchy second max.  Even more so for AMD CPUs
> than Intel CPUs.
>
> Instead of hacking up some pattern recognition pass to transform
> integer mode memory-to-memory computation chains involving
> conditional moves to "vector" code (similar to what STV does
> for TImode ops on x86_64) the following simply allows SImode
> into SSE registers (support for this is already there in some
> important places like move patterns!).  For the particular
> case of 456.hmmer the required support is loads/stores
> (already implemented), SImode adds and SImode smax.
>
> So the patch adds a smax pattern for SImode (we don't have any
> for scalar modes but currently expand via a conditional move sequence)
> emitting as SSE vector max or cmp/cmov depending on the alternative.
>
> And it amends the *add_1 pattern with SSE alternatives
> (which have to come before the memory alternative as IRA otherwise
> doesn't consider reloading a memory operand to a register).
>
> With this in place the runtime of 456.hmmer improves by 10%
> on Haswell which is back to before regression speed but not
> to same levels as seen with manually editing just the single
> important loop.
>
> I'm currently benchmarking all SPEC CPU 2006 on Haswell.  More
> interesting is probably Zen where moves crossing the
> integer - vector domain are excessively expensive (they get
> done via the stack).

There was a znver2 CPU machine not doing anything useful overnight here
so I benchmarked your patch using SPEC 2006 and SPEC CPUrate 2017 on top
of trunk r273663 (I forgot to pull, so before Honza's znver2 tuning
patches, I am afraid).  All benchmarks were run only once with options
-Ofast -march=native -mtune=native.

By far the biggest change was indeed 456.hmmer which improved by
incredible 35%.  There was no other change bigger than +- 1.5% in SPEC
2006 so the SPECint score grew by almost 3.4%.

I understand this patch fixes a regression in that benchmark but even
so, 456.hmmer built with the Monday trunk was 23% slower than with gcc 9
and with the patch is 20% faster than gcc 9.

In SPEC 2017, there were two changes worth mentioning although they
probably need to be confirmed and re-measured on top of the new tuning
changes.  525.x264_r regressed by 3.37% and 511.povray_r improved by
3.04%.

Martin


>
> Clearly this approach will run into register allocation issues
> but it looks cleaner than writing yet another STV-like pass
> (STV itself is quite awkwardly structured so I refrain from
> touching it...).
>
> Anyway - comments?  It seems to me that MMX-in-SSE does
> something very similar.
>
> Bootstrapped on x86_64-unknown-linux-gnu, previous testing
> revealed some issue.  Forgot that *add_1 also handles
> DImode..., fixed below, re-testing in progress.
>
> Thanks,
> Richard.
>
> 2019-07-23  Richard Biener  
>
>   PR target/91154
>   * config/i386/i386.md (smaxsi3): New.
>   (*add_1): Add SSE and AVX variants.
>   * config/i386/i386.c (ix86_lea_for_add_ok): Do not allow
>   SSE registers.
>


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-25 Thread Richard Biener
On Wed, Jul 24, 2019 at 9:50 PM Andrew MacLeod  wrote:
>
> On 7/24/19 2:18 PM, Jeff Law wrote:
>
> On 7/24/19 11:00 AM, Richard Biener wrote:
> [ Big snip, ignore missing reply attributions... ]
>
> it. But I'd claim that if callers are required not to change these
> ranges, then the callers are fundamentally broken.  I'm not sure
> what the "sanitization" is really buying you here.  Can you point
> to something specific?
>
> But you lose the sanitizing that nobody can change it and the
> changed info leaks to other SSA vars.
>
> As said, fix all callers to deal with NULL.
>
> But I argue the current code is exactly optimal and safe.
>
> ANd I'd argue that it's just plain broken and that the
> sanitization you're referring to points to something broken
> elsewhere,  higher up in the callers.
>
> Another option is to make get_value_range return by value and the
> only way to change the lattice to call an appropriate set function. I
> think we already do the latter in all cases (but we use
> get_value_range in the setter) and returning by reference is just
> eliding the copy.
>
> OK, so what I think you're getting at (and please correct me if I'm
> wrong) is that once the lattice values are set, you don't want something
> changing the recorded ranges underneath?
>
> ISTM the way to enforce that is to embed the concept in the class and
> enforce it by not allowing direct manipulation of range by the clients.
>  So a client that wants this behavior somehow tells the class that
> ranges are "set in stone" and from that point the setters don't allow
> changing the underlying ranges.
>
> I just want to make sure we're on the same page WRT why you think the
> constant varying range object is useful.
>
> jeff
>
>
> That is not the functionality we are seeing.
>
> whenever get_value_range silently returns a CONST varying node,  the ONLY way 
> you can tell that the node might possibly be const elsewhere would be if you 
> first check that it is varying, like in  :
>
> void
> vr_values::set_defs_to_varying (gimple *stmt)
> {
>   ssa_op_iter i;
>   tree def;
>   FOR_EACH_SSA_TREE_OPERAND (def, stmt, i, SSA_OP_DEF)
> {
>   value_range *vr = get_value_range (def);
>   /* Avoid writing to vr_const_varying get_value_range may return.  */
>   if (!vr->varying_p ())
> vr->set_varying ();
> }
> }
>
>
>
> Which means there can be *no* context in which we ever try move one of these 
> nodes from varying to anything else, or we'd trap on a write to read-only 
> space.
>
> Which means no place is ever trying to change those nodes from varying to 
> anything else.  But nothing is preventing changes from other ranges to 
> something else.
>
> Which also means the only thing this approach accomplishes is to force us to 
> check if a node is already varying, so that we don't  overwrite the node to 
> varying just in case its a hidden const.
>
> how can this hidden const node really be useful?
>
> I submit this is just a dangerous way to flag previously unprocessed nodes as 
> VARYING for the duration of the pass after values_propagated is set...  not 
> some higher level "Don't change this range any more" plan.  Its already 
> bottom of the lattice..  it isn't going anywhere.

It's for avoiding to reallocate the lattice during propagation where
we end up creating new SSA names.  And for convenience
you like so much - in this case so callers do not need to check for NULL.

I've played a bit with returning value_range by value but that breaks
down with the way we currently handle the
equivalence bitmap -- that bitmap is not supposed to be shared so we
can modify it in place but that means
each lattice query would need to do deep copying of the bitmap which
is of course bad.

Isolating modification of the lattice can be done in other ways.

Richard.

>
> Andrew
>


Re: [PATCH 1/2] Come up with function_decl_type and use it in tree_function_decl.

2019-07-25 Thread Martin Liška
On 7/25/19 2:53 AM, Marc Glisse wrote:
> I would also mark DECL_IS_OPERATOR_DELETE_P the other operators delete, 
> because of the name of the macro, but it is not important for this patch.

I'm planning to install the patch as is right after I'll get an approval of the 
1/2 part.

Feel free to prepare a patch that will put the DECL_IS_OPERATOR_DELETE_P to 
other delete operators.

Thanks,
Martin

> 
> DCE has special code to avoid removing the LHS of malloc when it is unused. 
> Is there something different about operator new that makes it not need the 
> same handling?
> 
> The patch is happy to simplify malloc+delete or new+free. That makes sense to 
> me, if someone wants to diagnose weird mixes, that's a separate work they can 
> do later.
> 
> (Not this patch: I wonder how much speed we would lose by making 
> gimple_call_operator_delete_p take a gimple* and check if the thing is a 
> call, so we don't have to repeat is_gimple_call and as_a in half of 
> the callers. An overload (overkill) would of course lose nothing. But, with 
> just the gimple* version, especially if it was in a .h, I wonder if we would 
> manage to eliminate the overhead of checking is_gimple_call when we already 
> know it is a call)
> 



Re: [PATCH, rs6000] Fix PR91050 by adding a DRIVER_SELF_SPECS spec

2019-07-25 Thread Iain Sandoe
Hi Peter,

> On 25 Jul 2019, at 04:30, Peter Bergner  wrote:
> 
> The -mdejagnu-cpu= option was added as a way for a test case to ensure a
> particular -mcpu= option is used to compile the test, regardless of whether
> the user attempts to override it (purposely or accidentally) via RUNTESTFLAGS.
> This was well and good, but the ASM_CPU_SPEC was not updated to handle
> -mdejagnu-cpu=, so the option passed to the assembler is only determined
> by -mcpu=, even if that is overridden by -mdejagnu-cpu=.  This can cause
> cases of using the wrong assembler option.
> 
> We could just add -mdejagnu-cpu= support to ASM_CPU_SPEC, but that spec entry
> is already WAY too huge and ugly and to add support for -mdejagnu-cpu=, we'd
> have to essentially double the size of that spec.  The patch below takes
> a different approach and removes Segher's original patch altogether and
> instead implements -mdejagnu-cpu= using a DRIVER_SELF_SPECS spec which
> simplifies things by not even needing to touch ASM_CPU_SPEC.  I also added
> support for -mdejagnu-tune=, even though we don't have any test cases
> at the moment that use it, since it was easy to add.

This will break Darwin which has used DRIVER_SELF_SPECS in config/darwin.h
since they were introduced (and the equivalent before).

This is because Darwin has driver self-specs common to the PPC and X86 ports.

If this change is seen the way to go, then I guess one solution would be to 
rename the
driver specs in config/darwin.h => SUBTARGET_DRIVER_SELF_SPECS*** and then
put a dummy DRIVER_SELF_SPECS in i386/i386.h 
and append SUBTARGET_DRIVER_SELF_SPECS  to both the i386.h and rs6000.h 
cases.  (or something along those lines)

> Segher, I tried your suggestion of writing the spec more generically
> (ie, %{mdejagnu-*: ... -m*}), which worked in that it did the correct
> replacement.  However, the % option(s) would need to be written like % at all.

not sure what the objective is here - if you want to remove all pre-existing 
-mcpu from
the command line won’t % This passed bootstrap and regtesting with no errors on powerpc64le-linux.
> Ok for trunk?
> 
> Peter
> 
> gcc/
>   PR target/91050
>   * config/rs6000/rs6000.opt (mdejagnu-cpu=): Delete option.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
>   use of deleted rs6000_dejagnu_cpu_index variable.
>   * config/rs6000/rs6000.h (DRIVER_SELF_SPECS): Define.
> 
> Index: gcc/config/rs6000/rs6000.opt
> ===
> --- gcc/config/rs6000/rs6000.opt  (revision 273707)
> +++ gcc/config/rs6000/rs6000.opt  (working copy)
> @@ -388,13 +388,6 @@ mtune=
> Target RejectNegative Joined Var(rs6000_tune_index) Init(-1) 
> Enum(rs6000_cpu_opt_value) Save
> -mtune=   Schedule code for given CPU.
> 
> -; Only for use in the testsuite.  This simply overrides -mcpu=.  With older
> -; versions of Dejagnu the command line arguments you set in RUNTESTFLAGS
> -; override those set in the testcases; with this option, the testcase will
> -; always win.
> -mdejagnu-cpu=
> -Target Undocumented RejectNegative Joined Var(rs6000_dejagnu_cpu_index) 
> Init(-1) Enum(rs6000_cpu_opt_value) Save
> -
> mtraceback=
> Target RejectNegative Joined Enum(rs6000_traceback_type) Var(rs6000_traceback)
> -mtraceback=[full,part,no]Select type of traceback table.
> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 273707)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -3489,9 +3489,6 @@ rs6000_option_override_internal (bool gl
>   /* Don't override by the processor default if given explicitly.  */
>   set_masks &= ~rs6000_isa_flags_explicit;
> 
> -  if (global_init_p && rs6000_dejagnu_cpu_index >= 0)
> -rs6000_cpu_index = rs6000_dejagnu_cpu_index;
> -
>   /* Process the -mcpu= and -mtune= argument.  If the user changed
>  the cpu in a target attribute or pragma, but did not specify a tuning
>  option, use the cpu for the tuning option rather than the option 
> specified
> Index: gcc/config/rs6000/rs6000.h
> ===
> --- gcc/config/rs6000/rs6000.h(revision 273707)
> +++ gcc/config/rs6000/rs6000.h(working copy)
> @@ -77,6 +77,15 @@
> #define PPC405_ERRATUM77 0
> #endif
> 
> +/* Only for use in the testsuite: -mdejagnu-cpu= simply overrides -mcpu=.
> +   With older versions of Dejagnu the command line arguments you set in
> +   RUNTESTFLAGS override those set in the testcases; with this option,
> +   the testcase will always win.  Ditto for -mdejagnu-tune=.  */
> +#define DRIVER_SELF_SPECS \
> +  "%{mdejagnu-cpu=*: % +   %{mdejagnu-tune=*: % +   %{mdejagnu-*: % +
> #if CHECKING_P
> #define ASM_OPT_ANY ""
> #else



Re: PR90724 - ICE with __sync_bool_compare_and_swap with -march=armv8.2-a

2019-07-25 Thread Prathamesh Kulkarni
On Wed, 17 Jul 2019 at 18:15, Prathamesh Kulkarni
 wrote:
>
> On Wed, 17 Jul 2019 at 13:45, Kyrill Tkachov
>  wrote:
> >
> > Hi Prathamesh
> >
> > On 7/10/19 12:24 PM, Prathamesh Kulkarni wrote:
> > > Hi,
> > > For following test-case,
> > > static long long AL[24];
> > >
> > > int
> > > check_ok (void)
> > > {
> > >   return (__sync_bool_compare_and_swap (AL+1, 0x20003ll,
> > > 0x1234567890ll));
> > > }
> > >
> > > Compiling with -O2 -march=armv8.2-a results in:
> > > pr90724.c: In function ‘check_ok’:
> > > pr90724.c:7:1: error: unrecognizable insn:
> > > 7 | }
> > >   | ^
> > > (insn 11 10 12 2 (set (reg:CC 66 cc)
> > > (compare:CC (reg:DI 95)
> > > (const_int 8589934595 [0x20003]))) "pr90724.c":6:11 -1
> > >  (nil))
> > >
> > > IIUC, the issue is that 0x20003 falls outside the range of
> > > allowable immediate in cmp ? If it's replaced by a small constant then
> > > it works.
> > >
> > > The ICE results with -march=armv8.2-a because, we enter if
> > > (TARGET_LSE) { ... } condition
> > > in aarch64_expand_compare_and_swap, while with -march=armv8.a it goes
> > > into else,
> > > which forces oldval into register if the predicate fails to match.
> > >
> > > The attached patch checks if y (oldval) satisfies aarch64_plus_operand
> > > predicate and if not, forces it to be in register, which resolves ICE.
> > > Does it look OK ?
> > >
> > > Bootstrap+testing in progress on aarch64-linux-gnu.
> > >
> > > PS: The issue has nothing to do with SVE, which I incorrectly
> > > mentioned in bug report.
> > >
> > This looks ok to me (but you'll need maintainer approval).
> >
> > Does this fail on the branches as well?
> Hi Kyrill,
> Thanks for the review. The test also fails on gcc-9-branch (but not on gcc-8).
Hi James,
Is the patch OK to commit  ?
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> >
> > Kyrill
> >
> >
> > > Thanks,
> > > Prathamesh